...bility
• High scalability at the cost of vendor lock-in (MapReduce)
• Hybrid analysis capabilities for structured/unstructured data

• X86 servers; good compatibility
• High scalability, over 10,000 node-level deployment

[Figure labels: Original Collected Data, Data Decoding, Data Association, Commercial Database]

Currently, data analysis is based on data collected via the signaling plane of wireless and core networks. After decoding, the collected data invoked by the application layer are divided into five types, as shown on the left. For example, the analyzed data for Beijing can exceed 800 TB per week.

[Figure: data-volume scale (TB, PB, EB, ZB) with platform labels Traditional DB/DW, MPP DW+Hadoop, and Distributed architecture]

Top Ten Big Data Security and Privacy Challenges

Big Data Management Issues in Privacy, Security and Provenance
• New datacenter architecture that can preserve data privacy, enforce security policy, and scale well with future dataset growth
• Trust management of time-varying datasets, with intrusion and anomaly detection to assure data integrity
• Securing access to data using innovative techniques to avoid excessive replication of data to external entities
• Establishing community standards, provenance tracking, and communication strategies...

In the MapReduce or Hadoop framework, a large data file (such as retailer consumer data) is split into many chunks for parallel processing by mapper servers. Untrusted mappers may return wrong results, and thus cause wrong aggregation by the reducer servers. Securing the mappers with proper partitioning of the large data set thus becomes critical in scientific and commercial distributed supercomputing apps.
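The effect of an untrusted mapper on the reducer's aggregate can be illustrated with a minimal sketch. It runs in one Python process and does not use Hadoop; the function names, the sample purchase amounts, and the way the faulty mapper distorts its result are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Mapper = Callable[[List[int]], int]


def split_into_chunks(records: List[int], chunk_size: int) -> List[List[int]]:
    """Split the input into fixed-size chunks, one per map task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def honest_mapper(chunk: List[int]) -> int:
    """Return the true partial sum of the chunk."""
    return sum(chunk)


def untrusted_mapper(chunk: List[int]) -> int:
    """Hypothetical faulty mapper: inflates its partial sum by 1000."""
    return sum(chunk) + 1000


def reduce_totals(partials: List[int]) -> int:
    """The reducer aggregates whatever the mappers report."""
    return sum(partials)


def run_job(records: List[int], chunk_size: int, mappers: List[Mapper]) -> int:
    """Assign chunks to mappers round-robin, then reduce the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    partials = [mappers[i % len(mappers)](chunk) for i, chunk in enumerate(chunks)]
    return reduce_totals(partials)


# Assumed sample data: purchase amounts from a retailer's consumer records.
purchases = [120, 75, 300, 45, 60, 210, 90, 15]

expected_total = run_job(purchases, 2, [honest_mapper])
corrupted_total = run_job(
    purchases, 2, [honest_mapper, honest_mapper, untrusted_mapper, honest_mapper]
)

print(expected_total)   # 915
print(corrupted_total)  # 1915
```

The reducer has no way to tell the two runs apart: it sums the partial results it receives, so a single wrong mapper output silently changes the final aggregate.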
This note was uploaded on 02/04/2014 for the course EE 599 taught by Professor Povinelli during the Spring '08 term at USC.

