Data Warehousing, Data Mining, Privacy
Farkas, CSCE 824 - Spring 2011

Reading

Data Warehousing
  Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
    – Data mart (single subject area)
    – Enterprise data warehouse (integrated data marts)
    – Metadata

OLAP Analysis
  Aggregation functions
  Factual data access
  Complex criteria
  Visualization

Warehouse Evaluation
  Enterprise-wide support
  Consistency and integration across diverse domains
  Security support
  Support for operational users
  Flexible access for decision makers

Data Integration
  Data access
  Data federation
  Change capture
  Need ETL (extraction, transformation, load)

Data Warehouse Users
  Internal users
    – Employees
    – Managerial
  External users
    – Reporting and auditing
    – Research

Data Mining
  Databases to be mined
  Knowledge to be mined
  Techniques used
  Applications supported

Data Mining Task
  Prediction Tasks
    – Use some variables to predict unknown or future values of other variables
  Description Tasks
    – Find human-interpretable patterns that describe the data

Common Tasks
  Classification [Predictive]
  Clustering [Descriptive]
  Association Rule Mining [Descriptive]
  Sequential Pattern Mining [Descriptive]
  Regression [Predictive]
  Deviation Detection [Predictive]

Security for Data Warehousing
  Establish the organization's security policies and procedures
  Implement logical access control
  Restrict physical access
  Establish internal control and auditing

Security for
Data Warehousing (cont.)
  Security Issues in Data Warehousing and Data Mining: Panel Discussion
  Panel discussion of Bhavani Thuraisingham, The MITRE Corporation; Linda Schlipper, The MITRE Corporation; Pierangela Samarati, SRI International; T. Y. Lin, San Jose State University; Sushil Jajodia, George Mason University; Chris Clifton, The MITRE Corporation
  xanadu.cs.sjsu.edu/~tylin/publications/paperList/109_security.ps

Integrity
  Poor quality data: inaccurate, incomplete, missing metadata
  Source data quality vs. derived data quality

Access Control
  Layered defense:
    – Access to processes that extract operational data
    – Access to data and processes that transform operational data
    – Access to data and metadata in the warehouse

Access Control Issues
  Mapping from local to warehouse policies
  How to handle “new” data
  Scalability
  Identity management

Inference Problem
  Data mining discovers “new knowledge”: how to evaluate security risks?
  Example security risks:
    – Prediction of sensitive information
    – Misuse of information
  Assurance of “discovery”
  Interesting Read: C. C. Aggarwal and P. S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, http://charuaggarwal.net/toc.pdf

Privacy
  Large volume of private (personal) data
  Need:
    – Proper acquisition, maintenance, usage, and retention policy
    – Integrity verification
    – Control of analysis methods (aggregation may reveal sensitive data)

Privacy
  What is the difference between confidentiality and privacy?
  Identity, location, activity, etc.
  Anonymity vs. accountability

Legislation
  Privacy Act of 1974, U.S.
  Department of Justice ( http://www.usdoj.gov/oip/04_7_1.html )
  Family Educational Rights and Privacy Act (FERPA), U.S. Department of Education ( http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html )
  Health Insurance Portability and Accountability Act of 1996 (HIPAA) ( http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accoun )
  Telecommunications Consumer Privacy Act ( http://www.answers.com/topic/electronic-communications-privacy-act )

Online Social Network
  Social Relationship
  Communication context changes social relationships
  Social relationships maintained through different media grow at different rates and to different depths
  No clear consensus on which medium is best

Internet and Social Relationships
  Internet
  Bridges distance at a low cost
  New participants tend to “like” each other more
  Less stressful than face-to-face meeting
  People focus on communicating their “selves” (except a few malicious users)

Social Network
  Description of the social structure between actors
  Connections: various levels of social familiarity, e.g., from casual acquaintance to close familial bonds
  Support online interaction and content sharing

Social Network Analysis
  The mapping and measuring of relationships and flows between people, groups, organizations, computers, or other information processing entities
  Behavioral Profiling
  Note: Social Network Signatures
    – User names may change; family and friends are more difficult to change

Interesting Read:
  M. Chew, D. Balfanz, B. Laurie, (Under)mining Privacy in Social Networks, http://citeseer.ist.psu.edu/viewdoc/summary?d
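
The note on social network signatures (user names may change, family and friends are more difficult to change) can be illustrated with a minimal sketch. It is not taken from the slides or from the cited paper: the account names, friend lists, and the choice of Jaccard similarity as the comparison measure are all assumptions made for illustration. The idea is only that a renamed account can still be linked to an earlier one when its set of connections is largely unchanged.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(unknown_friends: set[str], known: dict[str, set[str]]) -> tuple[str, float]:
    """Return the known account whose friend set is most similar to the unknown one."""
    scores = {name: jaccard(unknown_friends, friends) for name, friends in known.items()}
    name = max(scores, key=lambda n: scores[n])
    return name, scores[name]


# Hypothetical data: friend lists of two previously observed accounts.
known_accounts = {
    "alice_1985": {"bob", "carol", "dave", "erin"},
    "frank_x": {"grace", "heidi", "ivan"},
}

# A new account with an unrelated user name but a mostly unchanged friend list.
renamed_friends = {"bob", "carol", "dave", "judy"}

match, score = best_match(renamed_friends, known_accounts)
print(match, round(score, 2))
```

In this toy data the new account shares three of five distinct friends with one earlier account and none with the other, so the friend list alone points back to the earlier identity. Real behavioral profiling would use richer signals; this only shows why changing a user name offers limited protection.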

Next: Hippocratic Databases

Next Class: Stream Data
...
