Data Mining
2.4 Data Integration
Fall 2008
Instructor: Dr. Masoud Yaghini

Data Integration
  • Data integration: combines data from multiple databases into a coherent store
      - Denormalization tables (often done to improve performance by avoiding joins)
  • Integration of data from multiple sources may produce redundancies and inconsistencies in the resulting data set.
  • Tasks of data integration:
      - Detecting and resolving data value and schema conflicts
      - Handling redundancy in data integration

Outline
  • Detecting and Resolving Data Value and Schema Conflicts
  • Handling Redundancy in Data Integration
  • References

Detecting and Resolving Data Value and Schema Conflicts

Schema Integration
  • Schema integration: integrate metadata from different sources
      - The same attribute or object may have different names in different databases
      - e.g. customer_id in one database and cust_number in another
  • The metadata include:
      - the name, meaning, data type, and range of values permitted for the attribute, etc.

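As a rough illustration of the naming conflict above, the sketch below renames each source's attributes to one shared name before the records are combined. The source labels (db_a, db_b), the SCHEMA_MAP table, the to_unified helper, and the sample values are all hypothetical; only the attribute names customer_id and cust_number come from the slide.

```python
# Hypothetical mapping from each source's attribute names to a shared schema.
# In practice this would be derived from the metadata of each database.
SCHEMA_MAP = {
    "db_a": {"customer_id": "customer_id"},
    "db_b": {"cust_number": "customer_id"},
}


def to_unified(record, source):
    """Rename a record's attributes to the unified names for the given source."""
    mapping = SCHEMA_MAP[source]
    # Attributes without an explicit mapping keep their original name.
    return {mapping.get(name, name): value for name, value in record.items()}


# Made-up sample records describing the same customer in both sources.
row_a = to_unified({"customer_id": 17, "segment": "retail"}, "db_a")
row_b = to_unified({"cust_number": 17, "segment": "retail"}, "db_b")
```

After renaming, both records use customer_id, so they can be compared or merged on that attribute.
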
Detecting and Resolving Data Value Conflicts
  • For the same real-world entity, attribute values from different sources are different
  • This may be due to differences in representation, scaling, or encoding.
  • Examples:
      - the data codes for pay_type in one database may be “H” and “S”, and 1 and 2 in another.
      - a weight attribute may be stored in metric units in one system and British imperial units in another.
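
The two examples above can be sketched as a small normalization step applied before integration. This is a minimal sketch under stated assumptions: the slide does not say which numeric code corresponds to which letter, so the 1 to "H" and 2 to "S" pairing is assumed, and the normalize helper, its parameters, and the sample records are hypothetical. The pound-to-kilogram factor is the standard definition of the international pound.

```python
# Assumed correspondence between the two pay_type encodings (not given in the slides).
PAY_TYPE_CODES = {1: "H", 2: "S"}

# One international avoirdupois pound in kilograms (exact by definition).
LB_TO_KG = 0.45359237


def normalize(record, pay_type_is_numeric, weight_unit):
    """Return a copy of the record with letter pay_type codes and weight in kg."""
    out = dict(record)
    if pay_type_is_numeric:
        # Encoding conflict: translate 1/2 into the letter codes.
        out["pay_type"] = PAY_TYPE_CODES[record["pay_type"]]
    if weight_unit == "lb":
        # Scaling conflict: convert imperial pounds to kilograms.
        out["weight"] = record["weight"] * LB_TO_KG
    return out


# Made-up records from two sources with different conventions.
rec_metric = normalize({"pay_type": "H", "weight": 70.0}, False, "kg")
rec_imperial = normalize({"pay_type": 1, "weight": 154.0}, True, "lb")
```

Once both records use the same codes and units, remaining differences in value point to genuine inconsistencies rather than representation differences.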
