Data about a single type of protein might be stored in

CS480 Principles of Data Management, Spring 2013
Sangmi Lee Pallickara, CS480, Spring 2012

Duplicated Data?
•  Data about a single type of protein might be stored in many different scientific databases
   –  From different countries
   –  From different research groups
•  Difficult to detect
•  Decreases the usability of data
•  Causes unnecessary expenses
•  Customer dissatisfaction
•  Incorrect performance indicators
•  Inhibits comprehension of the data and its value

Exact Replica Vs. Fuzzy Duplicates

      FN     LN     Phone
R1    John   Doe    (407) 356 8888
R2    John   Doe    (407) 356 8888
R3    John   Doe    (407) 356 8887

   –  Maintaining duplicate identities to receive more credit
•  Monitoring inventory levels
   –  Products are recorded multiple times
•  Catalogs are mailed multiple times to the same household

Data Cleaning
   –  Takes one or more sets of data and produces as output a single clean dataset
   –  Look-up checks
   –  Format transformations
   –  Currenc...

Causes for Duplicates
•  Intra-source duplicates
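
The difference between an exact replica and a fuzzy duplicate in the R1 to R3 example can be sketched in a few lines of Python. This is only an illustration, not the method taught in the course: the per-field similarity measure (the standard library's difflib.SequenceMatcher), the averaging across fields, the 0.9 threshold, and the names similarity and classify are all assumptions made for this sketch.

```python
from difflib import SequenceMatcher

# Records from the slide's example table: (FN, LN, Phone).
records = {
    "R1": ("John", "Doe", "(407) 356 8888"),
    "R2": ("John", "Doe", "(407) 356 8888"),
    "R3": ("John", "Doe", "(407) 356 8887"),
}


def similarity(a, b):
    """Average per-field string similarity in [0, 1].

    Assumption: a simple character-based ratio per field, averaged
    with equal weights. Real systems often weight fields differently.
    """
    scores = [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]
    return sum(scores) / len(scores)


def classify(a, b, threshold=0.9):
    """Label a pair of records as 'exact', 'fuzzy', or 'distinct'.

    The threshold is a hypothetical value chosen for this example.
    """
    if a == b:
        return "exact"      # every field matches character for character
    if similarity(a, b) >= threshold:
        return "fuzzy"      # nearly identical, e.g. one digit differs
    return "distinct"


if __name__ == "__main__":
    for left, right in [("R1", "R2"), ("R1", "R3")]:
        print(left, right, classify(records[left], records[right]))
```

Under these assumptions R1 and R2 compare as an exact replica, while R1 and R3, which differ only in the last digit of the phone number, fall into the fuzzy category. An equality check alone would miss the second pair, which is part of why such duplicates are difficult to detect.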

This note was uploaded on 02/11/2014 for the course CS 480 taught by Professor Staff during the Spring '08 term at Colorado State.