Data_Clean_Thomas - The Complexity of Data Cleanliness...

The Complexity of Data Cleanliness
A method for determining how clean or unclean a set of records is.
Thomas Jones, COT 6410, Fall 2008
Background
Worked on developing applications to assist the marketing department with CRM (customer relationship management).
Problem: how to determine which records may be duplicates.
This led to research on fuzzy matching, but that subject is too broad and needs to be narrowed down.
Definitions
Fuzzy match: two strings match if they share a common subsequence of some chosen (arbitrary) length.
P-clean: a set is P-clean if no subset of size P or more has a common subsequence of the specified length.
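The fuzzy-match definition compares two strings by their common subsequences. A minimal sketch, assuming the length threshold is a parameter k: two strings share a common subsequence of length k or more exactly when their longest common subsequence (LCS) has length at least k, and the classic dynamic program computes the LCS length in O(|a|·|b|) time. The names `lcs_length` and `fuzzy_match` are hypothetical helpers for illustration, not from the slides.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    # prev[j] holds the LCS length of the prefix of a processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1               # extend a match
            else:
                curr[j] = max(prev[j], curr[j - 1])     # skip a char of a or of b
        prev = curr
    return prev[len(b)]


def fuzzy_match(a: str, b: str, k: int) -> bool:
    """Hypothetical helper: a and b fuzzy-match if they share a common subsequence of length >= k."""
    return lcs_length(a, b) >= k
```

For example, "TOMAS" is itself a subsequence of "THOMAS", so `lcs_length("THOMAS", "TOMAS")` is 5 and the pair fuzzy-matches for any k up to 5.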
Data Cleanliness
Given: a set of names S, an integer P, and an integer K.
Question: Are all subsets of S of size P or more free of common subsequences of length K or more?
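The Data Cleanliness question can be answered directly by brute force on small inputs. A sketch under two observations (not stated in the slides, but they follow from the definitions): if some subset of size P or more shares a common subsequence of length K or more, then every P-element subset of it does too, so only subsets of size exactly P need checking; and a common subsequence of length K or more exists exactly when one of length exactly K exists, so only length-K candidates drawn from the shortest string need testing. The running time is exponential, so this only illustrates the question being asked. `is_p_clean` and the other names are hypothetical.

```python
from itertools import combinations


def is_subsequence(d: str, s: str) -> bool:
    """True if d can be obtained from s by deleting characters."""
    it = iter(s)
    # `ch in it` advances the iterator, so characters must appear in order
    return all(ch in it for ch in d)


def has_common_subsequence(strings, k: int) -> bool:
    """True if some string of length k is a subsequence of every string."""
    shortest = min(strings, key=len)
    # Any common subsequence is a subsequence of the shortest string, so
    # enumerate its length-k subsequences (exponential; small inputs only).
    for idx in combinations(range(len(shortest)), k):
        candidate = "".join(shortest[i] for i in idx)
        if all(is_subsequence(candidate, s) for s in strings):
            return True
    return False


def is_p_clean(names, p: int, k: int) -> bool:
    """Brute-force answer to the Data Cleanliness question for (names, p, k)."""
    # Checking the size-p subsets suffices: a larger "unclean" subset
    # contains a size-p subset that shares the same subsequence.
    for subset in combinations(names, p):
        if has_common_subsequence(subset, k):
            return False
    return True
```

With `names = ["THOMAS", "TOMAS", "THOMPSON", "JONES"]`, the set is not clean for P = 2, K = 4 (THOMAS and TOMAS share TOMA), but it is clean for P = 3, K = 5.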
Co-Data Cleanliness (Data Uncleanliness)
Given: a set of names S, an integer P, and an integer K.
Question: Does there exist a subset S' ⊆ S with |S'| ≥ P, together with a sequence D such that D is a subsequence of every element of S' and |D| ≥ K?
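A "yes" answer to Co-Data Cleanliness comes with a natural witness: the subset S' and the sequence D. Checking such a witness takes only polynomial time, which is the usual way of showing a problem is in NP; the preview does not show whether the slides take this route, so treat this as an illustrative sketch. `verify_unclean_certificate` is a hypothetical name, and the strings stand in for names.

```python
def is_subsequence(d: str, s: str) -> bool:
    """True if d can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in d)


def verify_unclean_certificate(names, p: int, k: int, subset, d: str) -> bool:
    """Check a proposed witness (S', D) for the Co-Data Cleanliness question."""
    chosen = set(subset)                       # S', ignoring repeated entries
    return (
        chosen <= set(names)                   # S' is a subset of S
        and len(chosen) >= p                   # |S'| >= P
        and len(d) >= k                        # |D| >= K
        and all(is_subsequence(d, s) for s in chosen)  # D is a subsequence of every element of S'
    )
```

Each check is linear in the total length of the strings involved, so verification is fast even though finding a witness by brute force is not.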

This note was uploaded on 07/14/2011 for the course COT 4610 taught by Professor Dutton during the Fall '10 term at University of Central Florida.