
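Example: Dissimilarity between Asymmetric Binary Variables
Gender is a symmetric attribute (not counted in the distance). The remaining attributes are asymmetric binary. Let the values Y and P be 1, and the value N be 0.
Distance (asymmetric binary dissimilarity, which ignores negative 0-0 matches):
$$d(i, j) = \frac{r + s}{q + r + s}$$
where q is the number of attributes equal to 1 for both objects, r the number equal to 1 for i but 0 for j, and s the number equal to 0 for i but 1 for j. Distances are computed for the pairs (Jack, Mary), (Jack, Jim), and (Jim, Mary) from the patient table shown on the slide.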
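A minimal Python sketch of the asymmetric binary dissimilarity $d(i, j) = \frac{r + s}{q + r + s}$, using the slide's encoding (Y and P as 1, N as 0). The function names and the patient records in the usage lines are hypothetical illustrations, not the slide's table; the symmetric attribute (gender) is simply left out of the vectors.

```python
from collections.abc import Sequence

# Slide encoding: Y and P map to 1, N maps to 0.
ENCODING = {"Y": 1, "P": 1, "N": 0}


def encode(values: Sequence[str]) -> list[int]:
    """Convert Y/P/N values of the asymmetric attributes into 1/0 codes."""
    return [ENCODING[v] for v in values]


def asymmetric_binary_dissimilarity(x: Sequence[int], y: Sequence[int]) -> float:
    """d(i, j) = (r + s) / (q + r + s); negative matches (0, 0) are ignored."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # 1 in x, 0 in y
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # 0 in x, 1 in y
    if q + r + s == 0:
        return 0.0  # assumption: two all-negative objects are treated as identical
    return (r + s) / (q + r + s)


# Hypothetical patients (gender omitted because it is symmetric).
patient_a = encode(["Y", "N", "P", "N", "N"])
patient_b = encode(["Y", "N", "P", "N", "P"])
patient_c = encode(["Y", "P", "N", "N", "N"])
print(asymmetric_binary_dissimilarity(patient_a, patient_b))  # (0 + 1) / (2 + 0 + 1) = 1/3
print(asymmetric_binary_dissimilarity(patient_a, patient_c))  # (1 + 1) / (1 + 1 + 1) = 2/3
```

Only the 1/0 pattern matters here, so adding more all-N attributes to both patients leaves the distance unchanged; that is exactly the point of ignoring negative matches.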

Data Mining: Exploratory Data Analysis

Proximity Measure for Categorical Attributes
Categorical data are also called nominal attributes. Examples: color (red, yellow, blue, green), profession, etc.
Method 1: Simple matching, where m is the number of matches and p is the total number of variables:
$$d(i, j) = \frac{p - m}{p}$$
Method 2: Use a large number of binary attributes, creating a new binary attribute for each of the M nominal states.
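Both methods can be sketched in a few lines of Python. This is an illustrative sketch: `simple_matching_dissimilarity` and `one_hot` are hypothetical helper names, and the profession values in the usage lines are made up.

```python
from collections.abc import Hashable, Sequence


def simple_matching_dissimilarity(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Method 1: d(i, j) = (p - m) / p, with m matches out of p attributes."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects must have the same, non-zero number of attributes")
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p


def one_hot(value: Hashable, states: Sequence[Hashable]) -> list[int]:
    """Method 2: one new binary attribute per nominal state; only the observed state is 1."""
    if value not in states:
        raise ValueError(f"unknown state: {value!r}")
    return [1 if value == state else 0 for state in states]


colors = ["red", "yellow", "blue", "green"]
print(one_hot("blue", colors))  # [0, 0, 1, 0]
print(simple_matching_dissimilarity(["red", "teacher"], ["red", "nurse"]))  # 0.5
```

With M = 4 color states, Method 2 turns the single color attribute into four binary attributes, which can then be compared with the binary dissimilarity measures.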
Ordinal Variables
An ordinal variable can be discrete or continuous. Order is important, e.g., rank (e.g., freshman, sophomore, junior, senior).
An ordinal variable can be treated like an interval-scaled one:
Replace an ordinal variable value by its rank, $r_{if} \in \{1, \ldots, M_f\}$, where $M_f$ is the number of ordered states of variable f.
Map the range of each variable onto [0, 1] by replacing the i-th object in the f-th variable by
$$z_{if} = \frac{r_{if} - 1}{M_f - 1}$$
Example: freshman: 0; sophomore: 1/3; junior: 2/3; senior: 1. Then distance: d(freshman, senior) = 1, d(junior, senior) = 1/3.
Compute the dissimilarity using methods for interval-scaled variables.
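The rank-to-[0, 1] mapping and the freshman/senior example can be reproduced with a short Python sketch. The function names are hypothetical; for a single attribute, any interval-scaled distance (Euclidean or Manhattan) reduces to the absolute difference of the mapped values, which is what `ordinal_distance` uses.

```python
from collections.abc import Sequence


def ordinal_to_unit_interval(value: str, ordered_states: Sequence[str]) -> float:
    """Replace the value by its rank r in {1, ..., M}, then map it to z = (r - 1) / (M - 1)."""
    M = len(ordered_states)
    if M < 2:
        raise ValueError("need at least two ordered states")
    r = list(ordered_states).index(value) + 1  # 1-based rank; raises ValueError if unknown
    return (r - 1) / (M - 1)


def ordinal_distance(a: str, b: str, ordered_states: Sequence[str]) -> float:
    """Treat z as interval-scaled; for a single attribute this is |z_a - z_b|."""
    return abs(ordinal_to_unit_interval(a, ordered_states)
               - ordinal_to_unit_interval(b, ordered_states))


YEARS = ["freshman", "sophomore", "junior", "senior"]
print([round(ordinal_to_unit_interval(y, YEARS), 3) for y in YEARS])  # [0.0, 0.333, 0.667, 1.0]
print(ordinal_distance("freshman", "senior", YEARS))  # 1.0
print(round(ordinal_distance("junior", "senior", YEARS), 3))  # 0.333
```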

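Attributes of Mixed Type
A dataset may contain all attribute types: nominal, symmetric binary, asymmetric binary, numeric, and ordinal.
One may use a weighted formula to combine their effects:
$$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$$
where the indicator $\delta_{ij}^{(f)} = 0$ if $x_{if}$ or $x_{jf}$ is missing, or if $x_{if} = x_{jf} = 0$ and f is asymmetric binary; otherwise $\delta_{ij}^{(f)} = 1$.
If f is numeric: use the normalized distance $d_{ij}^{(f)} = \frac{|x_{if} - x_{jf}|}{\max_h x_{hf} - \min_h x_{hf}}$.
If f is binary or nominal: $d_{ij}^{(f)} = 0$ if $x_{if} = x_{jf}$; otherwise $d_{ij}^{(f)} = 1$.
If f is ordinal: compute ranks $r_{if}$ and $z_{if} = \frac{r_{if} - 1}{M_f - 1}$, then treat $z_{if}$ as interval-scaled.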
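A minimal Python sketch of the weighted mixed-type formula (a Gower-style combination), assuming the caller supplies each numeric attribute's range (max minus min over the data set) and each ordinal attribute's states in order. `Attribute`, `mixed_dissimilarity`, and the `kind` strings are hypothetical names, missing values are represented as `None`, and the usage records are made up.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Any


@dataclass
class Attribute:
    # Hypothetical descriptor: "numeric", "nominal", "symmetric_binary",
    # "asymmetric_binary", or "ordinal".
    kind: str
    value_range: float = 1.0      # numeric only: max_h x_hf - min_h x_hf
    states: Sequence[Any] = ()    # ordinal only: states from lowest to highest


def mixed_dissimilarity(x: Sequence[Any], y: Sequence[Any],
                        attrs: Sequence[Attribute]) -> float:
    """d(i, j) = sum_f delta_ij^(f) * d_ij^(f) / sum_f delta_ij^(f)."""
    weighted_sum, weight_total = 0.0, 0
    for xf, yf, attr in zip(x, y, attrs):
        # delta = 0: a value is missing, or an asymmetric binary attribute has a 0-0 match.
        if xf is None or yf is None:
            continue
        if attr.kind == "asymmetric_binary" and xf == 0 and yf == 0:
            continue
        if attr.kind == "numeric":
            d = abs(xf - yf) / attr.value_range if attr.value_range else 0.0
        elif attr.kind in ("nominal", "symmetric_binary", "asymmetric_binary"):
            d = 0.0 if xf == yf else 1.0
        elif attr.kind == "ordinal":
            # index() is the 0-based position, i.e. r_if - 1, so |z_i - z_j| = |idx_i - idx_j| / (M_f - 1).
            d = abs(attr.states.index(xf) - attr.states.index(yf)) / (len(attr.states) - 1)
        else:
            raise ValueError(f"unknown attribute kind: {attr.kind!r}")
        weighted_sum += d
        weight_total += 1
    if weight_total == 0:
        raise ValueError("no attribute contributes to the comparison")
    return weighted_sum / weight_total


attrs = [
    Attribute("numeric", value_range=50.0),
    Attribute("nominal"),
    Attribute("asymmetric_binary"),
    Attribute("ordinal", states=("fair", "good", "excellent")),
]
x = [30, "red", 0, "good"]
y = [45, "blue", 0, "excellent"]
print(mixed_dissimilarity(x, y, attrs))  # (0.3 + 1.0 + 0.5) / 3, about 0.6
```

The 0-0 match on the asymmetric binary attribute drops out of both the numerator and the denominator, so the result averages over three attributes rather than four.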
Cosine Similarity of Two Vectors
A document can be represented by a bag of terms or a long vector, with each attribute recording the frequency of a particular term (such as a word, keyword, or phrase) in the document.
Other vector objects: gene features in microarrays.
Applications: information retrieval, biological taxonomy, gene feature mapping, etc.
Cosine measure: if $d_1$ and $d_2$ are two vectors (e.g., term-frequency vectors), then
$$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$$
where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector $d$.
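A small Python sketch of the bag-of-terms representation and the cosine measure. `term_frequency_vector` and `cosine_similarity` are hypothetical helpers, and the tokenization (lowercase, whitespace-separated words, no stemming or stop-word removal) is a simplifying assumption rather than anything the slide prescribes.

```python
import math
from collections import Counter
from collections.abc import Sequence


def term_frequency_vector(text: str, vocabulary: Sequence[str]) -> list[int]:
    """Bag-of-terms vector: one attribute per vocabulary term, holding its frequency.

    Assumption: terms are lowercase, whitespace-separated words.
    """
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """cos(x, y) = (x . y) / (||x|| * ||y||)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


vocab = ["team", "coach", "win", "loss"]
doc_a = term_frequency_vector("team win win", vocab)    # [1, 0, 2, 0]
doc_b = term_frequency_vector("coach team win", vocab)  # [1, 1, 1, 0]
print(round(cosine_similarity(doc_a, doc_b), 3))        # 3 / sqrt(15), about 0.775
```

Because the measure divides by both vector lengths, it compares the direction of the term-frequency profiles rather than document length, and the many zero entries of sparse term vectors contribute nothing to the dot product.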

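Example: Calculating Cosine Similarity
Cosine similarity is calculated as $\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$, where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector $d$.
Ex: Find the similarity between documents 1 and 2, whose term-frequency vectors are
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
First, calculate the vector dot product:
$$d_1 \cdot d_2 = 5 \times 3 + 0 \times 0 + 3 \times 2 + 0 \times 0 + 2 \times 1 + 0 \times 1 + 0 \times 0 + 2 \times 1 + 0 \times 0 + 0 \times 1 = 25$$
Then, calculate $\|d_1\|$ and $\|d_2\|$:
$$\|d_1\| = \sqrt{5 \times 5 + 0 \times 0 + 3 \times 3 + 0 \times 0 + 2 \times 2 + 0 \times 0 + 0 \times 0 + 2 \times 2 + 0 \times 0 + 0 \times 0} = \sqrt{42} \approx 6.48$$
$$\|d_2\| = \sqrt{3 \times 3 + 0 \times 0 + 2 \times 2 + 0 \times 0 + 1 \times 1 + 1 \times 1 + 0 \times 0 + 1 \times 1 + 0 \times 0 + 1 \times 1} = \sqrt{17} \approx 4.12$$
Calculate cosine similarity:
$$\cos(d_1, d_2) = \frac{25}{6.48 \times 4.12} \approx 0.94$$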
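The arithmetic of this example can be checked with a few lines of standard-library Python, using the two term-frequency vectors d1 and d2 of the example; the variable names are illustrative.

```python
import math

# Term-frequency vectors of documents 1 and 2 from the example.
d1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
d2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]

dot = sum(a * b for a, b in zip(d1, d2))     # 25
norm_d1 = math.sqrt(sum(a * a for a in d1))  # sqrt(42)
norm_d2 = math.sqrt(sum(b * b for b in d2))  # sqrt(17)
cos_sim = dot / (norm_d1 * norm_d2)

print(f"d1 . d2 = {dot}")                                   # d1 . d2 = 25
print(f"||d1|| = {norm_d1:.3f}, ||d2|| = {norm_d2:.3f}")   # ||d1|| = 6.481, ||d2|| = 4.123
print(f"cos(d1, d2) = {cos_sim:.2f}")                       # cos(d1, d2) = 0.94
```

Using the unrounded norms gives 25 / sqrt(714), about 0.936, which also rounds to 0.94.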
Summary
Basic data descriptions (e.g., measures of central tendency and measures of dispersion) and graphic statistical displays (e.g., quantile plots, histograms, and scatter plots) provide valuable insight into the
  • Winter '18
  • nour
