Data Transformation: Discretization
  • Three main types of attributes
      - Nominal: values from an unordered set, e.g., color, profession
      - Ordinal: values from an ordered set, e.g., military or academic rank
      - Numeric: quantitative values, e.g., integers or real numbers
  • Discretization: divide the range of a continuous attribute into intervals
      - Interval labels can then be used to replace actual data values
      - Discretization reduces data size
      - Supervised vs. unsupervised
      - Discretization can be performed recursively on an attribute
      - Prepares the data for further analysis, e.g., classification

Data Transformation: Discretization Example
  • Example: discretizing the "Humidity" attribute using 3 bins (chosen, e.g., by looking at distributions or domain-specific rules)
      - Low = 60-69
      - Normal = 70-79
      - High = 80+
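
The three Humidity bins above can be sketched as a small lookup function. This is only an illustration: the function name and the sample readings are made up here, and treating every value below 70 as Low is an assumption, since the slide lists Low only as 60-69.

```python
def discretize_humidity(value):
    """Map a numeric humidity reading to one of the slide's three labels."""
    if value >= 80:
        return "High"
    if value >= 70:
        return "Normal"
    # Assumption: the slide gives Low as 60-69; anything below 70 is
    # treated as Low here so that every input receives a label.
    return "Low"


# Hypothetical readings, not taken from the slides.
readings = [65, 70, 79, 80, 96]
labels = [discretize_humidity(v) for v in readings]
print(labels)
```

After this step the interval labels replace the raw numbers, so the attribute can be used as an ordinal one.
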
Data Discretization Methods
  • Typical methods (all of them can be applied recursively):
      - Binning: top-down split, unsupervised
      - Histogram analysis: top-down split, unsupervised
      - Clustering analysis: unsupervised, top-down split or bottom-up merge
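
As a rough sketch of unsupervised binning, the two most common variants are equal-width bins (the value range is split into k intervals of the same size) and equal-frequency bins (each bin receives about the same number of values). The slides do not name these variants, so the function names and the sample data below are assumptions made for illustration.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so a single bin is enough.
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin indices so each bin holds roughly len(values) / k items."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical sample data, already sorted for readability.
data = [4, 8, 15, 21, 21, 24, 25, 28, 34]
width_bins = equal_width_bins(data, 3)
freq_bins = equal_frequency_bins(data, 3)
print(width_bins)
print(freq_bins)
```

Note that the rank-based equal-frequency version can place identical values in different bins when a tie straddles a bin boundary; a production version would need a rule for that case.
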
Note: Converting Categorical Attributes to Numerical Attributes
  • Attributes:
      - Outlook (overcast, rain, sunny)
      - Temperature: real
      - Humidity: real
      - Windy (true, false)
  • Standard spreadsheet format: create a separate column for each value of a categorical attribute (e.g., 3 columns for the Outlook attribute and 2 columns for the Windy attribute). The numerical attributes are left unchanged.
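
The one-column-per-value conversion described above might look like the following in plain Python. The helper name, the column naming scheme (`Outlook=sunny`), the 0/1 coding, and the sample rows are all assumptions for illustration; the slide only states that each categorical value gets its own column.

```python
def one_hot(rows, categorical):
    """Expand each categorical attribute into one 0/1 column per value.

    rows: list of dicts, one per record.
    categorical: dict mapping attribute name -> list of its possible values.
    Attributes not listed in `categorical` are copied through unchanged.
    """
    encoded = []
    for row in rows:
        out = {}
        for name, value in row.items():
            if name in categorical:
                for v in categorical[name]:
                    out[f"{name}={v}"] = 1 if value == v else 0
            else:
                out[name] = value
        encoded.append(out)
    return encoded


# Hypothetical records using the attributes listed on the slide.
rows = [
    {"Outlook": "sunny", "Temperature": 85.0, "Humidity": 85.0, "Windy": False},
    {"Outlook": "rain", "Temperature": 70.0, "Humidity": 96.0, "Windy": True},
]
categorical = {
    "Outlook": ["overcast", "rain", "sunny"],
    "Windy": [True, False],
}
encoded = one_hot(rows, categorical)
print(sorted(encoded[0]))
```

Each output record has 3 Outlook columns, 2 Windy columns, and the two numeric attributes unchanged, for 7 columns in total.
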
Data Transformation: Concept Hierarchy Generation
  • A concept hierarchy organizes concepts (i.e., attribute values) hierarchically and is usually associated with each dimension in a data warehouse.
  • Concept hierarchies facilitate drilling down and rolling up in data warehouses to view data at multiple granularities.
  • Concept hierarchy formation: recursively reduce the data by collecting low-level concepts (such as numeric values for age) and replacing them with higher-level concepts (such as youth, adult, or senior).
  • Concept hierarchies can be explicitly specified by domain experts and/or data warehouse designers.
  • Concept hierarchies can be formed automatically for both numeric and nominal data.
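
Replacing numeric ages with the higher-level concepts youth, adult, and senior could be sketched as below. The cut points (25 and 65) are hypothetical, since the slides do not say where one concept ends and the next begins, and the sample ages are made up.

```python
import bisect
from collections import Counter

# Hypothetical boundaries: ages below 25 are "youth", 25-64 "adult",
# and 65 or more "senior". The slides do not specify these values.
AGE_CUTS = [25, 65]
AGE_LABELS = ["youth", "adult", "senior"]


def age_concept(age):
    """Return the higher-level concept for a numeric age."""
    return AGE_LABELS[bisect.bisect_right(AGE_CUTS, age)]


ages = [19, 25, 40, 64, 65, 80]
concepts = [age_concept(a) for a in ages]

# Rolling up: counts per concept instead of per individual age.
rolled_up = Counter(concepts)
print(concepts)
print(rolled_up)
```

Six distinct ages collapse into three concepts, which is the data reduction the slide describes.
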
Data Transformation: Concept Hierarchy Generation for Nominal Data
  • Specification of a partial/total ordering of attributes explicitly at the schema level by users or experts
      - street < city < state < country
  • Specification of a hierarchy for a set of values by explicit data grouping
      - {Urbana, Champaign, Chicago} < Illinois
  • Specification of only a partial set of attributes
      - E.g., only street < city, not others
  • Automatic generation of hierarchies (or attribute levels) by the analysis of the number of distinct values
      - E.g., for a set of attributes: {street, city, state, country}
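
A hierarchy given by explicit data grouping, such as {Urbana, Champaign, Chicago} < Illinois, can be represented as a child-to-parent mapping. The sketch below assumes that values without a known parent are left unchanged; that rule, the helper name, and the extra value "Madison" are illustrative choices, not something the slides specify.

```python
# Explicit grouping from the slide: these cities roll up to Illinois.
CITY_TO_STATE = {
    "Urbana": "Illinois",
    "Champaign": "Illinois",
    "Chicago": "Illinois",
}


def roll_up(values, parent_of):
    """Replace each value by its parent concept when one is defined."""
    # Assumption: values with no known parent are passed through as-is.
    return [parent_of.get(v, v) for v in values]


# "Madison" is a hypothetical value with no grouping defined.
cities = ["Urbana", "Chicago", "Madison"]
states = roll_up(cities, CITY_TO_STATE)
print(states)
```
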
Automatic Concept Hierarchy Generation
  • Some hierarchies can be automatically generated based on the analysis of the number of distinct values per attribute in the data set.
      - The attribute with the most distinct values is placed at the lowest level of the hierarchy.
      - Exceptions, e.g., weekday, month, quarter, year
  • Figure: hierarchy levels, starting with country, then province_or_state
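
The distinct-value heuristic above can be sketched by counting distinct values per attribute and ordering the attributes from fewest (top of the hierarchy) to most (bottom). The function name and the toy address records are assumptions for illustration only.

```python
def hierarchy_by_distinct_counts(records, attributes):
    """Order attributes from top level (fewest distinct values) to
    bottom level (most distinct values).

    Returns the ordered attribute list and the per-attribute counts.
    """
    counts = {a: len({r[a] for r in records}) for a in attributes}
    ordered = sorted(attributes, key=lambda a: counts[a])
    return ordered, counts


# Hypothetical records; only the attribute names come from the slides.
records = [
    {"street": "1 Main St", "city": "Urbana", "state": "Illinois", "country": "USA"},
    {"street": "2 Oak Ave", "city": "Urbana", "state": "Illinois", "country": "USA"},
    {"street": "3 Elm St", "city": "Chicago", "state": "Illinois", "country": "USA"},
    {"street": "4 Pine Rd", "city": "Madison", "state": "Wisconsin", "country": "USA"},
    {"street": "5 King St", "city": "Toronto", "state": "Ontario", "country": "Canada"},
    {"street": "6 Queen St", "city": "Ottawa", "state": "Ontario", "country": "Canada"},
]
levels, counts = hierarchy_by_distinct_counts(
    records, ["street", "city", "state", "country"]
)
print(levels)
print(counts)
```

The heuristic fails for the exceptions the slide mentions: a data set has only 7 weekdays, 12 months, and 4 quarters but possibly many years, so ordering by distinct counts would not give the natural time hierarchy. Such cases need an explicitly specified ordering.
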
