Tutorial02 - SEEM4630 2016-2017 Tutorial 1 Classification...

SEEM4630 2016-2017 Tutorial 1: Classification
Yingfan Liu, [email protected]

Classification: Definition
Given a collection of records (the training set), each record contains a set of attributes, one of which is the class.
Find a model for the class attribute as a function of the values of the other attributes. Examples: decision tree, Naïve Bayes, k-NN.
Goal: previously unseen records should be assigned a class as accurately as possible.

Decision Tree
Goal: construct a tree so that instances belonging to different classes are separated.
Basic algorithm (a greedy algorithm):
    The tree is constructed in a top-down recursive manner.
    At the start, all the training examples are at the root.
    Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
    Examples are partitioned recursively based on the selected attributes.
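
The greedy, top-down procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tutorial's own code: it assumes categorical attributes stored in dicts, uses information gain as the selection heuristic, and stops when a node is pure or no attributes remain (both stopping rules are assumptions, as are the names `build_tree`, `predict`, and the toy data).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def partition(rows, labels, attr):
    """Group (rows, labels) by the value each row takes on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups

def information_gain(rows, labels, attr):
    """Entropy of the node minus the weighted entropy of its partitions."""
    groups = partition(rows, labels, attr)
    remainder = sum(len(sl) / len(labels) * entropy(sl) for _, sl in groups.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Top-down recursive construction; leaves are class labels (strings)."""
    if len(set(labels)) == 1:            # assumed stopping rule: pure node
        return labels[0]
    if not attrs:                        # assumed stopping rule: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    children = {
        value: build_tree(sub_rows, sub_labels, remaining)
        for value, (sub_rows, sub_labels) in partition(rows, labels, best).items()
    }
    return {"attr": best, "children": children}

def predict(tree, row):
    """Follow branches until a leaf; raises KeyError on an unseen attribute value."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attr"]]]
    return tree

# Hypothetical toy training set: the class depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
]
labels = ["no", "no", "yes", "yes"]
tree = build_tree(rows, labels, ["outlook", "windy"])
```

On this toy data the root test should be `outlook`, since splitting on it separates the two classes completely while `windy` separates nothing.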

Attribute Selection Measure 1: Information Gain
Let $p_i$ be the probability that a tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}| / |D|$.
Expected information (entropy) needed to classify a tuple in $D$:
    $Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
Information needed (after using $A$ to split $D$ into $v$ partitions) to classify $D$:
    $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$
Information gained by branching on attribute $A$:
    $Gain(A) = Info(D) - Info_A(D)$
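
As a quick check of the three quantities in the information gain definition, the following Python sketch computes Info(D), Info_A(D), and Gain(A) directly from label counts. The function names and the four-record data set are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2

def info(labels):
    """Info(D): expected information (entropy, in bits) to classify a tuple in D."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_after_split(rows, labels, attr):
    """Info_A(D): weighted entropy of the partitions D_1..D_v induced by attr."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    total = len(labels)
    return sum(len(part) / total * info(part) for part in partitions.values())

def gain(rows, labels, attr):
    """Gain(A) = Info(D) - Info_A(D)."""
    return info(labels) - info_after_split(rows, labels, attr)

# Hypothetical data: "outlook" determines the class, "windy" is uninformative.
rows = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
]
labels = ["no", "no", "yes", "yes"]

gain_outlook = gain(rows, labels, "outlook")
gain_windy = gain(rows, labels, "windy")
```

With two classes split evenly, Info(D) is 1 bit. Splitting on `outlook` leaves pure partitions, so its gain is the full 1 bit, whereas `windy` leaves each partition as mixed as the original set, so its gain is 0.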

Attribute Selection Measure 2: Gain Ratio
The information gain measure is biased towards attributes with a large number of values.
C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):
    $SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right)$
    $GainRatio(A) = Gain(A) / SplitInfo_A(D)$
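
To see how the normalization penalizes many-valued attributes, the sketch below compares a record-ID-like attribute with a two-valued attribute. It is a minimal illustration under stated assumptions: attributes are passed as plain lists of values, the names are hypothetical, and returning 0.0 when SplitInfo is zero is a convention chosen here, not something the slides specify.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Entropy (in bits) of the empirical distribution of items."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())

def gain(values, labels):
    """Gain(A): entropy of labels minus weighted entropy after splitting on values."""
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    total = len(labels)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def split_info(values):
    """SplitInfo_A(D): entropy of the partition sizes themselves."""
    return entropy(values)

def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D); 0.0 if the attribute has one value."""
    si = split_info(values)
    if si == 0:
        return 0.0  # assumed convention: a single-valued attribute cannot split D
    return gain(values, labels) / si

# Hypothetical data: both attributes separate the classes perfectly.
labels = ["no", "no", "yes", "yes"]
record_id = ["r1", "r2", "r3", "r4"]          # one distinct value per record
outlook = ["sunny", "sunny", "rain", "rain"]  # two values

ratio_id = gain_ratio(record_id, labels)
ratio_outlook = gain_ratio(outlook, labels)
```

Both attributes have an information gain of 1 bit here, but `record_id` has SplitInfo of 2 bits against 1 bit for `outlook`, so its gain ratio is half as large and the two-valued attribute is preferred.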

If a data set $D$ contains examples from $n$ classes, gini index, $gini(D)$

Term
Summer
Professor
Cheng
Tags
Probability, Probability theory, English language films, Bayesian probability, Naive Bayes classifier, Statistical classification
