s10-text-classifn

s10-text-classifn - Text Classification Classification...


Text Classification

Classification Learning (aka supervised learning)
• Given labelled examples of a concept (called training examples)
• Learn to predict the class label of new (unseen) examples
  – E.g., given examples of fraudulent and non-fraudulent credit card transactions, learn to predict whether or not a new transaction is fraudulent
• How does it differ from Clustering?

Many uses of Text Classification
• Text classification is the task of assigning text documents to one of multiple classes
  – Is this mail spam?
  – Is this article from comp.ai or misc.piano?
  – Is this article likely to be relevant to user X?
  – Is this page likely to lead me to pages relevant to my topic? (as in topic-specific crawling)
  – Is this book possibly of interest to the user?

Classification vs. Clustering
• Coming from Clustering, classification seems significantly simpler…
• You are already given the clusters and their names (over the training data)
• All you need to do is decide, for the test data, which cluster it should belong to.
• Seems like a simple distance computation
  – Assign test data to the cluster whose centroid it is closest to
  – Assign test data to the cluster whose members make up the majority of its neighbors

Relevance Feedback: A first case of text categorization
• Main Idea:
  – Modify existing query based on relevance judgements
    • Extract terms from relevant documents and add them to the query
    • and/or re-weight the terms already in the query
  – Two main approaches:
    • Users select relevant documents
      – Directly or indirectly (by pawing/clicking/staring etc.)
    • Automatic (pseudo-relevance feedback)
      – Assume that the top-k documents are the most relevant documents
  – Users/system select terms from an automatically generated list

Relevance Feedback
• Usually do both:
  – expand query with new terms
  – re-weight terms in query
• There are many variations
  – usually positive weights for terms from relevant docs
  – sometimes negative weights for terms from non-relevant docs
  – remove terms that appear ONLY in non-relevant documents

Relevance Feedback for Vector Model

\[ Q_{opt} = \frac{1}{|C_r|} \sum_{d_j \in C_r} d_j \;-\; \frac{1}{N - |C_r|} \sum_{d_j \notin C_r} d_j \]

C_r = set of documents that are truly relevant to Q
N = total number of documents
This is the "ideal" case, where we know the relevant documents a priori.

Rocchio Method

\[ Q_1 = \alpha\, Q_0 + \frac{\beta}{|D_r|} \sum_{d_j \in D_r} d_j \;-\; \frac{\gamma}{|D_n|} \sum_{d_j \in D_n} d_j \]

Q_0 is the initial query. Q_1 is the query after one iteration.
D_r is the set of relevant docs.
D_n is the set of irrelevant docs.
Typically α = 1, β = 0.75, γ = 0.25.
Other variations are possible, but performance is similar.
How do beta and gamma affect precision and recall?

Rocchio/Vector Illustration
[Figure: 2-D plot with axes "Retrieval" and "Information" (ticks at 0.5 and 1.0) showing the vectors D1, D2, Q0, Q′ and Q″]
Q0 = retrieval of information = (0.7, 0.3)
D1 = information science = (0.2, 0.8)
D2 = retrieval systems = (0.9, 0.1)
Q′ = ½·Q0 + ½·D1 = (0.45, 0.55)
Q″ = ½·Q0 + ½·D2 = (0.80, 0.20)

Example Rocchio Calculation ( 29 ) 04...
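
The Rocchio update described in these slides can be sketched in a few lines of Python. This is only a minimal illustration, not code from the course: the function names `centroid` and `rocchio` are hypothetical, vectors are plain lists of term weights, and an empty document set is assumed to contribute a zero vector. Calling it with α = β = ½ and γ = 0 reproduces the two query vectors from the Rocchio/Vector Illustration slide.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(docs: Sequence[Sequence[float]], dim: int) -> Vector:
    """Mean of the document vectors (assumption: zero vector if the set is empty)."""
    if not docs:
        return [0.0] * dim
    return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]


def rocchio(q0: Sequence[float],
            relevant: Sequence[Sequence[float]],
            non_relevant: Sequence[Sequence[float]],
            alpha: float = 1.0,
            beta: float = 0.75,
            gamma: float = 0.25) -> Vector:
    """One Rocchio iteration: Q1 = alpha*Q0 + beta*mean(Dr) - gamma*mean(Dn)."""
    dim = len(q0)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    return [alpha * q0[i] + beta * rel_c[i] - gamma * non_c[i] for i in range(dim)]


# Vectors from the illustration slide: (retrieval, information) weights.
q0 = [0.7, 0.3]   # "retrieval of information"
d1 = [0.2, 0.8]   # "information science"
d2 = [0.9, 0.1]   # "retrieval systems"

# Equal weight on the query and one relevant document, no negative feedback.
q_prime = rocchio(q0, [d1], [], alpha=0.5, beta=0.5, gamma=0.0)
q_double_prime = rocchio(q0, [d2], [], alpha=0.5, beta=0.5, gamma=0.0)

print([round(x, 2) for x in q_prime])         # [0.45, 0.55]
print([round(x, 2) for x in q_double_prime])  # [0.8, 0.2]
```

With the typical weights (α = 1, β = 0.75, γ = 0.25), passing `[d1]` as relevant and `[d2]` as non-relevant moves the query toward D1 and slightly away from D2, which is one way to experiment with the question of how β and γ trade off precision and recall.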

This note was uploaded on 03/11/2012 for the course CSE 494 taught by Professor Rao during the Spring '08 term at ASU.
