s10-text-classifn - Text Classification
Text Classification
Classification Learning (aka supervised learning)
- Given labelled examples of a concept (called training examples), learn to predict the class label of new (unseen) examples.
- E.g., given examples of fraudulent and non-fraudulent credit card transactions, learn to predict whether or not a new transaction is fraudulent.
- How does it differ from clustering?
Many uses of Text Classification
- Text classification is the task of assigning text documents to one of multiple classes.
- Is this mail spam?
- Is this article from comp.ai or misc.piano?
- Is this article likely to be relevant to user X?
- Is this page likely to lead me to pages relevant to my topic? (as in topic-specific crawling)
- Is this book possibly of interest to the user?
Classification vs. Clustering
- Coming from clustering, classification seems significantly simpler: you are already given the clusters and their names (over the training data).
- All you need to do is decide, for each test item, which cluster it should belong to.
- This looks like a simple distance computation:
  - Assign the test item to the cluster whose centroid it is closest to, or
  - Assign the test item to the cluster whose members make up the majority of its neighbors.
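The two "simple distance" views above can be sketched in a few lines of plain Python. This is a toy illustration, not a reference implementation: the vectors and the "spam"/"ham" labels are made-up assumptions, and real text classifiers would work over high-dimensional term vectors.

```python
# Two simple ways to assign a test vector to a class, as described above:
# (1) nearest class centroid, (2) majority vote among k nearest neighbors.
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(train, test_vec):
    """train: {label: [vectors]}. Return the label of the closest class centroid."""
    best_label, best_dist = None, float("inf")
    for label, vecs in train.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        d = euclidean(centroid, test_vec)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def knn(train, test_vec, k=3):
    """Majority vote among the k nearest labelled training vectors."""
    pairs = [(euclidean(v, test_vec), label)
             for label, vecs in train.items() for v in vecs]
    pairs.sort(key=lambda p: p[0])
    votes = Counter(label for _, label in pairs[:k])
    return votes.most_common(1)[0][0]

train = {"spam": [(0.9, 0.1), (0.8, 0.2)],
         "ham":  [(0.1, 0.9), (0.2, 0.8)]}
print(nearest_centroid(train, (0.85, 0.15)))  # spam
print(knn(train, (0.15, 0.85)))               # ham
```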
Relevance Feedback: a first case of text categorization
- Main idea: modify the existing query based on relevance judgements.
- Extract terms from relevant documents and add them to the query, and/or re-weight the terms already in the query.
- Two main approaches:
  - Users select relevant documents, directly or indirectly (by pawing/clicking/staring, etc.).
  - Automatic (pseudo-relevance feedback): assume that the top-k retrieved documents are the most relevant.
- Users/system select terms from an automatically generated list.
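Pseudo-relevance feedback, as described above, can be sketched very simply: treat the top-k ranked documents as relevant, count their terms, and append the most frequent new terms to the query. The documents, query, and parameter values below are illustrative assumptions; a real system would use tf-idf weights rather than raw counts and would filter stopwords.

```python
# Minimal pseudo-relevance-feedback sketch: assume the top-k retrieved
# documents are relevant, and expand the query with their most frequent
# terms that are not already in the query.
from collections import Counter

def pseudo_relevance_expand(query_terms, ranked_docs, k=2, n_new=2):
    counts = Counter()
    for doc in ranked_docs[:k]:          # assume top-k docs are relevant
        counts.update(doc.lower().split())
    new = [t for t, _ in counts.most_common()
           if t not in query_terms][:n_new]
    return query_terms + new

query = ["information", "retrieval"]
ranked = ["information retrieval systems index documents",
          "retrieval models rank documents by relevance",
          "cooking recipes for pasta"]
print(pseudo_relevance_expand(query, ranked))
```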
Relevance Feedback
- Usually do both: expand the query with new terms, and re-weight the terms already in the query.
- There are many variations:
  - Usually positive weights for terms from relevant docs.
  - Sometimes negative weights for terms from non-relevant docs.
  - Remove terms that appear ONLY in non-relevant documents.
Relevance Feedback for the Vector Model
In the "ideal" case where we know the relevant documents a priori:

Q_opt = (1/|Cr|) Σ_{dj ∈ Cr} dj − (1/(N − |Cr|)) Σ_{dj ∉ Cr} dj

- Cr = set of documents that are truly relevant to Q
- N = total number of documents
Rocchio Method

Q1 = α·Q0 + (β/|Dr|) Σ_{dj ∈ Dr} dj − (γ/|Dn|) Σ_{dj ∈ Dn} dj

- Q0 is the initial query; Q1 is the query after one iteration.
- Dr is the set of relevant docs; Dn is the set of irrelevant docs.
- Typically α = 1, β = 0.75, γ = 0.25.
- Other variations are possible, but performance is similar.
- How do β and γ affect precision and recall?
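The Rocchio update above is a one-liner per vector component. A minimal sketch, using the typical weights from the slide; the example vectors are made up, and clipping negative components at zero is a common convention that is an assumption here, not something the slide states.

```python
# Rocchio update: Q1 = alpha*Q0 + (beta/|Dr|)*sum(Dr) - (gamma/|Dn|)*sum(Dn)
def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    dim = len(q0)

    def mean(docs):
        if not docs:
            return [0.0] * dim
        return [sum(d[i] for d in docs) / len(docs) for i in range(dim)]

    r, n = mean(relevant), mean(nonrelevant)
    # Clip negative components at zero (a common convention, assumed here).
    return [max(0.0, alpha * q0[i] + beta * r[i] - gamma * n[i])
            for i in range(dim)]

q1 = rocchio([0.7, 0.3], relevant=[[0.2, 0.8]], nonrelevant=[[0.9, 0.1]])
print(q1)  # [0.625, 0.875]
```

Raising β pulls the query toward relevant documents (helping recall), while raising γ pushes it away from non-relevant ones (typically sharpening precision), which is one way to think about the precision/recall question the slide poses.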
Rocchio/Vector Illustration
[Figure: Q0, Q', Q" and documents D1, D2 plotted on "retrieval" vs. "information" axes from 0 to 1.0]
- Q0 = "retrieval of information" = (0.7, 0.3)
- D1 = "information science" = (0.2, 0.8)
- D2 = "retrieval systems" = (0.9, 0.1)
- Q' = ½·Q0 + ½·D1 = (0.45, 0.55)
- Q" = ½·Q0 + ½·D2 = (0.80, 0.20)
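The slide's arithmetic is just an equal-weight blend of the query with one feedback document (a Rocchio-style update with α = β = ½ and γ = 0), which a few lines verify:

```python
# Reproduce the illustration: Q' = 1/2*Q0 + 1/2*D1, Q" = 1/2*Q0 + 1/2*D2.
def blend(q, d, w=0.5):
    return tuple(w * qi + (1 - w) * di for qi, di in zip(q, d))

Q0 = (0.7, 0.3)   # "retrieval of information"
D1 = (0.2, 0.8)   # "information science"
D2 = (0.9, 0.1)   # "retrieval systems"

print(blend(Q0, D1))  # slide says (0.45, 0.55): pulled toward D1
print(blend(Q0, D2))  # slide says (0.80, 0.20): pulled toward D2
```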
Example Rocchio Calculation
[Worked numeric application of the Rocchio update; the extracted slide content is too garbled to reconstruct.]