cis6930fa11_NEROverTweets

Named Entity Recognition in Tweets: An Experimental Study
Alan Ritter, Sam Clark, Mausam and Oren Etzioni
Outline
About Tweets
Named entity recognition
NER Pipeline
    T-POS
    T-CHUNK
    T-CAP
    T-SEG
    T-NER
Experiments and Results
Related Work
Conclusions
What is a tweet?
Short status messages from users, with a maximum of 140 characters per message.
Example: Government confirms blast n nuclear plants n japan. ..don’t knw wht s gona happen nw. ..
What is named entity recognition?
Identifying mentions of companies, products, brands, people, locations, etc.
Example: Yess! Yess! Its official Nintendo announced today that they Will release the Nintendo 3DS in north America march 27 for $250
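One common way to represent the output of an entity recognizer is a per-token BIO encoding (B = begins an entity, I = inside an entity, O = outside). The sketch below applies it to part of the Nintendo example. The entity spans and the label names COMPANY, PRODUCT and LOCATION are hand-assigned here for illustration; they are assumptions, not output from the authors' system.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


# Part of the example tweet, split on whitespace.
tokens = ("Nintendo announced today that they Will release "
          "the Nintendo 3DS in north America").split()

# Hand-assigned spans (assumption): token indices, end exclusive.
spans = [(0, 1, "COMPANY"), (8, 10, "PRODUCT"), (11, 13, "LOCATION")]

tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Note that "north" is lowercase in the tweet, so a recognizer that relies on capitalization alone would miss part of the location.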
Why is NER so difficult?
There is a plethora of distinctive named entity types, but each occurs infrequently.
The 140-character limit means tweets often lack context.
Part of Speech Tagging
Assigning each token in the text its corresponding part-of-speech tag, such as NN, VB, or NNS.
[ PRP He ] [ VBZ reckons ] [ DT the ] [ JJ current ] [ NN account ] [ NN deficit ] [ MD will ] [ VB narrow ] [ TO to ] [ RB only ] [ # # ] [ CD 1.8 ] [ CD billion ] [ IN in ] [ NNP September ] [ . . ]
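The bracketed notation above pairs each token with its tag. As a minimal sketch, the snippet below reads a fragment of that notation into (token, tag) pairs using only Python's standard `re` module. The function name `parse_tagged` and the regular expression are illustrative choices, not part of the slides.

```python
import re

# Matches one "[ TAG token ]" group; spacing before the closing bracket is optional.
PAIR = re.compile(r"\[\s*(\S+)\s+(\S+)\s*\]")


def parse_tagged(text):
    """Return a list of (token, tag) pairs from bracketed "[ TAG token ]" text."""
    return [(token, tag) for tag, token in PAIR.findall(text)]


SAMPLE = "[ DT the] [ JJ current] [ NN account] [ NN deficit ] [ MD will] [ VB narrow ]"

pairs = parse_tagged(SAMPLE)
nouns = [token for token, tag in pairs if tag.startswith("NN")]
print(pairs)
print(nouns)
```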
T-POS
Manually annotated 800 tweets (~16K tokens).
Tags used: the Penn Treebank tag set.
New tags: retweets, @usernames, #hashtags, and URLs.
Clustering to deal with out-of-vocabulary (OOV) words: hierarchical clustering using Jcluster over 52 million tweets.
Conditional Random Fields (CRF) for sequential learning.
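The four Twitter-specific categories listed above have very regular surface forms, so they can be recognized with simple patterns. The sketch below is one way to do that. The tag names RT, USR, HT and URL and the regular expressions are assumptions made for illustration; the slides state only that new tags were added for these four categories, not how they are assigned.

```python
import re

# (pattern, tag) rules, checked in order. Tag names are assumed, not from the slides.
RULES = [
    (re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE), "URL"),
    (re.compile(r"^@\w+$"), "USR"),
    (re.compile(r"^#\w+$"), "HT"),
    (re.compile(r"^RT$"), "RT"),
]


def twitter_tag(token):
    """Return a Twitter-specific tag for the token, or None if no rule applies."""
    for pattern, tag in RULES:
        if pattern.match(token):
            return tag
    return None


def pretag(tweet):
    """Whitespace-tokenize a tweet and attach a Twitter-specific tag where one applies."""
    return [(token, twitter_tag(token)) for token in tweet.split()]


tagged = pretag("RT @user check http://t.co/x #nlp now")
print(tagged)
```

Tokens that come back with `None` are the ones a statistical tagger, such as the CRF mentioned above, would still need to label with Penn Treebank tags.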
Clustering example ‘2m’, ‘2ma’, ‘2mar’, ‘2mara’, ‘2maro’, ‘2marrow’, ‘2mor’, ‘2mora’,
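A hierarchical clustering assigns each word a path in a binary tree, and spelling variants like those listed above tend to land on nearby paths. A tagger can then use prefixes of that path as features, so an unseen spelling shares features with spellings seen in training. The sketch below illustrates the idea. The bit strings in `CLUSTER_PATHS`, the prefix lengths, and the feature names are all hypothetical; they are not taken from the slides or from Jcluster's actual output.

```python
# Hypothetical cluster paths (bit strings) for a few of the variants above.
CLUSTER_PATHS = {
    "2m": "110100101101",
    "2marrow": "110100101101",
    "2mor": "110100101100",
}


def cluster_features(word, prefix_lengths=(4, 8, 12)):
    """Return path-prefix features for a word, or an empty list if it is unclustered."""
    path = CLUSTER_PATHS.get(word.lower())
    if path is None:
        return []
    return [f"cluster{n}={path[:n]}" for n in prefix_lengths if len(path) >= n]


features_2m = cluster_features("2m")
features_2mor = cluster_features("2mor")
shared = [f for f in features_2m if f in features_2mor]
print(features_2m)
print(shared)
```

In this toy table "2m" and "2mor" differ only in the last bit, so they share the 4-bit and 8-bit prefix features even though their full paths differ.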