H05-2017

OPINE: Extracting Product Features and Opinions from Reviews

Ana-Maria Popescu, Bao Nguyen, Oren Etzioni
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195-2350
{amp,omicron,etzioni}@cs.washington.edu

Proceedings of HLT/EMNLP 2005 Demonstration Abstracts, pages 32-33, Vancouver, October 2005.

Abstract

Consumers often have to wade through a large number of on-line reviews in order to make an informed product choice. We introduce OPINE, an unsupervised, high-precision information extraction system which mines product reviews in order to build a model of product features and their evaluation by reviewers.

1 Introduction

The Web contains a wealth of customer reviews; as a result, the problem of review mining has seen increasing attention over the last few years from (Turney, 2003; Hu and Liu, 2004) and many others. We decompose the problem of review mining into the following subtasks: a) identify product features, b) identify opinions regarding product features, c) determine the polarity of each opinion and d) rank opinions according to their strength (e.g., "abominable" is stronger than "bad").

We introduce OPINE, an unsupervised information extraction system that embodies a solution to each of the above subtasks. The remainder of this paper is organized as follows: Section 2 describes OPINE's components together with their experimental evaluation and Section 3 describes the related work.

2 OPINE Overview

OPINE is built on top of KnowItAll, a Web-based, domain-independent information extraction system (Etzioni et al., 2005). Given a set of relations of interest, KnowItAll instantiates relation-specific generic extraction patterns into extraction rules which find candidate facts. The Assessor module then assigns a probability to each candidate using a form of Point-wise Mutual Information (PMI) between phrases that is estimated from Web search engine hit counts (Turney, 2003). It computes the PMI between each fact and discriminator phrases (e.g., "is a scanner" for the isA() relationship in the context of the Scanner class). Given fact f and discriminator d, the computed PMI score is:

$$ \mathrm{PMI}(f, d) = \frac{\mathrm{Hits}(d + f)}{\mathrm{Hits}(d)\,\mathrm{Hits}(f)} $$

The PMI scores are converted to binary features for a Naive Bayes Classifier, which outputs a probability associated with each fact.

Given product class C with instances I and reviews R, OPINE's goal is to find the set of (feature, opinions) tuples {(f, o_i, ..., o_j)} s.t. f ∈ F and o_i, ..., o_j ∈ O, where:
a) F is the set of product class features in R.
b) O is the set of opinion phrases in R.
c) opinions associated with a particular feature are ranked based on their strength.

OPINE's solution to this task is outlined in Figure 1. In the following, we describe in detail each step.

    Input: product class C, reviews R.
    Output: set of [feature, ranked opinion list] tuples
    R ← parseReviews(R);
    E ← findExplicitFeatures(R, C);
    O ← findOpinions(R, E);
    CO ← clusterOpinions(O);
    I ← findImplicitFeatures(CO, E);
    RO ← rankOpinions(CO);
    {(f, o_i, ..., o_j)} ← outputTuples(RO, I ∪ E);

Figure 1: OPINE Overview.

Explicit Feature Extraction

OPINE parses the reviews using the MINIPAR dependency parser (Lin, 1998) and applies a simple pronoun-resolution module to the parsed data. The system then finds explicitly mentioned product features (E) using an extended version of KnowItAll's extract-and-assess strategy described above. OPINE extracts the following types of product features: properties, parts, features of product parts (e.g., ScannerCoverSize), related concepts (e.g., Image is related to Scanner) and parts and properties of related concepts (e.g., ImageSize). When compared on this task with the most relevant previous review-mining system in (Hu and Liu, 2004), OPINE obtains a 22% improvement in precision with only a 3% reduction in recall on the relevant 5 datasets.
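As a rough illustration of the PMI score defined earlier in this section, the following Python sketch computes the score from three hit counts and turns it into a binary feature. The function names, the example hit counts and the threshold values are assumptions made for illustration; the text does not say how thresholds are chosen or how hit counts are queried.

```python
def pmi(hits_joint: int, hits_discriminator: int, hits_fact: int) -> float:
    """PMI(f, d) = Hits(d + f) / (Hits(d) * Hits(f)), following the formula in the text.

    Returns 0.0 when either marginal count is zero (an assumption; the text
    does not discuss this case).
    """
    if hits_discriminator == 0 or hits_fact == 0:
        return 0.0
    return hits_joint / (hits_discriminator * hits_fact)


def binarize(score: float, threshold: float) -> int:
    """Convert a PMI score into a binary feature for a downstream classifier.

    The threshold is hypothetical; in practice it would have to be estimated.
    """
    return 1 if score >= threshold else 0


if __name__ == "__main__":
    # Made-up hit counts for one candidate fact and one discriminator phrase.
    score = pmi(hits_joint=50, hits_discriminator=1000, hits_fact=200)
    print(score, binarize(score, threshold=1e-4))
```

One binary feature of this kind per discriminator phrase would then serve as input to the Naive Bayes classifier mentioned above.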
One third of this precision increase is due to OPINE's feature assessment step and the rest is due to the use of Web PMI statistics.

Opinion Phrases

OPINE extracts adjective, noun, verb and adverb phrases attached to explicit features as potential opinion phrases. OPINE then collectively assigns positive, negative or neutral semantic orientation (SO) labels to their respective head words. This problem is similar to labeling problems in computer vision and OPINE uses a well-known computer vision technique, relaxation labeling, as the basis of a 3-step SO label assignment procedure. First, OPINE identifies the average SO label for a word w in the context of the review set. Second, OPINE identifies the average SO label for each word w in the context of a feature f and of the review set ("hot" has a negative connotation in "hot room", but a positive one in "hot water"). Finally, OPINE identifies the SO label of word w in the context of feature f and sentence s. For example, some people like large scanners ("I love this large scanner") and some do not ("I hate this large scanner"). The phrases with non-neutral head words are retained as opinion phrases and their polarity is established accordingly.

On the task of opinion phrase extraction, OPINE obtains a precision of 79% and a recall of 76%; on the task of opinion phrase polarity extraction, OPINE obtains a precision of 86% and a recall of 84%.

Implicit Features

Opinion phrases refer to properties, which are sometimes implicit (e.g., "tiny phone" refers to the phone size). In order to extract such properties, OPINE first clusters opinion phrases (e.g., "tiny" and "small" will be placed in the same cluster), automatically labels the clusters with property names (e.g., Size) and uses them to build implicit features (e.g., PhoneSize).
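The cluster-then-label idea for implicit features can be illustrated with a minimal Python sketch. It is not OPINE's implementation: the pairs of phrases judged to refer to the same property are supplied by hand here, the grouping uses a plain union-find, and the cluster labels are assigned manually, whereas the system described above labels clusters automatically. All names are hypothetical.

```python
def cluster_phrases(phrases, same_property_pairs):
    """Group opinion phrases into clusters with a simple union-find.

    `same_property_pairs` lists pairs of phrases assumed to refer to the same
    property; every phrase in a pair must also appear in `phrases`.
    """
    parent = {p: p for p in phrases}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_property_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())


def implicit_feature(product_class, opinion, labeled_clusters):
    """Map an opinion phrase to an implicit feature name such as PhoneSize.

    `labeled_clusters` maps a property name to the set of phrases in its cluster.
    Returns None when the phrase belongs to no labeled cluster.
    """
    for property_name, members in labeled_clusters.items():
        if opinion in members:
            return f"{product_class}{property_name}"
    return None


if __name__ == "__main__":
    phrases = ["tiny", "small", "large", "clean", "spotless"]
    pairs = [("tiny", "small"), ("small", "large"), ("clean", "spotless")]
    print(cluster_phrases(phrases, pairs))
    # Hand-assigned labels stand in for the automatic labeling step.
    labeled = {"Size": {"tiny", "small", "large"}, "Cleanliness": {"clean", "spotless"}}
    print(implicit_feature("Phone", "tiny", labeled))
```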
Opinion phrases are clustered using a mixture of WordNet information (e.g., antonyms are placed in the same cluster) and lexical pattern information (e.g., "clean, almost spotless" suggests that "clean" and "spotless" are likely to refer to the same property). (Hu and Liu, 2004) doesn't handle implicit features, so we have evaluated the impact of implicit feature extraction on two separate sets of reviews in the Hotels and Scanners domains. Extracting implicit features (in addition to explicit features) has resulted in a 2% increase in precision and a 6% increase in recall for OPINE on the task of feature extraction.

Ranking Opinion Phrases

Given an opinion cluster, OPINE uses the final probabilities associated with the SO labels in order to derive an initial opinion phrase strength ranking (e.g., great > good > average) in the manner of (Turney, 2003). OPINE then uses Web-derived constraints on the relative strength of phrases in order to improve this ranking. Patterns such as "a1, (*) even a2" are good indicators of how strong a1 is relative to a2. OPINE bootstraps a set of such patterns and instantiates them with pairs of opinions in order to derive constraints such as strength(deafening) > strength(loud). OPINE also uses synonymy and antonymy-based constraints such as strength(clean) = strength(dirty). The constraint set induces a constraint satisfaction problem whose solution is a ranking of the respective cluster opinions (the remaining opinions maintain their default ranking). OPINE's accuracy on the opinion ranking task is 87%.

Finally, OPINE outpu...
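The re-ranking step described under Ranking Opinion Phrases can be sketched in Python as follows. The text does not say how the constraint satisfaction problem is solved, so this sketch substitutes a simple technique: a topological ordering over "stronger than" constraints, with ties broken by the initial score. Equality constraints are left out, and the scores and function name are made up for illustration.

```python
import heapq


def rank_with_constraints(scores, stronger_than):
    """Order phrases from strongest to weakest.

    `scores` maps each phrase to an initial strength score (hypothetical values).
    `stronger_than` is a list of (stronger, weaker) pairs that the final ranking
    must respect. Among phrases whose constraints are already satisfied, the one
    with the highest initial score is placed first.
    """
    successors = {p: [] for p in scores}
    pending = {p: 0 for p in scores}  # number of unplaced stronger phrases
    for strong, weak in stronger_than:
        successors[strong].append(weak)
        pending[weak] += 1

    # Max-heap on score, implemented by negating scores for heapq's min-heap.
    heap = [(-scores[p], p) for p in scores if pending[p] == 0]
    heapq.heapify(heap)

    ranking = []
    while heap:
        _, phrase = heapq.heappop(heap)
        ranking.append(phrase)
        for weak in successors[phrase]:
            pending[weak] -= 1
            if pending[weak] == 0:
                heapq.heappush(heap, (-scores[weak], weak))

    if len(ranking) != len(scores):
        raise ValueError("constraints are contradictory (they contain a cycle)")
    return ranking


if __name__ == "__main__":
    initial = {"loud": 0.8, "deafening": 0.7, "noisy": 0.6}
    print(rank_with_constraints(initial, [("deafening", "loud")]))
```

With no constraints the function returns the initial score order, which mirrors the remark above that opinions without constraints keep their default ranking.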
