a simple average rating to give an overall summary of these reviews. In the Web 2.0 era, websites such as Tripadvisor are fully dedicated to storing hotel reviews, so anyone can now search for almost any hotel in any city and read user reviews to get an idea of the quality of the hotel. The problem has now become one of reading the most relevant reviews and trying to get an overall picture of what people who have stayed in this hotel think. Many sites have started using ranking systems based on relevance (x number of users have found this relevant), but there is no easy way to get an overall idea of what all users think. Ratings alone (usually a value from 1 to 5) are not enough to describe what people think of a hotel.

The process of reading user reviews when searching for a hotel is a rather daunting and lengthy task, since there are hundreds of reviews per hotel, and they tend to vary too much to support a clear decision. The same problem affects hotel owners, who also use these review sites to find out what is wrong with their hotel: simply reading the rating a review has given provides no constructive feedback. On many sites you now see hotels answering customers' reviews, which means they have to read every review individually and form a conclusion on what that customer liked and disliked. This is time-consuming and inefficient.

Extracting the opinions would eliminate this problem by summarising reviews in terms of the positive and negative features of a hotel as expressed in the users' opinions. This would also allow a new type of customized search where users could give priority to specific features of a hotel over others, therefore skewing the ratings of a particular hotel.

This is where my opinion mining system comes in: a merge of several Natural Language Processing, Machine Learning and Information Extraction techniques aimed at the extraction of user opinions from hotel reviews, in order to provide potential customers with more intuitive access to the sentiment expressed in hundreds of reviews.
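To make the idea of a customized search concrete, the following is a minimal sketch of how user priorities could skew a hotel's overall score once per-feature opinions are available. It is an illustration only, not the method used by the system described in this thesis: the function name `weighted_hotel_score`, the feature names, the score range and the weights are all hypothetical assumptions.

```python
from typing import Dict


def weighted_hotel_score(feature_scores: Dict[str, float],
                         user_weights: Dict[str, float],
                         default_weight: float = 1.0) -> float:
    """Return the weighted mean of per-feature opinion scores.

    feature_scores: hypothetical aggregated sentiment per feature,
        here assumed to lie between 0 (negative) and 1 (positive).
    user_weights: how much the user cares about each feature;
        features the user does not mention get default_weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for feature, score in feature_scores.items():
        weight = user_weights.get(feature, default_weight)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return weighted_sum / total_weight


# Hypothetical per-feature scores for two hotels (made-up values).
hotel_a = {"location": 0.9, "cleanliness": 0.2, "breakfast": 0.5}
hotel_b = {"location": 0.3, "cleanliness": 0.9, "breakfast": 0.6}

# A user who cares mostly about location.
location_first = {"location": 5.0}

score_a_plain = weighted_hotel_score(hotel_a, {})
score_b_plain = weighted_hotel_score(hotel_b, {})
score_a_pref = weighted_hotel_score(hotel_a, location_first)
score_b_pref = weighted_hotel_score(hotel_b, location_first)
```

With equal weights, the second hotel scores higher in this made-up data; once location is given priority, the first hotel overtakes it. A single average rating cannot express this kind of preference-dependent ranking.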
1.1 Problem Description

This thesis falls within feature-based opinion mining: the basic unit of an opinion is a feature of the domain, as opposed to the larger units used in many systems, such as sentences or documents.

The main focus of this thesis is the development of a system for processing a large database of textual hotel reviews in English to extract relevant opinions from users on a series of predefined features of potential interest to users. The aim of this system is to replace the baseline which is currently being used to provide a basic opinion mining service within an online recommendation service, and to improve the system's ability to extract user opinions, both in the accuracy of the opinions being extracted and in the number of opinions detected. Given that the proposed system is intended to be implemented within a larger framework, it is important to maintain the same type of inputs/outputs as the original system, so as to prevent major modifications to the online services.