The literature states that function words have methodological advantages in the study of authorial style (Binongo 2003). In one of his papers, Kestemont (2014) lists the following properties of function words:

1. All authors writing in the same language and over the same period are bound to use the very same function words, so function words are a reliable basis for textual comparison.
2. Their high frequency makes them attractive from a quantitative point of view, because they yield many observations.
3. Their use is not strongly affected by a text’s topic or genre: the use of the article ‘the’, for instance, is unlikely to be influenced by a text’s topic.
4. Their use seems to be less under an author’s conscious control during the writing process.

Any similarities between texts with respect to function words are therefore relatively content-independent and can be associated with authorship far more easily than topic-specific stylistic features (Kestemont 2014).

Part-of-speech (POS) tags have commonly been used for text categorization in regular text and blogs (Argamon et al. 2009; Rao et al. 2010). POS n-grams are reported to be a relatively efficient way to capture syntactic information, which is useful for distinguishing writing styles (Baayen et al. 1996). The literature also states that the parts of speech used in a text are mostly independent of the topic under discussion (Koppel 2002; Argamon et al. 2007). Consequently, function words and POS n-grams are largely unaffected by the topic of a text, which makes them better suited for classification tasks that depend on the author’s writing style. We classify tweets using these two feature types and various combinations of them, and achieve higher accuracy and F-measures with them than with other common classification features such as words and character n-grams.
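To make the two feature types concrete, here is a minimal Python sketch of turning a tweet into function-word frequencies and POS n-gram counts, and merging them into one combined feature set. It assumes a small hypothetical function-word list (`FUNCTION_WORDS`), a simple regex tokenizer, and POS tags supplied by some external tagger; this section does not name the authors' word list, tokenizer, or tagger, so none of these should be read as their actual setup.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only;
# the paper's actual list is not given in this section.
FUNCTION_WORDS = ["the", "a", "an", "and", "but", "of", "in", "on",
                  "to", "i", "you", "it", "is", "not"]


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_features(text):
    """Relative frequency of each function word among the text's tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty text
    return {f"fw={w}": counts[w] / total for w in FUNCTION_WORDS}


def pos_ngram_features(tags, n=2):
    """Counts of POS-tag n-grams from an already-tagged token sequence."""
    grams = zip(*(tags[i:] for i in range(n)))  # sliding windows of length n
    return {"pos=" + "_".join(g): c for g, c in Counter(grams).items()}


def combined_features(text, tags, n=2):
    """Merge both feature types into one sparse feature dict."""
    features = function_word_features(text)
    features.update(pos_ngram_features(tags, n))
    return features


if __name__ == "__main__":
    tweet = "I love the rain and the smell of it"
    # Hand-written Penn Treebank-style tags standing in for a real tagger's output.
    tags = ["PRP", "VBP", "DT", "NN", "CC", "DT", "NN", "IN", "PRP"]
    feats = combined_features(tweet, tags)
    print(feats["fw=the"])     # "the" is 2 of the 9 tokens
    print(feats["pos=DT_NN"])  # 2
```

Keeping both feature types in one sparse dict lets them be used separately or concatenated, mirroring the feature combinations described above; a list of such dicts can be passed to a vectorizer that accepts feature mappings, such as scikit-learn's `DictVectorizer`.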
(Järvelin et al. 2007). We use a much smaller dataset (3000 tweets) than is typical in Twitter-based classification, and obtain higher accuracy than commercially available software (e.g., Gender Guesser, Gender Genie). The results also show accuracy and F-measure comparable to earlier research on the subject.

Gender classification of microblog text based on authorial style
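The section reports accuracy and F-measure without defining them. A minimal sketch under the standard definitions: accuracy is the share of tweets whose predicted gender matches the label, and F-measure is taken here as F1, the harmonic mean of precision and recall for one class, with a macro-average over the two classes as one possible summary. The label strings `"female"`/`"male"` and the macro-averaging choice are assumptions, not necessarily how the paper aggregated its scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def macro_f1(y_true, y_pred, labels=("female", "male")):
    """Unweighted mean of per-class F1 (assumed aggregation, for illustration)."""
    return sum(class_scores(y_true, y_pred, c)[2] for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for five tweets.
    y_true = ["female", "female", "male", "male", "female"]
    y_pred = ["female", "male", "male", "female", "female"]
    print(accuracy(y_true, y_pred))  # 0.6
    print(class_scores(y_true, y_pred, "female"))
    print(macro_f1(y_true, y_pred))
```

Reporting per-class F1 alongside accuracy matters when the two genders are unevenly represented, since accuracy alone can look good for a classifier that favors the majority class.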
3 Methodology

A Twitter user profile can provide the user’s screen name, full name, location, URL and personal description. Only the screen name is mandatory; revealing the rest of this information is at the user’s discretion. Notably, gender is not a required field for a Twitter account. For a small dataset like ours, visiting each individual profile to extract gender information for the training set is feasible. Although labor-intensive, this method has been used extensively in the literature (Rao et al. 2010; Miller et al. 2012). For large datasets, however, our method may not be suitable, and other methods can be used instead, such as automatically associating blogger profile information with the corresponding Twitter account (Burger et al.