Gender classification of microblog text based on authorial style

content words and character n-grams. Further, we applied two feature selection methods, term frequency and information gain, to reduce the extracted features to only the most relevant ones. The selected features were then classified using the Naïve Bayes and maximum entropy algorithms, and the classifiers were compared on accuracy and F-measure. Across the datasets, part-of-speech n-grams gave better classification accuracy and F-measure than the other features, such as words and character n-grams. The best results show that part-of-speech n-grams are the best features for classifying the gender of microblog authors. The Naïve Bayes and maximum entropy classifiers have similar precision, recall and accuracy with this feature. This establishes that features based on authorial style can distinguish between the genders by their writing behavior on microblogs such as Twitter and Facebook, and that they are more universal in nature than the other features.

In future, the research could be extended by considering other feature selection techniques, such as IDF and TF-IDF, and measuring their effect on the overall result. Other classification techniques that allow a non-linear decision boundary, such as SVMs, neural networks or Bayesian networks, could also be applied. The research could also be extended to related problems that have traditionally been difficult to classify, such as the detection of sarcasm in unstructured text.
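As a rough illustration of two steps summarized in the conclusion, character n-gram extraction and information gain based feature selection, the following Python sketch scores each n-gram by how much it reduces uncertainty about the class label. It is not the authors' implementation: the function names, the toy strings and the labels `x` and `y` are hypothetical, and features are treated as binary (present or absent in a text), which is an assumption made here for simplicity. The same scoring could be applied to part-of-speech n-grams once a tagger has produced the tag sequences.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the set of character n-grams that occur in `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def entropy(counts):
    """Shannon entropy in bits of a label distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, feature):
    """Information gain of a binary feature (present/absent) w.r.t. labels.

    `docs` is a list of feature sets, one per text (an assumed representation).
    """
    with_f = [y for d, y in zip(docs, labels) if feature in d]
    without_f = [y for d, y in zip(docs, labels) if feature not in d]
    base = entropy(Counter(labels).values())
    # Weighted entropy of the labels after splitting on the feature.
    cond = sum(
        len(part) / len(labels) * entropy(Counter(part).values())
        for part in (with_f, without_f)
        if part
    )
    return base - cond


def select_top_k(docs, labels, k):
    """Keep the k features with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda f: (-information_gain(docs, labels, f), f))
    return ranked[:k]


# Hypothetical toy data: four short "texts" with two class labels.
texts = ["aab", "aac", "bbd", "bbe"]
labels = ["x", "x", "y", "y"]
docs = [char_ngrams(t, n=2) for t in texts]

selected = select_top_k(docs, labels, k=2)
print(selected)
```

In this toy data the bigrams that occur in every text of one class and in no text of the other receive the maximum score, so they are the ones retained. On real microblog text the scores would be computed over a much larger vocabulary, and the cut-off `k` would need to be tuned.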