In another paper \cite{b3}, the authors prepared three datasets of Bengali hate speech text to conduct three different experiments: sentiment analysis, hate speech detection, and document classification. They introduced a word embedding model for the Bangla language and named it BengFastText. It is based on around 250 million articles. Using a Multichannel Convolutional Long Short-Term Memory Network, they then predicted the results of the three experiments mentioned above and compared their predictions with those of other models. The outcome of their experiments showed that BengFastText can classify the texts more accurately than other embedding methods such as Word2Vec and GloVe. Using the BengFastText method, they achieved F1-scores of around 92.30\% in document classification, and around 82.25\% and 90.45\% in sentiment analysis and hate speech detection, respectively.

A research article \cite{b4} proposed a model that combines Natural Language Processing (NLP) and Machine Learning (ML) approaches to detect abusive comments on social media in the English language. The authors collected data from a neo-Nazi website \cite{b5}; the dataset contains around 10,568 sentences, with an average length of around 20.39 words per sentence. They explored the dataset using their proposed method, named ``A killer natural language processing optimization ensemble deep learning method'' (KNLPEDNN). With this approach, the dataset is classified into three classes: hate, offensive, and neutral language. Their proposed method achieved a maximum accuracy of around 98.71\% in predicting hate speech from social media texts.

In another paper \cite{b6}, the author proposed an approach to detect social media bullying in Bangla text using different Machine Learning algorithms. The data were collected from social media platforms such as Facebook and Twitter, using a Java program developed to extract the comments. In total, 1000 public Bangla comments were collected from Facebook and 1400 comments from Twitter. The dataset was then labeled into two categories: bullied and not bullied. Finally, machine learning algorithms such as SVM, Naive Bayes, KNN (1-Nearest), and KNN (3-Nearest) were applied to predict bullying. After comparing the algorithms, the author found that the support vector machine's prediction accuracy was higher than that of the other applied algorithms, reaching around 97\% accuracy in detecting bullying.

\section{Dataset}
For our research, we collected a dataset of Bangla online comments from the Mendeley Data website [ ]. This dataset contains 44,001 Bangla comments collected from Facebook posts. The dataset is labeled into five categories: not bully, sexual, troll, threat, and religious. According to the dataset, 29,950 comments are directed at female individuals and 14,051 at male individuals. In addition, the comments are grouped into five professional categories: actor, singer, social, politician, and sports.
