Assignment Name - Advance Predictive Modelling

Problem Statement - Answer the following questions to the best of your knowledge, including the concepts taught to you in the level.

1. How will you treat text having short cut words (like bcz, u, thr etc.)?

After a text is obtained, we start with text normalization. Text normalization includes:
● converting all letters to lower or upper case
● converting numbers into words or removing numbers
● removing punctuation, accent marks and other diacritics
● removing extra white space
● expanding abbreviations
● removing stop words, sparse terms, and particular words
● text canonicalization

Short cut words can be treated in 2 ways:

Expand the short cut words: Stemming can bring words to their root form, but a stemmer will not recognize short cut words on its own, so a group mapping each short cut word to its full form needs to be defined for these words. Normalization techniques can then be applied to expand them.

Remove the short cut words from text: Tokenize the text in Python and filter the tokens, strip the words with the "re" regex library, or update the stop words list so that these words are removed from the text.
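The two treatments above can be sketched in a few lines of Python. This is only a minimal illustration using the standard "re" library, not a complete normalizer: the SHORTCUTS table, the token pattern, and the function names are all assumptions made for the example, and the expansion chosen for "thr" ("there") is a guess that a real project would confirm against its own corpus.

```python
import re

# Hypothetical lookup table (an assumption for this sketch): each short cut
# word maps to the full form we want in the normalized text.
SHORTCUTS = {
    "bcz": "because",
    "u": "you",
    "thr": "there",  # assumed meaning; could differ in another corpus
}

# Simple token pattern: runs of lowercase letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


def expand_shortcuts(text, mapping=SHORTCUTS):
    """Option 1: replace each short cut word with its full form."""
    return " ".join(mapping.get(token, token) for token in tokenize(text))


def remove_shortcuts(text, extra_stop_words=frozenset(SHORTCUTS)):
    """Option 2: treat short cut words as extra stop words and drop them."""
    return " ".join(
        token for token in tokenize(text) if token not in extra_stop_words
    )


if __name__ == "__main__":
    sample = "I left bcz u were not thr"
    print(expand_shortcuts(sample))
    print(remove_shortcuts(sample))
```

Expansion keeps the information carried by the short cut words, which usually matters for tasks such as sentiment analysis; removal is simpler but discards those words, so it fits best when they add little meaning.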