Assignment Name - Advance Predictive Modeling

1. How will you treat text having short cut words (like bcz, u, thr etc…)?

Answer:
After a text is obtained, we start with text normalization. Text normalization includes:
• converting all letters to lower or upper case
• converting numbers into words or removing numbers
• removing punctuation, accent marks and other diacritics
• removing extra white space
• expanding abbreviations
• removing stop words, sparse terms, and particular words
• text canonicalization

Short cut words can be treated in 2 ways:
➢ Expand the short cut words: Stemming reduces words to their root form, but on its own it cannot recover the full word from a short cut (it will not turn "bcz" into "because"). A mapping from each short cut word to its expanded form needs to be defined for these words, and it can then be applied as a normalization step to expand them.
➢ Remove the short cut words from text: Tokenize the text in Python, or match the words with the "re" regex library, and drop the short cut tokens. The stop words list can also be updated to include these words so that they are removed along with the other stop words.
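The two options above can be sketched in a few lines of Python using only the standard "re" library. This is a minimal illustration, not a complete normalizer: the SHORTCUTS table, its expansions (for example, reading "thr" as "there"), and the function names are assumptions made for this example and would need to be built from the actual data.

```python
import re

# Hypothetical lookup table: the short cut words come from the question,
# but the expansions chosen here are assumptions for illustration.
SHORTCUTS = {
    "bcz": "because",
    "u": "you",
    "thr": "there",
}

# Very simple tokenizer pattern: runs of lowercase letters, digits, apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def expand_shortcuts(text, mapping=SHORTCUTS):
    """Option 1: replace each short cut token with its expanded form."""
    return " ".join(mapping.get(tok, tok) for tok in tokenize(text))


def remove_shortcuts(text, shortcuts=SHORTCUTS):
    """Option 2: drop short cut tokens, as with an extended stop word list."""
    return " ".join(tok for tok in tokenize(text) if tok not in shortcuts)


sample = "I left bcz u were not thr"
print(expand_shortcuts(sample))  # i left because you were not there
print(remove_shortcuts(sample))  # i left were not
```

Expanding keeps the meaning of the sentence, while removing loses it, so removal is usually only reasonable when the short cut words carry little information for the model.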