ASSIGNMENT – ADVANCED PREDICTIVE MODELLING

QUESTION 1- How will you treat text having shortcut words (like bcz, u, thr, etc.)?

ANSWER 1- If the text contains shortcut words such as bcz, u, thr, etc., stemming can bring them to their root form, but a stemming group (a mapping from each shortcut to its root word) has to be defined for these words first, because a standard stemmer does not recognize them. Stemming reduces tokens to the root form of words so that morphological variants are recognized as the same word. Correct morphological analysis is language-specific and can be complex.

QUESTION 2- Write R and Python code to replace "bcz" with "because" in the whole text.

ANSWER 2-

PYTHON-

    import re

    # Match "bcz" only as a whole word, in any letter case
    compiler_bcz = re.compile(r'\bbcz\b', flags=re.IGNORECASE)

    def remove_words(my_line):
        # Replace every occurrence of "bcz" with "because"
        return compiler_bcz.sub('because', my_line)

R-

    # Preprocessing: convert "bcz" to "because"
    my_text <- c("I left bcz it was late")   # example input
    because <- gsub("\\bbcz\\b", "because", my_text, ignore.case = TRUE, perl = TRUE)
    writeLines(as.character(because))
    # For a tm corpus:
    # because <- tm_map(corpus, content_transformer(function(x) gsub("\\bbcz\\b", "because", x, perl = TRUE)))

QUESTION 3- How do you deal with English text having Hindi words in between?
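Returning to the shortcut words of Questions 1 and 2: the single-word replacement can be extended to a small lookup table that covers several shortcuts at once. The following is a minimal Python sketch under stated assumptions. The `SHORTCUTS` dictionary, the function name `expand_shortcuts`, and the expansion chosen for "u" are hypothetical and not part of the original answer.

```python
import re

# Hypothetical lookup table: shortcut word -> full form (extend as needed).
SHORTCUTS = {"bcz": "because", "u": "you"}

# One pattern that matches any shortcut as a whole word, case-insensitively.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SHORTCUTS)) + r")\b",
    flags=re.IGNORECASE,
)

def expand_shortcuts(text):
    """Replace every known shortcut word in `text` with its full form."""
    return _PATTERN.sub(lambda m: SHORTCUTS[m.group(0).lower()], text)

print(expand_shortcuts("I called u bcz it was urgent"))
# prints: I called you because it was urgent
```

The word-boundary anchors (`\b`) keep the pattern from touching letters inside longer words, so the "u" in "urgent" is left alone.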
- Summer '20