

Assignment Name - Advance Predictive Modelling

1. How will you treat text having shortcut words (like bcz, u, thr, etc.)?

Text mining offers a few techniques for handling shortcut words. The usual approach is to create a predefined dictionary that maps each shortcut to its full spelling. Whenever a shortcut word appears in the text, it is converted to the full spelling. This step is called a synonym check, and it is a form of removing noise from the data. The example below applies the idea to English contractions:

    contractions = {
        "ain't": "am not / are not",
        "aren't": "are not / am not",
        "can't": "cannot",
        "can't've": "cannot have",
        "'cause": "because",
        "could've": "could have",
        "couldn't": "could not",
        "couldn't've": "could not have",
        "didn't": "did not",
        "what's": "what is",  # added so that the sample sentence contains a match
    }

    text = "What's the best way to ensure this?"
    for word in text.split():
        # Look up the lowercased word; punctuation attached to a word prevents a match
        if word.lower() in contractions:
            text = text.replace(word, contractions[word.lower()])
    print(text)

Here, the for loop iterates over the words returned by split(), a built-in string method that splits the sentence on whitespace. The if statement checks whether the lowercased word is a key in the dictionary named contractions and, if it is, replaces the word with the corresponding value. This is how shortcut words are replaced in text mining.

Another method is to import the 're' library and replace shortcut words with their full spellings using regular expressions.
    import re

    def decontracted(phrase):
        # specific
        phrase = re.sub(r"won\'t", "will not", phrase)
        phrase = re.sub(r"can\'t", "can not", phrase)

        # general
        phrase = re.sub(r"n\'t", " not", phrase)
        phrase = re.sub(r"\'re", " are", phrase)
        phrase = re.sub(r"\'s", " is", phrase)
        phrase = re.sub(r"\'d", " would", phrase)
        phrase = re.sub(r"\'ll", " will", phrase)
        phrase = re.sub(r"\'t", " not", phrase)
        phrase = re.sub(r"\'ve", " have", phrase)
        phrase = re.sub(r"\'m", " am", phrase)
        return phrase

    test = "Hey I'm Yann, how're you and how's it going ? That's interesting: I'd love to hear more about it."
    print(decontracted(test))

2. Write R and Python code to replace “bcz” with “because” in the whole text.

PYTHON CODE:

We need to create a dictionary to replace “bcz” with “because”. The Python code follows. First we import the libraries (the replacement itself relies only on the built-in str.replace(), so these imports are optional):

    import numpy as np
    import pandas as pd
    from pandas import Series
    from pandas import DataFrame

    dic = {'bcz': 'because'}  # dictionary created
    text = 'bcz following are bcz graphical (non-control) characters defined bcz'  # text is a variable

    def replace_all(text, dic):  # definition of function
        for i, j in dic.items():  # .items() gives both the key and the value of each pair
            # The syntax of replace() is: str.replace(old, new[, count])
            text = text.replace(i, j)  # i is the key and j is the value
        return text

    a = replace_all(text, dic)
    print(a)
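One caveat with str.replace() is that it substitutes substrings, so a very short key such as "u" would also change letters inside longer words. The sketch below shows one possible way around this, using the re module with word boundaries so that only whole words are replaced. The names SHORTCUTS and expand_shortcuts are hypothetical, and the expansions for "u" and "thr" are assumptions made for illustration; only "bcz" to "because" comes from the assignment.

```python
import re

# Hypothetical shortcut dictionary. Only "bcz" -> "because" is taken from the
# assignment; the other two expansions are assumptions for illustration.
# Keys must be lowercase because matches are lowercased before the lookup.
SHORTCUTS = {"bcz": "because", "u": "you", "thr": "there"}

def expand_shortcuts(text, mapping):
    """Replace whole-word shortcuts in text with their full spellings."""
    # Build one alternation of all keys, wrapped in \b word boundaries so that
    # a key only matches when it stands alone as a word.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(key) for key in mapping) + r")\b",
        flags=re.IGNORECASE,
    )
    # Look up each matched shortcut (lowercased) in the mapping.
    return pattern.sub(lambda m: mapping[m.group(0).lower()], text)

sample = "I left bcz u said the bus was late"
print(expand_shortcuts(sample, SHORTCUTS))
```

In this sample the word "bus" is left untouched, whereas a plain sample.replace("u", "you") would turn it into "byous". The replacement text is always taken from the dictionary as written, so the original capitalisation of a matched shortcut is not preserved.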