Course ID: ISMG 6470
Course Name: Text Analytics
Student ID: 108731296
Student Name: Mrinal Bhat
Submission Date: 10/09/18

1.
library('tm')
library("SnowballC")
my.text.location <- "C:/Users/mrina/OneDrive/Documents/Text/Assignments/A4/EnglishAbstract/"
# Read every file in the folder into a corpus
apapers <- VCorpus(DirSource(my.text.location))
class(apapers)
# Clean the text: drop numbers and punctuation, then stem each word
mpapers <- tm_map(apapers, removeNumbers)
mpapers <- tm_map(mpapers, removePunctuation)
mpapers <- tm_map(mpapers, stemDocument, language = "en")

These basic ‘tm’ functions pre-process the documents and clean them for better mining.

2.
# Document Term Matrix with raw term frequencies
ptm.tf <- DocumentTermMatrix(mpapers)
n <- 1
# Most frequent term in each document
top <- findMostFreqTerms(ptm.tf, n = n)
topunlist <- unlist(top)
# Largest of the per-document maxima
sort(topunlist, decreasing = TRUE)[1]

This code builds the Document Term Matrix. With n = 1, ‘findMostFreqTerms’ returns the most frequent term in each document; unlisting and sorting those values in decreasing order then gives the single highest term count across all documents. In this case it is ‘Austin.Eric_45_3’ with the word ‘the’ and value ‘79’.

3.
# Document Term Matrix with unnormalized TF-IDF weights
ptm.tfidf <- DocumentTermMatrix(mpapers, control = list(weighting = function(x) {weightTfIdf(x, normalize = FALSE)}))
y <- 1
# Highest-weighted term in each document
top1 <- findMostFreqTerms(ptm.tfidf, n = y)
top1unlist <- unlist(top1)
# Largest of the per-document maxima
sort(top1unlist, decreasing = TRUE)[1]
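
To illustrate why the top term can change between the raw counts in part 2 and the TF-IDF weights in part 3, here is a minimal base-R sketch. It does not use the assignment corpus: the count matrix and its term names are made up for illustration, and the formula (raw term frequency multiplied by log2 of the number of documents over the document frequency) is my understanding of what weightTfIdf computes when normalize = FALSE.

```r
# Toy document-term counts (hypothetical data, not the assignment corpus).
tf <- matrix(c(5, 2, 0,
               4, 0, 3,
               6, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("doc1", "doc2", "doc3"),
                             c("the", "model", "text")))

n_docs   <- nrow(tf)
doc_freq <- colSums(tf > 0)         # number of documents containing each term
idf      <- log2(n_docs / doc_freq) # inverse document frequency, base 2
tfidf    <- sweep(tf, 2, idf, `*`)  # scale each term column by its idf

# Top term per document under each weighting
top_tf    <- colnames(tf)[apply(tf, 1, which.max)]
top_tfidf <- colnames(tfidf)[apply(tfidf, 1, which.max)]

print(top_tf)     # "the" wins every document on raw counts
print(top_tfidf)  # "the" drops out once its idf is zero
```

In this toy matrix ‘the’ occurs in all three documents, so its idf is log2(3/3) = 0 and its TF-IDF weight is zero everywhere. A word like ‘the’ that tops the raw counts in part 2 would be expected to drop out in part 3 for the same reason, provided it appears in every abstract.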
