Web & Social Media Analytics

Problem Statement:
1. A dataset of Shark Tank episodes is made available. It contains 495 entrepreneurs making their pitches to the VC sharks.
2. Use ONLY the “Description” column for the initial text-mining exercise.
3. Step 1:
   a. Extract the text into a text corpus and perform the following operations:
      i. Create a DTM (document-term matrix).
      ii. Use “Deal” as the dependent variable.
      iii. Build a CART model and arrive at your CART diagram.
      iv. Build a logistic regression model and find the accuracy of the model.
      v. Build a randomForest model and arrive at your varImpPlot.
4. Step 2:
   a. Now add a variable called “ratio” to your analysis. This variable is “askedfor/valuation”. (Add it as a column to your data frame from Step 1.)
   b. Rebuild “new” models: CART, randomForest and logistic regression.
5. Deliverables (in a Word document):
   a. CART tree (before and after) (5 marks)
   b. RandomForest plot (before and after) (5 marks)
   c. Confusion matrix of the logistic regression (before and after) (5 marks)
   d. (Most important) Your interpretation in plain, simple English, no longer than half a page. (15 marks)
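The Step 1 and Step 2 operations can be sketched in base R. This is a minimal sketch under stated assumptions, not the assignment's solution: the six-row `pitches` frame is hypothetical toy data, the column names `askedfor` and `valuation` are assumed from the problem statement, and the DTM is built by hand rather than with the `tm` package's `DocumentTermMatrix()` so the code runs without add-on packages. With six rows, a logistic regression on every DTM column would have more predictors than observations, so the toy model uses `ratio` alone; on the real 495-row data the DTM columns (usually after dropping sparse terms) would be bound to `deal` before fitting. CART (`rpart::rpart()`) and random forest (`randomForest::randomForest()`, then `varImpPlot()`) use the same `deal ~ .` formula interface but are not run here.

```r
# Hypothetical toy data standing in for the Shark Tank CSV (not real pitches).
pitches <- data.frame(
  description = c("Organic snacks for kids",
                  "Smart phone case with a charger",
                  "Organic dog treats",
                  "Phone app that teaches kids to read",
                  "Gluten-free organic bakery mix",
                  "Charger cable that never tangles"),
  deal      = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  askedfor  = c(100000, 250000, 50000, 500000, 75000, 150000),   # assumed column name
  valuation = c(1000000, 1000000, 250000, 2000000, 250000, 300000), # assumed column name
  stringsAsFactors = FALSE
)

# Step 1.i: document-term matrix (rows = pitches, columns = word counts).
build_dtm <- function(text) {
  clean  <- tolower(gsub("[^[:alnum:] ]", " ", text))            # punctuation -> space
  tokens <- lapply(strsplit(clean, "\\s+"), function(t) t[nzchar(t)])
  vocab  <- sort(unique(unlist(tokens)))
  dtm <- do.call(rbind, lapply(tokens, function(t)
    as.integer(table(factor(t, levels = vocab)))))               # count each vocab word
  colnames(dtm) <- vocab
  dtm
}
dtm <- build_dtm(pitches$description)

# Step 2.a: the new predictor, amount asked for relative to valuation.
pitches$ratio <- pitches$askedfor / pitches$valuation

# Logistic regression with deal as the dependent variable (ratio only, see lead-in).
fit  <- glm(deal ~ ratio, data = pitches, family = binomial)
prob <- predict(fit, type = "response")

# Confusion matrix at a 0.5 cut-off; fixed levels keep it 2 x 2.
predicted <- factor(prob > 0.5, levels = c(FALSE, TRUE))
actual    <- factor(pitches$deal, levels = c(FALSE, TRUE))
conf_mat  <- table(actual, predicted)
accuracy  <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

Computing `accuracy` once without `ratio` and once with it mirrors the before/after comparison the deliverables ask for.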

Debriefing:
# First, we loaded the data set into R and checked its structure.
# Code used:
setwd("D:/R Programming/WSMA")
mydata <- read.csv(file.choose(), header = TRUE)
str(mydata)

Output of str(mydata):
'data.frame': 495 obs. of 19 variables:
 $ deal       : logi FALSE TRUE TRUE FALSE FALSE TRUE ...
 $ description: Factor w/ 493 levels "\"Magic\" aromatherapy sprays for kids to help alleviate common fears and anxieties.",..: 202 382 183 352 307 350 69 180 98 152 ...
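The `str()` output shows `description` read as a factor (the `read.csv()` default before R 4.0.0) and `deal` as a logical. A minimal sketch of preparing those columns for modelling follows; the three-row `mydata` frame is hypothetical, and the `askedfor` and `valuation` column names are assumed from the problem statement rather than taken from the truncated `str()` listing.

```r
# Hypothetical three-row frame mimicking the str() output (not the real data).
mydata <- data.frame(
  deal        = c(FALSE, TRUE, TRUE),
  description = factor(c("Aromatherapy sprays for kids",
                         "Luggage that charges your phone",
                         "Organic dog treats")),
  askedfor    = c(100000, 250000, 50000),    # assumed column name
  valuation   = c(1000000, 500000, 250000)   # assumed column name
)

# Text mining works on character strings, not on factor level codes.
mydata$description <- as.character(mydata$description)

# A factor response makes rpart() and randomForest() fit classification
# models instead of treating TRUE/FALSE as a numeric regression target.
mydata$deal <- factor(mydata$deal)

# Step 2: ratio of the amount asked for to the company valuation.
mydata$ratio <- mydata$askedfor / mydata$valuation

str(mydata)
```

Passing `stringsAsFactors = FALSE` to `read.csv()` is an alternative to the `as.character()` step, at the cost of changing the `str()` output shown above.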
• Winter '17
• Dr. Chandra Shekhar

