Assignment Name - Analytics basics 1. Write R code using data “IMDB_data” #remove all the objects stored rm(list=ls()) #set current working directory setwd(“D:/edwisor/PREDICTIVE ANALYSIS USING R AND PYTHON/BASIC ANALYTICS/Assignment 3”) #set current working directory getwd() 1. #Loading CSV in R by skipping second row IMDB_data=read.csv("IMDB_data.csv", header=TRUE)[-2,] 2. #Extract the unique genres and its count and store in data frame with index key df =length(unique(IMDB_data$Genre)) -249 3. #Converting the required data types IMDB_data=as.numeric(IMDB_data) 4. #Sort the genre by its name IMDB_data=IMDB_data[order(IMDB_data$Genre),] 5. #Create new variable whose values should be square of difference between x = x = x = (as.numeric (IMDB_data$imdbRating))^ 2; y =(as.numeric (IMDB_data$imdbVotes))^ 2; IMDB_data$Square_Diff = with(IMDB_data, (x-y)) 2 . Write Python code using data “IMDB_data” to Import os Os.chdir()
Os.getcwd() Import os Import pandas as pd Import numpy as np Import matplotlib as mlt Imdb_data_csv=pd.read_csv(“imdb_data.csv”,sep=”,”) Df_excel=pd.read_excel(“df_IMDB_data.xlsx”) Df_excel[‘genre’].unique() Df_excel[‘genre’].nunique() Df.sort_values([“genre”]) Degree_counts=df[“genre”].value_counts() print("datatype of imdbrating:",type(imdbrating)) print("datatype of imdbvotes:",type(imdbvotes)) import math rat=math.pow(imdbrating, 2) vot=math.pow(imdbvotes, 2) sqr= rat-vot 2. A chemist wants to find some interesting patterns in which patients are behaving upon administering the drug” The chemist type problem statement falls in Unsupervised Learning problem category. In the above problem the chemist will not focus on predetermined attributes, nor does it predict a target value. Rather, chemist wants to find hidden structure and relation among data. Unsupervised learning refers to techniques that find patterns in unlabeled data, or data that lacks a defined response measure. In Unsupervised learnings, we have less information about objects. So here we try to find some similarities between groups of objects and include them in an appropriate cluster which is called as segmentation clustering. Some object can differ hugely from all clusters, in this way we can say that these objects are anomalies. 4. How will you select suitable machine learning algorithm for a problem statement
For solving a problem statement we require following specifications.
- Winter '17
- Machine Learning, Artificial neural network, Statistical classification, Random Forest