Analytics basics.docx


Assignment Name - Analytics basics

1. Write R code using data “IMDB_data”

# Remove all stored objects
rm(list = ls())
# Set the current working directory
setwd("D:/edwisor/PREDICTIVE ANALYSIS USING R AND PYTHON/BASIC ANALYTICS/Assignment 3")
# Check the current working directory
getwd()

# 1. Load the CSV in R, skipping the second row
IMDB_data = read.csv("IMDB_data.csv", header = TRUE)[-2, ]

# 2. Extract the unique genres and their counts and store them in a data frame with an index key
df = as.data.frame(table(IMDB_data$Genre))
names(df) = c("Genre", "Count")
length(unique(IMDB_data$Genre))  # number of unique genres (the original notes 249)

# 3. Convert the required columns to numeric (as.numeric() cannot be applied to a whole data frame)
IMDB_data$imdbRating = as.numeric(as.character(IMDB_data$imdbRating))
IMDB_data$imdbVotes = as.numeric(as.character(IMDB_data$imdbVotes))

# 4. Sort the data by genre name
IMDB_data = IMDB_data[order(IMDB_data$Genre), ]

# 5. Create a new variable whose values are the square of the difference between imdbRating and imdbVotes
x = IMDB_data$imdbRating
y = IMDB_data$imdbVotes
IMDB_data$Square_Diff = (x - y)^2

2. Write Python code using data “IMDB_data”

import os
# Set the working directory (the path argument is left blank in the original)
os.chdir()
os.getcwd()
import pandas as pd
import numpy as np
import matplotlib as mlt

# Load the data from CSV and from Excel
imdb_data_csv = pd.read_csv("imdb_data.csv", sep=",")
df_excel = pd.read_excel("df_IMDB_data.xlsx")

# Unique genres and how many there are
df_excel["genre"].unique()
df_excel["genre"].nunique()

# Sort by genre and count the rows per genre
df_excel = df_excel.sort_values(["genre"])
genre_counts = df_excel["genre"].value_counts()

# Rating and vote columns (column names are assumed; the original never defines these variables)
imdbrating = df_excel["imdbrating"]
imdbvotes = df_excel["imdbvotes"]
print("datatype of imdbrating:", imdbrating.dtype)
print("datatype of imdbvotes:", imdbvotes.dtype)

# Square each column element-wise (math.pow does not accept a whole column), then take the difference
rat = imdbrating ** 2
vot = imdbvotes ** 2
sqr = rat - vot

3. “A chemist wants to find some interesting patterns in which patients are behaving upon administering the drug”

The chemist's problem statement falls into the unsupervised learning category. The chemist does not focus on predetermined attributes and does not predict a target value. Rather, the chemist wants to find hidden structure and relationships in the data. Unsupervised learning refers to techniques that find patterns in unlabeled data, that is, data that lacks a defined response measure. Because we have less information about the objects, we look for similarities among them and place similar objects in the same cluster, which is called clustering (segmentation). Objects that differ greatly from every cluster can be treated as anomalies.

4. How will you select a suitable machine learning algorithm for a problem statement?
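Before answering that question, the clustering and anomaly idea from the chemist problem above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the assignment's method: the response scores are made-up numbers, the helper names `kmeans_1d` and `nearest_distance` are hypothetical, the starting centroids are chosen by hand, and the anomaly threshold of 2.0 is arbitrary. It uses a simplified one-dimensional k-means and needs only the standard library.

```python
# Hypothetical drug-response scores for ten patients (made-up values, for illustration only).
responses = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 5.3, 9.5]


def kmeans_1d(values, centroids, iterations=20):
    """Simplified 1-D k-means: assign each value to its nearest centroid, then recompute the centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Starting centroids are an assumption; no labels or target values are used anywhere.
centroids, clusters = kmeans_1d(responses, centroids=[0.0, 6.0])


def nearest_distance(v):
    """Distance from a value to the closest cluster centre."""
    return min(abs(v - c) for c in centroids)


# Values far from every cluster centre are flagged as anomalies (threshold is arbitrary).
anomalies = [v for v in responses if nearest_distance(v) > 2.0]

print("centroids:", centroids)
print("clusters:", clusters)
print("anomalies:", anomalies)
```

The groups come only from similarities in the data, and a patient whose response is far from every group is flagged for a closer look.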
To solve a problem statement, we require the following specifications.


  • Winter '17
  • Madhu
  • Machine Learning, Artificial neural network, Statistical classification, Random Forest
