1) Copy the notebook to your Google account.
2) Import the pandas library and alias it as
3) Read in the CSV dataset that is found at the following URL:
4) Print out the shape as well as the first 5 rows of the dataframe.
5) Print out the datatypes of the dataframe columns (dataset features).
6) Print out the summary statistics of the numeric values of your dataset, such as min, max, mean, and standard deviation.
6.1) Describe how you addressed the NaN values and give an explanation justifying your decision.
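Steps 2 through 6.1 could be sketched roughly as follows. This is only an illustration: the assignment's CSV URL is not reproduced here, so the sketch reads a small made-up CSV from a string, and the column names (`age`, `income`, `city`) are hypothetical. With the real data you would pass the URL string to `pd.read_csv` instead. The median/"Unknown" fill is one possible way to handle NaN values, not the required one; dropping rows with `dropna()` may be just as defensible depending on the data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the assignment's CSV (the real URL is not shown here).
csv_text = """age,income,city
25,50000,Austin
32,,Boston
,61000,Austin
41,72000,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 4: shape and first 5 rows.
print(df.shape)
print(df.head())

# Step 5: datatypes of each column.
print(df.dtypes)

# Step 6: summary statistics for the numeric columns.
print(df.describe())

# Step 6.1: count missing values per column before deciding what to do.
print(df.isna().sum())

# One possible choice (an assumption, not the required answer):
# fill numeric gaps with the column median, string gaps with a placeholder.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df["city"] = df["city"].fillna("Unknown")
```

Whichever strategy you pick, the written justification in 6.1 should say why it suits this dataset, for example how many rows would be lost by dropping versus how much a fill value might distort the statistics.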
7) Create scatter plots using Matplotlib. Can you find any interesting relationships in the data? Be sure to label your axes and give your graphs a title.
Screenshot any cool graphs that you create and share them in the Slack channel.
Don't forget to import matplotlib before trying to use it.
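A labeled scatter plot for step 7 might look something like the sketch below. The DataFrame and its `age` and `income` columns are made up for illustration; substitute two numeric columns from the actual dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; replace with columns from the assignment's dataframe.
df = pd.DataFrame(
    {
        "age": [25, 32, 38, 41, 50],
        "income": [50000, 58000, 61000, 72000, 80000],
    }
)

fig, ax = plt.subplots()
ax.scatter(df["age"], df["income"])

# Label both axes and give the graph a title, as the step asks.
ax.set_xlabel("Age (years)")
ax.set_ylabel("Income (USD)")
ax.set_title("Income vs. age")

plt.show()
```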
8) STRETCH GOAL (Extra Credit)
Machine learning algorithms don't do well with categorical values that are represented by strings. In order to have this dataset completely cleaned, we need to transform the categorical variables that are represented as strings into numeric categorical variables.
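Two common ways to approach the stretch goal are sketched below, using a made-up `city` column as the string-valued categorical feature. Neither is stated by the assignment as the expected method; they are simply two standard pandas options.

```python
import pandas as pd

# Hypothetical data with one string-valued categorical column.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Chicago"],
        "income": [50000, 58000, 61000, 72000],
    }
)

# Option A: integer codes, one number per distinct category.
df["city_code"] = df["city"].astype("category").cat.codes

# Option B: one-hot encoding, one 0/1 column per distinct category.
dummies = pd.get_dummies(df["city"], prefix="city", dtype=int)
encoded = pd.concat([df.drop(columns="city"), dummies], axis=1)

print(df)
print(encoded)
```

Integer codes keep the dataframe compact but imply an ordering between categories that may not exist, while one-hot columns avoid that at the cost of a wider table. Which trade-off is acceptable depends on the model you plan to use.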