STERN SCHOOL OF BUSINESS
NEW YORK UNIVERSITY
COURSE SUPPLEMENT, PART II, REGRESSION
STATISTICS AND DATA ANALYSIS
COR1-GB.1305
Professor Peter Lakner
Office: Kaufman Management Center 8-61
Phone: (212) 998-0476
Email: plakner@stern.nyu.edu
Histogram for House Price Listings
[Figure: Histogram of Listing; vertical axis: Frequency]
A histogram describes the sample data and suggests the nature of the underlying data
generating process. Note the skewness of the distribution of listings.
Pie Chart vs. Frequency Table
Pizza Pies Sold, by Type
[Figure: Pie Chart of Percent vs Type. Categories: Pepperoni, Plain, Mushroom, Sausage,
Pepper and Onion, Mushroom and Onion, Garlic, Meatball. Recoverable slice labels:
Pepperoni 21.8%; Mushroom and Onion 9.2%; Meatball and Garlic (5.0% and 2.3%).]
Data Representation:
Bar Chart vs. Pie Chart
[Figure, first panel: Chart of Number vs Type; vertical axis: Number]
[Figure, second panel: Pie Chart of Percent vs Type. Categories: Pepperoni, Plain, Mushroom,
Sausage, Pepper and Onion, Mushroom and Onion, Garlic. Recoverable slice labels:
Mushroom and Onion 9.2%; Meatball and Garlic (5.0% and 2.3%).]
House Price Listings and Per Capita Incomes. States.
Regression and Correlation. Are these two variables correlated? r = .48
How to describe/summarize them.
How to explain the variation across states.
How to determine if there is any
Ordered Qualitative Outcomes
Bond Ratings
Movie Ratings
Arithmetic Mean may not be meaningful.
(a) Ordinal measure rankings
(b) Look at that distribution!
Summary
What story does the data presentation tell?
Data in raw form tell no story.
Visual representation of data tells something about the data.
The representation of the data may reveal something about the underlying process that
the data measure.
Making a Box Plot for Per Capita Income
Maximum = 31136
3rd Quartile = 24933
Median = 22610
1st Quartile = 21677
Minimum = 17043
Interquartile Range = IQR = 24933 - 21677 = 3256
STATISTICS AND DATA ANALYSIS
EXERCISE 16, SOLUTIONS
Peter Lakner
(a) $p = .5$, $\alpha = .01$, and $E = .015$. From the normal probability table we find
$z_{.005} = 2.57$, so
$$n = \frac{2.57^2 \times .5 \times .5}{.015^2} = 7{,}338.78$$
We round this up to the next integer, that is $n = 7{,}339$.
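The arithmetic in part (a) can be checked with a short sketch. The function name and its parameter names are illustrative choices, not part of the original solution. The code assumes the usual sample-size formula for a proportion, n = z^2 p(1 - p) / E^2, with the table value 2.57 used as given.

```python
import math

def sample_size_for_proportion(p, z, margin):
    """Return the unrounded and rounded-up sample size n = z^2 p (1 - p) / E^2."""
    raw = z ** 2 * p * (1 - p) / margin ** 2
    return raw, math.ceil(raw)

# Values from part (a): p = .5, z_.005 = 2.57 (table value), E = .015
raw_n, n = sample_size_for_proportion(p=0.5, z=2.57, margin=0.015)
print(f"unrounded n = {raw_n:.2f}, rounded up n = {n}")
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed E.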
(b) Her

STATISTICS AND DATA ANALYSIS
EXERCISE 13, SOLUTION
Peter Lakner
The sample size is small here so we use the t-table. In the row corresponding to the degrees
of freedom 20 - 1 = 19 you find $t_{.025} = 2.093$, so the 95% confidence interval is
$$41.93 \pm 2.093 \, \frac{.1789}{\sqrt{20}}$$
w
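A minimal sketch of the interval computation follows. It assumes, as the layout of the formula suggests, that 41.93 is the sample mean, .1789 is the sample standard deviation, and 20 is the sample size. The function name is illustrative only, and the critical value 2.093 is taken from the t-table rather than computed.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Return (low, high) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Assumed reading of the solution: mean 41.93, s = .1789, n = 20, t_.025 = 2.093
low, high = t_confidence_interval(mean=41.93, sd=0.1789, n=20, t_crit=2.093)
print(f"95% confidence interval: ({low:.4f}, {high:.4f})")
```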

EXERCISE 12, SOLUTIONS
Statistics and Data Analysis
Peter Lakner
You would like to make a bid on the stock of an out-of-business toy company. The population of stocks consists of 2,860 sealed fiberboard cartons. Before making a bid, you
would like to perf

EXERCISE 11, SOLUTIONS
Statistics and Data Analysis
Peter Lakner
Bluefish purchased at the Lime Beach Fishing Terminal produce a filet weight which has
a mean of 4.5 pounds with a standard deviation of 0.8 pound. If you purchase five such
fish, then what

STATISTICS AND DATA ANALYSIS
EXERCISE 10, SOLUTIONS
Peter Lakner
The market share of Master Card within all credit card holders in the USA is 36%. A
random sample of 100 credit card users has been selected.
(a) What is the mean and standard deviation

STATISTICS AND DATA ANALYSIS
EXERCISE 9, SOLUTIONS
Peter Lakner
The chocolate chip cookies that are produced at Perry's Cookie Emporium have weights
which are approximately normally distributed with the mean weight 180 grams and with
standard deviation 20

STATISTICS AND DATA ANALYSIS
EXERCISE 7, SOLUTION
Peter Lakner
We want to find the smallest $w$ such that $P[X \ge w] \le .01$. This happens if $P[X \le w - 1] \ge .99$.
One can see from the table for cumulative binomial probabilities ($n = 20$ and $p = .05$)
that $w - 1 = 4$ so $w =$
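The table lookup can be reproduced with a short sketch that computes the cumulative binomial probabilities directly. The function names are illustrative. The code assumes X is binomial with n = 20 and p = .05, as the reference to the table indicates, and it searches for the smallest w whose upper-tail probability is at most .01.

```python
from math import comb

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def smallest_w(n, p, tail=0.01):
    """Smallest w with P[X >= w] <= tail, using P[X >= w] = 1 - P[X <= w - 1]."""
    for w in range(n + 2):
        if 1.0 - binom_cdf(w - 1, n, p) <= tail:
            return w

w = smallest_w(n=20, p=0.05)
print("P[X <= w - 1] =", round(binom_cdf(w - 1, 20, 0.05), 4))
print("smallest w =", w)
```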

EXERCISE 5, SOLUTIONS
STATISTICS AND DATA ANALYSIS
Peter Lakner
An industrial supply firm sometimes gets calls related to improperly filled orders. This
situation is related to the salesperson's error in writing up the bill of sale. It happens that
Hank wi

STATISTICS AND DATA ANALYSIS
EXERCISE 6, SOLUTIONS
Peter Lakner
You are considering a quality inspection scheme to use on the spark plugs which are sent
from your supplier. These spark plugs come in shipments of 50,000. Denote the unknown
proportion of

STATISTICS FOR BUSINESS CONTROL
EXERCISE 3 SOLUTIONS
Peter Lakner
(a)
Descriptive Statistics: INCOME

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
INCOME    50   0  26.35     1.58  11.19    10.40  17.73   22.68  33.71    59.84
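Two of the entries in the Minitab output can be cross-checked from the others. The sketch below assumes only the standard definitions SE Mean = StDev / sqrt(N) and IQR = Q3 - Q1. The variable names are illustrative.

```python
import math

# Values copied from the Minitab output for INCOME
n, stdev = 50, 11.19
q1, q3 = 17.73, 33.71

se_mean = stdev / math.sqrt(n)   # standard error of the mean
iqr = q3 - q1                    # interquartile range

print(f"SE Mean = {se_mean:.2f}")
print(f"IQR = {iqr:.2f}")
```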
Mean income = 26.35
Median income

EXERCISE 4, SOLUTIONS
STATISTICS AND DATA ANALYSIS
Peter Lakner
John, Kathy, Len, and Marta are the final quality inspectors for computer monitors. If a
computer monitor functions properly, then
John will say OK with probability 0.92
Kathy will say OK wit

Homework: For Big-Data Scientists, Janitor Work Is Key Hurdle to Insights
Nan Li
Sept. 16, 2016
1. After reading this article, in your opinion, can organizing data from many sources be
enhanced, leaving more time for analysis?
Yes, I believe that organizing

Homework: Netflixed: The Epic Battle for America's Eyeballs
Nan Li
Oct. 22, 2016
1. What data management techniques did Netflix use to learn about its customers?
A website: a market research platform
A typical A-B test: measure the effect of different logos
Foc

Homework: What Stays in Vegas
Nan Li
Nov. 11, 2016
1. According to the book, why are casinos the hardest industry to understand customer
behavior? How would you recommend they solve this problem?
It's hard for casinos to capture customer behavior since casi

Box and Whisker Plot
What is an outlier? Why do we believe a particular point is an outlier?
[Figure: box-and-whisker diagram, labeled from top to bottom]
Outliers = extreme observations
Smaller of (Maximum, Median + 1.5 IQR)
75th Percentile
Interquartile range = IQR
Median
25th Percentile
Larger of (Minimum, Median - 1.
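The whisker rule on this slide can be sketched in a few lines. The lower rule is cut off in the text, so the code assumes it mirrors the upper one with the same multiplier of 1.5. That assumption, the function name, and the parameter k are not from the slide. The numbers are the per capita income summary given earlier in this supplement. Many texts measure the 1.5 IQR distance from the quartiles rather than from the median, so results from software may differ.

```python
def whisker_ends(minimum, median, maximum, iqr, k=1.5):
    """Whisker ends per the slide's rule, measured from the median.

    Assumption: the truncated lower rule is Median - k * IQR with k = 1.5.
    """
    upper = min(maximum, median + k * iqr)   # Smaller of (Maximum, Median + 1.5 IQR)
    lower = max(minimum, median - k * iqr)   # Larger of (Minimum, Median - 1.5 IQR), assumed
    return lower, upper

# Per capita income summary from the box plot slide
q1, q3 = 21677, 24933
iqr = q3 - q1
lower, upper = whisker_ends(minimum=17043, median=22610, maximum=31136, iqr=iqr)
print("IQR =", iqr)
print("whiskers end at", lower, "and", upper)
```

Observations beyond the whisker ends are the candidates the slide calls outliers.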

MINITAB GRAPHING TRICKS
This document will show a number of tricks that can be done in Minitab to make
attractive graphs. We work first with the file X:\SOR\2405\M\ANIMALS.MTP. This
first picture was obtained through Graph > Plot. It shows gestation period