1. Data Preprocessing
a. Combining some categories with similar probability.
b.
c. The model has worse performance without OpenPrice.
d. $H_0: \beta_1 = 0$; $H_a: \beta_1 \neq 0$
Since the p-value is less than the significance level (0.05), we reject the null hypothesis.
The coefficient for clos
examples
central limit theorem
(for means)
Dr. Mine Çetinkaya-Rundel
Duke University
Suppose my iPod has 3,000 songs. The histogram below shows the distribution of the
lengths of these songs. We also know that, for this iPod, the mean
PDF
A probability density function (pdf) is a function associated with a continuous random variable.
Areas under pdfs correspond to probabilities for that random variable.
To be a valid pdf, a function must satisfy
1. It must be larger than or equal to zer
accuracy vs.
precision
confidence level
width of an interval
trade-offs
Dr. Mine Çetinkaya-Rundel
Duke University
confidence level
Suppose we took many samples
and built a confidence interval from
each sample using the equation
Then about 95% of th
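The repeated-sampling idea above can be sketched as a small simulation. The interval formula is missing from the excerpt, so this sketch assumes the usual large-sample interval, sample mean plus or minus 1.96 standard errors. The population (normal with mean 10 and standard deviation 2), the sample size, and the function name `coverage` are all hypothetical choices made for illustration.

```python
import random
import statistics


def coverage(num_samples=2000, n=50, mu=10.0, sigma=2.0, z=1.96, seed=1):
    """Fraction of simulated intervals that contain the true mean mu.

    Assumption: each interval is xbar +/- z * s / sqrt(n); the excerpt's own
    equation is not shown, so this formula is a stand-in.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Draw one sample from the hypothetical population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # Count the interval as a hit when it captures the true mean.
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    print(f"share of intervals capturing the true mean: {coverage():.3f}")
```

The returned share should land close to the nominal confidence level, which is what the confidence level describes: a property of the procedure over many samples, not of any single interval.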
another introduction
to inference
review simulation based inference
review hypothesis testing framework
Dr. Mine Çetinkaya-Rundel
Duke University
remember when
                  promotion
gender    promoted    not promoted    total
male         21            3            24
female       14           10            24
total        35           13            48
% o
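The counts in the promotion table can be used in a simulation-based test. The sketch below assumes the usual randomization setup: if promotion were independent of gender, the 35 "promoted" and 13 "not promoted" outcomes could be shuffled freely across the 24 male and 24 female files. The function name, the number of repetitions, and the one-sided comparison are illustrative choices, not taken from the slides.

```python
import random


def randomization_p_value(reps=10000, seed=1):
    """Share of shuffles with a male-minus-female promotion rate difference
    at least as large as the observed one (21/24 - 14/24)."""
    rng = random.Random(seed)
    # 1 = promoted, 0 = not promoted; totals come from the table (35 and 13).
    outcomes = [1] * 35 + [0] * 13
    observed_diff = 21 / 24 - 14 / 24
    extreme = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        male_rate = sum(outcomes[:24]) / 24    # first 24 files labelled male
        female_rate = sum(outcomes[24:]) / 24  # remaining 24 labelled female
        # Small tolerance guards against floating-point ties.
        if male_rate - female_rate >= observed_diff - 1e-12:
            extreme += 1
    return extreme / reps


if __name__ == "__main__":
    print(f"simulated one-sided p-value: {randomization_p_value():.4f}")
```

A small simulated p-value means that a difference as large as the observed one rarely arises from shuffling alone.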
required
sample size for ME
sample size vs. accuracy
Dr. Mine Çetinkaya-Rundel
Duke University
backtracking to n for a given ME
given a target margin of error, confidence level, and information on the
variability of the sample (or the population), we can
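Backtracking from a margin of error to a sample size can be sketched as follows. This assumes the normal-based margin of error ME = z* x sd / sqrt(n), solved for n and rounded up; the function name and the example inputs (a standard deviation of 18 and a target margin of error of 4) are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, sd, confidence=0.95):
    """Smallest n with z* * sd / sqrt(n) <= margin_of_error.

    Assumption: a normal critical value z* is appropriate (sd treated as known).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve ME = z* * sd / sqrt(n) for n, then round up to a whole observation.
    return math.ceil((z_star * sd / margin_of_error) ** 2)


if __name__ == "__main__":
    print(required_sample_size(margin_of_error=4, sd=18))
    print(required_sample_size(margin_of_error=4, sd=18, confidence=0.99))
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the target.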
confidence interval
(for a mean)
what is a confidence interval?
conditions
finding & interpreting
Dr. Mine Çetinkaya-Rundel
Duke University
A plausible range of values for the population parameter is called a
confidence interval.
If we report a poin
examples
confidence interval
(for one mean)
Dr. Mine Çetinkaya-Rundel
Duke University
The General Social Survey asks: For how many days during the past 30 days was your mental health,
which includes stress, depression, and problems w
Stat 280, Fall 2016
Chapter 2
Numerical summaries of data
1. What did we look for in a histogram?
2. Location
(a) mean. Show Section 1. Center of gravity. Sum of errors is 0.
(b) median. Show Section 2.
(c) mean, median, skewness. Show Section 3.
(d) sele
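The "center of gravity" remark in item (a) can be checked numerically: deviations from the mean always sum to zero. The values below are hypothetical and chosen only for illustration.

```python
import statistics


def deviations_from_mean(values):
    """Return each value's error (value minus the mean of all values)."""
    center = statistics.fmean(values)
    return [v - center for v in values]


if __name__ == "__main__":
    values = [2.0, 3.5, 4.0, 10.5]  # hypothetical observations
    errors = deviations_from_mean(values)
    print("mean:", statistics.fmean(values))
    print("errors:", errors)
    print("sum of errors:", sum(errors))
```

Replacing the mean with the median in this sketch generally gives a nonzero sum, which is one way to see why the mean, not the median, is the balance point.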
Stat 280, Fall 2016
Home, Chapter 2
Solutions
> data <- read.csv("data.csv")
> head(data)
x
1 22.5
2 2.0
3 0.4
4 2.7
5 42.9
6 4.2
1. > hist(data$x)
[Figure: Histogram of data$x. Horizontal axis: data$x (0 to 40); vertical axis: Frequency (0 to 15).]
2. I think x will best be described by the me
15.075J: Statistical Thinking and Data Analysis
Lecture 4: Confidence Intervals
Instructor: Hoda Bidkhori
Study: Young, unemployed, and optimistic
41% of the public believes young adults (i.e. not middle-aged
or older adults) are having the toughest
15.075J: Statistical Thinking and Data Analysis
Lecture 5: Statistical Tests
Instructor: Hoda Bidkhori
Example: fair coin?
We want to know whether our coin is a fair coin (whether the probability
of getting heads is 0.5).
Example: fair coin?
We
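One way to test whether a coin is fair is an exact binomial test, sketched below. The lecture's own procedure is cut off in this excerpt, so this is an assumed stand-in rather than the lecture's method; the function name and the example counts (9 heads in 10 flips) are hypothetical.

```python
from math import comb


def binomial_two_sided_p(heads, flips, p=0.5):
    """Exact two-sided p-value for H0: P(heads) = p.

    Sums the probabilities of all outcomes that are no more likely than the
    observed number of heads under H0.
    """
    pmf = [comb(flips, k) * p ** k * (1 - p) ** (flips - k)
           for k in range(flips + 1)]
    observed = pmf[heads]
    # Relative tolerance so outcomes tied with the observed one are included.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    print(f"p-value for 9 heads in 10 flips: {binomial_two_sided_p(9, 10):.4f}")
```

A p-value below the chosen significance level would be taken as evidence against the coin being fair.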
0910
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and
MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated t
Stat 280, Fall 2016
Homework, Chapter 3
File animal.csv contains (fictional) data about birthweight and gestation
period for a number of animals. The file also contains information about each
animal's habitat (forest, grassland, desert, rainforest) and die
Stat 280, Fall 2016
Homework, Chapter 1
We are studying production of a candy; it is a small chocolate covered in
a hard shell. The candy comes in six different colors. We are looking at
the output of two separate factories, labeled a and b. We sample 100
Stat 280, Fall 2016
Homework, Chapter 1
Solutions
Attention graders: the students' answers do not have to be exactly the
same as mine. Please make sure you read and understand each question in
the assignment. Do not just compare the students' answers to the
Stat 280, Fall 2016
Homework, Chapter 2
Read in the data file data.csv. This data set contains one variable - x.
Note: a common error when using a data set with only one variable is
to use the data set name instead of the variable name in R commands, e.g
Stat 280, Fall 2016
Homework, Chapter 4
Coin A has a 50% chance of coming up heads on each flip. Coin B has a 75%
chance of coming up heads on each flip. We flip A twice and B twice. The
flips are independent. Let X be the total number of heads we observe in
Distribution of Sample Means
The distribution of the means of an infinite
number of samples of a certain size selected
from a population
Before, we talked about distributions of scores, but now
we are talking about the distribution of all possible
sample
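The distribution of sample means can be approximated by simulation. The definition above refers to an infinite number of samples; the sketch below uses a large finite number instead. The population of scores (the integers 1 to 100), the sample size of 25, and the function name are hypothetical.

```python
import random
import statistics


def simulate_sample_means(population, n, reps=5000, seed=1):
    """Draw `reps` samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]


if __name__ == "__main__":
    scores = list(range(1, 101))  # hypothetical population of scores
    means = simulate_sample_means(scores, n=25)
    print("population mean:", statistics.fmean(scores))
    print("mean of sample means:", round(statistics.fmean(means), 2))
    print("population SD:", round(statistics.pstdev(scores), 2))
    print("SD of sample means:", round(statistics.stdev(means), 2))
```

The sample means center on the population mean, and their spread is much smaller than the spread of individual scores, roughly the population standard deviation divided by the square root of the sample size.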