00:13 Wednesday 16th September, 2015
See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/
Lecture 1: Optimal Prediction (with Refreshers)
36-401, Fall 2015
1 September 2015
Regression analysis is about investigating quantitative, predicti
36-401 R Exercise
This document is an exercise to help you learn R, while incorporating some Exploratory
Data Analysis (EDA).
We will be using the statistical programming language R to analyze data in this class.
R is a freely available, multi-platform
Homework 2
(1) Page 25 Question 1.25 (a) and 1.26 (a) [Data is provided]
(2) page 50 1.73
(3) Use the data from 1.73 to calculate the mean and variance.
1
Measuring Spread
Consider three data sets:
Set A: 0,2,6,10,12
Set B: 4,5,6,7,8
Set C: 6,6,6,6,6
Notice: all three sets have the same mean of 6, but the spread of the distribution in set A is greater than
in sets B and C, and the spread in set C is 0.
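The comparison of the three sets can be checked numerically. The sketch below assumes the sample variance (divisor n - 1) as the measure of spread; the excerpt does not say which measure it goes on to define, so that choice is an assumption.

```python
from statistics import mean, variance

# The three data sets from the text.
data_sets = {
    "A": [0, 2, 6, 10, 12],
    "B": [4, 5, 6, 7, 8],
    "C": [6, 6, 6, 6, 6],
}

# Assumption: spread is measured by the sample variance (divisor n - 1).
summary = {name: (mean(values), variance(values)) for name, values in data_sets.items()}

for name, (m, s2) in summary.items():
    print(f"Set {name}: mean = {m}, sample variance = {s2}")
```

All three means come out to 6, while the sample variances are 26 for A, 2.5 for B and 0 for C, which matches the ordering of spreads described above.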
A measure of cen
Probability Applications to Counting Principles
Example: A group of 5 students contains 3 brothers. All students in this group will be randomly seated in
ordered chairs. Find the probability that the 3 brothers will be seated next to each other.
(a) 4/10
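One way to check a counting argument like this is brute-force enumeration. The sketch below assumes the chairs form a single row (so "next to each other" means three consecutive positions) and uses hypothetical labels B1, B2, B3 for the brothers and S1, S2 for the other two students.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical labels: three brothers and two other students.
students = ["B1", "B2", "B3", "S1", "S2"]

def brothers_adjacent(seating):
    """True if the three brothers occupy three consecutive chairs in the row."""
    positions = sorted(i for i, s in enumerate(seating) if s.startswith("B"))
    return positions[-1] - positions[0] == 2

# Every ordering of the 5 students is equally likely.
seatings = list(permutations(students))
favorable = sum(brothers_adjacent(s) for s in seatings)
probability = Fraction(favorable, len(seatings))

print(favorable, len(seatings), probability)
```

Under these assumptions the enumeration agrees with the counting-principle calculation: treat the brothers as one block (3! arrangements of the block and the two others, times 3! orders within the block), giving 36 favorable seatings out of 5! = 120.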
Homework 3
1. A password consists of 3 digits followed by 3 letters. If repetition of digits is allowed, but repetition
of letters is not allowed, determine the number of different passwords that can be made.
2. You are choosing an 8-character alpha-numer
Normal distribution: Non-standard normal distribution
Let $X \sim N(\mu, \sigma^2)$, with $\mu \neq 0$, $\sigma^2 \neq 1$.
Because the normal distribution is symmetric, it follows that $P(X > \mu + a) = P(X < \mu - a)$.
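The symmetry property, that the upper tail beyond mu + a equals the lower tail below mu - a, can be illustrated numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the values of mu, sigma and a are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters for a non-standard normal distribution.
mu, sigma, a = 10.0, 2.0, 3.0
X = NormalDist(mu, sigma)

upper_tail = 1 - X.cdf(mu + a)   # P(X > mu + a)
lower_tail = X.cdf(mu - a)       # P(X < mu - a)

print(upper_tail, lower_tail)
```

The two printed probabilities agree up to floating-point rounding, whatever values are chosen for mu, sigma and a.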
Because the normal distribution is a continuous distribution, $P(X \le a) = P$
Graphical representations of quantitative data
For quantitative variables we will study the following plots: Histograms; Stemplots (stem-and-leaf plots);
Dotplot and Boxplot.
1. A histogram is a bar chart for grouped numerical data, with no gaps between a
Probability
Definitions and Concepts
A random experiment is an experiment or process whose outcome cannot be predicted with
certainty.
Example 1: Toss a coin.
Example 2: Roll a die.
Sample space (denoted by S) is the set of all possible outcomes
Homework
The following data was collected comparing score on a measure of test anxiety and exam score.
Measure of test anxiety (x):  23  14  14   0   7  20  20  15  21
Exam score (y):               43  59  48  77  50  52  46  51  51
1. Construct a scatterplot. What does this plot reveal?
2
Continuous Distributions
In contrast to discrete random variables (such as a binomial random variable), there are many situations where the
values of a random variable (r.v.) cannot be counted. For example, the height of a student is limited by the
accuracy of the
Sampling distribution of ($\bar{X}_1 - \bar{X}_2$)
Sometimes, researchers want to get information about the difference between two population means. For
example:
(a) The difference between the mean income of two cities.
(b) The difference between the mean recovery time u
3.4 The Binomial Probability Distribution
A Bernoulli trial is a random experiment with the following features:
* Each outcome is either success or failure.
* We denote the probability of success by p and the probability of failure by q = 1 - p.
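The section title points to the binomial distribution, which counts successes over repeated Bernoulli trials. The sketch below uses the standard binomial formula P(X = k) = C(n, k) p^k q^(n-k); it assumes n independent trials with the same p, and the values n = 5 and p = 0.3 are hypothetical. The formula itself does not appear in this excerpt.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli trials with success probability p)."""
    q = 1 - p  # probability of failure, as defined in the text
    return comb(n, k) * p**k * q**(n - k)

# Hypothetical values, for illustration only.
n, p = 5, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)

for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")
```

The probabilities over k = 0, ..., n sum to 1, which is a quick sanity check on any hand calculation.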
A Bernoulli ra
Discrete random variables
A random variable is a numerical outcome that results from a random experiment. For each element of an
experiment's sample space, the random variable can take on exactly one value.
A discrete random variable X has a finite numbe
Homework 12
1. A university has found over the years that out of all the students who are offered admission, the
proportion who accept is .70. After a new director of admissions is hired, the university wants to check
if the proportion of students accepti
Organizing and Summarizing Categorical Data
Frequency distribution
Definition: A categorical frequency distribution, used for categorical (qualitative) data, is a table listing
the categories, together with the frequency of occurrence of each category in
Homework 10
1. The heights of young American women, in inches, are normally distributed. I selected
a random sample of 4 young American women and measured their heights.
The 4 heights, in inches, are: 63, 69, 62, 66.
(a) Compute a 99% CI for
(b) Is the tr
Homework 11
1. An air freight company wishes to test whether or not the mean weight of parcels shipped on a particular
route exceeds 10 pounds. A random sample of 49 shipping orders was examined and found to have an
average weight of 11 pounds. Assume that th
36-201
Class Notes for Section 1.1
Types of data:
Population data: everything or everyone we are studying.
Sample data: a subset of the population.
Example: Identify the population and the sample for each of the following:
1. CMU is interested in
Sampling distributions
Sampling distributions are important in the understanding of statistical inference.
Probability distributions permit us to answer questions about sampling and they provide the foundation
for statistical inference procedures.
Defin
Homework 8
1. Using the Normal table or Minitab, find the following probabilities for a standard Normal variable
Z. Be sure to draw a picture to check your calculations.
(a) P (|Z| < 1.2)
(b) P (|Z| > 0.5)
2. Suppose that Professor Smiley has a policy
Homework 6
1. Assuming independent random variables, solve 4.74 and 4.76 on page 280.
2. In a game that costs $3 to play, a fair die is rolled. If a five or a six is rolled the player receives $6. If any other number is
rolled, the player receives nothing. What i
Describing Distributions with numbers
How many scores are between 98 & 107?
What percentage of scores are between 98 & 107?
How many scores are greater than or equal to 125? What is the percentage of these scores?
Measures of Central Tendency
Often it is n
36-303 Homework 1
01/26/2017
Reading for next week: Groves Ch 3, Ch 5
The following questions are due 1/26/2017, by 3:00PM
1) Define the sampled population and a sample. What is the difference between the two?
2) For each of the following exercises (taken
Regression Homework 4
US Home Sales
Janet Ye: jy1151
April 16, 2014
1 Introduction
Number of home sales can be thought of as a demand curve. Basic economic intuition tells us that the demand
of home sales is inversely related to the price of homes. In thi
Homework 4 ANSWER KEY
36-201 Spring 2016
Point distribution:
Exercise:  1   2   3   neatness  TOTAL
Pts:       42  44  11  3         100
1. A social scientist conducted a study on a random sample of 330 U.S. teenagers. The number of electronic
gadgets (such as computers, TVs, and
Homework 1 ANSWER KEY
36-201 Spring 2016
Point distribution:
Exercise:  1  2   3   4   5   6  7   neatness  TOTAL
Pts:       2  18  18  15  21  6  18  2         100
1. According to the syllabus, what is the location where late homework should be submitted? [Say the building and
room number
Homework 8 KEY
36-201 Spring 2016
Point distribution:
Exercise:  1   2   3   4  5  neatness
Pts:       35  38  12  5  8  2
1. To estimate the mean annual employee profit (in dollars) at financial companies, Forbes randomly sampled 43
employees at financial companies. Their a
Homework 3 ANSWER KEY
36-201 Spring 2016
Point distribution:
Exercise:  1   2   neatness  TOTAL
Pts:       50  49  1         100
1. In lab 2, we said we didn't yet know how to explore some interesting questions about relationships between
variables, but now we do know how to exp
Homework 7 KEY
36-201 Spring 2016
Point distribution:
Exercise:  1   2   3   4   5   neatness
Pts:       18  19  20  20  21  2
1. The amount of money that individuals in the U.S. spend for the holidays is strongly right-skewed.
(a) Suppose we select a random sample of 6 people