lecture1

# lecture1 - Random sample Selection bias nonresponse bias...


Random sample. Selection bias, nonresponse bias, and measurement error are common to the design of experiments. What is a good sample? Intuitively, we want something representative of the population. In statistics, this is formalized as a random sample: a sample selected from the population in such a way that every different sample of size n has an equal chance of selection. Of course, this is easy to say but not at all easy to achieve.
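
The definition above can be illustrated with a minimal R sketch. The population here is a hypothetical set of labels 1 to 1000 and the sample size 10 is an arbitrary choice; neither comes from the lecture. `sample()` without replacement gives every subset of size n the same chance of selection.

```r
# Simple random sample: every subset of size n is equally likely.
set.seed(1)                      # only to make the sketch reproducible
population <- 1:1000             # hypothetical labels for population units
n <- 10                          # hypothetical sample size

# Draw n distinct units without replacement
srs <- sample(population, size = n, replace = FALSE)
srs
```

In practice the hard part is not the call to `sample()` but obtaining a complete list of the population to sample from.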

EPA car mileage rating data. However, one can easily get samples like EPA, EPAn2, EPAn06. See the R output and compare their histograms using hist(). This is a good time to introduce R, a free statistical package, downloadable from http://cran.r-project.org/, where you can also find introductions, both quick and comprehensive.
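
A rough sketch of comparing histograms with hist() follows. The EPA vectors are not reproduced in these notes, so both vectors below are simulated stand-ins; the names `clean` and `contaminated`, the distribution parameters, and the way the second sample is distorted are all assumptions made for illustration.

```r
# Hypothetical stand-ins for the EPA samples (the real data are not shown here).
set.seed(42)
clean <- rnorm(100, mean = 37, sd = 2.4)      # simulated mileage-like values
contaminated <- clean
contaminated[1:6] <- contaminated[1:6] * 10   # hypothetical recording errors

# Draw the two histograms side by side for comparison
op <- par(mfrow = c(1, 2))
hist(clean, main = "Clean sample", xlab = "Miles per gallon")
hist(contaminated, main = "Contaminated sample", xlab = "Miles per gallon")
par(op)                                       # restore the plotting layout
```
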
Advantages of R over Minitab: (1) it is free; (2) it is written by research statisticians working at the frontier, which means more built-in modern statistical packages; (3) it has an interactive interface; and many other features. However, it is not as commercialized as Minitab, so it is less popular in industry.

How to make a stem-and-leaf display? Command: stem(). Numerical measures of central tendency: one obvious choice is the mean, defined as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where the $x_i$ are the data points. For the EPA data, one can get the sample mean with mean(EPA). You can check that with sum(EPA)/100. The mean tells you where the observations tend to center.
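
The same steps can be tried on any numeric vector. In the sketch below, `x` holds ten made-up values standing in for the EPA data, which are not reproduced here; dividing by `length(x)` plays the role of dividing by 100 in the EPA check.

```r
# Hypothetical values standing in for the EPA data
x <- c(36.3, 41.0, 36.9, 37.1, 44.9, 36.8, 30.0, 37.2, 42.1, 36.7)

stem(x)                      # stem-and-leaf display

xbar <- mean(x)              # sample mean
check <- sum(x) / length(x)  # the defining formula: sum of x_i over n
all.equal(xbar, check)       # the two computations agree
```
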
The other competing notion is the median: if you have an odd number of data points, the median is the value right in the middle of the sorted data; if your sample has an even number of points, the median is the average of the two values in the center of the sorted data. Compare the median and the mean for the data: 2.3, 4.5, 6.4, 8.4, 3.4, 5.3, 4.7, 3.8. Claim: the median is robust to outliers. In this regard, the median can be a more reliable measure of the center when outliers are present. Indeed, one may have skewed data due to measurement error, which may bring in outliers. See the data EPAn06. So be careful when measuring the center.
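
The comparison asked for above can be run directly in R. The second part is an added illustration of the robustness claim: the value 84 is a hypothetical misrecording of 8.4 (a shifted decimal point) and is not part of the lecture data.

```r
# The eight data points from the slide
x <- c(2.3, 4.5, 6.4, 8.4, 3.4, 5.3, 4.7, 3.8)

sort(x)        # 2.3 3.4 3.8 4.5 4.7 5.3 6.4 8.4
mean(x)        # 4.85
median(x)      # 4.6, the average of the two middle values 4.5 and 4.7

# Hypothetical measurement error: 8.4 recorded as 84
x_out <- replace(x, x == 8.4, 84)

mean(x_out)    # 14.3, pulled far from the bulk of the data
median(x_out)  # 4.6, unchanged by the single outlier
```

One wild value moves the mean from 4.85 to 14.3 while the median stays at 4.6, which is what robustness to outliers means here.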

How should one measure the spread, or variability, of your data? You may think of the range, i.e., max - min. But what if there are outliers due to measurement error? Will the range reflect the true spread? Statisticians tend to use the so-called sample variance, given by the formula $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$. A commonly used related quantity is the sample standard deviation, which is the square root of the sample variance: $s = \sqrt{s^2}$.
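
These measures of spread can be computed as in the sketch below, which reuses the eight values from the median comparison as example data (a choice made here, not in the lecture). The variable names are illustrative. R's built-in `var()` and `sd()` use the n - 1 divisor, so they should match the formulas above.

```r
x <- c(2.3, 4.5, 6.4, 8.4, 3.4, 5.3, 4.7, 3.8)
n <- length(x)

spread_range <- diff(range(x))               # max - min

# Sample variance computed directly from the formula
s2_manual <- sum((x - mean(x))^2) / (n - 1)

s2 <- var(x)                                 # built-in sample variance
s  <- sd(x)                                  # built-in sample standard deviation

all.equal(s2_manual, s2)                     # formula agrees with var()
all.equal(s, sqrt(s2))                       # sd is the square root of the variance
```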
As you can imagine, if the whole population were observed, the population variance and its standard deviation would be defined in a similar way. Statisticians tend to denote them by $\sigma^2$ and $\sigma$. But keep in mind that these are usually not available, because the population is unmanageable. So they are parameters (or characteristics, as you may call them) that need to be estimated. Look at
