Lecture 1 - Random sample

Random sample
Selection bias, nonresponse bias, and measurement error are common to the design of experiments. What is a good sample? Intuitively, we want something representative of the population. In statistics, this is formalized as a random sample: a sample selected from the population in such a way that every different sample of size n has an equal chance of selection. Of course, this is easy to say but not at all easy to achieve.
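As a minimal sketch of this definition, R's built-in sample() draws n items without replacement so that every subset of size n is equally likely. The population of 1000 labeled units and the seed are hypothetical, chosen only for illustration.

```r
# Hypothetical population: 1000 units labeled 1..1000 (an assumption for illustration)
population <- 1:1000
n <- 10

set.seed(515)  # arbitrary seed, only to make the draw reproducible
srs <- sample(population, size = n)  # simple random sample, without replacement

print(srs)
```

The draw itself is the easy part; in practice the difficulty usually lies in having a complete list of the population to draw from.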

EPA car mileage rating data
However, one can easily get samples like EPA, EPAn2, and EPAn06. See the R output and compare their histograms using hist(). This is a good time to introduce R, a free statistical package that can be downloaded from http://cran.r-project.org/, where you can also find introductions, both quick and comprehensive.
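The course data sets EPA, EPAn2, and EPAn06 are not reproduced here, so the sketch below uses simulated stand-in vectors. The names mpg_a and mpg_b, the normal distribution, and its parameters are all assumptions; the point is only to show how two histograms can be placed side by side with hist().

```r
set.seed(1)  # arbitrary seed for reproducibility

# Simulated stand-ins, NOT the EPA data from the lecture
mpg_a <- rnorm(100, mean = 37, sd = 2.5)
mpg_b <- c(mpg_a[-1], 60)    # same values, with one replaced by an extreme value

op <- par(mfrow = c(1, 2))   # two plots side by side
h_a <- hist(mpg_a, main = "mpg_a", xlab = "value")
h_b <- hist(mpg_b, main = "mpg_b", xlab = "value")
par(op)                      # restore the previous plotting settings
```

hist() also returns the bin breaks and counts invisibly, so h_a$counts and h_b$counts can be inspected numerically as well as visually.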
Advantages of R over Minitab: (1) it is free; (2) it is written by research statisticians working at the frontier, which means more built-in modern statistical packages; (3) it has an interactive interface; and it has many other features. However, it is not as commercialized as Minitab, so it is less popular in industry.

How to make a stem-and-leaf display? Command: stem()

Numerical measures of central tendency: one obvious choice is the mean, defined as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n},$$

where the $x_i$ are the data points. For the EPA data, one can get the sample mean with mean(EPA). You can check it with sum(EPA)/100. The mean tells you where the observations tend to center.
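A short sketch of these commands on a small made-up vector (the values in x are hypothetical, not the EPA data): it prints a stem-and-leaf display and checks that mean() agrees with the formula.

```r
# Hypothetical data, used only to illustrate the commands
x <- c(36.3, 41.0, 36.9, 37.1, 44.9, 36.8, 30.0, 37.2, 42.1, 36.7)

stem(x)                      # stem-and-leaf display printed to the console

n <- length(x)
xbar_formula <- sum(x) / n   # the definition: sum of the data divided by n
xbar_builtin <- mean(x)

all.equal(xbar_formula, xbar_builtin)
```

Using length(x) rather than a hard-coded sample size keeps the check valid for a vector of any length.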
The competing notion is the median: if you have an odd number of data points, the median is the value in the middle of the sorted data; if your sample has an even number of points, the median is the average of the two values in the center of the sorted data. Compare the median and mean for the data: 2.3, 4.5, 6.4, 8.4, 3.4, 5.3, 4.7, 3.8.

Claim: the median is robust to outliers. In this regard, the median is a more reliable measure of the center when outliers are present. Indeed, one may have skewed data due to measurement error, which may bring in outliers. See the data EPAn06. So be careful when measuring the center.
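The comparison asked for above can be sketched in R as follows. The second vector, in which 8.4 is replaced by 84, is a made-up example of a data-entry error and is not part of the lecture data.

```r
x <- c(2.3, 4.5, 6.4, 8.4, 3.4, 5.3, 4.7, 3.8)

sort(x)        # 2.3 3.4 3.8 4.5 4.7 5.3 6.4 8.4
mean(x)        # 4.85
median(x)      # 4.6, the average of the two middle values 4.5 and 4.7

# Hypothetical outlier: 8.4 mis-entered as 84
x_out <- replace(x, x == 8.4, 84)
mean(x_out)    # 14.3
median(x_out)  # 4.6
```

A single bad value moves the mean from 4.85 to 14.3 while the median stays at 4.6, which is what robustness to outliers means here.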

How should one measure the spread, or variability, of your data? You may think of the range, i.e., max - min. But what if there are outliers due to measurement error? Will the range reflect the true spread?

Statisticians tend to use the so-called sample variance, given by

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$

A commonly used related quantity is the sample standard deviation, which is the square root of the sample variance:

$$s = \sqrt{s^2}.$$
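As a hedged illustration, the formulas can be checked against R's built-in var() and sd(). The vector x reuses the eight values from the median comparison in these notes; the variable names are chosen here for illustration.

```r
x <- c(2.3, 4.5, 6.4, 8.4, 3.4, 5.3, 4.7, 3.8)
n <- length(x)

data_range <- max(x) - min(x)                  # range: max - min

s2_formula <- sum((x - mean(x))^2) / (n - 1)   # sample variance from the formula
s2_builtin <- var(x)                           # var() also divides by n - 1

s_formula <- sqrt(s2_formula)                  # sample standard deviation
s_builtin <- sd(x)

all.equal(s2_formula, s2_builtin)
all.equal(s_formula, s_builtin)
```

Note that var() and sd() in R use the n - 1 divisor, so they match the sample formulas above.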
As you can imagine, if the whole population were observed, the population variance and standard deviation would be defined in a similar way. Statisticians tend to denote them by $\sigma^2$ and $\sigma$, respectively. But keep in mind that these are usually not available, because the population is unmanageable. So they are parameters (or characteristics, as you may call them) that need to be estimated. Look at

This note was uploaded on 06/06/2011 for the course STAT 515 taught by Professor Zhao during the Spring '10 term at South Carolina.
