lecture6 - ISYE 2028 A and B Lecture 6 Canonical Continuous...

ISYE 2028 A and B
Lecture 6
Canonical Continuous Random Variables and some brief results

Dr. Kobi Abayomi
February 10, 2009

1 Introduction - The Normal Distribution, Normal Data

Remember that continuous data is data that can assume an uncountable number of values. We cannot list (in a frequency table, for example) the possible values of a continuous variable; the list would be infinitely long.

If you think for just a second you can probably (that’s supposed to be funny!) convince yourself that the relative frequency of any one value of a continuous variable is very low when there are many observations. For a continuous variable we only determine the relative frequency for a range of values. (Remember the histogram and how we bin values for discrete distributions.)

1.1 What’s normal?

Let’s use a toy example: say we have the discrete distribution of clown shoe sizes. Let’s say this is the distribution.
Shoe Size   Count   Relative Frequency   Cumulative Relative Frequency
17          1       1/7                  1/7
18          1       1/7                  2/7
19          1       1/7                  3/7
20          2       2/7                  5/7
22          1       1/7                  6/7
24          1       1/7                  7/7

Notice that these data are discrete and that the relative frequencies sum to one. Let’s look at an illustration of the distribution of this data:

[Figure 1: Histogram of observed shoe size. x-axis: x (17 to 24); y-axis: Density (0.00 to 0.25).]

Remember that the area of the histogram sums to one when the y-axis is density (2 × 0.30 + 3 × 0.14 ≈ 1). Notice how the observed data are binned. For this example, we’ll use R:

    x <- c(22, 24, 19, 17, 20, 18, 20)
    hist(x, breaks = 7, col = "red", density = 20,
         main = "Histogram of Observed Shoe Size", freq = FALSE)
    mean(x); mean(x^2)

The mean of our observed data, the shoe sizes, is μ = 20.

What if, for some reason (indulge this example), we could only observe some of the data, never all of it at once? Say we can take only four observations at a time.
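The frequency table and the "area sums to one" remark above can be checked directly in R. What follows is only a sketch under stated assumptions: the names freq_table and hist_area are illustrative and do not come from the lecture, and the histogram is computed with plot = FALSE so that only its numbers are inspected.

```r
# Observed clown shoe sizes, as given in the text
x <- c(22, 24, 19, 17, 20, 18, 20)

# Count, relative frequency and cumulative relative frequency per shoe size
counts <- table(x)
rel_freq <- counts / length(x)
cum_rel_freq <- cumsum(rel_freq)

# freq_table is an illustrative name, not one used in the lecture
freq_table <- data.frame(
  shoe_size    = as.numeric(names(counts)),
  count        = as.vector(counts),
  rel_freq     = as.vector(rel_freq),
  cum_rel_freq = as.vector(cum_rel_freq)
)
print(freq_table)

# Compute the histogram without drawing it; h$density holds the bar heights
h <- hist(x, breaks = 7, plot = FALSE)

# Total area = sum over bins of (bar height * bin width)
hist_area <- sum(h$density * diff(h$breaks))
print(hist_area)
```

The area is computed as height times width for each bin, which is why the density heights alone only add up to one when every bin has width one.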
For example, say we observe $x_1 = \{22, 19, 18, 20\}$. An estimate of the mean (the sample mean, $\bar{x} = n^{-1} \sum_i x_i$) based on this subsample is $\bar{x}_1 = 19.75$.

If we could not observe all of the data at once, we might want to resample, again and again, and get $x_2, x_3, x_4$, and so on. It is reasonable to expect the mean of these resampled means to be very close to $\bar{x} = 20$. As we take more and more resamples we can say that, almost surely, the mean of the sample means converges to 20.

Let’s illustrate this using R:

    samplesize <- 4      # Here I’m going to take a sample of size 4 from
                         # the clown shoe sizes
    numsamples <- 12500  # Here I’m going to take a lot of these
                         # samples. Over and over and ...

    # first a place to put all of these samples so we can compute means
    # and graph the results
    xmatrix <- matrix(0, nrow = numsamples, ncol = samplesize)

    # a simple loop to take the sample and put it in our new data table
    # (sample() draws without replacement by default)
    for (i in 1:numsamples) {
      xmatrix[i, ] <- sample(x, samplesize)
    }

    # a graph of the results
    hist(apply(xmatrix, 1, mean), breaks = 20)

These resampled means, which are now continuous variables¹, have a distribution of their