E7_Lecture23_F08_Statistics - E7: E7: INTRODUCTION TO...


E7: INTRODUCTION TO COMPUTER PROGRAMMING FOR SCIENTISTS AND ENGINEERS

Lecture Outline
• Histograms, data representation
• Continuous random variable and probability density function (pdf)
• Normal pdf, mean and standard deviation

Copyright 2007, Horowitz, Packard. This work is licensed under the Creative Commons Attribution-Share Alike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
E7 L23

Histogram
A histogram is a plot of frequency of occurrence of data values versus the values themselves.

[z,x] = hist(X):
• Aggregates vector data X into 10 evenly spaced bins between min(X) and max(X).
• Returns:
    z: vector that contains the bin frequencies (# of times that data X falls within the range of the bin)
    x: vector that contains the bin center locations

Histogram Example: E7 scores (E7 F08 overall scores)
    load('E7_scores')
    N = length(E7_scores)           % N = 311
    max_score = max(E7_scores)      % max_score = 100
    min_score = min(E7_scores)      % min_score = 42.82
    mu = mean(E7_scores)            % mu = 76.37
    sigma = std(E7_scores)          % sigma = 11.11
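The binning described for hist can be sketched by hand. The block below is a minimal illustration, not MATLAB's actual implementation: the data vector X is made up for the example, and the names nbins, w, idx, x_manual and z_manual are hypothetical.

```matlab
% Minimal sketch of a 10-bin aggregation in the spirit of [z,x] = hist(X).
% X is made-up illustration data (an assumption), not the E7 scores.
X = [42 55 61 67 70 74 76 78 81 83 85 88 90 93 100];

nbins = 10;
lo = min(X);
hi = max(X);
w  = (hi - lo) / nbins;                  % common bin width

x_manual = lo + w * ((1:nbins) - 0.5);   % bin centers

% Index of the bin each sample falls in; the maximum is placed in the last bin.
idx = floor((X - lo) / w) + 1;
idx(idx > nbins) = nbins;

% Count how many samples land in each bin.
z_manual = zeros(1, nbins);
for k = 1:numel(X)
    z_manual(idx(k)) = z_manual(idx(k)) + 1;
end

disp(z_manual)   % bin frequencies
disp(x_manual)   % bin centers
```

The frequencies always add up to the number of samples, which is a quick check that no data point was dropped. Samples that fall exactly on an interior bin edge may be assigned differently by hist itself.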
Example: E7 scores (E7 F08 overall scores), default 10 bins
    [z,x] = hist(E7_scores)
    % z = [3 7 20 25 41 50 69 58 30 8]                  bin frequencies
    % x = [~46 ~51 ~57 ~63 ~69 ~74 ~80 ~86 ~92 ~97]     bin centers
    bar(x,z), xlabel('x'), ylabel('z')
[Figure: bar chart of z versus x for the 10-bin histogram]

Sample mean and variance:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$$
(Note: std(X) normalizes by N − 1 by default; std(X,1) normalizes by N as in the formula above.)

Histogram with a chosen number of bins
    [z_15,x_15] = hist(E7_scores, 15);    % 15 = # of desired bins
    bar(x_15,z_15), xlabel('x'), ylabel('z')
[Figure: bar chart of z versus x for the 15-bin histogram]

Histogram with chosen bin centers
    x = 46:5:100;                         % vector of bin centers
    [z,x] = hist(E7_scores, x)
    % z = [ 3  6 17 22 28 36 48 62 49 31  9]
    % x = [46 51 56 61 66 71 76 81 86 91 96]
    bar(x,z), xlabel('x'), ylabel('z')
[Figure: bar chart of z versus x for bin centers 46, 51, ..., 96]

Relative Frequency Histogram
Relative frequency histogram = (absolute frequency histogram) / (total area of that histogram). In the code below the total is taken as sum(z).
    x = 46:5:100;
    [z,x] = hist(E7_scores,x)
    % z = [ 3  6 17 22 28 36 48 62 49 31  9]
    % x = [46 51 56 61 66 71 76 81 86 91 96]
    zrel = z/sum(z);
    bar(x,zrel), xlabel('x'), ylabel('zrel')
[Figure: bar chart of zrel versus x]

Approximate probability density function
$$\text{pdf} = \frac{\text{absolute frequency histogram}}{\text{total area of that histogram}} \times \frac{\#\text{bins}}{\max(y) - \min(y)}$$
    X = E7_scores; x = 46:5:100;
    [z,x] = hist(X,x);
    pdf = (z/sum(z))*(length(x)/(max(X)-min(X)));
    bar(x,pdf), xlabel('x'), ylabel('pdf')
[Figure: bar chart of pdf versus x]

pdf, continuous approximation: decrease the bin width and increase the number of measurements.
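The scaling used for the approximate pdf can be sanity-checked: when the bins evenly span [min(X), max(X)], the scaled histogram should enclose unit area. The sketch below assumes synthetic data in place of the E7 scores; the names nbins, binwidth and area are hypothetical.

```matlab
% Sketch: check that the scaled histogram encloses unit area.
% X is synthetic illustration data (an assumption), not the E7 scores.
X = 70 + 10*randn(1e4, 1);

nbins = 20;
[z, x] = hist(X, nbins);             % nbins bins evenly spanning [min(X), max(X)]

% Scaling from the lecture: relative frequency times #bins/(max - min)
pdf = (z/sum(z)) * (length(x)/(max(X) - min(X)));

binwidth = (max(X) - min(X)) / nbins;
area = sum(pdf) * binwidth;          % equals 1 up to round-off
disp(area)
```

With hand-picked bin centers such as 46:5:100, the actual bin width (5) differs slightly from (max(X) - min(X))/#bins, so the enclosed area is only approximately 1.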
The tops of the rectangles in the scaled histogram often form a smooth bell-shaped curve.
[Figure: scaled frequency versus height (in.), with its continuous approximation]

    function [pdf, x] = my_pdf(X,x)
    [z,x] = hist(X,x);
    pdf = (z/sum(z))*(length(x)/(max(X)-min(X)));

    X = E7_scores; x = 46:5:100;
    bar(x,my_pdf(X,x)), xlabel('x'), ylabel('pdf')
[Figure: bar chart of pdf versus x]

Gaussian (normal) probability density function (pdf)
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
μ - mean
σ - standard deviation

E7 scores example
    X = E7_scores; x = 46:5:100;
    [pdf, x] = my_pdf(X,x)
    muX = mean(X); stdX = std(X);
    gaussian = @(x) (1/stdX/sqrt(2*pi))*exp(-((x-muX)/stdX).^2/2)
where
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$$

E7 scores example
    bar(x,pdf), xlabel('x'), ylabel('pdf')
    hold on
    plot(x,gaussian(x),'--rs','LineWidth',2,...
        'MarkerEdgeColor','k',...
        'MarkerFaceColor','g',...
        'MarkerSize',10)
[Figure: bar chart of pdf versus x with the Gaussian p(x) overlaid]

Normal or Gaussian pdf
[Figure: p(x) as defined above, for μ = 10 and σ = 1, 2, 3]

E7 scores example: "adjust" mean
    X = E7_scores; x = 46:5:100
    [z,x] = hist(X,x)
    hold off
    muX = 81
    stdX = std(X)
    pdf = (z*length(x))/sum(z)/(max(X)-min(X))
    figure(6), bar(x,pdf), xlabel('x'), ylabel('pdf')
    gaussian = @(x) (1/stdX/sqrt(2*pi))*exp(-((x-muX)/stdX).^2/2)
    hold on
    plot(x,gaussian(x),'--rs','LineWidth',4,...
        'MarkerEdgeColor','k',...
        'MarkerFaceColor','g',...
        'MarkerSize',10)

Matlab – randn function
Generates a Gaussian random number with mean = 0 and variance = 1.
    Y = randn(1e4,1);
    [pdf,y] = my_pdf(Y,100);
    bar(y,pdf), xlabel('y'), ylabel('pdf')
[Figure: bar chart of pdf versus y, for y from -4 to 4]

Matlab – rand function
Generates a uniformly distributed random number between 0 and 1 (mean = 1/2 and standard deviation = 1/√12).
    Y = rand(1e5,1);
    [pdf,y] = my_pdf(Y,100);
    bar(y,pdf), xlabel('y'), ylabel('pdf')
[Figure: bar chart of pdf versus y, for y from 0 to 1]

Mean and Standard deviation of random variables
• The mean (expected value) of a random variable:
$$\mu = \int_{-\infty}^{\infty} x\, p(x)\, dx$$
• The standard deviation of a random variable:
$$\sigma = \left( \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, dx \right)^{1/2}$$

Mean and Standard deviation of random variables
Example: Uniform random variable between [0,1]
$$p(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
[Figure: pdf p(x) versus x]
Mean:
$$\mu = \int_{-\infty}^{\infty} x\, p(x)\, dx = \int_{0}^{1} x\, dx = \frac{1}{2}$$

Mean and variance
• Let X be a random variable with mean μX and standard deviation σX.
• Let a and b be two constants.
• Define a new random variable Y to be
$$Y = aX + b$$
Then the mean and standard deviation of Y are
$$\mu_Y = a\,\mu_X + b, \qquad \sigma_Y = |a|\,\sigma_X$$

Mean and Standard deviation of random variables
Example: Uniform random variable between [0,1], with p(x) = 1 for 0 ≤ x ≤ 1 and 0 elsewhere
[Figure: pdf p(x) versus x]
Standard deviation:
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2\, p(x)\, dx = \int_{-\infty}^{\infty} x^2\, p(x)\, dx - \mu^2 = \int_{0}^{1} x^2\, dx - \left(\frac{1}{2}\right)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$$

Example
Generate a uniformly distributed random number between -4 and 4:
    X = rand(1e5,1);
    Y = 8*X - 4;
    [pdf,y] = my_pdf(Y,100);
    bar(y,pdf), xlabel('y'), ylabel('pdf')
[Figure: bar chart of pdf versus y, for y from -4 to 4]

Linear combination of random variables
• Let X and Y be random variables with means μX and μY and standard deviations σX and σY.
• Let a and b be two constants.
• Define a new random variable Z to be
$$Z = aX + bY$$
Then the mean and standard deviation of Z are
$$\mu_Z = a\,\mu_X + b\,\mu_Y, \qquad \sigma_Z^2 = (a\,\sigma_X)^2 + (b\,\sigma_Y)^2$$
(The variance formula assumes X and Y are independent, or at least uncorrelated.)

Summing uniformly distributed random variables
• Let X and Y be 2 uniformly distributed variables between [0,1] that are independent from each other.
    X = rand(1,1e5);
    Y = rand(1,1e5);
    Z = X+Y;
• The random variable Z = X + Y is not uniformly distributed.
However,
$$\mu_Z = \mu_X + \mu_Y = 0.5 + 0.5 = 1$$
$$\sigma_Z = \sqrt{\sigma_X^2 + \sigma_Y^2} = \sqrt{\frac{1}{12} + \frac{1}{12}} = \frac{1}{\sqrt{6}} \approx 0.41$$
    [pdf,z] = my_pdf(Z,100);
    bar(z,pdf)
    mean(Z)     % ans = 1.00
    std(Z)      % ans = 0.41
[Figure: pdf of Z versus x, for x from 0 to 2]

Summing a very large number of independent random variables
    Z = 2*rand(1e4,1)-1;              % uniform between [-1,1]
    N = 100;
    for i = 1:N
        Z = Z + (2*rand(1e4,1)-1);    % add another uniform [-1,1] sample
    end
    Z = Z/std(Z);
    [pdf,z] = my_pdf(Z,100);
    bar(z,pdf)
As N increases, the pdf of the resulting random variable Z gets closer to a Normal (Gaussian) pdf.
[Figure: bar chart of the pdf of Z, for z from -4 to 4]

Normal (Gaussian) pdf
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
μ: mean
σ: standard deviation
[Figure: normal pdf with μ = 0, σ = 1]

Normal or Gaussian pdf – X random variable
Probability interpretation:
    Pr(μ − σ ≤ X ≤ μ + σ) ≈ 68.3%
    Pr(μ − 2σ ≤ X ≤ μ + 2σ) ≈ 95.5%
    Pr(μ − 3σ ≤ X ≤ μ + 3σ) ≈ 99.7%

Properties of continuous random variables
• Given a random variable X with pdf p(x):
$$\Pr(X \le x) = \int_{-\infty}^{x} p(y)\, dy$$
More generally:
$$\Pr(a < X \le b) = \int_{a}^{b} p(x)\, dx = \Pr(X \le b) - \Pr(X \le a)$$

Uniformly distributed random variables
• Uniform random variable between [0,1]:
$$p(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
$$\Pr(0.25 < X \le 0.5) = \int_{0.25}^{0.5} dx = 0.25$$
[Figure: pdf p(x) versus x]

Gaussian distribution: Matlab - erf function
The error function erf(x) is twice the integral from 0 to x of the Gaussian distribution with 0 mean and 0.5 variance:
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-y^2}\, dy, \qquad x \ge 0$$
[Figure: Gaussian pdf with μ = 0, σ = 1/√2]
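The 68.3%, 95.5% and 99.7% figures quoted for the normal pdf can be checked numerically with erf. The sketch below relies on the identity Pr(|X − μ| ≤ kσ) = erf(k/√2), which follows from rescaling a Gaussian to variance 0.5; the variable names k and p are hypothetical.

```matlab
% Sketch: check the 1-, 2- and 3-sigma probabilities of a Gaussian with erf.
% Uses Pr(|X - mu| <= k*sigma) = erf(k/sqrt(2)), which does not depend on mu or sigma.
k = [1 2 3];
p = erf(k / sqrt(2));

% Print one line per k; fprintf walks through the matrix column by column.
fprintf('Pr(|X - mu| <= %d sigma) = %.2f%%\n', [k; 100*p]);
```

The printed values, about 68.27%, 95.45% and 99.73%, agree with the rounded percentages in the lecture.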
Matlab - erfc function
The complementary error function erfc(x) is
$$\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-y^2}\, dy$$
[Figure: Gaussian pdf with μ = 0, σ = 1/√2, x ≥ 0]

Matlab - erf function
The error function erf(x) is the integral from -x to x of the Gaussian distribution with 0 mean and 0.5 variance:
$$\Pr(-x < X \le x) = \mathrm{erf}(x), \qquad x \ge 0$$
[Figure: Gaussian pdf with μ = 0, σ = 1/√2; area between −x and x labeled erf(x)]

Matlab - erfc function
For the same distribution, the complementary error function erfc(x) is
$$\Pr(|X| \ge x) = \mathrm{erfc}(x) = 1 - \mathrm{erf}(x)$$

Matlab - erfc function
Gaussian distribution with 0 mean and 0.5 variance:
$$\Pr(X \ge x) = \frac{1}{2}\,\mathrm{erfc}(x), \qquad x \ge 0$$
[Figure: Gaussian pdf with μ = 0, σ = 1/√2; upper tail labeled erfc(x)/2]

Matlab - erf function
Gaussian distribution with 0 mean and 0.5 variance:
$$\Pr(X \le x) = \mathrm{erf}(x) + \frac{1}{2}\,\mathrm{erfc}(x) = \mathrm{erf}(x) + \frac{1}{2}\,\bigl(1 - \mathrm{erf}(x)\bigr) = \frac{1}{2}\,\bigl(1 + \mathrm{erf}(x)\bigr), \qquad x \ge 0$$

Scaling random variables
• Let X be a random variable with mean µ and standard deviation σ.
• If
$$Y = \frac{X - \mu}{\sigma\sqrt{2}}$$
• Then Y is a random variable with mean 0 and standard deviation 1/√2.

Matlab - erf function
For a Gaussian distribution with mean µ and standard deviation σ, and a ≥ 0:
$$\Pr(|X - \mu| \le a) = \mathrm{erf}\!\left(\frac{a}{\sigma\sqrt{2}}\right)$$
$$\Pr(|X - \mu| \ge a) = \mathrm{erfc}\!\left(\frac{a}{\sigma\sqrt{2}}\right)$$

Matlab - erf function
Given a Gaussian distribution with mean µ and standard deviation σ:
Assume that x ≥ µ. Then
$$\Pr(X \le x) = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$$
$$\Pr(X \ge x) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)$$
[Figures: Gaussian pdfs centered at μ; areas labeled erf(a/(σ√2)) for x = μ + a, a ≥ 0, and erf(|x − μ|/(σ√2)) for x ≥ μ]

Example - Height data for 100 men 20 years of age
[Figure: scaled frequency versus height (in.)]
Mean and standard deviation: μ = 69.3, σ = 1.96 in.
Assume the data is normally distributed.

Estimate the probability of a man being taller than 75 in.:
$$\Pr\{X \ge 75\} = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{75 - 69.3}{1.96\sqrt{2}}\right) = 0.0018$$

Estimate the probability of a man being within 3 in. of the mean:
$$\Pr\{|X - 69.3| \le 3\} = \mathrm{erf}\!\left(\frac{3}{1.96\sqrt{2}}\right) = 0.8741$$

Summary
• Histograms: absolute frequency, relative frequency, scaled frequency histogram
    hist(y), hist(y,n), hist(y,x), bar
• Probability: continuous scaled frequency histogram; theoretical probability density function (pdf)
• Normal Distribution: Gaussian function, error function
• Random Number Generation
    Uniformly distributed random numbers: rand
    Normally distributed random numbers: randn
...
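The two height estimates from the example can be reproduced directly with erfc and erf. This is a minimal sketch using the quoted μ = 69.3 and σ = 1.96 in.; the variable names p_taller and p_within are hypothetical.

```matlab
% Sketch: reproduce the two height-example probabilities.
% mu and sigma are the values quoted in the example; the data is assumed normal.
mu = 69.3;
sigma = 1.96;

% Pr{X >= 75} = (1/2) * erfc((75 - mu) / (sigma*sqrt(2)))
p_taller = 0.5 * erfc((75 - mu) / (sigma * sqrt(2)));

% Pr{|X - mu| <= 3} = erf(3 / (sigma*sqrt(2)))
p_within = erf(3 / (sigma * sqrt(2)));

fprintf('Pr{X >= 75}       = %.4f\n', p_taller);
fprintf('Pr{|X - mu| <= 3} = %.4f\n', p_within);
```

Both results should match the lecture's values of 0.0018 and 0.8741 to four decimal places.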