Stats 202 - Basics of the Course

Basics of the Course
The course is taught using PowerPoint. These lecture notes WILL change as the term progresses.

Special Slide Pictures

Data Notation
Denote data by x1, x2, x3, …, xn, where n is the number of data values we have, called the sample size. The collection x1, x2, x3, …, xn is called a dataset, whereas a particular value xi is called a datum, data value or observation.

Example (Spoof)
http://ca.youtube.com/watch?v=MQw12_kNAhU&feature=related

Example
Description: This data set gives the average heights and weights for American women aged 30–39.

Obs   height   weight
1     58       115
2     59       117
3     60       120
4     61       123
5     62       126

Data Types
NOTE: A quantitative variable can be made qualitative…we'll see in a second…

Examples with Clickers
1) Height
2) Grades
Clicker Responses: A) Qualitative B) Categorical C) Quantitative D) Both A and B

Quantitative Data
Examples with Clickers
1) Height
2) Number of Cats Owned by Canadians
Clicker Responses: A) Discrete B) Continuous C) Both D) Neither

Qualitative Data
Examples with Clickers
1) Body Size: Skinny, Normal, Obese
2) Type of Stone: Granite
Clicker Responses: A) Discrete B) Continuous C) Both D) Neither

Analysis
Raw data is hard to analyse. For example, consider the pH values below. Remember that a pH of 7 is neutral, a pH < 7 is acidic and a pH > 7 is basic. Are the 100 lakes below acidic, basic or neutral?

Data
5 6 8 6 7 8 8 7 8 6
8 6 7 7 5 5 7 9 8 7
6 8 8 8 5 6 8 8 5 7
6 7 8 8 4 8 6 6 6 6
6 7 8 8 5 7 7 5 7 8
7 8 6 6 7 6 8 8 6 8
8 7 7 8 7 6 10 6 8 6
6 8 6 6 7 8 7 6 8 7
6 7 7 7 6 5 7 8 7 7
7 6 6 8 7 7 7 7 8 7

Well???

Dataset Characteristics
3 Characteristics
1.
2.
3.

Analysis
Two Techniques:
1.
2.

Example Dataset
Consider the following data: 1,2,2,3,3,3,4,4,4,4,5,5,5,6,6,7
We can build a display simply by ticking off every time we see a number.

Center
Rough Definition: The middle of the data.
Pictorially -

Spread
Rough Definition: How separated our data values are.

Shape
The appearance of the data.
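The "ticking off" display from the Example Dataset slide can be sketched in a few lines of Python. This is only an illustration: the function names `tally` and `tally_lines` and the choice of `|` as the tick mark are assumptions made here, not part of the course notes.

```python
from collections import Counter

# Example dataset from the slide.
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]


def tally(values):
    """Count how many times each distinct value occurs, in sorted order."""
    counts = Counter(values)
    return {value: counts[value] for value in sorted(counts)}


def tally_lines(values, mark="|"):
    """Build one text line per distinct value, with one tick per occurrence."""
    return [f"{value}: {mark * count}" for value, count in tally(values).items()]


for line in tally_lines(data):
    print(line)
```

Each printed row shows a value followed by one tick per occurrence, so the longest row marks the most frequent value.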
Shape
The shape of a dataset can be determined numerically using measures such as kurtosis and skew, but we will not investigate these statistics in this course.

Center
There are 3 measures of center:
A)
B)
C)

Mode
The most popular value. Also the most useless statistic.
e.g. 1,1,1,2,20

Mean
You would call it an “average”.
Notation:
Data:
Mean:

Example
Consider the data: 1, 1, 1, 2, 20
The mean is:

Median
The middle value of the data.
Notation (Median):
Notation (Sorted Data):

Algorithm: Median
Given the data: x1, x2, x3, …, xn.
1. Sort the data from smallest to largest.
2. If n is odd, then take the middle value.
3. Else if n is even, take the average of the middle 2 values.

Example
1,1,20,2,1
1. Sort
2. n odd = middle; n even = average

Example
1,1,20,3,1,4,12,2
What is the median?
A) 1 B) 1.5 C) 2 D) 2.5 E) 3

Example
1. Sort: 1, 1, 1, 2, 3, 4, 12, 20
2. n odd = middle; n even = average (n = 8)
Q2 =

Outliers
Outliers are values that are more extreme than the others.
For example: 1, 2, 3, 4, 1000
For example: -0.8, -11, 0.1, -0.6, 1, 0.3, -0.9

Summary: 1,1,1,2,20
Mode 1
Mean 5
Median 1

Question
Why is the mean different from the median and mode?

Order Statistics
The median is called the “second quartile”. This implies there are “other” quartiles. A quartile derives its name from quarter, and each quartile divides the data into quarters.

Pictorially

In Words
25% of our data is below Q1, the first quartile.
50% of our data is below Q2, the second quartile.
75% of our data is below Q3, the third quartile.

Algorithm: Q1
1. Perform the Median Algorithm.
2. Remove all data values above the median.
3. Perform the Median Algorithm on the remaining data.
4. This is the middle of the lower half of the data, the first quartile.

Example
Given the data: -0.8, -11, 0.1, -0.6, 1, 0.3
1. Sort it: -11.0, -0.8, -0.6, 0.1, 0.3, 1.0
2.

RECALL
Dataset Characteristics
3 Characteristics
1. Center
2. Spread
3. Shape

Spread
There are several ways in which we can calculate spread:
1.
2.
3.
4.
5.
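The Median and Q1 algorithms listed earlier in these notes can be sketched in Python as follows. One point is an assumption made here, not something the notes state: the "lower half" is taken by position, and when n is odd the middle value is kept in that half. Other textbooks drop the middle value instead, which can give a slightly different Q1.

```python
def median(values):
    """Median Algorithm: sort, then take the middle value (n odd)
    or the average of the middle two values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def first_quartile(values):
    """Q1 Algorithm: the median of the lower half of the sorted data.

    Assumption: the lower half is chosen by position, and for odd n
    the middle value itself stays in the lower half.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: (n + 1) // 2]
    return median(lower_half)


print(median([1, 1, 20, 2, 1]))                        # 1
print(median([1, 1, 20, 3, 1, 4, 12, 2]))              # 2.5
print(first_quartile([-0.8, -11, 0.1, -0.6, 1, 0.3]))  # -0.8
```

For the six-value example the lower half is -11.0, -0.8, -0.6, so Q1 is its middle value. Because n is even there, the result does not depend on the odd-n assumption.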
Range
The range gives the distance between the largest and smallest values.
Formula in Words:
Formula with Notation:

Interquartile Range
The interquartile range gives the distance covered by the middle 50% of the data.
Formula:

Data: Which dataset has more spread?
Data 1: 1, 2, 3
Data 2: 1, 1, 1, 2, 3, 3, 3
Data 3: 100, 100.5, 101, 101.5, 102
A) 1 B) 2 C) 3 D) 1 = 2 E) none of the above

Range Calculation
Data 1: 1, 2, 3
Data 2: 1, 1, 1, 2, 3, 3, 3
Data 3: 100, 100.5, 101, 101.5, 102

IQR Calculation
Data 1: 1, 2, 3
Data 2: 1, 1, 1, 2, 3, 3, 3
Data 3: 100, 100.5, 101, 101.5, 102

Standard Deviation
In words: the standard deviation is approximately the average distance the data values are from the center.

Formulas
1. Not nice for calculation, but great for interpretation.
2. Useful for calculation but NOT interpretation.
3. Another one! Useful for calculation but NOT interpretation.

3 Formulae

Example
Consider the data 1, 2, 3. Calculate the st. dev.

Example
Given: Σ xi = 10 ; Σ xi² = 1500 (both sums over i = 1 to 10)
Calculate the standard deviation:

Interpretation

Deviation
Definition in words:
Definition numerically:

Standard Deviation
The standard deviation is approximately the average deviation. Why approximately????

Deviation Example
Consider the data: 1, 2, 3
Calculate the average deviation:

Clicker Question
Pick 3 numbers. Calculate the average deviation. The answer is:
A) 0 B) >0 C) <0 D) I just want the clicker mark. E) None of the above.

What's the problem!!!??? How do we correct it?????

Other Issues
But….
1. Square rooting doesn't undo squared terms!
Example: √(1² + 2² + 3²) ≠ √1² + √2² + √3²
2. Because of “1”, our value for s is too small, so we divide by n-1 instead of n.

n vs n-1

Degrees of Freedom
n-1 is called the degrees of freedom.
Another way of thinking about degrees of freedom: Suppose I gave you n data values at random.
They are “free” to be whatever I want them to be.

Degrees of Freedom Continued
Now, instead of n data values, I give you n-1 values + the average. Is that last data value, the nth, “free”?

Range Vs. Standard Deviation (Typical Plot)
[Plot labels: Standard Deviations; minimum, Center, maximum]
Maximum - Minimum = 6s
Which means.... s = Range/6
Note: Sometimes it is not 6 but 4 or another constant...this depends on the data.

Interpretation
For a set of data, the standard deviation is 5. Is this big, small or uncertain?
A) Big B) Small C) Uncertain

Interpretation and Units

Variance
The variance is merely the square of the standard deviation.
Notation:

Coefficient of Variation (CV)
Formula:
Interpretation/Use:

Example
The length of fish Riley catches
In m's on Monday: 1, 2, 3
In cm's on Tuesday: 100, 200, 300

Surface Investigation
Monday Tuesday
Which has the greatest spread?
A) Monday B) Tuesday C) Neither

Answer

Units
The standard deviation, mean, mode and median all have the same units as the data. The variance, which is equal to the standard deviation squared, has units squared.

Graphical Techniques
In addition to numeric techniques, we have graphical techniques that can be used to analyze data. These graphical techniques include boxplots, dot plots etc…

Example Dataset
Consider the following data: 1,2,2,3,3,3,4,4,4,4,5,5,5,6,6,7
We can build a display simply by ticking off every time we see a number.

Dotplots
A dot plot is similar to this tick mark game that we've played since we were children. Each data value is plotted and replaced by a point. Hence the data 1,2,3 would look like:
1 2 3

Dotplots with Repeats
For a single set of data we may be interested in the repeats. In such a case we may draw a dot for every repeat.
E.g. 1,1,2,3
1 2 3

Example: Soybean
What can you see with this plot??

Frequency Distribution Example
Example: Who is your favourite actor?
A) Brad Pitt B) This guy C) Angelina Jolie D) Her E) Someone else/don't want to answer

Frequency
We build bars which have a height equal to the frequency with which a response occurs.

Non-Categorical Data
If our data is not categorical, we first build intervals for the data. Intervals are created subjectively but should all be the same size. The x axis contains the intervals while the y axis is the frequency.

Example: Grades
What is your Calculus 1 grade?
A) 85% to 100% B) 70% to 85% C) 55% to 70% D) 40% to 55% E) Prefer not to say.

Intervals
These intervals are chosen subjectively. I could have chosen any set. I did try to choose them to make them all the same size.

Clicker Questions
The shape is: A) Bell B) Skewed left C) Skewed right D) Uniform (flat) E) None of the above

Clicker Questions
The center is: A) 576 B) 578 C) 579 D) 581 E) None of the above

Relative Frequency Example
We divide each frequency by n. The plot is otherwise the same.

Example
[Histogram: Depth of Lake Huron in Feet, 1875-1972. x axis: Lake Huron, 575 to 582. y axis: Density, 0.00 to 0.35]

Clicker Question
What is the proportion of times that Lake Huron was less than 578 feet deep?
A) 10% B) 12% C) 24% D) Not able to say.

Boxplots

Unmodified Boxplot
[Boxplot diagram labels: Min, Q1, Q2, Q3, Max; IQR = Q3 - Q1; Range = Max - Min]

Recall: Outliers
Outliers: Data values that are more extreme (larger or smaller) than the others.
E.g. 1,1,2,2,3,3,4,4,5,5,6,6,25

Finding Outliers
What is an outlier mathematically? Obviously from the data above the number 25 is suspect. An outlier is any value that is:
Less than the lower limit: LL = Q1 - 1.5(IQR)
Greater than the upper limit: UL = Q3 + 1.5(IQR)
Why 1.5 times??

Math to Prove 25 is an Outlier
1,1,2,2,3,3,4,4,5,5,6,6,25

Example Continued
1,1,2,2,3,3,4,4,5,5,6,6,25

Modified Boxplot
Unless stated otherwise I am asking about the modified boxplot!
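The lower and upper limits from the Finding Outliers slide can be checked with a short Python sketch. The helper names are hypothetical, and the quartile convention is an assumption made here: Q1 and Q3 are taken as the medians of the lower and upper halves by position, with the middle value kept in both halves when n is odd. A different quartile convention moves the limits slightly but still flags 25 for this dataset.

```python
def median(values):
    """Sort, then take the middle value or the average of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumption: halves are chosen by position, and for odd n the
    middle value is included in both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: (n + 1) // 2]
    upper = ordered[n // 2:]
    return median(lower), median(upper)


def outlier_limits(values):
    """LL = Q1 - 1.5(IQR) and UL = Q3 + 1.5(IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Values below the lower limit or above the upper limit."""
    ll, ul = outlier_limits(values)
    return [x for x in values if x < ll or x > ul]


data = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 25]
print(outlier_limits(data))  # (-2.5, 9.5)
print(find_outliers(data))   # [25]
```

Under this convention Q1 = 2 and Q3 = 5, so IQR = 3 and the limits are -2.5 and 9.5. Only 25 falls outside them.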
The difference between the unmodified and the modified boxplot: the upper whisker ends at either the maximum or the closest point below the UL, whichever is closer to the center. The lower whisker ends at either the minimum or the closest point above the LL, whichever is closer to the center.

Modified Boxplot
[Boxplot diagram labels: Q1, Q2, Q3, outlier; IQR = Q3 - Q1; Range = Max - Min]

Example
Using: 1,1,2,2,3,3,4,4,5,5,6,6,25

Boxplots and Shape
• The box (Q1 to Q3) gives a good indication of the shape of our data.
[Figure: boxplots A, B and C]
Boxplot A is: A) Symmetric (Bell) B) Skewed left C) Skewed right D) Uniform (flat) E) None of the above.

Boxplots and Shape
• The box (Q1 to Q3) gives a good indication of the shape of our data.
[Figure: boxplots A, B and C]
Boxplot B is: A) Symmetric (Bell) B) Skewed left C) Skewed right D) Uniform (flat) E) None of the above.

Stem And Leaf Plots

Loss of Information
Individual data values are lost when we draw a boxplot, histogram, dot plot etc… The Stem and Leaf plot attempts to counter this issue.

Example:
Problem: Measurements of the annual flow of the river Nile at Aswan 1871–1970.
Plan: Not relevant.
Data:
1120 1160  963 1210 1160 1160  813 1230 1370 1140
 995  935 1110  994 1020  960 1180  799  958 1140
1100 1210 1150 1250 1260 1220 1030 1100  774  840
 874  694  940  833  701  916  692 1020 1050  969
 831  726  456  824  702 1120 1100  832  764  821
 768  845  864  862  698  845  744  796 1040  759
 781  865  845  944  984  897  822 1010  771  676
 649  846  812  742  801 1040  860  874  848  890
 744  749  838 1050  918  986  797  923  975  815
1020  906  901 1170  912  746  919  718  714  740

Stem and Leaf Plot
The decimal point is 2 digit(s) to the right of the |

   4 | 6
   5 |
   6 | 5899
   7 | 000123444455667778
   8 | 000011222233344555556667779
   9 | 0011222244466678899
  10 | 0122234455
  11 | 00012244566678
  12 | 112356
  13 | 7

Stem and Leaf Plot
What do you notice????
The decimal point is 2 digit(s) to the right of the |

   4 | 6
   5 |
   6 | 5899
   7 | 000123444455667778
   8 | 000011222233344555556667779
   9 | 0011222244466678899
  10 | 0122234455
  11 | 00012244566678
  12 | 112356
  13 | 7

Parts
1) Legend: “The decimal point is 2 digit(s) to the right of the |”
   a) This tells me that the numbers are 4|6 = 460.
   b) If it had said “2 digit(s) to the LEFT of the |” then 4|6 = 0.046
2) Stem is the part to the left of “|”
3) Leaves are the parts to the right of the “|”
4) Each leaf represents a data value. Hence we have 6 data values starting with 12.

Example
Measurements of vein diameters were taken on 100 patients. The following stem and leaf plot was obtained.

Example Continued
The decimal point is 2 digit(s) to the left of the |

  32 | 78
  33 | 224
  33 | 5577777899
  34 | 0000011111233333444
  34 | 5566666678888888999
  35 | 0001111111122223344
  35 | 5555677788889999
  36 | 0112244
  36 | 56678

Based on the Legend 32|1 Means:
A) 321 B) 32.1 C) 3201 D) 3.21 E) None of the above

What do you notice that is interesting about the stems??? Why was this done??

Example:
Problem: Does the stress of machinery affect the ability of a soya plant to grow? Further, does the amount of light influence its ability to grow?
Plan: 52 seeds were potted with one seed per pot. The 52 seeds were randomly divided into 4 samples with 13 seeds per sample. The seeds in 2 samples were stressed by being shaken for 20 minutes daily, while the seeds in the other two were not shaken (no stress). The two samples that received the same exposure to stress were grown under different levels of light.
Thus the four samples of plants were allocated to one of 4 treatments that were defined by 2 basic treatments, stress and light.

Data:
ln    ly    mn    my
264   235   314   283
200   188   320   312
225   195   310   291
268   205   340   259
215   212   299   216
241   214   268   201
232   182   345   267
256   215   271   326
229   272   285   241
288   163   309   291
253   230   337   269
288   255   282   282
230   202   273   257

Analysis: Under which conditions would you want to grow your soybeans?
A) Moderate light, stress
B) Low light, stress
C) Moderate light, no stress
D) Low light, no stress

Example 2 - View Article
From: Medical Article http://www.amstat.org/publications/jse/v11n2/datasets.heinz.html
Problem: To investigate the human body.
Plan: Measure the items shown at left on males and females.
Data: Measurements of 247 men & 260 women
Analysis: See article on last slide.

Is it possible for the Biacromial Measurement of a particular female to exceed that of a particular male?
A) Yes B) No C) zzzzzzz

Probability
We can define probability in 3 ways:
Subjective
Relative frequency
Mathematical / classical

Subjective
Based on intuition we guess what the probability is.
i.e. There's a 99% chance I'll pass!

Subjective
Adv:
Disad:

Relative frequency
The probability of something happening is the number of times it occurs divided by the # of attempts.

e.g. Coins
Pretend everyone in class is using the same coin. Flip it. What did you get??
A) Heads B) Tails

Question
Will you write the quizzes more than once even if you got 100% on the first try?
A) Yes B) No

Relative Frequency
Adv:
Disad:

Classical
Experiment: A theoretically repeatable process or phenomenon
e.g.
Trial: One repetition of an experiment
e.g.

Classical ctd.
Outcome: The result of our experiment. Also called a “simple” event.
We use capital letters to denote outcomes, e.g. A

Classical continued
Compound Event: If an event A is made up of more than one “simple event”
e.g.

Classical Ctd
Universe or Sample Space: The collection of all outcomes of an experiment. We denote it by “S”.
e.g.
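The relative-frequency definition given earlier (number of times something occurs divided by the number of attempts) can be illustrated with a simulated coin. This is a sketch under stated assumptions: the coin is fair, the function names are hypothetical, and the fixed seed only serves to make the run repeatable.

```python
import random


def flip(rng):
    """One trial of the experiment: flip a coin (assumed fair)."""
    return rng.choice(["Heads", "Tails"])


def relative_frequency(event, trial, attempts, seed=0):
    """Number of trials in which the event occurs, divided by the attempts."""
    rng = random.Random(seed)
    occurrences = sum(1 for _ in range(attempts) if event(trial(rng)))
    return occurrences / attempts


estimate = relative_frequency(lambda outcome: outcome == "Heads", flip, attempts=10_000)
print(estimate)
```

With many attempts the estimate should settle near 0.5 for a fair coin. With only a handful of flips it can be far off.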
Review
An outcome might be A = roll a one.
An event might be “get an even #”, B = {2, 4, 6}.
The size of an event/sample space is the # of objects/simple events in it. We denote the size by |B|.
e.g. B = {2, 4, 6}, |B| = 3

Probability
Let E be an event containing |E| simple outcomes. Let S be the sample space with |S| simple outcomes. Then the probability E occurs is
Pr(E) = |E|/|S|

Example
1. What is the probability of getting a head on a coin?

e.g. A biologist classifies a colony of wild baboons by fur colour.
E = having light-coloured fur
Of 150 animals observed, 5 are light-coloured.
P(light-coloured fur) =

Example
In a genetic experiment brown rabbits are crossed with black rabbits. As a result, of the 44 progeny, 13 are brown and 5 are black. The remainder are mottled (various colours). What is the probability you select a mottled rabbit?

Properties of Probabilities
1.
2.
3.
4.

Properties of Probabilities
1) 0 ≤ P(E) ≤ 1
2) P(E) = 0 ≡ E never happens
3) P(E) = 1 ≡ E always happens
4) If E1, E2, …, Em represent all possible, mutually exclusive simple events, then P(E1) + P(E2) + … + P(Em) = 1

Leading Questions
What if…
we want to know the probability we select either a brown OR mottled rabbit?
we want to know the probability that in 2 tries we select a brown AND a mottled rabbit?

Symbol 1 “OR”
Notationally we write:
In words we mean:

Symbol 2 “AND”
Notationally we write:
In words we mean:

Symbol 3 “Not”
Notationally we write:
In words we mean:

Venn Diagrams
A Venn diagram is a pictorial representation of our probability.
The box is the sample space. e.g.
A circle within the box denotes a probability for an event. e.g.

Mutually Exclusive
Two events are mutually exclusive (ME) if they have no outcomes in common or cannot occur together.
e.g. ME Events:
e.g. Not ME Events:

Clicker ME
Is the event “Person wears glasses” mutually exclusive from the event “Person has freckles”?
A) Yes B) No C) Uncertain

Mutual Exclusion
Are the events A = Roll a one on dice 1; B = Roll a one on dice 2; mutually exclusive (ME)?
A) Yes B) No

Venn Diagram
In the following Venn diagram, the square represents the… (best answer)
A) Event B) Simple Event C) An Outcome D) Sample Space

ME and VENN Diagrams
If two events are ME or disjoint, the circles are also disjoint. e.g.
Hence Pr(A or B) = Pr(A) + Pr(B).
Or in terms of our notation:

ME and VENN Diagrams
If two events are not ME, they overlap: e.g.
Hence Pr(A or B) = Pr(A) + Pr(B) − Pr(A and B)

Proof by Picture:
Pr(A or B) = Pr(A) + Pr(B) − Pr(A and B)

Example
Problem: To investigate Seal pup fur colour.
Plan: Pups Categorized by Coat Colour and Sex
Data:

Colour        Male   Female   Total
Yellow         25      10       35
Thin White     10       5       15
Fat White      25       5       30
Grey           15       5       20
Total          75      25     N = 100

Notation
Let G denote Grey. Let Y denote Yellow. Let M denote Male. Let W denote White. Let T denote Thin.

Question 0
What is the probability a pup is not thin and white?

Question 1
What is the probability a coat is Yellow?
A) 25/100 B) 10/100 C) 35/100 D) 25/75

Details Details....

Question 2
What is the probability a coat is Yellow or Grey?
A) 25/100 B) 40/100 C) 55/100 D) 40/75 E) None of the Above

Details Details....

Question 3
What is the probability a randomly selected pup is yellow and male?
A) 85/100 B) 75/100 C) 35/100 D) 25/100

Details Details....
Question 4
Are the events yellow and male ME?
A) Yes B) No C) Can't say

Question 4 - Start
What about Yellow OR male??
What is the probability a randomly selected pup is yellow OR male?

Details, Details…
What about Yellow OR male?? W...
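The seal pup questions can be worked through with the counts from the table and the rule Pr(A or B) = Pr(A) + Pr(B) − Pr(A and B). The sketch below is one possible way to organise the arithmetic; the dictionary layout and function names are assumptions made for illustration, and `Fraction` prints results in lowest terms (for example 35/100 appears as 7/20).

```python
from fractions import Fraction

# Counts from the seal pup table: colour -> sex -> number of pups.
counts = {
    "Yellow": {"Male": 25, "Female": 10},
    "Thin White": {"Male": 10, "Female": 5},
    "Fat White": {"Male": 25, "Female": 5},
    "Grey": {"Male": 15, "Female": 5},
}
total = sum(sum(row.values()) for row in counts.values())  # N = 100


def pr_colour(colour):
    """Pr(colour) = row total / N."""
    return Fraction(sum(counts[colour].values()), total)


def pr_sex(sex):
    """Pr(sex) = column total / N."""
    return Fraction(sum(row[sex] for row in counts.values()), total)


def pr_colour_and_sex(colour, sex):
    """Pr(colour and sex) = single cell / N."""
    return Fraction(counts[colour][sex], total)


def pr_colour_or_sex(colour, sex):
    """Addition rule for events that are not mutually exclusive."""
    return pr_colour(colour) + pr_sex(sex) - pr_colour_and_sex(colour, sex)


print(pr_colour("Yellow"))                     # 7/20, i.e. 35/100
print(pr_colour("Yellow") + pr_colour("Grey"))  # 11/20, i.e. 55/100 (colours are ME)
print(pr_colour_and_sex("Yellow", "Male"))     # 1/4, i.e. 25/100
print(pr_colour_or_sex("Yellow", "Male"))      # 17/20, i.e. 85/100
```

Yellow and Grey can simply be added because a pup cannot be both colours. Yellow and Male overlap in 25 pups, so that cell is subtracted once to avoid counting it twice.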