Notes_1_-_3_with_comments

CHAPTER 1

Statisticians fall asleep faster by
Taking a random sample of sheep.

Chapter 1: Data and Distributions

Data

Example: Determine the nature of the "unusual episode" whose data appear below. You are permitted to ask me "yes/no" questions suggested by the data.

Data Set: Population at Risk and Death Rates for an Unusual Episode
[The article associated with this dataset appears in the Journal of Statistics Education, Volume 3, Number 3 (November 1995).]

Tables: Population at Risk, Deaths, and Death Rates for an "Unusual Episode"

By Economic Status and Sex

            Population Exposed       Number of              Deaths per 100
            to Risk                  Deaths                 Exposed to Risk
Economic    ------------------------------------------------------------------
Status      Male  Female  Both       Male  Female  Both     Male  Female  Both
I (high)     180    145    325        118      4    122       65      3     37
II           179    106    285        154     13    167       87     12     59
III          510    196    706        422    106    528       83     54     73
Other        862     23    885        670      3    673       78     13     76
Total       1731    470   2201       1364    126   1490       80     27     67

By Economic Status and Age

            Population Exposed       Number of              Deaths per 100
            to Risk                  Deaths                 Exposed to Risk
Economic    ------------------------------------------------------------------
Status      Adult  Child  Both       Adult  Child  Both     Adult  Child  Both
I (high)      319      6   325         122      0   122        38      0    37
II            261     24   285         167      0   167        64      0    59
III           627     79   706         476     52   528        76     66    73
Other         885      0   885         673      0   673        76      -    76
Total        2092    109  2201        1438     52  1490        69     48    67

Section 1.1: Populations, Samples, and Processes

Definition: Statistics is the discipline concerned with the optimal acquisition (garbage in = garbage out) and analysis of data in order to model a population or process.

"The long-range contribution of statistics depends not so much upon getting a lot of highly trained statisticians into industry as it does in creating a statistically minded generation of physicists, chemists, engineers, and others who will in any way have a hand in developing and directing the production processes of tomorrow."
- W. A. Shewhart & W. E. Deming

Definition: Target population - the set (actual or conceptual) of all entities of interest in a survey or study; the group about which the statistician wishes to draw conclusions.
- Different surveys have different target populations.
- The target population, including its time frame, must be clearly defined before a sample can be drawn from it.

Definition: Variable - a characteristic of interest (e.g., height, color, response) that varies across the entities in the population.

Definition: Observation - the value of a variable of interest for a given entity.

Definition: Census - an attempt to acquire data on every entity in the target population.

Example 1. When I go up for tenure, the promotion and tenure committee needs feedback about my teaching. For this study, a possible target population is:

Branches of Statistics: Descriptive and Inferential Statistics

Descriptive Statistics: Graphical and numerical techniques for describing or summarizing data that capture the essence of the data.
o Some numerical summaries for describing data include: mean, median, mode, maximum, standard deviation, variance, correlation coefficient (bivariate data)
o Some graphical methods for displaying data include: dotplot, stem-and-leaf diagram, histograms, time series plots (time-ordered data)

We can summarize the relevant information in a data set by determining what data are present and how often they occur. This representation is the distribution of the data.

Definition: The distribution of a data set provides the following information about a set of numbers:
1. the unique numbers which appear in the set, and
2. how often each number appears in the set.

You can read more about distributions in the optional notes entitled "Distributions and the IID Assumption" in the TMI folder.

Example 2. (Data taken from site http://www.shodor.org/interactivate/activities/boxplot/).
Only 1 variable of interest: Average gas mileage for year 2000 cars by size.

49, 49, 45, 45, 41, 38, 38, 38, 40, 37, 37, 34, 35, 36, 35, 38, 38, 32, 32, 32, 37, 31, 32, 31, 32, 30, 30, 32, 30, 30, 29, 28, 29, 29, 29, 30, 28, 27, 29, 30, 28, 27, 28, 27, 27, 29, 29, 29, 26, 27, 25, 25, 25, 25, 25, 25, 25, 26, 26, 27

Frequency and Relative Frequency Distribution of the Data:

Definition: Inferential statistics are techniques used to make inferences about a population from a sample of that population.

Population versus Process Data

Definition: Population data: Data in which the order in which the observations were collected does NOT matter!

Definition: Process data: Data in which the order in which the observations were collected does matter! The order provides additional information that we lose by representing the data as a distribution.

Example 3. Population versus process data.
(a) The height of all students in this classroom.
(b) The total number of minutes we meet each day in class.
(c) The number of ounces of peanut butter taken from a jar off an assembly line every hour.
(d) The number of books each student in this class bought for this quarter's classes.
(e) A student's test scores throughout the quarter.

Univariate, Bivariate, and Multivariate data

Univariate data: Observations on a single variable

Example 4. Number of trick-or-treaters at my door at various times this past Halloween

Number of trick-or-treaters by 10-minute intervals:
6:00 to 6:10 -- 10
6:10 to 6:20 -- 15
6:20 to 6:30 -- 25
6:30 to 6:40 -- 25
6:40 to 6:50 -- 28
6:50 to 7:00 -- 30
7:00 to 7:10 -- 28
7:10 to 7:20 -- 21
7:20 to 7:30 -- 16
7:30 to 7:40 -- 15
7:40 to 7:50 -- 16
7:50 to 8:00 -- 11
8:00 to 8:10 -- 10
8:10 to 8:20 -- 5

Frequency distribution for the # of trick-or-treaters per 10 minutes:

Is this process data?

Bivariate data: Observations are made on two variables

Example 5.
For each trick-or-treater at my door on Halloween, I recorded their age and weight:

Age (years)   Weight (pounds)
 5             50
 6             39
18            138
 2             24
 4             40
68            110
28            156

Frequency distribution for the age of trick-or-treaters:

Multivariate data: Observations are made on more than two variables

Example 6. Collecting data on variables that affect the price of a house:

Number of bedrooms   Age of house (yrs)   Size of yard (acres)   Pool?
5                    15                   1.03                   No
4                    10                   0.33                   No
3                     2                   1.5                    Yes

Section 1.2: Visual Displays for Univariate Data

Read text pages: 8-16 (up to Histograms for Unequal Class Widths), and 17 (Histogram Shapes) - 18

Recall: two major branches of statistics
o Descriptive Statistics
   o Descriptive statistics describe the data in your population or sample.
   o The most common descriptive statistics provide information about a sample's central tendency (mean, median, mode) and variability (variance, standard deviation, range).
o Inferential Statistics
   o Inferential statistics are techniques for drawing reasonable inferences about populations based on samples from the population.

Types of variables
o Qualitative data or variables have only categories or qualities
   1. Car type: Saturn, Ford, Chevy, BMW...
   2. Customer satisfaction: Excellent, Good, Fair, Poor, Horrible
   3. Classification of a bolt meeting a length requirement: Acceptable or Not Acceptable
o Quantitative data or variables have numerical measures
   1. Daily temperature of my office: 68° F, 72° F, 80° F
   2. Time for Bounty to soak up a milk spill with a 1.5" radius: 30.0 sec, 15.3 sec, 13.2 sec
   3. The length of a bolt measured in centimeters: 12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6

Graphs for displaying QUALITATIVE DATA:
o Pie Chart, Bar Chart - pick up any issue of USA Today and you'll probably see at least one of these!

Example 1. Using a pie chart versus a bar chart to display qualitative data. Which do you prefer?
[Figures: pie chart and bar chart, "Number of emails in inbox on 11/8, 11/9, 11/10, 11/11, 11/12"; bar chart horizontal axis: Date]

Going off on a tangent for a minute: Anytime throughout the quarter, bring me or email me a bad graphic example for a bonus point!

[Figures: examples of bad graphics, including "JOB SECURITY" and "GNP change per person ($US) 1990-1997 / ODA change per person ($US) 1990-1997 for donor nations"]

Graphical methods that we will consider for displaying QUANTITATIVE DATA: dot diagrams, histograms, stem-and-leaf plots, box plots

The method used is determined by the type of data and the idea to be presented with the data.

o Dot Plots (or Dot Diagrams) represent each observation by a dot on a numerical axis

Example 2. 12 measurements on the strength of paper to be used in cardboard tubes (in pounds)

163 145 165 170 155 168 163 201 179 139

[Figure: Dotplot of Paper Strengths (in pounds); axis ticks at 140, 150, 160, 170, 180, 190, 200]

   o Dot plots make it easy to pick out outliers. An outlier is an unusual observation or extreme value, and it usually warrants further attention.
   o Used on "reasonably small" data sets

o Histograms: Unlike our text, we will not discuss histograms in two separate categories, discrete and continuous. We'll be following the "continuous method" for all histogram displays (as described below).
   o Frequency distributions condense data into a more manageable and readable format
   o Frequency Distribution - a table giving a count (frequency) of the number of values within a particular category or class interval.
   o Relative Frequency Distribution - a table giving the proportion (rather than the number) of values falling within a particular category or class interval.
   o Histogram - a graphical way to display a frequency (or relative frequency, cumulative frequency, cumulative relative frequency) distribution.
   o Contrary to what the text suggests (page 16), it's best to keep the class intervals the same width; otherwise, the graph is hard to read (in terms of summarizing data).
   o Be careful of using too many or too few cells: number of bins $\approx \sqrt{n}$, where n is the number of data values.

Example 3. Age at which the U.S. presidents began their first terms. Data taken from http://home.comcast.net/~sharenday7/Presidents/APOO.htm

57 61 57 57 58 57 61 54 68 51 49 64 50 48 65 52 56 46 54 49 51 47 55 55 54 42 51 56 55 51 54 51 60 62 43 55 56 61 52 69 64 46 54

Note: Clearly define your bin "boundaries" so your reader knows which bins contain the borderline values. Label your bins clearly whenever there could be any question; don't leave your reader to decipher your binning scheme!

Frequency Table of Ages. Let x represent the age of the president at the beginning of his term.

Age of Presidents at       Frequency   Relative    Cumulative   Cumulative
beginning of their                     Frequency   Frequency    Relative
first terms                                                     Frequency
-------------------------------------------------------------------------

[Figure: Frequency Histogram, Presidents 1st day of term age]

Relative Frequency Histogram in Minitab (gives percents, not proportions)

[Figure: Histogram of Presidents 1st day of term age; horizontal axis ticks at 42, 46, 50, 54, 58, 62, 66, 70]

Cool Graphic: Histogram of heights constructed using the people. Photograph by Peter Morenus in conjunction with Professor Linda Strausberg, University of Connecticut. Subjects are University of Connecticut genetics students, females in white tops, males in dark tops.
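The frequency table for Example 3 can be tabulated with a short script. This is only a sketch: it assumes bins of width 4 starting at 42 (taken from the tick marks on the Minitab histogram) with each bin including its left boundary and excluding its right one; the function name `frequency_table` is made up for illustration.

```python
import math

# Ages of U.S. presidents at the start of their first terms (Example 3 data).
ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
        52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
        54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54]


def frequency_table(data, start, width, n_bins):
    """Tabulate bins [low, high): frequency, relative frequency,
    cumulative frequency, and cumulative relative frequency."""
    n = len(data)
    rows = []
    cumulative = 0
    for i in range(n_bins):
        low = start + i * width
        high = low + width
        freq = sum(1 for x in data if low <= x < high)
        cumulative += freq
        rows.append((low, high, freq, freq / n, cumulative, cumulative / n))
    return rows


# Rule of thumb from the notes: number of bins is roughly sqrt(n).
suggested_bins = round(math.sqrt(len(ages)))

# Assumed binning: width 4 starting at 42, as on the histogram axis.
rows = frequency_table(ages, start=42, width=4, n_bins=7)

print(f"n = {len(ages)}, suggested number of bins = {suggested_bins}")
print("Bin            Freq  RelFreq  CumFreq  CumRelFreq")
for low, high, freq, rel, cum, cum_rel in rows:
    print(f"{low} <= x < {high}  {freq:4d}  {rel:7.3f}  {cum:7d}  {cum_rel:10.3f}")
```

Writing each bin label as `low <= x < high` follows the note above about stating clearly which bin holds a borderline value such as 46 or 54.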
Shapes of Histograms

[Figures: histograms illustrating three shapes: Symmetric, Unimodal; Negatively Skewed; Positively Skewed]

o Stem-and-Leaf Display
   o Effective display of *large* data sets
   o Each data value has 2 parts:
      > Stem: one or more of the leading digits
      > Leaf: remaining digits after the stem value
   o Possible Problems: too few stems or too many stems
   o Information a stem-and-leaf diagram conveys:
      > identification of a typical value
      > extent of spread about the typical value
      > presence of gaps in data
      > extent of symmetry in the distribution of values
      > number and locations of peaks
      > presence of outlying data values

Steps for constructing a stem-and-leaf diagram by hand:
(1) Select one or more digits for the stem values. The trailing digits become the leaves.
(2) List possible stem values in a vertical column.
(3) Record the leaf for every observation beside the corresponding stem value.
(4) Indicate the units for stems and leaves someplace in the display!

Example 4. Stem-and-leaf diagram for age of President at beginning of first term. See Example 3 for data.

Stem-and-Leaf Display: Presidents 1st day of term age

Stem-and-leaf of Presidents 1st day of term age  N = 43
Leaf Unit = 1.0

   2   4  23
   2   4
   5   4  667
   8   4  899
  14   5  011111
  16   5  22
  (9)  5  444445555
  18   5  6667777
  11   5  8
  10   6  0111
   6   6  2
   5   6  445
  HI 68, 69

Here's another view of the same data when the display is incremented by 10's instead of 2's:

Stem-and-Leaf Display: Presidents 1st day of term age

Stem-and-leaf of Presidents 1st day of term age  N = 43
Leaf Unit = 1.0

   8   4  23667899
  (25) 5  0111112244444555566677778
  10   6  01112445
  HI 68, 69

Section 1.3: Describing Distributions

By its definition, statistics is concerned with the acquisition and analysis of data. Often, the data of interest are numbers or measurements obtained from observation of the units in a population or process.
More formally, data are facts that represent particular characteristics of the units. The characteristics themselves are variables.

e.g. The time that the sun rises each morning at the Terre Haute International Airport is a variable. The fact that the sun rose yesterday morning at 7:47 a.m. is data.

Recall: There are two types of variables:

1. Categorical (Qualitative) Variables - variables whose values are categories.
   e.g. gender, favorite soda, state of residence, etc.
   In this class, we do not do much with categorical variables, but there is an entire branch of statistics dealing with categorical data analysis (if you're interested).

2. Quantitative Variables - variables which are numeric by nature.
   e.g. income, weight, height, time required to complete a task, etc.

Two types of quantitative variables: Discrete and Continuous (see page 11 of our text)

The support of a variable is the set of possible values that the variable can assume. The support of a variable is either:

(1) Discrete: the support is discrete if its set of possible values is either finite or countably infinite (e.g., 3, 4, 5, ...). Variables with a discrete support are discrete variables.

Examples:
o The number of siblings you have
o The number of books you purchased for this quarter
o The number of tails obtained when you flip a coin twice
o The number of non-smooth Lego side tosses out of 100
o The number of matching lottery picks
o The number of rolls of a die before a "6" appears
o The number of questions correct on the first test
o The number of phone calls you received last week

(2) Continuous: the support is continuous if its set of possible values consists of an entire interval of real numbers. These variables are continuous variables.

Text: A continuous variable is one whose value is determined by making a measurement of some sort.
Examples:
o The time we spend in class today is any real number in the interval 0 to 55 minutes
o The temperature in my office each morning at 7:30 a.m.
o The weight of your grandmother
o The miles per gallon achieved by your car
o The time on the phone during a call to your mother
o The waiting time in Subway's line for lunch
o The lifetime of the battery in your computer

Example 1. I surveyed 200 Terre Haute residents this morning as to how many hours they slept last night. Suppose I construct a histogram of the sleep hours in which:
o for each rectangle, area = relative frequency of the interval
o total area of all rectangles = 1.

With a large amount of data, we can envision a smooth curve being a model for the relative frequency histogram.

[Figure: relative frequency histogram with a smooth density curve; horizontal axis: sleep hours last night for 200 TH residents, ticks at 2, 4, 6, 8, 10, 12]

Using the histogram above, determine the approximate proportion of TH residents that:
(a) slept less than 4 hours last night.
(b) slept at least 10 hours last night.

The density function that "best" fits the data above is

$f(x) = \frac{1}{2\sqrt{2\pi}} e^{-(x-6)^2/8}$ for all real values of x.

We can now calculate the proportion of values with the function f(x). To calculate the proportion of TH residents that slept less than 4 hours, we just need to determine the area under the curve f(x) for x's from 0 to 4. Similarly for residents that slept at least 10 hours.

$\int_0^4 f(x)\,dx \approx 0.1573$ (via Maple).

$\int_{10}^{\infty} f(x)\,dx \approx 0.0228$ (via Maple).

Definition: A density function f(x) is used to describe (at least approximately) the population or process distribution of a continuous variable x. The graph of f(x) is called the density curve and must satisfy the following properties:
(1) $f(x) \ge 0$ for all x,
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (that is, the area under the density curve is 1),
(3) The proportion of x values between the values a and b is $\int_a^b f(x)\,dx$.

FACT: Since there is no area under a density curve at a single point:
o the proportion of values between a and a is 0.
o "the proportion of x satisfying a ≤ x ≤ b" = "the proportion of x satisfying a < x < b"

Example 2. Suppose I take a bus to RHIT every day and a bus arrives at my bus stop every 5 minutes. Because I don't always leave my house at exactly the same time, I don't always arrive at the bus stop at the same time. Let x be my waiting time (in minutes) at the bus stop. Then x is a continuous variable with support 0 ≤ x ≤ 5. One possible density curve that I can use to model x is:

$f(x) = \begin{cases} 1/5 & 0 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$

We can graph f(x). Is f(x) a legitimate density function?

What proportion of the time will I have to wait between 2 and 4 minutes for the bus?

$P(2 < x < 4) = \int_2^4 \frac{1}{5}\,dx = \frac{2}{5}$

What proportion of the time will I have to wait at most 1 minute?

$P(x \le 1) = \int_0^1 \frac{1}{5}\,dx = \frac{1}{5}$

Example 3. "Time headway" in traffic flow is the elapsed time between the instant that one car finishes passing a fixed point and the instant that the next car begins to pass that point. The following density function f(x) is essentially the time headway (in seconds) for two randomly chosen consecutive cars on a freeway during a period of heavy flow, as suggested in "The Statistical Properties of Freeway Traffic" (Transportation Research, Volume 11: 221-228):

$f(x) = \begin{cases} 0.15\,e^{-0.15(x-0.5)} & x \ge 0.5 \\ 0 & \text{otherwise} \end{cases}$

[Figure: graph of f(x); horizontal axis x from 2 to 14, vertical axis from 0.02 to 0.14]

Is f(x) a legitimate density function? YES! Why?
(1) $f(x) \ge 0$ for all real numbers x
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$:

$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0.5}^{\infty} 0.15\,e^{-0.15x}\,e^{0.075}\,dx = 0.15\,e^{0.075} \int_{0.5}^{\infty} e^{-0.15x}\,dx$

$= \lim_{B \to \infty} \left( -e^{0.075}\,e^{-0.15x} \Big|_{0.5}^{B} \right) = \lim_{B \to \infty} \left( -e^{0.075}\,e^{-0.15B} + e^{0.075}\,e^{-0.075} \right) = 1.$

Determine the proportion of cars in which the headway time between them and the following car is at most five seconds.

Proportion of headway times between 0.5 and 5 seconds $= 0.15 \int_{0.5}^{5} e^{-0.15x}\,e^{0.075}\,dx \approx 0.4908$

Determine the proportion of cars in which the headway time between them and the following car is exactly 2 seconds.

$P(x = 2) = \int_2^2 f(x)\,dx = 0$

One special continuous distribution: the exponential distribution (text page 29)

Definition: A continuous random variable x is said to have an exponential distribution with parameter $\lambda > 0$ if its density function is:

$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$

Example 4. Suppose the length of a phone call (in minutes) x has an exponential distribution with parameter $\lambda = 1/10$.

(a) Determine the density curve and graph it.

$f(x) = \frac{1}{10} e^{-x/10}$ for $x \ge 0$

(b) What proportion of phone calls made in the U.S....
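The density-function properties used in Examples 3 and 4 can be checked numerically. The sketch below is not the Maple computation from the notes: it approximates areas with a simple midpoint rule, and the helper names (`area_under`, `headway_density`, `exponential_density`) are made up for illustration. The upper limit of 200 stands in for infinity when checking total area, which is an assumption that works here because the remaining tail area is negligible.

```python
import math


def area_under(f, a, b, steps=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def headway_density(x):
    """Example 3 density: 0.15 * exp(-0.15 * (x - 0.5)) for x >= 0.5."""
    return 0.15 * math.exp(-0.15 * (x - 0.5)) if x >= 0.5 else 0.0


def exponential_density(lam):
    """Return the exponential density with parameter lam > 0."""
    def f(x):
        return lam * math.exp(-lam * x) if x >= 0 else 0.0
    return f


# Example 3: proportion of headway times of at most 5 seconds.
at_most_five = area_under(headway_density, 0.5, 5)

# No area sits over a single point, so "exactly 2 seconds" has proportion 0.
exactly_two = area_under(headway_density, 2, 2)

# Example 4(a): density with lambda = 1/10; total area should be about 1.
phone_density = exponential_density(1 / 10)
total_area = area_under(phone_density, 0, 200)

print(f"headway <= 5 seconds: {at_most_five:.4f}")
print(f"headway == 2 seconds: {exactly_two:.4f}")
print(f"total area under phone-call density: {total_area:.6f}")
```

The first value should agree with the 0.4908 obtained by hand above, which makes a quick check on the integration.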