Background, Definitions, and Graphical Displays

Motivation: Why analyze data?

  Clinical trials/drug development: compare existing treatments with new methods
  Agriculture: enhance crop yields, improve pest resistance
  Ecology: study how ecosystems develop and respond to environmental impacts
  Lab studies: learn more about biological tissue/cellular activity

Statistics is the science of collecting, summarizing, analyzing, and interpreting data. Our goal is to understand the underlying biological phenomena that generate the data.

Random Variables

Data are generated by some underlying random process or phenomenon. Any datum (data point) represents the outcome of a random variable. We represent random variables with capital letters, usually X, Y, and Z.

Example: Let X = weight of a newborn baby. Let Y = weight of the baby at one week old.

Types of Random Variables

  Qualitative
    Categorical (or nominal)
    Ordinal
  Quantitative
    Discrete
    Continuous

Definition: A sample is a collection of "subjects" upon which we measure one or more variables.

Definition: The sample size is the number of subjects in a sample. The sample size in a study is (almost always) denoted by the lower case letter "n".

Definition: The observational unit is the type of subject being sampled. Observational units could be a baby, a moth, a Petri dish, etc.

Definition: An observation is a recorded outcome of a variable from a random sample. We represent observations with lower case letters. For example, suppose we measure the random variable X = weight of a newborn baby on a sample of 10 babies. Our observations would be denoted by x1, x2, ..., x10.

Notation: x1 is the first observation.

Example: For the following setting, identify the (i) variable(s) in the study, (ii) type of variable, (iii) observational unit, and (iv) sample size.

(From Exercise 2.5) In a study of schizophrenia, researchers measured the activity of the enzyme monoamine oxidase (MAO) in the blood platelets of 18 patients.
The results were recorded as nmols benzylaldehyde product per 10^8 platelets.

Definition: A frequency distribution is a summary display of the frequencies of occurrence of each value in a sample.

Definition: A relative frequency is a raw frequency divided by n (the sample size).

Example 2.4: Y = number of piglets surviving 21 days (litter size at 21 days). What is the sample size?

Graphical Displays

After we collect data, we hope that they tell us something. We look at them in order to learn something about the process that generated them. One way to summarize (or describe) the data is through a graphical display.

A graphical display should always be as clear as possible. It should be well labeled, with a title, a key (if necessary), and labels on the display itself; the units and the sample size should be clear from the display. Do not overlabel your graphical display!

A dot plot is a graphical display where dots indicate observed data in a sample.

Figure 2.4 Surviving Piglets at 21 Days
(n = 36 sows)

A histogram is a graphical display where bars (or bins) replace the dots from a dot plot.

Figure 2.5 Surviving Piglets at 21 Days (n = 36 sows)

Warning: A histogram can be a very misleading display. We'll see an example of this in a minute.

A stemplot, or stem-and-leaf plot, is a lot like a dot plot, usually turned on its side. We use a stemplot when we have more detailed information with which to replace the dots. The "stems" are the leading digits of the data values and the "leaves" are the final digits. We'll put the "leaves" in numerical order in this class; the resulting plot is called an ordered stemplot. We'll never use the unordered kind, so we'll refer to ours simply as a stemplot. A stemplot should always include a key with units!

Example: Exercise 2.5 (modified)

In a study of schizophrenia, researchers measured the activity of the enzyme monoamine oxidase (MAO) in the blood platelets of 18 patients. The results were recorded as nmols benzylaldehyde product per 10^8 platelets.

   4.1   5.2   6.8   7.3   7.4   7.8
   7.8   8.4   8.7   9.7   9.9  10.6
  10.7  11.9  12.7  14.2  14.5  18.8

Create a stemplot for these data.

Describing the "shape" of a frequency distribution

We can see the shape of a frequency distribution by looking at an appropriate graphical display. The following are some basic words we use to describe the shape of a frequency distribution:

  symmetric / asymmetric
  bell-shaped / skewed left / skewed right
  unimodal / bimodal

Some examples...

Example (From Exercise 2.13)

Trypanosomes are parasites which cause disease in humans and animals. In an early study of trypanosome morphology, researchers measured the lengths (µm) of 500 individual trypanosomes taken from the blood of a rat. The results are summarized in the accompanying frequency distribution.
The following is the default histogram returned by a statistical software package (not well labeled yet!).

Describe the shape of the frequency distribution.

This next histogram is returned by the same software package for the trypanosome data, with the bin width changed.

How would you describe the shape of the distribution now? Discuss the changes.
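To see how much the bin width alone can change a histogram, here is a minimal Python sketch. The trypanosome measurements are not reproduced in these notes, so the sketch uses the 18 MAO values from the stemplot example instead. The helper names bin_counts and text_histogram, the bin widths (2 and 5), and the starting value (4) are illustrative assumptions, not choices made by any particular software package.

```python
import math

# MAO activity values for the 18 patients, copied from the stemplot example.
mao = [4.1, 5.2, 6.8, 7.3, 7.4, 7.8, 7.8, 8.4, 8.7, 9.7, 9.9,
       10.6, 10.7, 11.9, 12.7, 14.2, 14.5, 18.8]


def bin_counts(data, width, start):
    """Count observations in the bins [start, start + width), [start + width, ...), ...

    Assumes every value in data is >= start (hypothetical helper for illustration).
    """
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return counts


def text_histogram(data, width, start):
    """Return one line per bin, with one '*' for each observation in that bin."""
    lines = []
    for i, count in enumerate(bin_counts(data, width, start)):
        low = start + i * width
        lines.append(f"[{low:>4.1f}, {low + width:>4.1f})  {'*' * count}")
    return lines


# Same data, same starting point, two different bin widths.
narrow = bin_counts(mao, width=2, start=4)
wide = bin_counts(mao, width=5, start=4)

for width in (2, 5):
    print(f"bin width = {width}, n = {len(mao)}")
    print("\n".join(text_histogram(mao, width, start=4)))
    print()
```

With a bin width of 2, the display shows an empty bin followed by a single isolated observation at the high end. With a bin width of 5, the same data collapse into three steadily shrinking bars and the gap is no longer visible. The data have not changed; only the bin width has.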
- Fall '09