Review Notes highlighted

STAT 2120, Notes on Topic 1, Spring 2010

Introduction to Examining Distributions:
• A variable records characteristics of individuals (i.e., objects of interest) in its values.
• Classify a variable by its possible values:
  o Categorical: records group labels; numeric labels mean nothing, except possibly order.
  o Quantitative: records meaningful numbers; may be discrete or continuous.
• A time series is a record of values across time.
• A variable's distribution describes the counts or relative proportions of its values.
• Exploratory data analysis seeks to describe distributions and relationships in data.

Displaying a distribution with graphs:
• Bar graphs and pie charts describe the distribution of a categorical variable.
  o Bar graphs emphasize counts; pie charts, proportions.
  o A Pareto chart is a bar graph with categories ordered by decreasing frequency.
• Histograms are essentially bar graphs of a quantitative variable.
  o Bar widths are not absolute; use equal bar widths and "eyeball" for the best picture.
  o Look for overall pattern, shape, center, spread, deviations in shape, and "outlier" deviations.
  o A symmetric distribution is such that its histogram mirrors itself about its center.
  o A right- or left-skewed distribution shows a long tail to the right or left in its histogram.
• Stemplots are back-of-the-envelope histograms drawn with the digits of quantitative values.
  o "Stem" digits define bars; "leaf" digits display counts and sub-counts.
  o Customize by rounding digits and splitting stems.
• Time plots graph time series values by time.
  o Emphasize patterns of change over time, such as trends and seasonal variations.
  o Some time plots are seasonally adjusted.

Describing distributions with numbers:
• Denote by $x_1, x_2, \ldots, x_n$ the values of $n$ observations.
• The $p$th percentile is a number such that $p$ percent of values fall on or below it.
• Describe a distribution with numerical summaries of shape, center, and spread.
• A summary is resistant if it is insensitive to changes in skewness or extreme values.
• Measure of center: mean, $\bar{x}$
  o $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, the arithmetic average.
  o $\bar{x}$ is not resistant.
• Measure of center: median, $M$
  o $M$ is the 50th percentile.
  o Calculate as the middle value or the average of the two middle values.
  o $M$ is resistant.
• Measure of spread: extreme values
  o Smallest and largest values.
  o Extreme values are not resistant.
• Measure of spread: quartiles, $Q_1$ and $Q_3$
  o $Q_1$ is the 25th percentile; $Q_3$ is the 75th percentile.
  o Calculate $Q_1$ and $Q_3$ as medians of the values falling to the left or right of (but not on) $M$.
  o $Q_1$ and $Q_3$ are resistant.
• Measure of spread: standard deviation, $s$
  o $s = \sqrt{s^2}$, where $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, a rescaled average of squared deviations from $\bar{x}$.
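
The numerical summaries listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's own code: the function names and the data are hypothetical, the quartiles split the ordered values by position around the median, and the standard deviation assumes the usual sample form with divisor n - 1.

```python
import math


def sample_mean(xs):
    """Arithmetic average of the observations."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value, or the average of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def quartiles(xs):
    """Medians of the values left and right of (but not on) the median.

    Assumption: the split is by position in the ordered list, so for an
    odd count the middle observation is left out of both halves.
    """
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 == 1 else s[half:]
    return median(lower), median(upper)


def sample_std(xs):
    """Square root of the rescaled average of squared deviations.

    Assumption: the rescaling uses the divisor n - 1.
    """
    xbar = sample_mean(xs)
    s2 = sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(s2)


# Made-up data, then the same data with its largest value made extreme.
data = [2, 4, 4, 5, 7, 9, 11]
with_outlier = [2, 4, 4, 5, 7, 9, 110]

for label, xs in (("original", data), ("with outlier", with_outlier)):
    q1, q3 = quartiles(xs)
    print(label)
    print("  mean:", sample_mean(xs), " std dev:", sample_std(xs))
    print("  median:", median(xs), " Q1:", q1, " Q3:", q3)
```

Comparing the two printouts shows what resistance means in practice. Making the largest value extreme moves the mean and the standard deviation, while the median and both quartiles stay where they were.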

This note was uploaded on 05/03/2010 for the course STAT 212 taught by Professor Holt during the Spring '08 term at UVA.
