
CHAPTER 2 Organizing Data

A long list of data is not organized.

Raw Data – numbers and category labels that have been collected but have not yet been processed
Variable – a characteristic that can vary from one individual to another
Observational unit / observation – a single individual who participates in a study
Sample size – the total number of observational units
Dataset – the complete set of raw data for all observational units and variables in a survey or experiment

Sample vs. Population

We generally want to describe a population using statistics, but it is unrealistic to measure variables on every observational unit in the population. A subset of the population from which we can gather information is called a sample.

Consider the following hypothesis of interest:

Smoking, weight and parent’s disease status are associated with heart disease in people 18 years of age and older.

Population: Persons 18 years of age and older with heart disease
Sample: 50 people 18+ years old with heart disease
Sample size: n = 50
Variables of interest: smoking, weight, parent’s disease status

Parameter vs. Statistic

Values from the entire population are called parameters (both begin with p).
Values from the sample are called statistics (both begin with s).

Types of Variables

Categorical – group or category names that don’t necessarily have any logical ordering
  Color of M&M’s
  Gender
  Stat 200 Section

Ordinal – categorical variable where values or categories have a natural ordering
  Rate the roller coaster on a scale of 1-5 (1 is terrible and 5 is excellent)
  Age groups (child, teen, adult, senior citizen)
  Shirt sizes (S, M, L, XL)

Quantitative – numerical values taken on each individual
  Height
  Temperature
  # of Red M&M’s

Possible Roles Played by Variables:

Response Variables – the variables whose outcome we want to determine. These are the variables of main interest.
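
The vocabulary above (observational unit, variable, sample size, and the three variable types) can be made concrete with a tiny dataset. The following is only a minimal sketch: the `Person` record, its field names, and every value in it are invented for illustration and do not come from any real study.

```python
from collections import Counter
from dataclasses import dataclass, fields


@dataclass
class Person:
    """One observational unit (hypothetical record for illustration)."""
    smoker: str       # categorical: "yes" / "no", no logical ordering
    shirt_size: str   # ordinal: S < M < L < XL
    height_cm: float  # quantitative: a numerical measurement


# The dataset: raw data for all observational units and variables.
# All values are made up.
dataset = [
    Person("yes", "M", 170.2),
    Person("no", "S", 158.0),
    Person("no", "XL", 185.5),
    Person("yes", "L", 177.3),
]

# Sample size n = number of observational units.
n = len(dataset)

# The variables are the characteristics recorded for each unit.
variables = [f.name for f in fields(Person)]

# Categorical variable: counting category labels is meaningful,
# but sorting or averaging them is not.
smoker_counts = Counter(p.smoker for p in dataset)

# Ordinal variable: categories can be sorted by their natural ordering.
SIZE_ORDER = ["S", "M", "L", "XL"]
sizes_sorted = sorted((p.shirt_size for p in dataset), key=SIZE_ORDER.index)

# Quantitative variable: arithmetic is meaningful. Because this mean is
# computed from a sample, it is a statistic, not a parameter.
mean_height = sum(p.height_cm for p in dataset) / n

print(n, variables)
print(smoker_counts, sizes_sorted, round(mean_height, 2))
```

The point of the sketch is that the variable type determines which summaries make sense: counts for categorical variables, ordering for ordinal variables, and arithmetic such as a mean for quantitative variables.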

