
Chapter 2: Exploring Data with Graphs and Numerical Summaries
Read Chapters 1-2 and start Homework 1.

Descriptive Statistics
- Goal: To understand your data!
  - Summary and description
  - Find peculiarities
- Descriptive statistics are a vital precursor to more sophisticated, model-based inferential techniques.
Survivorship on the Titanic
Goal: To describe patterns of survival among passengers on the Titanic.

The Data

Variable   Description
age        Age in years
gender     m = male, f = female
class      1 = first, 2 = second, 3 = third
survived   N = no, Y = yes

n = 1046
Data Types
- Qualitative (categorical) variables have values that differ in kind but not in degree; they are not measurements.
  - Survived = Y/N
  - Passenger class = 1st, 2nd, or 3rd
  - Gender = M/F
- Quantitative variables have actual units of measure, so arithmetic operations on them are meaningful.
  - Age = passenger’s age in years
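The distinction can also be shown in code. Below is a minimal Python sketch. The course itself uses StatCrunch, so Python and the toy values here are assumptions made for illustration; they are not actual Titanic records. Averaging ages gives a meaningful quantity in years. Passenger class is different: although it is coded 1, 2, 3, it is kept as text labels, because arithmetic on those codes would not measure anything.

```python
from statistics import mean

# Toy values invented for illustration only (NOT actual Titanic records).
ages = [29.0, 2.0, 30.0, 25.0]            # quantitative: units are years
passenger_class = ["1", "1", "3", "2"]    # qualitative: codes kept as labels

# Arithmetic on a quantitative variable is meaningful: the mean age in years.
print(mean(ages))  # 21.5

# For a qualitative variable we only ask which categories occur.
# (Averaging the codes 1, 2, 3 would produce a number with no real meaning.)
print(sorted(set(passenger_class)))  # ['1', '2', '3']
```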

Class Question #1:
- Identify the variable type as either categorical or quantitative:
  1. Number of siblings in a family
  2. County of residence
  3. Distance (in miles) of commute to school
  4. Marital status
Class Question #2:
- Identify each of the following variables as continuous or discrete:
  1. Length of time to take a test
  2. Number of people waiting in line
  3. Number of speeding tickets received last year
  4. Your dog’s weight

Data Structure
Rectangular arrays
- Rows – observations or subjects
- Columns – variables
Variable – a characteristic recorded about each individual or observation. Its value is expected to vary from individual to individual.
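A rectangular array maps naturally onto a list of records: one record per row (observation) and one field per column (variable). The Python sketch below shows one possible representation. It assumes a hypothetical `Passenger` record holding the four variables from the Titanic data. The class field is named `pclass` because `class` is a reserved word in Python, and the rows are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Passenger:
    """One row (observation) of the rectangular array; each field is a variable."""
    age: float     # quantitative, in years
    gender: str    # "m" or "f"
    pclass: int    # 1, 2, or 3: a category label, even though stored as a number
    survived: str  # "N" or "Y"

# Hypothetical rows invented for illustration, not taken from the Titanic data.
rows = [
    Passenger(age=29.0, gender="f", pclass=1, survived="Y"),
    Passenger(age=2.0, gender="f", pclass=1, survived="N"),
    Passenger(age=30.0, gender="m", pclass=3, survived="N"),
]

def column(rows, name):
    """Return one variable (a column) as a list, one value per observation."""
    return [getattr(r, name) for r in rows]

print(len(rows))                 # number of observations (rows): 3
print(column(rows, "survived"))  # ['Y', 'N', 'N']
```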
The Titanic Data in StatCrunch
This is a portion of the data set.

Graphical Methods
- Univariate methods:
  - Categorical variables: bar chart, pie chart
  - Quantitative variables: histogram, stem-and-leaf plot, box plot, normal quantile plot (also called a normal probability plot)
- The goals:
  - Getting to know the data
  - Who were the Titanic passengers?
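Take the histogram as one example of a quantitative display. It groups values into equal-width intervals and counts how many values fall in each. The Python sketch below shows that counting step and prints a text histogram. The function names, the 10-year bin width, and the ages are all hypothetical choices for illustration. Each interval includes its lower endpoint but not its upper one, and the input list must be non-empty.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values in each interval [lo, lo + bin_width), including empty intervals."""
    # Index of the interval each value falls into (e.g. 29.0 // 10 -> bin 2, i.e. [20, 30)).
    counts = Counter(int(v // bin_width) for v in values)
    first, last = min(counts), max(counts)
    # Counter returns 0 for missing keys, so empty intervals appear as gaps.
    return {
        (k * bin_width, (k + 1) * bin_width): counts[k]
        for k in range(first, last + 1)
    }

def print_histogram(values, bin_width):
    """Print one row per interval; the number of '*' is the count."""
    for (lo, hi), n in histogram_counts(values, bin_width).items():
        print(f"[{lo:>3}, {hi:>3})  {'*' * n}")

# Hypothetical ages in years, invented for illustration.
ages = [29.0, 2.0, 30.0, 25.0, 0.9, 48.0, 63.0, 39.0]
print_histogram(ages, 10)
```

Keeping the empty intervals matters. A histogram shows gaps in the data, and silently dropping empty bins would hide them.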
Bar plots: Categorical Variables
Bar plots display the number of observations, or the percentage of all observations, falling into each displayed category.
In StatCrunch: Graphics > Bar plot > with data > frequency (option)
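Behind a frequency bar plot is a simple count of observations per category. The Python sketch below builds that frequency table and draws one text bar per category. It is not how StatCrunch implements its bar plot. The function names and the passenger-class values are hypothetical, invented for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: count}, with categories in sorted order."""
    return dict(sorted(Counter(values).items()))

def text_bar_plot(values):
    """Print one bar per category; the bar length equals the category's count."""
    for category, count in frequency_table(values).items():
        print(f"{category!s:>3} | {'#' * count} ({count})")

# Hypothetical passenger classes, invented for illustration.
pclass = [3, 1, 3, 2, 3, 1, 3, 2, 3]
text_bar_plot(pclass)
```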

Proportion & Percentage (Relative Frequencies)
- The proportion of the observations that fall in a certain category is the frequency (count) of observations in that category divided by the total number of observations:
  $$\text{proportion} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}}$$
- The percentage is the proportion multiplied by 100.
Proportions and percentages are also called relative frequencies.
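Here is the definition applied to a small example: divide each category's count by the total, then multiply by 100 for the percentage. This is a minimal Python sketch. The function name and the survival values are hypothetical, invented for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: (proportion, percentage)}.

    proportion = frequency of the category / sum of all frequencies
    percentage = proportion * 100
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {c: (n / total, 100 * n / total) for c, n in sorted(counts.items())}

# Hypothetical survival outcomes, invented for illustration.
survived = ["N", "N", "Y", "N", "Y"]
for category, (prop, pct) in relative_frequencies(survived).items():
    print(f"{category}: proportion = {prop:.2f}, percentage = {pct:.1f}%")
# N: proportion = 0.60, percentage = 60.0%
# Y: proportion = 0.40, percentage = 40.0%
```

Because every observation falls into exactly one category, the proportions always sum to 1 and the percentages to 100, apart from floating-point rounding.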

