Chapter 1: Reading Exercise
Chapter 2: Data Analysis (Variables and Describing/Summarizing Data)
-Data is information about n subjects.
-Analysis means understanding the information in a manner that is useful in decision making.
Sometimes the analysis has a well-defined objective.
-Information is organized in the format of subjects and variables.
-- A set of subjects makes up a sample (or population), and the attributes or characteristics of these subjects are the variables.
Note that a population is the collection of all subjects related to the data of interest, whereas a sample
includes only some subjects of the population. For example, the Census includes all US citizens
and legal immigrants (about 310 million), but the CPS, the Current Population Survey,
includes only a sample of the current population (fewer than a million).
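The subject/variable layout and the population/sample distinction can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 students, the variable names (id, major, gpa), and all values are made up, and simple random sampling is assumed as the way the sample is drawn.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: one dict per subject, one key per variable.
majors = ["Economics", "Biology", "History"]
population = [
    {"id": i,
     "major": random.choice(majors),
     "gpa": round(random.uniform(2.0, 4.0), 2)}
    for i in range(1000)
]

# A sample holds only some of the subjects in the population.
sample = random.sample(population, k=50)

n_population = len(population)             # number of subjects in the population
n_sample = len(sample)                     # n, the number of subjects in the sample
variables = sorted(population[0].keys())   # the attributes recorded per subject

print(n_population, n_sample, variables)
```

Each row is a subject and each key is a variable, so the same layout works whether the rows are the whole population or just a sample.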
There are two types of variables: Qualitative (categorical) and Quantitative (numerical).
A qualitative variable takes non-numbers (categories) as values/observations. Ex: Major,
Ethnicity, Student status, Buy/Sell status of a stock. No algebra can be done on a qualitative variable, except counting.
There are two types of qualitative variables: 1. Nominal (categories have no order/ranking). Ex:
Major, Gender, Race… 2. Ordinal (categories have an order/ranking). Ex: Student status
(freshman, sophomore …), Earned education (HS, Bachelor, Master,…)
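A small sketch of the nominal/ordinal difference, assuming a four-level student status scale (the levels "junior" and "senior" and all observations are assumptions added for illustration). For an ordinal variable the ranking makes sorting meaningful; for a nominal variable only "same category or not" makes sense.

```python
# Hypothetical ordinal scale: the position in the list encodes the ranking.
STUDENT_STATUS = ["freshman", "sophomore", "junior", "senior"]
rank = {status: i for i, status in enumerate(STUDENT_STATUS)}

observed = ["senior", "freshman", "junior", "freshman", "sophomore"]

# Ordinal variable: sorting by rank is meaningful.
by_rank = sorted(observed, key=rank.__getitem__)

# Nominal variable: there is no ranking, only equality of categories.
majors = ["History", "Biology", "History"]
same_major = majors[0] == majors[2]

print(by_rank)
print(same_major)
```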
A quantitative variable takes numbers as values. Ex: GPA, SAT scores, # of mistakes, monthly
unemployment rate, stock prices.
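Because the values of a quantitative variable are numbers, arithmetic on them is meaningful. The sketch below uses made-up GPA values for five hypothetical students and Python's standard statistics module.

```python
from statistics import mean, median

# Made-up GPA values for five hypothetical students.
gpas = [3.2, 3.8, 2.9, 3.5, 3.1]

avg_gpa = mean(gpas)                 # arithmetic average
mid_gpa = median(gpas)               # middle value after sorting
gpa_range = max(gpas) - min(gpas)    # spread between largest and smallest

print(round(avg_gpa, 2), mid_gpa, round(gpa_range, 2))
```

None of these operations would make sense for a variable such as Major, whose values are categories rather than numbers.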
When a small data set (small n and few variables) is presented, it may not be difficult to understand the
information content. However, when a large data set is presented, how do we understand or explain the information content?
-Always eyeball it top to bottom and left to right! Does it help? Maybe not!
What is information? In stats/probability, it may be understood as the Distribution.
The distribution of a variable breaks down the information content into:
-the different categories/values this variable takes
-the counts/frequencies/relative frequencies of these values
Note: 1. Distribution helps in understanding the pattern of variation in the information. 2. Relative
frequencies are objective probabilities.
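As a minimal sketch, the distribution of one qualitative variable can be tabulated with a counter. The ten observations of "major" below are made up for illustration; the relative frequency of a category is its count divided by n.

```python
from collections import Counter

# Made-up observations of one variable (major) for ten hypothetical subjects.
majors = ["Econ", "Bio", "Econ", "History", "Bio",
          "Econ", "Econ", "Bio", "History", "Econ"]

n = len(majors)
counts = Counter(majors)                                # frequency of each category
rel_freq = {cat: c / n for cat, c in counts.items()}    # relative frequency = count / n

# Print the distribution, most frequent category first.
for cat, c in counts.most_common():
    print(f"{cat:8s} {c:3d} {rel_freq[cat]:.2f}")
```

The relative frequencies always add up to 1, which is what allows them to be read as probabilities.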