Lecture 2

Data Types:
1. Discrete Data: whole numbers, can be counted
2. Continuous Data: a real number (usually measured)
3. Categorical Data: non-numerical, usually from some pre
EXAMPLES

Notations:
Datatypes:
Dataset: A collection of Data

Transformations of Data:
Monotone Transformations:
• A transformation F is called monotone increasing if the ranks of {x1, x2, x3, …, xn} are the same as the ranks of {F(x1), F(x2), F(x3), …, F(xn)}.
• If the ranks are reversed, it is called monotone decreasing.
• An affine transformation is one of the form y = Ax + B.
• Examples
• Coding: Categorical to Numerical
• Ranking: Replacing each value by its position when the data are ordered from smallest to largest
• Example: Data {1, 0.5, 3, 100, 4, 6} → {2, 1, 3, 6, 4, 5}
• Log transformation
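The rank-based definition of a monotone transformation can be checked directly. The following is a minimal sketch, not part of the original notes: the helper names `ranks`, `is_monotone_increasing`, and `is_monotone_decreasing` are hypothetical, and the code assumes the data contain no ties. For the data set {1, 0.5, 3, 100, 4, 6} it computes the ranks and then tests a log transformation and two affine transformations y = Ax + B (one with A > 0, one with A < 0).

```python
import math

def ranks(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def is_monotone_increasing(transform, values):
    """True if the transform leaves the ranks of the data unchanged."""
    return ranks([transform(x) for x in values]) == ranks(values)

def is_monotone_decreasing(transform, values):
    """True if the transform exactly reverses the ranks of the data."""
    n = len(values)
    reversed_ranks = [n + 1 - r for r in ranks(values)]
    return ranks([transform(x) for x in values]) == reversed_ranks

data = [1, 0.5, 3, 100, 4, 6]
data_ranks = ranks(data)  # [2, 1, 3, 6, 4, 5]

# Log transformation: ranks are preserved on positive data.
log_keeps_ranks = is_monotone_increasing(math.log, data)

# Affine transformations y = A*x + B with illustrative coefficients.
affine_up = is_monotone_increasing(lambda x: 2 * x + 1, data)     # A > 0
affine_down = is_monotone_decreasing(lambda x: -2 * x + 1, data)  # A < 0
```

Checking on one data set only shows that a transformation behaves monotonically on those values; it does not prove monotonicity in general.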
Statistical Terminology: Revisiting PPDAC

Problem:
• Statements about Populations of individuals
• Individual members of the population = units
• Characteristic of a unit = a variate
• Functions defined on the units = attribute

Aspect of a Problem:
• Descriptive: The answer involves learning about some attribute of the population.
• Causative: involves the existence of a causal link between variates (or non
Errors
[Diagram: Target population → (study error?) → Study population → (sample error?) → Sample; remaining diagram labels: Analysis, Conclusions (Induction)]
• Errors are unavoidable
• Suppose the attribute of interest is α(.), a function of the population. Then
  Study error: α(P_Study) ≠ α(P_Target)
  Sample error: α(S) ≠ α(P_Study)
• Examples:
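A small numerical illustration of study error and sample error, under stated assumptions: the variate values below are invented, the attribute α(.) is taken to be the average of one variate, and each error is measured as a simple difference between attribute values. The notes only say the two attribute values differ, so the difference is an illustrative choice rather than the course's definition.

```python
from statistics import mean

def attribute(units):
    """Attribute alpha(.): here, the average of one variate over a set of units."""
    return mean(units)

# Hypothetical variate values, invented for illustration only.
target_population = [2, 4, 6, 8, 10, 12]
study_population = [2, 4, 6, 8]   # units actually available for study
sample = [2, 8, 4]                # units selected from the study population

# Study error: attribute of the study population vs. the target population.
study_error = attribute(study_population) - attribute(target_population)

# Sample error: attribute of the sample vs. the study population.
sample_error = attribute(sample) - attribute(study_population)

print(f"study error:  {study_error}")
print(f"sample error: {sample_error:.3f}")
```

Here the study population omits the largest units, so the study error is nonzero no matter how the sample is drawn; the sample error adds a second discrepancy on top of it.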
Plans: (usually for Causative Aspects)
• Experimental
• Observational
Examples

Data:
Things to remember:
(i) Inconsistent Observations
(ii) Extreme Observations: Outliers
(iii) Sources of Bias: Bias = Systematic Error
(iv) Missing Observations: 0, 99, *, NA

Analysis:
Address the questions of interest using the data
• construct an appropriate model (STAT 230)
• use formal statistical methods (STAT 231)
• prepare appropriate numerical and graphical summaries
Examples:

Conclusion:
Address the questions of interest
• in contextual language using the output from the Analysis step
• discuss possible limitations and uncertainties

Roadmap: A look ahead.
(i) Summarization of Data
(ii) Recap of STAT 230
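The missing-observation codes listed under Data (0, 99, *, NA) have to be recognized before any numerical summary is computed, otherwise a code such as 99 is averaged as if it were a real measurement. The following is a minimal sketch, assuming raw entries arrive as text; the function name `clean` and the sample entries are hypothetical. Note that 0 and 99 can also be legitimate values, so they should be treated as missing only when the data documentation says so.

```python
# Codes named in the notes as stand-ins for a missing observation.
MISSING_CODES = {"0", "99", "*", "NA"}

def clean(raw_values, missing_codes=MISSING_CODES):
    """Convert raw text entries to floats, mapping missing-value codes to None."""
    return [None if v.strip() in missing_codes else float(v) for v in raw_values]

# Hypothetical raw entries, invented for illustration only.
raw = ["12", "99", "7.5", "NA", "*", "0", "9"]

cleaned = clean(raw)
observed = [v for v in cleaned if v is not None]

# Summaries should be computed from the observed values only.
average = sum(observed) / len(observed)
```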
