Lecture 2

Data Types:
1. Discrete data: whole numbers; can be counted
2. Continuous data: a real number (usually measured)
3. Categorical data: non-numerical, usually from some pre-determined categories
4. Binary data: categorical data with two categories
5. Ordinal data: has an underlying order
6. Grouped or frequency data: data that has been reduced to the number of observations in particular categories

Examples

Notations:
Datatypes:
Dataset: a collection of data

Transformations of Data

Monotone Transformations:
• A transformation F is called monotone increasing if the ranks of {x1, x2, x3, ..., xn} are the same as the ranks of {F(x1), F(x2), F(x3), ..., F(xn)}.
• If the ranks are reversed, F is called monotone decreasing.
• An affine transformation is one of the form y = Ax + B.
Examples:
• Coding: categorical to numerical
• Ranking: ordering the data from smallest to largest
• Example: data {1, -0.5, 3, 100, -4, 6} → ranks {3, 2, 4, 6, 1, 5}
• Log transformation

Statistical Terminology: Revisiting PPDAC

Problem:
• Statements about populations of individuals
• Individual members of the population = units
• Characteristic of a unit = a variate
• Functions defined on the units = attribute

Aspect of a Problem:
• Descriptive: the answer involves learning about some attribute of the population.
• Causative: involves the existence (or non-existence) of a causal link between variates
(i)
(ii) Changes in the explanatory variates "cause" a change in the response variates
• Predictive: involves predicting the value of a response variate for a given unit
• Examples:
• Correlation ≠ Causation

Response variates:
Explanatory variates:

Units
• The target population: the set of units we set out to investigate
• The study population: the set of units which could have been included in the sample
• The sample: the set of units actually selected by the sampling protocol

Errors
[Diagram: Target population, Study population, Sample, Analysis, Conclusions (induction); "study error?" marked between the target and study populations, "sample error?" between the study population and the sample.]
• Suppose the attribute of interest is α(.), a function of the population. Then
Study error: α(P_Study) - α(P_Target)
Sample error: α(S) - α(P_Study)
• Examples:
• Errors are unavoidable

Plans: (usually for causative aspects)
• Experimental
• Observational
Examples

Data: Things to remember:
(i) Inconsistent observations
(ii) Extreme observations: outliers
(iii) Sources of bias: bias = systematic error
(iv) Missing observations: 0, 99, *, NA

Analysis
Address the questions of interest using the data
• construct an appropriate model (STAT 230)
• use formal statistical methods (STAT 231)
• prepare appropriate numerical and graphical summaries
Examples:

Conclusion:
Address the questions of interest
• in contextual language, using the output from the Analysis step
• discuss possible limitations and uncertainties

Roadmap: A look ahead.
(i) Summarization of data
(ii) Recap of STAT 230 ...
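
The rank-based definition of monotone transformations in these notes can be checked numerically. The following is a minimal sketch, not part of the original lecture: it assumes the ranking example's data are {1, -0.5, 3, 100, -4, 6} (minus signs inferred), that there are no ties, and that rank 1 means the smallest value. The helper name `ranks` and the particular affine coefficients are hypothetical choices made for illustration.

```python
import math


def ranks(values):
    """Return the rank of each value (1 = smallest), assuming no ties."""
    # Indices of the values, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Data assumed from the ranking example (minus signs are an assumption).
data = [1, -0.5, 3, 100, -4, 6]
data_ranks = ranks(data)

# Affine transformation y = Ax + B with A > 0: monotone increasing,
# so the ranks should be unchanged.
increasing = [2 * x + 5 for x in data]
inc_ranks = ranks(increasing)

# Affine transformation with A < 0: monotone decreasing,
# so the ranks should be reversed (rank r becomes n + 1 - r).
decreasing = [-2 * x + 5 for x in data]
dec_ranks = ranks(decreasing)
reversed_ranks = [len(data) + 1 - r for r in data_ranks]

# The log transformation is only defined for positive values, so it is
# illustrated on a separate, hypothetical all-positive data set.
positive = [1, 0.5, 3, 100, 4, 6]
log_ranks = ranks([math.log(x) for x in positive])

print("ranks of data:       ", data_ranks)
print("after y = 2x + 5:    ", inc_ranks)
print("after y = -2x + 5:   ", dec_ranks)
print("log keeps the ranks: ", log_ranks == ranks(positive))
```

Comparing `inc_ranks` with `data_ranks`, and `dec_ranks` with `reversed_ranks`, mirrors the two bullet-point definitions: equal ranks for a monotone increasing transformation, reversed ranks for a monotone decreasing one.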
This note was uploaded on 01/27/2011 for the course STAT 231 taught by Professor Cantremember during the Winter '08 term at Waterloo.
