# Formula Card 2


Statistics, the discipline, is the art and science of extracting useful information from data.

Statistic: a number calculated from data.

When do we know that X affects Y?

- Controlled experiment
  - Physical sciences: x is controlled, y is measured
  - Medical sciences: clinical trials, double-blind with placebo
  - Social sciences: subjects are exposed to a controlled X, y is observed
- Strong theory
  - Physics: opposite charges attract
  - Biology: descent of species
  - No strong theory for the social sciences
- Observational data --> prediction

The ASSOCIATION

- Clustered: a curve or line is NOT the right fit
- Convex = cup, concave = cap
- No association = random

Covariance:

$$\operatorname{cov}(x,y) = \frac{1}{n-1}\left[(x_1-\bar{x})(y_1-\bar{y}) + (x_2-\bar{x})(y_2-\bar{y}) + \dots + (x_n-\bar{x})(y_n-\bar{y})\right]$$

Variance of a sum:

$$s^2(x+y) = s^2(x) + s^2(y) + 2\operatorname{cov}(x,y)$$

Cauchy-Schwarz inequality:

$$-s(x)\,s(y) \le \operatorname{cov}(x,y) \le s(x)\,s(y)$$

- Perfect positive linear association: cov(x,y) = s(x)s(y)
- Perfect negative linear association: cov(x,y) = -s(x)s(y)

Comparing the covariance with products of standard deviations is inconvenient, so divide by that product to get the correlation:

$$c(x,y) = \frac{\operatorname{cov}(x,y)}{s(x)\,s(y)}$$

- Covariance is theoretically important (for algebra)
- Correlation is practically important (a convenient measure of linear association)

Standardization: with $z_i = (x_i - \bar{x})/s(x)$, zbar = 0 and the sample sdev of z = 1, ALWAYS, and z is unit-free. Standardization can be done with ANY location measure (e.g. median) and ANY dispersion measure (e.g. IQR); the result will still have a location measure of 0 and a dispersion measure of 1.
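The covariance formulas above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original card: the helper names (`cov`, `sdev`, `cor`, `standardize`) and the data values are made up for illustration. It checks the variance-of-a-sum identity, the Cauchy-Schwarz bounds, the perfect-negative-association case, and the mean-0, sdev-1 property of standardized values.

```python
from math import sqrt, isclose


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample covariance with the 1/(n-1) divisor used on the card."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)


def sdev(x):
    """Sample standard deviation, s(x) = sqrt(cov(x, x))."""
    return sqrt(cov(x, x))


def cor(x, y):
    """Correlation c(x, y) = cov(x, y) / (s(x) s(y))."""
    return cov(x, y) / (sdev(x) * sdev(y))


def standardize(x):
    """z_i = (x_i - xbar) / s(x)."""
    xbar, s = mean(x), sdev(x)
    return [(xi - xbar) / s for xi in x]


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]

# Variance of a sum: s^2(x+y) = s^2(x) + s^2(y) + 2 cov(x, y)
total = [xi + yi for xi, yi in zip(x, y)]
assert isclose(sdev(total) ** 2, sdev(x) ** 2 + sdev(y) ** 2 + 2 * cov(x, y))

# Cauchy-Schwarz: -s(x)s(y) <= cov(x, y) <= s(x)s(y)
bound = sdev(x) * sdev(y)
assert -bound <= cov(x, y) <= bound

# Perfect negative linear association: cov(x, y) = -s(x)s(y)
y_line = [3.0 - 2.0 * xi for xi in x]
assert isclose(cov(x, y_line), -sdev(x) * sdev(y_line))

# Standardized values have mean 0 and sample sdev 1.
z = standardize(x)
assert isclose(mean(z), 0.0, abs_tol=1e-12)
assert isclose(sdev(z), 1.0)

print("cov:", cov(x, y), "cor:", cor(x, y))
```

Because the correlation divides out the units of both variables, it stays within [-1, 1] whatever scale x and y are measured on, which is what makes it the more convenient measure.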
cov(Zx, Zy) = c(x,y)

Scatterplot matrices

- Plots in a column all have the same x axis
- Plots in a row all have the same y axis

Under a perfect linear association, Zy = a(Zx) + b with |a| = 1 and b = 0, so Zy = ± Zx.

$$\operatorname{cor}(x,y) = \frac{s^2(Z_y + Z_x) - s^2(Z_y - Z_x)}{4} = \frac{\operatorname{cov}(x,y)}{s(x)\,s(y)}$$

- $n_i$ = count of the ith label
- $p_i$ = proportion of the ith label

The sdev is even more problematic with skew: squares amplify the influence of outliers.

Time series

- Simple: daily stock price of 1 company
- Multiple: daily stock prices of multiple companies

Qual

- 1 var: barplot, comparing frequencies across labels
- 2 var: mosaic, comparing conditional frequencies; compare the proportion of Y by groups of X

Quant

- 1 var:
  - Histogram: see shape/skew
  - Boxplot: shows location, dispersion, outliers (points outside the whiskers)
- 2 var: scatterplot

Comparison box plot: compare levels of a quant variable by groups of a qual variable.

Nested barplots: heights of bars reflect the frequency of Y groups nested within X groups; compare the importance of Y within X.

Graphical methods

- See the data as a whole
- Discover unexpected facts

Numerical summaries

- Simplicity, by condensing a lot of data to a few numbers
- Precision, e.g. when comparing groups (vs. eyeballing)
- Ways to reason about uncertainty

NEITHER REPLACES THE OTHER

Quant variables

- Measures of location: mean, median, quantiles, min, max
- Dispersion: sd, IQR, range
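The two identities for standardized variables stated earlier in this section, cov(Zx, Zy) = c(x,y) and the sum/difference formula for cor(x,y), can be verified with a short sketch. The helper names and data below are hypothetical and only serve the illustration. The check works because s²(Zy + Zx) = 2 + 2c and s²(Zy − Zx) = 2 − 2c, so their difference is 4c.

```python
from math import sqrt, isclose


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample covariance with the 1/(n-1) divisor."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)


def variance(x):
    """Sample variance s^2(x) = cov(x, x)."""
    return cov(x, x)


def standardize(x):
    """z_i = (x_i - xbar) / s(x)."""
    xbar, s = mean(x), sqrt(variance(x))
    return [(xi - xbar) / s for xi in x]


# Hypothetical data, chosen only for illustration.
x = [3.0, 5.0, 6.0, 9.0, 12.0, 13.0]
y = [10.0, 9.0, 14.0, 13.0, 20.0, 18.0]

zx, zy = standardize(x), standardize(y)
cor_xy = cov(x, y) / sqrt(variance(x) * variance(y))

# cov(Zx, Zy) = c(x, y)
assert isclose(cov(zx, zy), cor_xy)

# cor(x, y) = (s^2(Zy + Zx) - s^2(Zy - Zx)) / 4
z_sum = [b + a for a, b in zip(zx, zy)]
z_diff = [b - a for a, b in zip(zx, zy)]
cor_from_variances = (variance(z_sum) - variance(z_diff)) / 4
assert isclose(cor_from_variances, cor_xy)

# Perfect negative linear association: Zy = -Zx
y_line = [5.0 - 3.0 * xi for xi in x]
z_line = standardize(y_line)
assert all(isclose(b, -a, abs_tol=1e-12) for a, b in zip(zx, z_line))

print("cor(x, y) =", cor_xy)
```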

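To illustrate the note that squares amplify the influence of outliers, here is a small sketch that computes the listed location and dispersion measures with and without one extreme value. It uses Python's standard `statistics` module; the data and the `iqr` helper are made up for illustration, and the quartiles follow the module's default "exclusive" method, so other software may report slightly different IQR values.

```python
import statistics


def iqr(values):
    """Interquartile range from statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical data: the second list adds a single outlier to the first.
clean = [12.0, 14.0, 15.0, 15.0, 16.0, 17.0, 18.0, 19.0]
with_outlier = clean + [95.0]

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(
        label,
        "| mean:", round(statistics.mean(data), 2),
        "| median:", statistics.median(data),
        "| sd:", round(statistics.stdev(data), 2),
        "| IQR:", iqr(data),
        "| range:", max(data) - min(data),
    )
```

In this example the mean, sd, and range move a great deal once the outlier is added, while the median and IQR barely change.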