stats_review_card_triola

QXP6326677.qxp 12/10/08 3:57 PM Page 1

Triola Statistics Series Review

Basic Terms
Statistics: Methods for planning experiments, obtaining data, organizing, summarizing, analyzing, interpreting, and drawing conclusions based on data.
Population: Collection of all elements to be studied.
Census: Data from every member of a population.
Sample: Subcollection of members from a population.
Parameter: Numerical measurement of characteristic of a population.
Statistic: Numerical measurement of characteristic of a sample.
Random Sample: Every member of population has same chance of being selected.
Simple Random Sample: Every sample of same size $n$ has the same chance of being selected.

Describing, Exploring, and Comparing Data
Measures of Center:
Population mean: $\mu = \frac{\sum x}{N}$
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Mean from frequency dist.: $\bar{x} = \frac{\sum (f \cdot x)}{n}$
Median: Middle value of data arranged in order.
Mode: Most frequent data value.
Midrange: $\frac{\text{highest} + \text{lowest}}{2}$
Measures of Variation:
Range: $\text{maximum} - \text{minimum}$
Sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n - 1)}}$
Sample variance: $s^2$
Population st. dev.: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Population variance: $\sigma^2$
St. dev. from frequency dist.: $s = \sqrt{\frac{n[\sum (f \cdot x^2)] - [\sum (f \cdot x)]^2}{n(n - 1)}}$
Distribution: Explore using frequency distribution, histogram, dotplot, stemplot, boxplot.
Outlier: Value far away from almost all other values.
Time: Consider effects of changes in data over time. (Use time-series graphs, control charts.)

Probability
Rare Event Rule: If, under a given assumption, the probability of an event is extremely small, conclude that the assumption is probably not correct.
Relative Frequency: $P(A) = \frac{\text{number of times } A \text{ occurred}}{\text{number of trials}}$
Classical Approach: $P(A) = \frac{s}{n}$ (equally likely outcomes)
Probability property: $0 \le P(A) \le 1$
Complement of Event A: $P(\bar{A}) = 1 - P(A)$
Addition Rule:
Disjoint Events: Cannot occur together.
If A, B are disjoint: $P(A \text{ or } B) = P(A) + P(B)$
If A, B are not disjoint: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Multiplication Rule:
Independent Events: No event affects probability of other event.
If A, B are independent: $P(A \text{ and } B) = P(A) \cdot P(B)$
If A, B are dependent: $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$, where $P(B \mid A)$ is $P(B)$ assuming that event A has already occurred.

Counting
Fundamental Counting Rule: If an event can occur $m$ ways and a second event can occur $n$ ways, together they can occur $m \cdot n$ ways.
Factorial Rule: $n$ different items can be arranged $n!$ different ways.
Permutations (order counts) of $r$ items selected from $n$ different items: ${}_nP_r = \frac{n!}{(n - r)!}$
Permutations when some items are identical to others: $\frac{n!}{n_1!\, n_2! \cdots n_k!}$
Combinations (order doesn’t count) of $r$ items selected from $n$ different items: ${}_nC_r = \frac{n!}{(n - r)!\, r!}$

Random Variables
Random Variable: Variable that has a single numerical value, determined by chance, for each outcome.
Probability Distribution: Graph, table, or formula that gives the probability for each value of the random variable.
Requirements of random variable: 1. $\sum P(x) = 1$ 2. $0 \le P(x) \le 1$
Parameters of random variable: $\mu = \sum [x \cdot P(x)]$ and $\sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2}$
Expected value: $E = \sum [x \cdot P(x)]$

Binomial Distribution: Requires fixed number of independent trials with all outcomes in two categories, and constant probability.
$n$: Fixed number of trials
$x$: Number of successes in $n$ trials
$p$: Probability of success in one trial
$q$: Probability of failure in one trial
$P(x)$: Probability of $x$ successes in $n$ trials
$P(x) = \frac{n!}{(n - x)!\, x!} \cdot p^x \cdot q^{n - x}$
Mean (binomial): $\mu = np$
St. dev. (binomial): $\sigma = \sqrt{npq}$

Poisson Distribution: Discrete probability distribution that applies to occurrences of some event over a specified interval.
$P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$ where $e \approx 2.71828$

Normal Distribution: Continuous random variable having bell-shaped and symmetric graph and defined by specific equation.
Standard Normal Distribution: Normal distribution with $\mu = 0$ and $\sigma = 1$.
Standard z score: $z = \frac{x - \mu}{\sigma}$
Central Limit Theorem: As sample size increases, sample means $\bar{x}$ approach normal distribution; $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, so that $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$
Normal Approximation to Binomial: Requires $np \ge 5$ and $nq \ge 5$. Use $\mu = np$ and $\sigma = \sqrt{npq}$.

Confidence Intervals (Using One Sample)
Proportion: $\hat{p} - E < p < \hat{p} + E$ where $E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$ and $\hat{p} = \frac{x}{n}$.
Mean: $\bar{x} - E < \mu < \bar{x} + E$ where
$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$ ($\sigma$ known) or
$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$ ($\sigma$ not known), where $df = n - 1$
Standard Deviation: $\sqrt{\frac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$ where $df = n - 1$

Determining Sample Size
Mean: $n = \left[\frac{z_{\alpha/2}\,\sigma}{E}\right]^2$
Proportion: $n = \frac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2}$ ($\hat{p}$ and $\hat{q}$ unknown)
$n = \frac{[z_{\alpha/2}]^2\, \hat{p}\hat{q}}{E^2}$ ($\hat{p}$ and $\hat{q}$ known)

Common Critical z Values
CONFIDENCE INTERVAL
Confidence Level:            0.90     0.95    0.99
Critical Value:              1.645    1.96    2.575
HYPOTHESIS TEST: RIGHT-TAILED
Significance Level $\alpha$:   0.05     0.025   0.01    0.005
Critical Value:              1.645    1.96    2.33    2.575
HYPOTHESIS TEST: LEFT-TAILED
Significance Level $\alpha$:   0.05     0.025   0.01    0.005
Critical Value:              -1.645   -1.96   -2.33   -2.575
HYPOTHESIS TEST: TWO-TAILED
Significance Level $\alpha$:   0.05     0.01    0.10
Critical Value:              ±1.96    ±2.575  ±1.645

Choosing Between z and t for Inferences about Mean
• $\sigma$ known and normally distributed population: use z
• $\sigma$ known and $n > 30$: use z
• $\sigma$ unknown and normally distributed population: use t
• $\sigma$ unknown and $n > 30$: use t
If none of the above apply, use nonparametric method or bootstrapping.

Hypothesis Testing
Hypothesis Test: Procedure for testing claim about a population characteristic.
Null Hypothesis $H_0$: Statement that value of population parameter is equal to some claimed value.
Alternative Hypothesis $H_1$: Statement that population parameter has a value that somehow differs from value in the null hypothesis.
Critical Region: All values of test statistic leading to rejection of null hypothesis.
Significance Level $\alpha$: Probability that test statistic falls in critical region, assuming null hypothesis is true.
Type I Error: Rejecting null hypothesis when it is true. Probability of type I error is significance level $\alpha$.
Type II Error: Failing to reject null hypothesis when it is false. Probability of type II error is denoted by $\beta$.
Power of test: Probability of rejecting a false null hypothesis.
Traditional method of testing hypotheses: Uses decision criterion of rejecting null hypothesis only if test statistic falls within critical region bounded by critical value.
Critical Value: Any value separating critical region from values of test statistic that do not lead to rejection of null hypothesis.
P-value method of testing hypotheses: Uses decision criterion of rejecting null hypothesis only if P-value $\le \alpha$ (where $\alpha$ = significance level).
P-value: Probability of getting value of test statistic at least as extreme as the one found from sample data, assuming that null hypothesis is true.
Left-Tailed Test: P-value = area to left of test statistic.
Right-Tailed Test: P-value = area to right of test statistic.
Two-Tailed Test: P-value = twice the area in tail beyond test statistic.

Procedure
1. Identify original claim, then state null hypothesis (with equality) and alternative hypothesis (without equality).
2. Select significance level $\alpha$.
3. Evaluate test statistic.
4. Proceed with traditional method or P-value method: Reject $H_0$ or Fail to reject $H_0$.

Wording of Conclusion
Does original claim include equality?
Yes:
Reject $H_0$: “There is sufficient evidence to warrant rejection of the claim that … [original claim].”
Fail to reject $H_0$: “There is not sufficient evidence to warrant rejection of the claim that … [original claim].”
No:
Reject $H_0$: “There is sufficient evidence to support the claim that … [original claim].”
Fail to reject $H_0$: “There is not sufficient evidence to support the claim that … [original claim].”

Hypothesis Testing (One Sample)
One Proportion: Requires simple random sample, $np \ge 5$ and $nq \ge 5$, and conditions for binomial distribution.
Test statistic: $z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$ where $\hat{p} = \frac{x}{n}$
One Mean: Requires simple random sample and either $n > 30$ or normally distributed population.
Test statistic: $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$ (for $\sigma$ known)
$t = \frac{\bar{x} - \mu_{\bar{x}}}{s / \sqrt{n}}$ (for $\sigma$ not known), where $df = n - 1$
One Standard Dev. or Variance: Requires simple random sample and normally distributed population.
Test statistic: $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$ where $df = n - 1$

Confidence Intervals (Using Two Samples)
Two Proportions: $(\hat{p}_1 - \hat{p}_2) - E < (p_1 - p_2) < (\hat{p}_1 - \hat{p}_2) + E$ where $E = z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
Two Means (Independent): $(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$ where $E = t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $df$ = smaller of $n_1 - 1$ and $n_2 - 1$.
Alternative Cases for Two Independent Means:
Known $\sigma_1$ and $\sigma_2$: $E = z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
If $\sigma_1$, $\sigma_2$ unknown but assumed equal, use pooled variance $s_p^2$: $E = t_{\alpha/2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$ where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$ and $df = n_1 + n_2 - 2$
Matched Pairs: $\bar{d} - E < \mu_d < \bar{d} + E$ where $E = t_{\alpha/2}\frac{s_d}{\sqrt{n}}$ and $df = n - 1$

Hypothesis Testing (Two Proportions or Two Independent Means)
Two Proportions: Requires two independent simple random samples and $np \ge 5$ and $nq \ge 5$ for each.
Test statistic: $z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\bar{p}\bar{q}}{n_1} + \frac{\bar{p}\bar{q}}{n_2}}}$ where $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$ and $\bar{q} = 1 - \bar{p}$, and $\hat{p}_1 = \frac{x_1}{n_1}$ and $\hat{p}_2 = \frac{x_2}{n_2}$
Two Means (independent samples): Requires two independent simple random samples with both populations normally distributed or $n_1 > 30$ and $n_2 > 30$. The population standard deviations $\sigma_1$ and $\sigma_2$ are usually unknown. Recommendation: do not assume that $\sigma_1 = \sigma_2$.
Test statistic (unknown $\sigma_1$ and $\sigma_2$, and not assuming $\sigma_1 = \sigma_2$): $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, $df$ = smaller of $n_1 - 1$ and $n_2 - 1$
Matched Pairs: Requires simple random samples of matched pairs and either the number of matched pairs is $n > 30$ or the pairs have differences from a population with a distribution that is approximately normal.
$d$: Individual difference between values in a single matched pair
$\mu_d$: Population mean difference for all matched pairs
$\bar{d}$: Mean of all sample differences $d$
$s_d$: Standard deviation of all sample differences $d$
$n$: Number of pairs of data
Test statistic: $t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$ where $df = n - 1$

Hypothesis Testing (Alternative Cases for Two Means with Independent Samples)
Requires two independent simple random samples and either of these two conditions: Both populations normally distributed or $n_1 > 30$ and $n_2 > 30$.
Alternative case when $\sigma_1$ and $\sigma_2$ are both known values: $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
Alternative case when $\sigma_1$ and $\sigma_2$ are not known, but it is assumed that $\sigma_1 = \sigma_2$: Pool variances and use test statistic $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$ where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$ and $df = n_1 + n_2 - 2$

Hypothesis Testing (Two Variances or Two Standard Deviations)
Requires independent simple random samples from populations with normal distributions.
$s_1^2$: larger of the two sample variances
$n_1$: size of the sample with the larger variance
$\sigma_1^2$: variance of the population with the larger sample variance
Test statistic: $F = \frac{s_1^2}{s_2^2}$ where $s_1^2$ is the larger of the two sample variances, numerator $df = n_1 - 1$, and denominator $df = n_2 - 1$

Correlation
Scatterplot: Graph of paired $(x, y)$ sample data.
Linear Correlation Coefficient $r$: Measures strength of linear association between the two variables.
Property of $r$: $-1 \le r \le 1$
Correlation Requirements: Bivariate normal distribution (for any fixed value of $x$, the values of $y$ are normally distributed, and for any fixed value of $y$, the values of $x$ are normally distributed).
Linear Correlation Coefficient: $r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$ or $r = \frac{\sum (z_x z_y)}{n - 1}$
Explained Variation: $r^2$ is the proportion of the variation in $y$ that is explained by the linear association between $x$ and $y$.
Hypothesis test:
1. Using $r$ as test statistic: If $|r| >$ critical value (from table), then there is sufficient evidence to support a claim of linear correlation. If $|r| \le$ critical value, there is not sufficient evidence to support a claim of linear correlation.
2. Using $t$ as test statistic: $t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$ with $df = n - 2$

ISBN-13: 978-0-321-57080-2
ISBN-10: 0-321-57080-4
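The card's Measures of Center and Measures of Variation entries give two forms of the sample standard deviation. A minimal Python sketch, assuming plain lists of numbers (function names such as `sample_std_shortcut` are hypothetical), shows that the defining formula and the shortcut formula agree:

```python
import math


def sample_mean(values):
    """Sample mean: x-bar = (sum of x) / n."""
    return sum(values) / len(values)


def sample_std_definition(values):
    """Defining formula: s = sqrt( sum((x - x_bar)^2) / (n - 1) )."""
    n = len(values)
    x_bar = sample_mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


def sample_std_shortcut(values):
    """Shortcut formula: s = sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


def midrange(values):
    """Midrange = (highest + lowest) / 2."""
    return (max(values) + min(values)) / 2


def z_score(x, mu, sigma):
    """Standard z score: z = (x - mu) / sigma."""
    return (x - mu) / sigma


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(data))                      # 5.0
print(round(sample_std_definition(data), 4))  # 2.1381
print(round(sample_std_shortcut(data), 4))    # 2.1381
print(midrange(data))                         # 5.5
```

The shortcut form only needs the running sums of $x$ and $x^2$, which is why it suits calculators; the defining form makes the "average squared deviation" meaning visible.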
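The Counting rules can be checked directly in code. This sketch (hypothetical helper names; assumes Python 3.8+ only for the cross-check against `math.perm` and `math.comb` in the usage) implements ${}_nP_r$, ${}_nC_r$, and the identical-items permutation rule straight from their factorial definitions:

```python
import math


def permutations(n, r):
    """nPr = n! / (n - r)!  (order counts)."""
    return math.factorial(n) // math.factorial(n - r)


def combinations(n, r):
    """nCr = n! / ((n - r)! r!)  (order doesn't count)."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))


def permutations_with_identical(n, group_sizes):
    """n! / (n1! n2! ... nk!) when the n items fall into groups of identical items."""
    if sum(group_sizes) != n:
        raise ValueError("group sizes must add up to n")
    denominator = 1
    for size in group_sizes:
        denominator *= math.factorial(size)
    return math.factorial(n) // denominator


print(permutations(5, 2))                         # 20
print(combinations(5, 2))                         # 10
print(permutations_with_identical(6, [3, 2, 1]))  # 60, e.g. arrangements of the letters of "BANANA"
```

Integer division (`//`) keeps the results exact, since each ratio of factorials is a whole number.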
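The Random Variables, Binomial Distribution, and Poisson Distribution entries fit together: building the binomial probability table and then applying the general parameters $\mu = \sum [x \cdot P(x)]$ and $\sigma = \sqrt{\sum [x^2 \cdot P(x)] - \mu^2}$ should reproduce $\mu = np$ and $\sigma = \sqrt{npq}$. A sketch under these assumptions (distribution stored as a dict `{x: P(x)}`; helper names hypothetical; `math.comb` needs Python 3.8+):

```python
import math


def rv_mean_sd(distribution):
    """mu = sum[x * P(x)], sigma = sqrt(sum[x^2 * P(x)] - mu^2) for a dict {x: P(x)}."""
    probabilities = distribution.values()
    # Requirements: sum of P(x) is 1 and each P(x) lies between 0 and 1.
    if not math.isclose(sum(probabilities), 1.0) or any(not 0 <= p <= 1 for p in probabilities):
        raise ValueError("not a probability distribution")
    mu = sum(x * p for x, p in distribution.items())
    sigma = math.sqrt(sum(x * x * p for x, p in distribution.items()) - mu ** 2)
    return mu, sigma


def binomial_pmf(x, n, p):
    """P(x) = n! / ((n - x)! x!) * p^x * q^(n - x); math.comb(n, x) is the factorial ratio."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)


def poisson_pmf(x, mu):
    """P(x) = mu^x * e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)


n, p = 10, 0.3
binomial = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
mu, sigma = rv_mean_sd(binomial)
print(round(mu, 6), round(n * p, 6))                          # 3.0 3.0
print(round(sigma, 6), round(math.sqrt(n * p * (1 - p)), 6))  # 1.449138 1.449138
```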
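The one-sample proportion interval and the Determining Sample Size formulas can be sketched as follows. This assumes Python 3.8+ and uses the standard library's `statistics.NormalDist().inv_cdf` to get $z_{\alpha/2}$ instead of the card's rounded table values (1.645, 1.96, 2.575), so a hand calculation with the table may differ slightly in edge cases. Function names are hypothetical; sample sizes are rounded up to the next whole number so the margin of error is not exceeded.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z_{alpha/2}: the z score with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(x, n, confidence=0.95):
    """p-hat - E < p < p-hat + E, with E = z_{alpha/2} * sqrt(p-hat * q-hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    e = z_critical(confidence) * math.sqrt(p_hat * q_hat / n)
    return p_hat - e, p_hat + e


def sample_size_proportion(e, confidence=0.95, p_hat=None):
    """n = [z_{alpha/2}]^2 * p-hat * q-hat / E^2, using 0.25 when p-hat is unknown."""
    z = z_critical(confidence)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return math.ceil(z ** 2 * pq / e ** 2)  # round up to the next whole number


def sample_size_mean(e, sigma, confidence=0.95):
    """n = [z_{alpha/2} * sigma / E]^2, rounded up."""
    return math.ceil((z_critical(confidence) * sigma / e) ** 2)


print(round(z_critical(0.95), 2))    # 1.96
print(proportion_ci(50, 100))
print(sample_size_proportion(0.03))  # 1068
print(sample_size_mean(2, 15))       # 217
```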
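The P-value decision rule and the Wording of Conclusion table can be expressed as two small functions. This is a hypothetical sketch (names `decide` and `conclusion` are illustrative): a claim that includes equality becomes $H_0$, so its wording is about rejecting it, while a claim without equality becomes $H_1$, so its wording is about supporting it.

```python
def decide(p_value, alpha):
    """P-value method: reject H0 only if P-value <= alpha."""
    return p_value <= alpha


def conclusion(reject_h0, claim_includes_equality, claim):
    """Final wording, following the card's Wording of Conclusion table."""
    evidence = "sufficient" if reject_h0 else "not sufficient"
    if claim_includes_equality:
        # The original claim is H0, so the statement is about rejecting it.
        action = "warrant rejection of"
    else:
        # The original claim is H1, so the statement is about supporting it.
        action = "support"
    return f"There is {evidence} evidence to {action} the claim that {claim}."


print(conclusion(decide(0.03, 0.05), False, "the two proportions differ"))
# There is sufficient evidence to support the claim that the two proportions differ.
```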
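For the Two Proportions test, the pooled proportion $\bar{p}$ replaces $p_1$ and $p_2$ in the standard error. A sketch assuming $H_0: p_1 = p_2$ (so the claimed difference $p_1 - p_2$ is 0, the case pooling is meant for), with the P-value computed by the card's left/right/two-tailed rules via `statistics.NormalDist` (Python 3.8+; function names hypothetical):

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z = ((p1-hat - p2-hat) - 0) / sqrt(p-bar q-bar / n1 + p-bar q-bar / n2), assuming H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    q_bar = 1 - p_bar
    standard_error = math.sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    return (p1_hat - p2_hat) / standard_error


def p_value(z, tail):
    """P-value for a standard normal test statistic, per the card's tail rules."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)                 # area to the left of the test statistic
    if tail == "right":
        return 1 - cdf(z)             # area to the right of the test statistic
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # twice the area in the tail beyond it
    raise ValueError("tail must be 'left', 'right', or 'two'")


z = two_proportion_z(60, 100, 45, 100)
print(round(z, 2), round(p_value(z, "two"), 4))
```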
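The two-sample mean tests and the matched-pairs test differ mainly in their standard error and degrees of freedom. A sketch with hypothetical function names, using the standard library's `statistics.variance` and `statistics.stdev` (both divide by $n - 1$); the unpooled version uses the card's conservative $df$ = smaller of $n_1 - 1$ and $n_2 - 1$ rather than a software-style fractional df:

```python
import math
from statistics import mean, stdev, variance


def two_mean_t(sample1, sample2, claimed_difference=0.0):
    """Unpooled t (sigma1, sigma2 unknown, not assumed equal); df = smaller of n1 - 1, n2 - 1."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    t = ((mean(sample1) - mean(sample2)) - claimed_difference) / standard_error
    return t, min(n1 - 1, n2 - 1)


def pooled_variance(sample1, sample2):
    """s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / ((n1 - 1) + (n2 - 1))."""
    n1, n2 = len(sample1), len(sample2)
    return ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / ((n1 - 1) + (n2 - 1))


def two_mean_t_pooled(sample1, sample2, claimed_difference=0.0):
    """Pooled t (sigma1 = sigma2 assumed); df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = pooled_variance(sample1, sample2)
    t = ((mean(sample1) - mean(sample2)) - claimed_difference) / math.sqrt(sp2 / n1 + sp2 / n2)
    return t, n1 + n2 - 2


def matched_pairs_t(first, second, claimed_mean_difference=0.0):
    """t = (d-bar - mu_d) / (s_d / sqrt(n)), with d = first - second and df = n - 1."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    return (mean(d) - claimed_mean_difference) / (stdev(d) / math.sqrt(n)), n - 1


a, b = [10, 12, 14], [6, 8, 10]
t, df = two_mean_t(a, b)
print(round(t, 4), df)                             # 2.4495 2
t_pooled, df_pooled = two_mean_t_pooled(a, b)
print(round(t_pooled, 4), df_pooled)               # 2.4495 4
t_pairs, df_pairs = matched_pairs_t([5, 7, 9, 6], [3, 6, 7, 5])
print(round(t_pairs, 4), df_pairs)                 # 5.1962 3
```

With equal sample sizes the pooled and unpooled standard errors coincide, so only the degrees of freedom differ; with unequal sizes and variances the two statistics diverge.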
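The variance-based tests (One Standard Dev. or Variance, and Two Variances) only need sample variances. This sketch (hypothetical names) computes $\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$ and $F = \frac{s_1^2}{s_2^2}$, enforcing the card's convention that $s_1^2$ is the larger sample variance so $F \ge 1$:

```python
from statistics import variance


def chi_square_variance(sample, claimed_sigma):
    """chi^2 = (n - 1) s^2 / sigma^2, with df = n - 1."""
    n = len(sample)
    return (n - 1) * variance(sample) / claimed_sigma ** 2, n - 1


def f_statistic(sample_a, sample_b):
    """F = s1^2 / s2^2, where s1^2 is the LARGER sample variance; returns (F, n1 - 1, n2 - 1)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    if var_a >= var_b:
        s1_sq, n1, s2_sq, n2 = var_a, len(sample_a), var_b, len(sample_b)
    else:
        s1_sq, n1, s2_sq, n2 = var_b, len(sample_b), var_a, len(sample_a)
    return s1_sq / s2_sq, n1 - 1, n2 - 1


print(chi_square_variance([10, 12, 14], 1.0))   # (8.0, 2)
print(f_statistic([1, 2, 3], [10, 12, 14, 16]))
```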
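The Correlation entry lists two equivalent formulas for $r$ plus a $t$ statistic. A sketch (hypothetical function names; `statistics.mean` and `statistics.stdev` supply $\bar{x}$ and $s$ for the z-score form) that computes $r$ both ways, then $r^2$ and $t$:

```python
import math
from statistics import mean, stdev


def linear_correlation(xs, ys):
    """r = (n Σxy - (Σx)(Σy)) / (sqrt(n Σx² - (Σx)²) * sqrt(n Σy² - (Σy)²))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def correlation_from_z_scores(xs, ys):
    """Equivalent form: r = Σ(z_x z_y) / (n - 1), using sample means and standard deviations."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation_t(r, n):
    """t = r / sqrt((1 - r²) / (n - 2)), df = n - 2 (undefined when |r| = 1)."""
    return r / math.sqrt((1 - r ** 2) / (n - 2)), n - 2


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
r = linear_correlation(xs, ys)
t, df = correlation_t(r, len(xs))
print(round(r, 4), round(r ** 2, 4))  # 0.7746 0.6
print(round(t, 4), df)                # 2.1213 3
```

Here $r^2 = 0.6$, read as: 60% of the variation in $y$ is explained by the linear association with $x$.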
x-m Standard z score: z = s Central Limit Theorem: As sample size increases, sample means x approach normal distribution; s m x = m and s x = q q 1n x - mx q so that z = s 1n Normal Approximation to Binomial: Requires np Ú 5 and nq Ú 5. Use m = np and s = 1npq. Confidence Intervals (Using Two Samples) Two Proportions: N N N N 1 p1 - p22 - E 6 1 p1 - p22 6 1 p1 - p22 + E where E = za>2 NN p1q1 NN p2q2 n2 Common Critical z Values CONFIDENCE INTERVAL Confidence Level 0.90 0.95 0.99 Critical Value 1.645 1.96 2.575 Choosing Between z and t for Inferences about Mean • • • s known and normally distributed population: use z s known and n 7 30: use z s unknown and normally distributed population: use t Hypothesis Testing (Two Proportions or Two Independent Means) Two Proportions: Requires two independent simple random samples and np Ú 5 and nq Ú 5 for each. Test statistic: z= N N 1 p1 - p22 - 1 p1 - p22 pq pq n2 Matched Pairs Requires simple random samples of matched pairs and either the number of matched pairs is n 7 30 or the pairs have differences from a population with a distribution that is approximately normal. d: Individual difference between values in a single matched pair md: Population mean difference for all matched pairs d: Mean of all sample differences d sd: Standard deviation of all sample differences d n: Number of pairs of data d - md Test statistic: t = where df = n - 1 sd 1n Probability Rare Event Rule: If, under a given assumption, the probability of an event is extremely small, conclude that the assumption is probably not correct. Relative Frequency: number of times A occurred P1A2 = number of trials Classical Approach: s P 1 A2 = (equally likely outcomes) n Probability property: 0 … P 1A2 … 1 Complement of Event A: P1A2 = 1 - P 1A2 Addition Rule: Disjoint Events: Cannot occur together. 
If A, B are disjoint: P1A or B2 = P1A2 + P1B2 If A, B are not disjoint: P1A or B2 = P1A2 + P1B2 - P1A and B2 Multiplication Rule: Independent Events: No event affects probability of other event. If A, B are independent: P1A and B2 = P1A2 # P1B2 If A, B are dependent: P1A and B2 = P1A2 # P1B ƒ A2 Random Variables Random Variable: Variable that has a single numerical value, determined by chance, for each outcome. Probability Distribution: Graph, table, or formula that gives the probability for each value of the random variable. Requirements of random variable: 1. g P1x2 = 1 2. 0 … P1x2 … 1 Parameters of random variable: m = g x # P1x2 2 C n1 + Two Means (Independent): 1 x1 - x22 - E 6 1m1 - u22 6 1 x1 - x22 + E where E = ta>2 s2 1 HYPOTHESIS TEST: RIGHT-TAILED Significance Level A 0.05 0.025 0.01 0.005 Critical Value 1.645 1.96 2.33 2.575 • s unknown and n 7 30: use t If none of the above apply, use nonparametric method or bootstrapping. C n1 + s2 2 n2 C n1 where p = x1 + x2 n1 + n2 + df = smaller of n1 - 1 and n2 - 1. Alternative Cases for Two Independent Means: s2 s2 1 2 Known s1 and s2: E = za>2 + C n1 n2 If s1, s2 unknown but assumed equal, use pooled variance s2 : p 2 2 sp sp E = ta>2 C n1 + n2 where s2 = p 1n1 - 12s2 + 1n2 - 12s2 1 2 1n1 - 12 + 1n2 - 12 Wording of Conclusion Does original claim include equality? Yes Fail to reject H0 “There is not sufficient evidence to warrant rejection of the claim that … [original claim].” No “There is not sufficient evidence to support the claim that … [original claim].” and q = 1 - p N and p1 = x1 x2 N and p2 = n1 n2 HYPOTHESIS TEST: LEFT-TAILED Significance Level A 0.05 0.025 0.01 0.005 Critical Value 1.645 1.96 2.33 2.575 s = 2 g 3x # P1x24 - m 2 Expected value: E = g x # P1x2 Binomial Distribution: Requires fixed number of independent trials with all outcomes in two categories, and constant probability. 
n: x: p: q: P(x): Fixed number of trials Number of successes in n trials Probability of success in one trial Probability of failure in one trial Probability of x successes in n trials P1x2 = m = np s = 1npq n! # px # qn-x 1n - x2!x! Mean (binomial) St. dev. (binomial) Proportion: n= n= Measures of Center: Population mean: m gx Sample mean: x = n Mean from frequency dist.: g 1 f # x2 x= n Median: Middle value of data arranged in order. Mode: Most frequent data value. highest + lowest Midrange: 2 Measures of Variation: Range: maximum - minimum Sample standard deviation: s= g 1 x - x2 2 3za>242 # 0.25 E 2 Reject H0 Describing, Exploring, and Comparing Data Determining Sample Size NN 3za>242 pq E 2 N N 1 p and q known2 Mean: n = c za>2s E d 2 Matched Pairs: d - E 6 md 6 d + E sd where E = ta>2 and df = n - 1 1n HYPOTHESIS TEST: TWO-TAILED Significance Level A 0.05 0.01 0.10 Critical Value ±1.96 ±2.575 ±1.645 “There is sufficient evidence to warrant rejection of the claim that … [original claim].” “There is sufficient evidence to support the claim that … [original claim].” Two Means (independent samples): Requires two independent simple random samples with both populations normally distributed or n1 7 30 and n2 7 30. The population standard deviations s1 and s2 are usually unknown. Recommendation: do not assume that s1 = s2. Test statistic (unknown s1 and s2, and not assuming s1 = s2): 1 x1 - x22 - 1m1 - m22 t= s2 s2 1 2 + C n1 n2 df = smaller of n1 - 1 and n2 - 1 Hypothesis Testing (Two Variances or Two Standard Deviations) Requires independent simple random samples from populations with normal distributions. 
s2: 1 n1: s2: 1 larger of the two sample variances size of the sample with the larger variance variance of the population with the larger sample variance s2 1 s2 2 where s2 is the larger of the 1 Test statistic: F = two sample variances and numerator df = n1 - 1 and denominator df = n2 - 1 Hypothesis T esting (One Sample) One Proportion: Requires simple random sample, np Ú 5 and nq Ú 5, and conditions for binomial distribution. N p-p x N Test statistic: z = where p = n pq Correlation Hypothesis Testing (Alternative Cases for Two Means with Independent Samples) Requires two independent simple random samples and either of these two conditions: Both populations normally distributed or n1 7 30 and n2 7 30. Alternative case when S1 and S2 are both known values: z= 1x1 - x22 - 1m1 - m22 s2 1 s2 2 Scatterplot: Graph of paired (x, y) sample data. Linear Correlation Coefficient r : Measures strength of linear association between the two variables. Property of r : - 1 … r … 1 Correlation Requirements: Bivariate normal distribution (for any fixed value of x, the values of y are normally distributed, and for any fixed value of y, the values of x are normally distributed). Linear Correlation Coefficient: r= n g xy - 1 g x21 g y2 2n1 g x22 - 1 g x22 2n1 g y22 - 1 g y22 or r = g 1zxzy2 n-1 Hypothesis Testing Confidence Intervals (Using One Sample) N N Proportion: p - E 6 p 6 p + E where E = za>2 Bn NN pq N and p = x . n Hypothesis Test: Procedure for testing claim about a population characteristic. Null Hypothesis H0: Statement that value of population parameter is equal to some claimed value. Alternative Hypothesis H1: Statement that population parameter has a value that somehow differs from value in the null hypothesis. Critical Region: All values of test statistic leading to rejection of null hypothesis. A Significance Level: Probability that test statistic falls in critical region, assuming null hypothesis is true. Type I Error: Rejecting null hypothesis when it is true. 
Probability of type I error is significance level r . Type II Error: Failing to reject null hypothesis when it is false. Probability of type II error is denoted by b . Power of test: Probability of rejecting a false null hypothesis. Traditional method of testing hypotheses: Uses decision criterion of rejecting null hypothesis only if test statistic falls within critical region bounded by critical value. Critical Value: Any value separating critical region from values of test statistic that do not lead to rejection of null hypothesis. P-value method of testing hypotheses: Uses decision criterion of rejecting null hypothesis only if Pvalue … a (where a = significance level). P-value: Probability of getting value of test statistic at least as extreme as the one found from sample data, assuming that null hypothesis is true. Left-Tailed Test: P-value = area to left of test statistic. Right-Tailed Test: P-value = area to right of test statistic. Two-Tailed Test: P-value = twice the area in tail beyond test statistic. where P1B ƒ A2 is P1B2 assuming that event A has already occurred. Cn One Mean: Requires simple random sample and either n 7 30 or normally distributed population. Test statistic z= x - mx q s 1n x - mx q s 1n 1for s known2 Counting Fundamental Counting Rule: If an event can occur m ways and a second event can occur n ways, together they can occur m # n ways. Factorial Rule: n different items can be arranged n! different ways. Permutations (order counts) of r items selected from n different items: nPr Poisson Distribution: Discrete probability distribution that applies to occurrences of some event over a specified interval. P1x2 = mx # e-m x! where e L 2.71828 C n-1 or n1 g x22 - 1 g x22 Mean: x - E 6 m 6 x + E E = za>2 # E = ta>2 # s 1s known2 1n s 1s not known2 1n 1n - 12s2 x2 R 1n - 12s2 C n1n - 12 A n1 + n2 St. dev. from frequency dist.: s= n C g 1f # x22 D - C g 1f # x2 D 2 E 2 t= 1for s not known2 n1n - 12 Sample variance: s Population st. dev.: = n! 
1n - r2! ISBN-13: 978-0-321-57080-2 ISBN-10: 0-321-57080-4 Standard Deviation: Alternative case when S1 and S2 are not known, but it is assumed that S1 = S2: Pool variances and use test statistic t= 1x1 - x22 - 1m1 - m22 Explained Variation: r 2 is the proportion of the variation in y that is explained by the linear association between x and y. Hypothesis test 1. Using r as test statistic: If ƒ r ƒ 7 critical value (from table), then there is sufficient evidence to support a claim of linear correlation. If ƒ r ƒ … critical value, there is not sufficient evidence to support a claim of linear correlation. 2. Using t as test statistic: r t= with df = n - 2 1 - r2 An - 2 90000 C 6s6 C x2 L where df = n - 1 One Standard Dev. or Variance: Requires simple random sample and normally distributed population. Test statistic: x = 2 s= g 1 x - m2 2 C 2 N Permutations when some items are identical to others: n! n1!n2! Á nk! Combinations (order doesn’t count) of r items selected from n different items: n! nCr = 1n - r2!r! Procedure 1. Identify original claim, then state null hypothesis (with equality) and alternative hypothesis (without equality). 2. Select significance level a. 3. Evaluate test statistic. 4. Proceed with traditional method or P-value method: An + n 1 2 where s2 = p 1n1 - 12s2 + 1n2 - 12s2 1 2 1n1 - 12 + 1n2 - 12 s2 p s2 p 1n - 12s2 s2 Population variance: s Distribution: Explore using frequency distribution, histogram, dotplot, stemplot, boxplot. Outlier: Value far away from almost all other values. Time: Consider effects of changes in data over time. (Use time-series graphs, control charts.) 9 780321 570802 where df = n - 1 and df = n1 + n2 - 2 1 2 3 QXP6326677.qxp 12/10/08 3:57 PM Page 1 Triola Statistics Series Review Basic Terms Statistics: Methods for planning experiments, obtaining data, organizing, summarizing, analyzing, interpreting, and drawing conclusions based on data. Population: Collection of all elements to be studied. 
Census: Data from every member of a population. Sample: Subcollection of members from a population. Parameter: Numerical measurement of characteristic of a population. Statistic: Numerical measurement of characteristic of a sample. Random Sample: Every member of population has same chance of being selected. Simple Random Sample: Every sample of same size n has the same chance of being selected. Normal Distribution Continuous random variable having bell-shaped and symmetric graph and defined by specific equation. Standard Normal Distribution: Normal distribution with m = 0 and s = 1. x-m Standard z score: z = s Central Limit Theorem: As sample size increases, sample means x approach normal distribution; s m x = m and s x = q q 1n x - mx q so that z = s 1n Normal Approximation to Binomial: Requires np Ú 5 and nq Ú 5. Use m = np and s = 1npq. Confidence Intervals (Using Two Samples) Two Proportions: N N N N 1 p1 - p22 - E 6 1 p1 - p22 6 1 p1 - p22 + E where E = za>2 NN p1q1 NN p2q2 n2 Common Critical z Values CONFIDENCE INTERVAL Confidence Level 0.90 0.95 0.99 Critical Value 1.645 1.96 2.575 Choosing Between z and t for Inferences about Mean • • • s known and normally distributed population: use z s known and n 7 30: use z s unknown and normally distributed population: use t Hypothesis Testing (Two Proportions or Two Independent Means) Two Proportions: Requires two independent simple random samples and np Ú 5 and nq Ú 5 for each. Test statistic: z= N N 1 p1 - p22 - 1 p1 - p22 pq pq n2 Matched Pairs Requires simple random samples of matched pairs and either the number of matched pairs is n 7 30 or the pairs have differences from a population with a distribution that is approximately normal. 
d: Individual difference between values in a single matched pair md: Population mean difference for all matched pairs d: Mean of all sample differences d sd: Standard deviation of all sample differences d n: Number of pairs of data d - md Test statistic: t = where df = n - 1 sd 1n Probability Rare Event Rule: If, under a given assumption, the probability of an event is extremely small, conclude that the assumption is probably not correct. Relative Frequency: number of times A occurred P1A2 = number of trials Classical Approach: s P 1 A2 = (equally likely outcomes) n Probability property: 0 … P 1A2 … 1 Complement of Event A: P1A2 = 1 - P 1A2 Addition Rule: Disjoint Events: Cannot occur together. If A, B are disjoint: P1A or B2 = P1A2 + P1B2 If A, B are not disjoint: P1A or B2 = P1A2 + P1B2 - P1A and B2 Multiplication Rule: Independent Events: No event affects probability of other event. If A, B are independent: P1A and B2 = P1A2 # P1B2 If A, B are dependent: P1A and B2 = P1A2 # P1B ƒ A2 Random Variables Random Variable: Variable that has a single numerical value, determined by chance, for each outcome. Probability Distribution: Graph, table, or formula that gives the probability for each value of the random variable. Requirements of random variable: 1. g P1x2 = 1 2. 0 … P1x2 … 1 Parameters of random variable: m = g x # P1x2 2 C n1 + Two Means (Independent): 1 x1 - x22 - E 6 1m1 - u22 6 1 x1 - x22 + E where E = ta>2 s2 1 HYPOTHESIS TEST: RIGHT-TAILED Significance Level A 0.05 0.025 0.01 0.005 Critical Value 1.645 1.96 2.33 2.575 • s unknown and n 7 30: use t If none of the above apply, use nonparametric method or bootstrapping. C n1 + s2 2 n2 C n1 where p = x1 + x2 n1 + n2 + df = smaller of n1 - 1 and n2 - 1. 
Alternative Cases for Two Independent Means: s2 s2 1 2 Known s1 and s2: E = za>2 + C n1 n2 If s1, s2 unknown but assumed equal, use pooled variance s2 : p 2 2 sp sp E = ta>2 C n1 + n2 where s2 = p 1n1 - 12s2 + 1n2 - 12s2 1 2 1n1 - 12 + 1n2 - 12 Wording of Conclusion Does original claim include equality? Yes Fail to reject H0 “There is not sufficient evidence to warrant rejection of the claim that … [original claim].” No “There is not sufficient evidence to support the claim that … [original claim].” and q = 1 - p N and p1 = x1 x2 N and p2 = n1 n2 HYPOTHESIS TEST: LEFT-TAILED Significance Level A 0.05 0.025 0.01 0.005 Critical Value 1.645 1.96 2.33 2.575 s = 2 g 3x # P1x24 - m 2 Expected value: E = g x # P1x2 Binomial Distribution: Requires fixed number of independent trials with all outcomes in two categories, and constant probability. n: x: p: q: P(x): Fixed number of trials Number of successes in n trials Probability of success in one trial Probability of failure in one trial Probability of x successes in n trials P1x2 = m = np s = 1npq n! # px # qn-x 1n - x2!x! Mean (binomial) St. dev. (binomial) Proportion: n= n= Measures of Center: Population mean: m gx Sample mean: x = n Mean from frequency dist.: g 1 f # x2 x= n Median: Middle value of data arranged in order. Mode: Most frequent data value. 
highest + lowest Midrange: 2 Measures of Variation: Range: maximum - minimum Sample standard deviation: s= g 1 x - x2 2 3za>242 # 0.25 E 2 Reject H0 Describing, Exploring, and Comparing Data Determining Sample Size NN 3za>242 pq E 2 N N 1 p and q known2 Mean: n = c za>2s E d 2 Matched Pairs: d - E 6 md 6 d + E sd where E = ta>2 and df = n - 1 1n HYPOTHESIS TEST: TWO-TAILED Significance Level A 0.05 0.01 0.10 Critical Value ±1.96 ±2.575 ±1.645 “There is sufficient evidence to warrant rejection of the claim that … [original claim].” “There is sufficient evidence to support the claim that … [original claim].” Two Means (independent samples): Requires two independent simple random samples with both populations normally distributed or n1 7 30 and n2 7 30. The population standard deviations s1 and s2 are usually unknown. Recommendation: do not assume that s1 = s2. Test statistic (unknown s1 and s2, and not assuming s1 = s2): 1 x1 - x22 - 1m1 - m22 t= s2 s2 1 2 + C n1 n2 df = smaller of n1 - 1 and n2 - 1 Hypothesis Testing (Two Variances or Two Standard Deviations) Requires independent simple random samples from populations with normal distributions. s2: 1 n1: s2: 1 larger of the two sample variances size of the sample with the larger variance variance of the population with the larger sample variance s2 1 s2 2 where s2 is the larger of the 1 Test statistic: F = two sample variances and numerator df = n1 - 1 and denominator df = n2 - 1 Hypothesis T esting (One Sample) One Proportion: Requires simple random sample, np Ú 5 and nq Ú 5, and conditions for binomial distribution. N p-p x N Test statistic: z = where p = n pq Correlation Hypothesis Testing (Alternative Cases for Two Means with Independent Samples) Requires two independent simple random samples and either of these two conditions: Both populations normally distributed or n1 7 30 and n2 7 30. 
Alternative case when S1 and S2 are both known values: z= 1x1 - x22 - 1m1 - m22 s2 1 s2 2 Scatterplot: Graph of paired (x, y) sample data. Linear Correlation Coefficient r : Measures strength of linear association between the two variables. Property of r : - 1 … r … 1 Correlation Requirements: Bivariate normal distribution (for any fixed value of x, the values of y are normally distributed, and for any fixed value of y, the values of x are normally distributed). Linear Correlation Coefficient: r= n g xy - 1 g x21 g y2 2n1 g x22 - 1 g x22 2n1 g y22 - 1 g y22 or r = g 1zxzy2 n-1 Hypothesis Testing Confidence Intervals (Using One Sample) N N Proportion: p - E 6 p 6 p + E where E = za>2 Bn NN pq N and p = x . n Hypothesis Test: Procedure for testing claim about a population characteristic. Null Hypothesis H0: Statement that value of population parameter is equal to some claimed value. Alternative Hypothesis H1: Statement that population parameter has a value that somehow differs from value in the null hypothesis. Critical Region: All values of test statistic leading to rejection of null hypothesis. A Significance Level: Probability that test statistic falls in critical region, assuming null hypothesis is true. Type I Error: Rejecting null hypothesis when it is true. Probability of type I error is significance level r . Type II Error: Failing to reject null hypothesis when it is false. Probability of type II error is denoted by b . Power of test: Probability of rejecting a false null hypothesis. Traditional method of testing hypotheses: Uses decision criterion of rejecting null hypothesis only if test statistic falls within critical region bounded by critical value. Critical Value: Any value separating critical region from values of test statistic that do not lead to rejection of null hypothesis. P-value method of testing hypotheses: Uses decision criterion of rejecting null hypothesis only if Pvalue … a (where a = significance level). 
Conditional probability: $P(B \mid A)$ is $P(B)$ assuming that event $A$ has already occurred.

One Mean (hypothesis test): Requires simple random sample and either $n > 30$ or normally distributed population.
Test statistic: $z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma/\sqrt{n}}$ (for $\sigma$ known) or $t = \dfrac{\bar{x} - \mu_{\bar{x}}}{s/\sqrt{n}}$ (for $\sigma$ not known), where df $= n - 1$

Counting
Fundamental Counting Rule: If an event can occur $m$ ways and a second event can occur $n$ ways, together they can occur $m \cdot n$ ways.
Factorial Rule: $n$ different items can be arranged $n!$ different ways.
Permutations (order counts) of $r$ items selected from $n$ different items: ${}_nP_r = \dfrac{n!}{(n - r)!}$
Permutations when some items are identical to others: $\dfrac{n!}{n_1!\,n_2!\cdots n_k!}$
Combinations (order doesn't count) of $r$ items selected from $n$ different items: ${}_nC_r = \dfrac{n!}{(n - r)!\,r!}$

Poisson Distribution: Discrete probability distribution that applies to occurrences of some event over a specified interval. $P(x) = \dfrac{\mu^x \cdot e^{-\mu}}{x!}$ where $e \approx 2.71828$

Mean (confidence interval): $\bar{x} - E < \mu < \bar{x} + E$ where $E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$ ($\sigma$ known) or $E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$ ($\sigma$ not known, df $= n - 1$)
Standard Deviation (confidence interval): $\sqrt{\dfrac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\dfrac{(n - 1)s^2}{\chi^2_L}}$ where df $= n - 1$

ISBN-13: 978-0-321-57080-2
ISBN-10: 0-321-57080-4

Explained Variation: $r^2$ is the proportion of the variation in $y$ that is explained by the linear association between $x$ and $y$.
Hypothesis test for correlation:
1. Using $r$ as test statistic: If $|r| >$ critical value (from table), then there is sufficient evidence to support a claim of linear correlation. If $|r| \le$ critical value, there is not sufficient evidence to support a claim of linear correlation.
2. Using $t$ as test statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$ with df $= n - 2$

One Standard Dev. or Variance (hypothesis test): Requires simple random sample and normally distributed population. Test statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma^2}$ where df $= n - 1$

Hypothesis Testing Procedure
1. Identify original claim, then state null hypothesis (with equality) and alternative hypothesis (without equality).
2. Select significance level $\alpha$.
3. Evaluate test statistic.
4. Proceed with traditional method or P-value method.

McNemar's Test: Use for a $2 \times 2$ frequency table from matched pairs. Requirement: $b + c \ge 10$, where $b$ and $c$ are frequencies from discordant pairs. Test is right-tailed with test statistic $\chi^2 = \dfrac{(|b - c| - 1)^2}{b + c}$ where df $= 1$

Regression
Regression Equation: $\hat{y} = b_0 + b_1 x$
$x$: Independent (predictor, explanatory) variable
$\hat{y}$: Dependent (response) variable
$b_1$: Slope of regression line
$b_0$: $y$-intercept of regression line
$b_1 = r\dfrac{s_y}{s_x}$ or $b_1 = \dfrac{n\sum(xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$
$b_0 = \bar{y} - b_1\bar{x}$
Predicting value of $y$: If no linear correlation, best predicted $y$ value is $\bar{y}$; if there is a linear correlation, the best predicted $y$ value is found by substituting $x$ value into regression equation.
Marginal Change in a variable: Amount a variable changes when the other variable changes by one unit. Slope $b_1$ is the marginal change in $y$ when $x$ changes by one unit.
Influential Point: Strongly affects graph of regression line.
Residual: Difference between an observed sample $y$ value and the value $\hat{y}$ that is predicted using the regression equation. Residual $= y - \hat{y}$.
Least-Squares Property: The sum of squares of the residuals is the smallest sum possible.

Multiple Regression
Procedure: Obtain results using computer or calculator.
Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
$n$: Sample size
$k$: Number of independent ($x$) variables
Important factors to consider: P-value (from computer display) and adjusted $R^2$, where adj. $R^2 = 1 - \dfrac{n - 1}{n - (k + 1)}(1 - R^2)$
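The regression formulas above can be tried on a small, made-up data set. This Python sketch computes $b_1$ with the shortcut formula, then $b_0 = \bar{y} - b_1\bar{x}$ and the residuals $y - \hat{y}$, and checks the least-squares property by nudging the intercept; the helper names are hypothetical.

```python
import statistics


def least_squares_line(xs, ys):
    """Return (b0, b1) for y_hat = b0 + b1 * x, using the shortcut formula for b1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = statistics.mean(ys) - b1 * statistics.mean(xs)  # b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of (y - y_hat)^2; the least-squares line makes this as small as possible."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # residual = y - y_hat
print(round(b0, 4), round(b1, 4))                             # 2.2 0.6
print(round(sum_squared_residuals(b0, b1, xs, ys), 4))        # 2.4
print(round(sum_squared_residuals(b0 + 0.1, b1, xs, ys), 4))  # 2.45 (larger, as expected)
```

For these data $r = 6/\sqrt{60} \approx 0.7746$, and $r \cdot s_y / s_x$ reproduces the same slope $b_1 = 0.6$.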
Nonparametric Methods
Nonparametric (distribution-free) tests: Do not require assumptions about the population distributions.
Rank: Number assigned to a sample value according to its order in the sorted list. Lowest value has rank 1, 2nd lowest has rank 2, and so on.

Sign Test: Uses plus/minus signs instead of original data values. Used to test claims involving matched pairs, nominal data, or claims about median.
$n$: Total number of (nonzero) signs
$x$: Number of the less frequent sign
Test statistic if $n \le 25$: $x$
Test statistic if $n > 25$: $z = \dfrac{(x + 0.5) - (n/2)}{\sqrt{n}/2}$
Reject $H_0$ if test statistic $\le$ critical value.

Wilcoxon Signed-Ranks Test: Uses ranks of data consisting of matched pairs; based on ranks of differences between pairs of values. Used to test null hypothesis that two populations have same distribution.
$T$: Smaller of two rank sums (sum of absolute values of negative ranks; sum of positive ranks)
Test statistic if $n \le 30$: $T$
Test statistic if $n > 30$: $z = \dfrac{T - \dfrac{n(n + 1)}{4}}{\sqrt{\dfrac{n(n + 1)(2n + 1)}{24}}}$

Wilcoxon Rank-Sum Test: Uses ranks from two independent samples. Used to test null hypothesis that two independent samples are from populations with same distribution. Requires two independent random samples, each with more than 10 values.
$n_1$: Size of Sample 1
$n_2$: Size of Sample 2
$R$: Sum of ranks for Sample 1
Test statistic: $z = \dfrac{R - \mu_R}{\sigma_R}$ where $\mu_R = \dfrac{n_1(n_1 + n_2 + 1)}{2}$ and $\sigma_R = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$

Kruskal-Wallis Test: Used to test null hypothesis that three or more independent samples are from populations with same distribution. Requires at least five observations in each independent sample of randomly selected values.
$N$: Total number of observations
$R_1$: Sum of ranks for Sample 1
$n_1$: Number of values in Sample 1
Test statistic: $H = \dfrac{12}{N(N + 1)}\left(\dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \cdots + \dfrac{R_k^2}{n_k}\right) - 3(N + 1)$, where test is right-tailed using $\chi^2$ distribution with df $= k - 1$ and $k$ = number of samples
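Ranking with ties and the Kruskal-Wallis statistic $H$ can be sketched as follows. This assumes tied values receive the mean of the ranks they would occupy, and it applies the card's formula for $H$ without any tie correction (some software adjusts $H$ when ties are present, so results may differ slightly). Function names and data are hypothetical.

```python
def ranks_with_ties(values):
    """Rank values from 1 (lowest); tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*samples):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N + 1), ranking all samples combined."""
    pooled = [x for sample in samples for x in sample]
    big_n = len(pooled)
    ranks = ranks_with_ties(pooled)
    total, start = 0.0, 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])  # R_i for this sample
        total += rank_sum ** 2 / len(sample)
        start += len(sample)
    return 12 / (big_n * (big_n + 1)) * total - 3 * (big_n + 1)


a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
c = [11, 12, 13, 14, 15]
print(round(kruskal_wallis_h(a, b, c), 4))  # 12.5
```

Here $k = 3$, so df $= 2$; the right-tailed $\chi^2$ critical value for $\alpha = 0.05$ is 5.991, and $H = 12.5$ exceeds it.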
Rank Correlation: Uses ranks to test for correlation from random paired data.
$r_s$: Rank correlation coefficient
$n$: Number of pairs of sample data
$d$: Difference between ranks for two values within a pair
Test statistic (if no ties in ranks): $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$
Test statistic (with ties among ranks): $r_s = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$, computed from the ranks
Critical values for $n > 30$: $r_s = \dfrac{\pm z}{\sqrt{n - 1}}$

Goodness-of-Fit
Goodness-of-Fit Test: Test the null hypothesis that an observed frequency distribution fits claimed distribution.
$O$: Observed frequency
$E$: Expected frequency
$k$: Number of different categories
$n$: Total number of trials
Requirements: Random data of frequency counts, and $E \ge 5$ for each category.
Test is right-tailed with test statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ where df $= k - 1$

One-Way ANOVA
Procedure: Obtain P-value using computer or calculator.
ANOVA (analysis of variance): Method of testing equality of three or more population means (by analyzing sample variances).
Requirements: Populations have approximately normal distributions, populations have same variance $\sigma^2$, samples are simple random samples, and samples are independent.
Test statistic: $F = \dfrac{\text{variance between samples}}{\text{variance within samples}}$ with df numerator $= k - 1$ and df denominator $= N - k$
$k$: Number of population means being compared
$N$: Total number of values in all samples combined
Decision criterion using significance level $\alpha$:
P-value $\le \alpha$: Reject null hypothesis of equal population means.
P-value $> \alpha$: Fail to reject equality of population means.
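The one-way ANOVA test statistic can be illustrated by computing the two variance estimates directly. This Python sketch assumes the usual mean-square definitions (between-sample sum of squares over $k - 1$, within-sample sum of squares over $N - k$); the function name and data are hypothetical, and turning $F$ into a P-value still requires software or an F table, as the procedure above says.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_numerator, df_denominator) for k independent samples."""
    k = len(groups)
    big_n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / big_n
    # Variance between samples: spread of the sample means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variance within samples: pooled spread of values around their own sample mean.
    ss_within = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (big_n - k))
    return f_stat, k - 1, big_n - k


f_stat, df_num, df_den = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat, df_num, df_den)  # 27.0 2 6
```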
Correlation/Regression: Variation and Prediction Intervals
Total Deviation: $y - \bar{y}$
Explained Deviation: $\hat{y} - \bar{y}$
Unexplained Deviation: $y - \hat{y}$
$\sum (y - \bar{y})^2 = \sum (\hat{y} - \bar{y})^2 + \sum (y - \hat{y})^2$
Coefficient of Determination: $r^2 = \dfrac{\text{explained variation}}{\text{total variation}}$
Standard Error of Estimate: $s_e = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
Prediction Interval for an Individual $y$: $\hat{y} - E < y < \hat{y} + E$ where $E = t_{\alpha/2}\,s_e\sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n(\sum x^2) - (\sum x)^2}}$, $x_0$ is given, and $t_{\alpha/2}$ has df $= n - 2$.

Contingency Tables
Contingency Table: Two-way frequency table with row variable and column variable.
Requirements: Random data of frequency counts and $E \ge 5$ for each cell (where $E$ is expected frequency).
Test of Independence: Test null hypothesis of no association between row variable and column variable.
Test of Homogeneity: Test that different populations have same proportions of some characteristic.
Test (independence or homogeneity) is right-tailed with test statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ where df $= (r - 1)(c - 1)$
Expected frequency (of cell): $E = \dfrac{(\text{row total})(\text{column total})}{\text{grand total}}$
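Expected frequencies and the $\chi^2$ statistic for a contingency table can be computed directly from the formulas above. This is a minimal Python sketch with made-up counts; `chi_square_contingency` is a hypothetical name, and the P-value step is left to a table or statistical software.

```python
def chi_square_contingency(observed):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # E = (row total)(column total) / (grand total) for each cell.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df, expected


table = [[10, 20], [30, 40]]  # made-up counts
chi2, df, expected = chi_square_contingency(table)
print(expected)                                       # [[12.0, 18.0], [28.0, 42.0]]
print(round(chi2, 4), df)                             # 0.7937 1
print(all(e >= 5 for row in expected for e in row))   # True: requirement E >= 5 holds
```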
Two-Way ANOVA
Two-way ANOVA (analysis of variance) uses two factors: row factor and column factor.
Requirements: Data in each cell are from a normally distributed population; populations have same variance; sample data are from simple random samples; samples are independent; data are categorized two ways.
Balanced Design: All cells have the same number of sample values.
Procedure: Use computer or calculator to obtain P-values, then use the P-values to test for
1. Interaction effect between the two variables. (Stop here if there appears to be an interaction effect.)
2. Effect from row variable
3. Effect from column variable

Runs Test for Randomness: Used to determine whether sequence of sample data is in random order.
Run: Sequence of data having same characteristic.
$n_1$: Total number of sample elements having one common characteristic
$n_2$: Total number of sample elements having the other characteristic
$G$: Total number of runs
If $\alpha = 0.05$, $n_1 \le 20$, and $n_2 \le 20$, test statistic is $G$. Otherwise, test statistic: $z = \dfrac{G - \mu_G}{\sigma_G}$ where $\mu_G = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1$ and $\sigma_G = \sqrt{\dfrac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$

Statistical Process Control
Process Data: Data arranged according to some time sequence.
Run Chart: Sequential plot of individual data values over time.
Control Chart of a process characteristic: Sequential plot over time of the characteristic values, including a centerline, lower control limit (LCL), and upper control limit (UCL).
Statistically Stable Process (or within statistical control): Process with only natural variation and no patterns, cycles, or unusual points.
Out-of-Control Criteria:
1. There is a pattern, trend, or cycle that is not random.
2. There is a point above the upper control limit or below the lower control limit.
3. There are eight consecutive points all above or all below the centerline.
R Chart (range chart): Control chart of sample ranges. UCL: $D_4\bar{R}$; Centerline: $\bar{R}$; LCL: $D_3\bar{R}$
$\bar{x}$ Chart: Control chart of sample means. UCL: $\bar{\bar{x}} + A_2\bar{R}$; Centerline: $\bar{\bar{x}}$; LCL: $\bar{\bar{x}} - A_2\bar{R}$
p Chart: Control chart to monitor the proportion $p$ of some attribute. UCL: $\bar{p} + 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}}$; Centerline: $\bar{p}$; LCL: $\bar{p} - 3\sqrt{\dfrac{\bar{p}\bar{q}}{n}}$, where $\bar{p} = \dfrac{\text{total number of defects}}{\text{total number of items}}$

CONTROL CHART CONSTANTS
n   A2      D3      D4
2   1.880   0.000   3.267
3   1.023   0.000   2.574
4   0.729   0.000   2.282
5   0.577   0.000   2.114
6   0.483   0.000   2.004
7   0.419   0.076   1.924
Reprinted with permission of the publisher.
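The $\bar{x}$ and R chart limits, together with out-of-control criteria 2 and 3, can be sketched in Python using the values from the CONTROL CHART CONSTANTS table. The subgroup data and function names are hypothetical; criterion 1 (a non-random pattern, trend, or cycle) calls for judgment and is not automated here.

```python
# (A2, D3, D4) by subgroup size n, copied from the CONTROL CHART CONSTANTS table.
CONSTANTS = {
    2: (1.880, 0.000, 3.267),
    3: (1.023, 0.000, 2.574),
    4: (0.729, 0.000, 2.282),
    5: (0.577, 0.000, 2.114),
    6: (0.483, 0.000, 2.004),
    7: (0.419, 0.076, 1.924),
}


def xbar_and_r_limits(subgroups):
    """Return ((LCL, center, UCL) for the x-bar chart, (LCL, center, UCL) for the R chart)."""
    n = len(subgroups[0])
    a2, d3, d4 = CONSTANTS[n]
    means = [sum(s) / n for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]
    x_bar_bar = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    x_chart = (x_bar_bar - a2 * r_bar, x_bar_bar, x_bar_bar + a2 * r_bar)
    r_chart = (d3 * r_bar, r_bar, d4 * r_bar)
    return x_chart, r_chart


def out_of_control(points, lcl, center, ucl):
    """Criterion 2: a point outside the limits. Criterion 3: eight consecutive points on one side."""
    if any(p > ucl or p < lcl for p in points):
        return True
    run, previous_side = 0, 0
    for p in points:
        side = (p > center) - (p < center)  # +1 above, -1 below, 0 on the centerline
        if side != 0 and side == previous_side:
            run += 1
        elif side != 0:
            run = 1
        else:
            run = 0
        previous_side = side
        if run >= 8:
            return True
    return False


subgroups = [[10, 12, 11, 13], [12, 11, 10, 12], [11, 13, 12, 10]]  # made-up, n = 4
x_chart, r_chart = xbar_and_r_limits(subgroups)
print([round(v, 3) for v in x_chart])  # [9.473, 11.417, 13.361]
print([round(v, 3) for v in r_chart])  # [0.0, 2.667, 6.085]
print(out_of_control([sum(s) / 4 for s in subgroups], *x_chart))  # False
```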

This note was uploaded on 06/08/2010 for the course MATH 1123 taught by Professor Serpa during the Spring '10 term at Hawaii Pacific.
