Ch15_Nonparametric

# Ch15_Nonparametric - Chapter 15 Nonparametric...


Chapter 15: Nonparametric Statistical Tests

## Parametric statistics

Estimation and testing are based on population parameters. Parametric statistics are what you have learned about.

- Estimation: mean, variance, standard deviation (estimates of the parameters of the distribution)
- Tests: t tests, F tests, etc. (assume normal distribution)

## Parametric tests

Parametric tests are developed with certain assumptions.

1. A common underlying assumption is random sampling.
2. Furthermore, sampling is assumed to draw from a normally distributed population.
3. Sampling variances do not differ significantly.

## Nonparametric tests

Practically speaking, nonparametric statistics are what we use when we have unusual data with:

- Nonnormal distribution
- Very small sample size
- Problem with measurement, for example, the scale of measure is ordinal instead of interval or ratio

## Types of tests

- Parametric tests: use sample statistics to make inferences about population parameters
- Nonparametric tests: hypotheses do not state relationships about population parameters

## Nonparametric tests (continued)

- Make fewer assumptions about the population distribution than do parametric tests
- Sometimes called distribution-free tests

## Why not parametric?

- Data may be skewed. Example:
reaction time.
- May have ordinal or nominal data
- Must be able to calculate a mean to use parametric tests

## Two types of non-parametric tests

- Contingency table (chi-square)
- Rank tests

## Between-Subject Design

| Scale of measure | Type | Two levels | > Two levels |
|---|---|---|---|
| Ratio or interval | Parametric | t-test or ANOVA | ANOVA |
| Ordinal | Nonparametric | Mann-Whitney U | Kruskal-Wallis ANOVA by ranks |
| Nominal | Nonparametric | Chi-square | Chi-square |

## Within-Subject Design

| Scale of measure | Type | Two levels | > Two levels |
|---|---|---|---|
| Ratio or interval | Parametric | t-test or ANOVA | ANOVA |
| Ordinal | Nonparametric | Wilcoxon signed ranks | Friedman ANOVA by ranks |
| Nominal | Nonparametric | | |

## A common non-parametric test

If you have qualitative data (which means your variables are based on words or nominal scales), or if the assumptions of the parametric tests are violated, we turn to nonparametric statistical tests. $\chi^2$ is one of the most frequently used nonparametric statistics.

## Chi-square tests

The purpose of $\chi^2$ is to evaluate whether the "pattern" created by categorical data is typical. Two common applications of the $\chi^2$ test are:

- Test whether the data fit a particular distribution.
- Test an H0 about whether one variable is related to the other.

## Chi-Square Test

- Analysis of frequency data
- Between-subjects design (not within-subject)
- Scores are nominal: frequency of an event occurring or not

## Example 1: Do data have a certain pattern?

We observe the cigarette smoking of a random sample of 863 men 40-50 years old. We have the following frequency table, and also the relative frequency of cigarette smoking for men in that age group 10 years ago.

| Category | Observed freq | Relative freq (10 years ago) |
|---|---|---|
| A0 = None | 406 | 0.43 |
| A1 = One pack | 164 | 0.17 |
| A2 = Two pack | 189 | 0.24 |
| A3 = Three pack | 78 | 0.10 |
| A4 = Four or more | 26 | 0.06 |
| Total | 863 | 1.00 |
## Question

Are the "observed numbers" of men by "cigarette use category" (A0-A4) consistent with those "expected" based on the frequencies from ten years ago?

Calculate the expected counts for the age group based on 863 total subjects:

| | A0 | A1 | A2 | A3 | A4 | Total |
|---|---|---|---|---|---|---|
| Relative freq | 0.43 | 0.17 | 0.24 | 0.10 | 0.06 | 1 |
| Calculation | 863 × 0.43 | 863 × 0.17 | 863 × 0.24 | 863 × 0.10 | 863 × 0.06 | 863 |
| Expected | 371.09 | 146.71 | 207.12 | 86.3 | 51.78 | 863 |

## Example 1

To see if the cigarette use habits of American men in the 40-50 age group are the same now as they were 10 years ago, we can compare the obtained and expected frequencies.

| | A0 | A1 | A2 | A3 | A4 | Total |
|---|---|---|---|---|---|---|
| Observed | 406 | 164 | 189 | 78 | 26 | 863 |
| Expected | 371.09 | 146.71 | 207.12 | 86.3 | 51.78 | 863 |

## Observed vs. Expected Frequency

- O is the Observed (or Obtained) frequency.
- E is the Expected frequency, with $E_j = N \times P_j$.
- Compare Os to Es. The statistical question is whether the differences are likely or unlikely to be due to chance.

## Chi-square test statistic

$$\chi^2 = \sum_j \frac{(O_j - E_j)^2}{E_j}$$

where the sum is over each category. Each squared difference is weighted by the inverse of the expected frequency, $E_j$.

## Distribution under the null hypothesis of no difference

If the Os and Es are all "big" (> 5), the chi-square statistic has a $\chi^2$ distribution with $J - 1$ df ($J = 5$ in the above example).

## Test of smoking patterns

1. Calculate $\chi^2_{obs}$.
2. Determine df ($J - 1$, where $J$ is the number of categories).
3. Choose a significance level (e.g., $\alpha = .05$).
4. Look up $\chi^2_{crit}$ (Table C.7, p. 504).
5. If $\chi^2_{obs} > \chi^2_{crit}$, reject the hypothesis that the recent smoking pattern is the same as the previous one. If $\chi^2_{obs} < \chi^2_{crit}$, don't reject the hypothesis.

## Calculations

| | A0 | A1 | A2 | A3 | A4 | Total |
|---|---|---|---|---|---|---|
| Observed | 406 | 164 | 189 | 78 | 26 | 863 |
| Expected | 371.09 | 146.71 | 207.12 | 86.3 | 51.78 | 863 |
| $(O-E)^2/E$ | 3.28 | 2.04 | 1.59 | 0.80 | 12.84 | 20.54 |

$\chi^2_{obs} = 20.54$ with df = 5 − 1 = 4, and $\chi^2_{crit}(4) = 9.49$. Since $\chi^2_{obs} > \chi^2_{crit}(4)$, reject H0.
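
The smoking calculation can be checked with a few lines of code. The following is a minimal sketch in plain Python, not part of the original slides: the function name `chi_square_gof` is hypothetical, and the critical value is hard-coded from a standard chi-square table (df = 4, α = .05) rather than computed.

```python
# Goodness-of-fit chi-square for the smoking example (numbers from the slides).
observed = [406, 164, 189, 78, 26]          # categories A0..A4, N = 863
rel_freq = [0.43, 0.17, 0.24, 0.10, 0.06]   # proportions from ten years ago


def chi_square_gof(observed, proportions):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # E_j = N * P_j
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat, len(observed) - 1


expected, chi2_obs, df = chi_square_gof(observed, rel_freq)

CHI2_CRIT = 9.49  # assumed tabled critical value for df = 4, alpha = .05
reject_h0 = chi2_obs > CHI2_CRIT

print([round(e, 2) for e in expected])
print(round(chi2_obs, 2), df, reject_h0)
```

Under these assumptions the script should reproduce the expected counts and the observed statistic of about 20.54 on 4 df, which exceeds the critical value.
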
Therefore we conclude that smoking patterns have changed.

## Comparison involving two variables

Adult female TV characters by hair color and career level.

| Career level | Blonde | Dark |
|---|---|---|
| Professional | 36 | 24 |
| Nonprofessional | 48 | 72 |

Raw data:

| | B1 | B2 | Total |
|---|---|---|---|
| A1 | 36 | 24 | 60 |
| A2 | 48 | 72 | 120 |
| Total | 84 | 96 | 180 |

## Expected values under independence

| | B1 | B2 | Total |
|---|---|---|---|
| A1 | 84 × 60 / 180 = 28 | 96 × 60 / 180 = 32 | 60 |
| A2 | 84 × 120 / 180 = 56 | 96 × 120 / 180 = 64 | 120 |
| Total | 84 | 96 | 180 |

## Chi-square table

- Also called a rows-by-columns contingency table
- Each data point fits into one cell
- Row and column totals are called marginals
- We use marginals to calculate expected cell totals, and compare them to observed cell totals

## Expected Values

$$E = \frac{\text{row marg.} \times \text{col marg.}}{\text{total responses}}$$

| Career level | Blonde | Dark | Marginals |
|---|---|---|---|
| Professional | O = 36, E = 28 | O = 24, E = 32 | 60 |
| Nonprofessional | O = 48, E = 56 | O = 72, E = 64 | 120 |
| Marginals | 84 | 96 | N = 180 |

## Chi-square test

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

$$\chi^2 = \frac{(36-28)^2}{28} + \frac{(24-32)^2}{32} + \frac{(48-56)^2}{56} + \frac{(72-64)^2}{64} = 6.43$$

## Hypothesis testing

- H0: The row and column variables are independent in the population. The job level of the character is independent of the hair color of the character.
- H1: The row and column variables are related in the population. The job level of the character is related to the hair color of the character.

## Hypothesis testing (continued)

- Look up the critical value at alpha of 0.05 or 0.01 (Table C.7, p. 504).
- df = (r − 1)(c − 1)
- If $\chi^2_{obs} > \chi^2_{crit}$, reject H0 and accept H1.
- If $\chi^2_{obs} < \chi^2_{crit}$, fail to reject H0 and do not accept H1.

## Example results

$\chi^2_{obs} = 6.43$ and $\chi^2_{crit} = 3.84$ (df = 1, $\alpha = 0.05$). Reject H0, accept H1.
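
The marginal-based expected values and the test statistic for the two-variable example can be sketched in code. This is an illustrative sketch in plain Python, not from the slides: `chi_square_independence` is a hypothetical helper name, the table uses the raw-data layout (rows A1, A2; columns B1, B2), and the critical value is copied from the example results rather than computed.

```python
def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, df) for a rows-by-columns table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row marginal * column marginal) / total responses
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


# Observed counts from the raw-data table (rows A1, A2; columns B1, B2).
observed = [[36, 24], [48, 72]]
expected, chi2_obs, df = chi_square_independence(observed)

CHI2_CRIT = 3.84  # critical value for df = 1, alpha = .05, as given in the slides
reject_h0 = chi2_obs > CHI2_CRIT

print(expected, round(chi2_obs, 2), df, reject_h0)
```

Because the helper works from the marginals, it accepts any rows-by-columns table of counts, not only a 2 × 2 one.
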
There is a relationship between job level and hair color of a TV character.

## Example 2

Are firstborn children more creative than later-born children?

| Creativity test score | Firstborn | Later-born |
|---|---|---|
| Top 1/3 | 47 | 29 |
| Middle 1/3 | 29 | 35 |
| Bottom 1/3 | 24 | 36 |

## Hypotheses

- H0: The distribution of creativity scores is the same for first- and later-born.
- H1: The distribution of creativity scores is different for first- and later-born.

## Expected Scores

$$E = \frac{\text{row marg.} \times \text{col marg.}}{\text{total responses}}$$

| Creativity test score | Firstborn | Later-born | Row marginal |
|---|---|---|---|
| Top 1/3 | O = 47, E = ___ | O = 29, E = ___ | 76 |
| Middle 1/3 | O = 29, E = ___ | O = 35, E = ___ | 64 |
| Bottom 1/3 | O = 24, E = ___ | O = 36, E = ___ | 60 |
| Column marginal | 100 | 100 | 200 |

## Birth order example

$\chi^2_{obs} = 7.22$, df = (3 − 1)(2 − 1) = 2, $\chi^2_{crit} = 5.99$. Reject H0 that birth order and creativity are independent.

## Rank tests

- Continuous data.
- Rank them.
- The test is a function of the ranks.

## Rank tests (continued)

- Spearman correlation (test that r = 0)
- Mann-Whitney U
- Wilcoxon

## Rank tests: When to use

- Continuous (interval or ratio) data
- Normality assumption doesn't hold
- Equal variance assumption doesn't hold

## Mann-Whitney U test

Use with:

- Two-level between-subjects design
- At least ordinal measurement (so the values can be "ranked")

This is the "non-parametric" version of the t-test for two independent groups.

## Schroeder staircase: Can you see both directions?

Two groups saw this figure and reported how long it took them to see the reverse.

- One group had no distractions.
- The other group counted backwards by 3s while viewing the figure.

## Results

| Group | Scores |
|---|---|
| Control | 2, 5, 6, 8, 9, 13, 15, 21, 42 |
| Experimental | 4, 10, 11, 12, 14, 17, 85, 98, ∞ |

This is why we can't do a t-test!
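
A short sketch shows why ranks still work when a mean does not. It is written in plain Python and is not from the slides; it assumes the ∞ entry stands for a participant who never reported the reversal, represented here as `math.inf`, and it assumes there are no tied scores (true for this data set).

```python
import math

control = [2, 5, 6, 8, 9, 13, 15, 21, 42]
# Assumption: the slide's infinity entry is modeled as math.inf.
experimental = [4, 10, 11, 12, 14, 17, 85, 98, math.inf]

# A mean (and therefore a t-test) is not usable for the experimental group.
mean_experimental = sum(experimental) / len(experimental)

# Pool the scores with their group labels and sort them in ascending order.
pooled = sorted(
    [(score, "control") for score in control]
    + [(score, "experimental") for score in experimental]
)

# With no ties, each score's rank is its 1-based position in the sorted list.
ranks = {"control": [], "experimental": []}
for position, (score, group) in enumerate(pooled, start=1):
    ranks[group].append(position)

rank_sums = {group: sum(values) for group, values in ranks.items()}
print(mean_experimental, ranks, rank_sums)
```

The infinite score simply receives the largest rank, so every observation contributes to a rank-based test.
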
## Mann-Whitney U analyses

| Score (in order) | Group ID | Rank | # times A1 before A2 | # times A2 before A1 |
|---|---|---|---|---|
| 2 | 1 | 1 | 9 | |
| 4 | 2 | 2 | | 8 |
| 5 | 1 | 3 | 8 | |
| 6 | 1 | 4 | 8 | |
| 8 | 1 | 5 | 8 | |
| 9 | 1 | 6 | 8 | |
| 10 | 2 | 7 | | 4 |
| 11 | 2 | 8 | | 4 |
| 12 | 2 | 9 | | 4 |
| 13 | 1 | 10 | 5 | |
| 14 | 2 | 11 | | 3 |
| 15 | 1 | 12 | 4 | |
| 17 | 2 | 13 | | 2 |
| 21 | 1 | 14 | 3 | |
| 42 | 1 | 15 | 3 | |
| 85 | 2 | 16 | | 0 |
| 98 | 2 | 17 | | 0 |
| ∞ | 2 | 18 | | 0 |

## Mann-Whitney U analyses (continued)

- $U_{A1} = 9+8+8+8+8+5+4+3+3 = 56$
- $U_{A2} = 8+4+4+4+3+2 = 25$
- Or use the formula $U_{A2} = n_{A1} n_{A2} - U_{A1}$: $U_{A2} = 9 \times 9 - 56 = 81 - 56 = 25$
- Use the smaller value in the test.

## Mann-Whitney U hypotheses

- H0: The population distribution of A1 scores is identical to the population distribution of A2 scores.
- H1: The population distribution of A1 scores is not identical to the population distribution of A2 scores.

## Mann-Whitney U hypothesis testing

Red Alert! This test is different!

- If $U_{obs} < U_{crit}$ (p. 505-506, Tables C8.A and 8.B): reject H0, accept H1.
- If $U_{obs} > U_{crit}$: fail to reject H0, do not accept H1.

$U_{crit} = 17$, $U_{obs} = 25$: fail to reject H0, do not accept H1.

## Computational formula

$$U_{A1} = n_{A1} n_{A2} + \frac{n_{A1}(n_{A1}+1)}{2} - \sum R_{A1}$$

$$U_{A2} = n_{A1} n_{A2} - U_{A1}$$

where

- $n_{A1}$ = number of scores in group A1
- $n_{A2}$ = number of scores in group A2
- $\sum R_{A1}$ = sum of ranks assigned to scores in group A1

## Example

Will receiving an alcohol education program result in reduced estimated daily alcohol consumption?

- A1: 0.31, 0.53, 0.58, 0.14, 0.16, 0.52, 0.53, 0.02
- A2: 0.41, 0.63, 1.14, 0.21, 0.89, 0.55, 0.89, 0.91, 0.08, 0.59

## Summary

- Non-parametric tests are appropriate when parametric (normal distribution) tests cannot or should not be used.
- Contingency table (chi-square) tests are of the form $\sum (O-E)^2/E$, where E is the "expected" number under the null hypothesis.
- Rank tests are for continuous data when the normality or equal-variance assumption does not hold, and are functions of the ranks.
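
The two ways of obtaining U described for the staircase data, counting and the computational formula, can be compared in code. The following is a minimal sketch in plain Python, not from the slides: the helper names are hypothetical, the ∞ score is modeled as `math.inf`, ties are assumed absent, and the critical value of 17 is taken from the slides rather than computed.

```python
import math


def u_by_counting(a, b):
    """Count the (a, b) pairs in which the score from `a` comes before the score from `b`."""
    return sum(1 for x in a for y in b if x < y)


def u_by_rank_formula(a, b):
    """U_A1 = n1*n2 + n1*(n1+1)/2 - (sum of ranks in group a); assumes no ties."""
    pooled = sorted(a + b)
    rank_sum_a = sum(pooled.index(x) + 1 for x in a)  # 1-based ranks
    n1, n2 = len(a), len(b)
    return n1 * n2 + n1 * (n1 + 1) // 2 - rank_sum_a


control = [2, 5, 6, 8, 9, 13, 15, 21, 42]                 # group A1
experimental = [4, 10, 11, 12, 14, 17, 85, 98, math.inf]  # group A2

u_a1 = u_by_counting(control, experimental)
u_a2 = u_by_counting(experimental, control)
u_a1_formula = u_by_rank_formula(control, experimental)

u_obs = min(u_a1, u_a2)    # use the smaller value in the test
U_CRIT = 17                # critical value as given in the slides
reject_h0 = u_obs < U_CRIT # note the direction: small U rejects H0

print(u_a1, u_a2, u_a1_formula, u_obs, reject_h0)
```

Both routes should agree on $U_{A1}$, and the two U values should sum to $n_{A1} n_{A2}$, which is a quick consistency check on hand calculations.
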