Chapter 9

Course: STAT 301, Fall 2011
School: Purdue
Section 2.5: Data Analysis for Two-Way Tables
Section 9.1: Chi-Square Test for Two-Way Tables

Learning goals for this chapter:
- Find the joint, marginal, and conditional distributions from a two-way table of counts, by hand and with SPSS.
- Determine from the wording of the story whether the question is asking for a joint, marginal, or conditional percentage/probability.
- Know when two-way tables and the chi-square test are the correct statistical technique for a story.
- Perform a hypothesis test for a χ² test, including: stating the hypotheses, obtaining the test statistic and P-value from SPSS, and writing a conclusion in terms of the story.
- Check the assumptions to see if it is appropriate to use a χ² test, using the footnote of the SPSS χ² test output.

Two-way tables and the chi-square test are used when you are studying the association between 2 categorical variables.

The joint distribution of the 2 categorical variables is (cell #)/(total #) for each cell (the inner squares of the table). All the joint proportions together should add to 1.

The marginal distribution allows us to study 1 variable at a time. You get the marginals just by adding across a row or down a column for the specific variable you are interested in. The marginals are written in the margins of the table (far right and very bottom). The marginals for the row variable should add to 1. The marginals for the column variable should add to 1.

Conditional distribution: If you know one variable for sure (you have reduced your world), what are the respective percentages for the other variable? Divide each count in the known row or column by that row or column total. Bar graphs are a good way to demonstrate conditional distributions.

Hypothesis testing with 2-way tables

H0: There is no association between the row and column variables in the population.
Ha: There is an association between the row and column variables in the population.

To test the null hypothesis, compare observed cell counts with expected cell counts calculated under the assumption that the null hypothesis is true.
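Before moving on to the test statistic, the joint, marginal, and conditional distributions defined above can be sketched in a few lines of Python. This is only an illustration, not part of the SPSS workflow used in this course: it borrows the wine and music counts from the example later in these notes, and the names `counts`, `joint`, `marginal_music`, `marginal_wine`, and `conditional_music_given_wine` are made up for this sketch.

```python
# Illustrative sketch: joint, marginal, and conditional distributions
# for a two-way table of counts (wine and music example from these notes).
# All variable and function names are hypothetical, chosen for this example.

# counts[music][wine] = number of bottles sold
counts = {
    "None":    {"French": 30, "Italian": 11, "Other": 43},
    "French":  {"French": 39, "Italian": 1,  "Other": 35},
    "Italian": {"French": 30, "Italian": 19, "Other": 35},
}

# Total # of observations for the table.
n = sum(sum(row.values()) for row in counts.values())

# Joint distribution: each cell count divided by the overall total.
joint = {m: {w: c / n for w, c in row.items()} for m, row in counts.items()}

# Marginal distributions: add across a row or down a column of the joint table.
marginal_music = {m: sum(row.values()) for m, row in joint.items()}
wines = list(next(iter(counts.values())))
marginal_wine = {w: sum(joint[m][w] for m in joint) for w in wines}


def conditional_music_given_wine(wine):
    """Distribution of music type among bottles of one known wine type."""
    column_total = sum(counts[m][wine] for m in counts)
    return {m: counts[m][wine] / column_total for m in counts}


print("total bottles:", n)
for music, row in joint.items():
    print(music, {w: f"{100 * p:.1f}%" for w, p in row.items()})
print("marginal for music:", {m: f"{100 * p:.1f}%" for m, p in marginal_music.items()})
print("marginal for wine:", {w: f"{100 * p:.1f}%" for w, p in marginal_wine.items()})
print("music given Italian wine:", conditional_music_given_wine("Italian"))
```

The joint and marginal values divide by the overall total, while the conditional function divides by the total of the known wine column only, which is the "reduced world" idea described above.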
Test statistic: Chi-Square Test Statistic

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

where the sum is taken over all cells of the table, and

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{n},$$

where n = total # of observations for the table.

The X² test statistic has an approximately chi-square distribution. To use the chi-square table, you need the degrees of freedom, (r-1)(c-1), where r is the number of rows and c is the number of columns. Go to Table F in the back of the book. WE WILL LET SPSS CALCULATE THE TEST STATISTIC AND P-VALUE FOR US. YOU DO NOT NEED TO KNOW HOW TO USE THE TABLE.

P-value for the chi-square test is: $P(\chi^2 \ge X^2)$ (We'll be using SPSS to do the test.)

The chi-square test becomes more accurate as the cell counts increase and for tables larger than 2x2.

For tables larger than 2x2: use the chi-square test whenever
- the average of the expected counts is 5 or more and the smallest expected count is 1 or more
- fewer than 20% of the cells have expected counts of less than 5.

For 2x2 tables: use the chi-square test whenever all 4 expected cell counts are 5 or more.

Example: Market researchers know that background music can influence the mood and purchasing behavior of customers. One study in a supermarket in Northern Ireland compared 3 treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of bottles of French, Italian, and other wine purchased. Here is the 2-way table that summarizes the data in counts (total # of bottles sold = 243):

                      Wine
Music      French   Italian   Other
None         30       11       43
French       39        1       35
Italian      30       19       35

Calculate the joint distribution for music and wine (in %):

                      Wine
Music      French   Italian   Other
None        12.3      4.5     17.7
French      16.0      0.4     14.4
Italian     12.3      7.8     14.4

Calculate the marginal distribution for music (in %):

                      Wine
Music      French   Italian   Other   Marg. for music
None        12.3      4.5     17.7         34.6
French      16.0      0.4     14.4         30.9
Italian     12.3      7.8     14.4         34.6

Calculate the marginal distribution for wine (in %):

                            Music
Wine               None   French   Italian   Marg. for wine
French             12.3    16.0     12.3          40.7
Italian             4.5     0.4      7.8          12.8
Other              17.7    14.4     14.4          46.5
Marg. for music    34.6    30.9     34.6         100

Questions (joint, marginal, conditional?):
1. What percent of all wine bought was Italian with French music playing in the store?
2. Of the Italian wine purchased, what percent was from a store playing French music?
3. What percent of wine bought was Italian?
4. What percent of the wine purchased from French music-playing stores was French?
5. What percent of wine was purchased from a store with no music playing?

Using SPSS, set up the data so that you have a wine column, a music column, and a purchase column (where you will input the counts from inside the table).

Wine      Music     Purchase
French    None         30
Italian   None         11
Other     None         43
French    French       39
Italian   French        1
Other     French       35
French    Italian      30
Italian   Italian      19
Other     Italian      35

Then go to Data --> Weight Cases. Click "Weight cases by" and then move purchase into the Frequency Variable box. Click OK.

Do Analyze --> Descriptive Statistics --> Crosstabs. Make sure Observed is checked. Put wine into the Rows box and music into the Columns box. Click OK. You will get:

Type of Wine * Type of Music Crosstabulation
Count
                       Type of Music
Type of Wine    French   Italian   None   Total
French            39       30       30      99
Italian            1       19       11      31
Other             35       35       43     113
Total             75       84       84     243

Then if you want the %s for joint and marginal distributions instead of counts, go back to your data and do Analyze --> Descriptive Statistics --> Crosstabs --> (your rows and columns should still be entered from the previous step) --> Click Cells --> Click Total. Also, un-click Observed so your table won't also include the counts and be too crowded. Click Continue and then OK. You will get:

Type of Wine * Type of Music Crosstabulation
% of Total
                       Type of Music
Type of Wine    French   Italian   None    Total
French          16.0%    12.3%     12.3%    40.7%
Italian           .4%     7.8%      4.5%    12.8%
Other           14.4%    14.4%     17.7%    46.5%
Total           30.9%    34.6%     34.6%   100.0%

Is there a relationship in the population between the type of wine purchased and the type of music that is playing?
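Before running the test in SPSS, the expected counts and the X² statistic defined earlier can be worked out directly from the table of counts. The sketch below is an illustration only, not the procedure SPSS uses internally; the function name `chi_square_statistic` and its return layout are assumptions made for this example. For the wine and music table it should agree with the Pearson Chi-Square value (18.279) and the minimum expected count (9.57) that SPSS reports for this example.

```python
# Illustrative sketch: expected counts and the X^2 statistic for a two-way
# table, computed directly from the formulas in these notes.
# The function name and return layout are hypothetical.

def chi_square_statistic(observed):
    """observed: list of rows of counts. Returns (x2, df, expected)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total # of observations for the table

    # expected count = row total x column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # X^2 = sum over all cells of (observed - expected)^2 / expected
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected


# Rows: wine (French, Italian, Other); columns: music (French, Italian, None)
wine_by_music = [
    [39, 30, 30],
    [1, 19, 11],
    [35, 35, 43],
]

x2, df, expected = chi_square_statistic(wine_by_music)

# Quantities that the SPSS footnote reports for the assumption check.
smallest_expected = min(min(row) for row in expected)
cells_below_5 = sum(e < 5 for row in expected for e in row)

print(f"X^2 = {x2:.3f}, df = {df}")
print(f"smallest expected count = {smallest_expected:.2f}")
print(f"cells with expected count below 5: {cells_below_5}")
```

The last two quantities correspond to what the SPSS footnote lists, so the same sketch can be used to judge whether the chi-square test is appropriate for a table.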
Perform a significance test, and write a short summary of your conclusion.

Hypotheses:

Test statistic:

P-value:

Conclusion in terms of the story:

Was it appropriate to use the chi-square test here? Justify your answer.

To make SPSS do the hypothesis test, go back to Analyze --> Descriptive Statistics --> Crosstabs --> Cells. Then click Total to remove its check mark. Also click Expected under Counts. Click Continue. Then click Statistics --> Chi-Square --> Continue --> OK. You will get:

Chi-Square Tests
                      Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square    18.279a     4          .001
Likelihood Ratio      21.875      4          .000
N of Valid Cases      243
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.57.

Use the Pearson Chi-Square to get your X² test statistic, and the Asymp. Sig. to get the P-value.

Example: Psychological and social factors can influence the survival of patients with serious diseases. One study examined the relationship between survival of patients with coronary heart disease and pet ownership. Each of 92 patients was classified as having a pet or not and by whether they survived for one year. The researchers suspect that having a pet might be connected to the patient status. Here are the data:

                   Pet ownership
Patient Status     No     Yes
Alive              28     50
Dead               11      3
Total              39     53

a) Find the joint and marginal distributions (in probabilities) of patient status and pet ownership.

                   Pet ownership
Patient Status     No       Yes      Marg. for status
Alive              0.304    0.543         0.848
Dead               0.120    0.033         0.152
Marg. for pets     0.424    0.576

b) Assuming a patient is still alive, what is the probability he owns a pet? Is this a joint, marginal, or conditional probability?

c) What is the probability a patient is still alive and owns a pet? Is this a joint, marginal, or conditional probability?

d) What is the probability a patient owns a pet? Is this a joint, marginal, or conditional probability?
e) State the hypotheses for a χ² test of this problem, find the X² test statistic, its degrees of freedom, and the P-value. State your conclusion in terms of the original problem.

Hypotheses:

Test statistic:

P-value:

Conclusion in terms of the story:

Chi-Square Tests
                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              8.851(b)    1      .003
Continuity Correction(a)        7.190       1      .007
Likelihood Ratio                9.011       1      .003
Fisher's Exact Test                                              .006         .004
Linear-by-Linear Association    8.755       1      .003
N of Valid Cases                92

Student Handout for M&Ms/Skittles Activity (Chapter 9: Two-Way Distributions)

Part 1: Plain vs. Peanut M&Ms

1. Your data for plain (mine for peanut), in counts:

          Brown   Yellow   Red   Blue   Orange   Green   Total
Plain
Peanut      2       5       0     3       8        4      22
Total

Overall total number of plain and peanut M&Ms counted:

2. Joint Distribution (in white boxes). Divide each count above by the overall total of M&Ms.

                     Brown   Yellow   Red   Blue   Orange   Green   Marginal for flavor
Plain
Peanut
Marginal for color                                                        100%

3. Marginal Distributions (above in shading). Add down the columns and across the rows. The bottom numbers should add to 100%, and the right column should add to 100%.

4. Conditional distribution of flavor for green M&Ms (you know the M&M is green, now what is the chance it is ____). The denominator will be the same for both of these calculations. These two percentages should add to 100%.

Plain:
Peanut:

5. Bar graph for the conditional distributions above (you will have 2 bars on 1 graph):

6. Conditional distribution of color for plain M&Ms. The denominator will be the same for all 6 calculations. All 6 add to 100%.

Brown   Yellow   Red   Blue   Orange   Green

7. Sketch a bar graph for the conditional distribution of color for plain M&Ms. You will have 6 bars on the graph.

8. Conditional distribution of color for peanut M&Ms. The denominator will be the same for all 6 calculations. All 6 add to 100%.

Brown   Yellow   Red   Blue   Orange   Green

9. Sketch a bar graph for the conditional distribution of color for peanut M&Ms. You will have 6 bars on the graph. Use the same y-axis scale that you used for the bar graph for plain M&Ms so that you can easily compare your results.

How do they compare?

In order to do a hypothesis test, we need a large data set, like one from the whole class.

          Brown   Yellow   Red   Blue   Orange   Green
Plain      147     302     264    407    330      373
Peanut      69     110      70    162    148      123

10. Hypotheses for the M&Ms χ² hypothesis test. Be sure to state whether your conclusion refers to the population or the sample.

11. Test statistic and P-value for the χ² hypothesis test:

12. Conclusion for the χ² hypothesis test (α = 0.01) in terms of the story.

13. Was it appropriate to use the chi-square test here?

Part 2: M&Ms vs. Skittles

Table of counts for the whole class:

            Yellow   Non-yellow   Total
M&Ms          302       1521       1823
Skittles      361       1351       1712
Total         663       2872       3535

14. Hypotheses for the χ² test:

15. Test statistic and P-value:

16. Conclusion (α = 0.01) in terms of the story.

17. Was it appropriate to use the chi-square test here?
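For worksheet items 11 and 15, as for the SPSS outputs earlier in these notes, the P-value is the upper-tail area P(χ² ≥ X²) of a chi-square distribution with (r-1)(c-1) degrees of freedom. The sketch below shows one way that tail area could be computed using only Python's standard `math` module, without Table F or SPSS. The function name `chi_square_p_value` is an assumption made for this illustration, and SPSS may well compute the value differently. As a check, it is applied to the two Pearson Chi-Square values SPSS reported above: 18.279 with 4 df (wine and music) and 8.851 with 1 df (pet ownership).

```python
import math

# Illustrative sketch: upper-tail area of the chi-square distribution.
# The function name is hypothetical; this is not SPSS's own routine.


def chi_square_p_value(x2, df):
    """Return P(chi-square with df degrees of freedom >= x2), for x2 >= 0."""
    z = x2 / 2.0
    # Closed-form starting points for the tail area:
    #   df = 2: exp(-z)          df = 1: erfc(sqrt(z))
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-z)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(z))
    # Each step adds the term that raises the degrees of freedom by 2.
    while a < df / 2.0:
        tail += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return tail


# Pearson Chi-Square values and df taken from the SPSS outputs in these notes.
p_wine = chi_square_p_value(18.279, 4)   # SPSS Asymp. Sig.: .001
p_pets = chi_square_p_value(8.851, 1)    # SPSS Asymp. Sig.: .003

print(f"wine and music: P-value = {p_wine:.4f}")
print(f"pet ownership:  P-value = {p_pets:.4f}")
```

The P-value is then compared with the significance level stated in the question (α = 0.01 on this worksheet) to decide whether to reject H0.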