Statistical (Chi-square) Analysis of Data


BIO325L – Genetics Laboratory FALL 2011 Dr J. Bryant

Data Analysis: Introduction To Analysis

Why Do We Need Statistics?

Biological questions are often answered through the interpretation of experimental results. However, experimental results may not be conclusive; in fact, most experimental results are not readily interpretable. For instance, experimental reagents may not have worked, or results may vary between time periods and between labs. Statistics help experimenters determine how to collect information appropriately, how to summarize it (so that the data can be easily understood and presented) and how to make inferences from their data about the real world. Statistics can be thought of as a formal way of thinking critically about the world. Critical thinking is defined as the amassing, summarization and understanding of information in order to make decisions about the world.

What Are Statistics?

Statistics consists of a series of protocols to collect information, summarize information and infer from sample information back to the population from which the data was originally gathered. Statistics offers researchers a way to collect information so that a small number of sample cases represents the much larger population as well as possible. The sample is likely to contain error due to sampling, because cases are frequently selected at random in order to minimize sample bias. Sampling error is the innate inability of the sample to describe the true population from which it was selected. For instance, if we randomly selected 4 students from the UT population to describe the mean height of students, we could by chance choose students who were very tall or very short, and our estimate of UT student height would be inaccurate.
Therefore, there would be an error in the data that comes from whom we happened to sample; this is called sampling error. Data collection is optimized to minimize both sample bias and sampling error.

Once data has been collected it needs to be summarized, as we cannot understand a large amount of numerical information from a large number of cases just by looking at the numbers. Information can be summarized using summary indexes, such as the sample mean, sample median or sample standard deviation. Alternatively, sample data can be summarized graphically using charts such as bar charts, histograms or line graphs. Summary indexes have the advantage of being quantitative, but they do not always let the experimenter see the whole picture of how the data looks. Summary charts let the experimenter see a summary of the whole data, but graphs tend not to be quantitative. Therefore, data is typically summarized by both tables of summary indexes and graphs, so that the data is represented both qualitatively and quantitatively.

Which summary indexes and graphs to use depends on the type of variable being presented. Data is recorded as a measured quantity called the variable. Variables are either quantitative or categorical. Quantitative variables are measured on the subjects as number values, for instance blood pressure or weight. Categorical variables are measured on the subjects as category values, such as gender, eye color and hair color. Quantitative variables are typically summarized by means and standard deviations, and graphically by histograms or box-plots. Categorical data is not typically summarized with number summaries (other than frequencies or proportions) and is typically summarized graphically using bar charts.
Once the sample has been understood using summary tables and graphs, the experimenter would like to know whether the results are correct and representative of the population from which the sample was drawn. It is possible that the sample results are the product of sample bias or sampling error. For instance, the four people we chose previously to describe the height of UT students may be atypical and so may not describe the height of UT students. To determine this we need to run inferential statistics, which measure the degree of sampling error in the sample.

What Is Inferential Statistics?

Inferential statistics is the least important branch of statistics, ranking after the collection and the summarization of data. However, inferential statistics forms a vital part of the interpretation of experimental results. Without inferential statistics it is not possible to determine whether the results of an experiment reflect the population. In order to infer from the sample to the population, we need to determine whether the experimental results can be explained by sampling error.

What Is Chi-Square Analysis?

Chi-square analysis is used to determine statistical significance for sample data composed of categorical nominal variables. For instance, gender is a categorical nominal variable with two categories: its values are recorded as male or female, and there is no order to the categories. Sample differences are said to be significant when they cannot be explained by sampling error, so that we can infer that they are due to true population differences. As the difference between a sample and a control group increases, it becomes less likely that the difference can be explained solely by sampling error.
Also, as the sample size increases there is a lower likelihood that the results can be attributed to sampling error only, as the chance that the sample is made up of anomalous cases decreases.

To determine whether the experimental results are due to sampling error only, we calculate a test statistic. The test statistic can be thought of as a measure of the standardized signal strength, or standardized difference, between the samples. Most inferential test statistics are ratios of a measure of signal over a measure of noise. The signal is typically a measure of how different the samples are. Noise typically contains components of the sample dispersion and sample size (i.e. sampling error). The value of the test statistic will be larger for bigger sample differences and larger sample sizes, and smaller for smaller sample differences and larger sample dispersions.

What is the chi-square test statistic?

In order to judge statistical significance for categorical nominal variables a chi-square inferential test is conducted. To assess the magnitude of the difference between samples the test statistic is calculated. For the chi-square test the test statistic is:

$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

The test statistic measures how dissimilar the observations are from the expected values. The differences are squared because some of them are negative; without squaring, positive and negative differences would cancel and the sum could get smaller rather than larger. Each squared difference is divided by the expected value, which brings it back to the scale of the counts and standardizes the resulting test statistic. The fractional squared differences are then summed over all categories in the data set, which gives the complete chi-square test statistic.
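The summation described above can be sketched in a few lines of Python. This is a minimal illustration only, not part of the handout: the function name `chi_square_statistic` and the example counts are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            # A zero expected count would cause a division by zero.
            raise ValueError("expected counts must be positive")
        total += (o - e) ** 2 / e
    return total


# Hypothetical counts for illustration only (not taken from the handout).
observed = [30, 20]
expected = [25, 25]
print(chi_square_statistic(observed, expected))  # 2.0
```

Each category contributes (5 x 5) / 25 = 1 here, so the statistic is 2.0. When observed and expected counts agree exactly the statistic is zero, and it grows as they diverge.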
As expected, when the differences between the observed and expected frequencies are large, the value of the test statistic will be large. The test statistic is now translated into a p-value, which is determined from what is called the sampling distribution.

What is a p-value?

The p-value is the probability that sampling error alone would produce sample differences as large as those seen, or larger. If the p-value is very small, the sample differences are unlikely to be explained by sampling error alone, and we infer that the samples originated from different populations. The p-value is calculated as the area of the sampling distribution which lies above the test statistic. The sampling distribution is the distribution of the frequencies of sampling outcomes from a large number of samplings of the same population, with a given sample size. As the sample size increases, we expect the outcomes of samplings to be closer to the population parameter, so the standard deviation of the sampling distribution should decrease. So long as we can calculate the shape and area of the sampling distribution, we can calculate the p-value (the probability of explaining the sample differences by sampling error only), assuming that the experimental samples differ by sampling error only.

What are degrees of freedom?

As the sampling distribution changes shape with a changing sample size, it is necessary to take the sample size or sample complexity into account when looking up a p-value. The degrees of freedom, rather than the sample size, are used to look up p-values from the sampling distribution. Degrees of freedom represent the number of independent pieces of information in the data set and prevent calculations which could give silly numbers.

So why do we care about degrees of freedom?

The degrees of freedom are used to assess the amount of information within the data set.
They are used because they stop us calculating silly numbers. There is no point in calculating the standard deviation of height from a single person, as the standard deviation is a measure of the average distance of cases from the mean. The standard deviation is meaningless for a single case, which has zero degrees of freedom, so when we use the degrees of freedom in our calculation we cannot get a standard deviation from a single case. We also use degrees of freedom because p-values from tables depend upon how large our sample is or how complicated the sample information is. The sample size or complexity of the data affects the shape of the sampling distribution, so we need to know the degrees of freedom in order to determine which sampling distribution to look up.

Degrees of freedom for a quantitative continuous variable (n-1)

Degrees of freedom represent the pieces of independent information which make up the data set. For instance, if we measure 10 people for height, the variable is quantitative, height is recorded for each person as a number value, and there are 10 numerical measurements of height. In this example there are 9 independent pieces of information in the sample (n-1): we could calculate the last person's height by subtracting the first 9 people's heights from the total height. For quantitative data, once we know n-1 pieces of information (and the total), the last case is fixed and its value can be calculated.

Degrees of freedom for a single categorical variable (k-1)

If we are interested in a person's gender and sampled 10 people for gender, the variable would be categorical nominal and people are recorded as male or female. The number of people, or frequency, of a specific gender becomes the measure we are interested in. In this case there are two categories of information, of which one is independent and one is dependent; there are, therefore, k-1 pieces of independent information.
The frequency of the second category can be calculated as the total minus the frequency of the first category, i.e. if you know the number of females you can calculate the number of males by subtracting the number of females from the total.

Degrees of freedom for two categorical variables (k-1)(r-1)

If we are interested in the frequency of handedness and gender, and sampled 10 people for both, we would have results from two categorical nominal variables: we would record whether people were male/female and left/right handed. In this case the degrees of freedom would be (k-1)(r-1), where k and r are the numbers of categories of the two variables.

Using Hypotheses

In order to actually carry out our inferential test we need to formulate our questions and keep track of all of the questions we may ask. It is very unlikely that a single inferential test will allow us to completely infer from our sample to the population, as there may be several potential models the data may conform to. If the sample data were completely clear and definitive we wouldn't need inferential statistics, as the results would always be obvious and consistent.

There are 4 hypotheses in each inferential statistical test; in order these are the informal null, the formal null, the informal alternative and the formal alternative. The informal hypotheses allow you to keep track of which question you are asking of your data and to translate your biological data into your inferential test. This is particularly important for categorical data, as the null hypothesis is used to calculate the expected frequencies needed to obtain the test statistic. The formal hypotheses are used to translate the biological question into the indexes which will be inferentially tested in the population.
This is important as a number of indexes could be assessed for any data set, and if the wrong index is chosen the results may not reflect the sample data, so errors in analysis can be made.

What Hypotheses Are Used?

There are four hypotheses proposed for each round of analysis (i.e. each biological model that the data is tested against). The order of the hypotheses is:

1) The informal null (H0*:)
2) The formal null (H0:)
3) The informal alternative (HA*:)
4) The formal alternative (HA:)

The informal hypotheses aid in the translation of the biological question into an inferential statistical analysis. They also help you to formulate your inference and ensure that you do not extrapolate an inference beyond the population from which you collected data. The formal hypotheses aid in the translation of the biological question into the math which will be conducted in the analysis, and emphasize the parameter about which the inference will be made. The formal null is essential for calculating the expected frequencies in the Chi-square goodness of fit test when the null is not that all categories have the same frequency. The alternative hypotheses are essential to highlight whether you are running a directional hypothesis test, and aid you in reaching an appropriate directional inference if one is possible.

Why use the null hypothesis?

The data is tested for consistency against the null. For example, we could test the sample mean height of men and women to determine whether the population height of men and women is different. The sample data could have a lot of sampling error and so may not tell us about the population. Therefore, we test the data against the null hypothesis (i.e. H0*: There is no difference in the population mean height of men and women).
We cannot determine directly whether the population height of men and women differs, as we can only see the world through the sample data, which always has some sampling error. Therefore, we need to determine how reliably the sample information shows what is going on in the population. If the sample differences are not due to sampling error only, we reject the null (that there was no difference) and say that there is sufficient evidence (from the sample) that the population heights of men and women are different. Note: rejecting the null doesn't mean that we have proved that the population heights of men and women are different; we only have data which is consistent with that view.

Running through the chi-square analysis:

To make things clearer we will run through a number of different data sets and test a number of hypotheses, to determine what the sample data tells us about reality, that is, the population. To carry out a successful and complete inferential analysis, it is important that you follow all of the inferential steps in order and don't just plug numbers into equations. To fully understand your data you may have to run several complete inferential tests for each data set, to determine which model the data supports. Therefore, to understand and piece your results together it is vital that you label each analysis (write out your hypotheses and inferences carefully) so you don't get mixed up.

CONFIRMING IF ALU PV92 XX16 GENE FREQUENCY IS IN H-W EQUILIBRIUM:

Data was pooled from a previous class and the frequency of the various genotypes of the 300 bp Alu insert within the PV92 locus on chromosome 16 was recorded. We wish to know if the population is in Hardy-Weinberg Equilibrium (HWE).
However, the sample numbers from the pooled data are not conclusive, as the pooled sample size is relatively small and the frequencies are not obviously and exactly consistent with the Hardy-Weinberg Equilibrium frequencies. The sample discrepancies could indicate either that the data is inconsistent with HWE or that the discrepancies are just due to sampling error (as the sample size was relatively small). The pooled class data for the 300 bp Alu insert within the PV92 locus is listed below, and we will run through the complete analysis to determine whether the data is inconsistent with the HWE distribution or whether the discrepancies can be explained by sampling error.

                       +/+    -/-    +/-    Total (Contributing)
Observed Frequencies   37     80     20     137

In order to test the data against the HWE model we must first calculate the frequencies of the plus alleles (p) and the minus alleles (q) from the data.

Frequency of plus alleles (p):
p = total number of plus alleles / total number of alleles
p = ((2 x homozygous positive) + heterozygous) / total number of alleles
p = ((2 x 37) + 20) / (2 x (37 + 80 + 20))
p = 94 / 274
p = 0.34307

Frequency of minus alleles (q):
q = total number of minus alleles / total number of alleles
q = ((2 x homozygous negative) + heterozygous) / total number of alleles
q = ((2 x 80) + 20) / (2 x (37 + 80 + 20))
q = 180 / 274
q = 0.65693

H0*: The data is consistent with the HWE
H0: Pr{p²} = 0.11769, Pr{2pq} = 0.45074, Pr{q²} = 0.43156
HA*: The data is not consistent with the HWE
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05 (α is our threshold for rejecting the null and is our protection against a false positive)

In order to calculate the expected frequencies we must multiply the probabilities stipulated in the formal null by the total number of cases (137).
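The allele-frequency and expected-count arithmetic above can be sketched as follows. This is only an illustration under stated assumptions: the function name `hwe_expected` and its parameter names are hypothetical, and the genotype labels follow the table above (+/+ = 37, +/- = 20, -/- = 80).

```python
def hwe_expected(n_pp, n_pm, n_mm):
    """Return (p, q, expected) for a two-allele locus.

    n_pp, n_pm, n_mm are the observed counts of +/+, +/- and -/- genotypes.
    expected maps each genotype to its Hardy-Weinberg expected count.
    """
    n = n_pp + n_pm + n_mm                 # number of individuals
    p = (2 * n_pp + n_pm) / (2 * n)        # frequency of the plus allele
    q = (2 * n_mm + n_pm) / (2 * n)        # frequency of the minus allele
    expected = {
        "+/+": p * p * n,                  # p^2 x total
        "+/-": 2 * p * q * n,              # 2pq x total
        "-/-": q * q * n,                  # q^2 x total
    }
    return p, q, expected


p, q, expected = hwe_expected(n_pp=37, n_pm=20, n_mm=80)
print(round(p, 5), round(q, 5))
for genotype, count in expected.items():
    print(genotype, round(count, 2))
```

Keying the expected counts by genotype makes it harder to pair an observed count with the expected count of a different genotype when the chi-square terms are formed.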
                       +/+      -/-      +/-      Total (Contributing)
Observed Frequencies   37       80       20       137
Expected Frequencies   16.12    59.12    61.75    137

The expected frequencies are p² x 137 for +/+, q² x 137 for -/- and 2pq x 137 for +/-.

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$\chi^2 = \frac{(37 - 16.12)^2}{16.12} + \frac{(80 - 59.12)^2}{59.12} + \frac{(20 - 61.75)^2}{61.75}$

$\chi^2 = \frac{435.9744}{16.12} + \frac{435.9744}{59.12} + \frac{1743.0625}{61.75}$

$\chi^2 = 27.05 + 7.37 + 28.23$

$\chi^2 = 62.65$

Degrees of freedom = k-1 = 3-1 = 2
p-value (from the Chi-Square table) < 0.005

Conclude the test: The p-value (< 0.005) is < α, therefore the null is rejected.

Inference (it is vital to make a detailed inference in words, as it puts the inferential test results back into terms of biology): There is sufficient evidence to say that the 300 bp Alu frequency within the PV92 locus on XX16 is not in Hardy-Weinberg equilibrium.

TESTING DATA FROM AN UNKNOWN DROSOPHILA MUTATION:

Suppose we have derived F2 progeny from two unknown sets of virgin F1 flies and counted the frequency of the F2 phenotypes. In order to determine the genotype of the flies we may have to carry out several rounds of inferential Chi-square analysis, to determine which model the data is consistent with.

Sibling Cross(es)   Wild-Type           Mutant              Additional
Setup Label         Male     Female     Male     Female     Notes
Unknown F2          47       42         2        0          -

We need to run inferential analysis because the results will not exactly conform to a predicted gene frequency, as all scientific data contains error. In order to determine which model of genetic inheritance the data conforms to, we examine the frequencies of the phenotypes and test these against various chi-square hypothesis tests. The data provided is fairly obvious, so that we can run through the various analyses without much trouble.

Testing The Unknown Against An Autosomal Dominant Inheritance Model:

We can test whether the data is consistent with an autosomal dominant inheritance model, in which we would expect 75% of progeny to be mutant and 25% of flies to be wild type.
In addition, the gender of the flies should have no influence upon the distribution of the mutant and wild-type frequencies.

NOTE: There are two ways we can test this model using a Chi-square analysis. As there are two variables it would be better to test the dependence of the two variables directly. However, this involves using Chi-square contingency tables, so we will look at that approach at the end of this section, as an optional test.

We can test the fit of the data to the model that the frequency of mutant males equals that of mutant females (both 75%/2 = 37.5%) and that the frequencies of wild type males and females are equal (both 25%/2 = 12.5%). This model involves testing two factors (gender and another phenotype) together in the same null hypothesis.

H0*: The data is consistent with an autosomal dominant inheritance model, in which the frequencies of males and females are equal and there are 3 times as many mutants as wild type progeny.
H0: Pr{Male & Mutant} = 0.375, Pr{Female & Mutant} = 0.375, Pr{Male & Wild type} = 0.125, Pr{Female & Wild Type} = 0.125
HA*: The data is not consistent with an autosomal dominant inheritance model.
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05 (α represents the threshold for rejecting the null and is the protection against making a false positive inference)

In order to calculate the expected frequencies we must multiply the probabilities stipulated in the formal null by the total number of cases.
            Wild-Type            Mutant
            Male      Female     Male      Female     Total
Observed    47        42         2         0          91
Expected    11.375    11.375     34.125    34.125     91.00

Under this null each wild type category is expected to hold 0.125 x 91 = 11.375 flies and each mutant category 0.375 x 91 = 34.125 flies.

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$\chi^2 = \frac{(47 - 11.375)^2}{11.375} + \frac{(42 - 11.375)^2}{11.375} + \frac{(2 - 34.125)^2}{34.125} + \frac{(0 - 34.125)^2}{34.125}$

$\chi^2 = \frac{1269.141}{11.375} + \frac{937.891}{11.375} + \frac{1032.016}{34.125} + \frac{1164.516}{34.125}$

$\chi^2 = 111.573 + 82.452 + 30.242 + 34.125$

$\chi^2 = 258.392$

Degrees of freedom = k-1 = 4-1 = 3
p-value (from the Chi-Square table) < 0.005

Conclude the test: The p-value (< 0.005) is < α, therefore the null is rejected.

Inference (it is vital to make a detailed inference in words, as this puts the inferential test results back into terms of biology): There is sufficient evidence to say that the phenotype data deviates from an autosomal dominant inheritance model.

Testing The Unknown Against An Autosomal Recessive Inheritance Model:

We can test whether the data is consistent with an autosomal recessive inheritance model, in which we would expect 25% of progeny to be mutant and 75% of flies to be wild type. In addition, the gender of the flies should have no influence upon the distribution of the mutant and wild-type frequencies.

We can test the fit of the data to the model that the frequency of mutant males equals that of mutant females (both 25%/2 = 12.5%) and that the frequencies of wild type males and females are equal (both 75%/2 = 37.5%). This model involves testing two factors (gender and another phenotype) together in the same null hypothesis.

H0*: The data is consistent with an autosomal recessive inheritance model, in which the frequencies of males and females are equal and there are 3 times as many wild type progeny as mutant progeny.
H0: Pr{Male & Mutant} = 0.125, Pr{Female & Mutant} = 0.125, Pr{Male & Wild type} = 0.375, Pr{Female & Wild Type} = 0.375
HA*: The data is not consistent with an autosomal recessive inheritance model.
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05

In order to calculate the expected frequencies we must multiply the probabilities stipulated in the formal null by the total number of cases.

            Wild-Type            Mutant
            Male      Female     Male      Female     Total
Observed    47        42         2         0          91
Expected    34.125    34.125     11.375    11.375     91.00

Under this null each wild type category is expected to hold 0.375 x 91 = 34.125 flies and each mutant category 0.125 x 91 = 11.375 flies.

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$\chi^2 = \frac{(47 - 34.125)^2}{34.125} + \frac{(42 - 34.125)^2}{34.125} + \frac{(2 - 11.375)^2}{11.375} + \frac{(0 - 11.375)^2}{11.375}$

$\chi^2 = \frac{165.766}{34.125} + \frac{62.016}{34.125} + \frac{87.891}{11.375} + \frac{129.391}{11.375}$

$\chi^2 = 4.858 + 1.817 + 7.727 + 11.375$

$\chi^2 = 25.777$

Degrees of freedom = k-1 = 4-1 = 3
p-value (from the Chi-Square table) < 0.005

Conclude the test: The p-value (< 0.005) is < α, therefore the null is rejected.

Inference (it is vital to make a detailed inference in words, as this puts the inferential test results back into terms of biology): There is sufficient evidence to say that the phenotype data deviates from an autosomal recessive inheritance model.

NOTE: In this test we have a large test statistic, which means the data strongly disagrees with the null. The p-value on more extensive Chi-Square tables would actually be very small, indicating that the probability of explaining this discrepancy by chance alone is exceedingly small.

Testing The Unknown Against A Wild-Type By Wild-Type Mating Model:

We can test whether the phenotype distribution is consistent with a wild type back cross model, in which we would expect all flies to be wild type and the gender of the flies to have no influence upon the distribution of the wild type phenotype. We can also test the fit of the data to the model that the frequencies of wild type males and females are equal and together make up 100% of the progeny. This model involves testing two factors (gender and eye color) together in the null hypothesis.
H0*: The data is consistent with an autosomal wild type model, in which the frequencies of males and females are equal and there is a 100% frequency of brick red eye color.
H0: Pr{Male & Wild type eye color} = 0.50, Pr{Female & Wild Type eye color} = 0.50
HA*: The data is not consistent with the wild type back cross model.
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05 (α represents the threshold for rejecting the null and is the protection against making a false positive inference)

In order to calculate the expected frequencies we must multiply the probabilities stipulated in the formal null by the total number of cases.

            Wild-Type            Mutant
            Male      Female     Male      Female     Total
Observed    47        42         2         0          91 (NOT 89!)
Expected    45.50     45.50      ND        ND         91.00

NOTE: In order for the calculation of the Chi-square statistic to work, we MUST carry out a little trick when an expected value would be zero; this ensures that we don't have a division by zero in the calculation. The only way to still use the Chi-square analysis without unfairly inflating the test statistic is to include any observations which fall within the zero-expected categories in the total number of cases. The categories where the expected values should be zero are then EXCLUDED from the Chi-square test statistic.

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$\chi^2 = \frac{(47 - 45.50)^2}{45.50} + \frac{(42 - 45.50)^2}{45.50}$

$\chi^2 = \frac{2.25}{45.50} + \frac{12.25}{45.50}$

$\chi^2 = 0.05 + 0.27$

$\chi^2 = 0.32$

Degrees of freedom = k-1 = 2-1 = 1
p-value (from the Chi-Square table) > 0.10

Conclude the test: The p-value (> 0.10) is > α, therefore the null is NOT rejected.

Inference: There is insufficient evidence to say that the data deviates from an autosomal wild type mechanism of inheritance. Therefore, we can say that the data is consistent with an autosomal wild type inheritance model.

ALTERNATIVE METHOD Of Testing The Unknown Against A Wild-type Backcross Model:

We can test whether the data is consistent with a wild type back cross inheritance model, in which we would expect 100% of progeny to be wild type and the gender of the flies to have no influence upon the distribution of the wild-type frequencies. In this way of fitting the data we test whether the two variables, gender and wild type phenotype, are associated.

H0*: The data is consistent with a wild type back cross model, in which the frequencies of males and females are equal and 100% of the progeny should be wild type.
H0: Pr{Mutant|Male} = Pr{Mutant|Female}, Pr{Wild type|Male} = Pr{Wild Type|Female}
ALTERNATIVELY we could also write: Pr{Male|Mutant} = Pr{Male|Wild type}, Pr{Female|Mutant} = Pr{Female|Wild type}
HA*: The data is not consistent with a wild type back cross model.
HA: Pr{Mutant|Male} ≠ Pr{Mutant|Female}, Pr{Wild type|Male} ≠ Pr{Wild Type|Female}
α = 0.05

In order to calculate the expected frequencies we can use the equation:

Expected = (Row Total * Column Total) / Grand Total

The data is presented as an amended table showing the two variables (one as rows and one as columns), with the expected values in parentheses.

             Male           Female         Total
Wild-Type    47 (47.923)    42 (41.077)    89
Mutant       2 (1.077)      0 (0.923)      2
Total        49             42             91

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$\chi^2 = \frac{(47 - 47.923)^2}{47.923} + \frac{(42 - 41.077)^2}{41.077} + \frac{(2 - 1.077)^2}{1.077} + \frac{(0 - 0.923)^2}{0.923}$

$\chi^2 = \frac{0.852}{47.923} + \frac{0.852}{41.077} + \frac{0.852}{1.077} + \frac{0.852}{0.923}$

$\chi^2 = 0.018 + 0.021 + 0.791 + 0.923$

$\chi^2 = 1.753$

Degrees of freedom = (r-1)(k-1) = 1 x 1 = 1
p-value (from the Chi-Square table) > 0.1

Conclude the test: The p-value (> 0.1) is > α, therefore the null is NOT rejected.

Inference: There is insufficient evidence to say that the proportion of the wild type phenotype is influenced by fly gender.
Therefore, the phenotype of the flies is consistent with a wild type phenotype which is not sex-linked.

STUDENT QUESTIONS - UNDERSTANDING CHI-SQUARE:

What is statistics?
When would you run a Chi-square analysis?
What is Chi-square analysis?
In the context of the Chi-square test what is the null hypothesis?
In the context of the Chi-square test what is the alternative hypothesis?
What is the Chi-square test statistic?
What does the Chi-square test statistic show?
What is a p-value?
In the context of the Chi-square test when would you reject the null hypothesis?
In the context of the Chi-square test when would you not reject the null hypothesis?
The null hypothesis may not be rejected. However, the null hypothesis is never accepted. Explain why not.
In the Chi-square test, what does a failure to reject the null hypothesis mean?
In the Chi-square test, what does rejecting the null hypothesis mean?

Analysis of Your Drosophila Inheritance Data:

Analyze your inheritance patterns using Chi-square analysis to determine the model which your data conforms to. In your analysis you should carry out at least two Chi-square analyses for each set of matings, to demonstrate which model your data is consistent with and a model which is not supported by your data. For some patterns you may need to test more than 2 models. The homework assignment will indicate when these cases arise.
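Testing one data set against several models, as the assignment asks, can be sketched in Python using only the standard library. This is an illustrative sketch rather than a prescribed method: the names `chi2_sf` and `goodness_of_fit` are hypothetical, the counts are the unknown F2 Drosophila counts from the worked examples, and the p-value is computed from the chi-square distribution directly instead of being read from a table. Categories with a null probability of zero stay in the total but are left out of the sum, following the handout's note on zero expected values.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    # Step up two degrees of freedom at a time with the recurrence
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed, null_probs):
    """Return (statistic, degrees of freedom, p-value) for one null model."""
    total = sum(observed)  # zero-expected categories still count in the total
    pairs = [(o, pr * total) for o, pr in zip(observed, null_probs) if pr > 0]
    stat = sum((o - e) ** 2 / e for o, e in pairs)
    df = len(pairs) - 1    # k - 1 over the categories actually tested
    return stat, df, chi2_sf(stat, df)


# Order: wild type male, wild type female, mutant male, mutant female.
observed = [47, 42, 2, 0]
models = {
    "autosomal dominant": [0.125, 0.125, 0.375, 0.375],
    "autosomal recessive": [0.375, 0.375, 0.125, 0.125],
    "wild type x wild type": [0.5, 0.5, 0.0, 0.0],
}
ALPHA = 0.05

for name, probs in models.items():
    stat, df, p = goodness_of_fit(observed, probs)
    verdict = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{name}: chi2={stat:.3f}, df={df}, p={p:.3g} -> {verdict}")
```

Each line of output still needs a written inference in biological terms; the code only reports whether the null for each model is rejected at the chosen α.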