BIO325L – Genetics Laboratory
FALL 2011
Dr J. Bryant
Data Analysis:
Introduction To Analysis:
Why Do We Need Statistics?
Biological questions are often answered through the interpretation of experimental
results. However, experimental results may not be conclusive. In fact, most experimental
results are not readily interpretable. For instance, experimental reagents may not have
worked, or results may vary between different time periods and between different labs.
Statistics are used to help experimenters determine how to appropriately collect
information, how to summarize their information (so that the data can be easily
understood and presented) and how to make inferences from their data about the real
world.
Statistics can be thought of as a formal way of critically thinking and understanding the
world. Critical thinking is defined as the amassing of information, summarization of
information and the understanding of information in order to make decisions about the
world.
What Are Statistics?
Statistics consists of a series of protocols to collect information, summarize information
and to infer from sample information back to the population, from which the data was
originally gathered.
Statistics offers a way for researchers to collect information, in order to maximize the
representation of a small number of sample cases, to describe the much larger
population. It is likely that the sample will have error due to sampling, as cases within
the data are frequently randomly selected in order to minimize sample bias. Sampling
error is the innate inability of the sample to describe the true population, from which the
sample was selected. For instance, if we randomly selected 4 students from the UT
population to describe the mean height of students, it is possible by chance that we
could choose students that were very tall or very short and our estimate of UT student
height would be inaccurate. Therefore, there would be an error in the estimate arising
from whom we happened to sample, which is called sampling error. Data collection is
designed to minimize both sample bias and sampling error.
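The effect of sample size on sampling error can be illustrated with a small simulation.
This is only a sketch: the population of heights is synthetic (an assumed mean of 170 cm
and standard deviation of 10 cm, not real UT data), and the function names are
hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Synthetic population of 10,000 heights in cm (assumed values, not real data).
population = [random.gauss(170, 10) for _ in range(10_000)]


def sample_mean(n):
    """Mean height of n students drawn at random from the population."""
    return statistics.mean(random.sample(population, n))


def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean across many repeated samplings."""
    return statistics.stdev([sample_mean(n) for _ in range(repeats)])


spread_small = spread_of_sample_means(4)
spread_large = spread_of_sample_means(100)
print(f"spread of the sample mean with n = 4:   {spread_small:.2f} cm")
print(f"spread of the sample mean with n = 100: {spread_large:.2f} cm")
```

The sample means from only 4 students scatter much more widely around the population
mean than those from 100 students, which is the sampling error described above.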
Once data has been collected it needs to be summarized, as it is not possible for us to
understand a lot of numerical information from a large number of cases, just by looking
at numbers. Information can be summarized using summary indexes, such as the
sample mean, sample median, or sample standard deviation, to name a few.
Alternatively sample data can be summarized graphically using charts – like bar charts,
histograms, or line graphs, to name a few of the most common types of graphs.
Summary indexes have the advantage of being quantitative, but don’t always allow the
experimenter to see the whole picture of how the data looks. Summary charts have the
advantage of allowing the experimenter to see a summary of the whole data, but graphs
tend not to be quantitative. Therefore, data is typically summarized by both tables of
summary indexes and graphs – so that the data can be completely represented
qualitatively and quantitatively.
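As a minimal sketch of summarizing one quantitative variable both ways, the example
below computes a few summary indexes and prints a crude text histogram. The eight
heights are hypothetical values chosen for illustration.

```python
import statistics

heights_cm = [158, 162, 165, 170, 171, 174, 180, 191]  # hypothetical sample

# Quantitative summary indexes.
summary = {
    "n": len(heights_cm),
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
    "stdev": statistics.stdev(heights_cm),  # sample standard deviation (n - 1)
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# A crude text histogram in 10 cm bins gives the qualitative picture.
bins = {}
for h in heights_cm:
    lower = (h // 10) * 10
    bins[lower] = bins.get(lower, 0) + 1
for lower in sorted(bins):
    print(f"{lower}-{lower + 9} cm | {'#' * bins[lower]}")
```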
The determination of which summary indexes and graphs to use is dependent upon the
type of variable being presented. Data is recorded as a measured quantity called the
variable. Variables can be broken down into quantitative or categorical variables.
Quantitative variables are measured on the subjects with number values, for instance
blood pressure or weight. Categorical variables are measured on the subjects with
category values, such as gender, eye color and hair color. Quantitative variables are
typically summarized by means and standard deviation and graphically summarized
using histograms or boxplots. Categorical data is not typically summarized with number
summaries (other than frequencies or proportions) and is typically summarized
graphically using bar charts.
Once the sample has been understood using summary tables and graphs, the
experimenter would like to know if their results are correct and representative of the
population from which the sample was drawn. It is possible that the sample results may
be explained as a result of a sample bias or sampling error. For instance the four people
we chose previously to describe the height of UT students may be atypical and so may
not describe the height of UT students. In order to determine this we need to run
inferential statistics, in order to measure the degree of sampling error in the sample.
What Is Inferential Statistics?
Inferential statistics is the final branch of statistics, coming after the collection of data
and the summarization of data, and it forms a vital part of the
interpretation of experimental results. Without inferential statistics it is not possible to
determine if the results of an experiment are reflective of the population or not. In order
to infer from the sample to the population, we need to determine if the experimental
results can be explained as a sampling error or not.
What Is Chi-Square Analysis?
The Chi-square analysis is used to determine statistical significance for sample data
composed of categorical nominal variables. For instance, gender is a categorical
nominal variable with two categories. The values of a person’s gender are measured as
categories – male or female and there is no order to the categories.
Sample differences are said to be significantly different when the differences between
the samples cannot be explained by sampling error and so we can then infer that the
differences were due to true population differences.
As the difference between a sample and a control group increases we would expect
there to be less likelihood that these differences can be explained solely by sampling
error. Also, as the sample size increases there is a lower likelihood that the results can
be attributed to sampling error only, as the chances that the sample is made up of
anomalous cases decreases.
To determine if the experimental results are due to sampling error only we would
calculate a test statistic. The test statistic can be thought of as a measure of the
standardized signal strength or standardized difference between the samples. Most
inferential test statistics are ratios of a measure of signal over a measure of noise. The
signal is typically represented as a measure of how different the samples are. Noise
typically contains components of the sample dispersion and sample size (i.e. sampling
error).
The value of the test statistic will be larger for bigger sample differences and larger
sample sizes. The value of the test statistic will be smaller for smaller sample
differences and larger sample dispersions.
What is the Chi-square test statistic?
In order to judge statistical significance for categorical nominal variables a Chi-square
inferential test is conducted. To assess the magnitude of the difference between
samples the test statistic is calculated. For the Chi-square test the test statistic is:
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
The test statistic works by looking at how dissimilar the observations are from the
expected value. These differences are squared, as some differences may be negative
and would otherwise cancel out the positive differences when summed. Each squared
difference between observed and expected values is divided by the expected
value, which expresses the difference relative to the size of the expected count and
standardizes the resulting test statistic. The
fractional squared differences are summed to give the total difference between all
categories within the data set, which produces the complete Chi-square test statistic.
As expected, when the differences between the observed and expected frequencies are
large, the test statistic's value will be large. The test statistic is now translated into a
p-value, which is determined from what is called the sampling distribution.
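The calculation described above can be sketched in a few lines. The function name
chi_square_statistic and the example counts are hypothetical; the counts are chosen
only to show how the statistic responds to larger differences and larger samples.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical two-category example with 50 cases expected in each category.
close = chi_square_statistic([52, 48], [50, 50])      # small difference
far = chi_square_statistic([70, 30], [50, 50])        # large difference
# Same proportions as `far`, but ten times as many cases.
far_big = chi_square_statistic([700, 300], [500, 500])

print(close, far, far_big)
```

With the proportions held fixed, the statistic grows in step with the number of cases,
so the same relative difference counts for more in a larger sample.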
What is a p-value?
The p-value represents the probability of explaining sample differences as large as
those seen, or larger, by sampling error only. If the p-value is very small, then the
sample differences are unlikely to be explained by sampling error alone, and so we infer
that the samples originated from different populations.
The p-value is calculated by examining the area of the sampling distribution which is
greater than the test statistic. The sampling distribution is a distribution of the
frequencies of sampling outcomes from a large number of samplings of the same
population, with a given sample size. As sample size increases, we would expect the
outcomes of samplings to be closer to the population parameter and so the standard
deviation of the sampling distribution should decrease.
So long as we can calculate the shape and area of the sampling distribution we can
calculate the p-value (the probability of explaining sample differences by sampling error
only), assuming that the experimental samples differed by sampling error only.
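For the Chi-square sampling distribution this area can be computed directly rather
than read from a table. The sketch below uses the standard closed-form expressions for
whole-number degrees of freedom and only the Python standard library; the function name
chi_square_p_value is hypothetical, and a statistics package would normally be used
instead.

```python
import math


def chi_square_p_value(statistic, df):
    """Area of the Chi-square distribution above `statistic` (whole-number df)."""
    if df < 1 or statistic < 0:
        raise ValueError("need df >= 1 and a non-negative statistic")
    if df % 2 == 0:
        # Even df: exp(-X/2) * (1 + X/2 + X^2/(2*4) + ...)
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= statistic / (2 * r)
            total += term
        return math.exp(-statistic / 2) * total
    # Odd df: two-sided normal tail plus a finite series in odd powers of sqrt(X).
    x = math.sqrt(statistic)
    tail = math.erfc(x / math.sqrt(2))
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = x if r == 1 else term * statistic / (2 * r - 1)
        total += term
    density = math.exp(-statistic / 2) / math.sqrt(2 * math.pi)
    return tail + 2 * density * total


# Table critical values for a tail area of 0.05 should give p-values close to 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815)]:
    print(df, critical, round(chi_square_p_value(critical, df), 4))
```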
What are degrees of freedom?
As the sampling distribution changes shape with a changing sample size, it is necessary
to take into account the sample size or sample complexity when looking up a p-value.
The degrees of freedom are used to look up p-values from the sampling distribution,
rather than the sample size. Degrees of freedom represent the number of independent
pieces of information in the data set and prevent calculations which could give silly
numbers.
So why do we care about degrees of freedom?
The degrees of freedom are used to assess the amount of information within the data
set. They are used as they stop us calculating silly numbers – there is no point in
calculating the standard deviation of height from a single person – as the standard
deviation is a measure of the average distance of cases from the mean. The standard
deviation is meaningless for a single case – which would have a degree of freedom of
zero – so when we use this in our calculation we cannot get a standard deviation from a
single case.
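Python's standard library behaves in the same way, which makes a convenient check of
the idea. The heights below are hypothetical.

```python
import statistics

heights_cm = [172.0]  # a single hypothetical measurement

try:
    statistics.stdev(heights_cm)
    outcome = "computed"
except statistics.StatisticsError:
    # n - 1 = 0 degrees of freedom: there is nothing left to estimate spread from.
    outcome = "undefined"

print(outcome)

# Two cases give 1 degree of freedom, so a standard deviation can be calculated.
two_case_stdev = statistics.stdev([172.0, 168.0])
print(round(two_case_stdev, 3))
```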
We also use degrees of freedom because p-values from tables will depend upon how
large our sample is or how complicated the sample information is. The sample size or
complexity of the data will affect the shape of the sampling distribution, so we need to
know the degrees of freedom so we can determine which sampling distribution to look
up.
Degrees of freedom for a quantitative continuous variable (n-1)
Degrees of freedom represent the pieces of independent information which make up
the data set. For instance, in the example of measuring 10 people for height, the variable
is quantitative and height is measured for each person as a number value, so there are 10
numerical measurements of height. In this example, there are 9 independent
pieces of information in this sample (n-1). We could calculate the last person’s height
from the total height by subtracting the initial 9 people’s heights. For quantitative data,
once we know n-1 pieces of information, the last case is fixed and its value can be
calculated.
Degrees of freedom for a single categorical variable (k-1)
If we are interested in a person’s gender and sampled 10 people for gender, the
variable would be categorical nominal and people are measured for whether they are
male or female. The number of people, or frequency, of a specific gender becomes the
measure we are interested in. In this case, there are two categories of information, one
is dependent and one is independent – there are, therefore, k-1 pieces of independent
information. The frequency of the second category can be calculated from the total
minus the values of the first category, i.e. if you know the number of females you can
calculate the number of males from subtracting the number of females from the total.
Degrees of freedom for two categorical variables (k-1)(r-1)
If we are interested in the frequency of handedness and people’s gender and sampled
10 people for gender and handedness, we would have results from two categorical
nominal variables. In this case we would record whether people were male/female and
left/right handed. In this case the degrees of freedom would be (k-1)(r-1), where k and r
are the numbers of categories of the two variables, as there are
two categorical variables to consider.
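The three rules above can be collected into a short sketch. The function names are
hypothetical, and the sample of 10 people with 6 females is an assumed example.

```python
def df_quantitative(n):
    """Quantitative variable measured on n cases: n - 1."""
    return n - 1


def df_single_categorical(k):
    """One categorical variable with k categories: k - 1."""
    return k - 1


def df_two_categorical(k, r):
    """Two categorical variables with k and r categories: (k - 1)(r - 1)."""
    return (k - 1) * (r - 1)


print(df_quantitative(10))        # heights of 10 people
print(df_single_categorical(2))   # gender: male / female
print(df_two_categorical(2, 2))   # gender by handedness

# Why one category is not independent: it is fixed by the total.
total_people = 10
females = 6                       # assumed count for illustration
males = total_people - females
print(males)
```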
Using Hypotheses
In order to actually carry out our inferential test we need to formulate our questions and
keep on top of all of the questions we may ask. It is very unlikely that a single inferential
test will allow us to completely infer from our sample to the population – as there may
be several potential models the data may conform to. If the sample data were
completely clear and definitive then we wouldn’t need inferential statistics – as the
results would always be obvious and consistent.
There are 4 hypotheses which are run in each inferential statistical test; in order these
are, the informal null, the formal null, the informal alternative and the formal alternative.
The informal hypotheses allow you to follow which question you are asking your data
and to translate your biological data into your inferential test. This is particularly
important for categorical data – as the null hypothesis is used to calculate the expected
frequencies to obtain the test statistic.
The formal hypotheses are used to translate the biological question into the indexes
which will be inferentially tested in the population. This is important as a number of
indexes for any data set could be assessed and if the wrong index is chosen the results
may not reflect the sample data and so errors in analysis can be made.
What Hypotheses Are Used?
There are four hypotheses proposed for each round of analysis (i.e. each biological
model that the data is tested against). The order of the hypotheses is:
1) The informal null (H0*:)
2) The formal null (H0:)
3) The informal alternative (HA*:)
4) The formal alternative (HA:)
The informal hypotheses aid in the translation of the biological question into an
inferential statistical analysis. The informal hypotheses also help you to formulate your
inference and ensure that you do not extrapolate an inference beyond the population
which you collected data from.
The formal hypotheses aid in the translation of the biological question into the math
which will be conducted in the analysis and emphasize the parameter which will be
inferred about. The formal null is essential for calculating the expected frequencies in
the Chi-square goodness of fit test, when the null is not that all categories have the
same frequency. The alternative hypotheses are essential to highlight if you are running
a directional hypothesis test and aid you in reaching an appropriate directional inference
– if one is possible.
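The way the formal null feeds the expected frequencies can be sketched as follows. The
function name expected_frequencies, the 3:1 null and the total of 160 cases are
hypothetical values used only for illustration.

```python
def expected_frequencies(null_proportions, total_cases):
    """Multiply each proportion in the formal null by the total number of cases."""
    if abs(sum(null_proportions.values()) - 1.0) > 1e-9:
        raise ValueError("the proportions in the formal null must sum to 1")
    return {category: proportion * total_cases
            for category, proportion in null_proportions.items()}


# Hypothetical formal null for a 3:1 phenotype ratio, tested on 160 cases.
formal_null = {"dominant phenotype": 0.75, "recessive phenotype": 0.25}
expected = expected_frequencies(formal_null, 160)
print(expected)
```

Keeping the category name attached to each proportion makes it harder to pair an
observed count with the expected count of a different category.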
Why use the null hypothesis?
The data is tested for consistency against the null. For example we could test the
sample mean height of men and women – to determine if the population height of men
and women is different. The sample data could have a lot of sampling error and so may
not tell us about the population. Therefore, we test the data against the null hypothesis
(i.e. H0*: There is no difference in the population mean height of men and women).
We cannot determine directly if the population height of men and women differs, as we
can only see the world through the sample data – which has to have some sampling
error. Therefore, we need to determine how reliable the sample information is to show
what is going on in the population. If the sample differences are not due to sampling
error only, then we would reject the null (that there was no difference) and would say
that there was sufficient evidence (from the sample) to say that the population height of
men and women was different. Note – rejecting the null doesn’t mean that we have
proved that the population height of men and women is different – we only have data
which is consistent with that view.
Running through the Chi-square analysis:
To make things clearer we will run through a number of different data sets and test a
number of hypotheses, to determine what the sample data tells us about reality – or the
population.
To carry out a successful and complete inferential analysis, it is important that you
follow all of the inferential steps in order and don’t just plug numbers into equations. To
fully understand your data you may have to run several complete inferential tests, for
each data set, to determine which model the data supports. Therefore, to understand
and piece your results together it is vital that you label each analysis (write out your
hypotheses and inferences carefully) so you don’t get mixed up.
CONFIRMING IF ALU PV92 XX16 GENE FREQUENCY IS IN HW EQUILIBRIUM:
Data was pooled for a previous class and the frequency of the various genotypes of the
300 bp Alu insert within the PV92 locus on chromosome 16 was recorded. We wish to
know if the population is in Hardy-Weinberg Equilibrium (HWE). However, the sample
numbers from the pooled data are not conclusive, as the pooled sample size is
relatively small and the frequencies are not obviously and exactly consistent with the
Hardy-Weinberg Equilibrium frequencies. The sample discrepancies could indicate
either that the data was inconsistent with HWE or alternatively the discrepancies could
just be due to sampling error (as the sample size was relatively small).
The pooled class data for the 300 bp Alu insert within the PV92 locus is listed below and
we will run through the complete analysis to determine if the data is inconsistent with
the HWE distribution or if the data discrepancies can be explained by sampling error.
                       +/+    -/-    +/-    Total (Contributing)
Observed Frequencies   37     80     20     137
In order to test the data against the HWE model we must first calculate the frequencies
of the plus alleles (p) and the minus alleles (q), from the data.
Frequency of plus alleles (p):
p = total number of plus alleles / total number of alleles
p = ((2 x homozygous positive) + heterozygous) / total number of alleles
p = ((2 x 37) + 20) / (2 x (37 + 80 + 20))
p = 94 / 274
p = 0.34306
Frequency of minus alleles (q):
q = total number of minus alleles / total number of alleles
q = ((2 x homozygous negative) + heterozygous) / total number of alleles
q = ((2 x 80) + 20) / (2 x (37 + 80 + 20))
q = 180 / 274
q = 0.65693
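The allele counting above can be written as a short function. The function and
argument names are hypothetical; the counts are the pooled class data.

```python
def allele_frequencies(n_plus_plus, n_plus_minus, n_minus_minus):
    """Return (p, q): the frequencies of the plus and minus alleles."""
    # Every individual carries two alleles.
    total_alleles = 2 * (n_plus_plus + n_plus_minus + n_minus_minus)
    # Homozygotes contribute two copies of their allele, heterozygotes one of each.
    p = (2 * n_plus_plus + n_plus_minus) / total_alleles
    q = (2 * n_minus_minus + n_plus_minus) / total_alleles
    return p, q


p, q = allele_frequencies(n_plus_plus=37, n_plus_minus=20, n_minus_minus=80)
print(f"p = {p:.5f}, q = {q:.5f}, p + q = {p + q:.5f}")
```

Checking that p + q equals 1 is a quick way to catch a miscounted genotype.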
H0*: The data is consistent with the HWE
H0:
Pr{+/+} = p^2 = 0.11769
Pr{+/-} = 2pq = 0.45073
Pr{-/-} = q^2 = 0.43155
HA*: The data is not consistent with the HWE
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05
(α is our threshold for rejecting the null and is the protection against a false positive)
In order to calculate the expected frequencies we must multiply the probabilities
stipulated in the formal null by the total number of cases (137).
                       +/+      -/-      +/-      Total (Contributing)
Observed Frequencies   37       80       20       137
Expected Frequencies   16.12    59.12    61.75    137
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = \frac{(37 - 16.12)^2}{16.12} + \frac{(80 - 59.12)^2}{59.12} + \frac{(20 - 61.75)^2}{61.75}$$
$$\chi^2 = \frac{435.9744}{16.12} + \frac{435.9744}{59.12} + \frac{1743.0625}{61.75}$$
$$\chi^2 = 27.05 + 7.37 + 28.23$$
$$\chi^2 = 62.65$$
Degrees of freedom = k - 1 = 3 - 1 = 2
p-value (from the Chi-Square table) < 0.005
Conclude the test:
The p-value (< 0.005) is < α, therefore the null is rejected.
Inference (It is vital to make a detailed inference in words, as it puts the inferential test
results back into terms of biology):
There is sufficient evidence to say that the 300 bp Alu frequency within the PV92
locus on XX16 is not in Hardy-Weinberg equilibrium.
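The whole test can be sketched end to end as below. Each genotype is used as a
dictionary key so that every observed count is paired with the expected count for the
same genotype (+/+ with p^2, +/- with 2pq, -/- with q^2). The variable names are
hypothetical. The handout uses k - 1 = 2 degrees of freedom, for which the tail area is
exactly exp(-X/2); some treatments subtract one more degree of freedom because p is
estimated from the same data, which would not change the conclusion here.

```python
import math

observed = {"+/+": 37, "+/-": 20, "-/-": 80}
n = sum(observed.values())

# Allele frequencies estimated from the genotype counts.
p = (2 * observed["+/+"] + observed["+/-"]) / (2 * n)
q = 1 - p

# Formal null: Hardy-Weinberg genotype proportions.
hwe_proportions = {"+/+": p * p, "+/-": 2 * p * q, "-/-": q * q}
expected = {genotype: prop * n for genotype, prop in hwe_proportions.items()}

chi_square = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)

df = len(observed) - 1                  # k - 1 = 2, as in the handout
p_value = math.exp(-chi_square / 2)     # exact upper-tail area when df = 2
alpha = 0.05
reject_null = p_value < alpha

for g in observed:
    print(f"{g}: observed {observed[g]}, expected {expected[g]:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}, reject null: {reject_null}")
```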
TESTING DATA FROM AN UNKNOWN DROSOPHILA MUTATION:
Suppose we have derived F2 progeny from two unknown sets of virgin F1 flies and
counted the frequency of the F2 phenotypes. In order to determine the genotype of the
flies we may have to carry out several rounds of inferential Chi-square analysis, to
determine which model the data is consistent with.
Sibling Cross(es)    Wild-Type         Mutant            Additional
Setup Label          Male    Female    Male    Female    Notes
Unknown F2           47      42        2       0
We need to run inferential analysis as the results will not exactly conform to a predicted
gene frequency, as all scientific data contains error. In order to determine which model
of genetic inheritance the data conforms to we examine the frequencies of the
phenotypes and test these against various Chi-square hypothesis tests. The data which
is provided is fairly obvious, so that we can run through the various analyses without
much trouble.
Testing The Unknown Against An Autosomal Dominant Inheritance Model:
We can test to determine if the data is consistent with an autosomal dominant
inheritance model, in which we would expect 75% of progeny to be mutant and 25% of
flies to be wild type. In addition the gender of the flies should have no influence upon
the distribution of the mutant and wild-type frequencies.
NOTE: There are two ways we can test this model using a Chi-square analysis – as
there are two variables it would be better to test the dependence of the two variables
directly. However, this involves using Chi-square contingency tables – so we will look at
this example at the end of this section, as an optional test.
We can test the fit of the data to the model that the frequency of mutant males equals
that of mutant females (both 75%/2 = 37.5%) and the frequency of the wild type male
and females are equal (both 25%/2 = 12.5%). This model involves testing two factors
(gender and another phenotype) together in the same null hypothesis.
H0*: The data is consistent with an autosomal dominant inheritance model, in which the
frequency of males and females are equal and there are 3 times as many mutants as
wild type progeny.
H0:
Pr{Male & Mutant} = 0.375
Pr{Female & Mutant} = 0.375
Pr{Male & Wild type} = 0.125
Pr{Female & Wild Type} = 0.125
HA*: The data is not consistent with an autosomal dominant inheritance model.
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05
(α represents the threshold for rejecting the null and is the protection against making a
false positive inference)
In order to calculate the expected frequencies we must multiply the probabilities
stipulated in the formal null by the total number of cases.
            Wild-Type             Mutant
            Male      Female      Male      Female      Total
Observed    47        42          2         0           91
Expected    11.375    11.375      34.125    34.125      91.00
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = \frac{(47 - 11.375)^2}{11.375} + \frac{(42 - 11.375)^2}{11.375} + \frac{(2 - 34.125)^2}{34.125} + \frac{(0 - 34.125)^2}{34.125}$$
$$\chi^2 = \frac{1269.141}{11.375} + \frac{937.891}{11.375} + \frac{1032.016}{34.125} + \frac{1164.516}{34.125}$$
$$\chi^2 = 111.573 + 82.452 + 30.242 + 34.125$$
$$\chi^2 = 258.392$$
Degrees of freedom = k - 1 = 4 - 1 = 3
p-value (from the Chi-Square table) < 0.005
Conclude the test:
The p-value (< 0.005) is < α, therefore the null is rejected.
Inference:
(It is vital to make a detailed inference in words, as this puts the inferential test results
back into terms of biology):
There is sufficient evidence to say that the phenotype data deviates from an
autosomal dominant inheritance model.
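As a cross-check, the statistic for the autosomal dominant model can be computed
directly from the observed counts and the proportions in the formal null. In this
sketch each category is keyed by (phenotype, sex) so that an observed count is always
paired with the null proportion of the same category, and the decision is made by
comparing the statistic with the table critical value for α = 0.05 and 3 degrees of
freedom (7.815). The variable names are hypothetical.

```python
observed = {
    ("wild type", "male"): 47,
    ("wild type", "female"): 42,
    ("mutant", "male"): 2,
    ("mutant", "female"): 0,
}
# Formal null for the autosomal dominant model (mutant : wild type = 3 : 1).
null = {
    ("mutant", "male"): 0.375,
    ("mutant", "female"): 0.375,
    ("wild type", "male"): 0.125,
    ("wild type", "female"): 0.125,
}

total = sum(observed.values())
expected = {category: null[category] * total for category in observed}

# Contribution of each category to the test statistic.
contributions = {
    category: (observed[category] - expected[category]) ** 2 / expected[category]
    for category in observed
}
chi_square = sum(contributions.values())

CRITICAL_VALUE_ALPHA_005_DF_3 = 7.815  # from a standard Chi-square table
reject_null = chi_square > CRITICAL_VALUE_ALPHA_005_DF_3

for category, value in contributions.items():
    print(category, f"expected {expected[category]:.3f}", f"contribution {value:.3f}")
print(f"chi-square = {chi_square:.3f}, reject null: {reject_null}")
```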
Testing The Unknown Against An Autosomal Recessive Inheritance Model:
We can test to determine if the data is consistent with an autosomal recessive
inheritance model, in which we would expect 25% of progeny to be mutant and 75% of
flies to be wild type. In addition the gender of the flies should have no influence upon
the distribution of the mutant and wild-type frequencies.
We can test the fit of the data to the model that the frequency of mutant males equals
that of mutant females (both 25%/2 = 12.5%) and the frequency of the wild type male
and females are equal (both 75%/2 = 37.5%). This model involves testing two factors
(gender and another phenotype) together in the same null hypothesis.
H0*: The data is consistent with an autosomal recessive inheritance model, in which the
frequency of males and females are equal and there are 3 times as many wild type
progeny as mutant progeny.
H0:
Pr{Male & Mutant} = 0.125
Pr{Female & Mutant} = 0.125
Pr{Male & Wild type} = 0.375
Pr{Female & Wild Type} = 0.375
HA*: The data is not consistent with an autosomal recessive inheritance model.
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05
In order to calculate the expected frequencies we must multiply the probabilities
stipulated in the formal null by the total number of cases.
            Wild-Type             Mutant
            Male      Female      Male      Female      Total
Observed    47        42          2         0           91
Expected    34.125    34.125      11.375    11.375      91.00
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = \frac{(47 - 34.125)^2}{34.125} + \frac{(42 - 34.125)^2}{34.125} + \frac{(2 - 11.375)^2}{11.375} + \frac{(0 - 11.375)^2}{11.375}$$
$$\chi^2 = \frac{165.766}{34.125} + \frac{62.016}{34.125} + \frac{87.891}{11.375} + \frac{129.391}{11.375}$$
$$\chi^2 = 4.858 + 1.817 + 7.727 + 11.375$$
$$\chi^2 = 25.777$$
Degrees of freedom = k - 1 = 4 - 1 = 3
p-value (from the Chi-Square table) < 0.005
Conclude the test:
The p-value (< 0.005) is < α, therefore the null is rejected.
Inference:
(It is vital to make a detailed inference in words, as this puts the inferential test results
back into terms of biology):
There is sufficient evidence to say that the phenotype data deviates from an
autosomal recessive inheritance model.
NOTE: In this test the test statistic is far larger than the largest critical value in the
table, which means the data strongly disagrees with the null. The p-value on more
extensive Chi-Square tables would actually be very small, indicating that the chance of
explaining this discrepancy by sampling error alone is exceedingly small.
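How small the p-value is can be estimated without a table. The sketch below recomputes
the statistic for the autosomal recessive model from the observed counts and the formal
null, then uses the closed-form tail area of the Chi-square distribution with 3 degrees
of freedom. The variable names are hypothetical.

```python
import math

# Order: wild type male, wild type female, mutant male, mutant female.
observed = [47, 42, 2, 0]
null = [0.375, 0.375, 0.125, 0.125]   # autosomal recessive model, same order

total = sum(observed)
expected = [proportion * total for proportion in null]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail area for df = 3: erfc(sqrt(X/2)) + sqrt(2X/pi) * exp(-X/2)
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

print(f"chi-square = {chi_square:.3f}")
print(f"p-value    = {p_value:.2e}")
```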
Testing The Unknown Against A WildType By WildType Mating Model:
We can test to determine if the phenotype distribution is consistent with a wild type
back cross model, in which we would expect all flies to be wild type and that the gender
of the flies has no influence upon the distribution of the wild type phenotype.
We can also test the fit of the data to the model that the frequencies of wild type males
and females are equal and together make up 100% of the progeny. This model involves
testing two factors (gender and eye color) together in the null hypothesis.
H0*: The data is consistent with an autosomal wild type model, in which the frequency of
males and females are equal and there is a 100% frequency of brick red eye color.
H0:
Pr{Male & Wild type eye color } = 0.50
Pr{Female & Wild Type eye color} = 0.50
HA*: The data is not consistent with the wild type back cross model.
HA: At least one of the proportions stipulated in the H0 is incorrect
α = 0.05
(α represents the threshold for rejecting the null and is the protection against making a
false positive inference)
In order to calculate the expected frequencies we must multiply the probabilities
stipulated in the formal null by the total number of cases.
            Wild-Type             Mutant
            Male      Female      Male      Female      Total
Observed    47        42          2         0           91 (NOT 89!)
Expected    45.50     45.50       ND        ND          91.00
NOTE: In order for the calculation of the Chi-square statistic to work we MUST carry
out a little trick when the expected value would be zero, which ensures that we don’t have
a division by zero in the calculation. The only way to still use the Chi-square analysis
and not unfairly inflate the test statistic is to include any observations which fall
within the zero expected categories within the total number of cases. The categories
where expected values should be zero are then EXCLUDED from the Chi-square test
statistic.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = \frac{(47 - 45.50)^2}{45.50} + \frac{(42 - 45.50)^2}{45.50}$$
$$\chi^2 = \frac{2.25}{45.50} + \frac{12.25}{45.50}$$
$$\chi^2 = 0.05 + 0.27$$
$$\chi^2 = 0.32$$
Degrees of freedom = k - 1 = 2 - 1 = 1
p-value (from the Chi-Square table) > 0.10
Conclude the test:
The p-value (> 0.10) is > α, therefore the null is NOT rejected.
Inference:
There is insufficient evidence to say that the data deviates from an autosomal wild
type mechanism of inheritance.
Therefore, we can say that the data is consistent with an autosomal wild type
inheritance model.
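The handling of the zero-expected categories can be sketched as follows. This follows
the convention described in this handout (count every fly in the total, then leave the
zero-expected categories out of the sum) and is not offered as a general-purpose rule.
The variable names are hypothetical, and 3.841 is the table critical value for α = 0.05
with 1 degree of freedom.

```python
observed = {"WT male": 47, "WT female": 42, "mutant male": 2, "mutant female": 0}
null = {"WT male": 0.5, "WT female": 0.5, "mutant male": 0.0, "mutant female": 0.0}

# All 91 flies count towards the total, including the 2 mutants.
total = sum(observed.values())
expected = {category: null[category] * total for category in observed}

# Only categories with a non-zero expected count enter the test statistic.
tested = [category for category in observed if expected[category] > 0]
chi_square = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in tested
)
df = len(tested) - 1

CRITICAL_VALUE_ALPHA_005_DF_1 = 3.841  # from a standard Chi-square table
reject_null = chi_square > CRITICAL_VALUE_ALPHA_005_DF_1

print(f"total = {total}, tested categories = {tested}")
print(f"chi-square = {chi_square:.2f}, df = {df}, reject null: {reject_null}")
```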
ALTERNATIVE METHOD Of Testing The Unknown Against A Wildtype Backcross
Model:
We can test to determine if the data is consistent with a wild type back cross inheritance
model, in which we would expect 100% of progeny to be wild-type and the gender of the
flies should have no influence upon the distribution of the wild-type frequencies.
In this way of fitting the data we can test if the two variables gender and wild type
phenotype are associated.
H0*: The data is consistent with a wild type back cross inheritance model, in which the
frequency of males and females are equal and 100% of the progeny should be wild
type.
H0:
Pr{Mutant | Male} = Pr{Mutant | Female}
Pr{Wild type | Male} = Pr{Wild type | Female}
ALTERNATIVELY we could also write:
Pr{Male | Mutant} = Pr{Male | Wild type}
Pr{Female | Mutant} = Pr{Female | Wild type}
HA*: The data is not consistent with a wild type back cross model.
HA:
Pr{Mutant | Male} ≠ Pr{Mutant | Female}
Pr{Wild type | Male} ≠ Pr{Wild type | Female}
α = 0.05
In order to calculate the expected frequencies we can use the equation:
Expected = (Row Total * Column Total)/Grand Total
The data is presented as an amended table – showing the two variables (one as rows
and one as columns) and the expected values are shown in parentheses.
             Male           Female         Total
Wild-Type    47 (47.923)    42 (41.076)    89
Mutant       2 (1.077)      0 (0.923)      2
Total        49             42             91
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$\chi^2 = \frac{(47 - 47.923)^2}{47.923} + \frac{(42 - 41.076)^2}{41.076} + \frac{(2 - 1.077)^2}{1.077} + \frac{(0 - 0.923)^2}{0.923}$$
$$\chi^2 = \frac{0.852}{47.923} + \frac{0.854}{41.076} + \frac{0.852}{1.077} + \frac{0.852}{0.923}$$
$$\chi^2 = 0.017 + 0.021 + 0.791 + 0.923$$
$$\chi^2 = 1.752$$
Degrees of freedom = (r-1)(k-1) = 1 x 1 = 1
p-value (from the Chi-Square table) > 0.1
Conclude the test:
The p-value (> 0.1) is > α, therefore the null is NOT rejected.
Inference:
There is insufficient evidence to say that the proportion of the wild type phenotype is
influenced by fly gender.
Therefore, the phenotype of the flies is consistent with a wild type phenotype which
is not sex-linked.
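The contingency-table calculation, with Expected = (Row Total * Column Total)/Grand
Total, can be sketched as below. The table layout (rows are phenotypes, columns are
sexes) and the variable names are choices made for this illustration.

```python
# Rows: wild type, mutant. Columns: male, female.
table = [
    [47, 42],
    [2, 0],
]

row_totals = [sum(row) for row in table]
column_totals = [sum(column) for column in zip(*table)]
grand_total = sum(row_totals)

# Expected = (Row Total * Column Total) / Grand Total for every cell.
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

chi_square = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(table, expected)
    for o, e in zip(observed_row, expected_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)

for observed_row, expected_row in zip(table, expected):
    print([f"{o} ({e:.3f})" for o, e in zip(observed_row, expected_row)])
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

Small differences in the last decimal place compared with a hand calculation come
from rounding the expected values before squaring.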
STUDENT QUESTIONS - UNDERSTANDING CHI-SQUARE:
What is statistics?
When would you run a Chi-square analysis?
What is Chi-square analysis?
In the context of the Chi-square test what is the null hypothesis?
In the context of the Chi-square test what is the alternative hypothesis?
What is the Chi-square test statistic?
What does the Chi-square test statistic show?
What is a p-value?
In the context of the Chi-square test when would you reject the null hypothesis?
In the context of the Chi-square test when would you not reject the null hypothesis?
The null hypothesis may not be rejected. However the null hypothesis is never
accepted. Explain why not.
In the Chi-square test, what does a failure to reject the null hypothesis mean?
In the Chi-square test, what does rejecting the null hypothesis mean?
Analysis of Your Drosophila Inheritance Data:
Analyze your inheritance patterns using Chi-square analysis to determine the model
which your data conforms to. In your analysis you should carry out at least two
Chi-square analyses for each set of matings, to demonstrate which model your data is
consistent with and a model which is not supported by your data. For some patterns you
may need to test more than 2 models. The homework assignment will indicate when
these cases arise.