Exercises on BIOSTATISTICS
I. Introduction to descriptive statistics
1. Consider the “body length of a student who follows the course biostatistics”
as a continuous random variable. Construct the frequency distribution and
define the classes such that they all contain about 5 outcomes. Construct the
bar graph and histogram. Compute the mean height in 2 ways: i) using the
individual heights and ii) using the classes (class frequencies and class
centers). Do the two results agree perfectly? Why?
2. Consider the “Degree or certificate last obtained” as a non-quantitative
random variable. Display the results in a pie chart (small sectors should be
grouped into a sector “other” such that the total number of sectors amounts
to about 6).
3. The voltage of a battery is measured repeatedly. One obtains consecutively:
1.65 V, 1.48 V, 1.71 V, 1.45 V, 1.79 V, 1.71 V. Calculate the mean, median,
mode, sample standard deviation, variance and coefficient of
variation. Knowing that the real voltage amounts to 1.5 V, do you expect the
errors to be random or systematic? [Answer: mean = 1.63 V, median = 1.68 V,
mode = 1.71 V, s = 0.14 V, s² = 0.019 V², vc = 0.085]
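The quantities asked for in exercise 3 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the course material, and the sample (n - 1) definitions of variance and standard deviation are assumed.

```python
import statistics

# Battery voltages from exercise 3, in volts.
voltages = [1.65, 1.48, 1.71, 1.45, 1.79, 1.71]

mean = statistics.mean(voltages)
median = statistics.median(voltages)
mode = statistics.mode(voltages)
s = statistics.stdev(voltages)            # sample standard deviation (n - 1)
variance = statistics.variance(voltages)  # sample variance (n - 1)
cv = s / mean                             # coefficient of variation

print(f"mean = {mean:.2f} V, median = {median:.2f} V, mode = {mode:.2f} V")
print(f"s = {s:.2f} V, s^2 = {variance:.3f} V^2, vc = {cv:.3f}")
```

The script only reproduces the arithmetic; whether the errors are random or systematic still has to be argued from the comparison with the real voltage of 1.5 V.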
4. After measuring the blood pressure of a group of selected patients, the
physician calculates a mean of 12 cm Hg and a standard deviation of 2 cm Hg.
Suddenly, the physician detects a systematic error: the sphygmomanometer
read 1 cm Hg too low. How should the physician correct the mean, the
standard deviation and the coefficient of variation (without measuring
again)?
II. Basics of probability
1. Do you agree with the following reasoning: “I have won the Lottery today.
Now, I will stop gambling since my chances have shrunk away. The probability
of winning two times a year is extremely low”. Explain.
2. Consider a family that has 4 children. Assume that the probability for a son
equals 0.51. What is the probability that the family has 0, 1, 2, 3 and 4
boys? Compute the sum of the probabilities found.
3. The probability for exposure to influenza during an epidemic amounts to 0.6.
There is a serum on the market that protects a vaccinated person with a
probability of 80% when exposed. A person who has not been vaccinated,
however, has a probability of being infected of 0.9 when exposed. We randomly
consider two persons, one is vaccinated and the other is not. What is the
probability that at least one of the two will catch influenza?
4. A radiation therapist has set up a frequency distribution of the total
radiation dose delivered to his patients. He used the obsolete unit rad. He
has computed the following parameters: mean, standard deviation, variance,
coefficient of variation, quartiles, skewness and kurtosis. Now he wants to
present his data expressed in the unit Gy = 100 rad. How does the
therapist have to adapt the numerical values of his parameters?
5. In medicine, it often happens that a number of diseases cause the same
symptoms. Assume that a given symptom, which we call event H, can only occur
as a consequence of 3 possible underlying diseases A, B and C (mutually
exclusive). A study has shown that P(A) = 0.01, P(B) = 0.005, P(C) = 0.02,
P(H|A) = 0.90, P(H|B) = 0.95 and P(H|C) = 0.75. What is the probability that
a person who displays symptom H suffers from disease A?
6. A given genetic characteristic prevails in mice with a probability of 0.2.
Three mice are selected randomly. Calculate the probability that the genetic
characteristic occurs in these 3 mice.
7. A medical researcher wants to assess the efficacy of 2 drugs that are
intended to decrease blood pressure, A and B. He administers the drugs to 4
pairs of identical twins: A to one member and B to the other member of each
pair. The researcher detects that the 4 people who have received drug A
develop a lower blood pressure than their respective twin partners.
What does this result suggest? Why did the researcher opt for identical
(monozygotic) twins? (Hint: compute the probability that the four people who
received drug A develop a lower blood pressure if both drugs had the same
efficacy).
8. A road victim will die unless he receives blood within 10 minutes. The blood
must be of type A Rh-positive and must be given by one donor. It takes 2
minutes to type a donor and the transfusion takes 2 minutes. There is only
one non-disposable blood-typing kit available. Quite a lot of untyped
donors are present and 40% have the right blood type. (a) What is the
probability that the victim is rescued? (b) What would have been the
probability if there was no blood-typing kit available? [Answer: (a) 0.8704,
(b) 0.4]
9. The frequency of inhalation of a resting human adult is Gaussian distributed
according to (µ = 16 min⁻¹, σ² = 16 min⁻²). What is the probability that such
an adult reaches a frequency higher than 22 min⁻¹? [Answer: 0.0668]
10. The following choice is presented by poll to 180 randomly selected students
in medicine: “Pro or contra numerus clausus?”. 68% of the students vote
against the numerus clausus. Is this “against” statistically significant? (In
case the real opinion of ALL students is 50%-50%, what would have been the
probability of obtaining the vote of the students?) Hint: the variance of
the binomial distribution equals n*θ*(1−θ), with θ the probability of “success”
on any trial and n the number of trials. Would a total number of 20 students
have been sufficient when also assuming the 68% poll result?
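One possible way to approach exercise 10 is the normal approximation to the binomial distribution, using the variance from the hint. The sketch below is an assumption about the intended method, not the official solution; the helper names `z_score` and `upper_tail` are made up for illustration.

```python
import math

def z_score(n, p_hat, theta=0.5):
    """z for an observed fraction p_hat when the true fraction is theta.

    The binomial variance n*theta*(1 - theta) from the hint gives a
    standard error of sqrt(theta*(1 - theta)/n) for the fraction.
    """
    standard_error = math.sqrt(theta * (1 - theta) / n)
    return (p_hat - theta) / standard_error

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z180 = z_score(180, 0.68)
z20 = z_score(20, 0.68)

print(f"n = 180: z = {z180:.2f}, one-sided p = {upper_tail(z180):.2e}")
print(f"n = 20:  z = {z20:.2f}, one-sided p = {upper_tail(z20):.3f}")
```

With 20 students the same 68% result lies much closer to the 50% hypothesis in units of standard error, which is the point of the last question.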
11. By computer. The number of emergency admissions in a hospital is distributed
according to Poisson with a mean of 5 people a day.
i) Compute P(0), P(1), P(2), ..., P(10) and make a chart of the
probability distribution. What is the probability that 3 persons are
admitted on one day? Is it likely that 10 or more persons are
admitted on one day?
ii) Construct the binomial distribution that has the same mean as the above
Poisson distribution (the mean of the binomial distribution is µ = nθ,
where θ is the probability of “success” in a single trial, and n is the
number of trials). Compare graphically to the Poisson distribution for
a) n = 10 and b) n = 50. What do you conclude?
iii) Construct the normal distribution that has the same mean and variance
as the above mentioned Poisson distribution. Compare graphically to the
Poisson distribution. When would the similarity be still higher?
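For parts i) and ii) of exercise 11, the probabilities can be tabulated without a spreadsheet. The following is a minimal sketch in plain Python; the function names are illustrative, and the charts themselves are left to whatever plotting tool is at hand.

```python
import math

def poisson_pmf(k, mean):
    """P(k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def binomial_pmf(k, n, theta):
    """P(k) for a binomial distribution with n trials and success probability theta."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

mean = 5
poisson = [poisson_pmf(k, mean) for k in range(11)]
p_three = poisson[3]
p_ten_or_more = 1 - sum(poisson[:10])  # complement of P(0) + ... + P(9)

# Largest difference between binomial and Poisson probabilities for k = 0..10,
# for binomials with the same mean n*theta = 5.
max_gap = {}
for n in (10, 50):
    theta = mean / n
    max_gap[n] = max(abs(binomial_pmf(k, n, theta) - poisson[k]) for k in range(11))

print(f"P(3) = {p_three:.4f}, P(10 or more) = {p_ten_or_more:.4f}")
print({n: round(gap, 4) for n, gap in max_gap.items()})
```

The dictionary `max_gap` gives a numerical counterpart to the graphical comparison asked for in part ii).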
12. By computer. The manufacturer of a drug claims that his product is effective
in at least 99% of the cases. As a check, 200 people are interrogated. In 8
cases, the drug was NOT effective. Test whether the manufacturer was right
using i) the binomial, ii) the Poisson and iii) the χ² distribution
(α = 0.01).
13. By computer. Illustration of the central limit theorem. From a population of
real numbers that are uniformly distributed in the interval [0,1], we take a
sample of 100 observations xi. The resulting sample mean is considered as a
random variable y.
i) Compute y and s{xi} from your sample data. Compare to the theoretical
mean and standard deviation of the uniform distribution (σ² is equal to
1/12 of the square of the interval width).
ii) Draw 20 samples from the same population in order to verify the central
limit theorem for the mean (use a different seed number for each
sample). Compute y and s{xi} for each sample. Construct a histogram for
y and check whether it reflects the Gaussian distribution.
iii) Compute s{y} and compare to s{xi} or σ{xi}. Why is the ratio about 10?
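The simulation described in exercise 13 can be sketched in a few lines of Python instead of a spreadsheet. The seeds 1 to 20 and the helper name `sample_stats` are arbitrary choices made for this illustration; with only 20 samples the estimate of s{y} is itself noisy, so the ratio will only be roughly 10.

```python
import random
import statistics

def sample_stats(seed, size=100):
    """Draw `size` uniform numbers on [0, 1] and return (mean, sample std)."""
    rng = random.Random(seed)  # a different seed for each sample
    xs = [rng.random() for _ in range(size)]
    return statistics.mean(xs), statistics.stdev(xs)

results = [sample_stats(seed) for seed in range(1, 21)]  # 20 samples
means = [m for m, _ in results]

s_x = statistics.mean([s for _, s in results])  # typical s{xi} within a sample
s_y = statistics.stdev(means)                   # spread of the 20 sample means
ratio = s_x / s_y

print(f"theoretical sigma = {(1 / 12) ** 0.5:.4f}, average s{{xi}} = {s_x:.4f}")
print(f"s{{y}} = {s_y:.4f}, ratio = {ratio:.1f}")
```

The expected ratio follows from s{y} ≈ σ{xi}/√n with n = 100 observations per sample.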
14. By computer. As random variable, we consider “the number of goals made in a
football match”. The following frequency table was recorded after analyzing
636 matches.
Number of goals   Number of matches
0                 32
1                 92
2                 141
3                 132
4                 111
5                 70
6                 32
7                 12
8-10              14
Test (α = 0.01) the hypothesis that the number of goals in one match is
distributed according to Poisson. [Answer: Yes, according to Poisson, χ² =
5.24 (χ²_.01 (df = 7) = 18.48), p = 0.630].
15. Manually or by computer. Women with a tumor (benign or malignant) in the
liver were asked whether they took oral contraceptives. The following
contingency table was obtained.
                                       Benign tumor   Malignant tumor
Take oral contraceptives               138            49
Do NOT take oral contraceptives        39             41
Use of oral contraceptives NOT KNOWN   35             76
Does this sample indicate a correlation (α = 0.05) between the type of tumor
and the use of oral contraceptives? [Answer: There is significant
correlation, χ² = 52.73 (χ²_.05 = 5.99)].
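The χ² statistic of exercise 15 can be recomputed from the contingency table as follows. This is a sketch of the usual "observed versus expected" computation without continuity correction; the function name `chi_square` is illustrative, and the critical value still has to be looked up in a table.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: take / do not take / not known; columns: the two tumor types.
observed = [[138, 49], [39, 41], [35, 76]]
chi2, df = chi_square(observed)
print(f"chi2 = {chi2:.2f}, df = {df}")
```

The same function works for any table of counts, as long as all expected counts are large enough for the χ² approximation.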
16. A study published in the New England Journal of Medicine (8 December 1977)
suggested that aspirin protects male surgery patients against formation of
blood clots in the veins after surgical operation. Among 23 men who received
4 aspirin tablets a day, only 4 developed blood clots, compared with 14
of 25 men who received placebo tablets. Do these data point to a
significant (α = 0.01) correlation between the use of aspirin and the
occurrence of blood clots? [Answer: There is significant correlation, χ² =
7.62 (χ²_.01 = 6.63)].
17. By computer. Generate the χ² probability distribution for 2 degrees of
freedom. To that end, generate 2 columns of numbers (at least 150 numbers in
each column) that are standard normally distributed. Square each number and
add the two squares row by row. Construct the histogram. Compare also the
cumulative histogram to the cumulative χ² distribution, which you can obtain
by the function CHIDIST (CHI.KWADRAAT in the Dutch-language version) of MS
Excel.
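The construction in exercise 17 can also be tried outside Excel. The sketch below assumes the closed form of the χ² cumulative distribution for 2 degrees of freedom, 1 − exp(−x/2); CHIDIST returns the right-tail probability, which for 2 degrees of freedom is exp(−x/2). The seed and the variable names are arbitrary.

```python
import math
import random

rng = random.Random(2010)  # arbitrary seed, only for reproducibility
n = 150

# Sum of two squared standard normal numbers, one value per row.
values = sorted(rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n))

def chi2_cdf_2df(x):
    """Cumulative chi-square distribution for 2 degrees of freedom."""
    return 1 - math.exp(-x / 2)

# Largest gap between the cumulative histogram and the theoretical curve.
max_gap = max(abs((i + 1) / n - chi2_cdf_2df(x)) for i, x in enumerate(values))

print(f"sample mean = {sum(values) / n:.2f} (theory: 2)")
print(f"largest gap between empirical and theoretical CDF = {max_gap:.3f}")
```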
18. The water intake of 17 rats that had been administered an amount of NaCl
is determined. The sample has a mean of 31.0 cm³ and a standard deviation
of 6.2 cm³. We know that the normal water intake of rats amounts to 22.0 cm³.
Do these sample data prove that rats drink more after administration of NaCl
(α = 0.05)? [Answer: yes, t = 5.985, tα = 1.746].
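The test statistic of exercise 18 follows directly from the summary values. A minimal sketch, assuming a one-sample, one-sided t-test; the critical value is copied from the answer above rather than computed, and the variable names are illustrative.

```python
import math

n = 17            # number of rats
sample_mean = 31.0  # cm^3
s = 6.2           # sample standard deviation, cm^3
mu0 = 22.0        # normal water intake, cm^3

standard_error = s / math.sqrt(n)
t = (sample_mean - mu0) / standard_error
df = n - 1

t_critical = 1.746  # one-sided, alpha = 0.05, df = 16 (value quoted in the answer)
print(f"t = {t:.3f}, df = {df}, reject H0: {t > t_critical}")
```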
19. The recovery time is measured for patients who have undergone surgery of type
1 or 2. Two independent sample surveys, conducted on the respective patient
groups, revealed the following data:
Surgery type 1      Surgery type 2
n1 = 21             n2 = 23
m1 = 7.3 days       m2 = 8.9 days
s1² = 1.23 days²    s2² = 1.49 days²
Do these data demonstrate a significant (α = 0.05) difference in recovery
time between the two groups? [Answer: yes, t = 4.535, tα = 2.120].
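The t value of exercise 19 can be reproduced from the summary statistics alone. The sketch below assumes the classical two-sample t-test with a pooled variance (equal population variances); the function name `pooled_t` is illustrative, and the comparison with a critical value is left to a t table.

```python
import math

def pooled_t(n1, m1, var1, n2, m2, var2):
    """Two-sample t statistic with pooled variance, plus degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df

# Surgery type 1 versus surgery type 2 (means in days, variances in days^2).
t, df = pooled_t(21, 7.3, 1.23, 23, 8.9, 1.49)
print(f"t = {t:.3f}, |t| = {abs(t):.3f}, df = {df}")
```

The sign of t only reflects the order in which the two groups are entered; the two-sided test uses |t|.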
20. In a study published in the New England Journal of Medicine (vol. 297, pp.
528-530), the authors investigated the effect of alcohol (ab)use on the
unborn child. First, they assessed the age of the pregnant women allocated
to groups A (heavy drinkers) and B (from moderate drinkers to abstainers):
Group A            Group B
nA = 58            nB = 575
mA = 25.7 years    mB = 22.8 years
sA = 5.9 years     sB = 5.5 years
The authors used a t-test to test the null hypothesis that µA = µB. (a) Was it
necessary to use the t-test to analyze these data? (b) The authors
concluded that “p < 0.001 by t-test”, i.e. that their results proved a
difference in mean age for α = 0.001. Do you agree? [Answer: (a) no, the
sample sizes are large enough to allow a z-test too; (b) yes, t = 3.801].
21. Manually or by computer. The effect of alcohol on the human body seems to be
stronger at high altitude due to higher alcohol retention.
In order to test this hypothesis, a researcher randomly subdivides 12
persons into 2 groups. The 1st group is brought to a height of 4000 m and
consumes a drink that contains 100 cc of alcohol. The 2nd group consumes the
same amount of alcohol at sea level. After 2 hours, the concentration of
alcohol in the blood (gram per 100 cc) is measured for each test subject:
At sea level   At 4000 m
.07            .13
.10            .17
.09            .15
.12            .14
.09            .10
.13            .14
Do these data support the hypothesis that the alcohol retention in the blood
is higher at high altitude (α = 0.10)? [Answer: yes, t = 2.945, df = 10,
t(1)_.1 = 1.372, p = 0.007332].
22. Manually or by computer. A psychologist wants to verify whether a given drug
increases the reaction time. The following reaction times were recorded
before and after the injection of the drug for 4 test subjects:
Subject   Reaction time (0.1 s)
          Before   After
1         7        13
2         2        3
3         12       18
4         12       13
Test at the 5% level of significance whether the drug increases the reaction
time. [Answer: yes, t = 2.425, df = 3, t(1)_.05 = 2.353, p = 0.04688].
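Because each subject of exercise 22 is measured twice, a paired t-test on the differences is a natural choice. The sketch below assumes that approach and reads the table as four (before, after) pairs; the variable names are illustrative.

```python
import math
import statistics

# Reaction times in units of 0.1 s, one entry per test subject.
before = [7, 2, 12, 12]
after = [13, 3, 18, 13]

differences = [a - b for a, b in zip(after, before)]
mean_d = statistics.mean(differences)
s_d = statistics.stdev(differences)  # sample standard deviation of the differences

t = mean_d / (s_d / math.sqrt(len(differences)))
df = len(differences) - 1
print(f"mean difference = {mean_d}, t = {t:.3f}, df = {df}")
```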
23. Manually or by computer. Single-factor ANOVA. An experiment was conducted to
assess the role of age on the heart rate during jogging. Ten male test
subjects were selected randomly in each of 4 age groups: 10-19, 20-39, 40-59
and 60-69 years. The following increases in heart rate were recorded:
Age
10-19 years   20-39 years   40-59 years   60-69 years
29            24            37            28
33            27            25            29
26            33            22            34
27            31            33            36
39            21            28            21
35            28            26            20
33            24            30            25
29            34            34            24
36            21            27            33
22            32            33            32
309           275           295           282           TOTAL
Display these data in an x-y scatter diagram and make a rough graphical
analysis. Do these data support the hypothesis that the heart rate depends
on the age group (α = 0.05)? [Answer: no, F = 0.8655, df = 3 and 36, F_.05 =
2.87, p = 0.46785]
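The F value of exercise 23 can be verified with a short single-factor ANOVA computation. In this sketch the table is read column by column, one list per age group (the column totals 309, 275, 295 and 282 serve as a check); the function name `one_way_anova` is illustrative.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Sum of squares between groups and within groups.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Increases in heart rate, one list per age group (youngest group first).
groups = [
    [29, 33, 26, 27, 39, 35, 33, 29, 36, 22],
    [24, 27, 33, 31, 21, 28, 24, 34, 21, 32],
    [37, 25, 22, 33, 28, 26, 30, 34, 27, 33],
    [28, 29, 34, 36, 21, 20, 25, 24, 33, 32],
]
totals = [sum(group) for group in groups]
f, df_between, df_within = one_way_anova(groups)
print(f"totals = {totals}, F = {f:.4f}, df = {df_between} and {df_within}")
```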
24. Manually. Two-factor ANOVA. To study the risk factors for cardiovascular
diseases, a statistician compares a group of runners with a control
group. Both men and women are included. The heartbeat is measured 6 minutes
after exercise. The following table lists the means found (for n' = 200
persons per group).
                A →
B ↓         Men                Women              Means
Runners     150.168 = a        117.864 = b        134.016 = ½(a+b) = mab
            (s = 15.218)       (s = 16.919)
Controls    129.403 = c        103.096 = d        116.250 = ½(c+d) = mcd
            (s = 15.909)       (s = 12.668)
Means       139.786 = ½(a+c)   110.480 = ½(b+d)   125.133 = ¼(a+b+c+d)
            = mac              = mbd              = m
i) Compute the following sums of squares: S²intragroup, S²intergroup, S²A and
S²B. Determine S²AB from the relation S²intergroup = S²A + S²B + S²AB.
ii) Verify that S²AB for this (2x2)-problem can be obtained directly as
S²AB/n' = (a−mab)(a−mac) + (b−mab)(b−mbd) + (c−mcd)(c−mac) + (d−mcd)(d−mbd).
iii) Determine whether the main effects and the interaction are
statistically significant (α = 0.05). Which effects are the most
significant?
iv) Interpret and illustrate your conclusion on interaction by using a
plot of group means.
[Answer: 3 times statistically significant, S²intragroup = 185351, S²intergroup =
236690, S²A = 171768, S²B = 63126, S²AB = 1796, FA = 737.7, FB = 271.1, FAB =
7.7]
25. A very small sample leads to the following 2x2 contingency table for
the dichotomous random variables x and y:
            x = 0   x = 1   marginal
y = 0       0       9       9
y = 1       4       2       6
marginal    4       11      Global total = 15
Test one-sidedly the null hypothesis that there is NO correlation between x
and y (α = 0.05). [Answer: according to the Fisher Exact test: p_abcd =
0.0110, so we reject the null hypothesis].
26. In a study the time is assessed that two drugs A and B need to realize a
peak value of plasma concentration. 11 patients are treated with both drugs.
The times recorded are summarized in the following table. Is there a
significant (α = 0.05) difference between both drugs? Use i) Wilcoxon rank
sum test and ii) t-test.
Patient   Time needed
          Drug A   Drug B
1         2.5      3.5
2         3.0      4.0
3         1.25     2.5
4         1.75     2.0
5         2.5      4.0
6         1.75     1.5
7         2.25     2.5
8         3.5      3.0
9         2.5      3.0
10        2.0      3.5
11        3.5      4.0
27. Survival analysis. In a clinical trial containing 48 patients, two
treatments are compared: A (20 patients) and B (28 patients). The endpoint is
dying from cancer. The following events happen:
1) 14 A-patients die from cancer after 18, 25, 30, 34, 35, 36, 38, 39, 43,
60, 86, 130, 520 and 819 weeks respectively.
2) 1 A-patient dies, but NOT from cancer, after 121 weeks.
3) 5 A-patients are still alive at the last follow-up consultation after
68, 77, 82, 324 and 546 weeks respectively.
4) 24 B-patients die from cancer after 13, 17, 22, 26, 27, 31, 40, 44, 47,
48, 61, 65, 71, 78, 83, 95, 108, 123, 186, 481, 568, 685, 711 and 1105
weeks (21 years) respectively.
5) 1 B-patient dies, though NOT from cancer, after 88 weeks.
6) 3 B-patients are still alive at the last follow-up consultation after
37, 69 and 213 weeks respectively.
Construct the survival curves for both treatment groups. Use the log-rank
test to test whether there is a significant (α = 0.05) difference in
survival between the two groups.
28. A medical researcher has recorded the data below. Assume that the X-values
do not contain uncertainty.
1) Compute the coefficients of linear regression, correlation and
determination. Interpret.
2) Test the statistical significance (α = 0.05) of the results obtained
in 1).
3) Construct the regression line using the criterion of minimal sum of
squared deviations.
Patient   Years having smoked (X)   Measured lung damage (Y)
1         25                        55
2         36                        60
3         22                        50
4         15                        30
5         48                        75
6         39                        70
7         42                        70
8         31                        55
9         28                        30
10        34                        35
[Answer: a = 1.28, b = 11.91, r = 0.76, F = 10.96, t = 3.31, statistically
significant]
Extra exercises on BIOSTATISTICS
1. By computer. Construct a series of curves as in figure 8 of the syllabus for
the case that C1 is related to an established therapy and that C2 is the
success rate for a new therapy. The sample involves N patients that receive
the new therapy. Take α = β = 0.05.
2. By computer. Construct Table T (p-values for the sign test) of the syllabus.
3. By computer. Illustrate that the attached Table A10 (critical values of the
correlation coefficient r) can be computed from the critical values of t by
using equation (67) of the syllabus.
Guidelines for the exercises on computer
English-language version of MS Excel:
Tools > Data Analysis
Tools > Add-Ins > Analysis Toolpack
Dutch-language version of MS Excel:
Extra > Gegevensanalyse
Extra > Invoegtoepassingen > Analysis Toolpack
This note was uploaded on 05/28/2010 for the course WE BIBI010000 taught by Professor Marnik Vuylsteke during the Spring '10 term at Ghent University.