Exercises on BIOSTATISTICS

I. Introduction to descriptive statistics

1. Consider the “body length of a student who follows the course biostatistics” as a continuous random variable. Construct the frequency distribution and define the classes such that each contains about 5 outcomes. Construct the bar graph and the histogram. Compute the mean height in 2 ways: i) using the individual heights and ii) using the classes (class frequencies and class centers). Do the two results agree perfectly? Why?

2. Consider the “Degree or certificate last obtained” as a non-quantitative random variable. Display the results in a pie chart (small sectors should be grouped into a sector “other” such that the total number of sectors amounts to about 6).

3. The voltage of a battery is measured repeatedly. One obtains consecutively: 1.65 V, 1.48 V, 1.71 V, 1.45 V, 1.79 V, 1.71 V. Calculate the mean, median, mode, sample standard deviation, variance and coefficient of variation. Knowing that the real voltage amounts to 1.5 V, do you expect the errors to be random or systematic?
[Answer: mx = 1.63 V, median = 1.68 V, mode = 1.71 V, s = 0.14 V, s² = 0.019 V², vc = 0.085]

4. After measuring the blood pressure of a group of selected patients, the physician calculates a mean of 12 cm Hg and a standard deviation of 2 cm Hg. Suddenly, the physician detects a systematic error: the sphygmomanometer has read 1 cm too low. How will the physician correct the mean, the standard deviation and the coefficient of variation (without measuring again)?

II. Basics of probability

1. Do you agree with the following reasoning: “I have won the Lottery today. Now, I will stop gambling since my chances have shrunk away. The probability of winning two times a year is extremely low”. Explain.

2. Consider a family that has 4 children. Assume that the probability for a son equals 0.51. What is the probability that the family has 0, 1, 2, 3 and 4 boys?
Compute the sum of the probabilities found.

3. The probability of exposure to influenza during an epidemic amounts to 0.6. There is a serum on the market that gives a vaccinated person 80% protection when exposed. A person who has not been vaccinated, however, has a probability of 0.9 of being infected when exposed. We randomly consider two persons, one vaccinated and the other not. What is the probability that at least one of the two will catch influenza?

4. A radiation therapist has set up a frequency distribution of the total radiation dose delivered to his patients. He used the obsolete unit rad. He has computed the following parameters: mean, standard deviation, variance, coefficient of variation, quartiles, skewness and kurtosis. Now he wants to present his data expressed in the unit Gy = 100 rad. How does the therapist have to adapt the numerical values of his parameters?

5. In medicine, it often happens that a number of diseases cause the same symptoms. Assume that a given symptom, which we call event H, can only occur as a consequence of 3 possible underlying diseases A, B and C (mutually exclusive). A study has shown that P(A) = 0.01, P(B) = 0.005, P(C) = 0.02, P(H|A) = 0.90, P(H|B) = 0.95 and P(H|C) = 0.75. What is the probability that a person who displays symptom H suffers from disease A?

6. A given genetic characteristic prevails in mice with a probability of 0.2. Three mice are selected randomly. Calculate the probability that the genetic characteristic occurs in these 3 mice.

7. A medical researcher wants to assess the efficacy of 2 drugs that are intended to decrease blood pressure, A and B. He administers the drugs to 4 pairs of identical twins: A to one member and B to the other member of each twin pair. The researcher finds that the 4 people who have received drug A develop a lower blood pressure than their respective twin partners. What does this result suggest? Why did the researcher opt for identical (monozygotic) twins?
(Hint: compute the probability that the four people who received drug A develop a lower blood pressure if both drugs had the same efficacy.)

8. A road victim will die unless he receives blood within 10 minutes. The blood must be of type A Rh-positive and must be given by one donor. It takes 2 minutes to type a donor and the transfusion takes 2 minutes. There is only one non-disposable blood-typing kit available. Quite a lot of untyped donors are present and 40% have the right blood type. (a) What is the probability that the victim is rescued? (b) What would the probability have been if no blood-typing kit were available?
[Answer: (a) 0.8704, (b) 0.4]

9. The frequency of inhalation of a resting human adult is Gaussian distributed with µ = 16 min⁻¹ and σ² = 16 min⁻². What is the probability that such an adult reaches a frequency higher than 22 min⁻¹?
[Answer: 0.0668]

10. The following choice is presented by poll to 180 randomly selected students in medicine: “Pro or contra numerus clausus?”. 68% of the students vote against the numerus clausus. Is this “against” statistically significant? (If the real opinion of ALL students were split 50%-50%, what would have been the probability of obtaining this vote?) Hint: the variance of the binomial distribution equals n*θ*(1-θ), with θ the probability of “success” on any trial and n the number of trials. Would a total of 20 students have been sufficient, again assuming the 68% poll result?

11. By computer. The number of emergency admissions in a hospital is distributed according to Poisson with a mean of 5 people a day.
i) Compute P(0), P(1), P(2), ..., P(10) and make a chart of the probability distribution. What is the probability that 3 persons are admitted on one day? Is it likely that 10 or more persons are admitted on one day?
ii) Construct the binomial distribution that has the same mean as the above Poisson distribution (the mean of the binomial distribution is µ = nθ, where θ is the probability of “success” in a single trial and n is the number of trials). Compare graphically to the Poisson distribution for i) n = 10 and ii) n = 50. What do you conclude?
iii) Construct the normal distribution that has the same mean and variance as the above-mentioned Poisson distribution. Compare graphically to the Poisson distribution. When would the similarity be still higher?

12. By computer. The manufacturer of a drug claims that his product is effective in at least 99% of the cases. As a check, 200 people are interrogated. In 8 cases, the drug was NOT effective. Test whether the manufacturer was right using i) the binomial, ii) the Poisson and iii) the χ² distribution (α = 0.01).

13. By computer. Illustration of the central limit theorem. From a population of real numbers that are uniformly distributed in the interval [0,1], we take a sample of 100 observations xi. The resulting sample mean is considered as random variable y.
i) Compute y and s{xi} from your sample data. Compare to the theoretical mean and standard deviation of the uniform distribution (σ² is equal to 1/12 of the square of the interval width).
ii) Draw 20 samples from the same population in order to verify the central limit theorem for the mean (use a different seed number for each sample). Compute y and s{xi} for each sample. Construct a histogram for y and check whether it reflects the Gaussian distribution.
iii) Compute s{y} and compare to s{xi} or σ{xi}. Why is the ratio about 10?

14. By computer. As random variable, we consider “the number of goals made in a football match”. The following frequency table was recorded after analyzing 636 matches.

Number of goals    Number of matches
0                  32
1                  92
2                  141
3                  132
4                  111
5                  70
6                  32
7                  12
8 – 10             14

Test (α = 0.01) the hypothesis that the number of goals in one match is distributed according to Poisson.
[Answer: Yes, according to Poisson, χ² = 5.24 (χ²_0.01 (df = 7) = 18.48), p = 0.630].

15. Manually or by computer. Women with a tumor (benign or malignant) in the liver were asked whether they took oral contraceptives. The following contingency table was obtained.

                                         Benign tumor    Malignant tumor
Take oral contraceptives                 138             49
Do NOT take oral contraceptives          39              41
Use of oral contraceptives NOT KNOWN     35              76

Does this sample indicate a correlation (α = 0.05) between the type of tumor and the use of oral contraceptives?
[Answer: There is significant correlation, χ² = 52.73 (χ²_0.05 = 5.99)].

16. A study published in the New England Journal of Medicine (8 December 1977) suggested that aspirin protects male surgery patients against formation of blood clots in the veins after a surgical operation. Among 23 men who received 4 aspirin tablets a day, only 4 developed blood clots, compared with 14 out of 25 men who received placebo tablets. Do these data point to a significant (α = 0.01) correlation between the use of aspirin and the occurrence of blood clots?
[Answer: There is significant correlation, χ² = 7.62 (χ²_0.01 = 6.63)].

17. By computer. Generate the χ² probability distribution for 2 degrees of freedom. To that end, generate 2 columns of numbers (at least 150 numbers in each column) that are standard normally distributed. Square and sum up by column. Construct the histogram. Compare also the cumulative histogram to the cumulative χ² distribution, which you can obtain by the function CHIDIST (CHI.KWADRAAT in the Dutch-language version) of MS Excel.

18. The water intake is determined for 17 rats that had been administered an amount of NaCl. The sample has a mean of 31.0 cm³ and a standard deviation of 6.2 cm³. We know that the normal water intake of rats amounts to 22.0 cm³. Do these sample data prove that rats drink more after administration of NaCl (α = 0.05)?
[Answer: yes, t = 5.985, tα = 1.746].

19. The recovery time is measured for patients who have undergone surgery of type 1 or 2.
Two independent sample surveys, conducted on the respective patient groups, revealed the following data:

Surgery type 1        Surgery type 2
n1 = 21               n2 = 23
m1 = 7.3 days         m2 = 8.9 days
s1² = 1.23 days²      s2² = 1.49 days²

Do these data demonstrate a significant (α = 0.05) difference in recovery time between the two groups?
[Answer: yes, t = 4.535, tα = 2.120].

20. In a study published in the New England Journal of Medicine (vol. 297, pp. 528-530), the authors investigated the effect of alcohol (ab)use on the unborn child. First, they assessed the age of the pregnant women allocated to groups A (heavy drinkers) and B (moderate drinkers to abstainers):

Group A               Group B
nA = 58               nB = 575
mA = 25.7 years       mB = 22.8 years
sA = 5.9 years        sB = 5.5 years

The authors used a t-test to test the null hypothesis that µA = µB. (a) Was it necessary to use the t-test to analyze these data? (b) The authors concluded that “p < 0.001 by t-test”, i.e. that their results proved a difference in mean age for α = 0.001. Do you agree?
[Answer: (a) no, the sample sizes are large enough to allow a z-test; (b) yes, t = 3.801].

21. Manually or by computer. The effect of alcohol on the human body seems to be stronger at elevated heights above sea level due to higher alcohol retention. In order to test this hypothesis, a researcher randomly subdivides 12 persons into 2 groups. The 1st group is brought to a height of 4000 m and consumes a drink that contains 100 cc of alcohol. The 2nd group consumes the same amount of alcohol at sea level. After 2 hours, the concentration of alcohol in the blood (gram per 100 cc) is measured for each test subject:

At sea level    At 4000 m
.07             .13
.10             .17
.09             .15
.12             .14
.09             .10
.13             .14

Do these data support the hypothesis that the alcohol retention in the blood is higher at elevated heights (α = 0.10)?
[Answer: yes, t = 2.945, df = 10, one-sided t(0.10) = 1.372, p = 0.007332].

22. Manually or by computer. A psychologist wants to verify whether a given drug increases the reaction time.
The following reaction times were recorded before and after the injection of the drug for 4 test subjects:

Subject    Reaction time (0.1 s)
           Before    After
1          7         13
2          2         3
3          12        18
4          12        13

Test at the 5% level of significance whether the drug increases the reaction time.
[Answer: yes, t = 2.425, df = 3, one-sided t(0.05) = 2.353, p = 0.04688].

23. Manually or by computer. Single-factor ANOVA. An experiment was conducted to assess the role of age on the heart rate during jogging. Ten male test subjects were selected randomly in each of 4 age groups: 10-19, 20-39, 40-59 and 60-69 years. The following increases in heart rate were recorded:

Age      10-19 years    20-39 years    40-59 years    60-69 years
         29             24             37             28
         33             27             25             29
         26             33             22             34
         27             31             33             36
         39             21             28             21
         35             28             26             20
         33             24             30             25
         29             34             34             24
         36             21             27             33
         22             32             33             32
TOTAL    309            275            295            282

Display these data in an x-y scatter diagram and make a rough graphical analysis. Do these data support the hypothesis that the heart rate depends on the age group (α = 0.05)?
[Answer: no, F = 0.8655, df = 3 and 36, F_0.05 = 2.87, p = 0.46785]

24. Manually. Two-factor ANOVA. To study the risk factors for cardiovascular diseases, a statistician compares a group of runners with a control group. Both men and women are included. The heartbeat is measured 6 minutes after exercise. The following table lists the means found (for n' = 200 persons per group).

B ↓ / A →    Men                        Women                      Means
Runners      150.168 = a                117.864 = b                134.016 = ½(a+b) = mab
Controls     129.403 = c                103.096 = d                116.250 = ½(c+d) = mcd
Means        139.786 = ½(a+c) = mac     110.480 = ½(b+d) = mbd     125.133 = ¼(a+b+c+d) = m

Group standard deviations, in the order printed: s = 15.218, s = 16.919, s = 15.909, s = 12.668.

i) Compute the following sums of squares: S²intragroup, S²intergroup, S²A and S²B. Determine S²AB from the relation S²intergroup = S²A + S²B + S²AB.
ii) Verify that S²AB for this (2x2)-problem can be obtained directly as S²AB/n' = (a-mab)(a-mac) + (b-mab)(b-mbd) + (c-mcd)(c-mac) + (d-mcd)(d-mbd).
iii) Determine whether the main effects and the interaction are statistically significant (α = 0.05). Which effects are the most significant?
iv) Interpret and illustrate your conclusion on interaction by using a plot of group means.
[Answer: all 3 effects are statistically significant, S²intragroup = 185351, S²intergroup = 236690, S²A = 171768, S²B = 63126, S²AB = 1796, FA = 737.7, FB = 271.1, FAB = 7.7]

25. A very small-sized sample leads to the following 2x2 contingency table for the dichotomous random variables x and y:

            x = 0    x = 1    marginal
y = 0       0        9        9
y = 1       4        2        6
marginal    4        11       Global total = 15

Test one-sidedly the null hypothesis that there is NO correlation between x and y (α = 0.05).
[Answer: according to the Fisher Exact test: pabcd = 0.0110, so we reject the null hypothesis].

26. In a study, the time that two drugs A and B need to reach a peak value of plasma concentration is assessed. 11 patients are treated with both drugs. The times recorded are summarized in the following table. Is there a significant (α = 0.05) difference between both drugs? Use i) the Wilcoxon rank sum test and ii) the t-test.

Patient    Time needed
           Drug A    Drug B
1          2.5       3.5
2          3.0       4.0
3          1.25      2.5
4          1.75      2.0
5          2.5       4.0
6          1.75      1.5
7          2.25      2.5
8          3.5       3.0
9          2.5       3.0
10         2.0       3.5
11         3.5       4.0

27. Survival analysis. In a clinical trial containing 48 patients, two treatments are compared: A (20 patients) and B (28 patients). The endpoint is dying from cancer. The following events happen:
1) 14 A-patients die from cancer after 18, 25, 30, 34, 35, 36, 38, 39, 43, 60, 86, 130, 520 and 819 weeks respectively.
2) 1 A-patient dies, but NOT from cancer, after 121 weeks.
3) 5 A-patients are still alive at the last follow-up consultation after 68, 77, 82, 324 and 546 weeks respectively.
4) 24 B-patients die from cancer after 13, 17, 22, 26, 27, 31, 40, 44, 47, 48, 61, 65, 71, 78, 83, 95, 108, 123, 186, 481, 568, 685, 711 and 1105 weeks (21 years) respectively.
5) 1 B-patient dies, though NOT from cancer, after 88 weeks.
6) 3 B-patients are still alive at the last follow-up consultation after 37, 69 and 213 weeks respectively.
Construct the survival curves for both treatment groups. Use the logrank test to test whether there is a significant (α = 0.05) difference in survival between the two groups.

28. A medical researcher has recorded the data below. Assume that the X-values do not contain uncertainty.
1) Compute the coefficients of linear regression, correlation and determination. Interpret.
2) Test the statistical significance (α = 0.05) of the results obtained in 1).
3) Construct the regression line using the criterion of minimal sum of squared deviations.

Patient    Years having smoked (X)    Measured lung damage (Y)
1          25                         55
2          36                         60
3          22                         50
4          15                         30
5          48                         75
6          39                         70
7          42                         70
8          31                         55
9          28                         30
10         34                         35

[Answer: a = 1.28, b = 11.91, r = 0.76, F = 10.96, t = 3.31, statistically significant]

Extra exercises on BIOSTATISTICS

1. By computer. Construct a series of curves as in figure 8 of the syllabus for the case that C1 is related to an established therapy and that C2 is the success rate for a new therapy. The sample involves N patients that receive the new therapy. Take α = β = 0.05.

2. By computer. Construct Table T (p-values for the sign test) of the syllabus.

3. By computer. Illustrate that the attached Table A10 (critical values of the correlation coefficient r) can be computed from the critical values of t by using equation (67) of the syllabus.

Guidelines for the exercises on computer

Tools --> Data Analysis
Tools --> Add-Ins --> Analysis Toolpack
Extra --> Gegevensanalyse
Extra --> Invoegtoepassingen --> Analysis Toolpack
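
Worked check for exercise I.3 (descriptive statistics). The following is a minimal sketch, assuming Python with only its standard `statistics` module rather than the MS Excel tools the sheet refers to; the variable names are illustrative and not part of the course material.

```python
import statistics

# Voltage readings from exercise I.3, in volts.
voltages = [1.65, 1.48, 1.71, 1.45, 1.79, 1.71]

mean = statistics.mean(voltages)
median = statistics.median(voltages)
mode = statistics.mode(voltages)
s = statistics.stdev(voltages)            # sample standard deviation (divides by n - 1)
variance = statistics.variance(voltages)  # sample variance (divides by n - 1)
cv = s / mean                             # coefficient of variation

print(f"mean = {mean:.2f} V, median = {median:.2f} V, mode = {mode:.2f} V")
print(f"s = {s:.2f} V, s^2 = {variance:.3f} V^2, cv = {cv:.3f}")
```

Without intermediate rounding, the coefficient of variation comes out near 0.084, slightly below the 0.085 quoted in the answer; the difference is presumably due to rounding of s or of the mean before dividing.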
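
Worked check for exercises II.8 and II.9 (basic probabilities). This is a sketch under two stated assumptions: for II.8(a), that at most 4 donors can be typed in the 10 minutes (4 x 2 minutes of typing plus 2 minutes for the transfusion), and for II.9, that σ = 4 min⁻¹ follows from σ² = 16 min⁻². It uses `NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Exercise II.8(a): the victim is rescued if at least one of the
# (assumed) 4 donors that can be typed in time has the right blood type.
p_match = 0.4
max_typed = 4
p_rescued = 1 - (1 - p_match) ** max_typed

# Exercise II.8(b): without a kit, one donor is picked blindly.
p_rescued_no_kit = p_match

# Exercise II.9: breathing frequency ~ N(mu = 16, sigma = 4), upper tail above 22.
p_above_22 = 1 - NormalDist(mu=16, sigma=4).cdf(22)

print(f"II.8 (a) {p_rescued:.4f}, (b) {p_rescued_no_kit:.1f}")
print(f"II.9 P(X > 22) = {p_above_22:.4f}")
```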
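
Worked check for exercises 15 and 16 (χ² test on a contingency table). The sketch below computes the Pearson χ² statistic without continuity correction, which appears to be what the quoted answers use; `chi_square` is a hypothetical helper written for this illustration, not a library function. For exercise 16 the 2x2 table is built from the counts in the text (4 of 23 with clots on aspirin, 14 of 25 on placebo).

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Exercise 15: rows = contraceptive use (yes, no, unknown); columns = (benign, malignant).
tumor = [[138, 49], [39, 41], [35, 76]]
# Exercise 16: rows = (aspirin, placebo); columns = (clots, no clots).
aspirin = [[4, 19], [14, 11]]

chi2_tumor = chi_square(tumor)      # compare with the critical value for df = 2
chi2_aspirin = chi_square(aspirin)  # compare with the critical value for df = 1

print(f"exercise 15: chi2 = {chi2_tumor:.2f}")
print(f"exercise 16: chi2 = {chi2_aspirin:.2f}")
```

The degrees of freedom are (rows - 1) x (columns - 1), which gives 2 and 1 respectively, in line with the critical values 5.99 and 6.63 quoted in the answers.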
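
Worked check for exercises 18 and 22 (one-sample and paired t statistics). A minimal sketch, assuming the usual formulas t = (m - µ0) / (s / √n) for a single sample and the same formula applied to the within-subject differences for paired data. Only the test statistics are computed; the critical values still have to be looked up in a t table.

```python
import math
import statistics

# Exercise 18: summary statistics of the 17 rats.
n, sample_mean, sample_sd, mu0 = 17, 31.0, 6.2, 22.0
t_one_sample = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Exercise 22: reaction times (in 0.1 s) before and after the injection.
before = [7, 2, 12, 12]
after = [13, 3, 18, 13]
diffs = [a - b for a, b in zip(after, before)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

print(f"exercise 18: t = {t_one_sample:.3f}, df = {n - 1}")
print(f"exercise 22: t = {t_paired:.3f}, df = {len(diffs) - 1}")
```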
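
Worked check for exercise 28 (linear regression). The sketch below fits Y = a·X + b by least squares, where, judging from the quoted answer, a denotes the slope and b the intercept. It assumes the standard relations F = r²(n - 2) / (1 - r²) and t = √F for simple linear regression; whether these coincide with the syllabus equations is an assumption, since the syllabus is not part of this sheet.

```python
import math

# Exercise 28: years having smoked (X) and measured lung damage (Y).
years = [25, 36, 22, 15, 48, 39, 42, 31, 28, 34]
damage = [55, 60, 50, 30, 75, 70, 70, 55, 30, 35]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(damage) / n
sxx = sum((x - mean_x) ** 2 for x in years)
syy = sum((y - mean_y) ** 2 for y in damage)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, damage))

slope = sxy / sxx                       # a in the answer
intercept = mean_y - slope * mean_x     # b in the answer
r = sxy / math.sqrt(sxx * syy)          # correlation coefficient
r_squared = r ** 2                      # coefficient of determination
f_stat = r_squared * (n - 2) / (1 - r_squared)
t_stat = math.sqrt(f_stat)

print(f"a = {slope:.2f}, b = {intercept:.2f}, r = {r:.2f}")
print(f"F = {f_stat:.2f}, t = {t_stat:.2f}, df = {n - 2}")
```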
This note was uploaded on 05/28/2010 for the course WE BIBI010000 taught by Professor Marnik Vuylsteke during the Spring '10 term at Ghent University.