Biostatistics-lectures NOTES - (Biostatistics BARACK O...


(Biostatistics) BARACK O. ABONYO

Chapter 1: Introduction to Biostatistics

Key words: Statistics, data, Biostatistics, Variable, Population, Sample

Text Book: Basic Concepts and Methodology for the Health Sciences

Introduction: Some Basic Concepts

Statistics is a field of study concerned with:
1- the collection, organization, summarization and analysis of data;
2- the drawing of inferences about a body of data when only a part of the data is observed.
Statisticians try to interpret and communicate the results to others.

* Biostatistics:
The tools of statistics are employed in many fields: business, education, psychology, agriculture, economics, etc. When the data analyzed are derived from the biological sciences and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.

* Data:
• The raw material of statistics is data.
• We may define data as figures. Figures result from the process of counting or from taking a measurement.
• For example:
  - When a hospital administrator counts the number of patients (counting).
  - When a nurse weighs a patient (measurement).

* Sources of Data:
We search for suitable data to serve as the raw material for our investigation. Such data are available from one or more of the following sources:
1- Routinely kept records. For example:
  - Hospital medical records contain information on patients.
  - Hospital accounting records contain data on the facility's business activities.
2- External sources. The data needed to answer a question may already exist in the form of published reports, commercially available data banks, or the research literature, i.e.
someone else has already asked the same question.
3- Surveys. The source may be a survey, if the data needed are answers to certain questions. For example: if the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic, then a survey may be conducted among patients to obtain this information.
4- Experiments. Frequently the data needed to answer a question are available only as the result of an experiment. For example: if a nurse wishes to know which of several strategies is best for maximizing patient compliance, she might conduct an experiment in which the different strategies of motivating compliance are tried with different patients.

* A variable:
It is a characteristic that takes on different values in different persons, places, or things. For example:
  - heart rate, the heights of adult males, the weights of preschool children, the ages of patients seen in a dental clinic.

Types of variables

Quantitative variables: A quantitative variable can be measured in the usual sense. For example:
  - the heights of adult males,
  - the weights of preschool children,
  - the ages of patients seen in a dental clinic.

Qualitative variables: Many characteristics are not capable of being measured. Some of them can be ordered or ranked. For example:
  - classification of people into socio-economic groups,
  - social classes based on income, education, etc.

Types of quantitative variables

Discrete: A discrete variable is characterized by gaps or interruptions in the values that it can assume. For example:
  - the number of daily admissions to a general hospital,
  - the number of decayed, missing or filled teeth per child in an elementary school.

Continuous: A continuous variable can assume any value within a specified relevant interval of values assumed by the variable. For example:
  - height,
  - weight,
  - skull circumference.
No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.

* A population:
It is the largest collection of values of a random variable for which we have an interest at a particular time. For example: the weights of all the children enrolled in a certain elementary school. Populations may be finite or infinite.

* A sample:
It is a part of a population. For example: the weights of only a fraction of these children.

Strategies for Understanding the Meanings of Data (Pages 19 – 27)

Key words: frequency table, bar chart, range, width of interval, mid-interval, histogram, polygon

Descriptive Statistics: Frequency Distribution for Discrete Random Variables

Example: Suppose that we take a sample of size 16 from children in a primary school and get the following data about the number of their decayed teeth:
3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1
To construct a frequency table:
1- Order the values from the smallest to the largest:
0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5
2- Count how many numbers are the same.

No. of decayed teeth    Frequency    Relative Frequency
0                       1            0.0625
1                       2            0.125
2                       4            0.25
3                       5            0.3125
4                       2            0.125
5                       2            0.125
Total                   16           1

Representing the simple frequency table using the bar chart

We can represent the above simple frequency table using the bar chart.
[Figure: bar chart of frequency (vertical axis) against number of decayed teeth (horizontal axis).]

2.3 Frequency Distribution for Continuous Random Variables

For large samples, we can't use the simple frequency table to represent the data. We need to divide the data into groups or intervals or classes. So, we need to determine:
1- The number of intervals (k). Too few intervals are not good because information will be lost. Too many intervals are not helpful to summarize the data. A commonly followed rule is that 6 ≤ k ≤ 15, or the following formula may be used:
k = 1 + 3.322 (log n), where log is the base-10 logarithm.
2- The range (R). It is the difference between the largest and the smallest observation in the data set.
3- The width of the interval (w). Class intervals generally should be of the same width. Thus, if we want k intervals, then w is chosen such that w ≥ R / k.

Example: Assume that the number of observations equals 100, then
k = 1 + 3.322 (log 100) = 1 + 3.322 (2) = 7.644 ≈ 8.
Assume that the smallest value = 5 and the largest one of the data = 61, then R = 61 – 5 = 56 and w = 56 / 8 = 7.
To make the summarization more comprehensible, the class width may be 5 or 10 or the multiples of 10.
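The two procedures above (the simple frequency table and the choice of k, R and w) can be sketched in Python. This is only an illustrative sketch: the function names are hypothetical, and rounding k up to the next whole number is an assumption that happens to agree with the worked example (7.644 becomes 8).

```python
import math
from collections import Counter


def frequency_table(values):
    """Return (value, frequency, relative frequency) rows, ordered by value."""
    n = len(values)
    counts = Counter(values)
    return [(v, counts[v], counts[v] / n) for v in sorted(counts)]


def sturges_intervals(n):
    """Number of class intervals k = 1 + 3.322 * log10(n), rounded up (assumption)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def class_width(smallest, largest, k):
    """Minimum class width w = R / k, where R = largest - smallest."""
    return (largest - smallest) / k


# Decayed-teeth sample from the notes (n = 16).
decayed_teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]
for value, freq, rel in frequency_table(decayed_teeth):
    print(value, freq, rel)

# Worked example from the notes: n = 100, smallest = 5, largest = 61.
k = sturges_intervals(100)   # 8
w = class_width(5, 61, k)    # 7.0
print(k, w)
```

In practice the computed width is only a lower bound; as the notes say, a rounder width such as 5 or 10 is usually chosen afterwards.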
Example 2.3.1
We wish to know how many class intervals to have in the frequency distribution of the data in Table 1.4.1 (Pages 9-10) of the ages of 189 subjects who participated in a study on smoking cessation.
Solution: Since the number of observations equals 189, then
k = 1 + 3.322 (log 189) = 1 + 3.322 (2.276) ≈ 9,
R = 82 – 30 = 52 and w = 52 / 9 = 5.778.
It is better to let w = 10, then the intervals will be in the form:

Class interval    Frequency
30 – 39           11
40 – 49           46
50 – 59           70
60 – 69           45
70 – 79           16
80 – 89           1
Total             189

Sum of frequencies = sample size = n.

The Cumulative Frequency: It can be computed by adding successive frequencies.
The Cumulative Relative Frequency: It can be computed by adding successive relative frequencies.
The Mid-interval: It can be computed by adding the lower bound of the interval to its upper bound and then dividing by 2.

For the above example, the following table represents the cumulative frequency, the relative frequency, the cumulative relative frequency and the mid-interval (R.f = freq / n):

Class interval    Mid-interval    Frequency (f)    Cumulative Frequency    Relative Frequency (R.f)    Cumulative Relative Frequency
30 – 39           34.5            11               11                      0.0582                      0.0582
40 – 49           44.5            46               57                      0.2434                      -
50 – 59           54.5            -                127                     -                           0.6720
60 – 69           -               45               -                       0.2381                      0.9101
70 – 79           74.5            16               188                     0.0847                      0.9948
80 – 89           84.5            1                189                     0.0053                      1
Total                             189                                      1

Example: From the above frequency table, complete the table then answer the following questions:
1- The number of subjects with age less than 50 years?
2- The number of subjects with age between 40-69 years?
3- Relative frequency of subjects with age between 70-79 years?
4- Relative frequency of subjects with age more than 69 years?
5- The percentage of subjects with age between 40-49 years?
6- The percentage of subjects with age less than 60 years?
7- The range (R)?
8- Number of intervals (k)?
9- The width of the interval (w)?

Representing the grouped frequency table using the histogram

To draw the histogram, the true class limits should be used. They can be computed by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit for each interval.

True class limits    Frequency
29.5 – < 39.5        11
39.5 – < 49.5        46
49.5 – < 59.5        70
59.5 – < 69.5        45
69.5 – < 79.5        16
79.5 – < 89.5        1
Total                189

Representing the grouped frequency table using the polygon

Exercises
Pages: 31 – 34
Questions: 2.3.2(a), 2.3.5(a)
H.W.: 2.3.6, 2.3.7(a)

Section (2.4): Descriptive Statistics: Measures of Central Tendency (Pages 38 - 41)

key words: Descriptive Statistic, measure of central tendency, statistic, parameter, mean (μ), median, mode.

The Statistic and the Parameter
• A Statistic: It is a descriptive measure computed from the data of a sample.
• A Parameter: It is a descriptive measure computed from the data of a population.
Since it is difficult to measure a parameter from the population, a sample of size n is drawn, whose values are $x_1, x_2, \ldots, x_n$. From this data, we measure the statistic.

Measures of Central Tendency
A measure of central tendency is a measure which indicates where the middle of the data is.
The three most commonly used measures of central tendency are: the Mean, the Median, and the Mode.

The Mean: It is the average of the data.

The Population Mean:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
which is usually unknown, then we use the sample mean to estimate or approximate it.

The Sample Mean:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Example: Here is a random sample of size 10 of ages, where $x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$, $x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.
$\bar{x} = (42 + 28 + \ldots + 37) / 10 = 36.6$

Properties of the Mean:
• Uniqueness. For a given set of data there is one and only one mean.
• Simplicity. It is easy to understand and to compute.
• Affected by extreme values, since all values enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121 and 126. The mean = 118. But assume that the values are 75, 75, 80, 80 and 280. The mean = 118, a value that is not representative of the set of data as a whole.

The Median:
When ordering the data, it is the observation that divides the set of observations into two equal parts such that half of the data are before it and the other half are after it.
* If n is odd, the median will be the middle observation. It will be the (n+1)/2-th ordered observation. When n = 11, then the median is the 6th observation.
* If n is even, there are two middle observations. The median will be the mean of these two middle observations. It will be the (n+1)/2-th ordered observation. When n = 12, then the median is the 6.5th observation, which is an observation halfway between the 6th and 7th ordered observations.

Example: For the same random sample, the ordered observations will be: 23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5th observation, i.e. (32 + 34) / 2 = 33.

Properties of the Median:
• Uniqueness. For a given set of data there is one and only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as is the mean.

The Mode:
It is the value which occurs most frequently. If all values are different there is no mode. Sometimes, there is more than one mode.
Example: For the same random sample, the value 28 is repeated two times, so it is the mode.

Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative data.

Section (2.5): Descriptive Statistics: Measures of Dispersion (Pages 43 - 46)

key words: Descriptive Statistic, measure of dispersion, range, variance, coefficient of variation.

2.5. Descriptive Statistics – Measures of Dispersion:
• A measure of dispersion conveys information regarding the amount of variability present in a set of data.
• Note:
1. If all the values are the same → there is no dispersion.
2. If all the values are different → there is dispersion:
   a) If the values are close to each other → the amount of dispersion is small.
   b) If the values are widely scattered → the dispersion is greater.
Ex. Figure 2.5.1 – Page 43
• Measures of dispersion are:
1. Range (R).
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).

1. The Range (R):
• Range = Largest value − Smallest value = $x_L - x_S$
• Note: The range depends on only two values.
• Example 2.5.1 Page 40: Refer to Ex 2.4.2, Page 37.
Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
Find the range.
Range = 66 − 38 = 28

2. The Variance:
• It measures dispersion relative to the scatter of the values about their mean.
a) Sample Variance ($S^2$):
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean.
• Example 2.5.2 Page 40: Refer to Ex 2.4.2, Page 37.
Find the sample variance of the ages, $\bar{x} = 56$.
Solution:
$S^2 = [(43 - 56)^2 + (66 - 56)^2 + \ldots + (50 - 56)^2] / 9 = 810 / 9 = 90$
b) Population Variance ($\sigma^2$):
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean.

3. The Standard Deviation:
• It is the square root of the variance.
a) Sample Standard Deviation = $S = \sqrt{S^2}$
b) Population Standard Deviation = $\sigma = \sqrt{\sigma^2}$

4. The Coefficient of Variation (C.V):
• It is a measure used to compare the dispersion in two sets of data, and it is independent of the unit of measurement.
$C.V = \frac{S}{\bar{x}} (100)$, where S is the sample standard deviation and $\bar{x}$ is the sample mean.

Example 2.5.3 Page 46:
• Suppose two samples of human males yield the following data:

                      Sample 1        Sample 2
Age                   25-year-olds    11-year-olds
Mean weight           145 pounds      80 pounds
Standard deviation    10 pounds       10 pounds

• We wish to know which is more variable.
• Solution:
• C.V (Sample 1) = (10 / 145) * 100 = 6.9
• C.V (Sample 2) = (10 / 80) * 100 = 12.5
• Then the weights of the 11-year-olds (Sample 2) are more variable.

Exercises
• Pages: 52 – 53
• Questions: 2.5.1, 2.5.2, 2.5.3
• H.W.: 2.5.4, 2.5.5, 2.5.6, 2.5.14
* Also you can solve in the review questions, page 57:
• Q: 12, 13, 14, 15, 16, 19

Chapter 3: Probability, the Basis of Statistical Inference

Key words: Probability, objective probability, subjective probability, equally likely, mutually exclusive, multiplicative rule, conditional probability, independent events, Bayes theorem

3.1 Introduction

The concept of probability is frequently encountered in everyday communication. For example, a physician may say that a patient has a 50-50 chance of surviving a certain operation. Another physician may say that she is 95 percent certain that a patient has a particular disease.
Most people express probabilities in terms of percentages. But it is more convenient to express probabilities as fractions. Thus, we may measure the probability of the occurrence of some event by a number between 0 and 1. The more likely the event, the closer the number is to one. An event that can't occur has a probability of zero, and an event that is certain to occur has a probability of one.

3.2 Two Views of Probability: Objective and Subjective

*** Objective Probability
** Classical and Relative
Some definitions:
1. Equally likely outcomes: the outcomes that have the same chance of occurring.
2. Mutually exclusive: two events A and B are said to be mutually exclusive if they cannot occur simultaneously, such that A ∩ B = Φ.
The universal set (S): the set of all possible outcomes.
The empty set Φ: contains no elements.
The event E: a set of outcomes in S which has a certain characteristic.
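The set definitions above can be illustrated with Python's built-in sets. This is a small sketch under an assumed example (one roll of a six-sided die, which is not part of the definitions themselves); the event names and the helper `mutually_exclusive` are hypothetical.

```python
# Assumed example: one roll of a six-sided die.
S = frozenset(range(1, 7))          # universal set: all possible outcomes
EMPTY = frozenset()                 # the empty set, written Φ in the notes

# Events are subsets of S that share a certain characteristic.
even = frozenset({2, 4, 6})
odd = frozenset({1, 3, 5})
at_least_five = frozenset({5, 6})


def mutually_exclusive(a, b):
    """Two events are mutually exclusive when their intersection is empty."""
    return a & b == EMPTY


print(mutually_exclusive(even, odd))            # True: no outcome is both even and odd
print(mutually_exclusive(even, at_least_five))  # False: the outcome 6 is in both events
```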
Classical Probability:
If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a trait E, the probability of the occurrence of event E is equal to m / N.
For example: in the rolling of a die, each of the six sides is equally likely to be observed. So, the probability that a 4 will be observed is equal to 1/6.

Relative Frequency Probability:
Def: If some process is repeated a large number of times, n, and if some resulting event E occurs m times, the relative frequency of occurrence of E, m / n, will be approximately equal to the probability of E:
P(E) = m / n.

*** Subjective Probability:
Probability measures the confidence that a particular individual has in the truth of a particular proposition.
For example: the probability that a cure for cancer will be discovered within the next 10 years.

3.3 Elementary Properties of Probability

Given some process (or experiment) with n mutually exclusive events E1, E2, E3, …, En, then
1- P(Ei) ≥ 0, i = 1, 2, 3, …, n
2- P(E1) + P(E2) + … + P(En) = 1
3- P(Ei + Ej) = P(Ei) + P(Ej), where Ei and Ej are mutually exclusive

Rules of Probability
1- Addition Rule:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
2- If A and B are mutually exclusive (disjoint), then P(A ∩ B) = 0, and the addition rule becomes P(A ∪ B) = P(A) + P(B).
3- Complementary Rule:
P(A') = 1 – P(A) w...
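The classical definition m / N and the addition and complementary rules can be checked by counting outcomes. The sketch below assumes the die example from the notes; the function name `prob` and the events A and B are hypothetical choices made only for illustration, and exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Assumed example: one roll of a six-sided die, all sides equally likely.
S = frozenset(range(1, 7))


def prob(event, space=S):
    """Classical probability: m favourable outcomes out of N equally likely ones."""
    return Fraction(len(event & space), len(space))


A = frozenset({2, 4, 6})   # an even number is observed
B = frozenset({5, 6})      # a number of at least 5 is observed

print(prob(frozenset({4})))                       # 1/6, as in the notes

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
print(prob(A | B), prob(A) + prob(B) - prob(A & B))   # 2/3 2/3

# Complementary rule: P(A') = 1 − P(A)
print(prob(S - A), 1 - prob(A))                       # 1/2 1/2
```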