Biostatistics
BARACK O. ABONYO

Chapter 1: Introduction to Biostatistics

Key words: Statistics, data, Biostatistics, Variable, Population, Sample

Text Book: Basic Concepts and Methodology for the Health Sciences

Introduction
Some Basic Concepts
Statistics is a field of study
concerned with
1- collection, organization,
summarization and analysis of data.
2- drawing of inferences about a
body of data when only a part of
the data is observed.
Statisticians try to interpret and
communicate the results to
others.
* Biostatistics:
The tools of statistics are employed in
many fields:
business, education, psychology,
agriculture, economics, … etc.
When the data analyzed are derived
from the biological science and
medicine,
we use the term biostatistics to
distinguish this particular application
of statistical tools and concepts.
* Data:
• The raw material of Statistics is data.
• We may define data as figures. Figures
result from the process of counting or
from taking a measurement.
• For example:
  - When a hospital administrator counts the number of patients (counting).
  - When a nurse weighs a patient (measurement).
* Sources of Data:
We search for suitable data to serve
as the raw material for our
investigation.
Such data are available from one or
more of the following sources:
1- Routinely kept records. For example:
- Hospital medical records contain
information on patients.
- Hospital accounting records contain
data on the facility’s business
activities.
2- External sources.
The data needed to answer a
question may already exist in the
form of
published reports, commercially
available data banks, or the research
literature, i.e. someone else has
already asked the same question.
3- Surveys:
The source may be a survey when the data are
needed to answer specific
questions. For example:
If the administrator of a clinic wishes to
obtain information regarding the mode
of transportation used by patients to
visit the clinic,
then a survey may be conducted among
patients to obtain this information.
4- Experiments.
Frequently the data needed to answer
a question are available only as the
result of an experiment. For example:
If a nurse wishes to know which of several
strategies is best for maximizing patient
compliance,
she might conduct an experiment in
which the different strategies of
motivating compliance
are tried with different patients.
* A variable:
It is a characteristic that takes on
different values in different persons,
places, or things. For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Types of variables
Quantitative Variables
It can be measured in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.

Qualitative Variables
Many characteristics are not capable of being measured.
Some of them can be ordered or ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.

Types of quantitative variables

Discrete
A discrete variable is characterized by gaps or interruptions in the values that it can assume.
For example:
- The number of daily admissions to a general hospital,
- The number of decayed, missing or filled teeth per child in an elementary school.

Continuous
A continuous variable can assume any value within a specified relevant interval of values assumed by the variable.
For example:
- Height,
- weight,
- skull circumference.
No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.

* A population:
It is the largest collection of values
of a random variable for which we
have an interest at a particular
time. For example:
The weights of all the children
enrolled in a certain elementary
school.
Populations may be finite or infinite.
* A sample:
It is a part of a population. For example:
The weights of only a fraction
of these children.

Strategies for understanding the meanings of Data
(Pages 19 – 27)
Key words: frequency table, bar chart, range, width of interval, mid-interval, histogram, polygon

Descriptive Statistics
Frequency Distribution for Discrete Random Variables
Example: Suppose that we take a
sample of size 16 from
children in a primary school
and get the following data
about the number of their
decayed teeth,
3,5,2,4,0,1,3,5,2,3,2,3,3,2,4,1
To construct a frequency
table:
1- Order the values from the
smallest to the largest.
0,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5
2- Count how many numbers are the same.

No. of decayed teeth    Frequency    Relative Frequency
0                       1            0.0625
1                       2            0.125
2                       4            0.25
3                       5            0.3125
4                       2            0.125
5                       2            0.125
Total                   16           1

Representing the simple frequency table using the bar chart
We can represent the above simple frequency table using the bar chart.
[Bar chart: frequency (vertical axis) against number of decayed teeth (horizontal axis)]
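The two steps above (ordering the values, then counting repeats) can be sketched in a few lines of Python. This is only an illustrative sketch using the 16 observations from the example; the variable names `teeth`, `counts` and `table` are assumptions made for the illustration.

```python
from collections import Counter

# The 16 observations from the decayed-teeth example.
teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]

n = len(teeth)            # sample size
counts = Counter(teeth)   # frequency of each distinct value

# One row per distinct value, in increasing order:
# (value, frequency, relative frequency = frequency / n)
table = [(value, counts[value], counts[value] / n) for value in sorted(counts)]

for value, freq, rel_freq in table:
    print(f"{value:>5} {freq:>10} {rel_freq:>10.4f}")
print(f"Total {n:>10} {sum(rf for _, _, rf in table):>10.4f}")
```

The relative frequencies add up to 1, which is a quick check that no observation was missed.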
2.3 Frequency Distribution for Continuous Random Variables
For large samples, we can’t use the simple frequency table to
represent the data.
We need to divide the data into groups or intervals or
classes.
So, we need to determine:
1- The number of intervals (k).
Too few intervals are not good because information will be
lost.
Too many intervals are not helpful to summarize the data.
A commonly followed rule is that 6 ≤ k ≤ 15,
or the following formula may be used,
k = 1 + 3.322 (log n), where log is the base-10 logarithm.
2- The range (R).
It is the difference between the
largest and the smallest observation
in the data set.
3- The Width of the interval (w).
Class intervals generally should be of
the same width. Thus, if we want k
intervals, then w is chosen such that
w ≥ R / k.
Example:
Assume that the number of observations
equals 100, then
k = 1 + 3.322 (log 100)
= 1 + 3.322 (2) = 7.6 ≈ 8.
Assume that the smallest value = 5 and the
largest one of the data = 61, then
R = 61 – 5 = 56 and
w = 56 / 8 = 7.
To make the summarization more comprehensible, the class width may be 5
or 10 or a multiple of 10.
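The calculation of k, R and w can be sketched as follows. This is a minimal illustration, assuming that k is rounded up to the next whole number (the example above rounds 7.6 to 8); the function names `number_of_intervals` and `range_and_width` are made up for this sketch.

```python
import math

def number_of_intervals(n):
    """k = 1 + 3.322 * log10(n), rounded up (rounding up is an assumption)."""
    return math.ceil(1 + 3.322 * math.log10(n))

def range_and_width(smallest, largest, k):
    """Return the range R = largest - smallest and the minimum width w = R / k."""
    r = largest - smallest
    return r, r / k

# Values from the example: n = 100, smallest = 5, largest = 61.
k = number_of_intervals(100)
r, w = range_and_width(5, 61, k)
print(k, r, w)
```

In practice the computed w is only a lower bound; a rounder width such as 5 or 10 is usually chosen instead.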
Example 2.3.1
We wish to know how many class intervals to have
in the frequency distribution of the data in Table
1.4.1 (Pages 9-10) of ages of 189 subjects who
participated in a study on smoking cessation.
Solution:
Since the number of observations
equals 189, then
k = 1 + 3.322 (log 189)
= 1 + 3.322 (2.276) = 8.56 ≈ 9,
R = 82 – 30 = 52 and
w = 52 / 9 = 5.778
It is better to let w = 10, then the intervals
will be in the form:
Class interval    Frequency
30 – 39           11
40 – 49           46
50 – 59           70
60 – 69           45
70 – 79           16
80 – 89           1
Total             189    (sum of frequencies = sample size = n)
The Cumulative Frequency:
It can be computed by adding successive frequencies.
The Cumulative Relative Frequency:
It can be computed by adding successive relative frequencies.
The Mid-interval:
It can be computed by adding the lower bound of the interval to its upper bound and then dividing by 2.

For the above example, the following table represents the cumulative frequency, the relative frequency, the cumulative relative frequency and the mid-interval (R.f = freq / n).

Class       Mid-        Frequency   Cumulative   Relative          Cumulative
interval    interval    (f)         Frequency    Frequency (R.f)   Relative Frequency
30 – 39     34.5        11          11           0.0582            0.0582
40 – 49     44.5        46          57           0.2434            -
50 – 59     54.5        -           127          -                 0.6720
60 – 69     -           45          -            0.2381            0.9101
70 – 79     74.5        16          188          0.0847            0.9948
80 – 89     84.5        1           189          0.0053            1
Total                   189                      1

Example: From the above frequency table, complete the
table then answer the following questions:
1- The number of subjects with age less than 50 years?
2- The number of subjects with age between 40-69 years?
3- Relative frequency of subjects with age between 70-79 years?
4- Relative frequency of subjects with age more than 69 years?
5- The percentage of subjects with age between 40-49 years?
6- The percentage of subjects with age less than 60 years?
7- The range (R)?
8- Number of intervals (k)?
9- The width of the interval (w)?
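One way to check a completed table is to compute every column from the class intervals and frequencies. The sketch below assumes the six intervals and frequencies listed for the smoking-cessation ages; the variable names are illustrative only. It divides each cumulative frequency by n instead of adding rounded relative frequencies, so the last digit can differ slightly from a table built by adding rounded values (for example 0.9947 rather than 0.9948).

```python
from itertools import accumulate

# Class intervals and frequencies from the grouped frequency table.
intervals = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freq = [11, 46, 70, 45, 16, 1]

n = sum(freq)                                    # sample size
mid = [(lo + hi) / 2 for lo, hi in intervals]    # mid-interval
cum_freq = list(accumulate(freq))                # cumulative frequency
rel_freq = [f / n for f in freq]                 # relative frequency
cum_rel_freq = [c / n for c in cum_freq]         # cumulative relative frequency

for (lo, hi), m, f, cf, rf, crf in zip(intervals, mid, freq, cum_freq,
                                       rel_freq, cum_rel_freq):
    print(f"{lo}-{hi}  {m:5.1f}  {f:3d}  {cf:3d}  {rf:.4f}  {crf:.4f}")
```

Questions such as "the number of subjects with age less than 50" can then be read directly from `cum_freq`, and percentages are the relative frequencies multiplied by 100.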
Representing the grouped frequency table using the histogram
To draw the histogram, the true class limits should be used.
They can be computed by subtracting 0.5 from the lower
limit and adding 0.5 to the upper limit for each interval.

True class limits    Frequency
29.5 – < 39.5        11
39.5 – < 49.5        46
49.5 – < 59.5        70
59.5 – < 69.5        45
69.5 – < 79.5        16
79.5 – < 89.5        1
Total                189

Representing the grouped frequency table using the Polygon

Exercises
Pages: 31 – 34
Questions: 2.3.2(a), 2.3.5(a)
H.W.: 2.3.6, 2.3.7(a)

Section (2.4):
Descriptive Statistics
Measures of Central
Tendency
Page 38 - 41
Key words: Descriptive Statistic, measure of central tendency, statistic, parameter, mean (μ), median, mode.

The Statistic and The Parameter
• A Statistic:
It is a descriptive measure computed from the
data of a sample.
• A Parameter:
It is a descriptive measure computed from the
data of a population.
Since it is difficult to measure a parameter from the
population, a sample of size n is drawn, whose
values are x1, x2, …, xn. From these data, we compute
the statistic.

Measures of Central Tendency
A measure of central tendency is a measure which indicates
where the middle of the data is.
The three most commonly used measures of central
tendency are: The Mean, the Median, and the
Mode.
The Mean:
It is the average of the data.
The Population Mean:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
which is usually unknown, then we use the sample mean to estimate or approximate it.
The Sample Mean:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example:
Here is a random sample of size 10 of ages, where x1 = 42, x2 = 28, x3 = 28, x4 = 61, x5 = 31, x6 = 23, x7 = 50, x8 = 34, x9 = 32, x10 = 37.
$\bar{x}$ = (42 + 28 + … + 37) / 10 = 36.6
Properties of the Mean:
• Uniqueness. For a given set of data there is one and only one mean.
• Simplicity. It is easy to understand and to compute.
• Affected by extreme values, since all values enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121 and 126. The mean = 118.
But assume that the values are 75, 75, 80, 80 and 280. The mean = 118, a value that is not representative of the set of data as a whole.
The Median:
When the data are ordered, it is the observation that divides the
set of observations into two equal parts, such that half of
the data are before it and the other half are after it.
* If n is odd, the median will be the middle observation. It
will be the (n+1)/2 th ordered observation.
When n = 11, then the median is the 6th observation.
* If n is even, there are two middle observations. The median
will be the mean of these two middle observations. It will
be the (n+1)/2 th ordered observation.
When n = 12, then the median is the 6.5th observation, which
is an observation halfway between the 6th and 7th ordered
observations.
Example:
For the same random sample, the ordered
observations will be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5th
observation, i.e. (32+34)/2 = 33.
Properties of the Median:
• Uniqueness. For a given set of data there is one and only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as is the mean.
The Mode:
It is the value which occurs most frequently.
If all values are different there is no mode.
Sometimes, there is more than one mode.
Example:
For the same random sample, the value 28 is
repeated two times, so it is the mode.
Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative data.
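The three measures can be checked for the sample of 10 ages with Python's standard `statistics` module. This is a small sketch, not part of the textbook; `multimode` is used because it returns every most-frequent value, which matches the remark that the mode is not always unique.

```python
import statistics

# The random sample of 10 ages used in the examples above.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

mean = statistics.mean(ages)         # sum of the values divided by n
median = statistics.median(ages)     # n is even: mean of the two middle ordered values
modes = statistics.multimode(ages)   # list of all most-frequent values

print(mean, median, modes)

# Effect of an extreme value: the mean moves, the median does not.
skewed = [75, 75, 80, 80, 280]
print(statistics.mean(skewed), statistics.median(skewed))
```

For the second data set the mean is 118 while the median is 80, which illustrates why the median is preferred when extreme values are present.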
Section (2.5):
Descriptive Statistics
Measures of Dispersion
Page 43 - 46
Key words: Descriptive Statistic, measure of dispersion, range, variance, coefficient of variation.

2.5. Descriptive Statistics – Measures of Dispersion:
• A measure of dispersion conveys information
regarding the amount of variability present in a set of
data.
Note:
1. If all the values are the same
→ There is no dispersion.
2. If all the values are different
→ There is dispersion:
a) If the values are close to each other
→ The amount of dispersion is small.
b) If the values are widely scattered
→ The dispersion is greater.
Ex. Figure 2.5.1 – Page 43
• Measures of Dispersion are:
1. Range (R).
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).
1. The Range (R):
• Range = Largest value − Smallest value = x_L − x_S
Note:
The range depends on only two values.
Example 2.5.1 Page 40:
Refer to Ex 2.4.2, Page 37
Data:
43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
Find the range.
Range = 66 − 38 = 28
2. The Variance:
• It measures dispersion relative to the scatter of the values
about their mean.
a) Sample Variance (S^2):
• $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean.
Example 2.5.2 Page 40:
Refer to Ex 2.4.2, Page 37
Find the sample variance of ages, $\bar{x}$ = 56
Solution:
S^2 = [(43-56)^2 + (66-56)^2 + … + (50-56)^2] / 9
= 810/9 = 90
b) Population Variance (σ^2):
• $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where μ is the population mean.
3. The Standard Deviation:
• It is the square root of the variance.
a) Sample Standard Deviation = S = $\sqrt{S^2}$
b) Population Standard Deviation = σ = $\sqrt{\sigma^2}$
4. The Coefficient of Variation (C.V):
• It is a measure used to compare the
dispersion in two sets of data, and it is
independent of the unit of measurement.
• $C.V = \frac{S}{\bar{x}} (100)$, where S is the sample standard
deviation and $\bar{x}$ is the sample mean.
Example 2.5.3 Page 46:
• Suppose two samples of human males yield
the following data:
                      Sample 1        Sample 2
Age                   25-year-olds    11-year-olds
Mean weight           145 pounds      80 pounds
Standard deviation    10 pounds       10 pounds
• We wish to know which is more variable.
• Solution:
• C.V (Sample 1) = (10/145)*100 = 6.9
• C.V (Sample 2) = (10/80)*100 = 12.5
• Then the weights of the 11-year-olds (Sample 2) show more
variation.
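The four measures of dispersion can be reproduced with a short sketch. It assumes the 10 ages from Example 2.5.1 and the summary figures from Example 2.5.3; the helper name `coefficient_of_variation` is made up for this illustration. Note that `statistics.variance` divides by n − 1 (sample variance), while `statistics.pvariance` divides by N (population variance).

```python
import statistics

# Ages from Example 2.5.1.
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

data_range = max(ages) - min(ages)        # largest value minus smallest value
sample_var = statistics.variance(ages)    # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(ages)        # square root of the sample variance

def coefficient_of_variation(sd, mean):
    """C.V = (S / mean) * 100, a unit-free measure of relative dispersion."""
    return sd / mean * 100

# Summary figures from Example 2.5.3 (weights in pounds).
cv_sample1 = coefficient_of_variation(10, 145)
cv_sample2 = coefficient_of_variation(10, 80)

print(data_range, sample_var, round(sample_sd, 2))
print(round(cv_sample1, 1), round(cv_sample2, 1))
```

Both samples have the same standard deviation, yet the coefficient of variation is larger for Sample 2 because its mean is smaller.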
Exercises
Pages: 52 – 53
Questions: 2.5.1, 2.5.2, 2.5.3
H.W.: 2.5.4, 2.5.5, 2.5.6, 2.5.14
* Also you can solve in the review questions page 57:
Q: 12, 13, 14, 15, 16, 19

Chapter 3
Probability
The Basis of the
Statistical inference
Key words: Probability, objective Probability, subjective Probability, equally likely, mutually exclusive, multiplicative rule, conditional Probability, independent events, Bayes theorem

3.1 Introduction
The concept of probability is frequently encountered in everyday
communication. For example, a physician may say that a
patient has a 50-50 chance of surviving a certain operation.
Another physician may say that she is 95 percent certain that a
patient has a particular disease.
Most people express probabilities in terms of percentages.
But, it is more convenient to express probabilities as fractions.
Thus, we may measure the probability of the occurrence of
some event by a number between 0 and 1.
The more likely the event, the closer the number is to one. An
event that can't occur has a probability of zero, and an event
that is certain to occur has a probability of one.
3.2 Two views of Probability: objective and subjective
*** Objective Probability
** Classical and Relative
Some definitions:
1.Equally likely outcomes:
Are the outcomes that have the same
chance of occurring.
2.Mutually exclusive:
Two events are said to be mutually exclusive
if they cannot occur simultaneously, such that A ∩ B = Φ.
The universal set (S): The set of all
possible outcomes.
The empty set Φ: Contains no elements.
The event E: a set of outcomes in S
which has a certain characteristic.
Classical Probability : If an event can
occur in N mutually exclusive and equally
likely ways, and if m of these possess a
trait, E, the probability of the occurrence of
event E is equal to m/ N .
For Example: in the rolling of the die ,
each of the six sides is equally likely to be
observed . So, the probability that a 4 will
be observed is equal to 1/6.
Relative Frequency Probability:
Def: If some process is repeated a large
number of times, n, and if some resulting
event E occurs m times , the relative
frequency of occurrence of E , m/n will be
approximately equal to probability of E .
P(E) = m/n .
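The relative frequency definition can be illustrated by simulation. The sketch below is only an illustration: it assumes a fair six-sided die simulated with Python's `random` module, and the function name `relative_frequency`, the number of trials and the seed are arbitrary choices. With a large number of trials, m/n should come out close to the classical value 1/6.

```python
import random

def relative_frequency(event, trials, seed=0):
    """Roll a simulated fair die `trials` times and return m / n,
    where m is the number of rolls for which `event(face)` is true."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    m = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return m / trials

# Event E: "a 4 is observed".
estimate = relative_frequency(lambda face: face == 4, 100_000)
print(estimate, 1 / 6)
```

Repeating the run with a small number of trials (say 20) usually gives an estimate further from 1/6, which is why the definition requires a large n.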
*** Subjective Probability :
Probability measures the confidence that a
particular individual has in the truth of a
particular proposition.
For Example : the probability that a cure
for cancer will be discovered within the
next 10 years.
3.3 Elementary Properties of Probability:
Given some process (or experiment)
with n mutually exclusive events E1,
E2, E3, …, En, then
1- P(Ei) ≥ 0, i = 1, 2, 3, …, n
2- P(E1) + P(E2) + … + P(En) = 1
3- P(Ei + Ej) = P(Ei) + P(Ej),
Ei, Ej are mutually exclusive
Rules of Probability
1- Addition Rule
P(A U B)= P(A) + P(B) – P (A∩B )
2- If A and B are mutually exclusive
(disjoint) ,then
P (A∩B ) = 0
Then , addition rule is
P(A ∪ B) = P(A) + P(B).
3- Complementary Rule
P(A' )= 1 – P(A)
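The three rules can be verified on a small example. The sketch below assumes one roll of a fair die with equally likely outcomes, so that P(E) = m/N as in the classical definition; the events A, B and C are arbitrary choices made for the illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Universal set S for one roll of a fair die (equally likely outcomes).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: number of outcomes in the event divided by N."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # an even number is observed
B = {4, 5, 6}   # a number greater than 3 is observed
C = {1, 3}      # mutually exclusive with A

# 1- Addition rule: P(A U B) = P(A) + P(B) - P(A n B)
print(prob(A | B), prob(A) + prob(B) - prob(A & B))

# 2- Mutually exclusive events: P(A n C) = 0, so P(A U C) = P(A) + P(C)
print(prob(A & C), prob(A | C), prob(A) + prob(C))

# 3- Complementary rule: P(A') = 1 - P(A)
print(prob(S - A), 1 - prob(A))
```

Each printed line shows the same value computed in two ways, once directly from the event and once from the rule.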
w...


- Summer '16
- Dr. Myte
- Normal Distribution, Notes, Probability theory