Probability Practice Problems
1. Hyperlipidemia in children has been hypothesized to be related to high cholesterol in their
parents. The following data were collected on parents and children.
CHILD
Not Hyperlipidemic
Hyperlipidemic
Both Parents
Sigma notation
Sigma notation is a method used to write out a long sum in a concise way. In this unit we look
at ways of using sigma notation, and establish some useful rules.
In order to master the techniques explained here it is vital that you undertake
Measures of Central Tendency
Numerical Methods
It is often desirable to describe some particular
characteristic of a data set numerically. Perhaps
the most familiar measure of
2 x 2 Table in Excel
We can use Excel to create 2 x 2 tables to
calculate probabilities.
Use Insert PivotTable
Measures of Variability
It is important to be able to quantify the degree of
spread or scatter in a data set. Measures of this
sort are referred to as
Foundations of Biostatistics
What is Statistics?
Statistics is a discipline concerned with
1. the organization and summarization of data, and
2. the drawing of inferences about
Scales of Measurement
Data
When a characteristic is measured and
recorded, the result is termed data.
If a series of blood pressures is recorded on a
Statistical Notation
Σ    sum
μ    population mean
x̄    sample mean
p    population proportion
p̂    sample proportion
1-
N    population size
n    sample size
σ    population standard deviation
s    sample standard deviation
α    alpha, significance level
df   degrees of freedom
Z    Z score Appendix A
Summation
Summation Notation
Summation notation is the notation used to
show exactly how data are to be summed.
x = variable
c = constant
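The rules for summing a variable x and a constant c can be checked numerically. The sketch below is illustrative only: the data values and the constant are hypothetical, not taken from the source. It shows that a constant factors out of a sum, that summing a constant n times gives n times c, and that the sum of squares is generally not the square of the sum.

```python
# Hypothetical sample values; not taken from the source.
x = [2, 4, 6, 8]
c = 3  # a constant

n = len(x)
sum_x = sum(x)                        # sum of x_i
sum_cx = sum(c * xi for xi in x)      # sum of c * x_i
sum_c = sum(c for _ in x)             # sum of the constant over n terms
sum_x_sq = sum(xi ** 2 for xi in x)   # sum of x_i squared
sq_sum_x = sum_x ** 2                 # square of the sum of x_i

print(sum_cx == c * sum_x)   # a constant factors out of the sum
print(sum_c == n * c)        # summing a constant n times gives n * c
print(sum_x_sq, sq_sum_x)    # these two quantities differ in general
```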
Count - Smoker   Smoker
Sex              No    Yes   Total Result
F                6     2     8
M                3     4     7
Total Result     9     6     15
Count of Smoker  Column Labels
Row Labels       No    Yes   Grand Total
F                6     2     8
M                3     4     7
Grand Total      9     6     15
Probability of being Male   0.466667
Probability of Smoking      0.4
Probability of S
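The probabilities read off the 2 x 2 table can be reproduced outside Excel. The following is a minimal sketch using the cell counts from the pivot table above; the dictionary layout and variable names are assumptions made for illustration. The conditional probability at the end is an extra illustration and is not one of the values shown in the source.

```python
# Cell counts from the 2 x 2 table: (sex, smoker) -> count.
counts = {
    ("F", "No"): 6,
    ("F", "Yes"): 2,
    ("M", "No"): 3,
    ("M", "Yes"): 4,
}

total = sum(counts.values())  # grand total

# Marginal totals: add the cells in a row or in a column.
n_male = sum(v for (sex, _), v in counts.items() if sex == "M")
n_smoker = sum(v for (_, smoker), v in counts.items() if smoker == "Yes")

p_male = n_male / total       # probability of being male
p_smoker = n_smoker / total   # probability of smoking

# Extra illustration (not in the source): probability of smoking given male.
p_smoker_given_male = counts[("M", "Yes")] / n_male

print(round(p_male, 6), round(p_smoker, 6), round(p_smoker_given_male, 6))
```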
Distributions and Graphs
Descriptive Statistics
Goal: take data from a sample and present it in
a concise, understandable way.
Simplest way to present data is the frequency
Scales of Measurement Practice Problems
For each of the following measures, indicate the scale of measurement (nominal, ordinal,
interval, ratio) and whether the measure is discrete or continuous
Measure
Gender (male, female)
Reaction time measured in sec
Measures of Distributional Shape
Measures of Distributional Shape
Certain aspects of distribution shapes can be
characterized numerically.
The two most common of these, skew
Confidence Intervals for
Inference
The two basic forms of inference are
hypothesis testing and confidence intervals.
Hypothesis tests pose the question as to
1. A national survey of 40,000 adults in the US in 1992 found that 26% of adults were current smokers. A
similar survey of 37,000 adults in the US in 1966 found that 43% of adults were current smokers. Test at
a 5% level of significance whether the propor
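One way to approach a problem of this kind is a two-sample z test for proportions. The sketch below assumes a two-sided test with a pooled standard error, which is an assumption because the problem statement is cut off; only the sample sizes, percentages, and the 5% level come from the text.

```python
from math import sqrt
from statistics import NormalDist

# Survey figures from the problem statement.
n_1992, p_1992 = 40_000, 0.26
n_1966, p_1966 = 37_000, 0.43
alpha = 0.05

# Pooled proportion under the null hypothesis of equal proportions.
pooled = (p_1992 * n_1992 + p_1966 * n_1966) / (n_1992 + n_1966)

# Standard error of the difference using the pooled proportion.
se = sqrt(pooled * (1 - pooled) * (1 / n_1992 + 1 / n_1966))

z = (p_1992 - p_1966) / se

# Two-sided p-value from the standard normal distribution (assumed form of the test).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < alpha

print(z, p_value, reject_null)
```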
A study is conducted in patients with HIV. The primary outcome is CD4 cell count, which is the measure
of the stage of the disease. A regression analysis is performed relating CD4 count to the duration of HIV
in years. The computer output appears below: F
Using the data below, generate a 95% confidence interval for the difference in
proportions of women delivering preterm in the experimental and standard drug
treatment groups.
                    Preterm Delivery
                    Yes   No    Total
Experimental Drug   17    83    100
Standard Drug       23    77    10
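A 95% confidence interval for a difference in proportions can be sketched as follows. This assumes the usual large-sample (normal approximation) interval with an unpooled standard error, which may differ from the method the course intends. Group sizes are computed from the Yes and No counts in the table.

```python
from math import sqrt
from statistics import NormalDist

# Yes / No counts of preterm delivery from the table.
exp_yes, exp_no = 17, 83
std_yes, std_no = 23, 77

# Group sizes are derived from the counts rather than typed in.
n_exp = exp_yes + exp_no
n_std = std_yes + std_no

p_exp = exp_yes / n_exp
p_std = std_yes / n_std
diff = p_exp - p_std  # experimental minus standard

# Unpooled standard error of the difference (large-sample assumption).
se = sqrt(p_exp * (1 - p_exp) / n_exp + p_std * (1 - p_std) / n_std)

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence
lower = diff - z_crit * se
upper = diff + z_crit * se

print(round(lower, 4), round(upper, 4))
```

Because the interval contains zero, this sketch on its own would not indicate a statistically significant difference at the 5% level.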
A group of HIV positive subjects, all of whom are being treated medically in a similar fashion, are
randomly assigned to one of three groups. Two groups are given two different diet supplements while
the remaining group is given an inert similar appearing
A study is run comparing HDL cholesterol levels between men who exercise regularly and those who do
not. The data are shown below.
Regular Exercise   N     Mean   Std Dev
Yes                35    48.5   12.5
No                 120   56.9   11.9
Generate a 95% confidence interval for the difference in m
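A confidence interval for a difference between two independent means can be sketched from the summary statistics above. This version assumes a pooled standard deviation and a normal critical value (both samples have n of at least 30); a course that uses the t distribution would get a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table (HDL cholesterol).
n_yes, mean_yes, sd_yes = 35, 48.5, 12.5    # regular exercise
n_no, mean_no, sd_no = 120, 56.9, 11.9      # no regular exercise

diff = mean_yes - mean_no  # exercisers minus non-exercisers

# Pooled standard deviation (assumes similar variances in the two groups).
pooled_var = ((n_yes - 1) * sd_yes ** 2 + (n_no - 1) * sd_no ** 2) / (n_yes + n_no - 2)
sp = sqrt(pooled_var)

se = sp * sqrt(1 / n_yes + 1 / n_no)

z_crit = NormalDist().inv_cdf(0.975)  # normal critical value, an assumption
lower = diff - z_crit * se
upper = diff + z_crit * se

print(round(lower, 2), round(upper, 2))
```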
A researcher wants to be 95% certain of the proportion of full time university students who have a part
time job in excess of 20 hours per week. Assume all assumptions have been met for using the procedure.
What statistical procedure would most likely be
Chi Square Practice Problems with Answers
1. The following data were collected in a clinical trial evaluating a new compound designed
to improve wound healing in trauma patients. The new compound was compared against
a placebo. After treatment for 5 days w
Confidence Interval for the
Difference Between Means
CI for the Difference Between Means
The independent samples t test attempts to
determine whether there is a difference
Assignment 3: One and Two Sample Methods
(75 points)
For this project, we will use a subset of the North Carolina birth data set. The data set ncbirth200.sav is a random sample
of 200 births from the data set ncbirth1450.sav. When doing this assignment, m
A clinical trial is run to investigate the effectiveness of an experimental drug in reducing preterm delivery
to a drug considered standard care and to placebo. Pregnant women are enrolled and randomly assigned
to receive either the experimental drug, the