Biostatistics (Based on www.statsoft.com)

Chapter I: Elementary concepts and descriptive statistics

What are variables. Variables are things that we measure, control, or manipulate in
research. They differ in many respects, most notably in the role they are given in our
research and in the type of measures that can be applied to them.

Correlational vs. experimental research. Most empirical research belongs clearly to
one of those two general categories. In correlational research we do not (or at least try not
to) inﬂuence any variables but only measure them and look for relations (correlations)
between some set of variables, such as blood pressure and cholesterol level. In
experimental research, we manipulate some variables and then measure the effects of this
manipulation on other variables; for example, a researcher might artiﬁcially increase
blood pressure and then record cholesterol level. Data analysis in experimental research
also comes down to calculating "correlations" between variables, speciﬁcally, those
manipulated and those affected by the manipulation. However, experimental data may
potentially provide qualitatively better information: Only experimental data can
conclusively demonstrate causal relations between variables. For example, if we found
that whenever we change variable A then variable B changes, then we can conclude that
"A inﬂuences B." Data from correlational research can only be "interpreted" in causal
terms based on some theories that we have, but correlational data cannot conclusively
prove causality.

Dependent vs. independent variables. Independent variables are those that are
manipulated whereas dependent variables are only measured or registered. This
distinction appears terminologically confusing to many because, as some students say,
"all variables depend on something." However, once you get used to this distinction, it
becomes indispensable. The terms dependent and independent variable apply mostly to
experimental research where some variables are manipulated, and in this sense they are
"independent" from the initial reaction patterns, features, intentions, etc. of the subjects.
Some other variables are expected to be "dependent" on the manipulation or experimental
conditions. That is to say, they depend on "what the subject will do" in response.
Somewhat contrary to the nature of this distinction, these terms are also used in studies
where we do not literally manipulate independent variables, but only assign subjects to
"experimental groups" based on some preexisting properties of the subjects. For
example, if in an experiment, males are compared with females regarding their white cell
count (WCC), Gender could be called the independent variable and WCC the dependent
variable.

Measurement scales. Variables differ in "how well" they can be measured, i.e., in how
much measurable information their measurement scale can provide. There is obviously
some measurement error involved in every measurement, which determines the "amount of information" that we can obtain. Another factor that determines the amount of
information that can be provided by a variable is its "type of measurement scale."
Specifically, variables are classified as (a) nominal, (b) ordinal, (c) interval, or (d) ratio.

a. Nominal variables allow for only qualitative classification. That is, they can be
measured only in terms of whether the individual items belong to some
distinctively different categories, but we cannot quantify or even rank order those
categories. For example, all we can say is that 2 individuals are different in terms
of variable A (e.g., they are of different race), but we cannot say which one "has
more" of the quality represented by the variable. Typical examples of nominal
variables are gender, race, color, city, etc.

b. Ordinal variables allow us to rank order the items we measure in terms of which
has less and which has more of the quality represented by the variable, but still
they do not allow us to say "how much more." A typical example of an ordinal
variable is the socioeconomic status of families. For example, we know that
upper-middle is higher than middle, but we cannot say that it is, for example, 18%
higher. Also, this very distinction between nominal, ordinal, and interval scales
itself represents a good example of an ordinal variable. For example, we can say
that nominal measurement provides less information than ordinal measurement,
but we cannot say "how much less" or how this difference compares to the
difference between ordinal and interval scales.

c. Interval variables allow us not only to rank order the items that are measured, but
also to quantify and compare the sizes of differences between them. For example,
temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval
scale. We can say that a temperature of 40 degrees is higher than a temperature of
30 degrees, and that an increase from 20 to 40 degrees is twice as much as an
increase from 30 to 40 degrees.

d. Ratio variables are very similar to interval variables; in addition to all the
properties of interval variables, they feature an identiﬁable absolute zero point,
thus they allow for statements such as x is two times more than y. Typical
examples of ratio scales are measures of time or space. For example, as the Kelvin
temperature scale is a ratio scale, not only can we say that a temperature of 200
degrees is higher than one of 100 degrees, we can correctly state that it is twice as
high. Interval scales do not have the ratio property. Most statistical data analysis
procedures do not distinguish between the interval and ratio properties of the
measurement scales.

"True" Mean and Confidence Interval. Probably the most often used descriptive
statistic is the mean. The mean is a particularly informative measure of the "central
tendency" of the variable if it is reported along with its conﬁdence intervals. As
mentioned earlier, usually we are interested in statistics (such as the mean) from our
sample only to the extent to which they allow us to infer information about the population. The
conﬁdence intervals for the mean give us a range of values around the mean where we
expect the "true" (population) mean is located (with a given level of certainty). For
example, if the mean in your sample is 23, and the lower and upper limits of the p=.05
conﬁdence interval are 19 and 27 respectively, then you can conclude that there is a 95%
probability (explained below) that the population mean is greater than 19 and lower than
27. If you set the p-level to a smaller value, then the interval would become wider, thereby
increasing the "certainty" of the estimate, and vice versa; as we all know from the
weather forecast, the more "vague" the prediction (i.e., wider the conﬁdence interval), the more likely it will materialize. Note that the width of the conﬁdence interval depends on
the sample size and on the variation of data values. The larger the sample size, the more
reliable its mean. The larger the variation, the less reliable the mean. The calculation of
conﬁdence intervals is based on the assumption that the variable is normally distributed
(explained below) in the population. The estimate may not be valid if this assumption is
not met, unless the sample size is large, say n=100 or more.

Shape of the Distribution, Normality. An important aspect of the "description" of a
variable is the shape of its distribution, which tells you the frequency of values from
different ranges of the variable. Typically, a researcher is interested in how well the
distribution can be approximated by the normal distribution (see below for an example of
this distribution). Simple descriptive statistics can provide some information relevant to
this issue. For example, if the skewness (which measures the deviation of the distribution
from symmetry) is clearly different from 0, then that distribution is asymmetrical, while
normal distributions are perfectly symmetrical. If the kurtosis (which measures
"peakedness" of the distribution) is clearly different from 0, then the distribution is either
flatter or more peaked than normal; the kurtosis of the normal distribution is 0.

More precise information can be obtained by performing one of the tests of normality to
determine the probability that the sample came from a normally distributed population of
observations (e.g., the so-called Kolmogorov-Smirnov test, or the Shapiro-Wilk W test).
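As an illustration, both tests can be run with SciPy. This is only a sketch: the data below are simulated rather than taken from the text, and SciPy is assumed to be installed.

```python
# Illustrative sketch (not from the original text): running the two
# normality tests named above on simulated data, using SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=100, scale=15, size=200)  # invented, roughly normal data

# Shapiro-Wilk test: a small p-value suggests departure from normality.
w_stat, w_p = stats.shapiro(sample)

# Kolmogorov-Smirnov test against a normal distribution with the sample's
# own mean and standard deviation (an approximation: strictly, estimating
# the parameters from the data changes the test's null distribution).
ks_stat, ks_p = stats.kstest(sample, "norm",
                             args=(sample.mean(), sample.std(ddof=1)))

print(f"Shapiro-Wilk:       W = {w_stat:.3f}, p = {w_p:.3f}")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")
```

For data that really do come from a normal population, as here, both p-values will usually be large, so neither test rejects normality.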
However, none of these tests can entirely substitute for a visual examination of the data
using a histogram (i.e., a graph that shows the frequency distribution of a variable).

[Figure: histogram of the variable with the normal curve superimposed]

The graph allows you to evaluate the normality of the empirical distribution because it
also shows the normal curve superimposed over the histogram. It also allows you to examine various aspects of the distribution qualitatively. For example, the distribution
could be bimodal (have 2 peaks). This might suggest that the sample is not homogeneous
but possibly its elements came from two different populations, each more or less
normally distributed. In such cases, in order to understand the nature of the variable in
question, you should look for a way to quantitatively identify the two sub-samples.

Why the "Normal distribution" is important. The distribution of many test statistics is
normal or follows some form that can be derived from the normal distribution. In this
sense, philosophically speaking, the Normal distribution represents one of the empirically
veriﬁed elementary "truths about the general nature of reality," and its status can be
compared to the one of fundamental laws of natural sciences. The exact shape of the
normal distribution (the characteristic "bell curve") is deﬁned by a function which has
only two parameters: mean and standard deviation.

What is "statistical significance" (p-value). The statistical significance of a result is the
probability that the observed relationship (e.g., between variables) or a difference (e.g.,
between means) in a sample occurred by pure chance ("luck of the draw"), and that in the
population from which the sample was drawn, no such relationship or differences exist.
Using less technical terms, one could say that the statistical signiﬁcance of a result tells
us something about the degree to which the result is "true" (in the sense of being
"representative of the population"). More technically, the p-value represents
a decreasing index of the reliability of a result. The higher the p-value, the less we can
believe that the observed relation between variables in the sample is a reliable indicator
of the relation between the respective variables in the population. Speciﬁcally, the p
value represents the probability of error that is involved in accepting our observed result
as valid, that is, as "representative of the population." For example, a p-value of .05
(i.e., 1/20) indicates that there is a 5% probability that the relation between the variables
found in our sample is a "fluke." In other words, assuming that in the population there
was no relation between those variables whatsoever, and we were repeating experiments
like ours one after another, we could expect that in approximately one of every 20
replications of the experiment the relation between the variables in
question would be equal to or stronger than in ours. (Note that this is not the same as saying
that, given that there IS a relationship between the variables, we can expect to replicate
the results 5% of the time or 95% of the time; when there is a relationship between the
variables in the population, the probability of replicating the study and ﬁnding that
relationship is related to the statistical power of the design.) In many areas of research, the
p-value of .05 is customarily treated as a "borderline acceptable" error level.

Statistical Power. The probability of rejecting a false statistical null hypothesis.

Hypothesis Testing. Suppose that a politician is interested in showing that more
than half of the people support her position. Her question, in statistical terms: "Is π > .50?"
Being an optimist, she believes that it is.
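With invented numbers, her question can be answered by a one-sided test of a proportion. The sketch below assumes a hypothetical poll (540 supporters out of 1000 respondents) and uses a normal approximation to the binomial; an exact binomial test would serve equally well.

```python
# Sketch of a one-sided test of a proportion with hypothetical counts,
# using a normal approximation to the binomial (standard library only).
import math

k, n = 540, 1000   # hypothetical poll: 540 of 1000 respondents support her
p0 = 0.50          # value of the proportion under the null hypothesis

p_hat = k / n                                # observed proportion
se = math.sqrt(p0 * (1 - p0) / n)            # standard error under the null
z = (p_hat - p0) / se                        # z test statistic
p_value = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided (upper-tail) p-value

print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, one-sided p = {p_value:.4f}")
```

For these invented counts the p-value comes out well below .05, so the hypothesis of no majority would be rejected, supporting π > .50.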
In statistics, the following strategy is quite common. State as a "statistical null
hypothesis" something that is the logical opposite of what you believe. Call this hypothesis H0. Gather data. Then, using statistical theory, show from the data that it is
likely H0 is false, and should be rejected.

By rejecting H0, you support what you actually believe. This kind of situation, which is
typical in many fields of research, is called "Reject-Support testing" (RS
testing) because rejecting the null hypothesis supports the experimenter's theory.

The null hypothesis is either true or false, and the statistical decision process is set up so
that there are no "ties." The null hypothesis is either rejected or not rejected.
Consequently, before undertaking the experiment, we can be certain that only 4 possible
things can happen. These are summarized in the table below:

                               State of the World
                               H0 True               H1 True
  Decision     Accept H0       Correct Acceptance    Type II Error
               Reject H0       Type I Error          Correct Rejection

Note that there are two kinds of errors represented in the table. Many statistics textbooks
present a point of view that is common in the social sciences, i.e., that α, the Type I
error rate, must be kept at or below .05, and that, if at all possible, β, the Type II error
rate, must be kept low as well. "Statistical power," which is equal to 1 − β, must be kept
correspondingly high. Ideally, power should be at least .80 to detect a reasonable
departure from the null hypothesis.

How to determine that a result is "really" significant. There is no way to avoid
arbitrariness in the ﬁnal decision as to what level of signiﬁcance will be treated as really
"significant." That is, the selection of the significance level beyond which results
will be dismissed as nonsignificant is arbitrary. In practice, the final decision usually depends on
whether the outcome was predicted a priori or only found post hoc in the course of many
analyses and comparisons performed on the data set, on the total amount of consistent
supportive evidence in the entire data set, and on "traditions" existing in the particular
area of research. Typically, in many sciences, results that yield p ≤ .05 are considered
borderline statistically significant, but remember that this level of significance still
involves a pretty high probability of error (5%). Results that are significant at the p ≤ .01
level are commonly considered statistically significant, and p ≤ .005 or p ≤ .001 levels
are often called "highly" significant. But remember that those classifications represent
are often called "highly" signiﬁcant. But remember that those classiﬁcations represent
nothing else but arbitrary conventions that are only informally based on general research
experience.
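The "luck of the draw" idea behind the p-value can be checked by simulation: when the null hypothesis is true and many experiments are run at the .05 level, about 1 in 20 will come out "significant" by chance alone. The sketch below uses invented settings (4000 simulated experiments, two groups of 30 drawn from the same normal population with known standard deviation, tested with a two-sided z test).

```python
# Simulation of the Type I error rate under a true null hypothesis,
# using only the standard library. All settings are illustrative.
import math
import random

random.seed(1)
n_experiments = 4000
n_per_group = 30
false_alarms = 0

for _ in range(n_experiments):
    # Two groups drawn from the SAME population (mean 0, sd 1):
    # there is no real difference to detect.
    a = [random.gauss(0, 1) for _ in range(n_per_group)]
    b = [random.gauss(0, 1) for _ in range(n_per_group)]
    diff = sum(a) / n_per_group - sum(b) / n_per_group
    se = math.sqrt(1 / n_per_group + 1 / n_per_group)  # sd known to be 1
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    if p <= 0.05:
        false_alarms += 1

rate = false_alarms / n_experiments
print(f"Fraction significant at .05 under a true null: {rate:.3f}")
```

The printed fraction comes out close to .05, which is exactly the error level that the .05 convention agrees to tolerate.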