Tarter, M. E. (2008). Data transformations. In S. Boslaugh (Ed.), Encyclopedia of Epidemiology (pp. 249–254). Thousand Oaks, CA: Sage Publications.
Data transformations modify measured values systematically.
For example, suppose the heart-rate (HR) ratio variate, x = (HR Work – HR Rest)/(HR Predicted Maximum – HR Rest), is transformed to the new variate arcsin(√x).
In terms of the match between, on the one hand, the statistical methodology applied to study arcsin(√x) and, on the other, the assumptions that underlie this methodology, a variate like arcsin(√x) is often a preferred transform of a variate like x.
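The heart-rate example can be sketched in a few lines of Python. This is a minimal illustration, assuming the transform takes the arcsine square-root form; the function names and the readings (in beats per minute) are hypothetical and are not taken from the source.

```python
import math

def hr_ratio(hr_work, hr_rest, hr_pred_max):
    """Heart-rate ratio: (HR Work - HR Rest) / (HR Predicted Maximum - HR Rest)."""
    return (hr_work - hr_rest) / (hr_pred_max - hr_rest)

def arcsin_sqrt(x):
    """Arcsine square-root transform (assumed form); requires 0 <= x <= 1."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return math.asin(math.sqrt(x))

# Hypothetical readings, chosen only for illustration.
x = hr_ratio(hr_work=120, hr_rest=70, hr_pred_max=170)
y = arcsin_sqrt(x)
print(x, y)
```

The range check reflects the fact that the transform is defined only for ratios between 0 and 1, so a working heart rate below the resting rate or above the predicted maximum is rejected rather than silently transformed.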
In modern statistical usage, transformations often help preprocess raw data before a general-purpose software package is applied.
Were the steps from data input to a display or printing device’s output compared to a journey by car through a city, a transformation like the arcsine transform would play the role of an access road to the software package’s freeway on-ramp.
Software validity or, loosely speaking, journey
safety, depends on underlying assumptions.
Hence data transformations can be classified
on the basis of types of assumptions.
These include a measured variate’s Normality, model linearity, and/or variate homoscedasticity, i.e., equal standard deviations.
In addition, some useful transformations are not designed to
preprocess measurements individually.
Instead, once an estimator or test statistic has
been computed using raw measurements, these transformations can help enhance the
Normality of the estimator or test statistic.
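The idea of transforming toward equal standard deviations can be illustrated with a small simulation. The sketch below is hypothetical and not drawn from the source: it assumes the estimator is a sample proportion from binomial data and applies the arcsine square-root transform to it. The sample size, the probabilities and all names are chosen only for illustration.

```python
import math
import random
import statistics

def simulate_proportions(p, n, reps, rng):
    """Return `reps` sample proportions, each computed from n Bernoulli(p) trials."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is reproducible
n, reps = 100, 2000
results = []
for p in (0.1, 0.3, 0.5):
    raw = simulate_proportions(p, n, reps, rng)
    transformed = [math.asin(math.sqrt(v)) for v in raw]
    results.append((p, statistics.stdev(raw), statistics.stdev(transformed)))

for p, raw_sd, transformed_sd in results:
    print(f"p={p}: raw SD={raw_sd:.4f}, transformed SD={transformed_sd:.4f}")
```

Under these assumptions the raw standard deviations vary with p, while the transformed ones should all lie near 1/(2√n), which is the sense in which such a transformation promotes homoscedasticity.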
Transformations and Simulated Data
Transformations are not applied only to measured values; a transformation procedure is also usually one of the steps implemented to simulate artificial data values.
For example, using a pair of uniformly distributed random numbers as input, a Box-Muller transformation (BMT) generates a pair of independent standard Normal variates, in other words, Normal variates with zero expectation and unit variance.
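A minimal sketch of the Box-Muller transformation follows, using the standard trigonometric form. The function name, the seed and the sample size are assumptions made for illustration only.

```python
import math
import random

def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1], u2 in [0, 1)) to a pair of
    independent standard Normal variates."""
    r = math.sqrt(-2.0 * math.log(u1))   # radius
    theta = 2.0 * math.pi * u2           # angle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(42)  # fixed seed so the run is reproducible
sample = []
for _ in range(5000):
    # random() returns values in [0, 1); using 1 - random() avoids log(0).
    z0, z1 = box_muller(1.0 - rng.random(), rng.random())
    sample.extend((z0, z1))

mean = sum(sample) / len(sample)
variance = sum((z - mean) ** 2 for z in sample) / len(sample)
print(mean, variance)
```

With a sample of this size, the printed mean and variance should be close to 0 and 1 respectively, consistent with zero expectation and unit variance.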
To answer the two questions, (1) Why does the BMT have so many applications? and (2) How are transformation components assembled?, it is helpful to introduce some notation.
The two Greek letters, φ and Φ, represent the standard Normal density function, i.e., curve, and cumulative distribution function (cdf), respectively.
In the same way that sin⁻¹ often designates the arcsin function, Φ⁻¹ designates the inverse of Φ.
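The notation can be made concrete with Python's standard library. In this sketch the density is taken to be φ, the cdf Φ and its inverse Φ⁻¹; the wrapper names and the value of z are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults: mean 0, standard deviation 1

def phi_density(z):
    """Standard Normal density function (the curve)."""
    return std_normal.pdf(z)

def Phi_cdf(z):
    """Standard Normal cumulative distribution function."""
    return std_normal.cdf(z)

def Phi_inverse(p):
    """Inverse of the standard Normal cdf; requires 0 < p < 1."""
    return std_normal.inv_cdf(p)

z = 1.25
p = Phi_cdf(z)
z_back = Phi_inverse(p)  # applying the inverse undoes the cdf
print(phi_density(z), p, z_back)
```

Applying the cdf and then its inverse returns the starting value up to floating-point error, which is what the inverse notation expresses.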
The three symbols that form Φ⁻¹ (which in older statistical and epidemiological texts is often called the probit) provide a useful notational device, because transformation and other data analysis steps tend to be taken in the reverse of the order in which data simulation process components are implemented.
For instance, no data analysis text discusses a scale parameter σ before discussing a location parameter μ