EDUC 771: Applied Multivariate Analysis
What is multivariate analysis and how is it different from univariate analysis?
There are two ways that multivariate analysis is conceptualized: an analysis that
considers several variables being measured for each person/object, or more strictly
speaking, the analysis of data that contain more than one dependent variable for a person.
Let’s consider the difference.
In the end of EDUC656 we started to examine multiple regression. In multiple
regression, we had one dependent variable per person: say, score on a test. That was
predicted from a variety of variables: SES, parental involvement, IQ, etc. This situation
considers many
independent
variables, but only one
dependent
variable.
Alternatively, we could have investigated the relationship of many dependent variables.
Say, we have score on SAT-M and score on SAT-V. Each of these could have its own set
of predictors, although some might be the same. For SAT-Q we have: GPA in math
classes, # foreign languages learned, etc. For SAT-V we have GPA in English courses, #
hours read to as a child, etc. in this case, both SAT-M and SAT-V are dependent
variables, each with a set of independent variables. Further, there is dependence among
the two scores: they are related to IQ (if it exists), SES, parental involvement, and unique
factors about each individual. Therefore, while two separate univariate analyses could be
performed for each dependent variable, a multivariate approach can take into account the
relationship between the two dependent variables.
We started our sequence of statistics in EDUC 555 and EDUC656 and considered several
aspects to understand data better through particular displays of data. We made graphs,
histograms, frequency distributions to try to get a sense of the data. We then started
computing summary statistics such as the sample mean and variance, which lead us to
being able to compare means from different samples, be they dependent or independent,
and then to predict values from a number of variables in regression.
So, we start multivariate analysis with a similar format. Given the nature of multivariate
data, and the complexity, certain organizational tools are necessary to organize the data.
The tools we will use come from a branch of mathematics called linear algebra.
We start with some definitions.
Matrix:
A matrix is a collection of numbers ordered by rows and columns. For example:
C=
6 9 3
7 5 4
⎛
⎝
⎜
⎞
⎠
⎟