ISYE 2028 A and B
Spring 2009
Lecture 1
Dr. Kobi Abayomi
January 8, 2009
1 Introduction: What is Data
In statistics we worry about what there is to observe (what we expect to see) and what we have
actually observed (what we do in fact see). Data are the quantitative characterizations of what
we see, often based upon what we expect (or often even desire to see). Data are our
observations: observed and quantified. We call the population the set of all possible
observations. We call a sample the observations we see at a glance, inspection or study. A
statistic is any quantity we derive from the observed data, i.e. any quantity we can generate
from the sample. A parameter is any quantity we cannot observe about the data; one that is
specific to the population. We often seek to estimate parameters using samples from the
population.[1]
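A minimal sketch of these terms in Python. The population of 1000 shoe sizes, the seed, and the sample size of 50 are all made-up assumptions for illustration, not part of the lecture. The population mean plays the role of a parameter; the sample mean is a statistic used to estimate it.

```python
import random
import statistics

random.seed(2028)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population: the set of all possible observations
# (invented shoe sizes between 5 and 14).
population = [random.randint(5, 14) for _ in range(1000)]

# Parameter: a quantity specific to the population. In practice we usually
# cannot compute it, because we never observe the whole population.
population_mean = statistics.mean(population)

# Sample: the observations we actually see.
sample = random.sample(population, 50)

# Statistic: a quantity derived from the sample, used here as an estimate
# of the parameter.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

Drawing a different sample gives a different value of the statistic, while the parameter stays fixed.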
2 Classifying and Observing Data
We often call the individual objects described by a set of data units of observation or cases.
A variable is an object that holds information about the same characteristic for many cases. A
data table is an arrangement of data; the convention is to let rows represent cases and columns
represent variables.
Here is an example of a possible data table.
ID   Name     ShoeSize  TestScore  Classlevel  FashionLevel
1    Kobi     11        95         Graduate    Low
2    Djleroy  11        100        PhD.        High
3    Ronald   22        50.15      PreK        Very Low
...  ...      ...       ...        ...         ...
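The rows-as-cases, columns-as-variables convention can be sketched in Python as a list of dictionaries. Only the three rows shown in the table are used; the helper name `column` is an assumption introduced for illustration.

```python
# Each dict is one case (a row); each key is one variable (a column).
data_table = [
    {"ID": 1, "Name": "Kobi", "ShoeSize": 11, "TestScore": 95,
     "Classlevel": "Graduate", "FashionLevel": "Low"},
    {"ID": 2, "Name": "Djleroy", "ShoeSize": 11, "TestScore": 100,
     "Classlevel": "PhD.", "FashionLevel": "High"},
    {"ID": 3, "Name": "Ronald", "ShoeSize": 22, "TestScore": 50.15,
     "Classlevel": "PreK", "FashionLevel": "Very Low"},
]

def column(table, variable):
    """Return the values of one variable across all cases."""
    return [case[variable] for case in table]

shoe_sizes = column(data_table, "ShoeSize")
print(shoe_sizes)                    # [11, 11, 22]
print(data_table[1]["FashionLevel"]) # High
```

Reading across one dict gives everything known about a single case; reading one key down the list gives a single variable.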
[1] These are my heuristic definitions.
We classify variables as either quantitative, where the numbers act as numerical values, or
categorical, where the values are either words or numerals that are treated as nonnumeric. For
the above example, which variables are which?
Quantitative data can be either discrete, in that we can list all the possible values, or
continuous, in that we cannot. A more particular distinction is to say a variable is discrete
if it can take countably many values and that a variable is continuous otherwise. The
distinction is often apparent in use.
Quantitative variables in which the order (in the greater than, less than sense) and distance
between data can be determined are called interval variables. Percent scores are examples of
interval variables.
Quantitative variables in which the order of data points can be determined, but not the
distance, are called ordinal. Examples are letter grades.
Categorical variables which are determined by categories that cannot be ordered, such as gender
and color, are called nominal.
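The difference between interval, ordinal and nominal variables can be sketched in Python. The percent scores are the TestScore values from the example data table; the letter-grade scale, the helper name `rank`, and the list of colors are assumptions made for illustration.

```python
# Interval: both order and distance are meaningful.
percent_scores = [95, 100, 50.15]
print(percent_scores[1] > percent_scores[0])  # True: order
print(percent_scores[1] - percent_scores[0])  # 5: a meaningful distance

# Ordinal: order only. Assumed grade scale, listed from worst to best.
GRADE_ORDER = ["F", "D", "C", "B", "A"]

def rank(grade):
    """Position of a letter grade in the assumed ordering."""
    return GRADE_ORDER.index(grade)

print(rank("A") > rank("C"))  # True: an A is better than a C
# rank("A") - rank("C") can be computed, but it is not a real distance:
# nothing says the gap from C to B equals the gap from B to A.

# Nominal: categories with no order; only equality checks make sense.
colors = ["red", "blue", "red"]
print(colors[0] == colors[2])  # True
```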
In math notation a data table is a multivariate vector, i.e. a matrix, x with dimension n × k,
or (n, k): n rows and k columns. The observations are the rows, so n is the number of
observations; the number of measurement types is k, the number of columns. We select the ith
observation of the jth variable with element x_{i,j} from x.
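A short sketch of this notation in Python, using a list of rows as the matrix x. Restricting x to the ShoeSize and TestScore columns of the example table is an assumption, as is the helper name `element`; note that the math notation counts from 1 while Python counts from 0.

```python
# Numeric part of the example table: one row per case,
# columns are (ShoeSize, TestScore).
x = [
    [11, 95],
    [11, 100],
    [22, 50.15],
]

n = len(x)     # number of observations (rows)
k = len(x[0])  # number of variables (columns)

def element(matrix, i, j):
    """Return x_{i,j} using the 1-based indices of the math notation."""
    return matrix[i - 1][j - 1]

print((n, k))            # (3, 2)
print(element(x, 3, 2))  # 50.15: third observation, second variable
```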
In