Stata Walkthrough: Exploring and Describing Data
This is an optional assignment, intended to get you accustomed to working with data
in Stata.
To do this assignment, you will need to go to my webpage and download the
database of SAT scores by U.S. state.
1. Open the database using the menus (or open it directly from your computer).
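If you would rather type the command than use the menus, opening the file might look like the sketch below. The file name sat.dta is an assumption for illustration; use the name and folder of the file you actually downloaded.

```stata
* Assumed file name; replace with the path to the file you downloaded
* The clear option drops any data already in memory before loading
use "sat.dta", clear
```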
2. Type “describe” to look at the types of variables and their labels. float
variables hold continuous (decimal) values; int and byte variables hold
integers; str variables hold text. You will also see the labels attached to
each variable, which should explain what they measure.
3. Type “list” to view all of the observations in your database. For each
state, you will see its name, poverty rate, percentage of students taking the
SAT, scores on the test, and per-student spending.
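If the full listing is too long to read comfortably, list can be restricted to a range of observations or to a few variables. This is a minimal sketch; the variable names spend and sat are the ones used later in this walkthrough and may be abbreviations of the full names in your file.

```stata
* Show only the first 10 observations, all variables
list in 1/10

* Show two variables for the first 10 observations
list spend sat in 1/10
```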
4. Type “summarize” to get some important summary statistics for each variable
(the number of observations, mean, standard deviation, minimum, and maximum).
Note that the state name has zero numerical values, because it is text rather
than numbers, so Stata does not give you summary statistics for this variable.
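You can also ask for summary statistics on selected variables only, and check which variables are stored as text. The sketch below assumes the variable names spend and sat that appear later in this walkthrough.

```stata
* Summary statistics for selected variables only
summarize spend sat

* List every variable stored as text (these get no summary statistics)
ds, has(type string)
```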
5. Type “summ spend, detail” to get detailed summary statistics for this
variable (summ is an abbreviation of summarize). In the “Percentiles” column,
you can find the lower and upper quartiles (25th percentile is $6409; 75th
percentile is $8700.50) as well as the median (50th percentile is $7080.50).
The mean, standard deviation, and variance are also shown.
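After summarize runs, Stata keeps the numbers it just computed as stored results, which saves retyping them. The sketch below shows one way to retrieve the quartiles and compute the interquartile range; it assumes the data are already loaded and that spend is the spending variable.

```stata
summarize spend, detail

* Show everything the last summarize command stored
return list

* Quartiles and median
display r(p25)
display r(p50)
display r(p75)

* Interquartile range: upper quartile minus lower quartile
display r(p75) - r(p25)
```

With the quartiles quoted above, the last line would work out to 8700.50 - 6409 = 2291.50.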
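6. You can use Stata as a calculator by typing “display” followed by some
expression. (The commands for basic operations are standard: +, -, *, /, and ^
for exponents. Stata understands parentheses and follows the standard order of
operations, “Please Excuse My Dear Aunt Sally”: parentheses, exponents,
multiplication and division, addition and subtraction.) To confirm that the
standard deviation is the square root of the variance, type
“display (1205623)^(0.5)” and compare the result with the standard deviation
shown in the previous step.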
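The same check can be sketched with the sqrt() function and with the results stored by summarize, so that no number has to be retyped. This assumes the data are loaded and that 1205623 is the variance of spend reported in step 5.

```stata
* Square root of the variance typed by hand (should be close to 1098)
display sqrt(1205623)

* Same check using stored results from summarize
quietly summarize spend, detail
display sqrt(r(Var))
display r(sd)
```

The last two lines should print the same value, since r(sd) is the square root of r(Var).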
7. You might be curious whether spending is correlated with some other
variables. Type “corr spend pover percent sat” to obtain a table of all the
correlation coefficients between the variables. In the first column, you can
tell that there is a moderate negative correlation between spending and
poverty rates (states with higher poverty rates tend to spend less per student),
a moderate positive correlation between spending and the percentage who
take the test (states that spend more tend to have more students who take the