API202 A
Spring 2009
TUTORIAL FOR STATA
This tutorial will help you prepare for Part 2 of Assignment 1, and also for using Stata throughout this
course. You do not need to submit any output from the tutorial.
Please note that this tutorial is a complement to, and not a substitute for, one of the Stata sessions offered
by the teaching fellows. If you have not signed up for a Stata session in the lab, do so as soon as possible!
Hands-on practice is the best way to learn Stata.
Start up Stata (double-click the Stata icon) and follow the steps below. By following these steps,
you will learn how to run some useful commands in Stata, as well as how to produce a log file showing
your commands and output.
1. Preliminaries
First, load the data. The command is “use <filename>”. Stata datasets have the extension “.dta”, but you
don’t have to type the extension. At this point you do not need to worry about how we get data from
Excel to Stata since we will provide data in Stata format for you.
You need to tell Stata which directory your files are in. For the purposes of this exercise, we will assume
that you will use "m:\api202" as your directory and that you have saved the data set gender2009.dta in
this directory.
Type the following command on the command line and then press enter.
use "m:\api202\gender2009.dta"
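An alternative to typing the full path is to change Stata's working directory first and then load the file by name, which also shows that the “.dta” extension can be omitted. The sketch below assumes the same m:\api202 directory as above; the clear option is an addition that this tutorial does not use, and it discards any data already in memory.

```stata
* Make m:\api202 the working directory (assumed location from the text)
cd "m:\api202"

* Confirm which directory Stata is now using
pwd

* Load gender2009.dta; the .dta extension is optional.
* The clear option (an addition here) drops any dataset already in memory.
use gender2009, clear
```

Either form loads the same dataset; setting the working directory once simply saves retyping the path in later commands.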
Unlike Excel, Stata does not display the individual data values when you open a dataset. In Stata, you
think about the data as variables and observations, not as individual cells.
To see a list of variables in the data set, type “describe”.
describe
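Contains data from gender2009.dta
  obs:           950
 vars:             6                          26 Jan 2009 21:18
 size:        15,200 (98.5% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
age             byte   %9.0g                  age in years
salary          float  %9.0f                  yearly salary
hours           byte   %9.0g                  usual weekly hours worked
weeks           byte   %9.0g                  weeks worked last year
educ            byte   %9.0g                  years of education
gender          float  %9.0g       mf         gender, =1 if male, =0 if female
-------------------------------------------------------------------------------
Sorted by: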
The dataset has six variables, named age, salary, hours, weeks, educ, and gender. The other columns
describe the formatting of the variables. Don’t worry about those for now.
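If you do want to look at individual observations, or at the description of only some variables, a minimal sketch follows. It assumes gender2009.dta is already loaded; the choice of variables and the range of five observations are only for illustration.

```stata
* Describe only the selected variables
describe salary gender

* Print the first five observations for three variables in the Results window
list age salary gender in 1/5

* Open the spreadsheet-style Data Browser (a read-only view of the data)
browse
```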
2. Descriptive Statistics
To get a summary of the data, we use the “summarize <varname>” command.
summarize salary
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      salary |       950    23834.72    21200.48         30     169999
Note that we can abbreviate “summarize” as “sum” (and in general can abbreviate most commands by
their first few letters).
sum salary
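The summarize command also accepts several variables at once, an if condition, and a detail option. The lines below are a sketch of these variations, assuming gender2009.dta is loaded; none of them appears in the tutorial text above, and the variable choices are illustrative.

```stata
* Summarize several variables at once
summarize salary hours weeks

* With no variable list, summarize reports on every variable in the dataset
summarize

* Restrict to men, using the coding gender = 1 if male
summarize salary if gender == 1

* The detail option adds percentiles (including the median), skewness, and kurtosis
sum salary, detail
```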
This note was uploaded on 04/12/2009 for the course HKS API202A taught by Professor Levy during the Spring '09 term at Harvard.