1/4/11

PADP 8120: Data Analysis and Statistical Modeling
Introduction
Spring 2011
Angela Fertig, Ph.D.

Greetings
  Name
  Ph.D. or M.P.A. student
  What is your dream job?

3 hour survival plan
  Snacks sign-up sheet
  During break, we'll go outside to eat/drink
  Only 1-1.5 hours of lecture
    This means you have to read before class.
  Last half of class will involve "practice":
    We are going to learn Stata
    Work with data
    Work some problems

Course Mechanics
  No prerequisites: no statistics background expected
  Required text: Agresti & Finlay
  Grading: can work in groups, turn in separate work
    2 exams: in-class midterm, take-home final (30% each)
    Almost weekly homework sets (10%)
    1 group presentation (10%): explain a published empirical paper's results (group grading)
    1 paper (20%): Data/Methods/Results section of your own original research
  Office hours: Fri 10-noon, or by appt
  Website: http://hogwarts.spia.uga.edu/~afertig/statistics.html

Course Overview
  Introduction to the use of quantitative data in social science to better understand the world
  Goal is two-fold:
    You will be able to understand and criticize the research of others
    You will be able to do your own research
  Any questions about the course before we dive in?

What is Statistics?
  Methods for:
    Designing and conducting empirical research studies
    Describing collected data
    Making decisions/inferences about phenomena represented by data

Key Terms 1
  Population: the total set of individuals of interest in a study
  Sample: a subset of the population that is actually observed

Key Terms 2
  Descriptive statistics: graphical and numerical techniques for summarizing the information in a collection of data
  Inferential statistics: procedures for making generalizations about characteristics of a population based on information from a sample

Key Terms 3
  Parameters: characteristics of the population about which we make inferences using sample data (the "truth")
  Statistics: corresponding characteristics of the sample data upon which we base our inferences about parameters (estimates of the "truth")

Types of Data
  Random variables are measures of a characteristic of a subject (something or someone) that varies across subjects in a population
  Types of variables:
    Quantitative: numbers represent magnitudes; can be discrete or continuous
    Categorical with a nominal scale: values represent categories with no ordering
    Categorical with an ordinal scale: ordered categories

How we collect data, or Sampling
  Often when we see numbers used, they relate not to a population but to a sample of that population
  For a statistic from a sample to be useful, the sample has to be "representative" of the population
  An intuitively obvious way of doing this is to pick people at random

Why sample?
  Cost: the US 2010 Census cost $7 billion
  Speed: it takes years to process the census
  Impossibility: if you want to estimate the quality of a year's crop, you can't eat it all, or there would be none left to sell

Types of sampling 1
  Simple random sampling: choose n subjects from a population of N such that each subject has an equal probability of being selected
  PROS
    Ensures that sample statistics will reflect population parameters
    Good in experimental designs too
      Would like to randomize who gets "treatment" (not let people choose) to ensure that the groups are not initially different
  CONS
    May not include enough of a particularly interesting group (e.g. minorities)
    Difficult to get a list of the entire population to sample from
    Not all members of your chosen sample may respond

Types of sampling 2
  Stratified random sampling: classify the population into groups, then select by simple random sampling within groups
  PROS
    Ensures that sample statistics for each group will reflect group population parameters
  CONS
    Still difficult, and subjects may still not respond
    Need to deal with a complex sampling strategy if you want a parameter representative of the whole population

Types of sampling 3
  Cluster random sampling: if population members are naturally clustered, then select clusters with simple random sampling, and then select subjects with simple random sampling within clusters
  PROS
    Don't need a list of the population
  CONS
    Subjects may still not respond
    Need to deal with a complex sampling strategy if you want a parameter representative of the whole population

Sampling Error
  Sampling error occurs when we use a statistic based on a sample to predict the value of a population parameter
  There is always some error because we are not using the entire population
  It goes down as the sample gets larger and so closer to the population

Potential biases
  Sampling bias: when samples are drawn with non-probability sampling methods (e.g. volunteer sampling), or when the entire population is not sampled (e.g. cellphone-only households)
  Response bias: when subjects' responses are incorrect, or influenced by the questionnaire/interview (e.g. social norms, question order)
  Nonresponse bias: when subjects refuse to participate or do not answer all of the questions

...
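
The three sampling designs described in the slides (simple random, stratified random, and cluster random sampling) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the course itself uses Stata, the example is written in Python with the standard library, and the population, its "group" and "cluster" labels, and all function names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 1,000 subjects, each assigned to one of 20 clusters
# (think neighborhoods) and to one of two groups (strata). Made up for illustration.
population = [
    {"id": i, "cluster": i % 20, "group": "minority" if i % 10 == 0 else "majority"}
    for i in range(1000)
]

def simple_random_sample(pop, n):
    """Choose n subjects so that every subject has the same chance of selection."""
    return random.sample(pop, n)

def stratified_sample(pop, n_per_group):
    """Classify the population into groups, then draw a simple random sample within each group."""
    groups = {}
    for subject in pop:
        groups.setdefault(subject["group"], []).append(subject)
    sample = []
    for members in groups.values():
        sample.extend(random.sample(members, n_per_group))
    return sample

def cluster_sample(pop, n_clusters, n_per_cluster):
    """Randomly select clusters, then randomly select subjects within each chosen cluster."""
    clusters = {}
    for subject in pop:
        clusters.setdefault(subject["cluster"], []).append(subject)
    chosen = random.sample(sorted(clusters), n_clusters)
    sample = []
    for c in chosen:
        sample.extend(random.sample(clusters[c], n_per_cluster))
    return sample

srs = simple_random_sample(population, 50)
strat = stratified_sample(population, 25)
clus = cluster_sample(population, 5, 10)

for name, s in [("simple random", srs), ("stratified", strat), ("cluster", clus)]:
    minority = sum(1 for subject in s if subject["group"] == "minority")
    print(f"{name}: {len(s)} subjects, {minority} from the minority group")
```

The stratified draw always contains 25 minority subjects by construction, whereas the simple random sample may contain very few. That is the "may not include enough of a particularly interesting group" drawback listed under simple random sampling. Note also that the cluster design only needs a list of clusters up front, not of every subject.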
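
The claim that sampling error goes down as the sample gets larger can be checked with a small simulation. The sketch below is a hedged illustration, not part of the course materials: it uses Python rather than Stata, the population of 10,000 values is simulated, and `average_abs_error` is a hypothetical helper name.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 values (e.g. incomes in arbitrary units).
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter (the "truth")

def average_abs_error(n, draws=200):
    """Average |sample mean - population mean| over repeated simple random samples of size n."""
    errors = []
    for _ in range(draws):
        sample = random.sample(population, n)
        # The sample mean is the statistic (an estimate of the "truth").
        errors.append(abs(statistics.mean(sample) - mu))
    return statistics.mean(errors)

errors_by_n = {n: average_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors_by_n.items():
    print(f"n = {n:>5}: average sampling error = {err:.3f}")
```

Each tenfold increase in sample size should shrink the average error noticeably, though it never reaches zero unless the whole population is observed.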
This note was uploaded on 01/18/2012 for the course PADP 8120 taught by Professor Fertig during the Summer '11 term at UGA.