Lab Assignment #2
Due: September 27th, 2013
Measures of Central Tendency
This assignment asks you to review the GSS datasets and describe the data using
measures of central tendency. Please read and carefully follow the direction
Homework #1 ANSWERS
1. Identify the level of measurement most appropriate for the following variables:
a. Yearly income (High, Medium, Low)
b. The speed that a vehicle is moving in miles per hour
c. Yearly income (in dollars)
d.
Example 4:
We want to know if the weight of adolescent girls in a population has changed over time
Population mean weight of girls 30 years ago was 60 kilos
We have no firm expectations about the direction of change in girls' weight
Null hypothesis: μ = 60
Key Concepts
Description and inference
Description: summarizing data to describe subjects or objects of interest.
Inference: drawing conclusions regarding the relationships between attributes of
interest and making predictions based on data.
Informatio
Describing the distribution of interval (continuous) variables
Frequency distributions
Missing values
Values well out of the range are used to represent missing values
Shows up on frequency distributions at the very bottom
Frequency
Percent
Usually
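A minimal Python sketch of a frequency distribution that treats an out-of-range code as missing and lists it at the bottom. The responses and the missing code 99 are hypothetical and are not taken from the GSS data.

```python
from collections import Counter

# Hypothetical responses; 99 is an assumed out-of-range "missing" code.
responses = [1, 2, 2, 3, 99, 1, 2, 99, 3, 2]
MISSING_CODES = {99}

def frequency_table(values, missing_codes):
    """Return (value, frequency, percent, valid_percent) rows, missing codes last."""
    counts = Counter(values)
    total = len(values)
    valid_total = sum(n for v, n in counts.items() if v not in missing_codes)
    rows = []
    # Sort valid values first, then missing codes, so missing shows at the bottom.
    for v in sorted(counts, key=lambda v: (v in missing_codes, v)):
        n = counts[v]
        percent = 100 * n / total
        valid_percent = None if v in missing_codes else 100 * n / valid_total
        rows.append((v, n, percent, valid_percent))
    return rows

table = frequency_table(responses, MISSING_CODES)
for row in table:
    print(row)
```

The valid percent is computed over non-missing cases only, which is why it differs from the plain percent whenever missing codes are present.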
Point estimate
Central limit theorem tells us that the sample mean is an unbiased estimator of the
population mean
Sampling distribution centred on the population parameter
Sample mean is also an efficient estimator of μ
The variance of the sampling d
Can express how close y(bar) is to μ in probabilistic terms
Sampling distribution of all y(bar) for all samples of size n
Sampling distribution is normal
Will never know for sure if μ is in the interval
However, you can still know a lot
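The idea that any single interval may or may not contain μ, while the procedure is right a known share of the time, can be sketched with a small simulation. The population values (mean 60, standard deviation 10), the sample size, and the multiplier 1.96 are assumptions chosen for illustration.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the population mean: y_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return y_bar - z * se, y_bar + z * se

# Simulation with a known (hypothetical) population mean, so coverage can be checked.
random.seed(0)
MU, SIGMA, N, TRIALS = 60.0, 10.0, 100, 1000
covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    low, high = mean_confidence_interval(sample)
    covered += low <= MU <= high  # True counts as 1

coverage = covered / TRIALS
print(f"share of intervals containing mu: {coverage:.3f}")
```

In real research μ is unknown, so only one interval is computed. The simulation shows why the stated confidence level is still meaningful.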
Confidence Intervals
Probability
Definition
With a random sample or randomized experiment, the probability that an observation has
a particular outcome is the proportion of times that outcome would occur in a very long
sequence
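The long-run definition can be illustrated with a short simulation. This is a sketch under assumed values: the outcome probability 0.5 and the trial counts are hypothetical.

```python
import random

def relative_frequency(trials, p_success=0.5, seed=1):
    """Proportion of simulated observations that show the outcome."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p_success for _ in range(trials))
    return hits / trials

# The proportion settles near the true probability as the sequence gets longer.
for n in (10, 1000, 100000):
    print(n, relative_frequency(n))
```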
Let P(A) denote the probability of a possible ou
Z-score = (y - μ)/σ
Transform into standard normal distribution
Mean = 0
Standard deviation = 1
$$f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$$
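A minimal sketch of the z-score transformation and the standard normal density in Python. The example values (y = 70, μ = 60, σ = 5) are hypothetical.

```python
import math

def z_score(y, mu, sigma):
    """Number of standard deviations that y falls from the mean."""
    return (y - mu) / sigma

def standard_normal_pdf(z):
    """Density of the standard normal distribution (mean 0, standard deviation 1)."""
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

z = z_score(70, mu=60, sigma=5)
print(z, standard_normal_pdf(z))
```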
Z TABLE
Gives the area in the tail or tails of the distribution for selected values of z
Z-table in the back of the Agresti text gives the
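The tail areas that a z table lists can also be computed directly. The sketch below uses the complementary error function from Python's standard library; it is an alternative to the printed table, not a reproduction of it.

```python
import math

def right_tail(z):
    """Area to the right of z under the standard normal curve, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tail(z):
    """Area in both tails beyond -|z| and +|z|."""
    return 2 * right_tail(abs(z))

for z in (1.0, 1.645, 1.96, 2.58):
    print(f"z = {z}: right tail = {right_tail(z):.4f}, two tails = {two_tail(z):.4f}")
```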
Homework #2 ANSWERS
1. STATA problem: Use the GSS 2002 data set (not the subsample of 40) and the variable
spend the evening with friends [short name: socfrend]. Be sure to attach the STATA
output.
a. What is the full question a
Homework #3 ANSWERS
1. A researcher suspects a link between age and support for abortion rights, and obtains the
following data:
Favors abortion rights    Age
yes                       young
no                        old
yes                       young
no                        old
no                        young
yes                       old
yes                       young
no                        old
yes
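As a sketch, the listed observations can be cross-tabulated in Python. Only the eight complete pairs are used; the ninth row is cut off in the source and is left out, so the counts below are not the full data set.

```python
from collections import Counter

# The eight complete (favors abortion rights, age) pairs listed above.
# The final row is truncated in the source and is deliberately omitted.
observations = [
    ("yes", "young"), ("no", "old"), ("yes", "young"), ("no", "old"),
    ("no", "young"), ("yes", "old"), ("yes", "young"), ("no", "old"),
]

counts = Counter(observations)

# Print a 2 x 2 table with age in the rows and opinion in the columns.
print(f"{'':8}{'yes':>5}{'no':>5}")
for age in ("young", "old"):
    print(f"{age:8}{counts[('yes', age)]:>5}{counts[('no', age)]:>5}")
```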
Odds and odds ratios
Low birth weight and smoking
In the situations where we are interested in describing the prevalence of some condition in a
population or the chances of coming down with a condition, we often use odds.
Odds are the frequency of being
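A short sketch of how odds and an odds ratio are computed from counts. The counts are hypothetical and are not the low birth weight data referred to above; the standard definition of odds (cases with the condition divided by cases without it) is assumed.

```python
def odds(with_condition, without_condition):
    """Odds of the condition: cases with it divided by cases without it."""
    return with_condition / without_condition

def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table: (a / b) / (c / d)."""
    return odds(a, b) / odds(c, d)

# Hypothetical counts: (low birth weight, normal birth weight) for each group.
smokers = (30, 70)
nonsmokers = (15, 85)

print("odds, smokers:", odds(*smokers))
print("odds, nonsmokers:", odds(*nonsmokers))
print("odds ratio:", odds_ratio(*smokers, *nonsmokers))
```

An odds ratio above 1 means the odds of the condition are higher in the first group than in the second.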
The criterion function for obtaining point estimates a and b for the population parameters α and β
is based on the residual e
The residuals give the vertical distances between the fitted line and the actual y values
The least squares estimators are those val
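A minimal sketch of least squares estimation, assuming the usual criterion of minimizing the sum of squared residuals. The x and y values are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b), the intercept and slope minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the line a + b*x and the y values."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
print("a =", a, "b =", b)
print("SSE at the estimates:", sum_squared_residuals(a, b, xs, ys))
print("SSE with a slightly different slope:", sum_squared_residuals(a, b + 0.1, xs, ys))
```

Any other choice of a and b gives a larger sum of squared residuals, which is what the last two printed lines illustrate.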
Kanika Gandhi
SOC 1100: Introductory Statistics for Social Research
Homework #1
September 20th, 2013
Variables, Frequencies, Measures of Central Tendency
1. Please answer the general discussion points below for the following article:
Chen, F., & Short, S.
SOC 1100: Introductory Statistics for Social Research
Lab Assignment #1
Due: September 27th, 2013
Describing Data
This assignment asks you to review the GSS dataset and begin to describe the data. Use
the GSS02-B data found on Canvas to ans
LECTURE 1 Samples
Key Concepts
Variables
Quantitative
Measurement scale has numerical values
o Interval scale
Specific numerical distance or interval between them
o Discrete or continuous
Continuous: Infinite continuum of real number values
Categorical
Only makes sense when talking about ordinal variables
Table 1
                  Low Health                     High Health
Low Happiness     Joe, Jane, Mary and Bob (4)    0
High Happiness    0                              Tim, Jack, Bill and Sue (4)
Example of perfect positive relationship between the two
Table 2
Low Health
High Heal
Example 1:
Do TAs and Instructors of TAs differ in their perceptions of how often miscommunication
occurred?
H0: π(TA) = π(prof) (under the null hypothesis, difference is 0)
Ha: π(TA) ≠ π(prof)
Where difference is 1 if miscommunication is common and y=0 if
Look at relationships between 2 distributions
Formula
y = α + βx
Formula for a straight line
Expresses the value of y as a linear function of the value of x in a bivariate population. The
formula defines a straight line with slope β and y-intercept α
A model o
Sampling Distribution
For any population we can draw multiple, independent random samples of size n. We
call these samples n1, n2, n3, n4
In a sense hypothetical
Each sample will generate a mean
Distribution of sample means produced by repeatedly taking
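Repeated sampling can be sketched with a simulation. The population below is hypothetical (a skewed set of 10,000 values), and the sample size and number of samples are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)

# Hypothetical, right-skewed population of 10,000 values.
population = [random.expovariate(1 / 50) for _ in range(10000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

# Draw many independent random samples of size n and record each sample mean.
n = 50
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2000)
]

print("population mean:", round(pop_mean, 2))
print("mean of the sample means:", round(statistics.mean(sample_means), 2))
print("sd of the sample means:", round(statistics.stdev(sample_means), 2))
print("population sd / sqrt(n):", round(pop_sd / n ** 0.5, 2))
```

The sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of n.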
Example 1:
TA Hours Worked in a Week - A Two-Sided Test
Research question: Do TAs work an average of 20 hours per week?
Define hypotheses:
H0: μ = μ0
Ha: μ ≠ μ0
Current example
Null hypothesis: μ = 20
Alternative hypothesis: μ ≠ 20
(Two-sided hypothesis)
Define test s
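A sketch of how a two-sided test of μ = 20 could be carried out from summary statistics. The sample mean, standard deviation, and sample size are hypothetical, and the p-value uses the normal approximation, which assumes a reasonably large sample; a t table would be the usual choice for a small one.

```python
import math

def one_sample_z_test(y_bar, s, n, mu0):
    """Return (z, two-sided p-value) for H0: mu = mu0, using the normal approximation."""
    se = s / math.sqrt(n)           # estimated standard error of the sample mean
    z = (y_bar - mu0) / se          # distance from mu0 in standard errors
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # area in both tails
    return z, p_two_sided

# Hypothetical summary statistics for weekly TA hours.
z, p = one_sample_z_test(y_bar=21.5, s=6.0, n=100, mu0=20)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```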
Example 1: An Alternative Test of Difference between 2 Proportions
2 x 2 table showing relationship between gender and voting in 2004 election
Question: Is there a relationship between gender and voting?
H0: Voting and gender are statistically independe
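One common way to test independence in a 2 x 2 table is the Pearson chi-squared test, sketched below. The source is cut off before naming its test, so this choice is an assumption, and the counts are hypothetical rather than the 2004 election data.

```python
import math

def chi_square_2x2(table):
    """Return (chi-squared statistic, p-value) for a 2 x 2 table of counts (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-squared > x) = P(|Z| > sqrt(x)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are gender, columns are (voted, did not vote).
table = [[60, 40], [50, 50]]
chi2, p_value = chi_square_2x2(table)
print(f"chi-squared = {chi2:.3f}, p = {p_value:.3f}")
```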
Normal distribution and normal curve
