STAT 4006: Categorical Data Analysis
Assignment 5
Academic year 12/13, second term, Chapter 6
Due date: April 26th, 2013, Assignment Box @ 1/F of Lady Shaw Building
1. The following table refers to automobile accident records in Florida in 1988.
Safety Equ
Chapter 4. Generalized Linear Models
4.1 Generalized Linear Model
4.2 Generalized Linear Models for Binary Data
4.3 Generalized Linear Models for Count Data
4.4 Extension of Notation in Generalized Linear Models
4.5 Generalized Linear Mixed Models*
4.1 G
Chapter 6. Loglinear Models for Contingency Tables
6.1 Introduction
6.2 Loglinear Model for Two-Way Tables
6.3 Loglinear Models for Three-Way Tables
6.4 The Loglinear-Logit Connection
6.5 Model Selection and Comparison
6.1 Introduction
In Chapter 4, we
ML fitting of multiple logistic regression models
Let $y = (y_1, \ldots, y_N)$, where $y_i$ is the number of successes
among $n_i$ observations, $x_i = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, N$, and
$\beta = (\beta_1, \ldots, \beta_p)$. The logistic regression model, regarding $\beta$ as
a regression parameter with u
Chapter 2. Two-Way Contingency Tables
2.1 Notation and Definition
2.2 Sampling Models
2.3 Test of Independence and Test of Homogeneity
2.4 Comparing Proportions in 2 × 2 Tables
2.5 Odds Ratio
2.6 Measures of Association for Ordinal Variables
2.7 Measures of A
STAT 4006: Categorical Data Analysis
Xinyuan Song
Department of Statistics
The Chinese University of Hong Kong
Chapter 1. Distribution and Inference for Categorical Data
1.1 Categorical Response Data
1.2 Distributions for Categorical Data
1.3 Statistica
Testing Independence for Ordinal Data
The $X^2$ and $G^2$ tests ignore the ordering information when used
to test independence between ordinal variables. When rows
and/or columns are ordinal, more powerful tests usually exist.
Linear Trend Alternative to Indep
1. Correlation between response variables in GLMMs
GLMM:
$g(\mu_{it}) = x_{it}^T \beta + z_{it}^T u_i$,   (1)
where $u_i \sim N(0, \Sigma)$. The random effects in $u_i$ incorporate
correlations among response variables in the same cluster $i$.
If $g(\cdot)$ is a log link, $u_i \sim N(0, \sigma^2)$, and $z_{it} = 1$, then
34
$\mu / t = \alpha + \beta x$. It can be rearranged in
the form $\mu = \alpha t + \beta t x$. It is still a Poisson GLM with identity link, with $\mu$ being
the total number of responses at different levels of $x$. $t$ and $tx$ are then the explanatory
variables with $\alpha$ and $\beta$ as the coefficients. And the
STAT 4006: Categorical Data Analysis
Assignment 4
Academic year 12/13, second term, Chapter 5
Due date: 6:00 pm, April 12th, 2013, Assignment Box @ 1/F Lady Shaw Building
1. For a study using logistic regression to determine characteristics associated wit
STAT 4006 Assignment 5 Solution
4
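1a)
$\log\frac{\hat\pi}{1-\hat\pi} = -3.7771 + 0.1449x$
For x = 8,
$\log\frac{\hat\pi}{1-\hat\pi} = -3.7771 + 0.1449(8)$, so $\hat\pi = 0.068$
b)
For x = 26,
$\log\frac{\hat\pi}{1-\hat\pi} = -3.7771 + 0.1449(26)$, so $\hat\pi = 0.5$
c)
For x = 8,
The rate of change = (0.1449)(0.068)(1 − 0.068) = 0.009
For x = 26,
The rate of c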
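The arithmetic in parts (a) to (c) can be checked with a short Python sketch. It assumes the fitted logit is -3.7771 + 0.1449x (the negative intercept is the sign that reproduces the stated probabilities 0.068 and 0.5); the function names are illustrative only and are not part of the original solution.

```python
import math

# Assumed fitted coefficients: logit(pi) = ALPHA + BETA * x
ALPHA = -3.7771
BETA = 0.1449


def fitted_probability(x):
    """Invert the logit link to get the fitted probability at x."""
    eta = ALPHA + BETA * x
    return 1.0 / (1.0 + math.exp(-eta))


def rate_of_change(x):
    """Slope of the fitted probability curve at x: beta * pi * (1 - pi)."""
    p = fitted_probability(x)
    return BETA * p * (1.0 - p)


p8 = fitted_probability(8)
p26 = fitted_probability(26)
slope8 = rate_of_change(8)

print(f"x = 8:  fitted probability = {p8:.3f}, rate of change = {slope8:.3f}")
print(f"x = 26: fitted probability = {p26:.3f}")
```

Rounded to three decimals, the values at x = 8 agree with the 0.068 and 0.009 given in the solution, and the probability at x = 26 is close to 0.5.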
STAT 4006: Categorical Data Analysis
Assignment 3
Academic year 12/13, second term, Chapter 4
Due date: 5:00 pm, March 29, 2013. Assignment box, 1/F, Lady Shaw Building
1. The following table refers to a sample of subjects randomly selected for an Italian
STAT 4006: Categorical Data Analysis
Assignment 2
Academic year 12/13, second term, Chapter 3
Due Date: 5:00 pm Tuesday, 12th March. Assignment box for STAT 4006, 1/F, Lady Shaw
Building.
1. The data in the table are obtained from 322 couples with ages 32
STA 4006 Assignment 2 Solution
3
1)
$\hat\theta_{XY(1)} = 1.76$
Thus there seems to be no association between the blood pressures of husband and wife
when years of marriage are less than 5.
$\hat\theta_{XY(2)} = 12.7$
Thus a husband with abnormal blood pressure is more likely to imply a wife with
a
STAT 4006: Categorical Data Analysis
Assignment 1
Academic year 12/13, first term, Chapter 1-2
Due date: 5:00 pm, Feb 22 (Friday), 2012. Assignment Box at LSB 1/F
1. The data in the following table is obtained from a multinomial distribution.
(a) Test with $\alpha$ = 0
STA 4006
Categorical Data Analysis
1.
Description
This course provides statistical techniques in analyzing categorical data. Topics include measures of association,
analysis of two-way and three-way contingency tables, loglinear models, logit models, gene
Measures of Association for Ordinal Variables
Ordinal Trends
- Concordant pair: $(X_i - X_j)(Y_i - Y_j) > 0$
- Tied in X: $(X_i - X_j)(Y_i - Y_j) = 0$ as $X_i = X_j$
- Tied in Y: $(X_i - X_j)(Y_i - Y_j) = 0$ as $Y_i = Y_j$
- Discordant pair: $(X_i - X_j)(Y_i - Y_j) < 0$
Tied i
STAT 4006 Categorical Data Analysis (2013-2014)
Tutorial 4
Measure of association for ordinal variables
Ordinal trends
- Concordant pair: $(X_i - X_j)(Y_i - Y_j) > 0$
- Discordant pair: $(X_i - X_j)(Y_i - Y_j) < 0$
- Tied in X: $(X_i - X_j)(Y_i - Y_j) = 0$ as $X_i = X_j$
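The pair definitions above can be sketched in Python as a direct count over all pairs of observations. The data below are hypothetical and chosen only for illustration, and `classify_pairs` is an illustrative name, not something from the tutorial.

```python
from itertools import combinations


def classify_pairs(x, y):
    """Count pair types using the sign of (Xi - Xj)(Yi - Yj).

    A pair tied on both X and Y is counted under both tied_x and tied_y.
    """
    counts = {"concordant": 0, "discordant": 0, "tied_x": 0, "tied_y": 0}
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:
            counts["concordant"] += 1
        elif product < 0:
            counts["discordant"] += 1
        else:
            if xi == xj:
                counts["tied_x"] += 1
            if yi == yj:
                counts["tied_y"] += 1
    return counts


# Hypothetical ordinal scores for five subjects (not from the tutorial)
x_scores = [1, 1, 2, 2, 3]
y_scores = [1, 2, 1, 3, 3]

pair_counts = classify_pairs(x_scores, y_scores)
print(pair_counts)
```

For these five hypothetical subjects there are 10 pairs in total: 5 concordant, 1 discordant, 2 tied in X, and 2 tied in Y.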
Example
[1998 Midterm] The data in the following table is obtained from a multinomial distribution
with parameters $\pi_1, \ldots, \pi_5$.

Cell          1        2        3        4        5
Probability   $\pi_1$  $\pi_2$  $\pi_3$  $\pi_4$  $\pi_5$
Frequency     17       16       12       12       13

a) Test with $\alpha = 0.05$ the null hypothesis $H_0: \pi_1 = \pi_2 = \pi_3 = \pi_4 = \pi_5$
1
Solut
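As a rough check on part (a), the following Python sketch computes the Pearson and likelihood-ratio statistics for the five frequencies under equal cell probabilities. It is one standard way to carry out the test and is not necessarily the approach taken in the truncated solution. The critical value 9.488 is the tabulated upper 5% point of the chi-squared distribution with 4 degrees of freedom, and the variable names are illustrative.

```python
import math

# Observed frequencies for cells 1..5 from the example table
observed = [17, 16, 12, 12, 13]

n = sum(observed)
k = len(observed)
expected = n / k  # under H0, every cell has probability 1/k

# Pearson chi-squared statistic
x2 = sum((o - expected) ** 2 / expected for o in observed)

# Likelihood-ratio statistic
g2 = 2 * sum(o * math.log(o / expected) for o in observed)

# Upper 5% point of chi-squared with k - 1 = 4 degrees of freedom (table value)
CRITICAL_VALUE = 9.488
reject_h0 = x2 > CRITICAL_VALUE

print(f"n = {n}, expected count per cell = {expected}")
print(f"X^2 = {x2:.3f}, G^2 = {g2:.3f}, df = {k - 1}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With n = 70 the expected count is 14 per cell, so X^2 = 22/14, which is far below 9.488, and the sketch does not reject equal probabilities at the 5% level.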
STAT 4006 Categorical Data Analysis (2013-2014)
Tutorial 3
Fisher-Irwin exact test
- used when cell frequencies are too small
- $H_0$: no association vs. $H_1$: association exists (2-tailed or 1-tailed)

X\Y      1      2      Total
1        n11    n12    n1+
2        n21    n22    n2+
Total    n+1    n+2    n
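A minimal Python sketch of the exact test for a 2 × 2 table follows, conditioning on the row and column totals so that n11 has a hypergeometric distribution. The counts used are illustrative (a 3, 1, 1, 3 table) and do not come from the tutorial. The two-sided rule shown, which sums the probabilities of all tables no more likely than the observed one, is one common convention and is an assumption here, as are the function names.

```python
from math import comb


def table_probability(n11, row1, col1, n):
    """Hypergeometric probability of n11 given the margins row1, col1 and total n."""
    return comb(row1, n11) * comb(n - row1, col1 - n11) / comb(n, col1)


def fisher_exact(n11, n12, n21, n22):
    """Return (one-sided, two-sided) exact p-values for a 2 x 2 table.

    One-sided: P(N11 >= observed n11), i.e. the positive-association direction.
    Two-sided: total probability of tables no more likely than the observed one.
    """
    row1 = n11 + n12
    col1 = n11 + n21
    n = n11 + n12 + n21 + n22
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    probs = {k: table_probability(k, row1, col1, n) for k in range(lowest, highest + 1)}
    p_observed = probs[n11]
    one_sided = sum(p for k, p in probs.items() if k >= n11)
    # Small relative tolerance guards against floating-point ties
    two_sided = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return one_sided, two_sided


# Illustrative counts only (not from the tutorial)
one_sided_p, two_sided_p = fisher_exact(3, 1, 1, 3)
print(f"one-sided p = {one_sided_p:.4f}, two-sided p = {two_sided_p:.4f}")
```

For these illustrative counts the one-sided p-value is 17/70 and the two-sided p-value is 34/70, so neither direction gives evidence of association.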
X
STAT 4006 Categorical Data Analysis (2013-2014)
Tutorial 5
Chapter 3
3-way contingency table
- 3 categorical variables, X (I categories), Y (J categories), Z (K categories)
- I × J × K table: I rows, J col
STAT 4006 Categorical Data Analysis (2013-2014)
Tutorial 2
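3. For Poisson parameters
- log likelihood: $L(\mu) = \log\left(e^{-n\mu} \mu^{\sum y_i}\right) = \left(\sum y_i\right) \log \mu - n\mu$
- $\frac{\partial L(\mu)}{\partial \mu} = 0 \Rightarrow \hat\mu = \bar{y}$
- hypothesis testing ($H_0$: Data come from Poisson distribution)
- calculate the sample mean $\hat\mu = \bar{y}$ and then e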
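The claim that the sample mean maximises the Poisson log likelihood can be checked numerically. The sketch below assumes the log-likelihood kernel (sum of y) log(mu) - n mu, with mu as an assumed symbol for the Poisson mean, and uses hypothetical counts that are not from the tutorial.

```python
import math


def poisson_loglik_kernel(mu, y):
    """Poisson log likelihood without the factorial term: sum(y) * log(mu) - n * mu."""
    return sum(y) * math.log(mu) - len(y) * mu


# Hypothetical count data (not from the tutorial)
counts = [0, 2, 1, 3, 1, 0, 2]

# Setting the derivative sum(y) / mu - n to zero gives the sample mean
mu_hat = sum(counts) / len(counts)

# Compare the log likelihood at mu_hat with nearby values of mu
at_mle = poisson_loglik_kernel(mu_hat, counts)
below = poisson_loglik_kernel(mu_hat - 0.1, counts)
above = poisson_loglik_kernel(mu_hat + 0.1, counts)

print(f"mu_hat = {mu_hat:.4f}")
print(f"log likelihood at mu_hat = {at_mle:.4f}, nearby values: {below:.4f}, {above:.4f}")
```

The log likelihood at the sample mean exceeds the value at both neighbouring points, which is consistent with the derivative condition above.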
STA 4006 Categorical Data Analysis (2013-2014)
Tutorial 1
Chapter 1
Classification Levels
- Nominal (categories without a natural ordering)
- Ordinal (categories with ordering, but distances between categories are unknown)
- Interval