APPLIED STATISTICS
Logistic Regression for Two-Category Response Variables and Its Estimation
Dr Tao Zou
Research School of Finance, Actuarial Studies & Statistics
The Australian National University
Last Updated: Tue Sep 26 13:52:35 2017

Overview
- Two-Category Response Variables
- Motivating Example
- Binary Logistic Regression Model
- Estimation of Binary Logistic Regression
- Prediction of a New Observation

References
1. F.L. Ramsey and D.W. Schafer (2012), The Statistical Sleuth, Chapter 20.
2. ANU STAT3015 Lecture Notes.
3. These slides were made with R Markdown.

Two-Category Response Variables
In numerous regression applications, the response variable of interest is a categorical variable taking two values. In such situations the response can be represented by a binary indicator variable taking on values 0 and 1.
For example:
- In a study on the effectiveness of a new drug, the response might be whether a given patient survived a 5-year period.
- In a study of home ownership, the response variable is whether a given individual owns a home.

Example: Anaesthetic Data
(Taken from STAT3015 notes.)
The potency of an anaesthetic agent is measured in terms of the minimum concentration at which at least 50% of patients exhibit no response to stimulation.
Thirty patients were given a particular anaesthetic at various predetermined concentrations for 15 minutes before a stimulus was applied.
The response variable simply indicates whether the patient responded to the stimulus in any way: "Response" is 1 if the patient responded to the stimulus and 0 otherwise.

R Code
# Set the working directory to the folder that holds the data file
setwd("~/Desktop/Research/AppliedStat2017/L9")
# Read the anaesthetic data and print it
a = read.csv("anaesthetic.csv"); a
##    Concentration Response
## 1            0.8        1
## 2            0.8        1
## 3            0.8        1
## 4            0.8        1
## 5            0.8        1
## 6            0.8        1
## 7            0.8        0
## 8            1.0        1
## 9            1.0        1
## 10           1.0        1
## 11           1.0        1
## 12           1.0        0
## 13           1.2        1
## 14           1.2        1
## 15           1.2        0
## 16           1.2        0
## 17           1.2        0
## 18           1.2        0
## 19           1.4        1
## 20           1.4        1
## 21           1.4        0
## 22           1.4        0
## 23           1.4        0
## 24           1.4        0
## 25           1.6        0
## 26           1.6        0
## 27           1.6        0
## 28           1.6        0
## 29           2.5        0
## 30           2.5        0

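R Code (Cont'd)
attach(a)
# Scatter plot of the 0/1 response against concentration
plot(Concentration, Response, ylim = c(-0.5, 1))
# Ordinary least-squares fit, drawn as a dashed line
fit = lm(Response ~ Concentration)
lines(Concentration, fit$fitted.values, lty = 2)
[Figure: scatter plot of Response (vertical axis, -0.5 to 1.0) against Concentration (horizontal axis, 1.0 to 2.5), with the fitted least-squares line drawn dashed.]
On this scale, a linear regression does not seem appropriate.
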
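One way to see why the linear fit is unsuitable is to look at its fitted values at a few concentrations. The sketch below is only illustrative: it re-enters the 30 observations from the data listing so that it runs without the CSV file, and the names `new_x`, `pred` and `outside` are introduced here and are not part of the original slides.

```r
# Anaesthetic data re-entered from the listing (30 patients)
a <- data.frame(
  Concentration = rep(c(0.8, 1.0, 1.2, 1.4, 1.6, 2.5),
                      times = c(7, 5, 6, 6, 4, 2)),
  Response = c(1, 1, 1, 1, 1, 1, 0,   # concentration 0.8
               1, 1, 1, 1, 0,         # concentration 1.0
               1, 1, 0, 0, 0, 0,      # concentration 1.2
               1, 1, 0, 0, 0, 0,      # concentration 1.4
               0, 0, 0, 0,            # concentration 1.6
               0, 0)                  # concentration 2.5
)

# Ordinary least-squares fit of the 0/1 response on concentration
fit <- lm(Response ~ Concentration, data = a)

# Fitted values at three of the observed concentrations (hypothetical names)
new_x <- data.frame(Concentration = c(0.8, 1.6, 2.5))
pred <- predict(fit, newdata = new_x)
print(round(pred, 3))

# Flag fitted values that fall outside the range [0, 1]
outside <- pred < 0 | pred > 1
print(outside)
```

The fitted value at concentration 2.5 is negative, which cannot be read as the mean of a variable that takes only the values 0 and 1.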
Violation of Linear Regression Assumptions
Y: Response; X: Concentration.
1. Y does not conform to the normality assumption, since Y takes only the values 0 and 1.
2. Sample mean of Y at each value of X:
tapply(Response, Concentration, mean)
##       0.8         1       1.2       1.4       1.6       2.5
## 0.8571429 0.8000000 0.3333333 0.3333333 0.0000000 0.0000000
Given X = 0.8, the sample mean of Y is 0.857;
given X = 1.0, the sample mean of Y is 0.800;
given X = 1.2, the sample mean of Y is 0.333;
given X = 1.4, the sample mean of Y is 0.333;
given X = 1.6, the sample mean of Y is 0.000;
given X = 2.5, the sample mean of Y is 0.000.
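Because Y is coded 0/1, each sample mean above is simply the proportion of patients who responded at that concentration. The sketch below checks this by hand. It re-enters the data from the listing so that it is self-contained, and the names `n_k`, `y_k` and `p_k` are introduced here for illustration only.

```r
# Anaesthetic data re-entered from the listing (30 patients)
Concentration <- rep(c(0.8, 1.0, 1.2, 1.4, 1.6, 2.5),
                     times = c(7, 5, 6, 6, 4, 2))
Response <- c(1, 1, 1, 1, 1, 1, 0,   # concentration 0.8
              1, 1, 1, 1, 0,         # concentration 1.0
              1, 1, 0, 0, 0, 0,      # concentration 1.2
              1, 1, 0, 0, 0, 0,      # concentration 1.4
              0, 0, 0, 0,            # concentration 1.6
              0, 0)                  # concentration 2.5

# Number of patients and number of responders at each concentration
n_k <- tapply(Response, Concentration, length)
y_k <- tapply(Response, Concentration, sum)

# Proportion of responders at each concentration
p_k <- y_k / n_k
print(cbind(n = n_k, responders = y_k, proportion = round(p_k, 3)))

# The proportions agree with the group means computed by tapply(..., mean)
print(all.equal(as.numeric(p_k),
                as.numeric(tapply(Response, Concentration, mean))))
```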

