lect04

Course: CHL 5210H, Fall 2011
School: University of Toronto
Categorical Data Analysis - Lei Sun
CHL 5210 - Statistical Analysis of Qualitative Data
Topic: Logistic Regression

Outline
- Single predictor (2 levels, > 2 levels, quantitative).
- Parameter interpretation.
- Inference of the parameter.
- Multiple predictors (next lecture).

Introduction to logistic regression (logit model).

So far we have studied two-way contingency tables. Most studies, however, involve several explanatory variables, both categorical and continuous. The goal is to describe their effects on the response variable, including relevant interactions, and to obtain smoothed estimates of the response probabilities.
- Most important case: logistic regression (logit model), a linear model for the logit transformation of a binomial parameter.
- Also: the log-linear model (Poisson regression model), a linear model for the log of a Poisson mean.

We will study the following:
- Single predictor (2 levels, > 2 levels, quantitative).
- Parameter interpretation.
- Inference of the parameter.
- Multiple predictors.
- Interaction.
- Model building (model fit and selection).

Logistic regression with one covariate with two levels.

Background.
- Y: binary response, with outcomes denoted by 0 and 1, e.g. birth weight, normal (0) or low (1).
- X: binary predictor, with two levels denoted by 0 and 1, e.g. smoking status, no (0) or yes (1).

We are interested in defining $E[Y] = P(Y = 1) = \pi(X)$, which reflects its dependence on the value of the predictor X.

A linear probability model: $P(Y = 1) = \pi(X) = \alpha + \beta X$.
- Major structural defect: probabilities must fall between 0 and 1, but linear functions take values over the entire real line.
- Interpretation of $\beta$: the change in $\pi$ for a one-unit increase in X.

We want to construct our model so that:
- the predicted value of $P(Y = 1) = \pi(X)$ is bounded between 0 and 1;
- the regression coefficient is measured on a meaningful scale;
- statistical inference can be based on modelling the $Y_i$, $i = 1, \ldots, n$, as Bernoulli random variables.
Consider the logit of $\pi$ as a linear function of X:
$$\operatorname{logit}(\pi(X)) = \log\frac{\pi(X)}{1 - \pi(X)} = \alpha + \beta X, \qquad \text{equivalently} \qquad \pi(X) = \frac{\exp(\alpha + \beta X)}{1 + \exp(\alpha + \beta X)}.$$

[Figure: the logistic curve $\pi(x) = \exp(\alpha + \beta x)/(1 + \exp(\alpha + \beta x))$, illustrated by $\pi(x) = \exp(x)/(1 + \exp(x))$ for x from -4 to 4; vertical axis PI(x) from 0 to 1.]

Interpretation of the parameters.
Let $\pi_i = P(Y_i = 1 \mid X_i) = \pi(X_i)$ be the probability that the ith mother will have a low weight infant, as a function of the predictor X: $\pi(X_i = 0)$ is the risk for non-smokers and $\pi(X_i = 1)$ is the risk for smokers. Using the logistic regression $\operatorname{logit}(\pi(X)) = \alpha + \beta X$:
- $\operatorname{logit}(\pi(X = 0)) = \alpha$ for non-smokers;
- $\operatorname{logit}(\pi(X = 1)) = \alpha + \beta$ for smokers.

Thus
$$\alpha = \operatorname{logit}(\pi(X = 0)) = \log\frac{\pi(X = 0)}{1 - \pi(X = 0)}$$
is the log-odds of having a low weight baby for non-smokers ($\exp(\alpha)$ is then the odds), and
$$\beta = \operatorname{logit}(\pi(X = 1)) - \operatorname{logit}(\pi(X = 0)) = \log\frac{\pi(X = 1)}{1 - \pi(X = 1)} - \log\frac{\pi(X = 0)}{1 - \pi(X = 0)} = \log\theta$$
is the log-odds ratio of having a low weight baby comparing smokers with non-smokers ($\exp(\beta) = \theta$ is then the odds ratio).

The LBW example.

                          Smoking status
    Birth weight       No (X = 0)    Yes (X = 1)    Total
    Normal (Y = 0)     86 (n11)      44 (n12)       130 (n1.)
    Low (Y = 1)        29 (n21)      30 (n22)        59 (n2.)
    Total             115 (n.1)      74 (n.2)       189 (n..)

We will see that the estimated log-odds ratio obtained from logistic regression with a single binary predictor is identical to our previous derivation based on the MLEs of the two binomial proportions:
$$\hat\beta = \log\hat\theta = \log\frac{n_{11} n_{22}}{n_{12} n_{21}} = \log\frac{86 \times 30}{44 \times 29} \approx \log(2.02) \approx 0.70, \qquad SE(\hat\beta) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} = 0.32.$$

    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
    proc logistic descending;
      model low=smoke;
    run;

    Analysis of Maximum Likelihood Estimates
                                 Standard        Wald
    Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
    Intercept    1    -1.0870      0.2147      25.6244       <.0001
    smoke        1     0.7040      0.3196       4.8516       0.0276

A few notes on logistic regression.
- Effect coefficients in the logistic regression model refer to odds ratios. Thus one can fit such models and estimate effects in case-control studies (where X, rather than the response variable Y, is random). With case-control studies, one cannot estimate the effects in other binary response models: unlike the odds ratio, the effect for the conditional distribution of X given Y does not then equal that for Y given X.
- This is an important advantage of the logit model in medical studies.
- Many case-control studies use matching; the model and analyses should take the matching into account (Chapter 10: logistic regression for matched case-control studies).

Estimating the parameters in a logistic regression model.

Assume that the relationship between the probability of having a low weight baby, $\pi = P(Y = 1)$, and a covariate X follows the logistic regression model
$$\operatorname{logit}(\pi) = \operatorname{logit}(P(Y = 1 \mid X)) = \log\frac{\pi}{1 - \pi} = \alpha + \beta X.$$

Suppose we sample n individuals at random from a population and record the values of Y and X for each individual. The data are usually coded as:

    Individual   Response   Covariate
    1            y1         x1
    2            y2         x2
    3            y3         x3
    ...          ...        ...
    n            yn         xn

From this sample we wish to estimate the population parameters $\alpha$ and $\beta$. We have
$$P(Y_i = 1 \mid X_i = x_i) = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)}, \qquad P(Y_i = 0 \mid X_i = x_i) = 1 - P(Y_i = 1 \mid X_i = x_i) = \frac{1}{1 + \exp(\alpha + \beta x_i)}.$$
Combining both,
$$P(Y_i = y_i \mid X_i = x_i) = \frac{\exp((\alpha + \beta x_i)\, y_i)}{1 + \exp(\alpha + \beta x_i)}, \qquad y_i = 0, 1.$$
Since the data are assumed to have been obtained at random, the probability of the observed data is
$$P(Y_1 = y_1, \ldots, Y_n = y_n \mid X_1 = x_1, \ldots, X_n = x_n) = P(Y_1 = y_1 \mid X_1 = x_1) \times P(Y_2 = y_2 \mid X_2 = x_2) \times \cdots \times P(Y_n = y_n \mid X_n = x_n).$$
The likelihood of the data in terms of $\alpha$ and $\beta$ is
$$L(\alpha, \beta) = \prod_{i=1}^{n} \frac{\exp((\alpha + \beta x_i)\, y_i)}{1 + \exp(\alpha + \beta x_i)}.$$
MLEs of $\alpha$ and $\beta$: search for the combination of values that maximizes $L(\alpha, \beta)$. In general, there is no closed-form solution.
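Because there is generally no closed form, the MLE has to be found iteratively. As a minimal sketch (plain Python rather than SAS), the code below runs Newton-Raphson on the LBW smoking data grouped by X as in the 2 × 2 table; the grouped binomial likelihood has the same maximizer as the individual Bernoulli likelihood. The function names and the starting values (0, 0) are illustrative assumptions, not a description of how PROC LOGISTIC works internally.

```python
import math

# LBW data grouped by smoking status, taken from the 2 x 2 table:
# (x, number of mothers, number of low-weight births); x = 0 non-smoker, x = 1 smoker
GROUPS = [(0, 115, 29), (1, 74, 30)]


def score_and_information(alpha, beta, groups):
    """Gradient of the log-likelihood and the 2 x 2 information matrix (minus the Hessian)."""
    g_a = g_b = 0.0
    i_aa = i_ab = i_bb = 0.0
    for x, m, y in groups:
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))  # fitted pi(x)
        w = m * p * (1.0 - p)                             # binomial variance weight
        g_a += y - m * p
        g_b += x * (y - m * p)
        i_aa += w
        i_ab += w * x
        i_bb += w * x * x
    return (g_a, g_b), (i_aa, i_ab, i_bb)


def fit_logistic_newton(groups, tol=1e-10, max_iter=50):
    """Hypothetical helper: Newton-Raphson for logit(pi) = alpha + beta * x.

    Returns (alpha-hat, beta-hat, SE(alpha-hat), SE(beta-hat)).
    """
    alpha, beta = 0.0, 0.0  # illustrative starting values
    for _ in range(max_iter):
        (g_a, g_b), (i_aa, i_ab, i_bb) = score_and_information(alpha, beta, groups)
        det = i_aa * i_bb - i_ab ** 2
        # Newton step: solve I * step = gradient (2 x 2 inverse written out)
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        alpha += step_a
        beta += step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information at the MLE
    _, (i_aa, i_ab, i_bb) = score_and_information(alpha, beta, groups)
    det = i_aa * i_bb - i_ab ** 2
    return alpha, beta, math.sqrt(i_bb / det), math.sqrt(i_aa / det)


alpha_hat, beta_hat, se_alpha, se_beta = fit_logistic_newton(GROUPS)
print(f"alpha = {alpha_hat:.4f} (SE {se_alpha:.4f}), beta = {beta_hat:.4f} (SE {se_beta:.4f})")
```

For the logit link the observed and expected information coincide, so in this model Newton-Raphson and Fisher scoring take identical steps; the estimates agree with the SAS output to within rounding.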
Instead, iterative techniques are used (such as the Newton-Raphson method or the Fisher scoring algorithm).
- Newton-Raphson method: uses the first derivatives (gradient vector) and the matrix of second derivatives (Hessian matrix) of the log-likelihood $l$:
$$\boldsymbol\beta^{(k)} = \boldsymbol\beta^{(k-1)} - \left[\frac{\partial^2 l(\boldsymbol\beta^{(k-1)})}{\partial\beta_i\,\partial\beta_j}\right]^{-1} \left(\frac{\partial l(\boldsymbol\beta^{(k-1)})}{\partial\beta_1}, \frac{\partial l(\boldsymbol\beta^{(k-1)})}{\partial\beta_2}, \ldots, \frac{\partial l(\boldsymbol\beta^{(k-1)})}{\partial\beta_p}\right)^T.$$
- Fisher scoring algorithm: similar, but uses the expected value of the negative Hessian matrix, called the expected information (the negative Hessian matrix itself is the observed information).

In the case of one binary predictor, one can show that
$$\hat\alpha = \log\frac{n_{21}}{n_{11}}, \quad SE(\hat\alpha) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{21}}}, \qquad \hat\beta = \log\frac{n_{11} n_{22}}{n_{12} n_{21}}, \quad SE(\hat\beta) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}.$$
The results are identical to the previous derivations for a 2 × 2 table: $\hat\beta = \log(86 \times 30 / (44 \times 29)) \approx \log(2.02) \approx 0.70$ with $SE(\hat\beta) = 0.32$, matching the SAS estimates $\hat\alpha = -1.0870$ (SE 0.2147) and $\hat\beta = 0.7040$ (SE 0.3196).

Surprising? Both are MLEs, with the asymptotic variance of the MLE; they are different ways of looking at the same data (two binomial proportions, or a logit model). The measure of association for a 2 × 2 table is the effect parameter in the logit model for binary data: $\beta = \log\theta$.

Inference of the parameters.
We have shown the asymptotic properties of MLEs: consistent, efficient and normally distributed. We have studied various hypothesis testing methods: the Wald, likelihood ratio and score tests.

Wald test.
The hypotheses: $H_0: \beta = 0$, $H_a: \beta \neq 0$. Test statistic:
$$X = \left(\frac{\hat\beta - 0}{SE(\hat\beta)}\right)^2 \sim \chi^2_1.$$
Results: $X = (0.7040 / 0.3196)^2 = 4.8516$ (p-value = .0276).

    The LOGISTIC Procedure
    Model Information
    Data Set                     WORK.X1
    Response Variable            low
    Number of Response Levels    2
    Number of Observations       189
    Model                        binary logit
    Optimization Technique       Fisher's scoring

    Model Convergence Status
    Convergence criterion (GCONV=1E-8) satisfied.
    Analysis of Maximum Likelihood Estimates
                                 Standard        Wald
    Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
    Intercept    1    -1.0870      0.2147      25.6244       <.0001
    smoke        1     0.7040      0.3196       4.8516       0.0276

In fact, if we look at the data as a 2 × 2 table and note that $\hat\beta = \log\hat\theta$:
$$X = \left(\frac{\log\hat\theta - 0}{SE(\log\hat\theta)}\right)^2 = \frac{\left(\log\frac{n_{11} n_{22}}{n_{12} n_{21}} - 0\right)^2}{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} = 4.8516.$$

Likelihood ratio test.
The hypotheses: $H_0: \beta = 0$, $H_a: \beta \neq 0$. $H_0$ is nested in $H_a$:
$$H_0: \log\frac{\pi}{1 - \pi} = \alpha, \qquad H_a: \log\frac{\pi}{1 - \pi} = \alpha + \beta X.$$
Test statistic:
$$X = 2\left(\log L_{H_a}(\hat\alpha, \hat\beta) - \log L_{H_0}(\tilde\alpha)\right) \sim \chi^2_1.$$
Result: $X = -229.805 - (-234.672) = 4.867$ (p-value = .0274).

    Model Fit Statistics
                     Intercept    Intercept and
    Criterion             Only       Covariates
    -2 Log L           234.672          229.805

    Testing Global Null Hypothesis: BETA=0
    Test               Chi-Square   DF   Pr > ChiSq
    Likelihood Ratio       4.8674    1       0.0274

What is $\tilde\alpha$? Under $H_0$, $\tilde\pi = n_{2.}/n_{..}$, the marginal proportion, regardless of the value of X, so
$$\tilde\alpha = \log\frac{\tilde\pi}{1 - \tilde\pi} = \log\frac{n_{2.}}{n_{1.}}.$$

In fact, if we look at the data as a 2 × 2 table, note that $\beta = 0 \iff \pi(X = 1) = \pi(X = 0)$. Recall the likelihood ratio test of two binomial proportions:
$$X = -2\ln\Lambda = 2\left(l(\hat\pi) - l(\tilde\pi)\right) = 2\sum \text{observed} \times \log\frac{\text{observed}}{\text{expected}} = 4.87,$$
where each expected count is computed under the null hypothesis, e.g. $E[n_{11}] = n_{.1}(1 - \tilde\pi) = n_{.1}\, n_{1.}/n_{..}$.

    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
    proc freq; tables low*smoke/chisq expected;
    run;

    Table of low by smoke
    low        smoke
    Frequency
    Expected        0        1    Total
    0              86       44      130
               79.101   50.899
    1              29       30       59
               35.899   23.101
    Total         115       74      189

    Statistic                        DF     Value     Prob
    ------------------------------------------------------
    Likelihood Ratio Chi-Square       1    4.8674   0.0274

Score test.
The hypotheses: $H_0: \beta = 0$, $H_a: \beta \neq 0$. Test statistic:
$$X = S(\tilde\alpha, \beta = 0)^T\, I(\tilde\alpha, \beta = 0)^{-1}\, S(\tilde\alpha, \beta = 0) \sim \chi^2_1,$$
which is based on the score function evaluated at the MLE of $\alpha$ under the null hypothesis that $\beta = 0$.
Results: $X = 4.9237$ (p-value = .0265).
    Testing Global Null Hypothesis: BETA=0
    Test      Chi-Square   DF   Pr > ChiSq
    Score         4.9237    1       0.0265

Note that $\beta = 0 \iff$ independence. Recall that the Pearson $\chi^2$ test for a binomial proportion has the same form as the score test. In fact, the same is true for the Pearson $\chi^2$ test of homogeneity or independence in contingency tables:
$$X = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)},$$
where the expected count of the ijth cell under the null hypothesis of homogeneity or independence is $E_{ij} = n_{i.}\, n_{.j} / n_{..}$. Here
$$X = \frac{(86 - 79.1)^2}{79.1} + \frac{(44 - 50.9)^2}{50.9} + \frac{(29 - 35.9)^2}{35.9} + \frac{(30 - 23.1)^2}{23.1} = 4.924 \quad (\text{p-value} = .026).$$

The same PROC FREQ run (tables low*smoke/chisq expected) reports:

    Statistics for Table of low by smoke
    Statistic                        DF     Value     Prob
    ------------------------------------------------------
    Chi-Square                        1    4.9237   0.0265

Confidence interval for the odds ratio, $\theta = \exp(\beta)$: usually obtained by inverting the Wald test:
$$\left(\exp(\hat\beta - 1.96\, SE(\hat\beta)),\ \exp(\hat\beta + 1.96\, SE(\hat\beta))\right) = \left(\exp(0.7040 - 1.96 \times 0.3196),\ \exp(0.7040 + 1.96 \times 0.3196)\right) = (1.081, 3.783).$$

    Wald Confidence Interval for Adjusted Odds Ratios
    Effect    Unit     Estimate   95% Confidence Limits
    smoke     1.0000      2.022      1.081      3.783

    Odds Ratio Estimates
                 Point         95% Wald
    Effect    Estimate   Confidence Limits
    smoke        2.022      1.081     3.783

CIs obtained by inverting the likelihood ratio and score tests require iterative solutions; unlike the Wald-based CI, they have no closed-form expression.
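The three test statistics and the Wald CI for the LBW table can all be reproduced directly from the four cell counts. The Python sketch below is an illustration, not the lecture's code: `two_by_two_inference` and `chi2_1df_pvalue` are hypothetical helper names, and the p-values use the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

# LBW 2 x 2 table: rows = birth weight (normal, low), columns = smoking (no, yes)
TABLE = [[86, 44],
         [29, 30]]


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-square(1) statistic: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def two_by_two_inference(t, z=1.96):
    """Wald, score (Pearson) and LR statistics plus the Wald CI for the odds ratio.

    Assumes all four cells are positive (the LR term uses log(O / E)).
    """
    (n11, n12), (n21, n22) = t
    n = n11 + n12 + n21 + n22
    log_or = math.log(n11 * n22 / (n12 * n21))             # beta-hat = log(theta-hat)
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    row_tot = [n11 + n12, n21 + n22]
    col_tot = [n11 + n21, n12 + n22]
    pearson = g2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row_tot[i] * col_tot[j] / n                 # E_ij = n_i. n_.j / n..
            o = t[i][j]
            pearson += (o - e) ** 2 / e
            g2 += 2 * o * math.log(o / e)
    return {
        "log_or": log_or,
        "se": se,
        "wald": (log_or / se) ** 2,
        "wald_beta_eq_1": ((log_or - 1) / se) ** 2,        # Wald test of H0: beta = 1
        "score": pearson,
        "lr": g2,
        "or_ci": (math.exp(log_or - z * se), math.exp(log_or + z * se)),
    }


res = two_by_two_inference(TABLE)
for name in ("wald", "score", "lr"):
    print(f"{name:>5}: {res[name]:.4f}  p = {chi2_1df_pvalue(res[name]):.4f}")
print("95% Wald CI for the odds ratio:", tuple(round(v, 3) for v in res["or_ci"]))
```

Note that only the Wald statistic is re-centred trivially for a new null value (`wald_beta_eq_1`); the score and LR statistics would have to be recomputed under the new null hypothesis.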
    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
    proc logistic descending;
      model low=smoke/waldrl plrl;
    run;

    Profile Likelihood Confidence Interval for Adjusted Odds Ratios
    Effect    Unit     Estimate   95% Confidence Limits
    smoke     1.0000      2.022      1.082      3.801

In summary: the likelihood ratio, Wald and score tests are asymptotically equivalent.

    Testing Global Null Hypothesis: BETA=0
    Test               Chi-Square   DF   Pr > ChiSq
    Likelihood Ratio       4.8674    1       0.0274
    Score                  4.9237    1       0.0265
    Wald                   4.8516    1       0.0276

The three tests may differ in small samples or when the alternative hypothesis is true, in which case the likelihood ratio and score tests are usually more reliable than the Wald test. However, the Wald test can easily be adapted to test other null hypotheses, e.g. $H_0: \beta = 1$:
$$X = \left(\frac{\hat\beta - 1}{SE(\hat\beta)}\right)^2 = \left(\frac{0.7040 - 1}{0.3196}\right)^2 = 0.8577 \quad (\text{p-value} = .3543).$$
The likelihood ratio and score tests depend on the null hypothesis, so they need to be recalculated accordingly.

In the context of a single binary predictor X, inferences based on logistic regression are equivalent to inferences based on a 2 × 2 contingency table: $\beta = \log\theta$, and $\beta = 0 \iff$ homogeneity or independence.

    Table of low by smoke
    low        smoke
    Frequency       0        1    Total
    0              86       44      130
    1              29       30       59
    Total         115       74      189

    Statistics for Table of low by smoke
    Statistic                        DF     Value     Prob
    ------------------------------------------------------
    Chi-Square                        1    4.9237   0.0265
    Likelihood Ratio Chi-Square       1    4.8674   0.0274

    Estimates of the Relative Risk (Row1/Row2)
    Type of Study                  Value    95% Confidence Limits
    -------------------------------------------------------------
    Case-Control (Odds Ratio)     2.0219      1.0807      3.7831

Logistic regression with one covariate with > 2 levels.
The LBW example.
Is the number of physician visits during the first trimester associated with an increased risk of having a low weight baby?

    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
    proc freq; tables low*ftv;
    run;

    low        ftv
                  0       1       2       3       4       6   Total
    0            64      36      23       3       3       1     130
    1            36      11       7       4       1       0      59
    %low      36.00   23.40   23.33   57.14   25.00    0.00
    Total       100      47      30       7       4       1     189

Pool the last two categories together to increase the expected number of observations:

    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
      if ftv ge 4 then ftv=4;
    proc freq; tables low*ftv/all exact expected;
    run;

    low        ftv
    Frequency
    Expected        0        1        2        3        4    Total
    0              64       36       23        3        4      130
               68.783   32.328   20.635   4.8148   3.4392
    1              36       11        7        4        1       59
               31.217   14.672   9.3651   2.1852   1.5608
    Total         100       47       30        7        5      189

    Statistic                        DF     Value     Prob
    ------------------------------------------------------
    Chi-Square                        4    5.7541   0.2183
    Likelihood Ratio Chi-Square       4    5.6804   0.2243

    WARNING: 40% of the cells have expected counts less than 5.
             Chi-Square may not be a valid test.

    Fisher's Exact Test
    ----------------------------------
    Table Probability (P)    2.160E-04
    Pr <= P                     0.2245

Coding dummy variables for covariates with > 2 levels.
A dummy variable is an indicator or design variable. A variable with J levels/categories may be modeled using J - 1 dummy variables.
Dummy variables are often constructed so that the regression coefficients compare the responses of subjects in the jth category with the responses of subjects in a reference category. Consider the model
$$\operatorname{logit}(\pi(\mathbf{X}_i)) = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4},$$
where $\mathbf{X}_i = (X_{i1}, X_{i2}, X_{i3}, X_{i4})$, $X_{ij} = 1$ if the ith woman had j visits (j = 1, 2, 3, 4, where 4 means 4 or more), and $X_{ij} = 0$ otherwise (all four are 0 for 0 visits). The specification of the dummy variables in this model is summarized in the following table:

    Number of     Dummy variables
    visits        X1   X2   X3   X4
    0              0    0    0    0
    1              1    0    0    0
    2              0    1    0    0
    3              0    0    1    0
    4+             0    0    0    1

Which category is the reference category? How should one select a reference category? Are there other coding schemes?

Logistic regression with dummy variables.
Recall the model
$$\operatorname{logit}(\pi(\mathbf{X})) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4,$$
where $\mathbf{X} = (X_1, X_2, X_3, X_4)$, $X_j = 1$ if a woman had j visits (j = 1, 2, 3, 4) and $X_j = 0$ otherwise. Note that:
- $\operatorname{logit}(\pi(\mathbf{X} = \mathbf{0})) = \alpha$ for women who had 0 visits;
- $\operatorname{logit}(\pi(X_j = 1)) = \alpha + \beta_j$ for women who had j visits.

Interpretation of the parameters:
$$\alpha = \operatorname{logit}(\pi(\mathbf{X} = \mathbf{0})) = \log\frac{\pi(\mathbf{X} = \mathbf{0})}{1 - \pi(\mathbf{X} = \mathbf{0})}$$
is the log-odds of having a low weight baby for women who had 0 visits, and
$$\beta_j = \operatorname{logit}(\pi(X_j = 1)) - \operatorname{logit}(\pi(\mathbf{X} = \mathbf{0})) = \log\frac{\pi(X_j = 1)}{1 - \pi(X_j = 1)} - \log\frac{\pi(\mathbf{X} = \mathbf{0})}{1 - \pi(\mathbf{X} = \mathbf{0})} = \log\theta_j, \quad j = 1, 2, 3, 4,$$
is the log-odds ratio of having a low weight baby comparing women who had j visits with women who had 0 visits.

Estimating the parameters in the model.
The likelihood of the data in terms of $\alpha$ and the $\beta_j$ is
$$L(\alpha, \beta_1, \ldots, \beta_4) = \prod_{i=1}^{n} \frac{\exp((\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4})\, y_i)}{1 + \exp(\alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4})},$$
where $y_i$ is the response of the ith individual: $y_i = 1$ if the baby has low birth weight and $y_i = 0$ otherwise. As before, the MLEs of $\alpha$ and $\beta_j$ are obtained via iterative techniques such as the Newton-Raphson or Fisher scoring algorithms.
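For one categorical covariate with reference-cell coding, the iterative fit lands on closed-form values: each $\hat\beta_j$ is the log odds ratio of column j + 1 against the 0-visit column. The same structure gives the covariance matrix needed for the joint Wald test: every $\hat\beta_j$ equals the empirical logit of its column minus the shared $\hat\alpha$, so $Cov(\hat\beta_i, \hat\beta_j) = Var(\hat\alpha)$ for $i \neq j$. The Python sketch below uses the pooled ftv counts; the helper names are hypothetical, and the matrix inverse is applied through the Sherman-Morrison formula instead of a general solver.

```python
import math

# Pooled low-by-ftv table (4 = "4 or more" visits); column 0 (no visits) is the reference
NORMAL = [64, 36, 23, 3, 4]  # low = 0
LOW = [36, 11, 7, 4, 1]      # low = 1


def dummy_model_fit(normal, low):
    """Closed-form MLEs for logit(pi) = alpha + sum_j beta_j X_j (reference-cell coding).

    Returns alpha-hat, the beta-hats, Var(alpha-hat), and d_j = 1/n_1(j+1) + 1/n_2(j+1),
    the variance of the empirical logit of column j+1.
    """
    alpha = math.log(low[0] / normal[0])
    betas = [math.log(normal[0] * low[j] / (low[0] * normal[j]))
             for j in range(1, len(normal))]
    var_alpha = 1 / normal[0] + 1 / low[0]
    d = [1 / normal[j] + 1 / low[j] for j in range(1, len(normal))]
    return alpha, betas, var_alpha, d


def joint_wald(betas, var_alpha, d):
    """X_W = b' C^{-1} b with C = diag(d) + var_alpha * 1 1'.

    Var(beta_j-hat) = d_j + var_alpha and Cov(beta_i-hat, beta_j-hat) = var_alpha,
    so C^{-1} follows from the Sherman-Morrison formula.
    """
    q = sum(b * b / dj for b, dj in zip(betas, d))
    s = sum(b / dj for b, dj in zip(betas, d))
    h = sum(1 / dj for dj in d)
    return q - var_alpha * s * s / (1 + var_alpha * h)


alpha, betas, var_alpha, d = dummy_model_fit(NORMAL, LOW)
# Individual Wald statistics for beta_j = 0, one coefficient at a time
individual = [b * b / (dj + var_alpha) for b, dj in zip(betas, d)]
print("alpha:", round(alpha, 4), "betas:", [round(b, 4) for b in betas])
print("SEs:", [round(math.sqrt(dj + var_alpha), 4) for dj in d])
print("joint Wald:", round(joint_wald(betas, var_alpha, d), 4),
      "sum of individual Wald:", round(sum(individual), 4))
```

The joint statistic (about 5.49 on 4 d.f.) differs from the sum of the four individual Wald statistics (about 5.64) precisely because of the shared covariance terms.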
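For this particular case (one covariate with several levels), the MLEs have a closed form:
$$\hat\alpha = \log\frac{n_{21}}{n_{11}}, \qquad \hat\beta_j = \log\frac{n_{11}\, n_{2(j+1)}}{n_{21}\, n_{1(j+1)}}, \quad j = 1, 2, 3, 4,$$
using the notation

    Birth       Number of physician visits
    weight      0     1     2     3     4+
    Normal      n11   n12   n13   n14   n15
    Low         n21   n22   n23   n24   n25

One may think of solving the two equations below for $\alpha$ and $\beta_j$:
$$\frac{\exp(\alpha)}{1 + \exp(\alpha)} = P(Y = 1 \mid \mathbf{X} = \mathbf{0}) = \pi(\mathbf{X} = \mathbf{0}) = \frac{n_{21}}{n_{11} + n_{21}},$$
$$\frac{\exp(\alpha + \beta_j)}{1 + \exp(\alpha + \beta_j)} = P(Y = 1 \mid X_j = 1) = \pi(X_j = 1) = \frac{n_{2(j+1)}}{n_{1(j+1)} + n_{2(j+1)}}.$$

Inference of the parameters.
Hypotheses: $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (the covariate has no effect, i.e. homogeneity or independence); $H_a$: at least one $\beta_j \neq 0$.
- Wald test:
$$X_W = (\hat\beta_1, \ldots, \hat\beta_4)\, [\widehat{Cov}(\hat\beta_1, \ldots, \hat\beta_4)]^{-1}\, (\hat\beta_1, \ldots, \hat\beta_4)^T \sim \chi^2_4.$$
- Score test (same form as the Pearson $\chi^2$ test):
$$X_S = S(\tilde\alpha, \beta_1 = \cdots = \beta_4 = 0)^T\, I(\tilde\alpha, \beta_1 = \cdots = \beta_4 = 0)^{-1}\, S(\tilde\alpha, \beta_1 = \cdots = \beta_4 = 0) \sim \chi^2_4,$$
or $X_S = \sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij} \sim \chi^2_4$, where $E_{ij} = n_{i.}\, n_{.j} / n_{..}$.
- Likelihood ratio test:
$$X_L = -2\ln\Lambda = 2\left(l(\hat\alpha, \hat\beta_1, \ldots, \hat\beta_4) - l(\tilde\alpha, \beta_1 = \cdots = \beta_4 = 0)\right) \sim \chi^2_4,$$
or $X_L = 2\sum_{i,j} O_{ij} \log(O_{ij}/E_{ij}) \sim \chi^2_4$.

The CI is usually constructed by inverting the Wald test.

SAS codes and output of the LBW example.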
    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
      if ftv ge 4 then ftv=4;
    proc logistic descending;
      class ftv (param=ref ref='0');
      model low=ftv;
    run;

    The LOGISTIC Procedure
    Model Information
    Data Set                     WORK.X1
    Response Variable            low
    Number of Response Levels    2
    Number of Observations       189
    Model                        binary logit
    Optimization Technique       Fisher's scoring

    Response Profile
    Ordered              Total
      Value   low    Frequency
          1     1           59
          2     0          130

    Class Level Information
                     Design Variables
    Class   Value    1   2   3   4
    ftv     0        0   0   0   0
            1        1   0   0   0
            2        0   1   0   0
            3        0   0   1   0
            4        0   0   0   1

    Model Fit Statistics
                     Intercept    Intercept and
    Criterion             Only       Covariates
    -2 Log L           234.672          228.992

    Testing Global Null Hypothesis: BETA=0
    Test               Chi-Square   DF   Pr > ChiSq
    Likelihood Ratio       5.6804    4       0.2243
    Score                  5.7541    4       0.2183
    Wald                   5.4946    4       0.2402

    Analysis of Maximum Likelihood Estimates
                                   Standard        Wald
    Parameter     DF   Estimate       Error   Chi-Square   Pr > ChiSq
    Intercept      1    -0.5754      0.2083       7.6273       0.0057
    ftv   1        1    -0.6103      0.4026       2.2976       0.1296
    ftv   2        1    -0.6142      0.4793       1.6422       0.2000
    ftv   3        1     0.8630      0.7917       1.1885       0.2756
    ftv   4        1    -0.8109      1.1373       0.5084       0.4758

    Odds Ratio Estimates
                       Point         95% Wald
    Effect           Estimate   Confidence Limits
    ftv 1 vs 0          0.543      0.247     1.196
    ftv 2 vs 0          0.541      0.211     1.384
    ftv 3 vs 0          2.370      0.502    11.186
    ftv 4 vs 0          0.444      0.048     4.129

Global null hypothesis $\beta_1 = \cdots = \beta_4 = 0$: $X_{Wald} = 5.4946$ with 4 d.f. Is it equal to the sum of the individual Wald tests of $\beta_i = 0$? No:
$$X_W = (\hat\beta_1, \ldots, \hat\beta_4)\, [\widehat{Cov}(\hat\beta_1, \ldots, \hat\beta_4)]^{-1}\, (\hat\beta_1, \ldots, \hat\beta_4)^T$$
needs to take the covariances $Cov(\hat\beta_i, \hat\beta_j)$ into account!

Logistic regression with quantitative covariates.
Modeling the number of physician visits as a quantitative covariate.
Recall the table of low by ftv above (0, 1, 2, 3, 4 and 6 visits; 130 normal and 59 low birth weights).

The following model treats the covariate as nominal, since it is invariant to the ordering of the categories:
$$\operatorname{logit}(\pi(\mathbf{X}_i)) = \alpha + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4},$$
where $\mathbf{X}_i = (X_{i1}, X_{i2}, X_{i3}, X_{i4})$, $X_{ij} = 1$ if the ith woman had j visits (j = 1, 2, 3, 4) and $X_{ij} = 0$ otherwise. Note that inference about $\beta_1$ uses only the first and second columns of the table, inference about $\beta_2$ uses only the first and third columns, etc. For ordered categories, other models are more parsimonious than this one, yet more complex than the independence model.

We could consider modeling the number of physician visits (FTV) as a quantitative covariate. Let $Y_i$, $i = 1, \ldots, n$, be independent Bernoulli random variables with $P(Y_i = 1 \mid FTV_i) = \pi(FTV_i)$ and
$$\operatorname{logit}(\pi_i) = \operatorname{logit}(\pi(FTV_i)) = \alpha + \beta\, FTV_i, \qquad \text{or} \qquad \pi_i = \frac{\exp(\alpha + \beta\, FTV_i)}{1 + \exp(\alpha + \beta\, FTV_i)}.$$
The MLEs can only be obtained by numerical methods.

Interpretation of the parameters:
- $\alpha = \operatorname{logit}(\pi(FTV = 0))$: the log-odds of having a low weight baby for women who had 0 visits.
- $\beta = \operatorname{logit}(\pi(FTV = a + 1)) - \operatorname{logit}(\pi(FTV = a))$: the log-odds ratio of having a low weight baby comparing women who had a + 1 visits with women who had a visits.

Note that inference about $\beta$ now uses all the data.

SAS codes and output.
    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
      /* if ftv ge 4 then ftv=4; */
    proc logistic descending;
      model low=ftv;
      output out=x2 xbeta=xbeta;
    proc plot;
      plot xbeta*ftv / hpos=30 vpos=20;
    run;

    Model Fit Statistics
                     Intercept    Intercept and
    Criterion             Only       Covariates
    -2 Log L           234.672          233.899

    Testing Global Null Hypothesis: BETA=0
    Test               Chi-Square   DF   Pr > ChiSq
    Likelihood Ratio       0.7731    1       0.3792
    Score                  0.7492    1       0.3867
    Wald                   0.7432    1       0.3886

    Analysis of Maximum Likelihood Estimates
                                 Standard        Wald
    Parameter   DF   Estimate       Error   Chi-Square   Pr > ChiSq
    Intercept    1    -0.6868      0.1948      12.4277       0.0004
    ftv          1    -0.1351      0.1567       0.7432       0.3886

    Odds Ratio Estimates
                 Point         95% Wald
    Effect    Estimate   Confidence Limits
    ftv          0.874      0.643     1.188

The predicted values. The fitted model is
$$\operatorname{logit}(\hat\pi_i) = -0.6868 + (-0.1351 \times FTV_i).$$

[Plot of xbeta*ftv (legend: A = 1 obs, B = 2 obs, etc.; 99 obs hidden): the fitted logit decreases linearly from -0.687 at ftv = 0 to -1.497 at ftv = 6.]

The model-predicted probability of having a low weight baby is
$$\hat\pi_i = \frac{\exp(\hat\alpha + \hat\beta\, FTV_i)}{1 + \exp(\hat\alpha + \hat\beta\, FTV_i)} = \frac{\exp(-0.6868 + (-0.1351 \times FTV_i))}{1 + \exp(-0.6868 + (-0.1351 \times FTV_i))}.$$
For example, if a woman had 0 physician visits (FTV = 0) during the first trimester of her pregnancy, the model predicts that she has a 33% chance of having a low weight baby:
$$\hat\pi_i = \frac{\exp(-0.6868 + (-0.1351 \times 0))}{1 + \exp(-0.6868 + (-0.1351 \times 0))} = 0.3347.$$
For FTV = 1,
$$\hat\pi_i = \frac{\exp(-0.6868 + (-0.1351 \times 1))}{1 + \exp(-0.6868 + (-0.1351 \times 1))} = 0.3053,$$
and similarly:

    FTV    Predicted probability
    0      .3347
    1      .3053
    2      .2774
    3      .2512
    4      .2266
    5      .2038
    6      .1828

The predicted probability decreases monotonically as the number of visits increases because of the model used: a single linear term in FTV with $\hat\beta < 0$.
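The predicted probabilities above can be recomputed from the two printed estimates. The Python sketch below (hypothetical function names) also adds a Wald CI for a predicted probability, built on the logit scale from $Var(\hat\alpha) + FTV^2\, Var(\hat\beta) + 2\, FTV\, Cov(\hat\alpha, \hat\beta)$ and then back-transformed. $Cov(\hat\alpha, \hat\beta)$ is not printed in the output above, so the usage evaluates the CI only at FTV = 0, where the covariance term drops out; the 0.0 passed for it is a placeholder, not an estimate.

```python
import math

ALPHA_HAT = -0.6868   # intercept, from the SAS output above
BETA_HAT = -0.1351    # ftv slope, from the SAS output above
SE_ALPHA = 0.1948     # SE of the intercept, from the SAS output above
SE_BETA = 0.1567      # SE of the slope, from the SAS output above


def expit(eta):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(ftv, alpha=ALPHA_HAT, beta=BETA_HAT):
    return expit(alpha + beta * ftv)


def predicted_ci(ftv, var_alpha, var_beta, cov_ab, alpha=ALPHA_HAT, beta=BETA_HAT, z=1.96):
    """Wald CI for pi(ftv): interval on the logit scale, then inverse-logit of both ends."""
    eta = alpha + beta * ftv
    se = math.sqrt(var_alpha + ftv ** 2 * var_beta + 2 * ftv * cov_ab)
    return expit(eta - z * se), expit(eta + z * se)


for k in range(7):
    print(k, round(predicted_probability(k), 4))

# Placeholder cov_ab = 0.0: Cov(alpha-hat, beta-hat) is not shown above, but at ftv = 0
# both the variance and covariance terms involving ftv vanish, so the result is exact.
print(predicted_ci(0, SE_ALPHA ** 2, SE_BETA ** 2, cov_ab=0.0))
```

For FTV > 0 the actual covariance would be needed (for example from the estimated covariance matrix of the fitted model).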
The predicted value of $\pi$ can be obtained in SAS:

    output out=x2 pred=pred;
    proc print; var ftv pred;
    run;

    Obs   ftv      pred
      1     0   0.33475
      2     3   0.25124
      3     1   0.30537
    ...
    187     0   0.33475
    188     0   0.33475
    189     3   0.25124

Confidence intervals for the predicted values? The fitted model is $\widehat{\operatorname{logit}}(\pi) = \hat\alpha + \hat\beta\, FTV$. The variance of the predicted logit is
$$Var(\widehat{\operatorname{logit}}(\pi)) = Var(\hat\alpha) + FTV^2\, Var(\hat\beta) + 2\, FTV\, Cov(\hat\alpha, \hat\beta).$$
The CI for $\operatorname{logit}(\pi)$ is $\widehat{\operatorname{logit}}(\pi) \pm z_{\alpha/2}\, SE(\widehat{\operatorname{logit}}(\pi))$; the CI for $\pi$ is obtained by applying the inverse logit function to its endpoints. In SAS?

Compare with the previous model:
$$\hat\pi = \frac{\exp(\hat\alpha + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \hat\beta_4 X_4)}{1 + \exp(\hat\alpha + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \hat\beta_4 X_4)} = \frac{\exp(-0.5754 + (-0.6103 X_1) + (-0.6142 X_2) + 0.8630 X_3 + (-0.8109 X_4))}{1 + \exp(-0.5754 + (-0.6103 X_1) + (-0.6142 X_2) + 0.8630 X_3 + (-0.8109 X_4))},$$
where $\mathbf{X} = (X_1, X_2, X_3, X_4)$, $X_j = 1$ if a woman had j visits (j = 1, 2, 3, 4+) and $X_j = 0$ otherwise:

    FTV    Predicted probability
    0      .3600
    1      .2340
    2      .2333
    3      .5714
    4+     .2000

The predicted probability does not decrease monotonically as the number of visits increases, because this model allows a different parameter for each category.

How can we examine whether there is a linear relationship between $\operatorname{logit}(\pi)$ and the number of physician visits?

    data x1; infile "bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
      if ftv ge 4 then ftv=4;
    proc sort; by ftv;
    proc means noprint; var low;
      output out=x2 sum=sy n=ny;
      by ftv;
    data x3; set x2;
      pihat = sy/ny;
      lgt = log(pihat/(1-pihat));
    proc print; var ftv sy ny pihat lgt;
    proc plot; plot lgt*ftv='*' / hpos=30 vpos=20;
    run;

    Obs   ftv   sy    ny     pihat        lgt
      1     0   36   100   0.36000   -0.57536
      2     1   11    47   0.23404   -1.18562
      3     2    7    30   0.23333   -1.18958
      4     3    4     7   0.57143    0.28768
      5     4    1     5   0.20000   -1.38629

[Plot of lgt*ftv (symbol used is '*'): the empirical logits are about -0.6 at ftv = 0, about -1.2 at ftv = 1 and 2, about 0.3 at ftv = 3, and about -1.4 at ftv = 4.]

Methods for modeling quantitative covariates.
Several methods are available for modeling quantitative covariates when one doubts that a linear association exists on the logit scale:
- Use dummy variables.
- Fit a polynomial model, i.e. include higher-order terms such as $\operatorname{logit}(\pi_i) = \operatorname{logit}(\pi(FTV_i)) = \alpha + \beta_1 FTV_i + \beta_2 FTV_i^2$.
- Fit a model including other functions of the covariates, e.g. $\log(FTV_i)$.
- Nonparametric procedures such as generalized additive models (GAM). A detailed example is provided by Figueiras A, Cadarso-Suarez C (2001). Application of nonparametric models for calculating odds ratios and their confidence intervals for continuous exposures. American Journal of Epidemiology 154:264-275.

Effect of categorizing quantitative covariates into qualitative/categorical covariates using dummy variables. Why categorize?
- The use of dummy variables to model quantitative covariates can result in a dramatic loss of power (Zhao and Kolonel (1992). American Journal of Epidemiology).
- Categories should be constructed using meaningful cutpoints, ideally selected before looking at the data.
- The precision of the estimated regression coefficients may be enhanced to the extent that each category contains equal numbers of cases, equal numbers of controls, or equal numbers of subjects (Hsieh et al. (1991). Epidemiology).
- The selection of optimal cutpoints that maximize statistical significance should be used very cautiously (Altman et al. (1994). Journal of the National Cancer Institute).
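The empirical logits from the SAS data step above (`lgt = log(pihat/(1-pihat))`) can be recomputed from the pooled counts. The Python sketch below uses hypothetical helper names and also prints successive differences: if $\operatorname{logit}(\pi)$ were linear in FTV they would be roughly constant, up to sampling noise, whereas here they swing from about -0.61 to +1.48 and -1.67. The empirical logit is infinite for a category with no (or only) low-weight births, as for the single 6-visit mother before pooling, so the helper rejects that case.

```python
import math

# Pooled counts from the PROC MEANS output (ftv = 4 means "4 or more")
FTV = [0, 1, 2, 3, 4]
SY = [36, 11, 7, 4, 1]      # number of low-weight births
NY = [100, 47, 30, 7, 5]    # number of mothers


def empirical_logit(events, total):
    """log(p / (1 - p)) with p = events / total; undefined when p is 0 or 1."""
    if events == 0 or events == total:
        raise ValueError("empirical logit is infinite when p is 0 or 1")
    p = events / total
    return math.log(p / (1 - p))


lgt = [empirical_logit(y, n) for y, n in zip(SY, NY)]
# Under a model linear in ftv, successive differences would be roughly constant
steps = [b - a for a, b in zip(lgt, lgt[1:])]
print([round(v, 5) for v in lgt])
print([round(v, 3) for v in steps])
```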
5 30 12. 5 3.4. Jackson 5. 6 _ _ _ _- - - - 10 120 F(A&gt;1)(B=0) 2 20 1
Appalachian State - CS - 1001
1.210 20 1. A A BC D2. C A BC D3. A A B C D4. D A B C D5. B A B C D6. B A B C D7. C A
Appalachian State - CS - 1001
()1A B C D2A B C D3A B C D4A B C D5A B C D6()A BC D7
Appalachian State - CS - 1001
()1c2b3c4a5.b6d7c8c9B10.c11a12c13a14.c15D16a17b18b19d20c21.b22.b23.a24.c25.a123.:
Appalachian State - CS - 1001
()1SD()A B C D2A B C D3A B C D4A B C D5A B C D6()A B CD7()A B C
Appalachian State - CS - 1001
()1d2c3a4c5c6b7c8c9d10B11a12b13B14a15c16b17d18d19c20.b21.d22.c23.c24.c25.b1234
Appalachian State - CS - 1001
()1 CASEA CASEB CASEC CASED CASE 2Putnam A B C D3 McCall A B C D4ISO 3 ASQICBSQMCCSQRCDSQDC5A B C D6A B C D7A B C D8
Appalachian State - CS - 1001
() 1 20 1.C2.D3.C4.D5.B6.C7.B8.B9.A10.D11.B12.B13.A14.C15.A16.D17.D18.C19.B20.D 2 20 21.22.23.1124.25.26.27.28.29.30.F/(1+(n*i) 3 15 31.323334JSP
Appalachian State - CS - 1001
1. ABCD2. ABCD3. ADFDBPADCSC4.JSP ABCD5. ABCD6. ABCDCDCD7.
Appalachian State - CS - 1001
108213801 10821380144 AA5 120 1 15 12 3 45
Appalachian State - CS - 1001
1 1+1+ 1+1+1+1+212
Appalachian State - CS - 1001
1.D []ACBD2.A []AB CD3.C []ADFDBPADCSC4.JSP D []ABCD5.D []ACBD6.B []ABCDCDCD
Appalachian State - CS - 1001
1 20 1 CASEA CASEB CASEC CASED CASE 2Putnam ABCD3 McCall ABCD4ISO 3 ASQICBSQMC5AB6ACSQRCC
Appalachian State - CS - 1001
2073 20032004 ( 1 25 )123 4 DFD 56 7Jackson 89
Appalachian State - CS - 1001
2073 20032004 ()2004 7 ( 1 25 )1 2 34 DD5 6 78 9 10 11 1213 ( 2 10 )1D 2B 3C 4D 5A( 2 lo )1ABCD 2ABCD 3BD 4AC 5ACD( 2 10 )1X 2 3X 4X 5( 28 )1( 10 )
Appalachian State - CS - 1001
() 2510555510101015100120 1 20 1_C_A) B) C) D) 2(SQA) CMM _B_A) B) C) D) 3_D_A) PC B) C) D) 4_D_A) B) C) D) / 5 SQA _A) PC-Lint B
Appalachian State - CS - 1001
1 1 20 1 CASEA CASEB CASEC CASED CASE 2Putnam AB CD3 McCall ABCD4ISO 3 ASQICBSQMCCSQRCDSQDC5 ABCD6
Appalachian State - CS - 1001
20072008 B 04 1 2 30 1 ( )A B C D 2A B C D 3()A B C D 4
Appalachian State - CS - 1001
, (&quot;,&quot;. 1.5 , 15 )Warnier ,.()PAD ,. (),.(),.(),.().().,.(),.(),.(),(OOD).(),( 2 , 10 )
Appalachian State - CS - 1002
1 (A) (B) (C) (D)1.( b)A. B. C. D.( )A. B. C. D.( )A. B. C.ER D.3NF ( )A.0 B.1 C.3 D.( )A. B. C. D.( c)A. B. C. D.( D)A. B. C. D.
Appalachian State - CS - 1002
WRI 0601Chap1:1 internet internet internet 2 3 4 Internet protocol stackapplication :supportingnetworkapplicationstransport :processprocessdatatransferFTP,SMTP,HTTPTCP,UDP
Appalachian State - CS - 1002
1. ?a. b. : c. d. e. .2. ?a. -b. c. .3.
Appalachian State - CS - 1002
1.2.3.4.5. 3 3 1.2.3.4.5.1.2.3. ppt1.2.3.4. C 30%40%
University of Phoenix - ACC - 101
Internal Controls Paper1Internal Controls PaperBrandi MurobayashiACC 210September 5, 2010Mike WellsInternal Controls Paper2Internal Controls PaperAn internal control system is implemented within an organizationto safe guard assets, to check the
University of Phoenix - ACC - 101
LEARNINGTEAMCHARTERCourseTitleACC/210AccountingInformationSystemsMichaelWellsInstructorCourseDates 8/17/109/20/10Allteammembersparticipatedinthecreationofthischarterandagreewithitscontents_X_Pleasecheck)TeamMembers/PersonalInformationNamePhone
University of Phoenix - ACC - 101
1The Effects of Technology on the Accounting ProfessionThe Effects of Technology on the Accounting ProfessionBrandi MurobayashiACC 210August 23, 2010Mike Wells2The Effects of Technology on the Accounting ProfessionThe Effects of Technology on the
University of Phoenix - ACC - 101
Accounting Information Systems1Accounting Information SystemsNarciso Astorga, Chee-Chee Manghram, Brandi MurobayashiRhonda Powell, Tina SchniebsACC 210August 30, 2010Michael WellsAccounting Information Systems2FedEx Corporation is one of the lea
University of Phoenix - ACC - 101
How do payroll, accounts payable, accounts receivables, andaccounting information systems interact with one another? Do theywork efficiently? Explain your answer.FedEx offers customers a wide variety options to pay theirinvoices. Customers can pay the
University of Phoenix - ACC - 101
1The Systems Development Life CycleThe Systems Development Life CycleNarciso Astorga, Chee-Chee Manghram, Brandi MurobayashiRhonda Powell, Tina SchniebsACC 210September 13, 2010Michael Wells2The Systems Development Life CycleThe Systems Developm
University of Phoenix - ACC - 101
LEARNINGTEAMCHARTERTEAMCCourseTitleACC280PrinciplesofAccountingTeamMembers/ContactInformationNamePhoneTimezoneandEmailAvailabilityDuringtheWeekBrandiMurobayashi8087726851HawaiiTimezonebmurobayashi@gmail.comAvailableanytimeTeamGroundRulesand
University of Phoenix - ACC - 101
(a) Journalize the adjusting entries on August 31 for the 3-month period June 1August 31.GENERAL JOURNALDateAccount Titles and Explanation Reference2008Adjusting EntriesAug. 31 Insurance Expense722Prepaid Insurance130(To record insurance expired
University of Phoenix - ACC - 101
P15-6 (H - M)P15-6 The comparative statements of Dillon Company are presented below.DILLON COMPANYIncome StatementFor Year Ended December 312009Net sales (all on account)600,000ExpensesCost of goods sold415,000Selling and administrative120,800
University of Phoenix - ACC - 101
1Financial Statements PaperFinancial Statements PaperBrandi MurobayashiACC 280 Principles of AccountingOctober 31, 2011Charles Cornett2Financial Statements PaperFinancial Statements PaperThis paper will define the purpose of accounting and ident
University of Phoenix - ACC - 101
E1-1 Urlacher Company performs the following accounting tasks during the year._Analyzing and interpreting information._Classifying economic events._Explaining uses, meaning, and limitations of data._Keeping a systematic chronological diary of events.
University of Phoenix - ACC - 101
Questions2.State two generally accepted accounting principles that relate to adjusting the accounts.Two generally accepted accounting principles that relate to adjusting the accounts arethe revenue recognition and matching principles. Under the revenu
University of Phoenix - ACC - 101
E4-2 The adjusted trial balance columns of the worksheet for Goode Company are as follows.GOODE COMPANYWorksheet (partial)For the Month Ended April 30, 2008Adjusted TrialIncomeBalanceStatementBalance SheetAccount TitlesDr.Cr.Dr.Cr.Dr.Cr.Ca
University of Phoenix - ACC - 101
Brandi MurobayashiApril 11, 2011Mr. BrownACC 305Ch. 5 ExercisesE5-7 Peter Kalle Company had the following account balances at year-end: cost of goods sold $60,000; merchandiseinventory $15,000; operating expenses $29,000; sales $108,000; sales disco