UNL, STAT 875
Excerpt: ... Project #4 STAT 875 Spring 2009 Complete the following problems. Within each part, include your R program output with code inside of it and any additional information needed to explain your answer. You may need to edit your output and code in order to make it look nice after you copy and paste it into your Word document. 1) (55 total points) Find a data set of your choice which is not in Agresti (2002, 2007). The data set needs to have a binary response variable, at least one continuous explanatory variable, and at least 4 explanatory variables overall. If your data set has not already been approved, please send it to me for approval first before beginning this problem. a) Find the best logistic regression model for the data set by following steps 1-5 shown in the Chapter 5 lecture notes. Make sure to be comprehensive AND organized in your write-up! When deciding if explanatory variables are important, use α = 0.10. Below is an outline of the steps. i) (5 points) Find all possible one variable lo ...
N.C. State, ST 732
Excerpt: ... , then /2 2 1 + 0.346b . Example: one covariate, sample of size 13 with 1 = -1.5, 2 2 = 0.75, g1,1 = b = 4; population average is shown in red, and the approximation in blue. [Figure: plot of y against x.] Case study: abnormal ECG in a drug trial.

    options linesize = 80 pagesize = 21 nodate;
    data ecg;
      retain id 0;
      infile 'ecg.txt' firstobs = 32;
      input seq r1 r2 count;
      if seq = 1 then /* Placebo followed by drug */
        do i = 1 to count;
          id = id + 1;
          trt = 0; y = r1; period = 0; output;
          trt = 1; y = r2; period = 1; output;
        end;
      else /* Drug followed by placebo */
        do i = 1 to count;
          id = id + 1;
          trt = 1; y = r1; period = 0; output;
          trt = 0; y = r2; period = 1; output;
        end;
    run;

    title1 'Marginal Logistic Regression Model ';
    title2 'Crossover Trial on Cerebrovascular Deficiency';
    proc genmod descending;
      class id;
      model y = trt period / d=bin;
      repeated subject=id / logor=fullclust;
    run;

    title1 'Mixed Effects Logistic Regression Model (Random Intercept)';
    title2 'Crossover Trial ...
Columbia, P 8400
Excerpt: ... g) PRISON: whether the addict had prison history, yes=1, no=0 1. Fit a linear regression model to estimate number of days in treatment from prison history and maximum methadone dose. Report the intercept and partial regression coefficients, and interpret them. (6 points) 2. Run a logistic regression model to predict the completion of treatment (FINISH) with DOSE and PRISON as explanatory variables. Report and interpret the odds ratios for DOSE and PRISON. (4 points) 3. Use the Kaplan-Meier method to estimate whether prison history influences the number of days in treatment. What is the median number of days in treatment for addicts with and without prison history? Interpret the log-rank test. (2 points) 4. Run a Cox model that includes both DOSE and PRISON as explanatory variables. Report and interpret the hazard ratios for DOSE and PRISON. (4 points) 5. There are problems in fitting a linear regression model to these data. What are they? (2 points) 6. Is it right to fit a ...
Michigan State University, CSE 847
Excerpt: ... Homework 4 Due: Feb. 27, 2008 (midnight) February 18, 2008 Problem 1 (10pt) Show that the log-likelihood function of the logistic regression model is concave. More specifically, given a set of training examples $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where each $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$, the log-likelihood function $L(D)$ of the conditional exponential model is written as: $$L(D) = \sum_{i=1}^{N} \log p(y_i \mid x_i) = \sum_{i=1}^{N} \log \frac{1}{1 + \exp\left(-y_i (x_i \cdot w + c)\right)}$$ where $w$ and $c$ are the weights and threshold that need to be determined. To show that the log-likelihood function $L$ is concave, you need to show that the Hessian matrix of $L$, i.e., the matrix of second partial derivatives of $L$, is negative semidefinite. Problem 2 (20pt) Implement the logistic regression model for classification (without regularization). The training data can be found in file http:/www.cse.msu.edu/cse847/assignments/spam.train.txt. Each row in the file corresponds to a training data point. The last attribute of each data point is the class label (eithe ...
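A minimal Python sketch of the log-likelihood from Problem 1, together with a numerical midpoint check of concavity. The function names and the toy data are illustrative assumptions, not part of the assignment, and a numerical check is only a sanity test; it does not replace the Hessian argument the problem asks for.

```python
import math
import random


def log_likelihood(w, c, X, y):
    """L(D) = sum_i log(1 / (1 + exp(-y_i (x_i . w + c)))), labels in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        m = yi * (sum(a * b for a, b in zip(xi, w)) + c)
        # log(1 / (1 + exp(-m))) evaluated in a numerically stable way
        if m > 0:
            total -= math.log1p(math.exp(-m))
        else:
            total += m - math.log1p(math.exp(m))
    return total


def midpoint_gap(theta1, theta2, X, y):
    """L(midpoint) minus the average of L at the endpoints.

    For a concave L this is non-negative. theta = [w_1, ..., w_d, c].
    """
    def L(theta):
        return log_likelihood(theta[:-1], theta[-1], X, y)

    mid = [(a + b) / 2 for a, b in zip(theta1, theta2)]
    return L(mid) - (L(theta1) + L(theta2)) / 2


# Hypothetical toy data: four points in R^2 with labels in {-1, +1}.
X = [[0.5, 1.2], [-1.0, 0.3], [2.0, -0.7], [-0.4, -1.5]]
y = [1, -1, 1, -1]

rng = random.Random(0)
gaps = []
for _ in range(100):
    t1 = [rng.uniform(-3, 3) for _ in range(3)]
    t2 = [rng.uniform(-3, 3) for _ in range(3)]
    gaps.append(midpoint_gap(t1, t2, X, y))

print("smallest midpoint gap:", min(gaps))
```

Every sampled gap should be non-negative up to rounding error, which is what concavity predicts.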
Penn State, STAT 501
Excerpt: ... Stat 501 Lab 15 1 A simple logistic regression model The data set tongue.txt contains data concerning the treatment of tongue cancer with radiation therapy (Mendenhall et al., 1989). Specifically, the data set contains the values of two variables on 24 patients: y = response; is 1 if the disease is absent after three years, and is 0 if the disease is present after three years x1 = days; the number of days the patient received radiation therapy. The investigators were interested in summarizing the relationship of treatment response to the number of days of treatment. 1. Create a scatter plot of y = response versus x1 = days. Does there appear to be a relationship between response and days? 2. Use Stat > Regression > Fitted line plot to estimate the linear regression function between response and days. (a) Recall that the least squares line estimates the linear relationship between the predictor and the mean of the response. What does the y axis represent in this situation in which we have a binary ...
Cornell, BTRY 6030
Excerpt: ... of the general linear model (e.g., multiple regression, ANOVA). Many relationships cannot be adequately summarized by a simple linear equation, for two major reasons: Distribution of dependent variable; Link function. Generalized Linear Model: Special case is the logistic regression model. Special case is the loglinear model. Components of generalized linear model: The random component identifies the response variable Y and assumes a probability distribution for it. The systematic component specifies the explanatory variables used as predictors in the model. The link describes the functional relationship between the systematic component and the expected value (mean) of the random component. Random component: Let Y1, Y2, ..., YN denote the independent observations for the response variable Y for a sample of size N. The random component of a GLM consists of identifying the response variable Y and selecting a probability distribution for (Y1, Y2, ..., YN). Random component: If the potenti ...
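The three GLM components described in this excerpt can be sketched for the logistic regression special case as follows. This is an illustration under assumed names and made-up coefficient values, not code from the course notes.

```python
import math


def linear_predictor(intercept, coefs, x):
    """Systematic component: eta = intercept + sum_j coefs[j] * x[j]."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


def logit(mu):
    """Link function: maps the mean mu in (0, 1) to the linear predictor scale."""
    return math.log(mu / (1.0 - mu))


def inverse_logit(eta):
    """Inverse link: maps the linear predictor back to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_log_pmf(y, mu):
    """Random component: log P(Y = y) for a binary Y with mean mu."""
    return y * math.log(mu) + (1 - y) * math.log(1.0 - mu)


# Hypothetical coefficients and covariate values, chosen so that eta = 0.
eta = linear_predictor(-1.0, [0.5, 2.0], [2.0, 0.0])
mu = inverse_logit(eta)
print(eta, mu, bernoulli_log_pmf(1, mu))
```

Swapping the link and the distribution (for example a log link with a Poisson response) gives the loglinear special case mentioned in the excerpt.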
UMass (Amherst), BE 640
Excerpt: ... Assignment 13 Review of Logistic Regression (see esb08p15.sas for example problem) Problems (Note: Any SAS output should include program documentation.) As part of a neighborhood study in Boston, a random sample of subjects in each of three areas (low, medium, and high walkability areas) are classified as to whether or not they are obese. The results are as follows:

    Walkability   Low   Medium   High
    Obese           8        6      1
    Not Obese      40       60     40

1. State and test an appropriate hypothesis about obesity status in this setting (using a chi-square test). 2. Based on the result of a contingency table, estimate the odds of obesity for each type of walkability area. 3. Using the results of the contingency table, estimate the odds ratio for obesity for each Walkability level, picking a suitable reference group. 4. Fit a logistic regression model to these data, and estimate a 95% confidence interval for the odds ratios that you computed in 3). 5. Use walkability as score (a continuous variable in the logistic regression analysis), re ...
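The odds and odds ratios in problems 2 to 4 can be checked by hand from the walkability table. The sketch below is in Python rather than the SAS the assignment expects, takes Low as the reference group (an arbitrary illustrative choice), and uses the Woolf (log-scale Wald) interval, which should agree with the Wald interval from a logistic model that treats walkability as a categorical predictor.

```python
import math

# (obese, not obese) counts from the walkability table
counts = {"Low": (8, 40), "Medium": (6, 60), "High": (1, 40)}


def odds(obese, not_obese):
    """Odds of obesity within one walkability area."""
    return obese / not_obese


def odds_ratio_ci(group, reference, z=1.96):
    """Odds ratio of `group` versus `reference` with a Woolf 95% interval."""
    a, b = counts[group]
    c, d = counts[reference]
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


for level, (obese, not_obese) in counts.items():
    print(level, "odds:", odds(obese, not_obese))

for level in ("Medium", "High"):
    print(level, "vs Low:", odds_ratio_ci(level, "Low"))
```

With only one obese subject in the High area, the interval for that comparison is wide, which is worth keeping in mind when choosing the reference group.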
UMass (Amherst), BE 640
Excerpt: ... Assignment 9 Reading: Chapter 22 (pp. 639-646) on Maximum Likelihood in Kleinbaum, Kupper, et al.; Chapter 23 (pp. 656-664), Logistic Regression Analysis, in Kleinbaum, Kupper, et al. Review Programs in esb05p27.sas; esb05p28.sas Problems (Note: Any SAS output should include program documentation.) Exercise: 9.1. A sample of 500 college students participated in a study designed to evaluate the level of college students' knowledge of a certain group of common diseases. The following table shows the students classified by major field of study and level of knowledge of the group of diseases. Suppose that we wish to fit a logistic regression model to examine the relationship between disease knowledge for different majors.

Table 9.1. Disease Knowledge

    Major     Good   Poor   Total
    PreMed      31     19      50
    Other       91    359     450
    Total      122    378     500

9.1a. Evaluate the odds of good disease knowledge for premed students, and for other students. Also, evaluate the natural logarithm of these odds. 9.1b. In a population, suppose that we represent Ln ...
UMass (Amherst), BE 640
Excerpt: ... Assignment 9 Review Programs in esb05p27.sas; esb05p28.sas Problems (Note: Any SAS output should include program documentation.) Exercise: 9.1. A sample of 500 college students participated in a study designed to evaluate the level of college students' knowledge of a certain group of common diseases. The following table shows the students classified by major field of study and level of knowledge of the group of diseases. Suppose that we wish to fit a logistic regression model to examine the relationship between disease knowledge for different majors.

Table 9.1. Disease Knowledge

    Major     Good   Poor   Total
    PreMed      31     19      50
    Other       91    359     450
    Total      122    378     500

9.1a. Evaluate the odds of good disease knowledge for premed students, and for other students. Also, evaluate the natural logarithm of these odds. 9.1b. In a population, suppose that we represent Ln ( Odds | Premed ) = , and Ln ( Odds | Other ) = + . Write an interpretation of the parameter , and provide an interpretation of the parameter e . 9. ...
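A short Python sketch of the arithmetic in 9.1a, using the counts in Table 9.1. The parameter symbols in 9.1b did not survive in this excerpt, so the names `alpha` and `beta` below are assumptions: `alpha` stands for the log odds for premed students and `alpha + beta` for the log odds for other students.

```python
import math

# Counts from Table 9.1: (good knowledge, poor knowledge)
premed = (31, 19)
other = (91, 359)

odds_premed = premed[0] / premed[1]
odds_other = other[0] / other[1]

# Assumed parameterisation (symbol names are hypothetical):
#   ln(odds | premed) = alpha
#   ln(odds | other)  = alpha + beta
alpha = math.log(odds_premed)
beta = math.log(odds_other) - alpha

print("odds, premed:", odds_premed, "log odds:", alpha)
print("odds, other: ", odds_other, "log odds:", alpha + beta)
print("exp(beta), odds ratio other vs premed:", math.exp(beta))
```

Under this assumed parameterisation, `exp(beta)` is the odds ratio of good disease knowledge for other students relative to premed students, and it equals the cross-product ratio (91 × 19) / (359 × 31) from the table.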
Cornell, ORIE 474
Excerpt: ... ORIE 474 D. Ruppert Homework #2 due Wed, Oct 23, 2002 1 2 3 4 0 .3 .2 .05 .05 .6 Estimate There will be no class on Friday, Oct 11. Also, I will not have office hours on Oct 10 and 11. You should develop a good logistic regression model for prediction of the target. Write a brief report discussing your model and how you arrived at it. Also include lift charts for %Response, %Captured Response, Lift value, and Profit; include either cumulative or noncumulative charts, but not both. Also include a plot of the ROC curve. 2. A stratified sample was taken of a population. The response rate, that is, the probability that is 2% in the population, but was fixed at 40% by stratification of the sample. The sample proportions were: P P Note that responders have been oversampled: half of the sample are responders. The actual response rate is approximately 1%. For this exercise, choose a response rate somewhere between 0.75% and 1.5% and use that value. There will be a guest lecturer, Kurt Hol ...
Allan Hancock College, STA 3301
Excerpt: ... STA3301 Statistical Models Practical Week 9 Question 1 Answer problem 7.1 in Dobson 2nd edition (or 8.1 in the first edition) using R. Fit the model using the lower limit of the Dosage range as the predictor. Question 2 Answer problem 7.3 Dobson 2nd edition (or 8.3 in the first edition) using R. Question 3 Consider the gamma distribution written in the form below: exp(-y/)y -1 . f (y; , ) = ()(/) Show that the deviance is n D=2 j=1 - log yj j + y j - j . j Question 4 A data set in ICU.dat consists of a sample of 200 subjects who were part of a much larger study on survival of patients following admission to an adult intensive care unit (ICU). The major goal of this study was to develop a logistic regression model to predict the probability of survival to hospital discharge of these patients. Clinicians associated with the study felt that age was a key determinant of survival. The data which are recorded in ICU.dat are Vital Status (0 means Lived; 1 means Died) and Age in years. (a) Write do ...
UPenn, VHM 802
Excerpt: ... Epidemiology/Biostats VHM 812/802 Course Winter 2008, Atlantic Veterinary College, PEI Henrik Stryhn Index of Lecture 3b: Introduction to logistic regression; Dataset nocardia (VER); Example dataset mice; Why not linear regression?; Logit transformation; Logistic regression model; Logistic regression for mice data; 2×2 table analysis; 2×2 table and logistic regression; Case-control study and logistic regression; Two-way table and logistic regression; Linear vs. logistic modelling; Odds-ratio in multiple logistic regression; Statistical inference for logistic regression; Stata do-file (selection). Introduction to Logistic Regression Logistic regression: binary (0/1, dichotomous) outcomes, possibly grouped to binomial outcomes (e.g., 3 positive out of 10 animals), first of several regression-type models not relying on normal distribution assumptions, sometimes called generalised linear models (glms), model building from the predictors simil ...
Washington, BS 536
Excerpt: ... r heterogeneity and tests for linear trend and departure from linear trend. After successfully completing this course, you can ordinarily expect to be able to: 1. Fit appropriate logistic regression models to data from epidemiologic case-control studies using STATA and evaluate the fit of these models. 2. Interpret regression coefficients from logistic regression models fit to case-control data and test hypotheses about them. 3. Explain when logistic regression methods should be replaced by conditional logistic regression methods. 4. Present results of analyses using logistic regression to readers who are not familiar with logistic regression. ...
Harvard, HST 951
Excerpt: ... a function of the low birth weight indicator), as well as a correlation matrix. What results appear the most striking to you? Create a random test set of 200 cases and show whether your sample is representative of the population. If not, try another one. From here on, calculate the Brier score, the c-index, plot the ROC, and calculate the Hosmer-Lemeshow (HL) goodness-of-fit (both types) for all requested models in the training and test sets. Use R to run your logistic regression models, and either netlab or NevProp from http:/brain.unr.edu/FILES_PHP/show_papers.php#software. Do not use off-the-shelf ROC software please! (ii) Run a logistic regression model using maternal smoking to predict low birth weight. What are your inferences? What is the estimated probability of having a low birth weight infant for a woman who smokes 30 cigarettes per day? What is the estimated probability of having a low birth weight infant for a nonsmoking woman? What is the estimated odds ratio associated with a 30 cigarette per ...
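Since the assignment rules out off-the-shelf ROC software, the Brier score and c-index can be computed directly from their definitions. The sketch below is in Python rather than the R the assignment names, the function names are illustrative, and the four outcome/probability pairs are made-up values, not the birth weight data.

```python
def brier_score(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_index(y, p):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 1/2.

    For a binary outcome this equals the area under the ROC curve.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical outcomes and predicted probabilities.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]

print("Brier score:", brier_score(y, p))
print("c-index:", c_index(y, p))
```

The pairwise loop is quadratic in the sample size, which is fine for a few hundred cases; a rank-based formula would be the usual choice for larger sets.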
University of Michigan, PERSONAL 510
Excerpt: ... "AFIFI.DAT"; INPUT #1 IDNUM 1-4 AGE 5-8 SEX 13-15 SURVIVE 16 SHOKTYPE 17-20 SBP1 21-24 MAP1 25-28 HEART1 29-32 CARDIAC1 45-48 2 URINE1 57-60 HGB1 69-72 1 #2 SBP2 21-24 MAP2 25-28 HEART2 29-32 CARDIAC2 45-48 2 URINE2 57-60 HGB2 69-72 1; RUN; 5. Create new variables. You can create these new variables in the data step that you use to read in the raw data. a) DIED = 1 if the person died, 0 if the person lived. This is the variable you will use for logistic regression models. b) DIED2 = 1 if the person died, 2 if the person lived. This is the variable you will use for cross-tabs. c) SHOCK = 1 if the person was in shock (shoktype=3,4,5,6,7) or 2 if the person was not in shock (shoktype=2). This is the variable you will use for cross-tabs. d) SHOCKDUM = 1 if the person was in shock, = 0 if the person was not in shock. This is the dummy variable you will use for your logistic regression models. 6. Create a cross-tab between SHOCK and DIED2. Remember, SHOCK is the risk factor, and DIED2 is the outcome. a) What ...
UPenn, VHM 802
Excerpt: ... Epidemiology/Biostats VHM 812/802 Course Winter 2007, Atlantic Veterinary College, PEI Henrik Stryhn Index of Lab/tutorial 3: Logistic regression menu (Minitab); Logistic regression results (Minitab). Notes for Exercises in Session 3 VER:16.1; VER:Chapter 15 problem (VER:16.2) Outline of lab session: brief demonstration of Minitab facilities for logistic regression (using calf dataset from VER:16.1-2), (upon request only) further demonstration/discussion of Minitab/Stata analysis for linear (regression) models, individual work/discussions on the exercises (both Minitab and Stata), VER:16.2 involves material not covered in Friday's lecture, any questions on linear and logistic regression models (conceptual and/or practical). Logistic Regression Menu (Minitab) Menu item Stat-Regression-Binary, Model setup: choice between binary and grouped data formats, model terms entered in box (factors in both boxes), Graphs: diagnostics plotted either agains ...
Stanford, HRP 261
Excerpt: ... of Uterine Irritability (1 = Yes, 0 = No) History of Premature Labor (0 = None, 1 = Yes) OBS AGE LOW LWT SMOKE HT UI PTD - a. Examine all the variables. Do you notice any obvious errors? If so, fix the errors using your best guess for what the values should be. b. Report unadjusted odds ratios (and 95% confidence intervals) for all predictors of interest (note that some of the variables above are not predictors). Round to two significant digits. c. Report adjusted odds ratios from a single logistic regression model that includes all variables in (b) where the p-value was <.10. d. Which of the predictor variables from (c) are significantly associated with having a history of premature labor? Use a single logistic regression to answer this question. e. What are the best predictors of low birth weight pregnancy among women who do not have a history of premature labor? f. What are your conclusions? Is "history o ...