BS 805
Class 4
COMBINING SAS DATA SETS
Adding observations to data sets - set statement
Adding variables to data sets - merge statement
Modifying information in a data set - update statement
Optional statements used with these three statements:
by to
BS 805
Class 2
Arrays and Two Factor ANOVA
Arrays
o array name
o size of the array
o variables inside the array
do loops
o index of the loop
o bounds of the loop
Useful with repeated measurements on each subject
Two Factor ANOVA (ANalysis Of VAriance)
BS 805
Class 6
DUMMY VARIABLE REGRESSION AND ANALYSIS-OF-COVARIANCE (ANCOVA)
I. Dummy Variable Regression
Situation
1. Regression
2. continuous dependent (outcome) variable
3. both continuous and categorical independent (predictor) variables.
Examples of c
BS 805
Class 9
Changing Levels in SAS Data Sets
Overview
1) Converting one record per subject to n records per subject - output statement
2) Converting n records per subject to one record per subject
a) Summarizing over n records - proc means
b) Taking on
BS 805
Class 5
Multiple Linear Regression
Linear Regression
Dependent Variable: measured (continuous) variable. Also called the outcome
variable.
Independent Variables: measured (or binary) variables that might influence the
value of the dependent variabl
BS 805
Class 7
Regression Diagnostics and Goodness of Fit
Situation
We want to fit a multiple regression model to a data set but are unsure that
assumptions used in the model are satisfied by the data
To have more confidence that the model is appropriate
BS 805
Class 1
Data Sets, Advanced Input
1. Overview:
A. Data Set structure
B. Temporary SAS Data Sets:
Made in SAS data step work.dsname
used by SAS during ordinary SAS programs
not for storage beyond the SAS run
C. Stored SAS Data Sets:
Made using lib
BS 805
Class 11
Macros
Purpose
1. Macros are used to repeat a series of SAS statements.
a. Useful if a series of statements is used often.
b. Example would be a series of checks to run on all variables in data
sets. A macro ensures a standardized series o
BS 805
Class 8
COLLINEARITY DIAGNOSTICS, PIECEWISE LINEAR
MODELS
I. Collinearity Diagnostics
When one is analyzing data using a multiple regression model, a problem can arise
such that an independent variable(s) is related statistically and strongly to ano
BS 805
Class 3
DATES AND FUNCTIONS
MULTIPLE COMPARISONS IN TWO FACTOR
ANOVA
Using Dates in SAS
o Informats (read in dates from raw data file)
o Formats (control how dates are presented in output)
o Inputting dates along with other variables
Functions
o
BS704 Assignment 4
Goals of the Assignment
Use R to generate and summarize descriptive statistics on a sample of patients participating
in a study to evaluate the impact of an interactive web-based asthma education program that
focuses on managing asthma
BS704 Assignment 8
Goals of the Assignment
Conduct and interpret analysis of variance in R, and
Conduct and interpret pairwise post-hoc tests in R.
Dataset for Analysis
This homework uses a masked dataset based on 200 mother-infant pairs in the Maternal
BS704 Assignment 5
Goals of the Assignment
Compute confidence intervals for a mean and proportion by hand,
Compute confidence intervals for a mean and proportion using R,
Create new continuous variables using R,
Compute stratified descriptive statisti
BS704 Assignment 10
Goals of the Assignment
Perform and interpret sample size calculations by hand and using R.
Description of the Study
A group of researchers is interested in estimating the mean age at which people develop
Alzheimer's dementia and the
BS704 Assignment 9
Goals of the Assignment
Conduct and interpret chi-square goodness-of-fit tests by hand and using R, and
Conduct and interpret chi-square tests of independence by hand and using R.
Dataset for Analysis
Data include demographic and clin
BS704 Assignment 12
Goals of the Assignment
Interpret slope and y-intercept of simple linear regression model with a categorical
predictor,
Interpret slopes of multiple linear regression models,
Interpret coefficients of determination for multiple line
BS704 Homework 11
This assignment focuses on correlation and linear regression. R commands for computing the
correlation and running a linear regression model are described in Section 4.1 of the course R
manual. When turning in your homework please includ
BS704 Assignment 2
Goals of the Assignment
Distinguish variable types for analysis,
Create new variables using R,
Use R to generate descriptive statistics for different variable types, and
Summarize and interpret descriptive statistics.
Dataset for An
BS704 Assignment 6
Goals of the Assignment
Conduct and interpret tests of hypothesis to compare groups with respect to
continuous and dichotomous outcomes by hand and using R.
Dataset for Analysis
A study was conducted to evaluate the impact of an intera
BS704 Assignment 1
Goals of the Assignment
Install R on your computer,
Enter data into an Excel spreadsheet for analysis,
Convert the Excel spreadsheet into a comma separated values (.csv) file for analysis,
Bring a dataset (.csv file) into R for stat
BS704 Assignment 13
Goals of the Assignment
Perform logistic regression in R and interpret the results, and
Perform survival analysis in R and interpret the results.
Dataset for Analysis
These data were collected at the Baystate Medical Center, Springfi
BS704 Assignment 3
Goals of the Assignment
Distinguish variable types for analysis,
Create new variables using R,
Use R to generate and compare frequency distributions of clinical risk factors in key
subgroups, and
Summarize systolic blood pressures i
Experimental studies: cause and effect
A type of experimental study where patients are randomized to receive
one of several comparison treatments.
- Advantages: Gold standard from a statistical point of view,
minimizes bias and confounding; Disadvantages:
Lecture 1
Observational studies: inferences limited to descriptions and associations;
with carefully designed analysis can make stronger inferences (statistical
adjustment)
Case report: Detailed report of specific features of case.
Case series: Systematic
Example Interpretation: We are 95% confident that the true proportion of patients
on antihypertensive medication is between 33% and 36%.
*When comparing CIs: no overlap between the confidence intervals suggests a real,
significant difference.
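As a rough illustration of how a confidence interval for a proportion can be computed, the sketch below uses the large-sample (Wald) formula, p-hat ± z * sqrt(p-hat * (1 - p-hat) / n). The counts x and n are hypothetical values chosen for illustration, not data from these notes, and the function name proportion_ci is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, level=0.95):
    """Large-sample (Wald) confidence interval for a proportion.

    x is the number of subjects with the characteristic, n the sample size.
    """
    p_hat = x / n
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical counts (assumed for illustration only)
lower, upper = proportion_ci(x=1035, n=3000)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The Wald formula is only one of several intervals for a proportion; it is used here because it is the simplest to compute by hand, and it assumes a reasonably large sample.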
Confidence Intervals for CONTI
Percentiles of the Normal Distribution
The kth percentile is defined as the score that holds k percent of the scores below it.
For example: 90th percentile is the score that holds 90% of the scores below it.
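The definition above can be checked numerically. This is a minimal sketch using Python's standard library; the helper name normal_percentile and the mean of 100 and standard deviation of 15 are assumptions made for illustration only.

```python
from statistics import NormalDist


def normal_percentile(k, mean=0.0, sd=1.0):
    """Score that has k percent of a Normal(mean, sd) distribution below it."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(k / 100)


# 90th percentile of the standard normal distribution (a z value)
z90 = normal_percentile(90)

# 90th percentile of a hypothetical Normal(100, 15) measurement
hypothetical_score = normal_percentile(90, mean=100, sd=15)

# Check the definition: about 90% of scores fall below the percentile
share_below = NormalDist().cdf(z90)
print(round(z90, 3), round(hypothetical_score, 2), round(share_below, 2))
```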
Q1 = Lower Quartile = 25th percentile
Median =
Central Limit Theorem for Non-Normal Distributions
σ/√n = standard error (variability in the sample mean)
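A small simulation can make this concrete. The sketch below draws repeated samples from a skewed (exponential) population and compares the spread of the sample means with the population standard deviation divided by the square root of n. The population, sample size, number of replicates, and seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # size of each sample (assumed)
replicates = 2000        # number of samples drawn (assumed)
population_sd = 1.0      # an exponential population with rate 1 has mean 1 and SD 1

# Draw many samples from the non-normal population and record each sample mean
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

empirical_se = stdev(sample_means)        # observed variability of the sample mean
theoretical_se = population_sd / sqrt(n)  # standard error from the formula

print(f"empirical: {empirical_se:.3f}  theoretical: {theoretical_se:.3f}")
```

With these settings the two values should be close, even though the population itself is far from normal.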
Estimation
Goal - To make valid inferences about the population parameter based on a single
random sample from the population.
There are two types of estimates