STA 371G R Commands

Create a list of numbers (vector)
  o Name_of_list <- c(#1, #2, #3 …)
Calculating and naming sample statistics
  o sample_mean <- mean(Name_of_list)
  o sample_variance <- var(Name_of_list)
  o sample_standard_deviation <- sd(Name_of_list)
Calculate a sample statistic of only the first 5 numbers of a list
  o Sample_mean_5 <- mean(Name_of_list[1:5])
Calculate 95% confidence interval
  o Avg_price_ci_95 <- t.test(Name_of_list, conf.level=0.95)
Make a histogram of a variable from a dataset
  o hist(datasetname$variablename, main='', xlab='Title of the variable', col='color')
Simple regression model
  o model <- lm(y_variable ~ x_variable)
  o summary(model)
Predict/extrapolate a value from a linear model
  o predict.lm(model_name, list(x_variable_1=#, x_variable_2=#)) (one entry per x variable in the model)
Create a confidence interval for a regression model (ranges that we are 95% confident contain the true slope and intercept)
Create a confidence and prediction interval for a specific X value (confidence = range for the average response at that x value, prediction = range for an individual y value at that x value)
Make a scatterplot of 2 variables from a dataset
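A minimal sketch of the interval and scatterplot steps above, which are listed without commands. It assumes R's built-in `cars` dataset as a stand-in for the course data, and the x value `speed = 15` is an arbitrary illustration; `confint()` and the `interval` argument of `predict()` are standard base R, but the choice of them here is an assumption about what the course intends.

```r
# Stand-in data: the built-in `cars` dataset (speed, dist); not from the course.
model <- lm(dist ~ speed, data = cars)

# 95% confidence intervals for the intercept and the slope
coef_ci <- confint(model, level = 0.95)

# Intervals at a specific x value (speed = 15 is an arbitrary choice)
new_x <- data.frame(speed = 15)
mean_ci <- predict(model, new_x, interval = "confidence", level = 0.95)
pred_pi <- predict(model, new_x, interval = "prediction", level = 0.95)

# Scatterplot of the two variables, with the fitted line drawn on top
plot(cars$speed, cars$dist, pch = 16, xlab = "speed", ylab = "dist")
abline(model)

print(coef_ci)
print(mean_ci)
print(pred_pi)
```

Both `predict()` calls return the same fitted value in the `fit` column; the prediction interval is wider than the confidence interval because it also accounts for the scatter of individual points around the line.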
plot(stock_market_returns$W5000, resid(model), pch=16, col='green', xlab='W5000', ylab='Residuals')
  o Errors are normally distributed
      Look at a histogram of the residuals and check for approximate normality
        hist(resid(model), col='darkred', xlab='Residuals', main='')
      Look at a Q-Q plot of the residuals and check for a straight line
        qqnorm(resid(model), main='')
  o Variance of Y is the same for any value of X (homoscedasticity)
      Look at the residual plot to make sure there is roughly equal spread all the way across
Create a multiple regression model
  o model <- lm(Y ~ X1 + X2 + X3 …, data=dataset_name)
  o summary(model)
Summary with rounded decimal places
  o round(summary(model)$coefficients, 3)
Testing if the whole multiple regression model is significant
  o Use the p-value of the overall model (the F-test reported on the last line of summary(model))
  o To see how good the predictions are:
      Check the histogram of residuals for normality
        hist(model$residuals, col='green', main='', xlab='Residuals', ylab='Frequency')
      Check the mean of the residuals; it should be very close to 0
        mean(model$residuals)
      Find the SD of the residuals
        sd(model$residuals)
      With the mean and SD you can describe the distribution of the residuals and see how much of the data (Y) falls within a given number of standard deviations
      Can obtain these statistics directly from the regression model
        summary(model)$sigma gives the residual standard error, which is also shown in the summary(model) output (slightly larger than sd(model$residuals) because it divides by the residual degrees of freedom)
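The multiple regression and residual checks above can be sketched end to end as follows. This is an illustration only: it assumes the built-in `mtcars` dataset in place of the course data, the predictors are arbitrary, and the residuals-versus-fitted plot is a swapped-in variant of the residual plot that suits models with several X variables.

```r
# Stand-in data: built-in `mtcars`; the choice of variables is arbitrary.
model <- lm(mpg ~ wt + hp + disp, data = mtcars)
coefs <- round(summary(model)$coefficients, 3)

# p-value of the overall F-test, recomputed from the stored F statistic
f <- summary(model)$fstatistic
overall_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Residual checks: histogram, Q-Q plot, and spread across fitted values
res <- resid(model)
hist(res, col = "green", main = "", xlab = "Residuals", ylab = "Frequency")
qqnorm(res, main = "")
qqline(res)
plot(fitted(model), res, pch = 16, xlab = "Fitted values", ylab = "Residuals")

# Mean and SD of the residuals, and the residual standard error
res_mean <- mean(res)
res_sd <- sd(res)
sigma_hat <- summary(model)$sigma

# Share of observations whose residual lies within 2 SDs of zero
share_within_2sd <- mean(abs(res) <= 2 * res_sd)

print(coefs)
print(c(overall_p = unname(overall_p), res_mean = res_mean,
        res_sd = res_sd, sigma_hat = sigma_hat,
        share_within_2sd = share_within_2sd))
```

If the residuals are roughly normal, about 95% of them should fall within 2 standard deviations of zero, so `share_within_2sd` gives a quick numeric companion to the histogram and Q-Q plot.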