rsim-1 - Simulation exercises in R Master in Statistical...


Simulation exercises in R
Master in Statistical Data-Analysis

Simulation uses methods based on random numbers to imitate a process of interest on the computer. The goal is to learn important statistical and/or practical information about the process. In statistics, simulations can be used to create simulated data sets in order to study the accuracy of mathematical approximations and the effect of assumptions being violated. We will study properties of some quantities that can be calculated from a set of data that is a random draw from a population.

Some aspects that are used throughout the exercises are given below.

1. Random numbers form a basic tool for any simulation study. Simulations require the ability to generate random numbers. On a computer, it is only possible to generate pseudo-random numbers, which for practical purposes behave as if they were drawn randomly. All random number generators essentially work as follows:
   (a) A seed number is needed as input for the process of generating a random number. This seed can be supplied by the user, or the computer generates the seed, e.g. as a function of the current date and time.
   (b) The seed number is put into mathematical functions that eventually return a random number and a new seed that will be used to generate the next random number.
   In R, set.seed sets the seed for the random number generator. If we use this command before a random number generating statement, we obtain the same number each time we provide the same seed.

       set.seed(7)
       rnorm(1)

2. The for-loop (see introduction to R):

       for (var in vector) { statements }

3. The if-statement (see introduction to R):

       if (test) { statements } else { statements }

   or, in vectorised form,

       ifelse(test, value if test is true, value if test is false)

1. Population versus sample

In a first step, we will focus on the difference between a population and a sample from a population. To this end, we use the data set of the BIRNH-study. In particular, the variable of interest is diastolic blood pressure.

    birnhdata <- read.delim("C:/Temp/Birnh.dat", header = TRUE, sep = ",")
    x <- birnhdata$DIASTOL

1. To better understand the distinction between a population and a sample, assume (incorrectly) that the population of interest is the group of 5815 individuals involved in the BIRNH-study. Calculate the population mean and population variance of the diastolic blood pressure. What is the interpretation of these measures?

       # population mean, ignoring missing values
       mean(x, na.rm = TRUE)
       a <- mean(x, na.rm = TRUE)
       # population variance: divide by the number of non-missing values, not by that number minus 1
       sum((x[!is.na(x)] - a)^2) / (5815 - sum(is.na(x)))

2. In medical studies it is usually impossible and not worthwhile to gather data from the entire target population. One generally needs to investigate variables of interest based on a smaller sample which is randomly selected from the original population.
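The population-versus-sample distinction can also be tried out without the data file. The sketch below is only an illustration under stated assumptions: the vector pop is a simulated stand-in of 5815 values (normal, with a mean of 80 and a standard deviation of 10, both chosen arbitrarily) and is not the BIRNH data, and the sample size of 50 is likewise hypothetical. It computes the population variance with divisor N, checks it against R's var(), which uses divisor N - 1, and then draws one simple random sample.

```r
# Hypothetical stand-in population: simulated values, NOT the BIRNH data.
# The mean (80) and standard deviation (10) are arbitrary assumptions.
set.seed(7)
pop <- rnorm(5815, mean = 80, sd = 10)

N <- length(pop)

# Population mean and population variance (divisor N)
pop_mean <- mean(pop)
pop_var  <- sum((pop - pop_mean)^2) / N

# var() divides by N - 1, so rescaling it must give the same value
pop_var_check <- var(pop) * (N - 1) / N

# One simple random sample of (assumed) size 50, drawn without replacement
smp      <- sample(pop, size = 50)
smp_mean <- mean(smp)
smp_var  <- var(smp)   # sample variance, divisor n - 1

print(c(population = pop_mean, sample = smp_mean))
print(c(population = pop_var,  sample = smp_var))
```

Because the seed is fixed, rerunning the block reproduces the same stand-in population and the same sample. Changing the seed gives a different sample, and hence different sample estimates, while the population quantities only change if pop itself is regenerated.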