Simulation exercises in R
Master in Statistical Data Analysis

Simulation uses methods based on random numbers to simulate a process of interest on the computer. The goal is to learn important statistical and/or practical information about the process. In statistics, simulations can be used to create simulated data sets in order to study the accuracy of mathematical approximations and the effect of assumptions being violated. We will study properties of some quantities that can be calculated from a set of data which are a random draw from a population.

Some aspects that are used throughout the exercises are given below.

1. Random numbers form a basic tool for any simulation study. Simulations require the ability to generate random numbers. On a computer, it is only possible to generate 'pseudorandom' numbers, which for practical purposes behave as if they were drawn randomly. All random number generators essentially work as follows:

   (a) A seed number is needed as input for the process of generating a random number. This seed can be supplied by the user, or the computer generates the seed, e.g. as a function of the data.

   (b) The seed number is put into mathematical functions that eventually return a random number and a new seed that will be used to generate the next random number.

   In R, 'set.seed' sets the seed for the random number generator. If we use this command before a random number generating statement, we obtain the same number each time we provide the same seed.

       set.seed(7)
       rnorm(1)

2. The for loop (see introduction to R):

       for (var in vector) {
         statements
       }

3. The if statement (see introduction to R):

       if (test) {
         statements
       } else {
         statements
       }

   or

       ifelse(test, statement if test is true, statement if test is false)

1. Population versus sample

In a first step, we will focus on the difference between a population and a sample from a population. To this end, we use the data set of the BIRNH study.
In particular, the variable of interest is diastolic blood pressure.

    birnhdata <- read.delim("C:/Temp/Birnh.dat", header = TRUE, sep = ",")
    x <- birnhdata$DIASTOL

1. To better understand the distinction between a population and a sample, assume (incorrectly) that the population of interest is the group of 5815 individuals involved in the BIRNH study. Calculate the population mean and population variance of the diastolic blood pressure. What is the interpretation of these measures?

       mean(x, na.rm = TRUE)
       a <- mean(x, na.rm = TRUE)
       sum((x[!is.na(x)] - a)^2) / (5815 - sum(is.na(x)))

2. In medical studies it is usually impossible and not worthwhile to gather data from the entire target population. One generally needs to investigate variables of interest based on a smaller sample which is randomly selected from the original population....
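
The contrast between a population quantity and its sample counterpart can be sketched in R as follows. This is only an illustration under stated assumptions: the BIRNH file is not reproduced here, so `x` is a simulated stand-in whose mean, standard deviation and number of missing values are arbitrary choices, and the sample size `n` is likewise hypothetical. The sketch computes the population mean and variance (dividing by the number of observed values), draws one simple random sample, and shows that resetting the seed reproduces the same draw.

```r
# Hypothetical stand-in for the diastolic blood pressure variable.
# These are simulated values, NOT the BIRNH data; mean = 80 and sd = 12
# are arbitrary choices made only so that the sketch runs on its own.
set.seed(7)
x <- rnorm(5815, mean = 80, sd = 12)
x[sample(length(x), 50)] <- NA      # introduce some artificial missing values

# Keep the observed values only
obs <- x[!is.na(x)]
N <- length(obs)

# Population mean and population variance (divide by N, not N - 1)
pop_mean <- mean(obs)
pop_var <- sum((obs - pop_mean)^2) / N

# One simple random sample without replacement (n is a hypothetical choice)
n <- 50
smp <- sample(obs, size = n)
smp_mean <- mean(smp)
smp_var <- var(smp)                 # var() divides by n - 1

# Using the same seed before each draw gives the same sample
set.seed(123)
s1 <- sample(obs, size = n)
set.seed(123)
s2 <- sample(obs, size = n)
same_draw <- identical(s1, s2)

print(c(pop_mean = pop_mean, smp_mean = smp_mean))
print(c(pop_var = pop_var, smp_var = smp_var))
print(same_draw)
```

Rerunning the sampling step without resetting the seed gives a different sample mean and variance each time, while the population values stay fixed. Note that `var()` uses the divisor n - 1, so it is not the same formula as the population variance computed above.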
 Winter '10
 TIBSHIRANI,R
 Statistics, Normal Distribution, Variance, Statistical hypothesis testing
