Random Number Functions and Simulation (Rizzo Ch 6)
BTRY/STSCI 4520

Random Number Functions

R has a number of built-in functions associated with specific distributions. These provide the density, distribution and quantile functions of each distribution, as well as generating random numbers from them.

Example: the normal family

  dnorm   Normal density.
  pnorm   Normal cumulative distribution function.
  qnorm   Normal quantiles (q(p) = {u : P(X <= u) = p}).
  rnorm   Generates normal random numbers.

Standard pattern: d... density, p... distribution, q... quantiles, r... random numbers. Just add the appropriate distribution name.

Examples

```r
x = seq(-3,3,0.1)          # sequence of numbers
par(mfrow=c(2,2))          # plot window with 2-by-2 panes, filled by row
plot(x,dnorm(x),type='l')
plot(x,pnorm(x),type='l')
p = seq(0,1,0.01)          # probabilities between 0 and 1
plot(p,qnorm(p),type='l')
z = rnorm(100,mean=3,sd=5)
plot(z)
```

R Built-in Distributions

  beta     beta               logis     logistic
  binom    binomial           lnorm     log-normal
  cauchy   Cauchy             nbinom    negative binomial
  chisq    chi-squared        norm      normal
  exp      exponential        pois      Poisson
  f        Fisher's F         t         Student's t
  gamma    gamma              unif      uniform
  geom     geometric          weibull   Weibull
  hyper    hypergeometric

Numerous others are available in packages. Later in the class, we will discuss how to generate random numbers from scratch.

The sample Function

Sometimes you want to randomly select elements from a set:

```r
x = sqrt(1:10)
y = sample(x,5)   # 5 entries from x drawn at random
```

If sample is given a single number n instead of a vector x, it assumes 1:n:

```r
sample(10,5)
```

You can also sample with replacement. This means each draw is made from the original collection, so you can get repeated elements:

```r
> sample(x,5,replace=TRUE)
[1] 2.828427 2.645751 2.449490 2.000000 2.828427
```

Simulation

Why do statisticians need to generate random numbers at all?

- To torture students.
- To investigate the sample properties of complex statistical procedures.
- To verify mathematical results (e.g. that the square of a standard normal is chi-squared with 1 degree of freedom).
- To assess convergence to asymptotic results (e.g., the central limit theorem).
- Within some statistical procedures (later in the course).

Evaluating the Central Limit Theorem

We know that if X1, X2, ... are i.i.d. with mean µ and variance σ², with no other conditions,

    √n ( (1/n) Σᵢ Xᵢ − µ ) / σ  →  N(0, 1)  in distribution.

But how close is this for a given, finite n? E.g., suppose we had three negative binomial random variables with p = 0.07?

```r
Z = rnbinom(500,1,0.07)
hist(Z,50,freq=FALSE)
x = seq(0,100,5)
lines(x,dnbinom(x,1,0.07),type='b',col='blue')
```

A Simulation Experiment

Recall that EZ = (1−p)/p = 13.286 and Var(Z) = (1−p)/p² = 189.796.

```r
nsim = 500                 # define simulation parameters
n = 3
p = 0.07
mu = (1-p)/p
sig = sqrt( (1-p)/p^2 )

res = rep(0,nsim)
for(i in 1:nsim){          # run simulations
  Z = rnbinom(n,1,p)
  res[i] = sqrt(n)/sig*( mean(Z) - mu )
}

hist(res,50,freq=FALSE)    # plot results
x = seq(-3,3,0.1)
lines(x,dnorm(x))
```

Some Comments On Good Simulation Code

- Pre-define as many simulation constants as possible as R objects.
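One way to follow this advice is to wrap the whole experiment in a function, so that every constant is set in a single place. A sketch (the function name clt_sim and its defaults are ours, not from the slides):

```r
# Standardized means of n negative binomial(1, p) draws, repeated nsim
# times; all simulation constants are arguments with sensible defaults.
clt_sim <- function(nsim = 500, n = 3, p = 0.07) {
  mu  <- (1 - p) / p             # precompute constants once
  sig <- sqrt((1 - p) / p^2)
  res <- rep(0, nsim)
  for (i in 1:nsim) {
    Z <- rnbinom(n, 1, p)
    res[i] <- sqrt(n) / sig * (mean(Z) - mu)
  }
  res
}

set.seed(4520)                   # arbitrary seed, for reproducibility
res <- clt_sim(n = 30)           # changing n (or p) now happens in one place
c(mean(res), sd(res))            # should be roughly 0 and 1 for large n
```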
  This way, to change the simulation (e.g., if we wanted to change p), we only need to do so in one place.
- Pre-compute numbers that will be used many times (mu and sig above).
- Store as much as possible; you never know when you will need it. Ideally, you should be able to re-create exactly the same results from your simulation if you need to (I maybe should have stored all the Z values that I generated).
- Alternatively: set the random number seed using set.seed() to ensure you always generate the same random numbers; we will cover this later in the class.

Some More Comments On Good Code

Further general guidelines:

- Organize your code in coherent .R files (functions related to a particular task, simulations for a particular set of results).
- Provide an overall description at the start of the file: what is this code doing?
- Date the last time you modified the code; even better, keep a running log of the changes you have made (admission: I'm not very good at this).
- Arrange your code into coherent blocks (reading/generating data, running the simulation, collating results) and make sure these are clearly separated and labeled (this often happens naturally).
- When you save simulation results (including plots from them), include as much information in the file name as possible (esp. the date); keep a file documenting exactly what each of these results means and contains.
- Comment everything!

Larger n

Because I pre-defined n = 3, it's easy to change it to 30, 100, or 500 (histograms omitted here).

Evaluating Densities

It is difficult to judge how good a histogram is; QQ-plots do better. See qqnorm(res) (plot omitted here).

Simulation Size Guidelines

Many simulations aim to estimate an average (i.e., an expectation). E.g., suppose we want to estimate the variance of Z as generated above:

```r
nsim = 500
n = 3
p = 0.07
mu = (1-p)/p
res = rep(0,nsim)
for(i in 1:nsim){
  Z = rnbinom(n,1,p)
  res[i] = mean( (Z - mu)^2 )
}
mean(res)
```

How Many Simulations?

Usually we want to estimate the quantity of interest to within some tolerance. Remember that Var(X̄) = Var(X)/N.
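This relationship gives a quick sample-size rule: to estimate a mean to within a standard error of δ, solve Var(X̄) = Var(X)/N ≤ δ² for N. A sketch (the helper name sims_needed is ours, not from the slides):

```r
# How many simulations are needed so that the standard error of the
# estimated mean is at most delta? Rearranging Var(X)/N <= delta^2
# gives N >= (sd/delta)^2.
sims_needed <- function(sd_hat, delta) ceiling((sd_hat / delta)^2)

sims_needed(354, 1)       # about 125,000, as in the calculation below
sims_needed(0.5, 0.01)    # binary worst case sd = 1/2: N = 2500
```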
From the initial simulation of 500, sd(res) is about 354. So we need 354² ≈ 125,000 simulations so that Var(mean(res)) is about 1.

These calculations apply very generally:

- Estimating a probability is estimating the expectation of an indicator; e.g. mean(res > 300) (0.15).
- A variance: mean( (res - mean(res))^2 ) (95504).

For probabilities, remember that if X1, ..., XN are binary, Var(X̄) = p(1−p)/N ≤ 1/(4N). If you want p to within δ, you need N ≥ 1/(4δ²). (δ = 0.01: N = 2,500; δ = 0.001: N = 250,000.)

Application: Statistical Inference

Frequentist inference: our data are one realization of a random process, and statistical procedures should work over most alternative realizations too.

- Confidence intervals: would cover the truth in 95% of alternatives.
- Hypothesis tests: would not reject in 95% of alternatives when the null is true.

Alternative realizations?? 1,000 equivalent data sets where the random numbers came out differently. Let's simulate them!

Example II: Testing Tests

From the in-class exercise, Lecture 2 (see R code). Suppose we wanted to test H0: p = 0.07 from 30 samples?

- Draw 30 negative binomial(1, 0.07) samples X1, ..., X30.
- The known mean is 0.93/0.07 = 13.286.
- Conduct a t-test for the mean of the Xi:

      t = √n |X̄ − 13.286| / s,  where s² = (1/(n−1)) Σᵢ (Xᵢ − X̄)²,

  and reject if t > 2.045 (obtained from qt(0.975,29)).

This should have probability of rejecting ≈ 0.05 when p = 0.07. But does the test have a reasonable α-level? Test it with a simulation.

Testing α-levels

```r
nsim = 25000     # we need accuracy to within 0.001 to check alpha = 0.05
n = 30
p = 0.07
mu = (1-p)/p
res = rep(0,nsim)
t = rep(0,nsim)
for(i in 1:nsim){
  X = rnbinom(n,1,p)
  t[i] = sqrt(n)*abs( mean(X) - mu )/sd(X)
  res[i] = t[i] > qt(0.975,29)
}
mean(res)
```

Assessing Power

This also gives a way to estimate your chances of rejecting when H0 is wrong (especially in complicated models). In this case, we just change to p = 0.1 when generating X:

> mean(res)
[1] 0.8

Running this over a range of possible values of p traces out a power curve (plot omitted here).

Seeds and Repeatability

We will see later that R produces pseudo-random numbers.
There is a deterministic sequence of numbers: if you start in the same place, you get the same answers. Typically, R chooses a seed as a starting point, often from the system clock. But you can specify it yourself with an integer giving where in the generator's cycle you want to start:

```r
> set.seed(36)
> runif(5)
[1] 0.6223777 0.6754986 0.8022900 0.2603083 0.7597607
> set.seed(36)
> runif(5)
[1] 0.6223777 0.6754986 0.8022900 0.2603083 0.7597607
> runif(5)
[1] 0.01990291 0.95542781 0.43666244 0.08922046 0.360519
```

R and Seeds

Besides set.seed, R also stores .Random.seed, a vector of 626 integers that also specifies parts of the random number generator's state. It usually remains constant (across R sessions and computers), but saving it can ensure compatibility across platforms:

```r
> RNG.seed = .Random.seed
> runif(5)
[1] 0.80228995 0.26030829 0.75976074 0.01990291 0.95542781
> .Random.seed = RNG.seed
> runif(5)
[1] 0.80228995 0.26030829 0.75976074 0.01990291 0.95542781
```

This also doesn't require you to make up an integer, and it works for any simulation (as long as you run exactly the same commands).

Using Seeds

Seeds are useful as a way of recording how to reproduce your simulation results:

- Instead of storing everything in a simulation, this lets you re-run it exactly.
- Often simulation time argues against this, but it can be convenient.

In Statistical Computing, we can use this in setting up a checkfile:

- Set the seed at the start of the simulation.
- Now you should get exactly the answers that we do.
- But caveat: you get the same sequence of random numbers, but you may use them differently.

Checkfile Cautions

For example, let's take two means of 5 normal observations each:

```r
set.seed(1221)
X = matrix(rnorm(10),5,2,byrow=TRUE)
apply(X,2,mean)
```

gives different answers from

```r
set.seed(1221)
X = matrix(rnorm(10),5,2,byrow=FALSE)
apply(X,2,mean)
```

and we won't tell you to do this one way versus another. You don't need to pass the checkfile! We will look for these differences in your code.
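The byrow caution is easy to check directly: the same seed produces the same ten draws, but they land in different columns. A sketch (the object names X1 and X2 are ours):

```r
# Same seed, same 10 draws; byrow only changes how they are arranged,
# so the column means differ even though the numbers are identical.
set.seed(1221)
X1 <- matrix(rnorm(10), 5, 2, byrow = TRUE)
set.seed(1221)
X2 <- matrix(rnorm(10), 5, 2, byrow = FALSE)

apply(X1, 2, mean)               # column means under row-wise fill
apply(X2, 2, mean)               # different column means, column-wise fill
sort(as.vector(X1)) == sort(as.vector(X2))   # same underlying numbers
```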
Summary

- Built-in functions calculate densities, distribution functions and quantile functions, and generate random variables.
- Simulation methods check the statistical properties of procedures.
- Simulation sizes should be large enough to accurately estimate the quantities of interest.
- You can check this with standard statistical tools, and you can always add more simulations!
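Pulling these pieces together, a minimal end-to-end template in the style of the guidelines above (a sketch; the vectorized rowMeans formulation and all object names are ours):

```r
## CLT check for negative binomial means: a template illustrating
## the good-simulation-code guidelines from this lecture.

## 1. Simulation constants, defined once
nsim <- 1000
n    <- 30
p    <- 0.07
mu   <- (1 - p) / p              # precomputed mean
sig  <- sqrt((1 - p) / p^2)      # precomputed sd

## 2. Reproducibility
set.seed(4520)                   # arbitrary seed

## 3. Run the simulation, storing all the draws
Zmat <- matrix(rnbinom(nsim * n, 1, p), nsim, n)   # one row per replicate
res  <- sqrt(n) * (rowMeans(Zmat) - mu) / sig      # standardized means

## 4. Collate results
c(mean = mean(res), sd = sd(res))                  # roughly 0 and 1
```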