Stats 406 Homework 8
1.
Jacob Balicki
UMID: 75134208
sumsal = dbGetQuery(conn, "SELECT yearID, SUM(salary) as sum_salary
FROM Salaries where yearID > 1984 GROUP BY yearID;")
numteams = dbGetQuery(conn, "SELECT yearID, count(teamID) as
sum_teams FROM Salar
Monte Carlo Methods: Monte Carlo Integration
Monte Carlo integration
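Let $\pi$ be a density on $\mathbb{R}^d$. For example, if $d = 1$, $\pi$ is a density
on $\mathbb{R}$.
Let $h : \mathbb{R}^d \to \mathbb{R}$ be a function. Suppose we want to evaluate the
integral $\pi(h) = \int h(x)\,\pi(x)\,dx = E(h(X))$, where $X$ has density $\pi$.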
Very often, such integral c
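As a rough illustration of the idea, the expectation of a function under a density can be approximated by averaging the function over random draws from that density. The sketch below assumes a standard normal density and the function h(x) = x^2, for which the exact value is 1. The helper name `mc_integrate`, the seed, and the sample size are illustrative choices, not part of the course material.

```r
# Minimal sketch: Monte Carlo estimate of E(h(X)).
# Assumptions (illustrative only): X ~ N(0, 1) and h(x) = x^2, so the true value is 1.
set.seed(406)  # arbitrary seed, used only for reproducibility

mc_integrate <- function(h, rdraw, n) {
  x  <- rdraw(n)   # n iid draws from the density
  hx <- h(x)       # evaluate h at each draw
  # sample mean and its estimated standard error
  c(estimate = mean(hx), se = sd(hx) / sqrt(n))
}

res <- mc_integrate(function(x) x^2, rnorm, 1e5)
res
```

The standard error shrinks at rate 1/sqrt(n), so quadrupling the number of draws roughly halves it.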
Working with relational databases
Why use a database?
Database systems (DBS) are a very popular way of storing
data. They are particularly efficient in dealing with large data
sets.
Think of a DBS as a dataset managed by software (known
as the database ser
Chap I: Introduction to R: First steps
A general introduction
R is software for statistical computing and data analysis. It
is also a programming language.
R is freely distributed software (www.r-project.org) with
contributions from developers from arou
Monte Carlo Methods PART I: Random Variables
simulation
Random number generation
Most commonly used random variables can be generated in R
using appropriate functions. There is a naming convention in
R best explained using the Gaussian distribution:
To c
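For reference, a small sketch of the standard prefix convention in base R, using the Gaussian family: `r` draws random values, `d` evaluates the density, `p` evaluates the cumulative distribution function, and `q` evaluates the quantile function. The seed and the particular arguments are arbitrary illustrative choices.

```r
# Sketch of the r/d/p/q naming convention for the normal distribution.
set.seed(406)                     # arbitrary seed, for reproducibility only
x  <- rnorm(5, mean = 0, sd = 1)  # r: five random draws from N(0, 1)
d0 <- dnorm(0)                    # d: density at 0, equal to 1 / sqrt(2 * pi)
p  <- pnorm(1.96)                 # p: P(X <= 1.96)
q  <- qnorm(0.975)                # q: the 0.975 quantile

x
c(density = d0, cdf = p, quantile = q)
```

The same prefixes apply to other families, for example `runif`, `dexp`, `pbinom`, and `qgamma`.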
Chap I: Introduction to R: More on the syntax
Control Structures
Control structures are useful to implement repetitive tasks.
R has control structures similar to those in C.
For loops
Loops are used to carry out a sequence of related operations
without having to
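A minimal sketch of a `for` loop in R, accumulating the sum of the squares of the integers 1 to 10. The variable names are illustrative.

```r
# Sketch: accumulate a running total with a for loop.
total <- 0
for (i in 1:10) {
  total <- total + i^2   # add the square of the current index
}
total

# The same result without an explicit loop, using vectorized arithmetic.
vectorized_total <- sum((1:10)^2)
```

In R, vectorized forms such as `sum((1:10)^2)` are usually preferred when they exist, but an explicit loop is often the clearest choice when each step depends on the previous one.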
Chap I: Introduction to R: Input/Output and
graphics
Data Input/Output
It is important to be capable of moving data in and out of R.
We make a distinction between data files and R variables and
objects.
To save R internal objects, use the
STAT 406: HW3
All computer code should be written using the language R. Type ALL your
code into one PLAIN Text format file. Plain text format is available by
default in R. Please do not use Microsoft Word .doc format or .rtf format or
.pdf format. I
# Xintong Zhu STAT 406 HW 2
setwd("~/Desktop/STAT406")
# Prob 1
#(a)
#suppose id = 100 has status = infected at the beginning
epidemic_sim = function(pinf, pd, pimm) {
#initialize value and status
status=c(rep("susceptible",99), "infected")
week=0
whi
STAT 406: HW2
All computer code should be written using the language R. Type ALL your
code into one PLAIN Text format file. Plain text format is available by
default in R. Please do not use Microsoft Word .doc format or .rtf format or
.pdf format. I
Chapter I. Introduction to R: the basic objects
In order to work with a language we need to know its syntax
and its objects.
In terms of the syntax, we have seen how to use variables and
functions. We will learn more as we progress.
This chapter focuses o
Statistics 406 Midterm Exam
Fall 2007
No calculators, formula cards, computers, or notes may be used. It is best to try every question. Partial credit will be given.
1. Describe in 2-3 sentences what the following program is doing. Focus on the statistica
The bootstrap: an application of Monte Carlo
Methods in Statistical inference
Statistical inference
Suppose we have a sample $x_1, \ldots, x_n$.
We assume that $x_1, \ldots, x_n$ are (iid) realizations of random
variables $X_1, \ldots, X_n$.
We then assume that
Numerical Optimization: some basic algorithms
Basic concepts
We recall a few results from calculus. Let $f : \mathbb{R} \to \mathbb{R}$ be a function.
A point $x_0$ is called a local minimum if there exists an open
interval $I$ containing $x_0$ such that $f(x_0) \le f(x)$ for all $x \in I$.
If the in
Homework 6
#Problem 1
#a
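integral = function(N, b) {
  # Importance sampling estimate of the integral of exp(-x^2) over (0, 1),
  # using a Beta(1, b) proposal
  X = rbeta(N, 1, b)
  fun = exp(-X^2) / dbeta(X, 1, b)  # integrand divided by proposal density
  Ihat = mean(fun)
  se = sqrt(var(fun)/N)             # estimated standard error of Ihat
  return(c(Ihat, se))
}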
N = 10e4
integral(N, 1)
#[1] 0.7471906917 0.0006359974
#Beta = 1 gives us the closest est
Jacob Balicki
UMID: 75134208
Homework 7
1. The graph suggests theta_hat2 is a better estimator than theta_hat1.
k = 1e4 #num replicates
n = 50 #num samples
#Sequence of values of theta
THETA = seq(from = 0.5, to = 10, by = 0.1)
#Storage for the MSEs of ea
STAT 406: HW7
All computer code should be written using the language R. Type ALL your
code into one PLAIN Text format file. Plain text format is available by
default in R. Please do not use Microsoft Word .doc format or .rtf format or
.pdf format. I
STAT 406: HW8
All computer code should be written using the language R. Type ALL your
code into one PLAIN Text format file. Plain text format is available by
default in R. Please do not use Microsoft Word .doc format or .rtf format or
.pdf format. I
The EM algorithm
Introduction
The EM (Expectation Maximization) algorithm is an
algorithm developed by statisticians to deal with statistical
models with latent variables or missing data.
missing data: as the name suggests, part of the data is
missing.
la
Chapter 2: Monte Carlo Methods
Law of Large Numbers
Theorem
Let $X_1, X_2, \ldots$ be a sequence of independent and identically
distributed random variables with mean $\mu := E(X)$. As $n \to \infty$,
$\frac{1}{n} \sum_{i=1}^{n} X_i$ becomes less and less random, and converges to $\mu$.
What does it
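One way to see the theorem at work is to track the running sample mean as more draws are added. The sketch below assumes exponential draws with rate 1, whose mean is 1. The distribution, seed, and sample size are illustrative choices only.

```r
# Sketch: running sample mean of iid draws settling near the true mean.
# Assumption (illustrative): X_i ~ Exponential(rate = 1), so E(X) = 1.
set.seed(406)  # arbitrary seed, for reproducibility only
n <- 1e5
x <- rexp(n, rate = 1)

# running_mean[k] is the average of the first k draws
running_mean <- cumsum(x) / seq_len(n)

# Inspect the running mean at a few sample sizes
running_mean[c(10, 100, 1000, n)]
```

Plotting `running_mean` against `seq_len(n)` shows the fluctuations narrowing around 1 as the sample size grows.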
Numerical Optimization and model fitting
Introduction
Optimization is a fundamental analytical tool in Statistics.
The other equally important analytical tool is integration and
we have discussed this earlier in this course.
An optimization problem is a mat
Practice exam problems
1. What will be the approximate value of M following execution of each of the following programs?
(a)
X = array(rnorm(10000), c(1000,10))
A = apply(X, 1, mean)
M = mean(A)
Solution: The answer is zero. Reasoning: Each element of A is
Statistics 406 Problem Set 5
Due in lab, Tuesday October 23
1. Suppose we observe iid values $X_1, \ldots, X_n$ that are uniformly distributed on the interval $(0, a)$, where $a > 0$ is an unknown constant. We can estimate $a$ using the maximum value of the sample
STATS 406 Fall 2016: Lab 08
1 Law of Large Numbers
Write a function which calculates the empirical distribution function of $X_1, \ldots, X_m \overset{iid}{\sim} N(0, 1)$. The empirical distribution function is defined as
$$\hat{F}_m(x) = \frac{1}{m} \sum_{i=1}^{m} I(X_i \le x)$$
where $I(X \le x)$ is an indicator
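One possible sketch of such a function: the empirical distribution function at a point is the proportion of sample values less than or equal to that point. The function name `ecdf_hat`, the sample size of 1000, and the seed are illustrative assumptions, not part of the lab handout.

```r
# Sketch: empirical distribution function as the proportion of draws <= x.
# The name ecdf_hat and the sample size are hypothetical choices.
set.seed(406)  # arbitrary seed, for reproducibility only

ecdf_hat <- function(x, sample) {
  # For each evaluation point t, average the indicator I(X_i <= t)
  sapply(x, function(t) mean(sample <= t))
}

xs  <- rnorm(1000)          # iid N(0, 1) draws
pts <- c(-1, 0, 1)          # points at which to evaluate
est <- ecdf_hat(pts, xs)
est

# Compare with the true N(0, 1) distribution function at the same points
pnorm(pts)
```

Base R also provides `ecdf()`, which returns an equivalent step function and can serve as a check on a hand-written version.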