Computational Statistics and Data Analysis
STAT341
Instructor: Ali Ghodsi
Two Paradigms
Classical Statistics
Infer information from small data sets (not enough data)
Machine Learning
Infer informati
Computational Statistics and Data Analysis
STAT 341
Practice Questions Set 2
1. Recall that a linear congruential generator (LCG) is of the form
$x_{k+1} = (a x_k + b) \bmod m$
where $a$, $b$, $m$ and $x_0$ is the seed of
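A minimal Python sketch of the recursion $x_{k+1} = (a x_k + b) \bmod m$ is given below. It assumes integer parameters $a$, $b$, $m$ and an integer seed $x_0$. The function names `lcg` and `lcg_uniforms` are illustrative and do not come from the course notes. Dividing each state by $m$ gives pseudo-uniforms in [0, 1).

```python
def lcg(a, b, m, seed, n):
    """Return x_1, ..., x_n from x_{k+1} = (a * x_k + b) mod m, starting at x_0 = seed."""
    xs = []
    x = seed
    for _ in range(n):
        x = (a * x + b) % m  # one step of the linear congruential recursion
        xs.append(x)
    return xs


def lcg_uniforms(a, b, m, seed, n):
    """Scale the integer states to pseudo-uniforms u_k = x_k / m in [0, 1)."""
    return [x / m for x in lcg(a, b, m, seed, n)]
```

Take the small generator $a = 15$, $b = 4$, $m = 7$ with $x_0 = 0$. `lcg(15, 4, 7, seed=0, n=7)` returns `[4, 1, 5, 2, 6, 3, 0]`, so the sequence visits every residue before repeating.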
Computational Statistics and Data Analysis
STAT 341
Practice Questions
1. Let $X_1, \ldots, X_n$ be iid random variables with cdf $F$. Generate random variables $X_{(n)}$
and $X_{(1)}$ that are distributed accordin
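One standard approach uses inversion, though it is not necessarily the intended solution. Since $P(X_{(n)} \le x) = F(x)^n$ and $P(X_{(1)} \le x) = 1 - (1 - F(x))^n$, we can set $X_{(n)} = F^{-1}(U^{1/n})$ and $X_{(1)} = F^{-1}(1 - (1-U)^{1/n})$ with $U \sim$ Uniform(0, 1). This needs a single uniform rather than $n$ draws. The sketch assumes $F^{-1}$ is available in closed form. Exponential(1) is used purely as an illustration, and the function names are hypothetical.

```python
import math
import random


def exp_inverse_cdf(u, rate=1.0):
    """Inverse cdf of Exponential(rate): F^{-1}(u) = -log(1 - u) / rate."""
    return -math.log(1.0 - u) / rate


def sample_max(inv_cdf, n, rng):
    """X_(n) = max of n iid draws has cdf F(x)^n, so X_(n) = F^{-1}(U^(1/n))."""
    u = rng.random()
    return inv_cdf(u ** (1.0 / n))


def sample_min(inv_cdf, n, rng):
    """X_(1) = min of n iid draws has cdf 1 - (1 - F(x))^n, so X_(1) = F^{-1}(1 - (1 - U)^(1/n))."""
    u = rng.random()
    return inv_cdf(1.0 - (1.0 - u) ** (1.0 / n))
```

The simulated draws can be checked against known results for $n = 5$ iid Exponential(1) variables. The minimum is Exponential with rate 5, so its mean is 0.2. The maximum has mean $1 + 1/2 + 1/3 + 1/4 + 1/5$.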
Stat 431
ASSIGNMENT 3 SOLUTIONS
1. (a) It seems reasonable to assume that the number of claims $Y_i$ from group $i$ has a Poisson
distribution with mean $\mu_i = n_i \lambda_i$, where $n_i$ is the number of policies and $\lambda_i$
Stat 431
ASSIGNMENT 2 SOLUTIONS
1. (a) Using the raw data we estimate the odds ratio of having a conviction for someone with
a long education versus someone with a short education to be 0.4167.
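For reference, the sample odds ratio from a 2×2 table of counts is $(a/b)/(c/d) = ad/(bc)$. The helper below is a hypothetical sketch. The counts in the tests are made up for illustration and are not the assignment data. Which group is placed in the first row decides whether the ratio or its reciprocal is reported.

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio for a 2x2 table of counts laid out as

                  event   no event
        group 1     a        b
        group 2     c        d

    Odds of the event are a/b in group 1 and c/d in group 2,
    so the odds ratio is (a/b) / (c/d) = (a*d) / (b*c).
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)
```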
Stat 431
ASSIGNMENT 1
1. The following data are the remission times (in weeks) for a group of 25 leukemia patients
undergoing chemotherapy:
1, 1, 2, 4, 4, 6, 6, 6, 7, 8, 9, 9, 10, 12, 13, 14, 18, 19,
STAT 341 - Assignment 3
Due Monday Oct 30 at 11am - to be submitted through Crowdmark
1. [4 marks] Determine whether the Horvitz-Thompson estimator is location-scale equivariant and/or
location-scale
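One common form of the Horvitz-Thompson estimator of a population total is $\hat{T} = \sum_{u \in S} y_u / \pi_u$, where $\pi_u$ is the inclusion probability of unit $u$. Dividing by $N$ estimates the population average $\hat{\mu}$. The sketch below uses hypothetical function names and can be used to explore the question numerically. For example, recompute the average estimate from the values $a + b\,y_u$ and compare it with $a + b\,\hat{\mu}$ for a few choices of $a$ and $b$.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total.

    y  : variate values y_u for the sampled units u in S
    pi : inclusion probabilities pi_u = P(u in S) for the same units
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0 < p <= 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yu / pu for yu, pu in zip(y, pi))


def horvitz_thompson_average(y, pi, N):
    """Estimate of the population average: the HT total divided by N."""
    return horvitz_thompson_total(y, pi) / N
```

Suppose $n = 2$ units are drawn from $N = 4$ by simple random sampling without replacement. Then every $\pi_u = 1/2$, so sampled values 3 and 5 give $\hat{T} = 16$ and an estimated average of 4.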
Introduction
Preamble
The subject matter of computational statistics is that of Statistics itself, but developed via computation
rather than only through mathematics.
The goal of the course is to pr
STAT 341- Assignment 1
Due Monday September 25 at 11am - to be submitted through Crowdmark
1. [10 Marks] Suppose $T_N(y_1, \ldots, y_N)$ is the median. For simplicity, let's also suppose that $N$ is odd,
t
Computational Statistics and Data Analysis
STAT 341
Practice Questions Set 2 - Solutions
1. The LCG under consideration is $x_{k+1} = (15 x_k + 4) \bmod 7$
(a) Check the three conditions of the theorem:
b = 4,
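This sketch assumes the theorem in question is the standard full-period (Hull-Dobell) theorem. Under that theorem, $x_{k+1} = (a x_k + b) \bmod m$ has period $m$ for every seed exactly when three conditions hold:
(i) $b$ and $m$ are relatively prime;
(ii) every prime dividing $m$ also divides $a - 1$;
(iii) if 4 divides $m$, then 4 divides $a - 1$.
The code checks these conditions and, as a cross-check, measures the cycle length directly. All names are illustrative.

```python
from math import gcd


def prime_factors(m):
    """Distinct prime factors of m, by trial division."""
    factors = set()
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors.add(p)
            m //= p
        p += 1
    if m > 1:
        factors.add(m)
    return factors


def hull_dobell(a, b, m):
    """Return the truth values of the three full-period conditions."""
    cond1 = gcd(b, m) == 1                                   # b and m relatively prime
    cond2 = all((a - 1) % p == 0 for p in prime_factors(m))  # each prime p | m divides a - 1
    cond3 = (m % 4 != 0) or ((a - 1) % 4 == 0)               # 4 | m implies 4 | (a - 1)
    return cond1, cond2, cond3


def period(a, b, m, seed):
    """Length of the cycle the sequence eventually enters, starting from seed."""
    seen = {}
    x, k = seed, 0
    while x not in seen:
        seen[x] = k
        x = (a * x + b) % m
        k += 1
    return k - seen[x]
```

For $a = 15$, $b = 4$, $m = 7$, all three conditions hold: gcd(4, 7) = 1, 7 divides 14, and 4 does not divide 7. Accordingly, `period(15, 4, 7, 0)` returns 7.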
1 Basic Monte Carlo Integration
We consider three integration methods in this course:
Basic Monte Carlo Integration
Importance Sampling
Markov Chain Monte Carlo (MCMC)
The first, and most basic, meth
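The basic Monte Carlo idea can be sketched as follows. Write $\theta = \int_a^b g(x)\,dx = (b-a)\,E[g(U)]$ with $U \sim \text{Uniform}(a, b)$. The sample mean of $(b-a)\,g(U_i)$ over $n$ independent draws then estimates $\theta$, and its standard error shrinks like $1/\sqrt{n}$. This is a minimal sketch, not the course's implementation, and `mc_integrate` is a hypothetical name.

```python
import math
import random


def mc_integrate(g, a, b, n, rng):
    """Basic Monte Carlo estimate of theta = integral of g over [a, b].

    theta = (b - a) * E[g(U)] for U ~ Uniform(a, b), so we average
    (b - a) * g(U_i) over n independent uniform draws.
    Returns (estimate, estimated standard error).
    """
    values = [(b - a) * g(rng.uniform(a, b)) for _ in range(n)]
    estimate = sum(values) / n
    sample_var = sum((v - estimate) ** 2 for v in values) / (n - 1)
    return estimate, math.sqrt(sample_var / n)
```

For example, with $g(x) = x^2$ on [0, 1] the estimate should be close to 1/3.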
1 Random Vector Generation
We want to sample from $X = (X_1, X_2, \ldots, X_d)$, a $d$-dimensional vector from a known
pdf $f(x)$ and cdf $F(x)$. Consider the following two cases:
Case 1: if the $x_1$,
LECTURE 3: Review of Linear Algebra and MATLAB
Vector and matrix notation
Vectors
Matrices
Vector spaces
Linear transformations
Eigenvalues and eigenvectors
MATLAB primer
Introduction t
Sampling from Gamma Distribution
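If $X \sim \text{Gamma}(\alpha, \beta)$ with shape $\alpha$ and scale $\beta$, then its pdf is of the form:
$f(x) = \dfrac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \quad x \ge 0$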
A Gamma distribution with an integer shape parameter, say $\alpha = m$, is
also called an Erlan
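When the shape is a positive integer $m$, one simple generator uses two facts. First, the sum of $m$ iid exponential variables with mean $\beta$ is Gamma$(m, \beta)$. Second, $-\beta \log U$ is such an exponential when $U \sim$ Uniform(0, 1). The sketch assumes $\beta$ is the scale parameter, so $E[X] = m\beta$. `sample_erlang` is an illustrative name.

```python
import math
import random


def sample_erlang(m, beta, rng):
    """Draw X ~ Gamma(shape=m, scale=beta) for a positive integer m.

    A sum of m iid Exponential(mean=beta) variables is Gamma(m, beta), and
    -beta * log(U) is Exponential(mean=beta) for U ~ Uniform(0, 1].
    """
    if m < 1 or int(m) != m:
        raise ValueError("shape must be a positive integer")
    # 1 - rng.random() lies in (0, 1], so log() never receives 0.
    return -beta * sum(math.log(1.0 - rng.random()) for _ in range(int(m)))
```

Summing logarithms, rather than taking the log of a product of $m$ uniforms, avoids numerical underflow when $m$ is large.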
Example Continuous Case
Sample from Beta(2,1)
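In general, the Beta$(\alpha, \beta)$ pdf is:
$f(x) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 < x < 1$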
Ali Ghodsi
Computational Statistics and Data Analysis STAT 341
Note: $\Gamma(n) = (n-1)!$ if $n$ is a positive
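Using $\Gamma(n) = (n-1)!$, the Beta(2, 1) density is $f(x) = \frac{\Gamma(3)}{\Gamma(2)\Gamma(1)}\,x = 2x$ on (0, 1). Hence $F(x) = x^2$, and the inverse transform gives $X = \sqrt{U}$. The minimal sketch below implements this inversion, with a general density helper for checking. The function names are hypothetical.

```python
import math
import random


def beta_pdf(x, alpha, beta):
    """Beta(alpha, beta) density: Gamma(a+b) / (Gamma(a) Gamma(b)) * x^(a-1) * (1-x)^(b-1) on (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return const * x ** (alpha - 1) * (1.0 - x) ** (beta - 1)


def sample_beta_2_1(rng):
    """Draw X ~ Beta(2, 1) by inversion: f(x) = 2x, F(x) = x^2, so X = sqrt(U)."""
    return math.sqrt(rng.random())
```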
Computational Statistics and Data Analysis
STAT 341
Ali Ghodsi
University of Waterloo
Multiplicative Congruential Method
One way to generat
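The multiplicative congruential method is the special case $b = 0$ of the LCG, $x_{k+1} = a x_k \bmod m$. Because 0 maps to 0, the seed must be nonzero modulo $m$, and the period is at most $m - 1$. A minimal sketch with hypothetical names follows. The parameter choice mentioned in the comment is the widely cited Park-Miller "minimal standard" generator, not necessarily the one used in the course.

```python
def mcg(a, m, seed, n):
    """Multiplicative congruential generator x_{k+1} = (a * x_k) mod m (an LCG with b = 0).

    Returns the integer states x_1, ..., x_n. The seed must be nonzero mod m,
    because 0 is a fixed point of the recursion.
    """
    if seed % m == 0:
        raise ValueError("seed must be nonzero modulo m")
    xs = []
    x = seed
    for _ in range(n):
        x = (a * x) % m
        xs.append(x)
    return xs


def mcg_uniforms(a, m, seed, n):
    """Scale states to pseudo-uniforms x_k / m in [0, 1)."""
    return [x / m for x in mcg(a, m, seed, n)]


# A widely cited parameter choice (Park-Miller "minimal standard"):
# mcg_uniforms(16807, 2**31 - 1, seed=1, n=5)
```

With $a = 3$ and $m = 7$ (3 is a primitive root mod 7), `mcg(3, 7, seed=1, n=6)` returns `[3, 2, 6, 4, 5, 1]`. This attains the maximal period $m - 1 = 6$.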
Computational Statistics and Data Analysis
STAT 341
Ali Ghodsi
University of Waterloo
September 17, 2015
Multiplicative Congruential Method
Bayesians vs. Frequentists
Ali Ghodsi
Computational Statistics and Data Analysis STAT 341
Frequentists
Probability is objective and refers to the limit of an event's
relative frequency in a large number
Explicitly defined Population Attributes
Statistics is: the fun of finding patterns in data; the pleasure of making discoveries; the import
of deep philosophic
Accuracy of prediction
Oftentimes interest lies in predicting the value of a variate (the response variate) given the value of
some explanatory variates.
We build a response model that encodes how t
Resampling
As previous sections have shown, understanding the sampling behaviour of sample attributes is
essential to making inferences about any population attribute. e.g.
For discrepan
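One common resampling technique is the bootstrap. It treats the observed sample as a stand-in for the population, draws many samples of the same size with replacement, and recomputes the sample attribute on each. The spread of those replicates approximates the attribute's sampling behaviour. This is a minimal sketch with a hypothetical function name, not the course's code.

```python
import random
import statistics


def bootstrap(sample, attribute, B, rng):
    """Return B bootstrap replicates of attribute(sample).

    Each replicate applies `attribute` to len(sample) values drawn with
    replacement from the observed sample.
    """
    n = len(sample)
    return [attribute(rng.choices(sample, k=n)) for _ in range(B)]
```

For example, `statistics.stdev(bootstrap(data, statistics.median, 1000, random.Random(1)))` gives a bootstrap standard error for the sample median of `data`.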
The harsh reality
Predictive accuracy provides insight into the performance of a predictor and can be used to choose
between competing ones.
The key to this usefulness however is that the predictive
Computational Statistics and Data Analysis
STAT 341
Assignment 4
Due: Tuesday November 10 before 12pm in drop box 15 located on the 4th floor of MC
Policy on Lateness: No late assignment will be accep
Computational Statistics and Data Analysis
STAT 341
Assignment 1
Due: Thursday October 1 before 4pm in drop box 15 located on the 4th floor of MC
Policy on Lateness: No late assignment will be accepte
Computational Statistics and Data Analysis
STAT 341
Assignment 3
Due: Tuesday October 27 before 12pm in drop box 15 located on the 4th floor of MC
Policy on Lateness: No late assignment will be accept
Computational Statistics and Data Analysis
STAT 341
Assignment 5
Due: Monday November 23 before 12pm in drop box 15 located on the 4th floor of MC
Policy on Lateness: No late assignment will be accept
Computational Statistics and Data Analysis
STAT 341
Assignment 2
Due: Tuesday October 13 before 4pm in drop box 15 located on the 4th floor of MC
Policy on Lateness: No late assignment will be accepte
Please fill in this answer sheet and attach it after your cover page and your procedure sheets.
(1) Please assume Y is numerical and please use 5-fold cross-validation to find the optimal K for a
K-ne