WELCOME to Statistics 416
STAT 416
Emphasis on application and practical use of statistical methods for designing and analyzing microarray experiments. Students completing STAT 416 should be able to design and analyze their own microarray experiments and
Statistics 416
HW2
Solutions
1. (20 points)
Create and display a vector of the consecutive integers 1 through 12.
Create a vector of the numbers 9, 5, and 3.
Create the vector 1,2,3,4,5,6,9,5,3 and assign it to x.
Display x.
Multiply each entry in x by 2 a
Stat 416
Homework 3
Solutions
1. (a) (2 points) Blocking is not used in this experiment. Blocking was defined in our notes as grouping similar experimental units together and assigning different treatments within such groups of experimental units. Thus the
Stat 416
Homework 4
Solutions
Exam Corrections (10 points)
1. (6 points) For the complete experiment, we have the following table of factors with levels.
Factor   Levels
Diet     chimp, McDonalds
Tissue   brain, liver
Litter   levels were not specified in the paper
Bat
STAT416
HW5
Solutions
1. a) (7 points) An interaction between the factors diet and batch would mean
that the differences among the diets would be different for the two batches.
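As a small numeric illustration (the cell means below are made up, not taken from the homework data, and two hypothetical diets `diet_A` and `diet_B` are assumed), an interaction shows up as a nonzero difference between the within-batch diet differences:

```python
# Hypothetical mean expression of one gene in each diet-by-batch cell
# (made-up numbers for illustration only).
cell_means = {
    ("diet_A", "batch_1"): 10.0,
    ("diet_B", "batch_1"): 12.0,
    ("diet_A", "batch_2"): 10.5,
    ("diet_B", "batch_2"): 15.5,
}


def diet_difference(means, batch):
    """Difference between diet B and diet A within a single batch."""
    return means[("diet_B", batch)] - means[("diet_A", batch)]


d1 = diet_difference(cell_means, "batch_1")
d2 = diet_difference(cell_means, "batch_2")
# A nonzero contrast means the diet difference depends on the batch.
interaction_contrast = d2 - d1
print(d1, d2, interaction_contrast)
```

If the two within-batch differences were equal, the contrast would be zero and the diet effect would be the same in both batches.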
setwd("C:/z/Courses/S416/GEOdata")  # R treats "\" as an escape character, so use "/" in Windows paths
d=read.table(
Exam 1 from a Past Semester
1. Provide a brief answer to each of the following questions.
a) What do perfect match and mismatch mean in the context of Affymetrix GeneChip
technology? Be as specific as possible in your answer.
b) True or False: Robots are
Solutions to Exam 1 from a Past Semester
1. Provide a brief answer to each of the following questions.
a) What do perfect match and mismatch mean in the context of Affymetrix GeneChip
technology? Be as specific as possible in your answer.
A perfect match
Statistical Design and Analysis of Microarray Experiments.
STAT 416

Spring 2014
Statistical Design and Analysis
of Gene Expression Experiments
First Lecture!
An Overview
Central Dogma: DNA → RNA → Protein
Illustration provided by the National Human Genome Research Institute
(Figure: DNA → transcription → RNA → translation → Protein)
Gene Expression
Pooling samples
Pooling of tissues or RNA samples is sometimes necessary to obtain sufficient RNA for hybridization in microarray experiments.
Even when pooling is not necessary to obtain enough sample in RNA-seq experiments, it can be beneficial becau
Bioinformatics processing of NGS data
Peng Liu
RNA-seq data analysis procedure
Image (pH, current) analysis to get the sequences of each cluster (read)
Bioinformatics analysis pipeline
Background estimation/correction
Normalization
Detection of dif
Replication in RNA-seq experiments
Two types of replication:
Biological replication
Technical replication
The following definitions are from Nettleton, Chapter 5 of the NGS book, with slight modifications.
Biological replication
Biological replicatio
Introduction to Experimental Design
Terminology
Experiment: An investigation in which the investigator applies (assigns) some treatments to experimental units and then observes the effect of the treatments on the experimental units by measuring one or m
Designs that have been discussed
Full factorial treatment design
Experimental designs:
Completely Randomized Design
Randomized Complete Block Design
Latin Square Design
Incomplete Block Design (including BIBD)
Split-plot Design
Steps involved in d
Stat 416
Homework 1
Solutions
1. (15 points) Many of you had difficulties with this problem. The notation calls for one circle for each experimental unit. The most common mistake was to use one circle for each sample rather than one for each experimental un
Solutions
Statistics 416
Exam 1
March 5, 2009
1. It is possible to use more than two dyes and, therefore, more than two samples on a single slide. Suppose that a four-dye system has been developed so that four distinct samples can be measured together on
Name: _
Statistics 416
Exam 1
March 5, 2009
1. It is possible to use more than two dyes and, therefore, more than two samples on a single slide. Suppose that a four-dye system has been developed so that four distinct samples can be measured together on a
Statistical Design and Analysis of Microarray Experiments
First Lecture! 1/15/2008
Central Dogma: DNA → RNA → Protein
Illustration provided by the National Human Genome Research Institute
Microarray Technology
Monitoring gene expression helps understand t
Microarray Technology
1/17/2008 Peng Liu
How can we measure the expression of thousands of genes simultaneously? There are two types of microarray fabrication:
spotted microarray (cDNA microarray, 500-5k bp lon
The Basics of Microarray Image Processing
1/22/2008
A microarray scanner creates a digital image of a microarray. A digital image is a rectangular array of intensity values.
Each intensity value corresponds to a pixel. The color depth of an image is the n
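A quick sketch of what color depth implies for the range of intensity values, assuming color depth means the number of bits stored per pixel and assuming 16-bit images (a depth many microarray scanners use; check your scanner's documentation). The helper name `intensity_levels` is hypothetical.

```python
def intensity_levels(bits_per_pixel):
    """Number of distinct intensity values a pixel can take at a given color depth."""
    return 2 ** bits_per_pixel


levels = intensity_levels(16)  # assumed 16-bit scanner image
max_intensity = levels - 1  # intensities run from 0 up to this value
print(levels, max_intensity)
```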
Pre-normalization Methods for Two-Color Microarray Data
1/22/2008
Pre-normalization analysis: image processing, background correction, filtration, transformation
From this image, we get a data table that usually includes
Header: information about scanning
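The pre-normalization steps of background correction, filtration, and transformation can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions: background correction is simple subtraction of a local background estimate, filtration drops spots whose corrected signal is not positive (so the log is defined), and the transformation is log base 2. The spot values and the function name `prenormalize` are hypothetical, and the course notes may use different rules.

```python
import math

# Hypothetical (foreground, background) intensities for four spots in one channel.
spots = [(1200.0, 200.0), (450.0, 500.0), (8000.0, 300.0), (260.0, 250.0)]


def prenormalize(spots):
    """Background-correct, filter, and log2-transform one channel's spot intensities."""
    values = []
    for foreground, background in spots:
        corrected = foreground - background  # background correction by subtraction
        if corrected <= 0:  # filtration: the log of a non-positive value is undefined
            continue
        values.append(math.log2(corrected))  # transformation to the log2 scale
    return values


print(prenormalize(spots))
```

Here the second spot is removed because its background exceeds its foreground; a real pipeline would also keep spot identifiers so the filtered values can be matched back to genes.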
Normalization Methods for Two-Color Microarray Data
1/24/2008 Peng Liu
Normalization does not necessarily have anything to do with the normal distribution that plays a prominent role in statistics.
Possible levels of normalization
Within
LOWESS Normalization for Two-Color Microarray Data
1/29/2008 Peng Liu
Within-slide normalization
This is done separately for each slide. The purpose is to make red intensities and green intensities comparable. It is known that for self-self experiments (f
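For within-slide comparisons, red and green intensities are commonly summarized on the log2 scale as M = log2(R/G) and A = (log2 R + log2 G)/2. The sketch below computes both; the intensities are hypothetical, and the sign convention for M (red over green) is an assumption that may differ from the notes. In a self-self experiment the same sample is in both channels, so ideally every M would be 0.

```python
import math


def ma_values(red, green):
    """Return M = log2(R/G) and A = (log2 R + log2 G) / 2 for each spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a


# Hypothetical self-self spots: any nonzero M reflects dye or scanner
# bias rather than a biological difference.
red = [1000.0, 4000.0, 16000.0]
green = [500.0, 2000.0, 16000.0]
m, a = ma_values(red, green)
print(m)
print(a)
```

The first two spots give M = 1 (red twice as bright as green) even though the sample is identical, which is the kind of dye effect within-slide normalization is meant to remove.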
Normalization Methods for Two-Color Microarray Data (continued)
1/31/2008 Peng Liu
In last lecture:
Within-slide normalization
Intensity-dependent dye effect: LOWESS normalization
LOWESS can be applied to each subgrid (sector / print-tip group) when there
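The sketch below fits a separate M-versus-A curve within each print-tip group and subtracts it from M. It uses a tricube-weighted local linear fit, which is LOWESS without the robustness iterations, written from scratch so it runs without extra packages; in practice a library implementation (for example, `lowess` in R or in statsmodels) would typically be used. The functions `local_linear_fit` and `printtip_normalize`, the span `frac=0.6`, and the data are hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def local_linear_fit(x, y, frac=0.6):
    """Tricube-weighted local linear fit at each x (LOWESS without robustness steps)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # number of neighbors in each window
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12  # window half-width
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]  # tricube weights
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


def printtip_normalize(m, a, tips, frac=0.6):
    """Subtract an M-versus-A curve fitted separately within each print-tip group."""
    groups = defaultdict(list)
    for i, tip in enumerate(tips):
        groups[tip].append(i)
    normalized = [0.0] * len(m)
    for idx in groups.values():
        fit = local_linear_fit([a[i] for i in idx], [m[i] for i in idx], frac)
        for i, f in zip(idx, fit):
            normalized[i] = m[i] - f
    return normalized


# Hypothetical data: print tip 1 has a constant dye bias,
# print tip 2 has a bias that grows linearly with A.
a = [8.0, 9.0, 10.0, 11.0, 12.0] * 2
m = [0.5] * 5 + [0.2 * v for v in [8.0, 9.0, 10.0, 11.0, 12.0]]
tips = [1] * 5 + [2] * 5
normalized = printtip_normalize(m, a, tips)
print([round(v, 6) for v in normalized])
```

Fitting within each print-tip group lets the normalization absorb biases that differ between print tips; with so few spots per group the fit is only illustrative, and real arrays have many spots per print tip.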
A Probe Set for a Particular Gene in GeneChip
gene sequence: ...TGCAATGGGTCAGAAGGACTCCTATGTGCCT...
perfect match (PM) sequence: AATGGGTCAGAAGGACTCCTATGTG
mismatch (MM) sequence: AATGGGTCAGAACGACTCCTATGTG
(Figure labels: probe pair, probe cell)
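The sequences on this slide can be checked directly. The snippet below uses plain Python string operations (the helper `mismatch_positions` is hypothetical) to confirm that the perfect match probe appears verbatim in the gene fragment shown and that the mismatch probe differs from it only at position 13, the middle base of the 25-base probe.

```python
gene = "TGCAATGGGTCAGAAGGACTCCTATGTGCCT"  # fragment shown on the slide
pm = "AATGGGTCAGAAGGACTCCTATGTG"  # perfect match probe
mm = "AATGGGTCAGAACGACTCCTATGTG"  # mismatch probe


def mismatch_positions(seq_a, seq_b):
    """1-based positions where two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


print(len(pm), pm in gene, mismatch_positions(pm, mm))
```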
Normalization and Construction of Expression M
Terminology
Experiment: An investigation in which the investigator applies (assigns) some treatments to experimental units and then observes the effect of the treatments on the experimental units by measuring one or more response variables. The assignment
Suppose we have 24 experimental units and would like to compare the effects of 4 treatments on gene expression. Use a completely randomized design to assign 6 experimental units to each treatment.
2 3 3 2 4 1
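One way to carry out this randomization is to write each treatment label 6 times and shuffle the 24 labels, so that every assignment with 6 units per treatment is equally likely. This is a sketch; the function name `crd_assignment` and the seed value are illustrative, not from the notes.

```python
import random


def crd_assignment(n_units, treatments, reps, seed=None):
    """Randomly assign `reps` units to each treatment (completely randomized design)."""
    if n_units != len(treatments) * reps:
        raise ValueError("n_units must equal len(treatments) * reps")
    labels = [t for t in treatments for _ in range(reps)]
    rng = random.Random(seed)
    rng.shuffle(labels)  # uniform random permutation of the labels
    return labels  # labels[i] is the treatment for experimental unit i + 1


plan = crd_assignment(24, [1, 2, 3, 4], 6, seed=416)
print(plan)
```

Fixing the seed makes the plan reproducible, which is useful for recording how units were assigned.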
Incomplete Block Experimental Designs
Introduction to Mixed Linear Models
2/14/2008
Statistical Models
A statistical model describes a formal mathematical data generation mechanism from which an observed set of data is assumed to have arisen.
Example 1: Two-Treatment CRD
Assign 8 Plants t
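As a sketch of what such a model can look like for Example 1, assuming the 8 plants are split evenly so that 4 plants receive each treatment (the text is cut off before the allocation is given), a common formulation for the response of plant j under treatment i is:

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \quad j = 1, \dots, 4, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$$

Here μ is an overall mean, τ_i is the effect of treatment i, and ε_ij is a random error representing plant-to-plant variation; the independent normal errors are an assumption of this sketch.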
Introduction to Matrix Algebra Useful for Statistics
Peng Liu 2/19/2008
A matrix is a rectangular array of elements arranged in rows and columns. An example:
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix} = [a_{ij}]$$
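In code, the same 2 × 4 matrix can be stored as a list of rows. Python indexes from 0, so the hypothetical helper `element` below converts the 1-based subscripts used in the notes (a_ij is the entry in row i, column j) before looking up an entry.

```python
# The 2 x 4 example matrix, stored row by row.
A = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]


def element(matrix, i, j):
    """Return a_ij using 1-based row index i and column index j."""
    return matrix[i - 1][j - 1]


n_rows, n_cols = len(A), len(A[0])
print(n_rows, n_cols, element(A, 1, 2), element(A, 2, 3))
```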
Variability in RNA-seq data
Peng Liu
Sources of variability in RNA-seq count data
Example 3 in the introduction lecture
Consider a simplified experiment: 3 independent biological replicates for each of the two conditions.
Each sample was used to pre