Statistical Design and the Analysis of Experiments:
STAT 516

Spring 2011
Stat 516
Homework 5
Solutions
1. Consider conducting m hypothesis tests. Let V denote the number of type I errors. Let R denote the number
of rejected null hypotheses. Let Q = V/R if R > 0, and let Q = 0 if R = 0. Let $\alpha \in (0, 1)$ be fixed. By
definition, a method
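As a small illustration of how Q is defined above, the following Python sketch computes Q from the counts V and R. The function name `false_discovery_proportion` and the example counts are assumptions made for illustration; they are not part of the homework.

```python
def false_discovery_proportion(v, r):
    """Return Q = V / R if R > 0, and Q = 0 if R = 0."""
    # V counts type I errors among the R rejections, so 0 <= V <= R.
    if v < 0 or r < 0 or v > r:
        raise ValueError("need 0 <= V <= R")
    return v / r if r > 0 else 0.0


# Hypothetical counts: 3 type I errors among 12 rejected null hypotheses.
print(false_discovery_proportion(3, 12))  # 0.25
# No rejections at all: Q is defined to be 0.
print(false_discovery_proportion(0, 0))   # 0.0
```

In practice V is unobservable, so a calculation like this is only possible in a simulation where the true null hypotheses are known.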
What is RNA-seq?
Introduction to the Analysis
of RNA-seq Data
RNA-seq refers to the method of using Next-Generation Sequencing (NGS) technology to
measure RNA levels.
NGS technology is an ultra-high-throughput
technology to measure DNA sequences.
4/6/2011
Mixture Modeling of the
p-value Distribution
Uniform-Beta Mixture Modeling
of the p-value Distribution
First proposed by Allison, D. B., Gadbury, G. L., Heo,
M., Fernández, J. R., Lee, C.-K., Prolla, T. A.,
Weindruch, R. (2002). A mixture model approach
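A minimal Python sketch of a uniform-beta mixture density for p-values, assuming the usual two-component form: a Uniform(0, 1) component with weight `pi0` and a Beta(a, b) component with weight `1 - pi0`. The function names, parameter names, and example values are assumptions for illustration and may not match the parameterization used in the slides.

```python
import math


def beta_pdf(p, a, b):
    """Density of the Beta(a, b) distribution at p in (0, 1)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return p ** (a - 1) * (1 - p) ** (b - 1) / norm


def uniform_beta_mixture_pdf(p, pi0, a, b):
    """Mixture density pi0 * Uniform(0, 1) + (1 - pi0) * Beta(a, b)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= pi0 <= 1:
        raise ValueError("pi0 must lie between 0 and 1")
    # The Uniform(0, 1) density equals 1 everywhere on (0, 1).
    return pi0 * 1.0 + (1 - pi0) * beta_pdf(p, a, b)


# Hypothetical parameter values, chosen only to show the call.
for p in (0.01, 0.25, 0.5, 0.9):
    print(p, uniform_beta_mixture_pdf(p, pi0=0.7, a=1.0, b=4.0))
```

With a = b = 1 the beta component is itself uniform, so the mixture density is 1 for every p, which is a quick sanity check on the code.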
Central Dogma
Basic Biology Related to
Technology for Measuring
Gene Expression
DNA contains genes that code for proteins.
DNA → (transcription) → RNA → (translation) → protein
1/11/2011
Copyright 2011 Dan Nettleton
Proteins perform essential biological functions
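The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is an illustration under stated assumptions: the input sequence is hypothetical, it is treated as the coding strand (so transcription amounts to replacing T with U), and `CODON_TABLE` holds only the four codons the example needs rather than the full genetic code.

```python
# Partial codon table: only the codons used in the example below.
CODON_TABLE = {"AUG": "met", "GCC": "ala", "UGG": "trp", "UAA": "stop"}


def transcribe(coding_dna):
    """Transcription, expressed on the coding strand: replace T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read codons left to right until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "stop":
            break
        peptide.append(amino_acid)
    return peptide


dna = "ATGGCCTGGTAA"  # hypothetical coding-strand sequence
mrna = transcribe(dna)
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # ['met', 'ala', 'trp']
```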
Name: _
Statistics 516
Exam 1
March 3, 2010
1. Suppose a test for differential expression is conducted for each of 100 genes. The following
table provides information about the observed p-values.

Range         Number of p-values
[0.0, 0.1]    42
(0.1, 0.2]    10
(0.2, 0.3
Stat 516
Homework 3
Solutions
1. (a) (5 points) Expression data were simulated for an experiment involving g = 10,000 genes and
two treatment groups with five independent experimental units in each treatment group. Gene-specific
variances $\sigma_1^2, \ldots, \sigma_g^2$ we
Stat 516
Homework 1
Solutions
1. (a) GAUUACACGUGCCUUGGA
(b) asp tyr thr cys gly
(c) The amino acid cys would change to the stop codon. Thus, we would end up with the sequence asp
tyr thr.
2. See slide 9 of slide set number 3.
3. The notation calls for one
Stat 516
Homework 2
Solutions
1. See handwritten notes at the end of these solutions.
2. Cut and paste the matrix of numbers to a text file. I saved that file as affypixel.txt. Open R and set the
working directory to the directory containing the file. For examp
STAT 516
HW6
Solutions
1. Answers vary.
2. a)
b)
c)
d)
3. Consider the following "data" to be clustered using a variety of methods described below.
10  20  40  80  85  121  160  168  195
For each part of the problem, assume that Euclidean distance will be used
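Every clustering method applied to these data starts from the pairwise distances, so a short Python sketch of that first step may help. For one-dimensional observations, Euclidean distance reduces to the absolute difference. The function name `euclidean_distance_matrix` is an assumption for illustration, and the sketch does not implement any particular clustering method from the problem.

```python
data = [10, 20, 40, 80, 85, 121, 160, 168, 195]


def euclidean_distance_matrix(values):
    """Pairwise Euclidean distances for one-dimensional observations."""
    return [[abs(x - y) for y in values] for x in values]


dist = euclidean_distance_matrix(data)
n = len(data)

# Find the closest pair of distinct observations.
closest = min(
    (dist[i][j], data[i], data[j])
    for i in range(n)
    for j in range(i + 1, n)
)
print(closest)  # (5, 80, 85)
```

The closest pair, 80 and 85 at distance 5, is the pair that an agglomerative method would merge first.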
Wildtype vs. Myostatin Knockout Mice
Mixture Modeling of the Distribution of
p-values from t-tests
2/17/2011
Belgian Blue cattle have a mutation in the myostatin gene.
Affymetrix GeneChips on 5 Mice per Genotype
A Typical