Homework 2
Due May 10, 2010
(1) In the rejection algorithm, let $\pi(x)$ be the target distribution, and suppose
$l(x) = c\pi(x)$ is easily computable. You have found a proposal distribution
$g(x)$, and a covering constant, $M$, such that $M g(x) \ge l(x)$ for all $x$.
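As a minimal sketch of the setup above (not part of the assignment), the following Python code runs a rejection sampler. Everything concrete here is a hypothetical choice for illustration: the unnormalized target `l(x) = x(1 - x)` (a Beta(2, 2) density up to a constant), the Uniform(0, 1) proposal `g`, and the covering constant `M = 0.25`, which works because `x(1 - x)` never exceeds 0.25. The function and variable names are assumptions, not notation taken from the course.

```python
import random


def rejection_sample(l, g, sample_g, M, rng):
    """Draw one value from the density proportional to l.

    Returns the accepted value and the number of proposals it took.
    Requires M * g(x) >= l(x) for every x the proposal can produce.
    """
    n_proposals = 0
    while True:
        x = sample_g(rng)          # propose from g
        n_proposals += 1
        u = rng.random()           # uniform on [0, 1)
        if u * M * g(x) <= l(x):   # accept with probability l(x) / (M g(x))
            return x, n_proposals


# Hypothetical example: unnormalized Beta(2, 2) target, uniform proposal.
def l(x):
    return x * (1.0 - x)


def g(x):
    return 1.0  # Uniform(0, 1) density


def sample_g(rng):
    return rng.random()


M = 0.25  # max of x(1 - x) on [0, 1], so M * g(x) >= l(x) everywhere

rng = random.Random(0)
draws = [rejection_sample(l, g, sample_g, M, rng) for _ in range(20000)]
mean_x = sum(x for x, _ in draws) / len(draws)
mean_proposals = sum(n for _, n in draws) / len(draws)
print(mean_x, mean_proposals)
```

With these particular choices the integral of `l` is 1/6 and `M` is 1/4, so about two thirds of proposals are accepted and the average number of proposals per accepted draw should be close to 1.5.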
(a) Show that the expected
MCMC Example
Inference of population structure
Ref:
Pritchard JK, Stephens M & Donnelly P (2000) Genetics
Falush D, Stephens M & Pritchard JK (2003) Genetics
Goal & Intuition
- Identify clusters of individuals, who are genetically
more similar to each
Regularized Multivariate Regression for Identifying
Master Predictors with Application to Integrative
Genomics Study of Breast Cancer
Jie Peng 1, Ji Zhu 2, Anna Bergamaschi 3, Wonshik Han 4,
Dong-Young Noh 4, Jonathan R. Pollack 5, Pei Wang 6
1
2
Depar
Copyright © 1997 by the Genetics Society of America
Inferring Coalescence Times From DNA Sequence Data
Simon Tavaré,* David J. Balding,† R. C. Griffiths‡ and Peter Donnelly§
*Departments of Mathematics and Biological Sciences, University of Southern Cali
Scan Statistics for Genome-wide Profiling
1. What is genome-wide profiling?
2. Likelihood based models
3. Scan statistics
4. Multiple testing control
High density DNA copy number data
Array-based Comparative Genomic Hybridization
Figures from Garnis et al. (2
Monte Carlo Methods for Coalescent - The basic coalescent approach
• Inference under coalescent
- TMRCA
- Population parameters
Coalescent process
• Assumptions for the basic model
- Neutral evolution (no selection)
- Constant population
Plug-in Principle
Goal: Learn something about the underlying probability
distribution F(x).
Observe data X1, ..., Xn drawn independently from F(x).
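One way to make the setting above concrete is to build the empirical distribution, which puts mass 1/n on each observation, and evaluate a quantity of interest on it instead of on the unknown F. The sketch below is illustrative only: the data values are made up, and the names `ecdf`, `F_hat`, `plug_in_mean` and `plug_in_var` are assumptions, not notation from the slides.

```python
import bisect
import statistics


def ecdf(data):
    """Return the empirical CDF F_hat of the observations."""
    xs = sorted(data)
    n = len(xs)

    def F_hat(x):
        # Fraction of observations less than or equal to x.
        return bisect.bisect_right(xs, x) / n

    return F_hat


# Hypothetical observations standing in for X1, ..., Xn.
data = [2.1, 0.4, 1.7, 3.3, 0.9]

F_hat = ecdf(data)

# Plug-in estimates: the same functionals, evaluated on F_hat.
plug_in_mean = statistics.fmean(data)    # mean under F_hat
plug_in_var = statistics.pvariance(data) # variance under F_hat (divides by n)
plug_in_median = statistics.median(data)

print(F_hat(1.0), plug_in_mean, plug_in_var, plug_in_median)
```

Note that the variance under the empirical distribution divides by n rather than n - 1, which is why `statistics.pvariance` is used here instead of `statistics.variance`.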
Plug-in
HMM Example I: Haplotype Phasing
Reference:
Scheet & Stephens, 2006. A Fast and Flexible Statistical Model for Large-Scale
Population Genotype Data: Applications to Inferring Missing Genotypes
and Haplotypic Phase. Am. J. Hum. Genet. 78:629
Genotype vs
Review of Hypothesis Testing
Classic two sample problem:
X1, X2, ..., Xm ~ F
Y1, Y2, ..., Yn ~ G
H0: F = G.
e.g.: F, G are Gaussian with different means.
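To illustrate the two-sample setup, the sketch below simulates Gaussian samples with different means and tests H0: F = G with a permutation test on the absolute difference of sample means. The permutation test is one possible procedure chosen here for illustration, not necessarily the one these slides go on to use. The sample sizes, means, seed, and function names are all assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(x, y, n_perm, rng):
    """Two-sample permutation test of H0: F = G.

    Test statistic: absolute difference of sample means.
    Returns the observed statistic and a permutation p-value.
    """
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    m = len(x)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        stat = abs(mean(pooled[:m]) - mean(pooled[m:]))
        if stat >= observed:
            n_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical data: F = N(0, 1), G = N(1.5, 1).
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(30)]
y = [rng.gauss(1.5, 1.0) for _ in range(30)]

observed, p_value = permutation_test(x, y, 2000, rng)
print(observed, p_value)
```

Because the simulated means differ by 1.5 standard deviations with 30 observations per group, the p-value should come out small, which is evidence against H0.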
The hypothesis testing framework
We compute a test statistic that takes on a larger (or more
e
Copyright © 2000 by the Genetics Society of America
Inference of Population Structure Using Multilocus Genotype Data
Jonathan K. Pritchard, Matthew Stephens and Peter Donnelly
Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom
Ma
Statistics and Probability Primer for Computational Biologists
by
Peter Woolf
Christopher Burge
Amy Keating
Michael Yaffe
Massachusetts Institute of Technology
BE 490/ Bio7.91
Spring 2004
Index
Introduction
Chapter 1: Common Statistical Terms
1.1 Mea
Motivations
DAG
GGM
Integrative Analysis
Graphical models for constructing genetic
regulatory networks
Jie Peng
Department of Statistics, University of California, Davis
May 10, 2010
Genetic regulatory networks
Ge
[Handwritten notes, largely illegible in extraction. Recoverable fragments: a coin-toss example with X = H or T, the question P(X = H) = ?, and a likelihood L(θ).]
Final Project Guidelines
Contents: 3-4 pages of single-spaced text, plus an appendix
with figures, data, documented code, and brief
bibliography.
Essential ingredients of your report:
1. A brief introduction that gives the background of your
problem
2. A det
A Bayesian Approach to Transforming Public Gene
Expression Repositories into Disease Diagnosis
Databases
Haiyan Huang, Chun-Chi Liu, and Xianghong Jasmine Zhou
Department of Statistics, University of California, Berkeley, CA 94720, USA, Program in Mole
Estimating isoform-specific gene
expression from RNA-Seq data
Hui Jiang
Stanford University
Outline
Biological Motivation
Alternative Splicing
Experimental and Technological Background
RNA-Seq
Ultra High Throughput Sequencing
Data Preprocessing - Se
Expectation-Maximization
Last time
- Likelihood inference
$\hat{\theta}_{ML} = \arg\max_{\theta} L(\theta; X)$, where $L(\theta; X)$ is the likelihood
X: genotype counts $n_{AA}, n_{AB}, n_{BB}$
Ex: two alleles, $\theta = P(A)$
[Handwritten derivation illegible in extraction; it involves $2(n_{AA} + n_{AB} + n_{BB})$.]
• Bayesian inference
[Handwritten line illegible in extraction.]
STATS 166/366 Homework 1 Solution
May 19, 2010
Problem 1
(a)
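Let the CG content of this region be $\theta$, then the likelihood
$L(\theta) = \binom{100}{75} \theta^{75} (1-\theta)^{100-75}$
So the log-likelihood is
$l(\theta) = 75 \log\theta + 25 \log(1-\theta) + \log\binom{100}{75}$
Taking the derivative of $l(\theta)$ and setting it to 0, we get $\hat{\theta} = 0.75$,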
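The maximizer can be checked numerically. The sketch below evaluates the binomial log-likelihood for 75 C/G nucleotides out of 100 on a grid and picks the largest value. The grid resolution and the names `log_lik` and `theta_hat` are assumptions made for this illustration.

```python
import math


def log_lik(theta, n=100, k=75):
    """Binomial log-likelihood for k C/G nucleotides out of n."""
    return (
        math.log(math.comb(n, k))
        + k * math.log(theta)
        + (n - k) * math.log(1.0 - theta)
    )


# Grid over the open interval (0, 1) in steps of 0.001.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_lik)
print(theta_hat)
```

The grid maximizer agrees with the value obtained by setting the derivative to zero, since the log-likelihood is concave in the parameter.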
Homework 1
Due April 26, 2010
(1) Consider a 100-nucleotide segment of a CG-rich region of the genome. In this
segment, 75 nucleotides are either C or G.
(a) What is the MLE of the CG content of this region?
(b) If the true CG content of this reg
Monte Carlo Methods for Coalescent
Finishing up from last time
Rejection algorithm for TMRCA
[Tavaré et al., 1997]
- Target distribution: T | Sn = k
Assume the population parameters are known.
Let the proposal distribution g(·) ~ [remainder of handwritten line illegible in extraction]
What is DNA copy number?
Normally, each somatic cell contains 2 copies of every
chromosome.
What is DNA copy number?
One of the earliest observed copy number changes is trisomy
of chromosome 21 in Down syndrome.
What is DNA copy number?
In fact, it becam
Vol. 00 no. 00 2005
Pages 1–8
BIOINFORMATICS
Joint Estimation of DNA Copy Number from Multiple
Platforms
Nancy R. Zhang 1, Yasin Senbabaoglu 2 and Jun Z. Li 3
1 Department of Statistics, Stanford University.
Program in Bioinformatics, University of Michi
Homework 3
Due: May 29, 2010
(1) Consider an artificial data set consisting of the 8 numbers
1.2, 3.5, 4.7, 7.3, 8.6, 12.4, 13.8, 18.1.
Let $\hat{\theta}$ be the 25% trimmed mean, computed by deleting the smallest two
numbers and the largest two numbers, and then taking
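The trimmed mean for this data set can be computed in a few lines. The sketch below assumes the truncated sentence above ends with averaging the four remaining numbers, which is the usual definition of a trimmed mean. The function name `trimmed_mean` and its parameter `k` (how many values to drop from each end) are assumptions for illustration.

```python
def trimmed_mean(values, k):
    """Average after dropping the k smallest and k largest values."""
    xs = sorted(values)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


data = [1.2, 3.5, 4.7, 7.3, 8.6, 12.4, 13.8, 18.1]

# Dropping two values from each end of 8 numbers trims 25% per side.
theta_hat = trimmed_mean(data, 2)
print(theta_hat)
```

Under that assumption the retained values are 4.7, 7.3, 8.6 and 12.4, whose average is about 8.25.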
A Fast and Flexible Statistical Model for Large-Scale Population Genotype
Data: Applications to Inferring Missing Genotypes and Haplotypic Phase
Paul Scheet and Matthew Stephens
Department of Statistics, University of Washington, Seattle
We present a stat