BIOINFORMATICS ORIGINAL PAPER
Vol. 25 no. 1 2009, pages 22–29
doi:10.1093/bioinformatics/btn580

Sequence analysis

Predicting DNA recognition by Cys2His2 zinc finger proteins

Anton V. Persikov 1, Robert Osada 2 and Mona Singh 1,2
1 Lewis-Sigler Institute for Integrative Genomics and 2 Department of Computer Science, Princeton University, Princeton, NJ 08544, USA

Received on 14 July, 2008; revised on October 14, 2008; accepted on November 6, 2008
Advance Access publication November 13, 2008
Associate Editor: Alfonso Valencia

ABSTRACT
Motivation: Cys2His2 zinc finger (ZF) proteins represent the largest class of eukaryotic transcription factors. Their modular structure and well-conserved protein–DNA interface allow the development of computational approaches for predicting their DNA-binding preferences even when no binding sites are known for a particular protein. The canonical model for ZF protein–DNA interaction consists of only four amino acid–nucleotide contacts per zinc finger domain.
Results: We present an approach for predicting ZF binding based on support vector machines (SVMs). While most previous computational approaches have been based solely on examples of known ZF protein–DNA interactions, ours additionally incorporates information about protein–DNA pairs known to bind weakly or not at all. Moreover, SVMs with a linear kernel can naturally incorporate constraints about the relative binding affinities of protein–DNA pairs; this type of information has not been used previously in predicting ZF protein–DNA binding. Here, we build a high-quality literature-derived experimental database of ZF–DNA binding examples and utilize it to test both linear and polynomial kernels for predicting ZF protein–DNA binding on the basis of the canonical binding model. The polynomial SVM outperforms previously published prediction procedures as well as the linear SVM. This may indicate the presence of dependencies between contacts in the canonical binding model and suggests that modification of the underlying structural model may result in further improved performance in predicting ZF protein–DNA binding. Overall, this work demonstrates that methods incorporating information about non-binding and relative binding of protein–DNA pairs have great potential for effective prediction of protein–DNA interactions.
Availability: An online tool for predicting ZF DNA binding is available at http://compbio.cs.princeton.edu/zf/.
Contact: email@example.com
Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION
The mapping of transcriptional networks is a key step in understanding the regulatory mechanisms of gene expression....
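The idea from the abstract that a linear model can absorb constraints on relative binding affinities can be illustrated with a small sketch. It is not the authors' implementation: it assumes a one-hot encoding of the four canonical amino acid–nucleotide contacts and replaces an SVM solver with plain stochastic subgradient descent on a pairwise hinge loss. The function names, hyperparameters and toy contacts are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGT"
N_CONTACTS = 4  # canonical model: four amino acid-nucleotide contacts per finger
BLOCK = len(AMINO_ACIDS) * len(NUCLEOTIDES)
DIM = N_CONTACTS * BLOCK


def encode(contacts):
    """Return the indices of the active one-hot features for one finger.

    `contacts` is a list of four (amino_acid, nucleotide) pairs; this
    encoding is an assumption made for illustration only.
    """
    if len(contacts) != N_CONTACTS:
        raise ValueError("expected four contacts per finger")
    indices = []
    for position, (aa, nt) in enumerate(contacts):
        within = AMINO_ACIDS.index(aa) * len(NUCLEOTIDES) + NUCLEOTIDES.index(nt)
        indices.append(position * BLOCK + within)
    return indices


def score(weights, indices):
    """Linear score w.x for a sparse one-hot vector given by `indices`."""
    return sum(weights[i] for i in indices)


def train_ranker(pairs, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Fit weights so that each stronger binder outscores its weaker partner.

    `pairs` is a list of (stronger_contacts, weaker_contacts). Each pair
    contributes the hinge loss max(0, 1 - w.(x_strong - x_weak)), with an
    L2 penalty on w, minimised by stochastic subgradient descent.
    """
    rng = random.Random(seed)
    weights = [0.0] * DIM
    data = [(encode(strong), encode(weak)) for strong, weak in pairs]
    for _ in range(epochs):
        rng.shuffle(data)
        for strong, weak in data:
            margin = score(weights, strong) - score(weights, weak)
            # L2 regularisation step: shrink every weight slightly.
            weights = [w * (1.0 - lr * reg) for w in weights]
            if margin < 1.0:
                # Constraint violated: push the two scores apart.
                for i in strong:
                    weights[i] += lr
                for i in weak:
                    weights[i] -= lr
    return weights


if __name__ == "__main__":
    # Toy, made-up example: two interfaces differing in a single contact.
    strong = [("R", "G"), ("D", "C"), ("H", "A"), ("R", "G")]
    weak = [("R", "A"), ("D", "C"), ("H", "A"), ("R", "G")]
    w = train_ranker([(strong, weak)])
    print(score(w, encode(strong)), score(w, encode(weak)))
```

Because the score is linear in the features, each "A binds more strongly than B" statement becomes a single linear constraint on the weight vector, which is why such information fits a linear kernel directly. A polynomial kernel would add products of contact features, at the cost of this simple difference-vector form.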