IJITCS-V4-N8-3 - I.J. Information Technology and Computer Science, 2012, 8, 22-36. Published Online July 2012 in MECS (http://www.mecs-press.org/).


I.J. Information Technology and Computer Science, 2012, 8, 22-36
Published Online July 2012 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijitcs.2012.08.03
Copyright © 2012 MECS

Genomic Analysis and Classification of Exon and Intron Sequences Using DNA Numerical Mapping Techniques

Mohammed Abo-Zahhad, Sabah M. Ahmed, Shimaa A. Abd-Elrahman
Electrical and Electronics Engineering Department, Faculty of Engineering, Assiut University, Assiut, Egypt

Abstract

Digital signal processing (DSP) is key to solving many problems in genomics, such as predicting gene locations in a genomic sequence and identifying defective regions in a DNA sequence. DSP can be applied only after the symbol sequences have been mapped to numbers. Many techniques for the numerical representation of DNA sequences have been developed in the literature. They fall into two types: Fixed Mapping (FM) and Physico-Chemical Property Based Mapping (PCPBM). The open question is which of these numerical representations should be used. Answering it requires understanding the representations, bearing in mind that the suitability of each mapping depends on the particular application. This paper addresses that question and compares the techniques in terms of their precision in exon and intron classification. Simulations are carried out using short sequences of the human genome (GRCh37/hg19). The final results indicate that classification performance depends on the numerical representation method.

Index Terms

Genomic Signal Processing; DNA and Protein Sequences; Numerical Mapping; Codon, Exons and Introns; Short Time Fourier Transform

I. Introduction

Genomic Signal Processing (GSP) is defined as the analysis and use of genomic signals to gain biological knowledge, and the translation of that knowledge into systems-based applications. Genomic information is digital in a very real sense. It is represented in the form of sequences in which each element is one of a finite number of entities. Such sequences, like DNA and proteins, are represented by character strings in which each character is a letter of an alphabet. For DNA, the alphabet has size 4 (for proteins it is 20) and consists of the letters A, T, C, and G (e.g., ...ATCGCTGA...). If numerical values are assigned to these characters, the resulting numerical sequences are readily amenable to DSP applications such as gene prediction, which refers to locating the protein-coding regions (exons) of genes in a long DNA sequence [1]. It is therefore necessary to map the symbols into numerical sequences. Ideally, the period-3 component of the DNA sequence should be independent of the nucleotide mapping, which is possible only through symmetric mapping [2]-[3]. Once the mapping is done, signal processing techniques can be used to identify period-3 regions in the DNA sequence. The average
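
The mapping step described in the Introduction can be illustrated with a short sketch. It assumes a binary indicator mapping (one 0/1 sequence per nucleotide), which is only one possible fixed mapping and is not necessarily the representation the paper recommends. The function names `indicator_sequences` and `period3_power` are hypothetical and chosen for illustration. The sketch evaluates the discrete Fourier transform of each indicator sequence at the frequency corresponding to a period of three samples and sums the squared magnitudes, giving a simple measure of period-3 content.

```python
import cmath

# Assumed alphabet order; the mapping itself is an illustrative choice.
NUCLEOTIDES = "ATCG"


def indicator_sequences(dna):
    """Map a DNA string to four binary sequences, one per nucleotide."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in NUCLEOTIDES}


def period3_power(dna):
    """Sum, over the four indicator sequences, of |X(f = 1/3)|^2."""
    total = 0.0
    for seq in indicator_sequences(dna).values():
        # Fourier coefficient at normalized frequency 1/3 (period of 3 samples).
        coeff = sum(x * cmath.exp(-2j * cmath.pi * n / 3) for n, x in enumerate(seq))
        total += abs(coeff) ** 2
    return total


if __name__ == "__main__":
    # A strictly 3-periodic toy sequence versus a sequence with no period-3 structure.
    print(period3_power("ATG" * 20))
    print(period3_power("A" * 60))
```

In this toy comparison, the repeated codon yields a large period-3 measure, while the constant sequence yields a value near zero. A real analysis would apply such a measure over a sliding window along the sequence rather than to the whole string at once.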