UNIVERSITY OF HONG KONG
Department of Biochemistry
Bachelor of Science in Bioinformatics: Final Examination (2004-2005)
BIOC1805 Elements of Bioinformatics
Date: 17th May, 2005 (Tuesday) Time: 9:30 am - 11:30 am

Candidates may use any self-contained, silent, battery-operated and pocket-sized calculator. It is the candidate's responsibility to ensure that his/her calculator operates satisfactorily and that it is used only for the purposes of calculation. Recorded material of any kind is not permitted to be stored in the calculator. Candidates must record the name and type of their calculators on the front page of their examination scripts.

Answer all questions. Each question carries the marks indicated.

Q 1. (15 marks)
a) The dot plot and global sequence alignment are methods for comparing two sequences. What are the differences in the types of information each method gives you?
b) Explain the terms "window" and "stringency" when used with a dot plot program and ... the BLOSUM62 substitution matrix.
c) A colleague tells you that he is going to do a dot plot of two amino acid sequences. He has decided to use a word or k-tuple based approach, with a k-tuple size of 5. What advice would you give him and why?

Q 2. (15 marks)
a) When making the PAM and BLOSUM series of substitution matrices, each column in a multiple sequence alignment is examined. How does the treatment of the amino acids in a column differ between the two approaches?
b) For what purpose(s) is a model based on random chance used in deriving the PAM series of substitution matrices?
c) Which BLOSUM and PAM matrices might you consider using if you were comparing sequences from humans to sequences from each of mammals, reptiles and bacteria? Why?

Q 3. (10 marks)
a) Explain the role of open reading frames (ORFs) in gene finding. Include a comment on the lengths of open reading frames in a section of DNA that does not contain a gene.
b) Introns in genes complicate gene finding. How may gene finding programs take account of these complications?

Page 1 of 3

Q 4. (15 marks)
a) In the first stage of progressive multiple sequence alignment, either a k-tuple or
dynamic programming based algorithm can be used. What are the advantages and disadvantages of using these algorithms in this situation?
b) Describe and compare the main features of FastA and either EMBL or GenBank sequence file formats.
c) Comment on the types of sequence file formats that can be used for storing multiple sequence alignments.

Q 5. (15 marks)
a) Explain why the number of rooted trees of N taxa is the same as the number of unrooted trees of N+1 taxa.
b) For the two nucleotide sequences below, calculate the observed distance and the corrected distance (using Kimura's two-parameter model, below) between them. Give
the details of all calculations. What are the probable causes for the differences in the distances you calculated?

$$d = 0.5 \ln\left(\frac{1}{1 - 2P - Q}\right) + 0.25 \ln\left(\frac{1}{1 - 2Q}\right)$$

5s Cv'Y; F- N 's s F T. s I w I pp Y_ C
_,E,C F'F. D'K.V-N H T S I W I P Y C

c) What is the advantage of the Kimura method over that of Jukes and Cantor?
d) Comment on the changes in each codon position and the effect on the encoded amino acid.

Q 6. (20 marks)
You are writing a version of the Chou Fasman method for predicting secondary structure. You want to write a subroutine in PERL that will calculate the average propensity for a secondary structure state. The average is to be calculated for every five consecutive amino acids in a sequence and the results should be available to the main program after the subroutine
finishes. Your subroutine should deal with one secondary structure state at a time and be called once for each state you wish to process. It should not use global variables.

In your main program, you have the amino acid sequence stored in an array (one amino acid, in one letter code, per element) and the propensities of the amino acids for a particular secondary structure state are stored in a hash. The keys of the hash are the one letter amino acid codes and the values are the propensities of the amino acids for that state.

Write the PERL code for your subroutine and show how the subroutine is called from your main program. Indicate the logic of your subroutine and provide suitable comments. Where
you cannot remember the precise syntax, indicate your intention clearly in the comments.

Page 2 of 3

Q 7. (10 marks)
a) Why are local alignment methods the basis of sequence database search programs like BLAST and FastA?
b) To find distantly-related protein coding genes, should you search nucleic acid or amino acid databases? Why?
c) Why might some types or ... of sequences be unsuitable for database searches?

---- End of Paper ----

Page 3 of 3