PastPaper_05

UNIVERSITY OF HONG KONG
Department of Biochemistry
Bachelor of Science in Bioinformatics: Final Examination (2004-2005)
BIOC1805 Elements of Bioinformatics

Date: 17th May, 2005 (Tuesday)
Time: 9:30 am - 11:30 am

Candidates may use any self-contained, silent, battery-operated and pocket-sized calculator. It is the candidate's responsibility to ensure that his/her calculator operates satisfactorily and that it is used only for the purposes of calculation. Recorded material of any kind is not permitted to be stored in the calculator. Candidates must record the name and type of their calculators on the front page of their examination scripts.

Answer all questions. Each question carries the marks indicated.

Q 1. (15 marks)
a) The dot plot and global sequence alignment are methods for comparing two sequences. What are the differences in the types of information each method gives you?
b) Explain the terms "window" and "stringency" when used with a dot plot program and the BLOSUM62 substitution matrix.
c) A colleague tells you that he is going to do a dot plot of two amino acid sequences. He has decided to use a word or k-tuple based approach, with a k-tuple size of 5. What advice would you give him and why?

Q 2. (15 marks)
a) When making the PAM and BLOSUM series of substitution matrices, each column in a multiple sequence alignment is examined. How does the treatment of the amino acids in a column differ between the two approaches?
b) For what purpose(s) is a model based on random chance used in deriving the PAM series of substitution matrices?
c) Which BLOSUM and PAM matrices might you consider using if you were comparing sequences from humans to sequences from each of mammals, reptiles and bacteria? Why?

Q 3. (10 marks)
a) Explain the role of open reading frames (ORFs) in gene finding. Include a comment on the lengths of open reading frames in a section of DNA that does not contain a gene.
b) Introns in genes complicate gene finding. How may gene finding programs take account of these complications?

Q 4. (15 marks)
a) In the first stage of progressive multiple sequence alignment, either a k-tuple or dynamic programming based algorithm can be used. What are the advantages and disadvantages of using these algorithms in this situation?
b) Describe and compare the main features of FastA and either EMBL or GenBank sequence file formats.
c) Comment on the types of sequence file formats that can be used for storing multiple sequence alignments.

Q 5. (15 marks)
a) Explain why the number of rooted trees of N taxa is the same as the number of unrooted trees of N+1 taxa.
b) For the two nucleotide sequences below, calculate the observed distance and the corrected distance (using Kimura's two-parameter model, below) between them. Give the details of all calculations. What are the probable causes for the differences in the distances you calculated?

d = 0.5 ln[1 / (1 - 2P - Q)] + 0.25 ln[1 / (1 - 2Q)]

 S  C  Y  F  N  S  S  F  T  S  I  W  I  P  Y  C
AGCTGTTACTTTAATTCATCGTTTACCTCCATCTGGATACCTTATTGT
GAGTGCTTCTTCGACAAAAACCACACGTCCATTTGGATCCCCTACTGC
 E  C  F  F  D  K  N  H  T  S  I  W  I  P  Y  C

c) What is the advantage of the Kimura method over that of Jukes and Cantor?
d) Comment on the changes in each codon position and the effect on the encoded amino acid.

Q 6. (20 marks)
You are writing a version of the Chou Fasman method for predicting secondary structure. You want to write a subroutine in PERL that will calculate the average propensity for a secondary structure state. The average is to be calculated for every five consecutive amino acids in a sequence and the results should be available to the main program after the subroutine finishes. Your subroutine should deal with one secondary structure state at a time and be called once for each state you wish to process. It should not use global variables.

In your main program, you have the amino acid sequence stored in an array (one amino acid, in one letter code, per element) and the propensities of the amino acids for a particular secondary structure state are stored in a hash. The keys of the hash are the one letter amino acid codes and the values are the propensities of the amino acids for that state.

Write the PERL code for your subroutine and show how the subroutine is called from your main program. Indicate the logic of your subroutine and provide suitable comments. Where you cannot remember the precise syntax, indicate your intention clearly in the comments.

Q 7. (10 marks)
a) Why are local alignment methods the basis of sequence database search programs like BLAST and FastA?
b) To find distantly-related protein coding genes, should you search nucleic acid or amino acid databases? Why?
c) Why might some types of sequences be unsuitable for database searches?

---- End of Paper ----
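The arithmetic asked for in Q 5 b) can be checked with a short script. The following is a minimal Python sketch, not a model answer. It assumes the usual reading of Kimura's two-parameter formula, in which P is the proportion of sites that differ by a transition (A<->G or C<->T) and Q is the proportion that differ by a transversion; the surviving text of the paper does not define P and Q, so that reading is an assumption. The function names are illustrative only.

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

# The two sequences given in Q 5 b).
SEQ1 = "AGCTGTTACTTTAATTCATCGTTTACCTCCATCTGGATACCTTATTGT"
SEQ2 = "GAGTGCTTCTTCGACAAAAACCACACGTCCATTTGGATCCCCTACTGC"


def count_differences(seq1, seq2):
    """Return (transitions, transversions) for two equal-length DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def observed_distance(seq1, seq2):
    """Proportion of sites that differ (the uncorrected p-distance)."""
    ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / len(seq1)


def kimura_distance(seq1, seq2):
    """Kimura two-parameter distance, assuming P = transitions/n, Q = transversions/n.

    math.log raises ValueError if the sequences are so divergent that
    1 - 2P - Q or 1 - 2Q is not positive.
    """
    ts, tv = count_differences(seq1, seq2)
    n = len(seq1)
    p = ts / n
    q = tv / n
    return 0.5 * math.log(1 / (1 - 2 * p - q)) + 0.25 * math.log(1 / (1 - 2 * q))


if __name__ == "__main__":
    ts, tv = count_differences(SEQ1, SEQ2)
    print("transitions:", ts, "transversions:", tv)
    print("observed distance:", round(observed_distance(SEQ1, SEQ2), 4))
    print("Kimura 2-parameter distance:", round(kimura_distance(SEQ1, SEQ2), 4))
```

The corrected distance comes out larger than the observed one because the correction estimates substitutions hidden by multiple changes at the same site.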