PastPaper_06 - THE UNIVERSITY OF HONG KONG Department of...

THE UNIVERSITY OF HONG KONG
Department of Biochemistry
Bachelor of Science in Bioinformatics: Final Examination (2005-2006)

BIOC1805 Elements of Bioinformatics

Date: 16th May, 2006 (Tuesday)
Time: 2:30 pm - 4:30 pm

Candidates may use any calculator which fulfils the following criteria:
(a) it should be self-contained, silent, battery-operated and pocket-sized; and
(b) it should have numeral-display facilities only and be used only for the purpose of calculation.

It is the candidate's responsibility to ensure that the calculator operates satisfactorily, and the candidate must record the name and type of the calculator on the front page of the examination scripts. Lists of permitted/prohibited calculators will not be made available to candidates for reference. The onus will be on the candidate to ensure that the calculator used will not be in violation of the criteria listed above.

Answer all questions. Each question carries the marks indicated.

Q 1. (12 marks)
a) Two of the most commonly used formats for storing single nucleic acid or amino acid sequences are the FASTA and GenBank/GenPept formats. Discuss the differences between the two formats and the advantages or disadvantages of one format over the other. (6)
b) What sequence formats might you use to store an alignment of several sequences? (2)
c) Do you consider one of the formats in your answer to (b) to be preferable to the others? Explain. (2)
d) What difference(s) is/are there if a file contains several unaligned sequences or several aligned sequences? (2)

Q 2. (12 marks)
a) A dot plot program may use a k-tuple (or word) based approach, or it may use a window and threshold (or stringency) approach. What is (are) the main difference(s) between the two approaches, and how would this (these) affect the speed of the program? (3)
b) For a k-tuple based method, would you use similar k-tuple sizes for amino acid and nucleotide sequences? Explain. (3)
c) What changes do you expect to see in a dot plot if you increase the k-tuple size when you are comparing (i) two relatively similar sequences or (ii) two relatively dissimilar sequences? (3)
d) Explain why threshold values are usually set higher than the window length when using log-odds style substitution matrices (e.g. window of length 10, threshold 23, for the BLOSUM62 substitution matrix). (3)

Q 3. (6 marks)
a) Discuss the advantages and disadvantages of the various ways of presenting the information in a multiple sequence alignment. (6)

Q 4. (8 marks)
a) Outline the differences between global and local alignment methods. (4)
b) Gaps in alignment programs are usually given a penalty (g_n) of the form g_n = c + nx, where n is the length of the gap. Why is this form used, and why are c and x usually very different in size? (4)

Q 5. (12 marks)
[Diagram: properties of the amino acids (overlapping sets, including Hydrophobic, Tiny, Aliphatic and Aromatic)]
a) Explain the meaning of the positive, zero and negative numbers that are found in the PAM and BLOSUM series of substitution matrices. (3)
b) With reference to the above diagram of the properties of the amino acids, what approximate values would you expect to see in a PAM or BLOSUM style log-odds substitution matrix for: isoleucine to valine; arginine to methionine; and threonine to asparagine? Explain. (6)
c) Why do the diagonal entries of these log-odds substitution matrices have different values? (3)

Q 6. (12 marks)
a) Explain the two-phase strategy used by programs like BLAST and FASTA when aligning a query sequence to a database sequence. (3)
b) Both BLAST and FASTA use a word concept as part of their algorithms. What types of sequences do you expect to match your query as you change the word size from large to small? (3)
c) Describe the purpose and main characteristics of a PROSITE pattern. (3)
d) What does it mean if a sequence is a "false-positive" or a "true-negative" match to a PROSITE pattern? (3)

Q 7. (5 marks)
Give the main methods of protein structure modelling, and explain how these methods are related to the level of similarity between a sequence that we wish to model and the sequences of proteins whose structure is known. (5)

Q 8. (5 marks)
When using computational methods to identify genes in a piece of eukaryotic genomic DNA, what characteristics would help to differentiate introns from exons? (5)

Q 9. (8 marks)
a) A colleague of yours said he would apply a correction to his observed value for the number of transitions, but not to the observed number of transversions, when calculating a distance between pairs of sequences. Give your comments on his approach. (5)
b) What are the key differences between distance matrix and character state methods for determining phylogenetic trees? (3)

Q 10. (20 marks)
Your colleagues have been discussing a protein sequence. They have been passing a file containing the sequence to each other and making notes on it. Now they want to be able to use the file in some programs you have written. However, the file is no longer in proper FASTA format. Write a Perl subroutine to successfully read the file and return the title and sequence (with all amino acids in upper case) to the calling routine. You may assume that the file, which is given below, has already been successfully opened. Ensure your code is properly commented, and indicate your intentions if you are unsure of the correct syntax.

    # Our very exciting protein!
    # I think the linker site should be that marked in lower case. (AW)
    # The region similar to protein H451 is marked by '*' (ND)
    # Possible binding sites are numbered (GK)
    >Protein XYZ (secretive name)
    MIWIPYCIKLTSNGGTVDQKCFSVEEIVdeppIGLNW # linker region (AW)
    TLLNISLTGIHADIQVR*WEPPPNA* DVQKGWIVLKYELQYKEVNESQWKMMDPVSA TSVPVYSLRLDKEYEVRVRSRQRNSEK 1YGEFS1 EALYVTL # Site 1 (GK)
    PQMSPFACEEDFQFP 2WFLIIIF2 GIFGLTMILFLFIFSKQQRIKMLILPP
    # Don't think I believe your site 2 - check Hu et al (ND)

------ End of Paper ------
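Q 2 contrasts word-based and window/threshold dot plots. A minimal Python sketch of both follows; the function names and the `identity` scoring function are hypothetical, and in practice `score` would more likely look up a substitution matrix such as BLOSUM62. The word-based version indexes the words of one sequence once and looks up each word of the other, so its work grows roughly with the sequence lengths plus the number of hits, whereas the window version scores every pair of window positions, so its work grows with the product of the lengths times the window size.

```python
def kmer_dotplot(seq1, seq2, k):
    """Word (k-tuple) dot plot: mark (i, j) when the k-letter words starting there are identical."""
    # Index every word of seq2 once, so each word of seq1 needs a single lookup.
    index = {}
    for j in range(len(seq2) - k + 1):
        index.setdefault(seq2[j:j + k], []).append(j)
    dots = set()
    for i in range(len(seq1) - k + 1):
        for j in index.get(seq1[i:i + k], []):
            dots.add((i, j))
    return dots


def window_dotplot(seq1, seq2, window, threshold, score):
    """Window/threshold dot plot: mark (i, j) when the summed window score reaches the threshold."""
    dots = set()
    # Every pair of window start positions is scored, which is what makes this approach slower.
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            total = sum(score(seq1[i + n], seq2[j + n]) for n in range(window))
            if total >= threshold:
                dots.add((i, j))
    return dots


def identity(a, b):
    """Hypothetical scoring function: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


print(sorted(kmer_dotplot("ACGTACGT", "ACGTACGT", 4)))
print(sorted(window_dotplot("ACGT", "ACGT", 2, 2, identity)))
```

Comparing "ACGTACGT" with itself using k = 4 marks the main diagonal plus the two off-diagonal dots (0, 4) and (4, 0), which come from the repeated word ACGT.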
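For Q 4 a), the difference between global and local alignment can be seen in a single dynamic-programming routine. The sketch below is a simplified Needleman-Wunsch (global) and Smith-Waterman (local) scorer; the linear gap penalty and the default match, mismatch and gap scores are illustrative assumptions, and it returns only the best score, not the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of sequences a and b by dynamic programming.

    local=False: global (Needleman-Wunsch style), both sequences aligned end to end.
    local=True:  local (Smith-Waterman style), best-scoring pair of subsequences.
    A linear gap penalty is used for brevity.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised along the first row and column.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if local:
                # Local: scores never drop below zero, so a poor region can be abandoned.
                cell = max(cell, 0)
                best = max(best, cell)
            F[i][j] = cell
    return best if local else F[-1][-1]


print(alignment_score("TTTTGATTACA", "GATTACA"))
print(alignment_score("TTTTGATTACA", "GATTACA", local=True))
```

Aligning "GATTACA" against "TTTTGATTACA" shows the difference: the global score is -1 because the four unmatched leading T's must be aligned to gaps, while the local score is 7 because the flanking region is simply left out.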
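The arithmetic behind Q 4 b) can be checked in a few lines of Python. The values c = 10 and x = 0.5 are illustrative only. With a large opening cost c and a small per-residue cost x, one gap of length 6 costs 13, while three separate gaps of length 2 cost 33 in total, so the scheme favours a single insertion or deletion event over several scattered ones.

```python
def gap_penalty(n, c=10.0, x=0.5):
    """Penalty g_n = c + n*x for a gap of length n; c and x are illustrative values."""
    if n <= 0:
        return 0.0  # no gap, no penalty
    return c + n * x


one_long_gap = gap_penalty(6)          # 10 + 6 * 0.5 = 13.0
three_short_gaps = 3 * gap_penalty(2)  # 3 * (10 + 2 * 0.5) = 33.0
print(one_long_gap, three_short_gaps)
```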
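The sign conventions asked about in Q 5 a) and c) follow from the usual definition of a log-odds score, written here in general form (the scaling factor λ and the rounding to integers differ between matrix series):

```latex
s_{ab} = \frac{1}{\lambda} \log \frac{q_{ab}}{p_a \, p_b}
```

Here q_{ab} is the frequency with which residues a and b are found aligned in related sequences, and p_a and p_b are their background frequencies. A positive score means the pair is seen more often than expected by chance, zero means as often as chance, and negative means less often. On the diagonal the score is s_{aa} = (1/λ) log(q_{aa} / p_a^2), so a residue that is rare and seldom substituted receives a larger identity score than a common, easily substituted one.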
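To illustrate the PROSITE pattern syntax in Q 6 c), here is a minimal Python sketch that converts a pattern into a regular expression. It covers only the common elements (x, [...], {...}, repeat counts and the terminal anchors), and the function name is hypothetical. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression (sketch).

    Covers x, [...], {...}, repeat counts such as (3) or (2,4), and the
    '<' / '>' terminal anchors. '>' inside square brackets is not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        start = "^" if element.startswith("<") else ""
        end = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            regex = "."                      # any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # single residue or [...] class
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(start + regex + end)
    return "".join(parts)


glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glyco)
print(bool(re.search(glyco, "MKNGSA")), bool(re.search(glyco, "MKNPSA")))
```

A short pattern like this one will match many proteins purely by chance, so a hit still needs independent evidence before it is trusted; this is the territory of the false-positive question in Q 6 d).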
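Q 9 a) turns on the fact that multiple substitutions can hide transversions as well as transitions. One standard model that corrects both observed proportions is Kimura's two-parameter (K2P) model, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the observed proportions of transition and transversion differences. The Python sketch below is a minimal illustration; it assumes two already aligned DNA sequences and simply skips gaps and ambiguous bases.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped. Raises ValueError
    (math domain error) when the sequences are too divergent for the model.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    P = transitions / compared
    Q = transversions / compared
    # Both observed proportions enter the correction, not just P.
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


print(k2p_distance("AAAAAAAAAA", "CAAAAAAAAA"))
```

With ten sites and a single transversion (Q = 0.1), the corrected distance comes out at about 0.108 rather than the observed 0.1, so leaving transversions uncorrected would underestimate the distance.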
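Q 10 asks for a Perl subroutine. For comparison, the same parsing logic might look like this in Python. The rules it applies are assumptions read off the sample file rather than any specification: '#' starts a comment that runs to the end of the line, the title follows '>', and every non-letter in a sequence line (spaces, '*', site numbers) is mark-up to discard. The function name is hypothetical.

```python
import io
import re


def read_annotated_fasta(handle):
    """Return (title, sequence) from an already opened, annotated FASTA-like file.

    Assumed rules, read off the sample file in Q 10:
    - '#' starts a comment that runs to the end of the line;
    - the title is the text after '>';
    - in sequence lines, anything that is not a letter (spaces, '*',
      site numbers) is mark-up and is discarded.
    """
    title = None
    pieces = []
    for line in handle:
        line = line.split("#", 1)[0].strip()  # remove comments and whitespace
        if not line:
            continue                          # blank or comment-only line
        if line.startswith(">"):
            title = line[1:].strip()
        elif title is not None:               # ignore anything before the title line
            pieces.append(re.sub(r"[^A-Za-z]", "", line))
    return title, "".join(pieces).upper()


sample = io.StringIO(
    "# Possible binding sites are numbered (GK)\n"
    ">Protein XYZ (secretive name)\n"
    "MIWIPYCIKLTSNGGTVDQKCFSVEEIVdeppIGLNW # linker region (AW)\n"
    "PQMSPFACEEDFQFP 2WFLIIIF2 GIFGLTMILFLFIFSKQQRIKMLILPP\n"
)
title, sequence = read_annotated_fasta(sample)
print(title)
print(sequence)
```

Stripping every non-letter is a deliberately blunt rule. It would also remove a character such as a terminal '*' stop symbol, which seems acceptable here because the notes use '*' only as mark-up.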