PastPaper_08 - THE UNIVERSITY OF HONG KONG Department of...

THE UNIVERSITY OF HONG KONG
Department of Biochemistry
Bachelor of Science in Bioinformatics: Final Examination (2007-2008)
BIOC1805 Elements of Bioinformatics

Date: 6th May, 2008 (Tuesday)
Time: 9:30 am - 11:30 am

Candidates may use any calculator which fulfils the following criteria: (a) it should be self-contained, silent, battery-operated and pocket-sized; and (b) it should have numeral-display facilities only and be used only for the purpose of calculation.

It is the candidate's responsibility to ensure that the calculator operates satisfactorily and the candidate must record the name and type of the calculator on the front page of the examination scripts. Lists of permitted/prohibited calculators will not be made available to candidates for reference. The onus will be on the candidate to ensure that the calculator used will not be in violation of the criteria listed above.

Answer all questions. Each question carries the marks indicated.

Q 1. (15 marks)
a) You wrote a program that
   1. read a sequence file obtained from the NCBI in GenPept format;
   2. called readseq to convert that file to FastA format;
   3. subsequently used readseq again to convert the FastA format file to GenPept format.
   What is(are) the difference(s) between the original and final GenPept format files?
b) A gene sequence was downloaded, in default format, from the EMBL databank of the EBI and from GenBank of the NCBI. What difference(s) do you expect between the two files?
c) A colleague wants to store his multiple sequence alignment in a GenPept format file. Explain why this is possible. Is this the best format to store a multiple sequence alignment? Give reasons for your answer.

Q 2. (15 marks)
You are using a dot-plot to compare two homologous nucleic acid sequences that code for a protein. You know that, although the corresponding amino acid sequences are very similar, there are many synonymous substitutions between the sequences.
a) If you use the k-tuple method, what consequences do you expect for k-tuple sizes of 2, 4 and 8?
b) Explain whether a window and threshold/stringency approach is better.
c) What window size and threshold/stringency do you expect to give the best results for that method? Why?

Q 3. (10 marks)
a) Clustal uses either a k-tuple based alignment or a dynamic programming based method for the initial all-pair-wise alignment phase. What is(are) the main difference(s) between these two methods for pair-wise sequence alignment?
b) Explain the differences among global, semi-global and local pair-wise alignments.

Q 4. (10 marks)
a) Give 3 methods that are commonly used to represent the information in a multiple sequence alignment. Explain their main advantages and disadvantages for representing the alignment.
b) What are the main differences between the "block" and "gap" approaches to multiple sequence alignment?

Q 5. (10 marks)
a) Explain the meaning of the positive, zero and negative values in the PAM and BLOSUM series of substitution matrices.
b) What are the main differences in the way that the PAM and BLOSUM series of matrices are derived?

Q 6. (10 marks)
A distance matrix determined from the nucleotide sequences of 5 taxa is given, in lower-triangular form, below.

         A    B    C    D
    B   15
    C   14   16
    D   10   16   14
    E   16    6   16   16

a) Calculate and draw the UPGMA tree relating these sequences.
b) Give the Newick format version of this tree, including the branch lengths.

Q 7. (10 marks)
a) What are the main uses of each of the three nucleotide BLAST programs (Megablast, Discontiguous Megablast and BlastN)?
b) Why is the expect score "E" used to assess the quality of a match between a query sequence and a sequence in a large databank (such as GenBank)?
c) What do E values of 10 and 0.001 mean in terms of a BLAST sequence database search?

Q 8. (10 marks)
You have found an open reading frame in a piece of genomic sequence. How would you test computationally to see if it was likely to belong to a protein coding gene if it was from
a) a prokaryotic genome
b) a eukaryotic genome?

Q 9. (10 marks)
a) Give the principle behind the three main methods for protein structure modelling. Explain the level of sequence similarity between target and template appropriate for each method.
b) How does the apparently limited number of natural protein folds assist modelling projects?

End of Paper
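For readers unfamiliar with the window and threshold/stringency dot-plot named in Q 2, the following is a minimal sketch of the general idea, not a model answer and not taken from the paper. The function name dot_plot, its parameters, and the two six-base example sequences are all hypothetical choices made for illustration. A dot is recorded at (i, j) when the windows starting at those positions share at least `threshold` identical bases; setting the threshold equal to the window size reduces the method to exact k-tuple matching.

```python
def dot_plot(seq_a, seq_b, window=3, threshold=2):
    """Return the set of (i, j) start positions whose windows of length
    `window` share at least `threshold` identical positions.

    Hypothetical helper for illustration only.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count position-by-position identities inside the two windows.
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= threshold:
                dots.add((i, j))
    return dots


# Made-up example: two codon pairs that differ only at the third position.
seq_a = "GCTGAA"
seq_b = "GCCGAG"

# threshold == window is equivalent to exact 3-tuple matching.
strict = dot_plot(seq_a, seq_b, window=3, threshold=3)

# Allowing one mismatch per window tolerates the third-position changes.
relaxed = dot_plot(seq_a, seq_b, window=3, threshold=2)

print(sorted(strict))
print(sorted(relaxed))
```

Comparing the two printed sets shows how relaxing the threshold changes which positions on the main diagonal are reported for these particular toy sequences.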