Course: CS 6890, Fall 2008
School: Utah State
Similar Sequence, Similar Function
Charles Yan
Spring 2006

From Sequence to Function
Protein sequence determines protein function. Thus, similar protein sequences have similar functions.
One approach to predicting the function of a new protein is to search for similar proteins (homologues) whose functions are known. If the similarity is high, it is likely that the new protein has the same functions as its homologues.

Homologue Search
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families.

Dynamic Programming
Given two sequences $a_1 a_2 a_3 \ldots a_m$ and $b_1 b_2 b_3 \ldots b_n$, each cell of the score matrix $M$ is computed as

$$
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S_{i,j} & \text{(match/mismatch)} \\
M_{i,j-1} + w & \text{(gap in sequence \#1)} \\
M_{i-1,j} + w & \text{(gap in sequence \#2)}
\end{cases}
$$

Dynamic Programming
Sequence #1: G A A T T C A G T T A
Sequence #2: G G A T C G A
$S_{i,j} = 1$ (match score)
$S_{i,j} = 0$ (mismatch score)
$w = 0$ (gap penalty)

Dynamic Programming
$$
M_{1,1} = \max[\,M_{0,0} + 1,\; M_{1,0} + 0,\; M_{0,1} + 0\,] = \max[\,1, 0, 0\,] = 1
$$

Global and Local Alignment
A global alignment is an optimal alignment that includes all characters from each sequence, whereas a local alignment is an optimal alignment that includes only the most similar local region or regions.

BLAST
The BLAST programs (Basic Local Alignment Search Tools) are a set of sequence comparison algorithms, introduced in 1990, that are used to search sequence databases for optimal local alignments to a query.
BLAST breaks the query and database sequences into fragments ("words") and initially seeks matches between fragments.
The initial search is done for a word of length "W" that scores at least "T" when compared to the query using a given substitution matrix.
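The word-seeding step described above can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's actual implementation: it compares every query word with every database word by brute force, whereas the real programs use precomputed lookup structures. The function names (`toy_score`, `split_into_words`, `find_word_hits`) and the identity-based scoring function are assumptions made for this example; a real search would score residue pairs with a substitution matrix such as BLOSUM 62.

```python
from typing import Callable, List, Tuple


def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a substitution matrix: 1 if identical, else 0."""
    return 1 if a == b else 0


def split_into_words(seq: str, w: int) -> List[Tuple[int, str]]:
    """Return every overlapping word of length w together with its start offset."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def find_word_hits(
    query: str,
    subject: str,
    w: int,
    t: int,
    score: Callable[[str, str], int] = toy_score,
) -> List[Tuple[int, int, int]]:
    """Return (query_offset, subject_offset, word_score) for word pairs scoring >= t.

    Brute-force sketch: every query word is compared with every subject word.
    """
    hits = []
    for q_pos, q_word in split_into_words(query, w):
        for s_pos, s_word in split_into_words(subject, w):
            word_score = sum(score(a, b) for a, b in zip(q_word, s_word))
            if word_score >= t:
                hits.append((q_pos, s_pos, word_score))
    return hits


if __name__ == "__main__":
    # With W = 3 and T = 3 only exact word matches are reported as seeds.
    print(find_word_hits("GATTACA", "TTACG", w=3, t=3))
    # Lowering T admits near matches too: more seeds, so a slower but more sensitive search.
    print(find_word_hits("GATTACA", "TTACG", w=3, t=2))
```

Lowering `t` from 3 to 2 in this toy case adds one extra seed, which illustrates the speed and sensitivity trade-off controlled by the "T" parameter.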
Word hits are then extended in either direction in an attempt to generate an alignment with a score exceeding the threshold "S".
The "T" parameter dictates the speed and sensitivity of the search.

BLAST
Web interface: http://www.ncbi.nlm.nih.gov/BLAST/
Download: http://www.ncbi.nlm.nih.gov/BLAST/download.shtml

Substitution Matrix
A substitution matrix contains values proportional to the probability that amino acid i mutates into amino acid j, for all pairs of amino acids.

Substitution Matrix: The BLOSUM family
BLOSUM matrices are based on local alignments.
BLOSUM 62 is a matrix calculated from comparisons of sequences sharing no more than 62% identity (more similar sequences are clustered and counted as one).
All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
BLOSUM 62 is the default matrix in BLAST 2.0. Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships. A search for distant relatives may be more sensitive with a different matrix.

Substitution Matrix: The PAM family
PAM matrices are based on global alignments of closely related proteins.
PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence.
Other PAM matrices are extrapolated from PAM1.

Substitution Matrix: The relationship between BLOSUM and PAM substitution matrices
BLOSUM matrices with high numbers and PAM matrices with low numbers are both designed for comparisons of closely related sequences.
BLOSUM matrices with low numbers and PAM matrices with high numbers are designed for comparisons of distantly related proteins.
If distant relatives of the query sequence are specifically being sought, the matrix can be tailored to that type of search.

Raw Score S
The raw score S for an alignment is calculated by summing the scores for each aligned position and the scores for gaps.

Bit Score S'
Raw scores have little meaning without detailed knowledge of the scoring system used, or more simply its statistical parameters K and lambda. Unless the scoring system is understood, citing a raw score alone is like citing a distance without specifying feet, meters, or light years. By normalizing a raw score using the formula

$$
S' = \frac{\lambda S - \ln K}{\ln 2}
$$

one attains a "bit score" S', which has a standard set of units.

Bit Score S'
The value S' is derived from the raw alignment score S, taking into account the statistical properties of the scoring system used. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.

Significance
The significance of each alignment is computed as a P value or an E value.
E value: Expectation value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score.
P value: The probability of an alignment occurring with the score in question or better. The P value is calculated by relating the observed alignment score, S, to the expected distribution of HSP scores from comparisons of random sequences of the same length and composition as the query to the database. The most highly significant P values will be those close to 0.
P values and E values are different ways of representing the significance of the alignment.

E-value
In the limit of sufficiently large sequence lengths m and n, the statistics of HSP scores are characterized by two parameters, K and lambda. Most simply, the expected number of HSPs with score at least S is given by the formula

$$
E = K m n \, e^{-\lambda S} \qquad (1)
$$

We call this the E-value for the score S. This formula makes eminently intuitive sense.
Doubling the length of either sequence should double the number of HSPs attaining a given score. Also, for an HSP to attain the score 2x it must attain the score x twice in a row, so one expects E to decrease exponentially with score. The parameters K and lambda can be thought of simply as natural scales for the search space size and the scoring system respectively.

P-value
The number of random HSPs with score >= S is described by a Poisson distribution. This means that the probability of finding exactly a HSPs with score >= S is given by

$$
e^{-E} \frac{E^{a}}{a!}
$$

where E is the E-value of S given by equation (1) above. Specifically, the chance of finding zero HSPs with score >= S is $e^{-E}$, so the probability of finding at least one such HSP is

$$
P = 1 - e^{-E}
$$

This is the P-value associated with the score S. For example, if one expects to find three HSPs with score >= S, the probability of finding at least one is 0.95. The BLAST programs report E-values rather than P-values because it is e...
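
The relationships among the raw score, bit score, E-value, and P-value described in the slides above can be checked numerically with a short Python sketch. The formulas used are the standard forms that the text describes: $S' = (\lambda S - \ln K)/\ln 2$, $E = K m n e^{-\lambda S}$, and $P = 1 - e^{-E}$. The function names and the values chosen for lambda and K are placeholders for illustration only; real values depend on the scoring system in use.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score S into a bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of HSPs with score >= S: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def poisson_hsp_probability(a: int, e: float) -> float:
    """Probability of finding exactly a HSPs with score >= S, given E-value e."""
    return math.exp(-e) * e ** a / math.factorial(a)


def p_value(e: float) -> float:
    """Probability of finding at least one HSP with score >= S: P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e)


if __name__ == "__main__":
    # Placeholder parameters, chosen only for illustration.
    LAM, K = 0.3, 0.1
    S, M, N = 50.0, 100, 1000

    e = e_value(S, M, N, LAM, K)
    print("bit score:", bit_score(S, LAM, K))
    print("E-value:", e)
    print("P-value:", p_value(e))

    # Doubling the length of either sequence doubles the E-value.
    print("E with doubled m:", e_value(S, 2 * M, N, LAM, K))

    # The example from the text: expecting three HSPs gives P of about 0.95.
    print("P for E = 3:", p_value(3.0))
```

Because the bit score absorbs K and lambda, the same E-value can also be written as $E = m n \, 2^{-S'}$, which is why bit scores from different scoring systems are comparable.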
