proj

Course: CSE 330, Fall 2009
School: UPenn

CIS 330 Project: First Handout
September 4, 2008

As Web 2.0 continues to grow, we see more and more community-driven information-sharing websites. News-sharing sites have become especially popular; two examples are Digg.com (established in 2004) and newsvine.com (established in 2006). These sites allow users to share, comment on, and vote on what they find to be the most interesting news across the web, with no editorial censorship behind the scenes.

In this project, you will create a servlet-based application that allows registered users to share links to their favorite news stories, articles, or videos, and that publishes the most popular pieces (as voted on by the majority of users) on the front page of your site. Your application should provide the following basic elements:

1. The application must allow users to register and maintain basic information such as name, age, location, and profession. Each registered user must be allowed to share URLs and vote on those they find interesting.
2. Each shared URL may be either private (seen only by friends) or public (seen by anyone), and must be unique across the list of all links. Along with each URL, you should keep track of its title, the date and time it was posted, the category it belongs to, the number of votes it currently has, and some descriptive content about the piece.
3. Only registered users may share and vote on site content, but guest users should be allowed to search.
4. Users should be allowed to comment on the pieces. For extra credit, implement threaded comments, so that users may reply to individual comments.
5. Each registered user has a friends list composed of other registered users. So that users can see what their friends have been doing, RSS feeds should be generated containing the latest activity of each friend (i.e., what they have posted, commented on, or voted on). The feeds should be readable by any common feed reader.
6. You should provide basic search capabilities that allow users to search pieces by popularity, category, time, and keyword. We expect keyword searches to use inverted indices implemented over relational tables.

In addition to the elements above, which are found in most news sites, you should implement two extra features:

When users search the database for posted URLs, the results may contain outdated or nonexistent links (for example, a link to a video may have been taken down for violating copyright rules). Hence, every so often (e.g., weekly or monthly), you will need to crawl the posted links to check whether they are still active. Inactive links, and references to them, must be removed from the database.

Most websites (like Digg and newsvine) provide a recommendation system that suggests stories users may like, based on their past voting or contributing activity. The algorithms for deciding what should or should not be recommended to a user vary across websites (some can be very complex), but here we will provide a combination of results to the user: for every piece that a user has shared, voted on, or commented on, find other users who have also voted on or commented on it, and rec...
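Requirements 2 and 3 reduce to a few invariants: a URL may be shared only once, only registered users may share or vote, and a private link is visible only to the poster's friends. The following is a minimal in-memory sketch of those checks in Java (chosen because the project is servlet-based). The class names `SharedLink` and `LinkStore`, the one-vote-per-user rule, and letting a poster see their own private links are assumptions for illustration; in the actual project this state would live in relational tables, with uniqueness enforced by a UNIQUE constraint on the URL column.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// One shared piece (requirement 2). Field names are hypothetical.
class SharedLink {
    final String url;
    final String title;
    final String category;
    final String description;
    final String owner;            // registered user who posted the link
    final boolean isPrivate;       // private links are visible only to the owner's friends
    final LocalDateTime postedAt;
    int votes;                     // current vote count

    SharedLink(String url, String title, String category, String description,
               String owner, boolean isPrivate, LocalDateTime postedAt) {
        this.url = url;
        this.title = title;
        this.category = category;
        this.description = description;
        this.owner = owner;
        this.isPrivate = isPrivate;
        this.postedAt = postedAt;
    }
}

// Hypothetical in-memory stand-in for the users, friends, links, and votes tables.
class LinkStore {
    private final Set<String> registeredUsers = new HashSet<>();
    private final Map<String, Set<String>> friends = new HashMap<>();    // user -> friends list
    private final Map<String, SharedLink> linksByUrl = new HashMap<>();  // URL is the unique key
    private final Map<String, Set<String>> votersByUrl = new HashMap<>();

    void register(String user) {
        registeredUsers.add(user);
        friends.putIfAbsent(user, new HashSet<>());
    }

    void addFriend(String user, String friend) {
        friends.computeIfAbsent(user, k -> new HashSet<>()).add(friend);
    }

    // Rejects guests (requirement 3) and URLs that were already shared (requirement 2).
    boolean share(SharedLink link) {
        if (!registeredUsers.contains(link.owner)) return false;
        return linksByUrl.putIfAbsent(link.url, link) == null;
    }

    // Rejects guests, unknown URLs, and repeat votes (one vote per user is an assumption).
    boolean vote(String user, String url) {
        SharedLink link = linksByUrl.get(url);
        if (link == null || !registeredUsers.contains(user)) return false;
        if (!votersByUrl.computeIfAbsent(url, k -> new HashSet<>()).add(user)) return false;
        link.votes++;
        return true;
    }

    // A null viewer is a guest, who sees only public links.
    boolean canView(String viewer, SharedLink link) {
        if (!link.isPrivate) return true;
        if (viewer == null) return false;
        return viewer.equals(link.owner)
                || friends.getOrDefault(link.owner, Set.of()).contains(viewer);
    }
}
```

URLs are compared here as exact strings; a real implementation might normalize them (for example, lower-casing the host name) before checking uniqueness.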
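For the extra-credit threaded comments in requirement 4, one common approach is to store each comment with the id of the comment it replies to and rebuild the tree when the page is rendered. A small sketch, assuming hypothetical `Comment` and `CommentThreads` classes, a parent id of 0 for top-level comments, and an input list already in posting order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One comment row; parentId == 0 marks a top-level comment on the piece (assumed convention).
class Comment {
    final int id;
    final int parentId;
    final String author;
    final String text;

    Comment(int id, int parentId, String author, String text) {
        this.id = id;
        this.parentId = parentId;
        this.author = author;
        this.text = text;
    }
}

class CommentThreads {
    // Groups comments by parent, then prints each thread depth-first, indenting replies.
    static String render(List<Comment> comments) {
        Map<Integer, List<Comment>> repliesByParent = new HashMap<>();
        for (Comment c : comments) {
            repliesByParent.computeIfAbsent(c.parentId, k -> new ArrayList<>()).add(c);
        }
        StringBuilder out = new StringBuilder();
        appendReplies(repliesByParent, 0, 0, out);
        return out.toString();
    }

    private static void appendReplies(Map<Integer, List<Comment>> repliesByParent,
                                      int parentId, int depth, StringBuilder out) {
        for (Comment c : repliesByParent.getOrDefault(parentId, List.of())) {
            out.append("  ".repeat(depth)).append(c.author).append(": ").append(c.text).append('\n');
            appendReplies(repliesByParent, c.id, depth + 1, out);
        }
    }
}
```

With this layout the comments table needs only one extra column referencing a parent comment; the recursion happens in the application, so no recursive SQL is required.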

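Requirement 5 only says the feeds must be readable by common feed readers; RSS 2.0 is one widely supported format that satisfies this. The sketch below builds an RSS 2.0 document from friends' recent activity. The `Activity` and `FriendFeed` classes and the channel URL are hypothetical, and a real servlet would still have to choose how many items to include.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

// One entry of a friend's recent activity (hypothetical shape).
class Activity {
    final String friend;
    final String action;      // e.g. "posted", "commented on", "voted on"
    final String pieceTitle;
    final String pieceUrl;
    final ZonedDateTime when;

    Activity(String friend, String action, String pieceTitle, String pieceUrl, ZonedDateTime when) {
        this.friend = friend;
        this.action = action;
        this.pieceTitle = pieceTitle;
        this.pieceUrl = pieceUrl;
        this.when = when;
    }
}

class FriendFeed {
    // Escapes the five XML special characters; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Builds an RSS 2.0 document; channelLink is a hypothetical URL of the user's friends page.
    static String toRss(String userName, String channelLink, List<Activity> activities) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\">\n<channel>\n");
        xml.append("  <title>").append(escape("Friend activity for " + userName)).append("</title>\n");
        xml.append("  <link>").append(escape(channelLink)).append("</link>\n");
        xml.append("  <description>Latest posts, comments and votes by friends</description>\n");
        for (Activity a : activities) {
            xml.append("  <item>\n");
            xml.append("    <title>")
               .append(escape(a.friend + " " + a.action + " " + a.pieceTitle)).append("</title>\n");
            xml.append("    <link>").append(escape(a.pieceUrl)).append("</link>\n");
            // RFC 1123 dates (e.g. "Thu, 4 Sep 2008 12:00:00 GMT") are what feed readers expect.
            xml.append("    <pubDate>")
               .append(a.when.format(DateTimeFormatter.RFC_1123_DATE_TIME)).append("</pubDate>\n");
            xml.append("  </item>\n");
        }
        xml.append("</channel>\n</rss>\n");
        return xml.toString();
    }
}
```

In a servlet, the string would be written to the response after calling `response.setContentType("application/rss+xml")`. Escaping matters because a user-supplied title such as "Tom & Jerry" would otherwise produce malformed XML that feed readers reject.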
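Requirement 6 expects keyword search through an inverted index stored in relational tables, that is, a table with one row per (term, link) pair. The sketch below simulates that table in memory so the logic runs without a database. The table name `keyword_index`, the tokenization rule (lower-case, split on non-alphanumeric characters), and AND semantics for multi-word queries are assumptions; the comments show the SQL each method would roughly correspond to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class KeywordIndex {
    // term -> ids of links containing it. Each (term, linkId) pair stands for one row of a
    // hypothetical table keyword_index(term, link_id) with PRIMARY KEY (term, link_id).
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lower-cases the text and splits it on any run of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Roughly: INSERT INTO keyword_index VALUES (term, linkId) for each distinct term.
    void addLink(int linkId, String title, String description) {
        for (String term : tokenize(title + " " + description)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(linkId);
        }
    }

    // Links containing every query term, roughly:
    //   SELECT link_id FROM keyword_index WHERE term IN (...)
    //   GROUP BY link_id HAVING COUNT(DISTINCT term) = <number of distinct query terms>
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Roughly: DELETE FROM keyword_index WHERE link_id = ? (e.g. after a dead-link crawl).
    void removeLink(int linkId) {
        postings.values().forEach(ids -> ids.remove(linkId));
    }
}
```

Keeping the index as ordinary rows means the database's own B-tree index on `term` does the fast lookup, which is presumably why the handout asks for inverted indices over relational tables rather than a separate search engine.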

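For the first extra feature, a periodic crawler can send a lightweight HTTP HEAD request for each stored URL and flag the links the server reports as missing. This sketch uses the `java.net.http` client (Java 11 or later). Treating only 404 and 410 responses as dead, and keeping a link when the network fails, is an assumed policy, not something the handout specifies; `LinkChecker` is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class LinkChecker {
    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Assumed policy: only 404 (Not Found) and 410 (Gone) mean the link is dead;
    // other errors (e.g. 500, 503) may be temporary, so the link survives until the next crawl.
    static boolean isDead(int statusCode) {
        return statusCode == 404 || statusCode == 410;
    }

    // Sends a HEAD request and reports whether the link should be removed.
    boolean checkDead(String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .method("HEAD", HttpRequest.BodyPublishers.noBody())
                    .timeout(Duration.ofSeconds(10))
                    .build();
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return isDead(response.statusCode());
        } catch (IllegalArgumentException e) {
            return true;   // malformed or non-HTTP URL: it can never be fetched
        } catch (IOException e) {
            return false;  // network trouble: status unknown, retry on the next crawl
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns the URLs whose rows (and references to them) should be deleted.
    List<String> findDeadLinks(List<String> urls) {
        List<String> dead = new ArrayList<>();
        for (String url : urls) {
            if (checkDead(url)) dead.add(url);
        }
        return dead;
    }
}
```

A weekly run could be scheduled with a `ScheduledExecutorService` started when the web application loads. Some servers reject HEAD requests with status 405, so a more thorough checker might retry with GET before deciding.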
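The handout's description of the recommender is cut off after its first step, so only that step is grounded here: for each piece the user shared, voted on, or commented on, collect the other users who voted on or commented on it. The ranking step in this sketch, which scores every other piece by how many of those users also voted on or commented on it, is an assumed continuation for illustration and not the handout's specified algorithm. `Recommender` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Recommender {
    // Hypothetical interaction tables, keyed by piece id.
    private final Map<Integer, String> sharedBy = new HashMap<>();               // piece -> poster
    private final Map<Integer, Set<String>> votedOrCommented = new HashMap<>();  // piece -> users

    void share(String user, int piece) {
        sharedBy.put(piece, user);
    }

    void voteOrComment(String user, int piece) {
        votedOrCommented.computeIfAbsent(piece, k -> new HashSet<>()).add(user);
    }

    // Pieces the user has shared, voted on, or commented on.
    Set<Integer> touchedBy(String user) {
        Set<Integer> pieces = new HashSet<>();
        sharedBy.forEach((piece, poster) -> { if (poster.equals(user)) pieces.add(piece); });
        votedOrCommented.forEach((piece, users) -> { if (users.contains(user)) pieces.add(piece); });
        return pieces;
    }

    List<Integer> recommend(String user, int limit) {
        Set<Integer> mine = touchedBy(user);
        // Step from the handout: other users who voted on or commented on the same pieces.
        Set<String> similarUsers = new HashSet<>();
        for (int piece : mine) {
            similarUsers.addAll(votedOrCommented.getOrDefault(piece, Set.of()));
        }
        similarUsers.remove(user);
        // Assumed continuation: score each unseen piece by how many similar users touched it.
        Map<Integer, Integer> score = new HashMap<>();
        votedOrCommented.forEach((piece, users) -> {
            if (mine.contains(piece)) return;
            int count = 0;
            for (String u : users) {
                if (similarUsers.contains(u)) count++;
            }
            if (count > 0) score.put(piece, count);
        });
        List<Integer> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a)); // higher score first
            return byScore != 0 ? byScore : Integer.compare(a, b);     // then smaller id
        });
        return ranked.subList(0, Math.min(limit, ranked.size()));
    }
}
```

In the database version, the first step is a self-join of the votes and comments tables on piece id, which keeps the work inside SQL instead of loading every interaction into memory.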