Searching the Workplace Web
Ronald Fagin, Ravi Kumar, Kevin S. McCurley, Jasmine Novak, D. Sivakumar, John A. Tomlin, David P. Williamson
Presentation by Na Dai

Motivation
- Influence of social forces on the Internet vs. the intranet
  - Reflections
  - Development guidance
  - Different measures of success
- Little research on intranet search
  - Corporations don't want to expose their intranets
  - Limited privileges to access intranets
- Internet and intranet search share a great deal, but we need to consider the unique characteristics of intranet search!

Core Ideas
- Focus: the intranet web search ranking problem
- Characteristics: robustness and flexibility
  - Decouple the ranking process
  - Selection of ranking heuristics
  - Synthesis of ranking methods

Comparison of Nature: Internet vs. Intranet
- Axiom 1. Intranet documents are often created for simple dissemination of information, rather than to attract and hold the attention of any specific group of users.
- Axiom 2. A large fraction of queries tend to have a small set of correct answers (often unique), and the unique answer pages do not usually have any special characteristics.
- Axiom 3. Intranets are essentially spam-free.
- Axiom 4. Large portions of intranets are not search-engine-friendly.

Difference in Graph Structures: Internet vs. Intranet (1)
- Generalization: heterogeneous
  - Diverse: 7000 hosts
  - All kinds of commercial web servers
- Specialty: Lotus Domino servers
- Estimated: 50M URLs
- Crawled: 20M URLs
- After deleting duplicate links: 4.6M URLs; 3.4M pages (containing anchortext)

Difference in Graph Structures: Internet vs. Intranet (2)
- Indegree and outdegree distributions
- Connectivity properties
  - Internet: SCC 30%, IN 25%, OUT 25%
  - IBM intranet: SCC 10%, OUT large

System Architecture
- Crawler: URLs; metadata; canonicalization ranking
- Global component: query-independent ranking tables
- Duplicate elimination component: uses shingles, groups duplicates, and selects one favorite for indexing
- Inverted index engine
  - Primary copies of documents
  - Multiple separate indices: content, title, anchortext
  - Separate dictionaries
- Query runtime system
- Result markup and presentation system

Rank Aggregation
- Aim: minimize the total number of inversions
- Kendall tau distance: $K(\sigma, \tau)$ = number of unordered pairs $\{k, l\}$ that the rankings $\sigma$ and $\tau$ place in opposite order
- MC4 (Markov chains)

Experiment
- Query sets
  - Q1: the 131 most popular queries (broad topic, single-word, directed towards "hubs")
  - Q2: 82 queries with median frequency (specific, longer, "typical" queries)
- What are the correct answers for the queries?
  - Existing search engine + good old browsing
  - Locate a seed answer, then refine it by browsing
  - Ambiguous? Multiple answers.

Evaluation Criteria
- Recall at position p
- Similarity: Kmin

Experiment Results (1)
- Information in anchortext, document titles, keyword descriptors and metadata is valuable for intranet search
- Building separate indices is effective
- Information from indices is query-independent.

Experiment Results (2)
- For different types of queries, different heuristics perform differently. (Classifier? Learn from logs)
- Many auxiliary heuristics are quite useful

Experiment Results (3)
- 7 auxiliary ranking heuristics are much more closely aligned with the ranking based on the title index
- Apart from Da, the auxiliary ranking heuristics are dissimilar

Conclusion
- Differences between intranet and Internet search
  - Queries
  - Notion of "good answers"
  - The social processes that create the intranet vs. those that create the Internet
- Rank aggregation: flexible and modular
  - Follows a "plug and play" model
  - Combines multiple ranking heuristics

Thank you!
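The Kendall tau distance named on the Rank Aggregation slide can be illustrated with a minimal Python sketch. It assumes every ranking is a full ordering of the same items, which is a simplification: the slides do not say how partial or top-k lists are handled, and this is not the MC4 Markov-chain method. The function names, index names and sample rankings are all hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(sigma, tau):
    """Count unordered pairs {k, l} that sigma and tau place in opposite order.

    Assumption: both rankings are full orderings of the same distinct items.
    """
    if len(set(sigma)) != len(sigma) or set(sigma) != set(tau) or len(sigma) != len(tau):
        raise ValueError("rankings must be permutations of the same items")
    pos_sigma = {item: i for i, item in enumerate(sigma)}
    pos_tau = {item: i for i, item in enumerate(tau)}
    return sum(
        1
        for k, l in combinations(sigma, 2)
        # A negative product means the pair is ordered differently in the two rankings.
        if (pos_sigma[k] - pos_sigma[l]) * (pos_tau[k] - pos_tau[l]) < 0
    )


def total_inversions(candidate, rankings):
    """Sum of Kendall tau distances from a candidate ranking to each input ranking."""
    return sum(kendall_tau_distance(candidate, r) for r in rankings)


# Hypothetical rankings produced by three separate indices for one query.
title_rank = ["a", "b", "c", "d"]
anchor_rank = ["b", "a", "c", "d"]
content_rank = ["a", "c", "b", "d"]
rankings = [title_rank, anchor_rank, content_rank]

print(kendall_tau_distance(anchor_rank, content_rank))  # 2
print(total_inversions(title_rank, rankings))           # 2
print(total_inversions(anchor_rank, rankings))          # 3
```

Among these three inputs, `title_rank` has the fewest total inversions. A real aggregator would search over all orderings, or approximate that search, rather than pick only from the input lists.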