link-analysis - Chapter 6: Link Analysis Most slides...


Chapter 6: Link Analysis
Most slides courtesy Bing Liu

Slide 2: Road map
Introduction
Social network analysis
Co-citation and bibliographic coupling
PageRank
HITS
Summary

Slide 3: Introduction
Early search engines mainly compared the content similarity of the query and the indexed pages, i.e., they used information retrieval methods such as cosine similarity and TF-IDF.
From 1996, it became clear that content similarity alone was no longer sufficient.
The number of pages grew rapidly in the mid-to-late 1990s. Try the query "classification technique": Google estimates 10 million relevant pages. How does one choose only 30-40 pages and rank them suitably to present to the user?
Content similarity is easily spammed. A page owner can repeat some words and add many related words to boost the rankings of his pages and/or to make the pages relevant to a large number of queries.

Slide 4: Introduction (cont.)
Starting around 1996, researchers began to work on the problem. They resorted to hyperlinks.
In Feb. 1997, Yanhong Li (Scotch Plains, NJ) filed a hyperlink-based search patent. The method uses words in the anchor text of hyperlinks.
Web pages are connected through hyperlinks, which carry important information.
Some hyperlinks organize information at the same site.
Other hyperlinks point to pages from other Web sites. Such out-going hyperlinks often indicate an implicit conveyance of authority to the pages being pointed to.
Those pages that are pointed to by many other pages are likely to contain authoritative information.

Slide 5: Introduction (cont.)
During 1997-1998, the two most influential hyperlink-based search algorithms, PageRank and HITS, were reported.
Both algorithms are related to social networks. They exploit the hyperlinks of the Web to rank pages according to their levels of prestige or authority.
HITS: Jon Kleinberg (Cornell University), at the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, January 1998.
PageRank: Sergey Brin and Larry Page, PhD students from Stanford University, at the Seventh International World Wide Web Conference (WWW7) in April 1998. PageRank powers the Google search engine.
Stanford University the great!
Google: Sergey Brin and Larry Page (PhD candidates in CS)
Yahoo!: Jerry Yang and David Filo (PhD candidates in EE)
HP, Sun, Cisco, ...

Slide 6: Introduction (cont.)
Apart from search ranking, hyperlinks are also useful for finding Web communities.
A Web community is a cluster of densely linked pages representing a group of people with a special interest.
Beyond explicit hyperlinks on the Web, links in other contexts are useful too, e.g., for discovering communities of named entities (e.g., people and organizations) in free text documents, and for analyzing social phenomena in emails.

Slide 7: Road map
Introduction
Social network analysis
Co-citation and bibliographic coupling
PageRank
HITS
Summary

...
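The introduction slides note that links within a site mostly organize its information, while links from other sites implicitly convey authority, so pages pointed to by many other pages are likely authoritative. A minimal sketch of that idea is to count cross-site in-links per page. This is only an illustration of the intuition, not PageRank or HITS; the function name `cross_site_inlink_counts` and the toy URLs are assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlparse


def cross_site_inlink_counts(links):
    """Count, for each target page, the links arriving from other sites."""
    counts = Counter()
    for source, target in links:
        # Links within one site mostly organize that site's own content,
        # so only links that cross site boundaries are counted here.
        if urlparse(source).netloc != urlparse(target).netloc:
            counts[target] += 1
    return counts


# Hypothetical toy link graph: (source page, target page) pairs.
links = [
    ("http://a.example/index", "http://a.example/about"),  # same site, ignored
    ("http://a.example/index", "http://c.example/paper"),
    ("http://b.example/home", "http://c.example/paper"),
    ("http://c.example/paper", "http://b.example/home"),
]

counts = cross_site_inlink_counts(links)
# Pages ordered from most to fewest cross-site in-links.
ranking = [page for page, _ in counts.most_common()]
```

A raw in-link count treats every linking page as equally important, which is one reason it is only a starting point for the prestige-based rankings the slides go on to discuss.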

This note was uploaded on 08/06/2008 for the course CSE 450 taught by Professor Davison during the Spring '08 term at Lehigh University.

