124.11.lec18

CS 124/LINGUIST 180

Dan Jurafsky
Lecture 18: Networks part I: Link Analysis, PageRank
Slides from Chris Manning, a few also from Ray Mooney and Bing Liu

Outline
- Anchor text
- Background on networks
  - Bibliometric (citation) networks
  - Social networks
- Link analysis for ranking
  - PageRank
  - HITS
- Search Engine Optimization

The Web as a Directed Graph
- Assumption 1: A hyperlink between pages denotes author-perceived relevance (quality signal).
- Assumption 2: The anchor of the hyperlink describes the target page (textual context).
[Figure: Page A points to Page B through a hyperlink labeled with anchor text.]

Anchor Text
For ibm, how to distinguish between:
- IBM’s home page (mostly graphical)
- IBM’s copyright page (high term freq. for ‘ibm’)
- Rival’s spam page (arbitrarily high term freq.)
[Figure: links with the anchor text “ibm”, “ibm.com”, and “IBM home page” all point to www.ibm.com.]
A million pieces of anchor text with “ibm” send a strong signal.

Indexing anchor text
When indexing a document D, include anchor text from links pointing to D.
[Figure: three pages link to www.ibm.com. Their text reads “Armonk, NY-based computer giant IBM announced today”, “Joe’s computer hardware links: Sun, HP, IBM”, and “Big Blue today announced record profits for the quarter”.]
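The indexing rule above can be sketched in a few lines of Python. This is only an illustration, not the lecture's implementation: the function name `build_index`, the `.example` URLs, and the choice of which words act as anchors are all assumptions made for the example.

```python
from collections import defaultdict

def build_index(pages, links):
    """Build an inverted index whose postings include anchor text.

    pages: dict mapping URL -> page body text
    links: list of (source_url, target_url, anchor_text) triples
    """
    # Start each document's indexable text with its own body.
    text_for = {url: [body] for url, body in pages.items()}
    # Credit each link's anchor text to the page it points to.
    for _source, target, anchor in links:
        text_for.setdefault(target, []).append(anchor)

    index = defaultdict(set)
    for url, chunks in text_for.items():
        for chunk in chunks:
            for term in chunk.lower().split():
                index[term].add(url)
    return index

# Hypothetical data: the home page has almost no text of its own.
pages = {
    "www.ibm.com": "",
    "news.example": "Big Blue today announced record profits for the quarter",
}
links = [
    ("news.example", "www.ibm.com", "Big Blue"),
    ("joes-links.example", "www.ibm.com", "IBM"),
]
index = build_index(pages, links)
```

With this data, a lookup of `index["ibm"]` finds www.ibm.com even though the page body itself contains no text, which is the point of the slide.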

Indexing anchor text
- Can sometimes have unexpected side effects. Like what?
- Can score anchor text with a weight depending on the authority of the anchor page’s website.
  - E.g., if we were to assume that content from cnn.com or yahoo.com is authoritative, then trust the anchor text from them.

Anchor Text
Other applications:
- Weighting/filtering links in the graph
- Generating page descriptions from anchor text
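One way to picture weighting anchor text by the authority of the linking site is the sketch below. The weights in `AUTHORITY`, the `DEFAULT_WEIGHT` value, the function name `anchor_scores`, and the `.example` sites are all hypothetical; the slides do not specify a scoring formula.

```python
from collections import defaultdict

# Hypothetical authority weights per linking site (assumed values).
AUTHORITY = {"cnn.com": 1.0, "yahoo.com": 1.0}
DEFAULT_WEIGHT = 0.1  # assumed weight for sites with no known authority

def anchor_scores(links, term):
    """Sum authority-weighted anchor-text matches for `term`, per target URL.

    links: list of (source_site, target_url, anchor_text) triples
    """
    scores = defaultdict(float)
    for source_site, target, anchor in links:
        # Each link counts once, however often the term repeats in its anchor.
        if term.lower() in anchor.lower().split():
            scores[target] += AUTHORITY.get(source_site, DEFAULT_WEIGHT)
    return dict(scores)

links = [
    ("cnn.com", "www.ibm.com", "IBM home page"),
    ("spam.example", "rival.example", "ibm ibm ibm"),
    ("spam.example", "rival.example", "cheap ibm"),
]
scores = anchor_scores(links, "ibm")
```

Under these assumed weights, one link from a trusted site outweighs two links from an unknown site, so the spam page's repeated anchors gain little.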

Roots of Web Link Analysis
- Bibliometrics
- Social network analysis

Citation Analysis: Impact Factor (slide from Ray Mooney)
- Developed by Garfield in 1972 to measure the importance (quality, influence) of scientific journals.
- Measure of how often papers in the journal are cited by other scientists.
- Computed and published annually by the Institute for Scientific Information (ISI).
- The impact factor of a journal J in year Y is the average number of citations (from indexed documents published in year Y) to a paper published in J in year Y-1 or Y-2.
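The impact factor definition above translates directly into a short computation. The sketch below is illustrative only: the function name, the tuple layout of `citations`, and the sample numbers are assumptions, and returning 0.0 for a journal with no papers in the window is a choice made here, not part of the definition.

```python
def impact_factor(citations, papers_published, journal, year):
    """Impact factor of `journal` in `year`.

    citations: list of (citing_year, cited_journal, cited_pub_year) tuples,
               one per citation from an indexed document
    papers_published: dict mapping (journal, pub_year) -> number of papers
    """
    window = (year - 1, year - 2)
    # Count citations made in `year` to papers from the two preceding years.
    cites = sum(
        1
        for citing_year, cited_journal, cited_pub_year in citations
        if citing_year == year
        and cited_journal == journal
        and cited_pub_year in window
    )
    # Number of papers the journal published in that two-year window.
    papers = sum(papers_published.get((journal, y), 0) for y in window)
    # Assumption: report 0.0 when the journal published nothing in the window.
    return cites / papers if papers else 0.0

# Made-up sample data for a hypothetical journal "J".
citations = (
    [(2010, "J", 2009)] * 30
    + [(2010, "J", 2008)] * 10
    + [(2010, "J", 2005)] * 5   # outside the two-year window, ignored
    + [(2009, "J", 2008)] * 4   # cited in a different year, ignored
)
papers_published = {("J", 2009): 10, ("J", 2008): 10}
result = impact_factor(citations, papers_published, "J", 2010)
print(result)  # 2.0
```

Here 40 qualifying citations over 20 papers give an impact factor of 2.0.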

Citations vs. Links
Web links are a bit different from citations:
- Many links are navigational.
- Many pages with high in-degree are portals, not content providers.
- Not all links are endorsements.
- Company websites don’t point to their competitors.
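In-degree, mentioned above, is simply the number of links pointing at a page. A minimal sketch of counting it from an edge list follows; the function name `in_degrees` and the `.example` pages are hypothetical.

```python
from collections import Counter

def in_degrees(edges):
    """Count incoming links per page from a list of (source, target) edges."""
    return Counter(target for _source, target in edges)

edges = [
    ("a.example", "portal.example"),
    ("b.example", "portal.example"),
    ("c.example", "portal.example"),
    ("portal.example", "a.example"),
]
degrees = in_degrees(edges)
top_page, top_count = degrees.most_common(1)[0]
```

The count alone cannot tell a portal from a content provider, which is why raw in-degree is a weaker signal for web pages than citation counts are for papers.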

This document was uploaded on 06/01/2011.
