# 04 - organic-search.pdf - How Search Engines Rank Organic...


## How Search Engines Rank Organic Listings

Two categories of generating search results:

- Vertical search
- Algorithmic search

Three distinct sources of algorithmic information:

- Content/Relevance: Vector Model
  - Search results based on content of pages: the words that appear on each page and their frequencies
- Network Popularity: Citation Model
  - Search results based mainly on “popularity” of pages
  - Content of pages is looked at only minimally
- Usage or Behavioral Popularity: Empirical click and visitation data

## Usage or Behavioral Popularity

Behavioral popularity score for a page, generically:

- Number of visitors to the page
- Amount of time spent on the page

Behavioral popularity score for a page, specifically with respect to a search engine query:

- Increase in click rate for a link in the search results page, relative to the “normal” click rate for a link at that position. Penalty for lower-than-normal click rate.
- Amount of time spent on the page. Penalty for quick click to another link in the search results page.
## Mini Web Universe with 5 Pages

| Page ID | cat | dog | mouse | fish | horse | cow | Page Points To |
|---|---|---|---|---|---|---|---|
| d1 | 12 | 12 | 0 | 2 | 0 | 0 | d2, d5 |
| d2 | 3 | 1 | 6 | 0 | 0 | 0 | d3, d5 |
| d3 | 1 | 4 | 0 | 0 | 5 | 20 | d1, d2, d4 |
| d4 | 6 | 3 | 3 | 11 | 0 | 0 | d1, d2, d3, d4 |
| d5 | 1 | 1 | 0 | 1 | 1 | 1 | d1, d5 |

What is the best match for the following search query? “cat dog fish”

## Vector Model

The matching score $S(d, q)$ between the query $q$ and a document $d$ is the sum of the TFIDF score over every term $t$ in the query. Let $n(d, t)$ be the number of occurrences of term $t$ in document $d$.

$$
TF(d, t) =
\begin{cases}
1 + \log\bigl(1 + \log(n(d, t))\bigr) & \text{if } n(d, t) > 0 \\
0 & \text{otherwise}
\end{cases}
$$

$$
IDF(t) = \log\left(1 + \frac{\text{total number of documents}}{\text{number of documents containing } t}\right)
$$

$$
TFIDF(d, t) = TF(d, t) \times IDF(t)
$$

$$
S(d, q) = \sum_{t \in q} TFIDF(d, t)
$$

The best match to a query $q$ is the document with the highest value of $S(d, q)$.
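As a worked check of the definitions above, the following evaluates a few values by hand for page d1. Base-10 logarithms are an assumption here: the slide only writes "log", and base 10 is the choice that reproduces the values tabulated for this example. Page d1 contains "cat" 12 times and "fish" 2 times, and "fish" appears in 3 of the 5 pages.

```latex
\begin{align*}
TF(d_1, \text{cat})  &= 1 + \log_{10}\bigl(1 + \log_{10} 12\bigr) \approx 1 + \log_{10}(2.079) \approx 1.318 \\
TF(d_1, \text{fish}) &= 1 + \log_{10}\bigl(1 + \log_{10} 2\bigr) \approx 1 + \log_{10}(1.301) \approx 1.114 \\
IDF(\text{fish})     &= \log_{10}\Bigl(1 + \tfrac{5}{3}\Bigr) \approx \log_{10}(2.667) \approx 0.426 \\
TFIDF(d_1, \text{fish}) &= TF(d_1, \text{fish}) \times IDF(\text{fish}) \approx 1.114 \times 0.426 \approx 0.475
\end{align*}
```

Note how the double logarithm flattens raw counts: going from 2 to 12 occurrences raises TF only from about 1.114 to about 1.318.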
## TF(d, t) and IDF(t) for our Mini Web Universe

$TF(d, t)$ values for every document-term pair:

| Page ID | cat | dog | mouse | fish | horse | cow |
|---|---|---|---|---|---|---|
| d1 | 1.318 | 1.318 | 0.000 | 1.114 | 0.000 | 0.000 |
| d2 | 1.169 | 1.000 | 1.250 | 0.000 | 0.000 | 0.000 |
| d3 | 1.000 | 1.205 | 0.000 | 0.000 | 1.230 | 1.362 |
| d4 | 1.250 | 1.169 | 1.169 | 1.310 | 0.000 | 0.000 |
| d5 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 |

$IDF(t)$ values for every term:

| cat | dog | mouse | fish | horse | cow |
|---|---|---|---|---|---|
| 0.301 | 0.301 | 0.544 | 0.426 | 0.544 | 0.544 |

## TFIDF(d, t) for our Mini Web Universe and Matching Scores for the query “cat dog fish”

| Page ID | cat | dog | mouse | fish | horse | cow | Matching Score |
|---|---|---|---|---|---|---|---|
| d1 | 0.397 | 0.397 | 0.000 | 0.475 | 0.000 | 0.000 | 1.268 |
| d2 | 0.352 | 0.301 | 0.680 | 0.000 | 0.000 | 0.000 | 0.653 |
| d3 | 0.301 | 0.363 | 0.000 | 0.000 | 0.669 | 0.741 | 0.664 |
| d4 | 0.376 | 0.352 | 0.636 | 0.558 | 0.000 | 0.000 | 1.286 |
| d5 | 0.301 | 0.301 | 0.000 | 0.426 | 0.544 | 0.544 | 1.028 |

The matching score sums only the columns for the query terms (cat, dog, fish). d1, d4 and d5 contain all three query terms: of these, d4 is the best match and d5 is the worst.
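The matching scores above can be recomputed with a short script. This is a minimal sketch, not the course's own code: the names `COUNTS`, `tf`, `idf`, `score`, and `rank` are hypothetical, and base-10 logarithms are assumed because they reproduce the tabulated values.

```python
import math

# Term counts n(d, t) copied from the Mini Web Universe table.
TERMS = ["cat", "dog", "mouse", "fish", "horse", "cow"]
COUNTS = {
    "d1": [12, 12, 0, 2, 0, 0],
    "d2": [3, 1, 6, 0, 0, 0],
    "d3": [1, 4, 0, 0, 5, 20],
    "d4": [6, 3, 3, 11, 0, 0],
    "d5": [1, 1, 0, 1, 1, 1],
}


def tf(n):
    """TF = 1 + log(1 + log(n)) if n > 0, otherwise 0 (base 10 assumed)."""
    return 1 + math.log10(1 + math.log10(n)) if n > 0 else 0.0


def idf(term):
    """IDF = log(1 + total documents / documents containing the term)."""
    col = TERMS.index(term)
    containing = sum(1 for row in COUNTS.values() if row[col] > 0)
    if containing == 0:
        return 0.0  # assumption: a term found in no document contributes nothing
    return math.log10(1 + len(COUNTS) / containing)


def score(doc, query):
    """S(d, q): sum of TF * IDF over the terms of a space-separated query."""
    return sum(tf(COUNTS[doc][TERMS.index(t)]) * idf(t) for t in query.split())


def rank(query):
    """Documents ordered from best to worst match for the query."""
    return sorted(COUNTS, key=lambda d: score(d, query), reverse=True)


for doc in rank("cat dog fish"):
    print(doc, round(score(doc, "cat dog fish"), 3))
```

The sketch only handles terms listed in `TERMS`; a query containing any other word raises a `ValueError`, which a real system would need to handle.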
## Managerial guidelines

- In a query consisting of multiple terms, focus on the more unusual term.
- At least in the early startup phase: focus on the unusual queries.

## Citation Model

The PageRank for a page is proportional to the probability that you would end up at that page if you were just surfing randomly and infinitely (that is, if you were a web user following a random ergodic Markov chain).
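The random-surfer description can be illustrated on the Mini Web Universe, using its "Page Points To" column as the link graph. This is a minimal sketch under stated assumptions: the surfer picks one outgoing link uniformly at random at each step, and no damping or teleportation step is included. The names `LINKS` and `random_surfer` are hypothetical.

```python
# Link structure from the "Page Points To" column of the Mini Web Universe.
LINKS = {
    "d1": ["d2", "d5"],
    "d2": ["d3", "d5"],
    "d3": ["d1", "d2", "d4"],
    "d4": ["d1", "d2", "d3", "d4"],
    "d5": ["d1", "d5"],
}


def random_surfer(links, tol=1e-12, max_iter=10_000):
    """Long-run visit probabilities of a surfer following random links.

    Repeatedly pushes each page's probability evenly along its outgoing
    links until the distribution stops changing (power iteration).
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    prob = {p: 1.0 / len(pages) for p in pages}  # start uniformly at random
    for _ in range(max_iter):
        nxt = {p: 0.0 for p in pages}
        for src, targets in links.items():
            share = prob[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        change = sum(abs(nxt[p] - prob[p]) for p in pages)
        prob = nxt
        if change < tol:
            break
    return prob


popularity = random_surfer(LINKS)
for page, p in sorted(popularity.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(p, 3))
```

On this graph the exact long-run probabilities are 24/93, 16/93, 9/93, 4/93 and 40/93 for d1 through d5, so d5 is the most "popular" page and d4 the least, the reverse of their order under the vector model for “cat dog fish”. The plain iteration converges here only because every page has outgoing links and the graph is strongly connected with self-links; on a general web graph those conditions do not hold.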