How Search Engines Rank Organic Listings

Two categories of generating search results
• Vertical search
• Algorithmic search

Three distinct sources of algorithmic information:
• Content/Relevance: Vector Model
  – Search results based on content of pages: the words that appear on each page and their frequencies
• Network Popularity: Citation Model
  – Search results based mainly on “popularity” of pages
  – Content of pages is looked at only minimally
• Usage or Behavioral Popularity: Empirical click and visitation data

Organic Search 1

Usage or Behavioral Popularity

• Behavioral popularity score for a page, generically
  – Number of visitors to the page
  – Amount of time spent on the page
• Behavioral popularity score for a page, specifically with respect to a search engine query
  – Increase in click rate for a link in the search results page, relative to the “normal” click rate for a link at that position. Penalty for lower-than-normal click rate.
  – Amount of time spent on the page. Penalty for quick click to another link in the search results page.
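The query-specific click signal above can be illustrated with a minimal sketch. The slides give no formula, so everything here is an assumption for illustration: the function name `click_lift`, the relative-difference form of the score, and the baseline click rates in `NORMAL_CLICK_RATE` are all hypothetical.

```python
# Hypothetical "normal" click rates by position in the results page.
# These numbers are made up for illustration; they are not from the slides.
NORMAL_CLICK_RATE = {1: 0.30, 2: 0.15, 3: 0.10}


def click_lift(position, impressions, clicks):
    """Relative deviation of a link's observed click rate from the
    normal click rate at its position (assumed scoring form).

    Positive values suggest a boost; negative values suggest a penalty.
    """
    observed = clicks / impressions
    normal = NORMAL_CLICK_RATE[position]
    return (observed - normal) / normal


# A link at position 3 that is clicked more often than normal for that slot.
lift_good = click_lift(position=3, impressions=1000, clicks=150)

# A link at position 1 that is clicked less often than normal for that slot.
lift_bad = click_lift(position=1, impressions=1000, clicks=200)

print(lift_good > 0, lift_bad < 0)
```

Comparing against the position-specific baseline, rather than raw click counts, is what keeps a link from being rewarded merely for already sitting near the top.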
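Mini Web Universe with 5 Pages

| Page ID | cat | dog | mouse | fish | horse | cow | Page Points To |
|---------|-----|-----|-------|------|-------|-----|----------------|
| d1      | 12  | 12  | 0     | 2    | 0     | 0   | d2, d5         |
| d2      | 3   | 1   | 6     | 0    | 0     | 0   | d3, d5         |
| d3      | 1   | 4   | 0     | 0    | 5     | 20  | d1, d2, d4     |
| d4      | 6   | 3   | 3     | 11   | 0     | 0   | d1, d2, d3, d4 |
| d5      | 1   | 1   | 0     | 1    | 1     | 1   | d1, d5         |

What is the best match for the following search query?
“cat dog fish”

Vector Model

The matching score S(d, q) between the query and a document d is the sum of the TFIDF score over every term t in the query. Let n(d, t) be the number of occurrences of term t in document d.

$$\mathrm{TF}(d, t) = \begin{cases} 1 + \log\bigl(1 + \log n(d, t)\bigr) & \text{if } n(d, t) > 0 \\ 0 & \text{otherwise} \end{cases}$$

$$\mathrm{IDF}(t) = \log\left(1 + \frac{\text{total number of documents}}{\text{number of documents containing } t}\right)$$

$$\mathrm{TFIDF}(d, t) = \mathrm{TF}(d, t) \times \mathrm{IDF}(t)$$

$$S(d, q) = \sum_{t \in q} \mathrm{TFIDF}(d, t)$$

The best match to a query q is the document with the highest value of S(d, q).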
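The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `COUNTS`, `tf`, `idf`, and `score` are chosen here, the logarithm is assumed to be base 10 (the slides write only "log", and base 10 is what reproduces the tabulated values on the following slides), and returning 0 for a term that appears in no document is an added guard.

```python
import math

# Term counts n(d, t) from the mini web universe table (zero counts omitted).
COUNTS = {
    "d1": {"cat": 12, "dog": 12, "fish": 2},
    "d2": {"cat": 3, "dog": 1, "mouse": 6},
    "d3": {"cat": 1, "dog": 4, "horse": 5, "cow": 20},
    "d4": {"cat": 6, "dog": 3, "mouse": 3, "fish": 11},
    "d5": {"cat": 1, "dog": 1, "fish": 1, "horse": 1, "cow": 1},
}


def tf(doc, term):
    """TF(d, t) = 1 + log(1 + log n(d, t)) if n(d, t) > 0, otherwise 0."""
    n = COUNTS[doc].get(term, 0)
    if n == 0:
        return 0.0
    return 1 + math.log10(1 + math.log10(n))  # base 10 is an assumption


def idf(term):
    """IDF(t) = log(1 + total documents / documents containing t)."""
    containing = sum(1 for counts in COUNTS.values() if counts.get(term, 0) > 0)
    if containing == 0:
        return 0.0  # assumed guard: a term absent everywhere contributes nothing
    return math.log10(1 + len(COUNTS) / containing)


def score(doc, query):
    """S(d, q): sum of TF(d, t) * IDF(t) over the terms of the query."""
    return sum(tf(doc, t) * idf(t) for t in query.split())


query = "cat dog fish"
ranking = sorted(COUNTS, key=lambda d: score(d, query), reverse=True)
for doc in ranking:
    print(doc, round(score(doc, query), 3))
```

Sorting by `score` in descending order puts the best match first, which is the document the slide's question asks for.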
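TF(d, t) and IDF(t) for our Mini Web Universe

TF(d, t) values for every document-term pair:

| Page ID | cat   | dog   | mouse | fish  | horse | cow   |
|---------|-------|-------|-------|-------|-------|-------|
| d1      | 1.318 | 1.318 | 0.000 | 1.114 | 0.000 | 0.000 |
| d2      | 1.169 | 1.000 | 1.250 | 0.000 | 0.000 | 0.000 |
| d3      | 1.000 | 1.205 | 0.000 | 0.000 | 1.230 | 1.362 |
| d4      | 1.250 | 1.169 | 1.169 | 1.310 | 0.000 | 0.000 |
| d5      | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 |

IDF(t) values for every term:

| cat   | dog   | mouse | fish  | horse | cow   |
|-------|-------|-------|-------|-------|-------|
| 0.301 | 0.301 | 0.544 | 0.426 | 0.544 | 0.544 |

TFIDF(d, t) for our Mini Web Universe and Matching Scores for the query “cat dog fish”

| Page ID | cat   | dog   | mouse | fish  | horse | cow   | Matching Score |
|---------|-------|-------|-------|-------|-------|-------|----------------|
| d1      | 0.397 | 0.397 | 0.000 | 0.475 | 0.000 | 0.000 | 1.268          |
| d2      | 0.352 | 0.301 | 0.680 | 0.000 | 0.000 | 0.000 | 0.653          |
| d3      | 0.301 | 0.363 | 0.000 | 0.000 | 0.669 | 0.741 | 0.664          |
| d4      | 0.376 | 0.352 | 0.636 | 0.558 | 0.000 | 0.000 | 1.286          |
| d5      | 0.301 | 0.301 | 0.000 | 0.426 | 0.544 | 0.544 | 1.028          |

d1, d4 and d5 contain all three query terms: d4 is the best and d5 is the worst.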
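As a check on the tables, the winning score for d4 can be traced by hand. The worked steps below assume base-10 logarithms, which the slides do not state but which reproduce the tabulated values; the counts are n(d4, fish) = 11, with fish appearing in 3 of the 5 documents.

```latex
\begin{align*}
\mathrm{TF}(d_4, \text{fish}) &= 1 + \log_{10}\bigl(1 + \log_{10} 11\bigr)
  \approx 1 + \log_{10}(2.041) \approx 1.310 \\
\mathrm{IDF}(\text{fish}) &= \log_{10}\!\left(1 + \tfrac{5}{3}\right) \approx 0.426 \\
\mathrm{TFIDF}(d_4, \text{fish}) &\approx 1.310 \times 0.426 \approx 0.558 \\
S(d_4, q) &= \underbrace{0.376}_{\text{cat}} + \underbrace{0.352}_{\text{dog}}
  + \underbrace{0.558}_{\text{fish}} = 1.286
\end{align*}
```

The mouse, horse, and cow columns do not enter the sum because those terms are not in the query. Note also that d1 uses "cat" and "dog" twice as often as d4 does, yet d4 wins on the rarer term "fish", which carries the highest IDF of the three query terms.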
Managerial guidelines

• In a query consisting of multiple terms, focus on the more unusual term.
• At least in the early startup phase: focus on the unusual queries.

Citation Model

• The PageRank for a page is proportional to the probability that you would end up at a page if you were just surfing randomly and infinitely (that is, if you were a web user following a random ergodic Markov Chain).
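The random-surfer idea can be made concrete on the five-page mini web, using its "Page Points To" column as the link graph. The sketch below is an illustration under stated assumptions, not the PageRank formula itself: it assumes the surfer follows each outgoing link of the current page with equal probability and uses no damping or teleportation term, and the names `LINKS` and `random_surfer_rank` are chosen here. Plain repeated propagation works for this graph because every page can reach every other page and two pages link to themselves.

```python
# Outgoing links of each page, from the "Page Points To" column.
LINKS = {
    "d1": ["d2", "d5"],
    "d2": ["d3", "d5"],
    "d3": ["d1", "d2", "d4"],
    "d4": ["d1", "d2", "d3", "d4"],
    "d5": ["d1", "d5"],
}


def random_surfer_rank(links, iterations=1000):
    """Approximate the long-run visit probability of each page for a
    surfer who clicks a uniformly random outgoing link at every step
    (assumed model: no damping factor)."""
    pages = sorted(links)
    rank = {page: 1.0 / len(pages) for page in pages}  # start uniform
    for _ in range(iterations):
        nxt = {page: 0.0 for page in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)  # split evenly over outlinks
            for target in outlinks:
                nxt[target] += share
        rank = nxt
    return rank


ranks = random_surfer_rank(LINKS)
for page, prob in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(prob, 3))
```

Under these assumptions the ordering depends only on who links to whom, not on page content, which is why d4 can top the vector-model ranking for "cat dog fish" while receiving little random-surfer traffic.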