08-diffusion_annot




CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
10/13/2009

Probabilistic models of network contagion
How contagions diffuse in real life:
• Viral marketing
• Blogs
• Group membership

How do viruses/rumors propagate?
Will a flu-like virus linger, or will it become extinct?
• (Virus) birth rate β: probability that an infected neighbor attacks
• (Virus) death rate δ: probability that an infected node heals
[Figure: node N with neighbors N1, N2, N3; labels: Healthy, Infected, Prob. β, Prob. δ]

General scheme for epidemic models:
• S: susceptible
• E: exposed
• I: infected
• R: recovered
• Z: immune

Assuming perfect mixing, i.e., the network is a complete graph, the model dynamics are:
[Figure: number of nodes vs. time; curves: Susceptible, Infected, Recovered]

Susceptible-Infective-Susceptible (SIS) model
• Cured nodes immediately become susceptible
• Virus "strength": s = β / δ
[Figure: state diagram. Susceptible to Infective: infected by neighbor with prob. β. Infective to Susceptible: cured internally with prob. δ]

Assuming perfect mixing (complete graph):
$\frac{dS}{dt} = -\beta S I + \delta I$
$\frac{dI}{dt} = \beta S I - \delta I$
[Figure: number of nodes vs. time; curves: Susceptible, Infected]

Representing an SIS epidemic as an SIR model

The epidemic threshold of a graph is a value τ such that:
• If the strength s = β / δ < τ, an epidemic cannot happen (it eventually dies out)
Given a graph, compute its epidemic threshold.

What should τ depend on?
• Avg. degree? And/or highest degree?
• And/or variance of degree?
• And/or third moment of degree?
• And/or diameter?

[Wang et al. 2003]
We have no epidemic if:
$\beta / \delta < \tau = 1 / \lambda_{1,A}$
where β is the (virus) birth rate, δ is the (virus) death rate, τ is the epidemic threshold, and λ1,A is the largest eigenvalue of the adjacency matrix A.
► λ1,A alone captures the property of the graph!

[Wang et al. 2003]
[Figure: number of infected nodes vs. time on the Oregon graph (10,900 nodes and 31,180 edges), β = 0.001, δ = 0.05, 0.06, 0.07; curves labeled β/δ > τ (above threshold), β/δ = τ (at the threshold), β/δ < τ (below threshold)]

Does it matter how many people are initially infected?

The probability of adoption depends on the number of friends who have adopted [Bass '69, Granovetter '78]
• What is the shape?
• The distinction has consequences for models and algorithms
[Figure: two plots of prob. of adoption vs. k = number of friends adopting, labeled "Diminishing returns?" and "Critical mass?"]

[Leskovec et al., TWEB '07]
Senders and followers of recommendations receive discounts on products (10% credit, 10% off)
• Data: incentivized viral marketing program
• 16 million recommendations
• 4 million people
• 500,000 products

[Backstrom et al., KDD '06]
Use social networks where people belong to explicitly defined groups
Each group defines a behavior that diffuses
• Data: LiveJournal, an on-line blogging community with friendship links and user-defined groups
• Over a million users update content each month
• Over 250,000 groups to join

[Leskovec et al., TWEB '07]
[Figure: DVD recommendations (8.2 million observations); probability of purchasing vs. # recommendations received]

[Backstrom et al., KDD '06]
[Figure: LiveJournal community membership; prob. of joining vs. k (number of friends in the community)]

For viral marketing:
• We see node v receive the i-th recommendation and then purchase the product
For communities:
• At time t we see the behavior of node v's friends
Questions:
• When did v become aware of recommendations or friends' behavior?
• When did it translate into a decision by v to act?
• How long after this decision did v act?

Large anonymous online retailer (June 2001 to May 2003)
• 15,646,121 recommendations
• 3,943,084 distinct customers
• 548,523 products recommended
• Products belonging to 4 product groups: books, DVDs, music, VHS

[Figure: recommendation network. Legend: purchase following a recommendation; customer recommending a product; customer not buying a recommended product]
• Majority of recommendations do not cause purchases nor propagation
• Notice many star-like patterns
• Many disconnected components

[Figure: recommendations ordered in time, t1 < t2 < … < tn. Legend: bought and received a discount; bought but didn't receive a discount; received a recommendation but didn't buy]

What role does the product category play?

         products   customers   recommendations   edges       buy + get discount   buy + no discount
Book     103,161    2,863,977   5,741,611         2,097,809   65,344               17,769
DVD      19,829     805,285     8,180,393         962,341     17,232               58,189
Music    393,598    794,148     1,443,847         585,738     7,837                2,739
Video    26,131     239,583     280,270           160,683     909                  467
Full     542,719    3,943,084   15,646,121        3,153,676   91,322               79,164

• There are relatively few DVD titles, but DVDs account for ~50% of recommendations.
• Recommendations per person: DVD: 10; books and music: 2; VHS: 1
• Recommendations per purchase: books: 69; DVDs: 108; music: 136; VHS: 203
• Overall there are 3.69 recommendations per node on 3.85 different products.
• Music recommendations reached about the same number of people as DVDs but used only 1/5 as many recommendations
• Book recommendations reached by far the most people: 2.8 million
• All networks have a very small number of unique edges. For books, videos and music the number of unique edges is smaller than the number of nodes, so the networks are highly disconnected

Does sending more recommendations influence more purchases?
[Figure: number of purchases vs. outgoing recommendations; panels: BOOKS, DVDs]

What is the effectiveness of subsequent recommendations?
[Figure: probability of buying vs. exchanged recommendations; panels: BOOKS, DVDs]

Consider successful recommendations in terms of:
• av. # senders of recommendations per book category
• av. # of recommendations accepted
Books overall have a 3% success rate (2% with discount, 1% without)
Lower than average success rate (significant at p=0.01 level):
• fiction: romance (1.78), horror (1.81), teen (1.94), children's books (2.06), comics (2.30), sci-fi (2.34), mystery and thrillers (2.40)
• nonfiction: sports (2.26), home & garden (2.26), travel (2.39)
Higher than average success rate (statistically significant):
• professional & technical: medicine (5.68), professional & technical (4.54), engineering (4.10), science (3.90), computers & internet (3.61), law (3.66), business & investing (3.62)

• 47,000 customers are responsible for 2.5 million of the 16 million recommendations in the system
• 29% success rate per recommender of an anime DVD
• Giant component covers 19% of the nodes
• Overall, recommendations for DVDs are more likely to result in a purchase (7%), but the anime community stands out

Variable            transformation   Coefficient
const                                -0.940 ***
# recommendations   ln(r)            0.426 ***
# senders           ln(ns)           -0.782 ***
# recipients        ln(nr)           -1.307 ***
product price       ln(p)            0.128 ***
# reviews           ln(v)            -0.011 ***
avg. rating         ln(t)            -0.027 *
R2                                   0.74
Significance at the 0.01 (***), 0.05 (**) and 0.1 (*) levels

[Figure: size of giant component vs. number of nodes, by month, with quadratic fit; inset: # nodes vs. m (month)]

• 94% of users make their first recommendation without having received one previously
• Size of giant connected component increases from 1% to 2.5% of the network (100,420 users): small!
• Some sub-communities are better connected:
  • 24% out of 18,000 users for westerns on DVD
  • 26% of 25,000 for classics on DVD
  • 19% of 47,000 for anime (Japanese animated film) on DVD
• Others are just as disconnected:
  • 3% of 180,000 for home and gardening
  • 2-7% for children's and fitness DVDs

Products suited for viral marketing:
• small and tightly knit community
• pricey products
• rating doesn't play as much of a role
• few reviews, senders, and recipients, but sending more recommendations helps
Observations for future diffusion models:
• purchase decision is more complex than threshold or simple infection
• influence saturates as the number of contacts expands
• links lose effectiveness if they are overused
Conditions for successful recommendations:
• professional and organizational contexts
• discounts on expensive items
• small, tightly knit communities

How big are cascades?
What are the building blocks of cascades?
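Both the connectivity percentages above and the question "how big are cascades?" come down to measuring connected groups of users. A minimal Python sketch, assuming recommendations are given as (sender, recipient) pairs and that a group is a weakly connected component of that graph; the function name `component_sizes` and the toy edges are made up for illustration and are not from the lecture data.

```python
from collections import Counter


def component_sizes(edges):
    """Return the sizes of weakly connected components, largest first.

    edges is an iterable of (sender, recipient) pairs. Direction is ignored.
    Users that appear in no edge are not counted.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sender, recipient in edges:
        root_a, root_b = find(sender), find(recipient)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


# Toy recommendation edges, invented for this example.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("e", "f"), ("g", "h"), ("h", "i")]
sizes = component_sizes(toy_edges)

# Share of users that fall in the largest ("giant") component.
giant_share = sizes[0] / sum(sizes)
print(sizes, round(giant_share, 2))
```

Union-find is used here only because it handles an edge list in one pass; a breadth-first search over an adjacency list would give the same sizes.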
[Figure: two example cascades, labeled "Medical guide book" and "DVD"]

Given a (social) network, a process spreading over the network creates a graph (a tree):
Social network → Cascade (propagation graph)
Let's count cascades.

General observations:
• DVDs have the richest cascades (most recommendations, most densely linked)
• Books have small cascades
• Music is 3 times larger than video but does not have much variety in cascades

         number of all cascades ("words")   different cascades (vocabulary size)
Book     122,657                            959
DVD      289,055                            87,614
Music    13,330                             158
Video    1,928                              109

• [subgraph] is the most common cascade subgraph. It accounts for ~75% of cascades in books, CD and VHS, but only 12% of DVD cascades
• [subgraph] is 6 (1.2 for DVD) times more frequent than [subgraph]
• For DVDs, [subgraph] is more frequent than [subgraph]
• Chains ([subgraph]) are more frequent than [subgraph]
• [subgraph] is more frequent than a collision ([subgraph]) (but a collision has fewer edges)
• Late split ([subgraph]) is more frequent than [subgraph]

• Stars ("no propagation")
• Bipartite cores ("common friends")
• Nodes having the same friends

• Delete late recommendations
• Count how many people are in a single cascade
• Exclude nodes that did not buy
[Figure: books, log-log plot of count vs. cascade size with fit 1.8e6 x^-4.98; annotations: steep drop-off, very few large cascades]

DVD cascades can grow large, possibly as a result of websites where people sign up to exchange recommendations
[Figure: DVDs, log-log plot of count vs. x = cascade size (number of nodes), fit ~ x^-1.56; annotations: shallow drop-off (fat tail), a number of large cascades]

[Leskovec et al., SDM '07]
[Figure: blogs, posts, time-ordered hyperlinks, and the resulting information cascade]
• Data: blogs. We crawled 45,000 blogs for 1 year
• 10 million posts and 350,000 cascades

Cascade shapes (ranked by frequency)
The probability of observing a cascade on n nodes follows a Zipf distribution: p(n) ~ n^-2
[Figure: count vs. x = cascade size (number of nodes)]

Most cascades are trees
[Figure: number of edges vs. cascade size (number of nodes); effective diameter vs. cascade size]
The number of cascades per node also follows a power-law distribution.
[Figure: count vs. number of joined cascades; count vs. cascades per node]

Cascade sizes follow a heavy-tailed distribution
• Viral marketing:
  • Books: steep drop-off, power-law exponent -5
  • DVDs: larger cascades, exponent -1.5
• Blogs: power-law exponent -2
What's a good model?
• What role does the underlying social network play?
• Can we make a step towards a more realistic cascade generation (propagation) model?

[Figure: each step below illustrated on a small network of blogs B1, B2, B3, B4]
1) Randomly pick a blog to infect, add it to the cascade.
2) Infect each in-linked neighbor with probability β.
3) Add infected neighbors to the cascade.
4) Set the node infected in step 1 to uninfected.
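The four steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the blog graph is a dict mapping each blog to the blogs that link to it (its in-linked neighbors), the steps are repeated for every newly infected blog, and `generate_cascade`, `max_infections`, and the toy graph are hypothetical names and values introduced here. The cap on infections is an added safeguard, since a blog that becomes uninfected can be infected again.

```python
import random
from collections import deque


def generate_cascade(in_links, beta, rng, seed_node=None, max_infections=10_000):
    """Simulate one cascade and return (cascade_nodes, cascade_edges).

    in_links maps a blog to the list of blogs that link to it (assumption).
    beta is the per-neighbor infection probability.
    """
    # Step 1: randomly pick a blog to infect and add it to the cascade.
    if seed_node is None:
        seed_node = rng.choice(sorted(in_links))
    cascade_nodes = {seed_node}
    cascade_edges = []
    infected = deque([seed_node])
    infections = 0

    while infected and infections < max_infections:
        # Step 4: once processed, the blog is no longer infected
        # (it leaves the queue but may be infected again later).
        u = infected.popleft()
        for v in in_links.get(u, []):
            # Step 2: infect each in-linked neighbor with probability beta.
            if rng.random() < beta:
                # Step 3: add the infected neighbor to the cascade.
                cascade_nodes.add(v)
                cascade_edges.append((v, u))
                infected.append(v)
                infections += 1
    return cascade_nodes, cascade_edges


# Toy blog graph, invented for this example: B2 and B3 link to B1, B4 links to B2.
toy_in_links = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": [], "B4": []}
nodes, edges = generate_cascade(toy_in_links, beta=0.025, rng=random.Random(0), seed_node="B1")
print(len(nodes), len(edges))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing simulated cascade size distributions for different values of β.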
Generative model produces realistic cascades (β = 0.025)
[Figure: count vs. cascade size; count vs. cascade node in-degree; count vs. size of star cascade; count vs. size of chain cascade; most frequent cascades]

Blogs: information epidemics
• Which are the influential/infectious blogs?
Viral marketing
• Who are the trendsetters?
• Influential people?
Disease spreading
• Where to place monitoring stations to detect epidemics?
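For the disease-spreading question, the threshold result of [Wang et al. 2003] from earlier in the lecture gives a concrete check: no epidemic if β/δ < τ = 1/λ1,A. A minimal Python sketch of that check, assuming an undirected graph stored as a dict of neighbor sets; the function names, the toy graph, and the use of shifted power iteration to estimate λ1,A are illustrative choices, not taken from the slides.

```python
import math


def largest_eigenvalue(adj, iterations=1000, tol=1e-12):
    """Estimate the largest adjacency eigenvalue of an undirected graph.

    adj maps each node to the set of its neighbors (assumed symmetric and
    non-empty). Power iteration is run on A + I instead of A so that
    bipartite graphs, whose spectrum is symmetric around 0, still converge.
    """
    x = {u: 1.0 for u in adj}
    lam = 0.0
    for _ in range(iterations):
        # y = (A + I) x, then normalize to unit length.
        y = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {u: val / norm for u, val in y.items()}
        # Rayleigh quotient x^T A x for the unit vector x.
        new_lam = sum(x[u] * sum(x[v] for v in adj[u]) for u in adj)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam


def below_threshold(beta, delta, lam1):
    """True if beta / delta < tau = 1 / lam1, i.e. the epidemic dies out."""
    return beta / delta < 1.0 / lam1


# Toy example: complete graph on 4 nodes.
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
lam1 = largest_eigenvalue(complete4)
tau = 1.0 / lam1
print(lam1, tau, below_threshold(0.1, 0.5, lam1))
```

For large graphs a sparse eigenvalue solver would be the more practical choice; the plain-Python loop is only meant to show what quantity the threshold depends on.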

This note was uploaded on 01/11/2011 for the course CS 224 at Stanford.
