17-kronecker_annot

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
11/15/2010

[Internet Mathematics '09] Network community profile: conductance Φ(k) versus cluster size k. Communities get better and better up to a certain size and worse and worse beyond it; the best community has ~100 nodes.

This yields a picture of a denser and denser network core with small good communities attached to it: a nested core-periphery structure.

Intuition: self-similarity leads to power-laws. We try to mimic recursive graph/community growth. There are many obvious (but wrong) ways to do this: take an initial graph and expand it recursively. The Kronecker product is a principled way of generating self-similar matrices.

[PKDD '05] The construction operates on adjacency matrices: a 3x3 initiator adjacency matrix, an intermediate stage, and the resulting 9x9 adjacency matrix.

The Kronecker product of an NxM matrix A and a KxL matrix B is the (N*K) x (M*L) matrix

  C = A ⊗ B,  C[(i-1)*K + p, (j-1)*L + q] = A[i,j] * B[p,q].

We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices.

[PKDD '05] Kronecker graph: a growing sequence of graphs obtained by iterating the Kronecker product. Each Kronecker multiplication exponentially increases the size of the graph: K_k has N1^k nodes and E1^k edges, so we get densification. One can easily use multiple initiator matrices (K1', K1'', K1''') that can be of different sizes.

[PKDD '05] Continuing to multiply with K1 we obtain K4 and so on; the K4 adjacency matrix exhibits the recursive self-similar structure.
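The iterated Kronecker construction above can be sketched in a few lines of NumPy; the 3-node initiator below is an illustrative choice, not one taken from the slides:

```python
import numpy as np

# Illustrative 3-node initiator adjacency matrix (self-loops included, as in
# the deterministic Kronecker construction): N1 = 3 nodes, E1 = 7 nonzero entries.
K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

def kronecker_power(K1, k):
    """k-th Kronecker power of the initiator adjacency matrix."""
    Kk = K1
    for _ in range(k - 1):
        Kk = np.kron(Kk, K1)
    return Kk

K3 = kronecker_power(K1, 3)
N1, E1 = K1.shape[0], int(K1.sum())
# K_k has N1^k nodes and E1^k edges -> densification.
assert K3.shape == (N1 ** 3, N1 ** 3)
assert int(K3.sum()) == E1 ** 3
```

The edge count multiplies at every step because the sum of entries of a Kronecker product is the product of the factors' entry sums.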
[PKDD '05] Kronecker graphs have many properties found in real networks.
Properties of static networks:
- power-law-like degree distribution;
- power-law eigenvalue and eigenvector distributions;
- small diameter.
Properties of dynamic networks:
- densification power law;
- shrinking/stabilizing diameter.

[PKDD '05] Theorem (constant diameter): if G1 has diameter d, then the graph G_k also has diameter d.
Observation: an edge of a Kronecker graph connects node tuples (X_{i1}, ..., X_{ik}) and (X_{j1}, ..., X_{jk}), where the X's are appropriate nodes of the initiator.

[PKDD '05] Stochastic Kronecker graphs:
- Create an N1 x N1 probability matrix Θ1.
- Compute the kth Kronecker power Θk.
- For each entry p_uv of Θk, include the edge (u,v) in K_k with probability p_uv ("flip biased coins").

Example: each entry p_ij of Θ2 is the probability of edge (i,j); flipping the biased coins yields an instance matrix K2.

  Θ1 = | 0.5 0.2 |    Θ2 = Θ1 ⊗ Θ1 = | 0.25 0.10 0.10 0.04 |
       | 0.1 0.3 |                    | 0.05 0.15 0.02 0.06 |
                                      | 0.05 0.02 0.15 0.06 |
                                      | 0.01 0.03 0.03 0.09 |

[Mahdian-Xu, WAW '07] What is known about stochastic Kronecker graphs?
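The three-step recipe (build Θ1, take its k-th Kronecker power, flip one biased coin per entry) might look like this sketch, using the Θ1 values from the slide:

```python
import numpy as np

Theta1 = np.array([[0.5, 0.2],
                   [0.1, 0.3]])

def stochastic_kronecker(Theta1, k, rng):
    """Sample an instance graph K_k from the k-th Kronecker power of Theta1."""
    Theta_k = Theta1
    for _ in range(k - 1):
        Theta_k = np.kron(Theta_k, Theta1)  # entry p_uv = edge probability
    # Flip one biased coin per entry of Theta_k.
    instance = (rng.random(Theta_k.shape) < Theta_k).astype(int)
    return instance, Theta_k

rng = np.random.default_rng(0)
K2, Theta2 = stochastic_kronecker(Theta1, 2, rng)
assert Theta2[0, 0] == 0.25          # 0.5 * 0.5
assert abs(Theta2[3, 3] - 0.09) < 1e-12
assert K2.shape == (4, 4)
```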
For the undirected Kronecker graph model with symmetric 2x2 initiator

  Θ1 = | a b |
       | b c |

the resulting graph is:
- connected if b + c > 1;
- has a connected component of size Θ(n) if (a+b)(b+c) > 1;
- has constant diameter if b + c > 1;
- not searchable by a decentralized algorithm.

Given a real network G, we want to estimate the initiator matrix [a b; c d]. Three approaches:
1. Method of moments [Owen '09]: compare counts of small subgraphs and solve the resulting system of equations.
2. Maximum likelihood [ICML '07]: argmax_Θ1 P(G | Θ1).
3. SVD [VanLoan-Pitsianis '93]: minimize ||G − Θ1 ⊗ Θ1||_F, which can be solved using the SVD.

[ICML '07] Maximum likelihood estimation, argmax_Θ1 P(G | Θ1), is expensive if done naïvely: O(N! N²), where N! accounts for the different node labelings and N² for traversing the graph adjacency matrix. Our solutions: exploiting the Kronecker product structure reduces the N² factor to E (and E << N²), and Metropolis sampling reduces the N! factor to a (big) constant. We then do gradient descent.

Maximum likelihood estimation: given a real graph G, find the Kronecker initiator Θ = [a b; c d] that achieves argmax_Θ P(G | Θ). We need to (efficiently) calculate P(G | Θ) and maximize over Θ (e.g., using gradient descent).

[ICML '07] Given a graph G and a Kronecker matrix Θ, we calculate the probability that Θ generated G:

  P(G | Θ) = Π_{(u,v) ∈ G} Θk[u,v] · Π_{(u,v) ∉ G} (1 − Θk[u,v])

i.e., multiply the entry Θk[u,v] for every edge present in G and (1 − Θk[u,v]) for every absent one.
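The product formula for P(G | Θ) can be evaluated directly in O(N²) for a small graph; the 4-node instance below is illustrative, not the one drawn on the slide:

```python
import numpy as np

def log_likelihood(G, Theta1, k):
    """Exact log P(G | Theta) with node labels fixed, O(N^2): log Theta_k[u,v]
    for every present edge, log(1 - Theta_k[u,v]) for every absent one."""
    Theta_k = Theta1
    for _ in range(k - 1):
        Theta_k = np.kron(Theta_k, Theta1)
    return np.where(G == 1, np.log(Theta_k), np.log(1.0 - Theta_k)).sum()

Theta1 = np.array([[0.5, 0.2],
                   [0.1, 0.3]])
# Small concrete 4-node adjacency matrix (illustrative).
G = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 1, 1],
              [1, 1, 1, 1]])
ll = log_likelihood(G, Theta1, 2)
assert ll < 0.0  # a log-probability is negative
```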
[ICML '07] The node correspondence problem: nodes are unlabeled, so isomorphic graphs G' and G'' should have the same probability, P(G'|Θ) = P(G''|Θ). One therefore needs to consider all node correspondences σ:

  P(G | Θ) = Σ_σ P(G | Θ, σ) P(σ)

All correspondences are a priori equally likely, and there are O(N!) of them.

[ICML '07] Assume the correspondence problem is solved, i.e., a node labeling σ is fixed. Calculating

  P(G | Θ, σ) = Π_{(u,v) ∈ G} Θk[σ_u, σ_v] · Π_{(u,v) ∉ G} (1 − Θk[σ_u, σ_v])

takes O(N²) time, which is infeasible for large graphs (N ~ 10^5).

[ICML '07] We work with the log-likelihood l(Θ) = log P(G | Θ) and its gradient ∂l/∂Θ. To handle the sum over permutations, we sample permutations σ from P(σ | G, Θ) and average the gradients.

[ICML '07] Metropolis sampling of permutations:
- Start with a random permutation σ.
- Propose σ' = σ with two elements swapped.
- Accept σ' if it gives higher likelihood; otherwise accept it with probability proportional to the ratio of the likelihoods (no need to calculate the normalizing constant!).
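The Metropolis swap chain over node labelings can be sketched as follows. This sketch recomputes the full likelihood at every step for clarity; the lecture's optimization, updating only the 2 changed rows and columns of Θk, is omitted:

```python
import numpy as np

def metropolis_permutations(G, Theta_k, n_steps, rng):
    """Sample node labelings sigma: swap two positions, accept with
    probability min(1, likelihood ratio) -- no normalizing constant needed."""
    n = G.shape[0]

    def loglik(sigma):
        P = Theta_k[np.ix_(sigma, sigma)]
        return np.where(G == 1, np.log(P), np.log(1.0 - P)).sum()

    sigma = rng.permutation(n)
    ll = loglik(sigma)
    for _ in range(n_steps):
        i, j = rng.choice(n, size=2, replace=False)
        proposal = sigma.copy()
        proposal[i], proposal[j] = proposal[j], proposal[i]
        ll2 = loglik(proposal)
        # Accept better moves always; worse moves with prob = likelihood ratio.
        if ll2 >= ll or rng.random() < np.exp(ll2 - ll):
            sigma, ll = proposal, ll2
    return sigma, ll

rng = np.random.default_rng(0)
Theta1 = np.array([[0.5, 0.2], [0.1, 0.3]])
Theta2 = np.kron(Theta1, Theta1)
G = (rng.random((4, 4)) < Theta2).astype(int)  # a sampled instance
sigma, ll = metropolis_permutations(G, Theta2, 200, rng)
assert sorted(sigma.tolist()) == [0, 1, 2, 3]  # still a valid permutation
```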
Example: swapping node labels 1 and 4 simply permutes the corresponding rows and columns of the adjacency matrix.

In the Metropolis permutation sampling algorithm we need to evaluate likelihood ratios efficiently. Consecutive permutations σ(i) and σ(i+1) differ in only 2 positions, so we only have to traverse and update 2 rows (and columns) of Θk; the likelihood ratio can therefore be evaluated efficiently.

[ICML '07] Calculating P(G | Θ, σ) naïvely takes O(N²). Idea: first calculate the likelihood of the empty graph (a graph with 0 edges), then correct the likelihood for the edges we actually observe. By exploiting the structure of the Kronecker product we obtain a closed form for the likelihood of the empty graph.

[ICML '07] We approximate the likelihood as: (empty-graph likelihood) − (no-edge likelihood of the observed edges) + (edge likelihood of the observed edges). The two correction sums go only over the edges, so evaluating P(G | Θ, σ) takes O(E) time. Real graphs are sparse, E << N².

Real graphs are sparse, so we first calculate the likelihood of the empty graph. The probability of edge (i,j) in Θk is in general p_ij = θ1^a θ2^b θ3^c θ4^d, where the exponents count how many times each initiator entry appears and a + b + c + d = k. Applying the Taylor approximation log(1 − x) ≈ −x − x²/2 to each no-edge term and summing the resulting multinomial series gives the closed form

  Σ_{i,j} log(1 − p_ij) ≈ −(Σ_{u,v} θ_uv)^k − (1/2) (Σ_{u,v} θ_uv²)^k.

Experimental setup: given a real graph G, run gradient descent from a random initial point, obtain the estimated parameters Θ, generate a synthetic graph K using Θ, and compare the properties of G and K. Note: we do not fit the graph properties themselves; we fit the likelihood and then compare the properties.
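The closed form for the empty-graph likelihood and the O(E) per-edge correction can be checked numerically; the Θ1 values and the tiny edge list below are illustrative:

```python
import numpy as np

Theta1 = np.array([[0.5, 0.2],
                   [0.1, 0.3]])
k = 3
Theta_k = Theta1
for _ in range(k - 1):
    Theta_k = np.kron(Theta_k, Theta1)

# Empty-graph log-likelihood: sums of entries (and of squared entries) of a
# Kronecker power factor through the power, via log(1-x) ~ -x - x^2/2.
ll_empty_approx = -(Theta1.sum() ** k) - 0.5 * ((Theta1 ** 2).sum() ** k)
ll_empty_exact = np.log(1.0 - Theta_k).sum()
assert abs(ll_empty_exact - ll_empty_approx) < 0.05

# Correct for observed edges: drop each no-edge term, add the edge term.
edges = [(0, 1), (2, 5)]  # illustrative edge list
ll_approx = ll_empty_approx
for (u, v) in edges:
    p = Theta_k[u, v]
    ll_approx += -np.log(1.0 - p) + np.log(p)  # O(E) total work
```

The first term costs O(1) per Kronecker level instead of O(N²), which is what makes the overall O(E) evaluation possible.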
Can gradient descent recover the true parameters? Generate a graph from random parameters, start at a random point, and run gradient descent: we recover the true parameters 98% of the time.

[JMLR '10] The properties of the real graph and of the fitted Kronecker graph are very close. One fitted initiator is Θ1 ≈ [0.99 0.54; 0.49 0.13]; for another network, Θ1 ≈ [0.99 0.57; 0.51 0.22].

[JMLR '10] What do the estimated parameters tell us about the network structure? With initiator K1 = [a b; c d], a controls the number of edges within the first ("core") half of the nodes, d the number within the second ("periphery") half, and b, c the edges between the two halves.

[JMLR '10] For example, K1 = [0.9 0.5; 0.5 0.1] gives a core with many edges (0.9), a sparse periphery (0.1), and intermediate core-periphery connectivity (0.5). Iterating this initiator yields a nested core-periphery structure.

[JMLR '10] Small and large networks are very different; for example K1 = [0.99 0.17; 0.17 0.82] versus K1 = [0.99 0.54; 0.49 0.13].

Large-scale network structure: large networks are different from small networks and from manifolds. They exhibit a nested core-periphery: a recursive, onion-like structure in which each layer decomposes into a core and a periphery.
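The core/periphery reading of the initiator entries can be made concrete: after one Kronecker multiplication, the expected edge mass in each quadrant of Θ2 is the corresponding initiator entry times the total mass of Θ1. A sketch using the slide's 0.9/0.5/0.1 example:

```python
import numpy as np

Theta1 = np.array([[0.9, 0.5],
                   [0.5, 0.1]])
Theta2 = np.kron(Theta1, Theta1)

n = Theta1.shape[0]
# Quadrants of Theta2: core-core, core-periphery, periphery-periphery.
core_core = Theta2[:n, :n].sum()
core_peri = Theta2[:n, n:].sum()
peri_peri = Theta2[n:, n:].sum()

total = Theta1.sum()  # 2.0
assert abs(core_core - 0.9 * total) < 1e-12  # densest block: core of the core
assert abs(core_peri - 0.5 * total) < 1e-12
assert abs(peri_peri - 0.1 * total) < 1e-12
assert core_core > core_peri > peri_peri  # nested core-periphery ordering
```

Each quadrant is itself a scaled copy of Θ1, so the same core/periphery split repeats inside every block, which is exactly the recursive "onion" structure.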
Recall the stochastic Kronecker graph theorems, applied to the fitted initiator K1 = [0.99 0.55; 0.55 0.15]:
- Connected if b + c > 1: 0.55 + 0.15 = 0.70 > 1? No!
- Giant component if (a+b)(b+c) > 1: (0.99 + 0.55) · (0.55 + 0.15) ≈ 1.08 > 1? Yes!

Real graphs thus sit in the parameter region analogous to the giant component of an extremely sparse G(n,p): between p ≈ 1/n (a giant component exists) and p ≈ log(n)/n (the graph becomes connected).

[WAW '10] Feature-vector view: each node u has an associated binary vector A_u; think of it as a feature vector. The initiator matrix K1 acts like a "similarity" matrix, and the probability of a link between nodes u and v is

  P(u,v) = Π_{i=1}^{k} K1(A_u(i), A_v(i)).

Example: with K1 = [a b; c d], the nodes v2 = (0,1) and v3 = (1,0) give P(v2, v3) = b · c.

[WAW '10] More generally, a (possibly different) matrix K_i can be used per coordinate: for each edge (u,v), P(u,v) = Π_{i=1}^{k} K_i(A_u(i), A_v(i)).

[WAW '10] How to think of K_i = [a b; c d]: it is an attribute-attribute similarity matrix. It can model homophily (e.g., [0.9 0.1; 0.1 0.9]), heterophily (e.g., [0.1 0.9; 0.9 0.1]), or core-periphery (e.g., [0.9 0.5; 0.5 0.1]).

[WAW '10] Randomized variant: for each node u, generate a binary vector A_u by drawing k (k ≈ log2(|V|)) independent samples from a Bernoulli distribution; for each pair of nodes (u,v), set the edge probability P(u,v) = Π_{i=1}^{k} K(A_u(i), A_v(i)).

The two ingredients of the Kronecker model are:
(1) each of the 2^k nodes has a unique binary vector of length k (the node id expressed in binary is the vector);
(2) the initiator matrix K.
Question: what if ingredient (1) is dropped? That is, do we really need high variability of the feature vectors?
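The bit-vector formulation coincides with the Kronecker power when A_u is the binary expansion of the node id; this can be verified directly (a sketch, with an illustrative K1):

```python
import numpy as np

K1 = np.array([[0.9, 0.5],
               [0.5, 0.1]])
k = 3

def bits(u, k):
    """Binary feature vector A_u: node id u written in k bits (MSB first)."""
    return [(u >> (k - 1 - i)) & 1 for i in range(k)]

def p_edge(u, v, K1, k):
    """P(u,v) = prod_i K1(A_u(i), A_v(i))."""
    p = 1.0
    for au, av in zip(bits(u, k), bits(v, k)):
        p *= K1[au, av]
    return p

Theta_k = K1
for _ in range(k - 1):
    Theta_k = np.kron(Theta_k, K1)

# Every entry of the k-th Kronecker power equals the bit-vector product.
for u in range(2 ** k):
    for v in range(2 ** k):
        assert abs(p_edge(u, v, K1, k) - Theta_k[u, v]) < 1e-12
```

The slide's example falls out as a special case: with k = 2, nodes with bit vectors (0,1) and (1,0) get probability K1[0,1] · K1[1,0] = b · c.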
The adjacency matrices of the resulting graphs illustrate the difference.

Competition results: 19 entries were received; the top score was 14,690. Top 10 (scores relative to the winner):

  Rank  Name                     Score  Reward
  1     Wang, Fan                0      10%
  2     Cui, Jingyu              +1     10%
  3     Preston, Dan             +3     8%
  4     Pham, Peter Thien Tan    +4     8%
  5     Wang, Dakan              +9     6%
  6     Moreinis, Stanislav      +13    4%
  7     Wang, Chunyan            +17    2%
  8     Kim, Hyung Jin           +19    2%
  9     Wu, Yu                   +28
  10    Jin, Ye                  +31

For comparison, random partitioning gives a score of ~50,000.

Approaches used:
(1) Everyone used some form of greedy hill-climbing. Repeat until no improvement:
    - pick a node (or multiple nodes) and move it to the other side if that improves the score;
    - pick an edge and move its endpoints so that the score is most improved.
(2) Randomization techniques and simulated annealing to escape local minima: repeat (1) until no improvement, then randomize and restart.
(3) The signed Laplacian matrix.
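A minimal sketch of the greedy hill-climbing idea, assuming the score is the number of edges cut by a two-way partition (the actual competition scoring rule is not specified here):

```python
import random

def cut_score(edges, side):
    """Edges crossing the partition; assumed score (lower is better)."""
    return sum(1 for u, v in edges if side[u] != side[v])

def greedy_hill_climb(n, edges, seed=0):
    """Technique (1): repeatedly move single nodes across the cut while that
    improves the score; stop at a local minimum."""
    rng = random.Random(seed)
    side = [rng.randint(0, 1) for _ in range(n)]
    score = cut_score(edges, side)
    improved = True
    while improved:
        improved = False
        for u in range(n):
            side[u] ^= 1  # tentatively move node u to the other side
            new_score = cut_score(edges, side)
            if new_score < score:
                score, improved = new_score, True
            else:
                side[u] ^= 1  # undo the move
    return side, score

# Two 4-cliques joined by a single bridge edge: a natural test instance.
edges = [(u, v) for u in range(4) for v in range(u + 1, 4)]
edges += [(u, v) for u in range(4, 8) for v in range(u + 1, 8)]
edges += [(0, 4)]
side, score = greedy_hill_climb(8, edges)
# Local optimum: no single-node move improves the score.
for u in range(8):
    side[u] ^= 1
    assert cut_score(edges, side) >= score
    side[u] ^= 1
```

Technique (2) from the slide wraps this loop in restarts from fresh random partitions, keeping the best local optimum found.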
This note was uploaded on 01/11/2011 for the course CS 224 at Stanford.
