

Basic Concepts of Social Networks

"Different networks (and nodes within them) will have varying network properties and that these variations account for differences in outcomes for the networks (or nodes)"
(Borgatti et al., Network Analysis in the Social Sciences, Science 2009)

• How fast can we compute the properties (Big-O)?
• What do we mean by "different" (p-value)?

Big-O
• Big-O notation is a mathematical way of describing the limiting behaviour of a function. Applied to algorithms, it describes how an algorithm's running time grows as its input grows, which lets us compare how efficient ("fast") different algorithms are.
• Example: find the maximum value among these 7 numbers: 2, 5, 4, 7, 10, 4, 3
• Example: travelling salesman problem: find a shortest possible tour that visits each of the 7 cities exactly once

Big-O notation: a way to describe how the runtime or memory requirement of an algorithm changes as the input size changes.
NP-hard: it is widely suspected that there are no polynomial-time algorithms for NP-hard problems.

Sometimes it does not hurt to think outside the box…

P-value
In statistical hypothesis testing, the p-value is the probability, for a given statistical model and assuming the null hypothesis is true, that the statistical summary (such as a test statistic) would be at least as extreme as the one actually observed.
Example: H0: Stevens students have an average height of 7 feet (2.13 meters).

Basic Elements in a Social Network/Graph
Points      Lines             Field
vertices    edges, arcs       math
nodes       links             computer science
sites       bonds             physics
actors      ties, relations   sociology

Definitions of edges and nodes are subjective!

Apollo 13 Movie Network
• Main actors in the movie Apollo 13:
  • Tom Hanks
  • Kevin Bacon
  • Gary Sinise
  • Bill Paxton
  • Ed Harris
• Actors are nodes. Edges connect actors who were in a movie together.
• Since all of them were in Apollo 13, every pair would be connected, which is not interesting. Let's make a new network that connects two actors only if they were in an additional movie together.
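To contrast the two Big-O examples at the start of this section, here is a minimal Python sketch: finding the maximum of the seven numbers takes a single linear pass (O(n)), while a brute-force travelling-salesman solver tries every ordering of the cities after a fixed start, (n-1)! orderings in total (O(n!)). The function names and the four city coordinates are hypothetical, chosen only for illustration; the slides give no city positions.

```python
import itertools
import math


def find_max(values):
    """Linear scan: each value is examined once, so the running time is O(n)."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


def brute_force_tsp(points):
    """Exhaustive TSP: tries all (n-1)! orderings of the cities after city 0, so O(n!) time."""
    best_len, best_tour = math.inf, None
    for order in itertools.permutations(range(1, len(points))):
        tour = (0, *order, 0)  # start and end at city 0
        length = sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


numbers = [2, 5, 4, 7, 10, 4, 3]
print(find_max(numbers))  # 10

# Hypothetical coordinates: four cities on the corners of a unit square
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(brute_force_tsp(square))  # (4.0, (0, 1, 2, 3, 0))

# Distinct undirected tours through 7 cities: (7-1)!/2
print(math.factorial(6) // 2)  # 360
```

For 7 cities the brute-force loop already examines 6! = 720 orderings (each of the 360 distinct tours once in each direction), and that count grows factorially as cities are added.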
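For the height example in the p-value definition, a minimal sketch of how a two-sided p-value might be computed, assuming a hypothetical sample of student heights (not data from the slides). It uses a large-sample normal approximation to the one-sample t-test through Python's standard-library statistics.NormalDist; the helper name two_sided_p_value is made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def two_sided_p_value(sample, mu0):
    """Approximate two-sided p-value for H0: population mean == mu0.

    Uses z = (sample mean - mu0) / (s / sqrt(n)) with a standard normal
    reference distribution, a large-sample approximation to the t-test.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    # Probability under H0 of a statistic at least as extreme as the observed z
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical heights in meters (illustrative only, not data from the slides)
heights = [1.62, 1.75, 1.80, 1.68, 1.71, 1.85, 1.77, 1.66, 1.73, 1.79]
p = two_sided_p_value(heights, mu0=2.13)  # H0: average height is 2.13 m (7 feet)
print(f"p-value: {p:.3g}")
```

A p-value this small says that, if the true mean really were 2.13 m, a sample mean this far from it would almost never occur, so H0 would be rejected. For small samples like this one, a t distribution (for example scipy.stats.ttest_1samp) is more appropriate than the normal approximation.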
[Figure: Apollo 13 movie network linking Tom Hanks, Bill Paxton, Kevin Bacon, Gary Sinise, and Ed Harris]

Representation: Adjacency Matrix
[Figure: adjacency matrix of the Apollo 13 network (Tom Hanks, Bill Paxton, Kevin Bacon, Gary Sinise, Ed Harris)]

Adjacency Matrix
[Figure: directed example graph with nodes 1 to 5]
Rows are source nodes, columns are target nodes:

     1  2  3  4  5
1    0  0  0  0  0
2    0  0  1  1  0
3    0  1  0  1  0
4    0  0  0  0  1
5    1  1  0  0  0

Representation: Adjacency List
[Figure: Apollo 13 network]
Each entry lists two actors and the weight of the edge between them:
• TH, BP, 1
• TH, GS, 2
• BP, GS, 1
• GS, KB, 1
• GS, EH, 1
(TH = Tom Hanks, BP = Bill Paxton, GS = Gary Sinise, KB = Kevin Bacon, EH = Ed Harris)

Adjacency List
[Figure: the same directed example graph with nodes 1 to 5]
One (source, target) pair per edge:
2, 3
2, 4
3, 2
3, 4
4, 5
5, 2
5, 1

Adjacency Matrix
• For a graph with V vertices (nodes) and E edges, uses O(V^2) memory
• Only use when V is less than a few thousand
  • and when the graph is dense
• An easy way to store connectivity information
  – Checking whether two nodes are directly connected: O(1), constant time

Adjacency List
• Uses only O(V + E) memory
• Harder to check whether two nodes are directly connected (one node's neighbor list must be scanned)
• Easier to work with if the network is
  • large
  • sparse
• Quickly retrieves all neighbors of a node

* Graph Laplacian (L)
• Another matrix representation of a graph
• Has important applications in spectral clustering
L = D - A
D: degree matrix (the degree of each node on the diagonal, zeros elsewhere)
A: adjacency matrix

Connectedness
• Two nodes are connected if there is a path between them.
• A graph is connected if there is a path between every pair of nodes.
• A directed graph is strongly connected if there is a directed path in each direction between every pair of nodes. It is weakly connected if there is a path between every pair of nodes when edge directions are ignored.
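A minimal Python sketch, using the 5-node directed example graph from the matrix and list slides above (node labels 1 to 5 as in the figure), that builds both representations and then tests strong versus weak connectivity. The function names are illustrative, not from any library. Node 1 has no outgoing edges, so the graph cannot be strongly connected, although it is weakly connected.

```python
from collections import deque

# Directed example graph from the adjacency-matrix / adjacency-list slides
NODES = [1, 2, 3, 4, 5]
EDGES = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]


def adjacency_matrix(nodes, edges):
    """V x V matrix (O(V^2) memory); entry [i][j] == 1 means an edge i -> j."""
    index = {v: i for i, v in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
    return matrix


def adjacency_list(nodes, edges):
    """Dict of out-neighbor lists (O(V + E) memory)."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
    return adj


def reachable(adj, start):
    """Set of nodes reachable from start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def is_strongly_connected(nodes, edges):
    """Every node must reach every other node along directed edges."""
    adj = adjacency_list(nodes, edges)
    return all(reachable(adj, v) == set(nodes) for v in nodes)


def is_weakly_connected(nodes, edges):
    """Ignore direction by adding each edge in reverse, then test reachability."""
    adj = adjacency_list(nodes, edges + [(v, u) for u, v in edges])
    return reachable(adj, nodes[0]) == set(nodes)


matrix = adjacency_matrix(NODES, EDGES)
adj = adjacency_list(NODES, EDGES)
print(matrix[1][2] == 1)  # is there an edge 2 -> 3? O(1) lookup: True
print(adj[5])             # all out-neighbors of node 5: [2, 1]
print(is_weakly_connected(NODES, EDGES), is_strongly_connected(NODES, EDGES))  # True False
```

The matrix answers "is u linked to v?" with one index lookup, while the list hands back a node's neighbors directly, which is the trade-off the two bullet lists describe.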
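To make L = D - A concrete, a minimal sketch that computes the Laplacian of the Apollo 13 co-star network from the adjacency-list slide, treating it as undirected and unweighted (the edge weights are deliberately ignored here; a weighted Laplacian would use them instead). The function name laplacian is illustrative.

```python
# Apollo 13 co-star network from the adjacency-list slide, treated as
# undirected and unweighted (the edge weights are ignored here)
NODES = ["TH", "BP", "KB", "GS", "EH"]
EDGES = [("TH", "BP"), ("TH", "GS"), ("BP", "GS"), ("GS", "KB"), ("GS", "EH")]


def laplacian(nodes, edges):
    """Return L = D - A as a list of rows."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:  # undirected: set both A[u][v] and A[v][u]
        A[idx[u]][idx[v]] = 1
        A[idx[v]][idx[u]] = 1
    degrees = [sum(row) for row in A]  # diagonal entries of the degree matrix D
    return [[(degrees[i] if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]


L = laplacian(NODES, EDGES)
for name, row in zip(NODES, L):
    print(name, row)
```

Every row of L sums to zero, and for an undirected graph the number of zero eigenvalues of L equals the number of connected components, which is part of why the Laplacian is useful in spectral clustering.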
Connected Components
• Strongly connected components
  • Each node within the component can be reached from every other node in the component by following directed links.
  • In the example graph: {B, C, D, E}, {A}, {G, H}, {F}
• Weakly connected components
  • Every node can be reached from every other node by following links in either direction.
  • In the example graph: {A, B, C, D, E}, {G, H, F}
• In undirected networks one talks simply about "connected components".
[Figure: example directed graph on nodes A to H, and the same graph without edge directions]

Giant Component
• If the largest connected component encompasses a significant fraction of the graph, it is called the giant component.
Bearman, Peter, James Moody, and Katherine Stovel. 2004. "Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks." American Journal of Sociology.

Cliques
A clique is a group of nodes that are all adjacent to one another.
[Figure: Apollo 13 network (Tom Hanks, Bill Paxton, Kevin Bacon, Gary Sinise, Ed Harris)]

Cliques
How many cliques are there?
[Figure: example graph on nodes A to H]

Paths
• A path is a sequence of nodes that can be traversed by following edges between them.
• Two nodes in a graph are called connected if there is a path between them in the network.
  • The path does not need to be a direct edge.
  • An entire graph is called connected if all pairs of nodes are connected.
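A minimal sketch of how the strongly and weakly connected components listed above could be computed, using Kosaraju's algorithm for the strong components and union-find for the weak ones (two standard techniques, not necessarily the ones used in the course). The figure's actual edges did not survive extraction, so the edge list below is hypothetical, chosen only so that the components match the slide.

```python
from collections import defaultdict

# Hypothetical directed edges (the figure's real edges were lost in extraction),
# chosen so that the components match the ones listed on the slide
NODES = list("ABCDEFGH")
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "B"),
         ("F", "G"), ("G", "H"), ("H", "G")]


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: DFS finish order on G, then DFS on reversed G."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    order, visited = [], set()

    def finish(u):
        visited.add(u)
        for w in fwd[u]:
            if w not in visited:
                finish(w)
        order.append(u)  # u is appended only after all its descendants

    for v in nodes:
        if v not in visited:
            finish(v)

    components, assigned = [], set()

    def collect(u, comp):
        assigned.add(u)
        comp.add(u)
        for w in rev[u]:
            if w not in assigned:
                collect(w, comp)

    for v in reversed(order):  # latest finish time first
        if v not in assigned:
            comp = set()
            collect(v, comp)
            components.append(comp)
    return components


def weakly_connected_components(nodes, edges):
    """Union-find over the edges with direction ignored."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = defaultdict(set)
    for v in nodes:
        groups[find(v)].add(v)
    return list(groups.values())


scc = strongly_connected_components(NODES, EDGES)
wcc = weakly_connected_components(NODES, EDGES)
print(sorted(sorted(c) for c in scc))  # [['A'], ['B', 'C', 'D', 'E'], ['F'], ['G', 'H']]
print(sorted(sorted(c) for c in wcc))  # [['A', 'B', 'C', 'D', 'E'], ['F', 'G', 'H']]
# Share of nodes in the largest weakly connected component (a giant-component check)
print(max(len(c) for c in wcc) / len(NODES))  # 0.625
```

The recursive depth-first search is fine for small examples like this; for very large graphs an iterative version avoids Python's recursion limit.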
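For the clique slides, a minimal sketch that enumerates the maximal cliques of the Apollo 13 co-star network with the basic Bron-Kerbosch algorithm (no pivoting). Whether single edges count as cliques, and whether only maximal cliques are counted, is a convention the slide's counting question does not spell out, so both the full list and the cliques of size 3 or more are shown. The function name maximal_cliques is illustrative.

```python
# Apollo 13 co-star network (undirected), stored as neighbor sets
apollo = {
    "TH": {"BP", "GS"},
    "BP": {"TH", "GS"},
    "GS": {"TH", "BP", "KB", "EH"},
    "KB": {"GS"},
    "EH": {"GS"},
}


def maximal_cliques(adj):
    """Basic Bron-Kerbosch (no pivoting): yields every maximal clique as a set."""
    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-explored nodes
        if not p and not x:
            yield r  # r cannot be extended, so it is a maximal clique
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


cliques = list(maximal_cliques(apollo))
print(sorted(sorted(c) for c in cliques))
# [['BP', 'GS', 'TH'], ['EH', 'GS'], ['GS', 'KB']]
print([sorted(c) for c in cliques if len(c) >= 3])  # [['BP', 'GS', 'TH']]
```

Tom Hanks, Bill Paxton, and Gary Sinise form the only triangle; every other maximal clique is a single edge through Gary Sinise.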
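Building on the definition of paths above, a minimal sketch of the two shortest-path methods named on the shortest-path slide: breadth-first search for unweighted edges and Dijkstra's algorithm (binary-heap version) for non-negative weights, plus a diameter computed from BFS distances. The Apollo 13 network is treated as unweighted; the small weighted graph named roads is hypothetical, with weights read as distances.

```python
import heapq
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path (geodesic) distances from source: O(V + E)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # the first visit is along a shortest path
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def dijkstra(adj_w, source):
    """Weighted shortest paths with a binary heap: O((V + E) log V), weights >= 0."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already reached more cheaply
        for w, weight in adj_w[u]:
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def diameter(adj):
    """Longest of all shortest-path lengths; assumes the graph is connected."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)


apollo = {
    "TH": ["BP", "GS"], "BP": ["TH", "GS"], "GS": ["TH", "BP", "KB", "EH"],
    "KB": ["GS"], "EH": ["GS"],
}
print(bfs_distances(apollo, "KB"))  # {'KB': 0, 'GS': 1, 'TH': 2, 'BP': 2, 'EH': 2}
print(diameter(apollo))             # 2

# Hypothetical weighted graph for Dijkstra
roads = {"s": [("a", 4), ("b", 1)], "b": [("a", 2), ("t", 6)], "a": [("t", 1)], "t": []}
print(dijkstra(roads, "s"))         # {'s': 0, 'a': 3, 'b': 1, 't': 4}
```

In roads the direct edge s-a (weight 4) loses to the detour s-b-a (1 + 2 = 3), which is exactly the case BFS would get wrong because it counts hops rather than weights.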
Shortest Path (geodesic distance)
• Unweighted edges
  • Breadth-first search
  • O(E + V)
• Weighted edges
  • Dijkstra's algorithm (edge weights must be non-negative)
  • O(E + V log V) with a Fibonacci-heap priority queue; the common binary-heap version runs in O((E + V) log V)

Diameter
The length of the longest of the shortest paths between any two nodes in a network.
[Figure: example network with nodes A to R]

Hubs and Bridges
• Hub: a node that connects many nodes
• Bridge: an edge whose deletion leaves its two endpoints in different components
[Figure: example network with nodes A to R]

Clusters/Communities
• A cluster is a group of nodes that are tightly connected.
• "Tightly" varies, but usually means the nodes are more densely connected to one another than the network as a whole is.
• A cluster does not need to be a clique.

Subnetworks
Any subset of nodes and edges in the graph.
[Figure: subnetwork with nodes A, B, C, D, E, O, P, Q]

Egocentric Networks (Degree 1 of node D)
[Figure: node D linked to its neighbors A, B, C, E, and Q]

Egocentric Networks (Degree 1.5)
All of D's friends and the connections between them.
[Figure: node D, its neighbors A, B, C, E, Q, and the edges among them]

Node Degree
• Network properties of nodes
  • indegree: how many directed edges (arcs) are incident on (point into) a node
  • outdegree: how many directed edges (arcs) originate at a node
  • degree (in or out): number of edges incident on a node
  [Figure: example node with indegree = 3, outdegree = 2, degree = 5]
• Network properties of the entire graph
  • degree distribution
  • density
  • clustering coefficient (transitivity)

Degree Distribution
Degrees in the Apollo 13 network:
Tom Hanks      2
Bill Paxton    2
Gary Sinise    4
Kevin Bacon    1
Ed Harris      1
[Figure: degree distribution]

Density
Edges: 5

Density
Nodes: 8
Edges: 12
Total possible edges: #Nodes × (#Nodes - 1) / 2 = (8 × 7) / 2 = 56 / 2 = 28
Density: 12 / 28 ≈ 0.43

Clustering Coefficient (Transitivity)
The density of a node's 1.5-degree egocentric network, with the node itself excluded.
[Figure: D's 1.5-degree egocentric network, before and after removing D]
Number of possible edges among D's 5 neighbors: 5 × 4 / 2 = 10
Number of actual edges: 5
Density (clustering coefficient of D) = 5 / 10 = 0.5

Research: Probabilistic Counting in Networks

Research: What do likes on Facebook reveal?
Kosinski, Michal, David Stillwell, and Thore Graepel. "Private traits and attributes are predictable from digital records of human behavior." Proceedings of the National Academy of Sciences (2013): 201218772.

Prediction of Dichotomous Variables
Prediction of Numeric Variables
Predictive Power of Likes

Research: Can marketers predict individual behavior with social networks? Is it worth it?
Goel, Sharad, and Daniel G. Goldstein. "Predicting individual behavior with social networks." Marketing Science 33.1 (2013): 82-93.

Goel and Goldstein, Predicting individual behavior with social networks (Marketing Science 2014)
• Social-network-based ad targeting
• Hill et al. (2006): identified a target set of customers who were socially connected to people who had adopted a new service, and showed that these individuals were statistically more likely than average to adopt the service.
• Bhatt et al. (2010): user features and social features are roughly equally important for predicting adoption, and these feature sets are not redundant: combining them improves prediction considerably.

But… statistically significant ≠ practically significant
• The number of people with an adopting contact may be exceedingly small.
  • Fewer than 1 person in 750 is connected to an adopter.
  • The target set constituted just 0.3% of the customer base in Hill et al. (2006).
• A statistically significant predictor may be practically insignificant.

What's the real worth of social network data?
• Yahoo! communication network
  • Nodes: users
  • Edges: email or instant-message exchanges during a 2-month period
• Outcomes:
  • Responding to ads (clicking)
  • Fantasy sports league
  • Purchasing

Introduction to R
...
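Looking back at the node-degree, density, and clustering-coefficient slides, a minimal sketch that recomputes those quantities for the Apollo 13 co-star network (treated as undirected and unweighted) and checks the 8-node density arithmetic. The local clustering coefficient follows the slide's definition: the density of a node's 1.5-degree egocentric network with the node itself removed. Returning 0.0 for nodes with fewer than two neighbors is an assumed convention (the value is otherwise undefined), and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Apollo 13 co-star network (undirected), stored as neighbor sets
apollo = {
    "TH": {"BP", "GS"},
    "BP": {"TH", "GS"},
    "GS": {"TH", "BP", "KB", "EH"},
    "KB": {"GS"},
    "EH": {"GS"},
}


def degrees(adj):
    """Degree of each node in an undirected graph stored as neighbor sets."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def density(n_nodes, n_edges):
    """Undirected density: actual edges / possible edges, n(n-1)/2."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


def local_clustering(adj, v):
    """Density of v's neighborhood (its 1.5-degree ego network minus v itself)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; 0.0 is an assumed convention
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


deg = degrees(apollo)
print(deg)                            # {'TH': 2, 'BP': 2, 'GS': 4, 'KB': 1, 'EH': 1}
print(Counter(deg.values()))          # degree distribution: number of nodes per degree
n_edges = sum(deg.values()) // 2      # each undirected edge is counted at both endpoints
print(density(len(apollo), n_edges))  # 5 / 10 = 0.5
print(round(density(8, 12), 2))       # 8-node example: 12 / 28 -> 0.43
print(local_clustering(apollo, "GS"))  # 1 of Gary Sinise's 6 neighbor pairs is linked
print(local_clustering(apollo, "TH"))  # 1.0
```

The same local_clustering function applied to node D in the egocentric-network figure would give 5 / 10 = 0.5, matching the slide, if the figure's edges were available.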