17-AnalyzingNetworkData.pdf - Analyzing Network Data Introduction to Computational Thinking and Data Science Lecture 14

This preview shows page 1 out of 45 pages.

Analyzing Network Data
Introduction to Computational Thinking and Data Science, Lecture 14

Today’s Topics
• Network data
• Network structure
• Types of networks
    • Homogeneous networks
    • Heterogeneous networks
    • Bipartite networks
    • Weighted networks
• Network analysis
    • Cliques
    • Distance
    • Bridges
    • Centrality
    • Characterize network structure
        • Scale-free or not

Network Data (e.g., Twitter Network Graphs)

Representing Network Structure
• Graphs of nodes connected by links
• Nodes: entities of interest (e.g., a person, a protein, a Web page, …)
• Links: relations between two nodes (e.g., a communication, a common interest)

Representing Network Behavior: Dynamic Networks
• Behaviors of the entities over time
• Network structure may also change over time
• Weak vs. strong links
• Changes almost always have trickle effects on the rest of the network

Interesting Questions
• If you select two random people, what is the probability that they know one another?
• Can you connect two random people through their network of acquaintances?

Small-World Networks: Milgram’s Experiment
• Several experiments examined the average path length for social networks of people in the US
• Human society is a small-world-type network characterized by short path lengths
• The experiments are often associated with the phrase "six degrees of separation"
• Six degrees of separation is the idea that all living things and everything else in the world are six or fewer steps away from each other, so that a chain of "a friend of a friend" statements can be made to connect any two people in a maximum of six steps

Milgram’s experiment
A person P in Nebraska was given a letter to deliver to another person Q in Massachusetts. P was told Q’s address and occupation, and was instructed to send the letter to someone she knew on a first-name basis in order to transmit the letter to the destination as fast as possible.
• Stanley Milgram (1967)

Small-World Networks: "Six Degrees of Separation"
"I know a guy who knows a guy who knows a guy who knows a guy who knows a guy who knows Kevin Bacon."
Bacon number = 6
oracleofbacon.org (using: Movies; role type: Actors)

The Erdős Number
• A person's Erdős number measures the "collaborative distance" in authoring academic papers between that person and Hungarian mathematician Paul Erdős

Sources of Networked Data
• Messages across people (e.g., emails, blogs)
• Social network sites (e.g., Facebook)
• Social media (e.g., Twitter)
• Constructing networks from other data

Networks Created from Different Kinds of Data
• TouchGraph's visualization of senator co-sponsorship patterns in the 110th Congress shows that he has been acting like a Democrat for years
• US senators that share an alma mater

[Figure text, fragmentary] "…wood, popularized by the game Six Degrees of Kevin Bacon, in which players try to connect actors to Bacon via the movies in which they have appeared together, is scale-free. … yeast, one of the simplest eukaryotic (nucleus-containing) cells, with thousands of proteins, we discovered a scale-free topology: although most proteins interact with only one or two others, a few are able to …"

Examples of Networks

Types of Networks
• Homogeneous networks
• Heterogeneous networks
• Bipartite networks
• Weighted networks

Homogeneous vs. Heterogeneous Networks
• Homogeneous networks
    • Single-typed nodes, single-typed links
Source: …mationnetworks-by-jiawei-han

Bipartite Networks
• Nodes of a bipartite network can be divided into two disjoint sets so that no links connect two nodes in the same set
• Actor networks: actors are connected to films
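The bipartite definition above can be checked mechanically: a network is bipartite exactly when its nodes can be colored with two colors so that no link joins two nodes of the same color. The following is a minimal sketch of that check using breadth-first search. The function name `two_color` and the toy actor/film links are hypothetical and are not taken from the lecture.

```python
from collections import deque


def two_color(edges):
    """Return {node: 0 or 1} if the network is bipartite, else None."""
    # Build undirected adjacency sets from the list of links.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    color = {}
    for start in graph:  # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in color:
                    # Give each newly reached node the opposite color.
                    color[neighbor] = 1 - color[node]
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    # A link inside one color class: not bipartite.
                    return None
    return color


# Hypothetical toy data: actors linked to the films they appeared in.
actor_film_links = [
    ("Actor 1", "Film X"),
    ("Actor 2", "Film X"),
    ("Actor 2", "Film Y"),
]
coloring = two_color(actor_film_links)
```

If `coloring` is not `None`, the two color classes are the two disjoint node sets (here, actors and films). A triangle of three mutually linked nodes would make the function return `None`.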
[Figure: bipartite graph with side labels "Advertisement" and "Queries" and node labels 1, 2, 3, 4 and a, b, c, d; caption fragment: "coloring of the graph with two colors"]

Weighted Networks
• A weighted network is a network where the ties among nodes have weights assigned to them

Interesting Types of Networks
• Social networks
    • People as nodes
• Scale-free networks
    • Most nodes have only a few connections (the exact number differs from node to node)
    • A few nodes may have many connections

Random vs. Scale-Free Networks
[Figure caption, reconstructed from fragments] Random networks, which resemble the U.S. highway system (simplified in left map), consist of nodes with randomly placed connections. In such systems, a plot of the distribution of node linkages will follow a bell-shaped curve (left graph), with most nodes having approximately the same number of links. In contrast, scale-free networks, which resemble the U.S. airline system (simplified in right map), contain hubs (red) … the distribution of node linkages follows a power law (center graph) in that most nodes have just a few connections and some have a tremendous number of links. In that sense, the system has no "scale." The defining characteristic of such networks is that the distribution of links, if plotted on a double-logarithmic scale (right graph), results in a straight line.
[Figure panels: Random Network (U.S. highway system), bell curve distribution of node linkages; Scale-Free Network (U.S. airline system), power law distribution of node linkages. Axes: number of nodes vs. number of links, on linear and log scales]

Scale-Free Networks: The Internet and the Web
• More than 80% of the pages have a handful of links, but a small set (less than 0.01%) have more than 1,000 links [Barabási et al. 2008]

These Are All Scale-Free Networks!
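The double-logarithmic test described in the Random vs. Scale-Free Networks caption can be sketched in a few lines: count how many nodes have each number of links, then fit a straight line to log(number of nodes) against log(number of links). This is only an illustration under stated assumptions. The function names and the toy "airline" network are hypothetical, and a line fitted to a handful of points is an informal visual check, not statistical evidence of a power law.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes that have exactly k links."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def log_log_slope(distribution):
    """Least-squares slope of log(node count) against log(degree).

    On a double-logarithmic plot, a power law appears as a roughly
    straight line with negative slope.
    """
    points = [(math.log(k), math.log(n)) for k, n in distribution.items()]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


# Hypothetical toy "airline" network: one hub plus two short routes.
toy_edges = [("HUB", city) for city in "ABCDEFGH"] + [("A", "B"), ("C", "D")]
toy_distribution = degree_distribution(toy_edges)
toy_slope = log_log_slope(toy_distribution)
```

In the toy network most nodes have one or two links and a single hub has eight, so the fitted slope is negative. For a random network with a bell-shaped degree distribution, the points would not fall near a straight line on this plot.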
Social Networks
• Major issue: networks in social sites are often not publicly accessible

Network Analysis
• Cliques
• Distance
• Bridges
• Centrality
• Characterize network structure
    • Scale-free or not

Cliques and Connected Components
• A clique is a subgraph in which every node is linked directly to every other node in the clique
• A connected component is a maximal subgraph in which any two nodes are connected by a path

Distance

Bridges
• A bridge is a link between two nodes that, if removed, would leave those nodes in disconnected components of the graph

Betweenness Centrality
• Betweenness centrality for a node n is the total number of shortest paths between any two other nodes in the graph that pass through n

See How Things Spread Over a Network: vax.herokuapp.com

View Your Email Network: immersion.media.mit.edu

ARPANET, March 1972
• ARPANET was the network that became the basis for the Internet. Based on a concept first published in 1967, ARPANET was developed under the direction of the U.S. Advanced Research Projects Agency (ARPA).

ARPANET, September 1973

Internet Mapping Project
• This map appeared in the December 1998 Wired.
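The network-analysis measures defined in the slides (cliques, distance, bridges, betweenness centrality) can be illustrated on a small undirected graph. The sketch below is a brute-force illustration, not the lecture's own code. All function names and the toy network are hypothetical, and the approach is only practical for small graphs with no repeated links. The betweenness function follows the slide's wording and counts shortest paths; many libraries instead sum, for each pair of nodes, the fraction of shortest paths that pass through n.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from a list of (node, node) links."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_paths(graph, source, target):
    """All shortest paths from source to target, each as a list of nodes."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:  # breadth-first search, recording every shortest-path parent
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                parents[neighbor] = [node]
                queue.append(neighbor)
            elif dist[neighbor] == dist[node] + 1:
                parents[neighbor].append(node)
    if target not in dist:
        return []

    def expand(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in parents[node] for path in expand(p)]

    return expand(target)


def distance(graph, source, target):
    """Number of links on a shortest path, or None if no path exists."""
    paths = shortest_paths(graph, source, target)
    return len(paths[0]) - 1 if paths else None


def is_clique(graph, nodes):
    """True if every pair of the given nodes is directly linked."""
    return all(b in graph[a] for a, b in combinations(nodes, 2))


def bridges(edges):
    """Links whose removal leaves their two endpoints disconnected."""
    found = []
    for edge in edges:
        graph = build_graph([e for e in edges if e != edge])
        a, b = edge
        if a not in graph or b not in graph or distance(graph, a, b) is None:
            found.append(edge)
    return found


def betweenness(graph, n):
    """Count shortest paths between pairs of other nodes that pass through n."""
    return sum(
        n in path
        for a, b in combinations(list(graph), 2)
        if n not in (a, b)
        for path in shortest_paths(graph, a, b)
    )


# Hypothetical toy network: two triangles joined by the single link c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
toy_graph = build_graph(toy_edges)
```

In this toy network, the nodes a, b, c form a clique, the link c-d is the only bridge, and every shortest path from one triangle to the other runs through c and d, which is why those two nodes have the highest betweenness.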
  • Fall '17