Analyzing Network Data
Introduction to Computational Thinking and Data Science
Lecture 14

Today’s Topics
• Network Data
• Network structure
• Types of Networks
  • Homogeneous networks
  • Heterogeneous networks
  • Bipartite networks
  • Weighted networks
• Network Analysis
  • Cliques
  • Distance
  • Bridges
  • Centrality
  • Characterize network structure
  • Scale-free or not

Network Data (e.g. Twitter Network Graphs)

Representing Network Structure
• Graphs of nodes connected by links
• Nodes: entities of interest (e.g., a person, a protein, a Web page, …)
• Links: relations between two nodes (e.g., a communication, a common interest)

Representing Network Behavior: Dynamic Networks
• Behaviors of the entities over time
• Network structure may also change over time
• Weak vs strong links
• Changes almost always have trickle effects on the rest of the network.

Interesting Questions
• If you select two random people, what is the probability that they know one another?
• Can you connect two random people through their network of acquaintances?

Small-World Networks: Milgram’s Experiment
• Several experiments examined the average path length for social networks of people in the US
• Human society is a small-world-type network characterized by short path lengths.
• The experiments are often associated with the phrase "six degrees of separation"
• Six degrees of separation is the idea that all living things and everything else in the world are six or fewer steps away from each other
  • so that a chain of "a friend of a friend" statements can be made to connect any two people in a maximum of six steps.

Milgram’s experiment
A person P in Nebraska was given a letter to deliver to another person Q in Massachusetts. P was told Q’s address and occupation, and instructed to send the letter to someone she knew on a first-name basis in order to transmit the letter to the destination as fast as possible.
• Stanley Milgram (1967)

“Six Degrees of Separation”
“I know a guy who knows a guy who knows a guy who knows a guy who knows a guy who knows Kevin Bacon.”
Bacon number = 6
oracleofbacon.org (using: Movies; role type: Actors)

The Erdős Number
• A person's Erdős number measures the "collaborative distance" in authoring academic papers between that person and Hungarian mathematician Paul Erdős
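
A Bacon number or an Erdős number is a shortest-path length in a network, so it can be computed with a breadth-first search. The following is a minimal sketch, not something taken from the slides: the function name `degrees_of_separation` and the toy acquaintance network (with placeholder names P, A, B, C, Q, Z) are assumptions made for illustration.

```python
from collections import deque


def degrees_of_separation(graph, source, target):
    """Return the number of links on a shortest path, or None if unreachable.

    `graph` maps each node to a list of its neighbors (undirected network).
    """
    if source == target:
        return 0
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                if friend == target:
                    return distance[friend]
                queue.append(friend)
    return None  # no chain of acquaintances connects the two nodes


# Hypothetical acquaintance network; the names are placeholders.
acquaintances = {
    "P": ["A"],
    "A": ["P", "B"],
    "B": ["A", "C"],
    "C": ["B", "Q"],
    "Q": ["C"],
    "Z": [],  # an isolated person
}

print(degrees_of_separation(acquaintances, "P", "Q"))  # 4
print(degrees_of_separation(acquaintances, "P", "Z"))  # None
```

Breadth-first search visits nodes in order of increasing distance, so the first time the target is reached the distance is guaranteed to be minimal.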
Sources of Networked Data
• Messages across people (e.g., emails, blogs)
• Social network sites (e.g., Facebook)
• Social media (e.g., Twitter)
• Constructing networks from other data

Networks Created from Different Kinds of Data
TouchGraph's visualization of senator co-sponsorship patterns in the 110th Congress shows that he has been acting like a Democrat for years.
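
One way a network might be constructed from other data, such as co-sponsorship records, is to link two people whenever they appear together on the same record and to count how often that happens. The sketch below assumes a hypothetical `bills` dictionary with placeholder senator names. It is not TouchGraph's method and uses no real data.

```python
from collections import defaultdict
from itertools import combinations


def build_cosponsorship_network(bills):
    """Map each unordered pair of sponsors to the number of bills they share.

    `bills` maps a bill identifier to the list of its sponsors.
    """
    weights = defaultdict(int)
    for sponsors in bills.values():
        # Every pair of sponsors on the same bill gets one more shared bill.
        for a, b in combinations(sorted(set(sponsors)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical records; bill and senator names are placeholders.
bills = {
    "Bill 1": ["Senator A", "Senator B", "Senator C"],
    "Bill 2": ["Senator A", "Senator B"],
    "Bill 3": ["Senator C", "Senator D"],
}

network = build_cosponsorship_network(bills)
for (a, b), shared in sorted(network.items()):
    print(a, "--", b, "shared bills:", shared)
```

The resulting counts act as link weights: senators who co-sponsor often end up with heavier links than those who share a single bill.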
US Senators that share an Alma Mater

Examples of Networks
…wood, popularized by the game Six Degrees of Kevin Bacon, in which players try to connect actors to Bacon via the movies in which they have appeared together, is scale-free. A quantitative analy…
…yeast, one of the simplest eukaryotic (nucleus-containing) cells, with thousands of proteins, we discovered a scale-free topology: although most proteins interact with only one or two others, a few are able to…

Types of Networks
• Homogeneous networks
• Heterogeneous networks
• Bipartite networks
• Weighted networks

Homogeneous vs Heterogeneous Networks
• Homogeneous networks
  • Single-typed nodes, single-typed links
Image source: …mationnetworks-by-jiawei-han

Bipartite Networks
Nodes of a bipartite network can be divided into two disjoint sets so that no links connect two nodes in the same set.
Actor networks: actors are connected to films.
[Figure: coloring of the graph with two colors; nodes 1, 2, 3, 4 and a, b, c, d; set labels "Advertisement" and "Queries"]

Weighted Networks
• A weighted network is a network where the ties among nodes have weights assigned to them.
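
The bipartite definition above is equivalent to saying the graph can be colored with two colors so that no link joins two nodes of the same color. A minimal sketch of that check follows. The function name `two_color` and the actor and film names are hypothetical, and the graph is assumed to be an undirected adjacency dictionary in which every node appears as a key.

```python
from collections import deque


def two_color(graph):
    """Return {node: 0 or 1} if the graph is bipartite, otherwise None."""
    color = {}
    for start in graph:  # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in color:
                    # Neighbors must take the opposite color.
                    color[neighbor] = 1 - color[node]
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    return None  # a link joins two nodes of the same set
    return color


# Hypothetical actor-film network: actors link only to films.
actor_film = {
    "Actor 1": ["Film X"],
    "Actor 2": ["Film X", "Film Y"],
    "Film X": ["Actor 1", "Actor 2"],
    "Film Y": ["Actor 2"],
}

# A triangle cannot be split into two such sets.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

print(two_color(actor_film))
print(two_color(triangle))  # None
```

In the actor-film example the two colors recover the two node sets: all actors receive one color and all films the other.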
Interesting Types of Networks
• Social networks
  • People as nodes
• Scale-free networks
  • Most nodes have only a few connections (the exact number varies from node to node)
  • A few nodes may have many connections

Random vs Scale-Free Networks
Random networks, which resemble the U.S. highway system (simplified in left map), consist of nodes with randomly placed connections. In such systems, a plot of the distribution of node linkages will follow a bell-shaped curve (left graph), with most nodes having approximately the same number of links. In contrast, scale-free networks, which resemble the U.S. airline system (simplified in right map), contain hubs (red). In such networks, the distribution of node linkages follows a power law (center graph) in that most nodes have just a few connections and some have a tremendous number of links. In that sense, the system has no "scale." The defining characteristic of such networks is that the distribution of links, if plotted on a double-logarithmic scale (right graph), results in a straight line.
[Figure panels: Random Network (U.S. highway system), bell curve distribution of node linkages; Scale-Free Network (U.S. airline system), power law distribution of node linkages; horizontal axes: Number of Links and Number of Links (log scale)]

Scale-Free Networks: The Internet and the Web
More than 80% of the pages have a handful of links, but a small set (less than 0.01%) have more than 1,000 links [Barabási et al. 2008].
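
The contrast between the two kinds of network can be explored by generating one of each and comparing how links are distributed across nodes. The sketch below is an illustration under stated assumptions, not a method from the slides. The random network places links between uniformly chosen pairs. The scale-free one uses preferential attachment, a common generative model that is swapped in here for illustration. All function names, sizes, and the seed are arbitrary choices.

```python
import random
from collections import Counter


def random_network(n, num_links, rng):
    """Place num_links links between uniformly random distinct node pairs."""
    links = set()
    while len(links) < num_links:
        a, b = rng.sample(range(n), 2)
        links.add((min(a, b), max(a, b)))
    return links


def preferential_attachment_network(n, links_per_node, rng):
    """Grow a network where new nodes prefer already well-connected nodes."""
    links = set()
    endpoints = []  # each node appears once per link it has
    core = links_per_node + 1
    for a in range(core):  # start from a small fully connected core
        for b in range(a + 1, core):
            links.add((a, b))
            endpoints += [a, b]
    for new in range(core, n):
        targets = set()
        while len(targets) < links_per_node:
            # Picking from `endpoints` favors nodes with many links.
            targets.add(rng.choice(endpoints))
        for target in targets:
            links.add((target, new))
            endpoints += [target, new]
    return links


def degree_counts(links):
    """Return a Counter mapping each linked node to its number of links."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return degree


rng = random.Random(0)  # fixed seed so the run is repeatable
scale_free = preferential_attachment_network(2000, 2, rng)
random_net = random_network(2000, len(scale_free), rng)

for name, links in [("random", random_net), ("scale-free", scale_free)]:
    degree = degree_counts(links)
    histogram = Counter(degree.values())  # links -> number of nodes
    print(name, "largest number of links on one node:", max(degree.values()))
    print(name, "nodes with exactly 2 links:", histogram[2])
```

Plotting `histogram` on double-logarithmic axes would be the natural next step: an approximately straight line suggests a power law, while a single peak suggests a random network.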
These Are All Scale-Free Networks!
Social Networks
• Major issue: networks in social sites are often not publicly accessible

Network Analysis
• Cliques
• Distance
• Bridges
• Centrality
• Characterize network structure
• Scale-free or not

Cliques and Connected Components
• A clique is a subgraph in which every node is connected by a link to every other node in the clique
• A connected component is a subgraph in which, for any two nodes, there is a path that connects them

Distance

Bridges
• A bridge is a link between two nodes that, if removed, would leave those nodes in disconnected components of the graph

Betweenness Centrality
• Betweenness centrality for a node n is the total number of shortest paths between any two nodes in the graph that include n

See How Things Spread Over a Network: vax.herokuapp.com

View Your Email Network: immersion.media.mit.edu

ARPANET, March 1972
ARPANET was the network that became the basis for the Internet. Based on a concept first published in 1967, ARPANET was developed under the direction of the U.S. Advanced Research Projects Agency (ARPA).

ARPANET, September 1973

Internet Mapping project
This map appeared in the December 1998 Wired.

Today’s Topics
• Networked Data
• Network structure
• Types of Networks
  • Homogeneous networks
  • Heterogeneous networks
  • Bipartite networks
  • Weighted networks
• Network Analysis
  • Cliques
  • Distance
  • Bridges
  • Centrality
  • Characterize network structure
  • Scale-free or not
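
The network-analysis notions in this lecture (cliques, connected components, bridges, betweenness centrality) can be sketched in a few short functions. This is an illustrative sketch under assumptions. The function names and toy graph are hypothetical, and the bridge test is a naive remove-and-recheck approach. Betweenness follows the lecture's counting definition, with n counted only as an intermediate node, rather than the fraction-based definition found in many libraries.

```python
from collections import deque


def connected_components(graph):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def is_clique(graph, nodes):
    """True if every pair of the given nodes is directly linked."""
    nodes = list(nodes)
    return all(b in graph[a] for i, a in enumerate(nodes) for b in nodes[i + 1:])


def bridges(graph):
    """Naive test: a link is a bridge if removing it splits a component."""
    base = len(connected_components(graph))
    found = []
    for a in graph:
        for b in graph[a]:
            if a < b:  # visit each undirected link once (orderable nodes)
                pruned = {n: [m for m in nbrs if {n, m} != {a, b}]
                          for n, nbrs in graph.items()}
                if len(connected_components(pruned)) > base:
                    found.append((a, b))
    return found


def shortest_path_counts(graph, source):
    """BFS returning distances and the number of shortest paths from source."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                count[neighbor] += count[node]
    return dist, count


def betweenness(graph, n):
    """Count shortest paths between other node pairs that pass through n."""
    dist_n, count_n = shortest_path_counts(graph, n)
    others = [v for v in graph if v != n]
    total = 0
    for i, s in enumerate(others):
        dist_s, count_s = shortest_path_counts(graph, s)
        for t in others[i + 1:]:
            reachable = n in dist_s and t in dist_s and t in dist_n
            if reachable and dist_s[n] + dist_n[t] == dist_s[t]:
                total += count_s[n] * count_n[t]
    return total


# Hypothetical graph: two triangles joined by the single link c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}

print(is_clique(graph, ["a", "b", "c"]))  # True
print(bridges(graph))                     # [('c', 'd')]
print(betweenness(graph, "c"))            # 6
```

In this toy graph every shortest path from a or b to d, e, or f must cross the link c-d, which is why that link is the only bridge and why c lies on six shortest paths.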


- Fall '17