Computer Science 211 Data Structures
Mount Holyoke College Fall 2009 Topic Notes: Graphs
When does a tree stop being a tree? When it has a cycle! Just as a list is really just a boring case of a tree (every node has just one child), a tree is really just a boring case of a graph (no cycles).

Definition and Terminology
A graph G is a collection of nodes or vertices, in a set V, joined by edges in a set E. Vertices have labels. Edges can also have labels (which often represent weights). The graph structure represents relationships (the edges) among the objects stored (the vertices). For a tree, we might think of the tree nodes as vertices, with edges labeled “parent” and “child” to represent nodes that have those relationships.

[Figure: example graph on vertices A through H with numeric edge labels]

• Two vertices are adjacent if there exists an edge between them. e.g., A is adjacent to B, G is adjacent to E, but A is not adjacent to C.
• A path is a sequence of adjacent vertices. e.g., ABCFB is a path.
• A simple path has no vertices repeated (except that the first and last may be the same). e.g., ABCE is a simple path.
• A simple path is a cycle if the first and last vertices in the path are the same. e.g., BCFB is a cycle.
• Directed graphs differ from undirected graphs in that each edge is given a direction.
• The degree of a vertex is the number of edges incident on that vertex. e.g., the degree of C is 3, the degree of D is 1, the degree of H is 0. For a directed graph, we have the more specific outdegree and indegree.
• Two vertices u and v are connected if a simple path exists between them.
• A subgraph S is a connected component iff there exists a path between every pair of vertices in S and no vertex outside S is connected to a vertex in S. e.g., {A,B,C,D,E,F,G} and {H} are the connected components of our example.
• A graph is acyclic if it contains no cycles.
• A graph is complete if every pair of vertices is connected by an edge.

A Sample Graph Problem
Many problems in computer science can be converted to graph problems. If you go on to take Algorithms (COMSC 312), you will spend a significant amount of time studying graph-related algorithms. We will just sample a few here.

Consider, for example, an application in which we need to plan a driving route from Williamstown to Boston. We might represent Williamstown and all other towns in Massachusetts as vertices, and roads as edges between the vertices. If we label each edge with the mileage between the towns it connects, the path planning problem becomes the problem of finding the shortest (weighted) path in the graph between Williamstown and Boston.

Here’s a subset of that data that we’ll use later:

[Figure: road graph connecting Williamstown, North Adams, Greenfield, Fitchburg, Lowell, Pittsfield, Lee, Springfield, Auburn, Boston, Plymouth, New Bedford, and Provincetown, with each edge labeled by its mileage]

The Graph Interface
As with many of our structures this semester, we will have an interface that defines the general behavior of graphs, independent of the structures we actually use to represent them in specific implementations.

See Structure Source: /home/jteresco/shared/cs211/src/structure5/Graph.java

Graph has two type parameters: V determines the type of the labels of the graph vertices, and E determines the type of the labels of the graph edges.

We have the usual methods like add, remove, get, contains, but what should these mean? We have both vertices and edges! Here, we use these to manipulate the vertices in the graph. Note that vertices are specified by their labels. Often we will use strings, but the labels may be of any type.

There is a corresponding set of methods that deals with edges, but these are named addEdge, removeEdge, etc. Note that an edge is added by specifying the labels of the vertices it connects and the label of the edge itself.

The getEdge method doesn’t return an edge label but rather a new structure we haven’t looked at yet called an Edge. We will see this one shortly.

There are also a number of methods that deal with vertices and edges being “visited”. Many graph algorithms need to know which vertices or edges they’ve already considered, so this has been designed right into the graph interface.

Finally, there are a number of methods that give us some information about the graph or about particular vertices, such as degree, neighbors, and iterators over vertices and edges.

Implementations of Graphs
First, we have classes to represent vertices and edges. These are quite simple:

See Structure Source: /home/jteresco/shared/cs211/src/structure5/Vertex.java

We notice that Vertex is not a public class. Code outside of the structure package cannot create a Vertex.

A Vertex is uniquely defined by its label (an object of type E).

Important note: the label used for our Vertex can be of any type, but is assumed to be immutable. If it is an instance of a class that can be modified (e.g., a Vector), we must not modify it after using it as a Vertex label.

We also keep the visited flag for use later in traversals and other algorithms.

See Structure Source: /home/jteresco/shared/cs211/src/structure5/Edge.java

Unlike Vertex, Edge is a public class. Some Graph methods return an Edge, so it must be public.

An Edge is defined by its two vertices, and may also have a label of its own. It also has the visited flag.

The Vertex and Edge classes may need to be extended as we implement specific types of Graphs.

A Graph is really just a mechanism to manage all of these edges and vertices. If there is a fixed number of edges from each node, then we can store a fixed number of edges with each node (like a binary tree). For general graphs, we typically use either

1. an adjacency matrix, or
2. adjacency lists.

As a running example, we will consider an undirected graph where the vertices represent the states in the northeastern U.S.: NY, VT, NH, ME, MA, CT, and RI. An edge exists between two states if they share a common border, and we assign edge weights to represent the length of their border.

We will represent this graph as both an adjacency matrix and an adjacency list.

Adjacency Matrix Representation
In an adjacency matrix, we have a two-dimensional array, indexed by the graph vertices. Entries in this array give information about the existence or nonexistence of edges.

We represent a missing edge with null and an existing edge with its label (often a positive number representing a weight).

Labels of vertices are stored in a dictionary, so we can look up the corresponding index for each vertex label.

Adjacency matrix representation of NE graph

      NY    VT    NH    ME    MA    CT    RI
NY    null  150   null  null  54    70    null
VT    150   null  172   null  36    null  null
NH    null  172   null  160   86    null  null
ME    null  null  160   null  null  null  null
MA    54    36    86    null  null  80    58
CT    70    null  null  null  80    null  42
RI    null  null  null  null  58    42    null

If the graph is undirected, then we could store only the lower (or upper) triangular part, since the matrix is symmetric.

Since much of the implementation is common to the directed and undirected matrix-based graphs, the structure package defines an abstract class GraphMatrix.

See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphMatrix.java

Two implementations, GraphMatrixDirected and GraphMatrixUndirected, extend it, adding the functionality that depends on the directedness of the graph.

See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphMatrixDirected.java
See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphMatrixUndirected.java

In the abstract class, we declare all of the instance variables needed to support both matrix-based implementations:

• data: a two-dimensional array of edges. Note that we need to store them as Object for the same reasons we saw in the implementation of Vector. In actuality, the items stored in this array will be of type Edge<V,E>.
• freeList: a list of integers which represent available vertex indices. More on this below.
• dict: a mapping from vertex labels to (integer) vertex indices that can be used to index into the data array.
• directed: a boolean flag to indicate the directedness of the graph.

We won’t worry too much yet about the Map that translates vertex labels to indices. It uses a hash table, a topic we’ll cover after graphs. For now, just realize it should be (and will be) an efficient tool to look up indices from vertex labels.

The free list indicates which of our vertex indices are available to be assigned to new vertices being added to the graph.

For efficiency of the matrix-based implementation, the maximum number of vertices is specified at construction time. This is an important restriction to be aware of with the matrix-based representation of graphs. It could be made to expand as needed like a Vector, but this implementation does not support that. We would simply run out of space for vertices and throw an exception.

The constructor, as we expect, initializes our instance variables to represent an empty graph. Since the constructor doesn’t need to care whether the edges are directed or not, it can be defined in the abstract class. However, it is declared protected since it will not be called by users; they will instead call the constructor of the directed or undirected subclass. Those constructors don’t do anything else, but they are necessary because we can’t construct an instance of an abstract class. Note that they pass the appropriate boolean value to the abstract class constructor to indicate directedness.

Note that by constructing a GraphMatrix capable of storing up to size vertices, we allocate O(size^2) space, even for an empty graph!

Adding a vertex can be done entirely in the abstract class, as this is the same for both directed and undirected graphs. If the vertex is not already in the graph, we look up a free index and associate it in our map with the label of the vertex.
However, we store more than just the index for the label: we have a GraphMatrix-specific extension of the Vertex class.

See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphMatrixVertex.java

In addition to the label and the visited flag provided by Vertex, GraphMatrixVertex stores the index to allow quick access from a Vertex to its row/column index in the adjacency matrix.

In the vertex add method, we are just making sure that we’re not adding a duplicate vertex, getting an available row, creating a new GraphMatrixVertex, and remembering it in our mapping between labels and vertices. The row/column number is remembered as part of the vertex.

The cost of this depends on the cost of the methods associated with the label/vertex mapping. Efficient implementations of such mappings will be the subject of the last major topic in the course. At worst, it should involve linear time searches, and we’ll see it can be much better.

Adding an edge, however, requires knowledge of the directedness, so this is an abstract method in the abstract class, and is provided by the subclasses. The implementations are similar:

• For an undirected graph, we find the indices of the edge’s endpoints and create an edge to be stored in two matrix slots (since we need to represent it in both directions).
• For a directed graph, the method is the same, except we only add the edge in the specified direction, leaving the slot corresponding to the other direction alone.

Removing a vertex can be done in the abstract class. We remove it from the lookup table, clear any edges that might be using that index, and add the now-available position to the free list. Note that this means edges are silently removed if either of their vertices is removed.

Removing an edge needs to be done in the subclasses, again so we can remove the edge from just one matrix slot in the directed case, or from two matrix slots in the undirected case.
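The bookkeeping described above (a label-to-index map, a free list of indices, and a fixed-size matrix) can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the structure5 source: the class name MatrixGraphSketch is made up, vertex labels are Strings, the matrix stores Integer weights rather than Edge objects, java.util collections replace the structure package's own classes, and a single class with a directed flag stands in for the abstract class and its two subclasses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch; this is not the structure5 GraphMatrix source.
class MatrixGraphSketch {
    private final Integer[][] data;      // data[r][c] holds the edge weight, or null for no edge
    private final Deque<Integer> freeList = new ArrayDeque<>();  // indices not yet assigned
    private final Map<String, Integer> dict = new HashMap<>();   // vertex label -> index
    private final boolean directed;

    MatrixGraphSketch(int size, boolean directed) {
        data = new Integer[size][size];  // O(size^2) space, even for an empty graph
        for (int i = 0; i < size; i++) {
            freeList.addLast(i);
        }
        this.directed = directed;
    }

    // Adds a vertex unless one with this label already exists.
    void add(String label) {
        if (dict.containsKey(label)) {
            return;
        }
        if (freeList.isEmpty()) {
            throw new IllegalStateException("no free vertex index left");
        }
        dict.put(label, freeList.removeFirst());
    }

    // Stores the edge in one slot (directed) or two slots (undirected).
    // Assumes both endpoints have already been added.
    void addEdge(String from, String to, int weight) {
        int r = dict.get(from);
        int c = dict.get(to);
        data[r][c] = weight;
        if (!directed) {
            data[c][r] = weight;
        }
    }

    // Returns the edge weight, or null if there is no such edge.
    // Assumes both endpoints are in the graph.
    Integer getEdge(String from, String to) {
        int r = dict.get(from);
        int c = dict.get(to);
        return data[r][c];
    }

    // Removes a vertex, silently clearing every edge that used its index.
    void remove(String label) {
        Integer found = dict.remove(label);
        if (found == null) {
            return;
        }
        int index = found;
        for (int i = 0; i < data.length; i++) {
            data[index][i] = null;
            data[i][index] = null;
        }
        freeList.addFirst(index);  // the index can be reused by a later add
    }

    public static void main(String[] args) {
        MatrixGraphSketch g = new MatrixGraphSketch(7, false);
        g.add("NY");
        g.add("VT");
        g.add("MA");
        g.addEdge("NY", "VT", 150);
        g.addEdge("VT", "MA", 36);
        System.out.println(g.getEdge("VT", "NY"));  // prints 150: the symmetric slot was filled
        g.remove("VT");
        System.out.println(g.getEdge("NY", "MA"));  // prints null: no NY-MA edge was added
    }
}
```

Keeping directedness as a flag makes the sketch short; splitting addEdge into directed and undirected subclasses, as the notes describe, avoids the runtime test at the cost of more classes.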
Finding a vertex or an edge, and checking containment of vertices or edges, are also simple and done in the abstract class.

Mutator and accessor methods to set and retrieve the visited attributes of the vertices and edges are also straightforward. visit and isVisited apply to vertices, visitEdge and isVisitedEdge apply to edges, and reset clears the visited flags for all vertices and edges.

We can easily get the number of vertices (returned by size()) by querying the number of vertices in the mapping. The number of edges can’t be determined in the abstract class, so it is an abstract method and is defined appropriately in the subclasses.

We can compute the degree of a vertex by looking across its row and counting the non-null entries. If the graph is directed, this will be either the indegree or the outdegree, depending on how we orient the matrix; if we want the other, it would have to be provided by a separate method.

There is also a method neighbors to get an iterator over all vertices adjacent to a given vertex.

Adjacency List Representation
An adjacency list is composed of a list of vertices. Associated with each vertex is a linked list of the edges adjacent to that vertex.

Vertices    Edges
NY          VT/150  MA/54   CT/70
VT          NY/150  NH/172  MA/36
NH          VT/172  ME/160  MA/86
ME          NH/160
MA          NY/54   VT/36   NH/86   CT/80   RI/58
CT          NY/70   MA/80   RI/42
RI          MA/58   CT/42

Once again, the implementation is broken into an abstract class that provides the data and functionality that are common to both the directed and undirected cases, and concrete classes that implement the specifics for each directedness.

See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphList.java
See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphListDirected.java
See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphListUndirected.java

In GraphList, we see that the graph needs to contain a collection of vertices. This collection could be a vector or a linked list, but we’ll use something more clever. Again, we will consider an efficient way to do this after our discussion of graphs.

Each vertex holds a collection of the edges that are adjacent to it. Similarly, the list of edges could be implemented in many ways, including all kinds of lists or binary search trees. We’ll use singly linked lists.

See Structure Source: /home/jteresco/shared/cs211/src/structure5/GraphListVertex.java

There’s a lot more going on here than there was in GraphMatrixVertex.

Addition of an edge to a vertex’s singly linked list of edges will always be done at the beginning of the list (constant time, once we find the vertex).

Edges connected to a given vertex could be held in order by key, but we do not do this.

For directed graphs, we only need to store an edge in one vertex’s list. For undirected graphs, each edge is inserted into two lists.

Back to the GraphList abstract class. Again, we implement those things that are independent of directedness.

The constructor doesn’t need to do as much, and doesn’t allocate much space (O(1), though we haven’t yet seen the details of the HashTable implementation of a Map).

Adding vertices is just the addition of a new entry in the mapping.
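The list-based storage described so far can be sketched as follows. This is a minimal illustration under stated assumptions, not the structure5 source: the names ListGraphSketch and Entry are made up, labels are Strings with int weights, java.util.HashMap stands in for the structure package's map, and java.util.LinkedList (which is doubly linked) stands in for the singly linked list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch; this is not the structure5 GraphList source.
class ListGraphSketch {
    // One entry in a vertex's edge list: the far endpoint and the edge weight.
    static final class Entry {
        final String neighbor;
        final int weight;

        Entry(String neighbor, int weight) {
            this.neighbor = neighbor;
            this.weight = weight;
        }
    }

    // Maps each vertex label to the list of edges leaving that vertex.
    private final Map<String, LinkedList<Entry>> edgeLists = new HashMap<>();
    private final boolean directed;

    ListGraphSketch(boolean directed) {
        this.directed = directed;  // only an empty map exists so far; no per-vertex storage
    }

    // Adding a vertex is just a new map entry with an empty edge list.
    void add(String label) {
        edgeLists.putIfAbsent(label, new LinkedList<>());
    }

    // Prepends the edge to one list (directed) or to both lists (undirected).
    // Assumes both endpoints were added and the edge is not already present.
    void addEdge(String from, String to, int weight) {
        edgeLists.get(from).addFirst(new Entry(to, weight));
        if (!directed) {
            edgeLists.get(to).addFirst(new Entry(from, weight));
        }
    }

    // Neighbor labels in edge-list order (most recently added edge first).
    List<String> neighbors(String label) {
        List<String> result = new ArrayList<>();
        for (Entry e : edgeLists.get(label)) {
            result.add(e.neighbor);
        }
        return result;
    }

    public static void main(String[] args) {
        ListGraphSketch g = new ListGraphSketch(false);
        g.add("NY");
        g.add("VT");
        g.add("MA");
        g.addEdge("NY", "VT", 150);
        g.addEdge("NY", "MA", 54);
        System.out.println(g.neighbors("NY"));  // prints [MA, VT]
        System.out.println(g.neighbors("VT"));  // prints [NY]
    }
}
```

Because each new edge is placed at the head of the list, the most recently added neighbor comes first, which is why NY's list reads MA before VT in the usage above.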
Removing a vertex needs to be done in the subclasses, since we must remove the vertex from all edge lists in which it appears (see below).

Many operations on edges depend on the directedness. Some operations that we could implement in the abstract class for the adjacency matrix representation need to be implemented in the subclasses. Some others have been moved into the vertex implementation.

First, we’ll look more at GraphListUndirected.

Adding an edge is relatively straightforward: just add it to the adjacency lists of both vertices if it is not already there.

Notice that deleting a vertex is expensive, since we must also delete each of its edges from the edge list of the neighboring vertex. Fortunately, we don’t have to check for the edge in all vertex edge lists, only in those of the neighbors of the vertex being removed.

Deleting an edge requires a search of the appropriate vertex edge list(s).

What about space usage? The adjacency matrix always uses space proportional to |V|^2, while adjacency lists use space proportional to |V| + |E|. The adjacency matrix representation is therefore more efficient for relatively dense graphs, and the adjacency list representation is more efficient (space-wise) for sparse graphs.

Graph Applications
Example: Reachability
As a simple example of something we can do with a graph, we determine the subset of the vertices of a graph G = (V, E) which are reachable from a given vertex s by traversing existing edges.

A possible application of this is to answer the question “where can we fly to from ALB?”. Given a directed graph where vertices represent airports and edges connect cities which have a regularly scheduled flight from one to the next, we compute which other airports you can fly to from the starting airport. To make it a little more realistic, perhaps we restrict to flights on a specific airline...
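One common way to compute the reachable set is a breadth-first traversal that records each vertex the first time it is seen. The sketch below is an illustration under assumptions rather than the approach of these notes (which may instead rely on the visited flags built into the Graph interface): the class ReachabilitySketch and method reachableFrom are made-up names, the graph is a plain map from each airport to its direct destinations, and the routes other than the ALB starting point are invented for the example, not real schedules.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of reachability on a directed graph stored as adjacency lists.
class ReachabilitySketch {
    // Returns every vertex reachable from start by following directed edges.
    // By convention here, start itself is included in the result.
    static Set<String> reachableFrom(Map<String, List<String>> flights, String start) {
        Set<String> visited = new LinkedHashSet<>();  // doubles as the "visited" marks
        Deque<String> toVisit = new ArrayDeque<>();
        visited.add(start);
        toVisit.addLast(start);
        while (!toVisit.isEmpty()) {
            String current = toVisit.removeFirst();
            List<String> destinations =
                flights.getOrDefault(current, Collections.<String>emptyList());
            for (String next : destinations) {
                // add returns false if next was already visited, so each
                // vertex enters the queue at most once
                if (visited.add(next)) {
                    toVisit.addLast(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up routes for illustration only.
        Map<String, List<String>> flights = new HashMap<>();
        flights.put("ALB", Arrays.asList("ORD", "BWI"));
        flights.put("ORD", Arrays.asList("ALB", "DEN"));
        flights.put("SFO", Arrays.asList("DEN"));
        System.out.println(reachableFrom(flights, "ALB"));  // prints [ALB, ORD, BWI, DEN]
    }
}
```

Note that SFO does not appear in the result: it has a flight to DEN, but because the edges are directed, nothing reachable from ALB leads to it.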