Now that we’re done with searching and sorting on basic data structures, it’s time to dive into the most applicable data structure in all of computers: graphs.
This article is part of the Data Structures and Algorithms Series following the course The Analysis of Algorithms. If you missed the previous article, check that out first as well as a copy of The Algorithm Design Manual.
Graphs are the grand unifier of computers, both literally and figuratively. Graphs are the most interesting problems in computer science because they reflect the real-world: networks, communication, and organization all operate in the abstract as a graph. In fact, your computer is reading this article as part of the grand graph known as the internet. Graphs are the analogy for connectivity, so when we study graphs, we are studying how the people of the world interact with each other.
As we mentioned in a previous article about designing algorithms, when we describe graphing problems, we often use words like network, circuit, web, and relationship. When describing problems with properties of those words, the solution is often an algorithm of navigating a graph. More important than knowing solutions to graph problems is recognizing the kinds of graphs, their data structures, and the kinds of problems they run into.
Kinds of graphs
There are a lot of different kinds of graphs. A graph is just a collection of nodes (or vertices) and the edges that connect them. Mathematics succinctly defines a graph as G = (V,E)
, so just like we use Big O notation to describe runtimes, so too will we use this graphing notation. But what kinds of graphs will we be dealing with?
- By connectivity. Connected graphs have all vertices connected to each other. Disconnected graphs have some vertices that have no edges connecting them. Bittorrent networks are connected graphs since each vertex offers a portion of the files they share. Bittorrent networks can also be disconnected graphs if someone has a file that isn’t being shared (or desired) by any other computer on the network.
- By direction. Directed graphs will have edges that force you to move from one vertex to the other in a certain direction. Undirected graphs let you freely travel on an edge in either direction. Two-way streets are undirected. Arteries (vessels bringing blood away from the heart) and veins (vessels bringing blood back to the heart) are directed.
- By weight. Weighted graphs put a price on the edges you travel between vertices. Unweighted graphs have every edge cost the same. It costs more time (and money) to travel from Boston to Seattle than it does to travel from San Francisco to Los Angeles. It never costs anything to walk from your home to the post office and back, and that distance is always the same.
- By simplicity. Simple graphs connect to other nodes just once. Non-simple graphs can connect to other nodes with multiple edges, or even themselves. Tunnels, like subway systems, are simple graphs because each tunnel edge is meant to get from one place to the other. Roads, on the other hand, are non-simple because there are so many ways to get from one building to the next.
- By density. Dense graphs have a lot of edges connecting vertices to each other. Sparse graphs have only a few edges connecting vertices. Unlike the other examples, this distinction is kind of vague, but if you can describe the connections in a quadratic order (i.e. every node connected to each other) it’s dense. If you can describe the connections in a linear order (i.e. every node is one other node or its neighbors) it’s sparse. Most problems deal with sparse graphs. Roads are sparse graphs since each intersection is usually only the crossing of two streets. Broadcast networks are dense since a broadcast has to reach all of its subscribers.
- By cycles. Cyclic graphs will repeat back to one origin vertex at some point. Conversely, acyclic graphs do not. Tree data strutures can be described as connected, undirected, acyclic graphs. Adding direction to those graphs (also know as DAGs for short) gives them a topology (i.e. relation or arrangement) between each other. Scheduling problems where A must come before B are represented as DAGs and can be reasoned about with topological sorting algorithms, which we’ll learn about a bit later.
- By topology. Topological graphs have vertices that are related by how their edges are portrayed. For example, if points on a map are shaded and arranged by their distance and height like on a real topography map, that creates a topological graph. Otherwise, embedded graphs represent projections of how real graphs are connected. Any drawing of a graph is an example of an embedded graph.
- By implication. Implicit graphs have some murky details. They don’t fully describe all of the connections in a graph. In fact the traversal of such a graph is how we may uncover a graph to make it explicit. Explicit graphs on the other hand clearly outline all of the vertices and all of the edges from the beginning. If you’ve ever played Super Metroid, the map of planet Zebes starts as an implicit graph. The connection between zones is not known until you explore them. Only when you’ve played the whole game can you create an explicit graph of the game map.
- By labeling. Labeled graphs give names or keys to each vertex, while unlabeled graphs do not make that distinction. The fact that we can name a path between Boston and Seattle means that the graph of roads between cities (the cities being the label on each vertex) is a labeled graph.
Data structures for graphs
Now that we know the kinds of graphs, how do we represent them in the memory of a computer? A few such data structures arise:
- Adjacency matrix. If there are
n
vertices in a graph, we can create ann
xn
matrix (i.e. an array of arrays) which represents how each vertex on a row is connected to each vertex on a column. A cell with a value of1
means the row vertex is connected to the column vertex. Otherwise, the cell has a value of0
. - Adjacency list. A more space-efficient way to represent a sparse graph is with a linked list, where the pointers represent the edges that connect neighboring vertices to each other.
So we have two major data structures for storing the data of graph problems. How do you know when to use each?
When adjacency matrices are a better fit
- Testing if an edge is in a graph. We said indexing from arrays is faster than lists. This is just a multi-dimensional application of this concept.
- When you have really big graphs. When space isn’t a concern, indexing directly into a graph’s relationships is going to be faster than having to traverse a list to find vertices deep within a graph.
- When you’re focused on writing connections (e.g. insertions and deletions of edges) to the graph. Since we’ve represented every connection, it’s as simple as toggling a
0
to a1
which occurs in constant time.
When adjacency lists are a better fit
- Finding the degree of a vertex. The degree just means the number of connections to another vertex. Since a linked list inherently stores the pointers to other nodes, it’s trivial to find the degree by simply summing the total number of pointers at a given vertex, which takes a constant-time number of operations. A matrix, however, requires summing the number of
1
s in a given row for a vertex of relationships, which involves traversing that array which takes linear time instead of constant time. - When you have small graphs. Conversely to a matrix, with small graphs the relationships are simple to map as pointers in a list, which are going to be very fast to access, and much more space efficient.
- When you’re focused on reads and traveling around (i.e. traversing) a graph. Since the data structure of a linked list bakes in relationships between nodes, this is naturally a good data structure for navigating from vertex to vertex. Matrices, on the other hand, verify relationships but offer no hint of where to go next, so it’s a chaotic mess that will take at most
O(n^2)
calls on average while a list will do that in linear time.
Since most problems focus on solving these kinds of issues, this is usually the right data structure for solving most graphing problems.
Let’s see how these might look in JavaScript:
class Vertex {
constructor(adjacencyInfo = null, nextVertex = null, weight = null) {
this.adjacencyInfo = adjacencyInfo;
this.nextVertex = nextVertex;
this.weight = weight;
}
}
class Graph {
const MAX_VERTICES = 1000; // can be any number
constructor(hasDirection) {
this.connections = new Array(MAX_VERTICES).fill(null);
this.degrees = new Array(MAX_VERTICES).fill(0);
this.edges = 0;
this.isDirected = hasDirection;
this.vertices = 0;
}
insert(vertexStart, vertexEnd) {
let vertex = new Vertex(vertexEnd, this.connections[vertexStart]);
this.connections[vertexStart] = vertex;
this.degrees[vertexStart] += 1;
if (this.isDirected) {
this.edges += 1;
} else {
this.insert(vertexEnd, vertexStart);
}
}
print() {
for (let i = 0; i < this.vertices; i++) {
console.log(`${i}: `);
let connection = this.connections[i];
while (connection) {
console.log(` ${connection.adjacencyInfo}`);
connection = connection.next;
}
console.log('\n'); // new line for clarity
}
}
}
The next Daily Problem
To get us thinking a bit about graphs, our next Daily Problem is Problem 5-8 from The Algorithm Design Manual:
Present correct and efficient algorithms to convert an undirected graph
G
between the following graph data structures. You must give the time complexity of each algorithm, assumingn
vertices andm
edges.(a) Convert from an adjacency matrix to adjacency lists.
(b) Convert from an adjacency list to an incidence matrix. An incidence matrix
M
has a row for each vertex and a column for each edge, such thatM[i,j] = 1
if vertexi
is part of edgej
,otherwise M[i,j] = 0
.(c) Convert from an incidence matrix to adjacency lists.
Think you have it? Have questions? Send it over on Twitter and I’d be happy to help, and good luck!
Get the FREE UI crash course
Sign up for our newsletter and receive a free UI crash course to help you build beautiful applications without needing a design background. Just enter your email below and you'll get a download link instantly.