In the last article we introduced the next section in our series: graphs. Graphs are super important because essentially every major problem involved on the web can be represented as a graph. Now that we know what kinds of graphs there are and what data structures we can use to represent them, it’s time to start traversing them so we can discover every vertex and edge in a graph.
This article is part of the Data Structures and Algorithms Series following the course The Analysis of Algorithms. If you missed the previous article, check that out first as well as a copy of The Algorithm Design Manual.
Answers to the previous Daily Problem
Present correct and efficient algorithms to convert an undirected graph
G
between the following graph data structures. You must give the time complexity of each algorithm, assumingn
vertices andm
edges.
Basically we’re being asked to take a graph and morph it between the data structures we learned about in the last article.
(a) Convert from an adjacency matrix to adjacency lists.
In this case we’re going from the space-heavy to the space-efficient, and mapping the relationships. In this case we iterate through every row in our 2D array, which represents each node (read: vertex) in our list. Then all we have to do is, for each node, add the next
nodes for each pointer that is activated (read: the 1
s in the matrix). That’s an O(n^2)
algorithm since we’re doing a constant amount of work for each assignment and we have to compare the n
columns against the n
rows.
(b) Convert from an adjacency list to an incidence matrix. An incidence matrix
M
has a row for each vertex and a column for each edge, such thatM[i,j] = 1
if vertexi
is part of edgej
,otherwise M[i,j] = 0
.
This one kind of gives it away by saying for each
twice, since we’re walking down the vertices and comparing against the other vertex on that edge giving us an O(n+m)
algorithm.
(c) Convert from an incidence matrix to adjacency lists.
This one is actually interesting. We know we need to iterate across all edges, so that’s O(m)
immediately. The difference is that for every 1
we find in an incidence matrix, we need to find the 2 n
vertices that represent the two ends of that edge, therefore we need to make 2 operations for each vertex. So this conversion is actually O(2nm)
, which reduces down to O(nm)
because as we learned back with our article on Big O notation, the constants on any variable are dropped.
Graph traversal definitions
Before we dive into the various traversal algorithms, we need to introduce how we track our traversal. This is the same across all algorithms, but we want to define these states since we’ll be using them quite a bit:
- Unvisited: You haven’t visited this vertex in the graph yet.
- Visited: You have visited this vertex.
- Processed: You not only visited this vertex, but you’ve travelled down all of its edges, too.
Breadth-First Search
So now that we’ve described some definitions we’ll use, let’s look at our first graph traversal algorithm: breadth-first search (BFS for short). Later we’ll look at depth-first search, so remove the confusion now, I want you to think on how you describe something by its breadth versus its depth. How do you physically describe it in space?
Breadth you probably think in a left-right relationship, like extending your arms out to show something broad and wide. The data structure that makes us think of shuffling left-to-right? A queue.
Depth you probably think in a up-down relationship, like the depths of the sea. The data structure that makes us think of diving down and coming back up? A stack.
So when remembering breadth-first search, just remember that we’re going to visit nodes in a left-to-right fashion. So a tree that looks like this:
8
/ \
6 10
/ \ \
4 5 12
Will read the tree with BFS in this order: [8, 6, 10, 4, 5, 12]
. The generic algorithm for all graphs (not just trees) using BFS looks like this:
class Graph {
// ... see last article for our implementation
const PROCESSED = new Array(MAX_VERTICES);
const VISITED = new Array(MAX_VERTICES);
const PARENTS = new Array(MAX_VERTICES);
bfs(startVertex) {
let visitQueue = new Queue();
let currentVertex, nextVertex, tempVertex;
visitQueue.enqueue(startVertex);
visited[start] = true;
while (visitQueue.length > 0) {
currentVertex = visitQueue.dequeue();
console.log("PRE-PROCESSED!");
PROCESSED[currentVertex] = true;
tempVertex = this.connections[currentVertex];
while (tempVertex) {
nextVertex = tempVertex.adjacencyInfo;
if (!PROCESSED[nextVertex] || this.isDirected) {
console.log(`PROCESSED EDGE ${currentVertex}=>${nextVertex}`);
}
if (!VISITED[nextVertex]) {
visitQueue.enqueue(nextVertex);
VISITED[nextVertex] = true;
PARENTS[nextVertex] = currentVertex;
}
tempVertex = tempVertex.nextVertex;
}
console.log("POST-PROCESSED");
}
}
}
The runtime here is O(n + m)
. Why is that? When using an adjacency list, we’re going to explore all of the edges for each vertex. Every vertex will get processed once in the queue. Every edge is going to get walked on both coming and going. That makes the total number of operations n + 2m
, so when we reduce that down for worst-case runtime, that leaves us with O(n + m)
.
Why use breadth-first search?
So now we know the what and the how of BFS, but why should we care to use it? Here’s a couple reasons:
- Visualize clustered relationships. A cluster is just a subset of the graph that is connected. Remember, not all vertices need to be connected to each other. Since BFS doesn’t care for an absolute root, it will pick any vertex, find all of its connections, and then keep discovering all of the explicitly-defined vertices that haven’t already been discovered via connection. BFS is great for mapping people and neighbors/friends across the world. There’s going to be a cluster in Australia. There’s going to be a cluster in Japan. Each of these islands creates a physical cluster of relationships. They’re all part of the total graph of people in the world, but they aren’t all necessarily connected. The best part is, no matter how much stuff we do to find connected components, it only ever adds constant time to find clusters, so a connected clustering algorithm still runs in
O(n + m)
. - The Two-Color problem. This is one of those specific graph theory math problems that’s great for BFS. Each vertex gets a label (in this case, a color). The edges of said vertex cannot lead to another vertex of the same color. So
R->G->R
passes, butR->R->B
fails. The valid graph is known as a bipartite graph; bi- meaning two and partite meaning to divide. There are lots of real-world examples in biology and medicine. Have you ever played Sudoku? A Sudoku solver is just an application of graph coloring! In fact, any sort of tree is a bipartite graph because the whole purpose of the tree is to divide the nodes involved.
The next Daily Problem
So there’s a few clear use cases for BFS. We’ll find quite a few more applications for DFS in the next article, but for now, let’s think on what we’ve learned from BFS with a Daily Problem:
Prove that in a breadth-first search on a undirected graph
G
, every edge is either a tree edge or a cross edge, wherex
is neither an ancestor nor descendant ofy
, in cross edge(x,y)
.
More problems to practice on
Now that homework 3 is out, here are a few problems that are relevant to the sorting we’ve discussed so far:
- Problem 5-4.
- Problem 5-7.
That’s all for today, stay tuned for depth-first search in the next article!
Get the FREE UI crash course
Sign up for our newsletter and receive a free UI crash course to help you build beautiful applications without needing a design background. Just enter your email below and you'll get a download link instantly.