Stateful Graph Algorithms in Haskell
- Motivation
- Introduction
- How to Run
- Implementation Details
- Code Reference
- Testing
- Referenced Paper
- Appendix
Motivation
In my Programming Languages course as part of my graduate program (CS 421 at UIUC by Professor Beckman in Summer ‘23), I was got extensive experience using Haskell, a functional programming language. This type of language is much different than object-oriented languages like Python and C++, of which I was much more familiar.
There’s a lot more I could and should say about differentiating Haskell from these object-oriented languages, but here are a few highlights:
- the languages is built entirely around functions, not variables or classes (per se).
- there are NO
for
orwhile
loops; only recursion. - a properly defined function reads surprisingly like mathematics, which is like music to my ears after majoring in math as an undergrad.
Given these big differences with Haskell, an important aspect of the language revolves around the idea of managing state. As a final project for the course, I decided to investigate this concept further with respect to graph algorithms, and what follows here is a slightly modified version of my final project report.
Introduction
I have a special interest in graph algorithms, and I was impressed with the elegance and simplicity of binary search tree implementation using the Haskell programming language as demonstrated in class by Professor Beckman (see Appendix). While trees are acyclic, graphs in general can contain cycles. In such cases, as pointed out in the King & Launchbury paper linked below, while performing a depth-first search of a graph there is a need to maintain state in the form of the knowing which nodes have already been visited. Conveniently, as the authors also describe, we can “use monads to provide updatable state,” and we can apply this idea to maintaining a set of visited nodes.
Nevertheless, I have found the logic of the Monad
type class to be difficult to grasp but its utility is undeniable. With this in mind, my goal for the final project was to combine these two concepts of graph algorithms and managing state, with the hopes of furthering my understanding of both.
Therefore, I implemented the following in Haskell, as per my project proposal:
- A
Graph
data type that allows for cycles (i.e., not just for trees). - A
traversal
function that returns an in-order traversal of a graph via depth-first search. - A
shortestPathLens
function that returns the number of hops in the shortest path between two nodes of a graph via breadth-first search. - The above two functions make use of monads to maintain state.
Furthermore, I implemented several additional features that seemed warranted:
- For the purpose of comparing with the monadic implementations of the two functions above, I also implemented them by simply passing state variables as arguments.
- I created a small Python module for the purposes of testing my Haskell implementations, relying heavily on the
networkx
library. More details in the Testing section below. - I created a REPL of my Haskell application to allow the user to more easily test and play with the algorithms for whatever graphs the user might please, either by reading a pre-built graph from a text file or by manually entering graph edges to create a custom graph.
In the end, I was surprised to learn that manually passing state variables resulted in more concise code compared to the monadic implementations. To be sure, using a state monad allowed for functions with fewer parameters, but repeatedly accessing and modifying the state resulted in more lines.
How to Run
- download the code (e.g.,
git clone https://github.com/dougissi/stateful-graph-algos-haskell.git
) - navigate to the project directory (e.g.,
cd stateful-graph-algos-haskell
) - run all unit tests via
stack test
- enter REPL via
stack run
Implementation Details
Graph
data type
To test my understanding, I developed my own Graph
data type as opposed to using the Data.Graph module or copying the implementation found in the referenced paper. While my implementation may lack some of the robustness of these other implementations, it makes up for it in is clarity and simplicity.
Nodes are represented as integers. A graph is simply a mapping between a node and its neighbors.
import Data.Map as M
type Node = Int
type Neighbors = [Node]
type Graph = M.Map Node Neighbors
Building a Graph
For my purposes, I defaulted to creating undirected graphs.
A graph can be built by passing in its list of edges. For example, consider the following graph of a triangle.
It can be built from the REPL by simply entering the information on the three edges:
> graph (1,2) (1,3) (2,3)
Graph: 1->[2,3] 2->[1,3] 3->[1,2]
The order of nodes within an edge is irrelevant. Entering (1,2)
as an edge is equivalent to entering (2,1)
.
The order of edges matters only insofar as the neighbors of a given node are added in order of appearance. For example, if I enter graph (1,3) (1,2)
, then node 1
’s neighbors will be ordered [3,2]
(likewise the neighbors of nodes 2
and 3
with both be [1]
).
You can replace the currently loaded graph by simply entering a new graph.
The underlying implementation of building a graph can be found here.
In-Order Traversal
To maintain the state of an in-order traversal of a graph g
starting at src
, we ought to keep track of two items:
- which nodes have been visited (
v
in code below) - the current state of the traversal (
t
in code below)
type VisitedNodes = S.Set Node
type Traversal = [Node]
type TraversalState = (VisitedNodes, Traversal)
traversalM :: Graph -> Node -> State TraversalState Traversal
traversalM g = dfs
where
dfs root = do
(v, t) <- get
case M.lookup root g of
Nothing -> return t
Just neighbors -> do
let v' = S.insert root v -- mark root as visited
t' = t ++ [root] -- add root to traversal
put (v', t')
loop neighbors -- do dfs on nonvisited neighbors
loop [] = do (_, t) <- get -- no remaining neighbors
return t
loop (i:is) = do (v, _) <- get
if i `S.notMember` v
then do _ <- dfs i -- continue traversal
loop is
else loop is -- try next neighbor
For the purpose of comparison, here is a non-monadic implementation
type Traversal = [Node]
traversal :: Graph -> Node -> Traversal
traversal g src =
let (_, t) = dfs src S.empty []
in t
where
dfs root v t =
case M.lookup root g of
Nothing -> (v, t)
Just nbrs ->
let v' = S.insert root v -- mark root as visited
t' = t ++ [root] -- add root to traversal
in loop nbrs v' t' -- do dfs on nonvisited neighbors
loop [] v t = (v, t) -- no neighbors remaining
loop (i:is) v t =
if i `S.notMember` v
then let (v', t') = dfs i v t -- continue traversal
in loop is v' t'
else loop is v t -- try next neighbor
From the REPL, once the user has built a graph, a traversal from a given node can be obtained.
> graph (1,2) (1,3) (2,3)
Graph: 1->[2,3] 2->[1,3] 3->[1,2]
> traversal 1
[1,2,3]
Shortest Path Lengths
Central to being able to calculate shortest path lengths between nodes is to be able to perform a breadth-first search. Now, in order to maintain state of a graph g
when performing a breadth-first search between start node s
and end node e
, we ought to keep track of
- which nodes have been visited (
v
in code below) - a queue of which nodes to check next, each linked with its depth from starting node (
q
in code below)
type VisitedNodes = S.Set Node
type Depth = Int
type Queue = [(Node, Depth)]
type StateBFS = (VisitedNodes, Queue)
bfsM :: Graph -> Node -> Node -> State StateBFS Int
bfsM g s e = do
case M.lookup s g of
Nothing -> return (-1) -- start not in graph
Just _ ->
case M.lookup e g of
Nothing -> return (-1) -- end not in graph
_ -> do -- start, end both in graph
(v, q) <- get
put (v, q ++ [(s, 0)]) -- add start to queue
aux -- process queue
where
aux = do
(v, q) <- get
case q of
[] -> return (-1) -- queue empty; fail
((i, d): q') -> do -- "pop" next node
if i == e -- win; return depth
then return d
else if i `S.member` v -- already visited; skip
then do put (v, q')
aux
else do
case M.lookup i g of -- get node's neighbors
Nothing -> do put (v, q') -- none; skip
aux
Just nbrs -> do -- add neighbors to queue
let q'' = q' ++ [(x, d + 1) | x <- nbrs]
v' = S.insert i v
put (v', q'')
aux
For the purpose of comparison, here is a non-monadic implementation
type Depth = Int
bfs :: Graph -> Node -> Node -> Depth
bfs g s e =
case M.lookup s g of
Nothing -> -1 -- start not in graph
Just _ ->
case M.lookup e g of
Nothing -> -1 -- end not in graph
_ -> -- start, end both in graph
let v = S.empty -- initialize empty visited set
q = [(s, 0)] -- initialize queue with start
in aux v q -- process queue
where
aux _ [] = -1 -- queue empty; fail
aux v ((i, d): q') -- "pop" next node
| i == e = d -- win; return depth
| S.member i v = aux v q' -- already visited; skip
| otherwise =
case M.lookup i g of -- get node's neighbors
Nothing -> aux v q' -- none; skip
Just nbrs -> -- add neighbors to queue
let q'' = q' ++ [(x, d + 1) | x <- nbrs]
v' = S.insert i v
in aux v' q''
There is little work to be done once we have a working breadth-first search algorithm to extend it to find the shortest paths between all nodes (if a path exists, of course). See here for the implementation.
From the REPL, once the user has built a graph, the list of shortest paths can be obtained.
> graph (1,2) (1,3) (2,3)
Graph: 1->[2,3] 2->[1,3] 3->[1,2]
> shortestPathLens
[(1,[(2,1),(3,1)]),(2,[(1,1),(3,1)]),(3,[(1,1),(2,1)])]
Code Reference
All the code for this project can be found at github.com/dougissi/stateful-graph-algos-haskell.
Here is a reference for the various components.
- stateful-graph-algos-haskell/
- app/
- Main.hs — REPL
- graphs/
- python/
- graph_algos_networkx.py — Python submodule using
networkx
to check test cases and visualize graphs
- graph_algos_networkx.py — Python submodule using
- src/
- GraphAlgos.hs — non-monadic implementations of graph algos
- GraphAlgosMonad.hs — monadic implementations of graph algos
- GraphsCommon.hs — graph construction and common functions
- Parse.hs — parsing graph edges entered into REPL
- test/
- Spec.hs — HUnit unit tests
- README.md
- app/
Testing
Overview
A series of unit tests were created using the HUnit library to confirm the accuracy many aspects of the codebase, particularly that of the traversal and shortest path length algorithms. The complete test code can be found at ./test/Spec.hs.
Even for simple graphs, it can be difficult for the human eye to determine an in-order traversal or the shortest path between two nodes. So rather than rely on my own ability, I built a Python submodule to generate its own outputs using the networkx
library and relying on its built-in depth-first search and shortest paths algorithms. These Python outputs were copied into the HUnit test cases for my Haskell implementations. The python submodule can be found at ./python/graph_algorithms_networkx.py
To run all tests, from the command line navigate to the root directory of the project folder and enter stack test
Triangle Tests
Logic: shortest paths between all nodes should be of length 1
Linear Tests
Logic:
- traversal from node 1 should do each in order
- trivial shortest paths
Triangle and Rectangle Tests
Logic:
- Traversal from node 1 should not include any nodes in the square
- Traversal from node 4 should not include any nodes in the triangle
- There should be no shortest paths between the nodes in the triangle and the nodes in the square.
Unbalanced Kite Tests
Logic:
- Shortest path from node 1 to node 7 should be through node 2, not node 3
- Traversal from node 1 should be via depth first search (recursion ordering based on order of edges in graph construction)
Referenced Paper
Structuring Depth First Search Algorithms in Haskell
David King and John Launchbury. Proc. ACM Principles of Programming Languages, San
Francisco, 1995.
Appendix
Binary Search Tree Implementation
As demonstrated in the course by Professor Beckman, here is the implementation of a few functions for binary search trees. The elegance and simplicity demonstrates the utility of the Haskell programming language.
data Tree a = Node a (Tree a) (Tree a)
| Empty
deriving (Show)
add :: (Ord a) => a -> Tree a -> Tree a
add a Empty = Node a Empty Empty
add a (Node b left right)
| a <= b = Node b (add a left) right
| otherwise = Node b left (add a right)
find :: (Ord a) => a -> Tree a -> Bool
find a Empty = False
find a (Node b left right)
| a == b = True
| a < b = find a left
| otherwise = find a right
lookupBST :: (Ord k) => k -> Tree (k,v) -> Maybe v
lookupBST k Empty = Nothing
lookupBST k (Node (a,b) left right)
| k == a = Just b
| k < a = lookupBST k left
| otherwise = lookupBST k right
mergeChildren :: (Ord a) => Tree a -> Tree a -> Tree a
mergeChildren Empty right = right
mergeChildren left Empty = left
mergeChildren left (Node r Empty subright) = Node r left subright
mergeChildren left (Node r subleft subright) = Node r (mergeChildren left subleft) subright
delete :: (Ord a) => a -> Tree a -> Tree a
delete a Empty = Empty
delete a (Node x left right)
| a > x = Node x left (delete a right)
| a < x = Node x (delete a left) right
| otherwise = mergeChildren left right