Having recently graduated from college, I’m in a bit of a contemplative mood. So, I’ve decided to write up a post idea that I’ve had for a while: a sampling of some of the best “aha” moments I’ve had over the course of college.

To use a bit of a cliché analogy, CMU is like a long, painful hike in the mountains. Most of the time, you’re just trudging along what seems like an endless trail. But every once in a while, you break through the treeline and suddenly find yourself at the summit. From this glorious vantage point, you can see where all of your previous hiking was leading, and the whole world seems to come together in one beautiful moment. Was this worth the painful hike? I guess that depends on your personality.

What follows is a sampling of what I consider my favorite “mountaintop” moments over the course of my undergrad. I think a lot of these moments arise from the interaction between concepts in different classes, when learning one thing made me see something I’d known before in a new light.

In 15-252 More Great Ideas in Theoretical Computer Science, we spent our time going over a lot of really cool stuff, but probably my favorite was spectral graph theory. In brief, spectral graph theory tries to relate combinatorial properties of graphs with properties of their algebraic representations.

Here’s one very accessible example: if you’ve studied computer science before, you’re probably familiar with the *adjacency matrix* representation of a graph. Given some graph $G = (V, E)$, the adjacency matrix $A$ is the matrix indexed by $(u,v) \in V^2$ with

$$A_{u,v} = \begin{cases} 1 & (u,v) \in E \\ 0 & (u,v) \notin E. \end{cases}$$

What’s nice is that this is also the representation used practically in many programs; it’s not just some abstract theoretical construct. Let’s refer to the eigenvalues of $A$ as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.^{1}

I used to think of the adjacency matrix as just an implementation trick, not really related to any interesting theoretical properties of graphs. But it turns out that the eigenvalues $\lambda_i$ actually tell you quite a lot about the graph! For example, here are some *really* cool theorems:^{2}

- If $G$ is $d$-regular (i.e. all vertices have degree $d$), then $G$ is connected if and only if $\lambda_2 < d$. (More generally, you can relate the number of eigenvalues equal to $\lambda_1$ to the number of connected components of $G$.)
- Let $G$ be connected. Then $\lambda_n = -\lambda_1$ if and only if $G$ is bipartite. (A version of this was one of our homework problems!)
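These theorems are easy to sanity-check numerically. Here’s a quick sketch using numpy (my own example, not from the course): the 5-cycle is 2-regular and connected, two disjoint triangles form a 2-regular but disconnected graph, and even cycles are bipartite while odd cycles are not.

```python
import numpy as np

def eigvals_desc(A):
    """Eigenvalues of a symmetric matrix, sorted in descending order."""
    return np.sort(np.linalg.eigvalsh(A))[::-1]

def cycle_adjacency(n):
    """Adjacency matrix of the n-cycle (2-regular, connected)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return A

# Connected 2-regular graph: the 5-cycle. Here lambda_1 = d = 2 and lambda_2 < d.
lam = eigvals_desc(cycle_adjacency(5))
assert np.isclose(lam[0], 2) and lam[1] < 2 - 1e-9

# Disconnected 2-regular graph: two disjoint triangles. Now lambda_2 = d.
A = np.block([[cycle_adjacency(3), np.zeros((3, 3))],
              [np.zeros((3, 3)), cycle_adjacency(3)]])
lam = eigvals_desc(A)
assert np.isclose(lam[0], 2) and np.isclose(lam[1], 2)

# Bipartiteness: the 6-cycle has lambda_n = -lambda_1; the 5-cycle does not.
lam6 = eigvals_desc(cycle_adjacency(6))
assert np.isclose(lam6[-1], -lam6[0])
lam5 = eigvals_desc(cycle_adjacency(5))
assert not np.isclose(lam5[-1], -lam5[0])
```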

Adjacency matrices also pop up elsewhere; for example, Markov chains can be modeled in terms of powers of the transition matrix (for random walks on graphs, this is essentially a weighted adjacency matrix), and the eigenvector of the transition matrix with eigenvalue $\lambda = 1$ gives the stationary distribution.^{3} I think this result is much more intuitive than the previous one about connectivity.
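As a concrete sketch (using numpy, with a made-up 3-state chain): the stationary distribution $\pi$ satisfies $\pi P = \pi$, so it’s an eigenvector of $P^\top$ with eigenvalue $1$, and powers of $P$ converge to a matrix whose rows are all $\pi$.

```python
import numpy as np

# Row-stochastic transition matrix of a made-up 3-state chain:
# P[i, j] = probability of moving from state i to state j.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# pi P = pi means pi is a left eigenvector of P, i.e. an eigenvector
# of P^T with eigenvalue 1. Normalize it to sum to 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

# Powers of P converge to a matrix whose rows are all pi.
assert np.allclose(np.linalg.matrix_power(P, 100), np.tile(pi, (3, 1)))
```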

There’s a *lot* more that could be said about spectral graph theory, especially relating to expander graphs (basically sparse graphs with good connectivity), but the connection between eigenvalues and connectivity was what initially wowed me so much, because it linked two fairly elementary things in an unintuitive way.

Regular expressions are another very practical part of computing; most programmers have probably used them at some point or another. I’m particularly fond of them because they’re a place where *good theory* and *good implementation* meet; for a classic tour of this intersection, see Russ Cox’s blog post on implementing regular expression matching using nondeterministic finite automata.

But the connections between math and regular expressions go far deeper than that. There’s a lot of beautiful algebraic structure behind regular expressions, captured by Kleene algebras. What do I mean by this? Well, in the spirit of *a monad is just a monoid in the category of endofunctors*,^{4} I’d like to offer: *a Kleene algebra is just an idempotent semiring endowed with a closure operator*.

To make this idea clear, let me start by giving a mathematically suggestive syntax^{5} for regular expressions $\mathcal{R}$ over $\Sigma$ and a denotational semantics $\llbracket \cdot \rrbracket : \mathcal{R} \to \mathcal{P}(\Sigma^\star)$. (If you’ve taken a course on automata theory before, you can probably skip this part unless you want a refresher.) To keep things simple, we will restrict ourselves to the alphabet $\Sigma = \{a,b\}$, i.e. we will only consider strings containing $a$ and $b$. For a regular expression $r \in \mathcal{R}$, let $\llbracket r \rrbracket$ denote the set of strings matched by $r$. In this case, a regular expression is one of:

- The empty expression $0$, with $\llbracket 0 \rrbracket = \varnothing$. That is, no strings match $0$.
- The unit expression $1$, with $\llbracket 1 \rrbracket = \{\varepsilon\}$. That is, the only string matching $1$ is the empty string $\varepsilon$ (sometimes written `""`).
- A single character $a$ or $b$, with $\llbracket a \rrbracket = \{a\}$ and $\llbracket b \rrbracket = \{b\}$. That is, the only string matching $a$ is `"a"`, and likewise for $b$.
- The *alternation* $r_1 + r_2$ of any two regular expressions $r_1$ and $r_2$, with $\llbracket r_1 + r_2 \rrbracket = \llbracket r_1 \rrbracket \cup \llbracket r_2 \rrbracket$. Note that this is often written `<r1>|<r2>` in other syntaxes.
- The *concatenation* $r_1 \times r_2$ of any two regular expressions $r_1$ and $r_2$, with $\llbracket r_1 \times r_2 \rrbracket = \{ s_1 s_2 \mid s_1 \in \llbracket r_1 \rrbracket, s_2 \in \llbracket r_2 \rrbracket \}$. Note that this is often written `<r1><r2>` in other syntaxes.
- The *repetition* $r^\star$ of any regular expression $r$, with $\llbracket r^\star \rrbracket = \{ s_1 \cdots s_n \mid s_i \in \llbracket r \rrbracket, n \ge 0 \}$. This is also called the *Kleene star* operator, and is usually denoted `<r>*`.

To give an example, the regular expression `a(a|b)b*` matches any string starting with $a$, then any letter, then any number of $b$’s, such as `"aabbb"`, `"aab"`, and `"ab"`. In our syntax, this would be written as $a \times (a + b) \times b^\star$.
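If you like, the denotational semantics above can be executed directly. Here’s a toy Python interpreter for $\llbracket \cdot \rrbracket$ (my own sketch, not course code), which keeps the sets finite by only tracking strings up to length $4$:

```python
# Each constructor from the post's syntax (0, 1, characters, +, x, star)
# is interpreted as a set of strings, truncated to length <= N.
N = 4

def trunc(s):
    return {w for w in s if len(w) <= N}

zero = set()                # [[0]] = {}
one = {""}                  # [[1]] = {eps}
def char(c): return {c}     # [[a]] = {"a"}
def alt(s1, s2): return s1 | s2
def cat(s1, s2): return trunc({w1 + w2 for w1 in s1 for w2 in s2})

def star(s):
    # Least fixed point of  out = 1 + out x s,  which terminates
    # because everything is truncated to length <= N.
    out = {""}
    while True:
        new = alt(out, cat(out, s))
        if new == out:
            return out
        out = new

# [[a x (a + b) x b*]]: all strings of length <= 4 matching a(a|b)b*.
r = cat(char("a"), cat(alt(char("a"), char("b")), star(char("b"))))
assert r == {"aa", "ab", "aab", "abb", "aabb", "abbb"}
```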

Now why did we go about defining this unorthodox syntax? Because it draws attention to something remarkable: viewed this way, $(\mathcal{R}, +, \times, 0, 1)$ is an idempotent^{6} semiring, endowed with an extra unary closure^{7} operator $\star$. (At least, up to extensional equivalence—let’s agree to say $r \simeq r'$ if $\llbracket r \rrbracket = \llbracket r' \rrbracket$. To make this rigorous, one ought really to consider the quotient structure $\mathcal{R}/\mathord\simeq$, with operations extended over congruence classes in the natural way. But I think I’ve already been pedantic enough here.)

This insight neatly frames a *lot* of things you can do with regular expressions—such as taking derivatives,^{8} solving equations over $\mathcal{R}$, etc.—but I’ll just end with a particularly funny, almost coincidental application. (I actually briefly went over this example in a previous post, without all this explanation.) Here’s a not-particularly-intuitive theorem of abstract algebra: given a (not necessarily commutative) ring where $1-ab$ is invertible, $(1-ba)^{-1}$ also exists, and is given by the formula

$$(1-ba)^{-1} = 1 + b (1 - ab)^{-1} a.$$

Let’s try to recast this in terms of regular expressions. In a way, we can identify $(1-r)^{-1}$ with $r^\star$, if you’ll allow some abuse of notation:

$$\frac{1}{1-r} \simeq \sum_{n=0}^{\infty} r^n \simeq r^\star.$$

That is, we can think of $r^\star$ as a formal power series in the semiring of regular expressions. But then this theorem just says

$$(b \times a)^\star \simeq 1 + b \times (a \times b)^\star \times a,$$

which is actually quite obvious! Put in more conventional notation, `(ba)*` matches the same strings as `() | (b (ab)* a)`.
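This equivalence is easy to spot-check with an ordinary regex engine. Here’s a brute-force check in Python over all strings of length at most $6$ (a test, not a proof):

```python
import re
from itertools import product

# (ba)* versus ()|(b(ab)*a); \Z anchors the match at the end of the string.
lhs = re.compile(r"(ba)*\Z")
rhs = re.compile(r"(|b(ab)*a)\Z")

# Enumerate every string over {a, b} of length 0 through 6.
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        assert bool(lhs.match(w)) == bool(rhs.match(w))
```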

Now we move on to something slightly more abstract: first-class continuations and their relation to classical logic. I’ll use SML as my example language; in SML/NJ, first-class continuations are implemented in the `Cont` module:

```
type 'a cont = 'a SMLofNJ.Cont.cont
val callcc : ('a cont -> 'a) -> 'a = SMLofNJ.Cont.callcc
val throw : 'a cont -> 'a -> 'b = SMLofNJ.Cont.throw
```

This is similar to the Scheme `call/cc`, if you’re familiar with that. The idea is that a continuation represents a *reified control stack*, allowing for nonlocal jumps in program execution, almost like a `goto` in imperative programming. An explanation of how this works is beyond the scope of this blog post; if you’ve taken a functional programming class before, you might want to read these old lecture notes or Matt Might’s blog post on the topic.

One particularly beautiful thing: these continuations correspond very exactly to contradictions in classical logic,^{9} in the sense of the Curry–Howard correspondence.^{10}

Now ordinarily, a value of type $\tau$ is just a constructive proof of the corresponding proposition $t$.^{11} Here’s the thing: with constructive (or intuitionistic) logic, one does not have the law of the excluded middle, i.e. $\forall a.\, a \vee \neg a$. Of course, such a thing is absolutely permissible in classical logic; Aristotle is said to be the first one to come up with this rule. But in constructive logic, we demand that everything come with a proof that tells us how to construct it. If I claim that $A \vee B$, I need to verify this claim by showing you either an $A$ or a $B$. It’s not enough for me to wave my hands and say, “well, one of them is true!”^{12}

But here’s the catch: if we augment our programming language with first-class continuations (where the type $\tau\,\textsf{cont}$ corresponds^{13} to the contradiction $\neg t$), then we recover the law of the excluded middle! This can be shown by exhibiting a value `lem` of type $\tau + \tau\,\textsf{cont}$,^{14} representing the proposition $\forall t.\,t \vee \neg t$.

```
datatype ('a, 'b) sum = Left of 'a | Right of 'b

(* We wrap the ['a + 'a cont] in a thunk to circumvent the value
   restriction, but this is not essential to our example. *)
val lem : unit -> ('a, 'a cont) sum =
  fn () =>
    callcc (fn ret => Left (callcc (fn k => throw ret (Right k))))
```

There’s actually an amazingly intuitive interpretation of this construction:

- If you ask me for a $\tau + \tau\,\textsf{cont}$, I just lie to you and say I have a $\tau\,\textsf{cont}$.
- Now you might never call my bluff, in which case I’m in the clear. However, if you do challenge me, the *only* way you can do it is by throwing a $\tau$ to my continuation, in which case I backtrack to where I lied to you and change my mind, giving you back the $\tau$ that you just gave me.

This accords very well with the idea of a continuation as a form of nonlocal control flow. I call this “intuitive,” but the idea actually took me several years to grasp. I first came upon the idea of continuations as a freshman in 15-150 Principles of Functional Programming. I then learned of the connection with classical logic in 98-317 Hype for Types, but it wasn’t until I took 15-312 Foundations of Programming Languages as a senior that it really clicked for me.

The idea of continuations also pops up in quite a few other places, albeit in slightly different forms:

- They give rise to the continuation-style denotational semantics of programming languages.
- When compiling functional programs, continuation-passing style (CPS) conversion is equivalent in some ways to the static single assignment (SSA) form used for optimizations in many compilers, including LLVM.
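To illustrate what CPS conversion does, here’s a toy example of my own (far simpler than what a compiler actually emits): a direct-style factorial next to its CPS counterpart, where the extra parameter `k` reifies “the rest of the computation.”

```python
# Direct style: the result is returned up the call stack.
def fact(n):
    return 1 if n == 0 else n * fact(n - 1)

# CPS: instead of returning, each call passes its result forward to
# the continuation k, making the control flow explicit.
def fact_cps(n, k):
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))

# Supplying the identity as the initial continuation recovers the answer.
assert fact_cps(5, lambda x: x) == fact(5) == 120
```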

This list is certainly not exhaustive, and there are many worthwhile things that I learned that aren’t so easily distilled into bloggable bites. I do find it interesting that this list somewhat captures my progression from “theory A” (algorithms and complexity) to “theory B” (logic and languages). I may revisit this topic to write a part 2 if I think of more things to write. In general, I’m becoming more and more convinced of the value of posts that summarize what I’ve learned rather than trying to present dazzling new insights that others haven’t thought of before.

Notice that if $G$ is an undirected graph, then $A$ is a real, symmetric matrix, and so the spectral theorem guarantees the existence of these eigenvalues.↩︎

Actually, most of these results are more cleanly stated using the Laplacian rather than the adjacency matrix, but this is less frequently used as a real graph representation in programs.↩︎

I say *the* stationary distribution because various ergodicity theorems provide sufficient conditions for its uniqueness. In fact, such results can be extended to infinite Markov chains, with some care.↩︎

For the uninitiated, this is a widely-memed quote from Mac Lane’s textbook *Categories for the Working Mathematician*. The original quote reads: “All told, a monad in $X$ is just a monoid in the category of endofunctors of $X$, with product $\times$ replaced by composition of endofunctors and unit set by the identity endofunctor.”↩︎

For those at CMU, this is the syntax used in 15-150. Generally speaking, “weird things” done by 15-150 staff are done that way because they point to something deeper, and it’s often quite satisfying to finally realize *why* something was initially presented one way. Also, if you’re interested in an algebraic treatment of computation more generally, be sure to check out 15-354 Computational Discrete Math; I hear that next fall might be the last offering.↩︎

We say it’s idempotent because $\llbracket r + r \rrbracket = \llbracket r \rrbracket$ for any regular expression $r$.↩︎

So-called because $\llbracket r^\star \rrbracket$ is the least set containing $\llbracket r \rrbracket$ and $\varepsilon$ which is closed under string concatenation.↩︎

Note that these are also referred to as “left quotients,” which confusingly can be viewed as “right actions.” Nomenclature doesn’t always make sense…↩︎

In this view, the name $\tau\,\textsf{cont}$ is a bit of a pun between *continuation* and *contradiction*.↩︎

This is sometimes called the *propositions-as-types correspondence*. There’s also a deep connection between types and categories, giving rise to the notion of a computational Trinity. In fact, I would say that the beautiful correspondence between types, propositions, and categories is the strongest argument for functional programming.↩︎

One does have to be slightly careful with divergence.↩︎

An example of such a proof is the following classical argument showing that “an irrational to an irrational power may be rational.” Either $\sqrt{2}^{\sqrt{2}}$ is rational or it’s not. In the former case, we are done. In the latter case, note $(\sqrt{2}^{\sqrt{2}})^{\sqrt{2}} = 2$.↩︎

Actually, we should be a bit careful with this. Another way to construct a “contradiction” is the type $\tau \to \textsf{void}$, where $\textsf{void}$ is the type inhabited by no values. Intuitively, a function of type $\tau \to \textsf{void}$ would be able to produce a value of type $\textsf{void}$ if you ever gave it a value of type $\tau$, but there don’t exist values of type $\textsf{void}$! This is slightly different from the formulation using continuations.↩︎

In SML, there is a slight inconvenience known as the *value restriction*, which determines which expressions may be polymorphic at the top level. In particular, only syntactic values are allowed, to avoid bad interactions with mutable references. We thus actually prove the proposition $\forall t.\,\top \implies (t \vee \neg t)$, where $\top$ is trivial truth and corresponds to the unit type. But this doesn’t really matter for the purposes of this post.↩︎
