(*Fair disclaimer: I took a couple of undergraduate classes in programming language semantics, category theory, and such, but I by no means consider myself to be an expert on type theory. What follows are my own musings and should be taken cum grano salis*.)

One intuitive interpretation of a type `t`

is that it represents a set inhabited by some values. This seems perfectly sensible for many commonly-encountered types; to hash this out more formally, let’s say that $\llbracket t \rrbracket$ is the set of values inhabiting the type `t`

. So for some common OCaml types, for instance:

- the unit type is represented by $\llbracket \textsf{unit} \rrbracket = \{ () \}$,
- booleans are represented by $\llbracket \textsf{bool} \rrbracket = \{ \textsf{true}, \textsf{false} \}$, and
- the built-in integers are represented by $\llbracket \textsf{int} \rrbracket = [-2^{62}, 2^{62}-1]$.
^{1}

We can even extend this interpretation very naturally to (monomorphic, finite) algebraic data types:

- Given types
`a`

and`b`

, the product (tuple) type`t = a * b`

is simply represented by the Cartesian product $\llbracket t \rrbracket = \llbracket a \rrbracket \times \llbracket b \rrbracket$. - Given types
`a`

and`b`

, the variant type`t = A of a | B of b`

is represented by the*disjoint union*^{2}$\llbracket t \rrbracket = \llbracket a \rrbracket + \llbracket b \rrbracket$. - Given types
`a`

and`b`

, the function type`t = a -> b`

is represented by a collection of set-theoretic functions: $\llbracket t \rrbracket = \llbracket b \rrbracket^{\llbracket a \rrbracket}$.^{3}

To work out an example, consider the following type `t`

.

```
type t =
of unit * bool
| A of (bool -> unit) | B
```

We have $\llbracket t \rrbracket = \{ (A, ((), \textsf{true})), (A, ((), \textsf{false})), (B, \{ (\textsf{true}, ()), (\textsf{false}, ()) \}) \}$. One neat application of the above is that for any type `t`

, we can alternatively represent the type `t -> bool`

as the power set $\mathcal{P}(\llbracket t \rrbracket)$, since $\llbracket t \to \textsf{bool} \rrbracket = \llbracket \textsf{bool} \rrbracket^{\llbracket t \rrbracket}$, and an indicator function $f : A \to \{0,1\}$ identifies a particular subset of $A$.

In fact, this set-theoretic “denotational” semantics of types is slightly more powerful than advertised: it can handle some rather rudimentary infinite types as well, such as the classic recursively-defined natural numbers `nat`

.

```
type nat =
of unit
| Zero of nat | Succ
```

We just have $\llbracket \textsf{nat} \rrbracket = \{ (\textsf{Zero}, ()), (\textsf{Succ}, (\textsf{Zero}, ())), (\textsf{Succ}, (\textsf{Succ}, (\textsf{Zero}, ()))), \dots \}$. Now if we want to be really careful, we do need to redefine the semantics of variant types here, because we’re not allowed to define a set in terms of itself. It would be silly to define $\llbracket \textsf{nat} \rrbracket = \llbracket \textsf{unit} \rrbracket + \llbracket \textsf{nat} \rrbracket$, since this is a circular definition! We should instead think of $\llbracket \textsf{nat} \rrbracket$ as the *least fixed point* of the equation $N = \llbracket \textsf{unit} \rrbracket + N$. This can be done without too much difficulty as the limit of a chain of approximations:

- Define $\llbracket \textsf{nat} \rrbracket_0 = \{ (\textsf{Zero}, ()) \}$.
- For $k \in \mathbb{N}$, define $\llbracket \textsf{nat} \rrbracket_{k+1} = \llbracket \textsf{nat} \rrbracket_0 \cup \{ (\textsf{Succ}, n) \mid n \in \llbracket \textsf{nat} \rrbracket_k \}$.

Finally, we let $\llbracket \textsf{nat} \rrbracket = \bigcup_{k \in \mathbb{N}} \llbracket \textsf{nat} \rrbracket_k.$ Notice that there’s a very obvious bijection between $\llbracket \textsf{nat} \rrbracket$ and $\mathbb{N}$, the good ol’ natural numbers. See, no problems thus far!

Now let us recall the notion of a type isomorphism. We say that two types `t`

and `t'`

are *isomorphic* (written $t \simeq t'$), if there exist two functions `f : t -> t'`

and `f' : t' -> t`

such that $f \circ f' = \textsf{id}_{t'}$ and $f' \circ f = \textsf{id}_{t}$. That is, `f`

and `f'`

are “inverses” of each other.^{4}

To give an example, $\textsf{bool} \times \textsf{bool} \simeq \textsf{bool} \to \textsf{bool}$, witnessed by the following conversion functions:

```
let f : (bool * bool) -> (bool -> bool) =
fun (t, f) ->
function | true -> t | false -> f
let f' : (bool -> bool) -> (bool * bool) =
fun f ->
true, f false) (f
```

There are tons of interesting isomorphisms that can be proven; for instance, it’s not too hard to show that $\textsf{unit} \simeq t \to \textsf{unit}$ for any type `t`

, which is to say that there is exactly one function of type `t -> unit`

, namely `fun _ -> ()`

.^{5}

One notable point: under the set-theoretic semantics $\llbracket \cdot \rrbracket$, there’s a natural interpretation of the isomorphism $t \simeq t'$ as a *bijection* between $\llbracket t \rrbracket$ and $\llbracket t' \rrbracket$. In particular, bijections are the way set theorists compare the sizes of sets; we should have $|\llbracket t \rrbracket| = |\llbracket t' \rrbracket|$ whenever $t \simeq t'$.

Now that we’ve proposed a seemingly-reasonable model for thinking about types, let’s proceed to poke a hole it in. Consider the following mysterious type `cantor`

:

`type cantor = T of cantor -> bool`

Now this is not a bogus type; you can actually construct values of type `cantor`

, like `T (fun _ -> true)`

. But it is a tad bit fishy. There is, of course, the trivial isomorphism $\textsf{cantor} \simeq \textsf{cantor} \to \textsf{bool}$, witnessed by

```
let f : cantor -> (cantor -> bool) =
fun (T x) -> x
let f' : (cantor -> bool) -> cantor =
fun x -> T x
```

This is to say that there is a very natural bijection between $\llbracket \textsf{cantor} \rrbracket$ and $\llbracket \textsf{cantor} \to \textsf{bool} \rrbracket$. But recall that $\llbracket \textsf{cantor} \to \textsf{bool} \rrbracket$ is likewise in bijection with $\mathcal{P}(\llbracket \textsf{cantor} \rrbracket)$. This then implies that $|\llbracket \textsf{cantor} \rrbracket| = \left\lvert \llbracket \textsf{bool} \rrbracket^{\llbracket \textsf{cantor} \rrbracket} \right\rvert = | \mathcal{P}(\llbracket \textsf{cantor} \rrbracket)|,$ and we have a real problem on our hands: Cantor once famously proved that no set can have cardinality equal to its own power set, in a theorem now named after him! Our seemingly-reasonable set-theoretic interpretation of types has gone astray and given us an answer that we know cannot be correct. I think the punchline here is that, counter-intuitively, sets and functions are not a good way to model the semantics of the pared-down version of OCaml that we’re considering in this post,^{6} because they can’t quite capture recursive types. In fact, mathematicians studying the categorical semantics of such languages often do so using domains and Scott-continuous functions, rather than sets and ordinary functions.

In fact, there is another (somewhat subtle) cardinality issue with the model that we’ve been discussing so far. For this, we don’t even need the tricky `cantor`

example; consider the following type `real`

:

`type real = nat -> bool`

This is not a recursive type, though of course it is defined in terms of a recursive type, namely `nat`

. Yet there is still a bit of a problem here. Our set-theoretic semantics of types would tell us that we have the following bijections: $|\llbracket \textsf{real} \rrbracket| = |\mathcal{P}(\llbracket
\textsf{nat} \rrbracket)| = |\mathbb{R}|,$ which is perhaps what we’d naively expect. But then something weird is going on: there cannot possibly be uncountably many values of type `real`

, because there are only countably many OCaml programs! (The set of all strings over a finite alphabet is enumerable, after all.) This is to say that the majority of mathematical functions $\llbracket \textsf{nat} \rrbracket \to \llbracket \textsf{bool} \rrbracket$ are not representable as values in our programming language.

As an aside, let’s leave aside the semantics for a moment and go back to type isomorphisms, which we didn’t need $\llbracket \cdot \rrbracket$ to define. There’s a somewhat cute way of showing that $\textsf{cantor} \not\simeq \textsf{real}$ by making use of what is, in my opinion, one of the most surprising facts about types.

Suppose for a moment that $\textsf{cantor} \simeq \textsf{real}$. Then $\textsf{real} \simeq \textsf{real} \to \textsf{bool}$ (exercise for the reader). However, this cannot be true, for the reason that $\simeq$ preserves the decidability of extensional equivalence (another exercise for the reader), yet:

- extensional equivalence between values of type
`real`

is not decidable, because this would let us decide the halting problem (recall that we defined`real = nat -> bool`

); but - extensional equivalence between values of type
`real -> bool`

*is*decidable, for topological reasons that I don’t understand.^{7}*This is really neat and surprising!*

In general, I love finding silly uses of various mathematical results; one of my favorite posts that I’ve ever written is on using Fermat’s last theorem to solve a problem in an undergraduate theory class.

Due to an obscure optimization, OCaml integers are actually 63-bit. This is so that you can store $n$ as $2n + 1$, so as to distinguish integers from pointers (which must always be even numbers due to memory alignment requirements, at least on most modern systems).↩︎

For those unfamiliar with the term, this is sometimes called the

*tagged union*(or the “coproduct in the category of sets,” if you want to sound fancy) $A + B = \{(0,a) \mid a \in A \} \cup \{(1,b) \mid b \in B\}$. Of course, we can replace the labels $0$ and $1$ with anything we like, such as the constructor names of the variant in question.↩︎Here, I used the notation $B^A = \{ f \mid f : A \to B \}$. The “exponential” notation is somewhat suggestive.↩︎

There is a sneaky thing I did here, which is to slip in the assumption that the notion of “equality” between values makes sense. But this is not necessarily a given: what does it mean for two functions to be equal, for instance? In this case, the natural choice is to restrict ourselves to pure, total functions and to say that by equality, we really mean

*extensional equivalence*. Note that it is not the case that equality between values is decidable on all types!↩︎From a categorical perspective, this is to say that

`unit`

is final in the category of types.↩︎This pared-down OCaml is very similar in spirit to PCF, which is essentially the simply-typed lambda calculus enriched with recursion.↩︎

See this guest post by Martín Escardó on Andrej Bauer’s blog for details.↩︎

## Comments