# How to use Dafny to prove type safety

There are many ways even well-tested programs can go wrong (see my previous blog post on how Dafny helps). Static typing often eliminates a whole class of these programming issues by preventing unintended usage of a value.

This blog post takes a deep dive… take a deep breath… into how to write *terms of a programming language*, along with both a *type-checker* and an *evaluator* on such terms, such that the following *soundness property* holds:

- [Progress] If a type checker accepts a term, then the evaluator will not get stuck on that term.
- [Preservation] If a term has a type T, then the evaluator will also return a term of type T.

We will illustrate this using the infrastructure of the following Blockly workspace. If you build a full term and pass it to “Type check”, on success, the block is surrounded with green, otherwise with red. If surrounded with green, then attaching the term to “Evaluate” and clicking on “Evaluate” will always do something, except if the term is a final value. “Evaluate” is surrounded in red if its term does not type check.

# Play with the evaluator and the type checker

By the way, have you seen that `Pred(False)` does not type check? Someone made the remark that it’s counter-intuitive, because they thought that `Pred` would stand for `Predicate` and so it should type check. But `Pred` stands for `Predecessor`, and that’s why it applies only to numbers. The type-checker’s role is also to resolve such ambiguities before execution.

# Examples

Feel free to click on the examples below to load them in the Blockly workspace above.

# Writing a type checker in Dafny

Writing a type-checker that guarantees that evaluation of a term won’t get stuck is not an easy task as we must dive into maths, but fortunately, Dafny makes it easier.

#### The term language

First, let’s define the term language used in the Blockly interface above. Note that the logic of the Blockly workspace above uses that exact code written on this page. Yeah, Dafny compiles to JavaScript too!
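The listing itself is not shown here. A minimal sketch of what the term datatype could look like, assuming Dafny 4 syntax: the constructor names are the ones mentioned in this post, while the name `If` and all field names (`e`, `left`, `right`, `cond`, `thn`, `els`) are assumptions.

```dafny
// Hypothetical reconstruction of the term language.
// Field names are illustrative assumptions, not taken from the post.
datatype Term =
  | True
  | False
  | Zero
  | Succ(e: Term)
  | Pred(e: Term)
  | IsZero(e: Term)
  | Double(e: Term)
  | Add(left: Term, right: Term)
  | If(cond: Term, thn: Term, els: Term)
```

With this shape, the term from the introduction is simply the value `Pred(False)`.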

#### The type language

Let’s also add the two types our expressions can have.
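A plausible sketch of the type language, assuming the two types are booleans and integers (the constructor names `Bool` and `Int` are assumptions):

```dafny
// Assumed names: the post only states that there are two types.
datatype Type =
  | Bool
  | Int
```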

#### The type checker

We can now write a *type checker* for the terms above. In our case, a type checker will take a term, and return a type if the term has that type. We will not dive into error reporting in this blog post.
First, because a term may or may not have a type, we want an `Option<A>` type like this:
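A common way to write such a type, given here as a sketch (the constructor and field names are assumptions):

```dafny
// Some(value) carries a result, None signals its absence.
datatype Option<A> = Some(value: A) | None
```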

Now, we can define a function that computes the type of a term, if it has one. For example, in a conditional term, the condition has to be a boolean, while we only require the “then” and “else” part to have the same, defined type. In general, computing types is a task linear in the size of the code, whereas evaluating the code could have any complexity. This is why type checking is an efficient way of preventing obvious mistakes.

A well-typed term is one for which a type exists.
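The two definitions described above could be sketched as follows. This is a hedged reconstruction assuming Dafny 4 syntax: `GetType` and `WellTyped` are hypothetical names, the typing rules for `Double` and `Add` are assumed to require integer arguments, and everything is wrapped in a module so the sketch stays self-contained.

```dafny
module TypeCheckerSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  datatype Type = Bool | Int

  datatype Option<A> = Some(value: A) | None

  // Returns Some(type) when the term has a type, None otherwise.
  function GetType(term: Term): Option<Type> {
    match term {
      case True => Some(Bool)
      case False => Some(Bool)
      case Zero => Some(Int)
      case Succ(e) => if GetType(e) == Some(Int) then Some(Int) else None
      case Pred(e) => if GetType(e) == Some(Int) then Some(Int) else None
      case Double(e) => if GetType(e) == Some(Int) then Some(Int) else None
      case IsZero(e) => if GetType(e) == Some(Int) then Some(Bool) else None
      case Add(left, right) =>
        if GetType(left) == Some(Int) && GetType(right) == Some(Int)
        then Some(Int) else None
      case If(cond, thn, els) =>
        // The condition must be a boolean; both branches must share one type.
        var t := GetType(thn);
        if GetType(cond) == Some(Bool) && t == GetType(els) then t else None
    }
  }

  // A well-typed term is one for which a type exists.
  predicate WellTyped(term: Term) {
    GetType(term) != None
  }
}
```

Each constructor triggers one recursive call per subterm, which is where the linear cost comes from. Under these rules `GetType(Pred(False))` is `None`, matching the behavior described earlier.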

#### The evaluator and the progress check

First, we can define the notion of evaluating a term. We can evaluate a term using small-step semantics, meaning that each step only replaces a term or a subterm by another one.
**Not being stuck** means that we will always be able to find a term to “replace” or to “compute” (the two words are used as synonyms here).

There are terms where no replacement is possible: value terms. Here is what we want them to look like: either booleans, zero, positive integers, or negative integers.

Now, we can write our one-step evaluation method. As a requirement, we add that the term must be well-typed and nonfinal.
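Here is a hedged sketch of final values and of a one-step evaluator, assuming Dafny 4 syntax. To stay short it covers a reduced language without `Double` and `Add`, which is a simplification and not the post's full definition. All names other than `OneStepEvaluate` are assumptions. The match deliberately omits `True`, `False` and `Zero`: Dafny has to prove from the precondition that those cases cannot occur.

```dafny
module EvaluatorSketch {
  // Reduced language: Double and Add are left out of this sketch.
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term)
    | If(cond: Term, thn: Term, els: Term)
  datatype Type = Bool | Int
  datatype Option<A> = Some(value: A) | None

  function GetType(term: Term): Option<Type> {
    match term {
      case True => Some(Bool)
      case False => Some(Bool)
      case Zero => Some(Int)
      case Succ(e) => if GetType(e) == Some(Int) then Some(Int) else None
      case Pred(e) => if GetType(e) == Some(Int) then Some(Int) else None
      case IsZero(e) => if GetType(e) == Some(Int) then Some(Bool) else None
      case If(cond, thn, els) =>
        var t := GetType(thn);
        if GetType(cond) == Some(Bool) && t == GetType(els) then t else None
    }
  }
  predicate WellTyped(term: Term) { GetType(term) != None }

  // Integer values: Zero, Succ(Succ(...Zero)) or Pred(Pred(...Zero)).
  predicate IsSuccs(term: Term) { term == Zero || (term.Succ? && IsSuccs(term.e)) }
  predicate IsPreds(term: Term) { term == Zero || (term.Pred? && IsPreds(term.e)) }
  predicate IsFinalValue(term: Term) {
    term == True || term == False || IsSuccs(term) || IsPreds(term)
  }

  // One small step. The precondition rules out the missing cases.
  function OneStepEvaluate(term: Term): Term
    requires WellTyped(term) && !IsFinalValue(term)
  {
    match term {
      case Succ(Pred(x)) => x
      case Succ(sub) => Succ(OneStepEvaluate(sub))
      case Pred(Succ(x)) => x
      case Pred(sub) => Pred(OneStepEvaluate(sub))
      case IsZero(Zero) => True
      case IsZero(e) =>
        if IsFinalValue(e) then False else IsZero(OneStepEvaluate(e))
      case If(cond, thn, els) =>
        if cond == True then thn
        else if cond == False then els
        else If(OneStepEvaluate(cond), thn, els)
    }
  }
}
```

For instance, `If(IsZero(Zero), Succ(Pred(Zero)), Zero)` steps to `If(True, Succ(Pred(Zero)), Zero)`, then to `Succ(Pred(Zero))`, then to `Zero`.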

The interesting points to note about the function above, in a language like Dafny where every pattern must be exhaustive, are the following:

- Every call consists either of `OneStepEvaluate` on one sub-argument, or a transformation that reduces the size of the tree. So something is always happening here.
- All the cases are covered, Dafny does not complain!
  - For example, when encountering the case `IsZero(e)`, if `e` is a final value, it must be either `Pred` or `Succ`. It cannot be `True` or `False` as it’s well-typed and the previous pattern precludes `Zero`.
  - Similarly, if the condition of an if term is a final value, because it’s well-typed, Dafny knows it’s either `True` or `False`.

That concludes the *progress* part of soundness checking: whenever a term type-checks, there is always an applicable small step evaluation rule unless it’s a final value.

#### The preservation check

Soundness has another aspect, preservation, as stated in the intro. It says that, when evaluating a well-typed term, the evaluator will not get stuck and the result will have the same type as the original term. Dafny can also prove it for our language, out of the box. Well done, that means our language and evaluator make sense together!
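The preservation statement can be written as a lemma. The sketch below assumes Dafny 4 syntax and a reduced language without `Double` and `Add`; the lemma name and helper names are hypothetical. The empty body relies on Dafny's automatic induction, which is what "out of the box" refers to here; on other Dafny versions the proof may need hints.

```dafny
module PreservationSketch {
  // Reduced language: Double and Add are left out of this sketch.
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term)
    | If(cond: Term, thn: Term, els: Term)
  datatype Type = Bool | Int
  datatype Option<A> = Some(value: A) | None

  function GetType(term: Term): Option<Type> {
    match term {
      case True => Some(Bool)
      case False => Some(Bool)
      case Zero => Some(Int)
      case Succ(e) => if GetType(e) == Some(Int) then Some(Int) else None
      case Pred(e) => if GetType(e) == Some(Int) then Some(Int) else None
      case IsZero(e) => if GetType(e) == Some(Int) then Some(Bool) else None
      case If(cond, thn, els) =>
        var t := GetType(thn);
        if GetType(cond) == Some(Bool) && t == GetType(els) then t else None
    }
  }
  predicate WellTyped(term: Term) { GetType(term) != None }
  predicate IsSuccs(term: Term) { term == Zero || (term.Succ? && IsSuccs(term.e)) }
  predicate IsPreds(term: Term) { term == Zero || (term.Pred? && IsPreds(term.e)) }
  predicate IsFinalValue(term: Term) {
    term == True || term == False || IsSuccs(term) || IsPreds(term)
  }

  function OneStepEvaluate(term: Term): Term
    requires WellTyped(term) && !IsFinalValue(term)
  {
    match term {
      case Succ(Pred(x)) => x
      case Succ(sub) => Succ(OneStepEvaluate(sub))
      case Pred(Succ(x)) => x
      case Pred(sub) => Pred(OneStepEvaluate(sub))
      case IsZero(Zero) => True
      case IsZero(e) =>
        if IsFinalValue(e) then False else IsZero(OneStepEvaluate(e))
      case If(cond, thn, els) =>
        if cond == True then thn
        else if cond == False then els
        else If(OneStepEvaluate(cond), thn, els)
    }
  }

  // Preservation: one evaluation step keeps the type unchanged.
  lemma OneStepEvaluatePreservesType(term: Term)
    requires WellTyped(term) && !IsFinalValue(term)
    ensures GetType(OneStepEvaluate(term)) == GetType(term)
  {
    // Intentionally empty: left to Dafny's automatic induction.
  }
}
```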

#### Conclusion

All the code above powers this page, which is why I can guarantee that you won’t be able to find a term that the type checker accepts and that won’t result in a final value. Of course, a real programming language might include infinite loops, but the soundness property above is not about termination. It is about constant progress, which you also want in embedded systems to ensure they never need a reboot.

Now that you know what a type checker is and how to implement one in Dafny, perhaps you will feel much better prepared to model and experiment with your new programming language, as the Cedar team recently did?

This is the end of the blog post. I hope you enjoyed it!

## Bonus: more advanced modeling

If you are looking for some advanced concepts, feel free to continue reading! Beware, math ahead!

Sometimes, modeling the evaluator and the type-checker as functions is not enough. One wants to model them as relations, and determine some properties about these relations, such as the order of evaluation being irrelevant for the final result.

In the rest of this blog post, largely inspired by Chapter 8 of the book “Types and Programming Languages” by Benjamin Pierce, I will illustrate one element of the proof: that the inductive and constructive definitions of the set of terms are equivalent. Having this equivalence enables other results that are out of the scope of this blog post, including that the order of evaluation does not matter.

With the help of this trick, it becomes possible to prove similar equivalences for different inductive and constructive definitions of:

- The set of `(Expr, Expr)` pairs of small-step evaluations
- The set of `(Expr, Type)` pairs of type checking

but I leave these as an exercise for the interested reader.

In Types and Programming Languages, section 3.2, we discover that there are two other mathematical definitions of the “set of all terms”.
The first one, in Definition 3.2.1, states that the set of *terms* is the smallest set \(𝒯\) such that:

- \(\{\texttt{true}, \texttt{false}, 0\} \subseteq 𝒯\);
- if \(t_1 \in 𝒯\), then \(\{\texttt{succ}\;t_1, \texttt{pred}\;t_1, \texttt{is_zero}\;t_1\} \subseteq 𝒯\);
- if \(t_1 \in 𝒯\), \(\;\; t_2 \in 𝒯\) and \(t_3 \in 𝒯\), then \(\texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3 \in 𝒯\).

Note that these terms omit `Double` and `Add` above. This means we cannot state that this set is the same as `set t: Term | true` as one would like to write, but let’s continue.

We can write the inductive definition above in Dafny too:
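One way to express "the smallest set closed under these rules" is to postulate the set and its two defining properties. The sketch below assumes Dafny 4 syntax; the names `AllTermsInductively`, `InductionCriteria` and `InductiveAxioms` appear in this post, but their exact shape here, as well as the `Term` field names, are assumptions.

```dafny
module InductiveSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  // The three closure rules of the definition, for an arbitrary set.
  ghost predicate InductionCriteria(terms: iset<Term>) {
    && True in terms && False in terms && Zero in terms
    && (forall t1: Term | t1 in terms ::
          Succ(t1) in terms && Pred(t1) in terms && IsZero(t1) in terms)
    && (forall t1: Term, t2: Term, t3: Term
          | t1 in terms && t2 in terms && t3 in terms ::
          If(t1, t2, t3) in terms)
  }

  // The set is postulated, not computed.
  ghost const AllTermsInductively: iset<Term>

  // Assumed without proof: it satisfies the rules and is the smallest such set.
  lemma {:axiom} InductiveAxioms()
    ensures InductionCriteria(AllTermsInductively)
    ensures forall s: iset<Term> | InductionCriteria(s) :: AllTermsInductively <= s
}
```

`iset` is Dafny's type for possibly infinite sets, which is needed because the set of all terms is infinite.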

The second definition for the set of all terms in section 3.2.3 is done constructively. We first define a set \(S_i\) for each natural number \(i\), as follows

\(S_0 = ∅\);

\(\begin{aligned}S_{i+1} = && && \{\texttt{true}, \texttt{false}, 0\} \\ && \bigcup && \{\texttt{succ}\;t_1, \texttt{pred}\;t_1, \texttt{is_zero}\;t_1 \mid t_1 \in S_i \} \\ && \bigcup && \{\texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3\mid t_1 \in S_i,\; t_2 \in S_i,\; t_3 \in S_i\}\end{aligned}\).

This we can enter in Dafny too:
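A possible encoding, assuming Dafny 4 syntax. Instead of building each \(S_{i+1}\) as a union of images, this sketch describes it by a membership condition, which is an equivalent formulation chosen here for simplicity. The `Term` field names are assumptions.

```dafny
module ConstructiveSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  // S(0) is empty; S(i) contains the constants and every allowed
  // constructor applied to elements of S(i - 1).
  ghost function S(i: nat): iset<Term> {
    if i == 0 then iset{}
    else
      iset t: Term |
        t == True || t == False || t == Zero
        || ((t.Succ? || t.Pred? || t.IsZero?) && t.e in S(i - 1))
        || (t.If? && t.cond in S(i - 1) && t.thn in S(i - 1) && t.els in S(i - 1))
  }

  // The union of all the S(i).
  ghost const AllTermsConstructively: iset<Term> :=
    iset t: Term | exists i: nat :: t in S(i)
}
```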

But now, we are left with the existential question: are these two sets the same?
We rush in Dafny and write a lemma ensuring `AllTermsConstructively == AllTermsInductively` by invoking the lemma `InductiveAxioms()`, but… Dafny can’t prove it.

If you think deeply about it, how do you know that the two are the same? It seems obvious but why? It seems straightforward to prove that `AllTermsInductively <= AllTermsConstructively` because by definition, `AllTermsConstructively` obeys the induction rules. But is it the smallest such set? What if there were an element of `AllTermsConstructively` that is not in `AllTermsInductively`? It could actually happen if, instead of a datatype, we only had a trait, and some external user could implement new terms yet unknown to us.

Here is Benjamin Pierce’s proof sketch, then translated and verified in Dafny.

1. First, prove `AllTermsInductively <= AllTermsConstructively` by showing that `AllTermsConstructively` satisfies the predicate `InductionCriteria`.
2. Second, for any set `someset` satisfying the induction criteria, for every `i`, we prove by induction that *every set of terms* `S(i)` is inside `someset`.
3. `AllTermsConstructively` being the *union* of all these `S(i)`, it is also contained in any set satisfying the induction criteria, including `AllTermsInductively` which is the smallest one, so `AllTermsConstructively <= AllTermsInductively`.
4. From 1. and 3. we obtain `AllTermsConstructively == AllTermsInductively`.

Let’s prove it in Dafny!

## 0. Intermediate sets are cumulative

First, we want to show that, for every `i <= j`, we have `S(i) <= S(j)` (set inclusion). We do this in two steps: first, we show this cumulative effect between two consecutive sets, and then between any two sets.

We use the annotation `{:vcs_split_on_every_assert}`, which makes Dafny verify each assertion independently. In this example, that helps the verifier. Yes, helping the verifier is something we must occasionally do in Dafny.
To further control the situation, we use the annotation `{:induction false}` to ensure Dafny does not try to prove induction hypotheses by itself, which gives us control over the proof. Otherwise, Dafny can both automate the proof a lot (which is great!) and sometimes time out because automation is stuck (which is less great!). I left assertions in the code so that not only Dafny, but you too can understand the proof.
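The induction behind this step can be spelled out on paper as follows (a sketch of the argument, not the Dafny proof itself).

Base case: \(S_0 = ∅ \subseteq S_1\).

Inductive case: for \(i \geq 1\), assume \(S_{i-1} \subseteq S_i\) and take \(t \in S_i\). By definition of \(S_i\), one of three cases applies:

- \(t \in \{\texttt{true}, \texttt{false}, 0\}\), so \(t \in S_{i+1}\) directly;
- \(t \in \{\texttt{succ}\;t_1, \texttt{pred}\;t_1, \texttt{is_zero}\;t_1\}\) with \(t_1 \in S_{i-1} \subseteq S_i\), so \(t \in S_{i+1}\);
- \(t = \texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3\) with \(t_1, t_2, t_3 \in S_{i-1} \subseteq S_i\), so \(t \in S_{i+1}\).

Hence \(S_i \subseteq S_{i+1}\) for every \(i\), and chaining these inclusions gives \(S_i \subseteq S_{i+1} \subseteq \dots \subseteq S_j\) whenever \(i \leq j\).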

## 1. Smallest inductive set contained in constructive set

After proving that intermediate sets form an increasing sequence, we want to prove that the smallest inductive set is contained in the constructive set. Because the smallest inductive set is the intersection of all sets that satisfy the induction criteria, it suffices to prove that the constructive set satisfies the induction criteria.

Note that I use the annotation `{:rlimit 4000}`, which tells Dafny that every assertion batch should verify using fewer than 4 million resource units (a unit provided by the underlying solver). This reduces the chances of proof variability during development.
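On paper, the reason the constructive set \(S = \bigcup_i S_i\) satisfies the induction criteria can be sketched as follows. Note where the cumulativity result of step 0 is needed.

- \(\{\texttt{true}, \texttt{false}, 0\} \subseteq S_1 \subseteq S\).
- If \(t_1 \in S\), then \(t_1 \in S_i\) for some \(i\), so \(\{\texttt{succ}\;t_1, \texttt{pred}\;t_1, \texttt{is_zero}\;t_1\} \subseteq S_{i+1} \subseteq S\).
- If \(t_1 \in S_i\), \(t_2 \in S_j\) and \(t_3 \in S_k\), let \(m = \max(i, j, k)\). By cumulativity, \(t_1, t_2, t_3 \in S_m\), so \(\texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3 \in S_{m+1} \subseteq S\).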

## 2. Intermediate constructive sets are included in every set that satisfies the induction criteria

Now we want to prove that every `S(i)` is included in every set that satisfies the induction criteria. That way, their union, the constructive set, will also be included in any set that satisfies the induction criteria. The proof works by remarking that every element of `S(i)` is built from elements of `S(i-1)`, so if those elements are in a set satisfying the induction criteria, then by induction so is the element built from them.
I intentionally detailed the proof so that you can understand it, but if you run it yourself, you might see that you can remove a lot of the proof and Dafny will still figure it out.
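Written out on paper, with \(X\) standing for any set that satisfies the induction criteria, the induction reads as follows (a sketch of the argument, not the Dafny proof itself).

Base case: \(S_0 = ∅ \subseteq X\).

Inductive case: assume \(S_i \subseteq X\) and take \(t \in S_{i+1}\). Then:

- either \(t \in \{\texttt{true}, \texttt{false}, 0\}\), and \(t \in X\) by the first criterion;
- or \(t \in \{\texttt{succ}\;t_1, \texttt{pred}\;t_1, \texttt{is_zero}\;t_1\}\) with \(t_1 \in S_i \subseteq X\), and \(t \in X\) by the second criterion;
- or \(t = \texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3\) with \(t_1, t_2, t_3 \in S_i \subseteq X\), and \(t \in X\) by the third criterion.

Hence \(S_i \subseteq X\) for every \(i\).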

## 3. The constructive set is included in the smallest inductive set that satisfies the induction criteria

We can deduce from the previous result that the constructive definition of all terms is also included in any set of terms that satisfies the induction criteria. From this we can deduce automatically that the constructive definition of all terms is included in the smallest inductive set satisfying the induction criteria.

## 4. Conclusion with the equality

Because we have `<=` and `>=` between these two sets, we can now prove equality.
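The final step only uses the general fact that two sets included in each other are equal. A minimal sketch, with a hypothetical lemma name and written generically over the element type, not tied to this post's definitions:

```dafny
// Mutual inclusion implies equality for possibly infinite sets.
// The empty body relies on Dafny's built-in set extensionality.
lemma EqualFromInclusions<T(==)>(a: iset<T>, b: iset<T>)
  requires a <= b
  requires b <= a
  ensures a == b
{
}
```

Instantiating `a` and `b` with the inductive and constructive sets, once both inclusions are established, yields the equality.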

## Bonus Conclusion

We were able to put together two definitions for infinite sets, and prove that these sets were equivalent. As stated in the introduction, having multiple definitions of a single infinite set makes it possible to pick the definition adequate to the job to prove other results. For example,

- If a term is in the constructive set, then it cannot be constructed with `Add`, for example, because it would need to be in some `S(i)` and none of the `S(i)` define `Add`. This can be illustrated in Dafny with a lemma stating that no term in `AllTermsConstructively` is an `Add`, which Dafny can verify pretty easily. However, if you put `AllTermsInductively` instead of `AllTermsConstructively`, Dafny would have a hard time figuring it out.
- If `x` is in the inductive set, then `Succ(x)` is in the inductive set as well. Dafny can figure it out by itself using the `AllTermsInductively` definition, but won’t be able to do it with `AllTermsConstructively` without a rigorous proof.

This could be useful for a rewriter or an optimizer to ensure the elements it writes are in the same set.
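The first property above could be sketched as follows, assuming Dafny 4 syntax. The lemma name, the `Term` field names, and the membership-style encoding of `S` are assumptions made for this illustration.

```dafny
module CannotBeAddSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  ghost function S(i: nat): iset<Term> {
    if i == 0 then iset{}
    else
      iset t: Term |
        t == True || t == False || t == Zero
        || ((t.Succ? || t.Pred? || t.IsZero?) && t.e in S(i - 1))
        || (t.If? && t.cond in S(i - 1) && t.thn in S(i - 1) && t.els in S(i - 1))
  }

  ghost const AllTermsConstructively: iset<Term> :=
    iset t: Term | exists i: nat :: t in S(i)

  lemma CannotBeAdd(t: Term)
    requires t in AllTermsConstructively
    ensures !t.Add?
  {
    // Pick a stage containing t. Unfolding S(i) once lists every
    // allowed shape, and Add is not among them.
    var i: nat :| t in S(i);
  }
}
```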

All that said, the machinery above can be heavyweight for regular Dafny users. In practice, you’re better off writing the inductive predicate explicitly as a function rather than an infinite set with a predicate, so that you get both inductive and constructive axioms that enable you to prove something similar to the two results above.
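As a hedged illustration of that advice, here is a recursive predicate for the same set of terms, assuming Dafny 4 syntax. The names `IsCoreTerm`, `CoreTermIsNotAdd` and `SuccOfCoreTerm` are hypothetical, as are the `Term` field names.

```dafny
module PredicateSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  // Holds exactly for terms built without Double and Add.
  predicate IsCoreTerm(t: Term) {
    match t {
      case True => true
      case False => true
      case Zero => true
      case Succ(e) => IsCoreTerm(e)
      case Pred(e) => IsCoreTerm(e)
      case IsZero(e) => IsCoreTerm(e)
      case If(c, a, b) => IsCoreTerm(c) && IsCoreTerm(a) && IsCoreTerm(b)
      case Double(_) => false
      case Add(_, _) => false
    }
  }

  // Counterpart of the first result: a core term is never an Add.
  lemma CoreTermIsNotAdd(t: Term)
    requires IsCoreTerm(t)
    ensures !t.Add?
  {
  }

  // Counterpart of the second result: closure under Succ.
  lemma SuccOfCoreTerm(t: Term)
    requires IsCoreTerm(t)
    ensures IsCoreTerm(Succ(t))
  {
  }
}
```

Both lemmas follow from unfolding the predicate once, so no axiom and no detour through infinite sets is involved.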

The example above illustrates what Dafny does best: it automates all the hard work under the hood so that you can focus on what is most interesting to you, and even better, it ensures you don’t need to define `{:axiom}` yourself in this case.
I hope you give Dafny a try, and I look forward to your interesting questions!