Parser Combinators in Dafny
Imagine you need to parse and format complex nested structures like this LISP factorial function:
(define (factorial n) (if (= n 0) 1 (* n (factorial (- n 1)))))
Most parsing approaches would require hundreds of lines of complex, error-prone code. Traditional parser generators
like ANTLR or Coco/R involve separate compilation phases with grammar files and lexer specifications. But with parser combinators, you
can build an elegant, working parser in just a few dozen lines. Even better - it compiles to JavaScript and runs in
your browser! Plus, it's available as a new Dafny standard library using --standard-libraries. Written in Dafny, these parser combinators have unique guarantees: they always terminate (even recursive ones) and never throw runtime exceptions. They're optimized enough for quick prototyping and can serve as specification language for more performance-critical implementations.
Interactive S-Expression Formatter
Try it yourself! Enter any S-expression below and watch it get parsed and beautifully formatted:
This formatter handles nested structures, proper indentation, and error reportingβtry adding an extra closing parenthesis!βall built from simple, composable pieces using just 23 lines of parser combinators and 239 lines of datatypes and helpers. How is this possible? Let's explore the building blocks.
The Building Blocks
Parser combinators are higher-order functions that build complex parsers from simple ones. Each parser returns a ParseResult with either the parsed value and remaining input, or an error. Let's see how they work:
import opened Std.Parsers.StringBuilders
Character Testing (CharTest) - Parsing Anger
Let's say we want to create a parser that parses one angry smiley. Without combinators, we'd write something like this:
const AngerParser := (input: string) =>
if |input| >= 1 && (input[0] == 'π ' || input[0] == 'π‘' || input[0] == 'π€¬' || input[0] == 'π€') then
ParseSuccess(input[0], input[1..])
else
ParseFailure(Recoverable, FailureData("expected Angry Smiley", input, None))
That's verbose and error-prone! The CharTest combinator makes this much cleaner by taking a predicate
function and handling all the boilerplate:
const AngerParser := CharTest(
c => c == 'π '
|| c == 'π‘'
|| c == 'π€¬'
|| c == 'π€',
"Angry Smiley")
Try it: Click an example or enter your own text:
Notice how the combinator approach also generates nice error messages automatically! Try the last two examples above - when parsing fails, you get clear feedback about what was expected. This is another advantage of combinators: they handle both success and failure cases elegantly.
Repetition (Rep) - Parsing Joy
The Rep combinator applies a parser zero or more times. Let's use it to parse sequences of joyful
characters:
const JoyParser := CharTest(
c => c == 'π'
|| c == 'π'
|| c == 'π'
|| c == 'π'
|| c == 'π₯³',
"joy").Rep()
Try it: Click an example or enter your own text:
Mapping (M) - Joy Score Calculator
The M combinator transforms the result of a parser. Let's create a joy score calculator that gives 2
points for each joyful character:
const JoyScoreParser := CharTest(
c => c == 'π'
|| c == 'π'
|| c == 'π'
|| c == 'π'
|| c == 'π₯³',
"joy").Rep().M(
joyString => |joyString| * 2
)
Try it: Click an example or enter your own text:
Atoms - The Building Blocks of S-Expressions
S-expressions are made of atoms (like factorial, +, 42) and
lists (like (+ 1 2)).
Let's start by parsing atoms - any non-empty sequence (Rep1) of characters that isn't a parenthesis or semicolon:
const AtomParser := CharTest(
c => c != '(' && c != ')' && c != ';'
&& c != ' ' && c != '\t' && c != '\n',
"atom character"
).Rep1()
Try it: Click an example or enter your own atoms:
Numbers vs Symbols - Choice in Action
The letter O (choice) combinator tries parsers in sequence until one succeeds. Let's use it to distinguish
between numbers and symbols:
const NumberOrSymbol := O([
CharTest(c => '0' <= c <= '9', "digit")
.Rep1()
.M(digits => "NUMBER:" + digits),
AtomParser.M(atom => "SYMBOL:" + atom)
])
Try it: See how the parser chooses between numbers and symbols:
Lists - Parsing S-Expression Structure
S-expressions use parentheses to create lists. Let's build a parser that recognizes the start of a function call - an opening parenthesis followed by a function name. The concatenation operators let us combine parsers:
I_I: Keeps both resultse_I: Keeps only the right result (discard left)I_e: Keeps only the left result (discard right)
The naming convention: I means "include" and e means "exclude". We'd prefer Scala-style
arrows like <~ and ~>, but Dafny identifiers can't start with underscores, so we use
this readable alternative.
const ConcatDemo_I_I := S("(").I_I(AtomParser).M(
(pair: (string, string)) =>
"BOTH: (" + pair.0 + ", " + pair.1 + ")"
)
const ConcatDemo_e_I := S("(").e_I(AtomParser).M(
name => "RIGHT: " + name
)
const ConcatDemo_I_e := S("(").I_e(AtomParser).M(
paren => "LEFT: " + paren
)
Try it: Choose a concatenation operator and see how it affects the result:
Whitespace - The Invisible Glue
S-expressions need whitespace to
separate atoms. The built-in WS parser handles spaces (including unicode spaces), tabs, and newlines.
Let's see how it works with our atoms:
const AtomWithSpaces := AtomParser.I_e(WS)
Try it: See how whitespace gets consumed after parsing atoms:
Recursion (Rec) - Balanced Payments
The Rec combinator makes it possible to create recursive parsers - parsers that can call themselves. Instead of the usual balanced parentheses parsing example, let's create a
fun example: parsing balanced payments where you will get exactly one apple A after each coin $ you find, but before
you buy an apple, you can have other transactions. You can also have multiple sequential transactions, but at the end you need to spend all your coins.
const BalancedPayment: B<string> := Rec((transaction: B<string>) =>
O([
S("$").e_I(transaction).I_e(S("A")).M(
(transaction: string) =>
"COIN " + transaction + "APPLE! "
).Rep().M((transactions: seq<string>) =>
Std.Collections.Seq.Flatten(transactions)
),
S("")
])).End()
Try it: Test balanced and unbalanced payment patterns:
Notice how Rec allows the parser to call itself! The BalancedPayment parser can contain another
BalancedPayment, creating nested structures. This is exactly how parentheses work in programming languages - each
opening parenthesis must have a matching closing one.
Important: Since Dafny programs must terminate, Rec includes built-in protection against
infinite recursion. If a recursive parser made no progress (doesn't consume any input), Dafny will return a parse error.
For parsers that might hit stack limits, Dafny also
provides RecNoStack - see the SmtParser
example for usage.
Building the Complete S-Expression Formatter
From parsing angry emojis to balanced payments, you've seen how simple combinators compose into powerful parsers. The S-expression formatter combines these same building blocks.
The Data Structure
Our S-expression parser uses a data structure that handles comments as first-class citizens:
datatype SExpr =
| Atom(name: string)
| List(items: seq<SExpr>)
| Comment(comment: string, underlyingNode: SExpr)
The Comment variant wraps around other expressions, preserving the logical structure while keeping
comments attached to their relevant code.
The Core Parser
The actual parser that powers the formatter above is surprisingly concise - just 20 lines! Here's the complete parser combinators code:
const noParensNoSpace :=
CharTest((c: char) =>
c != '(' && c != ')' && c != ' ' && c != '\t'
&& c != '\n' && c != '\r', "atom character").Rep1()
const notNewline :=
CharTest((c: char) => c != '\n', "anything except newline")
const commentText: B<string> :=
S(";").e_I(notNewline.Rep()).I_e(O([S("\n"), EOS.M(x => "")]))
const parserSExpr: B<SExpr> :=
Rec(
(SExpr: B<SExpr>) =>
O([ commentText.I_e(WS).I_I(SExpr).M(
(commentAndExpr: (string, SExpr)) =>
Comment(commentAndExpr.0, commentAndExpr.1)),
S("(").e_I(WS).Then(
(r: string) => SExpr.I_e(WS).Rep().I_e(S(")")).I_e(WS)
).M((r: seq<SExpr>) => List(r)),
noParensNoSpace.M((r: string) => Atom(r)).I_e(WS)]))
const p: B<SExpr> :=
parserSExpr.I_e(WS).End()
const topLevelParser: B<TopLevelExpr> :=
WS.e_I(parserSExpr.I_e(WS).Rep()).I_e(WS).End().M(
(items: seq<SExpr>) => TopLevel(items))
You can see the Rec combinator in action here - just like in the balanced payment example, it allows the
parser to call itself to handle arbitrarily nested structures.
Syntactic Sugar
The formatter generates syntactic sugar for common Lisp constructs (like define β
function, if β if-then-else, infix operators) for better readability. The full
implementation is in the source code.
No tokenization step - the string flows directly through the parser combinators, with each one consuming characters and building up the final syntax tree. If that is scary, note that you can always create a parser to tokenize before creating a parser that builds the tree out of the tokens. Note also that B3 is a project that successfully used these parser combinators and does not have a lexer.
Conclusion - The Power of Composition
From basic character tests like CharTest(c => c == 'π ', "Angry Smiley"), we built a complete S-expression
parser using simple combinators:
Rep()for repetitionM()for transformationO([])for choiceI_I,e_I,I_efor concatenationRec()for recursion
207 lines of verified, composable code that parses and emits beautiful syntactic sugar. This is parser combinators: turning parsing from an arcane art into a compositional science.
Best of all, Dafny compiles to JavaScript, C#, Go, Python and Java - so your parser combinators work across platforms with the same verified code.
Start with CharTest, add some combinators, and see where composition takes you. Explore more in the Dafny
Standard Libraries.
Need to debug your parser combinators? Check out the debugging guide for tips on troubleshooting parser issues and understanding failure modes.