Parser Combinators in Dafny

Imagine you need to parse and format complex nested structures like this LISP factorial function:

(define (factorial n) (if (= n 0) 1 (* n (factorial (- n 1)))))

Most parsing approaches would require hundreds of lines of complex, error-prone code. Traditional parser generators like ANTLR or Coco/R involve separate compilation phases with grammar files and lexer specifications. But with parser combinators, you can build an elegant, working parser in just a few dozen lines. Even better - it compiles to JavaScript and runs in your browser! Plus, it's available as a new Dafny standard library using --standard-libraries. Written in Dafny, these parser combinators have unique guarantees: they always terminate (even recursive ones) and never throw runtime exceptions. They're optimized enough for quick prototyping and can serve as specification language for more performance-critical implementations.

Interactive S-Expression Formatter

Try it yourself! Enter any S-expression below and watch it get parsed and beautifully formatted:

Formatted output will appear here

This formatter handles nested structures, proper indentation, and error reporting—try adding an extra closing parenthesis!—all built from simple, composable pieces using just 23 lines of parser combinators and 239 lines of datatypes and helpers. How is this possible? Let's explore the building blocks.

The Building Blocks

Parser combinators are higher-order functions that build complex parsers from simple ones. Each parser returns a ParseResult with either the parsed value and remaining input, or an error. Let's see how they work:

import opened Std.Parsers.StringBuilders

Character Testing (`CharTest`) - Parsing Anger

Let's say we want to create a parser that parses one angry smiley. Without combinators, we'd write something like this:

const AngerParser := (input: string) =>
  if |input| >= 1 && (input[0] == '😠' || input[0] == '😡' || input[0] == '🤬' || input[0] == '😤') then
    ParseSuccess(input[0], input[1..])
  else
    ParseFailure(Recoverable, FailureData("expected Angry Smiley", input, None))

That's verbose and error-prone! The CharTest combinator makes this much cleaner by taking a predicate function and handling all the boilerplate:

const AngerParser := CharTest(
  c => c == '😠'
    || c == '😡'
    || c == '🤬'
    || c == '😤',
  "Angry Smiley")

Try it: Click an example or enter your own text:

Parsed:

Remaining:

Notice how the combinator approach also generates nice error messages automatically! Try the last two examples above - when parsing fails, you get clear feedback about what was expected. This is another advantage of combinators: they handle both success and failure cases elegantly.

Repetition (Rep) - Parsing Joy

The Rep combinator applies a parser zero or more times. Let's use it to parse sequences of joyful characters:

const JoyParser := CharTest(
  c => c == '😀'
    || c == '😃'
    || c == '😄'
    || c == '😁'
    || c == '🥳',
  "joy").Rep()

Try it: Click an example or enter your own text:

Parsed:

Remaining:

Mapping (M) - Joy Score Calculator

The M combinator transforms the result of a parser. Let's create a joy score calculator that gives 2 points for each joyful character:

const JoyScoreParser := CharTest(
  c => c == '😀'
    || c == '😃'
    || c == '😄'
    || c == '😁'
    || c == '🥳',
  "joy").Rep().M(
    joyString => |joyString| * 2
  )

Try it: Click an example or enter your own text:

Parsed:

Remaining:

Atoms - The Building Blocks of S-Expressions

S-expressions are made of atoms (like factorial, +, 42) and lists (like (+ 1 2)). Let's start by parsing atoms - any non-empty sequence (Rep1) of characters that isn't a parenthesis or semicolon:

const AtomParser := CharTest(
  c => c != '(' && c != ')' && c != ';' 
    && c != ' ' && c != '\t' && c != '\n',
  "atom character"
).Rep1()

Try it: Click an example or enter your own atoms:

Parsed:

Remaining:

Numbers vs Symbols - Choice in Action

The letter O (choice) combinator tries parsers in sequence until one succeeds. Let's use it to distinguish between numbers and symbols:

const NumberOrSymbol := O([
  CharTest(c => '0' <= c <= '9', "digit")
    .Rep1()
    .M(digits => "NUMBER:" + digits),
  AtomParser.M(atom => "SYMBOL:" + atom)
])

Try it: See how the parser chooses between numbers and symbols:

Parsed:

Remaining:

Lists - Parsing S-Expression Structure

S-expressions use parentheses to create lists. Let's build a parser that recognizes the start of a function call - an opening parenthesis followed by a function name. The concatenation operators let us combine parsers:

I_I: Keeps both results
e_I: Keeps only the right result (discard left)
I_e: Keeps only the left result (discard right)

The naming convention: I means "include" and e means "exclude". We'd prefer Scala-style arrows like <~ and ~>, but Dafny identifiers can't start with underscores, so we use this readable alternative.

const ConcatDemo_I_I := S("(").I_I(AtomParser).M(
  (pair: (string, string)) => 
    "BOTH: (" + pair.0 + ", " + pair.1 + ")"
)
const ConcatDemo_e_I := S("(").e_I(AtomParser).M(
  name => "RIGHT: " + name
)
const ConcatDemo_I_e := S("(").I_e(AtomParser).M(
  paren => "LEFT: " + paren
)

Try it: Choose a concatenation operator and see how it affects the result:

Concatenation Operator:

Parsed:

Remaining:

Whitespace - The Invisible Glue

S-expressions need whitespace to separate atoms. The built-in WS parser handles spaces (including unicode spaces), tabs, and newlines. Let's see how it works with our atoms:

const AtomWithSpaces := AtomParser.I_e(WS)

Try it: See how whitespace gets consumed after parsing atoms:

Parsed:

Remaining:

Recursion (Rec) - Balanced Payments

The Rec combinator makes it possible to create recursive parsers - parsers that can call themselves. Instead of the usual balanced parentheses parsing example, let's create a fun example: parsing balanced payments where you will get exactly one apple A after each coin $ you find, but before you buy an apple, you can have other transactions. You can also have multiple sequential transactions, but at the end you need to spend all your coins.

const BalancedPayment: B<string> := Rec((transaction: B<string>) =>
  O([
    S("$").e_I(transaction).I_e(S("A")).M(
      (transaction: string) => 
        "COIN " + transaction + "APPLE! "
    ).Rep().M((transactions: seq<string>) =>
       Std.Collections.Seq.Flatten(transactions)
    ),
    S("")
  ])).End()

Try it: Test balanced and unbalanced payment patterns:

Parsed:

Remaining:

Notice how Rec allows the parser to call itself! The BalancedPayment parser can contain another BalancedPayment, creating nested structures. This is exactly how parentheses work in programming languages - each opening parenthesis must have a matching closing one.

Important: Since Dafny programs must terminate, Rec includes built-in protection against infinite recursion. If a recursive parser made no progress (doesn't consume any input), Dafny will return a parse error.

For parsers that might hit stack limits, Dafny also provides RecNoStack - see the SmtParser example for usage.

Building the Complete S-Expression Formatter

From parsing angry emojis to balanced payments, you've seen how simple combinators compose into powerful parsers. The S-expression formatter combines these same building blocks.

The Data Structure

Our S-expression parser uses a data structure that handles comments as first-class citizens:

datatype SExpr =
  | Atom(name: string)
  | List(items: seq<SExpr>)
  | Comment(comment: string, underlyingNode: SExpr)

The Comment variant wraps around other expressions, preserving the logical structure while keeping comments attached to their relevant code.

The Core Parser

The actual parser that powers the formatter above is surprisingly concise - just 20 lines! Here's the complete parser combinators code:

  const noParensNoSpace :=
    CharTest((c: char) =>
      c != '(' && c != ')' && c != ' ' && c != '\t'
      && c != '\n' && c != '\r', "atom character").Rep1()

  const notNewline :=
    CharTest((c: char) => c != '\n', "anything except newline")

  const commentText: B<string> :=
    S(";").e_I(notNewline.Rep()).I_e(O([S("\n"), EOS.M(x => "")]))

  const parserSExpr: B<SExpr> :=
    Rec(
      (SExpr: B<SExpr>) =>
        O([ commentText.I_e(WS).I_I(SExpr).M(
              (commentAndExpr: (string, SExpr)) =>
                Comment(commentAndExpr.0, commentAndExpr.1)),
            S("(").e_I(WS).Then(
              (r: string) => SExpr.I_e(WS).Rep().I_e(S(")")).I_e(WS)
            ).M((r: seq<SExpr>) => List(r)),
            noParensNoSpace.M((r: string) => Atom(r)).I_e(WS)]))

  const p: B<SExpr> :=
    parserSExpr.I_e(WS).End()

  const topLevelParser: B<TopLevelExpr> :=
    WS.e_I(parserSExpr.I_e(WS).Rep()).I_e(WS).End().M(
      (items: seq<SExpr>) => TopLevel(items))

You can see the Rec combinator in action here - just like in the balanced payment example, it allows the parser to call itself to handle arbitrarily nested structures.

Syntactic Sugar

The formatter generates syntactic sugar for common Lisp constructs (like define → function, if → if-then-else, infix operators) for better readability. The full implementation is in the source code.

No tokenization step - the string flows directly through the parser combinators, with each one consuming characters and building up the final syntax tree. If that is scary, note that you can always create a parser to tokenize before creating a parser that builds the tree out of the tokens. Note also that B3 is a project that successfully used these parser combinators and does not have a lexer.

Conclusion - The Power of Composition

From basic character tests like CharTest(c => c == '😠', "Angry Smiley"), we built a complete S-expression parser using simple combinators:

Rep() for repetition
M() for transformation
O([]) for choice
I_I, e_I, I_e for concatenation
Rec() for recursion

207 lines of verified, composable code that parses and emits beautiful syntactic sugar. This is parser combinators: turning parsing from an arcane art into a compositional science.

Best of all, Dafny compiles to JavaScript, C#, Go, Python and Java - so your parser combinators work across platforms with the same verified code.

Start with CharTest, add some combinators, and see where composition takes you. Explore more in the Dafny Standard Libraries.

Need to debug your parser combinators? Check out the debugging guide for tips on troubleshooting parser issues and understanding failure modes.

Interactive S-Expression Formatter

The Building Blocks

Character Testing (CharTest) - Parsing Anger

Repetition (Rep) - Parsing Joy

Mapping (M) - Joy Score Calculator

Atoms - The Building Blocks of S-Expressions

Numbers vs Symbols - Choice in Action

Lists - Parsing S-Expression Structure

Whitespace - The Invisible Glue

Recursion (Rec) - Balanced Payments

Building the Complete S-Expression Formatter

The Data Structure

The Core Parser

Syntactic Sugar

Conclusion - The Power of Composition

Character Testing (`CharTest`) - Parsing Anger