Tree-Sitter BNF Tool, version 0.2.0

A few weeks ago I wrote about Writing Tree-sitter grammars in BNF, introducing a new toy project, a utility to write grammars in a clean BNF notation and convert them to the Tree-Sitter JavaScript syntax. The project repository is on GitHub: ambs/tree-sitter-bnf-tools.

Version 0.2.0 is out now. I introduced a couple of new sub-commands to make the tool more useful.

Breaking change: convert is now explicit

In the first version, ts-bnf-tool <file> implicitly converted the file from BNF notation to the JS syntax. Since I wanted to introduce new sub-commands, that default is gone. You now have to say which sub-command you want, and the old default behavior is now available as the convert sub-command.

# before (v0.1.0)
ts-bnf-tool grammar.bnf

# after (v0.2.0)
ts-bnf-tool convert grammar.bnf
Bash

This sub-command got several additions:

Strict mode

With the --strict flag, convert exits with a non-zero status when any warning is present. The output is still written before exiting, so you can inspect it. The flag conflicts with --no-check, which, as the name suggests, skips all checks.

ts-bnf-tool convert --strict grammar.bnf
Bash

Source mapping comments.

The generated grammar now includes comments above each rule, with the filename and line, mapping the JavaScript back to the originating BNF line. Handy when debugging tree-sitter error messages:

// grammar.bnf:5
expression: $ => choice(
  $.binary_expr,
  $.unary_expr,
),
JavaScript

Omit them with the --rules-only flag.

Header comment.

Every generated file now starts with a comment giving the name of the source file, the version of the Tree-Sitter BNF tool, and a note stating that the file is generated.

// Generated by ts-bnf-tool v0.2.0 from grammar.bnf — do not edit by hand.
JavaScript

Use the flag --no-header to suppress it.

Grammar name validation.

Until now, the grammar name was inferred from the BNF filename, with no check that the inferred name was valid. For example, a filename can start with a digit, but a grammar name cannot. Now, if the inferred grammar name is not a valid JavaScript identifier (e.g. my-grammar.bnf → name my-grammar), convert warns you. Pass --name to override with a custom name, --strict to treat the warning as fatal, or --no-check to suppress it.
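
To make the check concrete, here is a minimal sketch of what such a validation could look like. It is not the tool's code: the function names are hypothetical, and it assumes the name is the file's base name without the .bnf extension and that only ASCII identifiers are accepted.

```javascript
// Hypothetical helpers, not part of ts-bnf-tool: they only illustrate the check.

// Assumption: the grammar name is the base name without the .bnf extension.
function inferGrammarName(path) {
  const base = path.split(/[\\/]/).pop();
  return base.replace(/\.bnf$/, "");
}

// Assumption: only ASCII identifiers pass (letters, digits, _ and $, not
// starting with a digit). Reserved words are not considered in this sketch.
function isValidGrammarName(name) {
  return /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name);
}

for (const file of ["my-grammar.bnf", "2fast.bnf", "my_grammar.bnf"]) {
  const name = inferGrammarName(file);
  const verdict = isValidGrammarName(name) ? "ok" : "warning";
  console.log(`${file}: name ${name} (${verdict})`);
}
```

Both the hyphenated name and the one starting with a digit fail this test, which is the situation where --name is useful.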

format: canonical style for BNF files

The new format sub-command pretty-prints a BNF file in a consistent style: uniform spacing around the operators, and one alternative per line when a rule exceeds 80 characters. The directives are emitted first, at the top of the file, in a standard order.

# preview the result
ts-bnf-tool format grammar.bnf

# rewrite in place (atomic write, safe to use in scripts)
ts-bnf-tool format --in-place grammar.bnf

# exit non-zero if the file is not already formatted (for CI)
ts-bnf-tool format --check grammar.bnf
Bash

For now, --strip-comments is the default behavior. This means that comments are removed from the output and, if you use the --check flag, the comparison ignores comments, so a well-formatted file with comments still passes.

The --check flag is especially useful to enforce style in CI:

  - run: ts-bnf-tool format --check grammar.bnf
YAML

check: static analysis for grammar correctness

The check sub-command runs all static checks and exits non-zero if anything is wrong — exit code 2 on errors, 1 on warnings only, 0 when clean. It is the right command to add to a CI pipeline if you want a grammar-correctness gate separate from building the parser.

ts-bnf-tool check grammar.bnf
Bash
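
The exit-code convention can be sketched as a small function. The diagnostic object shape and the name exitCodeFor are hypothetical, chosen only for illustration; this is not how the tool is necessarily implemented.

```javascript
// Hypothetical diagnostic shape: { severity: "error" | "warning", message }.
// Errors take priority over warnings when choosing the exit code.
function exitCodeFor(diagnostics) {
  if (diagnostics.some((d) => d.severity === "error")) return 2;
  if (diagnostics.some((d) => d.severity === "warning")) return 1;
  return 0;
}

const diagnostics = [
  { severity: "warning", message: "rule 'unused_helper' is never referenced" },
  { severity: "error", message: "rule 'expr' is directly left-recursive" },
];
console.log(exitCodeFor(diagnostics));
```

A CI job that tolerates warnings can accept exit codes 0 and 1 and fail only on 2.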

Three categories of issues are caught.

  • Left-recursion detection. Tree-sitter cannot generate a parser for left-recursive grammars. check flags both direct and mutual left-recursion as errors:
      error: rule 'expr' is directly left-recursive (line 3)
    This was previously a warning on convert; it is now an error that aborts conversion unless you pass --no-check.
  • Unreachable rules. Rules that are never referenced by any other rule or directive are reported as warnings. The root rule (the first rule in the file) is always exempt, as are rules listed in %extras:
      warning: rule 'unused_helper' is never referenced (line 42)
  • Duplicate rule names. Defining the same rule name more than once is a likely mistake. check (and convert) now warn about it; the second definition wins:
      warning: rule 'expression' is defined more than once (line 17)
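
For readers curious about how direct and mutual left-recursion can be told apart, here is a minimal sketch. It is not the tool's implementation: the grammar representation and the function name are assumptions, and it is deliberately simplified to look only at the first symbol of each alternative, ignoring rules that can derive the empty string.

```javascript
// Assumed representation: rule name -> list of alternatives, each a list of
// symbols. A symbol that names a rule is a reference; anything else is a terminal.
function leftRecursiveRules(grammar) {
  const names = new Set(Object.keys(grammar));

  // Edge A -> B when some alternative of A starts with rule B.
  const leftEdges = {};
  for (const [rule, alternatives] of Object.entries(grammar)) {
    leftEdges[rule] = new Set(
      alternatives.map((alt) => alt[0]).filter((sym) => names.has(sym))
    );
  }

  const result = {};
  for (const rule of names) {
    if (leftEdges[rule].has(rule)) {
      result[rule] = "direct";
      continue;
    }
    // Depth-first search along leftmost references, looking for a path back.
    const seen = new Set();
    const stack = [...leftEdges[rule]];
    while (stack.length > 0) {
      const current = stack.pop();
      if (current === rule) {
        result[rule] = "mutual";
        break;
      }
      if (seen.has(current)) continue;
      seen.add(current);
      stack.push(...leftEdges[current]);
    }
  }
  return result;
}

const grammar = {
  expr: [["expr", "+", "term"], ["term"]],
  term: [["number"]],
  a: [["b", "x"]],
  b: [["a", "y"], ["z"]],
};
console.log(leftRecursiveRules(grammar));
```

In this example expr refers to itself in leftmost position, while a and b only reach themselves through each other.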

All diagnostics include a source line number, which was missing in 0.1.0.
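
The unreferenced-rule and duplicate-name checks are simple enough to sketch in a few lines. The input shape below (rules in file order, each with a name, a line, and the rule names its body uses) and the function name are assumptions made for this example, as is the choice to report a duplicate at the line of the later definition.

```javascript
// Assumed input: rules in file order, each { name, line, refs }, where refs
// lists the rule names used in the body; extras holds the names from %extras.
function checkRules(rules, extras = []) {
  const warnings = [];

  // Duplicate names: report every definition after the first one.
  const defined = new Set();
  for (const rule of rules) {
    if (defined.has(rule.name)) {
      warnings.push(
        `warning: rule '${rule.name}' is defined more than once (line ${rule.line})`
      );
    }
    defined.add(rule.name);
  }

  // A rule counts as referenced when another rule, or %extras, mentions it.
  const referenced = new Set(extras);
  for (const rule of rules) {
    for (const ref of rule.refs) {
      if (ref !== rule.name) referenced.add(ref);
    }
  }

  // The first rule is the root and is always exempt.
  const root = rules.length > 0 ? rules[0].name : null;
  const reported = new Set();
  for (const rule of rules) {
    if (rule.name === root || referenced.has(rule.name) || reported.has(rule.name)) {
      continue;
    }
    reported.add(rule.name);
    warnings.push(
      `warning: rule '${rule.name}' is never referenced (line ${rule.line})`
    );
  }
  return warnings;
}

const rules = [
  { name: "program", line: 1, refs: ["expression"] },
  { name: "expression", line: 3, refs: ["expression", "number"] },
  { name: "number", line: 7, refs: [] },
  { name: "comment", line: 9, refs: [] },
  { name: "expression", line: 17, refs: ["number"] },
  { name: "unused_helper", line: 42, refs: ["number"] },
];
for (const warning of checkRules(rules, ["comment"])) console.log(warning);
```

Note that a self-reference does not count here: a rule has to be mentioned by a different rule or by %extras to be considered used.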

Editor support

On top of the already existing syntax highlighting, there are two additions for those who like to pimp their editors:

  • folds.scm marks each rule node as foldable, so editors with tree-sitter fold support can collapse individual rules to navigate large grammars.
  • docs/editors.md is a step-by-step setup guide for Neovim (nvim-treesitter) and Helix, covering parser installation, query file placement, and filetype registration.

firsts sub-command

ts-bnf-tool firsts grammar.bnf prints the FIRST set of each rule — the terminals that can appear as the first token of any string the rule derives. It is a grammar-analysis tool aimed at debugging ambiguities and understanding how tree-sitter will tokenise your language.
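
If FIRST sets are new to you, the textbook fixed-point computation fits in a short function. This is a sketch of the general algorithm, not the tool's code: the grammar representation and the function name are assumptions, and an empty alternative stands for the empty string.

```javascript
// Assumed representation: rule name -> alternatives, each a list of symbols.
// A symbol that names a rule is a reference; anything else is a terminal.
function firstSets(grammar) {
  const names = Object.keys(grammar);
  const isRule = (sym) => names.includes(sym);
  const first = Object.fromEntries(names.map((n) => [n, new Set()]));
  const nullable = new Set(); // rules that can derive the empty string

  // Repeat until no set grows any more (a fixed point).
  let changed = true;
  while (changed) {
    changed = false;
    for (const name of names) {
      for (const alt of grammar[name]) {
        let allNullable = true;
        for (const sym of alt) {
          const incoming = isRule(sym) ? first[sym] : [sym];
          for (const terminal of incoming) {
            if (!first[name].has(terminal)) {
              first[name].add(terminal);
              changed = true;
            }
          }
          // Stop at the first symbol that cannot derive the empty string.
          if (!(isRule(sym) && nullable.has(sym))) {
            allNullable = false;
            break;
          }
        }
        if (allNullable && !nullable.has(name)) {
          nullable.add(name);
          changed = true;
        }
      }
    }
  }
  return first;
}

const exampleGrammar = {
  expr: [["term", "+", "expr"], ["term"]],
  term: [["sign", "number"], ["(", "expr", ")"]],
  sign: [["-"], []],
};
for (const [rule, set] of Object.entries(firstSets(exampleGrammar))) {
  console.log(`${rule}: ${[...set].sort().join(" ")}`);
}
```

Because sign can be empty, number ends up in the FIRST set of term and expr alongside the minus sign and the opening parenthesis.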
