Tree-Sitter-BNF: All Bases Covered

I know I have been posting a lot about this tool over the last few weeks, but I am having fun developing it, and I hope it proves useful to someone. Version 0.4.0 of tree-sitter-bnf-tools is out, and the headline this time is completeness: the BNF dialect now covers the full set of tree-sitter grammar-level directives, and check has grown substantially smarter at catching problems before tree-sitter generate gets a chance to. That shortens the cycle of writing a grammar, testing it, finding errors, and going back to rewrite the grammar.

The Missing Directives

Tree-sitter grammars have a handful of top-level fields — word, externals, precedences, reserved — that previously had no BNF equivalent. They’re all there now:

  • %word ruleName declares the identifier token for keyword extraction and better error recovery.
  • %externals name1, name2 lists tokens defined by an external C scanner instead of the grammar.
  • %precedences [a, b], [c, d] declares named precedence groups in descending priority order.
  • %reserved setName: [r1, r2] defines named reserved-word sets, with per-occurrence overrides via (body %reserved setName) in rule bodies.
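
For readers who know grammar.js better than the BNF dialect, these are the tree-sitter fields the four directives correspond to. The sketch below is hand-written, not the tool's actual output: the grammar name and every rule name (identifier, heredoc_body, and so on) are made up for illustration, and in a real grammar.js the object would be passed to tree-sitter's grammar() function and exported.

```javascript
// Hand-written sketch of the grammar.js fields behind the four directives.
// All names are hypothetical. In a real grammar.js this object is passed to
// tree-sitter's grammar() and assigned to module.exports.
const grammarFields = {
  name: 'demo',

  // %word: the token used for keyword extraction.
  word: $ => $.identifier,

  // %externals: tokens produced by an external C scanner.
  externals: $ => [$.heredoc_body, $.string_content],

  // %precedences: each inner list runs from highest to lowest precedence.
  precedences: $ => [
    ['unary', 'binary'],
    ['call', 'member'],
  ],

  // %reserved: one entry per named reserved-word set.
  reserved: {
    global: $ => ['if', 'else'],
    property: $ => [],
  },

  rules: {
    expr: $ => choice($.identifier, $.member),
    // A per-occurrence override, (body %reserved setName) in the BNF,
    // corresponds to reserved('setName', ...) in the tree-sitter DSL.
    member: $ => seq($.expr, '.', reserved('property', $.identifier)),
    identifier: $ => /[a-z_]+/,
  },
};
```

The rule bodies are only evaluated when tree-sitter loads the grammar, which is why choice, seq, and reserved can be referenced here without being imported.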

Named precedence levels can now be used in %prec: write %prec 'unary' and it emits prec('unary', …) in grammar.js, as long as the name is declared in some %precedences group. And for completeness, negative integers (%prec -1) and regex flag suffixes (/pattern/i) are now valid syntax too.
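
On the grammar.js side, the named form looks roughly like this. It is a hand-written sketch with invented rule names, not output copied from the tool, and the mapping shown for %prec -1 and /pattern/i is what I would expect rather than something verified against the generated file. prec, seq, and choice are the standard tree-sitter DSL functions, which exist as globals only when the file is loaded by the tree-sitter CLI.

```javascript
// Hypothetical fragment of a grammar.js; every rule name is made up.
const precFields = {
  // A name must be declared here before prec('name', ...) may use it.
  precedences: $ => [['unary', 'binary']],

  rules: {
    expr: $ => choice($.unary_expr, $.binary_expr, $.number, $.fallback),

    // %prec 'unary' corresponds to prec('unary', ...).
    unary_expr: $ => prec('unary', seq('-', $.expr)),
    binary_expr: $ => prec('binary', seq($.expr, '+', $.expr)),

    // %prec -1 would correspond to a negative numeric precedence.
    fallback: $ => prec(-1, $.name),

    // /pattern/i would correspond to a regex literal with the i flag.
    name: $ => /[a-z]+/i,
    number: $ => /\d+/,
  },
};
```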

Check Yourself Before You Wreck Yourself

check is smarter now. A whole class of issues that used to slip through check and blow up later at tree-sitter generate time now fails early:

  • Undefined rule references in %conflicts, %inline, %supertypes, %externals, and %reserved are now errors, not warnings.
  • A hidden start rule (via _ prefix or listed in %supertypes) is caught immediately.
  • Invalid %inline targets — the start rule, an external token, or a pure token body — are all reported.
  • %word’s target must be a pure token with a unique body; both violations are now caught.
  • A %supertypes rule with a pure-token body or multi-step alternatives is rejected.
  • A name declared in both %externals and a rule body is an error.
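
To make the first two bullets concrete, here is a small sketch of what such checks amount to. It is not the tool's implementation: the in-memory grammar shape (a rules map plus one array of names per directive) and the function name checkGrammar are assumptions made for this example, and how references inside %externals are resolved is not modelled.

```javascript
// Illustrative only: a toy version of two of the checks listed above.
// Hypothetical input shape:
//   { start: 'name', rules: { name: body }, directives: { inline: [...], ... } }
function checkGrammar(grammar) {
  const errors = [];
  const defined = new Set(Object.keys(grammar.rules));
  const externals = new Set(grammar.directives.externals ?? []);

  // Every name listed in these directives must be a rule or an external token.
  for (const directive of ['conflicts', 'inline', 'supertypes', 'reserved']) {
    for (const name of grammar.directives[directive] ?? []) {
      if (!defined.has(name) && !externals.has(name)) {
        errors.push(`%${directive}: undefined rule '${name}'`);
      }
    }
  }

  // The start rule is hidden if it has a _ prefix or is listed in %supertypes.
  const supertypes = grammar.directives.supertypes ?? [];
  if (grammar.start.startsWith('_') || supertypes.includes(grammar.start)) {
    errors.push(`start rule '${grammar.start}' is hidden`);
  }

  return errors;
}

// Usage: one undefined %inline target and a hidden start rule.
const errors = checkGrammar({
  start: '_program',
  rules: { _program: '...', expr: '...' },
  directives: { inline: ['expr', 'stmt'], supertypes: ['expr'] },
});
console.log(errors);
```

Collecting every problem into a list instead of stopping at the first one matches the spirit of reporting all issues in a single run.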

Error messages also got better: syntax errors now report file, line, column, and a source snippet for every problem in the file (up to 10), instead of a bare SyntaxError.

One More Thing

convert --generate now writes a minimal tree-sitter.json to the output directory when one doesn’t exist. This satisfies tree-sitter ≥ 0.25’s ABI 15 requirement and silences the fallback-to-ABI-14 warning without any extra steps.

Documentation

If you are not sure what all this is about, we've got you covered as well. The tutorial now includes a section dedicated to parser and tree-sitter concepts (LR parsing, shift-reduce conflicts, GLR, keyword extraction, external scanners), and we also included a worked example, where you can see the tool used to build a real grammar from scratch.

Install or upgrade via Cargo:

cargo install ts-bnf-tool
