I will get back to Rust to investigate more Tree-Sitter features, like cursors. But for my job, I might need to use Tree-Sitter from C#, so I decided to write a post on how to do that, this time on Windows. In the future, maybe I can look into how to do the same with dotnet on Linux. This post assumes you have access to a Microsoft Visual Studio installation with a proper C compiler.
The first thing we need to remember is the way Tree-Sitter works. As we have seen from using a tree-sitter parser with C or Rust, there is a need to create and compile a library that is later used for parsing purposes. In the case of C, given that the parser was also written in C, we could compile everything together. In the case of Rust, we created a crate that hides the C library and allows us to write the parser in a simple Rust-only crate that depends on the parser crate. For C#, we will need to create a DLL, and then we will be able to use it from our C# application.
Creating a Parser DLL
This tutorial is heavily based on the code available in the csharp-tree-sitter GitHub repository. Although that repository has everything that is needed, it may not be clear to everyone. So, we will go step by step, creating a DLL for our BNF parser.
To start with, create a folder where we will build the DLL. I called it ts. Inside, put the folder with our tree-sitter-bnf code. Just copy it from our previous experiments. Also, create another folder with a checkout of the tree-sitter repository:
git clone https://github.com/tree-sitter/tree-sitter.git
Now, in the root of the ts folder, add the file tree-sitter.def, obtained from the csharp-tree-sitter repository. This module-definition file lists the functions the Tree-Sitter DLL exports. Inside PowerShell you can download that file with:
Invoke-WebRequest https://raw.githubusercontent.com/tree-sitter/csharp-tree-sitter/refs/heads/main/tree-sitter/tree-sitter.def -Outfile tree-sitter.def
We also need a similar file for our BNF library. Create the file tree-sitter-bnf.def with the following content:
LIBRARY TREE-SITTER-BNF
EXPORTS
tree_sitter_bnf
Now, let's add the Makefile that will be used to compile the libraries:
CFLAGS=/nologo /FC /Od /Z7 /Gy /diagnostics:column /Itree-sitter/lib/include
LFLAGS=/def:$(@B).def /incremental:no /debug /MACHINE:X64
BIN=dlls
DLLS=$(BIN)/tree-sitter.dll $(BIN)/tree-sitter-bnf.dll

all: dir $(DLLS)

dir:
    @if not exist $(BIN)\nul mkdir $(BIN)

clean:
    @if exist $(BIN)\nul del /f /q $(BIN)\*

## Tree-Sitter DLL
$(BIN)/tree-sitter.obj: \
    tree-sitter/lib/src/alloc.c \
    tree-sitter/lib/src/get_changed_ranges.c \
    tree-sitter/lib/src/language.c \
    tree-sitter/lib/src/lexer.c \
    tree-sitter/lib/src/lib.c \
    tree-sitter/lib/src/node.c \
    tree-sitter/lib/src/parser.c \
    tree-sitter/lib/src/query.c \
    tree-sitter/lib/src/stack.c \
    tree-sitter/lib/src/subtree.c \
    tree-sitter/lib/src/tree.c \
    tree-sitter/lib/src/tree_cursor.c
    cl $(CFLAGS) /Fo:$@ \
        /Itree-sitter/lib/src /Itree-sitter/lib/src/unicode \
        /c tree-sitter/lib/src/lib.c

$(BIN)/tree-sitter.dll: $(BIN)/tree-sitter.obj
    cl /LD $(CFLAGS) /Fe:$@ $** /link $(LFLAGS)

## Tree-Sitter-BNF DLL
$(BIN)/tree-sitter-bnf-parser.obj: tree-sitter-bnf/src/parser.c
    cl $(CFLAGS) /Fo:$@ /Itree-sitter-bnf/src /c $**

$(BIN)/tree-sitter-bnf.dll: $(BIN)/tree-sitter-bnf-parser.obj
    cl /LD $(CFLAGS) /Fe:$@ $** /link $(LFLAGS)
Now, open an x64 Developer Command Prompt for VS. Just type “x64 Developer Command” in your search bar, and if you have it installed, you will find it. It is required so that the compiler tools are in the PATH environment variable. It also ensures the compiler targets the x64 architecture. Enter the ts folder and type nmake. If everything goes as expected, you will have two DLL files inside the dlls folder.
Developing the Bnf2Ts Project
Start by creating a console application. I used JetBrains Rider, my IDE of choice for C#. My only special request when creating the project in Rider was not to use top-level statements. You can use them, of course; I just do not like them much for applications. I named the project Bnf2Ts.
The next step is to add the bindings.cs file, which declares the API of the Tree-Sitter library and points to the respective DLL file. It is too large to paste here, so grab it from this link. If you prefer to use the shell:
Invoke-WebRequest https://raw.githubusercontent.com/tree-sitter/csharp-tree-sitter/refs/heads/main/src/binding.cs -Outfile Bnf2Ts/bindings.cs
This is my first time writing a C# application that accesses an external DLL. Thus, I started by trying to guarantee that everything was linking correctly. Following that idea, this was the minimal code I needed to write to make things break. While the code compiles properly, it does not run because the DLLs are not found:
using GitHub.TreeSitter;

namespace Bnf2Ts;

class Program
{
    static void Main(string[] args)
    {
        using var parser = new TSParser();
    }
}
To fix the issue, I right-clicked the solution in Rider, used ‘Add existing item’, navigated into the folder we used to output our DLLs, and selected tree-sitter.dll. When asked if I wanted to copy or link, I selected link. Then, I right-clicked the tree-sitter.dll file and, in its build configuration, set “copy to output directory” to copy always. The process should be similar in Visual Studio. If you used VS and managed to do it, and it is not trivial, drop me a line, and I will add some instructions here. This was enough to make the application run.
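If the application still fails to start, it can help to check whether a DLL actually reached the output directory and exports the function we expect. The following is only a minimal sketch: the class NativeProbe and its HasExport method are names I made up for illustration, and it assumes a .NET version that ships System.Runtime.InteropServices.NativeLibrary (.NET Core 3.0 or later).

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of bindings.cs.
static class NativeProbe
{
    // Returns true when the DLL sits next to the executable, can be loaded,
    // and exports a function with the given name.
    public static bool HasExport(string dllName, string exportName)
    {
        var path = Path.Combine(AppContext.BaseDirectory, dllName);
        if (!File.Exists(path))
            return false;                       // the DLL was not copied to the output directory

        if (!NativeLibrary.TryLoad(path, out var handle))
            return false;                       // wrong architecture or a missing dependency

        try
        {
            return NativeLibrary.TryGetExport(handle, exportName, out _);
        }
        finally
        {
            NativeLibrary.Free(handle);         // release our reference to the library
        }
    }
}
```

With the DLLs in place, NativeProbe.HasExport("tree-sitter-bnf.dll", "tree_sitter_bnf") should return true, since that is the function listed in our .def file. A false result points at the copy step or at the nmake build rather than at the C# code.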
If it worked, take the chance to link the other DLL as well. And to guarantee it is also working, edit your code to look like this:
using System.Runtime.InteropServices;
using GitHub.TreeSitter;

namespace Bnf2Ts;

class Program
{
    [DllImport("tree-sitter-bnf.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr tree_sitter_bnf();

    private static TSLanguage lang = new TSLanguage(tree_sitter_bnf());

    static void Main(string[] args)
    {
        using var parser = new TSParser();
        parser.set_language(lang);
    }
}
Note that we added the extern declaration of tree_sitter_bnf, the function that initialises the parser data for our BNF parser, together with a DllImport attribute specifying which DLL contains it. The static field lang then holds the language. You could create it inside Main, but I decided to do it that way. Finally, the call to set_language configures the parser with that language. Everything should now be ready for the proper visitor implementation.
Finally, my project was created by default with nullable reference types enabled, which means that the compiler complains when you assign null to a reference type that is not declared nullable. Given that bindings.cs is not prepared for that, I suggest disabling them. In my case, I needed to edit the Bnf2Ts.csproj file and set <Nullable>disable</Nullable>.
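If you would rather keep nullable checks for your own code, an alternative is to switch them off for a single file with the #nullable disable directive. The sketch below shows the directive on a stand-in class (NullableDemo is a made-up name); the assumption is that the same first line could be added to the top of bindings.cs.

```csharp
// Turns off nullable annotations and warnings for this file only,
// leaving the project-wide setting untouched.
#nullable disable

using System;

// Hypothetical stand-in for code that was written without nullable annotations.
static class NullableDemo
{
    // With the directive above, assigning null to a plain string produces no warning.
    public static string Name = null;
}
```

I went with the project-wide setting because it does not require touching the downloaded file.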
Visiting the Parse Tree
This is my Main function:
static int Main(string[] args)
{
    if (args.Length != 1)
    {
        Console.Error.WriteLine("A single argument is required: filename to parse.");
        return 1;
    }

    var filename = args[0];
    if (!File.Exists(filename))
    {
        Console.Error.WriteLine($"File {filename} was not found!");
        return 1;
    }

    var fileContents = File.ReadAllText(filename);
    using var parser = new TSParser();
    parser.set_language(lang);

    var tree = parser.parse_string(null, fileContents);
    if (tree is null)
    {
        Console.Error.WriteLine("Error parsing file.");
        return 1;
    }

    Visit(tree.root_node(), fileContents);
    return 0;
}
In this code, we check that there is exactly one argument. If there is, we check that the argument is the name of a file that exists. If it does, we read its contents and run the parser's parse_string method. If the result is not null, parsing succeeded, and we can call the Visit method with the root node. Note that the Visit method does not exist yet. We will create it and the related visitors with the same structure as the Rust code, so my comments will be brief. For more information, please check the previous posts on this subject.
The dispatch visitor is similar to Rust. Unfortunately, as methods are void, I couldn’t use the switch expression, and ended up with a crowded switch statement:
private static void Visit(TSNode node, string fileContents)
{
    switch (node.type())
    {
        case "grammar": VisitGrammar(node, fileContents); break;
        case "rule": VisitRule(node, fileContents); break;
        case "nonTerminal": VisitNonTerminal(node, fileContents); break;
        case "ruleBody": VisitRuleBody(node, fileContents); break;
        case "symbolSeq": VisitSymbolSeq(node, fileContents); break;
        case "symbol": VisitSymbol(node, fileContents); break;
        case "pattern": VisitPattern(node, fileContents); break;
        case "literal": VisitLiteral(node, fileContents); break;
        case "subSeq": VisitSymbolSubseq(node, fileContents); break;
        default: Console.Error.WriteLine($"Node kind: {node.type()}"); break;
    }
}
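If the crowded switch statement bothers you, one alternative is a dictionary that maps node kinds to delegates. The sketch below is self-contained and therefore uses plain strings instead of TSNode. The Dispatcher class, its handlers, and the Output buffer are all hypothetical names, not code from this project.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical table-driven dispatcher working on (kind, text) pairs.
static class Dispatcher
{
    // Collects what the handlers write, so the result is easy to inspect.
    public static readonly StringBuilder Output = new StringBuilder();

    // One entry per node kind; each handler receives the node text.
    private static readonly Dictionary<string, Action<string>> Handlers =
        new Dictionary<string, Action<string>>
        {
            ["nonTerminal"] = text => Output.Append("$." + text),
            ["literal"] = text => Output.Append(text),
        };

    public static void Visit(string kind, string text)
    {
        if (Handlers.TryGetValue(kind, out var handler))
            handler(text);
        else
            Output.Append($"<unknown {kind}>");   // plays the role of the default case
    }
}
```

Calling Dispatcher.Visit("nonTerminal", "expr") appends $.expr to Output. For nine node kinds, I find the switch statement easier to read, which is why I kept it.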
In the visitor for the grammar, the main thing I would like to stress is that child takes an unsigned integer. Thus, we either declare the variable as uint or, as I did, add the u suffix to the 0 literal.
private static void VisitGrammar(TSNode node, string fileContents)
{
    Console.WriteLine("rules: {");
    for (var i = 0u; i < node.child_count(); i++)
    {
        Visit(node.child(i), fileContents);
    }
    Console.WriteLine("}");
}
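To see what the u suffix in the loop above does, here is a small sketch that is independent of Tree-Sitter. UintDemo and its members are names made up for this illustration. It also shows a consequence worth keeping in mind when doing arithmetic on child counts: subtracting from an unsigned zero wraps around instead of going negative.

```csharp
using System;

// Hypothetical demo class, only here to illustrate uint behaviour.
static class UintDemo
{
    // "var i = 0u" is inferred as System.UInt32, not System.Int32.
    public static string IndexTypeName()
    {
        var i = 0u;
        return i.GetType().Name;
    }

    // On a uint, count - 1 wraps to uint.MaxValue when count is 0.
    public static uint LastIndex(uint count) => unchecked(count - 1);
}
```

UintDemo.IndexTypeName() returns "UInt32", and UintDemo.LastIndex(0) returns 4294967295.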
The visitor for the rule node:
private static void VisitRule(TSNode node, string fileContents)
{
    Visit(node.child(0), fileContents);
    Console.Write(": $ =>");
    Visit(node.child(2), fileContents);
    Console.WriteLine(",");
}
Then, the visitors that just print the text of the symbol:
private static void VisitNonTerminal(TSNode node, string fileContents)
    => Console.Write(node.text(fileContents));

private static void VisitPattern(TSNode node, string fileContents)
    => Console.Write(node.text(fileContents));

private static void VisitLiteral(TSNode node, string fileContents)
    => Console.Write(node.text(fileContents));
The text method is implemented in bindings.cs. Note that it is buggy if your source is longer than what fits in an int: the start and end offsets are unsigned integers, and the text code casts them to int. If that does not make sense to you, just ignore this note.
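For the curious, the effect of that cast can be reproduced without Tree-Sitter. The sketch below uses made-up names (OffsetDemo, Wrap, Strict, Slice) and assumes, for the Slice helper, that the offsets index characters of the C# string. It is not the code from bindings.cs.

```csharp
using System;

// Hypothetical helpers illustrating uint-to-int conversions.
static class OffsetDemo
{
    // Unchecked cast: values above int.MaxValue silently become negative.
    public static int Wrap(uint offset) => unchecked((int)offset);

    // Checked cast: the same values throw an OverflowException instead.
    public static int Strict(uint offset) => checked((int)offset);

    // Extracts source[start..end) and fails loudly when an offset does not fit in an int.
    public static string Slice(string source, uint start, uint end)
    {
        var first = checked((int)start);
        var length = checked((int)(end - start));
        return source.Substring(first, length);
    }
}
```

OffsetDemo.Wrap(3000000000) returns -1294967296, while OffsetDemo.Strict(3000000000) throws. For ordinary grammar files none of this matters, since their offsets are far below int.MaxValue.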
Now, the RuleBody visitor is similar to the Rust one, but with a for loop instead of a while:
private static void VisitRuleBody(TSNode node, string fileContents)
{
    var count = node.child_count();
    if (count == 1)
        Visit(node.child(0), fileContents);
    else
    {
        Console.Write("choice(");
        // Alternatives sit at even indices; the separators between them are skipped.
        for (var i = 0u; i < count; i += 2)
        {
            Visit(node.child(i), fileContents);
            // Only write a comma when another alternative follows.
            if (i + 2 < count)
                Console.Write(", ");
        }
        Console.Write(")");
    }
}
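The index arithmetic of the rule body visitor is easier to check on plain strings. The following sketch assumes, as the i += 2 step does, that alternatives sit at even positions with a separator token between them. ChoiceDemo and Render are hypothetical names, and the children are given as ready-made strings rather than TSNode values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical string-based model of the rule body visitor.
static class ChoiceDemo
{
    // children: alternatives at even indices, separator tokens at odd ones.
    public static string Render(IReadOnlyList<string> children)
    {
        if (children.Count == 1)
            return children[0];                  // a single alternative needs no choice(...)

        var alternatives = new List<string>();
        for (var i = 0; i < children.Count; i += 2)
            alternatives.Add(children[i]);       // skip the separators

        return "choice(" + string.Join(", ", alternatives) + ")";
    }
}
```

For example, ChoiceDemo.Render(new[] { "$.a", "|", "$.b" }) gives choice($.a, $.b), with no trailing comma and with the closing parenthesis in place.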
The visitor for symbol is very similar to the Rust one, just with some syntax changes:
private static void VisitSymbol(TSNode node, string fileContents)
{
    var child = node.child(0);
    var kleene = "";
    if (node.child_count() > 1)
        kleene = node.child(1).type();

    Console.Write(kleene switch
    {
        "plus" => "repeat1(",
        "asterisk" => "repeat(",
        _ => ""
    });

    if (child.type() == "nonTerminal")
        Console.Write("$.");
    Visit(child, fileContents);

    if (!string.IsNullOrEmpty(kleene))
        Console.Write(")");
}
And the SymbolSeq visitor:
private static void VisitSymbolSeq(TSNode node, string fileContents)
{
    var count = node.child_count();
    if (count == 1)
        Visit(node.child(0), fileContents);
    else
    {
        Console.Write("seq(");
        for (var i = 0u; i < node.child_count(); i++)
        {
            Visit(node.child(i), fileContents);
            if (i < count - 1)
                Console.Write(", ");
        }
        Console.Write(")");
    }
}
And finally, the SymbolSubseq visitor:
private static void VisitSymbolSubseq(TSNode node, string fileContents)
    => Visit(node.child(1), fileContents);
Now, compile your code and run it. Do not forget to supply the path to the BNF sample file!