I’ve been working with Checkmarx for more than a year and a half, and I’ve been using ANTLR almost daily. While there are some books on ANTLR, a couple of reference sites and some tutorials from the Strumenta site, I decided to write my own, with my vision of parsing and parser writing in C#.

Nevertheless, it is important to stress that I visit the Strumenta site quite often. Thus, a lot of the content I write here was learned from there and might have some similarities, although I tried to write down my own ideas and approach. To be fair, I point the reader to their tutorials as well.

On my side, I decided to:

  • Not use a specific IDE. I will show you the hairy details. Of course, using a decent IDE is crucial for being productive in C#; I suggest Rider, from JetBrains, or Visual Studio Code. This also means I will use the dotnet command line interface (CLI), and I will do this on Linux.
  • Not use a plugin to generate the parser code from the grammar, as I like to have a standalone build system that can be executed from the command line. Given I did not find a package to do that, I will write a build task from scratch, and thus this will also be a tutorial on writing tasks for MSBuild.
  • Use LOGO, a programming language for beginners and children, as my pet language. It is used to draw with a set of commands. Instead of making it interactive, I will write a tool to convert a LOGO program into an SVG file that you can open in any browser or illustration application. You can play with the LOGO language in this online playground. Our tool will be named Logo2Svg (LOGO to SVG).

Disclaimer: I learned parts of this tutorial, namely the build task in the next section, while writing it, and I might need to change its behaviour during the next tutorial parts. If that happens, I will add a comment with the changelog. I will also update the changelog for any comments and fixes that I might receive from readers.

Updates:

  1. Edited the Logo2Svg.csproj file and set the deletion of generated files to happen before the Build task. This way we do not get warnings issued by MSBuild, when running the build more than once, about adding duplicated files.

ANTLR Build Task

ANTLR is written in Java and is available as a JAR file. You need Java installed to run it and convert a grammar into C# code (or any other supported language). We will need to teach the MSBuild system how to invoke the ANTLR CLI. This is done through an MSBuild task. These tasks are written in C# and packaged as class libraries; ours will be responsible for producing C# code from a grammar file.

We will create a C# solution, and there we will include two projects (for now): the ANTLR build task library, and the first skeleton for our Logo2Svg tool.

dotnet new sln -n Logo
Bash

This will create the Logo.sln file, which stores information about the projects. A new project can be created similarly, but we also need to add it to the solution manually.

dotnet new classlib -o BuildAntlr -f netstandard2.0
dotnet sln Logo.sln add BuildAntlr/
dotnet add BuildAntlr package Microsoft.Build.Utilities.Core
Bash

The first command creates a folder for the project, specifies that it is a class library, and targets .NET Standard 2.0. As far as I could understand, this is currently the only target that is compatible with both the Visual Studio MSBuild command and the dotnet one. The second command adds the project to the solution. Finally, the last line adds a reference to the Microsoft.Build.Utilities.Core NuGet package, which is a helper to create MSBuild tasks.

Inside the new BuildAntlr folder, there is a Class1.cs file. Rename it to BuildAntlr.cs.

mv BuildAntlr/Class1.cs BuildAntlr/BuildAntlr.cs
Bash

Everything is ready. Edit this just-renamed file, and add the following content.

BuildAntlr.cs
using Microsoft.Build.Framework;   // for ITaskItem
using Microsoft.Build.Utilities;   // for Task
using System.Linq;   // For Select and ToArray
using System.Diagnostics;   // for Process

namespace BuildAntlr
{
    public class BuildAntlr : Task
    {
        [Required]
        public string JavaPath { get; set; }

        [Required]
        public string AntlrJar { get; set; }

        [Required]
        public bool Visitors { get; set; }

        [Required]
        public bool Listeners { get; set; }

        [Required]
        public ITaskItem[] Files { get; set; }

        public override bool Execute()
        {
            string pathSeparator = System.IO.Path.PathSeparator.ToString();
            string classpath = System.Environment.GetEnvironmentVariable("CLASSPATH");
            string cp = $"{classpath}{pathSeparator}{AntlrJar}";

            var filenames = Files.Select(item => item.GetMetadata("FullPath")).ToArray();
            string files = string.Join(" ", filenames);
            
            var visitor = Visitors ? "-visitor" : "-no-visitor";
            var listener = Listeners ? "-listener" : "-no-listener";

            var proc = new Process 
            {
                StartInfo = new ProcessStartInfo
                {
                    FileName = JavaPath,
                    Arguments = $"-Xmx500M -cp {cp} org.antlr.v4.Tool {visitor} {listener} -Dlanguage=CSharp {files}",
                    UseShellExecute = false,
                    RedirectStandardOutput = false,
                    CreateNoWindow = true
                }
            };

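            proc.Start();
            proc.WaitForExit();

            // Fail the build if ANTLR reported errors (non-zero exit code)
            if (proc.ExitCode != 0)
            {
                Log.LogError("ANTLR exited with code {0}", proc.ExitCode);
                return false;
            }
            return true;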
        }
    }
}
C#

Digging into the file contents:

  • MSBuild tasks inherit from the Task class, declared in the Microsoft.Build.Utilities namespace. Our class declares public properties that need to (or should) be filled in when using the task. This will become clearer when we configure it later. It also overrides the Execute method, which is run when the task is started.
  • We declare five properties with the Required attribute. The first four are the configuration items we need to run ANTLR: where the Java interpreter is, where the ANTLR JAR file is, and boolean values that state whether we plan to generate visitors and listeners. The fifth property holds the source files that will be processed.
  • The Execute method constructs the command that runs the ANTLR JAR file. First, we build the classpath passed to Java: we read the current value of the CLASSPATH variable from the environment and append the JAR path to it, using the platform's path separator. Then, we get the filenames that will be processed from the Files property and concatenate them in a string separated by spaces. We then set the flags that configure whether listeners and visitors are to be generated.
  • Finally, we set up the configuration of the Process class to execute the external ANTLR command. I think the configuration options are quite straightforward to understand.
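To make the command construction easier to inspect, here is a minimal standalone sketch (a console program using top-level statements) that builds the same argument string and prints it instead of running Java. The BuildArguments helper is hypothetical and not part of the task; the quoting of paths and the handling of an empty CLASSPATH are assumptions added for illustration, and the grammar path is only an example value.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of the task above): builds the same argument
// string as Execute, but quotes paths so that spaces in them do not split arguments.
string BuildArguments(string classpath, string antlrJar, bool visitors, bool listeners, string[] files)
{
    // Append the ANTLR jar to any existing CLASSPATH value
    string cp = string.IsNullOrEmpty(classpath)
        ? antlrJar
        : classpath + Path.PathSeparator + antlrJar;

    string visitor = visitors ? "-visitor" : "-no-visitor";
    string listener = listeners ? "-listener" : "-no-listener";
    string fileList = string.Join(" ", files.Select(f => $"\"{f}\""));

    return $"-Xmx500M -cp \"{cp}\" org.antlr.v4.Tool {visitor} {listener} -Dlanguage=CSharp {fileList}";
}

// Example values only; adjust the paths to your own setup
string arguments = BuildArguments(
    Environment.GetEnvironmentVariable("CLASSPATH") ?? "",
    "/opt/local/lib/antlr-4.13.1-complete.jar",
    visitors: true,
    listeners: false,
    files: new[] { "/home/user/Logo2Svg/Logo.g4" });

Console.WriteLine($"java {arguments}");
```

Printing the command like this is a cheap way to debug the task: you can paste the output into a terminal and check that ANTLR runs as expected before wiring it into MSBuild.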

We can now compile the library using

dotnet build BuildAntlr
Bash

To use ANTLR you will need to have its JAR file on your computer. You can download it from the official website. At the time of writing, 4.13.1 is the latest version. You can install it into /opt/local/lib with the following steps:

curl -LO https://www.antlr.org/download/antlr-4.13.1-complete.jar
sudo mkdir -p /opt/local/lib
sudo mv antlr-4.13.1-complete.jar /opt/local/lib/
Bash

You can put it anywhere you like. You can even keep it in your project folder and use that path later in the tutorial.

The Logo2Svg Skeleton

For the LOGO language, we will support only two commands at the moment (see the site I mentioned above to understand the language itself, as I will not explain its semantics):

  • Forward, abbreviated as Fd, which makes the drawing marker (the turtle) move forward a specific length;
  • Right, abbreviated as Rt, which makes the marker turn right a specific number of degrees.

First, we will create the empty project, register it in the solution, and add the ANTLR NuGet package reference.

dotnet new console -o Logo2Svg
dotnet sln Logo.sln add Logo2Svg/
dotnet add Logo2Svg package Antlr4.Runtime.Standard
Bash

Again, the template will create a default CS file. We will start by renaming it:

mv Logo2Svg/Program.cs Logo2Svg/logo2svg.cs
Bash

Before editing this file, we will create the ANTLR grammar file, which includes both the parser and the lexer rules. It will be named Logo.g4 and put inside the Logo2Svg folder.

Logo.g4
grammar Logo;   // this will be also the prefix used on the generated files

// Parser rules

program : White* command ( White+ command)* White* EOF
        ;

command : Right White+ Value
        | Forward White+ Value
        ;

// Lexer Rules

Right   : 'RIGHT' | 'RT' ;
Forward : 'FORWARD' | 'FD' ;

Value   : [0-9]+ ;           
            
White   : [ \n\t\r] ;    
Plaintext

Note that the file name must match the grammar name declared in the first line of the file, including capitalization. For now, we will keep the lexer and parser in the same file. Later we might move them to two separate files.

Note this is not a parser course, and I will not do a proper introduction to grammar formalism. We say that a program is composed of at least one command. If we have more than one, then we need at least one space to separate it from the next one (and so on). We might also have spaces at the beginning or the end of the file. The file ends with the EOF (end-of-file) token.

A command is either the right command or the forward command. In both cases, they have spaces separating the command name from the argument. We are requiring the spaces to exist. Later we can discuss if that is a requirement.
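As an illustration of what these two parser rules accept, here is a small hand-written check in C# that works on token type names. It is only a sketch to make the rules concrete: the IsProgram function is hypothetical, ANTLR generates its own (very different) parser code, and the end of the array stands in for the EOF token.

```csharp
using System;

// Illustration only: a hand-written check of the two parser rules in Logo.g4,
// working on token type names. This is not the code ANTLR generates.
bool IsProgram(string[] tokens)
{
    int pos = 0;

    // White*
    void SkipWhite()
    {
        while (pos < tokens.Length && tokens[pos] == "White") pos++;
    }

    // command : Right White+ Value | Forward White+ Value
    bool Command()
    {
        if (pos >= tokens.Length || (tokens[pos] != "Right" && tokens[pos] != "Forward")) return false;
        pos++;
        int before = pos;
        SkipWhite();
        if (pos == before) return false;           // White+ needs at least one token
        if (pos >= tokens.Length || tokens[pos] != "Value") return false;
        pos++;
        return true;
    }

    SkipWhite();                                   // leading White*
    if (!Command()) return false;                  // at least one command
    while (true)
    {
        int before = pos;
        SkipWhite();
        if (pos == tokens.Length) return true;     // trailing White*, then end of input
        if (pos == before) return false;           // commands must be separated by White+
        if (!Command()) return false;
    }
}

// "FD 50 RT 90" as token types
Console.WriteLine(IsProgram(new[] { "Forward", "White", "Value", "White", "Right", "White", "Value" })); // prints "True"
// "FD50": the mandatory whitespace is missing
Console.WriteLine(IsProgram(new[] { "Forward", "Value" }));                                              // prints "False"
```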

As for the lexer, we have the Right and Forward tokens that recognize the commands and their abbreviated variations. Note that we are forcing the commands to be uppercase. Again, we can decide to change that later if we like. The Value token is just a sequence of digits, and the White token is either a space, new line, tab or the carriage return character.
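To see how these four lexer rules split an input into tokens, the following standalone sketch mimics them with a single regular expression. It is an illustration only: the Tokenize function is hypothetical, and ANTLR generates its own lexer that works differently (for instance, it prefers the longest match, while the regular expression tries alternatives from left to right; for these rules both give the same result).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustration only: mimics the four lexer rules of Logo.g4 with one regular
// expression. This is not how the ANTLR-generated lexer works internally.
List<(string Type, string Text)> Tokenize(string source)
{
    // \G anchors each match at the current position in the input
    var pattern = new Regex(
        @"\G(?:(?<Right>RIGHT|RT)|(?<Forward>FORWARD|FD)|(?<Value>[0-9]+)|(?<White>[ \n\t\r]))");
    var names = new[] { "Right", "Forward", "Value", "White" };
    var tokens = new List<(string Type, string Text)>();
    int position = 0;

    while (position < source.Length)
    {
        Match match = pattern.Match(source, position);
        if (!match.Success)
        {
            // Rough analogue of a token recognition error: record and skip one character
            tokens.Add(("Error", source[position].ToString()));
            position++;
            continue;
        }
        foreach (string name in names)
        {
            if (match.Groups[name].Success)
            {
                tokens.Add((name, match.Value));
                break;
            }
        }
        position += match.Length;
    }
    return tokens;
}

foreach (var (type, text) in Tokenize("FD 50\nRT 90"))
{
    string shown = text.Replace("\n", "\\n");   // make the newline visible
    Console.WriteLine($"{type,-8} '{shown}'");
}
```

For the input above, the sketch yields seven tokens: Forward, White, Value, White, Right, White and Value. A lowercase command such as fd would end up as Error entries, which matches the note that the commands must be uppercase.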

Now, the content of the logo2svg.cs file is mostly standard C# code.

logo2svg.cs
using Antlr4.Runtime;

class Program
{
    static int Main(string[] args)
    {
        // Check we have the correct arguments
        if (args.Length != 2) {
            Console.Error.WriteLine("Usage: logo2svg <input.logo> <output.svg>");
            return 1;
        }
        // Show some output
        var input = args[0];
        var output = args[1];
        Console.WriteLine($"From {input} to {output}");
        try
        {
             // Call ANTLR parser
             using FileStream fs = File.OpenRead(input);
             AntlrInputStream inputStream = new AntlrInputStream(fs);
             var lexer = new LogoLexer(inputStream);
             var parser = new LogoParser(new CommonTokenStream(lexer));

             LogoParser.ProgramContext program = parser.program();
        }
        catch (Exception exception)
        {
             // Report the failure on stderr and signal it through the exit code
             Console.Error.WriteLine($"Error: {exception}");
             return 1;
        }
        return 0;
    }
}
C#

Our class declares the Main method. First, we check if we have two arguments, the input and the output (the input will be the name of the LOGO program, and the output the name of the SVG file that will be generated).

For the parsing, we open the file for reading, creating a stream that is supplied to the lexer, which turns it into a token stream. The parser takes the tokens and parses them according to the grammar.

As the final step, we need to edit the project file, Logo2Svg.csproj.

Logo2Svg.csproj
<Project Sdk="Microsoft.NET.Sdk">

  <UsingTask TaskName="BuildAntlr.BuildAntlr" AssemblyFile="../BuildAntlr/bin/Debug/netstandard2.0/BuildAntlr.dll"/>
  
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>

    <RootFolder>$(MSBuildProjectDirectory)</RootFolder>    
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Antlr4.Runtime.Standard" Version="4.13.1" />
  </ItemGroup>

  <Target Name="BuildAntlr" BeforeTargets="CoreCompile" Inputs="$(RootFolder)/Logo.g4" Outputs="$(RootFolder)/LogoParser.cs $(RootFolder)/LogoLexer.cs $(RootFolder)/LogoBaseVisitor.cs $(RootFolder)/LogoVisitor.cs">
     <BuildAntlr
        JavaPath="/usr/bin/java"
        AntlrJar="/opt/local/lib/antlr-4.13.1-complete.jar"
        Listeners="false"
        Visitors="true"
        Files="$(RootFolder)/Logo.g4">
     </BuildAntlr>
     <ItemGroup>
        <Compile Include="$(RootFolder)/LogoParser.cs" />
        <Compile Include="$(RootFolder)/LogoLexer.cs" />
        <Compile Include="$(RootFolder)/LogoVisitor.cs" />
        <Compile Include="$(RootFolder)/LogoBaseVisitor.cs" />
     </ItemGroup>
  </Target>

  <Target Name="ForceReGenerateOnRebuild" BeforeTargets="Build"> 
     <Delete Files="$(RootFolder)/LogoParser.cs" />
     <Delete Files="$(RootFolder)/LogoLexer.cs" /> 
     <Delete Files="$(RootFolder)/LogoBaseVisitor.cs" />
     <Delete Files="$(RootFolder)/LogoVisitor.cs" />        
  </Target>
  
</Project>
XML

The most important blocks here are:

  • The UsingTask element at the top, which refers to the build task we implemented above.
  • Inside the next block, we added a RootFolder property as a shorter way to refer to the project’s root folder.
  • Then we have the reference to the ANTLR NuGet package, which was added by the dotnet command we ran before.
  • The next Target block does the magic: it declares a build target, with a name of our choice, that runs before the CoreCompile target. We also declare the input file and the output files. I would prefer not to list each file manually, but for now, I did not find an alternative.
  • The BuildAntlr element describes how to invoke our custom build task. Here we fill in its properties with the path to the Java interpreter, the location of the JAR file, and the boolean values that control whether listeners and visitors are to be generated.
  • Next comes an item group that adds the four generated C# files to the compilation.
  • Finally, the last target deletes the generated files. As mentioned in the updates above, it is hooked before the Build target so that MSBuild does not warn about duplicated files when the build runs more than once.

We can now compile this:

dotnet build Logo2Svg
Plaintext

Closing Up

To test the code above (in a future part I might write about unit testing, but for now, we will test manually), create two files: one with a correct LOGO program, and another with syntax errors.

good.logo
FORWARD 100
RT 90
FD 50
Logo
bad.logo
FORWARD TWICE 100
ROTATE 90
Logo

You can now test your application with

dotnet run --project Logo2Svg good.logo good.svg
dotnet run --project Logo2Svg bad.logo bad.svg
Bash

When running with the bad file, you should get messages from the ANTLR lexer and parser stating there were problems. Later we will hide this and try to give a more professional output.

If you add this to Git, add the following paths to your .gitignore:

.gitignore
*/obj
*/bin
Logo2Svg/LogoBaseVisitor.cs
Logo2Svg/Logo.interp
Logo2Svg/LogoLexer.cs
Logo2Svg/LogoLexer.interp
Logo2Svg/LogoLexer.tokens
Logo2Svg/LogoParser.cs
Logo2Svg/Logo.tokens
Logo2Svg/LogoVisitor.cs
Plaintext

The code for this tutorial is available on GitHub. Check the tag ‘PartI’.
