ANTLR Tutorial – Part VI

Welcome back to my ANTLR tutorial. This part will be focused on the support of variables in LOGO. We will work on the assignment of values to variables and variables evaluation. We will also take the opportunity to split the grammar file into two: the lexer and the grammar parts.

The previous parts of this tutorial: I, II, III, IV and V. Also, this post explains adding a GitHub Action to this tutorial repository.

If you are following each step of this tutorial, note that, in the last part, there is some extra code that was written without being properly described in the post. It also includes some new files. In this part of the tutorial, we will need to edit some of those files, and use some of the code that was not properly discussed. I will try to add comments when needed to make it easier to understand.

Splitting the Grammar

Splitting the grammar into two files does not bring a lot of advantages. But it can make the code easier to maintain, making it clear what are lexer rules and parsing rules, and it would also allow us to extend the default ANTLR lexer and parser classes. Nevertheless, for our specific case, we will only take advantage of making the files easier to maintain.

We will start by replacing the Logo.g4 file with the LogoParser.g4 and LogoLexer.g4 files. The parser file is very similar to the original file we had, but just with the grammar:

Language/LogoParser.g4
parser grammar LogoParser;

options { tokenVocab=LogoLexer; }

program : command+ EOF
        ;

command : simpleCommand
        | SetPos squarePoint
        | SetXY simplePoint
        | Home
        | Arc expr expr
        ;

simpleCommand : cmd=(Right|Left|Forward|Back|SetX|SetY|SetH) expr ;

simplePoint : expr expr
            ;

squarePoint : '[' expr expr ']'
            ;

expr : cmd=(Sum|Difference|Product|
                  Quotient|Power|Remainder|Modulo) expr expr  #prefixBinaryOp 
     | op=(Minus|MinusSign) expr                              #unaryMinus
     | fun=(Abs|Int|Round|Sqrt|Exp|Log10|Ln|Arctan|Sin|
              Cos|Tan|Radarctan|Radsin|Radcos|Radtan) expr    #arithFuncs
     | '(' fun=(Arctan|Radarctan) expr expr ')'               #arithFuncs
     | '(' Sum expr+ ')'                                      #summation
     | '(' Product expr+ ')'                                  #product
     | '(' Quotient expr ')'                                  #quotient
     | <assoc=right> expr op='^' expr                         #binaryOp
     | <assoc=left> expr op=('*'|'/'|'%') expr                #binaryOp
     | <assoc=left> expr op=('+'|'-') expr                    #binaryOp
     | value                                                  #scalar
     | '(' expr ')'                                           #scalar
     ;

value : IntegerValue
      | RealValue
      ;
Plaintext

Note the change on the first line, which now specifies parser grammar instead of just grammar. Also, line 3 adds an annotation, that points to the lexer file. Other than that, everything was kept with a change.

Now, for the lexer, we will also replace the first line with lexer grammar. Lines 3 to 12 are also new, defining tokens for the symbols we use directly in the grammar. This is required because the lexer is generated by this LogoLexer.g4 specification. If you prefer, you can replace the references to those characters in the file above, and use the defined symbols. Nevertheless, I like it more to use the characters directly, and they make the grammar easier to read.

Language/LogoLexer.g4
lexer grammar LogoLexer;

RightSqBracket : ']' ;
LeftSqBracket  : '[' ;
MinusSign      : '-' ;
PlusSign       : '+' ;
AsteriskSign   : '*' ;
SlashSigh      : '/' ;
LeftParen      : '(' ;
RightParen     : ')' ;
CircunflexSign : '^' ;
PercentSign    : '%' ;


Minus      : M I N U S ;
Power      : P O W E R ;
Quotient   : Q U O T I E N T ;
Product    : P R O D U C T ;
Difference : D I F F E R E N C E ;
Sum        : S U M ;
Remainder  : R E M A I N D E R ;
Modulo     : M O D U L O ;
Abs        : A B S ;
Int        : I N T ;
Round      : R O U N D ;
Sqrt       : S Q R T ;
Exp        : E X P ;
Log10      : L O G '10' ;
Ln         : L N ;
Arctan     : A R C T A N ;
Sin        : S I N ;
Cos        : C O S ;
Tan        : T A N ;
Radarctan  : R A D A R C T A N ;
Radsin     : R A D S I N ;
Radcos     : R A D C O S ;
Radtan     : R A D T A N ;
Right   : R I G H T | R T ;
Forward : F O R W A R D | F D ;
Arc     : A R C ;
Home    : H O M E ;
Back    : B A C K | B K ;
Left    : L E F T | L T ;
SetX    : S E T X ;
SetY    : S E T Y ;
SetXY   : S E T X Y ;
SetPos  : S E T P O S ;
SetH    : S E T H ( E A D I N G )? ;
IntegerValue   : [0-9]+ ;
RealValue      : [0-9]* '.' [0-9]+ ;
White   : [ \n\t\r] -> skip;
    
fragment A : [Aa] ;
fragment B : [Bb] ;
fragment C : [Cc] ;
fragment D : [Dd] ;
fragment E : [Ee] ;
fragment F : [Ff] ;
fragment G : [Gg] ;
fragment H : [Hh] ;
fragment I : [Ii] ;
fragment J : [Jj] ;
fragment K : [Kk] ;
fragment L : [Ll] ;
fragment M : [Mm] ;
fragment N : [Nn] ;
fragment O : [Oo] ;
fragment P : [Pp] ;
fragment Q : [Qq] ;
fragment R : [Rr] ;
fragment S : [Ss] ;
fragment T : [Tt] ;
fragment U : [Uu] ;
fragment V : [Vv] ;
fragment W : [Ww] ;
fragment X : [Xx] ;
fragment Y : [Yy] ;
fragment Z : [Zz] ;
Plaintext

To make this change work we need to perform some more changes:

  • In the AST/TreeVisitor.cs file, we need to replace the base class. Now, it should be: public class TreeVisitor : LogoParserBaseVisitor<INode>
  • In the Logo2Svg.csproj file, all references to LogoVisitor should be replaced by LogoParserVisitor and LogoBaseVisitor should be replaced by LogoParserBaseVisitor. The line with the parameters for the BuildAntlr task, the attribute Files needs to be updated: Files="$(GrammarPath)/LogoLexer.g4 $(GrammarPath)/LogoParser.g4"

This should be enough to have the project compiling again, now with two separate files.

As usual, to implement new features in a language we need a couple of steps:

  1. Update the lexer and grammar files to support the new tokens and syntax;
  2. Add the required code to create the abstract syntax tree;
  3. Add the code to properly evaluate the abstract syntax tree with the new features;
  4. And optionally, but highly suggested, add proper unit testing.

Do not forget to look into the LOGO documentation to learn about variables. We will implement the name, make, show and thing commands.

Grammar and Lexer Changes

In the lexer, we need to support the new commands, name, make, show and thing, as well as the rule to recognize a variable (and the shorthand for using thing). Add this code anywhere, just be sure it appears before the fragment lines.

Language/LogoLexer.g4
Variable    : '"' [a-zA-Z0-9_]+ { Text = Text.Substring(1); };
VariableRef : ':' [a-zA-Z0-9_]+ { Text = Text.Substring(1); };

Make       : M A K E ;
Name       : N A M E ;
Thing      : T H I N G ;
Show       : S H O W ;
Plaintext

Lines 1 and 2 recognize a variable, represented in LOGO as "varname and a shorthand for thing "varname represented as :varname. The usage of a prefix (the quote or the colon) allows variable names to be disambiguated from numbers and other commands. Thus, we define the regular expression to allow any sequence of letters, uppercase or lowercase, digits and the underscore character.

These two rules have a code block, between curly braces. In this case, we are changing the text that is sent from the lexer to the grammar, removing the first character. As the lexer sends an object to the parser, and that object has a type, removing the quote or colon does not lose any information. And it gets easier to use for the visitor.

In the grammar, two rules need changes. One of them is the expr rule, which should now support variables too. Note that, accordingly with the LOGO reference, you can refer to a variable using :varname or thing "varname:

Language/LogoParser.g4
expr : cmd=(Sum|Difference|Product|
                  Quotient|Power|Remainder|Modulo) expr expr  #prefixBinaryOp 
     | op=(Minus|MinusSign) expr                              #unaryMinus
     | fun=(Abs|Int|Round|Sqrt|Exp|Log10|Ln|Arctan|Sin|
              Cos|Tan|Radarctan|Radsin|Radcos|Radtan) expr    #arithFuncs
     | '(' fun=(Arctan|Radarctan) expr expr ')'               #arithFuncs
     | '(' Sum expr+ ')'                                      #summation
     | '(' Product expr+ ')'                                  #product
     | '(' Quotient expr ')'                                  #quotient
     | <assoc=right> expr op='^' expr                         #binaryOp
     | <assoc=left> expr op=('*'|'/'|'%') expr                #binaryOp
     | <assoc=left> expr op=('+'|'-') expr                    #binaryOp
     | value                                                  #scalar
     | '(' expr ')'                                           #scalar
     | VariableRef                                            #variable
     | Thing Variable                                         #variable
     ;
Plaintext

In this rule only lines 15 and 16 are new. They refer to variable references, using the Thing command or the colon shortcut.

For the new commands, we will add them to the Command rule. I took the opportunity to add different names to each rule, to make visitors smaller, with fewer conditionals. That will result in a bigger difference in the code below, but it is worth it.

Language/LogoParser.g4
command : simpleCommand          #basicCommand
        | SetPos squarePoint     #setPosition
        | SetXY simplePoint      #setPosition
        | Home                   #home
        | Arc expr expr          #arc
        | Make Variable expr     #setVariable
        | Name expr Variable     #setVariable
        | Show expr              #show
        ;
Plaintext

Other than the rule names, lines 6 to 8 are new. Lines 6 and 7 show the two different syntaxes to define a variable. Line 8 introduces the command to show an expression value in the console.

Grammar Visitors

Before writing the visitor code, we need to create a new data structure, to store information about when a variable is used inside an expression. For that, create a file named AST/VarName.cs with the following code:

AST/VarName.cs
namespace Logo2Svg.AST;

public class VarName : Parameter
{
    public readonly string Name;

    public VarName(string varName) => Name = varName;

    public override float Value(Turtle turtle) 
         => turtle.RetrieveVariable(Name, out var expr) ? expr : 0f;

    public override string ToString() => $@"""{Name}";
}
C#

This class defines a string to store the variable name. It is public to make it easier to write tests. The constructor sets the variable name, the stringification just writes the variable name. The evaluator function takes advantage of a new method, named RetrieveVariable, which will be implemented in the turtle and will allow us to track the values of variables whenever needed. This data structure is usually named as a symbol table.

Let’s take the opportunity to implement the symbol table. Open the turtle file, and add the following code:

Turtle/Turtle.cs
private readonly Dictionary<string, float> _symbolTable = new();
 
public void DefineVariable(string varName, float value) => _symbolTable[varName] = value;

public bool RetrieveVariable(string varName, out float value)
     => _symbolTable.TryGetValue(varName, out value);
 
C#

The symbol table maps a variable (that is a string) to its current value (a float value). To store a variable we just need its name and value. And, to get its value from the symbol table, we just need the variable name. Easy.

For the visitors, open the TreeVisitor.cs file and remove the VisitCommand method. We will start by replacing it with the new independent visitors:

AST/TreeVisitor.cs
public override INode VisitHome([NotNull] LogoParser.HomeContext context)
{
    return new Command(LogoLexer.Home, "Home");
}

public override INode VisitSetPosition(LogoParser.SetPositionContext context)
{
    var (command, param) = context.SetXY() is { } setXyCtx?
        (setXyCtx.Symbol, Visit<Parameter>(context.simplePoint())) :
        (context.SetPos().Symbol, Visit<Parameter>(context.squarePoint()));
    
    return new Command(command.Type, command.Text, param);
}

public override INode VisitArc(LogoParser.ArcContext context)
{
    var parameters = context.expr().Select(Visit<Parameter>).ToArray();
    return new Command(LogoLexer.Arc, "Arc", parameters);
}
C#

From the older visitor, we removed the code that visits the simpleCommand as it just returns the value of the subrule. When we have such visitor methods, which just return the result of visiting their only child, we can just ignore them and not write the visitor at all. For the other, the code is quite straightforward. Just note that I changed the VisitSetPosition method to use a conditional ternary operator, and I take advantage of the tuple syntax to set two different variables simultaneously.

The visitor for the command show is also trivial:

AST/TreeVisitor.cs
public override INode VisitShow(LogoParser.ShowContext context)
  => new Command(LogoLexer.Show, "show", Visit<Parameter>(context.expr()));
C#

The visitor for setting a variable value is not too complex, either. We just need to create the variable object in the visitor, as Variable is a terminal symbol:

AST/TreeVisitor.cs
public override INode VisitSetVariable(LogoParser.SetVariableContext context)
{
    var varName = context.Variable().GetText();
    var value = Visit<Parameter>(context.expr());
    return new Command(LogoLexer.Make, "make", new VarName(varName), value);
}
C#

Note that in this visitor we are not distinguishing the make and name commands. They work the same way. They just have the parameters in a different order. Nevertheless, as the types of these parameters are different, we do not need to handle the two rules independently.

Finally, the usage of a variable in an expression (with the Thing command or its shortcut) also needs its visitor:

AST/TreeVisitor.cs
public override INode VisitVariable(LogoParser.VariableContext context)
    => new VarName(context.Variable()?.GetText() ?? context.VariableRef()?.GetText());
C#

In this case, if the user utilizes the shorthand, the variable will be written as :varname, but when using the long version, it will be written as "varname. Thus, we can’t merge the two usages in a single rule, as their terminal symbols are different. Nevertheless, as we just need its value, the code to check if any of them is null can be written easily.

Evaluation

The evaluation of the new commands is easy to implement as well. For the show and the make commands we just need the following code in the AST/Command.cs file, under the switch statement in the Execute method:

AST/Command.cs
        case LogoLexer.Show:
        {
            var value = Parameter(0).Value(turtle);
            Console.WriteLine(value);
            break;
        }
        case LogoLexer.Make:
        {
            turtle.DefineVariable(Parameter<VarName>(0).Name, Parameter(1).Value(turtle));
            break;
        }
C#

For the show command, the parameter is evaluated and printed to the console. For the make command, the parameter is evaluated and stored in the symbol table. As simple as it gets.

Note that the evaluation of the variables in the expressions was already implemented when we wrote the VarName.cs class.

Testing

We will implement two tests for the expressions. The first one will be in the file LogoTests/AstTests.cs. This file was not created during the previous tutorials. It was added offline by me, with some tests on the validation of the tree created by the parser and the visitors. To make it clear how these tests work, in this issue of the tutorial we will implement a test to validate the structure of the abstract syntax tree for the make command and the usage of variables:

LogoTests/AstTests.cs
  [TestMethod]
  public void VariableThings()
  {
      var tree = @"MAKE ""a 10 MAKE ""b 20 MAKE ""c thing ""a + :b".ToAst() as Program;
      Assert.IsNotNull(tree);
      Assert.AreEqual(3, tree.Count);

      var names = new[] {"a", "b", "c"};
      for (var i = 0; i < 3; i++)
      {
          Assert.AreEqual(LogoLexer.Make, tree[i].Id);
          Assert.AreEqual(names[i], (tree[i].Params[0] as VarName)?.Name);
      }
      Assert.AreEqual(10f, tree[0].Parameter<ValueParam>(1).FloatValue);
      Assert.AreEqual(20f, tree[1].Parameter<ValueParam>(1).FloatValue);
      var sum = tree[2].Parameter<ExprParam>(1);
      Assert.IsNotNull(sum);
      Assert.AreEqual(LogoLexer.Sum, sum.Op);
      for (var i = 0; i < 2; i++)
      {
          var varName = sum.Parameters[i] as VarName;
          Assert.IsNotNull(varName);
          Assert.AreEqual(names[i], varName.Name);
      }
  }
C#

Line 4 includes the LOGO code that will be tested. It consists of three make commands, the first two defining values for the variables a and b, and the third one defining the variable c as the sum of the previous two. Notice that I use the two distinct syntaxes to access the value of a variable. This way we test both syntaxes in a single test case. The ToAst extension method, defined in the TestUtils.cs file, constructs an ANTLR parser and runs the string through it. The result is an INode that we cast to Program, so we can access the individual commands.

Lines 5 and 6 guarantee the parser returns a valid program, and that the program has three commands. Lines 8 to 12 test that every command is an make instruction, and check the name of the variable being assigned. Lines 14 and 15 validate the second parameter of the two first assignments. The remainder lines test the sum for the third assignment. We cast the first parameter as an ExprParam object, and then validate it is a sum, and that both operands are of type VarName.

For this code to work we need to perform two changes:

  1. In the file AST/Command.cs, the accessor Parameter<T> needs to be public: public T Parameter<T>(int i)
  2. In the file AST/ExprParam, the field _op should be made public (and thus, renamed to Op).
  3. In this same file, add an accessor to access the expression parameters. I decided to implement this accessor by returning a read-only collection, guaranteeing no one changes it: public ReadOnlyCollection<Parameter> Parameters => _parameters.AsReadOnly(); To have access to the ReadOnlyCollection type you need to import using System.Collections.ObjectModel;

We should have tests that validate atomic features, or at least features as small as possible. With that in mind, I added a new test to LogoTests/ExpressionsParser.cs that validates that variables are correctly stored in the symbol table:

LogoTests/ExpressionsParser.cs
  [TestMethod]
  public void Variable()
  {
      var tree = @"MAKE ""a 10 MAKE ""b 20 MAKE ""c thing ""a + :b".ToAst();
      var turtle = new Turtle();
      tree.Execute(turtle);
      Assert.IsTrue(turtle.RetrieveVariable("a", out var val1));
      Assert.AreEqual(10, val1);
      Assert.IsTrue(turtle.RetrieveVariable("b", out var val2));
      Assert.AreEqual(20, val2);
      Assert.IsTrue(turtle.RetrieveVariable("c", out var val3));
      Assert.AreEqual(30, val3);
  }
C#

Making sure that every change is properly documented is hard. If you find any information lacking, please contact me. I will be happy to add the proper explanation to the respective issue of this tutorial. Knowing anyone is reading this and that this is being useful, will make my day.

Leave a Reply