Welcome back to my ANTLR tutorial. This part will be focused on the support of variables in LOGO. We will work on the assignment of values to variables and variables evaluation. We will also take the opportunity to split the grammar file into two: the lexer and the grammar parts.
The previous parts of this tutorial: I, II, III, IV and V. Also, this post explains adding a GitHub Action to this tutorial repository.
If you are following each step of this tutorial, note that, in the last part, there is some extra code that was written without being properly described in the post. It also includes some new files. In this part of the tutorial, we will need to edit some of those files, and use some of the code that was not properly discussed. I will try to add comments when needed to make it easier to understand.
Splitting the Grammar
Splitting the grammar into two files does not bring a lot of advantages. But it can make the code easier to maintain, making it clear what are lexer rules and parsing rules, and it would also allow us to extend the default ANTLR lexer and parser classes. Nevertheless, for our specific case, we will only take advantage of making the files easier to maintain.
We will start by replacing the Logo.g4
file with the LogoParser.g4
and LogoLexer.g4
files. The parser file is very similar to the original file we had, but just with the grammar:
parser grammar LogoParser;
options { tokenVocab=LogoLexer; }
program : command+ EOF
;
command : simpleCommand
| SetPos squarePoint
| SetXY simplePoint
| Home
| Arc expr expr
;
simpleCommand : cmd=(Right|Left|Forward|Back|SetX|SetY|SetH) expr ;
simplePoint : expr expr
;
squarePoint : '[' expr expr ']'
;
expr : cmd=(Sum|Difference|Product|
Quotient|Power|Remainder|Modulo) expr expr #prefixBinaryOp
| op=(Minus|MinusSign) expr #unaryMinus
| fun=(Abs|Int|Round|Sqrt|Exp|Log10|Ln|Arctan|Sin|
Cos|Tan|Radarctan|Radsin|Radcos|Radtan) expr #arithFuncs
| '(' fun=(Arctan|Radarctan) expr expr ')' #arithFuncs
| '(' Sum expr+ ')' #summation
| '(' Product expr+ ')' #product
| '(' Quotient expr ')' #quotient
| <assoc=right> expr op='^' expr #binaryOp
| <assoc=left> expr op=('*'|'/'|'%') expr #binaryOp
| <assoc=left> expr op=('+'|'-') expr #binaryOp
| value #scalar
| '(' expr ')' #scalar
;
value : IntegerValue
| RealValue
;
PlaintextNote the change on the first line, which now specifies parser grammar
instead of just grammar
. Also, line 3 adds an annotation, that points to the lexer file. Other than that, everything was kept with a change.
Now, for the lexer, we will also replace the first line with lexer grammar
. Lines 3 to 12 are also new, defining tokens for the symbols we use directly in the grammar. This is required because the lexer is generated by this LogoLexer.g4
specification. If you prefer, you can replace the references to those characters in the file above, and use the defined symbols. Nevertheless, I like it more to use the characters directly, and they make the grammar easier to read.
lexer grammar LogoLexer;
RightSqBracket : ']' ;
LeftSqBracket : '[' ;
MinusSign : '-' ;
PlusSign : '+' ;
AsteriskSign : '*' ;
SlashSigh : '/' ;
LeftParen : '(' ;
RightParen : ')' ;
CircunflexSign : '^' ;
PercentSign : '%' ;
Minus : M I N U S ;
Power : P O W E R ;
Quotient : Q U O T I E N T ;
Product : P R O D U C T ;
Difference : D I F F E R E N C E ;
Sum : S U M ;
Remainder : R E M A I N D E R ;
Modulo : M O D U L O ;
Abs : A B S ;
Int : I N T ;
Round : R O U N D ;
Sqrt : S Q R T ;
Exp : E X P ;
Log10 : L O G '10' ;
Ln : L N ;
Arctan : A R C T A N ;
Sin : S I N ;
Cos : C O S ;
Tan : T A N ;
Radarctan : R A D A R C T A N ;
Radsin : R A D S I N ;
Radcos : R A D C O S ;
Radtan : R A D T A N ;
Right : R I G H T | R T ;
Forward : F O R W A R D | F D ;
Arc : A R C ;
Home : H O M E ;
Back : B A C K | B K ;
Left : L E F T | L T ;
SetX : S E T X ;
SetY : S E T Y ;
SetXY : S E T X Y ;
SetPos : S E T P O S ;
SetH : S E T H ( E A D I N G )? ;
IntegerValue : [0-9]+ ;
RealValue : [0-9]* '.' [0-9]+ ;
White : [ \n\t\r] -> skip;
fragment A : [Aa] ;
fragment B : [Bb] ;
fragment C : [Cc] ;
fragment D : [Dd] ;
fragment E : [Ee] ;
fragment F : [Ff] ;
fragment G : [Gg] ;
fragment H : [Hh] ;
fragment I : [Ii] ;
fragment J : [Jj] ;
fragment K : [Kk] ;
fragment L : [Ll] ;
fragment M : [Mm] ;
fragment N : [Nn] ;
fragment O : [Oo] ;
fragment P : [Pp] ;
fragment Q : [Qq] ;
fragment R : [Rr] ;
fragment S : [Ss] ;
fragment T : [Tt] ;
fragment U : [Uu] ;
fragment V : [Vv] ;
fragment W : [Ww] ;
fragment X : [Xx] ;
fragment Y : [Yy] ;
fragment Z : [Zz] ;
PlaintextTo make this change work we need to perform some more changes:
- In the
AST/TreeVisitor.cs
file, we need to replace the base class. Now, it should be:public class TreeVisitor : LogoParserBaseVisitor<INode>
- In the
Logo2Svg.csproj
file, all references toLogoVisitor
should be replaced byLogoParserVisitor
andLogoBaseVisitor
should be replaced byLogoParserBaseVisitor
. The line with the parameters for theBuildAntlr
task, the attributeFiles
needs to be updated:Files="$(GrammarPath)/LogoLexer.g4 $(GrammarPath)/LogoParser.g4"
This should be enough to have the project compiling again, now with two separate files.
Variables in LOGO
As usual, to implement new features in a language we need a couple of steps:
- Update the lexer and grammar files to support the new tokens and syntax;
- Add the required code to create the abstract syntax tree;
- Add the code to properly evaluate the abstract syntax tree with the new features;
- And optionally, but highly suggested, add proper unit testing.
Do not forget to look into the LOGO documentation to learn about variables. We will implement the name
, make
, show
and thing
commands.
Grammar and Lexer Changes
In the lexer, we need to support the new commands, name
, make
, show
and thing
, as well as the rule to recognize a variable (and the shorthand for using thing
). Add this code anywhere, just be sure it appears before the fragment lines.
Variable : '"' [a-zA-Z0-9_]+ { Text = Text.Substring(1); };
VariableRef : ':' [a-zA-Z0-9_]+ { Text = Text.Substring(1); };
Make : M A K E ;
Name : N A M E ;
Thing : T H I N G ;
Show : S H O W ;
PlaintextLines 1 and 2 recognize a variable, represented in LOGO as "varname
and a shorthand for thing "varname
represented as :varname
. The usage of a prefix (the quote or the colon) allows variable names to be disambiguated from numbers and other commands. Thus, we define the regular expression to allow any sequence of letters, uppercase or lowercase, digits and the underscore character.
These two rules have a code block, between curly braces. In this case, we are changing the text that is sent from the lexer to the grammar, removing the first character. As the lexer sends an object to the parser, and that object has a type, removing the quote or colon does not lose any information. And it gets easier to use for the visitor.
In the grammar, two rules need changes. One of them is the expr
rule, which should now support variables too. Note that, accordingly with the LOGO reference, you can refer to a variable using :varname
or thing "varname
:
expr : cmd=(Sum|Difference|Product|
Quotient|Power|Remainder|Modulo) expr expr #prefixBinaryOp
| op=(Minus|MinusSign) expr #unaryMinus
| fun=(Abs|Int|Round|Sqrt|Exp|Log10|Ln|Arctan|Sin|
Cos|Tan|Radarctan|Radsin|Radcos|Radtan) expr #arithFuncs
| '(' fun=(Arctan|Radarctan) expr expr ')' #arithFuncs
| '(' Sum expr+ ')' #summation
| '(' Product expr+ ')' #product
| '(' Quotient expr ')' #quotient
| <assoc=right> expr op='^' expr #binaryOp
| <assoc=left> expr op=('*'|'/'|'%') expr #binaryOp
| <assoc=left> expr op=('+'|'-') expr #binaryOp
| value #scalar
| '(' expr ')' #scalar
| VariableRef #variable
| Thing Variable #variable
;
PlaintextIn this rule only lines 15 and 16 are new. They refer to variable references, using the Thing
command or the colon shortcut.
For the new commands, we will add them to the Command
rule. I took the opportunity to add different names to each rule, to make visitors smaller, with fewer conditionals. That will result in a bigger difference in the code below, but it is worth it.
command : simpleCommand #basicCommand
| SetPos squarePoint #setPosition
| SetXY simplePoint #setPosition
| Home #home
| Arc expr expr #arc
| Make Variable expr #setVariable
| Name expr Variable #setVariable
| Show expr #show
;
PlaintextOther than the rule names, lines 6 to 8 are new. Lines 6 and 7 show the two different syntaxes to define a variable. Line 8 introduces the command to show an expression value in the console.
Grammar Visitors
Before writing the visitor code, we need to create a new data structure, to store information about when a variable is used inside an expression. For that, create a file named AST/VarName.cs
with the following code:
namespace Logo2Svg.AST;
public class VarName : Parameter
{
public readonly string Name;
public VarName(string varName) => Name = varName;
public override float Value(Turtle turtle)
=> turtle.RetrieveVariable(Name, out var expr) ? expr : 0f;
public override string ToString() => $@"""{Name}";
}
C#This class defines a string to store the variable name. It is public to make it easier to write tests. The constructor sets the variable name, the stringification just writes the variable name. The evaluator function takes advantage of a new method, named RetrieveVariable
, which will be implemented in the turtle and will allow us to track the values of variables whenever needed. This data structure is usually named as a symbol table.
Let’s take the opportunity to implement the symbol table. Open the turtle file, and add the following code:
private readonly Dictionary<string, float> _symbolTable = new();
public void DefineVariable(string varName, float value) => _symbolTable[varName] = value;
public bool RetrieveVariable(string varName, out float value)
=> _symbolTable.TryGetValue(varName, out value);
C#The symbol table maps a variable (that is a string) to its current value (a float value). To store a variable we just need its name and value. And, to get its value from the symbol table, we just need the variable name. Easy.
For the visitors, open the TreeVisitor.cs
file and remove the VisitCommand
method. We will start by replacing it with the new independent visitors:
public override INode VisitHome([NotNull] LogoParser.HomeContext context)
{
return new Command(LogoLexer.Home, "Home");
}
public override INode VisitSetPosition(LogoParser.SetPositionContext context)
{
var (command, param) = context.SetXY() is { } setXyCtx?
(setXyCtx.Symbol, Visit<Parameter>(context.simplePoint())) :
(context.SetPos().Symbol, Visit<Parameter>(context.squarePoint()));
return new Command(command.Type, command.Text, param);
}
public override INode VisitArc(LogoParser.ArcContext context)
{
var parameters = context.expr().Select(Visit<Parameter>).ToArray();
return new Command(LogoLexer.Arc, "Arc", parameters);
}
C#From the older visitor, we removed the code that visits the simpleCommand
as it just returns the value of the subrule. When we have such visitor methods, which just return the result of visiting their only child, we can just ignore them and not write the visitor at all. For the other, the code is quite straightforward. Just note that I changed the VisitSetPosition
method to use a conditional ternary operator, and I take advantage of the tuple syntax to set two different variables simultaneously.
The visitor for the command show
is also trivial:
public override INode VisitShow(LogoParser.ShowContext context)
=> new Command(LogoLexer.Show, "show", Visit<Parameter>(context.expr()));
C#The visitor for setting a variable value is not too complex, either. We just need to create the variable object in the visitor, as Variable
is a terminal symbol:
public override INode VisitSetVariable(LogoParser.SetVariableContext context)
{
var varName = context.Variable().GetText();
var value = Visit<Parameter>(context.expr());
return new Command(LogoLexer.Make, "make", new VarName(varName), value);
}
C#Note that in this visitor we are not distinguishing the make
and name
commands. They work the same way. They just have the parameters in a different order. Nevertheless, as the types of these parameters are different, we do not need to handle the two rules independently.
Finally, the usage of a variable in an expression (with the Thing command or its shortcut) also needs its visitor:
public override INode VisitVariable(LogoParser.VariableContext context)
=> new VarName(context.Variable()?.GetText() ?? context.VariableRef()?.GetText());
C#In this case, if the user utilizes the shorthand, the variable will be written as :varname
, but when using the long version, it will be written as "varname
. Thus, we can’t merge the two usages in a single rule, as their terminal symbols are different. Nevertheless, as we just need its value, the code to check if any of them is null can be written easily.
Evaluation
The evaluation of the new commands is easy to implement as well. For the show
and the make
commands we just need the following code in the AST/Command.cs
file, under the switch statement in the Execute
method:
case LogoLexer.Show:
{
var value = Parameter(0).Value(turtle);
Console.WriteLine(value);
break;
}
case LogoLexer.Make:
{
turtle.DefineVariable(Parameter<VarName>(0).Name, Parameter(1).Value(turtle));
break;
}
C#For the show
command, the parameter is evaluated and printed to the console. For the make
command, the parameter is evaluated and stored in the symbol table. As simple as it gets.
Note that the evaluation of the variables in the expressions was already implemented when we wrote the VarName.cs
class.
Testing
We will implement two tests for the expressions. The first one will be in the file LogoTests/AstTests.cs
. This file was not created during the previous tutorials. It was added offline by me, with some tests on the validation of the tree created by the parser and the visitors. To make it clear how these tests work, in this issue of the tutorial we will implement a test to validate the structure of the abstract syntax tree for the make
command and the usage of variables:
[TestMethod]
public void VariableThings()
{
var tree = @"MAKE ""a 10 MAKE ""b 20 MAKE ""c thing ""a + :b".ToAst() as Program;
Assert.IsNotNull(tree);
Assert.AreEqual(3, tree.Count);
var names = new[] {"a", "b", "c"};
for (var i = 0; i < 3; i++)
{
Assert.AreEqual(LogoLexer.Make, tree[i].Id);
Assert.AreEqual(names[i], (tree[i].Params[0] as VarName)?.Name);
}
Assert.AreEqual(10f, tree[0].Parameter<ValueParam>(1).FloatValue);
Assert.AreEqual(20f, tree[1].Parameter<ValueParam>(1).FloatValue);
var sum = tree[2].Parameter<ExprParam>(1);
Assert.IsNotNull(sum);
Assert.AreEqual(LogoLexer.Sum, sum.Op);
for (var i = 0; i < 2; i++)
{
var varName = sum.Parameters[i] as VarName;
Assert.IsNotNull(varName);
Assert.AreEqual(names[i], varName.Name);
}
}
C#Line 4 includes the LOGO code that will be tested. It consists of three make
commands, the first two defining values for the variables a and b, and the third one defining the variable c as the sum of the previous two. Notice that I use the two distinct syntaxes to access the value of a variable. This way we test both syntaxes in a single test case. The ToAst
extension method, defined in the TestUtils.cs
file, constructs an ANTLR parser and runs the string through it. The result is an INode
that we cast to Program
, so we can access the individual commands.
Lines 5 and 6 guarantee the parser returns a valid program, and that the program has three commands. Lines 8 to 12 test that every command is an make
instruction, and check the name of the variable being assigned. Lines 14 and 15 validate the second parameter of the two first assignments. The remainder lines test the sum for the third assignment. We cast the first parameter as an ExprParam
object, and then validate it is a sum, and that both operands are of type VarName
.
For this code to work we need to perform two changes:
- In the file
AST/Command.cs
, the accessorParameter<T>
needs to be public:public T Parameter<T>(int i)
- In the file
AST/ExprParam
, the field_op
should be made public (and thus, renamed toOp
). - In this same file, add an accessor to access the expression parameters. I decided to implement this accessor by returning a read-only collection, guaranteeing no one changes it:
public ReadOnlyCollection<Parameter> Parameters => _parameters.AsReadOnly();
To have access to theReadOnlyCollection
type you need to importusing System.Collections.ObjectModel;
We should have tests that validate atomic features, or at least features as small as possible. With that in mind, I added a new test to LogoTests/ExpressionsParser.cs
that validates that variables are correctly stored in the symbol table:
[TestMethod]
public void Variable()
{
var tree = @"MAKE ""a 10 MAKE ""b 20 MAKE ""c thing ""a + :b".ToAst();
var turtle = new Turtle();
tree.Execute(turtle);
Assert.IsTrue(turtle.RetrieveVariable("a", out var val1));
Assert.AreEqual(10, val1);
Assert.IsTrue(turtle.RetrieveVariable("b", out var val2));
Assert.AreEqual(20, val2);
Assert.IsTrue(turtle.RetrieveVariable("c", out var val3));
Assert.AreEqual(30, val3);
}
C#
Making sure that every change is properly documented is hard. If you find any information lacking, please contact me. I will be happy to add the proper explanation to the respective issue of this tutorial. Knowing anyone is reading this and that this is being useful, will make my day.