For years, my preferred approach to processing XML documents has been a down-translation processor: a set of methods that start processing the DOM at the leaves and compute values that are propagated to their parents, and so on, until the root node of the tree is reached. If this seems strange, it should become clearer with an example.
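To make the idea concrete before introducing the library, here is a minimal sketch of leaf-to-root processing written directly against System.Xml.Linq. It is not the NetXmlDt implementation: the handlers dictionary, the Process function and the tiny input document are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical handlers keyed by tag name: each one receives the tag name and
// the already processed content of its children, and returns the node's value.
var handlers = new Dictionary<string, Func<string, string, string>>
{
    ["b"] = (tag, content) => $"<strong>{content}</strong>",
    ["p"] = (tag, content) => $"<div>{content}</div>",
};

// Children are processed before their parent (post-order), and their results
// are concatenated before the parent's handler runs.
string Process(XNode node)
{
    if (node is XText text) return text.Value;
    if (node is not XElement el) return "";

    var content = string.Concat(el.Nodes().Select(Process));
    var tag = el.Name.LocalName;
    return handlers.TryGetValue(tag, out var handler)
        ? handler(tag, content)
        : content; // no handler: keep only the processed content
}

var doc = XElement.Parse("<p>Hello <b>down</b>-<i>translation</i></p>");
var result = Process(doc);
Console.WriteLine(result);
// <div>Hello <strong>down</strong>-translation</div>
```

The i element has no handler, so only its content survives, which is roughly the role a default handler plays in the examples that follow.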

This way of processing XML documents was first introduced to me by the author of the Perl module XML::DT (JJoão). With time I got my hands on this Perl module, and I am currently a co-maintainer. I also wrote a JavaScript version of the algorithm, although I used it for a single project, and given that I am not a JavaScript user, the library is mostly dead. I did not publicize it much, so I do not think many people know about it. More recently, JJoão and I wrote a version of this algorithm for Python. This one is available on PyPI, but I am sure we are its only users. Nevertheless, it was an interesting challenge to express the original idea in an object-oriented fashion and try to stick to most Python principles (although I know we deliberately ignored some of them).

As the title of this post implies, I decided to implement this same algorithm for C#. For now, I have it only in my repository of projects to blog about. It is not yet finished. Only the basic behaviour is available, and there are some challenges to deal with (things that languages like Python, Perl or JavaScript are more permissive about, thanks to their lack of static types). If someone shows interest in this code, I will be happy to move it to a repository of its own and start accepting contributions (and even prepare a NuGet package for it).

As an example, I will use an XSPF file (XML Shareable Playlist Format). I asked ChatGPT to generate one, and it gave me an example featuring Queen’s music. One extra point for ChatGPT for that. The full XML file is shown below.

XSPF file example

XML
<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
  <title>Real Music Playlist</title>
  <creator>Your Name</creator>
  <date>2024-08-20T12:00:00</date>
  <trackList>
    <track>
      <location>https://example.com/music/bohemian_rhapsody.mp3</location>
      <title>Bohemian Rhapsody</title>
      <creator>Queen</creator>
      <album>A Night at the Opera</album>
      <duration>354000</duration>
    </track>
    <track>
      <location>https://example.com/music/stairway_to_heaven.mp3</location>
      <title>Stairway to Heaven</title>
      <creator>Led Zeppelin</creator>
      <album>Led Zeppelin IV</album>
      <duration>482000</duration>
    </track>
    <track>
      <location>https://example.com/music/dont_stop_me_now.mp3</location>
      <title>Don't Stop Me Now</title>
      <creator>Queen</creator>
      <album>Jazz</album>
      <duration>210000</duration>
    </track>
    <track>
      <location>https://example.com/music/imagine.mp3</location>
      <title>Imagine</title>
      <creator>John Lennon</creator>
      <album>Imagine</album>
      <duration>183000</duration>
    </track>
    <track>
      <location>https://example.com/music/hotel_california.mp3</location>
      <title>Hotel California</title>
      <creator>Eagles</creator>
      <album>Hotel California</album>
      <duration>390000</duration>
    </track>
    <track>
      <location>https://example.com/music/smells_like_teen_spirit.mp3</location>
      <title>Smells Like Teen Spirit</title>
      <creator>Nirvana</creator>
      <album>Nevermind</album>
      <duration>301000</duration>
    </track>
    <track>
      <location>https://example.com/music/billie_jean.mp3</location>
      <title>Billie Jean</title>
      <creator>Michael Jackson</creator>
      <album>Thriller</album>
      <duration>294000</duration>
    </track>
    <track>
      <location>https://example.com/music/comfortably_numb.mp3</location>
      <title>Comfortably Numb</title>
      <creator>Pink Floyd</creator>
      <album>The Wall</album>
      <duration>384000</duration>
    </track>
    <track>
      <location>https://example.com/music/another_one_bites_the_dust.mp3</location>
      <title>Another One Bites the Dust</title>
      <creator>Queen</creator>
      <album>The Game</album>
      <duration>216000</duration>
    </track>
    <track>
      <location>https://example.com/music/come_together.mp3</location>
      <title>Come Together</title>
      <creator>The Beatles</creator>
      <album>Abbey Road</album>
      <duration>259000</duration>
    </track>
  </trackList>
</playlist>
XML

As you can see, we have a header section with metadata about the playlist. Then we have each of the playlist items, with its location, title, creator, album and duration. I will use this file as an example, and we will write a couple of simple processors using the down-translation approach.

Example 1: Conversion to HTML

The first example is not something we need nowadays, as we can include XML documents directly in a web page and format them using CSS. But forget that for now and imagine us in the old days, when browsers accepted only HTML tag names.

The processor script has the following code:

Xspf2Html.cs
using NetXmlDt;

namespace Samples;

public class Xspf2Html : NetXmlDtProcessor
{
    protected override object Default_(Element element) => "";

    public string title(Element el)
    {
        el.TagName = InPath_("track") ? "td" : "h1";
        return el.ToXml();
    }

    public string creator(Element el) => InPath_("track") ? el.SetTag("td").ToXml() : "";

    public string trackList(Element el) => el.SetTag("table").ToXml();

    public string track(Element el) => el.SetTag("tr").ToXml();

    public string album(Element el) => el.SetTag("td").ToXml();

    public string playlist(Element el) => el.SetTag("html").ToXml();
}
C#

This processor inherits from the NetXmlDtProcessor class. Except for the Default_ method, all the other methods are named after tags. They receive the element being processed as a parameter and should return the result of processing that node. Note that, when a handler is called, the Element parameter includes the tag name being processed, its attributes and, more importantly, its content after being processed. For example, the track method is invoked only after the title, creator and album methods (and the default handler, for location and duration) have been called. As we are generating strings, the default aggregation method is string concatenation. Thus, the results of processing title, creator, album, etc., are concatenated and fed to the track method.
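The invocation order can be illustrated with a small stand-alone sketch. It uses System.Xml.Linq rather than NetXmlDt, and the Visit function and the visited list are hypothetical names introduced only to record the order in which elements would be handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var visited = new List<string>();

// Post-order walk: recurse into the child elements first, then "handle" the
// element itself, which receives the concatenation of the children's results.
string Visit(XElement el)
{
    var content = el.HasElements
        ? string.Concat(el.Elements().Select(Visit))
        : el.Value;
    visited.Add(el.Name.LocalName);
    return $"[{el.Name.LocalName}:{content}]";
}

var track = XElement.Parse(
    "<track><title>Imagine</title><creator>John Lennon</creator>" +
    "<album>Imagine</album></track>");

var output = Visit(track);
Console.WriteLine(string.Join(" -> ", visited));
// title -> creator -> album -> track
Console.WriteLine(output);
// [track:[title:Imagine][creator:John Lennon][album:Imagine]]
```

By the time track is handled, its content is already the concatenation of whatever its children returned.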

I know this is not very clear yet, but after some usage it becomes straightforward.

Now, looking into the actual code:

  • The Default_ method is called for any element whose handler is not defined. In our example, the duration element does not have a handler, so the Default_ method is called for it. While the underscore at the end is not very friendly, it keeps the name default free, which might be needed as a tag name when processing some XML documents. Note that this method returns an object. In a future post, I will discuss this further.
  • The title and creator methods are quite similar. As these tags are used in the original document in two distinct contexts (the top-level metadata and the data of each track), we need to disambiguate between them. For that, we use the InPath_ method. It tells us whether any ancestor of the current element has the given name. Depending on the context, we change the element tag name and then use its ToXml method, which renders an XML string representing that element.
  • Finally, the other methods are straightforward mappings from the original XML elements to the desired element name.
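The context check described above can be sketched with plain System.Xml.Linq. The InPath function below is a hypothetical stand-in for InPath_, assuming it simply looks for an ancestor with the given name; the library's actual implementation may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for InPath_: true when some ancestor has the given name.
bool InPath(XElement el, string name) =>
    el.Ancestors().Any(a => a.Name.LocalName == name);

var playlist = XElement.Parse(
    "<playlist><title>Real Music Playlist</title>" +
    "<trackList><track><title>Imagine</title></track></trackList></playlist>");

var rendered = new List<string>();
foreach (var title in playlist.Descendants("title"))
{
    // Same tag, two contexts: inside a track it becomes a table cell,
    // at the top level it becomes the page heading.
    var tag = InPath(title, "track") ? "td" : "h1";
    rendered.Add($"<{tag}>{title.Value}</{tag}>");
}

rendered.ForEach(Console.WriteLine);
// <h1>Real Music Playlist</h1>
// <td>Imagine</td>
```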

The main program that uses this processor is written as:

Program.cs
using NetXmlDt;
using Samples;

var fileContents = File.ReadAllText("playlist.xml");
var convert2Html = new NetXmlDt<Xspf2Html>(fileContents);
var result = (string) convert2Html.Dt();
Console.WriteLine(result);
C#

Example 2: Collecting data

This second example collects data into a pair of dictionaries, printing it at the end. First, the code.

XspfStats.cs
using NetXmlDt;

namespace Samples;
public class XspfStats : NetXmlDtProcessor
{
    private readonly Dictionary<string, List<string>> _byCreator = [];
    private readonly Dictionary<string, List<string>> _byAlbum = [];

    public void title(Element el)
    {
        if (InPath_("track"))
            Father_.Attributes["title"] = (string) el.Content;
    }

    public void creator(Element el)
    {
        if (InPath_("track"))
            Father_.Attributes["creator"] = (string) el.Content;
    }

    public void album(Element el) => Father_.Attributes["album"] = (string) el.Content;

    public void track(Element el)
    {
        _byCreator.AddToList(el.Attributes["creator"], el.Attributes["title"]);
        _byAlbum.AddToList(el.Attributes["album"], el.Attributes["title"]);
    }

    public void playlist(Element el)
    {
        Console.WriteLine("Titles by Creator");
        foreach (var creator in _byCreator.Keys)
        {
            Console.WriteLine(creator + " => " + string.Join("; ", _byCreator[creator]));
        }
        Console.WriteLine("Titles by Album");
        foreach (var album in _byAlbum.Keys)
        {
            Console.WriteLine(album + " => " + string.Join("; ", _byAlbum[album]));
        }
    }
}

static class DictionaryAuxMethods
{
    public static void AddToList(this Dictionary<string, List<string>> dict,
                                 string key, string value)
    {
        if (!dict.TryGetValue(key, out var valList))
            dict[key] = [value];
        else
            valList.Add(value);
    }
}
C#

Let’s look into the code, block by block:

  • We create two dictionaries mapping creator and album names into a list of titles. They are filled in during the XML processing.
  • The title and creator methods use the InPath_ method, as before, for the same reasons.
  • The title, creator and album methods use a property named Father_. It allows a handler to manipulate its parent element. In this case, they add an attribute to the parent element. This is useful to propagate information up the tree.
  • The track handler takes this information and stores it in the dictionaries, using the auxiliary method.
  • Finally, the playlist handler just prints the output.

Note that a handler can be void. In that case, you do not need to return a value, which comes in handy.
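The trick of propagating values upwards through attributes on the parent can also be tried outside the library. The sketch below uses System.Xml.Linq only; the Visit function and the shortened track list are assumptions for illustration, with el.Parent playing the role that Father_ plays in the processor above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

var byCreator = new Dictionary<string, List<string>>();

var trackList = XElement.Parse(
    "<trackList>" +
    "<track><title>Bohemian Rhapsody</title><creator>Queen</creator></track>" +
    "<track><title>Imagine</title><creator>John Lennon</creator></track>" +
    "<track><title>Don't Stop Me Now</title><creator>Queen</creator></track>" +
    "</trackList>");

// Post-order walk: leaves copy their text into an attribute on the parent,
// so the parent finds everything it needs once its own turn comes.
void Visit(XElement el)
{
    foreach (var child in el.Elements()) Visit(child);

    switch (el.Name.LocalName)
    {
        case "title":
        case "creator":
            el.Parent?.SetAttributeValue(el.Name.LocalName, el.Value);
            break;
        case "track":
            var creator = el.Attribute("creator")?.Value ?? "";
            var title = el.Attribute("title")?.Value ?? "";
            if (!byCreator.TryGetValue(creator, out var titles))
                byCreator[creator] = titles = new List<string>();
            titles.Add(title);
            break;
    }
}

Visit(trackList);
foreach (var entry in byCreator)
    Console.WriteLine(entry.Key + " => " + string.Join("; ", entry.Value));
```

Each track ends up carrying title and creator attributes set by its children, which is all the track case needs to fill the dictionary.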

The code is available in my blog’s code repository. Note that this project is not finished and is a moving target. At the moment, my latest commit has the following hash: d5e2b3bd70d16cdf62be942414b1e161041b07f2.

Some aspects still annoy me, such as the casts in the middle of the code, or the restriction of returning strings in the handlers. Things will get better with time, and I will write about the developments as they happen. While writing this article, a couple of interesting features were implemented.

As stated above, if you like this way of processing XML documents, please let me know. I would be happy to have collaborators.
