Abbreviations in TEI

I have been using the Text Encoding Initiative (TEI) Guidelines to encode dictionaries. I first used them in Dicionário Aberto and, more recently, in work on the Portuguese Dictionary of the Lisbon Science Academy.

Last week I started teaching a course on Digital Lexicography (do not ask what that is, it is just the best name we could find), and I started running OCR on the 1925 Caldas Aulete dictionary, then transcribing and annotating it.

In my previous uses of TEI, I never gave much thought to how abbreviations are encoded. I just used them. Nothing fancy. This time, I decided to include the abbreviation expansions somewhere in the document.

When looking up how to encode an abbreviation and its expansion, the following approach is suggested:

<choice>
   <abbr>s.</abbr>
   <expan>singular</expan>
</choice>

But as far as the TEI examples go, this has to be done for every single occurrence of the abbreviation, stating that, at that specific point, there are two alternative ways to encode the same information.

While this can be useful in a text where an abbreviation is used once or twice, it is not a good approach for something that is repeated thousands of times throughout the document. I suspect the better approach (and it is what I am doing at the moment) is to include a list of all abbreviations somewhere, just to have that information encoded, and to use the bare abbreviation in the rest of the document. For now I am not linking the occurrences to the list with XPointer or XML IDs. I just use the abbreviations, since I can add that information later, programmatically.
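To give an idea of what "later, programmatically" could look like, here is a minimal Python sketch. It is not what I actually run: the shape of the abbreviation list (a `list` of `item` elements, each holding an `abbr` and an `expan`), the `pl.` entry, and the function names `load_expansions` and `expand_abbreviations` are all assumptions made up for the illustration, and XML namespaces are ignored for brevity (real TEI elements live in the TEI namespace).

```python
import xml.etree.ElementTree as ET

# Hypothetical central list of abbreviations, kept once in the document.
# The element layout is an assumption for this sketch, not a TEI recommendation.
ABBREVIATIONS = """
<list type="abbreviations">
  <item><abbr>s.</abbr><expan>singular</expan></item>
  <item><abbr>pl.</abbr><expan>plural</expan></item>
</list>
"""


def load_expansions(list_xml):
    """Build a dict mapping each abbreviation to its expansion."""
    root = ET.fromstring(list_xml.strip())
    return {
        item.findtext("abbr").strip(): item.findtext("expan").strip()
        for item in root.iter("item")
    }


def expand_abbreviations(root, table):
    """Wrap every known bare <abbr> in <choice><abbr/><expan/></choice>."""
    # Snapshot the tree first, because elements are replaced while looping.
    for parent in list(root.iter()):
        if parent.tag == "choice":
            continue  # this abbreviation is already paired with an expansion
        for index, child in enumerate(list(parent)):
            if child.tag != "abbr":
                continue
            expansion = table.get((child.text or "").strip())
            if expansion is None:
                continue  # unknown abbreviation: leave it untouched
            choice = ET.Element("choice")
            # Text following the abbreviation must now follow <choice>.
            choice.tail = child.tail
            child.tail = None
            parent[index] = choice
            choice.append(child)
            ET.SubElement(choice, "expan").text = expansion
    return root


entry = ET.fromstring("<entry><gram><abbr>s.</abbr> m.</gram></entry>")
expand_abbreviations(entry, load_expansions(ABBREVIATIONS))
print(ET.tostring(entry, encoding="unicode"))
# <entry><gram><choice><abbr>s.</abbr><expan>singular</expan></choice> m.</gram></entry>
```

The point of the sketch is only that the expansion is stored once and the verbose form can be generated whenever it is needed. Adding XML IDs and pointers instead of a full `<choice>` would follow the same pattern.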

But this is not the only example of this kind of thing in TEI. I would really like to discuss these things with my old friend, the late Sebastian Rahtz, who contributed to both TEI and LaTeX. In the latter, I think abbreviations are handled the correct (or at least a better) way.

TEI – Well Done!

I will not detail anything about TEI. Sorry. I would just like to let you know that every time I need to work with any TEI subset, I am amazed by the quality of its documentation and by the details its authors thought about before writing the standard.

Sometimes I catch myself thinking… do I really need all this stuff? The usual answer is no, I do not need so much detail in my annotations.

But that doesn’t mean I should not use TEI. I should probably look at the section about the items I am trying to annotate and reflect on it. I will probably not need all the different tags and details that TEI defines, but I am almost sure I will find one or two that I had not thought about. Then I can use the portion of TEI I really want and forget about the rest. My document will probably not validate against TEI, but it will not be too far off. And if someone else looks at the document, she will probably understand it. If she doesn’t, I can always point to the TEI documentation and say: I am not using all of it, just the subset I thought relevant.

Where am I using TEI? You can see it in the Dicionário-Aberto project, where the dictionary is encoded in a TEI subset. I am also looking at the TEI header and filtering it down, to offer it as an option for annotating documents in a parallel corpora project.