I have been using the Text Encoding Initiative (TEI) Guidelines to encode dictionaries. I used them originally in Dicionário Aberto, and more recently in work on the Portuguese Dictionary of the Lisbon Academy of Sciences.

Last week I started teaching a course on Digital Lexicography (do not ask what that is, it is just the best name we could find), and I started running OCR on the 1925 Caldas Aulete dictionary, then transcribing and annotating it.

In my previous uses of TEI, I never gave much thought to how abbreviations were encoded. I just used them. Nothing fancy. This time, I decided to include the abbreviation expansions somewhere in the document.

When you look up how to encode an abbreviation and its expansion, the Guidelines suggest the following approach:

<choice>
   <abbr>s.</abbr>
   <expan>singular</expan>
</choice>

But as far as the TEI examples go, this has to be done for every single occurrence of the abbreviation, stating that, at that specific point, there are two alternative ways to encode the text.

While this can be useful in a text where an abbreviation is used once or twice, it is not a good approach for something that is repeated thousands of times throughout the document. I suspect the better approach (and that is what I am doing at the moment) is to include a list of all abbreviations somewhere, just to have that information encoded, and to use the bare abbreviation in the rest of the document. At the moment I am not linking each occurrence to its entry in the list with XPointer or XML IDs. I just use the abbreviations, since I can add that information later, programmatically.
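As a rough illustration of that later, programmatic step, here is a minimal Python sketch using only the standard library. Everything about the layout is an assumption made for the example: the `<list type="abbreviations">` container, the `<item>` pairing of `<abbr>` and `<expan>`, the sample entry, and the second abbreviation (`adj.`) are all hypothetical, and the TEI namespace is left out for brevity. It reads the abbreviation list once and then wraps each known `<abbr>` in the body in a `<choice>` with its expansion.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The list layout and the "adj." entry are
# assumptions for illustration; the TEI namespace is omitted for brevity.
SAMPLE = """<text>
  <front>
    <list type="abbreviations">
      <item><abbr>s.</abbr> <expan>singular</expan></item>
      <item><abbr>adj.</abbr> <expan>adjectivo</expan></item>
    </list>
  </front>
  <body>
    <entry><form>casa</form> <abbr>s.</abbr> <abbr>xyz.</abbr></entry>
  </body>
</text>"""


def load_expansions(root):
    """Build an {abbreviation: expansion} table from the abbreviation list."""
    table = {}
    for item in root.iterfind(".//list[@type='abbreviations']/item"):
        abbr = item.find("abbr")
        expan = item.find("expan")
        if abbr is not None and expan is not None:
            table[abbr.text] = expan.text
    return table


def expand_in_body(root, table):
    """Wrap each known <abbr> in the body in <choice><abbr/><expan/></choice>.

    Returns the number of abbreviations that were wrapped.
    """
    body = root.find(".//body")
    wrapped = 0
    # Walk a snapshot of the tree so that edits do not disturb the iteration.
    for parent in list(body.iter()):
        if parent.tag == "choice":
            continue  # already expanded, leave it alone
        for index, child in enumerate(list(parent)):
            if child.tag != "abbr" or child.text not in table:
                continue  # unknown abbreviations are left untouched
            choice = ET.Element("choice")
            # Keep the text that followed the abbreviation after the new element.
            choice.tail = child.tail
            child.tail = None
            parent.remove(child)
            parent.insert(index, choice)
            choice.append(child)
            expan = ET.SubElement(choice, "expan")
            expan.text = table[child.text]
            wrapped += 1
    return wrapped


root = ET.fromstring(SAMPLE)
table = load_expansions(root)
count = expand_in_body(root, table)
print(count, "abbreviation(s) expanded")
print(ET.tostring(root, encoding="unicode"))
```

Only the body is rewritten, so the list itself stays as the single place where expansions are declared. The same loop could just as well add an attribute pointing back to the list entry instead of wrapping the abbreviation, if linking by ID turns out to be preferable.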

But this is not the only example of this kind of thing in TEI. I would really like to discuss these things with my old friend, the late Sebastian Rahtz, who contributed to both TEI and LaTeX; in the latter, I think abbreviations are handled the correct (or at least a better) way.
