TEI – Well Done!

I will not go into any detail about TEI. Sorry. I would just like to let you know that every time I need to work with any TEI subset, I find myself amazed by the quality of its documentation and by the details its authors thought about before writing the standard.

Sometimes I catch myself thinking… do I really need all this stuff? The usual answer is no, I do not need so much detail in my annotations.

But that doesn’t mean I should not use TEI. Probably I should look at the section about the items I am trying to annotate and meditate on it. Probably I will not need the number of different tags and details that TEI defines. But I am almost sure I will find one or two that I had not thought about. Then I can use the portion of TEI I really want and forget about the rest. Probably my document will not validate against TEI, but it will not be too far off. And if someone else looks at the document, she will probably understand it. And if she doesn’t, I can always point to the TEI documentation and say: I am not using all of it, just the subset I thought to be relevant.

Where am I using TEI? You can see it being used in the Dicionário-Aberto project, where the dictionary is encoded in a TEI subset. I am also looking at the TEI header and filtering it, so that it can be offered as an option for annotating documents in a parallel corpora project.

Dicionário-Aberto RESTful API

Over the last few days I have been developing a RESTful API for Dicionário-Aberto. Although Dicionário-Aberto is a dictionary of the Portuguese language, and therefore most users are Portuguese, I prefer to describe the API here in English.

You can use either JSON or XML. The URLs are the same: you can set the Accept HTTP header to request XML or JSON documents, or you can add .json or .xml to the URL, as shown in the examples below.

To search for a specific entry, say cavalo, use http://dicionario-aberto.net/search/cavalo.xml or http://dicionario-aberto.net/search/cavalo.json
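For anyone calling the API from a script, here is a minimal Python sketch of the two ways of choosing the format. The function names are mine, not part of the API. I am also assuming that the extension-less URL is the one to use with the HTTP header, that application/json and application/xml are the media types the server expects, and that accented words should be percent-encoded as UTF-8.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://dicionario-aberto.net/search"


def entry_url(word, fmt=None):
    """Build the search URL for a word.

    fmt is 'xml', 'json', or None (no extension, format chosen by header).
    """
    if fmt not in (None, "xml", "json"):
        raise ValueError("fmt must be 'xml', 'json', or None")
    # Assumption: non-ASCII letters are sent percent-encoded as UTF-8.
    path = quote(word, safe="")
    suffix = "" if fmt is None else "." + fmt
    return f"{BASE_URL}/{path}{suffix}"


def entry_request(word, fmt="json"):
    """Build a request that selects the format through the Accept header."""
    # Assumption: these media types are the ones the server recognises.
    media_types = {"json": "application/json", "xml": "application/xml"}
    return Request(entry_url(word), headers={"Accept": media_types[fmt]})
```

Passing the result of entry_request("cavalo") to urllib.request.urlopen would send the request; I leave the parsing out because the structure of the response is not final yet.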

I know the JSON is not very readable, but it is generated automatically from the XML. The XML is a subset of the TEI standard for dictionaries. Unfortunately this is not yet the final annotated version, but the main structure will not change.

If you would like to build a suggestions box, you can get ten words starting with a specific sequence of letters. The next examples search for words starting with cav: http://dicionario-aberto.net/search/cav.xml?list=1 and http://dicionario-aberto.net/search/cav.json?list=1
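A suggestions box only needs to rebuild that URL every time the user types a letter. Here is a small Python sketch of that step. The names suggestions_url and fetch_suggestions are hypothetical helpers of mine, and the fetch function returns the raw response text because I am not assuming anything about its structure.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

SEARCH_URL = "http://dicionario-aberto.net/search"


def suggestions_url(prefix, fmt="json"):
    """Build the URL that lists words starting with the given prefix."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    # list=1 switches the search from a single entry to a list of words.
    query = urlencode({"list": 1})
    return f"{SEARCH_URL}/{quote(prefix, safe='')}.{fmt}?{query}"


def fetch_suggestions(prefix, fmt="json", timeout=10):
    """Download the suggestion list and return it as undecoded-structure text.

    Assumption: the server answers in UTF-8.
    """
    with urlopen(suggestions_url(prefix, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, suggestions_url("cav", "xml") gives back the first URL shown above.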

If you use this API, please give me some feedback on how I can make it better, or on what applications you develop, so I can advertise them on the Dicionário-Aberto main page.