Experimenting with eXist-DB on Docker

I’ve been using eXist-DB for some time in the Dicionário da Academia de Ciências de Lisboa project, in which the dictionary is being converted from PDF into TEI so that a new digital version can be released soon.

Recently I needed to update the server where eXist-DB was running, and decided to switch to a dockerized version. Although that might make things a little slower (I am not really sure), it makes the setup easier to replicate: I can now easily run the dictionary database on my laptop or on the server, using the same code.

I am using the default latest version of the eXist-DB Docker image. The only difference is that, because my XQuery code uses FunctX functions, I needed to install that module. Thus, my Dockerfile consists of:

FROM existdb/existdb:latest

ADD http://exist-db.org/exist/apps/public-repo/public/functx-1.0.1.xar /exist/autodeploy

I keep the data and the application in a Git repository, as exported by the eXist-DB backup tool. Thus, I decided to create a simple script to import the data, instead of building the Docker image with the data already in it. My docker-compose.yml file is therefore:

version: '3.3'
services:
  exist:
    build: ./dacl/docker
    container_name: exist
    ports:
        - 8080:8080
        - 8443:8443
    volumes:
        - ./data:/exist/data
        - ./config:/exist/config
        - ./dacl:/import
        - ./outdir:/export

The relevant parts:

  • The path to the folder including the Dockerfile (dacl/docker)
  • Ports 8080 and 8443 are used by eXist-DB, and I am just forwarding them to the host
  • Four volumes are created: data stores the binary database files, config stores the configuration files, and the import and export volumes are used to import data and to export it for backup.

To import all the data into the database I am using a shell script. The first five commands import the collections. The last two run auxiliary XQuery scripts: the first re-indexes the application data, and the second creates the proper groups and users and assigns passwords.

First, the import.sh script is:

#!/usr/bin/env bash

docker-compose exec -T exist java org.exist.start.Main backup -u admin -r /import/db/academia/__contents__.xml
docker-compose exec -T exist java org.exist.start.Main backup -u admin -r /import/db/apps/academia/__contents__.xml

docker-compose exec -T exist java org.exist.start.Main backup -u admin -r /import/db/academia-2001/__contents__.xml
docker-compose exec -T exist java org.exist.start.Main backup -u admin -r /import/db/apps/academia-2001/__contents__.xml

docker-compose exec -T exist java org.exist.start.Main backup -u admin -r /import/db/schemas/__contents__.xml

docker-compose exec -T exist java org.exist.start.Main client -u admin -F /import/xq/repair.xq
docker-compose exec -T exist java org.exist.start.Main client -u admin -F /import/xq/users.xq

Note that the XQuery scripts are kept in the same folder that is mounted as the import volume. Otherwise, they would not be accessible from inside the container.
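Since import.sh repeats the same command with different paths, the list of commands can also be generated. The following is only a sketch in Python, not the script I actually run: the function names build_commands and run_all are made up for illustration, and the paths and flags are simply copied from import.sh above. By default it only prints the commands.

```python
import subprocess

# Collections to restore, in the same order as in import.sh.
COLLECTIONS = [
    "academia",
    "apps/academia",
    "academia-2001",
    "apps/academia-2001",
    "schemas",
]
# XQuery scripts to run once the collections are restored.
XQUERY_SCRIPTS = ["repair.xq", "users.xq"]

# Common prefix of every command in import.sh.
BASE = ["docker-compose", "exec", "-T", "exist", "java", "org.exist.start.Main"]


def build_commands(import_root="/import"):
    """Return the argument lists for every command import.sh runs."""
    commands = []
    for collection in COLLECTIONS:
        contents = f"{import_root}/db/{collection}/__contents__.xml"
        commands.append(BASE + ["backup", "-u", "admin", "-r", contents])
    for script in XQUERY_SCRIPTS:
        path = f"{import_root}/xq/{script}"
        commands.append(BASE + ["client", "-u", "admin", "-F", path])
    return commands


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Adding a new collection then means adding one entry to COLLECTIONS, and check=True stops at the first command that fails.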

The repair XQuery script holds this code:

import module namespace repair="http://exist-db.org/xquery/repo/repair"
at "resource:org/exist/xquery/modules/expathrepo/repair.xql";
repair:clean-all(),
repair:repair()

And finally, the users XQuery script has the following code:

sm:passwd('admin', 'admin-password'),
sm:create-group('dacl'),
sm:create-account('ana','ana-password','dacl'),
sm:chgrp(xs:anyURI('/db/academia'), 'dacl'),
sm:chgrp(xs:anyURI('/db/academia-2001'), 'dacl'),
sm:chgrp(xs:anyURI('/db/apps/academia-2001'), 'dacl'),
sm:chgrp(xs:anyURI('/db/apps/academia'), 'dacl'),
sm:chmod(xs:anyURI('/db/academia-2001'), 'rwxrwx---'),
sm:chmod(xs:anyURI('/db/academia'), 'rwxrwx---'),
sm:chmod(xs:anyURI('/db/apps/academia-2001'), 'rwxrwx---'),
sm:chmod(xs:anyURI('/db/apps/academia'), 'rwxrwx---')

Also, in case it is useful, this is my backup.sh script:

docker-compose exec -T exist java org.exist.start.Main backup -u admin -p admin.entrada -b /db/academia -d /export
docker-compose exec -T exist java org.exist.start.Main backup -u admin -p admin.entrada -b /db/academia-2001 -d /export
docker-compose exec -T exist java org.exist.start.Main backup -u admin -p admin.entrada -b /db/apps/academia -d /export
docker-compose exec -T exist java org.exist.start.Main backup -u admin -p admin.entrada -b /db/apps/academia-2001 -d /export
docker-compose exec -T exist java org.exist.start.Main backup -u admin -p admin.entrada -b /db/schemas -d /export

rsync -aASPvz --delete-after outdir/db/ dacl/db/

DATE=`date +%Y%m%d`
cd dacl && git commit -a -m "Backup $DATE" && git push origin v5

Of course this is not rocket science, and this approach might have a lot of problems, but on the other hand, it might come in handy for someone.

Abbreviations in TEI

I have been using the Text Encoding Initiative Guidelines to encode dictionaries. I used it originally in Dicionário Aberto, and more recently in a work with the Portuguese Dictionary of the Lisbon Science Academy.

Last week I started teaching a course on Digital Lexicography (do not ask what that is, it is just the best name we could find), and I started running OCR on the Caldas Aulete dictionary from 1925, then transcribing and annotating it.

In my previous uses of TEI, I never gave much thought to abbreviations. I just used them. Nothing fancy. This time, I decided to include the abbreviation expansions in the document, somehow.

When looking up how to encode an abbreviation and its expansion, the following approach is suggested:

<choice>
   <abbr>s.</abbr>
   <expan>singular</expan>
</choice>

But as far as the TEI examples go, this should be done for every single occurrence of the abbreviation, stating that, at that specific point, there are two different ways to encode the information.

While this can be useful in a text where an abbreviation is used once or twice, it is not a good approach for something that is repeated thousands of times throughout the document. I suspect that the better approach (and that is what I am doing at the moment) is to include a list of all abbreviations somewhere, just to have that information encoded, and to simply use the abbreviation in the rest of the document. At the moment I am not linking one to the other using XPointer or XML IDs. I just use them, as I can add that information later, programmatically.
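As a rough idea of what adding that information programmatically could look like, here is a sketch in Python. It rests on several assumptions: every abbreviation is already tagged as <abbr>, the expansions live in a lookup table (the EXPANSIONS dictionary and the function name wrap_abbreviations are made up for this example), and the TEI namespace is ignored to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table; in practice it would be read from the
# abbreviation list encoded once in the document.
EXPANSIONS = {"s.": "singular", "pl.": "plural"}


def wrap_abbreviations(root, expansions):
    """Wrap every known <abbr> in a <choice> with its <expan>, in place."""
    # Snapshot the (parent, child) pairs first, so the tree is not
    # modified while it is being traversed.
    pairs = [(parent, child) for parent in root.iter() for child in list(parent)]
    count = 0
    for parent, child in pairs:
        if child.tag != "abbr" or parent.tag == "choice":
            continue  # not an abbreviation, or already wrapped
        expansion = expansions.get((child.text or "").strip())
        if expansion is None:
            continue  # unknown abbreviation: leave it untouched
        index = list(parent).index(child)
        choice = ET.Element("choice")
        # Text following the abbreviation must stay after the new <choice>.
        choice.tail, child.tail = child.tail, None
        parent.remove(child)
        choice.append(child)
        expan = ET.SubElement(choice, "expan")
        expan.text = expansion
        parent.insert(index, choice)
        count += 1
    return count


sample = ET.fromstring("<gramGrp><abbr>s.</abbr> <abbr>m.</abbr></gramGrp>")
wrapped = wrap_abbreviations(sample, EXPANSIONS)
print(wrapped, ET.tostring(sample, encoding="unicode"))
```

Abbreviations missing from the table are left as they are, so the conversion can be run again as the list grows. A real TEI file would need the namespace-qualified tag names instead of the bare ones used here.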

But this is not the only example of this kind of thing happening in TEI. I would really like to discuss these things with my old, late friend Sebastian Rahtz, who contributed to both TEI and LaTeX; in the latter, I think abbreviations are handled the correct (or at least a better) way.

Plotting functions with LaTeX & TikZ

I’ve been working on some notes on Neural Networks for my students. I am using Pandoc to write the text, converting it to LaTeX for PDF generation and to HTML for making the documents available online. In a future post I might talk about Pandoc, but for now, I want to share something I found on TeX Stack Exchange to plot functions. To make it clear how to use it, I am presenting four simple examples: four common activation functions used in Neural Networks.

To start with, the well known sigmoid function: \[a = \frac{1}{1+e^{-z}}\]

This can be achieved with the following tikzpicture environment:

\begin{tikzpicture}
\begin{axis}[
    axis lines=middle,
    xmax=10,
    xmin=-10,
    ymin=-0.05,
    ymax=1.05,
    xlabel={$x$},
    ylabel={$y$}
]
\addplot [domain=-9.5:9.5, samples=100,
          thick, blue] {1/(1+exp(-x))};
\end{axis}
\end{tikzpicture}

Now, for the tanh, defined as \[a=\frac{e^z - e^{-z}}{e^z + e^{-z}}\], you get:

With the following code:

\begin{tikzpicture}
\begin{axis}[
    axis lines=middle,
    xmax=10,
    xmin=-10,
    ymin=-1.05,
    ymax=1.05,
    xlabel={$x$},
    ylabel={$y$}]
\addplot [domain=-9.5:9.5, samples=100,
     thick, blue] {(exp(x) - exp(-x))/(exp(x) + exp(-x))};
\end{axis}
\end{tikzpicture}

I was already loving it, but it got even better when I tested the ReLU math and it worked as expected: \[a=\max(0, z)\]

This one can be obtained using the simple…

\begin{tikzpicture}
\begin{axis}[
    axis lines=middle,
    xmax=6,
    xmin=-6,
    ymin=-0.05,
    ymax=5.05,
    xlabel={$x$},
    ylabel={$y$}]
\addplot [domain=-5.5:5.5, samples=100, thick, blue] {max(0, x)};
\end{axis}
\end{tikzpicture}

And now, although it is not extremely interesting, the Leaky ReLU (well, not plotted exactly as in the definition, just to make it a little easier to read): \[a=\max(0.001 \times z, z)\]

With the cheating…

\begin{tikzpicture}
\begin{axis}[
    axis lines=middle,
    xmax=6,
    xmin=-6,
    ymin=-1.05,
    ymax=5.05,
    xlabel={$x$},
    ylabel={$y$}]
\addplot [domain=-5.5:5.5, samples=100,
          thick, blue] {max(0.1 * x, x)};
\end{axis}
\end{tikzpicture}
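As a quick numeric cross-check of the four expressions passed to \addplot above, the same functions can be written in a few lines of Python. This is just a sketch; the function names are mine, and the leaky_relu default slope of 0.1 follows the plotted "cheat" rather than the 0.001 of the formula.

```python
import math


def sigmoid(z):
    """Same expression as the first plot: 1/(1+exp(-x))."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Same expression as the second plot, written out with exponentials."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))


def relu(z):
    """Same expression as the third plot: max(0, x)."""
    return max(0.0, z)


def leaky_relu(z, slope=0.1):
    """Same expression as the fourth plot; slope=0.1 is the plotted cheat."""
    return max(slope * z, z)


# Evaluate at the two ends of the ReLU plot domain and at the origin.
for z in (-5.5, 0.0, 5.5):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```

The printed values should sit on the plotted curves: the sigmoid stays between 0 and 1, the tanh between -1 and 1, which is why the ymin and ymax values above were chosen slightly outside those ranges.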

And to conclude, and just for reference, do not forget to load the required packages:

\usepackage{tikz}
\usepackage{pgfplots}

Arango::Tango Perl 5 Module

(cross posted from blogs.perl.org)

I am planning a new project, and a friend suggested that I look at Arango as an alternative to Mongo, especially because it includes an integrated graph query language. As I am not really used to Mongo anyway, I decided to give it a try.

Unfortunately I did not find a proper module to use Arango from Perl. Therefore I decided to start one from scratch. Probably not a great idea, as my free time is practically non-existent. But nevertheless, I did it. Arango::DB was born!

I am trying to abstract the major entities (database, collection, document, cursor) so that they can be used directly as proper objects in Perl and, at the same time, to keep the amount of code low by delegating most options directly to the Arango REST interface. In order not to be too relaxed, I am using JSON::Schema::Fit to validate the parameters sent to the Arango API.

The main problem is that there are too many options and endpoints to cover, and too little time to do it. Anyway, there is a working version that already allows storing documents and querying them using AQL. Unfortunately, none of the user-related functionality is implemented yet.

While I would love to keep maintaining the module and defining its direction, I would be very grateful for any patches, bug fixes or added functionality that keep the same simple approach in mind.

Regarding using a module for objects, I might decide to use Moo, but for now, I am still happy using Perl blessed hashes directly.

Hope to post more news very soon.

Git repository: https://gitlab.com/ambs/perl5-arango-tango/
Latest release: https://metacpan.org/pod/Arango::Tango

Hawaii Five-0 (2010) – S01

I had watched some random episodes on TV, and recently decided to watch it properly. As my sister also likes it, it makes for some nice time in the evening.

The series is nothing extraordinary. But it is fun, it has some action, and it is set in Hawaii, with nice landscapes, great beaches and bikini babes. What else can we ask for in a TV series?

As with any other series, some episodes are great, some good, some acceptable. But I must say that Season 1 ends on a high note. Looking forward to starting Season 2 today.

Star Trek: The Motion Picture (1979)

Enterprise shipyard

I know this is an old movie. After watching it, I remembered that I had seen it before. Well, what can I say about it? Of course I can’t really judge the special effects. But while some images look cool (see the one above), some others are quite, quite bad (for example, the small ship used to travel from Starfleet headquarters to the Enterprise). So, I am not sure why some are not that bad and some are awful. But then, in 1979 I was one year old.

About the story: acceptable. I found the V-GER story interesting. It is always nice to see some cross-over between sci-fi and some reality.

Also, I just discovered that the series predates the movie. So, it is now in my queue to watch at least one episode of it, and see if I want to watch them all.

The Devil’s Violinist (2013)

David Garrett

A really bad movie compared with what I was expecting. It is not a documentary (just like Bohemian Rhapsody isn’t a documentary). Thus, we expect at least a minimal story. And this story is… well, mostly non-existent.

Other than that, I was expecting better acting. I know David Garrett is not an actor. But I suppose there was a casting for that character, and choosing him was a bad idea. Even though he is a violinist, he exaggerated his movements. Paganini was a virtuoso, and played some dissonant music. But you should not move like someone playing rock music to mimic Paganini. Really.

The nice part of the movie, as usual, is a girl. Andrea Deck is gorgeous. And her acting was quite good. Kudos to her.

August Rush (2007)

Freddie Highmore

Someone told me and my sister about this movie, saying it was nice and interesting to show to music students (10-year-olds). I took the chance to watch it too. And although there is a lot I could argue about (namely, someone learning how to write music from a single explanation of the position of the notes on a keyboard), the movie is interesting enough. Of course it is a drama, something I usually prefer not to watch. But the story is nice, and it ends well, as expected for Hollywood movies.

The most interesting thing (other than having the nice and cute Keri Russell) was hearing my sister say that the main character (above in the picture) resembles the actor from The Good Doctor. And then, when looking at the information on IMDb, finding out that she was right. I wouldn’t have got there myself.

Chef (2014)

Jon Favreau and Emjay Anthony

Another movie that was on TV, and that I started watching just because I liked the title. I will not talk about the story. It is nothing really new, and it is quite simple (I could explain the whole movie in two lines). But the movie is enjoyable and fun, and makes for a good time. What I did not understand is why it has Scarlett Johansson, Dustin Hoffman and Robert Downey Jr. They are all well known actors, and their characters in this movie are less than secondary. In 2014 they were already well known for Marvel (at least Scarlett and Robert), and I do not think they needed the money. And the production could have been a lot cheaper with other actors. Just saying.

Other than that, kudos to Emjay Anthony. I really enjoyed his acting in this movie.

Astérix: Le secret de la potion magique (2018)

Gaulish Team

It is hard to write this post about the story and the characters. I am a fan of Asterix, but I am used to reading it in Portuguese, and writing about it in English is hard. And for those wondering, no, I did not watch the original (French) version, but the dubbed one. And although I usually hate Portuguese dubbed versions, it wasn’t that bad. Acceptable enough.

About the story, well, it focuses on the usual ingredients of Asterix stories: the magic potion, the Romans, the pirates, the wild boars, etc. It isn’t as good as the first books. It draws on some ideas from recent movies, which I will not mention so as not to give any spoilers. Ah… and it has a religious reference that, although I find it humorous, I think some other Catholics might not like.