I bought this book during YAPC::EU::2010, in Pisa. At the time I only leafed through it, and found the index interesting. The book was being auctioned, so my purchase was motivated not only by its contents and their usefulness, but also by the wish to support the conference organizers.
I was surprised that the book's examples are written in Ruby. Surprised only because I was at a Perl conference and, when I first opened the book, the code looked like Perl. But I am happy with other programming languages, and Ruby is an interesting language.
My problem started when I began reading the Natural Language Processing section of the book. I read a couple of paragraphs and got annoyed. The author claims to have worked in Natural Language Processing since 1980, but the book's coverage of the subject is quite basic and not well explained.
I would like to point out three aspects that annoyed me. Two of them are related to Natural Language Processing; the other is about how the examples are written in a powerful programming language like Ruby:
- The section on stemming seems like black magic. It shows an example of a word and the respective stemmed word, but it doesn’t explain how to perform that task. It just points to a module that implements it. I am against this kind of book. It is crucial that the author explains the basics of how the algorithms are implemented. Otherwise the reader will not learn a thing.
- On the segmentation algorithm, to show that some special words can make this task difficult, the author mentions human name prefixes such as Mr., Dr., etc. The algorithm then relies on an array of these prefixes to work. Not a problem; that is a common approach. But failing to mention that we are talking about abbreviations in general, and not just human name prefixes, shows a lot of ignorance. No, I can accept that the author knows that. But taking one example and developing code just for that example, without generalizing it, seems a bad idea.
- Finally, I do not understand why the author needs to declare an array with the digits from 0 to 9 to match floating point numbers in a string. Doesn’t Ruby have regular expressions, or the usual isDigit method? Or, at least, a way to construct lists by comprehension instead of listing all digits?
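To make the last two points concrete, here is a minimal sketch of the kind of code I would have expected. It is not the book's code: the names `DIGITS`, `NUMBER`, `ABBREVIATIONS`, `extract_numbers` and `split_sentences` are my own, the abbreviation list is only a small illustrative sample, and the number pattern is deliberately simple (no exponents, no forms like ".5").

```ruby
# If an explicit list of digits is really wanted, a range builds it.
DIGITS = ('0'..'9').to_a

# Simplified pattern for decimal numbers: optional sign, digits,
# optional fractional part. Exponents such as 1e5 are not covered.
NUMBER = /[-+]?\d+(?:\.\d+)?/

# Return every number found in the text, converted to Float.
def extract_numbers(text)
  text.scan(NUMBER).map(&:to_f)
end

# Illustrative sample of abbreviations, not only human name prefixes.
# A real list would be much longer and language dependent.
ABBREVIATIONS = %w[Mr. Mrs. Dr. Prof. etc. e.g. i.e. vs.].freeze

# Naive sentence splitter: a whitespace-separated token closes a sentence
# when it ends in '.', '!' or '?' and is not a known abbreviation.
# Known limitation: an abbreviation that really does end a sentence
# (for example a final "etc.") will not trigger a split.
def split_sentences(text)
  sentences = []
  current = []
  text.split.each do |token|
    current << token
    next unless token.end_with?('.', '!', '?')
    next if ABBREVIATIONS.include?(token)

    sentences << current.join(' ')
    current = []
  end
  sentences << current.join(' ') unless current.empty?
  sentences
end

p extract_numbers('Pi is 3.14 and e is 2.718, roughly -1 off.')
p split_sentences('Dr. Smith met Mr. Jones. They talked, e.g. about Ruby. Done!')
```

The point is only that the list of special tokens is treated as a list of abbreviations, and that a regular expression (or a range, if a list of digits is truly needed) replaces a hand-typed array of ten digits.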