30 Jun 2011 Couchtato – A CouchDB Document Utility Tool Written In Node.js
We have been using Apache CouchDB on one of our client projects. Despite how well CouchDB worked, we learnt that there was still no easy way to find certain documents without writing map reduce functions, either via a temporary view or a design document. This was a particular issue for our testers and others who were not familiar with CouchDB. We needed an easier-to-use alternative, and that was the main reason I wrote Couchtato.
Couchtato is a command line tool that remotely iterates over all documents in a CouchDB database and applies a set of JavaScript functions to each document. This can roughly be seen as applying ‘offline views’ to a remote database. These functions are defined in a couchtato.js file that lives on the file system.
Here’s an example of a simple function that logs the ID of all documents with category ‘Cafe’ and city ‘Melbourne’. Notice that you only need to deal with the documents directly (via the doc variable) and you don’t have to worry about implementing map reduce functions.
exports.conf = {
  "tasks": {
    "find-documents-by-criteria": function (c, doc) {
      if (doc.category === 'Cafe' && doc.city === 'Melbourne') {
        console.log(doc._id);
      }
    }
  }
}
But wait, that’s not all! Couchtato also exposes a utility ‘c’ variable which provides convenient functions to save and remove a document, and also to count and log any string which is usually built up with values from the document itself.
exports.conf = {
  "tasks": {
    "rename-city": function (c, doc) {
      if (doc.city === 'Melbourne') {
        doc.city = 'Adelaide';
        c.save(doc);
      }
    },
    "delete-no-city": function (c, doc) {
      if (!doc.city) {
        c.remove(doc);
      }
    },
    "count-by-city": function (c, doc) {
      if (doc.city) {
        c.count('Documents with city ' + doc.city);
      }
    },
    "log-city": function (c, doc) {
      if (doc.city === 'unknown') {
        c.log('Found unexpected ' + doc.city + ' city!');
      }
    }
  }
}
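To try tasks like these without touching a real database, one option is a small dry-run harness. The sketch below is not part of Couchtato: `createStubUtil` and `dryRun` are hypothetical helpers, and the stub only mimics the four `c` calls used in the tasks above (save, remove, count, log) by recording them in memory.

```javascript
// Hypothetical stand-in for Couchtato's 'c' utility (an assumption, not the real object).
// It records each call instead of talking to CouchDB.
function createStubUtil() {
  const calls = { saved: [], removed: [], counts: {}, logs: [] };
  const c = {
    save: function (doc) { calls.saved.push(doc); },
    remove: function (doc) { calls.removed.push(doc); },
    count: function (key) { calls.counts[key] = (calls.counts[key] || 0) + 1; },
    log: function (message) { calls.logs.push(message); }
  };
  return { c: c, calls: calls };
}

// Apply every task to every document, one document at a time.
function dryRun(tasks, docs) {
  const stub = createStubUtil();
  docs.forEach(function (doc) {
    Object.keys(tasks).forEach(function (name) {
      tasks[name](stub.c, doc);
    });
  });
  return stub.calls;
}

// Three of the tasks from the example above.
const tasks = {
  'rename-city': function (c, doc) {
    if (doc.city === 'Melbourne') {
      doc.city = 'Adelaide';
      c.save(doc);
    }
  },
  'delete-no-city': function (c, doc) {
    if (!doc.city) {
      c.remove(doc);
    }
  },
  'count-by-city': function (c, doc) {
    if (doc.city) {
      c.count('Documents with city ' + doc.city);
    }
  }
};

// Made-up sample documents for illustration only.
const sampleDocs = [
  { _id: 'a', city: 'Melbourne' },
  { _id: 'b' },
  { _id: 'c', city: 'Sydney' }
];

const result = dryRun(tasks, sampleDocs);
console.log(result);
```

Note that in this harness the tasks run in definition order against the same document object, so a document renamed by 'rename-city' is counted under its new city by 'count-by-city'.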
How does it work? Couchtato uses Cradle as the default database driver. It retrieves all documents using a linked list pagination technique, where each page is processed asynchronously and each document within a page is passed to each couchtato.js function one by one.
So that was the easy part. Now, for the power users… couchtato.js is basically a Node.js module file, which means you can use the Node.js API and require various Node.js modules, making couchtato.js pretty powerful. For example, you can post the documents to Twitter or Facebook if you want to, or check the documents against the Akismet spam detection service using the relevant Node.js modules for those services, neither of which is easy to implement in a CouchDB temporary view.
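The linked list pagination idea can be sketched roughly as follows. This is an illustration of the general technique, not Couchtato's actual code: each request asks for one row more than the page size, and the extra row's ID becomes the start key of the next page. The `fetchPage` function and the in-memory `docs` array are hypothetical stand-ins for a database call, and the sketch is synchronous to keep it runnable on its own, whereas a real driver call would be asynchronous.

```javascript
// In-memory stand-in for a database, sorted by _id (an assumption for this sketch).
const docs = ['a', 'b', 'c', 'd', 'e'].map(function (id) {
  return { _id: id };
});

// Hypothetical fetch: returns up to `limit` documents whose _id is >= startKey.
// With no startKey it starts from the first document.
function fetchPage(startKey, limit) {
  return docs
    .filter(function (doc) {
      return startKey === undefined || doc._id >= startKey;
    })
    .slice(0, limit);
}

// Walk the whole database page by page and hand each document to onDoc.
// Returns the number of pages fetched.
function iterateAll(pageSize, onDoc) {
  let startKey;
  let pages = 0;
  while (true) {
    // Ask for one extra row; it is the 'link' to the next page.
    const rows = fetchPage(startKey, pageSize + 1);
    pages += 1;
    rows.slice(0, pageSize).forEach(onDoc);
    if (rows.length <= pageSize) {
      break; // no extra row, so this was the last page
    }
    startKey = rows[pageSize]._id;
  }
  return pages;
}

const visited = [];
const pageCount = iterateAll(2, function (doc) {
  visited.push(doc._id);
});
console.log(visited, pageCount);
```

Using the extra row as the next start key avoids skipping over documents with a growing offset, which is the usual motivation for this style of pagination.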
Going back to our client’s project. What have we used Couchtato for thus far?
- Our testers use Couchtato to look for documents with certain criteria.
- One of the developers uses Couchtato to export the data in XML/CSV format.
- The support team uses Couchtato to delete duplicated data and to fix incorrect data.
- Hooked up to a continuous integration tool, we also use Couchtato to automatically generate a statistics report.
We’ve been quite happy with Couchtato and it has been a nice addition to the developer’s tool belt.
Couchtato source code, installation, and usage instructions are available on GitHub at https://github.com/cliffano/couchtato. Feedback and contributions are welcome!
Marc Fasel
Posted at 09:11h, 30 June
Hey Cliff,
looking forward to using this on our CouchDB project as well!
Cheers
Marc
adrian
Posted at 23:06h, 30 June
We ran into the same issue, and ended up turning to ElasticSearch to index our databases.
http://developmentseed.org/blog/2011/may/31/flexible-faceting-and-full-text-indexes-using-elasticsearch
gives you the full power of lucene for searches.
Max Ogden
Posted at 05:01h, 01 July
nice workflow. you might also be interested in this combo: https://github.com/maxogden/removalist & https://github.com/maxogden/refine-uploader
Cliffano Subagio
Posted at 12:27h, 01 July
Hi Adrian,
Thanks for the link :), I like how ES solved those CouchDB limitations for you.
We do have full text indexing. We’re currently using couchdb-lucene (to which we also contributed a patch to enable easy Tomcat deployment: https://github.com/rnewson/couchdb-lucene/commit/0081272a30dc679effc1cf1298e365b953f568a5), and another indexing solution using FAST http://www.microsoft.com/enterprisesearch/ via a custom adapter.
These two search solutions are not something that we can easily replace at the moment. And as you probably already know, their indices are not flexible enough. Often our testers approach one of the developers asking “Hey, I need to find the documents with criteria X, Y, and Z,” and guess what, those fields are not available in the indices. With Couchtato, they can specify their own custom criteria and inspect each document as it’s originally stored in CouchDB, which saves a lot of dev/devops time.
At the end of the day, Couchtato offers more than just the ability to look up documents: it adds the save/remove/count utility functions on top of access to the Node.js API and modules. So even though the ‘search problem’ started it, Couchtato ended up solving more than that.
Pingback:JavaScript Magazine Blog for JSMag » Blog Archive » News roundup: Paper.js, Fathom.js, test262
Posted at 03:07h, 02 July
[…] Couchtato – A CouchDB Document Utility Tool Written In Node.js […]
Cliffano Subagio
Posted at 09:36h, 02 July
Marc, looking forward to getting your feedback.
Thanks Max, I think Removalist would be handy for a UI-based CSV download.
Mike Miller
Posted at 00:07h, 20 July
Just stumbled on this looking for a node+couch example, thanks! If you’re looking for flexible indices, the TextAndNumbers index on Cloudant search enables a full index of documents with snappy numerical range queries as well as FTI. It’s replaced a lot of the boilerplate map views that I used to write.
(*) https://cloudant.tenderapp.com/kb/search/search-indexing
{
  "language": "java",
  "views": {
    "index": {"map": {"classname": "com.cloudant.indexers.TextAndNumberSearch"}, "reduce": "_count"}
  }
}
Cliffano Subagio
Posted at 01:49h, 20 July
Thanks Mike.
It was great news when Cloudant Search was announced a week ago http://blog.cloudant.com/announcing-cloudant-search/ . Something to consider along with ElasticSearch.