You’ve probably heard it somewhere already: NoSQL is the new hotness. There is a growing number of weirdly named storage engines out there purporting to be part of the NoSQL movement. This post is the first of a small series about some recent work we’ve been doing with CouchDB. The project is still ongoing, but now seems like an appropriate time to cover what I’ve learned so far. If I’ve made any newbie mistakes, please feel free to flame me in the comments.

Anyway, let’s get on with it then. In this post, I’ll talk a little about the bits of CouchDB that have instantly appealed to me. The good bits.

CouchDB fundamentals are easy to understand

People rave about CouchDB’s performance versus traditional RDBMSs and about its powerful replication features, but the thing that really stands out for me is how easy it is to understand and start using. I love CouchDB’s simplicity, and I’m not just talking about its neat little web-based user interface, Futon. If you are comfortable with JSON, JavaScript and HTTP, you’re well on the way to understanding the basics of CouchDB.

In a single CouchDB instance you can have any number of databases. Each database acts as an isolated namespace for a collection of documents. Each document is represented in CouchDB as a JSON object. Documents are not associated with one another in any meaningful way as they might be in an RDBMS. CouchDB databases may also have any number of “design documents”, which are special documents where you may write functions for views and validation functions, among other things.
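
To make the document model a bit more concrete, here is a minimal Python sketch of an ordinary document and a design document, built as plain dictionaries and serialized to JSON. The document contents, the ids and the `to_wire` helper are made up for illustration; the `_id`, `views` and `validate_doc_update` fields follow the conventions CouchDB documents for design documents, and the function bodies are stored as strings of JavaScript source.

```python
import json

# An ordinary document: any JSON object. "_id" identifies it within its database.
post = {
    "_id": "post-001",  # hypothetical id
    "type": "post",
    "author": "alice",
    "tags": ["couchdb", "nosql"],
}

# A design document: a regular document whose id starts with "_design/".
# View and validation functions live in it as JavaScript source strings.
design_doc = {
    "_id": "_design/posts",  # hypothetical name
    "views": {
        "by_author": {
            "map": "function (doc) { if (doc.type === 'post') { emit(doc.author, 1); } }",
            "reduce": "_count",  # one of CouchDB's built-in reduce functions
        }
    },
    "validate_doc_update": (
        "function (newDoc, oldDoc, userCtx) {"
        " if (!newDoc._deleted && !newDoc.type) {"
        " throw({forbidden: 'type is required'}); } }"
    ),
}


def to_wire(doc):
    """Serialize a document the way it would travel over HTTP (hypothetical helper)."""
    return json.dumps(doc, sort_keys=True)


print(to_wire(post))
print(to_wire(design_doc))
```

Nothing here talks to a server. The point is only that both kinds of document are ordinary JSON objects.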

Users define views in CouchDB by writing functions that map a document to zero or more key/value pairs and, optionally, functions that combine the values associated with each key into an aggregate. This approach is more popularly known as Map/Reduce. Out of the box, CouchDB supports JavaScript for writing Map/Reduce functions using Mozilla SpiderMonkey.
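
The idea is easier to see in code. The following is a small in-memory simulation of the Map/Reduce approach in Python, not CouchDB’s actual view engine: real views are written in JavaScript (or another query server language), are built incrementally and also involve a rereduce step that this sketch ignores. The names `map_doc`, `reduce_values` and `run_view` and the sample documents are all hypothetical.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: turn one document into zero or more (key, value) pairs."""
    if doc.get("type") == "post":
        for tag in doc.get("tags", []):
            yield tag, 1


def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn=None):
    """Apply map_fn to every document, sort rows by key, optionally reduce per key."""
    rows = sorted(
        (pair for doc in docs for pair in map_fn(doc)),
        key=lambda kv: kv[0],
    )
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


docs = [
    {"type": "post", "tags": ["couchdb", "nosql"]},
    {"type": "post", "tags": ["couchdb"]},
    {"type": "comment", "tags": ["nosql"]},  # ignored by map_doc
]

print(run_view(docs, map_doc))                 # map only: sorted key/value rows
print(run_view(docs, map_doc, reduce_values))  # map + reduce: one total per tag
```

With the sample documents above, the map-only view contains three rows and the reduced view counts two posts tagged "couchdb" and one tagged "nosql".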

With all this talk about JSON and JavaScript, you might be thinking that most of your interactions with CouchDB are going to involve JavaScript in one form or another, and you probably wouldn’t be far off the mark. JavaScript is only the default, though: CouchDB is capable of supporting any language for which a query server implementation exists. See CouchDB’s documentation on the topic for supported languages.

CouchDB’s API is HTTP and JSON

Another thing that makes CouchDB great for developers is its RESTful HTTP API and the use of JSON objects to represent documents and collections of documents. This gives us a lot of flexibility when choosing tools — flexibility that we probably wouldn’t have if we were working with custom, binary or closed protocols and/or data formats.
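
As a rough illustration of how little tooling this needs, the sketch below builds the HTTP requests for creating a database, storing a document and reading it back, using only Python’s standard library. The requests are constructed but not sent, so it runs without a server. The base URL (a local instance on CouchDB’s default port 5984), the database name and the helper function names are assumptions for the example.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB on the default port


def create_database(db):
    """PUT /{db} creates a database."""
    return Request(f"{BASE_URL}/{db}", method="PUT")


def put_document(db, doc_id, doc):
    """PUT /{db}/{doc_id} with a JSON body creates or updates a document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(db, doc_id):
    """GET /{db}/{doc_id} fetches a document as JSON."""
    return Request(f"{BASE_URL}/{db}/{doc_id}", method="GET")


for req in (
    create_database("blog"),
    put_document("blog", "post-001", {"type": "post", "author": "alice"}),
    get_document("blog", "post-001"),
):
    print(req.get_method(), req.full_url)
```

To actually send one of these, you would pass it to `urllib.request.urlopen`. That needs a running CouchDB instance, and depending on the version and configuration it may also need credentials.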

For example, I recently ran into something that many mainstream CouchDB client libraries seem to struggle with: stream-based parsing of JSON arrays. It was a simple matter to quickly hack together a solution to my particular problem using Jackson, JRuby and plain old HTTP. Running into this kind of limitation in a library supporting a closed protocol or even an open-but-unfamiliar binary protocol (as in MongoDB) would have required significantly more effort on my part.
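
For readers who haven’t met the problem, here is what stream-based parsing of a JSON array looks like in miniature. This is not the Jackson/JRuby solution mentioned above; it is a swapped-in Python sketch using the standard library’s `json.JSONDecoder.raw_decode`, and the function name `iter_json_array` is hypothetical. It yields array elements one at a time instead of loading the whole array into memory. It is deliberately lenient about separators, and on malformed input it keeps buffering until the end of the stream before giving up.

```python
import io
import json


def iter_json_array(stream, chunk_size=4096):
    """Yield the elements of a top-level JSON array one at a time."""
    decoder = json.JSONDecoder()
    buf = ""
    # Read until the first non-whitespace character, which must open the array.
    while not buf:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("empty input")
        buf = chunk.lstrip()
    if buf[0] != "[":
        raise ValueError("expected a JSON array")
    buf = buf[1:]

    while True:
        # Lenient: skip whitespace and element separators before the next value.
        buf = buf.lstrip(" \t\r\n,")
        if buf.startswith("]"):
            return
        try:
            value, end = decoder.raw_decode(buf)
            # A value that reaches the end of the buffer may be cut off
            # (e.g. "12" of "123"), so wait for more input before accepting it.
            complete = end < len(buf)
        except json.JSONDecodeError:
            complete = False
        if not complete:
            chunk = stream.read(chunk_size)
            if not chunk:
                raise ValueError("truncated or invalid JSON array")
            buf += chunk
            continue
        yield value
        buf = buf[end:]


# Tiny chunks force values to be split across reads.
sample = io.StringIO('[{"id": 1}, {"id": 22}, 345, "x,]y"]')
for element in iter_json_array(sample, chunk_size=3):
    print(element)
```

In real use the `stream` would be a text-mode wrapper around an HTTP response body rather than an in-memory string.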

No Architectural Lock-in

CouchDB doesn’t force you into any arrangement of nodes. All CouchDB instances are equal, with no explicit master/slave relationships. Any CouchDB instance can push to any other CouchDB instance. This means you are free to pick an arrangement of Couch nodes that works for your particular application. This is probably to be expected, after all: NoSQL is about choice.

Easy Replication

Replicating data from one CouchDB database to another is a snap, even if the target database is on another machine across the network, and it takes very little to set up. I’ve not yet had the opportunity to use this feature in the work we’re doing beyond a few simple tests, but it’s already quite clear that CouchDB’s support for replication is going to make our lives a lot easier when it comes time to scale out.
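
To give a feel for how little setup is involved, the sketch below builds the kind of request that triggers a replication: a `POST` to the `_replicate` endpoint with a JSON body naming a source and a target. The request is only constructed, not sent. The server URL, database names, backup host and the `replication_request` helper are assumptions for illustration.

```python
import json
from urllib.request import Request


def replication_request(server_url, source, target, continuous=False):
    """Build a POST /_replicate request (hypothetical helper; nothing is sent)."""
    payload = {"source": source, "target": target}
    if continuous:
        # Ask CouchDB to keep replicating as new changes arrive.
        payload["continuous"] = True
    return Request(
        f"{server_url}/_replicate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = replication_request(
    "http://localhost:5984",                # assumption: local instance
    "blog",                                 # local source database
    "http://backup.example.com:5984/blog",  # hypothetical remote target
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The source or the target can be a local database name or the URL of a database on another CouchDB instance, which is what makes push and pull replication between arbitrary nodes look the same from the client’s side.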

The Stuff You Get For Free

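Append-only writes greatly reduce the risk of data corruption: CouchDB never overwrites committed data in place, so a crash in the middle of a write should leave the existing data intact.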

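Automatic conflict handling ensures that peers can replicate between one another without needing too much hand-holding: CouchDB detects conflicting revisions, picks the same winning revision on every node, and keeps the losing revisions around so that your application can merge them later if it needs to.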

There’s a lot of stuff in CouchDB that you probably don’t experience directly, but that you will often hear touted as benefits. Some of these features are really “behind the scenes” and don’t necessarily jump out to slap you in the face. As a result, I think to a degree we take some of these more subtle CouchDB features for granted.

A Summary of the Good Bits

The system is conceptually simple. The JSON/HTTP API is beautiful and easy. Replication between nodes is a walk in the park. As a result, you can arrange CouchDB nodes in a way that suits you and your problem. Then, of course, there are all the little things that CouchDB does which you will likely take for granted until they save your skin.

There’s a lot to like about CouchDB, but it hasn’t been all roses. In the next part of this series, I’ll talk a little about the aspects of the system that have caused us some pain on a real-world project.
