You’ve probably heard it somewhere already: NoSQL is the new hotness. There is a growing number of weirdly named storage engines out there purporting to be part of the NoSQL movement. This post is the first of a small series about some recent work we’ve been doing with CouchDB. The project is still ongoing, but now seems like an appropriate time to cover what I’ve learned so far. If I’ve made any newbie mistakes, please feel free to flame me in the comments.
Anyway, let’s get on with it then. In this post, I’ll talk a little about the bits of CouchDB that have instantly appealed to me. The good bits.
CouchDB fundamentals are easy to understand
In a single CouchDB instance you can have any number of databases. Each database acts as an isolated namespace for a collection of documents. Each document is represented in CouchDB as a JSON object. Documents are not associated with one another in any meaningful way as they might be in an RDBMS. CouchDB databases may also have any number of “design documents”, which are special documents that hold view functions and validation functions, among other things.
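To make the document model concrete, here is a minimal sketch in Python of what a regular document and a design document might look like. The database contents, document IDs, view name and field names are all made up for illustration; only the reserved keys (`_id`, `views`, `map`, `validate_doc_update`) and the `_design/` prefix come from CouchDB itself. The view and validation functions are JavaScript source stored as plain strings.

```python
import json

# A regular document is just a JSON object. "_id" is CouchDB's reserved
# key for the document ID; every other field here is hypothetical.
recipe = {
    "_id": "recipe-lasagne",
    "type": "recipe",
    "title": "Lasagne",
    "tags": ["pasta", "baked"],
}

# A design document is also a JSON object, but its ID starts with
# "_design/". The JavaScript bodies below are illustrative only.
design_doc = {
    "_id": "_design/recipes",
    "views": {
        "by_tag": {
            "map": (
                "function (doc) {"
                "  if (doc.type === 'recipe' && doc.tags) {"
                "    doc.tags.forEach(function (t) { emit(t, doc.title); });"
                "  }"
                "}"
            )
        }
    },
    "validate_doc_update": (
        "function (newDoc, oldDoc, userCtx) {"
        "  if (newDoc.type === 'recipe' && !newDoc.title) {"
        "    throw({forbidden: 'recipes need a title'});"
        "  }"
        "}"
    ),
}

# Both kinds of document serialise to ordinary JSON text.
recipe_json = json.dumps(recipe)
design_json = json.dumps(design_doc)
```

Nothing in the recipe document points at the design document or at any other document; any relationship between documents is something your application has to impose.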
CouchDB’s API is HTTP and JSON
Another thing that makes CouchDB great for developers is its RESTful HTTP API and the use of JSON objects to represent documents and collections of documents. This gives us a lot of flexibility when choosing tools — flexibility that we probably wouldn’t have if we were working with custom, binary or closed protocols and/or data formats.
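As a rough illustration of how little tooling the API needs, the sketch below builds a few requests using nothing but Python’s standard library. It assumes a CouchDB instance at `http://localhost:5984` (the default port) and a hypothetical database called `recipes`; the helper names `couch_request` and `send` are invented for this example. Recent CouchDB versions also require credentials, which are left out here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: a local CouchDB instance


def couch_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Create a database, store a document under a chosen ID, read it back.
create_db = couch_request("PUT", "/recipes")
save_doc = couch_request("PUT", "/recipes/recipe-lasagne", {"title": "Lasagne"})
read_doc = couch_request("GET", "/recipes/recipe-lasagne")
```

With a server running, `send(create_db)`, `send(save_doc)` and `send(read_doc)` would perform the three calls in turn. The same requests could just as easily be made with curl or any other HTTP client, which is really the point.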
For example, I recently ran into something that many mainstream CouchDB client libraries seem to struggle with: stream-based parsing of JSON arrays. It was a simple matter to quickly hack together a solution to my particular problem using Jackson, JRuby and plain old HTTP. Running into this kind of limitation in a library supporting a closed protocol or even an open-but-unfamiliar binary protocol (as in MongoDB) would have required significantly more effort on my part.
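The general idea of stream-based parsing can be sketched in Python rather than Jackson and JRuby. This is not the code from the project. It is a minimal illustration that uses only the standard library’s `json.JSONDecoder.raw_decode`, and the function name `iter_json_array` is made up for this example. It handles a single top-level JSON array arriving in text chunks, is lenient about separators, and re-parses an incomplete element on every new chunk, so treat it as a teaching sketch rather than something tuned for large elements.

```python
import json


def iter_json_array(chunks):
    """Yield the elements of one top-level JSON array, one at a time.

    `chunks` is any iterable of text fragments, for example pieces of an
    HTTP response body as they arrive.
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break  # nothing left to look at; wait for more data
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                buf = buf[1:]
                continue
            if buf[0] == ",":
                buf = buf[1:]  # skip the separator between elements
                continue
            if buf[0] == "]":
                return  # end of the array
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # element is incomplete; wait for more data
            if end == len(buf):
                # The value might continue in the next chunk (think "3"
                # followed by "4"), so wait until something follows it.
                break
            yield value
            buf = buf[end:]
    raise ValueError("stream ended before the array was closed")


# Usage: elements are produced as soon as they are complete, even though
# the chunk boundaries fall in the middle of an object and a number.
chunks = ['[{"id": 1}, {"i', 'd": 2},', ' 3', '4, "x"]']
for element in iter_json_array(chunks):
    print(element)
```

The caller never holds the whole array in memory, only the current element and whatever partial text is still buffered.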
No Architectural Lock-in
CouchDB doesn’t force you into any arrangement of nodes. All CouchDB instances are equal, with no explicit master/slave relationships. Any CouchDB instance can push to any other CouchDB instance. This means you are free to pick an arrangement of Couch nodes that works for your particular application. This is probably to be expected; after all, NoSQL is about choice.
Replicating data from one CouchDB database to another is a snap, even if the target database is on another machine across the network, and it is very easy to set up and use. I’ve not yet had the opportunity to use this feature in the work we’re doing beyond a few simple tests, but it’s already quite clear that CouchDB’s support for replication is going to make our lives a lot easier when it comes time to scale out.
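For a sense of what “easy to set up” means, replication is triggered by POSTing a small JSON body to the `/_replicate` endpoint of a CouchDB instance. The sketch below only builds that request. The host names, the `recipes` database and the helper name `replication_request` are assumptions for illustration, and authentication is omitted.

```python
import json
import urllib.request


def replication_request(couch_url, source, target, continuous=False):
    """Build (but do not send) a POST to /_replicate on the given instance."""
    body = {"source": source, "target": target}
    if continuous:
        # Ask CouchDB to keep replicating as new changes arrive.
        body["continuous"] = True
    return urllib.request.Request(
        couch_url + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Push a local database to a (hypothetical) remote instance.
req = replication_request(
    "http://localhost:5984",
    source="http://localhost:5984/recipes",
    target="http://backup.example.com:5984/recipes",
)
```

Sending it is a matter of passing `req` to `urllib.request.urlopen`. Swapping `source` and `target` turns the push into a pull, which is what makes it possible to arrange nodes however suits the application.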
The Stuff You Get For Free
There’s a lot of stuff in CouchDB that you probably don’t experience directly, but that you will often hear touted as benefits. Some of these features are really “behind the scenes” and don’t necessarily jump out to slap you in the face. As a result, I think to a degree we take some of these more subtle CouchDB features for granted. Two examples:
Append-only writes: CouchDB never overwrites committed data in place, so a crash in the middle of a write should not corrupt what is already on disk.
Automatic conflict handling: when replicas disagree, every node deterministically picks the same winning revision and keeps the others around for you to inspect, so peers can replicate with one another without needing too much hand-holding.
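The conflict handling deserves a small illustration. As I understand the documented rule, when a document has conflicting revisions CouchDB prefers non-deleted revisions, then the one with the longest revision history, and finally breaks ties by comparing the revision IDs as strings. The sketch below mimics that ordering in Python. The function name `pick_winner` and the sample revisions are invented, and this is an approximation of the behaviour for illustration, not CouchDB’s actual implementation.

```python
def pick_winner(revisions):
    """Pick a winning revision deterministically from conflicting ones.

    Each revision is a dict with a "_rev" of the form "<number>-<hash>"
    and an optional "_deleted" flag, as in CouchDB documents.
    """
    def sort_key(rev):
        position, _, digest = rev["_rev"].partition("-")
        # Non-deleted beats deleted, then longer history, then higher hash.
        return (not rev.get("_deleted", False), int(position), digest)

    return max(revisions, key=sort_key)


# Two nodes edited the same document independently before replicating.
conflicting = [
    {"_id": "recipe-lasagne", "_rev": "2-abc", "title": "Lasagne"},
    {"_id": "recipe-lasagne", "_rev": "2-abd", "title": "Lasagna"},
]
winner = pick_winner(conflicting)
```

Because the choice depends only on the revisions themselves, every node that holds the same set of revisions arrives at the same winner without any coordination. The losing revisions are not thrown away, so an application can still fetch them and merge them properly if it cares to.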
A Summary of the Good Bits
The system is conceptually simple. The JSON/HTTP API is beautiful and easy. Replication between nodes is a walk in the park. As a result, you can arrange CouchDB nodes in a way that suits you and your problem. Then, of course, there are all the little things that CouchDB does which you will likely take for granted until they save your skin.
There’s a lot to like about CouchDB, but it hasn’t been all roses. In the next part of this series, I’ll talk a little about the aspects of the system that have caused us some pain on a real-world project.