This first page is about CouchDb, the underlying database. The aim of Hydra was to build a messaging system on top of mutually replicating databases, and CouchDb has the best replication system I’ve found. However, it is a NoSql database, which brings certain advantages and disadvantages, and you need to know about those to understand the later implementation decisions. NoSql is also flavour-of-the-month, so it’s handy to know something about it in any case. This post can be read as a standalone, very brief CouchDb introduction even if you have no interest in Hydra at all.

CouchDb

CouchDb is a NoSql database. The only thing in common between NoSql databases is that they cannot be queried with SQL, so that does not tell you much in itself. The interesting CouchDb features from the Hydra point of view are document storage, views, and replication, so we’ll deal with each of those in turn.

Document storage

NoSql databases range from simple key/value stores to surprisingly familiar-looking things with tables and columns. CouchDb is a document store, which is one step up from a key/value store. The objects in a document store are accessed by a unique document id string, but they differ from those in a simple key/value store by having internal structure which is accessible to the database.

CouchDb documents are JSON values. JSON is JavaScript’s object serialisation notation and is a simple way of serialising basic values, arrays, and objects to strings. For example:

{"type": "message", "topic": "Posting", "data": {"p1": [23, true], "p2": null}}

is the JSON representation of an object with three properties: type and topic are strings; data is itself an object with two properties, an array of a number and a boolean, and the null value. As JavaScript is dynamically typed, it is happy to have arrays with mixed contents.

Any JSON value can be stored as a CouchDb document. As we’ll see in a moment, these values are available to some functions within the database that extend it beyond a simple key/value store. Because CouchDb is schemaless, the JSON objects stored under different keys need not possess the same properties, or they might have a property with the same name, but containing values of different types. CouchDb puts no restrictions on the particular lump of JSON stored under any key.
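As a small illustration of what schemaless means in practice, the sketch below round-trips the example document through JSON and puts it beside two documents of quite different shape. The in-memory store object, the put and get helpers and the id "zz99" are made up for the example; a real CouchDb database is accessed over HTTP.

```javascript
// Toy stand-in for a database: document id -> JSON string.
// (Hypothetical; it only shows that each stored value is free-form JSON.)
const store = {};

function put(id, value) {
  store[id] = JSON.stringify(value);
}

function get(id) {
  return JSON.parse(store[id]);
}

put("qr01", { type: "message", topic: "Posting", data: { p1: [23, true], p2: null } });
put("cd11", { hello: "world" });   // no type, topic or data at all
put("zz99", { type: 42 });         // same property name, different value type

const message = get("qr01");
console.log(message.topic);        // structured access to a property
console.log(message.data.p1[1]);   // nested values survive the round trip
console.log(get("cd11").topic);    // a missing property is undefined, not an error
```

Nothing in the store objects to the second and third documents, which is the point: any checking of document shape has to be done by the code that reads them.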

Fortunately there are several serialisers for converting between .NET objects and JSON strings. Microsoft’s is currently (as of .NET 4) only available within Silverlight, though the ASP.NET Web API project looks as though it will bring the relevant DLL into a more accessible part of the .NET Framework. The most widely used serialiser, and the one I use in Hydra, is Json.NET. Hydra’s CouchDb documents are all serialised .NET objects, but it might not remain that way forever.

Views

In a key/value store, values can only be accessed by their keys. This is inconvenient if you want to get all the documents that have some particular property. CouchDb views provide indexes that allow this sort of query. Views are implemented through JavaScript functions and are perhaps best illustrated by example. Let’s say we have a database populated with documents like the one above and we want to fetch documents with a given topic value; then we might write a view along these lines:

 

function(doc) {
  if (doc.type == 'message') emit([doc.topic, doc._id], null);
}

CouchDb runs each view function on every document in the database. Our function checks to see if the document has a type property and, if it exists, that its value is “message”. If the document passes the test then we call the special emit function to put an entry into the view’s index. The first argument is the index entry itself, and can be any JavaScript object. In this case it is a two-element array whose first element is the topic value from the document and whose second element is the unique document id string. The second emit argument is a value that is stored along with the index entry and is useful if we want to make use of CouchDb’s map/reduce functionality, which I’m not going to cover here.
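A view function is not stored on its own: CouchDb keeps it as a source-code string inside a design document, which is an ordinary JSON document whose id begins with _design/. A minimal sketch of such a document follows. The names messages and by_topic are invented for illustration and are not necessarily the ones Hydra uses.

```javascript
// Design documents hold view functions as strings of JavaScript source.
// "messages" (design document) and "by_topic" (view) are hypothetical names.
const designDoc = {
  _id: "_design/messages",
  language: "javascript",
  views: {
    by_topic: {
      map: "function(doc) { if (doc.type == 'message') emit([doc.topic, doc._id], null); }"
    }
  }
};

// This JSON text is what would be saved into the database.
console.log(JSON.stringify(designDoc, null, 2));
```

Saving this document is what causes the index to be built; the view is then queried by the design document and view names.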

If our database has these documents:

Doc Id    Document
"qr01"    {"type": "message", "topic": "Posting", "data": …}
"ab02"    {"type": "message", "topic": "Posting", "data": …}
"ab01"    {"type": "stuff", "topic": "Posting", "data": …}
"cd11"    {"hello": "world"}
"aa01"    {"type": "message", "topic": "Query", "data": …}

Then the view function above will create this ordered index:

["Posting", "ab02"]
["Posting", "qr01"]
["Query", "aa01"]

Only three documents pass the test and make it into the index. The index is sorted by entry, so the Posting entries come together, ordered by document id, followed by the Query ones. (If we had omitted the doc.type test in the function, then every document would have produced an index entry, and any document without a topic property would have null as the first element of its index entry array. Nothing that follows would break, but there would be a collection of pointless index entries, with the null ones all sorting to one end of the index.)
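The index-building step can be imitated in a few lines of plain JavaScript. This is only a sketch of the idea, not CouchDb’s implementation: buildIndex and compareKeys are hypothetical helpers, emit is passed in as a second parameter rather than being supplied by the database, and the ordering is plain string comparison where CouchDb has its own collation rules.

```javascript
// Documents from the table above; "data" is left out because the view ignores it.
const docs = [
  { _id: "qr01", type: "message", topic: "Posting" },
  { _id: "ab02", type: "message", topic: "Posting" },
  { _id: "ab01", type: "stuff", topic: "Posting" },
  { _id: "cd11", hello: "world" },
  { _id: "aa01", type: "message", topic: "Query" }
];

// The view function from the text, with emit as an explicit parameter.
function mapByTopic(doc, emit) {
  if (doc.type == 'message') emit([doc.topic, doc._id], null);
}

// Compare two keys that are arrays of strings, element by element.
// (Simplified assumption: real CouchDb collation handles more types.)
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Run the map function over every document and sort what it emits.
function buildIndex(documents, mapFunction) {
  const rows = [];
  for (const doc of documents) {
    const emit = (key, value) => rows.push({ key: key, value: value, id: doc._id });
    mapFunction(doc, emit);
  }
  rows.sort((x, y) => compareKeys(x.key, y.key));
  return rows;
}

const index = buildIndex(docs, mapByTopic);
for (const row of index) console.log(JSON.stringify(row.key));
```

Run against the five documents, this prints the same three entries, in the same order, as the index shown above.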

CouchDb then lets us query the index by asking for all entries between some specified start and end points. For each entry in the range, it returns the index entry, the value in the second argument to emit (null in our case) and, optionally, the document from which the index entry was emitted. So we could get all the documents with a topic of “Posting” by asking for everything between ["Posting", ""] and ["Posting", "zzzz"].
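Over CouchDb’s HTTP interface, a range query of this kind is a GET on the view with startkey and endkey parameters, each a JSON value that has been URL-encoded, plus include_docs=true if the documents themselves are wanted. The sketch below only builds the URL and makes no request. The server address, the database name hydra, and the design document and view names are assumptions made for the example.

```javascript
// Hypothetical location of the view: server, database, design document and view
// names are all made up for this example.
const viewUrl = "http://localhost:5984/hydra/_design/messages/_view/by_topic";

// Build the URL for "everything between startKey and endKey".
function rangeQueryUrl(baseUrl, startKey, endKey, includeDocs) {
  const params = [
    "startkey=" + encodeURIComponent(JSON.stringify(startKey)),
    "endkey=" + encodeURIComponent(JSON.stringify(endKey))
  ];
  if (includeDocs) params.push("include_docs=true");
  return baseUrl + "?" + params.join("&");
}

const postingUrl = rangeQueryUrl(viewUrl, ["Posting", ""], ["Posting", "zzzz"], true);
console.log(postingUrl);
```

The "zzzz" end point works only as long as no document id sorts after it. A common CouchDb idiom is to use an empty object as the last element of the end key, as in ["Posting", {}], because objects sort after all strings in view ordering.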

This sort of query style is typical of NoSql databases: you cannot just run an arbitrary “Select … Where …” sort of query. You have to consider what queries you will run in advance and set up your database to answer them. In this case we’ve decided that we need a “select by topic” query, and have defined an appropriate view. Hydra does in fact use exactly this view, along with others allowing selection by specific combinations of fields including message destination and the CouchDb instance to which the message is posted.

Views are at the heart of CouchDb. Here I’ve only brushed the surface, to explain how they relate to Hydra. There is much more about them on the CouchDb wiki and in the CouchDb guide.

Replication

CouchDb replication is very simple in concept, though there is some complexity behind the scenes in providing that simplicity. Replication is set up through JSON documents in a special _replicator CouchDb database. At its simplest each document specifies a source and a target database and whether replication is performed by the source or the target CouchDb instance. If _replicator documents are set up to replicate A to B and B to A, then we have two-way replication. You can also set up replication between any number of databases with every database replicating to every other in the set, to give full mesh replication.
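To make the full mesh case concrete, the sketch below generates the _replicator documents needed for a set of databases: one document for every ordered pair. The source, target and continuous fields are standard _replicator fields; the database URLs, the document id scheme and the function name are hypothetical.

```javascript
// Build one _replicator document per ordered (source, target) pair, so that
// every database replicates to every other one.
// The "rep_N" id scheme is invented for this example.
function fullMeshReplicationDocs(databaseUrls) {
  const replicationDocs = [];
  for (const source of databaseUrls) {
    for (const target of databaseUrls) {
      if (source === target) continue;   // a database does not replicate to itself
      replicationDocs.push({
        _id: "rep_" + replicationDocs.length,
        source: source,
        target: target,
        continuous: true                 // keep replicating as new changes arrive
      });
    }
  }
  return replicationDocs;
}

// Hypothetical servers, each holding a database called "hydra".
const meshDocs = fullMeshReplicationDocs([
  "http://server-a:5984/hydra",
  "http://server-b:5984/hydra",
  "http://server-c:5984/hydra"
]);
console.log(meshDocs.length);
console.log(JSON.stringify(meshDocs[0], null, 2));
```

With n databases this produces n × (n − 1) documents, so three databases need six. With only two databases the result is the A to B and B to A pair described above.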

Every action on a CouchDb database is either the creation, update, or deletion of a document. Replication simply performs these same actions on the target. To allow conflict detection, every document has a revision number as well as its unique document id. When updating or deleting a document, you must specify the document id and revision number of the document affected. If the revision number does not match (because someone else has changed the document), the change is rejected with an error.

Replication behaviour is slightly different: if a document is changed on two replicating instances, and then the changes cross in flight, as it were, then both databases keep both new versions and mark them as being in conflict (but carry on replicating other changes, so the replication process is not affected). An arbitrary but deterministic algorithm decides which is the conflict winner, so that both databases have the same winner. An API allows users to determine which documents are in conflict, and then resolve the conflict by updating the winner; that change then replicates so that all databases come into sync. (It’s possible that conflicting resolutions may happen simultaneously, in which case they will be marked as being in conflict, and themselves need resolution.)

The system is clever enough to notice if an incoming replication update has already been applied or if it does not change the document contents. In this case the change is ignored and not itself replicated. This prevents replication changes in the full mesh case from propagating endlessly around the system.
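The revision check on ordinary updates can be pictured with a toy in-memory model. This is an illustration of the rule only, not CouchDb’s code: ToyDatabase is a hypothetical class, and revisions here are bare counters, whereas real CouchDb revisions have the form of a number followed by a hash and a mismatch is reported as an HTTP 409 Conflict.

```javascript
// Toy model of "an update must quote the current revision".
class ToyDatabase {
  constructor() {
    this.docs = new Map();   // document id -> { rev, body }
  }

  // Create (no rev supplied) or update (rev must match the stored one).
  put(id, body, rev) {
    const current = this.docs.get(id);
    const currentRev = current ? current.rev : undefined;
    if (rev !== currentRev) {
      throw new Error("conflict: document " + id + " is at revision " + currentRev);
    }
    const newRev = (currentRev || 0) + 1;
    this.docs.set(id, { rev: newRev, body: body });
    return newRev;
  }
}

const db = new ToyDatabase();
const rev1 = db.put("ab02", { topic: "Posting" });        // create: no revision yet
const rev2 = db.put("ab02", { topic: "Query" }, rev1);    // update quoting the current revision

let rejected = false;
try {
  db.put("ab02", { topic: "Stale" }, rev1);               // rev1 is now out of date
} catch (e) {
  rejected = true;
}
console.log(rev1, rev2, rejected);
```

The second writer holding rev1 has to fetch the document again and reapply its change. Replication cannot do that, which is why it keeps both versions and marks a conflict instead.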

That was a slightly complex explanation. From the Hydra point of view, the key point is that CouchDb offers full multi-master replication between any number of live databases in a way that’s very easy to set up and, because messages are only created and never updated, there is no need to worry about the possibility of conflicting updates. This is the feature that really marks out CouchDb from other NoSql (and indeed SQL) databases for Hydra’s purposes: other systems tend to have only master-slave replication, or at best, two-way replication between a pair of databases. That replication tends to be hard to set up and maintain, and fragile. CouchDb is a long way ahead in this respect. In fact there’s another refinement: the _replicator database can itself be replicated, which can make setting up replication on new CouchDb instances very easy indeed.

Summary

That was a lightning tour of CouchDb as it relates to Hydra. The key points are:

  • Support for full-mesh multi-master replication. This is the killer feature.
  • Storage of messages as structured documents.
  • Access to data is restricted to creating ordered indexes on document properties and then selecting contiguous slices of those indexes. Bending this to the requirements of message delivery is the key consideration in Hydra design, especially in its selection of document ids.

There’s a lot more to CouchDb than this. In particular I’ve said very little about map/reduce and CouchDb’s use of a RESTful HTTP interface. Maybe another day…

The second page is about how Hydra uses CouchDb to create a messaging system that can give guaranteed, mostly ordered message delivery across a distributed system, without the need for any server components besides CouchDb itself.

Last edited Apr 8, 2012 at 7:47 PM by NickNorth, version 2
