Failover

Sometimes networks or machines fail, and a Hydra server may become temporarily unreachable by some of its clients. Because all Hydra messages are delivered to all servers, this need not be a problem: as long as some server is reachable, clients can swap to that one to continue posting and receiving messages.

Round-robin failover

The .NET library for Hydra supports several transparent failover techniques. If you have a list of Hydra servers, you can supply them when creating the HydraService, as follows:

var servers = new List<string> {"server1", "server2"};
_hydraService = new HydraService(new RoundRobinConfigProvider(servers, "hydra"));

The first argument can be any IEnumerable<string>. The system uses server1 until it fails, then swaps to server2, and so on in round-robin fashion.

Behind the scenes there are a few elements to this:

  1. The internal polling system notifies the RoundRobinConfigProvider if the server it is polling is not responding.
  2. The RoundRobinConfigProvider monitors notifications from all the pollers in the app and decides whether to swap server. If it judges the problem to be sufficiently serious, it switches the app to a different server.
  3. Every time the poller polls, it first checks to see if the application server has changed since the last poll. If it has, then it uses the new server.

The RoundRobinConfigProvider class is not very clever about server switching: when notified of an error from the currently active server, it switches to the next one in the provided list.
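The mechanism described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's actual RoundRobinConfigProvider: the class name RoundRobinSketch and its members CurrentServer and ReportError are hypothetical, chosen to mirror the three steps in the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of round-robin switching; not the Hydra implementation.
public sealed class RoundRobinSketch
{
    private readonly IReadOnlyList<string> _servers;
    private readonly object _lock = new object();
    private int _current;

    public RoundRobinSketch(IEnumerable<string> servers)
    {
        _servers = servers.ToList();
        if (_servers.Count == 0)
            throw new ArgumentException("At least one server is required.", nameof(servers));
    }

    // A poller would read this before every poll, so it picks up a switch (step 3).
    public string CurrentServer
    {
        get { lock (_lock) { return _servers[_current]; } }
    }

    // A poller would call this when the server it polls does not respond (steps 1 and 2).
    public void ReportError(string failedServer)
    {
        lock (_lock)
        {
            // Ignore reports about a server that is no longer the active one;
            // otherwise several pollers failing together would skip servers.
            if (_servers[_current] != failedServer) return;

            // Move to the next server, wrapping around at the end of the list.
            _current = (_current + 1) % _servers.Count;
        }
    }
}
```

Ignoring reports about a server that is no longer active is a design choice of this sketch: it stops a burst of errors from several pollers from advancing the list more than once.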

Nearest server failover

A better system would actively check the state of the servers in the background and make active decisions about which one to use. The NearestServerConfigProvider does this. It is created in just the same way as the RoundRobinConfigProvider:

var servers = new List<string> {"server1", "server2"};
_hydraService = new HydraService(new NearestServerConfigProvider(servers, "hydra"));

NearestServerConfigProvider periodically executes a query on each of the servers. If one of them responds significantly faster than the current server, it switches to that one. This means that a temporary local server failure will cause the system to switch to a more distant server, and then switch back when the local server comes back online. It only switches if a server is significantly faster, to prevent constant flitting between two servers with similar response times.
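The "significantly faster" rule can be illustrated with a small selection function. This is a sketch under assumptions, not the library's code: the name NearestServerSketch, the Choose signature, and the default factor of 2.0 are all hypothetical, since the actual threshold used by NearestServerConfigProvider is not stated here. Servers that did not respond are assumed to be absent from the dictionary of measurements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of threshold-based server selection; not the Hydra implementation.
public static class NearestServerSketch
{
    // responseTimes holds the latest measurement for each server that responded.
    // requiredSpeedup is an assumed threshold: a candidate must be this many
    // times faster than the current server before a switch happens.
    public static string Choose(
        string current,
        IDictionary<string, TimeSpan> responseTimes,
        double requiredSpeedup = 2.0)
    {
        // Find the fastest responding server.
        string best = null;
        foreach (var pair in responseTimes)
        {
            if (best == null || pair.Value < responseTimes[best]) best = pair.Key;
        }

        // Nothing responded at all: stay where we are.
        if (best == null) return current;

        // The current server did not respond: take the fastest one that did.
        TimeSpan currentTime;
        if (!responseTimes.TryGetValue(current, out currentTime)) return best;

        // Switch only when the best server is faster by the required factor.
        bool muchFaster =
            currentTime.TotalMilliseconds > responseTimes[best].TotalMilliseconds * requiredSpeedup;
        return muchFaster ? best : current;
    }
}
```

With these assumptions, a server measured at 8 ms does not displace a current server at 10 ms, which is the behaviour that prevents flitting between two similar servers.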

Note that, under either failover mechanism, failure to send a message notifies the IConfigProvider, and also raises an error to the client. This means that there is a good chance that retrying a failed send will work. It might be useful to add a retry into the main sending code and only raise an error if repeated retries fail.
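A retry of this kind might look like the following sketch. It deliberately wraps an arbitrary Action rather than naming a Hydra send method, because the sending call depends on your own code; SendRetry, WithRetry, and the default attempt count and delay are hypothetical choices, not part of the library.

```csharp
using System;
using System.Threading;

// Hypothetical retry helper; the send delegate stands in for your own sending code.
public static class SendRetry
{
    public static void WithRetry(Action send, int maxAttempts = 3, int delayMs = 500)
    {
        if (send == null) throw new ArgumentNullException(nameof(send));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                send();
                return; // Sent successfully.
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // The failed send has already notified the config provider, so
                // pause briefly to give it a chance to switch server, then retry.
                // On the final attempt the filter is false and the error propagates.
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

A call would then look something like SendRetry.WithRetry(() => SendMyMessage(message)), where SendMyMessage is whatever method your application already uses to send. Catching every exception type is a simplification; in practice you would probably retry only on the errors that indicate an unreachable server.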

Preferred server failover

Sometimes you want some control over which server is used: you might want the client to use a server on the local network if it is available, and only fail over to a more remote server if the local one goes down. PreferredOrderConfigProvider gives you this ability. It is created in the same way as the other config providers:

var servers = new List<string> {"server1", "server2"};
_hydraService = new HydraService(new PreferredOrderConfigProvider(servers, "hydra"));

It polls the given list of servers, just like NearestServerConfigProvider, but it always uses the first working server in the list, regardless of response time. If that server fails, it will switch to one further down the list, and switch back when an earlier one in the list becomes available.
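The selection rule itself is simple enough to show directly. This sketch assumes a hypothetical isResponding check supplied by the caller, standing in for the provider's background polling; PreferredOrderSketch and Choose are illustrative names, not library members.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of preferred-order selection; not the Hydra implementation.
public static class PreferredOrderSketch
{
    // Returns the first server in the list that is responding,
    // or null when none of them is.
    public static string Choose(
        IEnumerable<string> serversInPreferenceOrder,
        Func<string, bool> isResponding)
    {
        foreach (var server in serversInPreferenceOrder)
        {
            if (isResponding(server)) return server;
        }
        return null;
    }
}
```

Because the list is re-evaluated from the top each time, an earlier server is chosen again as soon as it responds, which gives the switch-back behaviour described above.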

Hydra Version 0.5

Last edited Aug 11, 2012 at 4:47 PM by NickNorth, version 6
