Comparison of ZeroMQ and Redis for a robot control platform
This document is research for the selection of a communication platform for robot-net.
The purpose of this component is to enable rapid, reliable, and elegant communication between the various nodes of the network, including controllers, sensors, and actuators (robot drivers). It will act as the core of robot-net to create a standardized infrastructure for robot control.
Requirements:
Given these requirements and available technologies, the final two choices for this component come down to ZeroMQ and Redis. They are both best-in-class, but very different tools.
ZeroMQ
ZeroMQ is a high-performance asynchronous messaging library for distributed or concurrent applications. It acts like a message queue, but without any requirements for an intermediate broker. It uses a minimalistic socket-like API, and can use TCP, PGM, or IPC (Unix-style socket) transports. It has several messaging patterns like request-reply, publish-subscribe, push-pull, and exclusive pair, that provide differing protocols and behaviors. Performance tests over 10Gb Ethernet are here, showing a throughput of 2.8 million msg/s at 10 bytes messages and 1.4 million msg/s at 100 bytes messages. Latency stays around a constant 33us for messages under 4000 bytes. Maximum bandwidth possible is ~2.5Gb/s. This document shows the effects of copying data on latency at the application level, generally increasing latency by a factor of about 30%. Bindings exist for every major language. A possible alternative to ZeroMQ is the fork nanomsg.
Redis
Redis is an advanced key-value store, or data structure server. It works in-memory, with optional persistency. The primary data type is a string, but Redis also supports hashes, lists, sets, sorted sets, bitmaps, and hyperloglogs. It is one of the most popular key-value stores, and performance is at the very top when on-disk durability is not required. Redis runs as a centralized server, and clients communicate using the Redis Serialization Protocol, which is a request-response model using TCP. Interestingly, it also acts separately as a publish/subscribe server. Performance benchmarks show typical throughputs of 30-100k requests/s, and around 200-400k for pipelining (batch requests, not relevant for high-frequency sampling). However, on my machine, I see more like 100-200k requests/s, and 700-1000k requests/s with pipelining. These results are about constant for data under 1000 bytes over 1Gb Ethernet. Average latency (over localhost) seems to be around 150us. Redis clients exist for every major language.
Ties
ZeroMQ Advantages
Redis Advantages
Basic benchmark programs were written in C++ for both ZeroMQ and Redis. Both send messages of the form "Hello at 1419140353074", where the number is the current epoch time in milliseconds. This lets us test throughput and latency together. The string message has a size of approximately 22 bytes. For both tests, there is a sending and receiving process, both communicating over the loopback TCP interface.
All complete benchmarks are available here.
Results for one writer, one reader:
Results for one writer, four readers (using time parallel -j 4 ./build/COMMAND_NAME -- 1 2 3 4
):
The decision comes down to the difference between a fully distributed solution with raw speed (ZeroMQ) versus a centralized solution with more accessibility (Redis). Using Redis with an event library and asynchronous looping greatly speeds it up, but ZeroMQ still has about 6x higher messaging rate. Both ZeroMQ and Redis can be sped up by pipelining (batching) requests - it is unclear if this is helpful, if each node is dealing with data regarding a separate robot. Redis definitely makes a lot of things easier - logging, persistence, pipelining, event loops. However, it is a single-threaded centralized server and will slow down when many nodes are reading, whereas a distributed system will be affected less by congestion. All tests were run on a local machine, but we can imagine that most applications of this library will run on a local network, so network latencies should never be too bad. Both solutions should fulfil the requirements. The big question is how much throughput we actually need to handle. If we assume four robots at 10 kHz each, we see that Redis should be able to handle this just fine. Therefore, because it will be faster to implement a complete solution, the current conclusion is to proceed with Redis until we run into an important use case for which it is prohibitively slow.