During the infrastructure design process we focused on our main goal – to keep it as simple as possible, with the true UNIX way in mind: each subsystem should do only one thing and do it well. This, combined with the highest possible level of HA, allowed us to avoid dependencies on any single point of failure.
flespi High Availability features
Based on our experience administering the Wialon Hosting system, which now operates in multiple distributed data centers, we divided HA into several levels:
- hardware level: e.g. racks, switches, network cables, power connectors, drives;
- network traffic level: e.g. more than one gateway;
- software level: all types of software should be developed to operate seamlessly on multiple servers at once and should "survive" if any of these servers becomes inaccessible.
On the hardware level, we did everything possible – there are two switches in each rack, and each server is connected to both of them via bonded links and has two separate power outlets. For system drives, we use MD RAID-1.
Currently, two servers hosting gateway services filter and distribute traffic from external IP addresses to the other subsystems in the LAN. These servers are controlled by Pacemaker; in addition, there is a third server storing a static daily archive and history in case the main ones malfunction.
To achieve software HA we use different techniques. First of all, the most sensitive data is stored in the PostgreSQL database, which runs in HA mode on multiple database servers. Not all the data – this RDBMS is too slow for our volumes of telematics data – but most of the metadata. Then, each subsystem is fully autonomous and communicates with other subsystems via the internal HTTP REST API. Such independent subsystems are called microservices, an architectural approach that is becoming very convenient. The only really important thing about a microservice is its API, and it has been standardized: we developed guidelines for it and followed them throughout API development. All other internals are not that important – it is up to the subsystem developer to choose one approach or another, or even to test a few implementation variants on parallel nodes. As long as they all operate through the same API, it is absolutely fine.
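To give a feel for this approach, a subsystem talks to its neighbors only through their REST APIs. The snippet below is a minimal Python sketch; the host name, endpoint, and response shape are hypothetical and only illustrate the style of such an internal call, not the actual flespi API.

```python
import requests

# Hypothetical internal endpoint – the real subsystem APIs are internal.
ADMIN_API = "http://admin.lan:8080/rest/v1"

# One subsystem queries another strictly through its REST API,
# never through shared memory or a shared database.
resp = requests.get(f"{ADMIN_API}/services/gw/status", timeout=5)
resp.raise_for_status()
print(resp.json())
```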
flespi platform architecture can be described by the following chart:
flespi subsystems
Currently, we have the following subsystems operating:
- PROD: product gateway, client registration, utilization control, access limitation, and billing;
- GW: telematics gateway that provides services for communication with telematics devices and is designed to parse all incoming traffic from various devices into the unified message format;
- MDB: our own database for telematics data where we store channel messages, logs, and command results;
- ADMIN: the system responsible for automatic configuration, installation, log collection, and control of services on various servers;
- MBUS: generic bus operating over the MQTT protocol that allows internal services to publish and subscribe to various events (a minimal publish/subscribe sketch follows this list).
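To give a flavor of how such a bus is used, here is a minimal publish/subscribe sketch in Python with the paho-mqtt client (1.x style API). The broker address and topic names are invented for illustration – the real internal topic layout is not public.

```python
import json
import paho.mqtt.client as mqtt  # paho-mqtt 1.x style API

def on_message(client, userdata, msg):
    # any internal service can react to events published by the others
    print(msg.topic, json.loads(msg.payload))

client = mqtt.Client()
client.on_message = on_message
client.connect("mbus.lan", 1883)               # hypothetical internal broker
client.subscribe("gw/channels/+/connected")    # hypothetical topic layout
client.publish("admin/services/tg/restarted",  # hypothetical event topic
               json.dumps({"node": "srv-03"}))
client.loop_forever()
```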
And here is the operation scheme for the GW subsystem, which consists of “tg” and “tgctl” processes: “tgctl” processes are responsible for the REST API methods, while “tg” processes handle the actual communication with telematics devices.
Each subsystem comprises multiple processes running on different hardware servers, so there is no single point of failure. Each of them is designed for just one thing – to do its job in the best possible way. I will describe all our subsystems in detail in the next posts; for now, I'll just share a few insights:
- we can install different binary versions for a few processes, test them in production while measuring the performance of multiple versions of the same type of process, and revert to a stable version at any moment if anything goes wrong;
- all our processes are single-threaded and event loop-based. Some processes, like “tgctl”, automatically fork themselves into 4 worker processes managed by a master process that checks their status, collects stack traces, and restarts workers in case of problems (a sketch of this pattern follows this list). We do not use mutexes or any other multi-threading primitives – unlike Wialon, where context switches and mutex lock/unlock are among the most CPU-consuming operations under high load.
- each “tgctl” process operates with our own HTTP client/server library, and our HTTP server is faster than Nginx with almost no optimization effort. It is just designed to be very simple and powerful.
- each “tg” process can handle tens of thousands of messages per second on one CPU core all day long;
- protocols are described in our own “pvm” language, which is compiled into assembler code (also defined by us) by a proprietary compiler and then executed on an embedded virtual machine. It is specially adapted to describing protocols for telematics devices and performs much faster than most modern scripting languages. It can describe complex binary protocols (like Teltonika, Ruptela, Wialon Retranslator, etc.) in 50-100 lines of “pvm” code (see the parsing illustration after this list);
- the configuration of the whole system is fully automatic via the ADMIN subsystem. Each developer has a full set of processes in the home environment, and everything is available with one click (actually, via one POST/PUT/DELETE request to the REST API of the configuration service – a sketch of such a request follows below).
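The master/worker pattern from the first insight above can be sketched roughly as follows. This is not the actual “tgctl” code – just a minimal Python illustration, assuming a plain fork-based prefork model, of a master that keeps 4 single-threaded workers alive and restarts any that die.

```python
import os
import time

WORKERS = 4  # "tgctl" is said to fork into 4 workers

def worker_loop():
    # placeholder for a single-threaded event loop (e.g. epoll-based)
    while True:
        time.sleep(1)

def spawn_worker():
    pid = os.fork()
    if pid == 0:          # child: run the event loop forever
        try:
            worker_loop()
        finally:
            os._exit(0)
    return pid            # parent: remember the worker's pid

workers = {spawn_worker() for _ in range(WORKERS)}
while True:
    pid, status = os.wait()    # block until any worker exits
    if pid in workers:
        workers.discard(pid)
        # a real master would also collect a stack trace before restarting
        workers.add(spawn_worker())
```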
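“pvm” itself is proprietary and not shown here; to illustrate the kind of work such a protocol description expresses, here is a tiny Python fragment that decodes an invented binary telematics packet. The packet layout and field names are made up for the example.

```python
import struct

def parse_packet(data: bytes) -> dict:
    # invented layout: 2-byte length, 8-byte device id, 4-byte timestamp,
    # two 4-byte signed coordinates scaled by 1e6 (all big-endian)
    length, device_id, ts, lat, lon = struct.unpack_from(">HQIii", data)
    return {
        "device.id": device_id,
        "timestamp": ts,
        "position.latitude": lat / 1_000_000,
        "position.longitude": lon / 1_000_000,
    }

packet = struct.pack(">HQIii", 22, 356307042441013, 1500000000,
                     53902500, 27561500)
print(parse_packet(packet))
```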
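And this is roughly what “one click” looks like from a developer's point of view – a single REST request to the configuration service. The endpoint and payload are hypothetical and only illustrate the idea.

```python
import requests

# Hypothetical configuration service endpoint – the real ADMIN API is internal.
CONFIG_API = "http://admin.lan:8080/rest/v1"

# One POST deploys a full set of processes for a developer environment...
requests.post(f"{CONFIG_API}/environments",
              json={"owner": "dev-home", "template": "full"},
              timeout=10).raise_for_status()

# ...and one DELETE tears it down again.
requests.delete(f"{CONFIG_API}/environments/dev-home",
                timeout=10).raise_for_status()
```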
Try it yourself and stay connected!