Creating a product is like raising a child: you want the best for it. And when a demanding project lies ahead, you want to perform exceptionally well.
When the flespi team released its own MQTT broker at the end of 2017, another part of Gurtam, the Development Center responsible for the flagship Wialon Hosting platform, started using it in ongoing product development.
Wialon Hosting, located in the Wialon Data Center, has over 800 000 connected GPS devices sending telemetry to the servers a few times per minute. This results in up to 20 000 new messages per second, each about 300 bytes in its JSON representation.
The idea was to publish all these messages to the MQTT broker and consume them from various locations and for various needs.
Testing the MQTT broker
First, we checked the broker itself to see whether it was capable of continuously devouring such a data flow and to find the best C++ client implementation providing the highest throughput.
It took us a month to test it in various conditions at 4-10 times the actual load. So far our MQTT broker can process 200 000 300-byte messages per second with no visible impact on the CPU, I/O, or memory of the servers. Of course, we optimized it for this kind of load.
Next, we needed to feed the entire data flow to the broker. 20 000 incoming messages per second. JSON payload. 300 bytes each. We wanted to devise some business logic for message processing, e.g., detect messages with LBS information and maintain an up-to-date map of LBS base stations, or store messages in backup storage. We cannot encode these criteria in an MQTT message topic, so the broker cannot filter the flow for us and deliver only the messages we want; every consumer has to receive the full stream and parse the payloads itself.
Initially, we just took Python with the standard paho library... and it sucked. We exhausted the CPU of a modern Xeon-based server but still couldn't process all the traffic.
Benchmarking the MQTT + JSON stacks
This small failure urged us to benchmark popular implementations of the MQTT+JSON stack alongside our own MQTT implementation. We benchmarked against the task in question: from all messages, extract only those that contain LBS information and publish them back to the MQTT broker under a different topic.
Each message delivered to the broker had a standard Wialon message JSON format like this:
{"id":160714,"msg":{"t":1516779384,"f":1073741827,"tp":"ud",
"pos":{"y":59.9377466,"x":30.450435,"z":3.2,"s":21,"c":240,"sc":10},"i":0,"lc":0,
"p":{"hdop":0.8,"io_caused":7,"gsm_signal":17,"fuel_lvl":0,"current_profile":1,
"pcb_temp":31,"movement_sens":0,"battery":4033,"power":25458,"battery_current":1,
"can_rpm":0,"adc1":0,"adc2":0,"can_fuel_used":0,"can_distance":0,
"odometer":400507371,"gsm_operator":25001}}}
Our job was to detect messages with the 'mnc', 'mcc', 'lac', and 'cell_id' values in the 'p' parameter and deliver them to another topic via MQTT.
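To make the task concrete, here is a minimal sketch of the entire workflow in Python with the paho 1.x API. The topic names are illustrative rather than the exact ones from our benchmark, 'XXXXX' stands for a flespi token used as the MQTT username, and we treat a message as LBS-related if all four values are present:

import json

import paho.mqtt.client as mqtt

LBS_KEYS = {"mnc", "mcc", "lac", "cell_id"}

def on_connect(client, userdata, flags, rc):
    # Receive the full telemetry stream; the broker cannot filter by payload content.
    client.subscribe("wialon/messages")  # illustrative source topic

def on_message(client, userdata, msg):
    data = json.loads(msg.payload)                 # ~300-byte Wialon JSON message
    params = data.get("msg", {}).get("p", {})
    if LBS_KEYS.issubset(params):                  # 'mnc', 'mcc', 'lac', 'cell_id' all present
        client.publish("wialon/lbs", msg.payload)  # illustrative destination topic

client = mqtt.Client()
client.username_pw_set("XXXXX")                    # flespi token as the username
client.on_connect = on_connect
client.on_message = on_message
client.connect("mqtt.flespi.io", 1883)
client.loop_forever()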
We tested the Python 3 code with the following MQTT clients: aiomqtt, hbmqtt, paho, and gmqtt, with different event loops, and also tried an alternative Python implementation, PyPy.
We could have stopped there with just the benchmark of Python implementations and libraries, but we were curious about the wider picture. So we added the flespi implementation of the MQTT client (kibo-c), written in pure C, to the candidate list. We also created a similar Lua script implementing the business logic (kibo-lua), with MQTT and JSON based on our C library. And we tried the Go language with the paho library, just for fun.
All source code is available for download; see the links in the table below. You just need to reproduce the setup with your own message publisher and replace 'XXXXX' with a valid flespi token.
Side note: We discovered that the traffic limits of our commercial plan can handle this massive load, but on the free plan you might hit the limits at 10 000 messages/second.
Testing environment: We ran the benchmark on 1 core of an Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (Turbo Boost up to 3GHz) with 132GB of RAM.
We benchmarked two cases:
entire workflow: receive MQTT => parse JSON => detect LBS => publish back to MQTT;
limited workflow: just receive MQTT messages, to test the pure MQTT library without depending on the JSON parser implementation.
For each scenario we also calculated a so-called CPU score that reflects how many messages the implementation can handle per unit of CPU: the measured throughput divided by the CPU load in percent and multiplied by 10. Bigger values mean a more efficient implementation.
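For example, plugging the python3-gmqtt numbers from the first table below into this formula reproduces its score (a tiny Python check, values copied from the results table):

# CPU score = throughput divided by CPU load (in %), scaled by 10
msgs_per_sec = 24448      # python3-gmqtt, entire workflow
cpu_load_percent = 96
cpu_score = round(msgs_per_sec / cpu_load_percent * 10)  # -> 2547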
Benchmarking results
Here are the results we got for CPU and RAM usage across all tested MQTT + JSON implementations.
Entire workflow (Subscribe -> JSON Parse -> Publish)
Implementation | CPU load, % | Msgs/sec | RAM, MB | CPU score |
python3-aiomqtt | 119% | 4550 | 20 | 382 |
python3-hbmqtt | 100% | 4475 | 26 | 448 |
python3-hbmqtt(uvloop) | 100% | 5300 | 28 | 530 |
python3-aiomqtt(uvloop) | 119% | 6350 | 21 | 534 |
pypy3-hbmqtt | 100% | 8500 | 132 | 850 |
pypy3-aiomqtt | 101% | 16150 | 121 | 1599 |
kibo-lua | 100% | 16000 | 6.5 | 1600 |
python3-paho | 100% | 17000 | 16 | 1700 |
python3-gmqtt | 96% | 24448 | 18 | 2547 |
python3-gmqtt(uvloop) | 98% | 25268 | 21 | 2578 |
pypy3-paho | 68% | 28800 | 109 | 4235 |
pypy3-gmqtt | 58% | 25397 | 125 | 4379 |
nodejs-mqtt.js | 47% | 26700 | 100 | 5681 |
golang-paho | 44% | 26200 | 10.9 | 5955 |
kibo-c | 39% | 25600 | 4.4 | 6564 |
Limited workflow (Subscribe only)
Implementation | CPU load, % | Msgs/sec | CPU score |
python3-hbmqtt | 100% | 5 925 | 593 |
python3-hbmqtt(uvloop) | 100% | 7 450 | 745 |
python3-aiomqtt | 126% | 11 555 | 917 |
pypy3-hbmqtt | 100% | 10 000 | 1000 |
python3-aiomqtt(uvloop) | 134% | 17 635 | 1316 |
python3-paho | 91% | 26 445 | 2906 |
python3-gmqtt | 60% | 24 417 | 4070 |
python3-gmqtt(uvloop) | 61% | 24 884 | 4079 |
pypy3-aiomqtt | 57% | 24 009 | 4212 |
pypy3-gmqtt | 37% | 25 303 | 6839 |
pypy3-paho | 36% | 26 443 | 7345 |
golang-paho | 30% | 24 017 | 8006 |
nodejs-mqtt.js | 24% | 24 114 | 10048 |
kibo-lua | 18% | 26 033 | 14463 |
kibo-c | 5.6% | 24 152 | 43129 |
Benchmarking takeaways:
The implementation of JSON manipulation is at least as important as the implementation of the MQTT library, sometimes even more so. Node.js has one of the best JSON implementations, which is why it is quite fast.
Python implementations of the MQTT client vary hugely in performance. Here's the list from fastest to slowest: paho, gmqtt, aiomqtt, hbmqtt.
PyPy helps a lot. It allows Python code to run about twice as fast.
uvloop is not as fast as it claims to be, but it is indeed fast, especially under high I/O load (see the sketch after this list).
Pure C is way ahead of competitors. As expected.
The Lua scripting language is quite fast. It outperforms Python and Node.js in the pure MQTT test.
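As a side note on uvloop: switching an asyncio-based client to it is a one-line change. A minimal sketch, assuming an asyncio-based MQTT client such as gmqtt or aiomqtt (the client code itself is omitted):

import asyncio
import uvloop

# Install uvloop as the event loop implementation before the client creates its loop;
# the rest of the asyncio-based MQTT code stays unchanged.
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())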
***
While we were running this benchmark, we introduced the shared subscriptions feature to our MQTT broker, which allows balancing the load between multiple consumer processes. It means that we no longer need to fit into 100% of a single CPU core; now it is a matter of how much money we want to spend on hardware. From a DevOps perspective, it is better to select an implementation that is easier to manage, modify, and supply with libraries to cover all possible tasks of a business logic application.
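For illustration: assuming the broker accepts the MQTT 5-style $share/<group>/<filter> syntax for shared subscriptions, turning the consumer from the sketch above into one of several load-balanced workers is only a change of the topic filter ('lbs-workers' is a hypothetical group name):

# Each worker process subscribes to the same shared group; the broker delivers a
# matching message to only one member of the group instead of to every subscriber.
client.subscribe("$share/lbs-workers/wialon/messages")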