MQTT Load Balancing with Shared Subscriptions

Learn how to scale your MQTT data processing with flespi's shared subscriptions

Problem and Solution

As your flespi integration grows and you start processing thousands of device messages per minute, you'll quickly discover that a single MQTT client becomes a bottleneck. 

MQTT shared subscriptions solve this by distributing messages among group members. Prefixing your topic with $share/group_name/ ensures each message goes to exactly one subscriber in the same group. The broker handles all distribution logic, automatically rebalancing and resending messages when clients connect or disconnect.

Implementation

To implement shared subscriptions, you modify your subscription topic by prefixing it with $share/group_name/. For example, instead of subscribing to flespi/message/gw/devices/+, your workers would subscribe to $share/processors/flespi/message/gw/devices/+. Any client that subscribes using the same group name becomes part of the processing pool, and the broker ensures each message is delivered to only one member.
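As a small illustration of the prefixing rule above, the following Python sketch builds the shared form of a topic filter. The helper name shared_topic is hypothetical; the check on the group name follows the MQTT 5.0 rule that a share name is non-empty and contains no "/", "+" or "#".

```python
def shared_topic(group: str, topic_filter: str) -> str:
    """Return the shared-subscription form of a topic filter."""
    # MQTT 5.0: the share name must be non-empty and must not contain '/', '+' or '#'.
    if not group or any(ch in group for ch in "/+#"):
        raise ValueError(f"invalid share group name: {group!r}")
    return f"$share/{group}/{topic_filter}"


if __name__ == "__main__":
    # Every worker in the pool subscribes to this same string.
    print(shared_topic("processors", "flespi/message/gw/devices/+"))
```

Each worker passes the resulting string to its MQTT client's subscribe call in place of the plain topic filter.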

The implementation requires QoS=1 subscriptions, which guarantee message delivery and let the broker re-route sent but unacknowledged messages to another session when a session disconnects. We also suggest using persistent client sessions, by setting clean=false and a non-zero session expiry interval in the MQTT connection options, to prevent message loss when all subscribers in the group are disconnected.
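The recommended settings can be collected in one place and checked before a worker connects. The sketch below is only an illustration: the class WorkerSessionOptions and its field names are hypothetical, the 3600-second expiry is an arbitrary example value, and how these values are passed to the connection depends on the MQTT client library you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSessionOptions:
    """Hypothetical container for the connection settings discussed above."""
    client_id: str                       # must stay the same across reconnects
    clean: bool = False                  # False asks the broker to keep the session
    session_expiry_interval: int = 3600  # seconds; arbitrary example value
    qos: int = 1                         # subscription QoS

    def problems(self) -> list[str]:
        """List the reasons these options would not give a persistent QoS 1 worker."""
        found = []
        if not self.client_id:
            found.append("client_id is empty, so the session cannot be resumed")
        if self.clean:
            found.append("clean=true starts a fresh session on every connect")
        if self.session_expiry_interval <= 0:
            found.append("session expiry interval is zero, so the session ends on disconnect")
        if self.qos < 1:
            found.append("QoS 0 deliveries are not acknowledged")
        return found


if __name__ == "__main__":
    options = WorkerSessionOptions(client_id="processor-1")
    print(options.problems() or "options look suitable for a persistent worker")
```

Note that a persistent session is tied to its client ID, so each worker needs a stable, unique ID for the broker to resume its session.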

When all your workers are connected and healthy, messages distribute evenly among them. If one worker disconnects or crashes, the broker automatically redistributes its share among the remaining workers without any message loss. 
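To make this behavior concrete, here is a toy in-memory model of a shared group. It is not the broker's implementation: the SharedGroup class is hypothetical, and round-robin selection is an assumption made for the example, since the article does not describe the actual distribution algorithm. The model only covers the case where at least one worker stays connected.

```python
from collections import deque


class SharedGroup:
    """Toy model: each message goes to exactly one connected worker."""

    def __init__(self):
        self.unacked = {}      # worker name -> messages sent but not yet acknowledged
        self._order = deque()  # round-robin order (an assumption for this sketch)

    def connect(self, name):
        self.unacked[name] = []
        self._order.append(name)

    def publish(self, message):
        if not self._order:
            raise RuntimeError("no connected workers in this toy model")
        name = self._order[0]
        self._order.rotate(-1)  # the next message goes to the next worker
        self.unacked[name].append(message)
        return name

    def ack(self, name, message):
        self.unacked[name].remove(message)

    def disconnect(self, name):
        # Messages the worker never acknowledged are re-routed to the others.
        pending = self.unacked.pop(name)
        self._order.remove(name)
        for message in pending:
            self.publish(message)


if __name__ == "__main__":
    group = SharedGroup()
    group.connect("worker-a")
    group.connect("worker-b")
    for n in range(4):
        group.publish(f"msg-{n}")
    group.ack("worker-a", "msg-0")
    group.disconnect("worker-a")   # the unacknowledged msg-2 moves to worker-b
    print(group.unacked)
```

The point of the model is the disconnect step: only messages that were not yet acknowledged are handed to another worker, which is why QoS 1 matters here.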

By default, messages are redistributed only among connected sessions. To prevent message loss when all subscribers in a shared group are offline, flespi implements a feature that goes beyond the MQTT 5.0 standard. If you use persistent sessions and all workers go offline, the broker does not drop messages; instead, it distributes them among the persistent sessions and stores them in each session's individual buffer. As soon as a persistent session reconnects, it begins receiving both new messages and the messages that accumulated in its buffer during the outage.

Note, however, that there is no single "shared group buffer". When all sessions in a shared group are offline, messages are distributed and stored across the individual session buffers of the group members. When a session reconnects, it receives the messages from its own buffer only. This is standard MQTT session behavior; the only difference is that the broker coordinates which messages go to which session buffer.
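The consequence of per-session buffers is easiest to see in a small simulation. The sketch below is a simplified model, not flespi's implementation: the names PersistentSession, distribute_offline and reconnect are hypothetical, and the round-robin assignment is an assumption made for the example.

```python
class PersistentSession:
    """Toy model of one persistent session in a shared group."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.buffer = []  # messages stored for this session while it is offline


def distribute_offline(sessions, messages):
    """Spread messages over the buffers of offline sessions (round-robin assumed)."""
    for index, message in enumerate(messages):
        sessions[index % len(sessions)].buffer.append(message)


def reconnect(session):
    """Return what the session receives on reconnect: its own buffer only."""
    delivered, session.buffer = session.buffer, []
    return delivered


if __name__ == "__main__":
    first = PersistentSession("processor-1")
    second = PersistentSession("processor-2")
    distribute_offline([first, second], [f"msg-{n}" for n in range(5)])

    print(reconnect(first))   # the messages assigned to processor-1
    print(second.buffer)      # these stay stored until processor-2 reconnects
```

In practice this means that bringing back a single worker after a full outage drains only that worker's share of the backlog; the rest is delivered as the other sessions reconnect.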

This delivery coordination mechanism protects you from message loss even during complete system outages or deployment windows, provided the persistent sessions have not expired. You can use as many persistent sessions as needed and even keep them disconnected by default during normal operation.

Extra

While persistent sessions provide the most robust solution, there are scenarios where you might prefer a simpler approach. If your processing is completely stateless and you can tolerate potential message loss during disconnections, you can use clean sessions with dynamically generated client IDs. This approach simplifies deployment and scaling since workers don't need to maintain session state, but you sacrifice the safety net of message persistence.
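For the clean-session approach, each worker can generate its own client ID at startup. A minimal sketch, assuming a random suffix is acceptable; the function name ephemeral_client_id and the "worker" prefix are hypothetical choices for the example.

```python
import uuid


def ephemeral_client_id(prefix: str = "worker") -> str:
    """Return a throwaway client ID for a clean-session worker."""
    # 12 hex characters of a random UUID keep the ID short while making
    # collisions between workers very unlikely.
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


if __name__ == "__main__":
    print(ephemeral_client_id())
    print(ephemeral_client_id("processor"))
```

Because the ID changes on every start, no session state is ever resumed, which is exactly the trade-off described above.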

For more sophisticated use cases where you need messages from the same device to always route to the same worker - perhaps for maintaining local caches or session state - flespi offers sticky shared subscriptions. This advanced pattern lets you specify which part of the topic should be used for routing decisions, ensuring consistent message delivery patterns. You can explore this feature in detail in our sticky shared subscriptions article.
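The idea behind sticky routing can be illustrated with a hash of one topic segment. This is a conceptual sketch only and does not show flespi's subscription syntax or its routing algorithm, which are covered in the dedicated article; the function sticky_worker and its parameters are hypothetical.

```python
import hashlib


def sticky_worker(topic: str, segment_index: int, worker_count: int) -> int:
    """Map a topic to a worker index based on one of its segments."""
    key = topic.split("/")[segment_index]
    # A stable hash is used instead of Python's built-in hash(), which is
    # randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


if __name__ == "__main__":
    # Segment 4 of this topic is the device ID.
    for device_id in ("1001", "1002", "1001"):
        topic = f"flespi/message/gw/devices/{device_id}"
        print(topic, "->", sticky_worker(topic, 4, worker_count=3))
```

Messages with the same value in the chosen segment always map to the same index, which is the property that makes local caches and per-device state workable.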


See also
MQTT bridge is a very useful tool when you need to separate and partially isolate parts of your system, as well as not be fully dependent on a third-party MQTT broker.
How to achieve MQTT bridging functionality in flespi