6 January, 2022

The story of the five nines

What it takes to deliver a high-availability telematics backend service at a reasonable price.

As we move into the year 2022, we can now summarize our stats for 2021. One of the key metrics we care about and consider a reflection of the quality of our service is uptime. 

We are a backend platform. If we fail, you fail as well. We clearly understand this direct cause-and-effect relationship. This is why we do so much behind-the-scenes work that no one notices. And since we do this work proactively, we are not praised for it (when you prevent a catastrophe, you never get praised ;)). The best praise for us is that you trust us with your businesses. And we have to deliver.

In 2021 we delivered five-nines uptime in both datacenters:

  • 99.99903% in the EU region and

  • 99.99934% in the RU region

That’s 303 and 260 seconds of downtime, respectively. This is an impressive achievement for our team since we never stopped developing and improving the platform. In fact, in 2021 we committed 250+ platform updates, which works out to around 5 updates each week. Flespi is stable but by no means static.

How do we ensure High Availability?

It’s all about architecture. Everything else is layered on top of it. From the very beginning, flespi was designed as a high-load system capable of serving millions of simultaneous connections. It has redundancy at the hardware, network, and software levels and relies on a thoroughly thought-out core of microservices.

It’s worth remembering that cloud platforms do not exist in a vacuum: they communicate with the outside world via network uplink providers, and network failure is one of the primary causes of downtime. This is why partnering with reliable and trustworthy network providers matters so much. We are lucky to have such partners in both regions: Zylon in the EU region and RetN in the RU region. It’s their 24/7 responsiveness, preventive maintenance, and bulletproof hardware and routing services that keep network hiccups negligible.

As for platform health monitoring, it has been almost five years since we wrote an article describing the process of downtime detection in detail. The process hasn’t changed much; it has evolved gradually to address imperfections and keep up with the growing scale of the platform infrastructure.

In short, the availability of our servers is checked from multiple locations every minute:

flespi platform health monitoring system

Our complete uptime history since 2017 is openly available on the Status page.

Understanding the value of one nine

If you rely on a dedicated server infrastructure, adding one more “nine” to the current uptime level may increase infrastructure expenses by three to five times. This includes not only the cost of implementing load balancing, traffic management, and other failover mechanisms but also the higher cost of maintaining such a complex infrastructure, which requires more experienced technicians.
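
To make the weight of a single “nine” more tangible, here is a rough sketch that converts an uptime level into a yearly downtime budget. It assumes a 365-day year, and the function name is ours, picked purely for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year, 31,536,000 s


def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an uptime of N nines (e.g. 3 -> 99.9%)."""
    unavailability = 10 ** -nines  # 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        uptime_pct = 100 * (1 - 10 ** -n)
        budget = allowed_downtime_seconds(n)
        print(f"{n} nines ({uptime_pct:.3f}%): {budget:,.0f} s of downtime per year")
```

Each extra nine shrinks the budget tenfold: four nines leave a bit under an hour per year, while five nines leave roughly 315 seconds, or just over five minutes.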

Keep in mind that if your infrastructure includes several systems operating as a chain, the total uptime equals the product of the uptimes of the individual systems (assuming they fail independently). The result can never exceed the lowest individual uptime, and in practice it is lower.
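
The chain effect is easy to check with a few lines of code. The sketch below multiplies the uptimes of systems connected in series; the three values are hypothetical and chosen only for illustration, and the calculation assumes that the systems fail independently of each other.

```python
from math import prod


def chain_uptime(uptimes):
    """Combined uptime of systems in series, assuming independent failures."""
    return prod(uptimes)


# Hypothetical uptimes of three chained systems (fractions, not percentages)
components = [0.9999, 0.999, 0.9995]

if __name__ == "__main__":
    total = chain_uptime(components)
    print(f"weakest link: {min(components):.4%}")
    print(f"whole chain:  {total:.4%}")
```

With these sample values the chain ends up at about 99.84%, which is below the 99.9% of its weakest link.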

Such a noticeable increase in expenses poses a dilemma: will this added “nine” help you earn more money than you have to pay for adding it? In other words, will the additional earnings associated with the higher uptime level cover the expense of a more reliable infrastructure?

With flespi, this dilemma is beside the point since you get access to a High Availability backend platform at a fraction of the cost of a dedicated infrastructure. In addition, flespi is a one-stop system developed and maintained by a single team providing a range of telematics and IoT services. This leads to fewer inter-system transitions, lower data transfer overhead, smoother and more organic communication between microservices, and, as a result, more solid operation (which is exactly what you need, right?).

***

Being dependent is hard. And yes, we consciously wanted to become an indispensable part of your business solution. We wanted to hook you. But we never wanted you to blindly trust us. We have to deserve your trust by delivering a high-availability service every month, year after year. Our achievements are your confidence in the stable operation of your business.

Thanks for being with us. We’ll keep delivering!