3 February, 2020

January 2020 change log

Major flespi improvements in January 2020

This is a very strange winter. For the first time ever, Belarus has no snow on the ground and our lakes are not covered with ice. The temperature in January 2020 is five degrees above the average of previous years, and this is scary. The whole winter entertainment industry (e.g. production of sleds) has been paused, waiting for next year's weather, and this also affects seasonal business a lot. At the same time, summer in the opposite hemisphere is killing people and nature in Australia with fires never seen before. The world's weather is changing. And I do not think it is going in the right direction.

From the flespi perspective, we try to help here a little bit. Our intention is to provide a high-performance, efficient data communication and transformation engine between GPS trackers, IoT devices, smartphones, asset sensors, and business-specific software, taking on all the complexity and all the CPU/RAM/IOps-intensive calculations as part of our service. The contribution to nature comes from cloud-based resources shared between multiple users and, of course, from the platform being implemented in pure C, which is 10-1000 times faster in most operations than today's most popular .NET, Java, or Python frameworks.

[Image: January 2020 in Belarus]

Compared to the global weather changes, flespi hasn't changed much in January. The monthly uptime was 99.97%, and we had only one, but quite major, fault that took 801 seconds to resolve. We implemented a multiple-regions feature in our services but made a configuration mistake. A couple of months ago we also implemented a feature that allows our central configuration service to read its own configuration via HTTP, so, like other services, it updates its configuration via HTTP from this central configuration service once a minute. During the implementation of this feature, the fallback routine for problems with a newly read configuration was implemented with a bug and actually didn’t work.

With this configuration mistake, the newer version of the services was able to read the new configuration and the older one wasn’t. Initially, all our services were running the old version, i.e. unable to read this configuration. After the new configuration was committed, the main configuration service was unable to parse it, and the fallback routine, due to the bug, didn’t help. The main configuration service delivered an incorrect (empty) configuration address to the other services, and all services one by one disconnected from it and stopped pulling the configuration. Once the services were disconnected, the routing system that maps external IP addresses to internal ones also stopped operating. Even our bots, served by the same configuration system, lost their connections, and only one of them was able to send a last-minute SOS notification to our SRE group. Only one, but it drew our attention. In a few minutes, we diagnosed the problem and started manually restarting services so that they would read the correct address of the configuration service and resume normal operation. We did this in four hands, loudly shouting across the room (who takes which server, who publishes a manual NOC message), and after several minutes of intensive keystrokes, the full set of primary services was up and running.
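To illustrate the kind of fallback routine that failed here, below is a minimal sketch of a "keep the last known good configuration" refresh step. It is not flespi's implementation (the platform is written in C and its internals are not described in this post); it is written in Python for brevity, and the names `ConfigHolder`, `validate`, and the `config_service_address` key are hypothetical, chosen only for this example.

```python
import json
from typing import Callable


def validate(cfg: object) -> None:
    """Reject configurations a service could not safely run with.

    The single rule checked here (a non-empty address string) is an
    assumption for illustration; a real service would check much more.
    """
    if not isinstance(cfg, dict):
        raise ValueError("configuration must be a JSON object")
    address = cfg.get("config_service_address")
    if not isinstance(address, str) or not address.strip():
        raise ValueError("configuration service address is missing or empty")


class ConfigHolder:
    """Holds the active configuration and replaces it only with a valid one."""

    def __init__(self, initial: dict):
        validate(initial)
        self.current = initial

    def refresh(self, fetch: Callable[[], str]) -> bool:
        """Fetch, parse, and validate a new configuration.

        Returns True if the new configuration was applied. On any failure
        (fetch error, parse error, validation error) the previous
        configuration stays active and False is returned.
        """
        try:
            candidate = json.loads(fetch())
            validate(candidate)
        except Exception:
            # Fallback: keep serving the last known good configuration.
            return False
        self.current = candidate
        return True


holder = ConfigHolder({"config_service_address": "http://config.example"})
# A configuration with an empty address is rejected; the old one is kept.
applied = holder.refresh(lambda: '{"config_service_address": ""}')
print(applied, holder.current["config_service_address"])
```

The point of the design is that the candidate configuration is fully parsed and validated before it replaces the active one, so a broken update never propagates an empty address to dependent services.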

After this incident, we made at least four fixes to prevent similar cases in the future and gained a lot of experience. Our apologies — we were really very close to 100% monthly uptime.
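The 99.97% uptime figure can be checked against the 801-second outage with a short calculation. This sketch assumes the outage was the only downtime in the month and that uptime is measured over the full 31 days of January; the function name is made up for the example.

```python
SECONDS_IN_JANUARY = 31 * 24 * 60 * 60  # 2,678,400 seconds


def monthly_uptime_percent(downtime_seconds: float,
                           month_seconds: int = SECONDS_IN_JANUARY) -> float:
    """Return uptime as a percentage of the month."""
    return 100.0 * (1 - downtime_seconds / month_seconds)


print(f"{monthly_uptime_percent(801):.2f}%")  # prints 99.97%
```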

  • Our primary focus for the entire January was the launch of the new flespi region in Russia with a local independent datacenter. We planned to finish it in January, but for a variety of reasons we had to shift this deadline and now plan to finalize it in February and even open this region for new flespi user registrations.
  • We have published a nice and interesting example of adapting the MQTT Tiles dashboard to display analytical information, and we will use it to present the case of monitoring trash containers with BLE sensors. We are implementing this use case locally and will soon describe the whole set of hardware and software tools you need for it.
  • Similar to MQTT Tiles, but not for free of course, is the visualization of calculated analytical information in Tableau. This is a ready-to-use (and ready-to-sell) solution that looks very professional. It is easy to configure flespi analytics to calculate any telemetry: geofence control, trips, engine hours, data from the CAN bus, driver binding, with daily and monthly averages. This is quite cheap, and you get a professional GUI where you can visualize this information for your users. Sounds interesting, doesn’t it?
  • We have added remote settings management to the new pvmII engine and are now working on an IDE for it, with a suite for automatic protocol compilation, testing, and other tools our protocol developers need in their daily work. We are 1-2 months away from the OTA firmware upgrade feature for some protocols and from settings management over UDP, which is important for users of some popular protocols like calamp.

In February we plan to finally deliver the new flespi region in Russia and resume delivering new features and protocols for the platform. Our primary focus will be analytics, some HASD extensions and, of course, a lot of protocol development for the telematics hub, which we do on a permanent basis.