19 January, 2018

Top 6 flespi Screw Ups 2017

Mistakes that we owned and lessons that we learned from our experiments throughout the previous year.

With 2017 behind us, it’s time to look back and compare reality with our initial expectations. We will take an unusual route: instead of covering the good things we’ve done, we will focus on what went wrong or did not go the way we expected. Here are our top six missteps in 2017.

1. Struggling with protocol generalization

PVM technology was our third attempt to generalize and simplify communication with trackers. Given our experience of integrating 1000+ different device types from hundreds of manufacturers worldwide in Wialon, the technology we developed should have been very stable. But it was not!

We went through a couple of iterations and put enormous effort into making things standard, from incoming device telemetry to a universal remote device configuration tool. And still, a year later in January 2018, we are creating the next generation of this platform by introducing a "device type" object because a simple "protocol" type is not enough. We underestimated the complexity of the various protocols used by GPS/GSM manufacturers. If we had realized this a year or two ago, we would have thought twice about attempting to generalize them.

We expected to integrate the 50 most popular protocols with the full set of parameters but ended up with fewer than 20, with around 70% of incoming data parsed. I hope we are very close to the moment when the technology allows us to move faster and integrate new protocols and devices easily. We changed focus from the number of protocols to the quality of each integration. The number of devices is huge, but somebody has to do the hard job. “Why not us?” we thought.

2. Misunderstanding user expectations

We judged our users by ourselves and expected them to be more “software engineers” than they are. Our product was created by developers and for developers; that was our expectation. In reality, our users are not always developers. Sometimes they are telematics service providers with little knowledge of how software can interact with other software. We do have software developers using flespi, but we shifted focus from very specific API/REST tools and deep technical articles to GUI-based tools and more marketing-oriented, descriptive articles on our blog. We are adapting our materials and targeting both markets in 2018.

3. Keeping the website up-to-date

Initial development of the flespi.com website with basic product information took us three months. Over the next few months we updated the website with more and more features: in spring we added a protocols page listing the integrated devices, in summer we introduced the platform page with the registry and gateway modules, and in autumn we kept adding small new features. But we never expected flespi development to be so fast. In the first half of 2017, we spent most of our time developing technology. In Q4 2017 and Q1 2018 we started to release products based on this technology at an incredible rate.

A year ago we knew almost nothing about MQTT. Now we are MQTT experts: we have released our own MQTT broker and switched the platform’s internal communication to it. We are creating so many features that it is almost impossible to market them in the right time frame. The best representation of our current state is our blog. However, articles released nine months ago are already outdated, and we need to insert disclaimers saying that things now work a little differently and are usually much simpler.

It is difficult not only to build something but also to show what we have built at the same rate and in an attractive form. We understand that we cannot follow the standard approach with the website and need to develop a concept capable of reflecting our updates to the public almost in real time.

4. Upgrading the server OS

In September we decided to upgrade the Debian operating system on our servers to the new version. We had waited a year after its release, so it should have been safe. Everything was OK until the last stage, when we upgraded our gateways, which are managed via the Pacemaker HA tool. It was an epic sysadmin failure that left a few hours of downtime in our history and a deep mark on our system administration experience.

We finished September with 99.70% uptime, which is lower than what we guarantee in our SLA. Never, never again will we allow any automatic system to influence our traffic!
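To put that percentage in perspective, here is a rough back-of-the-envelope sketch of how a monthly uptime figure translates into downtime. The helper name downtime_minutes is hypothetical, and the calculation simply assumes uptime is measured over the whole 30-day calendar month; it is not how our monitoring actually computes the number.

```python
def downtime_minutes(uptime_percent: float, days_in_month: int) -> float:
    """Minutes of downtime implied by an uptime percentage over a whole month.

    Assumption: uptime is measured against every minute of the calendar month.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# September has 30 days; 99.70% is the uptime figure quoted above.
september = downtime_minutes(99.70, 30)
print(f"{september:.1f} minutes, or about {september / 60:.2f} hours")
```

Under that assumption, 99.70% over 30 days works out to roughly 130 minutes, which matches the "few hours of downtime" mentioned above.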

5. Intentionally overloading the storage system

In December we made another mistake. We knew that our MQTT broker was holding 20 GB of messages in temporary storage for a persistent session, accumulated by a blocked stream. We unblocked this stream, and our storage system was immediately overloaded. Reading and deleting the buffered messages was slow because of the enormous size of the buffer. We could do nothing but sit and wait until the storage system handled the load and completed all queued operations.

The situation resulted in 20+ minutes of downtime, one more deep cut in our experience, and a bunch of new tasks on our to-do list to prevent a similar situation in the future. But if you asked me what I would do next time, take the risk and get the answer or skip the test, I would pick trying and getting the answer! The only way to improve is to know your weaknesses. Otherwise, there’s no motivation to make the effort.

6. Buzzword not paying off

We overestimated IoT, especially its marketing part. In reality, IoT is only emerging, and we in the vehicle telematics world sometimes do even more IoT things than IoT itself. We thought: let's go into the IoT segment and push vehicle telematics manufacturers and system providers there. The reality was different: although we sit in the middle of both market segments, we are still doing more useful stuff for vehicle telematics than for IoT. Now we are back in our natural habitat but with a thought about IoT in the back of our minds.

***

It was a long year. It was a productive year. We finished it with a few commercial customers, a few more projects in the development stage, a few products, and a host of fresh technologies developed. We think 2018 will bring even more. We already have more things to do than the team can bear. But we are already developing more technologies to handle this increasing workload and keep the quality high.

As a part of the Gurtam-wide Team Recharge project, we are spending the week of February 11-17 in Groningen, the Netherlands, where our datacenter and its maintenance team are based. We’ll be thinking about future products and the flespi.com website redesign, getting away from the daily routine, and concentrating on strategic decisions.

Stay tuned for our ups and downs — we have a lot to say!