Ever since the explosion of online software products and consumer demand for them, companies have been learning from their customers long after the sale is made. Software products are no longer static entities; they are dynamic and evolve continuously with their users.
More recently, IoT technologies have made this ability to learn from customers “post sale” possible for hardware products as well. However, unlike software products, hardware in the field presents a host of hard-to-solve problems that we don’t often get to hear about. These problems only grow in number and complexity as more devices are deployed in the real world. One connected product is tough; a fleet of 100,000 connected devices is another challenge altogether.
This year at the first annual PDC, I had the opportunity to learn a lot about the issues and solutions inherent in managing fleets of devices in the field from the CTO of Dwelo, Karen Sun. Karen’s experience was easy to see as she discussed the variety of technical challenges and tradeoffs of deploying over 10M connected devices to the Dwelo platform (recently acquired by Level).
Karen Sun and Dwelo
Before diving into this goldmine of IoT fleet management info, here is a bit of background on Karen Sun and Dwelo.
Prior to Dwelo, Karen Sun was an engineering manager at Quizlet, where she built the data science team and designed backend services. Karen has over a decade of experience working with high-scale distributed systems and doing interesting things with data, like building book retail recommendation engines and computer vision systems for art recognition.
Dwelo is a smart apartment company that has spent about six years developing technology focused on centralizing the working and living environments in smart apartments. Dwelo gives apartment and building owners the ability to remotely control devices (locks, lights, heating and cooling, etc.) at the individual level, along with a centralized management service for all of the independent software systems used to manage residential buildings.
The minute a person moves into a Dwelo-managed apartment, they simply download the Dwelo app and can then remotely connect to and control all of the elements of their home. Lights, doors, gates, blinds, heater, you name it… This modern experience not only gives renters access to the latest in tech for home control, but also saves the building owners time and money in the process.
Apartment and building owners use Dwelo to set schedules for lights, unlock doors, grant access for self tours, manage parking lots, maintain a secure space, and give maintenance staff access to specific areas at specific times.
To accomplish this, Dwelo has designed an ecosystem of connected and discrete technology systems, integrated with an IoT hub that collects all device signals and relays information from sensors to the Dwelo cloud. In the cloud layer, the platform handles data abstraction, analytics, control, and integrations to third party platforms.
Recently merged with Level, Dwelo is working to vertically integrate all levels of apartment security and control and is focused on designing the future of what connected living spaces will look like.
Getting into it
When building such a large ecosystem of connected devices, fleet management turns out to be one of the largest issues to handle. In her talk at PDC, Karen gave insight into the complexities of this issue using the lessons she learned from deploying and managing 100,000+ devices.
She laid out many of the major issues of fleet management by going over the tradeoffs she made while designing Dwelo.
Tradeoff 1: In House vs Managed
One of the first things to decide when designing your fleet management system is whether to build it from the ground up or use one of a growing host of managed solutions. Karen framed her decision around an issue that all fleet owners have to deal with, which she referred to as Intermittent Connectivity Circumstances. These result in:
- Partial downloads – The latest code pushes do not always make it to all the end devices.
- Corruption – Partial downloads can often result in corruption of end devices and hubs.
- Fleet fragmentation – When not all devices are running the same version of code, it is not trivial to manage them cohesively or troubleshoot issues.
- Bandwidth consumption – Devices that are not configured correctly often cause high bandwidth costs and poor performance.
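To make the first two items concrete, here is a minimal Python sketch of one common defense: resume a download from the last byte received rather than starting over, and refuse to apply the image unless its checksum matches. This is my own illustration, not something shown in Karen's talk. The names `download_resumable` and `fetch_range` are hypothetical, and `fetch_range(start, end)` stands in for whatever ranged transfer your update server actually offers.

```python
import hashlib


def download_resumable(fetch_range, total_size, expected_sha256,
                       chunk_size=1024, max_failures=5):
    """Download an image in chunks, resuming after dropped connections.

    fetch_range(start, end) is a hypothetical callable that returns the
    bytes in [start, end) or raises ConnectionError when the link drops.
    """
    buf = bytearray()
    failures = 0
    while len(buf) < total_size:
        start = len(buf)  # resume where the last successful chunk ended
        end = min(start + chunk_size, total_size)
        try:
            chunk = fetch_range(start, end)
            if not chunk:
                raise ConnectionError("empty response")
            buf.extend(chunk)
        except ConnectionError:
            failures += 1
            if failures > max_failures:
                raise  # give up; the caller decides when to try again
    # Never hand a partial or corrupted image to the installer.
    if hashlib.sha256(buf).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch; refusing to apply image")
    return bytes(buf)


# Usage with a simulated link that drops every third request.
image = bytes(range(256)) * 8
calls = {"n": 0}


def flaky_fetch(start, end):
    calls["n"] += 1
    if calls["n"] % 3 == 0:
        raise ConnectionError("link dropped")
    return image[start:end]


firmware = download_resumable(flaky_fetch, len(image),
                              hashlib.sha256(image).hexdigest(),
                              chunk_size=256)
print(len(firmware))  # 2048
```

Resuming from the last good byte also helps with the bandwidth item, since a dropped connection no longer means paying for the whole download again.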
On the one hand, Dwelo’s goal was to own the entire value chain from manufacturing development all the way through deployment and support. On the other hand, a growing number of companies now offer services in this space.
Three years ago, Karen decided to go the managed route, weighing the considerations above. In the end, it came down to not having the resources at the right time and not knowing enough about the problem at the time to abstract away all of the issues they might run into.
Going with Balena helped Dwelo learn from other companies in the market and preemptively solve issues. Additionally, Karen mentioned a few things about Balena specifically that appealed to Dwelo and helped with the decision:
- Balena is built on code that is ostensibly open source, which mitigates the vendor lock-in problem.
- It abstracts some of the most common failure cases, specifically over-the-air updates and intermittent connectivity problems.
- It supports A/B partitioning.
- It is container based, allowing for efficiency and small update sizes.
- It offers a lot of programmatic and visual controls.
All that being said, the biggest drawback is that using Balena adds yet another element to consider while running a large and complex operation. Balena is a third party that can go down on its own (albeit rarely). Overall, Karen has been extremely pleased with the technology and the team at Balena.
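For readers who have not run into A/B partitioning before, the idea is that a device keeps two system slots, writes an update to the one it is not running from, and only commits to it once a health check passes. The Python sketch below is a generic illustration of that idea and not Balena's implementation. The `ABDevice` class, the slot names, and the `health_check` callable are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ABDevice:
    """Toy model of a device with two system slots, A and B."""
    slots: dict = field(default_factory=lambda: {"A": "1.0.0", "B": None})
    active: str = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def running_version(self) -> str:
        return self.slots[self.active]

    def update(self, version: str, health_check) -> bool:
        """Install to the inactive slot, trial-boot it, roll back on failure."""
        target = self.inactive()
        self.slots[target] = version  # the running slot is never overwritten
        previous = self.active
        self.active = target  # "reboot" into the new slot
        if not health_check(version):
            self.active = previous  # fall back to the known-good slot
            return False
        return True


device = ABDevice()
device.update("2.0.0", health_check=lambda v: True)
print(device.active, device.running_version())  # B 2.0.0
device.update("3.0.0", health_check=lambda v: False)
print(device.active, device.running_version())  # B 2.0.0
```

Because the running slot is left untouched until the new one proves itself, a bad or interrupted update leaves the device on the previous version instead of bricking it.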
Tradeoff 2: Firmware Versioning
The second major tradeoff that Karen zoomed in on was how flexible the firmware within the end devices and hubs should be. How smart should the gateways be? What are the risks of continuously updating firmware versus keeping it relatively fixed?
Before diving in, there are a number of hard problems to take into account that come from working with hardware rather than software. Memory management becomes critical; otherwise, costs can go up and performance will drop. Deprecation and backwards compatibility need to be taken into account for different types of hardware in different environments. Bandwidth allocation and telemetry can cause problems, as you don’t want to be dealing with too much data.
When weighing the choices, there were three elements Karen took into account:
- Fragmentation is inevitable; there will always be elements that break. Teams need to be trained to deal with all elements of the product.
- Black Swan events should be planned for. What happens when a device is breached from a security standpoint? What happens when an entire service is under pressure?
- When doing fixed deployment, the standards for deployment are much higher. It raises the bar and is good for companies to have this mindset but can be risky if you miss key elements or if things change fast.
In the end, Karen was swayed toward a continuous deployment strategy for firmware provisioning: new types of devices are added to the system every month, and for Dwelo the security advantages of OTA updates outweighed the benefits of fixed firmware.
Tradeoff 3: IoT Orchestration vs Edge
The final tradeoff Karen urged fleet owners to consider is whether to manage the fleet centrally or on premise (at the edge). Where should your logic live? Is it all centralized, on the edge, or in some middle ground?
Karen compared this to a split-brain problem: where does the brain actually live?
Dwelo had the most flexibility here. However, there are a lot of independent variables to consider when deciding whether to have everything pass through a centralized cloud brain or to have some decisions computed at the edge. For instance:
- What version do devices have? What version should they have?
- Can you write your logic in one place or do you need to distribute it?
- How do you know what state you want your devices in?
- What if you have race conditions?
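One common way to reason about these questions is to keep, for each device, both the state you want and the state the device last reported, and to number every change so that an out-of-order command cannot overwrite a newer one. The Python sketch below assumes that pattern purely for illustration. I don't know how Dwelo implements this, and the `DeviceState` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Desired vs. reported state for one device (hypothetical model)."""
    desired: dict
    reported: dict
    revision: int = 0  # revision of the last accepted desired-state change

    def set_desired(self, changes: dict, revision: int) -> bool:
        """Accept a change only if it is newer than the last one applied."""
        if revision <= self.revision:
            return False  # stale or duplicate write, e.g. a delayed command
        self.desired.update(changes)
        self.revision = revision
        return True

    def pending(self) -> dict:
        """Settings where the device has not yet caught up with the plan."""
        return {k: v for k, v in self.desired.items()
                if self.reported.get(k) != v}


lock = DeviceState(desired={"firmware": "1.2.0"},
                   reported={"firmware": "1.1.0", "locked": True})
lock.set_desired({"locked": False}, revision=2)
lock.set_desired({"locked": True}, revision=1)  # arrives late, is ignored
print(lock.pending())  # {'firmware': '1.2.0', 'locked': False}
```

Whether this record lives in the cloud, on the hub, or in both places is exactly the tradeoff in question. The revision check only guards against the simplest kind of race, two writers whose commands arrive out of order.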
Karen found that it was not imperative to draw a line in the sand and that there are times when both strategies make logical sense. One of the use cases that best depicts this optimal blended approach is that of device scheduling.
Let’s say you have a thermostat that is meant to set itself to 70 degrees every day at 5pm. However, suppose the hub gateway is offline for an hour or gets unplugged. Without any edge computing or logic, the thermostat would not operate as intended.
If some of the logic lives on the edge, the device can operate as intended even if it loses its connection to the internet.
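Here is a minimal Python sketch of the thermostat example, assuming the edge device (the hub, or the thermostat itself) keeps a local copy of the last schedule it synced from the cloud and checks it against its own clock. The `EdgeScheduler` and `ScheduledAction` names are hypothetical and this is not Dwelo's code.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ScheduledAction:
    device_id: str
    at: time  # local time of day at which the action should fire
    setpoint_f: int


class EdgeScheduler:
    """Runs a cached schedule from the local clock, with no cloud involved."""

    def __init__(self):
        self.schedule = []  # last schedule synced from the cloud
        self.last_run = {}  # (device_id, at) -> date the action last fired

    def sync(self, actions):
        """Called whenever the cloud is reachable; keeps a local copy."""
        self.schedule = list(actions)

    def due(self, now: datetime):
        """Return actions that should fire now, at most once per day each."""
        fired = []
        for action in self.schedule:
            key = (action.device_id, action.at)
            if now.time() >= action.at and self.last_run.get(key) != now.date():
                self.last_run[key] = now.date()
                fired.append(action)
        return fired


scheduler = EdgeScheduler()
scheduler.sync([ScheduledAction("thermostat-1", time(17, 0), 70)])
# The internet connection drops here; the cached schedule keeps working.
print(scheduler.due(datetime(2024, 1, 1, 16, 59)))  # []
print([a.setpoint_f for a in scheduler.due(datetime(2024, 1, 1, 17, 0))])  # [70]
print(scheduler.due(datetime(2024, 1, 1, 17, 5)))  # []
```

One design choice to notice: if the device is powered off at 5pm and comes back at 5:30pm, this sketch fires the action late rather than skipping it. Whether that is the common-sense behavior depends on the device, and it is the kind of decision worth telling the user about.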
Dwelo’s mantra is to always find a way to implement actions using common sense and make sure to tell the user what exactly is happening.
With Karen in the lead, Dwelo has made some amazing things happen in the smart apartment space and is helping pave the way for other aspiring fleet developers.
I am grateful for the information Karen shared at PDC from the hard lessons she learned growing Dwelo.
Finally, as Dwelo is growing rapidly post-merger with Level, Karen asked me to mention that she is hiring! If you are interested in working with Dwelo and the amazing team there, please check out some of their open positions on the site here (link to ioterra jobs board).