Security considerations for OTA software updates for IoT gateway devices

Smart devices are increasingly used in a wide variety of applications, from smart home systems to industrial controls. The Internet of Things market is expected to grow to over $1.5 trillion by 2025. More and more devices are including network-accessible functionality, but sadly many of them do not take appropriate security measures. Malicious actors can infiltrate poorly secured devices and collect them into what is called a “botnet.” The effects of these botnets can be dramatic, sometimes resulting in outages on a global scale. Ransomware attacks are becoming commonplace. Even connected devices that seem to be harmless, such as fish tanks, can be used as vectors for costly attacks.

To respond to threats, designers must provide a way for remotely deployed systems to receive software updates; if implemented poorly, the update mechanism itself becomes another vector for attacks. In this post, we will discuss some security and privacy considerations for connected devices and dig into specifics of the over-the-air (OTA) software update framework in an enterprise setting. We start by reviewing some general security principles that should be a part of any system design. We will then examine considerations for the update framework itself, including design decisions that should be made early to avoid retrofitting the system later on. We will then detail the security needs of the target device operating system. Finally, we will present some thoughts on how to securely integrate an update system into an application development workflow.

Security is a process and a mindset. There is no magic switch we can toggle to make a system secure. It is important to stay vigilant, review existing security flaws, and adapt your workflow to account for them. New classes of attacks appear seemingly every day, and engineering teams must prepare for this in order to remain secure. The white hats have to get it right every time while the black hats only need to get it right once. You need to identify which resources are worth protecting. A database of weather readings is unlikely to contain proprietary information, whereas a customer database most certainly does. You will want to tailor the security to match the severity of a breach. The objective of most security measures is to increase the cost of an attack or reduce the value of any successful breach.

It is important to realize that the OTA update system is generally only concerned with potential attacks on, and vulnerabilities in, the update process itself. It does not provide any protection against attacks that happen outside of the update process. For these kinds of attacks, you need to rely on other components provided by your operating system.

One extremely important general security consideration is the principle of least privilege. This states that components of your system should have the minimum access necessary to achieve their required functions. This helps to limit damage that can be done when a particular component is breached.

Other things to consider when designing your system, though not directly related to OTA updates, are:

Access control systems (such as SELinux or SMACK)
Secure boot
Read-only storage
Security hardware

Besides the general security principles that any networked device should follow, OTA update frameworks have some particular concerns. These concerns affect three parts of your update process:

server-side, where updates are stored and deployed
client-side, the device receiving the update and the software running on the device that implements OTA updates
tooling, the certificates, keys, and other tools that ensure the process is secure

One of the first decisions when designing the server side of your update framework is where you will host the system. It is possible to host it on servers that you own, but you need to consider their physical security. Who should have access? How is that access controlled and, more importantly, revoked? What other security is in place? Are there appropriate security cameras and other measures? We’ve all seen movies or TV shows where characters put on a uniform and carry a clipboard (or more likely these days, a tablet) and simply walk by the front security desk in a building. Physical access to these systems can easily bypass many of the other security mechanisms we have in place.

Due to the complexity of managing your own physical servers, more and more designs are using cloud providers to outsource those issues. Vendors such as Amazon, Google, and Microsoft provide robust systems with complete remote management that should be considered for your server setup unless you already have expertise and physical infrastructure in place. Building your server components using container systems such as Docker gives you great flexibility in the choice of cloud vendor and allows you to migrate between vendors should it be needed. The biggest advantage of using these providers is that they handle all the physical security needs, allowing you to focus on your applications. You are normally still responsible for the digital security of your cloud computing nodes, but the providers generally supply sane defaults, which can speed your development significantly. Special care should be taken with the assets that are stored in the cloud. There have been many data leaks due to improper permissions on cloud-stored data. These issues are not problems with cloud storage itself but rather with specific poor implementation choices.

Any update framework will require communication between the client devices and the server. To ensure privacy of data, this should be handled over an encrypted connection. Protocols such as HTTPS and MQTT are often used here, and each has its own specific requirements. Both HTTP and MQTT connections can be encrypted with TLS. It is important to use TLS certificates properly signed and attested to by a known certificate authority (CA). This allows client devices to verify the identity of the server without needing server-specific verification keys on the device. It also means that the server keys can be updated without modification to the client, because the client trusts the CA that signs the new certificate rather than any individual server key.
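As a minimal sketch of the client side of such a connection, assuming a Python-based update client, the standard library's default TLS context already verifies the server certificate against the system's trusted CAs and checks the hostname. The function names are our own, and the host you pass in would be your own update server.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS context that verifies the server against the system CA store."""
    context = ssl.create_default_context()  # enables CERT_REQUIRED and hostname checks
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    return context


def open_update_channel(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect to the update server; raises ssl.SSLError if the certificate is not trusted."""
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return make_client_context().wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the socket when the handshake fails
        raise
```

Because the context trusts the CA rather than one specific certificate, the server certificate can be renewed without touching this client code.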

User management in the server is another area of concern that should be addressed early in your design. Login credentials stored by your server should be properly salted and hashed. This provides good protection even when users reuse passwords across sites. Assign roles to individual users (i.e., role-based access control, or RBAC) instead of individual permissions to limit access to resources in a predictable and auditable way. Your system should also be able to revoke specific permissions or remove a user's access entirely, to minimize the potential damage that a compromised account can do to a large device fleet.
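Stored credentials are usually protected with a salted, deliberately slow one-way hash rather than anything reversible. The sketch below uses PBKDF2 from the Python standard library; the iteration count and function names are assumptions for illustration, and a dedicated password-hashing library may be a better fit for production.

```python
import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune for your server hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext password is never stored."""
    salt = secrets.token_bytes(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)
```

The per-user salt means that two users with the same password end up with different stored digests, which is what limits the damage of password reuse.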

Update frameworks will need to accept unsolicited connections from devices deployed to the field. It is common for devices to be packaged and on a shelf for a long time before they are powered on and connected for the first time. Identifying valid devices and, more importantly, ignoring invalid connection attempts are two of the most important features of the server in an update framework. Each device should have a unique identifier, such as a serial number, that is paired with a cryptographically secure key. Following industry best practices for key lengths and algorithm selection will help protect against brute-force attacks. The development staff must also monitor security reports for new weaknesses that may be discovered in particular algorithms so that they can adjust accordingly.
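One way to pair a serial number with a key is a challenge-response check, sketched below. This is an illustration only: it assumes a symmetric per-device secret and an in-memory registry (`DEVICE_KEYS`), both of which are hypothetical; a real deployment might instead use per-device certificates or asymmetric keys.

```python
import hashlib
import hmac
import secrets

# Hypothetical registry filled at manufacturing time: serial number -> secret key.
DEVICE_KEYS: dict[str, bytes] = {}


def provision_device(serial: str) -> bytes:
    """Create a random 256-bit key for a device and record it on the server side."""
    key = secrets.token_bytes(32)
    DEVICE_KEYS[serial] = key
    return key


def new_challenge() -> bytes:
    """Server side: a fresh random challenge prevents replay of old responses."""
    return secrets.token_bytes(32)


def device_response(key: bytes, serial: str, challenge: bytes) -> bytes:
    """Device side: prove possession of the key without sending it."""
    return hmac.new(key, serial.encode("utf-8") + challenge, hashlib.sha256).digest()


def server_accepts(serial: str, challenge: bytes, response: bytes) -> bool:
    """Server side: unknown serial numbers and wrong keys are both rejected."""
    key = DEVICE_KEYS.get(serial)
    if key is None:
        return False
    expected = hmac.new(key, serial.encode("utf-8") + challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

A device that has sat on a shelf for months can still authenticate on first contact, because the server only needs the serial number and key recorded at provisioning time.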

All non-trivial software has bugs, and your update framework will require continual updates through the life of your project. With a well-maintained package, updates will be planned for and easy to apply. This also allows you to add functionality which may provide greater value for your use case. Your update framework will probably have dependencies on other packages, such as libraries provided by third parties. The update framework engineering team must monitor the development of any dependencies and ensure that bug fixes for these components are quickly incorporated into the product.

Finally, consider what logging and reporting are available from your update framework. This is crucial for security issues, as well as for the smooth operation of your system. How easy is it to get the reports you need without getting overwhelmed by large amounts of extraneous information? Can you get automatic reporting? Are there priorities and escalations for security issues and other operational concerns?

When deciding on a logging architecture, take special care that information contained in the logs does not cause further problems. Logs may contain sensitive information and should be encrypted if kept on the device, or completely removed after being transmitted off the device.
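The "remove after transmission" option can be sketched as follows. The `upload` callable is a hypothetical stand-in for whatever transport your framework uses; the point of the sketch is only that the local copy is deleted after, and only after, a confirmed upload.

```python
import os
import tempfile
from typing import Callable


def ship_and_remove(log_path: str, upload: Callable[[bytes], bool]) -> bool:
    """Upload a log file and delete the local copy only if the upload succeeded."""
    with open(log_path, "rb") as handle:
        payload = handle.read()
    if not upload(payload):
        return False  # keep the file so the upload can be retried later
    os.remove(log_path)
    return True


# Usage with a stand-in uploader that always reports success.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"update 1.2.3 installed\n")
removed = ship_and_remove(tmp.name, upload=lambda payload: True)
```

Note that `os.remove` only unlinks the file; on flash storage the underlying data may remain recoverable, so encrypting logs at rest is still worth considering.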

As discussed above, the communication channel between devices and the server should be encrypted. Industry-standard protocols such as TLS should be used along with CA-signed certificates on your server. This ensures that the device can validate that it is communicating with the expected server and guards against man-in-the-middle attacks. A further level of security can be attained using a mutual-TLS system, which allows the server to verify the devices using a similar mechanism.
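On the server side, mutual TLS comes down to requiring a certificate from every connecting device. The sketch below shows this with Python's standard `ssl` module. The function name and its file-path parameters are assumptions for illustration, and the paths are optional here only so the context can be built without real certificate files.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    server_cert: Optional[str] = None,
    server_key: Optional[str] = None,
    device_ca: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that rejects clients without a valid certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.verify_mode = ssl.CERT_REQUIRED  # the handshake fails without a client cert
    if server_cert is not None:
        context.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if device_ca is not None:
        # Only certificates issued by this (hypothetical) device CA are accepted.
        context.load_verify_locations(cafile=device_ca)
    return context
```

With this in place, a device that cannot present a certificate signed by the device CA never gets past the handshake, so it cannot reach the update API at all.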

Given that client devices are often deployed into hostile networking environments where device vendors have little control over physical security, hardware-based key storage should be strongly considered. Such hardware protects the keys better than simply storing them as files in the target filesystem. If bad actors can physically access the devices, then the filesystems are generally easily accessible. Using hardware keys provides another level of security to help ensure that the keys are not compromised.

Update payloads should use cryptographic signatures. The strongest approach is public-key cryptography, which allows you to verify not only that the image being downloaded is unmodified but also that it is authentic. Client software can validate the payload independently of the communication channel verification, which decouples your payload checks from the infrastructure. Effectively, this ensures that even if your server is breached and counterfeit payloads can be downloaded, they will never be installed because the signature verification will fail, provided the private signing key is not stored on that server. Industry-standard public/private key mechanisms should be supported.
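A minimal sketch of payload signing and verification is shown below. It assumes the third-party `cryptography` package is installed and uses Ed25519 purely as an example algorithm; your framework may use a different scheme, and the helper names are our own.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def sign_payload(private_key: Ed25519PrivateKey, payload: bytes) -> bytes:
    """Build side: sign the update image with the private key."""
    return private_key.sign(payload)


def payload_is_authentic(
    public_key: Ed25519PublicKey, payload: bytes, signature: bytes
) -> bool:
    """Device side: accept the image only if the signature matches the public key."""
    try:
        public_key.verify(signature, payload)
    except InvalidSignature:
        return False
    return True


# Usage: the private key stays with the build system; devices hold only the public key.
signing_key = Ed25519PrivateKey.generate()
image = b"firmware image bytes"
signature = sign_payload(signing_key, image)
```

The device never needs the private key, so a compromised device or download server cannot be used to forge an installable image.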

As for security implications of the update process itself, it is vital that the system support automatic rollbacks. This ensures that issues that occur during the upgrade process, such as loss of power or loss of connectivity, will not result in a bricked device fleet. After every deployment, the update framework must check that it is still able to communicate with the server to ensure that future updates can be installed. The framework should also provide a mechanism to allow system designers to create their own post-installation sanity checks, focusing on their unique application and tailored to check the features that are important to verify before considering an update completed.
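The commit-or-rollback decision after an installation can be sketched as below. The `commit` and `rollback` callables are hypothetical hooks into your framework. Surviving a power loss in the middle of an installation typically depends on the bootloader and partition layout, which this sketch does not cover.

```python
from typing import Callable, Iterable


def finalize_update(
    checks: Iterable[Callable[[], bool]],
    commit: Callable[[], None],
    rollback: Callable[[], None],
) -> bool:
    """Commit the new image only if every post-install check passes; otherwise roll back."""
    for check in checks:
        try:
            passed = check()
        except Exception:
            passed = False  # a crashing check counts as a failure
        if not passed:
            rollback()
            return False
    commit()
    return True
```

A typical check list would start with "can I still reach the update server?" and then add application-specific checks supplied by the system designer.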

Any system that uses digital certificates and cryptographic keys needs a mechanism to replace these files over time. Whether this is done periodically as a precaution or in response to a specific breach, it is best practice for your update framework to support it. Usually this will require a two-step update. The first update will use the old keys, but add in the new keys and re-authenticate with the framework using them. The second update will remove the old keys.
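The two-step rotation can be modelled with a simple trust store, as in the sketch below. Key identifiers stand in for real public keys and `apply_update` stands in for a signed update; both are hypothetical simplifications meant only to show the ordering of the steps.

```python
from typing import Iterable


def apply_update(
    trusted: set[str],
    signer: str,
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> set[str]:
    """Apply a key-store change, accepting it only if signed by a currently trusted key."""
    if signer not in trusted:
        raise PermissionError(f"update signed by untrusted key {signer!r}")
    updated = (set(trusted) | set(add)) - set(remove)
    if not updated:
        raise ValueError("refusing to leave the device with no trusted keys")
    return updated


# Step 1: signed with the old key, adds the new key alongside it.
trust = apply_update({"key-old"}, signer="key-old", add=["key-new"])
# Step 2: signed with the new key, removes the old key.
trust = apply_update(trust, signer="key-new", remove=["key-old"])
```

Keeping both keys trusted between the two steps means a device that misses the first update can still be recovered before the old key is retired.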

Depending on the security needs of your application, and particularly your team's risk tolerance for protecting your private keys, you may want to consider using air-gapped systems for applying digital signatures to your system images. These are systems that are not connected to any networks but that have the tools necessary to sign the images. The private keys never need to leave this system, so the risk of leaking them is very low. This does require a mechanism such as external hard drives to transport the images to and from the signing system, so the risk is not completely eliminated, but it is greatly reduced.

To successfully deploy a design with an automatic OTA update framework, there are several things you need to consider. The level of formality in your processes and design will depend on a number of factors including the size of your device fleet, the cost to rescue bricked devices, and the target audience. If your users are more of the hands-on variety (e.g., makers), then a more manual system may be appropriate. However, if you plan to roll out a fleet of thousands or tens of thousands of devices, you will want as much automation as possible; relying on end users to run convoluted firmware update steps is a recipe for out-of-date devices.

Modern software development relies heavily on build automation tools. Builds are run automatically on every commit or, at the least, on a scheduled basis. The systems that manage these builds, known as continuous integration/continuous delivery (CI/CD) systems, have tools to help automate the deployment of new builds based on the phase of development and can be readily tailored to your development workflow. For example, your nightly “development” builds can be automatically uploaded to the OTA server and deployed to a targeted group of devices. This ensures that your QA team is ready to go each morning with the latest images for testing. Similarly, your “production” builds can be uploaded and staged for automatic deployment to your entire device fleet, easily keeping your devices in the field updated.

When designing the software for a connected embedded device (such as an IoT device), you generally have a golden-master image containing the bits of software that are common across all devices. This image is likely programmed into the device in manufacturing, but programming can also be done as a post-manufacturing step if a manufacturing-specific image is used for burn-in diagnostics. On top of the golden-master bits, your system will likely need some unique details for each board. This can consist of security certificates, serial numbers, or any other data that will vary from one device to another. Many mechanisms can be used to populate the unique data on your devices, but you need to plan for this. This data is generally needed elsewhere in your system. For instance, device identification certificates may need to be loaded into your server software so that your server will recognize and automatically admit devices when they connect for the first time. Proper handling of this device-specific data is critical to maintaining the security of your system as a whole.

Using a properly designed OTA update framework should not introduce significant security issues into your design. The security considerations discussed above should minimize the impact of adding this functionality. However, note that adding software always increases your attack surface, so it is crucial that you monitor the development of the update framework to ensure that you are including changes to that system as well as your own software. All software has bugs, but if you include a well-maintained OTA-capable update framework in your design, then you will have the capability to respond to issues without having to resort to costly recalls or manual updates by end users.

We have discussed many considerations for the OTA software update framework and general system security. We hope that it is clear that security is an ongoing concern, requiring vigilance and the ability to adapt to ever-changing threats. You cannot be completely secure, and as such, consider how you will respond when breached. Customers reward transparency and punish secrecy, so reveal as much as you can about a breach without reducing your ability to respond to future threats.

Updating software is generally the first step in addressing security concerns, and the update system itself needs a security mindset. Using industry best practices, such as public/private key cryptography, is a minimum requirement. Whether rolling your own or using a third-party update framework, make sure to use well-maintained software that is updated regularly to address newly found issues and threats. If the open-source system you are using has had no commits for several years, we strongly recommend you continue looking for something with an active development community.

As a final note, please do not delay discussions of security until late in your design phase. This needs to be a part of your thinking from the very beginning of your project. Not all members of your team will be security experts but it is important to have a security mindset in your organization. The costs to your organization can be severe if your product is breached, especially if the public considers you to be negligent.

Drew Moseley is the Chief Technical Architect at Mender.io. If you want to learn more about the best approaches to over-the-air software updates for IoT gateway devices, visit Mender.io and try the OTA software updater for free.
