PowerFlex, a subsidiary of EDF Renewables, offers solar, energy storage, and smart EV charging solutions to commercial and workplace customers such as Los Angeles International Airport. Their products manage power delivery through real-time monitoring and a suite of cloud services, benefiting drivers, building owners, and utilities. By forecasting EV load, weather, and power status alongside historical load data, they coordinate power delivery so that building owners can charge more vehicles with less infrastructure at lower cost, utilities avoid peak usage spikes, and drivers save money.
To do this, they deploy an edge architecture that includes smart meters, batteries, microgrid controllers, and other electrical system components, plus a management stack with cellular and Wi-Fi networking, compute, power, a Zigbee controller, and more.
PowerFlex initially launched with an MVP that was highly customized at each customer site. As they scaled, they knew they needed an automated deployment pipeline and an edge OS that offered flexibility, standardization, support, and troubleshooting tools.
They selected Kubernetes at the edge because they already used GKE for cloud services, were familiar with Kubernetes, and knew it offered the resiliency, automation, security, and monitoring they needed to scale.
For the first iteration of their Kubernetes edge platform, they chose k3OS as the operating system. It worked, but presented a variety of roadblocks as they scaled: they felt they were adapting their application to the technology, rather than the technology supporting their needs. For example, k3OS lacked the custom driver support their industrial computers required; this prevented them from using certain peripherals and, more importantly, led to CPUs overheating, and thus constant, expensive truck rolls to replace equipment. Another issue was that k3OS provided no reliable way to remotely reboot a node or change its configuration post-deployment, which meant still more truck rolls to remote locations.
They realized they needed a better architecture for deploying Kubernetes at the edge.
After extensive research and evaluation, they chose Talos Linux as their edge OS, in combination with Omni, the SaaS for Kubernetes, for simple remote management of edge clusters.
The reasons for selecting Talos Linux were:
- Talos is small and fast, leaving most resources for the workloads; this is important on industrial computers, which are often CPU-constrained.
- It’s hardened, and has an immutable filesystem, which is ideal when nodes are running in environments that are not your typical secure datacenter.
- It's managed via an authenticated API and configured with a declarative machine configuration file (a sketch follows this list).
- It stays current with the latest stable Kubernetes and Linux kernel releases, keeping it secure.
- It's consistent across edge, cloud, datacenter, and laptop. All environments are the same, so the testing and development lifecycle is easier and more meaningful.
- Talos Linux includes KubeSpan, which encrypts traffic between nodes in disparate locations.
- Talos supports a wider range of hardware, both in driver support and in its ability to run on small compute platforms as well as datacenter servers.
- Talos allowed them to support custom drivers and complex networking configurations that were not possible with k3OS.
- Talos's flexibility, operational model, and security allowed them to build generic images that can be patched with customer specifics in a simple, secure workflow.
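As an illustration of the declarative approach, here is an abridged sketch of a Talos machine configuration. The hostname, endpoint, and versions are hypothetical; a real file is generated with `talosctl gen config` and includes cluster secrets omitted here:

```yaml
# Abridged sketch of a Talos Linux machine configuration.
# Real files are generated by `talosctl gen config` and carry
# cluster secrets and certificates, omitted here.
version: v1alpha1
machine:
  type: controlplane            # or "worker"
  network:
    hostname: edge-cp-1         # hypothetical hostname
    kubespan:
      enabled: true             # encrypt traffic between nodes across sites
  install:
    disk: /dev/sda              # target disk for the immutable install
    image: ghcr.io/siderolabs/installer:v1.7.0
cluster:
  clusterName: edge-site
  controlPlane:
    endpoint: https://edge-site.example.com:6443
```

On first boot, the file is pushed to the node with `talosctl apply-config --insecure --nodes <node-ip> --file controlplane.yaml`; every subsequent change goes through the same authenticated API.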
Omni added further efficiencies for the SRE team managing these Talos Linux edge deployments:
- Omni provides a single pane of glass for all clusters: it's easy to control access to every cluster, troubleshoot, and handle operational tasks like upgrades, all from one place, via UI or API.
- Omni allows the team to provision production Kubernetes clusters without deep Kubernetes experience. Adding a new machine is as simple as booting it and adding it with a click in the UI.
- All cluster operations, such as creation and updating, can be completely automated with cluster templates (see the sketch after this list).
- Omni makes Kubernetes upgrades simple and safe.
- Omni integrates smart authentication and controls the level of access: no more passing around admin-level kubeconfigs. Authentication is via SAML, Google, or GitHub accounts, and access can be revoked simply by deleting the user in Omni or in the enterprise SAML directory.
- Omni handles access to the control planes. Instead of requiring cloud load balancers, health checks, and other components that don't make sense at the edge, Omni creates and hosts the load-balanced endpoint for every cluster. So if an edge site is lost, simply shipping a new box and adding it to the existing cluster is all that is required to be back up and running.
- Omni is firewall friendly and can manage nodes even with no inbound connectivity.
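As a sketch of that template-driven automation (the cluster name, versions, and machine IDs below are hypothetical), an Omni cluster template is a small multi-document YAML file:

```yaml
# Hypothetical Omni cluster template for one edge site.
kind: Cluster
name: edge-site-042
kubernetes:
  version: v1.29.3
talos:
  version: v1.7.0
---
kind: ControlPlane
machines:
  - 0f9a4c2e-1111-2222-3333-444455556666   # machine UUID as registered in Omni
---
kind: Workers
machines:
  - 7b3d8e1a-aaaa-bbbb-cccc-ddddeeeeffff
```

Running `omnictl cluster template sync --file cluster-template.yaml` creates the cluster, or reconciles an existing one to the desired state, so cluster creation and upgrades become a repeatable, scriptable step.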
PowerFlex has now fully rolled out their Talos Linux and Omni Kubernetes platform to hundreds of sites in the field, which will enable them to scale efficiently to thousands of clusters:
- Omni is a game changer, enabling seamless deployments, centralized cluster management, and remote support tools.
- Prior to Omni, their workflow involved generating a customer-specific ISO that expired quickly due to an embedded short-lived API token. Techs spent a lot of time prepping machines, which had to be done behind the same router used at the field site. Building the image also required production Vault secrets, which was a significant security risk. The entire process wasn't scalable.
- With Omni and Talos, automated tooling generates customer-specific configuration patches (see the sketch below). Technicians can now grab a machine off the shelf, boot Talos, and create a cluster in Omni; the customer-specific patches are applied automatically. Easy.
- Omni has effectively removed the need for any truck rolls for operating-system-related reasons.
- Because Talos Linux can run on smaller (read: cheaper!) nodes than k3OS, it is now cost-feasible to deploy multi-node clusters at locations that had previously run single-node clusters, further improving reliability.
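To illustrate the patch workflow described above (the site name and addresses here are hypothetical), a customer-specific patch is just a small YAML fragment merged over the generic machine configuration:

```yaml
# Hypothetical site-specific patch, merged over the generic Talos config.
machine:
  network:
    hostname: customer-lax-node-1
    interfaces:
      - interface: eth0
        dhcp: false
        addresses:
          - 10.42.17.5/24       # static address for this site
        routes:
          - network: 0.0.0.0/0
            gateway: 10.42.17.1
```

Such a patch can be attached to a cluster or machine in Omni so it is applied automatically when the machine joins, or applied directly to a running node with `talosctl patch machineconfig --patch @site-patch.yaml`.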