A guide to immutable operating systems for Kubernetes

Originally published at The New Stack.

So you are on board with Kubernetes (or thinking about exploring some Kubernetes deployments). There are lots of good reasons for this, which you are probably well aware of – Kubernetes takes care of container management, schedules workloads onto a cluster, handles scaling and redundancy, and automates rollouts and rollbacks. Kubernetes is an infrastructure-neutral system: you write declarative statements describing the state your systems and applications should be in, and it drives the managed elements to that desired state. The result is a powerful, extensible system that is easier to manage. Of course, this “ease of management” has a learning curve, but it’s well worth it to get the benefits of modern container-based software development on infrastructure that delivers scalability and portability. (You can read about how these features translate into a business case for Kubernetes elsewhere on our blog.)
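
As a quick illustration of that declarative model, the sketch below describes a desired state – three replicas of a web server – and applies it; Kubernetes then converges the cluster to that state and keeps it there. The names and image are purely illustrative.

```bash
# A minimal sketch of the declarative model: the manifest describes the desired
# state (three replicas of an illustrative nginx container) and Kubernetes
# continuously drives the cluster toward it.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
EOF
```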

While Kubernetes does enable operational scalability and management for containers, it doesn’t directly help you manage the infrastructure that Kubernetes itself depends on. Kubernetes is itself an application (or set of applications), and those applications have to run somewhere. Despite what you may have heard, Kubernetes is not an operating system; it still depends on Linux (or Windows) being installed on the nodes. Kubernetes can run on cloud providers like AWS or GCE, on virtualization platforms like VMware, on laptops with tools like Docker, or on bare metal server hardware with tools like Sidero – but all of these still require an operating system to be installed first. (Some, like AWS EKS, remove the need to manage the control plane nodes, but still require you to set up Linux servers for the worker nodes.)

Operationally, the focus is on Kubernetes and the workloads it runs – as it should be! – but this leads to an issue commonly seen in Kubernetes deployments. While Kubernetes itself may be regularly patched and upgraded (although it is often not, and is left in a “set it and forget it”, security-risky state), the maintenance, updating, securing and operation of the underlying operating systems is often forgotten or neglected – at least until it’s time for a security audit! I’ve frequently heard SREs and systems administrators say that having to manage Linux as well as Kubernetes feels like having an extra job. Kubernetes needs patching, updates, securing, control of user access, and so on – just like a generic Linux OS does. But just because those tasks are being done at the Kubernetes level does not mean they can be ignored at the OS level. However, selecting the right underlying operating system distribution can go a long way toward reducing the workload of maintaining the OS, and toward mitigating the effects of not keeping current.

So, given that you need to install Linux before you can run Kubernetes, and that the choice of underlying OS has real operational consequences – which is the best Linux for Kubernetes? There are a variety of options to select from, but they generally fall into two types: container-specific (or container-optimized) OSs, and general purpose OSs.

General Purpose Linux Operating Systems

These are the “normal” kinds of Linux.

Most people will be familiar with running a general purpose Linux operating system, such as Ubuntu, Debian, CentOS, Red Hat Enterprise Linux (RHEL), or Fedora. That is one of the main advantages of running a general purpose OS under your Kubernetes cluster – your systems administrators will be familiar with how to install, update and secure such Linux distributions. Existing toolsets for kickstarting servers, installing the OS, and configuring it to a base level of security can be used. Existing patch management and security detection tools should run fine on these systems, even if running Kubernetes on top of them.

However….

With a general purpose Linux system comes… general purpose Linux administration overhead. This means that user account management, patch management, kernel updates, firewalling of services, securing of SSH, disabling of root logins, disabling unused daemons, kernel tuning and so on all need to be done and kept up to date. As noted, many of these tasks can be done with existing tools (Ansible, Chef, Puppet, etc.) that may already be managing other servers – however, updating the manifests or control files so that the server profiles are appropriate for Kubernetes master and worker nodes is… non-trivial, shall we say.
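
To make that overhead concrete, here is a hedged sketch of a few of those routine tasks on a single node. The commands assume a Debian/Ubuntu host run as root, and are illustrative rather than a complete hardening guide.

```bash
# Illustrative examples of routine general purpose OS maintenance
# (assumes a Debian/Ubuntu host, run as root; not a complete hardening checklist).
apt-get update && apt-get -y upgrade                           # patch management
apt-get install -y unattended-upgrades                         # automatic security updates
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
systemctl reload ssh                                           # disable root SSH logins
systemctl disable --now snapd.service                          # example: turn off an unused daemon
ufw default deny incoming && ufw allow 22/tcp && ufw --force enable   # basic firewalling
```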

Another problem is the coordination of operating system changes with Kubernetes maintenance. Frequently, there is no coordination, and the operating system is left as-is after an install. As time goes on, Kubernetes will (hopefully) be upgraded, but the underlying operating system may be left static, slowly accumulating a burden of known CVEs (Common Vulnerabilities and Exposures) in its packages and installed kernel.

Ideally, you want the automation platform (like Ansible or Puppet) to coordinate with Kubernetes, so that the operating system of the nodes can be upgraded without disrupting Kubernetes operations. This means that a system needs to:

  • Cordon the node so no new workloads are scheduled on the node
  • Drain the node so all of the running pods are moved to other nodes
  • Update and patch the node
  • Uncordon the node

And of course the system needs to ensure that not too many nodes are updated at once, so that the workload capacity of the cluster is not adversely impacted (nor too few at a time, so that updating a large cluster does not take longer than the interval at which patches and updates are released). You may want to coordinate OS updates with Kubernetes updates to minimize reboots and disruption, but you will also need to support more critical OS updates on short notice.
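
A rough sketch of that per-node loop, driven directly with kubectl, is below; the node name, the patch command and the timeout are placeholders for whatever your automation actually runs.

```bash
# A minimal sketch of the per-node OS maintenance loop; in practice this would
# be driven by your automation platform, one (or a few) nodes at a time.
NODE=worker-1

kubectl cordon "$NODE"                                               # stop new pods being scheduled here
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data     # evict the running pods
ssh "$NODE" 'sudo apt-get update && sudo apt-get -y upgrade && sudo reboot'   # patch and reboot (assumes a Debian-based node)
kubectl wait --for=condition=Ready "node/$NODE" --timeout=15m        # wait for the node to rejoin the cluster
kubectl uncordon "$NODE"                                             # allow scheduling again
```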

The great advantage of a general purpose Linux OS is the familiarity that staff will have with it. They will be familiar not only with deployment but also with troubleshooting techniques, and they can use (and install, if not already present) their regular operating system tools such as tcpdump, strace, lsof and so on. Configurations can be changed easily to correct errors and to test alternatives (something that is both a blessing and a curse!). The disadvantages are the systems administration overhead that must be kept up with, the greater difficulty and work needed to secure the platforms, and the need to coordinate updates with Kubernetes infrastructure and operations.

Container Specific Operating Systems

The National Institute of Standards and Technology (NIST) has a nice definition of a container-specific OS that summarizes some of its advantages:

“A container-specific host OS is a minimalist OS explicitly designed to only run containers, with all other services and functionality disabled, and with read-only file systems and other hardening practices employed. When using a container-specific host OS, attack surfaces are typically much smaller than they would be with a general-purpose host OS, so there are fewer opportunities to attack and compromise a container-specific host OS. Accordingly, whenever possible, organizations should use container-specific host OSs.”

NIST Special Publication 800-190, Application Container Security Guide – https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-190.pdf

To summarize the obvious – the less software an OS runs and the fewer packages it has installed, the less there is to attack and the fewer vulnerabilities will be present. This makes container-specific OSs significantly more secure from the start, even without frequent patching.

Container-specific operating systems may also employ other security approaches, such as making the root file system (or ideally all file systems!) read-only, mitigating the impact any vulnerability can have.

Container-specific OSs generally do not run (or support) package managers. This reduces the chance of a package installation or update causing a conflict that stops a node or service from functioning. The absence of management tools such as Chef and Puppet also reduces the chance of configuration changes, or incomplete runs, adversely affecting the operational stability of the system. Instead, a complete OS image with all updates and configurations applied is installed into an alternate boot partition and booted into at the next reboot, with fallback to the prior known-good image. This means that the configuration of the nodes is exactly known at any point in time, and any version can be reverted to from the version control system in use.
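
Conceptually, such an image-based “A/B” update looks something like the sketch below. The partition layout, image URL and bootloader mechanism are purely illustrative – each container-specific OS implements this with its own tooling, and nodes are normally never touched by hand.

```bash
# A conceptual sketch of an A/B image update (illustrative only).
ACTIVE=/dev/sda3      # the slot the node booted from
INACTIVE=/dev/sda4    # the "other" slot, about to receive the new image

curl -fsSL https://example.com/os-image-v1.2.raw | dd of="$INACTIVE" bs=4M   # write the complete new image
efibootmgr --bootnext 0004    # boot the inactive slot once on the next reboot
reboot
# If the new image comes up healthy it is marked as the default boot entry;
# if not, the next boot falls back to the previous, known-good image.
```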

Some container-specific operating systems are closer to general purpose Linux distributions – e.g. PhotonOS from VMware has a small number of packages installed compared to a regular Linux distribution, but still includes a package manager and SSH access, and does not mount its file systems read-only. One point that sometimes confuses people is that “cloud optimized” versions of general purpose Linux systems are still general purpose Linux systems. For example, Ubuntu releases “cloud images”, which are “customized by Ubuntu engineering to run on public clouds”. However, these are still full-blown distributions of Linux, with all the packages – just with an additional cloud-init package so they can more easily be configured to boot without human intervention.
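
For reference, cloud-init consumes a small “user-data” document at first boot; a minimal, illustrative example looks like this (the hostname, package and key are placeholders):

```bash
# A minimal, illustrative cloud-init user-data file: the cloud image reads this
# at first boot and configures itself without human intervention.
cat > user-data <<'EOF'
#cloud-config
hostname: k8s-worker-1
package_update: true
packages:
  - containerd
users:
  - name: ops
    groups: sudo
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...example ops@example.com
EOF
```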

CoreOS was the first commonly adopted container-specific OS, and it popularized the idea of running all processes in containers for extra security and isolation. CoreOS did away with the package manager, and rebooted into one of two read-only /usr partitions to ensure updates were atomic and could be rolled back. CoreOS, however, has been end-of-lifed by Red Hat since its acquisition.

Current container-specific OSs all adopt the same stance: they are minimal (very few packages installed in the operating system); locked down (to some degree); run their processes in containers (for better security, stability and service isolation); and provide atomic updates (by booting from one partition while updating the other). Examples of these are:

  • Google’s “Container-Optimized OS”, which supports a read-only root file system, but allows SSH and only runs in GCP
  • RancherOS, which runs SSH and does not use a read-only file system to protect the root partition
  • k3OS, also from Rancher, which does not run a full vanilla Kubernetes distribution; management is via kubectl, but SSH is supported
  • AWS’s Bottlerocket, another OS with an immutable root file system and SSH support, which is (at least initially) focused on AWS workloads

An outlier is Talos OS, the most opinionated of the container-specific operating systems. Like the others, Talos OS is minimal, has no package manager, uses only read-only file systems (excepting /var and /etc/kubernetes, plus one or two special files like /etc/resolv.conf that are writeable but ephemeral, i.e. reset on reboot), and integrates with Kubernetes for upgrades via an upgrade controller. However, Talos OS takes the concept of immutable infrastructure further than the other OSs, removing all SSH and console access and making all OS access and management API driven. There are API calls for all the things you’d want to do on a node running Kubernetes – show all the containers, inspect the network setup, etc. – but no way to do things you shouldn’t be doing on a node, like unmounting a file system. Talos also chose to rewrite the Linux init system entirely to do just one thing – start Kubernetes. No user-defined services can be managed (they should all be managed through Kubernetes). This further reduces the security exposure (no SSH, no console), reduces maintenance (no users, no patching), and reduces the impact of any CVE (as file systems are immutable and ephemeral). You may not agree that giving up SSH access, constraining the actions of SREs, and forcing nodes to be fully immutable is desirable – but similar arguments were made against immutable containers not too long ago, so it’s worth looking at. Having an API-managed OS also lends itself very well to large-scale operations and management – if you need to examine the logs for a particular container on one node, one class of nodes, or all nodes, it’s the same API call with different parameters.
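
As an illustration, the talosctl client turns those API calls into one-liners; the node addresses and installer image below are placeholders, and this is a sketch rather than a complete workflow.

```bash
# A hedged sketch of API-driven management with talosctl (node IPs and the
# installer image reference are placeholders).
talosctl --nodes 10.0.0.5 containers -k           # list the Kubernetes containers on one node
talosctl --nodes 10.0.0.5,10.0.0.6 logs kubelet   # fetch the same logs from several nodes with one call
talosctl --nodes 10.0.0.5 upgrade --image ghcr.io/talos-systems/installer:v0.9.0   # API-driven, atomic OS upgrade
```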

Summary

If you have adopted the cattle-not-pets view of container management – destroying a container and launching a new version when an update or fix is to be deployed – then it makes sense to adopt the same approach for the infrastructure that supports the containers. It may take a little education to adopt the paradigm that your nodes should be managed like containers – destroyed and reprovisioned for updates instead of patched in place – but adopting a container-specific OS helps drive this adoption, reduces administrative overhead, and improves security. Container-specific operating systems also help with operational stability – without the ability for a sysadmin or developer to change a config to “just get it working”, the chance of human errors or misconfigurations that break the next upgrade is eliminated.

Given that many enterprises are still early in their Kubernetes adoption lifecycle, now is a good time to become familiar with this next generation of operating systems. By enmeshing the OS tightly with Kubernetes, it is possible to treat the entire Kubernetes cluster as a computer, reduce operational overhead, and foster enhanced security. This lets the focus remain on the workloads and the value the compute infrastructure is providing, and is another step towards the API-driven datacenter.
