Is Vanilla Kubernetes Really Too Heavy For The Raspberry Pi?

It's been bothering me for over a week now. For no good reason, I had a shower thought a couple of weekends back about Kubernetes on the Raspberry Pi. Everywhere I turn in the community, it seems like folks are saying how inefficient Kubernetes is on the Pi and that using something like K3s is the only answer to this problem.
It just doesn't sit well. As someone focused on building "the best damned OS for Kubernetes", I've seen some big performance gains just by stripping things down at the OS level. And in our community we've seen people using Talos OS and vanilla Kubernetes on the Pi to great success.
So I've been stuck with a whole host of questions:
- How much more performant is K3s? Especially on these much smaller machines.
- Can that performance gap be narrowed by deploying Kubernetes differently? By doing something like choosing "the best damned OS for Kubernetes" as a base?
- If it really is as good as everyone is saying, are there lessons to be learned for us at the OS level, as well as in the larger Kubernetes community?
I took to Twitter to see if I could find more data. That's where all great tech knowledge lives, right? In all seriousness, there was some awesome discussion in the thread, but no hard numbers. Sigh. So I set out to fetch these myself.
The Tests
Things Tested
I've tried to keep the testing super simple here. I wanted to focus on the usage of K3s and vanilla Kubernetes in their normal habitats on an 8GB Raspberry Pi 4. As such, I deployed the latest ARM64 build of Raspberry Pi OS for K3s, and the latest beta release of Talos OS for vanilla Kubernetes.
Interestingly enough, I found a bug in our kernel config that didn't enable CPU scaling for Raspberry Pis. As such, these Talos OS tests used the latest v0.10 beta that includes the fix. Easy performance gains FTW!
Because Talos OS doesn't have SSH and the normal tools used for gathering info like this, I ran all tests in a privileged container in both Talos OS and Raspberry Pi OS in an effort to standardize a bit. Here's the manifest used for this debug container:
apiVersion: v1
kind: Pod
metadata:
  name: workspace
spec:
  tolerations:
    - operator: Exists
  hostPID: true
  hostIPC: true
  hostNetwork: true
  containers:
    - name: workspace
      image: ubuntu:latest
      command: ["/bin/sh", "-c", "--"]
      args: ["trap : TERM INT; (while true; do sleep 1000; done) & wait"]
      securityContext:
        privileged: true
The tests themselves I wanted to keep "real world". What I mean by that is I wanted to try and get a feel for how these different setups perform in practice, so I tested the following things. Keep in mind that there's no "before K8s" in running Talos OS, so it's impossible to get the idle memory usage mentioned below in the "Before deploying K3s" section.
Before deploying K3s:
- Idle memory usage with free -m
After deploying K3s/K8s and from within test container:
- Idle memory usage with free -m
- Idle CPU usage every 30s for 5m with sar 30 10
- CPU performance under 100% utilization for 60s with sysbench cpu run --threads=4 --time=60 --cpu-max-prime=20000
- Memory performance with sysbench memory run --memory-total-size=100G --memory-oper=write --memory-access-mode=rnd --time=0
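For anyone who wants to repeat the in-container measurements in the same order, here is a minimal sketch of a driver script. It is not what I actually ran (I typed the commands by hand); the names BENCHMARKS and run_benchmarks are made up for illustration, and it assumes python3, sysstat (for sar) and sysbench are already installed in the debug container. The command strings are the ones listed above.

```python
import shlex
import subprocess

# The four in-container measurements, with the exact flags from the list above.
# The labels on the left are hypothetical names used only by this sketch.
BENCHMARKS = [
    ("idle_memory", "free -m"),
    ("idle_cpu", "sar 30 10"),
    ("cpu_load", "sysbench cpu run --threads=4 --time=60 --cpu-max-prime=20000"),
    (
        "memory_write",
        "sysbench memory run --memory-total-size=100G "
        "--memory-oper=write --memory-access-mode=rnd --time=0",
    ),
]


def run_benchmarks(benchmarks=BENCHMARKS, runner=subprocess.run):
    """Run each command in order and return {label: captured stdout}.

    `runner` defaults to subprocess.run; passing a stand-in makes a dry run
    possible without executing anything.
    """
    results = {}
    for label, command in benchmarks:
        completed = runner(
            shlex.split(command),  # split the string into an argv list
            capture_output=True,   # keep stdout so it can be saved or compared
            text=True,
            check=True,            # raise if a tool is missing or exits non-zero
        )
        results[label] = completed.stdout
    return results
```

Calling run_benchmarks() with no arguments runs everything back to back, which takes a while: sar alone needs 5 minutes, and the sysbench memory run took roughly 6 minutes on my Pi.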
Things Not Tested
Here are some things that I selfishly didn't care about for these tests:
Resource Consumption During Bootstrapping
This is something that lots of folks seem to keep mentioning about K3s vs. something like kubeadm, but it doesn't apply to us at Sidero and we don't have the same timeout problems that tend to bite during the initial creation process. It also just generally seems unimportant to me. I'm far more concerned with the cluster being stable after creation than how long it takes to get there, assuming it's not some absurd amount of time.
Other Vanilla K8s Spins
Similar to above, I didn't mess with any vanilla K8s experience that wasn't Talos OS. Why would I? I told you already that it's "the best damned OS for Kubernetes". My gut tells me that using something else would result in a heavier and less performant experience for vanilla K8s, which is something we should prove out in the future!
Even Smaller Hardware
It feels to me like a 4GB or 8GB Raspberry Pi 4 is a pretty approachable piece of hardware for most folks doing the edgy thing. As such, I didn't bother with seeing how small I could get by trying to run on a 1GB Pi or something of that nature.
Even Bigger Hardware
Talos OS runs anywhere. So does K3s, it seems. It may be worth spending some time with these tests in the cloud or on bare metal sometime in the future, but it's not in scope for my current gripes.
Disk Performance
I did all of this testing on the same 16GB SD card for each OS. Testing disk performance on this card would probably be kind of useless. If I want to do that in the future, I'll probably spend some time with the USB 3.0 -> SSD connections you can now do with the Pi.
Raspberry Pi OS With K3s
A quick note on this section. I initially carried out these tests and found that the memory usage for K3s was actually slightly higher than Talos OS. After I showed the results to some other Sidero folks, it was pointed out that disabling swap is recommended, which I had forgotten to do and which is not mentioned in the K3s docs from what I saw. I disabled swap and reran the tests, and the results below are from that rerun.
Remember to disable swap if you deploy K3s!
Idle Memory (OS Only and Before Disabling Swap)
pi@raspberrypi:~ $ free -m
              total        used        free      shared  buff/cache   available
Mem:           7813          73        7377          16         362        7611
Swap:            99           0          99
Idle Memory (After Bootstrapping and Disabling Swap)
root@raspberrypi:/# free -m
              total        used        free      shared  buff/cache   available
Mem:           7813         649        6725           9         437        7101
Swap:             0           0           0
Idle CPU Usage
root@raspberrypi:/# sar 30 10
Linux 5.10.17-v8+ (raspberrypi) 04/19/21 _aarch64_ (4 CPU)
13:11:23        CPU     %user     %nice   %system   %iowait    %steal     %idle
13:11:53        all      3.26      0.01      1.81      0.03      0.00     94.89
13:12:23        all      3.87      0.00      1.81      0.14      0.00     94.18
13:12:53        all      3.14      0.00      1.80      0.01      0.00     95.06
13:13:23        all      3.09      0.01      1.75      0.02      0.00     95.13
13:13:53        all      3.61      0.00      1.76      0.01      0.00     94.62
13:14:23        all      3.17      0.00      1.81      0.02      0.00     95.00
13:14:53        all      3.10      0.01      1.79      0.02      0.00     95.09
13:15:23        all      3.18      0.01      1.84      0.09      0.00     94.88
13:15:53        all      3.66      0.01      1.72      0.02      0.00     94.60
13:16:23        all      3.35      0.00      1.89      0.01      0.00     94.75
Average:        all      3.34      0.00      1.80      0.04      0.00     94.82
CPU Performance
root@raspberrypi:/# sysbench cpu run --threads=4 --time=60 --cpu-max-prime=20000
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 4
Initializing random number generator from current time
Prime numbers limit: 20000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 2228.24
General statistics:
total time: 60.0014s
total number of events: 133705
Latency (ms):
min: 1.70
avg: 1.79
max: 26.17
95th percentile: 1.89
sum: 239920.49
Threads fairness:
events (avg/stddev): 33426.2500/388.55
execution time (avg/stddev): 59.9801/0.00
Memory Performance
root@raspberrypi:/# sysbench memory run --memory-total-size=100G --memory-oper=write --memory-access-mode=rnd --time=0
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: write
scope: global
Initializing worker threads...
Threads started!
Total operations: 104857600 (279476.74 per second)
102400.00 MiB transferred (272.93 MiB/sec)
General statistics:
total time: 375.1879s
total number of events: 104857600
Latency (ms):
min: 0.00
avg: 0.00
max: 3.84
95th percentile: 0.00
sum: 349586.81
Threads fairness:
events (avg/stddev): 104857600.0000/0.00
execution time (avg/stddev): 349.5868/0.00
Talos OS With Vanilla K8s
Idle Memory (After Bootstrapping)
root@talos-pi:/# free -m
              total        used        free      shared  buff/cache   available
Mem:           7831         656        5285          40        1889        7032
Swap:             0           0           0
Idle CPU Usage
root@talos-pi:/# sar 30 10
Linux 5.10.29-talos (talos-pi) 04/16/21 _aarch64_ (4 CPU)
15:35:56        CPU     %user     %nice   %system   %iowait    %steal     %idle
15:36:26        all      6.16      0.01      3.41      0.14      0.00     90.29
15:36:56        all      6.11      0.03      3.36      0.16      0.00     90.33
15:37:26        all      6.84      0.03      3.34      0.14      0.00     89.66
15:37:56        all      5.77      0.02      3.56      0.11      0.00     90.54
15:38:26        all      6.25      0.02      3.48      0.14      0.00     90.11
15:38:56        all      6.61      0.03      3.54      0.19      0.00     89.64
15:39:26        all      6.81      0.03      3.42      0.19      0.00     89.54
15:39:56        all      6.28      0.03      3.29      0.15      0.00     90.26
15:40:26        all      6.09      0.03      3.51      0.14      0.00     90.23
15:40:56        all      6.43      0.02      3.49      0.14      0.00     89.93
Average:        all      6.33      0.02      3.44      0.15      0.00     90.05
CPU Performance
root@talos-pi:/# sysbench cpu run --threads=4 --time=60 --cpu-max-prime=20000
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 4
Initializing random number generator from current time
Prime numbers limit: 20000
Initializing worker threads...
Threads started!
CPU speed:
events per second: 2102.18
General statistics:
total time: 60.0015s
total number of events: 126141
Latency (ms):
min: 1.70
avg: 1.90
max: 49.46
95th percentile: 2.00
sum: 239887.52
Threads fairness:
events (avg/stddev): 31535.2500/717.61
execution time (avg/stddev): 59.9719/0.01
Memory Performance
root@talos-pi:/# sysbench memory run --memory-total-size=100G --memory-oper=write --memory-access-mode=rnd --time=0
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: write
scope: global
Initializing worker threads...
Threads started!
Total operations: 104857600 (283405.69 per second)
102400.00 MiB transferred (276.76 MiB/sec)
General statistics:
total time: 369.9862s
total number of events: 104857600
Latency (ms):
min: 0.00
avg: 0.00
max: 8.06
95th percentile: 0.00
sum: 343589.34
Threads fairness:
events (avg/stddev): 104857600.0000/0.00
execution time (avg/stddev): 343.5893/0.00
What Does All That Mean?
The TL;DR is this: K3s is kinder to your CPU than vanilla K8s, but its memory efficiencies are negligible given the proper OS.
Percentage-wise, here's what I saw:
- At idle, Talos OS with vanilla K8s uses ~1.5% more memory than K3s on Raspberry Pi OS.
- At idle, K3s on Raspberry Pi OS uses 5.18% CPU while Talos OS with vanilla K8s uses 9.95%.
- During memory writes, Talos OS with vanilla K8s had ~1.4% higher throughput than K3s on Raspberry Pi OS.
- Under full CPU load, K3s on Raspberry Pi OS will process ~6% more events per second than Talos OS with vanilla K8s.
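For anyone who wants to check the arithmetic, here is a small sketch that recomputes these percentages from the raw output above. The helper names pct_more and busy_pct are made up for illustration. One assumption to flag loudly: for memory it compares the "used" column of free -m, which works out to roughly 1.1%, a little under the ~1.5% quoted in the list, so treat the choice of column as a guess on the part of this sketch. The other three figures line up with the list.

```python
# Figures copied from the free, sar and sysbench output earlier in the post.
K3S = {"mem_used_mb": 649, "idle_pct": 94.82, "cpu_eps": 2228.24, "mem_mib_s": 272.93}
TALOS = {"mem_used_mb": 656, "idle_pct": 90.05, "cpu_eps": 2102.18, "mem_mib_s": 276.76}


def pct_more(a, b):
    """How much larger a is than b, expressed as a percentage of b."""
    return (a - b) / b * 100


def busy_pct(idle_pct):
    """CPU busy percentage derived from sar's average %idle."""
    return 100 - idle_pct


summary = {
    # Assumption: compares the `used` column of `free -m`.
    "talos_extra_memory_pct": pct_more(TALOS["mem_used_mb"], K3S["mem_used_mb"]),
    "k3s_idle_cpu_pct": busy_pct(K3S["idle_pct"]),
    "talos_idle_cpu_pct": busy_pct(TALOS["idle_pct"]),
    "talos_extra_mem_throughput_pct": pct_more(TALOS["mem_mib_s"], K3S["mem_mib_s"]),
    "k3s_extra_cpu_events_pct": pct_more(K3S["cpu_eps"], TALOS["cpu_eps"]),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The idle CPU numbers are worth reading two ways: in absolute terms the gap is 4.77 percentage points (9.95 minus 5.18), while in relative terms Talos OS with vanilla K8s uses nearly twice the idle CPU of K3s.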
Are You A K3s Hater?
I'm not. Promise. It's more the general discounting of Kubernetes as being totally unfit for these edgy environments that sticks in my craw. In fact, there are some things that I think K3s really nails now that I've got to spend some time with it.
For one, it's super easy once you've got your OS set up. It's also very fast to bootstrap. Even though it wasn't part of my actual data, doing a simple curl command and then being able to do kubectl get nodes within ~30s is badass. We'll be taking some lessons from that experience at Sidero for sure.
K3s also ships "batteries included", meaning they make some hard decisions on how things are deployed and also bundle in some really nice quality of life addons. This means users don't have to putz around with load balancers and ingresses if they don't want to. As someone who tried to solve "all the things" for a long time with kubespray, I have respect for drawing a line and saying "here's what comes out of the box".
I also feel like there are some things about K3s that everyone who loves Talos OS would violently agree with, especially around the general minimization of a huge piece of software and the desire to provide "just enough" to run it.
I really think that these tests make the case that the underlying OS is pretty dang important. The data I saw shows that it is feasible to run vanilla K8s on the right OS and still compete with (and in some aspects beat) the default experience of something like K3s. Even in the stats that look the scariest, like idle CPU, it's still worth noting that only about 10% of CPU power was being occupied, even if the percentage increase seems massive.
I'd say there is probably a use case for running each of these stacks on a Raspberry Pi and it's up to the reader to decide when and where. I personally like the benefits of having a declarative way to run the same OS and K8s versions on a Pi cluster as in the data center, along with the consistency and stability that enables. But does that mean I am going to demand that you use Talos OS and vanilla Kubernetes, dear reader? Nope. You do you! I did, however, have a very fun few days coming up with some ways to test all this out. And heck, it may even make sense for us to put K3s on "the best damned OS for Kubernetes" at some point, who knows.
I'm also interested in any tests others may be able to come up with. Feel free to hit me up in the Talos Slack or Twitter.


