
How a SaaS company saved 90% ($180,000) per year on Infrastructure Costs by adopting Kubernetes and Talos Linux

MyNewsDesk is a one-stop solution for PR and communication, offering content creation, analysis and insights, and media monitoring. They are a typical midsized company: 150 employees, 20 developers. Each month they publish around 11,000 stories, send 5 million emails, and serve 6 million page views. MyNewsDesk runs a monolithic Ruby on Rails application, with React and GraphQL. Until this year, they outsourced most of their DevOps support to the Heroku platform.

They recently embarked on a project to switch platforms. The drivers for the change were GDPR compliance; a sense that they had outgrown Heroku as MyNewsDesk scaled (having run into some security and development issues); and, of course, the hope of reducing costs.

So what were the alternatives? They first investigated startups offering EU-based cloud platforms with GDPR compliance, but these startups were new and lacked a track record, so the team could not trust them with a mission-critical application. The team had zero experience with Kubernetes, but had the idea that Kubernetes might be viable for medium-sized companies, not just enterprises. After a bit more research, they summarized their feelings about Kubernetes:
Pros:
  • Kubernetes is trending to be the de facto deployment platform.
  • Open source operators are available for managing HA databases.
  • Stronger end game – can customize more for their needs.
Cons:
  • The team had zero experience with Kubernetes.
  • Steep learning curve, and Kubernetes has a scary reputation.
  • Would have to build an in-house platform.
However, after getting Kubernetes to work for a hobby application on NUCs at home, they explored viability with a simple proof of concept, replicating the MyNewsDesk site on Kubernetes. The team elected to use Hetzner for bare metal hosting, as it seemed to provide the highest performance per dollar. Talos Linux emerged as the preferred operating system for Kubernetes, for a variety of reasons:
  • The team was impressed by how Talos Linux was conceived and its seamless integration with Kubernetes.
  • Talos provided a declarative, patchable, and stageable configuration management experience, allowing the team to focus on the Kubernetes part. “We were looking for something to get out of our way.”
  • Talos Linux, being a minimal, Kubernetes-specific operating system, made updates simple. “With Ubuntu, you ssh in, you have 150 packages to update, you don’t know what they are, and what will happen when you upgrade, etc. Talos made the updates simple. No package updates to worry about.”
  • The team also found an exceptional community on Slack. The community helped the team figure out how configuration management could be made declarative, patchable, and stageable.
  • Zero-headache Kubernetes upgrades were also fantastic.
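To make "declarative and patchable" concrete, here is a minimal sketch of the general idea: a shared base configuration plus a small per-node patch that overrides only what differs. The `apply_patch` helper and all values are hypothetical illustrations, not Talos's own implementation or the team's actual configuration; the field names (`machine.install.disk`, `machine.network.hostname`, `cluster.clusterName`) follow the Talos machine configuration layout.

```python
from copy import deepcopy


def apply_patch(base: dict, patch: dict) -> dict:
    """Return a new config with `patch` merged over `base`.

    Nested dicts are merged recursively; any other value in the patch
    replaces the value in the base. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_patch(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical base config shared by every worker node (values are made up).
base_config = {
    "machine": {"type": "worker", "install": {"disk": "/dev/sda"}},
    "cluster": {"clusterName": "staging"},
}

# Hypothetical per-node patch: only the fields that differ from the base.
node_patch = {"machine": {"network": {"hostname": "worker-1"}}}

node_config = apply_patch(base_config, node_patch)
print(node_config["machine"])
```

Because the base stays untouched and each patch is a small, reviewable document, the same base can be reused across staging and production nodes.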
The downside of Talos Linux was that it meant another learning curve: having no shell felt strange after being used to conventional Linux, so it took some getting used to. It is also early days for Talos: it is not yet widely adopted, but as early adopters of Rails, the team found that exciting.

The team got Kubernetes running on Hetzner bare metal nodes, with Talos Linux deploying Kubernetes. They struggled a bit with ArgoCD due to their lack of experience, and they evaluated eight persistent storage solutions, tested three, and settled on one. After a few weeks, they had the MyNewsDesk application running in the proof-of-concept environment.

For the proof of concept, they could compare performance between the two environments, each with the same database and application. On Heroku, they were running 8 x Performance-L dynos and Heroku Postgres Premium 5. On Hetzner, they ran 2 x AX101 servers for web and 2 for databases, with 128GB of RAM. They used the OpenEBS Dynamic LocalPV provisioner: the only use case they had for persistent data was databases, so they could rely on database-level replication and simplify the storage requirements.

They tested latency with a single connection, using one of their busy endpoints. Average latency went from 205 ms on Heroku to 134 ms on Talos Kubernetes running on Hetzner. More significantly, 99th-percentile latency went from 655 ms to 226 ms. Throughput went from 163 requests/sec on Heroku, with 99% of requests completing in less than 1.6 seconds, to 261 requests/sec on bare metal, with 99% completing in less than 421 ms.

They also tested sustained load by replaying seven days of production requests overnight. They achieved 460 requests/second on the Talos deployment, which is 10 x peak production.

Costs for the proof-of-concept setups:
  • Heroku: $7650 per month
  • Hetzner: $520 per month
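The proof-of-concept figures above can be turned into relative improvements with a few lines of arithmetic. This is a sketch using only the numbers quoted in this article; it assumes the average latency of 205 is in milliseconds, and the peak production rate is merely implied by the "10 x peak production" statement.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# (Heroku, Hetzner/Talos) pairs quoted in the article.
poc = {
    "average latency (ms)": (205, 134),
    "p99 latency (ms)": (655, 226),
    "monthly cost ($)": (7650, 520),
}

reductions = {
    name: round(percent_reduction(before, after), 1)
    for name, (before, after) in poc.items()
}

# Throughput went up rather than down: 163 -> 261 requests/sec.
throughput_gain = round((261 - 163) / 163 * 100, 1)

# 460 requests/sec was described as 10 x peak production.
implied_peak_rps = 460 / 10

for name, value in reductions.items():
    print(f"{name}: {value}% lower")
print(f"throughput: {throughput_gain}% higher")
print(f"implied peak production load: {implied_peak_rps} requests/sec")
```

On these numbers, the proof-of-concept setup cost roughly 93% less per month while cutting 99th-percentile latency by about two thirds.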
This clearly showed that running their application on Kubernetes on bare metal with Talos Linux was viable! So they ordered AMD servers and migrated to bare metal. To give an idea of scale, the staging cluster has seven AX41 servers, and the production cluster runs three AX41s for the control plane, plus five servers for workers and five for databases.

Looking at the developer experience of actually using this platform, the goal was to make it at least as good as Heroku. The principle guiding the design of the developer experience was to hide Kubernetes from the developers. This doesn’t mean that they don’t have access if they want it, but we didn’t want any developer to have to learn about Kubernetes, YAML, and all the complexities. We developed our own command line tool that wraps Kubernetes and interfaces with GitHub. We ended up delivering a substantially better developer experience. To take just one example, deploys went from 5 minutes on Heroku to 10 seconds.

How did the live deployment go?

Comparing the prior week on Heroku to the first week on Hetzner/Talos, we can see that average latency fell by 33 milliseconds, about a 29% reduction. It was very nice to see that after all this work, it just worked! Not only were there no problems, there was a significant cost reduction: we used to pay Heroku about $200,000 per year. The new Talos Linux-based Kubernetes stack costs about $20,000 per year. So it’s an order-of-magnitude saving. Cheaper, faster and better!
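As a quick check, the headline saving follows directly from the two annual figures. The sketch below also derives the average latencies implied by "33 milliseconds" being "about a 29% reduction"; those two latency values are back-calculated approximations, not measurements reported by the team.

```python
heroku_per_year = 200_000  # approximate annual Heroku bill ($)
talos_per_year = 20_000    # approximate annual cost of the new stack ($)

savings = heroku_per_year - talos_per_year
savings_pct = savings * 100 / heroku_per_year

# Back-calculate the average latencies implied by the quoted reduction.
latency_drop_ms = 33
latency_drop_fraction = 0.29
implied_before_ms = latency_drop_ms / latency_drop_fraction
implied_after_ms = implied_before_ms - latency_drop_ms

print(f"annual savings: ${savings:,} ({savings_pct:.0f}%)")
print(f"implied average latency: ~{implied_before_ms:.0f} ms -> ~{implied_after_ms:.0f} ms")
```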

When we talk to people about the Kubernetes stack we built on Talos Linux, a common response is that managing Kubernetes yourself is a nightmare. Given our experience of building our stack during the past 6 months, Kubernetes management has felt pretty much like a non-issue. The only major thing we had to learn was etcd defragmentation, since we had some situations where we generated extreme bloat in the revision history.

People think doing things like upgrading Kubernetes is super hard. Upgrading Kubernetes using talosctl upgrade-k8s has been a fantastic user experience with zero downtime. So we’re not able to relate to these concerns. We don’t know whether it is a misconception that Kubernetes is hard to self-manage, or whether it is just significantly easier when running Talos Linux: we don’t really have any experience with other OSes running Kubernetes, so we can’t compare.
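For readers who have not seen it, the upgrade mentioned above is a single talosctl invocation. The sketch below only builds and prints the command so that it can be reviewed before running; the node address and target version are placeholders, and the `upgrade_k8s_command` helper is a hypothetical wrapper, not part of the team's tooling.

```python
import shlex


def upgrade_k8s_command(node: str, target_version: str, dry_run: bool = True) -> list[str]:
    """Build the argv for `talosctl upgrade-k8s` against one control plane node."""
    cmd = ["talosctl", "--nodes", node, "upgrade-k8s", "--to", target_version]
    if dry_run:
        # --dry-run reports what would change without applying anything.
        cmd.append("--dry-run")
    return cmd


# Placeholder node IP and Kubernetes version, for illustration only.
cmd = upgrade_k8s_command("10.0.0.2", "1.28.3")
print(shlex.join(cmd))

# To execute for real, pass the list to subprocess.run(cmd, check=True)
# on a machine with talosctl installed and configured.
```

Defaulting to a dry run is a deliberate choice in this sketch: the real upgrade only happens when the caller explicitly passes `dry_run=False`.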

For more detailed information, watch the talk at TalosCon 2023, or explore the Reclaim the Stack website. The team at MyNewsDesk was kind enough to open source their entire stack!
