Bare-metal Kubernetes for your Homelab

In this article series we’re going to set up a bare-metal Kubernetes cluster for use in a homelab. We’ll try to be as close as possible to production-readiness (but, since it’s a homelab, there’s going to be plenty of jank).

Introduction

I’ve attempted to write this article series many times. It all started in 2019 when I bought three Dell R710 servers. Looking back, purchasing used servers that had been end-of-life for several years was not the best decision. In fact, on Christmas Eve, 2021, one of these servers suddenly caught fire. While preparing this article, I even found a draft from December 31st, 2023, where I had tried setting something up with Raspberry Pis. No matter what I tried, though, I wasn’t satisfied with my hardware setup until recently. So, in this article, we’re going to dive into the hardware choices for a homelab setup. But before that, let’s take a step back and discuss what Kubernetes is, why it’s so powerful, and why it might be overkill for your homelab.

This article is aimed at people who find joy in tinkering with servers. Setting up Kubernetes in a homelab is not as simple as plugging a NAS into the wall. It requires careful planning and setup. Think of it like buying a car to work on—this is exactly that kind of project.

So, what exactly is Kubernetes?

Containers have become the go-to method for deploying workloads, both on-premises and in the cloud¹. However, when your application consists of many services, managing containers becomes a headache. To address this, container orchestration tools have emerged. Let’s ignore cloud solutions (like AWS’s ECS) for now and focus on the on-prem world. I expect most readers of this blog are (at least vaguely) familiar with Docker Compose.

Docker Compose is popular among home users because it provides a straightforward way to “spin up some containers” and have them interact with each other. It works by defining a docker-compose file, which specifies the containers, their networking, and storage configurations.

services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - data:/usr/src/paperless/data
      - media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

volumes:
  data:
  media:
  pgdata:
  redisdata:

This is an example of a docker-compose file for Paperless-NGX, a piece of document management software. Here, you can see five containers defined: Paperless, Postgres, Redis, Tika, and Gotenberg. Some containers have volumes (storage), while others receive their configuration via environment variables. Some of the containers also expose ports to the world. These files can easily be deployed on a server using the Docker CLI and work as expected. However, there are still some aspects that are missing or could use improvement, especially once you have not just one but many such deployments:

  • Can I deploy this to multiple servers?
    • What happens if my container becomes unhealthy? How would I know?
    • What happens if one of my servers goes down?
    • Can I control which servers my workloads land on (for example: I want my media server and AI workloads to be on a server with a GPU)?
  • How can I scale my containers? Can I scale services (such as Redis and Postgres) independently?
  • How do I know if my containers are doing well? Can I get warnings?
  • How are the secrets stored?
  • Can I do versioning?

So what is Kubernetes? Kubernetes is a container orchestrator that manages container deployments across a fleet of underlying nodes. It provides a unified API to manage applications, scale workloads, ensure reliability, and automate deployment processes. With Kubernetes, you can abstract away the complexity of managing individual containers and instead focus on defining the desired state of your applications, while Kubernetes takes care of achieving and maintaining that state.
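
To make the “desired state” idea concrete, here is roughly what the Redis broker from the Compose file above could look like as a Kubernetes Deployment. Treat this as a minimal sketch rather than a full port: it omits the volume, and the resource names are my own choices.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: broker
spec:
  replicas: 1                  # desired state: one running copy
  selector:
    matchLabels:
      app: broker
  template:
    metadata:
      labels:
        app: broker
    spec:
      containers:
        - name: redis
          image: docker.io/library/redis:7

You declare that one replica of this pod should exist, apply the manifest, and Kubernetes continuously works to make reality match it: if the container dies, a new one is started.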

Hence, Kubernetes (or K8s for short) addresses the challenges hinted at earlier either directly (by having a built-in solution) or through its emergent properties (because of the way the system is set up, such properties follow automatically). Instead of treating containers as isolated entities, Kubernetes manages them as part of a larger, distributed system. Here’s how it helps solve the above issues:

  1. Multi-server deployments: Kubernetes is designed to span many nodes (servers), allowing you to deploy containers on multiple machines seamlessly. It abstracts away the complexity of managing these servers and handles the distribution of workloads across them.

  2. Health checks and monitoring: Kubernetes includes built-in health checks (liveness and readiness probes) for containers. If a container becomes unhealthy, Kubernetes can automatically restart it, ensuring that your services stay up and running. Additionally, Kubernetes offers integration with tools like Prometheus for monitoring, so you can easily track the health and performance of your containers. (A liveness probe is sketched after this list.)

  3. Fault tolerance and high availability: Kubernetes ensures that your workloads are resilient to server failures. If one of your nodes goes down, Kubernetes will automatically reschedule your containers to healthy nodes, minimizing downtime.

  4. Workload placement: Kubernetes provides features like node affinity and taints/tolerations, which allow you to specify where certain workloads should run. For example, you can make sure that GPU-heavy AI workloads only run on nodes with GPUs, ensuring optimal resource usage. (The sketch after this list pins a pod to GPU nodes.)

  5. Scaling: Kubernetes makes it easy to scale your containers up or down, either manually or automatically. You can scale individual services, like Redis or Postgres, independently based on demand, without needing to manage separate Docker Compose files or configurations. (See the autoscaler example after this list.)

  6. Performance monitoring and alerts: With Kubernetes, you can integrate monitoring tools like Prometheus and Grafana to track container metrics. You can also set up alerts based on predefined thresholds, so you’ll be notified if something goes wrong.

  7. Secret management: Kubernetes has built-in mechanisms to store and manage secrets securely, such as passwords, API keys, and certificates, using Kubernetes Secrets or integrations with external secret management systems like HashiCorp Vault. (The sketch below reads a database password from a Secret.)

  8. Versioning and rollbacks: Kubernetes enables version control through its deployment objects. You can deploy new versions of your containers and easily roll back to a previous version if something goes wrong, ensuring that your environment remains stable.
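
To make points 2, 4, and 7 a bit more tangible, here is a hedged sketch of how the Paperless webserver from earlier could declare a liveness probe, be pinned to GPU nodes, and read its database password from a Secret. The gpu node label, the Secret name, and the probe path are illustrative assumptions, not anything Paperless or Kubernetes prescribes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webserver
  template:
    metadata:
      labels:
        app: webserver
    spec:
      nodeSelector:
        gpu: "true"              # placement: only schedule on nodes labeled gpu=true
      containers:
        - name: paperless
          image: ghcr.io/paperless-ngx/paperless-ngx:latest
          livenessProbe:         # health check: restart the container if this fails
            httpGet:
              path: /
              port: 8000
          env:
            - name: PAPERLESS_DBPASS
              valueFrom:
                secretKeyRef:        # secret management: value comes from a Secret object
                  name: paperless-db # hypothetical Secret holding the password
                  key: password

If the probe keeps failing, the kubelet restarts the container; and the pod is only ever scheduled onto nodes carrying the gpu=true label.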
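
For point 5, scaling can be as simple as editing replicas, but it can also be automated. A minimal HorizontalPodAutoscaler, assuming the webserver Deployment sketched above and a working metrics pipeline in the cluster, might look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webserver
spec:
  scaleTargetRef:                # which workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: webserver
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80 # add/remove pods to hover around 80% CPU

Kubernetes then adds or removes webserver pods to keep average CPU utilization near the target.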

Picking the hardware

So let’s say I’ve convinced you and you’re ready to get started on the journey²: How do you find the right hardware?

As a homelab enthusiast, you can realistically pick from three options:

Proper servers: If you have the budget, picking up a couple of Dell, Supermicro, or other brand-name servers from eBay can be a solid choice. Keep in mind, however, that these machines are designed for business use, meaning they tend to be quite loud and power-hungry. On the plus side, you’ll get all the advantages of traditional servers, such as remote management, powerful CPUs, and ample memory.

If you’re interested in going that route, check out /r/homelab and /r/homelabsales. I personally really like the Dell servers (in fact, I would probably buy R730s right now), but I’ve heard others speak highly of Supermicro servers as well.

Raspberry Pis: Raspberry Pis are small, power-efficient ARM-based computers that are commonly used to form clusters. While I find them less ideal for this particular project—especially due to my previous struggles with ARM containers—they have a massive and active modding community. This makes them a great option for those looking to explore a range of custom setups.

Project TinyMiniMicro: One of the most interesting solutions I’ve come across is ServeTheHome’s Project TinyMiniMicro. This project focuses on small, off-the-shelf mini PCs that can be bought for a fraction of the price of traditional servers. These tiny PCs may not have the most powerful CPUs, but they shine in terms of RAM capacity, which was crucial for my needs, as my workloads are more memory-bound than CPU-bound.

If you don’t buy a server, don’t be surprised if you don’t get a server. While mini PCs are great for what they do, they’re not as versatile as real servers. For example, you can’t “just add” storage as you wish, can’t arbitrarily add expansion cards (such as GPUs or NICs), and can’t add memory indefinitely.

So how do you choose?

Try to think about the requirements for your project. Here are some questions to get you started, but this list is by no means exhaustive:

  • How much storage do you expect to need? Are HDDs sufficient or do you need SSDs? How can you connect them to your cluster?
  • What are the requirements of your workloads? Do they need a high-performance CPU or just some CPU and a lot of RAM?
  • Do you expect a lot of network traffic? Do you want to segment the traffic (for example, have the traffic for balancing your storage cluster on another NIC)?
  • Do you need GPUs in your cluster?
  • How much power are you willing to consume, and what are the limitations of your home electrical setup³?
  • How much scalability do you need? Will you expand the cluster over time, and can the hardware handle it?
  • What is your budget for initial setup versus long-term maintenance and upgrades?
  • Do you require specific hardware features, like ECC memory or IPMI for remote management?
  • Are there physical space constraints where the cluster will be deployed?
  • What is your plan for networking? Do you need enterprise-grade switches, VLAN support, or more NICs?
  • How will you handle firmware and BIOS updates across your nodes?
  • Do you need to integrate with other services, such as an existing home automation system, media server or your ISP router?
  • Are you prepared for potential hardware failures, and do you have spare parts or a repair plan?
  • Will you host sensitive data, and do you need hardware with features like a TPM for encryption or hardware security modules (HSM)?
  • How much noise do you tolerate?

Answering these questions helped me land on the TinyMiniMicro route: I bought three Lenovo ThinkCentre M720q units with 32 GB RAM each. To each I added a 128 GB SATA boot SSD and a 1 TiB NVMe SSD for storage in the cluster. This choice is solid and budget-friendly, but it comes with trade-offs: for example, I’m limited to a single NIC and can’t easily add new storage. Some quick math lands me at about 2.8 TiB of usable storage, which can’t handle my media ripping/Jellyfin pipeline. So in the future — when I migrate my current prod services — I’m going to need to add more storage to the cluster (somehow). The SSDs are also not enterprise-grade, so I expect them to wear out faster.

This concludes the article on picking your hardware! Think about your deployment, calculate your budget, get spouse approval, recalculate your budget, and finally order yourself some hardware! In the next article, we’re going to look into installing Fedora CoreOS on our cluster.

Footnotes

  1. Yes, I’m aware of Lambda. Fight me. :D

  2. If yes: How? I’m not a strong writer.

  3. Residential power delivery apparently doesn’t like six servers on one socket. Don’t ask me how I know.
