In this article, we’re going to use Kubespray to deploy Kubernetes on our bare-metal CoreOS cluster. We’ll discuss the configuration options. We’ll also cover CRI (container runtime interface) as well as CNI (container networking interface).
First things first: What is Ansible?
One recurring theme in this series is going to be automation. Processes requiring manual intervention are
prone to errors—think of the last time you accidentally burned your toast or scrambled your eggs too long.
Imagine you’re managing ten web servers hosting various web applications under subdomains of example.com.
Each server requires a certificate that needs to be updated and deployed every three months before expiration.
While the task is straightforward (copy the new certificate and restart the web server daemon), mistakes can
easily creep in. Did you copy the correct certificate? Did you actually restart the service? Which server did
you just finish updating? Add to that the challenge of continuity: if your experienced IT admin leaves, are
the replacements prepared to handle this consistently and correctly?
Enter Ansible, a powerful tool for automation. With Ansible, you describe the desired state of your systems (e.g., “file: present”, “service: started”) declaratively, and it handles applying that configuration (typically over SSH). This means you can rebuild or reconfigure servers in minutes with minimal effort.
For this article, we don’t need to get into the weeds of Ansible, but it does help to know some important terms:
- Inventory: A collection of servers managed by Ansible.
- Roles: Define specific configurations or tasks for servers.
- Playbooks: YAML files that outline desired states and actions.
For example, you might have twenty servers: fifteen of them are web servers and seven form a PostgreSQL cluster (with two servers assuming dual roles). In this scenario, you might define the following roles with their corresponding playbooks:
- Webserver Role: Configure and maintain servers hosting web applications.
  - Deploy updated SSL/TLS certificates.
  - Install and configure the web server software (e.g., Nginx, Apache).
  - Ensure the webserver service is running and properly configured.
  - Set up monitoring and logging for HTTP traffic.
- Database Role: Manage the servers in the PostgreSQL cluster.
  - Set up primary and replica configurations for high availability.
  - Back up the database to a secure storage location.
  - Apply performance tuning configurations or updates.
  - Monitor replication health and repair issues proactively.
- Monitoring Role: Deploy monitoring agents, such as for Prometheus.
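To make these terms concrete, here is a minimal sketch of what a playbook for the webserver role could look like. The group name, file paths, and service name are all hypothetical:

# webserver.yml -- hypothetical playbook applying the webserver role
- name: Configure web servers
  hosts: webservers            # inventory group, assumed to exist
  become: true                 # escalate privileges (sudo)
  tasks:
    - name: Make sure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Deploy the updated TLS certificate
      ansible.builtin.copy:
        src: files/example.com.pem
        dest: /etc/ssl/certs/example.com.pem
        mode: "0644"
      notify: Restart nginx    # handler runs only if the file changed
    - name: Ensure nginx is running and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted

Running ansible-playbook -i inventory.ini webserver.yml applies that state to every host in the webservers group; because the modules are idempotent, rerunning it is safe.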
Kubespray is a collection of Ansible Playbooks to deploy Kubernetes
The goal of this blog series is to end up with a running Kubernetes cluster. There are a couple of ways to get there, but the approach I’ve always found nicest and easiest is Kubespray.
Kubespray is a collection of Ansible playbooks to deploy a production-ready cluster. So with that out of the way, let’s get started!
Create a virtual environment
Ansible is written in Python, and because Python’s dependency management is … great, the first thing we need to do is create a virtual environment and install Ansible as well as some other dependencies:
$ git clone git@github.com:kubernetes-sigs/kubespray.git
$ cd kubespray
$ python3 -m venv .venv
$ source .venv/bin/activate # Different if on powershell
$ pip install -U -r requirements.txt
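If you want a quick sanity check that the virtual environment is set up correctly before continuing, you can optionally run:

$ ansible --version

The output should report the Ansible version pinned in Kubespray’s requirements.txt and a Python interpreter path pointing into the .venv you just created.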
Create an Inventory
The next step is to create an inventory for your nodes. Friendly reminder: At this point your cluster should be running CoreOS and be reachable over the network from the machine you’re working on. Kubespray comes with an example inventory that we can adapt to our needs. To copy it, run:
$ cp -rfp inventory/sample inventory/your-cluster
We end up with a folder structure like this (shortened for the sake of brevity):
inventory/your-cluster
├── group_vars
│   ├── all
│   │   ├── all.yml
│   │   ├── containerd.yml
│   │   ├── coreos.yml
│   │   └── etcd.yml
│   ├── etcd.yml
│   └── k8s_cluster
│       ├── addons.yml
│       ├── k8s-cluster.yml
│       ├── k8s-net-calico.yml
│       ├── k8s-net-kube-router.yml
│       └── k8s-net-macvlan.yml
└── inventory.ini
The inventory.ini is the inventory of our cluster. We’ll edit it in a bit. Let’s take the opportunity to read through the documentation on the folder structure:
The inventory is composed of 3 groups:
- kube_node: list of kubernetes nodes where the pods will run.
- kube_control_plane: list of servers where kubernetes control plane components (apiserver, scheduler, controller) will run.
- etcd: list of servers to compose the etcd server. You should have at least 3 servers for failover purpose.
So what are the individual components? Kubernetes is split into a control plane and nodes. The control plane manages the cluster’s overall state, schedules workloads, and handles communications within the cluster. It consists of components like:
- API Server: Acts as the gateway for administrative commands and the cluster’s state.
- Scheduler: Assigns workloads (Pods) to nodes based on resource availability and other constraints.
- Controller Manager: Ensures that the cluster’s state matches the desired state, managing tasks like replication and node health.
- etcd: A distributed key-value store that serves as Kubernetes’ backing store for all cluster data.
The nodes are the worker machines where your application workloads (Pods) run. Each node includes:
- kubelet: Ensures that containers are running in Pods as expected.
- kube-proxy: Manages networking for Pods, including service discovery and load balancing.
- Container Runtime: Runs the actual containers (e.g., containerd, CRI-O, or Docker).
Basically, Kubespray allows us to choose which of our servers will assume which roles. It’s possible to run servers that are part of the control plane and act as worker nodes and run etcd (in fact, that’s what we’re deploying today), but you can separate the roles as well.
So let’s create our inventory:
[all]
node1 ansible_host=10.100.0.1 etcd_member_name=k8s-01 ansible_user=ansible
node2 ansible_host=10.100.0.2 etcd_member_name=k8s-02 ansible_user=ansible
node3 ansible_host=10.100.0.3 etcd_member_name=k8s-03 ansible_user=ansible
[kube_control_plane]
node1
node2
node3
[etcd]
node1
node2
node3
[kube_node]
node1
node2
node3
[calico_rr]
[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr
First, we define the nodes that we want Ansible to manage: we give them a name and set their IP (in my case, all of the nodes live in the 10.100.0.0/24 CIDR range). Since these three nodes are going to run etcd, we set their etcd_member_name (if I had one more node not running etcd, this could just be set to an empty string). Finally, ansible_user is the user that Ansible will use to SSH into the machine, as in ssh THIS-USER@your-machine.
Next, we split the hosts into groups: the kube_control_plane group will host the control plane, the etcd group will house etcd, and the kube_node group will be available as worker nodes. There’s one last special group, calico_rr (Calico route reflectors). Unless you have special networking requirements, leave it empty.
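Before touching any variables, it can’t hurt to verify that Ansible can actually reach every node with this inventory. A quick, optional check using Ansible’s built-in ping module, assuming the inventory path from above:

$ ansible -i inventory/your-cluster/inventory.ini all -m ping

Every node should answer with "pong"; if one doesn’t, fix SSH access for the ansible user on that machine before running the full playbook.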
Configuring Kubespray variables
Next, it’s time to configure some variables to make our cluster behave in the way we want. For that, Kubespray comes with the group_vars directory.
Settings for all nodes: all.yml
The all/all.yml contains settings that are important for the entire process. Check out the file yourself (in particular if you’re running a proxy or are using RHEL). One thing you definitely should set is the upstream DNS servers:
upstream_dns_servers:
- 8.8.8.8
- 8.8.4.4
The Google DNS servers are sane defaults; you can also use Cloudflare’s 1.1.1.1 or Quad9 if you feel more comfortable with that. It really doesn’t matter.
Container runtimes (CRI)
Container runtimes are the underlying systems responsible for running containers on Kubernetes nodes. Historically, Docker was the dominant runtime, but Kubernetes introduced the Container Runtime Interface (CRI) to standardize how runtimes integrate with Kubernetes. This led to a shift toward CRI-compliant runtimes, and today, the three main options are containerd, CRI-O, and Docker (via the cri-dockerd CRI plugin).
- containerd: Originally developed as part of Docker, containerd has become a standalone, lightweight, and efficient runtime. It is the default runtime for many Kubernetes distributions due to its close integration with Kubernetes and low resource overhead.
- CRI-O: Built specifically for Kubernetes, CRI-O is a lightweight runtime focused on simplicity and security. It supports Open Container Initiative (OCI) standards and is often used in environments prioritizing Kubernetes-native solutions, like OpenShift.
- Docker: While Docker brought containerization to the mainstream, its complexity and non-CRI compliance led to Kubernetes deprecating direct Docker support in version 1.20. Users who prefer Docker now rely on the cri-dockerd plugin or migrate to CRI-compliant alternatives.
To choose the container runtime, edit the k8s-cluster.yml:
## Container runtime
## docker for docker, crio for cri-o and containerd for containerd.
## Default: containerd
container_manager: containerd
You can then configure the container runtime in the associated YAML file, for example containerd.yml. This can be useful in case you have a private registry or if you want to change the log format. For my cluster, I left the default values.
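To give an idea of what such a change looks like, a private registry mirror could be configured in containerd.yml roughly like this. The registry host is made up, and the exact variable names can differ between Kubespray releases, so treat it as a sketch and compare with the comments in the sample file:

containerd_registries_mirrors:
  - prefix: registry.example.internal
    mirrors:
      - host: https://registry.example.internal
        capabilities: ["pull", "resolve"]
        skip_verify: false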
CoreOS Settings: coreos.yml
You can use the all/coreos.yml to turn off auto upgrades. I’d advise against it.
etcd settings: etcd.yml
You can also change the etcd deployment type. If you don’t know what you’re doing, don’t change it.
Container Networking Interface (CNI)
The Container Networking Interface (CNI) is a standardized framework for configuring networking in containerized environments. Kubernetes relies on CNI plugins to provide connectivity between Pods and manage networking policies. The choice of CNI plugin can impact performance, scalability, and available features.
Common CNI Plugins
- Calico: A versatile CNI plugin that supports a wide range of features, including network policy enforcement, IP address management, and optional support for BGP (Border Gateway Protocol). Calico can operate in pure Layer 3 mode, making it highly efficient for large-scale deployments.
- Flannel: A simple, easy-to-configure plugin that provides basic networking through overlays like VXLAN. It lacks advanced features like network policy support but is lightweight and ideal for small to medium-sized clusters.
- Weave Net: Known for its simplicity and ease of setup, Weave Net offers automatic IP allocation, encrypted connections, and limited network policy support. It’s a good choice for developers or smaller clusters. Be aware that Weaveworks apparently is bankrupt and there’s no support whatsoever for their CNI plugin.
- Cilium: A modern CNI plugin focused on security and observability. It uses eBPF (extended Berkeley Packet Filter) to enable high-performance networking, advanced policy management, and detailed telemetry.
Kubespray supports those and then some. Unless you have specific requirements, Calico is a good choice, and is what we’ll be using in this project.
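To illustrate the network policy support mentioned above, here is a minimal Kubernetes NetworkPolicy that a policy-capable CNI like Calico will enforce. The namespace and labels are hypothetical:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend             # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend    # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080

A CNI without policy support (plain Flannel, for example) will silently ignore objects like this, which is part of why Calico is a safe default.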
This leaves us with the last file we should (or at least could) edit, k8s-cluster.yml:
# Choose network plugin (cilium, calico, kube-ovn, weave or flannel. Use cni for generic cni plugin)
# Can also be set to 'cloud', which lets the cloud provider setup appropriate routing
kube_network_plugin: calico
While we’re at it, we can also configure the cluster-internal CIDRs, if needed:
# Kubernetes internal network for services, unused block of space.
kube_service_addresses: 10.233.0.0/18
# internal network. When used, it will assign IP
# addresses from this range to individual pods.
# This network must be unused in your network infrastructure!
kube_pods_subnet: 10.233.64.0/18
Dual-stack operations
You can choose to enable IPv6 for the cluster and optionally run everything in dual-stack mode. This is out of scope for this blog series.
Addons
Optionally, Kubespray can install many addons into the cluster. You can fine-tune this in the addons.yml. While this is a perfectly fine way to handle things, personally, I like to do this using Flux (see the next article for details). That’s why I have everything disabled in the file.
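For reference, disabling addons in addons.yml comes down to flipping boolean variables; a few of the toggles you’ll find in the sample file look like this (names can vary between Kubespray versions):

# Addons managed by Kubespray -- all off, since Flux will handle them later
helm_enabled: false
metrics_server_enabled: false
ingress_nginx_enabled: false
cert_manager_enabled: false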
Deploying the cluster
Once we have configured everything to our liking, we can now deploy the cluster to our nodes by running:
$ ansible-playbook -i inventory/your-cluster/inventory.ini cluster.yml \
-b -v --private-key=~/.ssh/id_rsa
Lean back. Get a coffee. Walk your cute labradoodle. This is going to take a while.
Getting the admin.conf for the cluster
Once everything is done, we’re almost ready to connect to our cluster! The last thing we need is to somehow get the admin credentials. The easiest way is to ssh into one of your nodes and copy the admin.conf from /etc/kubernetes/:
foo@node-1:$ sudo cp /etc/kubernetes/admin.conf ~
foo@node-1:$ sudo chown foo:foo admin.conf
user@workstation:$ scp foo@node-1:/home/foo/admin.conf ~/.kube/config
Edit the config such that the server points to one of your nodes:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: "" #<CENSORED>
server: https://10.100.0.1:6443 # <------- CHANGE THIS
name: cluster.local
contexts:
# ...
Finally, we can test the connection to the cluster by running:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready control-plane 1d v1.30.4
node2 Ready control-plane 1d v1.30.4
node3 Ready control-plane 1d v1.30.4
Closing remarks
Ooof, that was a long way to get a cluster up and running! Congratulations! There are a couple of considerations to keep in mind at this stage:
- Encryption at rest: Kubernetes knows the concept of secrets, data that should be securely stored (for example, API tokens or TLS certificates). Kubernetes can be configured to not store those in plaintext but to encrypt them at rest. Kubespray uses the secretbox algorithm by default:
  kube_encrypt_secret_data: true
- Cluster hardening: You can follow the Kubespray documentation on hardening your cluster.
- High Availability: Right now, your admin.conf hardcodes the IP or hostname of one node. While the control plane technically is highly available, when that node goes down, you don’t have access without editing the config. You can check out the docs on HA to change that.
- Authentication and Authorization: Right now, there’s exactly one user that can sign into the cluster, which is the super mega admin. Consider adding other authentication methods, such as OpenID Connect.
- Removing nodes: You might want to remove nodes (or replace them). To add new nodes, you can simply re-run cluster.yml. To remove them, use the remove-node.yml playbook, as shown below.
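A sketch of what removing a node could look like, assuming node3 is the one to go (double-check the flags against the Kubespray docs for your version):

$ ansible-playbook -i inventory/your-cluster/inventory.ini remove-node.yml \
    -b -v --private-key=~/.ssh/id_rsa -e node=node3

Once the playbook finishes, also remove the host from inventory.ini so that future runs don’t try to manage it.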
In the next article in the series, we’ll set up Flux to have a working GitOps pipeline!