» Post Updates
- 2022-08-20: The k3os project seems to be dying and currently I would not recommend it for new cluster installations. Personally, I've switched to Talos.
- 2021-09-17: Added a new section about securing web applications with TLS.
- 2021-07-08: Oracle now offers 4 Ampere (ARM) instances with a combined 24 GB of memory. So, multi-master clusters are now possible and resources are less of a problem.
» Introduction
To set up a completely free Kubernetes cluster, you can use the free instances of Oracle Cloud. The free tier is of course very low-end, but it's enough for some testing and playing around without always running something like minikube, kind, or k3d on your local machine.
My first attempt was to install Ubuntu with MicroK8s, but this was already too much for the free instances with 1 GB of RAM. The same was true when I tried to deploy k3s on top of the Ubuntu installation.
Then I found k3os, which even offers a Takeover Installation: you can install k3os on top of another Linux installation (which is overwritten in the process). As my Ubuntu instance was still idling around, I gave it a try. It took me a bit of time, especially getting the kubectl connection from my local machine to the k3os instance working (without certificate issues), but now it seems to work. Well, it's just a single-node cluster and you shouldn't use it for anything production-like. The memory consumption is already at about 65%, but nevertheless, it's very nice to have this for some testing.
» Takeover Installation
I have not tried installing k3os "from scratch"; instead, I used the takeover installation on an existing Ubuntu instance. First, download the installation script.
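For example (the script location is assumed from the k3os GitHub repository):

```bash
# Download the k3os installation script and make it executable
curl -fsSL https://raw.githubusercontent.com/rancher/k3os/master/install.sh -o install.sh
chmod +x install.sh
```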
Store the k3os config from below on the host you are overwriting and adjust the IP for the `--node-external-ip` argument, as well as the ssh key(s).
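A minimal config along these lines should work; the hostname, IP address, and SSH key are placeholders you have to replace:

```yaml
# config.yaml for the first (server) node
ssh_authorized_keys:
  - ssh-rsa AAAA... you@your-machine        # replace with your public SSH key(s)
hostname: k3os-master
k3os:
  k3s_args:
    - server
    - "--node-external-ip=203.0.113.10"     # replace with the public IP of this instance
```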
Before running the installation script, adjust the parameters for the config path and the URL to the ISO in the command below (check the releases and preferably use the latest k3os release version).
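The takeover command looked roughly like the following; the config path, the target device/partition, and the ISO version are assumptions you need to adapt (see the flags documented in the k3os README):

```bash
# Overwrite the running Ubuntu installation with k3os (takeover installation)
sudo ./install.sh \
  --config /home/ubuntu/config.yaml \
  --takeover \
  --no-format \
  --debug \
  /dev/sda1 \
  https://github.com/rancher/k3os/releases/download/v0.21.5-k3s2r1/k3os-amd64.iso
```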
Reboot the instance, say goodbye to Ubuntu and hello to k3os.
» Local kubeconfig
To connect from your local machine to the cluster with `kubectl`, you need the kubeconfig from your new cluster.
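On k3os, as with plain k3s, the kubeconfig lives at /etc/rancher/k3s/k3s.yaml. Copying it with scp could look like this (rancher is the default k3os user; the IP and local path are placeholders):

```bash
# Copy the kubeconfig from the node and use it locally
scp rancher@203.0.113.10:/etc/rancher/k3s/k3s.yaml ~/.kube/k3os-config
export KUBECONFIG=~/.kube/k3os-config
```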
Adjust the server IP (`server: https://127.0.0.1:6443`) of your local config to match the public IP address or hostname of the k3os installation. Afterwards, you can verify the connection with:
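For example:

```bash
kubectl get nodes -o wide
```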
» Oracle Firewall Rules
When using the free Oracle instances as I do, you need to add security rules to allow the kubectl connection.
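Roughly, an ingress rule like the following is needed in the security list of the instance's subnet (values are assumptions; restrict the source CIDR to your own IP if possible):

Direction | Source CIDR | Protocol | Dest. Port |
---|---|---|---|
Ingress | 0.0.0.0/0 | TCP | 6443 (Kubernetes API) |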
You need to add similar rules when you want to expose other services.
» Adjust the config
When you want to change the cluster configuration after the installation, you can edit `/var/lib/rancher/k3os/config.yaml`. In my case, this file was empty after the takeover installation, so you can run `sudo k3os config --dump` to show the current configuration and copy it to the given path.
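For example:

```bash
# Dump the running configuration into the k3os config file
sudo k3os config --dump | sudo tee /var/lib/rancher/k3os/config.yaml
```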
When you are happy with the modified configuration, just run:
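The exact command is not preserved here; since k3os applies config.yaml during boot, rebooting the node is a safe way to activate the changes (an assumption on my part):

```bash
sudo reboot
```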
And if you want to see the logs:
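The k3s output on k3os ends up in a log file under /var/log (path assumed):

```bash
sudo tail -f /var/log/k3s-service.log
```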
» Persistent Data Storage
» Local Path Provider
UPDATE: This is not necessary for me anymore with newer k3os versions, but I will keep this section as a reference.
Some of our pods might need to persist their data over a restart. The top-notch solution would be a distributed storage platform such as Ceph. But here we do things on a smaller scale and go with a local-path solution. This means we are using the disk storage of the k8s node. This has the drawback that a pod gets an affinity to the node it was originally created on, and you can't easily move the workload to another node. As always, there is no silver bullet, and the more extensive solutions would most probably not fit our use case, as you would need at least 3 nodes that could share the data with a distributed storage system.
First, I tried the Local Path Provisioner from Rancher, as it's also mentioned in the k3os project documentation. Unfortunately, it had issues with starting again after a node restart. My research pointed me to OpenEBS as an alternative. There is even a lightweight variant of it:
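The lightweight, local-PV-only variant can be installed from plain manifests, something along these lines (URLs are assumptions, check the OpenEBS documentation):

```bash
# Install the lightweight OpenEBS operator and its StorageClasses
kubectl apply -f https://openebs.github.io/charts/openebs-operator-lite.yaml
kubectl apply -f https://openebs.github.io/charts/openebs-lite-sc.yaml
```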
For deployments you might need to set the `storageClass` or `storageClassName` to `openebs-hostpath`. This was a bit nicer with the Rancher Local-Path Provisioner, as no change was required there. Probably, OpenEBS can also be configured as the default storage class, but I haven't looked into this yet.
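If you don't want to touch every PersistentVolumeClaim, you can presumably mark openebs-hostpath as the default StorageClass instead (untested sketch):

```bash
# Make openebs-hostpath the default StorageClass for PVCs without an explicit class
kubectl patch storageclass openebs-hostpath \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```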
» Longhorn (distributed storage)
For replicating volumes across the nodes and making the storage more fault-tolerant, you can install Longhorn as distributed storage.
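Longhorn can be installed from its release manifest (the version is an assumption, check the Longhorn releases):

```bash
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.1.2/deploy/longhorn.yaml
```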
» Adding more Nodes
Now we have our first node installed. But all the fun with Kubernetes comes when you add more nodes and get a real cluster. The setup is almost identical to our first node, except we want to add a worker node this time. This means the cluster has more resources for your workload, but when the master node, the "head" of the cluster, fails, you won't be able to control the worker nodes anymore. Again, this is a shortcoming of our toy project, which will now have 2 nodes; solving this would require at least 3 master nodes alone (update: now possible with the free Ampere instances).
Run the steps from the previous takeover installation, but this time with a slightly different config. To connect this new worker node with the master node, we need the secret `token` from the master machine. Log in to your master node with ssh and run:
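The token is in the usual k3s location:

```bash
sudo cat /var/lib/rancher/k3s/server/node-token
```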
Paste the output as `token` into your new config for the worker node. Furthermore, adjust the IP addresses: `--node-external-ip` is the public IP of your new node, and `server_url` should point to the internal network IP of the master node. Thus, both nodes should be in the same data center (or, at your home, in the same network). Don't forget to adjust the firewalls to allow connections between your nodes.
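The worker config could look like this (SSH key, token, and IP addresses are placeholders):

```yaml
# config.yaml for the worker node, joining the cluster as an agent
ssh_authorized_keys:
  - ssh-rsa AAAA... you@your-machine        # replace with your public SSH key(s)
hostname: k3os-worker-1
k3os:
  server_url: https://10.0.0.2:6443         # internal IP of the master node
  token: K10xxxx::server:xxxx               # output of the node-token command above
  k3s_args:
    - agent
    - "--node-external-ip=203.0.113.11"     # public IP of this worker node
```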
Run the install script and reboot the machine as described in the installation of the first node. After some time, your new node should have joined the cluster.
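For example, on your local machine:

```bash
kubectl get nodes
```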
I had an issue with traffic being blocked on the worker node (when it is a "pure" worker node, joining as an agent), but executing this workaround on the worker node did the trick:
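The original command is not preserved here; it was an iptables tweak on the worker node, presumably allowing the forwarded pod traffic that the default rules reject, along these lines:

```bash
# Assumed workaround: accept forwarded traffic instead of rejecting it
sudo iptables -P FORWARD ACCEPT
```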
TODO: persist the iptables change
If there are any issues and the new node does not show up, check the logs on both machines:
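Again, the k3s service log (path assumed):

```bash
sudo tail -f /var/log/k3s-service.log
```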
» Deploying a web application with TLS
It took me some time to get Grafana running correctly with TLS certificates from Let's Encrypt inside the cluster. The following sections describe what is needed to achieve this.
» Grafana
I used the Helm chart for installing Grafana:
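Roughly like this (the release name grafana is an assumption):

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install grafana grafana/grafana
```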
This will install a Grafana instance in the `default` namespace. Of course you can adjust this and all the other settings, e.g. the persistent storage, but let's not focus on those details here.
» DNS Changes
In the next section about cert-manager, we will request the certificates. To make this work with subdomains, you have to add an `A` record with a wildcard (`*`) subname to the DNS settings of your domain:
Type | Subname | Content (IP Address) |
---|---|---|
A | * | same as for the root record |
The wording might be different in your DNS dashboard.
» Cert-Manager
Next in line is cert-manager, which will help us get the Let's Encrypt certificates:
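cert-manager can be installed from its release manifest (the version is an assumption, check the cert-manager releases):

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.5.3/cert-manager.yaml
```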
Create a new file `certs.yaml` with the following content, but replace the marked parts:
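The file presumably contained an ACME issuer plus a Certificate, roughly like this (names, e-mail address, and domains are placeholders):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: you@example.com                    # replace with your e-mail address
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik                    # k3s ships Traefik as ingress controller
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default                          # must match your application's namespace
spec:
  secretName: example-com-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - grafana.example.com                     # list every subdomain explicitly
```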
Mainly, you have to replace `example.com` with your domain. It is important that the namespace is the same as for your application. As Grafana is also running in the `default` namespace, it is fine here.
As we use an HTTP solver, we are not able to get a wildcard certificate for all subdomains; instead, we have to explicitly specify every subdomain. With a DNS solver it is possible to get a wildcard certificate for all subdomains. Please check other documentation if you want to do this.
Apply the config with:
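```bash
kubectl apply -f certs.yaml
```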
It should only take a short amount of time to receive the certificates from Let’s Encrypt. You can check the status of the process with several commands:
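For example (resource names will differ in your cluster):

```bash
# Follow the certificate through the cert-manager resources
kubectl get certificate
kubectl describe certificate example-com
kubectl get certificaterequest
kubectl get order
kubectl describe challenge
```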
» Ingress
With the ingress, we configure the final part: the door to the web. Create a new `ingress.yaml`:
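A sketch for the Traefik ingress controller shipped with k3s (host name, secret name, and service name are assumptions and must match your setup):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: default
spec:
  tls:
    - hosts:
        - grafana.example.com
      secretName: example-com-tls        # secret created by the Certificate above
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana            # service created by the Grafana Helm release
                port:
                  number: 80
```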
Again, adjust the file to match your domain and settings, then apply it:
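```bash
kubectl apply -f ingress.yaml
```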
While I tested all this, Traefik did not always fully reload the config, and I had to restart the pod before everything worked. This can be done with the following command (adjust the suffix of the pod name to match yours):
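In a k3s-based cluster, Traefik runs in the kube-system namespace; deleting the pod lets it come back with a fresh configuration (the pod name suffix below is a placeholder):

```bash
kubectl -n kube-system get pods | grep traefik
kubectl -n kube-system delete pod traefik-xxxxxxxxxx-xxxxx
```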
Now, you should be able to reach your Grafana instance on grafana.example.com with a valid TLS certificate.