The following is a cloud-agnostic guide to installing a 3-node RKE cluster, installing the Rancher UI, and using them to run KubeCF on top for a quick, cheap development Cloud Foundry environment. Depending on the IaaS you are deploying on top of, you may need to modify some of the configurations where applicable (e.g. cloud_provider). Examples of these modifications for vSphere are included.
Machine Preparation
The first step in creating our 3-node RKE cluster is prepping the machines themselves. These machines can be bare-metal, on-prem virtual, or cloud instances; it doesn't really matter as long as they are capable of running a distribution of Linux with a supported container runtime (i.e. Docker). For the sake of this blog, we will be creating 3 Ubuntu virtual machines on vSphere, each with 2 CPU, 4GB RAM, and 100GB disk.
Once you have the VMs up and running with Ubuntu Server installed, it’s time to install Docker and the Rancher toolset.
Docker Installation
The following commands add the relevant apt repository and GPG key, and then install Docker:
$ sudo apt update
## Install GPG
$ sudo apt install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
## Add docker repo and install
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
$ sudo apt update && sudo apt install docker-ce
Presuming all went smoothly, you should be able to check the status and see that Docker is now running:
$ sudo service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2020-02-13 11:14:33 EST; 1 months 16 days ago
Docs: https://docs.docker.com
Main PID: 1166 (dockerd)
Tasks: 42
Memory: 315.6M
CPU: 4d 8h 32min 36.342s
CGroup: /system.slice/docker.service
└─1166 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
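RKE drives Docker on each node over SSH, so the SSH user needs to be able to talk to the Docker daemon. Assuming the SSH user is named rke (as it is later in this guide), adding it to the docker group on each node takes care of that; log out and back in for the change to take effect:
$ sudo usermod -aG docker rke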
SSH Setup
To create and orchestrate the cluster, RKE uses SSH for access to each of the machines. In this case, we are going to create a new SSH key with ssh-keygen and add it to all of the machines with ssh-copy-id. For ease of deployment, avoid adding a passphrase.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/rke/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/rke/.ssh/id_rsa.
Your public key has been saved in /home/rke/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:2qtjgSJg4kj/VCT2x9lbLytYhFLJTHbz4bX8bVVIy1A [email protected]
The key's randomart image is:
+---[RSA 2048]----+
| +o.o.+Eo.|
| o ..=. +o=.o|
| . + o + ooo.|
|oo + = o . +|
|B . .. S . o . +|
|o......o o . o |
| . .o ... o o |
| .o o . . |
| ..o. . |
+----[SHA256]-----+
The following can then be performed for each of the new machines; it will copy the SSH key you just generated to the other 2 nodes.
$ ssh-copy-id -i ~/.ssh/id_rsa.pub rke@<ip-addr>
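Before handing things over to RKE, it's worth confirming that key-based SSH works and that the remote user can reach Docker; a quick check against each node (placeholder IP as above):
$ ssh -i ~/.ssh/id_rsa rke@<ip-addr> docker version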
Rancher & K8s Tools Installation
Now that we have Docker installed and SSH configured, we need to install the tools used to create and manage the cluster. For this, all we need are rke, helm, and kubectl.
All three need to be downloaded, made executable, and added to a place in your PATH:
## Install rke cli
$ wget https://github.com/rancher/rke/releases/download/v1.0.6/rke_linux-amd64
$ chmod +x rke_linux-amd64
$ sudo mv rke_linux-amd64 /usr/local/bin/rke
## Install helm cli
$ wget https://get.helm.sh/helm-v3.1.2-linux-amd64.tar.gz
$ tar -xvf helm-v3.1.2-linux-amd64.tar.gz linux-amd64/helm --strip 1
$ sudo mv helm /usr/local/bin/helm
## Install kubectl cli
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.18.0/bin/linux/amd64/kubectl
$ chmod +x kubectl
$ sudo mv kubectl /usr/local/bin/kubectl
A quick note here about versions:
- The installation commands and instructions in this guide use Helm v3. If you reuse a jumpbox or laptop with an older version of Helm, the instructions here will not work and will error out in awkward ways. Verify your versions of helm, rke, and kubectl (a quick check is shown below).
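A quick way to confirm what is on your PATH (flags valid for the versions used in this guide):
$ rke --version
$ helm version --short
$ kubectl version --client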
Creating the RKE Cluster
At this point, we are ready to configure and provision the new K8s cluster. While there are a lot of potential options to fiddle with, rke will walk you through them and set up sane defaults to get going quickly. For our use case, we will be enabling all three roles (Control Plane, Worker, etcd) on each of our nodes.
The rke config command will start a wizard that walks you through a series of questions with the goal of generating a cluster.yml file. If you answer any of the questions incorrectly, you can still manually edit the cluster.yml file before deploying the cluster. An example of the wizard is below:
$ rke config --name cluster.yml
[+] Cluster Level SSH Private Key Path [~/.ssh/id_rsa]:
[+] Number of Hosts [1]: 3
[+] SSH Address of host (1) [none]: 10.128.54.1
[+] SSH Port of host (1) [22]:
[+] SSH Private Key Path of host (10.128.54.1) [none]:
[-] You have entered empty SSH key path, trying fetch from SSH key parameter
[+] SSH Private Key of host (10.128.54.1) [none]:
[-] You have entered empty SSH key, defaulting to cluster level SSH key: ~/.ssh/id_rsa
[+] SSH User of host (10.128.54.1) [ubuntu]: rke
[+] Is host (10.128.54.1) a Control Plane host (y/n)? [y]: y
[+] Is host (10.128.54.1) a Worker host (y/n)? [n]: y
[+] Is host (10.128.54.1) an etcd host (y/n)? [n]: y
[+] Override Hostname of host (10.128.54.1) [none]: rke1
[+] Internal IP of host (10.128.54.1) [none]:
[+] Docker socket path on host (10.128.54.1) [/var/run/docker.sock]:
[+] SSH Address of host (2) [none]:
...
[+] Network Plugin Type (flannel, calico, weave, canal) [canal]:
[+] Authentication Strategy [x509]:
[+] Authorization Mode (rbac, none) [rbac]:
[+] Kubernetes Docker image [rancher/hyperkube:v1.17.2-rancher1]:
[+] Cluster domain [cluster.local]:
[+] Service Cluster IP Range [10.43.0.0/16]:
[+] Enable PodSecurityPolicy [n]:
[+] Cluster Network CIDR [10.42.0.0/16]:
[+] Cluster DNS Service IP [10.43.0.10]:
[+] Add addon manifest URLs or YAML files [no]:
By running the interactive command above and answering the questions regarding our machines, network config, and K8s options, rke has generated a lengthy cluster.yml file that is the main source of truth for the deployment.
You may want to modify the cluster.yml file before deployment to add cloud_provider options for your underlying IaaS, or to change any of the answers you gave in the previous step. An example cloud_provider config for vSphere is shown below – we will do a deeper dive on the vSphere cloud provider in another post if you run into issues or have questions there. For other IaaSs, please refer to the Rancher documentation here.
cloud_provider:
  name: vsphere
  vsphereCloudProvider:
    global:
      insecure-flag: false
    virtual_center:
      vsphere.lab.example.com:
        user: "vsphere-user"
        password: "vsphere-password"
        port: 443
        datacenters: /Lab-Datacenter
    workspace:
      server: vsphere.lab.example.com
      folder: /Lab-Datacenter/vm/k8s-demo-lab/vms
      default-datastore: /Lab-Datacenter/datastore/Datastore-1
      datacenter: /Lab-Datacenter
In addition to adding the cloud_provider section above for your specific IaaS, you should also add the section below under the services key so that it looks like the following. This allows the cluster to sign certificate requests, which the KubeCF deployment requires for our dev environment.
services:
  kube-controller:
    extra_args:
      cluster-signing-cert-file: /etc/kubernetes/ssl/kube-ca.pem
      cluster-signing-key-file: /etc/kubernetes/ssl/kube-ca-key.pem
Now that we have our cluster.yml prepared and ready to deploy, it can be rolled out using rke up:
$ rke up --config cluster.yml
INFO[0000] Running RKE version: v1.0.4
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [10.128.54.0]
INFO[0002] [network] No hosts added existing cluster, skipping port check
INFO[0002] [certificates] Deploying kubernetes certificates to Cluster nodes
INFO[0002] Checking if container [cert-deployer] is running on host [10.128.54.0], try #1
INFO[0003] Image [rancher/rke-tools:v0.1.52] exists on host [10.128.54.0]
INFO[0010] Starting container [cert-deployer] on host [10.128.54.0], try #1
INFO[0025] Checking if container [cert-deployer] is running on host [10.128.54.0], try #1
INFO[0031] Checking if container [cert-deployer] is running on host [10.128.54.0], try #1
INFO[0031] Removing container [cert-deployer] on host [10.128.54.0], try #1
INFO[0031] [reconcile] Rebuilding and updating local kube config
INFO[0031] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
INFO[0031] [reconcile] host [10.128.54.0] is active master on the cluster
INFO[0031] [certificates] Successfully deployed kubernetes certificates to Cluster nodes
INFO[0031] [reconcile] Reconciling cluster state
INFO[0031] [reconcile] Check etcd hosts to be deleted
INFO[0031] [reconcile] Check etcd hosts to be added
INFO[0031] [reconcile] Rebuilding and updating local kube config
INFO[0031] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
INFO[0031] [reconcile] host [10.128.54.0] is active master on the cluster
INFO[0031] [reconcile] Reconciled cluster state successfully
INFO[0031] Pre-pulling kubernetes images
...
INFO[0038] [addons] Saving ConfigMap for addon rke-ingress-controller to Kubernetes
INFO[0038] [addons] Successfully saved ConfigMap for addon rke-ingress-controller to Kubernetes
INFO[0038] [addons] Executing deploy job rke-ingress-controller
INFO[0038] [ingress] ingress controller nginx deployed successfully
INFO[0038] [addons] Setting up user addons
INFO[0038] [addons] no user addons defined
INFO[0038] Finished building Kubernetes cluster successfully
At this point we should have a cluster up and running, and a few new files will have been generated – cluster.rkestate and kube_config_cluster.yml. In order to perform future updates against the cluster you need to preserve the cluster.rkestate file, otherwise rke won't be able to properly interact with the cluster.
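A simple way to do that is to copy both generated files somewhere that gets backed up (the destination directory here is just an example):
$ mkdir -p ~/rke-backups
$ cp cluster.rkestate kube_config_cluster.yml ~/rke-backups/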
We can run some basic commands to ensure that the new cluster is up and running and then move on to installing the Rancher UI:
$ export KUBECONFIG=$(pwd)/kube_config_cluster.yml
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
rancher Ready controlplane,etcd,worker 21d v1.17.2
$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
canal-p7jgr 2/2 Running 2 21d
coredns-7c5566588d-hrhtr 1/1 Running 1 21d
coredns-autoscaler-65bfc8d47d-fz285 1/1 Running 1 21d
metrics-server-6b55c64f86-mq99l 1/1 Running 1 21d
rke-coredns-addon-deploy-job-7vgcd 0/1 Completed 0 21d
rke-ingress-controller-deploy-job-97tln 0/1 Completed 0 21d
rke-metrics-addon-deploy-job-lk4qk 0/1 Completed 0 21d
rke-network-plugin-deploy-job-vlhvq 0/1 Completed 0 21d
Assuming everything looks similar you should be ready to proceed.
Installing the Rancher UI
One prerequisite to installing the Rancher UI is cert-manager, presuming you are not bringing your own certs or using Let's Encrypt. Thankfully, the installation is just one command:
$ kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.yaml
And to check that it is working, make sure all the pods come up ok:
$ kubectl get pods --namespace cert-manager
NAME READY STATUS RESTARTS AGE
cert-manager-64b6c865d9-kss6c 1/1 Running 0 21d
cert-manager-cainjector-bfcf448b8-q98q6 1/1 Running 0 21d
cert-manager-webhook-7f5bf9cbdf-d66k8 1/1 Running 0 21d
Now we can install Rancher via helm. The command below assumes the rancher-latest chart repository has been added and the cattle-system namespace exists; if not, they can be set up first (a quick sketch, with the repository URL from the Rancher docs):
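$ helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
$ helm repo update
$ kubectl create namespace cattle-system
With those in place, install Rancher: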
$ helm install rancher rancher-latest/rancher \
--namespace cattle-system \
--set hostname=rancher.lab.example.com
And wait for the deployment to roll out:
$ kubectl -n cattle-system rollout status deploy/rancher
Waiting for deployment "rancher" rollout to finish: 0 of 3 updated replicas are available...
deployment "rancher" successfully rolled out
Presuming all went according to plan (and you configured DNS accordingly to point at your nodes) – the Rancher UI should now be available at the domain you configured.
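If you just want to test quickly without real DNS, pointing the Rancher hostname at one of the node IPs in your hosts file is enough (hostname and IP here are the example values used earlier):
$ echo "10.128.54.1 rancher.lab.example.com" | sudo tee -a /etc/hosts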
After setting up the admin account, you should be able to sign in and view your new cluster.
Installing KubeCF
KubeCF is currently deployed in two parts – the cf-operator and the kubecf deployment, which leverages the cf-operator to turn a traditional manifest into K8s-native spec.
Another quick note about versions:
- The versions of the cf-operator and KubeCF are very important. As of this writing there is not a matrix of operator compatibility, but the examples provided below have been tested to work. The release notes for KubeCF reference the version of the cf-operator that particular version was tested with. For example, the KubeCF release we are deploying can be found here and lists cf-operator v3.3.0 under its dependencies.
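If you later want to confirm which chart versions actually got installed, helm can list the releases per namespace (namespace names match the ones used below):
$ helm list -n cf-operator
$ helm list -n kubecf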
So let's start by deploying the cf-operator via helm:
$ kubectl create namespace cf-operator
$ helm install cf-operator \
--namespace cf-operator \
--set "global.operator.watchNamespace=kubecf" \
https://s3.amazonaws.com/cf-operators/release/helm-charts/cf-operator-3.3.0%2B0.gf32b521e.tgz
After deploying, there should be two pods created in the cf-operator namespace. We should check to make sure they are both up and ready (STATUS=Running) before deploying KubeCF:
$ kubectl -n cf-operator get pods
NAME READY STATUS RESTARTS AGE
cf-operator-69848766f6-lw82r 1/1 Running 0 29s
cf-operator-quarks-job-5bb6fc7bd6-qlg8l 1/1 Running 0 29s
Now it's time to deploy KubeCF. For this environment we are going to deploy with the defaults, with the exception of using Eirini for application workloads. For more information regarding the different deployment options and features of KubeCF, check out our previous blog here.
$ helm install kubecf \
--namespace kubecf \
--set system_domain=system.kubecf.example.com \
--set features.eirini.enabled=true \
https://github.com/cloudfoundry-incubator/kubecf/releases/download/v1.0.1/kubecf-v1.0.1.tgz
After running the helm deploy, it'll take a few minutes to start spinning up CF pods in the kubecf namespace. We can then watch the pods come up and wait for them all to reach a ready status – on average this takes between 20 and 45 minutes, depending on the options you selected and the specs of the cluster you are deploying to. You may see some of the pods fail and restart a few times as the cluster comes up, since they are waiting for different dependencies to become available.
$ watch kubectl get po -n kubecf
NAME READY STATUS RESTARTS AGE
ig-kubecf-51d0cf09745042ad-l7xnb 0/20 Init:4/37 0 3m11s
kubecf-database-0 2/2 Running 0 3m24s
While this is deploying, check out the IP address associated with the kubecf-router-public load balancer and add a wildcard DNS record for the system_domain you specified above, as well as any additional application domains:
$ kubectl get svc -n kubecf | grep -i load
kubecf-cc-uploader ClusterIP 10.43.196.54 <none> 9090/TCP,9091/TCP
kubecf-router-public LoadBalancer 10.43.212.247 10.128.54.241 80:32019/TCP,443:32255/TCP
kubecf-ssh-proxy-public LoadBalancer 10.43.174.207 10.128.54.240 2222:31768/TCP
kubecf-tcp-router-public LoadBalancer 10.43.167.176 10.128.54.242 80:30897/TCP,20000:30896/TCP,20001
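Using the example output above, the wildcard record would look roughly like this (illustrative values only):
*.system.kubecf.example.com -> 10.128.54.241 (kubecf-router-public, HTTP/S traffic)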
The final deployed state should look like the following:
$ kubectl get po -n kubecf
NAME READY STATUS RESTARTS AGE
kubecf-adapter-0 4/4 Running 0 24m
kubecf-api-0 15/15 Running 1 24m
kubecf-bits-0 6/6 Running 0 23m
kubecf-bosh-dns-59cd464989-bh2dp 1/1 Running 0 24m
kubecf-bosh-dns-59cd464989-mgw7z 1/1 Running 0 24m
kubecf-cc-worker-0 4/4 Running 0 23m
kubecf-credhub-0 5/6 Running 0 24m
kubecf-database-0 2/2 Running 0 36m
kubecf-diego-api-0 6/6 Running 2 24m
kubecf-doppler-0 9/9 Running 0 24m
kubecf-eirini-0 9/9 Running 0 23m
kubecf-log-api-0 7/7 Running 0 23m
kubecf-nats-0 4/4 Running 0 24m
kubecf-router-0 5/5 Running 0 23m
kubecf-routing-api-0 4/4 Running 0 23m
kubecf-scheduler-0 8/8 Running 0 23m
kubecf-singleton-blobstore-0 6/6 Running 0 24m
kubecf-tcp-router-0 5/5 Running 0 24m
kubecf-uaa-0 7/7 Running 6 24m
Logging Into CF
Assuming you have the CF CLI already installed (see this if not), you can target and authenticate to the Cloud Foundry deployment as seen below, remembering to update the system_domain to the one you deployed with:
$ cf api --skip-ssl-validation "https://api.<system_domain>"
$ admin_pass=$(kubectl get secret \
--namespace kubecf kubecf.var-cf-admin-password \
-o jsonpath='{.data.password}' \
| base64 --decode)
$ cf auth admin "${admin_pass}"
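As a quick sanity check that the login worked, list the orgs; a freshly deployed foundation will include at least the system org:
$ cf orgs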
Pushing a Test Application
Now that our new foundation is up and running, it's time to test it by adding a space and pushing an application. Let's start by creating the system space within the system org.
$ cf target -o system
$ cf create-space system
$ cf target -s system
The app we will be deploying is called cf-env, a simple application used for debugging and testing; it displays its running environment and HTTP request headers.
To deploy it, clone the repo and push it to the new foundation:
$ git clone git@github.com:cloudfoundry-community/cf-env.git
$ cd cf-env
$ cf push -n test
The first deployment usually takes a couple of minutes to stage and start running, but once the app comes up you should be able to visit http://test.<system_domain> and see the app report its environment and request headers.
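You can also hit the route from the command line (hostname as pushed above with -n test):
$ curl http://test.<system_domain>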
Running Smoke Tests
KubeCF uses cf-deployment under the hood as the blueprint for deploying Cloud Foundry. Inside of cf-deployment you can run "smoke tests", a non-destructive validation that your Cloud Foundry deployment is in a happy state.
To run the smoke tests at any time, use a simple kubectl patch command to trigger them:
$ kubectl patch qjob kubecf-smoke-tests --namespace kubecf --type merge --patch '{ "spec": { "trigger": { "strategy": "now" } } }'
In v4 of the cf-operator, replace kubecf-smoke-tests with smoke-tests.
This will create a new job and pod, each prefixed with kubecf-smoke-tests-*. A few containers will spin up in the pod; if you tail the logs on the smoke-tests-smoke-tests container you will see the test output:
$ kubectl logs kubecf-smoke-tests-4078f266ae3dff68-rdhz4 -c smoke-tests-smoke-tests -n kubecf -f
Running smoke tests...
Running binaries smoke/isolation_segments/isolation_segments.test
smoke/logging/logging.test
smoke/runtime/runtime.test
[1585940920] CF-Isolation-Segment-Smoke-Tests - 4 specs - 4 nodes SSSS SUCCESS! 29.974196268s
[1585940920] CF-Logging-Smoke-Tests - 2 specs - 4 nodes S• SUCCESS! 1m56.090729823s
[1585940920] CF-Runtime-Smoke-Tests - 2 specs - 4 nodes S• SUCCESS! 2m37.907767486s
Ginkgo ran 3 suites in 5m4.100902481s
Test Suite Passed
Adding KubeCF Namespaces to a Rancher Project
Now that the foundation is happily running, it’s time to add it to a Rancher project for ease of visibility and management. Rancher projects allow you to group a collection of namespaces together within the Rancher UI and also allows for setting of quotas and sharing of secrets across all the underlying namespaces.
From the cluster dashboard, click on Projects/Namespaces.
As you can see from the Projects/Namespaces screen, the three KubeCF namespaces (kubecf, kubecf-eirini, and cf-operator) do not currently belong to a project. Let's fix that, starting by selecting Add Project.
For this deployment we are just going to fill in a name, leave all of the other options as default, and click Create.
Then, from the Projects/Namespaces screen, we are going to select the three KubeCF namespaces and click Move.
Select the new project you just created and confirm by selecting Move. At this point, the namespaces are added to your new project and their resources can be easily accessed from the UI.
At this point, your new foundation on top of RKE is ready to roll.