Konstantin Kiess, Author at Stark & Wayne

Quake Speedrun Recap Level 2: Argo Applications and Workflows

Konstantin Kiess — Tue, 11 Feb 2020 15:35:00 +0000

Now that we have everything deployed, let's explore and understand Argo and what it can do for you.

Understanding Applications:

An Argo Application is a Kubernetes Custom Resource Definition. It is used by Argo to provide a wrapper around a few other tools it is able to use. We'll start with the common settings for all Application Types (Helm, Kustomize, JSonnet/kSonnet, Kube-Specs).

apiVersion: argoproj.io/v1alpha1kind: Application
metadata:
  name: quake-external-dns
  # You'll usually want to add your resources to the argocd namespace.
  namespace: quake-system
  # Add a this finalizer ONLY if you want these to cascade delete.
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  syncPolicy:
    automated:
      # Specifies if resources should be pruned during auto-syncing ( false by default ).
      prune: true
      # Specifies if partial app sync should be executed when resources are changed only in target Kubernetes cluster and no git change detected ( false by default ).
      selfHeal: true
  # Ignore differences at the specified json pointers
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas
  # DEPLOY ON SELF
  destination:
    server: https://kubernetes.default.svc
    namespace: quake-system
  # The project the application belongs to.
  project: default

The Metadata field contains the actual Application Name and the Namespace where the Application CRD Resource (not what it deploys) lives. The reason is that ArgoCD can deploy Applications on remote clusters as well via the spec.destination, just continue reading :).

It also contains a finalizer which tells Kubernetes Garbage Collection to clean up the Resources that got deployed by the Argo Application Controller via the App-CRD.
The bulk of settings, as with all Kubernetes Resources, is within the "spec" field. The SyncPolicy allows you to control how the App-CRD will handle deploying it's resources.

To illustrate, I deactivated autosync and updated the "dex.name" field via the Parameters Tab in AppDetails ("dex-server" -> "dex-server-argo") value for ArgoCDs Dex:

The Application is now marked "OutOfSync" and the UI shows us exactly which resources are affected. If we click on the "Sync"-Button the UI will open the Sync Sidebar to let us choose exactly what should be done. This is very useful if you're still finding the right Config or want to know where your changes would propagate.

ArgoCD will also tell you whether the change can be done "in-place" or whether the resource needs to be "pruned" (deleted) first.

Sometimes Kubernetes Resources will get extra fields set after you run "kubectl apply." This could potentially confuse ArgoCD and put you in a sync loop. Prime examples would be active "Mutating Webhooks" within your Kube-Cluster or an Autoscaling Solution. To avoid this, ArgoCD lets you specify the "ignoreDifferences" block. Our example contains a jsonPointer to "spec.replicas" to avoid resetting the amount of replicas in case they were changed manually or via some other mechanism within Kubernetes.

Some distributed deployments need to be done in sequence (if they cannot deal with eventual consistency). ArgoCD has you covered. If your stack/app requires a particular sequence to get up and running, use Waves. By annotating your resources, you tell ArgoCD how it should do things. A sync is done in three phases, each phase has five waves.

When Argo CD starts a sync, it orders the resources in the following precedence:
The phase
The wave they are in (lower values first)
By kind (e.g. namespaces first)
By name

Next we see the mentioned destination field. You can register Remote Clusters (meaning not the one where ArgoCD itself is running) with ArgoCD. You can do so by providing a labeled Secret into the ArgoCD Namespace (quake-system in our example).

As indicated by the "project" field, ArgoCD does allow for RBAC and Isolation. Projects also let you define "Sync-Windows" to limit when execution can happen. Maybe you're a night-owl and want your syncs to happen at night, or you do not trust unattended auto-updates. In any case, I don't judge and neither does Argo.

There is much more to explore, but only so many things I can fit in a reasonably long blog-post. ArgoCD Docs provide separate sections for different perspectives: Argo-Ops, Argo-User.

Understanding Workflows:

Workflows is what powers the CI Part in Argo. ArgoCD handles continuous deployments, and workflows complements it. It comes with its own UI, your can access it by port-forwarding to the service

"kubectl -n quake-system port-forward svc/quake-argo-ui 8080:80"

So, what can we do with Workflows? Well, pretty much everything you want. Workflows are somewhat like Init-Containers: they start, they do something, they die. But unlike Init-Containers, they are not scoped to run when one specific Pod starts. They also have functionality to "pass" along build artifacts or parameters, they can run in sequence, parallel, or branch out and do both.

So maybe you want to actually build your Software and package it in a Docker-Image for your Deployment just in time before deploying? Have some tests you want to run against a system you just deployed? Alert on success or failure of something? Create a backup? Initialize your Deployment with Data?

Let's try adding some config to ArgoCD via a Workflow and learn some more about it along the way. Currently, our Application.yml's are stored locally in:

"${REPO_ROOT}/state/argo/${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}"

That is a bit unfortunate as we are the only ones able to access them. But they contain secrets, so just putting them in a publicly accessible Git-Repo is a bad idea. Create a private Repo and push the contents of the above directory, here is a screenshot of mine:

Contents of App-Repo

Our QUAKE terraform already prepared SSH Deploy keys for us:

cd ${REPO_ROOT}/terraform/
terraform output | grep GIT
QUAKE_CLUSTER_GITOPS_SSH_KEY = -----BEGIN RSA PRIVATE KEY-----
QUAKE_CLUSTER_GITOPS_SSH_PUB = ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1vMubU/mZTpNI2BYbC+jG6I1eLerwtPSIZ00E0KokzfLOOjqmxqVwg2qVFhRQ4beAj4Mpg1/F7FO4rOZs0weStWt0xxHPqN81MiPKF0CZZYWG3lnLOsw+ivfJ45wrZutVCE71bVfonqrITKVYY6S2y7K5ic8JIOFMc1JGLweiKPoEfHH74VoG3x9ffIo+CXr06wZTzWePU39PdRzfi42xXyw9e3A2L7bQ9/2VpFylkUvNbiSxAKfU+RiBtZZsBhG/aV5a1GtTo2wnaYfZ3ty/GEwitR9IpfwsUNr1l/2aaRHaCVqACoXGThhhtwPlBL3Rnvl9Ivf1vOIhM6r1r7+l

We need to add the Public-Key as a Deploy-Key to our Git-Repo:

Once your Repo is configured we can use:

quake -w -u "git@github.com:username/reponame.git"

To create the workflow.yml, the quake-CLI will let you know where it was saved. Now, just apply the workload:

kubectl -n quake-system apply -f ${REPO_ROOT}/workflow/my-argo-workflow.yml

You can now see two things happen. First the Argo Workflow UI should look like this:

And second, once the workflow executed, you will have registered a private Git-Repo with ArgoCD:

Workflow Spec:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: config-argo-workflow
  generateName: config-argo-steps-
spec:
  serviceAccountName: argo
  entrypoint: configure
  templates:
  - name: configure
    # Instead of just running a container
    # This template has a sequence of steps
    steps:
    # BLOCK A RUN IN PARALLEL
    - - name: write-secret           # double dash => run after previous step
        template: write-file
        arguments:
          parameters:
          - name: type
            value: secret
          - name: value
            value: |
              apiVersion: v1
              kind: Secret
              metadata:
                name: my-gitops-secret
                namespace: quake-system
              type: Opaque
              data:
                sshPrivateKey: SED_PLACE_HOLDER_PRIVATE_KEY_BASE64
      - name: write-patch           # single dash => run in parallel with previous step
        template: write-file
        arguments:
          parameters:
          - name: type
            value: patch
          - name: value
            value: |
              data:
                repositories: |
                  - url: SED_PLACE_HOLDER_REPO_GIT_URL
                    name: argo-apps
                    type: git
                    sshPrivateKeySecret:
                      name: my-gitops-secret
                      key: sshPrivateKey
    # BLOCK A RUN IN PARALLEL END
    # BLOCK B RUN IN PARALLEL
    - - name: create-secret
        template: kubectl-commands
        arguments:
          parameters:
          - name: action
            value: apply
          artifacts:
          - name: my-files
            from: "{{steps.write-secret.outputs.artifacts.my-file-output}}"
      - name: patch-argocd-cm
        template: kubectl-commands
        arguments:
          parameters:
          - name: action
            value: patch
          artifacts:
          - name: my-files
            from: "{{steps.write-patch.outputs.artifacts.my-file-output}}"
    # BLOCK B RUN IN PARALLEL END
    # BLOCK C TEMPLATES FOR ABOVE STEPS
  - name: kubectl-commands
    inputs:
      parameters:
      - name: action
      artifacts:
      - name: my-files
        path: /input.yml
    container:
      image: bitnami/kubectl:latest
      command: [bash, -c]
      args:
      - |
          if [[ "{{inputs.parameters.action}}" == "apply" ]]; then
            kubectl -n quake-system apply -f /input.yml
          elif [[ "{{inputs.parameters.action}}" == "patch" ]]; then
            kubectl -n quake-system patch cm/argocd-cm --patch "$(cat input.yml)"
          fi
  - name: write-file
    inputs:
      parameters:
      - name: value
      - name: type
    outputs:
      artifacts:
      - name: my-file-output
        path: /tmp/output.yml
    container:
      image: alpine:latest
      command: [sh, -c]
      args:
      - |
          if [ "{{inputs.parameters.type}}" == "secret" ]; then
            echo "creating secret"
          elif  [ "{{inputs.parameters.type}}" == "patch" ]; then
            echo "creating patch"
          fi
          echo "{{inputs.parameters.value}}" | tee /tmp/output.yml
    # BLOCK C TEMPLATES FOR ABOVE STEPS END

The above workflow is a very basic example and should give you a good idea about their structure. Workflows can do more than that though. You can implement conditionals, loops, and even recursion.

But, why have we done this? Yes, we could've just configured our Repo and created the Secret via the Helm-Chart of ArgoCD. But that wouldn't give you an example of a real life workflow, would it?

Having ArgoCD hooked up to a private repository gives you a few advantages. You can now create an App of Apps and keep Application.yml's safely and versioned in a Git Repo. Instead of running commands, we just commit and push to our Repo and ArgoCD picks up the changes for us and we can easily roll bock to a previous state by either changing the commit reference in the App of Apps or reverting the commit in Git. Try re-configuring your ArgoCD App for ArgoCD :).

Usually, there are a few ways to achieve a result: Your Platform, your choice. If you're interested in learning about Events and Rollouts, the second Recap for Argo is soon to be released.

Next week we will continue with deploying KubeCF on top of our current Platform to learn about what Cloud Foundry can do for you.

The post Quake Speedrun Recap Level 2: Argo Applications and Workflows appeared first on Stark & Wayne.

Quake Speedrun Level 2: Argo

Konstantin Kiess — Tue, 11 Feb 2020 15:30:00 +0000

Welcome to part two of our QUAKE-Speedrun. This time we will deploy Argo and use it to automate deployments of our additional components on our Kubernetes Cluster to start building out our Platform.

First, let's take a look at the Argo Project Modules to understand what their Job is.

Argo Project:

ArgoCD

“Declarative Continuous Delivery for Kubernetes”

ArgoCD deploys Helm/Kube-Specs/KSonnet/Kustomize units via Argos CustomResourceDefinition Application. It also supports pulling your Applications YAMLs from a Git Repo, comes with an amazing UI capable of dependency lookup (e.g. Deployment > ReplicaSet/StatefulSet/… > Pods), Log & Kubernetes Event streaming included.

Argo-Rollouts

“Argo Rollouts introduces a new custom resource called a Rollout to provide additional deployment strategies such as Blue Green and Canary to Kubernetes. The Rollout custom resource provides feature parity with the deployment resource with additional deployment strategies. Check out the Deployment Concepts for more information on the various deployment strategies.”

Let’s just say we want those things. Who doesn’t like to be in control? Jokes aside, depending on your Workloads, native Blue-Green and Canary are really useful for no downtime deployments.

Argo-Workflows

“Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).”

If you're familiar with Concourse, think of Jobs/Tasks. In Jenkins terms, it's Stages/Steps. Input/Output artifacts included. Essentially, it’s a way of defining a chain of containers which are run in sequence and are able to pass on/transfer build-artifacts.

Argo-Events

“Argo Events is an event-based dependency manager for Kubernetes which helps you define multiple dependencies from a variety of event sources like webhook, s3, schedules, streams etc. and trigger Kubernetes objects after successful event dependencies resolution.”

Let's deploy Argo

Now that we know what it is supposed to do, let's get it working.

quake --loadout

kubectl create namespace quake-system &> /dev/nulllocal INDEX=0
while true; do
  REPO_URL=$(yq r ${REPO_ROOT}/helm-templates/helm-sources.yml "QUAKE_HELM_SOURCES.${INDEX}.url")
  REPO_NAME=$(yq r ${REPO_ROOT}/helm-templates/helm-sources.yml "QUAKE_HELM_SOURCES.${INDEX}.name")
  if [[ "${REPO_NAME}" == "null" ]]; then
    break
  fi
  helm repo add "${REPO_NAME}" "${REPO_URL}"
  ((INDEX+=1))
done
helm repo update
local INDEX=0
while true; do
  local INSTALL_NAME=$(yq r ${REPO_ROOT}/helm-templates/helm-sources.yml "QUAKE_HELM_INSTALL_CHARTS.${INDEX}.name")
  local CHART_VERSION=$(yq r ${REPO_ROOT}/helm-templates/helm-sources.yml "QUAKE_HELM_INSTALL_CHARTS.${INDEX}.version")
  if [[ "${INSTALL_NAME}" == "null" ]]; then
    break
  fi
  local CHART_NAME=$(echo ${INSTALL_NAME} | sed 's#.*/##')
  local HELM_STATE=${REPO_ROOT}/state/helm/${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}
  mkdir -p ${HELM_STATE} &> /dev/null
  yq r \
    <( kops toolbox template \
       --template helm-templates/helm-values-template.yml \
       --values ${REPO_ROOT}/state/kops/vars-${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}.yml \
     ) \
    ${CHART} ${HELM_STATE}/${CHART}-values.yml
  helm upgrade --wait --install "quake-${CHART_NAME}" "${INSTALL_NAME}" \
    --values ${HELM_STATE}/${CHART}-values.yml \
    --version ${CHART_VERSION} \
    --namespace quake-system
  ((INDEX+=1))
done

First, we parse our helm-sources.yml that contains the info about the Charts, their desired Version, and the respective Helm-Repo where they are hosted.
Then, we interpolate the helm-values-template with the data that
Terraform outputted and create a values.yml containing our merged config items for each Chart we're going to deploy.

Finally, we use `helm upgrade --wait --install` as an idempotent
way of deploying/upgrading our chart with the our fresh out the templating values.

You can now inspect your quake-system namespace by running:

kubectl get all -n quake-system

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/quake-argo-cd-argocd-application-controller ClusterIP 100.68.155.246  8082/TCP 24h
service/quake-argo-cd-argocd-dex-server ClusterIP 100.65.68.36  5556/TCP,5557/TCP 24h
service/quake-argo-cd-argocd-redis ClusterIP 100.69.164.236  6379/TCP 24h
service/quake-argo-cd-argocd-repo-server ClusterIP 100.68.32.95  8081/TCP 24h
service/quake-argo-cd-argocd-server LoadBalancer 100.68.217.5 ...elb.amazonaws.com 80:32606/TCP,443:30637/TCP 24h
service/quake-argo-ui ClusterIP 100.66.143.38  80/TCP 24hservice/quake-external-dns ClusterIP 100.67.57.61  7979/TCP 24h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/quake-argo-cd-argocd-application-controller   1/1 1 1 24h
deployment.apps/quake-argo-cd-argocd-dex-server               1/1 1 1 24h
deployment.apps/quake-argo-cd-argocd-redis                    1/1 1 1 24h
deployment.apps/quake-argo-cd-argocd-repo-server              1/1 1 1 24h
deployment.apps/quake-argo-cd-argocd-server                   1/1 1 1 24h
deployment.apps/quake-argo-events-gateway-controller          1/1 1 1 24h
deployment.apps/quake-argo-events-sensor-controller           1/1 1 1 24h
deployment.apps/quake-argo-ui                                 1/1 1 1 24h
deployment.apps/quake-argo-workflow-controller                1/1 1 1 24h
deployment.apps/quake-aws-alb-ingress-controller              1/1 1 1 24h
deployment.apps/quake-external-dns                            1/1 1 1 24h
NAME DESIRED   CURRENT   READY   AGE
replicaset.apps/quake-argo-cd-argocd-application-controller-6985bf866d 0 0 0 24h
replicaset.apps/quake-argo-cd-argocd-application-controller-85bfd47868 1 1 1 23h
replicaset.apps/quake-argo-cd-argocd-dex-server-6d77949f5d 1 1 1 23h
replicaset.apps/quake-argo-cd-argocd-dex-server-6f969b7bb 0 0 0 24h
replicaset.apps/quake-argo-cd-argocd-redis-5548685fbb 1 1 1 23h
replicaset.apps/quake-argo-cd-argocd-redis-db7bdbf86 0 0 0 24h
replicaset.apps/quake-argo-cd-argocd-repo-server-67bbfb4f5f 1 1 1 23h
replicaset.apps/quake-argo-cd-argocd-repo-server-bb4b8bf79 0 0 0 24h
replicaset.apps/quake-argo-cd-argocd-server-55c6798674 0 0 0 24h
replicaset.apps/quake-argo-cd-argocd-server-576d6ff898 1 1 1 23h
replicaset.apps/quake-argo-events-gateway-controller-65846fc954 0 0 0 24h
replicaset.apps/quake-argo-events-gateway-controller-776fff4c48 1 1 1 23h
replicaset.apps/quake-argo-events-sensor-controller-7954475947 0 0 0 24h
replicaset.apps/quake-argo-events-sensor-controller-84879ffb4f 1 1 1 23h
replicaset.apps/quake-argo-ui-6798554bd 1 1 1 24h
replicaset.apps/quake-argo-workflow-controller-6bdbb96f86 1 1 1 24h
replicaset.apps/quake-aws-alb-ingress-controller-6768b6c6bd 0 0 0 24h
replicaset.apps/quake-aws-alb-ingress-controller-78474fbdfd 1 1 1 23h
replicaset.apps/quake-external-dns-7c8cc75876 0 0 0 24h
replicaset.apps/quake-external-dns-85cbdcf957 1 1 1 23h

Now that we have Argo deployed, let's register our newly deployed components as Argo Applications by running:

quake --template

local CHARTINDEX=0
while CHART=$(yq r ${REPO_ROOT}/helm-templates/helm-sources.yml QUAKE_HELM_INSTALL_CHARTS.$CHARTINDEX.name); do
  if [[ "${CHART}" == "null" ]]; then
    break
  fi
  CHART_NAME=$(echo "${CHART}" | sed 's#./##')
  CHART_REPO_NAME=$(echo "${CHART}" | sed 's#/.##')
  REPO_INDEX=0
  until [[ "$(yq r ${REPO_ROOT}/helm-templates/helm-sources.yml QUAKE_HELM_SOURCES.${REPO_INDEX}.name)" == "${CHART_REPO_NAME}" ]]; do
    ((REPO_INDEX+=1))
  done
  mkdir -p ${REPO_ROOT}/state/argo/${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}
  kops toolbox template \
    --fail-on-missing=false \
    --template ${REPO_ROOT}/argo-apps/helm-app-template.yml \
    --set APP_NAME="quake-${CHART_NAME}" \
    --set APP_NAMESPACE="quake-system" \
    --set APP_HELM_REPO_URL="$( yq r ${REPO_ROOT}/helm-templates/helm-sources.yml QUAKE_HELM_SOURCES.${REPO_INDEX}.url)" \
    --set APP_HELM_CHART_VERSION="$( yq r ${REPO_ROOT}/helm-templates/helm-sources.yml QUAKE_HELM_INSTALL_CHARTS.${CHARTINDEX}.version )" \
    --set APP_HELM_CHART_NAME="${CHART_NAME}" \
    --set APP_VALUES_INLINE_YAML="$( cat ${REPO_ROOT}/state/helm/${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}/${CHART_NAME}-values.yml)"
  > ${REPO_ROOT}/state/argo/${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}/${CHART_NAME}-application.yml
  kubectl apply -f ${REPO_ROOT}/state/argo/${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}/${CHART_NAME}-application.yml -n quake-system
  ((CHARTINDEX+=1))
done

This will create the manifests and register the helm charts we just deployed with Argo, thus making it aware about the deployed Kubernetes Resources it is now tasked with managing.

You can find your Application manifests under state/argo/${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}

Finally, the script uses kubectl apply, because Argo Applications are a Custom Resource Definition and thus can be manipulated via kubectl.

It may take a while for the Argo LoadBalancer DNS to propagate, but you should usually be able to access your Argo Instance @ argo.${QUAKE_CLUSTER_NAME}.${QUAKE_TLD} after around 15mins. If not specified, the default admin password for Argo is the ${Argo_Server_Server_Pod} name. But we actually did generate and specify a password.

cd ${REPO_ROOT}/terraform
terraform output QUAKE_ARGO_PASSWORD

To finish this post off, let us login and look at what we deployed.

Application Overview

Application View

Application Details

Pod Details

As you can see, Argo doubles as a quite resourceful Kubernetes UI. You can access most of the Information required in day to day work via its features. You can test/change values for your Helm Deployments (and other Apps/Resources Argo watches) directly from the UI. Watch your deployments propagate, stream your pod logs or check events on your resources.

If you want to dig deeper into Argo on your own you can check out their App Examples, you have everything in place to play around.

That's it for today.

If you're interested in learning more, check the first Recap Post for more info about ArgoCD and some Rollout Experiments. An additional Recap post for Events and Rollouts is currently in the writing and will soon be release, so make sure to check our Blogs regularly, you'll find many interesting adventures in the cloud space.

As always, thank your for reading and let me know your thoughts in the comments.

If you should find an issue, please create one on GitHub.

The post Quake Speedrun Level 2: Argo appeared first on Stark & Wayne.

Quake Speedrun: Recap Level 1 Terraform and Kops:

Konstantin Kiess — Tue, 04 Feb 2020 15:35:00 +0000

Last time, we successfully finished deploying the first part of our platform. We have bootstrapped an AWS Env via Terraform and deployed a Kubernetes Cluster via Kops. ‌‌‌‌Now that we have the infrastructure in place, let’s go through what we did.

After we’ve provided our credentials for AWS and installed the CLI’s, the command we ran was:

quake --bootstrap

Let’s look at the script to understand what exactly it does.

pushd ${REPO_ROOT}/terraform  terraform init --backend-config="path=${TF_STATE_PATH}" &> /dev/null
  terraform apply -auto-approve
  yq r \
    <( terraform output -json | \
      jq 'to_entries |map ( .value=.value.value ) | from_entries'
    )
  \
  > ${REPO_ROOT}/state/kops/vars-${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}.yml
  for VAR in $(env|grep QUAKE); do
    echo ${VAR} | sed 's#QUAKE_(.*\)=#QUAKE_\1: #' \
    >> ${REPO_ROOT}/state/kops/vars-${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}.yml
  done
popd

The bootstrap is pretty straight forward. After switching the work-directory (pushd/popd) to the Terraform folder, the script initializes the TF project. By default, Terraform will use the "local backend." This means that we're keeping the Terraform-State locally and only our machine is aware of it. If you are working in a team and want to avoid concurrent deployments, you can utilize one of the available "Remote Backends" and "State Locking" mechanisms. If you want more information about available backends, check out their docs to learn about your options. ‌‌Next, we apply our Terraform config by running terraform apply -auto-approve. This will create our TF defined resources in our AWS account.

Terraform organizes Resources hierarchically into Providers. Providers are Terraform Plugins that provide specific functionality (remote API Calls, local binary calls, script execution, etc) in Hashicorp-Config-Language Markup. There are a plethora of available Terraform Providers that implement diverse resource CRUD (Create Read Update Delete) operations for most of your infrastructure/config requirements.

Once Terraform created our resources, we need to be aware about their IDs/Data. Some of our stack down the road will reference/use these resources, e.g. ExternalDNS requires knowledge about the Route53 Zone. Terraform kindly lets us output such values, in our case using the -json flag to change the output format. We're utilizing JQ (a JSON processor) and YQ (a YAML processor inspired by JQ) to reformat the output for our needs. The output JSON data structure initially is a JSON Object containing a Map of JSON Objects with nested keys:

{
  "QUAKE_CLUSTER_CERT_ARN": {
      "sensitive": false,
      "type": "string",
      "value": "arn:....-af3f-831143c6d1c0"
    },
  "QUAKE_CLUSTER_SSH_PUB": {
    "sensitive": false,
    "type": "string",
    "value": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDL/fnGyc5RFfgnjuzTMvxBM0eyhG+KdK6MhBDKdYWGfmW0e+qHcR9kbvlOx+oFc22b40m9tCU8AJeO248cpcwQacjwyXgnUXJgnpu+DoRf8d5znUSVW6cSm1fhTLXFPHnD9xb6cAw2 4oxP571KoygH9X7/YXj24c3TmhKBz2u7SvWFkyeYGD8FZmCGKblGbQhriqf/Sn09TSQEAtCrK6hLAJfkzdYwEQsUHD3vWGYY2bCIuRNDXUpXR6U036Mx+WHw/+ZPU59J6BY712Wi1ZBSCz2kwhW730w2Qnj+JyPn7tkF86rMblIOf6zF/ra/Jgv3/Bh+yun4ga6YXN1Opu5L\n"
  },
  ...
}

Since we do not need the extra info fields "type" and "sensitive," let's remove them. We're going to map the contents of the field "value" to the parent key, this will drop the not required values and simplify the structure. The output first gets piped into JQ resulting in:

{
  "QUAKE_CLUSTER_CERT_ARN": "arn:....-af3f-831143c6d1c0",
  "QUAKE_CLUSTER_GITOPS_SSH_PUB": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1vMubU/mZTpNI2BYbC+jG6I1eLerwtPSIZ00E0KokzfLOOjqmxqVwg2qVFhRQ4beAj4Mpg1/F7FO4rOZs0weStWt0xxHPqN81MiPKF0CZZYWG3lnLOsw +ivfJ45wrZutVCE71bVfonqrITKVYY6S2y7K5ic8JIOFMc1JGLweiKPoEfHH74VoG3x9ffIo+CXr06wZTzWePU39PdRzfi42xXyw9e3A2L7bQ9/2VpFylkUvNbiSxAKfU+RiBtZZsBhG/aV5a1GtTo2wnaYfZ3ty/GEwitR9IpfwsUNr1l/2aaRHaCVqACoXGThhhtwPlBL3Rnvl9Ivf1vOIhM6r1r7+l\n",
  ...
}

Then we use a File Descriptor to read the contents into YQ. This is just a convenient way to reformat the JSON into YAML:

QUAKE_CLUSTER_CERT_ARN: "arn:....-af3f-831143c6d1c0"
QUAKE_CLUSTER_GITOPS_SSH_PUB: |
  ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1vMubU/mZTpNI2BYbC+jG6I1eLerwtPSIZ00E0KokzfLOOjqmxqVwg2qVFhRQ4beAj4Mpg1/F7FO4rOZs0weStWt0xxHPqN81MiPKF0CZZYWG3lnLOsw+ivfJ45wrZutVCE71bVfonqrITKVYY6S2y7K5ic8JIOFMc1JGLweiKPoEfHH74VoG3x9ffIo+CXr06wZTzWePU39PdRzfi42xXyw9e3A2L7bQ9/2VpFylkUvNbiSxAKfU+RiBtZZsBhG/aV5a1GtTo2wnaYfZ3y/GEwitR9IpfwsUNr1l/2aaRHaCVqACoXGThhhtwPlBL3Rnvl9Ivf1vOIhM6r1r7+l

Finally, the output gets written into the state directory.‌‌‌‌The vars file is then used to provide a KOPS Cluster Template with our values by our next command:

quake --deploy

kops toolbox template \
  --template ${REPO_ROOT}/kops-templates/cluster_tpl.yml \
  --values ${REPO_ROOT}/kops-templates/cluster_defaults.yml \
  --values ${REPO_ROOT}/state/kops/vars-${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}.yml \
> ${REPO_ROOT}/state/kops/full-manifest-${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}.yml
kops replace -f ${REPO_ROOT}/state/kops/full-manifest-${QUAKE_CLUSTER_NAME}.${QUAKE_TLD}.yml --force
kops update cluster ${QUAKE_CLUSTER_NAME}.${QUAKE_TLD} --yes

As you can see, we only need three commands to get a Kubernetes cluster up and running. We simply interpolate our cluster template with the defaults and our generated config file using the KOPS template subcommand.‌‌ Kops actually does not create VMs directly, but it creates AutoscalingGroups and LaunchTemplates for our instances. These are then picked up by AWS to create the actual EC2 instances. ‌‌‌‌‌‌

While not in use within this project to execute the actual deployment, you might be interested to know that KOPS does not require to manage the AWS resources "directly." You can outsource the creation of AWS Objects to Terraform. If you're already used to and well experienced with Terraform, this might simplify your operations. Try creating the files with:

quake -d -o tf

But back to the KOPS CLI flow. To be able to use the same sequence of commands to create and update the cluster config idempotently, we're utilizing a combination of KOPS replace & KOPS update. Since we used an existing template (cluster/instancegroups), we did not have to run any of the "kops create cluster/instancegroup" commands.

Templates are useful once you want to stage your environment, if you want to quickly add or remove functionality/config to a cluster without relying on manually editing clusters config files.

While the above script will deploy a cluster, to manipulate an existing cluster we need to understand a bit about technical handling of updates for ASGs within AWS. Changing or Updating a ASG resource, might not automatically redeploy the underlying instances. Essentially, AWS would only rollout new instances if the accompanying LaunchTemplate changes.

Most of the time, our LaunchTemplates won't change, e.g. if we are trying to use a newer BaseImage for our Clusters.‌‌ If you want to trigger the deployment process after you updated your cluster config, you will need to use "kops rolling-update cluster." ‌‌‌‌The rolling-update subcommand is somewhat "blocking." While KOPS will stream info about the current deployment process, it "only" pulls info from Kube/AWS status of the requested work.

One of the biggest issues I found with this is, that it can quickly eat your AWS API Request Quota if you're deploying multiple environments into one account. KOPS will validate draining the current/old node, as well as do validation for the new replacement once it comes up. You can switch to "asynchronous" behaviour by adding the "--cloudonly" flag, or just skip validation to speed up the process.
Unfortunately, in some situations, this ends up being quite confusing as some upgrade cases would require more in-depth control than KOPS "shelling" out to upstream functionality offers. Luckily, the KOPS community provides extensive docs with workarounds and mitigations on such cases.

One of the resources our TF Config created was the KOPS State Store S3 Bucket. KOPS utilizes this Bucket to cache the "live" Cluster Manifests, Kubernetes ETCD State Backups, Certificates, Cluster Secrets etc. Simply put: "It contains everything KOPS needs to operate the clusters defined in that bucket." Additionally, this bucket will contain your regular ETCD-Backups, so treat it with the respect it deserves.
While the Docs say otherwise, currently there is no other backing mechanism than buckets. Thus, this bucket should never be public and also should be encrypted as it contains sensitive data.‌‌‌‌

KOPS deploys the Kubernetes Control Plane (ETCD, Kube-API & the (Calico-)CNI-Pods) on the master nodes. These Pods are running as privileged containers.
For ETCD it utilizes ETCD-MANAGER to deploy the two required ETCD-Clusters (etcd-main, etcd-events) for your Kubernetes. The state for ETCD is persisted on EC2 Volumes that get mounted via master nodes and made available to the respective Pods. Furthermore etcd-manager will create regular backups of your ETCD State into the $KOPS_STATE_STORE Bucket. This is quite useful as this already covers most of Kubernetes' Backup needs from a Control Plane perspective.

Combine this with a proper deletion policy (meaning volumes should not be instantly deleted, but rather marked for deletion) on your Persistent Volume Claims for your StatefulSets and you should be able to recover even your workloads from most failures.

Another interesting, though somewhat feature-lacking, feature of KOPS are Addons. While easy to apply and some integrated out of the box, the choice of available addons is quite small and unfortunately KOPS uses the channels tool to deploy these.

This happens in the cloud-init lifecycle phase of the deployed EC2 machine and before the Kube-API is available. Thus, you cannot stream logs from inside those containers via KUBECTL and you will need to SSH onto the master nodes if something isn't going as it's supposed to.

Thus, debugging broken/misbehaving addons can be quite a pain depending on the Topology of your deployment.

Especially, if the addon in question can lead to cascading failures down the road (I'm looking at you kube-metrics). While I initially used some of the addons for convenience, I eventually moved to HELM/ARGO to manage additional platform components. Upstream Helm Charts in combination with ArgoCD provide more control, more features, more components, and a more widely used toolset. ‌‌‌‌

Last notes on KOPS:

I've hit several Issues relating to tags with KOPS on AWS. The first outage was caused by the ETCD instances not being able to build/join a cluster on create/update of an environment. While our investigation into the matter surfaced an issue with our environment-deletion/cleanup process, the information is good to have none the while.

The root cause for out deployment issues wa
s that the EC2 volumes previously attached to the master nodes were not properly removed/cleaned. This caused multiple volumes with the same tags being discovered and mounted by KOPS. Unfortunately, this also behaved somewhat flaky, as depending on which volume got discovered & mounted first, the deployment would either work or fail. Since this happens in the cloud-init stage, there were no warnings available in the KOPS CLI output / our logs.

The second issue appeared when we updated environments from 1.14 to 1.15. The master nodes took an unreasonable amount of time to come up and most times would require manual restarting of the kubelet service on the master nodes. While doing the research on this, I've found info from other users that looked similar:

We have seen intermittent issues (less than 1%) with the canal CNI not loading (empty /etc/cni/net.d) on startup but have never gotten to a root cause/fix beyond "kick it until it starts".

The CNI config (created by an init container of Calico) was nowhere to be found and we were not able to find the reason for a few days.

We started going through old issues on Github that looked similar. That research uncovered a workaround posted in an issue that had the same symptoms as our deployments (turns out that Spot-Instances may get their tags propagated somewhat later than normal EC2-Instances). While we did not use Spot-Instances, it perfectly described our experience at the time.

Thus, I tried to run the mentioned workaround and noticed that we're getting 401's on requests trying to list the tags. Doing further research we found KOPS' default policies only contain "Autoscaling:DescribeTags" permissions but not "EC2:DescribeTags." After we added the EC2 policy to our IAM-Roles, everything started to deploy faster and more reliable. Workaround not necessary. This is something I'm still planning to follow up as the KOPS code at least in some places suggests to use EC2 Describe Tags.

To sum it up: I hope my failures will prevent yours :)

Thanks for reading and stay tuned for the next part of the QUAKE Series about deploying and using Argo to implement a continuous delivery and a few extra features for our platform.

The post Quake Speedrun: Recap Level 1 Terraform and Kops: appeared first on Stark & Wayne.

Quake Speedrun Level 1: Kops

Konstantin Kiess — Tue, 04 Feb 2020 15:30:00 +0000

Introdution:

A follow along tutorial to build your own Platform. This was inspired by the gaming speedrun community.

A speedrun is a play-through, or a recording thereof, of a whole video game or a selected part of it (such as a single level), performed with the intention of completing it as fast as possible. While all speedruns aim for quick completion, some speedruns are characterized by additional goals or limitations that players subject themselves to...

So our speedrun will deploy a Platform based on Kubernetes with a CI/CD Solution using these major Components:

[ "Argo", "Kops", "Cloud Foundry": { "Quarks" , "Eirini" } ]

To avoid confusion, this is Part/Level I. After getting an Introduction into the utilized Stack, this first part will let you deploy Terraform based Infrastructure and a Kops based Kubernetes Cluster. Watch out for the continuation of this series to learn about Argo in Part II & III, and lastly Cloud Foundry, Quarks & Eirini in Part IV.

Going out to customers in the wild, you learn that: just "Kubernetes" is not what people want. What people want is to be running on a Kubernetes that is well integrated and provides extra functionality. Being able to create LoadBalancers, DNS, Data Service Instances is the first half of the story, the second half of it is automation and management. Functionality and Automation are integral parts of a well designed developer experience.

So, in reality we are dealing with more than just Kubernetes. The result of these requirements is what is usually called a Platform. This series will take you through all steps of building and automating a platform. It will also give you an introduction into some widely used tools and approaches to automate their usage, as well as how to tie all of it together.

Every Level Post is accompanied by an "In Depth Recap" which will elaborate on what we did. Networks, Firewalls & Certs on a particular IaaS can be created via native tools (UI, CLIs, Markup like CloudFormation/Openstack Heat) or third party abstractions (e.g. Terraform).

Following this series we will use AWS and Terraform to bootstrap our base Infrastructure and Cluster. Continuing into the series we automate the deployment and operations of our Platform Components via Argo and Helm. Lastly, we will deploy CloudFoundry on top of Kubernetes via Quarks & Eirini to investigate the additional abstractions/benefits provided by CloudFoundry compared to pure Kubernetes.

As the scripts are still undergoing change, I recommend pulling often :).

System Requirements:

We assume git and direnv are present on your system, because why would you not have that already. To start our journey, we need to be aware about a few projects and what they will be used for.

Quake-installer:

Your little helper. It contains Scripts, a Terraform Project, Kops Manifest Templates, Helm Configs for require Platform Components. Quake comes pre-wired and with a little bit of GitOps-Attitude.

Kops

“kops helps you create, destroy, upgrade and maintain production-grade, highly available, Kubernetes clusters from the command line. AWS (Amazon Web Services) is currently officially supported, with GCE and OpenStack in beta support, and VMware vSphere in alpha, and other platforms planned.”

Kops is little helper around creating VM based (as opposed to managed) Kubernetes Clusters. It’s able to orchestrate Rolling-Updates, it contains config options for various Add-ons (CNIs, External-DNS, … ) and a templating engine. It can also output the config files to create TF based clusters if you wish.

AWS Account Access:

You will also need access to an AWS Account with appropriate Roles/Permissions to create Route53, EC2, AutoScaling, S3, IAM, and VPC Resources. For the sake of the tutorial, it's probably easiest to run this with admin access to an AWS Account. Diving into tying down AWS Policies could use its own blog post.

Installation:

We're going to utilize the quake-installer Repo to bootstrap our AWS Account. This will take care of most things like installing CLIs, running Terraform etc. You can use it as a toolbox / building base for your own Platform and example library for automation.

It will create the required Terraform resources for KOPS to take over. Once that is done, we will create our Cluster Manifest for Kops, and finally deploy the Cluster. All generated manifests/yamls/state files will be available in the state folder of your cloned repo. This way you can experiment with the base tools that we use outside of the tutorial.

Deploy a base Environment:

You're about to edit two config files and run three commands. We will bring up a Kubernetes Cluster with a 3 Master HA and autoscalable Worker ( 1 Node on start ) Setup. Additionally, Calico is installed as a CNI. Your API will be exposed behind a LoadBalancer with proper TLS Certs. You'll finish with a working, albeit empty Kubernetes Cluster as your base for the upcoming Posts.

git clone https://github.com/nouseforaname/quake-installercd quake-installer
#edit .awsrc and provide your CREDS/REGIOn
#edit configs/settings.rc and provide your QUAKE_TLD and CLUSTER_NAME
#run
direnv allow

QUAKE_TLD should be a domain your AWS Account will have access to. This is mostly related to DNS Setup and Certificate Validation. I used 'kkiess.starkandwayne.com' and then configured our CloudFlare DNS to the hosted zone that will get created later on (You do not need to take care of this yet)
QUAKE_CLUSTER_NAME is an arbitrarily chosen HOSTNAME Valid String. I called mine "cf-kube."

CLI Installation:

The next step is to install all required CLIs/Binaries.

If you do not have trust issues, you can run/use the quake installer to simplify downloading required binaries and to avoid version issues.

It will run the subscript cli-install. It's as easy as running:

quake --install

You'll need to have these CLIs:
[ "Kops", "YQ", "JQ", "Terraform", "Git", "ArgoCD", "Argo", "Helm" ]

Base Infrastructure:

Now we need to deploy the TF Stack. It runs the subscript bootstrap

quake --bootstrap

It will also output a config file under that contains the relevant Terraform outputs in YAML format:

ls state/kops/vars-..yml

This is where I unfortunately cannot guide you. You will need to Delegate the DNS for your Chosen QUAKE_TLD to the AWS Hosted Zone that just got created. You can find the AWS Docs for this here. Start from "Updating Your DNS Service with Name Server Records for the Subdomain," the rest was just done by Quake.

Cluster Templating & Deployment:

With our fresh TF Resources on AWS we're ready to deploy our actual Kubernetes Cluster. The deploy script includes interpolating the base cluster teamplate with the defaults and our vars.yml.

installers/quake --deploy

With that, we should have everything in place for the next blog post, which will be released in a few days. You can already start using your cluster by running:

#Double Check that your $KOPS_STATE_STORE variable was set properly by the Scripts
echo $KOPS_STATE_STORE
#it should output s3://-state
#if it is not set, running "direnv allow" again or setting it manually should fix it
kops get clusters
#this will output the deployed cluster, you can copy the name from there or run
kops export kubecfg $QUAKE_CLUSTER_NAME.$QUAKE_TLD

Your kubectl should now be setup to access the cluster but most probably it's still booting. Run the below command and wait until all nodes are shown and marked as "Ready."

watch kubectl get nodes

And that's it for today. Stay tuned for the next Post about deploying Argo on our Cluster.

You can continue reading about KOPS here or come back next week to continue with the next part where you'll deploy Argo.

Thanks for reading and let me know your thoughts in the comments.
If you should find an issue, please create one on GitHub.

The post Quake Speedrun Level 1: Kops appeared first on Stark & Wayne.

Container to Container Networking for Cloud Foundry and Kubernetes

Konstantin Kiess — Fri, 08 Feb 2019 13:51:25 +0000

Part Two: Approaching the problem without rewriting existing code

Welcome back to the second part of SILK for CFCR. If you just stumbled upon this, it might be a good idea to check out the first blog-post explaining the project.
For you TL;DR; kind of people:
Here is the ci, here are the wrappers, and finally here is the customized cfcr-release.

As with every new codebase you touch: In the beginning, there was confusion. We quickly realized that we're not just going to change a few lines to make it work. Furthermore, we're going to have to touch code in several places and projects to achieve our goal.

Baby steps...

A few things were clear and a good place to start:

we need a working Cloud Foundry
a working CFCR.

Once again we combined Bosh-BBL and BUCC to quickly deploy our infrastructure and base environments. We created two pipelines: CF and CFCR and let Concourse take care of the rest.

Hacking it into obedience, it's just a CNI they said...

This is where it gets complicated and where we hit the first bumps in the road. We knew we're working on exchanging a CNI named Flannel in our CFCR-deployment with a CNI named SILK that is used on DIEGO-Cells. To be honest, that was about all we knew...

To get an idea, we started searching for keywords: network/cni/flannel in the releases and deployment manifests we already used: cf-deployment; cf-networking-release; docker-release; kubo-release. While narrowing our scope, this was still plenty.

As a first milestone, we had to find how and where Flannel is utilized within CFCR and how we configure a CNI in Kubernetes and Docker. We focused our research in two directions: What would Flannel do and any mention of a CNI?

What would Flannel do...

We looked at the Flannel specific code in the CFCR-Repo and the Flannel-Docs. By approaching the problem with a solution, we could easily identify the key points we had to figure out and it gave us a roadmap to follow.

First, we found that Flannel writes a config file within its jobs-ctl. This config is part of the CNI Specification. It is written into a Kubernetes default path for CNI configs, thus our Vanilla-CFCR picked it up without any extra config. A CNI-Config contains the field 'type' and the value of that field maps to a (CNI) binary name. This way Kubernetes (and every other CNI Consumer) know what to call. Next, we found that the BOSH Docker-Release has Flannel specific settings which we deactivated. This basic info on how Flannel hooks into Kubernetes/Docker was enough to start digging into how SILK hooks into DIEGO/Garden.

So far for our initial research.

A CNI for Kubernetes...

Our next goal was to be able to reconfigure CFCR with a different CNI config. We updated our kubo-release to include the required (CNI-related) config options that we found in the Kubernetes documentation. To be able to configure SILK we needed to add parsing for a few Kubelet flags:

cni-bin-dir, cni-conf-dir (to be able tJust give me an IP Address already ¶
At this point the basics were in place and we concentrated on "just getting an IP" for our Pods. We kept it simple and just reused the SILK CNI-Config we found on the DIEGO-Cells and pointed Kubelet to our SILK binaries and config. To no ones surprise, it broke. But it left fresh interfaces and according network-namespaces on our Kube-Wo provide a different CNI and its config)
proxy-mode and hairpin-mode (to be able to adjust kubelet and kube-proxy config for SILK).

Initially, we thought that we would also require a custom Docker release. Eventually, we realized that changing a few networking related settings in the Docker Release was all that it took:

Deactivate Flannel property
Deactivate Docker Iptables and IP-Masquerade

We started creating our local/custom releases which we referenced in the CI:

Kubo (CI)
SILK-Patches (CI)

To use SILK on our CFCR Cluster, it required a few more jobs and their packages from cloudfoundry/silk-release:

We did not care much for a working config in the beginning, we had no idea how it could look anyway. At this step, automation was added to be able to quickly cleanup and repave a playground once we tried a change too many.

Just give me an IP Address already

At this point the basics were in place and we concentrated on "just getting an IP" for our Pods. We kept it simple and just reused the SILK CNI-Config we found on the DIEGO-Cells and pointed Kubelet to our SILK binaries and config. To no ones surprise, it broke. But it left fresh interfaces and according network-namespaces on our Kube-Worker. This outcome was a lot more than we expected.

Specifically, it broke because Kubelet had trouble parsing out the IP from the created interfaces. A good thing about stuff that breaks, is that it rarely does so silently. We looked at our logs and searched for the relevant code that could produce these. We found that Kubelet looks up the IP-Address of a Pod's interface by shelling out to the "ip" and "nsenter" binaries. We compared the interface within a runC/SILK vs. a Docker/Flannel Container and found that the main difference is the scope. SILK creates an interface with scope "link" while the interface created by Flannel has scope "global".

We created a shell-wrapper for nsenter on our Kube-Worker and analyzed how kubelet was trying to call nsenter/ip commands and what parameters were provided. This way we could iterate fast without redeploying or compiling. Further, we manually reproduced the output of these calls against a Flannel interface (to learn about expected output). With a bit more understanding of what is supposed to happen, we created this wrapper for "nsenter":

We included the wrapper into our release and the CI redeployed. After a few minutes of waiting, we started seeing the first Pods come up with IPs from SILKs CIDR. From a functional point of view, we had created interfaces on the Kube-Worker but we were far from implementing the Kubernetes or Docker networking model.

Thank you for reading and stay tuned for the upcoming post:

Part one: Building bridges in the cloud
Part Two: Approaching the problem without rewriting existing code
Part three: There is no better way to learn than trial and error

The post Container to Container Networking for Cloud Foundry and Kubernetes appeared first on Stark & Wayne.

Container to Container Networking for Cloud Foundry and Kubernetes

Konstantin Kiess — Fri, 25 Jan 2019 19:39:09 +0000

Part One: Building bridges in the cloud

Image Source: https://www.pexels.com/photo/bridge-clouds-nature-outdoors-584315

It started at the CF Summit Basel Hackathon in 2018 and the result is above screencast.

What did you just look at? Let's start with explaining what stack we're using and what is happening before explaining in depth what we see in the Terminals.

The setup:

The Cloud Foundry side:

We deployed an OSS CF via CF-Deployment
We pushed an example app (cf-env).

The Kubernetes side:

We deployed a CFCR based Kubernetes Cluster.
We build wrappers around the SILK-binaries and customized our CFCR-Deployment; it's elaborated in future blog posts that are left out for now.
We deployed Kibosh into the CFCR Cluster.

What we realized: We have an easy and well working solution to create services out of Helm-Charts ( Thank you, Kibosh), an easy solution to deploy Kubernetes ( Thank you, Kubo/CFCR deployment repo), and Cloud Foundry (Thank you too, cf-deployment repo). We did not see an easy solution to consume these services from Cloud Foundry, why is that?

Vanilla (CFCR) Kubernetes networking works perfectly fine as long as you do not leave your cluster. If your requests are generated from within the cluster, you can rely on Kube-Proxy and the ServiceIP Range to access your Pods. Those ServiceIPs do not map to Network-Interfaces on the Workers, but are "GhostIPs" that get destination NAT to one of the corresponding Pod-IPs behind the Service. Since this relies on iptables/ipvs/userspace on the Kubernetes-Workers, you cannot access those Ghost-IPs from outside (e.g. an app running on a Diego-Cell).

Once you need to access your Pods from outside the cluster, you will have to choose between: a) NodePort Service, b) LoadBalancer Service, and c) Ingress Controller.
Assume one of these scenarios: you created a service instance and now need to expose the endpoints to the consumer (Layer 4 connection from CF to Kubernetes), you deployed a HTTP Microservice to Kubernetes that needs to be made available for a Cloud Foundry app (Layer 7), or in general, you want bidirectional Layer 4/7 traffic between Cloud Foundry and Kubernetes workloads for performance (least amount of network hops) or security reasons (Mutual TLS).

If you are relying on NodePorts, the biggest issue is how to deal with potentially changing IPs on your Kube-Workers. This includes scaling up your Workers and thus creating new available endpoints, scaling down your workers and removing in-use endpoints, or recreating your workers and changing in-use endpoints. While this could be solved by a Layer 4 load balancer in front of the Workers, but that just makes the problem someone's responsibility and introduces dependencies to the load balancer system and troubles with Network Access Control.

Lastly, there is the fact that you cannot grow your environment "indefinitely" as the total amount of Nodeports in one Kubernetes Cluster (not just on one Worker) is limited. If you are relying on a Service of Type LoadBalancer, you need an IaaS layer that supports this. Examples for an IaaS without support for LoadBalancer Service include vSphere without NSX-T, OpenStack without LBaaS Engine, and a Kubernetes Cluster deployed on Bare Metal. Additionally, this approach will consume resources for every created service, load balancers are usually not free and they will add additional hops to every request. If you are relying on Ingress Controllers for external exposure, depending on your Controller and its configuration, you might find the same issues as in (a) and/or (b) combined.

What to consider:

Looking at the architecture of Cloud Foundry and Kubernetes container-hosts, we realized that Garden/RunC and Kubernetes/Docker respectively rely on a CNI for container networking. By default, Cloud Foundry uses Silk and Kubernetes/CFCR uses Flannel. Silk and Flannel work similarly from a network architecture perspective.

Silk and Flannel:

Rely on creating a layer 3 Bridge on each host.
Assign a smaller (e.g. /24) subnet out of the whole (e.g. /16) container overlay range to a host.
Let hosts deal with IP Address Management.
Rely on the hosts for routing by creating NAT-rules for incoming/outgoing requests.

From this point on, we could have gone two ways: make Flannel work on Diego-Cells and lose Cloud Foundry C2C-Policies (Network Access Control)
or
make Silk work on Kubernetes and get API driven policies for free
(Read up on the basket of nice that is Silk).
We found that Silk has a few more integration points to Cloud Foundry (mainly the Policy API) than Flannel has to Kubernetes, thus we decided to find out what it would take to make Silk play nice with Kube-Workers.

This will be our target setup:

For the love of Cthulhu, why?:

Having the same network available from either type of container would help dramatically with spanning workloads across Kubernetes and Cloud Foundry, vice versa, in a dynamic and responsive way. Additionally, our containers are blissfully ignorant to the fact that they're hosted on different underlying systems which live on potentially different VM-networks.

We do not rely on deploying additional components (LoadBalancers/Ingress Controller).
We do not rely on containers having knowledge about their host network (IP:Port combination of an app instance on a Diego-Cell or a Pod on a Kube-Worker).
We are not unnecessarily exposing our apps/pods via their hosts, the Containers are only accessible via the SILK Bridge and thus routing is only available to their hosts.
We can use CF native C2C Service-Discovery to expose Pod/App IPs via hostnames as opposed to making an apps responsibility to do service discovery (e.g. via adding a Eureka to our deployments) or forcing the app to have access to Cloud Foundry/Kubernetes APIs to do look ups.
We do not introduce additional APIs for Cloud Foundry/Kubernetes to deal with as we're completely reusing existing Cloud Foundry functionality.

Finally, let's watch the screencast again to understand what is happening:

On the right, you see three panels. From top to bottom:

R-Top:
This panel runs `watch "kubectl get namespaces | grep -v Terminating"`. Our service Instance is created via Kibosh. Kibosh will deploy every Instance into its own Namespace. Once we trigger `cf create-service ...`, we see a new namespace pop up.

R-Middle:
This panel runs `watch "kubectl get pods -o wide --all-namespace | grep kibosh"`. Shortly after the Namespace got created, Kibosh started to deploy the Pods into that namespace. We added the '-o wide' to be able to see the created Pods IP Address.

R-Bottom:
This panel runs `dmesg -w -T | grep DENY`, this shows the CF Container to Container Policy system get applied to our Kubernetes Pods; SILKs Access Control is still working.

On the left side you see the commands we run. Let's go through it chronologically:

L-Top:
We start by running `cf create-service ...` to let the marketplace talk to Kibosh to create our Service Instance and wait for the creation to finish.
Once the creation finished we're using `cf bind-service` to tell Cloud Foundry to inject the binding/credentials into our App on the next restage.

L-Bottom:
We run `cf restage` on our App so the Container gets recreated and injected with the Binding.

L-Top:
We run `cf ssh ...` to SSH into one of our App Instances and look at the VCAP_SERVICES ENV Var to find one of the Ports exposed by our Service Instance.
We try to run a curl on an HTTP Endpoint the Service provides and see that the logs in R-Bottom show that the connection gets denied by CF-Policies.

L-Bottom:
We run enable_access.sh and disable_access.sh, these are just wrappers to run `cf curl` against the External Policy API. As you can see in the output of the script, it creates a Policy that allows incoming traffic on our Pod (or rather its parent entity) from our CF-App (or rather the set of containers considered our App). After running enable_access.sh, we see that the curl is now succeeding. One of our later blog posts will elaborate on how we plan to apply the CF Policy system to Pods.

We repeat the process of enabling access to the kube-dns Pod for our example app. And finally we do a DNS-Lookup for google.de via the Kube-DNS Pod to finish our screencast.

Thank you for reading and keep tuned to find out what we did to make it work in our upcoming blog post:

Part One: Building bridges in the cloud
Part two: Approaching the problem without rewriting existing code
Part three: There is no better way to learn than trial and error

The post Container to Container Networking for Cloud Foundry and Kubernetes appeared first on Stark & Wayne.