So, you just spun up a brand new K8s cluster on vSphere, OpenStack, Bare Metal, or a Raspberry Pi cluster and started running your first workloads. Everything is going great so far. That’s awesome! But then you try to deploy a helm chart for that fancy new app you’ve been wanting to test out. The pods come up OK, but you can’t seem to access it from the link the chart spit out. Yikes!
So you think to yourself, “What the heck? I can get to it fine from inside the cluster and there aren’t any issues with the pods or in the logs…”
You put on your debugging glasses and take a look at the service responsible for handling traffic to the pods. Awful! That service is just stuck pending, waiting for an External-IP.
kubectl get svc
NAME   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
test   LoadBalancer   10.43.241.255   <pending>     80:31168/TCP   3s
Describing the service doesn’t offer much help either. It doesn’t show anything out of the ordinary other than the fact that there are no events associated with it:
kubectl describe svc test
Name:                     test
Namespace:                default
Labels:                   <none>
Annotations:              kubectl.kubernetes.io/last-applied-configuration:
                            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"test","namespace":"default"},"spec":{"ports":[{"name":"http","por...
Selector:                 app=nginx
Type:                     LoadBalancer
IP:                       10.43.107.74
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  32325/TCP
Endpoints:                10.42.0.7:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
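For reference, a Service that ends up in this state could look something like the following minimal sketch, reconstructed from the describe output above (the name, namespace, and app=nginx selector are taken from that output; the chart you installed will have its own values):
apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80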
So, what’s going on here? What’s wrong with my service?
Well, the issue here is that our cluster is running on-prem and doesn’t have a cloud provider with LoadBalancer support configured, like you would have with AWS, GCP, Azure, etc. Therefore, when you create a service of type LoadBalancer, it will sit in the pending state until the cluster is configured to provide those capabilities.
So, what are our options here? With vSphere you have the NSX Container Plug-in, with OpenStack you have the Neutron LBaaS plugin, and for Bare Metal and Pi clusters… well, you’re kind of on your own. So, why not just use the NCP or Neutron plugins? The answer usually comes down to complexity and cost. Spinning up a full SDN is quite a bit of added overhead to manage; however, if you are already running NSX or Neutron, they may be worth considering first.
So, what option do we have that will satisfy the needs of all of our environments? This is where MetalLB comes in. MetalLB is a simple solution for K8s network load balancing that uses standard routing protocols and aims to “just work.”
Setting up MetalLB
A basic deployment of MetalLB requires the following prerequisite components to function properly:
- A Kubernetes cluster (v1.13.0+) that does not already have network load-balancing functionality
- One or more IPv4 address ranges for MetalLB to assign to requesting services
- A CNI that is compatible with MetalLB – most are supported, but see here for caveats.
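Before moving on, it can’t hurt to sanity-check a couple of these prerequisites from the command line. The commands below are just a quick sketch; the exact pod names you’ll see for your CNI (flannel, calico, etc.) will vary:
$ kubectl version
$ kubectl get pods -n kube-system   # your CNI pods (flannel, calico, etc.) should be Running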
IPVS Configuration Requirements
If you happen to be running kube-proxy in ipvs mode, you’ll have to make a quick change to enable strict ARP. If you are using the default iptables mode, you can safely skip this step.
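If you aren’t sure which mode kube-proxy is running in, one quick way to check (assuming the standard kube-proxy configmap in kube-system) is to grep its configuration:
$ kubectl get configmap -n kube-system kube-proxy -o yaml | grep mode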
$ kubectl edit configmap -n kube-system kube-proxy
Set the following keys in the kube-proxy configmap:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
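If you’d rather script this change than edit the configmap interactively, a pipeline along these lines can work; it’s a sketch that assumes strictARP is currently set to false in the configmap:
$ kubectl get configmap kube-proxy -n kube-system -o yaml | \
    sed -e "s/strictARP: false/strictARP: true/" | \
    kubectl apply -f - -n kube-system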
Deploying MetalLB Resources
The following commands will create a namespace metallb-system, the speaker daemonset, the controller deployment, and the associated service accounts:
$ kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/namespace.yaml
$ kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.9.3/manifests/metallb.yaml
The first time you deploy MetalLB you’ll also have to generate the secretkey used for speaker communication; for upgrades, this can remain unchanged:
$ kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
At this point, MetalLB should have spun up a controller pod and one speaker pod per node, all of which should reach the Running state:
$ k get po -n metallb-system
NAME                          READY   STATUS    RESTARTS   AGE
controller-5c9894b5cd-8mww9   1/1     Running   0          4d1h
speaker-4bjhf                 1/1     Running   0          4d1h
speaker-jnfpk                 1/1     Running   0          4d1h
speaker-sgwht                 1/1     Running   0          4d1h
Though the components are all up and running now, they won’t actually do anything until you provide them with configuration. MetalLB can be configured to run in either Layer2 or BGP mode. For this deployment we will be using Layer2; more information and BGP configuration examples can be found here.
Layer2 Configuration
For Layer2 configuration, all you need is a set of IPv4 addresses allocated for MetalLB to hand out to requesting services. This configuration is set in the ConfigMap shown below, which, after being written to a file, can be applied via kubectl apply -f config.yml:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.128.54.230-10.128.54.239
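Save the manifest above as config.yml and apply it:
$ kubectl apply -f config.yml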
After applying the configmap, our changes should take effect and our service should now have an External-IP ready for use! Note that the External-IP 10.128.54.230 was pulled from the IP range in the addresses array of the ConfigMap.
$ k get svc
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
kubernetes   ClusterIP      10.43.0.1      <none>          443/TCP        5d1h
test         LoadBalancer   10.43.107.74   10.128.54.230   80:32325/TCP   22h
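As one last check, you should now be able to hit the service on its new External-IP from a machine on the same network (assuming the test service is still backed by the nginx pod from earlier):
$ curl -I http://10.128.54.230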
A few Caveats and Limitations
Layer 2 mode has two primary limitations you should know about, both of which are called out in the documentation:
- Potentially slow failover
- Single node bottlenecking
In Layer2 mode, a single leader-elected node receives all traffic for a service IP. This means that your service’s ingress bandwidth is limited to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to steer traffic.
Also, due to the leader lease time of 10 seconds, it can take time for failover to occur and properly direct traffic in the event of a failure.
For many situations this is not a dealbreaker, however, for more critical systems you may want to leverage the BGP mode. For more information regarding the tradeoffs, see the BGP Limitations in the documentation.