Life was great until we needed to upgrade!
We had our websites and services working just fine, with our certificates being renewed automatically, and then we were forced to upgrade to a later Kubernetes release. Kubernetes 1.22 removed a number of deprecated beta APIs, but it would tell us if anything went wrong in the upgrade, right?
We started getting emails about certificates expiring. Well, that was odd, so we ran the kubectl cert-manager plugin to renew the certificates. Of course, to get that to work, we needed to upgrade the cert-manager service, which required us to also change our deployment manifest file to match the updated specification for cert-manager. Isn’t Kubernetes just grand!
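Kubernetes does not warn you about manifests you have not applied yet, so one cheap pre-upgrade check is to scan them for API versions the target release no longer serves. The sketch below assumes plain YAML text and uses a deliberately partial, hand-picked set of `apiVersion` values removed in 1.22; `find_removed_api_versions` is a hypothetical helper, and the official Kubernetes deprecation guide remains the authoritative list.

```python
import re

# Partial, illustrative set of apiVersions that Kubernetes 1.22 stopped serving.
# Consult the official Deprecated API Migration Guide for the complete list.
REMOVED_IN_1_22 = {
    "admissionregistration.k8s.io/v1beta1",
    "apiextensions.k8s.io/v1beta1",
    "extensions/v1beta1",
    "networking.k8s.io/v1beta1",
    "rbac.authorization.k8s.io/v1beta1",
}

# Matches "apiVersion: value" lines, with or without quotes around the value.
API_VERSION_LINE = re.compile(r"^\s*apiVersion:\s*[\"']?([\w./-]+)", re.MULTILINE)


def find_removed_api_versions(manifest_text: str) -> list[str]:
    """Return every removed apiVersion found in a (possibly multi-document) manifest."""
    found = []
    for match in API_VERSION_LINE.finditer(manifest_text):
        version = match.group(1)
        if version in REMOVED_IN_1_22:
            found.append(version)
    return found
```

Running it over every manifest in a deployment repository before the upgrade would flag, for example, an old `networking.k8s.io/v1beta1` Ingress while leaving `cert-manager.io/v1` resources alone.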
We ran the cert-manager renew command, and everything looked like it should work.
We were wrong and now what can we do?
When the expiry date arrived, we started getting the following error:
OK, so the certificate rotation did not work. Now what?
Is this really the right solution?
One of my colleagues said, "Oh, certificate rotation does not work anymore, so do the following:"
- Extract the private key and public certificate
- Decode them, because they are stored in base64
- Generate new certificates
- Encode the certificates back into base64
- Push the encoded certificate parts back to cert-manager
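The decode and encode steps in that list amount to base64 round-tripping the two entries of a Kubernetes TLS Secret. The sketch below assumes the Secret was fetched with `kubectl get secret <name> -o json` and uses the standard `tls.crt` and `tls.key` keys of a `kubernetes.io/tls` Secret; the helper names are hypothetical, and the "generate new certificates" step is left out because it depends entirely on your CA.

```python
import base64
import json


def decode_tls_secret(secret_json: str) -> tuple[str, str]:
    """Return (certificate PEM, private key PEM) from `kubectl get secret -o json` output."""
    data = json.loads(secret_json)["data"]
    # Secret values are base64-encoded; PEM text itself is plain ASCII.
    cert_pem = base64.b64decode(data["tls.crt"]).decode("ascii")
    key_pem = base64.b64decode(data["tls.key"]).decode("ascii")
    return cert_pem, key_pem


def encode_tls_data(cert_pem: str, key_pem: str) -> dict:
    """Build the base64-encoded `data` section for a kubernetes.io/tls Secret."""
    return {
        "tls.crt": base64.b64encode(cert_pem.encode("ascii")).decode("ascii"),
        "tls.key": base64.b64encode(key_pem.encode("ascii")).decode("ascii"),
    }
```

Seeing how mechanical this is makes the objection below clearer: this is exactly the bookkeeping cert-manager is supposed to do for us.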
This solution bothered me to no end. What is the actual issue? If cert-manager cannot do its job, then why even have it? Searching the web turned up suggestions to use another DNS provider. Well, we like Cloudflare because it supports DNS proxying. What does Cloudflare mean by DNS proxy? When proxying is enabled, Cloudflare's DNS answers with a Cloudflare proxy IP address instead of your host's real IP, and the proxy forwards requests on to the real host. This lets Cloudflare's proxy absorb denial-of-service attacks instead of your website. Your real host IP cannot be discovered directly, since DNS only knows about the proxy IP address. Well, why does this matter?
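One way to see the proxy in action is to resolve a proxied hostname and check whether the answers fall inside Cloudflare's published address ranges rather than your own. In the sketch below, only two of Cloudflare's IPv4 ranges are hard-coded, copied at the time of writing; the real list is longer, changes over time, and should be taken from Cloudflare. The function names are hypothetical.

```python
import ipaddress
import socket

# Two of the IPv4 ranges Cloudflare publishes for its proxy (not the full list).
CLOUDFLARE_RANGES = [
    ipaddress.ip_network("104.16.0.0/13"),
    ipaddress.ip_network("172.64.0.0/13"),
]


def is_cloudflare_proxy_ip(ip: str) -> bool:
    """True if the address lies in one of the hard-coded Cloudflare ranges."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so it simply returns False here.
    return any(addr in net for net in CLOUDFLARE_RANGES)


def resolves_to_proxy(hostname: str) -> bool:
    """True if every IPv4 address DNS returns for hostname is a Cloudflare proxy IP."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET)
    ips = {info[4][0] for info in infos}
    return bool(ips) and all(is_cloudflare_proxy_ip(ip) for ip in ips)
```

With proxying on, `resolves_to_proxy` would return `True` for your domain even though none of those addresses belong to your cluster, which is exactly why any check made "over the internet" talks to Cloudflare first.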
We looked at the cert-manager log and discovered log messages like the one below:
E0913 20:53:36.897867 1 sync.go:185] cert-manager/controller/challenges "msg"="propagation check failed" "error"="wrong status code '526', expected '200'" "dnsName"="DNS-name-changed" "resource_kind"="Challenge" "resource_name"="verse-tls-gg4k8-1453563326-1173022923" "resource_namespace"="namespace-name-changed" "resource_version"="v1" "type"="HTTP-01"
Output of kubectl logs for cert-manager
526 is the same status code we get back from the Cloudflare proxy service. Ah, cert-manager is trying to verify the certificate renewal over the public internet, which is proxied through Cloudflare.
526 Invalid SSL Certificate: Cloudflare could not validate the SSL certificate on the origin web server. Also used by Cloud Foundry’s gorouter.
Source: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
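To spot this failure across many log lines rather than by eye, the structured `"key"="value"` fields in the cert-manager log can be pulled apart with a couple of regular expressions. The sketch below assumes the log format shown above; `parse_challenge_log` is a hypothetical helper, and the `wrong status code` pattern is taken from that single sample message, so other error messages will simply yield no status code.

```python
import re

# Structured fields in the log line look like "key"="value".
FIELD = re.compile(r'"([^"]+)"="([^"]*)"')
# The propagation-check error embeds the unexpected HTTP status in quotes.
STATUS = re.compile(r"wrong status code '(\d+)'")


def parse_challenge_log(line: str) -> dict:
    """Return the line's key/value fields plus the failing HTTP status, if any."""
    fields = dict(FIELD.findall(line))
    match = STATUS.search(fields.get("error", ""))
    fields["status_code"] = int(match.group(1)) if match else None
    return fields
```

Filtering for `"type"` equal to `HTTP-01` and `status_code` equal to 526 separates "Cloudflare rejected our origin certificate" from ordinary challenge failures.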
Cloudflare won’t proxy traffic to an origin server with an invalid certificate. We also redirect HTTP to HTTPS, so the challenge request ends up being served over SSL by our origin with its now-invalid certificate. So, without looking deeper into the problem, what can we do? It turns out you can turn off DNS proxying through the Cloudflare interface.
If you turn off the proxy status (marked in the image below), certificate renewal works.
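The same toggle can be flipped without the dashboard through Cloudflare's v4 REST API, which accepts a PATCH to a DNS record. The sketch below assumes an API token with DNS edit permission and that you already know the zone and record IDs; the helper names are hypothetical, and the request body sets only the `proxied` field.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_proxy_toggle_request(
    zone_id: str, record_id: str, api_token: str, proxied: bool
) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets a DNS record's proxied flag."""
    body = json.dumps({"proxied": proxied}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def set_proxied(zone_id: str, record_id: str, api_token: str, proxied: bool) -> bool:
    """Send the PATCH and report Cloudflare's `success` flag from the JSON response."""
    request = build_proxy_toggle_request(zone_id, record_id, api_token, proxied)
    with urllib.request.urlopen(request, timeout=30) as response:
        return bool(json.load(response).get("success", False))
```

Building the request separately from sending it keeps the part that talks to Cloudflare small and makes the request easy to inspect before pointing it at a live zone.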
Another solution, but at least we are using the cert-manager
The new steps we followed were:
- Log in to Cloudflare
- Find your DNS entry and disable DNS proxy
- Run the kubectl cert-manager renew command
- Wait for the renewal to complete
- Re-enable DNS proxy
- Log out of Cloudflare
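The list above can be sketched as a single function, with the important detail that the proxy is switched back on even if renewal fails. In this sketch the three steps are passed in as callables (hypothetical names), so they can be wired to a Cloudflare API call, to `kubectl cert-manager renew`, and to whatever "wait until issued" check you trust, for example polling the Certificate's status.

```python
import subprocess
from typing import Callable


def renew_command(certificate: str, namespace: str) -> list[str]:
    """Command line for the kubectl cert-manager plugin's renew subcommand."""
    return ["kubectl", "cert-manager", "renew", certificate, "-n", namespace]


def run_renew(certificate: str, namespace: str) -> None:
    """Trigger renewal; raises CalledProcessError if kubectl exits non-zero."""
    subprocess.run(renew_command(certificate, namespace), check=True)


def renew_with_proxy_off(
    set_proxied: Callable[[bool], None],
    renew: Callable[[], None],
    wait_until_issued: Callable[[], None],
) -> None:
    """Disable the proxy, renew, wait, and always re-enable the proxy afterwards."""
    set_proxied(False)
    try:
        renew()
        wait_until_issued()
    finally:
        # Re-enable the proxy even when renewal fails, so the origin IP is hidden again.
        set_proxied(True)
```

The `try`/`finally` is the main reason to script this at all: doing it by hand, it is easy to forget the "re-enable" step after a failed renewal and leave the origin exposed.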
Our next step is to research whether we can change the certificate renewal process itself, so this manual process can be avoided in the future.