Industrial Farm Equipment Manufacturer (Manufacturer) had three separate Cloud Foundry environments up and running: two production environments that had been running for a year, each with an assigned data center, and a third development environment that was not yet linked to a data center.
The production environments were critically important to the day-to-day operation of the business, as they hosted dealership-related services.
After losing a key employee and the institutional knowledge that went with him, the remaining team members at Manufacturer weren't actively monitoring certificates or their re-issuance dates. Because of problems during this transition of responsibility, some key re-issuance dates had already passed. As a result, the BOSH-related certificates for the development environment expired, making it inoperable.
Soon after the development environment went down, the BOSH-related certificates for the first production environment also expired, leading to another outage. Fortunately, Manufacturer's high-availability design routed all traffic through the second production environment, allowing Manufacturer's operations to continue running smoothly.
These outages acted as a clear warning to the Manufacturer team, alerting them to the problem of expiring certificates and to the potentially disastrous consequences of the second production environment becoming inoperable. Because of the sequence in which the certificates expired, the outages also gave the team a three-day warning period before the BOSH certificate for the second production environment would expire.
The clock was ticking and, with day-to-day operations at risk of grinding to a halt, the team at Manufacturer called on Stark & Wayne for assistance. Stark & Wayne's experience with Cloud Foundry and its Collective approach made it possible to find a solution.
Stark & Wayne's engineers quickly gained access to BOSH and identified each certificate that had expired. Reissuing the certificates was ruled out as a viable option, so the engineers devised another plan.
Relying on that experience, the Stark & Wayne team decided to extract the certificates from the existing deployments and create renewal requests.
By taking this approach, Stark & Wayne was able to extend the expiration dates on the existing certificates. Working from Manufacturer's last standing production environment, the team used purpose-built software and BOSH deployment paradigms to update the certificates on each VM.
This secondary approach, unfamiliar to those with limited Cloud Foundry experience, helped Manufacturer avoid what could have been a catastrophic period of downtime. It allowed Stark & Wayne to keep the one remaining production environment running and restore the two inoperable environments.
During this engagement, Stark & Wayne drew on its Collective approach and advised the onsite team to use the Stark & Wayne Safe certificate management tool for the job. This choice of tool, informed by Stark & Wayne's considerable experience, helped the onsite team manage the process more effectively and complete the certificate renewals a full day ahead of the project's three-day deadline.
To help prevent future outages, Stark & Wayne also taught Manufacturer's operations team how to use Doomsday, a Stark & Wayne-developed tool that automatically alerts personnel when certificates are nearing their expiration dates. This reduces the team's reliance on tracking certificate expiration manually.
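The kind of check that this sort of alerting automates can be sketched in a few lines. The Python below is a minimal illustration only and is not how Doomsday is implemented: the function names, the certificate names in the usage note, and the 30-day default threshold are all assumptions made for the example. It takes a mapping from certificate name to `notAfter` timestamp, in the text format OpenSSL prints, and returns the names that expire within the threshold.

```python
import ssl
from datetime import datetime, timezone
from typing import Dict, List


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` uses the OpenSSL text format, e.g. "Jan  4 00:00:00 2024 GMT".
    `now` must be a timezone-aware datetime. The result is negative if the
    certificate has already expired.
    """
    # ssl.cert_time_to_seconds parses the notAfter string into epoch seconds.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def expiring_soon(
    certs: Dict[str, str], now: datetime, threshold_days: int = 30
) -> List[str]:
    """Return the sorted names of certificates expiring within `threshold_days`.

    Already-expired certificates are included, since they need attention too.
    The 30-day default is an arbitrary value chosen for this sketch.
    """
    return sorted(
        name
        for name, not_after in certs.items()
        if days_until_expiry(not_after, now) <= threshold_days
    )
```

For instance, calling `expiring_soon({"director": "Jan  4 00:00:00 2024 GMT"}, datetime(2024, 1, 1, tzinfo=timezone.utc), threshold_days=7)` would flag the hypothetical `director` certificate, which has three days left. Running a check like this on a schedule and sending the result to the team is what removes the dependence on any one person remembering the dates.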
Stark & Wayne supplied best-practice architecture recommendations to help Manufacturer avoid BOSH certificate problems going forward.