Recently, one of our clients had to migrate their Cloud Foundry (CF) deployment from one vSphere cluster to another. Here is the story: the client bought some more modern UCS chassis that they wanted to add to the existing cluster. Enhanced vMotion Compatibility (EVC) must be enabled to support mixed processors in the same cluster, but you can’t enable EVC while even a single VM is on the cluster. To enable EVC, they needed to migrate the whole CF deployment to a new cluster. Since it is a heavily used CF environment, they wanted zero or minimal downtime for the migration.
The solution should have been simple, since vSphere has the vMotion feature. The steps we came up with were: disable BOSH resurrection, create a new cluster in the same vCenter, vMotion the CF VMs to the new cluster, enable EVC in the old cluster, vMotion the CF VMs back to the old cluster, and re-enable BOSH resurrection. There would be zero downtime, and no user should notice that we had migrated CF between two clusters.
We ran into a problem when we found that we couldn’t vMotion running VMs between the two clusters because of CPU compatibility issues between them. (For more details, see vMotion between vSphere Clusters.) This meant we had to power off the CF VMs before we could vMotion them, which meant possible downtime for the platform.
Given the situation, we came up with the following working solution:
1) Turn off BOSH resurrection; otherwise BOSH will try to recreate the VMs that are down while you migrate them.
2) Run bosh stop on a subgroup of the VMs, leaving other VMs of the same type running to keep the platform working. Without the --hard flag, bosh stop stops the processes on the VM while keeping the VM and its persistent disk.
3) Power off those VMs and vMotion them to the new cluster, which we created beforehand.
4) After the vMotion, bring the VMs up in the new cluster.
5) Repeat the above process until all the VMs have been migrated to the new cluster.
6) Delete or rename the old cluster.
7) Rename the new cluster to the old cluster’s name.
8) Turn BOSH resurrection back on.
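
The batching in steps 2 through 5 can be planned ahead of time. Below is a minimal Python sketch of one way to group VMs so that at least one VM of each type stays up while a batch is being moved. It is not the tooling we used: the function name plan_batches, the min_running parameter, and the job and instance names are all hypothetical, and it assumes every VM from an earlier batch is running again in the new cluster before the next batch starts. A job type with only one instance cannot be protected this way and will see a short outage.

```python
from collections import defaultdict


def plan_batches(instances, min_running=1):
    """Split (job_type, instance_name) pairs into migration batches.

    For each job type, at most (total - min_running) instances are taken
    down in any one batch, so min_running instances keep serving traffic.
    A job type with too few instances to honour that still moves one
    instance per batch, which means downtime for that job is unavoidable.
    """
    by_job = defaultdict(list)
    for job, name in instances:
        by_job[job].append(name)

    # Instances not yet assigned to a batch, per job type.
    remaining = {job: list(names) for job, names in by_job.items()}
    batches = []
    while any(remaining.values()):
        batch = []
        for job, names in remaining.items():
            max_down = max(1, len(by_job[job]) - min_running)
            batch.extend((job, n) for n in names[:max_down])
            remaining[job] = names[max_down:]
        batches.append(batch)
    return batches


# Hypothetical deployment: two routers, three cells, one NATS VM.
example = [
    ("router", "router/0"), ("router", "router/1"),
    ("cell", "cell/0"), ("cell", "cell/1"), ("cell", "cell/2"),
    ("nats", "nats/0"),
]

if __name__ == "__main__":
    for number, batch in enumerate(plan_batches(example), start=1):
        print(number, [name for _, name in batch])
```

Each batch is then taken through steps 2 to 4 (bosh stop, power off, vMotion, power on) before the next batch begins. Raising min_running gives smaller batches and more capacity during the migration, at the cost of more rounds.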