You know when Spock does that thing on Star Trek and he just grabs the guy by the neck in the right place and blamo, he’s out cold? It’s called the Vulcan Nerve Pinch. The right move at the right time is all it takes to go from chaos to calm.
Well, we want to present two "Killer Moves" that can help restore sanity to your bosh-lite environment.
Let’s say you’ve previously deployed Cloud Foundry locally with bosh-lite on Mac OS X (Late 2015), following our earlier blog post. What if you shut down your machine? Well, it appears that all the Warden instances are foobar…
Killer Moves
Here’s what we’ll be talking about:
./bin/add-route
bosh cloudcheck
Vagrant Up?
The very first check is just to make sure the vagrant machine is running. Get it to a running state first, then begin the next steps.
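For example, assuming your bosh-lite clone lives at ~/workspace/bosh-lite (adjust the path for your own setup):
$ cd ~/workspace/bosh-lite
$ vagrant status
$ vagrant up
If vagrant status reports anything other than "running" (for example "poweroff" after a shutdown), vagrant up will boot the machine again.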
Enable Network
Once you change directory to where you’re running bosh-lite, run the ./bin/add-route script.
Why do we need to run ./bin/add-route? To bridge the two networks together: the VirtualBox machine is the gateway at 192.168.50.4 for the Warden instances running on the 10.244.0.0/16 range.
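If you’re curious what the script does on Mac OS X, it boils down to adding a static route by hand, roughly like this (a sketch, not the script verbatim; see bin/add-route in the bosh-lite repo for the real thing):
$ sudo route add -net 10.244.0.0/16 192.168.50.4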
For example, on Mac OS X the route command reports the following information after running the script:
$ route -n get 192.168.50.4
route to: 192.168.50.4
destination: 192.168.50.4
interface: vboxnet1
flags: <UP,HOST,DONE,LLINFO,WASCLONED,IFSCOPE,IFREF>
recvpipe sendpipe ssthresh rtt,msec rttvar hopcount mtu expire
0 0 0 0 0 0 1500 -438
$ route -n get 10.244.0.0/16
route to: 10.244.0.0
destination: 10.244.0.0
mask: 255.255.0.0
gateway: 192.168.50.4
interface: vboxnet1
flags: <UP,GATEWAY,DONE,STATIC,PRCLONING>
recvpipe sendpipe ssthresh rtt,msec rttvar hopcount mtu expire
0 0 0 0 0 0 1500 0
Also, the routes do not persist between reboots, so it’s good to remember that if you’re having problems running bosh ssh to your local bosh-lite instances, re-run the ./bin/add-route script. This is crucial for the bosh cloudcheck step to work.
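So a typical post-reboot sequence looks something like this (nats_z1 is just an example job taken from the bosh vms listing below):
$ ./bin/add-route
$ bosh ssh nats_z1 0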
bosh cloudcheck
Gather Information
You might have gathered information first by running bosh vms, whose output reports the status of your Cloud Foundry deployment:
$ bosh vms
Acting as user 'admin' on 'Bosh Lite Director'
Deployment 'cf-warden'
Director task 242
Task 242 done
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
| VM                                                                        | State              | AZ  | VM Type   | IPs          |
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
| api_z1/0 (f9b7bb05-7f9b-4758-9775-9d9c817f09b2)                           | unresponsive agent | n/a | large_z1  | 10.244.0.138 |
| blobstore_z1/0 (c772d8be-52d2-4e9e-b4f6-4e25b4dd84d6)                     | unresponsive agent | n/a | medium_z1 | 10.244.0.130 |
| consul_z1/0 (62bfb613-0ef1-42ad-8ffb-96d718633484)                        | unresponsive agent | n/a | small_z1  | 10.244.0.54  |
| doppler_z1/0 (8891552e-57b4-4466-8708-1dcb979d5c3b)                       | unresponsive agent | n/a | medium_z1 | 10.244.0.146 |
| etcd_z1/0 (ec4520d6-12c8-4b9f-90a1-b434c0271e22)                          | unresponsive agent | n/a | medium_z1 | 10.244.0.42  |
| ha_proxy_z1/0 (5bfd75ea-fc4b-4be1-a5cd-e9a89bb6fa10)                      | unresponsive agent | n/a | router_z1 | 10.244.0.34  |
| hm9000_z1/0 (e117d6a2-802f-40f8-b945-a28b0d069faa)                        | unresponsive agent | n/a | medium_z1 | 10.244.0.142 |
| loggregator_trafficcontroller_z1/0 (13aebd87-fac2-489c-8b6e-68e1145fa0e9) | unresponsive agent | n/a | small_z1  | 10.244.0.150 |
| nats_z1/0 (c6f680b5-f51c-448c-b62a-d997f274c62b)                          | unresponsive agent | n/a | medium_z1 | 10.244.0.6   |
| postgres_z1/0 (5a157c5a-3a06-46e6-9350-07703dd2f68f)                      | unresponsive agent | n/a | medium_z1 | 10.244.0.30  |
| router_z1/0 (456ee602-d347-4cea-9964-8376859084fb)                        | unresponsive agent | n/a | router_z1 | 10.244.0.22  |
| runner_z1/0 (6cbed5a2-88b7-4e85-9d22-9c4de4c64e24)                        | unresponsive agent | n/a | runner_z1 | 10.244.0.26  |
| uaa_z1/0 (614eefa9-dc34-4521-a5e6-d07ee9b26f9f)                           | unresponsive agent | n/a | medium_z1 | 10.244.0.134 |
+---------------------------------------------------------------------------+--------------------+-----+-----------+--------------+
VMs total: 13
Restore Jobs
If the state of any of the Warden instances is anything other than "running", BOSH has a command called cloudcheck (with an alias cck) that will scan your deployment, find all the jobs that are not running, and prompt you for action.
bosh cck
Check Yourself
For each job in the deploy, cloudcheck prompts you for an action to take; we recommend choosing to recreate the instance.
Once you’ve selected what to do about each job, it confirms the actions and gets to work.
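If you’d rather not confirm every job interactively, cloudcheck also accepts an --auto flag that applies the default resolution to each problem it finds; check bosh help cloudcheck on your CLI version before leaning on it:
$ bosh cck --auto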
Coffee Break
Depending on the number of jobs it needs to restore, bosh cloudcheck may take a while. When I’ve restored an entire 13-instance Cloud Foundry, it’s taken at least thirty minutes.
Much of the output will also show errors as each of the CF services is restored. This is usually because of the dependencies between the jobs, and it sorts itself out once all of the instances are running again.
Confirm Integration
Once all the instances are back up and running, it’s usually good to do a test deploy to confirm that they can all connect to each other, have the correct IP addresses for one another, and so on.
bosh deploy -n
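Once that deploy completes cleanly, re-run bosh vms and confirm that every job now reports "running" instead of "unresponsive agent":
$ bosh vms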
Share Your Tips
Do you have any other tips that keep your development environment running smoothly? Feel free to share them in the comments below.