The BOSH v263 release added an exciting new feature – the ability to run one-off tasks (called errands) inside existing instances.
A deployment is the top-level unit that BOSH manages. Typically a BOSH deployment will be one or more long-running instances on your target cloud infrastructure. For example, you could use BOSH to deploy a 5-node cluster of Zookeeper to AWS, or much larger systems such as Cloud Foundry or Kubernetes.
A few years ago BOSH added the ability to run one-off tasks, called errands. These errands are run inside brand new, temporary instances. The results – stdout/stderr – are returned from the errand instance, and the instance is destroyed.
What you could not do with errands was run one-off tasks on the same instances as your running job templates/processes. You couldn’t perform local cleanups or health checking or other introspection tasks.
Fortunately this feature is now open to you once you upgrade to BOSH v263+ and Linux 3445+ stemcells!
Dmitriy’s zookeeper-release (https://github.com/cppforlife/zookeeper-release) has been upgraded to add a collocated errand named status. Deploying and testing this errand is very simple (see the BOSH 2/AWS/Zookeeper blog post for a lengthy getting-started tutorial for BOSH).
git clone https://github.com/cppforlife/zookeeper-release
cd zookeeper-release
bosh -d zookeeper deploy manifests/zookeeper.yml
Once the 5-node cluster is running, you can run the status errand:
bosh -d zookeeper run-errand status
The output will initially show the errand running on each instance, and then it will aggregate the output from each instance:
Task 2200 | 02:15:23 | Preparing deployment: Preparing deployment (00:00:00)
Task 2200 | 02:15:23 | Running errand: zookeeper/5d746534-e40e-445d-9e7d-7b2c4608e322 (1)
Task 2200 | 02:15:23 | Running errand: zookeeper/6e8b7728-88fd-4da1-8191-66546a80a8f6 (4)
...
Task 2200 | 02:15:24 | Fetching logs for zookeeper/6e8b7728-88fd-4da1-8191-66546a80a8f6 (4): Finding and packing log files
Task 2200 | 02:15:24 | Fetching logs for zookeeper/e06dcbb1-5fcd-4bf7-874b-0bdc07c31620 (0): Finding and packing log files
...
Instance   zookeeper/5d746534-e40e-445d-9e7d-7b2c4608e322
Exit Code  0
Stdout     Mode: follower
Stderr     ZooKeeper JMX enabled by default
           Using config: /var/vcap/jobs/zookeeper/config/zoo.cfg
Instance   zookeeper/b5923bcf-3106-4b7f-8eb4-212b5877ef81
Exit Code  0
Stdout     Mode: leader
Stderr     ZooKeeper JMX enabled by default
           Using config: /var/vcap/jobs/zookeeper/config/zoo.cfg
...
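If all you want to know is which instance is currently the leader, the aggregated output can be filtered with a few lines of shell. This is only a minimal sketch: it assumes the output keeps the shape of the sample above (an Instance row, followed later by a Stdout row containing "Mode: ..."), and the function name find_leader is made up for this example.

```shell
#!/bin/bash
# find_leader: read the errand's table output on stdin and print the
# instance whose Stdout row reports "Mode: leader".
# Assumption: rows look like "Instance <name>" and "Stdout Mode: <mode>",
# as in the sample output above. This is not a documented format contract.
find_leader() {
  awk '
    # Remember the most recent instance name seen.
    $1 == "Instance" { instance = $2 }
    # Print that instance when its Stdout row says it is the leader.
    $1 == "Stdout" && $2 == "Mode:" && $3 == "leader" { print instance }
  '
}
```

You would pipe the errand output into it, for example: bosh -d zookeeper run-errand status | find_leader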
Implementation
Collocated errands are implemented in exactly the same way as traditional errands within your BOSH release. That is, the only requirement is that a job template has a bin/run script that exits 0 if successful.
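To illustrate that contract, here is a minimal sketch of what the body of a bin/run script for a status-style errand could look like. It is not the script shipped in zookeeper-release: the zkServer.sh path, the STATUS_CMD variable and the run_status function are all assumptions chosen for the example.

```shell
#!/bin/bash
# Sketch of an errand's bin/run logic: run one check command and report
# success or failure through the exit code.

# Assumption: this package path is illustrative, not taken from the release.
STATUS_CMD="${STATUS_CMD:-/var/vcap/packages/zookeeper/bin/zkServer.sh status}"

run_status() {
  # STATUS_CMD is left unquoted on purpose so "command arg" splits into words.
  if $STATUS_CMD; then
    return 0   # exit code 0: the errand succeeded
  fi
  echo "status check failed: $STATUS_CMD" >&2
  return 1     # any non-zero exit code marks the errand as failed
}

# A real bin/run would finish with:
#   run_status; exit $?
```

Whatever the command writes to stdout and stderr is what ends up in the Stdout and Stderr rows of the errand result.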
To add a collocated errand to your deployment manifest, place it within the instance group like any other errand. For the zookeeper.yml example the status job template is added:
instance_groups:
- name: zookeeper
  instances: 5
  jobs:
  - name: zookeeper
    release: zookeeper
    properties: {}
  - name: status
    release: zookeeper
    properties: {}
...
Debugging errands
When standalone errands fail, they are difficult to debug because the temporary instance has already been deleted by the time you learn that the errand failed. You need to re-run the errand with the --keep-alive flag so that the instance is not deleted, which allows you to SSH in and isolate the issue.
bosh run-errand sanity-test --keep-alive
bosh ssh sanity-test
With collocated errands, debugging has never been easier. The errand script is now a short-lived process on a long-running instance, so you can SSH into the instance before, during, or after the errand runs to observe it, watch the logs, etc.
bosh -d zookeeper ssh zookeeper/0
bosh -d zookeeper run-errand status
Errors you might see
I hit some bumps along the way and thought I’d document them here so you don’t have to hit the same ones.
Errand doesn’t exist
If you get this error, then you have not yet upgraded your BOSH to v263+. You might feel silly initially, and then realise that I didn’t just guess what this error looks like. I too forgot to check that my BOSH was upgraded first.
Task 313 | 00:39:33 | Preparing deployment: Preparing deployment (00:00:01)
L Error: Errand 'status' doesn't exist
Task 313 | 00:39:34 | Error: Errand 'status' doesn't exist
Older stemcell
Your deployment will need to be running on Linux 3445+ stemcells for collocated errands to work. If you have not yet upgraded your stemcells, you might see this error:
Task 2182 | 01:41:29 | Preparing deployment: Preparing deployment (00:00:00)
Task 2182 | 01:41:30 | Error: Multiple jobs are configured on an older stemcell, and "status" is not the first job
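Both errors come down to version prerequisites, so it can be worth checking them before you deploy. The following is a minimal sketch of such a check. The helper names major_at_least and check_prereqs are made up for this example, and how you obtain the two version strings (for instance by reading them off bosh env and bosh stemcells) is left to you.

```shell
#!/bin/bash
# major_at_least VERSION MINIMUM
# Succeeds when the major (first dot-separated) component of VERSION is
# greater than or equal to MINIMUM.
# Hypothetical helper, not part of the bosh CLI.
major_at_least() {
  local version="$1" minimum="$2"
  local major="${version%%.*}"   # "263.1.0" -> "263", "3445.11" -> "3445"
  case "$major" in
    ''|*[!0-9]*) echo "cannot parse version: $version" >&2; return 2 ;;
  esac
  [ "$major" -ge "$minimum" ]
}

# check_prereqs DIRECTOR_VERSION STEMCELL_VERSION
# Thresholds come from the text above: director v263+, stemcell 3445+.
check_prereqs() {
  local director_version="$1" stemcell_version="$2"
  major_at_least "$director_version" 263 || {
    echo "director $director_version does not satisfy v263+" >&2
    return 1
  }
  major_at_least "$stemcell_version" 3445 || {
    echo "stemcell $stemcell_version does not satisfy 3445+" >&2
    return 1
  }
}
```

For example, check_prereqs 263.1.0 3445.11 succeeds, while check_prereqs 262.3.0 3445.11 prints a message on stderr and returns 1.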