After three years of using BOSH I’m still surprised that bosh cancel task is relatively ineffective. It doesn’t cancel the task immediately; it registers the request and patiently waits for the current task to reach a state where it can be cancelled safely. That is almost never what you want. You know something is wrong and you want to cancel the task NOW.
More importantly perhaps: you want to unlock the current deployment.
BOSH holds a lock on each deployment: only one task can run against a given deployment name at a time. So if you cannot cancel a task that holds a deployment's lock, you cannot run any further tasks against that deployment.
The rest of this article documents how to find and delete a lock.
Where is a BOSH lock?
It depends on how old your BOSH Director is. Versions of BOSH released since 2016 store locks in a Postgres database on the director. Older versions store them in Redis; directions for clearing those locks are further down.
Newer BOSH Directors
First, stop all the monit jobs on the director, then start only Postgres:
/:~$ sudo -i
[sudo] password for vcap: (hint: defined in your manifest, CredHub or `c1oudc0w`)
/:~# monit stop all
/:~# monit start postgres
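Before connecting to the database, it can help to confirm that monit actually reports Postgres as running. The function below is a minimal sketch that reads `monit summary` output on stdin; it assumes the `Process 'name'   status` line layout that monit prints on BOSH VMs, and the name `monit_process_status` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the monit status of one process.
# Reads `monit summary` output on stdin and assumes lines shaped like:
#   Process 'postgres'                  running
# Statuses can contain spaces (e.g. "not monitored"), so each line is split
# on the single quotes around the process name rather than on whitespace.
monit_process_status() {
  awk -F"'" -v want="$1" '
    $1 ~ /^Process / && $2 == want {
      status = $3
      sub(/^[ \t]+/, "", status)   # drop the padding before the status
      sub(/[ \t]+$/, "", status)   # and any trailing whitespace
      print status
    }'
}
```

For example, `monit summary | monit_process_status postgres` should print `running` once Postgres is up; empty output means monit does not know a process by that name.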
To connect to the database from the director, run:
/:~# /var/vcap/packages/postgres-9.4/bin/psql -U vcap bosh
To find the lock, query the locks table:
SELECT * FROM locks;
id | expired_at | name | uid
-----+----------------------------+-------------------------------------+-------------------------------------
1722 | 2017-07-13 17:55:14.075042 | lock:deployment:us-east-1-pr-shield | f3218bf0-ec1e-40f0-b843-4ba9b8779954
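If you would rather script the lookup than read the table by eye, psql can print bare rows: `-A` selects unaligned output (fields separated by `|`), `-t` drops the header and row-count footer, and `-c` runs a single command. The sketch below filters that output for one deployment's lock; it assumes the `lock:deployment:<name>` naming visible in the row above, and `lock_ids_for_deployment` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the id of every lock row that belongs to one
# deployment. Expects rows in "id|name" form on stdin, as produced by e.g.
#   psql -U vcap -At -c 'SELECT id, name FROM locks' bosh
# Lock names are assumed to follow the lock:deployment:<name> pattern.
lock_ids_for_deployment() {
  local wanted="lock:deployment:$1"
  # Exact match on the name column, so "foo" does not match "foo-bar".
  awk -F'|' -v wanted="$wanted" '$2 == wanted { print $1 }'
}
```

With the row shown above, `/var/vcap/packages/postgres-9.4/bin/psql -U vcap -At -c 'SELECT id, name FROM locks' bosh | lock_ids_for_deployment us-east-1-pr-shield` would print `1722`.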
Delete the offending row, substituting your own id:
DELETE FROM locks WHERE id=1722;
The table is normally empty when no tasks are running.
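Deleting by name rather than by id saves copying numbers between queries. A minimal sketch that builds such a statement, again assuming the `lock:deployment:<name>` naming shown above; `delete_lock_sql` is a hypothetical helper, and it doubles any single quotes so the name stays a valid SQL string literal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit a DELETE statement for one deployment's lock.
# Assumes lock names follow lock:deployment:<deployment-name>.
delete_lock_sql() {
  local name="lock:deployment:$1"
  # Escape single quotes by doubling them (standard SQL string quoting).
  name=$(printf '%s' "$name" | sed "s/'/''/g")
  printf "DELETE FROM locks WHERE name = '%s';\n" "$name"
}
```

psql reads commands from stdin when it is not attached to a terminal, so the output can be piped straight in, e.g. `delete_lock_sql us-east-1-pr-shield | /var/vcap/packages/postgres-9.4/bin/psql -U vcap bosh`.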
Now quit psql with \q and start the monit jobs again; the lock should be clear:
monit start all
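Before retrying the deployment, it is worth checking that every job came back. A minimal sketch, assuming the same `Process 'name'   status` layout of `monit summary` output; `monit_not_running` is a hypothetical helper that lists every process whose status is anything other than `running`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list processes that monit does not report as running.
# Reads `monit summary` output on stdin, assuming lines such as:
#   Process 'director'                  running
# Prints one process name per line; no output means everything is running.
monit_not_running() {
  awk -F"'" '
    $1 ~ /^Process / {
      status = $3
      sub(/^[ \t]+/, "", status)
      sub(/[ \t]+$/, "", status)
      if (status != "running") print $2
    }'
}
```

Jobs pass through intermediate states while starting, so if `monit summary | monit_not_running` lists anything straight after `monit start all`, give it a few seconds and run it again.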
Older BOSH Directors
Older BOSH Directors used Redis for locks. To find the Redis location and password, look in the Director’s configuration:
$ grep -A3 redis /var/vcap/jobs/director/config/director.yml
redis:
  host: 127.0.0.1
  port: 25255
  password: redis
logging:
  level: info
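To reuse those values in a script instead of copying them by hand, a small awk filter can pull one key out of the `redis:` block. This is a sketch, not a YAML parser: it assumes `redis:` sits at the top level with its keys indented beneath it, prints the raw value (quotes included, if any), and `redis_setting` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one setting (host, port, password) from the
# top-level redis: block of a director.yml read on stdin.
# Not a real YAML parser: it only handles the simple indented layout
#   redis:
#     host: 127.0.0.1
redis_setting() {
  awk -v key="$1" '
    /^redis:/ { in_redis = 1; next }    # entering the redis block
    /^[^ \t]/ { in_redis = 0 }          # any other top-level key ends it
    in_redis && $1 == key ":" {
      sub(/^[ \t]*[^:]+:[ \t]*/, "")    # strip the "  key: " prefix
      print
      exit
    }'
}
```

For example, `port=$(redis_setting port < /var/vcap/jobs/director/config/director.yml)` would set `port` to `25255` for the configuration shown above.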
For a single-VM BOSH, Redis runs on the same host, on port 25255.
To connect to Redis, add redis-cli to your $PATH and pass the port and password from the config:
export PATH=$PATH:/var/vcap/packages/redis/bin
redis-cli -p 25255 -a redis
To find the lock, look up all current locks:
> keys "lock:*"
1) "lock:deployment:my-locked-deployment"
There it is – the “deployment” lock for deployment “my-locked-deployment”.
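When its output goes to a pipe rather than a terminal, redis-cli prints bare values one per line, without the `1)` numbering and quotes (`--raw` forces the same format). That makes it easy to list which deployments are locked. A minimal sketch assuming the `lock:deployment:<name>` key naming shown above; `locked_deployments` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn raw lock keys into deployment names.
# Expects one key per line on stdin, e.g. from
#   redis-cli -p 25255 -a redis --raw keys 'lock:*'
# Keys that are not deployment locks are ignored.
locked_deployments() {
  sed -n 's/^lock:deployment://p'
}
```

Each name it prints corresponds to a key of the form `lock:deployment:<name>` that can be passed to `del`.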
Delete it with the Redis del command:
> del lock:deployment:my-locked-deployment
(integer) 1