When Cloud Foundry is running out of resources (which shows up as apps failing to stage or start), you may want to list the available memory and disk on your DEA/runner VMs. In this blog post we will use the same tools as in my previous blog post, but instead of using the Cloud Controller API as our source of truth, we will use the NATS message bus.
Let's start by installing some dependencies:
sudo gem install nats --pre && sudo apt-get install socat
We will also be using jq, which was covered in more depth here.
To connect to NATS we need connection details, which can be extracted from your Cloud Foundry BOSH deployment manifest using the following snippet (it uses yaml2json, which was introduced here).
MBUS=`cat __path_to_deployment_manifest__ | yaml2json | jq -r '.properties.nats | "nats://\(.user):\(.password)@\(.address):\(.port)"'`
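To see what the jq filter produces without touching a real deployment, here is a minimal sketch that runs it against a hand-written JSON fragment. The user, password, address and port values are made up for illustration, and the `nats_url` helper name is hypothetical; only the filter itself is taken from the snippet above.

```shell
# Hypothetical manifest fragment, already converted to JSON (all values are made up).
manifest_json='{"properties":{"nats":{"user":"nats","password":"secret","address":"10.0.0.5","port":4222}}}'

# Same jq filter as in the snippet above: build a nats:// URL from the nats properties.
nats_url() {
  jq -r '.properties.nats | "nats://\(.user):\(.password)@\(.address):\(.port)"'
}

MBUS=$(printf '%s' "$manifest_json" | nats_url)
echo "$MBUS"
```

If one of the properties is missing from your manifest, jq interpolates it as `null`, so it is worth eyeballing `$MBUS` before using it.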
The message we are interested in is dea.advertise, so let's subscribe to it using nats-sub:
> nats-sub -s $MBUS dea.advertise
Listening on [dea.advertise]
[#1] Received on [dea.advertise] : '{"id":"1-bb1caa3c08324dbebb24552b76980a1f","stacks":["lucid64","cflinuxfs2"],"available_memory":47488,"available_disk":296748,"app_id_to_count":{"844caa81-a453-40c0-a511-827329f2c9b4":1,"8c398c22-c25b-403a-b67a-123b344ddff9":1,"6e45aa45-e414-4cf6-80b3-0e89b82bcb31":1},"placement_properties":{"zone":"z1"}}'
[#2] Received on [dea.advertise] : '{"id":"3-4aa6d126ddd44d3d96aba32e5be3085c","stacks":["lucid64","cflinuxfs2"],"available_memory":45056,"available_disk":294340,"app_id_to_count":{"5d8addfa-86ab-4ed2-b3c9-c793a2a0b408":1,"af51c7fa-62dc-46b1-b78b-f89ef30c3191":1,"8cdd2bd2-79f4-4060-ac61-596c8ea794b5":1,"dd8689d3-71fc-4eee-85f7-79769867d387":1,"5fac5461-cb20-465d-b572-b28111092509":1},"placement_properties":{"zone":"z1"}}'
We can use the -r (raw) flag to make it easy to extract the relevant information with jq. In addition to that flag, we will also wrap nats-sub in socat to overcome stdout buffering issues:
> socat EXEC:"nats-sub -r -s '$MBUS' 'dea.advertise'",pty,ctty STDIO | jq -r '"\(.id) disk:\(.available_disk) memory:\(.available_memory)"'
1-bb1caa3c08324dbebb24552b76980a1f disk:296748 memory:47488
0-a4da6e64cadf49c6ba21780676a2201a disk:293136 memory:44032
2-e9feaabb2d584b029f36a1cd2dfc6cfb disk:297592 memory:47104
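The jq format string can be tried offline as well. The sketch below feeds it the first dea.advertise message from the nats-sub output above as a single raw JSON line, which is the shape the -r flag emits; the `format_dea` helper name is only for illustration.

```shell
# First dea.advertise message from the output above, as one raw JSON line.
msg='{"id":"1-bb1caa3c08324dbebb24552b76980a1f","stacks":["lucid64","cflinuxfs2"],"available_memory":47488,"available_disk":296748,"app_id_to_count":{"844caa81-a453-40c0-a511-827329f2c9b4":1,"8c398c22-c25b-403a-b67a-123b344ddff9":1,"6e45aa45-e414-4cf6-80b3-0e89b82bcb31":1},"placement_properties":{"zone":"z1"}}'

# Same filter as above: one line per message with id, free disk and free memory.
format_dea() {
  jq -r '"\(.id) disk:\(.available_disk) memory:\(.available_memory)"'
}

line=$(printf '%s\n' "$msg" | format_dea)
echo "$line"
```

Working against a captured message like this makes it easier to iterate on the filter before wiring it back into the socat pipeline.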
While we are at it, let's also add the number of apps per DEA (the sum of its app_id_to_count values) using jq's reduce:
> socat EXEC:"nats-sub -r -s '$MBUS' 'dea.advertise'",pty,ctty STDIO | jq -r '"\(.id) disk:\(.available_disk) memory:\(.available_memory) apps:\(reduce .app_id_to_count[] as $i (0; . + $i))"'
1-bb1caa3c08324dbebb24552b76980a1f disk:296748 memory:47488 apps:3
0-a4da6e64cadf49c6ba21780676a2201a disk:293136 memory:44032 apps:6
2-e9feaabb2d584b029f36a1cd2dfc6cfb disk:297592 memory:47104 apps:2
3-4aa6d126ddd44d3d96aba32e5be3085c disk:294340 memory:45056 apps:5
1-bb1caa3c08324dbebb24552b76980a1f disk:296748 memory:47488 apps:3
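The reduce expression walks over the values of app_id_to_count and adds them to an accumulator that starts at 0. A small sketch on a trimmed copy of the first message shows this in isolation; the `count_apps` helper name and the empty-DEA input are assumptions made for the example.

```shell
# Trimmed copy of the first dea.advertise message shown earlier.
msg='{"id":"1-bb1caa3c08324dbebb24552b76980a1f","app_id_to_count":{"844caa81-a453-40c0-a511-827329f2c9b4":1,"8c398c22-c25b-403a-b67a-123b344ddff9":1,"6e45aa45-e414-4cf6-80b3-0e89b82bcb31":1}}'

# Sum the per-app counts, starting the accumulator at 0.
count_apps() {
  jq -r 'reduce .app_id_to_count[] as $i (0; . + $i)'
}

apps=$(printf '%s' "$msg" | count_apps)
echo "apps:$apps"

# Hypothetical DEA with no apps: the result is still 0 because the accumulator starts at 0.
empty=$(printf '%s' '{"app_id_to_count":{}}' | count_apps)
echo "apps:$empty"
```

A shorter alternative would be `[.app_id_to_count[]] | add`, but jq's add returns null for an empty array, so a DEA without apps would print apps:null instead of apps:0.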
We are not interested in a continuous stream of messages; instead we want a point-in-time snapshot of the current state of the world. So let's use timeout (to kill nats-sub) in combination with sort, uniq and column to make a nice list:
> socat EXEC:"timeout 6 nats-sub -r -s '$MBUS' 'dea.advertise'",pty,ctty STDIO | jq -r '"\(.id) disk:\(.available_disk) memory:\(.available_memory) apps:\(reduce .app_id_to_count[] as $i (0; . + $i))"' | sort -n | uniq | column -s" " -t
0-a4da6e64cadf49c6ba21780676a2201a disk:293136 memory:44032 apps:6
1-bb1caa3c08324dbebb24552b76980a1f disk:296748 memory:47488 apps:3
2-e9feaabb2d584b029f36a1cd2dfc6cfb disk:297592 memory:47104 apps:2
3-4aa6d126ddd44d3d96aba32e5be3085c disk:294340 memory:45056 apps:5
The timeout should be greater than dea_next.advertise_interval_in_seconds, which defaults to 5 seconds.
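One thing to keep in mind is that uniq only collapses identical lines. If a DEA advertises twice within the timeout window and its numbers changed in between, both lines end up in the list. A possible workaround, sketched here with awk on made-up input (shortened ids, and a hypothetical `latest_per_dea` helper), is to keep only the last line seen per DEA id:

```shell
# Keep only the last line seen for each DEA id (the first field), then sort by id.
latest_per_dea() {
  awk '{ last[$1] = $0 } END { for (id in last) print last[id] }' | sort -n
}

# Hypothetical input: DEA 1 advertises twice, with different free memory and app count.
sample='1-bb1c disk:296748 memory:47488 apps:3
0-a4da disk:293136 memory:44032 apps:6
1-bb1c disk:296748 memory:46464 apps:4'

snapshot=$(printf '%s\n' "$sample" | latest_per_dea)
echo "$snapshot"
```

In the pipeline above this would take the place of `sort -n | uniq`, with column still applied afterwards.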
The final snippet, optimized for watch, looks like this:
watch "socat EXEC:'timeout 6 nats-sub -r -s \"$MBUS\" \"dea.advertise\"',pty,ctty STDIO | jq -r '\"\(.id) disk:\(.available_disk) memory:\(.available_memory) apps:\(reduce .app_id_to_count[] as \$i (0; . + \$i))\"' | sort -n | uniq | column -s' ' -t"