Earning Your Trust
Cloud Foundry and BOSH make an excellent platform and set of tools, and they continue to get better. But all software has shortcomings, and keeping an eye on present and future gaps in the platform is on the mind of most of our clients. This is the first in a series of posts about how you can help Cloud Foundry earn your trust.
Who will watch the watcher?
While Cloud Foundry keeps an eye on your applications with tools like loggregator, if you want to keep an eye on Cloud Foundry itself you should probably start with BOSH. The BOSH command line tools alone will get you quite a bit of information. In a pinch you can even parse the output of bosh status, but that isn’t likely to be a stable long-term solution.
The key here is to understand how BOSH communicates the status of individual deployed components and to find a way to hook into it.
NATS your average message bus
Fortunately, BOSH communicates via NATS and provides a mechanism for notifying external services about alerts and health checks. BOSH agents on deployed job VMs communicate with the BOSH director via this bus. There are a couple of types of messages that we might care about. We will focus on alert and heartbeat events.
A heartbeat event looks something like this:
{"job":"etcd_z1",
 "index":0,
 "job_state":"running",
 "vitals":{
  "cpu":{"sys":"0.0","user":"0.0","wait":"0.6"},
  "disk":{
   "ephemeral":{"inode_percent":"0","percent":"0"},
   "persistent":{"inode_percent":"0","percent":"0"},
   "system":{"inode_percent":"36","percent":"46"}},
  "load":["0.00","0.03","0.05"],
  "mem":{"kb":"123444","percent":"3"},
  "swap":{"kb":"0","percent":"0"}
 }
}
BOSH job agents put heartbeat notifications on the bus at 60-second intervals. The heartbeat payload contains the status of the job (running, failed, etc.) as well as a bunch of environment details (CPU, memory, disk, etc.). An alert event is one of several predefined warnings, some of which Dr. Nic has recently outlined here with an example about monitoring persistent disk.
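To make the heartbeat payload a little more concrete, here is a minimal sketch of how one might turn a single heartbeat into a list of warnings. The helper name `heartbeat_warnings` and both thresholds are made up for illustration; only the field names come from the sample heartbeat above. Note that the vitals arrive as strings, so they need converting before you compare them.

```ruby
require "json"

# Hypothetical thresholds, chosen only for illustration.
DISK_PERCENT_LIMIT = 80
MEM_PERCENT_LIMIT  = 90

# Returns a list of human-readable warnings for one heartbeat payload.
# (heartbeat_warnings is an illustrative helper, not part of BOSH.)
def heartbeat_warnings(raw_json)
  beat = JSON.parse(raw_json)
  name = "#{beat['job']}/#{beat['index']}"
  warnings = []

  # Anything other than "running" is worth a look.
  warnings << "#{name} is #{beat['job_state']}" unless beat["job_state"] == "running"

  vitals = beat.fetch("vitals", {})

  # Vitals are strings in the payload, so convert before comparing.
  vitals.fetch("disk", {}).each do |disk, usage|
    percent = usage["percent"].to_i
    warnings << "#{name} #{disk} disk at #{percent}%" if percent >= DISK_PERCENT_LIMIT
  end

  mem = vitals.dig("mem", "percent").to_i
  warnings << "#{name} memory at #{mem}%" if mem >= MEM_PERCENT_LIMIT

  warnings
end

# Example: a trimmed-down heartbeat with a failing job and a full system disk.
sample = '{"job":"etcd_z1","index":0,"job_state":"failing",' \
         '"vitals":{"disk":{"system":{"inode_percent":"36","percent":"85"}},' \
         '"mem":{"kb":"123444","percent":"3"}}}'
puts heartbeat_warnings(sample)
```

A healthy heartbeat like the one shown earlier yields an empty list, so in practice you would only act when the list is non-empty.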
Get to the hook already!
So how do we get us some of that juicy job detail? Fortunately there is a built-in Health Monitor and we can hook into it. Enter BOSH Monitor. BOSH Monitor uses EventMachine to run plugins that fire whenever a NATS message is read from the bus. While you can subscribe to BOSH’s NATS server directly, the supported method is to use one of the provided plugins or to contribute your own to the project.
If you’d like to get a look at the NATS messages on your BOSH deployment, you can use something like the Ruby script below.
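#!/usr/bin/env ruby
require "nats/client"
require "json"
require "net/http"
# Shut down cleanly on Ctrl-C or kill.
["TERM", "INT"].each { |sig| trap(sig) { NATS.stop } }
# IP of BOSH, read from the environment; the default points at a local NATS.
BOSH_IP   = ENV.fetch("BOSH_IP", "127.0.0.1")
NATS_USER = 'nats'
NATS_PASS = 'nats'
NATS_PORT = 4222
def nats_connection
  "nats://#{NATS_USER}:#{NATS_PASS}@#{BOSH_IP}:#{NATS_PORT}"
end
NATS.start(:uri => nats_connection) do
  # '>' is the NATS wildcard that matches every subject.
  NATS.subscribe('>') do |msg, reply, sub|
    # Do something with the msg; here we just print it with its subject.
    puts "#{sub}: #{msg}"
  end
end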
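As one idea for what to do with each message inside the subscribe block, the sketch below builds a one-line summary. It is only an illustration: `summarize` is a made-up helper name, and it recognises heartbeats purely by the presence of the `vitals` key seen in the sample heartbeat earlier, without assuming anything about the layout of other messages or about subject names.

```ruby
require "json"

# Builds a one-line summary for a message taken off the bus.
# Heartbeats are recognised by their "vitals" key; anything else is
# passed through unchanged, since its layout is not assumed here.
# (summarize is an illustrative helper, not part of BOSH or NATS.)
def summarize(msg, subject)
  data = JSON.parse(msg)
  if data.is_a?(Hash) && data.key?("vitals")
    load_avg = Array(data.dig("vitals", "load")).first
    "[#{subject}] heartbeat #{data['job']}/#{data['index']} " \
      "state=#{data['job_state']} load=#{load_avg}"
  else
    "[#{subject}] #{msg}"
  end
rescue JSON::ParserError
  # Not every message on the bus has to be JSON.
  "[#{subject}] (non-JSON payload) #{msg}"
end

# Inside the subscribe block you could then write:
#   puts summarize(msg, sub)
```

Keeping the summary logic in a plain function like this also means you can try it out against saved messages without a running NATS server.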
In the long term, if you decide you still want to read directly from NATS, the supported method is to use the BOSH Monitor NATS events plugin. This plugin forwards events from the BOSH NATS to a NATS server of your own. You can get details on installing your own NATS server at nats.io.
The configuration options for the NATS events plugin are shown below.
properties:
  hm:
    event_nats_enabled: true
properties:
  # Send events via NATS message bus
  event_nats_enabled:
    description: Enable event NATS message bus plugin
    default: false
  event_nats.address:
    description: Address of the event NATS message bus to connect to
  event_nats.port:
    description: Port of the event NATS message bus port to connect to
  event_nats.user:
    description: User for the event NATS message bus connection
  event_nats.password:
    description: Password for event NATS message bus connection
This should get you started. In the second post in this series, Keeping an eye on Cloud Foundry Pt. 2, we’ll take you through building a BOSH Monitor plugin.