When I joined the team at Stark & Wayne, one of my first tasks was to create a BOSH release for MariaDB. I found the Cloud Foundry documentation excellent for getting the big picture, but I still managed to hit a few sticking points along the way. Fortunately, Stark & Wayne has deployed many BOSH releases, and since this was one of my first, we thought it would be a good excuse to write a beginner-friendly overview of the process that might augment the existing documentation.
So how do I develop a BOSH release without a server handy? That is where BOSH-lite comes in. BOSH-lite is a local development environment for BOSH. BOSH-lite will create a local BOSH director that will of course be bound by the resources available to your local development machine.
You can learn how to install BOSH-lite
Once I had BOSH-lite installed I found that there can be a lot of setup and teardown. You’ll want a good rebuild script to stand up your BOSH-lite environment. Here is the script I started with; it is a minimal set of commands to get the job done.
#tear down/build up from Vagrantfile in bosh-lite repository
vagrant destroy -f
vagrant up
#target the BOSH CLI at the new VM we've created. The IP is the default specified in the BOSH-lite Vagrantfile
bosh target 192.168.50.4
bosh login
#Add the stemcell you would like to target; you can view available stemcells with `bosh public stemcells`
bosh download public stemcell bosh-stemcell-60-warden-boshlite-ubuntu-lucid-go_agent.tgz
bosh upload stemcell bosh-stemcell-60-warden-boshlite-ubuntu-lucid-go_agent.tgz
#make sure routing rules are set up, otherwise you won't be able to access the VMs beyond the bosh director
./scripts/add-route
Now that I was ready to get started, I needed to answer a few questions. Fortunately the Stark & Wayne team created a list of questions you should answer before you begin a BOSH release. If you can’t answer these questions, you can’t finish the release.
What packages are required?
- Is there a usable binary available?
- Where is the source code located?
- What are the compilation requirements on target platform?
What will your Jobs look like?
- How to run process? e.g. start, stop, restart. Control script? Helpful wrapper script? ({name}_ctl)
- How to daemonize?
- How to configure pidfile? (/var/vcap/sys/run/{name})
- How to set logs dir? (/var/vcap/sys/log/{name})
- How to set storage (db) dir? (/var/vcap/store/{name})
- What do config files look like?
- What is parameterizable? (jobs/{name}/spec -> properties: …)
- How to cluster? Leader (Master) vs Followers (Slaves)
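To make the job questions a little more concrete, here is a minimal sketch of how a `{name}_ctl` wrapper might tie the pidfile, log, and store directories together. The job name `myservice`, the `sleep` placeholder standing in for the real daemon, and the `VCAP_ROOT` override are all assumptions for illustration; this is not what bosh-gen generates.

```shell
#!/bin/bash
# Hypothetical control script sketch for a job called "myservice".
# VCAP_ROOT is an assumed override so the sketch can be tried outside a BOSH VM.

ctl() {
  job_name="${JOB_NAME:-myservice}"
  vcap_root="${VCAP_ROOT:-/var/vcap}"
  run_dir="$vcap_root/sys/run/$job_name"    # pidfile lives here
  log_dir="$vcap_root/sys/log/$job_name"    # stdout/stderr logs
  store_dir="$vcap_root/store/$job_name"    # persistent data
  pidfile="$run_dir/$job_name.pid"

  case "$1" in
    start)
      mkdir -p "$run_dir" "$log_dir" "$store_dir"
      # Placeholder process; a real job would launch its daemon here.
      sleep 300 >>"$log_dir/$job_name.stdout.log" 2>>"$log_dir/$job_name.stderr.log" &
      echo $! > "$pidfile"
      ;;
    stop)
      if [ -f "$pidfile" ]; then
        kill "$(cat "$pidfile")" 2>/dev/null || true
        rm -f "$pidfile"
      fi
      ;;
    *)
      echo "Usage: ${job_name}_ctl {start|stop}" >&2
      return 1
      ;;
  esac
}

# Dispatch only when a command was given, e.g. `myservice_ctl start`.
if [ $# -gt 0 ]; then ctl "$@"; fi
```

Writing the pidfile from the script itself gives monit a known file to watch for the process.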
Once I answered these questions I was ready to go. With some research and a newfound expertise on the inner workings of MariaDB, I jumped right in by manually creating files based on examples I had found online. You might guess how well that worked out.
After struggling with missing files and missing entries in those files, I learned a valuable lesson. Actually, Dr. Nic had to tell me three times before I listened. So do him a favor and listen the first time. Use bosh-gen and ./templates/make_manifest. You should not create manifest or package/job skeletons by hand. You will forget stuff. If the generator’s output differs from an example you’ve found online and you have the latest version of the bosh-gen gem, the generator is probably the newer example and you probably want to follow it.
#Assuming you already have ruby installed, install the bosh-gen gem
gem install bosh-gen
#Create your new release; choose s3 to store assets/blobs
bosh-gen new <release-name>-boshrelease
Now that I had the power to generate packages, jobs and manifests with ease, things went much more smoothly.
#create your package in packages/<package-name>
bosh-gen package <package-name>
My first impression was that the packaging aspect is not unlike creating a package for apt, yum, or homebrew. When troubleshooting my release it helped quite a bit to pore over configuration options looking for anything that specified directory paths, making sure I provided them with updated /var/vcap.. entries. Often these paths were set to defaults and did not show up in an out-of-the-box config file. If you are having problems with file permissions, or with processes accessing files or directories that do not exist, look for compile, runtime, or configuration options that specify directories for your service/process.
Another issue I ran into when setting up paths was that, even though I was setting the paths appropriately in my config template, I was still having problems. I realized that MariaDB was not using the config file I supplied. The nature of a BOSH release means that you are going to have lots of things in paths that your service might not expect. It took a few tries to get things squared away here. Check which of the commands available to your service need to have the config file path specified. In my case I was able to provide a compile option pointing to an alternate config file path.
I had some trouble getting my head around where the action was happening until I realized that the packaging scripts run on a compile node and you won’t have access to persistent data there. For example, the initialization scripts for MySQL, MariaDB, or PostgreSQL cannot be run at compile time. Be aware of which of your scripts are going to run where: packaging happens on compile nodes and jobs run on the BOSH VM.
One of the biggest mistakes I made early in the process came from a habit I picked up using interpreted programming languages: making small changes and then running my code constantly. Remember that any change to a packaging script will force a recompile, even if the change is trivial. For packages with a long compile time you want to avoid making lots of trivial changes or formatting adjustments while you are in development. You’ll spend a lot of time waiting for compilation to finish. On the other hand, if you focus on getting your packaging scripts right first, your compiled package will be cached and further bosh deployments will be quick.
I also lost some time trying to figure out why some of my packages did not seem to make the trip from the compile node to my service VM. If you need a package to be available in /var/vcap/packages, ensure you have added it as a job dependency under packages: in jobs/{name}/spec. It will not get copied over otherwise.
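A quick way to catch a missing job dependency before deploying might look like the sketch below. It assumes the usual release layout of packages/<name>/ and jobs/<job>/spec; the helper name `unreferenced_packages` is hypothetical, and the check is a crude text match for `- <name>` lines rather than a real YAML parse.

```shell
# Hypothetical helper: list packages that no job spec mentions.
# Assumes the usual release layout: packages/<name>/ and jobs/<job>/spec.
# This is a plain text match for "- <name>" lines, not a YAML parser.
unreferenced_packages() {
  release_dir="${1:-.}"
  for pkg_dir in "$release_dir"/packages/*/; do
    [ -d "$pkg_dir" ] || continue
    pkg="$(basename "$pkg_dir")"
    if ! grep -qsE "^[[:space:]]*-[[:space:]]*${pkg}[[:space:]]*$" "$release_dir"/jobs/*/spec; then
      echo "$pkg"
    fi
  done
}

# Usage, from the root of the release directory:
#   unreferenced_packages .
```

A package that shows up here is not necessarily a mistake, since it may only be needed at compile time by another package, but it is worth a second look if a job expects to find it on the service VM.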
#create your job in jobs/<job-name>
bosh-gen job <job-name>
Bosh-gen will create a monit file, a spec file and a suite of templates for you. I initially made the mistake of editing the monit file directly. Fortunately this time Dr. Nic only had to tell me once that it was a bad idea. You should probably leave the monit file alone and focus on the ctl script in jobs/
Once the necessary files and configuration were in place it was time to try a deployment. In the process I found myself typing these commands a lot.
bosh create release --force && bosh -n upload release && bosh -n deploy
It was helpful to alias them to something short and memorable with a separate alias for bosh -n deploy as you will often run the deploy alone.
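As a sketch, the aliases could look something like this. The names `bcud` and `bdep` are made up for illustration; pick whatever is memorable for you.

```shell
# Hypothetical alias names; put these in ~/.bashrc or similar.
# Full cycle: create a dev release, upload it, and deploy without prompts.
alias bcud='bosh create release --force && bosh -n upload release && bosh -n deploy'
# Deploy only, since the deploy step is often run on its own.
alias bdep='bosh -n deploy'
```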
One of the first hurdles I had with the release is that there isn’t yet a built-in way to handle ‘run once’ requirements, for example a database initialization script that needs to populate some tables in the data files on your persistent store. If you need to keep track of run once requirements, remember that your service VMs are mostly transient and any persistent information will need to be in /var/vcap/store or in another persistent service. Fortunately the BOSH team is working on a new feature called errands that will accomplish run once tasks.
Now that I had my release together the fun began. The first error I encountered was job_name/0 is not running. To me this implied that everything was getting deployed correctly but that the jobs were not starting. It turns out that this was not the case.
I tried to ssh in to the BOSH VMs, wondered whether my BOSH director was misbehaving, and manually reviewed the ./scripts/add-route script to make sure it was doing what I thought it was doing. Only then did I realize that the BOSH VMs had not actually been created and the deployment had failed at a higher level than the point where it tries to start the service. So this error does not necessarily mean that the job is deployed but not starting. It is also possible that the BOSH virtual machines are not even available.
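Until something like errands is available, one way to approximate run once behavior is a marker file on the persistent disk. The sketch below is only an illustration under that assumption: the `run_once` helper, the `myservice` directory name, and the marker naming scheme are all hypothetical.

```shell
# Hypothetical helper: run a command only if its marker file is absent.
# STORE_DIR is assumed to point at the job's persistent disk directory.
STORE_DIR="${STORE_DIR:-/var/vcap/store/myservice}"

run_once() {
  marker="$STORE_DIR/.$1.done"
  shift
  if [ -e "$marker" ]; then
    return 0   # marker exists, so the task already ran successfully
  fi
  mkdir -p "$STORE_DIR"
  # Record success only if the command itself succeeded.
  "$@" && touch "$marker"
}

# Usage (init_db.sh is a placeholder for your own initialization script):
#   run_once init_db ./init_db.sh
```

Because the marker lives under /var/vcap/store, it survives the service VM being recreated, while a failed run leaves no marker and will be retried on the next start.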
Once I got through the compile and packaging process, the jobs were still not starting and I needed to watch the logs. I figured out I could tail the logs on a running BOSH VM.
 #connect to the appropriate vm
 bosh ssh
 #tail all the logs
 tail -f /var/vcap/sys/log/**/*
Initially not all the logs were being created in that location, and I needed to add some entries to the config file to make sure the logs ended up in the right place.
Once I got that sorted I found some good information on finding and debugging logs here
Of the larger lessons learned along the way I think one of the most important was "If in doubt, start over". While I was poking around on the running BOSH VMs I would change things for testing or investigative purposes. If you’ve been poking around and things aren’t working, remember that the BOSH service VMs are supposed to be transient. Tearing them down and starting over is a great test of the clean bootstrap behavior of your release and it may just get rid of an odd issue or two that you’ve inadvertently caused.
 bosh delete deployment <deployment-name> --force
After a few odd problems and a clean bootstrap of my release I was finally getting somewhere. The release would deploy fine, monit would stop and start the processes correctly, and all seemed to be well. Now it was time to do some testing, and it turned out that replication wasn’t working correctly. When I proudly showed my “error free” release to the team they congratulated me on getting the easy part done and wished me luck with the replication problems. The hardest-learned lesson was that getting a release that runs without failing is just the beginning.