Stark & Wayne

Demystifying Cloud Foundry's Diego

This post has a specific purpose: to take Cloud Foundry's new runtime environment and break it down into parts that hopefully make it a bit easier to understand. Diego was, for me, something that seemed a bit obscure until I dug into its internals. The documentation gave me a rough idea of the runtime environment, but I wanted to see if I could describe it in a plain, easy-to-understand manner. I'm hoping this post finds other people like me, largely beginners to Cloud Foundry, and helps them hit the ground running with what is a pretty cool way to host Cloud Foundry applications.

First off

Diego's meant to be a replacement for Cloud Foundry's old runtime environment, the DEA (Droplet Execution Agent). In fact, the original purpose was simply to rewrite the DEA, which is written mostly in Ruby, in the more modern Go.

DEA-GO, Coral

DEA-Go, Carl. DEA-go.

Containers

Diego is, first and foremost, an environment for containers. That said, it doesn't use Docker. Instead it uses garden, which follows the Open Container Initiative guidelines for hosting containers. Garden's documentation is mostly its Go package documentation, which helps a lot if you know the format of godocs. For the most part, you won't ever touch garden directly; it is abstracted behind Diego's other components. Since garden follows the open specification, you should be able to import Docker images just as easily as if they were in the native Docker environment.

If you're used to DEA, garden gives you a few advantages out of the box. First off, garden supports more operating systems than DEA's warden. Diego and garden also make it a lot easier to ssh into the containers involved in your application.

Management of Containers

From the standpoint of your application, here's what you need to know: in Diego, you now have the choice to push a one-use function (a Task) or a more traditional application that stays resident (a Long-Running Process, or LRP). A good example of an LRP is a web server that you need always listening for traffic, while a Task may be something like a database migration as part of a release, or a job that examines recent data for something specific. Before, in DEA, you really only pushed processes that were expected to stay resident.

Diego's brain and health monitor make sure these workloads are balanced as well as possible, for example by spreading CPU-intensive work across virtual machines or by balancing memory use. Some of this used to be done by the cloud controller; now the Diego environment handles it itself.
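To make the Task/LRP split and the balancing idea a bit more concrete, here is a minimal sketch in Python. It is not Diego's actual placement logic, just an illustration of the idea. The names `Cell`, `Workload`, and `place` are made up for this example, and the rule "pick the VM with the most free memory" is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A hypothetical VM that hosts containers (name invented for this sketch)."""
    name: str
    memory_mb: int
    used_mb: int = 0
    workloads: list = field(default_factory=list)

    @property
    def free_mb(self) -> int:
        return self.memory_mb - self.used_mb


@dataclass
class Workload:
    name: str
    memory_mb: int
    kind: str  # "task" (runs once, then exits) or "lrp" (expected to stay up)


def place(workload: Workload, cells: list) -> Cell:
    """Assumed rule: put the workload on the cell with the most free memory."""
    candidates = [c for c in cells if c.free_mb >= workload.memory_mb]
    if not candidates:
        raise RuntimeError(f"no cell has room for {workload.name}")
    best = max(candidates, key=lambda c: c.free_mb)
    best.used_mb += workload.memory_mb
    best.workloads.append(workload.name)
    return best


cells = [Cell("cell-a", 1024), Cell("cell-b", 1024)]
web = place(Workload("web-server", 512, "lrp"), cells)
migration = place(Workload("db-migration", 256, "task"), cells)
print(web.name, migration.name)
```

Because the web server takes up half of the first VM, the migration lands on the second one, which is the "spread things out" behavior described above in miniature.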

Getting a bit further into the trees, pushing an application to Cloud Foundry using Diego would:

So why all the complexity?

A mature deployment in a virtualized environment has several factors that keep it stable and productive. One, the service is highly available, meaning that if one piece of it dies, the service is able to recover effectively. Two, there is monitoring in place to ensure that if one piece of it dies, it will recover or be replaced quickly, often without human involvement. Three, capacity issues (CPU, disk space, memory, or another factor) are handled in a proactive manner whenever possible.

The Diego environment, therefore, is set up to have a state it considers good, and every piece of software that supports it works towards that good state. Health checking, balancing apps so that load is spread across the entire available environment, and deploying new applications in a way that preserves stability all keep Diego self-sustaining. The environment seems complex, but each part is a service with a specific goal that serves the greater goal.
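The "work towards the good state" idea can be sketched as a small reconciliation function. This is only an illustration of the pattern, not Diego's code: the function name `reconcile`, the app names, and the shape of the data are all assumptions made up for this example.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Compare desired and actual instance counts and list the corrective actions."""
    actions = []
    for app in sorted(set(desired) | set(actual)):
        want = desired.get(app, 0)
        have = actual.get(app, 0)
        if have < want:
            # Too few instances running: start the missing ones.
            actions.append(("start", app, want - have))
        elif have > want:
            # Too many (or no longer wanted): stop the extras.
            actions.append(("stop", app, have - want))
    return actions


desired = {"web-server": 3, "worker": 2}
actual = {"web-server": 1, "worker": 2, "old-app": 1}
actions = reconcile(desired, actual)
print(actions)
```

Run in a loop, a check like this keeps nudging the environment back to the state it considers good: once the actual counts match the desired ones, there is nothing left to do and the list of actions is empty.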