Case Study: A-Telco

Building 12 Platforms in a Hurry

A-Telco: a multibillion-dollar private telecommunications services company

Over the past few years, A-Telco has undergone a series of acquisitions that required several reorganizations, some downsizing because of duplicative skill sets, and consolidation of both its application development teams and its operations teams.

 

Challenge

This engagement required that 12 new platforms be created within a six-week window.

  1. After realizing the benefits of their six Pivotal Cloud Foundry (PCF) platforms already in operation, A-Telco wanted to increase their investment in PCF as a way to standardize and automate their operational processes as much as possible, all in support of application development and deployment.
  2. A-Telco customers were already committed to leasing space on the additional platforms six to eight weeks from the beginning of the engagement. There was no time to waste.
  3. Because of the aggressive schedule, it was decided that Stark & Wayne would stand up the platforms and then have A-Telco personnel be responsible for maintaining the new platforms once they were operational.

At the project’s start, A-Telco pulled individuals from a variety of existing operations teams, creating new PCF-centric teams that were charged with maintaining all of the PCF platforms. The formation of the new teams presented unique challenges that ranged from the subtlety of building camaraderie among colleagues new to one another to more critical aspects of operations, such as establishing standard approaches to platform maintenance and developing workable communications protocols teamwide.

A lack of staff skilled in managing PCF made the goal harder to achieve. The shortage of PCF expertise among team members could lead to a host of problems: gaps in operator oversight of the platform(s), failure to follow proper procedures, incorrect fixes, and poor communication within the team. In addition, most members of the newly formed PCF-centric teams were experienced primarily in the waterfall approach and had no experience with Agile development methods.

Lastly, A-Telco was holding to an aggressive timeline for standing up 12 additional platforms (6 PCF and 6 Pivotal Container Service (PKS)) within six weeks. This ambitious goal was necessary because A-Telco had already promised customers access to the new platform(s) by the stated due date, so finishing the project on time was critical to meeting its own customer commitments.

Solution

Two members of the Stark & Wayne team arrived onsite to facilitate standing up the 12 platforms (6 PCF and 6 PKS) within the required six-week time frame. Had a more generous timeline been available, Stark & Wayne engineers would have started with knowledge transfer to ensure that A-Telco operators were well trained in operating the new platforms. Because of the aggressive timeline, Stark & Wayne engineers instead decided up front to focus on writing scripts and setting up pipelines to automate as many processes as possible. This approach, while somewhat unorthodox, ensured that all 12 platforms would be up and running on time.

Once the vast majority of processes were automated, ensuring that the platforms would be up and running and deadlines would be met, Stark & Wayne engineers shifted their focus to training A-Telco operators to manage the PCF and PKS platforms reliably. Since the heavy lifting of automating PCF and PKS was complete, A-Telco operators could take a hands-on approach and assume control of the platform(s) for the remainder of the engagement.

With a “learning by doing” approach, problems can and will occur, especially when operators aren’t yet fully trained on proper procedures and are hesitant to reach out to unfamiliar colleagues on their team. One example is described below:

A problem occurred after an operator made what seemed to be a normal change to Terraform without telling the rest of the team. While the change worked fine for the first operator, his deviation from proper procedure could potentially have caused more serious problems down the road.

When Terraform was used a second time by a different operator, he noticed that the program was performing more work than his initial request required. At that point, the second operator stopped Terraform and reached out to the entire team, asking about any recent changes that were made over the past week(s).

The second operator’s research eventually uncovered the procedural errors made by the first operator during the last Terraform upgrade, which explained why the program was acting oddly and what might happen if left unchecked.
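The check the second operator made by eye, noticing that Terraform planned more work than the request called for, can also be sketched in code. The following is a minimal illustration, not something described as part of the A-Telco engagement. It assumes the plan has been exported to JSON (for example with `terraform show -json`) and loaded into a dictionary whose `resource_changes` entries carry an `address` and a `change.actions` list. The function name, the sample plan, and the resource addresses are all hypothetical.

```python
def unexpected_changes(plan, expected_addresses):
    """Return (address, actions) pairs for planned changes outside the expected set.

    `plan` is a dict shaped like Terraform's JSON plan output (assumed here);
    `expected_addresses` is the set of resources the operator meant to touch.
    """
    surprises = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # "no-op" and "read" entries do not modify infrastructure, so skip them.
        if all(action in ("no-op", "read") for action in actions):
            continue
        if rc["address"] not in expected_addresses:
            surprises.append((rc["address"], actions))
    return surprises


# Hypothetical plan data, made up for illustration only.
sample_plan = {
    "resource_changes": [
        {"address": "example_instance.worker", "change": {"actions": ["update"]}},
        {"address": "example_bucket.logs", "change": {"actions": ["no-op"]}},
        {"address": "example_network.platform", "change": {"actions": ["delete", "create"]}},
    ]
}

surprises = unexpected_changes(sample_plan, {"example_instance.worker"})
for address, actions in surprises:
    print(f"Unexpected change to {address}: {actions}")
```

A non-empty result is a cue to do what the second operator did: stop before applying and ask the team what has changed recently.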

While nothing permanently damaging occurred, this experience helped drive home the point that clear and open communication between operators is necessary for managing these platforms effectively. As A-Telco operators became more skilled in maintaining the PCF and PKS platforms, the likelihood of these types of problems steadily decreased.

Other cultural changes that were brought about by Stark & Wayne engineers included:

  1. Emphasizing the importance of transparent, Agile-based communication principles throughout all teams when maintaining PCF and PKS platforms.
  2. Steering cross-functional collaboration to be objective-oriented instead of task-oriented.
  3. Educating and training A-Telco engineers about best practices and working to eliminate any bad practices being used by individuals or teams.
  4. Helping to develop team dynamics through work and social activities, all of which facilitated better communication between groups.

At the end of the A-Telco engagement, the various teams were communicating better than ever before, the company had 6 new PCF and 6 new PKS platforms up and running, and it was able to honor its lease agreements with existing clients on those new platforms.