Efficient Resource Use at Lower Costs
This is how we supported Red Hat in identifying how best to utilize their resources and optimize costs associated with Amazon Web Services, based on the metrics we received from monitoring and health check tools.
About the customer
Red Hat is a leading provider of open source software products to enterprises, most notably Linux, cloud, container, and Kubernetes technologies. Its broad portfolio includes hybrid cloud infrastructure, middleware, agile integration, cloud-native application development, and management and automation solutions.
One of Red Hat’s internal teams has a number of scheduled automation jobs that are used to perform different day-to-day tasks. After analyzing the metrics and data received from different tools regarding their infrastructure and resource usage, the Red Hat team realized that they could effectively cut some costs through optimization. They brought in Axelerant to help them achieve this goal.
The customer had all these jobs running on Jenkins, and they were mainly using Ansible to perform the required actions—but the hosts on which these jobs ran were configured using Puppet. They were interested in migrating the entire configuration management code from Puppet to Ansible. They also wanted to use GitLab CI jobs instead of Jenkins jobs.
Their objectives were to:
- Only use Ansible for any kind of orchestration, provisioning or configuration management tasks
- Migrate all jobs from Jenkins to GitLab CI
There were a few other considerations our team had to keep in mind:
01. Maintain Coding Standards While Migrating Puppet Code
02. Allow Multiple Instances of the Same Job to Run Simultaneously
03. Clean up the Entire Stack With Each Run Without Leaving Debris
They had to manage job schedules and re-run jobs in case of failures, so it was necessary to write idempotent code that could be run repeatedly within the same session without side effects.
To allow for this, our team had to define the naming convention for jobs so that even if the same job ran multiple times, each job could have its own unique infrastructure and resources.
The customer also wanted to ensure that every run cleaned up the entire stack without leaving any debris, which included AWS instances, ELBs, S3 buckets, Route53 records, etc.
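One way to realize such a per-run naming convention is to derive a unique run identifier and embed it in every resource name and tag. The sketch below is illustrative, not Red Hat's actual convention; the job name and variable names are assumptions:

```yaml
# Illustrative Ansible play: build a unique prefix for this run's
# resources so parallel runs of the same job never collide.
# "job_name" and the fallback value are placeholders.
- name: Derive unique names for this run's environment
  hosts: localhost
  gather_facts: false
  vars:
    job_name: nightly-report
    # GitLab CI exposes CI_PIPELINE_ID; fall back to a static value locally.
    run_id: "{{ lookup('env', 'CI_PIPELINE_ID') | default('local-run', true) }}"
    env_prefix: "{{ job_name }}-{{ run_id }}"
  tasks:
    - name: Show the names this run would use
      ansible.builtin.debug:
        msg:
          - "Instance Name tag: {{ env_prefix }}-worker"
          - "S3 bucket:         {{ env_prefix }}-artifacts"
```

Because every resource carries the run ID, cleanup jobs can later target exactly one run's resources by tag.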
Axelerant’s skills and experience with Amazon Web Services (AWS), Ansible and GitLab CI technologies made us the right choice to support Red Hat through this implementation.
Following the agile methodology, Axelerant’s DevOps team discussed and planned the migration with the customer. After a few discussions, we defined some aspects, like what type of infrastructure needed to be used for these jobs, what kind of AWS instances were needed, what schedules needed to be used to run these jobs, etc.
AWS Spot Instances for Dynamic Environments
After analyzing all the data, we realized that these jobs only utilized their respective environments during specific periods of time (when they were scheduled to run); the resources sat idle the rest of the time. However, the customer was paying for these resources on a 24/7 basis, which created a lot of waste.
Based on the requirements, we decided to use AWS spot instances for dynamic environments that are used by these automation jobs. We wanted to have robust Ansible playbooks that could be used to orchestrate, provision and destroy the environments, based on the job’s requirements. Along with that, another set of Ansible playbooks was needed to perform the task done by the jobs.
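A provisioning playbook along these lines could request a spot instance for a job's dynamic environment. This is a minimal sketch: the AMI ID, subnet, and tag scheme are placeholders, and the exact module parameters should be checked against the amazon.aws collection documentation:

```yaml
# Hedged sketch: request a spot instance for one job run.
# All IDs below are placeholders, not real resources.
- name: Provision a dynamic spot environment
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Request a spot instance for this job run
      amazon.aws.ec2_spot_instance:
        launch_specification:
          image_id: ami-0123456789abcdef0      # placeholder AMI
          instance_type: t3.medium
          subnet_id: subnet-0123456789abcdef0  # placeholder subnet
        tags:
          Environment: "job-{{ lookup('env', 'CI_PIPELINE_ID') | default('local', true) }}"
        state: present
      register: spot_request
```

A companion playbook with `state: absent` (or instance termination keyed to the same tag) would destroy the environment once the job finishes.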
As Ansible was Red Hat’s in-house tool, they had a preference for using Ansible when it came to any kind of orchestration, provisioning and configuration management.
Since cost reduction was one of the main requirements, and the environments were needed only while they were being used by a job, AWS spot instances were the best fit.
Since Red Hat’s team was managing all of its code repositories on GitLab, we recommended they migrate all jobs from Jenkins to GitLab CI to better utilize GitLab’s features.
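A migrated job could be expressed as a scheduled GitLab CI pipeline with provision, run, and cleanup stages, with cleanup always executing even after a failure. The stage names and playbook paths below are assumptions for illustration:

```yaml
# Illustrative .gitlab-ci.yml for one migrated job.
# Playbook paths are placeholders.
stages: [provision, run, cleanup]

provision-env:
  stage: provision
  script: ansible-playbook playbooks/provision.yml

run-task:
  stage: run
  script: ansible-playbook playbooks/task.yml

cleanup-env:
  stage: cleanup
  script: ansible-playbook playbooks/cleanup.yml
  when: always   # tear the stack down even if earlier stages failed
```

GitLab's pipeline schedules then replace the Jenkins cron triggers.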
Since there were multiple jobs, and each job had its own infrastructure requirements, configuration requirements, and execution steps, we started with one job at a time.
Axelerant’s DevOps team was familiar with most of the technologies involved, like Ansible, AWS and GitLab CI. They began the AWS infrastructure orchestration using Ansible playbooks.
They had to ensure code reusability so that the same code could be used for multiple jobs—e.g. creating an Ansible role for AWS instance provisioning that could be used by other jobs as well. The code also needed to be idempotent so that repeated executions wouldn’t fail.
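In practice, that kind of reuse can look like a single provisioning role invoked from each job's playbook with its own variables. The role and variable names here are hypothetical, not the actual code base:

```yaml
# Hypothetical sketch: one reusable provisioning role, parameterized
# per job, so the same role code serves every job.
- name: Provision infrastructure for the reporting job
  hosts: localhost
  roles:
    - role: aws_instance_provision
      vars:
        instance_type: t3.large
        instance_count: 2

- name: Provision infrastructure for the backup job
  hosts: localhost
  roles:
    - role: aws_instance_provision
      vars:
        instance_type: t3.micro
        instance_count: 1
```

Keeping all per-job differences in variables is what lets the role itself stay untouched as new jobs are migrated.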
Our team helped Red Hat reduce costs by using dynamic environments and AWS spot instances, along with other technologies.
Each job uses its own dynamic environment, so that multiple executions of a single job can run in parallel.
Everything on Ansible
The customer wanted to move orchestration and provisioning to Ansible and eliminate Puppet; our team helped achieve this.
Optimizing the jobs and the infrastructure helped reduce the total execution time required for each job.
- Migrating Puppet Code to Ansible
- Reducing Job Execution Time
- Ensuring Thorough Cleanup Jobs
- Code Quality and Best Practices
Migrating Puppet Code to Ansible
One of the biggest challenges involved in this engagement was migrating the existing Puppet code to Ansible. Red Hat’s team had been using Puppet for a long time, and it was their core configuration management tool. Their complex infrastructure and custom configurations made it a challenge to understand and migrate the existing Puppet code to Ansible.
Ansible was the right candidate because of the way the infrastructure at Red Hat needs to be reprovisioned often. There was also a sequence of tasks that needed to be executed, which prompted our team to implement Ansible.
Reducing Job Execution Time
The client also wanted to reduce the execution time as much as possible, while optimizing resource utilization.
Our team addressed this by designing and configuring infrastructure and Ansible playbooks in such a way as to obtain the best performance and efficiency.
Ensuring Thorough Cleanup Jobs
The customer wanted to ensure that each and every trace of a particular job run, such as instances, S3 buckets, ELBs, and Route53 records, was cleaned up. Failures could occur for multiple reasons, such as Ansible module failures or AWS API issues, which made it difficult to rely on a single run of the cleanup jobs.
Our team had to address any failures and re-run cleanup jobs while ensuring that any particular job didn’t impact any other jobs or infrastructure.
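Resilient cleanup of this kind typically combines task-level retries around transient AWS API failures with tag filters scoped to one run's unique ID, so re-running a cleanup never touches another job's resources. A hedged sketch (the `run_id` variable and tag scheme are assumptions):

```yaml
# Illustrative cleanup tasks: retry transient failures and target only
# the resources tagged with this run's unique ID.
- name: Terminate this run's instances
  amazon.aws.ec2_instance:
    state: absent
    filters:
      "tag:Environment": "job-{{ run_id }}"
  retries: 3
  delay: 10
  register: ec2_result
  until: ec2_result is succeeded

- name: Remove this run's S3 bucket and its contents
  amazon.aws.s3_bucket:
    name: "job-{{ run_id }}-artifacts"
    state: absent
    force: true   # delete even if the bucket still holds objects
  retries: 3
  delay: 10
  register: s3_result
  until: s3_result is succeeded
```

Because both tasks are idempotent, the whole cleanup playbook can simply be re-run after a partial failure.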
Code Quality and Best Practices
Maintaining code quality and following best practices were always a priority for the customer as well as for our team. Using Ansible best practices, we prepared a code base that was idempotent and robust, and we were able to use most of the code across different jobs without having to rewrite it.
Our team also invested effort in identifying the best infrastructure and configuration settings for these jobs, testing and monitoring continuously to make the best use of resources and optimize performance.