April 22, 2013
Adap.tv is an emerging leader in the video advertising sector, and staying on top of this fast-moving industry is critical for us. We needed a deployment methodology that allowed us to develop and release changes quickly without disrupting our clients.
Two years ago we decided to switch to continuous deployment (CD). Today it’s hard to imagine going back. CD allows us to move fast, saves time on integration, and helps us find and fix bugs. As with anything else, the hardest part is getting started. In this post we share our experience of switching to and using CD.
Probably the most important lesson is not to over-analyze or over-optimize. Don’t try to solve problems you are not facing; implement simple solutions to problems as you encounter them. It may sound like a cliché, but many teams fall into that trap.
Here is a list of what you need when shifting to CD:
- Automatic validation/Immune system
- Code reviews
- Automatic deployment mechanism
You may already have all or some of this. There are different methodologies that can be applied to each of these systems. In any case, keep it as simple as possible.
- Work in small incremental changes and deploy each change as soon as possible. On our last full workday, we made 28 production deployments.
- It is very important to keep each change small. The change does not have to implement the entire feature. Each submitted change usually represents a small part of the functionality in development.
- Commit the code to a separate branch in source control and have one or two other engineers review it. The exact process depends on the size of the team and on how many engineers have approval rights (see the code review section below).
- The immune system runs automatically and validates the change.
- If the immune system finds bugs or the code review is not satisfactory, fix the issues and resubmit the code.
- Once a change passes both the code review and the immune system, merge it into the main branch. Merging into the main branch is possible only after the change has been approved and verified (see the code review section below).
- CD scripts monitor the main branch, automatically pick up the latest changes, and deploy the approved and verified code to production. As the code is deployed, self-tests run on each machine to make sure that the services started correctly.
- Monitor key indicators for each service following the deployment to make sure that the change doesn’t cause unexpected problems.
- Minimize risk: CD does not deploy changes to production during sensitive periods, such as weekends and holidays when engineers are out of the office. This prevents a potential crisis from starting when nobody is around to respond.
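The "minimize risk" rule can be enforced by a check at the top of the deployment run. The sketch below is in Python rather than the Bash we actually use. It assumes the blackout calendar consists of weekends plus a hand-maintained holiday list. `HOLIDAYS` and `deployment_allowed` are hypothetical names, and the dates are illustrative only.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real one would be maintained by the team.
HOLIDAYS = frozenset({date(2013, 5, 27), date(2013, 7, 4)})


def deployment_allowed(now: datetime, holidays: frozenset = HOLIDAYS) -> bool:
    """Return False during blackout periods (weekends and listed holidays)."""
    if now.weekday() >= 5:  # weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        return False
    if now.date() in holidays:
        return False
    return True


# In this sketch the deploy loop skips its run when the check fails, and
# approved changes simply wait in the main branch for the next allowed window.
print(deployment_allowed(datetime(2013, 4, 22, 10, 0)))  # Monday → True
print(deployment_allowed(datetime(2013, 4, 27, 10, 0)))  # Saturday → False
```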
Automatic validation/Immune system
In our case, the immune system verifies each service through its API. A typical test sends pre-defined test-case data to the service and checks the response against a definition of the expected result. When we first started using CD, our immune system checked only the basics; it is much more thorough today.
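Here is a minimal, table-driven sketch of such API checks, written in Python for illustration. It assumes each service can be called as a function that takes a request dict and returns a response dict; in practice that function would wrap an HTTP call. `ApiCase`, `run_immune_system` and the echo service are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Response = Dict[str, Any]


@dataclass
class ApiCase:
    name: str
    request: Request    # pre-defined test-case data
    expected: Response  # fields the response must contain


def run_immune_system(call_api: Callable[[Request], Response],
                      cases: List[ApiCase]) -> List[str]:
    """Run every case against the service and return a description of each failure."""
    failures = []
    for case in cases:
        try:
            response = call_api(case.request)
        except Exception as exc:  # a crashing service fails the case, not the whole run
            failures.append(f"{case.name}: raised {exc!r}")
            continue
        # Compare only the expected fields, so volatile ones (ids, timestamps) are ignored.
        for key, value in case.expected.items():
            if response.get(key) != value:
                failures.append(f"{case.name}: {key}={response.get(key)!r}, expected {value!r}")
                break
    return failures


def echo_service(request: Request) -> Response:
    """Stand-in for a real service endpoint."""
    if request.get("op") == "echo":
        return {"status": 200, "body": request.get("msg")}
    return {"status": 404}


cases = [
    ApiCase("echo returns message", {"op": "echo", "msg": "hi"}, {"status": 200, "body": "hi"}),
    ApiCase("unknown op is rejected", {"op": "nope"}, {"status": 404}),
]
print(run_immune_system(echo_service, cases))  # → []
```

In this sketch, a change would count as verified only when the returned list is empty.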
Different software components have separate immune systems, and some changes require validation by several of them. The database is an example: deploying a change to the database schema requires running the immune system of every software component that depends on that database.
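Choosing which immune systems to run is a reverse-dependency lookup. The Python sketch below assumes a hand-written map from each component to the components that depend on it directly. The component names and `immune_systems_to_run` are hypothetical. A breadth-first walk collects the changed components plus everything that depends on them, directly or transitively.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map: component -> components that depend on it directly.
DEPENDENTS: Dict[str, List[str]] = {
    "database": ["ad_server", "reporting"],
    "reporting": ["dashboard"],
    "ad_server": [],
    "dashboard": [],
}


def immune_systems_to_run(changed: Set[str],
                          dependents: Dict[str, List[str]] = DEPENDENTS) -> Set[str]:
    """Return the changed components plus all of their direct and transitive dependents."""
    to_run = set(changed)
    queue = deque(changed)
    while queue:
        component = queue.popleft()
        for dependent in dependents.get(component, []):
            if dependent not in to_run:  # visit each component once, even in diamond-shaped graphs
                to_run.add(dependent)
                queue.append(dependent)
    return to_run


print(sorted(immune_systems_to_run({"database"})))
# → ['ad_server', 'dashboard', 'database', 'reporting']
print(sorted(immune_systems_to_run({"ad_server"})))  # → ['ad_server']
```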
Code reviews
Code reviews are a crucial part of CD. Reviews not only find bugs but also help train the engineering team. Depending on the engineering culture, a code review process that relies on administrative enforcement may be ineffective, so the process has to be automated. We use a combination of Git and Gerrit to achieve that.
Gerrit defines three roles: review, approve and verify. The number of people who can ‘Approve’ should start out small and grow as more engineers gain expertise.
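Conceptually, the merge rule is a small predicate over a change's votes. The Python sketch below is a hypothetical policy, not Gerrit's API or its exact label semantics. It makes three assumptions:

- each vote carries a label named "review", "approve" or "verify", plus an integer value;
- any negative vote blocks the merge;
- only members of a small `approvers` set may approve.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Vote:
    user: str
    label: str  # hypothetical label names: "review", "approve" or "verify"
    value: int  # > 0 supports the change, < 0 blocks it


def can_merge(votes: Iterable[Vote], approvers: Set[str]) -> bool:
    """A change may merge only if it is approved, verified, and has no blocking vote."""
    votes = list(votes)
    if any(v.value < 0 for v in votes):
        return False
    approved = any(v.label == "approve" and v.value > 0 and v.user in approvers
                   for v in votes)
    verified = any(v.label == "verify" and v.value > 0 for v in votes)
    return approved and verified


approvers = {"alice"}  # deliberately small at first, grown as engineers gain expertise
votes = [Vote("bob", "review", 1), Vote("alice", "approve", 1), Vote("immune-system", "verify", 1)]
print(can_merge(votes, approvers))  # → True
```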
Automatic deployment mechanism
We use Bash scripts to drive the CD mechanism. The deployment script runs the immune system and distributes binaries across the cluster. On each machine, scripts start and stop the applications once new code is deployed.
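Our deployment scripts are written in Bash. The Python sketch below only illustrates the order of operations. It assumes a host-by-host rollout that stops at the first host whose self-test fails, so in this sketch a bad build reaches as few machines as possible. The four callables (`immune_system_passes`, `copy_to`, `restart`, `self_test`) are hypothetical hooks standing in for the real scripts.

```python
from typing import Callable, List


def deploy(build: str, hosts: List[str],
           immune_system_passes: Callable[[str], bool],
           copy_to: Callable[[str, str], None],
           restart: Callable[[str], None],
           self_test: Callable[[str], bool]) -> List[str]:
    """Roll a build out host by host and return the hosts that were updated.

    Raises RuntimeError if validation fails or a host fails its self-test;
    hosts after the failing one are left untouched on the previous build.
    """
    if not immune_system_passes(build):
        raise RuntimeError(f"immune system rejected {build}")
    updated = []
    for host in hosts:
        copy_to(host, build)     # distribute the binaries to this machine
        restart(host)            # stop/start scripts on the machine
        if not self_test(host):  # did the services come up correctly?
            raise RuntimeError(f"self-test failed on {host}")
        updated.append(host)
    return updated


log = []
updated = deploy(
    "build-123", ["web1", "web2"],
    immune_system_passes=lambda build: True,
    copy_to=lambda host, build: log.append(f"copy {build} -> {host}"),
    restart=lambda host: log.append(f"restart {host}"),
    self_test=lambda host: True,
)
print(updated)  # → ['web1', 'web2']
```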
Monitoring
We use several types of monitoring:
- deployment process monitoring – sends an email reporting on the deployment. The engineer who submitted the change is responsible for making sure the deployment succeeds.
- business health monitoring – monitors critical business parameters. We use Cacti graphs to visualize this data.
- system performance monitoring – monitors the health of the system. We use Ganglia graphs to visualize this data.
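Checking key indicators after a deployment can be partly automated by comparing each metric's average before and after the release. This Python sketch makes three assumptions:

- the metrics are "higher is better" (such as impressions served);
- the samples have already been pulled from the monitoring system;
- a hypothetical 20% drop is the alert threshold.

`regressed_metrics` and the metric names are illustrative.

```python
from statistics import mean
from typing import Dict, List


def regressed_metrics(before: Dict[str, List[float]],
                      after: Dict[str, List[float]],
                      max_drop: float = 0.2) -> List[str]:
    """Return metrics whose post-deploy mean fell by more than max_drop (a fraction)."""
    flagged = []
    for name, samples in before.items():
        baseline = mean(samples)
        # A metric that stopped reporting after the deploy is treated as zero.
        current = mean(after.get(name) or [0.0])
        if baseline > 0 and (baseline - current) / baseline > max_drop:
            flagged.append(name)
    return flagged


before = {"impressions_per_min": [1000, 980, 1010], "bids_per_min": [500, 510, 490]}
after = {"impressions_per_min": [990, 1005, 995], "bids_per_min": [300, 310, 290]}
print(regressed_metrics(before, after))  # → ['bids_per_min']
```

Metrics where an increase is bad, such as error rates, would need the comparison reversed.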
Pitfalls of CD
Although our development team is happy with CD, the flip side is that customers experience frequent changes. This is especially relevant to the GUI.
To keep continuous GUI changes from surprising our customers, we use product flags to hide certain deployed features until our clients are ready for them. This lets us keep deploying continuously while avoiding the confusion that frequent GUI changes would cause. Once a feature is fully released, its product flag is retired and the code is simplified.
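At its simplest, a product flag is a lookup consulted wherever the new GUI is rendered. The Python sketch below is hypothetical; the flag store, feature and client names are invented. Each flag maps a feature to the clients allowed to see it, and `None` marks a fully released feature. Unknown flags default to off, so deployed-but-unreleased code stays hidden.

```python
from typing import Dict, Optional, Set

# Hypothetical flag store: feature -> client ids that may see it (None = everyone).
PRODUCT_FLAGS: Dict[str, Optional[Set[str]]] = {
    "new_campaign_editor": {"client-42"},
}


def feature_enabled(feature: str, client_id: str,
                    flags: Dict[str, Optional[Set[str]]] = PRODUCT_FLAGS) -> bool:
    """Return True if the client should see the feature; unknown flags are off."""
    if feature not in flags:
        return False
    allowed = flags[feature]
    return allowed is None or client_id in allowed


def campaign_editor_for(client_id: str) -> str:
    # Both code paths are deployed; the flag decides which one a client sees.
    if feature_enabled("new_campaign_editor", client_id):
        return "new editor"
    return "classic editor"


print(campaign_editor_for("client-42"))  # → new editor
print(campaign_editor_for("client-7"))   # → classic editor
```

In this sketch, retiring the flag means deleting the `feature_enabled` check together with the classic branch.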
At the start of the process, many of us doubted that CD was practical. The most frequent concern was reliability: the fear that CD would cause the entire production environment to melt down and disrupt our business. That never happened. In fact, our rate of serious production problems decreased, while the speed of developing and deploying business features increased. Another plus is that CD boosted the engineering team’s sense of system ownership and job satisfaction.