According to DevOps guru Gary Gruver, large, tightly coupled architectures are the biggest cause of software delivery inefficiencies in the enterprise. In this article, Gary and Plutora discuss processes and metrics for increasing the quality of complex delivery pipelines.
The tightly coupled architectures of enterprise applications add significant complexity to the release landscape when the business demands both increased velocity and improved quality. Major releases require extensive coordination and communication to link customer facing systems of engagement, back-end business critical apps on legacy systems of record, and everything in between. In contrast, loosely coupled architectures are composed of well-encapsulated components, reducing the risk that changes in one component will require changes in any other component.
In his book, Leading the Transformation: Applying Agile and DevOps Principles at Scale, Gary Gruver talks about the differences between the application deployment pipeline of a loosely coupled architecture and that of a tightly coupled architecture. In loosely coupled architectures, agile release trains achieve quality validation independently. He uses multiple parallel train tracks to illustrate this delivery independence. And parallel lines never cross, right? Nice and clean.
The reality for the enterprise, though, is a large, tightly coupled architecture with multiple interdependent pipelines. While these pipelines may run in parallel initially, eventually they converge as the test environments become increasingly complex to reflect the final production state. If one pipeline gets stalled, the flow of the entire release is jeopardised. Not so nice, not so clean.
In another book, Starting and Scaling DevOps in the Enterprise, Gary Gruver writes:
“The biggest inefficiencies in most large organisations exist in the large, complex, tightly coupled systems that require coordination across large numbers of people from the business all the way out to Operations.”
We sat down with Gary recently to discuss the challenges of application delivery in the enterprise. Since we were talking with a DevOps guru, it’s no surprise the main topic was pipeline inefficiencies. For our interview, we focused specifically on the dev-to-test handoff and test environments, topics near and dear to our hearts because our customers tell us these are massive but, ironically, often unrecognized bottlenecks. The World Quality Report highlights the breadth of this challenge:
- 42% identified “test data and environment availability” as the main challenge in achieving the desired level of test automation.
- 46% of enterprise customers cited “lack of appropriate test environment and data” as key challenges when applying testing to agile development.
Test automation may be adopted, agile or scaled agile methodologies honed, but if test teams can’t get access to a properly configured test environment, the delivery flow stops. And waits. This brings to mind one of the core principles of scaled agile (SAFe). Systems thinking highlights the need to understand and optimise the full value stream, and one can find many soundbites to emphasise this point:
- “Any improvement before or after the bottleneck is a waste of time”
- “Optimising a component doesn’t optimise the system.”
- “A system can’t evolve faster than its slowest integration point.”
Here’s part of our discussion with Gary Gruver:
Plutora: In your book Starting and Scaling DevOps in the Enterprise, you state:
“There is nothing more demoralising for these small, fast moving teams than having to wait to get an environment to test a new feature or to wait for an environment in production where they can deploy the code.”
What are some of the key challenges in getting a test environment up and running?
Gruver: “Test teams face a lot of constraints with regards to test environments…How do I make sure everything is configured correctly, deployed correctly, ready to test, and the database is synced…This can be very challenging. They end up running tests that aren’t passing that have nothing to do with the code, but instead are a result of a test environment issue.
You want to find defects as close to the developer as possible to get them fast feedback. However, it’s backwards for a lot of orgs that do a majority of their testing in complex test environments. Once you get code into a large enterprise test environment the ramification of failure is far more profound. A defect can impact a majority of your delivery pipeline…yet not at all be associated with code.”
Plutora: Tell us a little about the process of improving efficiencies when testing in a complex enterprise environment.
Gruver: “First thing we look at is cycle time…the time it takes from business idea to working code. When you force an organisation to build and test on a more frequent basis they find issues that have been around for years. I ask them to map out deployment pipeline — dev, unit testing, regression testing, UAT, performance testing, etc. Then we try to map out how long it takes for code to flow from dev to the environment to production. Lots of orgs spend more time and effort getting the big, complex enterprise environments up and ready for testing than they do actually writing the code.
To really develop a solid, streamlined testing process though, the metrics that I try to capture at a customer are typically hard for them to get. As opposed to measuring productivity, I’m more in the mode of trying to identify and highlight waste to take it out of the system.”
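The pipeline mapping Gary describes can be sketched in a few lines of Python. This is only an illustration: the stage names, the timestamps, and the helper functions are assumptions made up for the example, and it presumes you can record when a change enters each stage. It does not reproduce a method from Gary's books or a Plutora feature.

```python
from datetime import datetime

# Hypothetical timestamps for one change as it enters each pipeline stage.
# Stage names and dates are illustrative assumptions, not real data.
stage_entries = [
    ("dev", datetime(2024, 1, 2, 9, 0)),
    ("unit testing", datetime(2024, 1, 4, 9, 0)),
    ("regression testing", datetime(2024, 1, 5, 9, 0)),
    ("UAT", datetime(2024, 1, 12, 9, 0)),
    ("production", datetime(2024, 1, 16, 9, 0)),
]


def stage_durations(entries):
    """Return (stage, hours spent) for every stage except the final one."""
    durations = []
    for (stage, start), (_, end) in zip(entries, entries[1:]):
        durations.append((stage, (end - start).total_seconds() / 3600))
    return durations


def cycle_time_hours(entries):
    """Hours from entering the first stage to entering the last stage."""
    return (entries[-1][1] - entries[0][1]).total_seconds() / 3600


durations = stage_durations(stage_entries)
total_hours = cycle_time_hours(stage_entries)
# The stage where the change waited longest is the first place to look for waste.
slowest_stage = max(durations, key=lambda item: item[1])[0]
```

With these made-up numbers, the change spends far longer in regression testing than in dev, which is the kind of imbalance the mapping exercise is meant to expose.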
From Starting and Scaling DevOps in the Enterprise:
“Developers waste time triaging defects that were thought to be code but ended up being issues with data, environments, and deployments…The QA org wastes time and energy testing on a build that has the wrong version of code, is in an environment that is not configured correctly, or is unstable for other reasons…The release team wastes time trying to find all the right versions of the code and the proper definition of the environments, and getting all the right data in place to support testing…”
Plutora: In your book you talk about the importance of establishing quality gates to keep defects out of larger, more complex test environments. Can you provide some additional insight on gating code?
Gruver: “For organisations with tightly coupled architectures, it’s important to build up stable enterprise systems using a well-structured deployment pipeline with good quality gates. Large test environments are incredibly expensive and hard to manage. As a result, they are not a very efficient approach for finding defects. You have to establish quality gates, where release teams are not allowed to move code further along the pipeline until they pass specific tests.”
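As a rough illustration of the gating idea, the Python sketch below refuses to promote a build to the next environment unless every required test has run and passed. The `CheckResult` type, the `promote` function, and the test and environment names are all hypothetical; a real pipeline would wire this rule into its own CI tooling.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one test run against a build (hypothetical structure)."""
    name: str
    passed: bool


def gate_passes(results, required_tests):
    """A build clears the gate only if every required test ran and passed."""
    outcome = {r.name: r.passed for r in results}
    # A required test that never ran counts as a failure.
    return all(outcome.get(name, False) for name in required_tests)


def promote(current_env, next_env, results, required_tests):
    """Return the environment the build belongs in after the gate check."""
    return next_env if gate_passes(results, required_tests) else current_env


results = [CheckResult("login_smoke", True), CheckResult("checkout_api", False)]
```

Treating a missing result as a failure is a deliberate choice in this sketch: a gate that passes when a test silently did not run would let defects through to the larger, more expensive environments.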
Plutora: What’s the process for introducing gated code into a delivery pipeline?
Gruver: “Find out what are the sources of the apps or subsystems that are breaking most frequently. You’ll want to start by gating those apps to find and fix defects before moving code onto the next, more complex test environment.
Create a Pareto chart as to why tests are failing. When you have those metrics for each environment, use different tags for the various defects. Then you can see all the waste that can potentially come out of the system. This sounds easy, but it’s so rare that people step back and look at how their organisation works in order to put metrics on it. There’s never time to step back and look at everything.”
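The data behind such a Pareto chart is simple to compute once failures are tagged. The Python sketch below assumes each failed test has been recorded with an environment name and a cause tag; the environment name, the tag names, and the counts are invented for illustration.

```python
from collections import Counter

# Hypothetical failure log: (environment, cause tag) for each failed test.
failures = (
    [("integration", "environment")] * 4
    + [("integration", "deployment")] * 3
    + [("integration", "automated test")] * 2
    + [("integration", "code")] * 1
)


def pareto(tags):
    """Return (tag, count, cumulative percent) rows, most frequent cause first."""
    counts = Counter(tags)
    total = sum(counts.values())
    rows, running = [], 0
    for tag, count in counts.most_common():
        running += count
        rows.append((tag, count, round(100 * running / total, 1)))
    return rows


rows = pareto(tag for env, tag in failures if env == "integration")
```

In this invented data set, environment and deployment problems together account for 70% of the failures in the integration environment, none of which a developer could fix by changing code.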
Plutora: How do you identify areas for improvement in the dev to test handoff?
Gruver: “First, I tell teams to answer this question: What percentage of defects are you finding in each of your environments? For example, if 90% of defects occur in the initial test environment, the system is good, meaning the dev team is receiving quality feedback. If you are finding a majority of your issues on the right side of the pipeline, closer to production, then feedback to dev is not that valuable as triage and root cause is much more complex. By the time you are in a complex environment, you should only be testing the interfaces.
The next question to ask is: What are the different types of issues found in the different test environments? Are you finding environment issues, deployment issues, problems with automated tests, the code? We’ll look at test results from two perspectives to understand the best approach to triage. First, we look at whether the same test has passed from build to build, then we’ll look at how well the same build is performing as the test environment complexity increases from one phase of the release to the next. To do this, each stage of the deployment pipeline needs a stable test environment for gating code.”
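Both questions, and the two triage perspectives, can be expressed against a simple record of test outcomes. The Python sketch below is one possible shape for that data, not Gary's tooling: the function names, the environment names, the build numbers, and the results are all assumptions for illustration.

```python
from collections import Counter


def defect_share_by_environment(defect_envs):
    """Percent of all defects that were first found in each environment."""
    counts = Counter(defect_envs)
    total = sum(counts.values())
    return {env: round(100 * n / total, 1) for env, n in counts.items()}


def across_builds(results, test, environment):
    """Perspective 1: the same test in the same environment, build to build."""
    return {b: ok for (t, b, e), ok in results.items()
            if t == test and e == environment}


def across_environments(results, test, build):
    """Perspective 2: the same test on the same build, environment to environment."""
    return {e: ok for (t, b, e), ok in results.items()
            if t == test and b == build}


# Hypothetical data: where defects were found, and pass/fail per
# (test, build, environment).
defect_share = defect_share_by_environment(["component"] * 9 + ["integration"])
results = {
    ("checkout", 101, "component"): True,
    ("checkout", 102, "component"): True,
    ("checkout", 102, "integration"): False,
}
by_build = across_builds(results, "checkout", "component")
by_env = across_environments(results, "checkout", 102)
```

Here the test passes build after build in the simpler environment and fails only when the same build reaches the more complex one. That pattern does not prove the cause, but it suggests looking at the environment, the deployment, or an interface before assuming a code regression.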
Plutora: Besides incorrect configurations, what other problems have you run into with test environments?
Gruver: “I work with many organisations that are on a journey to test automation, meaning they still have manual testing left in the system. It takes work to make sure tests can be triaged, are maintainable, and can adapt as the app is changing. One organisation I’m working with now runs 3000 automated tests on a weekly basis. Recently, a release train was held up not because there was a problem with the tests, but it turned out that a test environment didn’t have enough memory and CPU power to run the automated test.”
Plutora: In your book, you state:
“For large tightly coupled systems, developers often don’t understand the complexities of the production environments. Additionally, the people that understand the production environments don’t understand well the impact of the changes that developers are making. There are also frequently different end points in different test environments at each stage of the deployment pipeline. No one person understands what needs to happen all the way down the deployment pipeline. Therefore, managing environments for complex systems requires close collaboration from every group between dev and ops.”
What is your advice for managing test environments in tightly coupled systems?
Gruver: “The DevOps journey towards efficiency must involve getting test environment configurations under version control, that way everyone can see exactly who changed what and when. Then there’s no need to hold big meetings like a scrum of scrums because everyone can see code progress down the pipeline to understand status. Also, developers need faster access to early stage test environments, so they can validate their changes and catch their own defects. Success really requires being able to provide environments with cloud-like efficiencies on demand.”
The Goal, by Eliyahu Goldratt.
About the author, Lenore Adam
Lenore has over 15 years of experience in product management and marketing for enterprise hardware and software product lines at HP, Cisco, and Micro Focus. She is a Sr. Product Marketing Manager for Plutora in the San Francisco Bay area.
About Plutora Environments
Plutora Environments helps delivery teams schedule, manage, and maintain test environments. Plutora has helped customers increase test environment productivity by up to 140%, and better predict and remediate environment contentions. Watch our webinar with Gary Gruver: Continuous delivery at scale: metrics, myths, and milestones.