Moviegoers thrill to watch Tom Cruise play Ethan Hunt as he ingeniously and quickly infiltrates the most secure systems or facilities. He adroitly deceives biometric scanners and leverages unwitting accomplices. But move away from fictional Mission Impossible scenes, and every CIO will cringe at the thought of anyone so easily bypassing the tollgates they hope are protecting their real-world business systems.
The tollgates of which we speak are those control points designed to prevent bad actors (in our case, bad code) from making it any further down the path into production. The theory is that the more of these tollgates or controls we have in place, the less likely something nefarious or sloppy will make its way to production and negatively impact the systems upon which modern businesses depend.
The challenges of implementing tollgates today
In the world of software development, tollgates can be as simple as an IT leader holding a development manager accountable for code quality or as onerous as the heavy-handed Change and Release Management processes famous for long review board meetings and detailed checklists. In a world before CI/CD pipelines and automated code validation tools, these processes were manual, unpopular, and only as effective as the integrity and diligence of the players involved.
Over the past several years, IT delivery has moved beyond the on-premises (and largely manual) integration and delivery approaches, leaving behind manual tollgate process disciplines and governance in favor of Agile teams working with automated release pipelines.
In parallel, application teams who have adopted DevOps often produce “full lifecycle” engineers skilled in software development but lacking the deep operations chops to understand the risks regularly delivered into production environments.
From an oversight perspective, IT management often trusts the Change process because “specialist” DevOps engineers are executing the changes and because those changes are flowing over sanctioned CI/CD pipelines.
The bottom line is that in many IT shops, modern SDLC processes and tools have dramatically increased velocity and flexibility but have also increased the risk of performance and stability incidents. This dilemma can be attributed, at least in part, to a decrease in attention to some critical non-functional requirement disciplines that were formerly the natural byproduct of separation of Dev and Ops.
The tollgate maturity roadmap: A better way to implement mature tollgate practices
Tollgates are processes or procedures put in place to examine pre-production code and the code development process to ensure both comply with basic standards to keep a system resilient, reliable, and performant. For application development, these tollgates can result in alerting or flagging for follow-up, risk scoring, and even go/no-go decision points that can “stop the train”. Tollgates can be manual or automated and almost always work best when including both.
For example, a tollgate might be a standard change management process that confirms a change record has management approval before moving forward. Of course, the rigor behind that change management process will vary based on the person, policy, risk appetite, and time allotted. Other examples (naming only a few) include architectural reviews, limitations on the number of development pattern options enabled on a CI/CD pipeline, automated code checks for compliance with standards, performance testing results, and manual release management reviews that identify change collisions.
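As a minimal sketch of the go/no-go idea, the Python below models a change record and checks it against two of the controls mentioned above. The `ChangeRecord` fields and the `run_tollgates` function are hypothetical names chosen for illustration, not part of any particular change management tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """Hypothetical change record; the field names are illustrative only."""
    change_id: str
    approved_by_management: bool
    performance_tests_passed: bool


def run_tollgates(change: ChangeRecord) -> tuple[bool, list[str]]:
    """Return (go, findings): go is True only when no tollgate raised a finding."""
    findings: list[str] = []
    if not change.approved_by_management:
        findings.append("missing management approval")
    if not change.performance_tests_passed:
        findings.append("performance tests not passed")
    return (len(findings) == 0, findings)


if __name__ == "__main__":
    go, findings = run_tollgates(ChangeRecord("CHG-1", False, True))
    print("go" if go else "no-go", findings)
```

In practice the findings list could feed a flag or a risk score rather than a hard stop, depending on how much rigor the organization wants at that control point.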
Maturing tollgate adoption
Most organizations operate tollgates within their change management process, but the challenge is maturing tollgate adoption across your IT organization. How do you avoid the pitfalls of over-governance that will incentivize creative workarounds and only result in the need for further and more invasive governance? How do you also avoid the other pitfall of assuming your newly minted CI/CD pipeline will be enough to keep your production environment safe?
Below is a tollgate maturity roadmap broken into three phases, each with pros and cons. The image below uses color-coding to depict the types of tollgates that comprise each phase.
Phase 1 assumes 100% manual tollgates. These controls could include written policies, management edicts, formal CAB processes, compliance mandates, checklists, manual impact analyses, test case reviews, architectural reviews, coding-standard reviews, and third-party reviews. While the list of possible control points is extensive, the key is that they are monitored and enforced in a manual, human-dependent way.
Pros: By nature, manual controls can be creative, customized, and exhaustive in their scope. They have the potential of nearly guaranteeing compliance, depending on the expertise, tenacity, and time afforded the evaluators.
Cons: First, manual controls can drive developers crazy and invite check-the-box, minimum-compliance efforts because these tollgates fail to first win the hearts of developers as to the necessity and value of the rigorous process. In that way, manual controls foster bad behaviors that add risks to the quality of the actual work and the culture/morale of the team. Second, manual tollgates are expensive, unwieldy, and do not facilitate velocity. Third, they are unlikely to prove consistent (due to the human factor) and can end up feeding accusations of bias.
Phase 2 moves the majority of tollgates to automated solutions. We doubt 100% automation will ever be fully effective, but it is possible to get close with limited exceptions. Automated tollgates may include RPA or policy-rules-based scripts as part of a CI/CD pipeline. These tollgates can add scores to change records at various steps along the path based on things like architectural standards compliance or inclusion of non-functional requirements. These tollgates can also set flags, kick changes back to developers for remediation, move changes to architecture for review, engage performance testing for missed test cases, or route to change management agents for manual review. These automated scripts can even kick changes out of a release altogether. In post-production environments, automated tollgates can report on post-change performance, highlight change impacts to production, and even automatically roll changes back if they fail to meet a pre-set threshold.
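The routing and rollback behavior described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the `Change` fields, the `Route` values, the rule order, and the default thresholds are all hypothetical and would need to reflect your own policies and pipeline tooling.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    """Hypothetical destinations an automated tollgate can send a change to."""
    PROCEED = "proceed"
    BACK_TO_DEVELOPER = "back_to_developer"
    ARCHITECTURE_REVIEW = "architecture_review"
    PERFORMANCE_TESTING = "performance_testing"
    MANUAL_CHANGE_REVIEW = "manual_change_review"


@dataclass
class Change:
    """Hypothetical facts a pipeline might collect about a change."""
    change_id: str
    coding_standard_violations: int
    meets_architecture_standards: bool
    missing_performance_test_cases: int
    cmdb_updated: bool


def route_change(change: Change, max_violations: int = 0) -> Route:
    """Apply simple policy rules in order; the first failed rule decides the route."""
    if change.coding_standard_violations > max_violations:
        return Route.BACK_TO_DEVELOPER
    if not change.meets_architecture_standards:
        return Route.ARCHITECTURE_REVIEW
    if change.missing_performance_test_cases > 0:
        return Route.PERFORMANCE_TESTING
    if not change.cmdb_updated:
        return Route.MANUAL_CHANGE_REVIEW
    return Route.PROCEED


def should_roll_back(error_rate_before: float, error_rate_after: float,
                     max_increase: float = 0.01) -> bool:
    """Post-production check: roll back if the error rate rose past a pre-set threshold."""
    return (error_rate_after - error_rate_before) > max_increase


if __name__ == "__main__":
    change = Change("CHG-42", 0, True, 2, True)
    print(route_change(change).value)
    print(should_roll_back(0.01, 0.05))
```

Ordering the rules from cheapest remediation (developer fix) to most expensive (manual change review) is one possible design choice; a scoring approach that accumulates points per rule would work equally well.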
Pros: Speed and consistency are the two primary advantages. When designed around the right non-functional disciplines and leveraging operations experiences, automated tollgates can be coupled with appropriate manual reviews and make a strong production “shield.” Automated tollgates can force disciplines (validation against CMDB to confirm impact radius, CMDB updates with new microservice connections, quick identification of test vs. prod configuration differences) that are otherwise difficult to enforce.
Cons: The tollgates designed by DevOps engineers may not always consider all non-functional needs impacting production stability. Consequently, in many places, tollgates tend to focus on coding standards and miss aspects like capacity, cross-application risks, and CMDB updates. A second negative consideration is the need to build a manual process in parallel. Not only will kick-outs require human intervention, but some applications may end up forced down a mostly manual path because they don’t meet critical criteria, are highly complex, or have had too many issues in the past.
Phase 3 blends appropriate manual and automated tollgates while introducing gamification into the equation. Imagine an index that scores each application or team based on factors like history of change reliability, history of compliance with architectural standards, completion of key pieces of training, evidence of self-review, the existence of capacity planning disciplines, or proof of review with capacity managers on asset utilization. The teams that score above a certain threshold will be allowed to take the most automated path possible from design to deployment. Those who score lower would not only see their names low on a published list but will also find themselves forced to follow a more manual and stringent approval process, thus costing them and their business sponsors precious time. It will help if you design this scoring so that applications/teams are evaluated by as objective a means as possible.
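One way to picture such an index is a simple weighted score. The sketch below is illustrative only: the factor names are taken from the paragraph above, but the weights, the 0-to-1 factor scale, the threshold of 80, and the function names are all assumptions that each organization would need to set for itself.

```python
from __future__ import annotations

# Hypothetical weights (they sum to 100); real weights are a management decision.
WEIGHTS = {
    "change_reliability": 40,
    "architecture_compliance": 30,
    "training_completed": 10,
    "self_review_evidence": 10,
    "capacity_planning": 10,
}


def team_index(factors: dict[str, float]) -> float:
    """Weighted score from 0 to 100, given each factor as a value in [0, 1]."""
    missing = set(WEIGHTS) - set(factors)
    if missing:
        raise ValueError(f"missing factors: {sorted(missing)}")
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


def approval_path(score: float, threshold: float = 80.0) -> str:
    """Teams at or above the threshold earn the more automated path."""
    return "automated" if score >= threshold else "manual"


def leaderboard(teams: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Published list of (team, score), highest score first."""
    scores = [(name, team_index(factors)) for name, factors in teams.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    teams = {
        "payments": dict.fromkeys(WEIGHTS, 1.0),
        "reporting": dict.fromkeys(WEIGHTS, 0.5),
    }
    for name, score in leaderboard(teams):
        print(name, score, approval_path(score))
```

Keeping every input a measurable value between 0 and 1 is one way to make the evaluation as objective as possible, because the formula itself leaves no room for reviewer discretion.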
Pros: This approach incentivizes teams to follow best practices and policies and drives quality outcomes. This approach can have a positive impact on behavior and culture.
Cons: The design of the factors to be scored, the means for gathering the data, and effective scoring can be complex. Management must select the right few metrics that add the most value and not over-complicate or add weight to an already compressed and scrutinized process. This approach presupposes the existence of both automated and manual tollgates because the primary incentive is not the absence of tollgates but the selection of the more automated path. While the long-term pros are compelling, IT management must consider the up-front workload for design and organizational change management.
Wherever your organization may be on the tollgate maturity roadmap above, we recommend focusing forward efforts with the intent of reaching Phase 3. Even if you are squarely in Phase 1, you can still develop a roadmap towards Phase 3 that identifies the factors you want to include in the gamified index and then start automating those one at a time. In fact, you don’t have to automate anything to take advantage of the concepts in Phase 3. For example, you could still build an index that scores teams/applications based on factors like compliance, training, maturity, and historical code quality. Their rewards could include having their name at the top of a published list and a “bypass card” that sends them to the front of certain queues or allows them to skip selected reviews altogether.
Regardless of your current maturity, we strongly urge you to take the time and do the work on the front end to identify the RIGHT things to roll into the index scoring formula. Start small with manual tollgates. Add more and make the formula more sophisticated over time.
As you mature along this path, you will likely find that tollgates (especially from a gamification perspective) can find application beyond Release Management and perhaps play a part in governance across a broader set of the 17 ITIL Service Management Practices. We’ve already noted Capacity and Performance Management and see the potential for use with Availability Management, Asset Management, Monitoring and Event Management, and Service Validation and Testing.
We hope this article has whetted your appetite for considering a different approach to a process that has tended to be overly rigid. Governance, controls, and tollgates don’t have to be burdensome. When conceived well and executed with discipline and transparency, this kind of governance can ensure resiliency and reliability and play a crucial part in fostering a healthy IT culture that engages the heart of IT engineers in the “game” of quality control.