
Imagine losing $54 million in an afternoon.
That happened to Southwest Airlines in July 2016, when a faulty network router triggered a system-wide outage that took down its website and reservations system. The backup systems built for exactly this situation faltered too, turning what could have been a technical hiccup into a week-long disaster. In the end, more than 2,300 flights were canceled or delayed. It was not the first time for Southwest. A year earlier, another outage, caused by a software glitch, resulted in 800 canceled flights.
Other airlines are no strangers to system outages like this. Delta, JetBlue, and United have seen their fair share of IT failures, often with losses in the hundreds of millions. What do these airlines have in common? Technical debt. Lots of it.
Of course, every organization has technical debt. As the pace of technology accelerates, so does the rate of obsolescence. The question is: how much debt are you carrying, and what does it cost to service that debt?
Some organizations don’t recognize the debt they’re carrying because the costs are insidious and multifaceted. They can show up as frequent service outages, which result in lost sales and poor customer service. They show up as a lack of agility, as seen when the business waits months for what seem to be simple application enhancements that ultimately don’t work as expected. These symptoms are caused by overcomplexity, driven by poor architectural practices and built up over years of system add-ons, followed by more add-ons to the add-ons. Instead of simplifying, the IT department continues to invest in more tools and people. No one person can explain how the system works end-to-end or what the root cause of major problems is. The ultimate technical debt payment is finding out the hard way that the disaster recovery plan doesn’t work. Oops!
The interest that technical debt incurs shows up in the form of complexity. The more complex your systems, the higher your interest rate. And if the interest rate gets high enough, the debt becomes unserviceable. That’s when the CIO loses his or her job. But soon, a new person comes in and exposes the technical debt problems, everyone feigns surprise, and a large sum of capital is approved to retire the debt, along with a catchy project name. Therein lies the problem with technical debt: it is accrued through seemingly sound management decisions made over a long period of time.
It happened when the decision was made not to convert that system after an acquisition because it was going to cost too much, so you ran two systems instead of one. It happened when the decision was made to stretch out the life of operating systems that the vendor now deems “unsupported,” which also means their security flaws can no longer be patched. It happened when the decision was made to defer systems maintenance in favor of delivering new business requirements. These decisions looked good at the time, but each contributed to the increased complexity of the system, and to the interest rate that you pay on the debt.
So how do you avoid getting in over your head in debt? The answer is simple. Keep your systems as simple as possible and keep complexity at bay.
How do you do that? Start by recognizing and tracking the amount of complexity in your environment. Every new project will have an impact on overall complexity. Adding functionality or size to a system typically adds complexity. Replacing architecturally non-conforming systems with ones that conform reduces complexity. You can measure this by including a Complexity Factor (CF) in your business case analysis as a kind of variable interest rate for your debt. Some projects will make your interest rate go up; others can make it go down. Recognize what impact a project is going to have on the overall environment. No one project is going to have a huge impact on your CF. The point is, if you only approve initiatives that increase the CF without also approving ones that decrease it, you’ll eventually get so far in debt that you can’t service the interest.
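As a rough illustration of tracking CF across approved projects, here is a minimal Python sketch. The article does not define how CF is calculated, so everything here is an assumption: the `Project` class, the `cf_delta` field (a per-project change in CF, in percentage points), the baseline of 10%, and the sample projects are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    cf_delta: float  # hypothetical: change in CF, in percentage points


def portfolio_cf(baseline_cf: float, projects: list[Project]) -> float:
    """Return the CF after applying every approved project's delta."""
    return baseline_cf + sum(p.cf_delta for p in projects)


# Hypothetical portfolio: two projects add complexity, one removes it.
approved = [
    Project("Add loyalty module to reservations system", +0.5),
    Project("Run acquired billing system in parallel", +1.5),
    Project("Retire unsupported operating systems", -1.0),
]

current_cf = portfolio_cf(10.0, approved)
print(f"Portfolio CF: {current_cf:.1f}%")
```

No single delta is large, but the running total shows whether approvals are trending the environment toward more or less complexity.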
Work with the business to agree on a methodology for calculating CF, and consider adding that percentage as a cost to project financing, with the proceeds going into a fund that finances negative-CF projects (those that reduce complexity). Even if you don’t use the CF to collect funds for remediation projects, use it to communicate to the business and leadership that your environment has too much complexity and is heading into a danger zone that will eventually have consequences. Build awareness that will create a new level of scrutiny on all projects being approved.
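The funding idea can be sketched the same way. This is only one possible reading, under stated assumptions: CF is expressed as a percentage of project cost, projects with a zero or negative CF pay nothing, and the function name `cf_surcharge` and the sample figures are hypothetical.

```python
def cf_surcharge(project_cost: float, cf_percent: float) -> float:
    """Surcharge owed to the remediation fund.

    Assumption: only projects that add complexity (positive CF) pay;
    projects with zero or negative CF contribute nothing.
    """
    return project_cost * max(cf_percent, 0.0) / 100.0


# Hypothetical (cost, CF %) pairs for three approved projects.
projects = [(200_000, 5.0), (500_000, 2.0), (300_000, -3.0)]

remediation_fund = 0.0
for cost, cf in projects:
    remediation_fund += cf_surcharge(cost, cf)

print(f"Remediation fund: ${remediation_fund:,.0f}")
```

Under these assumptions, the two complexity-adding projects each contribute $10,000, and the fund can then be drawn down to finance projects that reduce CF.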
What if your technical debt is already at a critical level? Conduct a Technical Debt Assessment (TDA). A TDA is best done by a skilled external organization that has no perceived biases and can look at things objectively. It’s necessary to drag all the skeletons out of the closet and assess them for risk and cost of remediation. Only then can you develop a prioritized plan that will get your organization back to a healthy debt position. Does your organization need one? Ask:
1. Are we experiencing a high number of outages?
2. Does it take too long to deliver new business functionality?
3. Do the estimates for new projects shock us?
If you answered yes to any of the questions above, an assessment may be due.
For Southwest, things may be looking up. Since the outage, the airline has invested heavily in a technological overhaul that includes a new reservations system, which it started rolling out in May 2017 and will continue transitioning to over the next three years. Hopefully, they’ll keep track of their CF and justify the necessary remediation investments so that they can avoid the next big outage.