Making a Business Case for Paying Down Tech Debt

Laura Tacho
7 min read · Sep 20, 2021

One skill I wish I’d learned earlier in my career as a software engineer is how to talk about the outcomes of engineering projects with folks adjacent to or outside of my engineering team. Later, as an engineering leader, I saw engineers propose great projects but then struggle to articulate why we should do them, and more specifically, why we should do them instead of something else. Getting better at talking about the business value of your projects makes it much easier for people to say “yes” and for them to see (and measure) the impact of your work.

This is especially true when pitching tech debt paydown projects.

Tech debt isn’t “code that the engineering team isn’t proud of.” Just like financial debt, tech debt has a measurable negative impact on your team’s ability to reach other goals. In other words, it has to cause pain on a regular basis. Getting buy-in from other teams you work closely with gets easier if you can articulate the benefits of paying off tech debt in terms of cash or other concrete business metrics.

Finding your business case

It’s easy for others outside engineering to balk at a tech debt project when the benefits aren’t articulated clearly and the whole project just seems like the engineering team scratching an itch without considering the impact on users or revenue. Why do engineers get a do-over when other teams aren’t granted the same?

An engineering team is accountable for business results. Period. That means framing tech debt repayment in terms of business results.

Photo by Lagos Techie on Unsplash

There’s no magic formula for making a business case for these projects, but most of them fall into similar categories. Keep reading for a bit of a cheat sheet to help you feel more prepared when advocating for a tech debt project.

Increasing the quality of automated testing

Cheat sheet: productivity, outage risk, strain on support team

This is all about productivity and risk, both of which can be translated into cash.

The risk side should be clear from your contractual SLAs or terms of service. You might even have an outage you can point to that was the result of missing tests. The bottom line here is that your customers pay you for a service that is available. If you fail to meet that, it could mean you’re on the hook to repay a percentage — sometimes up to 100% — of a monthly subscription or user license fee. Even worse, those customers might churn completely. How can your project de-risk that, or recover revenue?

On the productivity side, how much time is wasted when engineers have to try to figure out if the test is just flaky, or if there’s a real problem? Do your teammates have to babysit deploys because no one trusts that the tests will actually fail if something’s broken?
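To put a rough number on that wasted time, a back-of-the-envelope calculation is usually enough. The sketch below is only an illustration: the function name and every input (reruns per week, minutes lost, hourly cost, 50 working weeks a year) are hypothetical placeholders to swap for your own team's figures.

```python
def weekly_flaky_test_cost(reruns_per_week, minutes_lost_per_rerun, hourly_cost):
    """Estimate the weekly cost of time lost to investigating or re-running flaky tests."""
    hours_lost = reruns_per_week * minutes_lost_per_rerun / 60
    return hours_lost * hourly_cost


# Hypothetical inputs: 20 flaky reruns a week, 15 minutes lost each,
# and a loaded engineering cost of $100 per hour.
weekly = weekly_flaky_test_cost(reruns_per_week=20, minutes_lost_per_rerun=15, hourly_cost=100)
yearly = weekly * 50  # assumes roughly 50 working weeks per year

print(f"${weekly:,.0f} per week, ${yearly:,.0f} per year")
```

Even a rough estimate like this gives the rest of the business something concrete to weigh against the cost of fixing the tests.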

Improving your CI/CD pipeline, or adding monitoring instrumentation

Cheat sheet: productivity, efficiency, outage risk, outage recovery time

Building software without fit-for-purpose CI/CD is like building a boat on land. How’s it going to get where it needs to go?

CI/CD improvement projects make a big difference in developer productivity, which just needs a bit of math to figure out the financial cost. I often talk about these projects in terms of the productivity tax we have to pay on every future project if we don’t do this. Let’s say we’re proposing a project that will take half a sprint and will cut our build times by 25%. Doing this project sooner rather than later means that all future projects can be done slightly faster because of better tooling. There will be a breakeven point where the project easily pays for itself in time saved.
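Here is one way that breakeven math could be sketched. Only the "half a sprint" and "25%" figures come from the example above; the two-week sprint, team size, builds per day, and build duration are assumptions made up for illustration, and the model assumes engineers are effectively blocked while a build runs.

```python
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5


def project_cost_hours(engineers, days):
    """Engineering hours invested in the improvement project."""
    return engineers * days * HOURS_PER_DAY


def weekly_hours_saved(team_size, builds_per_engineer_per_day, build_minutes, reduction):
    """Hours the whole team gets back each week from faster builds."""
    minutes_saved_per_build = build_minutes * reduction
    builds_per_week = team_size * builds_per_engineer_per_day * DAYS_PER_WEEK
    return builds_per_week * minutes_saved_per_build / 60


def breakeven_weeks(cost_hours, saved_hours_per_week):
    """Weeks until the time saved equals the time invested."""
    return cost_hours / saved_hours_per_week


# Half of a (hypothetical) two-week sprint for two engineers: 5 working days each.
cost = project_cost_hours(engineers=2, days=5)

# Hypothetical team: 8 engineers, 6 builds per engineer per day, 20-minute builds, cut by 25%.
saved = weekly_hours_saved(team_size=8, builds_per_engineer_per_day=6,
                           build_minutes=20, reduction=0.25)

weeks = breakeven_weeks(cost, saved)
print(f"Invest {cost} hours, save {saved:.0f} hours/week, break even in {weeks:.0f} weeks")
```

With these made-up numbers the project pays for itself in about a month. Everything after that is time returned to future projects.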

CI/CD improvements are also critical if you’re operating a SaaS and have an on-call rotation. For some companies, recovering from an outage 5 minutes faster can mean some serious cash. And what about being able to revert a bad deploy 5 minutes faster?

In terms of alerting and monitoring, how much money is lost for an outage of 30 seconds vs. 3 minutes or 30 minutes? Engineering should always be the first to know when something is broken, not your support teams pinging engineers on Slack.

If you’re not yet tracking your Mean Time to Repair/Recovery (MTTR), that’s a good place to start and helpful for debt repayments framed this way. It’s hard to say no to a project that will halve the amount of time it takes to recover from an outage.
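If you have incident timestamps on hand, a first pass at MTTR takes only a few lines. This is a minimal sketch, assuming each incident is recorded as a (started, resolved) pair. The incident data and the revenue-per-minute figure are invented for illustration, and real downtime cost is rarely this linear.

```python
from datetime import datetime

# Hypothetical incident log: (started, resolved) timestamps.
incidents = [
    (datetime(2021, 9, 1, 10, 0), datetime(2021, 9, 1, 10, 45)),
    (datetime(2021, 9, 8, 14, 30), datetime(2021, 9, 8, 15, 0)),
    (datetime(2021, 9, 15, 2, 15), datetime(2021, 9, 15, 3, 0)),
]


def mttr_minutes(incidents):
    """Mean time to recovery, in minutes, across all recorded incidents."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    total_seconds = sum(
        (resolved - started).total_seconds() for started, resolved in incidents
    )
    return total_seconds / len(incidents) / 60


def downtime_cost(minutes, revenue_per_minute):
    """Simplified linear estimate of revenue at risk during an outage."""
    return minutes * revenue_per_minute


current = mttr_minutes(incidents)
revenue_per_minute = 50  # hypothetical figure

print(f"Current MTTR: {current:.0f} min, "
      f"about ${downtime_cost(current, revenue_per_minute):,.0f} per incident")
print(f"Halved MTTR: {current / 2:.0f} min, "
      f"about ${downtime_cost(current / 2, revenue_per_minute):,.0f} per incident")
```

Tracking this number over time also gives you a before-and-after comparison once the project ships.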

Refactoring crusty old code

Cheat sheet: outage risk, outage recovery time, efficiency

If you’re operating a SaaS and have an on-call rotation, every minute of an outage or period of degraded performance can cost you real cash. Code that doesn’t adhere to current style guides or is just generally spaghetti code and really hard to debug is a risk. Aside from emergencies, if that code is likely to change often, or if it’s a critical part of the application that frequently needs debugging, articulate the amount of time wasted by not having this refactored. If it takes an engineer an extra 10 or 30 minutes to figure out what’s going on, what does that mean for your SLAs during an outage? What about routine development during a non-emergency time?

Another business case for this type of refactoring is if you’re about to onboard new people to the team. Getting up to speed on a codebase is hard enough in the best of circumstances, but adding in inconsistent coding styles makes it a big headache and wastes a lot of time. What else could those new hires be doing instead of trying to figure out some crusty code from a few years ago?

If you’re not using a linter in your PR process, it’s worthwhile to add one sooner rather than later. It saves reviewer time by automating feedback, and is a decent way to keep styles consistent, depending on which type of linter you use.

One other thing to consider is how bugs and outages impact other teams, specifically your support team. If you have a business-critical use case that’s not well-tested, you’re leaving yourself open to the risk of a high-impact bug in production, or at worst, an outage. This puts strain on your customer support teams, and also makes it more difficult to predict engineering team velocity because the amount of unplanned work is unpredictable. All of that costs money. How much time did your support team spend responding to customer inquiries about something related to your debt repayment project? How much cash is that equivalent to?

Refactoring to remove duplicated code

Cheat sheet: outage risk, outage recovery time, efficiency

Developers are some of the highest-paid folks in the company. Tech debt slows teams down. Articulate the benefit in terms of time spent. Let’s imagine one of your internal APIs has some duplicated endpoints. Every time you update anything that hits those APIs, your team is doing double the work. Those hours quickly add up, especially if it’s a part of your application that changes frequently. A project to fix that quickly pays for itself.
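The "double the work" cost can be estimated the same way. In this sketch, the change frequency, hours per change, and refactoring estimate are all hypothetical, and it assumes every change has to be repeated in full for each extra copy of the endpoint.

```python
def duplicated_work_hours(changes_per_month, hours_per_change, copies):
    """Extra hours per month spent repeating each change in every additional copy."""
    return changes_per_month * hours_per_change * (copies - 1)


def payback_months(refactor_hours, extra_hours_per_month):
    """Months until the refactor has saved as many hours as it cost."""
    return refactor_hours / extra_hours_per_month


# Hypothetical: 6 changes a month, 3 hours each, and the endpoint exists in 2 copies.
extra = duplicated_work_hours(changes_per_month=6, hours_per_change=3, copies=2)

# Hypothetical: the deduplication project is estimated at 40 engineering hours.
months = payback_months(refactor_hours=40, extra_hours_per_month=extra)

print(f"{extra} extra hours per month; refactor pays back in {months:.1f} months")
```

The faster the code changes, the shorter the payback period. That is why duplication in a stable corner of the codebase is a much weaker case.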

Quality of life improvements

Cheat sheet: satisfaction and happiness, team morale

There are some cases where developer satisfaction is the business case. No one wants to live in a house with broken windows, even if it never rains. Some tech debt is just plain annoying and can really have an impact on motivation on your team. Retention is a useful way to get buy-in for these projects, which may not fit into a clear revenue-based business metric. Offboarding, onboarding, recruiting and hiring are all expensive activities. A project is worth doing if it keeps your team’s morale in a healthy state. Using a tool like Peakon, Officevibe, or 15Five can give you some quantifiable metrics to move when advocating for a project like this.

Coming to the conversation with clear and measurable outcomes helps the rest of the team view the project as something that brings value to the business. You need to show not just why the project is worth doing, but why it’s worth doing ahead of other projects that might be more tightly coupled with revenue.

The cases against paying off tech debt

I’d be remiss not to mention the circumstances where there really is no great business case to pay off debt: “good” debt. You want to take on debt if it lets you get to market faster or make investments in things that would otherwise be out of reach. If those debts are hiding in places in your code that are very unlikely to change and aren’t causing ongoing pain, it’s best to focus your repayment efforts elsewhere, just like you might only make a minimum payment on a loan with a 0.5% interest rate because investing that money will give you a 3% return.

As a senior+ engineer or an engineering leader, it’s your responsibility to distinguish between good debt and bad debt, and to be judicious about which tech debt projects are up for consideration. This isn’t an excuse to be sloppy or skip over testing and alerting (those are real problems that pose real risk), but some spaghetti code in a feature set that’s not changing frequently is better off on the back burner.

🎉 My course on developer productivity metrics will help you find the right metrics framework for your org — and help you figure out what to do with the metrics.

Written by Laura Tacho

VP of Engineering turned engineering leadership coach. I moved off of Medium to lauratacho.com
