Playing the triangle: breaking down technology risk
Which is riskier: building a hotel on the side of a volcano, or trying to deliver a software project?
Years ago, while working for a bank, I heard a talk from a colleague in the Structured Finance team: this team created complicated lending structures for projects that carried high degrees of risk. He told a story about a loan for a hospitality business that was building a new resort on the side of a volcano. Unsurprisingly, that project required some complicated risk models.
I put up my hand and asked, ‘How do you model risk for IT projects?’
The banker smiled and shook his head.
‘We don’t,’ he said. ‘Far too risky. They fail all the time and we don’t know why.’
At the time, I was surprised. I didn’t think that the work that my team and I did could possibly be as risky as trying to do construction on the side of a volcano. I imagined cranes swinging heavy loads past wind-swept palm trees and an ominous smoking crater, and compared it to a group of people sitting in offices, typing on keyboards and sipping coffee.
But I was thinking about the problem incorrectly. The banker’s models did not attempt to reflect differences in the physical danger of the work; they abstracted all of that away and tried to discern the probability of project success. They found that it was possible to make judgements about volcano-based construction, but not about software development, at least not with enough confidence to price a loan.
Today, I have worked on enough software projects to know that the banker was right. Furthermore, I think he did not go far enough, and made a mistake common to many people considering IT projects: he only considered one dimension of risk.
We face many types of risk when we build and run software, but I believe that we can group them into three categories: risks to the success of the project; risks to the operation of the service in production; and risks to the performance of the organisation. We can neatly represent these categories in a triangle:
[Figure: The Technology Risk Triangle]
This representation works because, in traditional technology projects, these risks pull in different directions: acting to address one usually makes the others worse. This tension is exacerbated when they are owned by different people: when the sponsor cares most about the performance of the organisation, the project manager cares most about the smooth operation and completion of the project, and the operations team cares most about performance in production.
When this tension manifests, we see the typical pitfalls of software delivery: the pitfalls that made my banker colleague so reluctant to make judgements about the likelihood of success. During the initial development project, risks to project delivery dominate, even to the point of irrationality. The project becomes an end in its own right, and short-term goals such as a particular release, the launch of a specific feature, or even just getting through a project board meeting take priority, even when achieving them means compromising production stability or jettisoning requirements essential to organisational performance.
Servicing project risks is not the only source of irrationality, though. In order to protect themselves - and production services - against compromise, operational teams often introduce elaborate and complex approval processes which try to ensure that only good software gets released. Except that these processes reduce velocity, make releases infrequent and fragile, and encourage project teams to find creative ways to evade them.
And the sponsor is left wondering why it seems so hard for the project to get anything into production, why production services seem to break all the time, and why the business performance outcomes they were promised remain elusive. No wonder my banker colleague chose volcanos over software.
Of course, there is an answer, and it is an answer we have known for some time. Whether we call them DevOps teams, Agile teams, squads, pods or some other term, we know that multi-functional teams, containing people who care about (and are rewarded for) all dimensions of risk and success, can achieve speed, reliability and outcomes, and that these can be mutually reinforcing rather than in tension. The problem is that, despite the proven value of this approach, we still rarely achieve it, and easily revert to classical project structures, with all the tensions of the risk triangle.
As technologists, we must continue to make the case not only for the value of technology, but also for the value of doing it well. That means organising ourselves in ways that work, even when they contradict established structures.