Learn to fail fast? Technologists fail all the time

Photo credit: Kind and Curious via Unsplash

From time to time, organisations attempt to learn new ways of working. They attempt to become digital or agile or data-driven or innovative. These attempts come with some familiar ideas: that we should execute through cross-functional teams who are empowered to experiment. One of these ideas is that we should not be scared of failure, and that we should learn to fail fast.

These attempts sometimes elicit eye rolls from the technology teams, especially the idea that we should embrace failure. This is not because these ideas are invalid: in fact, they are welcome to technology teams, and reflect their preferred ways of working. However, technologists have a different relationship with failure than non-technologists.

For non-technologists, being willing to fail fast typically means that they have the appetite to conduct some experiments and try some things out, with the knowledge that not all of them will work. They accept that some of them will fail, select the ones that they are confident in, and get on with implementing and scaling them. Failure is a fleeting visitor at the beginning of the lifecycle: it is welcomed, accepted, and shown the door.

For technologists, failure is a constant companion, from the first day of a new project to the day when the system is decommissioned. If they roll their eyes at the suggestion that they should learn to embrace failure, it is because they live with failure all the time. Projects go wrong. User needs are poorly expressed and poorly understood. Partners let each other down. Writing code is a continuous, humbling demonstration of human fallibility. Hardware fails and software fails. When it doesn’t fail of its own accord, bad actors try to make it fail for us. Users subvert interfaces, and developers subvert data structures and APIs. More than half the work of designing and building any system is anticipating what can go wrong and setting up measures to deal with it - and then something goes wrong anyway.

Failure is so deeply embedded in the practice of building and running software that, for technologists, the notion of being willing to embrace failure becomes meaningless. What would be more meaningful would be a willingness to embrace unpredictability. Experimentation in the early stages of a project is useful because, without conducting experiments, we cannot predict which approach will be most successful. But this unpredictability does not go away when experimentation is over: it continues through all stages of the lifecycle. We do not know how long software will take to build until we try to build it. We do not know how many bugs it has until we run it. We do not know exactly which bit of hardware will fail until it fails, and we do not know how we will be attacked until the attack takes place.

Well-designed systems and processes tolerate failure and unpredictability. Organising work in a backlog and releasing on a regular basis allows us to make progress, even though we don’t know exactly what will be in every release. Spreading workloads across hardware and across regions allows us to survive crashes and disasters. And making our post-mortems blameless enables us to learn from what went wrong without losing the faith and dedication of our teams.

For technologists: when our non-technical colleagues declare a newfound appetite for failure, we owe them a more useful response than simply rolling our eyes. We should take the opportunity to share the realities of building and running systems, and the ways that we deal with failure all the time. We should take in good faith the willingness to consider new ways of working - and explain just what that means.

For non-technologists: when you embrace failure at the experimentation stage of the lifecycle, you should regard it as a step on a journey. By recognising the need for experimentation, you are acknowledging one part of the inherent unpredictability of building and running systems. If you listen to your technical colleagues, they can show you how this unpredictability runs through all their work - and what it means to live with it. It may change how you react when things go wrong later in the project.

We can make better systems, better teams and have a better time at work if we understand failure and unpredictability - and take the trouble to understand each other.
