Keeping the lights on

Photo Credit: Luis Tosta via Unsplash

Do you have a line in your IT budget which says something like ‘keep the lights on’ or ‘keep the show on the road’, or ‘maintenance’ or ‘support’ - or even just ‘run’?

Over the years, I have put together IT budgets with at least one of these lines in. They’re a convenient way of signalling to finance people, hovering over their printouts and spreadsheets, ready to wield the red pen or press the delete key, that they’d better not strike out this item. They don’t have to understand technology to understand that, without this money, something will break. They’ll probably ask us to swallow inflation, or to trim around the edges, but they’re unlikely to cancel it altogether.

While this arrangement can be convenient (the IT department gets its money; the finance department does not have to understand technology), I do not believe that it is healthy or helpful. I believe that we would do a better job if we explained exactly what we spend this money on, why it matters, and why it gets more complex and more difficult every year.

In the physical world, keeping the lights on is understood as a serious endeavour. Proverbially, we take the plug in the wall for granted, but we are also aware of how quickly things would get unpleasant if that plug suddenly ceased to work. Furthermore, we have a mental image, even if only a vague one, of what it takes to keep that plug working. We imagine giant towers belching steam, wind turbines spinning, and cables stretching across the country.

By contrast, in the digital world, the phrase keeping the lights on is often dismissive. It’s shorthand for a bunch of stuff that happens behind the scenes that is apparently important because the IT people tell us it is. But most people have no clear mental image of what this stuff could possibly be. Perhaps lights blinking in the darkness? Some fibre optic cables? Somebody at a keyboard?

It’s worth reflecting on what it takes to keep the lights on for digital services, and why it has become harder. Near the beginning of this century I was responsible for the infrastructure operations of a mid-size global organisation. We had a single data centre which contained a mainframe, a set of Unix servers, some networks and storage arrays. Outside the data centre we had PCs and laptops, distributed servers running collaboration tools and file sharing, network contracts and a VPN setup. We had a few services connected to the Internet, but not many. We made changes to our in-house software about once a week, with the usual change boards and controls. Compared to today, things were simple.

And yet it was hard to keep things running even for this architecture. We had the usual 3 am callouts, failures in global networks, machines drifting out of support, and emergency patches which we had to push to the world. We had to figure out how to recover our single site if it suffered a physical disaster. Keeping the lights on felt like a treadmill.

It might seem to someone outside the world of technology that it must have become easier to keep the lights on in the intervening decades. After all, physical infrastructure is more advanced, global networks are stronger, service providers are more mature, practices have evolved - and we have the cloud.

However, we also have a much richer and more complex technology architecture than we had before. In the early 2000s, we had physical machines and VMs: now we have VMs, containers and serverless functions - and sometimes even have use cases for dedicated machines. We have abstracted our physical infrastructure away to the cloud, but we now need to learn the concepts and configuration of multiple clouds. We have more choices about how to host and manage our data, but those choices can be bewildering. We have automated and instrumented our change pipeline, but now need to support a stream of constant change. We can connect our services to a global public network, but also expose them to attacks from all over the world. We have access to new AI services, but need to make choices down to the chipset level.

I should be clear: all of this richness and complexity is good. It is part of our continued exploration of the potential of computing - an exploration that is still only at its very beginning. But we should not forget that, every time we add some new capability, we also need to figure out how to keep it running. We make the job of keeping the lights on more interesting - but also more challenging and more complicated.

In the end, I think that there is no harm in retaining the phrase keeping the lights on, but, rather than using it as something to hide behind, or as shorthand used to dismiss the work of dedicated people, we should use it to recognise the work of digital and data professionals as a difficult, complex, civilisation-sustaining effort, worthy of respect and understanding.
