- agents
- AI
- ambiguity
- architecture
- augmented reality
- books
- bureaucracy
- career
- change
- Christmas
- cloud
- collaboration
- communication
- compliance
- corporate life
- data
- decisions
- delivery
- devops
- disagreement
- end user tools
- ethics
- failure
- fear
- fundamentals
- government
- halloween
- history
- humans
- hype
- identity
- inclusion
- infrastructure
- innovation
- language
- leadership
- learning
- legacy
- management
- measurement
- mental health
- money
- networking
- New Year
- operations
- philosophy
- physics
- platforms
- prediction
- privacy
- process
- procurement
- products
- programming
- quantum
- reliability
- resilience
- risk
- science fiction
- security
- shadow IT
- space
- strategy
- talent
- teaching
- teams
- technical debt
- technology advocacy
- testing
- thinking
- transformation
- TV
- virtues
- vision
- writing
Is your technology solution a well behaved house guest?
‘How many devices do you have in your home connected to the Internet? One? Three? Five?’
It was 2010. I was attending an internal conference within the technology department of a large bank. The leader of the digital team was illustrating the rapid expansion of the Internet, and the importance of digital customer experience.
Most people put their hands up to show that they owned a few connected devices. Most put their hands down by the time the count rose to five.
I considered my answer to the question. I had a work laptop in my bag. I had an iPhone and a work Blackberry. I had two PCs at home with a broadband connection. My wife had an iPhone too. We had six connected devices in our house (sometimes). I was at the top end of the spectrum; perhaps I even counted as an early adopter!
Do you need an umbrella or a lifeboat?
How do you prepare for things that just keep on going wrong? And how do you prepare for the day when everything goes wrong?
Last week I attempted to distinguish between reliability and resilience, claiming that reliability is the ability to keep services running despite routine failures, while resilience is the ability to restore essential services despite unexpected catastrophes. But that basic definition is not quite enough to disentangle these related but distinct topics. In this article, I’ll explore three more important differences between reliability and resilience.
Reliability protects normality; resilience strives for survival
Do you know the difference between reliability and resilience?
If you want to know the difference between reliability and resilience, look to the Moon. Specifically, look at the two best known Moon missions, Apollo 11 and Apollo 13.
Although Apollo 11 was famously successful, this was almost not the case. In the final minutes of the descent, the guidance computer crashed repeatedly, throwing error after error at the two astronauts, who waited tensely on instructions from Mission Control, and wondered whether to abort the mission.
Later, it was found that the computer was receiving unexpectedly large amounts of data from one of the instruments, overloading its memory and processing capacity. Yet, despite the nerve wracking series of errors, the computer behaved exactly as designed. When overwhelmed, it displayed an error message and restarted itself, giving priority to the most important programmes.
The human path to legacy modernisation
‘We just need to change the funding model.’
I heard that phrase many times when leading architecture teams, and it always made my heart sink.
It was normally said with good intent by architects who were working on a problem which required long term, multi-year investment to build shared assets. Unfortunately, resources (money, people, leadership attention) were allocated to meet the focused needs of business divisions, rather than to create enterprise capabilities. Hence the lament that, if only we could change the funding model, we could give the enterprise the architecture that it really needed.
Old is not bad. Are you modernising legacy systems for the right reasons?
The Voyager 1 spacecraft, launched in 1977, has been flying through space for most of my life, and for longer than many people today have been alive. It was alarming therefore, to read reports that it was being shut down - and reassuring to find that those reports were somewhat exaggerated. Voyager is powered by the radioactive decay of plutonium, and it is starting to run out: power output is 40% less than at launch, so NASA is shutting down some systems to keep others operating into the 2030s.
I was also intrigued to learn that Voyager 1 is currently on its ‘extended mission’. Its original mission, to gather data on the outer planets and moons, was completed in 1980: the extended missions is effectively just . . . keep on going. And Voyager has kept on going for another 42 years and counting.
Is that AI doing what you think it’s doing?
I can remember my first experience using image search on my photo library, a long time ago. I noticed that a new tool had appeared in the UI, prompting me to search for things that might appear in my photos. It seemed to work like magic: when I searched for castles, it found castles (I have visited a lot of castles). When I searched for seascapes it found seascapes (I live near the sea).
Years later, image search doesn’t seem like magic any more: it’s just another everyday miracle that has become mundane. Because it has become mundane, it is easy to fall into the trap of believing that we understand how it works: that it understands and interprets the world in the same way that we do.
Should you try to take your cloud with you?
In their 1973 research paper Availability: a heuristic for judging frequency and probability, Daniel Kahneman and Amos Tversky introduced the availability heuristic to the world. Roughly stated, the availability heuristic says that, when making decisions or judging probabilities, we give disproportionate weight to information that comes readily to mind. For example, if lots of people who live down my street drive a particular make of car, I may judge that that make of car is very popular, when it may just be that I live in a street with people of a certain wealth, age, social class and so on. Or maybe I live next to the owner’s club.
I believe that the availability heuristic may be responsible for some of the ways we think and talk about one of the biggest shifts under way in enterprise technology today: the shift from on-premise infrastructure to public cloud platforms. I think this because much of our talk about public cloud is dominated by concepts of movement: we talk about portability, about vendor lock-in, and about various types of exit strategies, to a much greater degree than we do for our on-premise infrastructure. And this is because, right now, the story of public cloud is a story of migration. The adoption of public cloud is still in its early stages across most industries, and much of the work that is being done is to move workloads and data from one platform to another. The idea is prominent in our minds, and leads us to dwell on the thought: ‘If I have to put this much effort into getting in, then what will I need to do to get out?’
Another technology architecture trap: mistaking the means for the end
I once worked as part of a technology architecture team where we really cared about reusability. We believed that projects would go faster, architecture would be simpler and services more stable, if only we could design and build more reusable components. We also believed that our architectural sprawl would be tamed, or at least contained, if we could get people to stop building the same things over and over again.
The problem is that, while reuse (or just plain use) can be relatively easy to measure (you can check how often a library is imported or an API called), measuring reusability is much more difficult. How can you tell whether a product or service is likely to be used before it is released? It might seem obvious that the true answer is that you can’t, but nevertheless, we attempted to invent all sorts of metrics and attributes which would signal reusability, and then we tried to test against them as part of our architectural governance process (I do not claim that every architectural governance process I have been a part of has been a good use of time).
Do you know whether you are building a platform or a product?
Many years ago, I was working as a solution architect for a large organisation, designing a system for scanning and processing inbound physical mail (I said that it was a long time ago). One of the things I had to figure out was how to host the system. The default approach at that time would have been to size, spec and procure some physical servers, but the company I was working for was just building out its first shared virtual machine infrastructure (once again, this was a long time ago), so the project team and I decided that that was what we should use. After all, why go through the delay and difficulty of getting physical equipment bought and commissioned, when we could consume pre-provisioned capacity from a shared facility?
Of course, things didn’t quite work out that way. The VM farm wasn’t quite as ready as I thought it was. And there was some enabling infrastructure that needed to be put in place. And the first adopter was expected to pay some of these costs. And the scanning and processing software wasn’t quite as ready to run on VMs as the provider thought it was. What was supposed to be a project about handling information, figuring out customer needs and routing work turned out to be a project about getting infrastructure in place and arguing about funding.
Taming dragons with humility and curiosity
I grew up believing that medieval cartographers marked their maps with the legend, ‘Here be dragons’ when they didn’t know what else to put. It was supposedly a warning to travelers that they were entering unknown territory – as well as a neat and entertaining way to fill in the blanks. Like many great stories, it’s not entirely true: apparently there are only two historical globes with ‘Here be dragons’ written on them: the usual practice was to simply draw a picture of a dragon or another mythical beast. Where they did use language, the more common phrase was apparently ‘HIC SVNT LEONES’: here are lions. Nevertheless, ‘Here be dragons’ has become shorthand for any space, real or conceptual, which contains hazards that we should be wary of.
My own maps of the world are dotted with signs marked ‘Here be dragons’. They mark mental spaces rather than physical spaces, and they indicate areas I know little about, where I only have a sketchy grasp of the landscape. These signs appear in some areas of art, literature, science and culture, where I am aware of the depths of my own ignorance, and the expertise which other people have attained. These are deep waters where I don’t know how to swim.
Embrace the glorious art of architectural procrastination
According to legend, the night before the opera Don Giovanni was due to open, it was still not finished. But Mozart was not locked in his room, desperately seeking inspiration: he was out with his friends. When they finally convinced him to go home and finish his work, he completed the opera in a few hours, so late that the orchestra was handed still-wet copies of the music as the curtain went up, and had no time for rehearsal. ‘Some notes fell under the stands,’ said Mozart, ‘but it went well.’
I don’t know how much of this story is true. There is documentary evidence: Mozart recorded the completion of the opera in his own catalogue of works the day before the first performance. But the story has the air of embellishment: we would love to believe the tale of the mercurial genius dashing off a masterwork with only moments to spare.
The secret to great technology architecture is . . . timing
It’s been fascinating and nerve wracking to watch the journey, unfolding and testing of the James Webb Space Telescope since its launch at the end of 2021. Fascinating because, to a non-expert like me, every time I read about it, the mission seems more complicated than I thought it was. Nerve wracking because the telescope is a very, very long way away (at the Lagrange point L2, about one and a half million kilometres from Earth). This means that, if anything goes wrong, there’s no realistic prospect of sending someone (or something) to fix it (unlike Hubble, which has had five shuttle missions for servicing, upgrades and repairs). Anybody who has been involved in any sort of launch, even of software that has never left a machine, let alone left the Earth, knows that ‘just one last thing’ feeling before the point of no return is crossed.
I find the JWST interesting because it is an extreme example of knowing when to make choices about architecture and design. I must admit that I don’t know the full production lifecycle of the JWST, the manufacturing lead times, or the critical path. But I do know that, once the Ariane 5 rocket was lit, there was no going back.