Are you overfitting your delivery decisions - and to the wrong dataset?
Overfitting is a term from the world of machine learning. It describes the problem that arises when you train a model on a dataset and the model makes accurate predictions - but only as long as they are limited to that dataset. Unfortunately, the model starts to go haywire when you apply it to the wider world.
Imagine a fictional example: you wish to use data about students to predict their academic performance. You train your model on a class where there are several students called ‘Geoff’, and they all happen to be doing very well in their studies - there’s no reason for this: it’s pure coincidence. But the data doesn’t know that. When you make predictions about this class only, your model seems to work: it finds the high-performing Geoffs. When you use it to make wider predictions, however, in a world where there are rather fewer Geoffs, and where being a Geoff is not a reliable indicator of academic performance, the model stops working.
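The Geoff example can be sketched in a few lines of Python. This is a toy illustration rather than a real machine learning model: the names, scores, threshold and helper functions are all invented for the purpose, and the 'model' does nothing more than memorise the average score for each first name it saw in training.

```python
from collections import defaultdict

THRESHOLD = 70  # hypothetical score above which a student counts as high performing


def train_name_model(students):
    """Memorise the average score for each first name in the training data."""
    scores_by_name = defaultdict(list)
    for name, score in students:
        scores_by_name[name].append(score)
    return {name: sum(s) / len(s) for name, s in scores_by_name.items()}


def predict_high_performer(model, name):
    """Predict a high performer if that name averaged above the threshold in training."""
    # Names never seen in training default to 'not a high performer'.
    return model.get(name, 0) > THRESHOLD


def accuracy(model, students):
    """Fraction of students whose predicted label matches their actual result."""
    correct = sum(
        predict_high_performer(model, name) == (score > THRESHOLD)
        for name, score in students
    )
    return correct / len(students)


# Invented data: the training class happens to contain three high-scoring Geoffs.
training_class = [("Geoff", 92), ("Geoff", 88), ("Geoff", 95), ("Anna", 61), ("Bilal", 58)]
# Invented data: in the wider world, being a Geoff tells you nothing.
wider_world = [("Geoff", 52), ("Geoff", 48), ("Anna", 55), ("Bilal", 77), ("Chen", 90)]

model = train_name_model(training_class)
train_accuracy = accuracy(model, training_class)
wider_accuracy = accuracy(model, wider_world)

print(f"accuracy on the training class: {train_accuracy:.0%}")
print(f"accuracy on the wider world:    {wider_accuracy:.0%}")
```

On the training class the model looks perfect, because it has simply learned that Geoffs do well. On the wider data it is wrong far more often than it is right, which is the pattern that overfitting describes.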
How do we get what we want from AI systems - and human systems?
I am sure that you have heard of the paperclip problem. Just in case you haven’t, it is the idea that, if you ask an AI system to make paperclips, then it may go on making paperclips, until the whole world is nothing but paperclips. There’s even a fun game based on this concept.
The paperclip problem illustrates the difficulty of giving AI systems goals which represent what we truly want. Unlike us, AI systems do not come ready-equipped with goals and desires: we have to provide them, in the form of what is often known as a reward function.
And crafting this function can be more difficult than it first appears. When I wrote some recent articles on generative AI, it was suggested that I read the book Human Compatible by Stuart Russell. It’s a great book, and triggered lots of other reading: it took me down a rabbit hole of articles and papers about optimisation, particularly a phenomenon known as specification gaming.
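A minimal sketch may help to show why the reward function matters so much. Everything here is an assumption made for illustration: the state, the two actions, the greedy agent and both reward functions are invented, and real systems are far more complicated. The point is only that the same agent behaves very differently depending on what we tell it to value.

```python
def make_paperclip(state):
    """Turn one unit of other resources into one paperclip, if any are left."""
    if state["other_resources"] == 0:
        return state
    return {
        "paperclips": state["paperclips"] + 1,
        "other_resources": state["other_resources"] - 1,
    }


def stop(state):
    """Do nothing."""
    return state


ACTIONS = [make_paperclip, stop]


def proxy_reward(state):
    """What we literally asked for: more paperclips is always better."""
    return state["paperclips"]


def intended_reward(state):
    """Closer to what we meant: about 100 paperclips, and keep everything else."""
    return 2 * min(state["paperclips"], 100) + state["other_resources"]


def run_greedy_agent(state, reward, max_steps=10_000):
    """Repeatedly take whichever action gives the highest immediate reward."""
    for _ in range(max_steps):
        best_action = max(ACTIONS, key=lambda action: reward(action(state)))
        next_state = best_action(state)
        if next_state == state:  # nothing left worth doing
            break
        state = next_state
    return state


start = {"paperclips": 0, "other_resources": 1000}
gamed = run_greedy_agent(start, proxy_reward)
aligned = run_greedy_agent(start, intended_reward)

print("proxy reward:   ", gamed)
print("intended reward:", aligned)
```

With the proxy reward, the agent converts every last resource into paperclips. With the second function it stops at 100. Of course, writing the second function was only easy because this toy world has two numbers in it.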
Architecture year one: measuring success
I have a flippant answer when asked how to measure the success of an architecture team: just check how often your phone is ringing. (I realise that this dates my advice - please substitute the messaging app of your choice for ringing phones.) If you and your team are in demand, if you are the first call whenever your company faces a big challenge or opportunity, then you are doing something right. If your phone (or messaging app) is silent, if you have to push to insert yourself into the room with tough problems, then you have more work to do.
However, recently I was talking to a friend and colleague who has just taken on a Chief Architect role in a new company, and who is attempting to design measures of success for their team. In that conversation, I realised that we needed to come up with something a bit more sophisticated than the number of messages on your phone.
What are you optimising for?
Recently, one of the teams I am lucky to work with showed me a tool they had built to help plan and manage Cloud adoption. It captured system data, project data, dependencies between applications and platform capabilities, and the roadmap for launch and enablement of those capabilities. Such a tool is helpful for any Cloud adoption programme, but what really stuck with me was the goal that the team had set for themselves.
The team wanted to answer two questions. First, what were they optimising for? And, second, what would they do differently depending on the answer to the first question? If they were optimising for cost, then one set of dependencies mattered more than another set. If they were optimising for agility, then they would give one set of tasks more priority than another set. If they were optimising for risk, then they would build their plan this way rather than that way.
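The idea that the plan changes with the answer can be sketched in code. This is not the team's tool: the task names, the scores and the `prioritise` function are hypothetical, chosen only to show how the same backlog is ordered differently depending on what you are optimising for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cost_saving: int     # hypothetical score, 0 to 10
    agility_gain: int    # hypothetical score, 0 to 10
    risk_reduction: int  # hypothetical score, 0 to 10


# Invented backlog for illustration only.
TASKS = [
    Task("retire legacy data centre", cost_saving=9, agility_gain=3, risk_reduction=4),
    Task("launch self-service platform", cost_saving=2, agility_gain=9, risk_reduction=3),
    Task("add multi-region failover", cost_saving=1, agility_gain=4, risk_reduction=9),
]

# Which score matters for each possible answer to the first question.
OBJECTIVES = {
    "cost": "cost_saving",
    "agility": "agility_gain",
    "risk": "risk_reduction",
}


def prioritise(tasks, optimise_for):
    """Order tasks by the score that matches the chosen objective, highest first."""
    score_field = OBJECTIVES[optimise_for]
    return sorted(tasks, key=lambda task: getattr(task, score_field), reverse=True)


plans = {goal: [task.name for task in prioritise(TASKS, goal)] for goal in OBJECTIVES}

for goal, ordered_names in plans.items():
    print(f"optimising for {goal}: {ordered_names}")
```

Each objective puts a different task at the top of the list, which is the second question in miniature: what would you do differently depending on the answer to the first?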