Generative AI and the duty to understand

Image credit: Shane Rounce via Unsplash

Generative AI worries me.

This is an unusual position for someone like me, who has worked and played with technology for most of their life. It’s impossible not to be intrigued and excited by the sudden appearance of software which is capable of generating convincing text, or creating images far better than I could ever hope to produce.

But, for the last few weeks, as I have been attempting to learn in public about generative AI, I have grown increasingly concerned, not just about its ethical implications, but about the ability of the companies that will put generative AI to work to grasp and respond to those ethical implications.

In those weeks, I have learnt enough to have a rough mental model of how generative AI, particularly Large Language Models (LLMs), works: put simply, these are large-scale statistical models, trained on huge quantities of data, implemented through multi-layered neural network architectures, which predict an acceptable visual or linguistic response to a text input (a prompt).
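To make that mental model a little more concrete, here is a deliberately tiny sketch of statistical prediction. It is not how an LLM is built: it swaps the neural network for simple word-pair counts, and the function names and the toy corpus are my own invention for illustration. It only shows the basic idea that the "response" is whatever the training data makes most likely.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequently observed follower of `word`, or None."""
    followers = model.get(word.lower())
    if not followers:
        # The model has nothing to say about words it never saw in training.
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, standing in for "huge quantities of data".
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # the most common follower of "the" in the corpus
print(predict_next(model, "dog"))  # a word that never appeared in the corpus
```

Even at this scale, the model can only echo the patterns in the text it was given, and it has no answer at all for a word it has never seen.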

Characterising generative AI in these terms allows me to see some of the obvious ethical questions which others have already written about. For example, a model trained on huge quantities of data inherits the biases and weaknesses of that data. The total set of text on the Internet is inherently skewed towards those countries, languages and communities with a high level of Internet usage - before we even consider what some of that text might be saying. Some researchers have attempted to curate datasets to reduce bias, but the curation of datasets is itself an ethical choice.

This characterisation of generative AI also helps me to remember what is not happening. The model does not understand the language it is processing, it does not directly grasp meaning, and there is no true creation occurring. (Or, if there is any creation, it is in the mind of the person creating the prompts.)

I won’t attempt to explore all the risks associated with the use of generative AI, especially as this fantastic paper from Google DeepMind does a much better job.

However, I will consider a risk which is not directly considered in this paper, and attempt to address an audience outside the AI researchers to whom this paper is primarily addressed: the technologists and business leaders who make choices about the use of technology within their companies.

I hope that nobody will object if I point out that enterprise technology is already beset by obscurity, bluff and misunderstanding. If they are honest, most technologists working in large enterprises will admit that they are running software which they don’t fully understand, and which does not perform as expected, but which has become integral to their operations. They will also admit that their decision-making processes are not always objective or rational, but are also shaped by personal preferences, relationships and the respective sales ability of different vendors.

This is not great: it leads to the architectural mish-mash which most companies run. But, for most of the era of traditional software, it has been possible for people who care enough about systems to get to the bottom of how they work: not everyone can code, and even fewer people can code well, but I would argue that the basic logical constructs that make up traditional coding languages can be grasped by most people. We can figure things out.

It is much harder, however, to figure out how AI models work, and particularly hard to figure out how enormous models such as LLMs work. As I wrote last week, there are two layers of obscurity. First, the models themselves are difficult to understand. Second, the skills required to understand how the models are built and operated are concentrated in a small number of people. There is something democratic about traditional code: the nested and deepening specialisms of AI are much less accessible and approachable.

Yet, despite this difficulty in understanding, generative AI has the appearance of usefulness, just as it has the appearance of truth. It is difficult to interact with ChatGPT without thinking about how it could be used in customer support or operations. It is hard to create an image with DALL-E without thinking about how it would look on a T-shirt or a poster. It is inconceivable that these solutions will not find their way into products, and that those products will not find their way into the enterprise. It is already happening.

I do not think that it is possible or desirable to deny ourselves the benefits of these technologies. We should be excited and intrigued by them, and we should put them to work to improve our lives. That is, after all, the point of technology. But I think that, as we put these technologies to work, we must do so responsibly, and that it is urgent for us to figure out what this means. I have often claimed that professional technologists have a duty to explain. This exercise in learning in public has convinced me that we also have a duty to understand: a duty to get beyond hype, excitement and opportunity, to get to grips with the basics of new technologies, and to work out their implications for our companies and all the people they touch. This is especially true when those basics are difficult to understand, and is a particularly important job for my favourite group of professional technologists: technology architects. If we don’t take the trouble to understand, who will?
