Reflections on one year at Google: platforms, products and practices
I celebrated my first Googleversary this week! It’s hard to believe that I have been here a year. I don’t know whether it feels longer or shorter: for me, as for many people, time has passed strangely over the last year.
I’ve learnt many things so far at Google - about culture, about collaboration and about customers. Unsurprisingly, I have also learnt about technology - in particular, to think more deeply about how Google uses technology, and what Cloud means for customers.
A lot has been written about Cloud (including quite a few words by me), but I’ve had a feeling for a while that much of this writing (including my own) does not really capture why Cloud matters so much. Considerations of Cloud often focus on core technology benefits (reduce cost, reduce risk, increase agility) or jump straight to ambitious business goals (innovate, disrupt, create new models), but I think that they often miss a big and important architectural chunk in the middle.
After a year at Google, I can offer a slightly different view. This view is informed by my curiosity about how Google answers one of the most interesting architectural questions: how does an enterprise balance autonomy and uniformity while realising the benefits of both? Most large enterprises struggle to answer this question, and I can confess to managing architectures which seemed to get both parts of the equation wrong, combining stifling procedural control with unmanageable heterogeneity and complexity. (Naturally, part of my job was to change both of these things - that was hard.)
From inside, it is clear that Google is a platform and a product company. Products such as YouTube, Search and Gmail are developed by their own teams who enjoy a high level of autonomy, coupled with a high level of accountability. But these products have to achieve reliability at scale: Google runs products with a billion+ users, and those users rely on those products to be available at all times - the products help them answer questions, communicate with family and friends, and navigate their path in the physical world. This scale and reliability can only be achieved by a global-scale platform - the platform which underpins Google, and which is now available to customers as Google Cloud Platform (GCP).
We might think that traditional enterprises have a similar architecture, with separation between application and infrastructure layers. In real life, though, we know that the infrastructure layers of most enterprises are fragmented and various, and that it is rare for application teams to achieve full autonomy and accountability. Cloud platforms achieve these architectural goals to such a degree as to be different in kind as well as different in scale.
But Google’s architecture does not just depend on platforms and products: despite high levels of autonomy, it is not the case that every team operates differently from every other team. As well as sharing a distinct culture, Google also operates a set of distinct practices: user experience, software engineering and so on, united in formal structures such as career ladders, but also in informal communities, collaboration and ways of working. Perhaps the best-known practice in recent years is that of Site Reliability Engineering (SRE), which has been described in some great books. These should be on your reading list if you haven’t read them - and you should make sure that you read them all the way through (to avoid Chapter One syndrome - the subject of a future blog). The publication of the SRE books demonstrates the importance of practitioners shaping their own practice: being opinionated and vocal about the work that they do - and putting their voice to work to educate others.
We can put all this together in this very simple picture:
This may seem like one of those classic simplistic architecture pictures which describes a future which is impossible to realise: we can’t get there from here. But I think that we have to remember that, at root, all that platforms such as GCP are doing is taking capabilities which have enabled companies such as Google to be successful and making them available to everybody. One thing I have learnt in my first year at Google is that it is possible to get there from here.