Introduction

There are many different ways of structuring modern software delivery organisations. The pattern your organisation uses is probably influenced by the stage of life it is in.

The following stages are gross simplifications and are not intended to be exhaustive. They are here to add some flavour.

Organisation life cycles

Grossly simplified.

Startup

You have a small engineering group who do everything. They are not only writing the software that you want to deliver but also looking after infrastructure (probably as code, because it Just Makes Sense), continuous integration pipelines, continuous deployment pipelines, production service health, and everything else.

Scale up

You now have several groups of engineers. They may still be looking after all of the things mentioned for Startups, but each group is focused on its own part(s) of the codebase.

You might start to see some challenges appearing at this point.

You will probably see some divergence in how the different groups deliver things. People who move from one group to another need to learn the local variant in their new group. It's not the biggest burden, but it adds friction.
Infrastructure-as-code is one of the things that has probably diverged. If you are thinking ahead, you have already set up versioned libraries of infrastructure components that everybody contributes to and shares.
Keeping everybody on the most current version of $Whatever, to avoid compatibility and security issues, is starting to become important.
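To make version drift concrete, here is a minimal sketch in Python of a check that reports which teams lag behind the latest release of a shared infrastructure library. It assumes every team pins the library to a plain MAJOR.MINOR.PATCH version string; the team names, versions, and the `parse_version` and `find_outdated` helpers are all hypothetical. It only reports drift, it does not upgrade anything, and it does not handle pre-release tags such as `-rc1`.

```python
# Hypothetical sketch: find teams pinned to an outdated version of a shared
# infrastructure library. Team names and versions are made up.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a 'MAJOR.MINOR.PATCH' string into a tuple of ints.

    Tuples compare element by element, so (1, 9, 7) < (2, 4, 0),
    which a plain string comparison would not guarantee ("10" < "9").
    """
    return tuple(int(part) for part in version.split("."))


def find_outdated(pins: dict[str, str], latest: str) -> dict[str, str]:
    """Return only the teams whose pinned version is older than `latest`."""
    latest_key = parse_version(latest)
    return {
        team: version
        for team, version in pins.items()
        if parse_version(version) < latest_key
    }


if __name__ == "__main__":
    # Hypothetical pins of a shared module, one entry per team.
    pins = {"payments": "2.3.1", "search": "2.4.0", "onboarding": "1.9.7"}
    for team, version in sorted(find_outdated(pins, "2.4.0").items()):
        print(f"{team} is on {version}; latest is 2.4.0")
```

In practice the pins would come from wherever your teams declare their dependencies, and a report like this is more useful run regularly (for example in CI) than once.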

This is the stage where you might start to think about having teams that specialise in certain areas, take on the burden of maintaining those things for everybody, and reduce the cognitive load on developers.

Warning

Do not fall into the trap of building a team whose role is some variant of "looking after things in Production". Site Reliability Engineering might work for you at some point, but probably not now, and it may never. In my experience, the people who care most about how software runs in Production, and who are best at looking after it, are the people who build it.

Enterprise

Congratulations! You made it through the growing pains, have found many happy customers, and have organised subject matter experts into teams who are there to help other teams.

Whatever stage you are at, this page is all about the group of specialists often called the Platform Engineering Team. Or Cloud Engineering. Or something of that nature.

What is Platform Engineering?

I am going to use this definition from Wikipedia as I think it is a good summary.

Cite

Platform engineering is a software engineering discipline that focuses on building toolchains and self-service workflows for the use of developers. Platform engineering is about creating a shared platform for software engineers using computer code.

Platform engineering uses multiple components to try to be reliable and scalable. These components can include configuration management, infrastructure orchestration, and role-based access control, with deployment management specifically for continuous delivery or continuous deployment.

The discipline has been associated with DevOps and platform as a service practices.

My Platform Engineering Principles

Four things to be aware of here.

This list will evolve over time.
It will never be complete.
You will probably disagree with some of these. Differences of opinion make the world go round and keep us learning.
This list is too long. Principles need to be easy to remember, and at some point I need to whittle this down to a few core principles.

Your cloud provider already offers various flavours of platform-as-a-service. For infrastructure platforms, start with one of them. In other words, you probably don't need Kubernetes and the complexity that it brings. Not yet.
"Platform" is more than infrastructure. It also includes tools, APIs, guard rails, and more.
Do the hard work to make it easy. Absorb as much complexity as you can and make complex things as simple to use as you possibly can.
No surprises, ever.
Document everything.
Be kind. You are the expert in platform delivery. The people using your platform, tools, and APIs are not.
Do things that solve real problems.
Don't make problems by doing things that nobody needs.
Do everything you can to enable your architecture to change through evolution, not revolution.
Architecture changes are the hardest and most complex things that you can do. Avoid them whenever possible. When you have to make one, do it before it becomes critical.
The "bleeding edge" is called that for a reason. If you have to be on it, bake in capacity to deal with the inevitable problems.
You cannot do everything for everybody.
Do not gold plate things. Do the work to cover the 80%. It is probably not worth the effort to cover the much rarer 20%.
Ship frequently and keep iterating.
The work does not stop when you ship something. Now you have to run and maintain it, keep people informed about what you have shipped, and help them learn how to use it.
Never, ever, roll your own security.
Seriously. Don't do it.
Build security into your solution from the start.
Have a single Identity Provider that is not hosted by your cloud provider. (Moving cloud providers is neither easy nor something that you will do frequently, but when you have to, also moving an IDP hosted by your cloud provider adds a lot of difficulty and complexity.)
Be generous with metrics, and with reporting them.
Be frugal with alerts that will wake somebody up at 3am.
Look for off-the-shelf solutions that are already available before building your own.
Stay Aware. Adapt. Change. Kent Beck said this about XP. It applies to Platform Engineering too.
Beck also said "First make the change easy, then make the easy change." Do that, even when making the change easy is difficult.