AI Through a Platform Engineering Lens

March 30, 2026

#ai #platform #kubernetes #cloud-native

I attended my first KubeCon recently, and one thing became obvious very quickly: AI was everywhere, but most of the interesting conversations were not really about demos. They were about platforms.

KubeCon Europe 2026 visual

KubeCon was the starting point for this reflection: less hype and more platform questions than I expected.

The same questions kept coming back:

How do we expose internal systems safely?
Where should control live once agents can act instead of only suggest?
How do we avoid creating a new wave of shadow IT with a better interface?

That clicked for me because these are not just conference topics. They are questions we increasingly run into at PayFit.

For me, the most useful way to think about AI in this space is through a platform lens.

The platform is no longer only a product for developers. It becomes a product for both developers and agents.

The question is not whether AI changes platform engineering. It already does. The more interesting question is what platform engineering becomes when the platform is used not only by humans, but also by agents.

What KubeCon Made Obvious

For years, platform teams have focused on reducing cognitive load for developers. They built paved roads, self-service workflows, reusable abstractions, and safer defaults. AI does not make any of that less important. If anything, it makes it more important.

Once agents can act instead of only suggest, the platform is no longer just a layer that helps developers move faster. It becomes the runtime, the policy surface, and the trust boundary for automated action.

That is the shift I keep coming back to. AI does not reduce the importance of platform engineering. It widens its scope.

Two Very Different AI Platform Problems

At PayFit, I find it useful to split the topic into two very different subjects.

flowchart LR
    Dev["Developer with assistant"] --> ReadOnly["Read-oriented access"]
    Runtime["Runtime workload"] --> ManagedAI["Managed AI entry point"]
    ReadOnly --> Platform["Platform trust layer"]
    ManagedAI --> Platform
    Platform --> Gateway["Gateway and policy boundary"]
    Platform --> Workflows["Operational workflows"]
    Gateway --> Resources["Internal resources and tools"]
    Workflows --> Resources

1. Personal Developer Usage

This is the case where engineers use assistants to investigate systems, read documentation, inspect configuration, query internal knowledge, or simply find their way through operational context.

In this model, the first need is usually controlled read access to resources. MCP can help here, but existing CLIs and APIs already give us a lot to build on, especially where RBAC is defined and well understood.

That matters because part of the challenge is not inventing a completely new access model. It is making existing access patterns consumable by AI in a way that is structured, observable, and safe.
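As a rough illustration of making an existing access pattern consumable by an assistant, here is a minimal Python sketch that exposes a read-only slice of kubectl. The verb allowlist, the ToolCall type, and the audit format are assumptions for illustration, not an existing tool.

```python
import json
import shlex
from dataclasses import dataclass

# Assumption for illustration: kubectl subcommands treated as read-only.
# This is not an official classification; adjust to your own policy.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain", "api-resources"}


@dataclass
class ToolCall:
    user: str        # human identity behind the assistant
    verb: str        # kubectl subcommand the agent asked for
    args: list[str]  # remaining arguments (resource, namespace, flags)


def build_read_only_command(call: ToolCall) -> list[str]:
    """Return a kubectl argv for an allowed read-only verb, or refuse."""
    if call.verb not in READ_ONLY_VERBS:
        raise PermissionError(f"verb {call.verb!r} is not exposed to agents")
    # The argv is meant to be executed without a shell (e.g. subprocess.run
    # with a list), so the arguments are never interpreted by a shell.
    return ["kubectl", call.verb, *call.args]


def audit_record(call: ToolCall, argv: list[str]) -> str:
    """One structured log line per agent-initiated read, for observability."""
    return json.dumps({"user": call.user, "command": shlex.join(argv)})


call = ToolCall("alice", "get", ["pods", "-n", "payments"])
argv = build_read_only_command(call)
print(audit_record(call, argv))
# {"user": "alice", "command": "kubectl get pods -n payments"}
```

The wrapper narrows the surface rather than replacing RBAC: a "get" still runs with the developer's own credentials, so Kubernetes authorization decides which resources are actually readable, and sensitive reads such as Secrets still need to be denied there.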

2. Runtime Usage

This is a very different problem. Here the question is not how an individual developer interacts with an assistant, but how deployed workloads use AI capabilities as part of their normal behavior.

In our case, this currently relies mostly on Amazon Bedrock. It gives us a managed entry point, but for now it feels limited in terms of integrations and operational flexibility.

That distinction matters because “AI on the platform” is not one single topic. The constraints are different. The operational expectations are different. The control model is different.

Still, in both cases, platform engineering becomes more central, not less.

Where Platform Value Grows

The value is not just in faster suggestions. It is in making AI-driven action reliable enough to actually be useful.

The areas that become more valuable are the ones that turn automation into something observable and governable:

Investigation: agents can gather context across logs, metrics, traces, tickets, and documentation much faster than a human can.
Debugging: the platform can expose safe diagnostic workflows instead of relying on tribal knowledge and shell access.
Self-healing: remediation can be encoded as approved operational paths instead of improvised during an incident.
Agentic platform capabilities: the platform itself becomes a place where agent workflows, tools, and access boundaries are exposed and reused.
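To make the self-healing point concrete, a minimal Python sketch of remediation encoded as approved operational paths might look like this. The alert names, the registry, and the placeholder functions are hypothetical; the point is only the shape: the agent can choose among reviewed paths, but it cannot invent a new one mid-incident.

```python
from typing import Callable

# Hypothetical: alert name -> one reviewed remediation owned by the platform team.
Remediation = Callable[[str], str]
APPROVED_REMEDIATIONS: dict[str, Remediation] = {}


def approved(alert: str) -> Callable[[Remediation], Remediation]:
    """Register a function as the approved remediation for an alert type."""
    def register(fn: Remediation) -> Remediation:
        APPROVED_REMEDIATIONS[alert] = fn
        return fn
    return register


@approved("PodCrashLooping")
def restart_workload(target: str) -> str:
    # Placeholder: a real version would call the cluster API or a workflow.
    return f"rollout restart requested for {target}"


@approved("DiskAlmostFull")
def rotate_logs(target: str) -> str:
    # Placeholder for an approved log-rotation job.
    return f"log rotation requested on {target}"


def remediate(alert: str, target: str) -> str:
    """What the agent is allowed to call: it picks a path, it cannot invent one."""
    fix = APPROVED_REMEDIATIONS.get(alert)
    if fix is None:
        # No approved path: escalate instead of improvising during the incident.
        return f"no approved remediation for {alert}; escalating to on-call"
    return fix(target)


print(remediate("PodCrashLooping", "deployment/api"))
# rollout restart requested for deployment/api
print(remediate("NodeNotReady", "node-7"))
# no approved remediation for NodeNotReady; escalating to on-call
```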

The Bottleneck Moves Upward

AI does not only increase speed. It also moves the bottleneck.

In a more traditional model, the friction often sits in execution. Someone has to find the right system, get the right access, gather the right signals, and apply the right change. With AI, part of that execution chain gets faster.

The bottleneck moves upward, toward:

trust
approval
control
adaptation

To me, that looks much more like a platform problem than an AI problem.
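One way to picture the bottleneck moving toward approval is a small gate between what an agent proposes and what actually runs. This Python sketch assumes a simple two-level risk model and treats appending to a list as a stand-in for execution; a real risk classification would come from each organization's own policies.

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    LOW = "low"    # assumption: safe to run without a human, e.g. reading logs
    HIGH = "high"  # assumption: needs explicit approval, e.g. scaling production


@dataclass
class ProposedAction:
    action_id: str
    description: str
    risk: Risk


@dataclass
class ApprovalGate:
    pending: dict[str, ProposedAction] = field(default_factory=dict)
    executed: list[str] = field(default_factory=list)  # stand-in for actually running
    approvals: dict[str, str] = field(default_factory=dict)  # action_id -> approver

    def submit(self, action: ProposedAction) -> str:
        """Agents propose; only low-risk actions run without a human."""
        if action.risk is Risk.LOW:
            self.executed.append(action.action_id)
            return "executed"
        self.pending[action.action_id] = action
        return "awaiting approval"

    def approve(self, action_id: str, approver: str) -> str:
        """A human decision releases a pending action and is recorded."""
        action = self.pending.pop(action_id)  # raises KeyError if not pending
        self.approvals[action_id] = approver
        self.executed.append(action.action_id)
        return "executed"


gate = ApprovalGate()
print(gate.submit(ProposedAction("a1", "read payment-service logs", Risk.LOW)))  # executed
print(gate.submit(ProposedAction("a2", "scale payment-service", Risk.HIGH)))    # awaiting approval
print(gate.approve("a2", approver="bob"))                                         # executed
```

The agent stays fast on low-risk work, while everything else waits for an explicit, recorded human decision.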

Why This Starts Looking Like a Gateway Problem

If every tool becomes reachable through MCP-like interfaces, then access control cannot live only inside the agent. It also has to live at the platform and gateway layers.

The agent should not be the final policy decision point. It should operate inside a narrower, governed environment defined by the platform.

flowchart TD
    Agent["Agent or assistant"] --> Gateway["Gateway / control plane"]
    Gateway --> Auth["Identity and authorization"]
    Gateway --> Audit["Audit and observability"]
    Gateway --> Toolset["Approved tool surface"]
    Toolset --> Systems["Infra, APIs, docs, SaaS"]

The important question is no longer just “what tools exist?” It becomes:

What is the approved interface for action?
Who can use it?
Under which policy?
With which audit trail?
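These four questions map fairly directly onto a gateway-side decision. The sketch below is plain Python, not kgateway or agentgateway configuration, and the tool names, roles, environments, and in-memory audit log are all hypothetical. It only shows where each question gets answered: the tool registry is the approved interface, the role check is who, the environment rule is the policy, and the audit list is the trail.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical approved tool surface: which roles may call each tool, and in
# which environments. Names are illustrative only.
TOOLS = {
    "read_logs":     {"roles": {"developer", "sre"}, "envs": {"staging", "production"}},
    "restart_pod":   {"roles": {"sre"},              "envs": {"staging", "production"}},
    "run_migration": {"roles": {"sre"},              "envs": {"staging"}},
}

AUDIT_LOG: list[str] = []  # stand-in for a real audit sink


@dataclass(frozen=True)
class Request:
    principal: str  # who: identity established by the gateway, not claimed by the agent
    role: str       # assumed to be resolved from that identity (e.g. group membership)
    tool: str       # what: must be on the approved surface
    env: str        # where: checked against the tool's policy


def authorize(req: Request) -> bool:
    """Gateway-side decision; every outcome is audited, allowed or denied."""
    spec = TOOLS.get(req.tool)
    if spec is None:
        decision = "denied: tool not on approved surface"
    elif req.role not in spec["roles"]:
        decision = "denied: role not permitted"
    elif req.env not in spec["envs"]:
        decision = "denied: environment blocked by policy"
    else:
        decision = "allowed"
    AUDIT_LOG.append(json.dumps({**asdict(req), "decision": decision}))
    return decision == "allowed"


print(authorize(Request("alice", "developer", "read_logs", "production")))    # True
print(authorize(Request("alice", "developer", "restart_pod", "production")))  # False
print(authorize(Request("bob", "sre", "run_migration", "production")))        # False
```

The design choice that matters is that the agent never supplies its own role or its own list of tools: both come from configuration the gateway owns, which is what keeps the agent from being the final policy decision point.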

That is why projects like kagent, kgateway, and agentgateway are interesting to me.

kagent helps on the agentic platform side by giving more structure to how agents and tools are exposed in Kubernetes. kgateway and agentgateway help at the control boundary by making MCP connectivity, authorization, and traffic governance more explicit.

That split feels important to me. One side helps run agents. The other helps constrain and observe what they can actually reach.

The Risk Is Not Only Security. It Is Bypass.

None of this means every platform team suddenly becomes a security team. But it does mean the platform becomes the place where safe operational access gets designed.

And that is where another risk shows up: if platform teams move too slowly, teams will route around them.

I do not think shadow IT disappears in the age of AI. I think it just gets faster. If the official path is too rigid, too slow, or too limited, teams will connect agents directly to SaaS tools, scripts, APIs, or undocumented workflows.

In that world, platform teams cannot rely only on governance arguments. They need to provide alternatives that are fast, usable, and trusted.

That may be one of the harder platform lessons in the AI era:

Control without usability loses adoption.
Standardization without iteration creates avoidance.
Guardrails that are too slow become obstacles instead of enablers.

What I Believe Happens Next

The challenge is not simply to add AI to the platform.

The challenge is to evolve the platform into something that supports both human and agent consumers. That means:

exposing better primitives
structuring operational workflows
defining clearer capability boundaries
reducing the distance between policy and execution

AI does not reduce the need for platform engineering. It raises the bar.

Platform engineering is moving from an abstraction layer toward an operational trust layer. The organizations that do this well will not just attach agents to existing systems. They will redesign their platforms so agents can operate through intentional interfaces instead of accidental access.

RBAC, security, and authorization deserve their own article. But even before going deep on those controls, one thing already feels clear to me: AI is pushing platform engineering into a more central role, not a smaller one.