AI Agent Can Now Manage Kubernetes GPU Resources Directly
Source: mesutoezdil.substack.com
GitHub Repo: kagentWithHami
Chinese translation by Jimmy Song, originally published on WeChat
Before We Start
This is not a documentation summary.
Every command you see below was executed by me personally on a Nebius VM. Every output is from that machine.
When things failed, I debugged them. When things worked, I explain why they worked. The errors in this article are real errors; the fixes are fixes I verified myself.
If you run these commands in the same environment, you will get the same results.
Complete repository (all manifests and setup script):
https://github.com/mesutoezdil/kagentWithHami
Scope note: this article covers the core parts. The full installation flow, all manifests, complete troubleshooting guide, and setup script are in the GitHub repository. If you want to reproduce this, start there.
If you haven't worked with HAMi before:
https://medium.com/@mesutoezdil/hami-in-a-real-kubernetes-environment-e8eaa872f388
If you want to see GPU observability tooling tests:
https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero
What This Article Is Actually About
kagent turns AI Agents into Kubernetes resources.
Your system prompt, tools, and model config all exist as CRDs.
You can:
- Version-control them with Git
- Deploy them with Helm
- Inspect them with kubectl
HAMi implements GPU virtualization at the Kubernetes scheduler layer.
One physical NVIDIA L40S becomes 10 virtual GPUs in Kubernetes, with strict VRAM limits enforced at the CUDA Driver level.
Nebius Token Factory is an OpenAI-compatible inference service.
All tests in this article use Llama 3.3 70B.
The question I wanted to answer:
"Can an AI Agent, running inside a Kubernetes cluster, using only open-source models, manage GPU-virtualized workloads?"
The answer is yes.