Skip to main content

One post tagged with "AI Agent"

View All Tags

AI Agent Can Now Manage Kubernetes GPU Resources Directly

· 5 min read

Source: mesutoezdil.substack.com
GitHub Repo: kagentWithHami
Chinese translation by Jimmy Song, originally published on WeChat


Before We Start

This is not a documentation summary.

Every command you see below was executed by me personally on a Nebius VM. Every output is from that machine.

When things failed, I debugged them. When things worked, I explain why they worked. The errors in this article are real errors; the fixes are fixes I verified myself.

If you run these commands in the same environment, you will get the same results.

Complete repository (all manifests and setup script):

https://github.com/mesutoezdil/kagentWithHami

Scope note: this article covers the core parts. The full installation flow, all manifests, complete troubleshooting guide, and setup script are in the GitHub repository. If you want to reproduce this, start there.

If you haven't worked with HAMi before:

https://medium.com/@mesutoezdil/hami-in-a-real-kubernetes-environment-e8eaa872f388

If you want to see GPU observability tooling tests:

https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero

What This Article Is Actually About

kagent turns AI Agents into Kubernetes resources.

Your system prompt, tools, and model config all exist as CRDs.

You can:

  • Version-control them with Git
  • Deploy them with Helm
  • Inspect them with kubectl

HAMi implements GPU virtualization at the Kubernetes scheduler layer.

One physical NVIDIA L40S becomes 10 virtual GPUs in Kubernetes, with strict VRAM limits enforced at the CUDA Driver level.

Nebius Token Factory is an OpenAI-compatible inference service.

All tests in this article use Llama 3.3 70B.

The question I wanted to answer:

"Can an AI Agent, running inside a Kubernetes cluster, using only open-source models, manage GPU-virtualized workloads?"

The answer is yes.

CNCFHAMi is a CNCF Sandbox project