Skip to main content

AI Agent Can Now Manage Kubernetes GPU Resources Directly

· 5 min read

Source: mesutoezdil.substack.com
GitHub Repo: kagentWithHami
Chinese translation by Jimmy Song, originally published on WeChat


Before We Start

This is not a documentation summary.

Every command you see below was executed by me personally on a Nebius VM. Every output is from that machine.

When things failed, I debugged them. When things worked, I explain why they worked. The errors in this article are real errors; the fixes are fixes I verified myself.

If you run these commands in the same environment, you will get the same results.

Complete repository (all manifests and setup script):

https://github.com/mesutoezdil/kagentWithHami

Scope note: this article covers the core parts. The full installation flow, all manifests, complete troubleshooting guide, and setup script are in the GitHub repository. If you want to reproduce this, start there.

If you haven't worked with HAMi before:

https://medium.com/@mesutoezdil/hami-in-a-real-kubernetes-environment-e8eaa872f388

If you want to see GPU observability tooling tests:

https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero

What This Article Is Actually About

kagent turns AI Agents into Kubernetes resources.

Your system prompt, tools, and model config all exist as CRDs.

You can:

  • Version-control them with Git
  • Deploy them with Helm
  • Inspect them with kubectl

HAMi implements GPU virtualization at the Kubernetes scheduler layer.

One physical NVIDIA L40S becomes 10 virtual GPUs in Kubernetes, with strict VRAM limits enforced at the CUDA Driver level.

Nebius Token Factory is an OpenAI-compatible inference service.

All tests in this article use Llama 3.3 70B.

The question I wanted to answer:

"Can an AI Agent, running inside a Kubernetes cluster, using only open-source models, manage GPU-virtualized workloads?"

The answer is yes.

HAMi v2.9.0 Release: Ascend User-Space Partitioning, DRA Generally Available, and Scheduler Ecosystem Expansion

· 12 min read
HAMi Community

The HAMi community is proud to announce the official release of HAMi v2.9.0. This represents a milestone version in terms of heterogeneous device virtualization depth, scheduler ecosystem expansion, and Kubernetes native standards alignment.

v2.9.0 introduces the Ascend 910C HAMi-core mode, HAMi-DRA general availability, and Volcano vGPU upgrade to v0.19, along with systematic enhancements in observability, security, and stability. This release also welcomes 19 new contributors for the first time.

This article provides a detailed overview of the major updates in v2.9.0.

Introducing HAMi WebUI: GPU Monitoring Dashboard for Kubernetes

· 6 min read
HAMi Community

Managing GPU resources in Kubernetes has long been a "blind spot" for operators. You know GPUs are being used, but answering questions like "which node has idle capacity?", "is this workload actually utilizing its allocated GPU?", or "what is the overall cluster utilization trend?" often requires piecing together kubectl get, Prometheus PromQL, and log output.

Today, the HAMi community is introducing HAMi WebUI - an open-source GPU monitoring dashboard that puts your entire GPU cluster into a single, visual interface.

HAMi WebUI v1.1.0 is now available as the first official major release.

Together with the core HAMi scheduler, WebUI completes the full loop: from GPU scheduling to visual observability.

From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice Review

· 5 min read
HAMi Community

KCD Beijing 2026 was one of the largest Kubernetes community events in recent years.

Over 1,000 people registered, setting a new record for KCD Beijing.

The HAMi community not only gave a technical talk but also set up a booth, engaging deeply with developers and enterprise users from the cloud-native and AI infrastructure fields.

The topic of this talk was:

From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice

This article combines the on-site presentation and slides for a more complete technical review. Slides download: GitHub - HAMi-DRA KCD Beijing 2026.

HAMi at KubeCon Europe 2026: Building the GPU Resource Layer in Kubernetes

· 6 min read
HAMi Community

Next week, HAMi will be featured in multiple activities at KubeCon + CloudNativeCon Europe 2026, including Project Pavilion booth, technical sessions, main stage demo, and post-conference AI-related events.

As a CNCF Sandbox project, HAMi focuses on GPU virtualization, sharing, and scheduling, which is increasingly intersecting with AI infrastructure topics in the Kubernetes ecosystem. KubeCon + CloudNativeCon Europe 2026 will be held in Amsterdam from March 23-26, with March 23 as pre-event programming and March 24-26 as the main conference.

HAMi v2.8.0 Release: Full DRA Support and High Availability Scheduling - Towards Standardized GPU Resource Management

· 5 min read
HAMi Community

The HAMi community is proud to announce the official release of HAMi v2.8.0. This represents a milestone version in terms of architectural completeness, scheduling reliability, and ecosystem alignment.

v2.8.0 not only introduces multiple key feature updates but also delivers systematic enhancements in Kubernetes native standard alignment, heterogeneous device support, production readiness, and observability, making HAMi more suitable for AI production clusters that require long-term operation with high stability and clear evolution paths.

This article provides a detailed overview of the major updates in v2.8.0.

Source Code Walkthrough of the GPU Pod Scheduling Process in HAMi

· 34 min read
Maintainer

During the use of HAMi, it is common for Pods to be created and remain in a Pending state, particularly due to the following two issues:

  • Pod UnexpectedAdmissionError
  • Pod Pending

This section provides a rough walkthrough of the related code to explain the interactions between components during scheduling and how resources are calculated. Other details may be omitted.

Introducing HAMi

· 2 min read
HAMi Community

What is HAMi?

HAMi (Heterogeneous AI Computing Virtualization Middleware), formerly known as k8s-vGPU-scheduler, manages heterogeneous AI computing devices within Kubernetes clusters. It enables sharing of various AI devices while enforcing resource isolation between tasks, and provides a unified interface for different device types.

CNCFHAMi is a CNCF Sandbox project