Skip to main content

2 posts tagged with "GPU Virtualization"

View All Tags

AI Agent Can Now Manage Kubernetes GPU Resources Directly

· 5 min read

Source: mesutoezdil.substack.com
GitHub Repo: kagentWithHami
Chinese translation by Jimmy Song, originally published on WeChat


Before We Start

This is not a documentation summary.

Every command you see below was executed by me personally on a Nebius VM. Every output is from that machine.

When things failed, I debugged them. When things worked, I explain why they worked. The errors in this article are real errors; the fixes are fixes I verified myself.

If you run these commands in the same environment, you will get the same results.

Complete repository (all manifests and setup script):

https://github.com/mesutoezdil/kagentWithHami

Scope note: this article covers the core parts. The full installation flow, all manifests, complete troubleshooting guide, and setup script are in the GitHub repository. If you want to reproduce this, start there.

If you haven't worked with HAMi before:

https://medium.com/@mesutoezdil/hami-in-a-real-kubernetes-environment-e8eaa872f388

If you want to see GPU observability tooling tests:

https://mesutoezdil.substack.com/p/i-tested-every-feature-of-ingero

What This Article Is Actually About

kagent turns AI Agents into Kubernetes resources.

Your system prompt, tools, and model config all exist as CRDs.

You can:

  • Version-control them with Git
  • Deploy them with Helm
  • Inspect them with kubectl

HAMi implements GPU virtualization at the Kubernetes scheduler layer.

One physical NVIDIA L40S becomes 10 virtual GPUs in Kubernetes, with strict VRAM limits enforced at the CUDA Driver level.

Nebius Token Factory is an OpenAI-compatible inference service.

All tests in this article use Llama 3.3 70B.

The question I wanted to answer:

"Can an AI Agent, running inside a Kubernetes cluster, using only open-source models, manage GPU-virtualized workloads?"

The answer is yes.

Introducing HAMi WebUI: GPU Monitoring Dashboard for Kubernetes

· 6 min read
HAMi Community

Managing GPU resources in Kubernetes has long been a "blind spot" for operators. You know GPUs are being used, but answering questions like "which node has idle capacity?", "is this workload actually utilizing its allocated GPU?", or "what is the overall cluster utilization trend?" often requires piecing together kubectl get, Prometheus PromQL, and log output.

Today, the HAMi community is introducing HAMi WebUI - an open-source GPU monitoring dashboard that puts your entire GPU cluster into a single, visual interface.

HAMi WebUI v1.1.0 is now available as the first official major release.

Together with the core HAMi scheduler, WebUI completes the full loop: from GPU scheduling to visual observability.

CNCFHAMi is a CNCF Sandbox project