KubeCon NA 2025: Scaling GenAI Inference with Cloud-Native Platform Tools

By Srini Penchikala


Published on November 17, 2025 | Vol. 1, Issue No. 1

Summary

At KubeCon + CloudNativeCon North America 2025, Erica Hughberg of Tetrate and Alexa Griffith of Bloomberg presented on the new tools required to build robust Generative AI platforms. Their talk addressed the distinct demands that GenAI workloads, traffic patterns, and infrastructure place on platforms, with a focus on achieving scalable model inference in cloud-native environments.

Why It Matters

This briefing highlights a critical juncture for AI professionals: Generative AI is moving from experimentation into demanding, large-scale production environments. The emphasis on "new tools" for "model inference at scale" at a cloud-native conference like KubeCon signals that, while foundational cloud-native principles remain essential, specialized solutions are emerging to meet GenAI's unique challenges. For MLOps engineers, platform architects, and AI developers, this means recognizing that traditional microservices tooling may not adequately handle the bursty, latency-sensitive, and resource-intensive nature of AI inference. The discussion points to a future where service mesh adaptations, intelligent resource scheduling, optimized data planes, and observability tailored to AI workloads will be paramount. Understanding these evolving infrastructure demands is crucial for building resilient, cost-effective, and performant GenAI applications that deliver on the promise of AI reliably in production.
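As a concrete illustration of the kind of cloud-native inference tooling this discussion points to (the summary does not name specific projects), KServe is one widely used open-source model-serving layer for Kubernetes. A minimal sketch of an InferenceService that bounds replicas and scales on in-flight request concurrency, a common answer to bursty, latency-sensitive inference traffic, might look like the following; the service name, model format, and storage URI are hypothetical:

```yaml
# Hedged sketch: a KServe InferenceService that autoscales on concurrency.
# Names, model format, and storageUri below are illustrative assumptions.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo                      # hypothetical service name
spec:
  predictor:
    minReplicas: 1                    # keep one replica warm to avoid cold starts
    maxReplicas: 8                    # cap scale-out (and cost) under traffic bursts
    scaleMetric: concurrency          # scale on in-flight requests, not just CPU
    scaleTarget: 4                    # target in-flight requests per replica
    model:
      modelFormat:
        name: huggingface             # hypothetical; depends on the serving runtime
      storageUri: s3://models/demo-llm  # hypothetical model location
```

Scaling on concurrency rather than CPU is a common choice for GenAI inference, since GPU-bound request queues saturate well before CPU metrics reflect the load.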
