Episodes

  • #050 - Data Protection and Kubernetes Resilience with Michael Cade & Julia Furst Morgado (Veeam)
    Oct 8 2025

    In this episode, Itiel hosts Veeam experts Julia and Michael, who share their distinct paths into cloud-native technology.

    Julia discusses her transition from a background in law and marketing to becoming a CNCF Ambassador and AWS Container Hero. Michael, a veteran who has been with Veeam for over 10 years, details his traditional sysadmin background (virtualization, storage) and the evolution of this role into platform engineering.

    We explore the critical need for backup and data resilience, which is often the last thing companies consider in cloud-native application deployment. The discussion covers why backup serves as the essential insurance policy against modern threats like ransomware, emphasizing the importance of application consistency and data portability across multiple platforms, including Kubernetes, SaaS, and public cloud services.

    31 min
  • #049 - The AI Translator: Using LLMs & MCP for K8s Operations & Self-Healing Infra with Alexei Ledenev (DoiT)
    Sep 24 2025

    In this episode, Itiel Shwartz kicks off a series on MLOps, LLM, and GenAI in Kubernetes.

    The series starts with Alexei Ledenev, who has over two decades in software development and deep experience in cloud architecture and distributed systems. He shares his journey from CoreOS Fleet to his current role on the Platform Team at DoiT.

    The conversation focuses on tackling the complexity of Kubernetes, which Alexei notes can be overwhelming even for experienced DevOps engineers. He discusses how he came up with the idea of using AI assistants and the Model Context Protocol (MCP) to access and run tools like kubectl. This concept creates a "translator between AI and the Kubernetes environment", allowing users to troubleshoot complex cluster issues or quickly create ad hoc testing environments using natural language.

    They also explore the challenges of implementation, such as hallucination, and how providing context helps the AI self-correct. Looking ahead, Alexei predicts that infrastructure is moving towards self-aware and self-healing platforms that integrate AI deeply.

    25 min
  • #048 - Shaping the Future of Software Development with Idan Gazit (GitHub Next)
    Aug 27 2025

    Meet Idan Gazit from GitHub Next, a team responsible for projects like GitHub Copilot. Gazit, despite jokingly claiming to be "the least knowledgeable about Kubernetes," shares his diverse career journey, spanning from early web development with Perl and Django to his time at Heroku and eventually GitHub. He discusses his team's role in prototyping future software development solutions, emphasizing the importance of identifying and nurturing risky, impactful ideas for developers, even if it means "killing projects" that don't gain traction. Gazit also provides insights into GitHub's evolving focus on AI to enhance developer experiences. He envisions a future where software creation becomes accessible to a broader audience and professional developers are empowered by advanced tools, with a more integrated approach to the entire software development lifecycle.

    Connect with Idan on social: https://gazit.me/

    38 min
  • #047 - Securing the Software Supply Chain and Kubernetes with Dustin Kirkland (Chainguard)
    Aug 5 2025

    Meet Dustin Kirkland, VP of Engineering at Chainguard. Dustin shares his fascinating 26-year journey in the tech industry, from IBM and two stints at Canonical to roles at Google (working on GKE), Apex, and Goldman Sachs, eventually leading him back to engineering at Chainguard.

    Discover how Chainguard is helping secure the software supply chain, focusing on building secure containers primarily for Kubernetes. Learn about the critical problem of software vulnerabilities (CVEs) and how Chainguard's products, including their hardened container images with a zero CVE goal and accelerated patching SLAs, address this challenge.

    Dustin also introduces their newer initiatives: Chainguard Libraries, which secures open source dependencies, and hardened virtual machines (kernels) for Kubernetes worker nodes. He explains Chainguard's automation-driven approach, which allows them to monitor upstream fixes and rapidly rebuild and retest images and libraries.

    33 min
  • #046 - Simulating, Scheduling, and Saving: Optimizing Kubernetes with David Morrison (Applied Research)
    Jun 24 2025

    In this episode, Itiel has an insightful conversation with Dr. David Morrison, a research scientist and founder specializing in Kubernetes scheduling and autoscaling. David shares his journey from operations research to leading distributed systems efforts at tech giants like Yelp and Airbnb. Learn about the transition from Apache Mesos to Kubernetes at Yelp, including the role of their open-source API layer, PaaSTA.

    Discover why David started his own venture, Applied Computing Research Labs, to help companies tackle the challenges of Kubernetes cost optimization and reliability. Get an inside look at SimKube, his project that simulates production Kubernetes clusters for debugging, troubleshooting, and capacity planning.

    David also offers insights into the common pitfalls of low cluster utilization, the complexities of saving costs beyond low-hanging fruit, and the potential future of distributed systems.

    30 min
  • #045 - Beyond Cluster Creation: Mastering Multi-Cluster Kubernetes with Gianluca Mardente (Cisco)
    Jun 10 2025

    Join Itiel as he chats with Gianluca Mardente, a Principal Engineer at Cisco Systems. Gianluca shares his path to tech and Kubernetes, including his work history and the inspiration behind his open-source project, Sveltos. They dive into the significant challenges of managing a large fleet of Kubernetes clusters – ensuring consistency, handling upgrades, and coordinating resources across different clusters.

    Learn how Gianluca's project tackles these 'Day-2 Operations', offering solutions that go beyond simple cluster creation and enable seamless integration of various open-source tools like Crossplane. Check out Sveltos: https://projectsveltos.github.io/sveltos/

    23 min
  • #044 - Scaling Platforms and Pioneering AI Agents with Hasith Kalpage (Outshift by Cisco)
    May 27 2025

    Join us as Hasith Kalpage, Head of Platform Engineering at Outshift by Cisco, shares his journey. Hear about his experience leading the massive WebEx transformation to cloud-native using Kubernetes, including the intense push during the COVID-19 response, where they went from zero to over 50 production clusters in just three months.

    Learn about the challenges and lessons learned from migrating a large enterprise product, like navigating aggressive timelines and overcoming unexpected issues. Hasith also discusses his current role at Outshift, Cisco's incubation engine focused on innovating new ventures, and how they are building a common foundation for diverse teams.

    Discover his vision for the future of platform engineering, highlighting the critical role of AI in managing complexity and enabling self-service, including Outshift's internal "Jarvis" project focused on agentic platform engineering. Learn more about the AGNTCY initiative: https://agntcy.org/

    38 min
  • #043 - Gaming on K8s: Stateful Servers, Low Latency, and an Incredible Infra Journey with Siddharth Dhulipalla (Hathora.dev)
    May 13 2025

    In this episode, Sid, CEO of Hathora, discusses building game infrastructure, specifically for hosting dedicated servers. He shares how Hathora tackles the challenges of running stateful, low-latency, high-throughput workloads that reconcile player actions up to 60 times per second. Sid explains their approach using Kubernetes to manage compute across bare metal and cloud VMs, leveraging technologies like Talos and Sidero's Omni. He also details the technical hurdles they face with Kubernetes networking (like hostPort issues), achieving sub-3-second pod startup times, and optimizing container image pulls, offering insights into the unique demands of the gaming industry and Hathora's growth.

    Sid is the CEO and co-founder of Hathora, the premier tool for studios to globally scale dedicated servers for their multiplayer video games. Sid comes from a deeply technical background, having led infra teams responsible for $100m+ annual cloud budgets. His team at Hathora is working with some of the most anticipated games of '25 and beyond, including Frost Giant’s Stormgate, Omeda’s Predecessor, and 1047 Games’ Splitgate 2.

    Follow Sid on LinkedIn: https://www.linkedin.com/in/dsiddharth/

    Learn more about Hathora at hathora.dev

    29 min