Platform Engineering Playbook Podcast

Episodes

The AI-Cloud Native Symbiosis - How Intelligent Infrastructure is Transforming Platform Engineering

Jan 14 2026

By 2025, 90% of new enterprise applications will be AI-powered and cloud-native. This episode explores the symbiotic relationship between AI and Kubernetes - where AI isn't just another workload, but is fundamentally transforming how we build and operate cloud native platforms. We cover real-world examples like Netflix's predictive scaling achieving 92% accuracy, the emergence of AI-driven observability platforms, and why platform engineers need to evolve from infrastructure operators to AI-infrastructure orchestrators.

In this episode:
- AI transforming the Kubernetes control plane with predictive scheduling
- Netflix's AI-driven traffic management: 92% prediction accuracy, 35% resource reduction
- AI-native observability: anomaly detection on metric relationships, not just metrics
- GPU orchestration: NVIDIA GPU Operator achieving 80%+ utilization vs 30-40% baseline
- Edge AI patterns: federated learning, model distillation, intermittent connectivity
- Skills evolution: Understanding AI workload characteristics without becoming ML experts
- News: Red Hat connects AI to Istio via Kiali MCP Server, AWS CloudWatch adds Apache Iceberg support
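To make "anomaly detection on metric relationships, not just metrics" concrete, here is a minimal sketch. It is not taken from the episode or from any particular observability product. The function name, the sample data, and the choice of a median/MAD score are all assumptions made for illustration. The idea is to watch the ratio between two metrics (CPU per request) rather than either metric on its own.

```python
from statistics import median


def relationship_anomalies(requests_per_s, cpu_cores, threshold=5.0):
    """Return indices where CPU-per-request deviates strongly from its median.

    Hypothetical helper: scores each sample by |ratio - median| / MAD and
    flags samples whose score exceeds `threshold`. Assumes every
    requests_per_s value is positive.
    """
    ratios = [cpu / req for req, cpu in zip(requests_per_s, cpu_cores)]
    mid = median(ratios)
    # Median absolute deviation: a spread estimate that one outlier cannot skew.
    mad = median(abs(r - mid) for r in ratios)
    if mad == 0:
        # Degenerate case: more than half the ratios are identical, so MAD
        # gives no scale. This sketch simply reports nothing.
        return []
    return [i for i, r in enumerate(ratios) if abs(r - mid) / mad > threshold]


# Illustrative data only. Sample 4 uses 1.2 cores, which looks normal on its
# own, but it does so while serving far fewer requests than its neighbours.
requests = [100, 200, 150, 300, 50, 120]
cpu = [1.0, 2.2, 1.4, 3.1, 1.2, 1.3]
flagged = relationship_anomalies(requests, cpu)
```

A static threshold on CPU alone would not flag sample 4, because 1.2 cores sits inside the normal range. Only the relationship to request volume exposes it.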

Perfect for senior platform engineers, SREs, and DevOps engineers looking to understand the convergence of AI and cloud native technologies.

New episodes every week. Subscribe wherever you listen to stay current on platform engineering.

Episode URL: https://platformengineeringplaybook.com/podcasts/00090-ai-cloud-native-symbiosis

Duration: 15 minutes

Hosts: Alex and Jordan

Category: Technology
Subcategory: Software How-To

Keywords: AI, cloud native, Kubernetes, symbiosis, intelligent infrastructure, platform engineering, GPU orchestration, predictive scaling, observability, machine learning, Netflix, edge AI, federated learning

MIT 10 Breakthrough Technologies 2026 - The Platform Engineering Perspective

Jan 13 2026

MIT just released their 10 Breakthrough Technologies for 2026 - and three of them are infrastructure problems that platform engineers are solving right now. This episode explores hyperscale AI data centers consuming 96 GW globally by 2026, vibe coding with 41% of code now AI-generated, and LLM interpretability research from Anthropic. We break down how platform engineers enable these breakthroughs through power-aware scheduling, AI coding guardrails, and new observability patterns for ML systems.

In this episode:
- Hyperscale AI data centers: 96 GW capacity, $600B capex, 100+ kW per rack
- Vibe coding: 92% developer AI adoption, GitHub Copilot at 20M users
- LLM interpretability: Anthropic's sparse autoencoders for debugging AI
- Platform skills needed: power management, GPU orchestration, ML observability
- News: Cloudflare IaC security, AWS CloudWatch Iceberg, SSL certificate dangers

Perfect for senior platform engineers, SREs, and DevOps engineers looking to understand the infrastructure behind 2026's biggest tech breakthroughs.

New episodes every week. Subscribe wherever you listen to stay current on platform engineering.

Episode URL: https://platformengineeringplaybook.com/podcasts/00089-mit-10-breakthrough-technologies-2026

Duration: 21 minutes

Hosts: Alex and Jordan

Category: Technology
Subcategory: Software How-To

Keywords: MIT, breakthrough technologies, 2026, AI, hyperscale, data centers, vibe coding, LLM, interpretability, platform engineering, infrastructure, GPU, Copilot, Cursor

AWS Route 53 Global Resolver - Enterprise DNS Security at the Edge

Jan 12 2026

Every DNS query your hybrid environment makes could be exposing sensitive data. AWS Route 53 Global Resolver, announced at re:Invent 2025, combines anycast routing, encrypted DNS protocols (DoH/DoT), and managed threat filtering in a single service.

In this episode, we cover:
- Anycast DNS architecture routing to nearest of 11 AWS regions
- DoH and DoT encrypted DNS protocol support
- AWS RAM authorization for multi-account private hosted zones
- DNS filtering with managed threat lists
- Implementation patterns for hybrid environments and remote workforces
- Query logging for security visibility and threat hunting
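For readers who have not looked inside DoH, the sketch below shows how a DNS question is packed into wire format and carried in the `dns` query parameter of an HTTPS GET, following the general DoH scheme in RFC 8484. It does not describe the Route 53 Global Resolver API. The endpoint `https://resolver.example/dns-query` and both function names are placeholders chosen for illustration, and nothing is sent over the network.

```python
import base64
import struct


def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query (qtype 1 = A record) in wire format."""
    # Header: ID 0 (RFC 8484 suggests 0 for cache friendliness), flags with
    # only "recursion desired" set, one question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def doh_get_url(endpoint: str, name: str, qtype: int = 1) -> str:
    """Build a DoH GET URL: base64url-encoded query with padding removed."""
    encoded = base64.urlsafe_b64encode(build_dns_query(name, qtype)).rstrip(b"=")
    return f"{endpoint}?dns={encoded.decode('ascii')}"


# Placeholder endpoint, not a real resolver address.
url = doh_get_url("https://resolver.example/dns-query", "www.example.com")
```

The resulting `dns` value for `www.example.com` is `AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB`, the same string used as the GET example in RFC 8484. To an on-path observer the lookup is ordinary HTTPS traffic.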

Plus news on Claude Code creator workflows, UK encryption backdoors, K8s EU hosting costs, PostgreSQL replacing Redis, and Rust ecosystem security.

Links:
- Episode page: https://playbook.platformengineering.org/podcasts/00088-aws-route-53-global-resolver
- AWS Route 53 Global Resolver docs: https://docs.aws.amazon.com/route53/latest/userguide/resolver-global-resolver.html

#AWS #Route53 #DNS #DoH #DoT #HybridCloud #Security #PlatformEngineering #DevOps

20 min

Kubernetes Upcoming Features Deep Dive - Extended Toleration Operators and Mutable PV Node Affinity

Jan 11 2026

There's a Kubernetes cluster out there right now burning ten thousand dollars a month on GPU nodes that sit idle sixty percent of the time. Why? Because the scheduler can't say "only schedule pods on nodes with MORE than four GPUs." It's 2026, and our scheduler still can't count. But that's about to change.

In this episode, we dive deep into two alpha features in Kubernetes 1.35 that represent a fundamental shift in how Kubernetes handles scheduling and storage:

**Extended Toleration Operators (KEP-5471)** - Finally, numeric threshold-based scheduling with taints. New Gt (greater than) and Lt (less than) operators let you express "I can tolerate risk up to 5%" or "schedule me on nodes with at least 4 GPUs."

**Mutable PersistentVolume Node Affinity (KEP-5381)** - Storage topology that adapts to reality. When you migrate volumes between availability zones, you no longer need to recreate pods and PVs - just update the nodeAffinity.
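The threshold idea behind the Gt and Lt toleration operators can be modeled in a few lines. This is a simplified sketch and not the scheduler's implementation. The `Taint` and `Toleration` classes, the `tolerates` function, and the `example.com/gpu-count` taint key are hypothetical. The comparison direction (the node's taint value is compared against the pod's toleration value) is an assumption that should be checked against KEP-5471 before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str  # "Equal", "Exists", "Gt", or "Lt"
    value: str = ""
    effect: str = ""  # empty string matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified matching model; omits details such as empty-key Exists."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator == "Equal":
        return tol.value == taint.value
    if tol.operator in ("Gt", "Lt"):
        try:
            taint_n, tol_n = int(taint.value), int(tol.value)
        except ValueError:
            return False  # non-numeric values never match a numeric operator
        # ASSUMPTION: the taint value is the left-hand side of the comparison.
        return taint_n > tol_n if tol.operator == "Gt" else taint_n < tol_n
    return False


# "Only nodes with more than four GPUs": the pod tolerates the GPU-count taint
# only when the node advertises a count above 4.
more_than_four = Toleration("example.com/gpu-count", "Gt", "4", "NoSchedule")
big_node = Taint("example.com/gpu-count", "8", "NoSchedule")
small_node = Taint("example.com/gpu-count", "2", "NoSchedule")
```

Under this model `tolerates(more_than_four, big_node)` is true and `tolerates(more_than_four, small_node)` is false, so the small node's taint keeps repelling the pod.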

Plus platform engineering news:
- OpenEverest: Percona's database platform goes open governance
- GKE Agent Sandbox: Kernel-level isolation for AI agent code execution
- MongoBleed (CVE-2025-14847): Critical vulnerability with 87,000 exposed servers
- Predictive capacity planning and the shift from reactive to proactive infrastructure

This is Kubernetes evolving from reactive feedback loops to truly predictive infrastructure.

Listen on the web: https://platformengineering.org/podcasts/00087-kubernetes-upcoming-features-deep-dive

41 min

Why Is a 2016 AWS Instance Still the Best Value? (Cloudspecs Research)

Jan 10 2026

New research from TUM reveals uncomfortable truths about cloud hardware stagnation. The paper "Cloudspecs: Cloud Hardware Evolution Through the Looking Glass" shows that the best-performing AWS instance for NVMe I/O per dollar was released in 2016 - and nothing since has come close.

In this episode:
• CIDR 2026 research from Technical University of Munich
• AWS i3 instances from 2016 still beat all newer options for storage price-performance
• CPU gains: 10x cores, but only 2-3x cost-adjusted improvement
• Memory crisis: DRAM capacity per dollar has "effectively flatlined"
• Network is the only bright spot: 10x improvement per dollar
• Interactive tool at cloudspecs.fyi using DuckDB-WASM

News segment covers AI coding tool challenges, Kubernetes updates (Dashboard archived, CoreDNS 1.14), Windows Secure Boot certificate expiration, AWS Lambda .NET 10, Amazon MQ mTLS, MCP criticism, and NVIDIA Rubin announcement.

Episode page: https://platformengineering.org/podcasts/00086-cloudspecs-cloud-hardware-evolution

#PlatformEngineering #CloudComputing #AWS #FinOps #CostOptimization #DevOps

21 min

Iran IPv6 Blackout - When Governments Weaponize Protocol Transitions

Jan 9 2026

The same IPv6 transition your infrastructure team has been procrastinating on is now being weaponized by governments. On January 8, 2026, Iran's IPv6 address space dropped 98.5% while IPv4 remained intact—a surgical strike against mobile users.

In this episode, we break down:
- Why blocking IPv6 specifically targets mobile users (hint: carrier NAT exhaustion)
- The BGP mechanics of protocol-specific blocking
- "Engineered degradation" vs total blackout - the new censorship playbook
- How Starlink terminals are changing the calculus for authoritarian internet control
- What platform engineers need to know: protocol-specific monitoring, Happy Eyeballs testing, dual-stack resilience
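Protocol-specific monitoring means probing IPv4 and IPv6 separately, so that a failure on one family is not hidden by a client silently falling back to the other. The sketch below is one way to do that with the Python standard library. The function names and status strings are illustrative, and a production check would also record latency and run from several vantage points.

```python
import socket


def probe(host, port, family, timeout=3.0):
    """Return True if a TCP connection over the given address family succeeds."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # no address of this family could be resolved
    for fam, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            continue  # try the next address of the same family
    return False


def dual_stack_status(host, port=443, probe_fn=probe):
    """Classify reachability per protocol instead of as one up/down signal."""
    v4 = probe_fn(host, port, socket.AF_INET)
    v6 = probe_fn(host, port, socket.AF_INET6)
    if v4 and v6:
        return "dual-stack ok"
    if v4:
        return "ipv6 degraded"
    if v6:
        return "ipv4 degraded"
    return "unreachable"
```

Calling `dual_stack_status("example.com")` from a monitoring job makes an IPv6-only outage visible as "ipv6 degraded". An ordinary health check that connects over whichever family works first would still report success. The `probe_fn` parameter exists so the classification can be tested without network access.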

Plus news: Kubernetes 1.35 CSI SA tokens, HashiCorp non-human identity, CoreDNS 1.14.0, OpenTelemetry Slack analysis, AWS Route 53 Global Resolver, and kernel bug hide times.

Links:
- Episode page: https://platformengineering.org/podcasts/00085-iran-ipv6-blackout
- Cloudflare Radar Iran: https://radar.cloudflare.com/ir
- RFC 8305 Happy Eyeballs: https://datatracker.ietf.org/doc/html/rfc8305

24 min

Venezuela BGP Anomaly - Deep Technical Analysis

Jan 8 2026

A deep technical dive into the January 2026 Venezuela BGP route leak incident. Was it a cyberattack? The technical evidence says no - and that's actually more concerning.

In this special deep-dive episode (no news segment), Jordan and Alex break down:

- What actually happened on January 2, 2026 with AS8048 (CANTV, Venezuela's state ISP)
- Why 10x AS-path prepending proves this was misconfiguration, not a man-in-the-middle attack
- How BGP valley-free routing works and why Type 1 Hairpin leaks happen
- The pattern of 11 similar leaks from CANTV since December 2025
- Why your multi-region deployment doesn't protect you from BGP anomalies
- RPKI, RFC 9234 OTC, and ASPA - the defenses that exist and why adoption is slow
- Practical steps: Check your providers at isbgpsafeyet.com, deploy ROAs, add BGP monitoring
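The prepending argument is easier to follow with a path in hand. BGP prefers shorter AS paths, so a route whose origin repeats its own AS number many times is deliberately made less attractive, the opposite of what an interception attempt would want. The sketch below finds the longest run of a repeated ASN in a path. The function name is hypothetical and the sample path is invented for illustration: it uses documentation-range ASNs around AS8048 and is not the path observed in the incident.

```python
from itertools import groupby


def longest_prepend(as_path):
    """Return (asn, run_length) for the longest run of one repeated ASN."""
    if not as_path:
        raise ValueError("empty AS path")
    # groupby collapses consecutive duplicates, so each group is one run.
    runs = [(asn, sum(1 for _ in group)) for asn, group in groupby(as_path)]
    return max(runs, key=lambda run: run[1])


# Invented example: two transit ASNs, AS8048 repeated ten times, then an origin.
sample_path = [64500, 64501] + [8048] * 10 + [64502]
asn, run_length = longest_prepend(sample_path)
```

A BGP monitoring job could apply a check like this to received updates and alert when a run exceeds a small threshold, since heavy prepending on an unexpected route is a common sign of a leaked or misapplied routing policy.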

The internet's most critical routing protocol was designed in 1989 when ~160 networks trusted each other. Now 75,000+ autonomous systems operate on that same trust model. Understanding BGP isn't just for network engineers anymore - it's essential context for anyone building on the internet.

Full episode page with transcript and sources: https://platformengineeringplaybook.com/podcasts/00084-venezuela-bgp-anomaly-technical-analysis

#BGP #NetworkSecurity #PlatformEngineering #InternetRouting #RPKI #Kubernetes #DevOps #SRE

28 min

HolmesGPT: AI Root Cause Analysis for Kubernetes

Jan 8 2026
Deep dive into HolmesGPT, the CNCF Sandbox AI agent that revolutionizes cloud-native troubleshooting. This episode covers what it is, its 40+ integrations, the project roadmap, and how to set it up today.

News Segment:

AirFrance-KLM's secure automation platform with Terraform, Vault, and Ansible
AWS ECS tmpfs mounts on Fargate for secure secrets handling
Qwen 30B running on Raspberry Pi - democratizing edge AI
AWS European Sovereign Cloud with independent EU governance

Main Topic - HolmesGPT:

CNCF Sandbox project (accepted October 2025) with 1,600+ GitHub stars
Agentic architecture: creates investigation task lists, queries systems, synthesizes findings
40+ built-in toolsets: Prometheus, Grafana Loki/Tempo, Kubernetes, ArgoCD, DataDog, and more
Privacy-first: bring your own LLM keys, read-only access, respects RBAC
End-to-end automation with AlertManager, PagerDuty, OpsGenie integration
Installation options: pip, Homebrew, Helm, Web UI, K9s plugin

Resources:

HolmesGPT GitHub
HolmesGPT Documentation
Full Transcript

Episode Type: full
Episode Number: 83
Season: 1
Tags: HolmesGPT, CNCF, Kubernetes, root cause analysis, AI ops, troubleshooting, observability, SRE, platform engineering, Robusta, agentic AI
25 min
