Couverture de Alexa's Input (AI)

Alexa's Input (AI)

Alexa's Input (AI)

De : Alexa Griffith
Écouter gratuitement

Alexa’s Input is a podcast about how technology actually moves forward. Hosted by Alexa Griffith, it features conversations with engineers, founders, CEOs, and leaders shaping today’s tech landscape. Each episode digs into the decisions behind the systems — what’s being built, what’s being questioned, and why it matters now. Opinions are my own Linktree: https://linktr.ee/alexagriffith Website: https://alexagriffith.com/ LinkedIn: https://www.linkedin.com/in/alexa-griffith/ X: @lexal0uAlexa Griffith
Épisodes
  • David Aronchick on Distributed Data Orchestration with Expanso
    Jun 15 2026
    In this episode of Alexa's Input (AI), I sit down with David Aronchick, co-founder and CEO of Expanso and former product lead for Kubernetes at Google.Data is growing everywhere outside your data center. Solar panels in remote across a country. Security cameras at retail stores. IoT sensors across factory floors. And moving that data to the cloud for processing? It's expensive, slow, and often restricted by compliance.David is an expert when it comes to solving distribution problems. He led Kubernetes product at Google, co-founded Kubeflow to bring ML to production, and now he's building Expanso to tackle a difficult constraint: when your data can't move, how do you process it where it lives?We discuss:- The need for distributed data orchestration-Upstream data control: filtering and transforming at the source- Three forces making edge computing inevitable (physics, regulations, economics)- How to build successful open source infrastructure projects- Customer discovery and finding real pain points- His transition from Protocol Labs to founding Expanso- ETL pipelines: moving the first four steps closer to the data- Context loss and lineage in distributed systems- Processing 400,000 signals per second with 150MB agents- AI observability: attaching source metadata to training data- Running ML pipelines at the edge- Real-world deployment challenges (bandwidth, regulations, cost)Expanso is rethinking how we process data in an AI-native world—moving compute to data instead of data to compute. If you want to understand where distributed systems and edge computing are heading, this is a deep dive into the infrastructure layer beneath modern AI applications.General Podcast LinksWatch: https://www.youtube.com/@alexa_griffith Read: https://alexasinput.substack.com/ Listen: https://creators.spotify.com/pod/profile/alexagriffith/ More: https://linktr.ee/alexagriffithLearn more about the host atWebsite: https://alexagriffith.com/ LinkedIn: https://www.linkedin.com/in/alexa-griffith/Find out more about the guest atLinkedIn: https://www.linkedin.com/in/aronchick/ Twitter/X: https://x.com/aronchick GitHub: https://github.com/aronchick Expanso Website: https://expanso.io/ResourcesExpanso Website: https://expanso.io/ Kubernetes: https://kubernetes.io/ Kubeflow: https://www.kubeflow.org/ CNCF (Cloud Native Computing Foundation): https://www.cncf.io/ Protocol Labs: https://protocol.ai/KeywordsDavid Aronchick, Expanso, Kubernetes, Kubeflow, distributed systems, edge computing, data pipelines, ETL, upstream data control, Google Kubernetes Engine, open source, CNCF, observability, log processing, data lineage, provenance, schema enforcement, IoT, edge AI, distributed data, machine learning infrastructure, Protocol Labs, IPFS, Filecoin, data governance, compliance, GDPR, bandwidth optimization, data aggregation, AI infrastructure, multi-cloud, hybrid cloud, real-time processing
    Afficher plus Afficher moins
    1 h et 18 min
  • How vLLM and llm-d Changed AI Inference with Rob Shaw
    Jun 3 2026
    In this episode of Alexa’s Input (AI), I sat down with Rob Shaw from Red Hat to talk about how AI inference evolved from a simple model serving problem into a large-scale distributed systems problem.We explored the infrastructure shifts behind modern LLM serving, including how vLLM and PagedAttention changed the economics and efficiency of inference, why KV cache management became one of the most important bottlenecks in production AI systems, and how orchestration layers like llm-d are emerging to coordinate distributed inference.We also discuss:how LLM inference differs from traditional model serving runtimesKV cache, prefix caching, and cache-aware routingwhy throughput and latency became major infrastructure challengeslong-context agents and repeated inference callsdistributed inference on Kubernetesintelligent routing, flow control, and load balancingprefill/decode disaggregationenterprise AI deployment realitiesvLLM has become one of the most important open-source projects in AI infrastructure, and llm-d represents a newer shift toward treating inference as a coordinated distributed system rather than just a single runtime problem.If you want to better understand the systems layer beneath modern AI applications, this episode is a deep dive into where inference infrastructure is heading next.General Podcast LinksWatch: ⁠⁠⁠⁠⁠⁠https://www.youtube.com/@alexa_griffith⁠⁠⁠⁠⁠⁠Read: ⁠⁠⁠⁠⁠⁠⁠⁠https://alexasinput.substack.com/⁠⁠⁠⁠⁠⁠⁠⁠Listen:⁠⁠ ⁠⁠https://creators.spotify.com/pod/profile/alexagriffith/⁠⁠⁠⁠More: ⁠⁠⁠⁠⁠⁠https://linktr.ee/alexagriffith⁠⁠⁠⁠⁠⁠Learn more about the host atWebsite: ⁠⁠⁠⁠⁠⁠https://alexagriffith.com/⁠⁠⁠⁠⁠⁠LinkedIn: ⁠⁠⁠⁠⁠⁠https://www.linkedin.com/in/alexa-griffith/⁠⁠⁠⁠⁠⁠Find out more about the guest at:LinkedIn: https://www.linkedin.com/in/robert-shaw-1a01399a/ Red Hat Articles: https://developers.redhat.com/author/robert-shawGithub: https://github.com/robertgshaw2-redhat ResourcesvLLM Website: https://vllm.ai/vLLM GitHub Repository: https://github.com/vllm-project/vllmllm-d Website: https://llm-d.ai/llm-d GitHub Repository - https://github.com/llm-d/llm-d KeywordsAI inference, VLLM, LMD, distributed inference, GPU optimization, open source AI, Kubernetes, multi-cluster deployment, AI infrastructure, enterprise AI AI infrastructure, Kubernetes, model optimization, speculative decoding, mixture of experts, AI deployment, performance tuning, AI systems, neural network scaling Key TopicsEvolution of vLLM and llm-dDistributed inference and routingGPU utilization and performance optimizationOpen source AI infrastructureEnterprise deployment challenges and solutions Standardization in Kubernetes for NIC exposurePerformance optimizations: quantization and speculative decodingMixture of experts architecture and parallelism strategiesFlow control and request scheduling in AI systemsEmerging hardware for AI inference, Cerebras processorReinforcement learning and AI system supportModular architecture of vLLM and ecosystem projects
    Afficher plus Afficher moins
    1 h et 43 min
  • Intelligence Per Watt with Emilio Andere
    May 24 2026

    On this episode of Alexa’s Input (AI), I sit down with Emilio Andere, co-founder and CEO of Wafer, to talk about the future of AI infrastructure, inference optimization, and the economics driving the AI compute race.

    We discuss:

    • why “intelligence per watt” may become one of the defining metrics of the AI era
    • the current GPU and accelerator landscape across NVIDIA, AMD, TPUs, and emerging hardware startups
    • why software optimization is becoming just as important as hardware itself
    • inference optimization strategies
    • why AI infrastructure companies are racing up the stack
    • what it’s actually like building an AI infrastructure startup today

    and more!

    Emilio also shares lessons from founding Wafer, thoughts on the future of open-source AI infrastructure, and why he believes optimizing intelligence itself could become one of the most important engineering problems.


    General Podcast Links

    Watch: ⁠⁠⁠⁠⁠⁠https://www.youtube.com/@alexa_griffith⁠⁠⁠⁠⁠⁠

    Read: ⁠⁠⁠⁠⁠⁠⁠⁠https://alexasinput.substack.com/⁠⁠⁠⁠⁠⁠⁠⁠

    Listen:⁠⁠ ⁠⁠https://creators.spotify.com/pod/profile/alexagriffith/⁠⁠⁠⁠

    More: ⁠⁠⁠⁠⁠⁠https://linktr.ee/alexagriffith⁠⁠⁠⁠⁠⁠


    Learn more about the host at

    Website: ⁠⁠⁠⁠⁠⁠https://alexagriffith.com/⁠⁠⁠⁠⁠⁠

    LinkedIn: ⁠⁠⁠⁠⁠⁠https://www.linkedin.com/in/alexa-griffith/⁠⁠⁠⁠⁠⁠


    Find out more about the guest at:

    LinkedIn: https://www.linkedin.com/in/emi-andere/

    Wafer Website: https://www.wafer.ai/

    Wafer AI / Y Combinator Article: https://www.ycombinator.com/companies/wafer


    Chapters

    00:00 Exploring AI Conversations and Recent Podcasts

    02:14 Intelligence per Watt: A New Metric for AI

    07:35 The Manifesto: Efficiency in Civilization

    12:40 Founding Wafer: The Journey Begins

    18:08 The GPU Hardware Landscape and Market Dynamics

    23:07 AMD's Growing Presence in the GPU Market

    24:07 Emerging Competitors in the AI Hardware Space

    26:04 Comparing TPUs and GPUs

    27:21 Acquisition and Availability of TPUs

    28:33 Navigating the GPU Marketplace

    30:05 Understanding Neo Cloud Economics

    33:30 The AI Bubble Debate

    36:25 Optimizing AI Models for Performance

    44:46 Bottlenecks in AI Model Performance

    48:08 Future Directions in AI Hardware Optimization

    54:39 Balancing Speed and Cost in AI Performance

    56:54 Kernel Arena: Benchmarking AI Performance

    01:03:45 Lessons from Founding: Sales and Emotional Resilience

    01:07:38 The Future of AI: Trends and Predictions

    01:13:03 Outro


    Keywords

    AI hardware, inference optimization, intelligence per watt, GPU market, AI infrastructure, Wafer, AI bubble, TPU, GPU bottleneck, AI efficiency AI optimization, large language models, AI hardware, quantization, speculative decoding, benchmarking, AI infrastructure, model training, AI startups





    Afficher plus Afficher moins
    1 h et 14 min
adbl_web_anon_alc_button_suppression_t1
Aucun commentaire pour le moment