On this episode of Alexa’s Input (AI), I sit down with Emilio Andere, co-founder and CEO of Wafer, to talk about the future of AI infrastructure, inference optimization, and the economics driving the AI compute race.
We discuss:
- why “intelligence per watt” may become one of the defining metrics of the AI era
- the current GPU and accelerator landscape across NVIDIA, AMD, TPUs, and emerging hardware startups
- why software optimization is becoming just as important as hardware itself
- inference optimization strategies
- why AI infrastructure companies are racing up the stack
- what it’s actually like building an AI infrastructure startup today
and more!
Emilio also shares lessons from founding Wafer, thoughts on the future of open-source AI infrastructure, and why he believes optimizing intelligence itself could become one of the most important engineering problems.
General Podcast Links
Watch: https://www.youtube.com/@alexa_griffith
Read: https://alexasinput.substack.com/
Listen: https://creators.spotify.com/pod/profile/alexagriffith/
More: https://linktr.ee/alexagriffith
Learn more about the host at:
Website: https://alexagriffith.com/
LinkedIn: https://www.linkedin.com/in/alexa-griffith/
Find out more about the guest at:
LinkedIn: https://www.linkedin.com/in/emi-andere/
Wafer Website: https://www.wafer.ai/
Wafer AI / Y Combinator Article: https://www.ycombinator.com/companies/wafer
Chapters
00:00 Exploring AI Conversations and Recent Podcasts
02:14 Intelligence per Watt: A New Metric for AI
07:35 The Manifesto: Efficiency in Civilization
12:40 Founding Wafer: The Journey Begins
18:08 The GPU Hardware Landscape and Market Dynamics
23:07 AMD's Growing Presence in the GPU Market
24:07 Emerging Competitors in the AI Hardware Space
26:04 Comparing TPUs and GPUs
27:21 Acquisition and Availability of TPUs
28:33 Navigating the GPU Marketplace
30:05 Understanding Neocloud Economics
33:30 The AI Bubble Debate
36:25 Optimizing AI Models for Performance
44:46 Bottlenecks in AI Model Performance
48:08 Future Directions in AI Hardware Optimization
54:39 Balancing Speed and Cost in AI Performance
56:54 Kernel Arena: Benchmarking AI Performance
01:03:45 Lessons from Founding: Sales and Emotional Resilience
01:07:38 The Future of AI: Trends and Predictions
01:13:03 Outro
Keywords
AI hardware, inference optimization, intelligence per watt, GPU market, AI infrastructure, Wafer, AI bubble, TPU, GPU bottleneck, AI efficiency, AI optimization, large language models, quantization, speculative decoding, benchmarking, model training, AI startups