Dwarkesh Podcast

By: Dwarkesh Patel

About this audio content

Deeply researched interviews

www.dwarkesh.com
Science
Episodes
  • Reiner Pope – The math behind how LLMs are trained and served
    Apr 29 2026

    Did a very different format with Reiner Pope - a blackboard lecture where he walks through how frontier LLMs are trained and served.

    It’s shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.
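
    To give a flavor of the kind of back-of-envelope deduction involved, here is a minimal sketch of how a floor on serving cost falls out of a couple of equations. Every number in it is an illustrative assumption, not a figure from the episode:

    ```python
    # Back-of-envelope: rough floor on the cost of serving one output token.
    # All numbers are illustrative assumptions, not figures from the episode.

    params = 70e9                 # assumed dense model size (parameters)
    flops_per_token = 2 * params  # ~2 FLOPs per parameter per generated token

    gpu_flops = 1e15            # assumed peak accelerator throughput (FLOP/s)
    mfu = 0.4                   # assumed utilization achieved while serving
    gpu_dollars_per_hour = 2.0  # assumed accelerator rental price (USD/hr)

    tokens_per_second = gpu_flops * mfu / flops_per_token
    dollars_per_million_tokens = (
        gpu_dollars_per_hour / 3600 / tokens_per_second * 1e6
    )

    print(f"~{tokens_per_second:,.0f} tokens/s per accelerator")
    print(f"~${dollars_per_million_tokens:.2f} per million output tokens")
    # If a public API charges far above this floor, the gap hints at margin,
    # batching overheads, or a larger/denser model than assumed.
    ```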

    It’s a bit technical, but I encourage you to hang in there – it’s really worth it.

    There are fewer than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner does. It was a real delight to learn from him.

    Recommend watching this one on YouTube so you can see the chalkboard.

    Reiner is CEO of MatX, a new chip startup (full disclosure - I’m an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.

    Download a markdown version of the transcript here to chat with an LLM.

    Wrote up some flashcards and practice problems to help myself retain what Reiner taught. Hope it's helpful to you too!

    Sponsors

    * Jane Street needs constant access to incredibly low-latency compute. I recently asked one of their engineers, Clark, to talk me through how they meet these demands. Our conversation—which touched on everything from FPGAs to liquid cooling—was extremely helpful as I prepped to interview Reiner. You can watch the full discussion and explore Jane Street’s open roles at janestreet.com/dwarkesh

    * Google’s Gemma 4 is the first open model that’s let me shut off the internet and create a fully disconnected “focus machine”. This is because Gemma is small enough to run on my laptop, but powerful enough to actually be useful. So, to prep for this interview, I downloaded Reiner’s scaling book, disconnected from wifi, and used Gemma to help me break down the material. Check it out at goo.gle/Gemma4

    * Cursor helped me turn some notes I took on how gradients flow during large-scale pretraining into a great animation. At first, I wasn’t sure the best way to visualize the concept, but Cursor’s Composer 2 Fast model let me iterate on different ideas almost instantaneously. You can check out the animation in my recent blog post. And if you have something to visualize yourself, go to cursor.com/dwarkesh

    Timestamps

    (00:00:00) – How batch size affects token cost and speed

    (00:32:09) – How MoE models are laid out across GPU racks

    (00:47:12) – How pipeline parallelism spreads model layers across racks

    (01:03:37) – Why Ilya said, “As we now know, pipelining is not wise.”

    (01:18:59) – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal (see the sketch after these timestamps)

    (01:33:02) – Deducing long context memory costs from API pricing

    (02:04:02) – Convergent evolution between neural nets and cryptography
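
    For a sense of scale on the over-training point above: Chinchilla's compute-optimal recipe is roughly 20 training tokens per parameter, so "100x over-trained" would mean on the order of 2,000 tokens per parameter. A minimal sketch, with the model size chosen purely for illustration:

    ```python
    # Chinchilla rule of thumb: compute-optimal training uses ~20 tokens/param.
    # "100x over-trained" then means ~2,000 tokens/param.
    # The 10B model size below is an illustrative assumption.

    params = 10e9
    chinchilla_tokens = 20 * params               # ~2e11 tokens at optimal
    overtrained_tokens = 100 * chinchilla_tokens  # ~2e13 tokens

    train_flops = 6 * params * overtrained_tokens  # ~6*N*D rule of thumb
    print(f"{overtrained_tokens:.1e} tokens -> {train_flops:.1e} training FLOPs")
    ```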



    Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
    2 hr 14 min
  • Jensen Huang – TPU competition, why we should sell chips to China, & Nvidia’s supply chain moat
    Apr 15 2026

    I asked Jensen about TPU competition, Nvidia’s lock on the ever more bottlenecked supply chain needed to make advanced chips, whether we should be selling AI chips to China, why Nvidia doesn’t just become a hyperscaler, how it makes its investments, and much more. Enjoy!

    Watch on YouTube; read the transcript.

    Sponsors

    * Crusoe’s cloud runs on state-of-the-art Blackwell GPUs, with Vera Rubin deployment scheduled for later this year. But hardware is only part of the story—for inference, Crusoe’s MemoryAlloy tech implements a cluster-wide KV cache, delivering up to 10x faster TTFT and 5x better throughput than vLLM. Learn more at crusoe.ai/dwarkesh

    * Cursor helped me build an AI co-researcher over the course of a weekend. Now I have an AI agent that I can collaborate with in Google Docs via inline comment threads! And while other agentic coding tools feel like a total black-box, Cursor let me stay on top of the full implementation. You can try my co-researcher out at github.com/dwarkeshsp/ai_coworker, or get started on your own Cursor project today at cursor.com/dwarkesh

    * Jane Street spent ~20,000 GPU hours training backdoors into 3 different language models, then challenged my audience to find the triggers. They received some clever solutions—like comparing the base and fine-tuned versions and extrapolating any differences to reveal the hidden backdoor (a minimal sketch of this diffing idea follows this list)—but no one was able to solve all 3. So if open problems like this excite you, Jane Street is hiring. Learn more at janestreet.com/dwarkesh
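
    To make the base-vs-fine-tuned comparison concrete, here is a minimal sketch of one version of that diffing idea: rank vocabulary embeddings by how far fine-tuning moved them, on the theory that a planted trigger token gets displaced unusually far. The model names are placeholders, not the actual challenge models:

    ```python
    # Sketch: diff a base model against its fine-tuned version and surface
    # the tokens whose input embeddings moved the most during fine-tuning.
    # "org/base-model" and "org/finetuned-model" are placeholder names.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained("org/base-model")
    tuned = AutoModelForCausalLM.from_pretrained("org/finetuned-model")
    tok = AutoTokenizer.from_pretrained("org/base-model")

    emb_base = base.get_input_embeddings().weight.detach()
    emb_tuned = tuned.get_input_embeddings().weight.detach()

    # L2 shift per vocabulary row; unusually large shifts are suspicious.
    shift = (emb_tuned - emb_base).norm(dim=1)
    top = torch.topk(shift, k=20).indices.tolist()
    print([tok.decode([i]) for i in top])
    ```

    This only catches backdoors that touch the embedding table; triggers hidden in deeper layers would need activation- or logit-level comparisons instead.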

    Timestamps

    (00:00:00) – Is Nvidia’s biggest moat its grip on scarce supply chains?

    (00:16:25) – Will TPUs break Nvidia’s hold on AI compute?

    (00:41:06) – Why doesn’t Nvidia become a hyperscaler?

    (00:57:36) – Should we be selling AI chips to China?

    (01:35:06) – Why doesn’t Nvidia make multiple different chip architectures?



    Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
    1 hr 43 min
  • Michael Nielsen – How science actually progresses
    Apr 7 2026
    Really enjoyed chatting with Michael Nielsen about how we recognize scientific progress.

    It's especially relevant for closing the RL verification loop for scientific discovery. But it's also a surprisingly mysterious and elusive question when you look at the history of human science.

    We approach this question through stories like Einstein (who claimed that he hadn't even heard of the famous Michelson-Morley experiment, which is supposed to have motivated special relativity, until after he had come up with the theory), Darwin (why did it take till 1859 to lay out an idea whose essence every farmer since antiquity must have observed?), Prout (how do you recognize that isotopes exist if you cannot chemically separate them?), and many others.

    The verification loop on scientific ideas is often extremely long and weirdly hostile. Ancient Athenians dismissed Aristarchus's heliocentrism in the 3rd century BC because it would imply that the stars should shift in the sky as the Earth orbits the sun (the stars are in fact so distant that even the largest such shift is under an arcsecond, far too small for pre-telescopic instruments). The first successful measurement of stellar parallax was in 1838. That's a 2,000-year verification loop.

    But clearly human science is able to make progress faster than raw experimental falsification/verification would imply, and in cases where experiments are very ambiguous. How?

    Michael has some very deep and provocative hypotheses about the nature of progress. One I found especially thought-provoking is that aliens will likely have a VERY different science + tech stack than us. This contradicts the common-sense picture of a linear tech tree that I was assuming, and has some interesting implications for how future civilizations might trade and cooperate with each other.

    Watch on YouTube; read the transcript.

    Sponsors

    * Labelbox researchers built a new safety benchmark. Why? Well, current safety benchmarks claim that attacks on top models are successful only a few percent of the time, but the prompts in those benchmarks don't reflect how real bad actors actually write. You can read Labelbox's research here. If this could be useful for your work, reach out at labelbox.com/dwarkesh

    * Mercury has an MCP that lets you give an LLM access to your full transaction history, including things like attached receipts and internal notes. I just used it to categorize my 2025 transactions, and it worked shockingly well. Modern functionality like this is exactly why I use Mercury. Learn more at mercury.com

    * Jane Street's ML engineers presented some of their GPU optimization workflows at GTC, showing how they use CUDA graphs, streams, and custom kernels to shave real time off their training runs. You can watch the full talk here. And they open-sourced all the relevant code here. If this kind of stuff excites you, Jane Street is hiring — learn more at janestreet.com/dwarkesh

    Timestamps

    (00:00:00) – How scientific progress outpaces its verification loops

    (00:17:51) – Newton was the last of the magicians

    (00:23:26) – Why wasn't natural selection obvious much earlier?

    (00:29:52) – Could gradient descent have discovered general relativity?

    (00:50:54) – Why aliens will have a different tech stack than us

    (01:15:26) – Are there infinitely many deep scientific principles left to discover?

    (01:26:25) – What drew Michael to quantum computing so early?

    (01:35:29) – Does science need a new way to assign credit?

    (01:43:57) – Prolificness versus depth

    (01:49:17) – What it takes to actually internalize what you learn

    Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
    2 hr 3 min