
The Information Bottleneck

By: Ravid Shwartz-Ziv & Allen Roush

About this audio content

Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research within Generative AI, LLMs, GPUs, and Cloud Systems. © 2025 Ravid Shwartz-Ziv & Allen Roush.

    Episodes
    • EP27: Medical Foundation Models - with Tanishq Abraham (Sophont.AI)
      Mar 2 2026

      Tanishq Abraham, CEO and co-founder of Sophont.ai, joins us to talk about building foundation models specifically for medicine.

      Sophont is trying to be something like an OpenAI or Anthropic but for healthcare - training models across pathology, neuroimaging, and clinical text, to eventually fuse them into one multimodal system. The surprising part: their pathology model trained on 12,000 public slides performs on par with models trained on millions of private ones. Data quality beats data quantity.

      We talk about what actually excites Tanishq, which is not replacing doctors but finding things doctors can't see: AI predicting gene mutations from a tissue slide, or cardiovascular risk from an eye scan.

      We also talk about regulation, and how the picture is less scary than people assume. Text-based clinical decision support can ship without FDA approval. Pharma partnerships offer near-term impact. The five-to-ten-year timeline people fear is really about drug discovery, not all of medical AI.

      Takeaways:

      • The real promise of medical AI is finding hidden signals in existing data, not just automating doctors
      • Small, curated public datasets can rival massive private ones
      • Multimodal fusion is the goal, but you need strong individual encoders first (a minimal sketch follows this list)
      • AI research itself might get automated sooner than biology or chemistry
      • FDA regulation has more flexibility than most people think
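
      To make the encoder-first point concrete, here is a minimal late-fusion sketch in PyTorch: two per-modality encoders (stood in for by linear projections over precomputed embeddings) feeding one small shared head. The dimensions, the two-modality setup, and all names are assumptions for illustration, not Sophont's architecture.

```python
import torch
import torch.nn as nn

# Late fusion: separately (pre)trained per-modality encoders produce
# embeddings; a small shared head combines them. The heavy lifting is in
# the encoders, which is why they have to be strong before fusion helps.
class LateFusion(nn.Module):
    def __init__(self, d_path=512, d_text=768, d_fused=256, n_out=2):
        super().__init__()
        # Stand-ins for frozen pretrained encoders (e.g. pathology, text).
        self.path_proj = nn.Linear(d_path, d_fused)
        self.text_proj = nn.Linear(d_text, d_fused)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(2 * d_fused, n_out))

    def forward(self, path_emb, text_emb):
        z = torch.cat([self.path_proj(path_emb), self.text_proj(text_emb)], dim=-1)
        return self.head(z)

model = LateFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 768))  # batch of 4 cases
print(logits.shape)  # torch.Size([4, 2])
```

      The fusion module adds almost no capacity of its own; if the per-modality embeddings are weak, there is nothing for the head to combine.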

      Timeline

      (00:12) Introduction and guest welcome

      (02:32) Anthropic's ad about ChatGPT ads

      (07:26) xAI merging into SpaceX

      (13:32) Vibe coding one year later

      (17:00) Claude Code and agentic workflows

      (21:52) Can AI automate AI research?

      (26:57) What is medical AI?

      (31:06) Sophont as a frontier medical AI lab

      (33:52) Public vs. private data - 12K slides vs. millions

      (36:43) Domain expertise vs. scaling

      (41:54) Cancer, diabetes, and personal stakes

      (47:52) Classification vs. prediction in medicine

      (50:36) When doctors disagree

      (54:43) Quackery and AI

      (57:15) Uncertainty in medical AI

      (1:03:11) Will AI replace doctors?

      (1:07:24) Self-supervised learning on sleep data

      (1:10:10) Aligning modalities

      (1:13:17) FDA regulation

      (1:22:28) Closing

      Music:

      • "Kid Kodi" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.
      • "Palms Down" - Blue Dot Sessions - via Free Music Archive - CC BY-NC 4.0.

      Changes: trimmed

      About

      The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

      1 hr 26 min
    • EP26: Measuring Intelligence in the Wild - Arena and the Future of AI Evaluation
      Feb 24 2026

      Anastasios Angelopoulos, Co-Founder and CEO of Arena AI (formerly LMArena), joins us to talk about why static benchmarks are failing, how human preference data actually works under the hood, and what it takes to be the "gold standard" of AI evaluation.

      Anastasios sits at a fascinating intersection - a theoretical statistician running the platform that every major lab watches when they release a model. We talk about the messiness of AI-generated code slop (yes, he hides Claude's commits too), then dig into the statistical machinery that powers Arena's leaderboards and why getting evaluation right is harder than most people think.

      We explore why style control is both necessary and philosophically tricky: you can regress away markdown headers and response length, but separating style from substance is a genuinely unsolved causal inference problem. We also get into why users are surprisingly good judges of model quality, how Arena serves as a pre-release testing ground for labs shipping stealth models under codenames, and whether the fragmentation of the AI market (Anthropic going enterprise, OpenAI going consumer, everyone going multimodal) is actually a feature, not a bug. Plus, we discuss the role of rigorous statistics in the age of "just run it again," why structured decoding can hurt model performance, and what Arena's 2026 roadmap looks like.
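
      The "regress away" idea is small enough to sketch. Below is a toy version of style-controlled Bradley-Terry scoring written as plain logistic regression: each battle contributes +1/-1 model indicators plus style-difference covariates. The battle data and the two style features (log response length, markdown-header count) are invented for illustration; this shows the general technique discussed above, not Arena's actual code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each battle: (model_a, model_b, a_won, style_a, style_b),
# where style = [log response length, markdown header count] (toy features).
battles = [
    ("model_x", "model_y", 1, [6.2, 3], [5.1, 0]),
    ("model_y", "model_z", 0, [5.0, 1], [5.9, 4]),
    ("model_x", "model_z", 1, [6.0, 2], [5.8, 1]),
    ("model_y", "model_x", 0, [5.5, 2], [6.1, 3]),
]
models = ["model_x", "model_y", "model_z"]
idx = {m: i for i, m in enumerate(models)}
n_models, n_style = len(models), 2

# Design matrix: P(a beats b) = sigmoid((beta_a - beta_b) + gamma . (s_a - s_b)).
X, y = [], []
for a, b, a_won, sa, sb in battles:
    row = np.zeros(n_models + n_style)
    row[idx[a]], row[idx[b]] = 1.0, -1.0  # Bradley-Terry indicators
    row[n_models:] = np.subtract(sa, sb)  # style-difference covariates
    X.append(row)
    y.append(a_won)

# Default l2 penalty keeps this tiny toy problem well-posed.
clf = LogisticRegression(fit_intercept=False).fit(np.array(X), y)
strengths = clf.coef_[0][:n_models]  # style-controlled model scores
gamma = clf.coef_[0][n_models:]      # how much length/markdown sway votes

print(dict(zip(models, strengths.round(3))))
print("style coefficients:", gamma.round(3))
```

      Ranking by `strengths` instead of raw win rates is what controlling for style means here; the hard, unsolved part is that style and substance are correlated in real responses, so a linear split is only an approximation.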

      Timeline:

      (00:12) Introduction and Anastasios's Background

      (00:55) What Arena Does and Why Static Benchmarks Aren't Enough

      (02:26) Coverage of Use Cases - Is There Enough?

      (04:22) Style Control and the Bradley-Terry Methodology

      (08:35) Can You Actually Separate Style from Substance?

      (10:24) Measuring Slop - And the Anti-Slop Paper Plug

      (11:52) Can Users Judge Factual Correctness?

      (13:31) Tool Use and Agentic Evaluation on Arena

      (14:14) Intermediate Feedback Signals Beyond Final Preference

      (15:30) Tool Calling Accuracy and Code Arena

      (17:42) AI-Generated Code Slop and Hiding Claude's Commits

      (19:49) Do We Need Separate Code Streams for Humans and LLMs?

      (20:01) RL Flywheels and Arena's Preference Data

      (21:16) Focus as a Startup - Being the Evaluation Company

      (22:16) Structured vs. Unconstrained Generation

      (25:00) The Role of Rigorous Statistics in the Age of AI

      (29:23) LLM Sampling Parameters and Evaluation Complexity

      (30:56) Model Versioning and the Frequentist Approach to Fairness

      (32:12) Quantization and Its Effects on Model Quality

      (33:10) Pre-Release Testing and Stealth Models

      (34:23) Transparency - What to Share with the Public vs. Labs

      (36:27) When Winning Models Don't Get Released

      (36:59) Why Users Keep Coming Back to Arena

      (38:19) Market Fragmentation and Arena's Future Value

      (39:37) Custom Evaluation Frameworks for Specific Users

      (40:03) Arena's 2026 Roadmap - Science, Methodology, and New Paradigms

      (42:15) The Economics of Free Inference

      (43:13) Hiring and Closing Thoughts

      Music:

      • "Kid Kodi" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
      • "Palms Down" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
      • Changes: trimmed

      About:

      The Information Bottleneck is hosted by Ravid Shwartz-Ziv and Allen Roush, featuring in-depth conversations with leading AI researchers about the ideas shaping the future of machine learning.

      45 min
    • EP25: Personalization, Data, and the Chaos of Fine-Tuning with Fred Sala (UW-Madison / Snorkel AI)
      Feb 17 2026

      Fred Sala, Assistant Professor at UW-Madison and Chief Scientist at Snorkel AI, joins us to talk about why personalization might be the next frontier for LLMs, why data still matters more than architecture, and how weak supervision refuses to die.

      Fred sits at a rare intersection, building the theory of data-centric AI in academia while shipping it to enterprise clients at Snorkel. We talk about the chaos of OpenClaw (the personal AI assistant that's getting people hacked the old-fashioned way, via open ports), then focus on one of the most important questions: how do you make a model truly yours?

      We dig into why prompting your preferences doesn't scale, why even LoRA might be too expensive for per-user personalization, and why activation steering methods like REFT could be the sweet spot. We also explore self-distillation for continual learning, the unsolved problem of building realistic personas for evaluation, and Fred's take on the data vs. architecture debate (spoiler: data is still undervalued). Plus, we discuss why the internet's "Ouroboros effect" might not doom pre-training as much as people fear, and what happens when models become smarter than the humans who generate their training data.
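
      To make the "sweet spot" concrete, here is a minimal sketch of generic activation steering: one per-user vector added to a single hidden layer at inference time, far cheaper to store and swap than a per-user LoRA. The model (gpt2), the layer choice, and the random vector are all assumptions for illustration; this is the general mechanism, not REFT itself or anything Snorkel ships.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; any causal LM with hooks works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# One d-dimensional vector per user. In practice it would be learned from the
# user's preference data; random here just to keep the sketch self-contained.
user_vector = 0.05 * torch.randn(model.config.hidden_size)

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # shift them by the user's vector and pass everything else through.
    hidden = output[0]
    return (hidden + user_vector.to(hidden.dtype),) + output[1:]

# Hook a middle layer; which layer(s) to steer is itself a research question.
handle = model.transformer.h[6].register_forward_hook(steer)

ids = tok("Draft a reply to my landlord:", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=30, do_sample=False,
                         pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # removing the hook restores the unpersonalized base model
```

      Swapping users means swapping one vector, and un-personalizing means removing a hook, which is part of why this family of methods looks attractive for the per-user regime.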

      Takeaways:

      • Personalization requires ultra-efficient methods - even one LoRA per user is probably too expensive. Activation steering is the promising middle ground.
      • The "pink elephant problem" makes prompt-based personalization fundamentally limited - telling a model what not to do often makes it do it more.
      • Self-distillation can enable on-policy continual learning without expensive RL reward functions, dramatically reducing catastrophic forgetting (a minimal sketch follows this list).
      • Data is still undervalued relative to architecture and compute, especially high-quality post-training data, which is actually improving, not getting worse.
      • Weak supervision principles are alive and well inside modern LLM data pipelines, even if people don't call it that anymore.
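
      And here is the self-distillation takeaway in miniature: sample on-policy from the current model, keep what passes a cheap filter, and take a supervised step on the survivors. The model, prompts, and length-based filter below are stand-ins for illustration, not Fred's actual recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # toy stand-in for the model being continually updated
tok = AutoTokenizer.from_pretrained(name)
tok.pad_token = tok.eos_token
tok.padding_side = "left"  # left-pad so batched generation lines up
model = AutoModelForCausalLM.from_pretrained(name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompts = ["The key idea of continual learning is",
           "A checklist for safe deployments starts with"]

# 1) Sample on-policy completions from the current model.
model.eval()
ids = tok(prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    outs = model.generate(**ids, max_new_tokens=40, do_sample=True, top_p=0.9,
                          pad_token_id=tok.eos_token_id)
texts = tok.batch_decode(outs, skip_special_tokens=True)

# 2) Cheap quality gate standing in for a reward model (here: length only).
kept = [t for t in texts if len(t.split()) > 20]

# 3) One supervised step on the model's own filtered outputs.
if kept:
    model.train()
    batch = tok(kept, return_tensors="pt", padding=True)
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    loss = model(**batch, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"distilled on {len(kept)} self-generated samples, loss={loss.item():.3f}")
```

      Because the training data is drawn from the model's own current distribution, updates stay close to what it already does, which is the intuition behind the reduced catastrophic forgetting.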

      Timeline:

      (00:13) Introduction and Fred's Background

      (00:39) OpenClaw — The Personal AI Assistant Taking Over Macs

      (03:43) Agent Security Risks and the Privacy Problem

      (05:13) Claude Code, Permissions, and Living Dangerously

      (07:47) AI Social Media and Agents Talking to Each Other

      (08:56) AI Persuasion and Competitive Debate

      (09:51) Self-Distillation for Continual Learning

      (12:43) What Does Continual Learning Actually Mean?

      (14:12) Updating Weights on the Fly — A Grand Challenge

      (15:09) The Personalization Problem — Motivation and Use Cases

      (17:41) The Pink Elephant Problem with Prompt-Based Personalization

      (19:58) Taxonomy of Personalization — Preferences vs. Tone vs. Style

      (21:31) Activation Steering, REFT, and Parameter-Efficient Fine-Tuning

      (27:00) Evaluating Personalization — Benchmarks and Personas

      (31:14) Unlearning and Un-Personalization

      (31:51) Cultural Alignment as Group-Level Personalization

      (41:00) Can LLM Personas Replace Surveys and Polling?

      (44:32) Is Continued Pre-Training Still Relevant?

      (46:28) Data vs. Architecture — What Matters More?

      (52:25) Multi-Epoch Training — Is It Over?

      (54:53) What Makes Good Data? Matching Real-World Usage

      (59:23) Decomposing Uncertainty for Better Data Selection

      (1:01:52) Mapping Human Difficulty to Model Difficulty

      (1:04:49) Scaling Small Ideas — From Academic Proof to Frontier Models

      (1:12:01) What Happens When Models Surpass Human Training Data?

      (1:15:24) Closing Thoughts

      Music:

      • "Kid Kodi" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
      • "Palms Down" — Blue Dot Sessions — via Free Music Archive — CC BY-NC 4.0.
      • Changes: trimmed
      1 hr 16 min