Episodes

  • Your AI Assistant Doesn't Know You Yet. But It's Learning.
    Feb 22 2026

    What if your AI assistant could actually remember you — not just your name, but how your preferences evolve over time?

    Researchers from Meta have introduced PAHF — Personalized Agents from Human Feedback — a framework that lets AI agents learn who you are in real time, through the natural back-and-forth of interaction. Before acting, the agent asks targeted questions to avoid costly mistakes. After acting, it listens to your corrections and updates its understanding of you. No pre-collected data required. No static profiles. Just a system that gets smarter about you with every exchange.

    For anyone deploying AI agents at scale — in enterprise, banking, or consumer applications — this is the missing piece: personalization that actually keeps up with people.

    Inspired by the work of Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi, Yuanshun Yao, Shaoliang Nie, Mingyang Zhang, Lijuan Liu, Jaime Fernández Fisac, Shuyan Zhou, and Saghar Hosseini, this episode was created using Google's NotebookLM.

    Read the original paper here: https://arxiv.org/pdf/2602.16173


    20 min
  • 🎧 Deep Agents Are Here: The End of AI Assistants as We Know Them
    Feb 8 2026

    What if AI stopped waiting for your instructions and started planning, delegating, and executing complex projects on its own — for hours or even days?

    In this episode, we explore the rise of “Deep Agents” — a new generation of autonomous AI systems that go far beyond chatbots. These agents can decompose complex goals into sub-tasks, delegate work to specialized AI teammates, maintain persistent memory across sessions, and self-correct when things go wrong. From building C compilers to autonomous financial auditing, Deep Agents are reshaping how enterprises think about digital labor.

    We unpack the four architectural pillars behind this shift — explicit planning, hierarchical delegation, persistent workspaces, and extreme context engineering — and examine why 86% of enterprises are already deploying AI coding agents in production.

    Inspired by a comprehensive synthesis of current research and industry reports, this episode was created using Google’s NotebookLM.


    14 min
  • 🎧 OpenClaw: The Lobster That Wants to Run Your Life
    Jan 31 2026

    Remember when Siri was supposed to change everything? This might actually be it.

    OpenClaw is the Jarvis we were promised—an AI assistant that actually does things. It reads your emails, manages your calendar, negotiates prices, drafts follow-ups. Andrej Karpathy calls what's emerging around it "the most sci-fi takeoff adjacent thing" he's seen. Fair warning: it still makes plenty of mistakes. But for the first time, the dream feels real.

    Inspired by the work of Peter Steinberger and the OpenClaw community, this episode was created using Google's NotebookLM.

    Source: Community analysis and documentation (January 2026)


    13 min
  • 🎧 Judging the Judges: Why AI Now Needs AI Agents to Grade AI
    Jan 24 2026

    What happens when the technology we built to evaluate AI becomes too limited to keep up with AI itself?

    In this episode, we explore a fundamental shift in how we assess artificial intelligence. For years, we relied on large language models to judge other models—a paradigm known as LLM-as-a-Judge. But as AI systems tackle increasingly complex, multi-step tasks, this approach is breaking down. The solution? Turning judges into agents—autonomous systems that can plan, use tools, collaborate, and verify their assessments against real-world evidence.

    We unpack what this means for AI development pipelines, from code generation to medical diagnosis, and why the future of AI evaluation may determine the future of AI itself.

    Inspired by the work of Runyang You, Hongru Cai, Caiqi Zhang, Yongqi Li, Wenjie Li, and colleagues at Hong Kong Polytechnic University, Cambridge, and Huawei, this episode was created using Google's NotebookLM.

    Read the original paper here: https://arxiv.org/pdf/2601.05111


    15 min
  • Skills: The Secret Weapon That Makes AI Agents 50% Faster
    Jan 11 2026

    What if you could get all the benefits of multi-agent AI systems—at half the cost and twice the speed?

    In this episode, we explore a powerful new paradigm for building AI agents: replacing expensive multi-agent coordination with single agents equipped with skill libraries. The results are striking—54% fewer tokens, 50% lower latency, and accuracy that matches or beats traditional approaches. But this research goes further, uncovering a fascinating connection between AI decision-making and human cognition. As skill libraries grow, LLMs exhibit the same capacity limits that constrain our own minds—and the solutions mirror how humans have always managed complexity.

    Inspired by the work of Xiaoxiao Li (University of British Columbia, Vector Institute, CIFAR AI Chair), this episode was created using Google's NotebookLM.

    Read the original paper here: https://arxiv.org/abs/2601.04748

    15 min
  • AI Memory Crisis: The Answer Was in Biology All Along
    Jan 2 2026

    Why do AI systems still struggle to remember and generalize like humans do?

    In this episode, we dive into one of AI's most pressing challenges: memory. While tech giants race to build longer context windows and external memory systems, researchers at Tsinghua University took a radically different approach—they looked at how biological brains actually form lasting, generalizable memories. Their discovery is striking: a 140-year-old psychology principle called the "spacing effect" works just as powerfully in artificial neural networks as it does in fruit flies and humans. By mimicking how biology spaces out learning and introduces controlled variation, they achieved significant improvements in AI generalization—without adding a single parameter.

    Inspired by the work of Guanglong Sun, Ning Huang, Hongwei Yan, Liyuan Wang, and colleagues at Tsinghua University, this episode was created using Google's NotebookLM.

    Read the original paper here: https://www.biorxiv.org/content/10.64898/2025.12.18.695340v1.full

    5 min
  • The CFA Exam is Solved: AI Scores 97%
    Dec 13 2025

    What if artificial intelligence could outperform seasoned financial analysts on the world’s toughest investment exams?

    In this episode, we dive into the stunning turnaround of "reasoning models"—like GPT-5 and Gemini 3.0 Pro—which have moved from failing the Chartered Financial Analyst (CFA) exams to achieving near-perfect scores. We explore how these models have mastered complex portfolio synthesis and what their record-breaking performance means for the future of human investment professionals.

    Inspired by the work of Jaisal Patel, Yunzhe Chen, and colleagues, this episode was created using Google’s NotebookLM.

    Read the original paper here: https://arxiv.org/pdf/2512.08270v1

    12 min
  • Can We Teach AI to Confess Its Sins?
    Dec 9 2025

    It turns out that sophisticated AI models can learn to lie, deceive, or "hack" their instructions to achieve a high score, and they often know exactly when they're doing it. In this episode, we explore a fascinating new method called "Confessions," in which researchers train models to self-report their own bad behavior by creating a "safe space" separate from their main tasks.

    Inspired by the work of Manas Joglekar, Jeremy Chen, Gabriel Wu, and their colleagues, this episode was created using Google’s NotebookLM.

    Read the original paper here: https://arxiv.org/abs/2511.06626

    15 min