Episodes

  • Deadline Day for Autonomous AI Weapons & Mass Surveillance
    Feb 27 2026

    Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing...


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:44 - Deadline Day + Petition
    02:42 - Twist 1: Existing Deal
    03:26 - Twist 2: Existing Policy
    04:21 - Twist 3: Twin Threats
    05:54 - Twist 4: Interesting Objections
    11:32 - Twist 5: Anthropic’s Dropped Policy


    Dario Statement: https://www.anthropic.com/news/statement-department-of-war

    Google/OpenAI Petition: https://notdivided.org/

    Axios on Amodei Rejection: https://www.axios.com/2026/02/26/anthropic-rejects-pentagon-ai-terms

    FT on US Threat: https://www.ft.com/content/11d27612-d6c5-4cf7-94dd-f65603549b7f

    Politico on Latest: https://archive.ph/20260227013117/https://www.politico.com/news/2026/02/26/incoherent-hegseths-anthropic-ultimatum-confounds-ai-policymakers-00800135

    The Verge on Current Deal: https://www.theverge.com/ai-artificial-intelligence/883456/anthropic-pentagon-department-of-defense-negotiations

    Anthropic RSP change: https://www.anthropic.com/news/responsible-scaling-policy-v3

    Time Magazine on RSP: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/

    Agent of Chaos Paper: https://x.com/NatalieShapira/status/2026062499599319526

    AI Agent Reliability Paper: https://arxiv.org/pdf/2602.16666

    My Patreon Video: https://www.patreon.com/posts/real-mystery-ai-151647211

    Patreon Documentary: https://www.patreon.com/posts/our-new-age-of-133960279



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    14 min
  • Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
    Feb 20 2026

    Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!

    https://epoch.ai/ai-explained-datacenters


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    00:30 - Post-training Dominance
    04:00 - ARC-AGI 2 Caveat
    05:54 - Simple Bench Record
    08:22 - Hallucination Caveat
    10:05 - Model Card
    11:12 - Exponential Coming
    12:20 - Amodei on Generalizing
    15:10 - One True Benchmark?
    17:02 - Other Metrics…

    Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf

    Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

    Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy

    Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week

    Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience

    Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
    ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1

    Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442

    METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/

    Talaas Fast: https://chatjimmy.ai/

    Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved

    Metaculus FutureEval: https://www.metaculus.com/futureeval/

    Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    19 min
  • The Two Best AI Models/Enemies Just Got Released Simultaneously
    Feb 6 2026

    The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. A full breakdown of around 250 pages of reports, with just the most interesting moments: from the battle over which is best, to Claude personhood, the surprising misbehaviour of Opus 4.6, and much more.

    https://assemblyai.com/aiexplained

    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:54 - Self-improvement?
    02:44 - Knowledge Work
    05:30 - Overly agentic behaviour
    09:12 - Who Shouldn’t Use Claude Opus
    11:39 - Step-change?
    15:09 - Claude’s ‘Personhood’

    Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869

    Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
    212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
    Claude Code Tip: https://x.com/bcherny/status/2019475897691124107


    GPT Codex 5.3: https://openai.com/index/introducing-gpt-5-3-codex/

    System Card: https://openai.com/index/gpt-5-3-codex-system-card/

    Browse Comp: https://arxiv.org/pdf/2504.12516v1
    Finance Agent: https://www.vals.ai/benchmarks/finance_agent
    Terminal Bench 2: https://arxiv.org/pdf/2601.11868
    Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench

    My X post: https://x.com/AIExplainedYT/status/2016851303436095647

    Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1

    Altman rebuttal: https://x.com/sama/status/2019139174339928189
    https://x.com/sama/status/2019140276246442089

    4% of GitHub: https://x.com/dylan522p/status/2019490550911766763



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    20 min
  • Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown
    Jan 28 2026

    Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and China's trajectory toward AI-enabled totalitarianism. Additionally, Dario Amodei predicts that AI models will increasingly be understood as collections of distinct personas rather than monolithic systems.

    80,000 Hours: https://www.youtube.com/watch?v=B54EQiuO1UU



    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    01:10 - Scaling to software engineers
    06:11 - Permanent Underclass
    10:18 - Totalitarian Nightmares
    16:38 - Collection of Personas

    Essay: https://www.darioamodei.com/essay/the-adolescence-of-technology

    Physics Prediction: https://www.quantamagazine.org/is-particle-physics-dead-dying-or-just-hard-20260126/

    Axios: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    World GDP: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart

    Demis Hassabis Counter: https://www.youtube.com/watch?v=q6fq4_uP7aM

    Karpathy 80%: https://x.com/karpathy/status/2015883857489522876

    Machines of Loving Grace: https://www.darioamodei.com/essay/machines-of-loving-grace

    Anthropic LessWrong: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy#1__In_private__Dario_frequently_said_he_won_t_push_the_frontier_of_AI_capabilities__later__Anthropic_pushed_the_frontier

    Original Constitution: https://www.anthropic.com/news/claudes-constitution

    New Constitution: https://www.anthropic.com/constitution

    Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599

    Societies of Thought, Google DeepMind Paper: https://arxiv.org/pdf/2601.10825

    https://lmcouncil.ai/benchmarks

    https://www.patreon.com/posts/our-new-age-of-133960279



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    22 min
  • Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me:
    Jan 14 2026

    A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo?

    https://matsprogram.org/s26-aie


    Check out my new app! https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:12 - Claude Cowork
    06:48 - Productivity Speed-up + jobs
    09:33 - Comparing Models
    12:00 - Brittle AI Paper

    Cowork Intro: https://x.com/claudeai/thread/2010805682434666759

    'All of it': https://x.com/bcherny/status/2010813886052581538

    'AGI' Claims: https://x.com/deepfates/status/2004994698335879383

    Douglas Interview: https://www.youtube.com/watch?v=TOsNrV3bXtQ&t=2313s

    Job Stats: https://www.oxfordeconomics.com/wp-content/uploads/2026/01/Evidence-of-an-AI-driven-shakeup-of-job-markets-is-patchy.pdf
    Amodei Prediction: https://fortune.com/2025/05/28/anthropic-ceo-warning-ai-job-loss/

    GenAI Traffic: https://x.com/demishassabis/status/2009075877347512545

    Illusion of Insight: https://arxiv.org/pdf/2601.00514
    Entropy Exploration: https://arxiv.org/pdf/2506.14758
    ProRL: https://arxiv.org/pdf/2505.24864

    Genesis Mission: https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
    https://deepmind.google/blog/how-were-supporting-better-tropical-cyclone-prediction-with-ai/


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    18 min
  • What the Freakiness of 2025 in AI Tells Us About 2026
    Dec 23 2025

    It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway, because it has been a very strange time.

    http://matsprogram.org/s26-aie


    My new app! https://lmcouncil.ai


    Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094

    Chapters:
    00:00 - Introduction
    00:34 - Reasoning Models … and limits
    02:54 - A playable world
    03:36 - Realism
    03:50 - AI Slop gone mainstream
    05:03 - DolphinGemma
    05:39 - Public Mood
    07:34 - AI Enlisted
    08:30 - GPT-5
    11:05 - Open Weight not out
    13:00 - METR Breakout
    17:30 - VASA-1
    18:28 - Lateral Productivity
    20:15 - 1 or 1000 benchmarks needed?
    24:54 - Continual Learning + Altman on Superintelligence
    28:08 - Automated Information Discovery ft AlphaEvolve


    Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
    https://www.youtube.com/watch?v=PqVbypvxDto

    Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837

    DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

    METR Time Horizon: https://arxiv.org/pdf/2503.14499
    https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
    https://shash42.substack.com/p/how-to-game-the-metr-plot
    https://x.com/METR_Evals/status/2002203627377574113

    GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems

    https://simple-bench.com/

    AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
    https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan

    Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1

    Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169

    OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ

    AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259

    Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/

    AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
    https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
    Continual Learning: https://abehrouz.github.io/files/NL.pdf

    Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989

    Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/

    Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
    Turing Test: https://x.com/tunguz/status/1907185471211422147

    Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/

    LLM Brainrot: https://arxiv.org/pdf/2510.13928

    Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report

    Emotional Quotient: https://arxiv.org/pdf/2511.08394

    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/


    AI Insiders ($9!): https://www.patreon.com/AIExplained

    33 min
  • Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …
    Dec 19 2025

    The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…

    https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26

    Also, do check out my new app: https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:50 - Results
    02:44 - But… the Flaw
    04:49 - So Benchmarks are fake? No
    07:37 - Spatial Reasoning + Hassabis
    10:06 - Proto-AGI
    12:07 - Minimal AGI
    15:07 - Compute Slowdown
    17:56 - New Data Paradigm

    Gemini 3 Flash: https://deepmind.google/models/gemini/flash/

    Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
    Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
    Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
    Brockman Video: https://x.com/OpenAI/status/2001336514786017417
    Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442

    Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
    Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
    AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience
    https://arxiv.org/pdf/2511.13029


    https://lmcouncil.ai/benchmarks
    https://simple-bench.com/
    https://x.com/scaling01/status/1999620587744813205

    5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf

    OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

    Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018

    OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq

    Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/

    TheInformation Data: https://x.com/theinformation/status/2001421225751351778

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
    Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
    Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

    METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/


    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    20 min
  • GPT 5.2: OpenAI Strikes Back
    Dec 12 2025

    Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.

    https://www.youtube.com/@eightythousandhours



    AI Insiders ($9!): https://www.patreon.com/AIExplained
    https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:55 - Better than Human @ Professional Tasks?
    04:42 - Test time Compute
    07:05 - Benchmark Selection
    09:32 - Simple Results + council comparison
    13:01 - Long Context
    13:52 - Self-Improvement
    15:00 - 10 Years + New Models

    Release Page: https://openai.com/index/introducing-gpt-5-2/

    GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/
    https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    https://lmcouncil.ai/benchmarks

    Charxiv: https://charxiv.github.io/#leaderboard

    GDPval: https://arxiv.org/pdf/2510.04374
    My vid: https://www.youtube.com/watch?v=oK5LxMaROSA

    Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1

    Noam Brown: https://x.com/polynoamial/status/1999189845164667132

    New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq

    10 Years of OpenAI: https://openai.com/index/ten-years/

    GPQA: https://x.com/idavidrein/status/1841265634170278063

    ARC-AGI 1-2: https://arcprize.org/arc-agi/2/

    Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/


    https://lmcouncil.ai

    18 min