Palisade Research Podcast

Épisodes

Do AI Models Lie on Purpose? Scheming, Deception, and Alignment with Marius Hobbhahn of Apollo Research

Jan 17 2026

Marius Hobbhahn is the CEO and co-founder of Apollo Research. Through a joint research project with OpenAI, his team discovered that as models become more capable, they are developing the ability to hide their true reasoning from human oversight.
Jeffrey Ladish, Executive Director of Palisade Research, talks with Marius about this work. They discuss the difference between hallucination and deliberate deception and the urgent challenge of aligning increasingly capable AI systems.
Links:
Marius’ Twitter: https://twitter.com/mariushobbhahn
Apollo Research Twitter: https://twitter.com/apolloaievals
Apollo Research: https://www.apolloresearch.ai
Palisade Research: https://palisaderesearch.org/
Twitter/X: https://x.com/PalisadeAI
Anti-Scheming Project: https://www.antischeming.ai
Research paper “Stress Testing Deliberative Alignment for Anti-Scheming Training”: https://www.arxiv.org/pdf/2509.15541
Blog posts from OpenAI and Apollo: https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/ https://www.apolloresearch.ai/research/stress-testing-deliberative-alignment-for-anti-scheming-training/

Afficher plus Afficher moins

1 h et 25 min

Impossible d'ajouter des articles

Désolé, nous ne sommes pas en mesure d'ajouter l'article car votre panier est déjà plein.

Veuillez réessayer plus tard

Veuillez réessayer plus tard

Échec de l’élimination de la liste d'envies.

Veuillez réessayer plus tard

Impossible de suivre le podcast

Impossible de ne plus suivre le podcast

Écouter gratuitement

Épisodes

Do AI Models Lie on Purpose? Scheming, Deception, and Alignment with Marius Hobbhahn of Apollo Research

Impossible d'ajouter des articles

Échec de l’élimination de la liste d'envies.

Impossible de suivre le podcast

Impossible de ne plus suivre le podcast