AI Research Today

By: Aaron

About this audio content

AI Research Today unpacks the latest advancements in artificial intelligence, one paper at a time. We go beyond abstracts and headlines, walking through architectures, experiments, training details, ablations, failure modes, and the implications for future work. Each episode covers one to three new, impactful research papers in depth, discussed at the level of an industry practitioner or AI researcher. If you want to understand the newest topics in AI research but don't have the time to dig through the papers yourself, this is your solution.

© 2026 AI Research Today
Science
    Episodes
    • Learning to Reason in 13 Parameters
      Feb 16 2026


      Link to arXiv: https://arxiv.org/pdf/2602.04118

      Large language models have recently shown impressive reasoning abilities, often learned through reinforcement learning and low-rank adaptation techniques like LoRA. But these approaches still assume that effective reasoning requires relatively large adaptation layers. This new paper challenges that assumption by asking a provocative question: how small can a reasoning update really be?

      In this episode, we explore Learning to Reason in 13 Parameters, which introduces TinyLoRA, a method that compresses low-rank adapters down to the extreme — in some cases to just a single parameter. Instead of relying on large adaptation matrices, TinyLoRA demonstrates that reasoning behavior can be steered using ultra-minimal parameter updates, dramatically reducing the computational and memory footprint required to teach models new reasoning skills.
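      To make the contrast concrete, here is a minimal PyTorch sketch of a conventional LoRA layer next to a hypothetical ultra-minimal adapter that trains only a handful of scalars. The TinyAdapterLinear class below is our own illustrative assumption about what "a few parameters per layer" could look like, not the parameterization TinyLoRA actually uses; see the paper for the real method.

      import torch
      import torch.nn as nn

      class LoRALinear(nn.Module):
          """Conventional LoRA: y = Wx + (alpha/r) * B(Ax); trains r*(d_in + d_out) parameters."""
          def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
              super().__init__()
              self.base = base
              self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
              self.B = nn.Parameter(torch.zeros(base.out_features, r))
              self.scale = alpha / r

          def forward(self, x):
              return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

      class TinyAdapterLinear(nn.Module):
          """Hypothetical ultra-minimal adapter: k trainable scalars, each gating a
          frozen random rank-1 direction; with k = 1 the layer learns one parameter."""
          def __init__(self, base: nn.Linear, k: int = 1):
              super().__init__()
              self.base = base
              # Frozen random bases, stored as buffers so the optimizer never touches them.
              self.register_buffer("U", torch.randn(k, base.out_features) / base.out_features ** 0.5)
              self.register_buffer("V", torch.randn(k, base.in_features) / base.in_features ** 0.5)
              self.gains = nn.Parameter(torch.zeros(k))  # the only trainable parameters

          def forward(self, x):
              # delta_W = sum_k gains[k] * outer(U[k], V[k]); shape (out_features, in_features)
              delta = torch.einsum("k,ko,ki->oi", self.gains, self.U, self.V)
              return self.base(x) + x @ delta.T

      In this toy framing, wrapping each attention projection with TinyAdapterLinear(k=1) would leave only a few dozen trainable scalars in the entire model, which is roughly the regime the paper's title points at.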

      We break down:

      • Why conventional LoRA and other low-rank adapters hit a parameter-count floor set by the model's dimensionality,
      • How TinyLoRA scales reasoning adapters down to near-zero parameter counts,
      • What this reveals about where reasoning ability actually lives inside neural networks,
      • And why tiny adaptation layers could reshape efficient fine-tuning, on-device intelligence, and rapid deployment.

      The results suggest that reasoning competence may not require massive structural changes — only precisely targeted parameter nudges. This challenges assumptions about scaling, efficiency, and the true complexity of learned reasoning.

      27 min
    • SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search
      Jan 26 2026


      Large Language Models often struggle with complex planning tasks that require exploration, backtracking, and self-correction. Once an LLM commits to an early mistake, its linear chain-of-thought reasoning makes recovery difficult. While search methods like Monte Carlo Tree Search (MCTS) offer a way to explore alternatives, they typically rely on sparse rewards and fail to fully exploit the semantic strengths of language models.

      In this episode, we dive into SPIRAL (Symbolic LLM Planning via Grounded and Reflective Search), a new framework that fundamentally rethinks how planning and search interact in LLM-based agents. Instead of treating MCTS as a brute-force optimizer, SPIRAL embeds a cognitive architecture of three specialized LLM roles directly into the search loop:

      • A Planner proposes creative next actions,
      • A Simulator grounds those actions by predicting realistic outcomes, and
      • A Critic reflects on the results to provide dense, informative reward signals.

      This planner–simulator–critic loop transforms search into a guided, self-correcting reasoning process, allowing agents to recover from mistakes, evaluate alternatives more effectively, and plan with far greater robustness.
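      As a rough illustration of how the three roles could sit inside a single expansion step of the search, here is a short Python sketch. The prompts, helper names, and Node structure are placeholders of our own, not the interfaces used in the paper or the repository.

      from dataclasses import dataclass, field

      @dataclass
      class Node:
          state: str                         # textual description of the current plan state
          children: list = field(default_factory=list)
          value: float = 0.0
          visits: int = 0

      def expand(node: Node, llm_call) -> Node:
          """One guided expansion: propose an action, ground it, then critique the result."""
          # Planner: propose a creative next action given the current state.
          action = llm_call(f"You are a planner. State: {node.state}\nPropose the next action.")
          # Simulator: ground the action by predicting a realistic outcome.
          outcome = llm_call(f"You are a simulator. State: {node.state}\nAction: {action}\n"
                             f"Predict the resulting state.")
          # Critic: reflect on the outcome and return a dense scalar reward in [0, 1].
          score_text = llm_call(f"You are a critic. Assess progress toward the goal after:\n"
                                f"{outcome}\nReply with a single number between 0 and 1.")
          child = Node(state=outcome, value=float(score_text.strip()))  # assumes a parseable reply
          node.children.append(child)
          node.visits += 1
          return child

      A full search would call expand inside a standard MCTS selection and backpropagation loop; the point of the sketch is that every expansion already carries a dense, semantically informed value estimate from the critic rather than waiting for a sparse terminal reward.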

      Paper link: https://arxiv.org/pdf/2512.23167

      Repo: https://github.com/IBM/SPIRAL

      29 min
    • Meta-RL Induces Exploration In Language Agents
      Jan 12 2026


      Episode Paper: https://arxiv.org/pdf/2512.16848


      In this episode, we dive into a cutting-edge AI research breakthrough that tackles one of the biggest challenges in training intelligent agents: how to explore effectively. Standard reinforcement learning (RL) methods help language model agents learn to interact with environments and solve multi-step tasks, but they often struggle when the tasks require active exploration—that is, learning what to try next when the best strategy isn’t obvious from past experience.

      The new paper introduces LaMer, a Meta-Reinforcement Learning (Meta-RL) framework designed to give language agents the ability to learn how to explore. Unlike conventional RL agents that learn a fixed policy, LaMer’s Meta-RL approach encourages agents to flexibly adapt by learning from their own trial-and-error experiences. This means agents can better adapt to novel or more difficult environments without needing massive retraining.
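      To give a feel for what "learning how to explore" means mechanically, here is a schematic meta-training outer loop. Every name in it (sample_task, adapt, evaluate, meta_update) is an assumed placeholder rather than LaMer's actual interface; the paper defines the real objective and update rule.

      def meta_train(agent, sample_task, adapt, evaluate, meta_steps=1000):
          """Schematic meta-RL outer loop: the agent is scored on the return it
          achieves *after* exploring and adapting within each sampled task, so
          the exploration behavior itself is what gets optimized across tasks.

          Assumed placeholders (not interfaces from the paper):
            agent        - adaptable language agent with a .meta_update(signal) method
            sample_task  - draws a fresh environment or task instance
            adapt        - runs a few exploratory episodes and returns that experience
            evaluate     - measures post-adaptation return on the same task
          """
          for _ in range(meta_steps):
              task = sample_task()
              trial_experience = adapt(agent, task)                # trial-and-error exploration
              post_adapt_return = evaluate(agent, task, trial_experience)
              agent.meta_update(post_adapt_return)                 # credit exploration by final return

      Because the meta-update is driven by post-adaptation return rather than per-step reward, strategies that "spend" early steps on informative exploration can still win, which is exactly the behavior a fixed-policy RL agent tends to miss.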

      We’ll explain:

      • Why exploration is critical for long-horizon tasks with delayed or sparse rewards.
      • How Meta-RL shifts the focus from fixed policies to adaptable exploration behavior.
      • What LaMer’s results suggest about learned exploration and generalization in AI systems.

      Whether you’re into reinforcement learning, multi-agent systems, or the future of adaptive AI, this episode breaks down how Meta-RL could help agents think more like explorers—not just pattern followers.

      29 min