Episodes

  • Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× — 2026-05-31
    May 31 2026
    ## Short Segments
    SkillNet gives AI agents reusable skills for search, evaluation, and task planning. Today, we're diving into how SkillNet lets agents draw on a large library of skills to tackle complex tasks efficiently. Later, we'll explore Trajectory's work on continual learning with its multi-LoRA training stack, which reports a 2.81× increase in experiment throughput.

    SkillNet offers a practical framework for building skill-augmented AI agents. By setting up a SkillNet client, developers can discover, install, and evaluate AI skills, transforming them into a structured skill graph. This approach allows AI agents to break down complex goals into subtasks, discover relevant skills, and assemble an execution pipeline. With SkillNet, AI systems can accumulate and reuse skills, much like humans do. This addresses the challenge of skill accumulation and transfer, and is intended to make agents more adaptable and efficient across domains and in real-world applications.

    ## Feature Story
    Trajectory's multi-LoRA training stack targets continual learning and reports a 2.81× experiment-throughput gain. Language models typically improve through discontinuous updates, and Trajectory proposes a different approach. In partnership with UC Berkeley Sky Lab and Anyscale, Trajectory has developed a concurrent, multi-LoRA training platform that integrates continual learning into live systems. Traditional training follows a linear lifecycle, where models are trained, deployed, and then updated in large, infrequent batches. This process can lead to significant changes in model behavior, sometimes resulting in unexpected outcomes for users.

    Trajectory's solution aims to replace this cycle with a system that continuously learns from live feedback and production interactions, so that models improve incrementally as users interact with them. The core of the stack is the concurrent training of multiple low-rank adapters (LoRAs). This setup lets models learn from diverse data streams simultaneously, which is the source of the reported increase in experiment throughput. The training code is open source in the NovaSky-AI/SkyRL repository, making the technology accessible to the broader AI community.

    Continual learning is particularly beneficial for applications where models need to adapt quickly to new information. For instance, a coding agent could learn new engineering patterns as developers correct its work, or a support agent could improve its problem-solving skills by handling complex tickets. This approach not only makes AI systems more adaptable but also reduces the time and resources required for model updates.

    Trajectory's multi-LoRA stack is a notable advance in AI training infrastructure. By enabling models to learn continuously, it addresses a major barrier to more responsive and personalized AI systems. As AI continues to evolve, the ability to integrate continual learning into live systems will be important for developing more adaptable models.
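
For listeners who want the idea behind low-rank adapters in symbols, the standard LoRA formulation is sketched below. The index i, which marks one of several adapters trained side by side over a shared frozen weight matrix, is an assumption made here for illustration; the episode does not describe how Trajectory's stack arranges its adapters internally.

```latex
h = W_0 x + \frac{\alpha}{r} B_i A_i x,
\qquad W_0 \in \mathbb{R}^{d \times k} \text{ (frozen)},
\quad B_i \in \mathbb{R}^{d \times r},
\quad A_i \in \mathbb{R}^{r \times k},
\quad r \ll \min(d, k)
```

Only the small matrices A_i and B_i receive gradient updates, so each extra adapter adds r(d + k) trainable parameters rather than the d·k of the full weight matrix. That is why several adapters can plausibly share one copy of the base model.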
    4 min
  • Hermes Agent Ships Tool Search for MCP: Anthropic Evals Show 49% to 74% Accuracy Gain on Opus 4 — 2026-05-30
    May 30 2026
    ## Short Segments
    Genesis AI's new platform, Genesis World 1.0, cuts robotics evaluation time from days to minutes. Today, we'll explore how this accelerates model development, and later, we'll dive into Hermes Agent's Tool Search feature, where Anthropic's evaluations show accuracy rising from 49% to 74% on Opus 4. But first, let's look at Genesis World 1.0's impact on robotics.

    Genesis AI has launched Genesis World 1.0, a comprehensive simulation platform for robotics model evaluation. The platform includes a physics engine, a real-time renderer called Nyx, a Python-to-GPU compiler named Quadrants, and a simulation interface. By addressing the bottleneck of slow model evaluation cycles, Genesis World 1.0 allows developers to run evaluations in under 0.5 hours, compared with the 200 hours required for real-world testing. This reduction is achieved without human intervention or hardware, ensuring consistent results across runs. The platform's focus on evaluation rather than training data generation helps avoid overfitting to simulator dynamics, so that measured gains reflect genuine model improvements. For robotics teams, this means faster iteration and more reliable model assessments.

    AgentTrove offers a new way to handle massive datasets of agent interactions, streaming 1.7 million traces for efficient analysis. This tutorial guides users through working with AgentTrove, one of the largest open-source collections of agentic interaction traces. Instead of downloading the entire dataset, users can stream data to inspect rows, normalize agent turns, and understand message structures. Utilities are provided to parse command-style outputs, render trajectories, and analyze agent-tool interactions across tasks. The workflow includes sampling traces, converting them into DataFrames, summarizing statistics, and exporting successful traces into a ShareGPT-style JSONL format for supervised fine-tuning. This approach allows developers to manage and analyze large datasets efficiently and to fine-tune AI models with real-world interaction data.

    ## Feature Story
    Hermes Agent's new Tool Search feature improves accuracy by dynamically selecting relevant tools. Nous Research has introduced this feature to tackle the problem of MCP tools overwhelming AI context windows. When multiple MCP servers are connected, every tool's JSON schema is sent to the model on each turn, even if only a few tools are needed. This leads to bloated context windows, with deployments showing average prompt sizes of 45,000 tokens per turn, half of which is tool schema overhead. Anthropic's data highlights that tool definitions can consume up to 134,000 tokens, creating cost and accuracy issues. Cache-miss generations can cost up to $0.10 per turn, and decision paralysis occurs when models face hundreds of irrelevant tool options.

    Hermes Agent's Tool Search addresses these issues by dynamically retrieving only the necessary tools, reducing token overhead and improving decision-making accuracy. Anthropic's evaluations show accuracy rising from 49% to 74% on Opus 4, which points to the effectiveness of the approach. This allows AI systems to operate more efficiently and cost-effectively, with smaller context windows and improved task performance. As AI deployments grow, the ability to manage tool selection dynamically will be important for maintaining system efficiency and accuracy. Looking ahead, the integration of Tool Search into AI workflows could set a new standard for managing complex tool ecosystems.
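
To make the idea of retrieving only the relevant tools concrete, here is a minimal sketch in Python. It is not Hermes Agent's or Anthropic's implementation: the tool registry, the tool names, the stopword list, and the word-overlap scoring are all assumptions chosen for illustration. The point is simply that a search step can hand the model a few schemas instead of every schema on every turn.

```python
import re

# Hypothetical tool registry; names and descriptions are invented for illustration.
TOOLS = [
    {"name": "github_create_issue", "description": "Create a new issue in a GitHub repository"},
    {"name": "slack_post_message", "description": "Post a message to a Slack channel"},
    {"name": "calendar_list_events", "description": "List upcoming events from a calendar"},
    {"name": "sql_run_query", "description": "Run a read-only SQL query against a database"},
]

# Small, assumed stopword list so filler words do not match every tool.
STOPWORDS = {"a", "an", "the", "to", "in", "from", "of"}


def tokenize(text):
    """Lowercase the text and return its alphanumeric words, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(query, tools=TOOLS, k=2):
    """Return up to k tools whose name or description shares words with the query."""
    query_words = tokenize(query)
    scored = []
    for tool in tools:
        tool_words = tokenize(tool["name"] + " " + tool["description"])
        overlap = len(query_words & tool_words)
        if overlap > 0:
            scored.append((overlap, tool))
    # Highest overlap first; list.sort is stable, so ties keep registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:k]]


if __name__ == "__main__":
    for tool in search_tools("create a github issue"):
        print(tool["name"])
```

A production system would more likely rank tools with embeddings or BM25 than with raw word overlap, but the shape is the same: search first, then send only the matching schemas to the model.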
    4 min
  • Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights — 2026-05-29
    May 29 2026
    ## Short Segments
    GPU communication bottlenecks are getting a major overhaul with the release of mKernel, a new library from UC Berkeley's UCCL project. This development promises to cut down on the significant overhead that GPU communication imposes on AI workloads. Coming up, we'll dive into Hexo Labs' open-source release of SIA, a self-improving AI framework that could change how AI agents evolve. Now, let's explore mKernel's impact.

    The library fuses intra-node NVLink communication, inter-node RDMA, and compute into a single kernel, addressing the inefficiencies of host-driven communication. Traditional methods rely on CPUs to manage GPU communication, which can lead to pipeline bubbles and inefficient overlap of compute and communication. mKernel's approach integrates these processes, potentially reducing execution time by up to 47% in Mixture-of-Experts models. This could improve the performance of AI systems by minimizing communication delays and maximizing GPU utilization.

    ## Feature Story
    Hexo Labs has open-sourced SIA, a self-improving AI framework that updates both the harness and the model weights. Unlike traditional AI agents that require human intervention for improvements, SIA operates autonomously, continuously refining its performance. The release, under an MIT license, lets developers experiment with and extend the framework.

    SIA's architecture divides a task-specific agent into two components: the harness, which includes system prompts and tool-dispatch logic, and the model weights. The framework employs three LLM components to drive its self-improvement loop. A Meta-Agent constructs the initial scaffold from task specifications, while a Task-Specific Agent executes the task and logs its process. The Feedback-Agent then reviews this trajectory to determine necessary changes.

    The decision-making process is pivotal. After each task execution, the Feedback-Agent can either modify the scaffold while keeping the weights constant or update the weights while maintaining the scaffold. This dual-update capability is what sets SIA apart, allowing it to adapt both its structure and its learned parameters.

    SIA uses the openai/gpt-oss-120b model as its base, with weight updates applied through LoRA, a low-rank adapter. The Meta-Agent and Feedback-Agent operate on Claude Sonnet 4.6, and training is conducted on H100 GPUs via Modal, Hexo Labs' reinforcement learning platform. The framework offers two operational modes: SIA-H, which focuses solely on harness updates, and SIA-W+H, which incorporates weight updates as well.

    Hexo Labs claims that SIA can accelerate the path to superintelligence by 350 times, a bold assertion that has drawn both attention and skepticism. While the potential for such rapid advancement is intriguing, experts urge caution and thorough evaluation of these claims. The open-source nature of SIA allows for community-driven exploration and validation, which could either substantiate or challenge Hexo Labs' projections.

    This release comes at a time when major labs and startups are increasingly focusing on autonomous agent frameworks. SIA's ability to iteratively improve without human intervention positions it as a potentially transformative tool in the AI landscape. As developers and researchers begin to experiment with SIA, the framework's real-world impact, potential, and limitations will become clearer.
    4 min
  • Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face — 2026-05-28
    May 28 2026
    ## Short Segments
    Perplexity AI's new Unigram tokenizer cuts latency by 5x, while Sakana AI's DiffusionBlocks offers a fresh take on neural network training. Later, we'll dive into how Perplexity's open-source release could reshape tokenization in AI workflows. First, let's explore Sakana AI's approach to training deep networks.

    Sakana AI introduces DiffusionBlocks, a framework for training neural networks block by block. This approach significantly reduces memory requirements, addressing a major bottleneck in deep learning. Traditional end-to-end backpropagation demands storing intermediate activations across all layers, leading to high memory consumption as models deepen. DiffusionBlocks tackles this by partitioning networks into independently trainable blocks, cutting memory usage by a factor of B, where B is the number of blocks. The method maintains performance across various architectures, unlike previous techniques that often underperform. By treating the network's forward pass as a diffusion-like denoising process, DiffusionBlocks offers a promising alternative to conventional training methods. For developers, this means more memory-efficient training of complex models without sacrificing performance.

    Implementing a pgvector-powered vector search system in PostgreSQL is now more accessible. A new coding guide demonstrates how to build a complete pgvector playground in Google Colab, showcasing PostgreSQL's capabilities as a vector database for AI applications. The tutorial covers installing PostgreSQL, compiling the pgvector extension, and integrating with Python via Psycopg. It also explores creating embeddings with SentenceTransformers, building HNSW indexes, and running various search types, including semantic and hybrid searches. This workflow highlights pgvector's support for retrieval-augmented generation, recommendation, and similarity search systems using open-source tools. For developers, the guide offers a practical path to using PostgreSQL for AI-driven search.

    ## Feature Story
    Perplexity AI has open-sourced a Unigram tokenizer aimed at tokenization efficiency in AI workflows. Rebuilt from scratch in Rust, the tokenizer achieves a 5x reduction in p50 latency compared to the Hugging Face tokenizers crate, and significantly outperforms other popular tokenizers like SentencePiece and IREE's tokenizer. By eliminating steady-state heap allocations, it reduces CPU utilization in Perplexity's inference stack by 5-6x, shaving milliseconds off reranker latency.

    This development addresses a critical bottleneck in AI processing, where tokenization can become a significant fraction of total request latency, especially in smaller models like rerankers and embedders. These models, often used for ranking, retrieval, and similarity tasks, require efficient tokenization to maximize performance. The Unigram tokenizer targets XLM-RoBERTa's 250K-token vocabulary, a common choice in production environments. It produces the same tokens as the reference implementation without rebuilding strings or chasing hash maps.

    For AI developers and researchers, this open-source release provides a tool to make language model inference more efficient, potentially reducing costs and improving response times. As tokenization efficiency becomes increasingly important in AI workflows, Perplexity's contribution could set a new bar for performance and resource utilization, particularly in applications where latency and resource constraints are critical factors.
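
For context on what a Unigram tokenizer computes, the sketch below shows the textbook decoding step: pick the segmentation whose pieces have the highest total log-probability, found with a Viterbi pass. This is not Perplexity's Rust implementation and says nothing about how it achieves its latency numbers. The vocabulary and probabilities are made up, and `viterbi_segment` is a hypothetical helper name.

```python
import math

# Toy vocabulary with made-up probabilities; a real Unigram model learns these.
VOCAB = {
    "uni": math.log(0.07),
    "un": math.log(0.10),
    "gram": math.log(0.08),
    "i": math.log(0.05),
    "u": math.log(0.02),
    "n": math.log(0.02),
    "g": math.log(0.02),
    "r": math.log(0.02),
    "a": math.log(0.03),
    "m": math.log(0.02),
}


def viterbi_segment(text, vocab, max_piece_len=8):
    """Return the highest-scoring split of text into vocabulary pieces."""
    n = len(text)
    best_score = [-math.inf] * (n + 1)  # best log-prob of any split of text[:end]
    best_start = [0] * (n + 1)          # back-pointer: where the last piece begins
    best_score[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_piece_len), end):
            piece = text[start:end]
            if piece in vocab and best_score[start] > -math.inf:
                score = best_score[start] + vocab[piece]
                if score > best_score[end]:
                    best_score[end] = score
                    best_start[end] = start

    if best_score[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")

    # Walk the back-pointers from the end of the text to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = best_start[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]


if __name__ == "__main__":
    print(viterbi_segment("unigram", VOCAB))
```

With this toy vocabulary, "uni" + "gram" beats "un" + "i" + "gram" because the product of two moderately likely pieces is larger than the product of three. The inner loop over candidate pieces runs for every character, which is where production tokenizers spend their optimization effort.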
    4 min
  • MEMO: A Modular Framework for Training a Dedicated Memory Model on New Knowledge Without Modifying LLM — 2026-05-27
    May 27 2026
    ## Short Segments
    Speculative decoding just got a major reliability boost with EAGLE 3.1, which fixes attention drift in LLM inference. Today, we're diving into how EAGLE 3.1 improves speculative decoding, a technique that speeds up large language model inference by using a small draft model to propose tokens, which the larger model then verifies. While previous versions struggled with attention drift, EAGLE 3.1 introduces per-layer normalization and a post-norm feedback loop to stabilize performance. This upgrade means up to twice the acceptance length and throughput, depending on hardware and prompt distribution. For developers, this means more reliable and efficient LLM deployments that remain compatible with existing checkpoints. Coming up, we'll explore MEMO, a modular framework that separates memory from reasoning in LLMs, offering a new way to update knowledge without modifying model parameters.

    ## Feature Story
    Introducing MEMO: a modular framework that changes how large language models take in new knowledge without altering their core parameters. Traditionally, LLMs become static after pretraining, unable to update as the world evolves. Retraining these models is costly, and fine-tuning risks losing previously learned information. Enter MEMO, developed by researchers from the National University of Singapore, MIT CSAIL, A*STAR, and SMART. This approach separates memory from reasoning, using a dedicated MEMORY model to internalize new knowledge while keeping the main EXECUTIVE model unchanged.

    MEMO addresses the limitations of existing methods like retrieval-augmented generation, which struggles with cross-document reasoning, and parametric methods, which are computationally expensive and prone to catastrophic forgetting. By decoupling memory updates from the base model, MEMO offers a way to learn continually without degrading existing knowledge. This separation also allows for more flexible and transferable knowledge integration across different LLMs.

    In practical terms, MEMO enables developers to update a model's knowledge base without extensive retraining, making it a cost-effective way to keep AI systems current. For AI practitioners, MEMO represents a significant step forward in managing and updating AI knowledge bases. Looking ahead, MEMO's modular approach could become a standard in AI development, providing a scalable method for maintaining up-to-date AI systems. Stay tuned as we continue to explore the latest advancements in AI tools and technologies.
    3 min
  • Design a Complete Multimodal RLVR Pipeline with Open-MM-RL, Vision-Language Prompting, Reward Scoring — 2026-05-26
    May 26 2026
    ## Short Segments
    OmniVoice Studio offers a local, open-source alternative to ElevenLabs for voice AI tasks. Today, we'll explore how this desktop application enables voice cloning, video dubbing, and more without relying on cloud servers. And coming up, we'll dive into designing a complete multimodal reinforcement learning pipeline with Open-MM-RL.

    OmniVoice Studio allows users to perform voice cloning, video dubbing, real-time dictation, and more, all without sending data to external servers. Unlike ElevenLabs, which charges between $5 and $330 per month and processes audio files through cloud servers, OmniVoice Studio runs entirely on your local machine. It supports over 600 languages and uses zero-shot learning for voice cloning, meaning it can replicate a voice from just a three-second audio clip. Additionally, the application offers a dictation widget that streams transcription via WebSocket and auto-pastes results into any focused app on macOS. For those seeking privacy and cost-effectiveness in voice AI, OmniVoice Studio presents a compelling option.

    ## Feature Story
    Designing a complete multimodal reinforcement learning pipeline is now within reach with Open-MM-RL. This tutorial guides users through using the TuringEnterprises/Open-MM-RL dataset for multimodal reasoning and reinforcement learning with verifiable rewards (RLVR). The process begins by loading and inspecting the dataset, analyzing its schema, domains, and formats, and visualizing examples from each domain. Users then build a lightweight reward function that evaluates model outputs by checking exact, numeric, fractional, LaTeX, and symbolic answers, which gives a verifiable measure of prediction accuracy. The tutorial also covers formatting prompts for vision-language models and testing them with SmolVLM on sample examples. Finally, the dataset is exported into a GRPO-style structure, setting the stage for future multimodal reinforcement learning training.

    The significance of this tutorial lies in how it streamlines the creation of multimodal RL pipelines. By providing a structured approach to dataset analysis and reward function creation, it simplifies the process for researchers and developers. This is particularly relevant given recent vision-language models, such as VLM-R1, which have demonstrated the potential of reinforcement learning to improve reasoning capabilities. These models use rule-based reward formulations to achieve precise and stable reward computation, a concept that Open-MM-RL builds upon.

    For practitioners, the immediate implication is clear: Open-MM-RL offers a practical foundation for developing multimodal RL systems. By following the tutorial, users can set up a pipeline that integrates vision-language prompting, reward scoring, and GRPO export. Looking ahead, the focus will likely shift towards refining these pipelines and exploring new domains where multimodal RL can be applied effectively.
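
As a rough illustration of what a verifiable reward function of this kind can look like, the sketch below scores a prediction against a reference by trying numeric, plain-fraction, and simple LaTeX `\frac` matches before falling back to exact string comparison. It is not the tutorial's code: the function names, the normalization rules, and the tolerance are assumptions, and symbolic equivalence (which would need a computer algebra library) is left out.

```python
import re
from fractions import Fraction


def normalize(answer):
    """Trim whitespace, surrounding $...$ math delimiters, and a trailing period."""
    text = answer.strip().strip("$").strip()
    return text.rstrip(".")


def to_number(answer):
    """Parse decimals, a/b fractions, and \\frac{a}{b}; return None if not numeric."""
    text = normalize(answer).replace(",", "")
    match = re.fullmatch(r"(-?)\\d?frac\{(\d+)\}\{(\d+)\}", text)
    if match:
        text = f"{match.group(1)}{match.group(2)}/{match.group(3)}"
    try:
        return Fraction(text)
    except (ValueError, ZeroDivisionError):
        return None


def reward(prediction, reference, tol=1e-6):
    """Return 1.0 when the prediction matches the reference, else 0.0."""
    pred_num = to_number(prediction)
    ref_num = to_number(reference)
    if pred_num is not None and ref_num is not None:
        # Both sides are numeric: compare values, so 0.75 equals 3/4.
        return 1.0 if abs(pred_num - ref_num) <= tol else 0.0
    # Otherwise fall back to a case-insensitive exact match.
    same = normalize(prediction).lower() == normalize(reference).lower()
    return 1.0 if same else 0.0


if __name__ == "__main__":
    print(reward("0.75", r"\frac{3}{4}"))
    print(reward("B", " b "))
    print(reward("0.5", "3/4"))
```

Parsing into `Fraction` rather than `float` keeps comparisons such as 1/3 against `\frac{1}{3}` exact. The binary 0/1 return value matches the rule-based style of reward the episode describes, though a real pipeline may also score formatting.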
    4 min
  • WorkOS Releases auth.md: An Open Agent Registration Protocol Built on OAuth Standards — 2026-05-25
    May 25 2026
    ## Short Segments
    Today, we're diving into a major shift in how AI agents authenticate and operate online. WorkOS has introduced auth.md, a new open protocol designed to streamline agent registration using OAuth standards. This development could change how agents interact with web services, moving beyond traditional human-centric authentication methods.

    ## Feature Story
    WorkOS has unveiled auth.md, an open agent registration protocol built on OAuth standards that targets how AI agents authenticate and operate on the web. Traditionally, web authentication has been designed with the assumption that a human is behind the browser, clicking buttons, filling out forms, and verifying emails. This model falls short for AI agents, which are increasingly performing tasks like writing code, opening pull requests, and updating records autonomously.

    Currently, the workaround for agent registration involves providing agents with raw API keys or session tokens. This method is fraught with issues, as these credentials are often unscoped, difficult to audit on a per-session basis, and challenging to revoke selectively. WorkOS's auth.md proposes a structured alternative.

    At its core, auth.md is a small Markdown file that an application publishes at a well-known location, typically a URL like "https://service.com/auth.md". The file serves as a guide for agents on how to register with the service, detailing supported flows, available scopes, and how credentials are issued, audited, and revoked. It has a dual function: it acts as documentation for human developers and as a runtime artifact that agents can read programmatically. Agents can fetch the auth.md file, read the structured sections, select the appropriate flow, and register without human intervention.

    This process is facilitated by a two-hop discovery mechanism. The machine-readable source of truth resides at a well-known path, which describes the protected resource and points to the Authorization Server. The Authorization Server metadata then includes the necessary blocks for agent registration.

    This development is particularly significant given the growing role of AI agents in enterprise environments. As AI agents move from single-user desktop demos to enterprise production, they face the challenge of multi-user, multi-system delegated authorization. Security architects and AI engineers are tasked with ensuring that every agent action is treated as a delegated user action, maintaining a clean audit trail and explicit consent.

    The introduction of auth.md aligns with ongoing efforts to extend OAuth for AI agents, as seen in recent IETF drafts. These drafts propose mechanisms for AI agents to act on behalf of users with explicit consent, addressing the current lack of clarity in audit trails when agents act for users. auth.md also complements other initiatives like the System for Cross-Domain Identity Management (SCIM) for AI, which aims to standardize the provisioning and deprovisioning of AI agents across various applications. Together, these developments are laying the groundwork for a more secure and efficient ecosystem for AI agents.

    In practical terms, auth.md could make AI agents easier to secure and manage in enterprise settings. By providing a clear and structured method for agent registration, it reduces the risk of unauthorized access and simplifies auditing and revoking credentials. This matters as AI agents become more integrated into critical infrastructure and workflows. Looking ahead, the adoption of auth.md and similar protocols could lead to a more standardized approach to AI agent authentication, making it easier for organizations to deploy and manage these agents at scale.

    That's all for today's episode of Impact Vector. Stay tuned for more insights into the latest AI tools and technologies. Until next time!
    4 min
  • Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys — 2026-05-24
    May 24 2026
    ## Short Segments
    NVIDIA's Gated DeltaNet-2 introduces a new linear attention layer that decouples erase and write operations, improving memory management in AI models. Today, we'll explore how this improves performance and what it means for developers. Later, we'll dive into Microsoft's Webwright, a terminal-native web agent framework that significantly boosts task performance. But first, let's break down NVIDIA's latest release.

    NVIDIA AI has unveiled Gated DeltaNet-2, a linear attention layer that separates erase and write operations in the Delta Rule, addressing a key bottleneck in memory management. The model, trained on 100 billion FineWeb-Edu tokens, outperforms predecessors like Mamba-2 and Gated DeltaNet across various benchmarks. By decoupling the active memory edit into two channel-wise gates, Gated DeltaNet-2 allows for more precise control over memory updates. This is particularly significant for developers working with large-scale AI models, as it offers a more efficient way to manage memory without compromising on performance.

    ## Feature Story
    Microsoft Research's Webwright framework takes a terminal-native approach to web automation and significantly improves task performance. Unlike traditional web agents that operate one action at a time, Webwright allows agents to write and refine Playwright code, offering a more flexible and efficient method for web interactions. This shift from a stateful browser session to a terminal environment enables agents to launch, inspect, and discard browsers while focusing on code and logs in the local workspace. The approach mirrors how developers create Robotic Process Automation scripts, allowing for reusable and adaptable solutions.

    Webwright's architecture consists of three core components: a Runner, a Model Endpoint, and a terminal Environment, totaling just over a thousand lines of code. This simplicity makes it accessible for developers looking to integrate AI-driven web automation into their workflows. The framework scores 60.1% on the Odysseys benchmark, a significant improvement over the base GPT-5.4's 33.5%, which highlights its potential to change how web tasks are automated. For developers, this means a more robust toolset for creating and deploying web agents, and ultimately faster and more reliable automation.
    3 min