Ep.10 Are benchmarks broken?

About this audio content

In this episode, we’re lucky to be joined by Alexandre Sallinen and Tony O’Halloran from the Laboratory for Intelligent Global Health & Humanitarian Response Technologies to discuss how large language models are assessed, including their Massive Open Online Validation & Evaluation (MOOVE) initiative.

0:25 - Technical wrap: what are agents?

13:20 - What are benchmarks?

  • 18:20 - Automated evaluation

  • 20:10 - Benchmarks

  • 37:45 - Human feedback

  • 44:50 - LLM as judge

Read more about the projects we discuss here:

  • Meditron

  • Learn about MOOVE, or contact our team if you'd like to be involved
  • Listen to the LiGHTCAST, including their recent excellent outline of the HealthBench paper

More details in the show notes on our website.

Episodes | Bluesky | info@medicalattention.ai
