Learning Bayesian Statistics

Episodes

#162 Bayesian Hydrology & GPU AI, with Christopher Krapu

Jul 28 2026

Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)

Our theme music is "Good Bayesian", by Baba Brinkman (feat. MC Lars and Mega Ran). Check out his awesome work.
Takeaways:
Q: How does putting a Gaussian process on unknown coordinates fix noisy location data in mineral prospecting?
A: In mining and geostatistics, the classic Gaussian process model, known there as kriging, assumes you know exactly where each sample was taken. Chris’ project broke that assumption on purpose: the recorded coordinates for each core sample were only accurate to within a rough radius. By treating the true locations as latent variables and putting a Gaussian process over them jointly with the measurements, the model could still reconstruct the underlying gold-concentration field, even though the exact sampling locations were never known precisely. It's a demonstration that Gaussian processes can absorb structural uncertainty that looks, at first glance, like it should make the problem impossible.
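The core computation can be sketched as an unnormalized log joint density over the latent true locations. This assumes a 1-D transect, a squared-exponential kernel with fixed hyperparameters, and Gaussian location error standing in for the "rough radius"; every name here is hypothetical, and this is not the model from Chris’ project. A sampler such as HMC would explore `x_true` together with any kernel hyperparameters.

```python
import numpy as np

def sq_exp_kernel(x1, x2, amplitude=1.0, lengthscale=1.0):
    # Squared-exponential covariance between two sets of 1-D locations
    d = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_log_marginal(y, x, noise_sd=0.1, **kernel_kw):
    # log N(y | 0, K(x, x) + noise_sd^2 I), computed through a Cholesky factor
    n = len(x)
    K = sq_exp_kernel(x, x, **kernel_kw) + noise_sd**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def log_joint(x_true, y, x_recorded, location_sd=0.5, **gp_kw):
    # Prior: each latent true location scatters around its recorded coordinate
    n = len(x_true)
    log_prior_x = (-0.5 * np.sum(((x_true - x_recorded) / location_sd) ** 2)
                   - n * np.log(location_sd * np.sqrt(2 * np.pi)))
    # The GP likelihood is evaluated at the latent locations, not the recorded ones
    return log_prior_x + gp_log_marginal(y, x_true, **gp_kw)
```

Because the kernel is evaluated at `x_true`, uncertainty about where a sample came from propagates directly into the reconstructed field.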
Q: What is "Poverty Bayes," and what did it cost to train a two-million-parameter Bayesian model?
A: Poverty Bayes was Chris’ experiment in seeing how cheaply a large Bayesian model could be trained using modern cloud infrastructure. He fit a hierarchical logistic regression with close to two million parameters, using PyMC's Hamiltonian Monte Carlo on a single A100 GPU rented through Modal, a serverless platform that deploys a Python script straight to GPU hardware with almost no setup. He'd originally guessed it would cost around five dollars, the price of a Big Mac, but the real bill came in an order of magnitude lower. A model that would take a Gibbs sampler weeks to run, and that once required a research lab's dedicated GPU, now costs pocket change and a few minutes of setup.
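For a rough sense of how such a model reaches millions of parameters: in a random-intercept logistic regression, almost every parameter is a group-level intercept. The NumPy log density below is a hypothetical stand-in. Its parameter layout, non-centered parameterization, and unit-normal priors are assumptions, not Chris’ specification. In PyMC the same structure would be declared as a model and handed to the sampler.

```python
import numpy as np

def log_density(params, X, group, y, n_groups):
    # Hypothetical layout: [beta (p entries), log_tau (1), z (n_groups)]
    p = X.shape[1]
    beta = params[:p]
    log_tau = params[p]
    z = params[p + 1 : p + 1 + n_groups]
    # Non-centered group intercepts: alpha_g = tau * z_g
    alpha = np.exp(log_tau) * z
    eta = X @ beta + alpha[group]
    # Bernoulli-logit log likelihood, y*eta - log(1 + exp(eta)), computed stably
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    # Priors beta ~ N(0, 1), z ~ N(0, 1), log_tau ~ N(0, 1), constants dropped
    logprior = -0.5 * (beta @ beta + z @ z + log_tau**2)
    return loglik + logprior
```

The parameter count is `p + 1 + n_groups`, so roughly two million groups gives roughly two million parameters, while each evaluation remains a handful of vectorized array operations.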
Q: What's the current bottleneck in Bayesian-at-scale tooling?
A: Chris argues the software has largely caught up: PyMC's JAX backend and NumPyro make GPU-accelerated Bayesian modeling work out of the box for most problems. What's missing is shared knowledge. Companies are clearly running large Bayesian models in production, but the results stay behind corporate firewalls. Chris proposes a community benchmark effort to establish which frameworks can handle a million-parameter Markov random field on a given GPU out of the box. Benchmarks this expensive and slow are a poor fit for standard CI pipelines, but the answers would be valuable for the field.
Chapters:
22:57 When does GPU acceleration actually pay off for a Bayesian model?
26:33 What did it cost to train a two-million-parameter model on Modal?
30:36 What happened when Chris asked 200 different LLMs to flip a coin?
34:50 Where do Bayesian ideas show up in the agentic AI systems Chris builds at Nvidia?
40:16 Are statisticians being made obsolete by large language models?
41:19 How does putting a Gaussian process on unknown coordinates fix noisy data in mineral prospecting?
58:05 What is Chris looking forward to working on next?
Thank you to my Patrons for making this episode possible!
Links from the show here

1 h 5 min

The Next Step Beyond LLMs: Foundation Models for Inference

Jul 22 2026

Today's clip is from episode 161, featuring Luigi Acerbi. In this conversation, Luigi explains one of the biggest engineering bottlenecks facing transformer-based probabilistic models—and how his group found a way around it.

The core challenge is that many inference models treat data as an unordered set, making them naturally permutation invariant. That's statistically elegant, but computationally painful: every time a new data point arrives, the model has to recompute attention over the entire dataset from scratch, preventing the kind of KV caching that makes modern language models so efficient.
Luigi walks through his team's solution: a hybrid architecture that keeps the original context fully set-based while introducing a causal-attention buffer for newly arriving data. The result is dramatically faster inference - up to 100× faster in some settings - opening the door to applications like reinforcement learning, active data acquisition, and, ultimately, Luigi's long-term vision of a foundation model for Bayesian inference.
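A toy, single-head sketch of the idea follows. It assumes plain dot-product attention with no learned projections, and uses each buffer point's raw vector as its key and value; these are simplifications, not the architecture from the paper. Context points attend to each other without a mask, so their encodings are permutation equivariant and the buffer's view of the context is permutation invariant. Each new buffer point attends to the cached context plus earlier buffer points, so adding it costs one query against cached keys and values instead of recomputing attention over the whole dataset.

```python
import numpy as np

def softmax(scores):
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def attend(queries, keys, values):
    # Scaled dot-product attention
    d = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(d)) @ values

def encode_context(context):
    # Unmasked self-attention: every context point sees the whole set
    return attend(context, context, context)

class BufferCache:
    """Caches keys/values for the fixed context plus every buffer point added so far."""

    def __init__(self, context_encoded):
        self.keys = context_encoded.copy()
        self.values = context_encoded.copy()

    def add(self, x):
        # Causal step: the new point attends to context + earlier buffer + itself;
        # its key/value are then appended, and the context is never recomputed
        kv_keys = np.vstack([self.keys, x[None, :]])
        kv_values = np.vstack([self.values, x[None, :]])
        out = attend(x[None, :], kv_keys, kv_values)[0]
        self.keys, self.values = kv_keys, kv_values
        return out
```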
Get the full discussion here

6 min

#161 Amortized Inference & Neural Processes, with Luigi Acerbi

Jul 16 2026

Takeaways:
Q: What is Variational Bayesian Monte Carlo (VBMC) and how is it different from Bayesian optimization?
A: VBMC borrows the machinery of Bayesian optimization but aims at a different target. Bayesian optimization fits a Gaussian process surrogate to an expensive function and uses it to hunt for the optimum. VBMC instead treats the log-posterior as the function to model, evaluates it at a few carefully chosen points, and keeps the whole reconstructed shape rather than just its peak. That gives you the full posterior, not a single best-fit value. Where MCMC might need tens of thousands to millions of evaluations, VBMC often reconstructs a good posterior approximation from a few hundred, which matters when each evaluation is slow.
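This sketch covers only the first ingredient described above: replacing an expensive log-posterior with a GP surrogate fitted to a few evaluations and keeping the whole reconstructed shape. Actual VBMC (and PyVBMC) also chooses evaluation points actively and fits a variational posterior to the surrogate, both omitted here. `expensive_log_post` is a hypothetical cheap stand-in, and the kernel settings and constant mean function are assumptions chosen for simplicity.

```python
import numpy as np

def expensive_log_post(theta):
    # Hypothetical stand-in for a slow log-posterior: N(1, 0.5^2), unnormalized
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def rbf(a, b, lengthscale=0.7, amplitude=3.0):
    return amplitude**2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def fit_gp_surrogate(theta_train, logp_train, jitter=1e-6):
    # Constant mean at the lowest observed value (an assumption for this sketch)
    m0 = logp_train.min()
    K = rbf(theta_train, theta_train) + jitter * np.eye(len(theta_train))
    weights = np.linalg.solve(K, logp_train - m0)

    def predict(theta_new):
        # GP posterior mean of the log-density at new points
        return m0 + rbf(theta_new, theta_train) @ weights

    return predict

theta_train = np.linspace(-1.0, 3.0, 9)        # only 9 "expensive" evaluations
surrogate = fit_gp_surrogate(theta_train, expensive_log_post(theta_train))

grid = np.linspace(-1.0, 3.0, 401)
dx = grid[1] - grid[0]
approx = np.exp(surrogate(grid))
approx /= approx.sum() * dx                    # normalize the reconstructed shape on the grid
post_mean = np.sum(grid * approx) * dx
post_sd = np.sqrt(np.sum((grid - post_mean) ** 2 * approx) * dx)
```

The result is a whole approximate density (here with mean near 1 and SD near 0.5), not just the location of its peak.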

Q: When should you reach for PyVBMC, and when is it the wrong tool?
A: Two symptoms tell you PyVBMC might help. First, speed: if a single evaluation of your log density takes on the order of a second, running MCMC over tens of thousands of evaluations becomes painful, and PyVBMC's few-hundred-evaluation budget pays off. Second, dimensionality: because it leans on a Gaussian process surrogate, it works well up to roughly 10 to 15 parameters and degrades beyond that. If your model already runs fine in Stan or PyMC, you do not need it. It shines for expensive, low-dimensional models common in science and engineering, where you are modeling a process rather than composing nice distributions.
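The evaluation-budget argument is plain arithmetic. Assuming exactly 1 s per log-density evaluation, 20,000 MCMC evaluations (a hypothetical stand-in for "tens of thousands") and 300 PyVBMC evaluations (a hypothetical "few hundred"):

```python
SECONDS_PER_EVAL = 1.0   # "on the order of a second" per log-density evaluation

def wall_clock_hours(n_evals, seconds_per_eval=SECONDS_PER_EVAL):
    # Total time if evaluations run one after another
    return n_evals * seconds_per_eval / 3600

mcmc_hours = wall_clock_hours(20_000)
vbmc_hours = wall_clock_hours(300)
print(f"MCMC: {mcmc_hours:.1f} h, PyVBMC: {vbmc_hours * 60:.0f} min")
# prints: MCMC: 5.6 h, PyVBMC: 5 min
```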
Full takeaways here
Chapters:
00:18:13 What is Variational Bayesian Monte Carlo (VBMC) and how does it differ from Bayesian optimization?
00:30:21 When should you use VBMC versus BADS in practice?
00:31:20 What is Bayesian Adaptive Direct Search (BADS) and how does its hybrid optimization strategy work?
00:39:18 What are neural processes, and why are transformers a natural neural process architecture?
00:45:54 What is the Amortized Conditioning Engine (ACE) and what problem does it unify?
00:55:42 What do PriorGuide and the new autoregressive buffer paper solve for amortized inference?
01:02:03 How does the new autoregressive buffer speed up predictions in transformer probabilistic models?
01:06:11 What is Luigi Acerbi's vision for a foundation model for inference?
01:09:26 What is ALINE and how does it add active data acquisition to amortized inference?
01:12:43 How does Luigi Acerbi connect LLM agents, Bayesian decision theory, and the nature of intelligence?
01:18:44 For a PyMC, Stan, or NumPyro user, where should you start with VBMC, BADS, or BayesFlow?
Thank you to my Patrons for making this episode possible!
Links from the show here

1 h 32 min

Bayesian Statistics vs Epistemology, with Vaden Masrani

Jun 29 2026

Takeaways:
Q: What's the difference between Bayesian statistics and Bayesian epistemology?
A: Bayesian statistics uses Bayes' theorem on actual data: you put a prior over parameters, combine it with a likelihood, and the data is allowed to tell you your model is wrong. Vaden loves it. Bayesian epistemology, in his tongue-in-cheek phrase, is "Bayesian statistics minus the statistics" - taking Bayes' theorem as a general account of how anyone should reason under uncertainty, including about events where there is nothing to count. The first is falsifiable and grounded; the second, he argues, lets people attach authoritative-sounding numbers to pure belief.

Q: Why is it a problem to put a probability on a one-off future event like human extinction?
A: Because there are no statistics behind it. Vaden's trigger example is Toby Ord's The Precipice, where a data-derived probability (supervolcanoes per millennium) is placed side by side with a probability of extinction-by-superintelligence that came from no data at all. His reaction is the statistician's first instinct: where are the numbers coming from, and what could ever make them come out differently? A subjective degree of belief is fine as a hunch. The trouble starts when it is communicated as though it were an objective, data-grounded frequency.

Q: What does Vaden Masrani actually like about Bayesian statistics?
A: The freedom to encode domain knowledge as a prior and have the result respect common sense - estimating an average human height, you can rule out zero and a hundred feet before seeing a single measurement. But the part he keeps stressing is falsifiability: you fit the model, compare it to data, and the data can tell you the model was bad. That contact with reality is exactly what makes the statistics legitimate and what the epistemology lacks. On Bayesian-versus-frequentist for engineering problems, he says he has no dog in the fight -- both are useful, and any working statistician uses both.
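The height example can be sketched with a conjugate normal model, assuming a known person-to-person spread; the prior values and measurements are made up. The prior already treats 0 cm and 100 ft (3048 cm) as absurd before any data arrive, and the predictive check at the end is the "data can tell you the model was bad" step in miniature.

```python
import numpy as np

PRIOR_MEAN, PRIOR_SD = 170.0, 15.0   # cm; hypothetical prior for the average adult height
SIGMA = 10.0                          # cm; assumed known person-to-person spread

# Before any data: 0 cm and 100 ft sit far out in the prior's tails
prior_z = (np.array([0.0, 3048.0]) - PRIOR_MEAN) / PRIOR_SD   # about -11 and +192 prior SDs

def posterior(heights):
    # Normal-normal conjugate update for the average, with SIGMA known
    n = len(heights)
    post_var = 1.0 / (1.0 / PRIOR_SD**2 + n / SIGMA**2)
    post_mean = post_var * (PRIOR_MEAN / PRIOR_SD**2 + np.sum(heights) / SIGMA**2)
    return post_mean, np.sqrt(post_var)

def predictive_z(new_height, heights):
    # Falsifiability check: how surprising is a new measurement under the posterior predictive?
    m, s = posterior(heights)
    return (new_height - m) / np.sqrt(s**2 + SIGMA**2)

heights = np.array([162.0, 175.0, 181.0, 158.0, 170.0])   # made-up measurements
m, s = posterior(heights)
```

If new measurements keep landing many predictive SDs away, the data are saying the model is wrong, which is exactly the contact with reality the passage describes.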

Full takeaways here
Chapters:

00:24:01 What's the difference between Bayesian statistics and Bayesian epistemology?
00:33:12 How can Bayesian epistemology lead to bad real-world decisions?
00:36:36 Is Bayesian or frequentist statistics better for real-world problems?
00:39:31 What is the problem of induction, and how does Bayesian epistemology try to solve it?
00:43:50 What are the main logical problems with Bayesian epistemology?
00:48:40 What is Popper's critical rationalism, and how does falsifiability fit in?
00:52:31 How does critical rationalism work when you can't run a clean experiment?
01:15:03 Why should you treat criticism as a gift, even when it hurts?
01:19:54 How do Stoicism and equanimity help you handle criticism?
01:23:19 Why does critical rationalism apply to everyday life, not just science?
Thank you to my Patrons for making this episode possible!
Links from the show here

1 h 41 min

Why Bayesian Statistics Is More Computational Than Ever

Jun 19 2026

Today's clip is from Episode 158 featuring Stefan Radev. In this conversation, Alex Andorra and Stefan break down a core argument from their paper: Bayesian statistics has never been more computational than it is now, and simulation is the thread that ties the whole workflow together.

Stefan divides the Bayesian workflow into four stages, and this clip covers the first two. Stage one is model specification, where the workflow community has long recommended prior predictive checks. You can do this informally, just running simulations from your model and eyeballing whether the output meets your expectations, or formally, à la Michael Betancourt, by pushing your model's high-dimensional output through a transformation into a low-dimensional, interpretable space and checking it against reality.
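A minimal prior predictive check might look like the following. It assumes a hypothetical Poisson regression of daily counts on a standardized covariate, with "more than 10,000 in one day" taken as implausible; the maximum count per simulated dataset serves as a crude low-dimensional summary. A wide prior on the log scale produces absurd datasets most of the time, so it can be discarded before seeing any real data.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-2.0, 2.0, 50)        # hypothetical standardized covariate

def prior_predictive(prior_sd, n_sims=2000):
    # 1. Draw intercept and slope from the prior
    a = rng.normal(0.0, prior_sd, n_sims)
    b = rng.normal(0.0, prior_sd, n_sims)
    # 2. Simulate Poisson counts with a log link; the cap at exp(20) only keeps
    #    NumPy's Poisson sampler in range and sits far above the threshold below
    log_rate = np.minimum(a[:, None] + b[:, None] * x[None, :], 20.0)
    return rng.poisson(np.exp(log_rate))

def frac_implausible(counts, max_plausible=10_000):
    # 3. Low-dimensional summary: share of simulated datasets with an absurd count
    return np.mean(counts.max(axis=1) > max_plausible)

wide = frac_implausible(prior_predictive(prior_sd=10.0))
tight = frac_implausible(prior_predictive(prior_sd=1.0))
```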

The punchline: a surprising number of models can be discarded before you've even seen real data, yet Stefan notes these checks remain underused in practice.

Stage two is model verification, where the question shifts to whether your inferences are well calibrated. This is the territory of simulation-based calibration and parameter recovery studies, classic tools that have always carried a steep computational price. You simulate thousands of synthetic datasets and run inference on every single one, which is exactly why these checks are so often skipped in papers, even though doing one well can be a contribution in its own right.
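Simulation-based calibration can be sketched on a conjugate normal model where the exact posterior is known, so "inference" is cheap. The loop is the one described above: draw a true parameter from the prior, simulate a dataset, run inference, and record the rank of the truth among posterior draws. The model, sizes, and the `sd_scale` knob (which mimics an overconfident method) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
PRIOR_MU, PRIOR_SD, SIGMA, N_OBS = 0.0, 1.0, 1.0, 10

def exact_posterior(y):
    # Conjugate normal model with known SIGMA: the posterior for the mean is normal
    precision = 1.0 / PRIOR_SD**2 + len(y) / SIGMA**2
    mean = (PRIOR_MU / PRIOR_SD**2 + y.sum() / SIGMA**2) / precision
    return mean, np.sqrt(1.0 / precision)

def sbc_ranks(n_sims=1000, n_draws=99, sd_scale=1.0):
    # sd_scale != 1 mimics a miscalibrated inference method
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        theta = rng.normal(PRIOR_MU, PRIOR_SD)     # "true" parameter from the prior
        y = rng.normal(theta, SIGMA, N_OBS)         # synthetic dataset
        m, s = exact_posterior(y)                   # inference on that dataset
        draws = rng.normal(m, sd_scale * s, n_draws)
        ranks[i] = np.sum(draws < theta)            # rank of the truth among the draws
    return ranks   # uniform on {0, ..., n_draws} when inference is calibrated

def chi_square_uniformity(ranks, n_draws=99, n_bins=10):
    counts, _ = np.histogram(ranks, bins=n_bins, range=(0, n_draws + 1))
    expected = len(ranks) / n_bins
    return np.sum((counts - expected) ** 2 / expected)
```

Even this toy version runs inference 1,000 times; with MCMC in place of `exact_posterior`, that multiplier is what makes the check so expensive.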

Here's where amortized simulation-based inference changes the math entirely. Checks that used to take days now take seconds, and instead of laboriously running inference dataset by dataset, you get millions of posterior samples essentially for free. The calibration checks that the field has always known it should be doing finally become cheap enough to actually do.
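Amortization can be illustrated in miniature with a swapped-in technique. Instead of a neural posterior estimator, this sketch fits a linear regression from a summary statistic to the parameter on simulations, once. After that, "inference" on any new dataset is a single dot product. It amortizes only a point estimate (for this conjugate normal model the posterior mean happens to be linear in the sample mean), not a full posterior, and all settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, sigma = 10, 1.0

# Training phase (paid once): simulate (theta, dataset) pairs from prior and model
n_train = 100_000
theta = rng.normal(0.0, 1.0, n_train)
y = rng.normal(theta[:, None], sigma, (n_train, n_obs))
summary = y.mean(axis=1)                       # hand-picked summary statistic

# Fit theta ~ w0 + w1 * summary by least squares
design = np.column_stack([np.ones(n_train), summary])
w, *_ = np.linalg.lstsq(design, theta, rcond=None)

def amortized_posterior_mean(datasets):
    # Inference phase: no sampling, just a dot product per dataset
    return w[0] + w[1] * datasets.mean(axis=1)

new_data = rng.normal(0.7, sigma, (5, n_obs))  # five new datasets
estimates = amortized_posterior_mean(new_data)
```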
Get the full discussion here

5 min

Exact GPs vs Approximations: When to Use Each (and Why It Matters)

Jun 10 2026

Today's clip is from episode 159 featuring Matthijs Hollanders. In this conversation, Alex and Matthijs dig into a deceptively practical question: when you're modeling wildlife across space and time with Gaussian Processes, how do you keep the math from becoming computationally unbearable - and what does good engineering actually look like in the field?

Matthijs explains that for most real camera trapping datasets, exact GPs still hold up fine. The reason is less about clever math and more about ecological reality: researchers are usually resource-constrained, so datasets tend to be a few hundred sites, not thousands.

And when datasets do get large, they're rarely one giant connected grid - they're clusters of independent regions. That structure is exploitable. Run a separate, smaller GP per region, share the hyperparameters, and you avoid building the massive covariance matrix that makes exact GPs expensive in the first place.
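The per-region trick can be sketched as a sum of small GP log-likelihoods that share one set of hyperparameters. It assumes 1-D site coordinates and a squared-exponential kernel for illustration. Because independent regions make the joint covariance block-diagonal, the sum equals the full GP log-likelihood, but the cost scales with the sum of each region's size cubed rather than the total size cubed.

```python
import numpy as np

def sq_exp(x, amplitude, lengthscale):
    d = x[:, None] - x[None, :]
    return amplitude**2 * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_loglik(y, K):
    # log N(y | 0, K) via a Cholesky factor
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L, y)                  # a @ a == y^T K^{-1} y
    return -0.5 * a @ a - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def regional_loglik(regions, amplitude, lengthscale, noise_sd):
    # One small GP per independent region; hyperparameters shared across regions
    total = 0.0
    for x, y in regions:
        K = sq_exp(x, amplitude, lengthscale) + noise_sd**2 * np.eye(len(x))
        total += gp_loglik(y, K)
    return total
```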

But the more interesting thread is where this is heading. Alex introduces Hilbert Space Gaussian Processes (HSGPs) - an approximation that makes compute time nearly linear in dataset size, rather than cubic. The catch, as Matthijs points out, is that approximations aren't always better: if your dataset isn't large enough to be in the regime where the approximation accuracy kicks in, you're better off with the exact GP and its mathematical guarantees. The rule of thumb is simple - if you can use the vanilla GP, just do it.
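A 1-D sketch of the Hilbert space approximation for a squared-exponential kernel follows the standard construction: Laplacian eigenfunctions on [-L, L] weighted by the kernel's spectral density. The boundary `L`, the number of basis functions `m`, and the hyperparameters are illustrative choices. In a model one would write f = Φ (√S · β) with β ~ N(0, I), which costs O(n·m) per evaluation instead of the O(n³) Cholesky of the exact GP.

```python
import numpy as np

def se_spectral_density(omega, amplitude, lengthscale):
    # Spectral density of the 1-D squared-exponential kernel
    return amplitude**2 * np.sqrt(2 * np.pi) * lengthscale * np.exp(-0.5 * (lengthscale * omega) ** 2)

def hsgp_basis(x, m, L):
    # Eigenfunctions of the Laplacian on [-L, L] with Dirichlet boundaries
    j = np.arange(1, m + 1)
    sqrt_eigvals = np.pi * j / (2 * L)
    phi = np.sin(sqrt_eigvals[None, :] * (x[:, None] + L)) / np.sqrt(L)
    return phi, sqrt_eigvals

def hsgp_cov(x, m, L, amplitude, lengthscale):
    # Rank-m approximation K ~ Phi diag(S) Phi^T
    phi, sqrt_eigvals = hsgp_basis(x, m, L)
    s = se_spectral_density(sqrt_eigvals, amplitude, lengthscale)
    return (phi * s) @ phi.T
```

The approximation is only good when `L` leaves room beyond the data and `m` is large enough for the lengthscale, which is one reason the exact GP remains the safer default when it is affordable.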
Get the full discussion here

4 min

#159 Bayesian Occupancy Models, with Matthijs Hollanders

Jun 8 2026

Takeaways:
Q: What is a Bayesian occupancy model and what problem does it solve?
A: An occupancy model accounts for the fact that you don't always detect a species when surveying for it, especially when the species is rare. A naive count of where you found it underestimates true occupancy. The model adds a repeated-measures component: you visit each site multiple times, and from the pattern of detections vs. non-detections it estimates a detection probability. Matthijs framed it as a zero-inflation structure where the zero-inflation happens at the site level rather than the observation level -- which keeps the model conceptually simple, just a standard GLM with a Bernoulli “is the species here at all?” stacked on top of a detection-rate process.
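The site-level zero inflation can be sketched for the classic binary-detection case. This is the textbook repeat-visit likelihood, not occARU's count-based model: any detection proves occupancy, while an all-zero history mixes "occupied but missed every time" with "truly unoccupied".

```python
import numpy as np

def site_log_lik(detections, psi, p):
    """Log-likelihood of one site's repeat-visit history under a basic occupancy model.

    detections: 0/1 array with one entry per visit
    psi: probability the site is occupied; p: per-visit detection probability
    """
    detections = np.asarray(detections)
    # If occupied, each visit is an independent Bernoulli(p) detection
    log_occupied = np.log(psi) + np.sum(detections * np.log(p) + (1 - detections) * np.log1p(-p))
    if detections.any():
        return log_occupied            # any detection rules out "unoccupied"
    # All zeros: occupied but always missed, or truly unoccupied
    return np.logaddexp(log_occupied, np.log1p(-psi))
```

For example, with psi = 0.5, p = 0.5 and two empty visits, the likelihood is 0.5 × 0.25 + 0.5 = 0.625.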

Q: What are Automated Recording Units and why don't traditional occupancy models handle them well?
A: ARUs are camera traps and acoustic monitors that record continuously over deployment periods of days, weeks, or months. The data they produce isn't a sequence of discrete human-led surveys; it's a continuous-time observation stream. Traditional occupancy models were designed for the discrete case -- a human visits a site, records yes or no, goes home. With ARUs, the question becomes how to bin or threshold the continuous data without losing the richer signal it actually contains.
Q: When should you not reach for occARU?
A: When your dataset is large and your survey interval is fine-grained. The bottleneck is Stan's fitting speed -- years of daily count data across many sites will fit slowly. The workaround is to bin coarser (weekly or monthly), which doesn't hurt occupancy estimation at all and only loses some detection-rate resolution. If you're only interested in occupancy, big grouping windows are fine.
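Coarser binning can be sketched as a reshape-and-sum over a sites-by-days count matrix. Dropping trailing days that do not fill a whole bin is an assumption of this sketch. Whether a site was ever detected, the signal that drives occupancy, is unchanged; only the within-bin timing used for detection-rate resolution is lost.

```python
import numpy as np

def bin_counts(daily_counts, days_per_bin=7):
    """Aggregate a (sites x days) matrix of daily detection counts into coarser bins.

    Trailing days that do not fill a whole bin are dropped.
    """
    daily_counts = np.asarray(daily_counts)
    n_sites, n_days = daily_counts.shape
    n_bins = n_days // days_per_bin
    trimmed = daily_counts[:, : n_bins * days_per_bin]
    return trimmed.reshape(n_sites, n_bins, days_per_bin).sum(axis=2)
```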
Full takeaways here
Chapters:
00:12:14 What is an occupancy model and what problem does it solve?
00:16:16 What are Automated Recording Units and why do they need different models?
00:18:45 What is the occARU R package and why does it exist?
00:23:55 Why does occARU model counts directly rather than binary detection?
00:26:38 What does multi-species hierarchical modeling with Gaussian processes look like?
00:32:22 How does occARU implement Gaussian processes efficiently?
00:41:01 Why are Gaussian processes such a powerful but tricky modeling tool?
00:44:11 What is variance decomposition with global-local shrinkage priors?
00:49:02 How does occARU leverage recent Stan features for zero-sum constraints?
00:57:37 When does within-chain parallelization actually help?
01:01:30 How does Monte Carlo integration reduce high Pareto-k values?
01:15:27 When does occARU underperform and what's on the roadmap?
Thank you to my Patrons for making this episode possible!
Links from the show here.

1 h 26 min

Can AI Learn What Experts Know? Automating Prior Elicitation with Generative Models

Jun 2 2026

Today's clip is from episode 158 featuring Stefan Radev. In this conversation, Alex and Stefan explore a genuinely fascinating problem: how do you turn an expert's intuition into a mathematically valid prior distribution - and can AI help automate that process?

Alex explains that prior elicitation is essentially a translation problem. Experts don't walk around thinking in probability distributions - their knowledge lives in intuitions, rules of thumb, and rough ranges. The challenge is converting that into something a Bayesian model can actually use.

The traditional approach? Ask an expert for quantiles or a mean, then parameterize your prior with hyperparameters and simulate until the model-implied quantities match what the expert described. If your pipeline is differentiable end-to-end, you use gradient descent. If not, you fall back to something like Bayesian optimization. Either way, you're iterating toward a prior that genuinely reflects expert knowledge - not just a convenient assumption.
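The simplest version of that loop can be sketched under several assumptions. The prior is Normal(mu, sigma) on a parameter θ, and the expert supplied the 10th and 90th percentiles of a hypothetical model-implied quantity y = exp(θ). Gradients come from finite differences on fixed simulation draws (common random numbers) rather than autodiff. All numbers are made up, and this stands in for neither the gradient-based nor the Bayesian-optimization pipeline from the papers.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)            # fixed draws reused every iteration

probs = np.array([0.1, 0.9])
expert_quantiles = np.array([2.0, 20.0])   # hypothetical expert answers for y

def implied_quantiles(params):
    mu, log_sigma = params
    theta = mu + np.exp(log_sigma) * z     # prior draws of theta
    y = np.exp(theta)                       # hypothetical model-implied quantity
    return np.quantile(y, probs)

def loss(params):
    # Squared mismatch on the log scale, so both quantiles count comparably
    return np.sum((np.log(implied_quantiles(params)) - np.log(expert_quantiles)) ** 2)

def elicit(lr=0.1, steps=300, eps=1e-4):
    params = np.zeros(2)                   # start at mu = 0, sigma = 1
    for _ in range(steps):
        grad = np.zeros(2)
        for i in range(2):
            step = np.zeros(2)
            step[i] = eps
            grad[i] = (loss(params + step) - loss(params - step)) / (2 * eps)
        params -= lr * grad                # plain gradient descent
    return params

mu_hat, log_sigma_hat = elicit()
```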

But the really exciting part is what came next. In a follow-up paper, they pushed this further: instead of optimizing within a fixed parametric family (say, a Gaussian), they replaced the prior entirely with a normalizing flow - a flexible generative network - and ran the same procedure. No assumed distribution family. Just let the data and the expert's knowledge shape the prior from scratch.

The catch? More flexibility means more non-identifiability and stability headaches. But the direction is clear: a fully automated, end-to-end pipeline for building priors from non-probabilistic expert knowledge. And in 2026, that pipeline could theoretically be driven by an agent.

Get the full discussion here

5 min
