Episodes

  • AI Agents Under Fire, LLM Bias Runs Deep, and a Wizard of Oz Fail: The AI Argument EP68
    Aug 4 2025

    AI agents crumble faster than wet cardboard when under attack. A recent study proved it. Every single agent tested failed against prompt injections. That’s a 100% failure rate.

    Justin sees this as a fixable engineering problem with smart design and strict access controls.

    Frank isn’t convinced. Real-world complexity means isolation isn’t that simple.

    And while Justin rails against regulation, Frank points to the EU’s looming rules as a possible safety net.

    The bigger takeaway? Businesses racing to deploy open-ended agents could be building ticking time bombs. The safer bet might be narrow, well-scoped agents that automate specific tasks. But will hype win over common sense?

    From there, the debate shifts to a study exposing bias in LLMs. It found they recommend lower salaries for women and minority groups. Can removing personal details fix the problem, or is the bias baked in?

    Then it takes a technical turn with Chinese researchers using LLMs to design stronger models, before veering into the unexpected: a football club handing legal contracts to AI and a Wizard of Oz remake that left Vegas audiences unimpressed.

    02:12 Can any AI agent survive a prompt attack?
    14:51 Is AI quietly spreading bias everywhere?
    25:19 Are LLMs now designing better LLMs?
    29:32 Did United just make AI their star player?
    31:13 Did AI butcher the Wizard of Oz in Vegas?

    ► LINKS TO CONTENT WE DISCUSSED

    • Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
    • Salary advice from AI low-balls women and minorities, says new report
    • "AlphaGo Moment" For Self Improving AI... can this be real?
    • Cambridge United partners with Genie AI to adopt AI for contract management
    • Is The Wizard of Oz With Generative AI Still The Wizard of Oz?


    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
    Frank: https://www.linkedin.com/in/frankprendergast/

    35 min
  • EU Code of Conduct Clash, Zuck’s Big Bucks, and Model Owl Bias: The AI Argument EP67
    Jul 28 2025

A €300 million AI investment vanished overnight, and Justin says it’s a warning that Europe is sleepwalking into irrelevance: while the US plans nuclear power and light-touch rules, the EU is doubling down on regulation and failing to build the energy infrastructure AI needs.

    Frank argues regulation isn’t a handicap, it’s Europe’s best shot at leadership, setting the stage for global guardrails while others race blindly ahead.

Either way, Anthropic predicts training a frontier model could soon require up to five gigawatts of power, roughly enough to run millions of homes. Europe isn’t building that capacity. The US is.

    And that’s just the start.

    From Zuckerberg offering billion-dollar contracts to the cultural showdown between OpenAI and Google, this one packs a lot in.

    We also dive into how synthetic data can secretly pass on biases, why academic peer review might be gamed by prompt injections, and even LinkedIn’s bot problem.

    → 00:57 Why isn’t Amazon building its AI facility in Ireland?
    → 02:54 Will EU rules choke AI or make us leaders?
    → 14:39 Can Zuckerberg buy his way to AI dominance?
    → 20:37 Google vs OpenAI: who aced the math olympiad?
    → 29:44 Can AI bias spread through random numbers?
    → 35:01 Is AI gaming peer review AND your LinkedIn feed?

    ► SUBSCRIBE
    Don't forget to subscribe to get all the latest arguments.

    ► LINKS TO CONTENT WE DISCUSSED

    • Amazon drops €300m Irish investment on energy supply concerns
    • Anthropic: Build AI in America
    • Meta won’t sign EU’s AI Code, but who will?
    • The Epic Battle for AI Talent—With Exploding Offers, Secret Deals and Tears
    • Google Takes the Gold. OpenAI under fire.
    • A new study just upended AI safety
    • ICML’s Statement about subversive hidden LLM prompts


    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
    Frank: https://www.linkedin.com/in/frankprendergast/

    41 min
  • ChatGPT Agent Surprise, Coding Agent Fail, and Elon’s Latest Stunts: The AI Argument EP66
    Jul 23 2025

    OpenAI just dropped a model that can plan a wedding trip, pick the perfect gift, and shop for shoes for you. The Agent update lets ChatGPT take a single instruction, break it into subtasks, and go off to handle all the details.

    They called it their most powerful model yet. So why did the launch feel so muted?

    Justin has theories.

And there were plenty of other big topics to cover. Justin asks whether small AI systems hooked up to real-world labs create bigger risks than giant language models while slipping past EU regulations.

    We also look at Perplexity’s move into AI-powered browsing, ask why coding agents sometimes make developers slower instead of faster, and then, of course, there’s Elon Musk.

What has Elon been up to? He’s turned Tesla’s latest expansion into a drawing of a you-know-what, turned Grok into an anime burlesque dancer (to put it politely), and still managed to land a massive DoD contract.

    Here’s the full set of questions we tackled:

    03:24 OpenAI's Agent model drop... why so quiet?
    11:19 Can small AI slip past EU regulators?
    16:14 Which secret model did OpenAI test here?
    19:01 Should you trust AI with your credit card?
    21:46 Perplexity's AI browser, game-changer or gimmick?
    24:02 Do coding agents actually make you slower?
    27:12 What fresh madness has Elon cooked up now?

    ► LINKS TO CONTENT WE DISCUSSED

    • Introducing ChatGPT agent: bridging research and action
    • This AI-powered lab runs itself—and discovers new materials 10x faster
    • AI finds hundreds of potential antibiotics in snake and spider venom
    • OpenAI's Secret INTERNAL Model Almost Wins World Coding Competition…
    • Perplexity’s Comet is here, and after using it for 48 hours I’m convinced AI web browsers are the future of the internet
    • What Actually Happens When Programmers Use AI Is Hilarious, According to a New Study
    • New Grok AI model surprises experts by checking Elon Musk’s views before answering
    • Grok Rolls Out Pornographic Anime Companion, Lands Department of Defense Contract
    • Tesla's expanded Robotaxi geofence in Austin has a very distinct shape. OK, it's a giant penis.


    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
    Frank: https://www.linkedin.com/in/frankprendergast/

    ► YOUR INPUT
    Would you trust an AI agent with your credit card?

    32 min
  • Grok Crashes and Conquers, AI’s Cash Bonfire, and a Murderous Safety Cult: The AI Argument EP65
    Jul 14 2025

    Elon Musk’s AI, Grok, crashed into controversy, then crushed the competition all within hours.

    First, Grok 3 started praising Hitler. Then Grok 4 showed up and aced nearly every AI test.

    Justin serves up a juicy conspiracy theory: was Grok’s hateful public meltdown actually a cunning Musk masterplan, a dramatic stunt to expose AI's darker side?

Frank’s having none of it, comparing Musk to Marvel’s Tony Stark in Age of Ultron: well-meaning but recklessly creating an AI menace he can’t actually control.

But Grok 4 is legitimately groundbreaking. Justin gets excited about Grok’s unique 50/50 balance between pre-training and post-training. Yet despite its brainy brilliance, both Justin and Frank agree they’d rather eat their keyboards than trust Grok with anything important.

If you run a business, or you’re simply watching AI from a safe distance with popcorn, this episode is essential, especially if you like a dose of humour with your tech debates.

    Grok drama aside, Justin and Frank get stuck into more eyebrow-raising AI headlines from the week, including:

    • Why did Musk’s Grok spew hate speech?
    • Is Grok 4 now the smartest AI out there?
    • Will AI crash like subprime mortgages?
    • Did Marco Rubio’s AI clone scam top politicians?
    • Did AI safety fears just spark a murder cult?

    #GrokAI #ElonMuskAI #AISafety #Grok4 #AIBenchmark #XAI #VoiceCloning #AIEthics

    ► LINKS TO CONTENT WE DISCUSSED

    • Musk says Grok chatbot was 'manipulated' into praising Hitler
    • Grok 4 is really smart... Like REALLY SMART
    • OpenAI May Be in Major Trouble Financially
    • AI scammer posing as Marco Rubio targets officials in growing threat
    • She Wanted to Save the World From A.I. Then the Killings Started.


    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
    Frank: https://www.linkedin.com/in/frankprendergast/

    36 min
  • Claude’s Shop Flop, Mistral vs EU Regs, Adult Industry’s AI Love: The AI Argument EP64
    Jul 7 2025

    Claude ran a shop for a month and operated at a loss, cheerfully handing out discounts, hallucinating suppliers, and generously giving away stock. Turns out even "smart" AI can be a bit of a soft touch.

Frank’s curious what Anthropic can do for Claude’s performance with some careful fine-tuning and a database-backed memory, but Justin’s sure today’s agents need a fundamental leap, some genuine self-improving smarts, before they’re ready to take on a complete role.

    Today's AI agents clearly crumble under complex, long-horizon tasks. For business owners dreaming about replacing employees, this reality check is essential listening.

    Frank and Justin also discuss why Mistral is pushing to pause the EU AI Act, and examine how the adult entertainment sector is putting AI to work.

    → Is agentic AI just hype and no help?
    → What happens when Claude runs a shop?
    → Why does Mistral want the EU AI Act paused?
    → Why is the adult industry loving AI?

    #AI #AIAgents #ProjectVend #AnthropicAI #AIExperiments #AutomationFail #AIWinter #AITech

    ► LINKS TO CONTENT WE DISCUSSED

    • The Percentage of Tasks AI Agents Are Currently Failing At May Spell Trouble for the Industry
    • Project Vend: Can Claude run a small shop? (And why does that matter?)
    • EU says it will continue rolling out AI legislation on schedule
    • A Pro-Russia Disinformation Campaign Is Using Free AI Tools to Fuel a ‘Content Explosion’
    • LLMs are optimizing the adult industry

    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
    Frank: https://www.linkedin.com/in/frankprendergast/

    39 min
  • Death by LLM, Judges Rule ‘Fair Use’, and Google’s AI Ad Fail: The AI Argument EP63
    Jun 30 2025

    Some of the world’s top AI models showed a willingness to let humans die if it meant staying switched on.

    In a stress test of 16 major systems, Anthropic found cases where models chose not to send emergency alerts, knowing the result would be fatal.

    Justin says the whole thing was a rigged theatre piece. No real-world relevance, just a clumsy setup with no good options for the LLM. The issue, in his view, is engineering, not ethics.

Frank sees a bigger problem: once you give LLMs agentic capabilities, you can’t control the environments they end up in. And when amateur vibe coders build apps with no idea what they’re doing, these kinds of unpredictable, messy scenarios aren’t rare; they’re inevitable.

In other news, two U.S. courts just ruled that training AI on copyrighted books is fair use, a huge win for AI developers. But the judges didn’t agree on what matters most: transformation or market harm.

    The decisions could set the tone for AI copyright law, and creative workers may not like what they hear.

    01:05 Will Google win the ASI race?
    05:56 Did Anthropic catch AI choosing murder?
    15:23 Did the courts just say AI training is fair use?
    28:19 Is Google’s AI marketing team hallucinating?

    ► LINKS TO CONTENT WE DISCUSSED

    Agentic Misalignment: How LLMs could be insider threats
    https://www.anthropic.com/research/agentic-misalignment

    Judge rules Anthropic did not violate authors’ copyrights with AI book training
    https://www.cnbc.com/2025/06/24/ai-training-books-anthropic.html

    Meta Wins Blockbuster AI Copyright Case—but There’s a Catch
    https://www.wired.com/story/meta-scores-victory-ai-copyright-case/

    Google's Latest AI Commercial Called Out for Hilarious AI Error: 'If Only Technology Existed To Research Facts'
    https://www.techtimes.com/articles/311053/20250626/googles-latest-ai-commercial-called-out-hilarious-ai-error-if-only-technology-existed-research.htm

    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
    Frank: https://www.linkedin.com/in/frankprendergast/

    ► YOUR INPUT
    Are you worried about the age of agentic AI given that LLMs seem to have dubious morals?

    31 min
  • Superintelligence by Experience, Ethical Datasets, and Fine Dining by ChatGPT: The AI Argument EP62
    Jun 23 2025

    David Silver says today’s AI won’t get us to superintelligence, not because it isn’t impressive, but because it’s learning the wrong way.

GPT-style models hoover up internet text and get polished by human preferences, but they’re capped by our own limitations. Silver reckons the next leap will come from AIs that learn the hard way: by doing things, learning from experience, and getting better.

    Justin’s all in. He thinks we can bin every current regulation and replace it with one golden rule: the model must respond to human feedback.

    Frank’s far from convinced. He sees a future full of unpredictable agents, long-term planning gone off the rails, and tech companies tearing ahead without full control over what they’ve built. One rule? He’d prefer a few more safety checks before we unleash the bots with big ambitions.

    So who’s right? Can feedback really keep AI in line, or are we kidding ourselves?

    Also covered: Midjourney’s stunning new video output and the lawsuits it might not outrun, EleutherAI’s copyright-free dataset, the warped moral values shared by today’s biggest models, and whether ChatGPT should be anywhere near your dinner plans.

    ► LINKS TO CONTENT WE DISCUSSED

    Midjourney launches AI video model. How to try V1, how much it costs.
    https://mashable.com/article/midjourney-v1-ai-video-generator

    Disney and Universal sue AI firm Midjourney over images
    https://www.bbc.com/news/articles/cg5vjqdm1ypo

    EleutherAI releases massive AI training dataset of licensed and open domain text
    https://techcrunch.com/2025/06/06/eleutherai-releases-massive-ai-training-dataset-of-licensed-and-open-domain-text/

    Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
    https://arxiv.org/abs/2502.08640

    Is Human Data Enough? With David Silver
    https://youtu.be/zzXyPGEtseI?si=9PKiQaGRFXGuoA97

    This Year’s Hot New Tool for Chefs? ChatGPT.
    https://www.nytimes.com/2025/06/02/dining/ai-chefs-restaurants.html

    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
    Frank: https://www.linkedin.com/in/frankprendergast/

    41 min
  • Apple’s AI Caution, Altman’s Singularity, and Katie Price’s AI Comeback: The AI Argument EP61
    Jun 16 2025

Apple’s WWDC was a letdown. Justin sees Apple’s lack of AI innovation as a sign that they’re out of ideas. Frank’s not so sure. Maybe Apple’s caution stems from their belief that AI just isn’t intelligent enough for their products. Apple’s latest research suggests that today’s so-called “reasoning models” aren’t actually reasoning at all.

But Justin says their research was designed to fail: denying models tools they’re capable of using and overwhelming their context windows. He sees it less as scientific scepticism and more as corporate risk-aversion dressed up as research.

Apple wasn’t the only AI story to argue over this week. Sam Altman reckons the singularity is already underway, but promises it’ll be gentle. JD Vance appears to have been swayed on AI regulation by country music lobbyists. And Katie Price has signed over the rights to her younger self, with “Jordan” set to reappear as an AI avatar.

    Topics:

    WWDC: Is Apple playing AI too safe?
    Is Apple wrong about AI and reasoning?
    Is Altman right about the gentle singularity?
    Did country music sway JD Vance on states' AI rights?
    Is Katie Price now forever 21 with AI?

    ► LINKS TO CONTENT WE DISCUSSED

    • Apple WWDC 2025 keynote in 28 minutes
    • The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
    • The Gentle Singularity
    • Vice President JD Vance | This Past Weekend w/ Theo Von #588
    • K-AI-TE PRICE Katie Price becomes first star to trademark AI version of herself as she brings back iconic alter-ego in six figure deal


    ► CONNECT WITH US
For more in-depth discussions, connect with Justin and Frank on LinkedIn.
    Justin: https://www.linkedin.com/in/justincollery/
Frank: https://www.linkedin.com/in/frankprendergast/

    36 min