YAAP (Yet Another AI Podcast)

AI21

Released: 2025-06-10

Free New

2 episodes

Audio

Most recent episode

The Hard Truths About AI Agents: Why Benchmarks Lie and Frameworks Fail

Duration: 39:54

Building AI agents that actually work is harder than the hype suggests, and most people are doing it wrong. In this special "YAAP: Unplugged" episode (a live panel from the AI Tinkerers meetup at the Hugging Face offices in Paris), Yuval sits down with Aymeric Roucher (Project Lead for Agents at Hugging Face) and Niv Granot (Algorithms Group Lead at AI21 Labs) for an unfiltered discussion about the uncomfortable realities of agent development.

Key Topics:

Why current benchmarks are broken: From MMLU's limitations to RAG leaderboards that don't reflect real-world performance

The tool use illusion: Why 95% accuracy on tool calling benchmarks doesn't mean your agent can actually plan

LLM-as-a-judge problems: How evaluation bottlenecks are capping progress compared to verifiable domains like coding

Framework: friend or foe? When to ditch LangChain, LlamaIndex, and why minimal implementations often work better

The real agent stack: MCP, sandbox environments, and the four essential components you actually need

Beyond the hype cycle: From embeddings that can't distinguish positive from negative numbers to what comes after agents

From FIFA World Cup benchmarks that expose retrieval failures to the circular dependency problem with LLM judges, this conversation cuts through the marketing noise to reveal what it really takes to build agents that solve real problems, not just impressive demos.

Warning: Contains unpopular opinions about popular frameworks and uncomfortable truths about the current state of AI agent development.

Episode ID: 1000712269923

GUID: b285ed06-e3cc-41e7-aa81-fc894f86c75d

Release date: 2025-06-10 15:00:00

Description

YAAP brings you practical conversations with the people actually building generative AI solutions. No hype, no sales pitches, just honest discussions about challenges, solutions, and lessons learned.

Listen to developers and engineers share what works, what doesn't, and what they wish they'd known sooner. Simple, useful insights for anyone working with AI — hosted by AI21's Yuval Belfer.

Feed URL

https://feeds.transistor.fm/yaap-by-ai21

Apple Podcasts: Customer reviews

No items
