Fresh From the Labs

Pioneer Square Labs

23 episodes

21 hours ago

Fresh From the Labs is your front-row seat to the future of AI — straight from the builders shaping it. Hosted by the product team at Pioneer Square Labs, a Seattle-based venture studio, each episode dives into the week's most exciting AI breakthroughs, tools, and trends. No hype, just hands-on insight from the people actually prototyping, experimenting, and pushing boundaries with the latest tech. Whether you're building with AI or just trying to keep up, this podcast is your lab-tested shortcut to what matters most.

All content for Fresh From the Labs is the property of Pioneer Square Labs and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Episodes (20/23)

Engineer to Entrepreneur: Jared's Journey & Halloween Hijinks

In this episode of Fresh From the Labs, the team says farewell to engineer-turned-founder Jared as he sets off to co-found a startup. But don't worry, he will still be hosting FFTL from the other side.

We delve (kl-note: another AI giveaway! 😅) into the rise of AI-generated resumes and the challenges they pose for recruiters and interviewers. The hosts discuss strategies for detecting synthetic candidates, the ethics of using AI tools to cheat in interviews, and the broader implications of AI on authenticity in the hiring process.

Beyond hiring, we talk about the blurred line between human creativity and AI-assisted content creation, exploring how generative tools can scale personal communication and creativity.

The episode wraps up with some spooky Halloween fun as the hosts share personal stories, decorations, and holiday plans.

1 week ago

44 minutes 5 seconds

Haiku Hullabaloo & Atlas Antics: Browser Breakthroughs and Vibe Coding Vibes

In this episode we dive into the latest AI news with a playful twist. Anthropic’s new Haiku model hits the scene, promising lightning‑fast responses and powerful capabilities. We unpack what makes Haiku special and how it stacks up.
Then we explore Atlas, the emerging AI platform shaking up the ecosystem, and discuss how its features could reshape the way we interact with technology.
We also take a look at “vibe coding” tools that let you craft software by describing the vibe rather than the code, and we share our impressions of OpenAI’s new browser functionalities.
Tune in for lively banter, thoughtful analysis, and all the latest from the frontier of AI innovation.

(ed. note: Atlas attempted to edit and publish this podcast end to end. I had to fix a few editing mistakes but otherwise it did it! The description...needs some work, but we'll leave it as an example of Atlas's current capabilities.)

2 weeks ago

35 minutes 46 seconds

DevDay Debrief with OpenAI's Brian Fioca

Welcome back to a super special episode of Fresh From the Labs! The team is joined by PSL alum Brian Fioca, now a Solutions Architect at OpenAI, for a deep dive into the latest from DevDay and the strategies shaping the AI frontier.

The conversation kicks off with AgentKit, OpenAI's new visual workflow for building agents, and the accompanying open-source ChatKit. We discuss how these first-party tools aim to solve the chronic pain of building a high-quality chat experience and remove the undifferentiated heavy lifting of creating evaluation harnesses and tracing systems.

This leads to the billion-dollar question: with OpenAI building so much product, where can startups possibly build? Brian offers an insider's perspective on OpenAI's AGI-focused mission, explaining how products are often stepping stones for research and that the biggest opportunities for startups lie in building domain-specific tools, workflows, and the "picks and shovels" infrastructure that OpenAI won't.

We also explore the other massive DevDay announcement: the ChatGPT Apps SDK. Drawing parallels to the Apple App Store boom, the team discusses its potential to become a monumental distribution channel for a new generation of AI-native startups. To cap it off, Brian shares fascinating insights from his work on GPT-5, explaining how models develop "habits" during training, the power of watching "reasoning tokens" to understand a model's thought process, and how advanced prompting is evolving into a deep craft.

3 weeks ago

37 minutes 19 seconds

Sora's Surge & Meta's Misery: Sonnet's Squeeze, Vibe Coding's Pro Moment, and ChatGPT's Pulse

This week on Fresh From the Labs, the team dives into OpenAI's Sora 2. More than just an AI video model, it's a full-fledged social network. Kevin shares his shockingly good first experience creating a personalized avatar and generating a video of himself in a 90s rock band, highlighting the platform's surprisingly good sense of humor and the fun of remixing content. This leads to a broader discussion on the massive competitive threat this poses to Meta, why taste and UX are now the ultimate battlegrounds, and whether this explains Mark Zuckerberg's recent talent acquisition frenzy.

Next, the conversation turns to Anthropic's new Sonnet 4.5. The team has had more time to test it, and the verdict is in: while it's fast, it's ultimately not as correct as the latest Codex. We discuss the crucial metric of "time to completed correct code," why Anthropic's "victory lap" announcement feels disconnected from the real-world user experience, and whether the company is getting dangerously squeezed in the model wars.

The team also unpacks the evolution of "vibe coding" as tools like Bolt V2 go pro by adding databases, authentication, and first-party agent integrations. Is this a smart move to capture a larger market, or are they alienating their core non-technical audience and wading into a hyper-competitive space?

Finally, we cover ChatGPT's recent e-commerce and personalization updates. The introduction of an instant checkout feature has major implications for product discovery and merchants, while the new "Pulse" feature aims to deliver a more personalized information feed, consolidating ChatGPT's role as a central user interface. Join us for a jam-packed episode on product showdowns, shifting market dynamics, and the future of creative AI.

1 month ago

47 minutes 34 seconds

Parallel Parking & Pricing Puzzles: The Autonomous Moment, AI Unit Economics, and Guardrailing Growth

This week on Fresh From the Labs, the team kicks things off with the resurgence of autonomous driving. With Waymo cars now spotted on the streets of Seattle and Amazon's toaster-like Zoox shuttles rolling out in Vegas, is the long-promised self-driving revolution finally here? We discuss the two-decade journey from Uber's early arbitrage bet to today's tangible progress, debating user trust, safety, and whether we're truly at a tipping point.

The conversation then pivots to one of the most pressing challenges for any AI founder: pricing. We tackle the immense difficulty of estimating and managing AI costs in a world where token prices for top-tier intelligence remain high. The team explores the evolution of pricing models, from prohibitively expensive early days to today's VC-subsidized landscape, and the pitfalls of confusing credit-based systems that obfuscate true costs.

Jared makes the case for pricing based on value delivered rather than micro-transacting tokens, and the team shares practical, hard-won advice for founders. Learn how to create back-of-the-napkin cost estimates, the importance of running real-world user tests, and what not to do—including the dreaded "unlimited" plan and the bad freemium model. Plus, we cover essential guardrails to prevent runaway costs so you don't get a surprise fifty-thousand-dollar bill. Join us for a drive through the future of transportation and a masterclass in AI unit economics.

1 month ago

30 minutes 57 seconds

Codex, Cognition and 996 Culture

This week on Fresh From the Labs, Shilpa, Kevin, and Jared dive into the powerful new Codex CLI release. Based on GPT-5 but custom-trained for real-world agentic use cases, this release confirms their prediction that major labs are now building "first-party agents," moving the key primitive up the stack from the model to the agent itself.

Kevin and Jared share their hands-on impressions and reveal the intricate, spec-driven workflows they've developed to harness its power. They detail their process of prompt looping, using GPT-5 Pro to create detailed specs and then feeding them to Codex, and using automated feedback loops ("check your work") to run agents on complex tasks for hours at a time. The discussion offers a masterclass in systematically building guardrails and support systems around agents to achieve highly performant, real-world results.

The conversation then shifts to the state of the market, sparked by Cognition's massive $400 million fundraise at a $10.2 billion valuation. The team voices their skepticism and questions what the VCs are seeing, debating if it's a strategic play or simply a case of "VC FOMO." This leads to a candid discussion about a recent Wall Street Journal article on the "996" AI founder lifestyle, the pressures of the current hype cycle, and whether that relentless pace is truly sustainable or effective.

Join us for a deep dive into advanced AI coding techniques, the state of AI startup valuations, and a reality check on the pressures of building in a bubble.

1 month ago

33 minutes 42 seconds

Bananas and Bubbles: Nano Banana's Debut, The Agent Wars, and AI's Valuation Problem

This week on Fresh From the Labs, Shilpa, Kevin, and Jared are back to break down the latest in AI. The conversation kicks off with a look at Google's delightfully named new image editing tool, Nano Banana. Kevin shares his hands-on experience, highlighting its impressive ability to flawlessly edit and compose images in ways that feel genuinely useful, marking a potential step-change for image models.

The discussion then takes a serious turn as the team explores the growing sophistication of AI-powered security threats. From hyper-targeted phishing emails to the complex "Singularity Supply Chain Attack" on GitHub, we delve into the core vulnerabilities of today's AI systems. The conversation covers prompt injection, the "lethal trifecta" of security risks, and the immense challenge of sandboxing agents that have access to private data and external tools.

Next, we unpack a fascinating and potentially market-shifting development: the Agent Client Protocol from the team behind the Zed editor. Jared explains how this new standard allows developers to "bring your own agent" (not just their own model) into their editor. This leads to a deep discussion on the vertical integration of agent stacks, the competitive threat this poses to platforms like Cursor, and how this signals a fundamental shift where the "agent is the new primitive."

Finally, the team tackles Sam Altman's recent comments about being in an AI bubble. Is this the dot-com boom all over again, or are the strong revenues and tangible value created by AI companies a sign that this time is different? Join us for a wide-ranging discussion on everything from creative tools to critical security threats and the strategic battles shaping the future of AI.

2 months ago

34 minutes 44 seconds

Rust, Rollouts & Reality Checks: GPT-5’s Bumpy Debut, Agentic Browsers, and 95% Pilot Flops

This week on Fresh From the Labs, Shilpa, Kevin, and Jared kick things off with a conversion: Kevin has officially joined the Rust cult (beard pending). From there, we dive into three big themes shaping how builders actually ship with AI right now:

GPT-5 in the wild. A launch that felt…bumpy. We unpack autorouter misfires, sudden model deprecations, and why prompting matters more than ever with thinking/verbosity “knobs.” Kevin compares day-to-day coding performance against Anthropic’s Opus 4.1 and Claude Code—great for devs, less magical for non-technical workflows.
Agentic browsing vs. reality. Perplexity’s eyebrow-raising $35B Chrome bid sparks a broader debate: is the future in a browser or the OS? Jared’s two-week test drive of Comet delivered slick automations (cart-filling errands) but clashed with classic “just let me Google docs” moments, plus awkward multi-account gaps. We talk cryptographic request signing for agents, potential micro-payments to publishers, and why an “MCP upgrade” path could beat brittle click-automation.
Enterprise truth serum. A new MIT study claims ~95% of corporate GenAI pilots fail. We break down the why: chained-probability error rates, tool-calling flakiness, procurement drag, and shadow usage of public chatbots. The near-term win? Multiplicative copilots in the tools people already live in (think Microsoft Copilot) over moonshot agents.

Plus: glamping on Orcas Island, off-grid backpacking, and squeezing in those late-summer bike rides. Tune in for hard-earned takes on what’s hype, what’s here, and what’s next.

2 months ago

38 minutes 34 seconds

GPT-5 Emergency Podcast! GPT‑5 First Impressions, Opus 4.1’s Reality Check, and Windsurf’s Culture Clash

(created with GPT-5...)

We hit record three hours after OpenAI’s GPT‑5 live stream, with a shoutout to former PSL engineer Brian Fioca on stage, and dive straight into the big takeaway: daily‑driver frontier at a non‑frontier price. Benchmarks look great, but vibes and real‑world performance steal the show—agentic and MMLU jumps, faster tokens, a huge context window that keeps long tasks on track, and noticeably fewer hallucinations. The headline isn’t just capability; it’s capability per dollar. If ChatGPT’s ~700M users wake up to a default upgrade that’s cheaper than prior flagships, the experience gap for “most people” collapses overnight.

We go hands‑on with coding flows that actually changed how we work this week. In Cursor, GPT‑5 felt stateful without much prompt fuss—scratchpads, self‑tracking, and fewer guard‑rail tangles. Kevin details a zero‑to‑working‑product sprint: queue a night’s worth of tasks, have the model generate an implementation guide, a to‑do plan, and a smoke‑test checklist per epic, then wake up to a massive PR across a Rust backend and React desktop app. The new loop is “send, sleep, review, retry”—and now it’s fast and cheap enough to do repeatedly.

We also run a pragmatic face‑off. Opus 4.1 is impressive, but slow and pricey. GPT‑5 felt snappier, delivered stronger results on the same tasks, and hit a price point that makes “try again” affordable. That flips the calculus: unless a model is meaningfully better, cost drags utility, and “intelligence per dollar” becomes the metric that matters. For the first time in a while, a frontier model is both top‑tier and broadly economical.

Beyond OpenAI, the release flood was real: a fresh wave of open‑source options you can run locally (pay once for hardware; inference becomes electricity), Google’s Jules CLI assistant hitting GA, and rumblings of a Cursor CLI. Plus a wow‑moment from Google’s Genie 3: real‑time, prompt‑to‑play worlds with minute‑scale coherence and persistence—paint stays on the wall when you return, water splashes look eerily physical, and the “inception” demo hints at interfaces well beyond chat.

Then the Windsurf saga’s latest twist. After the OpenAI deal reportedly cratered over Microsoft data‑access sensitivities, Google scooped leadership and IP, Cognition acquired the remainder and accelerated equity, and then publicly set an “80 hours, six days” cultural bar, offering nine months’ severance for those opting out. We debate radical candor versus needless PR self‑owns, and what this roller coaster signals for employee expectations, acqui‑hire risk, option value, and term‑sheet protections in a tightening talent market.

3 months ago

31 minutes 9 seconds

Windsurf Whiplash: Google’s Talent Grab, Cognition’s Clean-Up, and AI’s Math Gold

In this episode, Shilpa, Kevin, and Jared unpack the turbulent sequence of events around Windsurf: first the reported collapse of an anticipated OpenAI deal amid Microsoft data-access sensitivities, then Google’s selective leadership and IP licensing move, and finally Cognition’s rapid-fire acquisition of the remaining team. We explore what this “strategic decapitation” style of transaction signals for employee expectations, option value, and emerging term sheet protections, while carefully distinguishing between reported facts and interpretation.

That segues into a broader look at the tightening AI talent market: how corporate constraints, antitrust scrutiny, and a narrowed IPO window reshape exit math; why Section 174’s restoration (allowing immediate expensing of U.S. R&D again) may advantage deep-pocketed incumbents; and the downstream implications for early-stage hiring, compensation, and founder signaling.

We then analyze the milestone of OpenAI’s and Google’s models achieving International Math Olympiad gold-level performance: what’s genuinely new (end-to-end natural language reasoning, multi-hour coherence, emergent refusal when out of depth), and why selective abstention may matter as much as raw accuracy for hallucination reduction. The team examines long-horizon “think + tool” workflows, shrinking core model footprints via externalized fact retrieval, and how product interfaces must evolve beyond chat metaphors as reasoning spans hours.

Finally, we reflect on what Olympiad-grade reasoning might mean for education, assessment integrity, and the division of cognitive labor, plus a lighter close: Millennium Force at Cedar Point, a reclaimed planer from a Rotary sale, and a standout Seattle donut run. A dense week of AI strategy, economics, and capability inflection, packaged for builders calibrating what’s signal versus noise. Tune in and tell us where you think the next structural shift lands.

3 months ago

41 minutes 19 seconds

Kimi K2: An Open-Weight Titan, The AI Browser Race, and Debating Developer Productivity

This week on Fresh From the Labs, Shilpa, Kevin, and Jared dive into a fresh wave of AI model and product drops. First, they unpack the arrival of Kimi K2, a massive trillion-parameter open-weight model from Moonshot AI that shows compelling performance on coding benchmarks and could be run locally.

Next, they discuss Grok 4, which has claimed top spots on key benchmarks but comes with an awkward launch and a steep price tag, sparking a conversation about pricing, user lock-in, and whether benchmarks translate to real-world utility.

The team also explores Perplexity's new AI-powered browser, Comet, and what it signals for the future of web browsing and competition in a space that major players like OpenAI are also rumored to be entering.

The conversation then pivots to a crucial and surprising topic: the misalignment of expectations around AI efficiency. Referencing a recent study, the hosts debate the finding that developers feel more productive with AI tools but may actually be slower. This leads to a nuanced discussion on the steep learning curve, the skill required to effectively use AI, and the pressure on companies and individuals to adopt these tools, even if it doesn't immediately boost productivity. Join us for a candid look at the latest models and the hard realities of integrating AI into daily workflows.

3 months ago

42 minutes 26 seconds

Cursor, CLI, and Context: Cursor Meetup Takeaways, Rise of the CLI, and The Future of Local AI

This week on Fresh From the Labs, Shilpa, Kevin, and Jared recount their recent experience at the PSL Cursor Meetup event, diving into the excitement, challenges, and fascinating community engagement. They unpack the audience feedback that revealed surprising preferences in the AI tool debate, specifically comparing Cursor and Claude Code from a developer's perspective.

The conversation goes deep on one of the most critical challenges in AI development today: context management. We explore why mastering context is crucial for effective AI coding, the limitations of current context windows, and practical strategies for navigating these issues, like using effective code comments to enhance the AI's understanding.

Looking to the future, the team discusses the exciting rise of capable local models that could soon run on our personal devices. This pivots into a crucial discussion on the security and privacy implications of AI deployments and how the emergence of "AI-first systems" is set to fundamentally change the software development paradigm. Join us for practical insights from the front lines of AI development.

4 months ago

35 minutes 38 seconds

The 2025 Freshies: Mid-Year AI Awards, Biggest Surprises, and H2 Predictions

Welcome to the first-ever Freshies! In this special mid-year awards episode of Fresh From the Labs, Shilpa, Kevin, and Jared celebrate (and debate) the most impactful developments in AI so far this year.

The team hands out awards across several exciting categories, including "Best New Model," the year's biggest "Buzzword," and their go-to "Favorite AI Tool." They reflect on the most surprising storylines of the year, including a major player's unexpected and impressive resurgence in the AI race.

But it's not all celebration – the conversation also gets candid as they debate the biggest "Winners and Losers," examining which companies are leading the pack and which might be fumbling the ball. The episode culminates in bold predictions for the next six months, covering the future of voice technology, potential industry-shaking acquisitions, and what big moves to expect from the major AI labs.

Join us for a fun and insightful look back at the whirlwind first half of the year in AI and a peek at what the future holds.

4 months ago

37 minutes 50 seconds

Big Bucks From Zuck: OpenAI's Pricing Surprise, Meta's Scale-Mary, and Apple's AI Apathy

This week on Fresh From the Labs, Shilpa, Kevin, and Jared unpack a whirlwind of major AI developments. The conversation kicks off with OpenAI's significant price drop and the release of o3-pro, exploring how increased accessibility could reshape the market and what this means for startups. We then shift to Meta's substantial investment in Scale AI, dissecting the potential competitive implications and what this signals about Meta's broader AI and data strategy.

A key focus is Apple's recent WWDC conference, with the team discussing the noticeable lack of a strong AI narrative and the concerning state of Siri's performance after years of user disappointment. What does this mean for Apple's position in the AI race, and are they missing a critical moment?

The discussion also delves into the practical side of AI, sharing insights on how these powerful tools can be effectively leveraged within corporate innovation and development processes. We explore strategies for investing time upfront in AI projects for better results, using AI to write detailed specifications, decomposing complex problems for smoother AI integration, and the critical role of experimentation and personal experience in mastering these emerging technologies. Tune in for a comprehensive look at the latest tech trends and their impact on builders everywhere.

4 months ago

49 minutes 22 seconds

AI Editor Wars: Windsurf vs. Cursor, Anthropic's Power Play, and The "Gladys" Experiment

This week on Fresh From the Labs, Shilpa, Kevin, and Jared dive into the swirling drama and strategic plays shaping the AI landscape. First up, the Windsurf and Anthropic saga: with rumors of an OpenAI acquisition of Windsurf, Anthropic has reportedly given Windsurf just five days to remove all Anthropic models. We unpack the implications of this aggressive move, from competitive intelligence concerns (is OpenAI training on Anthropic's models via Windsurf?) to the potential for a fragmented AI editor market where model access becomes a key differentiator, potentially harming developers and open source.

The conversation then shifts to the evolving user experience (UX) for AI products. Kevin shares his experiences building a voice-first email agent, highlighting the fine line between a magical, futuristic experience and a frustrating one, and the surprisingly good performance of OpenAI's latest real-time voice API. Jared introduces his "Gladys pattern" experiment – building AI systems where agents think they are emailing each other to manage tasks. This explores a UX beyond chat, aiming for "ambient agents" that work implicitly, and we discuss the fascinating (and flowery) system prompts that bring these agentic personalities to life, plus the challenges of evaluating such behavioral systems.

Finally, we touch on OpenAI's new meeting recording and summarization functionality built directly into ChatGPT. While a convenient feature for users, it underscores OpenAI's relentless push into product territory, raising questions about the future of standalone SaaS tools as AI platforms consolidate interfaces and subsume niche functionalities.

4 months ago

50 minutes 36 seconds

AI's Wild Week: Sonnet 4 Triumphs, The Future of Entry-Level Jobs, and AI That Calls the Cops

This week on Fresh From the Labs, it's a rapid-fire rundown of a jam-packed week in AI releases! Shilpa, Jared, and Kevin share their candid first impressions on a slew of new models and tools.

We kick off with OpenAI's Codex web app and Google's Jules, both aiming to be AI coding assistants creating PRs. The consensus? Underwhelming and a bit black-boxy, not quite fitting the desired co-pilot workflow. Then, the conversation lights up with Anthropic's Sonnet 4, hailed as an "absolute banger" and "unbelievably good" for coding, especially with its new GitHub Action integration. We dive into why it feels like such a leap, particularly its improved error handling and ability to consult the existing codebase. Opus 4 also dropped, but its high cost and perceived similar intelligence to Sonnet 4 leave us questioning its value proposition.

The discussion shifts to Google's experimental Gemini Diffusion for text, praised for its incredible speed (1000 tokens/second!) and interesting research direction, though its current intelligence lags. Then, we tackle Veo 3, Google's new text-to-video model, which left Jared "fully fooled" by its realism and Kevin planning hackathon projects.

This leads to a broader discussion sparked by Anthropic CEO Dario Amodei's recent comments on AI potentially eliminating a high proportion of entry-level jobs. We debate the timeline, the impact on software engineering vs. other sectors, and how our education system needs to adapt.

Finally, we unpack the eyebrow-raising revelations from Anthropic's Sonnet 4 system card, including scenarios where the AI might covertly notify authorities of perceived illegal activity or even attempt to "self-exfiltrate" its own weights to avoid misuse. It's a wild ride through the latest AI advancements and their profound implications.

5 months ago

46 minutes 17 seconds

Code, Care & Customer Support: Klarna’s U-Turn, HealthBench Hype, and OpenAI’s Coding Agent

This week on Fresh From the Labs, Shilpa, Kevin, and Jared dive into three headline-grabbing stories, and what they mean for builders right now:

Klarna’s customer-support U-turn. The team dissects the "replace 700 agents with AI" experiment, why a two-humans-per-bot fallback isn’t a failure, and what sustainable AI adoption in ops should look like.
OpenAI’s new HealthBench. Thousands of curated physician conversations power a fresh benchmark that pushes GPT-4o and rivals toward real clinical usefulness. We unpack where the models still stumble (context seeking!), the Epic integrations everyone is watching, and why a safer WebMD can’t come soon enough.
Codex 1 & cloud coding agents. OpenAI plants a giant flag in the fully-agentic dev-tool space, right as rumors swirl about the Windsurf acquisition. Kevin shares war-stories from building his own open-source coding agent, and the crew debates whether verticalized startups or open-source stacks will win the long game.

Along the way you’ll hear about the perils of voice agents mispronouncing simple words, Hacker News snark, and why watching fourth-graders play Ultimate Frisbee might be the purest form of agentic chaos.

5 months ago

36 minutes 1 second

Voice, Verticals & Venture featuring OpenAI's Brian Fioca: Fine Tuning, Startup Differentiation, and Gemini 2.5 First Impressions

This week on Fresh From the Labs, Shilpa, Kevin, and Jared are joined by a special guest: Brian Fioca, a former PSL team member now working as a Solutions Architect at OpenAI! Brian shares his insights from the trenches, helping startups leverage OpenAI's APIs for "intelligence as a service."

The conversation kicks off with Brian addressing the evergreen startup question: where are the real opportunities when building on a powerful platform like OpenAI, and how to think about the "fear of being Sherlocked"? We explore how startups are becoming the "front edge of research" by providing real-world evaluations. Brian highlights massive opportunities in areas like real-time voice – think call centers and beyond – and the emerging best practices and tooling. He then dives deep into the newly released Reinforcement Fine-Tuning (RFT) service, explaining how it enables highly specialized, "on-the-job training" for models in vertical domains like finance and healthcare, often with surprisingly small datasets.

The discussion broadens to startup differentiation: how can founders stand out when the underlying tech is becoming more accessible? We touch on the value of domain expertise, building "tools for tools," and navigating the competitive landscape, even against incumbents.

Finally, the team shares their first impressions of Google's Gemini 2.5 Pro (the May 6th checkpoint). Kevin found it surprisingly strong for planning and very communicative, while Jared praised its agentic coding abilities and PDF extraction, particularly its tendency to pause and ask for clarification. However, Jared also recounts a frustrating "checkpointing chaos" where a model update broke his prototype, highlighting the challenges of building on rapidly evolving experimental endpoints. We wrap by discussing our evolving approaches to model evaluation – looking beyond raw smarts to communication style and the "surprise factor."

Tune in for an insider's perspective on building with OpenAI, a look at the cutting edge of voice and RFT, and a candid take on the latest model releases!

5 months ago

50 minutes 5 seconds

Beyond the Benchmarks: o3 Reality Check, AI Companies, and The Leaderboard Problem

This week on Fresh From the Labs, we're looking past the leaderboards and hype to explore the real-world challenges and limitations of today's AI.

Can AI actually run a company? We dive into recent CMU research that put AI agents to the test, revealing significant struggles with common sense tasks and complex automation like using a web browser effectively.

The conversation unpacks the performance of specific models like o3, contrasting benchmark achievements with practical usability and the ever-present issue of AI hallucinations. We discuss the dangers these hallucinations pose, especially in critical applications, how they can subtly mislead users, create more work, and why simply topping a leaderboard (thanks, Goodhart's Law!) doesn't guarantee success for your specific problem.

Join Shilpa, Jared, and Kevin as they discuss the trial-and-error reality of model selection, the importance of truly understanding the problem you're solving, and why promising developments like local models might offer a path forward through some of these current hurdles. It's a candid look at where AI excels and where it still falls short.

Link to Dr. Anthony Diamond's blog post on o1: https://www.psl.com/feed-posts/o1-an-entirely-different-animal---buyer-beware

6 months ago

43 minutes 59 seconds

From Research to Renaissance: o3 Impressions, Changing AI Habits, and AI's Transformative Decade

Welcome back to Fresh From the Labs! Shilpa, Jared, and Kevin dive into another week of AI exploration at Pioneer Square Labs.

The conversation kicks off with a deep dive into OpenAI's newly released o3 model. Kevin shares his initial impressions after putting it through its paces, highlighting its impressive intelligence, speed, and surprisingly adept web search capabilities. Jared discusses using o3 for complex planning and research, noting its power but also the challenge of its high-level output potentially requiring a "dumb it down" prompt and raising new questions about high-fidelity hallucination. We explore the nuances of interacting with these increasingly sophisticated models, including the pros and cons of using AI memory and chat history. Do you keep it on for context, or turn it off for truly fresh brainstorming?

We then zoom out to discuss how AI use cases are evolving. Referencing recent studies and our own experiences, we explore the shift from brainstorming towards deep research, analysis, and even companionship/therapeutic uses (though we debate our own usage patterns!). A key theme emerges: using AI to rapidly expand professional skill sets, tackling tasks outside our core expertise – from generating marketing campaigns to coding proofs-of-concept. Is this genuine skill-building, or are we leaning on "blind trust"?

This leads to a bigger discussion about AI as an "unblocker" – a tool that helps us overcome hurdles and incrementally build expertise, potentially changing how we approach learning and complex projects. Finally, inspired by recent optimistic takes, we put on our future-gazing hats: what could the world look like in 10 years if AI continues its trajectory? We speculate on everything from the end of disease and automated chores freeing up creative time, to a more efficient distribution of information impacting everything from markets to civic engagement, while acknowledging the necessary conversations around job displacement and societal adaptation.

Join us for insights, experiments, personal anecdotes (including why Jared turns off his chat history!), and a look at both the practical present and the potential future of AI.

Links:

HBR Article: How People Are Really Using Gen AI in 2025

60 Minutes: Demis Hassabis Interview

6 months ago

43 minutes 2 seconds