
Key Points
Why “computer use” agents matter for real form-filling and CRUD tasks
Live look at Gemini 2.5 Computer Use, Browserbase/Playwright, and WebVoyager-style tasks
Operator vs Claude/Gemini computer use: accuracy, speed, and safety guardrails (CAPTCHAs, consent, impersonation)
Where computer use fits vs MCP tools, local OS access, and classic scraping
Veo 3.1 API update: reference images and start/end frames for consistent video
Claude Code Plugins & community marketplaces (sub-agents, tools, slash commands)
GitHub “Spec Kit” and spec-driven workflows for coding at scale
Cerebras inference speed vs quality tradeoffs; why speed sometimes beats depth
Local rigs and training: DGX Spark use cases and limits
Karpathy’s NanoChat: small-scale train-your-own chat model and cost envelope
“Agent Universe” demo: NAICS-led industry mapping → value flows → agent blueprints
Architecture questions: vertical vs horizontal agents, data layer, tool connectors (HubSpot, Procore, Google Workspace)
A focused walkthrough of today’s agentic stack in practice. The episode tests Gemini 2.5 “computer use” for real browser tasks, compares it with Operator and Claude, and breaks down safety guardrails and why screenshot-loop agents remain slow. It covers where computer use fits alongside MCP and OS tools, then shifts to Veo 3.1’s new API features for reference-guided video. On the coding side, it explores Claude Code Plugins and community marketplaces, plus GitHub’s Spec Kit for spec-driven development on large codebases. The discussion touches on Cerebras for ultra-fast inference, DGX Spark for local experiments, and Karpathy’s NanoChat for training compact chat models. It closes with the “Agent Universe” demo: mapping industries via NAICS, generating value-flow diagrams, and turning stages into deployable agent roles, with open questions on architecture, tools, and handoff into real systems.
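The “Agent Universe” flow described above (industry code → value chain → agent roles) could be represented minimally like this. The NAICS code is real, but the stage names, blueprint fields, and `blueprint_for` helper are illustrative assumptions, not the demo’s actual schema.

```python
# Minimal sketch of a NAICS → value-flow → agent-blueprint mapping.
# NAICS 236220 is a real code (Commercial and Institutional Building
# Construction); the stages and fields below are illustrative only.
industry = {
    "naics": "236220",
    "name": "Commercial and Institutional Building Construction",
    "value_flow": ["bid", "preconstruction", "build", "closeout"],
}

def blueprint_for(stage: str) -> dict:
    """Turn a value-flow stage into a deployable agent role (hypothetical fields)."""
    return {
        "role": f"{stage}-agent",
        "inputs": [f"{stage} documents"],
        "tools": ["Procore", "Google Workspace"],  # connectors named in the episode
        "handoff": "human review",                 # open question from the episode
    }

blueprints = [blueprint_for(s) for s in industry["value_flow"]]
print([b["role"] for b in blueprints])
# → ['bid-agent', 'preconstruction-agent', 'build-agent', 'closeout-agent']
```

The open architecture questions (vertical vs horizontal agents, shared data layer) would decide whether each stage gets its own agent like this or a single agent carries state across the whole flow.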