
Key Points
Why “computer use” agents matter for real form-filling and CRUD tasks
Live look at Gemini 2.5 Computer Use, Browserbase/Playwright, and WebVoyager-style tasks
Operator vs Claude/Gemini computer use: accuracy, speed, and safety guardrails (CAPTCHAs, consent, impersonation)
Where computer use fits vs MCP tools, local OS access, and classic scraping
Veo 3.1 API update: reference images and start/end frames for consistent video
Claude Code Plugins & community marketplaces (sub-agents, tools, slash commands)
GitHub “Spec Kit” and spec-driven workflows for coding at scale
Cerebras inference speed vs quality tradeoffs; why speed sometimes beats depth
Local rigs and training: DGX Spark use cases and limits
Karpathy’s NanoChat: small-scale train-your-own chat model and cost envelope
“Agent Universe” demo: NAICS-led industry mapping → value flows → agent blueprints
Architecture questions: vertical vs horizontal agents, data layer, tool connectors (HubSpot, Procore, Google Workspace)
A focused walkthrough of today’s agentic stack in practice. The episode tests Gemini 2.5 “computer use” for real browser tasks, compares it with Operator and Claude, and breaks down safety guardrails and why screenshot-loop agents remain slow. It covers where computer use fits alongside MCP and OS tools, then shifts to Veo 3.1’s new API features for reference-guided video. On the coding side, it explores Claude Code Plugins and community marketplaces, plus GitHub’s Spec Kit for spec-driven development on large codebases. The discussion touches on Cerebras for ultra-fast inference, DGX Spark for local experiments, and Karpathy’s NanoChat for training compact chat models. It closes with the “Agent Universe” demo: mapping industries via NAICS, generating value-flow diagrams, and turning stages into deployable agent roles, with open questions on architecture, tools, and handoff into real systems.
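The “Agent Universe” flow described above (industry code → value chain → agent roles) could be represented minimally like this. The NAICS code is real, but the stage names, blueprint fields, and `blueprint_for` helper are illustrative assumptions, not the demo’s actual schema.

```python
# Minimal sketch of a NAICS → value-flow → agent-blueprint mapping.
# NAICS 236220 is a real code (Commercial and Institutional Building
# Construction); the stages and fields below are illustrative only.
industry = {
    "naics": "236220",
    "name": "Commercial and Institutional Building Construction",
    "value_flow": ["bid", "preconstruction", "build", "closeout"],
}

def blueprint_for(stage: str) -> dict:
    """Turn a value-flow stage into a deployable agent role (hypothetical fields)."""
    return {
        "role": f"{stage}-agent",
        "inputs": [f"{stage} documents"],
        "tools": ["Procore", "Google Workspace"],  # connectors named in the episode
        "handoff": "human review",                 # open question from the episode
    }

blueprints = [blueprint_for(s) for s in industry["value_flow"]]
print([b["role"] for b in blueprints])
# → ['bid-agent', 'preconstruction-agent', 'build-agent', 'closeout-agent']
```

The open architecture questions (vertical vs horizontal agents, shared data layer) would decide whether each stage gets its own agent like this or a single agent carries state across the whole flow.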