
The main focus is OpenAI's Agent Kit, dubbed a potential "N8N killer." Agent Kit includes Agent Builder, a drag-and-drop interface for creating agentic workflows, inspired by N8N but with enterprise features like guardrails (e.g., hallucination detection via vector stores, PII moderation, jailbreak prevention). It supports branching, human-in-the-loop approvals, and widgets for custom HTML/CSS templating (e.g., styling travel itineraries). Chat Kit embeds these workflows into apps or websites with custom branding, though it is locked to OpenAI models. Users can generate SDK code for customization, enabling porting to other frameworks like LangChain, and Evaluations allow A/B testing prompts and tracking metrics. Limitations include the lack of a Python node for complex transforms (users are stuck with a limited CEL-style expression language) and immaturity relative to N8N's openness (e.g., no air-gapping, no model agnosticism). The hosts see it as a no-code tool for non-engineers that boosts OpenAI model consumption, while vertically integrated tools like Claude Code excel thanks to tailored agents and workflows.
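To make the guardrail idea concrete, here is a minimal sketch of the kind of input-checking layer Agent Builder wires in front of a workflow. The function name, PII patterns, and jailbreak markers are all hypothetical and illustrative, not OpenAI's actual implementation; real hallucination detection would additionally compare outputs against a vector store of trusted documents.

```python
import re

# Illustrative guardrail layer: PII and jailbreak checks run before the
# workflow, and the result drives branching (e.g., route flagged input
# to a human-in-the-loop approval step). All rules here are toy examples.

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}
JAILBREAK_MARKERS = ("ignore previous instructions", "disregard your system prompt")

def run_guardrails(user_input: str) -> dict:
    """Report which guardrails tripped so the workflow can branch."""
    lowered = user_input.lower()
    findings = {
        "pii": [name for name, pat in PII_PATTERNS.items() if pat.search(user_input)],
        "jailbreak": any(marker in lowered for marker in JAILBREAK_MARKERS),
    }
    findings["allow"] = not findings["pii"] and not findings["jailbreak"]
    return findings
```

In a drag-and-drop builder this check would be a node whose boolean output feeds a branch; in generated SDK code it becomes an ordinary function call like the one above.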
Broader discussion critiques LLM commoditization: models like Grok may seem smarter, but tools like Cursor or Claude Code integrate better (e.g., file editing, diffs, semantic search, Git). Vertical integration is key—Anthropic's Claude Agent SDK (renamed from the Claude Code SDK) powers diverse agents beyond coding (e.g., research, video). The hosts argue the IP lies in agent suites (tools, prompts, evals) rather than in base models. They note competitors: Google's Jules, Grok's rumored Code Flow, Meta's DevMate, Anthropic's Claude, and Amazon's Kiro. AI also enhances non-coding tasks like document editing with "filters" for cross-cutting changes, outpacing tools like Google Docs or Word's Copilot. Google's struggles highlight big tech's challenges during paradigm shifts.
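The "IP lives in the agent suite" point can be sketched in code: the durable asset is the scaffolding of tools, prompts, and evals around a swappable model. The `Tool` and `AgentSuite` names below are hypothetical, a minimal illustration rather than any vendor's SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Sketch of an "agent suite": the value is in the tools, system prompt,
# and evals wired around the model, not the base model itself.

@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[..., str]

@dataclass
class AgentSuite:
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, **kwargs) -> str:
        # In a real agent loop the model picks the tool from the
        # descriptions; here we dispatch directly to show the structure.
        return self.tools[name].fn(**kwargs)

suite = AgentSuite(system_prompt="You are a careful coding agent.")
suite.register(Tool("read_file", "Read a file from the workspace",
                    fn=lambda path: f"<contents of {path}>"))
```

Swapping the underlying model changes one integration point; the tools, prompts, and evals (the hard-won part) carry over, which is the hosts' commoditization argument in miniature.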
In "Newz or Noize," they cover AMD's rise: OpenAI's investment (up to a 10% stake, 6GW of compute) and Oracle deploying 50,000 AMD chips—creating a money loop (OpenAI-AMD-Oracle). Broadcom partners with OpenAI on custom AI chips (shares up 10%). The hosts discuss supply chain vulnerabilities: rare earth minerals (China's export restrictions spiking mining stocks) and potential U.S. sourcing deals abroad. They emphasize the advantages of vertical integration (e.g., Google's TPUs). California's new law mandates that AI chatbots disclose they are non-human to prevent harm (e.g., suicides linked to chatbot relationships), but critics fear overreach (e.g., disclaimers on any AI-derived content). A Senate Democrat report proposes a "robot tax" on firms automating jobs (potentially 100M lost in the U.S. over 10 years, e.g., fast food, trucking, accounting) to offset displacement; Republicans warn it would advantage China and Russia. The hosts debate: AI creates jobs via productivity gains (historical parallels like agriculture), though disruption needs safety nets, and no net job loss has been proven yet.
The "KiLLM Chain" segment explores LLM side-channel attacks: exploiting indirect paths (e.g., caching, memory) without direct breaches. Examples include prompting a model to leak hospital records or code snippets (e.g., past Claude Code vulnerabilities). Attacks rely on clever prompts, timing, weak validation, and over-reliance on context. Mitigations: proper guardrails, segmentation (e.g., dedicated LLMs, air-gapping like GovCloud), and avoiding cross-user caching or memory. Even cloud LLMs (Bedrock, OpenAI) need proxies; businesses add their own layers but must secure the boundaries to prevent lateral data leaks.
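The "avoid cross-user caching" mitigation comes down to key scoping: if cache entries are keyed by prompt alone, one tenant can probe another's history via hits and timing. A minimal sketch, with hypothetical names, of a tenant-scoped cache:

```python
import hashlib

# Sketch of tenant-scoped caching: every key binds the tenant ID to the
# prompt, so one user's cached completion can never be served to (or
# timing-probed by) another user. Names here are illustrative.

class ScopedCache:
    def __init__(self) -> None:
        self._store: dict = {}

    def _key(self, tenant_id: str, prompt: str) -> str:
        # Hash tenant and prompt together; a prompt-only key would make
        # cache hits a cross-user side channel.
        return hashlib.sha256(f"{tenant_id}\x00{prompt}".encode()).hexdigest()

    def get(self, tenant_id: str, prompt: str):
        return self._store.get(self._key(tenant_id, prompt))

    def put(self, tenant_id: str, prompt: str, response: str) -> None:
        self._store[self._key(tenant_id, prompt)] = response
```

The same scoping principle applies to conversation memory and vector-store retrieval: segmentation means the lookup key, not just an access-control check, encodes the boundary.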
The episode wraps by urging deeper dives into Agent Kit and the Claude Agent SDK, teasing future coverage of the AI supply chain.