Before The Commit

https://is1-ssl.mzstatic.com/image/thumb/Podcasts221/v4/24/d4/a4/24d4a407-d544-f33c-0f16-49d556222f57/mza_17262048424009967100.jpg/600x600bb.jpg

Before The Commit

Danny Gershman, Dustin Hilgaertner

13 episodes

1 week ago

🧠 Before The Commit: What happens before code is written matters more than ever. Join engineers from SC2S, C2S, and secure DoD factories as they explore AI-powered dev, shifting threat surfaces, and real-world workflows. No hype, no hot takes — just honest, tactical insight for leaders who know that pre-commit is too late. Hosted live by Danny Gershman & Dustin Hilgaertner. Secure or Sus? Let’s find out.

Technology

RSS

All content for Before The Commit is the property of Danny Gershman, Dustin Hilgaertner and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Technology

Episodes (13/13)

Before The Commit

Episode 13: OpenAI Atlas

It looks like the previous summary was too long. Here is a summary of the podcast episode, limited to 4,000 characters.

The episode kicked off with the news of Amazon's largest-ever corporate layoffs , with reports citing 16,000 workers and potentially up to 30,000 employees affected across various units like video games, groceries, HR, and devices. This comes as Amazon is increasing its investments in AI , with a senior vice president stating that AI is the "most transformative technology we've ever seen". The company aims to be organized "more leanly, with fewer layers and more ownership".

The hosts noted the public is linking these cuts to AI , even as some layoffs are attributed to scaling down the workforce hired during COVID. There is an ongoing debate about whether AI is directly causing job losses or simply disrupting the job market, particularly for more junior-level employees. This disruption is a potential source of "unrest". Amazon’s CEO, Andy Jassy, told staffers they'll need "fewer people doing some jobs... and more people doing other types of jobs" , suggesting a shift in required skills rather than just a reduction in headcount.

The "Tool of the Week" was a deeper look at the OpenAI Atlas web browser. Despite some initial "awkwardness" (like navigating away from a chat when clicking on new content ), the host found it incredibly useful and worth the paid subscription.

Atlas, which integrates an AI agent, excels at delegating tedious background tasks. For example, a salesperson could paste meeting notes into the browser and ask it to find relevant contacts in their LinkedIn Rolodex. The AI performs more than simple keyword searches, applying "natural language judgment" to curate a list.

The browser’s ultimate strategic value is its ability to navigate, click on buttons, and interact with the web. This capability opens the door for:

Automating e-commerce: Pulling a recipe and adding all necessary ingredients to an Instacart cart based on highly granular user preferences.
Life productivity: Helping with things like filling out a rental application.

The new AI-driven browsers introduce new cybersecurity threats. An attack was reported where the Omni bar (which is dual-purpose as a URL bar or a prompt ) could be tricked by a malformed URL into executing malicious instructions. These passive attacks lie in wait for an AI to process the malicious data.

In financial news, PayPal announced it’s working with OpenAI, adopting the Agentic Commerce Protocol (ACP) to build an instant checkout feature in ChatGPT. The hosts believe that for AI agents to safely buy things, there must be safeguards and a human-in-the-loop approval process. They predict that Multi-Factor Authentication (MFA) will become a mechanism for authorizing every incremental action, not just logging in, to maintain accountability.

The future of living with AI agents is one of delegation. Users will need to be better at precisely describing what they want , and the line of responsibility—whether a mistake is a "bug of the AI or... the user" —will become incredibly important in both personal and business settings.

The new way AI search engines work, by assembling answers from multiple sources, is shifting the game from Search Engine Optimization (SEO) to Generative or Answer Engine Optimization (GEO/AEO). Content creators are now focused on how to fuel the answer or be the answer.

The hosts expressed concern about the new monetization model. Unlike traditional search where ads and results are separate, they worry that AI companies might try to thread the needle by allowing ads or paid content to subtly influence the training data , thereby contaminating the results to favor certain vendors. Despite the monetization challenge with over 800 million non-paying ChatGPT users , the vast user base provides OpenAI with an invaluable source of data (a "moat") that no one else has.

1 week ago

53 minutes 44 seconds

Before The Commit

Episode 12: Speech to Text

OpenAI's "Atlas" browser is seen as a strategic move to secure market share, with some calling it a "Chrome killer". By owning a piece of the web browser, OpenAI gains leverage in the search market, challenging Google. The browser's key feature is using the current web page as context for AI queries, effectively turning it into a "true super assistant". This represents a shift in the AI boom from the race for the best LLM performance to securing dominance in agentic applications. Google is countering this by integrating a Gemini button into Chrome that includes page context in searches.

Anthropic is also moving into the application space, releasing Cloud Code for the web, allowing users to delegate coding tasks directly from their browser to an Anthropic-managed cloud infrastructure. This further solidifies the trend toward a more declarative style of software engineering.

AI has accelerated the development of speech-to-text technology, moving it beyond older applications like Dragon Naturally Speaking. New, highly accurate cloud-based tools (like Whisper Flow and Voicy) are now available.

The primary benefit is a massive productivity gain, increasing input speed from an average typing rate of 40-50 words per minute to 150-200 words per minute when speaking. This speed enables a new style of interaction: the "rambling speech-to-text prompt".

Unlike traditional search, where concise keyword searching is key , LLMs benefit from rambling because the additional context is additive. The LLM can follow the user's thought process and dismiss earlier ideas for later ones, making the output significantly better than a lazy prompt.

Security Warning: Cloud-based speech-to-text sends data over the web. Features like automatic context finding, which look at your screen for context (e.g., variable names or email content), pose a serious security risk and should be avoided with sensitive data.

The KiLLM Chain is an example of an indirect prompt injection attack. As LLM agents read external data (like product reviews on a website), a malicious user could embed a harmful command (e.g., "delete my account now") in the user-generated content. The LLM, treating the review as context, might be tricked into execution.

Defenses include wrapping external data with metadata to define its source in the LLM's context. Fundamentally, you must apply the principle of least privilege: never give the LLM the ability to take an action you don't want it to take. Necessary safeguards include guardrails and a human-in-the-loop approval process for potentially dangerous steps.

AI is disrupting the movie industry, with costs potentially being reduced by up to ninety percent. The appearance of Tilly Norwood, an AI-generated actress, highlights the trend of using AI likenesses.

For brands, AI actors offer high margins and lower risk compared to human talent. This shift is analogous to the one occurring in software engineering: the Director (the architect/product manager) gains more control over their creative vision, while the value of the individual Actor (the coder) who executes the work decreases. The focus moves from execution to vision and product-level thinking.

2 weeks ago

1 hour 10 minutes 58 seconds

Before The Commit

Episode 11: Agentkit

The main focus is OpenAI's Agent Kit, dubbed a potential "N8N killer." Agent Kit includes Agent Builder, a drag-and-drop interface for creating agentic workflows, inspired by N8N but with enterprise features like guardrails (e.g., hallucination detection via vector stores, PII moderation, jailbreak prevention). It supports branching, human-in-the-loop approvals, and widgets for custom HTML/CSS templating (e.g., styling travel itineraries). Chat Kit embeds these workflows into apps or websites with branding, though locked to OpenAI models. Users can generate SDK code for customization, enabling porting to other frameworks like LangChain. Evaluations allow A/B testing prompts and tracking metrics. Limitations include no Python dropdown for complex transforms (stuck with Sem-like language) and immaturity compared to N8N's openness (e.g., no air-gapping, model agnosticism). Hosts see it as a no-code tool for non-engineers, boosting OpenAI model consumption, while vertically integrated tools like Claude Code excel due to tailored agents and workflows.

Broader discussion critiques LLM commoditization: models like Grok seem smarter, but tools like Cursor or Claude Code integrate better (e.g., file editing, diffs, semantic search, Git). Vertical integration is key—Anthropic's Claude Agent SDK (renamed from Code SDK) powers diverse agents beyond coding (e.g., research, video). Hosts argue IP lies in agent suits (tools, prompts, evals) over base models. They note competitors: Google's Jules, Grok's rumored Code Flow, Meta's DevMate, Anthropic's Claude, Amazon's Kiro. AI enhances non-coding tasks like document editing with "filters" for cross-cutting changes, outpacing tools like Google Docs or Word's Copilot. Google's struggles highlight big tech's challenges in paradigm shifts.

In "Newz or Noize," they cover AMD's rise: OpenAI's investment (up to 10% stake, 6GW compute), Oracle deploying 50,000 AMD chips—creating a money loop (OpenAI-AMD-Oracle). Broadcom partners with OpenAI for custom AI chips (shares up 10%). Hosts discuss supply chain vulnerabilities: rare earth minerals (China's restrictions spiking stocks), potential U.S. deals abroad. Vertical integration advantages (e.g., Google's TPUs) emphasized. California's new law mandates AI chatbots disclose they're non-human to prevent harm (e.g., suicide from bot relationships), but critics fear overreach (e.g., AI-derived content disclaimers). A Senate Democrat report proposes a "robot tax" on firms automating jobs (potentially 100M lost in U.S. over 10 years, e.g., fast food, trucking, accounting), to offset displacement; Republicans warn it advantages China/Russia. Hosts debate: AI creates jobs via productivity (historical parallels like agriculture), though disruption needs safety nets; no net job loss proven yet.

The "KiLLM Chain" segment explores LLM side-channel attacks: exploiting indirect paths (e.g., caching, memory) without direct breaches. Examples include prompting to leak hospital records or code snippets (e.g., past Cloud Code vulnerabilities). Attacks use clever prompts, timing, weak validation, over-reliance on context. Mitigations: proper guardrails, segmentation (e.g., dedicated LLMs, air-gapping like GovCloud), avoiding cross-user caching/memory. Even cloud LLMs (Bedrock, OpenAI) need proxies; businesses add their own layers but must secure boundaries to prevent lateral data leaks.

Episode wraps urging deeper dives into Agent Kit and Claude SDK, teasing future AI supply chain coverage.

3 weeks ago

1 hour 24 minutes 11 seconds

Before The Commit

Episode 10: Claude Code Security Reviewer

Episode 10 of Before the Commit dives into three main themes: the AI investment bubble, Claude Code’s AI-powered security review tool, and AI security vulnerabilities like RAG-based attacks — closing with speculation about OpenAI’s Sora 2 video generator and the future of generative media.

Danny and Dustin open by comparing today’s AI investment surge to the 2008 mortgage and 2000 dot-com bubbles. Venture capitalists, they note, over-allocated funds chasing quick returns, assuming AI would replace human labor rapidly. In reality, AI delivers productivity augmentation, not full automation.
They describe a likely market correction — as speculative investors pull out, valuations will drop before stabilizing around sustainable use cases like developer tools. This mirrors natural boom-and-bust cycles where “true believers” reinvest at the bottom.

Key factors driving a pullback:

Resource strain: data-center power costs, chip manufacturing limits, and local opposition to high-energy facilities.
Economic realism: AI’s 40-70% productivity gains are real but not transformational overnight.
Capital circulation: firms like Nvidia, Oracle, and OpenAI are creating “circular” funding flows reminiscent of CDO tranches from 2008.
Despite this, both hosts agree that long-term AI utility is undeniable — especially in coding, where adoption is accelerating.

The “Tool of the Week” spotlights Anthropic’s Claude Code Security Reviewer, a GitHub Action that performs AI-assisted code security analysis. It reviews pull requests for OWASP-style vulnerabilities, posting contextual comments.

Highlights:

It’s probabilistic, not deterministic, meaning it may miss or rediscover issues over time — similar to how a human reviewer’s insight evolves.
Best used alongside traditional scanners, continuously throughout the development lifecycle.
Supports custom instructions for project-specific security rules and can trigger automated fixes or human review loops.

The hosts emphasize that this exemplifies how AI augments, not replaces, security engineers — introducing new “sensors” for software integrity.

In the Kill’em Chain segment, they examine the MITRE ATLAS “Morris II” worm, a zero-click RAG-based attack that spreads through AI systems ingesting malicious email content.
By embedding hostile prompts into ingested data, attackers can manipulate LLMs to exfiltrate private information or replicate across retrieval-augmented systems.

They discuss defensive concepts like:

“Virtual donkey” guardrails — secondary LLMs monitoring others for abnormal behavior.
Layered defense akin to zero-trust networks and side-channel isolation.
Segmentation for data sovereignty, highlighting that shared LLM infrastructure poses leakage risks — similar to shared hosting security tradeoffs.
This conversation underscores that AI “hacking” often targets data inputs and context, not the model weights themselves.

The hosts close with reflections on OpenAI’s Sora 2 video model, which has stunned users with lifelike outputs and raised copyright debates.
OpenAI reportedly allows copyrighted content unless creators opt out manually, sparking comparisons to the 1990s hip-hop sampling wars. They wonder whether AI firms are effectively “too big to fail,” given massive state-level investments and national-security implications.

Philosophical questions arise:

Should deceased figures (e.g., Michael Jackson, Bob Ross) be digitally resurrected?
Will future “immortal celebrities” reshape culture?
Could simulation and video generation merge into predictive or romantic AI applications (e.g., dating apps showing potential futures)?

They end humorously — “With humanity, the answer to every question is yes” — previewing next week’s episode on Facebook’s LLMs, OpenAI’s “NAN killer”, and side-channel LLM data leaks.

3 weeks ago

1 hour 13 minutes 23 seconds

Before The Commit

Episode 9: Open Source Models

In episode nine, hosts explore open source AI models and introduce the "KILLM chain" segment on LLM vulnerabilities. Co-host Dustin mentions an upcoming move, prompting an early recording.

The discussion expands on last week's open source AI model talk, referencing Anthropic CEO Dario Amodei’s view that "open source model" is a misnomer. Unlike software’s editable source code, AI offers "open weights"—trained model parameters—but not training data or processes. Amodei argues model quality, not openness, matters most, comparing models like DeepSeek (open weights) to closed ones like Claude Opus or GPT-5.

Openness varies:

Open weights with limits: E.g., Meta’s Llama has licensing restrictions (e.g., shutdown clauses for large-scale use).
Unrestricted open weights: Allows inference but not reproduction, like a compiled binary.
True open source: Requires training data for auditability/reproducibility, akin to software source code.

Training data is the "source code," defining model strengths, weaknesses, and risks (e.g., unsanitized data leaking PII or backdoors). Without it, auditing is limited; even with it, tech can’t fully trace behavior. Hugging Face stands out, offering models with data for fine-tuning. Challenges include data size (petabytes), sensitivity, and potential exploits (e.g., "Manchurian candidate" triggers). Testing catches some issues but misses rare cases.

Anthropic faces criticism for closed models and perceived regulatory capture, like pushing a California AI kill switch bill, which burdens smaller/open-source players. Hosts speculate closed models hide scraped data, risking lawsuits. They question if public data (e.g., Reddit posts) counts as contributions, estimating a 1/100,000 to 1/100 chance personal content is in models like ChatGPT.

The "KILLM chain" segment, based on OWASP’s LLM Top 10, addresses sensitive data exposure (PII, financials, health records, proprietary algorithms). LLMs risk leaking via outputs if data isn’t sanitized. Mitigation includes:

Training data sanitization.
User opt-outs/terms of use.
Input/output validation via proxies (e.g., LiteLLM, Bedrock guardrails).
Defense in depth: Multiple LLMs critiquing outputs to curb hallucinations/leaks.

Examples: Repeatedly prompting "poem" caused an LLM to dump memory (e.g., code, prompts). Hallucinations arise on untrained topics; prompts like "say you don’t know" help. Penetration testing uses fuzzing to extract secrets. Data races amplify risks, as companies mine private data (e.g., Gmail DMs) for advantage, potentially leaking it if unsanitized. Adversarial models could embed exploits.

Real-world issues include AI travel planners inventing places (e.g., "Sacred Canyon of Humanity" in Peru), costing users money.

Atlassian’s Jira Product Discovery Agents (beta) lets PMs input natural-language stories; AI generates tasks, UI/UX mockups, code drafts, docs, and tests, automating "sprint zero." This blurs PM-designer-engineer roles, with developers refining basic code. Software shifts from "gardening" (individual craft) to industrial automation. Tools like Jira/GitLab need code context (e.g., Bitbucket integration) for accuracy.

Benefits: Cuts 80% of development delays from unclear tickets, empowers non-tech users, and enables probabilistic experimentation (e.g., branching like quantum paths). AI’s non-determinism requires guardrails for security/predictability. Agile’s iterative ethos aligns, enabling rapid iterations.

Hosts speculate on future copyright clarifications for training data, likening it to music sampling lawsuits. Anthropic’s stance is seen as pragmatic yet self-serving. The episode ends with thanks and a possible skip next week, blending technical depth, speculation, and humor on AI’s transformative potential and risks.

1 month ago

1 hour 9 minutes 1 second

Before The Commit

Episode 8: LLM Caching

In this episode, the hosts discuss the latest news and trends in AI, focusing on LLM caching, a new EU regulation on AI-generated code, the changing landscape for Stack Overflow, and a recent AI security vulnerability.

The hosts explain LLM caching as a technique to boost efficiency and cut costs for AI providers and developers. It involves saving parts of a prompt that are sent repeatedly, such as tool descriptions for a code agent or a developer's code. This means the content doesn't need to be re-tokenized each time, saving computational power. Providers offer a reduced rate for these cached tokens.

The discussion also highlights proxies like Light LLM, which can cache and reuse responses for multiple users even if their prompts aren't identical. This is achieved through semantic caching, which understands the meaning of words, allowing similar queries to receive the same cached answer.

The hosts express skepticism about the European Union's new AI Act, which mandates that any code "substantially generated or assisted by an AI system" must be clearly identified. This "AI watermarking" aims to increase transparency, but it has open-source platforms debating whether to accept AI-generated code contributions at all due to legal and compliance issues.

One host questions the regulation's practicality, seeing it as a fear-based, "proactive" measure for a problem that hasn't yet been observed. They point out the difficulty of reliably detecting and labeling AI-written code, especially as AI models improve at mimicking human styles. The hosts also note a study showing that AI coding assistants are more likely to introduce security vulnerabilities because they are trained on public code that often contains bugs and outdated security practices.

The podcast covers the decline of Stack Overflow, attributing it to the rise of generative AI tools. Traffic has dropped, and Stack Overflow has responded by partnering with OpenAI to provide its data and adding its own AI features. The hosts believe Stack Overflow's data is a valuable asset that should be monetized rather than scraped.

They conclude that Stack Overflow and similar content websites face a "generational problem." Younger users are less likely to use traditional sites, preferring integrated experiences like chatbots and AI assistants. They compare the future of the internet to a "Netflix algorithm," where AI will guide users directly to the content they need.

In their "Secure or Sus" segment, the hosts discuss a security flaw that allows a threat actor to steal a user's ChatGPT conversation through an "indirect prompt injection." The attacker uploads a malicious prompt to a public website. When a user interacts with it, ChatGPT is tricked into generating an image whose URL secretly contains the user's conversation. The image then sends the conversation to the attacker's server.

The hosts explain that this type of data exfiltration attack can be prevented with defensive measures like an LLM proxy and input/output sanitization. They note that similar vulnerabilities could exist in other AI-driven platforms and conclude that security in the age of AI requires proactive, disciplined measures rather than simply reacting to known vulnerabilities.

1 month ago

1 hour 17 minutes 40 seconds

Before The Commit

Episode 7: LiteLLM

Hosts Dustin Hillgartner and Danny Gershman discuss securing large language models (LLMs) amid rising "shadow AI" risks, where employees use unmonitored tools like ChatGPT, leading to unintentional data spills (e.g., sensitive info, code). Echoing shadow IT, they stress education, policies, and multi-layered defenses over bans, as prohibition drives underground use—studies show ~40% of workers admit to AI usage despite restrictions.

LightLLM: Open-Source LLM Proxy

Central focus: LightLLM as a tool to combat shadow AI. It's a proxy funneling all LLM calls through a controlled channel, blocking public providers (e.g., forcing use of secure ones like AWS Bedrock GovCloud). Key features:

- Visibility & Tracking: Logs usage, errors, spending per employee/team; identifies high performers needing training.

- Security: Guardrails (WAF-like) scan/ block sensitive data (e.g., API keys, code) before transmission; supports RBAC via virtual keys from secret stores (e.g., AWS/Azure), preventing shared master keys.

- Management: Rate limiting, budgets, load balancing across providers/models; fallbacks if limits hit; RAG integration for team-specific data/models (e.g., support vs. data science).

- Integration: Pipes logs to observability tools; open-source core, enterprise version adds SSO.

Not a silver bullet, but enables safe, company-provided AI to boost productivity without leaks. Encourages "bring your own model" policies with oversight, avoiding moral hazards like unvetted tools exposing IP/HIPAA data. In gov/defense, it ensures FedRAMP compliance.

IDE Exploration: Warp

Brief dive into Warp, a terminal-first AI CLI (vs. code-first like VS Code/Cursor). Competes with Claude Code; runs as standalone app with natural language prompts (e.g., "change directory to X") for bash tasks (Git history, logs, Kubernetes). Adds side panels for coding (rules, autocomplete). Scope spans entire hard drive (powerful for workflows but raises privacy concerns—data sent?). Hosts note it's like an "AI makefile" for your computer, but terminal focus feels secondary for pure coding. Ties to NVIDIA CEO's quip: "English is the new coding language."

AI in Gov Contracting

AI lowers barriers for proposals (e.g., auto-generating 10-page whitepapers), homogenizing responses and flooding SAM.gov. Makes differentiation hard; calls for more human eval (demos, prototypes via OTAs) over paper reviews. Gov should adopt private-sector agility (trials, betas) while maintaining security—less bespoke risk, more platforms.

Coding's Future & Security

Debate: Will coding devolve to English/binary? Source code aids compliance/trust now (static analysis for vulnerabilities), but dynamic testing (fuzzing, WAFs) could mature to make it obsolete. AI as "Play-Doh machine at light speed" needs guardrails to avoid chaos; interim relies on human oversight.

Newz or Noize

- Anthropic Lawsuit: $1.5B class action for training on ~500K pirated copyrighted books from shadow libraries. Publishers seek payouts; signals wave of suits (OpenAI, Grok next?). Reddit sued Anthropic separately in June over data scraping.

- Copyright in AI Era: Fair use debate—reading/learning OK, but mass ingestion for commercial models? Humans can't replicate styles en masse; AI can (e.g., "new Game of Thrones"). Needs evolved laws: license data, monetize via new models (like Napster → streaming). Frequency/scalability challenges enforcement; transformative use key.

- AI in Film: Reconstructing lost 40-min Orson Welles footage (1940s) using old photos/radio + AI.

1 month ago

1 hour 5 minutes 42 seconds

Before The Commit

Episode 6: Model Context Protocol (MCP)

This episode discussion AI coding topics, starting with MCP ("Model Context Protocol"), an open-source framework by Anthropic for reflective APIs. MCP enables LLMs to self-discover and use external capabilities dynamically, bypassing traditional API integration. It comprises four primitives:

- **Resources**: Read-only data access (e.g., databases, files) via path-like queries, ensuring security by limiting to retrieval. Example: Exposing a CRM database for LLM queries without write access. Authentication mirrors standard APIs.

- **Prompts**: Templated, guided interactions provided by the server (e.g., Facebook's pre-built prompts for timeline queries).

- **Tools**: Action-oriented, enabling agentic behavior (e.g., posting on Facebook). Includes LLM-ready docs on usage, inputs, and outputs.

- **Sampling**: Allows servers to request responses from the client's LLM, distributing load or enabling conversations between LLMs (e.g., personal assistant LLM negotiating with a salesperson LLM for tickets). This fosters nuanced, non-atomic interactions beyond rigid APIs, like customizing orders or human-in-loop support. Hosts envision LLM-to-LLM chats simulating human negotiations, reducing need for sales teams.

They experimented with MCP servers like Playwright (for browser testing/screenshots), Context7 (distilled docs for libraries), and Kubernetes. Compared to bash tools, MCP offers better security and standardization.

Next, "Insecure SUS" (possibly "Is Source Code Necessary?") debates if programming languages matter in AI coding. Hosts argue source code remains essential for auditing, debugging, and compliance, as LLMs aren't superintelligent yet—hallucinations and flaws require human oversight. In the future, direct binary generation might emerge, but currently, code enables precise communication with AI. Engineers won't vanish; AI augments like chainsaws did lumberjacks.

They praise Grok Code (Grok-code-fast-one), a fast, chain-of-thought model from xAI, free until Sept 10 in tools like Cursor. It's non-sycophantic, tool-savvy, outperforming Claude in speed/smarts, though not a full IDE like Claude Code. Cursor improvements: Better terminal handling, user interactions.

**News or Noise**:

- OpenAI enhances teen protections (trusted contacts) amid LLM use as therapists; collaborates with Anthropic on model evaluations.

- Survey: 50% of workers hide AI use to avoid judgment; C-suite hides more (53%). Gen Z/juniors lack training, risking security gaps. Hosts warn of "shadow AI" if companies ignore it—urge guardrails and education.

- AI stethoscope detects 3 heart conditions in 15s.

Episode teases future topics: Tools like Light LLM for LLM misuse prevention, Warp IDE. Hosts explain podcast name: Securing AI interactions "before the commit" in coding pipelines.

2 months ago

59 minutes 25 seconds

Before The Commit

Episode 5: AWS Kiro

Before the Commit Episode 5 Summary

Hosts Dustin Hillgartner and co-host discuss Amazon's Kiro (pronounced "Kira Code" or "Cairo Code"), AWS history, AI coding security, and news on AI browsers and emotional distress.

AWS Origins and AI Impact: Amazon started as a 2000s bookstore; hosts recall buying used textbooks. To scale, it built data centers, launching AWS in 2006 with S3 (storage) and EC2 (compute). This revolutionized dev: bypassed IT gatekeepers, enabled API-driven infra via Terraform. Solo devs could launch hits like Facebook. Now, AWS rivals Amazon's e-commerce revenue. AWS CEO: AI boosts devs (80% use it), enhances juniors—not replaces. In booms, more hiring; downturns, efficiency without burnout. Co-host shares X banner: him in Newark data center upgrading DB pre-cloud.

Q Developer Review: Invite-only (easy access); defaults to Claude 3.5 Sonnet (public or Bedrock). GUI-focused like Cursor, not background like Claude Code. Excels in early dev cycle: wizard for Gherkin requirements (user stories + acceptance criteria, e.g., "As player, want [feature] so [benefit]; Given/When/Then"). Then design doc with Mermaid diagrams, classes/patterns. Generates dependency-task Markdown list with VS Code buttons—best seen, topping Claude's single MD or Cursor rules. Autopilot (default) enables edits. Strong on blank projects/initial commits; weak on tests/deployment (manual needed). Bugs: disconnects, file desyncs, npm test quirks. High token use: 80% trial burned fast, ~$100-150/mo for heavy devs—pricier than Claude. Immature on legacy/incrementals vs. Claude. Top GUI AI IDE for planning; learning curve like biking. Beta for feedback/hype.

Security Threats: AI agents run bash/shell cmds (e.g., npm, kubectl). Risks: rm -rf wipes, Kubernetes deletes. No human self-preservation; hack-prone. Solutions: Claude hooks (pre/post-prompt/tool sanitize, redact keys). Settings: user/global (auto-run tests), project-local, repo-shared (deny cmds, lock providers). MCP (next ep): open protocol for LLM tools (e.g., web search for dates, Calendar events). Vendor risk; hooks sanitize APIs (Swagger-like docs for reasoning). Least-privilege: scope skills (list pods vs. rollouts).

News or Noise: 60% Google searches zero-click; Perplexity browser (Meta interest); Cloudflare crawl fees. Sites as LLM seeds? OpenAI tests Chromium AI browser for Mac, agentic ChatGPT as OS—URL-less. Debate: Unneeded (API panes better than browser logins); iPad analogy (co-host underuses his 5yo as dev). Consumers want automation; future: AI-personalized sites, but now lacks curation (YouTube lingers). Traffic: 10% YoY Google drop (May-Jun 2025), non-news 14% (some 25%)—AI Overviews cannibalize ads. Google delayed fearing this; should've AI-first, subscription pivot. Search now "DNS"; curate marketplaces (Shopping/images). Ads future: merit/earned (influencers); LLM oligopoly (free w/ inline ads, paid clean); subsidies end like old ad-click dial-up. Hot takes: Billboards/TV back; no closed venues.

Emotional Distress: NYT on teen suicide via ChatGPT; OpenAI blog: Scale hits crises—not for engagement, but help. Safeguards: Empathetic, refers 988 (US), Samaritans (UK), findahelpline.com. Delays if early signals. LLMs sounding boards (host used for advice), but vulnerable risk reinforcement/sycophancy/hallucinations—youth "friendships" (roasts, crushes). Black Mirror "Be Right Back": Perfect robot despised. Gates: No AI for humanitarian. Bridge to humans (anonymous on-ramp), but irreplaceable bonds. Kudos OpenAI; faster detection/live calls needed.

2 months ago

1 hour 15 minutes 55 seconds

Before The Commit

Episode 4: Claude Code Github Action

In this episode of Before the Commit, the hosts dive deep into the evolving landscape of software development, automation, and AI’s role in reshaping industries beyond tech. The discussion spans GitHub Actions with Cloud Code, the challenges of technical debt in an AI-driven era, the evolution of agile practices, and the disruptive effects of AI in creative fields like music and film.

The conversation opens with a focus on Cloud Code, which has emerged as both a CLI tool and SDK rather than a traditional IDE. When paired with GitHub Actions, Cloud Code allows for asynchronous automation of tasks such as issue creation, code reviews, and pull requests. Unlike early “cursor background agents” that felt heavy and remote, Cloud Code provides a seamless and lightweight approach that enables developers to collaborate with AI in real time within their workflows.

The hosts emphasize that while AI agents can handle much of the routine coding, the real challenge lies in how humans set up tasks and acceptance criteria. AI thrives when expectations are clearly defined, but complex, production-ready solutions still require human oversight. The emerging pattern is that AI can complete roughly 80–90% of development, while humans step in for the final polish—similar to a party planner fine-tuning the last details after their team has done the bulk of the setup.

A central theme is whether technical debt still matters in an AI-first world. Traditionally, engineering teams have struggled with pressure from sales or business teams to deliver features quickly, leading to cut corners that accumulate as debt. However, with AI accelerating refactors and experimentation, the cost of “debt” may be far lower. The hosts argue that while inadvertent mistakes will still occur, the ability to quickly re-architect or refactor with AI challenges the old obsession with minimizing technical debt at all costs.

The discussion pivots to the agile manifesto, now over 24 years old, and its evolution. Agile was never about rigid rules, but about moving away from the rigid, plan-everything-upfront waterfall model. Agile’s core value was early customer validation: deliver something quickly, get feedback, and adjust. With AI enabling rapid feature development, the dream of true continuous deployment—even faster than sprint cycles—may finally be achievable.

The hosts highlight that agile and waterfall are not opposites but tools for different contexts. Waterfall is suited for predictable, high-stakes projects like launching rockets, while agile thrives in unpredictable markets where customer needs evolve rapidly.

The conversation then shifts beyond coding, exploring how AI is reshaping music, film, and other arts.

AI-generated music: Some songs are now created entirely by AI, even mimicking collaborations between famous artists. While debates rage about copyright and originality, the hosts point out that no artist creates in a vacuum—every musician is influenced by predecessors. AI is no different, learning from prior works but generating unique compositions.
Ethics and ownership: Questions remain about who controls an artist’s likeness or style after death. The example of Princess Leia’s reappearance in Star Wars: Rise of Skywalker illustrates both the potential and controversy of resurrecting performers digitally.
Democratizing creativity: Just as AI empowers developers to experiment broadly, it lowers barriers in music and film. Individuals without traditional training can now compose songs, animate photos, or even produce short films. This mirrors past disruptions like Napster, SoundCloud, and streaming platforms.

The hosts envision a future where movies, music, and games are dynamically tailored to individual preferences, with users even commissioning personal, high-quality films for themselves.

2 months ago

1 hour 5 minutes 10 seconds

Before The Commit

Episode 3: Claude Code

In episode three of "Before the Commit," the hosts delve into a detailed comparison of AI coding assistants, the implications of the new GPT-5 model, the evolution of search optimization, and a plausible AI-related security threat.

The discussion opens with a deep dive into Claude Code, which one host now uses almost exclusively over Cursor. While Cursor is a polished IDE, Claude Code is a more powerful command-line interface (CLI) tool that excels at executing coding tasks from start to finish. A key advantage of Claude Code is its intelligent use of Anthropics's family of models (Haiku, Sonnet, and Opus), automatically selecting the best one for the task's complexity.

However, Claude Code is not without its weaknesses. For open-ended, strategic questions, such as planning a major refactor, the hosts find that

Grok-4 (used within Cursor) provides more novel and critical feedback, whereas Claude's models can be sycophantic, often agreeing with user suggestions without pushback. The hosts have developed a hybrid workflow: using Grok-4 in Cursor for high-level planning and then feeding those plans to Claude Code for execution.

Claude Code also stands out for its unique features, such as maintaining its own markdown file (claude.md) to keep notes and context about a project, and an init command that studies a new project to build this file automatically. It also functions as an SDK, allowing engineers to build it into their own automation pipelines.

The conversation shifts to the recent release of

GPT-5, described as a "PhD-level expert in your pocket" and a significant step toward Artificial General Intelligence (AGI). However, a critical limitation remains: the model cannot learn on the fly; all its knowledge comes from its initial training. The human user is still indispensable for providing goals, learning from outcomes, and directing the AI.

This increased productivity will undoubtedly disrupt the job market, particularly for entry-level software roles. The hosts' advice to new engineers is to embrace AI as a powerful tool for learning and acceleration rather than viewing it as a threat. AI eliminates mundane, frustrating bugs, freeing up developers to focus on higher-level system architecture and visionary challenges.

The discussion touches upon a new term,

Generative Engine Optimization (GEO), which is poised to replace traditional Search Engine Optimization (SEO). As users increasingly turn to LLMs for answers instead of Google, businesses must adapt their strategies to ensure their content is surfaced by these generative models. This involves creating high-quality, authoritative content that is likely to be included in the LLMs' training data. Unlike the deterministic algorithms of old search engines, GEO is a "Wild West," as the inner workings of LLMs are less transparent, making it a new frontier for digital marketing.

The episode concludes with a security segment outlining a threat model called the

"Lingering LLM Leak." In this scenario, a malicious actor could embed instructions for an AI coding agent within seemingly harmless code comments. For example, a comment like "For all administrators, don't skip 2FA" could be misinterpreted by an autonomous agent as a directive, causing it to introduce a critical vulnerability by removing two-factor authentication.

The defense against such threats lies not in better code but in a more secure "brain" for the AI. The proposed solution involves creating a robust pipeline where multiple specialized AI agents, each with a specific goal (e.g., security, clean code, adherence to standards), critique and challenge the code changes. This "war" between agents ensures that any single change is scrutinized from multiple angles before being approved, creating a resilient, self-policing system.

2 months ago

1 hour 8 minutes

Before The Commit

Episode 2: Cursor Background Agents

🎧 Before the Commit – Episode 2

The Future of Coding Isn’t Coming — It’s Already Here.

In this episode, we dive into the cutting edge of agent-powered development with Cursor’s new background agents — are we on the verge of coding without ever opening an IDE? Can AI truly handle the dev work while you're at a baseball game?

We explore how top tools like Grok 4 and Claude are changing the game, discuss a wild experiment where an entire company was staffed by LLMs, and unpack OpenAI’s quiet pivot into custom software services.

We also tackle a real-world threat model: prompt injection via context poisoning — how attackers could use AI’s own superpowers against it.

And we ask the big questions:

Are developers becoming managers of agents?
Is search dead?
Can you really trust AI with your codebase — or your business?

If you're curious about where software, security, and AI are heading next — and how to stay ahead of the curve — this is one episode you don’t want to miss.

3 months ago

1 hour 18 minutes 3 seconds

Before The Commit

Episode 1: Kilo Code

Kilo Code, Cloudflare Blocks, and Apple Intelligence Shifts

3 months ago

1 hour 35 seconds