Prompt injection isn’t some new exotic hack. It’s what happens when you throw your admin console and your users into the same text box and pray the intern doesn’t find the keys to production. Vendors keep chanting about “guardrails” like it’s a Harry Potter spell, but let’s be real: if your entire security model is “please don’t say ignore previous instructions,” you’re not doing security, you’re doing improv.
So we’re digging into what it actually takes to keep agentic AI from dumpster-diving its own system prompts: deterministic policy engines, mediated tool use, and maybe, just maybe, admitting that your LLM is not a CISO. Because at the end of the day, you can’t trust a probabilistic parrot to enforce your compliance framework. That’s how you end up with a fax machine defending against a DDoS. Again.
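If “deterministic policy engine” and “mediated tool use” sound abstract, here’s a rough sketch of the idea in Python. The names ToolCall, POLICY, mediate, and dispatch are invented for illustration, not anyone’s actual product or API: an injected prompt can make the model ask for anything, but what actually executes is decided by plain code that never reads the conversation.

```python
# Illustrative sketch only: a deterministic policy layer mediating tool calls
# proposed by a model. ToolCall, POLICY, mediate, and dispatch are invented
# names for this example, not a real framework's API.
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str   # tool the model wants to invoke
    args: dict  # arguments the model supplied


# The policy is plain code: an allowlist plus hard checks. It never sees the
# prompt, so no amount of "ignore previous instructions" can rewrite it.
POLICY = {
    "search_docs": lambda args: True,  # read-only tool, always allowed
    "send_email": lambda args: args.get("to", "").endswith("@example.com"),
    # Note what's missing: anything not listed here simply cannot run.
}


def dispatch(call: ToolCall) -> str:
    # Stand-in for whatever actually executes the tool.
    return f"executed {call.name} with {call.args}"


def mediate(call: ToolCall) -> str:
    """Run a model-proposed tool call only if the policy approves it."""
    check = POLICY.get(call.name)
    if check is None:
        raise PermissionError(f"tool '{call.name}' is not exposed to the agent")
    if not check(call.args):
        raise PermissionError(f"policy rejected '{call.name}' with args {call.args}")
    return dispatch(call)


# The model can be talked into *requesting* anything; the answer is deterministic.
print(mediate(ToolCall("search_docs", {"query": "quarterly report"})))
try:
    mediate(ToolCall("send_email", {"to": "attacker@evil.example"}))
except PermissionError as err:
    print(f"blocked: {err}")
```

The underlying design point: the allow/deny decision lives outside the model’s token stream, where no cleverly worded prompt can argue with it.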
The core premise here is that prompt injection isn’t actually injection at all; it’s system prompt manipulation, and it’s not a bug, it’s by design. There’s a GitHub repo full of system prompts people have extracted, and no shortage of articles on system prompt “exfiltration.” Join F5’s Lori MacVittie, Joel Moses, and Jason Williams as they explain why it’s so easy, why it’s so hard to prevent, and what mechanisms can constrain AI to minimize the damage. Because you can’t stop it. At least not yet.