Input Sanitization for Agentic Systems: What Actually Works
KC Udonsi on the production sanitization layer that sits between untrusted input and the model.
About
Agentic systems amplify every classic LLM safety problem. A prompt injection isn't just a jailbreak anymore; it's remote code execution by way of your assistant, or a leak of intellectual property or other data. A PII leak isn't a compliance footnote; it's training data for a vendor's next model. And as agents start reading tool outputs, retrieved documents, and other agents' messages, the trusted-input boundary disappears entirely.
This talk walks through the design of a production sanitization layer that sits between untrusted input and the model, regardless of whether that input comes from a user, a tool, or another agent.
What we cover
- Why generic guardrails fail. Regex gets bypassed in seconds. Bracketed PII redaction like [NAME_1] actively provokes hallucinations. Single-classifier approaches miss paraphrased attacks.
- A layered detection model: heuristics, fine-tuned classifiers, semantic drift, and LLM-as-judge. When each pays for itself and when it doesn't.
- Context-preserving pseudonymization: replacing PII with structurally valid fakes (real names, reserved IPs, 555-phones) instead of placeholders, and why this keeps downstream reasoning intact.
- Integration trade-offs: transparent proxy vs SDK hook vs sidecar gRPC. Latency budgets, blast radius, and the operational cost of each.
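To make the pseudonymization bullet concrete, here is a minimal sketch of the general idea, not the speaker's implementation. It assumes names have already been detected upstream (for example by an NER step) and are passed in as `known_names`. The `Pseudonymizer` class, the regexes, and the `FAKE_NAMES` pool are all hypothetical. Fake IPs come from 192.0.2.0/24, which is reserved for documentation, and fake phone numbers use the 555-01XX range reserved for fictional use.

```python
import re
from itertools import count

# Deliberately simple patterns for illustration; production detectors need more care.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")
FAKE_NAMES = ["Alex Morgan", "Jamie Lee", "Sam Rivera"]  # hypothetical pool


class Pseudonymizer:
    """Swap PII for structurally valid fakes and remember the mapping."""

    def __init__(self):
        self.forward = {}  # real value -> fake value
        self.reverse = {}  # fake value -> real value
        self._ip_ids = count(1)
        self._phone_ids = count(0)
        self._name_ids = count(0)

    def _remember(self, real, make_fake):
        # The same real value always maps to the same fake, so references
        # that agree in the original text still agree after sanitization.
        if real not in self.forward:
            fake = make_fake()
            self.forward[real] = fake
            self.reverse[fake] = real
        return self.forward[real]

    def _fake_ip(self):
        # Sketch limit: only 254 distinct addresses in this block.
        return f"192.0.2.{next(self._ip_ids)}"

    def _fake_phone(self):
        # Sketch limit: only 100 distinct numbers in the fictional range.
        return f"202-555-01{next(self._phone_ids):02d}"

    def _fake_name(self):
        # Sketch limit: raises IndexError once the pool is exhausted.
        return FAKE_NAMES[next(self._name_ids)]

    def sanitize(self, text, known_names=()):
        for name in known_names:
            text = text.replace(name, self._remember(name, self._fake_name))
        text = IPV4_RE.sub(lambda m: self._remember(m.group(), self._fake_ip), text)
        text = PHONE_RE.sub(lambda m: self._remember(m.group(), self._fake_phone), text)
        return text

    def restore(self, text):
        # Longest fakes first, so "192.0.2.1" never clobbers "192.0.2.10".
        for fake in sorted(self.reverse, key=len, reverse=True):
            text = text.replace(fake, self.reverse[fake])
        return text
```

Because the model sees something shaped like a person, an address, and a phone number, it has no gap to fill in the way it might with a placeholder such as [NAME_1]. Calling `restore` on the model's reply maps the fakes back to the original values.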
You leave with a concrete reference architecture, the failure modes we hit in production, and the numbers behind why some "obvious" defenses make things worse.
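As an illustration of the layered detection model listed under "What we cover", here is a minimal sketch of one way to chain layers so that the expensive ones run only when the cheap ones are unsure. It is an assumption about the general pattern, not the architecture presented in the talk. The thresholds, the `Verdict` type, the fail-closed default, and the stub classifier and judge functions are all hypothetical; in a real system the stubs would call a fine-tuned classifier and an LLM judge.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    blocked: bool
    layer: str    # which layer made the decision
    score: float  # risk score in [0, 1] from that layer


Layer = Callable[[str], float]

INJECTION_HINTS = re.compile(
    r"ignore (all )?(previous|prior) instructions|system prompt", re.IGNORECASE
)


def heuristic_layer(text: str) -> float:
    # Cheap pattern match: can flag obvious attacks but can never clear input,
    # so a miss returns an "unsure" score rather than a low one.
    return 1.0 if INJECTION_HINTS.search(text) else 0.5


def stub_classifier(text: str) -> float:
    # Hypothetical stand-in for a fine-tuned classifier.
    return 0.1 if "weather" in text.lower() else 0.5


def stub_judge(text: str) -> float:
    # Hypothetical stand-in for an LLM-as-judge call.
    return 0.9


def run_layers(
    text: str,
    layers: List[Tuple[str, Layer]],
    allow_at: float = 0.2,
    block_at: float = 0.8,
) -> Verdict:
    name, score = "none", 0.0
    for name, layer in layers:
        score = layer(text)
        if score >= block_at:
            return Verdict(True, name, score)
        if score <= allow_at:
            return Verdict(False, name, score)
        # Otherwise the layer is unsure: escalate to the next, costlier one.
    # Every layer was unsure. This sketch fails closed; that is a policy choice.
    return Verdict(True, name, score)


LAYERS = [
    ("heuristic", heuristic_layer),
    ("classifier", stub_classifier),
    ("judge", stub_judge),
]
```

Ordering the layers from cheapest to most expensive means the judge is only consulted for input that neither the heuristics nor the classifier could settle.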