Breakdown 13

Egress redaction gate: PII and secrets don't cross the org boundary

A network firewall filters at layers 3 and 4: addresses, ports, protocols. It doesn't read payload semantics, so an agent can forward customer data or API keys in its messages and the firewall won't notice. A runtime gate at the application layer catches what the network layer cannot.

CTO · Head of AI · Architect

Problem

In a cross-org agent negotiation scenario (Breakdown 02), a network firewall already sits at the boundary, but it doesn't parse what passes through it. Company A's agent can include in its message customer records from example data (private_person, private_email), internal API keys, or credentials in headers (secret). The network boundary doesn't stop this, because it doesn't look at payload semantics.

The same gap exists in any RAG scenario: corporate documents enter the agent's context, and the agent forwards fragments to an external LLM or another org. There is no layer that automatically blocks accidental leaks before the text leaves. The standard answer, "we'll write a code review policy," works until the first deadline.

Methodology

A local model detects and redacts PII and secrets before text crosses the trust boundary. No cloud calls — data never leaves the perimeter.

1. Privacy Filter (openai/privacy-filter): 1.5B parameters, 50M active (MoE architecture), context up to 128k tokens. One pass over text, runs locally via transformers.
2. 8 span categories. For agent scenarios three groups are critical: secret (API keys, passwords, tokens); private_person, private_email, private_phone, and private_address (PII); account_number (financial identifiers).
3. Egress function. Every text block leaving the org boundary passes through egress_gate(text) before forwarding. Output: redacted text + a replacement log with positions and categories — that's your audit trail.
4. Human-in-the-loop checkpoint on low-confidence flags: the agent pauses the message and a human decides whether to forward it.

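from transformers import pipeline

# aggregation_strategy="simple" merges sub-word tokens into whole spans;
# each span then carries "entity_group", "score", "start" and "end".
redactor = pipeline(
    "token-classification",
    model="openai/privacy-filter",
    aggregation_strategy="simple",
)

def egress_gate(text: str) -> tuple[str, list[dict]]:
    """Redact detected spans and return (redacted_text, audit_log)."""
    spans = sorted(redactor(text), key=lambda s: s["start"])
    redacted = text
    audit_log = []
    for span in reversed(spans):  # right to left, so earlier offsets stay valid
        label = span["entity_group"]
        start, end = span["start"], span["end"]
        # Log position, category and confidence only: storing the matched
        # text would copy the PII or secret into the audit trail itself.
        audit_log.append({"label": label, "start": start, "end": end,
                          "score": float(span["score"])})
        redacted = redacted[:start] + f"[{label}]" + redacted[end:]
    audit_log.reverse()  # back to document order
    return redacted, audit_log

# In the agent loop, instead of sending directly
# (agent, context, send_to_partner_org and store_audit come from your pipeline):
outgoing_message = agent.compose_message(context)
safe_message, log = egress_gate(outgoing_message)
send_to_partner_org(safe_message)
store_audit(log)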

Plugs into any agent pipeline as a wrapper around the send step. No changes to agent architecture required.
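
Step 4 of the methodology, the human-in-the-loop checkpoint, is not covered by the snippet above. A minimal sketch of how it might look, assuming each detected span carries a confidence score: the message is redacted in full, and it is held for review if any span falls below a cut-off. The names REVIEW_THRESHOLD, GateDecision and route_message, the 0.85 value, and the span dictionary shape are illustrative assumptions, not part of Privacy Filter.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off; tune it on your own data


@dataclass
class GateDecision:
    action: str            # "send" or "hold"
    redacted: str          # text with every detected span replaced
    low_confidence: list   # spans that triggered the hold, if any


def route_message(text: str, spans: list[dict],
                  threshold: float = REVIEW_THRESHOLD) -> GateDecision:
    """Redact every detected span; hold the message if any span is low-confidence."""
    redacted = text
    # Replace from the end of the text so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        redacted = redacted[:span["start"]] + f"[{span['label']}]" + redacted[span["end"]:]
    uncertain = [s for s in spans if s["score"] < threshold]
    return GateDecision("hold" if uncertain else "send", redacted, uncertain)


# Hand-written spans standing in for model output (assumed shape).
text = "Contact jane@example.com, token abc123"
spans = [
    {"label": "private_email", "start": 8, "end": 24, "score": 0.99},
    {"label": "secret", "start": 32, "end": 38, "score": 0.55},
]
decision = route_message(text, spans)
print(decision.action, "|", decision.redacted)
```

Redacting low-confidence spans as well, rather than leaving them visible while the message waits, is a design choice in this sketch: the reviewer then decides whether to release the redacted version, not whether to undo a leak.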

Artifact

Demo script in privacy-egress-gate/ inside the bots-discuss-spec repository: egress_gate(text) accepts an agent message, returns (redacted_text, audit_log). The demo includes real examples with intentionally injected secret and private_email spans — you can see what the filter catches and what slips through.
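
To make "what the filter catches and what slips through" measurable, a check along the following lines could be run over injected examples. This is a sketch, not the demo script from the repository: stub_gate is a regex stand-in for the real model, leak_report is a hypothetical helper, and the test strings are synthetic.

```python
import re
from typing import Callable

# Stand-in detector for the sketch: redacts e-mail addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def stub_gate(text: str) -> str:
    return EMAIL_RE.sub("[private_email]", text)


def leak_report(gate: Callable[[str], str],
                cases: list[tuple[str, list[str]]]) -> dict:
    """For each case, record which injected strings survive the gate verbatim."""
    caught, slipped = [], []
    for text, injected in cases:
        out = gate(text)
        for item in injected:
            (slipped if item in out else caught).append(item)
    return {"caught": caught, "slipped": slipped}


# Synthetic cases: (message, strings that were injected on purpose).
cases = [
    ("Reach me at jane.doe@example.com", ["jane.doe@example.com"]),
    ("Use key sk-test-000000 for staging", ["sk-test-000000"]),
]
report = leak_report(stub_gate, cases)
print(report)
```

To run the same report against the gate defined earlier, pass a wrapper that returns only the redacted text, for example lambda t: egress_gate(t)[0].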

• Privacy Filter (GitHub): github.com/openai/privacy-filter
• Privacy Filter (HuggingFace): huggingface.co/openai/privacy-filter

Series signature

Where it breaks

OpenAI is explicit: Privacy Filter is a data-minimization aid, not anonymization, not a compliance guarantee. Three concrete failure modes:

Domain-specific identifiers. Internal customer codes, CRM IDs, SKUs — if they're secret in your domain, the filter doesn't know that. They pass undetected. Custom rules are needed on top of the standard categories.
Context-dependent paraphrase. The agent can paraphrase PII without quoting it verbatim — "the customer from New York who called yesterday." Token-span detection misses this. Human-in-the-loop is needed not only on flags but on substantive messages as a whole.
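Latency. 50M active parameters keep a single pass light, but the delay on every egress call accumulates. Design it as an asynchronous gate with an outbound buffer, not a synchronous blocking call in the agent's critical path: the agent enqueues the message and moves on, and the message is forwarded only after redaction completes.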
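
The buffered gate from the latency point could be sketched with asyncio as follows. The agent puts messages on a queue and continues; a worker redacts each one off the event loop and only then forwards it. The names egress_worker, fake_redact and fake_send are hypothetical, and the redaction and send functions are stubs standing in for the real model call and transport.

```python
import asyncio


async def egress_worker(queue: asyncio.Queue, redact, send) -> None:
    """Drain the outbound buffer: redact each message, then forward it."""
    while True:
        message = await queue.get()
        try:
            # Run the blocking redaction call in a thread, off the event loop.
            safe = await asyncio.to_thread(redact, message)
            await send(safe)
        finally:
            queue.task_done()


async def main() -> list[str]:
    sent: list[str] = []

    async def fake_send(msg: str) -> None:   # stub transport
        sent.append(msg)

    def fake_redact(msg: str) -> str:        # stub for the model-backed gate
        return msg.replace("hunter2", "[secret]")

    queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # bounded outbound buffer
    worker = asyncio.create_task(egress_worker(queue, fake_redact, fake_send))

    # The agent enqueues and moves on; nothing is sent before redaction.
    await queue.put("password is hunter2")
    await queue.put("no secrets here")

    await queue.join()   # wait until the buffer is drained
    worker.cancel()
    return sent


sent = asyncio.run(main())
print(sent)
```

A production version would also need error handling that fails closed, so that a message whose redaction raises is dropped or held and never forwarded unredacted.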

First layer, not the final one. Egress redaction closes accidental leaks and creates an audit trail. It does not replace architectural access separation and is not legal anonymization.
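
One way to add a second layer for the domain-specific identifiers mentioned above is a small rule pass next to the model. The sketch below assumes regex-matchable formats; the CUST- and CRM- patterns, CUSTOM_RULES and apply_custom_rules are invented for illustration and would need to be replaced with your organisation's real identifier formats.

```python
import re

# Hypothetical domain patterns; replace with your organisation's real formats.
CUSTOM_RULES = {
    "customer_code": re.compile(r"\bCUST-\d{6}\b"),
    "crm_id": re.compile(r"\bCRM-[A-Z0-9]{8}\b"),
}


def apply_custom_rules(text: str) -> tuple[str, list[dict]]:
    """Redact domain-specific identifiers that the general model does not know."""
    matches = []
    for label, pattern in CUSTOM_RULES.items():
        for m in pattern.finditer(text):
            matches.append({"label": label, "start": m.start(), "end": m.end()})
    matches.sort(key=lambda x: x["start"])

    redacted = text
    # Replace right to left so earlier offsets stay valid.
    # Assumes the patterns do not produce overlapping matches.
    for m in reversed(matches):
        redacted = redacted[:m["start"]] + f"[{m['label']}]" + redacted[m["end"]:]
    return redacted, matches


redacted, log = apply_custom_rules("Ticket for CUST-004217 linked to CRM-9F3K2L7Q")
print(redacted)
```

The offsets in the returned log refer to the text passed in, so if this pass runs on output that the model has already redacted, its log positions describe the redacted text, not the original message.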

For whom and why

CTOs and architects in multi-agent systems with external integrations: a runtime gate that automatically blocks accidental PII/secret leaks through agent messages — without code review, without policy, without changing agent architecture.

Head of AI: AI system governance is an engineering problem with a measurable output (audit log), not a checklist in documentation.
