An Architectural Approach to SOC 2 and ISO 42001 in LLM-Powered Applications
Business-to-business software companies deploying large language model APIs face a structural compliance challenge. Enterprise buyers require SOC 2 Type II attestation before procurement can proceed. Industry surveys estimate that 83–85% of enterprise buyers mandate SOC 2 compliance as a condition of vendor selection.1
Existing compliance automation platforms—Vanta, Drata, Secureframe—audit static infrastructure: IAM policies, encryption at rest, endpoint management, network configurations. They have zero visibility into runtime LLM API traffic. When an application sends user data to OpenAI or Anthropic, these platforms cannot answer the auditor's questions: what data crossed the boundary, whether it contained PII, and whether it was redacted before transmission.
Cloud-based AI firewall products exist but introduce two problems. First, they add 30–50ms of latency by routing traffic through external infrastructure. For real-time applications, this latency penalty is unacceptable. Second, they require prompts and responses to transit third-party servers—expanding the data processing boundary and creating the very compliance exposure they claim to mitigate.
The result is a gap: no product combines zero-latency local architecture with automated, auditor-ready evidence generation mapped to specific compliance controls.
SOC 2 auditors evaluating AI-powered applications focus on three control families: logical access controls (CC6.1), boundary protection (CC6.6), and risk mitigation (CC9.2).
Without runtime monitoring, the answers are manual screenshots, periodic spot checks, and attestations of intent. Auditors accept these today because better evidence does not exist. That changes when continuous, cryptographically sealed evidence is available.
Any system addressing the compliance gap must satisfy six constraints simultaneously:
| Requirement | Rationale | Control |
|---|---|---|
| Zero latency on critical path | The proxy must not add measurable overhead to user-facing requests. Forwarding must happen before audit processing. | CC6.6 |
| Zero data egress | Prompts, responses, and evidence must never leave the customer's infrastructure. Any external dependency expands the audit boundary. | CC6.6, ISO 42001 |
| Continuous evidence | Point-in-time audits miss intermittent exposure. Evidence must cover every request in every window with quantified coverage. | CC6.1, CC9.2 |
| Fail-open design | If the audit pipeline is under pressure, the proxy continues forwarding. Audit capture is dropped, not traffic. Drop events are recorded. | CC6.6 |
| Cryptographic integrity | SHA-256 hash chains on raw and redacted payloads provide tamper-evident seals. Timestamp binding prevents replay attacks. | CC6.1, ISO 42001 |
| Kubernetes-native operation | Container orchestrators treat stdout as the standard telemetry channel. The system must support stdout mode with no filesystem dependencies for telemetry. | CC6.6 |
Four architectural patterns were evaluated; the sidecar reverse proxy was selected. The application redirects its API base URL to the local proxy (e.g., OPENAI_API_BASE=http://localhost:8080). No code changes. No new infrastructure. Full L7 visibility.

The sidecar pattern meets all six requirements. The trade-off is an additional process and its associated attack surface, but the sidecar runs in the same trust boundary as the application—it does not expand the trust perimeter.2
Outband is a transparent reverse proxy written in Go. The application sends requests to localhost:8080 instead of the LLM API directly. The proxy forwards immediately and clones payloads asynchronously for audit processing.
The hot path is a single io.Read wrapper. The auditReader copies bytes into pre-allocated blocks as the request streams upstream. If the block pool is exhausted or the queue is full, capture is abandoned—the proxy never blocks.
All CPU-intensive work (JSON parsing, regex matching, SHA-256 hashing) happens asynchronously in the worker pool. The proxy has returned the upstream response before the first redaction starts.
| Metric | Sequential | Concurrent (75ms upstream) |
|---|---|---|
| p50 overhead | 43µs | 0.42ms |
| p95 overhead | 51µs | 0.02ms |
| p99 overhead | 105µs | 0.49ms |
| GC impact | 12 cycles / 2.3ms total STW (155 allocs/op) | |
Benchmarks: Apple M2, Go 1.26.1. Upstream is a local httptest.Server to isolate proxy overhead from network variance. Cloud instance benchmarks (e.g., AWS c5.xlarge, c7g.2xlarge) will be published after initial production deployments and added to this section.
The audit pipeline uses a pre-allocated ring buffer (BlockPool). Blocks are fixed-size byte slices (default 64KB) reused across requests. The total pool budget defaults to 50MB. This design eliminates per-request heap allocations on the hot path and bounds memory usage regardless of traffic volume.
An in-flight memory budget (default 256MB) caps the total bytes held in reassembly buffers across all concurrent requests. When the budget is exceeded, new payloads are dropped until in-flight processing completes.
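The pool pattern described above (pre-allocated fixed-size blocks, non-blocking acquisition, bounded budget) can be sketched as follows. blockPool here is a simplified stand-in for Outband's BlockPool; names and defaults are taken from the text, everything else is illustrative.

```go
package main

import "fmt"

// blockPool pre-allocates fixed-size buffers once at startup. Get is
// non-blocking: when the pool is exhausted it returns nil and the caller
// abandons capture for that request (fail-open), so memory stays bounded
// regardless of traffic volume.
type blockPool struct {
	free chan []byte
}

func newBlockPool(blockSize, budgetBytes int) *blockPool {
	n := budgetBytes / blockSize
	p := &blockPool{free: make(chan []byte, n)}
	for i := 0; i < n; i++ {
		p.free <- make([]byte, blockSize)
	}
	return p
}

func (p *blockPool) Get() []byte {
	select {
	case b := <-p.free:
		return b[:0] // reuse backing array, reset length
	default:
		return nil // exhausted: drop capture, never block the hot path
	}
}

func (p *blockPool) Put(b []byte) {
	select {
	case p.free <- b:
	default: // pool already full; let GC reclaim the stray block
	}
}

func main() {
	p := newBlockPool(64<<10, 50<<20) // 64KB blocks, 50MB budget (the defaults)
	b := p.Get()
	fmt.Println(cap(b), len(p.free))
	p.Put(b)
}
```

Because every block is allocated before traffic arrives, steady-state operation performs no per-request heap allocations on the capture path.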
In Kubernetes deployments, a log collector (Datadog Agent, Fluentd, Promtail, Fluent Bit) tails container stdout by default. The --log-output stdout flag directs JSONL telemetry to stdout instead of rotating files. A sync.Mutex serializes concurrent flushes to prevent interleaved output. json.Encoder writes directly to os.Stdout with zero intermediate allocations.
In stdout mode, all file rotation logic is bypassed. Evidence summaries continue writing to --evidence-dir regardless of the telemetry output mode.
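The stdout telemetry path amounts to a mutex-guarded json.Encoder, roughly as below. The record fields are illustrative, not Outband's exact schema.

```go
package main

import (
	"encoding/json"
	"os"
	"sync"
)

// stdoutSink serializes JSONL telemetry to stdout. The mutex prevents two
// concurrent flushes from interleaving bytes mid-line, which would corrupt
// records for log collectors that parse one JSON object per line.
type stdoutSink struct {
	mu  sync.Mutex
	enc *json.Encoder
}

func newStdoutSink() *stdoutSink {
	// json.Encoder.Encode writes the object followed by a newline — exactly
	// the JSONL framing collectors such as Fluent Bit expect.
	return &stdoutSink{enc: json.NewEncoder(os.Stdout)}
}

func (s *stdoutSink) Write(record any) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.enc.Encode(record)
}

func main() {
	sink := newStdoutSink()
	// Illustrative record shape, not the real telemetry schema.
	_ = sink.Write(map[string]any{"request_id": 1, "pii_categories_found": []string{"EMAIL"}})
}
```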
Outband produces two schema types: raw telemetry records (one JSONL line per request) and evidence summaries (one JSON document per aggregation window).
Each audited request produces a telemetry entry containing:
| Field | Type | Compliance Justification |
|---|---|---|
| `request_id` | uint64 | CC6.1 — Unique identifier for access event correlation |
| `timestamp` | RFC 3339 | CC6.1 — Temporal ordering for forensic reconstruction |
| `original_hash` | SHA-256 | CC6.1 — Cryptographic proof of capture integrity |
| `redacted_payload` | string | CC6.1, CC9.2 — Demonstrates data classification and tagging |
| `redacted_hash` | SHA-256 | CC6.1 — Tamper-evident seal with timestamp binding |
| `pii_categories_found` | []string | CC6.1, CC9.2 — Data classification per category |
| `extractor_used` | string | CC6.6 — Identifies the intercepted AI API |
| `capture_complete` | bool | CC6.6 — Transparency about inspection completeness |
An evidence summary is produced once per aggregation window (default: 60 minutes). This is the primary artifact presented to auditors.
```json
{
  "window_start": "2026-03-27T14:00:00Z",
  "window_end": "2026-03-27T15:00:00Z",
  "total_requests_processed": 12847,
  "total_requests_audited": 12831,
  "total_requests_dropped": 16,
  "audit_coverage_percent": 99.88,
  "total_pii_detected": 347,
  "redaction_events_by_category": {
    "SSN": 12, "CC_NUMBER": 3, "EMAIL": 198,
    "PHONE": 89, "IP_ADDRESS": 45
  },
  "redaction_level": "pattern-based",
  "pii_categories_not_covered": [
    "NAME", "STREET_ADDRESS", "MEDICAL", "NON_US_ID"
  ],
  "proxy_p50_latency_ms": 0,
  "proxy_p95_latency_ms": 1,
  "proxy_p99_latency_ms": 1,
  "io_errors": 0,
  "soc2_controls_satisfied": ["CC6.1", "CC6.6", "CC9.2"],
  "iso42001_controls_satisfied": ["A.10.2", "A.10.3", "A.10.4"]
}
```
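The coverage figure is a simple ratio of audited to processed requests. A partial sketch of the window roll-up — a subset of the fields above, with the struct itself an assumption rather than Outband's real type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// windowSummary mirrors a few fields of the evidence JSON above.
// Illustrative only; not the full schema.
type windowSummary struct {
	TotalProcessed  int     `json:"total_requests_processed"`
	TotalAudited    int     `json:"total_requests_audited"`
	TotalDropped    int     `json:"total_requests_dropped"`
	CoveragePercent float64 `json:"audit_coverage_percent"`
}

// coverage quantifies how much traffic the audit pipeline captured —
// the number auditors use to evaluate continuous-evidence claims.
func coverage(audited, processed int) float64 {
	if processed == 0 {
		return 0
	}
	return float64(audited) / float64(processed) * 100
}

func main() {
	s := windowSummary{TotalProcessed: 12847, TotalAudited: 12831, TotalDropped: 16}
	s.CoveragePercent = coverage(s.TotalAudited, s.TotalProcessed)
	out, _ := json.Marshal(s)
	fmt.Printf("%.2f%%\n%s\n", s.CoveragePercent, out)
}
```

With the example window's numbers (12,831 of 12,847), the ratio rounds to the 99.88% shown in the summary; the 16 dropped requests are the fail-open captures described earlier.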
| Control | Requirement | Outband Evidence |
|---|---|---|
| CC6.1 | Logical access controls | PII detection, redaction tagging, cryptographic hash chains, per-request audit trail |
| CC6.6 | Boundary protection | Audit coverage percentage, latency proof, inspection completeness flags |
| CC9.2 | Risk mitigation | Automated PII detection, categorized risk events, explicit scope limitations |
ISO 42001 addresses AI-specific governance requirements; Outband's evidence summaries map to controls A.10.2, A.10.3, and A.10.4.
Outband uses pattern-based (regex) PII detection. The following categories are supported:
| Category | Pattern | Validation |
|---|---|---|
| SSN | NNN-NN-NNNN | Format only |
| Credit Card | Visa, Mastercard, Amex formats | Luhn checksum |
| Email | RFC 5322 simplified | None |
| Phone (US) | 10-digit with optional +1 | Format only |
| IPv4 | Dotted quad | Octet range 0–255 |
Not covered: names mentioned in prose, street addresses, medical information, company-proprietary data, non-US identification formats. Only natural-language content fields (messages[*].content) are scanned—structural JSON fields (model names, token counts, IDs) are excluded to prevent false positives.
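A minimal sketch of pattern-based detection for two of the table's categories, including the Luhn validation step for credit-card candidates. The regexes and category names are illustrative assumptions; Outband's actual patterns may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns, not Outband's production regexes.
var (
	ssnRe = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`) // NNN-NN-NNNN, format only
	ccRe  = regexp.MustCompile(`\b\d{13,16}\b`)         // candidate card numbers
)

// luhnValid filters credit-card format matches through the Luhn checksum,
// cutting false positives from arbitrary digit runs (order IDs, timestamps).
func luhnValid(digits string) bool {
	sum, double := 0, false
	for i := len(digits) - 1; i >= 0; i-- {
		d := int(digits[i] - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

// scan returns the PII categories detected in a content field.
func scan(content string) []string {
	var found []string
	if ssnRe.MatchString(content) {
		found = append(found, "SSN")
	}
	for _, m := range ccRe.FindAllString(content, -1) {
		if luhnValid(m) {
			found = append(found, "CC_NUMBER")
			break
		}
	}
	return found
}

func main() {
	fmt.Println(scan("ssn 123-45-6789, card 4111111111111111"))
}
```

Checksum validation is what separates "looks like a card number" from "is almost certainly a card number" — the same reason IPv4 matches are range-checked per octet.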
The `pii_categories_not_covered` field in every evidence summary explicitly declares these scope limitations. Auditors specifically ask about coverage boundaries; listing limitations proactively builds credibility.
```bash
git clone https://github.com/outband-ai/outband.git
cd outband
docker compose up -d

# Send a request through the proxy
curl http://localhost:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}'

# Check telemetry
cat outband-telemetry-current.jsonl | jq .
```
```yaml
containers:
- name: app
  image: your-app:latest
  env:
  - name: OPENAI_API_BASE
    value: "http://localhost:8080/v1"
- name: outband
  image: ghcr.io/outband-ai/outband:latest
  args:
  - "--target=https://api.openai.com"
  - "--listen=0.0.0.0:8080"
  - "--log-output=stdout"
  - "--evidence-dir=/data/evidence"
  - "--summary-interval=60m"
  volumeMounts:
  - name: evidence
    mountPath: /data/evidence
volumes:
- name: evidence
  emptyDir: {}
```
Only the evidence directory needs a volume mount. Telemetry goes to stdout, where the cluster's existing log collector (Datadog, Fluentd, Promtail) picks it up with zero configuration.
| Question | Answer |
|---|---|
| Where does data flow? | Application → localhost sidecar → upstream LLM API. All audit data stays on your infrastructure. |
| What ports are opened? | Pod-local only (default 127.0.0.1:8080; `--listen=0.0.0.0:8080` inside a Kubernetes pod is still confined to the pod's network namespace). No inbound network access from outside the pod is required. |
| What outbound connections? | Only to the configured LLM API endpoint (--target). |
| Does Outband receive any data? | No. The open-source build contains no hardcoded external service calls. |
| What happens if the sidecar fails? | Same as the upstream API being unreachable. The application's HTTP client handles the error. |
| Health probes? | /healthz (liveness), /readyz (audit pipeline initialized + upstream reachable). |
Outband follows an open-core model. The proxy, PII detection, telemetry logging, and evidence generation are open source under the Apache 2.0 license. The free tier is not a trial—it is the full audit pipeline.
Enterprise tier (custom pricing, tailored to your environment): features are delivered as a separate binary that extends the open-source core via a plugin registry. All audit data remains on the customer's infrastructure regardless of tier, and evidence files generated by any tier are forward-compatible.