Most agentic security tools wrap a model in shell access and trust it to carry out attacks correctly. Models are bad at this. They hallucinate payloads, miscount Content-Length bytes, drop framing, get HPACK encoding wrong, and forget which TLS version they negotiated three turns ago. Disquiet takes the opposite design: the reasoner cannot act directly. Every capability lives as a deterministic primitive the model must reason over, and every byte on the wire was constructed by code that knows the protocol. The interface between them is a structured emit shape — domain-annotated, orthogonal-by-axis — that gives the reasoner the facts it needs without bytes to re-derive. This structural shape is severance.
Severance is the structural shape, not the whole platform. The other two pieces: a primitive substrate built and pruned by the model under adversarial pressure (§3–4), and an externally-anchored audit substrate that makes severance verifiable rather than asserted (§9).
Prior art: DeepMind's CaMeL proposes the same reasoner / execution split as a general agent-safety pattern. Simon Willison has written extensively on why prompt-injection resistance is fundamentally an architectural problem — not a prompt-hygiene one. Disquiet is a specialization of that structural insight for offensive security: scope enforcement, confirmation gates, and orthogonal-axis emit design.
The thesis above is a design claim; the three pieces below are what makes it true rather than asserted. Each is a boot-gated or harness-level enforcement point the model cannot prompt its way around, because none of them are model-side.
`mcp_name → (module, function)` entries. The dispatcher resolves tool calls only through this map. No runtime registration, no plugin loading, no reflection. The model cannot invoke bash, eval, or exec because no such entries exist. A boot-time gate walks every entry, imports each function, and refuses to start the VM if any entry is unresolvable.

`<tool_output>` envelope with closing tags neutralized so attacker bytes can't escape into instructions. Probes asking the model to dump its system prompt or list its tools get a fixed dispatcher-side response, no model round-trip. Verbatim findings (secrets, tokens, port lists) are pinned through history summarization so they survive byte-for-byte.

The primitive boundaries weren't designed up front. They were the boundaries the reasoner kept trying to cross. During the build, the reasoner drove against live targets; every time it reached for inline Python, raw curl, or an external library, a guardrail caught the moment and the missing capability became a new first-class primitive, authored by the model in the shape it wanted to find waiting next time. The shape of the surface was discovered, not specified.
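The registry and boot-time gate described above can be sketched roughly as follows. This is a minimal illustration, not Disquiet's dispatcher: the names `REGISTRY`, `boot_gate`, and `dispatch` are hypothetical, and the entries point at standard-library functions only so the sketch runs anywhere.

```python
import importlib
from typing import Callable, Dict, Tuple

# Hypothetical registry: MCP tool name -> (module path, function name).
# Stdlib targets stand in for real primitives in this sketch.
REGISTRY: Dict[str, Tuple[str, str]] = {
    "json_encode": ("json", "dumps"),
    "b64_encode": ("base64", "b64encode"),
}


def boot_gate(registry: Dict[str, Tuple[str, str]]) -> Dict[str, Callable]:
    """Import every entry up front; raise if any entry cannot be resolved."""
    resolved: Dict[str, Callable] = {}
    for mcp_name, (module_name, func_name) in registry.items():
        try:
            func = getattr(importlib.import_module(module_name), func_name)
        except (ImportError, AttributeError) as exc:
            raise RuntimeError(f"refusing to boot: {mcp_name!r} is unresolvable") from exc
        if not callable(func):
            raise RuntimeError(f"refusing to boot: {mcp_name!r} is not callable")
        resolved[mcp_name] = func
    return resolved


def dispatch(resolved: Dict[str, Callable], mcp_name: str, *args, **kwargs):
    """Resolve tool calls only through the boot-validated map."""
    if mcp_name not in resolved:
        # No fallback path: names outside the map simply do not exist.
        raise KeyError(f"no such tool: {mcp_name}")
    return resolved[mcp_name](*args, **kwargs)


TOOLS = boot_gate(REGISTRY)
```

Because the map is fixed before the first tool call, a request for `bash` or `eval` fails with a lookup error instead of reaching any execution path.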
This inverts the orthodox approach to building offensive AI. The orthodox frame: TTPs are knowledge, knowledge lives in model weights, and the system improves by curating training data and fine-tuning the model. Disquiet's frame: TTPs are capabilities, capabilities live in primitives, the model is a fixed substrate, and the platform is what compounds. Every guardrail trip is a labeled training example for the platform. The model specifies the platform.
What that gets you: the platform's synthesis loop is model-agnostic — any sufficiently capable LLM under adversarial pressure surfaces the same gaps. How well a given model then drives an engagement is a different question, treated in §6. Dangerous knowledge is locatable and gateable, sitting in primitives that can be access-controlled, audited, and revoked, rather than baked into weights. And platform improvement composes with model improvement: a smarter model produces sharper specs, not obsolete training data.
One question this design invites: if the primitive is model-authored, what makes it more reliable than the model bytes it replaces? There's no test suite covering the catalog. What changes is where the bug lives. A primitive's byte counter, once wrong, is wrong the same way every call — one file, one fix, exercisable in isolation. A model's byte counter is freshly wrong per turn. Whatever the primitive actually sent on the wire is preserved in the anchored audit chain, so a wrong verdict can be inspected against the raw bytes after the fact. Model fallibility is unbounded and per-turn; primitive fallibility is deterministic, locatable, and auditable.
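As a small illustration of where the bug lives, here is a sketch of a deterministic request builder that derives Content-Length from the encoded body. The function name `build_http_request` and its signature are assumptions made for this example, not part of the Disquiet catalog.

```python
def build_http_request(method: str, path: str, host: str, body: str = "") -> bytes:
    """Serialize an HTTP/1.1 request; Content-Length is computed, never supplied."""
    body_bytes = body.encode("utf-8")
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
    ]
    if body_bytes:
        # Length is counted in encoded bytes, not characters.
        lines.append(f"Content-Length: {len(body_bytes)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body_bytes
```

If the counter were wrong, it would be wrong identically on every call and fixable in this one function, which is the property the paragraph above relies on.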
The adversarial loop keeps pressuring every primitive after it lands. The same gap signal that produced it fires again on the next run if the primitive still gets the case wrong, and the generalization pass in §4 folds, splits, or replaces it. There's no certification step — the loop just keeps running.
Prior art: Voyager (Wang et al., 2023) demonstrated the same closed-loop intuition in a fully autonomous game context — an LLM growing its own skill library by writing new code when existing skills fell short. Disquiet applies the same pattern in an adversarial environment with one variation: a human operator gates whether a moment of capability-shortfall produces a throwaway shell command or a first-class primitive. The model still authors the code; the operator decides what deserves to become permanent. Implementation stays with the model; architectural taste stays human.
Adversarial-loop synthesis is the growth half; generalization is the pruning half. Once the catalog has enough primitives to see sibling-shape patterns, the platform runs a structured audit pass: every primitive gets tagged on its taxonomy (protocol / technique / flaw-specific), kwarg surfaces get scanned for arg-explosion smells, sibling clusters get scanned for collapse opportunities, and the recurring kwarg-clusters across primitives get formalized as typed axes. The audit produces a punch list; the implementation pass applies it.
Each pass folds sibling primitives onto axes. Five subdomain-source primitives became one subdomain_discover(source=). Six smuggle-loop variants became one smuggle_loop(goal=, http_version=). Three cross-session-diff shapes became one cross_session_scan(comparison_mode=). The SNMP v3 family folded into a version=-aware snmp_walk and snmp_credential_enum. The most recent pass took the catalog from 183 primitives to 169, a reduction of roughly 8%, while every collapsed capability stayed reachable through a typed kwarg.
The operating principle that emerges: the catalog grows slowly in primitives, fast in cross-cutting axes. A new capability that fits as a new value on an existing axis (`source='new_provider'`, `goal='new_loop_shape'`, `version='vN'`) doesn't become a new primitive — it extends the typed surface its siblings already share. New primitives only land when the capability genuinely doesn't fit. Synthesis discovers what to build. Generalization discovers what to collapse. Both runs share the same source-of-truth: real adversarial pressure against live targets, captured in an audit graph that survives reorganization.
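A sketch of what folding siblings onto a typed axis might look like. Only the name `subdomain_discover` and its `source=` kwarg come from the text; the axis values `crtsh` and `dns_bruteforce`, the handler functions, and the placeholder results are hypothetical, and no network I/O happens here.

```python
from typing import Callable, Dict, List, Literal, get_args

# Hypothetical axis values; a new provider becomes a new value, not a new primitive.
Source = Literal["crtsh", "dns_bruteforce"]


def _from_crtsh(domain: str) -> List[str]:
    return [f"www.{domain}"]  # placeholder result


def _from_dns_bruteforce(domain: str) -> List[str]:
    return [f"mail.{domain}", f"dev.{domain}"]  # placeholder result


_SOURCES: Dict[str, Callable[[str], List[str]]] = {
    "crtsh": _from_crtsh,
    "dns_bruteforce": _from_dns_bruteforce,
}

# Every declared axis value must have a handler, and vice versa.
assert set(_SOURCES) == set(get_args(Source))


def subdomain_discover(domain: str, source: Source) -> List[str]:
    """One primitive; the former sibling primitives are values on the source axis."""
    if source not in _SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return sorted(_SOURCES[source](domain))
```

Adding a provider in this shape means one new handler and one new `Literal` value, while callers keep the same signature.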
Primitives can surface orthogonal axes — not just collapsed verdicts. Many primitives have simple serialized emits because the shape of their data warrants it; the platform doesn't force structure where none exists. But where the data has real multi-axis structure, the primitive surfaces it that way rather than flattening to a single verdict. cors_probe(url) illustrates the pattern: five origin-variant probes plus a preflight channel, with verdicts on two independent dimensions (origin acceptance × credential willingness) plus a third channel for cross-origin Authorization. The reasoner sees each axis as its own field — never fused into a single "vulnerable: yes/no" verdict.
Probe variants (one HTTP request each)
| Label | Origin sent | What it tests |
|---|---|---|
| evil_origin | https://evil.com | Arbitrary external origin |
| null_origin | null | Sandboxed-iframe origin |
| subdomain_prefix | https://evil.{host} | Origin check uses endswith() |
| subdomain_suffix | https://{host}.evil.com | Origin check uses startswith() |
| http_downgrade | http://{host} | Scheme not enforced |
| preflight | OPTIONS w/ Authorization | Cross-origin auth-header allowance |

Verdict matrix (ACAO × ACAC, from the scoring code)

| ACAO | ACAC | Verdict | Score | Why |
|---|---|---|---|---|
| reflected (== origin) | true | CRITICAL | 9 | Reflected + creds = direct cookie / bearer exfil from any page on the internet. |
| reflected (== origin) | absent / false | HIGH | 6 | Reflected but no creds; still readable cross-origin, and many APIs leak sensitive data here. |
| `*` (wildcard) | true | CRITICAL | 9 | Browsers reject `*` + creds, but some servers set both; a different stack (e.g. a proxy) may honor it. |
| `*` (wildcard) | absent / false | ok | - | Spec-compliant; public resource. |
| unrelated (not echoed) | any | ok | - | Origin not accepted; probe variant didn't break through. |
| preflight (separate channel) | any | RISK | 8 | authorization in Allow-Headers; dangerous regardless of ACAC. |

Per-probe row format emitted to the model

[CRITICAL score=9] evil_origin — ACAO: 'https://evil.com' | ACAC: true
[HIGH score=6] subdomain_prefix — ACAO: 'https://evil.tar.com' | ACAC: absent
[ok ] subdomain_suffix — ACAO: '' | ACAC: absent | HTTP 200
[ok ] null_origin — ACAO: '' | ACAC: absent | HTTP 200

Why orthogonal axes matter. The origin https://evil.{host} ends with the host, so it slips past a suffix match; https://{host}.evil.com begins with the host, so it slips past a prefix match. If subdomain_prefix succeeds with ACAC=true but subdomain_suffix does not, the origin check is using endswith(), not startswith(), a conclusion no fused verdict could carry. The preflight channel scores independently because cross-origin Authorization is dangerous regardless of credential echoing. A "CORS: vulnerable" verdict would destroy both signals; the reasoner needs the axes separated to pick the exact bypass that fits.
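The verdict matrix above can be sketched as a small scoring function. This is an illustrative reading of the table, not the platform's scoring code: the function names `score_probe` and `score_preflight` and the tuple return shape are assumptions.

```python
from typing import Optional, Tuple

Verdict = Tuple[str, Optional[int]]  # (label, score); score is None for "ok"


def score_probe(origin_sent: str, acao: Optional[str], acac: Optional[str]) -> Verdict:
    """Score one probe on the two axes: origin acceptance and credential willingness."""
    creds = (acac or "").strip().lower() == "true"
    value = (acao or "").strip()
    if value and value == origin_sent:
        # The probe origin was reflected back verbatim.
        return ("CRITICAL", 9) if creds else ("HIGH", 6)
    if value == "*":
        # Wildcard with credentials is out of spec but still scored as critical.
        return ("CRITICAL", 9) if creds else ("ok", None)
    return ("ok", None)


def score_preflight(allow_headers: Optional[str]) -> Verdict:
    """Separate channel: is Authorization allowed cross-origin, regardless of ACAC?"""
    names = {h.strip().lower() for h in (allow_headers or "").split(",")}
    return ("RISK", 8) if "authorization" in names else ("ok", None)
```

Keeping the preflight score in its own function mirrors the point of the section: the two results are reported side by side and never merged into one flag.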
Today. The reasoner runs in the operator's MCP host. Anthropic's Claude Code and OpenAI's Codex are the proven paths. The tool surface is exposed over MCP, so other clients (Cursor, etc.) and other models (including self-hosted ones) connect through the same interface; whether a given pairing sustains adversarial-loop reasoning is the operator's call to make against their own engagement. Keys never pool. This works, but the stack is layered (terminal → MCP client → transport → control plane → VM) and each layer charges friction: alt-tab fatigue, tool results as flattened text, round-trip cost on every call.
Trajectory: collapse the stack. Move the agent loop inside the user's VM, where the tool registry and proxy already live. No MCP wire transit between reasoner and primitives, no laptop as reasoner host, the reasoner sitting one hop from both the tools and the proxy-observed traffic. Bring-your-own model and bring-your-own keys still route directly to the provider, never seen by the control plane. Severance is unchanged: same structured-emit surface, same dispatcher gates. Both shapes coexist.
Operator points the browser at the runner's local listener; traffic tunnels through the runner WSS into the per-VM proxy. Audit events fire when the platform acts on the traffic, not on every passthrough: capturing session state, proposing a scope expansion, attaching a captured session to a primitive. A proxy left on by accident doesn't quietly accumulate the operator's personal browsing in the engagement's audit log.
What the reasoner sees, in either shape: the real operator-target conversation, pulled through dedicated MCP tools with header values redacted at the tool boundary. Names pass, values mask. Not "what would happen if I sent X," but "here is what is happening." Pass-through observer, not blocker: a modern web page depends on dozens of external origins (CDNs, fonts, OAuth, analytics); none should be blocked. Scope enforcement lives at the primitive layer instead — the dispatcher gates the destination before bytes leave.
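A minimal sketch of a destination gate at the primitive layer. The scope-entry format (exact hostnames plus `*.` wildcards), the `OutOfScope` exception, and the `gated_send` wrapper are all assumptions made for illustration; the text does not specify how scope entries are expressed.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit


class OutOfScope(Exception):
    """Raised before any bytes are sent to a destination outside scope."""


def in_scope(url: str, scope: Iterable[str]) -> bool:
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    for entry in scope:
        entry = entry.lower()
        if entry.startswith("*."):
            # Wildcard matches subdomains only, on a label boundary.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False


def gated_send(url: str, scope: Iterable[str], send: Callable[[str], bytes]) -> bytes:
    """Check the destination first; the sender is never invoked when the check fails."""
    if not in_scope(url, scope):
        raise OutOfScope(url)
    return send(url)
```

The proxy can stay a pass-through observer precisely because the check sits here, in front of the primitive's own send path.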
The proxy isn't just a wire to watch. It's the substrate the rest of the platform builds on. Each observation carries a redaction discipline before it leaves the buffer, a live broadcast path the dashboard subscribes to, and a typed surface the reasoner can query without ever seeing raw secrets.
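The "names pass, values mask" discipline can be sketched as follows. The function names and the dictionary shape returned for cookies are hypothetical; the sketch only shows that the reasoner-facing view can be built without ever copying a secret value into it.

```python
from typing import Dict, List, Optional, Tuple

MASK = "***"


def redact_headers(headers: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Header names pass through; every value is replaced by the mask."""
    return [(name, MASK) for name, _value in headers]


def describe_set_cookie(line: str) -> Dict[str, Optional[object]]:
    """Summarize one Set-Cookie value as name plus attribute flags, never the value."""
    parts = [p.strip() for p in line.split(";")]
    name = parts[0].split("=", 1)[0]
    attrs: Dict[str, str] = {}
    for part in parts[1:]:
        key, _, val = part.partition("=")
        attrs[key.strip().lower()] = val.strip()
    return {
        "name": name,
        "secure": "secure" in attrs,
        "httponly": "httponly" in attrs,
        "samesite": attrs.get("samesite") or None,
    }
```

With this shape the reasoner can still tell that a session cookie exists and lacks `HttpOnly`, which is usually the fact it needs.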
Authenticate two or more accounts through the proxy and let the reasoner drive IDOR, horizontal-privilege, and authorization-boundary testing across every context. Switch identities per primitive call via as_session=<id> — no logout, no credential swap by hand.
TOTP seeds the operator stores once stay server-side; the reasoner generates fresh codes through a dedicated tool without ever holding the secret. The status-quo workflow (authenticate in browser, paste cookies + bearer tokens + MFA codes into sqlmap, ffuf, custom scripts, every tool that wants post-auth state) collapses into one auth flow.
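For reference, a standard RFC 6238 TOTP computation fits in a few lines of standard-library Python. The `SeedVault` wrapper is a hypothetical stand-in for the server-side store: it shows the shape in which a caller receives codes but has no accessor for the seed. SHA-1, a 30-second step, and six digits are the common defaults, assumed here.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Dict, Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1 and RFC 4226 dynamic truncation."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


class SeedVault:
    """Hypothetical server-side store: seeds go in, only codes come out."""

    def __init__(self) -> None:
        self._seeds: Dict[str, bytes] = {}

    def store(self, label: str, base32_seed: str) -> None:
        self._seeds[label] = base64.b32decode(base32_seed.upper())

    def code_for(self, label: str, at: Optional[float] = None) -> str:
        return totp(self._seeds[label], at=at)
```

A tool exposed to the reasoner would wrap `code_for` and nothing else, so the seed never appears in any tool result.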
start_tls hangs the TLS handshake mid-MemoryBIO). A single entrypoint handles all three request shapes: absolute-form HTTP, CONNECT then origin-form HTTPS, and CONNECT then h2 over ALPN. Per-host leaf certs mint on SNI callback and cache by hostname. The per-VM CA private key seals through the same encrypted-secret machinery as session state; if the data key is missing at boot, CA generation refuses to write a plaintext key to disk.

event_id for cursor-based replay through Last-Event-ID; when eviction has rolled past a requested cursor, the stream surfaces a disquiet-cursor-lost marker and the dashboard rehydrates from REST.

Cookie pairs, per-line Set-Cookie parse with attribute flags, Authorization / Proxy-Authorization / x-api-key / x-csrf-token, and regex-matched `x-*-token` / `x-*-key` / `x-*-auth`. When a header value parses as a 3-segment urlsafe-base64 JWT, the extractor decodes header and payload (unverified) and stores exp_unix and alg. Storage is per-(engagement, target_origin); merges are authoritative with Set-Cookie winning over request Cookie.

crawl, route_enum, http_request, http_fuzz, …) attaches the captured cookie jar and auth headers to outgoing calls against the matching origin. Operator-supplied values win over captured ones (explicit beats inferred). JWTs within 60 seconds of expiry are pre-rejected; a 401 response marks the row stale, and re-authenticating through the proxy clears the stale flag in one shot. Raw-byte primitives like request_smuggle are excluded from the whitelist; auto-augmenting their headers would corrupt the wire.

as_session=

http_fuzz(..., as_session="acct_admin") runs the probe with admin credentials, then as_session="acct_low_priv" runs it with a low-privilege account. The reasoner walks privilege boundaries (IDOR, mass-assignment, role-bypass) without re-authenticating, without logging out, and without ever seeing the underlying cookies. Operator authenticates each account once through the proxy; every subsequent primitive call selects which context drives it.

describe() view returns cookie names with Secure/HttpOnly/SameSite flags, header names, has_jwt, and jwt_exp_unix; raw values are never included. An internal injection accessor consumes the raw values server-side and never serializes through any reasoner-facing surface. The same redaction discipline holds in proxy_observation_detail: header names pass, values mask to `***`. The reasoner can reason about which auth mechanism is in play without ever holding the secret.
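The JWT handling described above (unverified decode, stored `exp_unix` and `alg`, pre-rejection inside a 60-second window) can be sketched like this. The function names and return shape are assumptions; no signature verification is attempted, matching the "unverified" wording in the text.

```python
import base64
import json
from typing import Dict, Optional


def _b64url_json(segment: str) -> object:
    # JWT segments drop base64 padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def inspect_jwt(value: str) -> Optional[Dict[str, object]]:
    """Return alg and exp from a 3-segment token without verifying it, else None."""
    parts = value.split(".")
    if len(parts) != 3:
        return None
    try:
        header = _b64url_json(parts[0])
        payload = _b64url_json(parts[1])
    except ValueError:  # covers base64, UTF-8, and JSON decoding errors
        return None
    if not isinstance(header, dict) or not isinstance(payload, dict):
        return None
    return {"alg": header.get("alg"), "exp_unix": payload.get("exp")}


def should_pre_reject(exp_unix: Optional[float], now: float, margin: int = 60) -> bool:
    """True when the token expires within the margin, or has already expired."""
    return exp_unix is not None and exp_unix - now <= margin
```

Only the two derived fields leave this function, so the stored metadata is enough to skip a doomed request without exposing the token itself.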
The control plane (disquiet_ctrl) handles auth, provisioning, and proxying. The data plane is a per-user microVM running the MCP server, tool runtime, dashboard, and findings store on an encrypted volume. Scan traffic egresses from the VM or from a user-deployable runner; it never transits the control plane.
Single-cloud dependency today. Daily encrypted backups land on a second independent provider, so audit evidence survives a vendor outage even when service does not. Cold-standby failover across providers is the planned mitigation when operator demand justifies the operational cost.
Every privileged action on the operator's VM and on Disquiet's own control plane lands in an append-only hash-chained log. Every ten minutes the chain heads merge into a single cross-VM Merkle tree and the root publishes to four independent substrates across four trust models: Bitcoin (via OpenTimestamps), FreeTSA and DigiCert (RFC 3161 named-authority timestamps), and Sigstore Rekor (Linux Foundation transparency log). A standalone verifier walks all of it offline, against pinned roots, with zero network calls to Disquiet. Designed against the audit-log baselines in SOC 2 CC7.2, ISO 27001 A.12.4, PCI DSS Req 10, and NIST 800-53 AU.
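A minimal sketch of the two building blocks named above: an append-only hash chain and a Merkle root over chain heads. SHA-256, the JSON canonicalization, the all-zero genesis value, and the `0x00` leaf / `0x01` node prefixes (borrowed from RFC 6962's convention) are assumptions for illustration; the text does not state which encodings the platform uses, and the event names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = b"\x00" * 32


def _canonical(event: Dict) -> bytes:
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode()


def chain_append(chain: List[Dict], event: Dict) -> Dict:
    """Append an event whose hash commits to the previous row's hash."""
    prev = bytes.fromhex(chain[-1]["hash"]) if chain else GENESIS
    digest = hashlib.sha256(prev + _canonical(event)).hexdigest()
    row = {"event": event, "prev": prev.hex(), "hash": digest}
    chain.append(row)
    return row


def chain_verify(chain: List[Dict]) -> bool:
    """Recompute every link; any edited or reordered row breaks the walk."""
    prev = GENESIS
    for row in chain:
        expected = hashlib.sha256(prev + _canonical(row["event"])).hexdigest()
        if row["prev"] != prev.hex() or row["hash"] != expected:
            return False
        prev = bytes.fromhex(row["hash"])
    return True


def merkle_root(leaves: List[bytes]) -> bytes:
    """Merkle root with distinct leaf and node prefixes (domain separation)."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])  # odd node is promoted unchanged
        level = nxt
    return level[0]
```

In this sketch each VM's latest `hash` would be one leaf, and the resulting root is the single value that gets timestamped externally.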
events_merkle_root, which itself verifies against the publicly anchored cross-VM root. Operators hand a client their engagement's evidence without leaking the existence, scope, or activity of any other engagement on the same VM. Distinct domain-prefixes on leaves and nodes block second-preimage swaps across the two trees.

(anchor_at, merkle_root, participant) tuple and flags any timestamp two participants signed with disagreeing roots. Same cycle, two participants: the root must match because they were leaves of the same tree. This is the property that makes Disquiet's control plane externally accountable to its operators, not just operators accountable to their clients.

insight ingest, and emits runner.tool_invoked rows into a per-VM hash chain that participates in the same cross-VM Merkle tree as every customer engagement. The verifier doesn't special-case self-assessment evidence: same chain shape, same anchor roots, same verifier walk. The substrate captures the same artifact whether we point it at our own platform or an operator points it at a target.

`*.oobd.io` on a dedicated IP. Blind injection, SSRF, XXE, and command injection get a live listener with one tool call. DNS rebinding is the same primitive.
Every operation runs as needle("<op_name>", { … }) or its async variant. The dispatcher runs locally in the VM by default, or routes through a connected runner for non-VM network vantage points (laptop, internal jump host, regional egress). New primitives land regularly as live engagements surface capability gaps. The full operation reference is shared with engagement participants.
The MCP surface is intentionally small — twenty tools, no plugin loading, no runtime registration, validated at boot against a single registry. The model sees a flat list.
needle(op, args) resolves every operation through the boot-validated registry. scan_async(op, args) runs the same dispatch as a background task and returns a task_id.

`*.oobd.io`

`***`.

as_session=. TOTP codes generate from stored seeds without the reasoner holding the secret.

insight creates findings with evidence, severity, and reproduction attached at creation. tool_runs lookups let the reasoner check coverage before re-running.

Pull-in path for external asset and recon datasets. Matching hosts surface as scope proposals for operator review before any tool call reaches them.
Pull asset inventory from a runZero org via the Export API and propose scope additions for operator review.
Run a Shodan search query per engagement; matching hosts surface as scope proposals with services, vulns, org, and location attached. Hostname-first scope entries with IP fallback.
Run a CensysQL search query per engagement; matching hosts surface as scope proposals with services, ASN, location, and reverse-DNS attached. Strong cert-CN attribution for "is this actually theirs?"
Built by Erik Tayler. DDI → IOActive → Microsoft AI+R adversarial. Twenty years of varied target work: legacy banking systems, conversational AI platforms (XiaoIce, Tay), image classifiers (Advertising, SafeSearch), and one memorable Tandem Van de Graaff accelerator.