Adversarial test mode¶
Tailtest does not just generate tests that confirm your code works. It also generates tests designed to break it. Adversarial mode is how tailtest finds bugs, not just builds coverage.
This page covers the three surfaces of adversarial testing in tailtest, when each one fires, the eight scenario categories tailtest probes, and the story of why this feature exists.
The three surfaces¶
Tailtest exposes adversarial testing through three integrated mechanisms. Each serves a different moment in your workflow, and you do not need to choose just one.
1. R15 -- always on at standard depth and above¶
R15 is a rule baked into the rule layer of every tailtest plugin. It says: every SCENARIO PLAN at standard or higher depth must include a minimum count of adversarial scenarios labelled [adversarial: <category>].
| Depth | Required adversarial scenarios |
|---|---|
| simple | 0 |
| standard (default) | >=2 |
| thorough | >=4 |
| adversarial | 8-12 |
You do not need to do anything to get R15. If you are running at standard depth (the default), tailtest already includes 2+ adversarial scenarios in every test it generates. Look for the [adversarial: <category>] label in the SCENARIO PLAN.
2. Depth tier adversarial -- opt-in for the whole project¶
Set "depth": "adversarial" in .tailtest/config.json to make adversarial scenarios the dominant mode for every file edited in the project.
At adversarial depth, every test file gets 8-12 scenarios biased toward breakage paths. Use this when you want bug-hunting as your default for an entire codebase.
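For example, a minimal .tailtest/config.json containing only this setting (other keys from the full schema, covered on the Configuration page, are omitted here):

```json
{
  "depth": "adversarial"
}
```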
3. /tailtest hunt <file> -- one-shot adversarial pass¶
When you suspect a bug in a specific file but do not want to change project depth, run:

> /tailtest hunt path/to/file.py
Tailtest writes 8-12 adversarial scenarios into a separate hunt test file (tests/test_<basename>_hunt.py etc.) so the hunt does not contaminate your main test suite. Each failing scenario gets R12 classification: real_bug, environment, or test_bug.
After review you decide whether to keep the hunt file, merge it into the main test file, or discard it.
The eight scenario categories¶
R15 draws adversarial scenarios from these categories. Tailtest picks the ones relevant to the source file under test and skips ones that genuinely do not apply.
| # | Category | What tailtest probes |
|---|---|---|
| 1 | Boundary inputs | MAX_INT, MIN_INT, empty, single-element, unicode, null bytes, malformed UTF-8 |
| 2 | Format / injection | path traversal .., regex specials, shell metacharacters, SQL fragments, HTML / XML entities |
| 3 | Type confusion | wrong type passed (string where int expected; list where dict expected) |
| 4 | Concurrent state | race conditions, shared mutable state, double invocation |
| 5 | Time / locale edges | DST transitions, leap year, locale-specific formats, timezone shifts |
| 6 | Error handling under partial failures | network mid-call fail, disk full, EINTR, permission revoked |
| 7 | Resource exhaustion | very large input, deeply nested input, many open file descriptors |
| 8 | Off-by-one logic | boundary indices, fence-post errors, last-element handling |
Tailtest documents the choice when it skips a category, e.g. Skipped category 4 (concurrency): no shared state in module.
If the source file has no external input or branching (a pure constant module, a re-export barrel), R15 does not apply at all and tailtest skips adversarial scenarios entirely.
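To make category 1 concrete, here is the kind of boundary-input probe it generates. The normalize function below is a hypothetical stand-in, not tailtest code; the probes (empty string, single element, unicode, null bytes) are the point:

```python
# Hypothetical function under test: trims and lower-cases a name,
# rejecting embedded null bytes.
def normalize(name: str) -> str:
    if "\x00" in name:
        raise ValueError("null byte in name")
    return name.strip().lower()

# Category 1 (boundary inputs) probes: empty, single-element,
# unicode, and null-byte inputs.
assert normalize("") == ""
assert normalize("A") == "a"
assert normalize("  Ünïcode  ") == "ünïcode"
try:
    normalize("a\x00b")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError on null byte")
```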
When to use each surface¶
A short mental model:
| Situation | Use |
|---|---|
| You want bug-hunting baked into every edit, no friction | Default standard depth (R15 fires automatically with 2+ adversarial probes) |
| You are auditing a codebase or working in a security-sensitive domain | Set depth: adversarial for the project |
| You suspect a specific file has a bug | /tailtest hunt path/to/file.py |
| You want minimal noise during early prototyping | Set depth: simple (R15 stops, no adversarial scenarios) |
The three surfaces compose. With depth: adversarial set globally, R15 is satisfied automatically because every file gets 8-12 adversarial scenarios. With /tailtest hunt, depth is bypassed for the named file regardless of project setting.
Where adversarial mode came from¶
In April 2026 we ran an outreach pilot across 6 Python repos (docsight, wireup, flask-openapi, jsonargparse, radio-active, cmd2). The first pass used stock tailtest with R1-R14 (the rule layer that ensures coverage tests pass): 90 tests generated across the 6 repos, all green, zero real bugs found. The maintainers had written code that worked.
The second pass on the same 6 files added an explicit adversarial prompt that told the model to try to break the code: probe boundary inputs, inject malformed inputs, exercise concurrent state, probe error handling. 25 distinct real bugs surfaced. Among them: a memory leak in docsight's IP rate limiter, a state leak in radio-active's search function, a module-level mutation in cmd2 that corrupted instance state across test invocations.
Within 24 hours of filing, two of the six maintainers had merged fix PRs (cmd2 v3.5.1, docsight #358). Four of the six replied with positive engagement.
The outreach pilot proved the gap. Tailtest had been validating that what the agent wrote works, but had not been probing for bugs. Adversarial mode (V13, shipped 2026-04-25) bakes that adversarial layer into the product so every tailtest user gets bug-hunting capability without writing custom prompts.
How this differs across the three plugins¶
Adversarial mode ships in Claude Code, Cursor, and Codex plugins simultaneously. The mechanism is identical across all three:
- Same rule R15 in the rule layer file (CLAUDE.md / AGENTS.md / cursor mdc)
- Same depth tier "adversarial" honored by lib/runners.read_depth()
- Same /tailtest hunt <file> slash command, named per-host convention
The hunt slash file naming differs because each host has its own slash command convention:
| Plugin | Hunt invocation surface |
|---|---|
| Claude Code | commands/tailtest-hunt.md slash command |
| Cursor | skills/tailtest-hunt/SKILL.md skill folder |
| Codex | skills/tailtest/hunt.md skill verb |
Same behavior, different file location.
Examples¶
A SCENARIO PLAN at standard depth (R15 fires automatically)¶
Source file: services/billing.py -- a function apply_discount(amount, rate) that returns the discounted amount.
SCENARIO PLAN (not final test code):
1. Apply 20% discount to a $100 amount, expect $80
2. Apply 0% discount, expect amount unchanged
3. Apply 100% discount, expect zero
4. [adversarial: type confusion] amount as a string "100", expect TypeError
5. [adversarial: boundary inputs] amount = 0, expect zero (not negative)
6. [adversarial: boundary inputs] rate < 0, expect ValueError
7. [adversarial: type confusion] rate as None, expect TypeError
R15 contributed scenarios 4-7 (the four [adversarial: ...] ones).
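As an illustration, scenarios 4-7 might compile into tests like the following. The apply_discount implementation here is a hypothetical stand-in, assuming it validates types and rate range; the real services/billing.py may differ:

```python
# Hypothetical apply_discount; the real services/billing.py may differ.
def apply_discount(amount, rate):
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise TypeError("amount must be a number")
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise TypeError("rate must be a number")
    if rate < 0 or rate > 1:
        raise ValueError("rate must be in [0, 1]")
    return amount * (1 - rate)

def expect_raises(exc, fn, *args):
    # Helper: assert that fn(*args) raises exc.
    try:
        fn(*args)
    except exc:
        return
    raise AssertionError(f"expected {exc.__name__}")

expect_raises(TypeError, apply_discount, "100", 0.2)   # scenario 4
assert apply_discount(0, 0.2) == 0                     # scenario 5
expect_raises(ValueError, apply_discount, 100, -0.1)   # scenario 6
expect_raises(TypeError, apply_discount, 100, None)    # scenario 7
```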
A /tailtest hunt invocation¶
> /tailtest hunt services/billing.py
SCENARIO PLAN (not final test code):
1. [adversarial: boundary inputs] amount = MAX_INT, rate = 0.99 (overflow check)
2. [adversarial: boundary inputs] amount = -1, expect ValueError or correct sign handling
3. [adversarial: type confusion] amount = decimal.Decimal('100.00'), rate = float (mixed-type arithmetic)
4. [adversarial: format / injection] amount = '100; DROP TABLE billing', expect TypeError before any persistence
5. [adversarial: time / locale edges] amount expressed as locale-formatted string '1.000,50' (European), expect TypeError
6. [adversarial: off-by-one] amount = 0.0001, rate = 1.0 (full discount on near-zero), expect 0
7. [adversarial: resource exhaustion] amount = MAX_INT * 1000 (exceeds float64 precision)
8. [adversarial: error handling] rate = float('nan'), expect ValueError or NaN-clean handling
9. [adversarial: error handling] rate = float('inf'), expect ValueError
Skipped category 4 (concurrent state): function is pure with no shared state.
Hunt writes the test file at tests/test_billing_hunt.py and runs it.
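A sketch of how scenarios 8 and 9 might read in the generated hunt file, assuming a hypothetical apply_discount that rejects non-finite rates (the real function may handle NaN/inf differently):

```python
import math

# Hypothetical implementation that rejects non-finite rates;
# the real services/billing.py may behave differently.
def apply_discount(amount, rate):
    if not math.isfinite(rate):
        raise ValueError("rate must be finite")
    return amount * (1 - rate)

def test_nan_rate_rejected():  # scenario 8
    try:
        apply_discount(100, float("nan"))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for NaN rate")

def test_inf_rate_rejected():  # scenario 9
    try:
        apply_discount(100, float("inf"))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for inf rate")

test_nan_rate_rejected()
test_inf_rate_rejected()
```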
Configuration recap¶
| Where | Setting | Effect |
|---|---|---|
| .tailtest/config.json | "depth": "simple" | Disables R15; no adversarial scenarios |
| .tailtest/config.json | "depth": "standard" (default) | R15 contributes 2+ adversarial scenarios per test |
| .tailtest/config.json | "depth": "thorough" | R15 contributes 4+ adversarial scenarios per test |
| .tailtest/config.json | "depth": "adversarial" | All 8-12 scenarios are adversarial-biased |
| Slash / skill | /tailtest hunt <file> | One-shot adversarial pass on <file>, separate test file, regardless of depth |
See also: Configuration for the full .tailtest/config.json schema.