Adversarial test mode¶
Tailtest does not just generate tests that confirm your code works. It also generates tests designed to break it. Adversarial mode is how tailtest finds bugs, not just builds coverage.
This page covers the three surfaces of adversarial testing in tailtest, when each one fires, the eight scenario categories tailtest probes, and the story of why this feature exists.
The three surfaces¶
Tailtest exposes adversarial testing through three integrated mechanisms. Each serves a different moment in your workflow, and you do not need to choose just one.
1. R15 -- always on at standard depth and above¶
R15 is a rule baked into the rule layer of every tailtest plugin. It says: every SCENARIO PLAN at standard or higher depth must include a minimum count of adversarial scenarios labelled [adversarial: <category>].
| Depth | Required adversarial scenarios |
|---|---|
| simple | 0 |
| standard (default) | >=2 |
| thorough | >=4 |
| adversarial | 8-12 |
You do not need to do anything to get R15. If you are running at standard depth (the default), tailtest already includes 2+ adversarial scenarios in every test it generates. Look for the [adversarial: <category>] label in the SCENARIO PLAN.
2. Depth tier adversarial -- opt-in for the whole project¶
Set "depth": "adversarial" in .tailtest/config.json to make adversarial scenarios the dominant mode for every file edited in the project.
At adversarial depth, every test file gets 8-12 scenarios biased toward breakage paths. Use this when you want bug-hunting as your default for an entire codebase.
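For example, a minimal .tailtest/config.json containing only this setting (other keys from the full schema, covered on the Configuration page, are omitted here):

```json
{
  "depth": "adversarial"
}
```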
3. /tailtest hunt <file> -- one-shot adversarial pass¶
When you suspect a bug in a specific file but do not want to change project depth, run:

> /tailtest hunt path/to/file.py
Tailtest writes 8-12 adversarial scenarios into a separate hunt test file (tests/test_<basename>_hunt.py etc.) so the hunt does not contaminate your main test suite. Each failing scenario gets R12 classification: real_bug, environment, or test_bug.
After review you decide whether to keep the hunt file, merge it into the main test file, or discard it.
The eight scenario categories¶
R15 draws adversarial scenarios from these categories. Tailtest picks the ones relevant to the source file under test and skips ones that genuinely do not apply.
| # | Category | What tailtest probes |
|---|---|---|
| 1 | Boundary inputs | MAX_INT, MIN_INT, empty, single-element, unicode, null bytes, malformed UTF-8 |
| 2 | Format / injection | path traversal .., regex specials, shell metacharacters, SQL fragments, HTML / XML entities |
| 3 | Type confusion | wrong type passed (string where int expected; list where dict expected) |
| 4 | Concurrent state | race conditions, shared mutable state, double invocation |
| 5 | Time / locale edges | DST transitions, leap year, locale-specific formats, timezone shifts |
| 6 | Error handling under partial failures | network mid-call fail, disk full, EINTR, permission revoked |
| 7 | Resource exhaustion | very large input, deeply nested input, many open file descriptors |
| 8 | Off-by-one logic | boundary indices, fence-post errors, last-element handling |
Tailtest documents the choice when it skips a category, e.g. Skipped category 4 (concurrency): no shared state in module.
If the source file has no external input or branching (a pure constant module, a re-export barrel), R15 does not apply at all and tailtest skips adversarial scenarios entirely.
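To make category 1 concrete, here is the kind of boundary-input probe it generates. The normalize function below is a hypothetical stand-in, not tailtest code; the probes (empty string, single element, unicode, null bytes) are the point:

```python
# Hypothetical function under test: trims and lower-cases a name,
# rejecting embedded null bytes.
def normalize(name: str) -> str:
    if "\x00" in name:
        raise ValueError("null byte in name")
    return name.strip().lower()

# Category 1 (boundary inputs) probes: empty, single-element,
# unicode, and null-byte inputs.
assert normalize("") == ""
assert normalize("A") == "a"
assert normalize("  Ünïcode  ") == "ünïcode"
try:
    normalize("a\x00b")
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError on null byte")
```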
When to use each surface¶
A short mental model:
| Situation | Use |
|---|---|
| You want bug-hunting baked into every edit, no friction | Default standard depth (R15 fires automatically with 2+ adversarial probes) |
| You are auditing a codebase or working in a security-sensitive domain | Set depth: adversarial for the project |
| You suspect a specific file has a bug | /tailtest hunt path/to/file.py |
| You want minimal noise during early prototyping | Set depth: simple (R15 stops, no adversarial scenarios) |
The three surfaces compose. With depth: adversarial set globally, R15 is satisfied automatically because every file gets 8-12 adversarial scenarios. With /tailtest hunt, depth is bypassed for the named file regardless of project setting.
Where adversarial mode came from¶
In April 2026 we ran an outreach pilot across 6 Python repos (docsight, wireup, flask-openapi, jsonargparse, radio-active, cmd2). The first pass used stock tailtest with R1-R14 (the rule layer that ensures coverage tests pass): 90 tests generated across the 6 repos, all green, zero real bugs found. The maintainers had written code that worked.
The second pass on the same 6 files added an explicit adversarial prompt that told the model to try to break the code: probe boundary inputs, inject malformed inputs, exercise concurrent state, probe error handling. 25 distinct real bugs surfaced. Among them: a memory leak in docsight's IP rate limiter, a state leak in radio-active's search function, a module-level mutation in cmd2 that corrupted instance state across test invocations.
Within 24 hours of filing, two of the six maintainers had merged fix PRs (cmd2 v3.5.1, docsight #358). Four of the six replied with positive engagement.
The outreach pilot proved the gap. Tailtest had been validating that what the agent wrote works, but had not been probing for bugs. Adversarial mode (V13, shipped 2026-04-25) bakes that adversarial layer into the product so every tailtest user gets bug-hunting capability without writing custom prompts.
How this differs across the three plugins¶
Adversarial mode ships in Claude Code, Cursor, and Codex plugins simultaneously. The mechanism is identical across all three:
- Same rule R15 in the rule layer file (CLAUDE.md / AGENTS.md / cursor mdc)
- Same depth tier "adversarial" honored by lib/runners.read_depth()
- Same /tailtest hunt <file> slash command, named per-host convention
The hunt slash file naming differs because each host has its own slash command convention:
| Plugin | Hunt invocation surface |
|---|---|
| Claude Code | commands/tailtest-hunt.md slash command |
| Cursor | skills/tailtest-hunt/SKILL.md skill folder |
| Codex | skills/tailtest/hunt.md skill verb |
Same behavior, different file location.
Examples¶
A SCENARIO PLAN at standard depth (R15 fires automatically)¶
Source file: services/billing.py -- a function apply_discount(amount, rate) that returns the discounted amount.
SCENARIO PLAN (not final test code):
1. Apply 20% discount to a $100 amount, expect $80
2. Apply 0% discount, expect amount unchanged
3. Apply 100% discount, expect zero
4. [adversarial: type confusion] amount as a string "100", expect TypeError
5. [adversarial: boundary inputs] amount = 0, expect zero (not negative)
6. [adversarial: boundary inputs] rate < 0, expect ValueError
7. [adversarial: type confusion] rate as None, expect TypeError
R15 contributed scenarios 4-7 (the four [adversarial: ...] ones).
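As an illustration, scenarios 4-7 might compile into tests like the following. The apply_discount implementation here is a hypothetical stand-in, assuming it validates types and rate range; the real services/billing.py may differ:

```python
# Hypothetical apply_discount; the real services/billing.py may differ.
def apply_discount(amount, rate):
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        raise TypeError("amount must be a number")
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise TypeError("rate must be a number")
    if rate < 0 or rate > 1:
        raise ValueError("rate must be in [0, 1]")
    return amount * (1 - rate)

def expect_raises(exc, fn, *args):
    # Helper: assert that fn(*args) raises exc.
    try:
        fn(*args)
    except exc:
        return
    raise AssertionError(f"expected {exc.__name__}")

expect_raises(TypeError, apply_discount, "100", 0.2)   # scenario 4
assert apply_discount(0, 0.2) == 0                     # scenario 5
expect_raises(ValueError, apply_discount, 100, -0.1)   # scenario 6
expect_raises(TypeError, apply_discount, 100, None)    # scenario 7
```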
A /tailtest hunt invocation¶
> /tailtest hunt services/billing.py
SCENARIO PLAN (not final test code):
1. [adversarial: boundary inputs] amount = MAX_INT, rate = 0.99 (overflow check)
2. [adversarial: boundary inputs] amount = -1, expect ValueError or correct sign handling
3. [adversarial: type confusion] amount = decimal.Decimal('100.00'), rate = float (mixed-type arithmetic)
4. [adversarial: format / injection] amount = '100; DROP TABLE billing', expect TypeError before any persistence
5. [adversarial: time / locale edges] amount expressed as locale-formatted string '1.000,50' (European), expect TypeError
6. [adversarial: off-by-one] amount = 0.0001, rate = 1.0 (full discount on near-zero), expect 0
7. [adversarial: resource exhaustion] amount = MAX_INT * 1000 (exceeds float64 precision)
8. [adversarial: error handling] rate = float('nan'), expect ValueError or NaN-clean handling
9. [adversarial: error handling] rate = float('inf'), expect ValueError
Skipped category 4 (concurrent state): function is pure with no shared state.
Hunt writes the test file at tests/test_billing_hunt.py and runs it.
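A sketch of how scenarios 8 and 9 might read in the generated hunt file, assuming a hypothetical apply_discount that rejects non-finite rates (the real function may handle NaN/inf differently):

```python
import math

# Hypothetical implementation that rejects non-finite rates;
# the real services/billing.py may behave differently.
def apply_discount(amount, rate):
    if not math.isfinite(rate):
        raise ValueError("rate must be finite")
    return amount * (1 - rate)

def test_nan_rate_rejected():  # scenario 8
    try:
        apply_discount(100, float("nan"))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for NaN rate")

def test_inf_rate_rejected():  # scenario 9
    try:
        apply_discount(100, float("inf"))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for inf rate")

test_nan_rate_rejected()
test_inf_rate_rejected()
```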
Configuration recap¶
| Where | Setting | Effect |
|---|---|---|
| .tailtest/config.json | "depth": "simple" | Disables R15; no adversarial scenarios |
| .tailtest/config.json | "depth": "standard" (default) | R15 contributes 2+ adversarial scenarios per test |
| .tailtest/config.json | "depth": "thorough" | R15 contributes 4+ adversarial scenarios per test |
| .tailtest/config.json | "depth": "adversarial" | All 8-12 scenarios are adversarial-biased |
| Slash / skill | /tailtest hunt <file> | One-shot adversarial pass on <file>, separate test file, regardless of depth |
See also: Configuration for the full .tailtest/config.json schema.