tailtest for Codex CLI

When a Codex turn ends with new or changed source files, tailtest blocks the agent and asks it to write and run tests for them before continuing -- automatically, with no prompting from you.

How the cycle works:

1. When a turn ends, the Stop hook sweeps the project for files whose modification time is newer than the moment the turn started.
2. Any new or changed source files are queued.
3. The hook returns a block decision to Codex, with an instruction to write and run tests for the queued files before continuing.
4. After the tests pass, Codex resumes its normal workflow.
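The block decision in the cycle above can be sketched as a small function. This is a minimal sketch, not tailtest's actual `stop.py`. It assumes the Stop hook prints a JSON object with a `decision` key and a `reason` key; the `reason` key and the name `stop_decision` are assumptions. It also shows one plausible reading of the `stop_hook_active` guard mentioned under Troubleshooting: if Codex is already continuing because of an earlier block, the hook lets the turn end instead of blocking again.

```python
import json
from typing import Dict, List


def stop_decision(queued: List[str], stop_hook_active: bool) -> Dict[str, str]:
    """Build the JSON object a Stop hook could print at the end of a turn.

    Assumption: Codex reads {"decision": "block", "reason": "..."} from the
    hook's stdout, and an empty object lets the turn end normally.
    """
    # Nothing new or changed this turn: no reason to block.
    if not queued:
        return {}
    # Codex is already continuing because of an earlier block; do not block
    # again, so a failed attempt cannot turn into an endless loop.
    if stop_hook_active:
        return {}
    return {
        "decision": "block",
        "reason": "Write and run tests for these files before continuing: "
        + ", ".join(queued),
    }


if __name__ == "__main__":
    print(json.dumps(stop_decision(["src/billing.py"], stop_hook_active=False)))
```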

Requirements

Python 3.9+
Codex CLI 0.120+
codex_hooks = true under [features] in ~/.codex/config.toml
macOS or Linux

Install

# Step 1 (one-time): clone the plugin
git clone https://github.com/avansaber/tailtest-codex ~/.codex/plugins/tailtest

# Step 2 (one-time): enable the hooks feature flag
# Add these TOML lines to ~/.codex/config.toml (do not run them in the shell):
#   [features]
#   codex_hooks = true

# Step 3 (per project): run the init helper inside each project where you want tailtest
cd <your-project>
bash ~/.codex/plugins/tailtest/scripts/init.sh

# Done. Start a codex session in the project.

The init.sh helper writes .codex/hooks.json in your current directory, pointing at the plugin's session_start.py and stop.py. It is idempotent and will not overwrite an existing .codex/hooks.json that has different content; instead, it writes a .codex/hooks.json.tailtest sidecar for you to merge manually. Run it once per project; no global registration is required.
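The overwrite rule for init.sh can be sketched in Python. This is a hypothetical re-expression, not the shell script itself. The function name `install_hooks` is made up. Comparing the files as plain strings is an assumption about how "different content" is judged.

```python
from pathlib import Path


def install_hooks(project: Path, content: str) -> Path:
    """Write .codex/hooks.json idempotently; return the file actually written.

    Hypothetical helper: never clobber a hooks.json whose content differs,
    write a .tailtest sidecar for manual merging instead.
    """
    codex_dir = project / ".codex"
    codex_dir.mkdir(parents=True, exist_ok=True)
    target = codex_dir / "hooks.json"
    if not target.exists():
        target.write_text(content, encoding="utf-8")
        return target
    if target.read_text(encoding="utf-8") == content:
        # Already installed with identical content: nothing to do.
        return target
    # Existing file differs: leave it alone and write the sidecar.
    sidecar = codex_dir / "hooks.json.tailtest"
    sidecar.write_text(content, encoding="utf-8")
    return sidecar
```

Running it twice with the same content is a no-op the second time. Running it against a customized hooks.json leaves that file untouched.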

If you prefer manual setup (skipping the helper script):

mkdir -p <your-project>/.codex
cp ~/.codex/plugins/tailtest/hooks/hooks.json <your-project>/.codex/hooks.json

How it works internally

SessionStart hook -- scans your project for test runners, detects test style, and injects AGENTS.md so Codex knows the test workflow.

Stop hook -- mtime sweep at turn end. Files newer than turn_start_mtime are queued. Codex is blocked until tests are written and run.

AGENTS.md -- the instruction file that drives the entire cycle: scenario selection, test writing, execution, fix loop, and reporting.
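The Stop hook's mtime sweep can be sketched as follows, assuming the hook already knows `turn_start_mtime` as a Unix timestamp. The extension list and the skipped directories are hypothetical defaults chosen for illustration, not tailtest's actual rules.

```python
import os
from pathlib import Path
from typing import List, Set

# Hypothetical defaults: which extensions count as source files and which
# directories are never swept.
SOURCE_EXTS: Set[str] = {".py", ".ts", ".js", ".go", ".rs", ".java"}
SKIP_DIRS: Set[str] = {".git", ".codex", ".tailtest", "node_modules", "__pycache__"}


def sweep(root: Path, turn_start_mtime: float) -> List[str]:
    """Return source files (relative POSIX paths, sorted) modified after turn_start_mtime."""
    changed: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix not in SOURCE_EXTS:
                continue
            if path.stat().st_mtime > turn_start_mtime:
                changed.append(path.relative_to(root).as_posix())
    return sorted(changed)
```

Comparing mtimes avoids keeping a content snapshot of the project. The trade-off is that a file touched without being edited is also queued.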

Complexity scoring

tailtest scores every queued file before generating scenarios. Path signals (auth, billing, payment, checkout) and content patterns (HTTP calls, database queries, branch count, public functions) each add to the score. Files scoring 10 or above get thorough-depth testing (10-15 scenarios) regardless of the session-level depth setting, and Codex sees a reasoning note like "billing: +4 billing +3 HTTP = 12 scenarios". Low-complexity files get 2-3 scenarios. Scoring runs automatically for every queued file; no configuration is needed.
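A toy version of the scoring makes the arithmetic concrete. Only these details come from this page:

- the path words
- the 10-point threshold
- the 10-15 and 2-3 scenario ranges

The `+4 billing` and `+3 HTTP` weights echo the example note. Every other weight and regex, and the middle tier of 5 scenarios, is hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical weights. "+4 billing" and "+3 HTTP" echo the example note;
# every other weight and pattern here is illustrative only.
PATH_SIGNALS = {"auth": 4, "billing": 4, "payment": 4, "checkout": 3}
CONTENT_SIGNALS = [
    ("HTTP", re.compile(r"\b(requests\.|fetch\(|urlopen\()"), 3),
    ("DB", re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b|\.execute\("), 3),
]
BRANCH_RE = re.compile(r"\b(if|elif|for|while|case)\b")
PUBLIC_DEF_RE = re.compile(r"^def [A-Za-z]\w*", re.MULTILINE)  # Python-only heuristic
THOROUGH_THRESHOLD = 10


def score_file(path: str, source: str) -> Tuple[int, List[str]]:
    """Return (score, reasoning notes) for one queued file."""
    score, notes = 0, []
    lowered = path.lower()
    for word, weight in PATH_SIGNALS.items():
        if word in lowered:
            score += weight
            notes.append(f"+{weight} {word}")
    for label, pattern, weight in CONTENT_SIGNALS:
        if pattern.search(source):
            score += weight
            notes.append(f"+{weight} {label}")
    branch_points = len(BRANCH_RE.findall(source)) // 3  # +1 per 3 branch keywords
    if branch_points:
        score += branch_points
        notes.append(f"+{branch_points} branches")
    public_points = min(3, len(PUBLIC_DEF_RE.findall(source)))  # capped at +3
    if public_points:
        score += public_points
        notes.append(f"+{public_points} public functions")
    return score, notes


def scenario_count(score: int) -> int:
    """Map a score to a number of scenarios."""
    if score >= THOROUGH_THRESHOLD:
        return min(15, score)  # thorough depth: 10-15 scenarios
    if score <= 3:
        return 2 if score <= 1 else 3  # low complexity: 2-3 scenarios
    return 5  # hypothetical middle tier; this page does not specify it
```

Consider a file at src/billing/charge.py that calls `requests.post`, has no branches, and defines one public function. In this sketch it scores 4 + 3 + 1 = 8, just under the thorough threshold.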

Scenario tracking

At turn end, tailtest logs the outcome for each tested file: passed, fixed (failed but resolved within the turn), unresolved, or deferred. This log feeds cross-session history, so recurring failures are surfaced at the start of future sessions. See the History and Advanced pages for details.
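One way to record these outcomes and surface recurring failures is an append-only JSON Lines log. All of the following are assumptions for illustration:

- the log location
- the record fields
- the rule that "recurring" means unresolved in two or more distinct sessions

See the History page for what tailtest actually stores.

```python
import json
from enum import Enum
from pathlib import Path
from typing import Dict, List, Set


class Outcome(str, Enum):
    PASSED = "passed"
    FIXED = "fixed"  # failed at first, resolved within the same turn
    UNRESOLVED = "unresolved"
    DEFERRED = "deferred"


def log_outcome(log: Path, session: str, file: str, outcome: Outcome) -> None:
    """Append one JSON line per tested file (hypothetical history format)."""
    log.parent.mkdir(parents=True, exist_ok=True)
    record = {"session": session, "file": file, "outcome": outcome.value}
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def recurring_failures(log: Path, min_sessions: int = 2) -> List[str]:
    """Files left unresolved in at least `min_sessions` distinct sessions."""
    if not log.is_file():
        return []
    sessions: Dict[str, Set[str]] = {}
    for line in log.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        if record["outcome"] == Outcome.UNRESOLVED.value:
            sessions.setdefault(record["file"], set()).add(record["session"])
    return sorted(f for f, ids in sessions.items() if len(ids) >= min_sessions)
```

Counting distinct sessions, rather than log lines, keeps a single bad session from flagging a file as a recurring failure.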

Configuration

Create .tailtest/config.json in your project root (optional):

{
  "depth": "standard"
}

See Configuration for all options.
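The `depth` setting in .tailtest/config.json interacts with the complexity override described under Complexity scoring. A sketch of how the two might combine, under two assumptions:

- `standard` is the default when the file or key is absent.
- High-complexity files switch to `thorough`.

Both helper names are hypothetical.

```python
import json
from pathlib import Path

DEFAULT_DEPTH = "standard"  # assumption: used when config.json or "depth" is missing
THOROUGH_THRESHOLD = 10     # score at which the override kicks in (from this page)


def load_depth(project: Path) -> str:
    """Read the session-level depth from .tailtest/config.json, if present."""
    cfg = project / ".tailtest" / "config.json"
    if not cfg.is_file():
        return DEFAULT_DEPTH
    data = json.loads(cfg.read_text(encoding="utf-8"))
    return str(data.get("depth", DEFAULT_DEPTH))


def effective_depth(session_depth: str, score: int) -> str:
    """High-complexity files get thorough depth whatever the config says."""
    return "thorough" if score >= THOROUGH_THRESHOLD else session_depth


if __name__ == "__main__":
    print(effective_depth(load_depth(Path.cwd()), score=12))
```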

Commands

| Command | What it does |
| --- | --- |
| `/tailtest <file>` | Manually queue a specific file |
| `/summary` | Print session test results |
| `/tailtest off` | Pause automatic test generation |
| `/tailtest on` | Resume after pausing |

Troubleshooting

No tests after Codex writes a file: Check that codex_hooks = true is set under [features] in ~/.codex/config.toml. Without this flag, Codex does not run hooks, so the Stop hook never fires.

Codex seems stuck in a loop: tailtest uses a stop_hook_active guard to prevent repeated blocking. If you still see repeated test prompts without progress, verify that hooks.json is at .codex/hooks.json in your project root, not in a subdirectory.

Go/Rust/Java not queuing: These languages require a detected runner. If go.mod, Cargo.toml, or a Maven/Gradle build file is not found, files in that language are silently skipped.

Windows: Codex hooks are not supported on Windows.
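A small diagnostic script can check three of the problems listed above: the feature flag, the location of hooks.json, and missing build files for Go, Rust, and Java. It is not part of tailtest. The marker-file table is an assumption based on the Go/Rust/Java note. The script uses `tomllib` on Python 3.11+, or the third-party `tomli` package on 3.9 and 3.10.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.9/3.10: pip install tomli
    import tomli as tomllib
from pathlib import Path
from typing import List

# Hypothetical gate: marker files a language needs before its files are queued.
LANGUAGE_MARKERS = {
    ".go": ("go.mod",),
    ".rs": ("Cargo.toml",),
    ".java": ("pom.xml", "build.gradle", "build.gradle.kts"),
}


def hooks_flag_enabled(config_path: Path) -> bool:
    """True if config.toml sets codex_hooks = true under [features]."""
    if not config_path.is_file():
        return False
    with config_path.open("rb") as fh:  # tomllib requires a binary file
        data = tomllib.load(fh)
    return data.get("features", {}).get("codex_hooks") is True


def diagnose(project: Path, config_path: Path) -> List[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems: List[str] = []
    if not hooks_flag_enabled(config_path):
        problems.append(f"codex_hooks = true is not set under [features] in {config_path}")
    if not (project / ".codex" / "hooks.json").is_file():
        problems.append(f"no .codex/hooks.json in {project}")
    for suffix, markers in LANGUAGE_MARKERS.items():
        has_files = any(project.rglob(f"*{suffix}"))
        if has_files and not any((project / m).is_file() for m in markers):
            problems.append(f"{suffix} files will be skipped: none of {', '.join(markers)} found")
    return problems


if __name__ == "__main__":
    issues = diagnose(Path.cwd(), Path.home() / ".codex" / "config.toml")
    print("\n".join(issues) if issues else "No problems found.")
```

Run it from the project root. It only reads files and changes nothing.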