Test History

Tailtest keeps a persistent record of every file it has tested. This record survives across sessions and gives tailtest context before it writes a single line of tests.

Where the file lives

.tailtest/history.json

The path is the same on all three platforms (Claude Code, Cursor, Codex). To keep the file out of version control, add `.tailtest/` to your `.gitignore`.

What gets tracked

Every time tailtest finishes testing a file, it writes one entry to history.json. Each entry has:

| Field | What it contains |
| --- | --- |
| `file` | Relative path to the source file |
| `status` | Outcome: `passed`, `fixed`, `unresolved`, or `deferred` |
| `session_id` | ID of the session in which this result was recorded |
| `timestamp` | UTC timestamp |
| `classification` | How this result compares to prior history (see below) |
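
As a rough illustration of the fields above, the sketch below builds one hypothetical entry and checks it against the documented field names and status values. The example values, the ISO 8601 timestamp format, and the assumption that `history.json` holds a top-level JSON array are not stated in this page; `is_well_formed` and `load_history` are illustrative helper names, not part of tailtest.

```python
import json
from pathlib import Path

# Field names and status values as listed in the table above.
REQUIRED_FIELDS = {"file", "status", "session_id", "timestamp", "classification"}
STATUSES = {"passed", "fixed", "unresolved", "deferred"}

# Hypothetical entry: every value here is invented for illustration.
EXAMPLE_ENTRY = {
    "file": "src/billing.py",
    "status": "unresolved",
    "session_id": "session-003",
    "timestamp": "2025-01-15T10:30:00Z",  # assumed ISO 8601 UTC format
    "classification": "regression",
}


def is_well_formed(entry: dict) -> bool:
    """Return True if the entry has every documented field and a documented status."""
    return REQUIRED_FIELDS.issubset(entry) and entry["status"] in STATUSES


def load_history(path: str = ".tailtest/history.json") -> list:
    """Read the history file, assuming it holds a JSON array of entries."""
    history_path = Path(path)
    if not history_path.exists():
        return []
    return json.loads(history_path.read_text(encoding="utf-8"))
```

A script that inspects the history file could filter with `is_well_formed` before relying on any field.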

Classification

Each entry is classified based on the file's history up to that point:

| Classification | Meaning |
| --- | --- |
| `gap` | This file has never been tested before. First time tailtest has seen it. |
| `passed` | Tests passed this session, and the file has prior history. |
| `fixed` | Tests were failing but were resolved within the same session. |
| `regression` | The file was passing in the most recent prior session. Now it is failing. |

Classifications are assigned automatically when the session ends. They appear in the history file and in session reports.
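
One way to read the classification rules is as a small decision function. This is a sketch under stated assumptions, not tailtest's actual logic: the order of the checks, the choice to treat a prior `passed` or `fixed` status as "passing", and the `None` result for cases this page does not cover are all guesses. `classify` is a hypothetical name.

```python
from typing import Optional

# Assumption: a prior session counts as "passing" if it ended in one of these.
PASSING_STATUSES = {"passed", "fixed"}


def classify(status: str, prior_statuses: list) -> Optional[str]:
    """Classify a result given the file's final status in earlier sessions.

    prior_statuses is ordered oldest first; an empty list means the file
    has no history yet.
    """
    if not prior_statuses:
        return "gap"  # never tested before
    if status == "fixed":
        return "fixed"  # failing, then resolved within the same session
    if status == "passed":
        return "passed"  # passed, and the file has prior history
    if status == "unresolved" and prior_statuses[-1] in PASSING_STATUSES:
        return "regression"  # was passing last time, failing now
    return None  # case not described in the table (assumption)
```

For instance, `classify("unresolved", ["passed"])` yields `"regression"`, while a first-ever result yields `"gap"` regardless of its status.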

Recurring failure detection

If the same file fails across three or more different sessions, tailtest flags it as a recurring problem.

"Different sessions" means distinct session_id values. If a file fails ten times within a single session while tailtest tries to fix it, that counts as one session, not ten. The threshold is met only when the file has failed and remained unresolved across three separate work sessions.

Files that meet this threshold are surfaced at the next session start so the agent knows about them before it begins.
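
The distinct-session rule can be sketched as a set count per file. The code below assumes that "failed" corresponds to entries whose `status` is `unresolved` and that entries are dictionaries with the documented fields; `recurring_failures` is an illustrative name, not a tailtest function.

```python
from collections import defaultdict

RECURRING_THRESHOLD = 3  # "three or more different sessions"


def recurring_failures(entries: list) -> list:
    """Return files that were left unresolved in at least three distinct sessions."""
    sessions_by_file = defaultdict(set)
    for entry in entries:
        # Assumption: an "unresolved" status is what counts as a failure here.
        if entry["status"] == "unresolved":
            # A set collapses repeated failures within one session into one.
            sessions_by_file[entry["file"]].add(entry["session_id"])
    return sorted(
        path
        for path, sessions in sessions_by_file.items()
        if len(sessions) >= RECURRING_THRESHOLD
    )
```

Because session IDs are collected in a set, ten failures of one file inside a single session still contribute only one session to the count.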

Cross-session context at startup

When you open a new session, tailtest reads history.json and prepends a context note before the agent starts working. This note contains:

Recurring failures: files that have failed in three or more sessions
Recent regressions: files that were passing before but failed in a recent session

For example, the agent might start with: "Recurring failures across sessions: billing.py. Recent regressions: checkout.py (was passing, now failing)."

This means the agent already knows which files are trouble spots before it writes any tests.
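
To make the shape of the note concrete, here is a minimal sketch that assembles a string like the example above from two lists of file paths. The exact wording tailtest emits, and how it selects "recent" regressions, are not specified here; `build_context_note` is a hypothetical helper that simply mirrors the quoted example.

```python
def build_context_note(recurring: list, regressions: list) -> str:
    """Assemble a startup note from recurring failures and recent regressions."""
    parts = []
    if recurring:
        parts.append("Recurring failures across sessions: " + ", ".join(recurring) + ".")
    if regressions:
        listed = ", ".join(f"{path} (was passing, now failing)" for path in regressions)
        parts.append("Recent regressions: " + listed + ".")
    # An empty string means there is nothing to report.
    return " ".join(parts)


note = build_context_note(["billing.py"], ["checkout.py"])
```

In this sketch a category with no files is omitted from the note, which is a design choice made here rather than documented behavior.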

Limits

history.json is capped at 1000 entries. When a new session would push the total over that limit, the oldest entries are dropped first.
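
The cap behaves like a fixed-size window over the entry list. The sketch below assumes entries are stored oldest first in a plain list, which this page does not confirm; `append_with_cap` is an illustrative name.

```python
MAX_ENTRIES = 1000  # cap stated above


def append_with_cap(history: list, new_entries: list, limit: int = MAX_ENTRIES) -> list:
    """Append new entries and keep only the newest `limit`, dropping the oldest first.

    Assumes `history` is ordered oldest first and `limit` is positive.
    """
    combined = history + new_entries
    return combined[-limit:]
```

With 999 existing entries and 2 new ones, the result holds 1000 entries and the single oldest entry is gone.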