v0.4.0 -- open source

Your AI agent writes code.
tailtest makes sure it works and it's safe.

Tests and 3 security scanners run on every Claude Code edit. Findings flow back so the agent self-corrects. Zero prompts needed.

claude -- ~/project
recommended · plugin only
$ claude plugin marketplace add avansaber/tailtest
$ claude plugin install tailtest@avansaber
optional · standalone CLI (outside Claude Code)
$ pip install tailtester

Apache 2.0 · No telemetry · Works with Claude Code

AI agents write code fast. Who checks if it works?

tailtest does. On every edit. In the same turn.

How it works

Three steps. Zero config. No blocking.

01 -- Claude edits
> Edit src/auth.py
> Add rate limiting to login()
Tool: Edit
file: src/auth.py
status: applied

Your AI agent makes changes.

02 -- tailtest runs
pytest --testmon 14 tests
gitleaks 1 file scanned
semgrep p/default ruleset
osv.dev 0 manifest changes
14/14 passed 0.8s

Tests + security scanners fire automatically.

03 -- Agent self-corrects
additionalContext:
tailtest: 14/14 passed
1 new security issue
Claude reads the finding
and fixes it. Same turn.

Findings flow back. The agent fixes it. You keep working.
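
A minimal sketch of step 03, assuming the summary travels in a PostToolUse-style hook response with an additionalContext field. The field layout follows Claude Code's documented hook output as we understand it; the build_hook_output helper and the finding text are hypothetical, not tailtest's actual payload.

```python
import json


def build_hook_output(passed: int, total: int, findings: list) -> str:
    """Render a hook response that hands a test/scan summary back to the agent."""
    lines = [f"tailtest: {passed}/{total} passed"]
    if findings:
        lines.append(f"{len(findings)} new security issue(s)")
        lines.extend(f"- {finding}" for finding in findings)
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",  # assumed event name for post-edit hooks
            "additionalContext": "\n".join(lines),
        }
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical finding, for illustration only.
    print(build_hook_output(14, 14, ["CWE-798: hardcoded credential in src/auth.py"]))
```

The point of the design: the agent gets plain text it can act on in the same turn, with no extra prompt from you.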

What runs on every edit

Tests and security. Unified. Automatic.

Your tests, faster

Impacted tests only

pytest-testmon, vitest related, and jest --findRelatedTests run only the tests touched by the edit. Fast feedback, every time.
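
A rough sketch of how an edited file could map to one of the three runners named above. The impacted_test_command function and its js_runner parameter are hypothetical; how tailtest actually picks a runner is not shown here.

```python
from pathlib import PurePosixPath


def impacted_test_command(edited_file: str, js_runner: str = "vitest") -> list:
    """Pick an impacted-tests command for the edited file (illustrative only)."""
    suffix = PurePosixPath(edited_file).suffix
    if suffix == ".py":
        # pytest-testmon tracks code-to-test dependencies itself,
        # so no file argument is passed.
        return ["pytest", "--testmon"]
    if suffix in {".js", ".jsx", ".ts", ".tsx"}:
        if js_runner == "jest":
            return ["npx", "jest", "--findRelatedTests", edited_file]
        # --run makes vitest execute once instead of entering watch mode.
        return ["npx", "vitest", "related", edited_file, "--run"]
    raise ValueError(f"no impacted-test runner for {edited_file!r}")


if __name__ == "__main__":
    print(impacted_test_command("src/auth.py"))
    print(impacted_test_command("web/login.ts", js_runner="jest"))
```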

Delta coverage on new lines

Coverage is measured only on lines added or changed by the agent. No noisy baselines, no false alarms.
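
One way to picture delta coverage, as a sketch under stated assumptions: take the lines the edit changed, keep the ones that are executable, and ask how many of those the tests hit. The delta_coverage function and its inputs are hypothetical names for illustration.

```python
from __future__ import annotations


def delta_coverage(changed: set[int], executable: set[int], covered: set[int]) -> float | None:
    """Fraction of changed, executable lines that were executed by the tests.

    Returns None when the edit touched no executable lines (comments, blanks),
    so there is nothing to measure.
    """
    measurable = changed & executable
    if not measurable:
        return None
    return len(measurable & covered) / len(measurable)


if __name__ == "__main__":
    # Lines 10-13 were edited; 13 is a comment, 12 was never executed.
    ratio = delta_coverage({10, 11, 12, 13}, {10, 11, 12, 40}, {10, 11, 40})
    print(f"delta coverage: {ratio:.0%}")
```

Line 40 is covered but untouched by the edit, so it never enters the ratio. That is the "no noisy baselines" part.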

Auto-offer test generation

When an edited function has no coverage, tailtest tells the agent. The agent can write tests in the same session.

Baseline filtering

Pre-existing debt stays silent. Only new issues raised by the current edit reach the agent. No noise from old code.
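
A minimal sketch of baseline filtering, assuming findings are compared by a fingerprint of rule, file, and offending snippet. The Finding class and new_findings function are hypothetical; tailtest's real baseline format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    path: str
    snippet: str

    def fingerprint(self) -> tuple:
        # Keyed on the snippet rather than a line number, so code that merely
        # shifts up or down does not resurface as "new".
        return (self.rule_id, self.path, self.snippet.strip())


def new_findings(current: list, baseline: list) -> list:
    """Return only the findings that are absent from the recorded baseline."""
    known = {finding.fingerprint() for finding in baseline}
    return [finding for finding in current if finding.fingerprint() not in known]


if __name__ == "__main__":
    old = Finding("CWE-798", "src/legacy.py", 'API_KEY = "abc"')
    fresh = Finding("CWE-798", "src/auth.py", 'TOKEN = "xyz"')
    for finding in new_findings([old, fresh], baseline=[old]):
        print(finding.path, finding.rule_id)
```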

Security, built in

Secrets: gitleaks per-file scan

Catches hardcoded API keys, tokens, and credentials on every file change. Tagged with CWE-798 so the agent knows the fix.

SAST: Semgrep p/default ruleset

Static analysis runs against the changed files using the default community ruleset. Configurable for your own rules.

SCA: OSV.dev with CVSS scores

Open source vulnerability database checks dependencies on every manifest change. Includes CVSS severity, CWE IDs, and fix-version hints.
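
For a sense of what a dependency check involves, here is a sketch that turns pinned requirements into OSV.dev query bodies. The endpoint and body shape follow the public OSV v1 query API; the helper names are hypothetical, only name==version pins are handled, and nothing is sent over the network.

```python
OSV_QUERY_URL = "https://api.osv.dev/v1/query"  # public OSV endpoint (POST, JSON body)


def osv_query_body(name: str, version: str, ecosystem: str = "PyPI") -> dict:
    """Build the JSON body for a single package/version lookup."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def queries_from_requirements(text: str) -> list:
    """Turn 'name==version' lines into OSV query bodies; skip everything else."""
    queries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or non-requirement line: out of scope for this sketch
        name, version = line.split("==", 1)
        queries.append(osv_query_body(name.strip(), version.strip()))
    return queries


if __name__ == "__main__":
    manifest = "jinja2==2.4.1  # pinned\nrequests\n"
    for body in queries_from_requirements(manifest):
        print(OSV_QUERY_URL, body)
```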

HTML report at .tailtest/reports/

A full-session report is written locally on every run. No cloud upload. No account required. Just open the file.

AI red team: 64-attack catalog

At paranoid depth, tailtest fires 64 curated adversarial prompts against your agent's entry points. 8 OWASP LLM Top 10 categories. Findings go to a timestamped HTML report.

Depth modes

You choose how deep it goes

off (shipped)

Nothing runs. Use when you're mid-refactor and don't want noise.

Runs: No checks. Silent.

quick (shipped)

Smoke tests only. Finds broken builds in under 30 seconds.

Runs: Impacted tests only. No scanners.

standard (default, shipped)

Impacted tests plus all 3 security scanners. The default.

Runs: Impacted tests + gitleaks + semgrep + osv.dev.

thorough (shipped)

Full test suite plus deeper analysis.

Runs: Full suite + all 3 scanners + coverage report.

paranoid (shipped)

Everything in thorough plus a 64-attack AI red team against your agent entry points.

Runs: Full suite + scanners + coverage + AI red team (64 attacks, 8 OWASP LLM categories).

# .tailtest/config.yaml

depth: standard # off | quick | standard | thorough | paranoid
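
The depth table above, restated as a lookup. This is a sketch only: the check names (impacted_tests, ai_red_team, and so on) are hypothetical labels, and parse_depth is a stand-in line parser, not tailtest's config loader.

```python
# Mapping taken from the "Runs:" lines of each depth mode; labels are illustrative.
SCANNERS = ["gitleaks", "semgrep", "osv"]
DEPTH_CHECKS = {
    "off": [],
    "quick": ["impacted_tests"],
    "standard": ["impacted_tests"] + SCANNERS,
    "thorough": ["full_suite"] + SCANNERS + ["coverage_report"],
    "paranoid": ["full_suite"] + SCANNERS + ["coverage_report", "ai_red_team"],
}


def parse_depth(config_text: str, default: str = "standard") -> str:
    """Read the 'depth:' key from config text, ignoring '#' comments."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("depth:"):
            value = line.split(":", 1)[1].strip()
            if value not in DEPTH_CHECKS:
                raise ValueError(f"unknown depth {value!r}")
            return value
    return default


def checks_for(depth: str) -> list:
    """Return the checks that would run at the given depth."""
    return list(DEPTH_CHECKS[depth])


if __name__ == "__main__":
    text = "depth: standard # off | quick | standard | thorough | paranoid\n"
    print(checks_for(parse_depth(text)))
```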

How it stacks up

tailtest vs. tools you already know

Only tailtest closes the loop: tests fire automatically on every AI edit, security runs alongside them, and every finding goes straight back to the agent.

Tools compared:
tailtest (this tool)
pytest-watch (test watcher)
Cursor (AI code editor)
Aider (AI coding assistant)
SonarQube (SAST platform)

Features compared:
Runs on every AI edit (auto, no command needed)
Feeds findings back to the agent (closed loop)
Tests + security in one hook
Secret scanning (CWE-798)
Dependency CVE scanning (SCA)
SAST static analysis
Python + JS/TS support (Python only for one of the compared tools)
Zero config, zero API key
Open source (Apache 2.0) (SonarQube: Community*)

* SonarQube Community Edition is open source under LGPL-3.0, not Apache 2.0.

Distribution

Three ways to run tailtest

CC recommended

Claude Code plugin

Install

claude plugin marketplace add avansaber/tailtest && claude plugin install tailtest@avansaber

-- Full experience. Hot loop hooks and slash commands fire on every edit.
-- On-disk HTML reports at .tailtest/reports/ (no cloud upload, no account).
-- No pip required. One command to install.

MCP Cursor, Windsurf, other IDEs

MCP server

Install

pip install tailtester && tailtest mcp-serve

-- For MCP-aware editors outside Claude Code.
-- Exposes all tailtest tools over the Model Context Protocol.
-- Same scanners, same depth modes, same reports.

CLI CI pipelines, terminal

Standalone CLI

Install

pip install tailtester

-- tailtest run, tailtest scan, tailtest doctor: all work outside any editor.
-- Plug into any CI pipeline. Exit codes are CI-friendly.
-- PyPI package: tailtester. Importable as: import tailtest.


Honest about scope

What tailtest is not

Knowing what a tool does not do is as important as knowing what it does. These are intentional boundaries, not gaps on the roadmap.

What it does

+ Runs your existing tests against AI-made changes -- only the impacted ones, fast.
+ Measures delta coverage on lines the agent added or changed.
+ Scans for hardcoded secrets, SAST issues, and known CVEs on every edit.
+ Baselines pre-existing debt so only new issues reach the agent.
+ Writes a local HTML report per session. Nothing leaves your machine.
+ At paranoid depth, runs a 64-attack adversarial catalog against your agent's entry points.

What it is not

- A mutation testing tool. Use Mutmut or Stryker for mutation coverage.
- A fuzzer. Use Hypothesis or AFL for property-based and fuzz testing.
- A full CodeQL-grade analysis platform. Semgrep p/default is good, but CodeQL has broader query depth.
- A Snyk replacement. OSV.dev coverage is narrower than commercial SCA products.
- A production monitoring tool. No OpenTelemetry integration. This runs at write time, not runtime.
- A license compliance scanner. Dependency licenses are not evaluated.
- A replacement for human code review. It is a fast second pair of eyes for the agent, not a gatekeeper.

0.3s to catch a leaked secret (vs. days in code review)
3 security scanners running on every edit
1169 tests in tailtest's own suite (we eat our own dogfood)
0 configuration required (works out of the box)
v0.4.0 shipped. The brand promise is real.

Get started

Up and running in 60 seconds.

Add the marketplace, install the plugin, restart Claude Code. That's it.

recommended · plugin only
# No pip required. Just add the plugin.
$ claude plugin marketplace add avansaber/tailtest
$ claude plugin install tailtest@avansaber
# Restart Claude Code, then just... code.
# tailtest runs on every edit. No prompts needed.
optional · standalone CLI (for use outside Claude Code)
$ pip install tailtester