Why this kit, and why now.
What this kit gives you
A working debug stack for agent runs: a tiny CLI plus three Claude skills that turn "the agent went weird around step 7" into a method.
Why this is the kit
Agent debugging is the worst part of agent dev. You watch a run, the agent veers, you scroll back through 200 lines of trace, you guess at what prompt sent it sideways, you rerun, you guess again. Hours.
This kit replaces the guess-and-rerun loop with a three-step method:
- Trace. The agent log is parsed into a structured timeline. Every prompt, every tool call, every output.
- Bisect. Replay from any step with one parameter changed. The failure typically isolates in 3 to 4 replays instead of 20 to 30 reruns.
- Lock. The fix becomes a Cursor rule. The failure class never recurs in your project.
That is the loop. It is not magic; it is method.
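To see why bisecting needs so few replays, here is a minimal Python sketch of the idea. It is not the kit's implementation. The names `bisect_culprit` and `replay_succeeds` are hypothetical, and the sketch assumes that replaying with the changed parameter fixes the run only when the replay starts at or before the step that caused the failure.

```python
from typing import Callable, Tuple


def bisect_culprit(n_steps: int, replay_succeeds: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (culprit_step, replays_used) for a trace of n_steps steps.

    Assumption (hypothetical callback): replay_succeeds(start) replays the run
    from step `start` with one parameter changed and reports whether the run
    now passes. It must be True for every start <= culprit and False after,
    and a replay from step 0 must pass.
    """
    lo, hi = 0, n_steps - 1  # the culprit step lies somewhere in [lo, hi]
    replays = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2  # upper middle, so the range always shrinks
        replays += 1
        if replay_succeeds(mid):
            lo = mid  # the change still helps from mid: culprit is at or after mid
        else:
            hi = mid - 1  # too late to help: culprit is before mid
    return lo, replays


# Stand-in for a real replay: pretend the run went sideways at step 7 of 12.
result = bisect_culprit(12, lambda start: start <= 7)
print(result)  # (7, 4)
```

Each replay halves the range of suspect steps, so a 12-step run needs at most 4 replays. Rerunning from scratch and guessing has no such bound.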
Who this is for
- You ship agent workflows in production and you have lost a Friday afternoon to "what did the agent do at step 12".
- You are an indie dev or small-team agent builder. The kit is single-machine; cloud sync is optional and self-hosted.
- You work in Cursor or Claude Code. The kit also works with bare CLI agent loops, with slightly more setup.
What you get
- agent-debug.cli: single-binary CLI. Watches your agent session, writes a structured trace.
- trace-skills.json: three Claude skills:
  - trace-isolate: reads the trace, finds the deviation point
  - replay-bisect: reruns from a chosen step with one parameter changed
  - rule-lock: writes a Cursor rule that blocks the failure class going forward
“Saved an entire client demo. Worth 10x the price.”