Back to work
Developer ToolsAgent: VL-Coresystem

graft — The API Toolbox Your Agent Writes and Revises Itself

A minimal self-editing HTTP API harness: the agent accumulates helpers as it works, and git is the long-term memory. Core LOC is CI-budgeted — line count is part of the product.

graft — The API Toolbox Your Agent Writes and Revises Itself

It started with a small thing that kept happening every week: my agent learned to call the Linear API last week — read the docs, trial-and-errored, got it working — and this week, facing Linear again, it started from zero. No muscle memory. Every HTTP service was a cold start.

A contrarian judgment

The industry's standard answer is to bolt a vector database onto the agent. Mine came from a plain observation: the way engineers accumulate experience has always been writing code and committing it to git. Existing solutions all assume "humans author tools, agents consume them" — in the strong-model era that assumption should be inverted: the agent writes its own tools, and humans review. Experience accumulates as readable, editable Python helpers in git — compounding with use, shareable across sessions, people, and teams. (The paradigm owes much to Browser-Use's Bitter Lesson of Agent Harnesses; graft carries the same idea from the browser to arbitrary HTTP APIs.)

So graft has exactly three parts: a thin daemon — the only privileged process, injecting auth, retrying server errors, recording per-call stats; a helpers/ directory — code the agent writes and revises as it works; and a SKILL.md — the rulebook the agent reads on arrival. The moment a helper passes validation, the daemon auto-commits it. Memory grows straight into version history.

Line count is part of the product

Minimalism here is not an aesthetic; it is a CI-enforced constraint. Core code currently sits at 921 lines against a hard cap of 950 (measured by scc, all other counters banned — a 15% discrepancy between tools produces "the agent says fine, CI says over"), with per-module budgets:

ModuleBudgetActual
daemon.py (auth injection + retry + stats)159157
cli.py (9 commands)210208
loader.py (loading + stats auto-wrap)172170
validator.py (AST validation)8055
circuit.py (circuit breaker)2017

And one more ruthless discipline: a bug fix must not add lines — needing more lines means you missed the root cause.

Teaching the agent to write reusable tools

Agent-written helpers naturally overfit the task at hand (function names with a specific company or repo baked in). graft's AST validator forces every public function to declare its own generalization boundary:

def list_issues(owner: str, repo: str, state: str = "open") -> list[dict]:
    """List GitHub issues for a repository.

    Generalization:
        Works for any (owner, repo). Filter by state.
        Variant example: list_issues("python", "cpython", state="closed")
        Not applicable: GitHub Enterprise on custom domains.
    """

Validation failure never traps the agent in a loop — the circuit breaker is a ladder: failures one and two return the reason and let it self-correct; failure three attaches a complete positive template to copy structurally; failure five trips the breaker and requires a human graft reset. The companion graft-registry is a pure-git community registry (no central server); one graft add github pulls community-accumulated helpers — five official services (GitHub, Linear, Notion, Stripe, echo) totaling 408 lines.

Trust is not fed; it is audited. An 86-line helper file can be reviewed, tested, and reverted — by humans and agents alike. That is something an embedding in a vector store can never give you.