AI Agent Tools — Architecture Guardrails

The AI agent panel's tools live under backend/src/ai/tools/. They are the LLM's entry points for read and write operations. The architectural rule is simple:

Tools must not mutate the database directly. Every write goes through a canonical service in backend/src/domain/agentOps/ (Layer-3, in progress) so business invariants are enforced once, regardless of whether the call comes from an HTTP route or the AI panel.

Two automated guardrails enforce this:

Layer 1 — Arch test

backend/tests/architecture/no-prisma-writes-in-ai-tools.test.ts

Scans every .ts file under src/ai/tools/ for:

  • Prisma write methods (create, createMany, update, updateMany, upsert, delete, deleteMany).
  • Prisma escape-hatches ($transaction, $queryRaw, $queryRawUnsafe, $executeRaw, $executeRawUnsafe).

Any hit in a file that is not in AMNESTY fails CI. Amnesty entries represent files known to bypass the rule today, scheduled for Layer-3 extraction — the list should shrink to zero, not grow. A third assertion emits an advisory log line when an amnesty file is now clean and ready to be removed from the list.
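As a rough illustration of how such a scan can work, the sketch below checks in-memory source strings against the two method lists above and applies an amnesty list. It is not the contents of the real test file: the names `scanFile`, `checkTools`, and `Hit`, the regex, and the return shape are all assumptions made for this example. A plain regex like this would also flag unrelated calls such as `.delete(` on a `Map`, so treat it as a minimal sketch of the idea only.

```typescript
// Illustrative sketch only: names, regex, and return shape are assumptions,
// not the contents of no-prisma-writes-in-ai-tools.test.ts.
const WRITE_METHODS = ["create", "createMany", "update", "updateMany", "upsert", "delete", "deleteMany"];
const ESCAPE_HATCHES = ["$transaction", "$queryRaw", "$queryRawUnsafe", "$executeRaw", "$executeRawUnsafe"];

interface Hit {
  file: string;
  line: number;
  method: string;
}

// Escape regex metacharacters so "$transaction" matches a literal "$".
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Matches ".method(", ".method`" (tagged template) and ".method<" (generic call).
const FORBIDDEN_SOURCE =
  "\\.(" + WRITE_METHODS.concat(ESCAPE_HATCHES).map(escapeRegExp).join("|") + ")\\s*[(`<]";

// Return every forbidden call found in one file's source text.
function scanFile(file: string, source: string): Hit[] {
  const hits: Hit[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const re = new RegExp(FORBIDDEN_SOURCE, "g"); // fresh regex so lastIndex starts at 0
    let m: RegExpExecArray | null;
    while ((m = re.exec(lines[i])) !== null) {
      hits.push({ file, line: i + 1, method: m[1] });
    }
  }
  return hits;
}

// Hits outside the amnesty list are violations; amnesty files with no hits
// are reported separately so they can be removed from the list.
function checkTools(files: Record<string, string>, amnesty: string[]) {
  const violations: Hit[] = [];
  const cleanAmnesty: string[] = [];
  for (const file of Object.keys(files)) {
    const hits = scanFile(file, files[file]);
    const exempt = amnesty.indexOf(file) !== -1;
    if (exempt && hits.length === 0) cleanAmnesty.push(file);
    if (!exempt) violations.push(...hits);
  }
  return { violations, cleanAmnesty };
}
```

In a test runner, the two results would map onto a failing assertion (`violations` must be empty) and an advisory log line (`cleanAmnesty`).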

Run:

cd backend
npx vitest run tests/architecture/no-prisma-writes-in-ai-tools.test.ts

Layer 2 — Inventory script

backend/scripts/inventory-ai-tool-writes.ts

Statically walks the same tree and emits a markdown report at docs/ai-agent-fab/AI_TOOL_DB_INVENTORY.md listing every write, read, escape-hatch, and prisma-client import. The punch list at the top is the Layer-3 migration backlog.

Re-run after any change under src/ai/tools/:

cd backend
npx tsx scripts/inventory-ai-tool-writes.ts
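To make the report's shape concrete, here is a minimal sketch of classifying Prisma calls and rendering a punch list followed by a full table. The real script's output format is not shown in this guide, so the column names, the checkbox punch list, the `Row` type, and the exact read-method list are assumptions for illustration.

```typescript
// Illustrative sketch only: the real inventory script's format may differ.
type Kind = "WRITE" | "READ" | "ESCAPE";

interface Row {
  file: string;
  line: number;
  kind: Kind;
  call: string;
}

const WRITES = ["create", "createMany", "update", "updateMany", "upsert", "delete", "deleteMany"];
const READS = ["findUnique", "findFirst", "findMany", "count", "aggregate", "groupBy"]; // assumed list
const ESCAPES = ["$transaction", "$queryRaw", "$queryRawUnsafe", "$executeRaw", "$executeRawUnsafe"];

// Map a method name to its inventory category, or null if it is not tracked.
function classify(method: string): Kind | null {
  if (WRITES.indexOf(method) !== -1) return "WRITE";
  if (READS.indexOf(method) !== -1) return "READ";
  if (ESCAPES.indexOf(method) !== -1) return "ESCAPE";
  return null;
}

// Render the punch list (writes and escape-hatches) followed by every row.
function renderReport(rows: Row[]): string {
  const out: string[] = ["## Punch list", ""];
  for (const r of rows) {
    if (r.kind !== "READ") out.push("- [ ] " + r.file + ":" + r.line + " " + r.call);
  }
  out.push("", "| Kind | File | Line | Call |", "| --- | --- | --- | --- |");
  for (const r of rows) {
    out.push("| " + r.kind + " | " + r.file + " | " + r.line + " | " + r.call + " |");
  }
  return out.join("\n");
}
```

Keeping reads out of the punch list but in the table mirrors the priority order described in this guide: writes first, reads tracked for a later pass.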

What about reads?

Reads (findUnique, findMany, aggregate, etc.) are NOT blocked by Layer 1. They will be the focus of a later pass — the priority order is writes (state-changing, invariant-laden) before reads (mostly downline scope filters that already live in helper functions). Track read access through the inventory's READ rows.

Layer 4 — Parity tests

backend/scripts/parity-layer3.mjs

For every extracted domain operation, the parity script runs the SAME logical scenario through two paths:

  • Path A — direct domain call (the shape HTTP routes will use post-migration).
  • Path B — through the AI tool executor (executeGivePoints, executeCreatePlayer, executeEditLimit).

Both must produce byte-identical DB state. Any divergence means the AI tool's translation layer (its argument unpacking, defaults, side effects) has drifted from the canonical service. Layer 1 prevents the tool from inventing its own write path; Layer 4 proves the tool's argument plumbing is correct.

Each test creates a throw-away fixture (a timestamp-suffixed user) so repeated runs don't clash and the script stays idempotent. It runs the scenario through Path A, captures the diff, restores the fixture, runs Path B, captures that diff, asserts the two diffs are equal, and then cleans up.
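The two-path comparison can be sketched against an in-memory store. This is only an assumption-laden illustration: the real script runs against the database and restores the fixture between paths, whereas this version builds a fresh copy of the fixture for each path. The signatures of `creditPlayer`, `debitPlayer`, and `executeGivePoints`, the `Db` type, and the `runParity` helper are all hypothetical.

```typescript
// Illustrative sketch only: signatures and the in-memory "DB" are assumptions.
interface Player {
  balance: number;
  creditLimit: number;
}
type Db = Record<string, Player>;

// Hypothetical canonical domain operations (Path A calls these directly).
function creditPlayer(db: Db, userId: string, amount: number): void {
  if (amount <= 0) throw new Error("amount must be positive");
  db[userId].balance += amount;
}

function debitPlayer(db: Db, userId: string, amount: number): void {
  if (amount <= 0) throw new Error("amount must be positive");
  if (db[userId].balance < amount) throw new Error("insufficient balance");
  db[userId].balance -= amount;
}

// Hypothetical AI tool executor (Path B): unpacks arguments, then delegates.
function executeGivePoints(db: Db, args: { userId: string; points: number }): void {
  if (args.points > 0) creditPlayer(db, args.userId, args.points);
  else debitPlayer(db, args.userId, -args.points);
}

type Path = (db: Db, userId: string) => void;

// Run both paths against identical fixtures and compare serialized end states.
function runParity(pathA: Path, pathB: Path): boolean {
  const userId = "parity-" + Date.now(); // timestamp-suffixed throw-away user
  const fixture = (): Db => ({ [userId]: { balance: 100, creditLimit: 500 } });
  const dbA = fixture();
  pathA(dbA, userId);
  const dbB = fixture();
  pathB(dbB, userId);
  return JSON.stringify(dbA) === JSON.stringify(dbB);
}

const creditOk = runParity(
  (db, id) => creditPlayer(db, id, 25),
  (db, id) => executeGivePoints(db, { userId: id, points: 25 }),
);
const debitOk = runParity(
  (db, id) => debitPlayer(db, id, 40),
  (db, id) => executeGivePoints(db, { userId: id, points: -40 }),
);
```

If an executor applied its own default or side effect before delegating, the two serialized states would differ and the comparison would return `false`.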

Run inside the BE container:

docker exec -w /app strykr-dev1-backend node scripts/parity-layer3.mjs

Current coverage (6 cases, all passing on dev1):

  • credit parity: creditPlayer ↔ executeGivePoints (positive amount)
  • debit parity: debitPlayer ↔ executeGivePoints (negative amount)
  • createPlayer parity: domain shape + allocation shape
  • editPlayerLimit parity: creditLimit branch (composes creditPlayer)
  • editPlayerLimit parity: minStake policy branch

The first version of these tests caught a real policy divergence: checkUsernameForAi (AI side) rejects dashes; bare createPlayer accepts them. That's by-design AI-tool conservatism, not a domain bug — but the test forced us to notice and document it. That's exactly the value Layer 4 provides.

True route↔tool parity (TODO)

The current parity tests compare "Path A direct domain call" vs "Path B AI tool executor" — both converge on the domain service. They catch AI tool drift. They do NOT yet exercise the HTTP route side, because the routes still have their own inline Prisma writes. Migrating the routes (POST /agent/allocate-to-sub-agent, PATCH /admin/players/:id/credit-limit, POST /agent/invite-player, etc.) to call the same domain services is the next step — at which point we can swap Path A to call the route handler directly and the test becomes a true route↔tool comparison.

What's next

  • Layer 5 — schema invariants: CHECK constraints (balance_points >= 0, credit_limit >= 0, etc.), partial unique indexes for idempotency.
  • Layer 6 — reconciliation job: per-period verification that SUM(Transaction.amount) per user equals current balance_points, double-entry balances to zero, no over-limit balances.
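The core check of the planned reconciliation job, that each user's summed transactions equal their current balance, can be sketched as follows. Since Layer 6 is not built yet, everything here is hypothetical: the `Tx` and `Account` shapes, the `reconcile` name, and the message format. The sketch uses integer points and omits the double-entry and over-limit checks.

```typescript
// Illustrative sketch only: Layer 6 does not exist yet; shapes are assumptions.
interface Tx {
  userId: string;
  amount: number; // signed integer points
}

interface Account {
  userId: string;
  balancePoints: number;
}

// Return one message per account whose ledger sum disagrees with its balance.
function reconcile(accounts: Account[], txs: Tx[]): string[] {
  const sums: Record<string, number> = {};
  for (const tx of txs) {
    sums[tx.userId] = (sums[tx.userId] || 0) + tx.amount;
  }
  const issues: string[] = [];
  for (const a of accounts) {
    const ledger = sums[a.userId] || 0;
    if (ledger !== a.balancePoints) {
      issues.push(a.userId + ": ledger sum " + ledger + " != balance_points " + a.balancePoints);
    }
  }
  return issues;
}

const issues = reconcile(
  [
    { userId: "u1", balancePoints: 70 },
    { userId: "u2", balancePoints: 10 },
  ],
  [
    { userId: "u1", amount: 100 },
    { userId: "u1", amount: -30 },
    { userId: "u2", amount: 5 },
  ],
);
```

Here `u1` reconciles (100 - 30 = 70) while `u2` is reported, because its ledger sums to 5 against a stored balance of 10.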

These are domain-agnostic — they apply identically to credit, settlement, betting, hedging, limits, and anything else added later. We never enumerate invariants by hand; we make it structurally impossible for a tool to bypass the canonical service.