AI Agent FAB — Phase 0 Spec (Design Contract)
Scope contract for the agent-panel AI assistant. Sign-off required before any tool is shipped beyond Phase 1 scaffolding. Each later phase ships independently behind a feature flag.
Goals
- Natural-language interface for everything an agent does in the panel today.
- Conversational query, structured-card rendering, role-gated tool dispatch.
- Destructive actions go through a separate confirmation step; the LLM never moves money on first call.
- Page-aware: the assistant knows which agent page is open and pre-fills context.
Non-goals (this iteration)
- No new sport / fixture data — reuse existing match tools.
- No write paths outside the listed action tools.
- No prompt that exposes internal IDs or schema to the agent.
Tool catalog
Each tool: input schema (JSON-Schema fragment), output schema (ToolResult.data shape), allowedRoles, isDestructive, sample request/response.
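The per-tool contract above can be sketched as a TypeScript shape plus a role gate. This is a minimal illustration, not the shipped interface: `ToolDefinition`, `ToolResult`, `Caller`, `canCall`, and `dispatch` are hypothetical names, and the stubbed `agent.getUpline` values are placeholders.

```typescript
// Hypothetical shapes. The spec names allowedRoles, isDestructive and
// ToolResult.data, but does not fix the exact interfaces.
type Role = "agent" | "platform_admin" | "player";

interface Caller {
  id: string;
  role: Role;
}

interface ToolResult<T> {
  success: boolean;
  data?: T;
  error?: string;
}

interface ToolDefinition<In, Out> {
  name: string;
  allowedRoles: Role[];
  isDestructive: boolean;
  // JSON-Schema fragment describing the tool input.
  inputSchema: Record<string, unknown>;
  execute: (args: In, caller: Caller) => Promise<ToolResult<Out>>;
}

// Pure role check, evaluated before any tool code runs.
function canCall<In, Out>(tool: ToolDefinition<In, Out>, role: Role): boolean {
  return tool.allowedRoles.includes(role);
}

async function dispatch<In, Out>(
  tool: ToolDefinition<In, Out>,
  args: In,
  caller: Caller,
): Promise<ToolResult<Out>> {
  if (!canCall(tool, caller.role)) {
    return { success: false, error: `role ${caller.role} may not call ${tool.name}` };
  }
  return tool.execute(args, caller);
}

// agent.getUpline is listed as agent-only in the catalog.
const getUpline: ToolDefinition<Record<string, never>, { cl: number; owed: number; takeNet: number }> = {
  name: "agent.getUpline",
  allowedRoles: ["agent"],
  isDestructive: false,
  inputSchema: { type: "object", properties: {}, additionalProperties: false },
  // Stubbed values for illustration only.
  execute: async () => ({ success: true, data: { cl: 0, owed: 0, takeNet: 0 } }),
};
```

Keeping the role check separate from `execute` means a role-denied call never reaches tool code, which matches the "role-denied" unit-test case in the test plan.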
Read-only tools (Phase 1)
| Tool | Inputs | Output | Roles |
|---|---|---|---|
| `agent.searchPlayers` | `{ query: string, limit?: number }` | `{ players: PlayerRef[] }` | agent, platform_admin |
| `agent.getDownline` | `{ depth?: number }` | `{ tree: DownlineNode[] }` | agent, platform_admin |
| `agent.getTake` | `{ playerId?: string }` | `{ rows: TakeRow[], totals: { take, cl, available } }` | agent, platform_admin |
| `agent.getCreditLimit` | `{ playerId?: string }` | `{ rows: CreditRow[] }` | agent, platform_admin |
| `agent.getLiveExposure` | `{ fixtureId?: string, marketId?: string }` | `{ scopes: ExposureScope[], total }` | agent, platform_admin |
| `agent.getPnL` | `{ from?: ISO, to?: ISO, playerId?: string, fixtureId?: string, marketId?: string }` | `{ rows: PnLRow[], total }` | agent, platform_admin |
| `agent.getCommission` | `{ from?: ISO, to?: ISO, playerId?: string }` | `{ rows: CommissionRow[], total }` | agent, platform_admin |
| `agent.getSettlementHistory` | `{ counterpartyId?: string, from?: ISO, to?: ISO, limit?: number }` | `{ rows: SettlementRow[] }` | agent, platform_admin |
| `agent.getBetHistory` | `{ playerId?: string, fixtureId?: string, status?: string[], from?: ISO, to?: ISO, limit?: number }` | `{ bets: BetRow[], summary }` | agent, platform_admin |
| `agent.getUpline` | `{}` | `{ cl, owed, takeNet }` | agent |
| `agent.getTopMarkets` | `{ period: 'today'\|'week'\|'month', metric: 'pnl'\|'volume'\|'exposure', limit?: number }` | `{ rows }` | agent, platform_admin |
Destructive tools (Phase 3)
| Tool | Inputs | Confirmation card | Roles |
|---|---|---|---|
| `agent.editLimit` | `{ playerId, limitType: 'creditLimit'\|'perClickWin'\|'aggregateDailyWin'\|'minStake', newValue }` | `diff_card` showing before/after | agent (only direct downline) |
| `agent.settlePlayer` | `{ playerId, periodId?: string, mode: 'auto'\|'amount', amount? }` | `settlement_summary` | agent, platform_admin |
| `agent.placeHedgeBet` | `{ fixtureId, marketId, outcomeId, side, stake }` | existing `bet_slip` UI | agent, platform_admin |
Reports (Phase 4)
| Tool | Inputs | Output | Roles |
|---|---|---|---|
| `agent.generateReport` | `{ template: 'per_player_pnl'\|'bets_by_match'\|'settlement_log', format: 'xlsx'\|'pdf', filters }` | `{ artifactId, downloadUrl, expiresAt }` | agent, platform_admin |
| `agent.shareReport` | `{ artifactId, recipient: 'player_id' \| 'self', channel: 'notification' }` | `{ ok: true }` | agent, platform_admin |
UI component contract (additions)
Extends existing UIComponentType enum in backend/src/ai/types.ts:
// New types added for agent FAB:
| 'exposure_card' // live exposure breakdown by scope
| 'pnl_table' // tabular PnL rows + totals
| 'take_table' // take / CL rows
| 'commission_table' // commission rows
| 'settlement_summary' // pre-settlement cash-movement diff
| 'settlement_table' // historical settlements list
| 'bet_table' // tabular bet list (richer than bet_history)
| 'downline_tree' // collapsible agent hierarchy
| 'diff_card' // before/after for limit edits, with confirm
| 'report_artifact' // downloadable report tile
| 'player_picker' // typeahead disambiguation
Each has a corresponding *Data interface and a React renderer on the FE.
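One way to pair each new type with its `*Data` interface is a discriminated union, so that renderers can be checked for exhaustiveness at compile time. The sketch below covers only two of the new types. The field names inside `DiffCardData` and `PnLTableData`, and the `summarise` helper, are assumptions made for illustration.

```typescript
// Hypothetical data shapes; field names are assumed, not taken from types.ts.
interface DiffCardData {
  label: string;
  before: number;
  after: number;
}

interface PnLTableData {
  rows: { label: string; pnl: number }[];
  total: number;
}

// Each UI type is paired with its data interface via the `type` discriminant.
type AgentUIComponent =
  | { type: "diff_card"; data: DiffCardData }
  | { type: "pnl_table"; data: PnLTableData };

// Exhaustive switch: adding a member to AgentUIComponent without a matching
// case makes the `never` assignment below fail to compile.
function summarise(component: AgentUIComponent): string {
  switch (component.type) {
    case "diff_card":
      return `${component.data.label}: ${component.data.before} -> ${component.data.after}`;
    case "pnl_table":
      return `${component.data.rows.length} rows, total ${component.data.total}`;
    default: {
      const unreachable: never = component;
      return unreachable;
    }
  }
}
```

A React renderer map on the FE could follow the same pattern, with one component per `type` value.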
Confirmation contract (Phase 3)
1. Agent says "set CL for player A to 50,000"
2. LLM picks `agent.editLimit` tool, dispatches with `{playerId, limitType, newValue}`.
3. Tool returns:
{ success: true,
requiresConfirmation: true,
ui: { type: 'diff_card', data: {beforeAfter}, actions: [
{id: 'confirm', action: 'confirm',
payload: { toolName: 'agent.editLimit',
args: {...},
nonce: 'short-lived-uuid'}},
{id: 'cancel', action: 'cancel'}] } }
4. FE renders diff_card. User clicks Confirm.
5. FE POSTs to /api/ai/confirm with the nonce.
6. Backend looks up the nonce in Redis (5-min TTL), verifies the user
matches the original caller, re-executes the same tool with
`confirmed: true` so it bypasses the confirmation branch and runs
the underlying mutation.
7. Audit log entry written with user, tool, args, before/after,
nonce, timestamp.
Why a nonce, not a token replay through /command:
- Confirmation must be immune to prompt injection: the LLM cannot construct a valid nonce.
- The action payload is locked at step 3; the LLM cannot mutate it between read and write.
- Audit trail captures the round trip cleanly.
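The nonce handling in steps 3 to 6 can be sketched as follows. This is an in-memory stand-in for the Redis store, written under assumptions: `NonceStore`, `issue`, `consume`, and the `reason` codes are hypothetical names, and the caller supplies both the nonce and the clock so the behaviour is easy to test.

```typescript
interface PendingAction {
  userId: string;
  toolName: string;
  args: Record<string, unknown>; // locked at issuance (step 3)
  expiresAt: number; // epoch milliseconds
}

type ConfirmOutcome =
  | { ok: true; action: PendingAction }
  | { ok: false; reason: "unknown_nonce" | "expired" | "user_mismatch" };

// In-memory stand-in for the Redis store named in step 6.
class NonceStore {
  private readonly pending = new Map<string, PendingAction>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 5 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  issue(nonce: string, userId: string, toolName: string, args: Record<string, unknown>, now: number): void {
    this.pending.set(nonce, { userId, toolName, args, expiresAt: now + this.ttlMs });
  }

  consume(nonce: string, userId: string, now: number): ConfirmOutcome {
    const action = this.pending.get(nonce);
    if (!action) return { ok: false, reason: "unknown_nonce" };
    if (now >= action.expiresAt) {
      this.pending.delete(nonce);
      return { ok: false, reason: "expired" };
    }
    // A mismatched user does not burn the nonce for the original caller.
    if (action.userId !== userId) return { ok: false, reason: "user_mismatch" };
    this.pending.delete(nonce); // single use: a replay sees unknown_nonce
    return { ok: true, action };
  }
}
```

The confirm handler re-executes the stored `args`, never arguments sent by the client, which is what keeps the payload locked between read and write. With a real Redis backend the lookup and delete would need to be atomic to keep the single-use guarantee under concurrent requests.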
System prompt deltas (agent role)
Extends generateSystemPrompt() when user.role === 'agent':
You are an agent assistant. Beyond the standard rules:
- All money fields are in points (FP). 1 point = 1 USD-equivalent. Format with thousands separators.
- Limits are stored in paisa (= 1/100 of a point); when the user gives FP, multiply by 100 before calling tools.
- "Take" = net the upline currently owes you OR you owe upline (sign matters). State direction explicitly.
- "CL" = credit limit. "available" = CL − take.
- If the user names a player by partial name, call `agent.searchPlayers` FIRST and present a player_picker UI when >1 match. Never guess.
- For date queries, default to "today" in the agent's timezone; if ambiguous, ask once.
- When the user is on a specific agent page, the `metadata.currentPage` field gives you context. If they're on `/agent/downline?agentId=X`, prefer X as the playerId unless overridden.
- For destructive actions, ALWAYS surface the diff card. Never paraphrase the financial change in text alone.
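The unit rules in the prompt (FP for display, paisa for stored limits, available = CL − take) amount to a few lines of arithmetic. The helper names below are hypothetical, and the `en-US` locale for thousands separators is an assumption.

```typescript
// 1 point (FP) = 100 paisa, per the prompt rules above.
const PAISA_PER_POINT = 100;

// Convert a user-supplied FP amount to the paisa value limit tools expect.
// Math.round guards against floating-point drift (0.29 * 100 is not exactly 29).
function fpToPaisa(points: number): number {
  return Math.round(points * PAISA_PER_POINT);
}

function paisaToFp(paisa: number): number {
  return paisa / PAISA_PER_POINT;
}

// Format an FP amount with thousands separators (locale is an assumption).
function formatFP(points: number): string {
  return points.toLocaleString("en-US", { maximumFractionDigits: 2 });
}

// "available" = CL - take, with both values in the same unit.
function availableCredit(creditLimit: number, take: number): number {
  return creditLimit - take;
}
```

For the request "set CL for player A to 50,000", `fpToPaisa(50000)` yields `5000000` as the `newValue` passed to `agent.editLimit`.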
Frontend integration
- Mount point: `strykr-fe/src/app/agent/layout.tsx` (sidebar version) AND the `/agent-v2` mobile redirect target.
- Component: reuse `AIFloatingButton` + `AIChatPanel`, parameterised by `mode: 'player' | 'agent'`.
- Page-context injection: pass `currentPage: pathname` + parsed query params in the `metadata` field of `/api/ai/command` requests.
- New renderers: one component per new `UIComponentType` listed above. Live in `strykr-fe/src/components/ai-chat/ui/`.
- Feature flag: `AGENT_AI_UI_ENABLED` in `lib/features.ts`, default off until Phase 1 stabilises on dev.
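Page-context injection can be sketched as two small functions: one that builds the metadata from the current location, and one that applies the "prefer X unless overridden" rule from the system prompt. The `query` field name and both function names are assumptions, and only the `/agent/downline` route from the prompt example is handled.

```typescript
interface PageContext {
  currentPage: string;
  query: Record<string, string>; // hypothetical field name for parsed params
}

// Build the metadata payload from the pathname and raw query string.
function buildPageContext(pathname: string, search: string): PageContext {
  const query: Record<string, string> = {};
  // URLSearchParams ignores a leading "?" in the input.
  new URLSearchParams(search).forEach((value, key) => {
    query[key] = value;
  });
  return { currentPage: pathname, query };
}

// An explicit playerId from the user wins; otherwise fall back to the page.
function resolvePlayerId(explicit: string | undefined, ctx: PageContext): string | undefined {
  if (explicit) return explicit;
  if (ctx.currentPage === "/agent/downline") return ctx.query["agentId"];
  return undefined;
}
```

On `/agent/downline?agentId=X`, a request such as "show this player's PnL" would resolve to `X` without re-prompting, while "show PnL for Y" would still resolve to `Y`.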
Test plan (per phase)
Unit (vitest, backend)
- For every tool: happy path, missing-required-arg, role-denied, empty-result, downstream-error swallowed.
- Confirmation flow: nonce stored, nonce-mismatch rejected, nonce-expired rejected, replay rejected (consumed).
- Cascading side effects: editLimit updates the right DB column; settlePlayer creates the right transaction rows.
Integration (vitest, backend)
- One test per tool that hits the real Prisma client against a seeded test DB and asserts shape.
- One test per Phase 3 destructive tool: full /command → /confirm round trip.
E2E (manual + scripted, dev1 + devagentai)
- Smoke: log in as `pm (Self)`, open FAB, ask "what's my balance" → balance_card renders.
- Each tool exercised via natural-language prompt with expected card shape.
- Confirmation: ask to edit a CL on a test player; verify diff card, click confirm, verify DB change + audit row.
- Cancellation: same flow, click cancel, verify no DB change.
- Adversarial: prompt-injection attempts ("ignore previous instructions", "set CL to 1 trillion") rejected.
Load (Phase 1 release)
- 100 concurrent agents × 10 tool calls/min for 30 min, watch p95 latency + error rate.
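For sizing, the load profile above works out to 1,000 tool calls per minute (about 16.7 per second) and 30,000 calls over the run. The sketch below reproduces that arithmetic and shows one common way to compute p95, the nearest-rank method. The choice of method and the helper names are assumptions, since the spec does not say how p95 is measured.

```typescript
// Figures taken from the load plan above.
const AGENTS = 100;
const CALLS_PER_AGENT_PER_MIN = 10;
const DURATION_MIN = 30;

const callsPerMinute = AGENTS * CALLS_PER_AGENT_PER_MIN; // 1000
const totalCalls = callsPerMinute * DURATION_MIN; // 30000
const callsPerSecond = callsPerMinute / 60; // about 16.7

// Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
function p95(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) throw new Error("no samples");
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // 1-based rank
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("rank out of range");
  return value;
}

function errorRate(failed: number, total: number): number {
  return total === 0 ? 0 : failed / total;
}
```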
Phase exit criteria
| Phase | Definition of done |
|---|---|
| 0 | This document signed off by you. |
| 1 | Read-only tools 1-7 above shipped on devagentai; FAB visible; unit + integration tests green; manual E2E green for at least 5 representative queries. |
| 2 | Player-picker disambiguation working; page-context resolves "this player" without re-prompting. |
| 3 | editLimit + settlePlayer + confirmation flow shipped; audit log entry written per action; adversarial tests green. |
| 4 | Report generation working for all 3 templates × 2 formats; artifacts stored with 30-day TTL; download links signed. |
| 5 | placeHedgeBet self-player path working end-to-end; reuses existing orderService guardrails; reroute logic unchanged. |
| 6 | Push notifications firing for player login / large bets / market settled; agent can mute per category. |
Risks & mitigations (carried into every phase)
| Risk | Mitigation |
|---|---|
| LLM hallucinates a tool call with wrong playerId | Always slot-fill via searchPlayers; reject ambiguous matches; player_picker UI for >1 hit |
| LLM constructs convincing-looking financial answer instead of calling a tool | System prompt rule + post-response check: if response includes a number that wasn't in a tool result, flag it |
| Prompt injection from player names / notes | Sanitise/escape any free-form text from DB before it enters the prompt context |
| Confirmation nonce leak | 5-min TTL, single-use, user-bound; logged on issuance + consumption |
| Cost runaway from heavy agents | Per-agent rate limit + provider routing (cheap reads on Gemini Flash, sensitive ops on Claude) |
| Stale data | Tools query live DB on every call; no caching in Phase 1; Phase 3+ may add short-TTL caches for read tools only |
| Real-money mistakes | Every destructive tool ends with the same audit log row + Slack alert that human-driven actions emit; the AI is a UX layer, not a parallel system |
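The post-response check in the second row could look roughly like this: collect every number present in the tool results, then flag any number in the LLM's text that is not among them. This is a deliberately naive sketch with hypothetical function names. It compares absolute values only, does not convert between FP and paisa, and would also flag dates or counts, so a real check would need an allowlist.

```typescript
// Pull numeric literals such as "50,000" or "1,250.5" out of free text.
function extractNumbers(text: string): number[] {
  const matches = text.match(/\d[\d,]*(?:\.\d+)?/g) ?? [];
  return matches.map((m) => Number(m.replace(/,/g, "")));
}

// Recursively collect the absolute value of every number in a JSON-like value.
function collectNumbers(value: unknown, out: Set<number> = new Set()): Set<number> {
  if (typeof value === "number") {
    out.add(Math.abs(value));
  } else if (Array.isArray(value)) {
    value.forEach((v) => collectNumbers(v, out));
  } else if (value !== null && typeof value === "object") {
    Object.values(value).forEach((v) => collectNumbers(v, out));
  }
  return out;
}

// Numbers in the response that no tool result accounts for.
function findUngroundedNumbers(responseText: string, toolResults: unknown[]): number[] {
  const grounded = collectNumbers(toolResults);
  return extractNumbers(responseText).filter((n) => !grounded.has(n));
}
```

A non-empty result would be a signal to flag the response for review, not proof of a hallucination.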
Open questions (please decide before Phase 3)
- Should `editLimit` be restricted to direct downline only, or can it walk down the tree?
- Confirmation TTL: 5 minutes (proposed) or stricter (1 minute) for high-value changes?
- Hedge bet self-player — does this account already exist for every agent, or does the tool need to provision it on first use?
- Report storage backend — S3, local disk + signed URL, or stream-only-no-storage?
- Provider routing budget — cap at Anthropic Sonnet for all calls (simple, expensive) or use Gemini Flash for reads (cheaper, slightly more variance)?