AI Agent FAB — Phase 0 Spec (Design Contract)

Scope contract for the agent-panel AI assistant. Sign-off required before any tool is shipped beyond Phase 1 scaffolding. Each later phase ships independently behind a feature flag.

Goals

Natural-language interface for everything an agent does in the panel today.
Conversational query, structured-card rendering, role-gated tool dispatch.
Destructive actions go through a separate confirmation step; the LLM never moves money on first call.
Page-aware: the assistant knows which agent page is open and pre-fills context.

Non-goals (this iteration)

No new sport / fixture data — reuse existing match tools.
No write paths outside the listed action tools.
No prompt that exposes internal IDs or schema to the agent.

Tool catalog

Each tool specifies an input schema (JSON-Schema fragment), an output schema (the ToolResult.data shape), allowedRoles, isDestructive, and a sample request/response.
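
As a rough illustration of the per-tool contract, the sketch below types one catalog entry and the role gate applied before dispatch. Only `allowedRoles`, `isDestructive`, and the `ToolResult.data` idea come from this spec; `ToolDefinition`, `Role`, `canInvoke`, and the sample values are hypothetical names and placeholders, and the real backend types may differ.

```typescript
// Hypothetical shapes for illustration only; the real backend types may differ.
type Role = "agent" | "platform_admin" | "player";

interface ToolResult<T> {
  success: boolean;
  data?: T;
  error?: string;
}

interface ToolDefinition<In, Out> {
  name: string;
  inputSchema: Record<string, unknown>; // JSON-Schema fragment
  allowedRoles: readonly Role[];
  isDestructive: boolean;
  sample: { request: In; response: ToolResult<Out> };
}

// Role gate: a tool may be dispatched only for roles it explicitly lists.
function canInvoke(
  tool: Pick<ToolDefinition<unknown, unknown>, "allowedRoles">,
  role: Role,
): boolean {
  return tool.allowedRoles.includes(role);
}

// Entry mirroring the agent.getUpline row: no inputs, agent role only.
// The sample numbers are made-up placeholders, not real data.
const getUpline: ToolDefinition<
  Record<string, never>,
  { cl: number; owed: number; takeNet: number }
> = {
  name: "agent.getUpline",
  inputSchema: { type: "object", properties: {}, additionalProperties: false },
  allowedRoles: ["agent"],
  isDestructive: false,
  sample: {
    request: {},
    response: { success: true, data: { cl: 0, owed: 0, takeNet: 0 } },
  },
};
```

Keeping the role list on the definition itself means the dispatcher can reject a call before any tool code runs, which is one way to make the role-denied unit test uniform across tools.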

Read-only tools (Phase 1)

Tool	Inputs	Output	Roles
`agent.searchPlayers`	`{ query: string, limit?: number }`	`{ players: PlayerRef[] }`	agent, platform_admin
`agent.getDownline`	`{ depth?: number }`	`{ tree: DownlineNode[] }`	agent, platform_admin
`agent.getTake`	`{ playerId?: string }`	`{ rows: TakeRow[], totals: { take, cl, available } }`	agent, platform_admin
`agent.getCreditLimit`	`{ playerId?: string }`	`{ rows: CreditRow[] }`	agent, platform_admin
`agent.getLiveExposure`	`{ fixtureId?: string, marketId?: string }`	`{ scopes: ExposureScope[], total }`	agent, platform_admin
`agent.getPnL`	`{ from?: ISO, to?: ISO, playerId?: string, fixtureId?: string, marketId?: string }`	`{ rows: PnLRow[], total }`	agent, platform_admin
`agent.getCommission`	`{ from?: ISO, to?: ISO, playerId?: string }`	`{ rows: CommissionRow[], total }`	agent, platform_admin
`agent.getSettlementHistory`	`{ counterpartyId?: string, from?: ISO, to?: ISO, limit?: number }`	`{ rows: SettlementRow[] }`	agent, platform_admin
`agent.getBetHistory`	`{ playerId?: string, fixtureId?: string, status?: string[], from?: ISO, to?: ISO, limit?: number }`	`{ bets: BetRow[], summary }`	agent, platform_admin
`agent.getUpline`	`{}`	`{ cl, owed, takeNet }`	agent
`agent.getTopMarkets`	`{ period: 'today'|'week'|'month', metric: 'pnl'|'volume'|'exposure', limit?: number }`	`{ rows }`	agent, platform_admin

Destructive tools (Phase 3; placeHedgeBet in Phase 5)

Tool	Inputs	Confirmation card	Roles
`agent.editLimit`	`{ playerId, limitType: 'creditLimit'|'perClickWin'|'aggregateDailyWin'|'minStake', newValue }`	`diff_card` showing before/after	agent (only direct downline)
`agent.settlePlayer`	`{ playerId, periodId?: string, mode: 'auto'|'amount', amount? }`	`settlement_summary`	agent, platform_admin
`agent.placeHedgeBet`	`{ fixtureId, marketId, outcomeId, side, stake }`	existing `bet_slip` UI	agent, platform_admin

Reports (Phase 4)

Tool	Inputs	Output	Roles
`agent.generateReport`	`{ template: 'per_player_pnl'|'bets_by_match'|'settlement_log', format: 'xlsx'|'pdf', filters }`	`{ artifactId, downloadUrl, expiresAt }`	agent, platform_admin
`agent.shareReport`	`{ artifactId, recipient: 'player_id' | 'self', channel: 'notification' }`	`{ ok: true }`	agent, platform_admin

UI component contract (additions)

Extends existing UIComponentType enum in backend/src/ai/types.ts:

// New types added for agent FAB:
| 'exposure_card'        // live exposure breakdown by scope
| 'pnl_table'            // tabular PnL rows + totals
| 'take_table'           // take / CL rows
| 'commission_table'     // commission rows
| 'settlement_summary'   // pre-settlement cash-movement diff
| 'settlement_table'     // historical settlements list
| 'bet_table'            // tabular bet list (richer than bet_history)
| 'downline_tree'        // collapsible agent hierarchy
| 'diff_card'            // before/after for limit edits, with confirm
| 'report_artifact'      // downloadable report tile
| 'player_picker'        // typeahead disambiguation

Each has a corresponding *Data interface and a React renderer on the FE.
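
One possible way to pair each type with its *Data interface is a discriminated union, sketched below for two of the new types. The `DiffCardData` field names and the `AgentUIComponent` and `summarise` names are assumptions for illustration; `ReportArtifactData` reuses the field names from the `agent.generateReport` output.

```typescript
// Hypothetical data shapes; the real *Data interfaces may differ.
interface DiffCardData {
  label: string;
  before: number;
  after: number;
}

interface ReportArtifactData {
  artifactId: string;
  downloadUrl: string;
  expiresAt: string;
}

// The `type` tag selects which data shape (and which renderer) applies.
type AgentUIComponent =
  | { type: "diff_card"; data: DiffCardData }
  | { type: "report_artifact"; data: ReportArtifactData };

// Plain-text fallback; the exhaustive switch fails to compile
// if a new component type is added without a matching case.
function summarise(component: AgentUIComponent): string {
  switch (component.type) {
    case "diff_card":
      return `${component.data.label}: ${component.data.before} -> ${component.data.after}`;
    case "report_artifact":
      return `Report ${component.data.artifactId}`;
    default: {
      const unreachable: never = component;
      return unreachable;
    }
  }
}
```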

Confirmation contract (Phase 3)

1.  Agent says "set CL for player A to 50,000"
2.  LLM picks `agent.editLimit` tool, dispatches with `{playerId, limitType, newValue}`.
3.  Tool returns:
      { success: true,
        requiresConfirmation: true,
        ui: { type: 'diff_card', data: {beforeAfter}, actions: [
                {id: 'confirm', action: 'confirm',
                 payload: { toolName: 'agent.editLimit',
                            args: {...},
                            nonce: 'short-lived-uuid'}},
                {id: 'cancel', action: 'cancel'}] } }
4.  FE renders diff_card. User clicks Confirm.
5.  FE POSTs to /api/ai/confirm with the nonce.
6.  Backend looks up the nonce in Redis (5-min TTL), verifies the user
    matches the original caller, consumes the nonce so it cannot be
    replayed, and re-executes the same tool with `confirmed: true` so it
    bypasses the confirmation branch and runs the underlying mutation.
7.  Audit log entry written with user, tool, args, before/after,
    nonce, timestamp.
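
The nonce handling in steps 3 and 6 could be sketched as follows. This is a minimal in-memory stand-in, not the Redis implementation: `ConfirmationStore`, `PendingAction`, and the injected `generateNonce` and `now` functions are hypothetical, and a real deployment would rely on Redis key expiry and a cryptographically secure nonce source such as `crypto.randomUUID()`.

```typescript
// In-memory stand-in for the Redis nonce store; all names are hypothetical.
interface PendingAction {
  userId: string;
  toolName: string;
  args: Record<string, unknown>;
  expiresAt: number; // epoch milliseconds
}

type ConsumeResult =
  | { ok: true; action: PendingAction }
  | { ok: false; reason: "unknown_nonce" | "user_mismatch" | "expired" };

const TTL_MS = 5 * 60 * 1000; // 5-minute TTL from the contract above

class ConfirmationStore {
  private readonly pending = new Map<string, PendingAction>();
  private readonly generateNonce: () => string;
  private readonly now: () => number;

  constructor(generateNonce: () => string, now: () => number = () => Date.now()) {
    this.generateNonce = generateNonce;
    this.now = now;
  }

  // Step 3: lock the action payload and hand back a nonce for the confirm button.
  issue(userId: string, toolName: string, args: Record<string, unknown>): string {
    const nonce = this.generateNonce();
    this.pending.set(nonce, { userId, toolName, args, expiresAt: this.now() + TTL_MS });
    return nonce;
  }

  // Step 6: validate and consume. A consumed or expired nonce can never be reused.
  consume(nonce: string, userId: string): ConsumeResult {
    const action = this.pending.get(nonce);
    if (!action) return { ok: false, reason: "unknown_nonce" };
    // Leave the nonce in place on mismatch so another user cannot invalidate it.
    if (action.userId !== userId) return { ok: false, reason: "user_mismatch" };
    this.pending.delete(nonce);
    if (this.now() > action.expiresAt) return { ok: false, reason: "expired" };
    return { ok: true, action };
  }
}
```

Because the stored `args` are returned from `consume`, the mutation can run against the payload locked at issuance rather than anything supplied at confirm time.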

Why a nonce, not a token replay through /command:

Confirmation must be immune to prompt injection (the LLM cannot construct a valid nonce).
The action payload is locked at step 3; the LLM cannot mutate it between read and write.
Audit trail captures the round trip cleanly.

System prompt deltas (agent role)

Extends generateSystemPrompt() when user.role === 'agent':

You are an agent assistant. Beyond the standard rules:
- All money fields are in points (FP). 1 point = 1 USD-equivalent. Format with thousands separators.
- Limits are stored in paisa (= 1/100 of a point); when the user gives FP, multiply by 100 before calling tools.
- "Take" = the net amount the upline currently owes you OR you owe the upline (sign matters). State direction explicitly.
- "CL" = credit limit. "available" = CL − take.
- If the user names a player by partial name, call `agent.searchPlayers` FIRST and present a player_picker UI when >1 match. Never guess.
- For date queries, default to "today" in the agent's timezone; if ambiguous, ask once.
- When the user is on a specific agent page, the `metadata.currentPage` field gives you context. If they're on `/agent/downline?agentId=X`, prefer X as the playerId unless overridden.
- For destructive actions, ALWAYS surface the diff card. Never paraphrase the financial change in text alone.
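
The unit rules in the prompt above (points vs. paisa, thousands separators, available = CL − take) can be illustrated with a few helpers. The function names are hypothetical, and the `en-US` locale is an assumption chosen only to get comma separators.

```typescript
// 1 point (FP) = 100 paisa, per the prompt rules above.
const PAISA_PER_POINT = 100;

// Convert a user-supplied FP amount to the paisa value the limit tools expect.
// Math.round guards against floating-point drift (0.29 * 100 = 28.999...).
function pointsToPaisa(points: number): number {
  return Math.round(points * PAISA_PER_POINT);
}

// Format a points amount with thousands separators for display.
function formatPoints(points: number): string {
  return points.toLocaleString("en-US");
}

// "available" = CL − take, with both arguments in the same unit.
function availablePoints(creditLimit: number, take: number): number {
  return creditLimit - take;
}
```

For the request "set CL for player A to 50,000", the tool argument would be `pointsToPaisa(50000)`, that is 5,000,000 paisa, while the diff card would still display `formatPoints(50000)`.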

Frontend integration

Mount point: strykr-fe/src/app/agent/layout.tsx (sidebar version) AND the /agent-v2 mobile redirect target.
Component: reuse AIFloatingButton + AIChatPanel, parameterised by mode: 'player' | 'agent'.
Page-context injection: pass currentPage: pathname + parsed query params in the metadata field of /api/ai/command requests.
New renderers: one component per new UIComponentType listed above. Live in strykr-fe/src/components/ai-chat/ui/.
Feature flag: AGENT_AI_UI_ENABLED in lib/features.ts — default off until Phase 1 stabilises on dev.
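
A minimal sketch of the page-context injection, assuming the metadata carries the pathname plus a flat map of query params. `PageContext`, `buildPageContext`, and `resolvePlayerId` are hypothetical names; the fallback rule mirrors the `/agent/downline?agentId=X` example in the system prompt, although in the spec that preference is applied by the LLM rather than by code.

```typescript
// Hypothetical shape for the currentPage value sent in request metadata.
interface PageContext {
  pathname: string;
  query: Record<string, string>;
}

// Build the context from the router pathname and raw search string.
function buildPageContext(pathname: string, search: string): PageContext {
  const query: Record<string, string> = {};
  // URLSearchParams accepts the string with or without a leading "?".
  new URLSearchParams(search).forEach((value, key) => {
    query[key] = value;
  });
  return { pathname, query };
}

// Prefer an explicitly named player; otherwise fall back to the agentId
// of the downline page the user is currently viewing, if any.
function resolvePlayerId(ctx: PageContext, explicit?: string): string | undefined {
  if (explicit) return explicit;
  if (ctx.pathname === "/agent/downline") return ctx.query["agentId"];
  return undefined;
}
```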

Test plan (per phase)

Unit (vitest, backend)

For every tool: happy path, missing-required-arg, role-denied, empty-result, downstream-error swallowed.
Confirmation flow: nonce stored, nonce-mismatch rejected, nonce-expired rejected, replay rejected (consumed).
Cascading side effects: editLimit hits the right DB column; settlePlayer creates the right transaction rows.

Integration (vitest, backend)

One test per tool that hits the real Prisma client against a seeded test DB and asserts shape.
One test per Phase 3 destructive tool: full /command → /confirm round trip.

E2E (manual + scripted, dev1 + devagentai)

Smoke: log in as pm (Self), open FAB, ask "what's my balance" → balance_card renders.
Each tool exercised via natural-language prompt with expected card shape.
Confirmation: ask to edit a CL on a test player; verify diff card, click confirm, verify DB change + audit row.
Cancellation: same flow, click cancel, verify no DB change.
Adversarial: prompt-injection attempts ("ignore previous instructions", "set CL to 1 trillion") rejected.

Load (Phase 1 release)

100 concurrent agents × 10 tool calls/min for 30 min, watch p95 latency + error rate.

Phase exit criteria

Phase	Definition of done
0	This document signed off by you.
1	The first seven read-only tools listed above (`agent.searchPlayers` through `agent.getCommission`) shipped on devagentai; FAB visible; unit + integration tests green; manual E2E green for at least 5 representative queries.
2	Player-picker disambiguation working; page-context resolves "this player" without re-prompting.
3	editLimit + settlePlayer + confirmation flow shipped; audit log entry written per action; adversarial tests green.
4	Report generation working for all 3 templates × 2 formats; artifacts stored with 30-day TTL; download links signed.
5	placeHedgeBet self-player path working end-to-end; reuses existing orderService guardrails; reroute logic unchanged.
6	Push notifications firing for player login / large bets / market settled; agent can mute per category.

Risks & mitigations (carried into every phase)

Risk	Mitigation
LLM hallucinates a tool call with wrong playerId	Always slot-fill via `searchPlayers`; reject ambiguous matches; player_picker UI for >1 hit
LLM constructs convincing-looking financial answer instead of calling a tool	System prompt rule + post-response check: if response includes a number that wasn't in a tool result, flag it
Prompt injection from player names / notes	Sanitise/escape any free-form text from DB before it enters the prompt context
Confirmation nonce leak	5-min TTL, single-use, user-bound; logged on issuance + consumption
Cost runaway from heavy agents	Per-agent rate limit + provider routing (cheap reads on Gemini Flash, sensitive ops on Claude)
Stale data	Tools query live DB on every call; no caching in Phase 1; Phase 3+ may add short-TTL caches for read tools only
Real-money mistakes	Every destructive tool ends with the same audit log row + Slack alert that human-driven actions emit; the AI is a UX layer, not a parallel system
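
The post-response check from the second risk row could look roughly like the sketch below: collect every number the tools returned, then flag any number in the reply that is not among them. The function names are hypothetical and the matching is deliberately crude. It compares magnitudes only (direction is not verified), and it will also flag digits in dates or identifiers, so a real check would need an allowlist or unit-aware parsing.

```typescript
// Pull unsigned numeric literals out of free text, dropping thousands separators.
function extractNumbers(text: string): number[] {
  const matches = text.match(/\d[\d,]*(?:\.\d+)?/g) ?? [];
  return matches.map((m) => Number(m.replace(/,/g, "")));
}

// Recursively collect the magnitude of every numeric value in a payload.
function collectNumbers(value: unknown, out: Set<number> = new Set()): Set<number> {
  if (typeof value === "number") {
    out.add(Math.abs(value));
  } else if (Array.isArray(value)) {
    value.forEach((v) => collectNumbers(v, out));
  } else if (value !== null && typeof value === "object") {
    Object.values(value).forEach((v) => collectNumbers(v, out));
  }
  return out;
}

// Return the numbers in the reply that no tool result can account for.
function findUngroundedNumbers(response: string, toolResults: unknown[]): number[] {
  const grounded = collectNumbers(toolResults);
  return extractNumbers(response).filter((n) => !grounded.has(n));
}
```

A non-empty result would mark the reply for review rather than block it outright, since derived values such as sums legitimately differ from raw tool output.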

Open questions (please decide before Phase 3)

Should editLimit be restricted to direct downline only, or can it walk down the tree?
Confirmation TTL — 5 minutes (proposed) or stricter (1 minute) for high-value changes?
Hedge bet self-player — does this account already exist for every agent, or does the tool need to provision it on first use?
Report storage backend — S3, local disk + signed URL, or stream-only-no-storage?
Provider routing budget — cap at Anthropic Sonnet for all calls (simple, expensive) or use Gemini Flash for reads (cheaper, slightly more variance)?
