
AI Agent FAB — Phase 0 Spec (Design Contract)

Scope contract for the agent-panel AI assistant. Sign-off required before any tool is shipped beyond Phase 1 scaffolding. Each later phase ships independently behind a feature flag.

Goals

  • Natural-language interface for everything an agent does in the panel today.
  • Conversational query, structured-card rendering, role-gated tool dispatch.
  • Destructive actions go through a separate confirmation step; the LLM never moves money on first call.
  • Page-aware: the assistant knows which agent page is open and pre-fills context.

Non-goals (this iteration)

  • No new sport / fixture data — reuse existing match tools.
  • No write paths outside the listed action tools.
  • No prompt that exposes internal IDs or schema to the agent.

Tool catalog

Each tool specifies an input schema (JSON-Schema fragment), an output schema (the ToolResult.data shape), allowedRoles, isDestructive, and a sample request/response.
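
As a rough illustration of how such a catalog entry and its role gate could look, here is a minimal TypeScript sketch. Only the field names `inputSchema`, `allowedRoles`, `isDestructive` and `ToolResult.data` come from this spec; `Caller`, `ToolDefinition`, `canCall`, `dispatch` and the `error` field are hypothetical names chosen for the example, and the tool body returns placeholder values.

```typescript
type Role = 'agent' | 'platform_admin';

// Hypothetical caller shape for the example.
interface Caller {
  id: string;
  role: Role;
}

// `success` and `data` mirror the spec; `error` is an assumed field.
interface ToolResult<Data> {
  success: boolean;
  data?: Data;
  error?: string;
}

interface ToolDefinition<Args, Data> {
  name: string;
  inputSchema: Record<string, unknown>; // JSON-Schema fragment
  allowedRoles: Role[];
  isDestructive: boolean;
  execute(args: Args, caller: Caller): Promise<ToolResult<Data>>;
}

// Pure role check, kept separate so it can be unit-tested on its own.
function canCall(tool: { allowedRoles: Role[] }, role: Role): boolean {
  return tool.allowedRoles.includes(role);
}

// Refuse the call before the tool body runs if the caller's role is not listed.
async function dispatch<Args, Data>(
  tool: ToolDefinition<Args, Data>,
  args: Args,
  caller: Caller,
): Promise<ToolResult<Data>> {
  if (!canCall(tool, caller.role)) {
    return { success: false, error: `role ${caller.role} may not call ${tool.name}` };
  }
  return tool.execute(args, caller);
}

// agent.getUpline is the one read-only tool listed as agent-only.
const getUpline: ToolDefinition<Record<string, never>, { cl: number; owed: number; takeNet: number }> = {
  name: 'agent.getUpline',
  inputSchema: { type: 'object', properties: {}, additionalProperties: false },
  allowedRoles: ['agent'],
  isDestructive: false,
  // Placeholder values; a real implementation would query the database.
  execute: async () => ({ success: true, data: { cl: 0, owed: 0, takeNet: 0 } }),
};
```

Keeping the role check in the dispatcher rather than in each tool body means a "role-denied" unit test can be written once against `dispatch` and reused for every catalog entry.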

Read-only tools (Phase 1)

| Tool | Inputs | Output | Roles |
| --- | --- | --- | --- |
| agent.searchPlayers | { query: string, limit?: number } | { players: PlayerRef[] } | agent, platform_admin |
| agent.getDownline | { depth?: number } | { tree: DownlineNode[] } | agent, platform_admin |
| agent.getTake | { playerId?: string } | { rows: TakeRow[], totals: { take, cl, available } } | agent, platform_admin |
| agent.getCreditLimit | { playerId?: string } | { rows: CreditRow[] } | agent, platform_admin |
| agent.getLiveExposure | { fixtureId?: string, marketId?: string } | { scopes: ExposureScope[], total } | agent, platform_admin |
| agent.getPnL | { from?: ISO, to?: ISO, playerId?: string, fixtureId?: string, marketId?: string } | { rows: PnLRow[], total } | agent, platform_admin |
| agent.getCommission | { from?: ISO, to?: ISO, playerId?: string } | { rows: CommissionRow[], total } | agent, platform_admin |
| agent.getSettlementHistory | { counterpartyId?: string, from?: ISO, to?: ISO, limit?: number } | { rows: SettlementRow[] } | agent, platform_admin |
| agent.getBetHistory | { playerId?: string, fixtureId?: string, status?: string[], from?: ISO, to?: ISO, limit?: number } | { bets: BetRow[], summary } | agent, platform_admin |
| agent.getUpline | {} | { cl, owed, takeNet } | agent |
| agent.getTopMarkets | { period: 'today'\|'week'\|'month', metric: 'pnl'\|'volume'\|'exposure', limit?: number } | { rows } | agent, platform_admin |

Destructive tools (Phase 3)

| Tool | Inputs | Confirmation card | Roles |
| --- | --- | --- | --- |
| agent.editLimit | { playerId, limitType: 'creditLimit'\|'perClickWin'\|'aggregateDailyWin'\|'minStake', newValue } | diff_card showing before/after | agent (only direct downline) |
| agent.settlePlayer | { playerId, periodId?: string, mode: 'auto'\|'amount', amount? } | settlement_summary | agent, platform_admin |
| agent.placeHedgeBet | { fixtureId, marketId, outcomeId, side, stake } | existing bet_slip UI | agent, platform_admin |

Reports (Phase 4)

| Tool | Inputs | Output | Roles |
| --- | --- | --- | --- |
| agent.generateReport | { template: 'per_player_pnl'\|'bets_by_match'\|'settlement_log', format: 'xlsx'\|'pdf', filters } | { artifactId, downloadUrl, expiresAt } | agent, platform_admin |
| agent.shareReport | { artifactId, recipient: 'player_id' \| 'self', channel: 'notification' } | { ok: true } | agent, platform_admin |

UI component contract (additions)

Extends existing UIComponentType enum in backend/src/ai/types.ts:

// New types added for agent FAB:
| 'exposure_card' // live exposure breakdown by scope
| 'pnl_table' // tabular PnL rows + totals
| 'take_table' // take / CL rows
| 'commission_table' // commission rows
| 'settlement_summary' // pre-settlement cash-movement diff
| 'settlement_table' // historical settlements list
| 'bet_table' // tabular bet list (richer than bet_history)
| 'downline_tree' // collapsible agent hierarchy
| 'diff_card' // before/after for limit edits, with confirm
| 'report_artifact' // downloadable report tile
| 'player_picker' // typeahead disambiguation

Each has a corresponding *Data interface and a React renderer on the FE.
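
One way to pair each new type with its *Data interface is a discriminated union keyed on `type`. The sketch below covers only two of the new members. The field names inside `DiffCardData` and `PlayerPickerData` are assumptions (this spec names the types but not their fields), and `describeComponent` is a hypothetical text-only stand-in for the React renderer lookup.

```typescript
// Hypothetical *Data shapes; the spec does not define these fields.
interface DiffCardData {
  label: string;
  before: number;
  after: number;
}

interface PlayerPickerData {
  candidates: { playerId: string; displayName: string }[];
}

// Discriminated union keyed on `type`, covering two of the new members.
type AgentUIComponent =
  | { type: 'diff_card'; data: DiffCardData }
  | { type: 'player_picker'; data: PlayerPickerData };

// Stand-in for the renderer lookup: returns a text summary per type.
function describeComponent(c: AgentUIComponent): string {
  switch (c.type) {
    case 'diff_card':
      return `${c.data.label}: ${c.data.before} -> ${c.data.after}`;
    case 'player_picker':
      return `choose one of ${c.data.candidates.length} players`;
    default: {
      // Compile-time exhaustiveness check: adding a union member without a
      // matching case makes this assignment fail to type-check.
      const unreachable: never = c;
      return unreachable;
    }
  }
}
```

With this pattern, adding a new UIComponentType without a renderer becomes a compile error rather than a blank card at runtime.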


Confirmation contract (Phase 3)

1. Agent says "set CL for player A to 50,000".
2. LLM picks the `agent.editLimit` tool and dispatches it with `{playerId, limitType, newValue}`.
3. Tool returns:
   { success: true,
     requiresConfirmation: true,
     ui: { type: 'diff_card', data: {beforeAfter}, actions: [
       { id: 'confirm', action: 'confirm',
         payload: { toolName: 'agent.editLimit',
                    args: {...},
                    nonce: 'short-lived-uuid' } },
       { id: 'cancel', action: 'cancel' } ] } }
4. FE renders diff_card. User clicks Confirm.
5. FE POSTs to /api/ai/confirm with the nonce.
6. Backend looks up the nonce in Redis (5-min TTL), verifies the user matches the original caller, and re-executes the same tool with `confirmed: true` so it bypasses the confirmation branch and runs the underlying mutation.
7. Audit log entry written with user, tool, args, before/after, nonce, timestamp.
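
The nonce handling in steps 3 and 6 can be sketched as follows. This is an in-memory stand-in for the Redis-backed store, not the actual implementation: `NonceStore`, `PendingAction`, `ConsumeResult` and the rejection reason strings are hypothetical names, and the clock and nonce generator are injected so that expiry and replay can be tested deterministically.

```typescript
// Hypothetical record stored per nonce; args are locked at issue time.
interface PendingAction {
  userId: string;
  toolName: string;
  args: Record<string, unknown>;
  expiresAt: number; // epoch milliseconds
}

type ConsumeResult =
  | { ok: true; action: PendingAction }
  | { ok: false; reason: 'unknown_nonce' | 'expired' | 'user_mismatch' };

const NONCE_TTL_MS = 5 * 60 * 1000; // 5-minute TTL from step 6

class NonceStore {
  private readonly pending = new Map<string, PendingAction>();
  private readonly now: () => number;
  private readonly generateNonce: () => string;

  // In production the generator would be a CSPRNG-backed UUID function.
  constructor(now: () => number, generateNonce: () => string) {
    this.now = now;
    this.generateNonce = generateNonce;
  }

  // Step 3: lock the tool name and args behind a fresh nonce.
  issue(userId: string, toolName: string, args: Record<string, unknown>): string {
    const nonce = this.generateNonce();
    this.pending.set(nonce, { userId, toolName, args, expiresAt: this.now() + NONCE_TTL_MS });
    return nonce;
  }

  // Step 6: validate the nonce, then delete it so a replay is rejected.
  consume(nonce: string, userId: string): ConsumeResult {
    const action = this.pending.get(nonce);
    if (!action) return { ok: false, reason: 'unknown_nonce' };
    if (this.now() >= action.expiresAt) {
      this.pending.delete(nonce);
      return { ok: false, reason: 'expired' };
    }
    if (action.userId !== userId) return { ok: false, reason: 'user_mismatch' };
    this.pending.delete(nonce);
    return { ok: true, action };
  }
}
```

In this sketch a user mismatch leaves the nonce in place, so a wrong caller cannot invalidate the original caller's pending confirmation; whether that is the desired behaviour is a design decision this spec does not settle. The mutation should run with the stored `action.args`, never with arguments resent by the client.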

Why a nonce, not a token replay through /command:

  • Confirmation must be immune to prompt injection (the LLM cannot construct a valid nonce).
  • The action payload is locked at step 3; the LLM cannot mutate it between read and write.
  • Audit trail captures the round trip cleanly.

System prompt deltas (agent role)

Extends generateSystemPrompt() when user.role === 'agent':

You are an agent assistant. Beyond the standard rules:
- All money fields are in points (FP). 1 point = 1 USD-equivalent. Format with thousands separators.
- Limits are stored in paisa (= 1/100 of a point); when the user gives FP, multiply by 100 before calling tools.
- "Take" = net the upline currently owes you OR you owe upline (sign matters). State direction explicitly.
- "CL" = credit limit. "available" = CL − take.
- If the user names a player by partial name, call `agent.searchPlayers` FIRST and present a player_picker UI when >1 match. Never guess.
- For date queries, default to "today" in the agent's timezone; if ambiguous, ask once.
- When the user is on a specific agent page, the `metadata.currentPage` field gives you context. If they're on `/agent/downline?agentId=X`, prefer X as the playerId unless overridden.
- For destructive actions, ALWAYS surface the diff card. Never paraphrase the financial change in text alone.
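
The unit rules in the prompt (points vs. paisa, thousands separators, "available" = CL − take) reduce to a few lines of arithmetic. The helper names below are hypothetical and serve only to make the conversions concrete.

```typescript
const PAISA_PER_POINT = 100; // from the prompt: 1 paisa = 1/100 of a point

// Convert a user-supplied FP amount to paisa before calling a limit tool.
// Math.round guards against floating-point error in products like 0.29 * 100.
function pointsToPaisa(points: number): number {
  return Math.round(points * PAISA_PER_POINT);
}

// Format a paisa amount as FP with thousands separators for display.
function formatPoints(paisa: number): string {
  return (paisa / PAISA_PER_POINT).toLocaleString('en-US', {
    minimumFractionDigits: 0,
    maximumFractionDigits: 2,
  });
}

// "available" = CL − take; both arguments must be in the same unit.
function available(creditLimit: number, take: number): number {
  return creditLimit - take;
}
```

For the request "set CL for player A to 50,000", `pointsToPaisa(50000)` yields `5000000`, which is the value a limit tool would receive, and `formatPoints(5000000)` renders it back as `50,000` for the agent.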

Frontend integration

  • Mount point: strykr-fe/src/app/agent/layout.tsx (sidebar version) AND the /agent-v2 mobile redirect target.
  • Component: reuse AIFloatingButton + AIChatPanel, parameterised by mode: 'player' | 'agent'.
  • Page-context injection: pass currentPage: pathname + parsed query params in the metadata field of /api/ai/command requests.
  • New renderers: one component per new UIComponentType listed above. Live in strykr-fe/src/components/ai-chat/ui/.
  • Feature flag: AGENT_AI_UI_ENABLED in lib/features.ts — default off until Phase 1 stabilises on dev.
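
Page-context injection might look like the sketch below. The `CommandMetadata` shape is an assumption (this spec fixes only the `currentPage` field name), and `buildCommandMetadata` and `resolvePlayerId` are hypothetical helpers. The fallback rule follows the system prompt: on `/agent/downline?agentId=X`, prefer X unless the user names someone else.

```typescript
// Hypothetical metadata shape; the spec only fixes the `currentPage` field name.
interface CommandMetadata {
  currentPage: {
    pathname: string;
    query: Record<string, string>;
  };
}

// Build the metadata object sent with each /api/ai/command request.
function buildCommandMetadata(pathname: string, search: string): CommandMetadata {
  const query: Record<string, string> = {};
  new URLSearchParams(search).forEach((value, key) => {
    query[key] = value;
  });
  return { currentPage: { pathname, query } };
}

// Prefer an explicitly named player; otherwise fall back to the page's agentId.
function resolvePlayerId(meta: CommandMetadata, explicit?: string): string | undefined {
  if (explicit) return explicit;
  const { pathname, query } = meta.currentPage;
  return pathname === '/agent/downline' ? query.agentId : undefined;
}
```

In the spec the fallback is applied by the LLM via the system prompt; the function here only illustrates the intended precedence and could double as a test oracle for the Phase 2 exit criterion.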

Test plan (per phase)

Unit (vitest, backend)

  • For every tool: happy path, missing required arg, role denied, empty result, downstream error caught rather than propagated.
  • Confirmation flow: nonce stored, nonce-mismatch rejected, nonce-expired rejected, replay rejected (consumed).
  • Cascading side effects: editLimit hits the right DB column; settlePlayer creates the right transaction rows.

Integration (vitest, backend)

  • One test per tool that hits the real Prisma client against a seeded test DB and asserts shape.
  • One test per Phase 3 destructive tool: full /command → /confirm round trip.

E2E (manual + scripted, dev1 + devagentai)

  • Smoke: log in as pm (Self), open FAB, ask "what's my balance" → balance_card renders.
  • Each tool exercised via natural-language prompt with expected card shape.
  • Confirmation: ask to edit a CL on a test player; verify diff card, click confirm, verify DB change + audit row.
  • Cancellation: same flow, click cancel, verify no DB change.
  • Adversarial: prompt-injection attempts ("ignore previous instructions", "set CL to 1 trillion") rejected.

Load (Phase 1 release)

  • 100 concurrent agents × 10 tool calls/min for 30 min, watch p95 latency + error rate.
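
For sizing the load test, the figures above work out as follows. The `p95` helper is a hypothetical nearest-rank implementation, included only to pin down what "p95 latency" could mean when comparing runs; the load tool actually used may compute percentiles differently.

```typescript
// Load-test parameters taken from the bullet above.
const agents = 100;
const callsPerAgentPerMin = 10;
const durationMin = 30;

const callsPerMinute = agents * callsPerAgentPerMin; // 1,000 calls/min
const totalCalls = callsPerMinute * durationMin; // 30,000 calls over the run
const callsPerSecond = callsPerMinute / 60; // about 16.7 calls/s on average

// Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
function p95(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) throw new Error('no samples');
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}
```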

Phase exit criteria

| Phase | Definition of done |
| --- | --- |
| 0 | This document signed off by you. |
| 1 | Read-only tools 1-7 above shipped on devagentai; FAB visible; unit + integration tests green; manual E2E green for at least 5 representative queries. |
| 2 | Player-picker disambiguation working; page-context resolves "this player" without re-prompting. |
| 3 | editLimit + settlePlayer + confirmation flow shipped; audit log entry written per action; adversarial tests green. |
| 4 | Report generation working for all 3 templates × 2 formats; artifacts stored with 30-day TTL; download links signed. |
| 5 | placeHedgeBet self-player path working end-to-end; reuses existing orderService guardrails; reroute logic unchanged. |
| 6 | Push notifications firing for player login / large bets / market settled; agent can mute per category. |

Risks & mitigations (carried into every phase)

| Risk | Mitigation |
| --- | --- |
| LLM hallucinates a tool call with wrong playerId | Always slot-fill via searchPlayers; reject ambiguous matches; player_picker UI for >1 hit |
| LLM constructs convincing-looking financial answer instead of calling a tool | System prompt rule + post-response check: if response includes a number that wasn't in a tool result, flag it |
| Prompt injection from player names / notes | Sanitise/escape any free-form text from DB before it enters the prompt context |
| Confirmation nonce leak | 5-min TTL, single-use, user-bound; logged on issuance + consumption |
| Cost runaway from heavy agents | Per-agent rate limit + provider routing (cheap reads on Gemini Flash, sensitive ops on Claude) |
| Stale data | Tools query live DB on every call; no caching in Phase 1; Phase 3+ may add short-TTL caches for read tools only |
| Real-money mistakes | Every destructive tool ends with the same audit log row + Slack alert that human-driven actions emit; the AI is a UX layer, not a parallel system |
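
The post-response check for fabricated figures could start from something like the sketch below. It is deliberately naive: `collectNumbers` and `findUngroundedNumbers` are hypothetical names, the check assumes the response text and the tool results use the same unit, and tokens such as dates or list ordinals would need to be allow-listed before this is usable.

```typescript
// Collect every numeric leaf from a tool result, however deeply nested.
function collectNumbers(value: unknown, out: Set<number> = new Set<number>()): Set<number> {
  if (typeof value === 'number') {
    out.add(value);
  } else if (Array.isArray(value)) {
    value.forEach((v) => collectNumbers(v, out));
  } else if (value !== null && typeof value === 'object') {
    Object.values(value).forEach((v) => collectNumbers(v, out));
  }
  return out;
}

// Return the numbers that appear in the response text but in no tool result.
function findUngroundedNumbers(responseText: string, toolResults: unknown[]): number[] {
  const grounded = collectNumbers(toolResults);
  // Matches integers and decimals, with optional sign and thousands separators.
  const tokens = responseText.match(/-?\d[\d,]*(?:\.\d+)?/g) ?? [];
  return tokens
    .map((t) => Number(t.replace(/,/g, '')))
    .filter((n) => !grounded.has(n));
}
```

A non-empty return value is a signal to flag the response for review, in line with the mitigation in the table, rather than proof that the model invented the figure.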

Open questions (please decide before Phase 3)

  1. Should editLimit be restricted to direct downline only, or can it walk down the tree?
  2. Confirmation TTL — 5 minutes (proposed) or stricter (1 minute) for high-value changes?
  3. Hedge bet self-player — does this account already exist for every agent, or does the tool need to provision it on first use?
  4. Report storage backend — S3, local disk + signed URL, or stream-only-no-storage?
  5. Provider routing budget — cap at Anthropic Sonnet for all calls (simple, expensive) or use Gemini Flash for reads (cheaper, slightly more variance)?