B-Book Architecture & Design Document v3.0 (Simplified)
System: Hannibal | Date: February 2026 | Status: Living Document | Audience: Product, Engineering, Operations, Stakeholders | Predecessor: v2.0 (preserved as scale-up reference)
Table of Contents
- Executive Summary
- The Core Problem -- In Plain English
- Simplification Philosophy
- The Forwarding Matrix -- The Brain of the System
- The Bet Flow -- Step by Step
- Cascading Upline Routing
- Agent Liability Limits & NO_NEW_RISK
- User Win Limits & Stake Reduction
- Period Definitions -- Night & Weekly
- Exposure Accounting (2-Tier: Redis + PostgreSQL)
- Audit Trail & Determinism
- Performance Architecture
- The Agent Experience
- Nightmare Scenarios & How We Handle Them
- Bet Cancellation / Void / Partial Settlement State Machine
- Matrix Version Control
- Dead Letter Queue and Poison Bet Handling
- Settlement Cascade & Failure Isolation
- Cash-Out / Early Settlement
- Lay Bet Support
- Collusion Detection
- Agent Hierarchy Migration
- Minimum Forwarding / Skin-in-the-Game Requirements
- Panic Button & Abuse Prevention
- Timestamp Security & Period Boundaries
- Sharp Detection via Multiple Accounts
- Rate Limiting on Configuration Changes
- Hedge Execution
- Reconciliation
- Monitoring & Alerting
1. Executive Summary
What is the B-Book?
The B-Book is Hannibal's hierarchical, deterministic risk-management and routing engine. It sits between a punter placing a bet and the final destination of that bet's risk.
In traditional bookmaking, a "B-Book" means the bookie keeps the bet on their own books. An "A-Book" means the bookie hedges on an exchange like Betfair. Hannibal's B-Book is a routing engine that decides, for every bet, what percentage stays at each level of an agent hierarchy and what percentage gets forwarded up the chain. It enforces limits automatically, cascades overflow intelligently, and maintains a complete audit trail.
What problem does it solve?
In sports betting agent networks across India, agents operate at different levels of a hierarchy. Risk allocation today happens through manual spreadsheets, WhatsApp groups, and phone calls. There is no enforcement, no automatic cap management, and no audit trail. Disputes have no source of truth.
The B-Book automates all of this with a configurable, enforceable, auditable system.
v3 Simplification
This document is a deliberate simplification of the v2.0 architecture for Phase 1 launch. We removed premature optimizations, eliminated unnecessary infrastructure layers, and focused on correctness over throughput. The v2.0 document is preserved as a scale-up reference when we outgrow these simpler designs.
What changed: 2-tier caching instead of 3. Simple atomic counters instead of sharded exposure. A version integer instead of full MVCC. Sequential writes instead of distributed locking. Market orders for hedging instead of limit orders. No multi-currency. No responsible gambling module. Nightly reconciliation instead of every 15 minutes. Structured logging instead of Prometheus/Grafana.
What did NOT change: The forwarding matrix, cascading routing, exposure ledger design, NO_NEW_RISK, void/cancellation state machine, dead letter queue, settlement idempotency, cash-out, lay bets, collusion detection design, panic button, agent migration, minimum forwarding requirements, timestamp security, sharp detection design, and rate limiting.
2. The Core Problem -- In Plain English
The Agent Hierarchy: A Real Example
+-----------------------+
| Betfair Exchange |
| (External Hedge) |
+-----------+-----------+
|
+-----------+-----------+
| HANNIBAL |
| (The Platform) |
+-----------+-----------+
|
+-----------+-----------+
| VIKRAM |
| Master Agent, Delhi |
| Manages 12 sub-agents|
+-----------+-----------+
|
+-----------------+-----------------+
| |
+-----------+-----------+ +-----------+-----------+
| RAJESH | | PRIYA |
| Sub-Agent, Mumbai | | Sub-Agent, Bangalore |
| 200 cricket punters | | 150 football punters |
+-----------+-----------+ +-----------------------+
|
+---------+---------+
| AMIT | SONIA | ... 198 more punters
| Punter | Punter |
+---------+---------+
Vikram is a master agent in Delhi with 15 years of experience and a strong bankroll. Rajesh is one of his sub-agents in Mumbai, managing 200 cricket punters with a moderate bankroll. Amit is one of Rajesh's punters placing bets of 500 to 50,000 on IPL matches.
When Amit Places a Bet
Amit places 10,000 on Mumbai Indians to win at odds 1.85. Here is what must happen in roughly 90 milliseconds:
- Can Amit place this bet? Check his per-click win limit.
- How much does Rajesh keep? The forwarding matrix says 60%. Rajesh keeps 6,000.
- Can Rajesh afford that? Check his cricket limits, per-match limits, night limit.
- The other 4,000 goes to Vikram. Vikram's matrix keeps 60%, forwards 40%.
- The remaining 1,600 reaches the platform. Hannibal retains some, hedges the rest on Betfair.
- Record everything.
The Fundamental Questions
| Question | Who Answers It |
|---|---|
| How much risk does Rajesh keep? | Rajesh's forwarding matrix + his limits |
| How much risk does Vikram keep? | Vikram's forwarding matrix + his limits |
| How much risk does the platform keep? | Platform's risk configuration |
| How much gets hedged on Betfair? | Whatever remains after all agents have taken their share |
| What if someone's limits are breached? | Overflow cascades up to the next level |
| What if Betfair is unavailable? | Platform absorbs as retained risk, retries asynchronously |
3. Simplification Philosophy
The Guiding Principle
Naive, then optimize. Build the simplest correct thing. Measure. Optimize only what measurements prove is too slow. The v2.0 document was designed by experienced architects who anticipated scale problems. Those problems are real but they are not Phase 1 problems. Solving them prematurely introduces complexity that slows development, increases bugs, and makes the system harder to reason about.
What Was Simplified and Why
| Simplification | v2.0 Design | v3.0 Design | Why |
|---|---|---|---|
| Caching | 3-tier: Application LRU (5s TTL) -> Redis -> PostgreSQL | 2-tier: Redis -> PostgreSQL | LRU adds cache coherency problems across instances. Redis is sub-millisecond and shared. Removing LRU eliminates pub/sub invalidation, multi-instance divergence, and an entire class of bugs. Adds ~1-2ms latency, still well within 90ms budget. |
| Exposure Counters | Sharded counters (8-16 shards per agent per scope) with random shard selection and summation | One Redis key per agent per scope. INCRBY for atomic updates. Write-behind to PostgreSQL. | Redis handles 100K+ ops/sec on a single key. We expect ~60 bets/sec for the busiest agent. Sharding is unnecessary until we 10x our traffic. No shard summation, no random selection, no complexity. |
| Matrix Versioning | Full MVCC with version history, garbage collection, version-aware caching, compressed retention tiers | Simple integer version on the matrix. Bet captures version at start. If version changed by position write time, re-process with new matrix. | Matrix changes are rare (5-10 per day per agent). The probability of a collision between a matrix change and a bet in flight is astronomically low. When it happens, re-processing one bet is trivial. No version storage, no GC, no version-aware cache logic. |
| Write Atomicity | Cross-agent distributed locking OR per-level atomicity with partial routing states | Sequential writes: Level 1, then Level 2, then Platform. Each level is its own transaction. Mid-cascade failure goes to DLQ. | The cascade is 2-3 levels deep and each write takes 5-15ms. Sequential execution adds 10-30ms total but eliminates deadlocks, cross-agent locking, and partial routing states. If Level 2 fails after Level 1 succeeds, the bet enters the DLQ for manual resolution. This is acceptable because it is extremely rare. |
| Hedge Execution | Limit orders, configurable max slippage, partial fill management, re-pricing (3 attempts), execution quality analytics | Market orders on Betfair. Accept the fill price. Retry 3 times on failure, then mark as unhedged and alert. Log theoretical vs actual spread for future optimization. | For Phase 1, we are hedging small amounts. The slippage difference between a market order and an optimized limit order strategy on these amounts is negligible. Building a sophisticated execution engine before we have meaningful hedge volume is premature. |
| Multi-Currency | Full FX conversion at currency zone boundaries, FX rate capture, FX risk management, FX reserve buffers | Everything in INR. No FX. | India-only launch. When we expand internationally, we add currency support then. |
| Responsible Gambling | Self-exclusion, deposit limits, session time limits, reality check notifications | Not included in v3. | B2B agent model, not B2C. Add when entering regulated B2C markets. |
| Reconciliation | Every 15 minutes (scheduled), post-settlement, full, targeted. Auto-correction for minor drift. Drift tracking over time. | One nightly job: compare ledger values to SUM of open positions. If mismatch, alert ops team. Manual fix. | Our bet volume in Phase 1 does not warrant 15-minute checks. Nightly is sufficient to catch any drift before it compounds. If a ledger diverges by a meaningful amount, a human should investigate the root cause rather than auto-correcting. |
| Monitoring | Prometheus + Grafana + AlertManager. Per-pipeline-stage histograms. Ops dashboards. | Structured logging (Pino) with key metrics. Simple health endpoint. Slack/email alerts for critical failures. | Prometheus/Grafana is infrastructure that must be deployed, configured, and maintained. For Phase 1 with 3 app instances, structured logs with grep/jq are sufficient. Add Prometheus when operational complexity warrants it. |
| Audit Trail | 3-tier hot/warm/cold storage. Checksum chains. Separate database. Nightly tier migration. | JSONB rows in PostgreSQL, one table, partitioned by month. Delete partitions older than 6 months. | Our first-season audit volume is ~18 GB. This fits comfortably in a single PostgreSQL instance. Cold storage, compression, and migration jobs are unnecessary overhead for a year. |
What This Means for the Latency Budget
| Step | v2.0 Budget | v3.0 Budget | Change |
|---|---|---|---|
| Request parsing & validation | 5ms | 5ms | Same |
| Metrics computation | 3ms | 3ms | Same |
| User win cap check | 5ms | 5ms | Same |
| Stake reduction | 2ms | 2ms | Same |
| Matrix resolution (from Redis, not LRU) | 10ms | 12ms | +2ms (Redis vs LRU) |
| Agent cap evaluation (per level, sequential) | 10ms x N | 12ms x N | +2ms per level (Redis vs LRU) |
| Position creation (sequential per level) | 15ms | 20ms | +5ms (sequential vs parallel) |
| Exposure ledger update (Redis INCRBY + async PG write) | 10ms | 8ms | -2ms (simpler, no sharding) |
| Audit record creation | 10ms | 10ms | Same |
| Response | 5ms | 5ms | Same |
| Total (3-level cascade) | ~85ms | ~92ms | +7ms, still inside the ~110ms total budget |
The simplified architecture adds roughly 7ms to the critical path. We have 20ms of headroom before user experience degrades. This is acceptable.
4. The Forwarding Matrix -- The Brain of the System
What It Is
The forwarding matrix is a multi-dimensional lookup table that determines what percentage of each bet an agent retains versus forwards to their upline. It is the single most important configuration in the entire B-Book system.
The 5 Dimensions
| Dimension | What It Means | Example Values |
|---|---|---|
| market_type | The type of bet | MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE |
| sport_type | Which sport | CRICKET, FOOTBALL, TENNIS, KABADDI |
| event_phase | When in the event lifecycle | PRE_MATCH, IN_PLAY, APPROACHING_START |
| source_type | What kind of punter | NORMAL, SHARP, VIP, NEW_ACCOUNT |
| liquidity_band | Exchange liquidity available to hedge | HIGH, MEDIUM, LOW, NONE |
Wildcard Matching
An agent does not need a rule for every combination. Wildcards (*) mean "match anything."
Example of Rajesh's forwarding matrix:
| Rule | market_type | sport_type | event_phase | source_type | liquidity_band | Forward % |
|---|---|---|---|---|---|---|
| R1 | FANCY | CRICKET | IN_PLAY | SHARP | * | 95% |
| R2 | FANCY | CRICKET | IN_PLAY | * | * | 70% |
| R3 | MATCH_ODDS | CRICKET | PRE_MATCH | * | HIGH | 40% |
| R4 | MATCH_ODDS | CRICKET | PRE_MATCH | * | LOW | 70% |
| R5 | MATCH_ODDS | CRICKET | IN_PLAY | * | * | 60% |
| R6 | * | CRICKET | * | SHARP | * | 90% |
| R7 | * | FOOTBALL | * | * | * | 80% |
| R8 | * | * | * | * | * | 50% |
R1: Sharp user, in-play cricket fancy -- forward 95%. Rajesh keeps only 5% because this is the most dangerous bet type. R3: Pre-match cricket match odds with high liquidity -- forward only 40%. Rajesh keeps 60% because these are safe, well-priced bets. R7: Any football bet -- forward 80%. Rajesh is not a football expert. R8: Catch-all. Forward 50%.
Tie-Breaking Rules (Deterministic)
When a bet matches multiple rules:
- Most specific rule wins. Specificity = count of non-wildcard dimensions. R1 (4 specific dimensions) beats R2 (3 specific dimensions).
- If specificity is equal, higher forward percentage wins. This is the risk-safe default: when in doubt, forward more.
- If forward percentage is also equal, oldest rule wins. Deterministic ordering by creation timestamp.
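The three tie-break steps collapse into a single comparator. A minimal sketch, assuming a hypothetical `MatrixRule` shape with the five dimensions stored as an array (not the production schema):

```typescript
// Deterministic rule selection -- illustrative only; MatrixRule is a
// hypothetical shape, not the production schema.
type MatrixRule = {
  id: string;
  dims: string[];       // the 5 dimensions in order; "*" means wildcard
  forwardPct: number;   // forward percentage, 0-100
  createdAt: number;    // creation timestamp (epoch millis)
};

// Specificity = count of non-wildcard dimensions.
const specificity = (r: MatrixRule): number =>
  r.dims.filter((d) => d !== "*").length;

// Among the rules that matched a bet:
// 1) most specific wins; 2) then higher forward % (risk-safe default);
// 3) then the oldest rule (deterministic ordering by creation time).
function pickRule(matched: MatrixRule[]): MatrixRule {
  return [...matched].sort(
    (a, b) =>
      specificity(b) - specificity(a) ||
      b.forwardPct - a.forwardPct ||
      a.createdAt - b.createdAt
  )[0];
}
```

With Rajesh's matrix, `pickRule` given R1 and R2 returns R1 (four specific dimensions beats three), matching the worked example above.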
Resolution Precedence Chain
The matrix is not the only routing input. There is a four-level precedence chain:
| Level | What It Is | When to Use It |
|---|---|---|
| User Override | A specific forward % for a specific punter | "This user is a known sharp -- forward 100% of their bets" |
| Market Override | A specific forward % for a specific event/market | "The CSK vs MI final is too big -- forward 90% of everything" |
| Matrix Rule | The multi-dimensional lookup | Normal day-to-day operations |
| Agent Default | A single fallback percentage | "If nothing else matches, forward 50%" |
Sensible Defaults by Sport
| Scenario | Recommended Retention | Why |
|---|---|---|
| Cricket in-play fancy | 10-20% | Highest variance, hardest to price |
| Cricket pre-match match odds | 40-60% | Well-priced, ample liquidity |
| Cricket in-play match odds | 20-40% | More volatile than pre-match |
| Football Premier League pre-match | 50-70% | Deep liquidity, well-understood markets |
| Football lower leagues | 20-40% | Less information, integrity risk |
| Tennis | 10-25% | Volatile, retirement risk, low liquidity |
| Kabaddi | 5-15% | Thin markets, poor external pricing |
Common Mistakes and Prevention
| Mistake | How We Prevent It |
|---|---|
| Retention too high on in-play fancies | Warn when retention exceeds recommended range; require confirmation |
| No catch-all rule | System requires a * / * / * / * / * fallback at all times |
| Conflicting rules they do not understand | Dashboard shows which rule matched for every bet; "test my matrix" dry-run tool |
5. The Bet Flow -- Step by Step
The Complete Flow
Walking Through the Numbers
The Bet: Amit places 10,000 on MI to beat CSK at odds 1.85, pre-match IPL 2026.
Step 1: Compute Metrics
| Metric | Calculation | Value |
|---|---|---|
| Stake | As submitted | 10,000 |
| Potential Win | 10,000 x 0.85 | 8,500 |
| Liability | Same as potential win for a back bet | 8,500 |
Step 2: User Win Cap Check. Amit's per-click limit is 50,000. Potential win 8,500 is under. Aggregate daily limit is 2,00,000. Current accumulated: 45,000 + 8,500 = 53,500, still under. Pass.
Step 3: Resolve Forwarding Percentage. Bet characteristics: MATCH_ODDS, CRICKET, PRE_MATCH, NORMAL, HIGH liquidity. Matches Rule R3: forward 40%. Rajesh retains 60%.
Step 4: Agent Cap Evaluation (Rajesh). Rajesh retains 60% of 10,000 = 6,000 stake (5,100 liability). Check limits:
- Cricket sport limit: 12,05,100 / 50,00,000. Pass.
- MI vs CSK match limit: 1,25,100 / 5,00,000. Pass.
- Night period limit: 3,05,100 / 10,00,000. Pass.
All pass. Rajesh retains full 6,000.
Step 5: Cascade to Vikram. 4,000 flows to Vikram. His matrix says retain 60% of cricket pre-match match odds. Retains 2,400, forwards 1,600.
Step 6: Platform. Receives 1,600. Retains 800, hedges 800 on Betfair.
Step 7: Execute. Positions created sequentially -- Rajesh first, then Vikram, then Platform.
| Entity | Retained Stake | Retained Liability | Forwarded |
|---|---|---|---|
| Rajesh | 6,000 | 5,100 | 4,000 to Vikram |
| Vikram | 2,400 | 2,040 | 1,600 to Platform |
| Platform | 800 | 680 | 800 to Betfair |
| Total | 9,200 (+ 800 hedged = 10,000) | 7,820 | |
The stake always sums to the original 10,000. Risk is distributed, never created or destroyed.
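The split can be reproduced mechanically. A minimal sketch, assuming a hypothetical `cascade` helper that takes per-level retention fractions (1 minus forward %); whatever remains after the last level is what gets hedged:

```typescript
// Worked cascade: each level retains its share of the flowing stake and
// forwards the rest. The retained amounts plus the hedged remainder must
// always sum back to the original stake -- risk is distributed, never
// created or destroyed. Amounts in rupees; `cascade` is a hypothetical name.
function cascade(
  stake: number,
  retentionByLevel: number[] // e.g. [0.6, 0.6, 0.5] for Rajesh/Vikram/Platform
): { retained: number[]; hedged: number } {
  const retained: number[] = [];
  let flowing = stake;
  for (const r of retentionByLevel) {
    const keep = Math.round(flowing * r);
    retained.push(keep);
    flowing -= keep;
  }
  return { retained, hedged: flowing }; // remainder goes to the exchange
}
```

`cascade(10000, [0.6, 0.6, 0.5])` reproduces the table above: retained 6,000 / 2,400 / 800 with 800 hedged, summing back to the original 10,000.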
Step 8: Audit. Complete audit record persisted as a JSONB row in the audit table: original bet, matrix rule matched at each level, every limit check, final routing breakdown, timestamps.
6. Cascading Upline Routing
How Bets Flow Through the Hierarchy
A bet enters at the bottom and flows upward. At each level, the agent retains what they can and forwards the rest.
User (Amit) places 10,000 bet
|
v
+---[RAJESH: Level 1]---+
| Forward 40% |
| Retains: 6,000 |
| Forwards: 4,000 |
+---------|---------------+
|
v
+---[VIKRAM: Level 2]---+
| Forward 40% |
| Retains: 2,400 |
| Forwards: 1,600 |
+---------|---------------+
|
v
+---[PLATFORM]----------+
| Retains: 800 |
| Hedges: 800 |
+---------|---------------+
|
v
+---[BETFAIR]------------+
| Receives: 800 |
+-------------------------+
What Happens at Each Level
At every level the system performs:
- Resolve source_type for this agent (own classification > downstream trust > default NORMAL)
- Resolve forwarding percentage (precedence chain)
- Calculate retained amount = incoming stake x (1 - forward %)
- Check agent's limits
- If limits allow: retain, forward the rest
- If limits breached: retain only up to the limit, forward everything else (overflow)
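The retain-up-to-limit behaviour can be sketched per level. This is illustrative, assuming a hypothetical `routeLevel` helper and that the most restrictive limit's remaining capacity has already been computed:

```typescript
// Per-level routing with overflow. The agent retains
// min(intended retention, remaining limit capacity); anything above the
// cap overflows into the forwarded amount. The cascade never drops a bet.
// `routeLevel` is a hypothetical name; amounts in rupees.
function routeLevel(
  incoming: number,       // stake arriving at this level
  forwardPct: number,     // resolved from the precedence chain, 0-100
  limitRemaining: number  // most restrictive limit's remaining capacity
): { retained: number; forwarded: number } {
  const intended = Math.round(incoming * (1 - forwardPct / 100));
  const retained = Math.min(intended, Math.max(0, limitRemaining));
  return { retained, forwarded: incoming - retained };
}
```

With the overflow example below, `routeLevel(50000, 40, 25000)` intends to retain 30,000 but caps at 25,000, forwarding 25,000 (the intended 20,000 plus 5,000 overflow).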
Overflow Example
Amit bets 50,000 on CSK at 2.10. Potential win: 55,000.
RAJESH (L1) - Forward 40%
Wants to retain: 30,000 (60%)
Cricket match limit remaining: 25,000 <-- LIMIT HIT
Actually retains: 25,000
Forwards: 25,000 (intended 20,000 + 5,000 overflow)
|
v
VIKRAM (L2) - Forward 40%
Receives: 25,000
Retains: 15,000 (60%), all limits OK
Forwards: 10,000
|
v
PLATFORM: Retains 5,000, Hedges 5,000
Vikram did not know 5,000 was overflow. He simply received 25,000 and processed normally. The cascade never drops a bet.
Suspended Agents
If Vikram is suspended, bets skip him. Rajesh's forwarded amount goes directly to the platform. Punters can still bet; risk routes around the suspended agent.
The Betfair Backstop
After all agents take their share, any remaining exposure reaches the platform, which can hedge on Betfair. If Betfair is down: the platform absorbs the risk temporarily, the bet is still accepted, and the hedge order enters a retry queue. Punter experience is never degraded by hedge-side issues.
Does Sharp Classification Travel Upline?
The information travels, but each agent decides independently.
Each forwarded bet carries metadata: originating_user_id, downstream_classification, forwarding_reason. Each upline agent resolves source_type independently:
Each agent configures trust per sub-agent:
| Sub-Agent | trust_downstream_flags | Reason |
|---|---|---|
| Rajesh | true | Experienced, 8 years track record |
| Arun | false | New, only 3 months, unproven detection |
7. Agent Liability Limits & NO_NEW_RISK
The Limit Structure
| Limit Type | Scope | Example |
|---|---|---|
| Sport Limit | Total liability across all events in a sport | "50 lakh total cricket exposure" |
| Market Limit | Total liability on a specific event/market | "No more than 5 lakh on any single IPL match" |
| Night Period Limit | Total liability during the night window | "Cap my night session at 10 lakh" |
| Weekly Period Limit | Total liability during the weekly cycle | "Cap my weekly exposure at 1 crore" |
How Limits Interact: Most Restrictive Wins
All applicable limits are checked simultaneously. The most restrictive determines retention.
Example: Thursday night during IPL. Rajesh's state:
| Limit | Capacity | Used | Remaining |
|---|---|---|---|
| Cricket Sport | 50,00,000 | 38,00,000 | 12,00,000 |
| MI vs CSK Match | 5,00,000 | 4,50,000 | 50,000 |
| Night Period | 10,00,000 | 9,20,000 | 80,000 |
| Weekly Period | 40,00,000 | 35,00,000 | 5,00,000 |
A bet wants to add 1,00,000 of retained liability. The match limit (50,000) is most restrictive. Rajesh retains only 50,000; the other 50,000 overflows to Vikram.
NO_NEW_RISK Mode
NO_NEW_RISK activates when retained liability reaches the limit for a given scope. The agent cannot take on new risk-increasing exposure, but hedge bets are still accepted.
Think of it like a credit card limit. You cannot make new purchases, but you can make payments.
Scope is granular:
| Scenario | Cricket | Tennis | Football |
|---|---|---|---|
| Rajesh hits cricket limit | NO_NEW_RISK | Normal | Normal |
| Rajesh hits MI vs CSK match limit | NO_NEW_RISK (this match only) | Normal | Normal |
| Rajesh hits night period limit | NO_NEW_RISK (all sports) | NO_NEW_RISK | NO_NEW_RISK |
Hedge detection rule:
If WorstCaseLiability AFTER the bet < WorstCaseLiability BEFORE the bet, it is a hedge. Allow it.
A bet on the opposite outcome of an existing position is almost always a hedge. A bet on the same outcome is never a hedge. A bet on a third outcome may or may not be depending on amounts and odds.
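The rule can be made concrete on a two-outcome market. A sketch under simplifying assumptions -- back bets only, and the `Position` shape and helper names are hypothetical:

```typescript
// Hedge detection via worst-case liability. For each outcome, the net
// liability is the payout owed if that outcome wins (stake * (odds - 1))
// minus the stakes collected on bets backing the other outcomes.
type Position = { outcome: string; stake: number; odds: number };

function worstCaseLiability(positions: Position[], outcomes: string[]): number {
  return Math.max(
    ...outcomes.map((o) => {
      let net = 0;
      for (const p of positions) {
        net += p.outcome === o ? p.stake * (p.odds - 1) : -p.stake;
      }
      return net;
    })
  );
}

// The rule from the text: a bet is a hedge if worst-case liability AFTER
// the bet is strictly lower than worst-case liability BEFORE it.
function isHedge(existing: Position[], bet: Position, outcomes: string[]): boolean {
  return worstCaseLiability([...existing, bet], outcomes) <
         worstCaseLiability(existing, outcomes);
}
```

Given an existing 10,000 back of MI at 1.85 (worst case 8,500), a 5,000 back of CSK at 2.10 drops the worst case to 3,500 and is allowed under NO_NEW_RISK; another MI bet raises it and is rejected.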
Exiting NO_NEW_RISK: Settlements reduce exposure, hedge bets reduce exposure, or an admin raises the limit.
8. User Win Limits & Stake Reduction
Per-Click Win Limit
Caps the maximum a punter can win on a single bet. If Amit has a per-click limit of 50,000 and bets at odds 50.00 with a 5,000 stake (potential win 2,45,000), the stake is reduced:
max_stake = 50,000 / (50.00 - 1) = 50,000 / 49 = 1,020 (rounded down)
The punter sees: "Maximum stake at these odds: 1,020." If the reduced stake falls below the minimum bet size (100), the bet is rejected: "This market is currently unavailable at these odds."
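The reduction is a one-line formula plus the minimum-stake check. A minimal sketch, assuming a hypothetical `adjustStake` helper and the 100-rupee floor from the text:

```typescript
// Per-click stake reduction. If the potential win exceeds the cap, the
// stake is reduced to floor(cap / (odds - 1)); if that falls below the
// minimum bet size, the bet is rejected. Amounts in rupees.
const MIN_BET = 100; // minimum bet size from the text

function adjustStake(
  stake: number,
  odds: number,            // decimal odds, e.g. 50.00
  perClickWinCap: number   // e.g. 50,000
): { stake: number; rejected: boolean } {
  const potentialWin = stake * (odds - 1);
  if (potentialWin <= perClickWinCap) return { stake, rejected: false };
  const maxStake = Math.floor(perClickWinCap / (odds - 1));
  return maxStake < MIN_BET
    ? { stake: 0, rejected: true }
    : { stake: maxStake, rejected: false };
}
```

`adjustStake(5000, 50.0, 50000)` reproduces the worked example: the stake is cut to 1,020.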
Aggregate Win Limit
Caps total cumulative potential wins over a period (daily). Prevents a punter from placing many individually-acceptable bets that collectively create enormous exposure.
Sharp Detection Signals
| Signal | What It Means |
|---|---|
| Closing Line Value (CLV) | Consistently bets at better prices than the closing line. Strongest predictor of long-term profitability. |
| Consistent staking | Same stake regardless of odds. Professionals use flat staking. |
| Early betting | Bets within the first hour of market opening when prices are softest. |
| Unpopular markets | Bets on obscure leagues with weakest pricing. |
| No mean reversion | Profits do not revert over time. Lucky punters revert; skilled punters do not. |
| Post-bet price movement | Price moves in their direction after they bet. |
9. Period Definitions -- Night & Weekly
Why Bookies Use Periods
Bookies think in operational windows, not lifetime exposure. The night session (when live betting peaks) has different risk characteristics than a quiet afternoon. The weekly cycle aligns with settlement cycles.
Night Period
Configurable per agent. Examples:
| Agent | Timezone | Night Period | Why |
|---|---|---|---|
| Rajesh (India, cricket) | IST | 7:00 PM - 2:00 AM | IPL matches start at 7:30 PM |
| Priya (India, football) | IST | 10:00 PM - 4:00 AM | PL matches at 12:30 AM IST |
Period Rollover
When a live match spans a period boundary, open exposure carries forward. Rajesh's 8,00,000 of open exposure at 1:55 AM is NOT zeroed at 2:00 AM; it carries forward as the starting balance of the new period. New bets placed after 2:00 AM count against the new period's limits.
DST
Period boundaries are defined in the agent's local time and converted to UTC fresh each day. "7 PM to 2 AM" always means that in the agent's local clock. On spring-forward day, the period is 1 hour shorter. On fall-back, the system uses the first occurrence of the repeated hour.
10. Exposure Accounting (2-Tier: Redis + PostgreSQL)
Three Ledgers Per Agent Per Scope
| Ledger | What It Tracks |
|---|---|
| retained_open_liability | Worst-case payout on retained bets. Checked against limits. |
| forwarded_open_liability | Liability forwarded upward. Agent has no financial exposure. |
| open_potential_win | Total punters could win against this agent. |
The 2-Tier Architecture
+------------------+ +------------------+
| REDIS | | POSTGRESQL |
| (Fast R/W) | | (Source of Truth)|
| | | |
| - Sub-ms reads | | - Atomic writes |
| - INCRBY atomic | | - FOR UPDATE |
| - Single shared | | locking at |
| cache for all | | boundaries |
| instances | | |
+--------+---------+ +--------+---------+
| |
v v
"Rajesh has 12L "Rajesh has exactly
used of 50L -- 12,05,100 used --
clearly within UPDATE with lock"
limit, fast pass"
How it works:
- Redis (primary read/write): Every bet reads exposure from Redis first. Redis is shared across all application instances -- no cache coherency problem. Exposure counters are stored as simple keys: `exposure:{agent_id}:{scope}:retained`, `exposure:{agent_id}:{scope}:forwarded`, `exposure:{agent_id}:{scope}:potential_win`. Updated with atomic `INCRBY` on every bet.
- PostgreSQL (source of truth): After the Redis update, a write-behind job flushes the current Redis value to PostgreSQL every 5 seconds. For bets where the agent is near their limit (>85% utilized), we bypass Redis and go directly to PostgreSQL with `SELECT ... FOR UPDATE` locking. This ensures two simultaneous bets cannot both claim the last 50,000 of capacity.
Why this works: During an IPL match, Rajesh might receive 50 bets per minute. For 45 of those, Redis immediately confirms he is within limits. Only the 5 bets near his limit need the PostgreSQL lock. Median latency stays low while correctness is guaranteed at the boundary.
When Redis is down: Fall back to PostgreSQL for all reads. Latency increases from <1ms to 5-15ms. No data loss, no incorrect decisions.
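The fast-path/locked-path split can be sketched as follows, with hypothetical store interfaces standing in for the Redis client and the PostgreSQL `SELECT ... FOR UPDATE` path:

```typescript
// Two-tier exposure reservation -- illustrative sketch. FastStore and
// LockedStore are hypothetical interfaces, not the production clients.
interface FastStore {
  get(key: string): number;
  incrBy(key: string, delta: number): number;
}
interface LockedStore {
  // Atomically checks and reserves capacity under a row lock.
  reserveWithLock(agent: string, scope: string, delta: number, limit: number): boolean;
}

const NEAR_LIMIT = 0.85; // utilization threshold from the text

function tryReserve(
  redis: FastStore, pg: LockedStore,
  agent: string, scope: string, delta: number, limit: number
): boolean {
  const key = `exposure:${agent}:${scope}:retained`;
  const current = redis.get(key);
  if ((current + delta) / limit > NEAR_LIMIT) {
    // Near the boundary: take the locked path so two simultaneous bets
    // cannot both claim the last slice of capacity.
    const ok = pg.reserveWithLock(agent, scope, delta, limit);
    if (ok) redis.incrBy(key, delta); // keep the cache in step
    return ok;
  }
  redis.incrBy(key, delta); // fast path: clearly within limits
  return true;
}
```

During an IPL match most bets take the top branch's fast path; only the handful near the boundary pay the PostgreSQL locking cost, which is what keeps median latency low.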
Settlement Impact
When a match settles, exposure associated with that match is removed from all agents: retained_open_liability decrements for retained positions, forwarded_open_liability decrements for forwarded positions, open_potential_win decrements for all.
11. Audit Trail & Determinism
The Audit Record
Every bet produces a structured, queryable JSONB record in PostgreSQL. The audit table is partitioned by month. Partitions older than 6 months are dropped.
An audit record contains:
| Field | Example |
|---|---|
| bet_id | bet_a1b2c3d4 |
| original_stake | 10,000 |
| adjusted_stake | 10,000 |
| per_click_win_cap_check | PASS: 8,500 < 50,000 |
| forwarding_chain | Full routing at each level with matrix rule, limits, overflow |
| limit_checks | Every limit checked at every level with result |
| period_context | NIGHT (19:00-02:00 IST), Week 7 of 2026 |
| timestamps | matrix_resolve: 2ms, cap_check: 5ms, total: 23ms |
| matrix_version | 7 (integer version captured at start of processing) |
Why Determinism Matters
Agents must trust the system. If Rajesh sees a routing he does not understand, he will revert to manual processes. The audit trail shows exactly why every decision was made.
Disputes are trivially resolvable: load the audit record, see the exact split and the rules that produced it.
Configuration Change Log
All configuration changes are recorded with who made the change, when, and why. You can answer: "What was Rajesh's matrix at 9:47 PM on March 15?" by querying the config changelog.
12. Performance Architecture
Latency Budget (Simplified)
| Step | Budget |
|---|---|
| Request parsing & validation | 5ms |
| Metrics computation | 3ms |
| User win cap check (Redis) | 5ms |
| Matrix resolution (Redis) | 12ms |
| Agent cap evaluation (per level, Redis or PG) | 12ms x N levels |
| Position creation (sequential, per level) | 8ms x N levels |
| Exposure ledger update (Redis INCRBY) | 3ms |
| Audit record (async buffer) | 0ms on critical path |
| Response | 5ms |
| Total (3-level cascade) | ~92ms |
Exposure Counter Design (Simple Redis Atomic Counters)
One Redis key per agent per scope. No sharding.
KEY: exposure:rajesh:cricket:retained
VALUE: 1205100 (in paisa)
KEY: exposure:rajesh:mi_vs_csk:retained
VALUE: 125100
KEY: exposure:rajesh:night_2026_02_11:retained
VALUE: 305100
Operations:
- Read: `GET exposure:rajesh:cricket:retained` -- sub-millisecond
- Increment: `INCRBY exposure:rajesh:cricket:retained 5100` -- atomic, sub-millisecond
- Decrement (settlement): `DECRBY exposure:rajesh:cricket:retained 5100` -- atomic
Write-behind to PostgreSQL: a background job reads all exposure keys and writes to exposure_ledger table every 5 seconds. If Redis restarts, the recovery job rebuilds keys from PostgreSQL.
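Flush and recovery are symmetric directions of the same mapping. A minimal sketch, with hypothetical callbacks standing in for the Redis key scan and the exposure_ledger upsert:

```typescript
// Write-behind flush and restart recovery -- illustrative sketch; the
// callback names are hypothetical. A background timer calls flush() every
// 5 seconds; after a Redis restart, restore() replays the ledger rows
// back into the cache.
type Counters = Map<string, number>; // exposure key -> value in paisa

function flush(
  readCacheCounters: () => Counters,
  upsertLedgerRow: (key: string, value: number) => void
): number {
  let rows = 0;
  for (const [key, value] of readCacheCounters()) {
    upsertLedgerRow(key, value);
    rows++;
  }
  return rows; // ledger rows written this cycle
}

function restore(
  readLedgerRows: () => Counters,
  setCacheCounter: (key: string, value: number) => void
): void {
  for (const [key, value] of readLedgerRows()) setCacheCounter(key, value);
}
```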
What Is Cached Where
| Data | Location | Invalidation |
|---|---|---|
| Agent forwarding matrix | Redis hash matrix:{agent_id} | On matrix update, delete key |
| Agent limits | Redis hash limits:{agent_id} | On limit update, delete key |
| Current exposure | Redis keys exposure:{agent_id}:{scope}:* | Updated on every bet via INCRBY |
| User win cap state | Redis key user_agg_win:{user_id}:{date} | Updated on every bet, reset on period boundary |
| NO_NEW_RISK flags | Redis key nonewrisk:{agent_id}:{scope} | Set on limit breach, cleared on settlement or limit change |
13. The Agent Experience
Three Tiers of Experience
The system runs on the same engine for everyone. What changes is how much complexity the agent sees.
Tier 1: "Set and Forget" (80% of agents). Three questions at setup: What sports? What is your nightly budget? How aggressive? The system generates the full matrix, limits, and win caps from these three answers. Traffic light dashboard. WhatsApp alerts.
Tier 2: "Dashboard Driver" (15% of agents). Real-time risk dashboard. Per-sport limits. Per-user management. Sees exposure bars, match-by-match breakdown, recent bets.
Tier 3: "Matrix Master" (5% of agents). Full 5D matrix editor. Test bet simulator. Historical P&L analysis per rule.
The "Sleep Well" Number
Every agent sees one number prominently:
YOUR MAXIMUM LOSS TONIGHT
3,42,000
out of your 10,00,000 night budget
████████░░░░░░░░░░░░ 34%
This is the sum of worst-case liability across all retained open positions. It is a mathematical guarantee: the cascade routing, limits, and NO_NEW_RISK ensure this number never exceeds the budget.
The Panic Button
One button sets forwarding to 100% for all sports and markets, places hedge orders on Betfair for all current retained positions, and notifies the upline. This is the emergency exit.
14. Nightmare Scenarios & How We Handle Them
Syndicate Attack (Correlated Positions Across Agents)
The platform maintains aggregate exposure per event. Cross-agent correlation detects users who consistently bet the same outcome at the same time through different agents. Platform-level event limits cap total exposure regardless of individual agent limits.
Data Feed Failure During Live Play
If odds have not updated for more than 5 seconds during in-play, the market is automatically suspended. No new bets until the feed resumes. Price movement circuit breaker suspends on abnormal jumps.
Rogue Agent Dumping Toxic Flow
Track P&L of retained vs forwarded bets per agent. If forwarded bets consistently lose money while retained bets win, flag the agent. The platform can force minimum retention.
Double Settlement / Result Correction
Re-settlement reverses previous settlements and re-applies with corrected results. All agents in the cascade are affected. Communication chain notifies everyone with full audit trails.
System Outage During Peak
Circuit breaker pattern: when response times exceed 500ms, switch to degraded mode where all bets are forwarded 100% to Betfair. No agent retains risk during the outage. When the system recovers, normal routing resumes.
15. Bet Cancellation / Void / Partial Settlement State Machine
The Complete State Machine
State Definitions
| State | Exposure Impact | Reversible? |
|---|---|---|
| BET_PLACED | Ledgers not yet updated | Yes (rolls back on system error) |
| ACTIVE | Fully reflected in all agent ledgers | No -- can only move forward |
| SETTLED | Exposure removed, P&L applied | Can move to RE_SETTLED |
| VOIDED | All exposure atomically removed as if bet never happened | No -- final |
| CANCELLED | All exposure removed. Functionally identical to void. | No |
| CASH_OUT_SETTLED | Original position closed via counter-position | No |
| RE_SETTLED | Previous P&L reversed, new P&L applied | Can be re-settled again |
Who Can Initiate Transitions
| Transition | By | Time Window |
|---|---|---|
| ACTIVE -> SETTLED | System (automatic on event result) | After event concludes |
| ACTIVE -> VOIDED | Platform admin only | Any time before settlement |
| BET_PLACED -> CANCELLED | Punter (own bet) | Within 3-5 seconds pre-match, 0 for in-play |
| BET_PLACED -> CANCELLED | Agent (their punter's bet) | Within 60 seconds |
| SETTLED -> RE_SETTLED | SUPER_ADMIN only | Within 72 hours |
Agents cannot void bets. Only the platform can void. This prevents fraud against the upline.
How Voids Cascade
A void reads the original audit record to determine exactly what to reverse. It does not recalculate. Even if the agent changed their matrix since placement, the void reverses exactly what was originally done.
Every void has an idempotency key. Pressing "Void" twice does not double-decrement exposure. All ledger updates happen in a single PostgreSQL transaction with FOR UPDATE locks in agent_id ascending order (preventing deadlocks).
After a void, NO_NEW_RISK flags are re-evaluated: if exposure dropped below the limit, the flag is cleared.
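The deadlock-avoidance rule can be sketched as follows. The query text and `lockOrder` helper are illustrative, not the production schema; the point is only that every multi-agent transaction acquires its FOR UPDATE locks in the same ascending order:

```typescript
// Deadlock prevention sketch: every void transaction locks agent ledger rows
// in ascending agent_id order. Query text is illustrative, not the real schema.
const lockSql =
  "SELECT id FROM exposure_ledger WHERE agent_id = ANY($1) ORDER BY agent_id ASC FOR UPDATE";

function lockOrder(agentIds: string[]): string[] {
  // Two concurrent voids touching overlapping agents acquire locks in the
  // same order, so neither can wait on a lock the other holds cyclically.
  return [...agentIds].sort();
}
```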
Partial Void (Multi-Leg Bets)
When one leg of an accumulator is voided, that leg is treated as a winner at odds 1.00. Remaining legs settle normally with the voided leg's odds removed from the calculation.
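The leg-removal rule can be sketched as a pure function. Types and names here are illustrative, not the production schema; a voided leg contributes odds 1.00, i.e. it drops out of the combined-odds product:

```typescript
// Sketch: settle an accumulator when some legs are voided (names hypothetical).
interface Leg {
  odds: number;
  result: "WON" | "LOST" | "VOID";
}

function settleAccumulator(stakePaisa: number, legs: Leg[]): number {
  let combinedOdds = 1;
  for (const leg of legs) {
    if (leg.result === "LOST") return 0; // any lost leg loses the whole bet
    if (leg.result === "VOID") continue; // treated as odds 1.00 -- no effect
    combinedOdds *= leg.odds;
  }
  return Math.round(stakePaisa * combinedOdds); // gross return in paisa
}
```

If every leg is voided, the punter simply gets the stake back.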
16. Matrix Version Control
The Problem
Rajesh changes his matrix at 9:47 PM during MI vs CSK. Fifteen bets are in various processing stages. If half use the old matrix and half use the new one, the audit trail is inconsistent.
The Simple Solution: Version Integer
The matrix has a version integer, incremented on any change. When a bet enters processing, it captures the current version.
BET PROCESSING:
1. Bet arrives at matrix resolution step
2. Read current matrix version from Redis: version = 7
3. Evaluate rules
4. Record version 7 in the bet's audit trail
5. Continue to cap evaluation and position creation
At position write time:
6. Re-check: is matrix version still 7?
7a. If yes: write positions normally
7b. If no (version is now 8): re-resolve the matrix with the new rules, re-calculate the split, then write
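The check-and-re-resolve loop above can be sketched like this. The `Matrix` shape and `readMatrix` callback are illustrative; in production the version comes from Redis and the split from full rule resolution:

```typescript
// Optimistic matrix-version check (names hypothetical).
interface Matrix {
  version: number;
  forwardPercentage: number;
}

function resolveWithVersionCheck(
  readMatrix: () => Matrix,
  maxRetries = 3,
): { forwardPercentage: number; matrixVersion: number } {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const snapshot = readMatrix();            // steps 2-3: capture version, evaluate rules
    const split = snapshot.forwardPercentage;
    const current = readMatrix();             // step 6: re-check at position write time
    if (current.version === snapshot.version) {
      return { forwardPercentage: split, matrixVersion: snapshot.version }; // step 7a
    }
    // step 7b: version moved during processing -- loop and re-resolve
  }
  throw new Error("matrix changed on every attempt; escalate");
}
```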
Why This Is Sufficient
Matrix changes are rare: 5-10 per day per agent. The window between a bet reading the matrix version and writing positions is 20-40 milliseconds. The probability of a matrix change landing in that window is roughly:
10 changes/day = 10 changes / 86,400,000 ms = 1 change per 8,640,000 ms
40ms window: probability = 40 / 8,640,000 = 0.000005 (0.0005%)
When the astronomically unlikely collision does occur, we simply re-process the bet with the new matrix. No version history needed. No garbage collection. No version-aware caching.
What the Agent Sees
Matrix changes take effect on the next bet that starts processing after the Redis key is updated (which happens within milliseconds of the save). The dashboard shows: "Matrix updated. New bets will use the updated rules."
17. Dead Letter Queue and Poison Bet Handling
The Retry Pipeline
What Constitutes a Poison Bet
| Condition | Example |
|---|---|
| Event already settled | Bet queued during outage, event settles before replay |
| Agent suspended mid-processing | Agent suspended while bets were in retry queue |
| Market no longer exists | Data feed error caused market deletion |
| Duplicate bet_id | Network timeout caused client to retry; first attempt succeeded |
| Invalid state transition | Punter cancelled during retry window |
What the Punter Experiences
If the punter never saw "Bet Confirmed": They saw "Bet is being processed." If retries succeed, push notification: "Your bet has been confirmed." If retries fail, push notification: "Your bet could not be placed. No funds were deducted."
If the punter saw "Bet Confirmed" (partial cascade failure): The punter continues to see "Bet Active." The system retries in the background. If retries fail, an admin resolves via the DLQ -- typically void and refund.
Manual Resolution
The DLQ dashboard shows each entry with full context: the original bet, the failure reason, the punter experience (did they see a confirmation?), and the event status. Admin options: Void & Refund, Force Process, or Settle at Result.
18. Settlement Cascade & Failure Isolation
Per-Position Settlement State
Each position is settled independently. Position 2,847 failing does not block position 2,848. The worker records the failure and moves on.
Settlement Worker Design
Workers claim positions using SELECT ... FOR UPDATE SKIP LOCKED. Each position is processed: calculate P&L, update exposure ledger, set status to SETTLED. Failed positions remain for retry. Settlement is partitioned by agent for fault isolation.
Settlement Idempotency
Every settlement has an idempotency key: {position_id}_{hash(event_result)}. Before executing, the system checks if this key already exists. If it does, the settlement is skipped. This makes settlement safe to retry at any time.
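A minimal sketch of the key and the guard follows. The `{position_id}_{hash(event_result)}` format is from the design above; the hash choice (SHA-256 over the JSON result) is an assumption, and the in-memory Set stands in for what would be a row check in PostgreSQL:

```typescript
import { createHash } from "node:crypto";

// Build the settlement idempotency key (digest function is an assumption).
function settlementIdempotencyKey(positionId: string, eventResult: object): string {
  const digest = createHash("sha256")
    .update(JSON.stringify(eventResult))
    .digest("hex")
    .slice(0, 16);
  return `${positionId}_${digest}`;
}

// In production this check is a unique row in PostgreSQL, not a Set.
const applied = new Set<string>();

function settleOnce(key: string, apply: () => void): boolean {
  if (applied.has(key)) return false; // same result already settled -- skip
  applied.add(key);
  apply();
  return true;
}
```

A corrected result produces a different hash, hence a different key, so re-settlement is not blocked by the original settlement's key.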
Partial Failure Recovery
A background monitor runs every 30 seconds:
- Positions in PROCESSING for >60 seconds: assume the worker crashed, reset to PENDING
- Positions in FAILED: retry up to 5 times, then escalate to DLQ
- Events with mixed CONFIRMED/PENDING: generate a report and alert if >15 minutes old
19. Cash-Out / Early Settlement
How Cash-Out Works
Cash-out is a counter-bet at current odds that closes the original position. The punter locks in a guaranteed profit or limits a loss.
Formula (back bet):
cash_out_return = stake * (original_odds / current_odds) * (1 - margin)
cash_out_profit = cash_out_return - stake
Example: Amit bet 10,000 on MI at 1.85. MI now at 1.20 (dominating).
cash_out_return = 10,000 * (1.85 / 1.20) * (1 - 0.05) = 14,646
Amit's profit if he cashes out: 4,646
(vs 8,500 if MI wins, vs -10,000 if MI loses)
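The formula and worked example can be captured in a small function. Amounts are in rupees for readability; rounding to the nearest rupee is an assumption:

```typescript
// Back-bet cash-out, per the formula above.
function cashOutReturn(
  stake: number,
  originalOdds: number,
  currentOdds: number,
  margin: number, // e.g. 0.05 for a 5% margin
): number {
  return Math.round(stake * (originalOdds / currentOdds) * (1 - margin));
}

function cashOutProfit(
  stake: number,
  originalOdds: number,
  currentOdds: number,
  margin: number,
): number {
  return cashOutReturn(stake, originalOdds, currentOdds, margin) - stake;
}
```

`cashOutReturn(10000, 1.85, 1.20, 0.05)` reproduces Amit's 14,646.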
Routing Through the Original Proportions
The cash-out counter-bet routes through the SAME proportions as the original bet, not the current matrix. If Rajesh retained 60%, he closes 60% of the position.
Partial Cash-Out
Amit can cash out any percentage. 50% cash-out closes half the position at all levels. The remaining 50% continues normally.
Interaction with NO_NEW_RISK
Cash-out creates a counter-position that reduces exposure. It is treated as a hedge and allowed even when NO_NEW_RISK is active.
20. Lay Bet Support
What Is a Lay Bet?
A lay bet is the opposite of a back bet. Sonia lays MI to win means she bets MI will NOT win. She wins if MI draws or loses.
Liability difference:
- Back: punter risks the stake, wins stake * (odds - 1)
- Lay: punter risks stake * (odds - 1), wins the stake
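The two liability rules, as functions over paisa amounts (function names are illustrative):

```typescript
// Back vs lay risk and win amounts, in paisa.
type Side = "BACK" | "LAY";

function liability(side: Side, stakePaisa: number, odds: number): number {
  // Back: risk the stake. Lay: risk stake * (odds - 1).
  return side === "BACK" ? stakePaisa : Math.round(stakePaisa * (odds - 1));
}

function potentialWin(side: Side, stakePaisa: number, odds: number): number {
  // Back: win stake * (odds - 1). Lay: win the stake.
  return side === "BACK" ? Math.round(stakePaisa * (odds - 1)) : stakePaisa;
}
```

Rajesh's 60% retained share of Sonia's 10,000 lay at 1.85 is a 6,000 stake (600,000 paisa), so `liability("LAY", 600000, 1.85)` gives 510,000 paisa -- the 5,100 ledger movement in the example that follows.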
Exposure Impact
A lay bet on outcome X DECREASES the agent's exposure on outcome X and INCREASES exposure on other outcomes. It is a natural hedge.
Before Sonia's lay:
MI Win liability: 5,00,000
MI Not Win liability: 0
After Sonia lays MI at 1.85 for 10,000 (Rajesh retains 60%):
MI Win liability: 4,94,900 (reduced by 5,100)
MI Not Win liability: 6,000 (increased by 6,000)
Worst case went from 5,00,000 to 4,94,900 = HEDGE
How the Matrix Handles Lay Bets
Same 5-dimensional matrix, same rules, same precedence chain. The only difference is how the exposure ledger is updated and how NO_NEW_RISK evaluates the bet. If the lay reduces worst-case liability, it is allowed even under NO_NEW_RISK.
21. Collusion Detection
The Problem
An agent conspires with a sharp punter: marks them as NORMAL to retain winning bets instead of forwarding them. Or marks them as SHARP only when they are about to lose, forwarding toxic flow upline.
Detection Signals
| Signal | Severity |
|---|---|
| Classification flip before winning streak | HIGH |
| Classification flip-flop (>3 times/week) | HIGH |
| Override % changes correlate with bet outcomes | CRITICAL |
| Forwarded flow consistently loses vs retained flow consistently wins | HIGH |
| Single user dominates forwarded volume (>30%) | MEDIUM |
Cooling-Off Period for Classification Changes
Changes are queued, not instant:
- NORMAL to SHARP: 24-hour cooling-off
- SHARP to NORMAL: 72-hour cooling-off (riskier direction, longer wait)
- During cooling-off: old classification remains active. Agent cannot change again.
Upline Audit Rights
Vikram can see Rajesh's user classifications, pending changes, and classification-vs-outcome correlation reports. Vikram can flag changes for review and request the platform to freeze Rajesh's override capability. Vikram cannot directly change Rajesh's classifications.
Correlation Engine (Phase 2 Implementation)
A nightly analysis builds timelines of classification changes, bet placements, and outcomes. Calculates collusion scores based on statistical correlation. Thresholds trigger escalating alerts from informational (score 25-75) to automatic override freeze (score 150+).
For Phase 1, the cooling-off periods and upline audit rights provide baseline protection. The automated correlation engine is Phase 2.
22. Agent Hierarchy Migration
The Problem
Rajesh leaves Vikram's network and joins Suresh's. Rajesh has open positions forwarded through Vikram. The system must handle this cleanly.
Effective-Dated Changes
Hierarchy changes are never instantaneous. They take effect at a scheduled date.
Dual-Path Settlement
After cutover:
- New bets: Rajesh -> Suresh -> Platform
- Old positions: Still settled through Vikram (he holds the positions)
| Bet Timing | Routing | Settlement |
|---|---|---|
| Before cutover, settles before cutover | Through Vikram | Through Vikram |
| Before cutover, settles after cutover | Through Vikram | Through Vikram (dual-path) |
| After cutover | Through Suresh | Through Suresh |
When all old positions settle, final reconciliation between Rajesh and Vikram, then migration status becomes COMPLETE.
23. Minimum Forwarding / Skin-in-the-Game Requirements
How It Works
The upline sets a minimum retention percentage per downstream agent. This floor cannot be overridden by the downstream agent's matrix.
Vikram requires Rajesh to retain at least 20% of every bet. If Rajesh's matrix says forward 95% for SHARP users, the system caps it at 80% forward (20% retention minimum).
Limits always win over minimum retention. If Rajesh's limits would be breached by retaining 20%, he retains only what his limits allow.
When Rajesh tries to save a matrix rule that violates the minimum, the system auto-adjusts and shows a clear message explaining the constraint.
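The floor itself is a one-line cap. This sketch covers only the floor; the "limits always win" interaction is applied later in the pipeline and omitted here (names are illustrative):

```typescript
// Cap the matrix forward% so the downstream agent retains at least the
// upline's minimum retention percentage.
function effectiveForwardPct(matrixForwardPct: number, minRetentionPct: number): number {
  return Math.min(matrixForwardPct, 100 - minRetentionPct);
}
```

With Vikram's 20% floor, Rajesh's 95% SHARP rule becomes `effectiveForwardPct(95, 20)` = 80.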
24. Panic Button & Abuse Prevention
What the Panic Button Does
- Immediately sets forwarding to 100% for all sports and markets
- Places hedge orders on Betfair for all current retained positions
- Notifies the upline
- Logs the action
Abuse Prevention
The abuse pattern: panic when losing (lock in small loss), hold when winning (collect full profit).
Cost escalation:
- First panic per period: platform absorbs hedge spread cost
- Second panic: 50% spread cost charged to the agent
- Third+: 100% charged to the agent
Controls:
| Control | Value |
|---|---|
| Panics per night (free) | 2 |
| Cooling-off after panic | 30 minutes before matrix can be restored |
| Minimum hedge duration | 15 minutes (cannot un-hedge immediately) |
Detection: Track panic frequency, correlation with book direction (panics when losing, not when winning), and P&L improvement from panic usage. If panic usage consistently improves P&L by >20% over 3 weeks with asymmetric timing, flag as GAMING. Escalation: full spread cost, then suspension of the panic feature, with automatic NO_NEW_RISK taking its place.
25. Timestamp Security & Period Boundaries
Server-Side Authority
The authoritative timestamp is assigned at the API gateway when the request is received. This is immutable. Client timestamps are stored for debugging and fraud detection but never trusted for business logic.
BET PROCESSING:
1. Client sends request with client_timestamp
2. API gateway assigns server_timestamp = NOW()
3. All period boundary evaluation uses server_timestamp
4. All exposure updates reference server_timestamp
5. Audit stores BOTH timestamps
Period Boundary Tie-Breaking
Bets at exactly the boundary belong to the ENDING period. The interval is closed-open: [19:00, 02:00). A bet at 01:59:59.999 is NIGHT. A bet at 02:00:00.000 is DAY.
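The closed-open check can be sketched as follows. The [19:00, 02:00) window is hard-coded from the example above; production reads each agent's configured night period, and hour/minute are assumed already converted to the agent's timezone from server_timestamp:

```typescript
// Closed-open night window [19:00, 02:00), wrapping past midnight.
function periodFor(hour: number, minute: number): "NIGHT" | "DAY" {
  const mins = hour * 60 + minute;
  const start = 19 * 60; // 19:00 -- inclusive
  const end = 2 * 60;    // 02:00 -- exclusive
  return mins >= start || mins < end ? "NIGHT" : "DAY";
}
```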
Clock Skew
All server instances synchronized via NTP to within 50ms. If drift exceeds 200ms, the instance is removed from the load balancer. For period boundaries (at hour granularity), 50ms skew is irrelevant.
26. Sharp Detection via Multiple Accounts
The Detection Pillars (Design for Phase 1, Heavy Implementation Phase 2)
Four independent signals:
Pillar 1: Device Fingerprinting. Same device used by multiple accounts. Similarity scoring based on device_id, canvas fingerprint, IP + user agent, screen resolution + timezone.
Pillar 2: IP/Network Correlation. Multiple accounts from same IP or subnet. Sequential IP usage patterns. VPN/proxy detection.
Pillar 3: Betting Pattern Similarity. Outcome correlation (>80% over 100+ bets is suspicious). Timing correlation (<5 minutes apart). Correlated obscure market selection. Similar stake distributions. Similar CLV profiles.
Pillar 4: Payment Method Overlap. Same bank account, UPI ID, phone number, or card on multiple accounts.
Phase 1 Implementation
For Phase 1: implement device fingerprint collection and basic IP logging. Store the data. Manual review when agents report suspected syndicates.
For Phase 2: build the automated correlation engine that runs nightly, scores clusters, and generates syndicate alerts with recommended actions (classify all as SHARP, block, review individually).
27. Rate Limiting on Configuration Changes
Per-Agent Rate Limits
| Configuration Type | Rate Limit |
|---|---|
| Matrix rule changes | 1 per 5 minutes |
| User override changes | 1 per user per 10 minutes |
| Market override changes | 1 per market per 5 minutes |
| Agent default changes | 1 per 15 minutes |
| Limit changes | 1 per limit per 10 minutes |
| Panic button | No rate limit on activation; 30-minute cooling-off |
Queue Behavior
Rapid changes that exceed the rate limit are queued. Only the most recent value is applied when the rate limit window expires. Intermediate values are never applied.
The agent sees: "Your change is pending. It will apply in ~4 minutes. Only your most recent value will be used."
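The "only the most recent value is applied" behavior can be sketched like this. Scheduling the window-expiry callback (a BullMQ delayed job in production) is omitted; the class name is hypothetical:

```typescript
// Pending-change holder: intermediate values are overwritten, never applied.
class PendingConfigChange<T> {
  private latest: T | undefined;

  submit(value: T): void {
    this.latest = value; // each submit replaces the previous pending value
  }

  applyOnWindowExpiry(apply: (value: T) => void): void {
    if (this.latest !== undefined) {
      apply(this.latest); // exactly one apply per window, with the last value
      this.latest = undefined;
    }
  }
}
```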
Panic Button Interaction
Panic activation bypasses all rate limits (safety trumps stability). Configuration changes while panic is active are queued and applied only after panic is deactivated.
28. Hedge Execution
The Simple Design
For Phase 1, hedge execution is deliberately simple:
What this means:
- Market orders, not limit orders. Accept whatever price Betfair gives.
- No re-pricing, no partial fill management, no slippage optimization.
- If the order fails 3 times (Betfair down, no liquidity, API error), mark the amount as unhedged and alert.
- Log the theoretical price (what the punter got) vs the actual fill price. This data feeds Phase 2 optimization.
Why this is fine for Phase 1: We are hedging small amounts. The spread between a market order and an optimized limit order on an 800 bet is a few rupees. Building a sophisticated execution engine before we have meaningful hedge volume is premature.
Betfair Health Check
Ping Betfair every 10 seconds. Three consecutive failures means Betfair is down. Queue all orders for retry. Platform absorbs risk. When Betfair returns, drain the queue.
If an event settles while a hedge order is queued, cancel the order. The platform bears the outcome as retained risk.
29. Reconciliation
The Simple Design
One nightly job. Runs at 4:00 AM IST when traffic is minimal.
What it checks:
For every active agent, for every scope with open positions:
Step 1: Read exposure_ledger.retained_open_liability for this agent+scope
Step 2: SUM(liability) FROM positions WHERE agent_id AND scope AND status='OPEN' AND type='RETAINED'
Step 3: Compare.
Step 4: If they match: log success.
Step 5: If mismatch: alert ops team via Slack/email with:
- Agent name
- Scope
- Ledger value vs computed value
- Drift amount and direction
No auto-correction. If the ledger is wrong, a human investigates the root cause. Auto-correction hides bugs. In Phase 1, every mismatch should be understood and fixed at the source.
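The compare step (steps 3-5) reduces to a pure function over the two reads; the ledger read and the positions SUM are SQL and omitted here. Values are in paisa, as bigint to avoid floating-point drift on large books:

```typescript
// Compare ledger value against the computed SUM of open retained positions.
function reconcile(ledgerPaisa: bigint, computedPaisa: bigint) {
  const drift = computedPaisa - ledgerPaisa;
  return {
    ok: drift === 0n,
    drift, // signed: positive means the ledger understates true exposure
    direction:
      drift === 0n ? "NONE" : drift > 0n ? "LEDGER_UNDERSTATES" : "LEDGER_OVERSTATES",
  };
}
```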
The manual recompute tool:
reconciliation recompute --agent=rajesh --scope=cricket
1. Lock the agent+scope (advisory lock, briefly delays new bets)
2. SUM all open positions
3. Update exposure_ledger to match
4. Update Redis
5. Release lock
6. Log the correction
This is the fix tool. Use it after investigating and understanding why the drift occurred.
30. Monitoring & Alerting
The Simple Design
No Prometheus. No Grafana. No AlertManager. Instead:
1. Structured Logging (Pino)
Every bet, settlement, hedge, and error is logged as structured JSON. Key fields: timestamp, event, agent_id, bet_id, latency_ms, success, error.
This gives us everything we need for debugging and performance analysis. Use jq or a log aggregator to query.
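A sketch of the log line shape follows. In production Pino emits these fields; a plain formatter stands in here so the sketch has no dependencies, and the field names mirror the list above:

```typescript
// One structured log line per bet event (shape per the key fields above).
function betLogLine(fields: {
  event: string;
  agent_id: string;
  bet_id: string;
  latency_ms: number;
  success: boolean;
  error?: string;
}): string {
  return JSON.stringify({ timestamp: new Date().toISOString(), ...fields });
}
```

Lines in this shape are directly queryable, e.g. `jq 'select(.event == "bet_placed") | .latency_ms'`.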
2. Health Endpoint
GET /api/v1/monitoring/health returns:
{
"status": "healthy",
"redis": "connected",
"postgresql": "connected",
"betfair": "healthy",
"uptime_seconds": 86400,
"bets_last_minute": 42,
"avg_latency_ms": 65,
"errors_last_minute": 0
}
3. Critical Failure Alerts (Slack/Email)
Only six alert types for Phase 1:
| Alert | Trigger | Channel |
|---|---|---|
| NO_NEW_RISK activated | Any agent enters NO_NEW_RISK | Slack |
| Settlement failure | Any position fails to settle after 3 retries | Slack + Email |
| Hedge failure | Hedge marked as unhedged after 3 retries | Slack |
| DLQ entry | Any bet enters the dead letter queue | Slack |
| Reconciliation mismatch | Nightly job finds any drift | |
| Betfair down | 3 consecutive health check failures | Slack |
That is it. Six alert types. When we have operational complexity that warrants dashboards, we add Prometheus/Grafana.
This concludes Parts I and II of the simplified B-Book Architecture v3.0. The document covers 30 sections (down from 46 in v2.0), with every simplification explicitly documented and justified. The v2.0 document is preserved as the scale-up reference for when Phase 1 traffic and operational needs outgrow these simpler designs.
Part III: Implementation Architecture
The following sections provide the complete implementation specification. Every database table, every API endpoint, every pipeline step, and every deployment detail. An LLM reading this can build the entire system without asking a single question.
31. Technology Stack
Core Technologies
| Technology | Version | Purpose | Why This Choice |
|---|---|---|---|
| Node.js | 20 LTS | Application runtime | Event-loop model handles high concurrency cheaply. Non-blocking I/O suits the many Redis and DB calls in the bet pipeline. Team has existing expertise. |
| TypeScript | 5.x | Language | Type safety prevents financial bugs. Domain types like Stake, Liability, and ForwardPercentage catch errors at compile time. |
| PostgreSQL | 16 | Primary database | ACID transactions for financial data. FOR UPDATE locking for limit enforcement at boundaries. JSONB for flexible audit payloads. Partitioning for table growth. |
| Prisma | 5.x | ORM | Type-safe database access from TypeScript. Schema-as-code for migrations. Use raw SQL only for FOR UPDATE locks and bulk operations. |
| Redis | 7.x | Cache and counters | Sub-millisecond reads for exposure checks. Atomic INCRBY for exposure counters. Used as the backing store for BullMQ. |
| BullMQ | Latest | Job queue | Settlement processing, hedge retry, reconciliation, audit writes. Built on Redis. Supports delayed jobs, retries, priority queues, dead letter. |
| Docker | Latest | Containerization | Consistent environments. Docker Compose for local development. Single-container production deployment. |
| Pino | Latest | Structured logging | Fast JSON logger. Structured logs are queryable. Low overhead at high throughput. Replaces the need for Prometheus in v1. |
| Zod | Latest | Runtime validation | Validates all API inputs and configuration at runtime. Complements TypeScript compile-time types. |
| Socket.IO | Latest | WebSocket | Real-time dashboard updates to agents. Push notifications for alerts. |
What We Are NOT Using (and Why)
| Technology | Why Not |
|---|---|
| Prometheus + Grafana | No dedicated ops person yet. Pino structured logs and a simple /health endpoint are sufficient for v1. Add Prometheus when you have someone to look at dashboards. |
| Separate Audit DB | Overkill for v1 volume. Audit trail goes into the same PostgreSQL instance as JSONB in a partitioned table. Separate it when table size exceeds 100 GB. |
| Kafka or Redis Streams | BullMQ covers all async job needs (settlement, hedging, reconciliation). Kafka is for 1000+ bets/sec. We are targeting 167 bets/sec. |
| Application LRU Cache | Adds a consistency headache (3-tier cache invalidation). Redis is fast enough at sub-millisecond latency. Add an LRU layer only if Redis P99 exceeds 5ms under load. |
| Microservices | One Node.js process is simpler to deploy, debug, and operate. Modules are separated in code but deployed as one application. Extract into microservices only if a specific module needs independent scaling. |
| Multi-currency / FX | INR only for v1. All amounts in paisa (integer arithmetic, no floating-point). Add multi-currency when expanding beyond India. |
| Responsible Gambling Module | Not required for the Indian agent market in v1. Add when entering regulated markets (UK, EU). |
32. System Architecture Overview
System Diagram
How Services Communicate
| From | To | Method | Notes |
|---|---|---|---|
| Client to API | REST | HTTP/JSON, auth via JWT | Standard request-response |
| Client to Dashboard | WebSocket | Socket.IO | Real-time exposure updates, alerts |
| API to Bet Processing | Function call | In-process | Everything is one Node.js process |
| Bet Processing to Cascade Engine | Function call | In-process | Synchronous -- the bet pipeline is a single call chain |
| Bet Processing to Hedge Queue | BullMQ job | Async | Fire-and-forget from the bet pipeline |
| Settlement trigger to Settlement Worker | BullMQ job | Async | Settlement jobs queued when events settle |
| Config Change to Cache | Redis DEL | Direct | Delete the Redis key; next read repopulates from PostgreSQL |
Deployment Topology
Local Development: Docker Compose with three services: app (Node.js), postgres, redis.
Production (v1): One Node.js process running the full application and all BullMQ workers. One PostgreSQL instance. One Redis instance. Put behind nginx or a cloud load balancer if you need SSL termination.
Scale horizontally by adding a second Node.js instance behind the load balancer when traffic demands it. No code changes required -- BullMQ handles job distribution across workers automatically, and Redis ensures shared state.
33. Database Schema Design
Entity-Relationship Diagram
Table: agents
Stores every entity in the hierarchy: sub-agents, master agents, and the platform itself.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY, DEFAULT gen_random_uuid() | Unique agent identifier |
| external_id | VARCHAR(100) | UNIQUE, NOT NULL | Human-readable ID (e.g., rajesh_mumbai) |
| name | VARCHAR(255) | NOT NULL | Display name |
| parent_agent_id | UUID | REFERENCES agents(id), NULLABLE | Upline agent. NULL for the platform. |
| level | INTEGER | NOT NULL | Hierarchy depth. 0 = platform, 1 = master, 2 = sub, etc. |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, SUSPENDED, DEACTIVATED |
| timezone | VARCHAR(50) | NOT NULL, DEFAULT 'Asia/Kolkata' | IANA timezone for period boundaries |
| default_forward_percentage | DECIMAL(5,2) | NOT NULL, DEFAULT 50.00 | Fallback when no matrix rule matches |
| matrix_version | INTEGER | NOT NULL, DEFAULT 1 | Incremented on every matrix change. Simple integer -- no version history table. |
| night_period_start | TIME | NULLABLE | Night period start in local time |
| night_period_end | TIME | NULLABLE | Night period end in local time |
| weekly_period_start_day | INTEGER | NOT NULL, DEFAULT 1 | 1=Monday, 7=Sunday |
| is_platform | BOOLEAN | NOT NULL, DEFAULT false | True for the single platform agent |
| platform_retain_percentage | DECIMAL(5,2) | NULLABLE | Only for platform: percent to retain vs hedge |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_agents_parent on (parent_agent_id)
- idx_agents_status on (status)
- UNIQUE on (external_id)
Simplification vs v2: No separate tier column (derive from config complexity). The matrix_version integer lives directly on the agent row instead of a separate matrix_versions table. When a matrix changes, increment this integer and record the change in audit_trail.
Table: agent_limits
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | |
| limit_type | VARCHAR(30) | NOT NULL | SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD |
| sport_type | VARCHAR(30) | NULLABLE | CRICKET, FOOTBALL, etc. NULL for period limits across all sports |
| event_id | VARCHAR(100) | NULLABLE | Only for MARKET limits -- the specific event |
| limit_amount | BIGINT | NOT NULL | In paisa. Example: 50 lakh INR = 500,000,000 paisa |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_agent_limits_lookup on (agent_id, limit_type, sport_type)
- UNIQUE on (agent_id, limit_type, sport_type, event_id) -- prevents duplicate limits
Table: forwarding_matrix_rules
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | |
| market_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE, or * |
| sport_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | CRICKET, FOOTBALL, TENNIS, KABADDI, or * |
| event_phase | VARCHAR(30) | NOT NULL, DEFAULT '*' | PRE_MATCH, IN_PLAY, APPROACHING_START, or * |
| source_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | NORMAL, SHARP, VIP, NEW_ACCOUNT, or * |
| liquidity_band | VARCHAR(30) | NOT NULL, DEFAULT '*' | HIGH, MEDIUM, LOW, NONE, or * |
| forward_percentage | DECIMAL(5,2) | NOT NULL, CHECK (forward_percentage BETWEEN 0 AND 100) | Percentage to forward to upline |
| specificity | INTEGER | NOT NULL | Count of non-wildcard dimensions (0-5). Computed at insert/update in application code. |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Oldest rule wins tie-breaks |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_fmr_agent_active on (agent_id) WHERE is_active = true
- idx_fmr_lookup on (agent_id, market_type, sport_type, event_phase, source_type, liquidity_band) WHERE is_active = true
Simplification vs v2: No version column on each rule. Instead, the agent's matrix_version integer is incremented whenever any rule changes. Bets record the matrix_version from the agent row at processing time, which is sufficient for audit replay.
Table: user_overrides
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| user_id | UUID | REFERENCES users(id), NOT NULL | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent applying this override |
| forward_percentage | DECIMAL(5,2) | NOT NULL | Override forward percent |
| reason | TEXT | NOT NULL | Why the override was set |
| created_by | UUID | NOT NULL | Who set the override |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| expires_at | TIMESTAMPTZ | NULLABLE | Optional expiry |
Indexes:
- UNIQUE on (user_id, agent_id) WHERE is_active = true
Table: market_overrides
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | |
| event_id | VARCHAR(100) | NOT NULL | The specific event/market |
| forward_percentage | DECIMAL(5,2) | NOT NULL | Override forward percent |
| reason | TEXT | NOT NULL | |
| created_by | UUID | NOT NULL | |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| expires_at | TIMESTAMPTZ | NULLABLE |
Indexes:
- UNIQUE on (agent_id, event_id) WHERE is_active = true
Table: users
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| external_id | VARCHAR(100) | UNIQUE, NOT NULL | User ID from the main platform |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent this user belongs to |
| name | VARCHAR(255) | NOT NULL | |
| per_click_win_limit | BIGINT | NOT NULL, DEFAULT 5000000 | In paisa. Default 50,000 INR |
| aggregate_win_limit_daily | BIGINT | NOT NULL, DEFAULT 20000000 | In paisa. Default 2,00,000 INR |
| min_stake | BIGINT | NOT NULL, DEFAULT 10000 | In paisa. Default 100 INR |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, SUSPENDED |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_users_agent on (agent_id)
- UNIQUE on (external_id)
Simplification vs v2: No self-exclusion fields, no session time limits, no deposit limits. Those belong to the responsible gambling module (v2).
Table: user_classifications
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| user_id | UUID | REFERENCES users(id), NOT NULL | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | Agent making this classification |
| classification | VARCHAR(30) | NOT NULL | NORMAL, SHARP, VIP, NEW_ACCOUNT |
| reason | TEXT | NULLABLE | Why classified |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- UNIQUE on (user_id, agent_id)
Table: agent_trust_config
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | The upline agent |
| sub_agent_id | UUID | REFERENCES agents(id), NOT NULL | The downstream agent |
| trust_downstream_flags | BOOLEAN | NOT NULL, DEFAULT false | Whether to trust sub-agent's user classifications |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- UNIQUE on (agent_id, sub_agent_id)
Table: events
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | VARCHAR(100) | PRIMARY KEY | External event ID from odds provider |
| sport_type | VARCHAR(30) | NOT NULL | |
| name | VARCHAR(500) | NOT NULL | "MI vs CSK, IPL 2026" |
| start_time | TIMESTAMPTZ | NOT NULL | |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'UPCOMING' | UPCOMING, LIVE, SUSPENDED, SETTLED, VOID |
| result | JSONB | NULLABLE | Settlement result data |
| settled_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_events_sport_status on (sport_type, status)
- idx_events_start_time on (start_time)
Table: markets
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | VARCHAR(100) | PRIMARY KEY | External market ID |
| event_id | VARCHAR(100) | REFERENCES events(id), NOT NULL | |
| market_type | VARCHAR(30) | NOT NULL | MATCH_ODDS, FANCY, BOOKMAKER, etc. |
| name | VARCHAR(255) | NOT NULL | |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SUSPENDED, CLOSED, SETTLED, VOID |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_markets_event on (event_id)
- idx_markets_status on (status)
Table: bets (partitioned by month on created_at)
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| user_id | UUID | NOT NULL | The punter |
| agent_id | UUID | NOT NULL | The originating agent (Level 1) |
| event_id | VARCHAR(100) | NOT NULL | |
| market_id | VARCHAR(100) | NOT NULL | |
| selection | VARCHAR(255) | NOT NULL | "MI to win" |
| side | VARCHAR(10) | NOT NULL | BACK or LAY |
| requested_stake | BIGINT | NOT NULL | Original stake in paisa |
| accepted_stake | BIGINT | NOT NULL | After stake reduction |
| odds | DECIMAL(10,4) | NOT NULL | Decimal odds |
| potential_win | BIGINT | NOT NULL | In paisa |
| liability | BIGINT | NOT NULL | In paisa |
| stake_reduction_reason | VARCHAR(50) | NULLABLE | PER_CLICK_LIMIT, AGGREGATE_LIMIT, or null |
| market_type | VARCHAR(30) | NOT NULL | |
| sport_type | VARCHAR(30) | NOT NULL | |
| event_phase | VARCHAR(30) | NOT NULL | PRE_MATCH, IN_PLAY |
| source_type | VARCHAR(30) | NOT NULL | As resolved at originating agent |
| liquidity_band | VARCHAR(30) | NOT NULL | HIGH, MEDIUM, LOW, NONE |
| matrix_version_snapshot | INTEGER | NOT NULL | The agent's matrix_version at processing time |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SETTLED, VOIDED, PARTIALLY_VOIDED, CANCELLED, CASH_OUT_SETTLED, RE_SETTLED |
| routing_status | VARCHAR(20) | NOT NULL, DEFAULT 'COMPLETE' | COMPLETE, PARTIAL, FAILED |
| period_context | VARCHAR(20) | NOT NULL | NIGHT, DAY |
| total_processing_time_ms | INTEGER | NOT NULL | End-to-end latency |
| settled_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key |
Indexes:
- idx_bets_user_time on (user_id, created_at DESC)
- idx_bets_agent_time on (agent_id, created_at DESC)
- idx_bets_event on (event_id)
- idx_bets_market on (market_id)
- idx_bets_status_open on (status) WHERE status = 'OPEN'
- idx_bets_routing_incomplete on (routing_status) WHERE routing_status != 'COMPLETE'
Table: positions (partitioned by month on created_at)
One row per agent per bet. If Amit's bet flows through Rajesh, Vikram, and the Platform, that is 3 RETAINED position rows (and optionally 3 FORWARDED rows, though forwarded amounts can be derived).
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| bet_id | UUID | NOT NULL | |
| agent_id | UUID | NOT NULL | Agent holding this position |
| cascade_level | INTEGER | NOT NULL | 1 = first agent, 2 = upline, etc. |
| position_type | VARCHAR(20) | NOT NULL | RETAINED or FORWARDED |
| stake | BIGINT | NOT NULL | Stake portion in paisa |
| liability | BIGINT | NOT NULL | Liability portion in paisa |
| potential_win | BIGINT | NOT NULL | Potential win portion in paisa |
| forward_percentage_used | DECIMAL(5,2) | NOT NULL | The forward percent that produced this split |
| forward_source | VARCHAR(30) | NOT NULL | USER_OVERRIDE, MARKET_OVERRIDE, MATRIX_RULE, AGENT_DEFAULT |
| matrix_rule_id | UUID | NULLABLE | The specific matrix rule that matched |
| overflow_amount | BIGINT | NOT NULL, DEFAULT 0 | How much was overflow from limit breach |
| event_id | VARCHAR(100) | NOT NULL | Denormalized for fast queries |
| market_id | VARCHAR(100) | NOT NULL | Denormalized |
| sport_type | VARCHAR(30) | NOT NULL | Denormalized |
| selection | VARCHAR(255) | NOT NULL | Denormalized |
| side | VARCHAR(10) | NOT NULL | Denormalized BACK/LAY |
| odds | DECIMAL(10,4) | NOT NULL | Denormalized |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SETTLED, VOIDED |
| settlement_id | UUID | NULLABLE | |
| settled_amount | BIGINT | NULLABLE | Actual P&L in paisa (positive = agent profit) |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
Indexes:
- idx_positions_bet on (bet_id)
- idx_positions_agent_open on (agent_id, status) WHERE status = 'OPEN'
- idx_positions_agent_event on (agent_id, event_id) WHERE status = 'OPEN'
- idx_positions_event_status on (event_id, status)
Table: exposure_ledger
One row per agent per scope. No sharding. Simple.
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | NOT NULL | |
| scope_type | VARCHAR(30) | NOT NULL | SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD |
| scope_key | VARCHAR(200) | NOT NULL | e.g., "cricket", "mi_vs_csk_2026_03_15", "night_2026_03_15" |
| retained_open_liability | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| forwarded_open_liability | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| open_potential_win | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| no_new_risk_active | BOOLEAN | NOT NULL, DEFAULT false | |
| no_new_risk_triggered_at | TIMESTAMPTZ | NULLABLE | |
| last_updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
Indexes:
- UNIQUE on (agent_id, scope_type, scope_key)
- idx_exposure_agent_scope on (agent_id, scope_type)
- idx_exposure_no_new_risk on (no_new_risk_active) WHERE no_new_risk_active = true
Simplification vs v2: No shard_index column. One row per agent per scope. If this becomes a write bottleneck (measurable as PostgreSQL write contention P99 exceeding 50ms), add sharding later.
Table: settlements (partitioned by month on created_at)
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| event_id | VARCHAR(100) | NOT NULL | |
| market_id | VARCHAR(100) | NOT NULL | |
| position_id | UUID | NOT NULL | |
| agent_id | UUID | NOT NULL | |
| bet_id | UUID | NOT NULL | |
| settlement_type | VARCHAR(20) | NOT NULL | WIN, LOSS, VOID, PUSH |
| stake | BIGINT | NOT NULL | The position's stake |
| payout | BIGINT | NOT NULL | Amount paid to/from agent |
| profit_loss | BIGINT | NOT NULL | Agent P&L (positive = profit) |
| idempotency_key | VARCHAR(200) | UNIQUE, NOT NULL | {position_id}_{event_result_hash} |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'COMPLETED' | COMPLETED, REVERSED, RE_SETTLED |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
Indexes:
- idx_settlements_event on (event_id)
- idx_settlements_agent_time on (agent_id, created_at DESC)
- idx_settlements_bet on (bet_id)
- UNIQUE on (idempotency_key)
Table: audit_trail (partitioned by month on created_at)
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| bet_id | UUID | NULLABLE | NULL for config change audits |
| record_type | VARCHAR(30) | NOT NULL | BET_PLACED, BET_SETTLED, BET_VOIDED, CONFIG_CHANGED |
| agent_id | UUID | NOT NULL | Primary agent for this record |
| user_id | UUID | NULLABLE | |
| event_id | VARCHAR(100) | NULLABLE | |
| payload | JSONB | NOT NULL | Full structured audit data |
| processing_time_ms | INTEGER | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
Indexes:
- idx_audit_bet on (bet_id) WHERE bet_id IS NOT NULL
- idx_audit_agent_time on (agent_id, created_at DESC)
- idx_audit_type_time on (record_type, created_at DESC)
Simplification vs v2: No checksum or previous_checksum columns. No checksum chain. The JSONB payload is the truth. If you need tamper detection later, add checksums in a migration.
Table: dead_letter_queue
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| source | VARCHAR(50) | NOT NULL | BET_PROCESSING, SETTLEMENT, HEDGE, RECONCILIATION |
| reference_id | UUID | NOT NULL | bet_id, settlement_id, etc. |
| error_message | TEXT | NOT NULL | |
| error_stack | TEXT | NULLABLE | |
| payload | JSONB | NOT NULL | Full context for retry |
| retry_count | INTEGER | NOT NULL, DEFAULT 0 | |
| max_retries | INTEGER | NOT NULL, DEFAULT 3 | |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'PENDING' | PENDING, RETRYING, RESOLVED, ESCALATED |
| resolved_by | UUID | NULLABLE | |
| resolved_at | TIMESTAMPTZ | NULLABLE | |
| resolution_notes | TEXT | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
Indexes:
- idx_dlq_status on (status) WHERE status IN ('PENDING', 'RETRYING')
- idx_dlq_reference on (reference_id)
Table: hedge_orders
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| bet_id | UUID | NOT NULL | |
| event_id | VARCHAR(100) | NOT NULL | |
| market_id | VARCHAR(100) | NOT NULL | |
| selection | VARCHAR(255) | NOT NULL | |
| side | VARCHAR(10) | NOT NULL | BACK or LAY |
| target_price | DECIMAL(10,4) | NOT NULL | Price the punter received |
| requested_amount | BIGINT | NOT NULL | In paisa |
| filled_amount | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| betfair_bet_id | VARCHAR(100) | NULLABLE | Betfair's order reference |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'QUEUED' | QUEUED, SENT, FILLED, FAILED, UNHEDGED |
| retry_count | INTEGER | NOT NULL, DEFAULT 0 | |
| error_message | TEXT | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| sent_at | TIMESTAMPTZ | NULLABLE | When sent to Betfair |
| filled_at | TIMESTAMPTZ | NULLABLE | |
Indexes:
- idx_hedge_bet on (bet_id)
- idx_hedge_status on (status) WHERE status IN ('QUEUED', 'SENT')
Simplification vs v2: No reprice_count, max_reprice_attempts, time_in_force_seconds, limit_price, slippage, or average_fill_price. We send market orders, not limit orders. Retry 3 times. Mark as UNHEDGED on failure.
Table: alerts
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| alert_type | VARCHAR(50) | NOT NULL | NO_NEW_RISK_ACTIVATED, LIMIT_APPROACHING, RECONCILIATION_MISMATCH, HEDGE_FAILED, DLQ_ENTRY, etc. |
| severity | VARCHAR(5) | NOT NULL | P1, P2, P3 |
| agent_id | UUID | NULLABLE | |
| title | VARCHAR(255) | NOT NULL | |
| description | TEXT | NOT NULL | |
| metadata | JSONB | NULLABLE | |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, ACKNOWLEDGED, RESOLVED |
| acknowledged_by | UUID | NULLABLE | |
| resolved_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
Indexes:
- idx_alerts_active on (status) WHERE status = 'ACTIVE'
- idx_alerts_agent on (agent_id, created_at DESC)
Note on removed tables vs v2:
- No config_changelog table: config change audits go into audit_trail with record_type = 'CONFIG_CHANGED' and full old/new values in the JSONB payload.
- No feature_flags table: use environment variables (e.g., FEATURE_CASCADING_ROUTING=true) or a simple JSON config file. Reload on deploy.
- No reconciliation_results / reconciliation_discrepancies tables: reconciliation mismatches are written as alerts with alert_type = 'RECONCILIATION_MISMATCH' and details in the metadata JSONB.
- No agent_hierarchy_history table: hierarchy changes go into audit_trail with record_type = 'CONFIG_CHANGED'.
34. API Design
Bet Placement APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/bets | Place a new bet | User JWT |
| GET | /api/v1/bets/:betId | Get bet details with routing info | User/Agent JWT |
| GET | /api/v1/bets | List bets with filters | Agent JWT |
| POST | /api/v1/bets/:betId/void | Void an open bet | Admin JWT |
| POST | /api/v1/bets/:betId/cancel | Cancel within window | User/Agent JWT |
| POST | /api/v1/bets/:betId/cash-out | Request cash-out offer | User JWT |
| POST | /api/v1/bets/:betId/cash-out/accept | Accept cash-out | User JWT |
| POST | /api/v1/bets/simulate | Dry-run (no money, full routing) | Agent JWT |
POST /api/v1/bets -- Request:
{
"user_id": "uuid",
"event_id": "string",
"market_id": "string",
"selection": "MI to win",
"side": "BACK",
"stake": 1000000,
"odds": 1.85,
"market_type": "MATCH_ODDS",
"sport_type": "CRICKET",
"event_phase": "PRE_MATCH",
"liquidity_band": "HIGH"
}
All monetary values in paisa. 1000000 paisa = 10,000 INR.
Response (success):
{
"bet_id": "uuid",
"status": "ACCEPTED",
"accepted_stake": 1000000,
"stake_reduced": false,
"potential_win": 850000,
"message": null
}
Response (stake reduced):
{
"bet_id": "uuid",
"status": "ACCEPTED_REDUCED",
"accepted_stake": 588200,
"original_stake": 1000000,
"stake_reduced": true,
"potential_win": 500000,
"message": "Maximum stake at these odds: 5,882 INR"
}
Response (rejected):
{
"bet_id": null,
"status": "REJECTED",
"reason": "MARKET_SUSPENDED",
"message": "This market is currently unavailable."
}
Agent Configuration APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/agents/:agentId | Get agent profile and config | Agent JWT |
| PATCH | /api/v1/agents/:agentId | Update agent settings | Agent JWT |
| GET | /api/v1/agents/:agentId/limits | Get all limits | Agent JWT |
| PUT | /api/v1/agents/:agentId/limits | Set/update limits | Agent JWT |
| GET | /api/v1/agents/:agentId/matrix | Get forwarding matrix rules | Agent JWT |
| POST | /api/v1/agents/:agentId/matrix/rules | Add a matrix rule | Agent JWT |
| PUT | /api/v1/agents/:agentId/matrix/rules/:ruleId | Update a rule | Agent JWT |
| DELETE | /api/v1/agents/:agentId/matrix/rules/:ruleId | Delete a rule | Agent JWT |
| POST | /api/v1/agents/:agentId/matrix/test | Test a bet against the matrix | Agent JWT |
| GET | /api/v1/agents/:agentId/exposure | Current exposure summary | Agent JWT |
| GET | /api/v1/agents/:agentId/exposure/:scope | Exposure for a specific scope | Agent JWT |
| POST | /api/v1/agents/:agentId/panic | Trigger panic mode | Agent JWT |
| GET | /api/v1/agents/:agentId/sub-agents | List sub-agents | Agent JWT |
| GET | /api/v1/agents/:agentId/trust-config | Get trust settings | Agent JWT |
| PUT | /api/v1/agents/:agentId/trust-config/:subAgentId | Update trust | Agent JWT |
POST /api/v1/agents/:agentId/matrix/rules -- Request:
{
"market_type": "FANCY",
"sport_type": "CRICKET",
"event_phase": "IN_PLAY",
"source_type": "*",
"liquidity_band": "*",
"forward_percentage": 70.00
}
Response:
{
"rule_id": "uuid",
"new_matrix_version": 48,
"specificity": 3,
"effective_immediately": true
}
User Management APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/users/:userId | Get user profile | Agent JWT |
| PATCH | /api/v1/users/:userId | Update user settings | Agent JWT |
| POST | /api/v1/users/:userId/override | Set user forward override | Agent JWT |
| DELETE | /api/v1/users/:userId/override | Remove override | Agent JWT |
| POST | /api/v1/users/:userId/classify | Set classification | Agent JWT |
| GET | /api/v1/users/:userId/bets | Bet history | Agent JWT |
Settlement APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/settlements/events/:eventId | Trigger settlement | System/Admin JWT |
| GET | /api/v1/settlements/events/:eventId | Settlement status | Agent JWT |
| POST | /api/v1/settlements/events/:eventId/reverse | Reverse settlement | Admin JWT |
| POST | /api/v1/settlements/events/:eventId/resettle | Re-settle with corrected results | Admin JWT |
| GET | /api/v1/settlements/agents/:agentId | Agent settlement history | Agent JWT |
POST /api/v1/settlements/events/:eventId -- Request:
{
"result": {
"winner": "MI",
"market_results": {
"match_odds": { "winning_selection": "MI to win" },
"fancy_180_runs": { "actual_value": 187, "line": 180 }
}
},
"result_source": "OFFICIAL",
"confirmed_by": "admin_uuid"
}
Admin APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/admin/agents | Create agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/suspend | Suspend agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/reactivate | Reactivate | Admin JWT |
| GET | /api/v1/admin/dead-letter-queue | View DLQ | Admin JWT |
| POST | /api/v1/admin/dead-letter-queue/:id/retry | Retry DLQ entry | Admin JWT |
| POST | /api/v1/admin/dead-letter-queue/:id/resolve | Resolve DLQ entry | Admin JWT |
| POST | /api/v1/admin/reconciliation/run | Manual reconciliation | Admin JWT |
Monitoring APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/monitoring/health | Health check | Public |
| GET | /api/v1/monitoring/alerts | Active alerts | Admin JWT |
| POST | /api/v1/monitoring/alerts/:id/acknowledge | Ack alert | Admin JWT |
35. The Bet Processing Pipeline (Step by Step)
This is the most important section. Every step from HTTP request to response.
Latency Budget (2-tier cache: Redis and PostgreSQL)
| Step | Budget | Description |
|---|---|---|
| 1. Request parsing & validation | 5ms | Parse JSON, Zod validate |
| 2. Timestamp & metrics | 1ms | Assign server timestamp, compute win/liability |
| 3. User win cap check | 8ms | Redis for aggregate, compute max stake |
| 4. Matrix version capture | 1ms | Read agent.matrix_version from Redis |
| 5. Forwarding resolution | 10ms | Precedence chain lookup |
| 6. Cascade routing (per level) | 12ms x N | Limit checks + retention calc at each level |
| 7. Position creation + ledger update | 15ms | DB transaction |
| 8. Hedge queue placement | 2ms | BullMQ add |
| 9. Audit trail write | 5ms | Async BullMQ job (non-blocking) |
| 10. Response | 2ms | HTTP response |
| Total (3-level cascade) | ~85ms | Within 90ms budget |
Without an in-process LRU cache, lookups that a 3-tier design would serve from the LRU now hit Redis instead (still sub-millisecond). Total latency rises by roughly 3-5ms compared to the 3-tier approach, which is well within budget.
Step 1: Request Received (5ms)
What happens: HTTP POST arrives at /api/v1/bets. Express middleware parses JSON. Zod validates the schema: user_id is UUID, stake is positive integer, odds is positive decimal, side is BACK or LAY, all required fields present.
What can go wrong:
- Malformed JSON: Return 400.
- Missing or invalid fields: Return 400 with specific validation errors.
Error handling: Immediate rejection. No audit record. No further processing.
Step 2: Timestamp Assignment and Metrics Computation (1ms)
What happens:
2a. Assign a server-side timestamp: const processedAt = Date.now(). This is the authoritative time for all downstream decisions (period boundaries, ordering). Never trust the client timestamp.
2b. Compute financial metrics using integer arithmetic in paisa:
potential_win = Math.floor(stake * (odds - 1))
liability = potential_win (for a BACK bet from the bookie's perspective)
For a LAY bet, the liability is different:
// LAY bet: punter risks stake * (odds - 1), bookie's liability = stake
liability = stake (for a LAY bet)
potential_win = Math.floor(stake * (odds - 1))
2c. Determine period context. Convert processedAt to the originating agent's timezone. Check if the time falls within the agent's night period (closed-open interval: [night_start, night_end)). Set period_context to NIGHT or DAY.
What can go wrong: Nothing. Pure computation.
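The Step 2b arithmetic can be sketched as a small helper. This is illustrative (the function and type names are assumed, not the actual module); note that odds arrive as a floating-point number, which Math.floor guards against fractional paisa.

```typescript
// Sketch of Step 2b: financial metrics in integer paisa, per the rules above.
type Side = "BACK" | "LAY";

interface BetMetrics {
  potentialWin: number; // paisa the punter can win
  liability: number;    // paisa the bookie can lose
}

function computeBetMetrics(stakePaisa: number, odds: number, side: Side): BetMetrics {
  const potentialWin = Math.floor(stakePaisa * (odds - 1));
  // BACK: the bookie's liability is the punter's potential win.
  // LAY: the punter is the layer, so the bookie's liability is the stake.
  const liability = side === "BACK" ? potentialWin : stakePaisa;
  return { potentialWin, liability };
}
```

For the request example earlier (stake 1,000,000 paisa at odds 1.85, BACK), this yields a potential win of 850,000 paisa, matching the sample response.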
Step 3: User Win Cap Check (8ms)
What happens:
3a. Per-click win cap: Compare potential_win against user.per_click_win_limit. If exceeded, compute the max allowable stake:
max_stake = Math.floor(per_click_win_limit / (odds - 1))
3b. Aggregate win cap: Read the user's accumulated potential wins for today from Redis key agg_win:{user_id}:{date}. If adding this bet's potential_win exceeds user.aggregate_win_limit_daily, compute:
remaining_win = aggregate_limit - current_accumulated
max_stake_from_aggregate = Math.floor(remaining_win / (odds - 1))
3c. Take the minimum of max_stake, max_stake_from_aggregate, and the original stake. If the result is less than user.min_stake, reject the bet entirely with "This market is currently unavailable at these odds."
3d. If the stake was reduced, set stake_reduction_reason to PER_CLICK_LIMIT or AGGREGATE_LIMIT.
3e. After accepting the bet, atomically increment the Redis aggregate counter: INCRBY agg_win:{user_id}:{date} {potential_win}. Set TTL to end of day if not already set.
What can go wrong:
- Redis unavailable for aggregate check: Fall back to a PostgreSQL query -- SELECT SUM(potential_win) FROM bets WHERE user_id = $1 AND created_at >= $today. Adds ~15ms but is correct.
- Two simultaneous bets both read the same aggregate before either increments: The 10% buffer on the aggregate limit absorbs this. The INCRBY is atomic, so the second bet will see the updated value for subsequent bets.
Error handling: If reduced stake is below minimum, return status: REJECTED, reason: BELOW_MINIMUM.
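Steps 3a-3d can be condensed into one pure decision function. A hedged sketch with assumed names (the real implementation also has to read the Redis aggregate; here that value is passed in):

```typescript
// Sketch of the Step 3 stake-capping decision. All monetary values in paisa.
interface WinCapInput {
  requestedStake: number;
  odds: number;
  perClickWinLimit: number;
  aggregateWinLimitDaily: number;
  accumulatedWinToday: number; // from agg_win:{user_id}:{date}
  minStake: number;
}

type WinCapResult =
  | { status: "ACCEPTED"; acceptedStake: number; reason: null }
  | { status: "ACCEPTED_REDUCED"; acceptedStake: number; reason: "PER_CLICK_LIMIT" | "AGGREGATE_LIMIT" }
  | { status: "REJECTED"; reason: "BELOW_MINIMUM" };

function applyWinCaps(input: WinCapInput): WinCapResult {
  const unit = input.odds - 1;
  const maxFromPerClick = Math.floor(input.perClickWinLimit / unit);      // 3a
  const remainingWin = input.aggregateWinLimitDaily - input.accumulatedWinToday;
  const maxFromAggregate = Math.floor(remainingWin / unit);               // 3b
  const accepted = Math.min(input.requestedStake, maxFromPerClick, maxFromAggregate); // 3c
  if (accepted < input.minStake) return { status: "REJECTED", reason: "BELOW_MINIMUM" };
  if (accepted === input.requestedStake) return { status: "ACCEPTED", acceptedStake: accepted, reason: null };
  const reason = maxFromPerClick <= maxFromAggregate ? "PER_CLICK_LIMIT" : "AGGREGATE_LIMIT"; // 3d
  return { status: "ACCEPTED_REDUCED", acceptedStake: accepted, reason };
}
```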
Step 4: Matrix Version Capture (1ms)
What happens: Read the originating agent's matrix_version from Redis key agent:{agent_id}:matrix_version. Store it in the processing context as matrix_version_snapshot. This value will be saved on the bet record to enable deterministic replay.
What can go wrong:
- Redis miss: Read from PostgreSQL (SELECT matrix_version FROM agents WHERE id = $1). Adds ~3ms.
Why this matters: If Rajesh changes his matrix after this bet is captured but before it finishes processing, the bet still uses the version it captured. During disputes, the audit record shows which version was used.
Step 5: Forwarding Percentage Resolution (10ms)
What happens: For each agent in the cascade (starting with the punter's direct agent), resolve the forwarding percentage using the 4-level precedence chain:
5a. User override: Check Redis key user_override:{user_id}:{agent_id}. If active and not expired, use its forward_percentage. Done.
5b. Market override: Check Redis key market_override:{agent_id}:{event_id}. If active and not expired, use its forward_percentage. Done.
5c. Matrix rule lookup: Load the agent's active rules from Redis key matrix:{agent_id}. This is a JSON array of all active rules. In application code, filter to rules where every non-wildcard dimension matches the bet's characteristics. From matching rules, pick the one with the highest specificity. If tied on specificity, pick the highest forward_percentage (risk-safe default). If still tied, pick the oldest rule (lowest created_at).
5d. Agent default: If no rule matches (should never happen if a catch-all wildcard rule exists), use agent.default_forward_percentage.
What can go wrong:
- Agent has no matrix rules AND no default: Log a P2 alert. Use 100% forward (safest possible -- the agent retains nothing).
- Redis cache miss for matrix rules: Load from PostgreSQL. On load, populate Redis key with TTL of 1 hour.
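The 5c matching and tie-break rules can be sketched as follows. Field names are illustrative (the actual rule schema lives in forwarding_matrix_rules); the tie-break order is exactly the one stated above: specificity, then highest forward percentage, then oldest rule.

```typescript
// Sketch of Step 5c: filter matching rules, then apply the tie-break chain.
interface MatrixRule {
  id: string;
  marketType: string;    // "*" = wildcard
  sportType: string;
  eventPhase: string;
  sourceType: string;
  liquidityBand: string;
  forwardPercentage: number;
  createdAt: number;     // epoch ms
}

interface BetDims {
  marketType: string;
  sportType: string;
  eventPhase: string;
  sourceType: string;
  liquidityBand: string;
}

const DIMS = ["marketType", "sportType", "eventPhase", "sourceType", "liquidityBand"] as const;

function matches(rule: MatrixRule, bet: BetDims): boolean {
  return DIMS.every((d) => rule[d] === "*" || rule[d] === bet[d]);
}

function specificity(rule: MatrixRule): number {
  return DIMS.filter((d) => rule[d] !== "*").length;
}

function pickRule(rules: MatrixRule[], bet: BetDims): MatrixRule | null {
  const candidates = rules.filter((r) => matches(r, bet));
  if (candidates.length === 0) return null; // fall through to agent default (5d)
  candidates.sort(
    (a, b) =>
      specificity(b) - specificity(a) ||           // most specific first
      b.forwardPercentage - a.forwardPercentage || // then risk-safe: higher forward %
      a.createdAt - b.createdAt                    // then oldest rule
  );
  return candidates[0];
}
```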
Step 6: Cascade Routing -- Level by Level (12ms per level)
This is the heart of the system. The cascade engine walks up the agent hierarchy from the punter's agent to the platform.
For each level in the hierarchy:
6a. Determine source_type. Does this agent have their own classification for the user? Check user_classifications (cached in Redis). If yes, use it. If no, check agent_trust_config -- does this agent trust the downstream agent's flags? If yes, use the downstream classification. If no, default to NORMAL.
6b. Resolve forward percentage using the same precedence chain as Step 5, but for THIS agent's matrix with the resolved source_type.
6c. Calculate retention:
retained_stake = Math.floor(incoming_stake * (100 - forward_percentage) / 100)
forwarded_stake = incoming_stake - retained_stake
retained_liability = Math.floor(retained_stake * (odds - 1))
6d. Check all applicable limits. For this agent, read exposure from Redis for each relevant scope:
- Sport limit: exposure:{agent_id}:SPORT:{sport_type}
- Market limit: exposure:{agent_id}:MARKET:{event_id}
- Night period limit: exposure:{agent_id}:NIGHT_PERIOD:{date}
- Weekly period limit: exposure:{agent_id}:WEEKLY_PERIOD:{week}
For each limit, compare current_exposure + retained_liability against limit_amount. The most restrictive limit determines max retention.
The exposure check decision:
Read the exposure value from Redis. If the agent is far from their limit (say, below 80% utilization), trust the Redis value and proceed. If the agent is near their limit (above 80%), use PostgreSQL with SELECT ... FOR UPDATE to get an authoritative locked value. The 80% threshold is configurable per agent.
Why 80%? At 80% utilization, even if Redis is stale by a few bets, the remaining 20% headroom absorbs the error. At 95% utilization, staleness matters because a few extra bets could breach the limit.
6e. If all limits pass: Agent retains the calculated amount.
6f. If any limit would be breached: Agent retains only up to the most restrictive remaining capacity. The difference becomes overflow added to forwarded_stake.
Real example: Rajesh's matrix says retain 60% of Amit's 10,000 INR bet = 6,000 INR stake. But Rajesh's MI vs CSK match limit has only 3,000 INR remaining. Rajesh retains 3,000 INR. The extra 3,000 INR (overflow) is added to forwarded, making it 7,000 INR total forwarded to Vikram.
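Steps 6c-6f reduce to a small split function. A sketch with assumed names; for simplicity the most restrictive limit is expressed here as remaining stake headroom, whereas the real 6d check compares liabilities per scope:

```typescript
// Sketch of Steps 6c-6f: matrix retention capped by remaining capacity,
// with any overflow rolled into the forwarded amount. Values in paisa.
interface LevelSplit {
  retainedStake: number;
  forwardedStake: number;
  overflow: number; // portion of the desired retention that the limit refused
}

function splitAtLevel(
  incomingStake: number,
  forwardPercentage: number,
  remainingCapacityStake: number
): LevelSplit {
  const wanted = Math.floor((incomingStake * (100 - forwardPercentage)) / 100); // 6c
  const retainedStake = Math.min(wanted, remainingCapacityStake);               // 6e/6f
  const overflow = wanted - retainedStake;
  return { retainedStake, forwardedStake: incomingStake - retainedStake, overflow };
}
```

Running the Rajesh example (1,000,000 paisa incoming, 40% forward, 300,000 paisa of capacity left) reproduces the numbers above: 300,000 retained, 700,000 forwarded, 300,000 overflow.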
6g. NO_NEW_RISK check: If the agent is in NO_NEW_RISK for any applicable scope (checked via Redis flag nnr:{agent_id}:{scope}), determine if this bet is a hedge:
worst_case_before = current worst-case liability across all outcomes
worst_case_after = worst-case liability if this bet's retention is added
If worst_case_after < worst_case_before: it is a hedge. ALLOW retention.
If worst_case_after >= worst_case_before: NOT a hedge. Forward 100%.
For a simple back bet on the same outcome as existing liability: never a hedge. For a back bet on the opposite outcome (or a lay bet): usually a hedge.
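The 6g hedge test can be sketched with a per-outcome exposure map. This is an illustrative model (the real system derives worst-case liability from open positions): each outcome maps to the agent's current net liability, and the bet's retention contributes a delta per outcome.

```typescript
// Sketch of Step 6g: retention is a hedge only if it strictly reduces
// the worst-case liability across all outcomes.
function worstCase(exposure: Record<string, number>): number {
  return Math.max(0, ...Object.values(exposure));
}

function isHedge(
  before: Record<string, number>,       // current net liability per outcome
  deltas: Record<string, number>        // this retention's liability change per outcome
): boolean {
  const after: Record<string, number> = { ...before };
  for (const [outcome, d] of Object.entries(deltas)) {
    after[outcome] = (after[outcome] ?? 0) + d;
  }
  return worstCase(after) < worstCase(before); // strict reduction => hedge
}
```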
6h. Suspended agent: If agent.status = SUSPENDED, skip entirely. Forward 100% to the parent.
6i. Record the decision for this level in a local array (will become the audit payload).
6j. Forward remaining to next level. If the current agent is the platform, the remaining amount is the hedge amount.
What can go wrong:
- Parent agent not found (broken hierarchy): P1 alert. Forward directly to the platform.
- DB lock timeout on near-limit check: Retry once (most waits resolve in <20ms). If retry fails, assume limit breached, forward 100% from this level. Safe default.
Step 7: Position Creation and Ledger Update (15ms)
What happens: Write positions and update exposure ledgers. This is done sequentially, level by level, each level in its own database transaction.
For each level (e.g., Rajesh, then Vikram, then Platform):
7a. Begin a PostgreSQL transaction.
7b. Insert the RETAINED position row for this agent.
7c. Update the exposure_ledger rows for this agent:
UPDATE exposure_ledger
SET retained_open_liability = retained_open_liability + $retained_liability,
forwarded_open_liability = forwarded_open_liability + $forwarded_liability,
open_potential_win = open_potential_win + $potential_win,
last_updated_at = NOW()
WHERE agent_id = $agent_id
AND scope_type = $scope_type
AND scope_key = $scope_key;
If no row exists yet for this scope, insert one (upsert pattern).
7d. If the ledger update pushes the agent into NO_NEW_RISK territory (retained_open_liability >= limit), set no_new_risk_active = true and update Redis flag.
7e. Commit the transaction.
7f. After commit, update Redis exposure counter: SET exposure:{agent_id}:{scope_type}:{scope_key} {new_value}.
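The 7c update plus its "insert if missing" fallback can be collapsed into one atomic statement, relying on the UNIQUE (agent_id, scope_type, scope_key) constraint from the exposure_ledger schema. A sketch only: placeholder style assumes a node-postgres-like driver, and gen_random_uuid() assumes PostgreSQL 13+ (or the pgcrypto extension).

```typescript
// One possible single-statement upsert for the 7c ledger update.
// $1..$6 = agent_id, scope_type, scope_key, retained delta, forwarded delta, win delta.
const UPSERT_EXPOSURE_SQL = `
  INSERT INTO exposure_ledger
    (id, agent_id, scope_type, scope_key,
     retained_open_liability, forwarded_open_liability, open_potential_win, last_updated_at)
  VALUES (gen_random_uuid(), $1, $2, $3, $4, $5, $6, NOW())
  ON CONFLICT (agent_id, scope_type, scope_key) DO UPDATE SET
    retained_open_liability  = exposure_ledger.retained_open_liability  + EXCLUDED.retained_open_liability,
    forwarded_open_liability = exposure_ledger.forwarded_open_liability + EXCLUDED.forwarded_open_liability,
    open_potential_win       = exposure_ledger.open_potential_win       + EXCLUDED.open_potential_win,
    last_updated_at          = NOW()`;
```

This removes the read-then-insert race for first-time scopes: two concurrent bets on a brand-new scope both succeed, and the second one adds its deltas instead of failing the unique constraint.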
Why sequential per-level instead of one big transaction? No cross-agent locking. Rajesh's transaction does not block Vikram's. If Rajesh's write succeeds but Vikram's fails, Rajesh's position is committed. The bet is marked as routing_status = PARTIAL and the remaining levels are retried via the DLQ.
What can go wrong:
- DB write fails at level 2 of 3: Level 1 positions are already committed. Mark bet as routing_status = PARTIAL. Queue a retry job for levels 2-3 in BullMQ. The retry reads the bet record to see which levels are done and resumes from the incomplete level.
- Redis update fails after DB commit: Redis is stale until the next write or until reconciliation corrects it. Acceptable because the DB is the source of truth.
Step 8: Hedge Queue Placement (2ms)
What happens: If the platform's forwarded amount (the amount to be hedged on Betfair) is greater than zero, add a BullMQ job to the hedge-orders queue:
{
"bet_id": "uuid",
"event_id": "string",
"market_id": "string",
"selection": "MI to win",
"side": "BACK",
"target_price": 1.85,
"amount_paisa": 80000
}
This is async. The hedge does not block the bet response.
What can go wrong:
- BullMQ/Redis unavailable: Write the hedge order directly to the hedge_orders table in PostgreSQL with status = QUEUED. A polling job picks up queued orders every 5 seconds.
Step 9: Audit Trail Write (async, non-blocking)
What happens: Add a BullMQ job to the audit-trail queue with the complete audit payload:
The payload includes:
- Original bet details (all fields)
- The forwarding chain: for each level, the incoming stake, the forward percentage source (USER_OVERRIDE, MATRIX_RULE, etc.), the rule that matched, the retention amount, every limit checked and its result, any overflow
- The matrix_version_snapshot
- The period context
- Processing timestamps for each step
This write is intentionally async. The bet is confirmed to the punter before the audit record is persisted. If the audit write fails, the bet is still valid -- a reconciliation job detects orphaned bets without audit records and regenerates them from position data.
What can go wrong:
- Audit job fails 3 times: Goes to the DLQ. P3 alert. Manual investigation. The bet itself is unaffected.
Step 10: Response to Punter (2ms)
What happens: Return the HTTP response. Emit a Socket.IO event to the agent's dashboard with the new bet.
What can go wrong:
- Client already disconnected: The bet is processed. Client can query GET /api/v1/bets/:betId.
- Socket.IO delivery failure: Non-critical. Dashboard polls every 5 seconds as a fallback.
36. The Settlement Pipeline (Step by Step)
Step 1: Event Result Received
An admin or automated feed posts the result to POST /api/v1/settlements/events/:eventId. The system validates the result, confirms the event is in a settleable state (LIVE or SUSPENDED), and enqueues a BullMQ settlement job.
Step 2: Market Resolution
For each market in the event, determine winners and losers:
| Market Type | Resolution Logic |
|---|---|
| MATCH_ODDS | Selection matching winning_selection = WIN. Others = LOSS. |
| FANCY (over/under) | actual_value >= line: OVER wins. actual_value < line: UNDER wins. |
| BOOKMAKER | Same as MATCH_ODDS. |
Step 3: Position Identification
Query all open positions for this event:
SELECT * FROM positions
WHERE event_id = :eventId AND status = 'OPEN'
ORDER BY bet_id, cascade_level
Group by bet_id, then by agent_id.
Step 4: Per-Position Settlement (Idempotent)
For each position:
4a. Generate idempotency key: {position_id}_{sha256(event_result_json)}.
4b. Check if a settlement with this key exists. If yes, skip. Already settled.
4c. Determine WIN or LOSS from the market resolution.
4d. Calculate settlement:
- WIN (punter wins, agent loses): payout = position.liability, profit_loss = -position.liability
- LOSS (punter loses, agent wins): payout = 0, profit_loss = position.stake
- VOID: payout = position.stake (return stake), profit_loss = 0
4e. In a transaction:
- Insert settlement record.
- Update position status to SETTLED.
- Decrement exposure ledger:
  - retained_open_liability -= position.liability (for RETAINED)
  - forwarded_open_liability -= position.liability (for FORWARDED)
  - open_potential_win -= position.potential_win
4f. After commit, update Redis exposure values.
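The 4a key can be sketched with Node's built-in crypto module. One caveat, hedged here rather than in the spec: JSON.stringify key order must be stable across settlement runs for the hash to be deterministic, so a production version would canonicalize the result JSON first.

```typescript
// Sketch of the Step 4a idempotency key: {position_id}_{sha256(event_result_json)}.
// Re-running settlement with the same result produces the same key (skip);
// a corrected result produces a fresh key (re-settle).
import { createHash } from "node:crypto";

function settlementIdempotencyKey(positionId: string, eventResult: unknown): string {
  const hash = createHash("sha256").update(JSON.stringify(eventResult)).digest("hex");
  return `${positionId}_${hash}`;
}
```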
What can go wrong:
- Idempotency key already exists: Skip. Safe by design.
- DB failure mid-settlement: The failed position stays OPEN. A settlement monitor job picks it up and retries every 30 seconds. After 5 failures, it goes to the DLQ.
Step 5: NO_NEW_RISK Re-evaluation
After settling positions, check all agents who were in NO_NEW_RISK. If retained_open_liability < limit, clear the NO_NEW_RISK flag in both PostgreSQL and Redis. Fire a P3 alert: "Rajesh exited NO_NEW_RISK for cricket."
Step 6: Reconciliation Check
Run a targeted check for each affected agent: compare the ledger value to the sum of open positions. If they match, all good. If they differ, insert an alert with alert_type = RECONCILIATION_MISMATCH. Details in the metadata JSONB.
Step 7: Notifications
Mark the event as SETTLED. Emit Socket.IO events. Queue settlement summary notifications.
37. Configuration Management
How Matrix Rules Are Cached in Redis
When the system needs an agent's matrix rules (during bet processing), it follows this path:
- Check Redis key matrix:{agent_id}. If present, use it.
- If missing, load from PostgreSQL: SELECT * FROM forwarding_matrix_rules WHERE agent_id = $1 AND is_active = true.
- Store in Redis as matrix:{agent_id} with a 1-hour TTL.
- Return the rules.
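The read-through path above can be sketched as a cache-aside helper. The minimal key-value interface here is an assumption so the logic is testable without a live Redis; the real system would back it with an actual Redis client.

```typescript
// Sketch of the matrix read path: Redis hit, else PostgreSQL, then repopulate.
interface KV {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

async function loadMatrixRules(
  agentId: string,
  redis: KV,
  loadFromDb: (agentId: string) => Promise<object[]>
): Promise<object[]> {
  const key = `matrix:${agentId}`;
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached);     // cache hit
  const rules = await loadFromDb(agentId);            // miss -> PostgreSQL
  await redis.set(key, JSON.stringify(rules), 3600);  // repopulate, 1-hour TTL
  return rules;
}
```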
Cache Invalidation: Simple and Direct
When an agent's matrix changes (rule added, updated, or deleted):
- The API handler writes the change to PostgreSQL.
- Increment agents.matrix_version in the same transaction.
- After commit, delete Redis key matrix:{agent_id}.
- Delete Redis key agent:{agent_id}:matrix_version.
That is it. No pub/sub broadcast needed for v1 (single process). The next bet that needs this agent's matrix will find a cache miss, load from PostgreSQL, and repopulate Redis.
If you scale to multiple Node.js instances: Add a Redis pub/sub channel config.invalidate. After deleting the Redis key, publish a message {agent_id, type: "MATRIX"}. Each instance subscribes and can clear any in-process caches. But for v1 with one process, this is unnecessary.
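The delete-on-change step, with the optional multi-instance pub/sub hook, might look like this. The `KeyDeleter` and `Publish` types are stand-ins for illustration:

```typescript
// Sketch of delete-on-change invalidation; interfaces are stand-ins.
interface KeyDeleter {
  del(key: string): Promise<void>;
}
type Publish = (channel: string, message: string) => Promise<void>;

async function invalidateAgentMatrix(agentId: string, cache: KeyDeleter, publish?: Publish): Promise<void> {
  // Runs only after the PostgreSQL transaction (rule change + version bump) commits.
  await cache.del(`matrix:${agentId}`);
  await cache.del(`agent:${agentId}:matrix_version`);
  // Only needed once there is more than one Node.js process.
  if (publish) {
    await publish("config.invalidate", JSON.stringify({ agent_id: agentId, type: "MATRIX" }));
  }
}
```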
What Gets Cached Where
| Data | Redis Key Pattern | TTL | Invalidation |
|---|---|---|---|
| Agent matrix rules | matrix:{agent_id} | 1 hour | Deleted on any rule change |
| Agent matrix version | agent:{agent_id}:matrix_version | 1 hour | Deleted on any rule change |
| Agent limits | limits:{agent_id} | 1 hour | Deleted on limit change |
| User override | user_override:{user_id}:{agent_id} | 1 hour | Deleted on override change |
| Market override | market_override:{agent_id}:{event_id} | 1 hour | Deleted on override change |
| Exposure counter | exposure:{agent_id}:{scope_type}:{scope_key} | No TTL | Updated on every write |
| NO_NEW_RISK flag | nnr:{agent_id}:{scope} | No TTL | Set/cleared on state change |
| User aggregate wins | agg_win:{user_id}:{date} | End of day | Incremented atomically |
| User classification | user_class:{user_id}:{agent_id} | 1 hour | Deleted on change |
38. Error Handling Patterns
Error Categories
| Category | Examples | Handling |
|---|---|---|
| Validation | Bad input, missing fields | Return 400. No processing. No audit. |
| Business rule | Suspended market, below-minimum stake | Return 200 with status: REJECTED. Audit record created. |
| Transient infrastructure | Redis timeout, DB pool exhausted | Retry up to 3 times with backoff (100ms, 200ms, 400ms). Fall back or 503. |
| Permanent infrastructure | DB down | Circuit breaker opens. All bets fall back to 100% forward (safe default). |
| Data integrity | Negative ledger, broken hierarchy | P1 alert. DLQ. Manual investigation. |
| External service | Betfair API error | Hedge queue absorbs it. Mark as UNHEDGED after 3 retries. |
Retry Policies
| Operation | Max Retries | Backoff | On Exhaustion |
|---|---|---|---|
| Redis read | 2 | 50ms, 100ms | Fall back to PostgreSQL |
| Redis write | 2 | 50ms, 100ms | Log warning; DB is source of truth |
| PostgreSQL read | 2 | 100ms, 200ms | Return 503 |
| PostgreSQL write | 1 | 100ms | DLQ the operation |
| Betfair API | 3 | 1s, 2s, 4s | Mark hedge order as UNHEDGED |
| Settlement position | 5 | 1s, 5s, 30s, 60s, 120s | DLQ |
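All of the policies in this table share one shape: a fixed retry count with a doubling backoff schedule. A generic helper sketch (names are illustrative; the exhaustion action is left to the caller, per the "On Exhaustion" column):

```typescript
// Generic retry-with-backoff sketch; schedule doubles from baseDelayMs,
// e.g. 100ms, 200ms, 400ms for baseDelayMs = 100 and maxRetries = 3.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxRetries: number,
  baseDelayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      const delay = baseDelayMs * 2 ** attempt; // exponential backoff
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // caller decides: fall back, return 503, or DLQ
}
```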
Circuit Breakers
| Service | Opens After | Behavior When Open | Closes After |
|---|---|---|---|
| Redis | 5 consecutive failures in 10 seconds | All reads go to PostgreSQL. Writes go to PostgreSQL only. | 30 seconds of successful PostgreSQL reads |
| Betfair | 5 consecutive failures in 60 seconds | All hedge orders queue up. No new Betfair calls. | Probe every 30 seconds; close after 3 successes |
When Redis circuit breaker is open, the system still works -- just slower. Every Redis read becomes a PostgreSQL read (add ~5ms per read). This is the graceful degradation path.
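A minimal consecutive-failure breaker covering the open/half-open behavior in the table could look like this. It is a sketch, not the production implementation; threshold and cooldown values are passed in:

```typescript
// Minimal circuit breaker sketch: opens after N consecutive failures,
// half-opens (allows a probe) after the cooldown elapses.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly failureThreshold: number, // e.g. 5
    private readonly cooldownMs: number,       // e.g. 30_000
  ) {}

  isOpen(now: number = Date.now()): boolean {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // half-open: let a probe request through
      this.failures = 0;
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number = Date.now()): void {
    this.failures++;
    if (this.failures >= this.failureThreshold) this.openedAt = now;
  }
}
```

The real breakers also need the time-window condition ("5 failures in 10 seconds") and, for Betfair, a close-after-3-successes rule on top of this skeleton.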
DLQ Integration
Any operation that exhausts retries creates a DLQ entry:
Source: BET_PROCESSING / SETTLEMENT / HEDGE
Reference ID: the bet_id or position_id
Error: message + stack trace
Payload: full context to retry manually
Status: PENDING
A P2 alert fires on every DLQ entry. Admins resolve via API: retry, void and refund, or manual override.
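The entry fields listed above map to a shape like the following. The type and builder are illustrative assumptions, not the actual schema:

```typescript
// Illustrative DLQ entry shape matching the fields listed above.
type DlqSource = "BET_PROCESSING" | "SETTLEMENT" | "HEDGE";

interface DlqEntry {
  source: DlqSource;
  referenceId: string;               // bet_id or position_id
  errorMessage: string;
  errorStack?: string;
  payload: Record<string, unknown>;  // full context to retry manually
  status: "PENDING" | "RETRIED" | "RESOLVED";
  createdAt: Date;
}

function buildDlqEntry(
  source: DlqSource,
  referenceId: string,
  err: Error,
  payload: Record<string, unknown>,
): DlqEntry {
  return {
    source,
    referenceId,
    errorMessage: err.message,
    errorStack: err.stack,
    payload,
    status: "PENDING", // every new entry starts PENDING and fires a P2 alert
    createdAt: new Date(),
  };
}
```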
39. Testing Strategy
Unit Test Targets
| Module | Coverage | Key Scenarios |
|---|---|---|
| MatrixResolutionModule | 95% | Wildcard matching, specificity tie-breaking (most specific wins, then highest forward wins, then oldest wins), precedence chain order (user override > market > matrix > default), missing rules fallback to 100% forward |
| CascadeEngineModule | 95% | 2-level cascade, 4-level cascade, limit overflow cascading extra to upline, suspended agent skip, NO_NEW_RISK with hedge detection (back bet blocked, lay bet allowed) |
| LimitEnforcementModule | 95% | All 4 limit types, most restrictive wins, exact boundary (limit minus 1 paisa passes, limit exactly fails), period-aware checking |
| ExposureLedgerModule | 90% | Increment/decrement, upsert on first bet, Redis update after commit, Redis fallback to PostgreSQL |
| SettlementModule | 95% | WIN/LOSS/VOID settlement, idempotency (same key returns without re-processing), ledger decrement to zero, re-settlement |
| StakeReductionModule | 95% | Per-click reduction, aggregate reduction, below-minimum rejection, edge case odds (1.01 produces huge stake, 1000.00 produces tiny stake) |
| HedgeExecutionModule | 90% | Successful fill, Betfair error retry, 3x failure marks UNHEDGED |
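The exact-boundary scenario called out for LimitEnforcementModule ("limit minus 1 paisa passes, limit exactly fails") pins down the comparison as strict less-than. A sketch of that check, with assumed names and amounts in paise:

```typescript
// Boundary semantics sketch: a bet passes only if it leaves utilization
// strictly below the limit. All amounts in paise; names are illustrative.
function passesLimit(
  currentExposurePaise: number,
  additionalLiabilityPaise: number,
  limitPaise: number,
): boolean {
  return currentExposurePaise + additionalLiabilityPaise < limitPaise;
}
```

Encoding this as `<` rather than `<=` is exactly what the boundary unit test locks in.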
Integration Test Scenarios
| Scenario | What It Tests | Expected Outcome |
|---|---|---|
| Full 3-level cascade bet | Amit bets 10,000 INR through Rajesh to Vikram to Platform | 3 retained positions created, all exposure ledgers incremented, audit record complete, hedge order queued |
| Limit overflow | Rajesh has 3,000 INR remaining on match limit, bet wants 6,000 INR retention | Rajesh retains 3,000, overflow of 3,000 forwarded to Vikram, Vikram processes normally |
| NO_NEW_RISK hedge | Rajesh in NO_NEW_RISK, lay bet arrives on the overexposed outcome | Lay bet accepted as hedge, Rajesh's worst-case liability decreases |
| NO_NEW_RISK non-hedge | Rajesh in NO_NEW_RISK, back bet arrives on the same outcome | Back bet forwarded 100% to Vikram |
| Stake reduction | Bet at odds 50.00 with 50,000 INR per-click limit | Stake reduced from 5,000 to 1,020 INR |
| Settlement cascade | Event settles, positions across 3 agents | All positions settled, ledgers decremented, NO_NEW_RISK cleared |
| Matrix version capture | Agent changes matrix between two bets | First bet uses old version, second uses new version, audit records differ |
| Concurrent bets near limit | 10 simultaneous bets at 95% utilization | All resolve, total does not exceed limit (PostgreSQL FOR UPDATE path) |
| Betfair timeout | Hedge order sent, Betfair returns 503 | Retried 3 times, marked UNHEDGED, bet still accepted |
| Redis outage | Redis killed mid-test | System falls back to PostgreSQL for all reads, latency increases, correctness maintained |
| Void cascade | Admin voids a bet, all positions reversed | Every position reversed, every ledger decremented, NO_NEW_RISK re-evaluated |
| Cash-out | Amit requests cash-out at improved odds | Counter-positions created using original proportions (not current matrix), exposure reduced |
| Suspended agent | Vikram suspended, bet arrives for Rajesh | Rajesh's overflow skips Vikram, goes directly to Platform |
| Idempotent settlement | Settlement triggered twice for same event | Second invocation is a no-op (idempotency keys match) |
Load Test Scenarios
| Scenario | Traffic | Success Criteria |
|---|---|---|
| Sustained peak | 167 bets/sec for 30 minutes | P99 < 90ms, zero errors |
| Burst spike | 500 bets/sec for 60 seconds | P99 < 200ms, error rate < 0.1% |
| Settlement storm | 3 events settle simultaneously (5,000 positions) | Completes within 5 minutes |
Chaos Test Scenarios
| Scenario | Simulation | Expected Behavior |
|---|---|---|
| Redis down | Kill Redis | Circuit breaker opens in 3 seconds. All reads fall back to PostgreSQL. No data loss. Bets accepted at higher latency. |
| PostgreSQL down | Kill PostgreSQL | All writes fail. 503 returned. Alert fires. Bets queued on client side or rejected. |
| Betfair down | Block outbound to Betfair | Hedge queue grows. Bets accepted. Platform absorbs risk. UNHEDGED alert fires. |
| Slow PostgreSQL | Add artificial 50ms latency | P99 increases. Some bets exceed budget. No data loss. |
40. Deployment Strategy
Docker Compose for Local Dev
services:
app:
build: .
ports:
- "3000:3000"
environment:
DATABASE_URL: postgresql://hannibal:secret@postgres:5432/hannibal
REDIS_URL: redis://redis:6379
BETFAIR_API_KEY: test_key
NODE_ENV: development
depends_on:
- postgres
- redis
postgres:
image: postgres:16
environment:
POSTGRES_DB: hannibal
POSTGRES_USER: hannibal
POSTGRES_PASSWORD: secret
volumes:
- postgres_data:/var/lib/postgresql/data
ports:
- "5432:5432"
redis:
image: redis:7
ports:
- "6379:6379"
volumes:
postgres_data:
Three services. One command: docker compose up. Database migrations via npx prisma migrate dev.
Production Deployment
One Docker container running the Node.js application with all modules and BullMQ workers. Deploy behind nginx or a cloud load balancer for SSL termination.
| Component | Spec | Notes |
|---|---|---|
| App container | 2 vCPU, 4 GB RAM | Single instance to start. Add a second when P99 exceeds 90ms under load. |
| PostgreSQL | 4 vCPU, 16 GB RAM, SSD | Managed service (RDS, Cloud SQL, etc.). Automated backups. |
| Redis | 2 vCPU, 4 GB RAM | Managed service (ElastiCache, Redis Cloud). Persistence enabled. |
Feature Flag Rollout Process
Feature flags are environment variables:
FEATURE_CASCADING_ROUTING=false
FEATURE_USER_WIN_LIMITS=false
FEATURE_NO_NEW_RISK=false
FEATURE_HEDGE_EXECUTION=false
FEATURE_PERIOD_MANAGEMENT=false
Rollout steps:
- Deploy code with feature off (env var = false).
- Enable for a single test agent by adding agent-specific override in config.
- Monitor for 24-48 hours: check audit trails, ledger accuracy.
- Enable globally (set env var = true). Redeploy.
- After 2 weeks stable, remove the feature flag check from code.
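The flag check plus the per-agent override from step 2 can be sketched as below. The `_AGENT_OVERRIDES` env-var convention is an assumption for illustration, not a documented mechanism:

```typescript
// Feature flag sketch: globally on via env var, or on for specific agents
// via a hypothetical comma-separated override variable.
function isFeatureEnabled(
  flag: string,
  agentId: string,
  env: Record<string, string | undefined>,
): boolean {
  if (env[flag] === "true") return true; // globally enabled
  const overrides = env[`${flag}_AGENT_OVERRIDES`]; // e.g. "agent_42,agent_99" (assumed convention)
  return overrides !== undefined && overrides.split(",").includes(agentId);
}
```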
41. Implementation Phases
Phase 1: Foundation (Weeks 1-4)
Goal: All building blocks exist but are not connected into a pipeline.
| Week | Deliverables |
|---|---|
| 1 | Prisma schema migration: all tables. Seed data with test agents (Vikram, Rajesh, Priya) and test users (Amit, Sonia). Docker Compose setup. Basic Express app with health endpoint. |
| 2 | ExposureLedgerModule: PostgreSQL reads/writes, single counter per agent per scope (no sharding). Redis read/write for counters. LimitEnforcementModule: check all 4 limit types, return max retainable amount. |
| 3 | MatrixResolutionModule: full 5D wildcard matching with specificity tie-breaking. Precedence chain. ConfigModule: load matrix rules from DB, cache in Redis with delete-on-change invalidation. |
| 4 | UserManagementModule: per-click win cap check, aggregate win cap (Redis INCRBY). StakeReductionModule. AuditModule: BullMQ job that writes JSONB audit record. |
Phase 2: Core Pipeline (Weeks 5-8)
Goal: End-to-end bet placement and settlement through the full cascade.
| Week | Deliverables |
|---|---|
| 5 | CascadeEngineModule: N-level cascade with matrix resolution and limit checking per level. Overflow handling. Suspended agent skip. BetProcessingModule: orchestrates the full pipeline from HTTP request to response. |
| 6 | Position creation with per-level transactions. Exposure ledger updates atomic with positions. End-to-end bet placement through 3 levels. Full integration tests. |
| 7 | SettlementModule: event result processing, position settlement (idempotent), ledger decrement, bet status updates. |
| 8 | Void/cancellation state machine. Cash-out (counter-position using original proportions). Lay bet support. Feature flag: enable cascade per agent. |
End of Phase 2: The system can accept, route, settle, void, and cash out bets through the full cascade. This is the MVP.
Phase 3: Production Hardening (Weeks 9-12)
Goal: Ready for real traffic with safety mechanisms.
| Week | Deliverables |
|---|---|
| 9 | NO_NEW_RISK: automatic trigger, hedge detection, scoped activation. Period management: night/weekly periods, timezone handling, carry-forward logic. |
| 10 | HedgeExecutionModule: Betfair API integration, market orders, 3x retry, mark UNHEDGED on failure. BullMQ hedge queue. |
| 11 | Dead letter queue with admin API (list, retry, resolve). Nightly reconciliation: compare ledgers to position sums, alert on mismatch. |
| 12 | Circuit breakers for Redis and Betfair. Redis fallback to PostgreSQL. End-to-end chaos testing. Load testing at 167 bets/sec. |
End of Phase 3: Production-ready for controlled launch.
Phase 4: Intelligence (Weeks 13-16)
| Week | Deliverables |
|---|---|
| 13 | Sharp detection: CLV calculation over rolling 500-bet window, behavioral scoring, automatic classification. |
| 14 | Collusion detection: classification flip timing correlation, forwarded-flow P&L asymmetry detection. |
| 15 | Agent dashboard: real-time exposure, traffic light view, bet feed, settlement summary. Socket.IO push updates. |
| 16 | WhatsApp integration: scheduled status messages, interactive commands (status, stop, resume, panic). |
Phase 5: Scale-Up (When Data Demands It)
This phase has no fixed timeline. Each item is triggered by a measurable threshold, not a calendar date. See the Scale-Up Reference section below.
42. Scale-Up Reference
This section is the bridge between v1 (simple) and v2 (full architecture). Add complexity only when data proves you need it.
When to Add Each Optimization
| Optimization | Trigger Threshold | What to Do | Expected Impact |
|---|---|---|---|
| Application LRU cache | Redis P99 latency exceeds 5ms under sustained load | Add an in-memory LRU cache (e.g., lru-cache npm) with 5-second TTL in front of Redis for exposure reads and matrix lookups | Reduces Redis calls by ~60%. Drops P99 back below 2ms. Adds cache invalidation complexity. |
| Sharded exposure counters | PostgreSQL write contention on exposure_ledger exceeds 50ms P99 for hot agents | Add shard_index column to exposure_ledger. Use 8 shards per hot agent. Each write picks a random shard. Reads sum all shards. | Reduces lock contention by 8x. |
| Prometheus + Grafana | You have a dedicated ops person (or the team exceeds 5 engineers) | Add prom-client for metrics. Deploy Prometheus and Grafana. Build dashboards for bet latency, settlement time, hedge fill rate, DLQ depth. | Real-time operational visibility. Replaces log-grep debugging with dashboards. |
| Read replicas | Dashboard queries slow down bet processing (same PostgreSQL instance) | Add a PostgreSQL read replica. Route all dashboard/reporting queries to the replica. | Isolates read load from write path. |
| Multiple Node.js instances | Single process CPU exceeds 70% sustained during peak | Add a second instance behind a load balancer. BullMQ handles job distribution automatically. | Doubles throughput capacity. |
| Separate audit database | Audit trail table exceeds 100 GB or audit writes cause write contention | Move audit_trail to a separate PostgreSQL instance. Change audit write module to target the new connection. | Isolates audit I/O from transactional I/O. |
| Multi-currency | Expanding beyond India (Southeast Asia, Africa) | Add base_currency to agents, FX conversion at platform boundary, FX rate capture in audit trail. See v2 Section 34 for full design. | Enables Ghanaian Cedis, Thai Baht, etc. |
| Responsible gambling module | Entering regulated markets (UK, EU, Australia) | Add self-exclusion, session limits, reality checks, deposit limits. See v2 Section 46 for full design. | Regulatory compliance. |
| Message broker (Kafka) | Bet volume exceeds 1000 bets/sec sustained | Replace BullMQ for bet event streaming with Kafka. Keep BullMQ for background jobs. | Higher throughput, better backpressure, multi-consumer. |
| Microservice extraction | A specific module (e.g., hedge execution) needs independent scaling or a separate deployment cycle | Extract the module into its own service with its own repository, deployment, and scaling. Communicate via message queue. | Independent scaling and deployment. |
| Limit order hedging | Betfair slippage on market orders exceeds 2% average | Replace market orders with limit orders. Add reprice logic, time-in-force, partial fill tracking. See v2 Section 52 hedge_orders schema. | Better execution quality, less slippage. |
| Matrix version history table | Agents create more than 100 matrix versions and audit replay needs fast version lookup | Add a matrix_versions table storing immutable snapshots. Reference by version_id instead of version integer. | Faster historical replay, cleaner audit. |
| Reconciliation frequency | Nightly reconciliation is too slow to catch drift during IPL matches | Add 15-minute scheduled reconciliation. Add post-settlement targeted reconciliation. See v2 Section 36 for full design. | Faster drift detection. |
How to Measure Each Threshold
| Metric | How to Measure in v1 (without Prometheus) |
|---|---|
| Redis P99 latency | Pino log with redis_latency_ms on each call. Query logs: sort, take 99th percentile. |
| PostgreSQL write contention | Pino log with db_write_ms on exposure_ledger updates. Watch for spikes above 50ms. |
| CPU utilization | process.cpuUsage() logged every 60 seconds. Or top / container metrics from your hosting platform. |
| Bet volume | Count from Pino logs or a simple in-memory counter logged every minute. |
| Hedge slippage | target_price - fill_price logged on each hedge fill. Average over a rolling window. |
| Audit table size | SELECT pg_total_relation_size('audit_trail') run nightly. |
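The log-derived P99 in the first row reduces to "sort the sampled latencies, take the value at the 99th-percentile rank". A sketch (one common rank convention; others differ by at most one sample):

```typescript
// P99 from a batch of logged latency samples: sort ascending and take
// the value at the ceil(0.99 * n) rank.
function p99(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) return 0;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil(0.99 * sorted.length) - 1; // zero-based index
  return sorted[Math.max(0, rank)];
}
```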
The Principle
Start simple. Measure everything. Add complexity only when a specific, measurable threshold is crossed. Every optimization in the v2 document is available as a well-documented upgrade path. You never need to redesign -- you add a layer, a shard, or a service.
The simplest architecture that works is the best architecture until it stops working. Then you have data to tell you exactly where to invest.
This document is maintained by the Hannibal engineering and product teams. For the full-scale architecture with all optimizations, refer to B-Book-Architecture-v2.md.