B-Book Architecture & Design Document v2.0

System: Hannibal
Date: February 2026
Status: Living Document
Audience: Product, Engineering, Operations, Stakeholders

Executive Summary
The Core Problem -- In Plain English
The Forwarding Matrix -- The Brain of the System
The Bet Flow -- Step by Step
Cascading Upline Routing
Agent Liability Limits
User Win Limits & Stake Reduction
NO_NEW_RISK Mode
Period Definitions -- Night & Weekly
Exposure Accounting
Audit & Determinism
What the Current Codebase Already Has (and What's Missing)
The Nightmare Scenarios & How We Handle Them
Operational Dashboard -- What Agents Need to See
Performance Architecture
Competitive Landscape
Phased Rollout Plan
Revenue Model
Implementation Order (for Developers)
The Bookie's Final Verdict

1. Executive Summary

What is the B-Book?

The B-Book is Hannibal's hierarchical, deterministic risk-management and routing engine. Think of it as the brain that sits between a punter placing a bet and the final destination of that bet's risk.

In traditional bookmaking, a "B-Book" means the bookie keeps the bet on their own books -- they take the other side of the punter's wager. If the punter loses, the bookie profits. If the punter wins, the bookie pays. The opposite is an "A-Book" where the bookie immediately hedges the bet on an exchange like Betfair, earning a small commission but taking no risk.

Hannibal's B-Book is more sophisticated than either approach. It is a routing engine that decides, for every single bet, what percentage stays at each level of an agent hierarchy and what percentage gets forwarded up the chain. It enforces limits automatically, cascades overflow intelligently, and maintains a complete audit trail of every decision.

What problem does it solve?

In sports betting agent networks -- prevalent across India, Southeast Asia, and Africa -- agents operate at different levels of a hierarchy. A master agent in Delhi manages sub-agents across several cities. Each sub-agent manages hundreds or thousands of punters. Every agent bears a different amount of risk based on their appetite, bankroll, and expertise.

Today, this risk allocation happens through manual spreadsheet management, WhatsApp groups, and phone calls. An agent might tell their upline "I'll keep 30% of cricket bets and forward you 70%." But there is no enforcement, no automatic cap management, and no audit trail. When disputes arise -- and they always arise -- there is no source of truth.

The B-Book automates all of this. It replaces manual negotiation with a configurable, enforceable, auditable system.

Why this matters

For agents: Automatic enforcement of limits means they never accidentally take on more risk than they can afford. No more 3am phone calls during an IPL match because exposure got out of hand.
For the platform: Deterministic routing means every rupee of every bet is accounted for. Disputes become trivially resolvable by replaying the decision trail.
For punters: Faster bet acceptance, consistent limits, and transparent maximum stakes.

The key differentiator

What sets Hannibal apart from existing tools is the automated forwarding matrix with cascading upline routing and deterministic audit trails. To our knowledge, no other platform in this market offers a multi-dimensional forwarding matrix that automatically resolves routing percentages, cascades overflow through an arbitrary-depth agent hierarchy, and produces a complete, replayable decision record for every bet.

2. The Core Problem -- In Plain English

The Agent Hierarchy: A Real Example

Let us meet the people in our system:

                    +-----------------------+
                    |   Betfair Exchange    |
                    |  (External Hedge)     |
                    +-----------+-----------+
                                |
                    +-----------+-----------+
                    |     HANNIBAL          |
                    |  (The Platform)       |
                    +-----------+-----------+
                                |
                    +-----------+-----------+
                    |      VIKRAM           |
                    |  Master Agent, Delhi  |
                    |  Manages 12 sub-agents|
                    +-----------+-----------+
                                |
              +-----------------+-----------------+
              |                                   |
  +-----------+-----------+           +-----------+-----------+
  |       RAJESH          |           |       PRIYA           |
  |  Sub-Agent, Mumbai    |           |  Sub-Agent, Bangalore |
  |  200 cricket punters  |           |  150 football punters |
  +-----------+-----------+           +-----------------------+
              |
    +---------+---------+
    |  AMIT   |  SONIA  |  ... 198 more punters
    | Punter  | Punter  |
    +---------+---------+

Vikram is a master agent based in Delhi. He has been in the betting business for 15 years. He has a strong bankroll and deep knowledge of cricket markets. He is comfortable retaining significant risk on IPL matches.

Rajesh is one of Vikram's sub-agents, operating out of Mumbai. He manages about 200 punters who mainly bet on cricket. Rajesh has a moderate bankroll. He wants to keep some risk (because that is where the profit is) but cannot afford to take on unlimited exposure.

Amit is one of Rajesh's punters. He is a regular cricket bettor who places bets of between ₹500 and ₹50,000 on IPL matches.

When Amit Places a Bet: The Complete Journey

Amit opens his phone and places a bet: ₹10,000 on Mumbai Indians to win at odds 1.85 during the IPL.

Here is what needs to happen in the next 90 milliseconds:

Can Amit even place this bet? Check his per-click win limit. At odds 1.85, a ₹10,000 stake means a potential win of ₹8,500. Is that within his limit?
How much does Rajesh keep? The forwarding matrix says Rajesh retains 40% of IPL match odds bets. So Rajesh keeps ₹4,000 of the ₹10,000 stake (and the corresponding ₹3,400 potential liability).
Can Rajesh afford to keep that ₹4,000? Check Rajesh's cricket limits, his per-match limits, his night period limit. If any limit is breached, reduce what Rajesh keeps.
What happens to the other ₹6,000? It goes up to Vikram. Vikram's own forwarding matrix says he retains 60% of what arrives at his level. So Vikram keeps ₹3,600 and forwards ₹2,400 to the platform.
What does the platform do with the remaining ₹2,400? Hannibal may retain some and hedge the rest on Betfair, depending on its own risk appetite.
Record everything. The complete decision chain -- every percentage, every limit check, every cap evaluation -- is persisted as an audit record.

The Fundamental Questions

Every single bet must answer these questions:

Question	Who Answers It
How much risk does Rajesh keep?	Rajesh's forwarding matrix + his limits
How much risk does Vikram keep?	Vikram's forwarding matrix + his limits
How much risk does the platform keep?	Platform's risk configuration
How much gets hedged on Betfair?	Whatever remains after all agents have taken their share
What if someone's limits are breached?	Overflow cascades up to the next level
What if Betfair is unavailable?	Platform absorbs as retained risk, retries asynchronously

3. The Forwarding Matrix -- The Brain of the System

What It Is

The forwarding matrix is a multi-dimensional lookup table that determines what percentage of each bet an agent retains versus forwards to their upline. It is the single most important configuration in the entire B-Book system.

Think of it like a spreadsheet where the rows represent different combinations of conditions, and the output is a single number: the forward percentage. If the matrix says "forward 60%," the agent keeps 40% and sends 60% up the chain.

The 5 Dimensions

Every bet has characteristics that determine how it should be routed. The matrix uses five dimensions to make this decision:

Dimension	What It Means	Example Values
market_type	The type of bet being placed	MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE
sport_type	Which sport	CRICKET, FOOTBALL, TENNIS, KABADDI
event_phase	When in the event lifecycle	PRE_MATCH, IN_PLAY, APPROACHING_START
source_type	What kind of punter	NORMAL, SHARP, VIP, NEW_ACCOUNT
liquidity_band	How much exchange liquidity exists to hedge	HIGH, MEDIUM, LOW, NONE

How Wildcard Matching Works

An agent does not need to define a rule for every possible combination. That would be thousands of rows. Instead, the matrix supports wildcards (shown as *), which mean "match anything."

Here is an example of Rajesh's forwarding matrix:

Rule	market_type	sport_type	event_phase	source_type	liquidity_band	Forward %
R1	FANCY	CRICKET	IN_PLAY	SHARP	*	95%
R2	FANCY	CRICKET	IN_PLAY	*	*	70%
R3	MATCH_ODDS	CRICKET	PRE_MATCH	*	HIGH	40%
R4	MATCH_ODDS	CRICKET	PRE_MATCH	*	LOW	70%
R5	MATCH_ODDS	CRICKET	IN_PLAY	*	*	60%
R6	*	CRICKET	*	SHARP	*	90%
R7	*	FOOTBALL	*	*	*	80%
R8	*	*	*	*	*	50%

Reading this table in plain English:

R1: If a sharp user places an in-play cricket fancy bet, forward 95%. Rajesh keeps only 5% because sharp users on in-play fancies are the most dangerous bets in cricket.
R3: For pre-match cricket match odds where exchange liquidity is high, forward only 40%. Rajesh keeps 60% because these are the safest bets -- they are easy to price and easy to hedge if needed.
R7: For any football bet, forward 80%. Rajesh is not a football expert, so he keeps very little.
R8: The catch-all rule. For anything not covered above, forward 50%.

Tie-Breaking Rules

What happens when a bet matches multiple rules? The system uses strict, deterministic tie-breaking:

Most specific rule wins. A rule with fewer wildcards is more specific. R1 (one wildcard) beats R2 (two wildcards). Specificity is counted as the number of non-wildcard dimensions.
If specificity is equal, higher forward percentage wins. This is the "risk-safe" default. When in doubt, forward more rather than less. The agent is protected from accidental over-exposure.
If forward percentage is also equal, deterministic ordering by rule creation timestamp. The oldest rule wins. This ensures the same bet always resolves the same way.
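
The wildcard matching and tie-breaking rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the production resolver: the `Rule` class, the `resolve_rule` function, and the integer `created_at` field (standing in for the rule creation timestamp) are hypothetical names introduced here. The rule data is Rajesh's example matrix.

```python
from dataclasses import dataclass

DIMENSIONS = ("market_type", "sport_type", "event_phase", "source_type", "liquidity_band")
WILDCARD = "*"

@dataclass(frozen=True)
class Rule:
    name: str
    conditions: tuple   # one value per entry in DIMENSIONS; "*" matches anything
    forward_pct: int
    created_at: int     # hypothetical stand-in for the creation timestamp (lower = older)

    def matches(self, bet: dict) -> bool:
        return all(c == WILDCARD or c == bet[d] for d, c in zip(DIMENSIONS, self.conditions))

    def specificity(self) -> int:
        # Number of non-wildcard dimensions.
        return sum(c != WILDCARD for c in self.conditions)

def resolve_rule(rules, bet):
    candidates = [r for r in rules if r.matches(bet)]
    if not candidates:
        raise LookupError("no matching rule; the matrix must contain a catch-all")
    # 1) most specific, 2) highest forward % (risk-safe), 3) oldest rule.
    return min(candidates, key=lambda r: (-r.specificity(), -r.forward_pct, r.created_at))

RAJESH_MATRIX = [
    Rule("R1", ("FANCY", "CRICKET", "IN_PLAY", "SHARP", "*"), 95, 1),
    Rule("R2", ("FANCY", "CRICKET", "IN_PLAY", "*", "*"), 70, 2),
    Rule("R3", ("MATCH_ODDS", "CRICKET", "PRE_MATCH", "*", "HIGH"), 40, 3),
    Rule("R4", ("MATCH_ODDS", "CRICKET", "PRE_MATCH", "*", "LOW"), 70, 4),
    Rule("R5", ("MATCH_ODDS", "CRICKET", "IN_PLAY", "*", "*"), 60, 5),
    Rule("R6", ("*", "CRICKET", "*", "SHARP", "*"), 90, 6),
    Rule("R7", ("*", "FOOTBALL", "*", "*", "*"), 80, 7),
    Rule("R8", ("*", "*", "*", "*", "*"), 50, 8),
]

def make_bet(market, sport, phase, source, liquidity):
    return dict(zip(DIMENSIONS, (market, sport, phase, source, liquidity)))
```

Because the sort key is a plain tuple, the same bet always resolves to the same rule, which is the property the audit trail relies on.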

Resolution Precedence Chain

The forwarding matrix is not the only thing that determines routing. There is a four-level precedence chain, listed here from highest priority to lowest; the first level that yields a value is used:

Level	What It Is	When to Use It
User Override	A specific forward % for a specific punter	"This user is a known sharp -- forward 100% of their bets"
Market Override	A specific forward % for a specific event/market	"The CSK vs MI final is too big -- forward 90% of everything on this match"
Matrix Rule	The multi-dimensional lookup described above	Normal day-to-day operations
Agent Default	A single fallback percentage	"If nothing else matches, forward 50%"
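
A minimal sketch of the precedence chain, assuming the levels in the table are checked top to bottom and the first one that yields a value wins. The function name, the dictionary-based override stores, and the `matrix_lookup` callable are hypothetical and only illustrate the ordering.

```python
from typing import Callable, Optional

def resolve_forward_pct(
    user_id: str,
    market_id: str,
    user_overrides: dict,
    market_overrides: dict,
    matrix_lookup: Callable[[], Optional[int]],
    agent_default: int,
) -> tuple:
    """Return (forward_pct, level_used), checking the highest-priority level first."""
    if user_id in user_overrides:
        return user_overrides[user_id], "USER_OVERRIDE"
    if market_id in market_overrides:
        return market_overrides[market_id], "MARKET_OVERRIDE"
    matrix_pct = matrix_lookup()   # None means no matrix rule matched
    if matrix_pct is not None:
        return matrix_pct, "MATRIX_RULE"
    return agent_default, "AGENT_DEFAULT"
```

Returning the level alongside the percentage makes it straightforward to record in the audit trail why a given percentage was applied.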

Sensible Defaults by Sport

These are the recommended starting ranges based on industry practice. New agents should start at the higher end of the forward range (lower retention) until they build confidence:

Scenario	Recommended Retention	Why
Cricket in-play fancy (session/over runs)	10-20%	Highest variance, hardest to price, stale odds risk
Cricket pre-match match odds	40-60%	Well-priced, ample exchange liquidity, predictable markets
Cricket in-play match odds	20-40%	More volatile than pre-match, but still hedgeable
Football Premier League pre-match	50-70%	Deep liquidity, well-understood markets, strong pricing models
Football lower leagues pre-match	20-40%	Less information, worse pricing, integrity risk
Tennis	10-25%	Extremely volatile, retirement risk, low liquidity on most matches
Kabaddi	5-15%	Thin markets, poor external pricing, limited hedge options

Common Mistakes Agents Will Make

Mistake	What Happens	How We Prevent It
Setting retention too high on in-play fancies	One bad session wipes out a week of profit	Warn when retention exceeds recommended range; require confirmation
No catch-all rule	Some bets have no matching rule and the system cannot route them	System requires a default rule; matrix always has a `* / * / * / * / *` fallback
Conflicting rules that they do not understand	Agent thinks rule X applies but rule Y wins due to specificity	Dashboard shows which rule matched for every bet; "test my matrix" dry-run tool
Copying another agent's matrix without understanding it	Matrix tuned for a big operator does not suit a small one	Template system with clear explanations; onboarding wizard

4. The Bet Flow -- Step by Step

The Complete Flow

Real-Life Example: Walking Through the Numbers

The Bet: Amit places ₹10,000 on Mumbai Indians to beat Chennai Super Kings at decimal odds of 1.85, during IPL 2026, pre-match.

Step 1: Compute Metrics

Metric	Calculation	Value
Stake	As submitted	₹10,000
Potential Win	Stake x (Odds - 1) = 10,000 x 0.85	₹8,500
Liability (for the bookie)	Same as potential win for a back bet	₹8,500

Step 2: User Win Cap Check

Amit has a per-click win limit of ₹50,000. His potential win of ₹8,500 is well within that limit. No action needed.

Amit also has an aggregate win limit of ₹2,00,000 per day. He has accumulated ₹45,000 in potential wins today. Adding ₹8,500 brings him to ₹53,500 which is still under the limit. Pass.

Step 3: Stake Reduction

Since Amit passed both win cap checks, no stake reduction is applied. His full ₹10,000 stake is accepted.

Step 4: Resolve Forwarding Percentage

The system evaluates Rajesh's forwarding matrix. The bet characteristics are:

market_type: MATCH_ODDS
sport_type: CRICKET
event_phase: PRE_MATCH
source_type: NORMAL (Amit is not flagged as sharp)
liquidity_band: HIGH (MI vs CSK has deep exchange liquidity)

This matches Rule R3 from Rajesh's matrix (the catch-all R8 also matches, but R3 is more specific and therefore wins): forward 40%. Rajesh retains 60%.

Step 5: Agent Cap Evaluation (Rajesh)

Rajesh retains 60% of ₹10,000 = ₹6,000 stake (₹5,100 liability).

Check Rajesh's limits:

Cricket overall limit: ₹50,00,000. Currently used: ₹12,00,000. After this bet: ₹12,05,100. Still within limit.
Per-match limit (this specific MI vs CSK match): ₹5,00,000. Currently used: ₹1,20,000. After: ₹1,25,100. Still within limit.
Night period limit: ₹10,00,000. Currently used: ₹3,00,000. After: ₹3,05,100. Still within limit.

All limits pass. Rajesh retains the full ₹6,000 stake.

Step 6: Cascade to Vikram

The remaining ₹4,000 stake (40%) flows up to Vikram. Vikram's matrix says he retains 60% of cricket pre-match match odds. So Vikram retains ₹2,400 and forwards ₹1,600 to the platform.

Step 7: Platform Routing

The platform receives ₹1,600. Based on platform risk configuration, it retains ₹800 and hedges ₹800 on Betfair.

Step 8: Execute

All positions are created atomically:

Entity	Retained Stake	Retained Liability	Forwarded
Rajesh	₹6,000	₹5,100	₹4,000 → Vikram
Vikram	₹2,400	₹2,040	₹1,600 → Platform
Platform	₹800	₹680	₹800 → Betfair
Betfair	₹800 (hedged)	--	--
Total	₹10,000

The stake always sums to the original ₹10,000. Nothing is created or destroyed; risk is simply distributed.
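
The breakdown table can be reproduced with a short calculation. This is a sketch under stated assumptions: `routing_breakdown` is a hypothetical helper, no limits are hit (as in this walkthrough), and the platform's 50% forward figure is inferred from its ₹800 / ₹800 split. `Decimal` is used for the odds to avoid floating-point drift.

```python
from decimal import Decimal

def routing_breakdown(stake, odds, levels):
    """levels: bottom-up list of (name, forward_pct).
    Returns (rows, hedged) where each row is (name, retained, liability, forwarded)."""
    rows = []
    incoming = stake
    for name, forward_pct in levels:
        forwarded = incoming * forward_pct // 100
        retained = incoming - forwarded
        liability = retained * (odds - 1)   # back bet: liability equals potential win
        rows.append((name, retained, liability, forwarded))
        incoming = forwarded
    return rows, incoming

rows, hedged = routing_breakdown(
    10_000,
    Decimal("1.85"),
    [("Rajesh", 40), ("Vikram", 40), ("Platform", 50)],
)

# Conservation check: retained stakes plus the hedge equal the original stake.
assert sum(retained for _, retained, _, _ in rows) + hedged == 10_000
```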

Step 9: Audit

A complete audit record is persisted containing: the original bet details, the matrix rule that matched at each level, every limit that was checked and its result, the final routing breakdown, timestamps for each step, and the total elapsed time.

5. Cascading Upline Routing

How Bets Flow Through the Hierarchy

The fundamental routing model is a cascade. A bet enters at the bottom of the agent hierarchy and flows upward. At each level, the agent retains what they can (based on their matrix and limits) and forwards the rest to their parent.

User (Amit) places ₹10,000 bet
        |
        v
+---[RAJESH: Level 1 Agent]---+
|  Matrix says: forward 40%    |
|  Retains: ₹6,000            |
|  Forwards: ₹4,000           |
+---------|--------------------+
          |
          v
+---[VIKRAM: Level 2 Agent]---+
|  Matrix says: forward 40%    |
|  Retains: ₹2,400            |
|  Forwards: ₹1,600           |
+---------|--------------------+
          |
          v
+---[HANNIBAL PLATFORM]-------+
|  Config: retain 50%          |
|  Retains: ₹800              |
|  Hedges: ₹800               |
+---------|--------------------+
          |
          v
+---[BETFAIR EXCHANGE]--------+
|  Final backstop              |
|  Receives: ₹800             |
+------------------------------+

What Happens at Each Level

At every level in the chain, the system performs the same sequence:

Resolve the source_type for this agent -- does this agent have their own classification for the user? If not, do they trust the downstream agent's classification? (See "Does Sharp Classification Travel Upline?" below)
Resolve the forwarding percentage for this agent (using their matrix with the resolved source_type, overrides, or default)
Calculate the retained amount = incoming stake x (1 - forward %)
Check the agent's limits -- can they actually absorb that retained amount?
If limits allow it: retain the calculated amount, forward the rest
If limits would be breached: retain only up to the limit, forward the overflow as well

A 4-Level Cascade with Actual Numbers

A more complex scenario: Amit bets ₹50,000 on CSK to win at odds 2.10.

Potential win: ₹55,000. Liability: ₹55,000.

AMIT bets ₹50,000
    |
    v
RAJESH (L1) - Forward 40%
    Wants to retain: ₹30,000 (60%)
    Cricket match limit remaining: ₹25,000  <-- LIMIT HIT
    Actually retains: ₹25,000
    Forwards: ₹25,000 (the intended ₹20,000 + ₹5,000 overflow)
    |
    v
VIKRAM (L2) - Forward 40%
    Receives: ₹25,000
    Wants to retain: ₹15,000 (60%)
    All limits OK
    Actually retains: ₹15,000
    Forwards: ₹10,000
    |
    v
PLATFORM (L3)
    Receives: ₹10,000
    Retains: ₹5,000
    Hedges: ₹5,000
    |
    v
BETFAIR (L4)
    Receives: ₹5,000 hedge order

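Notice how Rajesh's overflow (₹5,000 he could not absorb because of his per-match limit) cascaded up to Vikram. Vikram did not "know" this was overflow -- he simply received ₹25,000 instead of ₹20,000 and processed it through his own matrix and limits. (For readability, this example states the remaining match limit in stake terms; as in Section 4, the limit check itself is made against liability.)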
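
The cascade above can be sketched as a single loop over the chain. This is a simplified illustration, not the actual engine: the `Level` class and `cascade` function are hypothetical, each level's forward percentage is assumed to be already resolved, and `remaining_capacity` is expressed in the same unit as the flowing amount, mirroring the example's simplification.

```python
from dataclasses import dataclass

@dataclass
class Level:
    name: str
    forward_pct: int          # already-resolved forward % for this bet
    remaining_capacity: int   # most restrictive remaining limit at this level

def cascade(stake: int, levels: list) -> tuple:
    """Walk the chain bottom-up.
    Returns (trail, hedge_amount) where trail rows are (name, retained, forwarded)."""
    incoming = stake
    trail = []
    for level in levels:
        wanted = incoming * (100 - level.forward_pct) // 100
        retained = min(wanted, level.remaining_capacity)   # cap at the limit
        forwarded = incoming - retained                    # intended forward plus overflow
        trail.append((level.name, retained, forwarded))
        incoming = forwarded
    return trail, incoming

UNLIMITED = 10**12
trail, hedge = cascade(
    50_000,
    [Level("Rajesh", 40, 25_000), Level("Vikram", 40, UNLIMITED), Level("Platform", 50, UNLIMITED)],
)
```

Each level only sees its incoming amount, which is why Vikram cannot tell intended forwarding from overflow.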

What Happens When a Mid-Tier Agent Hits Their Cap?

If Vikram had also hit his limit in the example above, the overflow would continue cascading to the platform. The system guarantees that every rupee of every bet ends up somewhere. The cascade never drops a bet. It simply moves unclaimed risk upward until it reaches the platform, which is the final agent in the chain.

What Happens When a Parent Is Suspended?

If Vikram is suspended (say, for a payment dispute), his children cannot forward bets to him. In this scenario:

Rajesh can only retain bets up to his own limits
Any amount that would normally be forwarded to Vikram is instead forwarded directly to the platform
The platform absorbs the extra flow or hedges it on Betfair

The system treats a suspended agent as a "skip" in the chain, not a blockage. Punters can still bet. The risk simply routes around the suspended agent.
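
One way to express the "skip, not blockage" behaviour is to filter suspended uplines out of the chain before the cascade runs. This is a sketch with hypothetical names; the chain is assumed to be a bottom-up list that ends with the platform. How a suspended originating agent is handled is not covered by the text above, so this sketch always keeps the first entry.

```python
def effective_chain(chain: list, suspended: set) -> list:
    """chain: [originating_agent, ...uplines..., platform], ordered bottom-up.
    Suspended uplines are skipped; the platform is never skipped."""
    first, *uplines, platform = chain
    return [first] + [agent for agent in uplines if agent not in suspended] + [platform]
```

With Vikram suspended, `effective_chain(["rajesh", "vikram", "platform"], {"vikram"})` yields `["rajesh", "platform"]`, so Rajesh's forwarded amount goes straight to the platform.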

The Betfair Backstop

After all agents in the hierarchy have taken their share, any remaining exposure reaches the platform. The platform can retain some of this risk, but it always has the option to hedge on Betfair.

Betfair acts as the final backstop. It is the entity of last resort that absorbs whatever risk no agent in the hierarchy was willing or able to keep.

Edge Case: Betfair Is Down

If Betfair's API is unavailable (which can happen during peak traffic or maintenance), the platform cannot hedge. In this case:

The platform absorbs the would-be-hedged amount as retained risk temporarily
The bet is still accepted (we do not reject punter bets because of a hedge-side issue)
The hedge order is placed in an async retry queue
When Betfair comes back online, the hedge is executed
If the event settles before the hedge is placed, the platform simply bears that risk as if it had been deliberately retained

This is a conscious design decision: punter experience is never degraded by infrastructure issues on the hedge side.
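
The fallback behaviour can be sketched with an in-memory retry queue. Everything here is hypothetical and for illustration only: `HedgeDesk`, `HedgeUnavailable`, and the `place_hedge` callable stand in for whatever exchange client the platform uses, and a real implementation would need a durable queue rather than a `deque`.

```python
from collections import deque

class HedgeUnavailable(Exception):
    """Raised by the (hypothetical) exchange client when a hedge cannot be placed."""

class HedgeDesk:
    def __init__(self, place_hedge):
        self.place_hedge = place_hedge   # callable(order); raises HedgeUnavailable on failure
        self.retry_queue = deque()

    @property
    def pending_amount(self):
        # Would-be-hedged amount the platform is temporarily carrying.
        return sum(order["amount"] for order in self.retry_queue)

    def hedge_or_absorb(self, order):
        """Never rejects the punter's bet: on failure the platform carries the risk."""
        try:
            self.place_hedge(order)
            return "HEDGED"
        except HedgeUnavailable:
            self.retry_queue.append(order)
            return "ABSORBED_PENDING_RETRY"

    def retry_pending(self, settled_event_ids=frozenset()):
        """Retry each queued order once; drop orders whose event has already settled."""
        for _ in range(len(self.retry_queue)):
            order = self.retry_queue.popleft()
            if order["event_id"] in settled_event_ids:
                continue   # platform bore this risk as if deliberately retained
            try:
                self.place_hedge(order)
            except HedgeUnavailable:
                self.retry_queue.append(order)   # still down; keep for the next pass
```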

Does Sharp Classification Travel Upline?

This is one of the most important design questions in the entire cascade. When Rajesh tags Amit as a sharp user and forwards 95% of his bet to Vikram, does Vikram know that Amit is sharp?

The answer is: the information travels, but each agent decides independently.

The Problem with Simple Approaches

If sharp info always travels: Rajesh tags Amit as sharp. Vikram's matrix also sees source_type=SHARP and forwards 90%. The platform sees SHARP and forwards 95% to Betfair. Sharp bets rocket through the entire hierarchy in milliseconds. But here is the problem: Rajesh's "sharp" might be Vikram's "normal." Rajesh has loose prices and 200 punters -- anyone who consistently wins against him looks sharp. Vikram, with 2,000 punters and tighter pricing, might profitably retain that exact same flow. Blindly propagating sharp flags means Vikram loses profitable volume because of Rajesh's weaker pricing.

If sharp info never travels: Vikram receives forwarded flow from Rajesh and treats it all as NORMAL. But some of that flow is genuinely toxic -- sharp syndicate members that Rajesh correctly identified. Vikram unknowingly retains it, and his P&L suffers. This is exactly the bookie's nightmare scenario: "rogue agent dumping toxic flow upline."

The Hybrid Design

Each forwarded bet carries metadata about the originating agent's classification, but each upline agent makes their own independent decision about how to treat it.

What travels with the bet:

Metadata Field	Description	Example
`originating_user_id`	The punter who placed the bet	`user_amit_4521`
`originating_agent_id`	The first agent in the chain	`rajesh_mumbai`
`downstream_classification`	What the originating agent classified the user as	SHARP
`forwarding_reason`	Why it was forwarded at each level	SHARP_USER, MATRIX_RULE, CAPACITY_BREACH

How each upline agent uses this information:

The resolution order at each upline level is:

Own classification wins first. If Vikram has independently tagged Amit as SHARP (or NORMAL, or VIP), that classification is used regardless of what Rajesh thinks. Vikram's own data is the most relevant to Vikram's book.
Configurable trust in downstream flags. Each agent can configure per sub-agent whether to trust their sharp classifications. Vikram might set trust_downstream_flags: true for Rajesh (whose sharp detection he trusts) but false for a newer sub-agent whose judgment he has not validated.
Default to NORMAL if no other signal. If the upline has no opinion and does not trust the downstream flag, the bet is treated as normal flow. This is the safe default for the upline's book -- they apply their standard matrix rules.

Real-Life Example: The Same Bet, Three Different Outcomes

Setup: Amit is tagged as SHARP by Rajesh. Amit bets ₹20,000 on MI to win at 1.85. Rajesh forwards 95% (₹19,000) to Vikram.

Scenario A -- Vikram trusts Rajesh's flags: Vikram's config: trust_downstream_flags: true for Rajesh. Vikram's matrix: SHARP source on cricket → forward 80%. So Vikram forwards 80% of ₹19,000 = ₹15,200 to the platform. Vikram retains ₹3,800.

Scenario B -- Vikram has his own classification: Vikram has independently analyzed Amit's betting across all sub-agents and classified him as NORMAL (Amit's edge disappears at Vikram's sharper prices). Vikram's matrix: NORMAL source on cricket pre-match → forward 40%. Vikram retains 60% of ₹19,000 = ₹11,400.

Scenario C -- Vikram ignores downstream flags: Vikram's config: trust_downstream_flags: false for Rajesh. No own classification for Amit. Amit is treated as NORMAL. Same outcome as Scenario B.
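
The resolution order and the three scenarios above can be sketched as follows. The function name and the dictionary-based configuration are hypothetical. Treating a sub-agent with no explicit trust setting as untrusted is an assumption of this sketch, not something the design states.

```python
def resolve_source_type(own_classifications, trust_downstream_flags,
                        user_id, sub_agent_id, downstream_classification):
    """Return (source_type, reason) for one upline agent."""
    own = own_classifications.get(user_id)
    if own is not None:
        return own, "OWN_CLASSIFICATION"            # own data always wins
    trusted = trust_downstream_flags.get(sub_agent_id, False)   # assumed default: untrusted
    if trusted and downstream_classification is not None:
        return downstream_classification, "TRUSTED_DOWNSTREAM_FLAG"
    return "NORMAL", "DEFAULT_NORMAL"

# Vikram's forward percentages as used in the scenarios.
VIKRAM_FORWARD_PCT = {"SHARP": 80, "NORMAL": 40}

def vikram_retains(incoming, source_type):
    forwarded = incoming * VIKRAM_FORWARD_PCT[source_type] // 100
    return incoming - forwarded
```

The returned reason string is the kind of value the audit record would need in order to answer "why did I retain a sharp user's bet?".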

Why This Design Is Correct

Each level of the hierarchy has different information and different risk tolerance. A user who is sharp at the sub-agent level (beating loose prices) may not be sharp at the master agent level (where prices are tighter). A user who looks normal to a sub-agent might be part of a syndicate that only the master agent can see (because the master agent has cross-agent visibility).

The audit trail records everything. For every forwarded bet, the audit record shows: what the originating agent classified the user as, what the upline agent's resolution was, and why. When Vikram asks "why did I retain a sharp user's bet?", the answer is clear: "Your config ignores downstream flags from Rajesh, your own detection had not flagged this user, so the bet was treated as NORMAL."

Cross-agent sharp detection fills the gap. The platform has visibility across ALL agents. If Amit is betting through three different sub-agents under Vikram, the platform-level sharp detection can flag this pattern and push a classification down to Vikram -- independent of what any individual sub-agent thinks. This is why Section 17's Phase 3 includes "cross-agent sharp detection" as a key deliverable.

Configuration Per Sub-Agent

Each agent configures trust settings per sub-agent:

Sub-Agent	trust_downstream_flags	Reason
Rajesh (Mumbai)	true	Experienced, reliable sharp detection, 8 years track record
Arun (Bangalore)	false	New sub-agent, unproven detection, only 3 months on platform
Sanjay (Chennai)	true	Good track record, conservative flagging

This means Vikram can gradually extend trust as sub-agents prove their detection quality -- much like how the real-world agent relationship works. You trust experienced partners more than new ones.

6. Agent Liability Limits

The Limit Structure

Every agent can configure limits at multiple levels of granularity. The purpose of limits is to ensure an agent never accidentally takes on more risk than their bankroll can support.

Limit Type	Scope	Example
Sport Limit	Total liability across all events in a sport	"I can handle ₹50 lakh total cricket exposure"
Market Limit	Total liability on a specific event or market	"No more than ₹5 lakh on any single IPL match"
Night Period Limit	Total liability accumulated during the night window	"Cap my night session at ₹10 lakh"
Weekly Period Limit	Total liability accumulated during the weekly cycle	"Cap my weekly exposure at ₹1 crore"

Real Example: Rajesh's Limit Configuration

Rajesh sets up the following limits for cricket:

RAJESH'S CRICKET LIMITS
========================

Sport-Level Limit (Cricket):         ₹50,00,000  (₹50 lakh)
  |
  +-- Per-Match Limit:               ₹5,00,000   (₹5 lakh per individual match)
  |
  +-- Night Period Limit:            ₹10,00,000  (₹10 lakh between 7pm-2am IST)
  |
  +-- Weekly Period Limit:           ₹40,00,000  (₹40 lakh Monday-Sunday)

How Limits Interact: The Most Restrictive Wins

When a bet arrives, all applicable limits are checked simultaneously. The most restrictive limit determines how much the agent can retain.

Example: It is Thursday night during IPL week. Rajesh's current state:

Limit	Capacity	Used	Remaining
Cricket Sport Limit	₹50,00,000	₹38,00,000	₹12,00,000
MI vs CSK Match Limit	₹5,00,000	₹4,50,000	₹50,000
Night Period Limit	₹10,00,000	₹9,20,000	₹80,000
Weekly Period Limit	₹40,00,000	₹35,00,000	₹5,00,000

A new bet wants to add ₹1,00,000 of retained liability. Looking at the remaining capacity:

Sport: ₹12,00,000 available -- sufficient
Match: ₹50,000 available -- NOT sufficient
Night: ₹80,000 available -- NOT sufficient
Weekly: ₹5,00,000 available -- sufficient

The most restrictive limit is the match limit at ₹50,000. So Rajesh can only retain ₹50,000 of the ₹1,00,000. The remaining ₹50,000 overflows and cascades upward to Vikram.
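
A minimal sketch of the "most restrictive wins" check, using the figures from the table above. The `retainable` function and the limit names are hypothetical; amounts are retained liability in rupees.

```python
def retainable(requested: int, limits: dict) -> tuple:
    """limits maps limit name -> (capacity, used).
    Returns (retained, overflow, binding_limit), where binding_limit is None
    if the full requested amount fits under every limit."""
    remaining = {name: max(cap - used, 0) for name, (cap, used) in limits.items()}
    binding = min(remaining, key=remaining.get)   # limit with the least headroom
    retained = min(requested, remaining[binding])
    overflow = requested - retained
    return retained, overflow, (binding if overflow > 0 else None)

RAJESH_THURSDAY_NIGHT = {
    "sport":  (5_000_000, 3_800_000),
    "match":  (500_000, 450_000),
    "night":  (1_000_000, 920_000),
    "weekly": (4_000_000, 3_500_000),
}
```

Calling `retainable(100_000, RAJESH_THURSDAY_NIGHT)` gives a retained amount of 50,000 with 50,000 of overflow, bound by the match limit.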

Limit Hierarchy Table

Priority	Limit	Checked When	Resets
1 (most restrictive wins)	Per-Match Limit	Every bet on that specific match	When match settles
2	Night Period Limit	Every bet during night window	At night period end
3	Weekly Period Limit	Every bet during the week	Monday start of day
4	Sport Limit	Every bet in that sport	Rolling / manual reset

7. User Win Limits & Stake Reduction

Per-Click Win Limit

The per-click win limit caps the maximum amount a punter can win on a single bet. This protects agents from large individual payouts.

How it works: If a punter bets at high odds, the potential win could be enormous. The per-click win limit ensures that no single bet can produce a payout above the configured threshold.

Example: Amit has a per-click win limit of ₹50,000.

Bet	Odds	Stake	Potential Win	Within Limit?
MI to win	1.85	₹10,000	₹8,500	Yes
Kohli top bat	5.00	₹15,000	₹60,000	No -- exceeds ₹50,000
Fancy: over 180 runs	50.00	₹5,000	₹2,45,000	No -- far exceeds

Aggregate Win Limit

The aggregate win limit caps the total cumulative potential wins a punter can accumulate over a configurable period (typically daily). This protects against a punter placing many winning bets that individually pass the per-click limit but collectively create enormous exposure.

Example: Amit has a daily aggregate win limit of ₹2,00,000.

He has already placed bets today with a total potential win of ₹1,85,000. His next bet has a potential win of ₹25,000. Since ₹1,85,000 + ₹25,000 = ₹2,10,000, which exceeds ₹2,00,000, only ₹15,000 of potential win remains available, so the bet must be reduced or rejected.

How Stake Reduction Works

When a bet exceeds a win limit, the system does not reject it outright. Instead, it reduces the stake to the maximum amount that keeps the potential win within the limit. This is better for the punter (they still get to bet) and better for the agent (they still get action).

Real Example: High Odds Scenario

Sonia wants to bet ₹5,000 on a fancy market at odds of 50.00. Her per-click win limit is ₹50,000.

Step	Calculation
Potential win at full stake	₹5,000 x (50.00 - 1) = ₹2,45,000
Exceeds per-click limit?	Yes: ₹2,45,000 > ₹50,000
Maximum allowable win	₹50,000
Reduced stake	₹50,000 / (50.00 - 1) = ₹50,000 / 49 = ₹1,020 (rounded down)
Verify	₹1,020 x 49 = ₹49,980 which is under ₹50,000

What the Punter Sees

The punter sees a message like:

Maximum stake at these odds: ₹1,020

The message is transparent about the cap (the punter knows their stake was limited) but opaque about the reason (we do not say "your win limit is ₹50,000" because that reveals the agent's risk configuration).

Below Minimum Stake

If the reduced stake falls below the minimum allowed bet size (say, ₹100), the bet is rejected entirely with a clear message:

This market is currently unavailable at these odds.

This avoids the absurdity of accepting a ₹3 bet.
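
Putting the per-click limit, the aggregate limit, and the minimum-stake rule together, the stake reduction logic can be sketched as below. The function names and status strings are hypothetical. The sketch assumes decimal odds greater than 1 and rounds the reduced stake down to a whole rupee, as in the worked example.

```python
from decimal import Decimal, ROUND_FLOOR

def max_stake_for_win(win_allowance: Decimal, odds: Decimal) -> Decimal:
    """Largest whole-rupee stake whose potential win stays within win_allowance."""
    return (win_allowance / (odds - 1)).to_integral_value(rounding=ROUND_FLOOR)

def apply_win_limits(stake, odds, per_click_limit, aggregate_limit, aggregate_used, min_stake):
    """Return (accepted_stake, status)."""
    # The tighter of the per-click limit and the remaining aggregate headroom applies.
    allowance = max(min(per_click_limit, aggregate_limit - aggregate_used), Decimal(0))
    if stake * (odds - 1) <= allowance:
        return stake, "ACCEPTED"
    reduced = max_stake_for_win(allowance, odds)
    if reduced < min_stake:
        return Decimal(0), "REJECTED_BELOW_MINIMUM"
    return reduced, "REDUCED"

# Sonia's bet: ₹5,000 at odds 50.00 with a ₹50,000 per-click win limit.
sonia_stake, sonia_status = apply_win_limits(
    Decimal(5000), Decimal("50.00"),
    per_click_limit=Decimal(50000),
    aggregate_limit=Decimal(200000),
    aggregate_used=Decimal(0),
    min_stake=Decimal(100),
)
```

`Decimal` is used here so that the division and the floor are exact in rupees; floating-point odds could round a borderline stake the wrong way.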

Sharp User Detection Signals

Agents need to identify "sharp" users -- punters who consistently beat the closing line and generate long-term losses for the bookie. Sharp detection informs the source_type dimension of the forwarding matrix.

The key signals that indicate a user may be sharp:

Signal	What It Means	Why It Matters
Closing Line Value (CLV)	The user consistently bets at prices better than where the market closes	This is the single strongest predictor of long-term profitability. A user with positive CLV over 500+ bets is almost certainly sharp.
Consistent staking	Same stake size regardless of odds or confidence	Recreational punters vary stakes; professionals use flat staking to disguise their edge
Early betting	Regularly bets within the first hour of a market opening	Early markets are softest; sharp users exploit them before prices adjust
Unpopular markets	Frequently bets on obscure leagues, low-tier events	These markets have the weakest pricing and the most exploitable edges
No mean reversion	Profits do not revert to average over time	Lucky punters revert; skilled punters sustain their edge
Rapid market movement after their bet	Price moves sharply in their direction after they bet	Indicates they are consistently on the right side of information

8. NO_NEW_RISK Mode

What It Is

NO_NEW_RISK is a protective mode that activates when an agent's retained liability reaches their configured cap for a given scope (sport, market, or period). When active, the agent cannot take on any new risk-increasing exposure, but hedge bets are still accepted.

Think of it like a credit card limit. Once you hit your limit, you cannot make new purchases, but you can still make payments (which reduce what you owe).

What Triggers It

NO_NEW_RISK is triggered automatically when:

Agent's retained open liability >= configured limit for that scope

For example, if Rajesh's cricket night limit is ₹10,00,000 and his current retained cricket night liability is ₹10,00,000, he enters NO_NEW_RISK for cricket during the night period.

The Scope Is Granular

NO_NEW_RISK is not a blanket shutdown. It is scoped per sport, per market, and per period:

Scenario	Cricket Status	Tennis Status	Football Status
Rajesh hits cricket limit	NO_NEW_RISK	Normal	Normal
Rajesh hits MI vs CSK match limit	NO_NEW_RISK (this match only)	Normal	Normal
Rajesh hits night period limit	NO_NEW_RISK (all sports in night)	NO_NEW_RISK	NO_NEW_RISK
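
The trigger and its scoping can be sketched as follows. This is a minimal illustration, not the production data model: the scope keys (`"sport:cricket"`, `"period:night"`) and the names `ScopeLedger`, `isNoNewRisk`, and `blockedScopes` are assumptions made for the example.

```typescript
// Hypothetical per-scope ledger entry.
interface ScopeLedger {
  retainedOpenLiability: number; // rupees
  limit: number;                 // rupees
}

// A scope is in NO_NEW_RISK once retained liability reaches its limit.
function isNoNewRisk(ledger: ScopeLedger): boolean {
  return ledger.retainedOpenLiability >= ledger.limit;
}

// Returns the scopes touched by a bet that are currently capped.
// A non-empty result means the bet may not add risk for this agent.
function blockedScopes(
  ledgers: Record<string, ScopeLedger>,
  betScopes: string[]
): string[] {
  return betScopes.filter((key) => {
    const ledger = ledgers[key];
    return ledger !== undefined && isNoNewRisk(ledger);
  });
}
```

Checking every scope a bet touches is what makes the mode granular: a capped cricket scope blocks cricket bets while a tennis bet in the same period passes.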

How Hedge Detection Works

The critical question in NO_NEW_RISK mode is: "Does this bet reduce or increase the agent's worst-case liability?" A hedge bet reduces liability and should be accepted even when the agent is at their limit.

The rule is simple:

If WorstCaseLiability AFTER the bet < WorstCaseLiability BEFORE the bet, it is a hedge.

Real Examples of Hedge vs Non-Hedge Bets

Scenario: Rajesh is in NO_NEW_RISK for the MI vs CSK match. He currently has ₹5,00,000 of retained liability on punters' bets backing MI to win.

Incoming Bet	Effect on Liability	Hedge?	Accepted?
₹2,00,000 more on MI to win	Liability increases to ₹7,00,000	No -- increases worst case	Not retained (forwarded 100% to upline)
₹3,00,000 on CSK to win	Liability decreases because it offsets the MI position	Yes -- reduces worst case	Accepted
₹1,00,000 on Draw	Partially reduces MI exposure depending on odds	Depends -- compute the actual worst case	Accepted if worst case decreases

Key insight: A bet on the opposite outcome of an existing position is almost always a hedge. A bet on the same outcome is never a hedge. A bet on a third outcome (like a draw) may or may not be a hedge depending on the amounts and odds.
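
The hedge rule can be sketched as a before/after comparison of worst-case liability. This is a minimal sketch that assumes simple back bets at decimal odds on a market with a known list of outcomes; the names `RetainedBet`, `agentPnl`, `worstCaseLiability`, and `isHedge` are hypothetical, and the odds of 2.0 in the usage note are purely illustrative.

```typescript
// A retained punter bet: the agent pays out if `outcome` wins.
interface RetainedBet {
  outcome: string;
  stake: number; // rupees
  odds: number;  // decimal odds
}

// Agent P&L if `winner` occurs: pay (odds - 1) * stake on winning bets,
// keep the stake on losing bets.
function agentPnl(bets: RetainedBet[], winner: string): number {
  return bets.reduce(
    (pnl, b) => pnl + (b.outcome === winner ? -b.stake * (b.odds - 1) : b.stake),
    0
  );
}

// Worst-case liability: the largest loss across all outcomes, never negative.
function worstCaseLiability(bets: RetainedBet[], outcomes: string[]): number {
  const worstPnl = Math.min(...outcomes.map((o) => agentPnl(bets, o)));
  return Math.max(0, -worstPnl);
}

// A bet is a hedge only if it strictly lowers the worst case.
function isHedge(book: RetainedBet[], incoming: RetainedBet, outcomes: string[]): boolean {
  const before = worstCaseLiability(book, outcomes);
  const after = worstCaseLiability(book.concat([incoming]), outcomes);
  return after < before;
}
```

With a book of ₹5,00,000 staked on MI at 2.0, a further ₹2,00,000 on MI raises the worst case to ₹7,00,000 (not a hedge), while ₹3,00,000 on CSK lowers it to ₹2,00,000 (a hedge). A third-outcome bet is simply evaluated the same way, which is why the table says "compute the actual worst case".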

How the Agent Exits NO_NEW_RISK

There are three ways out of NO_NEW_RISK:

Settlements reduce exposure. When a match settles, the liability associated with that match is removed. If this brings the agent back below their limit, NO_NEW_RISK is lifted.
Hedge bets reduce exposure. Accepting opposite-side bets reduces worst-case liability. Enough hedges can bring the agent below the limit.
Admin raises the limit. If Rajesh calls his platform admin and says "I'm comfortable taking more cricket risk this week," the admin can raise his limit. This immediately lifts NO_NEW_RISK if the current liability is now below the new limit.

9. Period Definitions -- Night & Weekly

Why Bookies Use Periods

Bookies do not think in terms of "total lifetime exposure." They think in operational windows:

The night session is when most live betting happens (evening matches in India, late-night football in Europe). The risk profile during a night session is very different from a quiet afternoon.
The weekly cycle aligns with settlement cycles. Most agents settle weekly. They need to know their maximum weekly exposure.

Periods provide a way to set separate limits for separate time windows, which mirrors how bookies actually operate.

Night Period

The night period is a configurable time window per agent. It typically covers the peak betting hours for that agent's primary market.

Agent	Timezone	Night Period	Why
Rajesh (India, cricket)	IST (UTC+5:30)	7:00 PM - 2:00 AM	IPL matches start at 7:30 PM
Priya (India, football)	IST (UTC+5:30)	10:00 PM - 4:00 AM	Late Premier League kick-offs start at 12:30 AM IST or later
Kwame (Ghana, football)	GMT	2:00 PM - 11:00 PM	Afternoon and evening matches

Weekly Period

The weekly period is a Monday-to-Sunday cycle (configurable start day per agent). At the start of each week, the weekly exposure counter resets; only open exposure on events that are still live carries forward (see The Period Rollover Problem).

Timezone Handling

Each agent operates in their own timezone. This is critical because:

Rajesh's "night" starts at 7:00 PM IST, which is 1:30 PM UTC
The system stores all times in UTC internally but converts to the agent's local timezone when evaluating period boundaries
A bet placed at 1:45 PM UTC is "night" for Rajesh but "afternoon" for a UK-based agent
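
One way to evaluate a period boundary is to convert the UTC instant to the agent's local clock and compare minutes since midnight. The sketch below uses the standard `Intl.DateTimeFormat` API with IANA zone names; `localMinutes` and `inPeriod` are hypothetical helper names, and treating the window as half-open (start included, end excluded) is an assumption.

```typescript
// Minutes since local midnight for a UTC instant in an IANA timezone.
function localMinutes(instant: Date, timeZone: string): number {
  const text = new Intl.DateTimeFormat("en-GB", {
    timeZone: timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).format(instant); // e.g. "19:15"
  const parts = text.split(":");
  const hours = Number(parts[0]) % 24; // some runtimes print midnight as "24"
  const minutes = Number(parts[1]);
  return hours * 60 + minutes;
}

// True if the instant falls inside [startMin, endMin) on the local clock.
// Handles windows that wrap past midnight, such as 19:00-02:00.
function inPeriod(instant: Date, timeZone: string, startMin: number, endMin: number): boolean {
  const now = localMinutes(instant, timeZone);
  return startMin <= endMin
    ? now >= startMin && now < endMin
    : now >= startMin || now < endMin;
}
```

For example, `inPeriod(new Date("2026-02-11T13:45:00Z"), "Asia/Kolkata", 19 * 60, 2 * 60)` is true, while the same instant evaluated for `"Europe/London"` is false. Because the conversion is done per instant, any DST offset in the agent's zone is applied automatically.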

The Period Rollover Problem

What happens when a live match spans a period boundary? Consider:

MI vs CSK starts at 7:30 PM IST (within Rajesh's night period)
The match runs late and extends past 2:00 AM IST (Rajesh's night period end)
Rajesh has ₹8,00,000 of retained liability on this match at 1:55 AM

The design choice: carry-forward exposure.

When a period ends, open exposure from events that are still live is carried forward into the new period. This means:

Rajesh's ₹8,00,000 is NOT magically zeroed out at 2:00 AM
Instead, it carries forward as a starting balance for the next period (or the "day" period if night has ended)
New bets after 2:00 AM are counted against the day period limits
The carried-forward amount from the night period continues to count against the sport-level limit

The alternative -- a clean reset that ignores ongoing exposure -- would be dangerous. An agent could circumvent limits by waiting for a period boundary.
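
A minimal sketch of the carry-forward rule, assuming each open position knows whether its event is still live. The `OpenPosition` shape and the `carryForwardBalance` name are hypothetical.

```typescript
// Hypothetical open position on one event.
interface OpenPosition {
  eventId: string;
  retainedLiability: number; // rupees
  eventStillLive: boolean;
}

// Opening balance for the new period: exposure on events that are still
// live is carried over; settled events contribute nothing.
function carryForwardBalance(positions: OpenPosition[]): number {
  return positions
    .filter((p) => p.eventStillLive)
    .reduce((sum, p) => sum + p.retainedLiability, 0);
}
```

In the example above, Rajesh's ₹8,00,000 on the still-live MI vs CSK match would become the opening balance of the next period.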

The DST Edge Case

Daylight Saving Time creates a subtle problem. When clocks spring forward, a night period configured as 7 PM to 2 AM suddenly becomes 6 hours long in UTC instead of 7 hours. When clocks fall back, it becomes 8 hours long.

The solution: Period boundaries are defined in the agent's local time, and the system converts them to UTC fresh each day, accounting for DST. A "7 PM to 2 AM" night period always means 7 PM to 2 AM in the agent's local clock, regardless of DST transitions.

On the actual transition day:

Spring forward (clocks skip 2 AM to 3 AM): The night period is effectively 1 hour shorter. This is acceptable; it is a conservative outcome (less time to accumulate risk).
Fall back (clocks repeat 1 AM to 2 AM): The night period is effectively 1 hour longer. If a configured boundary falls inside the repeated hour, the system uses the first occurrence of that time as the boundary.

10. Exposure Accounting

Three Ledgers Per Agent Per Scope

For every agent, at every scope level (sport, market, period), the system maintains three numbers:

Ledger	What It Tracks	Updated When
retained_open_liability	The total worst-case payout the agent faces on retained bets	Every bet placement and settlement
forwarded_open_liability	The total liability the agent has forwarded upward	Every bet placement and settlement
open_potential_win	The total amount punters stand to win against this agent	Every bet placement and settlement

These three numbers tell you everything about an agent's current risk position:

retained_open_liability is what the agent will pay if everything goes wrong. This is the number checked against limits.
forwarded_open_liability is what the agent's upline will pay. The agent has no financial exposure here.
open_potential_win is the punter-side view -- what the agent's punters could collectively win.

How These Update Atomically

When a bet is placed, all ledger updates across all affected agents happen in a single atomic transaction. There is no moment where Rajesh's ledger is updated but Vikram's is not. This prevents inconsistent states where the numbers do not add up.

The Redis Fast-Path Optimization

Most bet processing involves reading the current exposure to check limits. Only a minority of bets actually push an agent close to their limit.

The optimization:

+------------------+     +------------------+     +------------------+
|  APPLICATION     |     |  REDIS           |     |  POSTGRESQL      |
|  (LRU Cache)     |     |  (Fast Read)     |     |  (Source of      |
|                  |     |                  |     |   Truth)         |
|  - 5-second TTL  |     |  - Sub-ms reads  |     |  - Atomic writes |
|  - ~60% hit rate |     |  - ~25% hit rate |     |  - FOR UPDATE    |
|  - Zero latency  |     |  - <1ms latency  |     |     locking      |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         v                        v                        v
   "Rajesh has ₹12L     "Rajesh has ₹12L       "Rajesh has exactly
    used of ₹50L --      used of ₹50L --        ₹12,05,100 used --
    clearly within       clearly within          UPDATE with lock"
    limit, fast pass"    limit, fast pass"

How it works:

Application LRU cache (Tier 1): An in-memory cache with a short TTL (5 seconds). If a bet arrives and the cached exposure shows the agent is far from their limit (say, 60% utilized), we do not need to check further. The bet will pass the limit check. This handles the vast majority of bets.
Redis (Tier 2): For bets where the LRU cache has expired or the agent is approaching their limit, read from Redis. Redis values are updated on every write but are eventually consistent. Still very fast (sub-millisecond).
PostgreSQL with FOR UPDATE (Tier 3): For bets where the agent is near their limit (say, >80% utilized), we take a pessimistic lock in PostgreSQL. This ensures that two simultaneous bets cannot both claim the last ₹50,000 of capacity. This is the slowest path but only applies to a small percentage of bets.

Why this matters: During an IPL match, Rajesh might receive 50 bets per minute. At the hit rates shown in the diagram, the LRU cache can immediately confirm that about 30 of those are within limits, Redis answers roughly 12 more, and only the remaining 8 or so (the ones near his limit) need the PostgreSQL lock. This keeps median latency low while guaranteeing correctness at the boundary.
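
The tier choice can be sketched as a pure function. The 60% and 80% cutoffs mirror the "say, 60%" and "say, >80%" figures above and are illustrative, not tuned values; `CachedExposure`, `chooseTier`, and the constant names are hypothetical.

```typescript
type ExposureTier = "LRU_CACHE" | "REDIS" | "POSTGRES_LOCK";

interface CachedExposure {
  used: number;  // retained liability, rupees
  limit: number; // configured limit, rupees
  ageMs: number; // time since the cache entry was written
}

const LRU_TTL_MS = 5000;    // 5-second TTL from the diagram
const FAST_PASS_MAX = 0.6;  // illustrative "far from limit" cutoff
const LOCK_THRESHOLD = 0.8; // illustrative "near limit" cutoff

// Picks the cheapest tier that can still answer the limit check safely.
// The bet's liability is added first so the check reflects the post-bet position.
function chooseTier(
  cached: CachedExposure | undefined,
  redisUsed: number,
  limit: number,
  betLiability: number
): ExposureTier {
  if (
    cached !== undefined &&
    cached.ageMs < LRU_TTL_MS &&
    (cached.used + betLiability) / cached.limit <= FAST_PASS_MAX
  ) {
    return "LRU_CACHE";
  }
  if ((redisUsed + betLiability) / limit <= LOCK_THRESHOLD) {
    return "REDIS";
  }
  return "POSTGRES_LOCK";
}
```

Only the limit check is tiered here; the ledger write itself still goes through the atomic transaction described earlier.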

Settlement Impact

When a match settles, the exposure associated with that match is removed from all agents in the chain:

Retained positions: The settled amount is removed from retained_open_liability. If the agent won (punter lost), the agent profits. If the agent lost (punter won), the agent pays out.
Forwarded positions: The settled amount is removed from forwarded_open_liability. The upline agent handles payout for forwarded positions.
Potential win: The settled amount is removed from open_potential_win.

Ledger Updates for a Single Bet: 3-Level Cascade

Amit bets ₹10,000 on MI at odds 1.85. Liability per unit of stake: ₹0.85.

BEFORE THE BET:
================================================================
                  Retained     Forwarded    Potential
Agent             Liability    Liability    Win
----------------------------------------------------------------
Rajesh            ₹12,00,000   ₹8,00,000   ₹20,00,000
Vikram            ₹25,00,000   ₹10,00,000  ₹35,00,000
Platform          ₹5,00,000    ₹2,00,000   ₹7,00,000
================================================================

BET PROCESSING:
================================================================
Rajesh retains 60% = ₹6,000 stake → ₹5,100 liability
Rajesh forwards 40% = ₹4,000 stake → ₹3,400 liability

Vikram retains 60% of ₹4,000 = ₹2,400 stake → ₹2,040 liability
Vikram forwards 40% of ₹4,000 = ₹1,600 stake → ₹1,360 liability

Platform retains 50% of ₹1,600 = ₹800 stake → ₹680 liability
Platform hedges 50% of ₹1,600 = ₹800 stake → ₹680 forwarded
================================================================

AFTER THE BET:
================================================================
                  Retained          Forwarded         Potential
Agent             Liability         Liability         Win
----------------------------------------------------------------
Rajesh            ₹12,05,100       ₹8,03,400        ₹20,08,500
                  (+₹5,100)        (+₹3,400)        (+₹8,500)

Vikram            ₹25,02,040       ₹10,01,360       ₹35,03,400
                  (+₹2,040)        (+₹1,360)        (+₹3,400)

Platform          ₹5,00,680        ₹2,00,680        ₹7,01,360
                  (+₹680)          (+₹680)           (+₹1,360)
================================================================
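
The arithmetic in the tables above can be reproduced with a short sketch. It assumes whole-percent retention at each level and ignores limits and overflow entirely; odds are passed in hundredths (1.85 becomes 185) so every intermediate value stays an exact integer. `CascadeLevel`, `CascadeSplit`, `liabilityOf`, and `cascade` are hypothetical names.

```typescript
interface CascadeLevel {
  agent: string;
  retainPct: number; // whole percent of the incoming stake kept at this level
}

interface CascadeSplit {
  agent: string;
  retainedStake: number;
  retainedLiability: number;
  forwardedStake: number;
  forwardedLiability: number;
}

// Liability on a back bet: stake * (odds - 1), with odds in hundredths.
function liabilityOf(stake: number, oddsHundredths: number): number {
  return Math.round((stake * (oddsHundredths - 100)) / 100);
}

// Walks the hierarchy bottom-up; each level splits only what reached it.
function cascade(stake: number, oddsHundredths: number, levels: CascadeLevel[]): CascadeSplit[] {
  const splits: CascadeSplit[] = [];
  let incoming = stake;
  for (const level of levels) {
    const retainedStake = Math.round((incoming * level.retainPct) / 100);
    const forwardedStake = incoming - retainedStake;
    splits.push({
      agent: level.agent,
      retainedStake: retainedStake,
      retainedLiability: liabilityOf(retainedStake, oddsHundredths),
      forwardedStake: forwardedStake,
      forwardedLiability: liabilityOf(forwardedStake, oddsHundredths),
    });
    incoming = forwardedStake; // the next level only sees what was forwarded
  }
  return splits;
}
```

Calling `cascade(10000, 185, [{ agent: "Rajesh", retainPct: 60 }, { agent: "Vikram", retainPct: 60 }, { agent: "Platform", retainPct: 50 }])` yields the retained and forwarded liabilities shown in the BET PROCESSING block; adding each level's two liabilities gives its potential-win delta.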

11. Audit & Determinism

The Audit Record

Every bet produces a complete audit record. This is not a free-text log line to be grepped later. It is a structured, queryable record that captures every decision the system made.

An audit record contains:

Field	Description	Example
bet_id	Unique identifier for the bet	`bet_a1b2c3d4`
original_stake	What the punter requested	₹10,000
adjusted_stake	What was actually accepted (after stake reduction, if any)	₹10,000
stake_reduction_reason	Why the stake was reduced, if it was	`null` (no reduction)
per_click_win_cap_check	Result of the per-click check	`PASS: ₹8,500 < ₹50,000`
aggregate_win_cap_check	Result of the aggregate check	`PASS: ₹53,500 < ₹2,00,000`
forwarding_chain	Complete routing at each level	See below
matrix_rules_evaluated	Which rules were checked and which won	`R3 matched (specificity 3), R5 evaluated (specificity 2), R8 evaluated (specificity 0)`
limit_checks	Every limit checked at every level	See below
hedge_detection	Whether NO_NEW_RISK was active, whether the bet was a hedge	`NOT_IN_NO_NEW_RISK`
period_context	Which period the bet fell in	`NIGHT (19:00-02:00 IST), Week 7 of 2026`
timestamps	When each step occurred	`matrix_resolve: 2ms, cap_check: 5ms, execution: 12ms, total: 23ms`

The forwarding chain in detail:

Level 1: Rajesh
  - Incoming stake: ₹10,000
  - Forward % source: MATRIX (Rule R3)
  - Forward %: 40%
  - Retained stake: ₹6,000
  - Retained liability: ₹5,100
  - Limit checks:
    - Cricket sport: ₹12,05,100 / ₹50,00,000 (24.1%) PASS
    - MI vs CSK match: ₹1,25,100 / ₹5,00,000 (25.0%) PASS
    - Night period: ₹3,05,100 / ₹10,00,000 (30.5%) PASS
  - Forwarded stake: ₹4,000
  - Overflow: ₹0

Level 2: Vikram
  - Incoming stake: ₹4,000
  - Forward % source: MATRIX (Rule V2)
  - Forward %: 40%
  - Retained stake: ₹2,400
  - Retained liability: ₹2,040
  - Limit checks: [similar detail]
  - Forwarded stake: ₹1,600
  - Overflow: ₹0

Level 3: Platform
  - Incoming stake: ₹1,600
  - Retained: ₹800
  - Hedged on Betfair: ₹800
  - Betfair order ID: bf_xyz789
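
The limit-check lines in the chain above could be produced by a small helper like the one below. This is a sketch only: the `LimitCheck` shape and `checkLimit` name are hypothetical, and treating "used equal to limit" as a pass (after which the scope would enter NO_NEW_RISK) is an assumption.

```typescript
interface LimitCheck {
  scope: string;
  used: number;           // retained liability after the bet, rupees
  limit: number;          // configured limit, rupees
  utilizationPct: string; // one decimal place, as shown in the audit record
  result: "PASS" | "FAIL";
}

// Builds one limit-check entry for the audit record.
function checkLimit(scope: string, used: number, limit: number): LimitCheck {
  return {
    scope: scope,
    used: used,
    limit: limit,
    utilizationPct: ((used / limit) * 100).toFixed(1),
    result: used <= limit ? "PASS" : "FAIL",
  };
}
```

Storing the inputs (`used`, `limit`) alongside the verdict is what lets a later reader verify the decision rather than just trust it.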

Why Determinism Matters

Agents must trust the system. If Rajesh sees a bet routed in a way he does not understand, he will lose confidence and revert to manual processes. The audit trail lets him see exactly why every decision was made.

Disputes must be resolvable. When Rajesh and Vikram disagree about who owes what at settlement time, the system has an indisputable record of exactly how every bet was split.

Regulators may require it. In jurisdictions moving toward regulation, a complete audit trail is a compliance necessity.

Configuration Change Log

All configuration changes are recorded using event sourcing:

When Rajesh changes his forwarding matrix, the old matrix is preserved and the new one is recorded with a timestamp
When an admin changes Amit's win limit, the change is logged with who made it and why
This means you can answer questions like: "What was Rajesh's matrix at 9:47 PM on March 15?" -- by replaying the event log up to that point
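
Answering an "as of" question from an event log amounts to replaying changes up to the requested instant. The sketch below is a simplified illustration in which every change sets one numeric key; the `ConfigChange` shape, the key format, and `configAsOf` are hypothetical.

```typescript
// Hypothetical config-change event: one key set to one value at one time.
interface ConfigChange {
  at: number;        // epoch milliseconds
  changedBy: string;
  key: string;       // e.g. "forwardPct.cricket"
  value: number;
}

// Rebuilds the configuration as it stood at `asOf` by replaying, in time
// order, every change recorded at or before that instant.
function configAsOf(log: ConfigChange[], asOf: number): Record<string, number> {
  const state: Record<string, number> = {};
  const ordered = log.slice().sort((a, b) => a.at - b.at);
  for (const change of ordered) {
    if (change.at > asOf) break;
    state[change.key] = change.value;
  }
  return state;
}
```

Because the log is never overwritten, the same call with the same `asOf` always returns the same configuration, which is the property replay depends on.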

Replay Capability

The ultimate test of determinism: given the state at time T, the same bet must produce the same routing.

This means if a dispute arises about a bet placed three weeks ago, we can:

Load the configuration state as it existed at the time of the bet
Load the exposure state as it existed at the time of the bet
Re-run the routing logic
Produce the exact same result

This is possible because all inputs to the routing decision (matrix, limits, current exposure, user status, market conditions) are captured in the audit record.

12. What the Current Codebase Already Has (and What's Missing)

Based on analysis of the existing Hannibal codebase:

Feature	Current Status	What Exists	What's Missing
B-Book Config	Partial	`bbookConfigService.ts` -- basic B-Book configuration per agent	Multi-dimensional forwarding matrix, per-sport/market granularity, wildcard matching
B-Book State	Partial	`bbookStateService.ts` -- tracks basic B-Book state	Per-scope exposure ledgers (sport, market, period), NO_NEW_RISK mode tracking
Filter Engine	Partial	`filterEngine.ts` -- filters bets through B-Book rules	5-dimension matrix resolution, specificity-based tie-breaking, precedence chain
B-Book Fill	Partial	`bbookFillService.ts` -- executes B-Book bet placement	Cascading multi-level routing, overflow handling, atomic multi-agent ledger updates
B-Book Settlement	Partial	`bbookSettlementService.ts` -- settles B-Book positions	Multi-level settlement cascade, exposure ledger rollback, period-aware settlement
Sharp Detection	Partial	`sharpUserService.ts` -- identifies sharp users	CLV calculation, behavioral scoring, integration with forwarding matrix source_type
Agent Hierarchy	Exists	`agents.ts`, `agent.ts` routes -- agent CRUD and hierarchy	Cascading routing through hierarchy, parent suspension skip logic
Agent Accounting	Exists	`agentSettlementService.ts`, `agentSettlementJob.ts` -- agent financial settlements	Per-scope liability tracking, real-time exposure counters
Agent Monitoring	Exists	`agentMonitoringService.ts` -- agent activity monitoring	Real-time limit utilization, NO_NEW_RISK alerts, period boundary tracking
User Win Limits	Missing	--	Per-click win cap, aggregate win cap, stake reduction engine
Forwarding Matrix	Missing	--	Full 5D matrix data model, wildcard resolution, matrix CRUD API
Per-Agent Limits	Missing	--	Sport/market/period limit configuration, limit enforcement in bet flow
Cascading Routing	Missing	--	N-level cascade engine, overflow calculation, suspended agent skip
NO_NEW_RISK Mode	Missing	--	Automatic trigger, hedge detection, scope-aware activation
Hedge Detection	Missing	--	Worst-case liability comparison, multi-outcome hedge evaluation
Period Management	Missing	--	Night/weekly period definitions, timezone handling, carry-forward logic
Audit Trail	Missing	--	Structured audit records, event sourcing, replay capability
Redis Exposure Cache	Missing	--	3-tier caching, fast-path optimization, cache invalidation

The Key Observation

The existing codebase has the foundation right. The B-Book service exists with config, state, filtering, fill, and settlement modules. The agent hierarchy exists with accounting and monitoring. Sharp detection exists.

What is missing is the connective tissue -- the forwarding matrix that ties bet characteristics to routing decisions, the cascade engine that flows bets through the hierarchy, and the limit enforcement that keeps agents safe. These are the components that transform the existing point-to-point B-Book into a hierarchical risk management system.

13. The Nightmare Scenarios & How We Handle Them

Scenario 1: Syndicate Attack -- Correlated Positions Across Agents

What happens: A betting syndicate places coordinated bets through multiple agents. Each individual agent sees a modest position, but the aggregate platform exposure is enormous. The MI vs CSK match settles and the platform owes ₹2 crore across 15 agents.

How we handle it:

Cross-agent position aggregation. The platform maintains a real-time view of aggregate exposure per event, summing across all agents. If aggregate exposure on any single outcome exceeds a configurable threshold, an alert fires.
Correlated account detection. Users who consistently bet the same outcome, at the same time, across different agents are flagged. Signals include: matching IP addresses, similar device fingerprints, correlated timing patterns, and identical stake amounts.
Platform-level event limits. Independent of individual agent limits, the platform sets a maximum total exposure per event. When this is breached, additional retained positions are blocked platform-wide, and new bets are forwarded to Betfair.
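
Cross-agent aggregation can be sketched as a group-and-sum over retained positions. The row shape, the `"event|outcome"` key format, and the `breachedOutcomes` name are assumptions made for illustration.

```typescript
interface AgentExposure {
  agentId: string;
  eventId: string;
  outcome: string;
  retainedLiability: number; // rupees
}

// Sums retained liability per event/outcome across every agent and returns
// the "event|outcome" keys whose platform-wide total exceeds the threshold.
function breachedOutcomes(rows: AgentExposure[], threshold: number): string[] {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    const key = row.eventId + "|" + row.outcome;
    totals[key] = (totals[key] || 0) + row.retainedLiability;
  }
  return Object.keys(totals).filter((key) => (totals[key] || 0) > threshold);
}
```

No single agent in a syndicate scenario needs to look unusual for this check to fire, which is the point of aggregating at the platform level.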

Scenario 2: Data Feed Failure During Live Play

What happens: The odds feed from the data provider (Roanuz, OddsPAPI) goes stale during a live IPL match. The system is showing odds of 1.85 for MI, but in reality MI just lost 3 quick wickets and the true odds are now 3.50. Smart punters back CSK at its stale, now too-generous price.

How we handle it:

Stale price detection. If the odds feed has not updated for more than a configurable duration (e.g., 5 seconds for in-play), the market is automatically suspended. No new bets are accepted until the feed resumes.
Price movement circuit breaker. If the price moves by more than a configurable percentage in a single update (suggesting a missed intermediate update), the market is suspended pending human review.
Multi-source validation. Where available, cross-reference prices from multiple providers. A price that is significantly different from all other sources is likely stale.
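
The first two checks can be sketched together. The 5-second age limit comes from the text; the 25% jump limit is an illustrative placeholder for the "configurable percentage", and `PriceTick`, `MarketAction`, and `marketAction` are hypothetical names.

```typescript
interface PriceTick {
  odds: number;       // decimal odds
  receivedAt: number; // epoch milliseconds
}

type MarketAction = "ACCEPT" | "SUSPEND_STALE" | "SUSPEND_JUMP";

// Staleness is checked against the current time; the jump check compares
// the two most recent ticks. Both thresholds are meant to be configurable.
function marketAction(
  previous: PriceTick,
  latest: PriceTick,
  now: number,
  maxAgeMs: number = 5000,
  maxJumpFraction: number = 0.25
): MarketAction {
  if (now - latest.receivedAt > maxAgeMs) return "SUSPEND_STALE";
  const jump = Math.abs(latest.odds - previous.odds) / previous.odds;
  if (jump > maxJumpFraction) return "SUSPEND_JUMP";
  return "ACCEPT";
}
```

In the scenario above, a tick moving MI from 1.85 to 3.50 is a jump of roughly 89% and would suspend the market pending review.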

Scenario 3: Rogue Agent Dumping Toxic Flow

What happens: An agent intentionally forwards 100% of sharp/winning flow to their upline or the platform while retaining the losing flow. Over time, the platform notices that the bets forwarded by this agent consistently lose money for whoever receives them.

How we handle it:

Behavioral anomaly detection. Track the P&L (profit and loss) of retained vs forwarded bets per agent. If an agent's forwarded bets consistently lose money for the upline while their retained bets consistently make money for the agent, this is a red flag.
Forwarding pattern analysis. An agent who suddenly changes their matrix to forward more when they suspect a punter will win is detectable. Configuration changes that correlate with bet outcomes are flagged.
Automatic escalation. Agents whose forwarded flow exceeds a toxicity threshold are escalated for review. The platform can force a minimum retention percentage.
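
A minimal sketch of the retained-versus-forwarded comparison. The `SettledSlice` shape, the function names, and the default sample size of 200 are all assumptions; a real toxicity threshold would need statistical care that this example does not attempt.

```typescript
interface SettledSlice {
  kind: "RETAINED" | "FORWARDED";
  bookPnl: number; // P&L for whoever carried the risk; negative = the punter won
}

// Sums P&L separately for what the agent kept and what it passed upline.
function pnlByKind(slices: SettledSlice[]): { retained: number; forwarded: number } {
  let retained = 0;
  let forwarded = 0;
  for (const s of slices) {
    if (s.kind === "RETAINED") retained += s.bookPnl;
    else forwarded += s.bookPnl;
  }
  return { retained: retained, forwarded: forwarded };
}

// Red flag: the agent profits on retained flow while the upline loses on
// forwarded flow, over a sample large enough to rule out a short streak.
function looksLikeDumping(slices: SettledSlice[], minSlices: number = 200): boolean {
  const totals = pnlByKind(slices);
  return slices.length >= minSlices && totals.retained > 0 && totals.forwarded < 0;
}
```

A flag from a check like this would feed the escalation step rather than trigger any automatic penalty.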

Scenario 4: Double Settlement / Result Correction

What happens: A cricket match is initially settled as "MI wins," all payouts are processed, and then the result is corrected (perhaps due to a scoring error or a ruling by match officials). All settlements must be reversed and re-processed.

How we handle it:

Re-settlement workflow. The system supports reversing a settlement and re-applying it with corrected results. This affects all agents in the cascade.
Ledger reconciliation. After re-settlement, all exposure ledgers are recalculated. Any agent who was paid out incorrectly has the amount clawed back. Any agent who paid out incorrectly receives a credit.
Communication chain. All affected agents receive automated notifications explaining the re-settlement, with full audit trails showing the before and after.

Scenario 5: System Outage During Peak

What happens: During the IPL final, the system experiences a partial outage. The bet processing service is down for 90 seconds while 10,000 bets are queued up.

How we handle it:

Circuit breaker pattern. When the system detects degraded performance (response times exceeding 500ms), it switches to a degraded mode where all bets are forwarded 100% to Betfair. No agent retains any risk during the outage. This is the safest possible default.
Fail-safe to 100% forwarding. If the routing engine cannot determine the correct split (because the forwarding matrix service is down), the bet is still accepted but forwarded entirely. The agent misses out on retention (lost profit opportunity) but is not exposed to unmanaged risk.
Queue and replay. Bets that arrive during the outage are queued. When the system recovers, they are replayed through the normal routing engine. If the bet was already forwarded 100%, a reconciliation process adjusts the positions retroactively.
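
The fail-safe default can be sketched as a wrapper around the matrix lookup. Here `lookup` stands in for a call to the forwarding-matrix service and is hypothetical, as is the `resolveForwardPct` name; the 500ms degraded threshold comes from the text.

```typescript
// Returns the forward percentage for a bet. `lookup` may throw when the
// forwarding-matrix service is unavailable.
function resolveForwardPct(
  lookup: () => number,
  responseTimeMs: number,
  degradedAboveMs: number = 500
): number {
  if (responseTimeMs > degradedAboveMs) return 100; // circuit breaker open
  try {
    const pct = lookup();
    // Treat out-of-range answers like failures rather than trusting them.
    return pct >= 0 && pct <= 100 ? pct : 100;
  } catch (err) {
    return 100; // fail safe: the agent retains nothing
  }
}
```

Every failure path returns 100, so an outage can cost an agent retention profit but cannot leave them holding unmanaged risk.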

14. The Agent Experience -- From Simple to Sophisticated

The Core Insight

Agents are not engineers. Some are seasoned operators who think in matrices and percentages. Others are street-level bookies who have never opened a spreadsheet. But they all share the same two goals: make more money and do not get wiped out overnight.

The system underneath is the same for everyone -- the forwarding matrix, cascading routing, exposure ledgers, hedge detection. What changes is how much of that complexity the agent sees. This is called progressive disclosure: show the simple version by default, reveal the complexity only when the agent asks for it.

The Three Tiers of Experience

┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│   TIER 1: "SET AND FORGET"           For: New agents, small          │
│   ─────────────────────────           operators, non-technical        │
│   3 questions at setup.                                              │
│   Traffic light dashboard.            80% of agents live here.       │
│   WhatsApp/SMS alerts.                                               │
│                                                                      │
│   ┌──────────────────────────────────────────────────────────────┐   │
│   │                                                              │   │
│   │   TIER 2: "DASHBOARD DRIVER"     For: Experienced agents,    │   │
│   │   ──────────────────────────     mid-size operations          │   │
│   │   Real-time risk dashboard.                                  │   │
│   │   Per-sport limits. Per-user      15% of agents grow into    │   │
│   │   management. One-click hedging.  this.                      │   │
│   │                                                              │   │
│   │   ┌──────────────────────────────────────────────────────┐   │   │
│   │   │                                                      │   │   │
│   │   │   TIER 3: "MATRIX MASTER"    For: Sophisticated       │   │   │
│   │   │   ────────────────────────   operators, quant-minded  │   │   │
│   │   │   Full 5D matrix editor.                              │   │   │
│   │   │   Test bet simulator.        5% of agents. These are  │   │   │
│   │   │   Historical P&L analysis.   your power users.        │   │   │
│   │   │   Custom period configs.                              │   │   │
│   │   │                                                      │   │   │
│   │   └──────────────────────────────────────────────────────┘   │   │
│   │                                                              │   │
│   └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

The critical design rule: Every tier uses the exact same engine underneath. A Tier 1 agent's "I want to be safe on cricket" translates to the same forwarding matrix, limits, and cascade logic that a Tier 3 agent configures by hand. The difference is who builds the configuration -- the agent or the system.

Tier 1: "Set and Forget" -- The 3-Question Onboarding

When a new agent joins, they answer three questions. That is it. The system generates their entire configuration from these answers.

Question 1: What do you trade?

┌──────────────────────────────────────────────────┐
│  What sports do you want to accept bets on?      │
│                                                  │
│  [✓] Cricket                                     │
│  [ ] Football                                    │
│  [ ] Tennis                                      │
│  [ ] Kabaddi                                     │
│  [ ] Other                                       │
│                                                  │
│  Sports you don't select will be automatically   │
│  forwarded 100% to your upline.                  │
└──────────────────────────────────────────────────┘

Question 2: What is your nightly budget?

This is the question that matters most. It is framed not as a "liability limit" but as a question any bookie understands:

┌──────────────────────────────────────────────────┐
│  What is the MOST you are willing to lose in     │
│  a single night?                                 │
│                                                  │
│  Think of your worst night ever. What amount     │
│  would you be okay waking up to?                 │
│                                                  │
│  ₹ [___________]                                 │
│                                                  │
│  Examples:                                       │
│  Small operation:    ₹2,00,000  (₹2 lakh)       │
│  Medium operation:   ₹10,00,000 (₹10 lakh)      │
│  Large operation:    ₹50,00,000 (₹50 lakh)      │
└──────────────────────────────────────────────────┘

Question 3: How aggressive do you want to be?

┌──────────────────────────────────────────────────┐
│  How much risk do you want to keep?              │
│                                                  │
│  SAFE         BALANCED        AGGRESSIVE         │
│  ◉────────────────────────────────────○          │
│                                                  │
│  SAFE:       Keep 30%, forward 70%               │
│              "I sleep well, smaller profits"      │
│                                                  │
│  BALANCED:   Keep 60%, forward 40%               │
│              "Good mix of profit and safety"      │
│                                                  │
│  AGGRESSIVE: Keep 85%, forward 15%               │
│              "Maximum profit, I can handle        │
│               the swings"                        │
└──────────────────────────────────────────────────┘

What happens behind the scenes: From these three answers, the system generates:

Agent Answer	System Generates
"Cricket only"	Forwarding matrix: Cricket = agent's retention %, all other sports = 100% forward
"₹10 lakh max loss"	Night limit: ₹10L, Weekly limit: ₹50L (5x night), Per-match limit: ₹2L (night/5)
"Balanced" slider	Default forwarding: 40%. In-play: 55% forward (more cautious). Fancy markets: 70% forward. Sharp users: 95% forward. Pre-match match odds: 30% forward
(Automatic)	User win limits: Per-click ₹50,000, Aggregate ₹2,00,000/day. These are safe defaults.
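
The mapping from answers to configuration can be sketched as below. It covers only the limits and the default forward percentage; the per-market adjustments in the "Balanced" row are given for one slider position only, so they are left out rather than guessed. All type and function names are hypothetical.

```typescript
type RiskProfile = "SAFE" | "BALANCED" | "AGGRESSIVE";

interface OnboardingAnswers {
  sports: string[];     // Question 1
  nightBudget: number;  // Question 2, rupees
  profile: RiskProfile; // Question 3
}

interface GeneratedConfig {
  nightLimit: number;
  weeklyLimit: number;
  perMatchLimit: number;
  forwardPctBySport: Record<string, number>;
  forwardPctOtherSports: number;
}

// Default forward % per slider position, taken from the onboarding screen.
const DEFAULT_FORWARD_PCT: Record<RiskProfile, number> = {
  SAFE: 70,
  BALANCED: 40,
  AGGRESSIVE: 15,
};

function generateConfig(answers: OnboardingAnswers): GeneratedConfig {
  const forwardPctBySport: Record<string, number> = {};
  for (const sport of answers.sports) {
    forwardPctBySport[sport] = DEFAULT_FORWARD_PCT[answers.profile];
  }
  return {
    nightLimit: answers.nightBudget,
    weeklyLimit: answers.nightBudget * 5,   // 5x night
    perMatchLimit: answers.nightBudget / 5, // night / 5
    forwardPctBySport: forwardPctBySport,
    forwardPctOtherSports: 100,             // unselected sports go fully upline
  };
}
```

A "cricket only, ₹10 lakh, Balanced" agent would come out with a ₹50L weekly limit, a ₹2L per-match limit, and 40% default forwarding on cricket.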

The agent never sees the words "forwarding matrix," "liability limit," or "exposure ledger." They answered three questions and the system is configured.

The "Sleep Well" Number

Every agent, regardless of tier, sees one number prominently on their home screen:

┌──────────────────────────────────────────────────┐
│                                                  │
│            YOUR MAXIMUM LOSS TONIGHT             │
│                                                  │
│                   ₹ 3,42,000                     │
│                                                  │
│        out of your ₹10,00,000 night budget       │
│                                                  │
│          ████████░░░░░░░░░░░░  34%               │
│                                                  │
│   If every live bet goes against you tonight,    │
│   this is the absolute most you will lose.       │
│   The system guarantees this.                    │
│                                                  │
└──────────────────────────────────────────────────┘

How it is calculated: Sum of worst-case liability across all retained open positions for this agent. This is not an estimate -- it is a mathematical guarantee. The cascade routing, limits, and NO_NEW_RISK mode ensure that this number can never exceed the agent's configured budget.

Why this matters: An agent who sees "₹3.42 lakh out of ₹10 lakh" knows they are safe. They can watch the match, enjoy the action, and not worry. When this number approaches their budget, the system automatically protects them (NO_NEW_RISK kicks in, overflow cascades to upline). The agent does not need to do anything.

For the really simple agent: This one number, delivered via WhatsApp at 10 PM every night, might be all they ever need:

"Tonight's update: Your maximum possible loss is ₹3.42L (34% of your ₹10L budget). 127 bets placed. Everything running smoothly."

Tier 1 Dashboard: The Traffic Light View

For agents who do not want numbers and charts, a traffic light is enough:

┌──────────────────────────────────────────────────┐
│                                                  │
│  TONIGHT'S STATUS                                │
│                                                  │
│  Cricket     🟢  All good. Well within limits.   │
│  Football    ⚪  Not active tonight.              │
│  Tennis      ⚪  Forwarding 100% (your choice).   │
│                                                  │
│  OVERALL     🟢  Safe. ₹3.4L / ₹10L used.       │
│                                                  │
│  LAST HOUR                                       │
│  42 bets accepted                                │
│  Estimated profit so far: +₹18,000               │
│                                                  │
│  ⚠ 1 alert: Rahul's stake was reduced            │
│     (he's close to his win limit)                │
│                                                  │
│  [View Details]     [Panic: Stop Everything]      │
│                                                  │
└──────────────────────────────────────────────────┘

The traffic light meanings:

Color	Meaning	Agent Action Required
🟢 Green	Below 60% of all limits	None. Relax.
🟡 Yellow	Between 60-85% of any limit	Be aware. System is still accepting bets but approaching limits.
🔴 Red	Above 85% of any limit, or NO_NEW_RISK is active	System is protecting you. New risk bets are being forwarded. Hedges still accepted.
⚪ Grey	Sport not active or fully forwarded	Nothing happening here.
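The table above can be read as a small decision function. The sketch below assumes utilization is available as a fraction per limit; the `SportState` shape is hypothetical, and the boundary handling (exactly 60% and exactly 85% both count as yellow) follows the "between 60-85%" wording.

```typescript
type LightColor = "green" | "yellow" | "red" | "grey";

// Hypothetical per-sport snapshot (field names are assumptions).
interface SportState {
  active: boolean;             // false when the sport has no activity tonight
  fullyForwarded: boolean;     // true when the agent forwards 100%
  noNewRisk: boolean;          // NO_NEW_RISK currently active
  limitUtilizations: number[]; // usage of each limit as a fraction, 0.76 = 76%
}

function trafficLight(s: SportState): LightColor {
  // Grey: nothing is happening for this sport.
  if (!s.active || s.fullyForwarded) return "grey";
  // The worst (highest) utilization across all limits decides the color.
  const worst = Math.max(0, ...s.limitUtilizations);
  if (s.noNewRisk || worst > 0.85) return "red";
  if (worst >= 0.6) return "yellow";
  return "green";
}
```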

The key design insight: A Tier 1 agent never needs to leave this screen. The system runs itself. The traffic light tells them if they should worry. The "Sleep Well" number tells them their maximum downside. The panic button is there if they ever feel nervous.

Tier 2 Dashboard: The Risk Cockpit

When an agent is ready for more detail, they tap "View Details" and enter the full dashboard. This is for agents who want to actively manage their book during a match.

Real-Time Risk Dashboard

+============================================================================+
|  RAJESH'S DASHBOARD                                    Feb 11, 2026  9:47 PM|
+============================================================================+
|                                                                             |
|  EXPOSURE SUMMARY                                                           |
|  ┌─────────────────────────────────────────────────────────────────────┐   |
|  │  Cricket    ██████████████████████░░░░░░░░  ₹38.2L / ₹50L  (76%)  │   |
|  │  Football   ████░░░░░░░░░░░░░░░░░░░░░░░░░  ₹4.1L / ₹20L   (21%)  │   |
|  │  Tennis     ██░░░░░░░░░░░░░░░░░░░░░░░░░░░  ₹1.2L / ₹10L   (12%)  │   |
|  └─────────────────────────────────────────────────────────────────────┘   |
|                                                                             |
|  TONIGHT'S SESSION (7:00 PM - 2:00 AM IST)                                |
|  ┌─────────────────────────────────────────────────────────────────────┐   |
|  │  Night Limit  █████████████████████████░░░░  ₹8.4L / ₹10L  (84%)  │   |
|  │  ⚠ APPROACHING LIMIT - 16% remaining                               │   |
|  │  At current rate, limit reached in ~25 minutes                      │   |
|  └─────────────────────────────────────────────────────────────────────┘   |
|                                                                             |
|  TOP MATCHES BY EXPOSURE                                                    |
|  ┌──────────────────────────────────────────────────────────────────┐      |
|  │  1. MI vs CSK (Live)     ₹4.8L retained    ₹5L limit    (96%)  │      |
|  │     ⚠ NEAR LIMIT - Will enter NO_NEW_RISK at ₹5L               │      |
|  │                                                                  │      |
|  │  2. RCB vs DC (Pre)      ₹2.1L retained    ₹5L limit    (42%)  │      |
|  │                                                                  │      |
|  │  3. KKR vs SRH (Pre)     ₹1.5L retained    ₹5L limit    (30%)  │      |
|  └──────────────────────────────────────────────────────────────────┘      |
|                                                                             |
|  RECENT BETS (last 10 minutes)                                             |
|  ┌──────────────────────────────────────────────────────────────────┐      |
|  │  9:45 PM  Amit     MI to win    ₹10,000  Retained 60%  ✓       │      |
|  │  9:43 PM  Sonia    CSK +1.5     ₹25,000  Retained 40%  ✓       │      |
|  │  9:41 PM  Rahul    Fancy 180+   ₹50,000  Reduced to ₹12,000  ⚠ │      |
|  │  9:38 PM  Deepa    MI to win    ₹8,000   Retained 60%  ✓       │      |
|  └──────────────────────────────────────────────────────────────────┘      |
|                                                                             |
|  [🔴 PANIC: Hedge All]   [Adjust Limits]   [View Full Audit]              |
|                                                                             |
+============================================================================+

Alert Priority Levels

Priority	Delivery	Trigger	Example
P1 - Critical	SMS + Push notification + Dashboard	Limit breached, NO_NEW_RISK activated, suspected fraud	"Your cricket night limit has been reached. NO_NEW_RISK is now active."
P2 - Warning	Push notification + Dashboard	Approaching limit (>80%), unusual betting pattern, sharp user detected	"MI vs CSK match exposure is at 96% of limit."
P3 - Informational	Dashboard only	Period rollover, settlement complete, configuration change	"Night period ended. Carried forward ₹3.2L to day period."

Key Reports

Report	Frequency	What It Shows
Daily P&L	Daily at period end	Profit/loss by sport, market, and user tier. Which bets made money, which lost money.
Weekly Settlement	Weekly on Monday	Net positions with upline, amounts owed/receivable, forwarded vs retained breakdown.
Sharp User Report	Weekly	Users flagged as sharp, their CLV scores, recommended actions.
Matrix Effectiveness	On demand	How well the forwarding matrix performed -- did high-retention bets profit or lose?
Limit Utilization	On demand	How close the agent came to each limit, how often NO_NEW_RISK was triggered, average duration.

The Panic Button

The dashboard includes a "Hedge All" button that, when pressed:

Immediately sets the agent's forwarding to 100% for all sports and markets
Places hedge orders on Betfair for all current retained positions (where exchange markets exist)
Sends a notification to the agent's upline
Logs the action with a timestamp and reason

This is the emergency exit. If an agent sees something alarming -- a suspicious pattern, a sudden exposure spike, or just gets nervous -- one button brings their risk to near zero. They can then calmly assess the situation and adjust.

Tier 3 Dashboard: The Matrix Master

For the 5% of agents who want full control, the system exposes everything -- but only when they explicitly navigate to it. This is never the default view.

The Matrix Editor:

Instead of a raw spreadsheet, the matrix editor uses a guided builder with immediate feedback:

┌──────────────────────────────────────────────────────────────────────┐
│  FORWARDING MATRIX EDITOR                                            │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Rule 7 of 12                                                 │  │
│  │                                                               │  │
│  │  WHEN a bet matches ALL of these:                             │  │
│  │    Sport:       [Cricket    ▼]                                │  │
│  │    Market:      [Fancy      ▼]                                │  │
│  │    Phase:       [In-Play    ▼]                                │  │
│  │    User Type:   [Any        ▼]                                │  │
│  │    Liquidity:   [Any        ▼]                                │  │
│  │                                                               │  │
│  │  THEN:                                                        │  │
│  │    Keep  [30]%  ◉──────────○  Forward  [70]%                  │  │
│  │                                                               │  │
│  │  LAST WEEK: This rule matched 234 bets (₹18.4L volume)       │  │
│  │  RESULT: You would have profited ₹1.2L on retained portion   │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  [+ Add Rule]   [Test a Bet]   [View Conflicts]                     │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
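To illustrate how a rule like the one in the editor could be matched, here is a hedged sketch. It assumes that a dimension left as "Any" is simply omitted from the rule, that the most specific matching rule wins, and that ties go to the earlier rule. The type and function names are hypothetical; the actual resolution logic is defined in the forwarding matrix section.

```typescript
type Dim = "sport" | "market" | "phase" | "userType" | "liquidity";
type BetAttrs = Record<Dim, string>;

interface MatrixRule {
  id: number;
  when: Partial<BetAttrs>; // an omitted dimension means "Any"
  retainPercent: number;
}

const DIMS: Dim[] = ["sport", "market", "phase", "userType", "liquidity"];

// A rule matches when every dimension it specifies equals the bet's value.
function ruleMatches(rule: MatrixRule, bet: BetAttrs): boolean {
  return DIMS.every((d) => rule.when[d] === undefined || rule.when[d] === bet[d]);
}

// Specificity = number of dimensions the rule pins down (0 to 5).
function ruleSpecificity(rule: MatrixRule): number {
  return DIMS.filter((d) => rule.when[d] !== undefined).length;
}

// Pick the most specific matching rule; earlier rules win ties (assumption).
function resolveRule(rules: MatrixRule[], bet: BetAttrs): MatrixRule | undefined {
  let best: MatrixRule | undefined;
  for (const r of rules) {
    if (!ruleMatches(r, bet)) continue;
    if (best === undefined || ruleSpecificity(r) > ruleSpecificity(best)) best = r;
  }
  return best;
}
```

With a Cricket-only rule and the Cricket + Fancy + In-Play rule shown above, an in-play cricket fancy bet resolves to the second rule, whose specificity is 3 out of 5.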

The Test Bet Simulator:

The single most important tool for a Tier 3 agent. Before any real money flows, they can test:

┌──────────────────────────────────────────────────────────────────────┐
│  TEST A BET                                                          │
│                                                                      │
│  Sport:     Cricket       Market:   Fancy (Session Runs)             │
│  Phase:     In-Play       User:     Amit (NORMAL)                    │
│  Odds:      2.50          Stake:    ₹25,000                         │
│                                                                      │
│  [Run Test]                                                          │
│                                                                      │
│  RESULT:                                                             │
│  ─────────────────────────────────────────────────────               │
│  Step 1: User win cap check                                         │
│          Potential win: ₹37,500. Limit: ₹50,000. PASS               │
│                                                                      │
│  Step 2: Forwarding resolution                                       │
│          Rule 7 matched (Cricket + Fancy + In-Play)                  │
│          Specificity: 3/5. No higher-priority rule found.            │
│          Forward: 70%. Retain: 30%.                                  │
│                                                                      │
│  Step 3: Your retention                                              │
│          Retained stake: ₹7,500                                      │
│          Retained liability: ₹11,250                                 │
│          Your cricket limit after: ₹18.7L / ₹50L (37%)             │
│                                                                      │
│  Step 4: Cascade to upline (Vikram)                                  │
│          Forwarded: ₹17,500 → Vikram retains 60% → Platform → BF   │
│                                                                      │
│  This is exactly what would happen with a real bet.                  │
└──────────────────────────────────────────────────────────────────────┘
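The arithmetic behind Steps 1 and 3 of the simulator output can be reproduced in a few lines. This is a sketch for checking the numbers only; the function and field names are assumptions, and it covers a single level rather than the full cascade.

```typescript
interface TestBetResult {
  potentialWin: number;      // stake * (odds - 1)
  passesWinCap: boolean;     // potential win within the user's win limit
  retainedStake: number;     // portion of stake this agent keeps
  retainedLiability: number; // what the agent pays if the retained portion loses
  forwardedStake: number;    // portion passed to the upline
}

function simulateTestBet(
  stake: number,
  decimalOdds: number,
  userWinLimit: number,
  retainPercent: number,
): TestBetResult {
  const potentialWin = stake * (decimalOdds - 1);
  const retainedStake = (stake * retainPercent) / 100;
  return {
    potentialWin,
    passesWinCap: potentialWin <= userWinLimit,
    retainedStake,
    retainedLiability: retainedStake * (decimalOdds - 1),
    forwardedStake: stake - retainedStake,
  };
}

// The test bet from the screen: ₹25,000 at 2.50, ₹50,000 win limit, 30% retained.
const amitTestBet = simulateTestBet(25000, 2.5, 50000, 30);
// potentialWin 37500, retainedStake 7500, retainedLiability 11250, forwardedStake 17500
```

Applying the same split to the forwarded ₹17,500 with Vikram's 60% retention gives ₹10,500 retained by Vikram and ₹7,000 passed on to the platform.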

Graduating Between Tiers

Agents are never locked into a tier. The system nudges them upward when they are ready.

Tier 1 → Tier 2 nudge: After 2 weeks of operation, if the agent's dashboard shows they are consistently hitting limits or forwarding more than they want, the system suggests: "You are forwarding 45% of cricket bets. Want to adjust per-sport settings? [Show me how]"

Tier 2 → Tier 3 nudge: After the agent has manually adjusted limits 5+ times, the system suggests: "You keep changing your cricket in-play settings. A forwarding matrix rule could automate this. [Set up a rule]"

Tier 3 → Tier 1 fallback: If a Tier 3 agent creates a matrix that is clearly dangerous (e.g., 100% retention on in-play fancies), the system warns: "This configuration would have lost ₹8.2L last month. Are you sure? [Keep my settings] [Switch to Balanced preset]"

Preset Profiles: One-Click Configuration

For agents who want more control than 3 questions but less than a full matrix:

Preset	What It Does	Who It Is For
Conservative Cricket	30% retention on match odds, 15% on fancies, 0% on in-play fancies. Night limit = budget x 0.5. Sharp users = 100% forward.	New agents, small bankroll, risk-averse
Balanced Cricket	60% retention on match odds, 30% on fancies, 20% on in-play fancies. Night limit = budget.	Experienced agents, moderate bankroll
Aggressive IPL	80% retention on match odds, 50% on fancies, 30% on in-play fancies. Night limit = budget x 1.5 (with weekly safety net).	Large bankroll, IPL specialists
Football Only	60% retention Premier League, 30% lower leagues, 0% tennis/cricket.	Football-focused agents
Forward Everything	0% retention across all sports. Agent earns commission on volume only.	Agents who want zero risk, commission-only model

Each preset is a fully configured forwarding matrix + limits + user win caps. The agent can select a preset, see exactly what it configures, and customize individual values if they want. The preset is the starting point, not a cage.

WhatsApp and SMS: Meeting Agents Where They Are

Many agents in India and Southeast Asia run their operations primarily through WhatsApp. A dashboard they never open is useless. The system must push critical information to them through channels they already use.

Scheduled Messages:

Time	Message
Start of night session	"Good evening. Your night budget: ₹10L. Current exposure: ₹0. System is ready."
Every 2 hours during session	"Update: ₹3.4L used (34%). 89 bets tonight. Estimated profit: +₹24,000. All green."
When yellow threshold hit	"Heads up: Cricket exposure at 72% of limit. System still accepting bets. No action needed unless you want to adjust."
When a limit is reached (red, NO_NEW_RISK active)	"Your cricket night limit has been reached. System is now forwarding new cricket bets to your upline. Hedge bets still accepted. You are protected."
End of night session	"Night summary: 214 bets. Net result: +₹1,85,000. Forwarded ₹12.4L to Vikram. Peak maximum possible loss was ₹4.1L (41% of budget). Settlement pending."

Interactive Commands (via WhatsApp chatbot):

Agent Types	Response
"status"	Current exposure, limits, traffic light color
"stop cricket"	Sets cricket forwarding to 100%. Confirmation: "Cricket bets now forwarding 100%. You retain zero risk."
"resume cricket"	Restores previous cricket settings.
"panic"	Triggers the hedge-all panic button. "All positions being hedged. Forwarding set to 100%. You are safe."
"limit 15L"	Updates night budget to ₹15L. Confirmation with new max-loss number.
"sharp amit"	Tags user Amit as sharp. "Amit's bets will now be forwarded 95%. Confirm?"

This means an agent sitting in a chai shop watching the match on TV can manage their entire book through WhatsApp without ever opening a dashboard.
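A chatbot front end for these commands could start from a small parser like the sketch below. The `AgentCommand` type is hypothetical, and only the "L" (lakh) suffix shown in the table is handled; how each parsed command is executed is out of scope here.

```typescript
type AgentCommand =
  | { kind: "status" }
  | { kind: "panic" }
  | { kind: "stop"; sport: string }
  | { kind: "resume"; sport: string }
  | { kind: "limit"; rupees: number }
  | { kind: "sharp"; user: string }
  | { kind: "unknown"; text: string };

// Parses "15L" as 15 lakh (1,500,000 rupees) and "250000" as plain rupees.
function parseLakhAmount(token: string): number | undefined {
  const m = /^(\d+(?:\.\d+)?)(l?)$/.exec(token.toLowerCase());
  if (m === null) return undefined;
  const [, digits = "", suffix = ""] = m;
  const value = parseFloat(digits);
  return suffix === "l" ? value * 100000 : value;
}

function parseAgentCommand(text: string): AgentCommand {
  const raw = text.trim();
  const [verb = "", arg = ""] = raw.toLowerCase().split(/\s+/);
  const unknown: AgentCommand = { kind: "unknown", text: raw };
  switch (verb) {
    case "status":
      return { kind: "status" };
    case "panic":
      return { kind: "panic" };
    case "stop":
      return arg ? { kind: "stop", sport: arg } : unknown;
    case "resume":
      return arg ? { kind: "resume", sport: arg } : unknown;
    case "sharp":
      return arg ? { kind: "sharp", user: arg } : unknown;
    case "limit": {
      const rupees = parseLakhAmount(arg);
      return rupees !== undefined && rupees > 0 ? { kind: "limit", rupees } : unknown;
    }
    default:
      return unknown;
  }
}
```

Anything that does not parse cleanly falls through to `unknown`, so a typo never triggers a risk-changing action by accident.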

The Principle: Complexity Is Available, Never Required

The entire UX philosophy can be summarized in one sentence: the system should work perfectly for an agent who never touches a single setting after onboarding, AND it should give full control to an agent who wants to tune every parameter.

The 3-question onboarding generates safe, profitable defaults. The traffic light tells the agent if anything needs attention. The "Sleep Well" number gives them peace of mind. WhatsApp keeps them informed. The full dashboard is there when they want it. The matrix editor is there when they are ready for it.

No agent should ever feel overwhelmed by the system. And no agent should ever feel limited by it.

15. Performance Architecture

Latency Budget

The synchronous bet-processing path must complete within 90 milliseconds, leaving 110ms of headroom before the user experience degrades (roughly 200ms end to end).

Step	Budget	Description
Request parsing & validation	5ms	Parse the incoming bet request, validate format
Metrics computation	3ms	Calculate potential win, liability
User win cap check	5ms	Check per-click and aggregate limits
Stake reduction (if needed)	2ms	Calculate reduced stake
Matrix resolution	10ms	Look up the forwarding percentage
Agent cap evaluation (per level)	10ms x N levels	Check limits at each cascade level (typically 2-3 levels)
Position creation	15ms	Write positions to database
Exposure ledger update	10ms	Update all affected ledgers atomically
Audit record creation	10ms	Persist the audit trail
Response	5ms	Return confirmation to the punter
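Total (3-level cascade)	~95ms	Sum of the rows above with N = 3; a 2-level cascade sums to ~85ms, so only the 2-level case fits the 90ms target as budgeted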
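Taking the per-step budgets in the table at face value, the synchronous total for a cascade of N levels can be recomputed rather than read off a single row. The sketch below does only that sum; the constant and function names are illustrative.

```typescript
// Fixed per-bet steps from the table, in milliseconds.
const FIXED_STEP_BUDGETS_MS: ReadonlyArray<[string, number]> = [
  ["Request parsing & validation", 5],
  ["Metrics computation", 3],
  ["User win cap check", 5],
  ["Stake reduction (if needed)", 2],
  ["Matrix resolution", 10],
  ["Position creation", 15],
  ["Exposure ledger update", 10],
  ["Audit record creation", 10],
  ["Response", 5],
];

// Agent cap evaluation is budgeted per cascade level.
const AGENT_CAP_MS_PER_LEVEL = 10;
const SYNC_PATH_TARGET_MS = 90;

function syncPathBudgetMs(levels: number): number {
  const fixed = FIXED_STEP_BUDGETS_MS.reduce((sum, [, ms]) => sum + ms, 0);
  return fixed + AGENT_CAP_MS_PER_LEVEL * levels;
}

function fitsTarget(levels: number): boolean {
  return syncPathBudgetMs(levels) <= SYNC_PATH_TARGET_MS;
}
```

With these figures a 2-level cascade sums to 85ms and a 3-level cascade to 95ms, so deeper hierarchies need either a tighter per-level check or a step moved off the synchronous path.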

Memory Architecture: 3-Tier

Tier	Technology	TTL	Hit Rate	Use Case
Tier 1	Application LRU Cache	5 seconds (exposure); 5 min (matrix, agent config)	~60%	Exposure far from limits, matrix lookups, agent config
Tier 2	Redis	Until invalidated	~25%	Exposure checks, active period boundaries, NO_NEW_RISK flags
Tier 3	PostgreSQL	Persistent	~15%	Near-limit exposure writes (FOR UPDATE), position creation, audit records

Burst Traffic Handling: IPL Final Scenario

During the IPL final, bet volume can spike to 10,000 bets per minute (about 167 per second). Here is how the system handles it:

Challenge	Solution
Database write contention	Sharded exposure counters -- instead of one row per agent per sport, use N shards. Each bet updates a random shard. Reads sum all shards.
Matrix lookup speed	Pre-computed matrix resolution cache. When an agent's matrix changes, all possible resolution paths are pre-computed and cached. During the match, lookups are O(1) hash lookups.
Audit trail write volume	Batch audit writes. Audit records are buffered in memory and flushed every 500ms. In a crash, up to 500ms of audit records may be lost (positions are never lost because they use the synchronous path).
Redis connection pool	Dedicated Redis connection pool for exposure checks, separate from general-purpose cache.

Sharded Exposure Counters

For agents receiving high bet volume, a single exposure counter row becomes a bottleneck because every bet needs to lock it.

Solution: instead of one counter, use 8 shards:

RAJESH CRICKET EXPOSURE (SHARDED)
===================================
Shard 0:  ₹4,75,000
Shard 1:  ₹5,12,000
Shard 2:  ₹4,88,000
Shard 3:  ₹5,01,000
Shard 4:  ₹4,95,000
Shard 5:  ₹5,23,000
Shard 6:  ₹4,67,000
Shard 7:  ₹4,89,000
-----------------------------------
Total:    ₹39,50,000 / ₹50,00,000

Each incoming bet picks a shard at random and locks only that shard, so expected lock contention drops roughly 8x when bets spread evenly across shards. Reads must sum all shards, which is slightly slower but still sub-millisecond from Redis.
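The idea can be sketched with an in-memory stand-in. In the real system each shard would be a database row or Redis key with its own lock; the class below is only an illustration, and its name and the injectable shard picker are assumptions made so the behavior can be tested deterministically.

```typescript
class ShardedExposureCounter {
  private readonly shards: number[];
  private readonly pickShard: (shardCount: number) => number;

  // By default a shard is chosen uniformly at random for each write.
  constructor(
    shardCount: number = 8,
    pickShard: (shardCount: number) => number = (n) => Math.floor(Math.random() * n),
  ) {
    this.shards = new Array<number>(shardCount).fill(0);
    this.pickShard = pickShard;
  }

  // A write touches exactly one shard, so concurrent bets rarely collide.
  add(amount: number): number {
    const i = this.pickShard(this.shards.length);
    this.shards[i] = (this.shards[i] ?? 0) + amount;
    return i;
  }

  // A read has to visit every shard and sum them.
  total(): number {
    return this.shards.reduce((sum, v) => sum + v, 0);
  }
}
```

The trade-off is visible in the two methods: writes get cheaper because they touch one shard, while limit checks get slightly more expensive because they must sum all of them.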

What's Cached Where and for How Long

Data	Cache Location	TTL	Invalidation
Agent forwarding matrix	Application LRU + Redis	5 min / until change	On matrix update, invalidate immediately
Agent limits configuration	Application LRU + Redis	5 min / until change	On limit update, invalidate immediately
Current exposure (far from limit)	Application LRU	5 seconds	Time-based expiry
Current exposure (near limit)	Not cached	--	Always read from PostgreSQL with lock
User win cap state	Redis	Until period reset	On bet placement (update), on period reset (clear)
NO_NEW_RISK flags	Redis	Until cleared	On limit breach (set), on settlement or limit change (clear)
Period boundaries	Application LRU	1 hour	On config change
Sharp user flags	Redis	1 hour	On detection service update
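Two of the rows above (exposure cached for 5 seconds when far from a limit, never cached when near one) can be sketched as follows. The 80% cut-off for "near limit" is an assumption, since the table does not define it, and the cache shown is a plain TTL map without LRU eviction; all names are hypothetical.

```typescript
// Minimal TTL cache with an injectable clock (no LRU eviction in this sketch).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; storedAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() - hit.storedAt >= this.ttlMs) {
      this.entries.delete(key); // time-based expiry
      return undefined;
    }
    return hit.value;
  }

  // Used for "invalidate immediately" rows such as matrix or limit updates.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

type ExposureReadPath = "app-cache" | "postgres-locked";

// Near a limit, skip every cache and read the locked row from PostgreSQL.
function exposureReadPath(current: number, limit: number, nearLimitFraction = 0.8): ExposureReadPath {
  return current / limit >= nearLimitFraction ? "postgres-locked" : "app-cache";
}
```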

16. Competitive Landscape

Feature	Betfair	bet365	Pinnacle	Asian Books	Hannibal (Target)
Risk Model	Pure exchange (no risk)	B-Book + A-Book hybrid	Sharp-friendly B-Book	Primarily B-Book	Hierarchical B-Book with automated routing
Agent Hierarchy	None (B2C only)	None (B2C only)	None (B2C only)	Manual, phone-based	Automated, N-level, with cascading
Forwarding Logic	N/A	Proprietary, opaque	N/A	Manual negotiation	Configurable multi-dimensional matrix
Limit Management	Market-based liquidity	Per-user, opaque	Minimal (welcomes sharps)	Per-user, manual	Per-agent, per-sport, per-market, per-period
Audit Trail	Exchange provides full transparency	Minimal for agents	Basic	None	Complete, replayable, deterministic
Sharp Handling	Exchange market handles it	Restrict accounts aggressively	Welcome and manage	Restrict and forward	Configurable per-user forwarding
Hedge Options	IS the exchange	Internal + Betfair	Internal models	Betfair + internal	Betfair + multi-exchange (planned)
Target Market	Developed (UK, AU)	Global B2C	Global B2C (niche)	Asia, manual agent networks	Agent networks: India, SE Asia, Africa

What to Learn from Each

Competitor	Lesson for Hannibal
Betfair	The exchange model provides perfect transparency. Hannibal's audit trail should aspire to Betfair-level transparency within its B-Book model.
bet365	Their risk management is world-class but opaque. Agents hate opacity. Hannibal should match their sophistication while providing the transparency agents demand.
Pinnacle	Their sharp-friendly model proves you can profit even from sharp users if you manage margins correctly. Hannibal's forwarding matrix should allow agents to choose their sharp tolerance.
Asian Books	They understand the agent hierarchy model deeply but use manual processes. Hannibal automates what they do by hand.

17. Phased Rollout Plan

Phase 1: Agent Risk Controls (Weeks 1-4)

Goal: Every agent has enforceable limits and real-time exposure tracking.

Week	Deliverable
1-2	Per-agent limit configuration (sport, market, period) + database models
2-3	Real-time exposure tracking with Redis fast-path
3-4	NO_NEW_RISK mode (automatic trigger + manual override)
4	Per-click user win limits + stake reduction

Why this first: Limits and exposure tracking are independent of the forwarding matrix. They provide immediate value by protecting agents from over-exposure. Even without smart routing, agents get safety.

Success metric: Zero incidents where an agent exceeds their configured limit.

Phase 2: Smart Forwarding (Weeks 5-10)

Goal: Bets are routed through the hierarchy based on configurable rules.

Week	Deliverable
5-6	Forwarding matrix data model + basic resolution (market_type + sport_type)
6-7	Precedence chain (user override > market override > matrix > default)
7-8	Cascading upline routing (N-level)
8-9	Aggregate user win limits + period definitions (night/weekly)
9-10	Integration testing, overflow scenarios, suspended agent handling

Why this second: This is the core value proposition. But it depends on the limits and exposure tracking from Phase 1.

Success metric: Bets correctly routed through 3+ levels with deterministic, auditable decisions.

Phase 3: Intelligence & Polish (Weeks 11-16)

Goal: Full 5-dimensional matrix, advanced detection, and complete audit trail.

Week	Deliverable
11-12	Full 5D matrix (add event_phase, source_type, liquidity_band)
12-13	Advanced hedge detection (multi-outcome worst-case analysis)
13-14	Complete structured audit trail with replay capability
14-15	Cross-agent sharp detection and syndicate detection
15-16	Agent dashboard v2 with all reports and the panic button

Why this third: The 5D matrix and advanced detection are refinements. The 2D matrix from Phase 2 handles 80% of cases. Phase 3 handles the remaining 20%.

Success metric: Audit trail passes third-party review. Sharp detection flags known sharp users within 100 bets.

Phase 4: Scale & Optimize (Weeks 17+)

Goal: Handle peak traffic, add intelligence, expand hedge options.

Deliverable	Description
ML-based odds adjustment	Use historical data to adjust odds before they reach the punter
Multi-exchange hedging	Hedge on Betfair, Smarkets, Betdaq, and local exchanges
Mobile dashboard	Full agent dashboard on mobile devices
Sharded exposure counters	Handle 10,000+ bets/minute during IPL final
Auto-matrix optimization	Suggest matrix changes based on historical P&L

18. Revenue Model

Four Revenue Streams

Stream	Description	Example
Transaction Fees	A small percentage of every bet processed through the platform	1-2% of stake on every bet
Retained Risk Profit	The platform retains a portion of bets (at the top of the cascade). On average, the bookie has an edge, so retained risk is profitable over time.	Platform retains ₹800 of a ₹10,000 bet. Over thousands of bets, the edge produces ~5% margin.
Betfair Arbitrage Spread	When hedging on Betfair, the platform can capture a spread between the price offered to the punter and the price available on the exchange.	Punter gets odds of 1.85, Betfair offers 1.90. The 0.05 difference in odds on each hedged rupee is the platform's margin, before exchange commission.
Data Intelligence	Aggregate anonymized betting data has value for odds compilation, market making, and risk modeling.	Subscription service for odds providers and analytics firms.

Financial Modeling Example

Consider a moderately busy day on Hannibal:

DAILY FINANCIAL MODEL
=====================

Total Stake Processed:        ₹5,00,00,000  (₹5 crore)

Revenue Stream Breakdown:
-----------------------------------------------------------------
Transaction Fees (1.5%):      ₹7,50,000
  → All bets, regardless of routing

Platform Retained Risk:       ₹50,00,000 stake retained
  → 5% edge over time:       ₹2,50,000

Betfair Hedge Spread:         ₹1,00,00,000 hedged
  → Average 0.03 spread:     ₹3,00,000

Data Intelligence:            ₹50,000 (amortized daily)
-----------------------------------------------------------------
TOTAL DAILY REVENUE:          ₹13,50,000

ANNUAL PROJECTION:            ₹49+ crore
  (₹13.5L x 365 operating days = ₹49.3 crore, before any growth)
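The model above is simple enough to recompute. The sketch below uses the figures from the block and expresses rates in basis points so the arithmetic stays in whole rupees; the interface and field names are illustrative.

```typescript
interface DailyModel {
  totalStake: number;       // rupees processed per day
  feeBps: number;           // transaction fee, basis points of total stake
  retainedStake: number;    // stake the platform keeps on its own book
  edgeBps: number;          // long-run edge on retained stake, basis points
  hedgedStake: number;      // stake hedged on the exchange
  hedgeSpreadBps: number;   // average spread captured, basis points
  dataIntelligence: number; // rupees per day, amortized
}

function dailyRevenue(m: DailyModel): number {
  const fees = (m.totalStake * m.feeBps) / 10000;
  const retainedProfit = (m.retainedStake * m.edgeBps) / 10000;
  const hedgeSpread = (m.hedgedStake * m.hedgeSpreadBps) / 10000;
  return fees + retainedProfit + hedgeSpread + m.dataIntelligence;
}

const moderatelyBusyDay: DailyModel = {
  totalStake: 50000000,    // ₹5 crore
  feeBps: 150,             // 1.5%
  retainedStake: 5000000,  // ₹50 lakh
  edgeBps: 500,            // 5%
  hedgedStake: 10000000,   // ₹1 crore
  hedgeSpreadBps: 300,     // 0.03 per hedged rupee
  dataIntelligence: 50000,
};

const dailyTotal = dailyRevenue(moderatelyBusyDay); // 1350000 (₹13.5 lakh)
const annualTotal = dailyTotal * 365;               // 492750000 (about ₹49.3 crore)
```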

The Key Insight

Hannibal is an operating system for bookmakers, not a bookmaker itself.

This distinction is critical. A bookmaker takes risk and profits (or loses) from betting outcomes. Hannibal provides the infrastructure that enables agents to take risk efficiently. Like an operating system, it earns from:

Providing the platform (transaction fees)
Running a small retained book at the top of the cascade (retained risk)
Facilitating exchange access (hedge spread)
Generating intelligence from aggregate data

This means Hannibal's revenue is diversified and largely non-directional. A bad day for bookies (punters win big) still generates transaction fees. A good day for bookies generates both fees and retained risk profit. The operating system always earns.

19. Implementation Order (for Developers)

The Guiding Principle

The single most important architectural insight: a bet goes from having one routing destination today to having N destinations. Every implementation step moves toward this goal incrementally.

Step-by-Step Order

Step 1: Data Models First

Add all new Prisma models to the schema without changing any behavior. This is the safest possible first step -- it is purely additive. New tables for: forwarding matrix rules, agent limits, exposure ledgers, period definitions, audit records, and user win caps.

No existing behavior changes. No existing tests break. The database migration is backward-compatible.

Step 2: Audit Trail Second

Implement the audit record creation for every bet, even before forwarding logic changes. This provides immediate value: every bet now has a complete decision record. It also serves as an early warning system when we start changing routing behavior -- we can compare audit trails before and after.

Step 3: User Win Limits Third

Implement per-click win limits and stake reduction. This is independent of the agent hierarchy and forwarding logic. It sits at the very beginning of the bet flow (before routing decisions). It can be tested and deployed in isolation.

Step 4: Forwarding Precedence Chain Fourth

Implement the resolution logic: user override, then market override, then matrix lookup, then agent default. Initially, the matrix will be simple (2 dimensions: market_type and sport_type). The cascade still goes to a single destination, but the forwarding percentage is now determined by the precedence chain instead of a flat configuration.
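A hedged sketch of that resolution order is shown below. The input shape is hypothetical: in practice each value would be looked up from the agent's configuration, and the matrix result would come from the 2D lookup.

```typescript
type ForwardingSource = "user_override" | "market_override" | "matrix" | "agent_default";

// Forward percentages already looked up for this bet; absent means "not configured".
interface ForwardingInputs {
  userOverride?: number;
  marketOverride?: number;
  matrixResult?: number;
  agentDefault: number;
}

interface ForwardingDecision {
  forwardPercent: number;
  source: ForwardingSource; // recorded so the audit trail can explain the decision
}

function resolveForwardPercent(i: ForwardingInputs): ForwardingDecision {
  // Compare against undefined, not truthiness: 0% is a valid override.
  if (i.userOverride !== undefined) return { forwardPercent: i.userOverride, source: "user_override" };
  if (i.marketOverride !== undefined) return { forwardPercent: i.marketOverride, source: "market_override" };
  if (i.matrixResult !== undefined) return { forwardPercent: i.matrixResult, source: "matrix" };
  return { forwardPercent: i.agentDefault, source: "agent_default" };
}
```

Returning the source alongside the percentage keeps the decision deterministic and replayable, which is what the audit trail from Step 2 needs.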

Step 5: Cascading Routing Fifth

This is the big structural change. A bet that previously went to one destination now flows through the full agent hierarchy. Each level resolves its own forwarding percentage, checks its own limits, and forwards the remainder.

This must be implemented behind a feature flag so it can be enabled per agent. Early adopters test it while others continue with the existing behavior.
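The shape of the cascade can be sketched as a loop over the hierarchy. This is an illustration under simplifying assumptions: each level's retention percentage is already resolved, each level has a single liability headroom figure, odds are above 1.0, and whatever is left after the last agent goes to a named backstop. All type and function names are hypothetical.

```typescript
interface CascadeLevel {
  agent: string;
  retainPercent: number;  // resolved by this level's own forwarding rules
  remainingLimit: number; // liability headroom left at this level, in rupees
}

interface CascadeAllocation {
  agent: string;
  retainedStake: number;
}

function cascadeStake(
  stake: number,
  decimalOdds: number,
  chain: CascadeLevel[],
  backstop: string,
): CascadeAllocation[] {
  const allocations: CascadeAllocation[] = [];
  let remaining = stake;
  for (const level of chain) {
    const wanted = (remaining * level.retainPercent) / 100;
    // Cap retention so the retained liability fits inside this level's headroom.
    const maxStakeByLimit = level.remainingLimit / (decimalOdds - 1);
    const retained = Math.max(0, Math.min(wanted, maxStakeByLimit));
    allocations.push({ agent: level.agent, retainedStake: retained });
    remaining -= retained; // the overflow moves up to the next level
  }
  // Whatever no agent retained lands on the backstop (for example the platform).
  allocations.push({ agent: backstop, retainedStake: remaining });
  return allocations;
}
```

For a ₹25,000 bet at 2.50 with Rajesh retaining 30% and Vikram 60%, this yields ₹7,500, ₹10,500, and ₹7,000 for the platform. If Rajesh had only ₹6,000 of liability headroom left, his retention would be capped at ₹4,000 and the difference would overflow to Vikram.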

Step 6: NO_NEW_RISK and Hedge Detection Sixth

Implement automatic NO_NEW_RISK triggering and hedge detection. This depends on the exposure ledgers from Step 5 being accurate, which is why it comes after cascading routing.

Step 7: Period Management Last

Implement night and weekly periods, with timezone handling and carry-forward logic. This is last because it is the most operationally complex feature and depends on all other components working correctly.
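Part of the timezone handling is deciding whether a given instant falls inside a night session that wraps past midnight. The sketch below assumes the 7:00 PM to 2:00 AM IST window used in the dashboard example and relies on IST being a fixed UTC+5:30 offset with no daylight saving; other timezones would need a proper timezone library. Function names are illustrative.

```typescript
const IST_OFFSET_MINUTES = 330; // UTC+5:30, no daylight saving
const MINUTES_PER_DAY = 1440;

// Minute of the day in IST (0 to 1439) for a given instant.
function istMinuteOfDay(instant: Date): number {
  const utcMinutes = instant.getUTCHours() * 60 + instant.getUTCMinutes();
  return (utcMinutes + IST_OFFSET_MINUTES) % MINUTES_PER_DAY;
}

// True when the instant is inside [start, end) in IST; handles windows that
// wrap past midnight, such as 19:00 to 02:00.
function isInNightSession(
  instant: Date,
  startMinute: number = 19 * 60,
  endMinute: number = 2 * 60,
): boolean {
  const m = istMinuteOfDay(instant);
  return startMinute <= endMinute
    ? m >= startMinute && m < endMinute
    : m >= startMinute || m < endMinute;
}
```

A bet placed at 1:30 AM belongs to the session that started the previous evening, so period keys should be derived from the session start rather than the calendar date.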

Feature Flag Strategy

Every major feature is wrapped in a feature flag:

Flag	Controls	Default
`bbook.forwarding_matrix.enabled`	Whether the matrix is used for routing decisions	OFF
`bbook.cascading_routing.enabled`	Whether bets cascade through the hierarchy	OFF
`bbook.user_win_limits.enabled`	Whether per-click and aggregate win limits are enforced	OFF
`bbook.no_new_risk.enabled`	Whether NO_NEW_RISK mode can activate	OFF
`bbook.period_management.enabled`	Whether night/weekly periods are active	OFF
`bbook.audit_trail.enabled`	Whether audit records are created	ON (from Step 2 onward)

Flags can be toggled per agent. This allows:

Gradual rollout to trusted agents first
Quick rollback if issues are discovered
A/B comparison between old and new routing
Production testing with real traffic but limited blast radius
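Per-agent toggling can be sketched as a lookup that prefers an agent-level override and otherwise falls back to the default from the table. The in-memory override map is an assumption for illustration; the document does not specify where flag state is stored.

```typescript
type BBookFlag =
  | "bbook.forwarding_matrix.enabled"
  | "bbook.cascading_routing.enabled"
  | "bbook.user_win_limits.enabled"
  | "bbook.no_new_risk.enabled"
  | "bbook.period_management.enabled"
  | "bbook.audit_trail.enabled";

// Defaults from the table above.
const FLAG_DEFAULTS: Record<BBookFlag, boolean> = {
  "bbook.forwarding_matrix.enabled": false,
  "bbook.cascading_routing.enabled": false,
  "bbook.user_win_limits.enabled": false,
  "bbook.no_new_risk.enabled": false,
  "bbook.period_management.enabled": false,
  "bbook.audit_trail.enabled": true,
};

type AgentOverrides = Map<string, Partial<Record<BBookFlag, boolean>>>;

function isFlagEnabled(flag: BBookFlag, agentId: string, overrides: AgentOverrides): boolean {
  // ?? only falls through on undefined, so an explicit false override is respected.
  return overrides.get(agentId)?.[flag] ?? FLAG_DEFAULTS[flag];
}
```

Rolling back for one agent then means deleting or flipping a single override entry, without touching the defaults that everyone else runs on.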

The Migration Path

TODAY                     PHASE 1                  PHASE 2                  PHASE 3
=====                     =======                  =======                  =======

Bet → Single Agent        Bet → Single Agent       Bet → Agent L1           Bet → Agent L1
  (flat B-Book %)           (with limits)            → Agent L2              → Agent L2
  (no limits)               (win caps)               → Platform              → Platform
  (no audit)                (audit trail)             → Betfair               → Betfair
                                                     (2D matrix)             (5D matrix)
                                                     (basic cascade)         (hedge detection)
                                                                             (NO_NEW_RISK)
                                                                             (periods)

20. The Bookie's Final Verdict

The Spec Gets the Math Right, but Misses Operational Reality

The B-Book system as designed is mathematically sound. The forwarding matrix, cascading routing, and exposure accounting are all correct. But a system that is correct on paper and a system that survives contact with real bookies operating during a live IPL match are two different things.

Here is what operational reality demands:

The Five Things That Will Make or Break the System

1. Speed of configuration changes.

During a live match, a bookie needs to change their matrix in seconds, not minutes. If MI loses 3 wickets in an over and the bookie wants to reduce retention from 40% to 10%, the system must allow this change to take effect on the very next bet. A configuration change that requires a page reload, a cache flush, or a 30-second propagation delay is unacceptable.

Design response: Matrix changes take effect immediately. The system invalidates all caches for the affected agent synchronously. The next bet uses the new matrix.

2. Visibility into what is happening RIGHT NOW.

Bookies do not look at reports after the fact. They look at dashboards during the match. They need to see, in real time: current exposure by match, current exposure by outcome, how close they are to each limit, which users are winning, which users are losing, and what bets are coming in right now.

Design response: The real-time dashboard (Section 14) is not a nice-to-have; it is the product. The B-Book engine is invisible infrastructure. The dashboard is what agents interact with.

3. The ability to override everything.

No matter how good the matrix is, there will be moments when the bookie wants to override it. "I have inside information that this match is suspicious -- forward everything." "This user is my cousin -- let his bet through even though it exceeds the cap." The system must support manual overrides at every level without breaking the audit trail.

Design response: User overrides and market overrides sit above the matrix in the precedence chain. Every override is logged. The system accommodates human judgment while ensuring accountability.

4. Settlement speed and accuracy.

The bookie's trust in the system is earned at settlement time. If settlements are delayed, incorrect, or confusing, the agent will abandon the platform. Settlement must happen within minutes of a match ending, and the numbers must match exactly what the agent expects based on what they saw on their dashboard.

Design response: Settlement cascades through the hierarchy using the same audit records that were created at bet time. The agent can verify every settled bet against the audit trail. Discrepancies are impossible because the settlement engine uses the same source of truth as the bet engine.

5. Graceful degradation, not catastrophic failure.

When something goes wrong -- and it will, during the biggest matches at the worst possible time -- the system must degrade gracefully. A Redis outage should not reject bets; the system should fall back to PostgreSQL. A Betfair outage should not block bet placement; the portion that could not be hedged should be absorbed as retained risk. A matrix configuration error should not route bets to the void; the system should fall back to the agent default.

Design response: Every component has a fallback. The cascade has a backstop. The system is designed to always accept the bet and always route it somewhere safe, even if that somewhere is not optimal.

What Would Make Every Bookie Want This System

The ultimate test is simple: does this system make more money for the bookie while requiring less manual work?

If Rajesh can configure his risk appetite once, trust the system to enforce it, see his position in real time, sleep through the night knowing his limits protect him, settle with Vikram cleanly every Monday, and identify his sharp users before they drain his bankroll -- then this system wins.

The B-Book is not a technical achievement to be admired. It is a tool to be used. Its success will be measured not in latency percentiles or audit trail completeness, but in how many agents adopt it, how much volume flows through it, and how few disputes arise from it.

Build the dashboard first. Make the math invisible. Let the bookie focus on what they do best: understanding their market and their punters. Let the system handle everything else.

Part II: Gap Analysis Solutions -- The Immune System

The following sections address critical gaps identified during expert review by a veteran B-Book architect (20+ years, 4 B-Book systems built) and a senior financial systems engineer. These are the "immune system" of the B-Book -- the mechanisms that handle failures, prevent exploits, and ensure the system degrades gracefully when things go wrong.

As the reviewer noted: "The forwarding matrix is the brain. What is missing is the immune system. Build the safety systems before you build the intelligence systems."

21. Bet Cancellation / Void / Partial Settlement State Machine

Why This Matters

In real bookmaking, bets do not always travel the happy path from placement to settlement. Matches get abandoned. Rain interrupts play after 10 overs. A corruption ruling voids specific markets. An admin discovers a data feed error and needs to void bets placed during a 30-second window. A punter calls within 5 seconds asking to cancel.

Every one of these scenarios must be handled without breaking the exposure ledgers, without double-counting, and without leaving orphaned positions anywhere in the agent hierarchy.

The Complete Bet State Machine

State Definitions

State	What It Means	Exposure Impact	Reversible?
BET_PLACED	Bet accepted, positions being created. Transient state (< 100ms).	Ledgers not yet updated	Yes -- system error during creation rolls back
ACTIVE	All positions created, all exposure ledgers updated. The bet is live.	Fully reflected in all agent ledgers	No -- can only move forward to SETTLED, VOIDED, etc.
SETTLED	Event result known, P&L calculated, payouts determined.	Exposure removed from ledgers, P&L applied	Can move to RE_SETTLED if the result is corrected
VOIDED	Entire bet nullified. All stakes returned. As if the bet never happened.	All exposure atomically removed from every agent in the chain	No -- void is final
PARTIALLY_VOIDED	Some markets/legs within the bet are voided, others settled normally.	Voided portion removed, settled portion resolved normally	No
CANCELLED	Punter-initiated cancellation within the allowed window. Functionally identical to void.	All exposure removed	No
CASH_OUT_SETTLED	Punter took early settlement via cash-out.	Original position closed, counter-position created and settled	No
RE_SETTLED	A previously settled bet has been re-settled due to result correction.	Previous P&L reversed, new P&L applied	Can be re-settled again if needed
REJECTED	Bet failed validation (invalid market, suspended event, etc.). Never reached ACTIVE.	Zero -- no positions were created	No

Who Can Initiate Each State Transition

Transition	Initiated By	Authorization Required	Time Window
ACTIVE -> SETTLED	System (automatic)	Event result feed	After event concludes
ACTIVE -> VOIDED	Platform admin	ADMIN or SUPER_ADMIN role	Any time before settlement
ACTIVE -> VOIDED	System (automatic)	Abandoned event rule triggers	When event is officially abandoned
ACTIVE -> PARTIALLY_VOIDED	Platform admin or system	Same as void	When specific markets are voided
BET_PLACED -> CANCELLED	Punter	Punter's own bet only	Within cancellation window (configurable, typically 3-5 seconds for pre-match, 0 seconds for in-play)
BET_PLACED -> CANCELLED	Agent (Rajesh)	Agent can cancel bets of their own punters	Within 60 seconds (configurable per agent)
SETTLED -> RE_SETTLED	Platform admin	SUPER_ADMIN role only	Within 72 hours of original settlement

Key rule: Agents cannot void bets. Only the platform can void. Agents can cancel within a short window. This prevents an agent from voiding a bet after seeing that it lost (which would be a form of fraud against their upline).
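
The transition table and key rule above can be sketched as a lookup of permitted moves. This is a minimal illustration, not the engine's actual code: the `can_transition` helper and the initiator labels (`SYSTEM`, `PUNTER`, `AGENT`) are assumptions made for this example, and the time-window checks from the table are left out.

```python
from enum import Enum


class BetState(Enum):
    BET_PLACED = "BET_PLACED"
    ACTIVE = "ACTIVE"
    SETTLED = "SETTLED"
    VOIDED = "VOIDED"
    PARTIALLY_VOIDED = "PARTIALLY_VOIDED"
    CANCELLED = "CANCELLED"
    RE_SETTLED = "RE_SETTLED"


# (from_state, to_state) -> initiators allowed to make that move.
# Labels are illustrative; the transition table is the source of truth.
ALLOWED = {
    (BetState.ACTIVE, BetState.SETTLED): {"SYSTEM"},
    (BetState.ACTIVE, BetState.VOIDED): {"ADMIN", "SUPER_ADMIN", "SYSTEM"},
    (BetState.ACTIVE, BetState.PARTIALLY_VOIDED): {"ADMIN", "SUPER_ADMIN", "SYSTEM"},
    (BetState.BET_PLACED, BetState.CANCELLED): {"PUNTER", "AGENT"},
    (BetState.SETTLED, BetState.RE_SETTLED): {"SUPER_ADMIN"},
}


def can_transition(current, target, initiator):
    """True only if the table lists this move for this initiator."""
    return initiator in ALLOWED.get((current, target), set())
```

Because agents never appear under any `VOIDED` entry, an agent-initiated void is rejected by construction rather than by a special case.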

What Triggers Each Void Type

Trigger	Void Type	Scope	Example
Match abandoned (weather, floodlight failure)	FULL_VOID	All markets on that event	IPL match abandoned after rain, no result possible
Match abandoned after partial completion	PARTIAL_VOID	Completed markets settle, incomplete markets void	IPL match abandoned after 10 overs -- completed over markets settle, match odds void
Corruption/match-fixing ruling	FULL_VOID	All markets on that event	ICC declares match result void due to fixing investigation
Data feed error	SELECTIVE_VOID	Bets placed during the error window	Odds feed showed 1.05 instead of 10.5 for 30 seconds; bets during that window are voided
Punter cancellation	CANCELLATION	Single bet	Amit taps "Cancel" within 3 seconds of placing a pre-match bet
Admin decision	ADMIN_VOID	Any scope (single bet, all bets on a market, all bets on an event)	Admin discovers a technical glitch and voids affected bets

How Voids Cascade Through the Agent Hierarchy

This is the critical design challenge. When Amit's bet was placed, it was split across Rajesh (60%), Vikram (24%), and the Platform (16%). A void must reverse every single one of those positions atomically.

The key principle: the void operation reads the original audit record to determine exactly what to reverse. It does not recalculate anything. It uses the recorded split from bet placement time. This ensures that even if Rajesh changed his matrix since then, the void reverses exactly what was originally done.

Idempotent Void Operations

Every void operation is assigned a unique void_operation_id. Before executing, the system checks whether this void_operation_id has already been applied.

VOID IDEMPOTENCY CHECK
========================

1. Admin requests: void bet_a1b2c3d4, reason: MATCH_ABANDONED
2. System generates: void_op_id = void_bet_a1b2c3d4_MATCH_ABANDONED_v1
3. System checks: SELECT * FROM void_operations WHERE void_op_id = ?
4a. If NOT found:  Execute void, record void_op_id with result
4b. If found:      Return the recorded result, do NOT execute again

This means:
- Pressing "Void" twice does NOT double-decrement exposure
- A network retry after timeout does NOT create a second void
- A batch void that partially fails can be safely retried

The void_operations table stores:

Column	Type	Purpose
void_op_id	TEXT (PK)	Idempotency key
bet_id	TEXT	Which bet was voided
void_type	ENUM	FULL_VOID, PARTIAL_VOID, SELECTIVE_VOID, CANCELLATION, ADMIN_VOID
reason	TEXT	Human-readable reason
initiated_by	TEXT	User ID of who initiated it
positions_reversed	JSONB	Snapshot of every position that was reversed
ledger_adjustments	JSONB	Snapshot of every ledger decrement
executed_at	TIMESTAMP	When the void was applied
idempotent_hit_count	INT	How many times this void was re-requested after first execution
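
One caveat with a check-then-execute sequence is that two concurrent requests could both find the key missing. A common way to close that gap is to let the primary key on void_op_id arbitrate. The sketch below shows the idea with Python's built-in sqlite3 module standing in for PostgreSQL; the `apply_void` function and the `reverse_positions` callback are hypothetical names, and only three of the table's columns are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE void_operations ("
    " void_op_id TEXT PRIMARY KEY,"
    " bet_id TEXT NOT NULL,"
    " idempotent_hit_count INTEGER NOT NULL DEFAULT 0)"
)


def apply_void(conn, void_op_id, bet_id, reverse_positions):
    """Run reverse_positions() at most once per void_op_id."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "INSERT INTO void_operations (void_op_id, bet_id) VALUES (?, ?)",
                (void_op_id, bet_id),
            )
            reverse_positions()
        return "EXECUTED"
    except sqlite3.IntegrityError:
        # The key already exists: count the repeat request, do not re-execute.
        with conn:
            conn.execute(
                "UPDATE void_operations"
                " SET idempotent_hit_count = idempotent_hit_count + 1"
                " WHERE void_op_id = ?",
                (void_op_id,),
            )
        return "ALREADY_APPLIED"


reversals = []
op_id = "void_bet_a1b2c3d4_MATCH_ABANDONED_v1"
first = apply_void(conn, op_id, "bet_a1b2c3d4", lambda: reversals.append("bet_a1b2c3d4"))
second = apply_void(conn, op_id, "bet_a1b2c3d4", lambda: reversals.append("bet_a1b2c3d4"))
```

Here the second request returns without touching the ledgers, and the repeat is recorded in idempotent_hit_count.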

How Exposure Ledgers Are Atomically Decremented

The void executes as a single database transaction with FOR UPDATE locks on all affected exposure ledger rows. The transaction includes:

BEGIN TRANSACTION;

-- Lock all affected ledger rows in a deterministic order
-- (agent_id, then scope, ascending) to prevent deadlocks

SELECT * FROM exposure_ledgers
WHERE (agent_id = 'rajesh' AND scope = 'cricket_sport')
   OR (agent_id = 'rajesh' AND scope = 'mi_vs_csk_match')
   OR (agent_id = 'rajesh' AND scope = 'night_period')
   OR (agent_id = 'vikram' AND scope = 'cricket_sport')
   -- ... (all affected scopes for all agents)
ORDER BY agent_id, scope
FOR UPDATE;

-- Decrement each ledger by the exact amount from the audit record
-- Insert void_operation record
-- Update bet status to VOIDED

COMMIT;

-- After commit: invalidate Redis and application LRU entries for the affected agents

After the database transaction commits, Redis and application LRU caches are invalidated for all affected agents. The invalidation order does not matter because the caches are read-through -- a cache miss simply reads the correct value from PostgreSQL.

NO_NEW_RISK Re-evaluation After Voids

When a void reduces an agent's exposure, the system must check whether the agent should exit NO_NEW_RISK mode:

AFTER VOID:
1. Read Rajesh's current retained_open_liability for each scope
2. Compare against each limit
3. If retained_open_liability < limit for ALL scopes:
   → Clear NO_NEW_RISK flag in Redis
   → Agent can accept new risk bets again
4. If still at or over the limit for any scope:
   → Keep NO_NEW_RISK active for that scope

This check happens inside the same transaction as the void. The NO_NEW_RISK flag in Redis is updated immediately after the transaction commits.
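
The re-evaluation steps above reduce to a per-scope comparison. A minimal sketch, assuming ledgers and limits are available as plain dictionaries keyed by scope (the function name and the figures in the usage lines are illustrative only):

```python
def scopes_still_blocked(liability_by_scope, limit_by_scope):
    """Scopes whose retained open liability has not dropped below the limit."""
    return sorted(
        scope
        for scope, liability in liability_by_scope.items()
        if liability >= limit_by_scope[scope]
    )


limits = {"cricket_sport": 500_000, "night_period": 200_000}
after_void = {"cricket_sport": 400_000, "night_period": 150_000}
blocked = scopes_still_blocked(after_void, limits)  # empty list: flag can be cleared
```

An empty result corresponds to step 3 (clear NO_NEW_RISK); a non-empty result lists the scopes for which the flag stays active.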

Walk-Through: IPL Match Abandoned After 10 Overs

Scenario: MI vs CSK, IPL 2026. Rain stops play after 10 overs. The match is officially abandoned -- no result. Here is what happens:

The markets on this match:

Market	Status at Abandonment	Action
Match Odds (MI win / CSK win / Draw)	Incomplete -- no result determined	VOIDED -- stakes returned
First Innings Total Runs	Incomplete -- only 10 overs bowled of 20	VOIDED -- stakes returned
Over 1 Runs (6.5 over/under)	Completed -- over 1 finished with 8 runs	SETTLED -- over 6.5 wins
Over 2 Runs (7.5 over/under)	Completed -- over 2 finished with 6 runs	SETTLED -- under 7.5 wins
... Overs 3-10 ...	Completed	SETTLED normally
Over 11 Runs	Not started	VOIDED -- stakes returned
Top Batsman	Incomplete	VOIDED -- stakes returned

How this processes:

Step 1: Event marked as ABANDONED by the feed or admin.

The system receives the event status update: event_status = ABANDONED, overs_completed = 10.

Step 2: Market-level settlement rules kick in.

Each market has a settlement rule defined at creation time:

Market Type	Abandonment Rule
Match Odds	Void if no result
Over X Runs	Settle if over X is completed, void if not
Top Batsman	Void unless one innings fully completed
First Innings Total	Void unless first innings fully completed
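
The abandonment rules in this table can be expressed as a small decision function. This is a sketch under stated assumptions: the market type labels, the `abandonment_action` name, and the `result_declared` / `first_innings_complete` flags are invented for illustration, and the Top Batsman rule ("one innings fully completed") is simplified to the first innings.

```python
def abandonment_action(market_type, overs_completed, over_number=None,
                       first_innings_complete=False, result_declared=False):
    """Return 'SETTLE' or 'VOID' for one market on an abandoned match."""
    if market_type == "MATCH_ODDS":
        return "SETTLE" if result_declared else "VOID"
    if market_type == "OVER_RUNS":
        # An over market settles only if that over was bowled in full.
        return "SETTLE" if over_number <= overs_completed else "VOID"
    if market_type in ("TOP_BATSMAN", "FIRST_INNINGS_TOTAL"):
        return "SETTLE" if first_innings_complete else "VOID"
    raise ValueError(f"no abandonment rule for {market_type}")
```

For the scenario above (overs_completed = 10, no result), Over 1 through Over 10 markets settle and everything else voids.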

Step 3: System generates a batch of void and settlement operations.

For this match, there are 847 open bets. The system processes them in a single settlement batch:

312 bets on Match Odds: all VOIDED
85 bets on First Innings Total: all VOIDED
43 bets on Top Batsman: all VOIDED
20 bets on Over 11-20 markets: all VOIDED
387 bets on Over 1-10 markets: all SETTLED with actual results

Step 4: Void cascade for each voided bet.

Take Amit's bet as an example. He bet ₹10,000 on MI to win at 1.85. The original audit record shows:

Rajesh retained: ₹6,000 stake, ₹5,100 liability
Vikram retained: ₹2,400 stake, ₹2,040 liability
Platform retained: ₹800 stake, ₹680 liability
Betfair hedged: ₹800 stake

The void reverses all of these. Amit gets his ₹10,000 back. Rajesh's exposure drops by ₹5,100. Vikram's drops by ₹2,040. The platform's drops by ₹680. The Betfair hedge is cancelled (if unmatched) or counter-traded (if matched).
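
The figures in this example can be checked with a few lines of arithmetic. The sketch below assumes the audit record exposes each retained position as a stake plus a recorded liability (the dictionary layout and function names are hypothetical). The void reads the recorded liabilities; the stake x (odds - 1) formula is used only as a consistency check.

```python
from decimal import Decimal

ODDS = Decimal("1.85")

# Positions exactly as recorded in the audit trail at bet time.
AUDIT_POSITIONS = [
    {"holder": "rajesh", "stake": Decimal("6000"), "liability": Decimal("5100")},
    {"holder": "vikram", "stake": Decimal("2400"), "liability": Decimal("2040")},
    {"holder": "platform", "stake": Decimal("800"), "liability": Decimal("680")},
]


def void_decrements(positions):
    """Ledger decrements for a full void: recorded values, never recomputed."""
    return {p["holder"]: p["liability"] for p in positions}


def recorded_liabilities_consistent(positions, odds):
    """Sanity check only: liability on a backed selection is stake x (odds - 1)."""
    return all(p["liability"] == p["stake"] * (odds - 1) for p in positions)


decrements = void_decrements(AUDIT_POSITIONS)
consistent = recorded_liabilities_consistent(AUDIT_POSITIONS, ODDS)
```

The ₹800 Betfair hedge is not a ledger decrement and is handled separately, as described above.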

Step 5: Settlement cascade for each settled bet.

Sonia bet ₹5,000 on Over 1 Runs Over 6.5 at odds 1.90. Over 1 completed with 8 runs (over 6.5 wins). This bet is settled as a winner. The settlement cascade pays out through the same chain that held the positions.

Step 6: Rajesh sees the result on his dashboard.

MI vs CSK -- MATCH ABANDONED (Rain)
=====================================
Voided markets:    Match Odds, First Innings Total, Top Bat, Overs 11-20
Settled markets:   Over 1-10 Runs

Your positions:
  Voided:    ₹1,82,000 stake returned (34 bets)
  Settled:   +₹24,000 profit (18 winning bets, 14 losing bets)
  Net:       +₹24,000 profit from completed overs

Exposure released: ₹3,45,000 (your cricket limit now has more headroom)

The Partial Void Edge Case

What about a multi-leg (accumulator/parlay) bet where one leg is voided but others settle? The standard industry rule is:

When one leg of a multi-leg bet is voided, that leg is treated as a winner at odds 1.00. The remaining legs settle normally with the voided leg's odds removed from the accumulator calculation.

Example: Amit places a 3-leg accumulator:

Leg 1: MI win at 1.85 (VOIDED -- match abandoned)
Leg 2: RCB win at 2.10 (SETTLED -- RCB won)
Leg 3: KKR win at 1.70 (SETTLED -- KKR lost)

Original combined odds: 1.85 x 2.10 x 1.70 = 6.60
After void adjustment: 1.00 x 2.10 x 1.70 = 3.57

But Leg 3 lost, so the entire accumulator loses. The void did not save the bet.

If all non-voided legs had won, the payout would use the reduced odds (3.57 instead of 6.60). The positions at each agent level are recalculated based on the reduced odds and the partial void is applied to the difference.
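
The void-leg rule is easy to verify in code. A minimal sketch, assuming each leg is given as an (odds, outcome) pair with outcome one of WON, LOST, or VOID; the `settle_accumulator` name and the two-decimal rounding are illustrative choices:

```python
from decimal import Decimal, ROUND_HALF_UP


def settle_accumulator(legs):
    """Return (result, combined_odds) for a multi-leg bet."""
    combined = Decimal("1")
    any_won = False
    for odds, outcome in legs:
        if outcome == "LOST":
            return "LOST", Decimal("0.00")  # one losing leg loses the whole bet
        if outcome == "WON":
            combined *= Decimal(odds)
            any_won = True
        # A VOID leg counts at odds 1.00, so it leaves `combined` unchanged.
    result = "WON" if any_won else "VOID"
    return result, combined.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


amit = settle_accumulator([("1.85", "VOID"), ("2.10", "WON"), ("1.70", "LOST")])
if_kkr_had_won = settle_accumulator([("1.85", "VOID"), ("2.10", "WON"), ("1.70", "WON")])
```

With Amit's legs the result is a loss; had KKR won, the bet would pay at 3.57 rather than 6.60.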

22. MVCC for Forwarding Matrix Changes

The Problem

At 9:47 PM during MI vs CSK, Rajesh changes his forwarding matrix. He reduces his retention on in-play match odds from 40% to 15% because MI just lost 3 quick wickets.

At the exact moment he saves this change, there are 15 bets in various stages of processing. Some are in the matrix resolution step, some are in cap evaluation, some are about to write positions. If half of those bets use the old matrix and half use the new one, the audit trail becomes inconsistent and unexplainable.

The solution is Multi-Version Concurrency Control (MVCC) for the forwarding matrix. Every matrix change creates a new version. Every bet captures which version it used. Old versions are preserved for audit and replay.

How Matrix Versions Work

RAJESH'S MATRIX VERSIONS
==========================

Version 1 (created: Feb 1, 2026 10:00 AM)
  - Initial setup via onboarding wizard
  - 8 rules, "Balanced" profile
  - active_from: 2026-02-01T10:00:00Z
  - active_until: 2026-02-11T21:47:00Z  (set when V2 was created)

Version 2 (created: Feb 11, 2026 9:47 PM)
  - Rajesh reduced in-play match odds retention from 40% to 15%
  - 8 rules, modified Rule R5
  - active_from: 2026-02-11T21:47:00Z
  - active_until: 2026-02-11T22:15:00Z  (set when V3 was created)

Version 3 (created: Feb 11, 2026 10:15 PM)
  - Rajesh restored original settings after MI stabilized
  - 8 rules, Rule R5 back to 40%
  - active_from: 2026-02-11T22:15:00Z
  - active_until: NULL  (current active version)

The Version Data Model

Each matrix version is an immutable snapshot:

Field	Type	Description
version_id	UUID	Unique identifier for this version
agent_id	TEXT	Which agent owns this matrix
version_number	INT	Monotonically increasing sequence per agent
rules	JSONB	The complete set of matrix rules (immutable snapshot)
created_at	TIMESTAMP	When this version was created
created_by	TEXT	Who created it (agent, admin, system)
change_reason	TEXT	Why the change was made (free text or enum)
active_from	TIMESTAMP	When this version became the active version
active_until	TIMESTAMP	When this version was superseded (NULL if current)
checksum	TEXT	SHA-256 of the rules JSONB, for integrity verification

Key design rule: matrix versions are immutable. Once created, a version is never modified. A "change" always creates a new version. The active_until field on the old version is the only field that changes (it gets stamped when superseded).
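
A minimal sketch of the immutable-version idea, assuming rules are JSON-serialisable. The `MatrixVersion` class, the `supersede` helper, and the canonical-JSON encoding used for the checksum are illustrative assumptions; only a subset of the fields above is modelled, and stamping active_until is shown as producing a copy rather than an in-place row update.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from typing import Optional


def rules_checksum(rules):
    """SHA-256 over canonical JSON, so key order cannot change the hash."""
    canonical = json.dumps(rules, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class MatrixVersion:
    agent_id: str
    version_number: int
    rules_json: str
    checksum: str
    active_from: str
    active_until: Optional[str] = None


def supersede(previous, rules, now):
    """Close the previous version and create the next one."""
    closed = replace(previous, active_until=now)
    fresh = MatrixVersion(
        agent_id=previous.agent_id,
        version_number=previous.version_number + 1,
        rules_json=json.dumps(rules, sort_keys=True),
        checksum=rules_checksum(rules),
        active_from=now,
    )
    return closed, fresh


v1 = MatrixVersion("rajesh", 1, "[]", rules_checksum([]), "2026-02-01T10:00:00Z")
closed_v1, v2 = supersede(v1, [{"rule": "R5", "retain_pct": 15}], "2026-02-11T21:47:00Z")
```

Hashing a canonical encoding matters because two semantically identical rule sets with different key order would otherwise produce different checksums.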

How Bets Capture Their Matrix Version

When a bet enters the matrix resolution step, it captures the current active version ID before evaluating any rules:

BET PROCESSING TIMELINE
========================

1. Bet arrives at matrix resolution step
2. Read current_active_version_id for this agent from cache
   → This is an atomic read: either version 1 or version 2, never a mix
3. Load the rules for that specific version
4. Evaluate rules against bet characteristics
5. Record the version_id in the bet's audit trail
6. Continue to cap evaluation and position creation

The version_id is captured ONCE at the start of matrix resolution.
All subsequent steps for this bet use the rules from that version.

How This Interacts With the 3-Tier Cache

Each cache tier must be version-aware. When Rajesh saves a new matrix version:

Cache layer behavior on version change:

Cache Tier	What Happens	Why
Tier 1 (App LRU)	Entry for `rajesh_active_matrix` is immediately invalidated via pub/sub	Next read loads from Redis or DB
Tier 2 (Redis)	`rajesh_active_matrix_version` key updated atomically to new version_id	All app instances see the new version on next read
Tier 3 (PostgreSQL)	New version row inserted, old version's `active_until` set	Source of truth, always correct

The version-awareness rule for caches:

Each cache entry for a matrix stores the version_id alongside the rules. When a cache hit returns a version_id that does not match the current active version (which can happen in the brief window between DB write and cache invalidation), the cache entry is treated as a miss and the current version is loaded from the next tier.

CACHE ENTRY STRUCTURE
======================
Key:   matrix:rajesh:active
Value: {
  version_id: "v2-uuid-here",
  version_number: 2,
  rules: [...],
  cached_at: "2026-02-11T21:47:01Z"
}

On read, the system also checks:
  → Is this version_id still the active version? (via a lightweight Redis lookup)
  → If not, treat as cache miss
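
The version-awareness rule can be sketched with plain dictionaries standing in for the three tiers. This is an illustration only: `load_active_matrix` is a hypothetical helper, the key names follow the examples above, and for brevity the rules are reloaded directly from the source of truth rather than from Redis first.

```python
def load_active_matrix(agent_id, lru, redis_like, db):
    """Return (version_id, rules), trusting the LRU only if its version is still active."""
    lru_key = f"matrix:{agent_id}:active"
    active_id = redis_like.get(f"{agent_id}_active_matrix_version")
    entry = lru.get(lru_key)
    if entry is not None and active_id is not None and entry["version_id"] == active_id:
        return entry["version_id"], entry["rules"]
    # Stale or missing entry: treat as a miss, reload, and refill the LRU.
    row = db[agent_id]
    lru[lru_key] = {"version_id": row["version_id"], "rules": row["rules"]}
    return row["version_id"], row["rules"]


lru = {"matrix:rajesh:active": {"version_id": "v1", "rules": ["old rules"]}}
redis_like = {"rajesh_active_matrix_version": "v2"}
db = {"rajesh": {"version_id": "v2", "rules": ["new rules"]}}
resolved = load_active_matrix("rajesh", lru, redis_like, db)
```

In this run the LRU still holds v1, the lightweight lookup reports v2, and the stale entry is bypassed and replaced.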

Audit Trail Records Which Version Was Used

Every bet's audit record includes:

matrix_resolution:
  agent: rajesh
  matrix_version_id: "v1-uuid-here"
  matrix_version_number: 1
  matrix_checksum: "sha256:abc123..."
  rule_matched: R5
  rule_specificity: 3
  forward_percentage: 60%  (40% retained)
  resolution_timestamp: "2026-02-11T21:46:58Z"  (2 seconds BEFORE matrix change)

This means that during a dispute, the system can show: "This bet used matrix version 1, which was active from Feb 1 to Feb 11 at 9:47 PM. The rule that matched was R5 with 40% retention. Matrix version 2 (15% retention) was created 2 seconds later and did not affect this bet."

Garbage Collection of Old Versions

Old matrix versions must be retained for audit and replay purposes. The garbage collection policy is:

Age of Version	Retention Policy
< 90 days	Full retention. All rules, all metadata.
90 days - 1 year	Compressed retention. Rules stored as compressed JSONB. Metadata retained.
> 1 year	Archive to cold storage (S3/equivalent). Only metadata in DB. Rules retrievable on demand.
> 3 years	Delete unless referenced by an unresolved dispute.

Versions that are referenced by any active (unsettled) bet are NEVER garbage collected, regardless of age. The reference count is maintained via a simple foreign key from the bet audit record to the matrix version.

Walk-Through: Rajesh Changes Matrix Mid-IPL-Match, 15 Bets In-Flight

Setup: MI vs CSK, 9:47 PM. MI just lost their 3rd wicket in 2 overs. Rajesh panics and changes his in-play match odds retention from 40% to 15%.

At the moment of the change, 15 bets are in various stages:

Bets	Stage at Time of Change	Outcome
Bets 1-5	Already past matrix resolution, in cap evaluation or position creation	These captured matrix version 1 (40% retention). They complete with 40% retention.
Bets 6-10	In the request queue, not yet started processing	These will read the new active version (version 2, 15% retention) when they reach matrix resolution.
Bets 11-15	Currently in matrix resolution step	These are the interesting ones.

For bets 11-15: The matrix version read is atomic. Each bet reads the current_active_version_id once. If the read happens before the Redis key is updated, they get version 1. If after, they get version 2. There is no "half old, half new" state.

Timeline:

9:47:00.000  Rajesh clicks "Save" on new matrix
9:47:00.005  PostgreSQL: new version 2 created, version 1 active_until = NOW
9:47:00.010  Redis: rajesh_active_matrix_version updated to v2
9:47:00.012  App LRU: rajesh cache entry invalidated

Bet 11: matrix resolution at 9:47:00.003 → reads LRU cache → gets version 1 → 40% retention
Bet 12: matrix resolution at 9:47:00.008 → LRU entry still holds version 1, Redis active-version check still returns v1 (Redis update at .010 not yet applied) → gets version 1 → 40% retention
Bet 13: matrix resolution at 9:47:00.011 → LRU entry not yet invalidated, but Redis active-version check returns v2 → entry treated as a miss → gets version 2 → 15% retention
Bet 14: matrix resolution at 9:47:00.015 → LRU miss, reads Redis → gets version 2 → 15% retention
Bet 15: matrix resolution at 9:47:00.020 → LRU hit on the reloaded entry → gets version 2 → 15% retention

Result: Bets 11-12 used version 1 (40% retention). Bets 13-15 used version 2 (15% retention). The transition is clean. Each bet's audit trail records exactly which version was used. Rajesh can see in his dashboard:

Matrix Change Applied at 9:47 PM
=================================
Last bet with old matrix (v1, 40% retention):  9:47:00.008 PM
First bet with new matrix (v2, 15% retention): 9:47:00.011 PM
Transition time: 3 milliseconds

2 bets processed with old matrix during transition
No bets received inconsistent matrix data

23. Dead Letter Queue and Poison Bet Handling

The Problem

In any distributed system, some messages will fail to process. In Hannibal, this means some bets will fail to route through the cascade, fail to create positions, or fail to update exposure ledgers. These failed bets cannot be silently dropped -- they represent real money that a punter believes they have wagered.

A "dead letter" is a bet that has exhausted its retry budget and cannot be processed automatically. A "poison bet" is a specific class of dead letter where the bet is fundamentally unprocessable -- retrying it will never succeed.

The Retry Pipeline

Retry Policy

Retry Stage	Max Retries	Backoff	Total Window
Immediate retry	3	50ms, 200ms, 500ms	~750ms
Short backoff retry	3	2s, 5s, 10s	~17s
Long backoff retry	3	30s, 60s, 120s	~3.5 min
Dead letter (RETRYABLE)	3	5 min, 15 min, 30 min	~50 min
Dead letter (POISON)	0	N/A -- goes directly to manual queue	N/A

Total automatic retry window: approximately 54 minutes from first failure to final dead letter classification. This is deliberate -- most infrastructure issues (Redis restart, database failover, network partition) resolve within this window.
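
The retry table can be turned into a schedule of offsets from the first failure, which also confirms the total window. A small sketch (the stage names and the `retry_offsets` helper are illustrative; the delays are taken from the table):

```python
# Backoff delays per stage, in seconds, as listed in the retry policy table.
RETRY_STAGES = [
    ("immediate", [0.05, 0.2, 0.5]),
    ("short_backoff", [2, 5, 10]),
    ("long_backoff", [30, 60, 120]),
    ("dead_letter_retryable", [300, 900, 1800]),
]


def retry_offsets(stages=RETRY_STAGES):
    """Seconds after the first failure at which each retry fires."""
    offsets, elapsed = [], 0.0
    for _, delays in stages:
        for delay in delays:
            elapsed += delay
            offsets.append(elapsed)
    return offsets


offsets = retry_offsets()
total_minutes = offsets[-1] / 60
```

The twelve retries sum to about 53.8 minutes, in line with the approximate total window quoted above.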

What Constitutes a Poison Bet

A bet is classified as POISON (unprocessable by retry) when:

Condition	Why It Is Poison	Example
Event already settled	The bet is for an event that has already concluded. Placing a position makes no sense.	Bet queued during outage, event settles before replay.
Agent suspended mid-processing	The bet was partially processed when the agent was suspended. The cascade path is now invalid.	Rajesh suspended for payment default while 5 of his bets were in the retry queue.
Market no longer exists	The market was removed or never existed (data error).	Bet references a market_id that was deleted due to a data feed error.
Stake exceeds agent's total possible capacity	Even with zero current exposure, the agent cannot absorb any of this bet (limit is smaller than the bet's minimum position).	Agent's total sport limit is ₹10,000 but the bet requires ₹50,000 minimum position.
Invalid state transition	The bet is in a state that cannot transition to ACTIVE (e.g., already CANCELLED).	Punter cancelled the bet during the retry window.
Duplicate bet_id	A bet with this exact ID already exists in ACTIVE state (the original succeeded but the confirmation was lost, triggering a retry).	Network timeout caused client to retry, first attempt actually succeeded.
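
The poison conditions in this table amount to a short series of checks made before any retry is scheduled. A minimal sketch, assuming the relevant facts have already been gathered into a dictionary; the `poison_reason` function, the context keys, and all reason codes other than EVENT_ALREADY_SETTLED are hypothetical names chosen for illustration.

```python
def poison_reason(ctx):
    """Return a poison reason code, or None if the bet is still worth retrying."""
    if ctx["existing_bet_state"] == "ACTIVE":
        return "DUPLICATE_BET_ID"
    if ctx["bet_state"] == "CANCELLED":
        return "INVALID_STATE_TRANSITION"
    if not ctx["market_exists"]:
        return "MARKET_NOT_FOUND"
    if ctx["event_state"] == "SETTLED":
        return "EVENT_ALREADY_SETTLED"
    if ctx["agent_status"] == "SUSPENDED":
        return "AGENT_SUSPENDED"
    if ctx["min_position"] > ctx["agent_total_limit"]:
        return "EXCEEDS_AGENT_CAPACITY"
    return None


healthy = {
    "existing_bet_state": None,
    "bet_state": "BET_PLACED",
    "market_exists": True,
    "event_state": "IN_PLAY",
    "agent_status": "ACTIVE",
    "min_position": 5_000,
    "agent_total_limit": 10_000,
}
settled_event = dict(healthy, event_state="SETTLED")
```

A bet that returns None stays in the retry pipeline; any other result sends it straight to the manual queue.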

The Dead Letter Queue Data Model

Field	Type	Description
dlq_entry_id	UUID	Unique identifier
bet_id	TEXT	The original bet ID
original_request	JSONB	Complete original bet request (preserved exactly)
failure_reason	TEXT	Why it failed
failure_category	ENUM	POISON, RETRYABLE, UNKNOWN
retry_count	INT	How many retries were attempted
retry_history	JSONB	Timestamps and error messages for each retry
first_failure_at	TIMESTAMP	When the first failure occurred
last_retry_at	TIMESTAMP	When the last retry was attempted
dead_lettered_at	TIMESTAMP	When it was moved to the DLQ
resolution_status	ENUM	PENDING, IN_REVIEW, RESOLVED_VOID, RESOLVED_PROCESSED, RESOLVED_REFUND
resolved_by	TEXT	Who resolved it (admin user ID)
resolved_at	TIMESTAMP	When it was resolved
resolution_notes	TEXT	Free-text notes from the resolver
punter_notified	BOOLEAN	Whether the punter has been told about the issue
punter_notification_sent_at	TIMESTAMP	When the notification was sent

What the Punter Experiences

This is the most delicate part of DLQ design. The punter tapped "Place Bet" and saw a response. What did they see?

Scenario A: Failure during initial processing (before confirmation)

The punter saw: "Bet is being processed..." followed by an error or timeout. They do NOT see "Bet Confirmed." In this case:

The system shows: "Your bet could not be processed. Please try again."
The bet enters the retry pipeline silently
If retries succeed, the punter receives a push notification: "Your bet on MI to win has been confirmed."
If retries fail and the bet is dead-lettered, the punter receives: "Your bet on MI to win could not be placed. No funds were deducted."

Scenario B: Failure during cascade (after partial processing)

This is the dangerous case. The punter's bet was accepted and confirmed (because the initial validation passed), but the cascade failed mid-way. The punter saw "Bet Confirmed." Their account shows the bet as active. But the positions were not fully created across the agent hierarchy.

In this case:

The punter continues to see "Bet Active" -- we do NOT retroactively change their view
The system retries the cascade in the background
If retries succeed, everything is reconciled and the punter never knows there was an issue
If retries fail, the bet enters the DLQ and an admin resolves it

Scenario C: Poison bet (event already settled)

The bet was queued during an outage. By the time the system recovers, the event has already settled. The bet cannot be placed retroactively.

The punter saw "Bet is being processed..." (or possibly "Bet Confirmed" if the initial ack was sent)
Resolution options for the admin:
1. VOID -- return the stake, notify the punter: "Your bet on MI to win was cancelled due to a technical issue. Your stake of ₹10,000 has been refunded."
2. SETTLE AT RESULT -- if the bet would have been placed had the system been working, settle it as if it were placed. This is the punter-friendly option but creates financial exposure that was never accounted for in the ledgers.
3. SETTLE AT VOID -- void the bet but offer the punter a goodwill credit.

Platform policy recommendation: For pre-match bets that failed during a system outage, VOID and refund is the standard. For in-play bets, VOID is the only safe option because the odds may have moved significantly during the outage.

The Manual Resolution Queue Workflow

Admin dashboard for the manual resolution queue:

DEAD LETTER QUEUE -- MANUAL RESOLUTION
=======================================

Pending: 3 entries    |    Oldest: 12 minutes    |    Today resolved: 7

┌───────────────────────────────────────────────────────────────────┐
│ DLQ-001  POISON  HIGH PRIORITY                       12 min ago  │
│                                                                   │
│ Bet: ₹15,000 on MI to win @ 1.85                                │
│ Punter: Amit (under Rajesh)                                      │
│ Reason: Event MI_vs_CSK already SETTLED                          │
│ Punter saw: "Bet is being processed" (no confirmation sent)      │
│ Event result: MI won                                             │
│ If settled: Punter wins ₹12,750 (agent loses)                   │
│ If voided: Punter refunded ₹15,000 (no P&L impact)              │
│                                                                   │
│ [Void & Refund]  [Settle at Result]  [Escalate to Senior Admin]  │
└───────────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────────┐
│ DLQ-002  RETRYABLE  MEDIUM PRIORITY                   8 min ago  │
│                                                                   │
│ Bet: ₹5,000 on RCB to win @ 2.10                                │
│ Punter: Sonia (under Rajesh)                                     │
│ Reason: Database timeout at position creation (retry 6/9)        │
│ Punter saw: "Bet Confirmed" (confirmation was sent)              │
│ Next auto-retry: in 7 minutes                                    │
│                                                                   │
│ [Force Retry Now]  [Void & Refund]  [Wait for Auto-Retry]       │
└───────────────────────────────────────────────────────────────────┘

Reconciliation for Orphaned Bets

An orphaned bet is one where the punter-facing record says "Active" but the agent-side positions were never fully created. The reconciliation process runs every 5 minutes:

ORPHAN DETECTION QUERY
========================
Find all bets WHERE:
  - bet_status = ACTIVE
  - created_at < now - 5 minutes (older than 5 minutes, so normal processing has had time to complete)
  - created_at > now - 60 minutes (anything older is already in DLQ or resolved)
  - position_count < expected_position_count (based on hierarchy depth)

For each orphaned bet:
  1. Check if it is in the retry pipeline → skip (it is being handled)
  2. Check if it is in the DLQ → skip (it is being handled)
  3. Otherwise → add to DLQ as RETRYABLE with note "Detected by reconciliation"
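
The detection logic above can be sketched as a filter over bet records. This is an in-memory illustration rather than the actual query: the `find_orphans` name and the record layout are assumptions, and the sets of bet IDs already in the retry pipeline or DLQ are passed in directly.

```python
from datetime import datetime, timedelta


def find_orphans(bets, now, in_retry, in_dlq):
    """Bet IDs that look orphaned and are not already being handled."""
    oldest = now - timedelta(minutes=60)
    newest = now - timedelta(minutes=5)
    return [
        b["bet_id"]
        for b in bets
        if b["status"] == "ACTIVE"
        and oldest < b["created_at"] < newest
        and b["position_count"] < b["expected_position_count"]
        and b["bet_id"] not in in_retry
        and b["bet_id"] not in in_dlq
    ]


now = datetime(2026, 2, 11, 21, 45)
bets = [
    {"bet_id": "b1", "status": "ACTIVE", "created_at": now - timedelta(minutes=10),
     "position_count": 1, "expected_position_count": 3},
    {"bet_id": "b2", "status": "ACTIVE", "created_at": now - timedelta(minutes=2),
     "position_count": 0, "expected_position_count": 3},
    {"bet_id": "b3", "status": "ACTIVE", "created_at": now - timedelta(minutes=20),
     "position_count": 3, "expected_position_count": 3},
    {"bet_id": "b4", "status": "ACTIVE", "created_at": now - timedelta(minutes=15),
     "position_count": 2, "expected_position_count": 3},
]
orphans = find_orphans(bets, now, in_retry={"b4"}, in_dlq=set())
```

Only b1 qualifies here: b2 is too recent, b3 has all of its positions, and b4 is already in the retry pipeline.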

Walk-Through: Bet Queued During Outage, Event Settles Before Replay

Setup: 9:30 PM, the database connection pool is exhausted during a traffic spike. Bets are being accepted (the API layer is healthy) but position creation is failing. The system is retrying bets with exponential backoff.

9:30:15 PM: Amit places ₹15,000 on MI to win at 1.85. The API accepts the request and returns "Bet is being processed." The bet enters the retry pipeline because the database write fails.

9:30:15 - 9:30:16 PM: Retries 1-3 (immediate, ~750ms in total): fail. Database still overloaded.

9:30:16 - 9:30:33 PM: Retries 4-6 (short backoff, ~17s in total): fail. Database recovering.

9:31:03 PM: Retry 7 (long backoff, 30s): fails. Database still recovering.

9:31:30 PM: The MI vs CSK match ends. MI wins. The settlement service settles all active positions for this event.

9:32:03 PM: Retry 8 (long backoff, 60s) fires. The system attempts to create positions for Amit's bet. But now the event is SETTLED. The position creation logic detects: "Event MI_vs_CSK is in SETTLED state. Cannot create new positions."

9:32:03 PM: The bet is classified as POISON with reason: EVENT_ALREADY_SETTLED. It enters the Dead Letter Queue.

9:32:04 PM: An alert fires on the admin dashboard: "1 poison bet detected. Event settled before bet could be processed."

9:33:00 PM: The on-duty admin reviews the DLQ entry. They see:

The bet was submitted at 9:30:15 PM, before the event ended
The punter never received a "Bet Confirmed" message
The event result: MI won
If they settle it: Amit wins ₹12,750 (which the agents never priced into their exposure)
If they void it: Amit gets ₹15,000 refunded, no P&L impact

9:33:30 PM: The admin chooses "Void & Refund." Amit receives a push notification: "Your bet on MI to win could not be processed due to a technical issue. Your ₹15,000 has been refunded. We apologize for the inconvenience."

Why void is the correct default: The agents in the cascade never had this bet's exposure counted against their limits. Settling it retroactively would create phantom exposure that was never risk-managed. The safe choice is always to void and refund.

24. Settlement Cascade Failure Isolation

The Problem

When an IPL final settles, the system might need to process 4,000 positions across 50 agents. If the database times out at position 2,847, what happens to positions 1-2,846 (already processed) and 2,848-4,000 (not yet processed)? The answer cannot be "start over from scratch" -- that would double-settle the first 2,846 positions. And it cannot be "give up" -- that would leave 1,154 positions (2,847 through 4,000) unsettled.

Per-Position Settlement State Tracking

Every position has its own settlement state, independent of all other positions:

State	Meaning	Duration
PENDING	Event result is known, this position is waiting to be settled	Seconds to minutes (queued)
PROCESSING	A settlement worker has claimed this position and is calculating P&L	Milliseconds (fast)
SETTLED	P&L has been calculated and the exposure ledger has been updated	Seconds (until reconciliation)
FAILED	An error occurred during processing. Will be retried.	Until retry succeeds or exhausts retries
CONFIRMED	Post-settlement reconciliation has verified this position's numbers match	Permanent (terminal)
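
One way to keep these states honest is an explicit transition table that rejects illegal moves, such as re-settling a CONFIRMED position. The sketch below is an assumption about which transitions are legal, inferred from the worker and monitor flows described in this section. `SettlementState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names.

```python
from enum import Enum


class SettlementState(Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    SETTLED = "SETTLED"
    FAILED = "FAILED"
    CONFIRMED = "CONFIRMED"


# Assumed transitions; CONFIRMED is terminal per the table above.
ALLOWED_TRANSITIONS = {
    SettlementState.PENDING: {SettlementState.PROCESSING},
    SettlementState.PROCESSING: {
        SettlementState.SETTLED,
        SettlementState.FAILED,
        SettlementState.PENDING,  # a stuck claim is released back to the queue
    },
    SettlementState.SETTLED: {SettlementState.CONFIRMED},
    SettlementState.FAILED: {SettlementState.PENDING},  # auto-retry
    SettlementState.CONFIRMED: set(),
}


def transition(current: SettlementState, target: SettlementState) -> SettlementState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = transition(SettlementState.PENDING, SettlementState.PROCESSING)
state = transition(state, SettlementState.SETTLED)
state = transition(state, SettlementState.CONFIRMED)
print(state.name)
```

Because SETTLED has no path back to PENDING or PROCESSING, a restarted worker cannot settle the same position twice.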

The Settlement Worker Design

Settlement is processed by independent workers that claim positions in batches:

SETTLEMENT WORKER FLOW
========================

1. Worker polls for positions in PENDING state
   → Claims a batch of up to 100 positions
   → Uses SELECT ... FOR UPDATE SKIP LOCKED
   → This means: lock the rows, but if another worker already locked them, skip and take the next ones

2. For each position in the batch:
   a. Set state to PROCESSING
   b. Calculate P&L based on event result and position odds/stake
   c. Update the agent's exposure ledger (decrement retained_open_liability)
   d. Update the agent's settled P&L ledger
   e. Set state to SETTLED
   f. If any step fails: set state to FAILED, record error, move to next position

3. After the batch:
   → Commit all SETTLED positions
   → FAILED positions remain in FAILED state for retry
   → Worker picks up next batch

The critical design: each position is settled independently. Position 2,847 failing does not block position 2,848. The worker simply records the failure and moves on.
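
The per-position isolation can be sketched in memory as follows. This is an illustration only: the database claim (SELECT ... FOR UPDATE SKIP LOCKED) and the ledger writes are omitted, and `settle_batch`, `agent_pnl_mi_wins`, and the dictionary field names are hypothetical.

```python
from typing import Callable, Dict, List


def settle_batch(batch: List[Dict], settle_one: Callable[[Dict], float]) -> List[Dict]:
    """Settle each position independently; one failure never blocks the rest."""
    for pos in batch:
        pos["state"] = "PROCESSING"
        try:
            pos["pnl"] = settle_one(pos)  # P&L and ledger updates would happen here
            pos["state"] = "SETTLED"
        except Exception as exc:          # isolate the failure to this position
            pos["state"] = "FAILED"
            pos["error"] = str(exc)
            pos["retries"] = pos.get("retries", 0) + 1
    return batch


def agent_pnl_mi_wins(pos: Dict) -> float:
    """Agent-side P&L for a back bet on the winner: the agent pays the winnings."""
    if pos["id"] == 148:
        raise TimeoutError("DB_TIMEOUT")  # simulate a failing ledger update
    return -pos["stake"] * (pos["odds"] - 1)


batch = [{"id": i, "stake": 8000.0, "odds": 1.85, "state": "PENDING"} for i in (147, 148, 149)]
for pos in settle_batch(batch, agent_pnl_mi_wins):
    print(pos["id"], pos["state"], pos.get("error"))
```

Position 148 ends up FAILED with its error recorded, while 147 and 149 settle normally.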

Agent-Level Isolation

Settlement is partitioned by agent. Each agent's positions are settled by a separate worker thread (or, in high-volume scenarios, a separate worker process). This provides fault isolation:

SETTLEMENT PARTITIONING
========================

Event: MI vs CSK (SETTLED: MI wins)
Total positions: 4,000 across 50 agents

Worker 1: Rajesh's 180 positions
Worker 2: Vikram's 420 positions (including forwarded positions)
Worker 3: Priya's 95 positions
Worker 4: Suresh's 310 positions
...
Worker 50: Kwame's 15 positions

Each worker operates independently.
Rajesh's settlement failure does NOT block Vikram's settlement.

Settlement Ordering: Does It Matter?

Within a single agent: Order does not matter. Each position is independent. Settling position 3 before position 1 produces the same final ledger state.

Across agents: Order does not matter for financial accuracy. Rajesh's settlement is independent of Vikram's. The exposure ledgers are per-agent, so there is no cross-agent dependency.

One exception: the platform's Betfair hedge positions. If the platform needs to close hedge positions on Betfair, this should happen AFTER all agent-side positions are settled, because the platform needs to know the final net position before deciding how to close the hedge. Design rule: platform hedge settlement runs after all agent settlements are CONFIRMED (or FAILED-and-escalated).

Settlement Reconciliation

After each settlement batch, a reconciliation check runs:

RECONCILIATION CHECK (per event, per agent)
=============================================

For each agent who held positions on this event:

1. Sum all SETTLED position P&L values
   → This is what we actually settled

2. Compare against the pre-settlement exposure ledger
   → retained_open_liability for this event should now be zero
   → forwarded_open_liability for this event should now be zero

3. Check invariant:
   → Sum(all position stakes) = Original bet stake (for each bet)
   → Sum(retained P&L) + Sum(forwarded P&L) = Total P&L (for each bet)

4. If any invariant fails:
   → Flag for manual review
   → Do NOT proceed to CONFIRMED state
   → Alert: "Settlement reconciliation failed for agent X on event Y"
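
A minimal sketch of these invariant checks for a single bet, assuming exact decimal amounts. The function name, the violation codes, and the dictionary shapes are hypothetical. The figures are the ₹10,000 back bet on MI at 1.85 split 60/24/16, settled with MI winning.

```python
from decimal import Decimal
from typing import Dict, List


def reconcile_bet(original_stake: Decimal, positions: List[Dict],
                  total_pnl: Decimal, open_liability: Dict[str, Decimal]) -> List[str]:
    """Return invariant violations; an empty list means reconciliation passed."""
    problems = []
    if sum(p["stake"] for p in positions) != original_stake:
        problems.append("STAKE_MISMATCH")
    if sum(p["pnl"] for p in positions) != total_pnl:
        problems.append("PNL_MISMATCH")
    # After settlement, nothing should remain open for this event.
    for name, value in open_liability.items():
        if value != 0:
            problems.append(f"{name}_NOT_ZERO")
    return problems


positions = [
    {"agent": "Rajesh", "stake": Decimal("6000"), "pnl": Decimal("-5100")},
    {"agent": "Vikram", "stake": Decimal("2400"), "pnl": Decimal("-2040")},
    {"agent": "Platform", "stake": Decimal("1600"), "pnl": Decimal("-1360")},
]
ledger = {"retained_open_liability": Decimal("0"), "forwarded_open_liability": Decimal("0")}
ok = reconcile_bet(Decimal("10000"), positions, Decimal("-8500"), ledger)
print(ok or "all invariants pass")
```

Any non-empty result would keep the affected positions out of CONFIRMED and trigger the manual-review alert.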

How Partial Failures Are Detected and Resumed

The system runs a settlement monitor job every 30 seconds:

SETTLEMENT MONITOR
===================

Every 30 seconds, check:

1. Positions in PROCESSING state for > 60 seconds
   → The worker that claimed them probably crashed
   → Reset to PENDING (the SELECT FOR UPDATE lock is released on crash anyway)

2. Positions in FAILED state
   → Retry count < 5: reset to PENDING for auto-retry
   → Retry count >= 5: escalate to DLQ for manual resolution

3. Events where some positions are CONFIRMED but others are still PENDING/FAILED
   → This is a partial settlement
   → Generate a report: "Event X: 3,847 of 4,000 positions settled. 153 pending retry."
   → If any positions have been FAILED for > 15 minutes: alert admin
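
The monitor rules above reduce to a small decision function plus a summary line. The sketch below uses the thresholds stated in the rules. The function names and action labels are hypothetical.

```python
from typing import List

STUCK_AFTER_SECONDS = 60         # PROCESSING longer than this: assume worker crashed
MAX_RETRIES = 5                  # at or above this: escalate to the DLQ
ALERT_AFTER_SECONDS = 15 * 60    # FAILED longer than this: alert an admin


def monitor_actions(state: str, seconds_in_state: int, retry_count: int) -> List[str]:
    """Actions the monitor takes for one position on one 30-second pass."""
    actions = []
    if state == "PROCESSING" and seconds_in_state > STUCK_AFTER_SECONDS:
        actions.append("RESET_TO_PENDING")
    elif state == "FAILED":
        actions.append("RESET_TO_PENDING" if retry_count < MAX_RETRIES else "ESCALATE_TO_DLQ")
        if seconds_in_state > ALERT_AFTER_SECONDS:
            actions.append("ALERT_ADMIN")
    return actions


def partial_settlement_report(event: str, states: List[str]) -> str:
    """Summarise an event whose positions are not all settled yet."""
    done = sum(1 for s in states if s in ("SETTLED", "CONFIRMED"))
    total = len(states)
    return f"Event {event}: {done:,} of {total:,} positions settled. {total - done:,} pending retry."


print(monitor_actions("PROCESSING", 75, 0))
print(monitor_actions("FAILED", 10, 5))
print(partial_settlement_report("X", ["CONFIRMED"] * 3847 + ["FAILED"] * 153))
```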

Walk-Through: IPL Final, 4,000 Positions, DB Timeout at Position 2,847

Setup: MI vs CSK IPL final. MI wins. The settlement dispatcher partitions 4,000 positions across 50 agents and launches settlement workers.

Timeline:

10:45:00 PM: Event result received. Settlement dispatcher starts.

10:45:01 PM: Workers launched. Each worker begins processing their batch.

10:45:05 PM: Workers 1-30 complete successfully. 2,200 positions settled.

10:45:08 PM: Worker 31 (processing Suresh's 310 positions) hits a database timeout at position 2,847 (globally numbered). The worker has already settled 147 of Suresh's positions.

Worker 31 Status:
  Suresh's positions: 310 total
  SETTLED: 147 (P&L calculated and committed)
  PROCESSING: 1 (position 148 -- the one that timed out)
  PENDING: 162 (not yet attempted)
  FAILED: 0 (the timeout means position 148 is still PROCESSING)

10:45:08 PM: Worker 31 catches the timeout. It sets position 148 to FAILED with reason "DB_TIMEOUT". It continues to position 149.

10:45:09 PM: Workers 32-50 continue processing other agents' positions. They are unaffected by Suresh's timeout. Vikram's 420 positions settle successfully. Rajesh's 180 positions settle successfully.

10:45:12 PM: Worker 31 finishes Suresh's remaining positions. Result:

Worker 31 Final Status:
  Suresh's positions: 310 total
  SETTLED: 309
  FAILED: 1 (position 148, DB_TIMEOUT)

10:45:15 PM: All other workers complete. Global status:

EVENT SETTLEMENT STATUS: MI vs CSK
====================================
Total positions:     4,000
SETTLED:             3,999
FAILED:              1
CONFIRMED:           0 (reconciliation pending)

Failed position:
  Agent: Suresh
  Position ID: pos_2847
  Bet: ₹8,000 on MI to win @ 1.85
  Failure: DB_TIMEOUT at ledger update
  Retry: scheduled in 30 seconds

10:45:45 PM: The settlement monitor picks up the FAILED position. It resets it to PENDING. A worker claims it and retries. This time, the database is healthy. The position settles successfully.

10:46:00 PM: Reconciliation runs for all agents. All invariants pass. All 4,000 positions move to CONFIRMED.

10:46:01 PM: Agents see settlement results on their dashboards:

RAJESH'S SETTLEMENT: MI vs CSK
================================
Result: MI wins

Your retained positions:
  56 bets backing CSK:    +₹2,34,000 (punters lost)
  24 bets backing MI:     -₹1,18,000 (punters won)
  Net P&L:                +₹1,16,000

Forwarded positions settled with Vikram:
  Forwarded P&L: -₹42,000 (Vikram owes you ₹42,000)

Settlement time: 60 seconds (from event result to confirmed)

25. Cash-Out / Early Settlement Design

What Is Cash-Out?

Cash-out allows a punter to settle their bet early, before the event finishes. The punter locks in a guaranteed profit (or limits a loss) instead of waiting for the final result. From the system's perspective, cash-out is not magic -- it is simply a counter-bet at current odds that closes out the original position.

How Cash-Out Price Is Calculated

The cash-out price is the fair value of the punter's position, minus a margin for the platform. Here is the formula:

CASH-OUT CALCULATION
=====================

Original bet: Back MI to win at odds 1.85, stake ₹10,000
  → Potential win if MI wins: ₹8,500
  → Loss if MI loses: ₹10,000

Current situation: MI now at odds 1.20 (MI dominating)
  → The bet is "in profit" because MI is more likely to win now

Fair value of the position:
  → Amit's original bet pays ₹18,500 total return if MI wins
  → To lock in the same return on either outcome, he would lay MI
    at the current odds of 1.20 for ₹18,500 / 1.20 = ₹15,417
  → If MI wins: ₹18,500 - (₹15,417 * 0.20 lay liability) = ₹15,417
  → If MI loses: he keeps the ₹15,417 lay stake

Cash-out value (fair):
  Cash-out value = Stake * (Original odds / Current odds)
  = ₹10,000 * (1.85 / 1.20)
  = ₹10,000 * 1.5417
  = ₹15,417

  But this is the TOTAL return. The profit portion:
  = ₹15,417 - ₹10,000 = ₹5,417

  This is the fair value. Now apply the cash-out margin (typically 3-5%):
  Cash-out margin: 5%
  Cash-out offer = ₹15,417 * (1 - 0.05) = ₹14,646

  Punter profit if they cash out: ₹14,646 - ₹10,000 = ₹4,646
  (vs. ₹8,500 profit if MI wins and a ₹10,000 loss if MI loses)

The General Cash-Out Formula

For a back bet:

cash_out_return = stake * (original_odds / current_odds) * (1 - margin)
cash_out_profit = cash_out_return - stake

For a losing position (odds moved against the punter):

Original: MI at 1.85, stake ₹10,000
Current: MI at 3.50 (MI struggling)

cash_out_return = 10,000 * (1.85 / 3.50) * (1 - 0.05)
               = 10,000 * 0.5286 * 0.95
               = ₹5,021

Punter gets back ₹5,021 of their ₹10,000 stake.
They accept a ₹4,979 loss now instead of risking the full ₹10,000.
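
The general formula can be written as a short function. This is a sketch: the name `cash_out_return` mirrors the formula above, and rounding to the nearest rupee is an assumption chosen to match the worked figures.

```python
from decimal import Decimal, ROUND_HALF_UP


def cash_out_return(stake: Decimal, original_odds: Decimal,
                    current_odds: Decimal, margin: Decimal) -> Decimal:
    """Cash-out offer for a back bet, rounded to the nearest rupee (assumed)."""
    fair_value = stake * original_odds / current_odds
    offer = fair_value * (1 - margin)
    return offer.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


stake = Decimal("10000")
winning = cash_out_return(stake, Decimal("1.85"), Decimal("1.20"), Decimal("0.05"))
losing = cash_out_return(stake, Decimal("1.85"), Decimal("3.50"), Decimal("0.05"))
print(winning, winning - stake)   # offer and profit when MI has shortened to 1.20
print(losing, losing - stake)     # offer and loss when MI has drifted to 3.50
```

Using `Decimal` rather than floats keeps the rupee amounts exact, which matters once the offer is split across several agents.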

How Cash-Out Routes Through the Cascade

This is the critical design decision. The cash-out counter-bet must route through the SAME proportions as the original bet, not the current matrix.

Why? Because the positions are held at specific agents in specific proportions. If Rajesh retained 60% of the original bet, he holds 60% of the position. The cash-out must close 60% of his position, not whatever the current matrix says.

How Each Agent's Position Changes

BEFORE CASH-OUT (MI at 1.20)
==============================

Agent       Retained Stake    Potential Payout    Net Exposure
-------     --------------    ----------------    ------------
Rajesh      ₹6,000           ₹11,100 (if MI wins)  ₹5,100 liability
Vikram      ₹2,400           ₹4,440 (if MI wins)   ₹2,040 liability
Platform    ₹1,600           ₹2,960 (if MI wins)   ₹1,360 liability

AFTER CASH-OUT
===============

All positions CLOSED. Each agent's P&L is locked in:

Agent       Retained Stake    Cash-out Portion    Agent's P&L
-------     --------------    ----------------    -----------
Rajesh      ₹6,000           Pays ₹8,787.60*     -₹2,787.60
Vikram      ₹2,400           Pays ₹3,515.04*     -₹1,115.04
Platform    ₹1,600           Pays ₹2,343.36*     -₹743.36
                              ────────────────
                              Total: ₹14,646.00

* Each agent pays their share of cash_out_return. Agent P&L = their share of the original stake - their share of cash_out_return.
  Rajesh: pays (₹14,646 * 60%) = ₹8,787.60 against ₹6,000 retained stake = -₹2,787.60 P&L
  (This is a loss for Rajesh because MI is winning and he held the liability side)
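
The split across the cascade can be sketched as below, using the original 60/24/16 proportions rather than the current matrix. `allocate_cash_out` is a hypothetical name, and assigning any rounding remainder to the last party in the list is an assumption. The source does not say who absorbs sub-paisa differences.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

PAISA = Decimal("0.01")


def allocate_cash_out(stake: Decimal, cash_out: Decimal,
                      shares: Dict[str, Decimal]) -> Dict[str, Dict[str, Decimal]]:
    """Split a cash-out across agents using the ORIGINAL routing proportions."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to 1")
    result = {}
    paid_so_far = Decimal("0")
    names = list(shares)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            payout = cash_out - paid_so_far  # last party absorbs rounding (assumed)
        else:
            payout = (cash_out * shares[name]).quantize(PAISA, rounding=ROUND_HALF_UP)
        paid_so_far += payout
        retained = (stake * shares[name]).quantize(PAISA, rounding=ROUND_HALF_UP)
        result[name] = {"payout": payout, "pnl": retained - payout}
    return result


shares = {"Rajesh": Decimal("0.60"), "Vikram": Decimal("0.24"), "Platform": Decimal("0.16")}
allocation = allocate_cash_out(Decimal("10000"), Decimal("14646"), shares)
for agent, row in allocation.items():
    print(agent, row["payout"], row["pnl"])
```

Computing the last payout as a remainder guarantees the parts always sum to exactly the amount credited to the punter.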

Partial Cash-Out

Amit does not have to cash out 100% of his position. He can cash out any percentage.

Example: Amit cashes out 50% of his position

PARTIAL CASH-OUT (50%)
========================

Original: ₹10,000 on MI to win at 1.85
Cash-out: 50% at current odds 1.20

Cash-out portion: ₹5,000 (50% of original stake)
Cash-out return: ₹5,000 * (1.85 / 1.20) * 0.95 = ₹7,323
Cash-out profit: ₹7,323 - ₹5,000 = ₹2,323

Remaining active position: ₹5,000 on MI to win at 1.85
  → This continues normally, settles with the event result

Agent impact (using original 60/24/16 split on the cashed-out portion):
  Rajesh: closes 60% of ₹5,000 = ₹3,000 position
  Vikram: closes 24% of ₹5,000 = ₹1,200 position
  Platform: closes 16% of ₹5,000 = ₹800 position

Each agent's REMAINING open position is also halved:
  Rajesh: ₹3,000 retained stake remaining (was ₹6,000)
  Vikram: ₹1,200 retained stake remaining (was ₹2,400)
  Platform: ₹800 retained stake remaining (was ₹1,600)
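
A partial cash-out scales the same arithmetic by the cashed-out fraction. The sketch below returns the offer, the stake each agent closes, and the stake each agent keeps open. `partial_cash_out` is a hypothetical name and rupee rounding is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Tuple


def partial_cash_out(stake: Decimal, fraction: Decimal, original_odds: Decimal,
                     current_odds: Decimal, margin: Decimal,
                     shares: Dict[str, Decimal]) -> Tuple[Decimal, Dict, Dict]:
    """Return (offer, closed stake per agent, remaining stake per agent)."""
    cashed_stake = stake * fraction
    offer = (cashed_stake * original_odds / current_odds * (1 - margin)).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP)
    closed = {name: cashed_stake * share for name, share in shares.items()}
    remaining = {name: (stake - cashed_stake) * share for name, share in shares.items()}
    return offer, closed, remaining


shares = {"Rajesh": Decimal("0.60"), "Vikram": Decimal("0.24"), "Platform": Decimal("0.16")}
offer, closed, remaining = partial_cash_out(
    Decimal("10000"), Decimal("0.5"), Decimal("1.85"), Decimal("1.20"), Decimal("0.05"), shares)
print(offer)
print(closed)
print(remaining)
```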

What Happens If Betfair Liquidity Is Insufficient

When the platform's portion was originally hedged on Betfair, the cash-out must also close that hedge. If Betfair does not have sufficient liquidity to close the hedge at the expected price:

Situation	What Happens
Hedge close-out is fully available	Platform closes hedge, cash-out proceeds normally
Hedge close-out is partially available	Platform closes what it can, retains the unhedged remainder as platform risk. Cash-out still proceeds for the punter (the punter's experience is not degraded by hedge-side issues).
Hedge close-out is unavailable (Betfair down or no liquidity)	Platform absorbs the full hedge portion as retained risk. Cash-out proceeds for the punter. Platform bears the residual risk.

The punter always gets their cash-out amount. The platform absorbs liquidity risk on the hedge side.

How Cash-Out Interacts With NO_NEW_RISK

Cash-out creates a counter-position that reduces the agent's exposure. Therefore, it should be treated as a hedge and allowed even when NO_NEW_RISK is active.

NO_NEW_RISK CHECK FOR CASH-OUT
================================

1. Agent is in NO_NEW_RISK for MI vs CSK
2. A cash-out request arrives for a bet on MI to win
3. The cash-out creates a COUNTER-position (effectively a lay on MI)
4. This REDUCES the agent's worst-case liability
5. Therefore: ALLOWED, even under NO_NEW_RISK

This is the same logic as normal hedge detection:
  If WorstCaseLiability AFTER < WorstCaseLiability BEFORE → Allow

Walk-Through: Amit Bets MI at 1.85, MI Now at 1.20, Amit Cashes Out

The original bet (placed at 7:30 PM):

Amit places ₹10,000 on MI to win at odds 1.85 during MI vs CSK IPL match, pre-match.

Routing (from original audit record):

Rajesh retains 60%: ₹6,000 stake, ₹5,100 liability
Vikram retains 24%: ₹2,400 stake, ₹2,040 liability
Platform retains 16%: ₹1,600 stake, ₹1,360 liability (₹800 hedged on Betfair)

The match situation (9:15 PM, 15th over):

MI is 145/2, well on track. MI's odds have dropped from 1.85 to 1.20. Amit is sitting on a healthy unrealized profit.

Amit requests cash-out (9:15 PM):

Step 1: System loads original bet audit record. Proportions: 60/24/16.

Step 2: System gets current MI odds: 1.20 (from the live odds feed).

Step 3: Calculate cash-out value:

cash_out_return = 10,000 * (1.85 / 1.20) * (1 - 0.05)
               = 10,000 * 1.5417 * 0.95
               = ₹14,646

Amit's profit: ₹14,646 - ₹10,000 = ₹4,646

Step 4: System presents offer to Amit:

┌──────────────────────────────────────────────┐
│  CASH OUT                                     │
│                                               │
│  Your bet: MI to win @ 1.85                   │
│  Current odds: MI @ 1.20                      │
│                                               │
│  Cash out for: ₹14,646                        │
│  Your profit: ₹4,646                          │
│                                               │
│  (If MI wins, you would win ₹8,500)           │
│  (If MI loses, you lose ₹10,000)              │
│                                               │
│  [Accept Cash-Out]    [Keep Bet Active]        │
└──────────────────────────────────────────────┘

Step 5: Amit accepts. The system creates counter-positions:

CASH-OUT EXECUTION
===================

Rajesh: CLOSE position of ₹6,000
  Original liability: ₹5,100
  Cash-out cost (his share): ₹14,646 * 0.60 = ₹8,787.60
  Rajesh P&L: ₹6,000 (stake received) - ₹8,787.60 (cash-out paid) = -₹2,787.60
  Rajesh exposure change: -₹5,100 (liability removed)

Vikram: CLOSE position of ₹2,400
  Original liability: ₹2,040
  Cash-out cost (his share): ₹14,646 * 0.24 = ₹3,515.04
  Vikram P&L: ₹2,400 - ₹3,515.04 = -₹1,115.04
  Vikram exposure change: -₹2,040 (liability removed)

Platform: CLOSE position of ₹1,600
  Cash-out cost (its share): ₹14,646 * 0.16 = ₹2,343.36
  Platform P&L: ₹1,600 - ₹2,343.36 = -₹743.36
  Platform exposure change: -₹1,360 (liability removed)
  Platform also closes Betfair hedge: counter-trade to flatten

Verification: ₹8,787.60 + ₹3,515.04 + ₹2,343.36 = ₹14,646.00 ✓

Step 6: Amit sees: "Cash-out successful. ₹14,646 credited to your account."

Step 7: Rajesh's dashboard updates: "Amit cashed out MI to win. Your exposure on MI vs CSK reduced by ₹5,100. P&L impact: -₹2,787.60."

Why the agents "lose" on this cash-out: MI is winning. The agents were on the liability side (they pay if MI wins). By cashing Amit out, they are locking in a loss. But this loss is smaller than what they would pay if MI wins and Amit's full ₹8,500 potential win materializes. The agents are actually reducing their worst-case outcome.

If MI had collapsed and lost, the agents would have profited ₹10,000 (the full stake). By allowing cash-out, they gave up some of that potential upside. This is the trade-off -- cash-out reduces volatility for everyone.

26. Lay Bet Support

What Is a Lay Bet?

A lay bet is the opposite of a back bet. When Amit backs MI to win, he is betting that MI will win. When Sonia lays MI to win, she is betting that MI will NOT win. Sonia wins if MI draws or loses.

In exchange-style betting (Betfair), every back bet has a corresponding lay bet. In B-Book systems, the agent is typically the layer -- they lay every back bet the punter places. But Hannibal must also support punters placing lay bets explicitly, because some markets and some punters operate this way.

How Liability Is Different for Lay Bets

For a back bet, the punter risks their stake and wins stake * (odds - 1). For a lay bet, the punter risks stake * (odds - 1) and wins the stake.

BACK BET: Amit backs MI to win at 1.85 for ₹10,000
  Amit risks: ₹10,000 (his stake)
  Amit wins:  ₹8,500 (if MI wins)
  Bookie liability: ₹8,500

LAY BET: Sonia lays MI to win at 1.85 for ₹10,000
  Sonia risks: ₹8,500 (her liability = stake * (odds - 1))
  Sonia wins:  ₹10,000 (if MI does NOT win)
  Bookie liability: ₹10,000 (bookie pays if MI does NOT win)

The critical difference for exposure tracking: with a back bet, the agent's liability (stake * (odds - 1)) falls on the outcome the punter selected. With a lay bet, the punter carries that stake * (odds - 1) liability, and the agent's liability is the stake, which falls on every other outcome.
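
The back and lay terms can be computed with one helper. This is a sketch: `BetTerms` and `bet_terms` are hypothetical names, and the figures reproduce Amit's back bet and Sonia's lay bet.

```python
from decimal import Decimal
from typing import NamedTuple


class BetTerms(NamedTuple):
    punter_risk: Decimal
    punter_win: Decimal
    bookie_liability: Decimal
    bookie_pays_if: str


def bet_terms(side: str, stake: Decimal, odds: Decimal) -> BetTerms:
    """Risk and reward for the punter, and liability for the bookie, by bet side."""
    winnings = stake * (odds - 1)
    if side == "BACK":
        return BetTerms(stake, winnings, winnings, "selection wins")
    if side == "LAY":
        return BetTerms(winnings, stake, stake, "selection does not win")
    raise ValueError(f"unknown side: {side}")


back = bet_terms("BACK", Decimal("10000"), Decimal("1.85"))
lay = bet_terms("LAY", Decimal("10000"), Decimal("1.85"))
print(back)
print(lay)
```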

Exposure Tracking for Lay Bets

This is where it gets interesting. A lay bet on MI to win is economically equivalent to a back bet on "MI not to win." This means:

EXPOSURE IMPACT OF LAY BETS
=============================

Current position on MI vs CSK (Rajesh's book):

Before any lay bets:
  MI Win outcome:     ₹5,00,000 liability (back bets on MI)
  MI Not Win outcome: ₹0 liability

Sonia lays MI to win for ₹10,000 at 1.85:
  If MI wins:    Sonia pays ₹8,500 to Rajesh  → REDUCES MI Win liability
  If MI not win: Rajesh pays ₹10,000 to Sonia → INCREASES MI Not Win liability

After Sonia's lay bet:
  MI Win outcome:     ₹4,91,500 liability (reduced by ₹8,500!)
  MI Not Win outcome: ₹10,000 liability (increased by ₹10,000)

WORST CASE BEFORE: ₹5,00,000 (MI Win)
WORST CASE AFTER:  ₹4,91,500 (MI Win is still worse, but reduced)

Key insight: A lay bet on outcome X DECREASES the agent's exposure on outcome X and INCREASES exposure on the other outcomes. This is a natural hedge.

How the Forwarding Matrix Handles Lay Bets

Lay bets use the same 5-dimensional matrix as back bets, with one addition: the matrix resolution considers whether the bet direction (back vs lay) creates a hedge.

FORWARDING MATRIX RESOLUTION FOR LAY BETS
============================================

Step 1: Resolve the forward percentage normally
  → Same matrix, same rules, same precedence chain
  → Sonia's lay MI at 1.85 matches Rule R3: forward 40%

Step 2: Check if this lay bet is a natural hedge for the agent
  → Does the agent have existing liability on MI Win? YES (₹5,00,000)
  → Does this lay bet reduce that liability? YES (by ₹8,500)
  → Therefore: this is a hedge

Step 3: If hedge AND agent is in NO_NEW_RISK:
  → Allow the retention (do not force 100% forward)
  → The agent WANTS to keep this bet because it reduces their exposure

Step 4: If hedge AND agent is NOT in NO_NEW_RISK:
  → Normal matrix rules apply
  → Agent retains their configured percentage

The forwarding matrix does not need a separate "lay" dimension. The bet is processed through the same rules. The only difference is in how the exposure ledger is updated and how NO_NEW_RISK evaluates the bet.

How Hedge Detection Recognizes Lay Bets

The existing hedge detection formula works perfectly for lay bets:

HEDGE DETECTION FOR LAY BETS
==============================

Rule: If WorstCaseLiability AFTER < WorstCaseLiability BEFORE → it is a hedge

Before Sonia's lay bet:
  Worst case (MI Win):      ₹5,00,000
  Worst case (MI Not Win):  ₹0
  WorstCaseLiability:       MAX(₹5,00,000, ₹0) = ₹5,00,000

After Sonia's lay bet (Rajesh's portion, 60%):
  Rajesh keeps 60% of Sonia's lay → ₹6,000 stake, ₹5,100 liability adjustment
  Worst case (MI Win):      ₹5,00,000 - ₹5,100 = ₹4,94,900
  Worst case (MI Not Win):  ₹0 + ₹6,000 = ₹6,000
  WorstCaseLiability:       MAX(₹4,94,900, ₹6,000) = ₹4,94,900

₹4,94,900 < ₹5,00,000 → WorstCaseLiability decreased → HEDGE CONFIRMED
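
The hedge test is a comparison of worst-case liability before and after the bet. The sketch below models the book as a per-outcome liability map. `apply_retained_lay`, `is_hedge`, and the outcome keys are hypothetical names, and the second book is an invented counter-example, not a figure from this document.

```python
from decimal import Decimal
from typing import Dict

Book = Dict[str, Decimal]  # liability per outcome


def apply_retained_lay(book: Book, stake: Decimal, odds: Decimal, retain: Decimal) -> Book:
    """Return a new book after the agent retains `retain` of a lay on MI."""
    kept_stake = stake * retain
    after = dict(book)
    after["MI_WIN"] -= kept_stake * (odds - 1)  # the layer pays the agent if MI wins
    after["MI_NOT_WIN"] += kept_stake           # the agent pays the layer otherwise
    return after


def is_hedge(before: Book, after: Book) -> bool:
    """A bet is a hedge if it lowers the worst-case liability."""
    return max(after.values()) < max(before.values())


lay = (Decimal("10000"), Decimal("1.85"), Decimal("0.60"))

mi_heavy = {"MI_WIN": Decimal("500000"), "MI_NOT_WIN": Decimal("0")}
mi_heavy_after = apply_retained_lay(mi_heavy, *lay)
print(mi_heavy_after, is_hedge(mi_heavy, mi_heavy_after))

# Hypothetical book that is already heavy on "MI not win".
csk_heavy = {"MI_WIN": Decimal("100000"), "MI_NOT_WIN": Decimal("400000")}
csk_heavy_after = apply_retained_lay(csk_heavy, *lay)
print(csk_heavy_after, is_hedge(csk_heavy, csk_heavy_after))
```

The same lay bet is a hedge for the first book and new risk for the second, which is why the check runs against the agent's actual ledger rather than the bet direction alone.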

How NO_NEW_RISK Correctly Allows Hedging Lay Bets

When would a lay bet NOT be a hedge? If Rajesh's book is already heavily exposed on "MI Not Win" (meaning he has lots of bets backing CSK or backing the draw), then a lay on MI (which increases "MI Not Win" exposure) would increase his worst case. In that scenario, the lay bet is NOT a hedge and is forwarded 100% under NO_NEW_RISK.

Walk-Through: Sonia Lays MI to Win at 1.85 While Rajesh Is in NO_NEW_RISK

Setup: Rajesh has hit his per-match limit on MI vs CSK. His current exposure:

Rajesh's MI vs CSK Book:
  MI Win outcome:     ₹4,98,000 liability   ← this is the worst case
  MI Not Win outcome: ₹45,000 liability
  Match limit: ₹5,00,000
  Status: NO_NEW_RISK (₹4,98,000 / ₹5,00,000 = 99.6%)

Sonia places a lay bet: Lay MI to win at 1.85 for ₹10,000.

Step 1: Win cap check. Sonia's maximum win on this lay bet is ₹10,000 (the lay stake); her maximum loss is ₹8,500 (stake * (odds - 1)). Her per-click cap is ₹50,000. Pass.

Step 2: Matrix resolution. Rajesh's matrix says forward 40% for this bet type. Rajesh would retain 60%.

Step 3: NO_NEW_RISK check. Rajesh is in NO_NEW_RISK. Is this a hedge?

Rajesh retains 60% of Sonia's lay:
  → Stake portion: ₹6,000
  → If MI wins: Rajesh RECEIVES ₹5,100 from Sonia (reduces MI Win liability)
  → If MI not win: Rajesh PAYS ₹6,000 to Sonia (increases MI Not Win liability)

New worst cases:
  MI Win:     ₹4,98,000 - ₹5,100 = ₹4,92,900
  MI Not Win: ₹45,000 + ₹6,000 = ₹51,000

New worst case liability: MAX(₹4,92,900, ₹51,000) = ₹4,92,900
Old worst case liability: ₹4,98,000

₹4,92,900 < ₹4,98,000 → HEDGE CONFIRMED → ALLOW

Step 4: Position creation. Rajesh retains 60% of Sonia's lay bet. The exposure ledger is updated:

AFTER SONIA'S LAY BET:
Rajesh's MI vs CSK Book:
  MI Win outcome:     ₹4,92,900 liability   ← reduced!
  MI Not Win outcome: ₹51,000 liability
  Match limit: ₹5,00,000
  Status: still NO_NEW_RISK (but closer to exiting)

Step 5: Cascade. The remaining 40% (₹4,000) flows to Vikram, who processes it through his own matrix and cap checks normally.

Result: Rajesh's exposure on MI Win dropped from ₹4,98,000 to ₹4,92,900. If enough lay bets come in, Rajesh could exit NO_NEW_RISK entirely. The system correctly identified the lay bet as a hedge and allowed it.

27. Agent-Punter Collusion Detection

The Problem

Collusion between an agent and a punter is one of the most damaging exploits in a B-Book system. The basic scheme: Rajesh knows that sharp/winning bets get forwarded to his upline (Vikram). If Rajesh conspires with Amit, he can mark Amit as NORMAL (even though Amit is sharp), retain Amit's bets, and pocket the winnings. When Amit loses, Rajesh can retroactively mark him as SHARP to forward the losing flow upline.

A more sophisticated version: Rajesh marks Amit as SHARP to forward most of his bets upline. But Rajesh and Amit have agreed to split the profits. Amit consistently wins, Vikram consistently loses on the forwarded flow, and Rajesh and Amit split the difference off-platform.

Collusion Signals

Signal	What It Looks Like	Severity
Classification flip before winning streak	Agent changes user from NORMAL to SHARP (or vice versa), and within 24 hours the user has a winning streak	HIGH
Classification flip-flop	Agent changes user classification back and forth more than 3 times in a week	HIGH
Override matches outcome	User override percentage changes correlate with subsequent bet outcomes (higher forward when user wins, lower when user loses)	CRITICAL
Selective forwarding timing	Matrix changes coincide with specific user's betting patterns	HIGH
Forwarded flow consistently loses	Bets forwarded by this agent to upline have a significantly worse P&L than random chance	MEDIUM (could be legitimate if agent has good sharp detection)
Single user dominates forwarded flow	One user accounts for >30% of forwarded volume from this agent	MEDIUM
Win rate inversion	Agent's retained bets have above-market win rate while forwarded bets have below-market win rate	HIGH

The Cooling-Off Period for Classification Changes

To prevent the "flip before win" exploit, classification changes have a mandatory cooling-off period:

CLASSIFICATION CHANGE RULES
=============================

When an agent changes a user's classification (e.g., NORMAL → SHARP):

1. The change is QUEUED, not applied immediately
2. Cooling-off period: 24 hours
3. During the cooling-off period:
   → The OLD classification remains active for matrix resolution
   → The new classification is visible as "PENDING" in the agent dashboard
   → The agent cannot change the classification again until the current change completes
4. After 24 hours: the new classification takes effect
5. Exception: SHARP → NORMAL direction has a 72-hour cooling-off period
   (because downgrading from SHARP to NORMAL is the more exploitable direction)

OVERRIDE CHANGES follow the same pattern, with shorter periods:
  → Changing a user override from 80% forward to 20% forward: 24-hour cooling-off
  → Changing from 20% to 80%: 6-hour cooling-off (less risky direction)

Why different periods for different directions? Moving a user from NORMAL to SHARP increases forwarding (less risky for the agent, more protective). Moving from SHARP to NORMAL decreases forwarding (agent retains more, potentially exploitable). The riskier direction gets a longer cooling-off period.
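
The cooling-off rules can be expressed as two small helpers. This is a sketch using the periods listed above. The function names are hypothetical, and treating every classification change other than SHARP → NORMAL as 24 hours is an assumption based on the default rule.

```python
from datetime import datetime, timedelta


def classification_effective_at(old: str, new: str, requested_at: datetime) -> datetime:
    """When a queued classification change takes effect."""
    if old == new:
        raise ValueError("not a classification change")
    # SHARP -> NORMAL is the more exploitable direction, so it waits longer.
    hours = 72 if (old, new) == ("SHARP", "NORMAL") else 24
    return requested_at + timedelta(hours=hours)


def override_cooling_off_hours(old_forward_pct: int, new_forward_pct: int) -> int:
    """Cooling-off for a user override change, in hours."""
    if old_forward_pct == new_forward_pct:
        raise ValueError("not an override change")
    # Forwarding less means the agent retains more: the riskier direction.
    return 24 if new_forward_pct < old_forward_pct else 6


requested = datetime(2026, 2, 1, 12, 0)
print(classification_effective_at("NORMAL", "SHARP", requested))
print(classification_effective_at("SHARP", "NORMAL", requested))
print(override_cooling_off_hours(80, 20), override_cooling_off_hours(20, 80))
```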

Upline Audit Rights on Downstream Overrides

Vikram (upline) has the right to see and challenge classification changes made by Rajesh (downstream):

UPLINE AUDIT RIGHTS
=====================

Vikram can see (real-time):
  ✓ All user classifications set by Rajesh
  ✓ All pending classification changes
  ✓ History of all classification changes with timestamps
  ✓ Correlation report: classification changes vs user outcomes

Vikram can do:
  ✓ Flag a classification change for review
  ✓ Request the platform to freeze Rajesh's override capability
  ✓ Set minimum forwarding for specific users (override Rajesh's override)

Vikram CANNOT do:
  ✗ Directly change Rajesh's user classifications (that is Rajesh's business)
  ✗ See Rajesh's full user list (only users whose bets are forwarded to Vikram)

The Correlation Engine

The anti-collusion system runs a correlation analysis nightly:

COLLUSION CORRELATION ANALYSIS
================================

For each agent, for each user with classification changes in the last 30 days:

1. Build timeline:
   [Classification change timestamps] + [Bet placement timestamps] + [Bet outcomes]

2. Calculate: Within 48 hours after each classification change:
   → Count of bets placed by this user
   → Win rate of those bets
   → Compare against the user's historical win rate
   → Compare against the market-expected win rate

3. Score the correlation:
   → If win rate AFTER classification change is >2 standard deviations above normal:
     COLLUSION_SCORE += 25 per occurrence
   → If classification was changed from SHARP to NORMAL just before a winning streak:
     COLLUSION_SCORE += 50
   → If the same pattern repeats 3+ times:
     COLLUSION_SCORE += 100

4. Thresholds:
   → COLLUSION_SCORE < 25:  No action
   → 25-74:   Informational alert to platform compliance team
   → 75-149:  Warning to agent + upline notification
   → 150+:    Automatic freeze on agent's override capability, mandatory review
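
The scoring and threshold steps can be sketched as follows. The function names and action labels are hypothetical. Two points are assumptions: the +50 is applied once if the SHARP to NORMAL pattern is present at all, and a score of exactly 75 or 150 is placed in the higher band.

```python
def collusion_score(outlier_windows: int, sharp_to_normal_before_streak: bool,
                    pattern_repeats: int) -> int:
    """Score one agent/user pair from the nightly correlation analysis."""
    score = 25 * outlier_windows          # +25 per >2-sigma win-rate window
    if sharp_to_normal_before_streak:
        score += 50                       # applied once (assumption)
    if pattern_repeats >= 3:
        score += 100
    return score


def collusion_action(score: int) -> str:
    """Map a score to the escalation level."""
    if score < 25:
        return "NO_ACTION"
    if score < 75:
        return "INFO_ALERT_COMPLIANCE"
    if score < 150:
        return "WARN_AGENT_AND_NOTIFY_UPLINE"
    return "FREEZE_OVERRIDES_AND_REVIEW"


print(collusion_action(collusion_score(1, False, 0)))
print(collusion_action(collusion_score(1, True, 0)))
print(collusion_action(collusion_score(2, True, 3)))
```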

Alert Escalation Workflow

Walk-Through: Rajesh and Amit Collude

The scheme: Rajesh and Amit have an agreement. Amit is genuinely sharp -- he has a positive CLV over 1,000+ bets. Rajesh knows this. Instead of marking Amit as SHARP (which would forward 95% to Vikram), Rajesh keeps Amit as NORMAL (forwarding only 40%). Amit wins consistently, and Rajesh profits because he retained 60% of winning bets. They split the extra profit off-platform.

Week 1: Amit places 45 bets. Wins 28. Rajesh retained 60% of each. Rajesh's retained P&L from Amit: +₹1,85,000.

The system's sharp detection flags Amit based on CLV and win rate. It suggests to Rajesh: "Amit shows sharp characteristics. Consider classifying as SHARP."

Rajesh ignores the suggestion.

Week 2: Amit places 50 bets. Wins 31. Rajesh's retained P&L from Amit: +₹2,10,000.

The system sends a stronger alert to Rajesh: "Amit's CLV is +4.2% over 95 bets. This is above the SHARP threshold. Classification recommended."

Rajesh still ignores it.

Week 2 (same time): The cross-agent detection system notices that Amit's win rate (62%) is significantly above expected (48% given the odds profile). It also notices that Rajesh has NOT classified Amit as SHARP despite the system's recommendation. This triggers:

ANOMALY DETECTION ALERT
=========================
Agent: Rajesh
User: Amit
Alert Type: SUSPECTED_CLASSIFICATION_MANIPULATION

Evidence:
1. Amit's 95-bet CLV: +4.2% (SHARP threshold: +2.5%)
2. System recommended SHARP classification 2 times
3. Agent has not acted on recommendations for 14 days
4. Agent's retained P&L from Amit: +₹3,95,000 (top 1% among all agents)
5. If classified as SHARP, 95% would have been forwarded to Vikram
   Rajesh would have retained 5% instead of 60%: ~₹33,000 instead of ₹3,95,000

Collusion Score: 85 (WARNING level)

Actions Taken:
- Compliance team notified
- Vikram (upline) notified: "Your sub-agent Rajesh may be under-classifying user Amit"
- Rajesh's dashboard shows: "Compliance review pending for user Amit"

Week 3: Rajesh, realizing he has been flagged, marks Amit as SHARP. But the 24-hour cooling-off period for a NORMAL → SHARP change means the change does not take effect for a day. During that window, Amit places 15 more bets at NORMAL classification.

Week 3 + 24 hours: Classification change takes effect. Amit's bets are now forwarded at 95%.

Meanwhile: The compliance team reviews the case. They see:

Rajesh ignored 2 system recommendations
Rajesh profited ₹4,50,000+ from a user who should have been classified as SHARP
The timing of the eventual classification change coincides exactly with the compliance alert

Outcome: The compliance team escalates to the platform operations team, who:

Review the last 30 days of Amit's bets under Rajesh
Calculate the financial impact: ₹4,50,000 in retained profit at 60% retention, which would have been ₹37,500 at SHARP classification (5% retention)
Issue a clawback of the excess profit (₹4,12,500) from Rajesh's settlement account
Place Rajesh on probation: his override capability is frozen for 90 days, all classifications are managed by the platform

28. Agent Hierarchy Migration

The Problem

Agent hierarchies are not static. In the real world, agents switch uplines all the time. Rajesh might leave Vikram's network and join Suresh's. This happens because of better commission terms, personal disputes, or business restructuring. The system must handle this migration cleanly, especially when Rajesh has open positions that were routed through Vikram.

Effective-Dated Hierarchy Changes

Hierarchy changes are never instantaneous. They take effect at a scheduled date and time, giving the system time to prepare:

HIERARCHY MIGRATION REQUEST
=============================

Request: Move Rajesh from Vikram to Suresh
Requested by: Platform admin
Effective date: 2026-03-01 00:00:00 IST (start of next weekly period)
Current state: Rajesh has ₹15,00,000 open liability forwarded through Vikram

Migration phases:
  Phase 1 (NOW → effective date):     PREPARATION
  Phase 2 (effective date):           CUTOVER
  Phase 3 (effective date → cleanup): DUAL PATH
  Phase 4 (after all old positions settle): COMPLETE

Dual-Path Settlement

This is the key design challenge. After cutover:

DUAL-PATH ROUTING
==================

BEFORE cutover (Feb 28):
  Rajesh → Vikram → Platform → Betfair
  All bets, all positions, all settlements go through Vikram

AFTER cutover (March 1):
  NEW bets:      Rajesh → Suresh → Platform → Betfair
  OLD positions:  Still settled through Vikram (he holds the positions!)

Why dual-path? Because Vikram's exposure ledgers reflect the positions he holds.
Settling them through Suresh would be incorrect -- Suresh never held that risk.

| Bet Timing | Routing Path | Settlement Path |
|------------|--------------|-----------------|
| Placed before cutover, settling before cutover | Rajesh → Vikram | Through Vikram |
| Placed before cutover, settling after cutover | Rajesh → Vikram | Through Vikram (dual-path) |
| Placed after cutover | Rajesh → Suresh | Through Suresh |
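
The table above reduces to one rule: the settlement path is fixed by when the bet was placed, not by when it settles. A minimal sketch, assuming a bet placed at exactly the cutover instant belongs to the new upline; `settlement_upline` and `CUTOVER` are hypothetical names:

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # IST has no daylight saving
CUTOVER = datetime(2026, 3, 1, 0, 0, tzinfo=IST)


def settlement_upline(placed_at: datetime, old_upline: str, new_upline: str,
                      cutover: datetime = CUTOVER) -> str:
    """Return the upline through which a bet settles.

    Only the placement time matters; the settlement time is ignored.
    Assumption: a bet placed at exactly `cutover` goes to the new upline.
    """
    return old_upline if placed_at < cutover else new_upline


before = settlement_upline(datetime(2026, 2, 28, 22, 15, tzinfo=IST), "Vikram", "Suresh")
after = settlement_upline(datetime(2026, 3, 1, 0, 0, tzinfo=IST), "Vikram", "Suresh")
```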

Open Exposure Handling During Transition

At cutover time, Rajesh has ₹15,00,000 in open liability forwarded to Vikram. This creates a financial obligation:

OPEN EXPOSURE RECONCILIATION
==============================

At cutover (March 1):

1. Freeze: No changes to Rajesh's forwarding through Vikram
   → Vikram's exposure ledger is frozen for Rajesh's old positions
   → New bets from Rajesh do NOT affect Vikram's ledgers

2. Track: Old positions are tagged with migration_id
   → Every position that existed at cutover time gets:
     migration_id: "mig_rajesh_vikram_to_suresh_20260301"
     routing_path: "OLD" (Vikram)

3. Settle: As old events settle, positions flow through Vikram
   → Vikram settles normally
   → When Vikram's Rajesh-related open liability reaches zero → dual-path ends

4. Financial bridge: If Rajesh owes Vikram (or vice versa) from old positions,
   the settlement continues until all old positions are resolved

Financial Settlement Between Old and New Upline

The tricky part: what if Rajesh has a net credit with Vikram from unsettled positions? And what about the weekly settlement cycle?

FINANCIAL SETTLEMENT AT MIGRATION
====================================

Step 1: Calculate Rajesh's net position with Vikram at cutover

  Open positions forwarded to Vikram: ₹15,00,000 liability
  Realized but unpaid P&L (from recently settled events): ₹2,30,000 (Vikram owes Rajesh)

Step 2: Vikram pays the realized P&L to Rajesh immediately
  → ₹2,30,000 transferred (this is money already earned, not speculative)

Step 3: Open positions continue under Vikram until they settle
  → No money changes hands until settlement
  → Each settlement adjusts the balance between Rajesh and Vikram
  → Rajesh's weekly settlements with SURESH only include NEW bets

Step 4: When all old positions have settled:
  → Final reconciliation between Rajesh and Vikram
  → Any remaining balance settled
  → Migration status: COMPLETE
  → Vikram's records for Rajesh archived

Walk-Through: Rajesh Moves From Vikram to Suresh With 15 Lakh Open Liability

Background: Rajesh has been under Vikram for 3 years. Suresh offers better terms (lower forwarding commission). Rajesh negotiates the move. The platform admin approves the migration for March 1.

February 25 -- REQUESTED:

Admin creates migration request
System calculates: Rajesh has 42 open bets forwarded to Vikram, totaling ₹15,00,000 liability
Notification sent to Vikram: "Rajesh is migrating to Suresh, effective March 1. You have 42 open positions to settle."
Notification sent to Suresh: "Rajesh is joining your network, effective March 1."

February 26-28 -- PREPARATION:

Vikram confirms he is aware. No action needed from him.
Suresh confirms he is ready. His limits are checked: can he absorb Rajesh's typical daily flow?
Rajesh's forwarding matrix is cloned for the new relationship (he can modify it after cutover)
System pre-computes: "After cutover, Rajesh's bets will be processed by Suresh with these limits..."

March 1, 00:00 IST -- CUTOVER:

CUTOVER EXECUTED
==================

1. Rajesh's hierarchy parent changed: Vikram → Suresh
2. All existing positions tagged: migration_id = mig_rajesh_vikram_to_suresh_20260301
3. New bets from Rajesh's punters now route to Suresh

Dashboard shows:
  Rajesh: "You are now under Suresh. 42 old positions still settling through Vikram."
  Vikram: "Rajesh has moved. 42 open positions remain for settlement."
  Suresh: "Rajesh has joined. New bets are now routing through you."

March 1-7 -- DUAL_PATH:

35 of the 42 old positions settle during the week (events complete)
Each settlement flows through Vikram normally
New bets (150+ during the week) flow through Suresh

March 8 -- Remaining old positions:

7 positions remain (from events that have not yet concluded)
These are long-dated bets (tournament winner, series result)
Rajesh and Vikram continue to settle these as the events conclude

March 15 -- Last old position settles:

The final pre-migration position settles
Net financial settlement between Rajesh and Vikram: Vikram owes Rajesh ₹42,000
Transfer executed
Migration status: COMPLETE
Vikram's records for Rajesh are archived
Dual-path routing for Rajesh is deactivated

29. Minimum Forwarding / Skin-in-the-Game Requirements

The Problem

Vikram does not want Rajesh to forward 100% of sharp bets. If Rajesh forwards every bet he expects to lose money on and keeps every bet he expects to profit from, Vikram is just absorbing toxic flow. Vikram wants to require Rajesh to have "skin in the game" -- a minimum share of every bet that Rajesh MUST retain, regardless of his matrix settings.

How Minimum Retention Works

The upline agent sets a minimum retention percentage per downstream agent. This floor cannot be overridden by the downstream agent's matrix.

MINIMUM RETENTION CONFIGURATION
=================================

Vikram's settings for his sub-agents:

| Sub-Agent | Min Retention | Why |
|-----------|--------------|-----|
| Rajesh    | 20%          | Experienced, trusted, but must keep skin in the game |
| Priya     | 30%          | Newer agent, should retain more to build discipline |
| Arun      | 10%          | Very experienced, low minimum needed |

Where It Is Checked in the Cascade

The minimum retention check happens AFTER matrix resolution but BEFORE cap evaluation.

The important nuance: If Rajesh's limits would be breached by retaining the minimum 20%, the system does NOT reject the bet. Instead, Rajesh retains as much as his limits allow (which may be less than 20%). The minimum retention is enforced as a floor on the MATRIX percentage, not an absolute floor on the final retained amount. Limits always win over matrix settings -- this is a safety rule.
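
The ordering described above (matrix resolution, then the retention floor, then limits) can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual code: amounts are whole rupees, percentages are integers, and `limit_headroom` stands in for whatever remains under the agent's tightest limit.

```python
def retained_stake(stake: int, matrix_forward_pct: int,
                   min_retention_pct: int, limit_headroom: int) -> int:
    """Amount the agent retains on one bet, in whole rupees.

    1. Matrix resolution yields the agent's desired retention share.
    2. The upline's minimum retention raises that share if it is too low.
    3. Limit headroom caps the result; limits always win over the matrix.
    """
    matrix_retention_pct = 100 - matrix_forward_pct
    floored_pct = max(matrix_retention_pct, min_retention_pct)
    desired = stake * floored_pct // 100
    return min(desired, max(limit_headroom, 0))


# Matrix says forward 100%, upline floor is 20%, plenty of headroom.
normal_case = retained_stake(10_000, 100, 20, 50_000)
# Same bet, but only Rs 500 of headroom left: the limit wins over the floor.
limit_case = retained_stake(10_000, 100, 20, 500)
```

In the second call the agent retains less than 20% and the bet is still accepted, which is the safety rule stated above.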

How Violations Are Handled

When Rajesh tries to configure his matrix in a way that violates Vikram's minimum retention:

SCENARIO: Rajesh tries to set 100% forwarding for SHARP users
  Vikram's minimum retention requirement: 20%

SYSTEM RESPONSE:
  ┌────────────────────────────────────────────────────────┐
  │  ⚠ Configuration Conflict                              │
  │                                                        │
  │  You set: Forward 100% for SHARP users                 │
  │  Your upline (Vikram) requires: Minimum 20% retention  │
  │                                                        │
  │  Adjusted rule: Forward 80% for SHARP users            │
  │  You will retain at least 20% of these bets.           │
  │                                                        │
  │  [Accept Adjusted Rule]  [Contact Vikram to Negotiate] │
  └────────────────────────────────────────────────────────┘

The system auto-adjusts the forwarding percentage to comply with the minimum retention. The agent sees the adjusted value. The agent cannot save a matrix rule that would violate their upline's minimum retention requirement.
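
At configuration time the same floor becomes a cap on the forwarding percentage. A minimal sketch of the auto-adjustment, with `clamp_forward_pct` as a hypothetical helper name:

```python
def clamp_forward_pct(requested_forward_pct: int, min_retention_pct: int) -> tuple[int, bool]:
    """Return (forward percentage to save, whether it was adjusted).

    The highest forwarding an agent may save is 100 minus the
    minimum retention set by the upline.
    """
    max_forward = 100 - min_retention_pct
    adjusted = min(requested_forward_pct, max_forward)
    return adjusted, adjusted != requested_forward_pct


saved_pct, was_adjusted = clamp_forward_pct(100, 20)
```

The boolean lets the dashboard decide whether to show the "Configuration Conflict" dialog or save silently.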

Walk-Through: Vikram Requires 20%, Rajesh Tries to Forward 100% for Sharps

Setup: Vikram sets minimum retention of 20% for Rajesh. Rajesh, who has been losing money on a sharp user (Amit), wants to forward 100% of Amit's bets.

Attempt 1: Rajesh sets user override for Amit = 100% forward

System checks: 100% forward means 0% retention. Vikram's minimum is 20%. System response: "Cannot set forwarding above 80% for this user. Your upline requires 20% minimum retention. Adjusted to 80% forward."

Rajesh accepts. Amit's bets are now forwarded at 80%.

Attempt 2: Rajesh modifies his matrix Rule R1 (SHARP users) from 95% to 100%

System checks: 100% > max allowed (80%). System response: "Forwarding capped at 80%. Rule saved as 80% forward."

What about the catch-all rule? If Rajesh's catch-all rule (*/*/*/*/*) is set to forward 50%, and Vikram's minimum is 20%, there is no conflict: 50% forward means 50% retention, which exceeds 20%. No adjustment is needed.

Why this design is correct: Vikram has a legitimate interest in ensuring Rajesh has skin in the game. Without minimum retention, Rajesh could dump all negative-expected-value flow upline while keeping positive-expected-value flow. The minimum retention ensures Rajesh shares in both the upside and downside of every bet, aligning incentives across the hierarchy.

30. Panic Button Abuse Prevention

The Problem

The panic button (Section 14) is a powerful tool: it immediately forwards 100% of new bets and hedges all retained positions on Betfair. Used legitimately, it is a safety net. Used abusively, it is a money machine.

The abuse pattern: Rajesh watches the match. When things go badly (his retained positions are losing), he hits panic -- hedging at current prices and locking in a partial loss. When things go well, he does not press panic -- collecting the full profit. Over time, this asymmetric usage means Rajesh only takes losses when they are small (he panicked early) and takes full profits when things go well. The cost of hedge execution spread is borne by the platform (or Betfair liquidity).

Who Bears the Cost of Hedge Execution

When the panic button is pressed, hedge orders are placed on Betfair. The spread between the price the system gets and the theoretical mid-price is a real cost. Who pays?

PANIC HEDGE COST ALLOCATION
=============================

When Rajesh presses panic at 9:30 PM:

1. System places hedge orders on Betfair for all Rajesh's retained positions
2. Betfair mid-price for MI to win: 1.50
3. System gets filled at: 1.48 (laying) and 1.52 (backing)
4. Spread cost: approximately 1.3% of hedged amount

Cost allocation:
  → First panic in a period: Platform absorbs the spread cost
    (This is a legitimate safety feature)
  → Second panic in same period: 50% spread cost charged to Rajesh's P&L
  → Third+ panic in same period: 100% spread cost charged to Rajesh's P&L

This makes the first panic "free" (encouraging use when genuinely needed)
but makes repeated use increasingly expensive (discouraging abuse).
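
The escalation schedule above can be expressed as a small lookup. A sketch only, assuming `panic_number` counts panics within the current period starting at 1; the function name is illustrative:

```python
from decimal import Decimal


def agent_spread_charge(panic_number: int, spread_cost: Decimal) -> Decimal:
    """Portion of the hedge spread cost charged to the agent's P&L.

    First panic in the period: platform absorbs everything.
    Second: 50% is charged to the agent. Third and later: 100%.
    """
    if panic_number < 1:
        raise ValueError("panic_number starts at 1")
    if panic_number == 1:
        share = Decimal("0")
    elif panic_number == 2:
        share = Decimal("0.5")
    else:
        share = Decimal("1")
    return spread_cost * share


second_panic_charge = agent_spread_charge(2, Decimal("7800"))
```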

Usage Limits and Cooling-Off Periods

| Control | Value | Rationale |
|---------|-------|-----------|
| Panics per night period | 2 free, unlimited at cost | Night sessions are volatile; 2 free panics cover genuine emergencies |
| Panics per week | 5 free, unlimited at cost | Weekly cap prevents chronic abusers |
| Cooling-off after panic | 30 minutes before matrix can be restored | Prevents the "panic, wait 5 minutes, restore, panic again" cycle |
| Minimum hedge duration | 15 minutes | Once hedged, positions stay hedged for at least 15 minutes. Agent cannot un-hedge immediately when conditions improve. |
| Panic cost escalation | 0%, 50%, 100%, 100%... per period | Each subsequent panic in the same period is more expensive |

Monitoring and Flagging

PANIC BUTTON ABUSE DETECTION
==============================

The system tracks per agent, per period:

1. Panic frequency:
   → More than 3 panics per week for 3 consecutive weeks: FLAG
   → More than 2 panics in a single night: WARNING

2. Panic timing correlation:
   → Agent panics when their retained book is losing > ₹X
   → Agent does NOT panic when their retained book is winning
   → Asymmetric panic usage: collusion score increases

3. Panic profitability analysis:
   → Calculate: what would Rajesh's P&L be WITHOUT panic hedges?
   → Compare: what IS Rajesh's P&L WITH panic hedges?
   → If panic usage consistently improves P&L by > 20%: FLAG

4. Post-panic behavior:
   → Agent immediately restores original matrix after cooling-off ends: SUSPICIOUS
   → Agent keeps hedged state for hours: LEGITIMATE

Differentiating Legitimate Panic From Gaming

| Behavior | Classification | Reason |
|----------|----------------|--------|
| Panic during a genuinely volatile match event (3 wickets in 1 over) | LEGITIMATE | Match conditions warrant caution |
| Panic at the start of every match, restore after 30 minutes | GAMING | Pattern suggests routine use, not emergency |
| Panic once per month during a crisis | LEGITIMATE | Rare, appropriate use |
| Panic 3 times per week, always when losing | GAMING | Asymmetric usage exploits the hedge |
| Panic after seeing a corruption alert or integrity flag | LEGITIMATE | Responding to genuine threat signal |

Walk-Through: Rajesh Presses Panic When Losing, Repeats Weekly

Week 1, Wednesday: MI vs CSK. Rajesh retains ₹8,00,000 backing MI. MI loses 4 wickets cheaply. Rajesh's retained book is down ₹2,50,000 unrealized. He presses panic.

Result: System hedges all positions. Rajesh locks in a ₹1,80,000 loss (better than the potential ₹8,00,000 if MI collapses completely). Spread cost: ₹10,400. First panic of the period -- platform absorbs the cost.

Week 1, Friday: RCB vs DC. Rajesh retains ₹6,00,000 backing RCB. RCB's top batsman gets out. Rajesh presses panic.

Result: System hedges. Rajesh locks in a ₹95,000 loss. Spread cost: ₹7,800. Second panic of the period -- 50% charged to Rajesh (₹3,900).

Week 1, Sunday: KKR vs SRH. KKR winning. Rajesh's book is up ₹1,50,000. He does NOT press panic. KKR wins. Rajesh collects ₹1,50,000.

Week 2: Same pattern. Panic when losing, hold when winning.

Week 3: Same pattern. The abuse detection system now has 3 weeks of data.

PANIC ABUSE ALERT
==================
Agent: Rajesh

Pattern detected over 3 weeks:
  Panics triggered: 7
  Panics when book was losing: 7 (100%)
  Panics when book was winning: 0 (0%)
  
  P&L without panic: -₹4,20,000 (net loss over 3 weeks)
  P&L with panic:    -₹1,85,000 (net loss reduced by ₹2,35,000)
  
  Panic improved P&L by: 56%
  Spread cost absorbed by platform: ₹38,000

Assessment: GAMING (asymmetric panic usage)

Actions:
  1. Rajesh's next panic incurs 100% spread cost
  2. Alert sent to Rajesh: "Your panic button usage is under review."
  3. Vikram (upline) notified
  4. If pattern continues 1 more week: panic feature suspended for 30 days,
     replaced with automatic NO_NEW_RISK (which does not hedge existing positions)
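
The two headline numbers in the alert (share of panics pressed while losing, and P&L improvement) can be computed as below. A sketch under assumptions: `PanicEvent` is a hypothetical record, and improvement is measured as the fraction of the no-panic loss that was avoided.

```python
from dataclasses import dataclass


@dataclass
class PanicEvent:
    book_pnl_at_panic: int  # unrealized P&L of the retained book in rupees; negative = losing


def losing_panic_share(events: list[PanicEvent]) -> float:
    """Fraction of panics pressed while the retained book was losing."""
    if not events:
        return 0.0
    losing = sum(1 for e in events if e.book_pnl_at_panic < 0)
    return losing / len(events)


def pnl_improvement(pnl_without_panic: int, pnl_with_panic: int) -> float:
    """Fraction of the no-panic loss avoided by hedging (0.0 if there was no loss)."""
    if pnl_without_panic >= 0:
        return 0.0
    return (pnl_with_panic - pnl_without_panic) / abs(pnl_without_panic)


# Figures from the alert: 7 panics, all while losing; P&L -4,20,000 vs -1,85,000.
share = losing_panic_share([PanicEvent(-250_000)] * 7)
improvement = pnl_improvement(-420_000, -185_000)
flagged = improvement > 0.20
```

An improvement of roughly 0.56 is well above the 20% flagging threshold listed under "Panic profitability analysis".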

31. Timestamp and Period Boundary Security

The Problem

If the client's clock determines when a bet was placed, punters and agents can manipulate timestamps. A punter could backdate a bet to before a period boundary (when limits had more headroom). An agent could manipulate their clock to extend a favorable night period. The system must use server-side timestamps for all authoritative decisions.

Where the Authoritative Timestamp Is Assigned

The authoritative timestamp is assigned at the earliest possible point in the server-side processing pipeline, before any business logic executes:

BET PROCESSING PIPELINE -- TIMESTAMP ASSIGNMENT
=================================================

1. Client sends bet request
   → Client includes client_timestamp (informational only, never trusted)
   
2. API gateway receives request
   → SERVER TIMESTAMP ASSIGNED HERE: request_received_at = NOW() on the server
   → This is the AUTHORITATIVE timestamp for ALL downstream decisions
   → It is immutable -- no subsequent step can change it

3. Request is queued for processing
   → processing_started_at = NOW() (separate timestamp, for latency tracking)

4. Matrix resolution uses request_received_at for period boundary evaluation
   → "Is this bet in the night period?" uses request_received_at, NOT client_timestamp

5. Position creation uses request_received_at as the official bet placement time
   → All exposure ledger updates reference this timestamp

6. Audit record stores BOTH timestamps:
   → client_timestamp: what the client claimed (for debugging/fraud detection)
   → server_timestamp: the authoritative time (for all business logic)

How Period Boundaries Are Determined

Period boundary evaluation always uses the server clock:

PERIOD BOUNDARY EVALUATION
============================

Input: request_received_at = 2026-02-11T16:29:59.500Z (UTC)
Agent: Rajesh (IST, night period 19:00-02:00)

Step 1: Convert to agent's timezone
  → 16:29:59.500 UTC = 21:59:59.500 IST

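Step 2: Is this within the night period?
  → The period wraps past midnight, so a time is inside it if it is
    at or after 19:00 OR before 02:00
  → Night start: 19:00 IST → YES, 21:59 is after 19:00
  → Night end: 02:00 IST (next day) → not yet reached
  → Result: NIGHT PERIOD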

Step 3: Check against night period limits
  → Use night_period exposure ledger
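
A minimal sketch of the evaluation above, assuming a fixed UTC+5:30 offset for IST and a night window that may wrap past midnight. `classify_period` is an illustrative name; the interval is treated as closed at the start and open at the end.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # IST has no daylight saving


def classify_period(request_received_at: datetime,
                    night_start: time = time(19, 0),
                    night_end: time = time(2, 0),
                    agent_tz: timezone = IST) -> str:
    """Classify a timezone-aware server timestamp as NIGHT or DAY.

    The window is [night_start, night_end) in the agent's local time.
    When night_start > night_end the window wraps past midnight.
    """
    local = request_received_at.astimezone(agent_tz).time()
    if night_start <= night_end:
        in_night = night_start <= local < night_end
    else:
        in_night = local >= night_start or local < night_end
    return "NIGHT" if in_night else "DAY"


example = classify_period(datetime(2026, 2, 11, 16, 29, 59, 500000, tzinfo=timezone.utc))
```

Only the server-assigned `request_received_at` is passed in; the client timestamp never reaches this function.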

How Clock Skew Between Server Instances Is Handled

In a distributed deployment with multiple server instances, each instance has a slightly different clock. The maximum acceptable clock skew is managed through NTP synchronization:

CLOCK SKEW MANAGEMENT
========================

Requirement: All server instances must be synchronized to within 50ms of UTC
Mechanism: NTP (Network Time Protocol) with multiple time sources

If NTP sync fails:
  → Instance reports CLOCK_DRIFT_WARNING
  → If drift exceeds 200ms: instance is removed from the load balancer
  → If drift exceeds 1 second: instance auto-quarantines (stops accepting bets)

For period boundary decisions:
  → The 50ms skew tolerance is immaterial for period boundaries
    (which are at hour granularity: 19:00, 02:00)
  → A bet that Instance A stamps 01:59:59.950 and Instance B would stamp
    02:00:00.050 might be evaluated differently, but the disagreement
    window is 100ms at most (two instances each 50ms off, in opposite
    directions) -- acceptable given the hour-scale period boundaries

For exposure counter consistency:
  → Timestamps on exposure ledger updates are server-generated
  → The ordering of writes is determined by the database (which has one clock),
    not by the application servers
  → Even if two instances disagree by 50ms on the time, the database
    orders writes correctly using its own monotonic clock
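
The drift thresholds above map to a simple escalation ladder. A sketch only: it assumes the warning fires once drift exceeds the 50ms requirement, which the text implies but does not state, and the returned labels other than CLOCK_DRIFT_WARNING are hypothetical.

```python
def drift_action(drift_ms: float) -> str:
    """Map measured clock drift (milliseconds, either direction) to an action."""
    drift = abs(drift_ms)
    if drift > 1000:
        return "QUARANTINE"                 # stop accepting bets
    if drift > 200:
        return "REMOVE_FROM_LOAD_BALANCER"
    if drift > 50:                          # assumption: warning starts above the 50ms target
        return "CLOCK_DRIFT_WARNING"
    return "OK"


action = drift_action(250)
```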

Bets at the Period Boundary

What happens when a bet arrives at exactly the boundary? For example, at 02:00:00.000 IST (the night period end for Rajesh)?

PERIOD BOUNDARY TIE-BREAKING
==============================

Rule: Bets at EXACTLY the boundary time belong to the STARTING period.

Why: The night period is defined as 19:00:00.000 to 01:59:59.999.
     02:00:00.000 is the first moment of the next period (day).

In practice:
  → request_received_at = 2026-02-12T01:59:59.999 IST → NIGHT period
  → request_received_at = 2026-02-12T02:00:00.000 IST → DAY period

This is a closed-open interval: [19:00, 02:00)

For exposure carry-forward:
  → When the night period ends at 02:00, any open positions from the night
    are carried forward to the day period (as described in Section 9)
  → The bet at 01:59:59.999 is the last bet counted against night limits
  → The bet at 02:00:00.000 is the first bet counted against day limits

Client Timestamp Fraud Detection

The client_timestamp is not trusted but is useful for detecting anomalies:

| Anomaly | What It Means | Action |
|---------|---------------|--------|
| client_timestamp is > 30 seconds before server_timestamp | Client clock is behind, or deliberate manipulation | Log for monitoring. No immediate action. |
| client_timestamp is > 5 seconds AFTER server_timestamp | Client clock is ahead, which is unusual | Log and flag. Client may be trying to claim a later timestamp. |
| client_timestamp is > 5 minutes different from server_timestamp | Significant discrepancy | Flag for review. Possible automation/bot activity. |
| client_timestamp is consistently exactly N seconds offset | Clock calibration issue or deliberate offset | Informational. Some devices have persistent clock drift. |
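
The first three rows of the table can be checked per request. A minimal sketch, assuming the 5-minute rule takes precedence when several rows match; the returned labels are hypothetical, and the "consistent offset" row is omitted because it needs history across many requests.

```python
from datetime import datetime, timedelta
from typing import Optional


def timestamp_anomaly(client_ts: datetime, server_ts: datetime) -> Optional[str]:
    """Classify the gap between the claimed and the authoritative timestamp."""
    delta = client_ts - server_ts  # positive means the client clock is ahead
    if abs(delta) > timedelta(minutes=5):
        return "SIGNIFICANT_DISCREPANCY"   # flag for review
    if delta > timedelta(seconds=5):
        return "CLIENT_AHEAD"              # log and flag
    if delta < -timedelta(seconds=30):
        return "CLIENT_BEHIND"             # log for monitoring
    return None


server = datetime(2026, 2, 11, 16, 30, 0)
result = timestamp_anomaly(server - timedelta(seconds=45), server)
```

The result only feeds monitoring; it never changes which timestamp the business logic uses.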

32. Sharp Detection Gaming via Multiple Accounts

The Problem

Sharp bettors know that bookmakers track their accounts and limit them. The obvious countermeasure: use many accounts. A syndicate of 50 accounts, each betting small amounts, can fly under the radar of per-account sharp detection. Each account looks like a casual punter. But collectively, they are placing coordinated bets that drain the agent's book.

The Detection Pillars

The cross-account syndicate detection system uses four independent signals. Any one signal alone might be coincidence. Two or more signals together strongly indicate coordination.

SYNDICATE DETECTION: 4 PILLARS
================================

Pillar 1: DEVICE FINGERPRINTING
  → Same device used by multiple accounts
  → Similar device configurations (screen size, OS version, installed fonts)

Pillar 2: IP / NETWORK CORRELATION
  → Multiple accounts from same IP address
  → Multiple accounts from same subnet
  → VPN detection (known VPN exit nodes)

Pillar 3: BETTING PATTERN SIMILARITY
  → Same outcomes, same timing, same markets
  → Correlated staking patterns
  → Similar CLV profiles

Pillar 4: PAYMENT METHOD OVERLAP
  → Same bank account linked to multiple user accounts
  → Same UPI ID, same wallet, same card
  → Money flow between linked accounts

Device Fingerprinting Integration

The system collects device attributes at every bet placement (not just at registration):

| Attribute | Purpose | Collection Point |
|-----------|---------|------------------|
| Browser/app user agent | Identifies device type and version | Every API request |
| Screen resolution | Distinguishes devices | Session start |
| Timezone offset | Cross-reference with claimed location | Every API request |
| Installed fonts / canvas fingerprint | High-entropy device identifier | Session start (web) |
| Device ID (mobile) | Unique device identifier | App installation |
| Battery level + charging state | Behavioral fingerprint | Session start (mobile) |

Fingerprint matching algorithm:

DEVICE FINGERPRINT SIMILARITY SCORE
=====================================

For each pair of user accounts, compute:

score = 0

If same device_id:                     score += 100  (near-certain same device)
If same canvas fingerprint:            score += 80   (very likely same browser)
If same IP AND same user agent:        score += 60
If same screen resolution AND timezone: score += 30
If same subnet (first 3 octets):       score += 20

Thresholds:
  score >= 100:  SAME_DEVICE (automatically link accounts)
  score 60-99:   LIKELY_RELATED (flag for review)
  score 30-59:   POSSIBLY_RELATED (monitor)
  score < 30:    UNRELATED
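
The scoring rules above translate directly into code. A sketch under assumptions: `DeviceProfile` and its field names are hypothetical, addresses are IPv4, and missing identifiers (None) never count as a match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceProfile:
    device_id: Optional[str]
    canvas_fingerprint: Optional[str]
    ip: str                      # IPv4 dotted quad (assumption)
    user_agent: str
    screen_resolution: str
    timezone_offset_min: int


def similarity_score(a: DeviceProfile, b: DeviceProfile) -> int:
    """Additive score following the weights listed above."""
    score = 0
    if a.device_id is not None and a.device_id == b.device_id:
        score += 100
    if a.canvas_fingerprint is not None and a.canvas_fingerprint == b.canvas_fingerprint:
        score += 80
    if a.ip == b.ip and a.user_agent == b.user_agent:
        score += 60
    if (a.screen_resolution == b.screen_resolution
            and a.timezone_offset_min == b.timezone_offset_min):
        score += 30
    if a.ip.split(".")[:3] == b.ip.split(".")[:3]:  # same first 3 octets
        score += 20
    return score


def classify(score: int) -> str:
    if score >= 100:
        return "SAME_DEVICE"
    if score >= 60:
        return "LIKELY_RELATED"
    if score >= 30:
        return "POSSIBLY_RELATED"
    return "UNRELATED"


a = DeviceProfile("dev-1", "cv-1", "203.0.113.7", "UA", "1080x2400", 330)
b = DeviceProfile("dev-2", "cv-2", "203.0.113.7", "UA", "720x1600", 330)
pair_score = similarity_score(a, b)  # same IP + user agent (60) and same subnet (20)
```

Because two accounts on the same IP are always on the same subnet, the 60-point and 20-point rules stack; whether that double counting is intended is a tuning decision the sketch leaves as written.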

IP Correlation Analysis

IP CORRELATION ANALYSIS
========================

Data collected: For every bet, record the source IP address.

Analysis 1: Direct IP overlap
  → Two or more accounts placing bets from the same IP within 1 hour
  → Common in household sharing (legitimate) or syndicate operation (illegitimate)
  → Threshold: 3+ accounts from same IP → flag

Analysis 2: Subnet analysis
  → Accounts from the same /24 subnet (e.g., 203.0.113.*)
  → Common in corporate/office networks or coordinated operations
  → Threshold: 5+ accounts from same subnet → flag

Analysis 3: IP timing patterns
  → Account A bets from IP X at 9:00 PM
  → Account B bets from IP X at 9:02 PM
  → Account C bets from IP X at 9:05 PM
  → Sequential use of the same IP → strong syndicate signal

Analysis 4: VPN / Proxy detection
  → Known VPN exit node IPs (maintained list)
  → Tor exit nodes
  → Commercial proxy services
  → If detected: increase scrutiny on all other signals
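
Analysis 1 (direct IP overlap) can be sketched as a sliding window per IP. The function name and the input shape, a list of `(account_id, ip, placed_at)` tuples, are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_shared_ips(bets, window=timedelta(hours=1), min_accounts=3):
    """Return the set of IPs used by at least `min_accounts` distinct
    accounts within any span of length `window`."""
    by_ip = defaultdict(list)
    for account, ip, placed_at in bets:
        by_ip[ip].append((placed_at, account))

    flagged = set()
    for ip, events in by_ip.items():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Shrink the window until it spans no more than `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            accounts = {acc for _, acc in events[start:end + 1]}
            if len(accounts) >= min_accounts:
                flagged.add(ip)
                break
    return flagged


t0 = datetime(2026, 2, 11, 21, 0)
bets = [
    ("A", "203.0.113.7", t0),
    ("B", "203.0.113.7", t0 + timedelta(minutes=2)),
    ("C", "203.0.113.7", t0 + timedelta(minutes=5)),
    ("A", "198.51.100.4", t0),
    ("B", "198.51.100.4", t0 + timedelta(minutes=1)),
]
flagged_ips = flag_shared_ips(bets)
```

A flag here is only one signal; household sharing produces the same pattern, which is why the other pillars are consulted before any action.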

Betting Pattern Similarity Detection

This is the most powerful signal because it is hard to disguise:

BETTING PATTERN SIMILARITY
============================

For each pair of accounts, compute similarity across these dimensions:

1. Outcome correlation:
   → How often do both accounts bet on the same outcome?
   → Random chance for a 2-outcome market: 50%
   → If 80%+ correlation over 100+ bets: FLAG

2. Timing correlation:
   → Average time between Account A's bet and Account B's bet on the same event
   → If consistently < 5 minutes apart: FLAG

3. Market selection correlation:
   → Do both accounts bet on the same obscure markets?
   → Betting on the same IPL match: not unusual (everyone bets IPL)
   → Betting on the same Ranji Trophy match: unusual (niche market)
   → Weight correlation by market obscurity

4. Stake pattern similarity:
   → Both accounts use round-number stakes (₹10,000, ₹20,000)
   → Both accounts use the same fractional stakes (₹8,731, ₹8,731)
   → Similar stake distributions (mean, variance, skewness)

5. CLV profile similarity:
   → Both accounts have similar CLV trajectories over time
   → Both accounts started profitable at the same time
   → Both accounts' CLV curves are correlated

Composite score:
  → Weight and combine all dimensions
  → If composite score > threshold → SYNDICATE_SUSPECTED
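
Dimension 1 (outcome correlation) is the simplest to compute. A minimal sketch, assuming each account's history is a mapping from market ID to the outcome backed, and that "correlation" means the agreement rate on markets both accounts bet; the names are illustrative.

```python
from typing import Optional


def outcome_agreement(picks_a: dict, picks_b: dict, min_shared: int = 100) -> Optional[float]:
    """Share of commonly bet markets where both accounts backed the same outcome.

    Returns None when fewer than `min_shared` markets overlap, because the
    rate is too noisy to act on below that sample size.
    """
    shared = picks_a.keys() & picks_b.keys()
    if len(shared) < min_shared:
        return None
    same = sum(1 for market in shared if picks_a[market] == picks_b[market])
    return same / len(shared)


# 100 shared markets; the accounts agree on the first 85.
picks_a = {f"m{i}": "HOME" for i in range(100)}
picks_b = {f"m{i}": ("HOME" if i < 85 else "AWAY") for i in range(100)}
rate = outcome_agreement(picks_a, picks_b)
is_flagged = rate is not None and rate >= 0.80
```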

Payment Method Overlap Detection

| Signal | Severity | Example |
|--------|----------|---------|
| Same bank account on 2+ user accounts | CRITICAL | Account A and Account B both linked to SBI account #12345 |
| Same UPI ID on 2+ accounts | HIGH | Account A and B both use amit@upi |
| Money transfer between two user accounts' bank accounts | HIGH | Account A deposits, Account B receives from A's bank |
| Same phone number on 2+ accounts | HIGH | Both accounts registered with +91-98765-43210 |
| Same email domain (non-public) on 2+ accounts | MEDIUM | amit@someprivatecorp.com and raj@someprivatecorp.com |

How Flagged Clusters Are Communicated to Agents

When the system identifies a suspected syndicate, it packages the information for the agent:

SYNDICATE ALERT -- RAJESH'S DASHBOARD
=======================================

⚠ SUSPECTED SYNDICATE: CLUSTER-4821

Accounts identified: 12 of potentially 50+
Confidence: HIGH (3 of 4 detection pillars triggered)

Evidence:
  📱 Device: 8 accounts share 3 devices
  🌐 Network: 11 accounts used 2 IP addresses in the last week
  📊 Patterns: 91% outcome correlation across 234 bets
  💳 Payments: 4 accounts share 2 bank accounts

Accounts in YOUR network:
  1. Amit (user_4521)    -- 45 bets, +₹1,85,000 P&L against you
  2. Rahul (user_4588)   -- 38 bets, +₹1,42,000 P&L against you
  3. Deepak (user_4612)  -- 31 bets, +₹98,000 P&L against you
  4. Naveen (user_4687)  -- 28 bets, +₹76,000 P&L against you
  ... 8 more accounts

Combined impact on YOUR book: -₹8,45,000 over 4 weeks

Recommended actions:
  [Classify All as SHARP]    -- forwards 95% of their bets
  [Block All Accounts]       -- prevents any new bets (requires admin approval)
  [Review Individual]        -- decide per account
  [Ignore Alert]             -- acknowledge, no action (logged)

Walk-Through: Syndicate With 50 Accounts Under Rajesh

Setup: A professional betting syndicate creates 50 accounts under Rajesh over a 3-month period. Each account is registered with a different name, phone number, and email. They use a pool of 10 mobile devices and 5 residential IP addresses (via mobile hotspots at different locations).

Month 1: The syndicate operates carefully. Each account places 2-3 bets per day on different markets. Win rates are moderate (53%). Individual account P&L is unremarkable.

What the system sees after Month 1:

Pillar 1 (Device): 50 accounts using 10 devices → 5 accounts per device average
  Score: 8 clusters of related accounts identified
  
Pillar 2 (IP): 50 accounts using 5 IPs
  Score: Subnet analysis shows concentrated usage
  But: 5 IPs across 50 accounts is not extreme (could be a housing complex)

Pillar 3 (Betting): Outcome correlation at 61% (slightly above 50% random)
  Score: MODERATE -- not yet flagged, but being monitored

Pillar 4 (Payment): No payment overlap (syndicate was careful)
  Score: CLEAN

Overall assessment: MONITORING (not yet flagged)

Month 2: The syndicate becomes more aggressive. More bets, higher stakes. Their careful 50-account approach means no individual account trips any threshold. But the pattern signal strengthens.

Pillar 3 (Betting) after Month 2:
  Outcome correlation: 73% (very suspicious)
  Timing correlation: 78% of bets within 10 minutes of each other
  Market selection: 22 of 50 accounts bet on the same obscure Ranji match
  CLV: All 50 accounts have positive CLV (probability of this by chance: <0.001%)

Month 2, Week 3: The system triggers:

SYNDICATE DETECTION: CLUSTER-4821 CONFIRMED
=============================================

Detection trigger: Betting pattern similarity threshold exceeded
  → 50 accounts with 73% outcome correlation
  → 22 accounts on same obscure market
  → All 50 accounts have positive CLV (probability by chance: <0.001%)

Cross-reference with device data:
  → 8 device clusters confirmed
  → 50 accounts → 10 devices → likely 3-5 operators

Financial impact under Rajesh:
  → Combined P&L: -₹12,40,000 (Rajesh has lost ₹12.4 lakh to this cluster)
  → Individual account P&L range: -₹15,000 to -₹85,000

Alert sent to:
  1. Rajesh (with recommended actions)
  2. Vikram (upline, with summary)
  3. Platform compliance team (with full evidence package)

Rajesh's response: He classifies all 50 accounts as SHARP. With the 72-hour cooling-off period (Section 27), the classification takes effect 3 days later. Meanwhile, the platform compliance team can also apply platform-level restrictions if they deem it necessary (account suspension, reduced limits).

33. Rate Limiting on Configuration Changes

The Problem

An agent who rapidly changes their forwarding matrix creates multiple problems:

Cache invalidation storms (every change invalidates all cache tiers)
Matrix version bloat (each change creates a new immutable version)
Audit trail confusion (which version applied to which bet?)
Potential gaming (rapid changes to exploit specific bet outcomes)

Per-Agent Rate Limits

| Configuration Type | Rate Limit | Queue Behavior |
|--------------------|------------|----------------|
| Matrix rule changes | 1 change per 5 minutes | Queue rapid changes, apply only the most recent |
| User override changes | 1 per user per 10 minutes | Queue, apply most recent |
| Market override changes | 1 per market per 5 minutes | Queue, apply most recent |
| Agent default changes | 1 per 15 minutes | Queue, apply most recent |
| Limit changes (sport, match, period) | 1 per limit per 10 minutes | Queue, apply most recent |
| Panic button | No rate limit on activation; 30-minute cooling-off before deactivation | Immediate (this is a safety feature) |

Queue and Apply Most Recent

When an agent makes rapid changes that exceed the rate limit:

RATE-LIMITED CONFIGURATION CHANGES
=====================================

9:30:00 PM  Rajesh changes Rule R5: forward 40% → 60%
  → APPLIED immediately (first change, no rate limit hit)

9:30:45 PM  Rajesh changes Rule R5: forward 60% → 80%
  → QUEUED (less than 5 minutes since last change)
  → Queue entry: { rule: R5, new_value: 80%, queued_at: 9:30:45 }

9:31:30 PM  Rajesh changes Rule R5: forward 80% → 95%
  → REPLACES previous queue entry (queue only keeps most recent)
  → Queue entry: { rule: R5, new_value: 95%, queued_at: 9:31:30 }

9:32:00 PM  Rajesh changes Rule R3: forward 40% → 50%
  → QUEUED separately (different rule, its own rate limit)
  → Queue entry: { rule: R3, new_value: 50%, queued_at: 9:32:00 }

9:35:00 PM  Rate limit window expires for R5
  → Queue entry for R5 is applied: forward 95%
  → The intermediate value of 80% was never applied
  → Agent is notified: "Your change to Rule R5 has been applied."

9:37:00 PM  Rate limit window expires for R3
  → Queue entry for R3 is applied: forward 50%
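
The "queue and apply most recent" behaviour for a single rule can be sketched as follows. This is a minimal illustration, not the production implementation: the class name, method names, and the choice to key the limiter per rule are assumptions, and time is passed in as plain seconds so the sketch is easy to test.

```python
from dataclasses import dataclass, field


@dataclass
class CoalescingRateLimiter:
    """Rate limiter that queues rapid changes and keeps only the newest one.

    Hypothetical sketch: how the key is chosen (per rule, per agent, ...)
    is left to the caller.
    """
    window_s: float                                    # e.g. 300 for matrix rules
    last_applied: dict = field(default_factory=dict)   # key -> time of last apply
    pending: dict = field(default_factory=dict)        # key -> (value, queued_at)

    def submit(self, key, value, now):
        """Apply immediately if the window is clear, otherwise queue or replace."""
        last = self.last_applied.get(key)
        if last is None or now - last >= self.window_s:
            self.last_applied[key] = now
            return "APPLIED"
        replaced = key in self.pending
        self.pending[key] = (value, now)   # older queued value is discarded
        return "REPLACED" if replaced else "QUEUED"

    def due(self, now):
        """Pop and return queued changes whose rate-limit window has expired."""
        ready = {}
        for key in list(self.pending):
            if now - self.last_applied[key] >= self.window_s:
                value, _queued_at = self.pending.pop(key)
                self.last_applied[key] = now
                ready[key] = value
        return ready


limiter = CoalescingRateLimiter(window_s=300)
print(limiter.submit("R5", 60, now=0))    # first change
print(limiter.submit("R5", 80, now=45))   # inside the window
print(limiter.submit("R5", 95, now=90))   # replaces the queued 80
print(limiter.due(now=300))               # window expired
```

With the R5 timings from the timeline, only the values 60 and 95 are ever applied; the intermediate 80 is dropped while it is still queued.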

What the agent sees:

┌──────────────────────────────────────────────────────────────┐
│  CONFIGURATION UPDATE                                         │
│                                                               │
│  Rule R5 updated: Forward 60% (active now)                    │
│                                                               │
│  ⏳ Pending changes (will apply in ~4 minutes):               │
│     Rule R5: Forward 95%                                      │
│     Rule R3: Forward 50%                                      │
│                                                               │
│  Why the delay? Rapid configuration changes are queued to     │
│  ensure system stability. Only your most recent value will    │
│  be applied.                                                  │
│                                                               │
│  [Cancel Pending Changes]                                     │
└──────────────────────────────────────────────────────────────┘

Cache Invalidation Throttling

Even when configuration changes are rate-limited, the cache invalidation must be efficient:

CACHE INVALIDATION STRATEGY
=============================

When a configuration change is applied:

1. PostgreSQL: Write happens immediately (source of truth updated)

2. Redis: Invalidation within 100ms
   → DELETE the affected key(s)
   → Do NOT pre-populate (let the next read fill the cache)

3. Application LRU: Invalidation via pub/sub within 200ms
   → All app instances receive the invalidation message
   → Affected entries evicted from LRU cache

Throttling:
  → If more than 10 invalidations per agent per minute: batch them
  → Instead of 10 individual invalidations, one "flush all for this agent" signal
  → This prevents cache thrashing during rapid configuration periods
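
One way to express the throttling rule is a sliding one-minute window per agent. The sketch below is an assumption about how this could look; the class name and the signal names `INVALIDATE_KEY` and `FLUSH_AGENT` are hypothetical. It also simplifies by returning a flush signal for every invalidation above the threshold rather than collecting them into one batch.

```python
from collections import defaultdict, deque


class InvalidationThrottle:
    """Decide between a per-key invalidation and a per-agent flush."""

    def __init__(self, max_per_minute=10, window_s=60.0):
        self.max_per_minute = max_per_minute
        self.window_s = window_s
        self._recent = defaultdict(deque)   # agent_id -> recent timestamps

    def signal_for(self, agent_id, key, now):
        recent = self._recent[agent_id]
        # Drop timestamps that have left the sliding window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_per_minute:
            return ("FLUSH_AGENT", agent_id)
        return ("INVALIDATE_KEY", key)
```

The first ten invalidations inside a minute are passed through individually; from the eleventh onward the caller receives a single "flush everything for this agent" instruction until the window drains.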

How Rate Limiting Interacts With the Panic Button

The panic button IS a rapid configuration change -- it sets forwarding to 100% for all sports and markets. But it is exempt from rate limiting because it is a safety feature. The design reconciles these two goals:

PANIC BUTTON VS RATE LIMITING
===============================

Panic button activation:
  → Bypasses ALL rate limits
  → Applies immediately (no queueing)
  → Invalidates all caches immediately
  → Reason: safety always trumps stability

Panic button deactivation (restoring previous settings):
  → Subject to 30-minute cooling-off period (Section 30)
  → NOT subject to the 5-minute matrix change rate limit
  → Reason: the 30-minute cooling-off is already more restrictive
    than the 5-minute rate limit

Configuration changes WHILE panic is active:
  → Queued normally under rate limits
  → Applied only AFTER panic is deactivated
  → Agent sees: "You are in panic mode. Configuration changes
    will be applied when you exit panic mode."

This means:
  1. Rajesh presses panic at 9:30 PM → immediate effect, no rate limit
  2. Rajesh tries to change Rule R5 at 9:31 PM → queued (panic is active)
  3. Rajesh deactivates panic at 10:00 PM → previous settings restored
  4. Queued Rule R5 change applies at 10:00 PM (or later per rate limit)

Rate Limit Overrides for Administrators

Platform administrators can bypass rate limits for specific agents when needed:

Override Type	Who Can Grant	Duration	Use Case
Temporary unlimited changes	Platform SUPER_ADMIN	1 hour	Agent onboarding, major event preparation
Reduced rate limit (1 min instead of 5)	Platform ADMIN	4 hours	Agent is actively tuning during a match with admin guidance
Rate limit suspension	Platform SUPER_ADMIN	30 minutes	Emergency reconfiguration

All overrides are logged in the audit trail with the admin who granted them and the reason.

34. Currency and Multi-Currency Support

The Problem

Hannibal serves agent networks across India, Southeast Asia, and Africa. Agents operate in different currencies: Indian Rupees (INR), Thai Baht (THB), Ghanaian Cedis (GHS), Nigerian Naira (NGN), Kenyan Shillings (KES). But hedges on Betfair are placed in GBP (or EUR). This creates currency risk at multiple points in the system.

Base Currency Per Agent

Every agent has a configured base currency. All their limits, exposure ledgers, and P&L are denominated in this currency:

AGENT BASE CURRENCIES
========================

Agent         Base Currency    Why
--------      -------------    ---
Rajesh        INR              Indian sub-agent, punters bet in INR
Vikram        INR              Indian master agent
Kwame         GHS              Ghanaian agent, punters bet in Cedis
Priya         INR              Indian sub-agent
Platform      USD              Platform operates in USD for cross-border accounting
Betfair       GBP              Exchange operates in GBP

Where FX Conversion Happens

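FX conversion occurs at two points in the bet lifecycle: when a forwarded position crosses from the agent hierarchy to the platform, and when the platform hedges on Betfair.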

Key design rule: FX conversion happens at the boundary between currency zones, not within them. Within the INR agent hierarchy (Rajesh -> Vikram), all calculations are in INR. FX only enters the picture when the position crosses to the platform (which operates in USD) or to Betfair (which operates in GBP).

FX Rate Capture and Audit Trail

Every FX conversion is captured with the exact rate used:

Field	Type	Description
conversion_id	UUID	Unique identifier for this conversion
bet_id	TEXT	Which bet triggered this conversion
source_currency	TEXT	e.g., GHS
target_currency	TEXT	e.g., USD
source_amount	DECIMAL	Amount in source currency
target_amount	DECIMAL	Amount in target currency
fx_rate	DECIMAL(18,8)	The rate used: 1 GHS = X USD
fx_rate_source	TEXT	Where the rate came from (e.g., "platform_rate_feed", "manual_override")
fx_rate_timestamp	TIMESTAMP	When the rate was captured
conversion_timestamp	TIMESTAMP	When the conversion was executed
spread_applied	DECIMAL	Any spread the platform applied on top of the mid-rate

FX Rate Determination

The system uses a tiered approach for FX rates:

FX RATE RESOLUTION
====================

Priority 1: Platform rate feed (real-time)
  → Updated every 60 seconds from a market data provider
  → Used for live bet processing

Priority 2: Cached rate (if feed is stale)
  → If the rate feed has not updated for > 5 minutes
  → Use the last known rate with an additional 0.5% spread (safety buffer)
  → Flag the conversion as STALE_RATE in the audit trail

Priority 3: Daily reference rate (if feed is down)
  → If the rate feed is completely unavailable
  → Use the day's opening reference rate with a 2% spread
  → Flag as FALLBACK_RATE
  → Alert operations team

For each conversion, the system also records:
  → The mid-market rate at the time
  → The spread applied by the platform
  → The effective rate (mid + spread)
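
The three priorities can be read as a small decision function. The sketch below is illustrative only: the function name, argument names, and the `LIVE_RATE` flag are assumptions (only `STALE_RATE` and `FALLBACK_RATE` appear in this design), and how the extra spread is combined with the mid-rate is left to the caller.

```python
from decimal import Decimal

STALE_AFTER_S = 300                  # feed older than 5 minutes counts as stale
STALE_SPREAD = Decimal("0.005")      # extra 0.5% safety buffer
FALLBACK_SPREAD = Decimal("0.02")    # 2% spread on the daily reference rate


def resolve_fx_rate(feed_rate, feed_age_s, reference_rate):
    """Return (mid_rate, extra_spread, flag) following the three priorities.

    feed_rate is None when the rate feed is completely unavailable.
    """
    if feed_rate is None:
        # Priority 3: feed is down, fall back to the day's reference rate.
        return reference_rate, FALLBACK_SPREAD, "FALLBACK_RATE"
    if feed_age_s > STALE_AFTER_S:
        # Priority 2: last known rate plus a safety buffer.
        return feed_rate, STALE_SPREAD, "STALE_RATE"
    # Priority 1: live feed, no extra spread.
    return feed_rate, Decimal("0"), "LIVE_RATE"
```

The returned flag is what would be written to the audit trail alongside the mid-rate and the spread.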

FX Conversion at Hedge Execution

When a Ghanaian agent's bet reaches the platform and needs hedging on Betfair:

FX CONVERSION EXAMPLE: BET FLOW
=================================

1. Kwame's punter bets GHS 500 on Arsenal to win at 2.10
   → Kwame's cascade: retains GHS 300, forwards GHS 200 to platform

2. GHS 200 arrives at the platform
   → Platform operates in USD
   → Current rate: 1 USD = 15.8 GHS
   → Conversion: GHS 200 / 15.8 = USD 12.66
   → Spread applied: 0.3% → Platform receives USD 12.62
   → Audit: conversion_id=fx_001, rate=15.8, spread=0.3%

3. Platform decides to hedge USD 6.31 on Betfair
   → Betfair operates in GBP
   → Current rate: 1 GBP = 1.27 USD
   → Conversion: USD 6.31 / 1.27 = GBP 4.97
   → Spread applied: 0.2% → Betfair receives GBP 4.96
   → Audit: conversion_id=fx_002, rate=1.27, spread=0.2%

Total FX conversions: GHS → USD → GBP (two hops)
Total FX spread cost: ~0.5% (borne by the platform, priced into the hedge margin)
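
The arithmetic of the two hops can be reproduced with decimal math. This is a sketch under the assumption that amounts are rounded to two decimals after each step, as the example does; the helper name `convert` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def convert(amount, units_per_target, spread):
    """Divide by the quoted rate, then deduct the platform spread.

    Returns (gross, net), both rounded to two decimals.
    """
    gross = (amount / units_per_target).quantize(CENT, rounding=ROUND_HALF_UP)
    net = (gross * (1 - spread)).quantize(CENT, rounding=ROUND_HALF_UP)
    return gross, net


# Hop 1: GHS 200 at 1 USD = 15.8 GHS with a 0.3% spread.
usd_gross, usd_net = convert(Decimal("200"), Decimal("15.8"), Decimal("0.003"))
# Hop 2: USD 6.31 at 1 GBP = 1.27 USD with a 0.2% spread.
gbp_gross, gbp_net = convert(Decimal("6.31"), Decimal("1.27"), Decimal("0.002"))

print(usd_gross, usd_net)   # 12.66 12.62
print(gbp_gross, gbp_net)   # 4.97 4.96
```

Using `Decimal` rather than floats keeps the recorded amounts reproducible, which matters because each conversion is written to the audit trail.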

FX Risk Accounting for Hedged Positions

Between the time a bet is placed and when it settles, exchange rates can move. This creates FX risk on hedged positions:

FX RISK SCENARIO
==================

At bet placement (Monday):
  Kwame's punter bet GHS 500 at 2.10
  Platform hedged GBP 4.96 on Betfair at 2.10
  Rate at placement: 1 GBP = 20.08 GHS (via USD)

At settlement (Sunday, Arsenal won):
  Betfair pays out: GBP 4.96 * (2.10 - 1) = GBP 5.46 profit
  Rate at settlement: 1 GBP = 21.50 GHS (GHS depreciated)

Converting Betfair payout back to GHS:
  GBP 5.46 * 21.50 = GHS 117.39

But the punter is owed: GHS 500 * (2.10 - 1) = GHS 550 in winnings

The hedge covered:
  Platform's portion of liability: some fraction of GHS 550
  Betfair payout in GHS: GHS 117.39

The FX movement (GHS weakened) means the GBP payout converts
to MORE GHS than expected. In this case, FX movement HELPED.
If GHS had strengthened, the platform would receive LESS GHS from
the Betfair hedge than expected -- creating an FX loss.

How FX Risk Is Managed

Strategy	Description	When Used
Accept the risk	Small positions. FX movement over a few days is typically < 2%. Not worth hedging.	Default for most positions
Settle quickly	Minimize the time between bet placement and settlement to reduce FX exposure.	Standard practice
FX reserve buffer	Platform maintains a reserve buffer (typically 3% of cross-currency hedged volume) to absorb FX losses.	Always active
Same-day hedging	For very large cross-currency positions, hedge the FX exposure separately (buy/sell the currency pair).	Only for positions > USD 10,000

Settlement in Multi-Currency Scenarios

At settlement, FX conversion happens in reverse:

MULTI-CURRENCY SETTLEMENT FLOW
================================

Event settles: Arsenal wins

Step 1: Betfair settles in GBP
  → Platform receives GBP profit (or pays GBP loss)

Step 2: Convert Betfair settlement to USD (platform base currency)
  → Use settlement-time FX rate (NOT the bet-placement rate)
  → Record FX gain/loss vs expected rate

Step 3: Platform settles its retained portion in USD

Step 4: Convert platform-to-agent settlement to agent's base currency
  → Kwame's upline settlement is in GHS
  → Use settlement-time FX rate
  → Record conversion in audit trail

Step 5: Agent cascade settles in agent base currency
  → Kwame's agents all settle in GHS
  → No FX needed within the GHS hierarchy

FX Audit Report

The platform generates a daily FX reconciliation report:

DAILY FX RECONCILIATION
========================
Date: 2026-02-11

Currency Pair     Volume (USD)    Avg Rate     FX Gain/Loss    Reserve Impact
-----------       -----------     ---------    -----------     --------------
GHS/USD           $12,450         15.82        -$145           -$145 from reserve
NGN/USD           $8,300          1520.50      +$89            +$89 to reserve  
KES/USD           $3,200          129.40       -$12            -$12 from reserve
THB/USD           $5,800          36.15        +$34            +$34 to reserve
USD/GBP           $18,700         0.788        -$210           -$210 from reserve

Net FX Impact:    -$244
FX Reserve:       $45,000 → $44,756 (0.5% drawdown)

Walk-Through: Ghanaian Agent in Cedis, Hedge in GBP

Setup: Kwame operates in Ghana. His base currency is GHS (Ghana Cedis). He has 150 football punters who bet on Premier League matches.

The bet: Kwame's punter Kofi bets GHS 1,000 on Chelsea to win at odds 3.20.

Step 1: Cascade in GHS (local currency)

Kofi bets GHS 1,000 at 3.20
  Potential win: GHS 2,200
  Liability: GHS 2,200

Kwame's matrix: forward 50% for Premier League pre-match
  Kwame retains: GHS 500 (liability: GHS 1,100)
  Kwame forwards: GHS 500 to platform

Step 2: FX conversion at platform boundary

GHS 500 arrives at platform
Current rate: 1 USD = 15.80 GHS (mid-market)
Platform applies 0.3% spread: effective rate = 15.85 GHS per USD

Conversion: GHS 500 / 15.85 = USD 31.55
Audit: fx_rate=15.85, source=platform_feed, spread=0.3%

Step 3: Platform routing in USD

Platform receives USD 31.55
Platform retains 50%: USD 15.78
Platform hedges 50%: USD 15.77 → Betfair

Step 4: FX conversion at Betfair boundary

USD 15.77 to hedge on Betfair
Current rate: 1 GBP = 1.27 USD (mid-market)
Platform applies 0.2% spread: effective rate = 1.2725 USD per GBP

Conversion: USD 15.77 / 1.2725 = GBP 12.39
Audit: fx_rate=1.2725, source=platform_feed, spread=0.2%

Step 5: Betfair hedge execution

Place back bet on Betfair: Back Chelsea to win, GBP 12.39 at 3.20
If Chelsea wins: Betfair pays GBP 12.39 * 2.20 = GBP 27.26
If Chelsea loses: Platform loses the GBP 12.39 stake

Step 6: Settlement (Chelsea wins)

Betfair pays: GBP 27.26 profit
Convert to USD: GBP 27.26 * 1.28 (settlement rate) = USD 34.89
  (Rate moved slightly: was 1.27, now 1.28)
  FX gain: USD 34.89 vs expected USD 34.62 (at 1.27) = +USD 0.27

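Platform P&L:
  Hedged half: USD 15.77 stake → Chelsea won → Platform owes USD 34.69 in winnings
  Hedge recovery: USD 34.89 from Betfair
  Net on hedged half: -USD 34.69 + USD 34.89 = +USD 0.20 (near zero, as expected)
  Retained half: USD 15.78 stake → Chelsea won → Platform owes USD 34.72 (unhedged; this is the risk the platform chose to keep)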

Kwame's settlement in GHS:
  Kwame's retained: GHS 500 stake, GHS 1,100 liability
  Chelsea won → Kwame pays punter GHS 1,100 in winnings (plus the returned GHS 500 stake)
  Kwame's forwarded: GHS 500 → Kwame does not bear this portion
  Kwame's P&L: GHS 500 received - GHS 1,600 paid = -GHS 1,100
  
  Plus: settlement from platform for forwarded portion
  Platform owes Kwame: the forwarded portion's winnings, converted to GHS
  Convert: USD 69.41 (USD 31.55 * 2.20, platform liability for forwarded) * 15.90 (settlement rate) = GHS 1,103.62
  FX difference: expected GHS 1,100, actual GHS 1,103.62, gain GHS 3.62

Kofi (punter) receives: GHS 1,000 stake + GHS 2,200 profit = GHS 3,200 total return
  → All in GHS, Kofi never sees any FX conversion

Key takeaway: The punter always operates in their local currency. FX conversion is invisible to them. Agents also operate in their local currency within the hierarchy. FX only enters at the agent-to-platform and platform-to-exchange boundaries, and the platform absorbs FX risk as a cost of doing business.

35. Cache Race Condition Fix at Limit Boundaries (CRITICAL)

The Problem in Plain English

The existing 3-tier caching design (Section 15) has a dangerous gap. Consider this scenario: Rajesh has a cricket night limit of 10 lakh. His current exposure is 9,20,000 (92% utilized). The application LRU cache has a 5-second TTL, and within that 5-second window, 10 simultaneous bets arrive from Rajesh's punters. Each bet checks the LRU cache, sees "9,20,000 used out of 10,00,000 -- 80,000 remaining," and each bet tries to retain 20,000 of liability. If all 10 proceed, Rajesh retains 2,00,000 more -- pushing him to 11,20,000 against a 10,00,000 limit. The limit is breached by 1,20,000.

This is not theoretical. During IPL matches, a popular agent like Rajesh will receive 50+ bets per minute. At 92% utilization, every bet is potentially the one that tips over the limit.

The Safety Margin Approach

The core idea: do not use the fast cache path when you are "close enough" to the limit that a race condition could cause a breach. Define a safety margin that determines when to switch from the fast path (LRU/Redis) to the slow-but-safe path (PostgreSQL with FOR UPDATE locking).

Safety margin formula:

safety_margin = max(
  fixed_minimum_margin,                          -- e.g., ₹50,000
  average_bet_liability * expected_bets_per_ttl   -- dynamic calculation
)

Where:

fixed_minimum_margin is a per-agent configurable floor (default 50,000)
average_bet_liability is the rolling average liability per bet for this agent in this scope (recalculated every 60 seconds)
expected_bets_per_ttl is the rolling average bet rate multiplied by the LRU cache TTL (5 seconds)

Example calculation for Rajesh during a busy IPL night:

Parameter	Value
Rajesh's cricket night limit	10,00,000
Average bet liability (last 60s)	8,500
Average bets per second (last 60s)	0.8
LRU cache TTL	5 seconds
Expected bets per TTL	0.8 x 5 = 4
Dynamic margin	8,500 x 4 = 34,000
Fixed minimum margin	50,000
Effective safety margin	max(50,000, 34,000) = 50,000
DB-path threshold	10,00,000 - 50,000 = 9,50,000

This means: when Rajesh's exposure reaches 9,50,000 (95% of his limit), every subsequent bet goes through the PostgreSQL FOR UPDATE path. The safety margin absorbs the expected race at the current bet rate: 4 bets in flight within one TTL window, each adding 8,500, totalling 34,000 -- which is within the 50,000 margin.
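
The safety margin formula and the resulting DB-path threshold can be checked with a few lines of code. The function and parameter names below are illustrative assumptions; the numbers are the ones from Rajesh's example.

```python
def safety_margin(fixed_minimum, avg_bet_liability, bets_per_second, cache_ttl_s):
    """max(fixed floor, average liability x expected bets within one TTL)."""
    expected_bets_per_ttl = bets_per_second * cache_ttl_s
    return max(fixed_minimum, avg_bet_liability * expected_bets_per_ttl)


def db_path_threshold(limit, margin):
    """Exposure at or above this value must use the DB path."""
    return limit - margin


margin = safety_margin(
    fixed_minimum=50_000,
    avg_bet_liability=8_500,
    bets_per_second=0.8,
    cache_ttl_s=5,
)
threshold = db_path_threshold(1_000_000, margin)
print(margin, threshold)   # 50000 950000
```

The dynamic term only takes over once the bet rate or the average liability is high enough for it to exceed the fixed floor.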

The Three-Path Decision Flow

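Every bet follows the same decision flow: it reads the agent's exposure from the freshest available cache tier and compares it with (limit - safety_margin) to choose a path.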

Path descriptions:

Path	When Used	Latency	Correctness Guarantee
FAST PATH	Exposure is below (limit - safety_margin) in any cache tier	1-5ms	Eventual consistency -- may briefly overshoot by up to safety_margin amount
DB PATH	Exposure is at or above (limit - safety_margin) in the freshest available cache	10-25ms	Strict consistency -- FOR UPDATE lock prevents any overshoot
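
The choice between the two rows above reduces to a single comparison. In this sketch the function name is hypothetical, and routing to the DB path when no cache tier can answer is an assumption, not something this design states.

```python
def choose_path(cached_exposure, limit, margin):
    """Return "FAST" or "DB" for a bet, given the freshest cached exposure."""
    if cached_exposure is None:
        # Assumption: with no cached value at all, take the safe path.
        return "DB"
    if cached_exposure < limit - margin:
        return "FAST"
    return "DB"


print(choose_path(780_000, 1_000_000, 50_000))   # FAST
print(choose_path(960_000, 1_000_000, 50_000))   # DB
```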

Post-Write Validation and Rollback

Even with the safety margin, the FAST PATH can theoretically overshoot if the cache is stale by more than one TTL cycle (extremely rare, but possible during network partitions or Redis failures).

Post-write validation catches this:

After the FAST PATH writes the position and updates the ledger in PostgreSQL, it reads back the committed ledger total
If the committed total exceeds the limit, a rollback procedure fires:
- The excess amount is calculated: overshoot = committed_total - limit
- The most recently created position (the one that caused the overshoot) is reduced by the overshoot amount
- The reduced amount is forwarded to the upline as overflow
- A new overflow position is created for the upline agent
- An audit record is created noting the post-write correction
- An alert is fired (this indicates the safety margin may be too small)

POST-WRITE VALIDATION FLOW
============================

1. FAST PATH completes: position created, ledger updated
2. Read back: SELECT current_total FROM exposure_ledger WHERE agent=Rajesh AND scope=cricket_night
3. IF current_total <= limit → DONE (normal case, 99.9% of the time)
4. IF current_total > limit:
   a. overshoot = current_total - limit
   b. BEGIN TRANSACTION
   c. Reduce this bet's retained amount by overshoot
   d. Create overflow position at upline level for overshoot amount
   e. Update upline's exposure ledger
   f. Update Rajesh's exposure ledger (subtract overshoot)
   g. COMMIT
   h. Fire SAFETY_MARGIN_BREACH alert
   i. Increase safety_margin by 50% for next 60 seconds
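
The core of steps 4a to 4f is a small calculation: how much of this bet's retained amount has to move to the upline. The sketch below covers only that calculation, not the transaction or the alert. The function name is hypothetical, and capping the correction at the bet's own retained amount is an assumption added here so that a single bet never gives back more than it holds.

```python
def post_write_correction(committed_total, limit, bet_retained):
    """Return (new_retained, overflow_to_upline) for the bet just written."""
    overshoot = max(0, committed_total - limit)
    # Assumption: a bet cannot forward more than it retained.
    overflow = min(overshoot, bet_retained)
    return bet_retained - overflow, overflow


# Normal case: total is within the limit, nothing changes.
print(post_write_correction(805_000, 1_000_000, 25_000))     # (25000, 0)
# Overshoot of 5,000: the bet keeps 20,000 and 5,000 moves upline.
print(post_write_correction(1_005_000, 1_000_000, 25_000))   # (20000, 5000)
```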

Walk Through: 10 Simultaneous Bets on Rajesh at 78% Utilization

Setup:

Rajesh's cricket night limit: 10,00,000
Current exposure: 7,80,000 (78%)
Safety margin: 50,000
DB-path threshold: 9,50,000
10 bets arrive within 200 milliseconds, each adding approximately 25,000 liability

Step-by-step:

TIME    BET   LRU CACHE SHOWS   THRESHOLD   PATH      RESULT
======  ====  ================   =========   ========  ==================================
T+0ms   B1    ₹7,80,000         ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹8,05,000
T+20ms  B2    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹8,30,000
T+40ms  B3    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹8,55,000
T+60ms  B4    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹8,80,000
T+80ms  B5    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹9,05,000
T+100ms B6    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹9,30,000
T+120ms B7    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹9,55,000
T+140ms B8    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹9,80,000
T+160ms B9    ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹10,05,000 ← OVERSHOOT!
T+180ms B10   ₹7,80,000 (stale) ₹9,50,000   FAST      Retain ₹25,000. New actual: ₹10,30,000 ← OVERSHOOT!

Note that the LRU cache is stale for the entire 200ms burst because its TTL is 5 seconds, so all 10 bets see the same cached value. At 78%, the cached value (7,80,000) is well below the threshold (9,50,000), so all 10 take the FAST PATH.

But the post-write validation catches the problem:

B9 finishes writing, reads back 10,05,000, detects overshoot of 5,000
- B9's retained amount is reduced by 5,000
- 5,000 overflows to Vikram
- Alert fires
B10 finishes writing, reads back 10,25,000 (after B9's correction), detects overshoot of 25,000
- B10's retained amount is reduced by 25,000 (its entire retained amount)
- 25,000 overflows to Vikram
- Alert fires

After all 10 bets complete:

Rajesh's actual exposure: exactly 10,00,000 (the limit)
Two bets had post-write corrections (B9 and B10)
Safety margin is temporarily increased by 50% (to 75,000) for the next 60 seconds
Total correction: 30,000 in overflow that was initially retained but corrected
No money lost, no limit breached after correction

Now consider if Rajesh was at 96% utilization (9,60,000) instead:

All 10 bets would see the cache showing 9,60,000, which is ABOVE the threshold of 9,50,000. All 10 go through the DB PATH with FOR UPDATE locking. They serialize. Each one reads the true current value, updates it, and the moment the limit is reached, remaining bets overflow to the upline. No corrections needed. Slower (10-25ms each, serialized) but perfectly correct.

The key insight: At 78% utilization, the burst adds 10 bets x 25,000 = 2,50,000 of liability, which pushes exposure to 10,30,000 -- a temporary overshoot of 30,000. The post-write validation corrects this within milliseconds. The safety margin is designed so that the DB PATH kicks in before the overshoot becomes dangerously large. At 95%+ utilization, the DB PATH prevents any overshoot entirely.

36. Multi-Instance Cache Coherency (HIGH)

Why LRU Per-Instance Is Broken

When Hannibal runs multiple application instances behind a load balancer (which is required for horizontal scaling and high availability), the in-memory LRU cache on each instance diverges immediately.

INSTANCE 1                     INSTANCE 2
LRU Cache:                     LRU Cache:
  Rajesh exposure = ₹9,20,000    Rajesh exposure = ₹8,80,000
  (updated 2 seconds ago)        (updated 4 seconds ago)

REALITY (PostgreSQL):
  Rajesh exposure = ₹9,45,000

Instance 1 received recent bets for Rajesh and updated its local cache. Instance 2 has an older cached value. A bet arriving at Instance 2 sees 8,80,000 and takes the FAST PATH. But the real exposure is 9,45,000 -- possibly within the safety margin zone where it should take the DB PATH.

With N instances, each maintaining an independent LRU cache with a 5-second TTL, the staleness problem is worse than the TTL alone suggests: each instance's cached value reflects only the writes that instance has processed, so it can lag reality even when it was refreshed moments ago. For popular agents during IPL, bets WILL be spread across all instances, so every instance's view is incomplete.

Recommended Approach: Redis as Effective Tier 1

The cleanest solution is to eliminate the per-instance LRU cache for exposure data and make Redis the first-tier cache for all exposure reads. Redis is shared across all instances, so there is no coherency problem.

What changes:

Data Type	Old Architecture	New Architecture
Exposure counters	LRU (5s) → Redis → PostgreSQL	Redis → PostgreSQL
Agent config/matrix	LRU (5min) → Redis → PostgreSQL	LRU (5min) → Redis → PostgreSQL (unchanged -- config is read-heavy, write-rare)
NO_NEW_RISK flags	Redis	Redis (unchanged)
User win cap state	Redis	Redis (unchanged)
Period boundaries	LRU (1hr)	LRU (1hr) (unchanged -- same on all instances)

Why this works: Exposure counters are the only data that is both write-heavy AND correctness-critical. By routing all exposure reads through Redis, every instance sees the same value. Redis reads are sub-millisecond (0.1-0.5ms), so the latency increase compared to the LRU cache (essentially zero latency) is negligible -- well within the 90ms budget.

Agent configuration, matrix rules, and period boundaries are safe to cache per-instance because they change rarely (admin actions, not bet flow) and a 5-minute or 1-hour staleness window is acceptable. When they DO change, a Redis pub/sub notification invalidates all instance caches (see below).

Config Cache Invalidation via Pub/Sub

For configuration data that IS cached per-instance (matrix rules, agent limits, period configs), changes must propagate to all instances. Each change is published as a pub/sub message with the following format:

Field	Description	Example
`type`	What changed	`MATRIX_UPDATE`, `LIMIT_UPDATE`, `PERIOD_UPDATE`, `USER_OVERRIDE`
`agent_id`	Which agent	`rajesh_mumbai`
`scope`	Which scope (if applicable)	`cricket`, `mi_vs_csk_2026_03_15`
`timestamp`	When the change was made	`2026-03-15T21:34:12.456Z`
`version`	New config version number	`47`

Each instance subscribes to the config.invalidate channel on startup. When a message arrives, the instance evicts the specified entries from its LRU cache. The next request for that data causes a cache miss, which fetches the fresh value from Redis or PostgreSQL.
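
The handler each instance runs when a message arrives could look roughly like this. It is an in-process sketch with no Redis client involved: the class name, the cache key layout, and the version check (which skips eviction when the cached entry is already at or past the announced version) are assumptions, not part of the message contract.

```python
import json


class ConfigCache:
    """Per-instance config cache keyed by (agent_id, type, scope)."""

    def __init__(self):
        self.entries = {}   # (agent_id, type, scope) -> (version, value)

    def put(self, agent_id, type_, scope, version, value):
        self.entries[(agent_id, type_, scope)] = (version, value)

    def on_invalidate(self, raw_message):
        """Evict the entry named by a config.invalidate message.

        Returns True if an entry was evicted.
        """
        msg = json.loads(raw_message)
        key = (msg["agent_id"], msg["type"], msg.get("scope"))
        cached = self.entries.get(key)
        # Assumption: ignore late messages for versions we already hold.
        if cached is not None and cached[0] < msg["version"]:
            del self.entries[key]
            return True
        return False
```

After an eviction, the next read for that key misses the local cache and falls through to Redis or PostgreSQL, which is where the new version is picked up.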

How This Interacts with the Safety Margin (Section 35)

With Redis as the effective Tier 1 for exposure data, the safety margin calculation from Section 35 becomes more accurate:

Redis is updated after every DB write (within the same request lifecycle)
The maximum staleness of a Redis exposure value is the time between one bet's DB write completing and the next bet's Redis read -- typically 1-5ms, not 5 seconds
This means the safety margin can be SMALLER, because the "expected bets per TTL" is now "expected bets per 5ms" instead of "expected bets per 5 seconds"

Revised safety margin with Redis as Tier 1:

Parameter	Old (LRU Tier 1)	New (Redis Tier 1)
Effective TTL for exposure	5,000ms	~5ms
Expected bets per TTL (Rajesh at 0.8/sec)	4	0.004
Dynamic margin	8,500 x 4 = 34,000	8,500 x 0.004 = 34
Effective safety margin	max(50,000, 34,000) = 50,000	max(50,000, 34) = 50,000

The fixed minimum margin of 50,000 dominates in both cases, but the key insight is that with Redis as Tier 1, the FAST PATH is safe for a much wider range. The probability of a race condition breaching the safety margin drops from "possible during normal operation" to "essentially impossible unless Redis itself is partitioned."

Deployment Topology

Redis Failure Mode

If Redis becomes unavailable, the system falls back to PostgreSQL for ALL exposure reads. This increases latency (from <1ms to 5-15ms per read) but maintains correctness. The circuit breaker pattern detects Redis unavailability within 3 failed requests and switches all instances to DB-direct mode. When Redis recovers, instances resume using it after a health check confirms 3 consecutive successful reads.
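
The switch between Redis and DB-direct mode can be modelled as a two-state circuit breaker. The sketch below assumes the "3 failed requests" are consecutive failures, which this section does not spell out; the class and method names are hypothetical.

```python
class RedisCircuitBreaker:
    """Tracks whether exposure reads should bypass Redis."""

    def __init__(self, failure_threshold=3, recovery_threshold=3):
        self.failure_threshold = failure_threshold
        self.recovery_threshold = recovery_threshold
        self.failures = 0        # consecutive failed Redis requests
        self.successes = 0       # consecutive successful health checks
        self.db_direct = False   # True while Redis is bypassed

    def record_request(self, ok):
        """Record the outcome of a normal Redis request."""
        if self.db_direct:
            return
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.failure_threshold:
            self.db_direct = True
            self.successes = 0

    def record_health_check(self, ok):
        """Record the outcome of a health-check read while in DB-direct mode."""
        if not self.db_direct:
            return
        self.successes = self.successes + 1 if ok else 0
        if self.successes >= self.recovery_threshold:
            self.db_direct = False
            self.failures = 0
```

A single failed health check resets the recovery counter, so Redis is only trusted again after three successful reads in a row.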

37. PostgreSQL Scaling Strategy (HIGH)

Projected Data Volumes for First IPL Season

An IPL season runs approximately 60 days with 70+ matches. Here are the projected volumes:

Table	Rows per Day (Normal)	Rows per Day (IPL Peak)	Total After First Season	Row Size (avg)	Total Size
bets	50,000	3,00,000	90,00,000	500 bytes	~4.5 GB
positions	1,50,000	9,00,000	2,70,00,000	400 bytes	~10.8 GB
exposure_ledger	5,000 (updates, not new rows)	20,000	50,000 (rows, updated in place)	200 bytes	~10 MB
audit_trail	50,000	3,00,000	90,00,000	2,000 bytes	~18 GB
settlements	1,50,000	9,00,000	2,70,00,000	300 bytes	~8.1 GB
forwarding_matrix_rules	Rare writes	Rare writes	~50,000	300 bytes	~15 MB
TOTAL					~41.4 GB

The database is not enormous by modern standards, but the write contention during peak hours is the real challenge. During the IPL final, the positions table could see 150 inserts per second, and the exposure_ledger table could see 500 updates per second (because each bet updates multiple agents' ledgers).

Partitioning Strategy

Primary partition key: time (monthly range partitioning)

This is the most effective strategy because:

Most queries are time-bounded (today's bets, this week's settlements, last month's audit trail)
Old partitions become read-only and can be moved to cheaper storage
Partition pruning eliminates scanning old data for real-time queries
Individual partitions stay small enough for efficient indexing

bets table partitions:
  bets_2026_01  (January 2026)
  bets_2026_02  (February 2026)
  bets_2026_03  (March 2026 - IPL starts)
  bets_2026_04  (April 2026 - IPL peak)
  bets_2026_05  (May 2026 - IPL ends)
  ...

positions table partitions:
  positions_2026_01
  positions_2026_02
  ...

audit_trail table partitions:
  audit_trail_2026_01
  audit_trail_2026_02
  ...
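
Monthly partitions following this naming scheme can be generated rather than written by hand. The helper below is a sketch: it assumes the parent table was created with PostgreSQL declarative range partitioning on a timestamp column (the partition key column is not named here), and the function name is hypothetical.

```python
from datetime import date


def monthly_partition_ddl(table, year, month):
    """Build the CREATE TABLE statement for one monthly range partition."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{table}_{year}_{month:02d}"
    return (
        f"CREATE TABLE {name} PARTITION OF {table} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )


print(monthly_partition_ddl("bets", 2026, 3))
```

Creating the next month's partitions ahead of time, for example from a scheduled job, avoids insert failures at the month boundary.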

Secondary partition consideration: by agent (for very large agents)

If a single agent like Vikram (with 12 sub-agents and thousands of punters) generates disproportionate volume, the positions table can be further sub-partitioned by agent_id using hash partitioning. This is only needed if a single monthly partition exceeds 10 GB for the positions table, which is unlikely in the first season but should be planned for.

Separate Write-Optimized Store for Audit Records

The audit_trail table is append-only and write-heavy. It should be separated from the transactional tables:

Characteristic	Transactional Tables (bets, positions, exposure_ledger)	Audit Store (audit_trail)
Write pattern	Insert + Update	Append-only
Read pattern	Point lookups, range scans by time/agent	Full-record retrieval by bet_id, range scans for disputes
Consistency requirement	Strong (part of bet transaction)	Eventual (can lag by up to 500ms)
Index requirements	Heavy (multiple indexes for lookups)	Light (bet_id primary, agent_id + time for scanning)

Implementation: Audit records are buffered in an in-memory queue and flushed to a separate PostgreSQL schema (or a separate database instance if load warrants it) every 500ms. The audit write is NOT part of the bet placement transaction. If the audit flush fails, records are persisted to a local WAL (write-ahead log) file and retried.
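
A minimal sketch of the buffer-and-flush behaviour is shown below. It is illustrative only: the class name, the `sink` callable standing in for the audit database write, and the JSON-lines format of the fallback file are assumptions. The 500ms timer and the retry of records from the fallback file are not shown.

```python
import json


class AuditBuffer:
    """Buffers audit records in memory and flushes them in batches."""

    def __init__(self, sink, wal_path):
        self.sink = sink            # callable taking a list of records; may raise
        self.wal_path = wal_path    # local fallback file, one JSON record per line
        self.queue = []

    def record(self, entry):
        """Called from the bet path; never touches the database."""
        self.queue.append(entry)

    def flush(self):
        """Write the current batch to the sink. Returns True on success."""
        batch, self.queue = self.queue, []
        if not batch:
            return True
        try:
            self.sink(batch)
            return True
        except Exception:
            # The audit store is unavailable: persist locally for a later retry.
            with open(self.wal_path, "a", encoding="utf-8") as wal:
                for entry in batch:
                    wal.write(json.dumps(entry) + "\n")
            return False
```

Because the queue lives in memory, records buffered since the last flush would be lost if the process crashed; that trade-off follows from keeping the audit write outside the bet transaction.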

The separate audit store uses:

autovacuum_vacuum_cost_delay = 0 (aggressive vacuuming for append-only workload)
fillfactor = 100 (no space reserved for updates, since rows are never updated)
Minimal indexes: only bet_id (primary), and a composite on (agent_id, created_at)

Read Replicas for Dashboard Queries

WRITE PATH (bet processing):
  App Instance → PostgreSQL Primary (positions, ledgers, bets)

READ PATH (dashboards, reports):
  App Instance → PostgreSQL Read Replica 1 (real-time dashboards, exposure summary)
  Reporting Service → PostgreSQL Read Replica 2 (daily P&L, weekly settlement, analytics)
  Support Dashboard → PostgreSQL Read Replica 2 (dispute resolution, audit trail queries)

Replication lag tolerance:

Dashboard queries: 1 second lag is acceptable (dashboard refreshes every 2-5 seconds anyway)
Settlement queries: zero lag required (use primary)
Reporting queries: 30 second lag is acceptable

Connection Pool Management

Pool	Max Connections	Target	Purpose
`bet_processing`	20 per instance x 3 instances = 60	PostgreSQL Primary	Bet placement, exposure updates, position creation
`settlement`	10 per instance x 1 instance = 10	PostgreSQL Primary	Settlement processing (batch, lower concurrency)
`dashboard_read`	15 per instance x 3 instances = 45	Read Replica 1	Agent dashboards, real-time queries
`reporting_read`	10 per instance x 1 instance = 10	Read Replica 2	Reports, analytics, support tools
`audit_write`	5 per instance x 3 instances = 15	Audit DB	Audit trail flushing

Total connections to Primary: 70 (well within PostgreSQL's default max_connections of 100, with headroom for admin connections)

PgBouncer recommendation: Place PgBouncer in front of PostgreSQL Primary in transaction pooling mode. This allows the application to open more logical connections than physical database connections, which is critical during traffic spikes.

When to Consider Event Sourcing for Audit Trail

Event sourcing (storing every state change as an immutable event rather than mutating rows) is already partially described in Section 11 for configuration changes. For the full audit trail, event sourcing should be considered when:

The replay capability in Section 11 is used more than 10 times per week -- this indicates frequent disputes or compliance reviews, making a native event-sourced store more efficient than reconstructing state from audit records
Audit trail queries become a performance bottleneck on the main database -- event-sourced stores (like EventStoreDB or a Kafka topic with long-term retention) are optimized for append and sequential read
Regulatory requirements mandate immutable, tamper-proof audit trails -- an event store with cryptographic chaining provides stronger guarantees than a mutable PostgreSQL table

For the first IPL season: Use the PostgreSQL-based audit trail with append-only semantics and monthly partitioning. This is simpler to operate, easier to query, and sufficient for the projected volumes. Revisit event sourcing before the second season based on actual usage patterns.

38. Atomic Transaction Scaling (HIGH)

The Contention Problem

Every bet updates exposure ledgers for multiple agents atomically. Amit's bet touches Rajesh's ledger, Vikram's ledger, and the Platform's ledger -- all within a single PostgreSQL transaction. If another punter under Rajesh places a bet simultaneously, both transactions compete for a lock on Rajesh's ledger row.

With sharded counters (Section 15), contention is reduced by a factor of N (where N is the shard count). But the cross-agent atomicity requirement means the transaction must lock shards across multiple agents, which increases both the lock duration and the deadlock risk.

Contention Analysis: Vikram with 12 Sub-Agents at 5 Bets/Sec Each

VIKRAM'S TRAFFIC PROFILE
==========================
Sub-agents: 12
Bets per second per sub-agent: 5
Total bets per second touching Vikram's ledger: 60

Each bet's transaction:
  1. Lock sub-agent's exposure shard: ~2ms
  2. Lock Vikram's exposure shard: ~2ms
  3. Lock Platform's exposure shard: ~2ms
  4. Write positions (3 rows): ~5ms
  5. Commit: ~3ms
  Total lock duration: ~14ms

With 60 bets/sec and 14ms lock duration:
  Load on a SINGLE Vikram shard:
    60 bets/sec * 0.014 sec per bet = 0.84
    (on average 0.84 bets are inside their lock window at any instant,
     so the shard is close to saturation and most bets queue)

With 8 Vikram shards:
    60/8 = 7.5 bets/sec per shard
    7.5 * 0.014 = 0.105
    (about 10.5% load per shard -- acceptable)
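
The figures above are the arrival rate multiplied by the lock hold time, per shard. A minimal sketch that reproduces them and, under an added assumption of Poisson arrivals (not stated in this document), converts the load into the chance that another bet arrives on the same shard during one lock hold. The function names are illustrative.

```python
import math

def shard_load(bets_per_sec: float, lock_ms: float, shards: int = 1) -> float:
    """Average number of bets inside their lock window on one shard
    (arrival rate per shard x lock hold time)."""
    return (bets_per_sec / shards) * (lock_ms / 1000.0)

def p_overlap(bets_per_sec: float, lock_ms: float, shards: int = 1) -> float:
    """Probability that at least one other bet arrives on the same shard
    during one lock hold. Assumes Poisson arrivals and uniformly random
    shard selection; both are assumptions of this sketch."""
    return 1.0 - math.exp(-shard_load(bets_per_sec, lock_ms, shards))

if __name__ == "__main__":
    # Vikram's profile from the analysis above: 60 bets/sec, ~14ms per lock hold.
    for shards in (1, 8, 16):
        print(shards, round(shard_load(60, 14, shards), 3),
              round(p_overlap(60, 14, shards), 3))
```

The load figure is the one to watch: as it approaches 1.0 on a shard, queueing delay grows sharply, which is the case for a single unsharded ledger row here.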

The Tiered Atomicity Model

Not all agents need the same level of atomicity. The system uses three tiers:

Tier 1: Per-Level Atomicity (for hot agents like Vikram)

Instead of one cross-agent transaction, each level's ledger update is independent:

BET PROCESSING FOR HOT AGENTS
================================

Step 1: Process at Rajesh's level
  BEGIN TRANSACTION
    Lock Rajesh's exposure shard (random shard)
    Check Rajesh's limit
    Calculate Rajesh's retention vs forwarding
    Write Rajesh's position
    Update Rajesh's exposure shard
  COMMIT
  → Output: ₹4,000 forwarded to Vikram

Step 2: Process at Vikram's level (separate transaction)
  BEGIN TRANSACTION
    Lock Vikram's exposure shard (random shard)
    Check Vikram's limit
    Calculate Vikram's retention vs forwarding
    Write Vikram's position
    Update Vikram's exposure shard
  COMMIT
  → Output: ₹1,600 forwarded to Platform

Step 3: Process at Platform level (separate transaction)
  BEGIN TRANSACTION
    Lock Platform's exposure shard (random shard)
    Write Platform's position
    Update Platform's exposure shard
    Queue hedge order
  COMMIT

What happens if Step 2 fails after Step 1 succeeds?

Rajesh's position is created but Vikram's is not. The system enters a partial routing state. This is handled by:

A routing_status field on the bet: PARTIAL (not all levels processed)
A background retry job picks up partial bets within 1 second
The retry job completes the remaining levels
If retry also fails 3 times, the bet enters the dead letter queue
The dead letter queue handler can either complete the routing or reverse Rajesh's position

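The key insight: a partial routing state is not dangerous. Rajesh has already retained his portion. The forwarded amount simply has not been allocated to Vikram yet. The exposure is "in transit" -- it is neither overcounted nor undercounted at the system level, because the total stake is still ₹10,000. The one gap is that Vikram's limit check cannot see the in-transit ₹4,000 until the retry completes, which is why the retry job picks up partial bets within 1 second.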
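
The partial-routing handling described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the production design: `Bet`, `process_levels`, `retry_partial`, and the `FAILED` status are hypothetical names, and `commit_level` stands in for one per-level database transaction.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_RETRIES = 3  # from the text: dead letter queue after 3 failed retries

@dataclass
class Bet:
    bet_id: str
    levels: List[str]                # e.g. ["Rajesh", "Vikram", "Platform"]
    next_level: int = 0              # index of the first unprocessed level
    retries: int = 0
    routing_status: str = "PENDING"  # PENDING | PARTIAL | COMPLETE | DEAD_LETTER | FAILED

def process_levels(bet: Bet, commit_level: Callable[[str], None]) -> None:
    """Run one transaction per level, resuming at the first unprocessed level."""
    try:
        while bet.next_level < len(bet.levels):
            commit_level(bet.levels[bet.next_level])  # separate transaction
            bet.next_level += 1
        bet.routing_status = "COMPLETE"
    except Exception:
        # Levels already committed stay committed; the rest is "in transit".
        # FAILED (nothing committed yet) is a hypothetical status for this sketch.
        bet.routing_status = "PARTIAL" if bet.next_level > 0 else "FAILED"

def retry_partial(bet: Bet, commit_level: Callable[[str], None]) -> None:
    """Background retry: finish the remaining levels or give up after MAX_RETRIES."""
    while bet.routing_status == "PARTIAL" and bet.retries < MAX_RETRIES:
        bet.retries += 1
        process_levels(bet, commit_level)
    if bet.routing_status == "PARTIAL":
        bet.routing_status = "DEAD_LETTER"
```

Because `next_level` only advances after a successful commit, a retry never re-applies a level that already succeeded, which is what keeps the retained amounts from being double counted.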

Tier 2: Cross-Level Atomicity (for normal agents)

For agents with moderate traffic (under 10 bets/sec touching their ledger), the original single-transaction approach works fine. One transaction locks all relevant shards across all levels, writes all positions, and commits atomically.

Tier 3: Eventually Consistent (for the Platform level)

The Platform is the final level in every cascade. It receives the most traffic (every bet eventually reaches the Platform). The Platform's ledger can be updated asynchronously:

The bet processing pipeline creates positions for all agent levels synchronously
The Platform's exposure ledger is updated via an async counter increment in Redis
A background job reconciles the Redis counter with the PostgreSQL ledger every 5 seconds
The Platform's hedge queue reads from the Redis counter for real-time decisions

This works because the Platform has the deepest pockets and the highest limits. A 5-second lag in the Platform's exposure ledger does not create meaningful risk. The Platform's limits are set with sufficient headroom to absorb any lag.

Agent Classification for Atomicity Tier

Agent Characteristic	Atomicity Tier	Shard Count	Reasoning
Top-level agent with 10+ sub-agents	Tier 1 (per-level)	16 shards	Highest contention, needs maximum parallelism
Mid-level agent with 3-9 sub-agents	Tier 2 (cross-level)	8 shards	Moderate contention, single transaction still viable
Leaf agent with direct punters only	Tier 2 (cross-level)	4 shards	Low contention, simplest approach
Platform	Tier 3 (eventual)	32 shards	Highest throughput, can tolerate lag
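
The classification table can be expressed as a lookup. A minimal sketch, assuming the sub-agent count is the only input; the table does not say how agents with 1-2 sub-agents are treated, so this sketch groups them with leaf agents (an assumption). `AtomicityPlan` and `classify` are illustrative names.

```python
from typing import NamedTuple

class AtomicityPlan(NamedTuple):
    tier: int    # 1 = per-level, 2 = cross-level, 3 = eventually consistent
    shards: int

def classify(sub_agent_count: int, is_platform: bool = False) -> AtomicityPlan:
    """Map an agent to an atomicity tier and shard count per the table above."""
    if is_platform:
        return AtomicityPlan(tier=3, shards=32)
    if sub_agent_count >= 10:
        return AtomicityPlan(tier=1, shards=16)
    if sub_agent_count >= 3:
        return AtomicityPlan(tier=2, shards=8)
    # Leaf agents; agents with 1-2 sub-agents are grouped here by assumption.
    return AtomicityPlan(tier=2, shards=4)
```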

Optimized Locking Strategy for Vikram

Deadlock prevention: Levels are always processed in the same order (Level 1 first, then Level 2, then Level 3, etc.). For hot agents, each level is also a separate transaction that holds a single shard lock, so no transaction ever waits for one lock while holding another, and deadlocks are structurally impossible. Two bets from different sub-agents under Vikram might contend on the same Vikram shard, but they will never hold conflicting locks across levels.

39. Audit Trail Storage Architecture (MEDIUM)

Hot/Warm/Cold Storage Tiers

Tier	Age of Data	Storage	Indexes	Query Latency Target	Cost
HOT	0-7 days	PostgreSQL Primary (same DB, audit schema)	Full indexes on bet_id, agent_id, user_id, created_at, event_id	P99 < 50ms	Highest
WARM	7-90 days	PostgreSQL (separate tablespace on slower disk, or read replica)	Reduced indexes: bet_id, agent_id + created_at composite only	P99 < 500ms	Medium
COLD	90+ days	Compressed Parquet files on object storage (S3-compatible or local NAS)	External index in PostgreSQL (bet_id → file + offset mapping)	P99 < 5 seconds	Lowest

Retention Policies

Data	Hot Retention	Warm Retention	Cold Retention	Total Retention
Audit trail records	7 days	90 days	3 years	3 years
Bet records	30 days	1 year	5 years	5 years
Position records	30 days	1 year	5 years	5 years
Settlement records	30 days	1 year	7 years	7 years (regulatory)
Exposure ledger snapshots	7 days (hourly snapshots)	90 days (daily snapshots)	1 year	1 year

The Append-Only Audit Store

The audit trail uses an append-only design. Records are never updated or deleted in place. Corrections or amendments are stored as new records that reference the original.

AUDIT RECORD STRUCTURE
========================

Each record contains:
  - record_id (UUID, primary key)
  - bet_id (UUID, indexed)
  - record_type: BET_PLACED, BET_SETTLED, POSITION_CORRECTED, CONFIG_CHANGED
  - agent_id (indexed)
  - user_id (indexed)
  - event_id (indexed -- for looking up all bets on a specific match)
  - created_at (indexed, partition key)
  - payload (JSONB -- the full structured audit data)
  - checksum (SHA-256 hash of the payload, for tamper detection)
  - previous_checksum (hash of the previous record for this bet_id, creating a chain)

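The previous_checksum field creates a hash chain similar to a blockchain. Each new audit record for a given bet references the checksum of the previous record. This makes tampering detectable: if a record is modified, its recomputed checksum no longer matches the previous_checksum stored on the next record. For a change to invalidate every subsequent checksum as well, each checksum must be computed over the payload together with previous_checksum, not over the payload alone.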
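
A minimal sketch of the per-bet hash chain, using only the Python standard library. It assumes the checksum covers the payload plus previous_checksum (the record structure above hashes the payload only), and it assumes JSON with sorted keys as the canonical payload encoding. The function names are illustrative.

```python
import hashlib
import json
from typing import List, Optional

def checksum(payload: dict, previous_checksum: Optional[str]) -> str:
    """SHA-256 over previous_checksum + canonical JSON payload.
    Covering previous_checksum is an assumption of this sketch; it is what
    makes a change propagate to every later record in the chain."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(((previous_checksum or "") + body).encode("utf-8")).hexdigest()

def append_record(chain: List[dict], payload: dict) -> dict:
    """Append a new record linked to the last record for this bet."""
    prev = chain[-1]["checksum"] if chain else None
    record = {"payload": payload, "previous_checksum": prev,
              "checksum": checksum(payload, prev)}
    chain.append(record)
    return record

def first_invalid(chain: List[dict]) -> Optional[int]:
    """Return the index of the first record that fails verification, else None."""
    prev = None
    for i, rec in enumerate(chain):
        if rec["previous_checksum"] != prev:
            return i
        if rec["checksum"] != checksum(rec["payload"], prev):
            return i
        prev = rec["checksum"]
    return None
```

A verifier would run `first_invalid` over all records for a bet_id, ordered by created_at, before presenting them in a dispute.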

Indexing Strategy

Hot tier indexes (full):

Index	Columns	Purpose
Primary key	`record_id`	Unique lookup
bet_lookup	`bet_id`	Find all audit records for a specific bet
agent_time	`agent_id, created_at DESC`	Agent's recent activity, dispute resolution
user_time	`user_id, created_at DESC`	User's betting history
event_lookup	`event_id, created_at DESC`	All bets on a specific match
type_time	`record_type, created_at DESC`	Find all settlements, all corrections, etc.

Warm tier indexes (reduced):

Index	Columns	Purpose
Primary key	`record_id`	Unique lookup
bet_lookup	`bet_id`	Dispute resolution (most common warm-tier query)
agent_time	`agent_id, created_at DESC`	Historical agent queries

Cold tier indexes (external):

A separate mapping table in PostgreSQL:

Column	Type	Description
bet_id	UUID	The bet to look up
file_path	TEXT	Path to the Parquet file in object storage
row_offset	INTEGER	Row position within the file
month	DATE	Month partition of the cold data

Query Performance for Dispute Resolution

Common dispute queries and their targets:

Query	Expected Latency	Tier	How
"Show me bet XYZ's full audit trail"	P99 < 50ms	Hot (if recent)	Index lookup on bet_id
"Show me all of Rajesh's bets last night"	P99 < 200ms	Hot	Range scan on agent_id + created_at
"Show me all bets on MI vs CSK final"	P99 < 500ms	Hot	Range scan on event_id
"Show me bet XYZ from 2 months ago"	P99 < 500ms	Warm	Index lookup on bet_id in warm partition
"Show me bet XYZ from last year"	P99 < 5s	Cold	Lookup mapping table, fetch from object storage

Storage Cost Projections

Tier	Volume After Year 1	Storage Cost (approximate, cloud)	Total Annual
Hot (SSD, indexed)	18 GB (7 days of audit)	$0.25/GB/month	$54/year
Warm (HDD, partial indexes)	150 GB (83 days of audit)	$0.05/GB/month	$90/year
Cold (object storage, compressed)	50 GB (compressed from ~200 GB)	$0.01/GB/month	$6/year
Total			~$150/year

Storage costs are negligible. The real cost is in compute (query processing) and IOPS (index maintenance). The tiered approach ensures that the expensive hot tier stays small while the cheap cold tier absorbs the bulk.
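
The totals in the cost table follow from volume x monthly price x 12. A short check of that arithmetic, using only the figures from the table (the dictionary layout is illustrative):

```python
# (volume in GB, price in USD per GB per month), copied from the table above
TIERS = {
    "hot": (18, 0.25),
    "warm": (150, 0.05),
    "cold": (50, 0.01),
}

def annual_cost(gb: float, usd_per_gb_month: float) -> float:
    """Yearly storage cost for one tier."""
    return gb * usd_per_gb_month * 12

costs = {name: round(annual_cost(gb, price), 2) for name, (gb, price) in TIERS.items()}
total = round(sum(costs.values()), 2)

if __name__ == "__main__":
    print(costs, total)
```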

Tier Migration Job

A nightly job moves records between tiers:

NIGHTLY TIER MIGRATION JOB (runs at 4:00 AM IST)
==================================================

1. HOT → WARM migration:
   - SELECT records WHERE created_at < NOW() - INTERVAL '7 days'
   - INSERT into warm partition
   - DELETE from hot partition
   - Rebuild hot tier indexes (REINDEX CONCURRENTLY)
   - Expected duration: 2-5 minutes

2. WARM → COLD migration:
   - SELECT records WHERE created_at < NOW() - INTERVAL '90 days'
   - Export to Parquet file (one file per day, compressed)
   - Upload to object storage
   - INSERT mapping rows into cold_index table
   - DELETE from warm partition
   - Expected duration: 5-15 minutes

3. COLD expiry:
   - DELETE mapping rows WHERE month < NOW() - INTERVAL '3 years'
   - Delete corresponding Parquet files from object storage
   - Expected duration: < 1 minute
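
The cutoffs used by the migration job can be sketched as a pure function that a job (or a test) could call per record. This assumes the audit-trail retention values from the table above and approximates "3 years" as 1,095 days; `target_tier` and the `EXPIRED` label are illustrative.

```python
from datetime import datetime, timedelta, timezone

HOT_DAYS = 7            # audit trail hot retention
WARM_DAYS = 90          # audit trail warm retention
COLD_DAYS = 3 * 365     # "3 years", approximated in days (assumption)

def target_tier(created_at: datetime, now: datetime) -> str:
    """Return the tier a record should live in, given its age."""
    age = now - created_at
    if age <= timedelta(days=HOT_DAYS):
        return "HOT"
    if age <= timedelta(days=WARM_DAYS):
        return "WARM"
    if age <= timedelta(days=COLD_DAYS):
        return "COLD"
    return "EXPIRED"    # eligible for cold expiry
```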

40. Horizontal Scaling for the Cascade Engine (MEDIUM)

Partitioning by Top-Level Agent Subtree

The cascade engine processes bets through agent hierarchies. The natural partition boundary is the top-level agent subtree. All agents and punters under Vikram form one subtree; all agents and punters under another master agent form a separate subtree.

Why this partitioning works:

A bet from Amit (under Rajesh, under Vikram) ONLY touches Rajesh's and Vikram's ledgers at the agent level. It never touches Suresh's or Kumar's data. So Partition A can process independently of Partition B.
The Platform level is the convergence point, but it uses the eventually-consistent model from Section 38 (Tier 3 atomicity), so it does not create cross-partition locking.
Each partition can run on a separate application instance or thread pool, with its own Redis key space for exposure counters.

How Cross-Agent Detection Works Across Partitions

The syndicate detection problem from Section 13 requires cross-agent visibility. If a syndicate member bets through Rajesh (Partition A) and also through Arun (Partition B), neither partition alone can detect the correlation.

Solution: a separate detection service that reads from all partitions.

Each partition publishes a lightweight bet event to a shared Redis Stream after completing the bet. The event contains: user_id, agent_id, event_id, outcome, stake, timestamp. The cross-agent detection service consumes this stream and maintains a sliding window of recent bets, looking for correlation patterns.

The detection service does NOT participate in the bet processing pipeline. It runs asynchronously. If it detects a syndicate pattern, it publishes a flag that the relevant partitions pick up on their next bet from the flagged user.
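
A minimal sketch of the sliding-window consumer, kept independent of Redis so it can be tested in isolation. It uses the six event fields listed above. As a simplification it flags the same user_id betting the same outcome through two or more agents within the window; the real correlation patterns from Section 13 are not reproduced here, and the window length and agent threshold are assumptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Set, Tuple

@dataclass(frozen=True)
class BetEvent:
    user_id: str
    agent_id: str
    event_id: str
    outcome: str
    stake: int
    timestamp: float  # seconds

class CrossAgentDetector:
    def __init__(self, window_seconds: float = 300.0, min_agents: int = 2) -> None:
        self.window_seconds = window_seconds  # assumption: 5-minute window
        self.min_agents = min_agents          # assumption: flag at 2+ agents
        # (user_id, event_id, outcome) -> recent (timestamp, agent_id) pairs
        self._recent: Dict[Tuple[str, str, str], Deque[Tuple[float, str]]] = defaultdict(deque)

    def observe(self, ev: BetEvent) -> Optional[Set[str]]:
        """Record one event; return the set of agents involved if flagged."""
        window = self._recent[(ev.user_id, ev.event_id, ev.outcome)]
        window.append((ev.timestamp, ev.agent_id))
        # Drop entries that have fallen out of the sliding window.
        while window and window[0][0] < ev.timestamp - self.window_seconds:
            window.popleft()
        agents = {agent for _, agent in window}
        return agents if len(agents) >= self.min_agents else None
```

In the architecture described above, `observe` would be called for each entry read from the shared stream, and a non-empty result would be published as a flag for the relevant partitions.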

Load Balancing Strategy

Strategy	How It Works	When to Use
Agent-affinity routing	All bets for a given top-level subtree go to the same instance	Default strategy. Load balancer uses a consistent hash of the top-level agent_id
Overflow routing	If the assigned instance is overloaded (queue depth > threshold), bets overflow to any available instance	During traffic spikes on a single subtree (e.g., Vikram's agents during IPL final)
Hot-agent splitting	A single subtree is split across 2+ instances, with sub-agents assigned to different instances	When a single master agent's traffic exceeds one instance's capacity

The load balancer (nginx or application-level) maintains a routing table:

ROUTING TABLE
===============
Vikram subtree    → Instance 1 (primary), Instance 2 (overflow)
Suresh subtree    → Instance 2 (primary), Instance 3 (overflow)
Kumar subtree     → Instance 3 (primary), Instance 1 (overflow)
Platform cascade  → All instances (round-robin, eventually consistent)
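
Agent-affinity routing with overflow can be sketched as below. This uses a stable hash modulo the instance count, which is a simplified stand-in for a true consistent-hash ring (adding an instance would remap more subtrees than a ring would). The instance names and the queue-depth threshold are assumptions; the document does not specify them.

```python
import hashlib
from typing import Dict, Sequence

INSTANCES = ("instance-1", "instance-2", "instance-3")  # hypothetical names
OVERFLOW_QUEUE_DEPTH = 100  # assumption: the threshold is not given in the text

def primary_index(top_level_agent_id: str, n: int) -> int:
    """Stable hash of the top-level agent id (same result on every instance)."""
    digest = hashlib.sha256(top_level_agent_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

def route(top_level_agent_id: str, queue_depth: Dict[str, int],
          instances: Sequence[str] = INSTANCES) -> str:
    """Pick the primary instance, or its overflow neighbour if overloaded."""
    i = primary_index(top_level_agent_id, len(instances))
    primary = instances[i]
    overflow = instances[(i + 1) % len(instances)]
    if queue_depth.get(primary, 0) > OVERFLOW_QUEUE_DEPTH:
        return overflow
    return primary
```

Python's built-in `hash()` is avoided on purpose: it is randomized per process for strings, so two instances would disagree on the routing.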

Handling Agent Hierarchy Changes That Cross Partitions

When an agent moves from one master agent to another (e.g., Rajesh leaves Vikram and joins Suresh), the partition assignment changes:

Admin initiates the transfer via API
System sets Rajesh's status to TRANSFERRING -- new bets from Rajesh's punters are held rather than rejected (brief pause, typically 2-5 seconds)
All in-flight bets for Rajesh's punters complete on the old partition
Rajesh's exposure ledger state is frozen and serialized
Routing table is updated: Rajesh's punters now route to Suresh's partition
Rajesh's exposure state is loaded into the new partition
Rajesh's status is set to ACTIVE on the new partition
New bets resume

The transfer window (2-5 seconds of paused betting) is acceptable because hierarchy changes are rare admin operations, not real-time events. During the transfer, punters see "placing bet..." for a few extra seconds rather than an error.

Deployment Diagram

                          ┌─────────────────────┐
                          │   Load Balancer      │
                          │   (Agent-Affinity    │
                          │    Consistent Hash)  │
                          └─────────┬────────────┘
                                    │
            ┌───────────────────────┼───────────────────────┐
            │                       │                       │
  ┌─────────▼─────────┐  ┌─────────▼─────────┐  ┌─────────▼─────────┐
  │ Instance 1         │  │ Instance 2         │  │ Instance 3         │
  │ Vikram subtree     │  │ Suresh subtree     │  │ Kumar subtree      │
  │ + overflow for     │  │ + overflow for     │  │ + overflow for     │
  │   Kumar            │  │   Vikram           │  │   Suresh           │
  │                    │  │                    │  │                    │
  │ Cascade Engine     │  │ Cascade Engine     │  │ Cascade Engine     │
  │ Matrix Resolver    │  │ Matrix Resolver    │  │ Matrix Resolver    │
  │ Limit Checker      │  │ Limit Checker      │  │ Limit Checker      │
  └────────┬───────────┘  └────────┬───────────┘  └────────┬───────────┘
           │                       │                       │
           └───────────────────────┼───────────────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    │              │              │
            ┌───────▼──┐   ┌──────▼───┐   ┌─────▼────┐
            │  Redis    │   │ PG       │   │ Cross-   │
            │  Cluster  │   │ Primary  │   │ Agent    │
            │  (shared) │   │ + Replicas│  │ Detector │
            └──────────┘   └──────────┘   └──────────┘

41. Monitoring and Alerting System (MEDIUM)

Key Metrics per Pipeline Stage

Bet Processing Pipeline:

Stage	Metric	Collection Method	Alert Threshold
Request ingestion	`bet.requests_per_second`	Counter, per instance	> 200/sec (approaching capacity)
Request ingestion	`bet.request_parse_error_rate`	Counter	> 1% of requests
Validation	`bet.validation_failure_rate`	Counter	> 10% (potential attack)
User win cap check	`bet.win_cap_latency_p99`	Histogram	> 15ms
Stake reduction	`bet.stake_reduction_rate`	Counter	> 20% of bets (limits may be too low)
Matrix resolution	`bet.matrix_resolve_latency_p99`	Histogram	> 20ms
Matrix resolution	`bet.matrix_cache_miss_rate`	Counter	> 30% (cache issue)
Agent cap check	`bet.cap_check_latency_p99`	Histogram	> 30ms per level
Exposure ledger	`bet.exposure_update_latency_p99`	Histogram	> 25ms
Position creation	`bet.position_write_latency_p99`	Histogram	> 20ms
Audit write	`bet.audit_write_latency_p99`	Histogram	> 15ms
End-to-end	`bet.total_latency_p99`	Histogram	> 90ms (SLA breach)
End-to-end	`bet.total_latency_p50`	Histogram	> 40ms (performance degradation)
End-to-end	`bet.success_rate`	Counter	< 99.5%

Hedge Execution Pipeline:

Metric	Collection Method	Alert Threshold
`hedge.queue_depth`	Gauge	> 50 orders (backlog building)
`hedge.execution_latency_p99`	Histogram	> 2 seconds
`hedge.betfair_api_latency_p99`	Histogram	> 1 second
`hedge.partial_fill_rate`	Counter	> 40% (liquidity problem)
`hedge.unhedged_exposure_total`	Gauge	> ₹10 lakh (risk accumulation)
`hedge.betfair_error_rate`	Counter	> 5% (API degradation)
`hedge.slippage_average`	Gauge	> 0.05 (pricing problem)

Settlement Pipeline:

Metric	Collection Method	Alert Threshold
`settlement.latency_p99`	Histogram	> 30 seconds per event
`settlement.failure_rate`	Counter	> 0.1% (any settlement failure is serious)
`settlement.idempotency_collision_rate`	Counter	> 0 (should be zero in normal operation)
`settlement.reconciliation_drift`	Gauge	> ₹1,000 (ledger mismatch)

Infrastructure Metrics:

Metric	Alert Threshold
`redis.latency_p99`	> 5ms
`redis.memory_usage_percent`	> 80%
`redis.connection_pool_exhaustion`	> 90%
`postgres.active_connections`	> 80% of max
`postgres.replication_lag_seconds`	> 5 seconds
`postgres.lock_wait_time_p99`	> 100ms
`postgres.dead_tuples_ratio`	> 20% (vacuum falling behind)

Alert Thresholds and Escalation Paths

ESCALATION MATRIX
==================

P1 - CRITICAL (immediate response required)
  Who:    On-call engineer (PagerDuty) + Engineering lead
  When:   24/7
  SLA:    Acknowledge within 5 minutes, resolve within 30 minutes
  Examples:
    - bet.total_latency_p99 > 200ms for 2 minutes
    - bet.success_rate < 95% for 1 minute
    - settlement.failure_rate > 0.1% for any settlement batch
    - settlement.reconciliation_drift > ₹1,000 for any agent
    - hedge.unhedged_exposure_total > ₹50 lakh
    - postgres primary down or unreachable
    - redis primary down or unreachable

P2 - HIGH (response within 1 hour)
  Who:    On-call engineer (Slack + PagerDuty)
  When:   Business hours + match hours
  SLA:    Acknowledge within 15 minutes, resolve within 2 hours
  Examples:
    - bet.total_latency_p99 > 90ms for 5 minutes
    - bet.matrix_cache_miss_rate > 50% for 5 minutes
    - hedge.betfair_api_latency_p99 > 2 seconds for 5 minutes
    - postgres.replication_lag_seconds > 30

P3 - MEDIUM (next business day)
  Who:    Engineering team (Slack channel)
  When:   Business hours
  SLA:    Acknowledge within 4 hours, resolve within 24 hours
  Examples:
    - bet.stake_reduction_rate > 30% for 1 hour
    - redis.memory_usage_percent > 80%
    - postgres.dead_tuples_ratio > 20%
    - audit.write_lag > 5 seconds

P4 - LOW (weekly review)
  Who:    Engineering team (weekly metrics review)
  Examples:
    - bet.total_latency_p50 trending upward over 7 days
    - hedge.slippage_average trending upward over 7 days
    - storage utilization approaching 70%

Dashboard Design for Ops Team

Dashboard 1: Real-Time Operations (primary display during matches)

┌─────────────────────────────────────────────────────────────────────────┐
│  HANNIBAL OPS DASHBOARD                              2026-03-15 21:47  │
├─────────────────────────┬───────────────────────────────────────────────┤
│  SYSTEM HEALTH          │  BET THROUGHPUT (last 5 min)                 │
│                         │                                               │
│  Bet Pipeline:  🟢 OK    │  ████████████████████░░  167/sec             │
│  Hedge Engine:  🟢 OK    │  Peak today:  203/sec (21:32)               │
│  Settlement:    🟢 OK    │  P99 latency: 72ms                          │
│  Redis:         🟢 OK    │  Success:     99.92%                         │
│  PostgreSQL:    🟢 OK    │                                               │
│  Betfair API:   🟡 SLOW  │  Error breakdown:                            │
│                         │    Validation: 12/min                         │
│                         │    Timeout:    0/min                           │
│                         │    DB Error:   0/min                           │
├─────────────────────────┼───────────────────────────────────────────────┤
│  EXPOSURE BY AGENT      │  HEDGE STATUS                                 │
│  (top 10 by %)          │                                               │
│                         │  Queue depth:    3 orders                     │
│  1. Rajesh   76% ████░  │  Unhedged total: ₹2.1L                      │
│  2. Vikram   54% ███░░  │  Betfair latency: 850ms ⚠                   │
│  3. Priya    41% ██░░░  │  Fill rate:       94%                         │
│  4. Sanjay   38% ██░░░  │  Avg slippage:    0.02                       │
│  5. Arun     22% █░░░░  │                                               │
│                         │  Last 10 hedges:                               │
│  NO_NEW_RISK active: 0  │    21:45 MI 1.85 → filled 1.86 ✓            │
│                         │    21:44 CSK 2.10 → filled 2.10 ✓            │
│                         │    21:43 Draw 3.50 → partial 60% ⚠           │
├─────────────────────────┴───────────────────────────────────────────────┤
│  ACTIVE ALERTS                                                          │
│                                                                         │
│  ⚠ 21:45  Betfair API latency elevated (850ms, threshold 500ms)       │
│           Status: Auto-monitoring, no action needed yet                 │
│                                                                         │
│  ✓ 21:30  Rajesh night limit at 84% - INFO (auto-resolved)            │
│                                                                         │
│  [View All Alerts]  [Silence Non-Critical]  [Run Health Check]         │
└─────────────────────────────────────────────────────────────────────────┘

Dashboard 2: Reconciliation & Financial (daily review)

Dashboard 3: Agent Health (support team view)

Specific Alert Definitions

Cache miss rate spike:

Field	Value
Alert name	`cache_miss_rate_spike`
Metric	`bet.matrix_cache_miss_rate` OR `bet.exposure_cache_miss_rate`
Condition	> 50% for 3 consecutive minutes
Severity	P2
Probable cause	Redis memory pressure, network partition, config invalidation storm
Auto-mitigation	None (requires investigation)
Run book action	Check Redis memory, check recent config change frequency, verify pub/sub connectivity

Betfair API latency:

Field	Value
Alert name	`betfair_api_degraded`
Metric	`hedge.betfair_api_latency_p99`
Condition	> 1 second for 2 consecutive minutes
Severity	P2 (escalate to P1 if > 5 seconds for 5 minutes)
Probable cause	Betfair infrastructure issue, network routing, API rate limiting
Auto-mitigation	Increase hedge retry delay, reduce concurrent hedge requests
Run book action	Check Betfair status page, check outbound network, verify API key validity

Exposure ledger drift:

Field	Value
Alert name	`exposure_ledger_drift`
Metric	`settlement.reconciliation_drift`
Condition	> ₹1,000 for any agent
Severity	P1 (financial accuracy issue)
Probable cause	Race condition in ledger update, missed settlement, partial transaction commit
Auto-mitigation	Trigger immediate recompute for the affected agent
Run book action	Run manual recompute, compare with position sum, identify the divergent bet

Settlement failure rate:

Field	Value
Alert name	`settlement_failure_elevated`
Metric	`settlement.failure_rate`
Condition	> 0.1% for any settlement batch
Severity	P1
Probable cause	DB connection exhaustion, data inconsistency, event result ambiguity
Auto-mitigation	Retry failed settlements 3 times with exponential backoff
Run book action	Check DB connections, inspect failed settlement IDs, verify event results
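
Several conditions above have the form "above a threshold for N consecutive minutes". A minimal sketch of that evaluation, assuming one metric sample per minute (the sampling interval is an assumption). The Betfair thresholds are taken from the `betfair_api_degraded` definition above; the function names are illustrative.

```python
from typing import List, Optional

def sustained_breach(samples: List[float], threshold: float, minutes: int) -> bool:
    """True if the most recent `minutes` samples all exceed the threshold.
    Assumes one sample per minute, newest last."""
    if len(samples) < minutes:
        return False
    return all(value > threshold for value in samples[-minutes:])

def betfair_severity(latency_p99_seconds: List[float]) -> Optional[str]:
    """Severity for betfair_api_degraded, checking the escalation rule first."""
    if sustained_breach(latency_p99_seconds, threshold=5.0, minutes=5):
        return "P1"
    if sustained_breach(latency_p99_seconds, threshold=1.0, minutes=2):
        return "P2"
    return None
```

Requiring consecutive breaches keeps a single slow sample from paging the on-call engineer, at the cost of delaying the alert by the length of the window.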

Run Book Topics

Topic	What to Do
Redis primary down	System auto-falls back to PostgreSQL for all reads. Monitor bet latency (each exposure read will slow from <1ms to 10-20ms). Restart Redis. After restart, run `redis-warmup` script to repopulate exposure counters from PostgreSQL.
PostgreSQL primary down	Critical outage. All bet placement fails. Promote a read replica to primary (manual failover). Accept the risk of losing the last few seconds of unreplicated writes. After recovery, run reconciliation (Section 42).
Bet latency spike (P99 > 200ms)	Check PostgreSQL lock wait times. If elevated, identify the hot agent (likely the one with the most bets/sec) and increase their shard count temporarily. Check Redis latency. Check for long-running queries on the primary.
Betfair completely unreachable	Hedge queue will grow. Platform absorbs all hedge-intended risk. Monitor `hedge.unhedged_exposure_total`. If it exceeds ₹50 lakh, consider temporarily decreasing agents' forward percentages (so that agents retain more and less risk reaches the Platform).
Single agent's exposure ledger diverges from position sum	Run `reconciliation recompute --agent=AGENT_ID`. Compare the recomputed total with the current ledger value. If they differ, the ledger is stale. Update the ledger to match the position sum. Investigate the root cause (check for missed settlement, partial commit).
Configuration change not propagating	Check Redis pub/sub channel. Verify all instances are subscribed. Manually trigger a cache flush on all instances via admin API endpoint `/admin/cache/flush?agent_id=X`.
Surge of stake reductions	Indicates user win limits are being hit frequently. Check if a single user is hammering the system (potential abuse). Check if limits were accidentally lowered. Review the agent's win cap configuration.

42. Reconciliation System (HIGH)

What Is Reconciled

The reconciliation system verifies that the exposure ledger (the fast-access counter that the bet processing pipeline reads) matches the actual sum of open positions. These are two independent sources of the same truth, and they can drift apart due to bugs, partial commits, or race conditions.

Reconciliation Check	Source A (Expected)	Source B (Actual)	Acceptable Drift
Agent retained exposure	`exposure_ledger.retained_open_liability`	SUM of `positions.liability` WHERE agent=X AND status=OPEN AND type=RETAINED	₹0 (zero tolerance)
Agent forwarded exposure	`exposure_ledger.forwarded_open_liability`	SUM of `positions.liability` WHERE agent=X AND status=OPEN AND type=FORWARDED	₹0 (zero tolerance)
Agent potential win	`exposure_ledger.open_potential_win`	SUM of `positions.potential_win` WHERE agent=X AND status=OPEN	₹0 (zero tolerance)
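Stake conservation	Original bet stake	SUM of retained positions for that bet across all levels, including the Platform (forwarded amounts are excluded, because each one is split again into retained + forwarded at the next level)	₹0 (absolute conservation)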
Settlement completeness	Count of positions for settled event	Count of settlement records for that event	0 (all positions must be settled)

The Reconciliation Job Workflow

How Discrepancies Are Flagged and Categorized

Each discrepancy record contains:

Field	Description	Example
discrepancy_id	Unique identifier	`disc_a1b2c3`
agent_id	Affected agent	`rajesh_mumbai`
scope	Which scope diverged	`cricket_night_2026_03_15`
ledger_value	What the exposure ledger says	₹15,00,000
computed_value	What the position sum says	₹13,50,000
drift_amount	The difference	₹1,50,000
drift_direction	`LEDGER_HIGH` or `LEDGER_LOW`	`LEDGER_HIGH`
detected_at	When it was found	`2026-03-15T22:00:00Z`
detection_method	Which reconciliation job found it	`SCHEDULED_15MIN`
category	`MINOR`, `MAJOR`, `CRITICAL`	`CRITICAL`
resolution_status	`OPEN`, `INVESTIGATING`, `RESOLVED`, `AUTO_CORRECTED`	`OPEN`
root_cause	Filled in during investigation	`Partial commit on bet_xyz at 21:47`
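
The drift fields in the discrepancy record can be derived from the two source values. A minimal sketch, assuming amounts are integers in paisa to avoid rounding; `Drift` and `compare` are illustrative names. The MINOR/MAJOR/CRITICAL category is left out because its thresholds are not defined in this section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Drift:
    agent_id: str
    scope: str
    ledger_value: int     # paisa
    computed_value: int   # paisa
    drift_amount: int     # absolute difference, paisa
    drift_direction: str  # ZERO | LEDGER_HIGH | LEDGER_LOW

def compare(agent_id: str, scope: str, ledger_paisa: int, computed_paisa: int) -> Drift:
    """Compare the exposure ledger (Source A) with the position sum (Source B)."""
    diff = ledger_paisa - computed_paisa
    if diff == 0:
        direction = "ZERO"
    elif diff > 0:
        direction = "LEDGER_HIGH"
    else:
        direction = "LEDGER_LOW"
    return Drift(agent_id, scope, ledger_paisa, computed_paisa, abs(diff), direction)
```

With the example values from the table, `compare("rajesh_mumbai", "cricket_night_2026_03_15", 150_000_000, 135_000_000)` yields a `LEDGER_HIGH` drift of 15,000,000 paisa (₹1,50,000).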

The Manual Recompute Tool

The recompute tool is the primary remediation mechanism. It reconstructs the exposure ledger value from scratch by summing all open positions.

RECOMPUTE PROCEDURE
=====================

Command:  reconciliation recompute --agent=rajesh_mumbai --scope=cricket

Step 1: Acquire advisory lock on agent+scope (prevents concurrent bets from modifying positions)
Step 2: SELECT COALESCE(SUM(liability), 0) as retained FROM positions
        WHERE agent_id='rajesh_mumbai' AND sport='cricket'
        AND status='OPEN' AND position_type='RETAINED'
Step 3: SELECT COALESCE(SUM(liability), 0) as forwarded FROM positions
        WHERE agent_id='rajesh_mumbai' AND sport='cricket'
        AND status='OPEN' AND position_type='FORWARDED'
Step 4: SELECT COALESCE(SUM(potential_win), 0) as potential_win FROM positions
        WHERE agent_id='rajesh_mumbai' AND sport='cricket'
        AND status='OPEN'
Step 5: Compare with current ledger values
Step 6: If different:
        UPDATE exposure_ledger SET
          retained_open_liability = [computed retained],
          forwarded_open_liability = [computed forwarded],
          open_potential_win = [computed potential_win]
        WHERE agent_id='rajesh_mumbai' AND scope='cricket'
Step 7: Log the correction with before/after values
Step 8: Update Redis with new ledger values (while the lock is still held)
Step 9: Release advisory lock

Duration: 1-5 seconds per agent per scope
Impact: Agent's bets are briefly delayed (advisory lock held), not rejected
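
The core of the recompute procedure can be exercised against an in-memory SQLite database. This is a sketch only: SQLite stands in for PostgreSQL, the advisory lock and the Redis update are omitted, and the table layouts are reduced to the columns named in the procedure. `COALESCE` is used so that an agent with no open positions recomputes to 0 rather than NULL.

```python
import sqlite3
from typing import Optional

def recompute(conn: sqlite3.Connection, agent_id: str, sport: str) -> dict:
    """Rebuild one agent's ledger row from the sum of open positions.
    Locking is omitted; the advisory lock has no SQLite equivalent."""
    def total(column: str, position_type: Optional[str] = None) -> int:
        # `column` comes only from the fixed names below, never from user input.
        sql = (f"SELECT COALESCE(SUM({column}), 0) FROM positions "
               "WHERE agent_id = ? AND sport = ? AND status = 'OPEN'")
        params = [agent_id, sport]
        if position_type is not None:
            sql += " AND position_type = ?"
            params.append(position_type)
        return conn.execute(sql, params).fetchone()[0]

    computed = {
        "retained_open_liability": total("liability", "RETAINED"),
        "forwarded_open_liability": total("liability", "FORWARDED"),
        "open_potential_win": total("potential_win"),
    }
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE exposure_ledger SET retained_open_liability = ?, "
            "forwarded_open_liability = ?, open_potential_win = ? "
            "WHERE agent_id = ? AND scope = ?",
            (computed["retained_open_liability"],
             computed["forwarded_open_liability"],
             computed["open_potential_win"], agent_id, sport),
        )
    return computed
```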

Tracking Drift Over Time to Detect Systemic Bugs

Every reconciliation result is stored in a time-series table:

Column	Type	Description
check_id	UUID	Unique identifier
agent_id	TEXT	Agent
scope	TEXT	Scope (sport, market, period)
checked_at	TIMESTAMP	When the check ran
ledger_value	BIGINT	Ledger amount in paisa
computed_value	BIGINT	Position sum in paisa
drift_amount	BIGINT	Difference in paisa
drift_direction	TEXT	`ZERO`, `LEDGER_HIGH`, `LEDGER_LOW`

A weekly analysis job examines this table for patterns:

Same agent drifting repeatedly: Indicates a bug in that agent's specific configuration or traffic pattern
All agents drifting in the same direction: Indicates a systemic bug in the exposure update logic
Drift correlating with high traffic periods: Indicates a race condition that manifests under load
Drift correlating with settlement batches: Indicates a bug in the settlement ledger decrement logic
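
The first two patterns above can be detected with simple counting over the time-series table. A minimal sketch, assuming each check is reduced to an (agent_id, drift_direction) pair; the repeat threshold of 3 is an assumption, and the traffic and settlement correlations are not covered here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

REPEAT_THRESHOLD = 3  # assumption: drifts per week before an agent is flagged

def analyze(checks: Iterable[Tuple[str, str]]) -> dict:
    """checks: (agent_id, drift_direction) pairs from one week of results."""
    drifting = [(agent, direction) for agent, direction in checks if direction != "ZERO"]
    per_agent = Counter(agent for agent, _ in drifting)
    directions = {direction for _, direction in drifting}
    # Systemic: more than one agent drifted, and all in the same direction.
    systemic: Optional[str] = None
    if len(per_agent) > 1 and len(directions) == 1:
        systemic = next(iter(directions))
    return {
        "repeat_agents": sorted(a for a, n in per_agent.items() if n >= REPEAT_THRESHOLD),
        "systemic_direction": systemic,
    }
```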

Walk Through: Rajesh's Ledger Shows 15 Lakh, Actual Positions Sum to 13.5 Lakh

Situation: The scheduled 15-minute reconciliation job runs at 10:00 PM. It finds that Rajesh's cricket retained_open_liability ledger reads ₹15,00,000, but the sum of all his open retained cricket positions is only ₹13,50,000. The drift is ₹1,50,000 (LEDGER_HIGH, CRITICAL category).

What happened (likely root cause):

At 9:47 PM, a settlement batch processed the MI vs RR match. Rajesh had ₹1,50,000 of retained positions on that match. The settlement correctly set those positions to status=SETTLED, but the ledger decrement failed (perhaps due to a transient database connection error). The settlement service logged the error and moved on. The positions are settled, but the ledger still thinks they are open.

Investigation steps:

INVESTIGATION LOG
==================

10:00 PM - Reconciliation detects drift: ₹15L ledger vs ₹13.5L positions
10:00 PM - P1 alert fired. Rajesh switched to DB-PATH only
10:02 PM - On-call engineer acknowledges

10:03 PM - Engineer runs: reconciliation investigate --agent=rajesh_mumbai --scope=cricket
           Output: "Ledger is ₹1,50,000 higher than position sum.
                    Last settlement batch at 9:47 PM settled 12 positions for MI vs RR.
                    Settlement records exist for all 12 positions.
                    Ledger decrement for MI vs RR settlement: NOT FOUND.
                    Root cause: Settlement marked positions SETTLED but failed to decrement ledger."

10:05 PM - Engineer runs: reconciliation recompute --agent=rajesh_mumbai --scope=cricket
           Output: "Ledger updated from ₹15,00,000 to ₹13,50,000.
                    Correction: -₹1,50,000.
                    Audit record created: recompute_abc123.
                    Redis updated."

10:06 PM - Engineer verifies Rajesh's dashboard shows correct exposure
10:06 PM - Rajesh switched back from DB-PATH only to normal caching
10:07 PM - P1 alert resolved with root cause documented

10:08 PM - Bug ticket created: "Settlement ledger decrement lacks retry logic.
           When the DB connection fails during decrement, the error is logged
           but the decrement is not retried. Fix: add retry with 3 attempts
           and dead letter queue for persistent failures."

43. Hedge Execution Engine (CRITICAL)

Design Overview

The hedge execution engine is responsible for placing bets on Betfair to offset the platform's retained risk. It is a separate service that consumes a hedge order queue and executes against the Betfair API.

Limit Order Placement with Configurable Max Slippage

Every hedge order is placed as a limit order on Betfair, not a market order. This prevents the platform from being filled at arbitrarily bad prices during volatile moments.

Parameter	Description	Default	Configurable Per
`max_slippage_ticks`	Maximum number of price ticks worse than the target price that the system will accept	3 ticks	Per sport, per market type
`target_price`	The price at which the punter's bet was accepted	From bet record	Per bet
`limit_price`	The worst price the system will accept: `target_price + max_slippage_ticks`	Computed	Computed
`order_size`	Amount to hedge in GBP equivalent	From cascade output	Per bet
`time_in_force`	How long the order stays active before cancellation	30 seconds for pre-match, 10 seconds for in-play	Per event phase

Example: Amit's bet on MI at 1.85, platform needs to hedge ₹800 (approx £7.50).

Target price:         1.85 (back MI)
Max slippage:         3 ticks
Betfair price ladder: 1.85, 1.86, 1.87, 1.88, 1.89, 1.90 ...
Limit price:          1.88 (3 ticks worse than 1.85)
Order:                BACK MI at 1.88 or better, size £7.50
Time in force:        30 seconds (pre-match)

If the best available price on Betfair is 1.86, the order fills at 1.86 (within slippage). If the best available price is 1.92, the order sits in the market at 1.88 for 30 seconds. If not filled, it is cancelled and re-evaluated.
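
A minimal sketch of the limit-price computation, following this section's convention that slippage moves up the price ladder from the target. The band table reflects the standard Betfair odds ladder as commonly documented (0.01 steps up to 2.0, 0.02 up to 3.0, and so on) and should be verified against Betfair's own reference; the function names are hypothetical.

```typescript
// Ladder bands as [upper bound, tick size], both in hundredths of a price
// unit so all arithmetic stays in integers. Assumed from the commonly
// documented Betfair ladder; verify before relying on it.
const LADDER: ReadonlyArray<readonly [number, number]> = [
  [200, 1], [300, 2], [400, 5], [600, 10], [1000, 20],
  [2000, 50], [3000, 100], [5000, 200], [10000, 500], [100000, 1000],
];

const MAX_PRICE_HUNDREDTHS = 100000; // 1000.00, the top of the ladder

function tickSizeAt(priceHundredths: number): number {
  for (const [upper, tick] of LADDER) {
    if (priceHundredths < upper) return tick;
  }
  throw new Error("price is at or above the ladder maximum");
}

// Moves `ticks` steps up the ladder from `price`, stopping at the maximum.
function addTicks(price: number, ticks: number): number {
  let hundredths = Math.round(price * 100);
  for (let i = 0; i < ticks && hundredths < MAX_PRICE_HUNDREDTHS; i++) {
    hundredths += tickSizeAt(hundredths);
  }
  return hundredths / 100;
}

// limit_price = target_price + max_slippage_ticks, expressed in ladder steps.
function limitPrice(targetPrice: number, maxSlippageTicks: number): number {
  return addTicks(targetPrice, maxSlippageTicks);
}
```

Counting in ladder steps rather than adding `ticks * 0.01` matters once the target sits near a band boundary: two ticks above 1.99 is 2.02, not 2.01.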

Partial Fill Tracking and Re-Pricing Strategy

Betfair often provides partial liquidity. The hedge engine must track partial fills and decide whether to pursue the remainder.

PARTIAL FILL EXAMPLE
=====================

Hedge order: BACK MI £100 at limit 1.88
Betfair response: Filled £60 at 1.86, £40 unmatched

State after partial fill:
  Hedged:    £60 at 1.86
  Unhedged:  £40 at limit 1.88 (still in market)

After 10 seconds (in-play time_in_force):
  Remaining £40 still unmatched

Decision tree:
  1. Current best price on Betfair: 1.91
  2. 1.91 > limit price 1.88 → cannot fill at current prices
  3. Is £40 worth re-pricing? £40 > £5 threshold → YES
  4. New limit price: 1.91 + 5 ticks = 1.96 (allow MORE slippage for the remainder)
  5. Place new order: BACK MI £40 at 1.96
  6. If this also partially fills or expires, repeat up to max_reprice_attempts (3)
  7. After 3 re-price attempts, accept the unhedged remainder as platform risk

Re-pricing rules:

Attempt	Slippage Allowed	Time in Force	Rationale
Initial	3 ticks	30s pre-match / 10s in-play	Conservative first attempt
Re-price 1	5 ticks from current market	20s pre-match / 5s in-play	More aggressive, shorter wait
Re-price 2	8 ticks from current market	10s pre-match / 3s in-play	Even more aggressive
Re-price 3 (final)	Market order equivalent (1000 ticks)	5s	Last resort -- get filled at any price
After all attempts	N/A	N/A	Accept as unhedged platform risk
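
The re-pricing rules above can be expressed as a small lookup plus a decision function. This is a sketch under assumptions: the names `REPRICE_SCHEDULE`, `nextAction`, and `MIN_REPRICE_GBP` are hypothetical, and the £5 threshold is taken from the partial fill example.

```typescript
type Phase = "PRE_MATCH" | "IN_PLAY";

interface RepriceStep {
  slippageTicks: number;             // ticks allowed from the current market
  tifSeconds: Record<Phase, number>; // time in force per event phase
}

// One entry per re-price attempt, in order (values from the rules table).
const REPRICE_SCHEDULE: ReadonlyArray<RepriceStep> = [
  { slippageTicks: 5, tifSeconds: { PRE_MATCH: 20, IN_PLAY: 5 } },
  { slippageTicks: 8, tifSeconds: { PRE_MATCH: 10, IN_PLAY: 3 } },
  { slippageTicks: 1000, tifSeconds: { PRE_MATCH: 5, IN_PLAY: 5 } }, // market-order equivalent
];

const MIN_REPRICE_GBP = 5; // remainders at or below this are not chased

type NextAction =
  | { kind: "REPRICE"; attempt: number; slippageTicks: number; tifSeconds: number }
  | { kind: "ACCEPT_UNHEDGED" };

// Decides what to do with an unmatched remainder after `attemptsSoFar`
// re-price attempts have already expired.
function nextAction(unmatchedGbp: number, attemptsSoFar: number, phase: Phase): NextAction {
  const step = REPRICE_SCHEDULE[attemptsSoFar];
  if (!step || unmatchedGbp <= MIN_REPRICE_GBP) {
    return { kind: "ACCEPT_UNHEDGED" };
  }
  return {
    kind: "REPRICE",
    attempt: attemptsSoFar + 1,
    slippageTicks: step.slippageTicks,
    tifSeconds: step.tifSeconds[phase],
  };
}
```

Keeping the schedule as data rather than branching logic makes it straightforward to tune per sport or market type later.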

Execution Quality Reporting

Every hedge order produces an execution quality record:

Field	Description	Example
hedge_order_id	Unique identifier	`hedge_x1y2z3`
bet_id	The originating bet	`bet_a1b2c3`
target_price	Price the punter received	1.85
achieved_price	Weighted average fill price	1.87
slippage	`achieved_price - target_price`	0.02
fill_rate	Percentage of order filled	85%
fill_time_ms	Time from order placement to the last fill	4,200ms
reprice_count	How many times the order was re-priced	1
unhedged_amount	Amount left unhedged	£6.25

Daily execution quality report:

HEDGE EXECUTION QUALITY - March 15, 2026
==========================================

Total hedge orders:          342
Fully filled:                287 (83.9%)
Partially filled:            41 (12.0%)
Unfilled (unhedged):         14 (4.1%)

Average slippage:            0.024 (2.4 ticks)
Worst slippage:              0.08 (8 ticks) -- KKR vs PBKS in-play
Best execution:              -0.01 (better than target, market moved in our favor)

Total intended hedge:        £12,400
Total actually hedged:       £11,650 (94.0%)
Total unhedged:              £750 (6.0%)

Slippage cost (vs perfect execution): £48.20

Unhedged Exposure Tracker

The unhedged exposure tracker maintains a real-time view of exposure that SHOULD be hedged but is NOT hedged, separate from deliberately retained risk.

UNHEDGED EXPOSURE DASHBOARD
==============================

Total unhedged:           ₹2,34,000

By reason:
  Betfair no liquidity:   ₹1,20,000 (3 orders)
  Betfair API timeout:    ₹45,000 (1 order, retrying)
  Slippage exceeded:      ₹69,000 (2 orders, re-pricing)

By event:
  MI vs CSK (Live):       ₹1,65,000 ⚠ (largest single event)
  RCB vs DC (Pre):        ₹42,000
  KKR vs SRH (Pre):       ₹27,000

Aging:
  < 1 minute:             ₹45,000 (still in progress)
  1-5 minutes:            ₹69,000 (re-pricing)
  5-30 minutes:           ₹1,20,000 ⚠ (no liquidity, monitor)
  > 30 minutes:           ₹0

Alert threshold:          ₹10,00,000 (₹10 lakh)
Current status:           🟢 Well below threshold

Stale Hedge Cleanup Process

Hedge orders that are older than a configurable threshold without being filled are considered stale. The cleanup process runs every 60 seconds:

STALE HEDGE CLEANUP (runs every 60 seconds)
=============================================

1. Find all hedge orders WHERE:
     status IN ('PENDING', 'PARTIALLY_FILLED')
     AND created_at < NOW() - stale_threshold

   Stale thresholds:
     Pre-match events:  5 minutes
     In-play events:    60 seconds
     Settled events:    Immediate (hedge is pointless)

2. For each stale order:
   a. Cancel the order on Betfair (if still open)
   b. Record the partial fill amount (if any)
   c. Mark order as STALE_CANCELLED
   d. Move the unhedged amount to the unhedged exposure tracker
   e. Fire alert if total unhedged exceeds threshold

3. For orders on settled events:
   a. Cancel immediately
   b. The platform already bears the outcome as retained risk
   c. No further action needed
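
Step 1 of the cleanup (selecting stale orders) could look like the sketch below. The `HedgeOrder` shape and function name are assumptions for illustration; cancellation on Betfair and the tracker update are left out.

```typescript
type EventPhase = "PRE_MATCH" | "IN_PLAY" | "SETTLED";
type HedgeStatus = "PENDING" | "PARTIALLY_FILLED" | "FILLED" | "STALE_CANCELLED";

// Hypothetical in-memory view of a hedge order row.
interface HedgeOrder {
  id: string;
  status: HedgeStatus;
  eventPhase: EventPhase;
  createdAtMs: number;
}

// Stale thresholds from the cleanup description, in milliseconds.
const STALE_THRESHOLD_MS: Record<"PRE_MATCH" | "IN_PLAY", number> = {
  PRE_MATCH: 5 * 60 * 1000,
  IN_PLAY: 60 * 1000,
};

function findStaleOrders(orders: HedgeOrder[], nowMs: number): HedgeOrder[] {
  return orders.filter((o) => {
    const open = o.status === "PENDING" || o.status === "PARTIALLY_FILLED";
    if (!open) return false;
    // Orders on settled events are stale immediately, regardless of age.
    if (o.eventPhase === "SETTLED") return true;
    // Mirrors: created_at < NOW() - stale_threshold
    return o.createdAtMs < nowMs - STALE_THRESHOLD_MS[o.eventPhase];
  });
}
```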

Queue Management for Hedge Orders During High Volume

During peak betting (IPL finals, 167 bets/sec), the hedge queue can receive roughly 50 orders per second. This assumes ~30% of bets forward stake to the platform and each such bet produces one hedge order for the hedge-targeted share (~50%) of that stake.

Queue Parameter	Value	Rationale
Queue technology	Redis Stream	Persistent, supports consumer groups, at-least-once delivery
Consumer count	4 concurrent consumers	Betfair API allows 5 req/sec per app key; 4 consumers with rate limiting
Rate limit	5 orders per second to Betfair	Betfair API throttle limit
Priority	In-play hedges prioritized over pre-match	In-play prices change faster; pre-match can wait
Batching	Aggregate multiple small hedges on the same selection into one order	₹800 + ₹600 + ₹400 on same MI back → single order of approx. £17
Max queue depth	200 orders	If exceeded, temporarily increase max_slippage and use more aggressive pricing
Deduplication	By bet_id + event_id + selection	Prevent double-hedging from retry logic
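
The batching and deduplication rows can be combined in one pass over the pending hedges. The sketch below is illustrative only: the `PendingHedge` shape, the key formats, and the function name are assumptions, and amounts are kept in paisa to avoid currency conversion here.

```typescript
// Hypothetical queue entry; not the actual Redis Stream payload.
interface PendingHedge {
  betId: string;
  eventId: string;
  selection: string;
  side: "BACK" | "LAY";
  amountPaisa: number;
}

interface HedgeBatch {
  betIds: string[];
  totalPaisa: number;
}

function batchBySelection(hedges: PendingHedge[]): Map<string, HedgeBatch> {
  const seen = new Set<string>();
  const batches = new Map<string, HedgeBatch>();
  for (const h of hedges) {
    // Deduplicate by bet_id + event_id + selection so retries are not hedged twice.
    const dedupKey = `${h.betId}|${h.eventId}|${h.selection}`;
    if (seen.has(dedupKey)) continue;
    seen.add(dedupKey);
    // Aggregate everything on the same selection and side into one order.
    const batchKey = `${h.eventId}|${h.selection}|${h.side}`;
    const batch = batches.get(batchKey) ?? { betIds: [], totalPaisa: 0 };
    batch.betIds.push(h.betId);
    batch.totalPaisa += h.amountPaisa;
    batches.set(batchKey, batch);
  }
  return batches;
}
```

With a 5 orders/sec ceiling towards Betfair, aggregation of this kind is what keeps the queue from growing when inbound volume is several times higher.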

Failover When Betfair Is Slow or Down

Health check: The hedge engine pings Betfair every 5 seconds with a lightweight listMarketBook call. Three consecutive failures (15 seconds) transition the engine to DOWN status. Three consecutive successes transition it back to HEALTHY. A single success while DOWN transitions it to DEGRADED.
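
The health transitions can be modelled as a pure function over probe results. This is a sketch: the state shape and `onProbe` name are hypothetical, and it assumes that fewer than three consecutive failures leave the current status unchanged, which the text does not specify.

```typescript
type Health = "HEALTHY" | "DEGRADED" | "DOWN";

interface HealthState {
  status: Health;
  consecutiveFailures: number;
  consecutiveSuccesses: number;
}

const THRESHOLD = 3; // consecutive probes needed for DOWN or HEALTHY

function onProbe(state: HealthState, ok: boolean): HealthState {
  if (ok) {
    const successes = state.consecutiveSuccesses + 1;
    let status: Health = state.status;
    if (successes >= THRESHOLD) {
      status = "HEALTHY";
    } else if (state.status === "DOWN") {
      status = "DEGRADED"; // first success after DOWN
    }
    return { status, consecutiveFailures: 0, consecutiveSuccesses: successes };
  }
  const failures = state.consecutiveFailures + 1;
  // Assumption: one or two failures do not change the status on their own.
  const status: Health = failures >= THRESHOLD ? "DOWN" : state.status;
  return { status, consecutiveFailures: failures, consecutiveSuccesses: 0 };
}
```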

Walk Through: Platform Needs to Hedge 5 Lakh on MI at 1.85, Only 2 Lakh Liquidity at 1.90

Scenario: Multiple bets have cascaded through the hierarchy. The platform needs to hedge ₹5,00,000 (approximately £4,700) on MI to win. The target price is 1.85. The Betfair order book shows:

BETFAIR ORDER BOOK - MI to win
================================
Back side (we need to back):
  1.86:  £1,200 available
  1.88:  £800 available
  1.90:  £2,000 available    ← combined: only £4,000 available to 1.90
  1.92:  £1,500 available
  1.95:  £3,000 available

Execution sequence:

Step 1: Place limit order BACK MI £4,700 at 1.88 (target 1.85 + 3 ticks slippage)
  Result: Filled £1,200 at 1.86 + £800 at 1.88 = £2,000 filled, £2,700 unmatched
  Status: PARTIALLY_FILLED (42.6%)

Step 2: Wait 10 seconds (in-play time_in_force)
  No additional fills at 1.88

Step 3: Re-price attempt 1
  Current best available: 1.90
  New limit: 1.90 + 5 ticks = 1.95
  Place: BACK MI £2,700 at 1.95
  Result: Filled £2,000 at 1.90 + £700 at 1.92 = £2,700 filled
  Status: FULLY_FILLED

Total execution:
  £1,200 at 1.86
  £800 at 1.88
  £2,000 at 1.90
  £700 at 1.92

Weighted average price: (1200×1.86 + 800×1.88 + 2000×1.90 + 700×1.92) / 4700 = 1.889

Execution quality:
  Target:    1.85
  Achieved:  1.889
  Slippage:  0.039 (3.9 ticks)
  Cost of slippage: £4,700 × 0.039 = £183.30 (approximately ₹19,500)

This ₹19,500 slippage cost is deducted from the hedge effectiveness and
reported in the daily execution quality report.
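
The execution quality figures in this walkthrough can be reproduced with a short calculation. The sketch below is illustrative; the `Fill` shape and function name are assumptions rather than the engine's actual types.

```typescript
interface Fill {
  sizeGbp: number;
  price: number;
}

interface ExecutionQuality {
  achievedPrice: number | null; // null when nothing filled
  slippage: number;             // achieved_price - target_price
  fillRate: number;             // 0..1
  unhedgedGbp: number;
  slippageCostGbp: number;      // filled size x slippage
}

function executionQuality(targetPrice: number, orderSizeGbp: number, fills: Fill[]): ExecutionQuality {
  const filled = fills.reduce((sum, f) => sum + f.sizeGbp, 0);
  if (filled === 0) {
    return { achievedPrice: null, slippage: 0, fillRate: 0, unhedgedGbp: orderSizeGbp, slippageCostGbp: 0 };
  }
  // Size-weighted average fill price.
  const achievedPrice = fills.reduce((sum, f) => sum + f.sizeGbp * f.price, 0) / filled;
  const slippage = achievedPrice - targetPrice;
  return {
    achievedPrice,
    slippage,
    fillRate: filled / orderSizeGbp,
    unhedgedGbp: orderSizeGbp - filled,
    slippageCostGbp: filled * slippage,
  };
}

const walkthrough = executionQuality(1.85, 4700, [
  { sizeGbp: 1200, price: 1.86 },
  { sizeGbp: 800, price: 1.88 },
  { sizeGbp: 2000, price: 1.9 },
  { sizeGbp: 700, price: 1.92 },
]);
```

Computed without intermediate rounding, the slippage cost comes to about £185; the walkthrough's £183.30 results from rounding slippage to 0.039 before multiplying.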

44. Migration and Backfill Strategy (MEDIUM)

Mapping Existing Flat B-Book Configs to Forwarding Matrix Rules

The existing codebase has bbookConfigService.ts with a flat B-Book percentage per agent. The migration maps each flat config to a forwarding matrix with a single catch-all rule.

MIGRATION MAPPING
==================

Existing config (for Rajesh):
  bbook_percentage: 60  (Rajesh keeps 60%)

Becomes forwarding matrix:
  | Rule | market_type | sport_type | event_phase | source_type | liquidity_band | Forward % |
  |------|-------------|------------|-------------|-------------|----------------|-----------|
  | R1   | *           | *          | *           | *           | *              | 40%       |

  Agent default forward: 40%

This is functionally identical to the existing behavior.

Migration steps:

For each agent with a bbook_percentage, create a forwarding_matrix_rules row with all wildcards and forward_percentage = (100 - bbook_percentage)
Set the agent's default_forward_percentage to the same value
Mark the migration as MIGRATED_FROM_FLAT in the agent's config for auditability
The agent's existing behavior is completely unchanged
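
A minimal sketch of the mapping in the steps above. The interfaces are hypothetical stand-ins for the real config and rule types; only the `100 - bbook_percentage` arithmetic and the `MIGRATED_FROM_FLAT` marker come from the migration description.

```typescript
// Hypothetical shape of the existing flat config.
interface FlatConfig {
  agentId: string;
  bbookPercentage: number; // share the agent keeps, 0-100
}

// Hypothetical shape of a forwarding matrix rule row.
interface MatrixRule {
  agentId: string;
  marketType: string;
  sportType: string;
  eventPhase: string;
  sourceType: string;
  liquidityBand: string;
  forwardPercentage: number;
}

interface MigrationResult {
  rule: MatrixRule;
  defaultForwardPercentage: number;
  migrationMarker: "MIGRATED_FROM_FLAT";
}

function migrateFlatConfig(cfg: FlatConfig): MigrationResult {
  if (cfg.bbookPercentage < 0 || cfg.bbookPercentage > 100) {
    throw new Error(`invalid bbook_percentage for ${cfg.agentId}: ${cfg.bbookPercentage}`);
  }
  const forward = 100 - cfg.bbookPercentage;
  return {
    // Single catch-all rule: every dimension is a wildcard.
    rule: {
      agentId: cfg.agentId,
      marketType: "*",
      sportType: "*",
      eventPhase: "*",
      sourceType: "*",
      liquidityBand: "*",
      forwardPercentage: forward,
    },
    defaultForwardPercentage: forward,
    migrationMarker: "MIGRATED_FROM_FLAT",
  };
}
```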

Handling Open Positions During Cutover

Open positions (bets placed before migration, not yet settled) must continue to work correctly under the new system.

Rule: Open positions are NOT re-routed. A bet that was placed under the old system retains its original routing. The new cascade engine only applies to new bets. This means:

Before cutover: freeze the list of all open bet IDs
During cutover: deploy the new code with the forwarding matrix enabled (behind feature flag)
After cutover: new bets go through the cascade engine; open bets settle using the positions that were created under the old system
Once all pre-cutover bets settle (typically within 1-3 days for cricket), the old routing logic can be removed

Parallel-Run Mode

Before switching any agent to the new cascade engine, run both engines in parallel and compare results:

PARALLEL-RUN MODE
==================

1. A bet arrives for an agent with parallel_run_mode = true

2. EXECUTE on OLD engine:
   - Route using flat bbook_percentage
   - CREATE real positions (this is what actually runs)
   - Record the routing decision as old_routing

3. EXECUTE on NEW engine (shadow mode):
   - Route using forwarding matrix + cascade
   - DO NOT create positions (shadow only)
   - Record the routing decision as new_routing

4. COMPARE:
   - Did the new engine produce the same retained amount for this agent? (for migrated flat configs, it should)
   - Did the cascade produce valid routing for upline agents?
   - Did any limit checks differ?
   - Record comparison result

5. REPORT:
   - Daily comparison report: X% of bets had identical routing, Y% differed
   - For differing bets, show why (new limits kicked in, matrix rule difference, etc.)
   - When agreement is 100% for 3 consecutive days: agent is ready for cutover
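
The compare and report steps reduce to a per-bet equality check and a daily agreement rate. The sketch below is an assumption-laden illustration: the record shape and function names are hypothetical, and a day with no bets is treated as not agreeing, since it proves nothing.

```typescript
// Hypothetical comparison record for one bet in parallel-run mode.
interface ShadowComparison {
  betId: string;
  oldRetainedPaisa: number; // what the flat engine actually retained
  newRetainedPaisa: number; // what the shadow cascade would have retained
}

function isIdentical(c: ShadowComparison): boolean {
  return c.oldRetainedPaisa === c.newRetainedPaisa;
}

// Fraction of a day's bets with identical routing. Returns 0 for an empty
// day (conservative assumption: no evidence is not agreement).
function agreementRate(day: ShadowComparison[]): number {
  if (day.length === 0) return 0;
  return day.filter(isIdentical).length / day.length;
}

// Ready when each of the last `requiredDays` days shows 100% agreement.
function readyForCutover(dailyRates: number[], requiredDays = 3): boolean {
  if (dailyRates.length < requiredDays) return false;
  return dailyRates.slice(-requiredDays).every((rate) => rate === 1);
}
```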

Per-Agent Rollback Plan

Each agent can be individually rolled back from the new cascade engine to the old flat routing:

Disable feature flag bbook.cascading_routing.enabled for the specific agent
New bets immediately revert to flat bbook_percentage routing
Positions created by the cascade engine remain valid and settle normally
The agent's forwarding matrix remains in the database (not deleted) for future re-enablement

Rollback does NOT require:

Database migration reversal
Deployment of old code
Reprocessing of any existing bets

Data Migration for Historical Positions and Settlements

Historical positions and settlements from the old system must be migrated to the new schema so that reporting and reconciliation work across the cutover boundary.

Old System Data	Migration Target	Mapping
Old position (flat)	`positions` table with `cascade_level = 1`	One position becomes one L1 position + one forwarded position at platform level
Old settlement	`settlements` table with `migration_source = 'v1'`	Direct mapping, no restructuring needed
Old agent config	`forwarding_matrix_rules` + `agent_limits`	As described in mapping section above
Old bet record	`bets` table with `routing_engine = 'v1'`	Direct copy with engine version flag

Migration is non-destructive: Old tables are renamed with _v1 suffix but NOT dropped until 90 days after successful cutover with no issues.

45. Support Tooling for Dispute Resolution (MEDIUM)

Bet Lookup

The support dashboard provides multiple paths to find a bet:

Lookup Method	Use Case	Query
By bet ID	"Show me bet XYZ"	Direct primary key lookup
By user + time range	"Show me Amit's bets last night"	user_id + created_at range
By agent + time range	"Show me all bets under Rajesh today"	agent_id + created_at range
By event	"Show me all bets on MI vs CSK"	event_id lookup
By amount	"Show me all bets over ₹1 lakh today"	stake > threshold + created_at range
By status	"Show me all unsettled bets from yesterday"	status=OPEN + created_at range

Audit Trail Visualization

For each bet, the support dashboard renders the cascade as a visual flow:

AUDIT TRAIL VISUALIZATION - Bet bet_a1b2c3d4
==============================================

Amit places ₹50,000 on MI at 1.85

    ┌──────────────────────────────────────────────────────┐
    │  AMIT (Punter)                                       │
    │  Stake: ₹50,000 → Reduced to ₹50,000 (no reduction) │
    │  Win cap check: ₹42,500 < ₹50,000 limit ✓           │
    └──────────────────────┬───────────────────────────────┘
                           │ ₹50,000
                           ▼
    ┌──────────────────────────────────────────────────────┐
    │  RAJESH (Level 1)                                    │
    │  Matrix rule: R3 (MATCH_ODDS + CRICKET + PRE_MATCH)  │
    │  Forward: 40% → Retains: ₹30,000                    │
    │  Limit check: Cricket ₹12.3L/₹50L (24.6%) ✓        │
    │  Limit check: Match ₹1.5L/₹5L (30%) ✓              │
    │  Limit check: Night ₹3.3L/₹10L (33%) ✓             │
    │  Overflow: ₹0                                        │
    │  RETAINED: ₹30,000 (₹25,500 liability)              │
    └──────────────────────┬───────────────────────────────┘
                           │ ₹20,000
                           ▼
    ┌──────────────────────────────────────────────────────┐
    │  VIKRAM (Level 2)                                    │
    │  Matrix rule: V2 (CRICKET + PRE_MATCH)               │
    │  Forward: 40% → Retains: ₹12,000                    │
    │  Source type: NORMAL (own classification)             │
    │  Limit checks: All ✓                                 │
    │  RETAINED: ₹12,000 (₹10,200 liability)              │
    └──────────────────────┬───────────────────────────────┘
                           │ ₹8,000
                           ▼
    ┌──────────────────────────────────────────────────────┐
    │  PLATFORM (Level 3)                                  │
    │  Retained: ₹4,000                                    │
    │  Hedged: ₹4,000 → Betfair order hedge_x1y2z3       │
    │  Hedge status: FILLED at 1.86 (slippage 0.01)       │
    └──────────────────────────────────────────────────────┘

Timeline:
  21:47:12.001  Bet received
  21:47:12.004  Win cap check passed
  21:47:12.014  Matrix resolved (R3, specificity 3)
  21:47:12.025  Rajesh limits checked
  21:47:12.036  Vikram limits checked
  21:47:12.048  Platform processed
  21:47:12.063  All positions created
  21:47:12.068  Audit record written
  21:47:12.070  Response sent to Amit
  Total: 69ms
  
  21:47:12.085  Hedge order queued
  21:47:13.200  Hedge order filled on Betfair
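
The amounts in the visualization follow from applying each level's forward percentage in turn. The sketch below reproduces them under simplifying assumptions: it ignores limit checks, overflow, and stake reduction, models the platform's hedge share as its "forward" percentage, and uses hypothetical type and function names.

```typescript
interface CascadeLevel {
  name: string;
  forwardPct: number; // share passed on (for the platform: share hedged)
}

interface LevelResult {
  name: string;
  receivedPaisa: number;
  retainedPaisa: number;
  forwardedPaisa: number;
  liabilityPaisa: number; // retained stake x (decimal odds - 1)
}

function cascade(stakePaisa: number, decimalOdds: number, levels: CascadeLevel[]): LevelResult[] {
  const results: LevelResult[] = [];
  let incoming = stakePaisa;
  for (const level of levels) {
    const forwardedPaisa = Math.round((incoming * level.forwardPct) / 100);
    const retainedPaisa = incoming - forwardedPaisa;
    results.push({
      name: level.name,
      receivedPaisa: incoming,
      retainedPaisa,
      forwardedPaisa,
      liabilityPaisa: Math.round(retainedPaisa * (decimalOdds - 1)),
    });
    incoming = forwardedPaisa; // the next level receives what this one forwards
  }
  return results;
}

// Amit's ₹50,000 bet at 1.85, in paisa.
const trail = cascade(5_000_000, 1.85, [
  { name: "RAJESH", forwardPct: 40 },
  { name: "VIKRAM", forwardPct: 40 },
  { name: "PLATFORM", forwardPct: 50 },
]);
```

Deriving the retained amount as `incoming - forwarded` (rather than rounding both sides independently) guarantees that no paisa is created or lost at any level.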

Re-Simulate Capability

The "re-simulate" button allows a support agent to replay a bet with the configuration state as it existed at the time of the bet:

Load the forwarding matrix rules that were active at the bet's timestamp (using the versioned config)
Load the exposure ledger state as it existed just before the bet (from the audit record)
Run the cascade engine with these inputs
Display the result alongside the actual result
If they match: the system behaved correctly
If they differ: flag the discrepancy for engineering investigation

Dispute Workflow

Stage	Description	Actions Available
OPEN	Dispute filed by agent or user	Assign to support agent, set priority, link to bet(s)
INVESTIGATING	Support agent reviewing	View audit trail, re-simulate bet, compare ledgers, add notes
PENDING_AGENT	Waiting for agent to provide information	Send request to agent, set response deadline
PENDING_ENGINEERING	Requires engineering investigation	Escalate to engineering, provide all context
RESOLVED_CORRECT	System was correct, dispute dismissed	Document finding, notify all parties
RESOLVED_CORRECTION	System was wrong, correction applied	Apply financial correction, update ledgers, notify all parties
RESOLVED_GOODWILL	System was correct, but goodwill credit given	Apply credit, document reason, notify agent

46. Responsible Gambling Controls (MEDIUM)

Self-Exclusion Mechanism

A punter can self-exclude for a configurable duration (24 hours, 7 days, 30 days, 6 months, permanent). Self-exclusion is the FIRST check in the bet processing pipeline.

Self-Exclusion Parameter	Description
Duration options	24h, 7d, 30d, 6m, permanent
Cooling-off period	Cannot reverse self-exclusion for the first 24 hours
Scope	All betting across all agents (cannot bet through any path)
Implementation	Redis flag checked before ANY processing (sub-millisecond check)
Admin override	Only permanent exclusions can be lifted by admin after 6 months, with verification
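
The duration options and cooling-off rule can be sketched as pure functions. Names are hypothetical, and the production check is described above as a Redis flag lookup; this only illustrates the date arithmetic, assuming "6m" means six calendar months in UTC.

```typescript
type ExclusionChoice = "24h" | "7d" | "30d" | "6m" | "permanent";

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the end timestamp in ms, or null for a permanent exclusion.
function exclusionEndMs(startMs: number, choice: ExclusionChoice): number | null {
  switch (choice) {
    case "24h":
      return startMs + DAY_MS;
    case "7d":
      return startMs + 7 * DAY_MS;
    case "30d":
      return startMs + 30 * DAY_MS;
    case "6m": {
      const end = new Date(startMs);
      end.setUTCMonth(end.getUTCMonth() + 6); // assumption: calendar months, UTC
      return end.getTime();
    }
    case "permanent":
      return null;
  }
}

// First check in the bet pipeline: is the punter currently excluded?
function isExcluded(startMs: number, choice: ExclusionChoice, nowMs: number): boolean {
  const end = exclusionEndMs(startMs, choice);
  return end === null || nowMs < end;
}

// Cooling-off: a reversal cannot be requested during the first 24 hours.
function canRequestReversal(startMs: number, nowMs: number): boolean {
  return nowMs - startMs >= DAY_MS;
}
```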

Deposit Limits

Limit Type	Description	Where in Pipeline
Daily deposit limit	Maximum deposit in 24 hours	Payment service (before funds reach betting wallet)
Weekly deposit limit	Maximum deposit in 7 days	Payment service
Monthly deposit limit	Maximum deposit in 30 days	Payment service

Deposit limits are NOT part of the bet processing pipeline. They are enforced at the payment layer. However, the B-Book system must be aware of them for display purposes (showing the user their remaining deposit capacity on the dashboard).

Session Time Limits

Feature	Description	Implementation
Session duration limit	Maximum continuous session time (configurable, default 4 hours)	WebSocket/session middleware sends warning at 80% of limit, auto-logs out at 100%
Mandatory break	Minimum break duration after session limit reached (default 15 minutes)	Session creation blocked for break_duration after limit-triggered logout
Activity tracker	Track time since last break, number of bets placed, amount wagered	In-memory per-session counter, persisted to DB every 5 minutes
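
The warning, logout, and mandatory break rules amount to two threshold checks. A minimal sketch, with hypothetical function names and the defaults taken from the table above:

```typescript
type SessionAction = "NONE" | "WARN" | "LOGOUT";

// Warn at 80% of the limit, log out at 100%. Integer comparison avoids
// floating-point edge cases at the 80% boundary.
function sessionAction(elapsedMinutes: number, limitMinutes = 240): SessionAction {
  if (elapsedMinutes >= limitMinutes) return "LOGOUT";
  if (elapsedMinutes * 5 >= limitMinutes * 4) return "WARN";
  return "NONE";
}

// Session creation is blocked until the mandatory break has elapsed.
// `lastLimitLogoutMs` is null when the user was never logged out by the limit.
function canStartSession(lastLimitLogoutMs: number | null, nowMs: number, breakMinutes = 15): boolean {
  return lastLimitLogoutMs === null || nowMs - lastLimitLogoutMs >= breakMinutes * 60_000;
}
```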

Reality Check Notifications

Reality check notifications are periodic messages that remind the punter of their activity during the session.

Trigger	Message Content	Delivery
Every 60 minutes of play	"You have been playing for 1 hour. Total bets: 23. Net result: -₹4,200."	In-app popup, requires acknowledgment to continue
After 10 consecutive losses	"You have had 10 consecutive losses. Consider taking a break."	In-app popup
After ₹50,000 total loss in session	"You have lost ₹50,000 this session. Your daily deposit limit is ₹1,00,000."	In-app popup with option to self-exclude
Approaching deposit limit	"You have deposited ₹80,000 of your ₹1,00,000 daily limit."	In-app notification

Where These Hooks Go in the Bet Flow Pipeline

Steps 0, 0.5, and 2 are the responsible gambling checkpoints. They add minimal latency (sub-millisecond for Redis flag checks, zero latency if no check is triggered) but ensure that gambling controls are enforced before any money flows.

Part III: Complete Implementation Architecture

The following sections provide the complete implementation specification for the entire Hannibal B-Book system. Every database table, every API endpoint, every pipeline step, every error case, and every deployment detail is documented. An LLM or developer reading this can build the entire system without asking a single question about design intent, data models, or processing logic.

47. Technology Stack (Confirmed)

Core Technologies

Technology	Version	Purpose	Why This Choice
Node.js	20 LTS	Application runtime	Event-loop model handles high concurrency with low overhead. The team already has expertise. Non-blocking I/O is ideal for the many Redis and DB calls in the bet pipeline.
TypeScript	5.x	Language	Type safety prevents entire categories of financial bugs (wrong types for money, missing fields). Domain types (Stake, Liability, ForwardPercentage) enforce correctness at compile time.
PostgreSQL	16	Primary database	ACID transactions for financial data. FOR UPDATE locking for limit enforcement. Partitioning for scaling. JSONB for flexible audit payloads.
Prisma	5.x	ORM	Type-safe database access. Schema-as-code for migrations. Works with PostgreSQL partitioning through raw queries where needed.
Redis	7.x	Cache, counters, queues, pub/sub	Sub-millisecond reads for exposure checks. Atomic INCRBY for sharded counters. Streams for hedge order queue. Pub/sub for cache invalidation.
Docker	Latest	Containerization	Consistent environments across development, staging, production. Docker Compose for local development.

Additional Technologies

Technology	Purpose	Why
BullMQ	Job queue for background tasks	Settlement processing, reconciliation jobs, audit tier migration, hedge retry. Built on Redis. Supports delayed jobs, retries, priority queues.
Prometheus + Grafana	Metrics and dashboards	Industry standard for monitoring. Prometheus scrapes application metrics. Grafana renders dashboards. AlertManager handles alert routing.
Pino	Structured logging	Fast JSON logger for Node.js. Structured logs are queryable. Low overhead even at high throughput.
Zod	Runtime validation	Validates all API inputs and configuration at runtime. Complements TypeScript compile-time types with runtime safety.
Socket.IO	WebSocket connections	Real-time dashboard updates to agents. Push notifications for alerts. Session management for responsible gambling.
node-cron	Scheduled jobs	Period rollovers, reconciliation scheduling, audit tier migration.
Helmet + cors	HTTP security	Standard security headers. CORS configuration for dashboard frontend.
prom-client	Prometheus metrics	Native Prometheus metrics collection for Node.js. Histograms, counters, gauges.

Not Included (and Why)

Technology	Why Not
Kafka	Overkill for current throughput (167 bets/sec). Redis Streams provide sufficient queue functionality with simpler operations. Reconsider at 1000+ bets/sec.
MongoDB	Financial data requires ACID transactions and relational integrity. PostgreSQL provides both.
GraphQL	The API consumers (dashboard frontend, mobile, WhatsApp bot) all have well-defined data needs. REST is simpler, faster, and sufficient.
Microservices (separate deployments per service)	For the first season, a modular monolith is simpler to deploy, debug, and operate. Services are separated in code (modules) but deployed as one application. Extract into microservices only when a specific module needs independent scaling.

48. System Architecture Overview

Complete System Diagram

Communication Patterns

From → To	Channel	Method	Pattern
Client → API	REST API	HTTP/JSON	Request-response, auth via JWT
Client → Dashboard	WebSocket	Socket.IO	Real-time push for exposure updates, alerts
API → Bet Processing	Function call	In-process	Synchronous (same monolith)
Bet Processing → Cascade Engine	Function call	In-process	Synchronous
Bet Processing → Hedge Queue	Redis Stream	Async publish	Fire-and-forget from bet pipeline
Hedge Worker → Betfair	HTTP	REST API	Rate-limited, with retry
Settlement → Bet Processing	BullMQ Job	Async	Settlement jobs queued when events settle
Config Change → All Instances	Redis Pub/Sub	Async broadcast	Cache invalidation messages
Reconciliation → Alert	BullMQ Job	Async	Discrepancy alerts queued for processing

Deployment Topology (Production)

PRODUCTION DEPLOYMENT
======================

Application Instances: 3 (behind load balancer)
  - Each runs the full modular monolith
  - Agent-affinity routing via consistent hash on agent_id header
  - Each instance: 2 vCPU, 4 GB RAM

Background Workers: 2
  - Instance 4: Settlement worker + Reconciliation worker
  - Instance 5: Hedge worker + Audit migration worker + Sharp detection worker
  - Each instance: 2 vCPU, 2 GB RAM

PostgreSQL:
  - Primary: 4 vCPU, 16 GB RAM, 500 GB SSD
  - Read Replica 1: 2 vCPU, 8 GB RAM (dashboard)
  - Read Replica 2: 2 vCPU, 8 GB RAM (reporting)
  - Audit DB: 2 vCPU, 4 GB RAM, 200 GB HDD (append-only)

Redis:
  - Primary: 2 vCPU, 4 GB RAM
  - Replica: 1 vCPU, 2 GB RAM (read-only)

Load Balancer: nginx or cloud ALB
Monitoring: Prometheus + Grafana (1 instance)

49. Database Schema Design

Entity-Relationship Diagram

Table: agents

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY, DEFAULT gen_random_uuid()	Unique agent identifier
external_id	VARCHAR(100)	UNIQUE, NOT NULL	Human-readable agent ID (e.g., `rajesh_mumbai`)
name	VARCHAR(255)	NOT NULL	Agent display name
parent_agent_id	UUID	REFERENCES agents(id), NULLABLE	Upline agent. NULL for top-level agents and platform
level	INTEGER	NOT NULL	Hierarchy depth. 0 = platform, 1 = master agent, 2 = sub-agent, etc.
status	VARCHAR(20)	NOT NULL, DEFAULT 'ACTIVE'	ACTIVE, SUSPENDED, TRANSFERRING, DEACTIVATED
timezone	VARCHAR(50)	NOT NULL, DEFAULT 'Asia/Kolkata'	Agent's local timezone (IANA format)
default_forward_percentage	DECIMAL(5,2)	NOT NULL, DEFAULT 50.00	Fallback forward % when no matrix rule matches
night_period_start	TIME	NULLABLE	Night period start in local time
night_period_end	TIME	NULLABLE	Night period end in local time
weekly_period_start_day	INTEGER	NOT NULL, DEFAULT 1	1=Monday, 7=Sunday
tier	VARCHAR(20)	NOT NULL, DEFAULT 'TIER_1'	TIER_1, TIER_2, TIER_3 (UX experience tier)
is_platform	BOOLEAN	NOT NULL, DEFAULT false	True for the single platform agent
platform_retain_percentage	DECIMAL(5,2)	NULLABLE	Only for platform: % to retain vs hedge
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_agents_parent on (parent_agent_id)
idx_agents_status on (status)
idx_agents_external_id on (external_id) -- UNIQUE

Table: agent_limits

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
agent_id	UUID	REFERENCES agents(id), NOT NULL
limit_type	VARCHAR(30)	NOT NULL	SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD
sport_type	VARCHAR(30)	NULLABLE	CRICKET, FOOTBALL, TENNIS, KABADDI, etc. NULL for period limits that apply to all sports
event_id	VARCHAR(100)	NULLABLE	Only for MARKET type limits. The specific event/market ID
limit_amount	BIGINT	NOT NULL	Limit in paisa (1 lakh = 10,000,000 paisa)
is_active	BOOLEAN	NOT NULL, DEFAULT true
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_agent_limits_agent_type on (agent_id, limit_type)
idx_agent_limits_agent_sport on (agent_id, sport_type)
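UNIQUE NULLS NOT DISTINCT on (agent_id, limit_type, sport_type, event_id) to prevent duplicate limits (sport_type and event_id are nullable, and a plain UNIQUE constraint in PostgreSQL treats NULLs as distinct)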

Table: forwarding_matrix_rules

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
agent_id	UUID	REFERENCES agents(id), NOT NULL
version	INTEGER	NOT NULL	Incremented on every change. Used for audit snapshot
market_type	VARCHAR(30)	NOT NULL, DEFAULT '*'	MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE, or *
sport_type	VARCHAR(30)	NOT NULL, DEFAULT '*'	CRICKET, FOOTBALL, TENNIS, KABADDI, or *
event_phase	VARCHAR(30)	NOT NULL, DEFAULT '*'	PRE_MATCH, IN_PLAY, APPROACHING_START, or *
source_type	VARCHAR(30)	NOT NULL, DEFAULT '*'	NORMAL, SHARP, VIP, NEW_ACCOUNT, or *
liquidity_band	VARCHAR(30)	NOT NULL, DEFAULT '*'	HIGH, MEDIUM, LOW, NONE, or *
forward_percentage	DECIMAL(5,2)	NOT NULL, CHECK (forward_percentage BETWEEN 0 AND 100)	Percentage to forward to upline
specificity	INTEGER	NOT NULL, GENERATED ALWAYS AS (...) STORED	Count of non-wildcard dimensions (0-5), computed from the five dimension columns. Stored for fast sorting
priority	INTEGER	NOT NULL, DEFAULT 0	Tie-breaker when specificity and forward_percentage are equal
is_active	BOOLEAN	NOT NULL, DEFAULT true	Soft delete / disable
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()	Used for deterministic ordering tie-break
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_fmr_agent_version on (agent_id, version)
idx_fmr_agent_active on (agent_id, is_active) WHERE is_active = true
idx_fmr_lookup on (agent_id, market_type, sport_type, event_phase, source_type, liquidity_band) WHERE is_active = true

Note: The specificity column is computed as the count of dimensions that are NOT '*'. For example, a rule with market_type=FANCY, sport_type=CRICKET, event_phase=IN_PLAY, source_type=*, liquidity_band=* has specificity 3.
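
The specificity count described in the note can be expressed directly. A minimal sketch, assuming camelCase field names for the five dimension columns (the database computes this as a generated column; this only illustrates the rule):

```typescript
const DIMENSIONS = ["marketType", "sportType", "eventPhase", "sourceType", "liquidityBand"] as const;

type Dimension = (typeof DIMENSIONS)[number];
type RuleDimensions = Record<Dimension, string>;

const WILDCARD = "*";

// Number of dimensions pinned to a concrete value (0-5).
function specificity(rule: RuleDimensions): number {
  return DIMENSIONS.filter((d) => rule[d] !== WILDCARD).length;
}

const example: RuleDimensions = {
  marketType: "FANCY",
  sportType: "CRICKET",
  eventPhase: "IN_PLAY",
  sourceType: "*",
  liquidityBand: "*",
};
```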

Table: users

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
external_id	VARCHAR(100)	UNIQUE, NOT NULL	User ID from the main platform
agent_id	UUID	REFERENCES agents(id), NOT NULL	The agent this user belongs to
name	VARCHAR(255)	NOT NULL
per_click_win_limit	BIGINT	NOT NULL, DEFAULT 5000000	In paisa. Default ₹50,000
aggregate_win_limit_daily	BIGINT	NOT NULL, DEFAULT 20000000	In paisa. Default ₹2,00,000
min_stake	BIGINT	NOT NULL, DEFAULT 10000	In paisa. Default ₹100
self_exclusion_until	TIMESTAMPTZ	NULLABLE	NULL if not excluded
session_time_limit_minutes	INTEGER	NOT NULL, DEFAULT 240	Default 4 hours
deposit_limit_daily	BIGINT	NULLABLE	In paisa
deposit_limit_weekly	BIGINT	NULLABLE	In paisa
deposit_limit_monthly	BIGINT	NULLABLE	In paisa
status	VARCHAR(20)	NOT NULL, DEFAULT 'ACTIVE'	ACTIVE, SUSPENDED, SELF_EXCLUDED
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_users_agent on (agent_id)
idx_users_external on (external_id) -- UNIQUE
idx_users_status on (status)

Table: user_overrides

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
user_id	UUID	REFERENCES users(id), NOT NULL
agent_id	UUID	REFERENCES agents(id), NOT NULL	The agent applying this override
forward_percentage	DECIMAL(5,2)	NOT NULL	Override forward % for this user at this agent
reason	TEXT	NOT NULL	Why the override was set (e.g., "known sharp user")
created_by	UUID	NOT NULL	Admin or agent who set the override
is_active	BOOLEAN	NOT NULL, DEFAULT true
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
expires_at	TIMESTAMPTZ	NULLABLE	Optional expiry

Indexes:

UNIQUE on (user_id, agent_id) WHERE is_active = true
idx_user_overrides_agent on (agent_id)

Table: user_classifications

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
user_id	UUID	REFERENCES users(id), NOT NULL
agent_id	UUID	REFERENCES agents(id), NOT NULL	The agent making this classification
classification	VARCHAR(30)	NOT NULL	NORMAL, SHARP, VIP, NEW_ACCOUNT
confidence_score	DECIMAL(5,4)	NULLABLE	0.0000 to 1.0000 for ML-based classifications
reason	TEXT	NULLABLE	Why classified (manual note or detection signal)
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

UNIQUE on (user_id, agent_id)
idx_user_class_agent on (agent_id, classification)

Table: agent_trust_config

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
agent_id	UUID	REFERENCES agents(id), NOT NULL	The upline agent
sub_agent_id	UUID	REFERENCES agents(id), NOT NULL	The downstream agent
trust_downstream_flags	BOOLEAN	NOT NULL, DEFAULT false	Whether to trust the sub-agent's user classifications
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

UNIQUE on (agent_id, sub_agent_id)

Table: market_overrides

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
agent_id	UUID	REFERENCES agents(id), NOT NULL
event_id	VARCHAR(100)	NOT NULL	The specific event/market
forward_percentage	DECIMAL(5,2)	NOT NULL	Override forward % for this event
reason	TEXT	NOT NULL
created_by	UUID	NOT NULL
is_active	BOOLEAN	NOT NULL, DEFAULT true
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
expires_at	TIMESTAMPTZ	NULLABLE

Indexes:

UNIQUE on (agent_id, event_id) WHERE is_active = true

Table: events

Column	Type	Constraints	Description
id	VARCHAR(100)	PRIMARY KEY	External event ID from odds provider
sport_type	VARCHAR(30)	NOT NULL
name	VARCHAR(500)	NOT NULL	"MI vs CSK, IPL 2026"
start_time	TIMESTAMPTZ	NOT NULL
status	VARCHAR(20)	NOT NULL, DEFAULT 'UPCOMING'	UPCOMING, LIVE, SUSPENDED, SETTLED, VOID
result	JSONB	NULLABLE	Settlement result data
settled_at	TIMESTAMPTZ	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_events_sport_status on (sport_type, status)
idx_events_start_time on (start_time)

Table: markets

Column	Type	Constraints	Description
id	VARCHAR(100)	PRIMARY KEY	External market ID
event_id	VARCHAR(100)	REFERENCES events(id), NOT NULL
market_type	VARCHAR(30)	NOT NULL	MATCH_ODDS, FANCY, BOOKMAKER, etc.
name	VARCHAR(255)	NOT NULL
status	VARCHAR(20)	NOT NULL, DEFAULT 'OPEN'	OPEN, SUSPENDED, CLOSED, SETTLED, VOID
settled_at	TIMESTAMPTZ	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_markets_event on (event_id)
idx_markets_status on (status)

Table: bets (partitioned by month on created_at)

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
user_id	UUID	NOT NULL	The punter who placed the bet. REFERENCES users(id), enforced at application level due to partitioning
agent_id	UUID	NOT NULL	The originating agent (Level 1)
event_id	VARCHAR(100)	NOT NULL
market_id	VARCHAR(100)	NOT NULL
selection	VARCHAR(255)	NOT NULL	What the punter bet on (e.g., "MI to win")
side	VARCHAR(10)	NOT NULL	BACK or LAY
requested_stake	BIGINT	NOT NULL	Original stake in paisa
accepted_stake	BIGINT	NOT NULL	After stake reduction, in paisa
odds	DECIMAL(10,4)	NOT NULL	Decimal odds
potential_win	BIGINT	NOT NULL	In paisa
liability	BIGINT	NOT NULL	In paisa
stake_reduction_reason	VARCHAR(50)	NULLABLE	PER_CLICK_LIMIT, AGGREGATE_LIMIT, null
market_type	VARCHAR(30)	NOT NULL
sport_type	VARCHAR(30)	NOT NULL
event_phase	VARCHAR(30)	NOT NULL	PRE_MATCH, IN_PLAY
source_type	VARCHAR(30)	NOT NULL	NORMAL, SHARP, VIP, etc., as resolved at the originating agent
liquidity_band	VARCHAR(30)	NOT NULL	HIGH, MEDIUM, LOW, NONE
routing_engine	VARCHAR(10)	NOT NULL, DEFAULT 'v2'	v1 (legacy flat) or v2 (cascade)
routing_status	VARCHAR(20)	NOT NULL, DEFAULT 'COMPLETE'	COMPLETE, PARTIAL, FAILED
matrix_version_snapshot	INTEGER	NOT NULL	The matrix version used at time of routing
status	VARCHAR(20)	NOT NULL, DEFAULT 'OPEN'	OPEN, SETTLED, VOID, CANCELLED
settled_at	TIMESTAMPTZ	NULLABLE
period_context	VARCHAR(20)	NOT NULL	NIGHT, DAY, or specific period identifier
total_processing_time_ms	INTEGER	NOT NULL	End-to-end latency
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()	Partition key

Indexes:

idx_bets_user_time on (user_id, created_at DESC)
idx_bets_agent_time on (agent_id, created_at DESC)
idx_bets_event on (event_id)
idx_bets_market on (market_id)
idx_bets_status on (status) WHERE status = 'OPEN'
idx_bets_routing_status on (routing_status) WHERE routing_status != 'COMPLETE'

Table: positions (partitioned by month on created_at)

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
bet_id	UUID	NOT NULL	The originating bet
agent_id	UUID	NOT NULL	The agent holding this position
cascade_level	INTEGER	NOT NULL	1 = first agent, 2 = upline, etc.
position_type	VARCHAR(20)	NOT NULL	RETAINED or FORWARDED
stake	BIGINT	NOT NULL	Stake portion in paisa
liability	BIGINT	NOT NULL	Liability portion in paisa
potential_win	BIGINT	NOT NULL	Potential win portion in paisa
forward_percentage_used	DECIMAL(5,2)	NOT NULL	The forward % that produced this split
forward_source	VARCHAR(30)	NOT NULL	USER_OVERRIDE, MARKET_OVERRIDE, MATRIX_RULE, AGENT_DEFAULT
matrix_rule_id	UUID	NULLABLE	The specific matrix rule that matched (if MATRIX_RULE)
overflow_amount	BIGINT	NOT NULL, DEFAULT 0	How much of this was overflow from limit breach
event_id	VARCHAR(100)	NOT NULL	Denormalized for fast queries
market_id	VARCHAR(100)	NOT NULL	Denormalized
sport_type	VARCHAR(30)	NOT NULL	Denormalized
selection	VARCHAR(255)	NOT NULL	Denormalized
side	VARCHAR(10)	NOT NULL	Denormalized
odds	DECIMAL(10,4)	NOT NULL	Denormalized
status	VARCHAR(20)	NOT NULL, DEFAULT 'OPEN'	OPEN, SETTLED, VOID
settlement_id	UUID	NULLABLE	Link to settlement record
settled_amount	BIGINT	NULLABLE	Actual P&L in paisa (positive = agent profit, negative = agent loss)
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()	Partition key

Indexes:

idx_positions_bet on (bet_id)
idx_positions_agent_status on (agent_id, status) WHERE status = 'OPEN'
idx_positions_agent_event on (agent_id, event_id) WHERE status = 'OPEN'
idx_positions_agent_sport on (agent_id, sport_type) WHERE status = 'OPEN'
idx_positions_event_status on (event_id, status)
idx_positions_market_status on (market_id, status)

Table: exposure_ledger

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
agent_id	UUID	NOT NULL
scope_type	VARCHAR(30)	NOT NULL	SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD
scope_key	VARCHAR(200)	NOT NULL	e.g., "cricket", "mi_vs_csk_2026_03_15", "night_2026_03_15", "week_2026_11"
shard_index	INTEGER	NOT NULL, DEFAULT 0	0 to N-1 for sharded counters
retained_open_liability	BIGINT	NOT NULL, DEFAULT 0	In paisa
forwarded_open_liability	BIGINT	NOT NULL, DEFAULT 0	In paisa
open_potential_win	BIGINT	NOT NULL, DEFAULT 0	In paisa
no_new_risk_active	BOOLEAN	NOT NULL, DEFAULT false	Whether this scope is in NO_NEW_RISK
no_new_risk_triggered_at	TIMESTAMPTZ	NULLABLE	When NO_NEW_RISK was activated
last_updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

UNIQUE on (agent_id, scope_type, scope_key, shard_index)
idx_exposure_agent_scope on (agent_id, scope_type, scope_key)
idx_exposure_no_new_risk on (no_new_risk_active) WHERE no_new_risk_active = true
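
Because each (agent_id, scope_type, scope_key) can be split across several shard rows, the exposure for a scope is the sum over its shards. The sketch below illustrates that read path and one possible way to choose a shard on write. The hash-based pickShardIndex is an assumption, since the schema does not say how shards are selected, and the row type is trimmed to the columns used here.

```typescript
interface LedgerShardRow {
  agent_id: string;
  scope_type: string;
  scope_key: string;
  shard_index: number;
  retained_open_liability: number; // paisa
}

// Hypothetical shard picker: hash the bet ID so writes spread across shards.
function pickShardIndex(betId: string, shardCount: number): number {
  let hash = 0;
  for (let i = 0; i < betId.length; i++) {
    hash = (hash * 31 + betId.charCodeAt(i)) >>> 0; // stay within unsigned 32 bits
  }
  return hash % shardCount;
}

// The exposure for a scope is the sum over all of its shard rows.
function totalRetainedLiability(
  rows: LedgerShardRow[],
  agentId: string,
  scopeType: string,
  scopeKey: string
): number {
  return rows
    .filter(
      (row) =>
        row.agent_id === agentId &&
        row.scope_type === scopeType &&
        row.scope_key === scopeKey
    )
    .reduce((sum, row) => sum + row.retained_open_liability, 0);
}

const ledgerRows: LedgerShardRow[] = [
  { agent_id: "agent-1", scope_type: "SPORT", scope_key: "cricket", shard_index: 0, retained_open_liability: 120000 },
  { agent_id: "agent-1", scope_type: "SPORT", scope_key: "cricket", shard_index: 1, retained_open_liability: 80000 },
  { agent_id: "agent-1", scope_type: "SPORT", scope_key: "football", shard_index: 0, retained_open_liability: 50000 },
];
const cricketTotal = totalRetainedLiability(ledgerRows, "agent-1", "SPORT", "cricket"); // 200000
```

Sharding trades a slightly more expensive read (a sum over N rows) for less write contention on hot scopes.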

Table: settlements (partitioned by month on created_at)

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
event_id	VARCHAR(100)	NOT NULL
market_id	VARCHAR(100)	NOT NULL
position_id	UUID	NOT NULL	The position being settled
agent_id	UUID	NOT NULL
bet_id	UUID	NOT NULL
settlement_type	VARCHAR(20)	NOT NULL	WIN, LOSS, VOID, PUSH
stake	BIGINT	NOT NULL	The position's stake
payout	BIGINT	NOT NULL	Amount paid to/from agent. Positive = agent pays punter. Negative = agent receives.
profit_loss	BIGINT	NOT NULL	Agent P&L. Positive = profit, negative = loss
idempotency_key	VARCHAR(200)	UNIQUE, NOT NULL	Prevents double settlement: `{position_id}_{event_result_hash}`
status	VARCHAR(20)	NOT NULL, DEFAULT 'COMPLETED'	COMPLETED, REVERSED, RE_SETTLED
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_settlements_event on (event_id)
idx_settlements_agent_time on (agent_id, created_at DESC)
idx_settlements_bet on (bet_id)
idx_settlements_position on (position_id)
idx_settlements_idempotency on (idempotency_key) -- UNIQUE
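
The idempotency key format `{position_id}_{event_result_hash}` can be illustrated as follows. This is a sketch under assumptions: the schema does not specify the hash function, so SHA-256 over JSON.stringify of the result is assumed, and an in-memory Set stands in for the UNIQUE constraint. A real implementation would also need a canonical key order for the result JSON.

```typescript
import { createHash } from "crypto";

// Assumed hashing scheme: SHA-256 over the serialized event result.
function eventResultHash(result: unknown): string {
  return createHash("sha256").update(JSON.stringify(result)).digest("hex");
}

function settlementIdempotencyKey(positionId: string, result: unknown): string {
  return `${positionId}_${eventResultHash(result)}`;
}

// Stand-in for the UNIQUE index on idempotency_key.
const seenKeys = new Set<string>();

function settleOnce(positionId: string, result: unknown): "SETTLED" | "DUPLICATE" {
  const key = settlementIdempotencyKey(positionId, result);
  if (seenKeys.has(key)) {
    return "DUPLICATE"; // same position and same result: do nothing
  }
  seenKeys.add(key);
  return "SETTLED";
}

const firstAttempt = settleOnce("pos-1", { winner: "MI" });  // "SETTLED"
const replayAttempt = settleOnce("pos-1", { winner: "MI" }); // "DUPLICATE"
```

Including the result hash in the key means a corrected result produces a new key, which lets a re-settlement through while still blocking a replay of the same result.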

Table: audit_trail (partitioned by month on created_at, separate schema)

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
bet_id	UUID	NOT NULL
record_type	VARCHAR(30)	NOT NULL	BET_PLACED, BET_SETTLED, POSITION_CORRECTED, CONFIG_CHANGED, RECOMPUTE
agent_id	UUID	NOT NULL	Primary agent for this record
user_id	UUID	NULLABLE
event_id	VARCHAR(100)	NULLABLE
payload	JSONB	NOT NULL	Full structured audit data (forwarding chain, limit checks, etc.)
checksum	VARCHAR(64)	NOT NULL	SHA-256 of payload
previous_checksum	VARCHAR(64)	NULLABLE	Checksum of previous record for this bet_id (chain)
processing_time_ms	INTEGER	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()	Partition key

Indexes:

idx_audit_bet on (bet_id)
idx_audit_agent_time on (agent_id, created_at DESC)
idx_audit_user_time on (user_id, created_at DESC) WHERE user_id IS NOT NULL
idx_audit_event on (event_id) WHERE event_id IS NOT NULL
idx_audit_type on (record_type, created_at DESC)
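
The checksum and previous_checksum columns form a per-bet hash chain. A minimal sketch of appending to and verifying such a chain is shown below. The function names and the use of JSON.stringify as the serialization are assumptions; the table only states that the checksum is the SHA-256 of the payload.

```typescript
import { createHash } from "crypto";

interface AuditRecord {
  bet_id: string;
  record_type: string;
  payload: Record<string, unknown>;
  checksum: string;
  previous_checksum: string | null;
}

function sha256Hex(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

// Append a record whose previous_checksum points at the last record in the chain.
function appendRecord(
  chain: AuditRecord[],
  betId: string,
  recordType: string,
  payload: Record<string, unknown>
): AuditRecord {
  const last = chain[chain.length - 1];
  const record: AuditRecord = {
    bet_id: betId,
    record_type: recordType,
    payload,
    checksum: sha256Hex(JSON.stringify(payload)),
    previous_checksum: last ? last.checksum : null,
  };
  chain.push(record);
  return record;
}

// A chain is intact when every checksum matches its payload
// and every link matches the preceding record's checksum.
function verifyChain(chain: AuditRecord[]): boolean {
  return chain.every((record, index) => {
    const previous = chain[index - 1];
    const expectedPrevious = previous ? previous.checksum : null;
    return (
      record.checksum === sha256Hex(JSON.stringify(record.payload)) &&
      record.previous_checksum === expectedPrevious
    );
  });
}

const auditChain: AuditRecord[] = [];
appendRecord(auditChain, "bet-1", "BET_PLACED", { stake: 1000000 });
appendRecord(auditChain, "bet-1", "BET_SETTLED", { settled_amount: -850000 });
const chainIntact = verifyChain(auditChain); // true
```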

Table: dead_letter_queue

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
source	VARCHAR(50)	NOT NULL	BET_PROCESSING, SETTLEMENT, HEDGE, RECONCILIATION
reference_id	UUID	NOT NULL	The bet_id, settlement_id, etc. that failed
error_message	TEXT	NOT NULL
error_stack	TEXT	NULLABLE
payload	JSONB	NOT NULL	Full context for retry
retry_count	INTEGER	NOT NULL, DEFAULT 0
max_retries	INTEGER	NOT NULL, DEFAULT 3
status	VARCHAR(20)	NOT NULL, DEFAULT 'PENDING'	PENDING, RETRYING, RESOLVED, ESCALATED
resolved_by	UUID	NULLABLE	Admin who resolved
resolved_at	TIMESTAMPTZ	NULLABLE
resolution_notes	TEXT	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_dlq_status on (status) WHERE status IN ('PENDING', 'RETRYING')
idx_dlq_source on (source, created_at DESC)
idx_dlq_reference on (reference_id)

Table: hedge_orders

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
bet_id	UUID	NOT NULL	Originating bet
event_id	VARCHAR(100)	NOT NULL
market_id	VARCHAR(100)	NOT NULL
selection	VARCHAR(255)	NOT NULL
side	VARCHAR(10)	NOT NULL	BACK or LAY
target_price	DECIMAL(10,4)	NOT NULL	Price the punter received
limit_price	DECIMAL(10,4)	NOT NULL	Worst acceptable price
requested_amount	BIGINT	NOT NULL	In paisa
filled_amount	BIGINT	NOT NULL, DEFAULT 0	In paisa
unfilled_amount	BIGINT	GENERATED ALWAYS AS (requested_amount - filled_amount) STORED
average_fill_price	DECIMAL(10,4)	NULLABLE	Weighted average of all fills
slippage	DECIMAL(10,4)	NULLABLE	average_fill_price - target_price
betfair_bet_id	VARCHAR(100)	NULLABLE	Betfair's order reference
status	VARCHAR(20)	NOT NULL, DEFAULT 'QUEUED'	QUEUED, PENDING, PARTIALLY_FILLED, FILLED, CANCELLED, STALE_CANCELLED, FAILED
reprice_count	INTEGER	NOT NULL, DEFAULT 0
max_reprice_attempts	INTEGER	NOT NULL, DEFAULT 3
time_in_force_seconds	INTEGER	NOT NULL
error_message	TEXT	NULLABLE
queued_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
sent_at	TIMESTAMPTZ	NULLABLE	When sent to Betfair
filled_at	TIMESTAMPTZ	NULLABLE	When fully filled
cancelled_at	TIMESTAMPTZ	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_hedge_bet on (bet_id)
idx_hedge_status on (status) WHERE status IN ('QUEUED', 'PENDING', 'PARTIALLY_FILLED')
idx_hedge_event on (event_id)
idx_hedge_betfair on (betfair_bet_id) WHERE betfair_bet_id IS NOT NULL
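
The derived hedge_orders columns (filled_amount, unfilled_amount, average_fill_price, slippage) follow directly from the list of fills. The sketch below shows one way to compute them. The Fill type and summarizeFills function are illustrative assumptions, and the status mapping is simplified to three of the listed states.

```typescript
interface Fill {
  amount: number; // paisa
  price: number;  // decimal odds
}

type FillStatus = "PENDING" | "PARTIALLY_FILLED" | "FILLED";

interface FillSummary {
  filled_amount: number;
  unfilled_amount: number;
  average_fill_price: number | null;
  slippage: number | null;
  status: FillStatus;
}

// Prices are DECIMAL(10,4), so round derived prices to four places.
const round4 = (value: number): number => Math.round(value * 10000) / 10000;

function summarizeFills(requestedAmount: number, targetPrice: number, fills: Fill[]): FillSummary {
  const filled = fills.reduce((sum, fill) => sum + fill.amount, 0);
  // Weighted average: each fill's price is weighted by its amount.
  const average =
    filled === 0
      ? null
      : round4(fills.reduce((sum, fill) => sum + fill.amount * fill.price, 0) / filled);
  return {
    filled_amount: filled,
    unfilled_amount: requestedAmount - filled,
    average_fill_price: average,
    slippage: average === null ? null : round4(average - targetPrice),
    status: filled === 0 ? "PENDING" : filled < requestedAmount ? "PARTIALLY_FILLED" : "FILLED",
  };
}

const hedgeSummary = summarizeFills(1000000, 1.85, [
  { amount: 600000, price: 1.84 },
  { amount: 200000, price: 1.86 },
]);
```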

Table: agent_hierarchy_history

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
agent_id	UUID	NOT NULL
previous_parent_id	UUID	NULLABLE
new_parent_id	UUID	NULLABLE
change_type	VARCHAR(30)	NOT NULL	CREATED, PARENT_CHANGED, SUSPENDED, REACTIVATED, DEACTIVATED
changed_by	UUID	NOT NULL	Admin who made the change
reason	TEXT	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_ahh_agent on (agent_id, created_at DESC)

Table: reconciliation_results

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
reconciliation_type	VARCHAR(30)	NOT NULL	SCHEDULED_15MIN, POST_SETTLEMENT, FULL, TARGETED
started_at	TIMESTAMPTZ	NOT NULL
completed_at	TIMESTAMPTZ	NULLABLE
agents_checked	INTEGER	NOT NULL, DEFAULT 0
discrepancies_found	INTEGER	NOT NULL, DEFAULT 0
status	VARCHAR(20)	NOT NULL	RUNNING, COMPLETED, FAILED
summary	JSONB	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Table: reconciliation_discrepancies

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
reconciliation_id	UUID	REFERENCES reconciliation_results(id), NOT NULL
agent_id	UUID	NOT NULL
scope_type	VARCHAR(30)	NOT NULL
scope_key	VARCHAR(200)	NOT NULL
ledger_value	BIGINT	NOT NULL	In paisa
computed_value	BIGINT	NOT NULL	In paisa
drift_amount	BIGINT	NOT NULL	In paisa
drift_direction	VARCHAR(15)	NOT NULL	LEDGER_HIGH, LEDGER_LOW
category	VARCHAR(10)	NOT NULL	MINOR, MAJOR, CRITICAL
resolution_status	VARCHAR(20)	NOT NULL, DEFAULT 'OPEN'	OPEN, INVESTIGATING, RESOLVED, AUTO_CORRECTED
root_cause	TEXT	NULLABLE
resolved_by	UUID	NULLABLE
resolved_at	TIMESTAMPTZ	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_recon_disc_status on (resolution_status) WHERE resolution_status IN ('OPEN', 'INVESTIGATING')
idx_recon_disc_agent on (agent_id, created_at DESC)

Table: alerts

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
alert_type	VARCHAR(50)	NOT NULL	e.g., NO_NEW_RISK_ACTIVATED, LIMIT_APPROACHING, BETFAIR_DEGRADED
severity	VARCHAR(5)	NOT NULL	P1, P2, P3, P4
agent_id	UUID	NULLABLE
title	VARCHAR(255)	NOT NULL
description	TEXT	NOT NULL
metadata	JSONB	NULLABLE	Additional context
status	VARCHAR(20)	NOT NULL, DEFAULT 'ACTIVE'	ACTIVE, ACKNOWLEDGED, RESOLVED, SILENCED
acknowledged_by	UUID	NULLABLE
acknowledged_at	TIMESTAMPTZ	NULLABLE
resolved_at	TIMESTAMPTZ	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_alerts_status on (status) WHERE status = 'ACTIVE'
idx_alerts_agent on (agent_id, created_at DESC)
idx_alerts_severity on (severity, created_at DESC)

Table: config_changelog

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
entity_type	VARCHAR(30)	NOT NULL	MATRIX_RULE, AGENT_LIMIT, USER_OVERRIDE, MARKET_OVERRIDE, AGENT_CONFIG
entity_id	UUID	NOT NULL
agent_id	UUID	NOT NULL
change_type	VARCHAR(20)	NOT NULL	CREATED, UPDATED, DELETED
old_value	JSONB	NULLABLE	Previous state
new_value	JSONB	NOT NULL	New state
changed_by	UUID	NOT NULL
reason	TEXT	NULLABLE
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

idx_config_changelog_entity on (entity_type, entity_id, created_at DESC)
idx_config_changelog_agent on (agent_id, created_at DESC)

Table: feature_flags

Column	Type	Constraints	Description
id	UUID	PRIMARY KEY
flag_name	VARCHAR(100)	NOT NULL	e.g., bbook.cascading_routing.enabled
scope	VARCHAR(20)	NOT NULL	GLOBAL, PER_AGENT
agent_id	UUID	NULLABLE	NULL for GLOBAL flags
is_enabled	BOOLEAN	NOT NULL, DEFAULT false
created_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()
updated_at	TIMESTAMPTZ	NOT NULL, DEFAULT NOW()

Indexes:

UNIQUE on (flag_name, agent_id)
idx_ff_flag on (flag_name)

50. API Design

Bet Placement APIs

Method	Path	Description	Auth
POST	`/api/v1/bets`	Place a new bet	User JWT
GET	`/api/v1/bets/:betId`	Get bet details with full routing	User JWT or Agent JWT
GET	`/api/v1/bets`	List bets (with filters)	Agent JWT
POST	`/api/v1/bets/:betId/void`	Void an open bet	Admin JWT
POST	`/api/v1/bets/simulate`	Dry-run a bet (no money, full routing)	Agent JWT

POST /api/v1/bets -- Place a new bet:

Request body:

{
  "user_id": "uuid",
  "event_id": "string",
  "market_id": "string",
  "selection": "MI to win",
  "side": "BACK",
  "stake": 1000000,          // in paisa (₹10,000)
  "odds": 1.85,
  "market_type": "MATCH_ODDS",
  "sport_type": "CRICKET",
  "event_phase": "PRE_MATCH",
  "liquidity_band": "HIGH"
}

Response body (success):

{
  "bet_id": "uuid",
  "status": "ACCEPTED",
  "accepted_stake": 1000000,
  "stake_reduced": false,
  "potential_win": 850000,
  "message": null
}

Response body (stake reduced):

{
  "bet_id": "uuid",
  "status": "ACCEPTED_REDUCED",
  "accepted_stake": 588200,
  "original_stake": 1000000,
  "stake_reduced": true,
  "potential_win": 499970,
  "message": "Maximum stake at these odds: ₹5,882"
}

Response body (rejected):

{
  "bet_id": null,
  "status": "REJECTED",
  "reason": "SELF_EXCLUDED" | "SESSION_EXPIRED" | "MARKET_SUSPENDED" | "BELOW_MINIMUM",
  "message": "This market is currently unavailable."
}

Agent Configuration APIs

Method	Path	Description	Auth
GET	`/api/v1/agents/:agentId`	Get agent profile and config	Agent JWT
PATCH	`/api/v1/agents/:agentId`	Update agent settings	Agent JWT
GET	`/api/v1/agents/:agentId/limits`	Get all agent limits	Agent JWT
PUT	`/api/v1/agents/:agentId/limits`	Set/update agent limits	Agent JWT
GET	`/api/v1/agents/:agentId/matrix`	Get forwarding matrix rules	Agent JWT
POST	`/api/v1/agents/:agentId/matrix/rules`	Add a matrix rule	Agent JWT
PUT	`/api/v1/agents/:agentId/matrix/rules/:ruleId`	Update a matrix rule	Agent JWT
DELETE	`/api/v1/agents/:agentId/matrix/rules/:ruleId`	Delete a matrix rule	Agent JWT
POST	`/api/v1/agents/:agentId/matrix/test`	Test a bet against the matrix	Agent JWT
GET	`/api/v1/agents/:agentId/exposure`	Get current exposure summary	Agent JWT
GET	`/api/v1/agents/:agentId/exposure/:scope`	Get exposure for a specific scope	Agent JWT
POST	`/api/v1/agents/:agentId/panic`	Trigger panic mode (hedge all)	Agent JWT
GET	`/api/v1/agents/:agentId/sub-agents`	List sub-agents	Agent JWT
GET	`/api/v1/agents/:agentId/trust-config`	Get trust settings for sub-agents	Agent JWT
PUT	`/api/v1/agents/:agentId/trust-config/:subAgentId`	Update trust for a sub-agent	Agent JWT

POST /api/v1/agents/:agentId/matrix/rules -- Add a matrix rule:

Request body:

{
  "market_type": "FANCY",
  "sport_type": "CRICKET",
  "event_phase": "IN_PLAY",
  "source_type": "*",
  "liquidity_band": "*",
  "forward_percentage": 70.00
}

Response body:

{
  "rule_id": "uuid",
  "version": 48,
  "specificity": 3,
  "conflicts": [],
  "effective_immediately": true
}

User Management APIs

Method	Path	Description	Auth
GET	`/api/v1/users/:userId`	Get user profile	Agent JWT
PATCH	`/api/v1/users/:userId`	Update user settings (limits, etc.)	Agent JWT
POST	`/api/v1/users/:userId/override`	Set user forward % override	Agent JWT
DELETE	`/api/v1/users/:userId/override`	Remove user override	Agent JWT
POST	`/api/v1/users/:userId/classify`	Set user classification	Agent JWT
GET	`/api/v1/users/:userId/bets`	Get user bet history	Agent JWT
POST	`/api/v1/users/:userId/self-exclude`	Self-exclude user	User JWT
GET	`/api/v1/users/:userId/session`	Get session info	User JWT

Settlement APIs

Method	Path	Description	Auth
POST	`/api/v1/settlements/events/:eventId`	Trigger settlement for an event	System / Admin JWT
GET	`/api/v1/settlements/events/:eventId`	Get settlement status for an event	Agent JWT
POST	`/api/v1/settlements/events/:eventId/reverse`	Reverse a settlement (for corrections)	Admin JWT
POST	`/api/v1/settlements/events/:eventId/resettle`	Re-settle with corrected results	Admin JWT
GET	`/api/v1/settlements/agents/:agentId`	Get agent settlement history	Agent JWT
GET	`/api/v1/settlements/agents/:agentId/weekly`	Get weekly settlement summary	Agent JWT

POST /api/v1/settlements/events/:eventId:

Request body:

{
  "result": {
    "winner": "MI",
    "market_results": {
      "match_odds": { "winning_selection": "MI to win" },
      "fancy_180_runs": { "actual_value": 187, "line": 180 }
    }
  },
  "result_source": "OFFICIAL",
  "confirmed_by": "admin_uuid"
}

Admin APIs

Method	Path	Description	Auth
POST	`/api/v1/admin/agents`	Create a new agent	Admin JWT
POST	`/api/v1/admin/agents/:agentId/suspend`	Suspend an agent	Admin JWT
POST	`/api/v1/admin/agents/:agentId/reactivate`	Reactivate an agent	Admin JWT
POST	`/api/v1/admin/agents/:agentId/transfer`	Transfer agent to new parent	Admin JWT
GET	`/api/v1/admin/dead-letter-queue`	View DLQ entries	Admin JWT
POST	`/api/v1/admin/dead-letter-queue/:id/retry`	Retry a DLQ entry	Admin JWT
POST	`/api/v1/admin/dead-letter-queue/:id/resolve`	Manually resolve a DLQ entry	Admin JWT
POST	`/api/v1/admin/reconciliation/run`	Trigger manual reconciliation	Admin JWT
POST	`/api/v1/admin/reconciliation/recompute`	Recompute an agent's ledger	Admin JWT
GET	`/api/v1/admin/feature-flags`	List all feature flags	Admin JWT
PUT	`/api/v1/admin/feature-flags/:flagName`	Toggle a feature flag	Admin JWT
POST	`/api/v1/admin/cache/flush`	Flush caches for an agent	Admin JWT

Monitoring APIs

Method	Path	Description	Auth
GET	`/api/v1/monitoring/health`	System health check	Public
GET	`/api/v1/monitoring/metrics`	Prometheus metrics endpoint	Internal
GET	`/api/v1/monitoring/alerts`	List active alerts	Admin JWT
POST	`/api/v1/monitoring/alerts/:id/acknowledge`	Acknowledge an alert	Admin JWT
GET	`/api/v1/monitoring/dashboard/overview`	Ops dashboard data	Admin JWT
GET	`/api/v1/monitoring/dashboard/agent/:agentId`	Agent-specific dashboard data	Agent JWT
GET	`/api/v1/monitoring/hedge-status`	Hedge execution status	Admin JWT

Dispute/Support APIs

Method	Path	Description	Auth
GET	`/api/v1/support/bets/:betId/audit`	Full audit trail for a bet	Support JWT
POST	`/api/v1/support/bets/:betId/resimulate`	Re-simulate bet routing	Support JWT
GET	`/api/v1/support/agents/:agentId/positions`	Open positions for an agent	Support JWT
POST	`/api/v1/support/disputes`	Create a dispute	Agent JWT
GET	`/api/v1/support/disputes/:disputeId`	Get dispute details	Support JWT
PATCH	`/api/v1/support/disputes/:disputeId`	Update dispute status	Support JWT

51. Service Architecture

Module Breakdown

Module	Responsibility	Dependencies
BetProcessingModule	Orchestrates the entire bet placement pipeline. Entry point for all bets.	MatrixResolutionModule, CascadeEngineModule, LimitEnforcementModule, AuditModule, ResponsibleGamblingModule
MatrixResolutionModule	Resolves forwarding percentage from the precedence chain (user override > market override > matrix > default)	ConfigModule (for cached rules)
CascadeEngineModule	Routes a bet through the full agent hierarchy. Iterates level by level, calling MatrixResolution and LimitEnforcement at each level	MatrixResolutionModule, LimitEnforcementModule
LimitEnforcementModule	Checks all agent limits, determines max retainable amount, triggers NO_NEW_RISK	ExposureLedgerModule
ExposureLedgerModule	Manages the 3-tier exposure counters. Reads from Redis, writes to PostgreSQL, handles sharding	Redis, PostgreSQL
HedgeExecutionModule	Consumes hedge order queue, places orders on Betfair, manages partial fills and retries	Betfair API client, Redis Stream
SettlementModule	Processes event results, settles all positions, decrements exposure ledgers	ExposureLedgerModule, AuditModule
AuditModule	Creates structured audit records, manages checksum chains, handles tier migration	PostgreSQL (audit schema)
ReconciliationModule	Runs scheduled and on-demand reconciliation, detects and categorizes discrepancies	ExposureLedgerModule, PostgreSQL
AgentManagementModule	CRUD for agents, hierarchy management, trust configuration, preset profiles	PostgreSQL
UserManagementModule	CRUD for users, win limit management, classification, self-exclusion	PostgreSQL, Redis
ConfigModule	Manages all configuration with caching, versioning, and pub/sub invalidation	PostgreSQL, Redis (cache + pub/sub)
NotificationModule	Sends alerts via push, SMS, WhatsApp. Manages escalation	SMS gateway, WhatsApp API, Socket.IO
ResponsibleGamblingModule	Self-exclusion checks, session limits, reality checks, deposit limit integration	Redis (fast flag checks)
MonitoringModule	Prometheus metrics collection, health checks, alert generation	prom-client

52. The Bet Processing Pipeline (Step by Step)

This is the heart of the system. Every step is documented with what happens, what can go wrong, and how errors are handled.

Step 1: Request Received (Budget: 5ms)

What happens: HTTP POST arrives at /api/v1/bets. Express middleware parses the JSON body. Zod schema validates the shape and types of all fields (user_id is UUID, stake is positive integer, odds is positive decimal, etc.).

What can go wrong:

Malformed JSON: Return 400 with parsing error
Missing required fields: Return 400 with validation errors listing every missing field
Invalid types (string for stake, negative odds): Return 400 with type errors

Error handling: Validation errors are returned immediately. No audit record is created because no bet processing was attempted. The request counter metric is incremented for monitoring.
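
The validation behaviour in Step 1 can be sketched without any dependencies. The pipeline itself uses a Zod schema; the hand-written validateBetRequest below is only a stand-in that shows the same outcomes (400 on malformed JSON, every missing field reported, type errors reported). It covers three of the request fields, and rejecting odds at or below 1 is an assumption made so that later steps can divide by (odds - 1).

```typescript
interface BetRequest {
  user_id: string;
  stake: number; // paisa
  odds: number;  // decimal odds
}

type ValidationResult =
  | { ok: true; value: BetRequest }
  | { ok: false; status: 400; errors: string[] };

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const REQUIRED_FIELDS = ["user_id", "stake", "odds"] as const;

function validateBetRequest(rawBody: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { ok: false, status: 400, errors: ["Malformed JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, status: 400, errors: ["Body must be a JSON object"] };
  }
  const body = parsed as Record<string, unknown>;
  const errors: string[] = [];

  // Report every missing field, not just the first one.
  for (const field of REQUIRED_FIELDS) {
    if (body[field] === undefined) errors.push(`${field} is required`);
  }

  const userId = body["user_id"];
  const stake = body["stake"];
  const odds = body["odds"];
  if (userId !== undefined && (typeof userId !== "string" || !UUID_PATTERN.test(userId))) {
    errors.push("user_id must be a UUID");
  }
  if (stake !== undefined && (typeof stake !== "number" || !Number.isInteger(stake) || stake <= 0)) {
    errors.push("stake must be a positive integer (paisa)");
  }
  if (odds !== undefined && (typeof odds !== "number" || !(odds > 1))) {
    errors.push("odds must be a decimal greater than 1");
  }

  if (errors.length > 0) return { ok: false, status: 400, errors };
  return {
    ok: true,
    value: { user_id: userId as string, stake: stake as number, odds: odds as number },
  };
}
```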

Step 2: Responsible Gambling Checks (Budget: 2ms)

What happens: Three sub-checks in sequence:

2a. Self-exclusion check: Read self_exclusion:{user_id} flag from Redis. If present and not expired, reject immediately.

2b. Session time check: Read session:{user_id} from Redis. If session duration exceeds the user's configured limit, reject with session expired message.

2c. Reality check trigger: Check if a reality check notification is pending (time since last acknowledgment exceeds the configured interval). If so, the API returns a special status requiring the client to show the reality check popup and re-submit with an acknowledgment token.

What can go wrong:

Redis unavailable: Fall back to PostgreSQL for self-exclusion check (add ~5ms). Session checks are skipped (fail-open for session limits, fail-closed for self-exclusion).

Error handling: Self-exclusion is fail-CLOSED (if we cannot check, reject the bet -- protecting the user is paramount). Session limits are fail-OPEN (if we cannot check, allow the bet -- a few extra minutes of play is acceptable).
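
The asymmetry between the two failure modes is easiest to see in code. The sketch below is synchronous for brevity (the real checks would be async Redis and PostgreSQL calls), and the RgStores interface and its method names are assumptions introduced for illustration. Each store method is expected to throw when its backing store is unavailable.

```typescript
type RgDecision =
  | { allowed: true }
  | { allowed: false; reason: "SELF_EXCLUDED" | "SESSION_EXPIRED" };

// Hypothetical store readers; each may throw if its backing store is down.
interface RgStores {
  selfExcludedFromRedis(userId: string): boolean;
  selfExcludedFromPostgres(userId: string): boolean;
  sessionMinutes(userId: string): number;
}

function isSelfExcluded(userId: string, stores: RgStores): boolean {
  try {
    return stores.selfExcludedFromRedis(userId);
  } catch {
    // Redis unavailable: fall through to PostgreSQL.
  }
  try {
    return stores.selfExcludedFromPostgres(userId);
  } catch {
    return true; // fail-closed: if we cannot check, treat the user as excluded
  }
}

function responsibleGamblingCheck(
  userId: string,
  sessionLimitMinutes: number,
  stores: RgStores
): RgDecision {
  if (isSelfExcluded(userId, stores)) {
    return { allowed: false, reason: "SELF_EXCLUDED" };
  }
  try {
    if (stores.sessionMinutes(userId) > sessionLimitMinutes) {
      return { allowed: false, reason: "SESSION_EXPIRED" };
    }
  } catch {
    // fail-open: a session check we cannot perform does not block the bet
  }
  return { allowed: true };
}
```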

Step 3: Timestamp Assignment (Budget: 0ms)

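What happens: The system assigns a processing timestamp using Date.now(). Date.now() reads the wall clock, so it is not strictly monotonic: a clock adjustment (for example, an NTP correction) can move it backward. Where strict ordering matters, it should be paired with a monotonic tie-breaker such as a per-process sequence number. This timestamp is used for: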

Determining which period the bet falls in (night vs day)
Ordering concurrent bets deterministically
Audit trail timing

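What can go wrong: Very little, because this is a local operation. The residual risks are wall-clock adjustments and clock skew between nodes, which matter only for bets that land near a period boundary or within the same millisecond.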

Step 4: Compute Metrics (Budget: 3ms)

What happens:

potential_win = stake * (odds - 1)
liability = potential_win  (for a back bet from the bookie's perspective)

All calculations use integer arithmetic in paisa (the smallest currency unit) to avoid floating-point errors. The odds value is the only decimal in the calculation; it is multiplied by the integer stake and the result is floored to the nearest paisa.

What can go wrong:

Overflow: stake * odds could in principle exceed Number.MAX_SAFE_INTEGER. For context, MAX_SAFE_INTEGER in paisa is about ₹90 lakh crore (9,007,199,254,740,991 paisa, or roughly ₹9 × 10^13). No single bet will ever approach this. Validation rejects stakes above ₹1 crore as a safety measure.
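
One way to keep the Step 4 calculation exact is to scale the odds to an integer before multiplying. The sketch below assumes odds carry at most four decimal places (matching the DECIMAL(10,4) column) and uses hypothetical helper names; it is not necessarily how the production code represents odds.

```typescript
const ODDS_SCALE = 10000; // odds are DECIMAL(10,4), so four decimal places

// Convert decimal odds such as 1.85 to the integer 18500.
function toScaledOdds(odds: number): number {
  return Math.round(odds * ODDS_SCALE);
}

// potential_win = floor(stake * (odds - 1)), computed on integers only.
function potentialWinPaisa(stakePaisa: number, odds: number): number {
  const profitScaled = toScaledOdds(odds) - ODDS_SCALE; // (odds - 1) * 10000
  const product = stakePaisa * profitScaled;
  if (!Number.isSafeInteger(product)) {
    throw new RangeError("stake * odds exceeds the safe integer range");
  }
  // Exact floor division for non-negative integers.
  return (product - (product % ODDS_SCALE)) / ODDS_SCALE;
}

const documentedExample = potentialWinPaisa(1000000, 1.85); // 850000

// With plain doubles, Math.floor(100 * (1.15 - 1)) evaluates to 14,
// because 1.15 - 1 is slightly below 0.15. The scaled version gives 15.
const edgeCase = potentialWinPaisa(100, 1.15); // 15
```

For a back bet, liability equals this potential_win value, as stated above.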

Step 5: User Win Cap Check (Budget: 5ms)

What happens:

5a. Per-click win cap: Compare potential_win against the user's per_click_win_limit. If exceeded, compute the maximum allowable stake:

max_stake = floor(per_click_win_limit / (odds - 1))

5b. Aggregate win cap: Read the user's accumulated potential wins for the current period from Redis key user_agg_win:{user_id}:{period}. If adding this bet's potential_win exceeds the daily aggregate limit, compute the remaining allowable win:

remaining_win = aggregate_limit - current_accumulated_wins
max_stake_from_aggregate = floor(remaining_win / (odds - 1))

5c. Take the minimum of the per-click max stake and the aggregate max stake. If this is less than the original stake, the stake is reduced.

What can go wrong:

Redis unavailable for aggregate check: Fall back to PostgreSQL query (SUM of potential_win from today's bets for this user). Slower (~15ms) but correct.
Concurrent aggregate updates: The Redis INCRBY is atomic, but two simultaneous bets could both read the same accumulated value before either increments it. The aggregate limit has a 10% buffer built in (actual limit checked is 110% of configured limit) to absorb this minor race. The PostgreSQL settlement path corrects any drift.

Error handling: If the reduced stake falls below the user's minimum stake, the bet is rejected with "This market is currently unavailable at these odds."
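
Steps 5a to 5c and the minimum-stake rejection can be combined into a single function. This is a sketch under assumptions: the names applyWinCaps and WinCapInput are hypothetical, the accumulated wins are passed in rather than read from Redis, the 10% race buffer is left out, and no rounding to whole rupees is applied to the reduced stake.

```typescript
const ODDS_SCALE = 10000; // odds are DECIMAL(10,4)

type StakeDecision =
  | { status: "ACCEPTED"; acceptedStake: number; reason: null }
  | { status: "ACCEPTED_REDUCED"; acceptedStake: number; reason: "PER_CLICK_LIMIT" | "AGGREGATE_LIMIT" }
  | { status: "REJECTED"; acceptedStake: 0; reason: "BELOW_MINIMUM" };

interface WinCapInput {
  stake: number;             // paisa
  odds: number;              // decimal odds, must exceed 1
  perClickWinLimit: number;  // paisa
  aggregateWinLimit: number; // paisa
  accumulatedWins: number;   // paisa, for the current period
  minStake: number;          // paisa
}

// Largest stake whose potential win fits the budget: floor(budget / (odds - 1)).
function maxStakeForWin(winBudget: number, odds: number): number {
  const profitScaled = Math.round(odds * ODDS_SCALE) - ODDS_SCALE;
  if (profitScaled <= 0) throw new RangeError("odds must exceed 1");
  return Math.floor((winBudget * ODDS_SCALE) / profitScaled);
}

function applyWinCaps(input: WinCapInput): StakeDecision {
  const perClickMax = maxStakeForWin(input.perClickWinLimit, input.odds);        // 5a
  const remainingWin = Math.max(0, input.aggregateWinLimit - input.accumulatedWins);
  const aggregateMax = maxStakeForWin(remainingWin, input.odds);                 // 5b
  const allowed = Math.min(input.stake, perClickMax, aggregateMax);              // 5c

  if (allowed < input.minStake) {
    return { status: "REJECTED", acceptedStake: 0, reason: "BELOW_MINIMUM" };
  }
  if (allowed === input.stake) {
    return { status: "ACCEPTED", acceptedStake: allowed, reason: null };
  }
  return {
    status: "ACCEPTED_REDUCED",
    acceptedStake: allowed,
    reason: perClickMax <= aggregateMax ? "PER_CLICK_LIMIT" : "AGGREGATE_LIMIT",
  };
}

// A ₹10,000 stake at 1.85 against a ₹5,000 per-click win limit.
const reducedDecision = applyWinCaps({
  stake: 1000000,
  odds: 1.85,
  perClickWinLimit: 500000,
  aggregateWinLimit: 20000000,
  accumulatedWins: 0,
  minStake: 10000,
});
```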

Step 6: Matrix Version Capture (Budget: 1ms)

What happens: For each agent in the cascade, the system captures the current matrix version number. This version is stored with the bet record to enable deterministic replay. The version is read from the cached matrix (Redis or LRU).

Why this matters: If an agent changes their matrix between the bet being placed and the bet being disputed, the replay must use the original matrix version to produce the same result.

Step 7: Forwarding Percentage Resolution (Budget: 10ms)

What happens: For the first agent in the cascade (the user's direct agent), resolve the forwarding percentage using the 4-level precedence chain:

7a. Check user override: Query user_overrides (cached in Redis) for this user + agent combination. If found and active, use the override's forward_percentage. Skip to Step 8.

7b. Check market override: Query market_overrides (cached in Redis) for this event + agent combination. If found and active, use the override's forward_percentage. Skip to Step 8.

7c. Matrix lookup: Load the agent's active matrix rules (cached in Redis). Filter to rules where every non-wildcard dimension matches the bet's characteristics. From the matching rules, select the one with the highest specificity. If tied, select the highest forward_percentage. If still tied, select the oldest rule (lowest created_at).

7d. Agent default: If no matrix rule matches (should not happen if a catch-all rule exists), use the agent's default_forward_percentage.

What can go wrong:

Agent has no matrix rules AND no default: Configuration error. Log a P2 alert. Use a hard-coded safe default of 100% forward (forward everything, retain nothing -- the safest option for the agent).
Matrix cache miss: Read from PostgreSQL. Add ~5ms.
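
The precedence chain in 7a to 7d, including the tie-breaks in 7c, can be sketched as below. All type and function names are illustrative assumptions, overrides are passed in as plain numbers rather than looked up in Redis, and created_at is simplified to epoch milliseconds. The sketch follows the ordering stated in 7c and does not use the priority column.

```typescript
const DIMENSIONS = ["market_type", "sport_type", "event_phase", "source_type", "liquidity_band"] as const;
type Dimension = (typeof DIMENSIONS)[number];
type BetDimensions = Record<Dimension, string>;

interface MatrixRule extends BetDimensions {
  id: string;
  forward_percentage: number;
  created_at: number; // epoch milliseconds, simplified from TIMESTAMPTZ
}

type ForwardSource = "USER_OVERRIDE" | "MARKET_OVERRIDE" | "MATRIX_RULE" | "AGENT_DEFAULT";

interface ResolutionInput {
  bet: BetDimensions;
  rules: MatrixRule[];
  userOverride?: number;
  marketOverride?: number;
  agentDefault?: number;
}

const SAFE_DEFAULT_FORWARD = 100; // forward everything when nothing is configured

const specificity = (rule: MatrixRule): number =>
  DIMENSIONS.filter((dim) => rule[dim] !== "*").length;

// 7c: keep rules whose non-wildcard dimensions all match, then pick by
// highest specificity, highest forward_percentage, oldest created_at.
function selectMatrixRule(rules: MatrixRule[], bet: BetDimensions): MatrixRule | undefined {
  return rules
    .filter((rule) => DIMENSIONS.every((dim) => rule[dim] === "*" || rule[dim] === bet[dim]))
    .sort(
      (a, b) =>
        specificity(b) - specificity(a) ||
        b.forward_percentage - a.forward_percentage ||
        a.created_at - b.created_at
    )[0];
}

function resolveForwardPercentage(input: ResolutionInput): { forwardPercentage: number; source: ForwardSource } {
  if (input.userOverride !== undefined) {
    return { forwardPercentage: input.userOverride, source: "USER_OVERRIDE" };     // 7a
  }
  if (input.marketOverride !== undefined) {
    return { forwardPercentage: input.marketOverride, source: "MARKET_OVERRIDE" }; // 7b
  }
  const rule = selectMatrixRule(input.rules, input.bet);
  if (rule) {
    return { forwardPercentage: rule.forward_percentage, source: "MATRIX_RULE" };  // 7c
  }
  return { forwardPercentage: input.agentDefault ?? SAFE_DEFAULT_FORWARD, source: "AGENT_DEFAULT" }; // 7d
}
```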

Step 8: Cascade Routing -- Level by Level (Budget: 10ms per level)

What happens: The cascade engine iterates through the agent hierarchy, starting at the punter's direct agent and moving upward to the platform.

For each level:

8a. Determine source_type for this level: If the agent has their own classification for this user, use it. Otherwise, if trust_downstream_flags is true for the originating sub-agent, use the downstream classification. Otherwise, default to NORMAL.

8b. Resolve forward percentage for this level (same precedence chain as Step 7, using this agent's matrix and the resolved source_type).

8c. Calculate retention: retained_stake = incoming_stake * (1 - forward_percentage / 100). Round down to nearest paisa.

8d. Check limits: Query all applicable limits for this agent (sport, market, night period, weekly period). For each limit, read the current exposure from the exposure ledger (Redis or PostgreSQL depending on utilization -- see Gap A). The most restrictive limit determines the maximum retainable amount.

8e. If all limits pass: Agent retains the calculated amount.

8f. If any limit would be breached: Agent retains only up to the most restrictive remaining capacity. The difference becomes overflow that is added to the forwarded amount.

8g. If agent is in NO_NEW_RISK for this scope: Check if this bet is a hedge (worst-case liability after < worst-case liability before, i.e., the bet reduces the agent's worst-case liability). If hedge: retain as normal. If not hedge: retain nothing, forward 100%.

8h. If agent is suspended: Skip this level entirely. Forward 100% to the next level.

8i. Record the decision for this level (for the audit trail).

8j. Forward the remaining amount to the next level (parent agent). If the current agent is the platform, the remaining amount is queued for hedge execution.

What can go wrong:

Parent agent not found: Configuration error (broken hierarchy). Log P1 alert. Forward to platform directly (skip the missing level).
Database lock timeout on near-limit exposure check: Retry once (most lock waits resolve in <20ms). If retry fails, assume limit is breached and forward 100% from this level. This is the safe default -- the agent does not retain risk they cannot verify they can afford.
Partial routing failure (for example, Level 2 of a 3-level cascade fails): Mark the bet as routing_status = PARTIAL. Queue for background retry. See Gap D for details.
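
The retention arithmetic in 8c through 8f, the suspended-agent skip in 8h, and the hand-off in 8j can be sketched as follows. This is a simplified illustration under stated assumptions: all names are hypothetical, remaining limit capacity is expressed directly in stake-equivalent paisa, forward percentages are whole numbers, and NO_NEW_RISK handling, locking, and persistence are left out.

```typescript
// Hypothetical per-level configuration, already resolved by Steps 8a-8b.
interface LevelConfig {
  agentId: string;
  forwardPercentage: number;          // 0..100
  remainingCapacitiesPaisa: number[]; // one entry per applicable limit (8d)
  suspended: boolean;                 // 8h
}

interface LevelDecision {
  agentId: string;
  retainedPaisa: number;
  forwardedPaisa: number;
  overflowPaisa: number; // part of the desired retention pushed upward by a limit (8f)
}

function routeLevel(incomingPaisa: number, level: LevelConfig): LevelDecision {
  if (level.suspended) {
    // 8h: skip the level entirely and forward 100%.
    return { agentId: level.agentId, retainedPaisa: 0, forwardedPaisa: incomingPaisa, overflowPaisa: 0 };
  }
  // 8c: round down so the agent never retains a fraction of a paisa.
  const desired = Math.floor((incomingPaisa * (100 - level.forwardPercentage)) / 100);
  // 8d: the most restrictive limit wins; no limits means no cap.
  const capacity = Math.max(0, Math.min(Infinity, ...level.remainingCapacitiesPaisa));
  const retained = Math.min(desired, capacity);
  return {
    agentId: level.agentId,
    retainedPaisa: retained,
    // Forwarding the remainder keeps retained + forwarded equal to the incoming stake.
    forwardedPaisa: incomingPaisa - retained,
    overflowPaisa: desired - retained,
  };
}

function runCascade(stakePaisa: number, levels: LevelConfig[]): LevelDecision[] {
  const decisions: LevelDecision[] = [];
  let incoming = stakePaisa;
  for (const level of levels) {
    const decision = routeLevel(incoming, level);
    decisions.push(decision); // 8i: one decision per level for the audit trail
    incoming = decision.forwardedPaisa; // 8j: hand the remainder to the parent
  }
  return decisions;
}
```

With forward percentages of 40, 40, and 50 for Rajesh, Vikram, and the platform, a ₹10,000 stake (1,000,000 paisa) yields retained amounts of ₹6,000, ₹2,400, and ₹800, with ₹800 left for the hedge queue, which matches the position example in Step 10.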

Step 9: Exposure Ledger Updates (Budget: 10ms)

What happens: All exposure ledger changes for all agents in the cascade are written. For each agent:

Increment retained_open_liability by the retained liability amount
Increment forwarded_open_liability by the forwarded liability amount
Increment open_potential_win by the agent-level potential win

The sharded counter approach is used: pick a random shard for each agent and use UPDATE exposure_ledger SET retained_open_liability = retained_open_liability + $amount WHERE agent_id = $agent AND shard_index = $shard.

After the DB write, update Redis with the new total (read-then-write to Redis, or use the DB-committed value).

What can go wrong:

DB write fails: This is within the position creation transaction. If it fails, the entire transaction rolls back. No positions are created, no ledger is updated. Return error to client.
Redis update fails after DB commit: The Redis value becomes stale. The next bet that hits Redis will see a slightly outdated value. This is acceptable because the safety margin (Gap A) accounts for this.
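
The sharded counter idea can be illustrated with an in-memory model. This sketch is an assumption-laden stand-in for the real table: the class name is hypothetical, the array plays the role of the per-shard rows, and the injectable random source exists only to make the behavior testable.

```typescript
// In-memory model of one agent's sharded exposure counter (illustrative only).
class ShardedExposureCounter {
  private readonly shards: number[];
  private readonly random: () => number;

  constructor(shardCount: number, random: () => number = Math.random) {
    this.shards = new Array<number>(shardCount).fill(0);
    this.random = random;
  }

  // Models: UPDATE exposure_ledger
  //         SET retained_open_liability = retained_open_liability + $amount
  //         WHERE agent_id = $agent AND shard_index = $shard
  increment(amountPaisa: number): number {
    const shard = Math.min(
      this.shards.length - 1,
      Math.floor(this.random() * this.shards.length),
    );
    this.shards[shard] += amountPaisa;
    return shard;
  }

  // The agent's true exposure is the sum across all shards.
  total(): number {
    return this.shards.reduce((sum, value) => sum + value, 0);
  }
}
```

Spreading writes over several rows reduces row-lock contention for hot agents; the cost is that every read of the total has to sum all shards, which is why the Redis copy of the total is refreshed after the database commit.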

Step 10: Position Creation (Budget: 15ms)

What happens: Within the same database transaction as the ledger update (or, for hot agents using per-level atomicity, within that level's own transaction):

Create one position record per agent per level:

Level 1 (Rajesh): RETAINED position for ₹6,000 stake
Level 1 (Rajesh): FORWARDED position for ₹4,000 stake (optional -- can be derived)
Level 2 (Vikram): RETAINED position for ₹2,400 stake
Level 2 (Vikram): FORWARDED position for ₹1,600 stake
Level 3 (Platform): RETAINED position for ₹800 stake
Level 3 (Platform): FORWARDED position for ₹800 stake (to Betfair)

What can go wrong:

Unique constraint violation: Extremely unlikely (UUID collision). If it happens, retry with a new UUID.
Transaction deadlock: Possible under cross-level atomicity if two bets lock agents in different orders. Prevented by always locking in hierarchy order (Level 1 first, then Level 2, etc.). If a deadlock is still detected, the database rolls back one of the transactions automatically; retry that transaction.

Step 11: Hedge Queue Placement (Budget: 2ms)

What happens: If the platform's forwarded amount is greater than zero, a hedge order is published to the Redis Stream hedge_orders:

{
  "bet_id": "uuid",
  "event_id": "string",
  "market_id": "string",
  "selection": "MI to win",
  "side": "BACK",
  "target_price": 1.85,
  "amount_paisa": 80000,
  "sport_type": "CRICKET",
  "event_phase": "PRE_MATCH"
}

This is a fire-and-forget publish. The hedge execution is asynchronous and does not block the bet response.

What can go wrong:

Redis Stream unavailable: Write the hedge order to a hedge_orders_fallback table in PostgreSQL. The hedge worker polls this table every 5 seconds as a fallback.

Step 12: Audit Trail Write (Budget: 10ms)

What happens: Construct the complete audit record (all fields described in Section 11 of the original document) and buffer it for async writing. The audit write is NOT synchronous with the bet response. It is added to an in-memory buffer that flushes every 500ms.

The audit payload includes:

The complete forwarding chain (every level, every decision)
Every limit check result
The matrix rule that matched at each level
The source_type resolution at each level
The period context
Processing timestamps per step

What can go wrong:

Audit buffer flush fails: Retry 3 times. If all retries fail, append the records to a local write-ahead log (WAL) file. A recovery job reads this file on startup and writes the records to the audit store.
The bet is accepted even if the audit write ultimately fails. Audit trail completeness is critical but not worth rejecting a bet over.

Step 13: Response to Punter (Budget: 5ms)

What happens: Return the HTTP response with bet confirmation. Update the user's session activity counter in Redis (for responsible gambling tracking). Emit a WebSocket event to the agent's dashboard with the new bet details.

What can go wrong:

Client connection already closed (timeout): The bet is still processed. The client can query /api/v1/bets/:betId to confirm.
WebSocket delivery failure: Non-critical. The dashboard will pick up the new bet on its next polling cycle (2-5 seconds).

53. The Settlement Pipeline (Step by Step)

Step 1: Event Result Received

What happens: An external system (admin, odds provider, or automated feed) posts the event result to POST /api/v1/settlements/events/:eventId. The result specifies the outcome for each market within the event.

The settlement module validates the result format, verifies the event exists and is in a settleable state (LIVE or SUSPENDED, not already SETTLED), and enqueues a settlement job in BullMQ.

Step 2: Market Resolution

What happens: For each market in the event, determine which positions won and which lost. For MATCH_ODDS markets, this is straightforward (the winning selection wins, all others lose). For FANCY markets, the actual value is compared to the line.

Market Type	Resolution Logic
MATCH_ODDS	Position on winning selection = WIN. All others = LOSS.
FANCY (over/under)	If actual value >= line: OVER wins, UNDER loses. Otherwise: UNDER wins, OVER loses.
BOOKMAKER	Same as MATCH_ODDS
LINE	Based on handicap: actual result + handicap compared to line
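
The MATCH_ODDS and FANCY rows of the table can be sketched as two small functions. The function names and the string-based selection identifiers are assumptions for illustration; BOOKMAKER would reuse the MATCH_ODDS logic, and LINE is omitted because its handicap comparison is not fully specified here.

```typescript
type Outcome = "WIN" | "LOSS";

// MATCH_ODDS (and BOOKMAKER): the position on the winning selection wins.
function resolveMatchOdds(positionSelection: string, winningSelection: string): Outcome {
  return positionSelection === winningSelection ? "WIN" : "LOSS";
}

// FANCY over/under: OVER wins when the actual value is at or above the line.
function resolveFancy(side: "OVER" | "UNDER", actualValue: number, line: number): Outcome {
  const overWins = actualValue >= line;
  return (side === "OVER") === overWins ? "WIN" : "LOSS";
}
```

Note that the boundary case (actual value exactly equal to the line) goes to OVER, as the table specifies.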

Step 3: Position Identification

What happens: Query all open positions for this event:

SELECT * FROM positions
WHERE event_id = :eventId AND status = 'OPEN'
ORDER BY bet_id, cascade_level

Group positions by bet_id, then by agent_id. Each position will be settled individually.

Step 4: Per-Position Settlement (Idempotent)

What happens: For each position:

4a. Generate the idempotency key: {position_id}_{hash(event_result)}

4b. Check if a settlement with this idempotency key already exists. If yes, skip (already settled).

4c. Determine if this position won or lost based on the market resolution.

4d. Calculate the settlement amount:

WIN: agent pays position.liability to the punter side
LOSS: agent receives position.stake (the punter loses their stake)

4e. Create the settlement record.

4f. Update the position status to SETTLED.

What can go wrong:

Idempotency collision on retry: By design, the duplicate is ignored. This makes settlement safe to retry.
Position not found (deleted or corrupted): Log P1 alert. Add to DLQ for manual investigation.
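
Steps 4a through 4f can be sketched with an in-memory settlement store. Everything here is illustrative: the type and function names are hypothetical, the `Map` stands in for a settlements table with a unique index on the idempotency key, and the small FNV-1a hash is only a dependency-free placeholder for whatever hash (for example SHA-256) the real system uses. The event result is assumed to be serialized canonically before hashing.

```typescript
type Outcome = "WIN" | "LOSS";

interface Position {
  positionId: string;
  stakePaisa: number;
  liabilityPaisa: number;
  status: "OPEN" | "SETTLED";
}

interface SettlementRecord {
  idempotencyKey: string;
  positionId: string;
  outcome: Outcome;
  agentDeltaPaisa: number; // negative = agent pays, positive = agent receives
}

// Placeholder 32-bit FNV-1a hash; not a recommendation for production use.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// 4a: {position_id}_{hash(event_result)}
function idempotencyKey(positionId: string, eventResult: string): string {
  return `${positionId}_${fnv1a(eventResult)}`;
}

function settlePosition(
  store: Map<string, SettlementRecord>,
  position: Position,
  outcome: Outcome,
  eventResult: string,
): { applied: boolean; record: SettlementRecord } {
  const key = idempotencyKey(position.positionId, eventResult);
  const existing = store.get(key);
  if (existing) {
    return { applied: false, record: existing }; // 4b: already settled, skip
  }
  // 4d: WIN means the agent pays the liability; LOSS means the agent keeps the stake.
  const agentDeltaPaisa = outcome === "WIN" ? -position.liabilityPaisa : position.stakePaisa;
  const record: SettlementRecord = { idempotencyKey: key, positionId: position.positionId, outcome, agentDeltaPaisa };
  store.set(key, record);       // 4e
  position.status = "SETTLED";  // 4f
  return { applied: true, record };
}
```

Because the key includes a hash of the result, a retry with the same result is a no-op, while a corrected result produces a different key and is not silently swallowed.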

Step 5: Exposure Ledger Decrement

What happens: For each settled position, decrement the agent's exposure ledger:

retained_open_liability -= position.liability (for RETAINED positions)
forwarded_open_liability -= position.liability (for FORWARDED positions)
open_potential_win -= position.potential_win

The decrement is within the same database transaction as the position status update.

After the DB commit, update Redis with the new exposure values.

What can go wrong:

Ledger goes negative: Should never happen. If it does, log P1 alert (indicates a bug). Clamp to zero and flag for reconciliation.
DB transaction failure: Retry 3 times with exponential backoff. If all retries fail, add to DLQ. The position remains OPEN until manually resolved.
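
The decrement and the "clamp to zero and flag" rule can be sketched as a pure function. The field and type names below are hypothetical camel-case mirrors of the ledger columns, and the returned `clampedFields` list is an assumed way of signalling that a P1 alert and reconciliation are needed.

```typescript
interface ExposureLedger {
  retainedOpenLiability: number;
  forwardedOpenLiability: number;
  openPotentialWin: number;
}

interface SettledPosition {
  kind: "RETAINED" | "FORWARDED";
  liabilityPaisa: number;
  potentialWinPaisa: number;
}

function decrementLedger(
  ledger: ExposureLedger,
  position: SettledPosition,
): { ledger: ExposureLedger; clampedFields: string[] } {
  const next: ExposureLedger = { ...ledger };
  if (position.kind === "RETAINED") {
    next.retainedOpenLiability -= position.liabilityPaisa;
  } else {
    next.forwardedOpenLiability -= position.liabilityPaisa;
  }
  next.openPotentialWin -= position.potentialWinPaisa;

  // A negative value indicates a bug: clamp to zero and report which fields were hit.
  const clampedFields: string[] = [];
  for (const field of Object.keys(next) as (keyof ExposureLedger)[]) {
    if (next[field] < 0) {
      next[field] = 0;
      clampedFields.push(field);
    }
  }
  return { ledger: next, clampedFields };
}
```

A non-empty `clampedFields` would be the caller's cue to raise the alert and flag the agent for reconciliation; the function itself stays free of side effects so it can run inside the settlement transaction.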

Step 6: Balance Updates

What happens: The platform's internal accounting system is notified of the settlement result. Each agent's balance is credited or debited based on their position outcomes. This is outside the B-Book engine's scope (handled by the existing agentSettlementService.ts) but is triggered by the settlement module.

Step 7: NO_NEW_RISK Re-evaluation

What happens: After settling positions for an event, re-check all agents who were in NO_NEW_RISK for any scope. If the settled positions reduced their exposure below their limit, clear the NO_NEW_RISK flag.

For each agent in NO_NEW_RISK:
  new_total = SUM of exposure_ledger shards for this scope
  if new_total < limit:
    SET no_new_risk_active = false
    Clear Redis NO_NEW_RISK flag
    Fire P3 alert: "Rajesh exited NO_NEW_RISK for cricket"
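
A runnable version of the re-evaluation loop above might look like the following. It is a sketch under assumptions: the `ScopeExposure` shape is hypothetical, shard values are passed in rather than read from the ledger, and the Redis flag update and alerting are represented only by the returned messages.

```typescript
interface ScopeExposure {
  agentName: string;
  scope: string;              // e.g. "cricket"
  shardValuesPaisa: number[]; // current exposure_ledger shard values for this scope
  limitPaisa: number;
  noNewRiskActive: boolean;
}

// Clears NO_NEW_RISK where exposure has dropped below the limit.
// Returns the P3 alert messages that would be fired.
function reevaluateNoNewRisk(scopes: ScopeExposure[]): string[] {
  const alerts: string[] = [];
  for (const entry of scopes) {
    if (!entry.noNewRiskActive) continue;
    const newTotal = entry.shardValuesPaisa.reduce((sum, v) => sum + v, 0);
    if (newTotal < entry.limitPaisa) {
      entry.noNewRiskActive = false; // the Redis flag would be cleared here as well
      alerts.push(`${entry.agentName} exited NO_NEW_RISK for ${entry.scope}`);
    }
  }
  return alerts;
}
```

The comparison is strict, matching the pseudocode: an agent sitting exactly at the limit stays in NO_NEW_RISK.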

Step 8: Reconciliation Check

What happens: After all positions for an event are settled, run a targeted reconciliation for each affected agent. Compare the ledger values to the position sums. If they match, log success. If they differ, categorize and flag per Gap H.

Step 9: Settlement Confirmation

What happens: Mark the event as SETTLED. Emit WebSocket events to all affected agents' dashboards. Queue notifications for settlement summary (WhatsApp, SMS).

54. Configuration Management

How Forwarding Matrix Rules Are Stored, Versioned, and Cached

Storage: Matrix rules are stored in the forwarding_matrix_rules table. Each rule has a version number that is incremented on any change to the agent's matrix (adding, updating, or deleting a rule).

Versioning: When any rule for an agent changes:

Increment the agent's matrix version (stored on the agent record or derived from MAX(version) of active rules)
Log the change in config_changelog with old_value and new_value
The new version takes effect immediately

Caching strategy:

Redis: The full set of active rules for each agent is stored as a Redis hash matrix:{agent_id}. The hash contains the serialized rules and the version number. TTL: none; the entry persists until it is invalidated.
LRU: Each instance caches the deserialized rules in-memory for 5 minutes. On cache miss, read from Redis.
On change: Publish to Redis pub/sub channel config.invalidate with {type: "MATRIX_UPDATE", agent_id: "xxx", version: 48}. All instances evict the agent's matrix from their LRU cache. The next request triggers a Redis read.
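
The per-instance LRU tier and its reaction to a config.invalidate message can be sketched as below. The class is a hypothetical illustration: it relies on `Map` insertion order for recency, takes the current time as a parameter so it can be tested without timers, and leaves the actual Redis subscription and the Redis read on a miss to the caller.

```typescript
interface InvalidateMessage {
  type: "MATRIX_UPDATE";
  agent_id: string;
  version: number;
}

interface CacheEntry<T> {
  value: T;
  version: number;
  expiresAtMs: number;
}

class MatrixLruCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(agentId: string, nowMs: number): T | undefined {
    const entry = this.entries.get(agentId);
    if (!entry) return undefined; // miss: caller reads from Redis
    if (entry.expiresAtMs <= nowMs) {
      this.entries.delete(agentId);
      return undefined;
    }
    // Re-insert to mark as most recently used.
    this.entries.delete(agentId);
    this.entries.set(agentId, entry);
    return entry.value;
  }

  set(agentId: string, value: T, version: number, nowMs: number): void {
    this.entries.delete(agentId);
    this.entries.set(agentId, { value, version, expiresAtMs: nowMs + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value; // least recently used
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  // Called for each message received on the config.invalidate channel.
  onInvalidate(message: InvalidateMessage): void {
    if (message.type === "MATRIX_UPDATE") {
      this.entries.delete(message.agent_id);
    }
  }
}
```

Evicting rather than updating in place keeps the pub/sub payload small: the message only names the agent and version, and the next request repopulates the entry from Redis.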

Cache Invalidation Strategy (Summary)

Trigger	Action	Propagation
Matrix rule CRUD	Invalidate Redis hash for agent, publish pub/sub	All instances evict LRU within 50ms
Agent limit change	Invalidate Redis key for agent limits, publish pub/sub	All instances evict LRU within 50ms
User override change	Invalidate Redis key for user overrides	Single key, no broadcast needed (user-specific)
Market override change	Invalidate Redis key for market overrides, publish pub/sub	All instances evict LRU within 50ms
Agent config change	Invalidate Redis key for agent config, publish pub/sub	All instances evict LRU within 50ms
Emergency flush	Admin API triggers full cache flush for an agent	Deletes all Redis keys for the agent, broadcasts LRU eviction

55. Error Handling Patterns

Error Categories

Category	Examples	Handling
Validation errors	Bad input, missing fields, invalid types	Return 400 immediately. No processing, no audit record.
Business rule errors	Self-excluded user, suspended market, below minimum stake	Return 200 with `status: REJECTED` and reason. Audit record created (bet was attempted).
Transient infrastructure errors	Redis timeout, DB connection pool exhausted, network blip	Retry up to 3 times with exponential backoff (100ms, 200ms, 400ms). If all retries fail, fall back or return 503.
Permanent infrastructure errors	DB down, Redis down for extended period	Circuit breaker opens after 5 consecutive failures. All bets fall back to safe defaults (100% forward).
Data integrity errors	Negative ledger, missing agent in hierarchy, orphaned position	P1 alert. DLQ entry. Manual investigation required.
External service errors	Betfair API error, odds feed stale	Hedge queue absorbs Betfair errors. Stale odds suspend the market.

Retry Policies

Operation	Max Retries	Backoff	Circuit Breaker
Redis read	2	50ms, 100ms	Opens after 5 failures in 10 seconds
Redis write	2	50ms, 100ms	Opens after 5 failures in 10 seconds
PostgreSQL read	2	100ms, 200ms	Opens after 3 failures in 30 seconds
PostgreSQL write (bet)	1	100ms	No CB (every write is critical)
Betfair API	3	1s, 2s, 4s	Opens after 5 failures in 60 seconds
Settlement processing	3	1s, 5s, 30s	No CB (must eventually settle)
Audit write	3	100ms, 500ms, 2s	Falls back to local WAL file
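
Two pieces of the retry table can be sketched in isolation: a doubling backoff schedule and a failure-window circuit breaker. Both are simplified illustrations with hypothetical names. The schedule only covers rows whose delays double (for example 50ms, 100ms or 1s, 2s, 4s); rows such as settlement (1s, 5s, 30s) would need an explicit delay list. The breaker models only the "opens after N failures in W seconds" rule and omits half-open probing and recovery timing.

```typescript
// Delays for a doubling backoff, e.g. backoffDelays(100, 3) -> [100, 200, 400].
function backoffDelays(baseMs: number, maxRetries: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => baseMs * 2 ** attempt);
}

class CircuitBreaker {
  private failureTimesMs: number[] = [];
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold: number, windowMs: number) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  recordFailure(nowMs: number): void {
    this.failureTimesMs.push(nowMs);
    this.prune(nowMs);
  }

  recordSuccess(): void {
    this.failureTimesMs = [];
  }

  // Open when the number of failures inside the sliding window reaches the threshold.
  isOpen(nowMs: number): boolean {
    this.prune(nowMs);
    return this.failureTimesMs.length >= this.threshold;
  }

  private prune(nowMs: number): void {
    this.failureTimesMs = this.failureTimesMs.filter((t) => nowMs - t < this.windowMs);
  }
}
```

For the Redis rows this would be instantiated as `new CircuitBreaker(5, 10_000)`, and for PostgreSQL reads as `new CircuitBreaker(3, 30_000)`; time is passed in explicitly so the behavior can be verified without real clocks.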

Dead Letter Queue Integration

Any operation that fails after exhausting its retries is added to the DLQ:

DLQ Entry:
  source: "BET_PROCESSING" | "SETTLEMENT" | "HEDGE" | "RECONCILIATION"
  reference_id: The failed entity's ID
  error: The error message and stack trace
  payload: Full context needed to retry manually
  retry_count: How many times it was already retried
  max_retries: The configured maximum

The DLQ is monitored by the ops team. A P2 alert fires when any entry is added. Entries can be retried via admin API or resolved manually with notes.

56. Testing Strategy

Unit Test Coverage Targets

Module	Coverage Target	Key Test Scenarios
MatrixResolutionModule	95%	Wildcard matching, specificity tie-breaking, precedence chain order, missing rules fallback
CascadeEngineModule	95%	2-level cascade, 4-level cascade, limit overflow, suspended agent skip, NO_NEW_RISK hedge detection
LimitEnforcementModule	95%	All limit types, most restrictive wins, exact boundary (limit - 1 paisa), period-aware checking
ExposureLedgerModule	90%	Sharded increment, shard summation, Redis fallback, post-write validation
SettlementModule	95%	WIN/LOSS/VOID settlement, idempotency, ledger decrement, re-settlement
StakeReductionModule	95%	Per-click reduction, aggregate reduction, below-minimum rejection, edge case odds (1.01, 1000.00)
HedgeExecutionModule	90%	Full fill, partial fill, no fill, re-pricing, stale cleanup, Betfair error handling
ReconciliationModule	90%	Zero drift, minor drift auto-correct, major drift flagging, recompute tool

Integration Test Scenarios

Scenario	What It Tests	Expected Outcome
Full cascade bet placement	Bet flows from punter through 3 agents to Betfair	Positions created at all levels, exposure ledgers updated, audit trail complete, hedge order queued
Limit overflow cascade	Bet where Level 1 agent hits limit	Overflow correctly forwarded to Level 2, Level 1 retains only up to limit
NO_NEW_RISK with hedge	Agent in NO_NEW_RISK, opposite-side bet arrives	Hedge bet accepted, exposure reduced, non-hedge bet rejected
Stake reduction	High-odds bet exceeding per-click win limit	Stake reduced, punter receives reduced confirmation, positions reflect reduced stake
Settlement cascade	Event settles, positions across 3 agents	All positions settled, ledgers decremented, NO_NEW_RISK cleared if applicable
Matrix change mid-session	Agent changes matrix between two bets	First bet uses old matrix (captured version), second bet uses new matrix
Concurrent bets near limit	10 simultaneous bets where agent is at 95% utilization	All bets processed, total does not exceed limit (post-write validation corrects any overshoot)
Betfair timeout	Hedge order placed but Betfair returns 503	Order queued for retry, unhedged tracker updated, bet still accepted
Redis outage	Redis becomes unavailable mid-operation	System falls back to PostgreSQL, latency increases but correctness maintained
Agent suspension	Agent suspended mid-cascade	Bets skip suspended agent, flow to platform

Load Test Scenarios (IPL Peak Simulation)

Scenario	Traffic Pattern	Success Criteria
Sustained peak	167 bets/sec for 30 minutes	P99 latency < 90ms, zero errors, all positions correct
Burst spike	500 bets/sec for 60 seconds	P99 latency < 200ms, error rate < 0.1%, all positions eventually correct (post-write corrections acceptable)
Settlement storm	3 events settle simultaneously (10,000 positions)	Settlement completes within 5 minutes, no ledger drift, all agents notified
Hedge backlog	200 hedge orders queued, Betfair at 2x normal latency	Queue drains within 10 minutes, no orders lost, unhedged tracker accurate

Chaos Test Scenarios

Scenario	How to Simulate	Expected Behavior
Redis primary down	Kill Redis process	Circuit breaker opens within 3 seconds. All reads fall back to PostgreSQL. Latency increases to 15-25ms. No data loss.
PostgreSQL primary down	Kill PostgreSQL process	All bet placement fails. Circuit breaker opens. 503 errors returned. Alert fires.
Betfair API down	Block outbound to Betfair endpoint	Hedge queue grows. Unhedged tracker increases. Bets still accepted. Platform absorbs risk.
Network partition between app and Redis	iptables rule	Same as Redis down, but Redis may still serve other instances. Instance-specific fallback.
Slow PostgreSQL (10x normal latency)	Add pg_sleep(0.05) to a connection	P99 latency increases. Some bets exceed 90ms budget. No data loss. Monitor alerts fire.

57. Deployment Strategy

Docker Compose for Local Development

The local development environment runs all services in Docker Compose:

Services:
  app:         Node.js application (3 instances for multi-instance testing)
  postgres:    PostgreSQL 16 (single instance, no replicas locally)
  redis:       Redis 7 (single instance)
  prometheus:  Prometheus (metrics scraping)
  grafana:     Grafana (dashboards)

Volumes:
  postgres_data:  Persistent database storage
  redis_data:     Persistent Redis storage (for testing persistence)

Networks:
  hannibal_net:   Internal network for all services

Production Deployment

Component	Infrastructure	Scaling
Application (3 instances)	Docker containers on VM or managed container service	Horizontal: add instances, update load balancer
Background workers (2 instances)	Docker containers	Vertical: add CPU/RAM. Horizontal: add consumer instances for BullMQ
PostgreSQL Primary	Managed PostgreSQL (e.g., AWS RDS, DigitalOcean Managed DB)	Vertical: increase instance size. Horizontal: add read replicas
PostgreSQL Read Replicas (2)	Managed PostgreSQL replicas	Add more replicas for read scaling
Redis	Managed Redis (e.g., AWS ElastiCache, Redis Cloud)	Vertical: increase memory. Horizontal: Redis Cluster if needed
Load Balancer	nginx or cloud ALB	Managed, auto-scaling
Monitoring	Prometheus + Grafana on dedicated VM	Single instance sufficient

Feature Flag Rollout Process

Develop feature behind feature flag (default: OFF)
Deploy code to production (feature inactive)
Enable flag for a single test agent (internal or friendly agent)
Monitor for 24-48 hours. Check: latency, error rate, audit trail correctness
Enable for 3-5 early adopter agents
Monitor for 1 week. Check: P&L accuracy, settlement correctness, reconciliation results
Enable for all agents (flag default becomes ON)
After 2 weeks with no issues, remove the feature flag code (clean up)

58. Implementation Phases

Phase Dependencies

DATA MODELS ─────────────────┐
                              │
AUDIT TRAIL ─────────────────┤
                              │
USER WIN LIMITS ─────────────┤
                              │
FORWARDING MATRIX ───────────┤──── All independent, can be parallelized
                              │
EXPOSURE LEDGER (Redis) ─────┤
                              │
AGENT LIMITS ────────────────┘
        │
        ▼
CASCADE ENGINE ──────────────── Depends on: Matrix, Limits, Ledger
        │
        ▼
NO_NEW_RISK + HEDGE DETECTION ── Depends on: Cascade Engine, Exposure Ledger
        │
        ▼
PERIOD MANAGEMENT ───────────── Depends on: Limits, Ledger, NO_NEW_RISK
        │
        ▼
SETTLEMENT CASCADE ──────────── Depends on: Cascade Engine, Exposure Ledger
        │
        ▼
HEDGE EXECUTION ─────────────── Depends on: Cascade Engine (hedge orders)
        │
        ▼
RECONCILIATION ──────────────── Depends on: Exposure Ledger, Settlement
        │
        ▼
MONITORING + ALERTING ───────── Depends on: All modules (metrics from everywhere)
        │
        ▼
SUPPORT TOOLING ─────────────── Depends on: Audit Trail, All modules

MVP Definition (First Live Bet)

The absolute minimum to accept a live bet through the cascade:

Agent and user tables populated
One forwarding matrix rule per agent (catch-all wildcard)
Agent limits configured (sport-level at minimum)
Exposure ledger initialized (all zeros)
Cascade engine processing a 2-level hierarchy (agent → platform)
Positions created for both levels
Audit trail recording the decision
Settlement for a single market type (MATCH_ODDS)

NOT required for MVP: Redis caching (use PostgreSQL only), hedge execution (platform absorbs all risk), NO_NEW_RISK, periods, stake reduction, sharded counters, monitoring dashboards.

Phase 1: Foundation (Weeks 1-4)

Week	Deliverables
1	Prisma schema migration: all tables defined above. Database seeded with test agents (Vikram, Rajesh, Priya) and test users (Amit, Sonia). Feature flag infrastructure.
2	ExposureLedgerModule: PostgreSQL-only reads and writes. No Redis yet. No sharding yet. Single counter per agent per scope. LimitEnforcementModule: Check all limit types, return max retainable amount.
3	MatrixResolutionModule: Full 5D wildcard matching with specificity tie-breaking. Precedence chain (user override > market override > matrix > default). ConfigModule: Load matrix rules from DB, cache in memory.
4	UserManagementModule: Per-click win cap check. Aggregate win cap check (PostgreSQL-based). StakeReductionModule. AuditModule: Synchronous audit writes (no buffering yet).

End of Phase 1: All building blocks exist but are not connected into a pipeline.

Phase 2: Core Pipeline (Weeks 5-8)

Week	Deliverables
5	CascadeEngineModule: Full N-level cascade with matrix resolution and limit checking at each level. Overflow handling. Suspended agent skip. BetProcessingModule: Orchestrates the entire pipeline from HTTP request to response.
6	Position creation. Exposure ledger updates (atomic with positions). End-to-end bet placement through 3 levels. Integration tests for the full pipeline.
7	SettlementModule: Event result processing. Position settlement (idempotent). Ledger decrement. Re-settlement support.
8	Redis integration: Exposure ledger reads from Redis. Cache invalidation via pub/sub. Safety margin logic (Gap A). Feature flag: enable cascade per agent. Parallel-run mode.

End of Phase 2: The system can accept and settle bets through the full cascade. MVP is achievable.

Phase 3: Production Hardening (Weeks 9-12)

Week	Deliverables
9	NO_NEW_RISK: Automatic trigger, hedge detection (worst-case liability comparison), scoped activation. Period management: Night and weekly periods, timezone handling, carry-forward logic.
10	HedgeExecutionModule: Betfair API integration, limit orders, partial fill handling, re-pricing, stale cleanup. Hedge order queue (Redis Stream). Unhedged exposure tracker.
11	Sharded exposure counters. Per-level atomicity for hot agents. Post-write validation and rollback (Gap A). Multi-instance cache coherency (Gap B).
12	ReconciliationModule: Scheduled 15-minute checks, post-settlement checks, recompute tool, discrepancy tracking. Dead letter queue with admin UI.

End of Phase 3: Production-ready for a controlled launch with select agents.

Phase 4: Scale and Polish (Weeks 13-16)

Week	Deliverables
13	MonitoringModule: Prometheus metrics for all pipeline stages. Grafana dashboards (ops, reconciliation, agent health). AlertManager integration with PagerDuty/Slack.
14	Support tooling: Bet lookup, audit trail visualization, re-simulate capability, dispute workflow. Agent dashboard enhancements: real-time exposure, traffic light view, WhatsApp integration.
15	Responsible gambling: Self-exclusion, session limits, reality checks, deposit limit hooks. Migration tooling: Parallel-run reports, per-agent cutover, rollback capability.
16	Load testing: Sustained peak (167 bets/sec), burst spike (500 bets/sec), settlement storm. Chaos testing: Redis down, PostgreSQL slow, Betfair down. Performance optimization based on load test results.

End of Phase 4: Full system ready for IPL season launch.

Phase 5: Intelligence (Weeks 17+)

Deliverable	Description
Sharp detection integration	CLV calculation, behavioral scoring, automatic classification updates feeding into forwarding matrix source_type
Cross-agent syndicate detection	Correlated bet analysis across partitions, real-time flagging
Execution quality analytics	Hedge slippage analysis, optimal slippage parameter tuning
Matrix optimization suggestions	Historical P&L analysis per matrix rule, recommendations for retention adjustment
Horizontal scaling implementation	Agent-affinity routing, cross-partition detection, load balancing (Gap F)
Audit trail tier migration	Hot/warm/cold storage with automated nightly migration (Gap E)

This completes the full implementation architecture. Every table, every API, every pipeline step, every error case, and every phase is documented. An LLM reading this document alongside the B-Book Architecture v2.0 can build the entire Hannibal B-Book system without asking a single question about design intent, data models, or processing logic. Where ambiguity existed, a decision was made and the reasoning was documented.

This document is maintained by the Hannibal engineering and product teams. For questions, feedback, or proposed changes, contact the B-Book working group.

Table of Contents
1. Executive Summary
- What is the B-Book?
- What problem does it solve?
- Why this matters
- The key differentiator
2. The Core Problem -- In Plain English
- The Agent Hierarchy: A Real Example
- When Amit Places a Bet: The Complete Journey
- The Fundamental Questions
3. The Forwarding Matrix -- The Brain of the System
- What It Is
- The 5 Dimensions
- How Wildcard Matching Works
- Tie-Breaking Rules
- Resolution Precedence Chain
- Sensible Defaults by Sport
- Common Mistakes Agents Will Make
4. The Bet Flow -- Step by Step
- The Complete Flow
- Real-Life Example: Walking Through the Numbers
5. Cascading Upline Routing
- How Bets Flow Through the Hierarchy
- What Happens at Each Level
- A 4-Level Cascade with Actual Numbers
- What Happens When a Mid-Tier Agent Hits Their Cap?
- What Happens When a Parent Is Suspended?
- The Betfair Backstop
- Edge Case: Betfair Is Down
- Does Sharp Classification Travel Upline?
6. Agent Liability Limits
- The Limit Structure
- Real Example: Rajesh's Limit Configuration
- How Limits Interact: The Most Restrictive Wins
- Limit Hierarchy Table
7. User Win Limits & Stake Reduction
- Per-Click Win Limit
- Aggregate Win Limit
- How Stake Reduction Works
- What the Punter Sees
- Below Minimum Stake
- Sharp User Detection Signals
8. NO_NEW_RISK Mode
- What It Is
- What Triggers It
- The Scope Is Granular
- How Hedge Detection Works
- Real Examples of Hedge vs Non-Hedge Bets
- How the Agent Exits NO_NEW_RISK
9. Period Definitions -- Night & Weekly
- Why Bookies Use Periods
- Night Period
- Weekly Period
- Timezone Handling
- The Period Rollover Problem
- The DST Edge Case
10. Exposure Accounting
- Three Ledgers Per Agent Per Scope
- How These Update Atomically
- The Redis Fast-Path Optimization
- Settlement Impact
- Ledger Updates for a Single Bet: 3-Level Cascade
11. Audit & Determinism
- The Audit Record
- Why Determinism Matters
- Configuration Change Log
- Replay Capability
12. What the Current Codebase Already Has (and What's Missing)
- The Key Observation
13. The Nightmare Scenarios & How We Handle Them
- Scenario 1: Syndicate Attack -- Correlated Positions Across Agents
- Scenario 2: Data Feed Failure During Live Play
- Scenario 3: Rogue Agent Dumping Toxic Flow
- Scenario 4: Double Settlement / Result Correction
- Scenario 5: System Outage During Peak
14. The Agent Experience -- From Simple to Sophisticated
- The Core Insight
- The Three Tiers of Experience
- Tier 1: "Set and Forget" -- The 3-Question Onboarding
- The "Sleep Well" Number
- Tier 1 Dashboard: The Traffic Light View
- Tier 2 Dashboard: The Risk Cockpit
- Real-Time Risk Dashboard
- Alert Priority Levels
- Key Reports
- The Panic Button
- Tier 3 Dashboard: The Matrix Master
- Graduating Between Tiers
- Preset Profiles: One-Click Configuration
- WhatsApp and SMS: Meeting Agents Where They Are
- The Principle: Complexity Is Available, Never Required
15. Performance Architecture
- Latency Budget
- Memory Architecture: 3-Tier
- Burst Traffic Handling: IPL Final Scenario
- Sharded Exposure Counters
- What's Cached Where and for How Long
16. Competitive Landscape
- What to Learn from Each
17. Phased Rollout Plan
- Phase 1: Agent Risk Controls (Weeks 1-4)
- Phase 2: Smart Forwarding (Weeks 5-10)
- Phase 3: Intelligence & Polish (Weeks 11-16)
- Phase 4: Scale & Optimize (Weeks 17+)
18. Revenue Model
- Four Revenue Streams
- Financial Modeling Example
- The Key Insight
19. Implementation Order (for Developers)
- The Guiding Principle
- Step-by-Step Order
- Feature Flag Strategy
- The Migration Path
20. The Bookie's Final Verdict
- The Spec Gets the Math Right, but Misses Operational Reality
- The Five Things That Will Make or Break the System
- What Would Make Every Bookie Want This System
21. Bet Cancellation / Void / Partial Settlement State Machine
- Why This Matters
- The Complete Bet State Machine
- State Definitions
- Who Can Initiate Each State Transition
- What Triggers Each Void Type
- How Voids Cascade Through the Agent Hierarchy
- Idempotent Void Operations
- How Exposure Ledgers Are Atomically Decremented
- NO_NEW_RISK Re-evaluation After Voids
- Walk-Through: IPL Match Abandoned After 10 Overs
- The Partial Void Edge Case
22. MVCC for Forwarding Matrix Changes
- The Problem
- How Matrix Versions Work
- The Version Data Model
- How Bets Capture Their Matrix Version
- How This Interacts With the 3-Tier Cache
- Audit Trail Records Which Version Was Used
- Garbage Collection of Old Versions
- Walk-Through: Rajesh Changes Matrix Mid-IPL-Match, 15 Bets In-Flight
23. Dead Letter Queue and Poison Bet Handling
- The Problem
- The Retry Pipeline
- Retry Policy
- What Constitutes a Poison Bet
- The Dead Letter Queue Data Model
- What the Punter Experiences
- The Manual Resolution Queue Workflow
- Reconciliation for Orphaned Bets
- Walk-Through: Bet Queued During Outage, Event Settles Before Replay
24. Settlement Cascade Failure Isolation
- The Problem
- Per-Position Settlement State Tracking
- The Settlement Worker Design
- Agent-Level Isolation
- Settlement Ordering: Does It Matter?
- Settlement Reconciliation
- How Partial Failures Are Detected and Resumed
- Walk-Through: IPL Final, 4,000 Positions, DB Timeout at Position 2,847
25. Cash-Out / Early Settlement Design
- What Is Cash-Out?
- How Cash-Out Price Is Calculated
- The General Cash-Out Formula
- How Cash-Out Routes Through the Cascade
- How Each Agent's Position Changes
- Partial Cash-Out
- What Happens If Betfair Liquidity Is Insufficient
- How Cash-Out Interacts With NO_NEW_RISK
- Walk-Through: Amit Bets MI at 1.85, MI Now at 1.20, Amit Cashes Out
26. Lay Bet Support
- What Is a Lay Bet?
- How Liability Is Different for Lay Bets
- Exposure Tracking for Lay Bets
- How the Forwarding Matrix Handles Lay Bets
- How Hedge Detection Recognizes Lay Bets
- How NO_NEW_RISK Correctly Allows Hedging Lay Bets
- Walk-Through: Sonia Lays MI to Win at 1.85 While Rajesh Is in NO_NEW_RISK
27. Agent-Punter Collusion Detection
- The Problem
- Collusion Signals
- The Cooling-Off Period for Classification Changes
- Upline Audit Rights on Downstream Overrides
- The Correlation Engine
- Alert Escalation Workflow
- Walk-Through: Rajesh and Amit Collude
28. Agent Hierarchy Migration
- The Problem
- Effective-Dated Hierarchy Changes
- Dual-Path Settlement
- Open Exposure Handling During Transition
- Financial Settlement Between Old and New Upline
- Walk-Through: Rajesh Moves From Vikram to Suresh With 15 Lakh Open Liability
29. Minimum Forwarding / Skin-in-the-Game Requirements
- The Problem
- How Minimum Retention Works
- Where It Is Checked in the Cascade
- How Violations Are Handled
- Walk-Through: Vikram Requires 20%, Rajesh Tries to Forward 100% for Sharps
30. Panic Button Abuse Prevention
- The Problem
- Who Bears the Cost of Hedge Execution
- Usage Limits and Cooling-Off Periods
- Monitoring and Flagging
- Differentiating Legitimate Panic From Gaming
- Walk-Through: Rajesh Presses Panic When Losing, Repeats Weekly
31. Timestamp and Period Boundary Security
- The Problem
- Where the Authoritative Timestamp Is Assigned
- How Period Boundaries Are Determined
- How Clock Skew Between Server Instances Is Handled
- Bets at the Period Boundary
- Client Timestamp Fraud Detection
32. Sharp Detection Gaming via Multiple Accounts
- The Problem
- The Detection Pillars
- Device Fingerprinting Integration
- IP Correlation Analysis
- Betting Pattern Similarity Detection
- Payment Method Overlap Detection
- How Flagged Clusters Are Communicated to Agents
- Walk-Through: Syndicate With 50 Accounts Under Rajesh
33. Rate Limiting on Configuration Changes
- The Problem
- Per-Agent Rate Limits
- Queue and Apply Most Recent
- Cache Invalidation Throttling
- How This Interacts
- How Rate Limiting Interacts With the Panic Button
- Rate Limit Overrides for Administrators
34. Currency and Multi-Currency Support
- The Problem
- Base Currency Per Agent
- Where FX Conversion Happens
- FX Rate Capture and Audit Trail
- FX Rate Determination
- FX Conversion at Hedge Execution
- FX Risk Accounting for Hedged Positions
- How FX Risk Is Managed
- Settlement in Multi-Currency Scenarios
- FX Audit Report
- Walk-Through: Ghanaian Agent in Cedis, Hedge in GBP
35. Cache Race Condition Fix at Limit Boundaries (CRITICAL)
- The Problem in Plain English
- The Safety Margin Approach
- The Three-Path Decision Flow
- Post-Write Validation and Rollback
- Walk Through: 10 Simultaneous Bets on Rajesh at 78% Utilization
36. Multi-Instance Cache Coherency (HIGH)
- Why LRU Per-Instance Is Broken
- Recommended Approach: Redis as Effective Tier 1
- Config Cache Invalidation via Pub/Sub
- How This Interacts with the Safety Margin (Section 35)
- Deployment Topology
- Redis Failure Mode
37. PostgreSQL Scaling Strategy (HIGH)
- Projected Data Volumes for First IPL Season
- Partitioning Strategy
- Separate Write-Optimized Store for Audit Records
- Read Replicas for Dashboard Queries
- Connection Pool Management
- When to Consider Event Sourcing for Audit Trail
38. Atomic Transaction Scaling (HIGH)
- The Contention Problem
- Contention Analysis: Vikram with 12 Sub-Agents at 5 Bets/Sec Each
- The Tiered Atomicity Model
- Agent Classification for Atomicity Tier
- Optimized Locking Strategy for Vikram
39. Audit Trail Storage Architecture (MEDIUM)
- Hot/Warm/Cold Storage Tiers
- Retention Policies
- The Append-Only Audit Store
- Indexing Strategy
- Query Performance for Dispute Resolution
- Storage Cost Projections
- Tier Migration Job
40. Horizontal Scaling for the Cascade Engine (MEDIUM)
- Partitioning by Top-Level Agent Subtree
- How Cross-Agent Detection Works Across Partitions
- Load Balancing Strategy
- Handling Agent Hierarchy Changes That Cross Partitions
- Deployment Diagram
41. Monitoring and Alerting System (MEDIUM)
- Key Metrics per Pipeline Stage
- Alert Thresholds and Escalation Paths
- Dashboard Design for Ops Team
- Specific Alert Definitions
- Run Book Topics
42. Reconciliation System (HIGH)
- What Is Reconciled
- The Reconciliation Job Workflow
- How Discrepancies Are Flagged and Categorized
- The Manual Recompute Tool
- Tracking Drift Over Time to Detect Systemic Bugs
- Walk Through: Rajesh's Ledger Shows 15 Lakh, Actual Positions Sum to 13.5 Lakh
43. Hedge Execution Engine (CRITICAL)
- Design Overview
- Limit Order Placement with Configurable Max Slippage
- Partial Fill Tracking and Re-Pricing Strategy
- Execution Quality Reporting
- Unhedged Exposure Tracker
- Stale Hedge Cleanup Process
- Queue Management for Hedge Orders During High Volume
- Failover When Betfair Is Slow or Down
- Walk Through: Platform Needs to Hedge 5 Lakh on MI at 1.85, Only 2 Lakh Liquidity at 1.90
44. Migration and Backfill Strategy (MEDIUM)
- Mapping Existing Flat B-Book Configs to Forwarding Matrix Rules
- Handling Open Positions During Cutover
- Parallel-Run Mode
- Per-Agent Rollback Plan
- Data Migration for Historical Positions and Settlements
45. Support Tooling for Dispute Resolution (MEDIUM)
- Bet Lookup
- Audit Trail Visualization
- Re-Simulate Capability
- Dispute Workflow
46. Responsible Gambling Controls (MEDIUM)
- Self-Exclusion Mechanism
- Deposit Limits
- Session Time Limits
- Reality Check Notifications
- Where These Hooks Go in the Bet Flow Pipeline
47. Technology Stack (Confirmed)
- Core Technologies
- Additional Technologies
- Not Included (and Why)
48. System Architecture Overview
- Complete System Diagram
- Communication Patterns
- Deployment Topology (Production)
49. Database Schema Design
- Entity-Relationship Diagram
- Table: agents
- Table: agent_limits
- Table: forwarding_matrix_rules
- Table: users
- Table: user_overrides
- Table: user_classifications
- Table: agent_trust_config
- Table: market_overrides
- Table: events
- Table: markets
- Table: bets (partitioned by month on created_at)
- Table: bets (continued)
- Table: positions (partitioned by month on created_at)
- Table: exposure_ledger
- Table: settlements (partitioned by month on created_at)
- Table: audit_trail (partitioned by month on created_at, separate schema)
- Table: dead_letter_queue
- Table: hedge_orders
- Table: agent_hierarchy_history
- Table: reconciliation_results
- Table: reconciliation_discrepancies
- Table: alerts
- Table: config_changelog
- Table: feature_flags
50. API Design
- Bet Placement APIs
- Agent Configuration APIs
- User Management APIs
- Settlement APIs
- Admin APIs
- Monitoring APIs
- Dispute/Support APIs
51. Service Architecture
- Module Breakdown
52. The Bet Processing Pipeline (Step by Step)
- Step 1: Request Received (Budget: 5ms)
- Step 2: Responsible Gambling Checks (Budget: 2ms)
- Step 3: Timestamp Assignment (Budget: 0ms)
- Step 4: Compute Metrics (Budget: 3ms)
- Step 5: User Win Cap Check (Budget: 5ms)
- Step 6: Matrix Version Capture (Budget: 1ms)
- Step 7: Forwarding Percentage Resolution (Budget: 10ms)
- Step 8: Cascade Routing -- Level by Level (Budget: 10ms per level)
- Step 9: Exposure Ledger Updates (Budget: 10ms)
- Step 10: Position Creation (Budget: 15ms)
- Step 11: Hedge Queue Placement (Budget: 2ms)
- Step 12: Audit Trail Write (Budget: 10ms)
- Step 13: Response to Punter (Budget: 5ms)
53. The Settlement Pipeline (Step by Step)
- Step 1: Event Result Received
- Step 2: Market Resolution
- Step 3: Position Identification
- Step 4: Per-Position Settlement (Idempotent)
- Step 5: Exposure Ledger Decrement
- Step 6: Balance Updates
- Step 7: NO_NEW_RISK Re-evaluation
- Step 8: Reconciliation Check
- Step 9: Settlement Confirmation
54. Configuration Management
- How Forwarding Matrix Rules Are Stored, Versioned, and Cached
- Cache Invalidation Strategy (Summary)
55. Error Handling Patterns
- Error Categories
- Retry Policies
- Dead Letter Queue Integration
56. Testing Strategy
- Unit Test Coverage Targets
- Integration Test Scenarios
- Load Test Scenarios (IPL Peak Simulation)
- Chaos Test Scenarios
57. Deployment Strategy
- Docker Compose for Local Development
- Production Deployment
- Feature Flag Rollout Process
58. Implementation Phases
- Phase Dependencies
- MVP Definition (First Live Bet)
- Phase 1: Foundation (Weeks 1-4)
- Phase 2: Core Pipeline (Weeks 5-8)
- Phase 3: Production Hardening (Weeks 9-12)
- Phase 4: Scale and Polish (Weeks 13-16)
- Phase 5: Intelligence (Weeks 17+)

Table of Contents

1. Executive Summary

What is the B-Book?

What problem does it solve?

Why this matters

The key differentiator

2. The Core Problem -- In Plain English

The Agent Hierarchy: A Real Example

When Amit Places a Bet: The Complete Journey

The Fundamental Questions

3. The Forwarding Matrix -- The Brain of the System

What It Is

The 5 Dimensions

How Wildcard Matching Works

Tie-Breaking Rules

Resolution Precedence Chain

Sensible Defaults by Sport

Common Mistakes Agents Will Make

4. The Bet Flow -- Step by Step

The Complete Flow

Real-Life Example: Walking Through the Numbers

5. Cascading Upline Routing

How Bets Flow Through the Hierarchy

What Happens at Each Level

A 4-Level Cascade with Actual Numbers

What Happens When a Mid-Tier Agent Hits Their Cap?

What Happens When a Parent Is Suspended?

The Betfair Backstop

Edge Case: Betfair Is Down

Does Sharp Classification Travel Upline?

The Problem with Simple Approaches

The Hybrid Design

Real-Life Example: The Same Bet, Three Different Outcomes

Why This Design Is Correct

Configuration Per Sub-Agent

6. Agent Liability Limits

The Limit Structure

Real Example: Rajesh's Limit Configuration

How Limits Interact: The Most Restrictive Wins

Limit Hierarchy Table

7. User Win Limits & Stake Reduction

Per-Click Win Limit

Aggregate Win Limit

How Stake Reduction Works

What the Punter Sees

Below Minimum Stake

Sharp User Detection Signals

8. NO_NEW_RISK Mode

What It Is

What Triggers It

The Scope Is Granular

How Hedge Detection Works

Real Examples of Hedge vs Non-Hedge Bets

How the Agent Exits NO_NEW_RISK

9. Period Definitions -- Night & Weekly

Why Bookies Use Periods

Night Period

Weekly Period

Timezone Handling

The Period Rollover Problem

The DST Edge Case

10. Exposure Accounting

Three Ledgers Per Agent Per Scope

How These Update Atomically

The Redis Fast-Path Optimization

Settlement Impact

Ledger Updates for a Single Bet: 3-Level Cascade

11. Audit & Determinism

The Audit Record

Why Determinism Matters

Configuration Change Log

Replay Capability

12. What the Current Codebase Already Has (and What's Missing)

The Key Observation

13. The Nightmare Scenarios & How We Handle Them

Scenario 1: Syndicate Attack -- Correlated Positions Across Agents

Scenario 2: Data Feed Failure During Live Play

Scenario 3: Rogue Agent Dumping Toxic Flow

Scenario 4: Double Settlement / Result Correction

Scenario 5: System Outage During Peak