
B-Book Architecture & Design Document v2.0

System: Hannibal
Date: February 2026
Status: Living Document
Audience: Product, Engineering, Operations, Stakeholders


Table of Contents

  1. Executive Summary
  2. The Core Problem -- In Plain English
  3. The Forwarding Matrix -- The Brain of the System
  4. The Bet Flow -- Step by Step
  5. Cascading Upline Routing
  6. Agent Liability Limits
  7. User Win Limits & Stake Reduction
  8. NO_NEW_RISK Mode
  9. Period Definitions -- Night & Weekly
  10. Exposure Accounting
  11. Audit & Determinism
  12. What the Current Codebase Already Has (and What's Missing)
  13. The Nightmare Scenarios & How We Handle Them
  14. Operational Dashboard -- What Agents Need to See
  15. Performance Architecture
  16. Competitive Landscape
  17. Phased Rollout Plan
  18. Revenue Model
  19. Implementation Order (for Developers)
  20. The Bookie's Final Verdict

1. Executive Summary

What is the B-Book?

The B-Book is Hannibal's hierarchical, deterministic risk-management and routing engine. Think of it as the brain that sits between a punter placing a bet and the final destination of that bet's risk.

In traditional bookmaking, a "B-Book" means the bookie keeps the bet on their own books -- they take the other side of the punter's wager. If the punter loses, the bookie profits. If the punter wins, the bookie pays. The opposite is an "A-Book" where the bookie immediately hedges the bet on an exchange like Betfair, earning a small commission but taking no risk.

Hannibal's B-Book is more sophisticated than either approach. It is a routing engine that decides, for every single bet, what percentage stays at each level of an agent hierarchy and what percentage gets forwarded up the chain. It enforces limits automatically, cascades overflow intelligently, and maintains a complete audit trail of every decision.

What problem does it solve?

In sports betting agent networks -- prevalent across India, Southeast Asia, and Africa -- agents operate at different levels of a hierarchy. A master agent in Delhi manages sub-agents across several cities. Each sub-agent manages hundreds or thousands of punters. Every agent bears a different amount of risk based on their appetite, bankroll, and expertise.

Today, this risk allocation happens through manual spreadsheet management, WhatsApp groups, and phone calls. An agent might tell their upline "I'll keep 30% of cricket bets and forward you 70%." But there is no enforcement, no automatic cap management, and no audit trail. When disputes arise -- and they always arise -- there is no source of truth.

The B-Book automates all of this. It replaces manual negotiation with a configurable, enforceable, auditable system.

Why this matters

  • For agents: Automatic enforcement of limits means they never accidentally take on more risk than they can afford. No more 3am phone calls during an IPL match because exposure got out of hand.
  • For the platform: Deterministic routing means every rupee of every bet is accounted for. Disputes become trivially resolvable by replaying the decision trail.
  • For punters: Faster bet acceptance, consistent limits, and transparent maximum stakes.

The key differentiator

What sets Hannibal apart from existing tools is the automated forwarding matrix with cascading upline routing and deterministic audit trails. No other platform in this market offers a multi-dimensional forwarding matrix that automatically resolves routing percentages, cascades overflow through an arbitrary-depth agent hierarchy, and produces a complete, replayable decision record for every bet.


2. The Core Problem -- In Plain English

The Agent Hierarchy: A Real Example

Let us meet the people in our system:

                    +-----------------------+
                    |   Betfair Exchange    |
                    |   (External Hedge)    |
                    +-----------+-----------+
                                |
                    +-----------+-----------+
                    |       HANNIBAL        |
                    |    (The Platform)     |
                    +-----------+-----------+
                                |
                    +-----------+-----------+
                    |        VIKRAM         |
                    |  Master Agent, Delhi  |
                    | Manages 12 sub-agents |
                    +-----------+-----------+
                                |
              +-----------------+-----------------+
              |                                   |
  +-----------+-----------+           +-----------+-----------+
  |        RAJESH         |           |         PRIYA         |
  |   Sub-Agent, Mumbai   |           | Sub-Agent, Bangalore  |
  |  200 cricket punters  |           | 150 football punters  |
  +-----------+-----------+           +-----------------------+
              |
    +---------+---------+
    |  AMIT   |  SONIA  |   ... 198 more punters
    | Punter  | Punter  |
    +---------+---------+

Vikram is a master agent based in Delhi. He has been in the betting business for 15 years. He has a strong bankroll and deep knowledge of cricket markets. He is comfortable retaining significant risk on IPL matches.

Rajesh is one of Vikram's sub-agents, operating out of Mumbai. He manages about 200 punters who mainly bet on cricket. Rajesh has a moderate bankroll. He wants to keep some risk (because that is where the profit is) but cannot afford to take on unlimited exposure.

Amit is one of Rajesh's punters. He is a regular cricket bettor who places bets of between ₹500 and ₹50,000 on IPL matches.

When Amit Places a Bet: The Complete Journey

Amit opens his phone and places a bet: ₹10,000 on Mumbai Indians to win at odds 1.85 during the IPL.

Here is what needs to happen in the next 90 milliseconds:

  1. Can Amit even place this bet? Check his per-click win limit. At odds 1.85, a ₹10,000 stake means a potential win of ₹8,500. Is that within his limit?

  2. How much does Rajesh keep? The forwarding matrix says Rajesh retains 40% of IPL match odds bets. So Rajesh keeps ₹4,000 of the ₹10,000 stake (and the corresponding ₹3,400 potential liability).

  3. Can Rajesh afford to keep that 4,000? Check Rajesh's cricket limits, his per-match limits, his night period limit. If any limit is breached, reduce what Rajesh keeps.

  4. What happens to the other ₹6,000? It goes up to Vikram. Vikram's own forwarding matrix says he retains 60% of what arrives at his level. So Vikram keeps ₹3,600 and forwards ₹2,400 to the platform.

  5. What does the platform do with the remaining ₹2,400? Hannibal may retain some and hedge the rest on Betfair, depending on its own risk appetite.

  6. Record everything. The complete decision chain -- every percentage, every limit check, every cap evaluation -- is persisted as an audit record.

The Fundamental Questions

Every single bet must answer these questions:

Question | Who Answers It
How much risk does Rajesh keep? | Rajesh's forwarding matrix + his limits
How much risk does Vikram keep? | Vikram's forwarding matrix + his limits
How much risk does the platform keep? | Platform's risk configuration
How much gets hedged on Betfair? | Whatever remains after all agents have taken their share
What if someone's limits are breached? | Overflow cascades up to the next level
What if Betfair is unavailable? | Platform absorbs as retained risk, retries asynchronously

3. The Forwarding Matrix -- The Brain of the System

What It Is

The forwarding matrix is a multi-dimensional lookup table that determines what percentage of each bet an agent retains versus forwards to their upline. It is the single most important configuration in the entire B-Book system.

Think of it like a spreadsheet where the rows represent different combinations of conditions, and the output is a single number: the forward percentage. If the matrix says "forward 60%," the agent keeps 40% and sends 60% up the chain.

The 5 Dimensions

Every bet has characteristics that determine how it should be routed. The matrix uses five dimensions to make this decision:

Dimension | What It Means | Example Values
market_type | The type of bet being placed | MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE
sport_type | Which sport | CRICKET, FOOTBALL, TENNIS, KABADDI
event_phase | When in the event lifecycle | PRE_MATCH, IN_PLAY, APPROACHING_START
source_type | What kind of punter | NORMAL, SHARP, VIP, NEW_ACCOUNT
liquidity_band | How much exchange liquidity exists to hedge | HIGH, MEDIUM, LOW, NONE

How Wildcard Matching Works

An agent does not need to define a rule for every possible combination. That would be thousands of rows. Instead, the matrix supports wildcards (shown as *), which mean "match anything."

Here is an example of Rajesh's forwarding matrix:

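Rule | market_type | sport_type | event_phase | source_type | liquidity_band | Forward %
R1 | FANCY | CRICKET | IN_PLAY | SHARP | * | 95%
R2 | FANCY | CRICKET | IN_PLAY | * | * | 70%
R3 | MATCH_ODDS | CRICKET | PRE_MATCH | * | HIGH | 40%
R4 | MATCH_ODDS | CRICKET | PRE_MATCH | * | LOW | 70%
R5 | MATCH_ODDS | CRICKET | IN_PLAY | * | * | 60%
R6 | * | CRICKET | * | SHARP | * | 90%
R7 | * | FOOTBALL | * | * | * | 80%
R8 | * | * | * | * | * | 50%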

Reading this table in plain English:

  • R1: If a sharp user places an in-play cricket fancy bet, forward 95%. Rajesh keeps only 5% because sharp users on in-play fancies are the most dangerous bets in cricket.
  • R3: For pre-match cricket match odds where exchange liquidity is high, forward only 40%. Rajesh keeps 60% because these are the safest bets -- they are easy to price and easy to hedge if needed.
  • R7: For any football bet, forward 80%. Rajesh is not a football expert, so he keeps very little.
  • R8: The catch-all rule. For anything not covered above, forward 50%.
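The following is a minimal Python sketch of wildcard matching. It assumes each rule is stored as a five-value pattern in the column order of the table above. `Rule`, `matching_rules` and `RAJESH_MATRIX` are hypothetical names used for illustration, not an existing Hannibal API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

WILDCARD = "*"
# Assumed dimension order: the same as the matrix table columns.
DIMENSIONS = ("market_type", "sport_type", "event_phase", "source_type", "liquidity_band")


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: Tuple[str, ...]  # one value or "*" per dimension
    forward_pct: int

    def matches(self, bet: Dict[str, str]) -> bool:
        # Every dimension must be a wildcard or equal the bet's value.
        return all(p == WILDCARD or p == bet[d] for d, p in zip(DIMENSIONS, self.pattern))

    @property
    def specificity(self) -> int:
        # Number of non-wildcard dimensions.
        return sum(p != WILDCARD for p in self.pattern)


RAJESH_MATRIX = [
    Rule("R1", ("FANCY", "CRICKET", "IN_PLAY", "SHARP", "*"), 95),
    Rule("R2", ("FANCY", "CRICKET", "IN_PLAY", "*", "*"), 70),
    Rule("R3", ("MATCH_ODDS", "CRICKET", "PRE_MATCH", "*", "HIGH"), 40),
    Rule("R4", ("MATCH_ODDS", "CRICKET", "PRE_MATCH", "*", "LOW"), 70),
    Rule("R5", ("MATCH_ODDS", "CRICKET", "IN_PLAY", "*", "*"), 60),
    Rule("R6", ("*", "CRICKET", "*", "SHARP", "*"), 90),
    Rule("R7", ("*", "FOOTBALL", "*", "*", "*"), 80),
    Rule("R8", ("*", "*", "*", "*", "*"), 50),
]


def matching_rules(matrix: List[Rule], bet: Dict[str, str]) -> List[Rule]:
    """All rules whose pattern covers the bet; several can match at once."""
    return [rule for rule in matrix if rule.matches(bet)]


sharp_fancy = {"market_type": "FANCY", "sport_type": "CRICKET", "event_phase": "IN_PLAY",
               "source_type": "SHARP", "liquidity_band": "LOW"}
print([(r.rule_id, r.specificity) for r in matching_rules(RAJESH_MATRIX, sharp_fancy)])
# [('R1', 4), ('R2', 3), ('R6', 2), ('R8', 0)]
```

A sharp in-play fancy bet matches four of Rajesh's rules at once. The tie-breaking rules decide which of them actually applies.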

Tie-Breaking Rules

What happens when a bet matches multiple rules? The system uses strict, deterministic tie-breaking:

  1. Most specific rule wins. A rule with fewer wildcards is more specific. R1 (one wildcard) beats R2 (two wildcards). Specificity is counted as the number of non-wildcard dimensions.

  2. If specificity is equal, higher forward percentage wins. This is the "risk-safe" default. When in doubt, forward more rather than less. The agent is protected from accidental over-exposure.

  3. If forward percentage is also equal, deterministic ordering by rule creation timestamp. The oldest rule wins. This ensures the same bet always resolves the same way.
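The three tie-breakers collapse into one composite sort key. This sketch assumes each matching rule has already been reduced to its specificity, forward percentage and creation timestamp. `Candidate` and `pick_rule` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    rule_id: str
    specificity: int      # number of non-wildcard dimensions
    forward_pct: int
    created_at: datetime


def pick_rule(candidates: Iterable[Candidate]) -> Candidate:
    candidates = list(candidates)
    if not candidates:
        # Unreachable when the matrix has a * / * / * / * / * catch-all.
        raise ValueError("no matching rule")
    # The smallest key wins: most specific first, then the highest forward %,
    # then the oldest creation timestamp.
    return min(candidates, key=lambda c: (-c.specificity, -c.forward_pct, c.created_at))
```

The key orders rules by (specificity, forward %, timestamp). Assuming creation timestamps are unique, the same set of matching rules therefore resolves to the same rule regardless of the order in which the rules are loaded.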

Resolution Precedence Chain

The forwarding matrix is not the only thing that determines routing. There is a four-level precedence chain, listed from highest to lowest precedence; the first level that supplies a value decides the forward percentage:

Level | What It Is | When to Use It
User Override | A specific forward % for a specific punter | "This user is a known sharp -- forward 100% of their bets"
Market Override | A specific forward % for a specific event/market | "The CSK vs MI final is too big -- forward 90% of everything on this match"
Matrix Rule | The multi-dimensional lookup described above | Normal day-to-day operations
Agent Default | A single fallback percentage | "If nothing else matches, forward 50%"
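The following is a sketch of the precedence chain. It assumes the four levels are evaluated top to bottom as listed and that a missing level is represented as `None`. `resolve_forward_pct` is a hypothetical helper. It also returns which level decided, so the audit record can show it.

```python
from typing import Optional, Tuple


def resolve_forward_pct(
    user_override: Optional[int],
    market_override: Optional[int],
    matrix_pct: Optional[int],
    agent_default: int,
) -> Tuple[int, str]:
    """Walk the precedence chain top-down; return (forward %, deciding level)."""
    chain = (
        (user_override, "USER_OVERRIDE"),
        (market_override, "MARKET_OVERRIDE"),
        (matrix_pct, "MATRIX_RULE"),
    )
    for pct, level in chain:
        # `is not None` so that an explicit 0% override is still honoured.
        if pct is not None:
            return pct, level
    return agent_default, "AGENT_DEFAULT"
```

Testing `is not None` rather than truthiness matters here: a 0% override means "keep everything", which is a deliberate setting, not an absent one.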

Sensible Defaults by Sport

These are the recommended starting retention ranges based on industry practice. New agents should start at the lower end of each retention range (that is, forward more) until they build confidence:

Scenario | Recommended Retention | Why
Cricket in-play fancy (session/over runs) | 10-20% | Highest variance, hardest to price, stale odds risk
Cricket pre-match match odds | 40-60% | Well-priced, ample exchange liquidity, predictable markets
Cricket in-play match odds | 20-40% | More volatile than pre-match, but still hedgeable
Football Premier League pre-match | 50-70% | Deep liquidity, well-understood markets, strong pricing models
Football lower leagues pre-match | 20-40% | Less information, worse pricing, integrity risk
Tennis | 10-25% | Extremely volatile, retirement risk, low liquidity on most matches
Kabaddi | 5-15% | Thin markets, poor external pricing, limited hedge options

Common Mistakes Agents Will Make

Mistake | What Happens | How We Prevent It
Setting retention too high on in-play fancies | One bad session wipes out a week of profit | Warn when retention exceeds recommended range; require confirmation
No catch-all rule | Some bets have no matching rule and the system cannot route them | System requires a default rule; matrix always has a * / * / * / * / * fallback
Conflicting rules that they do not understand | Agent thinks rule X applies but rule Y wins due to specificity | Dashboard shows which rule matched for every bet; "test my matrix" dry-run tool
Copying another agent's matrix without understanding it | Matrix tuned for a big operator does not suit a small one | Template system with clear explanations; onboarding wizard
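Two of these safeguards can be checked when a matrix is saved: the mandatory catch-all rule and the retention warning. The following is a rough sketch. `MAX_RECOMMENDED_RETENTION` is a hypothetical encoding of only three rows of the "Sensible Defaults by Sport" table, keyed on market type, sport and phase. `validate_matrix` is likewise an illustrative name.

```python
from typing import List, Tuple

WILDCARD = "*"

# Hypothetical encoding of three rows of the "Sensible Defaults by Sport" table:
# (market_type, sport_type, event_phase) -> upper end of recommended retention %.
MAX_RECOMMENDED_RETENTION = {
    ("FANCY", "CRICKET", "IN_PLAY"): 20,
    ("MATCH_ODDS", "CRICKET", "PRE_MATCH"): 60,
    ("MATCH_ODDS", "CRICKET", "IN_PLAY"): 40,
}

RuleRow = Tuple[str, Tuple[str, ...], int]  # (rule_id, 5-value pattern, forward %)


def validate_matrix(rules: List[RuleRow]) -> Tuple[List[str], List[str]]:
    """Return (errors that block saving, warnings that need confirmation)."""
    errors: List[str] = []
    warnings: List[str] = []
    if not any(all(p == WILDCARD for p in pattern) for _, pattern, _ in rules):
        errors.append("missing catch-all rule (* / * / * / * / *)")
    for rule_id, pattern, forward_pct in rules:
        retention = 100 - forward_pct
        ceiling = MAX_RECOMMENDED_RETENTION.get(tuple(pattern[:3]))
        if ceiling is not None and retention > ceiling:
            warnings.append(f"{rule_id}: retains {retention}%, above recommended {ceiling}%")
    return errors, warnings
```

Errors and warnings are kept separate because a missing catch-all makes some bets unroutable, while high retention is a business decision the agent may confirm.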

4. The Bet Flow -- Step by Step

The Complete Flow

Real-Life Example: Walking Through the Numbers

The Bet: Amit places ₹10,000 on Mumbai Indians to beat Chennai Super Kings at decimal odds of 1.85, during IPL 2026, pre-match.

Step 1: Compute Metrics

Metric | Calculation | Value
Stake | As submitted | ₹10,000
Potential Win | Stake x (Odds - 1) = 10,000 x 0.85 | ₹8,500
Liability (for the bookie) | Same as potential win for a back bet | ₹8,500

Step 2: User Win Cap Check

Amit has a per-click win limit of ₹50,000. His potential win of ₹8,500 is well within that limit. No action needed.

Amit also has an aggregate win limit of ₹2,00,000 per day. He has accumulated ₹45,000 in potential wins today. Adding ₹8,500 brings him to ₹53,500, which is still under the limit. Pass.

Step 3: Stake Reduction

Since Amit passed both win cap checks, no stake reduction is applied. His full ₹10,000 stake is accepted.

Step 4: Resolve Forwarding Percentage

The system evaluates Rajesh's forwarding matrix. The bet characteristics are:

  • market_type: MATCH_ODDS
  • sport_type: CRICKET
  • event_phase: PRE_MATCH
  • source_type: NORMAL (Amit is not flagged as sharp)
  • liquidity_band: HIGH (MI vs CSK has deep exchange liquidity)

This matches Rule R3 from Rajesh's matrix: forward 40%. Rajesh retains 60%.

Step 5: Agent Cap Evaluation (Rajesh)

Rajesh retains 60% of ₹10,000 = ₹6,000 stake (₹5,100 liability).

Check Rajesh's limits:

  • Cricket overall limit: ₹50,00,000. Currently used: ₹12,00,000. After this bet: ₹12,05,100. Still within limit.
  • Per-match limit (this specific MI vs CSK match): ₹5,00,000. Currently used: ₹1,20,000. After: ₹1,25,100. Still within limit.
  • Night period limit: ₹10,00,000. Currently used: ₹3,00,000. After: ₹3,05,100. Still within limit.

All limits pass. Rajesh retains the full ₹6,000 stake.

Step 6: Cascade to Vikram

The remaining ₹4,000 stake (40%) flows up to Vikram. Vikram's matrix says he retains 60% of cricket pre-match match odds. So Vikram retains ₹2,400 and forwards ₹1,600 to the platform.

Step 7: Platform Routing

The platform receives ₹1,600. Based on platform risk configuration, it retains ₹800 and hedges ₹800 on Betfair.

Step 8: Execute

All positions are created atomically:

Entity | Retained Stake | Retained Liability | Forwarded
Rajesh | ₹6,000 | ₹5,100 | ₹4,000 → Vikram
Vikram | ₹2,400 | ₹2,040 | ₹1,600 → Platform
Platform | ₹800 | ₹680 | ₹800 → Betfair
Betfair | ₹800 (hedged) | -- | --
Total | ₹10,000 | -- | --

The stake always sums to the original ₹10,000. Nothing is created or destroyed; risk is simply distributed.

Step 9: Audit

A complete audit record is persisted containing: the original bet details, the matrix rule that matched at each level, every limit that was checked and its result, the final routing breakdown, timestamps for each step, and the total elapsed time.
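One way to make such a record replayable is to serialize it deterministically and check the stake conservation invariant before persisting it. This sketch assumes hypothetical field names (`LevelDecision`, `LimitCheck`, `AuditRecord`) and integer rupee amounts. The platform's "forwarded" amount is the Betfair hedge.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class LimitCheck:
    limit: str
    remaining: int       # liability headroom before this bet
    requested: int       # liability the agent tried to retain
    passed: bool


@dataclass
class LevelDecision:
    entity: str
    matched_rule: str
    forward_pct: int
    retained_stake: int
    forwarded_stake: int
    limit_checks: List[LimitCheck] = field(default_factory=list)


@dataclass
class AuditRecord:
    bet_id: str
    stake: int
    odds: str                          # kept as text to avoid float drift
    levels: List[LevelDecision]
    step_timestamps: Dict[str, str]    # step name -> ISO-8601 time
    elapsed_ms: float

    def is_balanced(self) -> bool:
        # Stake retained at every level plus what left the last level must equal the stake.
        kept = sum(level.retained_stake for level in self.levels)
        return kept + self.levels[-1].forwarded_stake == self.stake

    def to_json(self) -> str:
        # Sorted keys and fixed separators: identical records serialize identically.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


record = AuditRecord(
    bet_id="bet-1", stake=10_000, odds="1.85",
    levels=[
        LevelDecision("RAJESH", "R3", 40, 6_000, 4_000,
                      [LimitCheck("MATCH", 3_80_000, 5_100, True)]),
        LevelDecision("VIKRAM", "MATRIX", 40, 2_400, 1_600),
        LevelDecision("PLATFORM", "CONFIG", 50, 800, 800),
    ],
    step_timestamps={"received": "2026-04-09T19:30:00+05:30"},
    elapsed_ms=12.5,
)
```

A byte-for-byte stable encoding means a replayed decision can be compared directly with the stored one, which is what makes disputes cheap to settle.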


5. Cascading Upline Routing

How Bets Flow Through the Hierarchy

The fundamental routing model is a cascade. A bet enters at the bottom of the agent hierarchy and flows upward. At each level, the agent retains what they can (based on their matrix and limits) and forwards the rest to their parent.

User (Amit) places ₹10,000 bet
          |
          v
+---[RAJESH: Level 1 Agent]---+
| Matrix says: forward 40%    |
| Retains: ₹6,000             |
| Forwards: ₹4,000            |
+---------|-------------------+
          |
          v
+---[VIKRAM: Level 2 Agent]---+
| Matrix says: forward 40%    |
| Retains: ₹2,400             |
| Forwards: ₹1,600            |
+---------|-------------------+
          |
          v
+---[HANNIBAL PLATFORM]-------+
| Config: retain 50%          |
| Retains: ₹800               |
| Hedges: ₹800                |
+---------|-------------------+
          |
          v
+---[BETFAIR EXCHANGE]--------+
| Final backstop              |
| Receives: ₹800              |
+-----------------------------+

What Happens at Each Level

At every level in the chain, the system performs the same sequence:

  1. Resolve the source_type for this agent -- does this agent have their own classification for the user? If not, do they trust the downstream agent's classification? (See "Does Sharp Classification Travel Upline?" below)
  2. Resolve the forwarding percentage for this agent (using their matrix with the resolved source_type, overrides, or default)
  3. Calculate the retained amount = incoming stake x (1 - forward %)
  4. Check the agent's limits -- can they actually absorb that retained amount?
  5. If limits allow it: retain the calculated amount, forward the rest
  6. If limits would be breached: retain only up to the limit, forward the overflow as well
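The following is a sketch of steps 3 to 6 for a single back bet. It assumes steps 1 and 2 have already produced each level's forward percentage. It also assumes each level's limit headroom is expressed as liability, as in Section 6. `LevelConfig` and `cascade` are hypothetical. The platform is modelled as the last level, and whatever it does not keep is the Betfair hedge. `Decimal` avoids float rounding on money.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple


@dataclass
class LevelConfig:
    name: str
    forward_pct: Decimal        # already resolved for this bet (steps 1-2)
    liability_room: Decimal     # headroom under this level's binding limit


def cascade(stake: Decimal, odds: Decimal,
            levels: List[LevelConfig]) -> Tuple[List[Tuple[str, Decimal]], Decimal]:
    """Return (retained stake per level, stake left after the last level)."""
    incoming = stake
    retained: List[Tuple[str, Decimal]] = []
    for level in levels:
        intended = incoming * (100 - level.forward_pct) / 100      # step 3
        # Limits are in liability; for a back bet, liability = stake x (odds - 1).
        max_stake = level.liability_room / (odds - 1)               # step 4
        keep = max(Decimal(0), min(intended, max_stake))            # steps 5-6
        retained.append((level.name, keep))
        incoming -= keep        # the planned forward plus any overflow
    return retained, incoming


# The 4-level example below, with illustrative headroom where "all limits OK".
levels = [
    LevelConfig("RAJESH", Decimal(40), Decimal(27_500)),
    LevelConfig("VIKRAM", Decimal(40), Decimal(10_00_000)),
    LevelConfig("PLATFORM", Decimal(50), Decimal(10_00_000)),
]
kept, to_betfair = cascade(Decimal(50_000), Decimal("2.10"), levels)
```

Overflow needs no special handling. Subtracting only what a level actually kept automatically passes the planned forward plus any capped excess to the next level.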

A 4-Level Cascade with Actual Numbers

A more complex scenario: Amit bets ₹50,000 on CSK to win at odds 2.10.

Potential win: ₹55,000. Liability: ₹55,000.

AMIT bets ₹50,000
    |
    v
RAJESH (L1) - Forward 40%
    Wants to retain: ₹30,000 stake (60%) = ₹33,000 liability
    Cricket match limit remaining: ₹27,500 liability <-- LIMIT HIT
    Actually retains: ₹25,000 stake (₹27,500 / 1.10)
    Forwards: ₹25,000 (the intended ₹20,000 + ₹5,000 overflow)
    |
    v
VIKRAM (L2) - Forward 40%
    Receives: ₹25,000
    Wants to retain: ₹15,000 (60%)
    All limits OK
    Actually retains: ₹15,000
    Forwards: ₹10,000
    |
    v
PLATFORM (L3)
    Receives: ₹10,000
    Retains: ₹5,000
    Hedges: ₹5,000
    |
    v
BETFAIR (L4)
    Receives: ₹5,000 hedge order

Notice how Rajesh's overflow (₹5,000 he could not absorb because of his per-match limit) cascaded up to Vikram. Vikram did not "know" this was overflow -- he simply received ₹25,000 instead of ₹20,000 and processed it through his own matrix and limits.

What Happens When a Mid-Tier Agent Hits Their Cap?

If Vikram had also hit his limit in the example above, the overflow would continue cascading to the platform. The system guarantees that every rupee of every bet ends up somewhere. The cascade never drops a bet. It simply moves unclaimed risk upward until it reaches the platform, which is the final agent in the chain.

What Happens When a Parent Is Suspended?

If Vikram is suspended (say, for a payment dispute), his children cannot forward bets to him. In this scenario:

  • Rajesh can only retain bets up to his own limits
  • Any amount that would normally be forwarded to Vikram is instead forwarded directly to the platform
  • The platform absorbs the extra flow or hedges it on Betfair

The system treats a suspended agent as a "skip" in the chain, not a blockage. Punters can still bet. The risk simply routes around the suspended agent.
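The "skip" behaviour amounts to walking up the parent pointers past suspended agents. The following is a minimal sketch. It assumes the hierarchy is stored as a child-to-parent map with no cycles. In a deeper tree it routes to the nearest active ancestor; in Vikram's case that is the platform. All names are hypothetical.

```python
from typing import Dict, Optional, Set

PLATFORM = "HANNIBAL"


def effective_upline(agent: str, parent_of: Dict[str, Optional[str]],
                     suspended: Set[str]) -> str:
    """Nearest non-suspended ancestor; the platform itself is never skipped."""
    parent = parent_of.get(agent)
    # Skip suspended ancestors (assumes the hierarchy has no cycles).
    while parent is not None and parent != PLATFORM and parent in suspended:
        parent = parent_of.get(parent)
    # Falling off the top of the tree means the platform absorbs the flow.
    return parent if parent is not None else PLATFORM


parent_of = {"RAJESH": "VIKRAM", "PRIYA": "VIKRAM", "VIKRAM": PLATFORM, PLATFORM: None}
```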

The Betfair Backstop

After all agents in the hierarchy have taken their share, any remaining exposure reaches the platform. The platform can retain some of this risk, but it always has the option to hedge on Betfair.

Betfair acts as the final backstop. It is the entity of last resort that absorbs whatever risk no agent in the hierarchy was willing or able to keep.

Edge Case: Betfair Is Down

If Betfair's API is unavailable (which happens during peak traffic or maintenance), the platform cannot hedge. In this case:

  1. The platform absorbs the would-be-hedged amount as retained risk temporarily
  2. The bet is still accepted (we do not reject punter bets because of a hedge-side issue)
  3. The hedge order is placed in an async retry queue
  4. When Betfair comes back online, the hedge is executed
  5. If the event settles before the hedge is placed, the platform simply bears that risk as if it had been deliberately retained

This is a conscious design decision: punter experience is never degraded by infrastructure issues on the hedge side.
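The following rough sketch shows the accept-first, hedge-later flow. The Betfair client is abstracted as a hypothetical `place_hedge` callable that returns `False` when the exchange is unavailable. This is not the Betfair API. A production queue would also need backoff, durable persistence and idempotent order IDs, and none of these is shown.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass(frozen=True)
class HedgeOrder:
    bet_id: str
    stake: int


class HedgeRetryQueue:
    """Accept first, hedge later: the platform carries unhedged stake meanwhile."""

    def __init__(self, place_hedge: Callable[[HedgeOrder], bool]):
        self._place_hedge = place_hedge   # returns False when the exchange is down
        self._pending: Deque[HedgeOrder] = deque()
        # Stake the platform has carried instead of hedging, including orders
        # dropped because their event settled first.
        self.unhedged_stake = 0

    def submit(self, order: HedgeOrder) -> None:
        if not self._place_hedge(order):
            self.unhedged_stake += order.stake   # step 1: absorb temporarily
            self._pending.append(order)          # step 3: queue for retry

    def retry(self, is_settled: Callable[[str], bool]) -> None:
        for _ in range(len(self._pending)):
            order = self._pending.popleft()
            if is_settled(order.bet_id):
                continue                              # step 5: platform bore the risk
            if self._place_hedge(order):
                self.unhedged_stake -= order.stake    # step 4: hedge finally placed
            else:
                self._pending.append(order)           # still down: try again later

    @property
    def pending(self) -> int:
        return len(self._pending)
```

The bet itself is never routed through this queue. Only the hedge is deferred, which is what keeps punter-facing acceptance independent of exchange availability.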

Does Sharp Classification Travel Upline?

This is one of the most important design questions in the entire cascade. When Rajesh tags Amit as a sharp user and forwards 95% of his bet to Vikram, does Vikram know that Amit is sharp?

The answer is: the information travels, but each agent decides independently.

The Problem with Simple Approaches

If sharp info always travels: Rajesh tags Amit as sharp. Vikram's matrix also sees source_type=SHARP and forwards 90%. The platform sees SHARP and forwards 95% to Betfair. Sharp bets rocket through the entire hierarchy in milliseconds. But here is the problem: Rajesh's "sharp" might be Vikram's "normal." Rajesh has loose prices and 200 punters -- anyone who consistently wins against him looks sharp. Vikram, with 2,000 punters and tighter pricing, might profitably retain that exact same flow. Blindly propagating sharp flags means Vikram loses profitable volume because of Rajesh's weaker pricing.

If sharp info never travels: Vikram receives forwarded flow from Rajesh and treats it all as NORMAL. But some of that flow is genuinely toxic -- sharp syndicate members that Rajesh correctly identified. Vikram unknowingly retains it, and his P&L suffers. This is exactly the bookie's nightmare scenario: "rogue agent dumping toxic flow upline."

The Hybrid Design

Each forwarded bet carries metadata about the originating agent's classification, but each upline agent makes their own independent decision about how to treat it.

What travels with the bet:

Metadata Field | Description | Example
originating_user_id | The punter who placed the bet | user_amit_4521
originating_agent_id | The first agent in the chain | rajesh_mumbai
downstream_classification | What the originating agent classified the user as | SHARP
forwarding_reason | Why it was forwarded at each level | SHARP_USER, MATRIX_RULE, CAPACITY_BREACH

How each upline agent uses this information:

The resolution order at each upline level is:

  1. Own classification wins first. If Vikram has independently tagged Amit as SHARP (or NORMAL, or VIP), that classification is used regardless of what Rajesh thinks. Vikram's own data is the most relevant to Vikram's book.

  2. Configurable trust in downstream flags. Each agent can configure per sub-agent whether to trust their sharp classifications. Vikram might set trust_downstream_flags: true for Rajesh (whose sharp detection he trusts) but false for a newer sub-agent whose judgment he has not validated.

  3. Default to NORMAL if no other signal. If the upline has no opinion and does not trust the downstream flag, the bet is treated as normal flow. This is the safe default for the upline's book -- they apply their standard matrix rules.

Real-Life Example: The Same Bet, Three Different Outcomes

Setup: Amit is tagged as SHARP by Rajesh. Amit bets ₹20,000 on MI to win at 1.85. Rajesh forwards 95% (₹19,000) to Vikram.

Scenario A -- Vikram trusts Rajesh's flags: Vikram's config: trust_downstream_flags: true for Rajesh. Vikram's matrix: SHARP source on cricket → forward 80%. So Vikram forwards 80% of ₹19,000 = ₹15,200 to the platform. Vikram retains ₹3,800.

Scenario B -- Vikram has his own classification: Vikram has independently analyzed Amit's betting across all sub-agents and classified him as NORMAL (Amit's edge disappears at Vikram's sharper prices). Vikram's matrix: NORMAL source on cricket pre-match → forward 40%. Vikram retains 60% of ₹19,000 = ₹11,400.

Scenario C -- Vikram ignores downstream flags: Vikram's config: trust_downstream_flags: false for Rajesh. No own classification for Amit. Amit is treated as NORMAL. Same outcome as Scenario B.
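The three-step resolution order can be sketched as a small pure function. The following replays the three scenarios. `resolve_source_type` and `VIKRAM_FORWARD` are hypothetical. The forward percentages are the ones quoted in Scenarios A and B.

```python
from typing import Optional

DEFAULT_SOURCE_TYPE = "NORMAL"


def resolve_source_type(
    own_classification: Optional[str],
    downstream_classification: Optional[str],
    trust_downstream_flags: bool,
) -> str:
    # 1. The upline's own view of the user always wins.
    if own_classification is not None:
        return own_classification
    # 2. Otherwise use the forwarded flag, but only if this sub-agent is trusted.
    if trust_downstream_flags and downstream_classification is not None:
        return downstream_classification
    # 3. No usable signal: treat as normal flow.
    return DEFAULT_SOURCE_TYPE


# Vikram's cricket forward % by source type, as quoted in Scenarios A and B.
VIKRAM_FORWARD = {"SHARP": 80, "NORMAL": 40}
incoming = 19_000
for label, own, trust in (("A", None, True), ("B", "NORMAL", True), ("C", None, False)):
    source = resolve_source_type(own, "SHARP", trust)
    retained = incoming * (100 - VIKRAM_FORWARD[source]) // 100
    print(label, source, retained)
# A SHARP 3800
# B NORMAL 11400
# C NORMAL 11400
```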

Why This Design Is Correct

Each level of the hierarchy has different information and different risk tolerance. A user who is sharp at the sub-agent level (beating loose prices) may not be sharp at the master agent level (where prices are tighter). A user who looks normal to a sub-agent might be part of a syndicate that only the master agent can see (because the master agent has cross-agent visibility).

The audit trail records everything. For every forwarded bet, the audit record shows: what the originating agent classified the user as, what the upline agent's resolution was, and why. When Vikram asks "why did I retain a sharp user's bet?", the answer is clear: "Your config ignores downstream flags from Rajesh, your own detection had not flagged this user, so the bet was treated as NORMAL."

Cross-agent sharp detection fills the gap. The platform has visibility across ALL agents. If Amit is betting through three different sub-agents under Vikram, the platform-level sharp detection can flag this pattern and push a classification down to Vikram -- independent of what any individual sub-agent thinks. This is why Section 17's Phase 3 includes "cross-agent sharp detection" as a key deliverable.

Configuration Per Sub-Agent

Each agent configures trust settings per sub-agent:

Sub-Agent | trust_downstream_flags | Reason
Rajesh (Mumbai) | true | Experienced, reliable sharp detection, 8-year track record
Arun (Bangalore) | false | New sub-agent, unproven detection, only 3 months on platform
Sanjay (Chennai) | true | Good track record, conservative flagging

This means Vikram can gradually extend trust as sub-agents prove their detection quality -- much like how the real-world agent relationship works. You trust experienced partners more than new ones.


6. Agent Liability Limits

The Limit Structure

Every agent can configure limits at multiple levels of granularity. The purpose of limits is to ensure an agent never accidentally takes on more risk than their bankroll can support.

Limit Type | Scope | Example
Sport Limit | Total liability across all events in a sport | "I can handle ₹50 lakh total cricket exposure"
Market Limit | Total liability on a specific event or market | "No more than ₹5 lakh on any single IPL match"
Night Period Limit | Total liability accumulated during the night window | "Cap my night session at ₹10 lakh"
Weekly Period Limit | Total liability accumulated during the weekly cycle | "Cap my weekly exposure at ₹1 crore"

Real Example: Rajesh's Limit Configuration

Rajesh sets up the following limits for cricket:

RAJESH'S CRICKET LIMITS
========================

Sport-Level Limit (Cricket): ₹50,00,000 (₹50 lakh)
|
+-- Per-Match Limit: ₹5,00,000 (₹5 lakh per individual match)
|
+-- Night Period Limit: ₹10,00,000 (₹10 lakh between 7pm-2am IST)
|
+-- Weekly Period Limit: ₹40,00,000 (₹40 lakh Monday-Sunday)

How Limits Interact: The Most Restrictive Wins

When a bet arrives, all applicable limits are checked simultaneously. The most restrictive limit determines how much the agent can retain.

Example: It is Thursday night during IPL week. Rajesh's current state:

Limit | Capacity | Used | Remaining
Cricket Sport Limit | ₹50,00,000 | ₹38,00,000 | ₹12,00,000
MI vs CSK Match Limit | ₹5,00,000 | ₹4,50,000 | ₹50,000
Night Period Limit | ₹10,00,000 | ₹9,20,000 | ₹80,000
Weekly Period Limit | ₹40,00,000 | ₹35,00,000 | ₹5,00,000

A new bet wants to add ₹1,00,000 of retained liability. Looking at the remaining capacity:

  • Sport: ₹12,00,000 available -- sufficient
  • Match: ₹50,000 available -- NOT sufficient
  • Night: ₹80,000 available -- NOT sufficient
  • Weekly: ₹5,00,000 available -- sufficient

The most restrictive limit is the match limit at ₹50,000. So Rajesh can only retain ₹50,000 of the ₹1,00,000. The remaining ₹50,000 overflows and cascades upward to Vikram.
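"Most restrictive wins" reduces to taking the minimum remaining liability across every applicable limit. The following minimal sketch reproduces Thursday night's numbers. The limit names and `split_retained_liability` are hypothetical. The underscores in the literals mirror Indian digit grouping.

```python
from typing import Dict, Tuple


def split_retained_liability(requested: int, capacity: Dict[str, int],
                             used: Dict[str, int]) -> Tuple[int, int, str]:
    """Return (liability retained, liability overflowing upline, binding limit)."""
    remaining = {name: max(0, capacity[name] - used[name]) for name in capacity}
    # min() returns the first key on ties, so dict order makes ties deterministic.
    binding = min(remaining, key=remaining.__getitem__)
    retained = min(requested, remaining[binding])
    return retained, requested - retained, binding


capacity = {"sport": 50_00_000, "match": 5_00_000, "night": 10_00_000, "weekly": 40_00_000}
used = {"sport": 38_00_000, "match": 4_50_000, "night": 9_20_000, "weekly": 35_00_000}
print(split_retained_liability(1_00_000, capacity, used))  # (50000, 50000, 'match')
```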

Limit Hierarchy Table

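Priority | Limit | Checked When | Resets
1 | Per-Match Limit | Every bet on that specific match | When the match settles
2 | Night Period Limit | Every bet during the night window | At the end of the night period
3 | Weekly Period Limit | Every bet during the week | Monday, start of day
4 | Sport Limit | Every bet in that sport | Rolling / manual reset

All applicable limits are checked on every bet; whichever has the least remaining capacity binds, as in the example above.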
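A period limit effectively resets whenever the period identifier changes. The following is a sketch of those identifiers. It assumes the 7pm-2am IST night window from Rajesh's configuration and a weekly reset at Monday 00:00 IST; the authoritative definitions belong to Section 9. The function names are hypothetical, and timestamps are expected to be timezone-aware.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional

IST = timezone(timedelta(hours=5, minutes=30))   # IST has no daylight saving
NIGHT_START = time(19, 0)   # 7pm IST
NIGHT_END = time(2, 0)      # 2am IST, next calendar day


def night_period_id(ts: datetime) -> Optional[date]:
    """Label a night by the date it started on; None outside the window."""
    local = ts.astimezone(IST)
    if local.time() >= NIGHT_START:
        return local.date()
    if local.time() < NIGHT_END:
        return local.date() - timedelta(days=1)   # after midnight: previous evening's night
    return None


def weekly_period_id(ts: datetime) -> date:
    """Label a week by its Monday (IST), matching a Monday start-of-day reset."""
    local = ts.astimezone(IST)
    return local.date() - timedelta(days=local.weekday())
```

The night window wraps past midnight, so a bet at 1:30am belongs to the previous evening's night. Labelling a night by its start date keeps both halves under one key.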

7. User Win Limits & Stake Reduction

Per-Click Win Limit

The per-click win limit caps the maximum amount a punter can win on a single bet. This protects agents from large individual payouts.

How it works: If a punter bets at high odds, the potential win could be enormous. The per-click win limit ensures that no single bet can produce a payout above the configured threshold.

Example: Amit has a per-click win limit of ₹50,000.

Bet | Odds | Stake | Potential Win | Within Limit?
MI to win | 1.85 | ₹10,000 | ₹8,500 | Yes
Kohli top bat | 5.00 | ₹15,000 | ₹60,000 | No -- exceeds ₹50,000
Fancy: over 180 runs | 50.00 | ₹5,000 | ₹2,45,000 | No -- far exceeds

Aggregate Win Limit

The aggregate win limit caps the total cumulative potential wins a punter can accumulate over a configurable period (typically daily). This protects against a punter placing many winning bets that individually pass the per-click limit but collectively create enormous exposure.

Example: Amit has a daily aggregate win limit of ₹2,00,000.

He has already placed bets today with a total potential win of ₹1,85,000. His next bet has a potential win of ₹25,000. Since ₹1,85,000 + ₹25,000 = ₹2,10,000, which exceeds ₹2,00,000, the bet must be reduced or rejected.
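Both win limits can be folded into one number: the largest potential win the next bet may carry. The following is a small sketch using `Decimal` for money. The helper names are hypothetical. The ₹10,000 stake at 3.50 is an assumed bet chosen to produce the ₹25,000 potential win in the example.

```python
from decimal import Decimal


def potential_win(stake: Decimal, odds: Decimal) -> Decimal:
    # Back bet at decimal odds: profit = stake x (odds - 1).
    return stake * (odds - 1)


def max_allowed_win(per_click_limit: Decimal, aggregate_limit: Decimal,
                    accumulated: Decimal) -> Decimal:
    """Largest potential win the next bet may carry under both limits."""
    aggregate_room = max(Decimal(0), aggregate_limit - accumulated)
    return min(per_click_limit, aggregate_room)


room = max_allowed_win(Decimal(50_000), Decimal(2_00_000), Decimal(1_85_000))
print(room)                                                       # 15000
print(potential_win(Decimal(10_000), Decimal("3.50")) > room)     # True
```

Here the aggregate limit, not the per-click limit, is what forces the reduction, even though ₹25,000 is well under Amit's ₹50,000 per-click cap.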

How Stake Reduction Works

When a bet exceeds a win limit, the system does not reject it outright. Instead, it reduces the stake to the maximum amount that keeps the potential win within the limit. This is better for the punter (they still get to bet) and better for the agent (they still get action).

Real Example: High Odds Scenario

Sonia wants to bet ₹5,000 on a fancy market at odds of 50.00. Her per-click win limit is ₹50,000.

Step | Calculation
Potential win at full stake | ₹5,000 x (50.00 - 1) = ₹2,45,000
Exceeds per-click limit? | Yes: ₹2,45,000 > ₹50,000
Maximum allowable win | ₹50,000
Reduced stake | ₹50,000 / (50.00 - 1) = ₹50,000 / 49 = ₹1,020 (rounded down)
Verify | ₹1,020 x 49 = ₹49,980, which is under ₹50,000

What the Punter Sees

The punter sees a message like:

Maximum stake at these odds: ₹1,020

The message is transparent about the cap (the punter knows their stake was limited) but opaque about the reason (we do not say "your win limit is ₹50,000" because that reveals the agent's risk configuration).

Below Minimum Stake

If the reduced stake falls below the minimum allowed bet size (say, ₹100), the bet is rejected entirely with a clear message:

This market is currently unavailable at these odds.

This avoids the absurdity of accepting a ₹3 bet.
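Stake reduction and the minimum-stake rule combine into one function. This sketch assumes `max_win` is the headroom left by the win-limit checks and that odds are greater than 1. It rounds down to whole rupees, which is consistent with the ₹1,020 example. `accepted_stake` and `MIN_STAKE` are hypothetical names.

```python
from decimal import Decimal, ROUND_FLOOR
from typing import Optional

MIN_STAKE = Decimal(100)   # example minimum bet size from the text


def accepted_stake(stake: Decimal, odds: Decimal, max_win: Decimal,
                   min_stake: Decimal = MIN_STAKE) -> Optional[Decimal]:
    """Stake to accept after win-limit reduction, or None to reject the bet."""
    profit_per_rupee = odds - 1
    if stake * profit_per_rupee <= max_win:
        return stake                                   # within the limit: unchanged
    # Largest whole-rupee stake whose potential win still fits under max_win.
    reduced = (max_win / profit_per_rupee).to_integral_value(rounding=ROUND_FLOOR)
    return reduced if reduced >= min_stake else None   # too small: reject


print(accepted_stake(Decimal(5_000), Decimal("50.00"), Decimal(50_000)))   # 1020
print(accepted_stake(Decimal(5_000), Decimal("50.00"), Decimal(2_000)))    # None
```

Rounding down, never to nearest, guarantees the reduced bet's potential win never exceeds the limit.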

Sharp User Detection Signals

Agents need to identify "sharp" users -- punters who consistently beat the closing line and generate long-term losses for the bookie. Sharp detection informs the source_type dimension of the forwarding matrix.

The key signals that indicate a user may be sharp:

Signal | What It Means | Why It Matters
Closing Line Value (CLV) | The user consistently bets at prices better than where the market closes | This is the single strongest predictor of long-term profitability. A user with positive CLV over 500+ bets is almost certainly sharp.
Consistent staking | Same stake size regardless of odds or confidence | Recreational punters vary stakes; professionals use flat staking to disguise their edge
Early betting | Regularly bets within the first hour of a market opening | Early markets are softest; sharp users exploit them before prices adjust
Unpopular markets | Frequently bets on obscure leagues, low-tier events | These markets have the weakest pricing and the most exploitable edges
No mean reversion | Profits do not revert to average over time | Lucky punters revert; skilled punters sustain their edge
Rapid market movement after their bet | Price moves sharply in their direction after they bet | Indicates they are consistently on the right side of information

8. NO_NEW_RISK Mode

What It Is

NO_NEW_RISK is a protective mode that activates when an agent's retained liability reaches their configured cap for a given scope (sport, market, or period). When active, the agent cannot take on any new risk-increasing exposure, but hedge bets are still accepted.

Think of it like a credit card limit. Once you hit your limit, you cannot make new purchases, but you can still make payments (which reduce what you owe).

What Triggers It

NO_NEW_RISK is triggered automatically when:

Agent's retained open liability >= configured limit for that scope

For example, if Rajesh's cricket night limit is ₹10,00,000 and his current retained cricket night liability is ₹10,00,000, he enters NO_NEW_RISK for cricket during the night period.

The Scope Is Granular

NO_NEW_RISK is not a blanket shutdown. It is scoped per sport and per market:

Scenario | Cricket Status | Tennis Status | Football Status
Rajesh hits cricket limit | NO_NEW_RISK | Normal | Normal
Rajesh hits MI vs CSK match limit | NO_NEW_RISK (this match only) | Normal | Normal
Rajesh hits night period limit | NO_NEW_RISK (all sports in night) | NO_NEW_RISK | NO_NEW_RISK

How Hedge Detection Works

The critical question in NO_NEW_RISK mode is: "Does this bet reduce or increase the agent's worst-case liability?" A hedge bet reduces liability and should be accepted even when the agent is at their limit.

The rule is simple:

If WorstCaseLiability AFTER the bet < WorstCaseLiability BEFORE the bet, it is a hedge.

Real Examples of Hedge vs Non-Hedge Bets

Scenario: Rajesh is in NO_NEW_RISK for the MI vs CSK match. He currently has ₹5,00,000 of retained liability on MI to win (his punters have backed MI).

Incoming Bet | Effect on Liability | Hedge? | Accepted?
₹2,00,000 more on MI to win | Liability increases to ₹7,00,000 (assuming odds of 2.00) | No -- increases worst case | Not retained (forwarded 100% to upline)
₹3,00,000 on CSK to win | Liability decreases because it offsets the MI position | Yes -- reduces worst case | Accepted
₹1,00,000 on Draw | Reduces the MI exposure by the stake, but adds Draw exposure that depends on the odds | Depends -- compute the actual worst case | Accepted if worst case decreases

Key insight: A bet on the opposite outcome of an existing position is almost always a hedge. A bet on the same outcome is never a hedge. A bet on a third outcome (like a draw) may or may not be a hedge depending on the amounts and odds.
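The worst-case rule can be sketched as follows. This is a minimal illustration under several assumptions. It handles a single market whose outcomes are known up front. Retained bets are stored as whole-rupee stakes, with decimal odds scaled by 100 to keep the arithmetic exact. Rajesh's ₹5,00,000 MI liability is assumed to come from a ₹5,00,000 stake at 2.00. `RetainedBet`, `netByOutcome`, `worstCaseLiability`, and `isHedge` are hypothetical names, not existing Hannibal code.

```typescript
// A retained bet on one market. Odds are stored x100 (1.85 -> 185) so all
// arithmetic stays in integers and avoids floating-point drift.
interface RetainedBet {
  outcome: string; // outcome the punter backed
  stake: number;   // retained stake in whole rupees
  oddsX100: number;
}

// The agent's net result (rupees) for each possible outcome of the market.
function netByOutcome(outcomes: string[], bets: RetainedBet[]): Map<string, number> {
  const net = new Map<string, number>();
  for (const o of outcomes) {
    let hundredths = 0;
    for (const b of bets) {
      hundredths += b.outcome === o
        ? -b.stake * (b.oddsX100 - 100) // punter wins: agent pays stake x (odds - 1)
        : b.stake * 100;                // punter loses: agent keeps the stake
    }
    net.set(o, hundredths / 100);
  }
  return net;
}

// Largest loss across all outcomes (0 if every outcome is profitable).
function worstCaseLiability(outcomes: string[], bets: RetainedBet[]): number {
  const worst = Math.min(...Array.from(netByOutcome(outcomes, bets).values()));
  return Math.max(0, -worst);
}

// The rule from the text: a hedge strictly lowers worst-case liability.
function isHedge(outcomes: string[], book: RetainedBet[], incoming: RetainedBet): boolean {
  return worstCaseLiability(outcomes, [...book, incoming]) < worstCaseLiability(outcomes, book);
}
```

With that book, a CSK bet at 2.00 is a hedge. A ₹1,00,000 Draw bet is a hedge at 4.00 but not at 12.00, because at 12.00 the Draw outcome becomes the new worst case (₹6,00,000).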

How the Agent Exits NO_NEW_RISK

There are three ways out of NO_NEW_RISK:

  1. Settlements reduce exposure. When a match settles, the liability associated with that match is removed. If this brings the agent back below their limit, NO_NEW_RISK is lifted.

  2. Hedge bets reduce exposure. Accepting opposite-side bets reduces worst-case liability. Enough hedges can bring the agent below the limit.

  3. Admin raises the limit. If Rajesh calls his platform admin and says "I'm comfortable taking more cricket risk this week," the admin can raise his limit. This immediately lifts NO_NEW_RISK if the current liability is now below the new limit.
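One way to derive NO_NEW_RISK without a separately stored flag is sketched below. It assumes each bet maps to a sport, a match, and a period scope key, and that retained liability and limits are kept per key. The key format and the function names are hypothetical. Because the state is recomputed from the ledgers on every bet, the three exits listed above (settlement, hedges, a raised limit) need no extra code. Each one simply makes the check return an empty list.

```typescript
// Hypothetical scope keys such as "sport:cricket", "match:mi-vs-csk", "period:night".
type ScopeKey = string;

interface BetScope {
  sport: string;
  matchId: string;
  period: string;
}

// Every scope a bet counts against. A period-level breach therefore blocks new
// risk for all sports, while a match-level breach blocks only that match.
function applicableScopes(bet: BetScope): ScopeKey[] {
  return [`sport:${bet.sport}`, `match:${bet.matchId}`, `period:${bet.period}`];
}

// Scopes currently in NO_NEW_RISK for this bet: retained open liability >= limit.
// An empty result means the bet may add risk normally.
function noNewRiskScopes(
  bet: BetScope,
  retained: Map<ScopeKey, number>,
  limits: Map<ScopeKey, number>,
): ScopeKey[] {
  return applicableScopes(bet).filter((scope) => {
    const limit = limits.get(scope);
    if (limit === undefined) return false; // no limit configured for this scope
    return (retained.get(scope) ?? 0) >= limit;
  });
}
```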


9. Period Definitions -- Night & Weekly

Why Bookies Use Periods

Bookies do not think in terms of "total lifetime exposure." They think in operational windows:

  • The night session is when most live betting happens (evening matches in India, late-night football in Europe). The risk profile during a night session is very different from a quiet afternoon.
  • The weekly cycle aligns with settlement cycles. Most agents settle weekly. They need to know their maximum weekly exposure.

Periods provide a way to set separate limits for separate time windows, which mirrors how bookies actually operate.

Night Period

The night period is a configurable time window per agent. It typically covers the peak betting hours for that agent's primary market.

Agent | Timezone | Night Period | Why
Rajesh (India, cricket) | IST (UTC+5:30) | 7:00 PM - 2:00 AM | IPL matches start at 7:30 PM
Priya (India, football) | IST (UTC+5:30) | 10:00 PM - 4:00 AM | Premier League matches start at 12:30 AM IST
Kwame (Ghana, football) | GMT | 2:00 PM - 11:00 PM | Afternoon and evening matches

Weekly Period

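The weekly period is a Monday-to-Sunday cycle (configurable start day per agent). At the start of each week, the weekly exposure counter resets to zero, except for exposure carried forward from events that are still open (see The Period Rollover Problem below).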

Timezone Handling

Each agent operates in their own timezone. This is critical because:

  • Rajesh's "night" starts at 7:00 PM IST, which is 1:30 PM UTC
  • The system stores all times in UTC internally but converts to the agent's local timezone when evaluating period boundaries
  • A bet placed at 1:45 PM UTC is "night" for Rajesh but "afternoon" for a UK-based agent
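A minimal sketch of the period-membership check follows. It assumes IANA zone names (for example `Asia/Kolkata`) are stored per agent and that the runtime ships full ICU time-zone data, as current Node.js builds do. Rather than converting the boundaries to UTC, it converts the bet's UTC instant to the agent's wall clock and compares there, which gives the same answer. `localMinutesOfDay`, `LocalWindow`, and `isInWindow` are hypothetical names.

```typescript
// Minutes since local midnight for a UTC instant, in an IANA time zone.
function localMinutesOfDay(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  }).formatToParts(instant);
  const field = (type: string): number => Number(parts.find((p) => p.type === type)?.value);
  // Some engines have rendered midnight as "24" with hour12: false; % 24 guards against that.
  return (field("hour") % 24) * 60 + field("minute");
}

interface LocalWindow {
  startMinutes: number; // e.g. 19:00 -> 1140
  endMinutes: number;   // exclusive, e.g. 02:00 -> 120
}

// True if the instant is inside the window on the agent's local clock.
// A window whose end is earlier than its start wraps past midnight.
function isInWindow(instant: Date, timeZone: string, w: LocalWindow): boolean {
  const m = localMinutesOfDay(instant, timeZone);
  return w.startMinutes <= w.endMinutes
    ? m >= w.startMinutes && m < w.endMinutes
    : m >= w.startMinutes || m < w.endMinutes;
}
```

The comparison happens on the local clock, so the time-zone database absorbs DST shifts. During a fall-back night, both occurrences of 1:30 AM count as inside a 7 PM to 2 AM window.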

The Period Rollover Problem

What happens when a live match spans a period boundary? Consider:

  • MI vs CSK starts at 7:30 PM IST (within Rajesh's night period)
  • The match runs late and extends past 2:00 AM IST (Rajesh's night period end)
  • Rajesh has ₹8,00,000 of retained liability on this match at 1:55 AM

The design choice: carry-forward exposure.

When a period ends, open exposure from events that are still live is carried forward into the new period. This means:

  • Rajesh's ₹8,00,000 is NOT magically zeroed out at 2:00 AM
  • Instead, it carries forward as the opening balance of the next period (the "day" period, once night has ended)
  • New bets after 2:00 AM are counted against the day period limits
  • The carried-forward amount from the night period continues to count against the sport-level limit

The alternative -- a clean reset that ignores ongoing exposure -- would be dangerous. An agent could circumvent limits by waiting for a period boundary.
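The carry-forward rule can be sketched as a rollover function. The sketch assumes one reading of the text: carried-forward liability becomes the opening balance that the next period's limit is checked against, with new bets added on top. The types and names are hypothetical.

```typescript
interface OpenPosition {
  eventId: string;
  retainedLiability: number; // worst-case liability still open on this event (rupees)
  settled: boolean;
}

interface PeriodLedger {
  periodId: string;       // hypothetical id, e.g. "day-2026-02-12"
  carriedForward: number; // opening balance inherited from still-open events
  newLiability: number;   // liability added by bets placed during this period
}

// At a boundary, exposure on unsettled events moves into the next period
// instead of being zeroed, so waiting for the boundary frees no capacity.
function rollOver(nextPeriodId: string, positions: OpenPosition[]): PeriodLedger {
  const carriedForward = positions
    .filter((p) => !p.settled)
    .reduce((sum, p) => sum + p.retainedLiability, 0);
  return { periodId: nextPeriodId, carriedForward, newLiability: 0 };
}

// What the new period's limit check sees under this reading of the rules.
function periodUsed(ledger: PeriodLedger): number {
  return ledger.carriedForward + ledger.newLiability;
}
```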

The DST Edge Case

Daylight Saving Time creates a subtle problem. When clocks spring forward during the window, a night period configured as 7 PM to 2 AM lasts only 6 hours in UTC instead of 7. When clocks fall back during the window, it lasts 8 hours.

The solution: Period boundaries are defined in the agent's local time, and the system converts them to UTC fresh each day, accounting for DST. A "7 PM to 2 AM" night period always means 7 PM to 2 AM in the agent's local clock, regardless of DST transitions.

On the actual transition day:

  • Spring forward (clocks skip an hour, e.g. 2 AM to 3 AM): If the skipped hour falls inside the night period, the period is effectively 1 hour shorter. This is acceptable; it is a conservative outcome (less time to accumulate risk).
  • Fall back (clocks repeat an hour, e.g. 1 AM to 2 AM): If the repeated hour falls inside the night period, the period is effectively 1 hour longer. If a period boundary itself falls in the repeated hour, the system uses the first occurrence of that hour as the boundary.

10. Exposure Accounting

Three Ledgers Per Agent Per Scope

For every agent, at every scope level (sport, market, period), the system maintains three numbers:

Ledger | What It Tracks | Updated When
retained_open_liability | The total worst-case payout the agent faces on retained bets | Every bet placement and settlement
forwarded_open_liability | The total liability the agent has forwarded upward | Every bet placement and settlement
open_potential_win | The total amount punters stand to win on bets routed through this agent (retained plus forwarded) | Every bet placement and settlement

These three numbers tell you everything about an agent's current risk position:

  • retained_open_liability is what the agent will pay if everything goes wrong. This is the number checked against limits.
  • forwarded_open_liability is what the agent's upline will pay. The agent has no financial exposure here.
  • open_potential_win is the punter-side view -- what the agent's punters could collectively win.

How These Update Atomically

When a bet is placed, all ledger updates across all affected agents happen in a single atomic transaction. There is no moment where Rajesh's ledger is updated but Vikram's is not. This prevents inconsistent states where the numbers do not add up.

The Redis Fast-Path Optimization

Most bet processing involves reading the current exposure to check limits. Only a minority of bets actually push an agent close to their limit.

The optimization:

+------------------+     +------------------+     +------------------+
|   APPLICATION    |     |      REDIS       |     |    POSTGRESQL    |
|   (LRU Cache)    |     |   (Fast Read)    |     |    (Source of    |
|                  |     |                  |     |      Truth)      |
| - 5-second TTL   |     | - Sub-ms reads   |     | - Atomic writes  |
| - ~60% hit rate  |     | - ~25% hit rate  |     | - FOR UPDATE     |
| - Zero latency   |     | - <1ms latency   |     |   locking        |
+--------+---------+     +--------+---------+     +--------+---------+
         |                        |                        |
         v                        v                        v
  "Rajesh has ₹12L         "Rajesh has ₹12L         "Rajesh has exactly
   used of ₹50L --          used of ₹50L --          ₹12,05,100 used --
   clearly within           clearly within           UPDATE with lock"
   limit, fast pass"        limit, fast pass"

How it works:

  1. Application LRU cache (Tier 1): An in-memory cache with a short TTL (5 seconds). If a bet arrives and the cached exposure shows the agent is far from their limit (say, below 60% utilized), we do not need to check further; the bet will pass the limit check. This handles most bets.

  2. Redis (Tier 2): For bets where the LRU cache has expired or the agent is approaching their limit, read from Redis. Redis values are updated on every write but are eventually consistent. Still very fast (sub-millisecond).

  3. PostgreSQL with FOR UPDATE (Tier 3): For bets where the agent is near their limit (say, >80% utilized), we take a pessimistic lock in PostgreSQL. This ensures that two simultaneous bets cannot both claim the last ₹50,000 of capacity. This is the slowest path but only applies to a small percentage of bets.
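The tier selection can be sketched as a small pure function. The 5-second TTL and the 60% / 80% thresholds come from the examples above and should be treated as tunables. This sketch also adds the incoming bet's liability before comparing, which the text does not spell out. The Redis read is injected as a plain function so the sketch stays self-contained; no real Redis or PostgreSQL client API is shown, and all names are hypothetical. Whichever tier answers the read, the ledger write itself still happens in the atomic PostgreSQL transaction.

```typescript
type ReadTier = "LRU_FAST_PASS" | "REDIS_CHECK" | "POSTGRES_LOCK";

interface CachedExposure {
  usedLiability: number; // retained_open_liability when the snapshot was taken
  cachedAtMs: number;    // epoch milliseconds of the snapshot
}

const LRU_TTL_MS = 5_000;    // 5-second TTL from the text
const FAST_PASS_BELOW = 0.6; // "far from the limit" (illustrative)
const LOCK_ABOVE = 0.8;      // "near the limit": take the PostgreSQL row lock

// Chooses which read path validates the limit check for one bet.
function chooseReadTier(
  cached: CachedExposure | undefined,
  readRedisUsed: () => number, // hypothetical Redis read, injected for testing
  limit: number,
  incomingLiability: number,
  nowMs: number,
): ReadTier {
  // Tier 1: a fresh in-memory snapshot that stays far from the limit even after this bet.
  if (cached !== undefined && nowMs - cached.cachedAtMs < LRU_TTL_MS) {
    const cachedRatio = (cached.usedLiability + incomingLiability) / limit;
    if (cachedRatio < FAST_PASS_BELOW) return "LRU_FAST_PASS";
  }
  // Tier 2 vs 3: consult Redis; only bets that would cross 80% need the lock.
  const redisRatio = (readRedisUsed() + incomingLiability) / limit;
  return redisRatio > LOCK_ABOVE ? "POSTGRES_LOCK" : "REDIS_CHECK";
}
```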

Why this matters: During an IPL match, Rajesh might receive 50 bets per minute. For 40 of those, the LRU cache can immediately confirm he is within limits. For 8 more, Redis provides the answer. Only 2 bets (the ones near his limit) need the PostgreSQL lock. This keeps median latency low while guaranteeing correctness at the boundary.

Settlement Impact

When a match settles, the exposure associated with that match is removed from all agents in the chain:

  • Retained positions: The settled amount is removed from retained_open_liability. If the agent won (punter lost), the agent profits. If the agent lost (punter won), the agent pays out.
  • Forwarded positions: The settled amount is removed from forwarded_open_liability. The upline agent handles payout for forwarded positions.
  • Potential win: The settled amount is removed from open_potential_win.

Ledger Updates for a Single Bet: 3-Level Cascade

Amit bets ₹10,000 on MI at odds 1.85. Liability per unit of stake: ₹0.85.

BEFORE THE BET:
================================================================
                 Retained         Forwarded        Potential
Agent            Liability        Liability        Win
----------------------------------------------------------------
Rajesh           ₹12,00,000       ₹8,00,000        ₹20,00,000
Vikram           ₹25,00,000       ₹10,00,000       ₹35,00,000
Platform         ₹5,00,000        ₹2,00,000        ₹7,00,000
================================================================

BET PROCESSING:
================================================================
Rajesh retains 60% = ₹6,000 stake → ₹5,100 liability
Rajesh forwards 40% = ₹4,000 stake → ₹3,400 liability

Vikram retains 60% of ₹4,000 = ₹2,400 stake → ₹2,040 liability
Vikram forwards 40% of ₹4,000 = ₹1,600 stake → ₹1,360 liability

Platform retains 50% of ₹1,600 = ₹800 stake → ₹680 liability
Platform hedges 50% of ₹1,600 = ₹800 stake → ₹680 forwarded
================================================================

AFTER THE BET:
================================================================
                 Retained         Forwarded        Potential
Agent            Liability        Liability        Win
----------------------------------------------------------------
Rajesh           ₹12,05,100       ₹8,03,400        ₹20,08,500
                 (+₹5,100)        (+₹3,400)        (+₹8,500)

Vikram           ₹25,02,040       ₹10,01,360       ₹35,03,400
                 (+₹2,040)        (+₹1,360)        (+₹3,400)

Platform         ₹5,00,680        ₹2,00,680        ₹7,01,360
                 (+₹680)          (+₹680)          (+₹1,360)
================================================================
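The per-level arithmetic in this example can be reproduced with a short function. This is a sketch under stated assumptions:

- Stakes are whole rupees, and odds are scaled by 100 so the arithmetic stays in integers.
- Each level's forward percentage has already been resolved.
- Rounding to the nearest rupee is a placeholder policy.
- Limit-driven overflow is not modelled.

`cascade` and its types are hypothetical.

```typescript
interface Level {
  agent: string;
  forwardPct: number; // integer % passed upward (at the top level: hedged on the exchange)
}

interface LevelResult {
  agent: string;
  incomingStake: number;
  retainedStake: number;
  retainedLiability: number;
  forwardedStake: number;
  forwardedLiability: number;
  potentialWinDelta: number; // change to open_potential_win at this level
}

// Liability of a stake at decimal odds given x100 (1.85 -> 185).
// Rounding to the nearest rupee is a placeholder policy.
const liability = (stake: number, oddsX100: number): number =>
  Math.round((stake * (oddsX100 - 100)) / 100);

function cascade(stake: number, oddsX100: number, levels: Level[]): LevelResult[] {
  const results: LevelResult[] = [];
  let incoming = stake;
  for (const { agent, forwardPct } of levels) {
    const forwardedStake = Math.round((incoming * forwardPct) / 100);
    const retainedStake = incoming - forwardedStake; // the two always sum to incoming
    const retainedLiability = liability(retainedStake, oddsX100);
    const forwardedLiability = liability(forwardedStake, oddsX100);
    results.push({
      agent,
      incomingStake: incoming,
      retainedStake,
      retainedLiability,
      forwardedStake,
      forwardedLiability,
      potentialWinDelta: retainedLiability + forwardedLiability,
    });
    incoming = forwardedStake; // the next level up only sees what was forwarded
  }
  return results;
}
```

Calling `cascade(10_000, 185, ...)` with forward percentages of 40, 40, and 50 yields the same deltas as the AFTER THE BET table.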

11. Audit & Determinism

The Audit Record

Every bet produces a complete audit record. This is not a log file that can be grepped through later. It is a structured, queryable record that captures every decision the system made.

An audit record contains:

Field | Description | Example
bet_id | Unique identifier for the bet | bet_a1b2c3d4
original_stake | What the punter requested | ₹10,000
adjusted_stake | What was actually accepted (after stake reduction, if any) | ₹10,000
stake_reduction_reason | Why the stake was reduced, if it was | null (no reduction)
per_click_win_cap_check | Result of the per-click check | PASS: ₹8,500 < ₹50,000
aggregate_win_cap_check | Result of the aggregate check | PASS: ₹53,500 < ₹2,00,000
forwarding_chain | Complete routing at each level | See below
matrix_rules_evaluated | Which rules were checked and which won | R3 matched (specificity 3), R5 evaluated (specificity 2), R8 evaluated (specificity 0)
limit_checks | Every limit checked at every level | See below
hedge_detection | Whether NO_NEW_RISK was active, whether the bet was a hedge | NOT_IN_NO_NEW_RISK
period_context | Which period the bet fell in | NIGHT (19:00-02:00 IST), Week 7 of 2026
timestamps | How long each step took | matrix_resolve: 2ms, cap_check: 5ms, execution: 12ms, total: 23ms

The forwarding chain in detail:

Level 1: Rajesh
- Incoming stake: ₹10,000
- Forward % source: MATRIX (Rule R3)
- Forward %: 40%
- Retained stake: ₹6,000
- Retained liability: ₹5,100
- Limit checks:
  - Cricket sport: ₹12,05,100 / ₹50,00,000 (24.1%) PASS
  - MI vs CSK match: ₹1,25,100 / ₹5,00,000 (25.0%) PASS
  - Night period: ₹3,05,100 / ₹10,00,000 (30.5%) PASS
- Forwarded stake: ₹4,000
- Overflow: ₹0

Level 2: Vikram
- Incoming stake: ₹4,000
- Forward % source: MATRIX (Rule V2)
- Forward %: 40%
- Retained stake: ₹2,400
- Retained liability: ₹2,040
- Limit checks: [similar detail]
- Forwarded stake: ₹1,600
- Overflow: ₹0

Level 3: Platform
- Incoming stake: ₹1,600
- Retained: ₹800
- Hedged on Betfair: ₹800
- Betfair order ID: bf_xyz789
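Below is a possible typed shape for the audit record, plus a formatter that reproduces the limit-check lines above. Field names follow the table. The field types, the nesting of limit checks under each chain level, and the PASS/FAIL rule (used <= limit) are assumptions. Indian digit grouping comes from the built-in `en-IN` locale of `Number.prototype.toLocaleString`.

```typescript
interface LimitCheck {
  scope: string;     // e.g. "Cricket sport"
  usedAfter: number; // retained liability in this scope including this bet (rupees)
  limit: number;
}

// One entry of forwarding_chain; limit_checks are nested per level in this sketch.
interface ForwardingLevelAudit {
  level: number;
  agent: string;
  incomingStake: number;
  forwardPctSource: string; // e.g. "MATRIX (Rule R3)"
  forwardPct: number;
  retainedStake: number;
  retainedLiability: number;
  limitChecks: LimitCheck[];
  forwardedStake: number;
  overflow: number;
}

interface AuditRecord {
  bet_id: string;
  original_stake: number;
  adjusted_stake: number;
  stake_reduction_reason: string | null;
  per_click_win_cap_check: string;
  aggregate_win_cap_check: string;
  forwarding_chain: ForwardingLevelAudit[];
  matrix_rules_evaluated: string[];
  hedge_detection: string;
  period_context: string;
  timestamps: Record<string, number>; // step name -> duration in ms
}

// Rupee amount with Indian digit grouping, e.g. 1205100 -> "₹12,05,100".
const inr = (n: number): string => `₹${n.toLocaleString("en-IN")}`;

function formatLimitCheck(c: LimitCheck): string {
  const pct = ((c.usedAfter / c.limit) * 100).toFixed(1);
  const verdict = c.usedAfter <= c.limit ? "PASS" : "FAIL";
  return `${c.scope}: ${inr(c.usedAfter)} / ${inr(c.limit)} (${pct}%) ${verdict}`;
}
```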

Why Determinism Matters

Agents must trust the system. If Rajesh sees a bet routed in a way he does not understand, he will lose confidence and revert to manual processes. The audit trail lets him see exactly why every decision was made.

Disputes must be resolvable. When Rajesh and Vikram disagree about who owes what at settlement time, the system has an indisputable record of exactly how every bet was split.

Regulators may require it. In jurisdictions moving toward regulation, a complete audit trail is a compliance necessity.

Configuration Change Log

All configuration changes are recorded using event sourcing:

  • When Rajesh changes his forwarding matrix, the old matrix is preserved and the new one is recorded with a timestamp
  • When an admin changes Amit's win limit, the change is logged with who made it and why
  • This means you can answer questions like: "What was Rajesh's matrix at 9:47 PM on March 15?" -- by replaying the event log up to that point
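Answering "what was the configuration at time T?" amounts to folding the event log up to T. This is a minimal sketch, assuming each event stores the full new value (a snapshot per change) rather than a delta; a delta-based log would apply its changes in the same order. `ConfigEvent` and `valueAt` are hypothetical names.

```typescript
interface ConfigEvent<T> {
  at: Date;       // when the change took effect
  actor: string;  // who made it
  reason: string; // why
  value: T;       // full new value (a snapshot per change)
}

// Replays the log in time order up to and including `at` and returns the value
// in force at that moment, or undefined if nothing was configured yet.
// The log itself is never mutated, so old values stay available for audits.
function valueAt<T>(events: ConfigEvent<T>[], at: Date): T | undefined {
  let current: T | undefined = undefined;
  const ordered = [...events].sort((a, b) => a.at.getTime() - b.at.getTime());
  for (const e of ordered) {
    if (e.at.getTime() > at.getTime()) break;
    current = e.value;
  }
  return current;
}
```

For example, 9:47 PM IST on March 15 is 16:17 UTC, so that instant is what you would pass as `at`.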

Replay Capability

The ultimate test of determinism: given the state at time T, the same bet must produce the same routing.

This means if a dispute arises about a bet placed three weeks ago, we can:

  1. Load the configuration state as it existed at the time of the bet
  2. Load the exposure state as it existed at the time of the bet
  3. Re-run the routing logic
  4. Produce the exact same result

This is possible because all inputs to the routing decision (matrix, limits, current exposure, user status, market conditions) are captured in the audit record.


12. What the Current Codebase Already Has (and What's Missing)

Based on analysis of the existing Hannibal codebase:

Feature | Current Status | What Exists | What's Missing
B-Book Config | Partial | bbookConfigService.ts -- basic B-Book configuration per agent | Multi-dimensional forwarding matrix, per-sport/market granularity, wildcard matching
B-Book State | Partial | bbookStateService.ts -- tracks basic B-Book state | Per-scope exposure ledgers (sport, market, period), NO_NEW_RISK mode tracking
Filter Engine | Partial | filterEngine.ts -- filters bets through B-Book rules | 5-dimension matrix resolution, specificity-based tie-breaking, precedence chain
B-Book Fill | Partial | bbookFillService.ts -- executes B-Book bet placement | Cascading multi-level routing, overflow handling, atomic multi-agent ledger updates
B-Book Settlement | Partial | bbookSettlementService.ts -- settles B-Book positions | Multi-level settlement cascade, exposure ledger rollback, period-aware settlement
Sharp Detection | Partial | sharpUserService.ts -- identifies sharp users | CLV calculation, behavioral scoring, integration with forwarding matrix source_type
Agent Hierarchy | Exists | agents.ts, agent.ts routes -- agent CRUD and hierarchy | Cascading routing through hierarchy, parent suspension skip logic
Agent Accounting | Exists | agentSettlementService.ts, agentSettlementJob.ts -- agent financial settlements | Per-scope liability tracking, real-time exposure counters
Agent Monitoring | Exists | agentMonitoringService.ts -- agent activity monitoring | Real-time limit utilization, NO_NEW_RISK alerts, period boundary tracking
User Win Limits | Missing | -- | Per-click win cap, aggregate win cap, stake reduction engine
Forwarding Matrix | Missing | -- | Full 5D matrix data model, wildcard resolution, matrix CRUD API
Per-Agent Limits | Missing | -- | Sport/market/period limit configuration, limit enforcement in bet flow
Cascading Routing | Missing | -- | N-level cascade engine, overflow calculation, suspended agent skip
NO_NEW_RISK Mode | Missing | -- | Automatic trigger, hedge detection, scope-aware activation
Hedge Detection | Missing | -- | Worst-case liability comparison, multi-outcome hedge evaluation
Period Management | Missing | -- | Night/weekly period definitions, timezone handling, carry-forward logic
Audit Trail | Missing | -- | Structured audit records, event sourcing, replay capability
Redis Exposure Cache | Missing | -- | 3-tier caching, fast-path optimization, cache invalidation

The Key Observation

The existing codebase has the foundation right. The B-Book service exists with config, state, filtering, fill, and settlement modules. The agent hierarchy exists with accounting and monitoring. Sharp detection exists.

What is missing is the connective tissue -- the forwarding matrix that ties bet characteristics to routing decisions, the cascade engine that flows bets through the hierarchy, and the limit enforcement that keeps agents safe. These are the components that transform the existing point-to-point B-Book into a hierarchical risk management system.


13. The Nightmare Scenarios & How We Handle Them

Scenario 1: Syndicate Attack -- Correlated Positions Across Agents

What happens: A betting syndicate places coordinated bets through multiple agents. Each individual agent sees a modest position, but the aggregate platform exposure is enormous. The MI vs CSK match settles and the platform owes ₹2 crore across 15 agents.

How we handle it:

  • Cross-agent position aggregation. The platform maintains a real-time view of aggregate exposure per event, summing across all agents. If aggregate exposure on any single outcome exceeds a configurable threshold, an alert fires.
  • Correlated account detection. Users who consistently bet the same outcome, at the same time, across different agents are flagged. Signals include: matching IP addresses, similar device fingerprints, correlated timing patterns, and identical stake amounts.
  • Platform-level event limits. Independent of individual agent limits, the platform sets a maximum total exposure per event. When this is breached, additional retained positions are blocked platform-wide, and new bets are forwarded to Betfair.
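Cross-agent aggregation can be sketched as a group-by over (event, outcome). This assumes each agent's retained liability is already available per outcome. The row shape, the threshold, and the function name are hypothetical.

```typescript
interface AgentOutcomeExposure {
  agent: string;
  eventId: string;
  outcome: string;
  retainedLiability: number; // rupees this agent pays if `outcome` happens
}

interface AggregateAlert {
  eventId: string;
  outcome: string;
  total: number;
  agentCount: number;
}

// Sums retained liability per (event, outcome) across all agents and flags any
// total above the platform threshold, even when each agent looks modest alone.
function aggregateAlerts(rows: AgentOutcomeExposure[], threshold: number): AggregateAlert[] {
  type Bucket = { eventId: string; outcome: string; total: number; agents: Set<string> };
  const byOutcome = new Map<string, Bucket>();
  for (const r of rows) {
    const key = `${r.eventId}|${r.outcome}`;
    let bucket = byOutcome.get(key);
    if (bucket === undefined) {
      bucket = { eventId: r.eventId, outcome: r.outcome, total: 0, agents: new Set<string>() };
      byOutcome.set(key, bucket);
    }
    bucket.total += r.retainedLiability;
    bucket.agents.add(r.agent);
  }
  return Array.from(byOutcome.values())
    .filter((b) => b.total > threshold)
    .map((b) => ({ eventId: b.eventId, outcome: b.outcome, total: b.total, agentCount: b.agents.size }));
}
```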

Scenario 2: Data Feed Failure During Live Play

What happens: The odds feed from the data provider (Roanuz, OddsPAPI) goes stale during a live IPL match. The system is still showing odds of 1.85 for MI, but MI has just lost 3 quick wickets and their true odds are now 3.50. Smart punters exploit the stale prices by backing CSK, whose displayed price has not yet shortened to reflect the collapse.

How we handle it:

  • Stale price detection. If the odds feed has not updated for more than a configurable duration (e.g., 5 seconds for in-play), the market is automatically suspended. No new bets are accepted until the feed resumes.
  • Price movement circuit breaker. If the price moves by more than a configurable percentage in a single update (suggesting a missed intermediate update), the market is suspended pending human review.
  • Multi-source validation. Where available, cross-reference prices from multiple providers. A price that is significantly different from all other sources is likely stale.
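The three feed safeguards reduce to simple checks. In this sketch the 5-second in-play silence window comes from the text. The 25% single-update move limit and the 15% multi-source tolerance are placeholder values for the "configurable" thresholds. All names are hypothetical, and no real Roanuz or OddsPAPI API is shown.

```typescript
interface PriceTick {
  price: number;        // decimal odds
  receivedAtMs: number; // when the feed delivered this price
}

type FeedVerdict = "OK" | "SUSPEND_STALE" | "SUSPEND_JUMP";

const MAX_INPLAY_SILENCE_MS = 5_000; // from the text (in-play)
const MAX_SINGLE_MOVE = 0.25;        // placeholder: 25% move in one update
const SOURCE_TOLERANCE = 0.15;       // placeholder: 15% away from other providers

// Stale-price detection and the single-update circuit breaker.
function checkFeed(previous: PriceTick | undefined, latest: PriceTick, nowMs: number): FeedVerdict {
  if (nowMs - latest.receivedAtMs > MAX_INPLAY_SILENCE_MS) return "SUSPEND_STALE";
  if (previous !== undefined) {
    const move = Math.abs(latest.price - previous.price) / previous.price;
    if (move > MAX_SINGLE_MOVE) return "SUSPEND_JUMP";
  }
  return "OK";
}

// Multi-source validation: flag a price far from the median of the other providers.
function isOutlier(price: number, otherSources: number[]): boolean {
  if (otherSources.length === 0) return false; // nothing to compare against
  const sorted = [...otherSources].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return Math.abs(price - median) / median > SOURCE_TOLERANCE;
}
```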

Scenario 3: Rogue Agent Dumping Toxic Flow

What happens: An agent intentionally forwards 100% of sharp/winning flow to their upline or the platform while retaining the losing flow. Over time, the platform notices that the bets forwarded by this agent consistently lose money for whoever receives them.

How we handle it:

  • Behavioral anomaly detection. Track the P&L (profit and loss) of retained vs forwarded bets per agent. If an agent's forwarded bets consistently lose money for the upline while their retained bets consistently profit, this is a red flag.
  • Forwarding pattern analysis. An agent who suddenly changes their matrix to forward more when they suspect a punter will win is detectable. Configuration changes that correlate with bet outcomes are flagged.
  • Automatic escalation. Agents whose forwarded flow exceeds a toxicity threshold are escalated for review. The platform can force a minimum retention percentage.

Scenario 4: Double Settlement / Result Correction

What happens: A cricket match is initially settled as "MI wins," all payouts are processed, and then the result is corrected (perhaps due to a scoring error or a ruling by match officials). All settlements must be reversed and re-processed.

How we handle it:

  • Re-settlement workflow. The system supports reversing a settlement and re-applying it with corrected results. This affects all agents in the cascade.
  • Ledger reconciliation. After re-settlement, all exposure ledgers are recalculated. Any agent who was paid out incorrectly has the amount clawed back. Any agent who paid out incorrectly receives a credit.
  • Communication chain. All affected agents receive automated notifications explaining the re-settlement, with full audit trails showing the before and after.

Scenario 5: System Outage During Peak

What happens: During the IPL final, the system experiences a partial outage. The bet processing service is down for 90 seconds while 10,000 bets are queued up.

How we handle it:

  • Circuit breaker pattern. When the system detects degraded performance (response times exceeding 500ms), it switches to a degraded mode where all bets are forwarded 100% to Betfair. No agent retains any risk during the outage. This is the safest possible default.
  • Fail-safe to 100% forwarding. If the routing engine cannot determine the correct split (because the forwarding matrix service is down), the bet is still accepted but forwarded entirely. The agent misses out on retention (lost profit opportunity) but is not exposed to unmanaged risk.
  • Queue and replay. Bets that arrive during the outage are queued. When the system recovers, they are replayed through the normal routing engine. If the bet was already forwarded 100%, a reconciliation process adjusts the positions retroactively.
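The fail-safe default can be expressed as a wrapper around the normal matrix lookup. This sketch assumes the lookup is injected as a function that either returns a forward percentage or throws, and that the caller already knows the current response time. The 500ms threshold is from the text; every name is hypothetical.

```typescript
interface Split {
  retainPct: number;
  forwardPct: number;
  reason: string; // recorded in the audit trail
}

const DEGRADED_RESPONSE_MS = 500; // threshold from the text

const fullForward = (reason: string): Split => ({ retainPct: 0, forwardPct: 100, reason });

// Wraps the normal matrix lookup. Degradation, an unavailable matrix, or a
// nonsensical result all fall back to forwarding 100%: the agent loses a
// retention opportunity but is never exposed to unmanaged risk.
function routeWithFailSafe(resolveForwardPct: () => number, observedResponseMs: number): Split {
  if (observedResponseMs > DEGRADED_RESPONSE_MS) return fullForward("DEGRADED_MODE");
  try {
    const forwardPct = resolveForwardPct();
    if (!Number.isFinite(forwardPct) || forwardPct < 0 || forwardPct > 100) {
      return fullForward("INVALID_MATRIX_RESULT");
    }
    return { retainPct: 100 - forwardPct, forwardPct, reason: "MATRIX" };
  } catch {
    return fullForward("MATRIX_UNAVAILABLE");
  }
}
```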

14. The Agent Experience -- From Simple to Sophisticated

The Core Insight

Agents are not engineers. Some are seasoned operators who think in matrices and percentages. Others are street-level bookies who have never opened a spreadsheet. But they all share the same two goals: make more money and do not get wiped out overnight.

The system underneath is the same for everyone -- the forwarding matrix, cascading routing, exposure ledgers, hedge detection. What changes is how much of that complexity the agent sees. This is called progressive disclosure: show the simple version by default, reveal the complexity only when the agent asks for it.

The Three Tiers of Experience

┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│  TIER 1: "SET AND FORGET"            For: New agents, small          │
│  ────────────────────────            operators, non-technical        │
│  3 questions at setup.                                               │
│  Traffic light dashboard.            80% of agents live here.        │
│  WhatsApp/SMS alerts.                                                │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                                                                │  │
│  │  TIER 2: "DASHBOARD DRIVER"      For: Experienced agents,      │  │
│  │  ──────────────────────────      mid-size operations           │  │
│  │  Real-time risk dashboard.                                     │  │
│  │  Per-sport limits. Per-user      15% of agents grow into       │  │
│  │  management. One-click hedging.  this.                         │  │
│  │                                                                │  │
│  │  ┌──────────────────────────────────────────────────────────┐  │  │
│  │  │                                                          │  │  │
│  │  │  TIER 3: "MATRIX MASTER"     For: Sophisticated          │  │  │
│  │  │  ───────────────────────     operators, quant-minded     │  │  │
│  │  │  Full 5D matrix editor.                                  │  │  │
│  │  │  Test bet simulator.         5% of agents. These are     │  │  │
│  │  │  Historical P&L analysis.    your power users.           │  │  │
│  │  │  Custom period configs.                                  │  │  │
│  │  │                                                          │  │  │
│  │  └──────────────────────────────────────────────────────────┘  │  │
│  │                                                                │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

The critical design rule: Every tier uses the exact same engine underneath. A Tier 1 agent's "I want to be safe on cricket" translates to the same forwarding matrix, limits, and cascade logic that a Tier 3 agent configures by hand. The difference is who builds the configuration -- the agent or the system.

Tier 1: "Set and Forget" -- The 3-Question Onboarding

When a new agent joins, they answer three questions. That is it. The system generates their entire configuration from these answers.

Question 1: What do you trade?

┌──────────────────────────────────────────────────┐
│ What sports do you want to accept bets on? │
│ │
│ [✓] Cricket │
│ [ ] Football │
│ [ ] Tennis │
│ [ ] Kabaddi │
│ [ ] Other │
│ │
│ Sports you don't select will be automatically │
│ forwarded 100% to your upline. │
└──────────────────────────────────────────────────┘

Question 2: What is your nightly budget?

This is the question that matters most. It is framed not as a "liability limit" but as a question any bookie understands:

┌──────────────────────────────────────────────────┐
│ What is the MOST you are willing to lose in │
│ a single night? │
│ │
│ Think of your worst night ever. What amount │
│ would you be okay waking up to? │
│ │
│ ₹ [___________] │
│ │
│ Examples: │
│ Small operation: ₹2,00,000 (₹2 lakh) │
│ Medium operation: ₹10,00,000 (₹10 lakh) │
│ Large operation: ₹50,00,000 (₹50 lakh) │
└──────────────────────────────────────────────────┘

Question 3: How aggressive do you want to be?

┌──────────────────────────────────────────────────┐
│ How much risk do you want to keep? │
│ │
│ SAFE BALANCED AGGRESSIVE │
│ ◉────────────────────────────────────○ │
│ │
│ SAFE: Keep 30%, forward 70% │
│ "I sleep well, smaller profits" │
│ │
│ BALANCED: Keep 60%, forward 40% │
│ "Good mix of profit and safety" │
│ │
│ AGGRESSIVE: Keep 85%, forward 15% │
│ "Maximum profit, I can handle │
│ the swings" │
└──────────────────────────────────────────────────┘

What happens behind the scenes: From these three answers, the system generates:

Agent Answer | System Generates
"Cricket only" | Forwarding matrix: Cricket = agent's retention %, all other sports = 100% forward
"₹10 lakh max loss" | Night limit: ₹10L, Weekly limit: ₹50L (5x night), Per-match limit: ₹2L (night/5)
"Balanced" slider | Default forwarding: 40%. In-play: 55% forward (more cautious). Fancy markets: 70% forward. Sharp users: 95% forward. Pre-match match odds: 30% forward
(Automatic) | User win limits: Per-click ₹50,000, Aggregate ₹2,00,000/day. These are safe defaults.

The agent never sees the words "forwarding matrix," "liability limit," or "exposure ledger." They answered three questions and the system is configured.
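The mapping from the three answers to a configuration can be sketched as below. The ratios (weekly = 5x night, per-match = night / 5), the slider defaults, the Balanced overrides, and the user win limits come from the table. The SAFE and AGGRESSIVE presets only have a default forward percentage in this document, so the sketch leaves their overrides empty. Names and shapes are hypothetical.

```typescript
type Appetite = "SAFE" | "BALANCED" | "AGGRESSIVE";

interface OnboardingAnswers {
  sports: string[];       // Question 1: sports the agent retains risk on
  nightlyMaxLoss: number; // Question 2: rupees
  appetite: Appetite;     // Question 3: slider position
}

interface GeneratedConfig {
  retainedSports: string[];
  limits: { night: number; weekly: number; perMatch: number };
  defaultForwardPct: number;
  overrides: Record<string, number>; // hypothetical rule label -> forward %
  userWinLimits: { perClick: number; aggregatePerDay: number };
}

const DEFAULT_FORWARD: Record<Appetite, number> = { SAFE: 70, BALANCED: 40, AGGRESSIVE: 15 };

// Only the Balanced overrides are specified in this document.
const OVERRIDES: Record<Appetite, Record<string, number>> = {
  SAFE: {},
  BALANCED: { inPlay: 55, fancy: 70, sharpUsers: 95, preMatchMatchOdds: 30 },
  AGGRESSIVE: {},
};

function generateConfig(a: OnboardingAnswers): GeneratedConfig {
  return {
    retainedSports: [...a.sports],
    limits: {
      night: a.nightlyMaxLoss,
      weekly: a.nightlyMaxLoss * 5,               // 5x night
      perMatch: Math.floor(a.nightlyMaxLoss / 5), // night / 5
    },
    defaultForwardPct: DEFAULT_FORWARD[a.appetite],
    overrides: { ...OVERRIDES[a.appetite] },
    userWinLimits: { perClick: 50_000, aggregatePerDay: 200_000 }, // safe defaults
  };
}

// Sports the agent did not select are forwarded 100% to the upline.
function sportForwardPct(config: GeneratedConfig, sport: string): number {
  return config.retainedSports.includes(sport) ? config.defaultForwardPct : 100;
}
```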

The "Sleep Well" Number

Every agent, regardless of tier, sees one number prominently on their home screen:

┌──────────────────────────────────────────────────┐
│ │
│ YOUR MAXIMUM LOSS TONIGHT │
│ │
│ ₹ 3,42,000 │
│ │
│ out of your ₹10,00,000 night budget │
│ │
│ ███████░░░░░░░░░░░░░ 34% │
│ │
│ If every live bet goes against you tonight, │
│ this is the absolute most you will lose. │
│ The system guarantees this. │
│ │
└──────────────────────────────────────────────────┘

How it is calculated: Sum of worst-case liability across all retained open positions for this agent. This is not an estimate -- it is a mathematical guarantee. The cascade routing, limits, and NO_NEW_RISK mode ensure that this number can never exceed the agent's configured budget.

Why this matters: An agent who sees "₹3.42 lakh out of ₹10 lakh" knows they are safe. They can watch the match, enjoy the action, and not worry. When this number approaches their budget, the system automatically protects them (NO_NEW_RISK kicks in, overflow cascades to upline). The agent does not need to do anything.

For the really simple agent: This one number, delivered via WhatsApp at 10 PM every night, might be all they ever need:

"Tonight's update: Your maximum possible loss is ₹3.42L (34% of your ₹10L budget). 127 bets placed. Everything running smoothly."

Tier 1 Dashboard: The Traffic Light View

For agents who do not want numbers and charts, a traffic light is enough:

┌──────────────────────────────────────────────────┐
│ │
│ TONIGHT'S STATUS │
│ │
│ Cricket 🟢 All good. Well within limits. │
│ Football ⚪ Not active tonight. │
│ Tennis ⚪ Forwarding 100% (your choice). │
│ │
│ OVERALL 🟢 Safe. ₹3.4L / ₹10L used. │
│ │
│ LAST HOUR │
│ 42 bets accepted │
│ Estimated profit so far: +₹18,000 │
│ │
│ ⚠ 1 alert: Rahul's stake was reduced │
│ (he's close to his win limit) │
│ │
│ [View Details] [Panic: Stop Everything] │
│ │
└──────────────────────────────────────────────────┘

The traffic light meanings:

Color | Meaning | Agent Action Required
🟢 Green | Below 60% of all limits | None. Relax.
🟡 Yellow | Between 60% and 85% of any limit | Be aware. System is still accepting bets but approaching limits.
🔴 Red | Above 85% of any limit, or NO_NEW_RISK is active | System is protecting you. New risk bets are being forwarded. Hedges still accepted.
⚪ Grey | Sport not active or fully forwarded | Nothing happening here.
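The color rules in the table translate directly into a function. This is a sketch assuming each sport's utilization is reported as used/limit pairs and that "not active or fully forwarded" is known separately. Names are hypothetical. Boundary values (exactly 60% or 85%) are assigned to Yellow as an assumption.

```typescript
type Light = "GREEN" | "YELLOW" | "RED" | "GREY";

interface LimitUsage {
  used: number;  // retained liability in this scope
  limit: number;
}

// retainingRisk is false when the sport is inactive or forwarded 100%.
function trafficLight(retainingRisk: boolean, usages: LimitUsage[], noNewRisk: boolean): Light {
  if (!retainingRisk) return "GREY";
  if (noNewRisk) return "RED";
  // The most-utilized limit decides the color ("any limit" in the table).
  const worst = Math.max(0, ...usages.map((u) => u.used / u.limit));
  if (worst > 0.85) return "RED";
  if (worst >= 0.6) return "YELLOW"; // exactly 60% or 85% counts as Yellow here
  return "GREEN";
}
```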

The key design insight: A Tier 1 agent never needs to leave this screen. The system runs itself. The traffic light tells them if they should worry. The "Sleep Well" number tells them their maximum downside. The panic button is there if they ever feel nervous.

Tier 2 Dashboard: The Risk Cockpit

When an agent is ready for more detail, they tap "View Details" and enter the full dashboard. This is for agents who want to actively manage their book during a match.

Real-Time Risk Dashboard

+============================================================================+
| RAJESH'S DASHBOARD Feb 11, 2026 9:47 PM|
+============================================================================+
| |
| EXPOSURE SUMMARY |
| ┌─────────────────────────────────────────────────────────────────────┐ |
| │ Cricket ██████████████████████░░░░░░░░ ₹38.2L / ₹50L (76%) │ |
| │ Football ████░░░░░░░░░░░░░░░░░░░░░░░░░ ₹4.1L / ₹20L (21%) │ |
| │ Tennis ██░░░░░░░░░░░░░░░░░░░░░░░░░░░ ₹1.2L / ₹10L (12%) │ |
| └─────────────────────────────────────────────────────────────────────┘ |
| |
| TONIGHT'S SESSION (7:00 PM - 2:00 AM IST) |
| ┌─────────────────────────────────────────────────────────────────────┐ |
| │ Night Limit █████████████████████████░░░░ ₹8.4L / ₹10L (84%) │ |
| │ ⚠ APPROACHING LIMIT - 16% remaining │ |
| │ At current rate, limit reached in ~25 minutes │ |
| └─────────────────────────────────────────────────────────────────────┘ |
| |
| TOP MATCHES BY EXPOSURE |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ 1. MI vs CSK (Live) ₹4.8L retained ₹5L limit (96%) │ |
| │ ⚠ NEAR LIMIT - Will enter NO_NEW_RISK at ₹5L │ |
| │ │ |
| │ 2. RCB vs DC (Pre) ₹2.1L retained ₹5L limit (42%) │ |
| │ │ |
| │ 3. KKR vs SRH (Pre) ₹1.5L retained ₹5L limit (30%) │ |
| └──────────────────────────────────────────────────────────────────┘ |
| |
| RECENT BETS (last 10 minutes) |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ 9:45 PM Amit MI to win ₹10,000 Retained 60% ✓ │ |
| │ 9:43 PM Sonia CSK +1.5 ₹25,000 Retained 40% ✓ │ |
| │ 9:41 PM Rahul Fancy 180+ ₹50,000 Reduced to ₹12,000 ⚠ │ |
| │ 9:38 PM Deepa MI to win ₹8,000 Retained 60% ✓ │ |
| └──────────────────────────────────────────────────────────────────┘ |
| |
| [🔴 PANIC: Hedge All] [Adjust Limits] [View Full Audit] |
| |
+============================================================================+

Alert Priority Levels

Priority | Delivery | Trigger | Example
P1 - Critical | SMS + Push notification + Dashboard | Limit breached, NO_NEW_RISK activated, suspected fraud | "Your cricket night limit has been reached. NO_NEW_RISK is now active."
P2 - Warning | Push notification + Dashboard | Approaching limit (>80%), unusual betting pattern, sharp user detected | "MI vs CSK match exposure is at 96% of limit."
P3 - Informational | Dashboard only | Period rollover, settlement complete, configuration change | "Night period ended. Carried forward ₹3.2L to day period."

Key Reports

Report | Frequency | What It Shows
Daily P&L | Daily at period end | Profit/loss by sport, market, and user tier. Which bets made money, which lost money.
Weekly Settlement | Weekly on Monday | Net positions with upline, amounts owed/receivable, forwarded vs retained breakdown.
Sharp User Report | Weekly | Users flagged as sharp, their CLV scores, recommended actions.
Matrix Effectiveness | On demand | How well the forwarding matrix performed -- did high-retention bets profit or lose?
Limit Utilization | On demand | How close the agent came to each limit, how often NO_NEW_RISK was triggered, and its average duration.

The Panic Button

The dashboard includes a "Hedge All" button that, when pressed:

  1. Immediately sets the agent's forwarding to 100% for all sports and markets
  2. Places hedge orders on Betfair for all current retained positions (where exchange markets exist)
  3. Sends a notification to the agent's upline
  4. Logs the action with a timestamp and reason

This is the emergency exit. If an agent sees something alarming -- a suspicious pattern or a sudden exposure spike -- or simply gets nervous, one button brings their risk close to zero (positions without a matching exchange market remain retained). They can then calmly assess the situation and adjust.

Tier 3 Dashboard: The Matrix Master

For the 5% of agents who want full control, the system exposes everything -- but only when they explicitly navigate to it. This is never the default view.

The Matrix Editor:

Instead of a raw spreadsheet, the matrix editor uses a guided builder with immediate feedback:

┌──────────────────────────────────────────────────────────────────────┐
│ FORWARDING MATRIX EDITOR │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Rule 7 of 12 │ │
│ │ │ │
│ │ WHEN a bet matches ALL of these: │ │
│ │ Sport: [Cricket ▼] │ │
│ │ Market: [Fancy ▼] │ │
│ │ Phase: [In-Play ▼] │ │
│ │ User Type: [Any ▼] │ │
│ │ Liquidity: [Any ▼] │ │
│ │ │ │
│ │ THEN: │ │
│ │ Keep [30]% ◉──────────○ Forward [70]% │ │
│ │ │ │
│ │ LAST WEEK: This rule matched 234 bets (₹18.4L volume) │ │
│ │ RESULT: You would have profited ₹1.2L on retained portion │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ [+ Add Rule] [Test a Bet] [View Conflicts] │
│ │
└──────────────────────────────────────────────────────────────────────┘
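The editor's "Rule 7" and the simulator's "Specificity: 3/5" line suggest a most-specific-rule-wins lookup. A minimal Python sketch of that resolution follows. It assumes "Any" is stored as `None`, that specificity is the count of pinned dimensions, and that ties go to the lower rule number. `Rule` and `resolve_rule` are hypothetical names, not the production schema.

```python
from dataclasses import dataclass
from typing import List, Optional

DIMENSIONS = ("sport", "market", "phase", "user_type", "liquidity")


@dataclass(frozen=True)
class Rule:
    rule_id: int
    retain_pct: int                   # Keep X%; the rest is forwarded
    sport: Optional[str] = None       # None stands for "Any"
    market: Optional[str] = None
    phase: Optional[str] = None
    user_type: Optional[str] = None
    liquidity: Optional[str] = None

    def matches(self, bet: dict) -> bool:
        # Every pinned dimension must equal the bet's value; "Any" matches all.
        return all(getattr(self, d) in (None, bet.get(d)) for d in DIMENSIONS)

    @property
    def specificity(self) -> int:
        # How many of the five dimensions this rule pins down.
        return sum(getattr(self, d) is not None for d in DIMENSIONS)


def resolve_rule(rules: List[Rule], bet: dict) -> Optional[Rule]:
    """Most specific matching rule wins; ties go to the lower rule number."""
    matching = [r for r in rules if r.matches(bet)]
    if not matching:
        return None  # caller falls back to the agent default
    return min(matching, key=lambda r: (-r.specificity, r.rule_id))


rules = [
    Rule(rule_id=2, retain_pct=60, sport="cricket"),
    Rule(rule_id=7, retain_pct=30, sport="cricket", market="fancy", phase="in_play"),
]
bet = {"sport": "cricket", "market": "fancy", "phase": "in_play",
       "user_type": "NORMAL", "liquidity": "high"}
best = resolve_rule(rules, bet)
print(best.rule_id, best.specificity, best.retain_pct)  # 7 3 30
```

The "[View Conflicts]" button would naturally flag rules that tie on specificity for the same bet, since only the tie-break decides between them.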

The Test Bet Simulator:

This is the single most important tool for a Tier 3 agent. Before any real money flows, they can run a hypothetical bet through the exact pipeline a real bet would follow:

┌──────────────────────────────────────────────────────────────────────┐
│ TEST A BET │
│ │
│ Sport: Cricket Market: Fancy (Session Runs) │
│ Phase: In-Play User: Amit (NORMAL) │
│ Odds: 2.50 Stake: ₹25,000 │
│ │
│ [Run Test] │
│ │
│ RESULT: │
│ ───────────────────────────────────────────────────── │
│ Step 1: User win cap check │
│ Potential win: ₹37,500. Limit: ₹50,000. PASS │
│ │
│ Step 2: Forwarding resolution │
│ Rule 7 matched (Cricket + Fancy + In-Play) │
│ Specificity: 3/5. No higher-priority rule found. │
│ Forward: 70%. Retain: 30%. │
│ │
│ Step 3: Your retention │
│ Retained stake: ₹7,500 │
│ Retained liability: ₹11,250 │
│ Your cricket limit after: ₹18.7L / ₹50L (37%) │
│ │
│ Step 4: Cascade to upline (Vikram) │
│ Forwarded: ₹17,500 → Vikram retains 60% → Platform → BF │
│ │
│ This is exactly what would happen with a real bet. │
└──────────────────────────────────────────────────────────────────────┘
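The simulator's numbers follow from decimal-odds arithmetic. A short Python sketch reproduces Steps 1, 3 and 4 of the output above. It assumes potential win = stake x (odds - 1), with retained liability computed the same way on the retained stake, and `simulate_bet` and its parameters are hypothetical. `Decimal` keeps rupee amounts exact.

```python
from decimal import Decimal


def simulate_bet(stake: Decimal, odds: Decimal, win_cap: Decimal,
                 forward_pct: int, upline_retain_pct: int) -> dict:
    # Step 1: user win cap check (decimal odds: win = stake x (odds - 1)).
    potential_win = stake * (odds - 1)
    if potential_win > win_cap:
        raise ValueError("win cap exceeded; stake reduction would apply first")
    # Step 3: the agent keeps (100 - forward_pct)% of the stake.
    retained_stake = stake * (100 - forward_pct) / 100
    # Step 4: the rest cascades to the upline, which keeps its own share.
    forwarded = stake - retained_stake
    return {
        "potential_win": potential_win,
        "retained_stake": retained_stake,
        "retained_liability": retained_stake * (odds - 1),
        "forwarded": forwarded,
        "upline_retained": forwarded * upline_retain_pct / 100,
    }


result = simulate_bet(Decimal("25000"), Decimal("2.50"), Decimal("50000"),
                      forward_pct=70, upline_retain_pct=60)
for key, value in result.items():
    print(f"{key}: {value:,.0f}")
```

The first four lines print 37,500, 7,500, 11,250 and 17,500, matching the simulator; the last line shows Vikram keeping ₹10,500 of the forwarded ₹17,500.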

Graduating Between Tiers

Agents are never locked into a tier. The system nudges them upward when they are ready.

Tier 1 → Tier 2 nudge: After 2 weeks of operation, if the agent's dashboard shows they are consistently hitting limits or forwarding more than they want, the system suggests: "You are forwarding 45% of cricket bets. Want to adjust per-sport settings? [Show me how]"

Tier 2 → Tier 3 nudge: After the agent has manually adjusted limits 5+ times, the system suggests: "You keep changing your cricket in-play settings. A forwarding matrix rule could automate this. [Set up a rule]"

Tier 3 → Tier 1 fallback: If a Tier 3 agent creates a matrix that is clearly dangerous (e.g., 100% retention on in-play fancies), the system warns: "This configuration would have lost ₹8.2L last month. Are you sure? [Keep my settings] [Switch to Balanced preset]"

Preset Profiles: One-Click Configuration

For agents who want more control than the 3-question onboarding offers, but less than a full matrix:

| Preset | What It Does | Who It Is For |
|---|---|---|
| Conservative Cricket | 30% retention on match odds, 15% on fancies, 0% on in-play fancies. Night limit = budget x 0.5. Sharp users = 100% forward. | New agents, small bankroll, risk-averse |
| Balanced Cricket | 60% retention on match odds, 30% on fancies, 20% on in-play fancies. Night limit = budget. | Experienced agents, moderate bankroll |
| Aggressive IPL | 80% retention on match odds, 50% on fancies, 30% on in-play fancies. Night limit = budget x 1.5 (with weekly safety net). | Large bankroll, IPL specialists |
| Football Only | 60% retention Premier League, 30% lower leagues, 0% tennis/cricket. | Football-focused agents |
| Forward Everything | 0% retention across all sports. Agent earns commission on volume only. | Agents who want zero risk, commission-only model |

Each preset is a fully configured forwarding matrix + limits + user win caps. The agent can select a preset, see exactly what it configures, and customize individual values if they want. The preset is the starting point, not a cage.

WhatsApp and SMS: Meeting Agents Where They Are

Many agents in India and Southeast Asia run their operations primarily through WhatsApp. A dashboard they never open is useless. The system must push critical information to them through channels they already use.

Scheduled Messages:

| Time | Message |
|---|---|
| Start of night session | "Good evening. Your night budget: ₹10L. Current exposure: ₹0. System is ready." |
| Every 2 hours during session | "Update: ₹3.4L used (34%). 89 bets tonight. Estimated profit: +₹24,000. All green." |
| When yellow threshold hit | "Heads up: Cricket exposure at 72% of limit. System still accepting bets. No action needed unless you want to adjust." |
| When red threshold hit | "Your cricket night limit has been reached. System is now forwarding new cricket bets to your upline. Hedge bets still accepted. You are protected." |
| End of night session | "Night summary: 214 bets. Net result: +₹1,85,000. Forwarded ₹12.4L to Vikram. Maximum loss was ₹4.1L (41% of budget). Settlement pending." |

Interactive Commands (via WhatsApp chatbot):

| Agent types (command) | Response |
|---|---|
| "status" | Current exposure, limits, traffic light color |
| "stop cricket" | Sets cricket forwarding to 100%. Confirmation: "Cricket bets now forwarding 100%. You retain zero risk." |
| "resume cricket" | Restores previous cricket settings. |
| "panic" | Triggers the hedge-all panic button. "All positions being hedged. Forwarding set to 100%. You are safe." |
| "limit 15L" | Updates night budget to ₹15L. Confirmation with new max-loss number. |
| "sharp amit" | Tags user Amit as sharp. "Amit's bets will now be forwarded 95%. Confirm?" |

This means an agent sitting in a chai shop watching the match on TV can manage their entire book through WhatsApp without ever opening a dashboard.
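The chatbot only needs a small grammar. A minimal Python sketch of a parser for the commands in the table follows. It assumes "L" means lakh (₹1,00,000) and that each command maps to an internal action name. The action names (`HEDGE_ALL`, `TAG_SHARP`, and so on) are hypothetical, and the confirmation step for "sharp" is left to the caller.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

LAKH = 100_000
AMOUNT = re.compile(r"(\d+(?:\.\d+)?)(l?)")   # "15l", "2.5l", "1500000"


def parse_command(text: str) -> Optional[Tuple]:
    """Map an agent's WhatsApp message to an (ACTION, args...) tuple, or None."""
    words = text.strip().lower().split()
    if not words:
        return None
    cmd, args = words[0], words[1:]
    if cmd == "status" and not args:
        return ("STATUS",)
    if cmd == "panic" and not args:
        return ("HEDGE_ALL",)                     # same path as the dashboard button
    if cmd in ("stop", "resume") and len(args) == 1:
        action = "FORWARD_ALL" if cmd == "stop" else "RESTORE_PREVIOUS"
        return (action, args[0])                  # args[0] is the sport, e.g. "cricket"
    if cmd == "limit" and len(args) == 1:
        match = AMOUNT.fullmatch(args[0])
        if match:
            amount = Decimal(match.group(1)) * (LAKH if match.group(2) else 1)
            return ("SET_NIGHT_BUDGET", int(amount))
    if cmd == "sharp" and len(args) == 1:
        return ("TAG_SHARP", args[0])             # caller asks "Confirm?" before applying
    return None                                   # unknown: reply with a help message


print(parse_command("limit 15L"))      # ('SET_NIGHT_BUDGET', 1500000)
print(parse_command("Stop Cricket"))   # ('FORWARD_ALL', 'cricket')
```

Returning `None` for anything unrecognised, rather than guessing, keeps a mistyped command from changing risk settings.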

The Principle: Complexity Is Available, Never Required

The entire UX philosophy can be summarized in one sentence: the system should work perfectly for an agent who never touches a single setting after onboarding, AND it should give full control to an agent who wants to tune every parameter.

The 3-question onboarding generates safe, profitable defaults. The traffic light tells the agent if anything needs attention. The "Sleep Well" number gives them peace of mind. WhatsApp keeps them informed. The full dashboard is there when they want it. The matrix editor is there when they are ready for it.

No agent should ever feel overwhelmed by the system. And no agent should ever feel limited by it.


15. Performance Architecture

Latency Budget

The entire bet processing path must complete within 90 milliseconds on the synchronous path. The user experience starts to degrade at roughly 200ms, so this target leaves 110ms of headroom.

| Step | Budget | Description |
|---|---|---|
| Request parsing & validation | 5ms | Parse the incoming bet request, validate format |
| Metrics computation | 3ms | Calculate potential win, liability |
| User win cap check | 5ms | Check per-click and aggregate limits |
| Stake reduction (if needed) | 2ms | Calculate reduced stake |
| Matrix resolution | 10ms | Look up the forwarding percentage |
| Agent cap evaluation (per level) | 10ms x N levels | Check limits at each cascade level (typically 2-3 levels) |
| Position creation | 15ms | Write positions to database |
| Exposure ledger update | 10ms | Update all affected ledgers atomically |
| Audit record creation | 10ms | Persist the audit trail |
| Response | 5ms | Return confirmation to the punter |
| Total (2-level cascade) | ~85ms | Within the 90ms budget |
| Total (3-level cascade) | ~95ms | About 5ms over the 90ms target; still well inside the ~200ms degradation point |
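Summing the per-step budgets makes the dependence on cascade depth explicit: the fixed steps add up to 65ms, and each level of agent cap evaluation adds another 10ms. The short Python check below takes the step values from the table; treating the degradation point as 90ms + 110ms of headroom is an inference from the paragraph above, not a separately measured number.

```python
# Per-step budgets from the latency table, in milliseconds.
FIXED_STEPS_MS = {
    "request_parsing_validation": 5,
    "metrics_computation": 3,
    "user_win_cap_check": 5,
    "stake_reduction": 2,
    "matrix_resolution": 10,
    "position_creation": 15,
    "exposure_ledger_update": 10,
    "audit_record_creation": 10,
    "response": 5,
}
AGENT_CAP_CHECK_MS = 10              # paid once per cascade level
TARGET_MS = 90
CEILING_MS = TARGET_MS + 110         # where the user experience starts to degrade


def total_latency_ms(levels: int) -> int:
    """Fixed steps plus one agent cap evaluation per cascade level."""
    return sum(FIXED_STEPS_MS.values()) + AGENT_CAP_CHECK_MS * levels


for levels in (2, 3):
    total = total_latency_ms(levels)
    print(levels, total, total <= TARGET_MS, total <= CEILING_MS)
# 2 85 True True
# 3 95 False True
```

A fourth cascade level would reach 105ms, so deeper hierarchies need either a faster cap check or a relaxed target.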

Memory Architecture: 3-Tier

| Tier | Technology | TTL | Hit Rate | Use Case |
|---|---|---|---|---|
| Tier 1 | Application LRU Cache | 5 seconds | ~60% | Exposure far from limits, matrix lookups, agent config |
| Tier 2 | Redis | Until invalidated | ~25% | Exposure checks, active period boundaries, NO_NEW_RISK flags |
| Tier 3 | PostgreSQL | Persistent | ~15% | Near-limit exposure writes (FOR UPDATE), position creation, audit records |

Burst Traffic Handling: IPL Final Scenario

During the IPL final, bet volume can spike to 10,000 bets per minute (about 167 per second). Here is how the system handles it:

| Challenge | Solution |
|---|---|
| Database write contention | Sharded exposure counters -- instead of one row per agent per sport, use N shards. Each bet updates a random shard. Reads sum all shards. |
| Matrix lookup speed | Pre-computed matrix resolution cache. When an agent's matrix changes, all possible resolution paths are pre-computed and cached. During the match, lookups are O(1) hash lookups. |
| Audit trail write volume | Batch audit writes. Audit records are buffered in memory and flushed every 500ms. In a crash, up to 500ms of audit records may be lost (positions are never lost because they use the synchronous path). |
| Redis connection pool | Dedicated Redis connection pool for exposure checks, separate from general-purpose cache. |

Sharded Exposure Counters

For agents receiving high bet volume, a single exposure counter row becomes a bottleneck because every bet needs to lock it.

Solution: instead of one counter, use 8 shards:

RAJESH CRICKET EXPOSURE (SHARDED)
===================================
Shard 0: ₹4,75,000
Shard 1: ₹5,12,000
Shard 2: ₹4,88,000
Shard 3: ₹5,01,000
Shard 4: ₹4,95,000
Shard 5: ₹5,23,000
Shard 6: ₹4,67,000
Shard 7: ₹4,89,000
-----------------------------------
Total: ₹39,50,000 / ₹50,00,000

Each incoming bet randomly picks a shard and only locks that shard. Contention drops by 8x. Reads sum all shards (slightly slower but still sub-millisecond from Redis).
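A minimal in-process Python sketch of the sharding idea follows, with one lock per shard standing in for per-row database locks. `ShardedCounter` is a hypothetical name, and the real design keeps shards in PostgreSQL/Redis rather than in application memory. The sketch also makes the trade-off visible: no single writer sees the total, so a near-limit check still has to read every shard.

```python
import random
import threading
from typing import Sequence


class ShardedCounter:
    """Exposure counter split into shards so concurrent writers rarely share a lock."""

    def __init__(self, initial: Sequence[int]):
        self._values = list(initial)
        self._locks = [threading.Lock() for _ in self._values]

    def add(self, amount: int) -> None:
        # Pick a random shard and lock only that shard.
        i = random.randrange(len(self._values))
        with self._locks[i]:
            self._values[i] += amount

    def total(self) -> int:
        # Read path: sum every shard (may trail in-flight writes slightly).
        return sum(self._values)


rajesh_cricket = ShardedCounter([475_000, 512_000, 488_000, 501_000,
                                 495_000, 523_000, 467_000, 489_000])
print(rajesh_cricket.total())   # 3950000  (₹39,50,000 of the ₹50,00,000 limit)
rajesh_cricket.add(11_250)
print(rajesh_cricket.total())   # 3961250
```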

What's Cached Where and for How Long

| Data | Cache Location | TTL | Invalidation |
|---|---|---|---|
| Agent forwarding matrix | Application LRU + Redis | 5 min (LRU) / until change (Redis) | On matrix update, invalidate immediately |
| Agent limits configuration | Application LRU + Redis | 5 min (LRU) / until change (Redis) | On limit update, invalidate immediately |
| Current exposure (far from limit) | Application LRU | 5 seconds | Time-based expiry |
| Current exposure (near limit) | Not cached | -- | Always read from PostgreSQL with lock |
| User win cap state | Redis | Until period reset | On bet placement (update), on period reset (clear) |
| NO_NEW_RISK flags | Redis | Until cleared | On limit breach (set); on settlement, void, or limit change (clear) |
| Period boundaries | Application LRU | 1 hour | On config change |
| Sharp user flags | Redis | 1 hour | On detection service update |
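The two exposure rows above (cached for 5 seconds when far from the limit, never cached near it) amount to a read-through cache with a bypass. A minimal Python sketch follows. The 80% "near limit" threshold is an assumption borrowed from the P2 warning level, `load_locked` is a hypothetical stand-in for a locked PostgreSQL read, and a plain dict replaces a size-bounded LRU for brevity.

```python
import time
from typing import Callable, Dict, Tuple

NEAR_LIMIT_RATIO = 0.8   # assumption: reuse the 80% warning threshold


class ExposureReader:
    """Read-through exposure cache that refuses to serve near-limit values."""

    def __init__(self, load_locked: Callable[[str], int], ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load_locked = load_locked   # stands in for SELECT ... FOR UPDATE
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache: Dict[str, Tuple[int, float]] = {}

    def read(self, key: str, limit: int) -> int:
        hit = self._cache.get(key)
        if hit is not None:
            value, stored_at = hit
            fresh = self._clock() - stored_at < self._ttl_s
            if fresh and value < NEAR_LIMIT_RATIO * limit:
                return value                  # far from the limit: cached value is safe
        value = self._load_locked(key)        # miss, expired, or near the limit
        if value < NEAR_LIMIT_RATIO * limit:
            self._cache[key] = (value, self._clock())
        else:
            self._cache.pop(key, None)        # near-limit exposure is never cached
        return value

    def invalidate(self, key: str) -> None:
        # Explicit invalidation, e.g. after a void commits.
        self._cache.pop(key, None)
```

Re-checking the limit on every cache hit means a lowered limit takes effect on the next read, even for a value that was cached while it was still far from the old limit.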

16. Competitive Landscape

| Feature | Betfair | bet365 | Pinnacle | Asian Books | Hannibal (Target) |
|---|---|---|---|---|---|
| Risk Model | Pure exchange (no risk) | B-Book + A-Book hybrid | Sharp-friendly B-Book | Primarily B-Book | Hierarchical B-Book with automated routing |
| Agent Hierarchy | None (B2C only) | None (B2C only) | None (B2C only) | Manual, phone-based | Automated, N-level, with cascading |
| Forwarding Logic | N/A | Proprietary, opaque | N/A | Manual negotiation | Configurable multi-dimensional matrix |
| Limit Management | Market-based liquidity | Per-user, opaque | Minimal (welcomes sharps) | Per-user, manual | Per-agent, per-sport, per-market, per-period |
| Audit Trail | Exchange provides full transparency | Minimal for agents | Basic | None | Complete, replayable, deterministic |
| Sharp Handling | Exchange market handles it | Restrict accounts aggressively | Welcome and manage | Restrict and forward | Configurable per-user forwarding |
| Hedge Options | IS the exchange | Internal + Betfair | Internal models | Betfair + internal | Betfair + multi-exchange (planned) |
| Target Market | Developed (UK, AU) | Global B2C | Global B2C (niche) | Asia, manual agent networks | Agent networks: India, SE Asia, Africa |

What to Learn from Each

| Competitor | Lesson for Hannibal |
|---|---|
| Betfair | The exchange model provides perfect transparency. Hannibal's audit trail should aspire to Betfair-level transparency within its B-Book model. |
| bet365 | Their risk management is world-class but opaque. Agents hate opacity. Hannibal should match their sophistication while providing the transparency agents demand. |
| Pinnacle | Their sharp-friendly model proves you can profit even from sharp users if you manage margins correctly. Hannibal's forwarding matrix should allow agents to choose their sharp tolerance. |
| Asian Books | They understand the agent hierarchy model deeply but use manual processes. Hannibal automates what they do by hand. |

17. Phased Rollout Plan

Phase 1: Agent Risk Controls (Weeks 1-4)

Goal: Every agent has enforceable limits and real-time exposure tracking.

| Week | Deliverable |
|---|---|
| 1-2 | Per-agent limit configuration (sport, market, period) + database models |
| 2-3 | Real-time exposure tracking with Redis fast-path |
| 3-4 | NO_NEW_RISK mode (automatic trigger + manual override) |
| 4 | Per-click user win limits + stake reduction |

Why this first: Limits and exposure tracking are independent of the forwarding matrix. They provide immediate value by protecting agents from over-exposure. Even without smart routing, agents get safety.

Success metric: Zero incidents where an agent exceeds their configured limit.

Phase 2: Smart Forwarding (Weeks 5-10)

Goal: Bets are routed through the hierarchy based on configurable rules.

| Week | Deliverable |
|---|---|
| 5-6 | Forwarding matrix data model + basic resolution (market_type + sport_type) |
| 6-7 | Precedence chain (user override > market override > matrix > default) |
| 7-8 | Cascading upline routing (N-level) |
| 8-9 | Aggregate user win limits + period definitions (night/weekly) |
| 9-10 | Integration testing, overflow scenarios, suspended agent handling |

Why this second: This is the core value proposition. But it depends on the limits and exposure tracking from Phase 1.

Success metric: Bets correctly routed through 3+ levels with deterministic, auditable decisions.

Phase 3: Intelligence & Polish (Weeks 11-16)

Goal: Full 5-dimensional matrix, advanced detection, and complete audit trail.

| Week | Deliverable |
|---|---|
| 11-12 | Full 5D matrix (add event_phase, source_type, liquidity_band) |
| 12-13 | Advanced hedge detection (multi-outcome worst-case analysis) |
| 13-14 | Complete structured audit trail with replay capability |
| 14-15 | Cross-agent sharp detection and syndicate detection |
| 15-16 | Agent dashboard v2 with all reports and the panic button |

Why this third: The 5D matrix and advanced detection are refinements. The 2D matrix from Phase 2 handles 80% of cases. Phase 3 handles the remaining 20%.

Success metric: Audit trail passes third-party review. Sharp detection flags known sharp users within 100 bets.

Phase 4: Scale & Optimize (Weeks 17+)

Goal: Handle peak traffic, add intelligence, expand hedge options.

| Deliverable | Description |
|---|---|
| ML-based odds adjustment | Use historical data to adjust odds before they reach the punter |
| Multi-exchange hedging | Hedge on Betfair, Smarkets, Betdaq, and local exchanges |
| Mobile dashboard | Full agent dashboard on mobile devices |
| Sharded exposure counters | Handle 10,000+ bets/minute during IPL final |
| Auto-matrix optimization | Suggest matrix changes based on historical P&L |

18. Revenue Model

Four Revenue Streams

| Stream | Description | Example |
|---|---|---|
| Transaction Fees | A small percentage of every bet processed through the platform | 1-2% of stake on every bet |
| Retained Risk Profit | The platform retains a portion of bets (at the top of the cascade). On average, the bookie has an edge, so retained risk is profitable over time. | Platform retains ₹800 of a ₹10,000 bet. Over thousands of bets, the edge produces ~5% margin. |
| Betfair Arbitrage Spread | When hedging on Betfair, the platform can capture a spread between the price offered to the punter and the price available on the exchange. | Punter gets odds of 1.85; Betfair offers 1.90. Backing about ₹9,737 on Betfair against a ₹10,000 punter bet covers the ₹18,500 payout and locks in about ₹263 whichever side wins (1 - 1.85/1.90, roughly 2.6% of stake), before Betfair commission. |
| Data Intelligence | Aggregate anonymized betting data has value for odds compilation, market making, and risk modeling. | Subscription service for odds providers and analytics firms. |
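Hedge-spread arithmetic is easy to get wrong, so here is a minimal Python check. It assumes the platform backs the punter's selection on the exchange with a stake sized so the exchange return covers the punter's payout, and it ignores exchange commission. `equalised_hedge_stake` and `platform_pnl` are hypothetical helpers.

```python
from decimal import Decimal

PAISA = Decimal("0.01")


def equalised_hedge_stake(stake: Decimal, punter_odds: Decimal,
                          exchange_odds: Decimal) -> Decimal:
    # Back the punter's selection so that exchange return == punter payout.
    return (stake * punter_odds / exchange_odds).quantize(PAISA)


def platform_pnl(stake: Decimal, punter_odds: Decimal, hedge_stake: Decimal,
                 exchange_odds: Decimal, selection_wins: bool) -> Decimal:
    pnl = stake - hedge_stake            # keep punter's stake, pay exchange stake
    if selection_wins:
        # Collect the exchange return, pay the punter's return.
        pnl += hedge_stake * exchange_odds - stake * punter_odds
    return pnl


stake, punter_odds, exchange_odds = Decimal("10000"), Decimal("1.85"), Decimal("1.90")
hedge = equalised_hedge_stake(stake, punter_odds, exchange_odds)
win = platform_pnl(stake, punter_odds, hedge, exchange_odds, True)
lose = platform_pnl(stake, punter_odds, hedge, exchange_odds, False)
print(hedge, win, lose)   # 9736.84 263.1560 263.16
```

Both outcomes land at about ₹263 on a ₹10,000 bet, so the locked-in margin is 1 - 1.85/1.90, roughly 2.6% of stake, not the full 0.05 odds gap per rupee.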

Financial Modeling Example

Consider a moderately busy day on Hannibal:

DAILY FINANCIAL MODEL
=====================

Total Stake Processed: ₹5,00,00,000 (₹5 crore)

Revenue Stream Breakdown:
-----------------------------------------------------------------
Transaction Fees (1.5%): ₹7,50,000
→ All bets, regardless of routing

Platform Retained Risk: ₹50,00,000 stake retained
→ 5% edge over time: ₹2,50,000

Betfair Hedge Spread: ₹1,00,00,000 hedged
→ Average 3% of hedged stake captured: ₹3,00,000

Data Intelligence: ₹50,000 (amortized daily)
-----------------------------------------------------------------
TOTAL DAILY REVENUE: ₹13,50,000

ANNUAL PROJECTION: ~₹49.3 crore
(₹13,50,000 x 365 operating days, before any growth)
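The daily total can be reproduced line by line. A minimal Python sketch follows, taking the rates straight from the model and reading the hedge line as 3% of hedged stake, which is the reading that yields ₹3,00,000. `daily_revenue` is a hypothetical name.

```python
from decimal import Decimal


def daily_revenue(total_stake: Decimal, fee_rate: Decimal,
                  retained_stake: Decimal, edge: Decimal,
                  hedged_stake: Decimal, hedge_capture: Decimal,
                  data_income: Decimal) -> dict:
    lines = {
        "transaction_fees": total_stake * fee_rate,
        "retained_risk": retained_stake * edge,
        "hedge_spread": hedged_stake * hedge_capture,
        "data_intelligence": data_income,
    }
    lines["total"] = sum(lines.values())
    return lines


day = daily_revenue(
    total_stake=Decimal(5_00_00_000), fee_rate=Decimal("0.015"),
    retained_stake=Decimal(50_00_000), edge=Decimal("0.05"),
    hedged_stake=Decimal(1_00_00_000), hedge_capture=Decimal("0.03"),
    data_income=Decimal(50_000),
)
annual_crore = day["total"] * 365 / Decimal(1_00_00_000)   # 1 crore = 10^7
print(day["total"], annual_crore)
```

The total comes to ₹13,50,000 per day and about ₹49.3 crore per year at a flat daily rate; any growth assumption sits on top of that.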

The Key Insight

Hannibal is an operating system for bookmakers, not a bookmaker itself.

This distinction is critical. A bookmaker takes risk and profits (or loses) from betting outcomes. Hannibal provides the infrastructure that enables agents to take risk efficiently. Like an operating system, it earns from:

  • Providing the platform (transaction fees)
  • Running a small retained book at the top of the cascade (retained risk)
  • Facilitating exchange access (hedge spread)
  • Generating intelligence from aggregate data

This means Hannibal's revenue is diversified and largely non-directional. A bad day for bookies (punters win big) still generates transaction fees. A good day for bookies generates both fees and retained risk profit. The operating system always earns.


19. Implementation Order (for Developers)

The Guiding Principle

The single most important architectural shift: today a bet is routed to exactly 1 destination; in the target design it is split across N destinations. Every implementation step moves incrementally toward this goal.

Step-by-Step Order

Step 1: Data Models First

Add all new Prisma models to the schema without changing any behavior. This is the safest possible first step -- it is purely additive. New tables for: forwarding matrix rules, agent limits, exposure ledgers, period definitions, audit records, and user win caps.

No existing behavior changes. No existing tests break. The database migration is backward-compatible.

Step 2: Audit Trail Second

Implement the audit record creation for every bet, even before forwarding logic changes. This provides immediate value: every bet now has a complete decision record. It also serves as an early warning system when we start changing routing behavior -- we can compare audit trails before and after.

Step 3: User Win Limits Third

Implement per-click win limits and stake reduction. This is independent of the agent hierarchy and forwarding logic. It sits at the very beginning of the bet flow (before routing decisions). It can be tested and deployed in isolation.
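Stake reduction is an inversion of the win formula. A minimal Python sketch, assuming decimal odds, a per-click cap on potential win, and rounding the reduced stake down to whole rupees so the cap is never exceeded; `reduce_stake` is a hypothetical name.

```python
from decimal import Decimal, ROUND_DOWN


def reduce_stake(stake: Decimal, odds: Decimal, per_click_win_cap: Decimal) -> Decimal:
    """Return the stake to accept so that stake x (odds - 1) stays within the cap."""
    if odds <= 1:
        raise ValueError("decimal odds must be greater than 1")
    if stake * (odds - 1) <= per_click_win_cap:
        return stake                                   # within cap: accept as placed
    # Largest whole-rupee stake whose potential win does not exceed the cap.
    return (per_click_win_cap / (odds - 1)).quantize(Decimal("1"), rounding=ROUND_DOWN)


print(reduce_stake(Decimal("25000"), Decimal("2.50"), Decimal("50000")))  # 25000
print(reduce_stake(Decimal("50000"), Decimal("2.90"), Decimal("50000")))  # 26315
```

Rounding down rather than to nearest matters: 26,316 at 2.90 would pay ₹50,000.40 and breach the cap.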

Step 4: Forwarding Precedence Chain Fourth

Implement the resolution logic: user override, then market override, then matrix lookup, then agent default. Initially, the matrix will be simple (2 dimensions: market_type and sport_type). The cascade still goes to a single destination, but the forwarding percentage is now determined by the precedence chain instead of a flat configuration.
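A minimal Python sketch of the precedence chain with the initial 2D matrix keyed by (market_type, sport_type) follows. Returning the source alongside the percentage lets the audit record say why a split was chosen. The function, argument names and example identifiers are hypothetical, and all percentages are forwarding percentages.

```python
from typing import Mapping, Tuple


def resolve_forward_pct(
    user_id: str,
    market_id: str,
    market_type: str,
    sport_type: str,
    user_overrides: Mapping[str, int],
    market_overrides: Mapping[str, int],
    matrix: Mapping[Tuple[str, str], int],
    agent_default: int,
) -> Tuple[int, str]:
    """Return (forward %, source), checking each layer in precedence order."""
    if user_id in user_overrides:
        return user_overrides[user_id], "USER_OVERRIDE"
    if market_id in market_overrides:
        return market_overrides[market_id], "MARKET_OVERRIDE"
    if (market_type, sport_type) in matrix:
        return matrix[(market_type, sport_type)], "MATRIX"
    return agent_default, "AGENT_DEFAULT"


rajesh = dict(
    user_overrides={"amit": 95},                 # e.g. after "sharp amit"
    market_overrides={"mi_vs_csk_fancy_180": 100},
    matrix={("match_odds", "cricket"): 40, ("fancy", "cricket"): 70},
    agent_default=50,
)
print(resolve_forward_pct("sonia", "mi_vs_csk_mo", "match_odds", "cricket", **rajesh))
# (40, 'MATRIX')
```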

Step 5: Cascading Routing Fifth

This is the big structural change. A bet that previously went to one destination now flows through the full agent hierarchy. Each level resolves its own forwarding percentage, checks its own limits, and forwards the remainder.

This must be implemented behind a feature flag so it can be enabled per agent. Early adopters test it while others continue with the existing behavior.
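A minimal Python sketch of the cascade follows. It assumes each level has already resolved its retention percentage and knows its remaining liability headroom. Any level that cannot hold its full share passes the overflow upward, and whatever reaches the top goes to a backstop (Betfair here). `Level` and `cascade` are hypothetical names. With 60% retention at both Rajesh and Vikram and the Platform keeping the rest, a ₹10,000 bet reproduces the 60/24/16 split used elsewhere in this document.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple


@dataclass
class Level:
    name: str
    retain_pct: int        # resolved at this level via its precedence chain
    headroom: Decimal      # remaining liability this level may still take on


def cascade(stake: Decimal, odds: Decimal, levels: List[Level],
            backstop: str = "Betfair") -> List[Tuple[str, Decimal]]:
    """Split a stake up the hierarchy; return (destination, retained stake) pairs."""
    splits: List[Tuple[str, Decimal]] = []
    remaining = stake
    for level in levels:
        wanted = remaining * level.retain_pct / 100
        # Cap so retained liability, stake x (odds - 1), fits the headroom.
        allowed = max(Decimal(0), level.headroom / (odds - 1))
        kept = min(wanted, allowed)
        if kept > 0:
            splits.append((level.name, kept))
            level.headroom -= kept * (odds - 1)
        remaining -= kept                  # overflow continues upward
    if remaining > 0:
        splits.append((backstop, remaining))
    return splits


chain = [
    Level("Rajesh", 60, Decimal("500000")),
    Level("Vikram", 60, Decimal("2000000")),
    Level("Platform", 100, Decimal("10000000")),
]
print(cascade(Decimal("10000"), Decimal("2.00"), chain))
```

Rajesh keeps ₹6,000, Vikram ₹2,400 and the Platform ₹1,600. If Rajesh had only ₹3,000 of headroom, he would keep ₹3,000 and the overflow would shift Vikram to ₹4,200 and the Platform to ₹2,800.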

Step 6: NO_NEW_RISK and Hedge Detection Sixth

Implement automatic NO_NEW_RISK triggering and hedge detection. This depends on the exposure ledgers from Step 5 being accurate, which is why it comes after cascading routing.
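Hedge detection can be framed as one question: does this bet leave the worst case no worse than before? A minimal Python sketch of that multi-outcome worst-case test follows. It assumes the agent's book is stored as P&L per outcome and that a hedge is any bet that does not lower the minimum across outcomes. `apply_bet` and `is_hedge` are hypothetical names. A check of this kind is what lets NO_NEW_RISK keep accepting hedge bets.

```python
from decimal import Decimal
from typing import Dict

Book = Dict[str, Decimal]   # outcome -> agent P&L if that outcome happens


def apply_bet(book: Book, outcome: str, stake: Decimal, odds: Decimal) -> Book:
    """Agent lays `outcome`: pays stake x (odds - 1) if it wins, keeps stake otherwise."""
    return {
        o: pnl - stake * (odds - 1) if o == outcome else pnl + stake
        for o, pnl in book.items()
    }


def is_hedge(book: Book, outcome: str, stake: Decimal, odds: Decimal) -> bool:
    # A hedge must not make the worst-case outcome any worse.
    return min(apply_bet(book, outcome, stake, odds).values()) >= min(book.values())


rajesh_mi_csk = {"MI": Decimal("-50000"), "CSK": Decimal("30000")}
print(is_hedge(rajesh_mi_csk, "CSK", Decimal("10000"), Decimal("2.00")))  # True
print(is_hedge(rajesh_mi_csk, "MI", Decimal("10000"), Decimal("2.00")))   # False
```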

Step 7: Period Management Last

Implement night and weekly periods, with timezone handling and carry-forward logic. This is last because it is the most operationally complex feature and depends on all other components working correctly.

Feature Flag Strategy

Every major feature is wrapped in a feature flag:

| Flag | Controls | Default |
|---|---|---|
| bbook.forwarding_matrix.enabled | Whether the matrix is used for routing decisions | OFF |
| bbook.cascading_routing.enabled | Whether bets cascade through the hierarchy | OFF |
| bbook.user_win_limits.enabled | Whether per-click and aggregate win limits are enforced | OFF |
| bbook.no_new_risk.enabled | Whether NO_NEW_RISK mode can activate | OFF |
| bbook.period_management.enabled | Whether night/weekly periods are active | OFF |
| bbook.audit_trail.enabled | Whether audit records are created | ON (from Step 2 onward) |

Flags can be toggled per agent. This allows:

  • Gradual rollout to trusted agents first
  • Quick rollback if issues are discovered
  • A/B comparison between old and new routing
  • Production testing with real traffic but limited blast radius
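A minimal Python sketch of per-agent flag resolution follows, using the flag names and defaults from the table. The override store and the `FeatureFlags` class are hypothetical; in production this state would more likely live in the database or a flag service. Unknown flag names raise instead of silently defaulting, so a typo cannot quietly disable a safety feature.

```python
from typing import Dict, Optional

DEFAULT_FLAGS: Dict[str, bool] = {
    "bbook.forwarding_matrix.enabled": False,
    "bbook.cascading_routing.enabled": False,
    "bbook.user_win_limits.enabled": False,
    "bbook.no_new_risk.enabled": False,
    "bbook.period_management.enabled": False,
    "bbook.audit_trail.enabled": True,
}


class FeatureFlags:
    def __init__(self, defaults: Dict[str, bool],
                 overrides: Optional[Dict[str, Dict[str, bool]]] = None):
        self._defaults = dict(defaults)
        self._overrides = overrides or {}   # agent_id -> {flag: value}

    def is_enabled(self, flag: str, agent_id: str) -> bool:
        if flag not in self._defaults:
            raise KeyError(f"unknown flag: {flag}")
        return self._overrides.get(agent_id, {}).get(flag, self._defaults[flag])

    def set_for_agent(self, agent_id: str, flag: str, value: bool) -> None:
        if flag not in self._defaults:
            raise KeyError(f"unknown flag: {flag}")
        self._overrides.setdefault(agent_id, {})[flag] = value


flags = FeatureFlags(DEFAULT_FLAGS)
flags.set_for_agent("rajesh", "bbook.cascading_routing.enabled", True)  # early adopter
print(flags.is_enabled("bbook.cascading_routing.enabled", "rajesh"))         # True
print(flags.is_enabled("bbook.cascading_routing.enabled", "another_agent"))  # False
```

Rolling back an agent is a single `set_for_agent(..., False)` call; deleting the override returns them to the global default.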

The Migration Path

TODAY                  PHASE 1                PHASE 2                PHASE 3
=====                  =======                =======                =======

Bet → Single Agent     Bet → Single Agent     Bet → Agent L1         Bet → Agent L1
(flat B-Book %)        (with limits)              → Agent L2             → Agent L2
(no limits)            (win caps)                 → Platform             → Platform
(no audit)             (audit trail)              → Betfair              → Betfair
                                              (2D matrix)            (5D matrix)
                                              (basic cascade)        (hedge detection)
                                                                     (NO_NEW_RISK)
                                                                     (periods)

20. The Bookie's Final Verdict

The Spec Gets the Math Right, but Misses Operational Reality

The B-Book system as designed is mathematically sound. The forwarding matrix, cascading routing, and exposure accounting are all correct. But a system that is correct on paper and a system that survives contact with real bookies operating during a live IPL match are two different things.

Here is what operational reality demands:

The Five Things That Will Make or Break the System

1. Speed of configuration changes.

During a live match, a bookie needs to change their matrix in seconds, not minutes. If MI loses 3 wickets in an over and the bookie wants to reduce retention from 40% to 10%, the system must allow this change to take effect on the very next bet. A configuration change that requires a page reload, a cache flush, or a 30-second propagation delay is unacceptable.

Design response: Matrix changes take effect immediately. The system invalidates all caches for the affected agent synchronously. The next bet uses the new matrix.

2. Visibility into what is happening RIGHT NOW.

Bookies do not look at reports after the fact. They look at dashboards during the match. They need to see, in real time: current exposure by match, current exposure by outcome, how close they are to each limit, which users are winning, which users are losing, and what bets are coming in right now.

Design response: The real-time dashboard (Section 14) is not a nice-to-have; it is the product. The B-Book engine is invisible infrastructure. The dashboard is what agents interact with.

3. The ability to override everything.

No matter how good the matrix is, there will be moments when the bookie wants to override it. "I have inside information that this match is suspicious -- forward everything." "This user is my cousin -- let his bet through even though it exceeds the cap." The system must support manual overrides at every level without breaking the audit trail.

Design response: User overrides and market overrides sit above the matrix in the precedence chain. Every override is logged. The system accommodates human judgment while ensuring accountability.

4. Settlement speed and accuracy.

The bookie's trust in the system is earned at settlement time. If settlements are delayed, incorrect, or confusing, the agent will abandon the platform. Settlement must happen within minutes of a match ending, and the numbers must match exactly what the agent expects based on what they saw on their dashboard.

Design response: Settlement cascades through the hierarchy using the same audit records that were created at bet time. The agent can verify every settled bet against the audit trail. Because the settlement engine reads the same source of truth as the bet engine, the split applied at settlement cannot drift from the split recorded at placement.

5. Graceful degradation, not catastrophic failure.

When something goes wrong -- and it will, during the biggest matches at the worst possible time -- the system must degrade gracefully. A Redis outage should not reject bets; it should fall back to PostgreSQL. A Betfair outage should not block bet placement; the portion that cannot be hedged should be absorbed as retained risk. A matrix configuration error should not route bets to the void; it should fall back to the agent default.

Design response: Every component has a fallback. The cascade has a backstop. The system is designed to always accept the bet and always route it somewhere safe, even if that somewhere is not optimal.

What Would Make Every Bookie Want This System

The ultimate test is simple: does this system make more money for the bookie while requiring less manual work?

If Rajesh can configure his risk appetite once, trust the system to enforce it, see his position in real time, sleep through the night knowing his limits protect him, settle with Vikram cleanly every Monday, and identify his sharp users before they drain his bankroll -- then this system wins.

The B-Book is not a technical achievement to be admired. It is a tool to be used. Its success will be measured not in latency percentiles or audit trail completeness, but in how many agents adopt it, how much volume flows through it, and how few disputes arise from it.

Build the dashboard first. Make the math invisible. Let the bookie focus on what they do best: understanding their market and their punters. Let the system handle everything else.


Part II: Gap Analysis Solutions -- The Immune System

The following sections address critical gaps identified during expert review by a veteran B-Book architect (20+ years, 4 B-Book systems built) and a senior financial systems engineer. These are the "immune system" of the B-Book -- the mechanisms that handle failures, prevent exploits, and ensure the system degrades gracefully when things go wrong.

As the reviewer noted: "The forwarding matrix is the brain. What is missing is the immune system. Build the safety systems before you build the intelligence systems."


21. Bet Cancellation / Void / Partial Settlement State Machine

Why This Matters

In real bookmaking, bets do not always travel the happy path from placement to settlement. Matches get abandoned. Rain interrupts play after 10 overs. A corruption ruling voids specific markets. An admin discovers a data feed error and needs to void bets placed during a 30-second window. A punter calls within 5 seconds asking to cancel.

Every one of these scenarios must be handled without breaking the exposure ledgers, without double-counting, and without leaving orphaned positions anywhere in the agent hierarchy.

The Complete Bet State Machine

State Definitions

| State | What It Means | Exposure Impact | Reversible? |
|---|---|---|---|
| BET_PLACED | Bet accepted, positions being created. Transient state (< 100ms). | Ledgers not yet updated | Yes -- system error during creation rolls back |
| ACTIVE | All positions created, all exposure ledgers updated. The bet is live. | Fully reflected in all agent ledgers | No -- can only move forward to SETTLED, VOIDED, etc. |
| SETTLED | Event result known, P&L calculated, payouts determined. | Exposure removed from ledgers, P&L applied | Can move to RE_SETTLED if the result is corrected |
| VOIDED | Entire bet nullified. All stakes returned. As if the bet never happened. | All exposure atomically removed from every agent in the chain | No -- void is final |
| PARTIALLY_VOIDED | Some markets/legs within the bet are voided, others settled normally. | Voided portion removed, settled portion resolved normally | No |
| CANCELLED | Punter- or agent-initiated cancellation within the allowed window. Functionally identical to void. | All exposure removed | No |
| CASH_OUT_SETTLED | Punter took early settlement via cash-out. | Original position closed, counter-position created and settled | No |
| RE_SETTLED | A previously settled bet has been re-settled due to result correction. | Previous P&L reversed, new P&L applied | Can be re-settled again if needed |
| REJECTED | Bet failed validation (invalid market, suspended event, etc.). Never reached ACTIVE. | Zero -- no positions were created | No |

Who Can Initiate Each State Transition

| Transition | Initiated By | Authorization Required | Time Window |
|---|---|---|---|
| ACTIVE -> SETTLED | System (automatic) | Event result feed | After event concludes |
| ACTIVE -> VOIDED | Platform admin | ADMIN or SUPER_ADMIN role | Any time before settlement |
| ACTIVE -> VOIDED | System (automatic) | Abandoned event rule triggers | When event is officially abandoned |
| ACTIVE -> PARTIALLY_VOIDED | Platform admin or system | Same as void | When specific markets are voided |
| ACTIVE -> CANCELLED | Punter | Punter's own bet only | Within cancellation window (configurable, typically 3-5 seconds for pre-match, 0 seconds for in-play) |
| ACTIVE -> CANCELLED | Agent (Rajesh) | Agent can cancel bets of their own punters | Within 60 seconds (configurable per agent) |
| SETTLED -> RE_SETTLED | Platform admin | SUPER_ADMIN role only | Within 72 hours of original settlement |

Cancellation starts from ACTIVE because BET_PLACED lasts under 100ms, far shorter than either cancellation window.

Key rule: Agents cannot void bets. Only the platform can void. Agents can cancel within a short window. This prevents an agent from voiding a bet after seeing that it lost (which would be a form of fraud against their upline).
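The state and transition tables above can be enforced by a single transition guard. A minimal Python sketch follows, assuming role names SYSTEM, ADMIN, SUPER_ADMIN, AGENT and PUNTER and a 5-second pre-match punter window (the top of the 3-5 second range). Transitions the table does not list (BET_PLACED -> ACTIVE/REJECTED, punter cash-out) are filled in as assumptions, and `transition` is a hypothetical function. AGENT never appears on a void transition, which is exactly the key rule above.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet, Tuple


class BetState(Enum):
    BET_PLACED = auto()
    ACTIVE = auto()
    SETTLED = auto()
    VOIDED = auto()
    PARTIALLY_VOIDED = auto()
    CANCELLED = auto()
    CASH_OUT_SETTLED = auto()
    RE_SETTLED = auto()
    REJECTED = auto()


S = BetState
VOIDERS = frozenset({"SYSTEM", "ADMIN", "SUPER_ADMIN"})   # agents deliberately absent

# (from, to) -> roles allowed to trigger the transition
TRANSITIONS: Dict[Tuple[BetState, BetState], FrozenSet[str]] = {
    (S.BET_PLACED, S.ACTIVE): frozenset({"SYSTEM"}),
    (S.BET_PLACED, S.REJECTED): frozenset({"SYSTEM"}),
    (S.ACTIVE, S.SETTLED): frozenset({"SYSTEM"}),
    (S.ACTIVE, S.VOIDED): VOIDERS,
    (S.ACTIVE, S.PARTIALLY_VOIDED): VOIDERS,
    (S.ACTIVE, S.CANCELLED): frozenset({"PUNTER", "AGENT"}),
    (S.ACTIVE, S.CASH_OUT_SETTLED): frozenset({"PUNTER"}),
    (S.SETTLED, S.RE_SETTLED): frozenset({"SUPER_ADMIN"}),
    (S.RE_SETTLED, S.RE_SETTLED): frozenset({"SUPER_ADMIN"}),
}

# (from, to, role) -> maximum seconds since placement (or original settlement)
WINDOWS_S: Dict[Tuple[BetState, BetState, str], float] = {
    (S.ACTIVE, S.CANCELLED, "PUNTER"): 5,          # pre-match; in-play would be 0
    (S.ACTIVE, S.CANCELLED, "AGENT"): 60,
    (S.SETTLED, S.RE_SETTLED, "SUPER_ADMIN"): 72 * 3600,
    (S.RE_SETTLED, S.RE_SETTLED, "SUPER_ADMIN"): 72 * 3600,
}


def transition(current: BetState, target: BetState, role: str,
               elapsed_s: float = 0.0) -> BetState:
    roles = TRANSITIONS.get((current, target))
    if roles is None:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    if role not in roles:
        raise PermissionError(f"{role} cannot move a bet {current.name} -> {target.name}")
    window = WINDOWS_S.get((current, target, role))
    if window is not None and elapsed_s > window:
        raise PermissionError(f"{role} window of {window}s has passed")
    return target
```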

What Triggers Each Void Type

| Trigger | Void Type | Scope | Example |
|---|---|---|---|
| Match abandoned (weather, floodlight failure) | FULL_VOID | All markets on that event | IPL match abandoned after rain, no result possible |
| Match abandoned after partial completion | PARTIAL_VOID | Completed markets settle, incomplete markets void | IPL match abandoned after 10 overs -- completed over markets settle, match odds void |
| Corruption/match-fixing ruling | FULL_VOID | All markets on that event | ICC declares match result void due to fixing investigation |
| Data feed error | SELECTIVE_VOID | Bets placed during the error window | Odds feed showed 1.05 instead of 10.5 for 30 seconds; bets during that window are voided |
| Punter cancellation | CANCELLATION | Single bet | Amit taps "Cancel" within 3 seconds of placing a pre-match bet |
| Admin decision | ADMIN_VOID | Any scope (single bet, all bets on a market, all bets on an event) | Admin discovers a technical glitch and voids affected bets |

How Voids Cascade Through the Agent Hierarchy

This is the critical design challenge. When Amit's bet was placed, it was split across Rajesh (60%), Vikram (24%), and the Platform (16%). A void must reverse every single one of those positions atomically.

The key principle: the void operation reads the original audit record to determine exactly what to reverse. It does not recalculate anything. It uses the recorded split from bet placement time. This ensures that even if Rajesh changed his matrix since then, the void reverses exactly what was originally done.

Idempotent Void Operations

Every void operation is assigned a unique void_operation_id. Before executing, the system checks whether this void_operation_id has already been applied.

VOID IDEMPOTENCY CHECK
========================

1. Admin requests: void bet_a1b2c3d4, reason: MATCH_ABANDONED
2. System generates: void_op_id = void_bet_a1b2c3d4_MATCH_ABANDONED_v1
3. System checks: SELECT * FROM void_operations WHERE void_op_id = ?
4a. If NOT found: Execute void, record void_op_id with result
4b. If found: Return the recorded result, do NOT execute again

This means:
- Pressing "Void" twice does NOT double-decrement exposure
- A network retry after timeout does NOT create a second void
- A batch void that partially fails can be safely retried
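A minimal in-memory Python sketch of the idempotent void follows. It assumes ledgers keyed by (agent_id, scope) and an audit record that stores each agent's retained liability per scope at placement time. It reuses the recorded amounts instead of recalculating and applies decrements in sorted key order to mirror deterministic lock ordering. It also adds one safeguard beyond the flow above: a second void of the same bet under a different reason is refused, because a reason-based key alone would not catch it. `VoidService` and its data shapes are hypothetical, and a lock stands in for the database transaction.

```python
import threading
from typing import Dict, Tuple

LedgerKey = Tuple[str, str]          # (agent_id, scope)


class VoidService:
    def __init__(self, ledgers: Dict[LedgerKey, int],
                 audit: Dict[str, Dict[LedgerKey, int]]):
        self.ledgers = ledgers       # retained open liability per (agent, scope)
        self.audit = audit           # bet_id -> liability recorded at placement
        self.operations: Dict[str, dict] = {}
        self.voided_bets: Dict[str, str] = {}   # bet_id -> void_op_id
        self._lock = threading.Lock()            # stands in for the DB transaction

    def void(self, bet_id: str, reason: str, version: int = 1) -> dict:
        op_id = f"void_{bet_id}_{reason}_v{version}"
        with self._lock:
            if op_id in self.operations:              # step 4b: replay stored result
                self.operations[op_id]["idempotent_hit_count"] += 1
                return self.operations[op_id]
            if bet_id in self.voided_bets:
                raise ValueError(f"{bet_id} already voided by {self.voided_bets[bet_id]}")
            adjustments = dict(self.audit[bet_id])    # recorded split, never recalculated
            for key in sorted(adjustments):           # deterministic order, as with locks
                self.ledgers[key] -= adjustments[key]
            result = {"void_op_id": op_id, "ledger_adjustments": adjustments,
                      "idempotent_hit_count": 0}
            self.operations[op_id] = result
            self.voided_bets[bet_id] = op_id
            return result


ledgers = {("rajesh", "cricket_sport"): 1_860_000, ("vikram", "cricket_sport"): 900_000,
           ("platform", "cricket_sport"): 400_000}
audit = {"bet_a1b2c3d4": {("rajesh", "cricket_sport"): 6_000,
                          ("vikram", "cricket_sport"): 2_400,
                          ("platform", "cricket_sport"): 1_600}}
svc = VoidService(ledgers, audit)
svc.void("bet_a1b2c3d4", "MATCH_ABANDONED")
svc.void("bet_a1b2c3d4", "MATCH_ABANDONED")       # retry: no second decrement
print(ledgers[("rajesh", "cricket_sport")])         # 1854000
```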

The void_operations table stores:

| Column | Type | Purpose |
|---|---|---|
| void_op_id | TEXT (PK) | Idempotency key |
| bet_id | TEXT | Which bet was voided |
| void_type | ENUM | FULL_VOID, PARTIAL_VOID, SELECTIVE_VOID, CANCELLATION, ADMIN_VOID |
| reason | TEXT | Human-readable reason |
| initiated_by | TEXT | User ID of who initiated it |
| positions_reversed | JSONB | Snapshot of every position that was reversed |
| ledger_adjustments | JSONB | Snapshot of every ledger decrement |
| executed_at | TIMESTAMP | When the void was applied |
| idempotent_hit_count | INT | How many times this void was re-requested after first execution |

How Exposure Ledgers Are Atomically Decremented

The void executes as a single database transaction with FOR UPDATE locks on all affected exposure ledger rows. The transaction includes:

BEGIN TRANSACTION;

-- Lock all affected ledger rows in a deterministic order
-- (agent_id ascending, then scope) so concurrent voids cannot deadlock.
-- ORDER BY ... FOR UPDATE takes the row locks in sorted order.

SELECT * FROM exposure_ledgers
WHERE (agent_id = 'rajesh' AND scope = 'cricket_sport')
OR (agent_id = 'rajesh' AND scope = 'mi_vs_csk_match')
OR (agent_id = 'rajesh' AND scope = 'night_period')
OR (agent_id = 'vikram' AND scope = 'cricket_sport')
... (all affected scopes for all agents)
ORDER BY agent_id, scope
FOR UPDATE;

-- Decrement each ledger by the exact amount from the audit record
-- Insert void_operation record
-- Update bet status to VOIDED

COMMIT;

-- After COMMIT (never inside the transaction): update Redis and invalidate caches

After the database transaction commits, Redis and application LRU caches are invalidated for all affected agents. The invalidation order does not matter because the caches are read-through -- a cache miss simply reads the correct value from PostgreSQL.
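The two rules in the transaction above, "lock in one global order" and "decrement by the recorded amount, never a recalculated one", can be sketched in Python as follows. The audit-record shape (`agent_id`, `scopes`, `retained_liability`) and the in-memory ledger dict are assumptions for illustration; in the design itself the locks and decrements happen in PostgreSQL.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class LedgerAdjustment:
    agent_id: str
    scope: str
    amount: Decimal  # copied from the audit record, never recalculated

def build_void_plan(audit_positions: Iterable[dict]) -> List[LedgerAdjustment]:
    plan = [
        LedgerAdjustment(p["agent_id"], scope, Decimal(p["retained_liability"]))
        for p in audit_positions
        for scope in p["scopes"]
    ]
    # One global lock order (agent_id, then scope): two concurrent voids can
    # never each hold a row the other needs while waiting in opposite order.
    return sorted(plan, key=lambda a: (a.agent_id, a.scope))

def apply_void_plan(ledgers: Dict[Tuple[str, str], Decimal],
                    plan: List[LedgerAdjustment]) -> None:
    # Each (agent_id, scope) ledger drops by exactly the recorded liability.
    for adj in plan:
        ledgers[(adj.agent_id, adj.scope)] -= adj.amount
```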

NO_NEW_RISK Re-evaluation After Voids

When a void reduces an agent's exposure, the system must check whether the agent should exit NO_NEW_RISK mode:

AFTER VOID:
1. Read Rajesh's current retained_open_liability for each scope
2. Compare each scope against its own limit
3. For each scope where retained_open_liability < limit:
→ Clear that scope's NO_NEW_RISK flag in Redis
→ Rajesh can accept new risk bets in that scope again
4. For each scope still at or over its limit:
→ Keep NO_NEW_RISK active for that scope

This check happens inside the same transaction as the void. The NO_NEW_RISK flag in Redis is updated immediately after the transaction commits.
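A sketch of the re-evaluation step, assuming NO_NEW_RISK is tracked per scope (as step 4 implies). The function only decides which flags to clear; writing them to Redis after commit is left out, and all names are hypothetical.

```python
from decimal import Decimal
from typing import Dict, Set, Tuple

def reevaluate_no_new_risk(
    liability_by_scope: Dict[str, Decimal],
    limit_by_scope: Dict[str, Decimal],
    flagged_scopes: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (scopes_to_clear, scopes_still_flagged) after a void."""
    to_clear = {
        scope for scope in flagged_scopes
        # strictly below the limit, matching the "< limit" test above
        if liability_by_scope.get(scope, Decimal(0)) < limit_by_scope[scope]
    }
    return to_clear, flagged_scopes - to_clear
```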

Walk-Through: IPL Match Abandoned After 10 Overs

Scenario: MI vs CSK, IPL 2026. Rain stops play after 10 overs. The match is officially abandoned -- no result. Here is what happens:

The markets on this match:

| Market | Status at Abandonment | Action |
|---|---|---|
| Match Odds (MI win / CSK win / Draw) | Incomplete -- no result determined | VOIDED -- stakes returned |
| First Innings Total Runs | Incomplete -- only 10 overs bowled of 20 | VOIDED -- stakes returned |
| Over 1 Runs (6.5 over/under) | Completed -- over 1 finished with 8 runs | SETTLED -- over 6.5 wins |
| Over 2 Runs (7.5 over/under) | Completed -- over 2 finished with 6 runs | SETTLED -- under 7.5 wins |
| ... Overs 3-10 ... | Completed | SETTLED normally |
| Over 11 Runs | Not started | VOIDED -- stakes returned |
| Top Batsman | Incomplete | VOIDED -- stakes returned |

How this processes:

Step 1: Event marked as ABANDONED by the feed or admin.

The system receives the event status update: event_status = ABANDONED, overs_completed = 10.

Step 2: Market-level settlement rules kick in.

Each market has a settlement rule defined at creation time:

| Market Type | Abandonment Rule |
|---|---|
| Match Odds | Void if no result |
| Over X Runs | Settle if over X is completed, void if not |
| Top Batsman | Void unless one innings fully completed |
| First Innings Total | Void unless first innings fully completed |
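The four rules in the table reduce to one small decision function. This is a sketch: the market-type strings and the `MatchState` fields are assumptions, and `overs_completed` is taken to mean completed overs in the innings the Over X market refers to.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class MatchState:
    overs_completed: int      # completed overs in the innings in progress
    completed_innings: int    # innings fully completed before abandonment
    result_determined: bool   # an official result exists

def abandonment_action(market_type: str, state: MatchState,
                       over_number: Optional[int] = None) -> str:
    if market_type == "MATCH_ODDS":
        return "SETTLE" if state.result_determined else "VOID"
    if market_type == "OVER_RUNS":
        if over_number is None:
            raise ValueError("OVER_RUNS markets need an over_number")
        return "SETTLE" if over_number <= state.overs_completed else "VOID"
    if market_type in ("TOP_BATSMAN", "FIRST_INNINGS_TOTAL"):
        # Both rules require at least one fully completed innings
        return "SETTLE" if state.completed_innings >= 1 else "VOID"
    raise ValueError(f"no abandonment rule for {market_type}")
```

For the rain-abandoned match (10 overs, no completed innings, no result), overs 1-10 settle and every other market in the table voids.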

Step 3: System generates a batch of void and settlement operations.

For this match, there are 847 open bets. The system processes them in a single settlement batch:

  • 312 bets on Match Odds: all VOIDED
  • 85 bets on First Innings Total: all VOIDED
  • 43 bets on Top Batsman: all VOIDED
  • 20 bets on Over 11-20 markets: all VOIDED
  • 387 bets on Over 1-10 markets: all SETTLED with actual results

Step 4: Void cascade for each voided bet.

Take Amit's bet as an example. He bet ₹10,000 on MI to win at 1.85. The original audit record shows:

Rajesh retained: ₹6,000 stake, ₹5,100 liability
Vikram retained: ₹2,400 stake, ₹2,040 liability
Platform retained: ₹800 stake, ₹680 liability
Betfair hedged: ₹800 stake

The void reverses all of these. Amit gets his ₹10,000 back. Rajesh's exposure drops by ₹5,100. Vikram's drops by ₹2,040. The platform's drops by ₹680. The Betfair hedge is cancelled (if unmatched) or counter-traded (if matched).

Step 5: Settlement cascade for each settled bet.

Sonia bet ₹5,000 on Over 1 Runs Over 6.5 at odds 1.90. Over 1 completed with 8 runs (over 6.5 wins). This bet is settled as a winner. The settlement cascade pays out through the same chain that held the positions.

Step 6: Rajesh sees the result on his dashboard.

MI vs CSK -- MATCH ABANDONED (Rain)
=====================================
Voided markets: Match Odds, First Innings Total, Top Bat, Overs 11-20
Settled markets: Over 1-10 Runs

Your positions:
Voided: ₹1,82,000 stake returned (34 bets)
Settled: +₹24,000 profit (18 winning bets, 14 losing bets)
Net: +₹24,000 profit from completed overs

Exposure released: ₹3,45,000 (your cricket limit now has more headroom)

The Partial Void Edge Case

What about a multi-leg (accumulator/parlay) bet where one leg is voided but others settle? The standard industry rule is:

When one leg of a multi-leg bet is voided, that leg is treated as a winner at odds 1.00. The remaining legs settle normally with the voided leg's odds removed from the accumulator calculation.

Example: Amit places a 3-leg accumulator:

  • Leg 1: MI win at 1.85 (VOIDED -- match abandoned)
  • Leg 2: RCB win at 2.10 (SETTLED -- RCB won)
  • Leg 3: KKR win at 1.70 (SETTLED -- KKR lost)

Original combined odds: 1.85 x 2.10 x 1.70 = 6.60
After void adjustment: 1.00 x 2.10 x 1.70 = 3.57

But Leg 3 lost, so the entire accumulator loses. The void did not save the bet.

If all non-voided legs had won, the payout would use the reduced odds (3.57 instead of 6.60). The positions at each agent level are recalculated based on the reduced odds and the partial void is applied to the difference.
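A sketch of the void-leg rule, using Python's `decimal` module so the odds multiply exactly. Leg statuses are assumed to be the strings WON, LOST, and VOID; the function returns the punter's total return, which is zero if any leg lost.

```python
from decimal import Decimal
from typing import Iterable, Tuple

def accumulator_return(stake: str, legs: Iterable[Tuple[str, str]]) -> Decimal:
    """legs: (decimal_odds, status) pairs; status is WON, LOST, or VOID."""
    combined = Decimal("1")
    for odds, status in legs:
        if status == "LOST":
            return Decimal("0.00")        # one losing leg loses the whole bet
        if status == "WON":
            combined *= Decimal(odds)
        elif status != "VOID":
            raise ValueError(f"unknown leg status {status!r}")
        # VOID legs count as odds 1.00, i.e. they leave `combined` unchanged
    return (Decimal(stake) * combined).quantize(Decimal("0.01"))
```

With a ₹1,000 stake, Amit's accumulator returns ₹0 as described; had Leg 3 won, it would return ₹3,570.00 (odds 3.57) instead of ₹6,604.50 (odds 6.6045).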


22. MVCC for Forwarding Matrix Changes

The Problem

At 9:47 PM during MI vs CSK, Rajesh changes his forwarding matrix. He reduces his retention on in-play match odds from 40% to 15% because MI just lost 3 quick wickets.

At the exact moment he saves this change, there are 15 bets in various stages of processing. Some are in the matrix resolution step, some are in cap evaluation, some are about to write positions. If half of those bets use the old matrix and half use the new one, the audit trail becomes inconsistent and unexplainable.

The solution is Multi-Version Concurrency Control (MVCC) for the forwarding matrix. Every matrix change creates a new version. Every bet captures which version it used. Old versions are preserved for audit and replay.

How Matrix Versions Work

RAJESH'S MATRIX VERSIONS
==========================

Version 1 (created: Feb 1, 2026 10:00 AM)
- Initial setup via onboarding wizard
- 8 rules, "Balanced" profile
- active_from: 2026-02-01T10:00:00Z
- active_until: 2026-02-11T21:47:00Z (set when V2 was created)

Version 2 (created: Feb 11, 2026 9:47 PM)
- Rajesh reduced in-play match odds retention from 40% to 15%
- 8 rules, modified Rule R5
- active_from: 2026-02-11T21:47:00Z
- active_until: 2026-02-11T22:15:00Z (set when V3 was created)

Version 3 (created: Feb 11, 2026 10:15 PM)
- Rajesh restored original settings after MI stabilized
- 8 rules, Rule R5 back to 40%
- active_from: 2026-02-11T22:15:00Z
- active_until: NULL (current active version)

The Version Data Model

Each matrix version is an immutable snapshot:

| Field | Type | Description |
|---|---|---|
| version_id | UUID | Unique identifier for this version |
| agent_id | TEXT | Which agent owns this matrix |
| version_number | INT | Monotonically increasing sequence per agent |
| rules | JSONB | The complete set of matrix rules (immutable snapshot) |
| created_at | TIMESTAMP | When this version was created |
| created_by | TEXT | Who created it (agent, admin, system) |
| change_reason | TEXT | Why the change was made (free text or enum) |
| active_from | TIMESTAMP | When this version became the active version |
| active_until | TIMESTAMP | When this version was superseded (NULL if current) |
| checksum | TEXT | SHA-256 of the rules JSONB, for integrity verification |

Key design rule: matrix versions are immutable. Once created, a version is never modified. A "change" always creates a new version. The active_until field on the old version is the only field that changes (it gets stamped when superseded).
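A sketch of computing and verifying the `checksum` field. It assumes the hash is taken over a canonical JSON serialization (sorted keys, no extra whitespace) produced by the application, because PostgreSQL's JSONB output does not preserve the key order or whitespace the application originally wrote. The `sha256:` prefix mirrors the audit record format used in this section.

```python
import hashlib
import json
from typing import Any

def rules_checksum(rules: Any) -> str:
    # Canonical form: key order and whitespace cannot change the hash.
    canonical = json.dumps(rules, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_version(rules: Any, stored_checksum: str) -> bool:
    # Integrity check before a stored version is used for replay or a dispute.
    return rules_checksum(rules) == stored_checksum
```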

How Bets Capture Their Matrix Version

When a bet enters the matrix resolution step, it captures the current active version ID before evaluating any rules:

BET PROCESSING TIMELINE
========================

1. Bet arrives at matrix resolution step
2. Read current_active_version_id for this agent from cache
→ This is an atomic read: either version 1 or version 2, never a mix
3. Load the rules for that specific version
4. Evaluate rules against bet characteristics
5. Record the version_id in the bet's audit trail
6. Continue to cap evaluation and position creation

The version_id is captured ONCE at the start of matrix resolution.
All subsequent steps for this bet use the rules from that version.
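An in-memory sketch of capture-once resolution. The hypothetical `MatrixStore` keeps every version (treated as immutable) and swaps a single active pointer; `resolve` reads that pointer once and records the version it used. Rule matching is deliberately naive here: the first rule whose fields all match wins, assuming rules are pre-sorted by specificity.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class MatrixVersion:
    version_id: str
    version_number: int
    rules: Tuple[dict, ...]   # assumed sorted most-specific first

class MatrixStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: Dict[str, MatrixVersion] = {}
        self._active: Dict[str, str] = {}   # agent_id -> active version_id

    def publish(self, agent_id: str, version: MatrixVersion) -> None:
        with self._lock:
            self._versions[version.version_id] = version  # old versions kept
            self._active[agent_id] = version.version_id   # single pointer swap

    def active(self, agent_id: str) -> MatrixVersion:
        with self._lock:
            return self._versions[self._active[agent_id]]

    def get(self, version_id: str) -> MatrixVersion:
        with self._lock:   # used for audit replay of a recorded version_id
            return self._versions[version_id]

def resolve(store: MatrixStore, agent_id: str, bet: dict) -> dict:
    version = store.active(agent_id)   # captured ONCE for this bet
    for rule in version.rules:
        if all(bet.get(k) == v for k, v in rule["match"].items()):
            return {"matrix_version_id": version.version_id,
                    "rule_matched": rule["id"],
                    "retention_pct": rule["retention_pct"]}
    raise LookupError("no matching rule")
```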

How This Interacts With the 3-Tier Cache

Each cache tier must be version-aware. When Rajesh saves a new matrix version:

Cache layer behavior on version change:

| Cache Tier | What Happens | Why |
|---|---|---|
| Tier 1 (App LRU) | Entry for rajesh_active_matrix is immediately invalidated via pub/sub | Next read loads from Redis or DB |
| Tier 2 (Redis) | rajesh_active_matrix_version key updated atomically to new version_id | All app instances see the new version on next read |
| Tier 3 (PostgreSQL) | New version row inserted, old version's active_until set | Source of truth, always correct |

The version-awareness rule for caches:

Each cache entry for a matrix stores the version_id alongside the rules. When a cache hit returns a version_id that does not match the current active version (which can happen in the brief window between DB write and cache invalidation), the cache entry is treated as a miss and the current version is loaded from the next tier.

CACHE ENTRY STRUCTURE
======================
Key: matrix:rajesh:active
Value: {
version_id: "v2-uuid-here",
version_number: 2,
rules: [...],
cached_at: "2026-02-11T21:47:01Z"
}

On read, the system also checks:
→ Is this version_id still the active version? (via a lightweight Redis lookup)
→ If not, treat as cache miss
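A sketch of the version-aware read path. Plain dicts stand in for the Redis active-version key, the app LRU tier, and the PostgreSQL version table; all three are hypothetical stand-ins. A cached entry is trusted only if its version_id equals the active version_id from the lightweight lookup; otherwise it is treated as a miss.

```python
from typing import Callable, Dict

class VersionAwareMatrixCache:
    def __init__(self, active_version: Dict[str, str],
                 load_version: Callable[[str], dict]) -> None:
        self.active_version = active_version  # stand-in for the Redis key
        self.load_version = load_version      # stand-in for a DB read by version_id
        self.lru: Dict[str, dict] = {}        # stand-in for the app LRU tier

    def invalidate(self, agent_id: str) -> None:
        # Called when the pub/sub invalidation message arrives.
        self.lru.pop(agent_id, None)

    def get(self, agent_id: str) -> dict:
        current_id = self.active_version[agent_id]   # lightweight lookup
        entry = self.lru.get(agent_id)
        if entry is not None and entry["version_id"] == current_id:
            return entry                              # valid hit
        entry = self.load_version(current_id)         # miss or stale hit
        self.lru[agent_id] = entry
        return entry
```

Replayed against the 9:47 PM walk-through: a read at 9:47:00.008 sees v1 in both tiers, while a read at 9:47:00.011 still finds v1 in the LRU but v2 in Redis, so it rejects the stale entry and loads v2.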

Audit Trail Records Which Version Was Used

Every bet's audit record includes:

matrix_resolution:
  agent: rajesh
  matrix_version_id: "v1-uuid-here"
  matrix_version_number: 1
  matrix_checksum: "sha256:abc123..."
  rule_matched: R5
  rule_specificity: 3
  forward_percentage: 60% (40% retained)
  resolution_timestamp: "2026-02-11T21:46:58Z" (2 seconds BEFORE matrix change)

This means that during a dispute, the system can show: "This bet used matrix version 1, which was active from Feb 1 to Feb 11 at 9:47 PM. The rule that matched was R5, retaining 40% and forwarding 60%. Matrix version 2 (15% retention) was created 2 seconds later and did not affect this bet."

Garbage Collection of Old Versions

Old matrix versions must be retained for audit and replay purposes. The garbage collection policy is:

| Age of Version | Retention Policy |
|---|---|
| < 90 days | Full retention. All rules, all metadata. |
| 90 days - 1 year | Compressed retention. Rules stored as compressed JSONB. Metadata retained. |
| > 1 year | Archive to cold storage (S3/equivalent). Only metadata in DB. Rules retrievable on demand. |
| > 3 years | Delete unless referenced by an unresolved dispute. |

Versions that are referenced by any active (unsettled) bet are NEVER garbage collected, regardless of age. The reference is tracked through a foreign key from the bet audit record to the matrix version, so the collector can find every bet that still points at a given version.
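The retention tiers, plus the "never collect while an unsettled bet references it" override, reduce to a small decision function. This sketch assumes 365-day years and treats the table's boundaries as half-open ranges; the action names are hypothetical.

```python
def retention_action(age_days: int, referenced_by_unsettled_bet: bool,
                     referenced_by_open_dispute: bool) -> str:
    if referenced_by_unsettled_bet:
        return "KEEP_FULL"        # never garbage collected, regardless of age
    if age_days < 90:
        return "KEEP_FULL"
    if age_days < 365:
        return "COMPRESS"
    if age_days < 3 * 365 or referenced_by_open_dispute:
        return "ARCHIVE"          # metadata in DB, rules in cold storage
    return "DELETE"
```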

Walk-Through: Rajesh Changes Matrix Mid-IPL-Match, 15 Bets In-Flight

Setup: MI vs CSK, 9:47 PM. MI just lost their 3rd wicket in 2 overs. Rajesh panics and changes his in-play match odds retention from 40% to 15%.

At the moment of the change, 15 bets are in various stages:

| Bets | Stage at the Moment of the Change | Outcome |
|---|---|---|
| Bets 1-5 | Already past matrix resolution, in cap evaluation or position creation | These captured matrix version 1 (40% retention). They complete with 40% retention. |
| Bets 6-10 | In the request queue, not yet started processing | These will read the new active version (version 2, 15% retention) when they reach matrix resolution. |
| Bets 11-15 | Currently in matrix resolution step | These are the interesting ones. |

For bets 11-15: The matrix version read is atomic. Each bet reads the current_active_version_id once. If the read happens before the Redis key is updated, they get version 1. If after, they get version 2. There is no "half old, half new" state.

Timeline:

9:47:00.000  Rajesh clicks "Save" on new matrix
9:47:00.005 PostgreSQL: new version 2 created, version 1 active_until = NOW
9:47:00.010 Redis: rajesh_active_matrix_version updated to v2
9:47:00.012 App LRU: rajesh cache entry invalidated

Bet 11: matrix resolution at 9:47:00.003 → reads LRU cache → gets version 1 → 40% retention
Bet 12: matrix resolution at 9:47:00.008 → LRU hit (not yet invalidated); Redis active-version check still returns v1 (Redis update lands at .010) → gets version 1 → 40% retention
Bet 13: matrix resolution at 9:47:00.011 → LRU still holds v1 (invalidation lands at .012), but the Redis active-version check returns v2, so the entry is treated as a miss → gets version 2 → 15% retention
Bet 14: matrix resolution at 9:47:00.015 → LRU miss (invalidated at .012), reads Redis → gets version 2 → 15% retention
Bet 15: matrix resolution at 9:47:00.020 → LRU hit on version 2 (reloaded by Bet 14) → 15% retention

Result: Bets 11-12 used version 1 (40% retention). Bets 13-15 used version 2 (15% retention). The transition is clean. Each bet's audit trail records exactly which version was used. Rajesh can see in his dashboard:

Matrix Change Applied at 9:47 PM
=================================
Last bet with old matrix (v1, 40% retention): 9:47:00.008 PM
First bet with new matrix (v2, 15% retention): 9:47:00.011 PM
Transition time: 3 milliseconds

2 bets (11 and 12) resolved with old matrix after Save was clicked
No bets received inconsistent matrix data

23. Dead Letter Queue and Poison Bet Handling

The Problem

In any distributed system, some messages will fail to process. In Hannibal, this means some bets will fail to route through the cascade, fail to create positions, or fail to update exposure ledgers. These failed bets cannot be silently dropped -- they represent real money that a punter believes they have wagered.

A "dead letter" is a bet that has exhausted its retry budget and cannot be processed automatically. A "poison bet" is a specific class of dead letter where the bet is fundamentally unprocessable -- retrying it will never succeed.

The Retry Pipeline

Retry Policy

| Retry Stage | Max Retries | Backoff | Total Window |
|---|---|---|---|
| Immediate retry | 3 | 50ms, 200ms, 500ms | ~750ms |
| Short backoff retry | 3 | 2s, 5s, 10s | ~17s |
| Long backoff retry | 3 | 30s, 60s, 120s | ~3.5 min |
| Dead letter (RETRYABLE) | 3 | 5 min, 15 min, 30 min | ~50 min |
| Dead letter (POISON) | 0 | N/A -- goes directly to manual queue | N/A |

Total automatic retry window: roughly 54 minutes from first failure to final dead letter classification (~750ms + ~17s + ~3.5 min + ~50 min). This is deliberate -- most infrastructure issues (Redis restart, database failover, network partition) resolve within this window.
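A sketch that expands the retry policy table into a concrete schedule of offsets from the first failure, which makes it easy to check timelines such as the outage walk-through later in this section. Stage names and the flat list structure are assumptions; POISON bets skip this schedule entirely.

```python
from datetime import timedelta
from typing import Iterator, List, Tuple

# (stage, delay before each retry) -- values from the retry policy table
RETRY_STAGES: List[Tuple[str, List[timedelta]]] = [
    ("IMMEDIATE", [timedelta(milliseconds=ms) for ms in (50, 200, 500)]),
    ("SHORT_BACKOFF", [timedelta(seconds=s) for s in (2, 5, 10)]),
    ("LONG_BACKOFF", [timedelta(seconds=s) for s in (30, 60, 120)]),
    ("DEAD_LETTER_RETRYABLE", [timedelta(minutes=m) for m in (5, 15, 30)]),
]

def retry_schedule() -> Iterator[Tuple[int, str, timedelta]]:
    """Yield (retry_number, stage, offset from the first failure)."""
    offset = timedelta(0)
    retry_number = 0
    for stage, delays in RETRY_STAGES:
        for delay in delays:
            retry_number += 1
            offset += delay
            yield retry_number, stage, offset
```

The last offset is 3,227.75 seconds (about 53.8 minutes), and retry 7, the first long-backoff attempt, fires 47.75 seconds after the first failure.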

What Constitutes a Poison Bet

A bet is classified as POISON (unprocessable by retry) when:

| Condition | Why It Is Poison | Example |
|---|---|---|
| Event already settled | The bet is for an event that has already concluded. Placing a position makes no sense. | Bet queued during outage, event settles before replay. |
| Agent suspended mid-processing | The bet was partially processed when the agent was suspended. The cascade path is now invalid. | Rajesh suspended for payment default while 5 of his bets were in the retry queue. |
| Market no longer exists | The market was removed or never existed (data error). | Bet references a market_id that was deleted due to a data feed error. |
| Stake exceeds agent's total possible capacity | Even with zero current exposure, the agent cannot absorb any of this bet (limit is smaller than the bet's minimum position). | Agent's total sport limit is ₹10,000 but the bet requires ₹50,000 minimum position. |
| Invalid state transition | The bet is in a state that cannot transition to ACTIVE (e.g., already CANCELLED). | Punter cancelled the bet during the retry window. |
| Duplicate bet_id | A bet with this exact ID already exists in ACTIVE state (the original succeeded but the confirmation was lost, triggering a retry). | Network timeout caused client to retry, first attempt actually succeeded. |
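A sketch of the classification step. The `FailureContext` fields are hypothetical inputs gathered when a retry fails; EVENT_ALREADY_SETTLED is the reason code used elsewhere in this document, and the other reason codes are illustrative. Anything that matches no poison condition is treated as RETRYABLE.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple

@dataclass(frozen=True)
class FailureContext:
    active_bet_with_same_id: bool
    bet_status: str            # e.g. PENDING, CANCELLED
    market_exists: bool
    event_status: str          # e.g. OPEN, SETTLED
    agent_suspended: bool
    agent_total_limit: Decimal
    min_position: Decimal

def classify_failure(ctx: FailureContext) -> Tuple[str, str]:
    """Return (failure_category, reason)."""
    if ctx.active_bet_with_same_id:
        return "POISON", "DUPLICATE_BET_ID"          # original already succeeded
    if ctx.bet_status == "CANCELLED":
        return "POISON", "INVALID_STATE_TRANSITION"  # cannot move to ACTIVE
    if not ctx.market_exists:
        return "POISON", "MARKET_NOT_FOUND"
    if ctx.event_status == "SETTLED":
        return "POISON", "EVENT_ALREADY_SETTLED"
    if ctx.agent_suspended:
        return "POISON", "AGENT_SUSPENDED"
    if ctx.agent_total_limit < ctx.min_position:
        return "POISON", "EXCEEDS_AGENT_CAPACITY"    # fails even at zero exposure
    return "RETRYABLE", "TRANSIENT_ERROR"
```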

The Dead Letter Queue Data Model

| Field | Type | Description |
|---|---|---|
| dlq_entry_id | UUID | Unique identifier |
| bet_id | TEXT | The original bet ID |
| original_request | JSONB | Complete original bet request (preserved exactly) |
| failure_reason | TEXT | Why it failed |
| failure_category | ENUM | POISON, RETRYABLE, UNKNOWN |
| retry_count | INT | How many retries were attempted |
| retry_history | JSONB | Timestamps and error messages for each retry |
| first_failure_at | TIMESTAMP | When the first failure occurred |
| last_retry_at | TIMESTAMP | When the last retry was attempted |
| dead_lettered_at | TIMESTAMP | When it was moved to the DLQ |
| resolution_status | ENUM | PENDING, IN_REVIEW, RESOLVED_VOID, RESOLVED_PROCESSED, RESOLVED_REFUND |
| resolved_by | TEXT | Who resolved it (admin user ID) |
| resolved_at | TIMESTAMP | When it was resolved |
| resolution_notes | TEXT | Free-text notes from the resolver |
| punter_notified | BOOLEAN | Whether the punter has been told about the issue |
| punter_notification_sent_at | TIMESTAMP | When the notification was sent |

What the Punter Experiences

This is the most delicate part of DLQ design. The punter tapped "Place Bet" and saw a response. What did they see?

Scenario A: Failure during initial processing (before confirmation)

The punter saw: "Bet is being processed..." followed by an error or timeout. They do NOT see "Bet Confirmed." In this case:

  • The system shows: "Your bet could not be processed. Please try again."
  • The bet enters the retry pipeline silently
  • If retries succeed, the punter receives a push notification: "Your bet on MI to win has been confirmed."
  • If retries fail and the bet is dead-lettered, the punter receives: "Your bet on MI to win could not be placed. No funds were deducted."

Scenario B: Failure during cascade (after partial processing)

This is the dangerous case. The punter's bet was accepted and confirmed (because the initial validation passed), but the cascade failed mid-way. The punter saw "Bet Confirmed." Their account shows the bet as active. But the positions were not fully created across the agent hierarchy.

In this case:

  • The punter continues to see "Bet Active" -- we do NOT retroactively change their view
  • The system retries the cascade in the background
  • If retries succeed, everything is reconciled and the punter never knows there was an issue
  • If retries fail, the bet enters the DLQ and an admin resolves it

Scenario C: Poison bet (event already settled)

The bet was queued during an outage. By the time the system recovers, the event has already settled. The bet cannot be placed retroactively.

  • The punter saw "Bet is being processed..." (or possibly "Bet Confirmed" if the initial ack was sent)
  • Resolution options for the admin:
    1. VOID -- return the stake, notify the punter: "Your bet on MI to win was cancelled due to a technical issue. Your stake of ₹10,000 has been refunded."
    2. SETTLE AT RESULT -- if the bet would have been placed had the system been working, settle it as if it were placed. This is the punter-friendly option but creates financial exposure that was never accounted for in the ledgers.
    3. SETTLE AT VOID -- void the bet but offer the punter a goodwill credit.

Platform policy recommendation: For pre-match bets that failed during a system outage, VOID and refund is the standard. For in-play bets, VOID is the only safe option because the odds may have moved significantly during the outage.

The Manual Resolution Queue Workflow

Admin dashboard for the manual resolution queue:

DEAD LETTER QUEUE -- MANUAL RESOLUTION
=======================================

Pending: 3 entries | Oldest: 12 minutes | Today resolved: 7

┌───────────────────────────────────────────────────────────────────┐
│ DLQ-001 POISON HIGH PRIORITY 12 min ago │
│ │
│ Bet: ₹15,000 on MI to win @ 1.85 │
│ Punter: Amit (under Rajesh) │
│ Reason: Event MI_vs_CSK already SETTLED │
│ Punter saw: "Bet is being processed" (no confirmation sent) │
│ Event result: MI won │
│ If settled: Punter wins ₹12,750 (agent loses) │
│ If voided: Punter refunded ₹15,000 (no P&L impact) │
│ │
│ [Void & Refund] [Settle at Result] [Escalate to Senior Admin] │
└───────────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────────────────────────────────┐
│ DLQ-002 RETRYABLE MEDIUM PRIORITY 8 min ago │
│ │
│ Bet: ₹5,000 on RCB to win @ 2.10 │
│ Punter: Sonia (under Rajesh) │
│ Reason: Database timeout at position creation (retry 6/9) │
│ Punter saw: "Bet Confirmed" (confirmation was sent) │
│ Next auto-retry: in 7 minutes │
│ │
│ [Force Retry Now] [Void & Refund] [Wait for Auto-Retry] │
└───────────────────────────────────────────────────────────────────┘

Reconciliation for Orphaned Bets

An orphaned bet is one where the punter-facing record says "Active" but the agent-side positions were never fully created. The reconciliation process runs every 5 minutes:

ORPHAN DETECTION QUERY
========================
Find all bets WHERE:
- bet_status = ACTIVE
- created_at < NOW() - 5 minutes (older than 5 minutes, giving normal processing time to complete)
- created_at > NOW() - 60 minutes (anything older is already in DLQ or resolved)
- position_count < expected_position_count (based on hierarchy depth)

For each orphaned bet:
1. Check if it is in the retry pipeline → skip (it is being handled)
2. Check if it is in the DLQ → skip (it is being handled)
3. Otherwise → add to DLQ as RETRYABLE with note "Detected by reconciliation"
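The detection rules can be sketched as a pure function over bet rows. The row keys and the `in_retry` / `in_dlq` ID sets are hypothetical stand-ins for the real tables; the time window is "older than 5 minutes but newer than 60 minutes".

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Set

def find_orphans(bets: Iterable[dict], now: datetime,
                 in_retry: Set[str], in_dlq: Set[str]) -> List[str]:
    oldest = now - timedelta(minutes=60)
    newest = now - timedelta(minutes=5)
    orphans = []
    for bet in bets:
        if bet["status"] != "ACTIVE":
            continue
        if not (oldest < bet["created_at"] < newest):
            continue  # too fresh to judge, or old enough to be handled already
        if bet["position_count"] >= bet["expected_position_count"]:
            continue  # every expected position exists
        if bet["bet_id"] in in_retry or bet["bet_id"] in in_dlq:
            continue  # already being handled
        orphans.append(bet["bet_id"])  # -> add to DLQ as RETRYABLE
    return orphans
```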

Walk-Through: Bet Queued During Outage, Event Settles Before Replay

Setup: 9:30 PM, the database connection pool is exhausted during a traffic spike. Bets are being accepted (the API layer is healthy) but position creation is failing. The system is retrying bets with exponential backoff.

9:30:15 PM: Amit places ₹15,000 on MI to win at 1.85. The API accepts the request and returns "Bet is being processed." The bet enters the retry pipeline because the database write fails.

9:30:15 PM: Retries 1-3 (immediate, all within ~750ms): fail. Database still overloaded.

9:30:18 - 9:30:33 PM: Retries 4-6 (short backoff: 2s, 5s, 10s): fail. Database recovering.

9:31:03 PM: Retry 7 (first long-backoff retry, 30s): fails. Database still recovering.

9:32:00 PM: The MI vs CSK match ends. MI wins. The settlement service settles all active positions for this event.

9:32:03 PM: Retry 8 (long backoff, 60s) fires. The system attempts to create positions for Amit's bet. But now the event is SETTLED. The position creation logic detects: "Event MI_vs_CSK is in SETTLED state. Cannot create new positions."

9:32:03 PM: The bet is classified as POISON with reason: EVENT_ALREADY_SETTLED. It enters the Dead Letter Queue.

9:32:04 PM: An alert fires on the admin dashboard: "1 poison bet detected. Event settled before bet could be processed."

9:33:00 PM: The on-duty admin reviews the DLQ entry. They see:

  • The bet was submitted at 9:30:15 PM, before the event ended
  • The punter never received a "Bet Confirmed" message
  • The event result: MI won
  • If they settle it: Amit wins ₹12,750 (which the agents never priced into their exposure)
  • If they void it: Amit gets ₹15,000 refunded, no P&L impact

9:33:30 PM: The admin chooses "Void & Refund." Amit receives a push notification: "Your bet on MI to win could not be processed due to a technical issue. Your ₹15,000 has been refunded. We apologize for the inconvenience."

Why void is the correct default: The agents in the cascade never had this bet's exposure counted against their limits. Settling it retroactively would create phantom exposure that was never risk-managed. The safe choice is always to void and refund.


24. Settlement Cascade Failure Isolation

The Problem

When an IPL final settles, the system might need to process 4,000 positions across 50 agents. If the database times out at position 2,847, what happens to positions 1-2,846 (already processed) and 2,848-4,000 (not yet processed)? The answer cannot be "start over from scratch" -- that would double-settle the first 2,846 positions. And it cannot be "give up" -- that would leave 1,154 positions (2,847 through 4,000) unsettled.

Per-Position Settlement State Tracking

Every position has its own settlement state, independent of all other positions:

| State | Meaning | Duration |
|---|---|---|
| PENDING | Event result is known, this position is waiting to be settled | Seconds to minutes (queued) |
| PROCESSING | A settlement worker has claimed this position and is calculating P&L | Milliseconds (fast) |
| SETTLED | P&L has been calculated and the exposure ledger has been updated | Seconds (until reconciliation) |
| FAILED | An error occurred during processing. Will be retried. | Until retry succeeds or exhausts retries |
| CONFIRMED | Post-settlement reconciliation has verified this position's numbers match | Permanent (terminal) |

The Settlement Worker Design

Settlement is processed by independent workers that claim positions in batches:

SETTLEMENT WORKER FLOW
========================

1. Worker polls for positions in PENDING state
→ Claims a batch of up to 100 positions
→ Uses SELECT ... FOR UPDATE SKIP LOCKED
→ This means: lock the rows, but if another worker already locked them, skip and take the next ones

2. For each position in the batch (each inside its own savepoint):
a. Set state to PROCESSING
b. Calculate P&L based on event result and position odds/stake
c. Update the agent's exposure ledger (decrement retained_open_liability)
d. Update the agent's settled P&L ledger
e. Set state to SETTLED
f. If any step fails: roll back to the position's savepoint (undoing its partial ledger writes), set state to FAILED, record error, move to next position

3. After the batch:
→ Commit all SETTLED positions
→ FAILED positions remain in FAILED state for retry
→ Worker picks up next batch

The critical design: each position is settled independently. Position 2,847 failing does not block position 2,848. The worker simply records the failure and moves on.
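A sketch of per-position isolation using Python's sqlite3 module. SQLite supports SAVEPOINT but not SELECT ... FOR UPDATE SKIP LOCKED, so claiming the batch is assumed to have already happened. Each position runs inside its own savepoint: a failure rolls back only that position's writes and marks it FAILED, and the rest of the batch still commits. Table and column names are simplified assumptions.

```python
import sqlite3
from typing import Callable, List, Sequence, Tuple

def settle_batch(conn: sqlite3.Connection, position_ids: Sequence[int],
                 settle_one: Callable[[sqlite3.Connection, int], None]
                 ) -> Tuple[List[int], List[int]]:
    """Settle an already-claimed batch. conn must use isolation_level=None
    so that this function controls BEGIN/COMMIT explicitly."""
    settled: List[int] = []
    failed: List[int] = []
    conn.execute("BEGIN")
    for pid in position_ids:
        conn.execute("SAVEPOINT pos_sp")
        try:
            conn.execute("UPDATE positions SET state = 'PROCESSING' WHERE id = ?", (pid,))
            settle_one(conn, pid)      # P&L + ledger updates; may raise
            conn.execute("UPDATE positions SET state = 'SETTLED' WHERE id = ?", (pid,))
            conn.execute("RELEASE pos_sp")
            settled.append(pid)
        except Exception as exc:
            conn.execute("ROLLBACK TO pos_sp")   # undo only this position's writes
            conn.execute("RELEASE pos_sp")
            conn.execute("UPDATE positions SET state = 'FAILED', error = ? WHERE id = ?",
                         (str(exc), pid))
            failed.append(pid)         # move on; the next position is unaffected
    conn.execute("COMMIT")             # SETTLED and FAILED states persist together
    return settled, failed
```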

Agent-Level Isolation

Settlement is partitioned by agent. Each agent's positions are settled by a separate worker thread (or, in high-volume scenarios, a separate worker process). This provides fault isolation:

SETTLEMENT PARTITIONING
========================

Event: MI vs CSK (SETTLED: MI wins)
Total positions: 4,000 across 50 agents

Worker 1: Rajesh's 180 positions
Worker 2: Vikram's 420 positions (including forwarded positions)
Worker 3: Priya's 95 positions
...
Worker 31: Suresh's 310 positions
...
Worker 50: Kwame's 15 positions

Each worker operates independently.
Rajesh's settlement failure does NOT block Vikram's settlement.

Settlement Ordering: Does It Matter?

Within a single agent: Order does not matter. Each position is independent. Settling position 3 before position 1 produces the same final ledger state.

Across agents: Order does not matter for financial accuracy. Rajesh's settlement is independent of Vikram's. The exposure ledgers are per-agent, so there is no cross-agent dependency.

One exception: the platform's Betfair hedge positions. If the platform needs to close hedge positions on Betfair, this should happen AFTER all agent-side positions are settled, because the platform needs to know the final net position before deciding how to close the hedge. Design rule: platform hedge settlement runs after all agent settlements are CONFIRMED (or FAILED-and-escalated).

Settlement Reconciliation

After each settlement batch, a reconciliation check runs:

RECONCILIATION CHECK (per event, per agent)
=============================================

For each agent who held positions on this event:

1. Sum all SETTLED position P&L values
→ This is what we actually settled

2. Compare against the pre-settlement exposure ledger
→ retained_open_liability for this event should now be zero
→ forwarded_open_liability for this event should now be zero

3. Check invariant:
→ Sum(all position stakes) = Original bet stake (for each bet)
→ Sum(retained P&L) + Sum(forwarded P&L) = Total P&L (for each bet)

4. If any invariant fails:
→ Flag for manual review
→ Do NOT proceed to CONFIRMED state
→ Alert: "Settlement reconciliation failed for agent X on event Y"
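A sketch of the reconciliation check. `Position` and `SettledBet` are hypothetical shapes, amounts are whole rupees for simplicity, and each bet's positions are assumed to cover its full split across the chain. An empty list of failures means the positions can move to CONFIRMED.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Position:
    kind: str     # "RETAINED" or "FORWARDED"
    stake: int
    pnl: int

@dataclass(frozen=True)
class SettledBet:
    bet_id: str
    stake: int
    total_pnl: int
    positions: Tuple[Position, ...]

def reconcile(open_retained: int, open_forwarded: int,
              bets: List[SettledBet]) -> List[str]:
    failures = []
    if open_retained != 0:
        failures.append(f"retained_open_liability is {open_retained}, expected 0")
    if open_forwarded != 0:
        failures.append(f"forwarded_open_liability is {open_forwarded}, expected 0")
    for bet in bets:
        stakes = sum(p.stake for p in bet.positions)
        if stakes != bet.stake:
            failures.append(f"{bet.bet_id}: position stakes {stakes} != bet stake {bet.stake}")
        retained = sum(p.pnl for p in bet.positions if p.kind == "RETAINED")
        forwarded = sum(p.pnl for p in bet.positions if p.kind == "FORWARDED")
        if retained + forwarded != bet.total_pnl:
            failures.append(f"{bet.bet_id}: retained + forwarded P&L != total P&L")
    return failures
```

Using Amit's split with MI winning (book P&L of -₹8,500 across positions of -₹5,100, -₹2,040, -₹680, and -₹680), every invariant holds and the function returns an empty list.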

How Partial Failures Are Detected and Resumed

The system runs a settlement monitor job every 30 seconds:

SETTLEMENT MONITOR
===================

Every 30 seconds, check:

1. Positions in PROCESSING state for > 60 seconds
→ The worker that claimed them probably crashed
→ Reset to PENDING (the SELECT FOR UPDATE lock is released on crash anyway)

2. Positions in FAILED state
→ Retry count < 5: reset to PENDING for auto-retry
→ Retry count >= 5: escalate to DLQ for manual resolution

3. Events where some positions are CONFIRMED but others are still PENDING/FAILED
→ This is a partial settlement
→ Generate a report: "Event X: 3,847 of 4,000 positions settled. 153 pending retry."
→ If any positions have been FAILED for > 15 minutes: alert admin
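The monitor's per-position decisions (checks 1 and 2, plus the 15-minute alert from check 3) can be sketched as a pure function; persisting state changes, building the partial-settlement report, and sending alerts are left out. The `TrackedPosition` shape is hypothetical; the thresholds are the ones listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

STUCK_AFTER = timedelta(seconds=60)
MAX_AUTO_RETRIES = 5
ALERT_AFTER = timedelta(minutes=15)

@dataclass(frozen=True)
class TrackedPosition:
    position_id: str
    state: str              # PENDING, PROCESSING, SETTLED, FAILED, CONFIRMED
    state_since: datetime
    retry_count: int = 0

def monitor_pass(positions: List[TrackedPosition],
                 now: datetime) -> Tuple[Dict[str, str], bool]:
    """Return ({position_id: action}, alert_admin)."""
    actions: Dict[str, str] = {}
    alert = False
    for p in positions:
        age = now - p.state_since
        if p.state == "PROCESSING" and age > STUCK_AFTER:
            actions[p.position_id] = "RESET_TO_PENDING"   # worker likely crashed
        elif p.state == "FAILED":
            actions[p.position_id] = ("RESET_TO_PENDING" if p.retry_count < MAX_AUTO_RETRIES
                                      else "ESCALATE_TO_DLQ")
            alert = alert or age > ALERT_AFTER
    return actions, alert
```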

Walk-Through: IPL Final, 4,000 Positions, DB Timeout at Position 2,847

Setup: MI vs CSK IPL final. MI wins. The settlement dispatcher partitions 4,000 positions across 50 agents and launches settlement workers.

Timeline:

10:45:00 PM: Event result received. Settlement dispatcher starts.

10:45:01 PM: Workers launched. Each worker begins processing their batch.

10:45:05 PM: Workers 1-30 complete successfully. 2,200 positions settled.

10:45:08 PM: Worker 31 (processing Suresh's 310 positions) hits a database timeout at position 2,847 (globally numbered). The worker has already settled 147 of Suresh's positions.

Worker 31 Status:
Suresh's positions: 310 total
SETTLED: 147 (P&L calculated and committed)
PROCESSING: 1 (position 148 -- the one that timed out)
PENDING: 162 (not yet attempted)
FAILED: 0 (the timeout means position 148 is still PROCESSING)

10:45:08 PM: Worker 31 catches the timeout. It sets position 148 to FAILED with reason "DB_TIMEOUT". It continues to position 149.

10:45:09 PM: Workers 32-50 continue processing other agents' positions, unaffected by Suresh's timeout. (Rajesh's 180 positions and Vikram's 420 positions, handled by Workers 1 and 2, had already settled successfully.)

10:45:12 PM: Worker 31 finishes Suresh's remaining positions. Result:

Worker 31 Final Status:
Suresh's positions: 310 total
SETTLED: 309
FAILED: 1 (position 148, DB_TIMEOUT)

10:45:15 PM: All other workers complete. Global status:

EVENT SETTLEMENT STATUS: MI vs CSK
====================================
Total positions: 4,000
SETTLED: 3,999
FAILED: 1
CONFIRMED: 0 (reconciliation pending)

Failed position:
Agent: Suresh
Position ID: pos_2847
Bet: ₹8,000 on MI to win @ 1.85
Failure: DB_TIMEOUT at ledger update
Retry: scheduled in 30 seconds

10:45:45 PM: The settlement monitor picks up the FAILED position. It resets it to PENDING. A worker claims it and retries. This time, the database is healthy. The position settles successfully.

10:46:00 PM: Reconciliation runs for all agents. All invariants pass. All 4,000 positions move to CONFIRMED.

10:46:01 PM: Agents see settlement results on their dashboards:

RAJESH'S SETTLEMENT: MI vs CSK
================================
Result: MI wins

Your retained positions:
56 bets backing CSK: +₹2,34,000 (punters lost)
24 bets backing MI: -₹1,18,000 (punters won)
Net P&L: +₹1,16,000

Forwarded positions settled with Vikram:
Forwarded P&L: -₹42,000 (Vikram owes you ₹42,000)

Settlement time: 60 seconds (from event result to confirmed)

25. Cash-Out / Early Settlement Design

What Is Cash-Out?

Cash-out allows a punter to settle their bet early, before the event finishes. The punter locks in a guaranteed profit (or limits a loss) instead of waiting for the final result. From the system's perspective, cash-out is not magic -- it is simply a counter-bet at current odds that closes out the original position.

How Cash-Out Price Is Calculated

The cash-out price is the fair value of the punter's position, minus a margin for the platform. Here is the formula:

CASH-OUT CALCULATION
=====================

Original bet: Back MI to win at odds 1.85, stake ₹10,000
→ Potential win if MI wins: ₹8,500
→ Loss if MI loses: ₹10,000

Current situation: MI now at odds 1.20 (MI dominating)
→ The bet is "in profit" because MI is more likely to win now

Fair value of the position:
→ Lock in the result by laying MI at the current odds (1.20)
with a lay stake of Stake * (Original odds / Current odds)
= ₹10,000 * (1.85 / 1.20) = ₹15,417
→ If MI wins: +₹8,500 (back bet) - ₹3,083 (lay liability, ₹15,417 * 0.20) = +₹5,417
→ If MI loses: -₹10,000 (back bet) + ₹15,417 (lay bet) = +₹5,417
→ The same ₹5,417 profit is guaranteed either way

Cash-out value (fair):
= Stake * (Original odds / Current odds)
= ₹10,000 * (1.85 / 1.20)
= ₹10,000 * 1.5417
= ₹15,417

This is the TOTAL return. The profit portion:
= ₹15,417 - ₹10,000 = ₹5,417

This is the fair value. Now apply the cash-out margin (typically 3-5%):
Cash-out margin: 5%
Cash-out offer = ₹15,417 * (1 - 0.05) = ₹14,646

Punter profit if they cash out: ₹14,646 - ₹10,000 = ₹4,646
(vs. ₹8,500 profit if MI wins, or the full ₹10,000 stake lost if MI loses)
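The fair-value formula can be checked against an explicit hedge. The minimal Python sketch below (hypothetical function names, no margin applied, plain floats for brevity) lays the bet's potential return at the current odds and confirms that the punter's profit is then the same whichever way the match goes. That locked-in profit plus the original stake is exactly stake * original_odds / current_odds.

```python
def hedge_lay_stake(stake: float, original_odds: float, current_odds: float) -> float:
    """Lay stake at the current odds that equalises both outcomes of a back bet."""
    return stake * original_odds / current_odds


def hedged_profits(stake: float, original_odds: float,
                   current_odds: float) -> tuple[float, float]:
    """Punter's net profit (if the selection wins, if it loses) after the hedge."""
    lay = hedge_lay_stake(stake, original_odds, current_odds)
    # Selection wins: the back bet pays stake*(odds-1); the lay costs lay*(current-1).
    if_wins = stake * (original_odds - 1) - lay * (current_odds - 1)
    # Selection loses: the back stake is lost; the lay stake is won.
    if_loses = lay - stake
    return if_wins, if_loses


if __name__ == "__main__":
    wins, loses = hedged_profits(10_000, 1.85, 1.20)
    print(f"{wins:,.2f} / {loses:,.2f}")  # 5,416.67 / 5,416.67
```

Algebraically both outcomes reduce to stake * original_odds / current_odds - stake, which is why the ratio formula is the fair value.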

The General Cash-Out Formula

For a back bet:

cash_out_return = stake * (original_odds / current_odds) * (1 - margin)
cash_out_profit = cash_out_return - stake

For a losing position (odds moved against the punter):

Original: MI at 1.85, stake ₹10,000
Current: MI at 3.50 (MI struggling)

cash_out_return = 10,000 * (1.85 / 3.50) * (1 - 0.05)
= 10,000 * 0.5286 * 0.95
= ₹5,021

Punter gets back ₹5,021 of their ₹10,000 stake.
They accept a ₹4,979 loss now instead of risking the full ₹10,000.
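A sketch of the general formula, assuming Python's `decimal` module is used so that rupee amounts avoid binary-float rounding. The function name and the rule of rounding the offer to the nearest whole rupee (ROUND_HALF_UP) are assumptions, not part of this design.

```python
from decimal import Decimal, ROUND_HALF_UP


def cash_out_return(stake: Decimal, original_odds: Decimal,
                    current_odds: Decimal, margin: Decimal) -> Decimal:
    """Cash-out offer for a back bet: stake * (original / current) * (1 - margin)."""
    if current_odds <= 1:
        raise ValueError("current odds must be greater than 1.0")
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    fair = stake * original_odds / current_odds
    # Assumption: offers are rounded to the nearest whole rupee.
    return (fair * (1 - margin)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    stake, odds, margin = Decimal("10000"), Decimal("1.85"), Decimal("0.05")
    for current in (Decimal("1.20"), Decimal("3.50")):
        offer = cash_out_return(stake, odds, current, margin)
        print(current, offer, offer - stake)
```

Run as a script, it prints an offer of 14646 (profit 4646) at current odds of 1.20 and 5021 (loss 4979) at 3.50, matching both worked examples.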

How Cash-Out Routes Through the Cascade

This is the critical design decision. The cash-out counter-bet must route through the SAME proportions as the original bet, not the current matrix.

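Why? Because the positions are held at specific agents in specific proportions. If Rajesh retained 60% of the original bet, he holds 60% of the position, so he must fund 60% of the cash-out, which closes his share completely. Routing the counter-bet through the current matrix, whose percentages may have changed since the bet was placed, could close more or less than each agent actually holds and leave residual positions behind.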

How Each Agent's Position Changes

BEFORE CASH-OUT (MI at 1.20)
==============================

Agent      Retained Stake   Potential Payout       Net Exposure
--------   --------------   --------------------   ----------------
Rajesh     ₹6,000           ₹11,100 (if MI wins)   ₹5,100 liability
Vikram     ₹2,400           ₹4,440 (if MI wins)    ₹2,040 liability
Platform   ₹1,600           ₹2,960 (if MI wins)    ₹1,360 liability

AFTER CASH-OUT
===============

All positions CLOSED. Each agent's P&L is locked in:

Agent      Retained Stake   Cash-out Portion    Agent's P&L
--------   --------------   -----------------   -----------
Rajesh     ₹6,000           Pays ₹8,787.60*     -₹2,787.60
Vikram     ₹2,400           Pays ₹3,515.04*     -₹1,115.04
Platform   ₹1,600           Pays ₹2,343.36*     -₹743.36
                            ─────────────────
                            Total: ₹14,646.00

* Each agent pays its share of the cash-out return (by original routing proportion). Its P&L is its retained stake minus that payout.
Rajesh: (₹14,646 * 60%) = ₹8,787.60 payout on ₹6,000 retained = -₹2,787.60 P&L
(This is a loss for Rajesh because MI is winning and he held the liability side)
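The split above can be sketched as follows, assuming Decimal amounts, shares stored as fractions of the original bet, and paise-level rounding in which any residue lands on the last agent in the list (here the platform). `split_cash_out` is a hypothetical helper, not an existing Hannibal function.

```python
from decimal import Decimal, ROUND_HALF_UP

PAISE = Decimal("0.01")


def split_cash_out(cash_out_return: Decimal,
                   shares: dict[str, Decimal],
                   retained_stakes: dict[str, Decimal]) -> dict[str, tuple[Decimal, Decimal]]:
    """Split a cash-out payout by the ORIGINAL routing shares.

    Returns {agent: (payout, pnl)} where pnl = retained stake - payout.
    Rounding residue goes to the last agent listed (assumed: the platform).
    """
    if sum(shares.values()) != 1:
        raise ValueError("routing shares must sum to 1")
    result: dict[str, tuple[Decimal, Decimal]] = {}
    allocated = Decimal("0")
    agents = list(shares)
    for i, agent in enumerate(agents):
        if i == len(agents) - 1:
            payout = cash_out_return - allocated  # absorbs any rounding residue
        else:
            payout = (cash_out_return * shares[agent]).quantize(PAISE, rounding=ROUND_HALF_UP)
        allocated += payout
        result[agent] = (payout, retained_stakes[agent] - payout)
    return result


if __name__ == "__main__":
    shares = {"Rajesh": Decimal("0.60"), "Vikram": Decimal("0.24"), "Platform": Decimal("0.16")}
    stakes = {"Rajesh": Decimal("6000"), "Vikram": Decimal("2400"), "Platform": Decimal("1600")}
    for agent, (payout, pnl) in split_cash_out(Decimal("14646"), shares, stakes).items():
        print(agent, payout, pnl)
```

With the 60/24/16 split it reproduces the payouts and P&L figures in the table, and the payouts always sum exactly to the cash-out return.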

Partial Cash-Out

Amit does not have to cash out 100% of his position. He can cash out any percentage.

Example: Amit cashes out 50% of his position

PARTIAL CASH-OUT (50%)
========================

Original: ₹10,000 on MI to win at 1.85
Cash-out: 50% at current odds 1.20

Cash-out portion: ₹5,000 (50% of original stake)
Cash-out return: ₹5,000 * (1.85 / 1.20) * 0.95 = ₹7,323
Cash-out profit: ₹7,323 - ₹5,000 = ₹2,323

Remaining active position: ₹5,000 on MI to win at 1.85
→ This continues normally, settles with the event result

Agent impact (using original 60/24/16 split on the cashed-out portion):
Rajesh: closes 60% of ₹5,000 = ₹3,000 position
Vikram: closes 24% of ₹5,000 = ₹1,200 position
Platform: closes 16% of ₹5,000 = ₹800 position

Each agent's REMAINING open position is also halved:
Rajesh: ₹3,000 retained stake remaining (was ₹6,000)
Vikram: ₹1,200 retained stake remaining (was ₹2,400)
Platform: ₹800 retained stake remaining (was ₹1,600)
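A partial cash-out applies one fraction at every level of the cascade. This hypothetical helper (Decimal amounts assumed) returns, for each agent, the stake that is closed and the stake that stays open; the cash-out return on the closed portion is computed with the general formula above.

```python
from decimal import Decimal


def partial_cash_out(fraction: Decimal,
                     retained_stakes: dict[str, Decimal]) -> dict[str, tuple[Decimal, Decimal]]:
    """Split each agent's retained stake into (closed, still open).

    The same fraction is applied at every level, so the original routing
    proportions (e.g. 60/24/16) hold for both the closed and the open parts.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    result: dict[str, tuple[Decimal, Decimal]] = {}
    for agent, stake in retained_stakes.items():
        closed = stake * fraction
        result[agent] = (closed, stake - closed)
    return result
```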

What Happens If Betfair Liquidity Is Insufficient

When the platform's portion was originally hedged on Betfair, the cash-out must also close that hedge. If Betfair does not have sufficient liquidity to close the hedge at the expected price:

| Situation | What Happens |
|-----------|--------------|
| Hedge close-out is fully available | Platform closes hedge, cash-out proceeds normally |
| Hedge close-out is partially available | Platform closes what it can, retains the unhedged remainder as platform risk. Cash-out still proceeds for the punter (the punter's experience is not degraded by hedge-side issues). |
| Hedge close-out is unavailable (Betfair down or no liquidity) | Platform absorbs the full hedge portion as retained risk. Cash-out proceeds for the punter. Platform bears the residual risk. |

The punter always gets their cash-out amount. The platform absorbs liquidity risk on the hedge side.
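The fallback rule can be expressed as a small function. This is a sketch only: `available_liquidity` is an assumed input representing how much of the hedge Betfair can absorb at an acceptable price, and the result type is hypothetical. The punter's payout is never conditional on the hedge.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class HedgeCloseOutResult:
    closed: Decimal             # hedge amount flattened on the exchange
    residual_platform: Decimal  # unhedged remainder kept as platform risk
    punter_paid: bool           # the punter's cash-out proceeds regardless


def close_platform_hedge(hedge_to_close: Decimal,
                         available_liquidity: Decimal) -> HedgeCloseOutResult:
    """Close as much of the platform's hedge as liquidity allows.

    available_liquidity is an assumed input: what the exchange can absorb at an
    acceptable price (0 if Betfair is down or the market is empty).
    """
    if hedge_to_close < 0 or available_liquidity < 0:
        raise ValueError("amounts must be non-negative")
    closed = min(hedge_to_close, available_liquidity)
    return HedgeCloseOutResult(closed=closed,
                               residual_platform=hedge_to_close - closed,
                               punter_paid=True)
```

The three rows of the table map to liquidity covering the whole hedge, part of it, or none of it.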

How Cash-Out Interacts With NO_NEW_RISK

Cash-out creates a counter-position that reduces the agent's exposure. Therefore, it should be treated as a hedge and allowed even when NO_NEW_RISK is active.

NO_NEW_RISK CHECK FOR CASH-OUT
================================

1. Agent is in NO_NEW_RISK for MI vs CSK
2. A cash-out request arrives for a bet on MI to win
3. The cash-out creates a COUNTER-position (effectively a lay on MI)
4. This REDUCES the agent's worst-case liability
5. Therefore: ALLOWED, even under NO_NEW_RISK

This is the same logic as normal hedge detection:
If WorstCaseLiability AFTER < WorstCaseLiability BEFORE → Allow

Walk-Through: Amit Bets MI at 1.85, MI Now at 1.20, Amit Cashes Out

The original bet (placed at 7:30 PM):

Before the MI vs CSK IPL match starts, Amit places ₹10,000 on MI to win at odds 1.85.

Routing (from original audit record):

  • Rajesh retains 60%: ₹6,000 stake, ₹5,100 liability
  • Vikram retains 24%: ₹2,400 stake, ₹2,040 liability
  • Platform retains 16%: ₹1,600 stake, ₹1,360 liability (₹800 hedged on Betfair)

The match situation (9:15 PM, 15th over):

MI is 145/2, well on track. MI's odds have dropped from 1.85 to 1.20. Amit is sitting on a healthy unrealized profit.

Amit requests cash-out (9:15 PM):

Step 1: System loads original bet audit record. Proportions: 60/24/16.

Step 2: System gets current MI odds: 1.20 (from the live odds feed).

Step 3: Calculate cash-out value:

cash_out_return = 10,000 * (1.85 / 1.20) * (1 - 0.05)
= 10,000 * 1.5417 * 0.95
= ₹14,646

Amit's profit: ₹14,646 - ₹10,000 = ₹4,646

Step 4: System presents offer to Amit:

┌──────────────────────────────────────────────┐
│ CASH OUT                                     │
│                                              │
│ Your bet:      MI to win @ 1.85              │
│ Current odds:  MI @ 1.20                     │
│                                              │
│ Cash out for:  ₹14,646                       │
│ Your profit:   ₹4,646                        │
│                                              │
│ (If MI wins, you would win ₹8,500)           │
│ (If MI loses, you lose ₹10,000)              │
│                                              │
│ [Accept Cash-Out]  [Keep Bet Active]         │
└──────────────────────────────────────────────┘

Step 5: Amit accepts. The system creates counter-positions:

CASH-OUT EXECUTION
===================

Rajesh: CLOSE position of ₹6,000
Original liability: ₹5,100
Cash-out cost (his share): ₹14,646 * 0.60 = ₹8,787.60
Rajesh P&L: ₹6,000 (stake received) - ₹8,787.60 (cash-out paid) = -₹2,787.60
Rajesh exposure change: -₹5,100 (liability removed)

Vikram: CLOSE position of ₹2,400
Original liability: ₹2,040
Cash-out cost (his share): ₹14,646 * 0.24 = ₹3,515.04
Vikram P&L: ₹2,400 - ₹3,515.04 = -₹1,115.04
Vikram exposure change: -₹2,040 (liability removed)

Platform: CLOSE position of ₹1,600
Original liability: ₹1,360
Cash-out cost (its share): ₹14,646 * 0.16 = ₹2,343.36
Platform P&L: ₹1,600 - ₹2,343.36 = -₹743.36
Platform exposure change: -₹1,360 (liability removed)
Platform also closes Betfair hedge: counter-trade to flatten

Verification: ₹8,787.60 + ₹3,515.04 + ₹2,343.36 = ₹14,646.00 ✓

Step 6: Amit sees: "Cash-out successful. ₹14,646 credited to your account."

Step 7: Rajesh's dashboard updates: "Amit cashed out MI to win. Your exposure on MI vs CSK reduced by ₹5,100. P&L impact: -₹2,787.60."

Why the agents "lose" on this cash-out: MI is winning. The agents were on the liability side (they pay if MI wins). By cashing Amit out, they are locking in a loss. But this loss is smaller than what they would pay if MI wins and Amit's full ₹8,500 potential win materializes. The agents are actually reducing their worst-case outcome.

If MI had collapsed and lost, the agents would have profited ₹10,000 (the full stake). By allowing cash-out, they gave up some of that potential upside. This is the trade-off -- cash-out reduces volatility for everyone.


26. Lay Bet Support

What Is a Lay Bet?

A lay bet is the opposite of a back bet. When Amit backs MI to win, he is betting that MI will win. When Sonia lays MI to win, she is betting that MI will NOT win. Sonia wins if MI draws or loses.

In exchange-style betting (Betfair), every back bet has a corresponding lay bet. In B-Book systems, the agent is typically the layer -- they lay every back bet the punter places. But Hannibal must also support punters placing lay bets explicitly, because some markets and some punters operate this way.

How Liability Is Different for Lay Bets

For a back bet, the punter risks their stake and wins stake * (odds - 1). For a lay bet, the punter risks stake * (odds - 1) and wins the stake.

BACK BET: Amit backs MI to win at 1.85 for ₹10,000
Amit risks: ₹10,000 (his stake)
Amit wins: ₹8,500 (if MI wins)
Bookie liability: ₹8,500

LAY BET: Sonia lays MI to win at 1.85 for ₹10,000
Sonia risks: ₹8,500 (her liability = stake * (odds - 1))
Sonia wins: ₹10,000 (if MI does NOT win)
Bookie liability: ₹10,000 (bookie pays if MI does NOT win)

The critical difference for exposure tracking: A lay bet's liability is stake * (odds - 1) for the punter, but from the bookie's (agent's) perspective, the liability depends on which outcome occurs.

Exposure Tracking for Lay Bets

This is where it gets interesting. A lay bet on MI to win is economically equivalent to a back bet on "MI not to win." This means:

EXPOSURE IMPACT OF LAY BETS
=============================

Current position on MI vs CSK (Rajesh's book):

Before any lay bets:
MI Win outcome: ₹5,00,000 liability (back bets on MI)
MI Not Win outcome: ₹0 liability

Sonia lays MI to win for ₹10,000 at 1.85 (for simplicity, assume Rajesh retains the whole bet):
If MI wins: Sonia pays ₹8,500 to Rajesh → REDUCES MI Win liability
If MI not win: Rajesh pays ₹10,000 to Sonia → INCREASES MI Not Win liability

After Sonia's lay bet:
MI Win outcome: ₹4,91,500 liability (reduced by ₹8,500!)
MI Not Win outcome: ₹10,000 liability (increased by ₹10,000)

WORST CASE BEFORE: ₹5,00,000 (MI Win)
WORST CASE AFTER: ₹4,91,500 (MI Win is still worse, but reduced)

Key insight: A lay bet on outcome X DECREASES the agent's exposure on outcome X and INCREASES exposure on the other outcomes. This is a natural hedge.
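The ledger update for a retained bet can be sketched per outcome. This assumes a two-outcome view of one selection (WIN / NOT_WIN) and Decimal amounts; the function names are hypothetical. Unlike the simplified illustration above, the sketch also records the stake collected from a losing backer as a negative liability; that does not change the lay-bet figures.

```python
from decimal import Decimal


def exposure_delta(side: str, stake: Decimal, odds: Decimal,
                   retained_share: Decimal = Decimal("1")) -> dict[str, Decimal]:
    """Change in an agent's liability per outcome from a bet it retains.

    Positive values are money the agent pays out in that outcome; negative
    values are money it collects. side is the punter's side: "BACK" or "LAY".
    """
    s = stake * retained_share
    if side == "BACK":
        return {"WIN": s * (odds - 1), "NOT_WIN": -s}
    if side == "LAY":
        return {"WIN": -s * (odds - 1), "NOT_WIN": s}
    raise ValueError(f"unknown side: {side!r}")


def apply_delta(book: dict[str, Decimal], delta: dict[str, Decimal]) -> dict[str, Decimal]:
    """Return a new book with the per-outcome changes applied."""
    return {outcome: book[outcome] + delta.get(outcome, Decimal(0)) for outcome in book}


if __name__ == "__main__":
    book = {"WIN": Decimal("500000"), "NOT_WIN": Decimal("0")}
    after = apply_delta(book, exposure_delta("LAY", Decimal("10000"), Decimal("1.85")))
    # WIN liability falls to 491500; NOT_WIN rises to 10000, as in the example above.
    print(after)
```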

How the Forwarding Matrix Handles Lay Bets

Lay bets use the same 5-dimensional matrix as back bets, with one addition: the matrix resolution considers whether the bet direction (back vs lay) creates a hedge.

FORWARDING MATRIX RESOLUTION FOR LAY BETS
============================================

Step 1: Resolve the forward percentage normally
→ Same matrix, same rules, same precedence chain
→ Sonia's lay MI at 1.85 matches Rule R3: forward 40%

Step 2: Check if this lay bet is a natural hedge for the agent
→ Does the agent have existing liability on MI Win? YES (₹5,00,000)
→ Does this lay bet reduce that liability? YES (by ₹8,500)
→ Therefore: this is a hedge

Step 3: If hedge AND agent is in NO_NEW_RISK:
→ Allow the retention (do not force 100% forward)
→ The agent WANTS to keep this bet because it reduces their exposure

Step 4: If hedge AND agent is NOT in NO_NEW_RISK:
→ Normal matrix rules apply
→ Agent retains their configured percentage

The forwarding matrix does not need a separate "lay" dimension. The bet is processed through the same rules. The only difference is in how the exposure ledger is updated and how NO_NEW_RISK evaluates the bet.
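Steps 1 to 4 can be condensed into one decision per agent. In this sketch (hypothetical names, Decimal amounts), the forward percentage is assumed to come from the normal matrix lookup, and the hedge test is the WorstCaseLiability rule used throughout this document, evaluated on the share the agent would actually retain.

```python
from decimal import Decimal


def worst_case(book: dict[str, Decimal]) -> Decimal:
    """WorstCaseLiability: the largest liability across all outcomes."""
    return max(book.values())


def resolve_retention(forward_pct: Decimal, no_new_risk: bool,
                      book: dict[str, Decimal],
                      full_bet_delta: dict[str, Decimal]) -> Decimal:
    """Share of a bet (0 to 1) that one agent retains.

    forward_pct: result of normal matrix resolution (Step 1), e.g. Decimal(40).
    full_bet_delta: per-outcome liability change if the agent kept 100% of the bet.
    """
    retain = 1 - forward_pct / 100
    if not no_new_risk:
        return retain  # Step 4: normal matrix rules apply
    after = {o: book[o] + full_bet_delta.get(o, Decimal(0)) * retain for o in book}
    if worst_case(after) < worst_case(book):
        return retain  # Step 3: a hedge keeps its configured retention
    return Decimal(0)  # not a hedge: forward 100% under NO_NEW_RISK


if __name__ == "__main__":
    lay = {"WIN": Decimal("-8500"), "NOT_WIN": Decimal("10000")}  # Sonia's lay, 100% view
    rajesh = {"WIN": Decimal("498000"), "NOT_WIN": Decimal("45000")}
    print(resolve_retention(Decimal(40), True, rajesh, lay))  # 0.6
```

With Rajesh's book from the walk-through below, the retained 60% lowers his worst case from ₹4,98,000 to ₹4,92,900, so the bet is kept even under NO_NEW_RISK.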

How Hedge Detection Recognizes Lay Bets

The existing hedge detection formula applies to lay bets unchanged:

HEDGE DETECTION FOR LAY BETS
==============================

Rule: If WorstCaseLiability AFTER < WorstCaseLiability BEFORE → it is a hedge

Before Sonia's lay bet:
Worst case (MI Win): ₹5,00,000
Worst case (MI Not Win): ₹0
WorstCaseLiability: MAX(₹5,00,000, ₹0) = ₹5,00,000

After Sonia's lay bet (Rajesh's portion, 60%):
Rajesh keeps 60% of Sonia's lay → ₹6,000 stake, ₹5,100 liability adjustment
Worst case (MI Win): ₹5,00,000 - ₹5,100 = ₹4,94,900
Worst case (MI Not Win): ₹0 + ₹6,000 = ₹6,000
WorstCaseLiability: MAX(₹4,94,900, ₹6,000) = ₹4,94,900

₹4,94,900 < ₹5,00,000 → WorstCaseLiability decreased → HEDGE CONFIRMED

How NO_NEW_RISK Correctly Allows Hedging Lay Bets

When would a lay bet NOT be a hedge? If Rajesh's book is already heavily exposed on "MI Not Win" (meaning he has lots of bets backing CSK or backing the draw), so that adding to that outcome pushes it above his current worst case, then a lay on MI increases his worst-case liability. In that scenario, the lay bet is NOT a hedge and is forwarded 100% under NO_NEW_RISK.

Walk-Through: Sonia Lays MI to Win at 1.85 While Rajesh Is in NO_NEW_RISK

Setup: Rajesh has hit his per-match limit on MI vs CSK. His current exposure:

Rajesh's MI vs CSK Book:
MI Win outcome: ₹4,98,000 liability ← this is the worst case
MI Not Win outcome: ₹45,000 liability
Match limit: ₹5,00,000
Status: NO_NEW_RISK (₹4,98,000 / ₹5,00,000 = 99.6%)

Sonia places a lay bet: Lay MI to win at 1.85 for ₹10,000.

Step 1: Win cap check. Sonia's maximum win on this lay bet is ₹10,000 (the backer's stake she collects if MI does not win); her liability is ₹8,500 (stake * (odds - 1)). Her per-click win cap is ₹50,000. Pass.

Step 2: Matrix resolution. Rajesh's matrix says forward 40% for this bet type. Rajesh would retain 60%.

Step 3: NO_NEW_RISK check. Rajesh is in NO_NEW_RISK. Is this a hedge?

Rajesh retains 60% of Sonia's lay:
→ Stake portion: ₹6,000
→ If MI wins: Rajesh RECEIVES ₹5,100 from Sonia (reduces MI Win liability)
→ If MI not win: Rajesh PAYS ₹6,000 to Sonia (increases MI Not Win liability)

New worst cases:
MI Win: ₹4,98,000 - ₹5,100 = ₹4,92,900
MI Not Win: ₹45,000 + ₹6,000 = ₹51,000

New worst case liability: MAX(₹4,92,900, ₹51,000) = ₹4,92,900
Old worst case liability: ₹4,98,000

₹4,92,900 < ₹4,98,000 → HEDGE CONFIRMED → ALLOW

Step 4: Position creation. Rajesh retains 60% of Sonia's lay bet. The exposure ledger is updated:

AFTER SONIA'S LAY BET:
Rajesh's MI vs CSK Book:
MI Win outcome: ₹4,92,900 liability ← reduced!
MI Not Win outcome: ₹51,000 liability
Match limit: ₹5,00,000
Status: still NO_NEW_RISK (but closer to exiting)

Step 5: Cascade. The remaining 40% (₹4,000) flows to Vikram, who processes it through his own matrix and cap checks normally.

Result: Rajesh's exposure on MI Win dropped from ₹4,98,000 to ₹4,92,900. If enough lay bets come in, Rajesh could exit NO_NEW_RISK entirely. The system correctly identified the lay bet as a hedge and allowed it.


27. Agent-Punter Collusion Detection

The Problem

Collusion between an agent and a punter is one of the most damaging exploits in a B-Book system. The basic scheme: Rajesh knows that sharp/winning bets get forwarded to his upline (Vikram). If Rajesh conspires with Amit, he can mark Amit as NORMAL (even though Amit is sharp), retain Amit's bets, and pocket the winnings. When Amit loses, Rajesh can retroactively mark him as SHARP to forward the losing flow upline.

A more sophisticated version: Rajesh marks Amit as SHARP to forward most of his bets upline. But Rajesh and Amit have agreed to split the profits. Amit consistently wins, Vikram consistently loses on the forwarded flow, and Rajesh and Amit split the difference off-platform.

Collusion Signals

| Signal | What It Looks Like | Severity |
|--------|--------------------|----------|
| Classification flip before winning streak | Agent changes user from NORMAL to SHARP (or vice versa), and within 24 hours the user has a winning streak | HIGH |
| Classification flip-flop | Agent changes user classification back and forth more than 3 times in a week | HIGH |
| Override matches outcome | User override percentage changes correlate with subsequent bet outcomes (higher forward when user wins, lower when user loses) | CRITICAL |
| Selective forwarding timing | Matrix changes coincide with specific user's betting patterns | HIGH |
| Forwarded flow consistently loses | Bets forwarded by this agent to upline have a significantly worse P&L than random chance | MEDIUM (could be legitimate if agent has good sharp detection) |
| Single user dominates forwarded flow | One user accounts for >30% of forwarded volume from this agent | MEDIUM |
| Win rate inversion | Agent's retained bets have above-market win rate while forwarded bets have below-market win rate | HIGH |

The Cooling-Off Period for Classification Changes

To prevent the "flip before win" exploit, classification changes have a mandatory cooling-off period:

CLASSIFICATION CHANGE RULES
=============================

When an agent changes a user's classification (e.g., NORMAL → SHARP):

1. The change is QUEUED, not applied immediately
2. Cooling-off period: 24 hours
3. During the cooling-off period:
→ The OLD classification remains active for matrix resolution
→ The new classification is visible as "PENDING" in the agent dashboard
→ The agent cannot change the classification again until the current change completes
4. After 24 hours: the new classification takes effect
5. Exception: SHARP → NORMAL direction has a 72-hour cooling-off period
(because downgrading from SHARP to NORMAL is the more exploitable direction)

OVERRIDE CHANGES are queued the same way, with direction-dependent cooling-off periods:
→ Changing a user override from 80% forward to 20% forward: 24-hour cooling-off
→ Changing from 20% to 80%: 6-hour cooling-off (less risky direction)

Why different periods for different directions? Moving a user from NORMAL to SHARP increases forwarding (less risky for the agent, more protective). Moving from SHARP to NORMAL decreases forwarding (agent retains more, potentially exploitable). The riskier direction gets a longer cooling-off period.
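The cooling-off rules can be sketched as a lookup plus a pending-change guard. The helper names are hypothetical, and generalising the two override examples (80% to 20%, 20% to 80%) to any decrease or increase in forwarding is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Cooling-off periods taken from the classification rules above.
CLASSIFICATION_COOLING_OFF = {
    ("NORMAL", "SHARP"): timedelta(hours=24),
    ("SHARP", "NORMAL"): timedelta(hours=72),
}


def classification_effective_at(requested_at: datetime, old: str, new: str) -> datetime:
    """When a queued classification change takes effect."""
    try:
        return requested_at + CLASSIFICATION_COOLING_OFF[(old, new)]
    except KeyError:
        raise ValueError(f"no cooling-off rule for {old} -> {new}") from None


def override_effective_at(requested_at: datetime, old_forward_pct: int,
                          new_forward_pct: int) -> datetime:
    """Lowering forwarding (agent retains more) waits 24h; raising it waits 6h.

    Assumption: the 80%->20% / 20%->80% examples generalise to any decrease / increase.
    """
    if new_forward_pct == old_forward_pct:
        return requested_at
    hours = 24 if new_forward_pct < old_forward_pct else 6
    return requested_at + timedelta(hours=hours)


def can_queue_change(now: datetime, pending_effective_at: Optional[datetime]) -> bool:
    """An agent cannot queue a new change while an earlier one is still pending."""
    return pending_effective_at is None or now >= pending_effective_at
```

While a change is pending, matrix resolution would keep reading the old classification; only `classification_effective_at` decides when the new one becomes visible.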

Upline Audit Rights on Downstream Overrides

Vikram (upline) has the right to see and challenge classification changes made by Rajesh (downstream):

UPLINE AUDIT RIGHTS
=====================

Vikram can see (real-time):
✓ All user classifications set by Rajesh
✓ All pending classification changes
✓ History of all classification changes with timestamps
✓ Correlation report: classification changes vs user outcomes

Vikram can do:
✓ Flag a classification change for review
✓ Request the platform to freeze Rajesh's override capability
✓ Set minimum forwarding for specific users (override Rajesh's override)

Vikram CANNOT do:
✗ Directly change Rajesh's user classifications (that is Rajesh's business)
✗ See Rajesh's full user list (only users whose bets are forwarded to Vikram)

The Correlation Engine

The anti-collusion system runs a correlation analysis nightly:

COLLUSION CORRELATION ANALYSIS
================================

For each agent, for each user with classification changes in the last 30 days:

1. Build timeline:
[Classification change timestamps] + [Bet placement timestamps] + [Bet outcomes]

2. Calculate: Within 48 hours after each classification change:
→ Count of bets placed by this user
→ Win rate of those bets
→ Compare against the user's historical win rate
→ Compare against the market-expected win rate

3. Score the correlation:
→ If win rate AFTER classification change is >2 standard deviations above normal:
COLLUSION_SCORE += 25 per occurrence
→ If classification was changed from SHARP to NORMAL just before a winning streak:
COLLUSION_SCORE += 50
→ If the same pattern repeats 3+ times:
COLLUSION_SCORE += 100

4. Thresholds:
→ COLLUSION_SCORE < 25: No action
→ 25-74: Informational alert to platform compliance team
→ 75-149: Warning to agent + upline notification
→ 150+: Automatic freeze on agent's override capability, mandatory review
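A minimal scoring sketch, under several assumptions the analysis above leaves open: "more than 2 standard deviations" is read as a binomial z-score of the post-change win rate against the market-expected win rate only (the historical-rate comparison is omitted), the +50 is applied per change, and "the same pattern repeats 3+ times" means three or more anomalous windows. All names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeWindow:
    """Bets a user placed in the 48 hours after one classification change."""
    bets: int
    wins: int
    expected_win_rate: float             # market-expected, e.g. from the odds profile
    sharp_to_normal_before_streak: bool


def win_rate_z(w: ChangeWindow) -> float:
    """Binomial z-score of the observed win rate against the expected one."""
    p = w.expected_win_rate
    if w.bets == 0 or not 0 < p < 1:
        return 0.0
    standard_error = math.sqrt(p * (1 - p) / w.bets)
    return (w.wins / w.bets - p) / standard_error


def collusion_score(windows: list[ChangeWindow]) -> int:
    score = 0
    anomalies = 0
    for w in windows:
        if win_rate_z(w) > 2:
            score += 25              # +25 per anomalous post-change window
            anomalies += 1
        if w.sharp_to_normal_before_streak:
            score += 50              # SHARP -> NORMAL just before a winning streak
    if anomalies >= 3:
        score += 100                 # the pattern repeats 3+ times
    return score


def action_for(score: int) -> str:
    if score < 25:
        return "NO_ACTION"
    if score < 75:
        return "ALERT_COMPLIANCE"
    if score < 150:
        return "WARN_AGENT_AND_NOTIFY_UPLINE"
    return "FREEZE_OVERRIDES_AND_REVIEW"
```

For example, 28 wins from 40 bets against an expected 48% win rate gives a z-score of about 2.8, so that window alone scores 25.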

Alert Escalation Workflow

Walk-Through: Rajesh and Amit Collude

The scheme: Rajesh and Amit have an agreement. Amit is genuinely sharp -- he has a positive CLV over 1,000+ bets. Rajesh knows this. Instead of marking Amit as SHARP (which would forward 95% to Vikram), Rajesh keeps Amit as NORMAL (forwarding only 40%). Amit wins consistently, and Rajesh profits because he retained 60% of winning bets. They split the extra profit off-platform.

Week 1: Amit places 45 bets. Wins 28. Rajesh retained 60% of each. Rajesh's retained P&L from Amit: +₹1,85,000.

The system's sharp detection flags Amit based on CLV and win rate. It suggests to Rajesh: "Amit shows sharp characteristics. Consider classifying as SHARP."

Rajesh ignores the suggestion.

Week 2: Amit places 50 bets. Wins 31. Rajesh's retained P&L from Amit: +₹2,10,000.

The system sends a stronger alert to Rajesh: "Amit's CLV is +4.2% over 95 bets. This is above the SHARP threshold. Classification recommended."

Rajesh still ignores it.

Week 2 (same time): The cross-agent detection system notices that Amit's win rate (62%) is significantly above expected (48% given the odds profile). It also notices that Rajesh has NOT classified Amit as SHARP despite the system's recommendation. This triggers:

ANOMALY DETECTION ALERT
=========================
Agent: Rajesh
User: Amit
Alert Type: SUSPECTED_CLASSIFICATION_MANIPULATION

Evidence:
1. Amit's 95-bet CLV: +4.2% (SHARP threshold: +2.5%)
2. System recommended SHARP classification 2 times
3. Agent has not acted on recommendations for 14 days
4. Agent's retained P&L from Amit: +₹3,95,000 (top 1% among all agents)
5. If classified as SHARP, 95% would have been forwarded to Vikram
Rajesh would have retained ~₹33,000 instead of ₹3,95,000 (a 5% share of each bet rather than 60%)

Collusion Score: 85 (WARNING level)

Actions Taken:
- Compliance team notified
- Vikram (upline) notified: "Your sub-agent Rajesh may be under-classifying user Amit"
- Rajesh's dashboard shows: "Compliance review pending for user Amit"

Week 3: Rajesh, realizing he has been flagged, marks Amit as SHARP. But the 24-hour cooling-off period for a NORMAL → SHARP change means the change does not take effect until the next day. During those 24 hours, Amit places 15 more bets at NORMAL classification.

Week 3 + 24 hours: Classification change takes effect. Amit's bets are now forwarded at 95%.

Meanwhile: The compliance team reviews the case. They see:

  • Rajesh ignored 2 system recommendations
  • Rajesh profited ₹4,50,000+ from a user who should have been classified as SHARP
  • The timing of the eventual classification change coincides exactly with the compliance alert

Outcome: The compliance team escalates to the platform operations team, who:

  1. Review the last 30 days of Amit's bets under Rajesh
  2. Calculate the financial impact: ₹4,50,000 in retained profit (at 60% retention) that would have been about ₹37,500 at SHARP classification (5% retention)
  3. Issue a clawback of the excess profit (₹4,12,500) from Rajesh's settlement account
  4. Place Rajesh on probation: his override capability is frozen for 90 days, all classifications are managed by the platform

28. Agent Hierarchy Migration

The Problem

Agent hierarchies are not static. In the real world, agents switch uplines all the time. Rajesh might leave Vikram's network and join Suresh's. This happens because of better commission terms, personal disputes, or business restructuring. The system must handle this migration cleanly, especially when Rajesh has open positions that were routed through Vikram.

Effective-Dated Hierarchy Changes

Hierarchy changes are never instantaneous. They take effect at a scheduled date and time, giving the system time to prepare:

HIERARCHY MIGRATION REQUEST
=============================

Request: Move Rajesh from Vikram to Suresh
Requested by: Platform admin
Effective date: 2026-03-01 00:00:00 IST (start of next weekly period)
Current state: Rajesh has ₹15,00,000 open liability forwarded through Vikram

Migration phases:
Phase 1 (NOW → effective date): PREPARATION
Phase 2 (effective date): CUTOVER
Phase 3 (effective date → cleanup): DUAL PATH
Phase 4 (after all old positions settle): COMPLETE

Dual-Path Settlement

This is the key design challenge. After cutover:

DUAL-PATH ROUTING
==================

BEFORE cutover (Feb 28):
Rajesh → Vikram → Platform → Betfair
All bets, all positions, all settlements go through Vikram

AFTER cutover (March 1):
NEW bets: Rajesh → Suresh → Platform → Betfair
OLD positions: Still settled through Vikram (he holds the positions!)

Why dual-path? Because Vikram's exposure ledgers reflect the positions he holds.
Settling them through Suresh would be incorrect -- Suresh never held that risk.

| Bet Timing | Routing Path | Settlement Path |
|------------|--------------|-----------------|
| Placed before cutover, settling before cutover | Rajesh → Vikram | Through Vikram |
| Placed before cutover, settling after cutover | Rajesh → Vikram | Through Vikram (dual-path) |
| Placed after cutover | Rajesh → Suresh | Through Suresh |
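The routing rule in the table depends only on when a bet was placed, never on when it settles. A sketch under assumed names (the `Migration` record and both helpers are hypothetical; treating a bet placed exactly at cutover as new is also an assumption):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

IST = timezone(timedelta(hours=5, minutes=30))


@dataclass(frozen=True)
class Migration:
    migration_id: str
    agent: str
    old_parent: str
    new_parent: str
    cutover: datetime


def upline_for(m: Migration, placed_at: datetime) -> str:
    """Upline that routes AND settles a bet, decided by placement time only.

    Settlement time is deliberately not an input: a bet placed before cutover
    settles through the old parent even if the event finishes after cutover.
    """
    return m.old_parent if placed_at < m.cutover else m.new_parent


def migration_tag(m: Migration, placed_at: datetime, open_at_cutover: bool) -> Optional[str]:
    """Positions still open at cutover carry the migration_id (routing_path OLD)."""
    if placed_at < m.cutover and open_at_cutover:
        return m.migration_id
    return None


if __name__ == "__main__":
    m = Migration("mig_rajesh_vikram_to_suresh_20260301", "Rajesh", "Vikram", "Suresh",
                  datetime(2026, 3, 1, 0, 0, tzinfo=IST))
    print(upline_for(m, datetime(2026, 2, 28, 21, 0, tzinfo=IST)))  # Vikram
    print(upline_for(m, datetime(2026, 3, 1, 19, 30, tzinfo=IST)))  # Suresh
```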

Open Exposure Handling During Transition

At cutover time, Rajesh has ₹15,00,000 in open liability forwarded to Vikram. This creates a financial obligation:

OPEN EXPOSURE RECONCILIATION
==============================

At cutover (March 1):

1. Freeze: No changes to Rajesh's forwarding through Vikram
→ Vikram's exposure ledger is frozen for Rajesh's old positions
→ New bets from Rajesh do NOT affect Vikram's ledgers

2. Track: Old positions are tagged with migration_id
→ Every position that existed at cutover time gets:
migration_id: "mig_rajesh_vikram_to_suresh_20260301"
routing_path: "OLD" (Vikram)

3. Settle: As old events settle, positions flow through Vikram
→ Vikram settles normally
→ When Vikram's Rajesh-related open liability reaches zero → dual-path ends

4. Financial bridge: If Rajesh owes Vikram (or vice versa) from old positions,
the settlement continues until all old positions are resolved

Financial Settlement Between Old and New Upline

The tricky part: what if Rajesh has a net credit with Vikram from events that have already settled but have not yet been paid out? And what about the weekly settlement cycle?

FINANCIAL SETTLEMENT AT MIGRATION
====================================

Step 1: Calculate Rajesh's net position with Vikram at cutover

Open positions forwarded to Vikram: ₹15,00,000 liability
Unpaid P&L (from recently settled events): ₹2,30,000 (Vikram owes Rajesh)

Step 2: Vikram pays the unpaid P&L to Rajesh immediately
→ ₹2,30,000 transferred (this is money already earned, not speculative)

Step 3: Open positions continue under Vikram until they settle
→ No money changes hands until settlement
→ Each settlement adjusts the balance between Rajesh and Vikram
→ Rajesh's weekly settlements with SURESH only include NEW bets

Step 4: When all old positions have settled:
→ Final reconciliation between Rajesh and Vikram
→ Any remaining balance settled
→ Migration status: COMPLETE
→ Vikram's records for Rajesh archived

Walk-Through: Rajesh Moves From Vikram to Suresh With 15 Lakh Open Liability

Background: Rajesh has been under Vikram for 3 years. Suresh offers better terms (lower forwarding commission). Rajesh negotiates the move. The platform admin approves the migration for March 1.

February 25 -- REQUESTED:

  • Admin creates migration request
  • System calculates: Rajesh has 42 open bets forwarded to Vikram, totaling ₹15,20,000 liability
  • Notification sent to Vikram: "Rajesh is migrating to Suresh, effective March 1. You have 42 open positions to settle."
  • Notification sent to Suresh: "Rajesh is joining your network, effective March 1."

February 26-28 -- PREPARATION:

  • Vikram confirms he is aware. No action needed from him.
  • Suresh confirms he is ready. His limits are checked: can he absorb Rajesh's typical daily flow?
  • Rajesh's forwarding matrix is cloned for the new relationship (he can modify it after cutover)
  • System pre-computes: "After cutover, Rajesh's bets will be processed by Suresh with these limits..."

March 1, 00:00 IST -- CUTOVER:

CUTOVER EXECUTED
==================

1. Rajesh's hierarchy parent changed: Vikram → Suresh
2. All existing positions tagged: migration_id = mig_rajesh_vikram_to_suresh_20260301
3. New bets from Rajesh's punters now route to Suresh

Dashboard shows:
Rajesh: "You are now under Suresh. 42 old positions still settling through Vikram."
Vikram: "Rajesh has moved. 42 open positions remain for settlement."
Suresh: "Rajesh has joined. New bets are now routing through you."

March 1-7 -- DUAL_PATH:

  • 35 of the 42 old positions settle during the week (events complete)
  • Each settlement flows through Vikram normally
  • New bets (150+ during the week) flow through Suresh

March 8 -- Remaining old positions:

  • 7 positions remain (from events that have not yet concluded)
  • These are long-dated bets (tournament winner, series result)
  • Rajesh and Vikram continue to settle these as the events conclude

March 15 -- Last old position settles:

  • The final pre-migration position settles
  • Net financial settlement between Rajesh and Vikram: Vikram owes Rajesh ₹42,000
  • Transfer executed
  • Migration status: COMPLETE
  • Vikram's records for Rajesh are archived
  • Dual-path routing for Rajesh is deactivated

29. Minimum Forwarding / Skin-in-the-Game Requirements

The Problem

Vikram does not want Rajesh to forward 100% of sharp bets. If Rajesh can forward every bet he expects to lose money on and keep every bet he expects to profit from, Vikram ends up absorbing nothing but toxic flow. Vikram wants to require Rajesh to have "skin in the game" -- a minimum share of every bet that Rajesh MUST retain, regardless of his matrix settings.

How Minimum Retention Works

The upline agent sets a minimum retention percentage per downstream agent. This floor cannot be overridden by the downstream agent's matrix.

MINIMUM RETENTION CONFIGURATION
=================================

Vikram's settings for his sub-agents:

| Sub-Agent | Min Retention | Why |
|-----------|--------------|-----|
| Rajesh | 20% | Experienced, trusted, but must keep skin in the game |
| Priya | 30% | Newer agent, should retain more to build discipline |
| Arun | 10% | Very experienced, low minimum needed |

Where It Is Checked in the Cascade

The minimum retention check happens AFTER matrix resolution but BEFORE cap evaluation.

The important nuance: If Rajesh's limits would be breached by retaining the minimum 20%, the system does NOT reject the bet. Instead, Rajesh retains as much as his limits allow (which may be less than 20%). The minimum retention is enforced as a floor on the MATRIX percentage, not an absolute floor on the final retained amount. Limits always win over matrix settings -- this is a safety rule.
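The ordering can be sketched as two steps: clamp the matrix percentage to the upline's floor, then let limits cut the retained amount further if needed. Names are hypothetical; `limit_headroom` stands for whatever capacity the agent's caps still allow for this bet.

```python
from decimal import Decimal


def effective_forward_pct(matrix_forward_pct: Decimal, min_retention_pct: Decimal) -> Decimal:
    """Floor on retention: forwarding can never exceed 100% minus the upline's minimum."""
    return min(matrix_forward_pct, 100 - min_retention_pct)


def retained_liability(bet_liability: Decimal, matrix_forward_pct: Decimal,
                       min_retention_pct: Decimal, limit_headroom: Decimal) -> Decimal:
    """Liability the agent keeps: floor first, then caps (limits always win)."""
    forward_pct = effective_forward_pct(matrix_forward_pct, min_retention_pct)
    wanted = bet_liability * (100 - forward_pct) / 100
    # Caps are evaluated after the floor, so the result may fall below the minimum.
    return max(Decimal(0), min(wanted, limit_headroom))


if __name__ == "__main__":
    print(effective_forward_pct(Decimal(100), Decimal(20)))  # 80
    print(retained_liability(Decimal(8500), Decimal(100), Decimal(20), Decimal(1000)))  # 1000
```

In the second call the 20% floor asks Rajesh to keep ₹1,700 of liability, but only ₹1,000 of headroom remains, so the limit wins.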

How Violations Are Handled

When Rajesh tries to configure his matrix in a way that violates Vikram's minimum retention:

SCENARIO: Rajesh tries to set 100% forwarding for SHARP users
Vikram's minimum retention requirement: 20%

SYSTEM RESPONSE:
┌────────────────────────────────────────────────────────┐
│ ⚠ Configuration Conflict                               │
│                                                        │
│ You set: Forward 100% for SHARP users                  │
│ Your upline (Vikram) requires: Minimum 20% retention   │
│                                                        │
│ Adjusted rule: Forward 80% for SHARP users             │
│ You will retain at least 20% of these bets.            │
│                                                        │
│ [Accept Adjusted Rule]  [Contact Vikram to Negotiate]  │
└────────────────────────────────────────────────────────┘

The system auto-adjusts the forwarding percentage to comply with the minimum retention. The agent sees the adjusted value. The agent cannot save a matrix rule that would violate their upline's minimum retention requirement.

Walk-Through: Vikram Requires 20%, Rajesh Tries to Forward 100% for Sharps

Setup: Vikram sets minimum retention of 20% for Rajesh. Rajesh, who has been losing money on a sharp user (Amit), wants to forward 100% of Amit's bets.

Attempt 1: Rajesh sets user override for Amit = 100% forward

System checks: 100% forward means 0% retention. Vikram's minimum is 20%. System response: "Cannot set forwarding above 80% for this user. Your upline requires 20% minimum retention. Adjusted to 80% forward."

Rajesh accepts. Amit's bets are now forwarded at 80%.

Attempt 2: Rajesh modifies his matrix Rule R1 (SHARP users) from 95% to 100%

System checks: 100% > max allowed (80%). System response: "Forwarding capped at 80%. Rule saved as 80% forward."

What about the catch-all rule? If Rajesh's catch-all rule (*/*/*/*/*) is set to forward 50%, and Vikram's minimum is 20%, no conflict -- 50% forward means 50% retention, which exceeds 20%. No adjustment needed.

Why this design is correct: Vikram has a legitimate interest in ensuring Rajesh has skin in the game. Without minimum retention, Rajesh could dump all negative-expected-value flow upline while keeping positive-expected-value flow. The minimum retention ensures Rajesh shares in both the upside and downside of every bet, aligning incentives across the hierarchy.


30. Panic Button Abuse Prevention

The Problem

The panic button (Section 14) is a powerful tool: it immediately forwards 100% of new bets and hedges all retained positions on Betfair. Used legitimately, it is a safety net. Used abusively, it is a money machine.

The abuse pattern: Rajesh watches the match. When things go badly (his retained positions are losing), he hits panic, hedging at current prices and locking in a partial loss. When things go well, he does not press panic and collects the full profit. Over time, this asymmetric usage lets Rajesh cap his losses early while keeping his full profits. The spread cost of executing those hedges on Betfair is borne by the platform.

Who Bears the Cost of Hedge Execution

When the panic button is pressed, hedge orders are placed on Betfair. The spread between the price the system gets and the theoretical mid-price is a real cost. Who pays?

PANIC HEDGE COST ALLOCATION
=============================

When Rajesh presses panic at 9:30 PM:

1. System places hedge orders on Betfair for all Rajesh's retained positions
2. Betfair mid-price for MI to win: 1.50
3. System gets filled at: 1.48 (backing) and 1.52 (laying)
4. Spread cost: approximately 1.3% of hedged amount

Cost allocation:
→ First panic in a period: Platform absorbs the spread cost
(This is a legitimate safety feature)
→ Second panic in same period: 50% spread cost charged to Rajesh's P&L
→ Third+ panic in same period: 100% spread cost charged to Rajesh's P&L

This makes the first panic "free" (encouraging use when genuinely needed)
but makes repeated use increasingly expensive (discouraging abuse).
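The escalation schedule can be expressed as a lookup from "which panic is this in the period" to the agent's share of the spread cost. This is a minimal sketch: the flat 1.3% spread rate is taken from the example above (real fills will vary), costs are rounded to whole rupees, and `agent_share` and `allocate_spread_cost` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

RUPEE = Decimal("1")


def agent_share(panic_number: int) -> Decimal:
    """Fraction of the spread cost charged to the agent for the Nth panic
    in a period: 1st -> 0%, 2nd -> 50%, 3rd and later -> 100%."""
    if panic_number < 1:
        raise ValueError("panic_number starts at 1")
    if panic_number == 1:
        return Decimal("0")
    if panic_number == 2:
        return Decimal("0.5")
    return Decimal("1")


def allocate_spread_cost(hedged_amount: Decimal, spread_rate: Decimal, panic_number: int):
    """Return (total_cost, charged_to_agent, absorbed_by_platform) in whole rupees."""
    total = (hedged_amount * spread_rate).quantize(RUPEE, rounding=ROUND_HALF_UP)
    to_agent = (total * agent_share(panic_number)).quantize(RUPEE, rounding=ROUND_HALF_UP)
    return total, to_agent, total - to_agent


# Rajesh's first two panics in one period (figures from the walk-through in this section)
first = allocate_spread_cost(Decimal("800000"), Decimal("0.013"), 1)
second = allocate_spread_cost(Decimal("600000"), Decimal("0.013"), 2)
```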

Usage Limits and Cooling-Off Periods

Control | Value | Rationale
--- | --- | ---
Panics per night period | 2 free, unlimited at cost | Night sessions are volatile; 2 free panics cover genuine emergencies
Panics per week | 5 free, unlimited at cost | Weekly cap deters chronic abusers
Cooling-off after panic | 30 minutes before matrix can be restored | Prevents the "panic, wait 5 minutes, restore, panic again" cycle
Minimum hedge duration | 15 minutes | Once hedged, positions stay hedged for at least 15 minutes. Agent cannot un-hedge immediately when conditions improve.
Panic cost escalation | 0%, 50%, 100%, 100%... per period | Each subsequent panic in the same period is more expensive

Monitoring and Flagging

PANIC BUTTON ABUSE DETECTION
==============================

The system tracks per agent, per period:

1. Panic frequency:
→ More than 3 panics per week for 3 consecutive weeks: FLAG
→ More than 2 panics in a single night: WARNING

2. Panic timing correlation:
→ Agent panics when their retained book is losing > ₹X
→ Agent does NOT panic when their retained book is winning
→ Asymmetric panic usage: collusion score increases

3. Panic profitability analysis:
→ Calculate: what would Rajesh's P&L be WITHOUT panic hedges?
→ Compare: what IS Rajesh's P&L WITH panic hedges?
→ If panic usage consistently improves P&L by > 20%: FLAG

4. Post-panic behavior:
→ Agent immediately restores original matrix after cooling-off ends: SUSPICIOUS
→ Agent keeps hedged state for hours: LEGITIMATE
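Signals 2 and 3 can be computed from a per-agent history of panics. The sketch below is a minimal illustration. The document does not define "consistently" or the asymmetry cut-off, so the 90% losing-share threshold and the 3-panic minimum are assumptions, and every name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanicEvent:
    book_pnl_at_panic: int  # unrealized P&L of the retained book when panic was pressed (₹)


def losing_share(events: list) -> float:
    """Fraction of panics pressed while the retained book was losing."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.book_pnl_at_panic < 0) / len(events)


def panic_improvement(pnl_without: int, pnl_with: int) -> float:
    """Relative improvement from panic hedges, e.g. -4,20,000 -> -1,85,000 is ~56%."""
    if pnl_without == 0:
        return 0.0
    return (pnl_with - pnl_without) / abs(pnl_without)


def assess_panic_usage(events, pnl_without, pnl_with,
                       improvement_threshold=0.20, asymmetry_threshold=0.90):
    """Return (verdict, flags); both signals together are treated as GAMING."""
    flags = []
    if panic_improvement(pnl_without, pnl_with) > improvement_threshold:
        flags.append("PNL_IMPROVEMENT")
    if len(events) >= 3 and losing_share(events) >= asymmetry_threshold:
        flags.append("ASYMMETRIC_USAGE")
    if len(flags) == 2:
        return "GAMING", flags
    return ("REVIEW" if flags else "LEGITIMATE"), flags
```

Applied to the three-week history in the walk-through in this section (7 panics, all while losing; -₹4,20,000 without panic versus -₹1,85,000 with it), both flags fire.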

Differentiating Legitimate Panic From Gaming

Behavior | Classification | Reason
--- | --- | ---
Panic during a genuinely volatile match event (3 wickets in 1 over) | LEGITIMATE | Match conditions warrant caution
Panic at the start of every match, restore after 30 minutes | GAMING | Pattern suggests routine use, not emergency
Panic once per month during a crisis | LEGITIMATE | Rare, appropriate use
Panic 3 times per week, always when losing | GAMING | Asymmetric usage exploits the hedge
Panic after seeing a corruption alert or integrity flag | LEGITIMATE | Responding to genuine threat signal

Walk-Through: Rajesh Presses Panic When Losing, Repeats Weekly

Week 1, Wednesday: MI vs CSK. Rajesh retains ₹8,00,000 backing MI. MI loses 4 wickets cheaply. Rajesh's retained book is down ₹2,50,000 unrealized. He presses panic.

Result: System hedges all positions. Rajesh locks in a ₹1,80,000 loss (better than the potential ₹8,00,000 if MI collapses completely). Spread cost: ₹10,400. First panic of the period -- platform absorbs the cost.

Week 1, Friday: RCB vs DC. Rajesh retains ₹6,00,000 backing RCB. RCB's top batsman gets out. Rajesh presses panic.

Result: System hedges. Rajesh locks in a ₹95,000 loss. Spread cost: ₹7,800. Second panic of the period -- 50% charged to Rajesh (₹3,900).

Week 1, Sunday: KKR vs SRH. KKR winning. Rajesh's book is up ₹1,50,000. He does NOT press panic. KKR wins. Rajesh collects ₹1,50,000.

Week 2: Same pattern. Panic when losing, hold when winning.

Week 3: Same pattern. The abuse detection system now has 3 weeks of data.

PANIC ABUSE ALERT
==================
Agent: Rajesh

Pattern detected over 3 weeks:
Panics triggered: 7
Panics when book was losing: 7 (100%)
Panics when book was winning: 0 (0%)

P&L without panic: -₹4,20,000 (net loss over 3 weeks)
P&L with panic: -₹1,85,000 (net loss reduced by ₹2,35,000)

Panic improved P&L by: 56%
Spread cost absorbed by platform: ₹38,000

Assessment: GAMING (asymmetric panic usage)

Actions:
1. Rajesh's next panic incurs 100% spread cost
2. Alert sent to Rajesh: "Your panic button usage is under review."
3. Vikram (upline) notified
4. If pattern continues 1 more week: panic feature suspended for 30 days,
replaced with automatic NO_NEW_RISK (which does not hedge existing positions)

31. Timestamp and Period Boundary Security

The Problem

If the client's clock determines when a bet was placed, punters and agents can manipulate timestamps. A punter could backdate a bet to before a period boundary (when limits had more headroom). An agent could manipulate their clock to extend a favorable night period. The system must use server-side timestamps for all authoritative decisions.

Where the Authoritative Timestamp Is Assigned

The authoritative timestamp is assigned at the earliest possible point in the server-side processing pipeline, before any business logic executes:

BET PROCESSING PIPELINE -- TIMESTAMP ASSIGNMENT
=================================================

1. Client sends bet request
→ Client includes client_timestamp (informational only, never trusted)

2. API gateway receives request
→ SERVER TIMESTAMP ASSIGNED HERE: request_received_at = NOW() on the server
→ This is the AUTHORITATIVE timestamp for ALL downstream decisions
→ It is immutable -- no subsequent step can change it

3. Request is queued for processing
→ processing_started_at = NOW() (separate timestamp, for latency tracking)

4. Matrix resolution uses request_received_at for period boundary evaluation
→ "Is this bet in the night period?" uses request_received_at, NOT client_timestamp

5. Position creation uses request_received_at as the official bet placement time
→ All exposure ledger updates reference this timestamp

6. Audit record stores BOTH timestamps:
→ client_timestamp: what the client claimed (for debugging/fraud detection)
→ server_timestamp: the authoritative time (for all business logic)
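The pipeline's key properties are that the server stamps the request once, on arrival, and that nothing downstream can change that stamp. The sketch below is minimal and assumes a frozen dataclass is an acceptable stand-in for "immutable". `ReceivedBet`, `receive_bet`, and `audit_record` are hypothetical names, and the injectable `now` parameter exists only to make the example testable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: request_received_at cannot be reassigned later
class ReceivedBet:
    bet_id: str
    request_received_at: datetime          # authoritative, assigned by the server
    client_timestamp: Optional[datetime]   # informational only, never trusted


def receive_bet(bet_id: str, client_timestamp: Optional[datetime] = None,
                now: Optional[datetime] = None) -> ReceivedBet:
    """Stamp the request on arrival, before any business logic runs.
    `now` is injectable for tests; production would read the server clock."""
    server_now = now if now is not None else datetime.now(timezone.utc)
    return ReceivedBet(bet_id, server_now, client_timestamp)


def audit_record(bet: ReceivedBet) -> dict:
    """Store both timestamps; only server_timestamp drives business logic."""
    return {
        "bet_id": bet.bet_id,
        "server_timestamp": bet.request_received_at.isoformat(),
        "client_timestamp": bet.client_timestamp.isoformat() if bet.client_timestamp else None,
    }
```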

How Period Boundaries Are Determined

Period boundary evaluation always uses the server clock:

PERIOD BOUNDARY EVALUATION
============================

Input: request_received_at = 2026-02-11T16:29:59.500Z (UTC)
Agent: Rajesh (IST, night period 19:00-02:00)

Step 1: Convert to agent's timezone
→ 16:29:59.500 UTC = 21:59:59.500 IST

Step 2: Is this within the night period?
→ Night start: 19:00 IST → YES, 21:59 is after 19:00
→ Night end: 02:00 IST the next day (the period wraps past midnight) → YES, 21:59 is before it
→ Result: NIGHT PERIOD

Step 3: Check against night period limits
→ Use night_period exposure ledger

How Clock Skew Between Server Instances Is Handled

In a distributed deployment with multiple server instances, each instance has a slightly different clock. The maximum acceptable clock skew is managed through NTP synchronization:

CLOCK SKEW MANAGEMENT
========================

Requirement: All server instances must be synchronized to within 50ms of UTC
Mechanism: NTP (Network Time Protocol) with multiple time sources

If NTP sync fails:
→ Instance reports CLOCK_DRIFT_WARNING
→ If drift exceeds 200ms: instance is removed from the load balancer
→ If drift exceeds 1 second: instance auto-quarantines (stops accepting bets)

For period boundary decisions:
→ The 50ms skew is negligible for period boundaries
(which are at hour granularity: 19:00, 02:00)
→ Two instances, each within 50ms of UTC, can disagree by up to 100ms:
the same bet might be stamped 01:59:59.950 on Instance A and
02:00:00.050 on Instance B and land in different periods
→ That ambiguity is confined to a 100ms window -- acceptable given
the hour-scale period boundaries

For exposure counter consistency:
→ Timestamps on exposure ledger updates are server-generated
→ The ordering of writes is determined by the database (a single
serialization point), not by the application servers' clocks
→ Even if two instances disagree by 50ms on the time, the database
applies the writes in its own commit order, so counters stay consistent
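The drift policy maps a measured offset from UTC to an action. In the minimal sketch below, raising CLOCK_DRIFT_WARNING whenever drift exceeds the 50ms tolerance is an interpretation: the text only says the warning is reported "if NTP sync fails". The other status names and both functions are hypothetical.

```python
from enum import Enum


class ClockStatus(Enum):
    OK = "OK"
    DRIFT_WARNING = "CLOCK_DRIFT_WARNING"
    REMOVED_FROM_LOAD_BALANCER = "REMOVED_FROM_LOAD_BALANCER"
    QUARANTINED = "QUARANTINED"  # instance stops accepting bets


def clock_status(drift_ms: float) -> ClockStatus:
    """Map an instance's measured offset from UTC (either sign) to an action."""
    drift = abs(drift_ms)
    if drift > 1000:
        return ClockStatus.QUARANTINED
    if drift > 200:
        return ClockStatus.REMOVED_FROM_LOAD_BALANCER
    if drift > 50:
        return ClockStatus.DRIFT_WARNING
    return ClockStatus.OK


def max_disagreement_ms(per_instance_tolerance_ms: float = 50) -> float:
    """Two instances, each within +/- tolerance of UTC, can differ by up to twice that."""
    return 2 * per_instance_tolerance_ms
```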

Bets at the Period Boundary

What happens when a bet arrives at exactly the boundary? For example, at 02:00:00.000 IST (the night period end for Rajesh)?

PERIOD BOUNDARY TIE-BREAKING
==============================

Rule: Bets at EXACTLY the boundary time belong to the period that STARTS at that time.

Why: The night period is defined as 19:00:00.000 to 01:59:59.999.
02:00:00.000 is the first moment of the next period (day).

In practice:
→ request_received_at = 2026-02-12T01:59:59.999 IST → NIGHT period
→ request_received_at = 2026-02-12T02:00:00.000 IST → DAY period

This is a closed-open interval: [19:00, 02:00)

For exposure carry-forward:
→ When the night period ends at 02:00, any open positions from the night
are carried forward to the day period (as described in Section 9)
→ The bet at 01:59:59.999 is the last bet counted against night limits
→ The bet at 02:00:00.000 is the first bet counted against day limits
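The period check combines a timezone conversion with a closed-open interval that may wrap past midnight. The sketch below is minimal and uses a fixed UTC+05:30 offset for IST, which is exact because IST has no daylight saving. Agents in other zones would need an IANA zone via `zoneinfo`. The function names and the default night window are illustrative assumptions.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))


def in_period(local_t: time, start: time, end: time) -> bool:
    """Closed-open interval [start, end) on the clock face. If start > end,
    the period wraps past midnight (e.g. night = 19:00-02:00)."""
    if start <= end:
        return start <= local_t < end
    return local_t >= start or local_t < end


def classify_period(request_received_at: datetime, agent_tz: timezone,
                    night_start: time = time(19, 0),
                    night_end: time = time(2, 0)) -> str:
    """Classify using the server-assigned timestamp, never the client's."""
    if request_received_at.tzinfo is None:
        raise ValueError("request_received_at must be timezone-aware")
    local = request_received_at.astimezone(agent_tz)
    return "NIGHT" if in_period(local.time(), night_start, night_end) else "DAY"
```

With these definitions, 16:29:59.500 UTC becomes 21:59:59.500 IST (NIGHT), 01:59:59.999 IST is the last NIGHT instant, and 02:00:00.000 IST is the first DAY instant.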

Client Timestamp Fraud Detection

The client_timestamp is not trusted but is useful for detecting anomalies:

Anomaly | What It Means | Action
--- | --- | ---
client_timestamp is > 30 seconds before server_timestamp | Client clock is behind, or deliberate manipulation | Log for monitoring. No immediate action.
client_timestamp is > 5 seconds AFTER server_timestamp | Client clock is ahead, which is unusual | Log and flag. Client may be trying to claim a later timestamp.
client_timestamp differs from server_timestamp by > 5 minutes | Significant discrepancy | Flag for review. Possible automation/bot activity.
client_timestamp is consistently offset by exactly N seconds | Clock calibration issue or deliberate offset | Informational. Some devices have persistent clock drift.
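The first three rows of the table translate directly into threshold checks on `client_timestamp - server_timestamp`. The fourth row needs a history of skews per device. In the minimal sketch below, the 1-second tolerance and 10-sample minimum for "consistently" are assumptions, and the flag names are hypothetical.

```python
from datetime import datetime, timedelta


def client_timestamp_flags(client_ts: datetime, server_ts: datetime) -> list:
    """Compare the untrusted client timestamp with the authoritative server one."""
    skew = client_ts - server_ts  # negative: client behind; positive: client ahead
    flags = []
    if skew < -timedelta(seconds=30):
        flags.append("CLIENT_BEHIND")      # log for monitoring only
    if skew > timedelta(seconds=5):
        flags.append("CLIENT_AHEAD")       # log and flag
    if abs(skew) > timedelta(minutes=5):
        flags.append("LARGE_DISCREPANCY")  # flag for review
    return flags


def has_consistent_offset(skews: list, tolerance: timedelta = timedelta(seconds=1),
                          min_samples: int = 10) -> bool:
    """True when a device's skews cluster tightly around one value (informational)."""
    if len(skews) < min_samples:
        return False
    return max(skews) - min(skews) <= tolerance
```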

32. Sharp Detection Gaming via Multiple Accounts

The Problem

Sharp bettors know that bookmakers track their accounts and limit them. The obvious countermeasure: use many accounts. A syndicate of 50 accounts, each betting small amounts, can fly under the radar of per-account sharp detection. Each account looks like a casual punter. But collectively, they are placing coordinated bets that drain the agent's book.

The Detection Pillars

The cross-account syndicate detection system uses four independent signals. Any one signal alone might be coincidence. Two or more signals together strongly indicate coordination.

SYNDICATE DETECTION: 4 PILLARS
================================

Pillar 1: DEVICE FINGERPRINTING
→ Same device used by multiple accounts
→ Similar device configurations (screen size, OS version, installed fonts)

Pillar 2: IP / NETWORK CORRELATION
→ Multiple accounts from same IP address
→ Multiple accounts from same subnet
→ VPN detection (known VPN exit nodes)

Pillar 3: BETTING PATTERN SIMILARITY
→ Same outcomes, same timing, same markets
→ Correlated staking patterns
→ Similar CLV profiles

Pillar 4: PAYMENT METHOD OVERLAP
→ Same bank account linked to multiple user accounts
→ Same UPI ID, same wallet, same card
→ Money flow between linked accounts

Device Fingerprinting Integration

The system collects device attributes throughout a user's activity, not just at registration. The collection point varies by attribute:

Attribute | Purpose | Collection Point
--- | --- | ---
Browser/app user agent | Identifies device type and version | Every API request
Screen resolution | Distinguishes devices | Session start
Timezone offset | Cross-reference with claimed location | Every API request
Installed fonts / canvas fingerprint | High-entropy device identifier | Session start (web)
Device ID (mobile) | Unique device identifier | App installation
Battery level + charging state | Behavioral fingerprint | Session start (mobile)

Fingerprint matching algorithm:

DEVICE FINGERPRINT SIMILARITY SCORE
=====================================

For each pair of user accounts, compute:

score = 0

If same device_id: score += 100 (near-certain same device)
If same canvas fingerprint: score += 80 (very likely same browser)
If same IP AND same user agent: score += 60
If same screen resolution AND timezone: score += 30
If same subnet (first 3 octets): score += 20

Thresholds:
score >= 100: SAME_DEVICE (automatically link accounts)
score 60-99: LIKELY_RELATED (flag for review)
score 30-59: POSSIBLY_RELATED (monitor)
score < 30: UNRELATED
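The scoring rubric above maps onto a pairwise function over device snapshots. This minimal sketch assumes IPv4 addresses and treats the rubric as strictly additive, as written, so a shared IP also earns the subnet points. `DeviceSnapshot`, `similarity_score`, and `classify` are hypothetical names. The IPs in the example belong to documentation ranges.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceSnapshot:
    device_id: Optional[str]   # mobile only
    canvas_fp: Optional[str]   # web only
    ip: str                    # IPv4 assumed
    user_agent: str
    screen: str                # e.g. "1080x2400"
    tz_offset_min: int         # e.g. 330 for IST


def same_slash24(a: str, b: str) -> bool:
    """Same first three octets of an IPv4 address."""
    return ipaddress.IPv4Address(b) in ipaddress.IPv4Network(f"{a}/24", strict=False)


def similarity_score(a: DeviceSnapshot, b: DeviceSnapshot) -> int:
    score = 0
    if a.device_id and a.device_id == b.device_id:
        score += 100
    if a.canvas_fp and a.canvas_fp == b.canvas_fp:
        score += 80
    if a.ip == b.ip and a.user_agent == b.user_agent:
        score += 60
    if a.screen == b.screen and a.tz_offset_min == b.tz_offset_min:
        score += 30
    if same_slash24(a.ip, b.ip):
        score += 20
    return score


def classify(score: int) -> str:
    if score >= 100:
        return "SAME_DEVICE"
    if score >= 60:
        return "LIKELY_RELATED"
    if score >= 30:
        return "POSSIBLY_RELATED"
    return "UNRELATED"


# Same IP and user agent, different screen: 60 + 20 (same /24) = 80
a = DeviceSnapshot(None, "c1", "203.0.113.7", "UA-1", "1080x2400", 330)
b = DeviceSnapshot(None, "c2", "203.0.113.7", "UA-1", "1440x3200", 330)
```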

IP Correlation Analysis

IP CORRELATION ANALYSIS
========================

Data collected: For every bet, record the source IP address.

Analysis 1: Direct IP overlap
→ Two or more accounts placing bets from the same IP within 1 hour
→ Common in household sharing (legitimate) or syndicate operation (illegitimate)
→ Threshold: 3+ accounts from same IP → flag

Analysis 2: Subnet analysis
→ Accounts from the same /24 subnet (e.g., 203.0.113.*)
→ Common in corporate/office networks or coordinated operations
→ Threshold: 5+ accounts from same subnet → flag

Analysis 3: IP timing patterns
→ Account A bets from IP X at 9:00 PM
→ Account B bets from IP X at 9:02 PM
→ Account C bets from IP X at 9:05 PM
→ Sequential use of the same IP → strong syndicate signal

Analysis 4: VPN / Proxy detection
→ Known VPN exit node IPs (maintained list)
→ Tor exit nodes
→ Commercial proxy services
→ If detected: increase scrutiny on all other signals
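Analyses 1 and 2 reduce to grouping bets by IP (with a sliding 1-hour window) and by /24 subnet. Analysis 3's sequential use of one IP shows up naturally in the windowed grouping. The sketch below is minimal and assumes IPv4 and an in-memory list of `(account_id, ip, placed_at)` tuples. The function names are hypothetical.

```python
import ipaddress
from collections import defaultdict
from datetime import datetime, timedelta


def shared_ip_accounts(bets, window=timedelta(hours=1), min_accounts=3):
    """Analysis 1: {ip: accounts} for IPs where >= min_accounts distinct
    accounts placed bets within `window` of each other."""
    by_ip = defaultdict(list)
    for account, ip, placed_at in bets:
        by_ip[ip].append((placed_at, account))
    flagged = {}
    for ip, rows in by_ip.items():
        rows.sort()
        start = 0
        for end in range(len(rows)):
            # Shrink the window from the left until it spans at most `window`.
            while rows[end][0] - rows[start][0] > window:
                start += 1
            accounts = {acct for _, acct in rows[start:end + 1]}
            if len(accounts) >= min_accounts:
                flagged.setdefault(ip, set()).update(accounts)
    return flagged


def crowded_subnets(bets, min_accounts=5):
    """Analysis 2: distinct accounts per IPv4 /24, flagged at min_accounts."""
    subnets = defaultdict(set)
    for account, ip, _ in bets:
        subnets[str(ipaddress.IPv4Network(f"{ip}/24", strict=False))].add(account)
    return {net: accts for net, accts in subnets.items() if len(accts) >= min_accounts}


# Accounts A, B, C bet from the same IP at 9:00, 9:02 and 9:05 PM
t = datetime(2026, 2, 11, 21, 0)
bets = [("A", "203.0.113.7", t), ("B", "203.0.113.7", t + timedelta(minutes=2)),
        ("C", "203.0.113.7", t + timedelta(minutes=5)),
        ("D", "203.0.113.8", t), ("E", "203.0.113.9", t)]
```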

Betting Pattern Similarity Detection

This is the most powerful signal because it is hard to disguise:

BETTING PATTERN SIMILARITY
============================

For each pair of accounts, compute similarity across these dimensions:

1. Outcome correlation:
→ How often do both accounts bet on the same outcome?
→ Random chance for a 2-outcome market: 50%
→ If 80%+ correlation over 100+ bets: FLAG

2. Timing correlation:
→ Average time between Account A's bet and Account B's bet on the same event
→ If consistently < 5 minutes apart: FLAG

3. Market selection correlation:
→ Do both accounts bet on the same obscure markets?
→ Betting on the same IPL match: not unusual (everyone bets IPL)
→ Betting on the same Ranji Trophy match: unusual (niche market)
→ Weight correlation by market obscurity

4. Stake pattern similarity:
→ Both accounts use round-number stakes (₹10,000, ₹20,000)
→ Both accounts repeat the same unusual non-round stake (e.g., ₹8,731 on each)
→ Similar stake distributions (mean, variance, skewness)

5. CLV profile similarity:
→ Both accounts have similar CLV trajectories over time
→ Both accounts started profitable at the same time
→ Both accounts' CLV curves are correlated

Composite score:
→ Weight and combine all dimensions
→ If composite score > threshold → SYNDICATE_SUSPECTED
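Dimensions 1 and 2 can be computed pairwise over the events both accounts bet on. In the minimal sketch below, "outcome correlation" is measured as a simple agreement rate on shared events, each account's bets are assumed to be held as `{event_id: (outcome, placed_at_epoch_seconds)}`, and the composite weighting is left out. The names are hypothetical.

```python
from statistics import mean


def outcome_agreement(bets_a: dict, bets_b: dict):
    """Return (share of shared events where both picked the same outcome, n_shared)."""
    shared = bets_a.keys() & bets_b.keys()
    if not shared:
        return 0.0, 0
    same = sum(1 for e in shared if bets_a[e][0] == bets_b[e][0])
    return same / len(shared), len(shared)


def mean_gap_seconds(bets_a: dict, bets_b: dict):
    """Average time between the two accounts' bets on the same event."""
    shared = bets_a.keys() & bets_b.keys()
    if not shared:
        return None
    return mean(abs(bets_a[e][1] - bets_b[e][1]) for e in shared)


def pair_flags(bets_a: dict, bets_b: dict, min_shared: int = 100) -> list:
    flags = []
    agreement, n_shared = outcome_agreement(bets_a, bets_b)
    if n_shared >= min_shared and agreement >= 0.80:
        flags.append("OUTCOME_CORRELATION")
    gap = mean_gap_seconds(bets_a, bets_b)
    if gap is not None and gap < 5 * 60:
        flags.append("TIMING_CORRELATION")
    return flags


# 120 shared events: B agrees with A on 100 of them, always 2 minutes later
a = {f"ev{i}": ("HOME", 1000 * i) for i in range(120)}
b = {f"ev{i}": ("HOME" if i < 100 else "AWAY", 1000 * i + 120) for i in range(120)}
```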

Payment Method Overlap Detection

Signal | Severity | Example
--- | --- | ---
Same bank account on 2+ user accounts | CRITICAL | Account A and Account B both linked to SBI account #12345
Same UPI ID on 2+ accounts | HIGH | Account A and B both use amit@upi
Money transfer between two user accounts' bank accounts | HIGH | Account A deposits, Account B receives from A's bank
Same phone number on 2+ accounts | HIGH | Both accounts registered with +91-98765-43210
Same email domain (non-public) on 2+ accounts | MEDIUM | amit@someprivatecorp.com and raj@someprivatecorp.com

How Flagged Clusters Are Communicated to Agents

When the system identifies a suspected syndicate, it packages the information for the agent:

SYNDICATE ALERT -- RAJESH'S DASHBOARD
=======================================

⚠ SUSPECTED SYNDICATE: CLUSTER-4821

Accounts identified: 12 of potentially 50+
Confidence: HIGH (all 4 detection pillars triggered)

Evidence:
📱 Device: 8 accounts share 3 devices
🌐 Network: 11 accounts used 2 IP addresses in the last week
📊 Patterns: 91% outcome correlation across 234 bets
💳 Payments: 4 accounts share 2 bank accounts

Accounts in YOUR network:
1. Amit (user_4521) -- 45 bets, +₹1,85,000 P&L against you
2. Rahul (user_4588) -- 38 bets, +₹1,42,000 P&L against you
3. Deepak (user_4612) -- 31 bets, +₹98,000 P&L against you
4. Naveen (user_4687) -- 28 bets, +₹76,000 P&L against you
... 8 more accounts

Combined impact on YOUR book: -₹8,45,000 over 4 weeks

Recommended actions:
[Classify All as SHARP] -- forwards 95% of their bets
[Block All Accounts] -- prevents any new bets (requires admin approval)
[Review Individual] -- decide per account
[Ignore Alert] -- acknowledge, no action (logged)

Walk-Through: Syndicate With 50 Accounts Under Rajesh

Setup: A professional betting syndicate creates 50 accounts under Rajesh over a 3-month period. Each account is registered with a different name, phone number, and email. They use a pool of 10 mobile devices and 5 residential IP addresses (via mobile hotspots at different locations).

Month 1: The syndicate operates carefully. Each account places 2-3 bets per day on different markets. Win rates are moderate (53%). Individual account P&L is unremarkable.

What the system sees after Month 1:

Pillar 1 (Device): 50 accounts using 10 devices → 5 accounts per device average
Score: 8 clusters of related accounts identified

Pillar 2 (IP): 50 accounts using 5 IPs
Score: Subnet analysis shows concentrated usage
But: 5 IPs across 50 accounts is not extreme (could be a housing complex)

Pillar 3 (Betting): Outcome correlation at 61% (slightly above 50% random)
Score: MODERATE -- not yet flagged, but being monitored

Pillar 4 (Payment): No payment overlap (syndicate was careful)
Score: CLEAN

Overall assessment: MONITORING (not yet flagged)

Month 2: The syndicate becomes more aggressive. More bets, higher stakes. Their careful 50-account approach means no individual account trips any threshold. But the pattern signal strengthens.

Pillar 3 (Betting) after Month 2:
Outcome correlation: 73% (very suspicious)
Timing correlation: 78% of bets within 10 minutes of each other
Market selection: 22 of 50 accounts bet on the same obscure Ranji match
CLV: All 50 accounts have positive CLV (probability of this by chance: <0.001%)

Month 2, Week 3: The system triggers:

SYNDICATE DETECTION: CLUSTER-4821 CONFIRMED
=============================================

Detection trigger: Betting pattern similarity threshold exceeded
→ 50 accounts with 73% outcome correlation
→ 22 accounts on same obscure market
→ All 50 accounts profitable (p < 0.001)

Cross-reference with device data:
→ 8 device clusters confirmed
→ 50 accounts → 10 devices → likely 3-5 operators

Financial impact under Rajesh:
→ Combined P&L: -₹12,40,000 (Rajesh has lost ₹12.4 lakh to this cluster)
→ Rajesh's P&L per account: -₹15,000 to -₹85,000

Alert sent to:
1. Rajesh (with recommended actions)
2. Vikram (upline, with summary)
3. Platform compliance team (with full evidence package)

Rajesh's response: He classifies all 50 accounts as SHARP. With the 72-hour cooling-off period (Section 27), the classification takes effect 3 days later. Meanwhile, the platform compliance team can also apply platform-level restrictions if they deem it necessary (account suspension, reduced limits).


33. Rate Limiting on Configuration Changes

The Problem

An agent who rapidly changes their forwarding matrix creates multiple problems:

  1. Cache invalidation storms (every change invalidates all cache tiers)
  2. Matrix version bloat (each change creates a new immutable version)
  3. Audit trail confusion (which version applied to which bet?)
  4. Potential gaming (rapid changes to exploit specific bet outcomes)

Per-Agent Rate Limits

Configuration Type | Rate Limit | Queue Behavior
--- | --- | ---
Matrix rule changes | 1 change per 5 minutes | Queue rapid changes, apply only the most recent
User override changes | 1 per user per 10 minutes | Queue, apply most recent
Market override changes | 1 per market per 5 minutes | Queue, apply most recent
Agent default changes | 1 per 15 minutes | Queue, apply most recent
Limit changes (sport, match, period) | 1 per limit per 10 minutes | Queue, apply most recent
Panic button | No rate limit on activation; 30-minute cooling-off before deactivation | Immediate (this is a safety feature)

Queue and Apply Most Recent

When an agent makes rapid changes that exceed the rate limit:

RATE-LIMITED CONFIGURATION CHANGES
=====================================

9:30:00 PM Rajesh changes Rule R5: forward 40% → 60%
→ APPLIED immediately (first change, no rate limit hit)

9:30:45 PM Rajesh changes Rule R5: forward 60% → 80%
→ QUEUED (less than 5 minutes since last change)
→ Queue entry: { rule: R5, new_value: 80%, queued_at: 9:30:45 }

9:31:30 PM Rajesh changes Rule R5: forward 80% → 95%
→ REPLACES previous queue entry (queue only keeps most recent)
→ Queue entry: { rule: R5, new_value: 95%, queued_at: 9:31:30 }

9:32:00 PM Rajesh changes Rule R3: forward 40% → 50%
→ QUEUED (the matrix-rule limit of 1 change per 5 minutes applies per agent,
so R3 must also wait; it gets its own queue entry)
→ Queue entry: { rule: R3, new_value: 50%, queued_at: 9:32:00 }

9:35:00 PM Rate limit window expires (5 minutes after the 9:30:00 change)
→ Queue entry for R5 is applied: forward 95%
→ The intermediate value of 80% was never applied
→ Agent is notified: "Your change to Rule R5 has been applied."

9:40:00 PM Next rate limit window expires (5 minutes after 9:35:00)
→ Queue entry for R3 is applied: forward 50%
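"Queue and apply most recent" needs only three maps per rate-limit key: the applied value, when it was applied, and at most one pending value. In the minimal sketch below, the key is left generic (a rule, a user override, a market, or the whole agent, depending on which limit is being enforced) and a periodic `tick` stands in for whatever scheduler applies expired entries. `RateLimitedConfig` is a hypothetical name.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class RateLimitedConfig:
    """One rate-limit window per key; at most one pending value per key."""
    window: timedelta
    values: dict = field(default_factory=dict)        # key -> applied value
    last_applied: dict = field(default_factory=dict)  # key -> time of last applied change
    pending: dict = field(default_factory=dict)       # key -> (value, queued_at)

    def submit(self, key, value, now: datetime) -> str:
        last = self.last_applied.get(key)
        if last is None or now - last >= self.window:
            self._apply(key, value, now)
            return "APPLIED"
        self.pending[key] = (value, now)  # overwrites any earlier queued value
        return "QUEUED"

    def tick(self, now: datetime) -> list:
        """Apply queued values whose window has expired; return the keys applied."""
        applied = []
        for key, (value, _queued_at) in list(self.pending.items()):
            if now - self.last_applied[key] >= self.window:
                self._apply(key, value, now)
                del self.pending[key]
                applied.append(key)
        return applied

    def cancel_pending(self, key) -> None:
        """Backs the [Cancel Pending Changes] button."""
        self.pending.pop(key, None)

    def _apply(self, key, value, now: datetime) -> None:
        self.values[key] = value
        self.last_applied[key] = now


# Rule R5 from the timeline above
t0 = datetime(2026, 2, 11, 21, 30)
cfg = RateLimitedConfig(window=timedelta(minutes=5))
cfg.submit("R5", 60, t0)                          # APPLIED
cfg.submit("R5", 80, t0 + timedelta(seconds=45))  # QUEUED
cfg.submit("R5", 95, t0 + timedelta(seconds=90))  # QUEUED, replaces 80
cfg.tick(t0 + timedelta(minutes=5))               # applies 95; 80 never took effect
```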

What the agent sees:

┌──────────────────────────────────────────────────────────────┐
│ CONFIGURATION UPDATE │
│ │
│ Rule R5 updated: Forward 60% (active now) │
│ │
│ ⏳ Pending changes (will apply in ~4 minutes): │
│ Rule R5: Forward 95% │
│ Rule R3: Forward 50% │
│ │
│ Why the delay? Rapid configuration changes are queued to │
│ ensure system stability. Only your most recent value will │
│ be applied. │
│ │
│ [Cancel Pending Changes] │
└──────────────────────────────────────────────────────────────┘

Cache Invalidation Throttling

Even when configuration changes are rate-limited, the cache invalidation must be efficient:

CACHE INVALIDATION STRATEGY
=============================

When a configuration change is applied:

1. PostgreSQL: Write happens immediately (source of truth updated)

2. Redis: Invalidation within 100ms
→ DELETE the affected key(s)
→ Do NOT pre-populate (let the next read fill the cache)

3. Application LRU: Invalidation via pub/sub within 200ms
→ All app instances receive the invalidation message
→ Affected entries evicted from LRU cache

Throttling:
→ If more than 10 invalidations per agent per minute: batch them
→ Instead of 10 individual invalidations, one "flush all for this agent" signal
→ This prevents cache thrashing during rapid configuration periods
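One way to batch without ever dropping an invalidation is to buffer keys and drain them on a short timer. Each drain publishes either individual DELETEs or, for an agent over budget, a single flush-all. In the minimal sketch below, the drain timer, the message shapes, and the `publish` callback (standing in for the pub/sub channel) are all assumptions.

```python
from collections import defaultdict, deque


class InvalidationBatcher:
    """Buffers cache invalidations and publishes them on each drain().
    If an agent exceeds `max_per_window` invalidations within `window_s`
    seconds, that agent's batch becomes one FLUSH_AGENT message instead of
    one DELETE per key. Every buffered agent gets a message on each drain,
    so no invalidation is lost."""

    def __init__(self, publish, max_per_window: int = 10, window_s: float = 60.0):
        self.publish = publish
        self.max_per_window = max_per_window
        self.window_s = window_s
        self.buffer = defaultdict(set)    # agent_id -> keys awaiting invalidation
        self.recent = defaultdict(deque)  # agent_id -> times of recent invalidations

    def invalidate(self, agent_id: str, key: str) -> None:
        self.buffer[agent_id].add(key)

    def drain(self, now: float) -> None:
        for agent_id, keys in self.buffer.items():
            history = self.recent[agent_id]
            while history and now - history[0] > self.window_s:
                history.popleft()
            history.extend([now] * len(keys))
            if len(history) > self.max_per_window:
                self.publish(("FLUSH_AGENT", agent_id))
            else:
                for key in sorted(keys):
                    self.publish(("DELETE", agent_id, key))
        self.buffer.clear()
```

Draining at an interval well under the 100ms Redis target would keep batching from adding noticeable staleness.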


How Rate Limiting Interacts With the Panic Button

The panic button IS a rapid configuration change -- it sets forwarding to 100% for all sports and markets. But it is exempt from rate limiting because it is a safety feature. The design reconciles these two goals:

PANIC BUTTON VS RATE LIMITING
===============================

Panic button activation:
→ Bypasses ALL rate limits
→ Applies immediately (no queueing)
→ Invalidates all caches immediately
→ Reason: safety always trumps stability

Panic button deactivation (restoring previous settings):
→ Subject to 30-minute cooling-off period (Section 30)
→ NOT subject to the 5-minute matrix change rate limit
→ Reason: the 30-minute cooling-off is already more restrictive
than the 5-minute rate limit

Configuration changes WHILE panic is active:
→ Queued normally under rate limits
→ Applied only AFTER panic is deactivated
→ Agent sees: "You are in panic mode. Configuration changes
will be applied when you exit panic mode."

This means:
1. Rajesh presses panic at 9:30 PM → immediate effect, no rate limit
2. Rajesh tries to change Rule R5 at 9:31 PM → queued (panic is active)
3. Rajesh deactivates panic at 10:00 PM → previous settings restored
4. Queued Rule R5 change applies at 10:00 PM (or later per rate limit)
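The panic rules above form a small state machine: activation is immediate, deactivation is gated by the cooling-off, and edits made during panic are held and handed to the normal rate limiter on exit. The sketch below is minimal and assumes the matrix is a `{rule: forward_pct}` dict. `PanicMode` and its return strings are hypothetical.

```python
from datetime import datetime, timedelta

COOLING_OFF = timedelta(minutes=30)


class PanicMode:
    """Activation bypasses rate limits; deactivation waits out the cooling-off;
    matrix edits made during panic are held until exit."""

    def __init__(self):
        self.active_since = None
        self.saved_matrix = None
        self.held_changes = []

    def activate(self, current_matrix: dict, now: datetime) -> dict:
        if self.active_since is None:  # pressing again does not reset the clock
            self.saved_matrix = dict(current_matrix)
            self.active_since = now
        return {rule: 100 for rule in self.saved_matrix}  # forward everything, immediately

    def request_change(self, rule: str, forward_pct: int) -> str:
        if self.active_since is not None:
            self.held_changes.append((rule, forward_pct))
            return "HELD_UNTIL_PANIC_EXIT"
        return "SEND_TO_RATE_LIMITER"

    def deactivate(self, now: datetime):
        if self.active_since is None:
            raise RuntimeError("panic mode is not active")
        if now - self.active_since < COOLING_OFF:
            raise PermissionError("30-minute cooling-off has not elapsed")
        restored, held = self.saved_matrix, self.held_changes
        self.active_since, self.saved_matrix, self.held_changes = None, None, []
        return restored, held  # held changes then go through normal rate limiting
```

Replaying the example: panic at 9:30 PM, an R5 edit at 9:31 PM is held, and deactivation succeeds at 10:00 PM, returning the saved matrix plus the held R5 change.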

Rate Limit Overrides for Administrators

Platform administrators can bypass rate limits for specific agents when needed:

Override Type | Who Can Grant | Duration | Use Case
--- | --- | --- | ---
Temporary unlimited changes | Platform SUPER_ADMIN | 1 hour | Agent onboarding, major event preparation
Reduced rate limit (1 min instead of 5) | Platform ADMIN | 4 hours | Agent is actively tuning during a match with admin guidance
Rate limit suspension | Platform SUPER_ADMIN | 30 minutes | Emergency reconfiguration

All overrides are logged in the audit trail with the admin who granted them and the reason.


34. Currency and Multi-Currency Support

The Problem

Hannibal serves agent networks across India, Southeast Asia, and Africa. Agents operate in different currencies: Indian Rupees (INR), Thai Baht (THB), Ghanaian Cedis (GHS), Nigerian Naira (NGN), Kenyan Shillings (KES). But hedges on Betfair are placed in GBP (or EUR). This creates currency risk at multiple points in the system.

Base Currency Per Agent

Every agent has a configured base currency. All their limits, exposure ledgers, and P&L are denominated in this currency:

AGENT BASE CURRENCIES
========================

Agent Base Currency Why
-------- ------------- ---
Rajesh INR Indian sub-agent, punters bet in INR
Vikram INR Indian master agent
Kwame GHS Ghanaian agent, punters bet in Cedis
Priya INR Indian sub-agent
Platform USD Platform operates in USD for cross-border accounting
Betfair GBP Exchange operates in GBP

Where FX Conversion Happens

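FX conversion occurs at two points in the bet lifecycle: when a forwarded position crosses from an agent's currency zone into the platform (USD), and when the platform hedges on Betfair (GBP). At settlement, the same boundaries are crossed in reverse.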

Key design rule: FX conversion happens at the boundary between currency zones, not within them. Within the INR agent hierarchy (Rajesh -> Vikram), all calculations are in INR. FX only enters the picture when the position crosses to the platform (which operates in USD) or to Betfair (which operates in GBP).

FX Rate Capture and Audit Trail

Every FX conversion is captured with the exact rate used:

Field | Type | Description
--- | --- | ---
conversion_id | UUID | Unique identifier for this conversion
bet_id | TEXT | Which bet triggered this conversion
source_currency | TEXT | e.g., GHS
target_currency | TEXT | e.g., USD
source_amount | DECIMAL | Amount in source currency
target_amount | DECIMAL | Amount in target currency
fx_rate | DECIMAL(18,8) | The rate used: 1 GHS = X USD
fx_rate_source | TEXT | Where the rate came from (e.g., "platform_rate_feed", "manual_override")
fx_rate_timestamp | TIMESTAMP | When the rate was captured
conversion_timestamp | TIMESTAMP | When the conversion was executed
spread_applied | DECIMAL | Any spread the platform applied on top of the mid-rate

FX Rate Determination

The system uses a tiered approach for FX rates:

FX RATE RESOLUTION
====================

Priority 1: Platform rate feed (real-time)
→ Updated every 60 seconds from a market data provider
→ Used for live bet processing

Priority 2: Cached rate (if feed is stale)
→ If the rate feed has not updated for > 5 minutes
→ Use the last known rate with an additional 0.5% spread (safety buffer)
→ Flag the conversion as STALE_RATE in the audit trail

Priority 3: Daily reference rate (if feed is down)
→ If the rate feed is completely unavailable
→ Use the day's opening reference rate with a 2% spread
→ Flag as FALLBACK_RATE
→ Alert operations team

For each conversion, the system also records:
→ The mid-market rate at the time
→ The spread applied by the platform
→ The effective rate (mid + spread)
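The three-tier rule can be sketched as a function that returns the mid-rate, the spread, and an audit flag. Several details in this sketch are interpretations rather than anything the text specifies. The fallback's "2% spread" is read as the total spread, not an addition. "mid + spread" is computed multiplicatively as mid * (1 + spread). Alerting operations is left to the caller. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class ResolvedRate:
    mid: Decimal
    spread: Decimal
    flag: Optional[str]  # None, "STALE_RATE" or "FALLBACK_RATE"

    @property
    def effective(self) -> Decimal:
        """Mid-rate with the spread applied: mid * (1 + spread)."""
        return self.mid * (1 + self.spread)


def resolve_rate(now: datetime, base_spread: Decimal,
                 feed_rate: Optional[Decimal] = None,
                 feed_updated_at: Optional[datetime] = None,
                 daily_reference: Optional[Decimal] = None,
                 stale_after: timedelta = timedelta(minutes=5)) -> ResolvedRate:
    # Priorities 1 and 2: the feed's latest rate, plus a 0.5% buffer once stale.
    if feed_rate is not None and feed_updated_at is not None:
        if now - feed_updated_at <= stale_after:
            return ResolvedRate(feed_rate, base_spread, None)
        return ResolvedRate(feed_rate, base_spread + Decimal("0.005"), "STALE_RATE")
    # Priority 3: feed unavailable, so use the day's opening reference rate at 2%.
    if daily_reference is not None:
        return ResolvedRate(daily_reference, Decimal("0.02"), "FALLBACK_RATE")
    raise LookupError("no FX rate available")
```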

FX Conversion at Hedge Execution

When a Ghanaian agent's bet reaches the platform and needs hedging on Betfair:

FX CONVERSION EXAMPLE: BET FLOW
=================================

1. Kwame's punter bets GHS 500 on Arsenal to win at 2.10
→ Kwame's cascade: retains GHS 300, forwards GHS 200 to platform

2. GHS 200 arrives at the platform
→ Platform operates in USD
→ Current rate: 1 USD = 15.8 GHS
→ Conversion: GHS 200 / 15.8 = USD 12.66
→ Spread applied: 0.3% → Platform receives USD 12.62
→ Audit: conversion_id=fx_001, rate=15.8, spread=0.3%

3. Platform decides to hedge USD 6.31 on Betfair
→ Betfair operates in GBP
→ Current rate: 1 GBP = 1.27 USD
→ Conversion: USD 6.31 / 1.27 = GBP 4.97
→ Spread applied: 0.2% → Betfair receives GBP 4.96
→ Audit: conversion_id=fx_002, rate=1.27, spread=0.2%

Total FX conversions: GHS → USD → GBP (two hops)
Total FX spread cost: ~0.5% (borne by the platform, priced into the hedge margin)
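The two-hop example can be reproduced with exact decimal arithmetic. The sketch below follows the example's convention: divide by a "units of source per 1 target" quote, round to 2 decimals, then deduct the spread and round again. It also writes a subset of the audit fields listed earlier. The example quotes rates as GHS per USD, which is the inverse of the table's "1 GHS = X USD" description, so a real schema would need to settle on one direction. `convert` and `convert_with_audit` are hypothetical names.

```python
import uuid
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def convert(amount: Decimal, source_per_target: Decimal, spread: Decimal) -> Decimal:
    """Convert using a quote of 1 TARGET = source_per_target SOURCE, then deduct
    the spread, rounding to 2 decimals at each step as the example does."""
    gross = (amount / source_per_target).quantize(CENT, rounding=ROUND_HALF_UP)
    return (gross * (1 - spread)).quantize(CENT, rounding=ROUND_HALF_UP)


def convert_with_audit(bet_id, source, target, amount, rate, spread, audit: list) -> Decimal:
    result = convert(amount, rate, spread)
    audit.append({
        "conversion_id": str(uuid.uuid4()),
        "bet_id": bet_id,
        "source_currency": source,
        "target_currency": target,
        "source_amount": amount,
        "target_amount": result,
        "fx_rate": rate,
        "spread_applied": spread,
    })
    return result


audit = []
usd = convert_with_audit("bet-1", "GHS", "USD", Decimal("200"), Decimal("15.8"), Decimal("0.003"), audit)
hedge_usd = (usd / 2).quantize(CENT, rounding=ROUND_HALF_UP)  # platform hedges half
gbp = convert_with_audit("bet-1", "USD", "GBP", hedge_usd, Decimal("1.27"), Decimal("0.002"), audit)
```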

FX Risk Accounting for Hedged Positions

Between the time a bet is placed and when it settles, exchange rates can move. This creates FX risk on hedged positions:

FX RISK SCENARIO
==================

At bet placement (Monday):
Kwame's punter bet GHS 500 at 2.10
Platform hedged GBP 4.96 on Betfair at 2.10
Rate at placement: 1 GBP = 20.07 GHS (via USD: 15.8 * 1.27)

At settlement (Sunday, Arsenal won):
Betfair pays out: GBP 4.96 * (2.10 - 1) = GBP 5.46 profit
Rate at settlement: 1 GBP = 21.50 GHS (GHS depreciated)

Converting Betfair payout back to GHS:
GBP 5.46 * 21.50 = GHS 117.39

But the punter is owed: GHS 500 * (2.10 - 1) = GHS 550 in winnings

The hedge covered:
Platform's forwarded liability: GHS 200 * (2.10 - 1) = GHS 220, of which half (GHS 110) was hedged
Betfair payout in GHS: GHS 117.39 (about GHS 110 at the placement rate)

The FX movement (GHS weakened) means the GBP payout converts
to MORE GHS than expected. In this case, FX movement HELPED.
If GHS had strengthened, the platform would receive LESS GHS from
the Betfair hedge than expected, creating an FX loss.

How FX Risk Is Managed

Strategy | Description | When Used
--- | --- | ---
Accept the risk | Small positions. FX movement over a few days is typically < 2%. Not worth hedging. | Default for most positions
Settle quickly | Minimize the time between bet placement and settlement to reduce FX exposure. | Standard practice
FX reserve buffer | Platform maintains a reserve buffer (typically 3% of cross-currency hedged volume) to absorb FX losses. | Always active
Same-day hedging | For very large cross-currency positions, hedge the FX exposure separately (buy/sell the currency pair). | Only for positions > USD 10,000

Settlement in Multi-Currency Scenarios

At settlement, FX conversion happens in reverse:

MULTI-CURRENCY SETTLEMENT FLOW
================================

Event settles: Arsenal wins

Step 1: Betfair settles in GBP
→ Platform receives GBP profit (or pays GBP loss)

Step 2: Convert Betfair settlement to USD (platform base currency)
→ Use settlement-time FX rate (NOT the bet-placement rate)
→ Record FX gain/loss vs expected rate

Step 3: Platform settles its retained portion in USD

Step 4: Convert platform-to-agent settlement to agent's base currency
→ Kwame's upline settlement is in GHS
→ Use settlement-time FX rate
→ Record conversion in audit trail

Step 5: Agent cascade settles in agent base currency
→ Kwame's agents all settle in GHS
→ No FX needed within the GHS hierarchy

FX Audit Report

The platform generates a daily FX reconciliation report:

DAILY FX RECONCILIATION
========================
Date: 2026-02-11

Currency Pair Volume (USD) Avg Rate FX Gain/Loss Reserve Impact
----------- ----------- --------- ----------- --------------
GHS/USD $12,450 15.82 -$145 -$145 from reserve
NGN/USD $8,300 1520.50 +$89 +$89 to reserve
KES/USD $3,200 129.40 -$12 -$12 from reserve
THB/USD $5,800 36.15 +$34 +$34 to reserve
USD/GBP $18,700 0.788 -$210 -$210 from reserve

Net FX Impact: -$244
FX Reserve: $45,000 → $44,756 (0.5% drawdown)
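The reconciliation is a sum of per-pair gains and losses applied to the reserve. The minimal sketch below reproduces the 2026-02-11 figures. `reconcile` is a hypothetical name, and reporting drawdown as 0 on a net gain is an assumption.

```python
from decimal import Decimal


def reconcile(fx_results: dict, reserve_before: Decimal):
    """Sum the day's per-pair FX gains/losses and apply the net to the reserve.
    Returns (net, reserve_after, drawdown_pct); drawdown is 0 on a net gain."""
    net = sum(fx_results.values(), Decimal("0"))
    reserve_after = reserve_before + net
    drawdown_pct = -net / reserve_before * 100 if net < 0 else Decimal("0")
    return net, reserve_after, drawdown_pct


net, reserve_after, drawdown = reconcile(
    {"GHS/USD": Decimal("-145"), "NGN/USD": Decimal("89"), "KES/USD": Decimal("-12"),
     "THB/USD": Decimal("34"), "USD/GBP": Decimal("-210")},
    reserve_before=Decimal("45000"),
)
```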

Walk-Through: Ghanaian Agent in Cedis, Hedge in GBP

Setup: Kwame operates in Ghana. His base currency is GHS (Ghana Cedis). He has 150 football punters who bet on Premier League matches.

The bet: Kwame's punter Kofi bets GHS 1,000 on Chelsea to win at odds 3.20.

Step 1: Cascade in GHS (local currency)

Kofi bets GHS 1,000 at 3.20
Potential win: GHS 2,200
Liability: GHS 2,200

Kwame's matrix: forward 50% for Premier League pre-match
Kwame retains: GHS 500 (liability: GHS 1,100)
Kwame forwards: GHS 500 to platform

Step 2: FX conversion at platform boundary

GHS 500 arrives at platform
Current rate: 1 USD = 15.80 GHS (mid-market)
Platform applies 0.3% spread: effective rate = 15.85 GHS per USD

Conversion: GHS 500 / 15.85 = USD 31.55
Audit: fx_rate=15.85, source=platform_feed, spread=0.3%

Step 3: Platform routing in USD

Platform receives USD 31.55
Platform retains 50%: USD 15.78
Platform hedges 50%: USD 15.77 → Betfair

Step 4: FX conversion at Betfair boundary

USD 15.77 to hedge on Betfair
Current rate: 1 GBP = 1.27 USD (mid-market)
Platform applies 0.2% spread: effective rate = 1.2725 USD per GBP

Conversion: USD 15.77 / 1.2725 = GBP 12.39
Audit: fx_rate=1.2725, source=platform_feed, spread=0.2%

Step 5: Betfair hedge execution

Place back bet on Betfair: Back Chelsea to win, GBP 12.39 at 3.20 (the platform is effectively laying Chelsea on this half, so backing offsets it)
If Chelsea wins: Betfair pays GBP 12.39 * 2.20 = GBP 27.26
If Chelsea loses: Platform loses its GBP 12.39 stake to Betfair

Step 6: Settlement (Chelsea wins)

Betfair pays: GBP 27.26 profit
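Convert to USD: GBP 27.26 * 1.28 (settlement rate) = USD 34.89
(Rate moved slightly: was 1.27, now 1.28)
FX gain: USD 34.89 vs USD 34.62 at the original 1.27 rate (GBP 27.26 * 1.27) = +USD 0.27

Platform P&L (on the full forwarded USD 31.55):
Retained half: USD 15.78 stake → Chelsea won → Platform pays USD 15.78 * 2.20 = USD 34.72
Hedged half: USD 15.77 stake → Platform pays USD 15.77 * 2.20 = USD 34.69, recovers USD 34.89 from Betfair → +USD 0.20
Net platform P&L: -USD 34.72 - USD 34.69 + USD 34.89 = -USD 34.52
(The hedged half nets to near zero, as expected; the loss is the platform's retained 50%.)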

Kwame's settlement in GHS:
Kwame's retained: GHS 500 stake, GHS 1,100 liability
Chelsea won → Kwame pays GHS 1,100 of Kofi's winnings from his own book
Kwame's forwarded: GHS 500 → Kwame does not bear this portion
Kwame's P&L on the retained portion: -GHS 1,100 (Kofi's GHS 500 stake on this portion is returned, so the loss equals the liability)

Plus: settlement from platform for forwarded portion
Platform owes Kwame: the forwarded portion's winnings, converted to GHS
Convert: USD 69.41 (platform liability on the forwarded USD 31.55: USD 34.72 retained + USD 34.69 hedged) * 15.90 (settlement rate) = GHS 1,103.62
FX difference: expected GHS 1,100, actual GHS 1,103.62, gain GHS 3.62

Kofi (punter) receives: GHS 1,000 stake + GHS 2,200 profit = GHS 3,200 total return
→ All in GHS, Kofi never sees any FX conversion

Key takeaway: The punter always operates in their local currency. FX conversion is invisible to them. Agents also operate in their local currency within the hierarchy. FX only appears at the platform's boundaries (agent-to-platform and platform-to-exchange), and the platform absorbs FX risk as a cost of doing business through its FX reserve.
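The walk-through numbers can be recomputed end to end. The sketch below is a minimal version built on Python's decimal module. It assumes half-up rounding at each step: 2 decimal places for amounts and 4 for the GBP rate, as the figures above imply. The function name and rounding convention are illustrative assumptions, not the platform's FX engine, and Betfair commission is ignored.

```python
from decimal import Decimal, ROUND_HALF_UP

def q(x: Decimal, places: str = "0.01") -> Decimal:
    """Round half-up to the given number of decimal places."""
    return x.quantize(Decimal(places), rounding=ROUND_HALF_UP)

def kwame_walkthrough() -> dict:
    """Hypothetical helper reproducing the Kofi/Kwame example (Chelsea wins)."""
    odds = Decimal("3.20")
    win_mult = odds - 1                                   # 2.20 profit per unit staked

    # Step 1: cascade in GHS, Kwame forwards 50%
    stake_ghs = Decimal("1000")
    forwarded_ghs = stake_ghs / 2                         # GHS 500
    kwame_liability_ghs = (stake_ghs - forwarded_ghs) * win_mult   # GHS 1,100

    # Step 2: GHS -> USD at mid 15.80 plus a 0.3% spread
    ghs_per_usd = q(Decimal("15.80") * Decimal("1.003"))  # 15.85
    forwarded_usd = q(forwarded_ghs / ghs_per_usd)        # USD 31.55

    # Step 3: platform retains half and hedges the rest
    retained_usd = q(forwarded_usd / 2)                   # USD 15.78
    hedged_usd = forwarded_usd - retained_usd             # USD 15.77

    # Step 4: USD -> GBP at mid 1.27 plus a 0.2% spread
    usd_per_gbp = q(Decimal("1.27") * Decimal("1.002"), "0.0001")  # 1.2725
    hedge_gbp = q(hedged_usd / usd_per_gbp)               # GBP 12.39 backed on Betfair

    # Step 6: Chelsea wins; hedge profit converted at the 1.28 settlement rate
    recovery_usd = q(q(hedge_gbp * win_mult) * Decimal("1.28"))    # USD 34.89
    owed_retained = q(retained_usd * win_mult)            # USD 34.72
    owed_hedged = q(hedged_usd * win_mult)                # USD 34.69
    platform_pnl = recovery_usd - owed_retained - owed_hedged      # -USD 34.52

    # Platform settles the forwarded portion with Kwame at 15.90 GHS per USD
    owed_to_kwame_ghs = q((owed_retained + owed_hedged) * Decimal("15.90"))
    fx_gain_ghs = owed_to_kwame_ghs - forwarded_ghs * win_mult

    return {
        "kwame_liability_ghs": kwame_liability_ghs,
        "forwarded_usd": forwarded_usd,
        "retained_usd": retained_usd,
        "hedged_usd": hedged_usd,
        "usd_per_gbp": usd_per_gbp,
        "hedge_gbp": hedge_gbp,
        "recovery_usd": recovery_usd,
        "platform_pnl": platform_pnl,
        "owed_to_kwame_ghs": owed_to_kwame_ghs,
        "fx_gain_ghs": fx_gain_ghs,
    }

for name, value in kwame_walkthrough().items():
    print(f"{name}: {value}")
```

Rounding at every step, rather than only at the end, is what makes the USD 15.78 / 15.77 split and the GBP 12.39 stake match the audit lines above.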



35. Cache Race Condition Fix at Limit Boundaries (CRITICAL)

The Problem in Plain English

The existing 3-tier caching design (Section 15) has a dangerous gap. Consider this scenario: Rajesh has a cricket night limit of 10 lakh. His current exposure is 9,20,000 (92% utilized). The application LRU cache has a 5-second TTL, and within that 5-second window, 10 simultaneous bets arrive from Rajesh's punters. Each bet checks the LRU cache, sees "9,20,000 used out of 10,00,000 -- 80,000 remaining," and each bet tries to retain 20,000 of liability. If all 10 proceed, Rajesh retains 2,00,000 more -- pushing him to 11,20,000 against a 10,00,000 limit. The limit is breached by 1,20,000.

This is not theoretical. During IPL matches, a popular agent like Rajesh will receive 50+ bets per minute. At 92% utilization, every bet is potentially the one that tips over the limit.

The Safety Margin Approach

The core idea: do not use the fast cache path when you are "close enough" to the limit that a race condition could cause a breach. Define a safety margin that determines when to switch from the fast path (LRU/Redis) to the slow-but-safe path (PostgreSQL with FOR UPDATE locking).

Safety margin formula:

safety_margin = max(
fixed_minimum_margin, -- e.g., ₹50,000
average_bet_liability * expected_bets_per_ttl -- dynamic calculation
)

Where:

  • fixed_minimum_margin is a per-agent configurable floor (default 50,000)
  • average_bet_liability is the rolling average liability per bet for this agent in this scope (recalculated every 60 seconds)
  • expected_bets_per_ttl is the rolling average bet rate multiplied by the LRU cache TTL (5 seconds)

Example calculation for Rajesh during a busy IPL night:

| Parameter | Value |
|---|---|
| Rajesh's cricket night limit | 10,00,000 |
| Average bet liability (last 60s) | 8,500 |
| Average bets per second (last 60s) | 0.8 |
| LRU cache TTL | 5 seconds |
| Expected bets per TTL | 0.8 x 5 = 4 |
| Dynamic margin | 8,500 x 4 = 34,000 |
| Fixed minimum margin | 50,000 |
| Effective safety margin | max(50,000, 34,000) = 50,000 |
| DB-path threshold | 10,00,000 - 50,000 = 9,50,000 |

This means: when Rajesh's exposure reaches 9,50,000 (95% of his limit), every subsequent bet goes through the PostgreSQL FOR UPDATE path. The safety margin absorbs the expected race: the roughly 4 bets arriving within one cache TTL, each adding 8,500 on average, total 34,000, which is within the 50,000 margin.
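The margin formula and the resulting path choice fit in two small functions. The sketch below assumes the rolling averages are already available as plain numbers. The function names are hypothetical, and "DB" stands for the PostgreSQL FOR UPDATE path.

```python
def safety_margin(fixed_minimum: float, avg_bet_liability: float,
                  bets_per_second: float, ttl_seconds: float) -> float:
    """max(fixed floor, average liability x bets expected within one cache TTL)."""
    expected_bets_per_ttl = bets_per_second * ttl_seconds
    return max(fixed_minimum, avg_bet_liability * expected_bets_per_ttl)

def choose_path(cached_exposure: float, limit: float, margin: float) -> str:
    """FAST below the DB-path threshold; DB (SELECT ... FOR UPDATE) at or above it."""
    return "DB" if cached_exposure >= limit - margin else "FAST"

limit = 1_000_000                                  # Rajesh's 10,00,000 cricket night limit
margin = safety_margin(50_000, 8_500, 0.8, 5)      # dynamic part is 34,000, so the floor wins
print(margin, limit - margin)
print(choose_path(920_000, limit, margin), choose_path(960_000, limit, margin))
```

The same function with `ttl_seconds=0.005` gives a dynamic part of about 34, so the 50,000 floor still decides the threshold.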

The Three-Path Decision Flow

Every bet follows this exact decision flow:

Path descriptions:

| Path | When Used | Latency | Correctness Guarantee |
|---|---|---|---|
| FAST PATH | Exposure is below (limit - safety_margin) in any cache tier | 1-5ms | Eventual consistency -- may briefly overshoot the limit; post-write validation corrects any overshoot |
| DB PATH | Exposure is at or above (limit - safety_margin) in the freshest available cache | 10-25ms | Strict consistency -- FOR UPDATE lock prevents any overshoot |

Post-Write Validation and Rollback

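Even with the safety margin, the FAST PATH can still overshoot in two cases. The first is a burst that adds more liability within one cache TTL than the margin covers (as in the walk-through below). The second is a cache that stays stale for longer than one TTL cycle, which is rare but possible during network partitions or Redis failures.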

Post-write validation catches this:

  1. After the FAST PATH writes the position and updates the ledger in PostgreSQL, it reads back the committed ledger total
  2. If the committed total exceeds the limit, a rollback procedure fires:
    • The excess amount is calculated: overshoot = committed_total - limit
    • The most recently created position (the one that caused the overshoot) is reduced by the overshoot amount
    • The reduced amount is forwarded to the upline as overflow
    • A new overflow position is created for the upline agent
    • An audit record is created noting the post-write correction
    • An alert is fired (this indicates the safety margin may be too small)
POST-WRITE VALIDATION FLOW
============================

1. FAST PATH completes: position created, ledger updated
2. Read back: SELECT current_total FROM exposure_ledger WHERE agent=Rajesh AND scope=cricket_night
3. IF current_total <= limit → DONE (normal case, 99.9% of the time)
4. IF current_total > limit:
a. overshoot = current_total - limit
b. BEGIN TRANSACTION
c. Reduce this bet's retained amount by overshoot
d. Create overflow position at upline level for overshoot amount
e. Update upline's exposure ledger
f. Update Rajesh's exposure ledger (subtract overshoot)
g. COMMIT
h. Fire SAFETY_MARGIN_BREACH alert
i. Increase safety_margin by 50% for next 60 seconds
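Below is a minimal in-memory sketch of the post-write check. A dict-backed ledger stands in for PostgreSQL, and a `threading.Lock` stands in for the transaction. `Ledger`, `fast_path_write`, and `post_write_validate` are hypothetical names. The numbered flow above reads back before opening the transaction. This sketch instead performs the read-back and the correction inside the same critical section, so two bets cannot both subtract the same overshoot. Raising the margin by 50% is left to the caller.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class Position:
    bet_id: str
    agent: str
    retained: int

@dataclass
class Ledger:
    totals: dict = field(default_factory=dict)       # agent -> committed exposure
    positions: list = field(default_factory=list)
    alerts: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)  # stands in for a DB transaction

    def fast_path_write(self, bet_id: str, agent: str, liability: int) -> Position:
        """FAST PATH: commit the position and ledger increment without a limit check."""
        with self.lock:
            pos = Position(bet_id, agent, liability)
            self.positions.append(pos)
            self.totals[agent] = self.totals.get(agent, 0) + liability
            return pos

def post_write_validate(ledger: Ledger, pos: Position, limit: int, upline: str) -> int:
    """Read back the committed total and shift any overshoot to the upline."""
    with ledger.lock:                                 # BEGIN ... COMMIT
        overshoot = ledger.totals[pos.agent] - limit
        if overshoot <= 0:
            return 0                                  # normal case: nothing to correct
        correction = min(overshoot, pos.retained)     # never take back more than this bet retained
        pos.retained -= correction
        ledger.totals[pos.agent] -= correction
        ledger.positions.append(Position(pos.bet_id, upline, correction))  # overflow position
        ledger.totals[upline] = ledger.totals.get(upline, 0) + correction
    ledger.alerts.append(("SAFETY_MARGIN_BREACH", pos.bet_id, correction))
    return correction
```

For example, a ledger at 9,80,000 that receives a 25,000 FAST PATH write is corrected by 5,000, back to exactly 10,00,000, and the 5,000 moves to the upline as overflow.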

Walk Through: 10 Simultaneous Bets on Rajesh at 78% Utilization

Setup:

  • Rajesh's cricket night limit: 10,00,000
  • Current exposure: 7,80,000 (78%)
  • Safety margin: 50,000
  • DB-path threshold: 9,50,000
  • 10 bets arrive within 200 milliseconds, each adding approximately 25,000 liability

Step-by-step:

TIME      BET   LRU CACHE SHOWS     THRESHOLD   PATH   RESULT
=======   ===   =================   =========   ====   ====================================================
T+0ms     B1    ₹7,80,000           ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹8,05,000
T+20ms    B2    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹8,30,000
T+40ms    B3    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹8,55,000
T+60ms    B4    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹8,80,000
T+80ms    B5    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹9,05,000
T+100ms   B6    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹9,30,000
T+120ms   B7    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹9,55,000
T+140ms   B8    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹9,80,000
T+160ms   B9    ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹10,05,000 ← OVERSHOOT!
T+180ms   B10   ₹7,80,000 (stale)   ₹9,50,000   FAST   Retain ₹25,000. New actual: ₹10,25,000 (after B9's correction) ← OVERSHOOT!

Wait -- the LRU cache is stale for the entire 200ms burst because TTL is 5 seconds. All 10 bets see the same cached value. But at 78%, the cached value (7,80,000) is well below the threshold (9,50,000), so all 10 take the FAST PATH.

But the post-write validation catches the problem:

  • B9 finishes writing, reads back 10,05,000, detects overshoot of 5,000

    • B9's retained amount is reduced by 5,000
    • 5,000 overflows to Vikram
    • Alert fires
  • B10 finishes writing, reads back 10,25,000 (B9's correction has already committed), detects overshoot of 25,000

    • B10's retained amount is reduced by 25,000
    • 25,000 overflows to Vikram
    • Alert fires

After all 10 bets complete:

  • Rajesh's actual exposure: exactly 10,00,000 (the limit)
  • Two bets had post-write corrections (B9 and B10)
  • Safety margin is temporarily increased by 50% (to 75,000) for the next 60 seconds
  • Total correction: 30,000 in overflow that was initially retained but corrected
  • No money lost, no limit breached after correction

Now consider if Rajesh was at 96% utilization (9,60,000) instead:

All 10 bets would see the cache showing 9,60,000, which is ABOVE the threshold of 9,50,000. All 10 go through the DB PATH with FOR UPDATE locking. They serialize. Each one reads the true current value, updates it, and the moment the limit is reached, remaining bets overflow to the upline. No corrections needed. Slower (10-25ms each, serialized) but perfectly correct.

The key insight: At 78% utilization, the worst case is that all 10 bets (10 x 25,000 = 2,50,000) take the FAST PATH on the same stale cached value. That would push exposure to 10,30,000, a 30,000 overshoot. The post-write validation corrects this within milliseconds. The safety margin is designed so that the DB PATH kicks in before the overshoot becomes dangerously large. At 95%+ utilization, the DB PATH prevents any overshoot entirely.


36. Multi-Instance Cache Coherency (HIGH)

Why LRU Per-Instance Is Broken

When Hannibal runs multiple application instances behind a load balancer (which is required for horizontal scaling and high availability), the in-memory LRU cache on each instance diverges immediately.

INSTANCE 1                       INSTANCE 2
LRU Cache:                       LRU Cache:
Rajesh exposure = ₹9,20,000      Rajesh exposure = ₹8,80,000
(updated 2 seconds ago)          (updated 4 seconds ago)

REALITY (PostgreSQL):
Rajesh exposure = ₹9,45,000

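Instance 1 received recent bets for Rajesh and updated its local cache. Instance 2 has an older cached value. A bet arriving at Instance 2 sees 8,80,000 and takes the FAST PATH. But the real exposure is 9,45,000, only 5,000 below the 9,50,000 DB-path threshold. A few more bets would put Rajesh inside the safety-margin zone, where every bet should take the DB PATH, yet Instance 2 still believes there is 70,000 of headroom before that threshold.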

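With N instances, each maintaining an independent LRU cache with a 5-second TTL, every cached entry is still at most 5 seconds old. However, it only reflects the writes made through its own instance. A bet on Instance 2 is blind to everything written through the other N-1 instances since its last refresh. The liability that can accumulate unseen within one TTL therefore grows with the number of instances receiving that agent's bets. For popular agents during IPL, bets WILL be spread across all instances.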

The cleanest solution is to eliminate the per-instance LRU cache for exposure data and make Redis the first-tier cache for all exposure reads. Redis is shared across all instances, so there is no coherency problem.

What changes:

| Data Type | Old Architecture | New Architecture |
|---|---|---|
| Exposure counters | LRU (5s) → Redis → PostgreSQL | Redis → PostgreSQL |
| Agent config/matrix | LRU (5min) → Redis → PostgreSQL | LRU (5min) → Redis → PostgreSQL (unchanged -- config is read-heavy, write-rare) |
| NO_NEW_RISK flags | Redis | Redis (unchanged) |
| User win cap state | Redis | Redis (unchanged) |
| Period boundaries | LRU (1hr) | LRU (1hr) (unchanged -- same on all instances) |

Why this works: Exposure counters are the only data that is both write-heavy AND correctness-critical. By routing all exposure reads through Redis, every instance sees the same value. Redis reads are sub-millisecond (0.1-0.5ms), so the latency increase compared to the LRU cache (essentially zero latency) is negligible -- well within the 90ms budget.

Agent configuration, matrix rules, and period boundaries are safe to cache per-instance because they change rarely (admin actions, not bet flow), so a staleness window of up to 5 minutes (1 hour for period boundaries) is acceptable. When they DO change, a Redis pub/sub notification invalidates all instance caches (see below).

Config Cache Invalidation via Pub/Sub

For configuration data that IS cached per-instance (matrix rules, agent limits, period configs), changes must propagate to all instances:

The pub/sub message format:

| Field | Description | Example |
|---|---|---|
| type | What changed | MATRIX_UPDATE, LIMIT_UPDATE, PERIOD_UPDATE, USER_OVERRIDE |
| agent_id | Which agent | rajesh_mumbai |
| scope | Which scope (if applicable) | cricket, mi_vs_csk_2026_03_15 |
| timestamp | When the change was made | 2026-03-15T21:34:12.456Z |
| version | New config version number | 47 |

Each instance subscribes to the config.invalidate channel on startup. When a message arrives, the instance evicts the specified entries from its LRU cache. The next request for that data causes a cache miss, which fetches the fresh value from Redis or PostgreSQL.
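The eviction side of this protocol can be sketched as follows. Each instance keeps an LRU keyed by (kind, agent_id, scope) and drops matching entries when a message arrives. The key layout, the `KIND_BY_TYPE` mapping, and the class name are assumptions. In production the raw message would come from a Redis subscription on config.invalidate, for example through redis-py's `pubsub()` object, and that wiring is omitted here.

```python
import json
from collections import OrderedDict
from typing import Any, Optional

# Hypothetical mapping from message type to the kind of cached entry it invalidates.
KIND_BY_TYPE = {
    "MATRIX_UPDATE": "matrix",
    "LIMIT_UPDATE": "limit",
    "PERIOD_UPDATE": "period",
    "USER_OVERRIDE": "user_override",
}

class ConfigLRU:
    """Per-instance LRU for config data, keyed by (kind, agent_id, scope)."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self._entries: "OrderedDict[tuple, Any]" = OrderedDict()

    def get(self, key: tuple) -> Optional[Any]:
        if key not in self._entries:
            return None                        # miss: caller refetches from Redis/PostgreSQL
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: tuple, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

    def on_invalidate(self, raw_message: str) -> int:
        """Handle one config.invalidate message; return how many entries were evicted."""
        msg = json.loads(raw_message)
        kind = KIND_BY_TYPE[msg["type"]]
        scope = msg.get("scope")               # None means every scope for this agent
        stale = [k for k in self._entries
                 if k[0] == kind and k[1] == msg["agent_id"]
                 and (scope is None or k[2] == scope)]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

Evicting instead of pushing the new value keeps the message small. The next read fetches the fresh config from Redis or PostgreSQL, as described above.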

How This Interacts with the Safety Margin (Section 35)

With Redis as the effective Tier 1 for exposure data, the safety margin calculation from Section 35 becomes more accurate:

  • Redis is updated after every DB write (within the same request lifecycle)
  • The maximum staleness of a Redis exposure value is the time between one bet's DB write completing and the next bet's Redis read -- typically 1-5ms, not 5 seconds
  • This means the safety margin can be SMALLER, because the "expected bets per TTL" is now "expected bets per 5ms" instead of "expected bets per 5 seconds"

Revised safety margin with Redis as Tier 1:

| Parameter | Old (LRU Tier 1) | New (Redis Tier 1) |
|---|---|---|
| Effective TTL for exposure | 5,000ms | ~5ms |
| Expected bets per TTL (Rajesh at 0.8/sec) | 4 | 0.004 |
| Dynamic margin | 8,500 x 4 = 34,000 | 8,500 x 0.004 = 34 |
| Effective safety margin | max(50,000, 34,000) = 50,000 | max(50,000, 34) = 50,000 |

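The fixed minimum margin of 50,000 dominates in both cases, so the DB-path threshold does not move. The key insight is that with Redis as Tier 1, the FAST PATH stays safe across a much wider range of traffic. The 50,000 floor covers the expected in-flight liability roughly 1,500 times over (50,000 / 34), instead of about 1.5 times (50,000 / 34,000). The probability of a race condition breaching the safety margin drops from "possible during normal operation" to "essentially impossible unless Redis itself is partitioned."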

Deployment Topology

Redis Failure Mode

If Redis becomes unavailable, the system falls back to PostgreSQL for ALL exposure reads. This increases latency (from <1ms to 5-15ms per read) but maintains correctness. The circuit breaker pattern detects Redis unavailability within 3 failed requests and switches all instances to DB-direct mode. When Redis recovers, instances resume using it after a health check confirms 3 consecutive successful reads.
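Here is a per-instance sketch of the breaker. It opens after 3 consecutive failed Redis reads and closes after 3 consecutive successful health checks, matching the thresholds above. The class name and the split between `read_exposure` and `record_health_check` are hypothetical. The sketch also assumes Redis connection failures surface as the built-in `ConnectionError`; with redis-py you would catch `redis.exceptions.ConnectionError` instead.

```python
from typing import Callable

class RedisCircuitBreaker:
    """Per-instance breaker: Redis for exposure reads until it fails, then DB-direct."""

    def __init__(self, failure_threshold: int = 3, recovery_successes: int = 3):
        self.failure_threshold = failure_threshold
        self.recovery_successes = recovery_successes
        self.consecutive_failures = 0
        self.consecutive_healthy = 0
        self.db_direct = False

    def read_exposure(self, redis_read: Callable[[], int], db_read: Callable[[], int]) -> int:
        if self.db_direct:
            return db_read()                  # breaker open: PostgreSQL only
        try:
            value = redis_read()
        except ConnectionError:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.db_direct = True         # open the breaker
                self.consecutive_healthy = 0
            return db_read()                  # correctness preserved, latency goes up
        self.consecutive_failures = 0
        return value

    def record_health_check(self, ok: bool) -> None:
        """Called by a background probe while DB-direct; close after N clean reads in a row."""
        if not self.db_direct:
            return
        self.consecutive_healthy = self.consecutive_healthy + 1 if ok else 0
        if self.consecutive_healthy >= self.recovery_successes:
            self.db_direct = False
            self.consecutive_failures = 0
```

Every failed Redis read still falls back to PostgreSQL immediately, so the breaker only decides whether Redis is tried at all. It never decides whether a value is returned.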


37. PostgreSQL Scaling Strategy (HIGH)

Projected Data Volumes for First IPL Season

An IPL season runs approximately 60 days with 70+ matches. Here are the projected volumes:

| Table | Rows per Day (Normal) | Rows per Day (IPL Peak) | Total After First Season | Row Size (avg) | Total Size |
|---|---|---|---|---|---|
| bets | 50,000 | 3,00,000 | 90,00,000 | 500 bytes | ~4.5 GB |
| positions | 1,50,000 | 9,00,000 | 2,70,00,000 | 400 bytes | ~10.8 GB |
| exposure_ledger | 5,000 (updates, not new rows) | 20,000 | 50,000 (rows, updated in place) | 200 bytes | ~10 MB |
| audit_trail | 50,000 | 3,00,000 | 90,00,000 | 2,000 bytes | ~18 GB |
| settlements | 1,50,000 | 9,00,000 | 2,70,00,000 | 300 bytes | ~8.1 GB |
| forwarding_matrix_rules | Rare writes | Rare writes | ~50,000 | 300 bytes | ~15 MB |
| TOTAL | | | | | ~41.4 GB |
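The Total Size column is rows multiplied by average row size, in decimal gigabytes. The quick check below converts the lakh/crore figures to plain integers. It covers heap data only, so indexes, TOAST, and bloat are not included. The dictionary layout is illustrative.

```python
# (rows after first season, average row size in bytes), from the table above
PROJECTION = {
    "bets": (9_000_000, 500),                     # 90,00,000 rows
    "positions": (27_000_000, 400),               # 2,70,00,000 rows
    "exposure_ledger": (50_000, 200),
    "audit_trail": (9_000_000, 2_000),
    "settlements": (27_000_000, 300),
    "forwarding_matrix_rules": (50_000, 300),
}

def size_gb(rows: int, row_bytes: int) -> float:
    """Heap size in decimal GB (10**9 bytes)."""
    return rows * row_bytes / 1e9

total = sum(size_gb(rows, row_bytes) for rows, row_bytes in PROJECTION.values())
for name, (rows, row_bytes) in PROJECTION.items():
    print(f"{name:<25} {size_gb(rows, row_bytes):8.3f} GB")
print(f"{'TOTAL':<25} {total:8.3f} GB")
```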

The database is not enormous by modern standards, but the write contention during peak hours is the real challenge. During the IPL final, the positions table could see 150 inserts per second, and the exposure_ledger table could see 500 updates per second (because each bet updates multiple agents' ledgers).

Partitioning Strategy

Primary partition key: time (monthly range partitioning)

This is the most effective strategy because:

  1. Most queries are time-bounded (today's bets, this week's settlements, last month's audit trail)
  2. Old partitions become read-only and can be moved to cheaper storage
  3. Partition pruning eliminates scanning old data for real-time queries
  4. Individual partitions stay small enough for efficient indexing
bets table partitions:
bets_2026_01 (January 2026)
bets_2026_02 (February 2026)
bets_2026_03 (March 2026 - IPL starts)
bets_2026_04 (April 2026 - IPL peak)
bets_2026_05 (May 2026 - IPL ends)
...

positions table partitions:
positions_2026_01
positions_2026_02
...

audit_trail table partitions:
audit_trail_2026_01
audit_trail_2026_02
...
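Monthly partitions like the ones listed above can be generated ahead of each month with PostgreSQL's declarative range partitioning. The sketch below only builds the DDL strings. It assumes each parent table was created with `PARTITION BY RANGE (created_at)` and follows the `<table>_YYYY_MM` naming shown above. The helper names are hypothetical.

```python
from datetime import date

def month_range(year: int, month: int) -> tuple[date, date]:
    """First day of the month and first day of the next month (exclusive upper bound)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

def monthly_partition_ddl(parent: str, year: int, month: int) -> str:
    """DDL for one monthly partition of a parent declared PARTITION BY RANGE (created_at)."""
    start, end = month_range(year, month)
    return (f"CREATE TABLE {parent}_{year:04d}_{month:02d} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');")

# January through May 2026 for the bets table (IPL runs March to May).
for m in range(1, 6):
    print(monthly_partition_ddl("bets", 2026, m))
```

PostgreSQL treats the TO bound as exclusive, so adjacent months never overlap.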

Secondary partition consideration: by agent (for very large agents)

If a single agent like Vikram (with 12 sub-agents and thousands of punters) generates disproportionate volume, the positions table can be further sub-partitioned by agent_id using hash partitioning. This is only needed if a single monthly partition exceeds 10 GB for the positions table, which is unlikely in the first season but should be planned for.

Separate Write-Optimized Store for Audit Records

The audit_trail table is append-only and write-heavy. It should be separated from the transactional tables:

| Characteristic | Transactional Tables (bets, positions, exposure_ledger) | Audit Store (audit_trail) |
|---|---|---|
| Write pattern | Insert + Update | Append-only |
| Read pattern | Point lookups, range scans by time/agent | Full-record retrieval by bet_id, range scans for disputes |
| Consistency requirement | Strong (part of bet transaction) | Eventual (can lag by up to 500ms) |
| Index requirements | Heavy (multiple indexes for lookups) | Light (record_id primary key, bet_id, agent_id + time for scanning) |

Implementation: Audit records are buffered in an in-memory queue and flushed to a separate PostgreSQL schema (or a separate database instance if load warrants it) every 500ms. The audit write is NOT part of the bet placement transaction. If the audit flush fails, records are persisted to a local WAL (write-ahead log) file and retried.
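The buffer-then-flush design with a WAL fallback can be sketched in-process. The `sink` callable stands in for the batched insert into the audit schema, and the WAL is a JSON-lines file. The class and method names are hypothetical. For simplicity the sketch flushes opportunistically on `append`, where a real service would use a 500ms timer. As in the design above, records still sitting in the in-memory buffer are not durable until they are flushed.

```python
import json
import os
import time
from typing import Callable, List

class AuditBuffer:
    """Buffers audit records and flushes them in batches, outside the bet transaction."""

    def __init__(self, sink: Callable[[List[dict]], None], wal_path: str,
                 flush_interval_s: float = 0.5):
        self.sink = sink                      # e.g. a batched INSERT into the audit schema
        self.wal_path = wal_path
        self.flush_interval_s = flush_interval_s
        self.pending: List[dict] = []
        self.last_flush = time.monotonic()

    def append(self, record: dict) -> None:
        self.pending.append(record)
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self) -> None:
        batch, self.pending = self.pending, []
        self.last_flush = time.monotonic()
        if not batch:
            return
        try:
            self.sink(batch)
        except Exception:
            # Flush failed: persist to the local WAL file so the batch can be retried.
            with open(self.wal_path, "a", encoding="utf-8") as wal:
                for record in batch:
                    wal.write(json.dumps(record) + "\n")
                wal.flush()
                os.fsync(wal.fileno())

    def replay_wal(self) -> int:
        """Retry records saved in the WAL; delete the file only after the sink accepts them."""
        if not os.path.exists(self.wal_path):
            return 0
        with open(self.wal_path, encoding="utf-8") as wal:
            records = [json.loads(line) for line in wal if line.strip()]
        self.sink(records)                    # raises (and keeps the file) if still failing
        os.remove(self.wal_path)
        return len(records)
```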

The separate audit store uses:

  • autovacuum_vacuum_cost_delay = 0 (aggressive vacuuming for append-only workload)
  • fillfactor = 100 (no space reserved for updates, since rows are never updated; this is already PostgreSQL's default for tables)
  • Minimal indexes: the record_id primary key, bet_id, and a composite on (agent_id, created_at)

Read Replicas for Dashboard Queries

WRITE PATH (bet processing):
App Instance → PostgreSQL Primary (positions, ledgers, bets)

READ PATH (dashboards, reports):
App Instance → PostgreSQL Read Replica 1 (real-time dashboards, exposure summary)
Reporting Service → PostgreSQL Read Replica 2 (daily P&L, weekly settlement, analytics)
Support Dashboard → PostgreSQL Read Replica 2 (dispute resolution, audit trail queries)

Replication lag tolerance:

  • Dashboard queries: 1 second lag is acceptable (dashboard refreshes every 2-5 seconds anyway)
  • Settlement queries: zero lag required (use primary)
  • Reporting queries: 30 second lag is acceptable

Connection Pool Management

| Pool | Max Connections | Target | Purpose |
|---|---|---|---|
| bet_processing | 20 per instance x 3 instances = 60 | PostgreSQL Primary | Bet placement, exposure updates, position creation |
| settlement | 10 per instance x 1 instance = 10 | PostgreSQL Primary | Settlement processing (batch, lower concurrency) |
| dashboard_read | 15 per instance x 3 instances = 45 | Read Replica 1 | Agent dashboards, real-time queries |
| reporting_read | 10 per instance x 1 instance = 10 | Read Replica 2 | Reports, analytics, support tools |
| audit_write | 5 per instance x 3 instances = 15 | Audit DB | Audit trail flushing |

Total connections to Primary: 70 (well within PostgreSQL's default max_connections of 100, with headroom for admin connections)

PgBouncer recommendation: Place PgBouncer in front of PostgreSQL Primary in transaction pooling mode. This allows the application to open more logical connections than physical database connections, which is critical during traffic spikes.

When to Consider Event Sourcing for Audit Trail

Event sourcing (storing every state change as an immutable event rather than mutating rows) is already partially described in Section 11 for configuration changes. For the full audit trail, event sourcing should be considered when:

  1. The replay capability in Section 11 is used more than 10 times per week -- this indicates frequent disputes or compliance reviews, making a native event-sourced store more efficient than reconstructing state from audit records
  2. Audit trail queries become a performance bottleneck on the main database -- event-sourced stores (like EventStoreDB or a Kafka topic with compaction) are optimized for append and sequential read
  3. Regulatory requirements mandate immutable, tamper-proof audit trails -- an event store with cryptographic chaining provides stronger guarantees than a mutable PostgreSQL table

For the first IPL season: Use the PostgreSQL-based audit trail with append-only semantics and monthly partitioning. This is simpler to operate, easier to query, and sufficient for the projected volumes. Revisit event sourcing before the second season based on actual usage patterns.


38. Atomic Transaction Scaling (HIGH)

The Contention Problem

Every bet updates exposure ledgers for multiple agents atomically. Amit's bet touches Rajesh's ledger, Vikram's ledger, and the Platform's ledger -- all within a single PostgreSQL transaction. If another punter under Rajesh places a bet simultaneously, both transactions compete for a lock on Rajesh's ledger row.

With sharded counters (Section 15), contention on each agent's ledger drops by roughly a factor of N (where N is the shard count). But the cross-agent atomicity requirement means the transaction must lock shards across multiple agents, which increases the lock duration and deadlock risk.

Contention Analysis: Vikram with 12 Sub-Agents at 5 Bets/Sec Each

VIKRAM'S TRAFFIC PROFILE
==========================
Sub-agents: 12
Bets per second per sub-agent: 5
Total bets per second touching Vikram's ledger: 60

Each bet's transaction:
1. Lock sub-agent's exposure shard: ~2ms
2. Lock Vikram's exposure shard: ~2ms
3. Lock Platform's exposure shard: ~2ms
4. Write positions (3 rows): ~5ms
5. Commit: ~3ms
Total lock duration: ~14ms

With 60 bets/sec and 14ms lock duration:
Probability of contention on SAME Vikram shard:
60 bets/sec * 14ms per bet = 0.84
(84% of the time, at least one other bet is holding a Vikram shard lock)

With 8 Vikram shards:
60/8 = 7.5 bets/sec per shard
7.5 * 14ms = 0.105
(10.5% contention rate per shard -- acceptable)
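The contention figures above are lock utilization: the arrival rate per shard multiplied by the lock hold time. The sketch below reproduces that arithmetic. It assumes bets spread evenly across shards, and the helper name is hypothetical.

```python
def lock_utilization(bets_per_sec: float, lock_ms: float, shards: int = 1) -> float:
    """Fraction of time one shard's lock is held, assuming bets spread evenly over shards."""
    return (bets_per_sec / shards) * (lock_ms / 1000.0)

# sub-agent, Vikram and Platform shard locks, position writes, commit
lock_ms = 2 + 2 + 2 + 5 + 3
for shards in (1, 8, 16):
    print(f"{shards:>2} shards: {lock_utilization(60, lock_ms, shards):.4f}")
```

The same function shows the headroom of the 16-shard configuration assigned to top-level agents: 0.0525 at 60 bets/sec.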

The Tiered Atomicity Model

Not all agents need the same level of atomicity. The system uses three tiers:

Tier 1: Per-Level Atomicity (for hot agents like Vikram)

Instead of one cross-agent transaction, each level's ledger update is independent:

BET PROCESSING FOR HOT AGENTS
================================

Step 1: Process at Rajesh's level
BEGIN TRANSACTION
Lock Rajesh's exposure shard (random shard)
Check Rajesh's limit
Calculate Rajesh's retention vs forwarding
Write Rajesh's position
Update Rajesh's exposure shard
COMMIT
→ Output: ₹4,000 forwarded to Vikram

Step 2: Process at Vikram's level (separate transaction)
BEGIN TRANSACTION
Lock Vikram's exposure shard (random shard)
Check Vikram's limit
Calculate Vikram's retention vs forwarding
Write Vikram's position
Update Vikram's exposure shard
COMMIT
→ Output: ₹1,600 forwarded to Platform

Step 3: Process at Platform level (separate transaction)
BEGIN TRANSACTION
Lock Platform's exposure shard (random shard)
Write Platform's position
Update Platform's exposure shard
Queue hedge order
COMMIT

What happens if Step 2 fails after Step 1 succeeds?

Rajesh's position is created but Vikram's is not. The system enters a partial routing state. This is handled by:

  1. A routing_status field on the bet: PARTIAL (not all levels processed)
  2. A background retry job picks up partial bets within 1 second
  3. The retry job completes the remaining levels
  4. If retry also fails 3 times, the bet enters the dead letter queue
  5. The dead letter queue handler can either complete the routing or reverse Rajesh's position

The key insight: a partial routing state is not dangerous in itself. Rajesh has already retained his portion. The forwarded amount simply has not been allocated to Vikram yet. The exposure is "in transit" and is not double-counted anywhere. The bet record still carries the full ₹10,000 stake and the PARTIAL status, so the unallocated amount stays tracked until the retry job assigns it.
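Per-level routing with a PARTIAL state and retries can be sketched in memory. The 40% forwarding rates are an assumption inferred from the ₹4,000 and ₹1,600 figures above for a ₹10,000 stake. Each loop iteration stands in for one level's transaction, and platform hedging is omitted. `Bet`, `route`, and `retry_partial` are hypothetical names. The invariant worth noticing is that committed positions plus the in-transit amount always equal the stake.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

@dataclass
class Bet:
    bet_id: str
    stake: int
    routing_status: str = "PENDING"
    levels_done: int = 0
    retries: int = 0
    positions: List[Tuple[str, int]] = field(default_factory=list)
    in_transit: int = field(init=False)

    def __post_init__(self):
        self.in_transit = self.stake          # nothing allocated yet

def route(bet: Bet, chain: List[Tuple[str, int]], failing: FrozenSet[str] = frozenset()) -> None:
    """Run each remaining level as its own 'transaction'; stop at the first failure."""
    for level in range(bet.levels_done, len(chain)):
        agent, forward_pct = chain[level]
        if agent in failing:                  # simulated failed transaction at this level
            bet.routing_status = "PARTIAL"
            return
        forwarded = bet.in_transit * forward_pct // 100
        # Commit for this level: position written, remainder moves up as in-transit.
        bet.positions.append((agent, bet.in_transit - forwarded))
        bet.in_transit = forwarded
        bet.levels_done = level + 1
    bet.routing_status = "ROUTED"

def retry_partial(bet: Bet, chain: List[Tuple[str, int]],
                  failing: FrozenSet[str] = frozenset(), max_retries: int = 3) -> None:
    """Background retry: resume from the first unprocessed level, then dead-letter."""
    while bet.routing_status == "PARTIAL":
        if bet.retries >= max_retries:
            bet.routing_status = "DEAD_LETTER"   # handler completes or reverses routing
            return
        bet.retries += 1
        route(bet, chain, failing)

chain = [("rajesh", 40), ("vikram", 40), ("platform", 0)]
bet = Bet("amit-1", 10_000)
route(bet, chain, failing=frozenset({"vikram"}))   # Step 2 fails after Step 1 commits
retry_partial(bet, chain)                           # retry completes Vikram and Platform
print(bet.routing_status, bet.positions)
```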

Tier 2: Cross-Level Atomicity (for normal agents)

For agents with moderate traffic (under 10 bets/sec touching their ledger), the original single-transaction approach works fine. One transaction locks all relevant shards across all levels, writes all positions, and commits atomically.

Tier 3: Eventually Consistent (for the Platform level)

The Platform is the final level in every cascade. It receives the most traffic (every bet eventually reaches the Platform). The Platform's ledger can be updated asynchronously:

  1. The bet processing pipeline creates positions for all agent levels synchronously
  2. The Platform's exposure ledger is updated via an async counter increment in Redis
  3. A background job reconciles the Redis counter with the PostgreSQL ledger every 5 seconds
  4. The Platform's hedge queue reads from the Redis counter for real-time decisions

This works because the Platform has the deepest pockets and the highest limits. A 5-second lag in the Platform's exposure ledger does not create meaningful risk. The Platform's limits are set with sufficient headroom to absorb any lag.

Agent Classification for Atomicity Tier

| Agent Characteristic | Atomicity Tier | Shard Count | Reasoning |
|---|---|---|---|
| Top-level agent with 10+ sub-agents | Tier 1 (per-level) | 16 shards | Highest contention, needs maximum parallelism |
| Mid-level agent with 3-9 sub-agents | Tier 2 (cross-level) | 8 shards | Moderate contention, single transaction still viable |
| Leaf agent with direct punters only | Tier 2 (cross-level) | 4 shards | Low contention, simplest approach |
| Platform | Tier 3 (eventual) | 32 shards | Highest throughput, can tolerate lag |

Optimized Locking Strategy for Vikram

Deadlock prevention: By always processing levels in order (Level 1 first, then Level 2, then Level 3, etc.), and by using per-level transactions for hot agents, deadlocks are structurally impossible. Two bets from different sub-agents under Vikram might contend on the same Vikram shard, but they will never hold conflicting locks across levels because each level is a separate transaction.


39. Audit Trail Storage Architecture (MEDIUM)

Hot/Warm/Cold Storage Tiers

| Tier | Age of Data | Storage | Indexes | Query Latency Target | Cost |
|---|---|---|---|---|---|
| HOT | 0-7 days | PostgreSQL Primary (same DB, audit schema) | Full indexes on bet_id, agent_id, user_id, created_at, event_id | P99 < 50ms | Highest |
| WARM | 7-90 days | PostgreSQL (separate tablespace on slower disk, or read replica) | Reduced indexes: bet_id, agent_id + created_at composite only | P99 < 500ms | Medium |
| COLD | 90+ days | Compressed Parquet files on object storage (S3-compatible or local NAS) | External index in PostgreSQL (bet_id → file + offset mapping) | P99 < 5 seconds | Lowest |

Retention Policies

| Data | Hot Retention | Warm Retention | Cold Retention | Total Retention |
|---|---|---|---|---|
| Audit trail records | 7 days | 90 days | 3 years | 3 years |
| Bet records | 30 days | 1 year | 5 years | 5 years |
| Position records | 30 days | 1 year | 5 years | 5 years |
| Settlement records | 30 days | 1 year | 7 years | 7 years (regulatory) |
| Exposure ledger snapshots | 7 days (hourly snapshots) | 90 days (daily snapshots) | 1 year | 1 year |

The Append-Only Audit Store

The audit trail uses an append-only design. Records are never updated or deleted in place. Corrections or amendments are stored as new records that reference the original.

AUDIT RECORD STRUCTURE
========================

Each record contains:
- record_id (UUID, primary key)
- bet_id (UUID, indexed)
- record_type: BET_PLACED, BET_SETTLED, POSITION_CORRECTED, CONFIG_CHANGED
- agent_id (indexed)
- user_id (indexed)
- event_id (indexed -- for looking up all bets on a specific match)
- created_at (indexed, partition key)
- payload (JSONB -- the full structured audit data)
- checksum (SHA-256 hash of previous_checksum concatenated with the payload, for tamper detection)
- previous_checksum (hash of the previous record for this bet_id, creating a chain)

The previous_checksum field creates a hash chain similar to a blockchain. Each new audit record for a given bet references the checksum of the previous record. This makes tampering detectable: if any record in the chain is modified, all subsequent checksums become invalid.
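The chain only makes later records invalid if each checksum covers the previous checksum as well as the payload. The sketch below does exactly that. It assumes canonical JSON (sorted keys, no whitespace) as the byte form of the payload and an all-zero genesis value for a bet's first record. The helper names are hypothetical.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64   # assumed previous_checksum for the first record of a bet

def record_checksum(payload: dict, previous_checksum: str) -> str:
    """SHA-256 over previous_checksum + canonical JSON of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_checksum + canonical).encode("utf-8")).hexdigest()

def append_record(chain: List[dict], payload: dict) -> dict:
    prev = chain[-1]["checksum"] if chain else GENESIS
    record = {"payload": payload, "previous_checksum": prev,
              "checksum": record_checksum(payload, prev)}
    chain.append(record)
    return record

def first_broken_link(chain: List[dict]) -> Optional[int]:
    """Index of the first record whose link or checksum fails verification, else None."""
    prev = GENESIS
    for i, record in enumerate(chain):
        if (record["previous_checksum"] != prev
                or record["checksum"] != record_checksum(record["payload"], prev)):
            return i
        prev = record["checksum"]
    return None
```

If an attacker edits a payload and recomputes that record's checksum, verification fails at the next record instead. Rewriting every later record would be needed to hide the change.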

Indexing Strategy

Hot tier indexes (full):

| Index | Columns | Purpose |
|---|---|---|
| Primary key | record_id | Unique lookup |
| bet_lookup | bet_id | Find all audit records for a specific bet |
| agent_time | agent_id, created_at DESC | Agent's recent activity, dispute resolution |
| user_time | user_id, created_at DESC | User's betting history |
| event_lookup | event_id, created_at DESC | All bets on a specific match |
| type_time | record_type, created_at DESC | Find all settlements, all corrections, etc. |

Warm tier indexes (reduced):

| Index | Columns | Purpose |
|---|---|---|
| Primary key | record_id | Unique lookup |
| bet_lookup | bet_id | Dispute resolution (most common warm-tier query) |
| agent_time | agent_id, created_at DESC | Historical agent queries |

Cold tier indexes (external):

A separate mapping table in PostgreSQL:

| Column | Type | Description |
|---|---|---|
| bet_id | UUID | The bet to look up |
| file_path | TEXT | Path to the Parquet file in object storage |
| row_offset | INTEGER | Row position within the file |
| month | DATE | Month partition of the cold data |

Query Performance for Dispute Resolution

Common dispute queries and their targets:

| Query | Expected Latency | Tier | How |
|---|---|---|---|
| "Show me bet XYZ's full audit trail" | P99 < 50ms | Hot (if recent) | Index lookup on bet_id |
| "Show me all of Rajesh's bets last night" | P99 < 200ms | Hot | Range scan on agent_id + created_at |
| "Show me all bets on MI vs CSK final" | P99 < 500ms | Hot | Range scan on event_id |
| "Show me bet XYZ from 2 months ago" | P99 < 500ms | Warm | Index lookup on bet_id in warm partition |
| "Show me bet XYZ from last year" | P99 < 5s | Cold | Lookup mapping table, fetch from object storage |

Storage Cost Projections

| Tier | Volume After Year 1 | Storage Cost (approximate, cloud) | Total Annual |
|---|---|---|---|
| Hot (SSD, indexed) | 18 GB (7 days of audit) | $0.25/GB/month | $54/year |
| Warm (HDD, partial indexes) | 150 GB (83 days of audit) | $0.05/GB/month | $90/year |
| Cold (object storage, compressed) | 50 GB (compressed from ~200 GB) | $0.01/GB/month | $6/year |
| Total | | | ~$150/year |

Storage costs are negligible. The real cost is in compute (query processing) and IOPS (index maintenance). The tiered approach ensures that the expensive hot tier stays small while the cheap cold tier absorbs the bulk.

Tier Migration Job

A nightly job moves records between tiers:

NIGHTLY TIER MIGRATION JOB (runs at 4:00 AM IST)
==================================================

1. HOT → WARM migration:
- SELECT records WHERE created_at < NOW() - INTERVAL '7 days'
- INSERT into warm partition
- DELETE from hot partition
- Rebuild hot tier indexes (REINDEX CONCURRENTLY)
- Expected duration: 2-5 minutes

2. WARM → COLD migration:
- SELECT records WHERE created_at < NOW() - INTERVAL '90 days'
- Export to Parquet file (one file per day, compressed)
- Upload to object storage
- INSERT mapping rows into cold_index table
- DELETE from warm partition
- Expected duration: 5-15 minutes

3. COLD expiry:
- DELETE mapping rows WHERE month < NOW() - INTERVAL '3 years'
- Delete corresponding Parquet files from object storage
- Expected duration: < 1 minute
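The selection logic of the nightly job can be sketched as a pure function. It groups record ids by the move they need, using the 7-day, 90-day, and 3-year boundaries above, and records only ever move to colder tiers. The export, upload, and DELETE steps are omitted. The helper names are hypothetical, and 3 years is approximated as 3 x 365 days, ignoring leap days. Timezone handling (the job runs at 4:00 AM IST) is left out by using naive datetimes.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

HOT_AGE = timedelta(days=7)
WARM_AGE = timedelta(days=90)
COLD_AGE = timedelta(days=3 * 365)   # approximates INTERVAL '3 years'
ORDER = ["HOT", "WARM", "COLD", "EXPIRED"]

def target_tier(created_at: datetime, now: datetime) -> str:
    age = now - created_at
    if age < HOT_AGE:
        return "HOT"
    if age < WARM_AGE:
        return "WARM"
    if age < COLD_AGE:
        return "COLD"
    return "EXPIRED"

def plan_migration(records: List[Tuple[str, str, datetime]], now: datetime) -> Dict[str, List[str]]:
    """Group (record_id, current_tier, created_at) rows by move, e.g. 'HOT->WARM'."""
    moves: Dict[str, List[str]] = {}
    for record_id, current, created_at in records:
        target = target_tier(created_at, now)
        if ORDER.index(target) > ORDER.index(current):
            moves.setdefault(f"{current}->{target}", []).append(record_id)
    return moves
```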

40. Horizontal Scaling for the Cascade Engine (MEDIUM)

Partitioning by Top-Level Agent Subtree

The cascade engine processes bets through agent hierarchies. The natural partition boundary is the top-level agent subtree. All agents and punters under Vikram form one subtree; all agents and punters under another master agent form a separate subtree.

Why this partitioning works:

  1. A bet from Amit (under Rajesh, under Vikram) ONLY touches Rajesh's and Vikram's ledgers at the agent level. It never touches Suresh's or Kumar's data. So Partition A can process independently of Partition B.
  2. The Platform level is the convergence point, but it uses the eventually-consistent model from Section 38 (Tier 3 atomicity), so it does not create cross-partition locking.
  3. Each partition can run on a separate application instance or thread pool, with its own Redis key space for exposure counters.

How Cross-Agent Detection Works Across Partitions

The syndicate detection problem from Section 13 requires cross-agent visibility. If a syndicate member bets through Rajesh (Partition A) and also through Arun (Partition B), neither partition alone can detect the correlation.

Solution: a separate detection service that reads from all partitions.

Each partition publishes a lightweight bet event to a shared Redis Stream after completing the bet. The event contains: user_id, agent_id, event_id, outcome, stake, timestamp. The cross-agent detection service consumes this stream and maintains a sliding window of recent bets, looking for correlation patterns.

The detection service does NOT participate in the bet processing pipeline. It runs asynchronously. If it detects a syndicate pattern, it publishes a flag that the relevant partitions pick up on their next bet from the flagged user.
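The document does not specify the correlation rule, so the sketch below uses one hypothetical rule purely for illustration. It flags the users behind any event and outcome that is backed through at least 3 distinct agents with 100,000 or more combined stake inside a 60-second window. The thresholds, field names, and class name are assumptions. Reading from the shared Redis Stream (for example with XREADGROUP) is omitted, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Dict, Set

class CrossAgentDetector:
    """Sliding-window check over the shared bet-event stream (runs outside bet processing)."""

    def __init__(self, window_s: float = 60.0, min_agents: int = 3,
                 min_total_stake: int = 100_000):
        self.window_s = window_s
        self.min_agents = min_agents
        self.min_total_stake = min_total_stake
        self._recent: Deque[Dict] = deque()   # events ordered by timestamp

    def consume(self, event: Dict) -> Set[str]:
        """Add one bet event; return user_ids to flag (empty set if no pattern)."""
        self._recent.append(event)
        cutoff = event["timestamp"] - self.window_s
        while self._recent and self._recent[0]["timestamp"] < cutoff:
            self._recent.popleft()            # drop events older than the window
        key = (event["event_id"], event["outcome"])
        same_side = [e for e in self._recent if (e["event_id"], e["outcome"]) == key]
        agents = {e["agent_id"] for e in same_side}
        total_stake = sum(e["stake"] for e in same_side)
        if len(agents) >= self.min_agents and total_stake >= self.min_total_stake:
            return {e["user_id"] for e in same_side}
        return set()
```

The linear scan of the window keeps the sketch short. A production consumer would more likely keep per-(event, outcome) aggregates so each event is processed in constant time.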

Load Balancing Strategy

| Strategy | How It Works | When to Use |
|----------|--------------|-------------|
| Agent-affinity routing | All bets for a given top-level subtree go to the same instance; the load balancer uses a consistent hash of the top-level agent_id | Default strategy |
| Overflow routing | If the assigned instance is overloaded (queue depth > threshold), bets overflow to the subtree's designated overflow instance | During traffic spikes on a single subtree (e.g., Vikram's agents during the IPL final) |
| Hot-agent splitting | A single subtree is split across 2+ instances, with sub-agents assigned to different instances | When a single master agent's traffic exceeds one instance's capacity |

The load balancer (nginx or application-level) maintains a routing table:

ROUTING TABLE
===============
Vikram subtree → Instance 1 (primary), Instance 2 (overflow)
Suresh subtree → Instance 2 (primary), Instance 3 (overflow)
Kumar subtree → Instance 3 (primary), Instance 1 (overflow)
Platform cascade → All instances (round-robin, eventually consistent)
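Agent-affinity routing relies on a consistent hash of the top-level agent_id. The sketch below shows one common way to build that: a hash ring with virtual nodes, using 32-bit FNV-1a as the hash. The ring, the FNV-1a choice, and the virtual-node count are assumptions for illustration; this document does not specify the hashing scheme.

```typescript
// 32-bit FNV-1a over UTF-16 code units (adequate for ASCII agent ids).
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

class HashRing {
  private readonly points: { hash: number; instance: string }[] = [];

  // Each instance is placed on the ring many times to even out the load.
  constructor(instances: string[], virtualNodes = 64) {
    for (const instance of instances) {
      for (let v = 0; v < virtualNodes; v++) {
        this.points.push({ hash: fnv1a(`${instance}#${v}`), instance });
      }
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  // First ring point at or after the key's hash, wrapping to the start of the ring.
  route(topLevelAgentId: string): string {
    if (this.points.length === 0) throw new Error("empty ring");
    const h = fnv1a(topLevelAgentId);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].instance;
  }
}
```

The property that makes this "consistent": removing an instance only remaps the subtrees that were routed to it, so the other master agents keep their affinity.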

Handling Agent Hierarchy Changes That Cross Partitions

When an agent moves from one master agent to another (e.g., Rajesh leaves Vikram and joins Suresh), the partition assignment changes:

  1. Admin initiates the transfer via API
  2. System sets Rajesh's status to TRANSFERRING -- no new bets accepted for Rajesh's punters (brief pause, typically 2-5 seconds)
  3. All in-flight bets for Rajesh's punters complete on the old partition
  4. Rajesh's exposure ledger state is frozen and serialized
  5. Routing table is updated: Rajesh's punters now route to Suresh's partition
  6. Rajesh's exposure state is loaded into the new partition
  7. Rajesh's status is set to ACTIVE on the new partition
  8. New bets resume

The transfer window (2-5 seconds of paused betting) is acceptable because hierarchy changes are rare admin operations, not real-time events. During the transfer, punters see "placing bet..." for a few extra seconds rather than an error.

Deployment Diagram

                          ┌─────────────────────┐
                          │    Load Balancer    │
                          │   (Agent-Affinity   │
                          │  Consistent Hash)   │
                          └──────────┬──────────┘
                                     │
             ┌───────────────────────┼───────────────────────┐
             │                       │                       │
   ┌─────────▼─────────┐   ┌─────────▼─────────┐   ┌─────────▼─────────┐
   │    Instance 1     │   │    Instance 2     │   │    Instance 3     │
   │  Vikram subtree   │   │  Suresh subtree   │   │   Kumar subtree   │
   │  + overflow for   │   │  + overflow for   │   │  + overflow for   │
   │      Kumar        │   │      Vikram       │   │      Suresh       │
   │                   │   │                   │   │                   │
   │  Cascade Engine   │   │  Cascade Engine   │   │  Cascade Engine   │
   │  Matrix Resolver  │   │  Matrix Resolver  │   │  Matrix Resolver  │
   │  Limit Checker    │   │  Limit Checker    │   │  Limit Checker    │
   └─────────┬─────────┘   └─────────┬─────────┘   └─────────┬─────────┘
             │                       │                       │
             └───────────────────────┼───────────────────────┘
                                     │
                      ┌──────────────┼──────────────┐
                      │              │              │
                 ┌────▼─────┐   ┌────▼─────┐   ┌────▼─────┐
                 │  Redis   │   │    PG    │   │  Cross-  │
                 │ Cluster  │   │ Primary  │   │  Agent   │
                 │ (shared) │   │+ Replicas│   │ Detector │
                 └──────────┘   └──────────┘   └──────────┘

41. Monitoring and Alerting System (MEDIUM)

Key Metrics per Pipeline Stage

Bet Processing Pipeline:

| Stage | Metric | Collection Method | Alert Threshold |
|-------|--------|-------------------|-----------------|
| Request ingestion | bet.requests_per_second | Counter, per instance | > 200/sec (approaching capacity) |
| Request ingestion | bet.request_parse_error_rate | Counter | > 1% of requests |
| Validation | bet.validation_failure_rate | Counter | > 10% (potential attack) |
| User win cap check | bet.win_cap_latency_p99 | Histogram | > 15ms |
| Stake reduction | bet.stake_reduction_rate | Counter | > 20% of bets (limits may be too low) |
| Matrix resolution | bet.matrix_resolve_latency_p99 | Histogram | > 20ms |
| Matrix resolution | bet.matrix_cache_miss_rate | Counter | > 30% (cache issue) |
| Agent cap check | bet.cap_check_latency_p99 | Histogram | > 30ms per level |
| Exposure ledger | bet.exposure_update_latency_p99 | Histogram | > 25ms |
| Position creation | bet.position_write_latency_p99 | Histogram | > 20ms |
| Audit write | bet.audit_write_latency_p99 | Histogram | > 15ms |
| End-to-end | bet.total_latency_p99 | Histogram | > 90ms (SLA breach) |
| End-to-end | bet.total_latency_p50 | Histogram | > 40ms (performance degradation) |
| End-to-end | bet.success_rate | Counter | < 99.5% |

Hedge Execution Pipeline:

| Metric | Collection Method | Alert Threshold |
|--------|-------------------|-----------------|
| hedge.queue_depth | Gauge | > 50 orders (backlog building) |
| hedge.execution_latency_p99 | Histogram | > 2 seconds |
| hedge.betfair_api_latency_p99 | Histogram | > 1 second |
| hedge.partial_fill_rate | Counter | > 40% (liquidity problem) |
| hedge.unhedged_exposure_total | Gauge | > ₹10 lakh (risk accumulation) |
| hedge.betfair_error_rate | Counter | > 5% (API degradation) |
| hedge.slippage_average | Gauge | > 0.05 (pricing problem) |

Settlement Pipeline:

| Metric | Collection Method | Alert Threshold |
|--------|-------------------|-----------------|
| settlement.latency_p99 | Histogram | > 30 seconds per event |
| settlement.failure_rate | Counter | > 0.1% (any settlement failure is serious) |
| settlement.idempotency_collision_rate | Counter | > 0 (should be zero in normal operation) |
| settlement.reconciliation_drift | Gauge | > ₹1,000 (ledger mismatch) |

Infrastructure Metrics:

| Metric | Alert Threshold |
|--------|-----------------|
| redis.latency_p99 | > 5ms |
| redis.memory_usage_percent | > 80% |
| redis.connection_pool_exhaustion | > 90% |
| postgres.active_connections | > 80% of max |
| postgres.replication_lag_seconds | > 5 seconds |
| postgres.lock_wait_time_p99 | > 100ms |
| postgres.dead_tuples_ratio | > 20% (vacuum falling behind) |

Alert Thresholds and Escalation Paths

ESCALATION MATRIX
==================

P1 - CRITICAL (immediate response required)
Who: On-call engineer (PagerDuty) + Engineering lead
When: 24/7
SLA: Acknowledge within 5 minutes, resolve within 30 minutes
Examples:
- bet.total_latency_p99 > 200ms for 2 minutes
- bet.success_rate < 95% for 1 minute
- settlement.failure_rate > 1% for any settlement batch
- hedge.unhedged_exposure_total > 50 lakh
- postgres primary down or unreachable
- redis primary down or unreachable

P2 - HIGH (response within 1 hour)
Who: On-call engineer (Slack + PagerDuty)
When: Business hours + match hours
SLA: Acknowledge within 15 minutes, resolve within 2 hours
Examples:
- bet.total_latency_p99 > 90ms for 5 minutes
- bet.matrix_cache_miss_rate > 50% for 5 minutes
- hedge.betfair_api_latency_p99 > 2 seconds for 5 minutes
- settlement.reconciliation_drift > ₹10,000
- postgres.replication_lag_seconds > 30

P3 - MEDIUM (next business day)
Who: Engineering team (Slack channel)
When: Business hours
SLA: Acknowledge within 4 hours, resolve within 24 hours
Examples:
- bet.stake_reduction_rate > 30% for 1 hour
- redis.memory_usage_percent > 80%
- postgres.dead_tuples_ratio > 20%
- audit.write_lag > 5 seconds

P4 - LOW (weekly review)
Who: Engineering team (weekly metrics review)
Examples:
- bet.total_latency_p50 trending upward over 7 days
- hedge.slippage_average trending upward over 7 days
- storage utilization approaching 70%

Dashboard Design for Ops Team

Dashboard 1: Real-Time Operations (primary display during matches)

┌─────────────────────────────────────────────────────────────────────────┐
│ HANNIBAL OPS DASHBOARD 2026-03-15 21:47 │
├─────────────────────────┬───────────────────────────────────────────────┤
│ SYSTEM HEALTH │ BET THROUGHPUT (last 5 min) │
│ │ │
│ Bet Pipeline: 🟢 OK │ ████████████████████░░ 167/sec │
│ Hedge Engine: 🟢 OK │ Peak today: 203/sec (21:32) │
│ Settlement: 🟢 OK │ P99 latency: 72ms │
│ Redis: 🟢 OK │ Success: 99.92% │
│ PostgreSQL: 🟢 OK │ │
│ Betfair API: 🟡 SLOW │ Error breakdown: │
│ │ Validation: 12/min │
│ │ Timeout: 0/min │
│ │ DB Error: 0/min │
├─────────────────────────┼───────────────────────────────────────────────┤
│ EXPOSURE BY AGENT │ HEDGE STATUS │
│ (top 10 by %) │ │
│ │ Queue depth: 3 orders │
│ 1. Rajesh 76% ████░ │ Unhedged total: ₹2.1L │
│ 2. Vikram 54% ███░░ │ Betfair latency: 850ms ⚠ │
│ 3. Priya 41% ██░░░ │ Fill rate: 94% │
│ 4. Sanjay 38% ██░░░ │ Avg slippage: 0.02 │
│ 5. Arun 22% █░░░░ │ │
│ │ Last 10 hedges: │
│ NO_NEW_RISK active: 0 │ 21:45 MI 1.85 → filled 1.86 ✓ │
│ │ 21:44 CSK 2.10 → filled 2.10 ✓ │
│ │ 21:43 Draw 3.50 → partial 60% ⚠ │
├─────────────────────────┴───────────────────────────────────────────────┤
│ ACTIVE ALERTS │
│ │
│ ⚠ 21:45 Betfair API latency elevated (850ms, threshold 500ms) │
│ Status: Auto-monitoring, no action needed yet │
│ │
│ ✓ 21:30 Rajesh night limit at 84% - INFO (auto-resolved) │
│ │
│ [View All Alerts] [Silence Non-Critical] [Run Health Check] │
└─────────────────────────────────────────────────────────────────────────┘

Dashboard 2: Reconciliation & Financial (daily review)

Dashboard 3: Agent Health (support team view)

Specific Alert Definitions

Cache miss rate spike:

| Field | Value |
|-------|-------|
| Alert name | cache_miss_rate_spike |
| Metric | bet.matrix_cache_miss_rate OR bet.exposure_cache_miss_rate |
| Condition | > 50% for 3 consecutive minutes |
| Severity | P2 |
| Probable cause | Redis memory pressure, network partition, config invalidation storm |
| Auto-mitigation | None (requires investigation) |
| Run book action | Check Redis memory, check recent config change frequency, verify pub/sub connectivity |

Betfair API latency:

| Field | Value |
|-------|-------|
| Alert name | betfair_api_degraded |
| Metric | hedge.betfair_api_latency_p99 |
| Condition | > 1 second for 2 consecutive minutes |
| Severity | P2 (escalate to P1 if > 5 seconds for 5 minutes) |
| Probable cause | Betfair infrastructure issue, network routing, API rate limiting |
| Auto-mitigation | Increase hedge retry delay, reduce concurrent hedge requests |
| Run book action | Check Betfair status page, check outbound network, verify API key validity |

Exposure ledger drift:

| Field | Value |
|-------|-------|
| Alert name | exposure_ledger_drift |
| Metric | settlement.reconciliation_drift |
| Condition | > ₹1,000 for any agent |
| Severity | P1 (financial accuracy issue) |
| Probable cause | Race condition in ledger update, missed settlement, partial transaction commit |
| Auto-mitigation | Trigger immediate recompute for the affected agent |
| Run book action | Run manual recompute, compare with position sum, identify the divergent bet |

Settlement failure rate:

| Field | Value |
|-------|-------|
| Alert name | settlement_failure_elevated |
| Metric | settlement.failure_rate |
| Condition | > 0.1% for any settlement batch |
| Severity | P1 |
| Probable cause | DB connection exhaustion, data inconsistency, event result ambiguity |
| Auto-mitigation | Retry failed settlements 3 times with exponential backoff |
| Run book action | Check DB connections, inspect failed settlement IDs, verify event results |
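Most conditions above fire only when a threshold is exceeded for several consecutive minutes. A minimal sketch of that evaluation, assuming one sample per metric per minute. The `AlertRule` shape and function names are hypothetical. The two sample rules encode the Betfair latency alert's P2 condition and its P1 escalation from the table above, with latency in milliseconds.

```typescript
type Severity = "P1" | "P2" | "P3" | "P4";

interface AlertRule {
  name: string;
  threshold: number; // fire when a sample is strictly above this
  sustainedMinutes: number;
  severity: Severity;
}

// `samples` holds one value per minute, oldest first.
function firing(rule: AlertRule, samples: number[]): boolean {
  if (samples.length < rule.sustainedMinutes) return false;
  const recent = samples.slice(-rule.sustainedMinutes);
  return recent.every((value) => value > rule.threshold);
}

// Returns the most severe firing rule, or undefined if none fire.
function evaluate(rules: AlertRule[], samples: number[]): AlertRule | undefined {
  const order: Severity[] = ["P1", "P2", "P3", "P4"];
  return rules
    .filter((rule) => firing(rule, samples))
    .sort((a, b) => order.indexOf(a.severity) - order.indexOf(b.severity))[0];
}

// betfair_api_degraded: > 1 s for 2 min is P2; > 5 s for 5 min escalates to P1.
const betfairRules: AlertRule[] = [
  { name: "betfair_api_degraded", threshold: 1000, sustainedMinutes: 2, severity: "P2" },
  { name: "betfair_api_degraded", threshold: 5000, sustainedMinutes: 5, severity: "P1" },
];
```

Requiring every sample in the window to breach the threshold is what keeps a single slow minute from paging anyone.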

Run Book Topics

| Topic | What to Do |
|-------|------------|
| Redis primary down | The system automatically falls back to PostgreSQL for all reads. Monitor bet latency: each exposure read that Redis served in <1ms now takes 10-20ms from PostgreSQL. Restart Redis. After the restart, run the redis-warmup script to repopulate exposure counters from PostgreSQL. |
| PostgreSQL primary down | Critical outage: all bet placement fails. Promote a read replica to emergency primary (manual failover), accepting the risk of losing the last few seconds of unreplicated writes. After recovery, reconcile. |
| Bet latency spike (P99 > 200ms) | Check PostgreSQL lock wait times. If elevated, identify the hot agent (likely the one with the most bets/sec) and temporarily increase their shard count. Check Redis latency. Check for long-running queries on the primary. |
| Betfair completely unreachable | The hedge queue will grow and the platform absorbs all hedge-intended risk. Monitor hedge.unhedged_exposure_total. If it exceeds ₹50 lakh, consider temporarily decreasing agents' forward percentages, so that agents retain more and less risk reaches the platform. |
| Single agent's exposure ledger diverges from position sum | Run reconciliation recompute --agent=AGENT_ID. Compare the recomputed total with the current ledger value. If they differ, the ledger is stale: update it to match the position sum, then investigate the root cause (check for a missed settlement or partial commit). |
| Configuration change not propagating | Check the Redis pub/sub channel and verify that all instances are subscribed. Manually trigger a cache flush on all instances via the admin API endpoint /admin/cache/flush?agent_id=X. |
| Surge of stake reductions | Indicates that user win limits are being hit frequently. Check whether a single user is hammering the system (potential abuse) and whether limits were accidentally lowered, then review the agent's win cap configuration. |

42. Reconciliation System (HIGH)

What Is Reconciled

The reconciliation system verifies that the exposure ledger (the fast-access counter that the bet processing pipeline reads) matches the actual sum of open positions. These are two independent sources of the same truth, and they can drift apart due to bugs, partial commits, or race conditions.

| Reconciliation Check | Source A (Expected) | Source B (Actual) | Acceptable Drift |
|----------------------|---------------------|-------------------|------------------|
| Agent retained exposure | exposure_ledger.retained_open_liability | SUM of positions.liability WHERE agent=X AND status=OPEN AND type=RETAINED | ₹0 (zero tolerance) |
| Agent forwarded exposure | exposure_ledger.forwarded_open_liability | SUM of positions.liability WHERE agent=X AND status=OPEN AND type=FORWARDED | ₹0 (zero tolerance) |
| Agent potential win | exposure_ledger.open_potential_win | SUM of positions.potential_win WHERE agent=X AND status=OPEN | ₹0 (zero tolerance) |
| Stake conservation | Original bet stake | SUM of the retained stake at every level for that bet, plus the platform's hedged portion. Forwarded positions are excluded because each forwarded amount reappears as retained stake higher up; equivalently, at every level retained + forwarded = stake received | ₹0 (absolute conservation) |
| Settlement completeness | Count of positions for settled event | Count of settlement records for that event | 0 (all positions must be settled) |
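Stake conservation can be checked level by level. A minimal sketch, assuming a hypothetical per-level record of what each level received, retained, and forwarded for one bet, with the platform's hedged portion treated as its forward. Every level must satisfy retained + forwarded = received, and each level must receive exactly what the level below forwarded. The sample figures are the Amit → Rajesh → Vikram → Platform bet from the audit trail in Section 45.

```typescript
interface LevelSplit {
  level: string;
  received: number;  // rupees arriving at this level
  retained: number;
  forwarded: number; // for the platform: the hedged portion
}

// Returns human-readable violations; an empty array means the bet conserves stake.
function conservationViolations(stake: number, levels: LevelSplit[]): string[] {
  const problems: string[] = [];
  let expectedIncoming = stake;
  for (const l of levels) {
    if (l.received !== expectedIncoming) {
      problems.push(`${l.level}: received ${l.received}, expected ${expectedIncoming}`);
    }
    if (l.retained + l.forwarded !== l.received) {
      problems.push(`${l.level}: retained + forwarded != received`);
    }
    expectedIncoming = l.forwarded;
  }
  return problems;
}

const auditTrailBet: LevelSplit[] = [
  { level: "rajesh", received: 50_000, retained: 30_000, forwarded: 20_000 },
  { level: "vikram", received: 20_000, retained: 12_000, forwarded: 8_000 },
  { level: "platform", received: 8_000, retained: 4_000, forwarded: 4_000 },
];
```

When both conditions hold at every level, the retained amounts plus the platform's hedge add up to the original stake (30,000 + 12,000 + 4,000 + 4,000 = 50,000).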

The Reconciliation Job Workflow

How Discrepancies Are Flagged and Categorized

Each discrepancy record contains:

| Field | Description | Example |
|-------|-------------|---------|
| discrepancy_id | Unique identifier | disc_a1b2c3 |
| agent_id | Affected agent | rajesh_mumbai |
| scope | Which scope diverged | cricket_night_2026_03_15 |
| ledger_value | What the exposure ledger says | ₹15,00,000 |
| computed_value | What the position sum says | ₹13,50,000 |
| drift_amount | The difference | ₹1,50,000 |
| drift_direction | LEDGER_HIGH or LEDGER_LOW | LEDGER_HIGH |
| detected_at | When it was found | 2026-03-15T22:00:00Z |
| detection_method | Which reconciliation job found it | SCHEDULED_15MIN |
| category | MINOR, MAJOR, CRITICAL | CRITICAL |
| resolution_status | OPEN, INVESTIGATING, RESOLVED, AUTO_CORRECTED | OPEN |
| root_cause | Filled in during investigation | Partial commit on bet_xyz at 21:47 |
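A minimal sketch of how a reconciliation check could turn a ledger value and a position sum into the drift fields above. Amounts are in paisa, matching the time-series table below. The MINOR/MAJOR/CRITICAL cut-offs are hypothetical placeholders (this document does not define them), chosen only so that the ₹1,50,000 example lands in CRITICAL. `NONE` is an extra value for zero drift, which produces no discrepancy record.

```typescript
type DriftDirection = "ZERO" | "LEDGER_HIGH" | "LEDGER_LOW";
type Category = "NONE" | "MINOR" | "MAJOR" | "CRITICAL";

const PAISA_PER_RUPEE = 100;
// Hypothetical cut-offs, in paisa.
const MAJOR_FROM = 1_000 * PAISA_PER_RUPEE;      // ₹1,000
const CRITICAL_FROM = 100_000 * PAISA_PER_RUPEE; // ₹1,00,000

interface Drift {
  drift_amount: number; // paisa, always non-negative
  drift_direction: DriftDirection;
  category: Category;
}

function classifyDrift(ledgerValue: number, computedValue: number): Drift {
  const diff = ledgerValue - computedValue;
  const amount = Math.abs(diff);
  const direction: DriftDirection =
    diff === 0 ? "ZERO" : diff > 0 ? "LEDGER_HIGH" : "LEDGER_LOW";
  let category: Category = "NONE";
  if (amount >= CRITICAL_FROM) category = "CRITICAL";
  else if (amount >= MAJOR_FROM) category = "MAJOR";
  else if (amount > 0) category = "MINOR";
  return { drift_amount: amount, drift_direction: direction, category };
}
```

For the walkthrough below, `classifyDrift(150_000_000, 135_000_000)` gives a drift of 15,000,000 paisa (₹1,50,000), LEDGER_HIGH, CRITICAL.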

The Manual Recompute Tool

The recompute tool is the primary remediation mechanism. It reconstructs the exposure ledger value from scratch by summing all open positions.

RECOMPUTE PROCEDURE
=====================

Command: reconciliation recompute --agent=rajesh_mumbai --scope=cricket

Step 1: Acquire advisory lock on agent+scope (prevents concurrent bets from modifying positions)
Step 2: SELECT COALESCE(SUM(liability), 0) AS retained FROM positions
        WHERE agent_id='rajesh_mumbai' AND sport='cricket'
        AND status='OPEN' AND position_type='RETAINED'
Step 3: SELECT COALESCE(SUM(liability), 0) AS forwarded FROM positions
        WHERE agent_id='rajesh_mumbai' AND sport='cricket'
        AND status='OPEN' AND position_type='FORWARDED'
Step 4: SELECT COALESCE(SUM(potential_win), 0) AS potential_win FROM positions
        WHERE agent_id='rajesh_mumbai' AND sport='cricket'
        AND status='OPEN'
        (COALESCE matters: SUM over zero rows returns NULL, which must not be written to the ledger)
Step 5: Compare with current ledger values
Step 6: If different:
        UPDATE exposure_ledger SET
          retained_open_liability = [computed retained],
          forwarded_open_liability = [computed forwarded],
          open_potential_win = [computed potential_win]
        WHERE agent_id='rajesh_mumbai' AND scope='cricket'
Step 7: Log the correction with before/after values
Step 8: Update Redis with new ledger values (while the lock is still held, so no bet can read the stale counter in between)
Step 9: Release advisory lock

Duration: 1-5 seconds per agent per scope
Impact: Agent's bets are briefly delayed (advisory lock held), not rejected
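The arithmetic behind steps 2-7 can be sketched in memory. This is a minimal sketch only: the `Position` and `LedgerRow` shapes are hypothetical, and the advisory lock, the SQL, and the Redis write are left out. As in the SQL, an empty set of positions sums to zero rather than to a missing value.

```typescript
interface Position {
  agent_id: string;
  sport: string;
  status: "OPEN" | "SETTLED";
  position_type: "RETAINED" | "FORWARDED";
  liability: number;     // paisa
  potential_win: number; // paisa
}

interface LedgerRow {
  retained_open_liability: number;
  forwarded_open_liability: number;
  open_potential_win: number;
}

// Steps 2-4: rebuild the ledger values from open positions only.
function recompute(positions: Position[], agentId: string, sport: string): LedgerRow {
  const row: LedgerRow = { retained_open_liability: 0, forwarded_open_liability: 0, open_potential_win: 0 };
  for (const p of positions) {
    if (p.agent_id !== agentId || p.sport !== sport || p.status !== "OPEN") continue;
    if (p.position_type === "RETAINED") row.retained_open_liability += p.liability;
    else row.forwarded_open_liability += p.liability;
    row.open_potential_win += p.potential_win;
  }
  return row;
}

// Steps 5-7: the before/after pair to log, or null when the ledger already matches.
function correction(current: LedgerRow, computed: LedgerRow): { before: LedgerRow; after: LedgerRow } | null {
  const same =
    current.retained_open_liability === computed.retained_open_liability &&
    current.forwarded_open_liability === computed.forwarded_open_liability &&
    current.open_potential_win === computed.open_potential_win;
  return same ? null : { before: current, after: computed };
}
```

Settled positions are skipped, which is exactly why a missed ledger decrement at settlement time shows up as LEDGER_HIGH drift.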

Tracking Drift Over Time to Detect Systemic Bugs

Every reconciliation result is stored in a time-series table:

| Column | Type | Description |
|--------|------|-------------|
| check_id | UUID | Unique identifier |
| agent_id | TEXT | Agent |
| scope | TEXT | Scope (sport, market, period) |
| checked_at | TIMESTAMP | When the check ran |
| ledger_value | BIGINT | Ledger amount in paisa |
| computed_value | BIGINT | Position sum in paisa |
| drift_amount | BIGINT | Difference in paisa |
| drift_direction | TEXT | ZERO, LEDGER_HIGH, LEDGER_LOW |

A weekly analysis job examines this table for patterns:

  • Same agent drifting repeatedly: Indicates a bug in that agent's specific configuration or traffic pattern
  • All agents drifting in the same direction: Indicates a systemic bug in the exposure update logic
  • Drift correlating with high traffic periods: Indicates a race condition that manifests under load
  • Drift correlating with settlement batches: Indicates a bug in the settlement ledger decrement logic
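A minimal sketch of the first two weekly checks, over rows shaped like the time-series table above (only the columns the checks need). The thresholds `repeatCount` and `agentShare` are hypothetical parameters, not values from this design.

```typescript
interface DriftCheck {
  agent_id: string;
  drift_direction: "ZERO" | "LEDGER_HIGH" | "LEDGER_LOW";
}

// Agents with at least `repeatCount` non-zero drift results in the period.
function repeatOffenders(checks: DriftCheck[], repeatCount: number): string[] {
  const counts = new Map<string, number>();
  for (const c of checks) {
    if (c.drift_direction === "ZERO") continue;
    counts.set(c.agent_id, (counts.get(c.agent_id) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, n]) => n >= repeatCount)
    .map(([agent]) => agent)
    .sort();
}

// A single direction shared by at least `agentShare` of all checked agents suggests a systemic bug.
function systemicDirection(checks: DriftCheck[], agentShare: number): "LEDGER_HIGH" | "LEDGER_LOW" | null {
  const allAgents = new Set(checks.map((c) => c.agent_id));
  if (allAgents.size === 0) return null;
  for (const dir of ["LEDGER_HIGH", "LEDGER_LOW"] as const) {
    const drifting = new Set(checks.filter((c) => c.drift_direction === dir).map((c) => c.agent_id));
    if (drifting.size / allAgents.size >= agentShare) return dir;
  }
  return null;
}
```

The load and settlement correlations need timestamps joined against traffic and settlement-batch data, so they are left out here.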

Walk Through: Rajesh's Ledger Shows 15 Lakh, Actual Positions Sum to 13.5 Lakh

Situation: The scheduled 15-minute reconciliation job runs at 10:00 PM. It finds that Rajesh's cricket retained_open_liability ledger reads ₹15,00,000, but the sum of all his open retained cricket positions is only ₹13,50,000. The drift is ₹1,50,000 (LEDGER_HIGH, CRITICAL category).

What happened (likely root cause):

At 9:47 PM, a settlement batch processed the MI vs RR match. Rajesh had ₹1,50,000 of retained positions on that match. The settlement correctly set those positions to status=SETTLED, but the ledger decrement failed (perhaps due to a transient database connection error). The settlement service logged the error and moved on. The positions are settled, but the ledger still thinks they are open.

Investigation steps:

INVESTIGATION LOG
==================

10:00 PM - Reconciliation detects drift: ₹15L ledger vs ₹13.5L positions
10:00 PM - P1 alert fired. Rajesh switched to DB-PATH only
10:02 PM - On-call engineer acknowledges

10:03 PM - Engineer runs: reconciliation investigate --agent=rajesh_mumbai --scope=cricket
Output: "Ledger is ₹1,50,000 higher than position sum.
Last settlement batch at 9:47 PM settled 12 positions for MI vs RR.
Settlement records exist for all 12 positions.
Ledger decrement for MI vs RR settlement: NOT FOUND.
Root cause: Settlement marked positions SETTLED but failed to decrement the ledger."

10:05 PM - Engineer runs: reconciliation recompute --agent=rajesh_mumbai --scope=cricket
Output: "Ledger updated from ₹15,00,000 to ₹13,50,000.
Correction: -₹1,50,000.
Audit record created: recompute_abc123.
Redis updated."

10:06 PM - Engineer verifies Rajesh's dashboard shows correct exposure
10:06 PM - Rajesh switched back from DB-PATH only to normal caching
10:07 PM - P1 alert resolved with root cause documented

10:08 PM - Bug ticket created: "Settlement ledger decrement lacks retry logic.
When the DB connection fails during decrement, the error is logged
but the decrement is not retried. Fix: add retry with 3 attempts
and dead letter queue for persistent failures."

43. Hedge Execution Engine (CRITICAL)

Design Overview

The hedge execution engine is responsible for placing bets on Betfair to offset the platform's retained risk. It is a separate service that consumes a hedge order queue and executes against the Betfair API.

Limit Order Placement with Configurable Max Slippage

Every hedge order is placed as a limit order on Betfair, not a market order. This prevents the platform from being filled at arbitrarily bad prices during volatile moments.

| Parameter | Description | Default | Configurable Per |
|-----------|-------------|---------|------------------|
| max_slippage_ticks | Maximum number of price ticks worse than the target price that the system will accept | 3 ticks | Per sport, per market type |
| target_price | The price at which the punter's bet was accepted | From bet record | Per bet |
| limit_price | The worst price the system will accept: target_price + max_slippage_ticks | Computed | Computed |
| order_size | Amount to hedge in GBP equivalent | From cascade output | Per bet |
| time_in_force | How long the order stays active before cancellation | 30 seconds for pre-match, 10 seconds for in-play | Per event phase |

Example: Amit's bet on MI at 1.85, platform needs to hedge ₹800 (approx £7.50).

Target price:         1.85 (back MI)
Max slippage: 3 ticks
Betfair price ladder: 1.85, 1.86, 1.87, 1.88, 1.89, 1.90 ...
Limit price: 1.88 (3 ticks worse than 1.85)
Order: BACK MI at 1.88 or better, size £7.50
Time in force: 30 seconds (pre-match)

If the best available price on Betfair is 1.86, the order fills at 1.86 (within slippage). If the best available price is 1.92, the order sits in the market at 1.88 for 30 seconds. If not filled, it is cancelled and re-evaluated.

Partial Fill Tracking and Re-Pricing Strategy

Betfair often provides partial liquidity. The hedge engine must track partial fills and decide whether to pursue the remainder.

PARTIAL FILL EXAMPLE
=====================

Hedge order: BACK MI £100 at limit 1.88
Betfair response: Filled £60 at 1.86, £40 unmatched

State after partial fill:
Hedged: £60 at 1.86
Unhedged: £40 at limit 1.88 (still in market)

After 10 seconds (in-play time_in_force):
Remaining £40 still unmatched

Decision tree:
1. Current best price on Betfair: 1.91
2. 1.91 > limit price 1.88 → cannot fill at current prices
3. Is £40 worth re-pricing? £40 > £5 threshold → YES
4. New limit price: 1.91 + 5 ticks = 1.96 (re-price 1 allows 5 ticks from the current market: MORE slippage for the remainder)
5. Place new order: BACK MI £40 at 1.96
6. If this also partially fills or expires, repeat up to max_reprice_attempts (3)
7. After 3 re-price attempts, accept the unhedged remainder as platform risk

Re-pricing rules:

| Attempt | Slippage Allowed | Time in Force | Rationale |
|---------|------------------|---------------|-----------|
| Initial | 3 ticks | 30s pre-match / 10s in-play | Conservative first attempt |
| Re-price 1 | 5 ticks from current market | 20s pre-match / 5s in-play | More aggressive, shorter wait |
| Re-price 2 | 8 ticks from current market | 10s pre-match / 3s in-play | Even more aggressive |
| Re-price 3 (final) | Market order equivalent (1000 ticks) | 5s | Last resort -- get filled at any price |
| After all attempts | N/A | N/A | Accept as unhedged platform risk |
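The re-pricing rules above read naturally as a lookup table. A minimal sketch with hypothetical names: the final attempt's "1000 ticks" is carried over as-is to stand for "any price", and `null` means the remainder is accepted as unhedged platform risk.

```typescript
type Phase = "PRE_MATCH" | "IN_PLAY";

interface RepriceStep {
  slippageTicks: number;
  fromCurrentMarket: boolean; // false: measured from the punter's target price
  timeInForceSec: Record<Phase, number>;
}

const SCHEDULE: RepriceStep[] = [
  { slippageTicks: 3, fromCurrentMarket: false, timeInForceSec: { PRE_MATCH: 30, IN_PLAY: 10 } },
  { slippageTicks: 5, fromCurrentMarket: true, timeInForceSec: { PRE_MATCH: 20, IN_PLAY: 5 } },
  { slippageTicks: 8, fromCurrentMarket: true, timeInForceSec: { PRE_MATCH: 10, IN_PLAY: 3 } },
  { slippageTicks: 1000, fromCurrentMarket: true, timeInForceSec: { PRE_MATCH: 5, IN_PLAY: 5 } },
];

// Attempt 0 is the initial order; attempts 1-3 are re-prices; anything later is null.
function stepFor(attempt: number): RepriceStep | null {
  return attempt >= 0 && attempt < SCHEDULE.length ? SCHEDULE[attempt] : null;
}

// Worst-case time (seconds) a hedge can spend working before it is left unhedged.
function maxWorkingTime(phase: Phase): number {
  return SCHEDULE.reduce((total, step) => total + step.timeInForceSec[phase], 0);
}
```

The worst-case totals, 65 seconds pre-match and 23 seconds in-play, both fit inside the stale thresholds defined under Stale Hedge Cleanup (5 minutes and 60 seconds). As a result, the cleanup job should only catch orders whose re-pricing loop has stalled.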

Execution Quality Reporting

Every hedge order produces an execution quality record:

| Field | Description | Example |
|-------|-------------|---------|
| hedge_order_id | Unique identifier | hedge_x1y2z3 |
| bet_id | The originating bet | bet_a1b2c3 |
| target_price | Price the punter received | 1.85 |
| achieved_price | Weighted average fill price | 1.87 |
| slippage | achieved_price - target_price | 0.02 |
| fill_rate | Percentage of order filled | 85% |
| fill_time_ms | Time from order placement to full fill | 4,200ms |
| reprice_count | How many times the order was re-priced | 1 |
| unhedged_amount | Amount left unhedged | £6.25 |
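A minimal sketch that derives the price and fill fields of an execution-quality record from a list of fills. It uses the definitions in the table: achieved_price is the stake-weighted average fill price, and slippage = achieved_price - target_price. The `Fill` and `ExecutionQuality` types are hypothetical. The sample fills are from the MI walkthrough later in this section.

```typescript
interface Fill {
  size: number;  // GBP matched
  price: number; // decimal odds
}

interface ExecutionQuality {
  achieved_price: number;  // NaN when nothing was matched
  slippage: number;
  fill_rate: number;       // 0..1
  unhedged_amount: number; // GBP
}

function executionQuality(orderSize: number, targetPrice: number, fills: Fill[]): ExecutionQuality {
  const matched = fills.reduce((sum, f) => sum + f.size, 0);
  if (matched > orderSize) throw new Error("over-filled");
  // Weight each fill's price by the amount matched at that price.
  const achieved =
    matched === 0 ? NaN : fills.reduce((sum, f) => sum + f.size * f.price, 0) / matched;
  return {
    achieved_price: achieved,
    slippage: achieved - targetPrice,
    fill_rate: matched / orderSize,
    unhedged_amount: orderSize - matched,
  };
}

const walkthrough = executionQuality(4700, 1.85, [
  { size: 1200, price: 1.86 },
  { size: 800, price: 1.88 },
  { size: 2000, price: 1.90 },
  { size: 700, price: 1.92 },
]);
// walkthrough.achieved_price is 8880 / 4700, about 1.889; slippage about 0.039.
```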

Daily execution quality report:

HEDGE EXECUTION QUALITY - March 15, 2026
==========================================

Total hedge orders: 342
Fully filled: 287 (83.9%)
Partially filled: 41 (12.0%)
Unfilled (unhedged): 14 (4.1%)

Average slippage: 0.024 (2.4 ticks)
Worst slippage: 0.08 (8 ticks) -- KKR vs PBKS in-play
Best execution: -0.01 (better than target, market moved in our favor)

Total intended hedge: £12,400
Total actually hedged: £11,650 (94.0%)
Total unhedged: £750 (6.0%)

Slippage cost (vs perfect execution): £48.20

Unhedged Exposure Tracker

The unhedged exposure tracker maintains a real-time view of exposure that SHOULD be hedged but is NOT hedged, separate from deliberately retained risk.

UNHEDGED EXPOSURE DASHBOARD
==============================

Total unhedged: ₹2,34,000

By reason:
Betfair no liquidity: ₹1,20,000 (3 orders)
Betfair API timeout: ₹45,000 (1 order, retrying)
Slippage exceeded: ₹69,000 (2 orders, re-pricing)

By event:
MI vs CSK (Live): ₹1,65,000 ⚠ (largest single event)
RCB vs DC (Pre): ₹42,000
KKR vs SRH (Pre): ₹27,000

Aging:
< 1 minute: ₹45,000 (still in progress)
1-5 minutes: ₹69,000 (re-pricing)
5-30 minutes: ₹1,20,000 ⚠ (no liquidity, monitor)
> 30 minutes: ₹0

Alert threshold: ₹10,00,000 (₹10 lakh)
Current status: 🟢 Well below threshold
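The aging breakdown is a bucketing of open unhedged amounts by age. A minimal sketch with a hypothetical `UnhedgedItem` shape. Bucket edges follow the dashboard (under 1 minute, 1-5, 5-30, 30 or more). Assigning each boundary to the older bucket is an assumption.

```typescript
interface UnhedgedItem {
  amountInr: number;
  createdAtMs: number;
}

type AgeBucket = "<1m" | "1-5m" | "5-30m" | ">30m";

function ageBucket(ageMs: number): AgeBucket {
  const minutes = ageMs / 60_000;
  if (minutes < 1) return "<1m";
  if (minutes < 5) return "1-5m";
  if (minutes < 30) return "5-30m";
  return ">30m";
}

// Sums open unhedged amounts into the dashboard's aging buckets.
function agingReport(items: UnhedgedItem[], nowMs: number): Record<AgeBucket, number> {
  const report: Record<AgeBucket, number> = { "<1m": 0, "1-5m": 0, "5-30m": 0, ">30m": 0 };
  for (const item of items) {
    report[ageBucket(nowMs - item.createdAtMs)] += item.amountInr;
  }
  return report;
}
```

Feeding in the three amounts from the dashboard (₹45,000 at 30 seconds, ₹69,000 at 3 minutes, ₹1,20,000 at 12 minutes) reproduces its aging section.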

Stale Hedge Cleanup Process

Hedge orders that are older than a configurable threshold without being filled are considered stale. The cleanup process runs every 60 seconds:

STALE HEDGE CLEANUP (runs every 60 seconds)
=============================================

1. Find all hedge orders WHERE:
status IN ('PENDING', 'PARTIALLY_FILLED')
AND created_at < NOW() - stale_threshold

Stale thresholds:
Pre-match events: 5 minutes
In-play events: 60 seconds
Settled events: Immediate (hedge is pointless)

2. For each stale order:
a. Cancel the order on Betfair (if still open)
b. Record the partial fill amount (if any)
c. Mark order as STALE_CANCELLED
d. Move the unhedged amount to the unhedged exposure tracker
e. Fire alert if total unhedged exceeds threshold

3. For orders on settled events:
a. Cancel immediately
b. The platform already bears the outcome as retained risk
c. No further action needed
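Step 1's selection can be sketched as a pure filter. The `HedgeOrder` shape, the `eventSettled` flag, and the function names are hypothetical. The thresholds are the ones listed above.

```typescript
type OrderStatus = "PENDING" | "PARTIALLY_FILLED" | "FILLED" | "STALE_CANCELLED";

interface HedgeOrder {
  id: string;
  status: OrderStatus;
  createdAtMs: number;
  inPlay: boolean;
  eventSettled: boolean;
}

const STALE_AFTER_MS = { preMatch: 5 * 60_000, inPlay: 60_000 };

function isStale(order: HedgeOrder, nowMs: number): boolean {
  // Only working orders can go stale.
  if (order.status !== "PENDING" && order.status !== "PARTIALLY_FILLED") return false;
  if (order.eventSettled) return true; // hedging a settled event is pointless
  const limit = order.inPlay ? STALE_AFTER_MS.inPlay : STALE_AFTER_MS.preMatch;
  return nowMs - order.createdAtMs > limit; // created_at < NOW() - stale_threshold
}

function staleOrders(orders: HedgeOrder[], nowMs: number): string[] {
  return orders.filter((o) => isStale(o, nowMs)).map((o) => o.id);
}
```

Keeping the selection pure makes the 60-second job easy to test; the Betfair cancellation and tracker updates in step 2 then run only over the returned ids.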

Queue Management for Hedge Orders During High Volume

During peak betting (IPL finals, 167 bets/sec), the hedge queue can receive roughly 25 orders per second, assuming ~30% of bets leave a portion at the platform and ~50% of those portions are hedge-targeted (167 × 0.30 × 0.50 ≈ 25).

| Queue Parameter | Value | Rationale |
|-----------------|-------|-----------|
| Queue technology | Redis Stream | Persistent, supports consumer groups, at-least-once delivery |
| Consumer count | 4 concurrent consumers | Betfair API allows 5 req/sec per app key; 4 consumers with rate limiting |
| Rate limit | 5 orders per second to Betfair | Betfair API throttle limit |
| Priority | In-play hedges prioritized over pre-match | In-play prices change faster; pre-match can wait |
| Batching | Aggregate multiple small hedges on the same selection into one order | ₹800 + ₹600 + ₹400 on the same MI back → a single order of about £17 (₹1,800) |
| Max queue depth | 200 orders | If exceeded, temporarily increase max_slippage and use more aggressive pricing |
| Deduplication | By bet_id + event_id + selection | Prevent double-hedging from retry logic |
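A minimal sketch of the batching, deduplication, and priority rules from the table, applied to one drain of the queue. The `HedgeRequest` and `BatchedOrder` shapes and the grouping key (event + selection + side) are assumptions. The rupee amounts are the batching example from the table.

```typescript
type Side = "BACK" | "LAY";

interface HedgeRequest {
  bet_id: string;
  event_id: string;
  selection: string;
  side: Side;
  amountInr: number;
  inPlay: boolean;
}

interface BatchedOrder {
  event_id: string;
  selection: string;
  side: Side;
  amountInr: number;
  inPlay: boolean;
  betIds: string[];
}

function batch(requests: HedgeRequest[]): BatchedOrder[] {
  const seen = new Set<string>();
  const groups = new Map<string, BatchedOrder>();
  for (const r of requests) {
    const dedupKey = `${r.bet_id}|${r.event_id}|${r.selection}`;
    if (seen.has(dedupKey)) continue; // a retry re-delivered this request
    seen.add(dedupKey);
    const groupKey = `${r.event_id}|${r.selection}|${r.side}`;
    const group = groups.get(groupKey);
    if (group) {
      group.amountInr += r.amountInr;
      group.betIds.push(r.bet_id);
      group.inPlay = group.inPlay || r.inPlay;
    } else {
      groups.set(groupKey, {
        event_id: r.event_id, selection: r.selection, side: r.side,
        amountInr: r.amountInr, inPlay: r.inPlay, betIds: [r.bet_id],
      });
    }
  }
  // In-play orders first; Array.prototype.sort is stable, so arrival order is kept within each class.
  return Array.from(groups.values()).sort((a, b) => Number(b.inPlay) - Number(a.inPlay));
}
```

Batching is also what makes the 5 orders/sec rate limit workable at peak: many small hedges on a hot selection collapse into one order per drain.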

Failover When Betfair Is Slow or Down

Health check: The hedge engine pings Betfair every 5 seconds with a lightweight listMarketBook call. Three consecutive failures (15 seconds) move the status to DOWN. From DOWN, a single success moves it to DEGRADED, and three consecutive successes return it to HEALTHY.
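The health-check transitions can be sketched as a small state machine. This follows one reading of the rules: the consecutive-success count keeps running after DOWN → DEGRADED and reaches HEALTHY at three. How slow-but-successful responses affect the state is not specified, so this sketch ignores them. The class and constant names are hypothetical.

```typescript
type Health = "HEALTHY" | "DEGRADED" | "DOWN";

const FAILURES_TO_DOWN = 3;
const SUCCESSES_TO_HEALTHY = 3;

class BetfairHealth {
  state: Health = "HEALTHY";
  private failures = 0;
  private successes = 0;

  // Records one ping result (every 5 seconds) and returns the new state.
  record(pingOk: boolean): Health {
    if (pingOk) {
      this.failures = 0;
      this.successes += 1;
      if (this.successes >= SUCCESSES_TO_HEALTHY) this.state = "HEALTHY";
      else if (this.state === "DOWN") this.state = "DEGRADED";
    } else {
      this.successes = 0;
      this.failures += 1;
      if (this.failures >= FAILURES_TO_DOWN) this.state = "DOWN";
    }
    return this.state;
  }
}
```

A run of three failures followed by three successes walks HEALTHY → DOWN → DEGRADED → HEALTHY. Isolated failures while HEALTHY are absorbed without a state change.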

Walk Through: Platform Needs to Hedge 5 Lakh on MI at 1.85, Only 2 Lakh Liquidity at 1.90

Scenario: Multiple bets have cascaded through the hierarchy. The platform needs to hedge ₹5,00,000 (approximately £4,700) on MI to win. The target price is 1.85. The Betfair order book shows:

BETFAIR ORDER BOOK - MI to win
================================
Back side (we need to back):
1.86: £1,200 available
1.88: £800 available
1.90: £2,000 available ← combined: only £4,000 available to 1.90
1.92: £1,500 available
1.95: £3,000 available

Execution sequence:

Step 1: Place limit order BACK MI £4,700 at 1.88 (target 1.85 + 3 ticks slippage)
Result: Filled £1,200 at 1.86 + £800 at 1.88 = £2,000 filled, £2,700 unmatched
Status: PARTIALLY_FILLED (42.5%)

Step 2: Wait 10 seconds (in-play time_in_force)
No additional fills at 1.88

Step 3: Re-price attempt 1
Current best available: 1.90
New limit: 1.90 + 5 ticks = 1.95 (re-price 1 allows 5 ticks from the current market)
Place: BACK MI £2,700 at 1.95
Result: Filled £2,000 at 1.90 + £700 at 1.92 = £2,700 filled
Status: FULLY_FILLED

Total execution:
£1,200 at 1.86
£800 at 1.88
£2,000 at 1.90
£700 at 1.92

Weighted average price: (1200×1.86 + 800×1.88 + 2000×1.90 + 700×1.92) / 4700 = 1.889

Execution quality:
Target: 1.85
Achieved: 1.889
Slippage: 0.039 (3.9 ticks)
Cost of slippage: £4,700 × 0.039 = £183.30 (approximately ₹19,500)

This ₹19,500 slippage cost is deducted from the hedge effectiveness and
reported in the daily execution quality report.

44. Migration and Backfill Strategy (MEDIUM)

Mapping Existing Flat B-Book Configs to Forwarding Matrix Rules

The existing codebase has bbookConfigService.ts with a flat B-Book percentage per agent. The migration maps each flat config to a forwarding matrix with a single catch-all rule.

MIGRATION MAPPING
==================

Existing config (for Rajesh):
bbook_percentage: 60 (Rajesh keeps 60%)

Becomes forwarding matrix:
| Rule | market_type | sport_type | event_phase | source_type | liquidity_band | Forward % |
|------|-------------|------------|-------------|-------------|----------------|-----------|
| R1 | * | * | * | * | * | 40% |

Agent default forward: 40%

This is functionally identical to the existing behavior.

Migration steps:

  1. For each agent with a bbook_percentage, create a forwarding_matrix_rules row with all wildcards and forward_percentage = (100 - bbook_percentage)
  2. Set the agent's default_forward_percentage to the same value
  3. Mark the migration as MIGRATED_FROM_FLAT in the agent's config for auditability
  4. The agent's existing behavior is completely unchanged
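Steps 1-3 amount to a small pure function. A minimal sketch: the row shapes are hypothetical, apart from `bbook_percentage`, `forward_percentage`, `default_forward_percentage`, and the wildcard columns from the mapping table above. `*` stands for the wildcard as it does there.

```typescript
interface FlatConfig {
  agent_id: string;
  bbook_percentage: number; // share the agent keeps, 0-100
}

interface MatrixRule {
  agent_id: string;
  rule_id: string;
  market_type: "*";
  sport_type: "*";
  event_phase: "*";
  source_type: "*";
  liquidity_band: "*";
  forward_percentage: number;
}

interface MigratedAgent {
  rules: MatrixRule[];
  default_forward_percentage: number;
  migration_marker: "MIGRATED_FROM_FLAT";
}

function migrateFlatConfig(cfg: FlatConfig): MigratedAgent {
  const keep = cfg.bbook_percentage;
  if (!Number.isFinite(keep) || keep < 0 || keep > 100) {
    throw new Error(`invalid bbook_percentage for ${cfg.agent_id}`);
  }
  const forward = 100 - keep; // step 1: forward whatever the agent did not keep
  return {
    rules: [{
      agent_id: cfg.agent_id, rule_id: "R1",
      market_type: "*", sport_type: "*", event_phase: "*", source_type: "*", liquidity_band: "*",
      forward_percentage: forward,
    }],
    default_forward_percentage: forward,     // step 2
    migration_marker: "MIGRATED_FROM_FLAT",  // step 3
  };
}
```

For Rajesh, `migrateFlatConfig({ agent_id: "rajesh_mumbai", bbook_percentage: 60 })` yields a single all-wildcard rule forwarding 40%, matching the mapping table above.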

Handling Open Positions During Cutover

Open positions (bets placed before migration, not yet settled) must continue to work correctly under the new system.

Rule: Open positions are NOT re-routed. A bet that was placed under the old system retains its original routing. The new cascade engine only applies to new bets. This means:

  1. Before cutover: freeze the list of all open bet IDs
  2. During cutover: deploy the new code with the forwarding matrix enabled (behind feature flag)
  3. After cutover: new bets go through the cascade engine; open bets settle using the positions that were created under the old system
  4. Once all pre-cutover bets settle (typically within 1-3 days for cricket), the old routing logic can be removed

Parallel-Run Mode

Before switching any agent to the new cascade engine, run both engines in parallel and compare results:

PARALLEL-RUN MODE
==================

1. A bet arrives for an agent with parallel_run_mode = true

2. EXECUTE on OLD engine:
- Route using flat bbook_percentage
- CREATE real positions (this is what actually runs)
- Record the routing decision as old_routing

3. EXECUTE on NEW engine (shadow mode):
- Route using forwarding matrix + cascade
- DO NOT create positions (shadow only)
- Record the routing decision as new_routing

4. COMPARE:
- Did the new engine produce the same retained amount for this agent? (for migrated flat configs, it should)
- Did the cascade produce valid routing for upline agents?
- Did any limit checks differ?
- Record comparison result

5. REPORT:
- Daily comparison report: X% of bets had identical routing, Y% differed
- For differing bets, show why (new limits kicked in, matrix rule difference, etc.)
- When 100% agreement for 3 consecutive days: agent is ready for cutover
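Steps 4 and 5 can be sketched as a comparison over shadow results. A minimal sketch with hypothetical types: each bet contributes one comparison of the agent's retained amount, and readiness requires three consecutive days of 100% agreement, as stated above. Treating a day with no bets as not counting toward readiness is an assumption.

```typescript
interface RoutingComparison {
  bet_id: string;
  oldRetained: number; // paisa retained by the agent under the flat engine
  newRetained: number; // paisa retained under the shadow cascade
}

interface DailyReport {
  day: string; // e.g. "2026-03-15"
  total: number;
  identical: number;
  differingBetIds: string[];
}

function dailyReport(day: string, comparisons: RoutingComparison[]): DailyReport {
  const differing = comparisons
    .filter((c) => c.oldRetained !== c.newRetained)
    .map((c) => c.bet_id);
  return { day, total: comparisons.length, identical: comparisons.length - differing.length, differingBetIds: differing };
}

function agreementPercent(r: DailyReport): number {
  return r.total === 0 ? 0 : (100 * r.identical) / r.total;
}

// Reports must be in date order; ready when each of the last `requiredDays` agreed 100%.
function readyForCutover(reports: DailyReport[], requiredDays = 3): boolean {
  if (reports.length < requiredDays) return false;
  return reports.slice(-requiredDays).every((r) => r.total > 0 && r.identical === r.total);
}
```

The `differingBetIds` list is what feeds the "show why" part of the daily report.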

Per-Agent Rollback Plan

Each agent can be individually rolled back from the new cascade engine to the old flat routing:

  1. Disable feature flag bbook.cascading_routing.enabled for the specific agent
  2. New bets immediately revert to flat bbook_percentage routing
  3. Positions created by the cascade engine remain valid and settle normally
  4. The agent's forwarding matrix remains in the database (not deleted) for future re-enablement

Rollback does NOT require:

  • Database migration reversal
  • Deployment of old code
  • Reprocessing of any existing bets

Data Migration for Historical Positions and Settlements

Historical positions and settlements from the old system must be migrated to the new schema so that reporting and reconciliation work across the cutover boundary.

| Old System Data | Migration Target | Mapping |
|-----------------|------------------|---------|
| Old position (flat) | positions table with cascade_level = 1 | One position becomes one L1 position + one forwarded position at platform level |
| Old settlement | settlements table with migration_source = 'v1' | Direct mapping, no restructuring needed |
| Old agent config | forwarding_matrix_rules + agent_limits | As described in mapping section above |
| Old bet record | bets table with routing_engine = 'v1' | Direct copy with engine version flag |

Migration is non-destructive: Old tables are renamed with _v1 suffix but NOT dropped until 90 days after successful cutover with no issues.


45. Support Tooling for Dispute Resolution (MEDIUM)

Bet Lookup

The support dashboard provides multiple paths to find a bet:

| Lookup Method | Use Case | Query |
|---------------|----------|-------|
| By bet ID | "Show me bet XYZ" | Direct primary key lookup |
| By user + time range | "Show me Amit's bets last night" | user_id + created_at range |
| By agent + time range | "Show me all bets under Rajesh today" | agent_id + created_at range |
| By event | "Show me all bets on MI vs CSK" | event_id lookup |
| By amount | "Show me all bets over ₹1 lakh today" | stake > threshold + created_at range |
| By status | "Show me all unsettled bets from yesterday" | status=OPEN + created_at range |

Audit Trail Visualization

For each bet, the support dashboard renders the cascade as a visual flow:

AUDIT TRAIL VISUALIZATION - Bet bet_a1b2c3d4
==============================================

Amit places ₹50,000 on MI at 1.85

┌──────────────────────────────────────────────────────┐
│ AMIT (Punter) │
│ Stake: ₹50,000 → Reduced to ₹50,000 (no reduction) │
│ Win cap check: ₹42,500 < ₹50,000 limit ✓ │
└──────────────────────┬───────────────────────────────┘
                       │ ₹50,000
                       ▼
┌──────────────────────────────────────────────────────┐
│ RAJESH (Level 1) │
│ Matrix rule: R3 (MATCH_ODDS + CRICKET + PRE_MATCH) │
│ Forward: 40% → Retains: ₹30,000 │
│ Limit check: Cricket ₹12.3L/₹50L (24.6%) ✓ │
│ Limit check: Match ₹1.5L/₹5L (30%) ✓ │
│ Limit check: Night ₹3.3L/₹10L (33%) ✓ │
│ Overflow: ₹0 │
│ RETAINED: ₹30,000 (₹25,500 liability) │
└──────────────────────┬───────────────────────────────┘
│ ₹20,000

┌──────────────────────────────────────────────────────┐
│ VIKRAM (Level 2) │
│ Matrix rule: V2 (CRICKET + PRE_MATCH) │
│ Forward: 40% → Retains: ₹12,000 │
│ Source type: NORMAL (own classification) │
│ Limit checks: All ✓ │
│ RETAINED: ₹12,000 (₹10,200 liability) │
└──────────────────────┬───────────────────────────────┘
│ ₹8,000

┌──────────────────────────────────────────────────────┐
│ PLATFORM (Level 3) │
│ Retained: ₹4,000 │
│ Hedged: ₹4,000 → Betfair order hedge_x1y2z3 │
│ Hedge status: FILLED at 1.86 (slippage 0.01) │
└──────────────────────────────────────────────────────┘

Timeline:
21:47:12.001 Bet received
21:47:12.004 Win cap check passed
21:47:12.014 Matrix resolved (R3, specificity 3)
21:47:12.025 Rajesh limits checked
21:47:12.036 Vikram limits checked
21:47:12.048 Platform processed
21:47:12.063 All positions created
21:47:12.068 Audit record written
21:47:12.070 Response sent to Amit
Total: 69ms

21:47:12.085 Hedge order queued
21:47:13.200 Hedge order filled on Betfair
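The amounts in this visualisation follow from two rules. Each level forwards its forward percentage of what it receives and retains the rest. A BACK bet's liability is stake × (odds − 1). The following minimal sketch reproduces the bet_a1b2c3d4 numbers under these assumptions: amounts are integer paisa, forwarded amounts are rounded down to whole paisa, odds are scaled to four decimals as in DECIMAL(10,4), and no limit overflow occurs (as in this example). `runCascade` and its types are hypothetical. The platform is modelled as a final level whose "forwarded" share is the hedged amount, using a 50% split here:

```typescript
interface CascadeLevel {
  agent: string;
  forwardPercentage: number; // resolved forward %, e.g. 40 for 40.00
}

interface LevelResult {
  agent: string;
  received: number; // paisa arriving at this level
  retained: number; // paisa kept at this level
  forwarded: number; // paisa passed upline (hedged, for the platform)
  retainedLiability: number; // paisa this level pays if the BACK bet wins
}

const ODDS_SCALE = 10_000; // four decimal places, matching DECIMAL(10,4)

// Splits a stake level by level; ignores limits and overflow, which are zero in this example.
function runCascade(stakePaisa: number, odds: number, levels: CascadeLevel[]): LevelResult[] {
  const oddsUnits = Math.round(odds * ODDS_SCALE); // 1.85 -> 18500, avoids float drift
  const results: LevelResult[] = [];
  let received = stakePaisa;
  for (const level of levels) {
    const pctBasisPoints = Math.round(level.forwardPercentage * 100); // 40 -> 4000
    const forwarded = Math.floor((received * pctBasisPoints) / 10_000);
    const retained = received - forwarded; // any rounding remainder stays at this level
    results.push({
      agent: level.agent,
      received,
      retained,
      forwarded,
      retainedLiability: Math.floor((retained * (oddsUnits - ODDS_SCALE)) / ODDS_SCALE),
    });
    received = forwarded; // the upline receives exactly what was forwarded
  }
  return results;
}

const trail = runCascade(5_000_000, 1.85, [
  { agent: "RAJESH", forwardPercentage: 40 },
  { agent: "VIKRAM", forwardPercentage: 40 },
  { agent: "PLATFORM", forwardPercentage: 50 }, // "forwarded" = hedged on Betfair
]);
```

The results match the boxes above. Rajesh retains ₹30,000 with ₹25,500 liability. Vikram retains ₹12,000 with ₹10,200 liability. The platform retains ₹4,000 and hedges ₹4,000.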

Re-Simulate Capability

The "re-simulate" button allows a support agent to replay a bet with the configuration state as it existed at the time of the bet:

  1. Load the forwarding matrix rules that were active at the bet's timestamp (using the versioned config)
  2. Load the exposure ledger state as it existed just before the bet (from the audit record)
  3. Run the cascade engine with these inputs
  4. Display the result alongside the actual result
  5. If they match: the system behaved correctly
  6. If they differ: flag the discrepancy for engineering investigation
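Steps 4 to 6 reduce to a field-by-field comparison: the positions the cascade engine produced at the time are checked against the ones it produces on replay. In this minimal sketch, `PositionView` is a hypothetical projection of the positions table keyed by cascade_level, and `compareResimulation` is an illustrative name:

```typescript
// Hypothetical projection of a positions row, enough to compare routing outcomes.
interface PositionView {
  cascadeLevel: number;
  agentId: string;
  retained: number; // paisa
  forwarded: number; // paisa
  forwardPercentageUsed: number;
}

type ResimulationOutcome = { match: true } | { match: false; differences: string[] };

const COMPARED_FIELDS = ["agentId", "retained", "forwarded", "forwardPercentageUsed"] as const;

function compareResimulation(actual: PositionView[], simulated: PositionView[]): ResimulationOutcome {
  const byLevel = (ps: PositionView[]) => new Map(ps.map((p) => [p.cascadeLevel, p] as const));
  const a = byLevel(actual);
  const s = byLevel(simulated);
  const levels = Array.from(
    new Set(actual.map((p) => p.cascadeLevel).concat(simulated.map((p) => p.cascadeLevel))),
  ).sort((x, y) => x - y);

  const differences: string[] = [];
  for (const level of levels) {
    const pa = a.get(level);
    const ps = s.get(level);
    if (pa === undefined || ps === undefined) {
      differences.push(`level ${level}: present only in ${pa ? "actual" : "simulated"}`);
      continue;
    }
    for (const field of COMPARED_FIELDS) {
      if (pa[field] !== ps[field]) {
        differences.push(`level ${level} ${field}: actual=${pa[field]} simulated=${ps[field]}`);
      }
    }
  }
  // No differences: the system behaved correctly. Otherwise the list goes to engineering.
  return differences.length === 0 ? { match: true } : { match: false, differences };
}
```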

Dispute Workflow

Stage | Description | Actions Available
OPEN | Dispute filed by agent or user | Assign to support agent, set priority, link to bet(s)
INVESTIGATING | Support agent reviewing | View audit trail, re-simulate bet, compare ledgers, add notes
PENDING_AGENT | Waiting for agent to provide information | Send request to agent, set response deadline
PENDING_ENGINEERING | Requires engineering investigation | Escalate to engineering, provide all context
RESOLVED_CORRECT | System was correct, dispute dismissed | Document finding, notify all parties
RESOLVED_CORRECTION | System was wrong, correction applied | Apply financial correction, update ledgers, notify all parties
RESOLVED_GOODWILL | System was correct, but goodwill credit given | Apply credit, document reason, notify agent
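The table lists the stages but not which moves between them are legal. The following minimal sketch models the workflow as a guarded state machine. The transition table is an assumption, not something this document specifies: for example, that both PENDING stages return to INVESTIGATING and that resolved stages are terminal:

```typescript
type DisputeStage =
  | "OPEN"
  | "INVESTIGATING"
  | "PENDING_AGENT"
  | "PENDING_ENGINEERING"
  | "RESOLVED_CORRECT"
  | "RESOLVED_CORRECTION"
  | "RESOLVED_GOODWILL";

const RESOLVED_STAGES: readonly DisputeStage[] = ["RESOLVED_CORRECT", "RESOLVED_CORRECTION", "RESOLVED_GOODWILL"];

// Assumed transition table: the document lists stages, not the legal moves between them.
const ALLOWED: Record<DisputeStage, readonly DisputeStage[]> = {
  OPEN: ["INVESTIGATING"],
  INVESTIGATING: ["PENDING_AGENT", "PENDING_ENGINEERING", ...RESOLVED_STAGES],
  PENDING_AGENT: ["INVESTIGATING"], // agent replied or the deadline passed
  PENDING_ENGINEERING: ["INVESTIGATING"], // engineering findings returned
  RESOLVED_CORRECT: [], // resolved stages are terminal
  RESOLVED_CORRECTION: [],
  RESOLVED_GOODWILL: [],
};

function advanceDispute(from: DisputeStage, to: DisputeStage): DisputeStage {
  if (!ALLOWED[from].includes(to)) {
    throw new Error(`illegal dispute transition ${from} -> ${to}`);
  }
  return to;
}

function isResolved(stage: DisputeStage): boolean {
  return RESOLVED_STAGES.includes(stage);
}
```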

46. Responsible Gambling Controls (MEDIUM)

Self-Exclusion Mechanism

A punter can self-exclude for a configurable duration (24 hours, 7 days, 30 days, 6 months, permanent). Self-exclusion is the FIRST check in the bet processing pipeline.

Self-Exclusion Parameter | Description
Duration options | 24h, 7d, 30d, 6m, permanent
Cooling-off period | Cannot reverse self-exclusion for the first 24 hours
Scope | All betting across all agents (cannot bet through any path)
Implementation | Redis flag checked before ANY processing (sub-millisecond check)
Admin override | Only permanent exclusions can be lifted by admin after 6 months, with verification
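The following minimal sketch implements the self-exclusion rules above, using an in-memory record in place of the Redis flag. `startExclusion`, `isExcluded`, and `canSelfReverse` are hypothetical names. Two interpretations are assumptions: "6m" is treated as six calendar months, and reversal is allowed once the 24-hour cooling-off has passed:

```typescript
type ExclusionDuration = "24h" | "7d" | "30d" | "6m" | "permanent";

interface SelfExclusion {
  startedAt: Date;
  until: Date | null; // null = permanent
}

const DAY_MS = 24 * 60 * 60 * 1000;
const COOLING_OFF_MS = DAY_MS; // no self-reversal during the first 24 hours

function startExclusion(now: Date, duration: ExclusionDuration): SelfExclusion {
  const t = now.getTime();
  switch (duration) {
    case "24h":
      return { startedAt: now, until: new Date(t + DAY_MS) };
    case "7d":
      return { startedAt: now, until: new Date(t + 7 * DAY_MS) };
    case "30d":
      return { startedAt: now, until: new Date(t + 30 * DAY_MS) };
    case "6m": {
      const until = new Date(t);
      until.setUTCMonth(until.getUTCMonth() + 6); // six calendar months (assumption)
      return { startedAt: now, until };
    }
    case "permanent":
      return { startedAt: now, until: null };
  }
}

// First check in the bet pipeline: an excluded punter is rejected before anything else runs.
function isExcluded(ex: SelfExclusion | undefined, now: Date): boolean {
  if (ex === undefined) return false;
  return ex.until === null || now.getTime() < ex.until.getTime();
}

// Time-limited exclusions can be lifted by the punter only after the cooling-off period;
// permanent exclusions go through the admin override instead.
function canSelfReverse(ex: SelfExclusion, now: Date): boolean {
  return ex.until !== null && now.getTime() - ex.startedAt.getTime() >= COOLING_OFF_MS;
}
```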

Deposit Limits

Limit Type | Description | Where in Pipeline
Daily deposit limit | Maximum deposit in 24 hours | Payment service (before funds reach betting wallet)
Weekly deposit limit | Maximum deposit in 7 days | Payment service
Monthly deposit limit | Maximum deposit in 30 days | Payment service

Deposit limits are NOT part of the bet processing pipeline. They are enforced at the payment layer. However, the B-Book system must be aware of them for display purposes (showing the user their remaining deposit capacity on the dashboard).

Session Time Limits

Feature | Description | Implementation
Session duration limit | Maximum continuous session time (configurable, default 4 hours) | WebSocket/session middleware sends warning at 80% of limit, auto-logs out at 100%
Mandatory break | Minimum break duration after session limit reached (default 15 minutes) | Session creation blocked for break_duration after limit-triggered logout
Activity tracker | Track time since last break, number of bets placed, amount wagered | In-memory per-session counter, persisted to DB every 5 minutes
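The following minimal sketch implements the session rules, assuming the middleware evaluates each session on a periodic heartbeat. `sessionState`, `canStartSession`, and `SessionPolicy` are hypothetical names. The defaults come from the table above and from users.session_time_limit_minutes:

```typescript
type SessionState = "OK" | "WARN" | "LOGOUT";

interface SessionPolicy {
  limitMinutes: number; // users.session_time_limit_minutes (default 240)
  breakMinutes: number; // mandatory break after a limit-triggered logout (default 15)
  warnPercent: number; // warn at this share of the limit (80)
}

const DEFAULT_POLICY: SessionPolicy = { limitMinutes: 240, breakMinutes: 15, warnPercent: 80 };
const MINUTE_MS = 60_000;

function sessionState(startedAt: Date, now: Date, p: SessionPolicy = DEFAULT_POLICY): SessionState {
  const elapsedMs = now.getTime() - startedAt.getTime();
  const limitMs = p.limitMinutes * MINUTE_MS;
  if (elapsedMs >= limitMs) return "LOGOUT";
  // Integer comparison instead of limitMs * 0.8 avoids floating-point surprises at the boundary.
  if (elapsedMs * 100 >= limitMs * p.warnPercent) return "WARN";
  return "OK";
}

// New sessions stay blocked until the mandatory break has elapsed.
function canStartSession(limitLogoutAt: Date | null, now: Date, p: SessionPolicy = DEFAULT_POLICY): boolean {
  if (limitLogoutAt === null) return true;
  return now.getTime() - limitLogoutAt.getTime() >= p.breakMinutes * MINUTE_MS;
}
```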

Reality Check Notifications

Reality check notifications are periodic messages that remind the punter of their activity during the session.

Trigger | Message Content | Delivery
Every 60 minutes of play | "You have been playing for 1 hour. Total bets: 23. Net result: -₹4,200." | In-app popup, requires acknowledgment to continue
After 10 consecutive losses | "You have had 10 consecutive losses. Consider taking a break." | In-app popup
After ₹50,000 total loss in session | "You have lost ₹50,000 this session. Your daily deposit limit is ₹1,00,000." | In-app popup with option to self-exclude
Approaching deposit limit | "You have deposited ₹80,000 of your ₹1,00,000 daily limit." | In-app notification
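The trigger column can be evaluated as a pure function of session activity. In the following minimal sketch, `SessionActivity` and `dueRealityChecks` are hypothetical. The 80% deposit threshold is inferred from the ₹80,000-of-₹1,00,000 example. Remembering which one-off popups were already shown is left out:

```typescript
interface SessionActivity {
  minutesPlayed: number;
  lastHourlyCheckAtMinute: number; // minute mark when the last hourly popup was shown
  consecutiveLosses: number;
  sessionLossPaisa: number; // net loss this session, as a positive number
  depositedTodayPaisa: number;
  dailyDepositLimitPaisa: number | null; // null = no limit set
}

type RealityCheck = "HOURLY" | "LOSS_STREAK" | "SESSION_LOSS" | "DEPOSIT_APPROACHING";

const LOSS_STREAK = 10;
const SESSION_LOSS_PAISA = 5_000_000; // ₹50,000
const DEPOSIT_WARN_PERCENT = 80; // assumption, from the ₹80,000 of ₹1,00,000 example

function dueRealityChecks(a: SessionActivity): RealityCheck[] {
  const due: RealityCheck[] = [];
  if (a.minutesPlayed - a.lastHourlyCheckAtMinute >= 60) due.push("HOURLY");
  if (a.consecutiveLosses === LOSS_STREAK) due.push("LOSS_STREAK"); // fires when the streak reaches 10
  if (a.sessionLossPaisa >= SESSION_LOSS_PAISA) due.push("SESSION_LOSS");
  if (
    a.dailyDepositLimitPaisa !== null &&
    a.depositedTodayPaisa * 100 >= a.dailyDepositLimitPaisa * DEPOSIT_WARN_PERCENT
  ) {
    due.push("DEPOSIT_APPROACHING");
  }
  return due;
}
```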

Where These Hooks Go in the Bet Flow Pipeline

Steps 0, 0.5, and 2 are the responsible gambling checkpoints. They add minimal latency (sub-millisecond for Redis flag checks, zero latency if no check is triggered) but ensure that gambling controls are enforced before any money flows.




Part III: Complete Implementation Architecture

The following sections provide the complete implementation specification for the entire Hannibal B-Book system. Every database table, every API endpoint, every pipeline step, every error case, and every deployment detail is documented. An LLM or developer reading this can build the entire system without asking a single question about design intent, data models, or processing logic.


47. Technology Stack (Confirmed)

Core Technologies

Technology | Version | Purpose | Why This Choice
Node.js | 20 LTS | Application runtime | Event-loop model handles high concurrency with low overhead. The team already has expertise. Non-blocking I/O is ideal for the many Redis and DB calls in the bet pipeline.
TypeScript | 5.x | Language | Type safety prevents entire categories of financial bugs (wrong types for money, missing fields). Domain types (Stake, Liability, ForwardPercentage) enforce correctness at compile time.
PostgreSQL | 16 | Primary database | ACID transactions for financial data. FOR UPDATE locking for limit enforcement. Partitioning for scaling. JSONB for flexible audit payloads.
Prisma | 5.x | ORM | Type-safe database access. Schema-as-code for migrations. Works with PostgreSQL partitioning through raw queries where needed.
Redis | 7.x | Cache, counters, queues, pub/sub | Sub-millisecond reads for exposure checks. Atomic INCRBY for sharded counters. Streams for hedge order queue. Pub/sub for cache invalidation.
Docker | Latest | Containerization | Consistent environments across development, staging, production. Docker Compose for local development.

Additional Technologies

Technology | Purpose | Why
BullMQ | Job queue for background tasks | Settlement processing, reconciliation jobs, audit tier migration, hedge retry. Built on Redis. Supports delayed jobs, retries, priority queues.
Prometheus + Grafana | Metrics and dashboards | Industry standard for monitoring. Prometheus scrapes application metrics. Grafana renders dashboards. AlertManager handles alert routing.
Pino | Structured logging | Fast JSON logger for Node.js. Structured logs are queryable. Low overhead even at high throughput.
Zod | Runtime validation | Validates all API inputs and configuration at runtime. Complements TypeScript compile-time types with runtime safety.
Socket.IO | WebSocket connections | Real-time dashboard updates to agents. Push notifications for alerts. Session management for responsible gambling.
node-cron | Scheduled jobs | Period rollovers, reconciliation scheduling, audit tier migration.
Helmet + cors | HTTP security | Standard security headers. CORS configuration for dashboard frontend.
prom-client | Prometheus metrics | Native Prometheus metrics collection for Node.js. Histograms, counters, gauges.

Not Included (and Why)

Technology | Why Not
Kafka | Overkill for current throughput (167 bets/sec). Redis Streams provide sufficient queue functionality with simpler operations. Reconsider at 1000+ bets/sec.
MongoDB | Financial data requires ACID transactions and relational integrity. PostgreSQL provides both.
GraphQL | The API consumers (dashboard frontend, mobile, WhatsApp bot) all have well-defined data needs. REST is simpler, faster, and sufficient.
Microservices (separate deployments per service) | For the first season, a modular monolith is simpler to deploy, debug, and operate. Services are separated in code (modules) but deployed as one application. Extract into microservices only when a specific module needs independent scaling.

48. System Architecture Overview

Complete System Diagram

Communication Patterns

From → To | Channel | Method | Pattern
Client → API | REST API | HTTP/JSON | Request-response, auth via JWT
Client → Dashboard | WebSocket | Socket.IO | Real-time push for exposure updates, alerts
API → Bet Processing | Function call | In-process | Synchronous (same monolith)
Bet Processing → Cascade Engine | Function call | In-process | Synchronous
Bet Processing → Hedge Queue | Redis Stream | Async publish | Fire-and-forget from bet pipeline
Hedge Worker → Betfair | HTTP | REST API | Rate-limited, with retry
Settlement → Bet Processing | BullMQ Job | Async | Settlement jobs queued when events settle
Config Change → All Instances | Redis Pub/Sub | Async broadcast | Cache invalidation messages
Reconciliation → Alert | BullMQ Job | Async | Discrepancy alerts queued for processing

Deployment Topology (Production)

PRODUCTION DEPLOYMENT
======================

Application Instances: 3 (behind load balancer)
- Each runs the full modular monolith
- Agent-affinity routing via consistent hash on agent_id header
- Each instance: 2 vCPU, 4 GB RAM

Background Workers: 2
- Instance 4: Settlement worker + Reconciliation worker
- Instance 5: Hedge worker + Audit migration worker + Sharp detection worker
- Each instance: 2 vCPU, 2 GB RAM

PostgreSQL:
- Primary: 4 vCPU, 16 GB RAM, 500 GB SSD
- Read Replica 1: 2 vCPU, 8 GB RAM (dashboard)
- Read Replica 2: 2 vCPU, 8 GB RAM (reporting)
- Audit DB: 2 vCPU, 4 GB RAM, 200 GB HDD (append-only)

Redis:
- Primary: 2 vCPU, 4 GB RAM
- Replica: 1 vCPU, 2 GB RAM (read-only)

Load Balancer: nginx or cloud ALB
Monitoring: Prometheus + Grafana (1 instance)
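Agent-affinity routing sends every bet for a given agent to the same application instance. A consistent-hash ring with virtual nodes is the usual way to get that affinity. With it, only the agents that lived on a departed instance (about a third of them, with three instances) move when one instance leaves. The following is a minimal sketch. `HashRing`, the MD5-derived 32-bit ring position, and 100 virtual nodes per instance are illustrative choices:

```typescript
import { createHash } from "node:crypto";

// 32-bit ring position from MD5; used for spread, not security.
function ringPosition(key: string): number {
  return parseInt(createHash("md5").update(key).digest("hex").slice(0, 8), 16);
}

class HashRing {
  private readonly points: { pos: number; node: string }[] = [];

  constructor(nodes: string[], virtualNodes = 100) {
    for (const node of nodes) {
      for (let v = 0; v < virtualNodes; v++) {
        this.points.push({ pos: ringPosition(`${node}#${v}`), node });
      }
    }
    this.points.sort((a, b) => a.pos - b.pos);
  }

  // Owner = first ring point at or after the key's position, wrapping to the start.
  nodeFor(agentId: string): string {
    const pos = ringPosition(agentId);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].pos < pos) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].node;
  }
}
```

If nginx is the load balancer, its upstream `hash $http_x_agent_id consistent;` directive provides ketama consistent hashing without application code. By default, nginx ignores request headers whose names contain underscores unless `underscores_in_headers on;` is set. A header literally named agent_id would therefore need that setting, or a name such as X-Agent-Id.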

49. Database Schema Design

Entity-Relationship Diagram

Table: agents

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY, DEFAULT gen_random_uuid() | Unique agent identifier
external_id | VARCHAR(100) | UNIQUE, NOT NULL | Human-readable agent ID (e.g., rajesh_mumbai)
name | VARCHAR(255) | NOT NULL | Agent display name
parent_agent_id | UUID | REFERENCES agents(id), NULLABLE | Upline agent. NULL for top-level agents and platform
level | INTEGER | NOT NULL | Hierarchy depth. 0 = platform, 1 = master agent, 2 = sub-agent, etc.
status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, SUSPENDED, TRANSFERRING, DEACTIVATED
timezone | VARCHAR(50) | NOT NULL, DEFAULT 'Asia/Kolkata' | Agent's local timezone (IANA format)
default_forward_percentage | DECIMAL(5,2) | NOT NULL, DEFAULT 50.00 | Fallback forward % when no matrix rule matches
night_period_start | TIME | NULLABLE | Night period start in local time
night_period_end | TIME | NULLABLE | Night period end in local time
weekly_period_start_day | INTEGER | NOT NULL, DEFAULT 1 | 1 = Monday, 7 = Sunday
tier | VARCHAR(20) | NOT NULL, DEFAULT 'TIER_1' | TIER_1, TIER_2, TIER_3 (UX experience tier)
is_platform | BOOLEAN | NOT NULL, DEFAULT false | True for the single platform agent
platform_retain_percentage | DECIMAL(5,2) | NULLABLE | Only for platform: % to retain vs hedge
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_agents_parent on (parent_agent_id)
  • idx_agents_status on (status)
  • idx_agents_external_id on (external_id) -- UNIQUE

Table: agent_limits

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
agent_id | UUID | REFERENCES agents(id), NOT NULL |
limit_type | VARCHAR(30) | NOT NULL | SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD
sport_type | VARCHAR(30) | NULLABLE | CRICKET, FOOTBALL, TENNIS, KABADDI, etc. NULL for period limits that apply to all sports
event_id | VARCHAR(100) | NULLABLE | Only for MARKET type limits. The specific event/market ID
limit_amount | BIGINT | NOT NULL | Limit in paisa (1 lakh = 10,000,000 paisa)
is_active | BOOLEAN | NOT NULL, DEFAULT true |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_agent_limits_agent_type on (agent_id, limit_type)
  • idx_agent_limits_agent_sport on (agent_id, sport_type)
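  • UNIQUE NULLS NOT DISTINCT on (agent_id, limit_type, sport_type, event_id) to prevent duplicate limits. NULLS NOT DISTINCT (PostgreSQL 15+) is needed because sport_type and event_id are NULL on many rows. A plain UNIQUE constraint treats NULLs as distinct, so it would not block a second identical NIGHT_PERIOD limit.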

Table: forwarding_matrix_rules

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
agent_id | UUID | REFERENCES agents(id), NOT NULL |
version | INTEGER | NOT NULL | Incremented on every change. Used for audit snapshot
market_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE, or *
sport_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | CRICKET, FOOTBALL, TENNIS, KABADDI, or *
event_phase | VARCHAR(30) | NOT NULL, DEFAULT '*' | PRE_MATCH, IN_PLAY, APPROACHING_START, or *
source_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | NORMAL, SHARP, VIP, NEW_ACCOUNT, or *
liquidity_band | VARCHAR(30) | NOT NULL, DEFAULT '*' | HIGH, MEDIUM, LOW, NONE, or *
forward_percentage | DECIMAL(5,2) | NOT NULL, CHECK (forward_percentage BETWEEN 0 AND 100) | Percentage to forward to upline
specificity | INTEGER | NOT NULL, GENERATED ALWAYS AS (computed) STORED | Count of non-wildcard dimensions (0-5). Stored for fast sorting
priority | INTEGER | NOT NULL, DEFAULT 0 | Tie-breaker when specificity and forward_percentage are equal
is_active | BOOLEAN | NOT NULL, DEFAULT true | Soft delete / disable
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Used for deterministic ordering tie-break
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_fmr_agent_version on (agent_id, version)
  • idx_fmr_agent_active on (agent_id, is_active) WHERE is_active = true
  • idx_fmr_lookup on (agent_id, market_type, sport_type, event_phase, source_type, liquidity_band) WHERE is_active = true

Note: The specificity column is computed as the count of dimensions that are NOT '*'. For example, a rule with market_type=FANCY, sport_type=CRICKET, event_phase=IN_PLAY, source_type=*, liquidity_band=* has specificity 3.
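Rule resolution considers the active rules whose non-wildcard dimensions all equal the bet's, and picks the one that sorts first. The following is a minimal sketch. The tie-break direction is an assumption: higher forward_percentage first, as the more conservative choice; then higher priority; then earliest created_at; then id, so the order is total. When nothing matches, the caller falls back to agents.default_forward_percentage:

```typescript
const DIMENSIONS = ["market_type", "sport_type", "event_phase", "source_type", "liquidity_band"] as const;
type Dimension = (typeof DIMENSIONS)[number];
type BetContext = Record<Dimension, string>;

interface MatrixRule {
  id: string;
  dims: Record<Dimension, string>; // "*" = wildcard
  forwardPercentage: number;
  priority: number;
  createdAt: Date;
}

// Count of non-wildcard dimensions, 0-5 (what the stored specificity column holds).
function specificity(rule: MatrixRule): number {
  return DIMENSIONS.filter((d) => rule.dims[d] !== "*").length;
}

function matches(rule: MatrixRule, bet: BetContext): boolean {
  return DIMENSIONS.every((d) => rule.dims[d] === "*" || rule.dims[d] === bet[d]);
}

function compareRules(a: MatrixRule, b: MatrixRule): number {
  return (
    specificity(b) - specificity(a) ||
    b.forwardPercentage - a.forwardPercentage || // assumed direction
    b.priority - a.priority ||
    a.createdAt.getTime() - b.createdAt.getTime() ||
    (a.id < b.id ? -1 : a.id > b.id ? 1 : 0)
  );
}

// Returns undefined when no rule matches; the caller then uses the agent default.
function resolveRule(rules: MatrixRule[], bet: BetContext): MatrixRule | undefined {
  return rules.filter((r) => matches(r, bet)).sort(compareRules)[0];
}
```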

Table: users

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
external_id | VARCHAR(100) | UNIQUE, NOT NULL | User ID from the main platform
agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent this user belongs to
name | VARCHAR(255) | NOT NULL |
per_click_win_limit | BIGINT | NOT NULL, DEFAULT 5000000 | In paisa. Default ₹50,000
aggregate_win_limit_daily | BIGINT | NOT NULL, DEFAULT 20000000 | In paisa. Default ₹2,00,000
min_stake | BIGINT | NOT NULL, DEFAULT 10000 | In paisa. Default ₹100
self_exclusion_until | TIMESTAMPTZ | NULLABLE | NULL if not excluded
session_time_limit_minutes | INTEGER | NOT NULL, DEFAULT 240 | Default 4 hours
deposit_limit_daily | BIGINT | NULLABLE | In paisa
deposit_limit_weekly | BIGINT | NULLABLE | In paisa
deposit_limit_monthly | BIGINT | NULLABLE | In paisa
status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, SUSPENDED, SELF_EXCLUDED
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_users_agent on (agent_id)
  • idx_users_external on (external_id) -- UNIQUE
  • idx_users_status on (status)

Table: user_overrides

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
user_id | UUID | REFERENCES users(id), NOT NULL |
agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent applying this override
forward_percentage | DECIMAL(5,2) | NOT NULL | Override forward % for this user at this agent
reason | TEXT | NOT NULL | Why the override was set (e.g., "known sharp user")
created_by | UUID | NOT NULL | Admin or agent who set the override
is_active | BOOLEAN | NOT NULL, DEFAULT true |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
expires_at | TIMESTAMPTZ | NULLABLE | Optional expiry

Indexes:

  • UNIQUE on (user_id, agent_id) WHERE is_active = true
  • idx_user_overrides_agent on (agent_id)

Table: user_classifications

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
user_id | UUID | REFERENCES users(id), NOT NULL |
agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent making this classification
classification | VARCHAR(30) | NOT NULL | NORMAL, SHARP, VIP, NEW_ACCOUNT
confidence_score | DECIMAL(5,4) | NULLABLE | 0.0000 to 1.0000 for ML-based classifications
reason | TEXT | NULLABLE | Why classified (manual note or detection signal)
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • UNIQUE on (user_id, agent_id)
  • idx_user_class_agent on (agent_id, classification)

Table: agent_trust_config

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
agent_id | UUID | REFERENCES agents(id), NOT NULL | The upline agent
sub_agent_id | UUID | REFERENCES agents(id), NOT NULL | The downstream agent
trust_downstream_flags | BOOLEAN | NOT NULL, DEFAULT false | Whether to trust the sub-agent's user classifications
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • UNIQUE on (agent_id, sub_agent_id)

Table: market_overrides

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
agent_id | UUID | REFERENCES agents(id), NOT NULL |
event_id | VARCHAR(100) | NOT NULL | The specific event/market
forward_percentage | DECIMAL(5,2) | NOT NULL | Override forward % for this event
reason | TEXT | NOT NULL |
created_by | UUID | NOT NULL |
is_active | BOOLEAN | NOT NULL, DEFAULT true |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
expires_at | TIMESTAMPTZ | NULLABLE |

Indexes:

  • UNIQUE on (agent_id, event_id) WHERE is_active = true

Table: events

Column | Type | Constraints | Description
id | VARCHAR(100) | PRIMARY KEY | External event ID from odds provider
sport_type | VARCHAR(30) | NOT NULL |
name | VARCHAR(500) | NOT NULL | "MI vs CSK, IPL 2026"
start_time | TIMESTAMPTZ | NOT NULL |
status | VARCHAR(20) | NOT NULL, DEFAULT 'UPCOMING' | UPCOMING, LIVE, SUSPENDED, SETTLED, VOID
result | JSONB | NULLABLE | Settlement result data
settled_at | TIMESTAMPTZ | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_events_sport_status on (sport_type, status)
  • idx_events_start_time on (start_time)

Table: markets

Column | Type | Constraints | Description
id | VARCHAR(100) | PRIMARY KEY | External market ID
event_id | VARCHAR(100) | REFERENCES events(id), NOT NULL |
market_type | VARCHAR(30) | NOT NULL | MATCH_ODDS, FANCY, BOOKMAKER, etc.
name | VARCHAR(255) | NOT NULL |
status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SUSPENDED, CLOSED, SETTLED, VOID
settled_at | TIMESTAMPTZ | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_markets_event on (event_id)
  • idx_markets_status on (status)

Table: bets (partitioned by month on created_at)

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY (id, created_at) | PostgreSQL requires every partition key column in the primary key of a partitioned table
user_id | UUID | NOT NULL | REFERENCES users(id) -- enforced at application level due to partitioning
agent_id | UUID | NOT NULL | The originating agent (Level 1)
event_id | VARCHAR(100) | NOT NULL |
market_id | VARCHAR(100) | NOT NULL |
selection | VARCHAR(255) | NOT NULL | What the punter bet on (e.g., "MI to win")
side | VARCHAR(10) | NOT NULL | BACK or LAY
requested_stake | BIGINT | NOT NULL | Original stake in paisa
accepted_stake | BIGINT | NOT NULL | After stake reduction, in paisa
odds | DECIMAL(10,4) | NOT NULL | Decimal odds
potential_win | BIGINT | NOT NULL | In paisa
liability | BIGINT | NOT NULL | In paisa
stake_reduction_reason | VARCHAR(50) | NULLABLE | PER_CLICK_LIMIT, AGGREGATE_LIMIT, null
market_type | VARCHAR(30) | NOT NULL |
sport_type | VARCHAR(30) | NOT NULL |
event_phase | VARCHAR(30) | NOT NULL | PRE_MATCH, IN_PLAY
source_type | VARCHAR(30) | NOT NULL | NORMAL, SHARP, VIP, etc. (as resolved at originating agent)
liquidity_band | VARCHAR(30) | NOT NULL | HIGH, MEDIUM, LOW, NONE
routing_engine | VARCHAR(10) | NOT NULL, DEFAULT 'v2' | v1 (legacy flat) or v2 (cascade)
routing_status | VARCHAR(20) | NOT NULL, DEFAULT 'COMPLETE' | COMPLETE, PARTIAL, FAILED
matrix_version_snapshot | INTEGER | NOT NULL | The matrix version used at time of routing
status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SETTLED, VOID, CANCELLED
settled_at | TIMESTAMPTZ | NULLABLE |
period_context | VARCHAR(20) | NOT NULL | NIGHT, DAY, or specific period identifier
total_processing_time_ms | INTEGER | NOT NULL | End-to-end latency
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key

Indexes:

  • idx_bets_user_time on (user_id, created_at DESC)
  • idx_bets_agent_time on (agent_id, created_at DESC)
  • idx_bets_event on (event_id)
  • idx_bets_market on (market_id)
  • idx_bets_status on (status) WHERE status = 'OPEN'
  • idx_bets_routing_status on (routing_status) WHERE routing_status != 'COMPLETE'

Table: positions (partitioned by month on created_at)

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY (id, created_at) | PostgreSQL requires every partition key column in the primary key of a partitioned table
bet_id | UUID | NOT NULL | The originating bet
agent_id | UUID | NOT NULL | The agent holding this position
cascade_level | INTEGER | NOT NULL | 1 = first agent, 2 = upline, etc.
position_type | VARCHAR(20) | NOT NULL | RETAINED or FORWARDED
stake | BIGINT | NOT NULL | Stake portion in paisa
liability | BIGINT | NOT NULL | Liability portion in paisa
potential_win | BIGINT | NOT NULL | Potential win portion in paisa
forward_percentage_used | DECIMAL(5,2) | NOT NULL | The forward % that produced this split
forward_source | VARCHAR(30) | NOT NULL | USER_OVERRIDE, MARKET_OVERRIDE, MATRIX_RULE, AGENT_DEFAULT
matrix_rule_id | UUID | NULLABLE | The specific matrix rule that matched (if MATRIX_RULE)
overflow_amount | BIGINT | NOT NULL, DEFAULT 0 | How much of this was overflow from limit breach
event_id | VARCHAR(100) | NOT NULL | Denormalized for fast queries
market_id | VARCHAR(100) | NOT NULL | Denormalized
sport_type | VARCHAR(30) | NOT NULL | Denormalized
selection | VARCHAR(255) | NOT NULL | Denormalized
side | VARCHAR(10) | NOT NULL | Denormalized
odds | DECIMAL(10,4) | NOT NULL | Denormalized
status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SETTLED, VOID
settlement_id | UUID | NULLABLE | Link to settlement record
settled_amount | BIGINT | NULLABLE | Actual P&L in paisa (positive = agent profit, negative = agent loss)
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key

Indexes:

  • idx_positions_bet on (bet_id)
  • idx_positions_agent_status on (agent_id, status) WHERE status = 'OPEN'
  • idx_positions_agent_event on (agent_id, event_id) WHERE status = 'OPEN'
  • idx_positions_agent_sport on (agent_id, sport_type) WHERE status = 'OPEN'
  • idx_positions_event_status on (event_id, status)
  • idx_positions_market_status on (market_id, status)

Table: exposure_ledger

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
agent_id | UUID | NOT NULL |
scope_type | VARCHAR(30) | NOT NULL | SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD
scope_key | VARCHAR(200) | NOT NULL | e.g., "cricket", "mi_vs_csk_2026_03_15", "night_2026_03_15", "week_2026_11"
shard_index | INTEGER | NOT NULL, DEFAULT 0 | 0 to N-1 for sharded counters
retained_open_liability | BIGINT | NOT NULL, DEFAULT 0 | In paisa
forwarded_open_liability | BIGINT | NOT NULL, DEFAULT 0 | In paisa
open_potential_win | BIGINT | NOT NULL, DEFAULT 0 | In paisa
no_new_risk_active | BOOLEAN | NOT NULL, DEFAULT false | Whether this scope is in NO_NEW_RISK
no_new_risk_triggered_at | TIMESTAMPTZ | NULLABLE | When NO_NEW_RISK was activated
last_updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • UNIQUE on (agent_id, scope_type, scope_key, shard_index)
  • idx_exposure_agent_scope on (agent_id, scope_type, scope_key)
  • idx_exposure_no_new_risk on (no_new_risk_active) WHERE no_new_risk_active = true

Table: settlements (partitioned by month on created_at)

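Column | Type | Constraints | Description
id | UUID | PRIMARY KEY (id, created_at) | PostgreSQL requires every partition key column in the primary key of a partitioned table
event_id | VARCHAR(100) | NOT NULL |
market_id | VARCHAR(100) | NOT NULL |
position_id | UUID | NOT NULL | The position being settled
agent_id | UUID | NOT NULL |
bet_id | UUID | NOT NULL |
settlement_type | VARCHAR(20) | NOT NULL | WIN, LOSS, VOID, PUSH
stake | BIGINT | NOT NULL | The position's stake
payout | BIGINT | NOT NULL | Amount paid to/from agent. Positive = agent pays punter. Negative = agent receives.
profit_loss | BIGINT | NOT NULL | Agent P&L. Positive = profit, negative = loss
idempotency_key | VARCHAR(200) | NOT NULL, must be unique (see note below) | Prevents double settlement: {position_id}_{event_result_hash}
status | VARCHAR(20) | NOT NULL, DEFAULT 'COMPLETED' | COMPLETED, REVERSED, RE_SETTLED
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key

Indexes:

  • idx_settlements_event on (event_id)
  • idx_settlements_agent_time on (agent_id, created_at DESC)
  • idx_settlements_bet on (bet_id)
  • idx_settlements_position on (position_id)
  • idx_settlements_idempotency on (idempotency_key) -- uniqueness enforced as described in the note below

Note: PostgreSQL can only enforce a UNIQUE constraint or unique index on a partitioned table if it includes all partition key columns (here, created_at). The same idempotency key must be rejected no matter which month it arrives in, so this uniqueness cannot be declared on the partitioned settlements table itself. There are two options. One is to keep settlements unpartitioned. The other is to insert each idempotency_key into a small non-partitioned table with a primary key on that column, inside the same transaction as the settlement row.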
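The following minimal sketch shows how the {position_id}_{event_result_hash} key blocks a repeated settlement while still letting a corrected result through. The `MarketResult` shape, the field-joining scheme, and the in-memory Set standing in for the uniqueness check are assumptions. In the database, the key insert and the settlement row would commit in one transaction.

```typescript
import { createHash } from "node:crypto";

// Hypothetical normalised result; a fixed field order makes the hash stable.
interface MarketResult {
  marketId: string;
  outcome: "WINNER_DECLARED" | "VOID";
  winningSelection: string | null;
}

function eventResultHash(r: MarketResult): string {
  const canonical = [r.marketId, r.outcome, r.winningSelection ?? ""].join("\u001f"); // unit separator
  return createHash("sha256").update(canonical).digest("hex");
}

// 36-char UUID + "_" + 64 hex chars = 101 chars, within VARCHAR(200).
function idempotencyKey(positionId: string, r: MarketResult): string {
  return `${positionId}_${eventResultHash(r)}`;
}

const appliedKeys = new Set<string>(); // stands in for the uniqueness check on idempotency_key

// Returns false (and applies nothing) if this exact settlement was already applied.
function settleOnce(positionId: string, r: MarketResult, apply: () => void): boolean {
  const key = idempotencyKey(positionId, r);
  if (appliedKeys.has(key)) return false;
  apply();
  appliedKeys.add(key);
  return true;
}
```

Because a corrected result hashes differently, a re-settlement gets a new key. Reversing the earlier settlement (status REVERSED) still has to be handled explicitly.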

Table: audit_trail (partitioned by month on created_at, separate schema)

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY (id, created_at) | PostgreSQL requires every partition key column in the primary key of a partitioned table
bet_id | UUID | NOT NULL |
record_type | VARCHAR(30) | NOT NULL | BET_PLACED, BET_SETTLED, POSITION_CORRECTED, CONFIG_CHANGED, RECOMPUTE
agent_id | UUID | NOT NULL | Primary agent for this record
user_id | UUID | NULLABLE |
event_id | VARCHAR(100) | NULLABLE |
payload | JSONB | NOT NULL | Full structured audit data (forwarding chain, limit checks, etc.)
checksum | VARCHAR(64) | NOT NULL | SHA-256 of payload
previous_checksum | VARCHAR(64) | NULLABLE | Checksum of previous record for this bet_id (chain)
processing_time_ms | INTEGER | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key

Indexes:

  • idx_audit_bet on (bet_id)
  • idx_audit_agent_time on (agent_id, created_at DESC)
  • idx_audit_user_time on (user_id, created_at DESC) WHERE user_id IS NOT NULL
  • idx_audit_event on (event_id) WHERE event_id IS NOT NULL
  • idx_audit_type on (record_type, created_at DESC)
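Two details matter when implementing the checksum chain. First, JSONB does not preserve key order or whitespace, so the SHA-256 must be taken over a canonical serialisation rather than whatever text the driver returns. Second, verification has to check both each record's own checksum and its link to the previous record. The following minimal sketch follows the table literally, with the checksum covering the payload only. `canonicalJson`, `appendAuditRecord`, and `verifyAuditChain` are hypothetical names:

```typescript
import { createHash } from "node:crypto";

// Serialises with object keys sorted recursively, so equal payloads always hash the same.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex");

interface AuditRecord {
  betId: string;
  payload: unknown;
  checksum: string;
  previousChecksum: string | null;
}

function appendAuditRecord(chain: AuditRecord[], betId: string, payload: unknown): AuditRecord {
  const previous = chain.length > 0 ? chain[chain.length - 1].checksum : null;
  const record: AuditRecord = { betId, payload, checksum: sha256Hex(canonicalJson(payload)), previousChecksum: previous };
  chain.push(record);
  return record;
}

// Valid only if every checksum matches its payload and every link matches its predecessor.
function verifyAuditChain(chain: AuditRecord[]): boolean {
  return chain.every(
    (r, i) =>
      r.checksum === sha256Hex(canonicalJson(r.payload)) &&
      r.previousChecksum === (i === 0 ? null : chain[i - 1].checksum),
  );
}
```

Because the checksum covers only the payload, rewriting one record requires updating just the next record's previous_checksum. Folding previous_checksum into the hashed input would instead force every later checksum to be recomputed. That is a design option, not what the table specifies.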

Table: dead_letter_queue

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
source | VARCHAR(50) | NOT NULL | BET_PROCESSING, SETTLEMENT, HEDGE, RECONCILIATION
reference_id | UUID | NOT NULL | The bet_id, settlement_id, etc. that failed
error_message | TEXT | NOT NULL |
error_stack | TEXT | NULLABLE |
payload | JSONB | NOT NULL | Full context for retry
retry_count | INTEGER | NOT NULL, DEFAULT 0 |
max_retries | INTEGER | NOT NULL, DEFAULT 3 |
status | VARCHAR(20) | NOT NULL, DEFAULT 'PENDING' | PENDING, RETRYING, RESOLVED, ESCALATED
resolved_by | UUID | NULLABLE | Admin who resolved
resolved_at | TIMESTAMPTZ | NULLABLE |
resolution_notes | TEXT | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_dlq_status on (status) WHERE status IN ('PENDING', 'RETRYING')
  • idx_dlq_source on (source, created_at DESC)
  • idx_dlq_reference on (reference_id)

Table: hedge_orders

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
bet_id | UUID | NOT NULL | Originating bet
event_id | VARCHAR(100) | NOT NULL |
market_id | VARCHAR(100) | NOT NULL |
selection | VARCHAR(255) | NOT NULL |
side | VARCHAR(10) | NOT NULL | BACK or LAY
target_price | DECIMAL(10,4) | NOT NULL | Price the punter received
limit_price | DECIMAL(10,4) | NOT NULL | Worst acceptable price
requested_amount | BIGINT | NOT NULL | In paisa
filled_amount | BIGINT | NOT NULL, DEFAULT 0 | In paisa
unfilled_amount | BIGINT | GENERATED ALWAYS AS (requested_amount - filled_amount) STORED |
average_fill_price | DECIMAL(10,4) | NULLABLE | Weighted average of all fills
slippage | DECIMAL(10,4) | NULLABLE | average_fill_price - target_price
betfair_bet_id | VARCHAR(100) | NULLABLE | Betfair's order reference
status | VARCHAR(20) | NOT NULL, DEFAULT 'QUEUED' | QUEUED, PENDING, PARTIALLY_FILLED, FILLED, CANCELLED, STALE_CANCELLED, FAILED
reprice_count | INTEGER | NOT NULL, DEFAULT 0 |
max_reprice_attempts | INTEGER | NOT NULL, DEFAULT 3 |
time_in_force_seconds | INTEGER | NOT NULL |
error_message | TEXT | NULLABLE |
queued_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
sent_at | TIMESTAMPTZ | NULLABLE | When sent to Betfair
filled_at | TIMESTAMPTZ | NULLABLE | When fully filled
cancelled_at | TIMESTAMPTZ | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_hedge_bet on (bet_id)
  • idx_hedge_status on (status) WHERE status IN ('QUEUED', 'PENDING', 'PARTIALLY_FILLED')
  • idx_hedge_event on (event_id)
  • idx_hedge_betfair on (betfair_bet_id) WHERE betfair_bet_id IS NOT NULL
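The average_fill_price and slippage columns derive from the individual Betfair fills. The following minimal sketch assumes fills arrive as (amount, price) pairs in paisa and decimal odds, and rounds prices to four places to match DECIMAL(10,4). `summariseFills` is a hypothetical helper. For a BACK hedge, a positive slippage means a better price than targeted; for a LAY hedge, it means a worse one.

```typescript
interface Fill {
  amountPaisa: number;
  price: number; // decimal odds
}

type HedgeStatus = "PENDING" | "PARTIALLY_FILLED" | "FILLED";

interface FillSummary {
  filledAmount: number;
  unfilledAmount: number;
  averageFillPrice: number | null;
  slippage: number | null;
  status: HedgeStatus;
}

const round4 = (x: number): number => Math.round(x * 10_000) / 10_000; // DECIMAL(10,4)

function summariseFills(targetPrice: number, requestedPaisa: number, fills: Fill[]): FillSummary {
  const filled = fills.reduce((sum, f) => sum + f.amountPaisa, 0);
  // Amount-weighted average: larger fills count proportionally more.
  const avg = filled === 0 ? null : round4(fills.reduce((sum, f) => sum + f.amountPaisa * f.price, 0) / filled);
  return {
    filledAmount: filled,
    unfilledAmount: requestedPaisa - filled,
    averageFillPrice: avg,
    slippage: avg === null ? null : round4(avg - targetPrice), // same definition as the column
    status: filled === 0 ? "PENDING" : filled < requestedPaisa ? "PARTIALLY_FILLED" : "FILLED",
  };
}
```

The audit example's hedge (₹4,000 filled at 1.86 against a 1.85 target) gives a slippage of 0.01.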

Table: agent_hierarchy_history

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
agent_id | UUID | NOT NULL |
previous_parent_id | UUID | NULLABLE |
new_parent_id | UUID | NULLABLE |
change_type | VARCHAR(30) | NOT NULL | CREATED, PARENT_CHANGED, SUSPENDED, REACTIVATED, DEACTIVATED
changed_by | UUID | NOT NULL | Admin who made the change
reason | TEXT | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_ahh_agent on (agent_id, created_at DESC)

Table: reconciliation_results

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
reconciliation_type | VARCHAR(30) | NOT NULL | SCHEDULED_15MIN, POST_SETTLEMENT, FULL, TARGETED
started_at | TIMESTAMPTZ | NOT NULL |
completed_at | TIMESTAMPTZ | NULLABLE |
agents_checked | INTEGER | NOT NULL, DEFAULT 0 |
discrepancies_found | INTEGER | NOT NULL, DEFAULT 0 |
status | VARCHAR(20) | NOT NULL | RUNNING, COMPLETED, FAILED
summary | JSONB | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Table: reconciliation_discrepancies

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
reconciliation_id | UUID | REFERENCES reconciliation_results(id), NOT NULL |
agent_id | UUID | NOT NULL |
scope_type | VARCHAR(30) | NOT NULL |
scope_key | VARCHAR(200) | NOT NULL |
ledger_value | BIGINT | NOT NULL | In paisa
computed_value | BIGINT | NOT NULL | In paisa
drift_amount | BIGINT | NOT NULL | In paisa
drift_direction | VARCHAR(15) | NOT NULL | LEDGER_HIGH, LEDGER_LOW
category | VARCHAR(10) | NOT NULL | MINOR, MAJOR, CRITICAL
resolution_status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, INVESTIGATING, RESOLVED, AUTO_CORRECTED
root_cause | TEXT | NULLABLE |
resolved_by | UUID | NULLABLE |
resolved_at | TIMESTAMPTZ | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_recon_disc_status on (resolution_status) WHERE resolution_status IN ('OPEN', 'INVESTIGATING')
  • idx_recon_disc_agent on (agent_id, created_at DESC)

Table: alerts

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
alert_type | VARCHAR(50) | NOT NULL | e.g., NO_NEW_RISK_ACTIVATED, LIMIT_APPROACHING, BETFAIR_DEGRADED
severity | VARCHAR(5) | NOT NULL | P1, P2, P3, P4
agent_id | UUID | NULLABLE |
title | VARCHAR(255) | NOT NULL |
description | TEXT | NOT NULL |
metadata | JSONB | NULLABLE | Additional context
status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, ACKNOWLEDGED, RESOLVED, SILENCED
acknowledged_by | UUID | NULLABLE |
acknowledged_at | TIMESTAMPTZ | NULLABLE |
resolved_at | TIMESTAMPTZ | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_alerts_status on (status) WHERE status = 'ACTIVE'
  • idx_alerts_agent on (agent_id, created_at DESC)
  • idx_alerts_severity on (severity, created_at DESC)

Table: config_changelog

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
entity_type | VARCHAR(30) | NOT NULL | MATRIX_RULE, AGENT_LIMIT, USER_OVERRIDE, MARKET_OVERRIDE, AGENT_CONFIG
entity_id | UUID | NOT NULL |
agent_id | UUID | NOT NULL |
change_type | VARCHAR(20) | NOT NULL | CREATED, UPDATED, DELETED
old_value | JSONB | NULLABLE | Previous state
new_value | JSONB | NOT NULL | New state
changed_by | UUID | NOT NULL |
reason | TEXT | NULLABLE |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

  • idx_config_changelog_entity on (entity_type, entity_id, created_at DESC)
  • idx_config_changelog_agent on (agent_id, created_at DESC)

Table: feature_flags

Column | Type | Constraints | Description
id | UUID | PRIMARY KEY |
flag_name | VARCHAR(100) | NOT NULL | e.g., bbook.cascading_routing.enabled
scope | VARCHAR(20) | NOT NULL | GLOBAL, PER_AGENT
agent_id | UUID | NULLABLE | NULL for GLOBAL flags
is_enabled | BOOLEAN | NOT NULL, DEFAULT false |
created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |

Indexes:

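  • UNIQUE NULLS NOT DISTINCT on (flag_name, agent_id). GLOBAL flags have agent_id = NULL, and a plain UNIQUE constraint treats NULLs as distinct. Without NULLS NOT DISTINCT (PostgreSQL 15+), the same GLOBAL flag could therefore be inserted twice.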
  • idx_ff_flag on (flag_name)

50. API Design

Bet Placement APIs

Method | Path | Description | Auth
POST | /api/v1/bets | Place a new bet | User JWT
GET | /api/v1/bets/:betId | Get bet details with full routing | User JWT or Agent JWT
GET | /api/v1/bets | List bets (with filters) | Agent JWT
POST | /api/v1/bets/:betId/void | Void an open bet | Admin JWT
POST | /api/v1/bets/simulate | Dry-run a bet (no money, full routing) | Agent JWT

POST /api/v1/bets -- Place a new bet:

Request body:

{
  "user_id": "uuid",
  "event_id": "string",
  "market_id": "string",
  "selection": "MI to win",
  "side": "BACK",
  "stake": 1000000,
  "odds": 1.85,
  "market_type": "MATCH_ODDS",
  "sport_type": "CRICKET",
  "event_phase": "PRE_MATCH",
  "liquidity_band": "HIGH"
}

All amounts are integers in paisa: a stake of 1000000 is ₹10,000.

Response body (success):

{
  "bet_id": "uuid",
  "status": "ACCEPTED",
  "accepted_stake": 1000000,
  "stake_reduced": false,
  "potential_win": 850000,
  "message": null
}

Response body (stake reduced):

{
  "bet_id": "uuid",
  "status": "ACCEPTED_REDUCED",
  "accepted_stake": 588200,
  "original_stake": 1000000,
  "stake_reduced": true,
  "potential_win": 499970,
  "message": "Maximum stake at these odds: ₹5,882"
}

Response body (rejected):

{
  "bet_id": null,
  "status": "REJECTED",
  "reason": "MARKET_SUSPENDED",
  "message": "This market is currently unavailable."
}

reason is one of SELF_EXCLUDED, SESSION_EXPIRED, MARKET_SUSPENDED, BELOW_MINIMUM.

Agent Configuration APIs

| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/agents/:agentId | Get agent profile and config | Agent JWT |
| PATCH | /api/v1/agents/:agentId | Update agent settings | Agent JWT |
| GET | /api/v1/agents/:agentId/limits | Get all agent limits | Agent JWT |
| PUT | /api/v1/agents/:agentId/limits | Set/update agent limits | Agent JWT |
| GET | /api/v1/agents/:agentId/matrix | Get forwarding matrix rules | Agent JWT |
| POST | /api/v1/agents/:agentId/matrix/rules | Add a matrix rule | Agent JWT |
| PUT | /api/v1/agents/:agentId/matrix/rules/:ruleId | Update a matrix rule | Agent JWT |
| DELETE | /api/v1/agents/:agentId/matrix/rules/:ruleId | Delete a matrix rule | Agent JWT |
| POST | /api/v1/agents/:agentId/matrix/test | Test a bet against the matrix | Agent JWT |
| GET | /api/v1/agents/:agentId/exposure | Get current exposure summary | Agent JWT |
| GET | /api/v1/agents/:agentId/exposure/:scope | Get exposure for a specific scope | Agent JWT |
| POST | /api/v1/agents/:agentId/panic | Trigger panic mode (hedge all) | Agent JWT |
| GET | /api/v1/agents/:agentId/sub-agents | List sub-agents | Agent JWT |
| GET | /api/v1/agents/:agentId/trust-config | Get trust settings for sub-agents | Agent JWT |
| PUT | /api/v1/agents/:agentId/trust-config/:subAgentId | Update trust for a sub-agent | Agent JWT |

POST /api/v1/agents/:agentId/matrix/rules -- Add a matrix rule:

Request body:

{
  "market_type": "FANCY",
  "sport_type": "CRICKET",
  "event_phase": "IN_PLAY",
  "source_type": "*",
  "liquidity_band": "*",
  "forward_percentage": 70.00
}

Response body:

{
  "rule_id": "uuid",
  "version": 48,
  "specificity": 3,
  "conflicts": [],
  "effective_immediately": true
}

User Management APIs

| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/users/:userId | Get user profile | Agent JWT |
| PATCH | /api/v1/users/:userId | Update user settings (limits, etc.) | Agent JWT |
| POST | /api/v1/users/:userId/override | Set user forward % override | Agent JWT |
| DELETE | /api/v1/users/:userId/override | Remove user override | Agent JWT |
| POST | /api/v1/users/:userId/classify | Set user classification | Agent JWT |
| GET | /api/v1/users/:userId/bets | Get user bet history | Agent JWT |
| POST | /api/v1/users/:userId/self-exclude | Self-exclude user | User JWT |
| GET | /api/v1/users/:userId/session | Get session info | User JWT |

Settlement APIs

| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/settlements/events/:eventId | Trigger settlement for an event | System / Admin JWT |
| GET | /api/v1/settlements/events/:eventId | Get settlement status for an event | Agent JWT |
| POST | /api/v1/settlements/events/:eventId/reverse | Reverse a settlement (for corrections) | Admin JWT |
| POST | /api/v1/settlements/events/:eventId/resettle | Re-settle with corrected results | Admin JWT |
| GET | /api/v1/settlements/agents/:agentId | Get agent settlement history | Agent JWT |
| GET | /api/v1/settlements/agents/:agentId/weekly | Get weekly settlement summary | Agent JWT |

POST /api/v1/settlements/events/:eventId -- Trigger settlement for an event:

Request body:

{
  "result": {
    "winner": "MI",
    "market_results": {
      "match_odds": { "winning_selection": "MI to win" },
      "fancy_180_runs": { "actual_value": 187, "line": 180 }
    }
  },
  "result_source": "OFFICIAL",
  "confirmed_by": "admin_uuid"
}

Admin APIs

| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/admin/agents | Create a new agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/suspend | Suspend an agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/reactivate | Reactivate an agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/transfer | Transfer agent to new parent | Admin JWT |
| GET | /api/v1/admin/dead-letter-queue | View DLQ entries | Admin JWT |
| POST | /api/v1/admin/dead-letter-queue/:id/retry | Retry a DLQ entry | Admin JWT |
| POST | /api/v1/admin/dead-letter-queue/:id/resolve | Manually resolve a DLQ entry | Admin JWT |
| POST | /api/v1/admin/reconciliation/run | Trigger manual reconciliation | Admin JWT |
| POST | /api/v1/admin/reconciliation/recompute | Recompute an agent's ledger | Admin JWT |
| GET | /api/v1/admin/feature-flags | List all feature flags | Admin JWT |
| PUT | /api/v1/admin/feature-flags/:flagName | Toggle a feature flag | Admin JWT |
| POST | /api/v1/admin/cache/flush | Flush caches for an agent | Admin JWT |

Monitoring APIs

| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/monitoring/health | System health check | Public |
| GET | /api/v1/monitoring/metrics | Prometheus metrics endpoint | Internal |
| GET | /api/v1/monitoring/alerts | List active alerts | Admin JWT |
| POST | /api/v1/monitoring/alerts/:id/acknowledge | Acknowledge an alert | Admin JWT |
| GET | /api/v1/monitoring/dashboard/overview | Ops dashboard data | Admin JWT |
| GET | /api/v1/monitoring/dashboard/agent/:agentId | Agent-specific dashboard data | Agent JWT |
| GET | /api/v1/monitoring/hedge-status | Hedge execution status | Admin JWT |

Dispute/Support APIs

| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/support/bets/:betId/audit | Full audit trail for a bet | Support JWT |
| POST | /api/v1/support/bets/:betId/resimulate | Re-simulate bet routing | Support JWT |
| GET | /api/v1/support/agents/:agentId/positions | Open positions for an agent | Support JWT |
| POST | /api/v1/support/disputes | Create a dispute | Agent JWT |
| GET | /api/v1/support/disputes/:disputeId | Get dispute details | Support JWT |
| PATCH | /api/v1/support/disputes/:disputeId | Update dispute status | Support JWT |

51. Service Architecture

Module Breakdown

| Module | Responsibility | Dependencies |
|---|---|---|
| BetProcessingModule | Orchestrates the entire bet placement pipeline. Entry point for all bets. | MatrixResolutionModule, CascadeEngineModule, LimitEnforcementModule, AuditModule, ResponsibleGamblingModule |
| MatrixResolutionModule | Resolves forwarding percentage from the precedence chain (user override > market override > matrix > default) | ConfigModule (for cached rules) |
| CascadeEngineModule | Routes a bet through the full agent hierarchy. Iterates level by level, calling MatrixResolution and LimitEnforcement at each level | MatrixResolutionModule, LimitEnforcementModule |
| LimitEnforcementModule | Checks all agent limits, determines max retainable amount, triggers NO_NEW_RISK | ExposureLedgerModule |
| ExposureLedgerModule | Manages the 3-tier exposure counters. Reads from Redis, writes to PostgreSQL, handles sharding | Redis, PostgreSQL |
| HedgeExecutionModule | Consumes hedge order queue, places orders on Betfair, manages partial fills and retries | Betfair API client, Redis Stream |
| SettlementModule | Processes event results, settles all positions, decrements exposure ledgers | ExposureLedgerModule, AuditModule |
| AuditModule | Creates structured audit records, manages checksum chains, handles tier migration | PostgreSQL (audit schema) |
| ReconciliationModule | Runs scheduled and on-demand reconciliation, detects and categorizes discrepancies | ExposureLedgerModule, PostgreSQL |
| AgentManagementModule | CRUD for agents, hierarchy management, trust configuration, preset profiles | PostgreSQL |
| UserManagementModule | CRUD for users, win limit management, classification, self-exclusion | PostgreSQL, Redis |
| ConfigModule | Manages all configuration with caching, versioning, and pub/sub invalidation | PostgreSQL, Redis (cache + pub/sub) |
| NotificationModule | Sends alerts via push, SMS, WhatsApp. Manages escalation | SMS gateway, WhatsApp API, Socket.IO |
| ResponsibleGamblingModule | Self-exclusion checks, session limits, reality checks, deposit limit integration | Redis (fast flag checks) |
| MonitoringModule | Prometheus metrics collection, health checks, alert generation | prom-client |

52. The Bet Processing Pipeline (Step by Step)

This is the heart of the system. Every step is documented with what happens, what can go wrong, and how errors are handled.

Step 1: Request Received (Budget: 5ms)

What happens: An HTTP POST arrives at /api/v1/bets. Express middleware parses the JSON body. A Zod schema validates the shape and types of all fields (user_id is a UUID, stake is a positive integer in paisa, odds is a decimal greater than 1, etc.).

What can go wrong:

  • Malformed JSON: Return 400 with parsing error
  • Missing required fields: Return 400 with validation errors listing every missing field
  • Invalid types (string for stake, negative odds): Return 400 with type errors

Error handling: Validation errors are returned immediately. No audit record is created because no bet processing was attempted. The request counter metric is incremented for monitoring.
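The following dependency-free sketch shows the checks that the Step 1 schema has to enforce, written without Zod so the rules are visible. The field list mirrors the POST /api/v1/bets body in Section 50, and the ₹1 crore cap comes from Step 4. The UUID pattern, the error strings, and the rule that odds must exceed 1 are assumptions of this sketch. The odds rule is inferred from Step 5, which divides by odds - 1.

```typescript
// Hypothetical, dependency-free stand-in for the Zod schema described in Step 1.
// It collects every problem rather than stopping at the first, so the 400 response
// can list all missing or invalid fields at once.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const MAX_STAKE_PAISA = 1_000_000_000; // ₹1 crore safety cap from Step 4

const REQUIRED_STRING_FIELDS = [
  "user_id", "event_id", "market_id", "selection", "side",
  "market_type", "sport_type", "event_phase", "liquidity_band",
] as const;

function validateBetRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body: expected a JSON object"];
  }
  const b = body as Record<string, unknown>;
  const errors: string[] = [];

  for (const field of REQUIRED_STRING_FIELDS) {
    if (b[field] === undefined) errors.push(`${field}: required`);
    else if (typeof b[field] !== "string") errors.push(`${field}: expected string`);
  }

  const userId = b["user_id"];
  if (typeof userId === "string" && !UUID_RE.test(userId)) errors.push("user_id: expected UUID");

  const stake = b["stake"];
  if (stake === undefined) errors.push("stake: required");
  else if (typeof stake !== "number" || !Number.isInteger(stake) || stake <= 0) {
    errors.push("stake: expected positive integer (paisa)");
  } else if (stake > MAX_STAKE_PAISA) {
    errors.push("stake: above ₹1 crore safety cap");
  }

  // Odds must exceed 1 because Step 5 divides by (odds - 1).
  const odds = b["odds"];
  if (odds === undefined) errors.push("odds: required");
  else if (typeof odds !== "number" || !Number.isFinite(odds) || odds <= 1) {
    errors.push("odds: expected decimal odds greater than 1");
  }
  return errors;
}
```

An empty array means the body can move on to Step 2. Any other result becomes the 400 response body.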

Step 2: Responsible Gambling Checks (Budget: 2ms)

What happens: Three sub-checks in sequence:

2a. Self-exclusion check: Read self_exclusion:{user_id} flag from Redis. If present and not expired, reject immediately.

2b. Session time check: Read session:{user_id} from Redis. If session duration exceeds the user's configured limit, reject with session expired message.

2c. Reality check trigger: Check if a reality check notification is pending (time since last acknowledgment exceeds the configured interval). If so, the API returns a special status requiring the client to show the reality check popup and re-submit with an acknowledgment token.

What can go wrong:

  • Redis unavailable: Fall back to PostgreSQL for self-exclusion check (add ~5ms). Session checks are skipped (fail-open for session limits, fail-closed for self-exclusion).

Error handling: Self-exclusion is fail-CLOSED (if we cannot check, reject the bet -- protecting the user is paramount). Session limits are fail-OPEN (if we cannot check, allow the bet -- a few extra minutes of play is acceptable).

Step 3: Timestamp Assignment (Budget: 0ms)

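What happens: The system assigns a processing timestamp using Date.now(). Date.now() reads the wall clock rather than a monotonic clock, so clock corrections can move it backward. It also has only millisecond resolution, which means two bets can share a timestamp. Deterministic ordering therefore needs a tie-breaker (for example the bet ID) in addition to the timestamp. This timestamp is used for:

  • Determining which limit periods the bet falls in (night period, weekly period)
  • Ordering concurrent bets deterministically
  • Audit trail timing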

What can go wrong: Nothing. This is a local operation.

Step 4: Compute Metrics (Budget: 3ms)

What happens:

potential_win = stake * (odds - 1)
liability = potential_win (for a back bet from the bookie's perspective)

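All calculations use integer arithmetic in paisa (the smallest currency unit) to avoid floating-point errors. The odds value is the only decimal in the calculation, and it needs care. Multiplying the integer stake by a binary floating-point odds value can land just below a whole paisa, so flooring the raw product can come out one paisa low. Converting the odds to integer hundredths first (1.85 becomes 185, assuming odds are quoted to two decimal places) keeps the whole computation in integers: potential_win = floor(stake * (odds_hundredths - 100) / 100).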

What can go wrong:

  • Overflow: If stake * odds exceeds Number.MAX_SAFE_INTEGER (2^53 - 1) in paisa. For context, MAX_SAFE_INTEGER paisa is about ₹90 lakh crore (roughly ₹9,007,199 crore). No single bet will ever approach this. Validation rejects stakes above ₹1 crore as a safety measure.
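Here is a minimal sketch of the Step 4 computation in exact integers. It assumes odds are quoted to at most two decimal places, so they can be held as integer hundredths. The motivation is concrete: in IEEE-754 doubles, 100 * (1.15 - 1) evaluates slightly below 15, so a plain Math.floor returns 14 paisa instead of 15. The function names are hypothetical.

```typescript
// Sketch only: assumes odds have at most two decimal places (1.85 -> 185 hundredths).
const STAKE_CAP_PAISA = 1_000_000_000; // ₹1 crore validation cap

function oddsToHundredths(odds: number): number {
  const hundredths = Math.round(odds * 100);
  if (Math.abs(hundredths - odds * 100) > 1e-6) {
    throw new RangeError(`odds ${odds} has more than two decimal places`);
  }
  return hundredths;
}

// potential_win = floor(stake * (odds - 1)), computed without floating-point rounding.
// For a back bet this is also the bookie's liability.
function potentialWinPaisa(stakePaisa: number, odds: number): number {
  if (!Number.isSafeInteger(stakePaisa) || stakePaisa <= 0 || stakePaisa > STAKE_CAP_PAISA) {
    throw new RangeError("stake must be a positive whole number of paisa, at most ₹1 crore");
  }
  const h = oddsToHundredths(odds);
  if (h <= 100) throw new RangeError("odds must be greater than 1");
  const product = stakePaisa * (h - 100); // exact while it stays within MAX_SAFE_INTEGER
  if (!Number.isSafeInteger(product)) throw new RangeError("product exceeds MAX_SAFE_INTEGER");
  return (product - (product % 100)) / 100; // integer floor division by 100
}
```

With the request example from Section 50, potentialWinPaisa(1_000_000, 1.85) gives 850,000 paisa (₹8,500), which matches the success response.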

Step 5: User Win Cap Check (Budget: 5ms)

What happens:

5a. Per-click win cap: Compare potential_win against the user's per_click_win_limit. If exceeded, compute the maximum allowable stake:

max_stake = floor(per_click_win_limit / (odds - 1))

5b. Aggregate win cap: Read the user's accumulated potential wins for the current period from Redis key user_agg_win:{user_id}:{period}. If adding this bet's potential_win exceeds the daily aggregate limit, compute the remaining allowable win:

remaining_win = aggregate_limit - current_accumulated_wins
max_stake_from_aggregate = floor(remaining_win / (odds - 1))

5c. Take the minimum of the per-click max stake and the aggregate max stake. If this is less than the original stake, the stake is reduced.
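Steps 5a to 5c can be sketched as a single function. Money is in integer paisa, and odds are held as integer hundredths (1.85 becomes 185). That representation is an assumption made so that floor(limit / (odds - 1)) can be computed exactly as floor(limit * 100 / (h - 100)). The parameter names and the StakeDecision shape are hypothetical.

```typescript
interface StakeDecision {
  acceptedStake: number;
  reduced: boolean;
  rejected: boolean;
}

// Largest stake whose potential win stays within winCapPaisa:
// floor(winCap / (odds - 1)) rewritten as floor(winCap * 100 / (h - 100)).
function maxStakeForWin(winCapPaisa: number, oddsHundredths: number): number {
  if (winCapPaisa <= 0) return 0;
  const numerator = winCapPaisa * 100;
  const denominator = oddsHundredths - 100;
  return (numerator - (numerator % denominator)) / denominator;
}

function decideStake(
  stakePaisa: number,
  oddsHundredths: number,
  perClickWinLimit: number,
  aggregateLimit: number,  // the limit actually checked (the text describes a 110% buffer)
  accumulatedWins: number, // value of user_agg_win:{user_id}:{period}
  minStakePaisa: number,
): StakeDecision {
  const fromPerClick = maxStakeForWin(perClickWinLimit, oddsHundredths);                  // 5a
  const fromAggregate = maxStakeForWin(aggregateLimit - accumulatedWins, oddsHundredths); // 5b
  const cap = Math.min(fromPerClick, fromAggregate);                                       // 5c
  if (stakePaisa <= cap) return { acceptedStake: stakePaisa, reduced: false, rejected: false };
  if (cap < minStakePaisa) return { acceptedStake: 0, reduced: false, rejected: true };
  return { acceptedStake: cap, reduced: true, rejected: false };
}
```

Take a ₹10,000 stake at 1.85 with an assumed per-click win limit of ₹5,000 (500,000 paisa). The cap comes out at 588,235 paisa. The stake-reduced response example in Section 50 shows 588,200 paisa (₹5,882), which suggests an extra round-down to whole rupees that this sketch does not model.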

What can go wrong:

  • Redis unavailable for aggregate check: Fall back to PostgreSQL query (SUM of potential_win from today's bets for this user). Slower (~15ms) but correct.
  • Concurrent aggregate updates: The Redis INCRBY is atomic, but two simultaneous bets could both read the same accumulated value before either increments it. The aggregate limit has a 10% buffer built in (actual limit checked is 110% of configured limit) to absorb this minor race. The PostgreSQL settlement path corrects any drift.

Error handling: If the reduced stake falls below the user's minimum stake, the bet is rejected with "This market is currently unavailable at these odds."

Step 6: Matrix Version Capture (Budget: 1ms)

What happens: For each agent in the cascade, the system captures the current matrix version number. This version is stored with the bet record to enable deterministic replay. The version is read from the cached matrix (Redis or LRU).

Why this matters: If an agent changes their matrix between the bet being placed and the bet being disputed, the replay must use the original matrix version to produce the same result.

Step 7: Forwarding Percentage Resolution (Budget: 10ms)

What happens: For the first agent in the cascade (the user's direct agent), resolve the forwarding percentage using the 4-level precedence chain:

7a. Check user override: Query user_overrides (cached in Redis) for this user + agent combination. If found and active, use the override's forward_percentage. Skip to Step 8.

7b. Check market override: Query market_overrides (cached in Redis) for this event + agent combination. If found and active, use the override's forward_percentage. Skip to Step 8.

7c. Matrix lookup: Load the agent's active matrix rules (cached in Redis). Filter to rules where every non-wildcard dimension matches the bet's characteristics. From the matching rules, select the one with the highest specificity. If tied, select the highest forward_percentage. If still tied, select the oldest rule (lowest created_at).

7d. Agent default: If no matrix rule matches (should not happen if a catch-all rule exists), use the agent's default_forward_percentage.

What can go wrong:

  • Agent has no matrix rules AND no default: Configuration error. Log a P2 alert. Use a hard-coded safe default of 100% forward (forward everything, retain nothing -- the safest option for the agent).
  • Matrix cache miss: Read from PostgreSQL. Add ~5ms.
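A sketch of the Step 7 precedence chain follows, including the 7c tie-breaks and the 100% fallback. The dimension names follow the matrix-rule request body in Section 50. The function signature is hypothetical, and it takes override values that have already been looked up rather than querying the caches itself.

```typescript
type Dimension = "market_type" | "sport_type" | "event_phase" | "source_type" | "liquidity_band";
const DIMENSIONS: Dimension[] = ["market_type", "sport_type", "event_phase", "source_type", "liquidity_band"];

type BetContext = Record<Dimension, string>;
type MatrixRule = Record<Dimension, string> & {
  rule_id: string;
  forward_percentage: number;
  created_at: number; // epoch ms
};

// Specificity = number of non-wildcard dimensions (3 for the FANCY/CRICKET/IN_PLAY example).
const specificity = (r: MatrixRule): number => DIMENSIONS.filter((d) => r[d] !== "*").length;
const matches = (r: MatrixRule, bet: BetContext): boolean =>
  DIMENSIONS.every((d) => r[d] === "*" || r[d] === bet[d]);

function resolveForwardPercentage(
  bet: BetContext,
  userOverride: number | undefined,   // 7a: already looked up for user + agent
  marketOverride: number | undefined, // 7b: already looked up for event + agent
  rules: MatrixRule[],                // 7c: the agent's active rules
  agentDefault: number | undefined,   // 7d
): number {
  if (userOverride !== undefined) return userOverride;
  if (marketOverride !== undefined) return marketOverride;
  const best = rules
    .filter((r) => matches(r, bet))
    .sort((a, b) =>
      specificity(b) - specificity(a) ||             // most specific first
      b.forward_percentage - a.forward_percentage || // then highest forward %
      a.created_at - b.created_at)[0];               // then oldest rule
  if (best !== undefined) return best.forward_percentage;
  if (agentDefault !== undefined) return agentDefault;
  return 100; // no rules and no default: forward everything (and raise a P2 alert)
}
```

Sorting a filtered copy keeps the cached rule array untouched. For the small rule sets an agent typically has, a full sort is cheap compared with the Redis round trip.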

Step 8: Cascade Routing -- Level by Level (Budget: 10ms per level)

What happens: The cascade engine iterates through the agent hierarchy, starting at the punter's direct agent and moving upward to the platform.

For each level:

8a. Determine source_type for this level: If the agent has their own classification for this user, use it. Otherwise, if trust_downstream_flags is true for the originating sub-agent, use the downstream classification. Otherwise, default to NORMAL.

8b. Resolve forward percentage for this level (same precedence chain as Step 7, using this agent's matrix and the resolved source_type).

8c. Calculate retention: retained_stake = incoming_stake * (1 - forward_percentage / 100). Round down to nearest paisa.

8d. Check limits: Query all applicable limits for this agent (sport, market, night period, weekly period). For each limit, read the current exposure from the exposure ledger (Redis or PostgreSQL depending on utilization -- see Gap A). The most restrictive limit determines the maximum retainable amount.

8e. If all limits pass: Agent retains the calculated amount.

8f. If any limit would be breached: Agent retains only up to the most restrictive remaining capacity. The difference becomes overflow that is added to the forwarded amount.

8g. If agent is in NO_NEW_RISK for this scope: Check if this bet is a hedge, i.e. it lowers the agent's risk (worst-case liability after < worst-case liability before). If hedge: retain as normal. If not hedge: retain nothing, forward 100%.

8h. If agent is suspended: Skip this level entirely. Forward 100% to the next level.

8i. Record the decision for this level (for the audit trail).

8j. Forward the remaining amount to the next level (parent agent). If the current agent is the platform, the remaining amount is queued for hedge execution.
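A simplified sketch of the level loop covers 8c, 8e/8f, 8h and 8j. It leaves out source_type resolution (8a/8b) and NO_NEW_RISK (8g). It also assumes the limit check (8d) has already turned the most restrictive remaining limit into a stake amount. Interface and field names are hypothetical. Forwarding percentages of 40%, 40% and 50% reproduce the Step 10 example: Rajesh retains ₹6,000, Vikram ₹2,400, the platform ₹800, and ₹800 goes to Betfair.

```typescript
interface CascadeLevel {
  agentId: string;
  forwardPct: number;         // resolved for this level, e.g. 40.00
  maxRetainableStake: number; // paisa, from the limit check (assumed pre-converted)
  suspended: boolean;
}

interface LevelDecision {
  agentId: string;
  incoming: number;
  retained: number;
  forwarded: number; // includes any overflow
  overflow: number;
}

// Integer floor division for non-negative a and positive b.
const floorDiv = (a: number, b: number): number => (a - (a % b)) / b;

function routeCascade(stakePaisa: number, levels: CascadeLevel[]): { decisions: LevelDecision[]; toHedge: number } {
  const decisions: LevelDecision[] = [];
  let incoming = stakePaisa;
  for (const level of levels) {
    if (level.suspended) { // 8h: skip the level, forward 100%
      decisions.push({ agentId: level.agentId, incoming, retained: 0, forwarded: incoming, overflow: 0 });
      continue;
    }
    const forwardBps = Math.round(level.forwardPct * 100);             // 40.00% -> 4000 basis points
    const wanted = floorDiv(incoming * (10_000 - forwardBps), 10_000); // 8c, rounded down
    const retained = Math.min(wanted, Math.max(0, level.maxRetainableStake)); // 8e / 8f
    decisions.push({ agentId: level.agentId, incoming, retained, forwarded: incoming - retained, overflow: wanted - retained });
    incoming -= retained; // 8j: the remainder moves to the parent
  }
  return { decisions, toHedge: incoming }; // left over after the platform level: queued for hedging
}
```

Because overflow is folded into forwarded, a capped level automatically pushes more stake to its parent, which is the behaviour 8f describes.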

What can go wrong:

  • Parent agent not found: Configuration error (broken hierarchy). Log P1 alert. Forward to platform directly (skip the missing level).
  • Database lock timeout on near-limit exposure check: Retry once (most lock waits resolve in <20ms). If retry fails, assume limit is breached and forward 100% from this level. This is the safe default -- the agent does not retain risk they cannot verify they can afford.
  • Partial routing failure (for example, level 2 of a 3-level cascade fails): Mark the bet as routing_status = PARTIAL. Queue for background retry. See Gap D for details.
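The 8g hedge test can be illustrated for a single MATCH_ODDS-style market in which the agent holds retained back positions. This is a sketch under stated assumptions. The position shape is hypothetical, odds are integer hundredths, and money is integer paisa. Worst-case liability is taken as the largest loss over every possible winner, and a bet counts as a hedge only if it strictly lowers that figure.

```typescript
interface BackPosition {
  selection: string;
  stakePaisa: number;
  oddsHundredths: number; // 1.85 -> 185
}

// Amount the agent pays if this back bet wins: floor(stake * (odds - 1)), in exact integers.
function winPayout(p: BackPosition): number {
  const product = p.stakePaisa * (p.oddsHundredths - 100);
  return (product - (product % 100)) / 100;
}

// Agent's P&L if `winner` wins: pay out winning back bets, keep the losing stakes.
function pnlIfWins(book: BackPosition[], winner: string): number {
  let pnl = 0;
  for (const p of book) {
    if (p.selection === winner) pnl -= winPayout(p);
    else pnl += p.stakePaisa;
  }
  return pnl;
}

// Worst-case liability: the largest loss across every possible winner (0 if none loses).
function worstCaseLiability(book: BackPosition[], selections: string[]): number {
  const worstPnl = Math.min(...selections.map((s) => pnlIfWins(book, s)));
  return Math.max(0, -worstPnl);
}

// A bet is a hedge when it strictly lowers worst-case liability.
function isHedge(book: BackPosition[], selections: string[], candidate: BackPosition): boolean {
  return worstCaseLiability([...book, candidate], selections) < worstCaseLiability(book, selections);
}
```

Suppose the agent has ₹10,000 retained on MI at 1.85. The worst case is MI winning, with ₹8,500 owed. A ₹5,000 back on the other side at 2.00 cuts that to ₹3,500, so it qualifies as a hedge. Another back bet on MI raises the worst case and would be forwarded 100%.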

Step 9: Exposure Ledger Updates (Budget: 10ms)

What happens: All exposure ledger changes for all agents in the cascade are written. For each agent:

  • Increment retained_open_liability by the retained liability amount
  • Increment forwarded_open_liability by the forwarded liability amount
  • Increment open_potential_win by the agent-level potential win

The sharded counter approach is used: pick a random shard for each agent and use UPDATE exposure_ledger SET retained_open_liability = retained_open_liability + $amount WHERE agent_id = $agent AND shard_index = $shard.

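After the DB write, update Redis with the new total. There are two safe options: apply the same delta with INCRBY (atomic), or write the DB-committed total. A read-then-write from the application is not safe here, because two concurrent bets can read the same value and one update is lost.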

What can go wrong:

  • DB write fails: This is within the position creation transaction. If it fails, the entire transaction rolls back. No positions are created, no ledger is updated. Return error to client.
  • Redis update fails after DB commit: The Redis value becomes stale. The next bet that hits Redis will see a slightly outdated value. This is acceptable because the safety margin (Gap A) accounts for this.
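An in-memory model makes the sharded counter's two halves concrete: writes go to one shard row, and reads sum all of them. The usual reason for sharding a hot counter is lock contention. Concurrent bets for the same agent mostly update different rows, so they rarely queue on one row lock. The class name and SHARD_COUNT = 8 are assumptions, not values from this document.

```typescript
const SHARD_COUNT = 8; // assumed; the real shard count is a deployment choice

class ShardedExposureLedger {
  private readonly rows = new Map<string, number[]>();

  // Mirrors: UPDATE exposure_ledger
  //          SET retained_open_liability = retained_open_liability + $amount
  //          WHERE agent_id = $agent AND shard_index = $shard
  add(agentId: string, amountPaisa: number, shard = Math.floor(Math.random() * SHARD_COUNT)): void {
    const row = this.rows.get(agentId) ?? new Array<number>(SHARD_COUNT).fill(0);
    row[shard] += amountPaisa;
    this.rows.set(agentId, row);
  }

  // Reads must sum every shard, e.g.
  // SELECT SUM(retained_open_liability) FROM exposure_ledger WHERE agent_id = $agent
  total(agentId: string): number {
    return (this.rows.get(agentId) ?? []).reduce((sum, v) => sum + v, 0);
  }
}
```

Settlement (Section 53, Step 5) uses the same write path with a negative amount. That is why the total, not any single shard, is the only meaningful value to compare against a limit.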

Step 10: Position Creation (Budget: 15ms)

What happens: Within the same database transaction as the ledger update (or in the same-level transaction for hot agents using per-level atomicity):

Create one position record per agent per level:

  • Level 1 (Rajesh): RETAINED position for ₹6,000 stake
  • Level 1 (Rajesh): FORWARDED position for ₹4,000 stake (optional -- can be derived)
  • Level 2 (Vikram): RETAINED position for ₹2,400 stake
  • Level 2 (Vikram): FORWARDED position for ₹1,600 stake
  • Level 3 (Platform): RETAINED position for ₹800 stake
  • Level 3 (Platform): FORWARDED position for ₹800 stake (to Betfair)

What can go wrong:

  • Unique constraint violation: Extremely unlikely (UUID collision). If it happens, retry with a new UUID.
  • Transaction deadlock: If using cross-level atomicity and two bets lock agents in different orders. Prevented by always locking in hierarchy order (Level 1 first, then Level 2, etc.). If detected, the database will roll back one transaction automatically; retry.

Step 11: Hedge Queue Placement (Budget: 2ms)

What happens: If the platform's forwarded amount is greater than zero, a hedge order is published to the Redis Stream hedge_orders:

{
  "bet_id": "uuid",
  "event_id": "string",
  "market_id": "string",
  "selection": "MI to win",
  "side": "BACK",
  "target_price": 1.85,
  "amount_paisa": 80000,
  "sport_type": "CRICKET",
  "event_phase": "PRE_MATCH"
}

This is a fire-and-forget publish. The hedge execution is asynchronous and does not block the bet response.

What can go wrong:

  • Redis Stream unavailable: Write the hedge order to a hedge_orders_fallback table in PostgreSQL. The hedge worker polls this table every 5 seconds as a fallback.

Step 12: Audit Trail Write (Budget: 10ms)

What happens: Construct the complete audit record (all fields described in Section 11 of the original document) and buffer it for async writing. The audit write is NOT synchronous with the bet response. It is added to an in-memory buffer that flushes every 500ms.

The audit payload includes:

  • The complete forwarding chain (every level, every decision)
  • Every limit check result
  • The matrix rule that matched at each level
  • The source_type resolution at each level
  • The period context
  • Processing timestamps per step

What can go wrong:

  • Audit buffer flush fails: Retry 3 times. If all retries fail, append the records to a local write-ahead log (WAL) file. A recovery job reads this file on startup and writes its contents to the audit store.
  • The bet is accepted even if the audit write ultimately fails. Audit trail completeness is critical but not worth rejecting a bet over.

Step 13: Response to Punter (Budget: 5ms)

What happens: Return the HTTP response with bet confirmation. Update the user's session activity counter in Redis (for responsible gambling tracking). Emit a WebSocket event to the agent's dashboard with the new bet details.

What can go wrong:

  • Client connection already closed (timeout): The bet is still processed. The client can query /api/v1/bets/:betId to confirm.
  • WebSocket delivery failure: Non-critical. The dashboard will pick up the new bet on its next polling cycle (2-5 seconds).

53. The Settlement Pipeline (Step by Step)

Step 1: Event Result Received

What happens: An external system (admin, odds provider, or automated feed) posts the event result to POST /api/v1/settlements/events/:eventId. The result specifies the outcome for each market within the event.

The settlement module validates the result format, verifies the event exists and is in a settleable state (LIVE or SUSPENDED, not already SETTLED), and enqueues a settlement job in BullMQ.

Step 2: Market Resolution

What happens: For each market in the event, determine which positions won and which lost. For MATCH_ODDS markets, this is straightforward (the winning selection wins, all others lose). For FANCY markets, the actual value is compared to the line.

| Market Type | Resolution Logic |
|---|---|
| MATCH_ODDS | Position on winning selection = WIN. All others = LOSS. |
| FANCY (over/under) | If actual value >= line: OVER wins, UNDER loses. Otherwise UNDER wins, OVER loses. |
| BOOKMAKER | Same as MATCH_ODDS |
| LINE | Based on handicap: actual result + handicap compared to line |
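The MATCH_ODDS, BOOKMAKER and FANCY rows of the table can be sketched directly. LINE markets are left out because the table does not pin down the sign convention for the handicap, and VOID handling is out of scope. The function names are hypothetical.

```typescript
type Outcome = "WIN" | "LOSS";

// MATCH_ODDS and BOOKMAKER: the winning selection wins, every other selection loses.
function resolveSelection(positionSelection: string, winningSelection: string): Outcome {
  return positionSelection === winningSelection ? "WIN" : "LOSS";
}

// FANCY over/under: actual >= line means OVER wins; otherwise UNDER wins.
function resolveFancy(side: "OVER" | "UNDER", actualValue: number, line: number): Outcome {
  const overWins = actualValue >= line;
  return (side === "OVER") === overWins ? "WIN" : "LOSS";
}
```

Applied to the settlement request example (actual_value 187, line 180), resolveFancy returns WIN for OVER and LOSS for UNDER. A result exactly on the line resolves to OVER, as the table specifies.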

Step 3: Position Identification

What happens: Query all open positions for this event:

SELECT * FROM positions
WHERE event_id = :eventId AND status = 'OPEN'
ORDER BY bet_id, cascade_level

Group positions by bet_id, then by agent_id. Each position will be settled individually.

Step 4: Per-Position Settlement (Idempotent)

What happens: For each position:

4a. Generate the idempotency key: {position_id}_{hash(event_result)}

4b. Check if a settlement with this idempotency key already exists. If yes, skip (already settled).

4c. Determine if this position won or lost based on the market resolution.

4d. Calculate the settlement amount:

  • WIN: agent pays position.liability to the punter side
  • LOSS: agent receives position.stake (the punter loses their stake)

4e. Create the settlement record.

4f. Update the position status to SETTLED.

What can go wrong:

  • Idempotency collision on retry: By design, the duplicate is ignored. This makes settlement safe to retry.
  • Position not found (deleted or corrupted): Log P1 alert. Add to DLQ for manual investigation.
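Steps 4a to 4d are sketched below. The key format {position_id}_{hash(event_result)} is from the text. Hashing key-sorted JSON with SHA-256 is an assumption, chosen so the same result always produces the same key regardless of property order. The in-memory Set stands in for a unique index on the key and is purely illustrative.

```typescript
import { createHash } from "node:crypto";

// Serialize with object keys sorted so equal results always hash identically.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

function idempotencyKey(positionId: string, eventResult: unknown): string {
  const digest = createHash("sha256").update(canonicalJson(eventResult)).digest("hex");
  return `${positionId}_${digest}`;
}

// 4d from the agent's side: pay the liability on a WIN, keep the stake on a LOSS.
function settlementAmountPaisa(outcome: "WIN" | "LOSS", stakePaisa: number, liabilityPaisa: number): number {
  return outcome === "WIN" ? -liabilityPaisa : stakePaisa;
}

// Hypothetical stand-in for a unique index on the settlement idempotency key.
const settledKeys = new Set<string>();

function settleOnce(positionId: string, eventResult: unknown, outcome: "WIN" | "LOSS",
                    stakePaisa: number, liabilityPaisa: number): number | null {
  const key = idempotencyKey(positionId, eventResult); // 4a
  if (settledKeys.has(key)) return null;               // 4b: already settled, skip
  settledKeys.add(key);                                // 4e: record the settlement
  return settlementAmountPaisa(outcome, stakePaisa, liabilityPaisa);
}
```

In PostgreSQL, the 4b check and the 4e insert are safest as one statement: INSERT ... ON CONFLICT DO NOTHING against a unique index on the key. Otherwise two concurrent retries could both pass the check before either inserts.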

Step 5: Exposure Ledger Decrement

What happens: For each settled position, decrement the agent's exposure ledger:

  • retained_open_liability -= position.liability (for RETAINED positions)
  • forwarded_open_liability -= position.liability (for FORWARDED positions)
  • open_potential_win -= position.potential_win

The decrement is within the same database transaction as the position status update.

After the DB commit, update Redis with the new exposure values.

What can go wrong:

  • Ledger goes negative: Should never happen. If it does, log P1 alert (indicates a bug). Clamp to zero and flag for reconciliation.
  • DB transaction failure: Retry 3 times with exponential backoff. If all retries fail, add to DLQ. The position remains OPEN until manually resolved.

Step 6: Balance Updates

What happens: The platform's internal accounting system is notified of the settlement result. Each agent's balance is credited or debited based on their position outcomes. This is outside the B-Book engine's scope (handled by the existing agentSettlementService.ts) but is triggered by the settlement module.

Step 7: NO_NEW_RISK Re-evaluation

What happens: After settling positions for an event, re-check all agents who were in NO_NEW_RISK for any scope. If the settled positions reduced their exposure below their limit, clear the NO_NEW_RISK flag.

For each agent in NO_NEW_RISK:
  new_total = SUM of exposure_ledger shards for this scope
  if new_total < limit:
    SET no_new_risk_active = false
    Clear Redis NO_NEW_RISK flag
    Fire P3 alert: "Rajesh exited NO_NEW_RISK for cricket"

Step 8: Reconciliation Check

What happens: After all positions for an event are settled, run a targeted reconciliation for each affected agent. Compare the ledger values to the position sums. If they match, log success. If they differ, categorize and flag per Gap H.

Step 9: Settlement Confirmation

What happens: Mark the event as SETTLED. Emit WebSocket events to all affected agents' dashboards. Queue notifications for settlement summary (WhatsApp, SMS).


54. Configuration Management

How Forwarding Matrix Rules Are Stored, Versioned, and Cached

Storage: Matrix rules are stored in the forwarding_matrix_rules table. Each rule has a version number that is incremented on any change to the agent's matrix (adding, updating, or deleting a rule).

Versioning: When any rule for an agent changes:

  1. Increment the agent's matrix version (stored on the agent record or derived from MAX(version) of active rules)
  2. Log the change in config_changelog with old_value and new_value
  3. The new version takes effect immediately

Caching strategy:

  • Redis: The full set of active rules for each agent is stored as a Redis hash matrix:{agent_id}. The hash contains the serialized rules and the version number. TTL: until invalidated.
  • LRU: Each instance caches the deserialized rules in-memory for 5 minutes. On cache miss, read from Redis.
  • On change: Publish to Redis pub/sub channel config.invalidate with {type: "MATRIX_UPDATE", agent_id: "xxx", version: 48}. All instances evict the agent's matrix from their LRU cache. The next request triggers a Redis read.
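The per-instance cache tier can be sketched as follows. Entries live for the 5 minutes the text specifies, and they are dropped early when a config.invalidate message names the agent. The class, the handler, and the assumption that the pub/sub payload arrives as a JSON string are hypothetical. Subscribing to the channel through a Redis client is left out. LRU recency relies on JavaScript Map preserving insertion order.

```typescript
// Per-instance TTL + LRU cache. Re-inserting a key on read moves it to the
// "most recently used" end of the Map's insertion order.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly maxEntries: number,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: caller falls back to Redis
      return undefined;
    }
    this.entries.delete(key);
    this.entries.set(key, hit); // mark as most recently used
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value; // least recently used
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

interface InvalidateMessage {
  type: string;
  agent_id: string;
  version: number;
}

// Handles one config.invalidate payload, e.g. {type: "MATRIX_UPDATE", agent_id: "xxx", version: 48}.
function handleInvalidate<V>(cache: TtlLruCache<V>, payload: string): void {
  const msg = JSON.parse(payload) as InvalidateMessage;
  if (msg.type === "MATRIX_UPDATE") cache.delete(`matrix:${msg.agent_id}`);
}
```

Injecting the clock through the now parameter keeps the 5-minute expiry testable without waiting. In production it would default to Date.now.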

Cache Invalidation Strategy (Summary)

| Trigger | Action | Propagation |
|---|---|---|
| Matrix rule CRUD | Invalidate Redis hash for agent, publish pub/sub | All instances evict LRU within 50ms |
| Agent limit change | Invalidate Redis key for agent limits, publish pub/sub | All instances evict LRU within 50ms |
| User override change | Invalidate Redis key for user overrides | Single key, no broadcast needed (user-specific) |
| Market override change | Invalidate Redis key for market overrides, publish pub/sub | All instances evict LRU within 50ms |
| Agent config change | Invalidate Redis key for agent config, publish pub/sub | All instances evict LRU within 50ms |
| Emergency flush | Admin API triggers full cache flush for an agent | Deletes all Redis keys for the agent, broadcasts LRU eviction |

55. Error Handling Patterns

Error Categories

| Category | Examples | Handling |
|---|---|---|
| Validation errors | Bad input, missing fields, invalid types | Return 400 immediately. No processing, no audit record. |
| Business rule errors | Self-excluded user, suspended market, below minimum stake | Return 200 with status: REJECTED and reason. Audit record created (bet was attempted). |
| Transient infrastructure errors | Redis timeout, DB connection pool exhausted, network blip | Retry up to 3 times with exponential backoff (100ms, 200ms, 400ms). If all retries fail, fall back or return 503. |
| Permanent infrastructure errors | DB down, Redis down for extended period | Circuit breaker opens after 5 consecutive failures. All bets fall back to safe defaults (100% forward). |
| Data integrity errors | Negative ledger, missing agent in hierarchy, orphaned position | P1 alert. DLQ entry. Manual investigation required. |
| External service errors | Betfair API error, odds feed stale | Hedge queue absorbs Betfair errors. Stale odds suspend the market. |

Retry Policies

| Operation | Max Retries | Backoff | Circuit Breaker |
|---|---|---|---|
| Redis read | 2 | 50ms, 100ms | Opens after 5 failures in 10 seconds |
| Redis write | 2 | 50ms, 100ms | Opens after 5 failures in 10 seconds |
| PostgreSQL read | 2 | 100ms, 200ms | Opens after 3 failures in 30 seconds |
| PostgreSQL write (bet) | 1 | 100ms | No CB (every write is critical) |
| Betfair API | 3 | 1s, 2s, 4s | Opens after 5 failures in 60 seconds |
| Settlement processing | 3 | 1s, 5s, 30s | No CB (must eventually settle) |
| Audit write | 3 | 100ms, 500ms, 2s | Falls back to local WAL file |
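One way to apply a row of the retry table is sketched here. backoffMs lists the waits between attempts, so Max Retries equals its length (for example [50, 100] for a Redis read). The breaker opens after threshold failures within windowMs. The class and function names are hypothetical. The cooldown after which an open breaker lets calls through again is also an assumption, since the table does not say how a breaker recovers.

```typescript
class CircuitBreaker {
  private failures: number[] = [];
  private openedAt: number | null = null;

  constructor(private threshold: number, private windowMs: number, private cooldownMs: number,
              private now: () => number = Date.now) {}

  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) { // assumed recovery rule
      this.openedAt = null;
      this.failures = [];
      return false;
    }
    return true;
  }

  recordSuccess(): void { this.failures = []; }

  recordFailure(): void {
    const t = this.now();
    this.failures = this.failures.filter((f) => t - f < this.windowMs); // keep failures inside the window
    this.failures.push(t);
    if (this.failures.length >= this.threshold) this.openedAt = t;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs op once, then retries after each delay in backoffMs; fails fast while the breaker is open.
async function withRetry<T>(op: () => Promise<T>, backoffMs: number[], breaker?: CircuitBreaker): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= backoffMs.length; attempt++) {
    if (breaker?.isOpen()) throw new Error("circuit open: use the safe fallback (e.g. 100% forward)");
    try {
      const result = await op();
      breaker?.recordSuccess();
      return result;
    } catch (err) {
      lastError = err;
      breaker?.recordFailure();
      if (attempt < backoffMs.length) await sleep(backoffMs[attempt]);
    }
  }
  throw lastError;
}
```

For the rows marked "No CB" (bet writes and settlement processing), withRetry would simply be called without a breaker. Exhausted retries then surface as an error that the caller routes to the DLQ.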

Dead Letter Queue Integration

Any operation that fails after exhausting its retries is added to the DLQ:

DLQ Entry:
  source: "BET_PROCESSING" | "SETTLEMENT" | "HEDGE" | "RECONCILIATION"
  reference_id: The failed entity's ID
  error: The error message and stack trace
  payload: Full context needed to retry manually
  retry_count: How many times it was already retried
  max_retries: The configured maximum

The DLQ is monitored by the ops team. A P2 alert fires when any entry is added. Entries can be retried via admin API or resolved manually with notes.


56. Testing Strategy

Unit Test Coverage Targets

| Module | Coverage Target | Key Test Scenarios |
|---|---|---|
| MatrixResolutionModule | 95% | Wildcard matching, specificity tie-breaking, precedence chain order, missing rules fallback |
| CascadeEngineModule | 95% | 2-level cascade, 4-level cascade, limit overflow, suspended agent skip, NO_NEW_RISK hedge detection |
| LimitEnforcementModule | 95% | All limit types, most restrictive wins, exact boundary (limit - 1 paisa), period-aware checking |
| ExposureLedgerModule | 90% | Sharded increment, shard summation, Redis fallback, post-write validation |
| SettlementModule | 95% | WIN/LOSS/VOID settlement, idempotency, ledger decrement, re-settlement |
| StakeReductionModule | 95% | Per-click reduction, aggregate reduction, below-minimum rejection, edge case odds (1.01, 1000.00) |
| HedgeExecutionModule | 90% | Full fill, partial fill, no fill, re-pricing, stale cleanup, Betfair error handling |
| ReconciliationModule | 90% | Zero drift, minor drift auto-correct, major drift flagging, recompute tool |

Integration Test Scenarios

| Scenario | What It Tests | Expected Outcome |
|---|---|---|
| Full cascade bet placement | Bet flows from punter through 3 agents to Betfair | Positions created at all levels, exposure ledgers updated, audit trail complete, hedge order queued |
| Limit overflow cascade | Bet where Level 1 agent hits limit | Overflow correctly forwarded to Level 2, Level 1 retains only up to limit |
| NO_NEW_RISK with hedge | Agent in NO_NEW_RISK, opposite-side bet arrives | Hedge bet retained, exposure reduced; a non-hedge bet is not retained (forwarded 100%) |
| Stake reduction | High-odds bet exceeding per-click win limit | Stake reduced, punter receives reduced confirmation, positions reflect reduced stake |
| Settlement cascade | Event settles, positions across 3 agents | All positions settled, ledgers decremented, NO_NEW_RISK cleared if applicable |
| Matrix change mid-session | Agent changes matrix between two bets | First bet uses old matrix (captured version), second bet uses new matrix |
| Concurrent bets near limit | 10 simultaneous bets where agent is at 95% utilization | All bets processed, total does not exceed limit (post-write validation corrects any overshoot) |
| Betfair timeout | Hedge order placed but Betfair returns 503 | Order queued for retry, unhedged tracker updated, bet still accepted |
| Redis outage | Redis becomes unavailable mid-operation | System falls back to PostgreSQL, latency increases but correctness maintained |
| Agent suspension | Agent suspended mid-cascade | Bets skip the suspended agent and flow to the next level up |

Load Test Scenarios (IPL Peak Simulation)

| Scenario | Traffic Pattern | Success Criteria |
|---|---|---|
| Sustained peak | 167 bets/sec for 30 minutes | P99 latency < 90ms, zero errors, all positions correct |
| Burst spike | 500 bets/sec for 60 seconds | P99 latency < 200ms, error rate < 0.1%, all positions eventually correct (post-write corrections acceptable) |
| Settlement storm | 3 events settle simultaneously (10,000 positions) | Settlement completes within 5 minutes, no ledger drift, all agents notified |
| Hedge backlog | 200 hedge orders queued, Betfair at 2x normal latency | Queue drains within 10 minutes, no orders lost, unhedged tracker accurate |

Chaos Test Scenarios

| Scenario | How to Simulate | Expected Behavior |
|---|---|---|
| Redis primary down | Kill Redis process | Circuit breaker opens within 3 seconds. All reads fall back to PostgreSQL. Latency increases to 15-25ms. No data loss. |
| PostgreSQL primary down | Kill PostgreSQL process | All bet placement fails. Circuit breaker opens. 503 errors returned. Alert fires. |
| Betfair API down | Block outbound to Betfair endpoint | Hedge queue grows. Unhedged tracker increases. Bets still accepted. Platform absorbs risk. |
| Network partition between app and Redis | iptables rule | Same as Redis down, but Redis may still serve other instances. Instance-specific fallback. |
| Slow PostgreSQL (10x normal latency) | Add pg_sleep(0.05) to a connection | P99 latency increases. Some bets exceed 90ms budget. No data loss. Monitor alerts fire. |

57. Deployment Strategy

Docker Compose for Local Development

The local development environment runs all services in Docker Compose:

Services:
  app: Node.js application (3 instances for multi-instance testing)
  postgres: PostgreSQL 16 (single instance, no replicas locally)
  redis: Redis 7 (single instance)
  prometheus: Prometheus (metrics scraping)
  grafana: Grafana (dashboards)

Volumes:
  postgres_data: Persistent database storage
  redis_data: Persistent Redis storage (for testing persistence)

Networks:
  hannibal_net: Internal network for all services

Production Deployment

| Component | Infrastructure | Scaling |
|---|---|---|
| Application (3 instances) | Docker containers on a VM or managed container service | Horizontal: add instances, update load balancer |
| Background workers (2 instances) | Docker containers | Vertical: add CPU/RAM. Horizontal: add BullMQ consumer instances |
| PostgreSQL primary | Managed PostgreSQL (e.g., AWS RDS, DigitalOcean Managed DB) | Vertical: increase instance size. Horizontal: add read replicas |
| PostgreSQL read replicas (2) | Managed PostgreSQL replicas | Add more replicas for read scaling |
| Redis | Managed Redis (e.g., AWS ElastiCache, Redis Cloud) | Vertical: increase memory. Horizontal: Redis Cluster if needed |
| Load balancer | nginx or cloud ALB | Cloud ALB: managed, auto-scaling. nginx: scaled manually |
| Monitoring | Prometheus + Grafana on a dedicated VM | Single instance sufficient |

Feature Flag Rollout Process

  1. Develop feature behind feature flag (default: OFF)
  2. Deploy code to production (feature inactive)
  3. Enable flag for a single test agent (internal or friendly agent)
  4. Monitor for 24-48 hours. Check: latency, error rate, audit trail correctness
  5. Enable for 3-5 early adopter agents
  6. Monitor for 1 week. Check: P&L accuracy, settlement correctness, reconciliation results
  7. Enable for all agents (flag default becomes ON)
  8. After 2 weeks with no issues, remove the feature flag code (clean up)
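Steps 3 to 7 come down to a flag whose value depends on the agent. A minimal sketch, assuming a flag record with a global default plus per-agent opt-in and opt-out sets; the opt-out set is a hypothetical addition for rolling back a single agent, and `FeatureFlag`, `isEnabled` and the flag name are illustrative, not part of an existing module.

```typescript
// Hypothetical flag record; a real deployment would load this from the database or a flag service.
interface FeatureFlag {
  name: string;
  defaultOn: boolean;          // step 7 flips this to true for all agents
  enabledAgents: Set<string>;  // steps 3 and 5: explicit opt-ins while defaultOn is false
  disabledAgents: Set<string>; // hypothetical per-agent kill switch for rolling back one agent
}

function isEnabled(flag: FeatureFlag, agentId: string): boolean {
  if (flag.disabledAgents.has(agentId)) return false; // an explicit opt-out always wins
  if (flag.enabledAgents.has(agentId)) return true;
  return flag.defaultOn;
}

// Step 3: only the internal test agent sees the new cascade path.
const cascadeFlag: FeatureFlag = {
  name: "bbook_cascade",
  defaultOn: false,
  enabledAgents: new Set<string>(["internal-test-agent"]),
  disabledAgents: new Set<string>(),
};
```

Step 5 adds the early adopters to `enabledAgents`, step 7 sets `defaultOn`, and step 8 removes both the flag record and the `isEnabled` branches at call sites.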

58. Implementation Phases

Phase Dependencies

DATA MODELS ─────────────────┐
AUDIT TRAIL ─────────────────┤
USER WIN LIMITS ─────────────┤
FORWARDING MATRIX ───────────┤──── All independent, can be parallelized
EXPOSURE LEDGER (Redis) ─────┤
AGENT LIMITS ────────────────┘

CASCADE ENGINE ──────────────── Depends on: Matrix, Limits, Ledger
NO_NEW_RISK + HEDGE DETECTION ── Depends on: Cascade Engine, Exposure Ledger
PERIOD MANAGEMENT ───────────── Depends on: Limits, Ledger, NO_NEW_RISK
SETTLEMENT CASCADE ──────────── Depends on: Cascade Engine, Exposure Ledger
HEDGE EXECUTION ─────────────── Depends on: Cascade Engine (hedge orders)
RECONCILIATION ──────────────── Depends on: Exposure Ledger, Settlement
MONITORING + ALERTING ───────── Depends on: All modules (metrics from everywhere)
SUPPORT TOOLING ─────────────── Depends on: Audit Trail, All modules
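The dependency chart can be turned into a build order mechanically. A minimal sketch using a layered form of Kahn's topological sort: each wave holds the modules whose dependencies all sit in earlier waves, so everything within a wave can be built in parallel. Module keys abbreviate the chart's labels, and expanding "All modules" to every module except monitoring and support tooling (so those two do not depend on each other) is an assumption.

```typescript
// Module -> modules it depends on (keys abbreviate the chart's labels).
type Graph = Record<string, string[]>;

// Layered Kahn's algorithm: each wave contains modules whose dependencies are all in earlier waves.
function buildWaves(deps: Graph): string[][] {
  const pending = new Map<string, Set<string>>();
  Object.keys(deps).forEach((m) => pending.set(m, new Set<string>(deps[m])));
  const waves: string[][] = [];
  while (pending.size > 0) {
    const ready: string[] = [];
    pending.forEach((unmet, m) => {
      if (unmet.size === 0) ready.push(m);
    });
    // Nothing ready but modules remain: a cycle, or a dependency that is not a key.
    if (ready.length === 0) throw new Error("dependency cycle or unknown dependency");
    ready.sort();
    ready.forEach((m) => pending.delete(m));
    pending.forEach((unmet) => ready.forEach((done) => unmet.delete(done)));
    waves.push(ready);
  }
  return waves;
}

const FOUNDATION = ["DATA_MODELS", "AUDIT_TRAIL", "USER_WIN_LIMITS", "FORWARDING_MATRIX", "EXPOSURE_LEDGER", "AGENT_LIMITS"];
const CORE = FOUNDATION.concat(["CASCADE_ENGINE", "NO_NEW_RISK", "PERIOD_MANAGEMENT", "SETTLEMENT_CASCADE", "HEDGE_EXECUTION", "RECONCILIATION"]);

const deps: Graph = {
  CASCADE_ENGINE: ["FORWARDING_MATRIX", "AGENT_LIMITS", "EXPOSURE_LEDGER"],
  NO_NEW_RISK: ["CASCADE_ENGINE", "EXPOSURE_LEDGER"],
  PERIOD_MANAGEMENT: ["AGENT_LIMITS", "EXPOSURE_LEDGER", "NO_NEW_RISK"],
  SETTLEMENT_CASCADE: ["CASCADE_ENGINE", "EXPOSURE_LEDGER"],
  HEDGE_EXECUTION: ["CASCADE_ENGINE"],
  RECONCILIATION: ["EXPOSURE_LEDGER", "SETTLEMENT_CASCADE"],
  MONITORING_ALERTING: CORE, // "All modules"
  SUPPORT_TOOLING: CORE,     // "Audit Trail, All modules" (CORE includes AUDIT_TRAIL)
};
FOUNDATION.forEach((m) => (deps[m] = [])); // the six independent modules
```

For this graph the sketch yields five waves: the six foundation modules; the cascade engine; NO_NEW_RISK, settlement and hedge execution; period management and reconciliation; and finally monitoring and support tooling.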

MVP Definition (First Live Bet)

The absolute minimum to accept a live bet through the cascade:

  1. Agent and user tables populated
  2. One forwarding matrix rule per agent (catch-all wildcard)
  3. Agent limits configured (sport-level at minimum)
  4. Exposure ledger initialized (all zeros)
  5. Cascade engine processing a 2-level hierarchy (agent → platform)
  6. Positions created for both levels
  7. Audit trail recording the decision
  8. Settlement for a single market type (MATCH_ODDS)

NOT required for MVP: Redis caching (use PostgreSQL only), hedge execution (platform absorbs all risk), NO_NEW_RISK, periods, stake reduction, sharded counters, monitoring dashboards.
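MVP item 8 and the Week 7 SettlementModule both need settlement to be idempotent, so replaying a result cannot release the same exposure twice. A minimal sketch for MATCH_ODDS, assuming each position records its holder, the selection the punter backed, the stake share and decimal odds; the holder has laid the punter's back bet, so it pays `stake * (odds - 1)` if that selection wins and keeps the stake otherwise. The ledger here is a plain map of summed liability per holder (a simplification of the exposure ledger), and all names are hypothetical. Re-settlement, which would first reverse the recorded P&L, is not shown.

```typescript
interface BetPosition {
  id: string;
  holderId: string;    // agent or platform holding this slice of the bet
  selectionId: string; // runner the punter backed
  stake: number;       // share of the punter's stake held at this level
  odds: number;        // decimal odds
  settled: boolean;
  pnl: number;         // from the holder's side: positive means the holder won
}

// Liability the holder carries while the position is open.
const liability = (p: BetPosition): number => p.stake * (p.odds - 1);

// Settles every open position on a MATCH_ODDS market and returns how many this call settled.
// Re-running with the same result settles nothing and leaves the ledger unchanged.
function settleMatchOdds(positions: BetPosition[], winnerId: string, ledger: Map<string, number>): number {
  let settledNow = 0;
  positions.forEach((p) => {
    if (p.settled) return; // idempotency guard
    p.pnl = p.selectionId === winnerId ? -liability(p) : p.stake;
    p.settled = true;
    ledger.set(p.holderId, (ledger.get(p.holderId) ?? 0) - liability(p)); // release exposure
    settledNow += 1;
  });
  return settledNow;
}
```

In PostgreSQL the same guard would typically be a conditional update on the settled flag inside the transaction that decrements the ledger, so a retried settlement job is harmless.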

Phase 1: Foundation (Weeks 1-4)

| Week | Deliverables |
|---|---|
| 1 | Prisma schema migration: all tables defined above. Database seeded with test agents (Vikram, Rajesh, Priya) and test users (Amit, Sonia). Feature flag infrastructure. |
| 2 | ExposureLedgerModule: PostgreSQL-only reads and writes. No Redis yet. No sharding yet. Single counter per agent per scope. LimitEnforcementModule: Check all limit types, return max retainable amount. |
| 3 | MatrixResolutionModule: Full 5D wildcard matching with specificity tie-breaking. Precedence chain (user override > market override > matrix > default). ConfigModule: Load matrix rules from DB, cache in memory. |
| 4 | UserManagementModule: Per-click win cap check. Aggregate win cap check (PostgreSQL-based). StakeReductionModule. AuditModule: Synchronous audit writes (no buffering yet). |
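Week 3's MatrixResolutionModule combines wildcard matching with a precedence chain. A minimal sketch under stated assumptions: this section names only `source_type` among the five dimensions, so `sport`, `competition`, `marketType` and `betSide` are hypothetical placeholders; specificity is the count of non-wildcard fields that match; and ties at equal specificity go to the lowest rule id so resolution stays deterministic. The override arguments and the `makeRule` helper are likewise hypothetical.

```typescript
const WILDCARD = "*";
// Hypothetical dimension names: only source_type is named in this section.
const DIMS = ["sport", "competition", "marketType", "sourceType", "betSide"] as const;
type Dim = (typeof DIMS)[number];
type BetContext = Record<Dim, string>;
type MatrixRule = BetContext & { id: number; retentionPct: number };

// Number of concrete (non-wildcard) fields that match, or -1 if any concrete field disagrees.
function specificity(rule: MatrixRule, bet: BetContext): number {
  let score = 0;
  for (const d of DIMS) {
    if (rule[d] === WILDCARD) continue;
    if (rule[d] !== bet[d]) return -1;
    score += 1;
  }
  return score;
}

// Most specific matching rule wins; equal specificity falls back to the lowest rule id.
function resolveMatrix(rules: MatrixRule[], bet: BetContext): MatrixRule | undefined {
  let best: MatrixRule | undefined;
  let bestScore = -1;
  for (const rule of rules) {
    const score = specificity(rule, bet);
    if (score < 0) continue;
    if (score > bestScore || (score === bestScore && best !== undefined && rule.id < best.id)) {
      best = rule;
      bestScore = score;
    }
  }
  return best;
}

// Precedence chain: user override > market override > matrix > default.
function resolveRetentionPct(
  bet: BetContext,
  rules: MatrixRule[],
  defaultPct: number,
  userOverridePct?: number,
  marketOverridePct?: number,
): number {
  if (userOverridePct !== undefined) return userOverridePct;
  if (marketOverridePct !== undefined) return marketOverridePct;
  const rule = resolveMatrix(rules, bet);
  return rule !== undefined ? rule.retentionPct : defaultPct;
}

// Helper for building rules: unspecified dimensions default to the wildcard.
function makeRule(id: number, retentionPct: number, fields: Partial<BetContext>): MatrixRule {
  return {
    sport: WILDCARD, competition: WILDCARD, marketType: WILDCARD, sourceType: WILDCARD, betSide: WILDCARD,
    ...fields, id, retentionPct,
  };
}
```

A catch-all rule with every field set to the wildcard (the MVP's single rule per agent) matches every bet with specificity 0, so it only wins when nothing more specific applies.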

End of Phase 1: All building blocks exist but are not connected into a pipeline.
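The Week 4 win-cap checks and StakeReductionModule reduce to a small calculation: the punter's potential win on a back bet is `stake * (odds - 1)`, so a win cap W implies a maximum stake of `W / (odds - 1)`. A minimal sketch, assuming the effective cap is the smaller of the per-click cap and the remaining aggregate cap, that reduced stakes are rounded down to a whole currency unit, and that a reduced stake below a hypothetical `minStake` is rejected rather than accepted. `applyWinCaps` and the `WinCaps` fields are hypothetical names.

```typescript
interface WinCaps {
  perClickWinCap: number;   // max potential win on a single bet
  aggregateWinCap: number;  // max total potential win over the period
  openPotentialWin: number; // potential win already committed this period
}

type StakeDecision =
  | { kind: "ACCEPT"; stake: number }
  | { kind: "REDUCE"; stake: number; requested: number }
  | { kind: "REJECT"; reason: string };

function applyWinCaps(requested: number, odds: number, caps: WinCaps, minStake = 1): StakeDecision {
  if (odds <= 1) return { kind: "REJECT", reason: "invalid odds" };
  const remainingAggregate = Math.max(0, caps.aggregateWinCap - caps.openPotentialWin);
  const effectiveCap = Math.min(caps.perClickWinCap, remainingAggregate);
  if (requested * (odds - 1) <= effectiveCap) return { kind: "ACCEPT", stake: requested };
  // Largest whole-unit stake whose potential win stays within the cap.
  const maxStake = Math.floor(effectiveCap / (odds - 1));
  if (maxStake < minStake) return { kind: "REJECT", reason: "win cap reached" };
  return { kind: "REDUCE", stake: maxStake, requested };
}
```

For example, 1,000 at odds 3.0 against a 1,000 per-click cap would win 2,000, so the stake is reduced to 500; rounding down guarantees the reduced bet never exceeds the cap.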

Phase 2: Core Pipeline (Weeks 5-8)

| Week | Deliverables |
|---|---|
| 5 | CascadeEngineModule: Full N-level cascade with matrix resolution and limit checking at each level. Overflow handling. Suspended agent skip. BetProcessingModule: Orchestrates the entire pipeline from HTTP request to response. |
| 6 | Position creation. Exposure ledger updates (atomic with positions). End-to-end bet placement through 3 levels. Integration tests for the full pipeline. |
| 7 | SettlementModule: Event result processing. Position settlement (idempotent). Ledger decrement. Re-settlement support. |
| 8 | Redis integration: Exposure ledger reads from Redis. Cache invalidation via pub/sub. Safety margin logic (Gap A). Feature flag: enable cascade per agent. Parallel-run mode. |
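Week 5's CascadeEngineModule can be pictured as one pass up the hierarchy. A minimal sketch, assuming each level's retention percentage has already been resolved from its matrix and applies to the amount that reaches that level, that limits are expressed as remaining headroom in the same unit as the stake, and that the platform is the last stop and absorbs whatever is left. The MVP's two-level hierarchy (agent → platform) is the case of a single agent level. `Level`, `Slice` and `cascade` are hypothetical names.

```typescript
interface Level {
  agentId: string;
  retentionPct: number; // resolved from this agent's forwarding matrix (0-100)
  headroom: number;     // remaining limit at this level, same unit as the stake
  suspended: boolean;
}

interface Slice {
  agentId: string;
  retained: number;
  reason: "MATRIX" | "LIMIT_CAPPED" | "PLATFORM";
}

// Walks the hierarchy bottom-up. Each level keeps its matrix share, capped by its limit;
// everything else (forward share plus overflow) moves to the next level. Suspended agents are skipped.
function cascade(stake: number, levels: Level[], platformId = "platform"): Slice[] {
  const slices: Slice[] = [];
  let remaining = stake;
  for (const lvl of levels) {
    if (remaining <= 0) break;
    if (lvl.suspended) continue;
    const wanted = (remaining * lvl.retentionPct) / 100;
    const retained = Math.min(wanted, Math.max(0, lvl.headroom));
    if (retained > 0) {
      slices.push({ agentId: lvl.agentId, retained, reason: retained < wanted ? "LIMIT_CAPPED" : "MATRIX" });
    }
    remaining -= retained;
  }
  // Whatever no agent kept lands on the platform.
  if (remaining > 0) slices.push({ agentId: platformId, retained: remaining, reason: "PLATFORM" });
  return slices;
}
```

Recording a reason per slice gives the audit trail a direct answer to "why did this level keep this much?", which is what dispute handling needs later.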

End of Phase 2: The system can accept and settle bets through the full cascade. MVP is achievable.

Phase 3: Production Hardening (Weeks 9-12)

| Week | Deliverables |
|---|---|
| 9 | NO_NEW_RISK: Automatic trigger, hedge detection (worst-case liability comparison), scoped activation. Period management: Night and weekly periods, timezone handling, carry-forward logic. |
| 10 | HedgeExecutionModule: Betfair API integration, limit orders, partial fill handling, re-pricing, stale cleanup. Hedge order queue (Redis Stream). Unhedged exposure tracker. |
| 11 | Sharded exposure counters. Per-level atomicity for hot agents. Post-write validation and rollback (Gap A). Multi-instance cache coherency (Gap B). |
| 12 | ReconciliationModule: Scheduled 15-minute checks, post-settlement checks, recompute tool, discrepancy tracking. Dead letter queue with admin UI. |
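Week 9's hedge detection under NO_NEW_RISK compares worst-case liability before and after a bet. A minimal sketch for a single market, assuming the agent's book is held as net P&L per outcome (from the agent's side, covering every outcome in the market), that an incoming back bet with retained stake x at decimal odds o on selection s changes outcome s by -x(o - 1) and every other outcome by +x, and that a bet counts as risk-reducing when it does not make the worst outcome worse. Treating an unchanged worst case as acceptable is an assumption, and `Book`, `applyBackBet` and `isRiskReducing` are hypothetical names.

```typescript
// Net P&L for the agent if each outcome wins (negative = the agent pays out).
type Book = Record<string, number>;

// A punter backing `selection` at decimal `odds` with `stake` retained at this agent:
// the agent pays stake * (odds - 1) if that selection wins and keeps the stake otherwise.
function applyBackBet(book: Book, selection: string, stake: number, odds: number): Book {
  const next: Book = {};
  Object.keys(book).forEach((outcome) => {
    next[outcome] = book[outcome] + (outcome === selection ? -stake * (odds - 1) : stake);
  });
  return next;
}

// Worst-case outcome for the agent (the most negative P&L).
function worstCase(book: Book): number {
  return Math.min(...Object.keys(book).map((outcome) => book[outcome]));
}

// Under NO_NEW_RISK, a bet is allowed only if it does not make the worst outcome worse.
function isRiskReducing(book: Book, selection: string, stake: number, odds: number): boolean {
  return worstCase(applyBackBet(book, selection, stake, odds)) >= worstCase(book);
}
```

For a book of { A: -500, B: 200, Draw: 200 }, backing B for 100 at 3.0 moves A to -400, so it is accepted as a hedge, while backing A for 100 at 2.0 pushes A to -600 and is blocked.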

End of Phase 3: Production-ready for a controlled launch with select agents.

Phase 4: Scale and Polish (Weeks 13-16)

| Week | Deliverables |
|---|---|
| 13 | MonitoringModule: Prometheus metrics for all pipeline stages. Grafana dashboards (ops, reconciliation, agent health). Alertmanager integration with PagerDuty/Slack. |
| 14 | Support tooling: Bet lookup, audit trail visualization, re-simulate capability, dispute workflow. Agent dashboard enhancements: real-time exposure, traffic light view, WhatsApp integration. |
| 15 | Responsible gambling: Self-exclusion, session limits, reality checks, deposit limit hooks. Migration tooling: Parallel-run reports, per-agent cutover, rollback capability. |
| 16 | Load testing: Sustained peak (167 bets/sec), burst spike (500 bets/sec), settlement storm. Chaos testing: Redis down, PostgreSQL slow, Betfair down. Performance optimization based on load test results. |

End of Phase 4: Full system ready for IPL season launch.

Phase 5: Intelligence (Weeks 17+)

| Deliverable | Description |
|---|---|
| Sharp detection integration | CLV calculation, behavioral scoring, automatic classification updates feeding into forwarding matrix source_type |
| Cross-agent syndicate detection | Correlated bet analysis across partitions, real-time flagging |
| Execution quality analytics | Hedge slippage analysis, optimal slippage parameter tuning |
| Matrix optimization suggestions | Historical P&L analysis per matrix rule, recommendations for retention adjustment |
| Horizontal scaling implementation | Agent-affinity routing, cross-partition detection, load balancing (Gap F) |
| Audit trail tier migration | Hot/warm/cold storage with automated nightly migration (Gap E) |
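Phase 5's sharp detection starts from closing line value (CLV). One common definition, assumed here, compares the decimal odds a punter took with the closing odds on the same selection: CLV = taken / closing - 1, so a punter who backed at 2.20 a selection that closed at 2.00 beat the close by 10%. The sketch averages CLV across a punter's settled bets and flags the punter when the average exceeds a threshold; the 2% threshold, the 50-bet minimum and the `sharp`/`recreational` labels are hypothetical, not values from this document, and a real behavioral score would combine more signals.

```typescript
interface SettledBet {
  takenOdds: number;   // decimal odds the punter got
  closingOdds: number; // decimal odds on the same selection at market close
}

// CLV as a fraction: 0.10 means the punter beat the closing price by 10%.
const clv = (bet: SettledBet): number => bet.takenOdds / bet.closingOdds - 1;

function averageClv(bets: SettledBet[]): number {
  if (bets.length === 0) return 0;
  return bets.reduce((sum, bet) => sum + clv(bet), 0) / bets.length;
}

// Hypothetical classification that could feed the matrix's source_type dimension.
type SourceType = "sharp" | "recreational" | "unclassified";

function classifySource(bets: SettledBet[], minBets = 50, sharpThreshold = 0.02): SourceType {
  if (bets.length < minBets) return "unclassified"; // too few bets to judge
  return averageClv(bets) > sharpThreshold ? "sharp" : "recreational";
}
```

Requiring a minimum sample keeps a handful of lucky early prices from reclassifying a punter and silently changing which matrix rules apply to them.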

This completes the implementation architecture: every table, API, pipeline step, error case, and phase is documented. Read alongside the B-Book Architecture v2.0, it is intended to let an engineer or an LLM build the entire Hannibal B-Book system without needing to ask about design intent, data models, or processing logic. Where ambiguity existed, a decision was made and its reasoning documented.


This document is maintained by the Hannibal engineering and product teams. For questions, feedback, or proposed changes, contact the B-Book working group.