B-Book Architecture & Design Document v2.0
System: Hannibal | Date: February 2026 | Status: Living Document | Audience: Product, Engineering, Operations, Stakeholders
Table of Contents
- Executive Summary
- The Core Problem -- In Plain English
- The Forwarding Matrix -- The Brain of the System
- The Bet Flow -- Step by Step
- Cascading Upline Routing
- Agent Liability Limits
- User Win Limits & Stake Reduction
- NO_NEW_RISK Mode
- Period Definitions -- Night & Weekly
- Exposure Accounting
- Audit & Determinism
- What the Current Codebase Already Has (and What's Missing)
- The Nightmare Scenarios & How We Handle Them
- Operational Dashboard -- What Agents Need to See
- Performance Architecture
- Competitive Landscape
- Phased Rollout Plan
- Revenue Model
- Implementation Order (for Developers)
- The Bookie's Final Verdict
1. Executive Summary
What is the B-Book?
The B-Book is Hannibal's hierarchical, deterministic risk-management and routing engine. Think of it as the brain that sits between a punter placing a bet and the final destination of that bet's risk.
In traditional bookmaking, a "B-Book" means the bookie keeps the bet on their own books -- they take the other side of the punter's wager. If the punter loses, the bookie profits. If the punter wins, the bookie pays. The opposite is an "A-Book" where the bookie immediately hedges the bet on an exchange like Betfair, earning a small commission but taking no risk.
Hannibal's B-Book is more sophisticated than either approach. It is a routing engine that decides, for every single bet, what percentage stays at each level of an agent hierarchy and what percentage gets forwarded up the chain. It enforces limits automatically, cascades overflow intelligently, and maintains a complete audit trail of every decision.
What problem does it solve?
In sports betting agent networks -- prevalent across India, Southeast Asia, and Africa -- agents operate at different levels of a hierarchy. A master agent in Delhi manages sub-agents across several cities. Each sub-agent manages hundreds or thousands of punters. Every agent bears a different amount of risk based on their appetite, bankroll, and expertise.
Today, this risk allocation happens through manual spreadsheet management, WhatsApp groups, and phone calls. An agent might tell their upline "I'll keep 30% of cricket bets and forward you 70%." But there is no enforcement, no automatic cap management, and no audit trail. When disputes arise -- and they always arise -- there is no source of truth.
The B-Book automates all of this. It replaces manual negotiation with a configurable, enforceable, auditable system.
Why this matters
- For agents: Automatic enforcement of limits means they never accidentally take on more risk than they can afford. No more 3am phone calls during an IPL match because exposure got out of hand.
- For the platform: Deterministic routing means every rupee of every bet is accounted for. Disputes become trivially resolvable by replaying the decision trail.
- For punters: Faster bet acceptance, consistent limits, and transparent maximum stakes.
The key differentiator
What sets Hannibal apart from existing tools is the automated forwarding matrix with cascading upline routing and deterministic audit trails. No other platform in this market offers a multi-dimensional forwarding matrix that automatically resolves routing percentages, cascades overflow through an arbitrary-depth agent hierarchy, and produces a complete, replayable decision record for every bet.
2. The Core Problem -- In Plain English
The Agent Hierarchy: A Real Example
Let us meet the people in our system:
+-----------------------+
| Betfair Exchange |
| (External Hedge) |
+-----------+-----------+
|
+-----------+-----------+
| HANNIBAL |
| (The Platform) |
+-----------+-----------+
|
+-----------+-----------+
| VIKRAM |
| Master Agent, Delhi |
| Manages 12 sub-agents|
+-----------+-----------+
|
+-----------------+-----------------+
| |
+-----------+-----------+ +-----------+-----------+
| RAJESH | | PRIYA |
| Sub-Agent, Mumbai | | Sub-Agent, Bangalore |
| 200 cricket punters | | 150 football punters |
+-----------+-----------+ +-----------------------+
|
+---------+---------+
| AMIT | SONIA | ... 198 more punters
| Punter | Punter |
+---------+---------+
Vikram is a master agent based in Delhi. He has been in the betting business for 15 years. He has a strong bankroll and deep knowledge of cricket markets. He is comfortable retaining significant risk on IPL matches.
Rajesh is one of Vikram's sub-agents, operating out of Mumbai. He manages about 200 punters who mainly bet on cricket. Rajesh has a moderate bankroll. He wants to keep some risk (because that is where the profit is) but cannot afford to take on unlimited exposure.
Amit is one of Rajesh's punters. He is a regular cricket bettor who places bets between 500 and 50,000 on IPL matches.
When Amit Places a Bet: The Complete Journey
Amit opens his phone and places a bet: 10,000 on Mumbai Indians to win at odds 1.85 during the IPL.
Here is what needs to happen in the next 90 milliseconds:
- Can Amit even place this bet? Check his per-click win limit. At odds 1.85, a 10,000 stake means a potential win of 8,500. Is that within his limit?
- How much does Rajesh keep? The forwarding matrix says Rajesh forwards 40% of pre-match IPL match odds bets, so Rajesh keeps 6,000 of the 10,000 stake (and the corresponding 5,100 potential liability).
- Can Rajesh afford to keep that 6,000? Check Rajesh's cricket limits, his per-match limits, his night period limit. If any limit is breached, reduce what Rajesh keeps.
- What happens to the other 4,000? It goes up to Vikram. Vikram's own forwarding matrix says he retains 60% of what arrives at his level. So Vikram keeps 2,400 and forwards 1,600 to the platform.
- What does the platform do with the remaining 1,600? Hannibal may retain some and hedge the rest on Betfair, depending on its own risk appetite.
- Record everything. The complete decision chain -- every percentage, every limit check, every cap evaluation -- is persisted as an audit record.
The Fundamental Questions
Every single bet must answer these questions:
| Question | Who Answers It |
|---|---|
| How much risk does Rajesh keep? | Rajesh's forwarding matrix + his limits |
| How much risk does Vikram keep? | Vikram's forwarding matrix + his limits |
| How much risk does the platform keep? | Platform's risk configuration |
| How much gets hedged on Betfair? | Whatever remains after all agents have taken their share |
| What if someone's limits are breached? | Overflow cascades up to the next level |
| What if Betfair is unavailable? | Platform absorbs as retained risk, retries asynchronously |
3. The Forwarding Matrix -- The Brain of the System
What It Is
The forwarding matrix is a multi-dimensional lookup table that determines what percentage of each bet an agent retains versus forwards to their upline. It is the single most important configuration in the entire B-Book system.
Think of it like a spreadsheet where the rows represent different combinations of conditions, and the output is a single number: the forward percentage. If the matrix says "forward 60%," the agent keeps 40% and sends 60% up the chain.
The 5 Dimensions
Every bet has characteristics that determine how it should be routed. The matrix uses five dimensions to make this decision:
| Dimension | What It Means | Example Values |
|---|---|---|
| market_type | The type of bet being placed | MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE |
| sport_type | Which sport | CRICKET, FOOTBALL, TENNIS, KABADDI |
| event_phase | When in the event lifecycle | PRE_MATCH, IN_PLAY, APPROACHING_START |
| source_type | What kind of punter | NORMAL, SHARP, VIP, NEW_ACCOUNT |
| liquidity_band | How much exchange liquidity exists to hedge | HIGH, MEDIUM, LOW, NONE |
How Wildcard Matching Works
An agent does not need to define a rule for every possible combination. That would be thousands of rows. Instead, the matrix supports wildcards (shown as *), which mean "match anything."
Here is an example of Rajesh's forwarding matrix:
| Rule | market_type | sport_type | event_phase | source_type | liquidity_band | Forward % |
|---|---|---|---|---|---|---|
| R1 | FANCY | CRICKET | IN_PLAY | SHARP | * | 95% |
| R2 | FANCY | CRICKET | IN_PLAY | * | * | 70% |
| R3 | MATCH_ODDS | CRICKET | PRE_MATCH | * | HIGH | 40% |
| R4 | MATCH_ODDS | CRICKET | PRE_MATCH | * | LOW | 70% |
| R5 | MATCH_ODDS | CRICKET | IN_PLAY | * | * | 60% |
| R6 | * | CRICKET | * | SHARP | * | 90% |
| R7 | * | FOOTBALL | * | * | * | 80% |
| R8 | * | * | * | * | * | 50% |
Reading this table in plain English:
- R1: If a sharp user places an in-play cricket fancy bet, forward 95%. Rajesh keeps only 5% because sharp users on in-play fancies are the most dangerous bets in cricket.
- R3: For pre-match cricket match odds where exchange liquidity is high, forward only 40%. Rajesh keeps 60% because these are the safest bets -- they are easy to price and easy to hedge if needed.
- R7: For any football bet, forward 80%. Rajesh is not a football expert, so he keeps very little.
- R8: The catch-all rule. For anything not covered above, forward 50%.
Tie-Breaking Rules
What happens when a bet matches multiple rules? The system uses strict, deterministic tie-breaking:
- Most specific rule wins. A rule with fewer wildcards is more specific; specificity is counted as the number of non-wildcard dimensions. R1 (one wildcard) beats R2 (two wildcards).
- If specificity is equal, the higher forward percentage wins. This is the "risk-safe" default: when in doubt, forward more rather than less, so the agent is protected from accidental over-exposure.
- If the forward percentage is also equal, order deterministically by rule creation timestamp: the oldest rule wins. This ensures the same bet always resolves the same way.
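The wildcard matching and tie-breaking above can be sketched in a few lines. This is a minimal illustration, not the production schema; the `Rule` shape, field names, and integer timestamps are assumptions:

```python
from dataclasses import dataclass

DIMENSIONS = ("market_type", "sport_type", "event_phase", "source_type", "liquidity_band")

@dataclass(frozen=True)
class Rule:
    market_type: str        # '*' in any dimension means "match anything"
    sport_type: str
    event_phase: str
    source_type: str
    liquidity_band: str
    forward_pct: int        # percentage forwarded upline
    created_at: int         # creation timestamp, used as the final tie-break

def matches(rule: Rule, bet: dict) -> bool:
    return all(getattr(rule, d) in ("*", bet[d]) for d in DIMENSIONS)

def specificity(rule: Rule) -> int:
    # Specificity = number of non-wildcard dimensions
    return sum(getattr(rule, d) != "*" for d in DIMENSIONS)

def resolve(rules: list, bet: dict) -> Rule:
    candidates = [r for r in rules if matches(r, bet)]
    # Tie-breaks: 1) most specific; 2) higher forward % (risk-safe); 3) oldest rule
    return max(candidates, key=lambda r: (specificity(r), r.forward_pct, -r.created_at))
```

Against Rajesh's example matrix, Amit's pre-match cricket match odds bet with high liquidity resolves to R3 (forward 40%), while a sharp user's in-play cricket fancy resolves to R1 (forward 95%).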
Resolution Precedence Chain
The forwarding matrix is not the only thing that determines routing. There is a four-level precedence chain:
| Level | What It Is | When to Use It |
|---|---|---|
| User Override | A specific forward % for a specific punter | "This user is a known sharp -- forward 100% of their bets" |
| Market Override | A specific forward % for a specific event/market | "The CSK vs MI final is too big -- forward 90% of everything on this match" |
| Matrix Rule | The multi-dimensional lookup described above | Normal day-to-day operations |
| Agent Default | A single fallback percentage | "If nothing else matches, forward 50%" |
Sensible Defaults by Sport
These are the recommended starting ranges based on industry practice. New agents should start at the higher end of the forward range (lower retention) until they build confidence:
| Scenario | Recommended Retention | Why |
|---|---|---|
| Cricket in-play fancy (session/over runs) | 10-20% | Highest variance, hardest to price, stale odds risk |
| Cricket pre-match match odds | 40-60% | Well-priced, ample exchange liquidity, predictable markets |
| Cricket in-play match odds | 20-40% | More volatile than pre-match, but still hedgeable |
| Football Premier League pre-match | 50-70% | Deep liquidity, well-understood markets, strong pricing models |
| Football lower leagues pre-match | 20-40% | Less information, worse pricing, integrity risk |
| Tennis | 10-25% | Extremely volatile, retirement risk, low liquidity on most matches |
| Kabaddi | 5-15% | Thin markets, poor external pricing, limited hedge options |
Common Mistakes Agents Will Make
| Mistake | What Happens | How We Prevent It |
|---|---|---|
| Setting retention too high on in-play fancies | One bad session wipes out a week of profit | Warn when retention exceeds recommended range; require confirmation |
| No catch-all rule | Some bets have no matching rule and the system cannot route them | System requires a default rule; matrix always has a * / * / * / * / * fallback |
| Conflicting rules that they do not understand | Agent thinks rule X applies but rule Y wins due to specificity | Dashboard shows which rule matched for every bet; "test my matrix" dry-run tool |
| Copying another agent's matrix without understanding it | Matrix tuned for a big operator does not suit a small one | Template system with clear explanations; onboarding wizard |
4. The Bet Flow -- Step by Step
The Complete Flow
Real-Life Example: Walking Through the Numbers
The Bet: Amit places ₹10,000 on Mumbai Indians to beat Chennai Super Kings at decimal odds of 1.85, during IPL 2026, pre-match.
Step 1: Compute Metrics
| Metric | Calculation | Value |
|---|---|---|
| Stake | As submitted | ₹10,000 |
| Potential Win | Stake x (Odds - 1) = 10,000 x 0.85 | ₹8,500 |
| Liability (for the bookie) | Same as potential win for a back bet | ₹8,500 |
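The metrics in this table follow directly from the decimal odds. A small helper makes the arithmetic explicit (rounding to two decimals is an assumption for illustration):

```python
def bet_metrics(stake: float, decimal_odds: float) -> dict:
    """For a back bet, the bookie's liability equals the punter's potential win."""
    potential_win = round(stake * (decimal_odds - 1), 2)
    return {"stake": stake, "potential_win": potential_win, "liability": potential_win}
```

For Amit's bet, `bet_metrics(10_000, 1.85)` yields a potential win and liability of ₹8,500, matching the table.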
Step 2: User Win Cap Check
Amit has a per-click win limit of ₹50,000. His potential win of ₹8,500 is well within that limit. No action needed.
Amit also has an aggregate win limit of ₹2,00,000 per day. He has accumulated ₹45,000 in potential wins today. Adding ₹8,500 brings him to ₹53,500, which is still under the limit. Pass.
Step 3: Stake Reduction
Since Amit passed both win cap checks, no stake reduction is applied. His full ₹10,000 stake is accepted.
Step 4: Resolve Forwarding Percentage
The system evaluates Rajesh's forwarding matrix. The bet characteristics are:
- market_type: MATCH_ODDS
- sport_type: CRICKET
- event_phase: PRE_MATCH
- source_type: NORMAL (Amit is not flagged as sharp)
- liquidity_band: HIGH (MI vs CSK has deep exchange liquidity)
This matches Rule R3 from Rajesh's matrix: forward 40%. Rajesh retains 60%.
Step 5: Agent Cap Evaluation (Rajesh)
Rajesh retains 60% of ₹10,000 = ₹6,000 stake (₹5,100 liability).
Check Rajesh's limits:
- Cricket overall limit: ₹50,00,000. Currently used: ₹12,00,000. After this bet: ₹12,05,100. Still within limit.
- Per-match limit (this specific MI vs CSK match): ₹5,00,000. Currently used: ₹1,20,000. After: ₹1,25,100. Still within limit.
- Night period limit: ₹10,00,000. Currently used: ₹3,00,000. After: ₹3,05,100. Still within limit.
All limits pass. Rajesh retains the full ₹6,000 stake.
Step 6: Cascade to Vikram
The remaining ₹4,000 stake (40%) flows up to Vikram. Vikram's matrix says he retains 60% of cricket pre-match match odds. So Vikram retains ₹2,400 and forwards ₹1,600 to the platform.
Step 7: Platform Routing
The platform receives ₹1,600. Based on platform risk configuration, it retains ₹800 and hedges ₹800 on Betfair.
Step 8: Execute
All positions are created atomically:
| Entity | Retained Stake | Retained Liability | Forwarded |
|---|---|---|---|
| Rajesh | ₹6,000 | ₹5,100 | ₹4,000 → Vikram |
| Vikram | ₹2,400 | ₹2,040 | ₹1,600 → Platform |
| Platform | ₹800 | ₹680 | ₹800 → Betfair |
| Betfair | ₹800 (hedged) | -- | -- |
| Total | ₹10,000 | ₹7,820 (+ ₹680 hedged) | -- |
The stake always sums to the original ₹10,000. Nothing is created or destroyed; risk is simply distributed.
Step 9: Audit
A complete audit record is persisted containing: the original bet details, the matrix rule that matched at each level, every limit that was checked and its result, the final routing breakdown, timestamps for each step, and the total elapsed time.
5. Cascading Upline Routing
How Bets Flow Through the Hierarchy
The fundamental routing model is a cascade. A bet enters at the bottom of the agent hierarchy and flows upward. At each level, the agent retains what they can (based on their matrix and limits) and forwards the rest to their parent.
User (Amit) places ₹10,000 bet
|
v
+---[RAJESH: Level 1 Agent]---+
| Matrix says: forward 40% |
| Retains: ₹6,000 |
| Forwards: ₹4,000 |
+---------|--------------------+
|
v
+---[VIKRAM: Level 2 Agent]---+
| Matrix says: forward 40% |
| Retains: ₹2,400 |
| Forwards: ₹1,600 |
+---------|--------------------+
|
v
+---[HANNIBAL PLATFORM]-------+
| Config: retain 50% |
| Retains: ₹800 |
| Hedges: ₹800 |
+---------|--------------------+
|
v
+---[BETFAIR EXCHANGE]--------+
| Final backstop |
| Receives: ₹800 |
+------------------------------+
What Happens at Each Level
At every level in the chain, the system performs the same sequence:
- Resolve the source_type for this agent -- does this agent have their own classification for the user? If not, do they trust the downstream agent's classification? (See "Does Sharp Classification Travel Upline?" below)
- Resolve the forwarding percentage for this agent (using their matrix with the resolved source_type, overrides, or default)
- Calculate the retained amount = incoming stake x (1 - forward %)
- Check the agent's limits -- can they actually absorb that retained amount?
- If limits allow it: retain the calculated amount, forward the rest
- If limits would be breached: retain only up to the limit, forward the overflow as well
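The per-level sequence above can be sketched as a small function. This is a simplified model under stated assumptions: the several limit scopes from Section 6 are collapsed into a single remaining-capacity number, and the function names are illustrative:

```python
def route_at_level(incoming_stake: float, forward_pct: float, retain_capacity: float):
    """One level of the cascade: retain what the matrix allows, clamped by limits.
    Anything the agent cannot absorb overflows upward with the normal forward amount."""
    intended_retain = incoming_stake * (100 - forward_pct) / 100
    retained = min(intended_retain, retain_capacity)   # limit clamp
    forwarded = incoming_stake - retained              # includes any overflow
    return retained, forwarded

def cascade(stake: float, levels: list):
    """levels: list of (forward_pct, retain_capacity) from the lowest agent upward.
    Returns the retained amount per level plus the residual reaching the backstop."""
    breakdown = []
    remaining = stake
    for pct, cap in levels:
        kept, remaining = route_at_level(remaining, pct, cap)
        breakdown.append(kept)
    return breakdown, remaining
```

With the 4-level example's numbers (₹50,000 stake; Rajesh forwards 40% but has only ₹25,000 of capacity left; Vikram forwards 40% with ample capacity; the platform retains 50%), this reproduces the retained amounts of ₹25,000 / ₹15,000 / ₹5,000 and a ₹5,000 hedge, always summing back to the original stake.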
A 4-Level Cascade with Actual Numbers
A more complex scenario: Amit bets ₹50,000 on CSK to win at odds 2.10.
Potential win: ₹55,000. Liability: ₹55,000.
AMIT bets ₹50,000
|
v
RAJESH (L1) - Forward 40%
Wants to retain: ₹30,000 (60%)
Cricket match limit remaining: ₹25,000 <-- LIMIT HIT
Actually retains: ₹25,000
Forwards: ₹25,000 (the intended ₹20,000 + ₹5,000 overflow)
|
v
VIKRAM (L2) - Forward 40%
Receives: ₹25,000
Wants to retain: ₹15,000 (60%)
All limits OK
Actually retains: ₹15,000
Forwards: ₹10,000
|
v
PLATFORM (L3)
Receives: ₹10,000
Retains: ₹5,000
Hedges: ₹5,000
|
v
BETFAIR (L4)
Receives: ₹5,000 hedge order
Notice how Rajesh's overflow (₹5,000 he could not absorb because of his per-match limit) cascaded up to Vikram. Vikram did not "know" this was overflow -- he simply received ₹25,000 instead of ₹20,000 and processed it through his own matrix and limits.
What Happens When a Mid-Tier Agent Hits Their Cap?
If Vikram had also hit his limit in the example above, the overflow would continue cascading to the platform. The system guarantees that every rupee of every bet ends up somewhere. The cascade never drops a bet. It simply moves unclaimed risk upward until it reaches the platform, which is the final agent in the chain.
What Happens When a Parent Is Suspended?
If Vikram is suspended (say, for a payment dispute), his children cannot forward bets to him. In this scenario:
- Rajesh can only retain bets up to his own limits
- Any amount that would normally be forwarded to Vikram is instead forwarded directly to the platform
- The platform absorbs the extra flow or hedges it on Betfair
The system treats a suspended agent as a "skip" in the chain, not a blockage. Punters can still bet. The risk simply routes around the suspended agent.
The Betfair Backstop
After all agents in the hierarchy have taken their share, any remaining exposure reaches the platform. The platform can retain some of this risk, but it always has the option to hedge on Betfair.
Betfair acts as the final backstop. It is the entity of last resort that absorbs whatever risk no agent in the hierarchy was willing or able to keep.
Edge Case: Betfair Is Down
If Betfair's API is unavailable (which happens during peak traffic or maintenance), the platform cannot hedge. In this case:
- The platform absorbs the would-be-hedged amount as retained risk temporarily
- The bet is still accepted (we do not reject punter bets because of a hedge-side issue)
- The hedge order is placed in an async retry queue
- When Betfair comes back online, the hedge is executed
- If the event settles before the hedge is placed, the platform simply bears that risk as if it had been deliberately retained
This is a conscious design decision: punter experience is never degraded by infrastructure issues on the hedge side.
Does Sharp Classification Travel Upline?
This is one of the most important design questions in the entire cascade. When Rajesh tags Amit as a sharp user and forwards 95% of his bet to Vikram, does Vikram know that Amit is sharp?
The answer is: the information travels, but each agent decides independently.
The Problem with Simple Approaches
If sharp info always travels: Rajesh tags Amit as sharp. Vikram's matrix also sees source_type=SHARP and forwards 90%. The platform sees SHARP and forwards 95% to Betfair. Sharp bets rocket through the entire hierarchy in milliseconds. But here is the problem: Rajesh's "sharp" might be Vikram's "normal." Rajesh has loose prices and 200 punters -- anyone who consistently wins against him looks sharp. Vikram, with 2,000 punters and tighter pricing, might profitably retain that exact same flow. Blindly propagating sharp flags means Vikram loses profitable volume because of Rajesh's weaker pricing.
If sharp info never travels: Vikram receives forwarded flow from Rajesh and treats it all as NORMAL. But some of that flow is genuinely toxic -- sharp syndicate members that Rajesh correctly identified. Vikram unknowingly retains it, and his P&L suffers. This is exactly the bookie's nightmare scenario: "rogue agent dumping toxic flow upline."
The Hybrid Design
Each forwarded bet carries metadata about the originating agent's classification, but each upline agent makes their own independent decision about how to treat it.
What travels with the bet:
| Metadata Field | Description | Example |
|---|---|---|
| originating_user_id | The punter who placed the bet | user_amit_4521 |
| originating_agent_id | The first agent in the chain | rajesh_mumbai |
| downstream_classification | What the originating agent classified the user as | SHARP |
| forwarding_reason | Why it was forwarded at each level | SHARP_USER, MATRIX_RULE, CAPACITY_BREACH |
How each upline agent uses this information:
The resolution order at each upline level is:
- Own classification wins first. If Vikram has independently tagged Amit as SHARP (or NORMAL, or VIP), that classification is used regardless of what Rajesh thinks. Vikram's own data is the most relevant to Vikram's book.
- Configurable trust in downstream flags. Each agent can configure, per sub-agent, whether to trust their sharp classifications. Vikram might set trust_downstream_flags: true for Rajesh (whose sharp detection he trusts) but false for a newer sub-agent whose judgment he has not validated.
- Default to NORMAL if no other signal. If the upline has no opinion and does not trust the downstream flag, the bet is treated as normal flow. This is the safe default for the upline's book -- they apply their standard matrix rules.
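This resolution order can be sketched as a three-branch function (a minimal illustration; the parameter names are assumptions):

```python
def resolve_source_type(own_classification, downstream_flag, trust_downstream: bool) -> str:
    """Resolution order for an upline agent:
    1) the agent's own classification of the punter always wins;
    2) else the downstream flag, if this sub-agent's flags are trusted;
    3) else treat the flow as NORMAL."""
    if own_classification is not None:
        return own_classification
    if trust_downstream and downstream_flag is not None:
        return downstream_flag
    return "NORMAL"
```

The three scenarios below map directly onto the three branches: Scenario A takes branch 2, Scenario B takes branch 1, and Scenario C falls through to branch 3.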
Real-Life Example: The Same Bet, Three Different Outcomes
Setup: Amit is tagged as SHARP by Rajesh. Amit bets ₹20,000 on MI to win at 1.85. Rajesh forwards 95% (₹19,000) to Vikram.
Scenario A -- Vikram trusts Rajesh's flags:
Vikram's config: trust_downstream_flags: true for Rajesh. Vikram's matrix: SHARP source on cricket → forward 80%. So Vikram forwards 80% of ₹19,000 = ₹15,200 to the platform. Vikram retains ₹3,800.
Scenario B -- Vikram has his own classification: Vikram has independently analyzed Amit's betting across all sub-agents and classified him as NORMAL (Amit's edge disappears at Vikram's sharper prices). Vikram's matrix: NORMAL source on cricket pre-match → forward 40%. Vikram retains 60% of ₹19,000 = ₹11,400.
Scenario C -- Vikram ignores downstream flags:
Vikram's config: trust_downstream_flags: false for Rajesh. No own classification for Amit. Amit is treated as NORMAL. Same outcome as Scenario B.
Why This Design Is Correct
Each level of the hierarchy has different information and different risk tolerance. A user who is sharp at the sub-agent level (beating loose prices) may not be sharp at the master agent level (where prices are tighter). A user who looks normal to a sub-agent might be part of a syndicate that only the master agent can see (because the master agent has cross-agent visibility).
The audit trail records everything. For every forwarded bet, the audit record shows: what the originating agent classified the user as, what the upline agent's resolution was, and why. When Vikram asks "why did I retain a sharp user's bet?", the answer is clear: "Your config ignores downstream flags from Rajesh, your own detection had not flagged this user, so the bet was treated as NORMAL."
Cross-agent sharp detection fills the gap. The platform has visibility across ALL agents. If Amit is betting through three different sub-agents under Vikram, the platform-level sharp detection can flag this pattern and push a classification down to Vikram -- independent of what any individual sub-agent thinks. This is why Section 17's Phase 3 includes "cross-agent sharp detection" as a key deliverable.
Configuration Per Sub-Agent
Each agent configures trust settings per sub-agent:
| Sub-Agent | trust_downstream_flags | Reason |
|---|---|---|
| Rajesh (Mumbai) | true | Experienced, reliable sharp detection, 8 years track record |
| Arun (Bangalore) | false | New sub-agent, unproven detection, only 3 months on platform |
| Sanjay (Chennai) | true | Good track record, conservative flagging |
This means Vikram can gradually extend trust as sub-agents prove their detection quality -- much like how the real-world agent relationship works. You trust experienced partners more than new ones.
6. Agent Liability Limits
The Limit Structure
Every agent can configure limits at multiple levels of granularity. The purpose of limits is to ensure an agent never accidentally takes on more risk than their bankroll can support.
| Limit Type | Scope | Example |
|---|---|---|
| Sport Limit | Total liability across all events in a sport | "I can handle ₹50 lakh total cricket exposure" |
| Market Limit | Total liability on a specific event or market | "No more than ₹5 lakh on any single IPL match" |
| Night Period Limit | Total liability accumulated during the night window | "Cap my night session at ₹10 lakh" |
| Weekly Period Limit | Total liability accumulated during the weekly cycle | "Cap my weekly exposure at ₹1 crore" |
Real Example: Rajesh's Limit Configuration
Rajesh sets up the following limits for cricket:
RAJESH'S CRICKET LIMITS
========================
Sport-Level Limit (Cricket): ₹50,00,000 (₹50 lakh)
|
+-- Per-Match Limit: ₹5,00,000 (₹5 lakh per individual match)
|
+-- Night Period Limit: ₹10,00,000 (₹10 lakh between 7pm-2am IST)
|
+-- Weekly Period Limit: ₹40,00,000 (₹40 lakh Monday-Sunday)
How Limits Interact: The Most Restrictive Wins
When a bet arrives, all applicable limits are checked simultaneously. The most restrictive limit determines how much the agent can retain.
Example: It is Thursday night during IPL week. Rajesh's current state:
| Limit | Capacity | Used | Remaining |
|---|---|---|---|
| Cricket Sport Limit | ₹50,00,000 | ₹38,00,000 | ₹12,00,000 |
| MI vs CSK Match Limit | ₹5,00,000 | ₹4,50,000 | ₹50,000 |
| Night Period Limit | ₹10,00,000 | ₹9,20,000 | ₹80,000 |
| Weekly Period Limit | ₹40,00,000 | ₹35,00,000 | ₹5,00,000 |
A new bet wants to add ₹1,00,000 of retained liability. Looking at the remaining capacity:
- Sport: ₹12,00,000 available -- sufficient
- Match: ₹50,000 available -- NOT sufficient
- Night: ₹80,000 available -- NOT sufficient
- Weekly: ₹5,00,000 available -- sufficient
The most restrictive limit is the match limit at ₹50,000. So Rajesh can only retain ₹50,000 of the ₹1,00,000. The remaining ₹50,000 overflows and cascades upward to Vikram.
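The "most restrictive wins" check reduces to taking the minimum remaining capacity across all applicable scopes. A minimal sketch, with an illustrative function name and a flat dict of remaining capacities:

```python
def max_retainable(requested: float, remaining_capacities: dict):
    """All applicable limits are checked together; the smallest remaining
    capacity bounds what the agent can retain. Anything above it overflows."""
    allowed = min(remaining_capacities.values())
    retained = min(requested, allowed)
    overflow = requested - retained
    return retained, overflow
```

With the Thursday-night state above (sport ₹12,00,000, match ₹50,000, night ₹80,000, weekly ₹5,00,000 remaining) and a requested ₹1,00,000, the match limit binds: Rajesh retains ₹50,000 and ₹50,000 overflows to Vikram.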
Limit Hierarchy Table
| Priority | Limit | Checked When | Resets |
|---|---|---|---|
| 1 (most restrictive wins) | Per-Match Limit | Every bet on that specific match | When match settles |
| 2 | Night Period Limit | Every bet during night window | At night period end |
| 3 | Weekly Period Limit | Every bet during the week | Monday start of day |
| 4 | Sport Limit | Every bet in that sport | Rolling / manual reset |
7. User Win Limits & Stake Reduction
Per-Click Win Limit
The per-click win limit caps the maximum amount a punter can win on a single bet. This protects agents from large individual payouts.
How it works: If a punter bets at high odds, the potential win could be enormous. The per-click win limit ensures that no single bet can produce a payout above the configured threshold.
Example: Amit has a per-click win limit of ₹50,000.
| Bet | Odds | Stake | Potential Win | Within Limit? |
|---|---|---|---|---|
| MI to win | 1.85 | ₹10,000 | ₹8,500 | Yes |
| Kohli top bat | 5.00 | ₹15,000 | ₹60,000 | No -- exceeds ₹50,000 |
| Fancy: over 180 runs | 50.00 | ₹5,000 | ₹2,45,000 | No -- far exceeds |
Aggregate Win Limit
The aggregate win limit caps the total cumulative potential wins a punter can accumulate over a configurable period (typically daily). This protects against a punter placing many winning bets that individually pass the per-click limit but collectively create enormous exposure.
Example: Amit has a daily aggregate win limit of ₹2,00,000.
He has already placed bets today with a total potential win of ₹1,85,000. His next bet has a potential win of ₹25,000. Since ₹1,85,000 + ₹25,000 = ₹2,10,000, which exceeds the ₹2,00,000 limit, the bet must be reduced or rejected.
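The aggregate check itself is a single comparison on the running total (the helper name is illustrative):

```python
def within_aggregate_limit(accumulated_potential_wins: float,
                           new_potential_win: float,
                           daily_limit: float) -> bool:
    """Pass only if the running total of potential wins stays within the daily cap."""
    return accumulated_potential_wins + new_potential_win <= daily_limit
```

Amit's earlier bet passes (₹45,000 + ₹8,500 ≤ ₹2,00,000), while the example here fails (₹1,85,000 + ₹25,000 > ₹2,00,000) and triggers stake reduction.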
How Stake Reduction Works
When a bet exceeds a win limit, the system does not reject it outright. Instead, it reduces the stake to the maximum amount that keeps the potential win within the limit. This is better for the punter (they still get to bet) and better for the agent (they still get action).
Real Example: High Odds Scenario
Sonia wants to bet ₹5,000 on a fancy market at odds of 50.00. Her per-click win limit is ₹50,000.
| Step | Calculation |
|---|---|
| Potential win at full stake | ₹5,000 x (50.00 - 1) = ₹2,45,000 |
| Exceeds per-click limit? | Yes: ₹2,45,000 > ₹50,000 |
| Maximum allowable win | ₹50,000 |
| Reduced stake | ₹50,000 / (50.00 - 1) = ₹50,000 / 49 = ₹1,020 (rounded down) |
| Verify | ₹1,020 x 49 = ₹49,980 which is under ₹50,000 |
What the Punter Sees
The punter sees a message like:
Maximum stake at these odds: ₹1,020
The message is transparent about the cap (the punter knows their stake was limited) but opaque about the reason (we do not say "your win limit is ₹50,000" because that reveals the agent's risk configuration).
Below Minimum Stake
If the reduced stake falls below the minimum allowed bet size (say, ₹100), the bet is rejected entirely with a clear message:
This market is currently unavailable at these odds.
This avoids the absurdity of accepting a ₹3 bet.
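The reduction rule, including the minimum-stake rejection, can be sketched as follows. The ₹100 minimum is the assumed value from the text's example, and the function name is illustrative:

```python
import math

MIN_STAKE = 100  # assumed minimum bet size, per the example above

def reduce_stake(stake: float, decimal_odds: float, per_click_win_limit: float):
    """Reduce the stake so the potential win stays within the per-click limit.
    Returns the accepted stake, or None if the reduced stake falls below the minimum."""
    potential_win = stake * (decimal_odds - 1)
    if potential_win <= per_click_win_limit:
        return stake                                                # no reduction needed
    reduced = math.floor(per_click_win_limit / (decimal_odds - 1))  # round down
    return reduced if reduced >= MIN_STAKE else None
```

Sonia's bet reproduces the worked table: `reduce_stake(5_000, 50.0, 50_000)` returns ₹1,020, and 1,020 × 49 = ₹49,980 stays under the ₹50,000 cap.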
Sharp User Detection Signals
Agents need to identify "sharp" users -- punters who consistently beat the closing line and generate long-term losses for the bookie. Sharp detection informs the source_type dimension of the forwarding matrix.
The key signals that indicate a user may be sharp:
| Signal | What It Means | Why It Matters |
|---|---|---|
| Closing Line Value (CLV) | The user consistently bets at prices better than where the market closes | This is the single strongest predictor of long-term profitability. A user with positive CLV over 500+ bets is almost certainly sharp. |
| Consistent staking | Same stake size regardless of odds or confidence | Recreational punters vary stakes; professionals use flat staking to disguise their edge |
| Early betting | Regularly bets within the first hour of a market opening | Early markets are softest; sharp users exploit them before prices adjust |
| Unpopular markets | Frequently bets on obscure leagues, low-tier events | These markets have the weakest pricing and the most exploitable edges |
| No mean reversion | Profits do not revert to average over time | Lucky punters revert; skilled punters sustain their edge |
| Rapid market movement after their bet | Price moves sharply in their direction after they bet | Indicates they are consistently on the right side of information |
8. NO_NEW_RISK Mode
What It Is
NO_NEW_RISK is a protective mode that activates when an agent's retained liability reaches their configured cap for a given scope (sport, market, or period). When active, the agent cannot take on any new risk-increasing exposure, but hedge bets are still accepted.
Think of it like a credit card limit. Once you hit your limit, you cannot make new purchases, but you can still make payments (which reduce what you owe).
What Triggers It
NO_NEW_RISK is triggered automatically when:
Agent's retained open liability >= configured limit for that scope
For example, if Rajesh's cricket night limit is ₹10,00,000 and his current retained cricket night liability is ₹10,00,000, he enters NO_NEW_RISK for cricket during the night period.
The Scope Is Granular
NO_NEW_RISK is not a blanket shutdown. It is scoped per sport and per market:
| Scenario | Cricket Status | Tennis Status | Football Status |
|---|---|---|---|
| Rajesh hits cricket limit | NO_NEW_RISK | Normal | Normal |
| Rajesh hits MI vs CSK match limit | NO_NEW_RISK (this match only) | Normal | Normal |
| Rajesh hits night period limit | NO_NEW_RISK (all sports in night) | NO_NEW_RISK | NO_NEW_RISK |
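The scoped trigger reduces to a pure check over per-scope ledgers. A sketch (the string scope keys like `"sport:cricket"` and the `ScopedRisk` shape are illustrative; the actual data model may differ):

```typescript
// NO_NEW_RISK is evaluated per scope key, never as a blanket shutdown.
type ScopeKey = string; // e.g. "sport:cricket", "match:MI-CSK", "period:night"

interface ScopedRisk {
  retainedOpenLiability: Map<ScopeKey, number>;
  limits: Map<ScopeKey, number>;
}

// A scope is in NO_NEW_RISK when retained liability has reached its configured
// limit. Scopes with no configured limit are unrestricted.
function noNewRiskScopes(risk: ScopedRisk): ScopeKey[] {
  const active: ScopeKey[] = [];
  for (const [scope, limit] of risk.limits) {
    const used = risk.retainedOpenLiability.get(scope) ?? 0;
    if (used >= limit) active.push(scope);
  }
  return active;
}
```

In Rajesh's example, `"sport:cricket"` at ₹10,00,000 of a ₹10,00,000 limit trips the flag while tennis and football stay open.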
How Hedge Detection Works
The critical question in NO_NEW_RISK mode is: "Does this bet reduce or increase the agent's worst-case liability?" A hedge bet reduces liability and should be accepted even when the agent is at their limit.
The rule is simple:
If WorstCaseLiability AFTER the bet < WorstCaseLiability BEFORE the bet, it is a hedge.
Real Examples of Hedge vs Non-Hedge Bets
Scenario: Rajesh is in NO_NEW_RISK for the MI vs CSK match. He currently has ₹5,00,000 of retained liability backing MI to win.
| Incoming Bet | Effect on Liability | Hedge? | Accepted? |
|---|---|---|---|
| ₹2,00,000 more on MI to win | Liability increases to ₹7,00,000 | No -- increases worst case | Rejected (forwarded 100% to upline) |
| ₹3,00,000 on CSK to win | Liability decreases because it offsets the MI position | Yes -- reduces worst case | Accepted |
| ₹1,00,000 on Draw | Partially reduces MI exposure depending on odds | Depends -- compute the actual worst case | Accepted if worst case decreases |
Key insight: A bet on the opposite outcome of an existing position is almost always a hedge. A bet on the same outcome is never a hedge. A bet on a third outcome (like a draw) may or may not be a hedge depending on the amounts and odds.
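The before/after comparison can be sketched for a single market with mutually exclusive outcomes. A minimal model (the `RetainedBet` shape is an assumption; the production worst-case calculation would also account for forwarded portions and multi-market correlation):

```typescript
// The agent's book on one market: bets punters have placed, retained by the agent.
interface RetainedBet { outcome: string; stake: number; odds: number }

// Net amount the agent pays out if a given outcome wins:
// pays stake * (odds - 1) to winning punters, keeps losing punters' stakes.
function netPayoutIf(outcome: string, bets: RetainedBet[]): number {
  let net = 0;
  for (const b of bets) {
    net += b.outcome === outcome ? b.stake * (b.odds - 1) : -b.stake;
  }
  return net;
}

// Worst-case liability = the most the agent can lose across all outcomes.
function worstCase(bets: RetainedBet[], outcomes: string[]): number {
  return Math.max(...outcomes.map((o) => netPayoutIf(o, bets)));
}

// The rule from the text: a hedge strictly reduces the worst case.
function isHedge(bets: RetainedBet[], candidate: RetainedBet, outcomes: string[]): boolean {
  return worstCase([...bets, candidate], outcomes) < worstCase(bets, outcomes);
}
```

Running Rajesh's scenario through this: a ₹5,00,000 MI position gives a worst case of ₹5,00,000; a ₹3,00,000 CSK bet drops it to ₹2,00,000 (hedge), while more MI money raises it (not a hedge). A third outcome like the draw is decided the same way: compute the new maximum and compare.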
How the Agent Exits NO_NEW_RISK
There are three ways out of NO_NEW_RISK:
1. Settlements reduce exposure. When a match settles, the liability associated with that match is removed. If this brings the agent back below their limit, NO_NEW_RISK is lifted.
2. Hedge bets reduce exposure. Accepting opposite-side bets reduces worst-case liability. Enough hedges can bring the agent below the limit.
3. Admin raises the limit. If Rajesh calls his platform admin and says "I'm comfortable taking more cricket risk this week," the admin can raise his limit. This immediately lifts NO_NEW_RISK if the current liability is now below the new limit.
9. Period Definitions -- Night & Weekly
Why Bookies Use Periods
Bookies do not think in terms of "total lifetime exposure." They think in operational windows:
- The night session is when most live betting happens (evening matches in India, late-night football in Europe). The risk profile during a night session is very different from a quiet afternoon.
- The weekly cycle aligns with settlement cycles. Most agents settle weekly. They need to know their maximum weekly exposure.
Periods provide a way to set separate limits for separate time windows, which mirrors how bookies actually operate.
Night Period
The night period is a configurable time window per agent. It typically covers the peak betting hours for that agent's primary market.
| Agent | Timezone | Night Period | Why |
|---|---|---|---|
| Rajesh (India, cricket) | IST (UTC+5:30) | 7:00 PM - 2:00 AM | IPL matches start at 7:30 PM |
| Priya (India, football) | IST (UTC+5:30) | 10:00 PM - 4:00 AM | Premier League matches start at 12:30 AM IST |
| Kwame (Ghana, football) | GMT | 2:00 PM - 11:00 PM | Afternoon and evening matches |
Weekly Period
The weekly period is a Monday-to-Sunday cycle (configurable start day per agent). At the start of each week, the weekly exposure counter resets to zero.
Timezone Handling
Each agent operates in their own timezone. This is critical because:
- Rajesh's "night" starts at 7:00 PM IST, which is 1:30 PM UTC
- The system stores all times in UTC internally but converts to the agent's local timezone when evaluating period boundaries
- A bet placed at 1:45 PM UTC is "night" for Rajesh but "afternoon" for a UK-based agent
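Evaluating a UTC timestamp against an agent's local-time window can be sketched with the built-in `Intl.DateTimeFormat` (a minimal sketch; it assumes the runtime ships full ICU timezone data, and it handles windows that cross midnight like Rajesh's 19:00-02:00):

```typescript
// Hour of day (0-23) for a UTC instant, in the agent's own timezone.
function localHour(utc: Date, timeZone: string): number {
  const hour = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "2-digit",
    hour12: false,
  }).format(utc);
  return Number(hour) % 24; // some runtimes print midnight as "24"
}

// Is this instant inside the agent's night period? The window is defined in
// local time and may cross midnight (startHour > endHour).
function inNightPeriod(utc: Date, timeZone: string, startHour: number, endHour: number): boolean {
  const h = localHour(utc, timeZone);
  return startHour < endHour
    ? h >= startHour && h < endHour  // window within a single day
    : h >= startHour || h < endHour; // window crosses midnight
}
```

Because the conversion goes through the timezone database each time, DST shifts are absorbed automatically, which is exactly the behavior the DST section below relies on.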
The Period Rollover Problem
What happens when a live match spans a period boundary? Consider:
- MI vs CSK starts at 7:30 PM IST (within Rajesh's night period)
- The match runs late and extends past 2:00 AM IST (Rajesh's night period end)
- Rajesh has ₹8,00,000 of retained liability on this match at 1:55 AM
The design choice: carry-forward exposure.
When a period ends, open exposure from events that are still live is carried forward into the new period. This means:
- Rajesh's ₹8,00,000 is NOT magically zeroed out at 2:00 AM
- Instead, it carries forward as a starting balance for the next period (or the "day" period if night has ended)
- New bets after 2:00 AM are counted against the day period limits
- The carried-forward amount from the night period continues to count against the sport-level limit
The alternative -- a clean reset that ignores ongoing exposure -- would be dangerous. An agent could circumvent limits by waiting for a period boundary.
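The carry-forward rule itself is small. A sketch (the `EventExposure` shape is illustrative):

```typescript
// At a period boundary, exposure from events that are still live becomes the
// opening balance of the new period; settled or closed events do not carry.
interface EventExposure { eventId: string; live: boolean; retainedLiability: number }

function rolloverOpeningBalance(exposures: EventExposure[]): number {
  return exposures
    .filter((e) => e.live)
    .reduce((sum, e) => sum + e.retainedLiability, 0);
}
```

Rajesh's ₹8,00,000 on the still-live MI vs CSK match survives the 2:00 AM boundary; exposure on matches that settled during the night does not.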
The DST Edge Case
Daylight Saving Time creates a subtle problem. When clocks spring forward, a night period configured as 7 PM to 2 AM spans only 6 hours of UTC instead of 7. When clocks fall back, it spans 8 hours.
The solution: Period boundaries are defined in the agent's local time, and the system converts them to UTC fresh each day, accounting for DST. A "7 PM to 2 AM" night period always means 7 PM to 2 AM in the agent's local clock, regardless of DST transitions.
On the actual transition day:
- Spring forward (clocks skip 2 AM to 3 AM): The night period is effectively 1 hour shorter. This is acceptable; it is a conservative outcome (less time to accumulate risk).
- Fall back (clocks repeat 1 AM to 2 AM): The night period is effectively 1 hour longer. The system uses the first occurrence of the repeated hour as the boundary.
10. Exposure Accounting
Three Ledgers Per Agent Per Scope
For every agent, at every scope level (sport, market, period), the system maintains three numbers:
| Ledger | What It Tracks | Updated When |
|---|---|---|
| retained_open_liability | The total worst-case payout the agent faces on retained bets | Every bet placement and settlement |
| forwarded_open_liability | The total liability the agent has forwarded upward | Every bet placement and settlement |
| open_potential_win | The total amount punters stand to win against this agent | Every bet placement and settlement |
These three numbers tell you everything about an agent's current risk position:
- retained_open_liability is what the agent will pay if everything goes wrong. This is the number checked against limits.
- forwarded_open_liability is what the agent's upline will pay. The agent has no financial exposure here.
- open_potential_win is the punter-side view -- what the agent's punters could collectively win.
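The three ledgers and their per-bet update can be sketched as follows (a minimal sketch; the `ExposureLedger` shape is illustrative, and it assumes back bets at decimal odds, where liability per rupee of stake is `odds - 1`):

```typescript
// The three per-scope ledgers, always updated together on placement.
interface ExposureLedger {
  retainedOpenLiability: number;  // checked against limits
  forwardedOpenLiability: number; // the upline's exposure, not this agent's
  openPotentialWin: number;       // what this agent's punters could win
}

// Apply one bet's retained/forwarded split to a single agent's ledger.
function applyPlacement(
  ledger: ExposureLedger,
  retainedStake: number,
  forwardedStake: number,
  odds: number,
): ExposureLedger {
  const perUnit = odds - 1; // liability per rupee of stake on a back bet
  return {
    retainedOpenLiability: ledger.retainedOpenLiability + retainedStake * perUnit,
    forwardedOpenLiability: ledger.forwardedOpenLiability + forwardedStake * perUnit,
    openPotentialWin: ledger.openPotentialWin + (retainedStake + forwardedStake) * perUnit,
  };
}
```

Applied to Rajesh's numbers from the worked example below section 10's cascade (₹6,000 retained, ₹4,000 forwarded at odds 1.85), this reproduces the +₹5,100 / +₹3,400 / +₹8,500 deltas.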
How These Update Atomically
When a bet is placed, all ledger updates across all affected agents happen in a single atomic transaction. There is no moment where Rajesh's ledger is updated but Vikram's is not. This prevents inconsistent states where the numbers do not add up.
The Redis Fast-Path Optimization
Most bet processing involves reading the current exposure to check limits. Only a minority of bets actually push an agent close to their limit.
The optimization:
+------------------+ +------------------+ +------------------+
| APPLICATION | | REDIS | | POSTGRESQL |
| (LRU Cache) | | (Fast Read) | | (Source of |
| | | | | Truth) |
| - 5-second TTL | | - Sub-ms reads | | - Atomic writes |
| - ~60% hit rate | | - ~25% hit rate | | - FOR UPDATE |
| - Zero latency | | - <1ms latency | | locking |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
v v v
"Rajesh has ₹12L "Rajesh has ₹12L "Rajesh has exactly
used of ₹50L -- used of ₹50L -- ₹12,05,100 used --
clearly within clearly within UPDATE with lock"
limit, fast pass" limit, fast pass"
How it works:
1. Application LRU cache (Tier 1): An in-memory cache with a short TTL (5 seconds). If a bet arrives and the cached exposure shows the agent is far from their limit (say, 60% utilized), we do not need to check further. The bet will pass the limit check. This handles the vast majority of bets.
2. Redis (Tier 2): For bets where the LRU cache has expired or the agent is approaching their limit, read from Redis. Redis values are updated on every write but are eventually consistent. Still very fast (sub-millisecond).
3. PostgreSQL with FOR UPDATE (Tier 3): For bets where the agent is near their limit (say, >80% utilized), we take a pessimistic lock in PostgreSQL. This ensures that two simultaneous bets cannot both claim the last ₹50,000 of capacity. This is the slowest path but only applies to a small percentage of bets.
Why this matters: During an IPL match, Rajesh might receive 50 bets per minute. At the hit rates above, roughly 30 of those are confirmed within limits by the LRU cache alone, another dozen or so by Redis, and only the handful near his limit need the PostgreSQL lock. This keeps median latency low while guaranteeing correctness at the boundary.
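The routing decision between tiers reduces to a small pure function. A sketch using the thresholds mentioned in the text (the 80% lock threshold and the freshness flag are illustrative configuration, not fixed constants):

```typescript
// Which tier must answer the limit check for this bet?
type Tier = "LRU" | "REDIS" | "POSTGRES";

function limitCheckTier(cachedUtilization: number, cacheFresh: boolean): Tier {
  if (cachedUtilization >= 0.8) return "POSTGRES"; // near the limit: pessimistic lock
  if (!cacheFresh) return "REDIS";                 // TTL expired: re-read the fast store
  return "LRU";                                    // fresh cache, far from limit: fast pass
}
```

The key property: correctness never depends on the cache. The cache only decides whether the expensive, always-correct path (the `FOR UPDATE` read) is necessary.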
Settlement Impact
When a match settles, the exposure associated with that match is removed from all agents in the chain:
- Retained positions: The settled amount is removed from retained_open_liability. If the agent won (punter lost), the agent profits. If the agent lost (punter won), the agent pays out.
- Forwarded positions: The settled amount is removed from forwarded_open_liability. The upline agent handles payout for forwarded positions.
- Potential win: The settled amount is removed from open_potential_win.
Ledger Updates for a Single Bet: 3-Level Cascade
Amit bets ₹10,000 on MI at odds 1.85. Liability per rupee of stake: odds - 1 = ₹0.85.
BEFORE THE BET:
================================================================
Retained Forwarded Potential
Agent Liability Liability Win
----------------------------------------------------------------
Rajesh ₹12,00,000 ₹8,00,000 ₹20,00,000
Vikram ₹25,00,000 ₹10,00,000 ₹35,00,000
Platform ₹5,00,000 ₹2,00,000 ₹7,00,000
================================================================
BET PROCESSING:
================================================================
Rajesh retains 60% = ₹6,000 stake → ₹5,100 liability
Rajesh forwards 40% = ₹4,000 stake → ₹3,400 liability
Vikram retains 60% of ₹4,000 = ₹2,400 stake → ₹2,040 liability
Vikram forwards 40% of ₹4,000 = ₹1,600 stake → ₹1,360 liability
Platform retains 50% of ₹1,600 = ₹800 stake → ₹680 liability
Platform hedges 50% of ₹1,600 = ₹800 stake → ₹680 forwarded
================================================================
AFTER THE BET:
================================================================
Retained Forwarded Potential
Agent Liability Liability Win
----------------------------------------------------------------
Rajesh ₹12,05,100 ₹8,03,400 ₹20,08,500
(+₹5,100) (+₹3,400) (+₹8,500)
Vikram ₹25,02,040 ₹10,01,360 ₹35,03,400
(+₹2,040) (+₹1,360) (+₹3,400)
Platform ₹5,00,680 ₹2,00,680 ₹7,01,360
(+₹680) (+₹680) (+₹1,360)
================================================================
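The split arithmetic in the worked example above can be sketched as a single loop down the chain (a minimal sketch; the `LevelSplit` shape is illustrative, and it ignores limit checks and overflow, which the cascade section covers separately):

```typescript
// Each level retains a fraction of the incoming stake and forwards the rest.
// Liability = stake * (odds - 1) for a back bet at decimal odds.
interface LevelSplit {
  agent: string;
  retainedStake: number;
  retainedLiability: number;
  forwardedStake: number;
}

function cascade(
  stake: number,
  odds: number,
  levels: { agent: string; retainPct: number }[],
): LevelSplit[] {
  const perUnit = odds - 1;
  const out: LevelSplit[] = [];
  let incoming = stake;
  for (const { agent, retainPct } of levels) {
    const retained = incoming * retainPct;
    const forwarded = incoming - retained;
    out.push({ agent, retainedStake: retained, retainedLiability: retained * perUnit, forwardedStake: forwarded });
    incoming = forwarded; // the forwarded slice becomes the next level's incoming stake
  }
  return out;
}
```

With `cascade(10_000, 1.85, [Rajesh 60%, Vikram 60%, Platform 50%])` this reproduces every number in the table: ₹6,000/₹5,100 at Rajesh, ₹2,400/₹2,040 at Vikram, ₹800/₹680 at the platform, with ₹800 left to hedge.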
11. Audit & Determinism
The Audit Record
Every bet produces a complete audit record. This is not a log file that can be grepped through later. It is a structured, queryable record that captures every decision the system made.
An audit record contains:
| Field | Description | Example |
|---|---|---|
| bet_id | Unique identifier for the bet | bet_a1b2c3d4 |
| original_stake | What the punter requested | ₹10,000 |
| adjusted_stake | What was actually accepted (after stake reduction, if any) | ₹10,000 |
| stake_reduction_reason | Why the stake was reduced, if it was | null (no reduction) |
| per_click_win_cap_check | Result of the per-click check | PASS: ₹8,500 < ₹50,000 |
| aggregate_win_cap_check | Result of the aggregate check | PASS: ₹53,500 < ₹2,00,000 |
| forwarding_chain | Complete routing at each level | See below |
| matrix_rules_evaluated | Which rules were checked and which won | R3 matched (specificity 3), R5 evaluated (specificity 2), R8 evaluated (specificity 0) |
| limit_checks | Every limit checked at every level | See below |
| hedge_detection | Whether NO_NEW_RISK was active, whether the bet was a hedge | NOT_IN_NO_NEW_RISK |
| period_context | Which period the bet fell in | NIGHT (19:00-02:00 IST), Week 7 of 2026 |
| timestamps | When each step occurred | matrix_resolve: 2ms, cap_check: 5ms, execution: 12ms, total: 23ms |
The forwarding chain in detail:
Level 1: Rajesh
- Incoming stake: ₹10,000
- Forward % source: MATRIX (Rule R3)
- Forward %: 40%
- Retained stake: ₹6,000
- Retained liability: ₹5,100
- Limit checks:
- Cricket sport: ₹12,05,100 / ₹50,00,000 (24.1%) PASS
- MI vs CSK match: ₹1,25,100 / ₹5,00,000 (25.0%) PASS
- Night period: ₹3,05,100 / ₹10,00,000 (30.5%) PASS
- Forwarded stake: ₹4,000
- Overflow: ₹0
Level 2: Vikram
- Incoming stake: ₹4,000
- Forward % source: MATRIX (Rule V2)
- Forward %: 40%
- Retained stake: ₹2,400
- Retained liability: ₹2,040
- Limit checks: [similar detail]
- Forwarded stake: ₹1,600
- Overflow: ₹0
Level 3: Platform
- Incoming stake: ₹1,600
- Retained: ₹800
- Hedged on Betfair: ₹800
- Betfair order ID: bf_xyz789
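As a data model, the audit record above might look like the following TypeScript shape (field names mirror the table and the Level 1 breakdown; the exact schema is illustrative, not the production one):

```typescript
interface LimitCheck { scope: string; used: number; limit: number; pass: boolean }

interface LevelAudit {
  agent: string;
  incomingStake: number;
  forwardPct: number;
  retainedStake: number;
  retainedLiability: number;
  limitChecks: LimitCheck[];
  forwardedStake: number;
  overflow: number;
}

interface BetAuditRecord {
  betId: string;
  originalStake: number;
  adjustedStake: number;
  stakeReductionReason: string | null;
  matrixRulesEvaluated: string[];
  forwardingChain: LevelAudit[];
  hedgeDetection: string; // e.g. "NOT_IN_NO_NEW_RISK"
  periodContext: string;
  timingsMs: Record<string, number>; // step name -> elapsed ms
}

// The Level 1 (Rajesh) entry from the example above, expressed as data:
const exampleRecord: BetAuditRecord = {
  betId: "bet_a1b2c3d4",
  originalStake: 10_000,
  adjustedStake: 10_000,
  stakeReductionReason: null,
  matrixRulesEvaluated: ["R3 matched (specificity 3)", "R5 evaluated", "R8 evaluated"],
  forwardingChain: [{
    agent: "Rajesh",
    incomingStake: 10_000,
    forwardPct: 0.40,
    retainedStake: 6_000,
    retainedLiability: 5_100,
    limitChecks: [{ scope: "sport:cricket", used: 1_205_100, limit: 5_000_000, pass: true }],
    forwardedStake: 4_000,
    overflow: 0,
  }],
  hedgeDetection: "NOT_IN_NO_NEW_RISK",
  periodContext: "NIGHT (19:00-02:00 IST), Week 7 of 2026",
  timingsMs: { matrix_resolve: 2, cap_check: 5, execution: 12, total: 23 },
};
```

Because every input to the routing decision lives in this record, the replay capability described below only needs the record plus the configuration event log.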
Why Determinism Matters
Agents must trust the system. If Rajesh sees a bet routed in a way he does not understand, he will lose confidence and revert to manual processes. The audit trail lets him see exactly why every decision was made.
Disputes must be resolvable. When Rajesh and Vikram disagree about who owes what at settlement time, the system has an indisputable record of exactly how every bet was split.
Regulators may require it. In jurisdictions moving toward regulation, a complete audit trail is a compliance necessity.
Configuration Change Log
All configuration changes are recorded using event sourcing:
- When Rajesh changes his forwarding matrix, the old matrix is preserved and the new one is recorded with a timestamp
- When an admin changes Amit's win limit, the change is logged with who made it and why
- This means you can answer questions like: "What was Rajesh's matrix at 9:47 PM on March 15?" -- by replaying the event log up to that point
Replay Capability
The ultimate test of determinism: given the state at time T, the same bet must produce the same routing.
This means if a dispute arises about a bet placed three weeks ago, we can:
- Load the configuration state as it existed at the time of the bet
- Load the exposure state as it existed at the time of the bet
- Re-run the routing logic
- Produce the exact same result
This is possible because all inputs to the routing decision (matrix, limits, current exposure, user status, market conditions) are captured in the audit record.
12. What the Current Codebase Already Has (and What's Missing)
Based on analysis of the existing Hannibal codebase:
| Feature | Current Status | What Exists | What's Missing |
|---|---|---|---|
| B-Book Config | Partial | bbookConfigService.ts -- basic B-Book configuration per agent | Multi-dimensional forwarding matrix, per-sport/market granularity, wildcard matching |
| B-Book State | Partial | bbookStateService.ts -- tracks basic B-Book state | Per-scope exposure ledgers (sport, market, period), NO_NEW_RISK mode tracking |
| Filter Engine | Partial | filterEngine.ts -- filters bets through B-Book rules | 5-dimension matrix resolution, specificity-based tie-breaking, precedence chain |
| B-Book Fill | Partial | bbookFillService.ts -- executes B-Book bet placement | Cascading multi-level routing, overflow handling, atomic multi-agent ledger updates |
| B-Book Settlement | Partial | bbookSettlementService.ts -- settles B-Book positions | Multi-level settlement cascade, exposure ledger rollback, period-aware settlement |
| Sharp Detection | Partial | sharpUserService.ts -- identifies sharp users | CLV calculation, behavioral scoring, integration with forwarding matrix source_type |
| Agent Hierarchy | Exists | agents.ts, agent.ts routes -- agent CRUD and hierarchy | Cascading routing through hierarchy, parent suspension skip logic |
| Agent Accounting | Exists | agentSettlementService.ts, agentSettlementJob.ts -- agent financial settlements | Per-scope liability tracking, real-time exposure counters |
| Agent Monitoring | Exists | agentMonitoringService.ts -- agent activity monitoring | Real-time limit utilization, NO_NEW_RISK alerts, period boundary tracking |
| User Win Limits | Missing | -- | Per-click win cap, aggregate win cap, stake reduction engine |
| Forwarding Matrix | Missing | -- | Full 5D matrix data model, wildcard resolution, matrix CRUD API |
| Per-Agent Limits | Missing | -- | Sport/market/period limit configuration, limit enforcement in bet flow |
| Cascading Routing | Missing | -- | N-level cascade engine, overflow calculation, suspended agent skip |
| NO_NEW_RISK Mode | Missing | -- | Automatic trigger, hedge detection, scope-aware activation |
| Hedge Detection | Missing | -- | Worst-case liability comparison, multi-outcome hedge evaluation |
| Period Management | Missing | -- | Night/weekly period definitions, timezone handling, carry-forward logic |
| Audit Trail | Missing | -- | Structured audit records, event sourcing, replay capability |
| Redis Exposure Cache | Missing | -- | 3-tier caching, fast-path optimization, cache invalidation |
The Key Observation
The existing codebase has the foundation right. The B-Book service exists with config, state, filtering, fill, and settlement modules. The agent hierarchy exists with accounting and monitoring. Sharp detection exists.
What is missing is the connective tissue -- the forwarding matrix that ties bet characteristics to routing decisions, the cascade engine that flows bets through the hierarchy, and the limit enforcement that keeps agents safe. These are the components that transform the existing point-to-point B-Book into a hierarchical risk management system.
13. The Nightmare Scenarios & How We Handle Them
Scenario 1: Syndicate Attack -- Correlated Positions Across Agents
What happens: A betting syndicate places coordinated bets through multiple agents. Each individual agent sees a modest position, but the aggregate platform exposure is enormous. The MI vs CSK match settles and the platform owes ₹2 crore across 15 agents.
How we handle it:
- Cross-agent position aggregation. The platform maintains a real-time view of aggregate exposure per event, summing across all agents. If aggregate exposure on any single outcome exceeds a configurable threshold, an alert fires.
- Correlated account detection. Users who consistently bet the same outcome, at the same time, across different agents are flagged. Signals include: matching IP addresses, similar device fingerprints, correlated timing patterns, and identical stake amounts.
- Platform-level event limits. Independent of individual agent limits, the platform sets a maximum total exposure per event. When this is breached, additional retained positions are blocked platform-wide, and new bets are forwarded to Betfair.
Scenario 2: Data Feed Failure During Live Play
What happens: The odds feed from the data provider (Roanuz, OddsPAPI) goes stale during a live IPL match. The system is showing odds of 1.85 for MI, but in reality, MI just lost 3 quick wickets and the true odds are now 3.50. Smart punters bet on MI at the stale 1.85 price.
How we handle it:
- Stale price detection. If the odds feed has not updated for more than a configurable duration (e.g., 5 seconds for in-play), the market is automatically suspended. No new bets are accepted until the feed resumes.
- Price movement circuit breaker. If the price moves by more than a configurable percentage in a single update (suggesting a missed intermediate update), the market is suspended pending human review.
- Multi-source validation. Where available, cross-reference prices from multiple providers. A price that is significantly different from all other sources is likely stale.
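The first two defenses are pure checks over the feed state. A sketch (the `FeedState` shape and the 5-second/25% thresholds are illustrative configuration; the text only fixes the 5-second example):

```typescript
interface FeedState { lastUpdateMs: number; lastPrice: number }

// Suspend the market if the feed is stale, or if a single update moves the
// price by more than the circuit-breaker threshold (a likely missed update).
function shouldSuspend(
  feed: FeedState,
  nowMs: number,
  newPrice: number,
  maxStaleMs = 5_000,
  maxMovePct = 0.25,
): boolean {
  const stale = nowMs - feed.lastUpdateMs > maxStaleMs;
  const movePct = Math.abs(newPrice - feed.lastPrice) / feed.lastPrice;
  return stale || movePct > maxMovePct;
}
```

In the scenario above, MI drifting from 1.85 to 3.50 in one tick is an ~89% move, so the circuit breaker suspends the market before punters can hit the stale price.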
Scenario 3: Rogue Agent Dumping Toxic Flow
What happens: An agent intentionally forwards 100% of sharp/winning flow to their upline or the platform while retaining the losing flow. Over time, the platform notices that all bets forwarded by this agent lose money.
How we handle it:
- Behavioral anomaly detection. Track the P&L (profit and loss) of retained vs forwarded bets per agent. If an agent's forwarded bets consistently lose money while their retained bets consistently win, this is a red flag.
- Forwarding pattern analysis. An agent who suddenly changes their matrix to forward more when they suspect a punter will win is detectable. Configuration changes that correlate with bet outcomes are flagged.
- Automatic escalation. Agents whose forwarded flow exceeds a toxicity threshold are escalated for review. The platform can force a minimum retention percentage.
Scenario 4: Double Settlement / Result Correction
What happens: A cricket match is initially settled as "MI wins," all payouts are processed, and then the result is corrected (perhaps due to a scoring error or a ruling by match officials). All settlements must be reversed and re-processed.
How we handle it:
- Re-settlement workflow. The system supports reversing a settlement and re-applying it with corrected results. This affects all agents in the cascade.
- Ledger reconciliation. After re-settlement, all exposure ledgers are recalculated. Any agent who was paid out incorrectly has the amount clawed back. Any agent who paid out incorrectly receives a credit.
- Communication chain. All affected agents receive automated notifications explaining the re-settlement, with full audit trails showing the before and after.
Scenario 5: System Outage During Peak
What happens: During the IPL final, the system experiences a partial outage. The bet processing service is down for 90 seconds while 10,000 bets are queued up.
How we handle it:
- Circuit breaker pattern. When the system detects degraded performance (response times exceeding 500ms), it switches to a degraded mode where all bets are forwarded 100% to Betfair. No agent retains any risk during the outage. This is the safest possible default.
- Fail-safe to 100% forwarding. If the routing engine cannot determine the correct split (because the forwarding matrix service is down), the bet is still accepted but forwarded entirely. The agent misses out on retention (lost profit opportunity) but is not exposed to unmanaged risk.
- Queue and replay. Bets that arrive during the outage are queued. When the system recovers, they are replayed through the normal routing engine. If the bet was already forwarded 100%, a reconciliation process adjusts the positions retroactively.
14. The Agent Experience -- From Simple to Sophisticated
The Core Insight
Agents are not engineers. Some are seasoned operators who think in matrices and percentages. Others are street-level bookies who have never opened a spreadsheet. But they all share the same two goals: make more money and do not get wiped out overnight.
The system underneath is the same for everyone -- the forwarding matrix, cascading routing, exposure ledgers, hedge detection. What changes is how much of that complexity the agent sees. This is called progressive disclosure: show the simple version by default, reveal the complexity only when the agent asks for it.
The Three Tiers of Experience
┌──────────────────────────────────────────────────────────────────────┐
│ │
│ TIER 1: "SET AND FORGET" For: New agents, small │
│ ───────────────────────── operators, non-technical │
│ 3 questions at setup. │
│ Traffic light dashboard. 80% of agents live here. │
│ WhatsApp/SMS alerts. │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ TIER 2: "DASHBOARD DRIVER" For: Experienced agents, │ │
│ │ ────────────────────────── mid-size operations │ │
│ │ Real-time risk dashboard. │ │
│ │ Per-sport limits. Per-user 15% of agents grow into │ │
│ │ management. One-click hedging. this. │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ │ │ │
│ │ │ TIER 3: "MATRIX MASTER" For: Sophisticated │ │ │
│ │ │ ──────────────────────── operators, quant-minded │ │ │
│ │ │ Full 5D matrix editor. │ │ │
│ │ │ Test bet simulator. 5% of agents. These are │ │ │
│ │ │ Historical P&L analysis. your power users. │ │ │
│ │ │ Custom period configs. │ │ │
│ │ │ │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────┘
The critical design rule: Every tier uses the exact same engine underneath. A Tier 1 agent's "I want to be safe on cricket" translates to the same forwarding matrix, limits, and cascade logic that a Tier 3 agent configures by hand. The difference is who builds the configuration -- the agent or the system.
Tier 1: "Set and Forget" -- The 3-Question Onboarding
When a new agent joins, they answer three questions. That is it. The system generates their entire configuration from these answers.
Question 1: What do you trade?
┌──────────────────────────────────────────────────┐
│ What sports do you want to accept bets on? │
│ │
│ [✓] Cricket │
│ [ ] Football │
│ [ ] Tennis │
│ [ ] Kabaddi │
│ [ ] Other │
│ │
│ Sports you don't select will be automatically │
│ forwarded 100% to your upline. │
└──────────────────────────────────────────────────┘
Question 2: What is your nightly budget?
This is the question that matters most. Framed not as "liability limit" but as a question any bookie understands:
┌──────────────────────────────────────────────────┐
│ What is the MOST you are willing to lose in │
│ a single night? │
│ │
│ Think of your worst night ever. What amount │
│ would you be okay waking up to? │
│ │
│ ₹ [___________] │
│ │
│ Examples: │
│ Small operation: ₹2,00,000 (₹2 lakh) │
│ Medium operation: ₹10,00,000 (₹10 lakh) │
│ Large operation: ₹50,00,000 (₹50 lakh) │
└──────────────────────────────────────────────────┘
Question 3: How aggressive do you want to be?
┌──────────────────────────────────────────────────┐
│ How much risk do you want to keep? │
│ │
│ SAFE BALANCED AGGRESSIVE │
│ ◉────────────────────────────────────○ │
│ │
│ SAFE: Keep 30%, forward 70% │
│ "I sleep well, smaller profits" │
│ │
│ BALANCED: Keep 60%, forward 40% │
│ "Good mix of profit and safety" │
│ │
│ AGGRESSIVE: Keep 85%, forward 15% │
│ "Maximum profit, I can handle │
│ the swings" │
└──────────────────────────────────────────────────┘
What happens behind the scenes: From these three answers, the system generates:
| Agent Answer | System Generates |
|---|---|
| "Cricket only" | Forwarding matrix: Cricket = agent's retention %, all other sports = 100% forward |
| "₹10 lakh max loss" | Night limit: ₹10L, Weekly limit: ₹50L (5x night), Per-match limit: ₹2L (night/5) |
| "Balanced" slider | Default forwarding: 40%. In-play: 55% forward (more cautious). Fancy markets: 70% forward. Sharp users: 95% forward. Pre-match match odds: 30% forward |
| (Automatic) | User win limits: Per-click ₹50,000, Aggregate ₹2,00,000/day. These are safe defaults. |
The agent never sees the words "forwarding matrix," "liability limit," or "exposure ledger." They answered three questions and the system is configured.
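The generation step is mechanical. A sketch that encodes the derivations from the table (weekly = 5x night, per-match = night / 5, the slider percentages, and the default win caps; the `GeneratedConfig` shape itself is illustrative):

```typescript
type RiskStyle = "SAFE" | "BALANCED" | "AGGRESSIVE";

interface GeneratedConfig {
  nightLimit: number;
  weeklyLimit: number;
  perMatchLimit: number;
  defaultForwardPct: number;
  perClickWinCap: number;
  aggregateWinCapPerDay: number;
}

function generateConfig(nightBudget: number, style: RiskStyle): GeneratedConfig {
  // Slider positions map to the retention choices shown on the onboarding screen.
  const forwardByStyle: Record<RiskStyle, number> = { SAFE: 0.70, BALANCED: 0.40, AGGRESSIVE: 0.15 };
  return {
    nightLimit: nightBudget,
    weeklyLimit: nightBudget * 5,   // weekly = 5x night
    perMatchLimit: nightBudget / 5, // per-match = night / 5
    defaultForwardPct: forwardByStyle[style],
    perClickWinCap: 50_000,         // safe defaults from the table
    aggregateWinCapPerDay: 200_000,
  };
}
```

The sport answer would additionally seed the matrix (selected sports at the agent's retention percentage, everything else at 100% forward), and the in-play/fancy/sharp overrides layer on top of `defaultForwardPct`.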
The "Sleep Well" Number
Every agent, regardless of tier, sees one number prominently on their home screen:
┌──────────────────────────────────────────────────┐
│ │
│ YOUR MAXIMUM LOSS TONIGHT │
│ │
│ ₹ 3,42,000 │
│ │
│ out of your ₹10,00,000 night budget │
│ │
│ ████████░░░░░░░░░░░░ 34% │
│ │
│ If every live bet goes against you tonight, │
│ this is the absolute most you will lose. │
│ The system guarantees this. │
│ │
└──────────────────────────────────────────────────┘
How it is calculated: Sum of worst-case liability across all retained open positions for this agent. This is not an estimate -- it is a mathematical guarantee. The cascade routing, limits, and NO_NEW_RISK mode ensure that this number can never exceed the agent's configured budget.
Why this matters: An agent who sees "₹3.42 lakh out of ₹10 lakh" knows they are safe. They can watch the match, enjoy the action, and not worry. When this number approaches their budget, the system automatically protects them (NO_NEW_RISK kicks in, overflow cascades to upline). The agent does not need to do anything.
For the really simple agent: This one number, delivered via WhatsApp at 10 PM every night, might be all they ever need:
"Tonight's update: Your maximum possible loss is ₹3.42L (34% of your ₹10L budget). 127 bets placed. Everything running smoothly."
Tier 1 Dashboard: The Traffic Light View
For agents who do not want numbers and charts, a traffic light is enough:
┌──────────────────────────────────────────────────┐
│ │
│ TONIGHT'S STATUS │
│ │
│ Cricket 🟢 All good. Well within limits. │
│ Football ⚪ Not active tonight. │
│ Tennis ⚪ Forwarding 100% (your choice). │
│ │
│ OVERALL 🟢 Safe. ₹3.4L / ₹10L used. │
│ │
│ LAST HOUR │
│ 42 bets accepted │
│ Estimated profit so far: +₹18,000 │
│ │
│ ⚠ 1 alert: Rahul's stake was reduced │
│ (he's close to his win limit) │
│ │
│ [View Details] [Panic: Stop Everything] │
│ │
└──────────────────────────────────────────────────┘
The traffic light meanings:
| Color | Meaning | Agent Action Required |
|---|---|---|
| 🟢 Green | Below 60% of all limits | None. Relax. |
| 🟡 Yellow | Between 60-85% of any limit | Be aware. System is still accepting bets but approaching limits. |
| 🔴 Red | Above 85% of any limit, or NO_NEW_RISK is active | System is protecting you. New risk bets are being forwarded. Hedges still accepted. |
| ⚪ Grey | Sport not active or fully forwarded | Nothing happening here. |
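The color mapping is a direct translation of the table. A sketch (the function signature is illustrative; in practice the utilizations would come from the per-scope ledgers):

```typescript
type Light = "GREEN" | "YELLOW" | "RED" | "GREY";

// Map the worst limit utilization and NO_NEW_RISK state to a traffic light.
function trafficLight(utilizations: number[], noNewRisk: boolean, active: boolean): Light {
  if (!active) return "GREY";                       // sport off or fully forwarded
  const worst = Math.max(...utilizations);          // "any limit" means the worst one
  if (noNewRisk || worst > 0.85) return "RED";
  if (worst >= 0.60) return "YELLOW";
  return "GREEN";
}
```

Note that NO_NEW_RISK forces red even at low utilization on other limits, which matches the table: the system protecting the agent is always a red-light condition.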
The key design insight: A Tier 1 agent never needs to leave this screen. The system runs itself. The traffic light tells them if they should worry. The "Sleep Well" number tells them their maximum downside. The panic button is there if they ever feel nervous.
Tier 2 Dashboard: The Risk Cockpit
When an agent is ready for more detail, they tap "View Details" and enter the full dashboard. This is for agents who want to actively manage their book during a match.
Real-Time Risk Dashboard
+============================================================================+
| RAJESH'S DASHBOARD Feb 11, 2026 9:47 PM|
+============================================================================+
| |
| EXPOSURE SUMMARY |
| ┌─────────────────────────────────────────────────────────────────────┐ |
| │ Cricket ██████████████████████░░░░░░░░ ₹38.2L / ₹50L (76%) │ |
| │ Football ████░░░░░░░░░░░░░░░░░░░░░░░░░ ₹4.1L / ₹20L (21%) │ |
| │ Tennis ██░░░░░░░░░░░░░░░░░░░░░░░░░░░ ₹1.2L / ₹10L (12%) │ |
| └─────────────────────────────────────────────────────────────────────┘ |
| |
| TONIGHT'S SESSION (7:00 PM - 2:00 AM IST) |
| ┌─────────────────────────────────────────────────────────────────────┐ |
| │ Night Limit █████████████████████████░░░░ ₹8.4L / ₹10L (84%) │ |
| │ ⚠ APPROACHING LIMIT - 16% remaining │ |
| │ At current rate, limit reached in ~25 minutes │ |
| └─────────────────────────────────────────────────────────────────────┘ |
| |
| TOP MATCHES BY EXPOSURE |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ 1. MI vs CSK (Live) ₹4.8L retained ₹5L limit (96%) │ |
| │ ⚠ NEAR LIMIT - Will enter NO_NEW_RISK at ₹5L │ |
| │ │ |
| │ 2. RCB vs DC (Pre) ₹2.1L retained ₹5L limit (42%) │ |
| │ │ |
| │ 3. KKR vs SRH (Pre) ₹1.5L retained ₹5L limit (30%) │ |
| └──────────────────────────────────────────────────────────────────┘ |
| |
| RECENT BETS (last 10 minutes) |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ 9:45 PM Amit MI to win ₹10,000 Retained 60% ✓ │ |
| │ 9:43 PM Sonia CSK +1.5 ₹25,000 Retained 40% ✓ │ |
| │ 9:41 PM Rahul Fancy 180+ ₹50,000 Reduced to ₹12,000 ⚠ │ |
| │ 9:38 PM Deepa MI to win ₹8,000 Retained 60% ✓ │ |
| └──────────────────────────────────────────────────────────────────┘ |
| |
| [🔴 PANIC: Hedge All] [Adjust Limits] [View Full Audit] |
| |
+============================================================================+
Alert Priority Levels
| Priority | Delivery | Trigger | Example |
|---|---|---|---|
| P1 - Critical | SMS + Push notification + Dashboard | Limit breached, NO_NEW_RISK activated, suspected fraud | "Your cricket night limit has been reached. NO_NEW_RISK is now active." |
| P2 - Warning | Push notification + Dashboard | Approaching limit (>80%), unusual betting pattern, sharp user detected | "MI vs CSK match exposure is at 96% of limit." |
| P3 - Informational | Dashboard only | Period rollover, settlement complete, configuration change | "Night period ended. Carried forward ₹3.2L to day period." |
Key Reports
| Report | Frequency | What It Shows |
|---|---|---|
| Daily P&L | Daily at period end | Profit/loss by sport, market, and user tier. Which bets made money, which lost money. |
| Weekly Settlement | Weekly on Monday | Net positions with upline, amounts owed/receivable, forwarded vs retained breakdown. |
| Sharp User Report | Weekly | Users flagged as sharp, their CLV scores, recommended actions. |
| Matrix Effectiveness | On demand | How well the forwarding matrix performed -- did high-retention bets profit or lose? |
| Limit Utilization | On demand | How close the agent came to each limit, how often NO_NEW_RISK was triggered, average duration. |
The Panic Button
The dashboard includes a "Hedge All" button that, when pressed:
- Immediately sets the agent's forwarding to 100% for all sports and markets
- Places hedge orders on Betfair for all current retained positions (where exchange markets exist)
- Sends a notification to the agent's upline
- Logs the action with a timestamp and reason
This is the emergency exit. If an agent sees something alarming -- a suspicious pattern, a sudden exposure spike, or just gets nervous -- one button brings their risk to near zero. They can then calmly assess the situation and adjust.
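As a rough illustration of the panic flow, here is a minimal TypeScript sketch. All names (`panicHedgeAll`, `AgentConfig`, and so on) are hypothetical, not the real API; actual hedge placement and upline notification would be asynchronous calls, elided here.

```typescript
// Illustrative sketch of the "Hedge All" panic action, using in-memory stores.

interface Position { matchId: string; outcome: string; retainedStake: number; hasExchangeMarket: boolean; }
interface AgentConfig { agentId: string; forwardingPercent: number; }
interface HedgeOrder { matchId: string; outcome: string; stake: number; }
interface AuditEntry { action: string; agentId: string; at: string; reason: string; }

function panicHedgeAll(
  cfg: AgentConfig,
  positions: Position[],
  reason: string,
  audit: AuditEntry[],
): HedgeOrder[] {
  // 1. Forward 100% of all new bets: the agent retains zero new risk.
  cfg.forwardingPercent = 100;
  // 2. Build hedge orders only where an exchange market exists.
  const orders = positions
    .filter(p => p.hasExchangeMarket)
    .map(p => ({ matchId: p.matchId, outcome: p.outcome, stake: p.retainedStake }));
  // 3. Log the action with timestamp and reason (upline notification omitted).
  audit.push({ action: "PANIC_HEDGE_ALL", agentId: cfg.agentId, at: new Date().toISOString(), reason });
  return orders;
}
```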
Tier 3 Dashboard: The Matrix Master
For the 5% of agents who want full control, the system exposes everything -- but only when they explicitly navigate to it. This is never the default view.
The Matrix Editor:
Instead of a raw spreadsheet, the matrix editor uses a guided builder with immediate feedback:
┌──────────────────────────────────────────────────────────────────────┐
│ FORWARDING MATRIX EDITOR │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Rule 7 of 12 │ │
│ │ │ │
│ │ WHEN a bet matches ALL of these: │ │
│ │ Sport: [Cricket ▼] │ │
│ │ Market: [Fancy ▼] │ │
│ │ Phase: [In-Play ▼] │ │
│ │ User Type: [Any ▼] │ │
│ │ Liquidity: [Any ▼] │ │
│ │ │ │
│ │ THEN: │ │
│ │ Keep [30]% ◉──────────○ Forward [70]% │ │
│ │ │ │
│ │ LAST WEEK: This rule matched 234 bets (₹18.4L volume) │ │
│ │ RESULT: You would have profited ₹1.2L on retained portion │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ [+ Add Rule] [Test a Bet] [View Conflicts] │
│ │
└──────────────────────────────────────────────────────────────────────┘
The Test Bet Simulator:
The single most important tool for a Tier 3 agent. Before any real money flows, they can test:
┌──────────────────────────────────────────────────────────────────────┐
│ TEST A BET │
│ │
│ Sport: Cricket Market: Fancy (Session Runs) │
│ Phase: In-Play User: Amit (NORMAL) │
│ Odds: 2.50 Stake: ₹25,000 │
│ │
│ [Run Test] │
│ │
│ RESULT: │
│ ───────────────────────────────────────────────────── │
│ Step 1: User win cap check │
│ Potential win: ₹37,500. Limit: ₹50,000. PASS │
│ │
│ Step 2: Forwarding resolution │
│ Rule 7 matched (Cricket + Fancy + In-Play) │
│ Specificity: 3/5. No higher-priority rule found. │
│ Forward: 70%. Retain: 30%. │
│ │
│ Step 3: Your retention │
│ Retained stake: ₹7,500 │
│ Retained liability: ₹11,250 │
│ Your cricket limit after: ₹18.7L / ₹50L (37%) │
│ │
│ Step 4: Cascade to upline (Vikram) │
│ Forwarded: ₹17,500 → Vikram retains 60% → Platform → BF │
│ │
│ This is exactly what would happen with a real bet. │
└──────────────────────────────────────────────────────────────────────┘
Graduating Between Tiers
Agents are never locked into a tier. The system nudges them upward when they are ready.
Tier 1 → Tier 2 nudge: After 2 weeks of operation, if the agent's dashboard shows they are consistently hitting limits or forwarding more than they want, the system suggests: "You are forwarding 45% of cricket bets. Want to adjust per-sport settings? [Show me how]"
Tier 2 → Tier 3 nudge: After the agent has manually adjusted limits 5+ times, the system suggests: "You keep changing your cricket in-play settings. A forwarding matrix rule could automate this. [Set up a rule]"
Tier 3 → Tier 1 fallback: If a Tier 3 agent creates a matrix that is clearly dangerous (e.g., 100% retention on in-play fancies), the system warns: "This configuration would have lost ₹8.2L last month. Are you sure? [Keep my settings] [Switch to Balanced preset]"
Preset Profiles: One-Click Configuration
For agents who want more control than 3 questions but less than a full matrix:
| Preset | What It Does | Who It Is For |
|---|---|---|
| Conservative Cricket | 30% retention on match odds, 15% on fancies, 0% on in-play fancies. Night limit = budget x 0.5. Sharp users = 100% forward. | New agents, small bankroll, risk-averse |
| Balanced Cricket | 60% retention on match odds, 30% on fancies, 20% on in-play fancies. Night limit = budget. | Experienced agents, moderate bankroll |
| Aggressive IPL | 80% retention on match odds, 50% on fancies, 30% on in-play fancies. Night limit = budget x 1.5 (with weekly safety net). | Large bankroll, IPL specialists |
| Football Only | 60% retention Premier League, 30% lower leagues, 0% tennis/cricket. | Football-focused agents |
| Forward Everything | 0% retention across all sports. Agent earns commission on volume only. | Agents who want zero risk, commission-only model |
Each preset is a fully configured forwarding matrix + limits + user win caps. The agent can select a preset, see exactly what it configures, and customize individual values if they want. The preset is the starting point, not a cage.
WhatsApp and SMS: Meeting Agents Where They Are
Many agents in India and Southeast Asia run their operations primarily through WhatsApp. A dashboard they never open is useless. The system must push critical information to them through channels they already use.
Scheduled Messages:
| Time | Message |
|---|---|
| Start of night session | "Good evening. Your night budget: ₹10L. Current exposure: ₹0. System is ready." |
| Every 2 hours during session | "Update: ₹3.4L used (34%). 89 bets tonight. Estimated profit: +₹24,000. All green." |
| When yellow threshold hit | "Heads up: Cricket exposure at 72% of limit. System still accepting bets. No action needed unless you want to adjust." |
| When red threshold hit | "Your cricket night limit has been reached. System is now forwarding new cricket bets to your upline. Hedge bets still accepted. You are protected." |
| End of night session | "Night summary: 214 bets. Net result: +₹1,85,000. Forwarded ₹12.4L to Vikram. Maximum loss was ₹4.1L (41% of budget). Settlement pending." |
Interactive Commands (via WhatsApp chatbot):
| Command | Response |
|---|---|---|
| "status" | Current exposure, limits, traffic light color | |
| "stop cricket" | Sets cricket forwarding to 100%. Confirmation: "Cricket bets now forwarding 100%. You retain zero risk." | |
| "resume cricket" | Restores previous cricket settings. | |
| "panic" | Triggers the hedge-all panic button. "All positions being hedged. Forwarding set to 100%. You are safe." | |
| "limit 15L" | Updates night budget to ₹15L. Confirmation with new max-loss number. | |
| "sharp amit" | Tags user Amit as sharp. "Amit's bets will now be forwarded 95%. Confirm?" |
This means an agent sitting in a chai shop watching the match on TV can manage their entire book through WhatsApp without ever opening a dashboard.
The Principle: Complexity Is Available, Never Required
The entire UX philosophy can be summarized in one sentence: the system should work perfectly for an agent who never touches a single setting after onboarding, AND it should give full control to an agent who wants to tune every parameter.
The 3-question onboarding generates safe, profitable defaults. The traffic light tells the agent if anything needs attention. The "Sleep Well" number gives them peace of mind. WhatsApp keeps them informed. The full dashboard is there when they want it. The matrix editor is there when they are ready for it.
No agent should ever feel overwhelmed by the system. And no agent should ever feel limited by it.
15. Performance Architecture
Latency Budget
The entire synchronous bet-processing path must complete within 90 milliseconds, leaving 110ms of headroom before the 200ms point at which the user experience noticeably degrades.
| Step | Budget | Description |
|---|---|---|
| Request parsing & validation | 5ms | Parse the incoming bet request, validate format |
| Metrics computation | 3ms | Calculate potential win, liability |
| User win cap check | 5ms | Check per-click and aggregate limits |
| Stake reduction (if needed) | 2ms | Calculate reduced stake |
| Matrix resolution | 10ms | Look up the forwarding percentage |
| Agent cap evaluation (per level) | 10ms x N levels | Check limits at each cascade level (typically 2-3 levels) |
| Position creation | 15ms | Write positions to database |
| Exposure ledger update | 10ms | Update all affected ledgers atomically |
| Audit record creation | 10ms | Persist the audit trail |
| Response | 5ms | Return confirmation to the punter |
| Total (2-level cascade) | ~85ms | Within budget; each additional cascade level adds ~10ms (a 3-level cascade totals ~95ms) |
Memory Architecture: 3-Tier
| Tier | Technology | TTL | Hit Rate | Use Case |
|---|---|---|---|---|
| Tier 1 | Application LRU Cache | 5 seconds | ~60% | Exposure far from limits, matrix lookups, agent config |
| Tier 2 | Redis | Until invalidated | ~25% | Exposure checks, active period boundaries, NO_NEW_RISK flags |
| Tier 3 | PostgreSQL | Persistent | ~15% | Near-limit exposure writes (FOR UPDATE), position creation, audit records |
Burst Traffic Handling: IPL Final Scenario
During the IPL final, bet volume can spike to 10,000 bets per minute (about 167 per second). Here is how the system handles it:
| Challenge | Solution |
|---|---|
| Database write contention | Sharded exposure counters -- instead of one row per agent per sport, use N shards. Each bet updates a random shard. Reads sum all shards. |
| Matrix lookup speed | Pre-computed matrix resolution cache. When an agent's matrix changes, all possible resolution paths are pre-computed and cached. During the match, lookups are O(1) hash lookups. |
| Audit trail write volume | Batch audit writes. Audit records are buffered in memory and flushed every 500ms. In a crash, up to 500ms of audit records may be lost (positions are never lost because they use the synchronous path). |
| Redis connection pool | Dedicated Redis connection pool for exposure checks, separate from general-purpose cache. |
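The batch-audit row above can be sketched as a simple buffer-and-swap writer. In production the flush would run on a 500ms timer and write to PostgreSQL in bulk; here the sink is a stand-in and `AuditBuffer` is a hypothetical name.

```typescript
// Sketch of the batched audit writer: records accumulate in memory and are
// flushed as one bulk write instead of one write per bet.

interface AuditRecord { betId: string; decision: string; }

class AuditBuffer {
  private buf: AuditRecord[] = [];
  constructor(private sink: (batch: AuditRecord[]) => void) {}

  add(rec: AuditRecord): void {
    this.buf.push(rec);
  }

  // Called on a 500ms timer in production; explicit here for clarity.
  // Returns the number of records flushed.
  flush(): number {
    if (this.buf.length === 0) return 0;
    const batch = this.buf;
    this.buf = []; // swap before writing so concurrent adds are not lost
    this.sink(batch);
    return batch.length;
  }
}
```

The swap-before-write pattern is what bounds the loss window to at most one flush interval: a crash loses only what is in the current buffer, never what has been handed to the sink.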
Sharded Exposure Counters
For agents receiving high bet volume, a single exposure counter row becomes a bottleneck because every bet needs to lock it.
Solution: instead of one counter, use 8 shards:
RAJESH CRICKET EXPOSURE (SHARDED)
===================================
Shard 0: ₹4,75,000
Shard 1: ₹5,12,000
Shard 2: ₹4,88,000
Shard 3: ₹5,01,000
Shard 4: ₹4,95,000
Shard 5: ₹5,23,000
Shard 6: ₹4,67,000
Shard 7: ₹4,89,000
-----------------------------------
Total: ₹39,50,000 / ₹50,00,000
Each incoming bet randomly picks a shard and only locks that shard. Contention drops by 8x. Reads sum all shards (slightly slower but still sub-millisecond from Redis).
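A minimal sketch of the sharded counter, assuming in-memory shards in place of per-shard database rows or Redis keys. The shard picker is injectable (random in production) so the behavior is testable; `ShardedCounter` is an illustrative name.

```typescript
// Sketch of an 8-way sharded exposure counter. Each writer touches exactly
// one shard; readers sum all shards.

class ShardedCounter {
  private shards: number[];
  constructor(
    private numShards = 8,
    private pickShard: (n: number) => number = n => Math.floor(Math.random() * n),
  ) {
    this.shards = new Array(numShards).fill(0);
  }

  // Each bet locks and updates only one shard, cutting contention ~8x.
  add(amount: number): void {
    this.shards[this.pickShard(this.numShards)] += amount;
  }

  // Reads sum every shard -- slightly slower, still sub-millisecond in Redis.
  total(): number {
    return this.shards.reduce((a, b) => a + b, 0);
  }
}
```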
What's Cached Where and for How Long
| Data | Cache Location | TTL | Invalidation |
|---|---|---|---|
| Agent forwarding matrix | Application LRU + Redis | 5 min / until change | On matrix update, invalidate immediately |
| Agent limits configuration | Application LRU + Redis | 5 min / until change | On limit update, invalidate immediately |
| Current exposure (far from limit) | Application LRU | 5 seconds | Time-based expiry |
| Current exposure (near limit) | Not cached | -- | Always read from PostgreSQL with lock |
| User win cap state | Redis | Until period reset | On bet placement (update), on period reset (clear) |
| NO_NEW_RISK flags | Redis | Until cleared | On limit breach (set), on settlement or limit change (clear) |
| Period boundaries | Application LRU | 1 hour | On config change |
| Sharp user flags | Redis | 1 hour | On detection service update |
16. Competitive Landscape
| Feature | Betfair | bet365 | Pinnacle | Asian Books | Hannibal (Target) |
|---|---|---|---|---|---|
| Risk Model | Pure exchange (no risk) | B-Book + A-Book hybrid | Sharp-friendly B-Book | Primarily B-Book | Hierarchical B-Book with automated routing |
| Agent Hierarchy | None (B2C only) | None (B2C only) | None (B2C only) | Manual, phone-based | Automated, N-level, with cascading |
| Forwarding Logic | N/A | Proprietary, opaque | N/A | Manual negotiation | Configurable multi-dimensional matrix |
| Limit Management | Market-based liquidity | Per-user, opaque | Minimal (welcomes sharps) | Per-user, manual | Per-agent, per-sport, per-market, per-period |
| Audit Trail | Exchange provides full transparency | Minimal for agents | Basic | None | Complete, replayable, deterministic |
| Sharp Handling | Exchange market handles it | Restrict accounts aggressively | Welcome and manage | Restrict and forward | Configurable per-user forwarding |
| Hedge Options | IS the exchange | Internal + Betfair | Internal models | Betfair + internal | Betfair + multi-exchange (planned) |
| Target Market | Developed (UK, AU) | Global B2C | Global B2C (niche) | Asia, manual agent networks | Agent networks: India, SE Asia, Africa |
What to Learn from Each
| Competitor | Lesson for Hannibal |
|---|---|
| Betfair | The exchange model provides perfect transparency. Hannibal's audit trail should aspire to Betfair-level transparency within its B-Book model. |
| bet365 | Their risk management is world-class but opaque. Agents hate opacity. Hannibal should match their sophistication while providing the transparency agents demand. |
| Pinnacle | Their sharp-friendly model proves you can profit even from sharp users if you manage margins correctly. Hannibal's forwarding matrix should allow agents to choose their sharp tolerance. |
| Asian Books | They understand the agent hierarchy model deeply but use manual processes. Hannibal automates what they do by hand. |
17. Phased Rollout Plan
Phase 1: Agent Risk Controls (Weeks 1-4)
Goal: Every agent has enforceable limits and real-time exposure tracking.
| Week | Deliverable |
|---|---|
| 1-2 | Per-agent limit configuration (sport, market, period) + database models |
| 2-3 | Real-time exposure tracking with Redis fast-path |
| 3-4 | NO_NEW_RISK mode (automatic trigger + manual override) |
| 4 | Per-click user win limits + stake reduction |
Why this first: Limits and exposure tracking are independent of the forwarding matrix. They provide immediate value by protecting agents from over-exposure. Even without smart routing, agents get safety.
Success metric: Zero incidents where an agent exceeds their configured limit.
Phase 2: Smart Forwarding (Weeks 5-10)
Goal: Bets are routed through the hierarchy based on configurable rules.
| Week | Deliverable |
|---|---|
| 5-6 | Forwarding matrix data model + basic resolution (market_type + sport_type) |
| 6-7 | Precedence chain (user override > market override > matrix > default) |
| 7-8 | Cascading upline routing (N-level) |
| 8-9 | Aggregate user win limits + period definitions (night/weekly) |
| 9-10 | Integration testing, overflow scenarios, suspended agent handling |
Why this second: This is the core value proposition. But it depends on the limits and exposure tracking from Phase 1.
Success metric: Bets correctly routed through 3+ levels with deterministic, auditable decisions.
Phase 3: Intelligence & Polish (Weeks 11-16)
Goal: Full 5-dimensional matrix, advanced detection, and complete audit trail.
| Week | Deliverable |
|---|---|
| 11-12 | Full 5D matrix (add event_phase, source_type, liquidity_band) |
| 12-13 | Advanced hedge detection (multi-outcome worst-case analysis) |
| 13-14 | Complete structured audit trail with replay capability |
| 14-15 | Cross-agent sharp detection and syndicate detection |
| 15-16 | Agent dashboard v2 with all reports and the panic button |
Why this third: The 5D matrix and advanced detection are refinements. The 2D matrix from Phase 2 handles 80% of cases. Phase 3 handles the remaining 20%.
Success metric: Audit trail passes third-party review. Sharp detection flags known sharp users within 100 bets.
Phase 4: Scale & Optimize (Weeks 17+)
Goal: Handle peak traffic, add intelligence, expand hedge options.
| Deliverable | Description |
|---|---|
| ML-based odds adjustment | Use historical data to adjust odds before they reach the punter |
| Multi-exchange hedging | Hedge on Betfair, Smarkets, Betdaq, and local exchanges |
| Mobile dashboard | Full agent dashboard on mobile devices |
| Sharded exposure counters | Handle 10,000+ bets/minute during IPL final |
| Auto-matrix optimization | Suggest matrix changes based on historical P&L |
18. Revenue Model
Four Revenue Streams
| Stream | Description | Example |
|---|---|---|
| Transaction Fees | A small percentage of every bet processed through the platform | 1-2% of stake on every bet |
| Retained Risk Profit | The platform retains a portion of bets (at the top of the cascade). On average, the bookie has an edge, so retained risk is profitable over time. | Platform retains ₹800 of a ₹10,000 bet. Over thousands of bets, the edge produces ~5% margin. |
| Betfair Arbitrage Spread | When hedging on Betfair, the platform can capture a spread between the price offered to the punter and the price available on the exchange. | Punter gets odds of 1.85, Betfair offers 1.90. The 0.05 spread on every hedged rupee is pure profit. |
| Data Intelligence | Aggregate anonymized betting data has value for odds compilation, market making, and risk modeling. | Subscription service for odds providers and analytics firms. |
Financial Modeling Example
Consider a moderately busy day on Hannibal:
DAILY FINANCIAL MODEL
=====================
Total Stake Processed: ₹5,00,00,000 (₹5 crore)
Revenue Stream Breakdown:
-----------------------------------------------------------------
Transaction Fees (1.5%): ₹7,50,000
→ All bets, regardless of routing
Platform Retained Risk: ₹50,00,000 stake retained
→ 5% edge over time: ₹2,50,000
Betfair Hedge Spread: ₹1,00,00,000 hedged
→ Average 0.03 spread: ₹3,00,000
Data Intelligence: ₹50,000 (amortized daily)
-----------------------------------------------------------------
TOTAL DAILY REVENUE: ₹13,50,000
ANNUAL PROJECTION: ₹49+ crore
(assuming 365 operating days and modest growth)
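The same model expressed as straight-line arithmetic (TypeScript), using the worked-example rates and volumes above, which are assumptions, not real figures:

```typescript
// The daily financial model, reproduced as arithmetic.

const totalStake = 5_00_00_000;              // Rs 5 crore processed
const txnFees = totalStake * 0.015;          // 1.5% transaction fee on all bets
const retainedStake = 50_00_000;             // stake retained at the top of the cascade
const retainedProfit = retainedStake * 0.05; // ~5% edge over time
const hedgedStake = 1_00_00_000;             // volume hedged on Betfair
const hedgeSpread = hedgedStake * 0.03;      // average 0.03 spread captured
const dataRevenue = 50_000;                  // data intelligence, amortized daily

const dailyRevenue = txnFees + retainedProfit + hedgeSpread + dataRevenue; // Rs 13.5 lakh
const annualProjection = dailyRevenue * 365; // ~Rs 49.3 crore
```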
The Key Insight
Hannibal is an operating system for bookmakers, not a bookmaker itself.
This distinction is critical. A bookmaker takes risk and profits (or loses) from betting outcomes. Hannibal provides the infrastructure that enables agents to take risk efficiently. Like an operating system, it earns from:
- Providing the platform (transaction fees)
- Running a small retained book at the top of the cascade (retained risk)
- Facilitating exchange access (hedge spread)
- Generating intelligence from aggregate data
This means Hannibal's revenue is diversified and largely non-directional. A bad day for bookies (punters win big) still generates transaction fees. A good day for bookies generates both fees and retained risk profit. The operating system always earns.
19. Implementation Order (for Developers)
The Guiding Principle
The single most important architectural insight: today a bet goes to a single routing destination; the end state routes it to N destinations, one per level of the hierarchy. Every implementation step moves incrementally toward this goal.
Step-by-Step Order
Step 1: Data Models First
Add all new Prisma models to the schema without changing any behavior. This is the safest possible first step -- it is purely additive. New tables for: forwarding matrix rules, agent limits, exposure ledgers, period definitions, audit records, and user win caps.
No existing behavior changes. No existing tests break. The database migration is backward-compatible.
Step 2: Audit Trail Second
Implement the audit record creation for every bet, even before forwarding logic changes. This provides immediate value: every bet now has a complete decision record. It also serves as an early warning system when we start changing routing behavior -- we can compare audit trails before and after.
Step 3: User Win Limits Third
Implement per-click win limits and stake reduction. This is independent of the agent hierarchy and forwarding logic. It sits at the very beginning of the bet flow (before routing decisions). It can be tested and deployed in isolation.
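A minimal sketch of the per-click cap with stake reduction, assuming back bets at decimal odds. `applyWinCap` is an illustrative name, and the rounding policy (floor to the nearest rupee) is an assumption.

```typescript
// Per-click win-cap enforcement: potential win = stake * (odds - 1); if that
// exceeds the cap, the stake is reduced so the win lands exactly on the cap.

interface CapResult { acceptedStake: number; reduced: boolean; }

function applyWinCap(stake: number, odds: number, winCap: number): CapResult {
  const potentialWin = stake * (odds - 1);
  if (potentialWin <= winCap) return { acceptedStake: stake, reduced: false };
  // Reduce: stake' = cap / (odds - 1), floored to the nearest rupee.
  return { acceptedStake: Math.floor(winCap / (odds - 1)), reduced: true };
}
```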
Step 4: Forwarding Precedence Chain Fourth
Implement the resolution logic: user override, then market override, then matrix lookup, then agent default. Initially, the matrix will be simple (2 dimensions: market_type and sport_type). The cascade still goes to a single destination, but the forwarding percentage is now determined by the precedence chain instead of a flat configuration.
Step 5: Cascading Routing Fifth
This is the big structural change. A bet that previously went to one destination now flows through the full agent hierarchy. Each level resolves its own forwarding percentage, checks its own limits, and forwards the remainder.
This must be implemented behind a feature flag so it can be enabled per agent. Early adopters test it while others continue with the existing behavior.
Step 6: NO_NEW_RISK and Hedge Detection Sixth
Implement automatic NO_NEW_RISK triggering and hedge detection. This depends on the exposure ledgers from Step 5 being accurate, which is why it comes after cascading routing.
Step 7: Period Management Last
Implement night and weekly periods, with timezone handling and carry-forward logic. This is last because it is the most operationally complex feature and depends on all other components working correctly.
Feature Flag Strategy
Every major feature is wrapped in a feature flag:
| Flag | Controls | Default |
|---|---|---|
| bbook.forwarding_matrix.enabled | Whether the matrix is used for routing decisions | OFF |
| bbook.cascading_routing.enabled | Whether bets cascade through the hierarchy | OFF |
| bbook.user_win_limits.enabled | Whether per-click and aggregate win limits are enforced | OFF |
| bbook.no_new_risk.enabled | Whether NO_NEW_RISK mode can activate | OFF |
| bbook.period_management.enabled | Whether night/weekly periods are active | OFF |
| bbook.audit_trail.enabled | Whether audit records are created | ON (from Step 2 onward) |
Flags can be toggled per agent. This allows:
- Gradual rollout to trusted agents first
- Quick rollback if issues are discovered
- A/B comparison between old and new routing
- Production testing with real traffic but limited blast radius
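Per-agent flag resolution can be sketched as an override map in front of the global defaults. Flag names match the table above; the store shape and `isEnabled` are illustrative.

```typescript
// Sketch of per-agent feature-flag resolution: an agent-specific override
// wins over the global default, which is OFF for everything except the
// audit trail.

const globalDefaults: Record<string, boolean> = {
  "bbook.forwarding_matrix.enabled": false,
  "bbook.cascading_routing.enabled": false,
  "bbook.user_win_limits.enabled": false,
  "bbook.no_new_risk.enabled": false,
  "bbook.period_management.enabled": false,
  "bbook.audit_trail.enabled": true,
};

function isEnabled(
  flag: string,
  agentId: string,
  perAgent: Map<string, boolean>, // key shape: `${agentId}:${flag}`
): boolean {
  return perAgent.get(`${agentId}:${flag}`) ?? globalDefaults[flag] ?? false;
}
```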
The Migration Path
TODAY                 PHASE 1               PHASE 2              PHASE 3
=====                 =======               =======              =======
Bet → Single Agent    Bet → Single Agent    Bet → Agent L1       Bet → Agent L1
(flat B-Book %)       (with limits)           → Agent L2           → Agent L2
(no limits)           (win caps)              → Platform           → Platform
(no audit)            (audit trail)           → Betfair            → Betfair
                                            (2D matrix)          (5D matrix)
                                            (basic cascade)      (hedge detection)
                                                                 (NO_NEW_RISK)
                                                                 (periods)
20. The Bookie's Final Verdict
The Spec Gets the Math Right, but Misses Operational Reality
The B-Book system as designed is mathematically sound. The forwarding matrix, cascading routing, and exposure accounting are all correct. But a system that is correct on paper and a system that survives contact with real bookies operating during a live IPL match are two different things.
Here is what operational reality demands:
The Five Things That Will Make or Break the System
1. Speed of configuration changes.
During a live match, a bookie needs to change their matrix in seconds, not minutes. If MI loses 3 wickets in an over and the bookie wants to reduce retention from 40% to 10%, the system must allow this change to take effect on the very next bet. A configuration change that requires a page reload, a cache flush, or a 30-second propagation delay is unacceptable.
Design response: Matrix changes take effect immediately. The system invalidates all caches for the affected agent synchronously. The next bet uses the new matrix.
2. Visibility into what is happening RIGHT NOW.
Bookies do not look at reports after the fact. They look at dashboards during the match. They need to see, in real time: current exposure by match, current exposure by outcome, how close they are to each limit, which users are winning, which users are losing, and what bets are coming in right now.
Design response: The real-time dashboard (Section 14) is not a nice-to-have; it is the product. The B-Book engine is invisible infrastructure. The dashboard is what agents interact with.
3. The ability to override everything.
No matter how good the matrix is, there will be moments when the bookie wants to override it. "I have inside information that this match is suspicious -- forward everything." "This user is my cousin -- let his bet through even though it exceeds the cap." The system must support manual overrides at every level without breaking the audit trail.
Design response: User overrides and market overrides sit above the matrix in the precedence chain. Every override is logged. The system accommodates human judgment while ensuring accountability.
4. Settlement speed and accuracy.
The bookie's trust in the system is earned at settlement time. If settlements are delayed, incorrect, or confusing, the agent will abandon the platform. Settlement must happen within minutes of a match ending, and the numbers must match exactly what the agent expects based on what they saw on their dashboard.
Design response: Settlement cascades through the hierarchy using the same audit records that were created at bet time. The agent can verify every settled bet against the audit trail. Discrepancies are impossible because the settlement engine uses the same source of truth as the bet engine.
5. Graceful degradation, not catastrophic failure.
When something goes wrong -- and it will, during the biggest matches at the worst possible time -- the system must degrade gracefully. A Redis outage should not reject bets; it should fall back to PostgreSQL. A Betfair outage should not block bet placement; it should absorb as retained risk. A matrix configuration error should not route bets to the void; it should fall back to the agent default.
Design response: Every component has a fallback. The cascade has a backstop. The system is designed to always accept the bet and always route it somewhere safe, even if that somewhere is not optimal.
What Would Make Every Bookie Want This System
The ultimate test is simple: does this system make more money for the bookie while requiring less manual work?
If Rajesh can configure his risk appetite once, trust the system to enforce it, see his position in real time, sleep through the night knowing his limits protect him, settle with Vikram cleanly every Monday, and identify his sharp users before they drain his bankroll -- then this system wins.
The B-Book is not a technical achievement to be admired. It is a tool to be used. Its success will be measured not in latency percentiles or audit trail completeness, but in how many agents adopt it, how much volume flows through it, and how few disputes arise from it.
Build the dashboard first. Make the math invisible. Let the bookie focus on what they do best: understanding their market and their punters. Let the system handle everything else.
Part II: Gap Analysis Solutions -- The Immune System
The following sections address critical gaps identified during expert review by a veteran B-Book architect (20+ years, 4 B-Book systems built) and a senior financial systems engineer. These are the "immune system" of the B-Book -- the mechanisms that handle failures, prevent exploits, and ensure the system degrades gracefully when things go wrong.
As the reviewer noted: "The forwarding matrix is the brain. What is missing is the immune system. Build the safety systems before you build the intelligence systems."
21. Bet Cancellation / Void / Partial Settlement State Machine
Why This Matters
In real bookmaking, bets do not always travel the happy path from placement to settlement. Matches get abandoned. Rain interrupts play after 10 overs. A corruption ruling voids specific markets. An admin discovers a data feed error and needs to void bets placed during a 30-second window. A punter calls within 5 seconds asking to cancel.
Every one of these scenarios must be handled without breaking the exposure ledgers, without double-counting, and without leaving orphaned positions anywhere in the agent hierarchy.
The Complete Bet State Machine
State Definitions
| State | What It Means | Exposure Impact | Reversible? |
|---|---|---|---|
| BET_PLACED | Bet accepted, positions being created. Transient state (< 100ms). | Ledgers not yet updated | Yes -- system error during creation rolls back |
| ACTIVE | All positions created, all exposure ledgers updated. The bet is live. | Fully reflected in all agent ledgers | No -- can only move forward to SETTLED, VOIDED, etc. |
| SETTLED | Event result known, P&L calculated, payouts determined. | Exposure removed from ledgers, P&L applied | Can move to RE_SETTLED if result correction |
| VOIDED | Entire bet nullified. All stakes returned. As if the bet never happened. | All exposure atomically removed from every agent in the chain | No -- void is final |
| PARTIALLY_VOIDED | Some markets/legs within the bet are voided, others settled normally. | Voided portion removed, settled portion resolved normally | No |
| CANCELLED | Punter-initiated cancellation within the allowed window. Functionally identical to void. | All exposure removed | No |
| CASH_OUT_SETTLED | Punter took early settlement via cash-out. | Original position closed, counter-position created and settled | No |
| RE_SETTLED | A previously settled bet has been re-settled due to result correction. | Previous P&L reversed, new P&L applied | Can be re-settled again if needed |
| REJECTED | Bet failed validation (invalid market, suspended event, etc.). Never reached ACTIVE. | Zero -- no positions were created | No |
Who Can Initiate Each State Transition
| Transition | Initiated By | Authorization Required | Time Window |
|---|---|---|---|
| ACTIVE -> SETTLED | System (automatic) | Event result feed | After event concludes |
| ACTIVE -> VOIDED | Platform admin | ADMIN or SUPER_ADMIN role | Any time before settlement |
| ACTIVE -> VOIDED | System (automatic) | Abandoned event rule triggers | When event is officially abandoned |
| ACTIVE -> PARTIALLY_VOIDED | Platform admin or system | Same as void | When specific markets are voided |
| ACTIVE -> CANCELLED | Punter | Punter's own bet only | Within cancellation window (configurable, typically 3-5 seconds for pre-match, 0 seconds for in-play) |
| ACTIVE -> CANCELLED | Agent (Rajesh) | Agent can cancel bets of their own punters | Within 60 seconds (configurable per agent) |
| SETTLED -> RE_SETTLED | Platform admin | SUPER_ADMIN role only | Within 72 hours of original settlement |
Key rule: Agents cannot void bets. Only the platform can void. Agents can cancel within a short window. This prevents an agent from voiding a bet after seeing that it lost (which would be a form of fraud against their upline).
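One way to make these rules enforceable is to express the transition table as data, so "agents cannot void" is a property of the table rather than scattered conditionals. This sketch models cancellation from ACTIVE (the seconds-long cancellation window outlives the sub-100ms BET_PLACED state); names are illustrative and only a subset of transitions is shown.

```typescript
// Declarative sketch of the bet state machine: a move is legal only if it
// appears in the table with a matching role.

type BetState = "BET_PLACED" | "ACTIVE" | "SETTLED" | "VOIDED" | "CANCELLED" | "RE_SETTLED";
type Role = "SYSTEM" | "PUNTER" | "AGENT" | "ADMIN" | "SUPER_ADMIN";

const allowed: Array<{ from: BetState; to: BetState; roles: Role[] }> = [
  { from: "ACTIVE", to: "SETTLED", roles: ["SYSTEM"] },
  // Only the platform can void -- AGENT is deliberately absent here.
  { from: "ACTIVE", to: "VOIDED", roles: ["SYSTEM", "ADMIN", "SUPER_ADMIN"] },
  { from: "ACTIVE", to: "CANCELLED", roles: ["PUNTER", "AGENT"] },
  { from: "SETTLED", to: "RE_SETTLED", roles: ["SUPER_ADMIN"] },
];

function canTransition(from: BetState, to: BetState, role: Role): boolean {
  return allowed.some(t => t.from === from && t.to === to && t.roles.includes(role));
}
```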
What Triggers Each Void Type
| Trigger | Void Type | Scope | Example |
|---|---|---|---|
| Match abandoned (weather, floodlight failure) | FULL_VOID | All markets on that event | IPL match abandoned after rain, no result possible |
| Match abandoned after partial completion | PARTIAL_VOID | Completed markets settle, incomplete markets void | IPL match abandoned after 10 overs -- completed over markets settle, match odds void |
| Corruption/match-fixing ruling | FULL_VOID | All markets on that event | ICC declares match result void due to fixing investigation |
| Data feed error | SELECTIVE_VOID | Bets placed during the error window | Odds feed showed 1.05 instead of 10.5 for 30 seconds; bets during that window are voided |
| Punter cancellation | CANCELLATION | Single bet | Amit taps "Cancel" within 3 seconds of placing a pre-match bet |
| Admin decision | ADMIN_VOID | Any scope (single bet, all bets on a market, all bets on an event) | Admin discovers a technical glitch and voids affected bets |
How Voids Cascade Through the Agent Hierarchy
This is the critical design challenge. When Amit's bet was placed, it was split across Rajesh (60%), Vikram (24%), and the Platform (16%). A void must reverse every single one of those positions atomically.
The key principle: the void operation reads the original audit record to determine exactly what to reverse. It does not recalculate anything. It uses the recorded split from bet placement time. This ensures that even if Rajesh changed his matrix since then, the void reverses exactly what was originally done.
Idempotent Void Operations
Every void operation is assigned a unique void_operation_id. Before executing, the system checks whether this void_operation_id has already been applied.
VOID IDEMPOTENCY CHECK
========================
1. Admin requests: void bet_a1b2c3d4, reason: MATCH_ABANDONED
2. System generates: void_op_id = void_bet_a1b2c3d4_MATCH_ABANDONED_v1
3. System checks: SELECT * FROM void_operations WHERE void_op_id = ?
4a. If NOT found: Execute void, record void_op_id with result
4b. If found: Return the recorded result, do NOT execute again
This means:
- Pressing "Void" twice does NOT double-decrement exposure
- A network retry after timeout does NOT create a second void
- A batch void that partially fails can be safely retried
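The check-then-execute pattern can be sketched in Python. This is a minimal illustration against an in-memory SQLite table; the function name and the `do_void` callback are assumptions, not the production API. The primary key on `void_op_id` (as in the table below) is what makes truly concurrent retries safe.

```python
import sqlite3

def execute_void(conn, void_op_id, do_void):
    """Apply a void at most once. Repeated calls with the same void_op_id
    return the recorded result instead of re-executing."""
    row = conn.execute(
        "SELECT result FROM void_operations WHERE void_op_id = ?", (void_op_id,)
    ).fetchone()
    if row is not None:
        # Idempotent hit: return the stored result, do NOT execute again
        conn.execute(
            "UPDATE void_operations SET idempotent_hit_count = idempotent_hit_count + 1 "
            "WHERE void_op_id = ?", (void_op_id,)
        )
        conn.commit()
        return row[0], False
    result = do_void()  # reverse positions, decrement ledgers, etc.
    conn.execute(
        "INSERT INTO void_operations (void_op_id, result, idempotent_hit_count) "
        "VALUES (?, ?, 0)", (void_op_id, result),
    )
    conn.commit()
    return result, True
```

Calling this twice with the same `void_op_id` runs the void exactly once; the second call only bumps `idempotent_hit_count`.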
The void_operations table stores:
| Column | Type | Purpose |
|---|---|---|
| void_op_id | TEXT (PK) | Idempotency key |
| bet_id | TEXT | Which bet was voided |
| void_type | ENUM | FULL_VOID, PARTIAL_VOID, CANCELLATION, ADMIN_VOID |
| reason | TEXT | Human-readable reason |
| initiated_by | TEXT | User ID of who initiated it |
| positions_reversed | JSONB | Snapshot of every position that was reversed |
| ledger_adjustments | JSONB | Snapshot of every ledger decrement |
| executed_at | TIMESTAMP | When the void was applied |
| idempotent_hit_count | INT | How many times this void was re-requested after first execution |
How Exposure Ledgers Are Atomically Decremented
The void executes as a single database transaction with FOR UPDATE locks on all affected exposure ledger rows. The transaction includes:
BEGIN TRANSACTION;
-- Lock all affected ledger rows in a deterministic order
-- (always lock by agent_id ascending to prevent deadlocks)
SELECT * FROM exposure_ledgers
WHERE (agent_id = 'rajesh' AND scope = 'cricket_sport')
   OR (agent_id = 'rajesh' AND scope = 'mi_vs_csk_match')
   OR (agent_id = 'rajesh' AND scope = 'night_period')
   OR (agent_id = 'vikram' AND scope = 'cricket_sport')
   ... (all affected scopes for all agents)
ORDER BY agent_id, scope
FOR UPDATE;
-- Decrement each ledger by the exact amount from the audit record
-- Update Redis cache after commit
-- Insert void_operation record
-- Update bet status to VOIDED
COMMIT;
After the database transaction commits, Redis and application LRU caches are invalidated for all affected agents. The invalidation order does not matter because the caches are read-through -- a cache miss simply reads the correct value from PostgreSQL.
NO_NEW_RISK Re-evaluation After Voids
When a void reduces an agent's exposure, the system must check whether the agent should exit NO_NEW_RISK mode:
AFTER VOID:
1. Read Rajesh's current retained_open_liability for each scope
2. Compare against each limit
3. If retained_open_liability < limit for ALL scopes:
→ Clear NO_NEW_RISK flag in Redis
→ Agent can accept new risk bets again
4. If still over limit for any scope:
→ Keep NO_NEW_RISK active for that scope
This check happens inside the same transaction as the void. The NO_NEW_RISK flag in Redis is updated immediately after the transaction commits.
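The re-evaluation rule above reduces to a per-scope comparison. A minimal sketch (function name and dict shapes are illustrative, not the production interface) — the flag is cleared only when the returned set is empty:

```python
def reevaluate_no_new_risk(exposures, limits):
    """Return the set of scopes that must remain in NO_NEW_RISK after a void.

    exposures: {scope: retained_open_liability}
    limits:    {scope: configured limit}
    An empty result means the NO_NEW_RISK flag can be cleared entirely.
    """
    return {
        scope
        for scope, liability in exposures.items()
        # Still at or over the limit for this scope -> keep NO_NEW_RISK
        if liability >= limits.get(scope, float("inf"))
    }
```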
Walk-Through: IPL Match Abandoned After 10 Overs
Scenario: MI vs CSK, IPL 2026. Rain stops play after 10 overs. The match is officially abandoned -- no result. Here is what happens:
The markets on this match:
| Market | Status at Abandonment | Action |
|---|---|---|
| Match Odds (MI win / CSK win / Draw) | Incomplete -- no result determined | VOIDED -- stakes returned |
| First Innings Total Runs | Incomplete -- only 10 overs bowled of 20 | VOIDED -- stakes returned |
| Over 1 Runs (6.5 over/under) | Completed -- over 1 finished with 8 runs | SETTLED -- over 6.5 wins |
| Over 2 Runs (7.5 over/under) | Completed -- over 2 finished with 6 runs | SETTLED -- under 7.5 wins |
| ... Overs 3-10 ... | Completed | SETTLED normally |
| Over 11 Runs | Not started | VOIDED -- stakes returned |
| Top Batsman | Incomplete | VOIDED -- stakes returned |
How this processes:
Step 1: Event marked as ABANDONED by the feed or admin.
The system receives the event status update: event_status = ABANDONED, overs_completed = 10.
Step 2: Market-level settlement rules kick in.
Each market has a settlement rule defined at creation time:
| Market Type | Abandonment Rule |
|---|---|
| Match Odds | Void if no result |
| Over X Runs | Settle if over X is completed, void if not |
| Top Batsman | Void unless one innings fully completed |
| First Innings Total | Void unless first innings fully completed |
Step 3: System generates a batch of void and settlement operations.
For this match, there are 847 open bets. The system processes them in a single settlement batch:
- 312 bets on Match Odds: all VOIDED
- 85 bets on First Innings Total: all VOIDED
- 43 bets on Top Batsman: all VOIDED
- 20 bets on Over 11-20 markets: all VOIDED
- 387 bets on Over 1-10 markets: all SETTLED with actual results
Step 4: Void cascade for each voided bet.
Take Amit's bet as an example. He bet ₹10,000 on MI to win at 1.85. The original audit record shows:
Rajesh retained: ₹6,000 stake, ₹5,100 liability
Vikram retained: ₹2,400 stake, ₹2,040 liability
Platform retained: ₹800 stake, ₹680 liability
Betfair hedged: ₹800 stake
The void reverses all of these. Amit gets his ₹10,000 back. Rajesh's exposure drops by ₹5,100. Vikram's drops by ₹2,040. The platform's drops by ₹680. The Betfair hedge is cancelled (if unmatched) or counter-traded (if matched).
Step 5: Settlement cascade for each settled bet.
Sonia bet ₹5,000 on Over 1 Runs Over 6.5 at odds 1.90. Over 1 completed with 8 runs (over 6.5 wins). This bet is settled as a winner. The settlement cascade pays out through the same chain that held the positions.
Step 6: Rajesh sees the result on his dashboard.
MI vs CSK -- MATCH ABANDONED (Rain)
=====================================
Voided markets: Match Odds, First Innings Total, Top Bat, Overs 11-20
Settled markets: Over 1-10 Runs
Your positions:
Voided: ₹1,82,000 stake returned (34 bets)
Settled: +₹24,000 profit (18 winning bets, 14 losing bets)
Net: +₹24,000 profit from completed overs
Exposure released: ₹3,45,000 (your cricket limit now has more headroom)
The Partial Void Edge Case
What about a multi-leg (accumulator/parlay) bet where one leg is voided but others settle? The standard industry rule is:
When one leg of a multi-leg bet is voided, that leg is treated as a winner at odds 1.00. The remaining legs settle normally with the voided leg's odds removed from the accumulator calculation.
Example: Amit places a 3-leg accumulator:
- Leg 1: MI win at 1.85 (VOIDED -- match abandoned)
- Leg 2: RCB win at 2.10 (SETTLED -- RCB won)
- Leg 3: KKR win at 1.70 (SETTLED -- KKR lost)
Original combined odds: 1.85 x 2.10 x 1.70 = 6.60
After void adjustment: 1.00 x 2.10 x 1.70 = 3.57
But Leg 3 lost, so the entire accumulator loses. The void did not save the bet.
If all non-voided legs had won, the payout would use the reduced odds (3.57 instead of 6.60). The positions at each agent level are recalculated based on the reduced odds and the partial void is applied to the difference.
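The voided-leg-at-1.00 rule is easy to state in code. A sketch of the odds adjustment (the function name and tuple shape are assumptions; position recalculation at each agent level is out of scope here):

```python
def adjusted_accumulator_odds(legs):
    """Combined odds after treating voided legs as winners at odds 1.00.

    legs: list of (odds, status) where status is 'WON', 'LOST', or 'VOIDED'.
    Returns (payout_multiplier, bet_wins).
    """
    if any(status == "LOST" for _, status in legs):
        return 0.0, False  # one lost leg kills the whole accumulator
    multiplier = 1.0
    for odds, status in legs:
        # A voided leg contributes 1.00 instead of its original odds
        multiplier *= 1.0 if status == "VOIDED" else odds
    return round(multiplier, 2), True
```

With Amit's example: the KKR loss makes the accumulator lose regardless of the void; had all non-voided legs won, the multiplier would be 3.57.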
22. MVCC for Forwarding Matrix Changes
The Problem
At 9:47 PM during MI vs CSK, Rajesh changes his forwarding matrix. He reduces his retention on in-play match odds from 40% to 15% because MI just lost 3 quick wickets.
At the exact moment he saves this change, there are 15 bets in various stages of processing. Some are in the matrix resolution step, some are in cap evaluation, some are about to write positions. If half of those bets use the old matrix and half use the new one, the audit trail becomes inconsistent and unexplainable.
The solution is Multi-Version Concurrency Control (MVCC) for the forwarding matrix. Every matrix change creates a new version. Every bet captures which version it used. Old versions are preserved for audit and replay.
How Matrix Versions Work
RAJESH'S MATRIX VERSIONS
==========================
Version 1 (created: Feb 1, 2026 10:00 AM)
- Initial setup via onboarding wizard
- 8 rules, "Balanced" profile
- active_from: 2026-02-01T10:00:00Z
- active_until: 2026-02-11T21:47:00Z (set when V2 was created)
Version 2 (created: Feb 11, 2026 9:47 PM)
- Rajesh reduced in-play match odds retention from 40% to 15%
- 8 rules, modified Rule R5
- active_from: 2026-02-11T21:47:00Z
- active_until: 2026-02-11T22:15:00Z (set when V3 was created)
Version 3 (created: Feb 11, 2026 10:15 PM)
- Rajesh restored original settings after MI stabilized
- 8 rules, Rule R5 back to 40%
- active_from: 2026-02-11T22:15:00Z
- active_until: NULL (current active version)
The Version Data Model
Each matrix version is an immutable snapshot:
| Field | Type | Description |
|---|---|---|
| version_id | UUID | Unique identifier for this version |
| agent_id | TEXT | Which agent owns this matrix |
| version_number | INT | Monotonically increasing sequence per agent |
| rules | JSONB | The complete set of matrix rules (immutable snapshot) |
| created_at | TIMESTAMP | When this version was created |
| created_by | TEXT | Who created it (agent, admin, system) |
| change_reason | TEXT | Why the change was made (free text or enum) |
| active_from | TIMESTAMP | When this version became the active version |
| active_until | TIMESTAMP | When this version was superseded (NULL if current) |
| checksum | TEXT | SHA-256 of the rules JSONB, for integrity verification |
Key design rule: matrix versions are immutable. Once created, a version is never modified. A "change" always creates a new version. The active_until field on the old version is the only field that changes (it gets stamped when superseded).
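For the checksum field to be useful, the rules JSONB must be hashed in a canonical encoding, otherwise two semantically identical versions can produce different digests. A sketch of one reasonable approach (the function name is illustrative; the production serializer may differ):

```python
import hashlib
import json

def matrix_checksum(rules):
    """SHA-256 over a canonical JSON encoding of the rules.

    Sorted keys and fixed separators make the digest independent of
    dict ordering at write time, so integrity checks are stable.
    """
    canonical = json.dumps(rules, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```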
How Bets Capture Their Matrix Version
When a bet enters the matrix resolution step, it captures the current active version ID before evaluating any rules:
BET PROCESSING TIMELINE
========================
1. Bet arrives at matrix resolution step
2. Read current_active_version_id for this agent from cache
→ This is an atomic read: either version 1 or version 2, never a mix
3. Load the rules for that specific version
4. Evaluate rules against bet characteristics
5. Record the version_id in the bet's audit trail
6. Continue to cap evaluation and position creation
The version_id is captured ONCE at the start of matrix resolution.
All subsequent steps for this bet use the rules from that version.
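The capture-once rule can be demonstrated with a toy in-memory store (the `MatrixStore` class, field names, and the single `in_play` rule are all illustrative stand-ins for the Redis-backed lookup):

```python
class MatrixStore:
    """Minimal stand-in for the Redis-backed active-version lookup."""
    def __init__(self):
        self.active = {"version_id": "v1", "version_number": 1,
                       "rules": {"in_play": 0.40}}

    def get_active_version(self):
        return self.active  # a single atomic read in the real system

def resolve_bet(bet, store):
    # Capture the version ONCE. Every later step uses this snapshot,
    # even if the agent saves a new version while the bet is in flight.
    version = store.get_active_version()
    return {
        "bet_id": bet["bet_id"],
        "matrix_version_id": version["version_id"],
        "forward_pct": version["rules"]["in_play"],
    }
```

A bet resolved before a version change keeps the old snapshot; a bet resolved after it sees the new one. There is no mixed state within a single bet.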
How This Interacts With the 3-Tier Cache
Each cache tier must be version-aware. When Rajesh saves a new matrix version:
Cache layer behavior on version change:
| Cache Tier | What Happens | Why |
|---|---|---|
| Tier 1 (App LRU) | Entry for rajesh_active_matrix is immediately invalidated via pub/sub | Next read loads from Redis or DB |
| Tier 2 (Redis) | rajesh_active_matrix_version key updated atomically to new version_id | All app instances see the new version on next read |
| Tier 3 (PostgreSQL) | New version row inserted, old version's active_until set | Source of truth, always correct |
The version-awareness rule for caches:
Each cache entry for a matrix stores the version_id alongside the rules. When a cache hit returns a version_id that does not match the current active version (which can happen in the brief window between DB write and cache invalidation), the cache entry is treated as a miss and the current version is loaded from the next tier.
CACHE ENTRY STRUCTURE
======================
Key: matrix:rajesh:active
Value: {
version_id: "v2-uuid-here",
version_number: 2,
rules: [...],
cached_at: "2026-02-11T21:47:01Z"
}
On read, the system also checks:
→ Is this version_id still the active version? (via a lightweight Redis lookup)
→ If not, treat as cache miss
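The stale-hit-as-miss rule can be sketched as a small read function. This is an illustration only: the cache is a plain dict, `redis_active_version` stands in for the lightweight Redis lookup, and `load_from_db` for the next tier:

```python
def read_matrix(lru, redis_active_version, load_from_db):
    """Version-aware cache read: a hit whose version_id no longer matches
    the active version is treated as a miss."""
    entry = lru.get("matrix:rajesh:active")
    if entry is not None and entry["version_id"] == redis_active_version:
        return entry  # fresh hit
    # Stale or missing: load from the next tier and repopulate
    fresh = load_from_db()
    lru["matrix:rajesh:active"] = fresh
    return fresh
```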
Audit Trail Records Which Version Was Used
Every bet's audit record includes:
matrix_resolution:
agent: rajesh
matrix_version_id: "v1-uuid-here"
matrix_version_number: 1
matrix_checksum: "sha256:abc123..."
rule_matched: R3
rule_specificity: 3
forward_percentage: 40%
resolution_timestamp: "2026-02-11T21:46:59Z" (2 seconds BEFORE matrix change)
This means that during a dispute, the system can show: "This bet used matrix version 1, which was active from Feb 1 to Feb 11 at 9:47 PM. The rule that matched was R3 with 40% forwarding. Matrix version 2 (15% forwarding) was created 2 seconds later and did not affect this bet."
Garbage Collection of Old Versions
Old matrix versions must be retained for audit and replay purposes. The garbage collection policy is:
| Age of Version | Retention Policy |
|---|---|
| < 90 days | Full retention. All rules, all metadata. |
| 90 days - 1 year | Compressed retention. Rules stored as compressed JSONB. Metadata retained. |
| > 1 year | Archive to cold storage (S3/equivalent). Only metadata in DB. Rules retrievable on demand. |
| > 3 years | Delete unless referenced by an unresolved dispute. |
Versions that are referenced by any active (unsettled) bet are NEVER garbage collected, regardless of age. The reference count is maintained via a simple foreign key from the bet audit record to the matrix version.
Walk-Through: Rajesh Changes Matrix Mid-IPL-Match, 15 Bets In-Flight
Setup: MI vs CSK, 9:47 PM. MI just lost their 3rd wicket in 2 overs. Rajesh panics and changes his in-play match odds retention from 40% to 15%.
At the moment of the change, 15 bets are in various stages:
| Bets | Stage at the Moment of Change | Outcome |
|---|---|---|
| Bets 1-5 | Already past matrix resolution, in cap evaluation or position creation | These captured matrix version 1 (40% retention). They complete with 40% retention. |
| Bets 6-10 | In the request queue, not yet started processing | These will read the new active version (version 2, 15% retention) when they reach matrix resolution. |
| Bets 11-15 | Currently in matrix resolution step | These are the interesting ones. |
For bets 11-15: The matrix version read is atomic. Each bet reads the current_active_version_id once. If the read happens before the Redis key is updated, they get version 1. If after, they get version 2. There is no "half old, half new" state.
Timeline:
9:47:00.000 Rajesh clicks "Save" on new matrix
9:47:00.005 PostgreSQL: new version 2 created, version 1 active_until = NOW
9:47:00.010 Redis: rajesh_active_matrix_version updated to v2
9:47:00.012 App LRU: rajesh cache entry invalidated
Bet 11: matrix resolution at 9:47:00.003 → reads LRU cache → gets version 1 → 40% forward
Bet 12: matrix resolution at 9:47:00.008 → LRU hit, version check against Redis passes (Redis not updated until .010) → gets version 1 → 40% forward
Bet 13: matrix resolution at 9:47:00.011 → LRU hit is stale (Redis already shows v2), treated as a miss → reads Redis → gets version 2 → 15% forward
Bet 14: matrix resolution at 9:47:00.015 → LRU miss, reads Redis → gets version 2 → 15% forward
Bet 15: matrix resolution at 9:47:00.020 → LRU loads version 2 → 15% forward
Result: Bets 11-12 used version 1 (40% retention). Bets 13-15 used version 2 (15% retention). The transition is clean. Each bet's audit trail records exactly which version was used. Rajesh can see in his dashboard:
Matrix Change Applied at 9:47 PM
=================================
Last bet with old matrix (v1, 40% retention): 9:47:00.008 PM
First bet with new matrix (v2, 15% retention): 9:47:00.011 PM
Transition time: 3 milliseconds
2 bets processed with old matrix during transition
No bets received inconsistent matrix data
23. Dead Letter Queue and Poison Bet Handling
The Problem
In any distributed system, some messages will fail to process. In Hannibal, this means some bets will fail to route through the cascade, fail to create positions, or fail to update exposure ledgers. These failed bets cannot be silently dropped -- they represent real money that a punter believes they have wagered.
A "dead letter" is a bet that has exhausted its retry budget and cannot be processed automatically. A "poison bet" is a specific class of dead letter where the bet is fundamentally unprocessable -- retrying it will never succeed.
The Retry Pipeline
Retry Policy
| Retry Stage | Max Retries | Backoff | Total Window |
|---|---|---|---|
| Immediate retry | 3 | 50ms, 200ms, 500ms | ~750ms |
| Short backoff retry | 3 | 2s, 5s, 10s | ~17s |
| Long backoff retry | 3 | 30s, 60s, 120s | ~3.5 min |
| Dead letter (RETRYABLE) | 3 | 5 min, 15 min, 30 min | ~50 min |
| Dead letter (POISON) | 0 | N/A -- goes directly to manual queue | N/A |
Total automatic retry window: approximately 55 minutes from first failure to final dead letter classification. This is deliberate -- most infrastructure issues (Redis restart, database failover, network partition) resolve within this window.
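The schedule above flattens into a single delay sequence that a retry scheduler can index by attempt number. A sketch (the data structure and function name are illustrative):

```python
RETRY_SCHEDULE = [
    # (stage, delays in seconds)
    ("immediate",     [0.05, 0.2, 0.5]),
    ("short_backoff", [2, 5, 10]),
    ("long_backoff",  [30, 60, 120]),
    ("dlq_retryable", [300, 900, 1800]),
]

def next_delay(attempt):
    """Delay before retry number `attempt` (1-based).
    None means the retry budget is exhausted: dead-letter for good."""
    flat = [d for _, delays in RETRY_SCHEDULE for d in delays]
    return flat[attempt - 1] if attempt <= len(flat) else None
```

Summing the twelve delays gives roughly 54 minutes, matching the stated total window.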
What Constitutes a Poison Bet
A bet is classified as POISON (unprocessable by retry) when:
| Condition | Why It Is Poison | Example |
|---|---|---|
| Event already settled | The bet is for an event that has already concluded. Placing a position makes no sense. | Bet queued during outage, event settles before replay. |
| Agent suspended mid-processing | The bet was partially processed when the agent was suspended. The cascade path is now invalid. | Rajesh suspended for payment default while 5 of his bets were in the retry queue. |
| Market no longer exists | The market was removed or never existed (data error). | Bet references a market_id that was deleted due to a data feed error. |
| Stake exceeds agent's total possible capacity | Even with zero current exposure, the agent cannot absorb any of this bet (limit is smaller than the bet's minimum position). | Agent's total sport limit is ₹10,000 but the bet requires ₹50,000 minimum position. |
| Invalid state transition | The bet is in a state that cannot transition to ACTIVE (e.g., already CANCELLED). | Punter cancelled the bet during the retry window. |
| Duplicate bet_id | A bet with this exact ID already exists in ACTIVE state (the original succeeded but the confirmation was lost, triggering a retry). | Network timeout caused client to retry, first attempt actually succeeded. |
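The table above is essentially a classification function: each poison condition is a predicate, and anything that matches none of them stays retryable. A sketch (all field names in `bet` and `context` are illustrative, not the production schema):

```python
def classify_failure(bet, context):
    """Return 'POISON' when no retry can ever succeed, else 'RETRYABLE'."""
    if context.get("event_status") == "SETTLED":
        return "POISON"  # event concluded before the bet could be placed
    if context.get("agent_status") == "SUSPENDED":
        return "POISON"  # cascade path is no longer valid
    if not context.get("market_exists", True):
        return "POISON"  # market removed or never existed
    if bet.get("min_position", 0) > context.get("agent_total_limit", float("inf")):
        return "POISON"  # agent can never absorb this bet, even at zero exposure
    if context.get("duplicate_active_bet", False):
        return "POISON"  # original attempt actually succeeded
    return "RETRYABLE"
```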
The Dead Letter Queue Data Model
| Field | Type | Description |
|---|---|---|
| dlq_entry_id | UUID | Unique identifier |
| bet_id | TEXT | The original bet ID |
| original_request | JSONB | Complete original bet request (preserved exactly) |
| failure_reason | TEXT | Why it failed |
| failure_category | ENUM | POISON, RETRYABLE, UNKNOWN |
| retry_count | INT | How many retries were attempted |
| retry_history | JSONB | Timestamps and error messages for each retry |
| first_failure_at | TIMESTAMP | When the first failure occurred |
| last_retry_at | TIMESTAMP | When the last retry was attempted |
| dead_lettered_at | TIMESTAMP | When it was moved to the DLQ |
| resolution_status | ENUM | PENDING, IN_REVIEW, RESOLVED_VOID, RESOLVED_PROCESSED, RESOLVED_REFUND |
| resolved_by | TEXT | Who resolved it (admin user ID) |
| resolved_at | TIMESTAMP | When it was resolved |
| resolution_notes | TEXT | Free-text notes from the resolver |
| punter_notified | BOOLEAN | Whether the punter has been told about the issue |
| punter_notification_sent_at | TIMESTAMP | When the notification was sent |
What the Punter Experiences
This is the most delicate part of DLQ design. The punter tapped "Place Bet" and saw a response. What did they see?
Scenario A: Failure during initial processing (before confirmation)
The punter saw: "Bet is being processed..." followed by an error or timeout. They do NOT see "Bet Confirmed." In this case:
- The system shows: "Your bet could not be processed. Please try again."
- The bet enters the retry pipeline silently
- If retries succeed, the punter receives a push notification: "Your bet on MI to win has been confirmed."
- If retries fail and the bet is dead-lettered, the punter receives: "Your bet on MI to win could not be placed. No funds were deducted."
Scenario B: Failure during cascade (after partial processing)
This is the dangerous case. The punter's bet was accepted and confirmed (because the initial validation passed), but the cascade failed mid-way. The punter saw "Bet Confirmed." Their account shows the bet as active. But the positions were not fully created across the agent hierarchy.
In this case:
- The punter continues to see "Bet Active" -- we do NOT retroactively change their view
- The system retries the cascade in the background
- If retries succeed, everything is reconciled and the punter never knows there was an issue
- If retries fail, the bet enters the DLQ and an admin resolves it
Scenario C: Poison bet (event already settled)
The bet was queued during an outage. By the time the system recovers, the event has already settled. The bet cannot be placed retroactively.
- The punter saw "Bet is being processed..." (or possibly "Bet Confirmed" if the initial ack was sent)
- Resolution options for the admin:
- VOID -- return the stake, notify the punter: "Your bet on MI to win was cancelled due to a technical issue. Your stake of ₹10,000 has been refunded."
- SETTLE AT RESULT -- if the bet would have been placed had the system been working, settle it as if it were placed. This is the punter-friendly option but creates financial exposure that was never accounted for in the ledgers.
- VOID WITH GOODWILL -- void the bet but offer the punter a goodwill credit.
Platform policy recommendation: For pre-match bets that failed during a system outage, VOID and refund is the standard. For in-play bets, VOID is the only safe option because the odds may have moved significantly during the outage.
The Manual Resolution Queue Workflow
Admin dashboard for the manual resolution queue:
DEAD LETTER QUEUE -- MANUAL RESOLUTION
=======================================
Pending: 3 entries | Oldest: 12 minutes | Today resolved: 7
┌───────────────────────────────────────────────────────────────────┐
│ DLQ-001 POISON HIGH PRIORITY 12 min ago │
│ │
│ Bet: ₹15,000 on MI to win @ 1.85 │
│ Punter: Amit (under Rajesh) │
│ Reason: Event MI_vs_CSK already SETTLED │
│ Punter saw: "Bet is being processed" (no confirmation sent) │
│ Event result: MI won │
│ If settled: Punter wins ₹12,750 (agent loses) │
│ If voided: Punter refunded ₹15,000 (no P&L impact) │
│ │
│ [Void & Refund] [Settle at Result] [Escalate to Senior Admin] │
└───────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────┐
│ DLQ-002 RETRYABLE MEDIUM PRIORITY 8 min ago │
│ │
│ Bet: ₹5,000 on RCB to win @ 2.10 │
│ Punter: Sonia (under Rajesh) │
│ Reason: Database timeout at position creation (retry 6/9) │
│ Punter saw: "Bet Confirmed" (confirmation was sent) │
│ Next auto-retry: in 7 minutes │
│ │
│ [Force Retry Now] [Void & Refund] [Wait for Auto-Retry] │
└───────────────────────────────────────────────────────────────────┘
Reconciliation for Orphaned Bets
An orphaned bet is one where the punter-facing record says "Active" but the agent-side positions were never fully created. The reconciliation process runs every 5 minutes:
ORPHAN DETECTION QUERY
========================
Find all bets WHERE:
- bet_status = ACTIVE
- created more than 5 minutes ago (give normal processing time to complete)
- created less than 60 minutes ago (anything older is already in DLQ or resolved)
- position_count < expected_position_count (based on hierarchy depth)
For each orphaned bet:
1. Check if it is in the retry pipeline → skip (it is being handled)
2. Check if it is in the DLQ → skip (it is being handled)
3. Otherwise → add to DLQ as RETRYABLE with note "Detected by reconciliation"
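The detection query and the three-step filter can be sketched together in Python. This is an in-memory illustration (the function signature, dict fields, and the `expected_positions` callback are assumptions; production runs this as SQL plus a worker):

```python
from datetime import datetime, timedelta

def find_orphans(bets, now, expected_positions, in_retry, in_dlq):
    """Return bet_ids of ACTIVE bets whose agent-side positions are incomplete.

    bets: dicts with bet_id, status, created_at, position_count.
    in_retry / in_dlq: sets of bet_ids already being handled elsewhere.
    """
    lo, hi = now - timedelta(minutes=60), now - timedelta(minutes=5)
    orphans = []
    for b in bets:
        if b["status"] != "ACTIVE":
            continue
        if not (lo <= b["created_at"] <= hi):
            continue  # too fresh (still processing) or too old (already handled)
        if b["position_count"] >= expected_positions(b):
            continue  # fully materialized, not an orphan
        if b["bet_id"] in in_retry or b["bet_id"] in in_dlq:
            continue  # already being handled
        orphans.append(b["bet_id"])
    return orphans
```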
Walk-Through: Bet Queued During Outage, Event Settles Before Replay
Setup: 9:30 PM, the database connection pool is exhausted during a traffic spike. Bets are being accepted (the API layer is healthy) but position creation is failing. The system is retrying bets with exponential backoff.
9:30:15 PM: Amit places ₹15,000 on MI to win at 1.85. The API accepts the request and returns "Bet is being processed." The bet enters the retry pipeline because the database write fails.
9:30:15 - 9:31:00 PM: Retries 1-3 (immediate): fail. Database still overloaded.
9:31:00 - 9:31:17 PM: Retries 4-6 (short backoff): fail. Database recovering.
9:32:00 PM: The MI vs CSK match ends. MI wins. The settlement service settles all active positions for this event.
9:32:30 PM: Retry 7 (long backoff) fires. The system attempts to create positions for Amit's bet. But now the event is SETTLED. The position creation logic detects: "Event MI_vs_CSK is in SETTLED state. Cannot create new positions."
9:32:30 PM: The bet is classified as POISON with reason: EVENT_ALREADY_SETTLED. It enters the Dead Letter Queue.
9:32:31 PM: An alert fires on the admin dashboard: "1 poison bet detected. Event settled before bet could be processed."
9:33:00 PM: The on-duty admin reviews the DLQ entry. They see:
- The bet was submitted at 9:30:15 PM, before the event ended
- The punter never received a "Bet Confirmed" message
- The event result: MI won
- If they settle it: Amit wins ₹12,750 (which the agents never priced into their exposure)
- If they void it: Amit gets ₹15,000 refunded, no P&L impact
9:33:30 PM: The admin chooses "Void & Refund." Amit receives a push notification: "Your bet on MI to win could not be processed due to a technical issue. Your ₹15,000 has been refunded. We apologize for the inconvenience."
Why void is the correct default: The agents in the cascade never had this bet's exposure counted against their limits. Settling it retroactively would create phantom exposure that was never risk-managed. The safe choice is always to void and refund.
24. Settlement Cascade Failure Isolation
The Problem
When an IPL final settles, the system might need to process 4,000 positions across 50 agents. If the database times out at position 2,847, what happens to positions 1-2,846 (already processed) and 2,848-4,000 (not yet processed)? The answer cannot be "start over from scratch" -- that would double-settle the first 2,846 positions. And it cannot be "give up" -- that would leave 1,153 positions unsettled.
Per-Position Settlement State Tracking
Every position has its own settlement state, independent of all other positions:
| State | Meaning | Duration |
|---|---|---|
| PENDING | Event result is known, this position is waiting to be settled | Seconds to minutes (queued) |
| PROCESSING | A settlement worker has claimed this position and is calculating P&L | Milliseconds (fast) |
| SETTLED | P&L has been calculated and the exposure ledger has been updated | Seconds (until reconciliation) |
| FAILED | An error occurred during processing. Will be retried. | Until retry succeeds or exhausts retries |
| CONFIRMED | Post-settlement reconciliation has verified this position's numbers match | Permanent (terminal) |
The Settlement Worker Design
Settlement is processed by independent workers that claim positions in batches:
SETTLEMENT WORKER FLOW
========================
1. Worker polls for positions in PENDING state
→ Claims a batch of up to 100 positions
→ Uses SELECT ... FOR UPDATE SKIP LOCKED
→ This means: lock the rows, but if another worker already locked them, skip and take the next ones
2. For each position in the batch:
a. Set state to PROCESSING
b. Calculate P&L based on event result and position odds/stake
c. Update the agent's exposure ledger (decrement retained_open_liability)
d. Update the agent's settled P&L ledger
e. Set state to SETTLED
f. If any step fails: set state to FAILED, record error, move to next position
3. After the batch:
→ Commit all SETTLED positions
→ FAILED positions remain in FAILED state for retry
→ Worker picks up next batch
The critical design: each position is settled independently. Position 2,847 failing does not block position 2,848. The worker simply records the failure and moves on.
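Step 2f is the heart of the isolation guarantee, and it is worth making concrete. A sketch of the per-position error handling (function names and dict shapes are illustrative; the real worker also manages the FOR UPDATE SKIP LOCKED claim and state writes):

```python
def settle_batch(positions, settle_one):
    """Settle each position independently; one failure never blocks the rest."""
    results = {"SETTLED": [], "FAILED": []}
    for pos in positions:
        try:
            settle_one(pos)  # calculate P&L, update ledgers
            results["SETTLED"].append(pos["id"])
        except Exception as exc:
            # Record the error and keep going -- mirrors step 2f above
            pos["error"] = str(exc)
            results["FAILED"].append(pos["id"])
    return results
```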
Agent-Level Isolation
Settlement is partitioned by agent. Each agent's positions are settled by a separate worker thread (or, in high-volume scenarios, a separate worker process). This provides fault isolation:
SETTLEMENT PARTITIONING
========================
Event: MI vs CSK (SETTLED: MI wins)
Total positions: 4,000 across 50 agents
Worker 1: Rajesh's 180 positions
Worker 2: Vikram's 420 positions (including forwarded positions)
Worker 3: Priya's 95 positions
Worker 4: Suresh's 310 positions
...
Worker 50: Kwame's 15 positions
Each worker operates independently.
Rajesh's settlement failure does NOT block Vikram's settlement.
Settlement Ordering: Does It Matter?
Within a single agent: Order does not matter. Each position is independent. Settling position 3 before position 1 produces the same final ledger state.
Across agents: Order does not matter for financial accuracy. Rajesh's settlement is independent of Vikram's. The exposure ledgers are per-agent, so there is no cross-agent dependency.
One exception: the platform's Betfair hedge positions. If the platform needs to close hedge positions on Betfair, this should happen AFTER all agent-side positions are settled, because the platform needs to know the final net position before deciding how to close the hedge. Design rule: platform hedge settlement runs after all agent settlements are CONFIRMED (or FAILED-and-escalated).
Settlement Reconciliation
After each settlement batch, a reconciliation check runs:
RECONCILIATION CHECK (per event, per agent)
=============================================
For each agent who held positions on this event:
1. Sum all SETTLED position P&L values
→ This is what we actually settled
2. Compare against the pre-settlement exposure ledger
→ retained_open_liability for this event should now be zero
→ forwarded_open_liability for this event should now be zero
3. Check invariant:
→ Sum(all position stakes) = Original bet stake (for each bet)
→ Sum(retained P&L) + Sum(forwarded P&L) = Total P&L (for each bet)
4. If any invariant fails:
→ Flag for manual review
→ Do NOT proceed to CONFIRMED state
→ Alert: "Settlement reconciliation failed for agent X on event Y"
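The two invariants in step 3 are pure arithmetic checks. A sketch (the function name and a float tolerance are my additions; production would use exact decimal/integer money types):

```python
def check_invariants(bet_stake, position_stakes, retained_pnl, forwarded_pnl, total_pnl):
    """Return the list of failed invariants for one bet (empty = all pass)."""
    failures = []
    # Invariant 1: position stakes must sum to the original bet stake
    if abs(sum(position_stakes) - bet_stake) > 1e-9:
        failures.append("stake_mismatch")
    # Invariant 2: retained + forwarded P&L must sum to total P&L
    if abs(sum(retained_pnl) + sum(forwarded_pnl) - total_pnl) > 1e-9:
        failures.append("pnl_mismatch")
    return failures
```

Using the stake split from the void walk-through (₹6,000 + ₹2,400 + ₹800 + ₹800 = ₹10,000), invariant 1 passes.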
How Partial Failures Are Detected and Resumed
The system runs a settlement monitor job every 30 seconds:
SETTLEMENT MONITOR
===================
Every 30 seconds, check:
1. Positions in PROCESSING state for > 60 seconds
→ The worker that claimed them probably crashed
→ Reset to PENDING (the SELECT FOR UPDATE lock is released on crash anyway)
2. Positions in FAILED state
→ Retry count < 5: reset to PENDING for auto-retry
→ Retry count >= 5: escalate to DLQ for manual resolution
3. Events where some positions are CONFIRMED but others are still PENDING/FAILED
→ This is a partial settlement
→ Generate a report: "Event X: 3,847 of 4,000 positions settled. 153 pending retry."
→ If any positions have been FAILED for > 15 minutes: alert admin
Walk-Through: IPL Final, 4,000 Positions, DB Timeout at Position 2,847
Setup: MI vs CSK IPL final. MI wins. The settlement dispatcher partitions 4,000 positions across 50 agents and launches settlement workers.
Timeline:
10:45:00 PM: Event result received. Settlement dispatcher starts.
10:45:01 PM: Workers launched. Each worker begins processing their batch.
10:45:05 PM: Workers 1-30 complete successfully. 2,200 positions settled.
10:45:08 PM: Worker 31 (processing Suresh's 310 positions) hits a database timeout at position 2,847 (globally numbered). The worker has already settled 147 of Suresh's positions.
Worker 31 Status:
Suresh's positions: 310 total
SETTLED: 147 (P&L calculated and committed)
PROCESSING: 1 (position 148 -- the one that timed out)
PENDING: 162 (not yet attempted)
FAILED: 0 (at this instant, the timed-out position 148 is still marked PROCESSING)
10:45:08 PM: Worker 31 catches the timeout. It sets position 148 to FAILED with reason "DB_TIMEOUT". It continues to position 149.
10:45:09 PM: Workers 32-50 continue processing other agents' positions. They are unaffected by Suresh's timeout. Vikram's 420 positions settle successfully. Rajesh's 180 positions settle successfully.
10:45:12 PM: Worker 31 finishes Suresh's remaining positions. Result:
Worker 31 Final Status:
Suresh's positions: 310 total
SETTLED: 309
FAILED: 1 (position 148, DB_TIMEOUT)
10:45:15 PM: All other workers complete. Global status:
EVENT SETTLEMENT STATUS: MI vs CSK
====================================
Total positions: 4,000
SETTLED: 3,999
FAILED: 1
CONFIRMED: 0 (reconciliation pending)
Failed position:
Agent: Suresh
Position ID: pos_2847
Bet: ₹8,000 on MI to win @ 1.85
Failure: DB_TIMEOUT at ledger update
Retry: scheduled in 30 seconds
10:45:45 PM: The settlement monitor picks up the FAILED position. It resets it to PENDING. A worker claims it and retries. This time, the database is healthy. The position settles successfully.
10:46:00 PM: Reconciliation runs for all agents. All invariants pass. All 4,000 positions move to CONFIRMED.
10:46:01 PM: Agents see settlement results on their dashboards:
RAJESH'S SETTLEMENT: MI vs CSK
================================
Result: MI wins
Your retained positions:
56 bets backing MI: +₹2,34,000 (punters lost)
24 bets backing CSK: -₹1,18,000 (punters won)
Net P&L: +₹1,16,000
Forwarded positions settled with Vikram:
Forwarded P&L: -₹42,000 (Vikram owes you ₹42,000)
Settlement time: 60 seconds (from event result to confirmed, including one retry)
25. Cash-Out / Early Settlement Design
What Is Cash-Out?
Cash-out allows a punter to settle their bet early, before the event finishes. The punter locks in a guaranteed profit (or limits a loss) instead of waiting for the final result. From the system's perspective, cash-out is not magic -- it is simply a counter-bet at current odds that closes out the original position.
How Cash-Out Price Is Calculated
The cash-out price is the fair value of the punter's position, minus a margin for the platform. Here is the formula:
CASH-OUT CALCULATION
=====================
Original bet: Back MI to win at odds 1.85, stake ₹10,000
→ Potential win if MI wins: ₹8,500
→ Loss if MI loses: ₹10,000
Current situation: MI now at odds 1.20 (MI dominating)
→ The bet is "in profit" because MI is more likely to win now
Fair value of the position:
→ Closing the position now is equivalent to laying MI at the
current odds of 1.20, which locks in the same return whichever
way the match ends
Cash-out value (fair) = Stake * (Original odds / Current odds)
= ₹10,000 * (1.85 / 1.20)
= ₹10,000 * 1.5417
= ₹15,417
This is the TOTAL return. The profit portion:
= ₹15,417 - ₹10,000 = ₹5,417
This is the fair value. Now apply the cash-out margin (typically 3-5%):
Cash-out margin: 5%
Cash-out offer = ₹15,417 * (1 - 0.05) = ₹14,646
Punter profit if they cash out: ₹14,646 - ₹10,000 = ₹4,646
(vs. ₹8,500 if MI wins and ₹0 return if MI loses)
The General Cash-Out Formula
For a back bet:
cash_out_return = stake * (original_odds / current_odds) * (1 - margin)
cash_out_profit = cash_out_return - stake
For a losing position (odds moved against the punter):
Original: MI at 1.85, stake ₹10,000
Current: MI at 3.50 (MI struggling)
cash_out_return = 10,000 * (1.85 / 3.50) * (1 - 0.05)
= 10,000 * 0.5286 * 0.95
= ₹5,021
Punter gets back ₹5,021 of their ₹10,000 stake.
They accept a ₹4,979 loss now instead of risking the full ₹10,000.
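The general formula and both examples above can be captured in one small helper. This is a sketch of the document's formula only; the function name and rounding choice are assumptions.

```python
def cash_out(stake, original_odds, current_odds, margin=0.05):
    """Cash-out value of a back bet: (total_return, profit).

    total_return = stake * (original_odds / current_odds) * (1 - margin)
    A negative profit means the punter is locking in a loss early.
    """
    ret = round(stake * (original_odds / current_odds) * (1 - margin))
    return ret, ret - stake
```

The winning case (`cash_out(10_000, 1.85, 1.20)`) reproduces the ₹14,646 offer, and the losing case (`cash_out(10_000, 1.85, 3.50)`) reproduces the ₹5,021 partial recovery.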
How Cash-Out Routes Through the Cascade
This is the critical design decision. The cash-out counter-bet must route through the SAME proportions as the original bet, not the current matrix.
Why? Because the positions are held at specific agents in specific proportions. If Rajesh retained 60% of the original bet, he holds 60% of the position. The cash-out must close 60% of his position, not whatever the current matrix says.
How Each Agent's Position Changes
BEFORE CASH-OUT (MI at 1.20)
==============================
Agent Retained Stake Potential Payout Net Exposure
------- -------------- ---------------- ------------
Rajesh ₹6,000 ₹11,100 (if MI wins) ₹5,100 liability
Vikram ₹2,400 ₹4,440 (if MI wins) ₹2,040 liability
Platform ₹1,600 ₹2,960 (if MI wins) ₹1,360 liability
AFTER CASH-OUT
===============
All positions CLOSED. Each agent's P&L is locked in:
Agent Retained Stake Cash-out Portion Agent's P&L
------- -------------- ---------------- -----------
Rajesh ₹6,000 Pays ₹8,787.60* -₹2,787.60
Vikram ₹2,400 Pays ₹3,515.04* -₹1,115.04
Platform ₹1,600 Pays ₹2,343.36* -₹743.36
────────────────
Total: ₹14,646.00
* Each agent pays their share of cash_out_return. Their P&L is
their retained stake minus the cash-out they pay:
Rajesh: (₹14,646 * 60%) = ₹8,787.60 payout on ₹6,000 retained = -₹2,787.60 P&L
(This is a loss for Rajesh because MI is winning and he held the liability side)
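The split above can be expressed as a small helper that always uses the ORIGINAL audit-record proportions, never the current matrix. The function shape and field names are illustrative assumptions.

```python
def split_cash_out(cash_out_return, original_stake, splits):
    """Split a cash-out across agents by original retention proportions.

    splits: {agent: proportion} from the bet's audit record (sums to 1.0).
    Returns {agent: (payout, pnl)} where pnl = retained stake - payout.
    """
    result = {}
    for agent, prop in splits.items():
        payout = round(cash_out_return * prop, 2)       # agent's share of the payout
        retained_stake = original_stake * prop          # stake the agent held
        result[agent] = (payout, round(retained_stake - payout, 2))
    return result
```

The same function covers partial cash-out: pass the cashed-out portion's return and stake (e.g. ₹7,323 against ₹5,000 for a 50% cash-out) and the split proportions are unchanged.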
Partial Cash-Out
Amit does not have to cash out 100% of his position. He can cash out any percentage.
Example: Amit cashes out 50% of his position
PARTIAL CASH-OUT (50%)
========================
Original: ₹10,000 on MI to win at 1.85
Cash-out: 50% at current odds 1.20
Cash-out portion: ₹5,000 (50% of original stake)
Cash-out return: ₹5,000 * (1.85 / 1.20) * 0.95 = ₹7,323
Cash-out profit: ₹7,323 - ₹5,000 = ₹2,323
Remaining active position: ₹5,000 on MI to win at 1.85
→ This continues normally, settles with the event result
Agent impact (using original 60/24/16 split on the cashed-out portion):
Rajesh: closes 60% of ₹5,000 = ₹3,000 position
Vikram: closes 24% of ₹5,000 = ₹1,200 position
Platform: closes 16% of ₹5,000 = ₹800 position
Each agent's REMAINING open position is also halved:
Rajesh: ₹3,000 retained stake remaining (was ₹6,000)
Vikram: ₹1,200 retained stake remaining (was ₹2,400)
Platform: ₹800 retained stake remaining (was ₹1,600)
What Happens If Betfair Liquidity Is Insufficient
When the platform's portion was originally hedged on Betfair, the cash-out must also close that hedge. If Betfair does not have sufficient liquidity to close the hedge at the expected price:
| Situation | What Happens |
|---|---|
| Hedge close-out is fully available | Platform closes hedge, cash-out proceeds normally |
| Hedge close-out is partially available | Platform closes what it can, retains the unhedged remainder as platform risk. Cash-out still proceeds for the punter (the punter's experience is not degraded by hedge-side issues). |
| Hedge close-out is unavailable (Betfair down or no liquidity) | Platform absorbs the full hedge portion as retained risk. Cash-out proceeds for the punter. Platform bears the residual risk. |
The punter always gets their cash-out amount. The platform absorbs liquidity risk on the hedge side.
How Cash-Out Interacts With NO_NEW_RISK
Cash-out creates a counter-position that reduces the agent's exposure. Therefore, it should be treated as a hedge and allowed even when NO_NEW_RISK is active.
NO_NEW_RISK CHECK FOR CASH-OUT
================================
1. Agent is in NO_NEW_RISK for MI vs CSK
2. A cash-out request arrives for a bet on MI to win
3. The cash-out creates a COUNTER-position (effectively a lay on MI)
4. This REDUCES the agent's worst-case liability
5. Therefore: ALLOWED, even under NO_NEW_RISK
This is the same logic as normal hedge detection:
If WorstCaseLiability AFTER < WorstCaseLiability BEFORE → Allow
Walk-Through: Amit Bets MI at 1.85, MI Now at 1.20, Amit Cashes Out
The original bet (placed at 7:30 PM):
Amit places ₹10,000 on MI to win at odds 1.85 during MI vs CSK IPL match, pre-match.
Routing (from original audit record):
- Rajesh retains 60%: ₹6,000 stake, ₹5,100 liability
- Vikram retains 24%: ₹2,400 stake, ₹2,040 liability
- Platform retains 16%: ₹1,600 stake, ₹1,360 liability (₹800 hedged on Betfair)
The match situation (9:15 PM, 15th over):
MI is 145/2, well on track. MI's odds have dropped from 1.85 to 1.20. Amit is sitting on a healthy unrealized profit.
Amit requests cash-out (9:15 PM):
Step 1: System loads original bet audit record. Proportions: 60/24/16.
Step 2: System gets current MI odds: 1.20 (from the live odds feed).
Step 3: Calculate cash-out value:
cash_out_return = 10,000 * (1.85 / 1.20) * (1 - 0.05)
= 10,000 * 1.5417 * 0.95
= ₹14,646
Amit's profit: ₹14,646 - ₹10,000 = ₹4,646
Step 4: System presents offer to Amit:
┌──────────────────────────────────────────────┐
│ CASH OUT │
│ │
│ Your bet: MI to win @ 1.85 │
│ Current odds: MI @ 1.20 │
│ │
│ Cash out for: ₹14,646 │
│ Your profit: ₹4,646 │
│ │
│ (If MI wins, you would win ₹8,500) │
│ (If MI loses, you lose ₹10,000) │
│ │
│ [Accept Cash-Out] [Keep Bet Active] │
└──────────────────────────────────────────────┘
Step 5: Amit accepts. The system creates counter-positions:
CASH-OUT EXECUTION
===================
Rajesh: CLOSE position of ₹6,000
Original liability: ₹5,100
Cash-out cost (his share): ₹14,646 * 0.60 = ₹8,787.60
Rajesh P&L: ₹6,000 (stake received) - ₹8,787.60 (cash-out paid) = -₹2,787.60
Rajesh exposure change: -₹5,100 (liability removed)
Vikram: CLOSE position of ₹2,400
Original liability: ₹2,040
Cash-out cost (his share): ₹14,646 * 0.24 = ₹3,515.04
Vikram P&L: ₹2,400 - ₹3,515.04 = -₹1,115.04
Vikram exposure change: -₹2,040 (liability removed)
Platform: CLOSE position of ₹1,600
Cash-out cost (its share): ₹14,646 * 0.16 = ₹2,343.36
Platform P&L: ₹1,600 - ₹2,343.36 = -₹743.36
Platform exposure change: -₹1,360 (liability removed)
Platform also closes Betfair hedge: counter-trade to flatten
Verification: ₹8,787.60 + ₹3,515.04 + ₹2,343.36 = ₹14,646.00 ✓
Step 6: Amit sees: "Cash-out successful. ₹14,646 credited to your account."
Step 7: Rajesh's dashboard updates: "Amit cashed out MI to win. Your exposure on MI vs CSK reduced by ₹5,100. P&L impact: -₹2,787.60."
Why the agents "lose" on this cash-out: MI is winning. The agents were on the liability side (they pay if MI wins). By cashing Amit out, they are locking in a loss. But this loss is smaller than what they would pay if MI wins and Amit's full ₹8,500 potential win materializes. The agents are actually reducing their worst-case outcome.
If MI had collapsed and lost, the agents would have profited ₹10,000 (the full stake). By allowing cash-out, they gave up some of that potential upside. This is the trade-off -- cash-out reduces volatility for everyone.
26. Lay Bet Support
What Is a Lay Bet?
A lay bet is the opposite of a back bet. When Amit backs MI to win, he is betting that MI will win. When Sonia lays MI to win, she is betting that MI will NOT win. Sonia wins if MI draws or loses.
In exchange-style betting (Betfair), every back bet has a corresponding lay bet. In B-Book systems, the agent is typically the layer -- they lay every back bet the punter places. But Hannibal must also support punters placing lay bets explicitly, because some markets and some punters operate this way.
How Liability Is Different for Lay Bets
For a back bet, the punter risks their stake and wins stake * (odds - 1).
For a lay bet, the punter risks stake * (odds - 1) and wins the stake.
BACK BET: Amit backs MI to win at 1.85 for ₹10,000
Amit risks: ₹10,000 (his stake)
Amit wins: ₹8,500 (if MI wins)
Bookie liability: ₹8,500
LAY BET: Sonia lays MI to win at 1.85 for ₹10,000
Sonia risks: ₹8,500 (her liability = stake * (odds - 1))
Sonia wins: ₹10,000 (if MI does NOT win)
Bookie liability: ₹10,000 (bookie pays if MI does NOT win)
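The back/lay asymmetry above reduces to one formula pair. A minimal sketch, with the function name as an assumption:

```python
def punter_risk_reward(side, stake, odds):
    """Return (amount_risked, amount_won_if_right) for the punter.

    Back: risk the stake, win stake * (odds - 1).
    Lay:  risk stake * (odds - 1), win the stake.
    """
    if side == "back":
        return stake, stake * (odds - 1)
    if side == "lay":
        return stake * (odds - 1), stake
    raise ValueError(f"unknown side: {side}")
```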
The critical difference for exposure tracking: A lay bet's liability is stake * (odds - 1) for the punter, but from the bookie's (agent's) perspective, the liability depends on which outcome occurs.
Exposure Tracking for Lay Bets
This is where it gets interesting. A lay bet on MI to win is economically equivalent to a back bet on "MI not to win." This means:
EXPOSURE IMPACT OF LAY BETS
=============================
Current position on MI vs CSK (Rajesh's book):
Before any lay bets:
MI Win outcome: ₹5,00,000 liability (back bets on MI)
MI Not Win outcome: ₹0 liability
Sonia lays MI to win for ₹10,000 at 1.85:
If MI wins: Sonia pays ₹8,500 to Rajesh → REDUCES MI Win liability
If MI not win: Rajesh pays ₹10,000 to Sonia → INCREASES MI Not Win liability
After Sonia's lay bet:
MI Win outcome: ₹4,91,500 liability (reduced by ₹8,500!)
MI Not Win outcome: ₹10,000 liability (increased by ₹10,000)
WORST CASE BEFORE: ₹5,00,000 (MI Win)
WORST CASE AFTER: ₹4,91,500 (MI Win is still worse, but reduced)
Key insight: A lay bet on outcome X DECREASES the agent's exposure on outcome X and INCREASES exposure on the other outcomes. This is a natural hedge.
How the Forwarding Matrix Handles Lay Bets
Lay bets use the same 5-dimensional matrix as back bets, with one addition: the matrix resolution considers whether the bet direction (back vs lay) creates a hedge.
FORWARDING MATRIX RESOLUTION FOR LAY BETS
============================================
Step 1: Resolve the forward percentage normally
→ Same matrix, same rules, same precedence chain
→ Sonia's lay MI at 1.85 matches Rule R3: forward 40%
Step 2: Check if this lay bet is a natural hedge for the agent
→ Does the agent have existing liability on MI Win? YES (₹5,00,000)
→ Does this lay bet reduce that liability? YES (by ₹8,500)
→ Therefore: this is a hedge
Step 3: If hedge AND agent is in NO_NEW_RISK:
→ Allow the retention (do not force 100% forward)
→ The agent WANTS to keep this bet because it reduces their exposure
Step 4: If hedge AND agent is NOT in NO_NEW_RISK:
→ Normal matrix rules apply
→ Agent retains their configured percentage
The forwarding matrix does not need a separate "lay" dimension. The bet is processed through the same rules. The only difference is in how the exposure ledger is updated and how NO_NEW_RISK evaluates the bet.
How Hedge Detection Recognizes Lay Bets
The existing hedge detection formula works perfectly for lay bets:
HEDGE DETECTION FOR LAY BETS
==============================
Rule: If WorstCaseLiability AFTER < WorstCaseLiability BEFORE → it is a hedge
Before Sonia's lay bet:
Worst case (MI Win): ₹5,00,000
Worst case (MI Not Win): ₹0
WorstCaseLiability: MAX(₹5,00,000, ₹0) = ₹5,00,000
After Sonia's lay bet (Rajesh's portion, 60%):
Rajesh keeps 60% of Sonia's lay → ₹6,000 stake, ₹5,100 liability adjustment
Worst case (MI Win): ₹5,00,000 - ₹5,100 = ₹4,94,900
Worst case (MI Not Win): ₹0 + ₹6,000 = ₹6,000
WorstCaseLiability: MAX(₹4,94,900, ₹6,000) = ₹4,94,900
₹4,94,900 < ₹5,00,000 → WorstCaseLiability decreased → HEDGE CONFIRMED
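The exposure update and hedge test above fit in one function. The ledger here is a plain dict of outcome → liability, purely for illustration; the two-outcome market shape is an assumption.

```python
def apply_lay(ledger, outcome, other, stake, odds):
    """Apply the agent's retained share of a lay bet on `outcome`.

    A lay on X reduces liability on X (the punter pays the agent if X
    hits) and adds liability on the complement (the agent pays the
    punter's stake otherwise). Returns (is_hedge, new_ledger).
    """
    before_worst = max(ledger.values())
    new = dict(ledger)
    new[outcome] -= stake * (odds - 1)   # punter's liability flows to the agent
    new[other] += stake                  # agent owes the stake if outcome misses
    return max(new.values()) < before_worst, new
```

Running Rajesh's 60% share of Sonia's lay (₹6,000 at 1.85) against the ₹5,00,000 book reproduces the ₹4,94,900 worst case and confirms the hedge.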
How NO_NEW_RISK Correctly Allows Hedging Lay Bets
When would a lay bet NOT be a hedge? If Rajesh's book is already heavily exposed on "MI Not Win" (meaning he has lots of bets backing CSK or backing the draw), then a lay on MI (which increases "MI Not Win" exposure) would increase his worst case. In that scenario, the lay bet is NOT a hedge and is forwarded 100% under NO_NEW_RISK.
Walk-Through: Sonia Lays MI to Win at 1.85 While Rajesh Is in NO_NEW_RISK
Setup: Rajesh has hit his per-match limit on MI vs CSK. His current exposure:
Rajesh's MI vs CSK Book:
MI Win outcome: ₹4,98,000 liability ← this is the worst case
MI Not Win outcome: ₹45,000 liability
Match limit: ₹5,00,000
Status: NO_NEW_RISK (₹4,98,000 / ₹5,00,000 = 99.6%)
Sonia places a lay bet: Lay MI to win at 1.85 for ₹10,000.
Step 1: Win cap check. Sonia's maximum loss on this lay bet is ₹8,500 (stake * (odds - 1)). Her per-click cap is ₹50,000. Pass.
Step 2: Matrix resolution. Rajesh's matrix says forward 40% for this bet type. Rajesh would retain 60%.
Step 3: NO_NEW_RISK check. Rajesh is in NO_NEW_RISK. Is this a hedge?
Rajesh retains 60% of Sonia's lay:
→ Stake portion: ₹6,000
→ If MI wins: Rajesh RECEIVES ₹5,100 from Sonia (reduces MI Win liability)
→ If MI not win: Rajesh PAYS ₹6,000 to Sonia (increases MI Not Win liability)
New worst cases:
MI Win: ₹4,98,000 - ₹5,100 = ₹4,92,900
MI Not Win: ₹45,000 + ₹6,000 = ₹51,000
New worst case liability: MAX(₹4,92,900, ₹51,000) = ₹4,92,900
Old worst case liability: ₹4,98,000
₹4,92,900 < ₹4,98,000 → HEDGE CONFIRMED → ALLOW
Step 4: Position creation. Rajesh retains 60% of Sonia's lay bet. The exposure ledger is updated:
AFTER SONIA'S LAY BET:
Rajesh's MI vs CSK Book:
MI Win outcome: ₹4,92,900 liability ← reduced!
MI Not Win outcome: ₹51,000 liability
Match limit: ₹5,00,000
Status: still NO_NEW_RISK (but closer to exiting)
Step 5: Cascade. The remaining 40% (₹4,000) flows to Vikram, who processes it through his own matrix and cap checks normally.
Result: Rajesh's exposure on MI Win dropped from ₹4,98,000 to ₹4,92,900. If enough lay bets come in, Rajesh could exit NO_NEW_RISK entirely. The system correctly identified the lay bet as a hedge and allowed it.
27. Agent-Punter Collusion Detection
The Problem
Collusion between an agent and a punter is one of the most damaging exploits in a B-Book system. The basic scheme: Rajesh knows that sharp/winning bets get forwarded to his upline (Vikram). If Rajesh conspires with Amit, he can mark Amit as NORMAL (even though Amit is sharp), retain Amit's bets, and pocket the winnings. When Amit loses, Rajesh can retroactively mark him as SHARP to forward the losing flow upline.
A more sophisticated version: Rajesh marks Amit as SHARP to forward most of his bets upline. But Rajesh and Amit have agreed to split the profits. Amit consistently wins, Vikram consistently loses on the forwarded flow, and Rajesh and Amit split the difference off-platform.
Collusion Signals
| Signal | What It Looks Like | Severity |
|---|---|---|
| Classification flip before winning streak | Agent changes user from NORMAL to SHARP (or vice versa), and within 24 hours the user has a winning streak | HIGH |
| Classification flip-flop | Agent changes user classification back and forth more than 3 times in a week | HIGH |
| Override matches outcome | User override percentage changes correlate with subsequent bet outcomes (higher forward when user wins, lower when user loses) | CRITICAL |
| Selective forwarding timing | Matrix changes coincide with specific user's betting patterns | HIGH |
| Forwarded flow consistently loses | Bets forwarded by this agent to upline have a significantly worse P&L than random chance | MEDIUM (could be legitimate if agent has good sharp detection) |
| Single user dominates forwarded flow | One user accounts for >30% of forwarded volume from this agent | MEDIUM |
| Win rate inversion | Agent's retained bets have above-market win rate while forwarded bets have below-market win rate | HIGH |
The Cooling-Off Period for Classification Changes
To prevent the "flip before win" exploit, classification changes have a mandatory cooling-off period:
CLASSIFICATION CHANGE RULES
=============================
When an agent changes a user's classification (e.g., NORMAL → SHARP):
1. The change is QUEUED, not applied immediately
2. Cooling-off period: 24 hours
3. During the cooling-off period:
→ The OLD classification remains active for matrix resolution
→ The new classification is visible as "PENDING" in the agent dashboard
→ The agent cannot change the classification again until the current change completes
4. After 24 hours: the new classification takes effect
5. Exception: SHARP → NORMAL direction has a 72-hour cooling-off period
(because downgrading from SHARP to NORMAL is the more exploitable direction)
OVERRIDE CHANGES follow the same rules:
→ Changing a user override from 80% forward to 20% forward: 24-hour cooling-off
→ Changing from 20% to 80%: 6-hour cooling-off (less risky direction)
Why different periods for different directions? Moving a user from NORMAL to SHARP increases forwarding (less risky for the agent, more protective). Moving from SHARP to NORMAL decreases forwarding (agent retains more, potentially exploitable). The riskier direction gets a longer cooling-off period.
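The direction-dependent periods can be sketched as a lookup. The durations come straight from the rules above; the queue/API shape is a hypothetical assumption.

```python
from datetime import datetime, timedelta

# Cooling-off period by (old, new) classification; the more exploitable
# direction (SHARP -> NORMAL) gets the longer delay.
COOLING_OFF = {
    ("NORMAL", "SHARP"): timedelta(hours=24),
    ("SHARP", "NORMAL"): timedelta(hours=72),
}

def effective_at(requested_at, old_class, new_class):
    """When a queued classification change takes effect.

    Until this instant, the OLD classification stays active for matrix
    resolution and the change shows as PENDING on the agent dashboard.
    """
    return requested_at + COOLING_OFF[(old_class, new_class)]
```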
Upline Audit Rights on Downstream Overrides
Vikram (upline) has the right to see and challenge classification changes made by Rajesh (downstream):
UPLINE AUDIT RIGHTS
=====================
Vikram can see (real-time):
✓ All user classifications set by Rajesh
✓ All pending classification changes
✓ History of all classification changes with timestamps
✓ Correlation report: classification changes vs user outcomes
Vikram can do:
✓ Flag a classification change for review
✓ Request the platform to freeze Rajesh's override capability
✓ Set minimum forwarding for specific users (override Rajesh's override)
Vikram CANNOT do:
✗ Directly change Rajesh's user classifications (that is Rajesh's business)
✗ See Rajesh's full user list (only users whose bets are forwarded to Vikram)
The Correlation Engine
The anti-collusion system runs a correlation analysis nightly:
COLLUSION CORRELATION ANALYSIS
================================
For each agent, for each user with classification changes in the last 30 days:
1. Build timeline:
[Classification change timestamps] + [Bet placement timestamps] + [Bet outcomes]
2. Calculate: Within 48 hours after each classification change:
→ Count of bets placed by this user
→ Win rate of those bets
→ Compare against the user's historical win rate
→ Compare against the market-expected win rate
3. Score the correlation:
→ If win rate AFTER classification change is >2 standard deviations above normal:
COLLUSION_SCORE += 25 per occurrence
→ If classification was changed from SHARP to NORMAL just before a winning streak:
COLLUSION_SCORE += 50
→ If the same pattern repeats 3+ times:
COLLUSION_SCORE += 100
4. Thresholds:
→ COLLUSION_SCORE < 25: No action
→ 25-75: Informational alert to platform compliance team
→ 75-150: Warning to agent + upline notification
→ 150+: Automatic freeze on agent's override capability, mandatory review
Alert Escalation Workflow
Walk-Through: Rajesh and Amit Collude
The scheme: Rajesh and Amit have an agreement. Amit is genuinely sharp -- he has a positive CLV over 1,000+ bets. Rajesh knows this. Instead of marking Amit as SHARP (which would forward 95% to Vikram), Rajesh keeps Amit as NORMAL (forwarding only 40%). Amit wins consistently, and Rajesh profits because he retained 60% of winning bets. They split the extra profit off-platform.
Week 1: Amit places 45 bets. Wins 28. Rajesh retained 60% of each. Rajesh's retained P&L from Amit: +₹1,85,000.
The system's sharp detection flags Amit based on CLV and win rate. It suggests to Rajesh: "Amit shows sharp characteristics. Consider classifying as SHARP."
Rajesh ignores the suggestion.
Week 2: Amit places 50 bets. Wins 31. Rajesh's retained P&L from Amit: +₹2,10,000.
The system sends a stronger alert to Rajesh: "Amit's CLV is +4.2% over 95 bets. This is above the SHARP threshold. Classification recommended."
Rajesh still ignores it.
Week 2 (same time): The cross-agent detection system notices that Amit's win rate (62%) is significantly above expected (48% given the odds profile). It also notices that Rajesh has NOT classified Amit as SHARP despite the system's recommendation. This triggers:
ANOMALY DETECTION ALERT
=========================
Agent: Rajesh
User: Amit
Alert Type: SUSPECTED_CLASSIFICATION_MANIPULATION
Evidence:
1. Amit's 95-bet CLV: +4.2% (SHARP threshold: +2.5%)
2. System recommended SHARP classification 2 times
3. Agent has not acted on recommendations for 14 days
4. Agent's retained P&L from Amit: +₹3,95,000 (top 1% among all agents)
5. If classified as SHARP, 95% would have been forwarded to Vikram
Rajesh would have retained ~₹33,000 (5% of the flow instead of 60%) rather than ₹3,95,000
Collusion Score: 85 (WARNING level)
Actions Taken:
- Compliance team notified
- Vikram (upline) notified: "Your sub-agent Rajesh may be under-classifying user Amit"
- Rajesh's dashboard shows: "Compliance review pending for user Amit"
Week 3: Rajesh, realizing he has been flagged, marks Amit as SHARP. But the 24-hour cooling-off period for NORMAL → SHARP means the change does not take effect for a day. During that day, Amit places 15 more bets at NORMAL classification.
Week 3 + 24 hours: Classification change takes effect. Amit's bets are now forwarded at 95%.
Meanwhile: The compliance team reviews the case. They see:
- Rajesh ignored 2 system recommendations
- Rajesh profited ₹4,50,000+ from a user who should have been classified as SHARP
- The timing of the eventual classification change coincides exactly with the compliance alert
Outcome: The compliance team escalates to the platform operations team, who:
- Review the last 30 days of Amit's bets under Rajesh
- Calculate the financial impact: ₹4,50,000 in retained profit that would have been ~₹37,500 at SHARP classification (5% retention instead of 60%)
- Issue a clawback of the excess profit (₹4,12,500) from Rajesh's settlement account
- Place Rajesh on probation: his override capability is frozen for 90 days, all classifications are managed by the platform
28. Agent Hierarchy Migration
The Problem
Agent hierarchies are not static. In the real world, agents switch uplines all the time. Rajesh might leave Vikram's network and join Suresh's. This happens because of better commission terms, personal disputes, or business restructuring. The system must handle this migration cleanly, especially when Rajesh has open positions that were routed through Vikram.
Effective-Dated Hierarchy Changes
Hierarchy changes are never instantaneous. They take effect at a scheduled date and time, giving the system time to prepare:
HIERARCHY MIGRATION REQUEST
=============================
Request: Move Rajesh from Vikram to Suresh
Requested by: Platform admin
Effective date: 2026-03-01 00:00:00 IST (start of next weekly period)
Current state: Rajesh has ₹15,00,000 open liability forwarded through Vikram
Migration phases:
Phase 1 (NOW → effective date): PREPARATION
Phase 2 (effective date): CUTOVER
Phase 3 (effective date → cleanup): DUAL PATH
Phase 4 (after all old positions settle): COMPLETE
Dual-Path Settlement
This is the key design challenge. After cutover:
DUAL-PATH ROUTING
==================
BEFORE cutover (Feb 28):
Rajesh → Vikram → Platform → Betfair
All bets, all positions, all settlements go through Vikram
AFTER cutover (March 1):
NEW bets: Rajesh → Suresh → Platform → Betfair
OLD positions: Still settled through Vikram (he holds the positions!)
Why dual-path? Because Vikram's exposure ledgers reflect the positions he holds.
Settling them through Suresh would be incorrect -- Suresh never held that risk.
| Bet Timing | Routing Path | Settlement Path |
|---|---|---|
| Placed before cutover, settling before cutover | Rajesh → Vikram | Through Vikram |
| Placed before cutover, settling after cutover | Rajesh → Vikram | Through Vikram (dual-path) |
| Placed after cutover | Rajesh → Suresh | Through Suresh |
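The routing table above reduces to a rule keyed on placement time versus the cutover instant. A hypothetical sketch; the tag names mirror the migration_id format used later in this section.

```python
from datetime import datetime

def classify_position(placed_at, cutover_at, migration_id):
    """Tag a position for dual-path routing at migration cutover.

    Positions placed before cutover keep settling through the old
    upline (who actually holds the risk); bets placed at or after
    cutover route and settle through the new upline.
    """
    if placed_at < cutover_at:
        return {"routing_path": "OLD", "migration_id": migration_id}
    return {"routing_path": "NEW", "migration_id": None}
```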
Open Exposure Handling During Transition
At cutover time, Rajesh has ₹15,00,000 in open liability forwarded to Vikram. This creates a financial obligation:
OPEN EXPOSURE RECONCILIATION
==============================
At cutover (March 1):
1. Freeze: No changes to Rajesh's forwarding through Vikram
→ Vikram's exposure ledger is frozen for Rajesh's old positions
→ New bets from Rajesh do NOT affect Vikram's ledgers
2. Track: Old positions are tagged with migration_id
→ Every position that existed at cutover time gets:
migration_id: "mig_rajesh_vikram_to_suresh_20260301"
routing_path: "OLD" (Vikram)
3. Settle: As old events settle, positions flow through Vikram
→ Vikram settles normally
→ When Vikram's Rajesh-related open liability reaches zero → dual-path ends
4. Financial bridge: If Rajesh owes Vikram (or vice versa) from old positions,
the settlement continues until all old positions are resolved
Financial Settlement Between Old and New Upline
The tricky part: what if Rajesh has a net credit with Vikram from unsettled positions? And what about the weekly settlement cycle?
FINANCIAL SETTLEMENT AT MIGRATION
====================================
Step 1: Calculate Rajesh's net position with Vikram at cutover
Open positions forwarded to Vikram: ₹15,00,000 liability
Unsettled P&L (from recently settled events): ₹2,30,000 (Vikram owes Rajesh)
Step 2: Vikram pays the unsettled P&L to Rajesh immediately
→ ₹2,30,000 transferred (this is money already earned, not speculative)
Step 3: Open positions continue under Vikram until they settle
→ No money changes hands until settlement
→ Each settlement adjusts the balance between Rajesh and Vikram
→ Rajesh's weekly settlements with SURESH only include NEW bets
Step 4: When all old positions have settled:
→ Final reconciliation between Rajesh and Vikram
→ Any remaining balance settled
→ Migration status: COMPLETE
→ Vikram's records for Rajesh archived
Walk-Through: Rajesh Moves From Vikram to Suresh With 15 Lakh Open Liability
Background: Rajesh has been under Vikram for 3 years. Suresh offers better terms (lower forwarding commission). Rajesh negotiates the move. The platform admin approves the migration for March 1.
February 25 -- REQUESTED:
- Admin creates migration request
- System calculates: Rajesh has 42 open bets forwarded to Vikram, totaling ₹15,20,000 liability
- Notification sent to Vikram: "Rajesh is migrating to Suresh, effective March 1. You have 42 open positions to settle."
- Notification sent to Suresh: "Rajesh is joining your network, effective March 1."
February 26-28 -- PREPARATION:
- Vikram confirms he is aware. No action needed from him.
- Suresh confirms he is ready. His limits are checked: can he absorb Rajesh's typical daily flow?
- Rajesh's forwarding matrix is cloned for the new relationship (he can modify it after cutover)
- System pre-computes: "After cutover, Rajesh's bets will be processed by Suresh with these limits..."
March 1, 00:00 IST -- CUTOVER:
CUTOVER EXECUTED
==================
1. Rajesh's hierarchy parent changed: Vikram → Suresh
2. All existing positions tagged: migration_id = mig_rajesh_v2s_20260301
3. New bets from Rajesh's punters now route to Suresh
Dashboard shows:
Rajesh: "You are now under Suresh. 42 old positions still settling through Vikram."
Vikram: "Rajesh has moved. 42 open positions remain for settlement."
Suresh: "Rajesh has joined. New bets are now routing through you."
March 1-7 -- DUAL_PATH:
- 35 of the 42 old positions settle during the week (events complete)
- Each settlement flows through Vikram normally
- New bets (150+ during the week) flow through Suresh
March 8 -- Remaining old positions:
- 7 positions remain (from events that have not yet concluded)
- These are long-dated bets (tournament winner, series result)
- Rajesh and Vikram continue to settle these as the events conclude
March 15 -- Last old position settles:
- The final pre-migration position settles
- Net financial settlement between Rajesh and Vikram: Vikram owes Rajesh ₹42,000
- Transfer executed
- Migration status: COMPLETE
- Vikram's records for Rajesh are archived
- Dual-path routing for Rajesh is deactivated
29. Minimum Forwarding / Skin-in-the-Game Requirements
The Problem
Vikram does not want Rajesh to forward 100% of sharp bets. If Rajesh forwards everything that loses and keeps everything that wins, Vikram is just absorbing toxic flow. Vikram wants to require Rajesh to have "skin in the game" -- a minimum amount of every bet that Rajesh MUST retain, regardless of his matrix settings.
How Minimum Retention Works
The upline agent sets a minimum retention percentage per downstream agent. This floor cannot be overridden by the downstream agent's matrix.
MINIMUM RETENTION CONFIGURATION
=================================
Vikram's settings for his sub-agents:
| Sub-Agent | Min Retention | Why |
|-----------|--------------|-----|
| Rajesh | 20% | Experienced, trusted, but must keep skin in the game |
| Priya | 30% | Newer agent, should retain more to build discipline |
| Arun | 10% | Very experienced, low minimum needed |
Where It Is Checked in the Cascade
The minimum retention check happens AFTER matrix resolution but BEFORE cap evaluation: the matrix first proposes the retain/forward split, the minimum-retention floor then adjusts the forwarding percentage if needed, and only then are liability caps applied.
The important nuance: If Rajesh's limits would be breached by retaining the minimum 20%, the system does NOT reject the bet. Instead, Rajesh retains as much as his limits allow (which may be less than 20%). The minimum retention is enforced as a floor on the MATRIX percentage, not an absolute floor on the final retained amount. Limits always win over matrix settings -- this is a safety rule.
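The floor-on-the-matrix-percentage rule above can be sketched in a few lines. This is a minimal illustration, not the production implementation; the function name and percentage convention are assumptions.

```python
def apply_min_retention(matrix_forward_pct: float, min_retention_pct: float) -> float:
    """Clamp a downstream agent's forwarding percentage so the upline's
    minimum-retention floor is respected.

    Note: the floor applies to the MATRIX percentage only. Liability caps
    evaluated later may still reduce the amount actually retained.
    """
    max_forward = 100.0 - min_retention_pct
    return min(matrix_forward_pct, max_forward)

# Rajesh tries to forward 100% of SHARP bets; Vikram requires 20% retention.
assert apply_min_retention(100.0, 20.0) == 80.0   # auto-adjusted to 80%
# A 50% catch-all rule already satisfies the floor -- unchanged.
assert apply_min_retention(50.0, 20.0) == 50.0
```
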
How Violations Are Handled
When Rajesh tries to configure his matrix in a way that violates Vikram's minimum retention:
SCENARIO: Rajesh tries to set 100% forwarding for SHARP users
Vikram's minimum retention requirement: 20%
SYSTEM RESPONSE:
┌────────────────────────────────────────────────────────┐
│ ⚠ Configuration Conflict │
│ │
│ You set: Forward 100% for SHARP users │
│ Your upline (Vikram) requires: Minimum 20% retention │
│ │
│ Adjusted rule: Forward 80% for SHARP users │
│ You will retain at least 20% of these bets. │
│ │
│ [Accept Adjusted Rule] [Contact Vikram to Negotiate] │
└────────────────────────────────────────────────────────┘
The system auto-adjusts the forwarding percentage to comply with the minimum retention. The agent sees the adjusted value. The agent cannot save a matrix rule that would violate their upline's minimum retention requirement.
Walk-Through: Vikram Requires 20%, Rajesh Tries to Forward 100% for Sharps
Setup: Vikram sets minimum retention of 20% for Rajesh. Rajesh, who has been losing money on a sharp user (Amit), wants to forward 100% of Amit's bets.
Attempt 1: Rajesh sets user override for Amit = 100% forward
System checks: 100% forward means 0% retention. Vikram's minimum is 20%. System response: "Cannot set forwarding above 80% for this user. Your upline requires 20% minimum retention. Adjusted to 80% forward."
Rajesh accepts. Amit's bets are now forwarded at 80%.
Attempt 2: Rajesh modifies his matrix Rule R1 (SHARP users) from 95% to 100%
System checks: 100% > max allowed (80%). System response: "Forwarding capped at 80%. Rule saved as 80% forward."
What about the catch-all rule? If Rajesh's catch-all (all-wildcard) rule is set to forward 50%, and Vikram's minimum is 20%, there is no conflict -- 50% forward means 50% retention, which exceeds 20%. No adjustment needed.
Why this design is correct: Vikram has a legitimate interest in ensuring Rajesh has skin in the game. Without minimum retention, Rajesh could dump all negative-expected-value flow upline while keeping positive-expected-value flow. The minimum retention ensures Rajesh shares in both the upside and downside of every bet, aligning incentives across the hierarchy.
30. Panic Button Abuse Prevention
The Problem
The panic button (Section 14) is a powerful tool: it immediately forwards 100% of new bets and hedges all retained positions on Betfair. Used legitimately, it is a safety net. Used abusively, it is a money machine.
The abuse pattern: Rajesh watches the match. When things go badly (his retained positions are losing), he hits panic -- hedging at current prices and locking in a partial loss. When things go well, he does not press panic -- collecting the full profit. Over time, this asymmetric usage means Rajesh only takes losses when they are small (he panicked early) and takes full profits when things go well. The cost of hedge execution spread is borne by the platform (or Betfair liquidity).
Who Bears the Cost of Hedge Execution
When the panic button is pressed, hedge orders are placed on Betfair. The spread between the price the system gets and the theoretical mid-price is a real cost. Who pays?
PANIC HEDGE COST ALLOCATION
=============================
When Rajesh presses panic at 9:30 PM:
1. System places hedge orders on Betfair for all Rajesh's retained positions
2. Betfair mid-price for MI to win: 1.50
3. System gets filled at: 1.48 (laying) and 1.52 (backing)
4. Spread cost: approximately 1.3% of hedged amount
Cost allocation:
→ First panic in a period: Platform absorbs the spread cost
(This is a legitimate safety feature)
→ Second panic in same period: 50% spread cost charged to Rajesh's P&L
→ Third+ panic in same period: 100% spread cost charged to Rajesh's P&L
This makes the first panic "free" (encouraging use when genuinely needed)
but makes repeated use increasingly expensive (discouraging abuse).
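The escalating cost-share schedule above reduces to a small lookup. A minimal sketch (function name is illustrative):

```python
def panic_cost_share(panic_number_in_period: int) -> float:
    """Fraction of the hedge spread cost charged to the agent for the
    Nth panic in the same period (1-indexed): 0%, 50%, then 100%."""
    if panic_number_in_period <= 1:
        return 0.0          # first panic: platform absorbs the spread
    if panic_number_in_period == 2:
        return 0.5          # second panic: 50% charged to the agent
    return 1.0              # third and beyond: fully charged

# Rajesh's second panic of the period: spread cost Rs 7,800, half charged to him.
spread_cost = 7800
assert panic_cost_share(1) * spread_cost == 0
assert panic_cost_share(2) * spread_cost == 3900.0
assert panic_cost_share(3) * spread_cost == 7800.0
```
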
Usage Limits and Cooling-Off Periods
| Control | Value | Rationale |
|---|---|---|
| Panics per night period | 2 before a WARNING; cost follows the escalation schedule | Night sessions are volatile; 2 panics can still reflect genuine emergencies |
| Panics per week | 5 before weekly abuse review; cost follows the escalation schedule | Weekly cap prevents chronic abusers |
| Cooling-off after panic | 30 minutes before matrix can be restored | Prevents the "panic, wait 5 minutes, restore, panic again" cycle |
| Minimum hedge duration | 15 minutes | Once hedged, positions stay hedged for at least 15 minutes. Agent cannot un-hedge immediately when conditions improve. |
| Panic cost escalation | 0%, 50%, 100%, 100%... per period | Each subsequent panic in the same period is more expensive |
Monitoring and Flagging
PANIC BUTTON ABUSE DETECTION
==============================
The system tracks per agent, per period:
1. Panic frequency:
→ More than 3 panics per week for 3 consecutive weeks: FLAG
→ More than 2 panics in a single night: WARNING
2. Panic timing correlation:
→ Agent panics when their retained book is losing > ₹X
→ Agent does NOT panic when their retained book is winning
→ Asymmetric panic usage: collusion score increases
3. Panic profitability analysis:
→ Calculate: what would Rajesh's P&L be WITHOUT panic hedges?
→ Compare: what IS Rajesh's P&L WITH panic hedges?
→ If panic usage consistently improves P&L by > 20%: FLAG
4. Post-panic behavior:
→ Agent immediately restores original matrix after cooling-off ends: SUSPICIOUS
→ Agent keeps hedged state for hours: LEGITIMATE
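The asymmetry and profitability heuristics above can be combined into a simple classifier. This is an illustrative sketch; the function name and the exact thresholds (90% loss-bias, 20% P&L improvement) mirror the description above but are not the production values.

```python
def assess_panic_usage(panics_when_losing: int, panics_when_winning: int,
                       pnl_without_panic: float, pnl_with_panic: float) -> str:
    """Classify an agent's panic usage over a monitoring window."""
    total = panics_when_losing + panics_when_winning
    if total == 0:
        return "NO_DATA"
    loss_bias = panics_when_losing / total
    # How much did panicking improve the agent's P&L, as a fraction of the loss?
    improvement = 0.0
    if pnl_without_panic < 0:
        improvement = (pnl_with_panic - pnl_without_panic) / abs(pnl_without_panic)
    if loss_bias >= 0.9 and improvement > 0.20:
        return "GAMING"       # asymmetric usage that materially improves P&L
    if loss_bias >= 0.7:
        return "SUSPICIOUS"   # asymmetric, but impact not yet established
    return "LEGITIMATE"

# Rajesh over 3 weeks: 7 panics, all while losing; P&L improved by 56%.
assert assess_panic_usage(7, 0, -420_000, -185_000) == "GAMING"
```
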
Differentiating Legitimate Panic From Gaming
| Behavior | Classification | Reason |
|---|---|---|
| Panic during a genuinely volatile match event (3 wickets in 1 over) | LEGITIMATE | Match conditions warrant caution |
| Panic at the start of every match, restore after 30 minutes | GAMING | Pattern suggests routine use, not emergency |
| Panic once per month during a crisis | LEGITIMATE | Rare, appropriate use |
| Panic 3 times per week, always when losing | GAMING | Asymmetric usage exploits the hedge |
| Panic after seeing a corruption alert or integrity flag | LEGITIMATE | Responding to genuine threat signal |
Walk-Through: Rajesh Presses Panic When Losing, Repeats Weekly
Week 1, Wednesday: MI vs CSK. Rajesh retains ₹8,00,000 backing MI. MI loses 4 wickets cheaply. Rajesh's retained book is down ₹2,50,000 unrealized. He presses panic.
Result: System hedges all positions. Rajesh locks in a ₹1,80,000 loss (better than the potential ₹8,00,000 if MI collapses completely). Spread cost: ₹10,400. First panic of the period -- platform absorbs the cost.
Week 1, Friday: RCB vs DC. Rajesh retains ₹6,00,000 backing RCB. RCB's top batsman gets out. Rajesh presses panic.
Result: System hedges. Rajesh locks in a ₹95,000 loss. Spread cost: ₹7,800. Second panic of the period -- 50% charged to Rajesh (₹3,900).
Week 1, Sunday: KKR vs SRH. KKR winning. Rajesh's book is up ₹1,50,000. He does NOT press panic. KKR wins. Rajesh collects ₹1,50,000.
Week 2: Same pattern. Panic when losing, hold when winning.
Week 3: Same pattern. The abuse detection system now has 3 weeks of data.
PANIC ABUSE ALERT
==================
Agent: Rajesh
Pattern detected over 3 weeks:
Panics triggered: 7
Panics when book was losing: 7 (100%)
Panics when book was winning: 0 (0%)
P&L without panic: -₹4,20,000 (net loss over 3 weeks)
P&L with panic: -₹1,85,000 (net loss reduced by ₹2,35,000)
Panic improved P&L by: 56%
Spread cost absorbed by platform: ₹38,000
Assessment: GAMING (asymmetric panic usage)
Actions:
1. Rajesh's next panic incurs 100% spread cost
2. Alert sent to Rajesh: "Your panic button usage is under review."
3. Vikram (upline) notified
4. If pattern continues 1 more week: panic feature suspended for 30 days,
replaced with automatic NO_NEW_RISK (which does not hedge existing positions)
31. Timestamp and Period Boundary Security
The Problem
If the client's clock determines when a bet was placed, punters and agents can manipulate timestamps. A punter could backdate a bet to before a period boundary (when limits had more headroom). An agent could manipulate their clock to extend a favorable night period. The system must use server-side timestamps for all authoritative decisions.
Where the Authoritative Timestamp Is Assigned
The authoritative timestamp is assigned at the earliest possible point in the server-side processing pipeline, before any business logic executes:
BET PROCESSING PIPELINE -- TIMESTAMP ASSIGNMENT
=================================================
1. Client sends bet request
→ Client includes client_timestamp (informational only, never trusted)
2. API gateway receives request
→ SERVER TIMESTAMP ASSIGNED HERE: request_received_at = NOW() on the server
→ This is the AUTHORITATIVE timestamp for ALL downstream decisions
→ It is immutable -- no subsequent step can change it
3. Request is queued for processing
→ processing_started_at = NOW() (separate timestamp, for latency tracking)
4. Matrix resolution uses request_received_at for period boundary evaluation
→ "Is this bet in the night period?" uses request_received_at, NOT client_timestamp
5. Position creation uses request_received_at as the official bet placement time
→ All exposure ledger updates reference this timestamp
6. Audit record stores BOTH timestamps:
→ client_timestamp: what the client claimed (for debugging/fraud detection)
→ server_timestamp: the authoritative time (for all business logic)
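The pipeline's key property -- the authoritative timestamp is assigned once, at the gateway, and is immutable thereafter -- can be sketched as follows. The class and field names are illustrative, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: no downstream step can mutate the timestamp
class BetRequest:
    bet_id: str
    client_timestamp: str          # informational only, never trusted
    request_received_at: datetime  # authoritative server time (UTC)

def receive_bet(bet_id: str, client_timestamp: str) -> BetRequest:
    """Assign the authoritative server timestamp at the API gateway,
    before any business logic executes."""
    return BetRequest(bet_id, client_timestamp, datetime.now(timezone.utc))

req = receive_bet("bet_001", "2026-02-11T16:29:55Z")
assert req.request_received_at.tzinfo is timezone.utc
```

Because the dataclass is frozen, any attempt by later pipeline stages to overwrite `request_received_at` raises an error, which enforces the immutability rule structurally rather than by convention.
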
How Period Boundaries Are Determined
Period boundary evaluation always uses the server clock:
PERIOD BOUNDARY EVALUATION
============================
Input: request_received_at = 2026-02-11T16:29:59.500Z (UTC)
Agent: Rajesh (IST, night period 19:00-02:00)
Step 1: Convert to agent's timezone
→ 16:29:59.500 UTC = 21:59:59.500 IST
Step 2: Is this within the night period?
→ Night start: 19:00 IST → YES, 21:59 is after 19:00
→ Night end: 02:00 IST → YES, 21:59 is before 02:00
→ Result: NIGHT PERIOD
Step 3: Check against night period limits
→ Use night_period exposure ledger
How Clock Skew Between Server Instances Is Handled
In a distributed deployment with multiple server instances, each instance has a slightly different clock. The maximum acceptable clock skew is managed through NTP synchronization:
CLOCK SKEW MANAGEMENT
========================
Requirement: All server instances must be synchronized to within 50ms of UTC
Mechanism: NTP (Network Time Protocol) with multiple time sources
If NTP sync fails:
→ Instance reports CLOCK_DRIFT_WARNING
→ If drift exceeds 200ms: instance is removed from the load balancer
→ If drift exceeds 1 second: instance auto-quarantines (stops accepting bets)
For period boundary decisions:
→ The 50ms skew window is irrelevant for period boundaries
(which are at hour granularity: 19:00, 02:00)
→ A bet at 01:59:59.950 on Instance A and 02:00:00.050 on Instance B
might be evaluated differently, but this is a 100ms window
at most -- acceptable given the hour-scale period boundaries
For exposure counter consistency:
→ Timestamps on exposure ledger updates are server-generated
→ The ordering of writes is determined by the database (which has one clock),
not by the application servers
→ Even if two instances disagree by 50ms on the time, the database
orders writes correctly using its own monotonic clock
Bets at the Period Boundary
What happens when a bet arrives at exactly the boundary? For example, at 02:00:00.000 IST (the night period end for Rajesh)?
PERIOD BOUNDARY TIE-BREAKING
==============================
Rule: Bets at EXACTLY the boundary time belong to the STARTING (new) period.
Why: The night period is defined as 19:00:00.000 to 01:59:59.999.
02:00:00.000 is the first moment of the next period (day).
In practice:
→ request_received_at = 2026-02-12T01:59:59.999 IST → NIGHT period
→ request_received_at = 2026-02-12T02:00:00.000 IST → DAY period
This is a closed-open interval: [19:00, 02:00)
For exposure carry-forward:
→ When the night period ends at 02:00, any open positions from the night
are carried forward to the day period (as described in Section 9)
→ The bet at 01:59:59.999 is the last bet counted against night limits
→ The bet at 02:00:00.000 is the first bet counted against day limits
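The closed-open interval test, including the midnight wrap-around of a 19:00-02:00 night period, can be sketched as below. The function name is illustrative; the IST offset and period times follow Rajesh's configuration above.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))

def in_night_period(request_received_at: datetime,
                    start: time = time(19, 0), end: time = time(2, 0)) -> bool:
    """Closed-open interval [19:00, 02:00) in the agent's timezone.
    The period wraps midnight, so the test is 'after start OR before end'."""
    local = request_received_at.astimezone(IST).time()
    return local >= start or local < end

# 01:59:59.999 IST (= 20:29:59.999 UTC) -> last moment of the NIGHT period
assert in_night_period(datetime(2026, 2, 11, 20, 29, 59, 999000, tzinfo=timezone.utc))
# 02:00:00.000 IST (= 20:30:00 UTC) -> first moment of the DAY period
assert not in_night_period(datetime(2026, 2, 11, 20, 30, tzinfo=timezone.utc))
```
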
Client Timestamp Fraud Detection
The client_timestamp is not trusted but is useful for detecting anomalies:
| Anomaly | What It Means | Action |
|---|---|---|
| client_timestamp is > 30 seconds before server_timestamp | Client clock is behind, or deliberate manipulation | Log for monitoring. No immediate action. |
| client_timestamp is > 5 seconds AFTER server_timestamp | Client clock is ahead, which is unusual | Log and flag. Client may be trying to claim a later timestamp. |
| client_timestamp is > 5 minutes different from server_timestamp | Significant discrepancy | Flag for review. Possible automation/bot activity. |
| client_timestamp is consistently exactly N seconds offset | Clock calibration issue or deliberate offset | Informational. Some devices have persistent clock drift. |
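The anomaly table above maps directly onto a small classifier. A minimal sketch with illustrative return labels (the thresholds are the ones in the table; the label names are assumptions):

```python
def classify_client_skew(client_ts: float, server_ts: float) -> str:
    """Classify client-vs-server timestamp skew (both in seconds since
    epoch), following the anomaly table."""
    delta = client_ts - server_ts      # positive: client clock is ahead
    if abs(delta) > 300:
        return "REVIEW"                # > 5 minutes apart: possible bot activity
    if delta > 5:
        return "FLAG"                  # client claims a later time than the server
    if delta < -30:
        return "LOG"                   # client clock behind: log, no action
    return "OK"

assert classify_client_skew(1010.0, 1000.0) == "FLAG"     # 10s ahead
assert classify_client_skew(940.0, 1000.0) == "LOG"       # 60s behind
assert classify_client_skew(1400.0, 1000.0) == "REVIEW"   # 400s apart
```

The consistent-offset pattern (last table row) would be detected over many requests, not per request, so it is out of scope for this per-request check.
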
32. Sharp Detection Gaming via Multiple Accounts
The Problem
Sharp bettors know that bookmakers track their accounts and limit them. The obvious countermeasure: use many accounts. A syndicate of 50 accounts, each betting small amounts, can fly under the radar of per-account sharp detection. Each account looks like a casual punter. But collectively, they are placing coordinated bets that drain the agent's book.
The Detection Pillars
The cross-account syndicate detection system uses four independent signals. Any one signal alone might be coincidence. Two or more signals together strongly indicate coordination.
SYNDICATE DETECTION: 4 PILLARS
================================
Pillar 1: DEVICE FINGERPRINTING
→ Same device used by multiple accounts
→ Similar device configurations (screen size, OS version, installed fonts)
Pillar 2: IP / NETWORK CORRELATION
→ Multiple accounts from same IP address
→ Multiple accounts from same subnet
→ VPN detection (known VPN exit nodes)
Pillar 3: BETTING PATTERN SIMILARITY
→ Same outcomes, same timing, same markets
→ Correlated staking patterns
→ Similar CLV profiles
Pillar 4: PAYMENT METHOD OVERLAP
→ Same bank account linked to multiple user accounts
→ Same UPI ID, same wallet, same card
→ Money flow between linked accounts
Device Fingerprinting Integration
The system collects device attributes at every bet placement (not just at registration):
| Attribute | Purpose | Collection Point |
|---|---|---|
| Browser/app user agent | Identifies device type and version | Every API request |
| Screen resolution | Distinguishes devices | Session start |
| Timezone offset | Cross-reference with claimed location | Every API request |
| Installed fonts / canvas fingerprint | High-entropy device identifier | Session start (web) |
| Device ID (mobile) | Unique device identifier | App installation |
| Battery level + charging state | Behavioral fingerprint | Session start (mobile) |
Fingerprint matching algorithm:
DEVICE FINGERPRINT SIMILARITY SCORE
=====================================
For each pair of user accounts, compute:
score = 0
If same device_id: score += 100 (near-certain same device)
If same canvas fingerprint: score += 80 (very likely same browser)
If same IP AND same user agent: score += 60
If same screen resolution AND timezone: score += 30
If same subnet (first 3 octets): score += 20
Thresholds:
score >= 100: SAME_DEVICE (automatically link accounts)
score 60-99: LIKELY_RELATED (flag for review)
score 30-59: POSSIBLY_RELATED (monitor)
score < 30: UNRELATED
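The scoring algorithm above translates directly into code. A minimal sketch, with attribute keys (`device_id`, `canvas`, `ip`, `user_agent`, `resolution`, `tz`) chosen for illustration:

```python
def fingerprint_similarity(a: dict, b: dict) -> int:
    """Pairwise device-similarity score per the weights above.
    Missing attributes never contribute to the score."""
    score = 0
    if a.get("device_id") and a["device_id"] == b.get("device_id"):
        score += 100
    if a.get("canvas") and a["canvas"] == b.get("canvas"):
        score += 80
    if a.get("ip") and a["ip"] == b.get("ip") \
            and a.get("user_agent") and a["user_agent"] == b.get("user_agent"):
        score += 60
    if a.get("resolution") and a["resolution"] == b.get("resolution") \
            and a.get("tz") == b.get("tz"):
        score += 30
    if a.get("ip") and b.get("ip") \
            and a["ip"].rsplit(".", 1)[0] == b["ip"].rsplit(".", 1)[0]:
        score += 20   # same /24 subnet (first 3 octets)
    return score

def classify(score: int) -> str:
    if score >= 100: return "SAME_DEVICE"
    if score >= 60:  return "LIKELY_RELATED"
    if score >= 30:  return "POSSIBLY_RELATED"
    return "UNRELATED"

a = {"ip": "10.0.0.5", "user_agent": "AppX/1.2", "resolution": "1080x2400", "tz": "+5:30"}
b = {"ip": "10.0.0.7", "user_agent": "AppX/1.2", "resolution": "1080x2400", "tz": "+5:30"}
# Different IPs on the same /24, matching resolution + timezone: 30 + 20 = 50
assert fingerprint_similarity(a, b) == 50
assert classify(50) == "POSSIBLY_RELATED"
```
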
IP Correlation Analysis
IP CORRELATION ANALYSIS
========================
Data collected: For every bet, record the source IP address.
Analysis 1: Direct IP overlap
→ Two or more accounts placing bets from the same IP within 1 hour
→ Common in household sharing (legitimate) or syndicate operation (illegitimate)
→ Threshold: 3+ accounts from same IP → flag
Analysis 2: Subnet analysis
→ Accounts from the same /24 subnet (e.g., 192.168.1.*)
→ Common in corporate/office networks or coordinated operations
→ Threshold: 5+ accounts from same subnet → flag
Analysis 3: IP timing patterns
→ Account A bets from IP X at 9:00 PM
→ Account B bets from IP X at 9:02 PM
→ Account C bets from IP X at 9:05 PM
→ Sequential use of the same IP → strong syndicate signal
Analysis 4: VPN / Proxy detection
→ Known VPN exit node IPs (maintained list)
→ Tor exit nodes
→ Commercial proxy services
→ If detected: increase scrutiny on all other signals
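The direct-IP-overlap analysis (Analysis 1) is essentially an inverted index from IP to accounts. A minimal sketch assuming bets arrive as `(account_id, ip)` pairs already filtered to the analysis window:

```python
from collections import defaultdict

def flag_shared_ips(bets, account_threshold: int = 3) -> dict:
    """Return IPs used by `account_threshold` or more distinct accounts.
    `bets` is an iterable of (account_id, ip) tuples."""
    accounts_by_ip = defaultdict(set)
    for account_id, ip in bets:
        accounts_by_ip[ip].add(account_id)
    return {ip: sorted(accs) for ip, accs in accounts_by_ip.items()
            if len(accs) >= account_threshold}

bets = [("u1", "1.2.3.4"), ("u2", "1.2.3.4"), ("u3", "1.2.3.4"),
        ("u4", "5.6.7.8"), ("u1", "5.6.7.8")]
assert flag_shared_ips(bets) == {"1.2.3.4": ["u1", "u2", "u3"]}
```

Subnet analysis (Analysis 2) is the same structure keyed on the first three octets, with the threshold raised to 5.
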
Betting Pattern Similarity Detection
This is the most powerful signal because it is hard to disguise:
BETTING PATTERN SIMILARITY
============================
For each pair of accounts, compute similarity across these dimensions:
1. Outcome correlation:
→ How often do both accounts bet on the same outcome?
→ Random chance for a 2-outcome market: 50%
→ If 80%+ correlation over 100+ bets: FLAG
2. Timing correlation:
→ Average time between Account A's bet and Account B's bet on the same event
→ If consistently < 5 minutes apart: FLAG
3. Market selection correlation:
→ Do both accounts bet on the same obscure markets?
→ Betting on the same IPL match: not unusual (everyone bets IPL)
→ Betting on the same Ranji Trophy match: unusual (niche market)
→ Weight correlation by market obscurity
4. Stake pattern similarity:
→ Both accounts use round-number stakes (₹10,000, ₹20,000)
→ Both accounts use the same fractional stakes (₹8,731, ₹8,731)
→ Similar stake distributions (mean, variance, skewness)
5. CLV profile similarity:
→ Both accounts have similar CLV trajectories over time
→ Both accounts started profitable at the same time
→ Both accounts' CLV curves are correlated
Composite score:
→ Weight and combine all dimensions
→ If composite score > threshold → SYNDICATE_SUSPECTED
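Dimension 1 (outcome correlation) is the simplest of these signals to compute. A minimal sketch, assuming each account's bets are summarized as a mapping from event to the outcome backed:

```python
def outcome_correlation(bets_a: dict, bets_b: dict) -> float:
    """Fraction of commonly-bet events on which two accounts backed the
    same outcome. Each dict maps event_id -> outcome backed."""
    common = set(bets_a) & set(bets_b)
    if not common:
        return 0.0          # no shared events: no evidence either way
    same = sum(1 for e in common if bets_a[e] == bets_b[e])
    return same / len(common)

a = {"e1": "MI", "e2": "CSK", "e3": "RCB", "e4": "KKR"}
b = {"e1": "MI", "e2": "CSK", "e3": "RCB", "e4": "SRH"}
assert outcome_correlation(a, b) == 0.75   # 3 of 4 shared events match
```

In the real pillar, this raw fraction would be compared against the random-chance baseline for each market (50% for a 2-outcome market) and weighted by market obscurity before entering the composite score.
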
Payment Method Overlap Detection
| Signal | Severity | Example |
|---|---|---|
| Same bank account on 2+ user accounts | CRITICAL | Account A and Account B both linked to SBI account #12345 |
| Same UPI ID on 2+ accounts | HIGH | Account A and B both use amit@upi |
| Money transfer between two user accounts' bank accounts | HIGH | Account A deposits, Account B receives from A's bank |
| Same phone number on 2+ accounts | HIGH | Both accounts registered with +91-98765-43210 |
| Same email domain (non-public) on 2+ accounts | MEDIUM | amit@someprivatecorp.com and raj@someprivatecorp.com |
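Payment overlap detection is, at its core, grouping accounts by payment instrument. A minimal sketch, where an instrument is represented as a `(type, identifier)` tuple (representation is an assumption):

```python
from collections import defaultdict

def payment_overlaps(links) -> dict:
    """Group accounts sharing a payment instrument. `links` is an
    iterable of (account_id, instrument) pairs, where an instrument is
    e.g. ('bank', 'SBI#12345') or ('upi', 'amit@upi')."""
    by_instrument = defaultdict(set)
    for account, instrument in links:
        by_instrument[instrument].add(account)
    # Any instrument linked to 2+ accounts is an overlap signal
    return {inst: sorted(accs) for inst, accs in by_instrument.items()
            if len(accs) >= 2}

links = [("A", ("bank", "SBI#12345")), ("B", ("bank", "SBI#12345")),
         ("C", ("upi", "raj@upi"))]
assert payment_overlaps(links) == {("bank", "SBI#12345"): ["A", "B"]}
```

Severity (CRITICAL for shared bank accounts, HIGH for shared UPI/phone, per the table) would be assigned per instrument type on top of this grouping.
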
How Flagged Clusters Are Communicated to Agents
When the system identifies a suspected syndicate, it packages the information for the agent:
SYNDICATE ALERT -- RAJESH'S DASHBOARD
=======================================
⚠ SUSPECTED SYNDICATE: CLUSTER-4821
Accounts identified: 12 of potentially 50+
Confidence: HIGH (3 of 4 detection pillars triggered)
Evidence:
📱 Device: 8 accounts share 3 devices
🌐 Network: 11 accounts used 2 IP addresses in the last week
📊 Patterns: 91% outcome correlation across 234 bets
💳 Payments: 4 accounts share 2 bank accounts
Accounts in YOUR network:
1. Amit (user_4521) -- 45 bets, +₹1,85,000 P&L against you
2. Rahul (user_4588) -- 38 bets, +₹1,42,000 P&L against you
3. Deepak (user_4612) -- 31 bets, +₹98,000 P&L against you
4. Naveen (user_4687) -- 28 bets, +₹76,000 P&L against you
... 8 more accounts
Combined impact on YOUR book: -₹8,45,000 over 4 weeks
Recommended actions:
[Classify All as SHARP] -- forwards 95% of their bets
[Block All Accounts] -- prevents any new bets (requires admin approval)
[Review Individual] -- decide per account
[Ignore Alert] -- acknowledge, no action (logged)
Walk-Through: Syndicate With 50 Accounts Under Rajesh
Setup: A professional betting syndicate creates 50 accounts under Rajesh over a 3-month period. Each account is registered with a different name, phone number, and email. They use a pool of 10 mobile devices and 5 residential IP addresses (via mobile hotspots at different locations).
Month 1: The syndicate operates carefully. Each account places 2-3 bets per day on different markets. Win rates are moderate (53%). Individual account P&L is unremarkable.
What the system sees after Month 1:
Pillar 1 (Device): 50 accounts using 10 devices → 5 accounts per device average
Score: 8 clusters of related accounts identified
Pillar 2 (IP): 50 accounts using 5 IPs
Score: Subnet analysis shows concentrated usage
But: 5 IPs across 50 accounts is not extreme (could be a housing complex)
Pillar 3 (Betting): Outcome correlation at 61% (slightly above 50% random)
Score: MODERATE -- not yet flagged, but being monitored
Pillar 4 (Payment): No payment overlap (syndicate was careful)
Score: CLEAN
Overall assessment: MONITORING (not yet flagged)
Month 2: The syndicate becomes more aggressive. More bets, higher stakes. Their careful 50-account approach means no individual account trips any threshold. But the pattern signal strengthens.
Pillar 3 (Betting) after Month 2:
Outcome correlation: 73% (very suspicious)
Timing correlation: 78% of bets within 10 minutes of each other
Market selection: 22 of 50 accounts bet on the same obscure Ranji match
CLV: All 50 accounts have positive CLV (probability of this by chance: <0.001%)
Month 2, Week 3: The system triggers:
SYNDICATE DETECTION: CLUSTER-4821 CONFIRMED
=============================================
Detection trigger: Betting pattern similarity threshold exceeded
→ 50 accounts with 73% outcome correlation
→ 22 accounts on same obscure market
→ All 50 accounts profitable (p < 0.001)
Cross-reference with device data:
→ 8 device clusters confirmed
→ 50 accounts → 10 devices → likely 3-5 operators
Financial impact under Rajesh:
→ Combined P&L impact on Rajesh: -₹12,40,000 (Rajesh has lost ₹12.4 lakh to this cluster)
→ Per-account impact on Rajesh: -₹15,000 to -₹85,000
Alert sent to:
1. Rajesh (with recommended actions)
2. Vikram (upline, with summary)
3. Platform compliance team (with full evidence package)
Rajesh's response: He classifies all 50 accounts as SHARP. With the 72-hour cooling-off period (Section 27), the classification takes effect 3 days later. Meanwhile, the platform compliance team can also apply platform-level restrictions if they deem it necessary (account suspension, reduced limits).
33. Rate Limiting on Configuration Changes
The Problem
An agent who rapidly changes their forwarding matrix creates multiple problems:
- Cache invalidation storms (every change invalidates all cache tiers)
- Matrix version bloat (each change creates a new immutable version)
- Audit trail confusion (which version applied to which bet?)
- Potential gaming (rapid changes to exploit specific bet outcomes)
Per-Agent Rate Limits
| Configuration Type | Rate Limit | Queue Behavior |
|---|---|---|
| Matrix rule changes | 1 change per rule per 5 minutes | Queue rapid changes, apply only the most recent |
| User override changes | 1 per user per 10 minutes | Queue, apply most recent |
| Market override changes | 1 per market per 5 minutes | Queue, apply most recent |
| Agent default changes | 1 per 15 minutes | Queue, apply most recent |
| Limit changes (sport, match, period) | 1 per limit per 10 minutes | Queue, apply most recent |
| Panic button | No rate limit on activation; 30-minute cooling-off before deactivation | Immediate (this is a safety feature) |
Queue and Apply Most Recent
When an agent makes rapid changes that exceed the rate limit:
RATE-LIMITED CONFIGURATION CHANGES
=====================================
9:30:00 PM Rajesh changes Rule R5: forward 40% → 60%
→ APPLIED immediately (first change, no rate limit hit)
9:30:45 PM Rajesh changes Rule R5: forward 60% → 80%
→ QUEUED (less than 5 minutes since last change)
→ Queue entry: { rule: R5, new_value: 80%, queued_at: 9:30:45 }
9:31:30 PM Rajesh changes Rule R5: forward 80% → 95%
→ REPLACES previous queue entry (queue only keeps most recent)
→ Queue entry: { rule: R5, new_value: 95%, queued_at: 9:31:30 }
9:32:00 PM Rajesh changes Rule R3: forward 40% → 50%
→ QUEUED separately (different rule, its own rate limit)
→ Queue entry: { rule: R3, new_value: 50%, queued_at: 9:32:00 }
9:35:00 PM Rate limit window expires for R5
→ Queue entry for R5 is applied: forward 95%
→ The intermediate value of 80% was never applied
→ Agent is notified: "Your change to Rule R5 has been applied."
9:37:00 PM Rate limit window expires for R3
→ Queue entry for R3 is applied: forward 50%
What the agent sees:
┌──────────────────────────────────────────────────────────────┐
│ CONFIGURATION UPDATE │
│ │
│ Rule R5 updated: Forward 60% (active now) │
│ │
│ ⏳ Pending changes (will apply in ~4 minutes): │
│ Rule R5: Forward 95% │
│ Rule R3: Forward 50% │
│ │
│ Why the delay? Rapid configuration changes are queued to │
│ ensure system stability. Only your most recent value will │
│ be applied. │
│ │
│ [Cancel Pending Changes] │
└──────────────────────────────────────────────────────────────┘
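The queue-and-apply-most-recent behavior in the walk-through can be sketched as a small per-key state machine. This is illustrative (class name and time representation are assumptions); the real system keys one limiter per rule, user override, market override, and so on.

```python
class ConfigRateLimiter:
    """Queue-and-apply-most-recent rate limiting for one configuration
    key (e.g. one matrix rule). `window` is the rate-limit interval in
    seconds; `now` is a monotonic timestamp in seconds."""

    def __init__(self, window: float = 300.0):
        self.window = window
        self.last_applied_at = None
        self.pending = None   # only the most recent queued value survives

    def submit(self, value, now: float):
        if self.last_applied_at is None or now - self.last_applied_at >= self.window:
            self.last_applied_at = now
            self.pending = None
            return ("APPLIED", value)
        self.pending = value          # replaces any earlier queued value
        return ("QUEUED", value)

    def tick(self, now: float):
        """Apply the pending value once the rate-limit window has elapsed."""
        if self.pending is not None and now - self.last_applied_at >= self.window:
            value, self.pending = self.pending, None
            self.last_applied_at = now
            return ("APPLIED", value)
        return None

# Replaying the Rule R5 timeline (seconds from 9:30:00 PM):
r5 = ConfigRateLimiter(window=300)
assert r5.submit(60, now=0) == ("APPLIED", 60)    # 9:30:00 -- applied
assert r5.submit(80, now=45) == ("QUEUED", 80)    # 9:30:45 -- queued
assert r5.submit(95, now=90) == ("QUEUED", 95)    # 9:31:30 -- replaces 80
assert r5.tick(now=300) == ("APPLIED", 95)        # 9:35:00 -- 80 never applies
```
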
Cache Invalidation Throttling
Even when configuration changes are rate-limited, the cache invalidation must be efficient:
CACHE INVALIDATION STRATEGY
=============================
When a configuration change is applied:
1. PostgreSQL: Write happens immediately (source of truth updated)
2. Redis: Invalidation within 100ms
→ DELETE the affected key(s)
→ Do NOT pre-populate (let the next read fill the cache)
3. Application LRU: Invalidation via pub/sub within 200ms
→ All app instances receive the invalidation message
→ Affected entries evicted from LRU cache
Throttling:
→ If more than 10 invalidations per agent per minute: batch them
→ Instead of 10 individual invalidations, one "flush all for this agent" signal
→ This prevents cache thrashing during rapid configuration periods
How Rate Limiting Interacts With the Panic Button
The panic button IS a rapid configuration change -- it sets forwarding to 100% for all sports and markets. But it is exempt from rate limiting because it is a safety feature. The design reconciles these two goals:
PANIC BUTTON VS RATE LIMITING
===============================
Panic button activation:
→ Bypasses ALL rate limits
→ Applies immediately (no queueing)
→ Invalidates all caches immediately
→ Reason: safety always trumps stability
Panic button deactivation (restoring previous settings):
→ Subject to 30-minute cooling-off period (Section 30)
→ NOT subject to the 5-minute matrix change rate limit
→ Reason: the 30-minute cooling-off is already more restrictive
than the 5-minute rate limit
Configuration changes WHILE panic is active:
→ Queued normally under rate limits
→ Applied only AFTER panic is deactivated
→ Agent sees: "You are in panic mode. Configuration changes
will be applied when you exit panic mode."
This means:
1. Rajesh presses panic at 9:30 PM → immediate effect, no rate limit
2. Rajesh tries to change Rule R5 at 9:31 PM → queued (panic is active)
3. Rajesh deactivates panic at 10:00 PM → previous settings restored
4. Queued Rule R5 change applies at 10:00 PM (or later per rate limit)
Rate Limit Overrides for Administrators
Platform administrators can bypass rate limits for specific agents when needed:
| Override Type | Who Can Grant | Duration | Use Case |
|---|---|---|---|
| Temporary unlimited changes | Platform SUPER_ADMIN | 1 hour | Agent onboarding, major event preparation |
| Reduced rate limit (1 min instead of 5) | Platform ADMIN | 4 hours | Agent is actively tuning during a match with admin guidance |
| Rate limit suspension | Platform SUPER_ADMIN | 30 minutes | Emergency reconfiguration |
All overrides are logged in the audit trail with the admin who granted them and the reason.
34. Currency and Multi-Currency Support
The Problem
Hannibal serves agent networks across India, Southeast Asia, and Africa. Agents operate in different currencies: Indian Rupees (INR), Thai Baht (THB), Ghanaian Cedis (GHS), Nigerian Naira (NGN), Kenyan Shillings (KES). But hedges on Betfair are placed in GBP (or EUR). This creates currency risk at multiple points in the system.
Base Currency Per Agent
Every agent has a configured base currency. All their limits, exposure ledgers, and P&L are denominated in this currency:
AGENT BASE CURRENCIES
========================
Agent Base Currency Why
-------- ------------- ---
Rajesh INR Indian sub-agent, punters bet in INR
Vikram INR Indian master agent
Kwame GHS Ghanaian agent, punters bet in Cedis
Priya INR Indian sub-agent
Platform USD Platform operates in USD for cross-border accounting
Betfair GBP Exchange operates in GBP
Where FX Conversion Happens
FX conversion occurs at two points in the bet lifecycle: when a forwarded stake crosses from an agent's currency zone into the platform (local currency → USD), and when the platform hedges on Betfair (USD → GBP).
Key design rule: FX conversion happens at the boundary between currency zones, not within them. Within the INR agent hierarchy (Rajesh -> Vikram), all calculations are in INR. FX only enters the picture when the position crosses to the platform (which operates in USD) or to Betfair (which operates in GBP).
FX Rate Capture and Audit Trail
Every FX conversion is captured with the exact rate used:
| Field | Type | Description |
|---|---|---|
| conversion_id | UUID | Unique identifier for this conversion |
| bet_id | TEXT | Which bet triggered this conversion |
| source_currency | TEXT | e.g., GHS |
| target_currency | TEXT | e.g., USD |
| source_amount | DECIMAL | Amount in source currency |
| target_amount | DECIMAL | Amount in target currency |
| fx_rate | DECIMAL(18,8) | The rate used: 1 GHS = X USD |
| fx_rate_source | TEXT | Where the rate came from (e.g., "platform_rate_feed", "manual_override") |
| fx_rate_timestamp | TIMESTAMP | When the rate was captured |
| conversion_timestamp | TIMESTAMP | When the conversion was executed |
| spread_applied | DECIMAL | Any spread the platform applied on top of the mid-rate |
FX Rate Determination
The system uses a tiered approach for FX rates:
FX RATE RESOLUTION
====================
Priority 1: Platform rate feed (real-time)
→ Updated every 60 seconds from a market data provider
→ Used for live bet processing
Priority 2: Cached rate (if feed is stale)
→ If the rate feed has not updated for > 5 minutes
→ Use the last known rate with an additional 0.5% spread (safety buffer)
→ Flag the conversion as STALE_RATE in the audit trail
Priority 3: Daily reference rate (if feed is down)
→ If the rate feed is completely unavailable
→ Use the day's opening reference rate with a 2% spread
→ Flag as FALLBACK_RATE
→ Alert operations team
For each conversion, the system also records:
→ The mid-market rate at the time
→ The spread applied by the platform
→ The effective rate (mid + spread)
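The tiered resolution logic reduces to a short function. A minimal sketch with the thresholds from the priority list above (the function signature and return shape are illustrative):

```python
def resolve_fx_rate(live_rate, live_age_s: float, daily_reference_rate: float):
    """Tiered FX rate resolution. Returns (mid_rate, extra_spread, flag)."""
    if live_rate is not None and live_age_s <= 300:       # feed fresh (<= 5 min)
        return live_rate, 0.0, "LIVE"
    if live_rate is not None:                             # feed stale but known
        return live_rate, 0.005, "STALE_RATE"             # +0.5% safety buffer
    return daily_reference_rate, 0.02, "FALLBACK_RATE"    # feed down: +2%, alert ops

assert resolve_fx_rate(15.8, 60, 15.7) == (15.8, 0.0, "LIVE")
assert resolve_fx_rate(15.8, 900, 15.7) == (15.8, 0.005, "STALE_RATE")
assert resolve_fx_rate(None, 0, 15.7) == (15.7, 0.02, "FALLBACK_RATE")
```

The returned flag is what gets written to the `fx_rate_source`-style audit fields, so every conversion records whether it used a live, stale, or fallback rate.
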
FX Conversion at Hedge Execution
When a Ghanaian agent's bet reaches the platform and needs hedging on Betfair:
FX CONVERSION EXAMPLE: BET FLOW
=================================
1. Kwame's punter bets GHS 500 on Arsenal to win at 2.10
→ Kwame's cascade: retains GHS 300, forwards GHS 200 to platform
2. GHS 200 arrives at the platform
→ Platform operates in USD
→ Current rate: 1 USD = 15.8 GHS
→ Conversion: GHS 200 / 15.8 = USD 12.66
→ Spread applied: 0.3% → Platform receives USD 12.62
→ Audit: conversion_id=fx_001, rate=15.8, spread=0.3%
3. Platform decides to hedge USD 6.31 on Betfair
→ Betfair operates in GBP
→ Current rate: 1 GBP = 1.27 USD
→ Conversion: USD 6.31 / 1.27 = GBP 4.97
→ Spread applied: 0.2% → Betfair receives GBP 4.96
→ Audit: conversion_id=fx_002, rate=1.27, spread=0.2%
Total FX conversions: GHS → USD → GBP (two hops)
Total FX spread cost: ~0.5% (borne by the platform, priced into the hedge margin)
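The two-hop arithmetic above can be checked with a small helper. A minimal sketch reproducing the numbers in the example (rounding to 2 decimal places per hop is an assumption about the accounting convention):

```python
def convert(amount: float, rate_divisor: float, spread: float) -> float:
    """Convert by dividing by the quoted rate (e.g. 1 USD = 15.8 GHS),
    then deduct the platform's spread."""
    return round(amount / rate_divisor * (1 - spread), 2)

# Hop 1: GHS 200 -> USD at 1 USD = 15.8 GHS, 0.3% spread
usd = convert(200, 15.8, 0.003)
assert usd == 12.62
# Hop 2: hedge half (USD 6.31) -> GBP at 1 GBP = 1.27 USD, 0.2% spread
gbp = convert(round(usd / 2, 2), 1.27, 0.002)
assert gbp == 4.96
```
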
FX Risk Accounting for Hedged Positions
Between the time a bet is placed and when it settles, exchange rates can move. This creates FX risk on hedged positions:
FX RISK SCENARIO
==================
At bet placement (Monday):
Kwame's punter bet GHS 500 at 2.10
Platform hedged GBP 4.96 on Betfair at 2.10
Rate at placement: 1 GBP = 20.08 GHS (via USD)
At settlement (Sunday, Arsenal won):
Betfair pays out: GBP 4.96 * (2.10 - 1) = GBP 5.46 profit
Rate at settlement: 1 GBP = 21.50 GHS (GHS depreciated)
Converting Betfair payout back to GHS:
GBP 5.46 * 21.50 = GHS 117.39
But the punter is owed: GHS 500 * (2.10 - 1) = GHS 550 payout
The hedge covered:
Forwarded portion's liability: GHS 220 (GHS 200 * 1.10)
Hedged half of that liability: ~GHS 110 expected at the placement rate
Betfair payout in GHS: GHS 117.39 (GBP 5.46 * 21.50)
The FX movement (GHS weakened) means the GBP payout converts
to MORE GHS than expected. In this case, FX movement HELPED.
If GHS had strengthened, the platform would receive LESS GHS from
the Betfair hedge than expected — creating an FX loss.
How FX Risk Is Managed
| Strategy | Description | When Used |
|---|---|---|
| Accept the risk | Small positions. FX movement over a few days is typically < 2%. Not worth hedging. | Default for most positions |
| Settle quickly | Minimize the time between bet placement and settlement to reduce FX exposure. | Standard practice |
| FX reserve buffer | Platform maintains a reserve buffer (typically 3% of cross-currency hedged volume) to absorb FX losses. | Always active |
| Same-day hedging | For very large cross-currency positions, hedge the FX exposure separately (buy/sell the currency pair). | Only for positions > USD 10,000 |
Settlement in Multi-Currency Scenarios
At settlement, FX conversion happens in reverse:
MULTI-CURRENCY SETTLEMENT FLOW
================================
Event settles: Arsenal wins
Step 1: Betfair settles in GBP
→ Platform receives GBP profit (or pays GBP loss)
Step 2: Convert Betfair settlement to USD (platform base currency)
→ Use settlement-time FX rate (NOT the bet-placement rate)
→ Record FX gain/loss vs expected rate
Step 3: Platform settles its retained portion in USD
Step 4: Convert platform-to-agent settlement to agent's base currency
→ Kwame's upline settlement is in GHS
→ Use settlement-time FX rate
→ Record conversion in audit trail
Step 5: Agent cascade settles in agent base currency
→ Kwame's agents all settle in GHS
→ No FX needed within the GHS hierarchy
FX Audit Report
The platform generates a daily FX reconciliation report:
DAILY FX RECONCILIATION
========================
Date: 2026-02-11
Currency Pair Volume (USD) Avg Rate FX Gain/Loss Reserve Impact
----------- ----------- --------- ----------- --------------
GHS/USD $12,450 15.82 -$145 -$145 from reserve
NGN/USD $8,300 1520.50 +$89 +$89 to reserve
KES/USD $3,200 129.40 -$12 -$12 from reserve
THB/USD $5,800 36.15 +$34 +$34 to reserve
USD/GBP $18,700 0.788 -$210 -$210 from reserve
Net FX Impact: -$244
FX Reserve: $45,000 → $44,756 (0.5% drawdown)
Walk-Through: Ghanaian Agent in Cedis, Hedge in GBP
Setup: Kwame operates in Ghana. His base currency is GHS (Ghana Cedis). He has 150 football punters who bet on Premier League matches.
The bet: Kwame's punter Kofi bets GHS 1,000 on Chelsea to win at odds 3.20.
Step 1: Cascade in GHS (local currency)
Kofi bets GHS 1,000 at 3.20
Potential win: GHS 2,200
Liability: GHS 2,200
Kwame's matrix: forward 50% for Premier League pre-match
Kwame retains: GHS 500 (liability: GHS 1,100)
Kwame forwards: GHS 500 to platform
Step 2: FX conversion at platform boundary
GHS 500 arrives at platform
Current rate: 1 USD = 15.80 GHS (mid-market)
Platform applies 0.3% spread: effective rate = 15.85 GHS per USD
Conversion: GHS 500 / 15.85 = USD 31.55
Audit: fx_rate=15.85, source=platform_feed, spread=0.3%
Step 3: Platform routing in USD
Platform receives USD 31.55
Platform retains 50%: USD 15.78
Platform hedges 50%: USD 15.77 → Betfair
Step 4: FX conversion at Betfair boundary
USD 15.77 to hedge on Betfair
Current rate: 1 GBP = 1.27 USD (mid-market)
Platform applies 0.2% spread: effective rate = 1.2726 USD per GBP
Conversion: USD 15.77 / 1.2726 = GBP 12.39
Audit: fx_rate=1.2726, source=platform_feed, spread=0.2%
Step 5: Betfair hedge execution
Place back bet on Betfair: Back Chelsea to win, GBP 12.39 at 3.20
(the platform backs the same outcome as the punter, so an exchange win funds the punter's payout)
If Chelsea wins: Betfair pays GBP 12.39 * 2.20 = GBP 27.26
If Chelsea loses: Platform pays Betfair GBP 12.39 (the stake)
Step 6: Settlement (Chelsea wins)
Betfair pays: GBP 27.26 profit
Convert to USD: GBP 27.26 * 1.28 (settlement rate) = USD 34.89
(Rate moved slightly: was 1.27, now 1.28)
FX gain: USD 34.89 vs expected USD 34.62 (at 1.27) = +USD 0.27
Platform P&L:
Retained: USD 15.78 liability → Chelsea won → Platform pays USD 34.72
Hedge recovery: USD 34.89 from Betfair
Net platform P&L: -USD 34.72 + USD 34.89 = +USD 0.17 (near zero, as expected)
Kwame's settlement in GHS:
Kwame's retained: GHS 500 stake, GHS 1,100 liability
Chelsea won → Kwame pays punter GHS 1,100
Kwame's forwarded: GHS 500 → Kwame does not bear this portion
Kwame's P&L: GHS 500 received - GHS 1,100 paid = -GHS 600
Plus: settlement from platform for forwarded portion
Platform owes Kwame: the forwarded portion's liability in GHS (GHS 1,100),
funded half from the platform's retained pocket and half from the Betfair hedge recovery
Converting the retained half: USD 34.72 * 15.90 (settlement rate) = GHS 552.05
FX difference on that leg: expected GHS 550 (at 15.85), actual GHS 552.05, gain GHS 2.05
Kofi (punter) receives: GHS 1,000 stake + GHS 2,200 profit = GHS 3,200 total return
→ All in GHS, Kofi never sees any FX conversion
Key takeaway: The punter always operates in their local currency. FX conversion is invisible to them. Agents also operate in their local currency within the hierarchy. FX only affects the platform-to-exchange boundary, and the platform absorbs FX risk as a cost of doing business.
35. Cache Race Condition Fix at Limit Boundaries (CRITICAL)
The Problem in Plain English
The existing 3-tier caching design (Section 15) has a dangerous gap. Consider this scenario: Rajesh has a cricket night limit of 10 lakh. His current exposure is 9,20,000 (92% utilized). The application LRU cache has a 5-second TTL, and within that 5-second window, 10 simultaneous bets arrive from Rajesh's punters. Each bet checks the LRU cache, sees "9,20,000 used out of 10,00,000 -- 80,000 remaining," and each bet tries to retain 20,000 of liability. If all 10 proceed, Rajesh retains 2,00,000 more -- pushing him to 11,20,000 against a 10,00,000 limit. The limit is breached by 1,20,000.
This is not theoretical. During IPL matches, a popular agent like Rajesh will receive 50+ bets per minute. At 92% utilization, every bet is potentially the one that tips over the limit.
The Safety Margin Approach
The core idea: do not use the fast cache path when you are "close enough" to the limit that a race condition could cause a breach. Define a safety margin that determines when to switch from the fast path (LRU/Redis) to the slow-but-safe path (PostgreSQL with FOR UPDATE locking).
Safety margin formula:
safety_margin = max(
fixed_minimum_margin, -- e.g., ₹50,000
average_bet_liability * expected_bets_per_ttl -- dynamic calculation
)
Where:
- fixed_minimum_margin is a per-agent configurable floor (default 50,000)
- average_bet_liability is the rolling average liability per bet for this agent in this scope (recalculated every 60 seconds)
- expected_bets_per_ttl is the rolling average bet rate multiplied by the LRU cache TTL (5 seconds)
Example calculation for Rajesh during a busy IPL night:
| Parameter | Value |
|---|---|
| Rajesh's cricket night limit | 10,00,000 |
| Average bet liability (last 60s) | 8,500 |
| Average bets per second (last 60s) | 0.8 |
| LRU cache TTL | 5 seconds |
| Expected bets per TTL | 0.8 x 5 = 4 |
| Dynamic margin | 8,500 x 4 = 34,000 |
| Fixed minimum margin | 50,000 |
| Effective safety margin | max(50,000, 34,000) = 50,000 |
| DB-path threshold | 10,00,000 - 50,000 = 9,50,000 |
This means: when Rajesh's exposure reaches 9,50,000 (95% of his limit), every subsequent bet goes through the PostgreSQL FOR UPDATE path. The safety margin absorbs the worst-case race: 4 bets in flight simultaneously, each adding 8,500, totalling 34,000 -- which is within the 50,000 margin.
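The margin formula and the path choice can be sketched directly from the table above. This is a sketch, not the production interface: the function names are ours, and in the real system the averages come from rolling counters rather than arguments.

```python
def safety_margin(fixed_minimum, avg_bet_liability, bets_per_second, cache_ttl_seconds):
    """max(fixed floor, worst-case liability that can land within one cache TTL)."""
    expected_bets_per_ttl = bets_per_second * cache_ttl_seconds
    return max(fixed_minimum, avg_bet_liability * expected_bets_per_ttl)

def choose_path(cached_exposure, limit, margin):
    """FAST path below (limit - margin); DB path (FOR UPDATE) at or above it."""
    return "FAST" if cached_exposure < limit - margin else "DB"

# Rajesh during a busy IPL night, per the table above
margin = safety_margin(50_000, 8_500, 0.8, 5)   # max(50,000, 34,000) = 50,000
```

With this margin, `choose_path` sends Rajesh's bets down the FAST path at 7,80,000 of exposure and down the DB path at 9,60,000, exactly as in the walk-throughs below.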
The Three-Path Decision Flow
Every bet follows this exact decision flow:
Path descriptions:
| Path | When Used | Latency | Correctness Guarantee |
|---|---|---|---|
| FAST PATH | Exposure is below (limit - safety_margin) in any cache tier | 1-5ms | Eventual consistency -- may briefly overshoot by up to safety_margin amount |
| DB PATH | Exposure is at or above (limit - safety_margin) in the freshest available cache | 10-25ms | Strict consistency -- FOR UPDATE lock prevents any overshoot |
Post-Write Validation and Rollback
Even with the safety margin, the FAST PATH can theoretically overshoot if the cache is stale by more than one TTL cycle (extremely rare, but possible during network partitions or Redis failures).
Post-write validation catches this:
- After the FAST PATH writes the position and updates the ledger in PostgreSQL, it reads back the committed ledger total
- If the committed total exceeds the limit, a rollback procedure fires:
  - The excess amount is calculated: overshoot = committed_total - limit
  - The most recently created position (the one that caused the overshoot) is reduced by the overshoot amount
  - The reduced amount is forwarded to the upline as overflow
  - A new overflow position is created for the upline agent
  - An audit record is created noting the post-write correction
  - An alert is fired (this indicates the safety margin may be too small)
POST-WRITE VALIDATION FLOW
============================
1. FAST PATH completes: position created, ledger updated
2. Read back: SELECT current_total FROM exposure_ledger WHERE agent=Rajesh AND scope=cricket_night
3. IF current_total <= limit → DONE (normal case, 99.9% of the time)
4. IF current_total > limit:
a. overshoot = current_total - limit
b. BEGIN TRANSACTION
c. Reduce this bet's retained amount by overshoot
d. Create overflow position at upline level for overshoot amount
e. Update upline's exposure ledger
f. Update Rajesh's exposure ledger (subtract overshoot)
g. COMMIT
h. Fire SAFETY_MARGIN_BREACH alert
i. Increase safety_margin by 50% for next 60 seconds
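Steps 3 and 4a-4c of the flow reduce to a small pure function. This is a sketch: the name is ours, the DB read-back and transaction are abstracted away, and the correction is capped at the position's own retained amount (a position cannot be reduced below zero).

```python
def post_write_validate(committed_total: int, limit: int, retained: int):
    """Return (corrected_retained, overflow_to_upline) after a FAST PATH write.

    If the committed ledger total exceeds the limit, the just-written position
    is trimmed by the overshoot -- capped at its own retained amount -- and the
    trimmed liability overflows to the upline as a new overflow position."""
    if committed_total <= limit:
        return retained, 0                      # normal case, ~99.9% of the time
    overshoot = min(committed_total - limit, retained)
    return retained - overshoot, overshoot      # caller commits the correction
```

The cap matters in bursts: if two bets overshoot back-to-back, the second may read a total more than its own retained amount above the limit, and can only give back what it retained.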
Walk Through: 10 Simultaneous Bets on Rajesh at 78% Utilization
Setup:
- Rajesh's cricket night limit: 10,00,000
- Current exposure: 7,80,000 (78%)
- Safety margin: 50,000
- DB-path threshold: 9,50,000
- 10 bets arrive within 200 milliseconds, each adding approximately 25,000 liability
Step-by-step:
TIME BET LRU CACHE SHOWS THRESHOLD PATH RESULT
====== ==== ================ ========= ======== ==================================
T+0ms B1 ₹7,80,000 ₹9,50,000 FAST Retain ₹25,000. New actual: ₹8,05,000
T+20ms B2 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹8,30,000
T+40ms B3 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹8,55,000
T+60ms B4 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹8,80,000
T+80ms B5 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹9,05,000
T+100ms B6 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹9,30,000
T+120ms B7 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹9,55,000
T+140ms B8 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹9,80,000
T+160ms B9 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹10,05,000 ← OVERSHOOT!
T+180ms B10 ₹7,80,000 (stale) ₹9,50,000 FAST Retain ₹25,000. New actual: ₹10,30,000 ← OVERSHOOT!
Wait -- the LRU cache is stale for the entire 200ms burst because TTL is 5 seconds. All 10 bets see the same cached value. But at 78%, the cached value (7,80,000) is well below the threshold (9,50,000), so all 10 take the FAST PATH.
But the post-write validation catches the problem:
- B9 finishes writing, reads back 10,05,000, detects an overshoot of 5,000
  - B9's retained amount is reduced by 5,000
  - 5,000 overflows to Vikram
  - Alert fires
- B10 finishes writing after B9's correction has committed, reads back 10,25,000, detects an overshoot of 25,000
  - B10's retained amount is reduced by 25,000 (to zero)
  - 25,000 overflows to Vikram
  - Alert fires
After all 10 bets complete:
- Rajesh's actual exposure: exactly 10,00,000 (the limit)
- Two bets had post-write corrections (B9 and B10)
- Safety margin is temporarily increased by 50% (to 75,000) for the next 60 seconds
- Total correction: 30,000 in overflow that was initially retained but corrected
- No money lost, no limit breached after correction
Now consider if Rajesh was at 96% utilization (9,60,000) instead:
All 10 bets would see the cache showing 9,60,000, which is ABOVE the threshold of 9,50,000. All 10 go through the DB PATH with FOR UPDATE locking. They serialize. Each one reads the true current value, updates it, and the moment the limit is reached, remaining bets overflow to the upline. No corrections needed. Slower (10-25ms each, serialized) but perfectly correct.
The key insight: At 78% utilization, the worst case is that the burst adds 10 bets x 25,000 = 2,50,000 before any cache refresh, pushing exposure to 10,30,000 -- a temporary overshoot of 30,000 beyond the limit. The post-write validation corrects this within milliseconds. The safety margin is designed so that the DB PATH kicks in before the overshoot becomes dangerously large. At 95%+ utilization, the DB PATH prevents any overshoot entirely.
36. Multi-Instance Cache Coherency (HIGH)
Why LRU Per-Instance Is Broken
When Hannibal runs multiple application instances behind a load balancer (which is required for horizontal scaling and high availability), the in-memory LRU cache on each instance diverges immediately.
INSTANCE 1 INSTANCE 2
LRU Cache: LRU Cache:
Rajesh exposure = ₹9,20,000 Rajesh exposure = ₹8,80,000
(updated 2 seconds ago) (updated 4 seconds ago)
REALITY (PostgreSQL):
Rajesh exposure = ₹9,45,000
Instance 1 received recent bets for Rajesh and updated its local cache. Instance 2 has an older cached value. A bet arriving at Instance 2 sees 8,80,000 and takes the FAST PATH. But the real exposure is 9,45,000 -- possibly within the safety margin zone where it should take the DB PATH.
With N instances, each maintaining an independent LRU cache with a 5-second TTL, the problem is worse than simple staleness: each instance only sees the updates applied by its own bets, so its cached value misses every update processed on the other N-1 instances within the TTL window. For popular agents during IPL, bets WILL be spread across all instances, so every instance's view is incomplete.
Recommended Approach: Redis as Effective Tier 1
The cleanest solution is to eliminate the per-instance LRU cache for exposure data and make Redis the first-tier cache for all exposure reads. Redis is shared across all instances, so there is no coherency problem.
What changes:
| Data Type | Old Architecture | New Architecture |
|---|---|---|
| Exposure counters | LRU (5s) → Redis → PostgreSQL | Redis → PostgreSQL |
| Agent config/matrix | LRU (5min) → Redis → PostgreSQL | LRU (5min) → Redis → PostgreSQL (unchanged -- config is read-heavy, write-rare) |
| NO_NEW_RISK flags | Redis | Redis (unchanged) |
| User win cap state | Redis | Redis (unchanged) |
| Period boundaries | LRU (1hr) | LRU (1hr) (unchanged -- same on all instances) |
Why this works: Exposure counters are the only data that is both write-heavy AND correctness-critical. By routing all exposure reads through Redis, every instance sees the same value. Redis reads are sub-millisecond (0.1-0.5ms), so the latency increase compared to the LRU cache (essentially zero latency) is negligible -- well within the 90ms budget.
Agent configuration, matrix rules, and period boundaries are safe to cache per-instance because they change rarely (admin actions, not bet flow) and a 5-second or 5-minute staleness window is acceptable. When they DO change, a Redis pub/sub notification invalidates all instance caches (see below).
Config Cache Invalidation via Pub/Sub
For configuration data that IS cached per-instance (matrix rules, agent limits, period configs), changes must propagate to all instances:
The pub/sub message format:
| Field | Description | Example |
|---|---|---|
| type | What changed | MATRIX_UPDATE, LIMIT_UPDATE, PERIOD_UPDATE, USER_OVERRIDE |
| agent_id | Which agent | rajesh_mumbai |
| scope | Which scope (if applicable) | cricket, mi_vs_csk_2026_03_15 |
| timestamp | When the change was made | 2026-03-15T21:34:12.456Z |
| version | New config version number | 47 |
Each instance subscribes to the config.invalidate channel on startup. When a message arrives, the instance evicts the specified entries from its LRU cache. The next request for that data causes a cache miss, which fetches the fresh value from Redis or PostgreSQL.
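The subscriber's eviction step can be kept pure and trivially testable, with the Redis wiring around it. A sketch, assuming redis-py and an illustrative cache-key shape of (type, agent_id, scope) -- the real key layout is an implementation detail:

```python
import json

def evict_for_message(lru_cache: dict, raw_message) -> None:
    """Evict the LRU entries named by a config.invalidate message."""
    msg = json.loads(raw_message)
    key = (msg["type"], msg["agent_id"], msg.get("scope"))  # illustrative key shape
    lru_cache.pop(key, None)  # next read misses, refetching from Redis/PostgreSQL

# Wiring sketch (requires a running Redis; redis-py):
#   import redis
#   p = redis.Redis().pubsub()
#   p.subscribe("config.invalidate")
#   for m in p.listen():
#       if m["type"] == "message":
#           evict_for_message(local_lru, m["data"])
```

Because eviction (not update) is the action, a lost or duplicated pub/sub message is harmless: the worst case is one extra cache miss or one stale read bounded by the LRU TTL.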
How This Interacts with the Safety Margin (Section 35)
With Redis as the effective Tier 1 for exposure data, the safety margin calculation from Section 35 becomes more accurate:
- Redis is updated after every DB write (within the same request lifecycle)
- The maximum staleness of a Redis exposure value is the time between one bet's DB write completing and the next bet's Redis read -- typically 1-5ms, not 5 seconds
- This means the safety margin can be SMALLER, because the "expected bets per TTL" is now "expected bets per 5ms" instead of "expected bets per 5 seconds"
Revised safety margin with Redis as Tier 1:
| Parameter | Old (LRU Tier 1) | New (Redis Tier 1) |
|---|---|---|
| Effective TTL for exposure | 5,000ms | ~5ms |
| Expected bets per TTL (Rajesh at 0.8/sec) | 4 | 0.004 |
| Dynamic margin | 8,500 x 4 = 34,000 | 8,500 x 0.004 = 34 |
| Effective safety margin | max(50,000, 34,000) = 50,000 | max(50,000, 34) = 50,000 |
The fixed minimum margin of 50,000 dominates in both cases, but the key insight is that with Redis as Tier 1, the FAST PATH is safe for a much wider range. The probability of a race condition breaching the safety margin drops from "possible during normal operation" to "essentially impossible unless Redis itself is partitioned."
Deployment Topology
Redis Failure Mode
If Redis becomes unavailable, the system falls back to PostgreSQL for ALL exposure reads. This increases latency (from <1ms to 5-15ms per read) but maintains correctness. The circuit breaker pattern detects Redis unavailability within 3 failed requests and switches all instances to DB-direct mode. When Redis recovers, instances resume using it after a health check confirms 3 consecutive successful reads.
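The breaker's state machine is small enough to sketch in full. Class and threshold names are ours; the thresholds (3 failures to open, 3 successful health-check reads to close) come from the paragraph above.

```python
class RedisCircuitBreaker:
    """CLOSED = Redis in use; OPEN = all exposure reads go DB-direct."""
    FAILURES_TO_OPEN = 3
    SUCCESSES_TO_CLOSE = 3

    def __init__(self):
        self.state = "CLOSED"
        self._failures = 0
        self._successes = 0

    def record_failure(self):
        """A failed Redis request; 3 in a row trips the breaker."""
        self._successes = 0
        self._failures += 1
        if self._failures >= self.FAILURES_TO_OPEN:
            self.state = "OPEN"          # switch instance to DB-direct mode

    def record_success(self):
        """A successful request or health-check read; 3 in a row while OPEN re-closes."""
        self._failures = 0
        if self.state == "OPEN":
            self._successes += 1
            if self._successes >= self.SUCCESSES_TO_CLOSE:
                self.state = "CLOSED"    # resume using Redis
                self._successes = 0
```

Each application instance holds its own breaker, so a partial Redis partition degrades only the instances that can no longer reach it.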
37. PostgreSQL Scaling Strategy (HIGH)
Projected Data Volumes for First IPL Season
An IPL season runs approximately 60 days with 70+ matches. Here are the projected volumes:
| Table | Rows per Day (Normal) | Rows per Day (IPL Peak) | Total After First Season | Row Size (avg) | Total Size |
|---|---|---|---|---|---|
| bets | 50,000 | 3,00,000 | 90,00,000 | 500 bytes | ~4.5 GB |
| positions | 1,50,000 | 9,00,000 | 2,70,00,000 | 400 bytes | ~10.8 GB |
| exposure_ledger | 5,000 (updates, not new rows) | 20,000 | 50,000 (rows, updated in place) | 200 bytes | ~10 MB |
| audit_trail | 50,000 | 3,00,000 | 90,00,000 | 2,000 bytes | ~18 GB |
| settlements | 1,50,000 | 9,00,000 | 2,70,00,000 | 300 bytes | ~8.1 GB |
| forwarding_matrix_rules | Rare writes | Rare writes | ~50,000 | 300 bytes | ~15 MB |
| TOTAL | | | | | ~41.4 GB |
The database is not enormous by modern standards, but the write contention during peak hours is the real challenge. During the IPL final, the positions table could see 150 inserts per second, and the exposure_ledger table could see 500 updates per second (because each bet updates multiple agents' ledgers).
Partitioning Strategy
Primary partition key: time (monthly range partitioning)
This is the most effective strategy because:
- Most queries are time-bounded (today's bets, this week's settlements, last month's audit trail)
- Old partitions become read-only and can be moved to cheaper storage
- Partition pruning eliminates scanning old data for real-time queries
- Individual partitions stay small enough for efficient indexing
bets table partitions:
bets_2026_01 (January 2026)
bets_2026_02 (February 2026)
bets_2026_03 (March 2026 - IPL starts)
bets_2026_04 (April 2026 - IPL peak)
bets_2026_05 (May 2026 - IPL ends)
...
positions table partitions:
positions_2026_01
positions_2026_02
...
audit_trail table partitions:
audit_trail_2026_01
audit_trail_2026_02
...
Secondary partition consideration: by agent (for very large agents)
If a single agent like Vikram (with 12 sub-agents and thousands of punters) generates disproportionate volume, the positions table can be further sub-partitioned by agent_id using hash partitioning. This is only needed if a single monthly partition exceeds 10 GB for the positions table, which is unlikely in the first season but should be planned for.
Separate Write-Optimized Store for Audit Records
The audit_trail table is append-only and write-heavy. It should be separated from the transactional tables:
| Characteristic | Transactional Tables (bets, positions, exposure_ledger) | Audit Store (audit_trail) |
|---|---|---|
| Write pattern | Insert + Update | Append-only |
| Read pattern | Point lookups, range scans by time/agent | Full-record retrieval by bet_id, range scans for disputes |
| Consistency requirement | Strong (part of bet transaction) | Eventual (can lag by up to 500ms) |
| Index requirements | Heavy (multiple indexes for lookups) | Light (bet_id primary, agent_id + time for scanning) |
Implementation: Audit records are buffered in an in-memory queue and flushed to a separate PostgreSQL schema (or a separate database instance if load warrants it) every 500ms. The audit write is NOT part of the bet placement transaction. If the audit flush fails, records are persisted to a local WAL (write-ahead log) file and retried.
The separate audit store uses:
- autovacuum_vacuum_cost_delay = 0 (aggressive vacuuming for the append-only workload)
- fillfactor = 100 (no space reserved for updates, since rows are never updated)
- Minimal indexes: only bet_id (primary), and a composite on (agent_id, created_at)
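The buffered audit writer described above reduces to a small class. This is a sketch under assumptions: the class name is ours, `flush_fn` stands in for the batch write to the audit schema, and `wal_fn` stands in for persisting to the local WAL file when that write fails.

```python
import time

class AuditBuffer:
    """Audit records buffered in memory, flushed in batches (nominally every 500ms).

    The flush is outside the bet placement transaction; if it fails, the
    batch is handed to wal_fn for local persistence and later retry."""
    def __init__(self, flush_fn, wal_fn, interval=0.5):
        self.flush_fn, self.wal_fn, self.interval = flush_fn, wal_fn, interval
        self._buffer = []
        self._last_flush = time.monotonic()

    def append(self, record: dict):
        self._buffer.append(record)           # never blocks the bet transaction
        if time.monotonic() - self._last_flush >= self.interval:
            self.flush()

    def flush(self):
        batch, self._buffer = self._buffer, []
        self._last_flush = time.monotonic()
        if not batch:
            return
        try:
            self.flush_fn(batch)              # batch insert into the audit schema
        except Exception:
            self.wal_fn(batch)                # persist locally, retry later
```

In production this would run on a timer thread rather than piggybacking on `append`, but the failure path -- WAL fallback, never data loss -- is the part that matters.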
Read Replicas for Dashboard Queries
WRITE PATH (bet processing):
App Instance → PostgreSQL Primary (positions, ledgers, bets)
READ PATH (dashboards, reports):
App Instance → PostgreSQL Read Replica 1 (real-time dashboards, exposure summary)
Reporting Service → PostgreSQL Read Replica 2 (daily P&L, weekly settlement, analytics)
Support Dashboard → PostgreSQL Read Replica 2 (dispute resolution, audit trail queries)
Replication lag tolerance:
- Dashboard queries: 1 second lag is acceptable (dashboard refreshes every 2-5 seconds anyway)
- Settlement queries: zero lag required (use primary)
- Reporting queries: 30 second lag is acceptable
Connection Pool Management
| Pool | Max Connections | Target | Purpose |
|---|---|---|---|
| bet_processing | 20 per instance x 3 instances = 60 | PostgreSQL Primary | Bet placement, exposure updates, position creation |
| settlement | 10 per instance x 1 instance = 10 | PostgreSQL Primary | Settlement processing (batch, lower concurrency) |
| dashboard_read | 15 per instance x 3 instances = 45 | Read Replica 1 | Agent dashboards, real-time queries |
| reporting_read | 10 per instance x 1 instance = 10 | Read Replica 2 | Reports, analytics, support tools |
| audit_write | 5 per instance x 3 instances = 15 | Audit DB | Audit trail flushing |
Total connections to Primary: 70 (well within PostgreSQL's default max_connections of 100, with headroom for admin connections)
PgBouncer recommendation: Place PgBouncer in front of PostgreSQL Primary in transaction pooling mode. This allows the application to open more logical connections than physical database connections, which is critical during traffic spikes.
When to Consider Event Sourcing for Audit Trail
Event sourcing (storing every state change as an immutable event rather than mutating rows) is already partially described in Section 11 for configuration changes. For the full audit trail, event sourcing should be considered when:
- The replay capability in Section 11 is used more than 10 times per week -- this indicates frequent disputes or compliance reviews, making a native event-sourced store more efficient than reconstructing state from audit records
- Audit trail queries become a performance bottleneck on the main database -- event-sourced stores (like EventStoreDB or a Kafka topic with compaction) are optimized for append and sequential read
- Regulatory requirements mandate immutable, tamper-proof audit trails -- an event store with cryptographic chaining provides stronger guarantees than a mutable PostgreSQL table
For the first IPL season: Use the PostgreSQL-based audit trail with append-only semantics and monthly partitioning. This is simpler to operate, easier to query, and sufficient for the projected volumes. Revisit event sourcing before the second season based on actual usage patterns.
38. Atomic Transaction Scaling (HIGH)
The Contention Problem
Every bet updates exposure ledgers for multiple agents atomically. Amit's bet touches Rajesh's ledger, Vikram's ledger, and the Platform's ledger -- all within a single PostgreSQL transaction. If another punter under Rajesh places a bet simultaneously, both transactions compete for a lock on Rajesh's ledger row.
With sharded counters (Section 15), the contention is reduced by N (where N is the shard count). But the cross-agent atomicity requirement means the transaction must lock shards across multiple agents, which increases the lock duration and deadlock risk.
Contention Analysis: Vikram with 12 Sub-Agents at 5 Bets/Sec Each
VIKRAM'S TRAFFIC PROFILE
==========================
Sub-agents: 12
Bets per second per sub-agent: 5
Total bets per second touching Vikram's ledger: 60
Each bet's transaction:
1. Lock sub-agent's exposure shard: ~2ms
2. Lock Vikram's exposure shard: ~2ms
3. Lock Platform's exposure shard: ~2ms
4. Write positions (3 rows): ~5ms
5. Commit: ~3ms
Total lock duration: ~14ms
With 60 bets/sec and 14ms lock duration:
Lock utilization on an UNSHARDED Vikram ledger:
60 bets/sec * 0.014s per bet = 0.84
(on average 0.84 other bets hold the lock at any moment --
contention is near-constant)
With 8 Vikram shards:
60/8 = 7.5 bets/sec per shard
7.5 * 0.014s = 0.105
(~10.5% lock utilization per shard -- acceptable)
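The arithmetic above generalizes to any agent, which is useful when deciding shard counts per the classification table below. A one-function sketch (name is ours):

```python
def lock_utilization(bets_per_second: float, lock_ms: float, shards: int = 1) -> float:
    """Average number of in-flight lock holders per shard (offered load).

    Values near or above 1.0 mean near-constant contention; ~0.1 is comfortable."""
    return (bets_per_second / shards) * (lock_ms / 1000.0)

unsharded = lock_utilization(60, 14)      # Vikram, single ledger row: 0.84
sharded = lock_utilization(60, 14, 8)     # Vikram with 8 shards: 0.105
```

Inverting the formula gives the shard count needed for a target utilization: shards = rate * lock_seconds / target, e.g. 60 * 0.014 / 0.1 ≈ 8.4, which is why 8 shards puts Vikram just at the acceptable boundary.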
The Tiered Atomicity Model
Not all agents need the same level of atomicity. The system uses three tiers:
Tier 1: Per-Level Atomicity (for hot agents like Vikram)
Instead of one cross-agent transaction, each level's ledger update is independent:
BET PROCESSING FOR HOT AGENTS
================================
Step 1: Process at Rajesh's level
BEGIN TRANSACTION
Lock Rajesh's exposure shard (random shard)
Check Rajesh's limit
Calculate Rajesh's retention vs forwarding
Write Rajesh's position
Update Rajesh's exposure shard
COMMIT
→ Output: ₹4,000 forwarded to Vikram
Step 2: Process at Vikram's level (separate transaction)
BEGIN TRANSACTION
Lock Vikram's exposure shard (random shard)
Check Vikram's limit
Calculate Vikram's retention vs forwarding
Write Vikram's position
Update Vikram's exposure shard
COMMIT
→ Output: ₹1,600 forwarded to Platform
Step 3: Process at Platform level (separate transaction)
BEGIN TRANSACTION
Lock Platform's exposure shard (random shard)
Write Platform's position
Update Platform's exposure shard
Queue hedge order
COMMIT
What happens if Step 2 fails after Step 1 succeeds?
Rajesh's position is created but Vikram's is not. The system enters a partial routing state. This is handled by:
- A routing_status field on the bet: PARTIAL (not all levels processed)
- The retry job completes the remaining levels
- If retry also fails 3 times, the bet enters the dead letter queue
- The dead letter queue handler can either complete the routing or reverse Rajesh's position
The key insight: a partial routing state is not dangerous. Rajesh has already retained his portion. The forwarded amount simply has not been allocated to Vikram yet. The exposure is "in transit" -- it is neither overcounted nor undercounted at the system level, because the total stake is still ₹10,000.
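The per-level processing with resumable retry can be sketched as follows. Everything here is illustrative: the function name, the `next_level` bookkeeping field, and the idea of passing each level's transaction as a callable are our assumptions about how the PARTIAL/retry mechanics might be expressed.

```python
def process_levels(bet: dict, level_fns: list) -> dict:
    """Run each level's transaction in order; on failure, record how far we got.

    level_fns are per-level callables (e.g. process_rajesh, process_vikram,
    process_platform -- hypothetical names), each committing its own transaction.
    A failed level leaves the bet PARTIAL; the background retry job calls this
    again and resumes from the recorded level, never re-running completed ones."""
    start = bet.get("next_level", 0)
    for i in range(start, len(level_fns)):
        try:
            level_fns[i](bet)
        except Exception:
            bet["routing_status"] = "PARTIAL"
            bet["next_level"] = i            # retry job resumes here
            return bet
    bet["routing_status"] = "COMPLETE"
    bet.pop("next_level", None)
    return bet
```

Because each completed level committed its own transaction, resuming from `next_level` is idempotent with respect to the levels already done -- which is exactly why the partial state is safe.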
Tier 2: Cross-Level Atomicity (for normal agents)
For agents with moderate traffic (under 10 bets/sec touching their ledger), the original single-transaction approach works fine. One transaction locks all relevant shards across all levels, writes all positions, and commits atomically.
Tier 3: Eventually Consistent (for the Platform level)
The Platform is the final level in every cascade. It receives the most traffic (every bet eventually reaches the Platform). The Platform's ledger can be updated asynchronously:
- The bet processing pipeline creates positions for all agent levels synchronously
- The Platform's exposure ledger is updated via an async counter increment in Redis
- A background job reconciles the Redis counter with the PostgreSQL ledger every 5 seconds
- The Platform's hedge queue reads from the Redis counter for real-time decisions
This works because the Platform has the deepest pockets and the highest limits. A 5-second lag in the Platform's exposure ledger does not create meaningful risk. The Platform's limits are set with sufficient headroom to absorb any lag.
Agent Classification for Atomicity Tier
| Agent Characteristic | Atomicity Tier | Shard Count | Reasoning |
|---|---|---|---|
| Top-level agent with 10+ sub-agents | Tier 1 (per-level) | 16 shards | Highest contention, needs maximum parallelism |
| Mid-level agent with 3-9 sub-agents | Tier 2 (cross-level) | 8 shards | Moderate contention, single transaction still viable |
| Leaf agent with direct punters only | Tier 2 (cross-level) | 4 shards | Low contention, simplest approach |
| Platform | Tier 3 (eventual) | 32 shards | Highest throughput, can tolerate lag |
Optimized Locking Strategy for Vikram
Deadlock prevention: By always processing levels in order (Level 1 first, then Level 2, then Level 3, etc.), and by using per-level transactions for hot agents, deadlocks are structurally impossible. Two bets from different sub-agents under Vikram might contend on the same Vikram shard, but they will never hold conflicting locks across levels because each level is a separate transaction.
39. Audit Trail Storage Architecture (MEDIUM)
Hot/Warm/Cold Storage Tiers
| Tier | Age of Data | Storage | Indexes | Query Latency Target | Cost |
|---|---|---|---|---|---|
| HOT | 0-7 days | PostgreSQL Primary (same DB, audit schema) | Full indexes on bet_id, agent_id, user_id, created_at, event_id | P99 < 50ms | Highest |
| WARM | 7-90 days | PostgreSQL (separate tablespace on slower disk, or read replica) | Reduced indexes: bet_id, agent_id + created_at composite only | P99 < 500ms | Medium |
| COLD | 90+ days | Compressed Parquet files on object storage (S3-compatible or local NAS) | External index in PostgreSQL (bet_id → file + offset mapping) | P99 < 5 seconds | Lowest |
Retention Policies
| Data | Hot Retention | Warm Retention | Cold Retention | Total Retention |
|---|---|---|---|---|
| Audit trail records | 7 days | 90 days | 3 years | 3 years |
| Bet records | 30 days | 1 year | 5 years | 5 years |
| Position records | 30 days | 1 year | 5 years | 5 years |
| Settlement records | 30 days | 1 year | 7 years | 7 years (regulatory) |
| Exposure ledger snapshots | 7 days (hourly snapshots) | 90 days (daily snapshots) | 1 year | 1 year |
The Append-Only Audit Store
The audit trail uses an append-only design. Records are never updated or deleted in place. Corrections or amendments are stored as new records that reference the original.
AUDIT RECORD STRUCTURE
========================
Each record contains:
- record_id (UUID, primary key)
- bet_id (UUID, indexed)
- record_type: BET_PLACED, BET_SETTLED, POSITION_CORRECTED, CONFIG_CHANGED
- agent_id (indexed)
- user_id (indexed)
- event_id (indexed -- for looking up all bets on a specific match)
- created_at (indexed, partition key)
- payload (JSONB -- the full structured audit data)
- checksum (SHA-256 hash of the payload, for tamper detection)
- previous_checksum (hash of the previous record for this bet_id, creating a chain)
The previous_checksum field creates a hash chain similar to a blockchain. Each new audit record for a given bet references the checksum of the previous record. This makes tampering detectable: if any record in the chain is modified, all subsequent checksums become invalid.
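The chaining and tamper check can be demonstrated in a few lines. This is one simple way to realize the scheme, not the production record layout: here each checksum is a SHA-256 over the previous checksum concatenated with the canonicalized payload, so the commitment to the predecessor is baked into the hash itself.

```python
import hashlib
import json

def chain_record(payload: dict, previous_checksum: str) -> dict:
    """Create an audit record whose checksum commits to payload + predecessor."""
    body = json.dumps(payload, sort_keys=True)  # canonical form, so hashes are stable
    checksum = hashlib.sha256((previous_checksum + body).encode()).hexdigest()
    return {"payload": payload, "previous_checksum": previous_checksum,
            "checksum": checksum}

def verify_chain(records: list) -> bool:
    """Any modified record invalidates its own checksum and breaks every link after it."""
    prev = "GENESIS"  # sentinel for the first record of a bet_id
    for r in records:
        body = json.dumps(r["payload"], sort_keys=True)
        if r["previous_checksum"] != prev:
            return False
        if hashlib.sha256((prev + body).encode()).hexdigest() != r["checksum"]:
            return False
        prev = r["checksum"]
    return True
```

Verification is O(n) in the number of records for a bet, so it is cheap enough to run opportunistically during dispute resolution queries.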
Indexing Strategy
Hot tier indexes (full):
| Index | Columns | Purpose |
|---|---|---|
| Primary key | record_id | Unique lookup |
| bet_lookup | bet_id | Find all audit records for a specific bet |
| agent_time | agent_id, created_at DESC | Agent's recent activity, dispute resolution |
| user_time | user_id, created_at DESC | User's betting history |
| event_lookup | event_id, created_at DESC | All bets on a specific match |
| type_time | record_type, created_at DESC | Find all settlements, all corrections, etc. |
Warm tier indexes (reduced):
| Index | Columns | Purpose |
|---|---|---|
| Primary key | record_id | Unique lookup |
| bet_lookup | bet_id | Dispute resolution (most common warm-tier query) |
| agent_time | agent_id, created_at DESC | Historical agent queries |
Cold tier indexes (external):
A separate mapping table in PostgreSQL:
| Column | Type | Description |
|---|---|---|
| bet_id | UUID | The bet to look up |
| file_path | TEXT | Path to the Parquet file in object storage |
| row_offset | INTEGER | Row position within the file |
| month | DATE | Month partition of the cold data |
Query Performance for Dispute Resolution
Common dispute queries and their targets:
| Query | Expected Latency | Tier | How |
|---|---|---|---|
| "Show me bet XYZ's full audit trail" | P99 < 50ms | Hot (if recent) | Index lookup on bet_id |
| "Show me all of Rajesh's bets last night" | P99 < 200ms | Hot | Range scan on agent_id + created_at |
| "Show me all bets on MI vs CSK final" | P99 < 500ms | Hot | Range scan on event_id |
| "Show me bet XYZ from 2 months ago" | P99 < 500ms | Warm | Index lookup on bet_id in warm partition |
| "Show me bet XYZ from last year" | P99 < 5s | Cold | Lookup mapping table, fetch from object storage |
Storage Cost Projections
| Tier | Volume After Year 1 | Storage Cost (approximate, cloud) | Total Annual |
|---|---|---|---|
| Hot (SSD, indexed) | 18 GB (7 days of audit) | $0.25/GB/month | $54/year |
| Warm (HDD, partial indexes) | 150 GB (83 days of audit) | $0.05/GB/month | $90/year |
| Cold (object storage, compressed) | 50 GB (compressed from ~200 GB) | $0.01/GB/month | $6/year |
| Total | | | ~$150/year |
Storage costs are negligible. The real cost is in compute (query processing) and IOPS (index maintenance). The tiered approach ensures that the expensive hot tier stays small while the cheap cold tier absorbs the bulk.
Tier Migration Job
A nightly job moves records between tiers:
NIGHTLY TIER MIGRATION JOB (runs at 4:00 AM IST)
==================================================
1. HOT → WARM migration:
- SELECT records WHERE created_at < NOW() - INTERVAL '7 days'
- INSERT into warm partition
- DELETE from hot partition
- Rebuild hot tier indexes (REINDEX CONCURRENTLY)
- Expected duration: 2-5 minutes
2. WARM → COLD migration:
- SELECT records WHERE created_at < NOW() - INTERVAL '90 days'
- Export to Parquet file (one file per day, compressed)
- Upload to object storage
- INSERT mapping rows into cold_index table
- DELETE from warm partition
- Expected duration: 5-15 minutes
3. COLD expiry:
- DELETE mapping rows WHERE month < NOW() - INTERVAL '3 years'
- Delete corresponding Parquet files from object storage
- Expected duration: < 1 minute
40. Horizontal Scaling for the Cascade Engine (MEDIUM)
Partitioning by Top-Level Agent Subtree
The cascade engine processes bets through agent hierarchies. The natural partition boundary is the top-level agent subtree. All agents and punters under Vikram form one subtree; all agents and punters under another master agent form a separate subtree.
Why this partitioning works:
- A bet from Amit (under Rajesh, under Vikram) ONLY touches Rajesh's and Vikram's ledgers at the agent level. It never touches Suresh's or Kumar's data. So Partition A can process independently of Partition B.
- The Platform level is the convergence point, but it uses the eventually-consistent model from Section 38 (Tier 3 atomicity), so it does not create cross-partition locking.
- Each partition can run on a separate application instance or thread pool, with its own Redis key space for exposure counters.
How Cross-Agent Detection Works Across Partitions
The syndicate detection problem from Section 13 requires cross-agent visibility. If a syndicate member bets through Rajesh (Partition A) and also through Arun (Partition B), neither partition alone can detect the correlation.
Solution: a separate detection service that reads from all partitions.
Each partition publishes a lightweight bet event to a shared Redis Stream after completing the bet. The event contains: user_id, agent_id, event_id, outcome, stake, timestamp. The cross-agent detection service consumes this stream and maintains a sliding window of recent bets, looking for correlation patterns.
The detection service does NOT participate in the bet processing pipeline. It runs asynchronously. If it detects a syndicate pattern, it publishes a flag that the relevant partitions pick up on their next bet from the flagged user.
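As a rough sketch of the detector's core loop (the 10-minute window and the two-agent threshold are illustrative assumptions; production correlation detection would be richer than this same-user check):

```typescript
// Sketch of the cross-agent detector's sliding window over the shared
// bet-event stream. Flags a user who bets the same outcome on the same
// event through more than one agent subtree within the window.
interface BetEvent {
  userId: string;
  agentId: string;
  eventId: string;
  outcome: string;
  stake: number;
  timestamp: number; // epoch ms
}

const WINDOW_MS = 10 * 60 * 1000; // assumed 10-minute sliding window

class CrossAgentDetector {
  private window: BetEvent[] = [];

  // Consume one event; return user ids now betting the same outcome
  // through two or more distinct agents.
  ingest(ev: BetEvent): string[] {
    this.window.push(ev);
    this.window = this.window.filter(e => ev.timestamp - e.timestamp <= WINDOW_MS);
    const agents = new Set(
      this.window
        .filter(e => e.userId === ev.userId && e.eventId === ev.eventId && e.outcome === ev.outcome)
        .map(e => e.agentId),
    );
    return agents.size >= 2 ? [ev.userId] : [];
  }
}
```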
Load Balancing Strategy
| Strategy | How It Works | When to Use |
|---|---|---|
| Agent-affinity routing | All bets for a given top-level subtree go to the same instance | Default strategy. Load balancer uses a consistent hash of the top-level agent_id |
| Overflow routing | If the assigned instance is overloaded (queue depth > threshold), bets overflow to any available instance | During traffic spikes on a single subtree (e.g., Vikram's agents during IPL final) |
| Hot-agent splitting | A single subtree is split across 2+ instances, with sub-agents assigned to different instances | When a single master agent's traffic exceeds one instance's capacity |
The load balancer (nginx or application-level) maintains a routing table:
ROUTING TABLE
===============
Vikram subtree → Instance 1 (primary), Instance 2 (overflow)
Suresh subtree → Instance 2 (primary), Instance 3 (overflow)
Kumar subtree → Instance 3 (primary), Instance 1 (overflow)
Platform cascade → All instances (round-robin, eventually consistent)
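The agent-affinity strategy can be sketched as a hash-then-overflow lookup (a simplified stand-in for a full consistent-hash ring; the instance names and queue-depth threshold are illustrative):

```typescript
import { createHash } from "node:crypto";

// Sketch of agent-affinity routing with overflow: a hash of the
// top-level agent id picks a primary instance; if its queue is too
// deep, the bet overflows to the next instance.
const INSTANCES = ["instance-1", "instance-2", "instance-3"];
const OVERFLOW_QUEUE_DEPTH = 1000; // assumed threshold

function hashToIndex(key: string, buckets: number): number {
  const digest = createHash("sha256").update(key).digest();
  return digest.readUInt32BE(0) % buckets;
}

function route(topLevelAgentId: string, queueDepth: (instance: string) => number): string {
  const primary = hashToIndex(topLevelAgentId, INSTANCES.length);
  for (let i = 0; i < INSTANCES.length; i++) {
    const candidate = INSTANCES[(primary + i) % INSTANCES.length];
    if (queueDepth(candidate) < OVERFLOW_QUEUE_DEPTH) return candidate;
  }
  return INSTANCES[primary]; // all overloaded: keep affinity
}
```

Because the hash is stable, a subtree always lands on the same primary instance, which keeps its exposure counters hot in one Redis key space.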
Handling Agent Hierarchy Changes That Cross Partitions
When an agent moves from one master agent to another (e.g., Rajesh leaves Vikram and joins Suresh), the partition assignment changes:
- Admin initiates the transfer via API
- System sets Rajesh's status to TRANSFERRING -- no new bets accepted for Rajesh's punters (brief pause, typically 2-5 seconds)
- All in-flight bets for Rajesh's punters complete on the old partition
- Rajesh's exposure ledger state is frozen and serialized
- Routing table is updated: Rajesh's punters now route to Suresh's partition
- Rajesh's exposure state is loaded into the new partition
- Rajesh's status is set to ACTIVE on the new partition
- New bets resume
The transfer window (2-5 seconds of paused betting) is acceptable because hierarchy changes are rare admin operations, not real-time events. During the transfer, punters see "placing bet..." for a few extra seconds rather than an error.
Deployment Diagram
┌─────────────────────┐
│ Load Balancer │
│ (Agent-Affinity │
│ Consistent Hash) │
└─────────┬────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────────▼─────────┐ ┌─────────▼─────────┐ ┌─────────▼─────────┐
│ Instance 1 │ │ Instance 2 │ │ Instance 3 │
│ Vikram subtree │ │ Suresh subtree │ │ Kumar subtree │
│ + overflow for │ │ + overflow for │ │ + overflow for │
│ Kumar │ │ Vikram │ │ Suresh │
│ │ │ │ │ │
│ Cascade Engine │ │ Cascade Engine │ │ Cascade Engine │
│ Matrix Resolver │ │ Matrix Resolver │ │ Matrix Resolver │
│ Limit Checker │ │ Limit Checker │ │ Limit Checker │
└────────┬───────────┘ └────────┬───────────┘ └────────┬───────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌───────▼──┐ ┌──────▼───┐ ┌─────▼────┐
│ Redis │ │ PG │ │ Cross- │
│ Cluster │ │ Primary │ │ Agent │
│ (shared) │ │ + Replicas│ │ Detector │
└──────────┘ └──────────┘ └──────────┘
41. Monitoring and Alerting System (MEDIUM)
Key Metrics per Pipeline Stage
Bet Processing Pipeline:
| Stage | Metric | Collection Method | Alert Threshold |
|---|---|---|---|
| Request ingestion | bet.requests_per_second | Counter, per instance | > 200/sec (approaching capacity) |
| Request ingestion | bet.request_parse_error_rate | Counter | > 1% of requests |
| Validation | bet.validation_failure_rate | Counter | > 10% (potential attack) |
| User win cap check | bet.win_cap_latency_p99 | Histogram | > 15ms |
| Stake reduction | bet.stake_reduction_rate | Counter | > 20% of bets (limits may be too low) |
| Matrix resolution | bet.matrix_resolve_latency_p99 | Histogram | > 20ms |
| Matrix resolution | bet.matrix_cache_miss_rate | Counter | > 30% (cache issue) |
| Agent cap check | bet.cap_check_latency_p99 | Histogram | > 30ms per level |
| Exposure ledger | bet.exposure_update_latency_p99 | Histogram | > 25ms |
| Position creation | bet.position_write_latency_p99 | Histogram | > 20ms |
| Audit write | bet.audit_write_latency_p99 | Histogram | > 15ms |
| End-to-end | bet.total_latency_p99 | Histogram | > 90ms (SLA breach) |
| End-to-end | bet.total_latency_p50 | Histogram | > 40ms (performance degradation) |
| End-to-end | bet.success_rate | Counter | < 99.5% |
Hedge Execution Pipeline:
| Metric | Collection Method | Alert Threshold |
|---|---|---|
| hedge.queue_depth | Gauge | > 50 orders (backlog building) |
| hedge.execution_latency_p99 | Histogram | > 2 seconds |
| hedge.betfair_api_latency_p99 | Histogram | > 1 second |
| hedge.partial_fill_rate | Counter | > 40% (liquidity problem) |
| hedge.unhedged_exposure_total | Gauge | > 10 lakh (risk accumulation) |
| hedge.betfair_error_rate | Counter | > 5% (API degradation) |
| hedge.slippage_average | Gauge | > 0.05 (pricing problem) |
Settlement Pipeline:
| Metric | Collection Method | Alert Threshold |
|---|---|---|
| settlement.latency_p99 | Histogram | > 30 seconds per event |
| settlement.failure_rate | Counter | > 0.1% (any settlement failure is serious) |
| settlement.idempotency_collision_rate | Counter | > 0 (should be zero in normal operation) |
| settlement.reconciliation_drift | Gauge | > ₹1,000 (ledger mismatch) |
Infrastructure Metrics:
| Metric | Alert Threshold |
|---|---|
| redis.latency_p99 | > 5ms |
| redis.memory_usage_percent | > 80% |
| redis.connection_pool_exhaustion | > 90% |
| postgres.active_connections | > 80% of max |
| postgres.replication_lag_seconds | > 5 seconds |
| postgres.lock_wait_time_p99 | > 100ms |
| postgres.dead_tuples_ratio | > 20% (vacuum falling behind) |
Alert Thresholds and Escalation Paths
ESCALATION MATRIX
==================
P1 - CRITICAL (immediate response required)
Who: On-call engineer (PagerDuty) + Engineering lead
When: 24/7
SLA: Acknowledge within 5 minutes, resolve within 30 minutes
Examples:
- bet.total_latency_p99 > 200ms for 2 minutes
- bet.success_rate < 95% for 1 minute
- settlement.failure_rate > 1% for any settlement batch
- hedge.unhedged_exposure_total > 50 lakh
- postgres primary down or unreachable
- redis primary down or unreachable
P2 - HIGH (response within 1 hour)
Who: On-call engineer (Slack + PagerDuty)
When: Business hours + match hours
SLA: Acknowledge within 15 minutes, resolve within 2 hours
Examples:
- bet.total_latency_p99 > 90ms for 5 minutes
- bet.matrix_cache_miss_rate > 50% for 5 minutes
- hedge.betfair_api_latency_p99 > 2 seconds for 5 minutes
- settlement.reconciliation_drift > ₹10,000
- postgres.replication_lag_seconds > 30
P3 - MEDIUM (next business day)
Who: Engineering team (Slack channel)
When: Business hours
SLA: Acknowledge within 4 hours, resolve within 24 hours
Examples:
- bet.stake_reduction_rate > 30% for 1 hour
- redis.memory_usage_percent > 80%
- postgres.dead_tuples_ratio > 20%
- audit.write_lag > 5 seconds
P4 - LOW (weekly review)
Who: Engineering team (weekly metrics review)
Examples:
- bet.total_latency_p50 trending upward over 7 days
- hedge.slippage_average trending upward over 7 days
- storage utilization approaching 70%
Dashboard Design for Ops Team
Dashboard 1: Real-Time Operations (primary display during matches)
┌─────────────────────────────────────────────────────────────────────────┐
│ HANNIBAL OPS DASHBOARD 2026-03-15 21:47 │
├─────────────────────────┬───────────────────────────────────────────────┤
│ SYSTEM HEALTH │ BET THROUGHPUT (last 5 min) │
│ │ │
│ Bet Pipeline: 🟢 OK │ ████████████████████░░ 167/sec │
│ Hedge Engine: 🟢 OK │ Peak today: 203/sec (21:32) │
│ Settlement: 🟢 OK │ P99 latency: 72ms │
│ Redis: 🟢 OK │ Success: 99.92% │
│ PostgreSQL: 🟢 OK │ │
│ Betfair API: 🟡 SLOW │ Error breakdown: │
│ │ Validation: 12/min │
│ │ Timeout: 0/min │
│ │ DB Error: 0/min │
├─────────────────────────┼───────────────────────────────────────────────┤
│ EXPOSURE BY AGENT │ HEDGE STATUS │
│ (top 10 by %) │ │
│ │ Queue depth: 3 orders │
│ 1. Rajesh 76% ████░ │ Unhedged total: ₹2.1L │
│ 2. Vikram 54% ███░░ │ Betfair latency: 850ms ⚠ │
│ 3. Priya 41% ██░░░ │ Fill rate: 94% │
│ 4. Sanjay 38% ██░░░ │ Avg slippage: 0.02 │
│ 5. Arun 22% █░░░░ │ │
│ │ Last 10 hedges: │
│ NO_NEW_RISK active: 0 │ 21:45 MI 1.85 → filled 1.86 ✓ │
│ │ 21:44 CSK 2.10 → filled 2.10 ✓ │
│ │ 21:43 Draw 3.50 → partial 60% ⚠ │
├─────────────────────────┴───────────────────────────────────────────────┤
│ ACTIVE ALERTS │
│ │
│ ⚠ 21:45 Betfair API latency elevated (850ms, threshold 500ms) │
│ Status: Auto-monitoring, no action needed yet │
│ │
│ ✓ 21:30 Rajesh night limit at 84% - INFO (auto-resolved) │
│ │
│ [View All Alerts] [Silence Non-Critical] [Run Health Check] │
└─────────────────────────────────────────────────────────────────────────┘
Dashboard 2: Reconciliation & Financial (daily review)
Dashboard 3: Agent Health (support team view)
Specific Alert Definitions
Cache miss rate spike:
| Field | Value |
|---|---|
| Alert name | cache_miss_rate_spike |
| Metric | bet.matrix_cache_miss_rate OR bet.exposure_cache_miss_rate |
| Condition | > 50% for 3 consecutive minutes |
| Severity | P2 |
| Probable cause | Redis memory pressure, network partition, config invalidation storm |
| Auto-mitigation | None (requires investigation) |
| Run book action | Check Redis memory, check recent config change frequency, verify pub/sub connectivity |
Betfair API latency:
| Field | Value |
|---|---|
| Alert name | betfair_api_degraded |
| Metric | hedge.betfair_api_latency_p99 |
| Condition | > 1 second for 2 consecutive minutes |
| Severity | P2 (escalate to P1 if > 5 seconds for 5 minutes) |
| Probable cause | Betfair infrastructure issue, network routing, API rate limiting |
| Auto-mitigation | Increase hedge retry delay, reduce concurrent hedge requests |
| Run book action | Check Betfair status page, check outbound network, verify API key validity |
Exposure ledger drift:
| Field | Value |
|---|---|
| Alert name | exposure_ledger_drift |
| Metric | settlement.reconciliation_drift |
| Condition | > ₹1,000 for any agent |
| Severity | P1 (financial accuracy issue) |
| Probable cause | Race condition in ledger update, missed settlement, partial transaction commit |
| Auto-mitigation | Trigger immediate recompute for the affected agent |
| Run book action | Run manual recompute, compare with position sum, identify the divergent bet |
Settlement failure rate:
| Field | Value |
|---|---|
| Alert name | settlement_failure_elevated |
| Metric | settlement.failure_rate |
| Condition | > 0.1% for any settlement batch |
| Severity | P1 |
| Probable cause | DB connection exhaustion, data inconsistency, event result ambiguity |
| Auto-mitigation | Retry failed settlements 3 times with exponential backoff |
| Run book action | Check DB connections, inspect failed settlement IDs, verify event results |
Run Book Topics
| Topic | What to Do |
|---|---|
| Redis primary down | System auto-falls back to PostgreSQL for all reads. Monitor bet latency (will increase to 10-20ms from <1ms). Restart Redis. After restart, run redis-warmup script to repopulate exposure counters from PostgreSQL. |
| PostgreSQL primary down | Critical outage. All bet placement fails. Switch to read replica as emergency primary (manual failover). Accept data loss risk for last few seconds of unreplicated writes. After recovery, reconcile. |
| Bet latency spike (P99 > 200ms) | Check PostgreSQL lock wait times. If elevated, identify the hot agent (likely the one with the most bets/sec) and increase their shard count temporarily. Check Redis latency. Check for long-running queries on the primary. |
| Betfair completely unreachable | Hedge queue will grow. Platform absorbs all hedge-intended risk. Monitor hedge.unhedged_exposure_total. If it exceeds 50 lakh, consider temporarily increasing all agents' forward percentages to reduce platform risk. |
| Single agent's exposure ledger diverges from position sum | Run reconciliation recompute --agent=AGENT_ID. Compare the recomputed total with the current ledger value. If they differ, the ledger is stale. Update the ledger to match the position sum. Investigate the root cause (check for missed settlement, partial commit). |
| Configuration change not propagating | Check Redis pub/sub channel. Verify all instances are subscribed. Manually trigger a cache flush on all instances via admin API endpoint /admin/cache/flush?agent_id=X. |
| Surge of stake reductions | Indicates user win limits are being hit frequently. Check if a single user is hammering the system (potential abuse). Check if limits were accidentally lowered. Review the agent's win cap configuration. |
42. Reconciliation System (HIGH)
What Is Reconciled
The reconciliation system verifies that the exposure ledger (the fast-access counter that the bet processing pipeline reads) matches the actual sum of open positions. These are two independent sources of the same truth, and they can drift apart due to bugs, partial commits, or race conditions.
| Reconciliation Check | Source A (Expected) | Source B (Actual) | Acceptable Drift |
|---|---|---|---|
| Agent retained exposure | exposure_ledger.retained_open_liability | SUM of positions.liability WHERE agent=X AND status=OPEN AND type=RETAINED | ₹0 (zero tolerance) |
| Agent forwarded exposure | exposure_ledger.forwarded_open_liability | SUM of positions.liability WHERE agent=X AND status=OPEN AND type=FORWARDED | ₹0 (zero tolerance) |
| Agent potential win | exposure_ledger.open_potential_win | SUM of positions.potential_win WHERE agent=X AND status=OPEN | ₹0 (zero tolerance) |
| Stake conservation | Original bet stake | SUM of all positions for that bet (retained + forwarded across all levels) | ₹0 (absolute conservation) |
| Settlement completeness | Count of positions for settled event | Count of settlement records for that event | 0 (all positions must be settled) |
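Because amounts are kept in integer paisa, the stake-conservation check reduces to exact equality with zero tolerance. A minimal sketch (the function name is illustrative):

```typescript
// Sketch of the stake-conservation check from the table above: the
// retained amounts at every level of the hierarchy (including the
// platform's terminal share) must sum exactly to the original stake.
// Integer paisa means no floating-point drift and zero tolerance.
function stakeConserved(originalStakePaisa: number, retainedPaisaByLevel: number[]): boolean {
  const total = retainedPaisaByLevel.reduce((sum, p) => sum + p, 0);
  return total === originalStakePaisa;
}
```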
The Reconciliation Job Workflow
How Discrepancies Are Flagged and Categorized
Each discrepancy record contains:
| Field | Description | Example |
|---|---|---|
| discrepancy_id | Unique identifier | disc_a1b2c3 |
| agent_id | Affected agent | rajesh_mumbai |
| scope | Which scope diverged | cricket_night_2026_03_15 |
| ledger_value | What the exposure ledger says | ₹15,00,000 |
| computed_value | What the position sum says | ₹13,50,000 |
| drift_amount | The difference | ₹1,50,000 |
| drift_direction | LEDGER_HIGH or LEDGER_LOW | LEDGER_HIGH |
| detected_at | When it was found | 2026-03-15T22:00:00Z |
| detection_method | Which reconciliation job found it | SCHEDULED_15MIN |
| category | MINOR, MAJOR, CRITICAL | CRITICAL |
| resolution_status | OPEN, INVESTIGATING, RESOLVED, AUTO_CORRECTED | OPEN |
| root_cause | Filled in during investigation | Partial commit on bet_xyz at 21:47 |
The Manual Recompute Tool
The recompute tool is the primary remediation mechanism. It reconstructs the exposure ledger value from scratch by summing all open positions.
RECOMPUTE PROCEDURE
=====================
Command: reconciliation recompute --agent=rajesh_mumbai --scope=cricket
Step 1: Acquire advisory lock on agent+scope (prevents concurrent bets from modifying positions)
Step 2: SELECT SUM(liability) as retained FROM positions
WHERE agent_id='rajesh_mumbai' AND sport='cricket'
AND status='OPEN' AND position_type='RETAINED'
Step 3: SELECT SUM(liability) as forwarded FROM positions
WHERE agent_id='rajesh_mumbai' AND sport='cricket'
AND status='OPEN' AND position_type='FORWARDED'
Step 4: SELECT SUM(potential_win) as potential_win FROM positions
WHERE agent_id='rajesh_mumbai' AND sport='cricket'
AND status='OPEN'
Step 5: Compare with current ledger values
Step 6: If different:
UPDATE exposure_ledger SET
retained_open_liability = [computed retained],
forwarded_open_liability = [computed forwarded],
open_potential_win = [computed potential_win]
WHERE agent_id='rajesh_mumbai' AND scope='cricket'
Step 7: Log the correction with before/after values
Step 8: Release advisory lock
Step 9: Update Redis with new ledger values
Duration: 1-5 seconds per agent per scope
Impact: Agent's bets are briefly delayed (advisory lock held), not rejected
Tracking Drift Over Time to Detect Systemic Bugs
Every reconciliation result is stored in a time-series table:
| Column | Type | Description |
|---|---|---|
| check_id | UUID | Unique identifier |
| agent_id | TEXT | Agent |
| scope | TEXT | Scope (sport, market, period) |
| checked_at | TIMESTAMP | When the check ran |
| ledger_value | BIGINT | Ledger amount in paisa |
| computed_value | BIGINT | Position sum in paisa |
| drift_amount | BIGINT | Difference in paisa |
| drift_direction | TEXT | ZERO, LEDGER_HIGH, LEDGER_LOW |
A weekly analysis job examines this table for patterns:
- Same agent drifting repeatedly: Indicates a bug in that agent's specific configuration or traffic pattern
- All agents drifting in the same direction: Indicates a systemic bug in the exposure update logic
- Drift correlating with high traffic periods: Indicates a race condition that manifests under load
- Drift correlating with settlement batches: Indicates a bug in the settlement ledger decrement logic
Walk Through: Rajesh's Ledger Shows 15 Lakh, Actual Positions Sum to 13.5 Lakh
Situation: The scheduled 15-minute reconciliation job runs at 10:00 PM. It finds that Rajesh's cricket retained_open_liability ledger reads ₹15,00,000, but the sum of all his open retained cricket positions is only ₹13,50,000. The drift is ₹1,50,000 (LEDGER_HIGH, CRITICAL category).
What happened (likely root cause):
At 9:47 PM, a settlement batch processed the MI vs RR match. Rajesh had ₹1,50,000 of retained positions on that match. The settlement correctly set those positions to status=SETTLED, but the ledger decrement failed (perhaps due to a transient database connection error). The settlement service logged the error and moved on. The positions are settled, but the ledger still thinks they are open.
Investigation steps:
INVESTIGATION LOG
==================
10:00 PM - Reconciliation detects drift: ₹15L ledger vs ₹13.5L positions
10:00 PM - P1 alert fired. Rajesh switched to DB-PATH only
10:02 PM - On-call engineer acknowledges
10:03 PM - Engineer runs: reconciliation investigate --agent=rajesh_mumbai --scope=cricket
Output: "Ledger is ₹1,50,000 higher than position sum.
Last settlement batch at 9:47 PM settled 12 positions for MI vs RR.
Settlement records exist for all 12 positions.
Ledger decrement for MI vs RR settlement: NOT FOUND.
Root cause: Settlement decremented positions but failed to decrement ledger."
10:05 PM - Engineer runs: reconciliation recompute --agent=rajesh_mumbai --scope=cricket
Output: "Ledger updated from ₹15,00,000 to ₹13,50,000.
Correction: -₹1,50,000.
Audit record created: recompute_abc123.
Redis updated."
10:06 PM - Engineer verifies Rajesh's dashboard shows correct exposure
10:06 PM - Rajesh switched back from DB-PATH only to normal caching
10:07 PM - P1 alert resolved with root cause documented
10:08 PM - Bug ticket created: "Settlement ledger decrement lacks retry logic.
When the DB connection fails during decrement, the error is logged
but the decrement is not retried. Fix: add retry with 3 attempts
and dead letter queue for persistent failures."
43. Hedge Execution Engine (CRITICAL)
Design Overview
The hedge execution engine is responsible for placing bets on Betfair to offset the platform's retained risk. It is a separate service that consumes a hedge order queue and executes against the Betfair API.
Limit Order Placement with Configurable Max Slippage
Every hedge order is placed as a limit order on Betfair, not a market order. This prevents the platform from being filled at arbitrarily bad prices during volatile moments.
| Parameter | Description | Default | Configurable Per |
|---|---|---|---|
| max_slippage_ticks | Maximum number of price ticks worse than the target price that the system will accept | 3 ticks | Per sport, per market type |
| target_price | The price at which the punter's bet was accepted | From bet record | Per bet |
| limit_price | The worst price the system will accept: target_price + max_slippage_ticks | Computed | Computed |
| order_size | Amount to hedge in GBP equivalent | From cascade output | Per bet |
| time_in_force | How long the order stays active before cancellation | 30 seconds for pre-match, 10 seconds for in-play | Per event phase |
Example: Amit's bet on MI at 1.85, platform needs to hedge ₹800 (approx £7.50).
Target price: 1.85 (back MI)
Max slippage: 3 ticks
Betfair price ladder: 1.85, 1.86, 1.87, 1.88, 1.89, 1.90 ...
Limit price: 1.88 (3 ticks worse than 1.85)
Order: BACK MI at 1.88 or better, size £7.50
Time in force: 30 seconds (pre-match)
If the best available price on Betfair is 1.86, the order fills at 1.86 (within slippage). If the best available price is 1.92, the order sits in the market at 1.88 for 30 seconds. If not filled, it is cancelled and re-evaluated.
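A sketch of the limit-price computation (the tick increments below follow Betfair's published price-increment bands; the function names are illustrative):

```typescript
// Sketch of limit-price computation on Betfair's price ladder.
// Each band [lower, upper) has a fixed tick increment; addTicks-style
// walking crosses band boundaries where needed.
const BANDS: Array<[number, number, number]> = [
  [1.01, 2, 0.01], [2, 3, 0.02], [3, 4, 0.05], [4, 6, 0.1],
  [6, 10, 0.2], [10, 20, 0.5], [20, 30, 1], [30, 50, 2],
  [50, 100, 5], [100, 1000, 10],
];

function nextTick(price: number): number {
  const band = BANDS.find(([lo, up]) => price >= lo && price < up);
  if (!band) throw new RangeError(`price ${price} is outside the ladder`);
  // Round to 2 decimals to avoid floating-point drift between ticks.
  return Math.round((price + band[2]) * 100) / 100;
}

function limitPrice(targetPrice: number, maxSlippageTicks: number): number {
  let p = targetPrice;
  for (let i = 0; i < maxSlippageTicks; i++) p = nextTick(p);
  return p;
}
```

With the worked example above, `limitPrice(1.85, 3)` yields 1.88, matching the order the engine places.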
Partial Fill Tracking and Re-Pricing Strategy
Betfair often provides partial liquidity. The hedge engine must track partial fills and decide whether to pursue the remainder.
PARTIAL FILL EXAMPLE
=====================
Hedge order: BACK MI £100 at limit 1.88
Betfair response: Filled £60 at 1.86, £40 unmatched
State after partial fill:
Hedged: £60 at 1.86
Unhedged: £40 at limit 1.88 (still in market)
After 10 seconds (in-play time_in_force):
Remaining £40 still unmatched
Decision tree:
1. Current best price on Betfair: 1.91
2. 1.91 > limit price 1.88 → cannot fill at current prices
3. Is £40 worth re-pricing? £40 > £5 threshold → YES
4. New limit price: 1.91 + 2 ticks = 1.93 (allow MORE slippage for the remainder)
5. Place new order: BACK MI £40 at 1.93
6. If this also partially fills or expires, repeat up to max_reprice_attempts (3)
7. After 3 re-price attempts, accept the unhedged remainder as platform risk
Re-pricing rules:
| Attempt | Slippage Allowed | Time in Force | Rationale |
|---|---|---|---|
| Initial | 3 ticks | 30s pre-match / 10s in-play | Conservative first attempt |
| Re-price 1 | 5 ticks from current market | 20s pre-match / 5s in-play | More aggressive, shorter wait |
| Re-price 2 | 8 ticks from current market | 10s pre-match / 3s in-play | Even more aggressive |
| Re-price 3 (final) | Market order equivalent (1000 ticks) | 5s | Last resort -- get filled at any price |
| After all attempts | N/A | N/A | Accept as unhedged platform risk |
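The schedule above can be encoded as a lookup table. A minimal sketch (field names are illustrative; the final attempt's 5-second time-in-force is applied to both phases here):

```typescript
// Sketch of the re-pricing schedule: each attempt widens slippage and
// shortens the wait; exhausting the schedule means the remainder is
// accepted as unhedged platform risk.
interface RepriceStep {
  slippageTicks: number;
  timeInForceSec: { preMatch: number; inPlay: number };
}

const REPRICE_SCHEDULE: RepriceStep[] = [
  { slippageTicks: 3,    timeInForceSec: { preMatch: 30, inPlay: 10 } }, // initial
  { slippageTicks: 5,    timeInForceSec: { preMatch: 20, inPlay: 5 } },  // re-price 1
  { slippageTicks: 8,    timeInForceSec: { preMatch: 10, inPlay: 3 } },  // re-price 2
  { slippageTicks: 1000, timeInForceSec: { preMatch: 5,  inPlay: 5 } },  // re-price 3: market-order equivalent
];

// attempt 0 = initial placement; null means attempts are exhausted and
// the caller books the remainder into the unhedged exposure tracker.
function repriceStep(attempt: number): RepriceStep | null {
  return attempt < REPRICE_SCHEDULE.length ? REPRICE_SCHEDULE[attempt] : null;
}
```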
Execution Quality Reporting
Every hedge order produces an execution quality record:
| Field | Description | Example |
|---|---|---|
| hedge_order_id | Unique identifier | hedge_x1y2z3 |
| bet_id | The originating bet | bet_a1b2c3 |
| target_price | Price the punter received | 1.85 |
| achieved_price | Weighted average fill price | 1.87 |
| slippage | achieved_price - target_price | 0.02 |
| fill_rate | Percentage of order filled | 85% |
| fill_time_ms | Time from order placement to full fill | 4,200ms |
| reprice_count | How many times the order was re-priced | 1 |
| unhedged_amount | Amount left unhedged | £6.25 |
Daily execution quality report:
HEDGE EXECUTION QUALITY - March 15, 2026
==========================================
Total hedge orders: 342
Fully filled: 287 (83.9%)
Partially filled: 41 (12.0%)
Unfilled (unhedged): 14 (4.1%)
Average slippage: 0.024 (2.4 ticks)
Worst slippage: 0.08 (8 ticks) -- KKR vs PBKS in-play
Best execution: -0.01 (better than target, market moved in our favor)
Total intended hedge: £12,400
Total actually hedged: £11,650 (93.9%)
Total unhedged: £750 (6.1%)
Slippage cost (vs perfect execution): £48.20
Unhedged Exposure Tracker
The unhedged exposure tracker maintains a real-time view of exposure that SHOULD be hedged but is NOT hedged, separate from deliberately retained risk.
UNHEDGED EXPOSURE DASHBOARD
==============================
Total unhedged: ₹2,34,000
By reason:
Betfair no liquidity: ₹1,20,000 (3 orders)
Betfair API timeout: ₹45,000 (1 order, retrying)
Slippage exceeded: ₹69,000 (2 orders, re-pricing)
By event:
MI vs CSK (Live): ₹1,65,000 ⚠ (largest single event)
RCB vs DC (Pre): ₹42,000
KKR vs SRH (Pre): ₹27,000
Aging:
< 1 minute: ₹45,000 (still in progress)
1-5 minutes: ₹69,000 (re-pricing)
5-30 minutes: ₹1,20,000 ⚠ (no liquidity, monitor)
> 30 minutes: ₹0
Alert threshold: ₹10,00,000 (₹10 lakh)
Current status: 🟢 Well below threshold
Stale Hedge Cleanup Process
Hedge orders that are older than a configurable threshold without being filled are considered stale. The cleanup process runs every 60 seconds:
STALE HEDGE CLEANUP (runs every 60 seconds)
=============================================
1. Find all hedge orders WHERE:
status = 'PENDING' or 'PARTIALLY_FILLED'
AND created_at < NOW() - stale_threshold
Stale thresholds:
Pre-match events: 5 minutes
In-play events: 60 seconds
Settled events: Immediate (hedge is pointless)
2. For each stale order:
a. Cancel the order on Betfair (if still open)
b. Record the partial fill amount (if any)
c. Mark order as STALE_CANCELLED
d. Move the unhedged amount to the unhedged exposure tracker
e. Fire alert if total unhedged exceeds threshold
3. For orders on settled events:
a. Cancel immediately
b. The platform already bears the outcome as retained risk
c. No further action needed
Queue Management for Hedge Orders During High Volume
During peak betting (IPL finals, 167 bets/sec), the hedge queue can receive 50+ orders per second (assuming ~30% of stake reaches the platform and ~50% of that is hedge-targeted).
| Queue Parameter | Value | Rationale |
|---|---|---|
| Queue technology | Redis Stream | Persistent, supports consumer groups, at-least-once delivery |
| Consumer count | 4 concurrent consumers | Betfair API allows 5 req/sec per app key; 4 consumers with rate limiting |
| Rate limit | 5 orders per second to Betfair | Betfair API throttle limit |
| Priority | In-play hedges prioritized over pre-match | In-play prices change faster; pre-match can wait |
| Batching | Aggregate multiple small hedges on the same selection into one order | ₹800 + ₹600 + ₹400 on same MI back → single £18 order |
| Max queue depth | 200 orders | If exceeded, temporarily increase max_slippage and use more aggressive pricing |
| Deduplication | By bet_id + event_id + selection | Prevent double-hedging from retry logic |
Failover When Betfair Is Slow or Down
Health check: The hedge engine pings Betfair every 5 seconds with a lightweight listMarketBook call. Three consecutive failures (15 seconds) transition the status to DOWN. From DOWN, a single successful ping moves the status to DEGRADED; three consecutive successes restore HEALTHY.
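One reading of this state machine, sketched in TypeScript (whether the first success out of DOWN also counts toward the three needed for HEALTHY is an assumption here):

```typescript
// Sketch of the Betfair health-check state machine: three consecutive
// failed pings mark the API DOWN; one success from DOWN moves to
// DEGRADED; three consecutive successes restore HEALTHY.
type Health = "HEALTHY" | "DEGRADED" | "DOWN";

class BetfairHealth {
  state: Health = "HEALTHY";
  private failures = 0;
  private successes = 0;

  record(pingOk: boolean): Health {
    if (pingOk) {
      this.failures = 0;
      this.successes++;
      if (this.state === "DOWN") this.state = "DEGRADED";
      if (this.state === "DEGRADED" && this.successes >= 3) this.state = "HEALTHY";
    } else {
      this.successes = 0;
      this.failures++;
      if (this.failures >= 3) this.state = "DOWN";
    }
    return this.state;
  }
}
```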
Walk Through: Platform Needs to Hedge 5 Lakh on MI at 1.85, Only 2 Lakh Liquidity at 1.90
Scenario: Multiple bets have cascaded through the hierarchy. The platform needs to hedge ₹5,00,000 (approximately £4,700) on MI to win. The target price is 1.85. The Betfair order book shows:
BETFAIR ORDER BOOK - MI to win
================================
Back side (we need to back):
1.86: £1,200 available
1.88: £800 available
1.90: £2,000 available ← combined: only £4,000 available to 1.90
1.92: £1,500 available
1.95: £3,000 available
Execution sequence:
Step 1: Place limit order BACK MI £4,700 at 1.88 (target 1.85 + 3 ticks slippage)
Result: Filled £1,200 at 1.86 + £800 at 1.88 = £2,000 filled, £2,700 unmatched
Status: PARTIALLY_FILLED (42.5%)
Step 2: Wait 10 seconds (in-play time_in_force)
No additional fills at 1.88
Step 3: Re-price attempt 1
Current best available: 1.90
New limit: 1.90 + 2 ticks = 1.92
Place: BACK MI £2,700 at 1.92
Result: Filled £2,000 at 1.90 + £700 at 1.92 = £2,700 filled
Status: FULLY_FILLED
Total execution:
£1,200 at 1.86
£800 at 1.88
£2,000 at 1.90
£700 at 1.92
Weighted average price: (1200×1.86 + 800×1.88 + 2000×1.90 + 700×1.92) / 4700 = 1.889
Execution quality:
Target: 1.85
Achieved: 1.889
Slippage: 0.039 (3.9 ticks)
Cost of slippage: £4,700 × 0.039 = £183.30 (approximately ₹19,500)
This ₹19,500 slippage cost is deducted from the hedge effectiveness and
reported in the daily execution quality report.
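The execution-quality numbers above can be recomputed from the fill list. Note that carrying full precision gives a slippage cost of about £185; the report rounds slippage to 0.039 first, which is where £183.30 comes from. Function names here are illustrative:

```typescript
// Recomputing the walkthrough's execution quality from the fills (GBP).
interface Fill { size: number; price: number; }

function weightedAvgPrice(fills: Fill[]): number {
  const total = fills.reduce((s, f) => s + f.size, 0);
  const notional = fills.reduce((s, f) => s + f.size * f.price, 0);
  return notional / total;
}

function slippageCost(fills: Fill[], targetPrice: number): number {
  const total = fills.reduce((s, f) => s + f.size, 0);
  return total * (weightedAvgPrice(fills) - targetPrice);
}

const fills: Fill[] = [
  { size: 1200, price: 1.86 },
  { size: 800, price: 1.88 },
  { size: 2000, price: 1.90 },
  { size: 700, price: 1.92 },
];
// weightedAvgPrice(fills) ≈ 1.889; slippageCost(fills, 1.85) ≈ £185 at full precision
```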
44. Migration and Backfill Strategy (MEDIUM)
Mapping Existing Flat B-Book Configs to Forwarding Matrix Rules
The existing codebase has bbookConfigService.ts with a flat B-Book percentage per agent. The migration maps each flat config to a forwarding matrix with a single catch-all rule.
MIGRATION MAPPING
==================
Existing config (for Rajesh):
bbook_percentage: 60 (Rajesh keeps 60%)
Becomes forwarding matrix:
| Rule | market_type | sport_type | event_phase | source_type | liquidity_band | Forward % |
|------|-------------|------------|-------------|-------------|----------------|-----------|
| R1 | * | * | * | * | * | 40% |
Agent default forward: 40%
This is functionally identical to the existing behavior.
Migration steps:
- For each agent with a bbook_percentage, create a forwarding_matrix_rules row with all wildcards and forward_percentage = (100 - bbook_percentage)
- Set the agent's default_forward_percentage to the same value
- Mark the migration as `MIGRATED_FROM_FLAT` in the agent's config for auditability
- The agent's existing behavior is completely unchanged
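The mapping is mechanical enough to sketch in a few lines. Field names below are assumptions; the actual shapes in `bbookConfigService.ts` may differ:

```typescript
// Sketch: flat bbook_percentage config -> single catch-all matrix rule.
interface FlatConfig {
  agentId: string;
  bbookPercentage: number; // e.g. 60 = agent keeps 60%
}

interface MatrixRule {
  agentId: string;
  marketType: string; sportType: string; eventPhase: string;
  sourceType: string; liquidityBand: string;
  forwardPercentage: number;
}

function migrateFlatConfig(cfg: FlatConfig): { rule: MatrixRule; defaultForward: number } {
  const forward = 100 - cfg.bbookPercentage; // keep% inverts to forward%
  const rule: MatrixRule = {
    agentId: cfg.agentId,
    marketType: "*", sportType: "*", eventPhase: "*",
    sourceType: "*", liquidityBand: "*", // all wildcards: catch-all rule
    forwardPercentage: forward,
  };
  return { rule, defaultForward: forward };
}
```

For Rajesh (bbook_percentage 60), this produces the R1 rule above with forward_percentage 40 and default_forward_percentage 40.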
Handling Open Positions During Cutover
Open positions (bets placed before migration, not yet settled) must continue to work correctly under the new system.
Rule: Open positions are NOT re-routed. A bet that was placed under the old system retains its original routing. The new cascade engine only applies to new bets. This means:
- Before cutover: freeze the list of all open bet IDs
- During cutover: deploy the new code with the forwarding matrix enabled (behind feature flag)
- After cutover: new bets go through the cascade engine; open bets settle using the positions that were created under the old system
- Once all pre-cutover bets settle (typically within 1-3 days for cricket), the old routing logic can be removed
Parallel-Run Mode
Before switching any agent to the new cascade engine, run both engines in parallel and compare results:
PARALLEL-RUN MODE
==================
1. A bet arrives for an agent with parallel_run_mode = true
2. EXECUTE on OLD engine:
- Route using flat bbook_percentage
- CREATE real positions (this is what actually runs)
- Record the routing decision as old_routing
3. EXECUTE on NEW engine (shadow mode):
- Route using forwarding matrix + cascade
- DO NOT create positions (shadow only)
- Record the routing decision as new_routing
4. COMPARE:
- Did the new engine produce the same retained amount for this agent? (for migrated flat configs, it should)
- Did the cascade produce valid routing for upline agents?
- Did any limit checks differ?
- Record comparison result
5. REPORT:
- Daily comparison report: X% of bets had identical routing, Y% differed
- For differing bets, show why (new limits kicked in, matrix rule difference, etc.)
- When 100% agreement for 3 consecutive days: agent is ready for cutover
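The comparison (step 4) and daily agreement rate (step 5) can be sketched as follows; the result shapes are assumptions:

```typescript
// Sketch of the parallel-run comparison between old and new routing engines.
interface RoutingResult {
  retainedPaisa: number;  // amount retained at the agent under test
  forwardedPaisa: number; // amount forwarded to upline
}

interface Comparison { identical: boolean; deltaPaisa: number; }

function compareRouting(oldR: RoutingResult, newR: RoutingResult): Comparison {
  const deltaPaisa = Math.abs(oldR.retainedPaisa - newR.retainedPaisa);
  return {
    identical: deltaPaisa === 0 && oldR.forwardedPaisa === newR.forwardedPaisa,
    deltaPaisa,
  };
}

// Daily report input: percentage of bets with identical routing.
function agreementRate(comparisons: Comparison[]): number {
  const same = comparisons.filter(c => c.identical).length;
  return (100 * same) / comparisons.length;
}
```

For migrated flat configs the rate should be 100%; three consecutive days at 100% marks the agent ready for cutover.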
Per-Agent Rollback Plan
Each agent can be individually rolled back from the new cascade engine to the old flat routing:
- Disable feature flag `bbook.cascading_routing.enabled` for the specific agent
- New bets immediately revert to flat bbook_percentage routing
- Positions created by the cascade engine remain valid and settle normally
- The agent's forwarding matrix remains in the database (not deleted) for future re-enablement
Rollback does NOT require:
- Database migration reversal
- Deployment of old code
- Reprocessing of any existing bets
Data Migration for Historical Positions and Settlements
Historical positions and settlements from the old system must be migrated to the new schema so that reporting and reconciliation work across the cutover boundary.
| Old System Data | Migration Target | Mapping |
|---|---|---|
| Old position (flat) | positions table with cascade_level = 1 | One position becomes one L1 position + one forwarded position at platform level |
| Old settlement | settlements table with migration_source = 'v1' | Direct mapping, no restructuring needed |
| Old agent config | forwarding_matrix_rules + agent_limits | As described in mapping section above |
| Old bet record | bets table with routing_engine = 'v1' | Direct copy with engine version flag |
Migration is non-destructive: Old tables are renamed with _v1 suffix but NOT dropped until 90 days after successful cutover with no issues.
45. Support Tooling for Dispute Resolution (MEDIUM)
Bet Lookup
The support dashboard provides multiple paths to find a bet:
| Lookup Method | Use Case | Query |
|---|---|---|
| By bet ID | "Show me bet XYZ" | Direct primary key lookup |
| By user + time range | "Show me Amit's bets last night" | user_id + created_at range |
| By agent + time range | "Show me all bets under Rajesh today" | agent_id + created_at range |
| By event | "Show me all bets on MI vs CSK" | event_id lookup |
| By amount | "Show me all bets over ₹1 lakh today" | stake > threshold + created_at range |
| By status | "Show me all unsettled bets from yesterday" | status=OPEN + created_at range |
Audit Trail Visualization
For each bet, the support dashboard renders the cascade as a visual flow:
AUDIT TRAIL VISUALIZATION - Bet bet_a1b2c3d4
==============================================
Amit places ₹50,000 on MI at 1.85
┌──────────────────────────────────────────────────────┐
│ AMIT (Punter) │
│ Stake: ₹50,000 → Reduced to ₹50,000 (no reduction) │
│ Win cap check: ₹42,500 < ₹50,000 limit ✓ │
└──────────────────────┬───────────────────────────────┘
│ ₹50,000
▼
┌──────────────────────────────────────────────────────┐
│ RAJESH (Level 1) │
│ Matrix rule: R3 (MATCH_ODDS + CRICKET + PRE_MATCH) │
│ Forward: 40% → Retains: ₹30,000 │
│ Limit check: Cricket ₹12.3L/₹50L (24.6%) ✓ │
│ Limit check: Match ₹1.5L/₹5L (30%) ✓ │
│ Limit check: Night ₹3.3L/₹10L (33%) ✓ │
│ Overflow: ₹0 │
│ RETAINED: ₹30,000 (₹25,500 liability) │
└──────────────────────┬───────────────────────────────┘
│ ₹20,000
▼
┌──────────────────────────────────────────────────────┐
│ VIKRAM (Level 2) │
│ Matrix rule: V2 (CRICKET + PRE_MATCH) │
│ Forward: 40% → Retains: ₹12,000 │
│ Source type: NORMAL (own classification) │
│ Limit checks: All ✓ │
│ RETAINED: ₹12,000 (₹10,200 liability) │
└──────────────────────┬───────────────────────────────┘
│ ₹8,000
▼
┌──────────────────────────────────────────────────────┐
│ PLATFORM (Level 3) │
│ Retained: ₹4,000 │
│ Hedged: ₹4,000 → Betfair order hedge_x1y2z3 │
│ Hedge status: FILLED at 1.86 (slippage 0.01) │
└──────────────────────────────────────────────────────┘
Timeline:
21:47:12.001 Bet received
21:47:12.004 Win cap check passed
21:47:12.014 Matrix resolved (R3, specificity 3)
21:47:12.025 Rajesh limits checked
21:47:12.036 Vikram limits checked
21:47:12.048 Platform processed
21:47:12.063 All positions created
21:47:12.068 Audit record written
21:47:12.070 Response sent to Amit
Total: 69ms
21:47:12.085 Hedge order queued
21:47:13.200 Hedge order filled on Betfair
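The arithmetic in the trail above can be reproduced with a small cascade-split helper: each level forwards its matrix percentage of what it receives, retains the rest, and carries liability of stake × (odds − 1). Amounts are in rupees to match the trail; names are illustrative:

```typescript
// Sketch: per-level retained/forwarded split for a cascading bet.
interface LevelSplit {
  agent: string;
  retained: number;  // stake kept at this level
  liability: number; // retained × (odds - 1)
  forwarded: number; // stake passed to the upline
}

function cascadeSplits(
  stake: number,
  odds: number,
  levels: { agent: string; forwardPct: number }[],
): LevelSplit[] {
  const out: LevelSplit[] = [];
  let incoming = stake;
  for (const l of levels) {
    const forwarded = Math.round(incoming * l.forwardPct / 100);
    const retained = incoming - forwarded;
    out.push({
      agent: l.agent,
      retained,
      liability: Math.round(retained * (odds - 1)),
      forwarded,
    });
    incoming = forwarded; // next level only sees the forwarded portion
  }
  return out;
}
```

For Amit's ₹50,000 at 1.85 with Rajesh and Vikram each forwarding 40%, this yields exactly the retained/liability figures shown in the trail (₹30,000/₹25,500 and ₹12,000/₹10,200), leaving ₹8,000 at the platform.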
Re-Simulate Capability
The "re-simulate" button allows a support agent to replay a bet with the configuration state as it existed at the time of the bet:
- Load the forwarding matrix rules that were active at the bet's timestamp (using the versioned config)
- Load the exposure ledger state as it existed just before the bet (from the audit record)
- Run the cascade engine with these inputs
- Display the result alongside the actual result
- If they match: the system behaved correctly
- If they differ: flag the discrepancy for engineering investigation
Dispute Workflow
| Stage | Description | Actions Available |
|---|---|---|
| OPEN | Dispute filed by agent or user | Assign to support agent, set priority, link to bet(s) |
| INVESTIGATING | Support agent reviewing | View audit trail, re-simulate bet, compare ledgers, add notes |
| PENDING_AGENT | Waiting for agent to provide information | Send request to agent, set response deadline |
| PENDING_ENGINEERING | Requires engineering investigation | Escalate to engineering, provide all context |
| RESOLVED_CORRECT | System was correct, dispute dismissed | Document finding, notify all parties |
| RESOLVED_CORRECTION | System was wrong, correction applied | Apply financial correction, update ledgers, notify all parties |
| RESOLVED_GOODWILL | System was correct, but goodwill credit given | Apply credit, document reason, notify agent |
46. Responsible Gambling Controls (MEDIUM)
Self-Exclusion Mechanism
A punter can self-exclude for a configurable duration (24 hours, 7 days, 30 days, 6 months, permanent). Self-exclusion is the FIRST check in the bet processing pipeline.
| Self-Exclusion Parameter | Description |
|---|---|
| Duration options | 24h, 7d, 30d, 6m, permanent |
| Cooling-off period | Cannot reverse self-exclusion for the first 24 hours |
| Scope | All betting across all agents (cannot bet through any path) |
| Implementation | Redis flag checked before ANY processing (sub-millisecond check) |
| Admin override | Only permanent exclusions can be lifted by admin after 6 months, with verification |
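A minimal sketch of the first-in-pipeline self-exclusion check. The flag store is injected so the production Redis GET (async in reality) can be a plain map here; the key format is an assumption:

```typescript
// Sketch: self-exclusion check, the FIRST step of the bet pipeline.
interface FlagStore {
  // Redis GET in production (async); synchronous here for brevity.
  get(key: string): string | null;
}

function isSelfExcluded(store: FlagStore, userId: string, now: Date): boolean {
  // Value is an ISO expiry timestamp, or "permanent" (assumed encoding).
  const until = store.get(`self_exclusion:${userId}`);
  if (until === null) return false;       // no exclusion on record
  if (until === "permanent") return true; // only admin can lift, after 6 months
  return new Date(until) > now;           // timed exclusion still in effect?
}
```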
Deposit Limits
| Limit Type | Description | Where in Pipeline |
|---|---|---|
| Daily deposit limit | Maximum deposit in 24 hours | Payment service (before funds reach betting wallet) |
| Weekly deposit limit | Maximum deposit in 7 days | Payment service |
| Monthly deposit limit | Maximum deposit in 30 days | Payment service |
Deposit limits are NOT part of the bet processing pipeline. They are enforced at the payment layer. However, the B-Book system must be aware of them for display purposes (showing the user their remaining deposit capacity on the dashboard).
Session Time Limits
| Feature | Description | Implementation |
|---|---|---|
| Session duration limit | Maximum continuous session time (configurable, default 4 hours) | WebSocket/session middleware sends warning at 80% of limit, auto-logs out at 100% |
| Mandatory break | Minimum break duration after session limit reached (default 15 minutes) | Session creation blocked for break_duration after limit-triggered logout |
| Activity tracker | Track time since last break, number of bets placed, amount wagered | In-memory per-session counter, persisted to DB every 5 minutes |
Reality Check Notifications
Reality check notifications are periodic messages that remind the punter of their activity during the session.
| Trigger | Message Content | Delivery |
|---|---|---|
| Every 60 minutes of play | "You have been playing for 1 hour. Total bets: 23. Net result: -₹4,200." | In-app popup, requires acknowledgment to continue |
| After 10 consecutive losses | "You have had 10 consecutive losses. Consider taking a break." | In-app popup |
| After ₹50,000 total loss in session | "You have lost ₹50,000 this session. Your daily deposit limit is ₹1,00,000." | In-app popup with option to self-exclude |
| Approaching deposit limit | "You have deposited ₹80,000 of your ₹1,00,000 daily limit." | In-app notification |
Where These Hooks Go in the Bet Flow Pipeline
Steps 0, 0.5, and 2 are the responsible gambling checkpoints. They add minimal latency (sub-millisecond for Redis flag checks, zero latency if no check is triggered) but ensure that gambling controls are enforced before any money flows.
Part III: Complete Implementation Architecture
The following sections provide the complete implementation specification for the entire Hannibal B-Book system. Every database table, every API endpoint, every pipeline step, every error case, and every deployment detail is documented. An LLM or developer reading this can build the entire system without asking a single question about design intent, data models, or processing logic.
47. Technology Stack (Confirmed)
Core Technologies
| Technology | Version | Purpose | Why This Choice |
|---|---|---|---|
| Node.js | 20 LTS | Application runtime | Event-loop model handles high concurrency with low overhead. The team already has expertise. Non-blocking I/O is ideal for the many Redis and DB calls in the bet pipeline. |
| TypeScript | 5.x | Language | Type safety prevents entire categories of financial bugs (wrong types for money, missing fields). Domain types (Stake, Liability, ForwardPercentage) enforce correctness at compile time. |
| PostgreSQL | 16 | Primary database | ACID transactions for financial data. FOR UPDATE locking for limit enforcement. Partitioning for scaling. JSONB for flexible audit payloads. |
| Prisma | 5.x | ORM | Type-safe database access. Schema-as-code for migrations. Works with PostgreSQL partitioning through raw queries where needed. |
| Redis | 7.x | Cache, counters, queues, pub/sub | Sub-millisecond reads for exposure checks. Atomic INCRBY for sharded counters. Streams for hedge order queue. Pub/sub for cache invalidation. |
| Docker | Latest | Containerization | Consistent environments across development, staging, production. Docker Compose for local development. |
Additional Technologies
| Technology | Purpose | Why |
|---|---|---|
| Bull (BullMQ) | Job queue for background tasks | Settlement processing, reconciliation jobs, audit tier migration, hedge retry. Built on Redis. Supports delayed jobs, retries, priority queues. |
| Prometheus + Grafana | Metrics and dashboards | Industry standard for monitoring. Prometheus scrapes application metrics. Grafana renders dashboards. AlertManager handles alert routing. |
| Pino | Structured logging | Fast JSON logger for Node.js. Structured logs are queryable. Low overhead even at high throughput. |
| Zod | Runtime validation | Validates all API inputs and configuration at runtime. Complements TypeScript compile-time types with runtime safety. |
| Socket.IO | WebSocket connections | Real-time dashboard updates to agents. Push notifications for alerts. Session management for responsible gambling. |
| node-cron | Scheduled jobs | Period rollovers, reconciliation scheduling, audit tier migration. |
| Helmet + cors | HTTP security | Standard security headers. CORS configuration for dashboard frontend. |
| prom-client | Prometheus metrics | Native Prometheus metrics collection for Node.js. Histograms, counters, gauges. |
Not Included (and Why)
| Technology | Why Not |
|---|---|
| Kafka | Overkill for current throughput (167 bets/sec). Redis Streams provide sufficient queue functionality with simpler operations. Reconsider at 1000+ bets/sec. |
| MongoDB | Financial data requires ACID transactions and relational integrity. PostgreSQL provides both. |
| GraphQL | The API consumers (dashboard frontend, mobile, WhatsApp bot) all have well-defined data needs. REST is simpler, faster, and sufficient. |
| Microservices (separate deployments per service) | For the first season, a modular monolith is simpler to deploy, debug, and operate. Services are separated in code (modules) but deployed as one application. Extract into microservices only when a specific module needs independent scaling. |
48. System Architecture Overview
Complete System Diagram
Communication Patterns
| From | To | Method | Pattern |
|---|---|---|---|
| Client → API | REST API | HTTP/JSON | Request-response, auth via JWT |
| Client → Dashboard | WebSocket | Socket.IO | Real-time push for exposure updates, alerts |
| API → Bet Processing | Function call | In-process | Synchronous (same monolith) |
| Bet Processing → Cascade Engine | Function call | In-process | Synchronous |
| Bet Processing → Hedge Queue | Redis Stream | Async publish | Fire-and-forget from bet pipeline |
| Hedge Worker → Betfair | HTTP | REST API | Rate-limited, with retry |
| Settlement → Bet Processing | BullMQ Job | Async | Settlement jobs queued when events settle |
| Config Change → All Instances | Redis Pub/Sub | Async broadcast | Cache invalidation messages |
| Reconciliation → Alert | BullMQ Job | Async | Discrepancy alerts queued for processing |
Deployment Topology (Production)
PRODUCTION DEPLOYMENT
======================
Application Instances: 3 (behind load balancer)
- Each runs the full modular monolith
- Agent-affinity routing via consistent hash on agent_id header
- Each instance: 2 vCPU, 4 GB RAM
Background Workers: 2
- Instance 4: Settlement worker + Reconciliation worker
- Instance 5: Hedge worker + Audit migration worker + Sharp detection worker
- Each instance: 2 vCPU, 2 GB RAM
PostgreSQL:
- Primary: 4 vCPU, 16 GB RAM, 500 GB SSD
- Read Replica 1: 2 vCPU, 8 GB RAM (dashboard)
- Read Replica 2: 2 vCPU, 8 GB RAM (reporting)
- Audit DB: 2 vCPU, 4 GB RAM, 200 GB HDD (append-only)
Redis:
- Primary: 2 vCPU, 4 GB RAM
- Replica: 1 vCPU, 2 GB RAM (read-only)
Load Balancer: nginx or cloud ALB
Monitoring: Prometheus + Grafana (1 instance)
49. Database Schema Design
Entity-Relationship Diagram
Table: agents
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY, DEFAULT gen_random_uuid() | Unique agent identifier |
| external_id | VARCHAR(100) | UNIQUE, NOT NULL | Human-readable agent ID (e.g., rajesh_mumbai) |
| name | VARCHAR(255) | NOT NULL | Agent display name |
| parent_agent_id | UUID | REFERENCES agents(id), NULLABLE | Upline agent. NULL for top-level agents and platform |
| level | INTEGER | NOT NULL | Hierarchy depth. 0 = platform, 1 = master agent, 2 = sub-agent, etc. |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, SUSPENDED, TRANSFERRING, DEACTIVATED |
| timezone | VARCHAR(50) | NOT NULL, DEFAULT 'Asia/Kolkata' | Agent's local timezone (IANA format) |
| default_forward_percentage | DECIMAL(5,2) | NOT NULL, DEFAULT 50.00 | Fallback forward % when no matrix rule matches |
| night_period_start | TIME | NULLABLE | Night period start in local time |
| night_period_end | TIME | NULLABLE | Night period end in local time |
| weekly_period_start_day | INTEGER | NOT NULL, DEFAULT 1 | 1=Monday, 7=Sunday |
| tier | VARCHAR(20) | NOT NULL, DEFAULT 'TIER_1' | TIER_1, TIER_2, TIER_3 (UX experience tier) |
| is_platform | BOOLEAN | NOT NULL, DEFAULT false | True for the single platform agent |
| platform_retain_percentage | DECIMAL(5,2) | NULLABLE | Only for platform: % to retain vs hedge |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- `idx_agents_parent` on (parent_agent_id)
- `idx_agents_status` on (status)
- `idx_agents_external_id` on (external_id) -- UNIQUE
Table: agent_limits
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | |
| limit_type | VARCHAR(30) | NOT NULL | SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD |
| sport_type | VARCHAR(30) | NULLABLE | CRICKET, FOOTBALL, TENNIS, KABADDI, etc. NULL for period limits that apply to all sports |
| event_id | VARCHAR(100) | NULLABLE | Only for MARKET type limits. The specific event/market ID |
| limit_amount | BIGINT | NOT NULL | Limit in paisa (1 lakh = 10,000,000 paisa) |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- `idx_agent_limits_agent_type` on (agent_id, limit_type)
- `idx_agent_limits_agent_sport` on (agent_id, sport_type)
- UNIQUE on (agent_id, limit_type, sport_type, event_id) to prevent duplicate limits
Table: forwarding_matrix_rules
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | |
| version | INTEGER | NOT NULL | Incremented on every change. Used for audit snapshot |
| market_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | MATCH_ODDS, FANCY, BOOKMAKER, OVER_UNDER, LINE, or * |
| sport_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | CRICKET, FOOTBALL, TENNIS, KABADDI, or * |
| event_phase | VARCHAR(30) | NOT NULL, DEFAULT '*' | PRE_MATCH, IN_PLAY, APPROACHING_START, or * |
| source_type | VARCHAR(30) | NOT NULL, DEFAULT '*' | NORMAL, SHARP, VIP, NEW_ACCOUNT, or * |
| liquidity_band | VARCHAR(30) | NOT NULL, DEFAULT '*' | HIGH, MEDIUM, LOW, NONE, or * |
| forward_percentage | DECIMAL(5,2) | NOT NULL, CHECK (0 <= forward_percentage <= 100) | Percentage to forward to upline |
| specificity | INTEGER | NOT NULL, GENERATED ALWAYS AS (computed) | Count of non-wildcard dimensions (0-5). Stored for fast sorting |
| priority | INTEGER | NOT NULL, DEFAULT 0 | Tie-breaker when specificity and forward_percentage are equal |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | Soft delete / disable |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Used for deterministic ordering tie-break |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- `idx_fmr_agent_version` on (agent_id, version)
- `idx_fmr_agent_active` on (agent_id, is_active) WHERE is_active = true
- `idx_fmr_lookup` on (agent_id, market_type, sport_type, event_phase, source_type, liquidity_band) WHERE is_active = true
Note: The `specificity` column is computed as the count of dimensions that are NOT `*`. For example, a rule with market_type=FANCY, sport_type=CRICKET, event_phase=IN_PLAY, source_type=`*`, liquidity_band=`*` has specificity 3.
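The specificity computation, plus a deterministic rule ordering using the priority and created_at tie-breakers, might look like this (the exact tie-break order is an assumption based on the column descriptions above):

```typescript
// Sketch: stored specificity value and deterministic rule ordering.
interface MatrixRuleRow {
  marketType: string; sportType: string; eventPhase: string;
  sourceType: string; liquidityBand: string;
  priority: number;
  createdAt: number; // epoch millis
}

// Count of non-wildcard dimensions (0-5), as stored in the specificity column.
function specificity(r: MatrixRuleRow): number {
  return [r.marketType, r.sportType, r.eventPhase, r.sourceType, r.liquidityBand]
    .filter(dim => dim !== "*").length;
}

function orderRules(rules: MatrixRuleRow[]): MatrixRuleRow[] {
  return [...rules].sort((a, b) =>
    specificity(b) - specificity(a)   // most specific rule first
    || b.priority - a.priority        // then explicit priority
    || a.createdAt - b.createdAt);    // then oldest rule, for determinism
}
```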
Table: users
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| external_id | VARCHAR(100) | UNIQUE, NOT NULL | User ID from the main platform |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent this user belongs to |
| name | VARCHAR(255) | NOT NULL | |
| per_click_win_limit | BIGINT | NOT NULL, DEFAULT 5000000 | In paisa. Default ₹50,000 |
| aggregate_win_limit_daily | BIGINT | NOT NULL, DEFAULT 20000000 | In paisa. Default ₹2,00,000 |
| min_stake | BIGINT | NOT NULL, DEFAULT 10000 | In paisa. Default ₹100 |
| self_exclusion_until | TIMESTAMPTZ | NULLABLE | NULL if not excluded |
| session_time_limit_minutes | INTEGER | NOT NULL, DEFAULT 240 | Default 4 hours |
| deposit_limit_daily | BIGINT | NULLABLE | In paisa |
| deposit_limit_weekly | BIGINT | NULLABLE | In paisa |
| deposit_limit_monthly | BIGINT | NULLABLE | In paisa |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, SUSPENDED, SELF_EXCLUDED |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- `idx_users_agent` on (agent_id)
- `idx_users_external` on (external_id) -- UNIQUE
- `idx_users_status` on (status)
Table: user_overrides
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| user_id | UUID | REFERENCES users(id), NOT NULL | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent applying this override |
| forward_percentage | DECIMAL(5,2) | NOT NULL | Override forward % for this user at this agent |
| reason | TEXT | NOT NULL | Why the override was set (e.g., "known sharp user") |
| created_by | UUID | NOT NULL | Admin or agent who set the override |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| expires_at | TIMESTAMPTZ | NULLABLE | Optional expiry |
Indexes:
- UNIQUE on (user_id, agent_id) WHERE is_active = true
- `idx_user_overrides_agent` on (agent_id)
Table: user_classifications
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| user_id | UUID | REFERENCES users(id), NOT NULL | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | The agent making this classification |
| classification | VARCHAR(30) | NOT NULL | NORMAL, SHARP, VIP, NEW_ACCOUNT |
| confidence_score | DECIMAL(5,4) | NULLABLE | 0.0000 to 1.0000 for ML-based classifications |
| reason | TEXT | NULLABLE | Why classified (manual note or detection signal) |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- UNIQUE on (user_id, agent_id)
- `idx_user_class_agent` on (agent_id, classification)
Table: agent_trust_config
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | The upline agent |
| sub_agent_id | UUID | REFERENCES agents(id), NOT NULL | The downstream agent |
| trust_downstream_flags | BOOLEAN | NOT NULL, DEFAULT false | Whether to trust the sub-agent's user classifications |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- UNIQUE on (agent_id, sub_agent_id)
Table: market_overrides
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | REFERENCES agents(id), NOT NULL | |
| event_id | VARCHAR(100) | NOT NULL | The specific event/market |
| forward_percentage | DECIMAL(5,2) | NOT NULL | Override forward % for this event |
| reason | TEXT | NOT NULL | |
| created_by | UUID | NOT NULL | |
| is_active | BOOLEAN | NOT NULL, DEFAULT true | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| expires_at | TIMESTAMPTZ | NULLABLE |
Indexes:
- UNIQUE on (agent_id, event_id) WHERE is_active = true
Table: events
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | VARCHAR(100) | PRIMARY KEY | External event ID from odds provider |
| sport_type | VARCHAR(30) | NOT NULL | |
| name | VARCHAR(500) | NOT NULL | "MI vs CSK, IPL 2026" |
| start_time | TIMESTAMPTZ | NOT NULL | |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'UPCOMING' | UPCOMING, LIVE, SUSPENDED, SETTLED, VOID |
| result | JSONB | NULLABLE | Settlement result data |
| settled_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- `idx_events_sport_status` on (sport_type, status)
- `idx_events_start_time` on (start_time)
Table: markets
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | VARCHAR(100) | PRIMARY KEY | External market ID |
| event_id | VARCHAR(100) | REFERENCES events(id), NOT NULL | |
| market_type | VARCHAR(30) | NOT NULL | MATCH_ODDS, FANCY, BOOKMAKER, etc. |
| name | VARCHAR(255) | NOT NULL | |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SUSPENDED, CLOSED, SETTLED, VOID |
| settled_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- `idx_markets_event` on (event_id)
- `idx_markets_status` on (status)
Table: bets (partitioned by month on created_at)
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| user_id | UUID | NOT NULL | REFERENCES users(id) -- enforced at application level due to partitioning |
| agent_id | UUID | NOT NULL | The originating agent (Level 1) |
| event_id | VARCHAR(100) | NOT NULL | |
| market_id | VARCHAR(100) | NOT NULL | |
| selection | VARCHAR(255) | NOT NULL | What the punter bet on (e.g., "MI to win") |
| side | VARCHAR(10) | NOT NULL | BACK or LAY |
| requested_stake | BIGINT | NOT NULL | Original stake in paisa |
| accepted_stake | BIGINT | NOT NULL | After stake reduction, in paisa |
| odds | DECIMAL(10,4) | NOT NULL | Decimal odds |
| potential_win | BIGINT | NOT NULL | In paisa |
| liability | BIGINT | NOT NULL | In paisa |
| stake_reduction_reason | VARCHAR(50) | NULLABLE | PER_CLICK_LIMIT, AGGREGATE_LIMIT, null |
| market_type | VARCHAR(30) | NOT NULL | |
| sport_type | VARCHAR(30) | NOT NULL | |
| event_phase | VARCHAR(30) | NOT NULL | PRE_MATCH, IN_PLAY |
| source_type | VARCHAR(30) | NOT NULL | NORMAL, SHARP, VIP, etc. (as resolved at originating agent) |
| liquidity_band | VARCHAR(30) | NOT NULL | HIGH, MEDIUM, LOW, NONE |
| routing_engine | VARCHAR(10) | NOT NULL, DEFAULT 'v2' | v1 (legacy flat) or v2 (cascade) |
| routing_status | VARCHAR(20) | NOT NULL, DEFAULT 'COMPLETE' | COMPLETE, PARTIAL, FAILED |
| matrix_version_snapshot | INTEGER | NOT NULL | The matrix version used at time of routing |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SETTLED, VOID, CANCELLED |
| settled_at | TIMESTAMPTZ | NULLABLE | |
| period_context | VARCHAR(20) | NOT NULL | NIGHT, DAY, or specific period identifier |
| total_processing_time_ms | INTEGER | NOT NULL | End-to-end latency |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key |
Indexes:
- `idx_bets_user_time` on (user_id, created_at DESC)
- `idx_bets_agent_time` on (agent_id, created_at DESC)
- `idx_bets_event` on (event_id)
- `idx_bets_market` on (market_id)
- `idx_bets_status` on (status) WHERE status = 'OPEN'
- `idx_bets_routing_status` on (routing_status) WHERE routing_status != 'COMPLETE'
Table: positions (partitioned by month on created_at)
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| bet_id | UUID | NOT NULL | The originating bet |
| agent_id | UUID | NOT NULL | The agent holding this position |
| cascade_level | INTEGER | NOT NULL | 1 = first agent, 2 = upline, etc. |
| position_type | VARCHAR(20) | NOT NULL | RETAINED or FORWARDED |
| stake | BIGINT | NOT NULL | Stake portion in paisa |
| liability | BIGINT | NOT NULL | Liability portion in paisa |
| potential_win | BIGINT | NOT NULL | Potential win portion in paisa |
| forward_percentage_used | DECIMAL(5,2) | NOT NULL | The forward % that produced this split |
| forward_source | VARCHAR(30) | NOT NULL | USER_OVERRIDE, MARKET_OVERRIDE, MATRIX_RULE, AGENT_DEFAULT |
| matrix_rule_id | UUID | NULLABLE | The specific matrix rule that matched (if MATRIX_RULE) |
| overflow_amount | BIGINT | NOT NULL, DEFAULT 0 | How much of this was overflow from limit breach |
| event_id | VARCHAR(100) | NOT NULL | Denormalized for fast queries |
| market_id | VARCHAR(100) | NOT NULL | Denormalized |
| sport_type | VARCHAR(30) | NOT NULL | Denormalized |
| selection | VARCHAR(255) | NOT NULL | Denormalized |
| side | VARCHAR(10) | NOT NULL | Denormalized |
| odds | DECIMAL(10,4) | NOT NULL | Denormalized |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, SETTLED, VOID |
| settlement_id | UUID | NULLABLE | Link to settlement record |
| settled_amount | BIGINT | NULLABLE | Actual P&L in paisa (positive = agent profit, negative = agent loss) |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key |
Indexes:
- `idx_positions_bet` on (bet_id)
- `idx_positions_agent_status` on (agent_id, status) WHERE status = 'OPEN'
- `idx_positions_agent_event` on (agent_id, event_id) WHERE status = 'OPEN'
- `idx_positions_agent_sport` on (agent_id, sport_type) WHERE status = 'OPEN'
- `idx_positions_event_status` on (event_id, status)
- `idx_positions_market_status` on (market_id, status)
Table: exposure_ledger
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | NOT NULL | |
| scope_type | VARCHAR(30) | NOT NULL | SPORT, MARKET, NIGHT_PERIOD, WEEKLY_PERIOD |
| scope_key | VARCHAR(200) | NOT NULL | e.g., "cricket", "mi_vs_csk_2026_03_15", "night_2026_03_15", "week_2026_11" |
| shard_index | INTEGER | NOT NULL, DEFAULT 0 | 0 to N-1 for sharded counters |
| retained_open_liability | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| forwarded_open_liability | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| open_potential_win | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| no_new_risk_active | BOOLEAN | NOT NULL, DEFAULT false | Whether this scope is in NO_NEW_RISK |
| no_new_risk_triggered_at | TIMESTAMPTZ | NULLABLE | When NO_NEW_RISK was activated |
| last_updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- UNIQUE on (agent_id, scope_type, scope_key, shard_index)
- idx_exposure_agent_scope on (agent_id, scope_type, scope_key)
- idx_exposure_no_new_risk on (no_new_risk_active) WHERE no_new_risk_active = true
Table: settlements (partitioned by month on created_at)
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| event_id | VARCHAR(100) | NOT NULL | |
| market_id | VARCHAR(100) | NOT NULL | |
| position_id | UUID | NOT NULL | The position being settled |
| agent_id | UUID | NOT NULL | |
| bet_id | UUID | NOT NULL | |
| settlement_type | VARCHAR(20) | NOT NULL | WIN, LOSS, VOID, PUSH |
| stake | BIGINT | NOT NULL | The position's stake |
| payout | BIGINT | NOT NULL | Amount paid to/from agent. Positive = agent pays punter. Negative = agent receives. |
| profit_loss | BIGINT | NOT NULL | Agent P&L. Positive = profit, negative = loss |
| idempotency_key | VARCHAR(200) | UNIQUE, NOT NULL | Prevents double settlement: {position_id}_{event_result_hash} |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'COMPLETED' | COMPLETED, REVERSED, RE_SETTLED |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_settlements_event on (event_id)
- idx_settlements_agent_time on (agent_id, created_at DESC)
- idx_settlements_bet on (bet_id)
- idx_settlements_position on (position_id)
- idx_settlements_idempotency on (idempotency_key) -- UNIQUE
Table: audit_trail (partitioned by month on created_at, separate schema)
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| bet_id | UUID | NOT NULL | |
| record_type | VARCHAR(30) | NOT NULL | BET_PLACED, BET_SETTLED, POSITION_CORRECTED, CONFIG_CHANGED, RECOMPUTE |
| agent_id | UUID | NOT NULL | Primary agent for this record |
| user_id | UUID | NULLABLE | |
| event_id | VARCHAR(100) | NULLABLE | |
| payload | JSONB | NOT NULL | Full structured audit data (forwarding chain, limit checks, etc.) |
| checksum | VARCHAR(64) | NOT NULL | SHA-256 of payload |
| previous_checksum | VARCHAR(64) | NULLABLE | Checksum of previous record for this bet_id (chain) |
| processing_time_ms | INTEGER | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | Partition key |
Indexes:
- idx_audit_bet on (bet_id)
- idx_audit_agent_time on (agent_id, created_at DESC)
- idx_audit_user_time on (user_id, created_at DESC) WHERE user_id IS NOT NULL
- idx_audit_event on (event_id) WHERE event_id IS NOT NULL
- idx_audit_type on (record_type, created_at DESC)
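The checksum/previous_checksum chain in this table can be sketched in TypeScript. This is a minimal illustration, assuming plain JSON.stringify serialization (production code would need a canonical serializer); the function and type names are illustrative, not from the codebase:

```typescript
import { createHash } from "node:crypto";

interface AuditRecord {
  betId: string;
  payload: Record<string, unknown>;
  checksum: string;                // SHA-256 of payload, hex (64 chars)
  previousChecksum: string | null; // checksum of the prior record for this bet_id
}

// Hash the payload. Plain JSON.stringify is key-order-sensitive and serves
// only for this sketch.
function checksumOf(payload: Record<string, unknown>): string {
  return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

// Append a record, linking it to the previous record's checksum.
function appendAuditRecord(
  chain: AuditRecord[],
  betId: string,
  payload: Record<string, unknown>,
): AuditRecord {
  const record: AuditRecord = {
    betId,
    payload,
    checksum: checksumOf(payload),
    previousChecksum: chain.length > 0 ? chain[chain.length - 1].checksum : null,
  };
  chain.push(record);
  return record;
}

// A chain verifies only if every checksum matches a re-hash of its payload
// and every previousChecksum matches its predecessor.
function verifyChain(chain: AuditRecord[]): boolean {
  return chain.every((record, i) =>
    record.checksum === checksumOf(record.payload) &&
    record.previousChecksum === (i === 0 ? null : chain[i - 1].checksum),
  );
}
```

Tampering with any stored payload breaks the re-hash check, and deleting a middle record breaks the previousChecksum link of its successor.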
Table: dead_letter_queue
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| source | VARCHAR(50) | NOT NULL | BET_PROCESSING, SETTLEMENT, HEDGE, RECONCILIATION |
| reference_id | UUID | NOT NULL | The bet_id, settlement_id, etc. that failed |
| error_message | TEXT | NOT NULL | |
| error_stack | TEXT | NULLABLE | |
| payload | JSONB | NOT NULL | Full context for retry |
| retry_count | INTEGER | NOT NULL, DEFAULT 0 | |
| max_retries | INTEGER | NOT NULL, DEFAULT 3 | |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'PENDING' | PENDING, RETRYING, RESOLVED, ESCALATED |
| resolved_by | UUID | NULLABLE | Admin who resolved |
| resolved_at | TIMESTAMPTZ | NULLABLE | |
| resolution_notes | TEXT | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_dlq_status on (status) WHERE status IN ('PENDING', 'RETRYING')
- idx_dlq_source on (source, created_at DESC)
- idx_dlq_reference on (reference_id)
Table: hedge_orders
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| bet_id | UUID | NOT NULL | Originating bet |
| event_id | VARCHAR(100) | NOT NULL | |
| market_id | VARCHAR(100) | NOT NULL | |
| selection | VARCHAR(255) | NOT NULL | |
| side | VARCHAR(10) | NOT NULL | BACK or LAY |
| target_price | DECIMAL(10,4) | NOT NULL | Price the punter received |
| limit_price | DECIMAL(10,4) | NOT NULL | Worst acceptable price |
| requested_amount | BIGINT | NOT NULL | In paisa |
| filled_amount | BIGINT | NOT NULL, DEFAULT 0 | In paisa |
| unfilled_amount | BIGINT | GENERATED ALWAYS AS (requested_amount - filled_amount) STORED | |
| average_fill_price | DECIMAL(10,4) | NULLABLE | Weighted average of all fills |
| slippage | DECIMAL(10,4) | NULLABLE | average_fill_price - target_price |
| betfair_bet_id | VARCHAR(100) | NULLABLE | Betfair's order reference |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'QUEUED' | QUEUED, PENDING, PARTIALLY_FILLED, FILLED, CANCELLED, STALE_CANCELLED, FAILED |
| reprice_count | INTEGER | NOT NULL, DEFAULT 0 | |
| max_reprice_attempts | INTEGER | NOT NULL, DEFAULT 3 | |
| time_in_force_seconds | INTEGER | NOT NULL | |
| error_message | TEXT | NULLABLE | |
| queued_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| sent_at | TIMESTAMPTZ | NULLABLE | When sent to Betfair |
| filled_at | TIMESTAMPTZ | NULLABLE | When fully filled |
| cancelled_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_hedge_bet on (bet_id)
- idx_hedge_status on (status) WHERE status IN ('QUEUED', 'PENDING', 'PARTIALLY_FILLED')
- idx_hedge_event on (event_id)
- idx_hedge_betfair on (betfair_bet_id) WHERE betfair_bet_id IS NOT NULL
Table: agent_hierarchy_history
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| agent_id | UUID | NOT NULL | |
| previous_parent_id | UUID | NULLABLE | |
| new_parent_id | UUID | NULLABLE | |
| change_type | VARCHAR(30) | NOT NULL | CREATED, PARENT_CHANGED, SUSPENDED, REACTIVATED, DEACTIVATED |
| changed_by | UUID | NOT NULL | Admin who made the change |
| reason | TEXT | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_ahh_agent on (agent_id, created_at DESC)
Table: reconciliation_results
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| reconciliation_type | VARCHAR(30) | NOT NULL | SCHEDULED_15MIN, POST_SETTLEMENT, FULL, TARGETED |
| started_at | TIMESTAMPTZ | NOT NULL | |
| completed_at | TIMESTAMPTZ | NULLABLE | |
| agents_checked | INTEGER | NOT NULL, DEFAULT 0 | |
| discrepancies_found | INTEGER | NOT NULL, DEFAULT 0 | |
| status | VARCHAR(20) | NOT NULL | RUNNING, COMPLETED, FAILED |
| summary | JSONB | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Table: reconciliation_discrepancies
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| reconciliation_id | UUID | REFERENCES reconciliation_results(id), NOT NULL | |
| agent_id | UUID | NOT NULL | |
| scope_type | VARCHAR(30) | NOT NULL | |
| scope_key | VARCHAR(200) | NOT NULL | |
| ledger_value | BIGINT | NOT NULL | In paisa |
| computed_value | BIGINT | NOT NULL | In paisa |
| drift_amount | BIGINT | NOT NULL | In paisa |
| drift_direction | VARCHAR(15) | NOT NULL | LEDGER_HIGH, LEDGER_LOW |
| category | VARCHAR(10) | NOT NULL | MINOR, MAJOR, CRITICAL |
| resolution_status | VARCHAR(20) | NOT NULL, DEFAULT 'OPEN' | OPEN, INVESTIGATING, RESOLVED, AUTO_CORRECTED |
| root_cause | TEXT | NULLABLE | |
| resolved_by | UUID | NULLABLE | |
| resolved_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_recon_disc_status on (resolution_status) WHERE resolution_status IN ('OPEN', 'INVESTIGATING')
- idx_recon_disc_agent on (agent_id, created_at DESC)
Table: alerts
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| alert_type | VARCHAR(50) | NOT NULL | e.g., NO_NEW_RISK_ACTIVATED, LIMIT_APPROACHING, BETFAIR_DEGRADED |
| severity | VARCHAR(5) | NOT NULL | P1, P2, P3, P4 |
| agent_id | UUID | NULLABLE | |
| title | VARCHAR(255) | NOT NULL | |
| description | TEXT | NOT NULL | |
| metadata | JSONB | NULLABLE | Additional context |
| status | VARCHAR(20) | NOT NULL, DEFAULT 'ACTIVE' | ACTIVE, ACKNOWLEDGED, RESOLVED, SILENCED |
| acknowledged_by | UUID | NULLABLE | |
| acknowledged_at | TIMESTAMPTZ | NULLABLE | |
| resolved_at | TIMESTAMPTZ | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_alerts_status on (status) WHERE status = 'ACTIVE'
- idx_alerts_agent on (agent_id, created_at DESC)
- idx_alerts_severity on (severity, created_at DESC)
Table: config_changelog
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| entity_type | VARCHAR(30) | NOT NULL | MATRIX_RULE, AGENT_LIMIT, USER_OVERRIDE, MARKET_OVERRIDE, AGENT_CONFIG |
| entity_id | UUID | NOT NULL | |
| agent_id | UUID | NOT NULL | |
| change_type | VARCHAR(20) | NOT NULL | CREATED, UPDATED, DELETED |
| old_value | JSONB | NULLABLE | Previous state |
| new_value | JSONB | NOT NULL | New state |
| changed_by | UUID | NOT NULL | |
| reason | TEXT | NULLABLE | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- idx_config_changelog_entity on (entity_type, entity_id, created_at DESC)
- idx_config_changelog_agent on (agent_id, created_at DESC)
Table: feature_flags
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY | |
| flag_name | VARCHAR(100) | NOT NULL | e.g., bbook.cascading_routing.enabled |
| scope | VARCHAR(20) | NOT NULL | GLOBAL, PER_AGENT |
| agent_id | UUID | NULLABLE | NULL for GLOBAL flags |
| is_enabled | BOOLEAN | NOT NULL, DEFAULT false | |
| created_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() | |
| updated_at | TIMESTAMPTZ | NOT NULL, DEFAULT NOW() |
Indexes:
- UNIQUE on (flag_name, agent_id)
- idx_ff_flag on (flag_name)
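A sketch of how a lookup might resolve these rows, assuming a PER_AGENT row overrides the GLOBAL row for the same flag_name (the override order is an assumption, not stated in the schema):

```typescript
interface FlagRow {
  flagName: string;
  agentId: string | null; // null for GLOBAL flags
  isEnabled: boolean;
}

// Per-agent row wins if present; otherwise fall back to the global row;
// absent both, the flag is off (matching DEFAULT false).
function isFlagEnabled(
  rows: FlagRow[],
  flagName: string,
  agentId: string,
): boolean {
  const perAgent = rows.find(
    (r) => r.flagName === flagName && r.agentId === agentId,
  );
  if (perAgent !== undefined) return perAgent.isEnabled;
  const global = rows.find(
    (r) => r.flagName === flagName && r.agentId === null,
  );
  return global !== undefined ? global.isEnabled : false;
}
```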
50. API Design
Bet Placement APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/bets | Place a new bet | User JWT |
| GET | /api/v1/bets/:betId | Get bet details with full routing | User JWT or Agent JWT |
| GET | /api/v1/bets | List bets (with filters) | Agent JWT |
| POST | /api/v1/bets/:betId/void | Void an open bet | Admin JWT |
| POST | /api/v1/bets/simulate | Dry-run a bet (no money, full routing) | Agent JWT |
POST /api/v1/bets -- Place a new bet:
Request body:
{
"user_id": "uuid",
"event_id": "string",
"market_id": "string",
"selection": "MI to win",
"side": "BACK",
"stake": 1000000, // in paisa (₹10,000)
"odds": 1.85,
"market_type": "MATCH_ODDS",
"sport_type": "CRICKET",
"event_phase": "PRE_MATCH",
"liquidity_band": "HIGH"
}
Response body (success):
{
"bet_id": "uuid",
"status": "ACCEPTED",
"accepted_stake": 1000000,
"stake_reduced": false,
"potential_win": 850000,
"message": null
}
Response body (stake reduced):
{
"bet_id": "uuid",
"status": "ACCEPTED_REDUCED",
"accepted_stake": 588200,
"original_stake": 1000000,
"stake_reduced": true,
"potential_win": 499970,
"message": "Maximum stake at these odds: ₹5,882"
}
Response body (rejected):
{
"bet_id": null,
"status": "REJECTED",
"reason": "SELF_EXCLUDED" | "SESSION_EXPIRED" | "MARKET_SUSPENDED" | "BELOW_MINIMUM",
"message": "This market is currently unavailable."
}
Agent Configuration APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/agents/:agentId | Get agent profile and config | Agent JWT |
| PATCH | /api/v1/agents/:agentId | Update agent settings | Agent JWT |
| GET | /api/v1/agents/:agentId/limits | Get all agent limits | Agent JWT |
| PUT | /api/v1/agents/:agentId/limits | Set/update agent limits | Agent JWT |
| GET | /api/v1/agents/:agentId/matrix | Get forwarding matrix rules | Agent JWT |
| POST | /api/v1/agents/:agentId/matrix/rules | Add a matrix rule | Agent JWT |
| PUT | /api/v1/agents/:agentId/matrix/rules/:ruleId | Update a matrix rule | Agent JWT |
| DELETE | /api/v1/agents/:agentId/matrix/rules/:ruleId | Delete a matrix rule | Agent JWT |
| POST | /api/v1/agents/:agentId/matrix/test | Test a bet against the matrix | Agent JWT |
| GET | /api/v1/agents/:agentId/exposure | Get current exposure summary | Agent JWT |
| GET | /api/v1/agents/:agentId/exposure/:scope | Get exposure for a specific scope | Agent JWT |
| POST | /api/v1/agents/:agentId/panic | Trigger panic mode (hedge all) | Agent JWT |
| GET | /api/v1/agents/:agentId/sub-agents | List sub-agents | Agent JWT |
| GET | /api/v1/agents/:agentId/trust-config | Get trust settings for sub-agents | Agent JWT |
| PUT | /api/v1/agents/:agentId/trust-config/:subAgentId | Update trust for a sub-agent | Agent JWT |
POST /api/v1/agents/:agentId/matrix/rules -- Add a matrix rule:
Request body:
{
"market_type": "FANCY",
"sport_type": "CRICKET",
"event_phase": "IN_PLAY",
"source_type": "*",
"liquidity_band": "*",
"forward_percentage": 70.00
}
Response body:
{
"rule_id": "uuid",
"version": 48,
"specificity": 3,
"conflicts": [],
"effective_immediately": true
}
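The specificity value in this response can be derived directly from the rule: it counts the non-wildcard dimensions (here market_type, sport_type, and event_phase, hence 3). A sketch, with illustrative names:

```typescript
interface MatrixRule {
  market_type: string;
  sport_type: string;
  event_phase: string;
  source_type: string;
  liquidity_band: string;
  forward_percentage: number;
}

// The five matchable dimensions; "*" means wildcard.
const DIMENSIONS = [
  "market_type",
  "sport_type",
  "event_phase",
  "source_type",
  "liquidity_band",
] as const;

// Specificity = number of dimensions that are not wildcards.
function specificityOf(rule: MatrixRule): number {
  return DIMENSIONS.filter((dim) => rule[dim] !== "*").length;
}
```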
User Management APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/users/:userId | Get user profile | Agent JWT |
| PATCH | /api/v1/users/:userId | Update user settings (limits, etc.) | Agent JWT |
| POST | /api/v1/users/:userId/override | Set user forward % override | Agent JWT |
| DELETE | /api/v1/users/:userId/override | Remove user override | Agent JWT |
| POST | /api/v1/users/:userId/classify | Set user classification | Agent JWT |
| GET | /api/v1/users/:userId/bets | Get user bet history | Agent JWT |
| POST | /api/v1/users/:userId/self-exclude | Self-exclude user | User JWT |
| GET | /api/v1/users/:userId/session | Get session info | User JWT |
Settlement APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/settlements/events/:eventId | Trigger settlement for an event | System / Admin JWT |
| GET | /api/v1/settlements/events/:eventId | Get settlement status for an event | Agent JWT |
| POST | /api/v1/settlements/events/:eventId/reverse | Reverse a settlement (for corrections) | Admin JWT |
| POST | /api/v1/settlements/events/:eventId/resettle | Re-settle with corrected results | Admin JWT |
| GET | /api/v1/settlements/agents/:agentId | Get agent settlement history | Agent JWT |
| GET | /api/v1/settlements/agents/:agentId/weekly | Get weekly settlement summary | Agent JWT |
POST /api/v1/settlements/events/:eventId:
Request body:
{
"result": {
"winner": "MI",
"market_results": {
"match_odds": { "winning_selection": "MI to win" },
"fancy_180_runs": { "actual_value": 187, "line": 180 }
}
},
"result_source": "OFFICIAL",
"confirmed_by": "admin_uuid"
}
Admin APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | /api/v1/admin/agents | Create a new agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/suspend | Suspend an agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/reactivate | Reactivate an agent | Admin JWT |
| POST | /api/v1/admin/agents/:agentId/transfer | Transfer agent to new parent | Admin JWT |
| GET | /api/v1/admin/dead-letter-queue | View DLQ entries | Admin JWT |
| POST | /api/v1/admin/dead-letter-queue/:id/retry | Retry a DLQ entry | Admin JWT |
| POST | /api/v1/admin/dead-letter-queue/:id/resolve | Manually resolve a DLQ entry | Admin JWT |
| POST | /api/v1/admin/reconciliation/run | Trigger manual reconciliation | Admin JWT |
| POST | /api/v1/admin/reconciliation/recompute | Recompute an agent's ledger | Admin JWT |
| GET | /api/v1/admin/feature-flags | List all feature flags | Admin JWT |
| PUT | /api/v1/admin/feature-flags/:flagName | Toggle a feature flag | Admin JWT |
| POST | /api/v1/admin/cache/flush | Flush caches for an agent | Admin JWT |
Monitoring APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/monitoring/health | System health check | Public |
| GET | /api/v1/monitoring/metrics | Prometheus metrics endpoint | Internal |
| GET | /api/v1/monitoring/alerts | List active alerts | Admin JWT |
| POST | /api/v1/monitoring/alerts/:id/acknowledge | Acknowledge an alert | Admin JWT |
| GET | /api/v1/monitoring/dashboard/overview | Ops dashboard data | Admin JWT |
| GET | /api/v1/monitoring/dashboard/agent/:agentId | Agent-specific dashboard data | Agent JWT |
| GET | /api/v1/monitoring/hedge-status | Hedge execution status | Admin JWT |
Dispute/Support APIs
| Method | Path | Description | Auth |
|---|---|---|---|
| GET | /api/v1/support/bets/:betId/audit | Full audit trail for a bet | Support JWT |
| POST | /api/v1/support/bets/:betId/resimulate | Re-simulate bet routing | Support JWT |
| GET | /api/v1/support/agents/:agentId/positions | Open positions for an agent | Support JWT |
| POST | /api/v1/support/disputes | Create a dispute | Agent JWT |
| GET | /api/v1/support/disputes/:disputeId | Get dispute details | Support JWT |
| PATCH | /api/v1/support/disputes/:disputeId | Update dispute status | Support JWT |
51. Service Architecture
Module Breakdown
| Module | Responsibility | Dependencies |
|---|---|---|
| BetProcessingModule | Orchestrates the entire bet placement pipeline. Entry point for all bets. | MatrixResolutionModule, CascadeEngineModule, LimitEnforcementModule, AuditModule, ResponsibleGamblingModule |
| MatrixResolutionModule | Resolves forwarding percentage from the precedence chain (user override > market override > matrix > default) | ConfigModule (for cached rules) |
| CascadeEngineModule | Routes a bet through the full agent hierarchy. Iterates level by level, calling MatrixResolution and LimitEnforcement at each level | MatrixResolutionModule, LimitEnforcementModule |
| LimitEnforcementModule | Checks all agent limits, determines max retainable amount, triggers NO_NEW_RISK | ExposureLedgerModule |
| ExposureLedgerModule | Manages the 3-tier exposure counters. Reads from Redis, writes to PostgreSQL, handles sharding | Redis, PostgreSQL |
| HedgeExecutionModule | Consumes hedge order queue, places orders on Betfair, manages partial fills and retries | Betfair API client, Redis Stream |
| SettlementModule | Processes event results, settles all positions, decrements exposure ledgers | ExposureLedgerModule, AuditModule |
| AuditModule | Creates structured audit records, manages checksum chains, handles tier migration | PostgreSQL (audit schema) |
| ReconciliationModule | Runs scheduled and on-demand reconciliation, detects and categorizes discrepancies | ExposureLedgerModule, PostgreSQL |
| AgentManagementModule | CRUD for agents, hierarchy management, trust configuration, preset profiles | PostgreSQL |
| UserManagementModule | CRUD for users, win limit management, classification, self-exclusion | PostgreSQL, Redis |
| ConfigModule | Manages all configuration with caching, versioning, and pub/sub invalidation | PostgreSQL, Redis (cache + pub/sub) |
| NotificationModule | Sends alerts via push, SMS, WhatsApp. Manages escalation | SMS gateway, WhatsApp API, Socket.IO |
| ResponsibleGamblingModule | Self-exclusion checks, session limits, reality checks, deposit limit integration | Redis (fast flag checks) |
| MonitoringModule | Prometheus metrics collection, health checks, alert generation | prom-client |
52. The Bet Processing Pipeline (Step by Step)
This is the heart of the system. Every step is documented with what happens, what can go wrong, and how errors are handled.
Step 1: Request Received (Budget: 5ms)
What happens: HTTP POST arrives at /api/v1/bets. Express middleware parses the JSON body. Zod schema validates the shape and types of all fields (user_id is UUID, stake is positive integer, odds is positive decimal, etc.).
What can go wrong:
- Malformed JSON: Return 400 with parsing error
- Missing required fields: Return 400 with validation errors listing every missing field
- Invalid types (string for stake, negative odds): Return 400 with type errors
Error handling: Validation errors are returned immediately. No audit record is created because no bet processing was attempted. The request counter metric is incremented for monitoring.
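The production validation uses Zod; the dependency-free sketch below shows the same class of checks for a subset of fields (the exact per-field rules beyond "positive integer stake, positive odds" are assumptions). An empty array means the body is valid:

```typescript
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function validateBetRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const b = body as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof b.user_id !== "string" || !UUID_RE.test(b.user_id)) {
    errors.push("user_id must be a UUID");
  }
  if (typeof b.stake !== "number" || !Number.isInteger(b.stake) || b.stake <= 0) {
    errors.push("stake must be a positive integer (paisa)");
  }
  if (typeof b.odds !== "number" || b.odds <= 0) {
    errors.push("odds must be a positive decimal");
  }
  if (b.side !== "BACK" && b.side !== "LAY") {
    errors.push("side must be BACK or LAY");
  }
  return errors;
}
```

Every failing field contributes its own message, matching the "listing every missing field" behavior above.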
Step 2: Responsible Gambling Checks (Budget: 2ms)
What happens: Three sub-checks in sequence:
2a. Self-exclusion check: Read self_exclusion:{user_id} flag from Redis. If present and not expired, reject immediately.
2b. Session time check: Read session:{user_id} from Redis. If session duration exceeds the user's configured limit, reject with session expired message.
2c. Reality check trigger: Check if a reality check notification is pending (time since last acknowledgment exceeds the configured interval). If so, the API returns a special status requiring the client to show the reality check popup and re-submit with an acknowledgment token.
What can go wrong:
- Redis unavailable: Fall back to PostgreSQL for self-exclusion check (add ~5ms). Session checks are skipped (fail-open for session limits, fail-closed for self-exclusion).
Error handling: Self-exclusion is fail-CLOSED (if we cannot check, reject the bet -- protecting the user is paramount). Session limits are fail-OPEN (if we cannot check, allow the bet -- a few extra minutes of play is acceptable).
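The fail-closed/fail-open asymmetry can be sketched as follows (the two callbacks stand in for Redis reads and may throw when the store is unreachable; names are illustrative):

```typescript
function checkResponsibleGambling(
  isSelfExcluded: () => boolean,
  sessionExpired: () => boolean,
): { allowed: boolean; reason?: string } {
  // Self-exclusion is fail-CLOSED: if the check cannot run, reject the bet.
  try {
    if (isSelfExcluded()) return { allowed: false, reason: "SELF_EXCLUDED" };
  } catch {
    return { allowed: false, reason: "SELF_EXCLUSION_CHECK_UNAVAILABLE" };
  }
  // Session limits are fail-OPEN: if the check cannot run, allow the bet.
  try {
    if (sessionExpired()) return { allowed: false, reason: "SESSION_EXPIRED" };
  } catch {
    // A few extra minutes of play is acceptable; fall through.
  }
  return { allowed: true };
}
```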
Step 3: Timestamp Assignment (Budget: 0ms)
What happens: The system assigns a processing timestamp using Date.now(). This timestamp is used for:
- Determining which period the bet falls in (night vs day)
- Ordering concurrent bets deterministically
- Audit trail timing
What can go wrong: Nothing. This is a local operation.
Step 4: Compute Metrics (Budget: 3ms)
What happens:
potential_win = stake * (odds - 1)
liability = potential_win (for a back bet from the bookie's perspective)
All calculations use integer arithmetic in paisa (the smallest currency unit) to avoid floating-point errors. The odds value is the only decimal in the calculation; it is multiplied by the integer stake and the result is floored to the nearest paisa.
What can go wrong:
- Overflow: If stake * odds exceeds Number.MAX_SAFE_INTEGER in paisa. For context, MAX_SAFE_INTEGER in paisa is roughly ₹90 lakh crore (about 9 × 10^13 rupees). No single bet will ever approach this. Validation rejects stakes above ₹1 crore as a safety measure.
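A minimal sketch of the Step 4 arithmetic: money stays in integer paisa, odds are the only floating-point value, and results are floored (function names are illustrative):

```typescript
// Potential win in paisa, floored to the nearest paisa.
function computePotentialWin(stakePaisa: number, odds: number): number {
  return Math.floor(stakePaisa * (odds - 1));
}

// For a BACK bet, the bookie's liability equals the punter's potential win.
function computeLiability(stakePaisa: number, odds: number): number {
  return computePotentialWin(stakePaisa, odds);
}
```

With the earlier API example (₹10,000 stake at odds 1.85), computePotentialWin(1000000, 1.85) gives 850000 paisa, matching the potential_win in the sample response.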
Step 5: User Win Cap Check (Budget: 5ms)
What happens:
5a. Per-click win cap: Compare potential_win against the user's per_click_win_limit. If exceeded, compute the maximum allowable stake:
max_stake = floor(per_click_win_limit / (odds - 1))
5b. Aggregate win cap: Read the user's accumulated potential wins for the current period from Redis key user_agg_win:{user_id}:{period}. If adding this bet's potential_win exceeds the daily aggregate limit, compute the remaining allowable win:
remaining_win = aggregate_limit - current_accumulated_wins
max_stake_from_aggregate = floor(remaining_win / (odds - 1))
5c. Take the minimum of the per-click max stake and the aggregate max stake. If this is less than the original stake, the stake is reduced.
What can go wrong:
- Redis unavailable for aggregate check: Fall back to PostgreSQL query (SUM of potential_win from today's bets for this user). Slower (~15ms) but correct.
- Concurrent aggregate updates: The Redis INCRBY is atomic, but two simultaneous bets could both read the same accumulated value before either increments it. The aggregate limit has a 10% buffer built in (actual limit checked is 110% of configured limit) to absorb this minor race. The PostgreSQL settlement path corrects any drift.
Error handling: If the reduced stake falls below the user's minimum stake, the bet is rejected with "This market is currently unavailable at these odds."
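Steps 5a-5c combine into a single stake-capping function. A sketch in paisa (names illustrative; the 10% race buffer and PostgreSQL fallback are omitted):

```typescript
function capStake(
  requestedStake: number,
  odds: number,
  perClickWinLimit: number,
  aggregateWinLimit: number,
  accumulatedWins: number, // potential wins already booked this period
): { acceptedStake: number; reduced: boolean } {
  const perClickMax = Math.floor(perClickWinLimit / (odds - 1));         // 5a
  const remainingWin = Math.max(0, aggregateWinLimit - accumulatedWins); // 5b
  const aggregateMax = Math.floor(remainingWin / (odds - 1));
  const acceptedStake = Math.min(requestedStake, perClickMax, aggregateMax); // 5c
  return { acceptedStake, reduced: acceptedStake < requestedStake };
}
```

For a ₹10,000 bet at 1.85 with a ₹5,000 per-click win limit, this caps the stake at 588235 paisa; the sample API response shows 588200, suggesting an additional round-down to a whole rupee that is omitted here.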
Step 6: Matrix Version Capture (Budget: 1ms)
What happens: For each agent in the cascade, the system captures the current matrix version number. This version is stored with the bet record to enable deterministic replay. The version is read from the cached matrix (Redis or LRU).
Why this matters: If an agent changes their matrix between the bet being placed and the bet being disputed, the replay must use the original matrix version to produce the same result.
Step 7: Forwarding Percentage Resolution (Budget: 10ms)
What happens: For the first agent in the cascade (the user's direct agent), resolve the forwarding percentage using the 4-level precedence chain:
7a. Check user override: Query user_overrides (cached in Redis) for this user + agent combination. If found and active, use the override's forward_percentage. Skip to Step 8.
7b. Check market override: Query market_overrides (cached in Redis) for this event + agent combination. If found and active, use the override's forward_percentage. Skip to Step 8.
7c. Matrix lookup: Load the agent's active matrix rules (cached in Redis). Filter to rules where every non-wildcard dimension matches the bet's characteristics. From the matching rules, select the one with the highest specificity. If tied, select the highest forward_percentage. If still tied, select the oldest rule (lowest created_at).
7d. Agent default: If no matrix rule matches (should not happen if a catch-all rule exists), use the agent's default_forward_percentage.
What can go wrong:
- Agent has no matrix rules AND no default: Configuration error. Log a P2 alert. Use a hard-coded safe default of 100% forward (forward everything, retain nothing -- the safest option for the agent).
- Matrix cache miss: Read from PostgreSQL. Add ~5ms.
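The precedence chain of Steps 7a-7d, with the 7c tie-breaks (highest specificity, then highest forward %, then oldest rule), can be sketched as follows (types and names are illustrative):

```typescript
interface Rule {
  dims: Record<string, string>; // dimension -> value, "*" = wildcard
  forwardPercentage: number;
  createdAt: number;            // epoch ms
}

function resolveForwardPercentage(
  bet: Record<string, string>,
  userOverride: number | null,
  marketOverride: number | null,
  rules: Rule[],
  agentDefault: number,
): { pct: number; source: string } {
  if (userOverride !== null) return { pct: userOverride, source: "USER_OVERRIDE" };       // 7a
  if (marketOverride !== null) return { pct: marketOverride, source: "MARKET_OVERRIDE" }; // 7b

  const specificity = (r: Rule) =>
    Object.values(r.dims).filter((v) => v !== "*").length;
  const matches = rules
    .filter((r) =>
      Object.entries(r.dims).every(([dim, v]) => v === "*" || v === bet[dim]),
    )
    .sort(
      (a, b) =>
        specificity(b) - specificity(a) ||        // highest specificity first
        b.forwardPercentage - a.forwardPercentage || // then highest forward %
        a.createdAt - b.createdAt,                // then oldest rule
    );
  if (matches.length > 0) {
    return { pct: matches[0].forwardPercentage, source: "MATRIX_RULE" };                  // 7c
  }
  return { pct: agentDefault, source: "AGENT_DEFAULT" };                                  // 7d
}
```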
Step 8: Cascade Routing -- Level by Level (Budget: 10ms per level)
What happens: The cascade engine iterates through the agent hierarchy, starting at the punter's direct agent and moving upward to the platform.
For each level:
8a. Determine source_type for this level: If the agent has their own classification for this user, use it. Otherwise, if trust_downstream_flags is true for the originating sub-agent, use the downstream classification. Otherwise, default to NORMAL.
8b. Resolve forward percentage for this level (same precedence chain as Step 7, using this agent's matrix and the resolved source_type).
8c. Calculate retention: retained_stake = incoming_stake * (1 - forward_percentage / 100). Round down to nearest paisa.
8d. Check limits: Query all applicable limits for this agent (sport, market, night period, weekly period). For each limit, read the current exposure from the exposure ledger (Redis or PostgreSQL depending on utilization -- see Gap A). The most restrictive limit determines the maximum retainable amount.
8e. If all limits pass: Agent retains the calculated amount.
8f. If any limit would be breached: Agent retains only up to the most restrictive remaining capacity. The difference becomes overflow that is added to the forwarded amount.
8g. If agent is in NO_NEW_RISK for this scope: Check if this bet is a hedge (worst-case liability after > worst-case liability before). If hedge: retain as normal. If not hedge: retain nothing, forward 100%.
8h. If agent is suspended: Skip this level entirely. Forward 100% to the next level.
8i. Record the decision for this level (for the audit trail).
8j. Forward the remaining amount to the next level (parent agent). If the current agent is the platform, the remaining amount is queued for hedge execution.
What can go wrong:
- Parent agent not found: Configuration error (broken hierarchy). Log P1 alert. Forward to platform directly (skip the missing level).
- Database lock timeout on near-limit exposure check: Retry once (most lock waits resolve in <20ms). If retry fails, assume limit is breached and forward 100% from this level. This is the safe default -- the agent does not retain risk they cannot verify they can afford.
- Partial routing failure (Step 2 of a 3-level cascade fails): Mark the bet as routing_status = PARTIAL. Queue for background retry. See Gap D for details.
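Steps 8c-8f for a single level reduce to a small amount of integer arithmetic. A sketch (multiplying before dividing keeps the retention calculation exact in integer paisa; names are illustrative):

```typescript
function routeLevel(
  incomingStake: number,     // paisa arriving at this level
  forwardPercentage: number, // resolved forward % for this level (0-100)
  remainingCapacity: number, // most restrictive limit's remaining room, paisa
): { retained: number; forwarded: number; overflow: number } {
  // 8c: desired retention, rounded down to the nearest paisa.
  const desiredRetention = Math.floor(
    (incomingStake * (100 - forwardPercentage)) / 100,
  );
  // 8d-8f: clamp to remaining limit capacity; the excess becomes overflow
  // that rides along with the forwarded amount.
  const retained = Math.min(desiredRetention, Math.max(0, remainingCapacity));
  const overflow = desiredRetention - retained;
  return { retained, forwarded: incomingStake - retained, overflow };
}
```

With the ₹10,000 bet and a 40% forward rule, routeLevel(1000000, 40, 10000000) retains ₹6,000 and forwards ₹4,000, matching the Level 1 split in the Step 10 example below.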
Step 9: Exposure Ledger Updates (Budget: 10ms)
What happens: All exposure ledger changes for all agents in the cascade are written. For each agent:
- Increment retained_open_liability by the retained liability amount
- Increment forwarded_open_liability by the forwarded liability amount
- Increment open_potential_win by the agent-level potential win
The sharded counter approach is used: pick a random shard for each agent and use UPDATE exposure_ledger SET retained_open_liability = retained_open_liability + $amount WHERE agent_id = $agent AND shard_index = $shard.
After the DB write, update Redis with the new total (read-then-write to Redis, or use the DB-committed value).
What can go wrong:
- DB write fails: This is within the position creation transaction. If it fails, the entire transaction rolls back. No positions are created, no ledger is updated. Return error to client.
- Redis update fails after DB commit: The Redis value becomes stale. The next bet that hits Redis will see a slightly outdated value. This is acceptable because the safety margin (Gap A) accounts for this.
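A sketch of the sharded increment: the shard count of 8 is an assumption (the real count is configuration), and the builder only constructs the parameterized statement and values a DB client would execute against the exposure_ledger schema above:

```typescript
const SHARD_COUNT = 8; // assumption; production reads this from config

// Spreading increments across N rows per scope avoids one hot row under
// concurrent writes; reading the total means summing all shards.
function pickShard(shardCount: number = SHARD_COUNT): number {
  return Math.floor(Math.random() * shardCount);
}

function buildLedgerIncrement(
  agentId: string,
  scopeType: string,
  scopeKey: string,
  amountPaisa: number,
): { sql: string; values: (string | number)[] } {
  return {
    sql:
      "UPDATE exposure_ledger " +
      "SET retained_open_liability = retained_open_liability + $1, " +
      "last_updated_at = NOW() " +
      "WHERE agent_id = $2 AND scope_type = $3 AND scope_key = $4 " +
      "AND shard_index = $5",
    values: [amountPaisa, agentId, scopeType, scopeKey, pickShard()],
  };
}
```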
Step 10: Position Creation (Budget: 15ms)
What happens: Within the same database transaction as the ledger update (or in the same-level transaction for hot agents using per-level atomicity):
Create one position record per agent per level:
- Level 1 (Rajesh): RETAINED position for ₹6,000 stake
- Level 1 (Rajesh): FORWARDED position for ₹4,000 stake (optional -- can be derived)
- Level 2 (Vikram): RETAINED position for ₹2,400 stake
- Level 2 (Vikram): FORWARDED position for ₹1,600 stake
- Level 3 (Platform): RETAINED position for ₹800 stake
- Level 3 (Platform): FORWARDED position for ₹800 stake (to Betfair)
What can go wrong:
- Unique constraint violation: Extremely unlikely (UUID collision). If it happens, retry with a new UUID.
- Transaction deadlock: If using cross-level atomicity and two bets lock agents in different orders. Prevented by always locking in hierarchy order (Level 1 first, then Level 2, etc.). If detected, the database will roll back one transaction automatically; retry.
Step 11: Hedge Queue Placement (Budget: 2ms)
What happens: If the platform's forwarded amount is greater than zero, a hedge order is published to the Redis Stream hedge_orders:
{
"bet_id": "uuid",
"event_id": "string",
"market_id": "string",
"selection": "MI to win",
"side": "BACK",
"target_price": 1.85,
"amount_paisa": 80000,
"sport_type": "CRICKET",
"event_phase": "PRE_MATCH"
}
This is a fire-and-forget publish. The hedge execution is asynchronous and does not block the bet response.
What can go wrong:
- Redis Stream unavailable: Write the hedge order to a hedge_orders_fallback table in PostgreSQL. The hedge worker polls this table every 5 seconds as a fallback.
Step 12: Audit Trail Write (Budget: 10ms)
What happens: Construct the complete audit record (all fields described in Section 11 of the original document) and buffer it for async writing. The audit write is NOT synchronous with the bet response. It is added to an in-memory buffer that flushes every 500ms.
The audit payload includes:
- The complete forwarding chain (every level, every decision)
- Every limit check result
- The matrix rule that matched at each level
- The source_type resolution at each level
- The period context
- Processing timestamps per step
What can go wrong:
- Audit buffer flush fails: Retry 3 times. If all retries fail, write to a local file as a WAL. A recovery job reads this file on startup and writes to the audit store.
- The bet is accepted even if the audit write ultimately fails. Audit trail completeness is critical but not worth rejecting a bet over.
Step 13: Response to Punter (Budget: 5ms)
What happens: Return the HTTP response with bet confirmation. Update the user's session activity counter in Redis (for responsible gambling tracking). Emit a WebSocket event to the agent's dashboard with the new bet details.
What can go wrong:
- Client connection already closed (timeout): The bet is still processed. The client can query /api/v1/bets/:betId to confirm.
- WebSocket delivery failure: Non-critical. The dashboard will pick up the new bet on its next polling cycle (2-5 seconds).
53. The Settlement Pipeline (Step by Step)
Step 1: Event Result Received
What happens: An external system (admin, odds provider, or automated feed) posts the event result to POST /api/v1/settlements/events/:eventId. The result specifies the outcome for each market within the event.
The settlement module validates the result format, verifies the event exists and is in a settleable state (LIVE or SUSPENDED, not already SETTLED), and enqueues a settlement job in BullMQ.
Step 2: Market Resolution
What happens: For each market in the event, determine which positions won and which lost. For MATCH_ODDS markets, this is straightforward (the winning selection wins, all others lose). For FANCY markets, the actual value is compared to the line.
| Market Type | Resolution Logic |
|---|---|
| MATCH_ODDS | Position on winning selection = WIN. All others = LOSS. |
| FANCY (over/under) | If actual value >= line: OVER wins, UNDER loses. Otherwise UNDER wins, OVER loses. |
| BOOKMAKER | Same as MATCH_ODDS |
| LINE | Based on handicap: actual result + handicap compared to line |
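The resolution table can be expressed as a pure function. The market and position shapes below are illustrative, not the actual Prisma models:

```typescript
// Step 2 sketch: resolve one position against a market result.

type Outcome = "WIN" | "LOSS";

interface MarketResult {
  type: "MATCH_ODDS" | "BOOKMAKER" | "FANCY" | "LINE";
  winningSelection?: string; // MATCH_ODDS / BOOKMAKER
  actualValue?: number;      // FANCY / LINE
  line?: number;             // FANCY / LINE
  handicap?: number;         // LINE only
}

interface Position { selection: string; side?: "OVER" | "UNDER"; }

function resolvePosition(result: MarketResult, pos: Position): Outcome {
  switch (result.type) {
    case "MATCH_ODDS":
    case "BOOKMAKER": // BOOKMAKER resolves the same as MATCH_ODDS
      return pos.selection === result.winningSelection ? "WIN" : "LOSS";
    case "FANCY": {
      // actual >= line: OVER wins; otherwise UNDER wins
      const overWins = result.actualValue! >= result.line!;
      return (pos.side === "OVER") === overWins ? "WIN" : "LOSS";
    }
    case "LINE": {
      // actual result plus handicap, compared to the line
      const adjusted = result.actualValue! + (result.handicap ?? 0);
      const overWins = adjusted >= result.line!;
      return (pos.side === "OVER") === overWins ? "WIN" : "LOSS";
    }
  }
}
```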
Step 3: Position Identification
What happens: Query all open positions for this event:
SELECT * FROM positions
WHERE event_id = :eventId AND status = 'OPEN'
ORDER BY bet_id, cascade_level
Group positions by bet_id, then by agent_id. Each position will be settled individually.
Step 4: Per-Position Settlement (Idempotent)
What happens: For each position:
4a. Generate the idempotency key: {position_id}_{hash(event_result)}
4b. Check if a settlement with this idempotency key already exists. If yes, skip (already settled).
4c. Determine if this position won or lost based on the market resolution.
4d. Calculate the settlement amount:
- WIN: agent pays position.liability to the punter side
- LOSS: agent receives position.stake (the punter loses their stake)
4e. Create the settlement record.
4f. Update the position status to SETTLED.
What can go wrong:
- Idempotency collision on retry: By design, the duplicate is ignored. This makes settlement safe to retry.
- Position not found (deleted or corrupted): Log P1 alert. Add to DLQ for manual investigation.
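The idempotency key from step 4a and the settlement amounts from step 4d might be computed as follows. This is a sketch using Node's crypto; the 16-character hash truncation and the sign convention (negative = agent pays out) are illustrative assumptions:

```typescript
import { createHash } from "node:crypto";

interface PositionRow {
  position_id: string;
  stake: number;      // paisa
  liability: number;  // paisa
}

// Step 4a: {position_id}_{hash(event_result)} — the same result always
// produces the same key, so retried settlements are detected and skipped.
function idempotencyKey(positionId: string, eventResult: object): string {
  const hash = createHash("sha256")
    .update(JSON.stringify(eventResult))
    .digest("hex")
    .slice(0, 16); // truncation length is an illustrative choice
  return `${positionId}_${hash}`;
}

// Step 4d: WIN — the agent pays position.liability to the punter side;
// LOSS — the agent receives position.stake.
function settlementAmount(pos: PositionRow, outcome: "WIN" | "LOSS"): number {
  return outcome === "WIN" ? -pos.liability : pos.stake;
}
```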
Step 5: Exposure Ledger Decrement
What happens: For each settled position, decrement the agent's exposure ledger:
- retained_open_liability -= position.liability (for RETAINED positions)
- forwarded_open_liability -= position.liability (for FORWARDED positions)
- open_potential_win -= position.potential_win
The decrement is within the same database transaction as the position status update.
After the DB commit, update Redis with the new exposure values.
What can go wrong:
- Ledger goes negative: Should never happen. If it does, log P1 alert (indicates a bug). Clamp to zero and flag for reconciliation.
- DB transaction failure: Retry 3 times with exponential backoff. If all retries fail, add to DLQ. The position remains OPEN until manually resolved.
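The decrement plus clamp-to-zero safeguard, sketched as a pure function with an injected alert hook. The field names mirror the ledger described above; the record shape is illustrative:

```typescript
// Step 5 sketch: decrement the exposure ledger for a settled position,
// clamping any negative result to zero and firing the P1 hook.

interface ExposureLedger {
  retained_open_liability: number;
  forwarded_open_liability: number;
  open_potential_win: number;
}

interface SettledPosition {
  liability: number;
  potential_win: number;
  kind: "RETAINED" | "FORWARDED";
}

function decrementLedger(
  ledger: ExposureLedger,
  pos: SettledPosition,
  onNegative: (field: string) => void = () => {}
): ExposureLedger {
  const next = { ...ledger };
  const field =
    pos.kind === "RETAINED"
      ? ("retained_open_liability" as const)
      : ("forwarded_open_liability" as const);
  next[field] -= pos.liability;
  next.open_potential_win -= pos.potential_win;

  // A negative ledger should never happen (it indicates a bug): fire the
  // P1 alert, clamp to zero, and leave the rest to reconciliation.
  const fields = [
    "retained_open_liability",
    "forwarded_open_liability",
    "open_potential_win",
  ] as const;
  for (const f of fields) {
    if (next[f] < 0) { onNegative(f); next[f] = 0; }
  }
  return next;
}
```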
Step 6: Balance Updates
What happens: The platform's internal accounting system is notified of the settlement result. Each agent's balance is credited or debited based on their position outcomes. This is outside the B-Book engine's scope (handled by the existing agentSettlementService.ts) but is triggered by the settlement module.
Step 7: NO_NEW_RISK Re-evaluation
What happens: After settling positions for an event, re-check all agents who were in NO_NEW_RISK for any scope. If the settled positions reduced their exposure below their limit, clear the NO_NEW_RISK flag.
For each agent in NO_NEW_RISK:
new_total = SUM of exposure_ledger shards for this scope
if new_total < limit:
SET no_new_risk_active = false
Clear Redis NO_NEW_RISK flag
Fire P3 alert: "Rajesh exited NO_NEW_RISK for cricket"
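The re-evaluation loop above, sketched as a pure function over the sharded counters (the shard layout and field names are assumptions for illustration):

```typescript
// Step 7 sketch: sum the exposure shards for one scope and clear the
// NO_NEW_RISK flag when total exposure drops below the limit.

interface ScopeState {
  shards: number[];           // sharded exposure counters, paisa
  limit: number;              // paisa
  no_new_risk_active: boolean;
}

function reevaluateNoNewRisk(state: ScopeState): ScopeState {
  const total = state.shards.reduce((sum, shard) => sum + shard, 0);
  if (state.no_new_risk_active && total < state.limit) {
    // In the real system: also clear the Redis flag and fire a P3 alert.
    return { ...state, no_new_risk_active: false };
  }
  return state; // still at or over the limit, or flag was never set
}
```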
Step 8: Reconciliation Check
What happens: After all positions for an event are settled, run a targeted reconciliation for each affected agent. Compare the ledger values to the position sums. If they match, log success. If they differ, categorize and flag per Gap H.
Step 9: Settlement Confirmation
What happens: Mark the event as SETTLED. Emit WebSocket events to all affected agents' dashboards. Queue notifications for settlement summary (WhatsApp, SMS).
54. Configuration Management
How Forwarding Matrix Rules Are Stored, Versioned, and Cached
Storage: Matrix rules are stored in the forwarding_matrix_rules table. Each rule has a version number that is incremented on any change to the agent's matrix (adding, updating, or deleting a rule).
Versioning: When any rule for an agent changes:
- Increment the agent's matrix version (stored on the agent record or derived from MAX(version) of active rules)
- Log the change in config_changelog with old_value and new_value
- The new version takes effect immediately
Caching strategy:
- Redis: The full set of active rules for each agent is stored as a Redis hash matrix:{agent_id}. The hash contains the serialized rules and the version number. TTL: until invalidated.
- LRU: Each instance caches the deserialized rules in-memory for 5 minutes. On cache miss, read from Redis.
- On change: Publish to Redis pub/sub channel config.invalidate with {type: "MATRIX_UPDATE", agent_id: "xxx", version: 48}. All instances evict the agent's matrix from their LRU cache. The next request triggers a Redis read.
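The per-instance invalidation handler might look like this sketch, with the LRU modeled as a plain Map and the pub/sub wiring omitted. Aside from the config.invalidate message shape, class and method names are illustrative:

```typescript
// Sketch of the handler each instance registers on the config.invalidate
// channel: evict the agent's matrix so the next request re-reads Redis.

interface InvalidateMsg {
  type: string;       // e.g. "MATRIX_UPDATE"
  agent_id: string;
  version: number;
}

class MatrixCache {
  private lru = new Map<string, { rules: unknown; version: number }>();

  set(agentId: string, rules: unknown, version: number): void {
    this.lru.set(agentId, { rules, version });
  }

  get(agentId: string): { rules: unknown; version: number } | undefined {
    return this.lru.get(agentId);
  }

  // Pub/sub callback: raw is the JSON string published on the channel.
  onInvalidate(raw: string): void {
    const msg: InvalidateMsg = JSON.parse(raw);
    if (msg.type === "MATRIX_UPDATE") this.lru.delete(msg.agent_id);
  }
}
```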
Cache Invalidation Strategy (Summary)
| Trigger | Action | Propagation |
|---|---|---|
| Matrix rule CRUD | Invalidate Redis hash for agent, publish pub/sub | All instances evict LRU within 50ms |
| Agent limit change | Invalidate Redis key for agent limits, publish pub/sub | All instances evict LRU within 50ms |
| User override change | Invalidate Redis key for user overrides | Single key, no broadcast needed (user-specific) |
| Market override change | Invalidate Redis key for market overrides, publish pub/sub | All instances evict LRU within 50ms |
| Agent config change | Invalidate Redis key for agent config, publish pub/sub | All instances evict LRU within 50ms |
| Emergency flush | Admin API triggers full cache flush for an agent | Deletes all Redis keys for the agent, broadcasts LRU eviction |
55. Error Handling Patterns
Error Categories
| Category | Examples | Handling |
|---|---|---|
| Validation errors | Bad input, missing fields, invalid types | Return 400 immediately. No processing, no audit record. |
| Business rule errors | Self-excluded user, suspended market, below minimum stake | Return 200 with status: REJECTED and reason. Audit record created (bet was attempted). |
| Transient infrastructure errors | Redis timeout, DB connection pool exhausted, network blip | Retry up to 3 times with exponential backoff (100ms, 200ms, 400ms). If all retries fail, fall back or return 503. |
| Permanent infrastructure errors | DB down, Redis down for extended period | Circuit breaker opens after 5 consecutive failures. All bets fall back to safe defaults (100% forward). |
| Data integrity errors | Negative ledger, missing agent in hierarchy, orphaned position | P1 alert. DLQ entry. Manual investigation required. |
| External service errors | Betfair API error, odds feed stale | Hedge queue absorbs Betfair errors. Stale odds suspend the market. |
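A minimal circuit-breaker sketch matching the open-after-N-consecutive-failures behavior above. The half-open probe after a cooldown and the injected clock are illustrative additions, not specified here:

```typescript
// Sketch: opens after `threshold` consecutive failures; while open, the
// caller skips the operation and uses the safe default (100% forward).
// After `cooldownMs` a probe is allowed through (half-open).

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private threshold = 5,
    private cooldownMs = 10_000,
    private now: () => number = Date.now // injected for testability
  ) {}

  get isOpen(): boolean {
    if (this.openedAt === null) return false;
    // Half-open: once the cooldown elapses, let one probe through.
    return this.now() - this.openedAt < this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures++;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```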
Retry Policies
| Operation | Max Retries | Backoff | Circuit Breaker |
|---|---|---|---|
| Redis read | 2 | 50ms, 100ms | Opens after 5 failures in 10 seconds |
| Redis write | 2 | 50ms, 100ms | Opens after 5 failures in 10 seconds |
| PostgreSQL read | 2 | 100ms, 200ms | Opens after 3 failures in 30 seconds |
| PostgreSQL write (bet) | 1 | 100ms | No CB (every write is critical) |
| Betfair API | 3 | 1s, 2s, 4s | Opens after 5 failures in 60 seconds |
| Settlement processing | 3 | 1s, 5s, 30s | No CB (must eventually settle) |
| Audit write | 3 | 100ms, 500ms, 2s | Falls back to local WAL file |
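The retry policies above share one shape: a fixed backoff schedule, then fall back or escalate. A generic helper sketch (the sleep is injected so timing can be controlled; the helper name is illustrative):

```typescript
// Sketch: run `op`, retrying once per entry in `backoffMs` with that
// delay between attempts. E.g. a Redis read uses [50, 100] (2 retries);
// an audit write uses [100, 500, 2000] before the WAL fallback.

async function withRetry<T>(
  op: () => Promise<T>,
  backoffMs: number[],
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms))
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= backoffMs.length; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      if (attempt < backoffMs.length) await sleep(backoffMs[attempt]);
    }
  }
  // Exhausted: the caller falls back, returns 503, or adds a DLQ entry.
  throw lastErr;
}
```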
Dead Letter Queue Integration
Any operation that fails after exhausting its retries is added to the DLQ:
DLQ Entry:
source: "BET_PROCESSING" | "SETTLEMENT" | "HEDGE" | "RECONCILIATION"
reference_id: The failed entity's ID
error: The error message and stack trace
payload: Full context needed to retry manually
retry_count: How many times it was already retried
max_retries: The configured maximum
The DLQ is monitored by the ops team. A P2 alert fires when any entry is added. Entries can be retried via admin API or resolved manually with notes.
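The DLQ entry above as a typed record, with a small in-memory queue illustrating the alert-on-insert behavior. In production the queue would be a database table; the class and hook names are illustrative:

```typescript
// Sketch of the dead letter queue: every insert fires a P2 alert so the
// ops team sees failures immediately.

type DlqSource = "BET_PROCESSING" | "SETTLEMENT" | "HEDGE" | "RECONCILIATION";

interface DlqEntry {
  source: DlqSource;
  reference_id: string;  // the failed entity's ID
  error: string;         // message and stack trace
  payload: unknown;      // full context needed to retry manually
  retry_count: number;   // how many times it was already retried
  max_retries: number;   // the configured maximum
}

class DeadLetterQueue {
  private entries: DlqEntry[] = [];

  constructor(private alertP2: (entry: DlqEntry) => void = () => {}) {}

  add(entry: DlqEntry): void {
    this.entries.push(entry);
    this.alertP2(entry); // a P2 alert fires on every insert
  }

  get size(): number {
    return this.entries.length;
  }
}
```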
56. Testing Strategy
Unit Test Coverage Targets
| Module | Coverage Target | Key Test Scenarios |
|---|---|---|
| MatrixResolutionModule | 95% | Wildcard matching, specificity tie-breaking, precedence chain order, missing rules fallback |
| CascadeEngineModule | 95% | 2-level cascade, 4-level cascade, limit overflow, suspended agent skip, NO_NEW_RISK hedge detection |
| LimitEnforcementModule | 95% | All limit types, most restrictive wins, exact boundary (limit - 1 paisa), period-aware checking |
| ExposureLedgerModule | 90% | Sharded increment, shard summation, Redis fallback, post-write validation |
| SettlementModule | 95% | WIN/LOSS/VOID settlement, idempotency, ledger decrement, re-settlement |
| StakeReductionModule | 95% | Per-click reduction, aggregate reduction, below-minimum rejection, edge case odds (1.01, 1000.00) |
| HedgeExecutionModule | 90% | Full fill, partial fill, no fill, re-pricing, stale cleanup, Betfair error handling |
| ReconciliationModule | 90% | Zero drift, minor drift auto-correct, major drift flagging, recompute tool |
Integration Test Scenarios
| Scenario | What It Tests | Expected Outcome |
|---|---|---|
| Full cascade bet placement | Bet flows from punter through 3 agents to Betfair | Positions created at all levels, exposure ledgers updated, audit trail complete, hedge order queued |
| Limit overflow cascade | Bet where Level 1 agent hits limit | Overflow correctly forwarded to Level 2, Level 1 retains only up to limit |
| NO_NEW_RISK with hedge | Agent in NO_NEW_RISK, opposite-side bet arrives | Hedge bet accepted, exposure reduced, non-hedge bet rejected |
| Stake reduction | High-odds bet exceeding per-click win limit | Stake reduced, punter receives reduced confirmation, positions reflect reduced stake |
| Settlement cascade | Event settles, positions across 3 agents | All positions settled, ledgers decremented, NO_NEW_RISK cleared if applicable |
| Matrix change mid-session | Agent changes matrix between two bets | First bet uses old matrix (captured version), second bet uses new matrix |
| Concurrent bets near limit | 10 simultaneous bets where agent is at 95% utilization | All bets processed, total does not exceed limit (post-write validation corrects any overshoot) |
| Betfair timeout | Hedge order placed but Betfair returns 503 | Order queued for retry, unhedged tracker updated, bet still accepted |
| Redis outage | Redis becomes unavailable mid-operation | System falls back to PostgreSQL, latency increases but correctness maintained |
| Agent suspension | Agent suspended mid-cascade | Bets skip suspended agent, flow to platform |
Load Test Scenarios (IPL Peak Simulation)
| Scenario | Traffic Pattern | Success Criteria |
|---|---|---|
| Sustained peak | 167 bets/sec for 30 minutes | P99 latency < 90ms, zero errors, all positions correct |
| Burst spike | 500 bets/sec for 60 seconds | P99 latency < 200ms, error rate < 0.1%, all positions eventually correct (post-write corrections acceptable) |
| Settlement storm | 3 events settle simultaneously (10,000 positions) | Settlement completes within 5 minutes, no ledger drift, all agents notified |
| Hedge backlog | 200 hedge orders queued, Betfair at 2x normal latency | Queue drains within 10 minutes, no orders lost, unhedged tracker accurate |
Chaos Test Scenarios
| Scenario | How to Simulate | Expected Behavior |
|---|---|---|
| Redis primary down | Kill Redis process | Circuit breaker opens within 3 seconds. All reads fall back to PostgreSQL. Latency increases to 15-25ms. No data loss. |
| PostgreSQL primary down | Kill PostgreSQL process | All bet placement fails. Circuit breaker opens. 503 errors returned. Alert fires. |
| Betfair API down | Block outbound to Betfair endpoint | Hedge queue grows. Unhedged tracker increases. Bets still accepted. Platform absorbs risk. |
| Network partition between app and Redis | iptables rule | Same as Redis down, but Redis may still serve other instances. Instance-specific fallback. |
| Slow PostgreSQL (10x normal latency) | Add pg_sleep(0.05) to a connection | P99 latency increases. Some bets exceed 90ms budget. No data loss. Monitor alerts fire. |
57. Deployment Strategy
Docker Compose for Local Development
The local development environment runs all services in Docker Compose:
Services:
app: Node.js application (3 instances for multi-instance testing)
postgres: PostgreSQL 16 (single instance, no replicas locally)
redis: Redis 7 (single instance)
prometheus: Prometheus (metrics scraping)
grafana: Grafana (dashboards)
Volumes:
postgres_data: Persistent database storage
redis_data: Persistent Redis storage (for testing persistence)
Networks:
hannibal_net: Internal network for all services
Production Deployment
| Component | Infrastructure | Scaling |
|---|---|---|
| Application (3 instances) | Docker containers on VM or managed container service | Horizontal: add instances, update load balancer |
| Background workers (2 instances) | Docker containers | Vertical: add CPU/RAM. Horizontal: add consumer instances for BullMQ |
| PostgreSQL Primary | Managed PostgreSQL (e.g., AWS RDS, DigitalOcean Managed DB) | Vertical: increase instance size. Horizontal: add read replicas |
| PostgreSQL Read Replicas (2) | Managed PostgreSQL replicas | Add more replicas for read scaling |
| Redis | Managed Redis (e.g., AWS ElastiCache, Redis Cloud) | Vertical: increase memory. Horizontal: Redis Cluster if needed |
| Load Balancer | nginx or cloud ALB | Managed, auto-scaling |
| Monitoring | Prometheus + Grafana on dedicated VM | Single instance sufficient |
Feature Flag Rollout Process
- Develop feature behind feature flag (default: OFF)
- Deploy code to production (feature inactive)
- Enable flag for a single test agent (internal or friendly agent)
- Monitor for 24-48 hours. Check: latency, error rate, audit trail correctness
- Enable for 3-5 early adopter agents
- Monitor for 1 week. Check: P&L accuracy, settlement correctness, reconciliation results
- Enable for all agents (flag default becomes ON)
- After 2 weeks with no issues, remove the feature flag code (clean up)
58. Implementation Phases
Phase Dependencies
DATA MODELS ─────────────────┐
│
AUDIT TRAIL ─────────────────┤
│
USER WIN LIMITS ─────────────┤
│
FORWARDING MATRIX ───────────┤──── All independent, can be parallelized
│
EXPOSURE LEDGER (Redis) ─────┤
│
AGENT LIMITS ────────────────┘
│
▼
CASCADE ENGINE ──────────────── Depends on: Matrix, Limits, Ledger
│
▼
NO_NEW_RISK + HEDGE DETECTION ── Depends on: Cascade Engine, Exposure Ledger
│
▼
PERIOD MANAGEMENT ───────────── Depends on: Limits, Ledger, NO_NEW_RISK
│
▼
SETTLEMENT CASCADE ──────────── Depends on: Cascade Engine, Exposure Ledger
│
▼
HEDGE EXECUTION ─────────────── Depends on: Cascade Engine (hedge orders)
│
▼
RECONCILIATION ──────────────── Depends on: Exposure Ledger, Settlement
│
▼
MONITORING + ALERTING ───────── Depends on: All modules (metrics from everywhere)
│
▼
SUPPORT TOOLING ─────────────── Depends on: Audit Trail, All modules
MVP Definition (First Live Bet)
The absolute minimum to accept a live bet through the cascade:
- Agent and user tables populated
- One forwarding matrix rule per agent (catch-all wildcard)
- Agent limits configured (sport-level at minimum)
- Exposure ledger initialized (all zeros)
- Cascade engine processing a 2-level hierarchy (agent → platform)
- Positions created for both levels
- Audit trail recording the decision
- Settlement for a single market type (MATCH_ODDS)
NOT required for MVP: Redis caching (use PostgreSQL only), hedge execution (platform absorbs all risk), NO_NEW_RISK, periods, stake reduction, sharded counters, monitoring dashboards.
Phase 1: Foundation (Weeks 1-4)
| Week | Deliverables |
|---|---|
| 1 | Prisma schema migration: all tables defined above. Database seeded with test agents (Vikram, Rajesh, Priya) and test users (Amit, Sonia). Feature flag infrastructure. |
| 2 | ExposureLedgerModule: PostgreSQL-only reads and writes. No Redis yet. No sharding yet. Single counter per agent per scope. LimitEnforcementModule: Check all limit types, return max retainable amount. |
| 3 | MatrixResolutionModule: Full 5D wildcard matching with specificity tie-breaking. Precedence chain (user override > market override > matrix > default). ConfigModule: Load matrix rules from DB, cache in memory. |
| 4 | UserManagementModule: Per-click win cap check. Aggregate win cap check (PostgreSQL-based). StakeReductionModule. AuditModule: Synchronous audit writes (no buffering yet). |
End of Phase 1: All building blocks exist but are not connected into a pipeline.
Phase 2: Core Pipeline (Weeks 5-8)
| Week | Deliverables |
|---|---|
| 5 | CascadeEngineModule: Full N-level cascade with matrix resolution and limit checking at each level. Overflow handling. Suspended agent skip. BetProcessingModule: Orchestrates the entire pipeline from HTTP request to response. |
| 6 | Position creation. Exposure ledger updates (atomic with positions). End-to-end bet placement through 3 levels. Integration tests for the full pipeline. |
| 7 | SettlementModule: Event result processing. Position settlement (idempotent). Ledger decrement. Re-settlement support. |
| 8 | Redis integration: Exposure ledger reads from Redis. Cache invalidation via pub/sub. Safety margin logic (Gap A). Feature flag: enable cascade per agent. Parallel-run mode. |
End of Phase 2: The system can accept and settle bets through the full cascade. MVP is achievable.
Phase 3: Production Hardening (Weeks 9-12)
| Week | Deliverables |
|---|---|
| 9 | NO_NEW_RISK: Automatic trigger, hedge detection (worst-case liability comparison), scoped activation. Period management: Night and weekly periods, timezone handling, carry-forward logic. |
| 10 | HedgeExecutionModule: Betfair API integration, limit orders, partial fill handling, re-pricing, stale cleanup. Hedge order queue (Redis Stream). Unhedged exposure tracker. |
| 11 | Sharded exposure counters. Per-level atomicity for hot agents. Post-write validation and rollback (Gap A). Multi-instance cache coherency (Gap B). |
| 12 | ReconciliationModule: Scheduled 15-minute checks, post-settlement checks, recompute tool, discrepancy tracking. Dead letter queue with admin UI. |
End of Phase 3: Production-ready for a controlled launch with select agents.
Phase 4: Scale and Polish (Weeks 13-16)
| Week | Deliverables |
|---|---|
| 13 | MonitoringModule: Prometheus metrics for all pipeline stages. Grafana dashboards (ops, reconciliation, agent health). AlertManager integration with PagerDuty/Slack. |
| 14 | Support tooling: Bet lookup, audit trail visualization, re-simulate capability, dispute workflow. Agent dashboard enhancements: real-time exposure, traffic light view, WhatsApp integration. |
| 15 | Responsible gambling: Self-exclusion, session limits, reality checks, deposit limit hooks. Migration tooling: Parallel-run reports, per-agent cutover, rollback capability. |
| 16 | Load testing: Sustained peak (167 bets/sec), burst spike (500 bets/sec), settlement storm. Chaos testing: Redis down, PostgreSQL slow, Betfair down. Performance optimization based on load test results. |
End of Phase 4: Full system ready for IPL season launch.
Phase 5: Intelligence (Weeks 17+)
| Deliverable | Description |
|---|---|
| Sharp detection integration | CLV calculation, behavioral scoring, automatic classification updates feeding into forwarding matrix source_type |
| Cross-agent syndicate detection | Correlated bet analysis across partitions, real-time flagging |
| Execution quality analytics | Hedge slippage analysis, optimal slippage parameter tuning |
| Matrix optimization suggestions | Historical P&L analysis per matrix rule, recommendations for retention adjustment |
| Horizontal scaling implementation | Agent-affinity routing, cross-partition detection, load balancing (Gap F) |
| Audit trail tier migration | Hot/warm/cold storage with automated nightly migration (Gap E) |
This completes the full implementation architecture. Every table, every API, every pipeline step, every error case, and every phase is documented. An LLM reading this document alongside the B-Book Architecture v2.0 can build the entire Hannibal B-Book system without asking a single question about design intent, data models, or processing logic. Where ambiguity existed, a decision was made and the reasoning was documented.
This document is maintained by the Hannibal engineering and product teams. For questions, feedback, or proposed changes, contact the B-Book working group.