SmartBets Order Book Service
Global Architecture & Design Specification
Version: 2.0 (Revamped) Date: December 2025 Status: Canonical Reference
Table of Contents
| Part | Title | Description |
|---|---|---|
| I | Executive Summary | What this is, key metrics, scope |
| II | Business Context | Betting exchanges, BACK/LAY explained |
| III | The Philosophy | Design principles, why we made these choices |
| IV | System Architecture | Requirements, constraints, the pipeline |
| V | Data Structures Deep Dive | Memory layout, BTreeMap, HashMap, matching algorithm, order lifecycle |
| VI | Matching Mechanics & API | gRPC API, error codes, performance targets |
| VII | Durability & Recovery | fsync barrier, crash recovery, deployment, monitoring |
| VIII | Performance Engineering | Journal format, TIF options, sharding |
| IX | Operations & Deployment | Market lifecycle |
| X | Security | Security model, what we do/don't do |
| XI | Testing Strategy | Test pyramid, critical scenarios |
| XII | Balance Management | How Backend manages funds |
| XIII | References & Resources | Papers, videos, books |
| XIV | FAQ | Common questions answered |
| XV | Glossary | Term definitions |
Part I: Executive Summary
What Is This Document?
This is the canonical architecture specification for the SmartBets Order Book Service - a high-performance, fault-tolerant matching engine for a betting exchange. If you need to understand how orders are matched, how data flows, or how we achieve zero trade loss, this is your source of truth.
The One-Paragraph Summary
The Order Book Service is a Rust-based matching engine that accepts orders via gRPC, sequences them with a monotonic counter, persists them to an append-only journal, matches them in a single-threaded engine, and streams results back to clients. It is inspired by the LMAX Disruptor architecture used by the London Multi-Asset Exchange.
Key Metrics At A Glance
Diagram Explanation:
This mind map shows the three pillars of the Order Book Service:
| Pillar | What It Means | Why It Matters |
|---|---|---|
| Performance | Fast and consistent | Users expect instant order confirmation |
| Reliability | No data loss, ever | Financial system - can't lose trades |
| Simplicity | Easy to reason about | Fewer bugs, easier debugging |
What This Service Does (And Doesn't Do)
The Key Insight: The Order Book is a pure matching engine. It trusts that the SmartBets Backend has already verified the user, checked their balance, and authorized the order. This separation of concerns keeps authentication and balance lookups off the matching path, which is what allows the Order Book to stay fast.
Part II: Business Context
What Is A Betting Exchange?
A betting exchange is fundamentally different from a traditional bookmaker:
Diagram Explanation:
| Model | How It Works | Who Takes Risk |
|---|---|---|
| Traditional Bookmaker | You bet against the house. House sets odds. | The bookmaker |
| Betting Exchange | You bet against other users. Exchange matches you. | Other users |
Why Exchanges Are Better:
- Better Odds: No bookmaker margin baked in
- Two-Way Market: Can bet FOR or AGAINST outcomes
- Transparent Liquidity: See all available bets before placing yours
BACK vs LAY: The Two Sides of Every Bet
This is the most important concept to understand:
Diagram Explanation:
| Side | You're Betting | Your Risk | Your Reward |
|---|---|---|---|
| BACK | FOR the outcome | Your stake ($100) | Stake × (Odds - 1) = $100 |
| LAY | AGAINST the outcome | Liability = Stake × (Odds - 1) = $100 | The backer's stake ($100) |
Real-World Example:
| Scenario | User A (BACK) | User B (LAY) |
|---|---|---|
| Man Utd WINS | +$100 profit | -$100 loss |
| Man Utd LOSES | -$100 loss | +$100 profit |
The Insight: For every BACK bet, someone must be willing to LAY. The exchange's job is to match these two parties at a mutually agreeable price.
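The BACK/LAY arithmetic above can be sketched with integers only. This is a minimal illustration, assuming stakes are held in cents and odds use the 10000 = 1.00 scale given in the Glossary. The function name is hypothetical, and truncating division is an assumption, since this document does not specify a rounding policy.

```rust
/// Odds scale assumed from the Glossary: 10000 = 1.00, so 2.00 = 20000.
const ODDS_SCALE: u64 = 10_000;

/// Backer's potential profit in cents, which equals the layer's liability:
/// stake x (odds - 1). Returns None for odds below 1.00 or on overflow.
/// Integer division truncates; the real rounding policy is an assumption.
pub fn back_profit_cents(stake_cents: u64, odds_bp: u64) -> Option<u64> {
    let net_odds = odds_bp.checked_sub(ODDS_SCALE)?;
    stake_cents.checked_mul(net_odds).map(|v| v / ODDS_SCALE)
}
```

With a $100 stake (10000 cents) at odds 2.00 (20000), the result is 10000 cents, matching the $100 profit and $100 liability in the table.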
How SmartBets Uses The Order Book
Diagram Explanation:
This sequence shows the complete journey of a bet:
| Step | Component | What Happens | If It Fails |
|---|---|---|---|
| 1-2 | Frontend → Backend | User intent captured | Show error UI |
| 3-5 | Backend | Auth + Balance check | Reject with "Insufficient funds" |
| 6-8 | Order Book | Matching magic | Retry or timeout |
| 9-11 | Backend | Record & notify | Eventual consistency |
Critical Point: The Order Book (steps 6-8) is the fastest part. Most latency is in steps 3-5 (database) and 9-10 (blockchain).
Part III: The Philosophy
The Core Objective
The SmartBets Order Book Service is designed with a single, uncompromising goal: Deterministic Performance.
In the world of betting exchanges, "fast" is not enough. The system must be fast consistently. A 50ms trade execution is useless if the next one takes 500ms due to a Garbage Collection pause or a database lock.
Our Guarantees:
| Metric | Target | Why It Matters |
|---|---|---|
| Latency P99 | < 50ms | Users expect instant response |
| Throughput | 10,000 orders/sec | Handle peak load |
| Durability | 0% trade loss | Financial regulation |
| Recovery | < 2 seconds | Minimize downtime |
The Great Design Filter
Every architectural decision was passed through a specific filter: "Does this introduce non-determinism?"
The Design Decisions Explained:
| Standard Approach | Why We Rejected It | Our "LMAX" Approach |
|---|---|---|
| Microservices Mesh | Network jitter makes ordering impossible | Monolithic Logic Core: One sequence, one truth |
| Relational DB (SQL) | Locks and buffer pools cause latency spikes | In-Memory State: All data in RAM |
| Multi-Threading | Context switches and mutex contention | Single-Threaded Core: One CPU, one thread |
| Floating Point | 0.1 + 0.2 != 0.3 breaks financial math | Integers (Basis Points): odds 2.00 = 20000 |
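The floating-point row can be checked directly. The sketch below is only a demonstration of the problem, not engine code; both function names are hypothetical, and the integer version assumes the 10000 = 1.00 scale.

```rust
/// Returns true if f64 addition reproduces the decimal result exactly.
/// With IEEE 754 doubles, 0.1 + 0.2 is slightly larger than 0.3.
pub fn float_sum_is_exact() -> bool {
    0.1_f64 + 0.2_f64 == 0.3_f64
}

/// The same sum in integer basis points (assumed scale: 10000 = 1.00),
/// so 0.1 = 1000, 0.2 = 2000 and 0.3 = 3000.
pub fn integer_sum_is_exact() -> bool {
    1_000_u64 + 2_000_u64 == 3_000_u64
}
```

The float comparison fails while the integer comparison holds, which is why every price and quantity in the engine is an integer.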
Part IV: System Architecture
Requirements & Constraints
Before diving into the architecture, let's establish what we're building for:
Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| FR-1 | Accept BACK and LAY orders for any market | Must Have |
| FR-2 | Match orders using Price-Time Priority | Must Have |
| FR-3 | Support order cancellation | Must Have |
| FR-4 | Emit trade events in real-time | Must Have |
| FR-5 | Recover from crashes without data loss | Must Have |
| FR-6 | Support market lifecycle (create, open, suspend, close) | Must Have |
| FR-7 | Provide order book snapshots on demand | Should Have |
| FR-8 | Support multiple Time-In-Force options (GTC, IOC, FOK) | Should Have |
Non-Functional Requirements
| ID | Requirement | Target | Rationale |
|---|---|---|---|
| NFR-1 | Throughput | 10,000 orders/sec | Peak load during major events |
| NFR-2 | P99 Latency | < 50ms | User experience |
| NFR-3 | Recovery Time | < 2 seconds | Minimize downtime |
| NFR-4 | Durability | 0% trade loss | Financial regulation |
| NFR-5 | Memory | < 500MB for 3M orders | Cost efficiency |
| NFR-6 | Availability | 99.9% (about 8.8 hours/year downtime) | Business requirement |
Assumptions
| Assumption | Implication |
|---|---|
| SmartBets Backend validates user authentication | Order Book doesn't need auth logic |
| SmartBets Backend checks user balances | Order Book doesn't need balance logic |
| Orders arrive via gRPC from trusted internal network | No need for rate limiting at Order Book level |
| Single datacenter deployment initially | No cross-region replication needed |
| Markets are independent (no cross-market orders) | Enables future sharding |
Constraints
| Constraint | Reason |
|---|---|
| Single-threaded matching engine | Determinism requirement |
| Rust programming language | Performance + safety |
| Kubernetes deployment | Infrastructure standardization |
| NVMe storage for journal | Latency requirement |
System Context
The Order Book is a "Dark Service". It does not speak to the outside world. It sits behind the SmartBets Backend, accepting a stream of commands and emitting a stream of facts.
Diagram Explanation:
This System Context diagram shows the Order Book's position in the architecture:
| Component | Role | Why It Matters |
|---|---|---|
| User | End customer placing bets | Never touches Order Book directly |
| SmartBets Backend | Authentication, validation, balance checks | The "gatekeeper" that protects Order Book |
| gRPC Gateway | Protocol translation, connection management | Handles network complexity |
| The Pipeline | Sequencing → Journal → Matching → Publishing | The deterministic core |
Key Insight: The Order Book is intentionally isolated. It doesn't know about users, balances, or the outside world. It only understands: "Here's an order. Match it."
The "Pipe" Design
The system is modeled as a unidirectional pipeline of events. Data flows in one direction, is transformed, sequenced, persisted, and executed.
Diagram Explanation:
This is the heart of the architecture - "The Pipe". Every order flows through this exact sequence:
| Stage | Component | What Happens | Failure Mode |
|---|---|---|---|
| 1 | gRPC Server | Decode request, validate structure | Returns error immediately |
| 2 | Sequencer | Assign sequence number (e.g., 1000001) | Blocks until available |
| 3 | Event Journal | Write to disk, call fsync() | Order "never existed" |
| 4 | Matching Engine | Execute matching algorithm | N/A (deterministic) |
| 5 | Event Publisher | Stream results to clients | Client reconnects |
Why "The Durability Barrier" is Orange: This is the critical safety zone. The order is NOT acknowledged until it crosses this barrier. If we crash inside it, the order is lost, but the client never received an acknowledgment, so it sees a timeout and knows the order did not go through. Consistency preserved.
Why "Matching Engine" is Purple: This is the single-threaded core. No locks, no contention, no surprises. Pure deterministic execution.
Component Deep Dive
1. The gRPC Gateway (The Doorman)
Role: Validation & Translation.
| Responsibility | Details |
|---|---|
| Connection Management | Handles thousands of TCP connections via Tokio async runtime |
| Protocol Decoding | Deserializes Protocol Buffer messages |
| Structural Validation | Rejects malformed requests (negative prices, missing fields) |
| Threading Model | Multi-threaded worker pool absorbs network concurrency |
2. The Sequencer (The Ticket Taker)
Role: Total Ordering.
| The Problem | The Solution |
|---|---|
| Two orders arrive at the "same" nanosecond | Sequencer assigns strict ordering |
| Distributed clocks disagree on time | Single sequence number is the truth |
| Replay must be deterministic | Sequence guarantees identical replay |
The Law: If Order A gets Seq: 100 and Order B gets Seq: 101, then Order A happened before Order B. Period. This is the source of truth for the entire universe.
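A monotonic counter of this kind could be sketched as follows. The `Sequencer` type and its method names are hypothetical, and the use of an atomic is an assumption made so that multi-threaded gateway workers can share it; a sequencer pinned to one thread could use a plain `u64`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical sequencer: hands out strictly increasing sequence numbers.
pub struct Sequencer {
    next: AtomicU64,
}

impl Sequencer {
    /// `start` is the first number to issue, for example the last
    /// journaled sequence + 1 after a recovery.
    pub fn new(start: u64) -> Self {
        Self { next: AtomicU64::new(start) }
    }

    /// Atomically claims the next sequence number. No two callers can
    /// ever receive the same value.
    pub fn next_sequence(&self) -> u64 {
        self.next.fetch_add(1, Ordering::SeqCst)
    }
}
```

`fetch_add` returns the value before the increment, so a sequencer created with `Sequencer::new(100)` issues 100, then 101.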
3. The Event Journal (The Black Box)
Role: Durability & Replay.
| Principle | Implementation |
|---|---|
| Append-Only | Never modify or delete entries |
| Write-Ahead | Written BEFORE processing |
| Durable | fsync() before acknowledgment |
The Guarantee: We do not send Ack: OK to the user until fsync() returns success.
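The write-then-fsync rule might look like the sketch below, using only the Rust standard library. The `Journal` type and `append_durable` are hypothetical names, and the record bytes are opaque here; batching several records per fsync is a common refinement that is not shown.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical append-only journal writer.
pub struct Journal {
    file: File,
}

impl Journal {
    /// Opens (or creates) the journal file in append mode.
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file })
    }

    /// Appends one record and returns only after the OS reports the data
    /// flushed to the device. The caller sends `Ack: OK` only on `Ok`.
    pub fn append_durable(&mut self, record: &[u8]) -> io::Result<()> {
        self.file.write_all(record)?;
        // sync_all() flushes file data and metadata (fsync on Unix).
        self.file.sync_all()
    }
}
```

If `append_durable` returns an error, the order is never acknowledged and, from the client's point of view, never existed.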
4. The Matching Engine (The Brain)
Role: Execution.
| Aspect | Choice | Rationale |
|---|---|---|
| Threading | Single-Threaded | No locks = no stalls |
| State | In-Memory (HashMap/BTreeMap) | Microsecond access |
| Design | Event-driven | Deterministic replay |
5. The Event Publisher (The Broadcaster)
Role: Dissemination.
| Feature | Behavior |
|---|---|
| Decoupling | Fast engine, slow network - never blocks |
| Backpressure | Slow clients are DROPPED, not waited for |
| Fan-out | One trade → many subscribers |
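The "drop, don't wait" backpressure rule can be illustrated with bounded channels from the standard library. This is a sketch under assumptions: the `Publisher` type is hypothetical, the real service streams over gRPC rather than in-process channels, and the queue capacity is assumed to be at least 1.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical fan-out publisher with one bounded queue per subscriber.
pub struct Publisher<T: Clone> {
    subscribers: Vec<SyncSender<T>>,
    capacity: usize,
}

impl<T: Clone> Publisher<T> {
    /// `capacity` is the per-subscriber queue size (assumed >= 1).
    pub fn new(capacity: usize) -> Self {
        Self { subscribers: Vec::new(), capacity }
    }

    /// Registers a subscriber and returns its receiving end.
    pub fn subscribe(&mut self) -> Receiver<T> {
        let (tx, rx) = sync_channel(self.capacity);
        self.subscribers.push(tx);
        rx
    }

    /// Sends `event` to every subscriber without blocking. A subscriber
    /// whose queue is full or disconnected is dropped.
    /// Returns the number of subscribers dropped by this call.
    pub fn publish(&mut self, event: T) -> usize {
        let before = self.subscribers.len();
        self.subscribers.retain(|tx| tx.try_send(event.clone()).is_ok());
        before - self.subscribers.len()
    }

    pub fn subscriber_count(&self) -> usize {
        self.subscribers.len()
    }
}
```

Because `try_send` never blocks, a stalled client costs the engine at most one failed send before it is removed; the client is then expected to reconnect.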
Part V: Data Structures Deep Dive
1. The Memory Landscape
The Matching Engine holds the entire "Known Universe" in RAM. This ensures O(1) or O(log N) access times for every operation.
Diagram Explanation:
This diagram shows how data is organized in memory for maximum performance:
| Data Structure | Purpose | Access Time |
|---|---|---|
| HashMap: MarketID → Market | Find any market instantly | O(1) |
| HashMap: OrderID → Pointer | Cancel any order instantly | O(1) |
| BTree: Price → PriceLevel | Find best price, iterate in order | O(log N) |
| Queue: Orders at same price | Time priority (FIFO) | O(1) |
Why This Structure?
| Operation | How It Works | Speed |
|---|---|---|
| Find market "ABC" | Hash lookup in Markets map | ~50 nanoseconds |
| Find best BACK price | First or last element of BTree (depending on side) | ~10 nanoseconds |
| Cancel order XYZ | Hash lookup → pointer → remove | ~100 nanoseconds |
| Add order at price 2.50 | BTree insert + queue append | ~200 nanoseconds |
Why BTreeMap (not HashMap) for Prices?
| Reason | Explanation |
|---|---|
| Sorted Keys | Always need "best price" (first/last item) |
| Range Queries | "Top 5 price levels" is trivial |
| Cache Locality | Each B-Tree node stores several keys contiguously in memory |
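The layout described above could be sketched as one side of one market's book. All type and method names here are hypothetical. The order index maps to a price level rather than a raw pointer, so `cancel` scans one level's queue; that is a simplification of the O(1) cancel listed in the table.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

pub type OrderId = u64;
/// Price in basis points (2.00 = 20000), as in the API section.
pub type Price = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct Order {
    pub id: OrderId,
    pub price: Price,
    pub quantity: u64,
}

/// One side (BACK or LAY) of a single market's book.
#[derive(Default)]
pub struct BookSide {
    /// Sorted price levels; each level is a FIFO queue for time priority.
    levels: BTreeMap<Price, VecDeque<Order>>,
    /// OrderId -> price level, standing in for the "pointer" in the table.
    index: HashMap<OrderId, Price>,
}

impl BookSide {
    pub fn insert(&mut self, order: Order) {
        self.index.insert(order.id, order.price);
        self.levels.entry(order.price).or_default().push_back(order);
    }

    /// Removes an order; empty price levels are deleted from the tree.
    pub fn cancel(&mut self, id: OrderId) -> Option<Order> {
        let price = self.index.remove(&id)?;
        let queue = self.levels.get_mut(&price)?;
        let position = queue.iter().position(|o| o.id == id)?;
        let order = queue.remove(position);
        let level_empty = queue.is_empty();
        if level_empty {
            self.levels.remove(&price);
        }
        order
    }

    pub fn lowest_price(&self) -> Option<Price> {
        self.levels.first_key_value().map(|(price, _)| *price)
    }

    pub fn highest_price(&self) -> Option<Price> {
        self.levels.last_key_value().map(|(price, _)| *price)
    }

    /// First `n` levels in ascending price order, with total quantity each.
    pub fn top_levels(&self, n: usize) -> Vec<(Price, u64)> {
        self.levels
            .iter()
            .take(n)
            .map(|(price, q)| (*price, q.iter().map(|o| o.quantity).sum::<u64>()))
            .collect()
    }
}
```

The sorted keys are what make "best price" and "top N levels" cheap; a `HashMap` keyed by price would need a full scan for either.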
2. The Matching Logic (The Algorithm)
The engine runs a Price-Time Priority algorithm.
The Golden Rule:
A match occurs if
Best_BACK_Price >= Best_LAY_Price.
Diagram Explanation:
This flowchart shows the matching algorithm for an incoming BACK order:
| Step | What Happens | Example |
|---|---|---|
| 1. Check | Compare incoming BACK price vs best LAY price | BACK 2.00 vs LAY 1.95 |
| 2. Match | If BACK >= LAY, execute trade at LAY price | Trade at 1.95 (better for backer!) |
| 3. Repeat | If quantity remains, check next LAY level | Continue until filled or no match |
| 4. Rest | If no match, add to BACK book | Wait for future LAY orders |
Scenario Examples:
| Scenario | BACK Order | LAY Book Best | Result |
|---|---|---|---|
| Match | 2.00 for $100 | 1.95 for $100 | Trade at 1.95 |
| Partial | 2.00 for $100 | 1.95 for $50 | Trade $50, rest $50 |
| No Match | 1.90 for $100 | 2.00 for $100 | BACK rests at 1.90 |
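The four steps and three scenarios above can be sketched for an incoming BACK order as follows. This is an illustration under assumptions, not the engine's code: the type names are hypothetical, resting quantities are assumed to be non-zero, self-trade prevention is omitted, and resting the unfilled remainder in the BACK book (step 4) is left to the caller.

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Debug, Clone, PartialEq)]
pub struct Resting {
    pub order_id: u64,
    pub quantity: u64,
}

#[derive(Debug, PartialEq)]
pub struct Trade {
    pub lay_order_id: u64,
    pub price: u64,
    pub quantity: u64,
}

/// LAY side of the book: price in basis points -> FIFO queue of orders.
pub type LayBook = BTreeMap<u64, VecDeque<Resting>>;

/// Matches an incoming BACK order against the LAY book using price-time
/// priority. Returns the trades and the quantity left unfilled.
pub fn match_back(lays: &mut LayBook, back_price: u64, mut remaining: u64) -> (Vec<Trade>, u64) {
    let mut trades = Vec::new();
    while remaining > 0 {
        // Step 1: the best LAY level is the lowest price in the tree.
        let Some(mut level) = lays.first_entry() else { break };
        let lay_price = *level.key();
        if back_price < lay_price {
            break; // Golden Rule not satisfied: no match possible.
        }
        // Steps 2-3: fill oldest orders first, trading at the LAY price.
        let queue = level.get_mut();
        while remaining > 0 {
            let Some(front) = queue.front_mut() else { break };
            let fill = remaining.min(front.quantity);
            trades.push(Trade { lay_order_id: front.order_id, price: lay_price, quantity: fill });
            front.quantity -= fill;
            remaining -= fill;
            let exhausted = front.quantity == 0;
            if exhausted {
                queue.pop_front();
            }
        }
        let level_empty = queue.is_empty();
        if level_empty {
            level.remove();
        }
    }
    (trades, remaining)
}
```

For the "Partial" scenario, a BACK at 20000 for 100 against a single LAY at 19500 for 50 yields one trade of 50 at 19500 and a remainder of 50 for the caller to rest.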
3. Order Lifecycle State Machine
Diagram Explanation:
Every order goes through this state machine. Understanding it is critical for integration:
| State | Meaning | What Happens Next |
|---|---|---|
| PENDING | Just arrived, being processed | Transitions to OPEN, PARTIAL, FILLED, or REJECTED |
| OPEN | Resting in book, waiting for match | Can be matched, filled, or cancelled |
| PARTIAL | Some quantity matched, some resting | Can be matched more, filled, or cancelled |
| FILLED | 100% of quantity matched | Terminal state - order complete |
| CANCELLED | User cancelled or market closed | Terminal state - remaining quantity gone |
| REJECTED | Failed validation | Terminal state - never entered book |
Common Transitions:
| Transition | What Happens | Trigger |
|---|---|---|
| PENDING → OPEN | Order added to book | No matching counterparty |
| PENDING → FILLED | Instant match | Counterparty available |
| OPEN → PARTIAL | Someone matched part | New order arrived |
| PARTIAL → CANCELLED | User cancelled | Cancel request received |
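The state table can be encoded as a transition check. This sketch covers only the transitions listed in this section; the enum and method names are hypothetical, and any transition not stated above (for example, an IOC remainder being cancelled straight from PENDING) is deliberately left out rather than guessed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderState {
    Pending,
    Open,
    Partial,
    Filled,
    Cancelled,
    Rejected,
}

impl OrderState {
    /// FILLED, CANCELLED and REJECTED are terminal states.
    pub fn is_terminal(self) -> bool {
        matches!(self, Self::Filled | Self::Cancelled | Self::Rejected)
    }

    /// True if the state table in this section allows `self -> next`.
    pub fn can_transition_to(self, next: OrderState) -> bool {
        match self {
            OrderState::Pending => matches!(
                next,
                OrderState::Open | OrderState::Partial | OrderState::Filled | OrderState::Rejected
            ),
            // A partially filled order may be matched again and stay PARTIAL.
            OrderState::Open | OrderState::Partial => matches!(
                next,
                OrderState::Partial | OrderState::Filled | OrderState::Cancelled
            ),
            // Terminal states never transition.
            _ => false,
        }
    }
}
```

An integration can use a check like this to reject out-of-order status updates, such as an OPEN event arriving after FILLED.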
Part VI: Matching Mechanics & API Reference
1. The Contract (gRPC API)
This is the strict contract between the Backend and the Order Book.
Diagram Explanation:
The API is divided into three categories:
| Category | Type | Use Case |
|---|---|---|
| Commands | Unary RPC (request/response) | Change state: create markets, submit orders |
| Queries | Unary RPC (request/response) | Read state: get order book snapshot |
| Streams | Server Streaming RPC | Real-time updates: trades, order book changes |
Key Messages:
| Message | Fields | Notes |
|---|---|---|
| SubmitOrderRequest | market_id, user_id, client_order_id, side, price, quantity | Price in basis points (2.00 = 20000) |
| Event | sequence, timestamp_ns, payload | Sequence is the global truth |
2. Error Codes (The "No" List)
| Code | Meaning | Solution |
|---|---|---|
| MARKET_NOT_FOUND | Market ID does not exist. | Create market first. |
| MARKET_SUSPENDED | Market is paused (VAR check, etc). | Wait for RESUME. |
| INVALID_PRICE | Price is not on the valid tick grid. | Round to nearest valid tick. |
| SELF_TRADE | You matched with your own order. | Incoming order is rejected to prevent wash trading. |
| OVERLOADED | System backlog > 5000. | Back off. Retry with exponential delay. |
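For the OVERLOADED case, the retry delay on the Backend side could be computed as below. The function name, base delay and cap are assumptions for illustration; this document does not prescribe specific values, and production clients usually add random jitter, which is omitted here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base x 2^attempt,
/// capped at `max`. Overflow at any step falls back to `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a 50 ms base and a 2 s cap, the delays run 50 ms, 100 ms, 200 ms, 400 ms and so on until they reach the cap.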
3. Performance Targets (The Promise)
| Metric | Target | Notes |
|---|---|---|
| Throughput | 10,000 orders/sec sustained | Includes journal writes |
| P99 Latency | < 50ms | End-to-end including fsync |
| Recovery Time | < 2 seconds | Replay 1M events |
| Memory footprint | < 500MB | For 3M active orders |
Part VII: Durability & Recovery
1. Durability (The "fsync" Barrier)
We are a financial system. Data loss is existential.
The Guarantee
"If we told you we took your order, we have it on disk."
This is achieved via the fsync() barrier in the Event Journal, which every sequenced order must cross before it is acknowledged.
Diagram Explanation:
This sequence diagram shows the critical durability guarantee:
| Phase | What Happens | If Crash Here |
|---|---|---|
| Before Danger Zone | Request received, not yet written | Order lost, user gets timeout, retries |
| Inside Danger Zone | Writing to disk, waiting for fsync | Order lost, user gets timeout, retries |
| After Danger Zone | fsync complete, ACK sent | Order survives, replayed on restart |
The Key Insight: We NEVER acknowledge an order until it's durable on disk. This is why the "Danger Zone" is highlighted - if power fails there, the user's request times out and they retry. No inconsistency.
2. Crash Recovery (The Time Machine)
When the service restarts (after a crash or deploy), memory is empty. It must rebuild the universe from the journal.
Diagram Explanation:
This flowchart shows how the system recovers after any restart:
| Step | What Happens | Time |
|---|---|---|
| 1. Check Snapshot | Look for saved state | ~10ms |
| 2. Load Snapshot | Restore engine to sequence N | ~100ms |
| 3. Replay Journal | Apply events from N+1 to current | ~2ms per 1000 events |
| 4. Ready | Open for trading | Total: < 2 seconds |
Recovery Time Examples:
| Scenario | Snapshot Age | Journal Size | Recovery Time |
|---|---|---|---|
| Clean restart | Fresh | 1,000 events | < 100ms |
| Normal crash | 1 hour old | 50,000 events | ~500ms |
| Long outage | 1 day old | 1,000,000 events | ~2 seconds |
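The snapshot-plus-replay procedure can be sketched as a pure function. The state and event types below are deliberately tiny and hypothetical (a real engine would rebuild full order books), and the sketch assumes sequence numbers start at 1 and the journal slice is sorted by sequence.

```rust
#[derive(Debug, Clone, PartialEq, Default)]
pub struct EngineState {
    /// Sequence number of the last event applied (0 = nothing applied).
    pub last_sequence: u64,
    pub open_orders: u64,
}

#[derive(Debug, Clone, Copy)]
pub enum EventKind {
    OrderAccepted,
    OrderCancelled,
}

#[derive(Debug, Clone, Copy)]
pub struct JournalEvent {
    pub sequence: u64,
    pub kind: EventKind,
}

/// Rebuilds state from an optional snapshot plus the journal.
/// Events already covered by the snapshot are skipped; a gap in the
/// sequence aborts recovery instead of guessing.
pub fn recover(snapshot: Option<EngineState>, journal: &[JournalEvent]) -> Result<EngineState, String> {
    let mut state = snapshot.unwrap_or_default();
    for event in journal {
        if event.sequence <= state.last_sequence {
            continue; // already reflected in the snapshot
        }
        let expected = state.last_sequence + 1;
        if event.sequence != expected {
            return Err(format!("sequence gap: expected {}, found {}", expected, event.sequence));
        }
        match event.kind {
            EventKind::OrderAccepted => state.open_orders += 1,
            EventKind::OrderCancelled => state.open_orders = state.open_orders.saturating_sub(1),
        }
        state.last_sequence = event.sequence;
    }
    Ok(state)
}
```

Because replay is deterministic, recovering from a snapshot and recovering from an empty state over the full journal must produce the same result; the snapshot only shortens the replay.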
3. Production Deployment (Kubernetes)
We run as a Single Pod StatefulSet.
| Question | Answer |
|---|---|
| Why not a Deployment? | Need stable disk identity |
| Why only 1 replica? | Single writer to journal |
| What about HA? | Fast recovery (< 2 sec) is the strategy |
Diagram Explanation:
This deployment diagram shows the production infrastructure:
| Component | Instances | Role |
|---|---|---|
| NGINX Ingress | 1 | Load balancer for HTTP traffic |
| Backend Pods | 3 | SmartBets app servers (stateless, scalable) |
| Order Book Pod | 1 | The matching engine (stateful, single instance) |
| NVMe PV | 1 | Persistent volume for journal + snapshots |
Why Single Instance?
| Concern | Solution |
|---|---|
| "What about availability?" | Fast recovery (< 2 sec) |
| "What about scaling?" | 10K TPS is plenty; shard by market if needed |
| "What about data loss?" | Journal is durable; replicated storage optional |
Configuration Specs
| Resource | Value | Rationale |
|---|---|---|
| CPU | 2 Dedicated Cores | 1 for engine, 1 for Tokio/IO |
| RAM | 2GB | Plenty for 10M orders |
| Disk | NVMe SSD | Low latency for fsync |
4. Diagnostics & Monitoring
Diagram Explanation:
This shows the observability pipeline:
| Component | Role |
|---|---|
| Metrics Endpoint | Exposes Prometheus format on /metrics |
| Prometheus | Scrapes every 15 seconds, stores time series |
| Grafana | Dashboards for visualization |
| Alertmanager | Routes alerts based on severity |
Critical Alerts
| Alert Name | Condition | Meaning | Action |
|---|---|---|---|
| OrderBookDown | up == 0 | Service crashed | K8s restarts automatically |
| SlowMatching | p99_latency > 100ms | Something blocking engine | Investigate thread |
| JournalLag | write_latency > 100ms | Disk I/O choking | Check disk health |
| SequenceGap | seq_in != seq_out + 1 | CRITICAL: Determinism broken | Manual investigation |
Appendix: Technology Stack
| Component | Technology | Why |
|---|---|---|
| Language | Rust 1.83+ | Memory safety, zero-cost abstractions |
| Async Runtime | Tokio | Best-in-class async I/O |
| gRPC | Tonic | Native Rust, high performance |
| Serialization | Prost (Protobuf) | Fast, schema-driven |
| Benchmarking | Criterion | Statistical rigor for perf testing |
Part VIII: Performance Engineering
1. The Journal Format
The Event Journal uses a structured binary format for durability and fast recovery.
Diagram Explanation:
| Field | Size | Purpose |
|---|---|---|
| Length | 4 bytes | Total entry size for seeking |
| Checksum | 4 bytes | CRC32 to detect corruption |
| Sequence | 8 bytes | The global ordering truth |
| Timestamp | 8 bytes | Nanoseconds since epoch |
| Type | 1 byte | Order, Cancel, or SnapshotMarker |
| Payload | Variable | Protobuf-encoded event |
Why This Format?
| Concern | Solution |
|---|---|
| Corruption detection | CRC32 checksum on every entry |
| Fast seeking | Length field enables skipping |
| Replay determinism | Sequence ensures ordering |
| Segment rotation | 1M events per file for manageability |
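One possible encoding of the entry layout above is sketched here. Several details are assumptions not stated in this document: little-endian integers, a Length that counts the whole entry including the header, a checksum that covers every byte after the checksum field, and an opaque one-byte type code. The CRC-32 is the standard IEEE variant written bitwise for brevity; a table-driven or hardware implementation would normally be used.

```rust
/// Header bytes: length(4) + checksum(4) + sequence(8) + timestamp(8) + type(1).
pub const HEADER_LEN: usize = 25;

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    !crc
}

fn read_u32(bytes: &[u8]) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    u32::from_le_bytes(raw)
}

fn read_u64(bytes: &[u8]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    u64::from_le_bytes(raw)
}

/// Serializes one journal entry in the assumed layout.
pub fn encode_entry(sequence: u64, timestamp_ns: u64, entry_type: u8, payload: &[u8]) -> Vec<u8> {
    let total = HEADER_LEN + payload.len();
    let mut buf = Vec::with_capacity(total);
    buf.extend_from_slice(&(total as u32).to_le_bytes());
    buf.extend_from_slice(&[0u8; 4]); // checksum placeholder
    buf.extend_from_slice(&sequence.to_le_bytes());
    buf.extend_from_slice(&timestamp_ns.to_le_bytes());
    buf.push(entry_type);
    buf.extend_from_slice(payload);
    let checksum = crc32(&buf[8..]);
    buf[4..8].copy_from_slice(&checksum.to_le_bytes());
    buf
}

/// Parses one entry from the start of `buf`. Returns
/// (sequence, timestamp_ns, type, payload), or None if the entry is
/// truncated or fails its checksum.
pub fn decode_entry(buf: &[u8]) -> Option<(u64, u64, u8, &[u8])> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let total = read_u32(&buf[0..4]) as usize;
    if total < HEADER_LEN || total > buf.len() {
        return None;
    }
    if crc32(&buf[8..total]) != read_u32(&buf[4..8]) {
        return None;
    }
    Some((read_u64(&buf[8..16]), read_u64(&buf[16..24]), buf[24], &buf[HEADER_LEN..total]))
}
```

During replay, a `None` from `decode_entry` at the tail of the last segment typically marks a torn write from the crash, and everything before it is safe to apply.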
2. Time-In-Force (TIF) Options
Diagram Explanation:
| TIF | Meaning | Behavior |
|---|---|---|
| GTC | Good Till Cancelled | Match what you can, rest remainder in book |
| IOC | Immediate Or Cancel | Match what you can, cancel remainder |
| FOK | Fill Or Kill | All or nothing - no partial fills |
Use Cases:
| Order Type | When To Use |
|---|---|
| GTC | Normal limit orders, willing to wait |
| IOC | Want immediate execution, don't want to rest |
| FOK | Large orders, need full fill or none |
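The three TIF behaviors differ only in what happens to the unmatched remainder, which the sketch below makes explicit. The types and the `apply_tif` function are hypothetical; `matchable` stands for the quantity available at acceptable prices when the order arrives, which the matching engine would compute.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeInForce {
    Gtc,
    Ioc,
    Fok,
}

#[derive(Debug, PartialEq, Eq)]
pub struct TifOutcome {
    pub filled: u64,
    pub rested: u64,
    pub cancelled: u64,
}

/// Splits `requested` into filled, rested and cancelled quantities.
/// `matchable` is the quantity that could be matched right now.
pub fn apply_tif(tif: TimeInForce, requested: u64, matchable: u64) -> TifOutcome {
    let fillable = requested.min(matchable);
    let remainder = requested - fillable;
    match tif {
        // GTC: fill what is possible, rest the remainder in the book.
        TimeInForce::Gtc => TifOutcome { filled: fillable, rested: remainder, cancelled: 0 },
        // IOC: fill what is possible, cancel the remainder.
        TimeInForce::Ioc => TifOutcome { filled: fillable, rested: 0, cancelled: remainder },
        // FOK: all or nothing.
        TimeInForce::Fok if remainder == 0 => TifOutcome { filled: requested, rested: 0, cancelled: 0 },
        TimeInForce::Fok => TifOutcome { filled: 0, rested: 0, cancelled: requested },
    }
}
```

Note that FOK has to know the full matchable quantity before executing any trade, so the engine must check liquidity first and only then commit fills.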
3. Future Scaling: Market Sharding
Current: 10,000 TPS on 1 node. Future: 100,000 TPS via sharding.
Diagram Explanation:
| Component | Role |
|---|---|
| Router Service | Consistent hashing on market_id |
| Order Book Nodes | Each owns 1/N of all markets |
| Journals | Each node has independent journal |
Sharding Strategy:
| Approach | Pros | Cons |
|---|---|---|
| Random hash | Even distribution | Correlated markets on different nodes |
| Explicit assignment | Keep related markets together | Manual management |
| Hybrid | Best of both | More complex routing |
Recommendation: Explicitly assign correlated markets (e.g., all EPL games) to the same shard to support accumulators.
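The hybrid approach (explicit assignment first, hash otherwise) could be routed as in the sketch below. The `ShardRouter` type is hypothetical. For brevity it uses a stable FNV-1a hash with a modulo, which is not true consistent hashing: changing the shard count remaps most unpinned markets, so a production router would need a hash ring or similar.

```rust
use std::collections::HashMap;

/// FNV-1a (64-bit): a simple hash that is stable across processes,
/// unlike the randomly seeded default hasher of `HashMap`.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash = 0xcbf2_9ce4_8422_2325u64;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical hybrid router: pinned markets first, hash as fallback.
pub struct ShardRouter {
    shard_count: u64,
    pinned: HashMap<String, u64>,
}

impl ShardRouter {
    pub fn new(shard_count: u64) -> Self {
        assert!(shard_count > 0, "at least one shard is required");
        Self { shard_count, pinned: HashMap::new() }
    }

    /// Explicitly assigns a market to a shard (e.g. all EPL games together).
    pub fn pin(&mut self, market_id: &str, shard: u64) {
        assert!(shard < self.shard_count, "shard index out of range");
        self.pinned.insert(market_id.to_string(), shard);
    }

    /// Returns the shard that owns `market_id`.
    pub fn shard_for(&self, market_id: &str) -> u64 {
        match self.pinned.get(market_id) {
            Some(&shard) => shard,
            None => fnv1a(market_id.as_bytes()) % self.shard_count,
        }
    }
}
```

Every router instance must agree on the pinned table and the shard count; otherwise two nodes could both believe they own the same market.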
Part IX: Operations & Deployment
Market State Machine
Diagram Explanation:
| State | Orders Accepted? | Matching? | Use Case |
|---|---|---|---|
| CREATED | ❌ No | ❌ No | Setup phase, configuring |
| OPEN | ✅ Yes | ✅ Yes | Normal trading |
| SUSPENDED | ❌ No | ❌ No | Pause (goal scored, VAR) |
| CLOSED | ❌ No | ❌ No | Event finished |
Transition Rules:
| Transition | Trigger | What Happens |
|---|---|---|
| CREATED → OPEN | OpenMarket | Start accepting orders |
| OPEN → SUSPENDED | SuspendMarket | Reject new orders, keep existing |
| SUSPENDED → OPEN | ResumeMarket | Resume trading |
| Any → CLOSED | CloseMarket | Cancel all orders, final state |
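The market lifecycle can be expressed as a small transition function. The enum names mirror the triggers above, but the code is a sketch: treating every command on a CLOSED market as an error is an assumption drawn from "final state", and side effects such as cancelling resting orders are not shown.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MarketState {
    Created,
    Open,
    Suspended,
    Closed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MarketCommand {
    OpenMarket,
    SuspendMarket,
    ResumeMarket,
    CloseMarket,
}

impl MarketState {
    /// Returns the next state, or an error for a transition not in the table.
    pub fn apply(self, command: MarketCommand) -> Result<MarketState, String> {
        match (self, command) {
            (MarketState::Created, MarketCommand::OpenMarket) => Ok(MarketState::Open),
            (MarketState::Open, MarketCommand::SuspendMarket) => Ok(MarketState::Suspended),
            (MarketState::Suspended, MarketCommand::ResumeMarket) => Ok(MarketState::Open),
            // CLOSED is final: assumed to reject every further command.
            (MarketState::Closed, _) => Err("market is closed".to_string()),
            (_, MarketCommand::CloseMarket) => Ok(MarketState::Closed),
            (state, cmd) => Err(format!("{:?} is not valid in state {:?}", cmd, state)),
        }
    }

    /// Only OPEN markets accept and match orders.
    pub fn accepts_orders(self) -> bool {
        self == MarketState::Open
    }
}
```

Running market commands through the same sequenced pipeline as orders keeps suspensions deterministic on replay: an order sequenced after a SuspendMarket is always rejected.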
Part X: Security
Security Model
The Order Book operates on a "Trust the Backend" security model:
Diagram Explanation:
| Zone | Components | Security Level |
|---|---|---|
| Untrusted | Users, Internet | Assume hostile |
| DMZ | WAF, Load Balancer | Filter attacks |
| Trusted | Backend, Order Book | Internal network only |
Security Measures
| Measure | Implementation | Purpose |
|---|---|---|
| Network Isolation | Order Book not exposed to internet | Reduce attack surface |
| mTLS | Mutual TLS between Backend and Order Book | Authenticate both parties |
| Input Validation | Protobuf schema + business rules | Prevent malformed data |
| Self-Trade Prevention | Reject orders matching own orders | Prevent wash trading |
| Rate Limiting | At Backend level, not Order Book | Prevent DoS |
What Order Book Does NOT Do
| Security Function | Handled By |
|---|---|
| User authentication | SmartBets Backend (JWT) |
| Authorization | SmartBets Backend |
| Balance validation | SmartBets Backend |
| Fraud detection | SmartBets Backend |
| Audit logging | SmartBets Backend |
Part XI: Testing Strategy
Testing Pyramid
Test Categories
| Category | What It Tests | Tools | Run Time |
|---|---|---|---|
| Unit Tests | Individual functions, matching logic | cargo test | < 10 seconds |
| Integration Tests | gRPC API, journal persistence | cargo test --features integration | < 1 minute |
| Property Tests | Invariants (no negative remaining quantities, etc.) | proptest | < 5 minutes |
| Benchmark Tests | Performance regression | criterion | < 10 minutes |
| E2E Tests | Full flow with Backend | Docker Compose | < 5 minutes |
Critical Test Scenarios
| Scenario | What We Verify |
|---|---|
| Crash Recovery | Replay journal, state matches pre-crash |
| Concurrent Orders | Sequence numbers are unique and ordered |
| Self-Trade | Orders from same user don't match |
| Price-Time Priority | Earlier orders at same price fill first |
| Partial Fills | Remaining quantity tracked correctly |
| Market Suspension | Orders rejected when suspended |
Determinism Testing
The Determinism Guarantee: Given the same sequence of input events, the engine MUST produce the exact same output. This is tested by replaying journals multiple times and comparing final states.
Part XII: Balance Management Flow
How SmartBets Backend Manages Balances
The Order Book does NOT manage balances. Here's how the full flow works:
Diagram Explanation:
| Phase | Steps | What Happens |
|---|---|---|
| Reserve | 2-5 | Backend checks balance, reserves funds |
| Match | 6-8 | Order Book matches, unaware of balances |
| Settle | 9-10 | Backend moves funds, creates position |
Balance States
| State | Meaning | Example |
|---|---|---|
| Available | Can be used for new bets | $400 |
| Reserved | Locked for pending orders | $100 |
| Total | Available + Reserved | $500 |
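The three balance states can be modeled on the Backend side roughly as follows. This lives outside the Order Book, and the `Balance` type and its methods are hypothetical; amounts are assumed to be in cents.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Balance {
    pub available_cents: u64,
    pub reserved_cents: u64,
}

impl Balance {
    /// Total = Available + Reserved.
    pub fn total_cents(&self) -> u64 {
        self.available_cents + self.reserved_cents
    }

    /// Moves funds from Available to Reserved before an order is sent
    /// to the Order Book. Fails without changing anything if the
    /// available balance is too low.
    pub fn reserve(&mut self, amount_cents: u64) -> Result<(), &'static str> {
        if amount_cents > self.available_cents {
            return Err("Insufficient funds");
        }
        self.available_cents -= amount_cents;
        self.reserved_cents += amount_cents;
        Ok(())
    }

    /// Returns reserved funds to Available, e.g. after a cancel or timeout.
    pub fn release(&mut self, amount_cents: u64) -> Result<(), &'static str> {
        if amount_cents > self.reserved_cents {
            return Err("Cannot release more than is reserved");
        }
        self.reserved_cents -= amount_cents;
        self.available_cents += amount_cents;
        Ok(())
    }
}
```

Starting from $500 available, reserving $100 gives the table's example: $400 available, $100 reserved, $500 total.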
What If Order Book Crashes?
Key Insight: Because the Backend reserves funds BEFORE sending to Order Book, a crash never results in lost money. The worst case is a timeout and retry.
Part XIII: References & Resources
Foundational Reading
| Resource | Type | Why Read It |
|---|---|---|
| LMAX Architecture | Article | The inspiration for our design |
| The LMAX Disruptor | Documentation | Ring buffer pattern we adapted |
| Event Sourcing | Article | Why we use append-only journals |
Videos
| Video | Duration | Topic |
|---|---|---|
| LMAX - How to Do 100K TPS at Less Than 1ms Latency | 50 min | Original LMAX presentation |
| The Art of the Event-Sourced Microservice | 45 min | Event sourcing patterns |
Books
| Book | Author | Relevance |
|---|---|---|
| Designing Data-Intensive Applications | Martin Kleppmann | Chapters on event logs, stream processing |
| Systems Performance | Brendan Gregg | Understanding latency, profiling |
Related Codebases
| Project | Language | What To Learn |
|---|---|---|
| matching-engine | Rust | Similar architecture |
| disruptor-rs | Rust | Ring buffer implementation |
Part XIV: Frequently Asked Questions
General
Q: Why Rust instead of Java/Go/C++?
A: Rust gives us:
- Memory safety without garbage collection (no GC pauses)
- Zero-cost abstractions (fast as C, safe as Java)
- Excellent async support (Tokio)
- Strong type system catches bugs at compile time
Q: Why single-threaded? Isn't that slow?
A: Counter-intuitively, single-threaded is FASTER for our use case:
- No lock contention
- No context switching
- Perfect cache locality
- Deterministic execution
LMAX proved this with 6 million TPS on a single thread.
Q: What happens if the Order Book crashes?
A:
- Kubernetes restarts the pod (< 5 seconds)
- Order Book replays journal (< 2 seconds)
- Trading resumes with exact same state
- No trades are lost
Q: How do you handle time zones?
A: All timestamps are UTC nanoseconds since epoch. No time zones in the Order Book.
Technical
Q: Why not use a database?
A: Databases introduce non-determinism:
- Lock contention varies
- Query plans change
- Buffer pool behavior unpredictable
We use the journal as our "database" - append-only, sequential, predictable.
Q: How do you prevent duplicate orders?
A: Each order has a client_order_id provided by the Backend. The Backend is responsible for deduplication. The Order Book trusts the Backend.
Q: What's the maximum order size?
A: Technically unlimited, but the Backend enforces limits based on user tier and market liquidity.
Q: How do you handle floating point precision?
A: We don't use floating point. All prices and quantities are integers:
- Price: Basis points (2.00 = 20000)
- Quantity: Smallest currency unit (cents, satoshis)
Operations
Q: How do you deploy without downtime?
A: We accept brief downtime (< 5 seconds):
- Stop accepting new orders
- Drain pending orders
- Take snapshot
- Deploy new version
- Replay from snapshot
- Resume trading
Q: How do you debug production issues?
A:
- Metrics in Grafana show symptoms
- Logs in Loki show context
- Journal replay reproduces exact state
- Determinism means we can debug locally with production data
Q: What's the disaster recovery plan?
A:
- Journal is on durable storage (replicated)
- Snapshots are backed up to S3 hourly
- Recovery: Restore snapshot + replay journal
- RTO: < 30 minutes, RPO: 0 (no data loss)
Part XV: Glossary
| Term | Definition |
|---|---|
| BACK | A bet FOR an outcome (e.g., "Man Utd to win") |
| LAY | A bet AGAINST an outcome (accepting a BACK bet) |
| Basis Points | 1/100th of a percent; we use 10000 = 1.00 odds |
| BTreeMap | Balanced tree data structure, sorted keys, O(log N) operations |
| Deterministic | Same input always produces same output |
| Event Sourcing | Storing all changes as immutable events |
| FOK | Fill Or Kill - order must fill completely or be rejected |
| fsync | System call that blocks until a file's buffered writes have been flushed to the storage device |
| GTC | Good Till Cancelled - order stays until filled or cancelled |
| gRPC | Google's RPC framework using Protocol Buffers |
| HashMap | Hash table data structure, O(1) average operations |
| IOC | Immediate Or Cancel - fill what you can, cancel rest |
| Journal | Append-only log of all events |
| Latency | Time from request to response |
| LMAX | London Multi-Asset Exchange, inspiration for our architecture |
| mTLS | Mutual TLS - both client and server authenticate |
| NVMe | Fast SSD storage protocol |
| P99 | 99th percentile - 99% of requests are faster than this |
| Price-Time Priority | Best price first, then earliest order first |
| Protobuf | Protocol Buffers - binary serialization format |
| Sequence Number | Monotonically increasing ID for total ordering |
| Snapshot | Point-in-time copy of engine state |
| StatefulSet | Kubernetes resource for stateful applications |
| Throughput | Number of operations per second |
| TIF | Time In Force - how long an order stays active |
| Tokio | Async runtime for Rust |
| Tonic | gRPC library for Rust |
End of Specification.
Last Updated: December 2025 Maintainer: SmartBets Engineering Team