Orderbook Testing Architecture
Executive Summary
This document defines a comprehensive testing architecture for the Universal Orderbook system. The goal is to achieve Correctness by Construction - moving beyond "testing" (checking if it works 99% of the time) to Verification (proving it cannot fail).
The testing framework is organized into 6 Phases plus 3 Correctness Layers:
| Phase | Name | Purpose | Tools |
|---|
| 1 | Oracle Testing | Compare fast engine against simple reference | Python + Rust |
| 2 | Property-Based Testing | Fuzz invariants with random inputs | proptest |
| 3 | Deterministic Simulation | Replay verification, hash comparison | Journal + SHA-256 |
| 4 | Performance Testing | Latency p99, jitter, flamegraphs | criterion |
| 5 | Chaos Engineering | Crash recovery, flood resilience | kill -9, load gen |
| 6 | Adversarial Testing | Malicious actors, exploit attempts | Evil bots |
| Layer | Name | Purpose |
|---|
| A | Type-State Correctness | Make invalid states unrepresentable |
| B | Financial Safety | Double-entry ledger, liability bounds |
| C | Formal Verification | Mathematical proofs of properties |
Testing Architecture Overview
Phase 1: Oracle Testing (Reference Model)
Concept
The Oracle Pattern compares your optimized Rust engine against a "golden source" - a simple, obviously-correct reference implementation.
What to Compare
| Property | Slow Engine | Fast Engine | Must Match |
|---|
| Best Bid Price | max(bids) | BTreeMap.last() | ✓ Exact |
| Best Ask Price | min(asks) | BTreeMap.first() | ✓ Exact |
| Total Volume | Sum all orders | Cached counter | ✓ Exact |
| Order Count | len(orders) | Index size | ✓ Exact |
| Match Results | List of fills | List of fills | ✓ Exact |
| Order State | Dict lookup | Index lookup | ✓ Exact |
Reference Engine Design
Phase 2: Property-Based Testing (Invariants)
The Laws of Physics
These invariants must never be violated, regardless of input sequence:
Invariant Test Matrix
| Invariant | Trigger Scenario | Detection Method |
|---|
| Crossed Book | Aggressive order crosses spread | Assert best_bid < best_ask post-op |
| Conservation | Any add/cancel/match | Track running totals, compare |
| ID Uniqueness | Rapid order submission | HashSet membership check |
| Price-Time Priority | Same-price orders | Verify fill order = insertion order |
| Probability Bounds | Extreme prices | Clamp check in type constructor |
| Self-Trade | Same user crossing | Check user_id before match |
Fuzzer Strategy
Phase 3: Deterministic Simulation (Replay Testing)
The Time Machine
If you feed the exact same inputs in the same order, you must get the exact same state - down to the byte.
Sources of Non-Determinism
| Source | Problem | Solution |
|---|
HashMap iteration | Random order | Use BTreeMap |
SystemTime::now() | Wall clock in logic | Pass timestamp as input |
| Thread scheduling | Race conditions | Single-threaded core |
| Floating point | CPU-specific rounding | Use fixed-point integers |
| Random numbers | Different seeds | Seed from sequence number |
| Compiler optimization | Instruction reordering | Pin compiler version |
State Hashing Strategy
Cross-Machine Determinism Test
Latency Targets
| Metric | Target | Critical | Measurement Point |
|---|
| p50 Latency | < 10µs | < 50µs | Order → Ack |
| p99 Latency | < 50µs | < 200µs | Order → Ack |
| p99.9 Latency | < 100µs | < 500µs | Order → Ack |
| Tick-to-Trade | < 20µs | < 100µs | Receive → Fill Event |
| Jitter (σ) | < 5µs | < 20µs | Std dev of latency |
Benchmark Architecture
Benchmark Scenarios
Phase 5: Chaos Engineering
Failure Scenarios
Power Plug Test Protocol
Flood Test Protocol
Phase 6: Adversarial Testing
Evil Bot Taxonomy
Agent-Based Simulation
Correctness Layer A: Type-State Pattern
Making Invalid States Unrepresentable
The Type-State pattern uses Rust's type system to ensure that invalid state transitions are compile-time errors, not runtime bugs.
Type-State Implementation
Market State Enforcement
What Type-State Prevents
| Bug | Without Type-State | With Type-State |
|---|
| Cancel filled order | Runtime check, might miss | Won't compile |
| Match in suspended market | Boolean flag check | Type mismatch error |
| Double settlement | Flag check | Method doesn't exist |
| Order on closed market | State validation | Type prevents |
Correctness Layer B: Financial Safety
Double-Entry Ledger
Every operation must balance. No money is ever created or destroyed.
Worst-Case Liability Bounding
Settlement Idempotency
Proof Hierarchy
What Kani Proves
| Property | Kani Assertion | Failure Mode |
|---|
| No overflow | price * quantity safe | Integer overflow |
| No panic | unwrap() always succeeds | None unwrap on None |
| No OOB | Array index in bounds | Index out of bounds |
| Termination | Loops terminate | Infinite loop |
Overflow Analysis Points
Bet Type & Market Correctness Matrix
Market Types
Settlement Test Matrix
| Market Type | Outcome | Settlement Logic | Test Scenario |
|---|
| Binary | Yes wins | Pay all Yes backers | Full payout |
| Binary | No wins | Pay all No backers | Full payout |
| 1X2 | Home wins | Pay Home backers | One of three |
| 1X2 | Void | Refund all | Event cancelled |
| Asian -0.5 | Win by 1+ | Full win | Clear outcome |
| Asian -0.5 | Draw | Full loss | Push scenario |
| Asian -0.25 | Win by 1 | Half win | Split stake |
| Over 2.5 | 3 goals | Over wins | Threshold crossed |
| Over 2.5 | 2 goals | Under wins | Below threshold |
Void/Push Decision Tree
Price-Time Priority Verification
The Fairness Guarantee
If Order A arrives before Order B at the same price,
then Order A is always matched first — under all cancel/modify scenarios.
Modify Semantics
| Modification | Loses Priority? | Rationale |
|---|
| Price change (any) | ✅ Yes | New price = new queue position |
| Quantity increase | ✅ Yes | Advantage over earlier orders |
| Quantity decrease | ❌ No | No advantage gained |
| Metadata only | ❌ No | No market impact |
Priority Fuzzing Test
Auction State Correctness
Market States and Allowed Operations
Auction Invariants
| Invariant | Description | Test |
|---|
| No matching in auction | Orders rest, no fills | Submit crossing orders, verify no fills |
| Uncrossing at auction end | Single price clears | Verify all fills at equilibrium price |
| Order priority preserved | Earlier orders first | Check fill order |
| Indicative price correct | Maximizes volume | Compare with calculated equilibrium |
Test Directory Structure
CI/CD Pipeline Integration
Operational Safeguards
Kill Switch Correctness
Config Drift Detection
Liveness Proofs
Beyond safety (nothing bad happens), we must prove liveness (good things eventually happen):
| Property | Definition | Test Method |
|---|
| Order eventually matches or cancels | No order stuck forever | Timeout simulation |
| Market eventually settles | No perpetual open markets | State machine analysis |
| No deadlocks | System always makes progress | Model checking (TLA+) |
| No livelocks | Not spinning without progress | Kani loop analysis |
Complete Test Checklist
Matching Engine
| Test | Type | Status | Priority |
|---|
| Oracle comparison | Phase 1 | 🔲 | P0 |
| Property fuzzing | Phase 2 | ✅ Exists | P0 |
| Price-time priority proofs | Phase 2 | 🔲 | P0 |
| Modify semantics invariants | Phase 2 | 🔲 | P1 |
| Auction state enforcement | Phase 2 | 🔲 | P1 |
| Replay determinism | Phase 3 | 🔲 | P0 |
| Cross-machine determinism | Phase 3 | 🔲 | P1 |
| Latency benchmarks | Phase 4 | ✅ Exists | P1 |
| Crash recovery | Phase 5 | ✅ Exists | P0 |
| Flood testing | Phase 5 | 🔲 | P1 |
| Adversarial bots | Phase 6 | 🔲 | P2 |
Financial Safety
| Test | Type | Status | Priority |
|---|
| Double-entry ledger invariant | Layer B | 🔲 | P0 |
| Worst-case liability bounds | Layer B | 🔲 | P0 |
| Settlement idempotency | Layer B | 🔲 | P0 |
| Insurance fund drain test | Chaos | 🔲 | P1 |
| Test | Type | Status | Priority |
|---|
| Type-state transitions | Layer A | 🔲 | P1 |
| Kani overflow proofs | Layer C | 🔲 | P0 |
| Kani panic-freedom | Layer C | 🔲 | P0 |
| Liveness (no deadlock) | Layer C | 🔲 | P2 |
Market/Bet Coverage
| Market Type | Direct Match | Synthetic | Settlement | Void/Push |
|---|
| Binary | ✅ | N/A | ✅ | 🔲 |
| 1X2 | ✅ | ✅ | 🔲 | 🔲 |
| Asian Handicap | ✅ | N/A | 🔲 | 🔲 |
| Over/Under | 🔲 | N/A | 🔲 | 🔲 |
| Multi-Runner | 🔲 | 🔲 | 🔲 | 🔲 |
| Outright | 🔲 | 🔲 | 🔲 | 🔲 |
Summary: The Testing Hierarchy
| Tool | Purpose | Command |
|---|
| cargo test | Unit & integration tests | cargo test |
| proptest | Property-based fuzzing | Built into test suite |
| criterion | Micro-benchmarks | cargo bench |
| cargo flamegraph | CPU profiling | cargo flamegraph --bin test_ui_server |
| Kani | Formal verification | cargo kani --harness <name> |
| cargo fuzz | Coverage-guided fuzzing | cargo +nightly fuzz run <target> |
Document Version: 1.0
Last Updated: 2025-12-27