TenisMonster: Quantitative Analytics & Paper Trading PoC
A personal experiment: what happens when you apply serious product architecture thinking to a fully AI-generated stack? The math, the business logic, the operator UI — all human decisions. The Python infrastructure, Docker setup, CI/CD — all AI-generated. The question was whether that split could produce something actually deployable. It did.
Executive Summary
This started with a conversation with my sister, who follows professional tennis seriously. We got into the usual debate about whether top-level matches are actually predictable or just glorified coin flips, and I realized I didn't have a rigorous answer. So instead of arguing, I built the infrastructure to find out. I was also curious about a second question: how far can AI-assisted development go when the person directing it understands the domain deeply?
The result is TenisMonster: a full-stack ATP tennis prediction system built entirely through AI prompting. It runs seven years of historical match data, live market odds, and ten competing statistical models through a shared pipeline, with Kelly criterion staking, automated paper-bet registration, and 24/7 Docker deployment on cloud infrastructure.
Every architectural decision, model selection, and UI design choice was made deliberately by a human. Every line of Python and Docker was AI-generated through structured prompting. The experiment answered the question: yes, it works, and the bottleneck was never the code generation.
10 Statistical Models (ensemble)
7 Years of ATP Data (Sackmann CSVs)
~67% OOS Accuracy (log-loss 0.60)
24/7 Deployment (Hetzner Cloud)
System Architecture
Four-Layer Pipeline Design
Each layer has a single responsibility and a defined output contract. Data flows one-way from ingestion to stake registration — no layer reaches back up the chain.
Data & Enrichment Layer
Ingests and normalizes seven years of Sackmann ATP CSVs — match results, player rankings, surface history — alongside live market odds, Open-Meteo weather conditions at match location, and operator-annotated injury/motivation signals. All sources are reconciled into a single event object before reaching the modeling layer.
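As a rough illustration of what "a single event object" could look like, the sketch below merges per-source dictionaries into one immutable record. Only `MatchEvent`, `live_odds`, and `operator_flags` echo names used elsewhere on this page; every other field name, the input dictionary keys, and the `build_event` helper are assumptions made for the example, not the project's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class MatchEvent:
    """One reconciled record per match (illustrative field names)."""
    player_a: str
    player_b: str
    surface: str
    start_time: datetime
    live_odds: Optional[Tuple[float, float]] = None  # decimal odds (a, b)
    temperature_c: Optional[float] = None            # from the weather source
    operator_flags: Tuple[str, ...] = ()             # injury / motivation signals


def build_event(match, odds=None, weather=None, flags=()):
    """Merge per-source dicts into one event; missing sources stay None."""
    weather = weather or {}
    live_odds = (float(odds["a"]), float(odds["b"])) if odds else None
    return MatchEvent(
        player_a=match["player_a"],
        player_b=match["player_b"],
        surface=match["surface"].strip().lower(),  # normalize "Clay " -> "clay"
        start_time=match["start_time"],
        live_odds=live_odds,
        temperature_c=weather.get("temperature_c"),
        operator_flags=tuple(flags),
    )
```

Freezing the dataclass is one way to back the one-way data flow: downstream layers can read the event but cannot mutate it.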
The Modeling Engine
A 10-model ensemble runs in parallel on each match event. Models range from surface-specific Elo with opponent-adjusted metrics and Bayesian head-to-head shrinkage, to a closed-form Monte Carlo service-game simulation that operates at the point level — translating granular statistical structure into match win probabilities.
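The point-level idea can be sketched for a single service game. Assuming the server wins each point independently with a fixed probability `p` (a simplifying assumption, not necessarily what the project's model does), the probability of holding serve has a well-known closed form, and a small Monte Carlo simulation can cross-check it. The function names are illustrative.

```python
import random


def hold_probability(p: float) -> float:
    """Closed-form probability that the server wins a game, given point-win probability p."""
    q = 1.0 - p
    # Server wins 4 points while the receiver wins 0, 1, or 2 (1, 4, and 10 orderings).
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach 3-3 in C(6,3) = 20 ways, then win from deuce with probability p^2 / (p^2 + q^2).
    from_deuce = 20 * p**3 * q**3 * (p**2 / (p**2 + q**2))
    return before_deuce + from_deuce


def simulate_hold(p: float, n_games: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, playing games point by point."""
    rng = random.Random(seed)
    holds = 0
    for _ in range(n_games):
        server, receiver = 0, 0
        while True:
            if rng.random() < p:
                server += 1
            else:
                receiver += 1
            # A game ends when one side has at least 4 points and leads by 2.
            if server >= 4 and server - receiver >= 2:
                holds += 1
                break
            if receiver >= 4 and receiver - server >= 2:
                break
    return holds / n_games
```

The closed form is exact under the independence assumption; simulation becomes useful once that assumption is relaxed, for example when point-win probability depends on the score.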
Risk & Value Logic
Raw model probabilities are converted to implied edges against market odds. A strict Kelly criterion staking algorithm sizes positions, capped by configurable bankroll limits. Multi-layer safety filters screen for value traps, line movement anomalies, and sharp money indicators before any bet is registered.
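The edge and staking arithmetic can be sketched as follows, assuming decimal odds and defining edge as expected profit per unit staked. The function names and the default 5% cap are illustrative assumptions, not the project's configuration.

```python
def expected_edge(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked if the model probability is correct."""
    return model_prob * decimal_odds - 1.0


def kelly_fraction(model_prob: float, decimal_odds: float) -> float:
    """Full-Kelly fraction of bankroll; zero when the bet has no positive edge."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    b = decimal_odds - 1.0  # net payout per unit staked
    f = (model_prob * b - (1.0 - model_prob)) / b
    return max(f, 0.0)


def stake_size(bankroll: float, model_prob: float, decimal_odds: float,
               max_fraction: float = 0.05) -> float:
    """Kelly stake capped at max_fraction of bankroll (the cap value is an assumption)."""
    return bankroll * min(kelly_fraction(model_prob, decimal_odds), max_fraction)
```

For example, a 55% model probability at decimal odds of 2.0 gives an edge of 0.10 and a full-Kelly fraction of 10%, which the 5% cap reduces to a stake of 50 on a bankroll of 1,000.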
Ops & Automation
A 24/7 background scheduler deployed on Hetzner via Docker Compose handles automated paper-bet registration and resolves results every four hours. GitHub Actions manages CI/CD: linting, testing, and container rebuilds on every push to main. The system runs headlessly with no manual intervention required.
Operator Interface
Streamlit Control Center
The operator UI is a Streamlit control center. Its interface structure, information hierarchy, and UX logic were the human contribution; the implementation was generated through AI prompting. The dashboard lets non-technical stakeholders review model recommendations, inject human or LLM injury signals that override statistical baselines, and track ROI and Closing Line Value (CLV) in real time.
Model Recommendations
Per-match ensemble consensus with confidence intervals
Signal Override
LLM or manual injury/motivation flag injection
Live P&L Tracking
Real-time ROI, CLV, and stake history dashboards
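For readers unfamiliar with the two tracking metrics, here is a minimal sketch using one common convention: CLV as the relative difference between the odds taken and the closing odds, and ROI as total profit over total amount staked. The function names and the `(stake, profit)` tuple layout are assumptions for the example.

```python
def closing_line_value(odds_taken: float, closing_odds: float) -> float:
    """Relative advantage of the odds taken over the closing line (decimal odds)."""
    return odds_taken / closing_odds - 1.0


def roi(bets) -> float:
    """Return on investment for an iterable of (stake, profit) pairs."""
    bets = list(bets)
    total_staked = sum(stake for stake, _ in bets)
    total_profit = sum(profit for _, profit in bets)
    return total_profit / total_staked if total_staked else 0.0
```

Under this convention, a bet taken at 2.10 that closes at 2.00 has a CLV of roughly +5%, regardless of whether it wins.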
# Signal injection overrides statistical model output
def resolve_match(event: MatchEvent) -> Recommendation:
    base_prob = ensemble.predict(event)  # 10-model average
    adj_prob = signal_layer.apply(base_prob,
                                  event.operator_flags)  # LLM / human override
    edge = market.implied_edge(adj_prob,
                               event.live_odds)
    stake = kelly.size(edge,
                       config.bankroll_fraction)
    return Recommendation(edge=edge, stake=stake,
                          confidence=adj_prob)
Senior-Level Analysis
Friction, Limitations & Learnings
Shipping this with a polished summary and zero caveats would be misleading. Here is what the data and the architecture actually revealed.
Market Reality vs. Code
The system reliably identifies positive expected value in historical and shadow data. However, connecting to live bookmaker execution APIs introduces regulatory constraints, KYC requirements, and latency complexities entirely outside the scope of a rapid PoC. Shipping to real capital would demand a separate compliance and execution architecture layer — a deliberately deferred problem.
Model Variance at the Extremes
Out-of-sample backtesting stabilized at ~67% match prediction accuracy (log-loss 0.60), which is competitive for open-odds markets. The ensemble, however, degrades sharply on extreme underdog events and tour-level outliers. Human override flags for fatigue and motivation exist in the UI precisely because no model can quantify a player mentally withdrawing mid-tournament.
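The two headline metrics can be computed with the standard definitions below. This is a generic sketch, not the project's evaluation code; the function names and the 0.5 decision threshold are assumptions. For scale, a model that always predicts 50% scores a log-loss of ln 2 ≈ 0.693, so lower is better.

```python
import math


def accuracy(probs, outcomes) -> float:
    """Share of matches where the side given at least 50% actually won."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(outcomes)


def log_loss(probs, outcomes, eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)
```

Accuracy only counts which side was favored, while log-loss also penalizes overconfidence, which is why the two are reported together.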
Third-Party Data Fragility
The pipeline's scheduling logic is tightly coupled to The Odds API's rate limits, response structure, and coverage decisions. Any schema change or quota reduction propagates downstream as a silent failure. This was the sharpest architectural lesson: robust product infrastructure must treat every external dependency as a liability with its own failure budget, not a guaranteed utility.
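One way to turn a silent failure into a loud one is to validate every external payload at the boundary before it enters the pipeline. The sketch below uses hypothetical field names and a hypothetical `validate_odds_payload` helper; it does not reflect the provider's real response schema.

```python
class SchemaError(ValueError):
    """Raised when an external payload does not match the expected shape."""


# Hypothetical field names and types; the real provider schema may differ.
REQUIRED_FIELDS = {
    "event_id": str,
    "player_a": str,
    "player_b": str,
    "odds_a": (int, float),
    "odds_b": (int, float),
}


def validate_odds_payload(payload):
    """Return the payload unchanged if it matches REQUIRED_FIELDS, else raise SchemaError."""
    if not isinstance(payload, list):
        raise SchemaError(f"expected a list of events, got {type(payload).__name__}")
    for index, event in enumerate(payload):
        if not isinstance(event, dict):
            raise SchemaError(f"event {index} is not an object")
        for name, expected_type in REQUIRED_FIELDS.items():
            if name not in event:
                raise SchemaError(f"event {index} is missing field '{name}'")
            if not isinstance(event[name], expected_type):
                raise SchemaError(
                    f"event {index} field '{name}' has unexpected type "
                    f"{type(event[name]).__name__}"
                )
    return payload
```

Failing at ingestion keeps a changed schema from reaching the modeling and staking layers as missing or malformed data.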
What This Demonstrates
The hard part was never the code. Deciding which models to run, how to weight the ensemble, what data sources to trust, how to size stakes responsibly — those are judgment calls. AI can't make them for you.
Quantitative reasoning is a design skill. Defining Kelly criterion parameters, choosing between Bayesian shrinkage approaches, deciding when Monte Carlo simulation adds signal vs. noise — that's architecture, not math homework.
AI compressed the timeline dramatically. Infrastructure that would have taken weeks to write from scratch was generated in hours. The freed bandwidth went entirely into model refinement and edge case validation.
Shipping to real infrastructure was a deliberate choice. It's easy to claim something "works" in a notebook. Deploying on a cloud VPS, setting up CI/CD, running 24/7 — that's where the real constraints surface.