AI-Assisted PoC · Sports Analytics · Python · Paper Trading

TenisMonster: Quantitative Analytics & Paper Trading PoC

A personal experiment: what happens when you apply serious product architecture thinking to a fully AI-generated stack? The math, the business logic, the operator UI — all human decisions. The Python infrastructure, Docker setup, CI/CD — all AI-generated. The question was whether that split could produce something actually deployable. It did.

Python 3.12 · Streamlit · Docker · GitHub Actions CI/CD · Hetzner Cloud · The Odds API

Executive Summary

Honestly, this started from a conversation with my sister, who follows professional tennis seriously. We got into the usual debate about whether top-level matches are actually predictable or just glorified coin flips, and I realized I didn't have a rigorous answer. So instead of arguing, I built the infrastructure to find out. I was also curious about a second question: how far can AI-assisted development actually go when the person directing it understands the domain deeply?

The result is TenisMonster: a full-stack ATP tennis prediction system built entirely through AI prompting. Seven years of historical match data, live market odds, and ten competing statistical models run through a shared pipeline, with Kelly criterion staking, automated paper-bet registration, and 24/7 Docker deployment on cloud infrastructure. Every architectural decision, model selection, and UI design choice was made deliberately by a human. Every line of Python and Docker configuration was AI-generated through structured prompting. The experiment answered that second question: yes, it works, and the bottleneck is never the code generation.

10 Statistical Models (ensemble)

7 Years of ATP Data (Sackmann CSVs)

~67% OOS Accuracy (log-loss 0.60)

24/7 Deployment (Hetzner Cloud)

System Architecture

Four-Layer Pipeline Design

Each layer has a single responsibility and a defined output contract. Data flows one-way from ingestion to stake registration — no layer reaches back up the chain.

01

Data & Enrichment Layer

Ingests and normalizes seven years of Sackmann ATP CSVs — match results, player rankings, surface history — alongside live market odds, Open-Meteo weather conditions at match location, and operator-annotated injury/motivation signals. All sources are reconciled into a single event object before reaching the modeling layer.

ATP CSVs · Live Odds · Weather API · Operator Signals
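The "single event object" idea can be sketched as a small immutable dataclass plus one builder that refuses to emit an event when the sources disagree. This is a minimal illustration, not the project's actual schema: every field name here (`player_a`, `weather`, `build_event`, and so on) is hypothetical, and only `MatchEvent`, `live_odds`, and `operator_flags` echo names that appear in the simplified pipeline snippet.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class MatchEvent:
    """One reconciled event handed to the modeling layer (hypothetical schema)."""
    player_a: str
    player_b: str
    surface: str                      # e.g. "hard", "clay", "grass"
    start_time: datetime
    live_odds: dict[str, float]       # decimal odds keyed by player name
    weather: dict[str, float] = field(default_factory=dict)
    operator_flags: dict[str, str] = field(default_factory=dict)


def build_event(match: dict, odds: dict, weather: dict, flags: dict) -> MatchEvent:
    """Merge the four sources, failing early if odds do not cover both players."""
    players = (match["player_a"], match["player_b"])
    missing = [p for p in players if p not in odds]
    if missing:
        raise ValueError(f"no odds for: {missing}")
    return MatchEvent(
        player_a=players[0],
        player_b=players[1],
        surface=match["surface"].lower(),   # normalize casing across sources
        start_time=match["start_time"],
        live_odds={p: float(odds[p]) for p in players},
        weather=dict(weather),
        operator_flags=dict(flags),
    )
```

Failing at construction time keeps the one-way contract honest: the modeling layer never has to ask whether an event is complete.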
02

The Modeling Engine

A 10-model ensemble runs in parallel on each match event. Models range from surface-specific Elo with opponent-adjusted metrics and Bayesian head-to-head shrinkage to a Monte Carlo service-game simulation that operates at the point level, translating point-level serve statistics into match win probabilities.

Surface Elo · H2H Bayesian · Monte Carlo · Ensemble Voting
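To make three of the model families concrete, here is a minimal sketch under stated assumptions. The function names, the 1500 default rating, K = 32, and the prior strength of 10 are illustrative choices, not the system's actual parameters. The service-game part simulates points with a fixed probability `p` that the server wins each point, and checks the simulation against the standard closed-form hold probability for that same simplified model.

```python
import random


def elo_win_prob(ratings: dict, a: str, b: str, surface: str) -> float:
    """Expected score of a against b from surface-specific ratings."""
    ra = ratings.get((a, surface), 1500.0)
    rb = ratings.get((b, surface), 1500.0)
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def elo_update(ratings: dict, winner: str, loser: str, surface: str,
               k: float = 32.0) -> None:
    """Standard Elo update applied only to the ratings for this surface."""
    delta = k * (1.0 - elo_win_prob(ratings, winner, loser, surface))
    ratings[(winner, surface)] = ratings.get((winner, surface), 1500.0) + delta
    ratings[(loser, surface)] = ratings.get((loser, surface), 1500.0) - delta


def h2h_shrunk_prob(wins_a: int, meetings: int, prior_prob: float,
                    prior_strength: float = 10.0) -> float:
    """Beta-binomial posterior mean: with few meetings it stays near the prior."""
    return (wins_a + prior_strength * prior_prob) / (meetings + prior_strength)


def simulate_service_game(p: float, rng: random.Random) -> bool:
    """Play one game point by point; True if the server holds."""
    server = returner = 0
    while True:
        if rng.random() < p:
            server += 1
        else:
            returner += 1
        # A game needs at least 4 points and a 2-point margin.
        if server >= 4 and server - returner >= 2:
            return True
        if returner >= 4 and returner - server >= 2:
            return False


def hold_prob_mc(p: float, n_games: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the hold probability."""
    rng = random.Random(seed)
    return sum(simulate_service_game(p, rng) for _ in range(n_games)) / n_games


def hold_prob_exact(p: float) -> float:
    """Closed-form hold probability for i.i.d. points."""
    q = 1.0 - p
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (p**2 + q**2)
    return win_before_deuce + reach_deuce * win_from_deuce
```

With i.i.d. points the two hold-probability functions should agree to within sampling noise, which makes the closed form a cheap regression test for the simulator before it is extended to sets and matches.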
03

Risk & Value Logic

Raw model probabilities are converted to implied edges against market odds. A strict Kelly criterion staking algorithm sizes positions, capped by configurable bankroll limits. Multi-layer safety filters screen for value traps, line movement anomalies, and sharp money indicators before any bet is registered.

Kelly Criterion · CLV Tracking · Line Movement · Safety Filters
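The edge and Kelly arithmetic can be sketched in a few lines. For decimal odds `o` and model probability `p`, the expected return per unit staked is `p * o - 1`, and the full Kelly fraction is that edge divided by the net odds `o - 1`. The function names and the 5% cap below are assumptions for illustration; the source only says stakes are capped by configurable bankroll limits.

```python
def implied_probability(decimal_odds: float) -> float:
    """Market-implied probability (still includes the bookmaker margin)."""
    return 1.0 / decimal_odds


def edge(p: float, decimal_odds: float) -> float:
    """Expected profit per unit staked at the quoted price."""
    return p * decimal_odds - 1.0


def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Full Kelly fraction of bankroll; zero when there is no positive edge."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return max(0.0, edge(p, decimal_odds) / (decimal_odds - 1.0))


def stake(bankroll: float, p: float, decimal_odds: float,
          max_fraction: float = 0.05) -> float:
    """Kelly stake capped at max_fraction of bankroll (illustrative cap)."""
    return bankroll * min(kelly_fraction(p, decimal_odds), max_fraction)
```

For example, a model probability of 0.55 at odds of 2.10 gives an edge of 0.155 and a full Kelly fraction of about 14% of bankroll, so the cap is what actually determines the stake in that case.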
04

Ops & Automation

A 24/7 background scheduler deployed on Hetzner via Docker Compose handles automated paper-bet registration and 4-hourly result resolution. GitHub Actions manages CI/CD — linting, testing, and container rebuilds on every push to main. The system runs headlessly with no manual intervention required.

Docker Compose · GitHub Actions · Hetzner VPS · 4h Resolution
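One simple way to schedule 4-hourly resolution is to align runs to fixed UTC boundaries, so a container restart does not shift the cadence. The sketch below assumes that alignment, which the source does not confirm, and the helper name `next_resolution_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_resolution_time(now: datetime, every_hours: int = 4) -> datetime:
    """Next UTC boundary that is a multiple of every_hours after now.

    Expects a timezone-aware datetime and an interval that divides 24.
    """
    if 24 % every_hours != 0:
        raise ValueError("every_hours must divide 24")
    now = now.astimezone(timezone.utc)
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    slots_elapsed = now.hour // every_hours + 1
    return start_of_day + timedelta(hours=slots_elapsed * every_hours)
```

A background loop would sleep until the returned timestamp, run the resolution job, and then ask for the next boundary again.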

Operator Interface

Streamlit Control Center

The operator UI is a Streamlit control center. Its interface structure, information hierarchy, and UX logic were the human contribution; the implementation was generated through AI prompting. The dashboard lets non-technical stakeholders review model recommendations, inject human or LLM-generated injury signals that override statistical baselines, and track ROI and Closing Line Value (CLV) in real time.

Model Recommendations

Per-match ensemble consensus with confidence intervals

Signal Override

LLM or manual injury/motivation flag injection

Live P&L Tracking

Real-time ROI, CLV, and stake history dashboards
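For readers unfamiliar with the two tracking metrics, here is a minimal sketch. It uses one common definition of CLV, the price taken divided by the closing price minus one, which may differ from the definition the dashboard uses. The `PaperBet` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PaperBet:
    stake: float
    odds_taken: float      # decimal odds at registration
    closing_odds: float    # decimal odds at market close
    won: bool


def clv(bet: PaperBet) -> float:
    """Positive when the bet was placed at a better price than the close."""
    return bet.odds_taken / bet.closing_odds - 1.0


def profit(bet: PaperBet) -> float:
    """Net profit of a settled bet."""
    return bet.stake * (bet.odds_taken - 1.0) if bet.won else -bet.stake


def roi(bets: list[PaperBet]) -> float:
    """Total profit divided by total amount staked."""
    staked = sum(b.stake for b in bets)
    return sum(profit(b) for b in bets) / staked if staked else 0.0


def average_clv(bets: list[PaperBet]) -> float:
    """Unweighted mean CLV across bets."""
    return sum(clv(b) for b in bets) / len(bets) if bets else 0.0
```

ROI over a small sample is dominated by variance, which is why CLV is usually tracked alongside it as a faster-converging signal of whether the prices taken were good.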

pipeline.py — simplified
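# Signal injection overrides statistical model output.
# ensemble, signal_layer, market, kelly and config are assumed to be
# module-level collaborators imported elsewhere in pipeline.py.
def resolve_match(event: MatchEvent) -> Recommendation:
    # Baseline win probability: 10-model average.
    base_prob = ensemble.predict(event)
    # LLM / human override of the statistical baseline.
    adj_prob = signal_layer.apply(base_prob, event.operator_flags)
    # Edge of the adjusted probability against the live market price.
    edge = market.implied_edge(adj_prob, event.live_odds)
    # Kelly stake, scaled by the configured bankroll fraction.
    stake = kelly.size(edge, config.bankroll_fraction)
    return Recommendation(edge=edge, stake=stake, confidence=adj_prob)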

Senior-Level Analysis

Friction, Limitations & Learnings

Shipping this with a polished summary and zero caveats would be misleading. Here is what the data and the architecture actually revealed.

Market Reality vs. Code

The system reliably identifies positive expected value in historical and shadow data. However, connecting to live bookmaker execution APIs introduces regulatory constraints, KYC requirements, and latency complexities entirely outside the scope of a rapid PoC. Shipping to real capital would demand a separate compliance and execution architecture layer — a deliberately deferred problem.

Model Variance at the Extremes

Out-of-sample backtesting stabilized at ~67% match prediction accuracy (log-loss 0.60), which is competitive for open-odds markets. The ensemble, however, degrades sharply on extreme underdog events and tour-level outliers. Human override flags for fatigue and motivation exist in the UI precisely because no model can quantify a player mentally withdrawing mid-tournament.
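For context on how the two headline metrics are computed, here is a minimal sketch; the function names are illustrative. A useful reference point is that a model predicting 50% for every match scores a log-loss of ln 2 ≈ 0.693, so 0.60 reflects real but modest probabilistic skill.

```python
import math


def accuracy(probs: list[float], outcomes: list[int]) -> float:
    """Share of matches where the favored side (p > 0.5) actually won."""
    hits = sum((p > 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


def log_loss(probs: list[float], outcomes: list[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood; punishes confident wrong predictions."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)
```

Log-loss is the more informative of the two for staking, because Kelly sizing depends on how well calibrated the probabilities are, not only on which side was favored.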

Third-Party Data Fragility

The pipeline's scheduling logic is tightly coupled to The Odds API's rate limits, response structure, and coverage decisions. Any schema change or quota reduction propagates downstream as a silent failure. This was the sharpest architectural lesson: robust product infrastructure must treat every external dependency as a liability with its own failure budget, not a guaranteed utility.
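One way to turn a silent failure into a loud one is to validate every third-party payload at the boundary and raise a dedicated error as soon as its shape drifts. The sketch below assumes an event payload with nested `bookmakers`, `markets` (with `key` equal to `"h2h"`), and `outcomes` carrying `name` and `price`; those field names should be verified against the provider's current documentation. The helper and exception names are hypothetical.

```python
class OddsPayloadError(ValueError):
    """Raised when a third-party odds payload does not match expectations."""


def extract_h2h_prices(event: dict) -> dict[str, float]:
    """Return the best head-to-head decimal price per player, or fail loudly."""
    quotes: list[tuple[str, float]] = []
    try:
        for bookmaker in event["bookmakers"]:
            for market in bookmaker["markets"]:
                if market["key"] != "h2h":
                    continue
                for outcome in market["outcomes"]:
                    quotes.append((outcome["name"], float(outcome["price"])))
    except (KeyError, TypeError, ValueError) as exc:
        raise OddsPayloadError(f"unexpected odds schema: {exc!r}") from exc

    best: dict[str, float] = {}
    for name, price in quotes:
        if price <= 1.0:
            raise OddsPayloadError(f"implausible price {price} for {name}")
        best[name] = max(best.get(name, 0.0), price)

    # A tennis match must resolve to exactly two priced players.
    if len(best) != 2:
        raise OddsPayloadError(f"expected 2 players, got {sorted(best)}")
    return best
```

Catching `OddsPayloadError` in the scheduler and counting occurrences gives the dependency an explicit failure budget instead of letting empty or malformed odds flow into staking.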

What This Demonstrates

The hard part was never the code. Deciding which models to run, how to weight the ensemble, what data sources to trust, how to size stakes responsibly — those are judgment calls. AI can't make them for you.

Quantitative reasoning is a design skill. Defining Kelly criterion parameters, choosing between Bayesian shrinkage approaches, deciding when Monte Carlo simulation adds signal vs. noise — that's architecture, not math homework.

AI compressed the timeline dramatically. Infrastructure that would have taken weeks to write from scratch was generated in hours. The freed bandwidth went entirely into model refinement and edge case validation.

Shipping to real infrastructure was a deliberate choice. It's easy to claim something "works" in a notebook. Deploying on a cloud VPS, setting up CI/CD, running 24/7 — that's where the real constraints surface.