tennis-predictions

Comprehensive tennis match-prediction system powered by historical data, bookmaker odds, and machine learning. Designed to run as a Streamlit web application; data pipelines operate autonomously on GitHub Actions.

This repository originally took inspiration from LewisWJackson's tennis predictor, but has evolved substantially with new data sources, caching layers, and a modern UI.

📦 Key Features

- Historical data (2020–present) built from TennisMyLife and tennis-data.co.uk, with bookmaker odds joined to matches via player-name normalisation (81.5% match rate).
- Live pre-match odds fetched from the Matchstat API on RapidAPI, with per-day caching and a 500-call/month budget guard.
- A full feature engineering pipeline generating ELO ratings, serve stats, surface form, H2H counts, and market probabilities, rebuilt daily via GitHub Actions.
- Streamlit UI with four tabs:
  1. Today's Matches (live odds, ELO, market value)
  2. Match Explorer (filterable historical dataset)
  3. ELO Rankings (overall and surface leaderboards)
  4. Model Stats (training metrics)
- Automated update workflow (.github/workflows/update_data.yml) that downloads the latest matches, rebuilds features, and commits the changes back to main.
- MIT-licensed data sources: TennisMyLife (1968–present) and tennis-data.co.uk (odds 2020–2025). All code is permissively licensed.
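To give a feel for how the odds join can work, here is a minimal name-normalisation sketch. It assumes odds rows use a "Surname I." style (e.g. "Alcaraz C.") and TML rows use full names (e.g. "Carlos Alcaraz"), and reduces both to a shared "surname + first initial" key. `normalise_name`, `key_from_full_name`, `key_from_odds_name` and `build_lookup` are hypothetical helpers, not the functions in enrich_with_odds.py.

```python
import re
import unicodedata


def normalise_name(name: str) -> str:
    """Lower-case, strip accents, and turn hyphens/dots/apostrophes into spaces."""
    # NFKD splits "é" into "e" + a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    spaced = re.sub(r"[-.']", " ", no_accents.lower())
    return " ".join(spaced.split())


def key_from_full_name(full_name: str) -> str:
    """TML-style 'Carlos Alcaraz' -> 'alcaraz c' (surname + first initial)."""
    parts = normalise_name(full_name).split()
    if len(parts) < 2:
        return " ".join(parts)
    first, surname = parts[0], " ".join(parts[1:])
    return f"{surname} {first[0]}"


def key_from_odds_name(odds_name: str) -> str:
    """Odds-style 'Alcaraz C.' -> 'alcaraz c' (assumes the last token is the initial)."""
    parts = normalise_name(odds_name).split()
    if len(parts) < 2:
        return " ".join(parts)
    surname, initial = " ".join(parts[:-1]), parts[-1]
    return f"{surname} {initial[0]}"


def build_lookup(full_names: list[str]) -> dict[str, str]:
    """Map match keys to TML names; on a key collision the first name wins."""
    lookup: dict[str, str] = {}
    for full in full_names:
        lookup.setdefault(key_from_full_name(full), full)
    return lookup


if __name__ == "__main__":
    lookup = build_lookup(["Carlos Alcaraz", "Félix Auger-Aliassime", "Alex de Minaur"])
    print(lookup.get(key_from_odds_name("Auger-Aliassime F.")))  # Félix Auger-Aliassime
    print(lookup.get(key_from_odds_name("Sinner J.")))           # None
```

Names that don't reduce to the same key (for example when one source drops a second surname) simply stay unmatched in this sketch.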

🚀 Getting Started

Clone this repo and create a Python 3.11 venv.
Install dependencies:
```
pip install -r requirements.txt
```
Set the API keys:
- ODDS_API_KEY for The Odds API (optional; used for the historical odds join).
- RAPIDAPI_KEY for the Matchstat tennis API (required for live odds); place it in .env or .streamlit/secrets.toml.

Run initial data prep:

```
python update_tml_data.py       # download current-year TML files
python features.py              # build feature matrix (2020+)
```
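features.py produces ELO ratings among its features. The sketch below shows the standard Elo update with one overall rating and one rating per surface, which is also what an overall/surface leaderboard needs. The K-factor of 32, the 1500 starting rating and the `EloTracker` class are assumptions for illustration; features.py may use different constants or a variant such as a match-count-dependent K.

```python
from __future__ import annotations

from collections import defaultdict

K_FACTOR = 32.0        # assumption: a common fixed K; features.py may differ
START_RATING = 1500.0  # assumption: conventional starting rating


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a player rated r_a beats one rated r_b (standard Elo curve)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


class EloTracker:
    """Hypothetical tracker keeping one overall rating and one rating per surface."""

    def __init__(self, k: float = K_FACTOR, start: float = START_RATING):
        self.k = k
        self.overall = defaultdict(lambda: start)
        self.by_surface = defaultdict(lambda: defaultdict(lambda: start))

    def update(self, winner: str, loser: str, surface: str) -> dict[str, float]:
        """Apply one result; return the ratings as they stood *before* the match."""
        surf = self.by_surface[surface]
        pre_match = {
            "winner_elo": self.overall[winner],
            "loser_elo": self.overall[loser],
            "winner_surface_elo": surf[winner],
            "loser_surface_elo": surf[loser],
        }
        for table in (self.overall, surf):
            delta = self.k * (1.0 - expected_score(table[winner], table[loser]))
            table[winner] += delta
            table[loser] -= delta
        return pre_match

    def leaderboard(self, surface: str | None = None, n: int = 10) -> list[tuple[str, float]]:
        """Top-n players overall, or on a single surface."""
        table = self.overall if surface is None else self.by_surface[surface]
        return sorted(table.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    elo = EloTracker()
    # Process matches in date order so no rating ever sees a future result.
    for w, l, s in [("Sinner", "Alcaraz", "Hard"), ("Alcaraz", "Sinner", "Clay")]:
        elo.update(w, l, s)
    print(elo.leaderboard())
    print(elo.leaderboard("Clay"))
```

Returning the pre-match ratings is the important design choice: using post-update ratings as features would leak each match's own result into its row.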

Train or update the prediction model. This step is optional, but the model probabilities and betting edge shown in the UI require it:
```
python train.py                # trains & saves best model; metrics printed
```
The training script evaluates accuracy, AUC‑ROC, Brier score, and log loss; results are stored in data_files/tennis_predictor.pkl and displayed in the "Model Stats" tab of the app.
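For reference, the four metrics the training script reports can be written in a few lines of plain Python. This is an illustrative sketch, assuming binary outcomes `y_true` (1 = the player won) and predicted win probabilities `p_pred`; scikit-learn's `sklearn.metrics` provides `accuracy_score`, `roc_auc_score`, `brier_score_loss` and `log_loss` for the same job, and whether train.py uses those is not stated here.

```python
import math


def accuracy(y_true, p_pred, threshold=0.5):
    """Share of rows where thresholding p at 0.5 gives the actual 0/1 outcome."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)


def brier_score(y_true, p_pred):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped so log(0) can't occur."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def auc_roc(y_true, p_pred):
    """Chance a random y=1 row scores higher than a random y=0 row (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    p = [0.9, 0.2, 0.6, 0.4]
    print(accuracy(y, p), auc_roc(y, p))  # 1.0 1.0
    print(round(brier_score(y, p), 4))    # 0.0925
```

The pairwise AUC loop is O(n²) and only meant to show the definition; a rank-based or library implementation is the practical choice on a full season of matches.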
Start the app:
```
streamlit run predictions.py
```
- The Today's Matches tab shows market odds, model win probabilities, and an "edge" column, highlighted green when the model's probability exceeds the market's implied probability.
- The model-probability and edge cells stay blank when neither player has ATP main-tour history (e.g. Futures/ITF events).
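The edge comparison above can be sketched as follows, assuming decimal odds and proportional removal of the bookmaker margin (overround). Both of those choices, and the helpers `implied_probabilities` and `edge`, are assumptions; the app may use raw 1/odds or a different de-margining method.

```python
def implied_probabilities(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Decimal odds -> implied win probabilities, with the bookmaker margin removed."""
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    overround = raw_a + raw_b  # above 1.0 whenever the book includes a margin
    return raw_a / overround, raw_b / overround


def edge(model_prob: float, odds_player: float, odds_opponent: float) -> float:
    """Model probability minus market probability for the player priced at odds_player."""
    market_prob, _ = implied_probabilities(odds_player, odds_opponent)
    return model_prob - market_prob


if __name__ == "__main__":
    e = edge(0.60, 1.80, 2.00)  # model says 60%, de-margined market says about 52.6%
    print(f"{e:+.3f}")  # +0.074
    print("highlight green" if e > 0 else "no highlight")
```

A positive value corresponds to the green highlight; a negative value means the market rates the player more highly than the model does.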
Deploy to Streamlit Cloud by connecting this repo; the GitHub Action will keep data fresh each morning.

📁 Repository Structure

```
tennis-predictions/
├── data_files/                 # intermediate and output datasets
│   ├── features_2020_present.parquet  # feature matrix used by app
│   └── *.xlsx                   # raw tennis-data.co.uk downloads
├── docs/                       # design and reference documentation
├── tml-data/                   # TennisMyLife CSVs + enriched odds
├── matchstat_api.py            # client with caching & budget tracking
├── features.py                 # feature engineering pipeline
├── update_tml_data.py          # daily TML downloader
├── enrich_with_odds.py         # join tennis-data.co.uk odds onto TML
├── train.py                    # model training and evaluation
├── predictions.py              # Streamlit application
└── .github/workflows/update_data.yml  # scheduled data-refresh CI
```
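matchstat_api.py is described as a client with caching and budget tracking. Below is a minimal sketch of that idea, assuming a JSON-on-disk cache keyed by date and endpoint, a `usage.json` counter that resets each calendar month, and a plain callable standing in for the real HTTP request. `DailyCachedClient`, `BudgetExceeded` and the file layout are hypothetical, not the module's actual interface.

```python
import json
from datetime import date
from pathlib import Path
from typing import Any, Callable

MONTHLY_BUDGET = 500  # the README's 500-call/month guard


class BudgetExceeded(RuntimeError):
    """Raised instead of making a request once the monthly budget is spent."""


class DailyCachedClient:
    """Hypothetical wrapper: same-day repeats come from disk, real calls are counted."""

    def __init__(self, fetch: Callable[[str], Any], cache_dir: Path,
                 budget: int = MONTHLY_BUDGET,
                 today: Callable[[], date] = date.today):
        self.fetch = fetch      # stands in for the real API request
        self.cache_dir = cache_dir
        self.budget = budget
        self.today = today      # injectable clock, handy for testing
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.usage_path = cache_dir / "usage.json"

    def _load_usage(self) -> dict:
        month = self.today().strftime("%Y-%m")
        if self.usage_path.exists():
            usage = json.loads(self.usage_path.read_text())
            if usage.get("month") == month:
                return usage
        return {"month": month, "calls": 0}  # new month: counter starts again

    def get(self, endpoint: str) -> Any:
        day = self.today().isoformat()
        safe_name = endpoint.strip("/").replace("/", "_") or "root"
        cache_file = self.cache_dir / f"{day}_{safe_name}.json"
        if cache_file.exists():  # cache hit: no API call spent
            return json.loads(cache_file.read_text())
        usage = self._load_usage()
        if usage["calls"] >= self.budget:
            raise BudgetExceeded(f"{usage['calls']}/{self.budget} calls used in {usage['month']}")
        # Count the call before making it, in case a failed request still counts.
        usage["calls"] += 1
        self.usage_path.write_text(json.dumps(usage))
        data = self.fetch(endpoint)
        cache_file.write_text(json.dumps(data))
        return data
```

Keying the cache by date means a page reload on the same day costs nothing, while the next morning's request fetches fresh odds.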

🛠 CI / Automation

A GitHub Action (update_data.yml) runs daily at 05:00 UTC to:

1. Refresh current-year TML files.
2. Re-run features.py to rebuild features_2020_present.parquet.
3. Commit and push any changed data.

📚 Documentation

See the docs/ folder for deeper guides — data acquisition, feature engineering, odds integration, and more. Start with docs/01_roadmap.md.

Happy coding and may your nets be full of aces! 🎾
