Roadmap#

This roadmap outlines planned features, ideas under exploration, and long-term goals for penaltyblog.

It’s not a guarantee, but a guide - contributions, feedback, and suggestions are welcome!

πŸ”œ Planned#

MatchFlow#

Usability + Helper Expansion

  • ☐ General speed optimisations + cythonization to make faster

  • ☐ More where_ and get_ helpers

  • ☐ Flow.describe() improvements

  • ☐ Docs: Writing custom helpers tutorial

  • ☐ Docs: More Flow recipes

  • ☐ Add plugin interface to make it easy to add in other data providers

  • β˜‘ Progress bars

  • β˜‘ Custom query DSL for natural quering - flow.query("player.name == 'Kevin de Bruyne'")

  • β˜‘ Optimization of internal DAG plan

Joins & I/O Enhancements

  • ☐ Join-on-multiple-fields support

  • ☐ Benchmarks page in docs

  • ☐ Parallel loading of files

Rolling & Windowed Aggregates

  • β˜‘ .rolling(...) and .expanding(...) on grouped flows

  • ☐ Support for rolling summary fields like moving average xG

Plotting#

  • β˜‘ Publish penaltyblog plotting library

  • β˜‘ Native support for plotting Flow pipelines

Models#

  • ☐ Bring the Bayesian models back to the party

  • ☐ Add new models based on time-series approaches

  • ☐ Pre-trained models, e.g. xT

  • ☐ Updated player ratings model

Scrapers#

  • ☐ Give scraper module an overhaul to make it more efficient and easier to use

  • ☐ Add support for new data sources such as Sofa Score

  • ☐ Add automatic throttling to avoid overloading servers

  • ☐ Hook up scrapers to MatchFlow

  • ☐ Caching of scraped data sources

General#

  • ☐ Refresh / expand rest of documentation


πŸ§ͺ Under Exploration#

These are bigger ideas I’m researching - feedback welcome!

MatchFlow#

  • FlowZ: A custom binary format for fast I/O on nested JSON

  • Partitioning of large datasets for faster processing

  • Built-in indexing or predicate pushdown

  • Streaming joins for large datasets

  • A lightweight visual data explorer (maybe based on my upcoming plotting library)

  • Declarative YAML/JSON pipeline definitions.

  • Pluggable transforms (e.g. xT, formation_detection, pressing_zones)

Models#

  • Custom Bayesian library focussed on building sports models without depenency hassles


Contributing#

If you’re interested in helping with anything here, feel free to open an issue, submit a PR, or just reach out.