MatchFlow
==========
**MatchFlow** is a lightweight toolkit for working with structured football data, especially nested JSON like StatsBomb event files or match-level logs. Whether you're building quick explorations or full pipelines, MatchFlow helps you work directly with deeply structured data using a clean, lazy, and chainable API.
What is MatchFlow?
------------------
``Flow`` is not a DataFrame; it is a **stream-first query engine** built for irregular, event-based football data.
You can:
- Load JSON, JSONL, or entire folders of match data
- Filter and transform records lazily with ``.filter()``, ``.assign()``, ``.select()``
- Group and summarize using ``.group_by()`` + ``.summary()``
- Join datasets, explode lists, split arrays, pivot rows
- Work with nested data without flattening too early
- Chain steps fluently, materialize only when ready
- Filter using string expressions, like ``"age > 30 and team == @team_name"``
- Stream data directly from the StatsBomb or Opta APIs
All transformations are **lazy**; nothing runs until you ask for results with ``.collect()``, ``.to_pandas()``, ``.to_jsonl()`` etc.
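
The sketch below illustrates what "lazy" means in practice. It is a toy model built only on the standard library, not MatchFlow's implementation: the ``LazyPipeline`` class, its methods, and the sample events are all hypothetical, and exist only to show that chained steps are recorded first and executed when results are requested.

.. code-block:: python

    class LazyPipeline:
        """Toy lazy pipeline (hypothetical): records steps, runs them on collect()."""

        def __init__(self, source, steps=()):
            self._source = source        # any iterable of dict records
            self._steps = tuple(steps)   # each step maps an iterator to an iterator

        def filter(self, predicate):
            # Record the step; the predicate is not called yet.
            def step(records):
                return (r for r in records if predicate(r))
            return LazyPipeline(self._source, self._steps + (step,))

        def assign(self, **fields):
            # Record a step that adds computed fields to each record.
            def step(records):
                for r in records:
                    yield {**r, **{name: fn(r) for name, fn in fields.items()}}
            return LazyPipeline(self._source, self._steps + (step,))

        def collect(self):
            # Only here are the recorded steps applied to the source.
            records = iter(self._source)
            for step in self._steps:
                records = step(records)
            return list(records)


    calls = []  # tracks when the predicate actually runs

    def is_shot(record):
        calls.append(record["id"])
        return record["type"] == "Shot"

    # Made-up sample events, for illustration only.
    events = [
        {"id": 1, "type": "Pass", "xg": None},
        {"id": 2, "type": "Shot", "xg": 0.12},
        {"id": 3, "type": "Shot", "xg": 0.40},
    ]

    pipeline = (
        LazyPipeline(events)
        .filter(is_shot)
        .assign(big_chance=lambda r: r["xg"] > 0.3)
    )
    calls_before_collect = list(calls)  # snapshot taken before any work is done
    shots = pipeline.collect()          # the steps execute now

In this sketch ``calls_before_collect`` is still empty after the chain is built, because the predicate only runs inside ``collect()``. A real ``Flow`` may plan and execute its steps differently.
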
Interactive Examples
--------------------
For a comprehensive, hands-on demonstration of MatchFlow, try the interactive Colab notebook.
The notebook walks you through downloading data directly from the StatsBomb API (including StatsBomb's free, open datasets),
building data pipelines, and creating interactive visualisations using ``penaltyblog``'s ``Pitch`` plotting library.
You can modify the code, experiment with different parameters, and see how the data changes in real time.
Guide Index
-----------

.. list-table:: Guide Index
   :widths: 25 75
   :header-rows: 1

   * - Section
     - Description
   * - :doc:`why`
     - Why working with nested football data needs a new tool
   * - :doc:`introduction`
     - Introduction to MatchFlow
   * - :doc:`basic_pipeline`
     - Filtering, assigning, selecting, and shaping your data
   * - :doc:`grouping_and_aggregating`
     - Summarizing by team, player, period, and more
   * - :doc:`advanced`
     - Sorting, ranking, joining and deduplicating
   * - :doc:`schema`
     - Schema inference, casting, and field validation
   * - :doc:`file_io`
     - Working with JSON, JSONL, folders, glob patterns
   * - :doc:`inspection`
     - Exploring structure, peeking at records, debugging
   * - :doc:`best_practices`
     - Materialization, memory, performance, clean code
   * - :doc:`predicates`
     - Reusable filters like ``where_equals()``, ``and_()``
   * - :doc:`query`
     - Filtering using string expressions, like ``"age > 30 and team == @team_name"``
   * - :doc:`optimizer`
     - Smart plan rewrites for faster execution
   * - :doc:`statsbomb`
     - Streaming data directly from the StatsBomb API
   * - :doc:`opta`
     - Streaming data directly from the Opta API

Quick Start
------------

.. code-block:: python

    from penaltyblog.matchflow import Flow, where_equals

    # Load and filter StatsBomb shots
    flow = (
        Flow.statsbomb.events(match_id=19716)
        .filter(where_equals("type.name", "Shot"))
        .select("player.name", "location", "shot.statsbomb_xg")
    )

    for shot in flow.head(5):
        print(shot)

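
The field names in the pipeline above, such as ``"type.name"`` and ``"shot.statsbomb_xg"``, are dotted paths into nested records. As a rough illustration of the idea, the sketch below resolves such paths against a plain dictionary. The ``get_path`` helper and the sample event are hypothetical and are not part of MatchFlow, whose own path handling may differ.

.. code-block:: python

    def get_path(record, path, default=None):
        """Follow a dotted path through nested dicts (hypothetical helper)."""
        current = record
        for key in path.split("."):
            # Stop early if the path leaves the nested-dict structure.
            if not isinstance(current, dict) or key not in current:
                return default
            current = current[key]
        return current

    # Made-up event shaped loosely like a nested shot record.
    event = {
        "type": {"name": "Shot"},
        "player": {"name": "Example Player"},
        "location": [102.0, 34.5],
        "shot": {"statsbomb_xg": 0.27},
    }

    shot_type = get_path(event, "type.name")
    xg = get_path(event, "shot.statsbomb_xg")
    missing = get_path(event, "pass.length")  # absent path falls back to None

Returning a default for absent paths is one possible design choice; it suits event data where many fields appear only on some event types.
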
Ready to Flow?
--------------
Pick a section from the guide above, or jump in with ``.from_jsonl()``, ``.from_folder()``, or ``.statsbomb.events()`` and start building your pipeline.
Need help? Ask questions, file issues, or suggest improvements any time.

.. toctree::
   :hidden:

   why
   introduction
   basic_pipeline
   grouping_and_aggregating
   advanced
   schema
   optimizer
   file_io
   inspection
   best_practices
   predicates
   query
   statsbomb
   opta