===============================
Using Flow with StatsBomb Data
===============================
.. raw:: html
Flow includes a built-in integration with the StatsBomb API, making it easy to stream structured football data directly into your pipelines.
Rather than loading everything upfront, Flow wraps the API as **lazy operations** - each call builds a plan that fetches the data only when needed (e.g., on ``.collect()`` or ``.to_pandas()``).
โ๏ธ Setup
========
Ensure your **StatsBomb credentials** are set as environment variables if you're using private access:
.. code-block:: bash
export SB_USERNAME="your_username"
export SB_PASSWORD="your_password"
๐ Getting Started
==================
.. code-block:: python
from penaltyblog.matchflow import Flow
# Fetch all competitions
comps = Flow.statsbomb.competitions()
for comp in comps.head(3):
print(comp)
All API calls return a ``Flow``, so you can apply all usual transformations like ``.filter()``, ``.select()``, ``.assign()``, etc.
๐ Available Endpoints
======================
+-----------------------------------------------------+------------------------------------+
| Method | Description |
+=====================================================+====================================+
| ``.competitions()`` | All competitions available via API |
+-----------------------------------------------------+------------------------------------+
| ``.matches(competition_id, season_id)`` | Matches for a specific season |
+-----------------------------------------------------+------------------------------------+
| ``.events(match_id)`` | All events in a match |
+-----------------------------------------------------+------------------------------------+
| ``.lineups(match_id)`` | Lineups and formation for a match |
+-----------------------------------------------------+------------------------------------+
| ``.player_match_stats(match_id)`` | Player-level stats for a match |
+-----------------------------------------------------+------------------------------------+
| ``.player_season_stats(competition_id, season_id)`` | Player stats over a season |
+-----------------------------------------------------+------------------------------------+
| ``.team_match_stats(match_id)`` | Team stats for a match |
+-----------------------------------------------------+------------------------------------+
| ``.team_season_stats(competition_id, season_id)`` | Team stats over a season |
+-----------------------------------------------------+------------------------------------+
All of these return a lazy Flow
๐งช Example: Shots in a Match
============================
.. code-block:: python
from penaltyblog.matchflow import Flow, where_equals
shots = (
Flow.statsbomb.events(match_id=19716)
.filter(where_equals("type.name", "Shot"))
.select("player.name", "location", "shot.outcome.name")
)
for shot in shots.head(3):
print(shot)
๐งผ Filtering & Transforming
===========================
Because Flow supports deep access to nested fields, you can work directly with StatsBomb's JSON structure without needing to flatten first:
.. code-block:: python
from penaltyblog.matchflow import Flow, where_equals
top_scorers = (
Flow.statsbomb.player_season_stats(competition_id=43, season_id=106)
.filter(lambda r: r["goals"] >= 5)
.select("player.name", "team.name", "goals")
)
๐ข Lazy Until Needed
====================
Remember, nothing is downloaded or processed until you **materialize the flow**:
- ``.collect()`` โ fetches all records
- ``.to_pandas()`` โ fetches and converts to DataFrame
- ``.head(n)`` โ fetches just the first n records
.. code-block:: python
df = Flow.statsbomb.competitions().to_pandas()
print(df)
๐ Authenticated Access
=======================
All API methods accept a creds dictionary, or you can use environment variables:
.. code-block:: python
Flow.statsbomb.events(match_id=123, creds={"user": "...", "passwd": "..."})
๐ง Tips
=======
- Useful for clubs or analysts already using StatsBomb data
- Flows can be joined with your internal data or flattened and saved
- Try ``.flatten().to_jsonl()`` to export clean JSONL for later
๐ Summary
==========
Flow's StatsBomb integration:
- โ
Keeps your data structured
- โ
Streams on demand (not loaded eagerly)
- โ
Integrates with full Flow pipeline tools
- โ
Works with both open and authenticated endpoints
Interactive Example
--------------------
For a comprehensive, hands-on demonstration of working with StatsBomb data, try the interactive Colab notebook.
The notebook walks you through loading data from the StatsBomb API, filtering it, and visualizing the results.
You can modify the code, experiment with different parameters, and see how the data change in real-time.
.. raw:: html