{ "cells": [ { "cell_type": "markdown", "id": "532518b0", "metadata": {}, "source": [ "# 📍 Recipe: Statsbomb API Integration\n", "\n", "This example shows how to connect directly to the official StatsBomb API using `Flow`, then query, filter, and process match data with minimal boilerplate.\n", "\n", "`Flow.statsbomb` wraps the `statsbombpy` client to give you fast, flexible access to event, lineup, and 360 data - in a format ready for analysis.\n", "\n", "## 🧰 In This Recipe, You’ll Learn:\n", "\n", "- How to access StatsBomb data using `Flow.statsbomb` methods\n", "- How to query nested JSON records without flattening\n", "- How to filter, transform, and prepare data for analysis or storage\n", "- How to use `Flow` as an ETL tool to move data from API to file or database\n", "\n", "This is the easiest way to stream structured football data into your own pipelines.\n", "\n", "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "id": "3b63462f", "metadata": {}, "outputs": [], "source": [ "from pprint import pprint\n", "import sqlite3\n", "import warnings\n", "\n", "from penaltyblog.matchflow import Flow, where_equals, get_field, get_index\n", "from statsbombpy.api_client import NoAuthWarning\n", "\n", "# Suppress Statsbomb's NoAuthWarning warnings since we're using the open data\n", "warnings.filterwarnings(\"ignore\", category=NoAuthWarning)" ] }, { "cell_type": "markdown", "id": "ef67e3c5", "metadata": {}, "source": [ "## Competitions\n", "\n", "Get the first competition listed for Italy." ] }, { "cell_type": "code", "execution_count": 2, "id": "3df59121", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'competition_gender': 'male',\n", " 'competition_id': 12,\n", " 'competition_international': False,\n", " 'competition_name': 'Serie A',\n", " 'competition_youth': False,\n", " 'country_name': 'Italy',\n", " 'match_available': '2024-06-25T23:56:11.910924',\n", " 'match_available_360': None,\n", " 'match_updated': '2024-06-25T23:56:11.910924',\n", " 'match_updated_360': None,\n", " 'season_id': 27,\n", " 'season_name': '2015/2016'}\n" ] } ], "source": [ "result = (\n", " Flow.statsbomb.competitions()\n", " .filter(where_equals(\"country_name\", \"Italy\"))\n", " .collect()\n", ")\n", "\n", "pprint(result[0])" ] }, { "cell_type": "markdown", "id": "54943bde", "metadata": {}, "source": [ "## Matches\n", "\n", "Get the matches for a given competition and season and filter to specific nested fields using \"dot\" notation." ] }, { "cell_type": "code", "execution_count": 3, "id": "a059ab28", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'away_team': {},\n", " 'away_team_name': 'Kerala Blasters',\n", " 'competition': {},\n", " 'competition_name': 'Indian Super league',\n", " 'home_team': {},\n", " 'home_team_name': 'Hyderabad',\n", " 'referee': {'country': {}},\n", " 'referee_country_name': 'India',\n", " 'referee_name': 'Crystal John'},\n", " {'away_team': {},\n", " 'away_team_name': 'Jamshedpur',\n", " 'competition': {},\n", " 'competition_name': 'Indian Super league',\n", " 'home_team': {},\n", " 'home_team_name': 'Kerala Blasters',\n", " 'referee': {'country': {}},\n", " 'referee_country_name': 'India',\n", " 'referee_name': 'Harish Kundu'},\n", " {'away_team': {},\n", " 'away_team_name': 'Hyderabad',\n", " 'competition': {},\n", " 'competition_name': 'Indian Super league',\n", " 'home_team': {},\n", " 'home_team_name': 'ATK Mohun Bagan',\n", " 'referee': {'country': {}},\n", " 'referee_country_name': 'India',\n", " 'referee_name': 'Ramachandran Venkatesh'}]\n" ] } ], "source": [ "result = (\n", " Flow.statsbomb.matches(competition_id=1238, season_id=108)\n", " .select(\n", " \"competition.competition_name\",\n", " \"home_team.home_team_name\",\n", " \"away_team.away_team_name\",\n", " \"referee.name\",\n", " \"referee.country.name\",\n", " )\n", " .rename(\n", " **{\n", " \"competition.competition_name\": \"competition_name\",\n", " \"home_team.home_team_name\": \"home_team_name\",\n", " \"away_team.away_team_name\": \"away_team_name\",\n", " \"referee.name\": \"referee_name\",\n", " \"referee.country.name\": \"referee_country_name\",\n", " }\n", " )\n", " .collect()\n", ")\n", "\n", "pprint(result[:3])" ] }, { "cell_type": "markdown", "id": "6730228c", "metadata": {}, "source": [ "## Lineups\n", "\n", "Get the first player in the lineup for each team for a given match_id." ] }, { "cell_type": "code", "execution_count": 4, "id": "2714de91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'player': {'cards': [],\n", " 'country': {'id': 22, 'name': 'Belgium'},\n", " 'jersey_number': 17,\n", " 'player_id': 2954,\n", " 'player_name': 'Youri Tielemans',\n", " 'player_nickname': None,\n", " 'positions': [{'end_reason': 'Substitution - Off (Tactical)',\n", " 'from': '00:00',\n", " 'from_period': 1,\n", " 'position': 'Right Defensive Midfield',\n", " 'position_id': 9,\n", " 'start_reason': 'Starting XI',\n", " 'to': '77:47',\n", " 'to_period': 2}]},\n", " 'team_name': 'Belgium'},\n", " {'player': {'cards': [],\n", " 'country': {'id': 68, 'name': 'England'},\n", " 'jersey_number': 20,\n", " 'player_id': 3094,\n", " 'player_name': 'Bamidele Alli',\n", " 'player_nickname': 'Dele Alli',\n", " 'positions': [{'end_reason': 'Final Whistle',\n", " 'from': '83:28',\n", " 'from_period': 2,\n", " 'position': 'Right Center Midfield',\n", " 'position_id': 13,\n", " 'start_reason': 'Substitution - On (Tactical)',\n", " 'to': None,\n", " 'to_period': None}]},\n", " 'team_name': 'England'}]\n" ] } ], "source": [ "result = (\n", " Flow.statsbomb.lineups(match_id=8657)\n", " .assign(player=lambda x: get_field(x, \"lineup.0\"))\n", " .select(\"team_name\", \"player\")\n", " .collect()\n", ")\n", "\n", "pprint(result)" ] }, { "cell_type": "markdown", "id": "f9ab13f4", "metadata": {}, "source": [ "## Events\n", "\n", "Get all events for a given match_id and count who took the most shots." ] }, { "cell_type": "code", "execution_count": 5, "id": "e6306f4d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'player.name': 'Harry Maguire', 'n_shots': 3}\n", "{'player.name': 'Eric Dier', 'n_shots': 3}\n", "{'player.name': 'Thomas Meunier', 'n_shots': 2}\n" ] } ], "source": [ "result = (\n", " Flow.statsbomb.events(match_id=8657)\n", " .filter(where_equals(\"type.name\", \"Shot\"))\n", " .group_by(\"player.name\")\n", " .summary({\"n_shots\": (\"count\", \"player.name\")})\n", " .sort_by(\"n_shots\", ascending=False)\n", " .limit(3)\n", ")\n", "\n", "for record in result:\n", " print(record)" ] }, { "cell_type": "markdown", "id": "60d470dd", "metadata": {}, "source": [ "Filter the events for a given match_id to select only passes and save them to a database table. For simplicity, we'll just use a local SQLite database." ] }, { "cell_type": "code", "execution_count": 6, "id": "7c48f851", "metadata": {}, "outputs": [], "source": [ "# Create a connection to the SQLite database\n", "conn = sqlite3.connect(\"/tmp/passes.db\")\n", "\n", "# Save the DataFrame to the SQLite database, appending to the table\n", "results = (\n", " Flow.statsbomb.events(match_id=8657)\n", " .filter(lambda r: get_field(r, \"type.name\") == \"Pass\")\n", " .assign(\n", " player_id=lambda r: get_field(r, \"player.id\"),\n", " player_name=lambda r: get_field(r, \"player.name\"),\n", " start_x=lambda r: get_field(r, \"location.0\"),\n", " start_y=lambda r: get_field(r, \"location.1\"),\n", " end_x=lambda r: get_field(r, \"pass.end_location.0\"),\n", " end_y=lambda r: get_field(r, \"pass.end_location.1\"),\n", " outcome=lambda r: get_field(r, \"pass.outcome.name\"),\n", " )\n", " .select(\n", " \"player_id\", \"player_name\", \"start_x\", \"start_y\", \"end_x\", \"end_y\", \"outcome\"\n", " )\n", " .to_pandas()\n", " .to_sql(\"passes\", conn, if_exists=\"append\", index=False)\n", ")\n", "\n", "# Close the connection\n", "conn.close()" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.1" } }, "nbformat": 4, "nbformat_minor": 5 }