The Query Method

The Flow class provides a .query() method that filters records with a concise, string-based expression. Writing filters as strings can make data processing pipelines easier to read and to change.

Filtering conditions can combine standard Python comparison and logical operators, dot-notation field access, a few built-in functions such as len(), and common string methods.

Use .query() when you want to:

  • Prototype filters quickly.

  • Write more readable and maintainable pipeline logic.

  • Let users define filters without writing custom Python code.

How It Works

The .query() method parses the expression with Python’s ast (Abstract Syntax Tree) module and converts the resulting tree into a predicate function, which is applied to each record in the Flow stream. Because the expression is never passed to eval(), arbitrary code in a query string is not executed, and the query’s syntax can be validated before it is applied.
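
The parse-then-compile approach can be illustrated with a short sketch. This is not Flow's implementation: the function name compile_query, the treatment of records as plain dictionaries, and the choice to read a missing field as None are assumptions, and nested fields, function calls, and @ variables are left out.

```python
import ast
import operator

# Map AST comparison node types to plain Python operations.
_COMPARE_OPS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
    ast.Is: operator.is_,
    ast.IsNot: operator.is_not,
}


def compile_query(expr):
    """Hypothetical sketch: compile a query string into predicate(record) -> bool."""
    tree = ast.parse(expr, mode="eval")  # malformed input raises SyntaxError here

    def build(node):
        # Each branch returns a function of one record; eval() is never called.
        if isinstance(node, ast.BoolOp):
            parts = [build(value) for value in node.values]
            if isinstance(node.op, ast.And):
                return lambda record: all(part(record) for part in parts)
            return lambda record: any(part(record) for part in parts)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            inner = build(node.operand)
            return lambda record: not inner(record)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = _COMPARE_OPS.get(type(node.ops[0]))
            if op is None:
                raise ValueError(f"unsupported operator: {type(node.ops[0]).__name__}")
            left, right = build(node.left), build(node.comparators[0])
            return lambda record: op(left(record), right(record))
        if isinstance(node, ast.Name):
            # Bare names are field lookups; a missing field reads as None here.
            return lambda record: record.get(node.id)
        if isinstance(node, (ast.Constant, ast.List, ast.Tuple)):
            value = ast.literal_eval(node)  # accepts literals only
            return lambda record: value
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return build(tree.body)


matches = [
    {"home_team": "Liverpool", "home_goals": 3, "away_goals": 1},
    {"home_team": "Arsenal", "home_goals": 0, "away_goals": 2},
]
home_win = compile_query("home_team == 'Liverpool' and home_goals > away_goals")
print([m["home_team"] for m in matches if home_win(m)])  # ['Liverpool']
```

In this sketch the tree is inspected once, up front, and turned into closures, so applying the predicate to each record involves only ordinary function calls; any node type outside the supported set (a function call such as __import__('os'), for example) is rejected with a ValueError instead of being executed.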

Basic Comparisons

You can compare record fields to literal values using standard comparison operators:

  • == (equals)

  • != (not equals)

  • > (greater than)

  • >= (greater than or equal to)

  • < (less than)

  • <= (less than or equal to)

# Example: Filter for matches with more than 3 goals
flow.query("goals > 3")

# Example: Filter for matches where 'Man City' is the home team
flow.query("home_team == 'Man City'")
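
Each of these queries keeps the records for which an ordinary Python comparison is true. The sketch below shows plain-Python filters intended to select the same records as the two examples above on some made-up sample data, assuming each record behaves like a dictionary.

```python
# Made-up sample records; real Flow records may carry many more fields.
matches = [
    {"home_team": "Man City", "away_team": "Chelsea", "goals": 5},
    {"home_team": "Everton", "away_team": "Man City", "goals": 2},
    {"home_team": "Man City", "away_team": "Wolves", "goals": 1},
]

# Intended to keep the same records as flow.query("goals > 3") on this data.
high_scoring = [m for m in matches if m["goals"] > 3]

# Intended to keep the same records as flow.query("home_team == 'Man City'").
city_at_home = [m for m in matches if m["home_team"] == "Man City"]

print(len(high_scoring), len(city_at_home))  # 1 2
```

The second filter skips the Everton game, where Man City is the away side. Field types also matter if .query() follows Python's rules: in plain Python, "4" == 4 is False and "4" > 3 raises TypeError, so a goal count stored as a string does not compare like a number.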

Logical Operators

Combine multiple conditions with the following logical operators, using parentheses to control grouping where needed:

  • and

  • or

  • not

# Example: Home wins for 'Liverpool'
flow.query("home_team == 'Liverpool' and home_goals > away_goals")

# Example: Matches not involving 'Arsenal'
flow.query("not (home_team == 'Arsenal' or away_team == 'Arsenal')")
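
The second example can also be written without the outer not. By De Morgan's law, not (A or B) is equivalent to (not A) and (not B), so "home_team != 'Arsenal' and away_team != 'Arsenal'" expresses the same filter. The plain-Python check below confirms that the two forms agree for every combination of some sample values, with None standing in for a missing value; the function names are illustrative only.

```python
from itertools import product


def original(home_team, away_team):
    # Mirrors: not (home_team == 'Arsenal' or away_team == 'Arsenal')
    return not (home_team == "Arsenal" or away_team == "Arsenal")


def rewritten(home_team, away_team):
    # Mirrors: home_team != 'Arsenal' and away_team != 'Arsenal'
    return home_team != "Arsenal" and away_team != "Arsenal"


values = ["Arsenal", "Chelsea", None]  # None stands in for a missing value
pairs = list(product(values, repeat=2))
assert all(original(h, a) == rewritten(h, a) for h, a in pairs)
print(f"forms agree on all {len(pairs)} combinations")  # forms agree on all 9 combinations
```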

Field Access

Access nested fields using dot notation:

  • field.subfield

# Example: Filter based on nested 'venue.city' field
flow.query("venue.city == 'London'")
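
Dot notation reads one level of nesting per segment. When parsed with the ast module, venue.city becomes an Attribute node wrapping a Name node, so a query engine can rebuild the dotted path and then walk the record. The sketch below assumes dictionary records; the helpers attribute_path and get_field are hypothetical, and returning None when a segment is missing is an assumption rather than documented Flow behavior.

```python
import ast


def attribute_path(node):
    """Rebuild a dotted field path such as 'venue.city' from an AST node."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        raise ValueError("expected a field name such as venue.city")
    parts.append(node.id)
    return ".".join(reversed(parts))


def get_field(record, path):
    """Walk nested dictionaries one path segment at a time."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # assumption: a missing segment resolves to None
        value = value[key]
    return value


comparison = ast.parse("venue.city == 'London'", mode="eval").body
path = attribute_path(comparison.left)
match = {"home_team": "Arsenal", "venue": {"name": "Emirates Stadium", "city": "London"}}
print(path, get_field(match, path))       # venue.city London
print(get_field(match, "venue.country"))  # None
```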

Membership Operators (in, not in)

Check if a field’s value is present in a list or tuple:

  • in

  • not in

# Example: Filter for matches where the home team is 'Chelsea' or 'Tottenham'
flow.query("home_team in ['Chelsea', 'Tottenham']")

# Example: Filter out matches from the listed leagues
flow.query("league not in ['Premier League', 'La Liga']")

Warning

The in and not in operators require the field to appear on the left-hand side of the expression. The reverse form is not currently supported: for example, 'Man City' in home_team raises an error. To test whether a string field contains a substring, use .contains() instead.
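
The left-hand-side rule matches how such a comparison can be handled after parsing: the field name is read from the left operand, and the right operand is evaluated once as a literal list or tuple. The sketch below illustrates this with ast.literal_eval; membership_predicate is a hypothetical name, records are assumed to be dictionaries, and the errors it raises are illustrative rather than Flow's own.

```python
import ast


def membership_predicate(expr):
    """Hypothetical sketch: compile '<field> in [...]' or '<field> not in [...]'."""
    node = ast.parse(expr, mode="eval").body
    if not (
        isinstance(node, ast.Compare)
        and len(node.ops) == 1
        and isinstance(node.ops[0], (ast.In, ast.NotIn))
    ):
        raise ValueError("expected a single 'in' or 'not in' comparison")
    if not isinstance(node.left, ast.Name):
        # Mirrors the documented rule: the field must be on the left-hand side.
        raise ValueError("the field must appear on the left of 'in' / 'not in'")
    field = node.left.id
    options = ast.literal_eval(node.comparators[0])  # literals only, evaluated once
    if not isinstance(options, (list, tuple)):
        raise ValueError("the right-hand side must be a list or tuple literal")
    if isinstance(node.ops[0], ast.NotIn):
        return lambda record: record.get(field) not in options
    return lambda record: record.get(field) in options


league_filter = membership_predicate("league not in ['Premier League', 'La Liga']")
print(league_filter({"league": "Serie A"}))         # True
print(league_filter({"league": "Premier League"}))  # False
```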

Checking for NULLs

Check for null/missing values:

  • is None

  • is not None

# Example: Find records where 'player.injury_status' is null
flow.query("player.injury_status is None")

# Example: Find records where 'player.injury_status' is not null
flow.query("player.injury_status is not None")
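
In plain Python, is None matches only the None object, so an empty string is not treated as null. The sketch below shows that distinction on made-up records flattened to a single level; it assumes that .query() follows Python's identity semantics and that a missing field reads as None, both of which are assumptions about Flow rather than documented behavior.

```python
# Made-up records, flattened for brevity: an explicit None, an empty string,
# and a record with no injury_status field at all.
players = [
    {"name": "A", "injury_status": None},
    {"name": "B", "injury_status": ""},
    {"name": "C"},
]

# `is None` is an identity check: only None qualifies, not other falsy values.
# dict.get() returns None for the missing key, so C is included as well.
null_status = [p["name"] for p in players if p.get("injury_status") is None]
has_status = [p["name"] for p in players if p.get("injury_status") is not None]
print(null_status)  # ['A', 'C']
print(has_status)   # ['B']
```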

String Methods

Apply common transformations to a field before comparing it. These return a value rather than True or False, so they must be used inside a comparison:

  • len(): Get the length of a string or list/tuple.

  • .lower(): Convert a string to lowercase.

  • .upper(): Convert a string to uppercase.

# Example: Find matches whose home team is 'Manchester United', ignoring case
flow.query("home_team.lower() == 'manchester united'")

# Example: Find matches whose home team name is shorter than 8 characters
flow.query("len(home_team) < 8")
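
Because only a few calls are allowed, an AST-based parser can check every call node against a whitelist and reject anything else before a query runs. The sketch below evaluates the two examples above this way. It is an assumption about how such checks might look, not Flow's code; ALLOWED_METHODS, ALLOWED_FUNCTIONS, COMPARE, and evaluate are hypothetical names, and records are assumed to be flat dictionaries.

```python
import ast
import operator

# Hypothetical whitelists: anything not listed here is rejected.
ALLOWED_METHODS = {"lower": str.lower, "upper": str.upper}
ALLOWED_FUNCTIONS = {"len": len}
COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(node, record):
    """Evaluate comparisons over fields, literals, and whitelisted calls."""
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = COMPARE.get(type(node.ops[0]))
        if op is None:
            raise ValueError("unsupported comparison")
        return op(evaluate(node.left, record), evaluate(node.comparators[0], record))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        # field.lower() / field.upper(): the method must be whitelisted.
        method = ALLOWED_METHODS.get(node.func.attr)
        if method is None or node.args or node.keywords:
            raise ValueError(f"method call not allowed: {node.func.attr}")
        return method(evaluate(node.func.value, record))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # len(field): the function must be whitelisted and take one argument.
        func = ALLOWED_FUNCTIONS.get(node.func.id)
        if func is None or len(node.args) != 1 or node.keywords:
            raise ValueError(f"function call not allowed: {node.func.id}")
        return func(evaluate(node.args[0], record))
    if isinstance(node, ast.Name):
        return record[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


record = {"home_team": "Manchester United"}
for query in ("home_team.lower() == 'manchester united'", "len(home_team) < 8"):
    print(query, "->", evaluate(ast.parse(query, mode="eval").body, record))
```

For this record the loop prints True for the first query and False for the second, since "Manchester United" is 17 characters long. A call such as home_team.strip() or __import__('os') is rejected because it is not on either whitelist.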

Predicate-Style String Methods (Standalone)

These methods return True or False, so each one can be used on its own as a complete condition:

  • .contains(substring)

  • .startswith(prefix)

  • .endswith(suffix)

  • .regex(pattern, flags) or .match(pattern, flags), where flags is optional

# Example: Find home teams containing 'united'
flow.query("home_team.contains('united')")

# Example: Find away teams starting with 'West'
flow.query("away_team.startswith('West')")

# Example: Find players whose name starts with 'Mo', using a regex
flow.query("player.name.regex('^Mo')")
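
The predicate-style methods correspond closely to ordinary Python string operations. The sketch below spells out one plausible mapping; the names PREDICATES and apply_predicate are hypothetical, .match() is left out because the section does not say how it differs from .regex(), and both the use of re.search() for .regex() and the handling of non-string values are assumptions.

```python
import re

# Hypothetical mapping from query method names to plain Python string checks.
PREDICATES = {
    "contains": lambda value, substring: substring in value,
    "startswith": lambda value, prefix: value.startswith(prefix),
    "endswith": lambda value, suffix: value.endswith(suffix),
    # Assumption: .regex() looks for a match anywhere, like re.search();
    # anchors such as ^ and $ still restrict where the match may occur.
    "regex": lambda value, pattern, flags=0: re.search(pattern, value, flags) is not None,
}


def apply_predicate(value, method, *args):
    """Return True or False for value.<method>(*args)."""
    if not isinstance(value, str):
        return False  # assumption: None or non-string fields simply fail the check
    return PREDICATES[method](value, *args)


print(apply_predicate("Sheffield United", "contains", "United"))        # True
print(apply_predicate("West Ham", "startswith", "West"))                 # True
print(apply_predicate("Mohamed Salah", "regex", "^Mo"))                  # True
print(apply_predicate("Mohamed Salah", "regex", "^mo"))                  # False
print(apply_predicate("Mohamed Salah", "regex", "^mo", re.IGNORECASE))   # True
```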

Referencing Local Python Variables (@var)

To make queries dynamic, you can reference Python variables with the @ prefix. This lets you build queries programmatically while keeping them readable. For example, @team_name is replaced with the value of the variable team_name in your Python scope.

This is especially useful when working with date ranges, parameterized filters, or reusable queries.

import datetime

min_goals = 2
team_name = "Liverpool"
start_date = datetime.date(2023, 1, 1)

# Example: Using numeric and string variables
flow.query("home_goals >= @min_goals and home_team == @team_name")

# Example: Using a date object
flow.query("match_date >= @start_date")

Regular expression flags such as re.IGNORECASE or re.MULTILINE are passed the same way, as @ variables:

import re

pattern = r"liverpool"
flags = re.IGNORECASE

# Example: case-insensitive regex match on the home team name
flow.query("home_team.regex(@pattern, @flags)")

Remember:

  • Regex flags must be passed as values from the re module.

  • The query parser substitutes each @var with the value of the named variable from your scope; no arbitrary code is executed.
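
A bare @name is not valid Python syntax, so a parser built on the ast module has to rewrite it before parsing. The sketch below shows one way to do this, under assumptions: bind_variables and the __var_ prefix are hypothetical, the variables are passed in as an explicit dictionary rather than read from the calling scope, and string literals are detected with a simple pattern that does not handle escaped quotes.

```python
import ast
import re

# A quoted string literal (group 1) or an @name reference (group 2).
# Assumption for brevity: string literals contain no escaped quotes.
_TOKEN = re.compile(r"""('[^']*'|"[^"]*")|@([A-Za-z_]\w*)""")


def bind_variables(expr, scope):
    """Hypothetical sketch: replace @name with a placeholder and collect its value."""
    bindings = {}

    def replace(match):
        if match.group(1) is not None:
            return match.group(1)  # leave string literals such as 'a@b.com' untouched
        name = match.group(2)
        if name not in scope:
            raise NameError(f"undefined query variable: @{name}")
        placeholder = f"__var_{name}"
        bindings[placeholder] = scope[name]  # the value is stored, never evaluated
        return placeholder

    tree = ast.parse(_TOKEN.sub(replace, expr), mode="eval")
    return tree, bindings


tree, bindings = bind_variables(
    "home_goals >= @min_goals and home_team == @team_name",
    {"min_goals": 2, "team_name": "Liverpool"},
)
print(bindings)  # {'__var_min_goals': 2, '__var_team_name': 'Liverpool'}
```

An evaluation step can then look up each __var_ name in bindings, much as it looks up fields in a record. Because values such as a date or re.IGNORECASE are only stored and compared, never parsed as code, this keeps the substitution step free of code execution.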

Filtering by Date and Time

You can call date() and datetime() directly in a query string. They construct values from Python’s built-in datetime module, which you can then compare against date and datetime fields.

# Example: Matches after a specific date
flow.query("match_date > date(2024, 6, 30)")
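
Calls such as date(2024, 6, 30) can be handled safely by allowing only the date and datetime constructors and accepting only literal arguments. The sketch below is an assumption about how this might work, not Flow's implementation; ALLOWED_CONSTRUCTORS and build_constant are hypothetical names.

```python
import ast
import datetime

# Hypothetical whitelist of callables a query may use to build constants.
ALLOWED_CONSTRUCTORS = {"date": datetime.date, "datetime": datetime.datetime}


def build_constant(node):
    """Evaluate date(...) or datetime(...) with literal positional arguments only."""
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError("expected a call such as date(2024, 6, 30)")
    ctor = ALLOWED_CONSTRUCTORS.get(node.func.id)
    if ctor is None or node.keywords:
        raise ValueError(f"call not allowed: {node.func.id}")
    args = [ast.literal_eval(arg) for arg in node.args]  # literals only
    return ctor(*args)


query = ast.parse("match_date > date(2024, 6, 30)", mode="eval").body
cutoff = build_constant(query.comparators[0])
print(cutoff)  # 2024-06-30

match_dates = [datetime.date(2024, 6, 30), datetime.date(2024, 8, 17)]
print([d for d in match_dates if d > cutoff])  # [datetime.date(2024, 8, 17)]
```

In plain Python, ordering comparisons between a datetime.datetime and a datetime.date raise TypeError, so unless Flow converts between the two, use date() for date-only fields and datetime() for timestamp fields.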