vibe.review.parsing.rules.predicates

Predicate evaluation for the rules engine using Python expressions.

Predicates are Python expressions evaluated against nodes. They support full Python expression syntax, including boolean operators, comparisons, and function calls, although compile rejects expressions that contain unsafe constructs.

Examples:

- "node.is_bold"
- "len(node.text) < 100"
- "node.is_bold and len(node.text) < 50"
- "'definition' in node.text.lower()"
- "re.search(r'^\d+\.', node.text)"
- "node.bbox.width > 400"
- "has_verb(node.text)"  # Uses spaCy for verb detection
- "previous() and previous().is_heading"  # Check previous node in list
- "next() and 'continued' in next().text.lower()"  # Check next node

Context available in predicates:

- node: The node being evaluated
- ctx: Additional context (page dimensions, helper functions)
- re: The re module for regex operations

Built-in predicate functions:

- has_verb(text, language="sv"): Check if text contains a verb (requires spaCy)

List traversal functions (available when the rule engine provides traversal context):

- next(): Returns the next node in the list, or None if at the end
- previous(): Returns the previous node in the list, or None if at the start
- node_index(): Returns the 0-based index of the current node
- nodes_count(): Returns the total number of nodes in the list
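
The evaluation model described above (an expression string evaluated with node, ctx, and re in scope) can be sketched with the built-in compile() and eval(). This is only an illustration under assumptions, not the module's actual implementation: the Node dataclass and the evaluate_expression helper are hypothetical names, and emptying __builtins__ is not a real sandbox.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a parsed document node."""
    text: str
    is_bold: bool = False


def evaluate_expression(expression: str, node: Node, ctx: dict | None = None) -> bool:
    """Evaluate a predicate string with node, ctx, re, and len in scope."""
    code = compile(expression, "<predicate>", "eval")
    namespace = {"node": node, "ctx": ctx or {}, "re": re, "len": len}
    # Globals carry an empty __builtins__, so only the names above resolve.
    return bool(eval(code, {"__builtins__": {}}, namespace))


heading = Node(text="1. Definitions", is_bold=True)
print(evaluate_expression("node.is_bold and len(node.text) < 50", heading))  # True
print(evaluate_expression(r"re.search(r'^\d+\.', node.text)", heading))      # True
```

Wrapping the result in bool() means a regex match object or a non-empty string counts as a match, which is how the example expressions above are written.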

CompiledPredicate

A compiled predicate expression ready for evaluation.

Attributes:
  • expression (str) –

    Original expression string.

  • code (CodeType | None) –

    Compiled Python code object.

  • is_function_ref (bool) –

    True if this is a predicate_function reference.

  • function_name (str | None) –

    Name of the function (if is_function_ref).
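
As a rough picture of the data held by this class, the attributes above map naturally onto a dataclass. The sketch below is an assumption about the shape only: CompiledPredicateSketch is a hypothetical name, the defaults are guesses, and is_caption is an invented function name used purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import CodeType


@dataclass(frozen=True)
class CompiledPredicateSketch:
    """Hypothetical mirror of the documented attributes."""
    expression: str
    code: CodeType | None = None
    is_function_ref: bool = False
    function_name: str | None = None


# An inline expression carries a compiled code object.
expr = "len(node.text) < 100"
inline = CompiledPredicateSketch(
    expression=expr,
    code=compile(expr, "<predicate>", "eval"),
)

# A function reference carries a name instead of code (is_caption is invented).
ref = CompiledPredicateSketch(
    expression="is_caption",
    is_function_ref=True,
    function_name="is_caption",
)
```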

PredicateEvaluator

Evaluate Python expression predicates against nodes.

Supports:

- Full Python expression syntax
- Boolean operators (and, or, not)
- Comparisons (==, !=, <, <=, >, >=, in, not in)
- String methods (.startswith(), .endswith(), .lower(), etc.)
- Regex via re module (re.search(), re.match())
- Nested attribute access (node.bbox.width)
- predicate_function references to external Python functions

__init__

__init__(context: dict[str, Any] | None = None, predicate_functions_dir: Path | None = None) -> None

Initialize the evaluator.

Parameters:
  • context (dict[str, Any] | None, default: None ) –

    Additional context variables available as ctx in predicates.

  • predicate_functions_dir (Path | None, default: None ) –

    Directory containing predicates.py for function loading.

set_traversal_context

set_traversal_context(nodes: Sequence[Any] | None, current_idx: int = -1) -> None

Set the list context for next()/previous() traversal functions.

Call this before evaluating predicates when processing a list of nodes, to enable predicates like:

- "previous() and previous().is_heading"
- "next() and 'continued' in next().text.lower()"

Parameters:
  • nodes (Sequence[Any] | None) –

    The list of nodes being processed, or None to clear.

  • current_idx (int, default: -1 ) –

    Index of the current node in the list.
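
One way to picture the traversal context is as a set of zero-argument helpers bound to a node list and a current index. The sketch below assumes that shape; make_traversal_functions is a hypothetical helper and may not match how the evaluator stores this state internally.

```python
from typing import Any, Optional, Sequence


def make_traversal_functions(nodes: Sequence[Any], current_idx: int) -> dict[str, Any]:
    """Build next()/previous()-style helpers bound to one position in a list."""

    def next_node() -> Optional[Any]:
        idx = current_idx + 1
        return nodes[idx] if idx < len(nodes) else None

    def previous_node() -> Optional[Any]:
        idx = current_idx - 1
        return nodes[idx] if idx >= 0 else None

    return {
        "next": next_node,
        "previous": previous_node,
        "node_index": lambda: current_idx,
        "nodes_count": lambda: len(nodes),
    }


nodes = ["Heading", "First paragraph", "Second paragraph"]
helpers = make_traversal_functions(nodes, current_idx=0)
print(helpers["previous"]())  # None: nothing precedes the first node
print(helpers["next"]())      # First paragraph
```

Because next() and previous() can return None at the list boundaries, predicates guard the call first, as in "previous() and previous().is_heading".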

clear_traversal_context

clear_traversal_context() -> None

Clear the traversal context.

parse

parse(expression: str) -> CompiledPredicate | None

Parse and compile a predicate expression string.

This is the main entry point for compiling predicates from YAML.

Parameters:
  • expression (str) –

    Python expression like "node.is_bold and len(node.text) < 100".

Returns:
  • CompiledPredicate | None

    CompiledPredicate object or None if compilation fails.

compile

compile(expression: str) -> CompiledPredicate

Compile and validate a predicate expression.

Parameters:
  • expression (str) –

    Python expression string.

Returns:
  • CompiledPredicate

    The compiled predicate, ready for evaluation.

Raises:
  • ValueError

    If expression is invalid or contains unsafe constructs.
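
The reference does not list which constructs count as unsafe. As a sketch of how such validation is commonly done, the expression can be parsed with the ast module and walked before it is compiled. The deny-list and the validate_expression helper below are assumptions for illustration, not the module's actual rules.

```python
import ast

# Assumed deny-list; the real set of rejected constructs is not documented here.
FORBIDDEN_NODES = (ast.Lambda, ast.Await, ast.Yield, ast.YieldFrom, ast.NamedExpr)


def validate_expression(expression: str) -> ast.Expression:
    """Parse an expression and raise ValueError on invalid or disallowed syntax."""
    try:
        # mode="eval" accepts a single expression, so statements such as
        # "import os" fail here with SyntaxError.
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"Invalid predicate expression: {exc.msg}") from exc

    for child in ast.walk(tree):
        if isinstance(child, FORBIDDEN_NODES):
            raise ValueError(f"Unsafe construct: {type(child).__name__}")
        # Dunder access is a common route to interpreter internals.
        if isinstance(child, ast.Attribute) and child.attr.startswith("__"):
            raise ValueError(f"Unsafe attribute access: {child.attr}")
        if isinstance(child, ast.Name) and child.id.startswith("__"):
            raise ValueError(f"Unsafe name: {child.id}")
    return tree


validate_expression("node.is_bold and len(node.text) < 50")  # passes
try:
    validate_expression("node.__class__.__mro__")
except ValueError as err:
    print(err)
```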

compile_function_ref

compile_function_ref(function_name: str) -> CompiledPredicate

Create a predicate that references an external function.

Parameters:
  • function_name (str) –

    Name of the function in predicates.py.

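Returns:
  • CompiledPredicate

    A predicate with is_function_ref set to True and function_name set to the referenced function.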

evaluate

evaluate(predicate: CompiledPredicate, node: object) -> bool

Evaluate a compiled predicate against a node.

Parameters:
  • predicate (CompiledPredicate) –

    The compiled predicate to evaluate.

  • node (object) –

    The node to test.

Returns:
  • bool

    True if predicate matches, False otherwise.

evaluate_all

evaluate_all(predicates: list[CompiledPredicate], node: object) -> bool

Evaluate all predicates (AND logic).

Returns True only if all predicates match.

evaluate_any

evaluate_any(predicates: list[CompiledPredicate], node: object) -> bool

Evaluate predicates with OR logic.

Returns True if any predicate matches.
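
The AND and OR semantics of evaluate_all and evaluate_any correspond to Python's built-in all() and any(). The sketch below uses plain callables in place of compiled predicates; matches_all and matches_any are hypothetical names, and the behavior for an empty list is that of all() and any(), which the real methods may or may not share.

```python
from typing import Any, Callable, Iterable

Predicate = Callable[[Any], bool]


def matches_all(predicates: Iterable[Predicate], node: Any) -> bool:
    """AND logic: stops at the first predicate that fails."""
    return all(predicate(node) for predicate in predicates)


def matches_any(predicates: Iterable[Predicate], node: Any) -> bool:
    """OR logic: stops at the first predicate that matches."""
    return any(predicate(node) for predicate in predicates)


node = "Definitions"
predicates = [lambda n: n.istitle(), lambda n: len(n) < 5]
print(matches_all(predicates, node))  # False: the length check fails
print(matches_any(predicates, node))  # True: the title-case check passes
```

With an empty predicate list, all() yields True and any() yields False, so a rule with no predicates would match under AND logic and never match under OR logic in this sketch.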

has_verb

has_verb(text: str, language: str = 'sv') -> bool

Check if text contains a verb using spaCy POS tagging.

This is useful for distinguishing headings (noun phrases, typically no verbs) from paragraph starts or sentences (which contain verbs).

Can be used in predicate expressions:

- "has_verb(node.text)"
- "not has_verb(node.text) and node.is_bold"

Parameters:
  • text (str) –

    The text to analyze.

  • language (str, default: 'sv' ) –

    ISO 639-1 language code (sv, en, de, fr, es).

Returns:
  • bool

    True if text contains a verb (VERB or AUX POS tag); False if no verb is found or if spaCy is unavailable.

Note

Returns False (not None) when spaCy is unavailable, making it safe for use in boolean predicate expressions.
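
The fallback behavior described in the note can be sketched as follows. This is an assumption-laden illustration rather than the module's code: has_verb_sketch, load_pipeline, and the language-to-model mapping are hypothetical, and the model names are the standard small spaCy models, which may not be the ones the module loads.

```python
from functools import lru_cache

# Assumed mapping from language code to spaCy model; the real mapping may differ.
MODEL_NAMES = {
    "sv": "sv_core_news_sm",
    "en": "en_core_web_sm",
    "de": "de_core_news_sm",
    "fr": "fr_core_news_sm",
    "es": "es_core_news_sm",
}


@lru_cache(maxsize=None)
def load_pipeline(language: str):
    """Return a spaCy pipeline, or None if spaCy or the model is unavailable."""
    model_name = MODEL_NAMES.get(language)
    if model_name is None:
        return None
    try:
        import spacy  # imported lazily so a missing install is not fatal
        return spacy.load(model_name)
    except Exception:
        # Any import or model-loading failure is treated as "unavailable".
        return None


def has_verb_sketch(text: str, language: str = "sv") -> bool:
    """Return True if any token is tagged VERB or AUX; False on any fallback."""
    nlp = load_pipeline(language)
    if nlp is None:
        return False
    return any(token.pos_ in {"VERB", "AUX"} for token in nlp(text))


print(has_verb_sketch("Hunden springer i parken."))
```

Returning a plain bool on every path keeps expressions such as "not has_verb(node.text) and node.is_bold" well defined, though without spaCy installed that expression then depends on node.is_bold alone.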