vibe.review.parsing.layout.analyzer¶

Layout analysis: Convert extracted words into layout structure.

This analyzer:

1. Groups words into lines based on vertical alignment
2. Groups lines into blocks based on spacing and alignment
3. Detects page regions (header, footer, body, columns)
4. Identifies repeated headers/footers across pages
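As an illustration of the first step, grouping words into lines by vertical alignment, here is a minimal sketch. It is not the analyzer's actual implementation: the `Word` class, its field names, and the `y_tolerance` threshold are hypothetical stand-ins for `ExtractedWord` and whatever `LayoutConfig` exposes.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical stand-in for ExtractedWord: text plus a bounding box."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge

    @property
    def y_center(self) -> float:
        return (self.y0 + self.y1) / 2


def group_words_into_lines(words: list[Word], y_tolerance: float = 3.0) -> list[list[Word]]:
    """Group words whose vertical centers stay within y_tolerance of the line's mean center."""
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.y_center, w.x0)):
        if lines:
            current = lines[-1]
            mean_y = sum(w.y_center for w in current) / len(current)
            if abs(word.y_center - mean_y) <= y_tolerance:
                current.append(word)
                continue
        # No existing line is close enough vertically: start a new one.
        lines.append([word])
    # Order words left to right within each line.
    return [sorted(line, key=lambda w: w.x0) for line in lines]


words = [
    Word("world", 60, 10.5, 100, 20.5),
    Word("Hello", 10, 10, 50, 20),
    Word("Second", 10, 30, 60, 40),
]
lines = group_words_into_lines(words)
line_texts = [" ".join(w.text for w in line) for line in lines]
print(line_texts)
```

Comparing against the running mean rather than the first word keeps a slightly skewed scan from splitting one line in two; a fixed tolerance is the simplest choice and would likely need to scale with font size in practice.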

LayoutConfig ¶

Configuration for layout analysis.

LayoutAnalyzer ¶

Analyze document layout from extracted words.

Converts ExtractedWord[] into LayoutPage[] with hierarchical structure: Page → Region → Block → Line → Word.

__init__ ¶

__init__(config: LayoutConfig | None = None, rules_dir: Path | None = None, rule_engine: RuleEngine | None = None, doclayout_detector: YoloLayoutDetector | None = None, table_structure_detector: TableStructureDetector | None = None) -> None

Initialize the layout analyzer.

Parameters:

config (LayoutConfig | None, default: None) – Layout analysis configuration. Uses defaults if not provided.

rules_dir (Path | None, default: None) – Directory containing rule YAML files. Defaults to built-in rules.

rule_engine (RuleEngine | None, default: None) – Pre-configured rule engine. If provided, rules_dir is ignored.

doclayout_detector (YoloLayoutDetector | None, default: None) – Optional YOLO layout detector override.

table_structure_detector (TableStructureDetector | None, default: None) – Optional Table Transformer detector override.

analyze ¶

analyze(extraction: ExtractionResult, page_progress: Callable[[int, int], None] | None = None) -> list[LayoutPage]

Analyze layout of all pages.

Parameters:

extraction (ExtractionResult) – ExtractionResult from the extraction layer.

page_progress (Callable[[int, int], None] | None, default: None) – Optional callback invoked after each page is processed. Receives (page_number, total_pages).

Returns:

list[LayoutPage] – List of LayoutPage objects.
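A callback matching the `page_progress` signature might look like the sketch below. The `fake_analyze` function is a hypothetical stand-in that only mimics the documented callback contract (it is not part of the library), and the example assumes that `page_number` is 1-based.

```python
from typing import Callable, Optional

ProgressCallback = Callable[[int, int], None]


def make_progress_logger(log: list[str]) -> ProgressCallback:
    """Build a callback that records one message per processed page."""
    def on_page(page_number: int, total_pages: int) -> None:
        percent = 100 * page_number // total_pages
        log.append(f"page {page_number}/{total_pages} ({percent}%)")
    return on_page


def fake_analyze(total_pages: int, page_progress: Optional[ProgressCallback] = None) -> None:
    """Hypothetical stand-in: mimics how a per-page loop would report progress."""
    for page_number in range(1, total_pages + 1):
        # ... per-page layout analysis would happen here ...
        if page_progress is not None:
            page_progress(page_number, total_pages)


messages: list[str] = []
fake_analyze(3, page_progress=make_progress_logger(messages))
print(messages)
```

With the real analyzer, the same callback would be passed as `page_progress=` to `analyze`.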

extract_table_from_words ¶

extract_table_from_words(words: list[ExtractedWord], page_num: int, table_bbox: BBox | None = None, element_label: str = 'table') -> LayoutTable | None

Extract table structure from words using position-based analysis.

This method detects table columns by clustering text x-coordinates, groups text into rows by y-coordinate, and merges continuation rows where the first column (typically a row identifier) is empty.

Parameters:

words (list[ExtractedWord]) – Words within the table region.

page_num (int) – Page number (1-based).

table_bbox (BBox | None, default: None) – Bounding box of the table region (optional).

element_label (str, default: 'table') – The doclayout label that triggered detection.

Returns:

LayoutTable | None – LayoutTable if table structure was detected, None otherwise.
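The position-based approach described for this method (columns from x-coordinates, rows from y-coordinates, continuation rows merged upward) can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: the `Word` class, the `min_gap` and `y_tolerance` thresholds, and the rule of returning `None` for fewer than two columns are all hypothetical, and column clustering is approximated here by merging horizontal word extents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """Hypothetical stand-in for ExtractedWord (field names are assumptions)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y0: float  # top edge


def find_columns(words: list[Word], min_gap: float) -> list[list[float]]:
    """Merge word x-extents; a horizontal gap wider than min_gap separates columns."""
    columns: list[list[float]] = []
    for x0, x1 in sorted((w.x0, w.x1) for w in words):
        if columns and x0 - columns[-1][1] <= min_gap:
            columns[-1][1] = max(columns[-1][1], x1)
        else:
            columns.append([x0, x1])
    return columns


def group_rows(words: list[Word], y_tolerance: float) -> list[list[Word]]:
    """Start a new row whenever a word's top edge is more than y_tolerance below the row top."""
    rows: list[list[Word]] = []
    row_top: Optional[float] = None
    for word in sorted(words, key=lambda w: (w.y0, w.x0)):
        if row_top is None or word.y0 - row_top > y_tolerance:
            rows.append([])
            row_top = word.y0
        rows[-1].append(word)
    return rows


def column_index(columns: list[list[float]], word: Word) -> int:
    """Index of the column whose extent contains the word's horizontal center."""
    center = (word.x0 + word.x1) / 2
    return next(i for i, (left, right) in enumerate(columns) if left <= center <= right)


def extract_rows(words: list[Word], min_gap: float = 10.0,
                 y_tolerance: float = 3.0) -> Optional[list[list[str]]]:
    """Return rows of cell text, or None if no multi-column structure is found."""
    if not words:
        return None
    columns = find_columns(words, min_gap)
    if len(columns) < 2:
        return None
    table: list[list[str]] = []
    for row in group_rows(words, y_tolerance):
        cells: list[list[str]] = [[] for _ in columns]
        for word in sorted(row, key=lambda w: w.x0):
            cells[column_index(columns, word)].append(word.text)
        texts = [" ".join(cell) for cell in cells]
        if table and texts[0] == "":
            # Continuation row: the first column is empty, so fold it into the previous row.
            table[-1] = [" ".join(p for p in (prev, cur) if p)
                         for prev, cur in zip(table[-1], texts)]
        else:
            table.append(texts)
    return table


sample = [
    Word("ID", 10, 25, 0), Word("Item", 60, 90, 0), Word("Qty", 160, 185, 0),
    Word("1", 10, 16, 20), Word("Steel", 60, 92, 20), Word("bolt", 96, 120, 20),
    Word("10", 160, 172, 20),
    Word("(zinc)", 60, 100, 32),  # continuation of row "1"
    Word("2", 10, 16, 50), Word("Nut", 60, 82, 50), Word("5", 160, 166, 50),
]
table = extract_rows(sample)
print(table)
```

Merging extents instead of clustering left edges keeps multi-word cells such as "Steel bolt" in a single column, provided the spaces between words are narrower than the gaps between columns.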