vibe.review.parsing.structure.nodes

Structure layer node types.

These represent the logical structure of a document:

  • StructuredBlock: a logical block (heading, paragraph, list item, etc.).
  • DocumentStructure: the complete document, with all blocks in reading order.

BlockType

Types of logical document blocks.

ListType

Types of lists.

StructuredBlock

A logical block in the document structure.

This is the main output of the structure layer - blocks with semantic type classification and hierarchical relationships.

Attributes:
  • text (str) –

    The text content of the block.

  • block_type (BlockType) –

    Logical type (heading, paragraph, list_item, etc.).

  • level (int) –

    Heading level (1-6) or list nesting depth (0-based).

  • page (int) –

    Primary page number (1-based).

  • bbox (BBox | None) –

    Bounding box (if from PDF).

  • list_type (ListType | None) –

    For list items, the type of list.

  • list_marker (str | None) –

    For list items, the marker (e.g., "1.", "a)", "-").

  • parent_id (str | None) –

    ID of parent block (for nesting).

  • children (list[str]) –

    Child block IDs (for tables, nested lists).

  • source_layout_id (str | None) –

    ID of the source LayoutBlock (for traceability).

  Layout-derived properties (propagated from the source LayoutBlock):

  • line_count (int) –

    Number of lines in the source layout block.

  • indent_level (int) –

    Visual indentation level from layout analysis.

  • is_bold (bool) –

    Whether the majority of the block's text is bold.

  • is_single_line (bool) –

    Whether the block contains exactly one line.

  Merge semantics for layout-derived properties:

  • line_count –

    Sum over the merged blocks.

  • indent_level –

    Preserved from the primary (first) block.

  • is_bold –

    Logical AND: true only if all merged blocks are bold.

  • is_single_line –

    Always False after a merge (multiple blocks imply multiple lines).
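A minimal sketch of the merge semantics above. It assumes a merge receives a non-empty list of blocks in reading order and treats the first one as the primary block. `LayoutProps` and `merge_layout_props` are hypothetical names, not part of this module. Merging of `text` is left out because it is not documented here.

```python
from dataclasses import dataclass


@dataclass
class LayoutProps:
    """Hypothetical stand-in holding only the layout-derived fields of a StructuredBlock."""
    line_count: int
    indent_level: int
    is_bold: bool
    is_single_line: bool


def merge_layout_props(blocks: list[LayoutProps]) -> LayoutProps:
    """Merge layout-derived properties following the documented semantics (sketch)."""
    if not blocks:
        raise ValueError("need at least one block to merge")
    if len(blocks) == 1:
        return blocks[0]  # a single block is not a merge; keep it unchanged
    primary = blocks[0]
    return LayoutProps(
        line_count=sum(b.line_count for b in blocks),  # sum of merged blocks
        indent_level=primary.indent_level,  # preserved from the primary (first) block
        is_bold=all(b.is_bold for b in blocks),  # AND: bold only if every block is bold
        is_single_line=False,  # multiple blocks imply multiple lines
    )


a = LayoutProps(line_count=2, indent_level=1, is_bold=True, is_single_line=False)
b = LayoutProps(line_count=1, indent_level=3, is_bold=False, is_single_line=True)
merged = merge_layout_props([a, b])
# LayoutProps(line_count=3, indent_level=1, is_bold=False, is_single_line=False)
```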

is_heading

is_heading: bool

Check if this is a heading.

is_list_item

is_list_item: bool

Check if this is a list item.

is_table_element

is_table_element: bool

Check if this is part of a table.
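These three properties presumably depend only on `block_type`. The sketch assumes hypothetical `BlockType` members, including `TABLE` and `TABLE_CELL`, which the documentation does not name. It also uses a hypothetical `Block` class. The real enum and classification may differ.

```python
from dataclasses import dataclass
from enum import Enum


class BlockType(Enum):
    # Hypothetical members; only heading, paragraph, and list_item are named in the docs.
    HEADING = "heading"
    PARAGRAPH = "paragraph"
    LIST_ITEM = "list_item"
    TABLE = "table"
    TABLE_CELL = "table_cell"


# Assumption: these members count as "part of a table".
_TABLE_TYPES = frozenset({BlockType.TABLE, BlockType.TABLE_CELL})


@dataclass
class Block:
    """Hypothetical minimal block carrying only block_type."""
    block_type: BlockType

    @property
    def is_heading(self) -> bool:
        return self.block_type is BlockType.HEADING

    @property
    def is_list_item(self) -> bool:
        return self.block_type is BlockType.LIST_ITEM

    @property
    def is_table_element(self) -> bool:
        return self.block_type in _TABLE_TYPES


print(Block(BlockType.TABLE_CELL).is_table_element)  # True
```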

to_dict

to_dict() -> dict[str, Any]

Convert the block to a dictionary.

from_dict

from_dict(d: dict[str, Any]) -> StructuredBlock

Create a StructuredBlock from a dictionary.
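A round-trip sketch for `to_dict()` and `from_dict()`. It uses a hypothetical reduced block, `MiniBlock`, with a subset of the documented fields. The dictionary layout is an assumption rather than the module's actual format, including storing `BlockType` as its string value.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any


class BlockType(Enum):
    HEADING = "heading"  # hypothetical members
    PARAGRAPH = "paragraph"


@dataclass
class MiniBlock:
    """Hypothetical reduced StructuredBlock with a subset of the documented fields."""
    text: str
    block_type: BlockType
    level: int = 0
    page: int = 1
    children: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        d = asdict(self)
        d["block_type"] = self.block_type.value  # assumption: enums stored by value
        return d

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "MiniBlock":
        data = dict(d)  # copy so the caller's dict is not mutated
        data["block_type"] = BlockType(data["block_type"])
        return cls(**data)


b = MiniBlock("Introduction", BlockType.HEADING, level=1)
d = b.to_dict()
# {'text': 'Introduction', 'block_type': 'heading', 'level': 1, 'page': 1, 'children': []}
restored = MiniBlock.from_dict(d)
```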

DocumentStructure

Complete document structure.

Contains all blocks in reading order, plus document-level metadata and structure information.

Attributes:
  • blocks (list[StructuredBlock]) –

    All blocks in reading order.

  • title (str | None) –

    Document title (if detected).

  • source_path (str | None) –

    Path to source document.

  • source_type (str) –

    Type of source ("pdf", "docx", "markdown", "html").

  • page_count (int) –

    Total number of pages.

  • headings (list[str]) –

    List of heading block IDs (for quick navigation).

get_block

get_block(block_id: str) -> StructuredBlock | None

Get a block by ID, or None if no block has that ID.

get_headings

get_headings(max_level: int | None = None) -> list[StructuredBlock]

Get all heading blocks in reading order, optionally limited to headings whose level is at most max_level.
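A sketch of the `max_level` filter. It assumes the filter keeps headings whose level is at most `max_level` and that `None` returns every heading. The `Block` class and its `block_id` field are hypothetical, since the documented attributes do not name the ID field.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    block_id: str  # hypothetical: the documented attributes do not name the ID field
    text: str
    is_heading: bool = False
    level: int = 0


def get_headings(blocks: list[Block], max_level: int | None = None) -> list[Block]:
    """Return heading blocks in reading order, keeping levels <= max_level when given."""
    return [
        b for b in blocks
        if b.is_heading and (max_level is None or b.level <= max_level)
    ]


doc = [
    Block("h1", "Intro", True, 1),
    Block("p1", "Some text"),
    Block("h2", "Background", True, 2),
    Block("h3", "Details", True, 3),
]
print([b.block_id for b in get_headings(doc, max_level=2)])  # ['h1', 'h2']
```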

get_blocks_under_heading

get_blocks_under_heading(heading_id: str) -> list[StructuredBlock]

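Get all blocks under a heading, up to (but not including) the next heading of the same or higher level, that is, the next heading whose level number is less than or equal to this heading's.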
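A sketch of the scan that `get_blocks_under_heading` describes. It walks forward from the heading and stops at the first heading whose level number is less than or equal to the starting heading's. Deeper subheadings and their content are therefore included. The `Block` class and its `block_id` field are assumptions, as is returning an empty list for an unknown ID.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    block_id: str  # hypothetical ID field
    text: str
    is_heading: bool = False
    level: int = 0


def get_blocks_under_heading(blocks: list[Block], heading_id: str) -> list[Block]:
    """Collect blocks after a heading until a heading of the same or higher rank."""
    ids = [b.block_id for b in blocks]
    if heading_id not in ids:
        return []  # assumption: unknown IDs yield an empty list
    start = ids.index(heading_id)
    heading = blocks[start]
    result: list[Block] = []
    for block in blocks[start + 1:]:
        # Lower level numbers are higher rank (level 1 outranks level 2).
        if block.is_heading and block.level <= heading.level:
            break
        result.append(block)
    return result


doc = [
    Block("h1", "Methods", True, 1),
    Block("p1", "Overview"),
    Block("h2", "Setup", True, 2),
    Block("p2", "Details"),
    Block("h3", "Results", True, 1),
]
print([b.block_id for b in get_blocks_under_heading(doc, "h1")])  # ['p1', 'h2', 'p2']
```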

to_markdown

to_markdown() -> str

Convert the document to a Markdown representation.
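A sketch of one possible Markdown rendering. It makes these assumptions:

  • Headings map to a run of `#` characters matching their level.
  • List items keep their `list_marker` and are indented four spaces per (0-based) nesting level.
  • Every other block becomes a paragraph.

The `Block` class and its `kind` field are hypothetical stand-ins for `StructuredBlock` and `BlockType`. The module's actual output may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    text: str
    kind: str  # hypothetical: "heading", "list_item", or "paragraph"
    level: int = 0
    list_marker: str | None = None


def to_markdown(blocks: list[Block]) -> str:
    out: list[str] = []
    prev_kind: str | None = None
    for b in blocks:
        if b.kind == "heading":
            # Heading level 1-6 maps to that many '#' characters (clamped).
            line = f"{'#' * min(max(b.level, 1), 6)} {b.text}"
        elif b.kind == "list_item":
            # Nesting depth is 0-based; indent four spaces per level.
            line = f"{'    ' * b.level}{b.list_marker or '-'} {b.text}"
        else:
            line = b.text
        if out:
            # Consecutive list items form one tight list; other blocks get a blank line.
            out.append("\n" if b.kind == prev_kind == "list_item" else "\n\n")
        out.append(line)
        prev_kind = b.kind
    return "".join(out)


blocks = [
    Block("Title", "heading", level=1),
    Block("Intro text.", "paragraph"),
    Block("First", "list_item", 0, "1."),
    Block("Nested", "list_item", 1, "-"),
]
print(to_markdown(blocks))
```

Four spaces per level keeps nested items within CommonMark's list-item indentation for both `-` and `1.` parents. A marker such as `a)` is not a CommonMark list marker, so it would render as plain text.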

to_dict

to_dict() -> dict[str, Any]

Convert the document structure to a dictionary.

from_dict

from_dict(d: dict[str, Any]) -> DocumentStructure

Create a DocumentStructure from a dictionary.