Structure layer node types.
These represent the logical structure of a document:
- StructuredBlock: A logical block (heading, paragraph, list item, etc.)
- DocumentStructure: Complete document with all blocks in reading order
BlockType
Types of logical document blocks.
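The section does not list the members of `BlockType`. Here is a minimal sketch that assumes it is a string-valued `Enum` whose values match the names used elsewhere in this section (`heading`, `paragraph`, `list_item`). The table members `TABLE` and `TABLE_CELL` are hypothetical.

```python
from enum import Enum


class BlockType(str, Enum):
    """Sketch only: member names beyond heading/paragraph/list_item are assumptions."""

    HEADING = "heading"
    PARAGRAPH = "paragraph"
    LIST_ITEM = "list_item"
    TABLE = "table"            # hypothetical table-related member
    TABLE_CELL = "table_cell"  # hypothetical table-related member


# A str-backed enum can be looked up by value and compares equal to its value.
assert BlockType("list_item") is BlockType.LIST_ITEM
assert BlockType.HEADING == "heading"
```

Mixing in `str` lets a stored value such as `"heading"` round-trip through serialization without a custom encoder.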
StructuredBlock
A logical block in the document structure.
This is the main output of the structure layer: blocks with
semantic type classification and hierarchical relationships.
Attributes:
- text (str): The text content of the block.
- block_type (BlockType): Logical type (heading, paragraph, list_item, etc.).
- level (int): Heading level (1-6) or list nesting depth (0-based).
- page (int): Primary page number (1-based).
- bbox (BBox | None): Bounding box (if from PDF).
- list_type (ListType | None): For list items, the type of list.
- list_marker (str | None): For list items, the marker (e.g., "1.", "a)", "-").
- parent_id (str | None): ID of parent block (for nesting).
- children (list[str]): Child block IDs (for tables, nested lists).
- source_layout_id (str | None): ID of the source LayoutBlock (for traceability).
Layout-derived properties (propagated from the source LayoutBlock):
- line_count (int): Number of lines in the source layout block.
- indent_level (int): Visual indentation level from layout analysis.
- is_bold (bool): Whether the majority of the text is bold.
- is_single_line (bool): Whether the block contains exactly one line.
Merge semantics for layout-derived properties:
- line_count
- indent_level: preserved from primary (first) block
- is_bold: AND (true only if all merged blocks are bold)
- is_single_line: False after merge (multiple blocks = multiple lines)
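The merge rules above can be written as a small function. This is a sketch under stated assumptions: `LayoutProps` and `merge_layout_props` are hypothetical names, and the sketch covers only the three properties whose merge rules are listed (`indent_level`, `is_bold`, `is_single_line`).

```python
from dataclasses import dataclass


@dataclass
class LayoutProps:
    """Hypothetical holder for the layout-derived properties of one block.

    line_count is left out because its merge rule is not listed above.
    """

    indent_level: int
    is_bold: bool
    is_single_line: bool


def merge_layout_props(blocks: list[LayoutProps]) -> LayoutProps:
    """Merge layout-derived properties following the documented rules."""
    if not blocks:
        raise ValueError("need at least one block to merge")
    primary = blocks[0]
    if len(blocks) == 1:
        # Nothing is merged, so the block keeps its own properties.
        return primary
    return LayoutProps(
        indent_level=primary.indent_level,       # preserved from primary (first) block
        is_bold=all(b.is_bold for b in blocks),  # AND: bold only if every block is bold
        is_single_line=False,                    # multiple blocks => multiple lines
    )
```

Using `all(...)` gives the AND rule directly. One non-bold block in the group is enough to clear `is_bold` on the merged result.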
is_heading
Check if this is a heading.
is_list_item
Check if this is a list item.
is_table_element
Check if this is part of a table.
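A minimal sketch of the three predicates follows. It assumes they are read-only properties that compare `block_type` against `BlockType` members. The enum members, the `TABLE_TYPES` set, and the trimmed field list are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class BlockType(str, Enum):
    # Hypothetical subset of block types.
    HEADING = "heading"
    PARAGRAPH = "paragraph"
    LIST_ITEM = "list_item"
    TABLE = "table"
    TABLE_CELL = "table_cell"


# Hypothetical: the block types that count as "part of a table".
TABLE_TYPES = frozenset({BlockType.TABLE, BlockType.TABLE_CELL})


@dataclass
class StructuredBlock:
    text: str
    block_type: BlockType
    level: int = 0
    page: int = 1
    children: list[str] = field(default_factory=list)

    @property
    def is_heading(self) -> bool:
        return self.block_type is BlockType.HEADING

    @property
    def is_list_item(self) -> bool:
        return self.block_type is BlockType.LIST_ITEM

    @property
    def is_table_element(self) -> bool:
        # Any table-related type counts, not just the table container itself.
        return self.block_type in TABLE_TYPES
```

Keeping the table types in one set means a new table-related type needs to be added in only one place.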
to_dict
to_dict() -> dict[str, Any]
DocumentStructure
Complete document structure.
Contains all blocks in reading order, plus document-level metadata
and structure information.
Attributes:
- blocks (list[StructuredBlock]): All blocks in reading order.
- title (str | None): Document title (if detected).
- source_path (str | None): Path to the source file, if known.
- source_type (str): Type of source ("pdf", "docx", "markdown", "html").
- page_count (int): Number of pages in the source document.
- headings (list[str]): List of heading block IDs (for quick navigation).
get_headings
Get all heading blocks, optionally filtered by level.
get_blocks_under_heading
Get all blocks under a heading, up to the next heading of the same or higher level.
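The section rule reads as a linear scan. The scan starts right after the heading and stops at the first heading whose level number is less than or equal to its own. Headings with larger numbers are subsections, so they are kept. The sketch below assumes that reading. It uses a hypothetical `blocks_under_heading` function that takes the heading's index, because the actual method's signature is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    block_type: str  # "heading", "paragraph", ...
    level: int = 0


def blocks_under_heading(blocks: list[Block], heading_index: int) -> list[Block]:
    heading = blocks[heading_index]
    if heading.block_type != "heading":
        raise ValueError("block at heading_index is not a heading")
    section: list[Block] = []
    for block in blocks[heading_index + 1:]:
        # Same or higher level (e.g. an h1 or h2 after an h2) ends the section.
        if block.block_type == "heading" and block.level <= heading.level:
            break
        section.append(block)
    return section
```

For example, under an h1 the scan keeps nested h2 and h3 sections and stops at the next h1.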
to_markdown
Convert to Markdown representation.
to_dict
to_dict() -> dict[str, Any]