vibe.review.parsing.structure.adapters
Adapters for non-PDF document formats.
Markdown, DOCX, and HTML documents enter the pipeline at the structure layer, bypassing extraction and layout. These adapters convert directly to DocumentStructure.
MarkdownAdapter
Convert Markdown documents to DocumentStructure.
Parses Markdown syntax and creates structured blocks for:
- Headings (ATX style: # to ######)
- Paragraphs
- Lists (ordered and unordered)
- Code blocks (fenced)
- Blockquotes
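The block types listed above can be illustrated with a small line-based splitter. This is a minimal sketch, not the adapter's actual implementation: the `Block` dataclass and the `split_blocks` and `classify_line` helpers are hypothetical names introduced here, and the real `DocumentStructure` type is not modelled.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # fenced code delimiter, built this way to keep the example readable


@dataclass
class Block:
    """Hypothetical stand-in for a structured block (not the real DocumentStructure)."""
    kind: str       # "heading", "paragraph", "list", "code", or "blockquote"
    text: str
    level: int = 0  # heading depth 1-6; 0 for every other kind


HEADING = re.compile(r"^(#{1,6})\s+(.*)$")            # ATX headings: # to ######
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")   # unordered or ordered item


def classify_line(line: str) -> str:
    """Classify one non-blank line that sits outside a fenced code block."""
    if HEADING.match(line):
        return "heading"
    if line.startswith(">"):
        return "blockquote"
    if LIST_ITEM.match(line):
        return "list"
    return "paragraph"


def split_blocks(markdown: str) -> list[Block]:
    """Group lines into blocks; blank lines and kind changes end a block."""
    blocks: list[Block] = []
    buffer: list[str] = []
    kind = None
    in_fence = False

    def flush() -> None:
        nonlocal buffer, kind
        if buffer:
            blocks.append(Block(kind, "\n".join(buffer)))
        buffer, kind = [], None

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            if in_fence:
                # Closing fence: emit the collected code lines.
                blocks.append(Block("code", "\n".join(buffer)))
                buffer, kind = [], None
            else:
                flush()
            in_fence = not in_fence
            continue
        if in_fence:
            buffer.append(line)
            continue
        if not line.strip():
            flush()
            continue
        line_kind = classify_line(line)
        if line_kind == "heading":
            flush()
            match = HEADING.match(line)
            blocks.append(Block("heading", match.group(2).strip(), len(match.group(1))))
            continue
        if kind is not None and line_kind != kind:
            flush()
        kind = line_kind
        buffer.append(line)

    if in_fence:
        kind = "code"  # unclosed fence: treat the remainder as code
    flush()
    return blocks
```

The sketch ignores nesting, Setext headings, and indented code, all of which a production parser would need to handle.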
adapt
adapt(content: str, source_path: str | None = None) -> DocumentStructure
Convert Markdown content to DocumentStructure.
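| Parameters: | Type | Default |
|---|---|---|
| `content` | `str` | required |
| `source_path` | `str \| None` | `None` |

| Returns: |
|---|
| `DocumentStructure` |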
DocxAdapter
Convert DOCX documents to DocumentStructure.
Uses python-docx to parse Word documents and creates structured blocks based on paragraph styles and content.
Extracts Word outline numbering from heading styles to create properly numbered clause blocks that the semantic layer can detect.
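To make the idea of outline numbering concrete, the sketch below turns a sequence of heading levels into clause numbers such as `1`, `1.1`, and `2.1.1`. It is an illustration under assumptions, not the adapter's code: `level_from_style_name` and `number_headings` are hypothetical helpers, and the only Word-specific fact relied on is that the built-in heading styles are named `Heading 1`, `Heading 2`, and so on (the name python-docx exposes as `paragraph.style.name`).

```python
import re
from typing import Optional

HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def level_from_style_name(style_name: str) -> Optional[int]:
    """Return the outline level for a built-in heading style name, else None."""
    match = HEADING_STYLE.match(style_name)
    return int(match.group(1)) if match else None


def number_headings(headings: list[tuple[int, str]]) -> list[str]:
    """Prefix each (level, text) heading with a dotted outline number.

    One counter is kept per level. Moving to a shallower level discards the
    deeper counters, so numbering restarts under each new parent heading.
    """
    counters: list[int] = []
    numbered: list[str] = []
    for level, text in headings:
        if level < 1:
            raise ValueError("heading levels start at 1")
        # Pad with zeros if a level was skipped (a choice made for this sketch).
        while len(counters) < level:
            counters.append(0)
        del counters[level:]          # reset everything deeper than this level
        counters[level - 1] += 1
        label = ".".join(str(n) for n in counters)
        numbered.append(f"{label} {text}")
    return numbered
```

For example, headings at levels 1, 2, 2, 1 would be labelled `1`, `1.1`, `1.2`, `2`, which is the kind of numbered clause text a downstream layer can match on.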
adapt
adapt(path: Path, source_path: str | None = None) -> DocumentStructure
Convert DOCX file to DocumentStructure.
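| Parameters: | Type | Default |
|---|---|---|
| `path` | `Path` | required |
| `source_path` | `str \| None` | `None` |

| Returns: |
|---|
| `DocumentStructure` |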
HtmlAdapter
Convert HTML documents to DocumentStructure.
Uses BeautifulSoup to parse HTML and creates structured blocks based on HTML elements.
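Element-based block extraction can be sketched as follows. To stay dependency-free, this example uses the standard library's `html.parser` rather than BeautifulSoup, so it shows the general technique and not the adapter's own code. The tag-to-block mapping, `BlockCollector`, and `extract_blocks` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed mapping from HTML elements to block kinds (illustrative only).
BLOCK_TAGS = {
    "h1": "heading", "h2": "heading", "h3": "heading",
    "h4": "heading", "h5": "heading", "h6": "heading",
    "p": "paragraph",
    "li": "list_item",
    "pre": "code",
    "blockquote": "blockquote",
}


class BlockCollector(HTMLParser):
    """Collect (kind, text) pairs for the outermost block-level elements."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[str, str]] = []
        self._tag = None               # block element currently open, if any
        self._parts: list[str] = []    # text fragments seen inside it

    def handle_starttag(self, tag, attrs):
        # Only the outermost block element starts a new block; inline tags
        # and nested block tags just contribute their text.
        if tag in BLOCK_TAGS and self._tag is None:
            self._tag = tag
            self._parts = []

    def handle_data(self, data):
        if self._tag is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._parts)
        if tag != "pre":
            text = " ".join(text.split())  # collapse whitespace outside code
        if text:
            self.blocks.append((BLOCK_TAGS[tag], text))
        self._tag = None


def extract_blocks(html: str) -> list[tuple[str, str]]:
    """Parse an HTML string and return its blocks in document order."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks
```

A known limitation of this sketch is that an element nested inside another element of the same tag (for example an `li` inside a nested list) closes the outer block early; a tree-based parser such as BeautifulSoup avoids that by working on the parsed element hierarchy.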
adapt
adapt(content: str, source_path: str | None = None) -> DocumentStructure
Convert HTML content to DocumentStructure.
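| Parameters: | Type | Default |
|---|---|---|
| `content` | `str` | required |
| `source_path` | `str \| None` | `None` |

| Returns: |
|---|
| `DocumentStructure` |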