vibe.review.parsers.dora_parser

Parser for DORA (EU Regulation 2022/2554) HTML files from EUR-Lex.

Extracts articles with granular sub-article parts into a structured JSON format suitable for import into the reference_sources system.

Part IDs use a language-independent format for direct lookup:
  • Articles: "art30.3(e)(i)" for maximum granularity
  • Recitals: "rec42"
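
To illustrate how such IDs break down, here is a minimal sketch that splits a part ID into its components. The grammar is an assumption inferred only from the two examples above (article number, optional paragraph after a dot, then any number of parenthesised levels); the helper name `split_part_id` is hypothetical and not part of this module.

```python
import re

# Assumed grammar, inferred from the examples "art30.3(e)(i)" and "rec42":
#   article: "art" + number, optional ".paragraph", then zero or more "(x)" levels
#   recital: "rec" + number
_ARTICLE_RE = re.compile(r"^art(\d+)(?:\.(\d+))?((?:\([a-z0-9]+\))*)$")
_RECITAL_RE = re.compile(r"^rec(\d+)$")


def split_part_id(part_id: str) -> dict:
    """Hypothetical helper: decompose a part ID into its components."""
    recital = _RECITAL_RE.match(part_id)
    if recital:
        return {"kind": "recital", "number": int(recital.group(1))}

    article = _ARTICLE_RE.match(part_id)
    if article:
        paragraph = article.group(2)
        # Pull each parenthesised level, e.g. "(e)(i)" -> ["e", "i"]
        levels = re.findall(r"\(([a-z0-9]+)\)", article.group(3))
        return {
            "kind": "article",
            "article": int(article.group(1)),
            "paragraph": int(paragraph) if paragraph else None,
            "levels": levels,
        }

    raise ValueError(f"Unrecognised part ID: {part_id!r}")


print(split_part_id("art30.3(e)(i)"))
print(split_part_id("rec42"))
```

Because the IDs carry no language-specific words, the same ID should address the same provision regardless of which language edition was parsed.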

Reference format in config.yml: "dora_2022_2554:art30.2(a)"
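
A reference string of this shape can be separated into a source ID and a part ID at the first colon. The sketch below assumes exactly that layout, based on the single example above; `split_reference` is a hypothetical helper, not a function provided by this module.

```python
def split_reference(reference: str) -> tuple[str, str]:
    """Hypothetical helper: split "source_id:part_id" at the first colon."""
    source_id, sep, part_id = reference.partition(":")
    if not sep or not source_id or not part_id:
        raise ValueError(f"Expected 'source_id:part_id', got {reference!r}")
    return source_id, part_id


source_id, part_id = split_reference("dora_2022_2554:art30.2(a)")
print(source_id)  # dora_2022_2554
print(part_id)    # art30.2(a)
```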

parse_dora_html

parse_dora_html(html_path: str | Path, language: str = 'en') -> dict[str, Any]

Parse a DORA HTML file and return structured data with granular parts.

Parameters:
  • html_path (str | Path) –

    Path to the EUR-Lex HTML file

  • language (str, default: 'en') –

    Language code (en, sv, etc.)

Returns:
  • dict[str, Any]

    Dictionary with source metadata and parts list
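
Since part IDs are meant for direct lookup, a caller might index the returned parts by ID. The sketch below works on a hand-written stand-in dictionary rather than real parser output: the key names `parts`, `id`, `text`, and `source_id` are assumptions for illustration only, as the exact layout of the returned dictionary is not documented here, and `index_parts` is a hypothetical helper.

```python
from typing import Any


def index_parts(parsed: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: map each part ID to its part record.

    Assumes `parsed` has a "parts" list whose items carry an "id" key.
    """
    return {part["id"]: part for part in parsed["parts"]}


# Stand-in data with assumed key names; not actual parse_dora_html output.
sample = {
    "source_id": "dora_2022_2554",
    "language": "en",
    "parts": [
        {"id": "art30.2(a)", "text": "..."},
        {"id": "rec42", "text": "..."},
    ],
}

by_id = index_parts(sample)
print(sorted(by_id))  # ['art30.2(a)', 'rec42']
```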