VIBE Review

Extension note: VIBE Review is optional and requires additional dependencies (install the review group). Core VIBE runs without it.

VIBE Review extends the document assembly platform to support structured document compliance review. Instead of generating documents through interviews, review templates evaluate existing documents against a set of requirements.

What is VIBE Review?

VIBE Review helps professionals systematically evaluate documents (contracts, policies, assessments) against requirements. The system combines:

  1. Human-authored requirements as the source of truth
  2. AI-assisted retrieval to find relevant document parts
  3. LLM evaluation for preliminary classification
  4. Human decision-making for final determinations

Creating a Review Template

Review templates use the standard VIBE template structure with interview_mode: review in the configuration.

Basic Structure

gdpr-dpa-review/
├── config.yml           # Requirements and configuration
├── template.md          # Report template (uses req() to control relevance)
└── components/          # Optional reusable sections

Defining Requirements

At the core of a review template is the requirements mapping. Each entry under requirements defines one review criterion: what the reviewer should evaluate, how retrieval should search for evidence, and any optional regulatory references or guidance.

The simplest layout is a flat top-level requirements: block:

template_name: "GDPR DPA Compliance Review"
interview_mode: review
description: >
  Review data processing agreements for compliance with GDPR Article 28.3.

# AI provider configuration (optional, references global providers)
review:
  embedding: local      # key from embedding_providers
  reranking: local      # key from rerank_providers
  evaluation: claude    # key from llm_endpoints
  # See "Review Pipeline Settings" below for additional tuning knobs
  # (assessment_concurrency, reasoning_effort, …).

requirements:
  GDPR-28-3a:
    label: "Documented instructions"
    description: |
      The processor must only process personal data on documented
      instructions from the controller.
    keywords:
      en: >
        documented instructions controller instructions written instructions
        processor shall only process on instructions
    example:
      en: >
        The Processor shall process personal data only on the documented
        instructions of the Controller, including with regard to transfers
        to third countries or international organisations.
    reference: "gdpr_2016_679:art28.3(a)"
    help: |
      Check that the agreement:
      - Requires processing only on controller's documented instructions
      - Covers transfers outside EU/EEA

      YES: "documented instructions", "written instructions only"
      NO: "processor's discretion", "as processor deems necessary"

Requirement Structure

Each requirement has:

| Field | Required | Description |
| --- | --- | --- |
| label | Yes | Short label shown in requirement lists |
| description | Yes | Full requirement text explaining what to evaluate |
| keywords | Yes | Language-keyed BM25 search terms (see "Retrieval-Tuning Fields" below) |
| example | Yes | Language-keyed idealised example clause used as the vector-search query |
| reference | No | Regulatory source reference(s); see "Reference Sources" below |
| reference_text | No | Full regulatory text excerpt (auto-populated from reference if imported) |
| help | No | Evaluation guidance with YES/NO indicators |

Requirements and Questions

Requirements and questions are configured in similar ways:

  • Both are keyed records in config.yml
  • Both can appear at the top level or inside groups
  • Both may have label and help text for the reviewer
  • Both become part of the review flow and report context

The main difference is their job:

  • A question captures context about the document, such as whether third-country transfers are allowed
  • A requirement defines something the document should be assessed against, such as whether a valid transfer mechanism is present

Questions can also use condition: to control follow-up questions. Requirements do not have their own conditional config; their relevance is controlled by the template via req(...).

Grouping Requirements

Like questions, requirements may be grouped when that makes the template easier to organize. Groups are optional. Use them when you want a shared title/description, or when a set of related questions and requirements belong together.

groups:
  instructions:
    title: "Documented Instructions"
    description: "Requirements related to processing instructions"
    requirements:
      GDPR-28-3a:
        label: "Documented instructions"
        description: |
          The processor must only process personal data on documented
          instructions from the controller.
        keywords:
          en: >
            documented instructions controller instructions written instructions
        example:
          en: >
            The Processor shall process personal data only on the documented
            instructions of the Controller.
        reference: "gdpr_2016_679:art28.3(a)"

      GDPR-28-3b:
        label: "Confidentiality obligations"
        description: |
          Persons authorized to process personal data have committed
          to confidentiality or are under statutory obligation.
        keywords:
          en: >
            confidentiality personnel confidentiality authorized persons
            statutory confidentiality obligation
        example:
          en: >
            The Processor shall ensure that persons authorized to process
            personal data are bound by confidentiality obligations.
        reference:
          - "gdpr_2016_679:art28.3(b)"
          - "gdpr_2016_679:rec83"
        help: |
          Check for confidentiality commitments from personnel.

Retrieval-Tuning Fields

keywords and example are what the retrieval pipeline actually searches with: keywords feeds BM25, example feeds the embedding-similarity query. Both are mandatory language-keyed mappings; loading a template that omits them (or supplies a non-dict value) raises an error at load time rather than failing quietly at query time.

sakerhetsatgarder:
  label: Säkerhetsåtgärder hos leverantören
  description: Avtalet ska reglera vilka säkerhetsåtgärder leverantören ska vidta.
  keywords:
    sv: >
      säkerhetsåtgärder tekniska organisatoriska kryptering åtkomstkontroll
      behörighetsstyrning loggning övervakning segmentering säkerhetskopiering
    en: >
      security measures technical organizational encryption access control
      authentication logging monitoring segmentation backup
  example:
    sv: >
      Leverantören ska vidta de tekniska och organisatoriska säkerhetsåtgärder
      som anges i bilaga Säkerhet, innefattande kryptering av data vid överföring
      och lagring, rollbaserad åtkomstkontroll med multifaktorautentisering…
    en: >
      The Provider shall implement technical and organizational security measures
      as specified in the Security Appendix, including encryption at rest and in
      transit, role-based access control with multi-factor authentication…

  • keywords[lang]: a flat string of the terms a human curator would expect to find in a compliant clause. Feeds the BM25 index directly.
  • example[lang]: one or two sentences of what an ideal compliant clause would look like. Feeds the embedding provider so vector search retrieves semantically similar passages even when the keyword surface differs.

Provide an entry per language your corpus covers. If a requirement's keywords or example mapping doesn't include the document's detected language, the retrieval layer logs a warning for that query and uses label + description as a per-call substitute — the template still loads. The mandatory-field check above is about the mapping itself being present, not about every language key being present inside it.
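
The two checks described above (a load-time check on the mappings, and a per-query language fallback) can be sketched roughly as follows. This is a minimal illustration, not VIBE's actual code: the function names validate_requirement and query_text and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("review.retrieval")  # hypothetical logger name


def validate_requirement(req_id: str, req: dict) -> None:
    """Fail at load time if keywords/example are missing or not mappings."""
    for field in ("keywords", "example"):
        if not isinstance(req.get(field), dict):
            raise ValueError(f"{req_id}: '{field}' must be a language-keyed mapping")


def query_text(req_id: str, req: dict, field: str, lang: str) -> str:
    """Return the query text for `lang`, or label + description if it is absent."""
    mapping = req[field]
    if lang in mapping:
        return mapping[lang].strip()
    logger.warning("%s: no %s for language %r; using label + description",
                   req_id, field, lang)
    return f"{req['label']} {req['description']}".strip()


requirement = {
    "label": "Documented instructions",
    "description": "The processor must only process personal data on documented instructions.",
    "keywords": {"en": "documented instructions written instructions"},
    "example": {"en": "The Processor shall process personal data only on documented instructions."},
}

validate_requirement("GDPR-28-3a", requirement)  # passes: both mappings are present
bm25_en = query_text("GDPR-28-3a", requirement, "keywords", "en")
bm25_sv = query_text("GDPR-28-3a", requirement, "keywords", "sv")  # falls back, logs a warning
```

The point of the split is that a missing mapping is an authoring error worth stopping on, while a missing language key only degrades one query.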

Review Pipeline Settings

The top-level review: block configures how the AI pipeline runs for this template. Provider keys reference globally-defined providers; the remaining keys tune runtime behaviour.

review:
  # Provider references (required)
  embedding: local       # key from embedding_providers
  reranking: local       # key from rerank_providers
  evaluation: claude     # key from llm_endpoints

  # Concurrency for batch assessment (optional, default 4)
  assessment_concurrency: 8

  # Per-stage reasoning effort (optional; only takes effect on
  # reasoning-capable models)
  reasoning_effort:
    stage1: low          # default: "low"
    stage2: medium       # default: the provider's own setting

| Field | Default | Description |
| --- | --- | --- |
| embedding | required | Provider key for vector search. References an entry under embedding_providers in the app config. |
| reranking | required | Provider key for cross-encoder reranking. |
| evaluation | required | Provider key for the LLM that performs stage-1 relevance filtering and stage-2 compliance/answer assessment. |
| assessment_concurrency | 4 | Maximum number of requirements assessed in parallel during a batch run. Increase if your LLM endpoint can sustain higher concurrency without rate-limiting; decrease if you start seeing 429s. Forced to 1 for single-item flows (e.g. clicking "Ask AI" on a single requirement) regardless of this setting. |
| reasoning_effort.stage1 | "low" | Reasoning effort for stage 1 (relevance filtering: does this document part touch on this requirement?). Stage 1 is high-volume and the task is essentially binary, so "low" is usually sufficient. Use "off" to disable reasoning entirely on models that support it. |
| reasoning_effort.stage2 | (provider default) | Reasoning effort for stage 2 (compliance verdict + user-facing reasoning text). One call per requirement, and the output is shown to reviewers; "medium" or "high" typically yields better-quality rationales at modest extra cost. |

Concurrency

The full assessment pipeline runs each requirement in its own worker thread inside a bounded pool. Each worker handles one requirement end-to-end: hybrid search → rerank → stage-1 relevance → stage-2 verdict → DB save. Workers are submitted with a small initial-fill stagger so the first batch doesn't burst on the LLM endpoint.

Per-item failures are isolated: if one requirement raises, the rest of the batch continues. The failed item emits an item_error SSE event the workbench logs to the browser console; the requirement keeps its previous state. Set assessment_concurrency: 1 to fall back to fully serial assessment if you need that for debugging.
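
As a rough picture of the bounded pool and per-item failure isolation described above, here is a minimal sketch using Python's standard thread pool. The assess function is a stand-in for the real search, rerank, stage-1, stage-2 and save steps, and the initial-fill stagger is left out; none of these names come from VIBE itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def assess(req_id: str) -> str:
    """Hypothetical stand-in for one requirement's end-to-end assessment."""
    if req_id == "GDPR-28-3b":
        raise RuntimeError("simulated LLM timeout")
    return "YES"


def run_batch(req_ids: list, concurrency: int = 4) -> tuple:
    """Assess requirements in a bounded pool; one failure never stops the rest."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(assess, rid): rid for rid in req_ids}
        for future in as_completed(futures):
            rid = futures[future]
            try:
                results[rid] = future.result()
            except Exception as exc:  # isolate the failure; keep previous state
                errors[rid] = str(exc)
    return results, errors


results, errors = run_batch(["GDPR-28-3a", "GDPR-28-3b", "GDPR-46-scc"], concurrency=2)
```

Passing concurrency=1 to this sketch gives the fully serial behaviour that is useful for debugging.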

Reasoning Effort

reasoning_effort only takes effect on models that expose a reasoning control. Where it's recognised:

  • OpenAI gpt-5 and o-series models accept reasoning_effort as a top-level request field.
  • gpt-oss-style models (e.g. on vllm/Berget) honour a Reasoning: <level> line at the start of the system prompt — VIBE injects this automatically when reasoning_effort is set, so template authors don't need to bake it into their own prompts.
  • Anthropic Claude models with thinking enabled use the same level naming.

Non-reasoning models (gpt-4, gpt-4o, llama-3, etc.) silently ignore the field. The setting is therefore safe to leave configured even if you switch evaluation backends.

A practical recipe for cost-sensitive templates against a thinking model: leave stage1 at "low" (the default) and bump stage2 to "high" so the user-facing reasoning text is well-considered, while the per-part relevance filter stays cheap.
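
The defaults and the gpt-oss-style prompt line can be illustrated with a small sketch. The helper names are hypothetical, and the exact way VIBE builds the injected line is an assumption based only on the description above.

```python
from typing import Optional


def resolve_effort(review_cfg: dict, stage: str) -> Optional[str]:
    """Look up a stage's effort; None means 'keep the provider's own setting'."""
    defaults = {"stage1": "low", "stage2": None}
    return review_cfg.get("reasoning_effort", {}).get(stage, defaults[stage])


def with_reasoning_line(system_prompt: str, effort: Optional[str]) -> str:
    """Prefix a 'Reasoning: <level>' line when an effort is set (assumed format)."""
    if effort is None:
        return system_prompt
    return f"Reasoning: {effort}\n{system_prompt}"


cfg = {"reasoning_effort": {"stage2": "high"}}  # stage1 left at its default
stage1 = resolve_effort(cfg, "stage1")
stage2 = resolve_effort(cfg, "stage2")
prompt = with_reasoning_line("You assess compliance.", stage2)
```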

Context Questions

Review templates support standard VIBE questions alongside requirements. Questions serve two purposes:

  1. Capture document context - Facts about the agreement that affect which requirements apply
  2. AI-assisted answering - The system can analyze the document and suggest answers

Define questions in your config, either at top-level or within groups:

groups:
  transfers:
    title: "International Transfers"
    description: "Requirements for transfers outside EU/EEA"

    questions:
      allows_third_country_transfer:
        type: bool
        assistable: true
        label: "Does the agreement permit transfers to third countries?"
        help: |
          Check if the processor is allowed to transfer personal data
          outside the EU/EEA. Look for clauses about subprocessors
          in non-EU countries or data center locations.

      transfer_mechanism:
        type: radio
        assistable: true
        label: "What transfer mechanism is specified?"
        options:
          - value: adequacy
            label: "Adequacy decision (Art. 45)"
          - value: scc
            label: "Standard Contractual Clauses (Art. 46.2c)"
          - value: bcr
            label: "Binding Corporate Rules (Art. 47)"
          - value: derogation
            label: "Derogation (Art. 49)"
          - value: none
            label: "None specified"
        condition: allows_third_country_transfer

    requirements:
      GDPR-28-transfer-mechanism:
        label: "Valid transfer mechanism"
        description: |
          Transfers to third countries must be based on an adequacy
          decision, appropriate safeguards, or a specific derogation.
        reference: "gdpr_2016_679:art46"
        # keywords and example are mandatory; omitted here for brevity

Opting questions in to AI assistance (assistable: true)

assistable: true exposes an Ask AI button alongside the question's input control. Clicking it runs the same retrieval and two-stage classification pipeline used for requirement assessment (see How Review Works below) and presents a suggested answer with supporting evidence from the document. The reviewer then either accepts the suggestion (which becomes the question's value) or discards it. Without assistable: true the question still works but reviewers can only fill it in manually.

Two characteristics worth knowing when designing a review template:

  • Suggestions are presented for review, never auto-applied. Even with assistable: true, every assistable question still requires the reviewer to confirm or override the AI's proposal — there is no way for the AI to silently set a value.
  • The "Ask all" batch action covers requirements only. The button at the top of the workbench that runs assessment over the full applicable set does not include context questions. Each assistable question must be triggered individually with its own Ask AI button.

In practice, set assistable: true on context questions whose answers gate which requirements appear (the highest-leverage time saver for reviewers), and leave it off on questions a reviewer would always answer manually anyway.

Requirement Relevance

Not all requirements apply to every document. Question answers determine which requirements are relevant through the template.

The Mental Model

Requirements become relevant through the template, not through configuration:

  1. Answer questions that capture document context (manually or via AI suggestion)
  2. Template probing executes the template with current answers
  3. Conditional req() calls produce only the relevant requirements
  4. Re-probing on change - when answers change, requirements update automatically

This is the same "template is truth" principle used for interview questions.

Example: GDPR Third Country Transfers

A data processing agreement review might have requirements that only apply when international transfers are involved:

{# template.md - Controls which requirements are relevant #}

# GDPR Article 28 Compliance Review

## Core Processing Requirements

{{ req("GDPR-28-3a") }}  {# Documented instructions - always applies #}
{{ req("GDPR-28-3b") }}  {# Confidentiality - always applies #}

{% if allows_third_country_transfer %}
## International Transfer Requirements

The agreement permits transfers to third countries.

{% if transfer_mechanism == "adequacy" %}
{{ req("GDPR-45-adequacy") }}
{% elif transfer_mechanism == "scc" %}
{{ req("GDPR-46-scc") }}
{{ req("GDPR-46-supplementary") }}
{% elif transfer_mechanism == "bcr" %}
{{ req("GDPR-47-bcr") }}
{% else %}
{{ req("GDPR-49-derogation") }}
{% endif %}

{{ req("GDPR-28-transfer-mechanism") }}
{% endif %}

In this example:

  • Core requirements (28-3a, 28-3b) always apply
  • Transfer requirements only appear if allows_third_country_transfer is true
  • The specific transfer mechanism requirements depend on which legal basis is used
  • If the reviewer changes an answer, the requirement list updates immediately
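
To make the probing idea concrete, the sketch below mirrors the template's branching in plain Python and records every req() call it would make for a given set of answers. It only illustrates the mental model; the real probing runs the template itself, and the probe function here is hypothetical.

```python
def probe(answers: dict) -> list:
    """Mirror the template's branching and collect the requirement ids it emits."""
    relevant = []
    req = relevant.append  # stand-in for the template's req() during probing

    req("GDPR-28-3a")  # always applies
    req("GDPR-28-3b")  # always applies
    if answers.get("allows_third_country_transfer"):
        mechanism = answers.get("transfer_mechanism")
        if mechanism == "adequacy":
            req("GDPR-45-adequacy")
        elif mechanism == "scc":
            req("GDPR-46-scc")
            req("GDPR-46-supplementary")
        elif mechanism == "bcr":
            req("GDPR-47-bcr")
        else:
            req("GDPR-49-derogation")
        req("GDPR-28-transfer-mechanism")
    return relevant


no_transfer = probe({"allows_third_country_transfer": False})
scc = probe({"allows_third_country_transfer": True, "transfer_mechanism": "scc"})
```

Re-running probe with changed answers is what "re-probing on change" amounts to: the requirement list is always derived, never stored as configuration.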

How Review Works

1. Document Ingestion

VIBE Review supports Markdown, Word (DOCX), and PDF files.

  • DOCX: Rendered to PDF for layout-faithful viewing.
  • PDF: Processed using high-fidelity text extraction.
  • Scanned PDFs: Automatically processed via OCR (Optical Character Recognition).

The system segments documents into semantically meaningful parts (headings, paragraphs, tables) with stable identifiers.

2. Requirement Matching

For each relevant requirement, the system uses Hybrid Search (combining BM25 keyword matching and semantic vector similarity) to find the most relevant document parts. Results are further refined using a Cross-Encoder Reranker for maximum precision.
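
This page does not say how the BM25 and vector rankings are merged. One common technique is reciprocal rank fusion (RRF), sketched below purely as an illustration of what "hybrid" can mean; treat the choice of RRF, the constant k = 60, and the part ids as assumptions rather than VIBE's actual method.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked lists of part ids; earlier ranks contribute more."""
    scores = {}
    for ranking in rankings:
        for rank, part_id in enumerate(ranking, start=1):
            scores[part_id] = scores.get(part_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by part id for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


bm25_ranking = ["p7", "p2", "p9"]    # hypothetical keyword hits, best first
vector_ranking = ["p2", "p4", "p7"]  # hypothetical embedding hits, best first
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Parts found by both retrievers (p2, p7) rise above parts found by only one, which is the behaviour a reranker then refines.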

3. AI Classification

For each requirement, the AI performs a two-stage evaluation:

  1. Relevance Filtering: Determines if matched parts actually address the requirement.
  2. Compliance Evaluation: Assesses whether the relevant parts satisfy the requirement (YES/NO/PARTIAL/NOT_APPLICABLE).

Each stage produces a confidence level (High/Medium/Low) and reasoning. The UI displays the overall confidence alongside the classification, helping reviewers prioritize which items need closer attention.

Classifications are tracked with metadata:

| Field | Description |
| --- | --- |
| is_ai_suggested | Whether the classification came from AI |
| is_human_verified | Whether a human has reviewed/confirmed the classification |
| confidence | Numeric confidence score (0.0-1.0) |
| reasoning | AI explanation for the classification |
| human_notes | Optional notes added by the human reviewer |

4. AI-Assisted Questions

Context questions opted in via assistable: true (see Opting questions in to AI assistance above) get an Ask AI button next to the input. Clicking it analyses the document and proposes an answer with supporting evidence; the reviewer accepts or discards the suggestion before it becomes the question's value. Each assistable question is triggered individually — the workbench's batch "Ask all" action covers requirements only.

5. Evidence Curation

After AI classification, each requirement's matched document parts are split into two roles:

  • Primary evidence — the parts that most directly determine the verdict. Rendered as large cards at the top of the "Matched Sections" block.
  • Supporting evidence — parts that reinforce the primary. Rendered as compact rows below.

Reviewers can curate this list in several ways:

  • Remove parts: Click the trash icon on a card or row to drop it from the assessment. Removed parts appear as ghost rows with an undo button, so a misclick can be reversed without redoing the search.
  • Promote / demote: Move a supporting part into primary (or a primary part back into support) with the promote/demote buttons on each row. There can be more than one primary at a time.
  • Add from the document viewer: Hover a document part and click the "+" button to add it to the currently selected assessment item. The route accepts an is_primary flag so parts can be added directly as primary or as support.
  • Reset matched sections: An undo button on the "Matched Sections" header restores the parts (and their primary/support roles) to the AI's original suggestion, without touching the verdict.
  • Reset the decision: When the reviewer's verdict diverges from the AI's, a full undo control next to the save button clears the human verdict, notes, and matched-parts changes back to the AI baseline in one step.

When you re-run AI classification on an item with curated parts, the system skips the search and reranking stages and uses your curated parts directly. This lets you correct AI mistakes and ensure the classification uses exactly the document sections you've identified.

6. AI Baseline and Divergence

When the AI produces a suggestion, that suggestion is frozen on the review as an AI baseline: the original verdict plus the primary/supporting part ids. If a human subsequently changes the verdict, the review records a diverged_at timestamp and the workbench surfaces the divergence in three places:

  • A "Diverges from AI" pill next to the classification controls.
  • A collapsible AI banner showing the original AI suggestion. It starts expanded while the verdict still matches the AI, and auto-collapses once the human changes the verdict (you can expand it at any time to compare).
  • The two reset-to-AI controls above (matched-sections undo and decision undo) let you return to the baseline without re-running the AI.

The same baseline powers the provenance markers on the session overview page (see Session Overview below).
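
A minimal data-model sketch of the baseline and divergence behaviour follows. Only diverged_at is named in this page; the class, the other field names, and the method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ReviewSketch:
    verdict: str = "PENDING"
    ai_verdict: Optional[str] = None        # frozen baseline verdict
    ai_primary_ids: tuple = ()              # frozen baseline primary part ids
    ai_supporting_ids: tuple = ()           # frozen baseline supporting part ids
    diverged_at: Optional[datetime] = None

    def freeze_ai_suggestion(self, verdict, primary_ids, supporting_ids):
        """Store the AI suggestion as both the current verdict and the baseline."""
        self.verdict = self.ai_verdict = verdict
        self.ai_primary_ids = tuple(primary_ids)
        self.ai_supporting_ids = tuple(supporting_ids)
        self.diverged_at = None

    def set_human_verdict(self, verdict):
        """Record the human verdict and timestamp the first divergence."""
        self.verdict = verdict
        if self.ai_verdict is not None and verdict != self.ai_verdict:
            self.diverged_at = self.diverged_at or datetime.now(timezone.utc)

    def reset_decision(self):
        """Return to the AI baseline without re-running the AI."""
        self.verdict = self.ai_verdict
        self.diverged_at = None


review = ReviewSketch()
review.freeze_ai_suggestion("PARTIAL", primary_ids=["p12"], supporting_ids=["p3", "p40"])
review.set_human_verdict("NO")
diverged = review.diverged_at is not None
review.reset_decision()
```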

Human-in-the-Loop Workbench

The Review Workbench provides a multi-pane interface for efficient review:

┌──────────────────────────────────────────────────────────────┐
│ ┌───────────┐ ┌────────────────────────────────────────────┐ │
│ │  Sidebar  │ │              Main Container                │ │
│ │           │ ├──────────────────┬─────────────────────────┤ │
│ │ Questions │ │   Detail Panel   │   Document Viewer       │ │
│ │     +     │ │                  │   / Draft Report        │ │
│ │ Req List  │ │  Selected item,  │                         │ │
│ │           │ │  AI reasoning,   │  Document tabs +        │ │
│ │           │ │  evidence panel  │  persistent "Draft      │ │
│ │           │ │  with primary/   │  report" tab            │ │
│ │           │ │  supporting split│                         │ │
│ └───────────┘ └──────────────────┴─────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
  1. Left Sidebar: A unified stream of context questions and compliance requirements. Verified items show a check marker. Click any item to select it.
  2. Detail Panel: Shows the selected question or requirement with AI reasoning, primary/supporting evidence cards, classification controls, and the divergence pill and reset-to-AI controls when the verdict has been changed.
  3. Document Viewer / Draft Report: A single pane that toggles between two views via tabs at the top:
    • Document tabs — high-fidelity rendering of each uploaded document. Matched parts are highlighted and clickable; text selections inside PDFs and DOCX can be sent to the current assessment item as primary or supporting evidence.
    • Draft report tab — a live preview of the generated report, rendered from the review template. The tab is always present. When classifications or context answers change while the report is visible, the preview refreshes automatically. When review progress reaches 100% for the first time, the workbench briefly flashes the tab and auto-opens it so the reviewer sees the finished draft without hunting for it.

A draggable divider between the detail panel and viewer area allows you to adjust the column widths.

Session Overview (template sessions view)

The per-template sessions page lists every review session for a given template together with a row of requirement tiles summarising the current state of the session at a glance. Each tile represents one applicable requirement and encodes both classification and provenance:

  • Color — classification (fulfilled / partial / not fulfilled / not applicable / not assessed).
  • Border / fill weight — whether the result is AI-only, human-verified, or not yet assessed.
  • Divergence marker — indicates requirements where the human reviewer's verdict differs from the AI's original suggestion.

A legend under the table explains each combination. Hovering a tile surfaces the same information in tooltip form (e.g. "Partially fulfilled · reviewed by human · diverges from AI"). Tiles are ordered to match the template's requirement order, so the pattern is comparable across sessions.

Review CLI

Use vibe review for the operational side of a review template: ingesting documents, importing reference sources, maintaining examples, and managing the review database. If you are debugging the parsing pipeline or benchmarking the extension itself, use vibe-dev review instead; those commands are documented in Developer Review CLI.

The runtime command groups are:

  • vibe review session ... for creating, listing, updating, and inspecting review sessions
  • vibe review refs ... for importing and searching regulatory reference sources
  • vibe review examples import ... for seeding the few-shot example corpus from JSON
  • vibe review db ... for database initialization, reset, and status checks
  • vibe review embeddings build ... for (re)building embeddings after ingestion
  • vibe review render collect-fonts for DOCX rendering support

Typical commands:

vibe review session create contract.docx --template dora-review
vibe review session list
vibe review session results 12
vibe review refs search "audit rights" --source dora_2022_2554
vibe review db status

Template Output

The template.md (or template.docx) defines the final report. The following variables are available in the context:

| Variable | Description |
| --- | --- |
| review_session | Metadata about the session (ID, status, created_at, updated_at as ISO timestamps) |
| req(id) | Returns a RequirementProxy with boolean logic and property access (see below) |
| group(id) | Returns a RequirementGroupProxy for group-level compliance checks (see below) |
| YES, NO, PARTIAL, NOT_APPLICABLE, PENDING | Classification constants for comparison |
| documents | List of uploaded documents with their metadata |
| [question_id] | Answers to context questions are available at the top level |

Requirement and Group Proxies

The req() and group() functions return rich proxy objects that support boolean logic and property access. For static requirement metadata (label, description, help, example text), use question(id); see Template Functions, Filters & Metadata.

The req() Function

req('id') returns a RequirementProxy object that:

  • Is truthy when compliant: {% if req('D2-1') %} is true only when result == YES
  • Exposes review-state fields as properties
  • Renders as a formatted string ("{label}: {result}") in output context

| Property | Description |
| --- | --- |
| id | Requirement ID (e.g., "GDPR-28-3a") |
| label | Short label from config (used by the default string template) |
| result | Classification result (YES, NO, PARTIAL, NOT_APPLICABLE, PENDING) |
| confidence | Numeric score (0.0-1.0) |
| reasoning | AI or human explanation |
| human_notes | Reviewer's notes |
| is_ai_suggested | True if classification came from AI |
| is_human_verified | True if human confirmed |

For static definition fields — description, help, example[lang], keywords[lang] — use question(id) instead. See Template Functions, Filters & Metadata.

Boolean semantics example:

{# Only true when result == YES #}
{% if req('GDPR-28-3a') %}
✓ Documented instructions requirement is fully satisfied.
{% endif %}

{# Use 'not' to check for non-compliance #}
{% if not req('GDPR-28-3a') %}
⚠ Missing or incomplete: Add clause for documented instructions.
{% endif %}

Property access example:

{% if req('GDPR-28-3a').result == PARTIAL %}
**Partial compliance**: {{ req('GDPR-28-3a').reasoning }}

{% if req('GDPR-28-3a').human_notes %}
Reviewer notes: {{ req('GDPR-28-3a').human_notes }}
{% endif %}
{% endif %}
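
The truthiness and string rendering described above map naturally onto Python's __bool__ and __str__. The class below is a hypothetical stand-in, not the real RequirementProxy, and representing the constants as plain strings is an assumption.

```python
from dataclasses import dataclass

# Assumed representation of the classification constants.
YES, NO, PARTIAL, NOT_APPLICABLE, PENDING = (
    "YES", "NO", "PARTIAL", "NOT_APPLICABLE", "PENDING",
)


@dataclass
class RequirementProxySketch:
    id: str
    label: str
    result: str = PENDING
    reasoning: str = ""

    def __bool__(self) -> bool:
        # Truthy only for a fully satisfied requirement.
        return self.result == YES

    def __str__(self) -> str:
        # Default rendering when the proxy is printed in a template.
        return f"{self.label}: {self.result}"


ok = RequirementProxySketch("GDPR-28-3a", "Documented instructions", YES)
partial = RequirementProxySketch("GDPR-28-3a", "Documented instructions", PARTIAL)
pending = RequirementProxySketch("GDPR-28-3b", "Confidentiality obligations")
```

Note the consequence for templates: a bare "if not req(...)" is true for NO, PARTIAL, NOT_APPLICABLE and PENDING alike, so compare against result when those cases need different wording.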

The group() Function

group('id') returns a RequirementGroupProxy for group-level compliance checks:

  • Is truthy when all requirements are compliant (all have result == YES)
  • Provides aggregate properties for compliance counts
  • Supports filtering by classification result

| Property | Description |
| --- | --- |
| id | Group ID from config |
| title | Group title |
| requirements | List of RequirementProxy objects in the group |
| all_compliant | True if all requirements are YES |
| none_compliant | True if no requirements are YES |
| compliant_count | Number of YES requirements |
| incompliant_count | Number of non-YES requirements |

Method:

| Method | Description |
| --- | --- |
| filter(result) | Returns requirements matching the given result |

Group-level checks example:

{% if group('audit_rights') %}
## Audit Rights ✓
All audit requirements are satisfied.
{% else %}
## Audit Rights ({{ group('audit_rights').incompliant_count }} issues)

{% for r in group('audit_rights').filter(NO) %}
- ❌ **{{ r.label }}**: {{ r.reasoning }}
{% endfor %}

{% for r in group('audit_rights').filter(PARTIAL) %}
- ⚠ **{{ r.label }}**: {{ r.human_notes or r.reasoning }}
{% endfor %}
{% endif %}
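
The aggregate properties can be pictured with a small stand-in class. Property and method names follow the tables above; the class names, the string constants, and the behaviour for an empty group (treated as all-compliant here) are assumptions.

```python
from dataclasses import dataclass, field

YES, NO, PARTIAL = "YES", "NO", "PARTIAL"  # assumed string constants


@dataclass
class Req:
    label: str
    result: str


@dataclass
class GroupSketch:
    id: str
    title: str
    requirements: list = field(default_factory=list)

    @property
    def compliant_count(self) -> int:
        return sum(r.result == YES for r in self.requirements)

    @property
    def incompliant_count(self) -> int:
        return len(self.requirements) - self.compliant_count

    @property
    def all_compliant(self) -> bool:
        return self.compliant_count == len(self.requirements)

    @property
    def none_compliant(self) -> bool:
        return self.compliant_count == 0

    def filter(self, result: str) -> list:
        return [r for r in self.requirements if r.result == result]

    def __bool__(self) -> bool:
        return self.all_compliant


audit = GroupSketch("audit_rights", "Audit Rights", [
    Req("Audit access", YES),
    Req("Inspection notice", NO),
    Req("Third-party audits", PARTIAL),
])
failed_labels = [r.label for r in audit.filter(NO)]
```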

Classification Constants

The following constants are available for comparison:

| Constant | Description |
| --- | --- |
| YES | Requirement is satisfied |
| NO | Requirement is not satisfied |
| PARTIAL | Requirement is partially satisfied |
| NOT_APPLICABLE | Requirement does not apply |
| PENDING | Not yet classified |

Using constants in conditions:

{% if req('DORA-30-3e').result == NOT_APPLICABLE %}
_This requirement does not apply to this agreement._
{% elif req('DORA-30-3e').result == PARTIAL %}
**Partial**: {{ req('DORA-30-3e').reasoning }}
{% endif %}

Complete Report Example

# Compliance Review Report

## Summary

{% if group('core_requirements') %}
**Core Requirements**: All {{ group('core_requirements').compliant_count }} requirements met ✓
{% else %}
**Core Requirements**: {{ group('core_requirements').incompliant_count }} of {{ group('core_requirements').requirements|length }} requirements have issues
{% endif %}

## Detailed Findings

### Core Processing Requirements

{% for r in group('core_requirements').requirements %}
#### {{ r.label }}

{% if r.result == YES %}
✓ **Compliant** ({{ "%.0f"|format(r.confidence * 100) }}% confidence)
{{ r.reasoning }}
{% elif r.result == PARTIAL %}
⚠ **Partial** - {{ r.reasoning }}
{% if r.human_notes %}
_Reviewer: {{ r.human_notes }}_
{% endif %}
{% elif r.result == NO %}
❌ **Non-compliant** - {{ r.reasoning }}
{% endif %}

{% endfor %}

{% if allows_third_country_transfer %}
### International Transfer Requirements

{% if not group('transfers') %}
**Action Required**: {{ group('transfers').incompliant_count }} transfer requirement(s) need attention.
{% endif %}
{% endif %}

Reference Sources

Requirements can link to regulatory source texts via the reference field. This enables the system to retrieve authoritative regulatory text to enrich search queries and improve AI classification accuracy.

Reference Format

The reference field uses the format <source_id>:<part_id>:

groups:
  audit:
    title: "Audit Rights"
    requirements:
      DORA-30-3e:
        label: "Audit rights"
        description: "The contract must grant audit and inspection rights."
        reference: "dora_2022_2554:art30.3(e)"
        # keywords and example are mandatory; omitted here for brevity

For multiple references, use a list:

reference:
  - "dora_2022_2554:art30.3(e)"
  - "dora_2022_2554:rec71"
  - "eba_guidelines:gl29"
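
Because the field accepts either a single string or a list, a consumer has to normalise it before splitting each entry into its two halves. A minimal sketch, assuming the first colon is the separator (the helper and type names are hypothetical):

```python
from typing import NamedTuple, Union


class Reference(NamedTuple):
    source_id: str
    part_id: str


def parse_references(value: Union[str, list]) -> list:
    """Normalise a reference field (string or list) into Reference tuples."""
    items = [value] if isinstance(value, str) else list(value)
    refs = []
    for item in items:
        source_id, sep, part_id = item.partition(":")  # split on the first colon
        if not sep or not source_id or not part_id:
            raise ValueError(f"expected '<source_id>:<part_id>', got {item!r}")
        refs.append(Reference(source_id, part_id))
    return refs


single = parse_references("dora_2022_2554:art30.3(e)")
several = parse_references(["dora_2022_2554:rec71", "eba_guidelines:gl29"])
```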

Importing Reference Sources

Before references can be resolved, the regulatory source documents must be imported into the database:

vibe review refs import sources.json --embedding-provider local

The JSON file should contain the source metadata and its parts:

{
  "source": {
    "id": "dora_2022_2554",
    "language": "en",
    "title": "Regulation (EU) 2022/2554 (DORA)",
    "type": "regulation",
    "reference": "EU 2022/2554"
  },
  "parts": [
    {
      "part_id": "art30.3(e)",
      "part_type": "article",
      "title": "Article 30(3)(e)",
      "text": "The full regulatory text for this article...",
      "hierarchy": ["Chapter IV", "Article 30"]
    },
    {
      "part_id": "rec71",
      "part_type": "recital",
      "title": "Recital 71",
      "text": "The full text of recital 71..."
    }
  ]
}

How References Are Used

When references are configured and the source documents are imported:

  1. Reference resolution: The system matches <source_id>:<part_id> to database entries
  2. Text population: The reference_text field is automatically populated with the official regulatory text
  3. Enhanced search: Regulatory text is used alongside the requirement description for similarity search
  4. Rich LLM context: The AI classifier receives both the requirement description and the authoritative source text, improving classification accuracy
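
Steps 1 and 2 can be sketched against the JSON shape shown earlier. This uses an in-memory dictionary instead of the review database, and silently skipping unresolved references is an assumption; the function names are hypothetical.

```python
import json

SOURCE_JSON = """
{
  "source": {"id": "dora_2022_2554", "language": "en"},
  "parts": [
    {"part_id": "art30.3(e)", "text": "The full regulatory text for this article..."},
    {"part_id": "rec71", "text": "The full text of recital 71..."}
  ]
}
"""


def build_index(doc: dict) -> dict:
    """Map (source_id, part_id) to the part's text."""
    source_id = doc["source"]["id"]
    return {(source_id, part["part_id"]): part["text"] for part in doc["parts"]}


def resolve_reference_text(references: list, index: dict) -> str:
    """Join the texts of all references that resolve; skip the rest."""
    texts = []
    for ref in references:
        source_id, _, part_id = ref.partition(":")
        text = index.get((source_id, part_id))
        if text is not None:
            texts.append(text)
    return "\n\n".join(texts)


index = build_index(json.loads(SOURCE_JSON))
reference_text = resolve_reference_text(
    ["dora_2022_2554:art30.3(e)", "dora_2022_2554:rec71", "eba_guidelines:gl29"],
    index,
)
```

Here eba_guidelines:gl29 does not resolve because that source was never imported, so only the two DORA texts end up in reference_text.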

Few-Shot Examples

VIBE Review uses few-shot prompting to improve classification accuracy. Examples are curated document excerpts with known classifications; they are included in the AI prompt to show the model how similar content has been evaluated.

How Examples Work

When classifying a requirement, the system:

  1. Retrieves relevant examples for that requirement from the examples database
  2. Reranks by similarity to the current document excerpt
  3. Includes diverse examples (mixing YES/NO/PARTIAL classifications) in the AI prompt
  4. Records which examples were used for transparency and debugging
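
How the system balances similarity against diversity is not specified here. One simple way to get a mix of classifications is a round-robin over classes, taking the most similar example from each in turn, as sketched below. The similarity field and the pick_diverse helper are hypothetical.

```python
def pick_diverse(examples: list, limit: int = 3) -> list:
    """Round-robin across classifications, most similar first within each class."""
    by_class = {}
    for ex in sorted(examples, key=lambda e: -e["similarity"]):
        by_class.setdefault(ex["classification"], []).append(ex)

    picked = []
    while len(picked) < limit and any(by_class.values()):
        for bucket in by_class.values():
            if bucket and len(picked) < limit:
                picked.append(bucket.pop(0))
    return picked


candidates = [
    {"id": 1, "classification": "YES", "similarity": 0.9},
    {"id": 2, "classification": "YES", "similarity": 0.8},
    {"id": 3, "classification": "NO", "similarity": 0.7},
    {"id": 4, "classification": "PARTIAL", "similarity": 0.6},
]
chosen = pick_diverse(candidates, limit=3)
used_ids = [ex["id"] for ex in chosen]  # what step 4 would record
```

A pure top-3 by similarity would have picked two YES examples; the round-robin trades a little similarity for coverage of each verdict.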

Creating Examples

There are two ways to create examples:

  1. Promote from matched parts: After AI classification, click the 🎯 icon next to any matched section to open the "Save as Example" form. The form pre-fills with the document excerpt and current classification.

  2. Manual creation: Navigate to the Examples page (via the "Examples" nav link) and create examples directly with custom excerpts.

Each example includes:

| Field | Description |
| --- | --- |
| requirement_id | The requirement this example applies to |
| document_excerpt | The relevant text from a document |
| classification | YES, NO, PARTIAL, or NOT_APPLICABLE |
| reasoning | Explanation of why this classification applies |
| quality_score | 0.0-1.0 score indicating example quality (higher = better) |

Managing Examples

The Examples page (/<template_id>/examples) provides:

  • Filtering by requirement, classification, or minimum quality score
  • Editing excerpts, reasoning, and quality scores
  • Deleting outdated or poor-quality examples

In development mode, a button appears after AI classification showing which examples were used, helping you understand and improve the example corpus.

If you want to seed examples from JSON rather than the UI, use:

vibe review examples import examples.json