Skip to content

Review Module Architecture

Architectural reference for VIBE's document compliance review system. Optimized for LLM consumption.

See also:

  • core.md - Core VIBE engine
  • components.md - Component system details
  • assistant.md - AI-assisted interview system
  • parsing-pipeline.md - Document parsing pipeline (4-layer architecture)

1. SYSTEM OVERVIEW

The Review module enables AI-assisted document compliance review against structured requirement frameworks. Unlike VIBE's core interview mode (which generates documents), Review mode analyzes uploaded documents against predefined requirements (e.g., DORA ICT contract compliance).

1.1 Core Philosophy

  • A review template consists of requirements plus a reporting template (template.md/template.docx)
  • Requirements use VIBE Core patterns: only relevant requirements need assessment (determined by template probing)
  • Review is a template capability (interview_mode: review), not a standalone module
  • Entry point: /interview/<template_id>/ redirects to /review/<template_id>/sessions

1.2 Key Capabilities

  • Document Processing: Multi-document upload (Markdown, DOCX, PDF); documents segmented into parts with hierarchy and stable IDs (content-hash based)
  • Hybrid Search: BM25 keyword + semantic embeddings retrieve candidate parts per requirement
  • Two-Stage LLM Classification: Relevance filtering, then compliance evaluation (YES/NO/PARTIAL)
  • Human-in-the-Loop: AI suggests classifications; humans verify/override; verified reviews promote to few-shot examples
  • Output: Excel export (any state) and template-based report generation (finished reviews)

2. DATA MODEL

2.1 Database Models

Persistence layout (split):

  • vibe/review/db_models.py — SQLAlchemy ORM models. Single source of truth for the Review schema; Alembic introspects Base.metadata from this module. Kept import-light so migrations/env.py doesn't pull in business logic.
  • vibe/review/models.py — In-memory dataclasses, enums (ClassificationResult, DocumentStatus), and view-model types (RequirementTile, TemplateSessionRow, MatchedPart, etc.) used by the service layer and templates.
Model Purpose
ReviewSessionModel Session linking template + documents (status: pending → processing → ready)
DocumentModel Uploaded document metadata and status
DocumentPartModel Document segments with embeddings, stable IDs, hierarchy metadata
RequirementReviewModel Classification result per requirement (result, confidence, reasoning, matched parts)
QuestionReviewModel AI/human answers for template questions
ExampleModel Few-shot examples for LLM classification
RequirementCacheModel Cached requirement definitions with embeddings
ReferenceSourceModel Regulatory source documents (DORA, EBA, etc.)
ReferencePartModel Segments of regulatory sources

Schema migrations: Alembic versions live under vibe/review/migrations/ (alembic.ini, env.py, versions/). The schema is generated from db_models.Base.metadata; BM25 indexes (_create_bm25_indexes in database.py) are application-managed and filtered out of autogenerate via env.py::include_object. Programmatic invocation lives in vibe/review/db_migrate.py (upgrade_to_head, current_revision, check_pending); whole-database export/import wrappers around pg_dump/pg_restore/psql live in vibe/review/db_dump.py.

Classification Results: YES, NO, PARTIAL, NOT_APPLICABLE, PENDING

AI Snapshot & Curation Columns (on both RequirementReviewModel and QuestionReviewModel):

Column Purpose
ai_original_primary_part_ids Frozen baseline copy of primary evidence from the latest AI run
ai_original_supporting_part_ids Frozen baseline copy of supporting evidence from the latest AI run
ai_original_verdict Frozen baseline verdict (requirements) / answer (questions) from the latest AI run
diverged_at Timestamp stamped the first time a reviewer edit moves away from the snapshot; cleared by "reset all"
is_parts_curated True once the reviewer adds/removes/promotes/demotes a part; makes AI re-runs skip stage-1 relevance
assessment_part_ordinals {part_id_str: ordinal} persisted from the stage-2 prompt so the UI can redisplay "DEL N" / "EXCERPT N"
MatchedPart.is_primary First-class flag — multiple primaries are allowed, reviewers promote/demote, UI shows mixed primary lists

RequirementTile and TemplateSessionRow (dataclasses in vibe/review/models.py) aggregate these fields for the per-template session dashboard. SessionListRow is the parallel view-model for the global review index — it carries session, template_info, and progress only (no per-requirement tiles).

2.2 Requirements Definition

Location: vibe/review/requirements.py

Requirements defined in template config.yml under unified groups format with context questions and requirements co-located:

groups:
  audit_rights:
    title: Audit Rights
    questions:
      critical_service: { type: bool, label: Critical or Important Function }
    requirements:
      D5-1:
        label: Unlimited audit rights
        description: Full access to inspect ICT service provider

Requirements can include optional retrieval-tuning fields:

requirements:
  D5-1:
    label: Unlimited audit rights
    description: Full access to inspect ICT service provider
    keywords:
      en: "audit rights inspection access unlimited"
      sv: "revisionsrättigheter inspektion åtkomst obegränsad"
    example:
      en: "The client shall have unlimited rights to audit and inspect..."
      sv: "Klienten ska ha obegränsade rättigheter att granska..."
  • keywords -- Language-keyed BM25 search terms (curated for keyword recall)
  • example -- Language-keyed idealized contract clause (optimized for vector similarity)

Template-Driven Applicability: {% if critical_service %}{{ req('D5-1') }}{% endif %}

Purpose-Specific Query Builders (Requirement dataclass methods):

Method Used By Content
build_search_query(language) BM25 search Curated keywords for language, else label + desc
build_vector_query(language) Embedding sim. Idealized example for language, else description
build_rerank_query() Cross-encoder label + description + reference (no help text)
build_query_text() LLM prompts label + description + reference + help (everything)

Key Classes: RequirementLoader, RequirementSet, Requirement, probe_template_for_requirements()

Template Probing — "template is truth": vibe/review/template_functions.py exposes two probe entry points. probe_template_for_requirements() is the thin wrapper that returns just the discovered req() ids; the underlying probe_template() returns a ProbeResult(requirement_ids, accessed_keys) dataclass. accessed_keys is the set of top-level Jinja variable names looked up while rendering — captured via a _TrackingContext subclass of jinja2.runtime.Context whose resolve_or_missing records every lookup. The Review UI computes question relevance as accessed_keys ∩ template_question_ids: a question is relevant iff the report template touches its variable in an active conditional branch. Probing uses ReviewProbeUndefined (a jinja2.Undefined subclass whose comparisons all return False) so unanswered questions don't accidentally satisfy comparison guards and unlock requirements early.

Computable resolution before probing. Before probing, ReviewService._evaluate_computables (vibe/review/services/review_service.py) resolves every type: computable question in the template and merges the derived values into the rendering context. Two preconditions matter: (1) every unanswered non-computable template question is pre-populated as None so a compute expression like leverantor != False evaluates to a real Python value (None != FalseTrue) instead of raising ComputableProbeError, and (2) resolution is multi-pass — a computable can depend on the result of an earlier computable; the loop stops when a pass yields no progress. Without this step, the review-side probe used to skip computable resolution for unanswered dependencies, and any req() gated on a computable would silently disappear until every dependency had been answered. The same helper runs from build_template_context before rendering the final compliance report so gates like {% if first_gate %} fire correctly there too. The full interview orchestrator already resolves computables; this is the review-side parity step.

3. DOCUMENT INGESTION

Location: vibe/review/ingestion.py, vibe/review/document_sources.py

Two-Phase Process:

  1. Upload (create_session) -- Store documents in filestore, create session with status=pending
  2. Ingest (stream_session_ingestion) -- Parse documents, compute embeddings via SSE stream

Document Sources: The DocumentSource protocol (document_sources.py) provides per-format abstraction with MarkdownSource, PdfSource, DocxSource implementations. Factory: create_document_source().

Ingestion Pipeline:

1. Detect content type → create DocumentSource
2. Route to Parsing Pipeline (see parsing-pipeline.md)
3. Consolidate SemanticUnits for RAG-friendly chunks
4. Convert SemanticUnits to DocumentPartModels
5. Detect language (for multilingual embedding/prompts)
6. Generate embeddings (batch)
7. Store parts with metadata (hierarchy, bounding boxes)

Chunk Consolidation (_consolidate_units_for_chunking in ingestion.py): Parsed units are trimmed into RAG-sized chunks before becoming parts. A deep unit (level > max_level=3) normally merges into the nearest preceding shallow unit unless it is long (≥min_child_length=200 chars) or semantically distinct (PartType.DEFINITION). PartType.LIST_ITEM is a special case: list items always merge into their parent regardless of depth, length, or exemption — individual bullets lack the surrounding context an LLM needs to assess a requirement, so the list-as-a-whole is the useful retrieval target. Merged source block ids are preserved on the parent for provenance.

Heading-Prefixed Embedding Text: The embedding provider sees f"{title}\n\n{content}" for body parts and bare content for heading-type parts (PartType.HEADING, ANNEX_HEADING). See _embedding_text in ingestion.py. The same canonical form is built by vibe/review/document_text.py::part_text so the reranker scores the same semantic unit the embedder encoded. Changing the composition rule (or the BM25 index shape — see §5.1) invalidates stored embeddings; benchmark runs re-ingest every time, workbench sessions require a re-ingest.

Filestore: (vibe/review/filestore.py) Content-addressed binary storage. store_bytes(data, suffix) → (sha256, path). Documents reference binaries via doc_metadata.filestore.sha256.

Outline Anchors (vibe/review/anchors.py::compute_anchors): Before consolidation, DocumentIngester._process_extraction_result calls compute_anchors(units) over the full pre-consolidation SemanticUnit tree and assigns every resulting anchor to DocumentPartModel.part_anchor. An anchor is a slash-separated outline path derived by walking the parser's parent_id chain up to the document root — e.g. 13/13.1 for clause 13.1 in the main agreement, or BILAGA 1/2/2.2/2.2.4 for clause 2.2.4 inside Bilaga 1. ANNEX_HEADING (and numbered annex/appendix/schedule units) reset the scope so an annex's 2.2.4 never collides with a main-agreement 2.2.4. Numbered units use their number as the segment; unnumbered headings / definitions / paragraphs fall back to typed labels (h1:…, def:…, p1). Anchors are computed over the pre-consolidation tree, so merging deep units into parents during chunking (§3) cannot reshape them. Unlike the content-hashed stable part_id, anchors survive chunker churn — if the source document's clause numbering is stable, the anchor is stable. This is what makes the benchmark's anchor-based matcher (§5.6) robust to retrieval-layer reshuffling. The parser invariant that parent_id resolves within the unit list is enforced — a dangling parent raises ValueError.

4. CONFIGURATION

The Review module resolves configuration from the app_config mapping passed to ReviewService and ReviewProviderFactory:

  • Database: DATABASE_URL from app config or environment
  • OCR: review_backends.ocr in app config (backend, dpi, text_layer_min_chars)
  • DOCX Conversion: review_backends.docx in app config (backend)
  • LLM/Embedding/Rerank: Resolved via ReviewProviderFactory from template-level config
  • Caching: OCR, PDF, and DOCX render caches via helper functions

Schema Migrations on Startup: FlaskDBExtension.init_app (vibe/review/database.py) calls db_migrate.upgrade_to_head against the configured database before the app starts serving requests, so any schema changes shipped with the running code are applied automatically. The call is idempotent (no-op when the DB is already at head) and handles three start states: fresh DB (run baseline), stamped DB (forward-only upgrade), and pre-Alembic DB (apply additive repairs and stamp at head). Operators who want to gate migrations behind a separate step can run vibe review db migrate manually before bringing the app up. CI uses db_migrate.check_pending as a gate to fail builds that touch db_models.py without a matching migration.

5. CLASSIFICATION PIPELINE

5.1 Retrieval

Location: vibe/review/retrieval/, vibe/review/hybrid_search.py

iter_retrieve_parts() yields intermediate RetrievalStep objects so callers can emit progress between stages:

1. Build split queries from requirement:
   - bm25_query  = requirement.build_search_query(language)   # curated keywords
   - vector_query = requirement.build_vector_query(language)   # idealized example clause
   - rerank_query = requirement.build_rerank_query()           # label + desc + ref
2. Partition the requested document_ids by language.
3. For each language partition: one BM25 + Embedding similarity search across
   all docs in that partition (HybridSearcher with configurable fusion;
   bm25_query/vector_query override defaults).
4. Concatenate per-language results, sort by fusion score, cap at search_limit.
5. Filter short candidates (configurable min_candidate_length).
6. Cross-encoder reranking (PartReranker) using rerank_query.

HybridSearcher.search() accepts optional bm25_query and vector_query parameters to override the default query for their respective sub-searches. When omitted, both fall back to query.

The end-to-end flow is rendered in retrieval-pipeline.mmd.

Cross-Document Search (Per Language): Retrieval runs one hybrid search per language partition, not per document. Running BM25 globally over the bundle means IDF is computed across the full corpus and ranks reflect bundle-level relevance — the DPA's strongly-relevant clause competes against the SOW's weakly-relevant clause on one ranked list instead of each contributing a pro-forma "rank 1". Language partitioning is still required because BM25's analyzer is language-specific; each document's own DocumentModel.language wins over the caller-provided fallback. Monolingual bundles collapse to a single search call.

Fusion Mode: Two fusion strategies are configurable via review.fusion_mode in config.yml:

  • rrf (default) — Reciprocal Rank Fusion. Rank-based and scale-invariant; review.rrf_k controls dampening. Our candidate pools are small (tens of hits) so the classical k=60 over-flattens differences; default k=10 gives meaningful separation between rank 1 and rank 10.
  • minmax — Per-ranker min-max normalized weighted sum of raw BM25 and cosine-similarity scores. Preserves the confidence gap that RRF throws away (BM25's 37 vs runner-up 7 stays meaningfully ahead of the pack). Useful when one ranker is much more confident than the other; rrf_k is ignored in this mode.

Heading-Aware Retrieval: A document part has a body (content) and a section heading (title). The retrieval pipeline treats these as one semantic unit so bodies that are identical across sections (e.g. "No deviations from standard terms" under differing headings) stay distinguishable:

  • Embeddings are computed over f"{title}\n\n{content}" at ingest time (see _embedding_text in ingestion.py). Heading-type parts are embedded as-is to avoid duplication.
  • BM25 index covers both content and title. The multi-field index idx_document_parts_bm25 USING bm25 (id, content, title) lets DocumentPartStrategy.bm25_search filter with (content @@@ q) OR (title @@@ q); pdb.score(id) returns a single combined BM25 score. A token appearing only in the heading still contributes to relevance.
  • Reranker sees the same heading-prefixed text as the embedder (_rerank_document_text in reranker.py). The returned RankedPart.part_text stays body-only so downstream consumers (classifier, LLM prompts, UI) keep their existing contract, with section_heading carried alongside.

The canonical "{heading}\n\n{body}" composition lives in vibe/review/document_text.pycombine() produces the HeadingAugmentedText NewType, and part_text() wraps it for DocumentPartModels. Every ingestion/ranking/benchmark-matching call site goes through those helpers so the heading-aware shape can't silently diverge across subsystems.

Both write and read sides apply the same 2000-character cap via truncate_at_whitespace: ingestion uses EMBEDDING_TEXT_CHAR_CAP in ingestion.py, the reranker uses RERANK_DOCUMENT_CHAR_CAP in reranker.py. The two caps must move together — an asymmetry would truncate one provider's input but not the other's.

Changing embedding input or the BM25 index definition invalidates data stored with the prior rules. Benchmark runs re-ingest every time and self-heal; workbench sessions require a re-ingest to benefit.

Corpus-Aware Rerank Scaling: scale_top_n(total_chunks, base=10) provides logarithmic scaling of rerank top_n based on corpus size (50 chunks -> 10, 500 -> 20, 5000 -> 30). Used by retrieve_parts() when no explicit limit is given.

retrieve_parts() is a convenience wrapper that consumes the iterator and returns the final result.

5.2 Two-Stage LLM Classification

Location: vibe/review/classifier.py::RequirementClassifier, vibe/review/question_answerer.py::QuestionAnswerer

Requirements and questions share the same two-stage pipeline:

  • Stage 1 — Batch relevance filter. One LLM call keyed by PART_ID returns R/N per candidate (+ confidence, short reasoning). The shared Jinja template (relevance_batch_user_{sv,en}.jinja2) is parameterised by item_kind so the same prompt serves both assessment kinds. Parts marked N are dropped.
  • Stage 2 — Aggregate verdict. One LLM call over just the relevant parts returns a single verdict.
    • For requirements: {compliance, confidence, reasoning, primary_evidence, supporting_evidence} where primary_evidence usually contains one PART_ID — the excerpt that most clearly determines the verdict — and supporting_evidence lists additional excerpts that reinforce it. A PART_ID appears in exactly one list.
    • For questions: {answer, confidence, reasoning, primary_evidence, supporting_evidence, needs_user_input} with the same primary/supporting split.

SSE progress emits two distinct events between retrieval and persistence: stage="assessing_relevance" with the candidate count, then stage="assessing_compliance" with the relevant-after-stage-1 count.

Persistence: RequirementReviewModel and QuestionReviewModel store primary_part_ids and supporting_part_ids as separate JSON columns. all_part_ids (primary first, then supporting) is what the workbench "Matched Sections" list displays.

5.3 Batch Classification

Location: vibe/review/services/review_service.py::ReviewService.stream_assessment

Orchestrates batch assessment over SSE: resolve the applicable targets from the template/session state, then for each item retrieve parts, classify with AssessmentClassifier, persist the result, and emit progress events to the workbench.

Parallel execution: Per-item assessments run on a ThreadPoolExecutor so independent requirements/questions don't queue behind each other on the LLM. Concurrency comes from template_config.review.assessment_concurrency (default 4); single-item interactive flows force concurrency to 1 to avoid spinning up a pool that's not needed. Worker submissions are staggered by ~250 ms so the first batch of workers doesn't all hit the slow stage-1 LLM call simultaneously — by the time the last worker fires, the first worker's response is already arriving (better pipeline overlap and a smoother fill on the workbench progress grid).

Per-worker DB session: SQLAlchemy Session instances are not thread-safe. Each worker opens its own session via database.get_db_session(), instantiates a worker-local ReviewService, and re-fetches the ReviewSessionModel by id. Workers also re-push the captured Flask app context (current_app._get_current_object()) — the QuestionAnswerer, classifier prompts, and the gettext-translated progress messages all reach for current_app / flask_babel.gettext / url_for, which raise RuntimeError("Working outside of application context") without it.

SSE event ordering: Workers drain progress events into a shared Queue; the main thread emits them as SSE in arrival order (not target order). Each worker emits an item_started marker, a stream of progress events (one per pipeline stage), and a terminating done or error. Per-item exceptions are routed through the queue so other in-flight items keep running (graceful degradation per item; the run as a whole succeeds with item_error events interleaved). HTML rendering of completed results stays on the request thread — render_html_callback uses render_template, which needs the active Flask request context that workers don't share — so workers always pass render_html_callback=None and the main-thread queue drain calls the callback when a result arrives.

5.4 Unified Assessment

Location: vibe/review/assessment.py

AssessmentClassifier unifies question-answering and requirement-classification pipelines. AssessmentItem and AssessmentResult provide a common interface for both assessment types.

5.5 AI Snapshot & Evidence Curation

Location: vibe/review/services/review_service.py (add_matched_part, promote_matched_part, demote_matched_part, reset_matched_parts_to_ai, reset_all_to_ai), vibe/review/web/routes.py

Every AI run writes a frozen baseline snapshot of the verdict and the primary/supporting evidence to the ai_original_* columns on RequirementReviewModel / QuestionReviewModel (§2.1). The snapshot lets reviewer-initiated evidence curation coexist with AI re-runs:

  • First reviewer edit after an AI baseline exists stamps diverged_at with the current timestamp. save_human_classification compares the new verdict against ai_original_verdict; curation actions (add_matched_part, promote, demote) stamp on any divergence from the stored ai_original_* lists. Matching the snapshot leaves diverged_at cleared.
  • Evidence curation is multi-primary. add_matched_part(..., is_primary=True) appends rather than replacing — a single requirement can cite several primary clauses. promote_matched_part moves one support into primary without demoting the others; demote_matched_part is the mirror. Reviewers can also add entirely new parts to the matched list.
  • is_parts_curated=True gets set the first time a reviewer mutates the evidence list. It gates the AI re-run shortcut described in §5.1.
  • Reset paths. reset_matched_parts_to_ai(requirement_id) restores the primary/supporting lists from the snapshot while leaving the verdict untouched. reset_all_to_ai(requirement_id) also clears human_override, is_human_verified, verified_at, and diverged_at — undoing the reviewer's disagreement entirely.
  • Endpoint shape. The curation routes respond with HX-Trigger headers so the reactive UI re-renders the tile, sidebar, and detail pane without a full navigation.
  • Questions share the same code path. add_matched_part(item_type="question", ...) creates a QuestionReviewModel with is_ai_suggested=False if none exists — manual curation before AI has run must not let analytics claim AI involvement.

Question-side verdict divergence. vibe/review/web/routes.py::_question_divergence is the question equivalent of the requirement-side ai_original_verdict comparison. It only reports divergence when the QuestionReviewModel is AI-suggested and human-verified and human_override is not None and the normalized human value differs from the normalized AI answer; otherwise the question is treated as agreement. Normalization goes through _normalize_answer_for_comparison, which coerces bool-shaped strings (yes/no/ja/nej/true/false/1/0, case-insensitive, whitespace-tolerant) to Python True/False/None and leaves other strings alone. This matters because AI suggestions are persisted as strings (e.g. "yes") while bool-form handlers store Python True; without the coercion, a user accepting the AI's verdict would falsely register as a divergence and stamp diverged_at.

Curated-parts fast path in the classifier: When is_parts_curated=True and the matched-parts list is non-empty, the AI re-run constructs an AssessmentClassifier(skip_relevance_check=True) and classifies directly over the curated set. Stage-1 (relevance filter) is skipped — the reviewer has already decided which parts matter, so filtering would second-guess the curation and burn an extra LLM call. Stage-2 (aggregate verdict) still runs. When the curated list is empty, the full retrieval pipeline runs.

5.6 Few-Shot Examples

Location: vibe/review/retrieval/example_retriever.py

ExampleRetriever finds similar examples from database. Users can promote matched document parts to examples from the workbench, creating a feedback loop where human-verified classifications become training data.

5.7 Benchmark Harness

The benchmark harness (tools/vibe_dev/benchmark.py, CLI vibe-dev review eval benchmark) is a developer tool, not part of the runtime Review extension. It lives under tools/vibe_dev/ and evaluates retrieval + assessment quality against curated *.golden-assessment.yml records.

Reference: doc/developer/review-benchmark.md covers the golden-assessment schema, stage selection, scoring semantics (compliance / evidence / combined), match categories, multi-model evaluation, and the full CLI surface.

Points the runtime engine should be aware of:

  • Benchmark runs ingest documents through the same parsing pipeline the workbench uses (setup_session dispatches to ingest_markdown / ingest_pdf_file / ingest_docx_file), so re-ingestion is lossless.
  • Retrieval-stage matching uses combine(heading, body) from vibe/review/document_text.py on both sides before fuzzy comparison — mirrors the heading-aware form embeddings and the reranker see (§5.1). Any change to that composition rule invalidates stored embeddings and goldens alike.
  • Benchmark runs bind batch_run_id, session_id, template_id, requirement_id, assessment_mode, and (in multi-model runs) evaluation_model via review_log_context, which the structured-logging summary tools (vibe-dev review logs summarize) consume.

Golden-Assessment Matching (Anchors + Fuzzy Text): Golden evidence parts carry an optional anchor: selector (a single anchor or list — e.g. 13/13.1, BILAGA 1/2/2.2/2.2.4). The matcher in tools/vibe_dev/benchmark.py (match_retrieved_to_golden) resolves by outline path first and falls back to fuzzy text similarity only when no anchor is set or no anchor hits. Anchor path equality is EXACT; prefix equality in either direction (retrieved is a descendant of the golden anchor, or golden is a descendant of the retrieved chunk that absorbed it) is BROAD. Anchor hits are scored above every fuzzy hit during greedy assignment, so a correct anchor never loses a retrieved part to a noisier text-based claim. The MatcherMode enum selects which signals count: BOTH (default — anchors preferred, fuzzy fallback), ANCHOR (anchors only; unresolved goldens become MISS), TEXT (fuzzy only; ignores part_anchor). The anchor side depends on DocumentPartModel.part_anchor populated at ingest (§3); the text side depends on combine(heading, body) being symmetrical between ingestion and matcher.

6. PROMPT SYSTEM

Location: vibe/review/templates/prompts/

Naming Convention: {type}_{role}_{lang}.jinja2

  • Types: relevance_batch (stage 1, shared by requirements and questions), compliance_aggregate (stage 2, requirements), question_answer (stage 2, questions)
  • Roles: system, user
  • Languages: en, sv. Other locales fall back to English until translations are added.
  • Partials prefixed with _ (e.g., _requirement_info_en.jinja2)

Prompt Building: vibe/review/prompts.py::PromptBuilder constructs prompts with requirement details, document context, and few-shot examples. The JSON shape of the response is not described in the prompt — it is enforced at the API level via response_format.json_schema (see "Structured Output" below). User prompts only carry semantic guidance (what Y/N/P mean, when to set needs_user_input, how primary vs supporting evidence ids work).

Structured Output: All structured-output requests are issued via LLMProvider.generate_structured, which sends OpenAI's response_format: {"type": "json_schema", "json_schema": {"name": <schema_name>, "schema": <schema>, "strict": false}} envelope (mirrored on the wire by LLMProvider.describe_structured_request and the per-provider _generate_structured_once implementations). This is the form Berget and other vLLM-backed endpoints accept, and is what Claude's JSON-Schema guided decoding expects. strict=false is used deliberately: vLLM-backed endpoints constrain generation with the schema regardless of the flag, while OpenAI's own strict mode imposes contract requirements (every property in required, additionalProperties:false everywhere, limited keyword set) that not every schema in this module chooses to satisfy. The schema_name argument is the pipeline stage label (requirement_relevance_batch / requirement_compliance_aggregate / question_relevance_batch / question_answer_aggregate) so server-side logs can correlate a constraint back to its caller.

Stage-1 schema shape: RELEVANCE_BATCH_SCHEMA is a homogeneous-array envelope: {"verdicts": [{"id": <int>, "relevance": "R"|"N", "confidence": "H"|"M"|"L", "reasoning": "..."}, …]}. Each verdict's id matches the 1-based ordinal on the corresponding <document_part id="N"> tag. Parsers in classifier.py and question_answerer.py index the list by id and walk candidates in order to translate back to stable part_ids; missing, duplicate, or non-integer ids raise a ClassifierResponseError / QuestionAnswererError. The homogeneous-array shape is strict-mode-compatible on any JSON-Schema endpoint (OpenAI's strict mode included); the earlier ordinal-keyed-object shape required additionalProperties: <subschema>, which only vLLM permits.

Document-Part Identifier for the LLM: Each candidate / relevant part is rendered to the LLM as <document_part id="N">...</document_part> where N is a 1-based ordinal scoped to the prompt. The opaque stable part_id (DocumentPartModel.id) and the content-addressable part_id string (cel-...) are never shown to the model — prior versions included a PART_ID: line in addition to an EXCERPT N: marker, and the model occasionally confused the two. Stage-1 returns a JSON object keyed by the ordinal string; stage-2 primary_evidence / supporting_evidence are arrays of integer ordinals. The classifier and question-answerer parsers translate ordinals back to stable part_ids via an ordinal_to_part_id map before persistence, so primary_part_ids / supporting_part_ids continue to carry the stable form.

Ordinals have two lifetimes: a transient ordinal_map built inside ReviewService while a prompt is being assembled (keeps the stage-1 verdict, the stage-2 aggregate, and the UI labels consistent within a single run), and a persisted assessment_part_ordinals column (§2.1) on RequirementReviewModel / QuestionReviewModel that freezes the mapping used when the AI result was stored. The persisted form is what the workbench consults later to redisplay "DEL N" / "EXCERPT N" markers — long after the transient map is gone — so reviewer-time evidence displays still match the ordinals the LLM reasoned about.

Prompt-Injection Mitigation: Document-originated content — part text and section headings — is untrusted. It is wrapped in <document_part id="N">...</document_part> delimiter tags in every user prompt, and the PromptBuilder Jinja2 Environment has autoescape=True so that a document attempting to close the wrapper (</document_part>) renders as escaped entities and cannot break out of the tag. Each system prompt begins with a "SECURITY — UNTRUSTED DOCUMENT CONTENT" section instructing the model that content inside <document_part> is data to analyse, not commands to follow; the user prompts intentionally do not repeat the framing. Trusted fields (requirement label/description/help, reference text, question framing, reviewer-curated few-shot example excerpts) come from the template author's config.yml or the reviewer's own promotion action and render plainly without the wrapper. Defence-in-depth relies on three layers together: (1) autoescape prevents wrapper break-out; (2) the delimiter + system-prompt framing tells the model how to interpret the content; (3) humans verify every verdict before it takes effect.

LLM Client: Review uses the same provider abstraction as the assistant — vibe.providers.llm.base.LLMProvider — but only via the structured-output single-shot surface. Classifier and question-answerer call LLMProvider.generate_structured(system_prompt, user_prompt, schema, schema_name=...), which wraps the per-attempt method _generate_structured_once in a retry loop that handles transient failures, truncation-aware max_tokens doubling (capped at 32 768), and a free-form → schema-constrained fallback on parse failures. Failure modes are surfaced as StructuredOutputError / StructuredOutputTruncatedError / StructuredOutputParseError (see llm-providers.md §2). See vibe/providers/llm/base.py for the full structured-output API.

Mock LLM provider for tests: vibe/review/services/mock_llm_provider.py::MockReviewLLMProvider is an LLMProvider subclass that overrides _generate_structured_once with deterministic per-schema_name payloads (requirement_relevance_batch, question_relevance_batch, requirement_compliance_aggregate, question_answer_aggregate). Streaming/tool-calling abstract methods raise NotImplementedError because Review never calls them. Tests can preset specific stage-2 responses via set_question_response(question_id, ...) / set_response(system_prompt, user_prompt, ...), inspect the calls list to assert on the prompts the pipeline produced, and configure module-level fallbacks via configure_mock_llm / set_mock_response (matching the previous globals so existing test fixtures don't have to change shape). The mock binds preset evidence ids onto the real candidate ordinals from the prompt, so tests can write evidence assertions without knowing which <document_part id="N"> the retrieval layer happened to choose.

7. SERVICE LAYER

7.1 ReviewService

Location: vibe/review/services/review_service.py::ReviewService

Facade providing stable interface for web routes. Instantiated per-request with db_session, template_provider, and app_config: Mapping[str, Any] | None (decoupled from Flask globals).

Responsibility Areas:

  • Session lifecycle: list_sessions(limit), get_session, create_session, delete_session, stream_session_ingestion
  • Requirements: get_template_requirement_set, load_session_requirements, get_requirement_groups, get_requirement_tiles, get_classification_stats, _evaluate_computables (resolves type: computable answers and pre-populates unanswered template questions with None before requirement probing and report rendering — see §2.2)
  • Classification: stream_assessment, save_human_classification
  • Questions: get_template_questions, get_question_to_group, get_questions_for_group, get_relevant_question_ids, suggest_question_answer, save_human_question_answer
  • Navigation: get_panel_navigation, get_panel_navigation_stream, get_assessment_navigation -- drive the workbench's panel and per-item nav (current/prev/next, redirect when an item drops out of the stream).
  • Documents: get_matched_parts, prepare_documents_for_display, render_document_html, open_document_binary_for_download, export_results_xlsx
  • Examples: list_examples, get_example, update_example, delete_example
  • Reports: render_report, build_template_context

get_relevant_question_ids is load-bearing for the "template is truth" contract on the question side: the workbench filters group panels and the sidebar to just the questions whose ids appear in accessed_keys ∩ template_question_ids for the current probe. Questions hidden by report-template conditionals never reach the user.

Composed Services: ReviewAnalyticsService (vibe/review/services/analytics_service.py) -- accuracy calculation from human override patterns.

7.2 Provider Factory

Location: vibe/review/services/provider_factory.py::ReviewProviderFactory

Creates embedding, reranking, and LLM providers based on template config. OCR backend creation happens in ReviewService._create_cached_ocr_backend().

7.3 Progress Reporting

Location: vibe/review/progress.py

BaseProgress base class for all SSE progress updates in long-running operations (ingestion, classification).

8. WEB INTERFACE

Location: vibe/review/web/routes.py (Blueprint under /review/)

Workbench Layout: A global top bar (gbar) sits above a four-pane grid (workbench.html). The bar is fixed-height; the panes fill the remaining viewport.

Global top bar (gbar) — single unified header for the workbench:

  • Brand cluster on the left (back-to-sessions, VIBE logo).
  • Session crumb + name + edit-settings pencil that opens the <dialog id="session-settings-dialog"> (rename session, reorder/rename/remove documents, delete session).
  • Combined progress + "Ask AI about all" button (.gbar-progress.run-assessment-btn) carrying data-stream-mode="all_requirements" and data-assessment-concurrency (sourced from template_config.review.assessment_concurrency, §5.3). When AI is running, the button is replaced (via x-show="runningAssessment") by an AI-running strip with a spinner, ETA, cancel button, and a per-item progress dot grid (#assessment-progress-grid) — one dot per assessment, color-coded by pipeline stage (searching → reranking → assessing_relevance → assessing_compliance → complete) so the parallelism described in §5.3 is visible.
  • Download dropdown (partials/download_dropdown.html) — entries are "Preview draft report" (always enabled), per-format "Download Report" links (HTML/PDF/DOCX, gated on is_complete = progress.percentage == 100; disabled placeholders show a tooltip until then), and "Export Excel (.xlsx)" (always enabled).
  • User-initials chip on the far right.

OOB-swap pattern from progress_bar.html: Saving any verdict triggers progress_bar.html to render with three hx-swap-oob targets: the gbar's progress count and fill (so the counter ticks up without a reload), the assessment sidebar (so the per-group state updates), and the download dropdown (so disabled placeholders flip to active links the moment the final requirement is verified). This is the load-bearing pattern that lets the gbar stay in sync with reviewer activity in any pane.

Four-pane grid (left → right; comment "============ 4-PANE GRID ============" in workbench.html):

  1. pane--rail pane--left — Assessment-steps sidebar. Items are grouped by requirement group. Each group shows a single grouped-questions panel entry (the "context questions" that gate the group's requirements) followed by the group's requirements. Top-level questions defined outside any named group fall into a synthetic bucket rendered as "Context Questions" and identified by the sentinel UNGROUPED_PANEL_ID = "__ungrouped__" (defined in vibe/review/services/__init__.py, exported from services.review_service).
  2. pane--action (center) — Current assessment panel. One of two shapes depending on the selected sidebar item:
    • Single assessment item — classification form (for a requirement) or one question widget (for a single-question item).
    • Grouped-questions panel — multi-question detail panel rendered by _render_group_questions_panel_html (vibe/review/web/routes.py). Filters the group's members through get_relevant_question_ids so questions hidden by template conditionals don't appear. Synthetic group id __ungrouped__ selects the "Context Questions" bucket; resolved back to group_id=None for downstream lookups.
  3. pane--read (right of action) — Document viewer with a tab strip across the top (one tab per uploaded document, plus an "All" overflow menu when tabs overflow). The pane has two modes, switched by the currentView state on the reviewWorkbench Alpine component (vibe/review/static/review.js):
    • currentView === 'document' (default) — renders the parsed document via prepare_documents_for_display on ReviewService, with data-part-id attributes (and optional highlight) for every document the panel needs. PDFs and rendered DOCX use a rasterized page layout with a positioned text overlay; HTML/markdown documents render their parsed_html inline. A floating doc-selection toolbar lets reviewers attach selected text as primary/support evidence.
    • currentView === 'draft-report' — the same #document-viewer element is replaced by an HTMX fetch of data-draft-report-url (the review.report_preview route, /review/<template_id>/<session_id>/preview). A .draft-report-header is shown via x-show. While this mode is active, every assessment-saved / context-saved event re-fetches the preview so reviewer edits are reflected without switching tabs. When the final requirement transitions to complete, review-completed auto-switches into this mode once per session (tracked in sessionStorage).
  4. pane--rail pane--right — Document parts TOC. Renders the document_parts_tree for the active document (used for jump-navigation inside the viewer). An info icon in the pane header toggles a .doc-info panel that displays metadata for the active document (name, original filename, page count, identified-parts count, upload time, author/title/creator, language).

htmx Interactions: Click sidebar item → loads detail panel into pane--action; "Run AI" → classification via SSE + OOB swap; save → updates sidebar row + gbar counter + download dropdown via the OOB-swap pattern above; document part links → scroll viewer. Saves into a grouped-questions panel re-render the same panel; saves into a single-question or single-requirement panel return that item's partial.

Draft-report preview workflow: "Preview draft report" is the first entry in the gbar download dropdown (download_dropdown.html) and is always enabled, regardless of completion state — reviewers can preview the in-progress report at any time. Clicking it calls selectDraftReport() in review.js, which sets currentView = 'draft-report', fetches the preview into #document-viewer, and dispatches close-download-menu so the dropdown collapses (CSP-safe Alpine doesn't allow multi-statement @click handlers). The "Download Report" entries below the divider stay disabled until is_complete, so the dropdown cleanly separates "look at draft now" from "export the final artifact" — both wired to routes documented in §9 (/preview for the inline preview; /download for the final artifact downloads).

Session Dashboard (Template Sessions view): template_sessions.html renders a tile-based overview of every session under a template. Each row is a TemplateSessionRow (vibe/review/models.py), built from aggregating per-session RequirementTile records — one tile per requirement, carrying classification, is_ai_suggested, is_human_verified, and has_divergence (derived from diverged_at). Tiles drive the compact status strip and the AI-baseline indicators next to each session. The global review index (/review/) uses the parallel SessionListRow view-model, which carries session, template_info, and progress without per-requirement tiles.

Templates: 12 top-level templates + 23 partials in vibe/review/templates/review/ (the workbench rework added partials/download_dropdown.html so the gbar dropdown can be OOB-swapped from progress_bar.html).

9. ENTRY FLOW & URL STRUCTURE

Interview Mode Extension: vibe/review/interview_mode.py::ReviewModeExtension intercepts interview_mode: review templates and redirects to review UI.

URL Pattern: All routes include <template_id> for template-scoped review. Routes are defined in vibe/review/web/routes.py:

# Session management
/review/                                                  # Global index (SessionListRow)
/review/<template_id>/sessions                           # Sessions list (TemplateSessionRow)
/review/<template_id>/sessions/<session_id>/delete       # POST - delete session
/review/<template_id>/new                                # GET/POST - upload form / create session

# Ingestion (split: page vs SSE stream)
/review/<template_id>/<session_id>/ingest                # GET - ingestion progress page
/review/<template_id>/<session_id>/ingest-stream         # GET - SSE stream for ingestion
/review/<template_id>/<session_id>/start                 # POST - finalize labels/order, redirect to workbench

# Workbench
/review/<template_id>/<session_id>                       # Workbench
/review/<template_id>/<session_id>/progress              # Progress bar + sidebar OOB
/review/<template_id>/<session_id>/assessment-sidebar    # Sidebar partial
/review/<template_id>/<session_id>/requirements          # Requirements sidebar partial

# Grouped questions panel
/review/<template_id>/<session_id>/groups/<group_id>/questions  # Grouped-questions panel ("__ungrouped__" → Context Questions)

# Single-item detail panels
/review/<template_id>/<session_id>/item/question/<question_id>      # Single question
/review/<template_id>/<session_id>/item/requirement/<requirement_id># Single requirement
/review/<template_id>/<session_id>/requirement/<req_id>             # Requirement detail (legacy alias)
/review/<template_id>/<session_id>/questions                        # Legacy questions panel
/review/<template_id>/<session_id>/questions/<question_id>          # POST - save answer
/review/<template_id>/<session_id>/questions/<question_id>/suggest  # POST - AI suggestion
/review/<template_id>/<session_id>/questions/<question_id>/accept   # POST - accept suggestion

# Assessment streaming and curation
/review/<template_id>/<session_id>/assessment-stream                                       # POST SSE - single-item or batch
/review/<template_id>/<session_id>/assessment/<item_type>/<item_id>/parts                  # POST - add matched part
/review/<template_id>/<session_id>/assessment/<item_type>/<item_id>/parts/<part_db_id>     # DELETE - remove matched part
/review/<template_id>/<session_id>/assessment/requirement/<req_id>/parts/<part_db_id>/promote
/review/<template_id>/<session_id>/assessment/requirement/<req_id>/parts/<part_db_id>/demote
/review/<template_id>/<session_id>/assessment/requirement/<req_id>/reset-matched-parts     # POST - restore evidence to AI snapshot
/review/<template_id>/<session_id>/assessment/requirement/<req_id>/reset-all               # POST - undo every reviewer change
/review/<template_id>/<session_id>/requirement/<req_id>/classify                           # POST - save human classification
/review/<template_id>/<session_id>/requirement/<req_id>/save-example                       # POST - save as few-shot example
/review/<template_id>/<session_id>/assessment/<item_type>/<item_id>/promote-part/<part_db_id> # GET/POST - promote part to example

# Document viewer
/review/<template_id>/<session_id>/document                                  # GET - viewer (optional ?doc_id, ?highlight)
/review/<template_id>/<session_id>/document/<part_id>                        # GET - viewer scrolled to part
/review/<template_id>/<session_id>/document/<doc_id>/file                    # GET - original PDF/DOCX binary
/review/<template_id>/<session_id>/document/<doc_id>/page/<page_number>.png  # GET - rasterized PDF page
/review/<template_id>/<session_id>/document/<doc_id>/page/<page_number>/words.json # GET - per-word bboxes for selection overlay
/review/<template_id>/<session_id>/document/<doc_id>/parts-map               # GET - parts map (dev only)
/review/<template_id>/<session_id>/document/<doc_id>/delete                  # POST - delete document mid-review
/review/<template_id>/<session_id>/documents/<doc_id>/download               # GET - attachment download

# Examples
/review/<template_id>/examples                          # List
/review/<template_id>/examples/<example_id>             # GET edit form / POST update
/review/<template_id>/examples/<example_id>/delete      # POST - two-step delete

# Export and reports
/review/<template_id>/<session_id>/export               # XLSX export ("Export Excel" in the gbar download dropdown)
/review/<template_id>/<session_id>/download             # Final report (?type=main|pdf|docx, ?draft=1) — gated on completion in the gbar dropdown
/review/<template_id>/<session_id>/preview              # Inline HTML preview — feeds the in-pane "Draft report" view via data-draft-report-url (always enabled in the dropdown)

# Dev-only diagnostics (gated by is_devel())
/review/<template_id>/<session_id>/requirement/<req_id>/debug-prompt        # Stage-1 relevance prompt for a requirement
/review/<template_id>/<session_id>/questions/<question_id>/debug-prompt     # Stage-1 relevance prompt for a question
/review/<template_id>/<session_id>/requirement/<req_id>/used-examples       # Few-shot examples used in last AI run
/review/<template_id>/<session_id>/debug/context                            # Session context + question reviews JSON
/review/<template_id>/<session_id>/debug/probe                              # Probe applicable requirements

10. DATA FLOW DIAGRAMS

10.1 New Review Session

User uploads document(s) via /review/<template_id>/new
    → Store binaries in filestore
    → Create ReviewSessionModel (status=pending) + DocumentModels
    → Redirect to ingest route

SSE ingestion stream:
    → For each document:
        → create_document_source() → DocumentSource
        → Parse via pipeline, segment into parts with stable IDs
        → Detect language, compute embeddings (batch)
        → SSE progress events
    → Update session status to ready
    → SSE: complete (redirect to workbench)

10.2 Single Requirement Classification

Workbench loads / sidebar refreshes
    → ReviewService.load_session_requirements
        → _evaluate_computables: resolve type: computable answers,
          pre-populate every unanswered non-computable question with None,
          multi-pass until no progress (§2.2)
        → probe_template_for_requirements (req() ids ∩ template requirements)

User clicks "Run AI" for requirement D5-1
    → ReviewService.stream_assessment
        → Load existing RequirementReviewModel (if any)
        → If is_parts_curated AND curated parts exist:
            → Skip iter_retrieve_parts (reviewer already picked the set)
            → RequirementClassifier(skip_relevance_check=True).classify over the curated parts only — one LLM call
        → Else:
            → iter_retrieve_parts (multi-doc search → merge → filter → rerank)
            → SSE: stage="assessing_relevance" (candidate count)
            → RequirementClassifier.filter_relevance — one LLM call batched over all candidates, drops parts marked N
            → SSE: stage="assessing_compliance" (relevant-part count)
            → RequirementClassifier.aggregate_compliance — one LLM call over the relevant parts, returns verdict + primary_evidence + supporting_evidence
        → Freeze snapshot: write ai_original_verdict / ai_original_primary_part_ids / ai_original_supporting_part_ids
        → Clear diverged_at (new AI baseline restarts divergence tracking)
        → Persist assessment_part_ordinals so the UI can redisplay "DEL N" markers later
        → Store RequirementReviewModel (primary_part_ids / supporting_part_ids)
    → htmx: updated detail panel + sidebar row (OOB) + HX-Trigger refreshes the tile

10.3 Context Question AI Suggestion

User clicks "Suggest" for question
    → ReviewService.suggest_question_answer
        → HybridSearcher.search (query from question label/help)
        → QuestionAnswerer.answer (two-stage, non-streaming)
            → filter_relevance — one LLM call batched over all candidates, drops parts marked N
            → aggregate_answer — one LLM call over the relevant parts, returns answer + confidence + primary_evidence + supporting_evidence
        → Store suggestion in session.suggestions (primary_part_ids / supporting_part_ids)
User clicks "Accept"
    → Copy to session.context, clear suggestion
    → Re-probe template (new context may change applicable requirements)

Question suggestion uses the same two-stage pipeline as requirement classification (§10.2) but runs synchronously without SSE streaming — the Suggest button expects a single response, not progress events. When questions are assessed in a batched run via stream_assessment (e.g. "Run all"), they reuse the assessing_relevance / assessing_compliance stages and the shared relevance_batch prompt distinguishes item_kind="question" vs "requirement".

10.4 Report Rendering / Draft Preview

User opens "Preview draft report" or downloads the final report
    → ReviewService.render_report → build_template_context
        → _evaluate_computables: same helper as §10.2 — resolves type: computable
          answers and pre-populates unanswered template questions with None so
          report-side gates like {% if first_gate %} fire correctly even when
          first_gate is a computable derived from unanswered inputs (§2.2)
        → Build NestedValue with question answers + computables + review_session +
          requirements + documents
    → Jinja render of template.md / template.docx

11. FILE LOCATION INDEX

What Where
Database models (ORM) vibe/review/db_models.py (SQLAlchemy Base, persistence-only)
In-memory models vibe/review/models.py (dataclasses, enums, view models — RequirementTile, TemplateSessionRow, SessionListRow, MatchedPart)
Candidate input vibe/review/candidate.py::CandidatePart (frozen input shape for the two-stage pipeline; from_ranked / from_document_part constructors)
Stage-1 verdict parser vibe/review/_stage1_parsing.py::bucket_confidence, parse_verdicts_by_id (shared by classifier and question_answerer)
Migrations vibe/review/migrations/ (alembic.ini, env.py, versions/)
DB migrate runner vibe/review/db_migrate.py::upgrade_to_head, current_revision, check_pending
DB dump utility vibe/review/db_dump.py::run_export, run_import
Requirements loading vibe/review/requirements.py::RequirementLoader
Template probing vibe/review/template_functions.py::probe_template, probe_template_for_requirements, ProbeResult
Document ingestion vibe/review/ingestion.py::DocumentIngester
Document sources vibe/review/document_sources.py::DocumentSource, create_document_source
Outline anchors vibe/review/anchors.py::compute_anchors (stored on DocumentPartModel.part_anchor)
Heading-aware text vibe/review/document_text.py::combine, part_text, truncate_at_whitespace
Parsing pipeline vibe/review/parsing/ (see parsing-pipeline.md)
Filestore vibe/review/filestore.py
OCR backends vibe/review/parsing/extraction/ocr/ (package: backend, extractor, analysis, health, berget_docling.py for the Berget hosted docling client)
DOCX→PDF rendering vibe/review/docx_converter.py
Hybrid search vibe/review/hybrid_search.py::HybridSearcher
Document retrieval vibe/review/retrieval/search.py::DocumentSearcher
Reference search vibe/review/search.py::ReferenceSearcher
Reranking vibe/review/retrieval/reranker.py::PartReranker
Example retrieval vibe/review/retrieval/example_retriever.py::ExampleRetriever
Requirement classifier vibe/review/classifier.py::RequirementClassifier
Batch orchestration vibe/review/services/review_service.py::ReviewService.stream_assessment
Assessment vibe/review/assessment.py::AssessmentClassifier
Question answering vibe/review/question_answerer.py::QuestionAnswerer
Progress base vibe/review/progress.py::BaseProgress
Prompt building vibe/review/prompts.py::PromptBuilder
Prompt templates vibe/review/templates/prompts/*.jinja2
LLM provider base vibe.providers.llm.base.LLMProvider (see llm-providers.md)
Mock LLM (tests) vibe/review/services/mock_llm_provider.py::MockReviewLLMProvider
Embedding providers vibe/providers/embedding/ (shared)
Rerank providers vibe/providers/reranking/ (shared)
ReviewService vibe/review/services/review_service.py
Analytics service vibe/review/services/analytics_service.py
Provider factory vibe/review/services/provider_factory.py
Database utilities vibe/review/database.py
Routes vibe/review/web/routes.py
Interview mode ext. vibe/review/interview_mode.py::ReviewModeExtension
HTML templates vibe/review/templates/review/
Reference linking vibe/review/reference_linker.py::ReferenceLinker
CLI commands vibe/review/cli.py
Benchmark harness tools/vibe_dev/benchmark.py — see developer/review-benchmark.md
Config accessor vibe/config/accessor.py::get_app_config

Document Version: 2.15 Last Updated: 2026-04-30 Notes: Documented ReviewService._evaluate_computables (§2.2 template-is-truth, §7.1 Requirements responsibilities, §10.2 lead-in step, new §10.4 Report rendering): resolves type: computable questions and pre-populates unanswered non-computable template questions with None before requirement probing and report rendering, multi-pass until no progress, so req() calls and report gates that depend on computables don't silently disappear when their underlying inputs are unanswered. Documented question-side divergence helpers in §5.5 — _question_divergence and _normalize_answer_for_comparison in vibe/review/web/routes.py — including the bool-string coercion (yes/no/ja/nej/true/false/1/0) that prevents accepting an AI "yes" answer from being mis-flagged as a divergence against a stored Python True. Earlier (2.14): §8 rewritten for the four-pane workbench: documented the gbar global top bar (session crumb, combined progress + "Ask AI about all" button with data-stream-mode="all_requirements" and data-assessment-concurrency, AI-running strip with per-item dot grid, download dropdown, user chip), the OOB-swap pattern from progress_bar.html that keeps the gbar progress counter + sidebar + download dropdown in sync on every save, the corrected pane layout (pane--rail pane--left / pane--action / pane--read / pane--rail pane--right, with the document viewer now in the right-of-center read pane and a fourth right-rail TOC pane), the draft-report preview workflow (currentView === 'draft-report' driven by data-draft-report-urlreview.report_preview, "Preview draft report" always enabled vs gated download links, auto-switch on review-completed), and the new partials/download_dropdown.html (template count 12 top-level + 23 partials). §9 URL table cross-references the in-pane preview UX. Earlier sync (2.13): grouped-questions panel + UNGROUPED_PANEL_ID "Context Questions" bucket; expanded URL table (split /ingest page vs /ingest-stream SSE, document viewer + page rasterization + words-overlay routes, dev-mode debug-prompt routes); ReviewService responsibilities updated for get_relevant_question_ids, get_panel_navigation, get_assessment_navigation, get_requirement_tiles, get_question_to_group, get_questions_for_group, prepare_documents_for_display; file index gains _stage1_parsing.py, candidate.py::CandidatePart, and SessionListRow alongside TemplateSessionRow. Earlier sync: persistence split (db_models.py ORM vs. models.py view models), Alembic migrations + db_migrate.py / db_dump.py, programmatic upgrade-to-head on Flask startup, vibe review db migrate CLI + check_pending CI gate, parallel ThreadPoolExecutor-backed stream_assessment (per-worker DB session, app-context propagation, queue-ordered SSE), structured-output via LLMProvider.generate_structured, MockReviewLLMProvider test surface, and probe_template/ProbeResult plus the "template-is-truth for questions" rule (accessed_keys ∩ template question ids).