Skip to content

Review CLI (Developer)

This page documents the developer-facing command layout for the Review extension.

Which CLI?

Use vibe review when you are operating a review template or its stored data:

  • create and inspect review sessions
  • import reference sources and example corpora
  • initialize or reset the review database
  • rebuild embeddings or maintain render support

Use vibe-dev review when you are debugging or evaluating the Review extension itself:

  • inspect parser output and stored document parts
  • inspect structured review logs
  • benchmark retrieval and assessment quality
  • lint or validate golden records
  • use developer-only playgrounds and database shells

One command moved out of review entirely:

  • vibe-dev pdf dump-text for raw pdftotext-style PDF inspection

Command Groups

vibe-dev review is now organized by workflow:

  • logs Filter and summarize structured review logs.
  • docs Inspect stored document parts or round-trip them through JSON.
  • parse Debug parsing output and related layout tooling.
  • eval Evaluate quality against goldens or summarize human agreement with AI suggestions.
  • db Open a developer database shell.
  • golden Lint review golden files.
  • lab Run interactive playground tools.

Commands

Logs

vibe-dev review logs filter --stage bm25
vibe-dev review logs summarize --run <batch-run-id>

Docs

vibe-dev review docs inspect 42
vibe-dev review docs export 42 --output doc.json
vibe-dev review docs import doc.json --session 7

Parse

vibe-dev review parse doc contract.pdf --level parts
vibe-dev review parse validate contract.pdf --mode structure
vibe-dev review parse yolo-layout contract.pdf

Eval

vibe-dev review eval benchmark case.golden-assessment.yml
vibe-dev review eval benchmark -r ../vibe-review-corpus           # walk a whole corpus
vibe-dev review eval benchmark case.golden-assessment.yml --evaluation claude
vibe-dev review eval benchmark case.golden-assessment.yml \
  --evaluation berget_gpt_oss_120b \
  --evaluation berget_gemma4_31b \
  --evaluation mistral_small_24b   # retrieve once, assess against each model, compare
vibe-dev review eval agreement --template dora-review
  • benchmark measures retrieval and assessment quality against curated *.golden-assessment.yml files. Accepts one or more files; pass -r / --recursive to expand directory arguments via **/*.golden-assessment.yml. Use --evaluation <endpoint> to override the LLM endpoint used for the final relevance + compliance calls — any key from llm_endpoints in the template's config.yml (e.g. claude, local, mock). Repeat --evaluation to benchmark several models against the same retrieved parts in a single run; retrieval executes once and the report adds a side-by-side model-comparison table (aggregate scores plus per-requirement predictions). See Review Benchmark for the golden-assessment schema, scoring semantics, and the full CLI surface.
  • agreement measures how often human reviewers agree with AI suggestions in the live review database.

DB

vibe-dev review db shell

Golden

vibe-dev review golden lint-parse path/to/corpus
vibe-dev review golden lint-assessment path/to/corpus
  • lint-parse checks parsing golden.yml files.
  • lint-assessment checks *.golden-assessment.yml files used by the benchmark harness.

Lab

vibe-dev review lab playground 12

Common Workflows

Debug a Parsing Problem

  1. Dump the parser output: vibe-dev review parse doc contract.pdf --level parts
  2. Compare against the expected artifact: vibe-dev review parse validate contract.pdf
  3. If needed, inspect the low-level text layer: vibe-dev pdf dump-text contract.pdf

Debug a Bad Assessment

  1. Summarize the structured logs: vibe-dev review logs summarize --run <id>
  2. Inspect the stored parts: vibe-dev review docs inspect <doc-id>
  3. Re-run benchmark cases if the regression should be covered by goldens: vibe-dev review eval benchmark ...