Review CLI (Developer)¶

This page documents the developer-facing command layout for the Review extension.

Which CLI?¶

Use vibe review when you are operating a review template or its stored data:

create and inspect review sessions
import reference sources and example corpora
initialize or reset the review database
rebuild embeddings or maintain render support

Use vibe-dev review when you are debugging or evaluating the Review extension itself:

inspect parser output and stored document parts
inspect structured review logs
benchmark retrieval and assessment quality
lint or validate golden records
use developer-only playgrounds and database shells

One command moved out of review entirely:

vibe-dev pdf dump-text for raw pdftotext-style PDF inspection

Command Groups¶

vibe-dev review is now organized by workflow:

logs Filter and summarize structured review logs.
docs Inspect stored document parts or round-trip them through JSON.
parse Debug parsing output and related layout tooling.
eval Evaluate quality against goldens or summarize human agreement with AI suggestions.
db Open a developer database shell.
golden Lint review golden files.
lab Run interactive playground tools.

Commands¶

Logs¶

vibe-dev review logs filter --stage bm25
vibe-dev review logs summarize --run <batch-run-id>

Docs¶

vibe-dev review docs inspect 42
vibe-dev review docs export 42 --output doc.json
vibe-dev review docs import doc.json --session 7

Parse¶

vibe-dev review parse doc contract.pdf --level parts
vibe-dev review parse validate contract.pdf --mode structure
vibe-dev review parse yolo-layout contract.pdf

Eval¶

vibe-dev review eval benchmark case.golden-assessment.yml
vibe-dev review eval benchmark -r ../vibe-review-corpus           # walk a whole corpus
vibe-dev review eval benchmark case.golden-assessment.yml --evaluation claude
vibe-dev review eval benchmark case.golden-assessment.yml \
  --evaluation berget_gpt_oss_120b \
  --evaluation berget_gemma4_31b \
  --evaluation mistral_small_24b   # retrieve once, assess against each model, compare
vibe-dev review eval agreement --template dora-review

benchmark measures retrieval and assessment quality against curated *.golden-assessment.yml files. Accepts one or more files; pass -r / --recursive to expand directory arguments via **/*.golden-assessment.yml. Use --evaluation <endpoint> to override the LLM endpoint used for the final relevance + compliance calls — any key from llm_endpoints in the template's config.yml (e.g. claude, local, mock). Repeat --evaluation to benchmark several models against the same retrieved parts in a single run; retrieval executes once and the report adds a side-by-side model-comparison table (aggregate scores plus per-requirement predictions). See Review Benchmark for the golden-assessment schema, scoring semantics, and the full CLI surface.
agreement measures how often human reviewers agree with AI suggestions in the live review database.

DB¶

vibe-dev review db shell

Golden¶

vibe-dev review golden lint-parse path/to/corpus
vibe-dev review golden lint-assessment path/to/corpus

lint-parse checks parsing golden.yml files.
lint-assessment checks *.golden-assessment.yml files used by the benchmark harness.

Lab¶

vibe-dev review lab playground 12

Common Workflows¶

Debug a Parsing Problem¶

Dump the parser output: vibe-dev review parse doc contract.pdf --level parts
Compare against the expected artifact: vibe-dev review parse validate contract.pdf
If needed, inspect the low-level text layer: vibe-dev pdf dump-text contract.pdf

Debug a Bad Assessment¶

Summarize the structured logs: vibe-dev review logs summarize --run <id>
Inspect the stored parts: vibe-dev review docs inspect <doc-id>
Re-run benchmark cases if the regression should be covered by goldens: vibe-dev review eval benchmark ...