Template Testing

Template testing lets you define expected outcomes for your VIBE template and verify them automatically. Instead of manually clicking through interviews, you write test scenarios in a Markdown file and run them from the command line.

Quick Start

Create a file called tests.md next to your template's config.yml:

# Scenarios

## Basic contract without NDA

**Answers:**

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |
| Include NDA | No       |

**Relevant questions:**
- Client Name
- Include NDA

**Not relevant questions:**
- NDA term

**Document contains:**
- "Acme Inc"

**Document excludes:**
- "Confidentiality"

Run it:

vibe test my-template

Each scenario is self-contained: it lists the answers to feed the interview and the assertions to check.

Running Behavioral Specs via Pytest (JUnit XML reporting)

If you need CI-friendly test reports, you can run the same behavioral scenarios through pytest instead of vibe test.

pytest --vibe-tests my-template --junitxml=report.xml

Why use this mode:

  • Pytest can emit JUnit XML via --junitxml, a format that many CI systems and reporting tools ingest directly.
  • You can still target one template at a time (--vibe-tests my-template) or all configured templates (--vibe-tests all).
  • You can narrow to one scenario name with --vibe-test-scenario "Scenario name".

Examples:

# One template -> JUnit XML output
pytest --vibe-tests beredskapsavtal --junitxml=report.xml

# One template + one scenario
pytest --vibe-tests beredskapsavtal \
  --vibe-test-scenario "Nivå 3 med säkerhetsprövning" \
  --junitxml=report.xml

vibe test is still the quickest local CLI workflow. The pytest mode is mainly for reporting pipelines and test-result aggregation.

Test File Location

By default, vibe test looks for tests.md next to your template's config.yml.

You can override this in config.yml:

# Single file
tests: my-tests.md

# Multiple files
tests:
  - basic-tests.md
  - edge-cases.md

Format Reference

A test spec file contains a Scenarios section (required) and optionally a Defaults section for reducing repetition across many scenarios.

Scenarios

The Scenarios section is an H1 heading containing H2 subsections, one per test scenario. Each scenario uses bold markers to specify answers and assertions.

Self-contained scenarios

The simplest pattern: each scenario lists all its answers in an inline table.

# Scenarios

## Greeting includes client name

> Optional description (blockquote)

**Answers:**

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |

**Relevant questions:**
- Client Name

**Document contains:**
- "Acme Inc"

Answer tables use a Question column (or the localized equivalent, e.g. Fråga in Swedish) as the key column. Write the question's visible label — the text the user sees in the interview. In the Answer column, write the answer as displayed to the user (for example Yes/No or Ja/Nej for booleans, option labels like On-site support for selects).

Labels are matched using prefix-anchored matching: the value you write in the Question column must match the beginning of the question's full label. Matching is case-insensitive and normalizes whitespace. For example, if the full label is "Krav på datering av ändringar", you can write just "Krav på datering" in the test spec. However, "av ändringar" (a non-prefix substring) will not match. If a prefix matches more than one question label, the test runner raises an error — write a longer prefix to disambiguate. Exact matches are always tried first and take priority over prefix matching.
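As an illustration only (this is a sketch of the matching rules described above, not the runner's actual code), the prefix-anchored resolution could look like this:

```python
import re

def normalize(s: str) -> str:
    """Lowercase and collapse whitespace runs, mirroring the documented rules."""
    return re.sub(r"\s+", " ", s.strip().lower())

def match_label(written: str, full_labels: list[str]) -> str:
    """Resolve a test-spec label against the full question labels.

    Exact matches take priority; otherwise the written text must be a
    prefix of exactly one full label, or an error is raised.
    """
    key = normalize(written)
    labels = {normalize(label): label for label in full_labels}
    if key in labels:  # exact match wins
        return labels[key]
    hits = [orig for norm, orig in labels.items() if norm.startswith(key)]
    if len(hits) > 1:
        raise ValueError(f"Ambiguous label prefix {written!r}: matches {hits}")
    if not hits:
        raise KeyError(f"No question label starts with {written!r}")
    return hits[0]

labels = ["Krav på datering av ändringar", "Krav på dokumentation"]
print(match_label("Krav på datering", labels))
# Krav på datering av ändringar
```

Note how "Krav på" alone would match both labels and raise the ambiguity error, while the longer prefix resolves cleanly.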

You can also add an optional Id column. When present and non-empty for a row, the runner uses the Id (the question's internal path from config.yml) instead of the label. Ids are guaranteed unique and often shorter, but labels are more readable and survive internal restructuring. A single table can mix both approaches — rows with a non-empty Id use it, other rows fall back to the Question column.

| Question    | Id          | Answer |
|-------------|-------------|--------|
| Client Name | client_name | Acme   |
| Include NDA |             | Yes    |
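The per-row fallback between Id and Question can be pictured with a small sketch (illustrative, assuming rows are parsed into dicts keyed by column header):

```python
def resolve_key(row: dict) -> tuple[str, str]:
    """Return ('id', path) when the Id cell is non-empty,
    otherwise ('label', question_label)."""
    id_cell = (row.get("Id") or "").strip()
    if id_cell:
        return ("id", id_cell)
    return ("label", row["Question"].strip())

rows = [
    {"Question": "Client Name", "Id": "client_name", "Answer": "Acme"},
    {"Question": "Include NDA", "Id": "", "Answer": "Yes"},
]
print([resolve_key(r) for r in rows])
# [('id', 'client_name'), ('label', 'Include NDA')]
```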

The Answers marker can be followed by either:

  • An inline table (as above) — the scenario is fully self-contained
  • A reference name (e.g. **Answers:** base) — refers to a named answer set in the Defaults section

Session context

Use Session context to seed session-scoped variables (for example values normally coming from dev_settings.session_context in app config).

Session context tables use variable IDs in flattened dotted notation (for example lager2.namn) and raw internal values rather than user-displayed labels. For booleans, write true/false (not Yes/No). Use the Variable column for keys and the Value column for values.

**Session context:** org_context

Or inline:

**Session context:**

| Variable    | Value         |
|-------------|---------------|
| org.name    | Initrode AB   |
| org.orgnr   | 123456-7890   |

The Session context marker can be followed by either:

  • An inline table (variable/value pairs)
  • A reference name (e.g. **Session context:** org_context) that points to a named set in Defaults

Answer overrides

Use With changes to override individual answers on top of a referenced defaults set:

**Answers:** base

**With changes:**

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |

Complex question types

Questions with structured sub-properties — period, amount, and structured types — use a Property column to name the specific sub-field:

**Answers:**

| Question                      | Property | Answer   |
|-------------------------------|----------|----------|
| How long is the contract?     | mode     | duration |
| How long is the contract?     | quantity | 6        |
| How long is the contract?     | unit     | months   |
| How large is the project fee? | value    | 5000     |
| How large is the project fee? | currency | USD      |

The test runner groups rows by parent question and assembles them into the dict each handler expects. Numeric sub-fields (quantity for period, value for amount) are coerced to numbers automatically; all other sub-fields remain strings.

This works in inline answer tables, default sets, and With changes overrides. You can mix rows with and without the Property column in the same table.
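The grouping and coercion described above can be sketched as follows (an illustrative model of the behavior, not the runner's actual implementation):

```python
NUMERIC_SUBFIELDS = {"quantity", "value"}  # period.quantity, amount.value

def assemble_answers(rows: list[dict]) -> dict:
    """Group rows by parent question; rows with a Property column
    become entries in a sub-dict, numeric sub-fields are coerced."""
    out: dict = {}
    for row in rows:
        question, prop, answer = row["Question"], row.get("Property"), row["Answer"]
        if not prop:
            out[question] = answer  # plain row: answer stays a string
            continue
        sub = out.setdefault(question, {})
        if prop in NUMERIC_SUBFIELDS:
            answer = float(answer) if "." in answer else int(answer)
        sub[prop] = answer
    return out

rows = [
    {"Question": "How long is the contract?", "Property": "mode", "Answer": "duration"},
    {"Question": "How long is the contract?", "Property": "quantity", "Answer": "6"},
    {"Question": "How long is the contract?", "Property": "unit", "Answer": "months"},
]
print(assemble_answers(rows))
# {'How long is the contract?': {'mode': 'duration', 'quantity': 6, 'unit': 'months'}}
```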

Richer example: list, structured, and computable values

## Services with computed total

**Answers:**

| Question        | Property | Answer        |
|-----------------|----------|---------------|
| Customer        | name     | Acme AB       |
| Customer        | email    | legal@acme.se |
| Contract period | mode     | duration      |
| Contract period | quantity | 12            |
| Contract period | unit     | months        |
| Services        | 0.name   | Support       |
| Services        | 0.price  | 1000          |
| Services        | 1.name   | Training      |
| Services        | 1.price  | 500           |

**Variables:**
- total_price = 1500

**Document contains:**
- "Acme AB"
- "Support"
- "Training"

Use this pattern when you want one scenario to prove several things at once: nested values were assembled correctly, list items were accepted in order, and the computable result matched the expected total.

Session context overrides

Use With changes after Session context to override specific session-context variables:

**Session context:** org_context

**With changes:**

| Variable  | Value |
|-----------|-------|
| org.name  | Acme  |

With changes always applies to the most recent source marker:

  • If the latest source marker is Answers, it overrides answers.
  • If the latest source marker is Session context, it overrides session context.
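One way to picture the "most recent source marker" rule (a sketch under the assumption that markers arrive as an ordered list, not the actual parser):

```python
def apply_markers(markers: list[tuple[str, dict]]) -> tuple[dict, dict]:
    """Walk scenario markers in order. 'With changes' patches whichever
    source marker (Answers or Session context) appeared last."""
    answers: dict = {}
    context: dict = {}
    last_target = None
    for kind, payload in markers:
        if kind == "Answers":
            answers.update(payload)
            last_target = answers
        elif kind == "Session context":
            context.update(payload)
            last_target = context
        elif kind == "With changes":
            if last_target is None:
                raise ValueError("'With changes' needs a preceding source marker")
            last_target.update(payload)
    return answers, context

markers = [
    ("Answers", {"Include NDA": "No"}),
    ("Session context", {"org.name": "Initrode AB"}),
    ("With changes", {"org.name": "Acme"}),  # patches session context, not answers
]
print(apply_markers(markers))
# ({'Include NDA': 'No'}, {'org.name': 'Acme'})
```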

Defaults (optional)

When many scenarios share the same base answers or session-context values, you can extract them into a Defaults section to avoid repetition. This section uses H2 subsections to define named sets, each containing a two-column Markdown table.

# Defaults

## base

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |
| Include NDA | No       |

# Scenarios

## Without NDA

**Answers:** base

**Not relevant questions:**
- NDA term

## With NDA

**Answers:** base

**With changes:**

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |

**Relevant questions:**
- NDA term

Inheritance

A child answer set can inherit from a parent by putting the parent name in parentheses:

## with_nda (base)

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |

The child inherits all answers from the parent and overrides only the ones listed. The same inheritance mechanism can also be used for session-context sets.
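The merge semantics amount to "parent values first, child rows override" — sketched here under the assumption that each named set stores an optional parent name plus its own rows (illustrative only):

```python
def resolve_set(name: str, defaults: dict) -> dict:
    """Resolve a named defaults set. Headings like 'with_nda (base)'
    declare a parent; the child overrides only the rows it lists."""
    parent_name, rows = defaults[name]
    merged = dict(resolve_set(parent_name, defaults)) if parent_name else {}
    merged.update(rows)
    return merged

defaults = {
    "base": (None, {"Client Name": "Acme Inc", "Include NDA": "No"}),
    "with_nda": ("base", {"Include NDA": "Yes"}),
}
print(resolve_set("with_nda", defaults))
# {'Client Name': 'Acme Inc', 'Include NDA': 'Yes'}
```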

Assertion Types

| Marker                    | Description                                                       |
|---------------------------|-------------------------------------------------------------------|
| Relevant questions        | These questions must be asked during the interview                |
| Not relevant questions    | These questions must NOT be asked                                 |
| All relevant questions    | Exactly these questions (and no others) must be asked             |
| Document contains         | The rendered document must contain these strings                  |
| Document excludes         | The rendered document must NOT contain these strings              |
| Recommends                | These template IDs must appear in recommendations                 |
| Does not recommend        | These template IDs must NOT appear in recommendations             |
| All recommendations       | Exactly these template IDs (and no others) must be recommended    |
| Variables                 | Variables must have these exact values (format: key = "value")    |
| Has attachment            | An appendix with this alias must be present                       |
| Doesn't have attachment   | An appendix with this alias must NOT be present                   |
| All attachments           | Exactly these appendix aliases (and no others) must be present    |
| Attachment "name" contains | The rendered appendix must contain these strings                 |
| Attachment "name" excludes | The rendered appendix must NOT contain these strings             |

Identifying questions in relevance assertions

Entries in Relevant questions, Not relevant questions, and All relevant questions lists are interpreted as question labels using the same prefix-anchored matching as answer tables. Write the question's visible label (or a unique prefix of it).

When a label is ambiguous or generic (e.g. several questions share the label "Motivering"), use the id: prefix to force interpretation as a question path: id:include_principer.

**Relevant questions:**
- Inkludera beredskap
- Beskriv det område som ska tryggas
- id:meddelande_hemligt_f_avtal

Localized Keywords

When your template sets locale in config.yml, write test specs using that locale's keywords. Keywords are matched case-insensitively. When a locale is set, only that locale's keywords are recognized — you cannot mix languages in the same test file.

| Purpose         | English                  | Svenska (sv)              | Deutsch (de)            | Français (fr)                     | Español (es)                     |
|-----------------|--------------------------|---------------------------|-------------------------|-----------------------------------|----------------------------------|
| Section heading | Defaults                 | Grundinställningar        | Standardwerte           | Valeurs par défaut                | Valores predeterminados          |
| Section heading | Scenarios                | Scenarion                 | Szenarien               | Scénarios                         | Escenarios                       |
| Source marker   | Answers                  | Svar                      | Antworten               | Réponses                          | Respuestas                       |
| Source marker   | Session context          | Sessionskontext           | Sitzungskontext         | Contexte de session               | Contexto de sesión               |
| Override marker | With changes             | Med ändringar             | Mit Änderungen          | Avec modifications                | Con cambios                      |
| Assertion       | Relevant questions       | Relevanta frågor          | Relevante Fragen        | Questions pertinentes             | Preguntas relevantes             |
| Assertion       | Not relevant questions   | Ej relevanta frågor       | Nicht relevante Fragen  | Questions non pertinentes         | Preguntas no relevantes          |
| Assertion       | All relevant questions   | Alla relevanta frågor     | Alle relevanten Fragen  | Toutes les questions pertinentes  | Todas las preguntas relevantes   |
| Assertion       | Document contains        | Dokumentet innehåller     | Dokument enthält        | Le document contient              | El documento contiene            |
| Assertion       | Document excludes        | Dokumentet innehåller inte | Dokument enthält nicht | Le document ne contient pas       | El documento no contiene         |
| Assertion       | Recommends               | Rekommenderar             | Empfiehlt               | Recommande                        | Recomienda                       |
| Assertion       | Does not recommend       | Rekommenderar inte        | Empfiehlt nicht         | Ne recommande pas                 | No recomienda                    |
| Assertion       | All recommendations      | Alla rekommendationer     | Alle Empfehlungen       | Toutes les recommandations        | Todas las recomendaciones        |
| Assertion       | Variables                | Variabler                 | Variablen               | Variables                         | Variables                        |
| Assertion       | Has attachment           | Har bilaga                | Hat Anlage              | A une annexe                      | Tiene anexo                      |
| Assertion       | Doesn't have attachment  | Har inte bilaga           | Hat keine Anlage        | N'a pas d'annexe                  | No tiene anexo                   |
| Assertion       | All attachments          | Alla bilagor              | Alle Anlagen            | Toutes les annexes                | Todos los anexos                 |
| Assertion       | Attachment "X" contains  | Bilaga "X" innehåller     | Anlage "X" enthält      | Annexe "X" contient               | Anexo "X" contiene               |
| Assertion       | Attachment "X" excludes  | Bilaga "X" innehåller inte | Anlage "X" enthält nicht | Annexe "X" ne contient pas      | Anexo "X" no contiene            |

Localized Column Names

Table column headers are recognized in all supported languages. The runner matches column headers case-insensitively.

Answer tables

| Purpose         | English (en) | Svenska (sv) | Deutsch (de) | Français (fr) | Español (es) |
|-----------------|--------------|--------------|--------------|---------------|--------------|
| Question column | Question     | Fråga        | Frage        | Question      | Pregunta     |
| Answer column   | Answer       | Svar         | Antwort      | Réponse       | Respuesta    |
| ID column       | Id           | Id           | Id           | Id            | Id           |
| Property column | Property     | Egenskap     | Eigenschaft  | Propriété     | Propiedad    |

The Answer column contains human-readable values as displayed in the interview (e.g. Yes, Ja, On-site support). Use the Value column instead when you need to write raw internal values (e.g. true, on_site). Both are recognized as the value column, but Answer is preferred for readability.

Session context tables

| Purpose      | English (en) | Svenska (sv) | Deutsch (de) | Français (fr) | Español (es) |
|--------------|--------------|--------------|--------------|---------------|--------------|
| Key column   | Variable     | Variabel     | Variable     | Variable      | Variable     |
| Value column | Value        | Värde        | Wert         | Valeur        | Valor        |

The Value column contains raw internal values (e.g. true/false, not Yes/No). Session context keys are always variable IDs in dotted notation.
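As a purely illustrative sketch of how dotted keys and raw values behave (the nested-dict expansion shown here is an assumption about the internal representation, not documented behavior of the runner):

```python
def parse_raw_value(text: str):
    """Session-context cells hold raw internal values: booleans are
    written true/false; everything else stays a literal string."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return text.strip()

def flatten_to_nested(flat: dict) -> dict:
    """Expand dotted keys like 'org.name' into nested dicts."""
    nested: dict = {}
    for dotted, value in flat.items():
        node = nested
        *parents, leaf = dotted.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_raw_value(value)
    return nested

print(flatten_to_nested({"org.name": "Initrode AB", "flags.procurement": "true"}))
# {'org': {'name': 'Initrode AB'}, 'flags': {'procurement': True}}
```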

Running Tests

# Test a specific template
vibe test my-template

# Test all templates that have test specs
vibe test

# Verbose output (show all assertion details)
vibe test my-template --verbose

# Run only matching scenarios (case-insensitive substring match)
vibe test my-template --scenario "basic"

# Run a scenario in a headed browser (see Visual Test Runner below)
vibe test my-template --scenario "My scenario name" --visual

Visual Test Runner

The --visual flag runs scenarios in a headed Playwright browser so you can watch the interview being filled in. This is useful for debugging layout issues, observing conditional field visibility, or verifying that the interview flow looks correct to an end user.

# Run a scenario visually
vibe test my-template --scenario "Basic contract" --visual

# Keep the browser open after the scenario finishes for manual inspection
vibe test my-template --scenario "Basic contract" --pause

--pause implies --visual. When paused, the browser stays open and the terminal waits for you to press Enter before continuing.

How it works

The visual runner starts a real Flask server in a background thread, launches a headed Chromium browser via Playwright, and navigates to the interview page. It then iterates through the answer table from the scenario, locating each field's widget on the page and filling it in (text inputs, radio buttons, selects, checkboxes, composite fields). After each field is filled, the runner waits for htmx to settle before proceeding.

If a field is not yet visible (e.g. it depends on another answer), the runner moves on and retries on the next round. It also navigates between tabs, pages, and accordion sections automatically to reach fields on other pages.

After all answers are applied, the runner collects results (relevant questions, rendered document content, recommendations) and checks them against the scenario's assertions, just like the headless runner.

Limitations

  • Requires Playwright: You need playwright installed with Chromium (playwright install chromium).
  • One scenario at a time: --visual requires --scenario to select a single scenario.
  • Variables assertion: The Variables assertion is not available in visual mode since the runner cannot inspect server-side session state directly.

Exit Codes

| Code | Meaning                                         |
|------|-------------------------------------------------|
| 0    | All scenarios passed (or no test specs found)   |
| 1    | One or more scenarios failed                    |
| 2    | Setup error (e.g., could not create Flask app)  |

Tips

  • Start simple: Write self-contained scenarios with inline answer tables. Each scenario should be readable on its own.
  • Extract defaults when it pays off: Once you have many scenarios sharing the same base answers, move the common answers into a Defaults section and reference them.
  • Test both directions: Assert both what should be relevant AND what should not be relevant. This catches questions that accidentally appear (or disappear) when conditions change.
  • Quote strings in Document contains/excludes assertions. The parser strips surrounding quotes so you can include phrases with special characters.
  • Use labels for readability: In both answer tables and relevance assertions, prefer question labels over internal IDs. Use the id: prefix only when labels are ambiguous or generic.
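The quote-stripping behavior mentioned in the tips can be pictured with a tiny sketch (illustrative only):

```python
def strip_quotes(item: str) -> str:
    """Remove one pair of surrounding double quotes, if present,
    so assertion strings can contain special characters safely."""
    item = item.strip()
    if len(item) >= 2 and item[0] == item[-1] == '"':
        return item[1:-1]
    return item

print(strip_quotes('"Acme Inc"'))  # Acme Inc
print(strip_quotes("plain text"))  # plain text
```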

Example Patterns

Component-driven branch

Use paired scenarios to prove that an optional component appears only when its gate is enabled:

## Without confidentiality clause

**Answers:**

| Question                 | Answer |
|--------------------------|--------|
| Include confidentiality? | No     |

**Document excludes:**
- "Confidential information"

## With confidentiality clause

**Answers:**

| Question                 | Answer |
|--------------------------|--------|
| Include confidentiality? | Yes    |

**Document contains:**
- "Confidential information"

Session context override

Use a shared context set plus scenario-specific overrides when one template behaves differently by tenant or organization:

# Defaults

## org_context

| Variable         | Value      |
|------------------|------------|
| organization.name| Example AB |
| flags.procurement| true       |

# Scenarios

## Procurement enabled

**Session context:** org_context

**Document contains:**
- "Example AB"

## Procurement disabled

**Session context:** org_context

**With changes:**

| Variable          | Value |
|-------------------|-------|
| flags.procurement | false |

**Document excludes:**
- "procurement"

Testing computed outputs explicitly

When a branch depends on a computable variable, assert both the rendered text and the internal variable value:

## Volume discount applies

**Answers:**

| Question | Answer |
|----------|--------|
| Quantity | 25     |
| Unit fee | 100    |

**Variables:**
- discount_amount = 250
- total_price = 2250

**Document contains:**
- "2250"

Testing appendices

When your template generates appendices via appendix(), you can assert their existence and content. The appendix name used in assertions is the alias passed to appendix() in your template (e.g., appendix('pricing_schedule', alias='main_pricing') uses main_pricing).

Existence checks verify whether an appendix was generated:

## Enterprise contract includes pricing appendix

**Answers:**

| Question      | Answer     |
|---------------|------------|
| Contract type | Enterprise |

**Has attachment:** main_pricing

**Document contains:**
- "Appendix A: Pricing Schedule"

## Basic contract has no appendix

**Answers:**

| Question      | Answer |
|---------------|--------|
| Contract type | Basic  |

**Doesn't have attachment:** main_pricing

**Document excludes:**
- "Appendix"

Content checks verify text inside a specific appendix, using the Attachment "alias" contains and Attachment "alias" excludes syntax:

## Pricing appendix shows volume discount

**Answers:**

| Question         | Answer     |
|------------------|------------|
| Contract type    | Enterprise |
| Volume discount? | Yes        |

**Has attachment:** main_pricing

**Attachment "main_pricing" contains:**
- "Volume Discount"
- "10% for orders over 100 units"

**Attachment "main_pricing" excludes:**
- "No discounts available"

Exact attachment list verifies that only specific appendices are generated and no others:

## Full contract has both appendices

**All attachments:**
- main_pricing
- sla_appendix

These assertions can be freely combined with all other assertion types in the same scenario.

See Also