Template Testing

Template testing lets you define expected outcomes for your VIBE template and verify them automatically. Instead of manually clicking through interviews, you write test scenarios in a Markdown file and run them from the command line.

Quick Start

Create a file called tests.md next to your template's config.yml:

# Scenarios

## Basic contract without NDA

**Answers:**

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |
| Include NDA | No       |

**Relevant questions:**
- Client Name
- Include NDA

**Not relevant questions:**
- NDA term

**Document contains:**
- "Acme Inc"

**Document excludes:**
- "Confidentiality"

Run it:

vibe test my-template

Each scenario is self-contained: it lists the answers to feed the interview and the assertions to check.

Running Behavioral Specs via Pytest (JUnit XML reporting)

If you need CI-friendly test reports, you can run the same behavioral scenarios through pytest instead of vibe test.

pytest --vibe-tests my-template --junitxml=report.xml

Why use this mode:

  • Pytest can emit JUnit XML via --junitxml, a format that many CI systems and reporting tools ingest directly.
  • You can still target one template at a time (--vibe-tests my-template) or all configured templates (--vibe-tests all).
  • You can narrow to one scenario name with --vibe-test-scenario "Scenario name".

Examples:

# One template -> JUnit XML output
pytest --vibe-tests beredskapsavtal --junitxml=report.xml

# One template + one scenario
pytest --vibe-tests beredskapsavtal \
  --vibe-test-scenario "Nivå 3 med säkerhetsprövning" \
  --junitxml=report.xml

vibe test is still the quickest local CLI workflow. The pytest mode is mainly for reporting pipelines and test-result aggregation.

Test File Location

By default, vibe test looks for tests.md next to your template's config.yml.

You can override this in config.yml:

# Single file
tests: my-tests.md

# Multiple files
tests:
  - basic-tests.md
  - edge-cases.md

Format Reference

A test spec file contains a Scenarios section (required) and optionally a Defaults section for reducing repetition across many scenarios.

Scenarios

The Scenarios section is an H1 heading containing H2 subsections, one per test scenario. Each scenario uses bold markers to specify answers and assertions.

Self-contained scenarios

The simplest pattern: each scenario lists all its answers in an inline table.

# Scenarios

## Greeting includes client name

> Optional description (blockquote)

**Answers:**

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |

**Relevant questions:**
- Client Name

**Document contains:**
- "Acme Inc"

Answer tables use a Question column (or the localized equivalent, e.g. Fråga in Swedish) as the key column. Write the question's visible label — the text the user sees in the interview. In the Answer column, write the answer as displayed to the user (for example Yes/No or Ja/Nej for booleans, option labels like On-site support for selects).

Labels are matched using prefix-anchored matching: the value you write in the Question column must match the beginning of the question's full label. Matching is case-insensitive and normalizes whitespace. For example, if the full label is "Krav på datering av ändringar", you can write just "Krav på datering" in the test spec. However, "av ändringar" (a non-prefix substring) will not match. If a prefix matches more than one question label, the test runner raises an error — write a longer prefix to disambiguate. Exact matches are always tried first and take priority over prefix matching.
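As an illustration only (this is a sketch of the matching rules described above, not the runner's actual code), the prefix-anchored resolution could look like this:

```python
import re

def normalize(s: str) -> str:
    """Lowercase and collapse whitespace runs, mirroring the documented rules."""
    return re.sub(r"\s+", " ", s.strip().lower())

def match_label(written: str, full_labels: list[str]) -> str:
    """Resolve a test-spec label against the full question labels.

    Exact matches take priority; otherwise the written text must be a
    prefix of exactly one full label, or an error is raised.
    """
    key = normalize(written)
    labels = {normalize(label): label for label in full_labels}
    if key in labels:  # exact match wins
        return labels[key]
    hits = [orig for norm, orig in labels.items() if norm.startswith(key)]
    if len(hits) > 1:
        raise ValueError(f"Ambiguous label prefix {written!r}: matches {hits}")
    if not hits:
        raise KeyError(f"No question label starts with {written!r}")
    return hits[0]

labels = ["Krav på datering av ändringar", "Krav på dokumentation"]
print(match_label("Krav på datering", labels))
# Krav på datering av ändringar
```

Note how "Krav på" alone would match both labels and raise the ambiguity error, while the longer prefix resolves cleanly.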

You can also add an optional Id column. When present and non-empty for a row, the runner uses the Id (the question's internal path from config.yml) instead of the label. Ids are guaranteed unique and often shorter, but labels are more readable and survive internal restructuring. A single table can mix both approaches — rows with a non-empty Id use it, other rows fall back to the Question column.

| Question    | Id          | Answer |
|-------------|-------------|--------|
| Client Name | client_name | Acme   |
| Include NDA |             | Yes    |
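The per-row fallback between Id and Question can be pictured with a small sketch (illustrative, assuming rows are parsed into dicts keyed by column header):

```python
def resolve_key(row: dict) -> tuple[str, str]:
    """Return ('id', path) when the Id cell is non-empty,
    otherwise ('label', question_label)."""
    id_cell = (row.get("Id") or "").strip()
    if id_cell:
        return ("id", id_cell)
    return ("label", row["Question"].strip())

rows = [
    {"Question": "Client Name", "Id": "client_name", "Answer": "Acme"},
    {"Question": "Include NDA", "Id": "", "Answer": "Yes"},
]
print([resolve_key(r) for r in rows])
# [('id', 'client_name'), ('label', 'Include NDA')]
```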

The Answers marker can be followed by either:

  • An inline table (as above) — the scenario is fully self-contained
  • A reference name (e.g. **Answers:** base) — refers to a named answer set in the Defaults section

Session context

Use Session context to seed session-scoped variables (for example values normally coming from dev_settings.session_context in app config).

Session context tables use variable IDs in flattened dotted notation (for example lager2.namn) and raw internal values rather than user-displayed labels. For booleans, write true/false (not Yes/No). Use the Variable column for keys and the Value column for values.

**Session context:** org_context

Or inline:

**Session context:**

| Variable    | Value         |
|-------------|---------------|
| org.name    | Initrode AB   |
| org.orgnr   | 123456-7890   |

The Session context marker can be followed by either:

  • An inline table (variable/value pairs)
  • A reference name (e.g. **Session context:** org_context) that points to a named set in Defaults

Answer overrides

Use With changes to override individual answers on top of a referenced defaults set:

**Answers:** base

**With changes:**

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |

Complex question types

Questions with structured sub-properties — period, amount, and structured types — use a Property column to name the specific sub-field:

**Answers:**

| Question                      | Property | Answer   |
|-------------------------------|----------|----------|
| How long is the contract?     | mode     | duration |
| How long is the contract?     | quantity | 6        |
| How long is the contract?     | unit     | months   |
| How large is the project fee? | value    | 5000     |
| How large is the project fee? | currency | USD      |

The test runner groups rows by parent question and assembles them into the dict each handler expects. Numeric sub-fields (quantity for period, value for amount) are coerced to numbers automatically; all other sub-fields remain strings.

This works in inline answer tables, default sets, and With changes overrides. You can mix rows with and without the Property column in the same table.
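The grouping and coercion described above can be sketched as follows (an illustrative model of the behavior, not the runner's actual implementation):

```python
NUMERIC_SUBFIELDS = {"quantity", "value"}  # period.quantity, amount.value

def assemble_answers(rows: list[dict]) -> dict:
    """Group rows by parent question; rows with a Property column
    become entries in a sub-dict, numeric sub-fields are coerced."""
    out: dict = {}
    for row in rows:
        question, prop, answer = row["Question"], row.get("Property"), row["Answer"]
        if not prop:
            out[question] = answer  # plain row: answer stays a string
            continue
        sub = out.setdefault(question, {})
        if prop in NUMERIC_SUBFIELDS:
            answer = float(answer) if "." in answer else int(answer)
        sub[prop] = answer
    return out

rows = [
    {"Question": "How long is the contract?", "Property": "mode", "Answer": "duration"},
    {"Question": "How long is the contract?", "Property": "quantity", "Answer": "6"},
    {"Question": "How long is the contract?", "Property": "unit", "Answer": "months"},
]
print(assemble_answers(rows))
# {'How long is the contract?': {'mode': 'duration', 'quantity': 6, 'unit': 'months'}}
```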

Richer example: list, structured, and computable values

## Services with computed total

**Answers:**

| Question        | Property | Answer        |
|-----------------|----------|---------------|
| Customer        | name     | Acme AB       |
| Customer        | email    | legal@acme.se |
| Contract period | mode     | duration      |
| Contract period | quantity | 12            |
| Contract period | unit     | months        |
| Services        | 0.name   | Support       |
| Services        | 0.price  | 1000          |
| Services        | 1.name   | Training      |
| Services        | 1.price  | 500           |

**Variables:**
- total_price = 1500

**Document contains:**
- "Acme AB"
- "Support"
- "Training"

Use this pattern when you want one scenario to prove several things at once: nested values were assembled correctly, list items were accepted in order, and the computable result matched the expected total.

Session context overrides

Use With changes after Session context to override specific session-context variables:

**Session context:** org_context

**With changes:**

| Variable  | Value |
|-----------|-------|
| org.name  | Acme  |

With changes always applies to the most recent source marker:

  • If the latest source marker is Answers, it overrides answers.
  • If the latest source marker is Session context, it overrides session context.
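One way to picture the "most recent source marker" rule (a sketch under the assumption that markers arrive as an ordered list, not the actual parser):

```python
def apply_markers(markers: list[tuple[str, dict]]) -> tuple[dict, dict]:
    """Walk scenario markers in order. 'With changes' patches whichever
    source marker (Answers or Session context) appeared last."""
    answers: dict = {}
    context: dict = {}
    last_target = None
    for kind, payload in markers:
        if kind == "Answers":
            answers.update(payload)
            last_target = answers
        elif kind == "Session context":
            context.update(payload)
            last_target = context
        elif kind == "With changes":
            if last_target is None:
                raise ValueError("'With changes' needs a preceding source marker")
            last_target.update(payload)
    return answers, context

markers = [
    ("Answers", {"Include NDA": "No"}),
    ("Session context", {"org.name": "Initrode AB"}),
    ("With changes", {"org.name": "Acme"}),  # patches session context, not answers
]
print(apply_markers(markers))
# ({'Include NDA': 'No'}, {'org.name': 'Acme'})
```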

Defaults (optional)

When many scenarios share the same base answers or session-context values, you can extract them into a Defaults section to avoid repetition. This section uses H2 subsections to define named sets, each containing a two-column Markdown table.

# Defaults

## base

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |
| Include NDA | No       |

# Scenarios

## Without NDA

**Answers:** base

**Not relevant questions:**
- NDA term

## With NDA

**Answers:** base

**With changes:**

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |

**Relevant questions:**
- NDA term

Inheritance

A child answer set can inherit from a parent by putting the parent name in parentheses:

## with_nda (base)

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |

The child inherits all answers from the parent and overrides only the ones listed. The same inheritance mechanism can also be used for session-context sets.
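The merge semantics amount to "parent values first, child rows override" — sketched here under the assumption that each named set stores an optional parent name plus its own rows (illustrative only):

```python
def resolve_set(name: str, defaults: dict) -> dict:
    """Resolve a named defaults set. Headings like 'with_nda (base)'
    declare a parent; the child overrides only the rows it lists."""
    parent_name, rows = defaults[name]
    merged = dict(resolve_set(parent_name, defaults)) if parent_name else {}
    merged.update(rows)
    return merged

defaults = {
    "base": (None, {"Client Name": "Acme Inc", "Include NDA": "No"}),
    "with_nda": ("base", {"Include NDA": "Yes"}),
}
print(resolve_set("with_nda", defaults))
# {'Client Name': 'Acme Inc', 'Include NDA': 'Yes'}
```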

Assertion Types

| Marker                    | Description                                                       |
|---------------------------|-------------------------------------------------------------------|
| Relevant questions        | These questions must be asked during the interview                |
| Not relevant questions    | These questions must NOT be asked                                 |
| All relevant questions    | Exactly these questions (and no others) must be asked             |
| Document contains         | The rendered document must contain these strings                  |
| Document excludes         | The rendered document must NOT contain these strings              |
| Recommends                | These template IDs must appear in recommendations                 |
| Does not recommend        | These template IDs must NOT appear in recommendations             |
| All recommendations       | Exactly these template IDs (and no others) must be recommended    |
| Variables                 | Variables must have these exact values (format: key = "value")    |
| Has attachment            | An appendix with this alias must be present                       |
| Doesn't have attachment   | An appendix with this alias must NOT be present                   |
| All attachments           | Exactly these appendix aliases (and no others) must be present    |
| Attachment "name" contains | The rendered appendix must contain these strings                 |
| Attachment "name" excludes | The rendered appendix must NOT contain these strings             |

Identifying questions in relevance assertions

Entries in Relevant questions, Not relevant questions, and All relevant questions lists are interpreted as question labels using the same prefix-anchored matching as answer tables. Write the question's visible label (or a unique prefix of it).

When a label is ambiguous or generic (e.g. several questions share the label "Motivering"), use the id: prefix to force interpretation as a question path: id:include_principer.

**Relevant questions:**
- Inkludera beredskap
- Beskriv det område som ska tryggas
- id:meddelande_hemligt_f_avtal

Localized Keywords

When your template sets locale in config.yml, write test specs using that locale's keywords. Keywords are matched case-insensitively. When a locale is set, only that locale's keywords are recognized — you cannot mix languages in the same test file.

| Purpose         | English                  | Svenska (sv)              | Deutsch (de)            | Français (fr)                     | Español (es)                     |
|-----------------|--------------------------|---------------------------|-------------------------|-----------------------------------|----------------------------------|
| Section heading | Defaults                 | Grundinställningar        | Standardwerte           | Valeurs par défaut                | Valores predeterminados          |
| Section heading | Scenarios                | Scenarion                 | Szenarien               | Scénarios                         | Escenarios                       |
| Source marker   | Answers                  | Svar                      | Antworten               | Réponses                          | Respuestas                       |
| Source marker   | Session context          | Sessionskontext           | Sitzungskontext         | Contexte de session               | Contexto de sesión               |
| Override marker | With changes             | Med ändringar             | Mit Änderungen          | Avec modifications                | Con cambios                      |
| Assertion       | Relevant questions       | Relevanta frågor          | Relevante Fragen        | Questions pertinentes             | Preguntas relevantes             |
| Assertion       | Not relevant questions   | Ej relevanta frågor       | Nicht relevante Fragen  | Questions non pertinentes         | Preguntas no relevantes          |
| Assertion       | All relevant questions   | Alla relevanta frågor     | Alle relevanten Fragen  | Toutes les questions pertinentes  | Todas las preguntas relevantes   |
| Assertion       | Document contains        | Dokumentet innehåller     | Dokument enthält        | Le document contient              | El documento contiene            |
| Assertion       | Document excludes        | Dokumentet innehåller inte | Dokument enthält nicht | Le document ne contient pas       | El documento no contiene         |
| Assertion       | Recommends               | Rekommenderar             | Empfiehlt               | Recommande                        | Recomienda                       |
| Assertion       | Does not recommend       | Rekommenderar inte        | Empfiehlt nicht         | Ne recommande pas                 | No recomienda                    |
| Assertion       | All recommendations      | Alla rekommendationer     | Alle Empfehlungen       | Toutes les recommandations        | Todas las recomendaciones        |
| Assertion       | Variables                | Variabler                 | Variablen               | Variables                         | Variables                        |
| Assertion       | Has attachment           | Har bilaga                | Hat Anlage              | A une annexe                      | Tiene anexo                      |
| Assertion       | Doesn't have attachment  | Har inte bilaga           | Hat keine Anlage        | N'a pas d'annexe                  | No tiene anexo                   |
| Assertion       | All attachments          | Alla bilagor              | Alle Anlagen            | Toutes les annexes                | Todos los anexos                 |
| Assertion       | Attachment "X" contains  | Bilaga "X" innehåller     | Anlage "X" enthält      | Annexe "X" contient               | Anexo "X" contiene               |
| Assertion       | Attachment "X" excludes  | Bilaga "X" innehåller inte | Anlage "X" enthält nicht | Annexe "X" ne contient pas      | Anexo "X" no contiene            |

Localized Column Names

Table column headers are recognized in all supported languages. The runner matches column headers case-insensitively.

Answer tables

| Purpose         | English (en) | Svenska (sv) | Deutsch (de) | Français (fr) | Español (es) |
|-----------------|--------------|--------------|--------------|---------------|--------------|
| Question column | Question     | Fråga        | Frage        | Question      | Pregunta     |
| Answer column   | Answer       | Svar         | Antwort      | Réponse       | Respuesta    |
| ID column       | Id           | Id           | Id           | Id            | Id           |
| Property column | Property     | Egenskap     | Eigenschaft  | Propriété     | Propiedad    |

The Answer column contains human-readable values as displayed in the interview (e.g. Yes, Ja, On-site support). Use the Value column instead when you need to write raw internal values (e.g. true, on_site). Both are recognized as the value column, but Answer is preferred for readability.

Session context tables

| Purpose      | English (en) | Svenska (sv) | Deutsch (de) | Français (fr) | Español (es) |
|--------------|--------------|--------------|--------------|---------------|--------------|
| Key column   | Variable     | Variabel     | Variable     | Variable      | Variable     |
| Value column | Value        | Värde        | Wert         | Valeur        | Valor        |

The Value column contains raw internal values (e.g. true/false, not Yes/No). Session context keys are always variable IDs in dotted notation.
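As a purely illustrative sketch of how dotted keys and raw values behave (the nested-dict expansion shown here is an assumption about the internal representation, not documented behavior of the runner):

```python
def parse_raw_value(text: str):
    """Session-context cells hold raw internal values: booleans are
    written true/false; everything else stays a literal string."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return text.strip()

def flatten_to_nested(flat: dict) -> dict:
    """Expand dotted keys like 'org.name' into nested dicts."""
    nested: dict = {}
    for dotted, value in flat.items():
        node = nested
        *parents, leaf = dotted.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_raw_value(value)
    return nested

print(flatten_to_nested({"org.name": "Initrode AB", "flags.procurement": "true"}))
# {'org': {'name': 'Initrode AB'}, 'flags': {'procurement': True}}
```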

Running Tests

# Test a specific template
vibe test my-template

# Test all templates that have test specs
vibe test

# Verbose output (show all assertion details)
vibe test my-template --verbose

# Run only matching scenarios (case-insensitive substring match)
vibe test my-template --scenario "basic"

# Run a scenario in a headed browser (see Visual Test Runner below)
vibe test my-template --scenario "My scenario name" --visual

Visual Test Runner

The --visual flag runs scenarios in a headed Playwright browser so you can watch the interview being filled in. This is useful for debugging layout issues, observing conditional field visibility, or verifying that the interview flow looks correct to an end user.

# Run a scenario visually
vibe test my-template --scenario "Basic contract" --visual

# Keep the browser open after the scenario finishes for manual inspection
vibe test my-template --scenario "Basic contract" --pause

--pause implies --visual. When paused, the browser stays open and the terminal waits for you to press Enter before continuing.

How it works

The visual runner starts a real Flask server in a background thread, launches a headed Chromium browser via Playwright, and navigates to the interview page. It then iterates through the answer table from the scenario, locating each field's widget on the page and filling it in (text inputs, radio buttons, selects, checkboxes, composite fields). After each field is filled, the runner waits for htmx to settle before proceeding.

If a field is not yet visible (e.g. it depends on another answer), the runner moves on and retries on the next round. It also navigates between tabs, pages, and accordion sections automatically to reach fields on other pages.

After all answers are applied, the runner collects results (relevant questions, rendered document content, recommendations) and checks them against the scenario's assertions, just like the headless runner.

Limitations

  • Requires Playwright: You need playwright installed with Chromium (playwright install chromium).
  • One scenario at a time: --visual requires --scenario to select a single scenario.
  • Variables assertion: The Variables assertion is not available in visual mode since the runner cannot inspect server-side session state directly.

Exit Codes

| Code | Meaning                                         |
|------|-------------------------------------------------|
| 0    | All scenarios passed (or no test specs found)   |
| 1    | One or more scenarios failed                    |
| 2    | Setup error (e.g., could not create Flask app)  |

Tips

  • Start simple: Write self-contained scenarios with inline answer tables. Each scenario should be readable on its own.
  • Extract defaults when it pays off: Once you have many scenarios sharing the same base answers, move the common answers into a Defaults section and reference them.
  • Test both directions: Assert both what should be relevant AND what should not be relevant. This catches questions that accidentally appear (or disappear) when conditions change.
  • Quote strings in Document contains/excludes assertions. The parser strips surrounding quotes so you can include phrases with special characters.
  • Use labels for readability: In both answer tables and relevance assertions, prefer question labels over internal IDs. Use the id: prefix only when labels are ambiguous or generic.
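The quote-stripping behavior mentioned in the tips can be pictured with a tiny sketch (illustrative only):

```python
def strip_quotes(item: str) -> str:
    """Remove one pair of surrounding double quotes, if present,
    so assertion strings can contain special characters safely."""
    item = item.strip()
    if len(item) >= 2 and item[0] == item[-1] == '"':
        return item[1:-1]
    return item

print(strip_quotes('"Acme Inc"'))  # Acme Inc
print(strip_quotes("plain text"))  # plain text
```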

Example Patterns

Component-driven branch

Use paired scenarios to prove that an optional component appears only when its gate is enabled:

## Without confidentiality clause

**Answers:**

| Question                 | Answer |
|--------------------------|--------|
| Include confidentiality? | No     |

**Document excludes:**
- "Confidential information"

## With confidentiality clause

**Answers:**

| Question                 | Answer |
|--------------------------|--------|
| Include confidentiality? | Yes    |

**Document contains:**
- "Confidential information"

Session context override

Use a shared context set plus scenario-specific overrides when one template behaves differently by tenant or organization:

# Defaults

## org_context

| Variable         | Value      |
|------------------|------------|
| organization.name| Example AB |
| flags.procurement| true       |

# Scenarios

## Procurement enabled

**Session context:** org_context

**Document contains:**
- "Example AB"

## Procurement disabled

**Session context:** org_context

**With changes:**

| Variable          | Value |
|-------------------|-------|
| flags.procurement | false |

**Document excludes:**
- "procurement"

Testing computed outputs explicitly

When a branch depends on a computable variable, assert both the rendered text and the internal variable value:

## Volume discount applies

**Answers:**

| Question | Answer |
|----------|--------|
| Quantity | 25     |
| Unit fee | 100    |

**Variables:**
- discount_amount = 250
- total_price = 2250

**Document contains:**
- "2250"

Testing appendices

When your template generates appendices via appendix(), you can assert their existence and content. The appendix name used in assertions is the alias passed to appendix() in your template (e.g., appendix('pricing_schedule', alias='main_pricing') uses main_pricing).

Existence checks verify whether an appendix was generated:

## Enterprise contract includes pricing appendix

**Answers:**

| Question      | Answer     |
|---------------|------------|
| Contract type | Enterprise |

**Has attachment:** main_pricing

**Document contains:**
- "Appendix A: Pricing Schedule"

## Basic contract has no appendix

**Answers:**

| Question      | Answer |
|---------------|--------|
| Contract type | Basic  |

**Doesn't have attachment:** main_pricing

**Document excludes:**
- "Appendix"

Content checks verify text inside a specific appendix, using the Attachment "alias" contains and Attachment "alias" excludes syntax:

## Pricing appendix shows volume discount

**Answers:**

| Question         | Answer     |
|------------------|------------|
| Contract type    | Enterprise |
| Volume discount? | Yes        |

**Has attachment:** main_pricing

**Attachment "main_pricing" contains:**
- "Volume Discount"
- "10% for orders over 100 units"

**Attachment "main_pricing" excludes:**
- "No discounts available"

Exact attachment list verifies that only specific appendices are generated and no others:

## Full contract has both appendices

**All attachments:**
- main_pricing
- sla_appendix

These assertions can be freely combined with all other assertion types in the same scenario.

See Also