# Template Testing
Template testing lets you define expected outcomes for your VIBE template and verify them automatically. Instead of manually clicking through interviews, you write test scenarios in a Markdown file and run them from the command line.
## Quick Start

Create a file called `tests.md` next to your template's `config.yml`:
```markdown
# Scenarios

## Basic contract without NDA

**Answers:**

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |
| Include NDA | No       |

**Relevant questions:**
- Client Name
- Include NDA

**Not relevant questions:**
- NDA term

**Document contains:**
- "Acme Inc"

**Document excludes:**
- "Confidentiality"
```
Run it:

```shell
vibe test my-template
```
Each scenario is self-contained: it lists the answers to feed the interview and the assertions to check.
## Running Behavioral Specs via Pytest (JUnit XML reporting)

If you need CI-friendly test reports, you can run the same behavioral scenarios through pytest instead of `vibe test`.
Why use this mode:

- Pytest can emit `--junitxml`, which many CI systems and report tools ingest directly.
- You can still target one template at a time (`--vibe-tests my-template`) or all configured templates (`--vibe-tests all`).
- You can narrow to one scenario name with `--vibe-test-scenario "Scenario name"`.
Examples:

```shell
# One template -> JUnit XML output
pytest --vibe-tests beredskapsavtal --junitxml=report.xml

# One template + one scenario
pytest --vibe-tests beredskapsavtal --vibe-test-scenario "Nivå 3 med säkerhetsprövning" --junitxml=report.xml
```
`vibe test` is still the quickest local CLI workflow. The pytest mode is mainly for reporting pipelines and test-result aggregation.
## Test File Location

By default, `vibe test` looks for `tests.md` next to your template's `config.yml`. The location can be overridden in `config.yml`.
## Format Reference
A test spec file contains a Scenarios section (required) and optionally a Defaults section for reducing repetition across many scenarios.
### Scenarios
The Scenarios section is an H1 heading containing H2 subsections, one per test scenario. Each scenario uses bold markers to specify answers and assertions.
#### Self-contained scenarios
The simplest pattern: each scenario lists all its answers in an inline table.
```markdown
# Scenarios

## Greeting includes client name

> Optional description (blockquote)

**Answers:**

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |

**Relevant questions:**
- Client Name

**Document contains:**
- "Acme Inc"
```
Answer tables use a `Question` column (or the localized equivalent, e.g. `Fråga` in Swedish) as the key column. Write the question's visible label — the text the user sees in the interview. In the `Answer` column, write the answer as displayed to the user (for example `Yes`/`No` or `Ja`/`Nej` for booleans, option labels like `On-site support` for selects).
Labels are matched using prefix-anchored matching: the value you write in the Question column must match the beginning of the question's full label. Matching is case-insensitive and normalizes whitespace. For example, if the full label is "Krav på datering av ändringar", you can write just "Krav på datering" in the test spec. However, "av ändringar" (a non-prefix substring) will not match. If a prefix matches more than one question label, the test runner raises an error — write a longer prefix to disambiguate. Exact matches are always tried first and take priority over prefix matching.
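The matching rules can be sketched roughly as follows. This is a simplified illustration of the behavior described above, not the runner's actual implementation:

```python
def match_label(written: str, labels: list[str]) -> str:
    """Match a test-spec value against full question labels.

    Exact matches win; otherwise the written value must be a prefix of
    exactly one label. Matching is case-insensitive and
    whitespace-normalized.
    """
    def norm(s: str) -> str:
        return " ".join(s.split()).lower()

    w = norm(written)
    # Exact matches are tried first and take priority.
    for label in labels:
        if norm(label) == w:
            return label
    # Fall back to prefix-anchored matching.
    hits = [label for label in labels if norm(label).startswith(w)]
    if len(hits) > 1:
        raise ValueError(f"ambiguous prefix: {written!r}")
    if not hits:
        raise KeyError(f"no label matches: {written!r}")
    return hits[0]

labels = ["Krav på datering av ändringar", "Krav på signering"]
match_label("Krav på datering", labels)  # → "Krav på datering av ändringar"
```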
You can also add an optional `Id` column. When present and non-empty for a row, the runner uses the Id (the question's internal path from `config.yml`) instead of the label. Ids are guaranteed unique and often shorter, but labels are more readable and survive internal restructuring. A single table can mix both approaches: rows with a non-empty `Id` use it, while other rows fall back to the `Question` column.
```markdown
| Question    | Id          | Answer |
|-------------|-------------|--------|
| Client Name | client_name | Acme   |
| Include NDA |             | Yes    |
```
The Answers marker can be followed by either:

- An inline table (as above) — the scenario is fully self-contained
- A reference name (e.g. `**Answers:** base`) — refers to a named answer set in the Defaults section
#### Session context
Use Session context to seed session-scoped variables (for example values normally coming from dev_settings.session_context in app config).
Session context tables use variable IDs in flattened dotted notation (for example `lager2.namn`) and raw internal values rather than user-displayed labels. For booleans, write `true`/`false` (not `Yes`/`No`). Use the `Variable` column for keys and the `Value` column for values.
To reference a named set defined in Defaults:

```markdown
**Session context:** org_context
```

Or inline:

```markdown
**Session context:**

| Variable  | Value       |
|-----------|-------------|
| org.name  | Initrode AB |
| org.orgnr | 123456-7890 |
```
The Session context marker can be followed by either:

- An inline table (variable/value pairs)
- A reference name (e.g. `**Session context:** org_context`) that points to a named set in Defaults
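Since session-context keys are flattened dotted paths, the runner presumably expands them into nested values at some point. A minimal sketch of that expansion (illustrative only; the actual internals may differ):

```python
def unflatten(flat: dict) -> dict:
    """Expand dotted keys like 'org.name' into nested dicts."""
    nested: dict = {}
    for dotted, value in flat.items():
        node = nested
        *parents, leaf = dotted.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

unflatten({"org.name": "Initrode AB", "org.orgnr": "123456-7890"})
# → {"org": {"name": "Initrode AB", "orgnr": "123456-7890"}}
```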
#### Answer overrides
Use With changes to override individual answers on top of a referenced defaults set:

```markdown
**Answers:** base

**With changes:**

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |
```
#### Complex question types
Questions with structured sub-properties — period, amount, and structured types — use a Property column to name the specific sub-field:
```markdown
**Answers:**

| Question                      | Property | Answer   |
|-------------------------------|----------|----------|
| How long is the contract?     | mode     | duration |
| How long is the contract?     | quantity | 6        |
| How long is the contract?     | unit     | months   |
| How large is the project fee? | value    | 5000     |
| How large is the project fee? | currency | USD      |
```
The test runner groups rows by parent question and assembles them into the dict each handler expects. Numeric sub-fields (quantity for period, value for amount) are coerced to numbers automatically; all other sub-fields remain strings.
This works in inline answer tables, default sets, and With changes overrides. You can mix rows with and without the Property column in the same table.
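The grouping-and-coercion step can be sketched like this. It is a simplified model, not the runner's real code, and the set of numeric sub-fields and the row format are assumptions:

```python
def coerce_number(s: str):
    """Coerce a numeric string to int, or to float if it has a fraction."""
    f = float(s)
    return int(f) if f.is_integer() else f

def assemble(rows):
    """Group (question, property, answer) rows into per-question values."""
    NUMERIC = {"quantity", "value"}  # period/amount numeric sub-fields (assumed)
    answers = {}
    for question, prop, answer in rows:
        if prop is None:
            answers[question] = answer  # plain scalar answer
        else:
            sub = answers.setdefault(question, {})
            sub[prop] = coerce_number(answer) if prop in NUMERIC else answer
    return answers

rows = [
    ("How long is the contract?", "mode", "duration"),
    ("How long is the contract?", "quantity", "6"),
    ("How long is the contract?", "unit", "months"),
    ("Client Name", None, "Acme Inc"),
]
assemble(rows)
# → {"How long is the contract?": {"mode": "duration", "quantity": 6, "unit": "months"},
#    "Client Name": "Acme Inc"}
```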
#### Richer example: list, structured, and computable values
```markdown
## Services with computed total

**Answers:**

| Question        | Property | Answer        |
|-----------------|----------|---------------|
| Customer        | name     | Acme AB       |
| Customer        | email    | legal@acme.se |
| Contract period | mode     | duration      |
| Contract period | quantity | 12            |
| Contract period | unit     | months        |
| Services        | 0.name   | Support       |
| Services        | 0.price  | 1000          |
| Services        | 1.name   | Training      |
| Services        | 1.price  | 500           |

**Variables:**
- total_price = 1500

**Document contains:**
- "Acme AB"
- "Support"
- "Training"
```
Use this pattern when you want one scenario to prove several things at once: nested values were assembled correctly, list items were accepted in order, and the computable result matched the expected total.
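The `0.name`-style properties suggest that list items are assembled by integer index. A sketch of that assembly (illustrative only; assumes the runner keys list items by the numeric prefix):

```python
def assemble_list(rows):
    """Turn ('Services', '0.name', 'Support')-style rows into a list of dicts."""
    items: dict[int, dict] = {}
    for _question, prop, answer in rows:
        index, field = prop.split(".", 1)
        items.setdefault(int(index), {})[field] = answer
    # Emit in index order so list positions are preserved.
    return [items[i] for i in sorted(items)]

rows = [
    ("Services", "0.name", "Support"),
    ("Services", "0.price", "1000"),
    ("Services", "1.name", "Training"),
    ("Services", "1.price", "500"),
]
assemble_list(rows)
# → [{"name": "Support", "price": "1000"}, {"name": "Training", "price": "500"}]
```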
#### Session context overrides
Use With changes after Session context to override specific session-context variables:
```markdown
**Session context:** org_context

**With changes:**

| Variable | Value |
|----------|-------|
| org.name | Acme  |
```
With changes always applies to the most recent source marker:
- If the latest source marker is Answers, it overrides answers.
- If the latest source marker is Session context, it overrides session context.
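The "most recent source marker" rule amounts to remembering which table was updated last while folding a scenario's markers. A minimal sketch (marker names match the spec format; everything else is invented for illustration):

```python
def fold_markers(markers):
    """Fold (marker, table) pairs into answers and session context.

    'With changes' always updates whichever source marker came last.
    """
    state = {"Answers": {}, "Session context": {}}
    last = None
    for name, table in markers:
        if name in state:
            state[name].update(table)
            last = name
        elif name == "With changes":
            if last is None:
                raise ValueError("'With changes' needs a preceding source marker")
            state[last].update(table)
    return state

fold_markers([
    ("Session context", {"org.name": "Example AB"}),
    ("With changes", {"org.name": "Acme"}),
])["Session context"]
# → {"org.name": "Acme"}
```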
### Defaults (optional)
When many scenarios share the same base answers or session-context values, you can extract them into a Defaults section to avoid repetition. This section uses H2 subsections to define named sets, each containing a two-column Markdown table.
```markdown
# Defaults

## base

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |
| Include NDA | No       |

# Scenarios

## Without NDA

**Answers:** base

**Not relevant questions:**
- NDA term

## With NDA

**Answers:** base

**With changes:**

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |

**Relevant questions:**
- NDA term
```
#### Inheritance
A child answer set can inherit from a parent by putting the parent name in parentheses:
The child inherits all answers from the parent and overrides only the ones listed. The same inheritance mechanism can also be used for session-context sets.
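For example (the set name `with_nda` is invented for illustration; the parenthesized `(base)` names the parent set):

```markdown
# Defaults

## base

| Question    | Answer   |
|-------------|----------|
| Client Name | Acme Inc |
| Include NDA | No       |

## with_nda (base)

| Question    | Answer |
|-------------|--------|
| Include NDA | Yes    |
```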
## Assertion Types
| Marker | Description |
|---|---|
| Relevant questions | These questions must be asked during the interview |
| Not relevant questions | These questions must NOT be asked |
| All relevant questions | Exactly these questions (and no others) must be asked |
| Document contains | The rendered document must contain these strings |
| Document excludes | The rendered document must NOT contain these strings |
| Recommends | These template IDs must appear in recommendations |
| Does not recommend | These template IDs must NOT appear in recommendations |
| All recommendations | Exactly these template IDs (and no others) must be recommended |
| Variables | Variables must have these exact values (format: key = "value") |
| Has attachment | An appendix with this alias must be present |
| Doesn't have attachment | An appendix with this alias must NOT be present |
| All attachments | Exactly these appendix aliases (and no others) must be present |
| Attachment "name" contains | The rendered appendix must contain these strings |
| Attachment "name" excludes | The rendered appendix must NOT contain these strings |
### Identifying questions in relevance assertions
Entries in Relevant questions, Not relevant questions, and All relevant questions lists are interpreted as question labels using the same prefix-anchored matching as answer tables. Write the question's visible label (or a unique prefix of it).
When a label is ambiguous or generic (e.g. several questions share the label "Motivering"), use the `id:` prefix to force interpretation as a question path: `id:include_principer`.
```markdown
**Relevant questions:**
- Inkludera beredskap
- Beskriv det område som ska tryggas
- id:meddelande_hemligt_f_avtal
```
## Localized Keywords
When your template sets locale in config.yml, write test specs using that locale's keywords. Keywords are matched case-insensitively. When a locale is set, only that locale's keywords are recognized — you cannot mix languages in the same test file.
| Purpose | English (en) | Svenska (sv) | Deutsch (de) | Français (fr) | Español (es) |
|---|---|---|---|---|---|
| Section heading | Defaults | Grundinställningar | Standardwerte | Valeurs par défaut | Valores predeterminados |
| Section heading | Scenarios | Scenarion | Szenarien | Scénarios | Escenarios |
| Source marker | Answers | Svar | Antworten | Réponses | Respuestas |
| Source marker | Session context | Sessionskontext | Sitzungskontext | Contexte de session | Contexto de sesión |
| Override marker | With changes | Med ändringar | Mit Änderungen | Avec modifications | Con cambios |
| Assertion | Relevant questions | Relevanta frågor | Relevante Fragen | Questions pertinentes | Preguntas relevantes |
| Assertion | Not relevant questions | Ej relevanta frågor | Nicht relevante Fragen | Questions non pertinentes | Preguntas no relevantes |
| Assertion | All relevant questions | Alla relevanta frågor | Alle relevanten Fragen | Toutes les questions pertinentes | Todas las preguntas relevantes |
| Assertion | Document contains | Dokumentet innehåller | Dokument enthält | Le document contient | El documento contiene |
| Assertion | Document excludes | Dokumentet innehåller inte | Dokument enthält nicht | Le document ne contient pas | El documento no contiene |
| Assertion | Recommends | Rekommenderar | Empfiehlt | Recommande | Recomienda |
| Assertion | Does not recommend | Rekommenderar inte | Empfiehlt nicht | Ne recommande pas | No recomienda |
| Assertion | All recommendations | Alla rekommendationer | Alle Empfehlungen | Toutes les recommandations | Todas las recomendaciones |
| Assertion | Variables | Variabler | Variablen | Variables | Variables |
| Assertion | Has attachment | Har bilaga | Hat Anlage | A une annexe | Tiene anexo |
| Assertion | Doesn't have attachment | Har inte bilaga | Hat keine Anlage | N'a pas d'annexe | No tiene anexo |
| Assertion | All attachments | Alla bilagor | Alle Anlagen | Toutes les annexes | Todos los anexos |
| Assertion | Attachment "X" contains | Bilaga "X" innehåller | Anlage "X" enthält | Annexe "X" contient | Anexo "X" contiene |
| Assertion | Attachment "X" excludes | Bilaga "X" innehåller inte | Anlage "X" enthält nicht | Annexe "X" ne contient pas | Anexo "X" no contiene |
## Localized Column Names
Table column headers are recognized in all supported languages. The runner matches column headers case-insensitively.
### Answer tables
| Purpose | English (en) | Svenska (sv) | Deutsch (de) | Français (fr) | Español (es) |
|---|---|---|---|---|---|
| Question column | Question | Fråga | Frage | Question | Pregunta |
| Answer column | Answer | Svar | Antwort | Réponse | Respuesta |
| ID column | Id | Id | Id | Id | Id |
| Property column | Property | Egenskap | Eigenschaft | Propriété | Propiedad |
The `Answer` column contains human-readable values as displayed in the interview (e.g. `Yes`, `Ja`, `On-site support`). Use the `Value` column instead when you need to write raw internal values (e.g. `true`, `on_site`). Both are recognized as the value column, but `Answer` is preferred for readability.
### Session context tables
| Purpose | English (en) | Svenska (sv) | Deutsch (de) | Français (fr) | Español (es) |
|---|---|---|---|---|---|
| Key column | Variable | Variabel | Variable | Variable | Variable |
| Value column | Value | Värde | Wert | Valeur | Valor |
The `Value` column contains raw internal values (e.g. `true`/`false`, not `Yes`/`No`). Session context keys are always variable IDs in dotted notation.
## Running Tests

```shell
# Test a specific template
vibe test my-template

# Test all templates that have test specs
vibe test

# Verbose output (show all assertion details)
vibe test my-template --verbose

# Run only matching scenarios (case-insensitive substring match)
vibe test my-template --scenario "basic"

# Run a scenario in a headed browser (see Visual Test Runner below)
vibe test my-template --scenario "My scenario name" --visual
```
## Visual Test Runner

The `--visual` flag runs scenarios in a headed Playwright browser so you can watch the interview being filled in. This is useful for debugging layout issues, observing conditional field visibility, or verifying that the interview flow looks correct to an end user.
```shell
# Run a scenario visually
vibe test my-template --scenario "Basic contract" --visual

# Keep the browser open after the scenario finishes for manual inspection
vibe test my-template --scenario "Basic contract" --pause
```

`--pause` implies `--visual`. When paused, the browser stays open and the terminal waits for you to press Enter before continuing.
### How it works
The visual runner starts a real Flask server in a background thread, launches a headed Chromium browser via Playwright, and navigates to the interview page. It then iterates through the answer table from the scenario, locating each field's widget on the page and filling it in (text inputs, radio buttons, selects, checkboxes, composite fields). After each field is filled, the runner waits for htmx to settle before proceeding.
If a field is not yet visible (e.g. it depends on another answer), the runner moves on and retries on the next round. It also navigates between tabs, pages, and accordion sections automatically to reach fields on other pages.
After all answers are applied, the runner collects results (relevant questions, rendered document content, recommendations) and checks them against the scenario's assertions, just like the headless runner.
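The retry behavior for not-yet-visible fields is essentially a multi-round sweep over the answer table. A simplified model of that loop (not the actual runner, which also handles tab, page, and accordion navigation):

```python
def fill_all(fields, is_visible, fill, max_rounds=10):
    """Sweep pending fields repeatedly, filling whichever are visible.

    Fields hidden behind conditional logic become visible once the
    answers that gate them were filled in an earlier round.
    """
    pending = list(fields)
    for _ in range(max_rounds):
        still_pending = []
        for field in pending:
            if is_visible(field):
                fill(field)
            else:
                still_pending.append(field)
        if not still_pending:
            return True   # everything filled
        if len(still_pending) == len(pending):
            return False  # no progress this round: some fields never appeared
        pending = still_pending
    return False
```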
### Limitations

- **Requires Playwright:** You need `playwright` installed with Chromium (`playwright install chromium`).
- **One scenario at a time:** `--visual` requires `--scenario` to select a single scenario.
- **Variables assertion:** The `Variables` assertion is not available in visual mode since the runner cannot inspect server-side session state directly.
## Exit Codes
| Code | Meaning |
|---|---|
| 0 | All scenarios passed (or no test specs found) |
| 1 | One or more scenarios failed |
| 2 | Setup error (e.g., could not create Flask app) |
## Tips

- **Start simple:** Write self-contained scenarios with inline answer tables. Each scenario should be readable on its own.
- **Extract defaults when it pays off:** Once you have many scenarios sharing the same base answers, move the common answers into a Defaults section and reference them.
- **Test both directions:** Assert what should be relevant AND what should not be relevant. This catches accidental question visibility.
- **Quote strings** in Document contains/excludes assertions. The parser strips surrounding quotes, so you can include phrases with special characters.
- **Use labels for readability:** In both answer tables and relevance assertions, prefer question labels over internal IDs. Use the `id:` prefix only when labels are ambiguous or generic.
## Example Patterns
### Component-driven branch
Use paired scenarios to prove that an optional component appears only when its gate is enabled:
```markdown
## Without confidentiality clause

**Answers:**

| Question                 | Answer |
|--------------------------|--------|
| Include confidentiality? | No     |

**Document excludes:**
- "Confidential information"

## With confidentiality clause

**Answers:**

| Question                 | Answer |
|--------------------------|--------|
| Include confidentiality? | Yes    |

**Document contains:**
- "Confidential information"
```
### Session context override
Use a shared context set plus scenario-specific overrides when one template behaves differently by tenant or organization:
```markdown
# Defaults

## org_context

| Variable          | Value      |
|-------------------|------------|
| organization.name | Example AB |
| flags.procurement | true       |

# Scenarios

## Procurement enabled

**Session context:** org_context

**Document contains:**
- "Example AB"

## Procurement disabled

**Session context:** org_context

**With changes:**

| Variable          | Value |
|-------------------|-------|
| flags.procurement | false |

**Document excludes:**
- "procurement"
```
### Testing computed outputs explicitly
When a branch depends on a computable variable, assert both the rendered text and the internal variable value:
```markdown
## Volume discount applies

**Answers:**

| Question | Answer |
|----------|--------|
| Quantity | 25     |
| Unit fee | 100    |

**Variables:**
- discount_amount = 250
- total_price = 2250

**Document contains:**
- "2250"
```
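The expected variable values in this scenario follow from the template's pricing arithmetic. Assuming the template applies a 10% volume discount (the discount rule itself is hypothetical here, not taken from the source), the numbers are consistent:

```python
quantity, unit_fee = 25, 100
gross = quantity * unit_fee   # 2500
discount = int(gross * 0.10)  # assumed 10% volume discount → 250
total = gross - discount      # 2250, matching total_price above
```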
### Testing appendices

When your template generates appendices via `appendix()`, you can assert their existence and content. The appendix name used in assertions is the alias passed to `appendix()` in your template (e.g., `appendix('pricing_schedule', alias='main_pricing')` uses `main_pricing`).
Existence checks verify whether an appendix was generated:
```markdown
## Enterprise contract includes pricing appendix

**Answers:**

| Question      | Answer     |
|---------------|------------|
| Contract type | Enterprise |

**Has attachment:** main_pricing

**Document contains:**
- "Appendix A: Pricing Schedule"
```
```markdown
## Basic contract has no appendix

**Answers:**

| Question      | Answer |
|---------------|--------|
| Contract type | Basic  |

**Doesn't have attachment:** main_pricing

**Document excludes:**
- "Appendix"
```
Content checks verify text inside a specific appendix, using the `Attachment "alias" contains` and `Attachment "alias" excludes` syntax:
```markdown
## Pricing appendix shows volume discount

**Answers:**

| Question         | Answer     |
|------------------|------------|
| Contract type    | Enterprise |
| Volume discount? | Yes        |

**Has attachment:** main_pricing

**Attachment "main_pricing" contains:**
- "Volume Discount"
- "10% for orders over 100 units"

**Attachment "main_pricing" excludes:**
- "No discounts available"
```
Exact attachment list verifies that only specific appendices are generated and no others:

```markdown
**All attachments:**
- main_pricing
```
These assertions can be freely combined with all other assertion types in the same scenario.