AI Assistant Architecture

Navigation guide for VIBE's AI-assisted interview system. Points you to the right source files for understanding implementation details.

See also:

  • core.md - Core VIBE engine
  • components.md - Component system
  • frontend.md - Frontend UI patterns
  • llm-providers.md - LLM provider abstraction (base classes, available providers, replay system)
  • review.md - Document compliance review

1. SYSTEM OVERVIEW

The Assistant feature enables LLM-powered document generation through conversational interviews. LLMs use tools to ask questions, collect information, and incrementally build draft documents.

Core Architecture:

  • Service-Oriented: Decomposed into focused collaborators (see Section 2)
  • Provider Abstraction: Multiple LLM backends via common interface (see llm-providers.md)
  • Streaming-First: Real-time UI updates via Server-Sent Events (SSE)
  • Tool-Based: LLM controls interview via function calls

Interview Modes:

  1. standard -- Traditional form-first (no assistant)
  2. assistant -- Assistant drives entire interview

Request Flow:

Web Routes (HTTP/SSE) -> AssistantService (facade) -> LLM Providers -> External APIs
                              |                          |
                     Service Components          StreamTranslator -> SSE Events -> Browser
                              |
                      Session Storage

2. SERVICE DECOMPOSITION

2.1 Service Architecture

Location: vibe/assistant/services/

AssistantService (facade - thin coordinator)
    |
    +-- ConversationStateStore (session read/write, history serialization)
    +-- ProviderFactory (endpoint resolution, dev overrides, playback)
    +-- DraftManager (block IDs, storage, draft rendering)
    +-- TurnOrchestrator (retry logic, auto-reply, tool call validation)
    +-- QuestionSessionState (question state tracking across components)
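
As a minimal sketch, the facade-plus-collaborators shape above might look like the following. Only the class names come from this document; the methods, fields, and return values are illustrative assumptions, not the real interfaces:

```python
from dataclasses import dataclass, field

class ConversationStateStore:
    """Stand-in for the session read/write collaborator (real interface differs)."""
    def __init__(self):
        self._history = []

    def append(self, message: dict) -> None:
        self._history.append(message)

    def history(self) -> list:
        return list(self._history)

@dataclass
class AssistantService:
    """Thin facade: delegates each concern to a focused collaborator."""
    state_store: ConversationStateStore = field(default_factory=ConversationStateStore)

    def process_user_input(self, text: str) -> dict:
        # Store the message, then hand off to the orchestrator (elided here).
        self.state_store.append({"role": "user", "content": text})
        return {"turn": len(self.state_store.history())}

service = AssistantService()
turn = service.process_user_input("Hello")
```

The facade stays thin because every non-trivial concern (retries, drafts, provider selection) lives in a collaborator it merely wires together.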

2.2 Service Files

Component                  Location
AssistantService (facade)  vibe/assistant/services/assistant_service.py
ConversationStateStore     vibe/assistant/services/conversation_state_store.py
ProviderFactory            vibe/assistant/services/provider_factory.py
DraftManager               vibe/assistant/services/draft_manager.py
TurnOrchestrator           vibe/assistant/services/turn_orchestrator.py
QuestionSessionState       vibe/assistant/services/question_session_state.py
SessionAccessor            vibe/assistant/services/session_accessor.py
StreamTranslator           vibe/assistant/services/stream_translator.py
ReferenceDocuments         vibe/assistant/services/reference_documents.py
VectorStoreManager         vibe/assistant/services/vector_store_manager.py

2.3 ModelContextManager Components

Location: vibe/assistant/

Component            Location                                 Purpose
HistoryPruner        vibe/assistant/history_pruner.py         Prune history for checkpoint/consolidated strategies
UserInputProcessor   vibe/assistant/user_input_processor.py   Security markers, input transformation
SystemPromptBuilder  vibe/assistant/system_prompt_builder.py  Build system prompts from templates
ModelContextManager  vibe/assistant/context.py                Orchestrates the above components

3. HOW IT WORKS

3.1 Initialization

Each assistant conversation begins with a system prompt combined with an initial user prompt from {% assistant %}...{% endassistant %} tags in the template. VIBE resolves template variables through probing before the conversation begins. Readiness is tracked via update_assistant_readiness() in vibe/assistant/extension_support.py.

3.2 Conversation Loop

User message -> Service stores -> LLM processes history + tools -> Tool calls
    |                                                                   |
User answers <- Browser renders <- SSE events <- StreamTranslator translates

Key Insight: LLM constructs the interview UI dynamically by calling tools that render widgets.

3.3 Streaming with SSE

Location: vibe/web/sse.py (canonical SSE formatting)

Pipeline: LLM Provider -> StreamChunk objects -> StreamTranslator -> SSE events -> Browser

Chunk Types: TEXT, TOOL_CALL_START, TOOL_ARGUMENT_CHUNK, TOOL_ARGUMENTS_COMPLETE, TOOL_CALL_END
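
The chunk types above can be pictured as an enum plus a small record. This is an assumed shape for illustration only, not the actual StreamChunk definition in the codebase:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class ChunkType(Enum):
    """The five chunk types listed above; enum values are illustrative."""
    TEXT = auto()
    TOOL_CALL_START = auto()
    TOOL_ARGUMENT_CHUNK = auto()
    TOOL_ARGUMENTS_COMPLETE = auto()
    TOOL_CALL_END = auto()

@dataclass
class StreamChunk:
    """Assumed shape of a chunk flowing from provider to StreamTranslator."""
    type: ChunkType
    content: str = ""                   # text delta or partial JSON arguments
    tool_call_id: Optional[str] = None  # set only for tool-call chunks

chunk = StreamChunk(ChunkType.TEXT, "Hello")
```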

3.4 Session State

Everything stored in Flask session under session["assistants"][assistant_name]:

  • history -- Full conversation (messages + tool calls)
  • draft_blocks -- Document sections
  • tool_call_ids -- Widget ID to tool call ID mapping
  • pending_tool_outputs -- Queued tool results
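
An illustrative layout of that session entry (the four keys come from the list above; the assistant name and the value shapes are hypothetical):

```python
# Hypothetical per-assistant session entry; "project_brief" and all values
# are invented for illustration. Only the four keys are documented.
session = {
    "assistants": {
        "project_brief": {
            "history": [
                {"role": "user", "content": "We need a mobile app."},
            ],
            "draft_blocks": {"b1": "## Overview\n..."},
            "tool_call_ids": {"widget_7": "call_abc123"},
            "pending_tool_outputs": [],
        }
    }
}

state = session["assistants"]["project_brief"]
```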

4. MESSAGE TYPE SYSTEM

4.1 Message Classes

Location: vibe/assistant/structures.py

Standardized message format, converted to/from provider-specific formats at boundaries.

  • SystemMessage, UserMessage (supports auto-reply flag), AssistantMessage (with optional tool calls), ToolResult
  • Tool, ToolParameter (with JSON Schema validation), ToolCall
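
Simplified stand-ins for a few of these classes, to show the general shape; the real definitions in vibe/assistant/structures.py carry more fields and validation, and the auto-reply flag name here is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict

@dataclass
class UserMessage:
    content: str
    auto_reply: bool = False  # auto-reply flag; exact field name assumed

@dataclass
class AssistantMessage:
    content: str
    tool_calls: list = field(default_factory=list)

@dataclass
class ToolResult:
    tool_call_id: str
    content: str

msg = AssistantMessage("One question:", [ToolCall("c1", "ask_question", {"mode": "generated"})])
```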

4.2 Message Conversion (Visitor Pattern)

Location: vibe/providers/llm/message_converter.py

MessageConverter base class uses @singledispatchmethod for type-based dispatch. Each provider defines its own converter. See llm-providers.md Section 4.
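
A sketch of that dispatch pattern, using simplified message classes (the real base class and per-provider subclasses differ):

```python
from dataclasses import dataclass
from functools import singledispatchmethod

@dataclass
class UserMessage:
    content: str

@dataclass
class SystemMessage:
    content: str

class MessageConverter:
    """Visitor-style converter: one handler registered per message type."""

    @singledispatchmethod
    def convert(self, message) -> dict:
        raise TypeError(f"Unsupported message type: {type(message).__name__}")

    @convert.register
    def _(self, message: UserMessage) -> dict:
        # A provider subclass would emit its own wire format here.
        return {"role": "user", "content": message.content}

    @convert.register
    def _(self, message: SystemMessage) -> dict:
        return {"role": "system", "content": message.content}

converter = MessageConverter()
converted = converter.convert(UserMessage("hi"))  # {'role': 'user', 'content': 'hi'}
```

Unknown message types fail loudly at the base method, which keeps provider-format drift visible.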

4.3 Serialization

message_to_dict() and message_from_dict() for session storage.
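
A minimal round-trip sketch of what such a pair might do: tag each dict with its type so deserialization can pick the right class. This is an assumption about the mechanism, not the actual implementation:

```python
from dataclasses import asdict, dataclass

@dataclass
class UserMessage:
    content: str
    auto_reply: bool = False

# Hypothetical registry; the real code may dispatch differently.
MESSAGE_TYPES = {"UserMessage": UserMessage}

def message_to_dict(message) -> dict:
    """Serialize a message with a type tag for session storage."""
    return {"type": type(message).__name__, **asdict(message)}

def message_from_dict(data: dict):
    """Rebuild the concrete message class named by the type tag."""
    data = dict(data)
    cls = MESSAGE_TYPES[data.pop("type")]
    return cls(**data)

original = UserMessage("hello", auto_reply=True)
restored = message_from_dict(message_to_dict(original))
```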

5. TOOL SYSTEM

Location: Tool definitions in vibe/providers/llm/tools.py, handlers in vibe/assistant/services/stream_translator.py

Available Tools:

  • ask_question -- Unified interface for all question types (predefined or generated)
  • insert_blocks -- Insert blocks at position (default: append at end)
  • upsert_block_contents -- Create or update single block
  • delete_blocks -- Remove blocks
  • clear_all_blocks -- Remove all blocks (for full rewrite)
  • finalize -- Complete interview

ask_question Modes:

  • predefined: Uses template question definitions (requires question_id)
  • generated: Creates ad-hoc questions (requires question_text, optional type)

Tool Call Lifecycle:

  1. LLM calls tool -> Provider yields chunks (START, ARGUMENT_CHUNK, ARGUMENTS_COMPLETE, END)
  2. StreamTranslator processes complete arguments -> Renders widget via handler
  3. Yields SSE form_command -> Browser displays widget
  4. User submits -> Service maps widget ID to tool call ID -> Creates ToolResult
  5. ToolResult added to conversation history for next LLM turn

Tool Call ID Mapping: Session stores bidirectional mapping between widget IDs and tool call IDs.
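
The bidirectional mapping can be sketched as two mirrored dicts (the storage layout is an assumption; only the mapping itself is described above):

```python
class ToolCallIdMap:
    """Bidirectional widget-ID <-> tool-call-ID lookup, as two mirrored dicts."""

    def __init__(self):
        self._by_widget = {}
        self._by_call = {}

    def register(self, widget_id: str, tool_call_id: str) -> None:
        self._by_widget[widget_id] = tool_call_id
        self._by_call[tool_call_id] = widget_id

    def tool_call_for(self, widget_id: str) -> str:
        # Used when a form submission must become a ToolResult.
        return self._by_widget[widget_id]

    def widget_for(self, tool_call_id: str) -> str:
        return self._by_call[tool_call_id]

ids = ToolCallIdMap()
ids.register("widget_7", "call_abc123")
```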

5.1 JSON Action Fallback

When an endpoint is configured with tools: false, models output JSON actions in code blocks:

{"action": "ASK_QUESTION", "mode": "generated", "question_text": "What is your name?"}

The streaming JSON parser extracts these and converts them to standard StreamChunk tool call events, so the rest of the pipeline works identically.
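
A batch-mode sketch of the extraction step. The real parser works incrementally on streamed chunks; this simplified version just shows how a fenced JSON action could be pulled out and normalized into a tool-call-like dict:

```python
import json
import re

# Three backticks, written as a regex repetition to match a code fence.
FENCE = "`{3}"
ACTION_BLOCK = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)

def extract_actions(text: str) -> list:
    """Find JSON action objects in fenced code blocks and normalize them."""
    actions = []
    for match in ACTION_BLOCK.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
        if "action" in payload:
            name = payload.pop("action").lower()
            actions.append({"name": name, "arguments": payload})
    return actions

fence = "`" * 3
reply = (
    "Let me ask:\n" + fence + "json\n"
    '{"action": "ASK_QUESTION", "mode": "generated", "question_text": "What is your name?"}\n'
    + fence
)
actions = extract_actions(reply)  # [{'name': 'ask_question', 'arguments': {...}}]
```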

6. STREAMING ARCHITECTURE

Location: vibe/assistant/services/stream_translator.py

StreamTranslator Responsibilities:

  1. Process tool calls from chunks
  2. Render UI widgets (HTML generation)
  3. Manage streaming state (current block, question count)
  4. Coordinate streaming targets (chat bubble vs draft blocks)

SSE Event Types:

Event              Purpose
chunk              Stream text content to current target
form_command       Render/update question widget
form_start         Initial form structure
block_command      Draft block operations
set_stream_target  Redirect subsequent chunks to element
assistant_bubble   Create assistant message bubble
thinking           Display model reasoning
finalize           Enable finalize button
close              End stream
error              Display error message
assistant_status   Status updates (tft, retrying, adjusting)

Key Design: Each question generates exactly ONE form_command event to prevent form corruption. set_stream_target events redirect the streaming cursor to different DOM locations (chat or draft blocks).
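
A minimal SSE-formatting helper in the spirit of vibe/web/sse.py (the actual helpers there may differ; the event names come from the table above):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Serialize one Server-Sent Event: an event name plus a JSON data line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# Redirect the streaming cursor to a draft block, per the design note above.
wire = sse_event("set_stream_target", {"target": "#draft-block-3"})
```

The blank line terminating each event is what lets the browser's EventSource parser delimit messages.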

7. WEB INTEGRATION

Location: vibe/assistant/web/assistant.py

7.1 Routes

Route                                                       Method    Purpose
/assistant/workbench/<template_id>/<assistant_name>         GET       Render initial workbench UI
/assistant/turn/<template_id>/<assistant_name>              POST      Process user input, return SSE connector
/assistant/stream/<template_id>/<assistant_name>/<turn_id>  GET       Stream LLM response via SSE
/assistant/finalize/<template_id>/<assistant_name>          POST      Finalize draft
/assistant/dev-settings/<template_id>/<assistant_name>      POST      Update dev settings
/assistant/dev/logfiles                                     GET       List log files
/assistant/dev/sessions/                                    GET       List recorded sessions
/assistant/dev/<endpoint>/settings                          GET/POST  Endpoint settings

Assisted Text Blueprint: vibe/assistant/web/assisted_text.py provides /assisted/generate, /assisted/stream/<stream_id>, /assisted/refine_modal for AI-assisted text fields.

7.2 Turn Processing Flow

  1. User submits form -> POST to /assistant/turn
  2. Handler calls service.process_user_input(form) -> Returns AssistantTurn
  3. Renders SSE connector template -> HTMX establishes SSE connection
  4. GET to /assistant/stream starts streaming via prepare_streaming_response(turn)
  5. Browser receives SSE events and updates UI
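
The POST-then-GET handoff in steps 1-4 can be sketched as a turn registry: the POST creates a turn token that the later GET uses to start streaming. Class internals and field names here are assumptions; only the method names come from this document:

```python
import uuid
from dataclasses import dataclass

@dataclass
class AssistantTurn:
    """Illustrative turn token; the real AssistantTurn's fields are not documented here."""
    turn_id: str
    user_input: dict

class AssistantServiceSketch:
    def __init__(self):
        self._turns = {}

    def process_user_input(self, form: dict) -> AssistantTurn:
        # Step 2: record the turn so the stream route can find it later.
        turn = AssistantTurn(uuid.uuid4().hex, form)
        self._turns[turn.turn_id] = turn
        return turn

    def prepare_streaming_response(self, turn_id: str) -> AssistantTurn:
        # Step 4: the SSE GET resolves the token back to the pending turn.
        return self._turns[turn_id]

service = AssistantServiceSketch()
turn = service.process_user_input({"answer": "Acme Corp"})
```

Splitting the flow this way lets the POST return immediately with a connector while the long-lived SSE response runs on a separate request.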

8. DATA FLOW DIAGRAMS

8.1 User Message -> LLM Response

User submits form
    -> POST /assistant/turn -> service.process_user_input(form)
    -> Return AssistantTurn, render SSE connector

Browser establishes SSE connection
    -> GET /assistant/stream -> service.prepare_streaming_response(turn)
    -> prepare_provider(turn) via ProviderFactory
       -> If interview_mode == "assistant": Wrap with SystemProxyProvider
    -> build_messages(turn, provider) via ModelContextManager
    -> start_streaming(turn, provider, messages)
    -> StreamTranslator translates chunks to SSE events
    -> Browser updates UI (chat bubble, widgets, draft blocks)
    -> Stream complete -> Save session

8.2 Tool Call Processing

LLM calls tool -> Provider yields: START -> ARGUMENT_CHUNK -> ARGUMENTS_COMPLETE -> END
    -> StreamTranslator renders widget HTML on ARGUMENTS_COMPLETE
    -> Yield form_command SSE event -> Browser injects widget
    -> User fills and submits
    -> service.process_user_input(form_data) maps field to tool_call_id
    -> Creates ToolResult -> Added to history
    -> Next LLM turn includes ToolResult

9. CONFIGURATION

Location: vibe/assistant/config.py

AssistantConfig: sessions_base_path, assistant_endpoints, default_timeout
AssistantConfigFactory: from_flask_app (production), for_testing(...) (testing with mock dependencies)

Flow: config.yml -> load_configuration() -> Flask app.config -> app.LLM_ENDPOINTS -> AssistantConfig -> AssistantService
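
A sketch of the config object and its testing factory, using the three field names listed above; the types, defaults, and factory signature are assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class AssistantConfig:
    """The three documented fields; types here are guesses."""
    sessions_base_path: Path
    assistant_endpoints: dict
    default_timeout: float

class AssistantConfigFactory:
    @staticmethod
    def for_testing(**overrides) -> AssistantConfig:
        # Hypothetical defaults, overridable per test.
        defaults = {
            "sessions_base_path": Path("/tmp/vibe-sessions"),
            "assistant_endpoints": {},
            "default_timeout": 60.0,
        }
        defaults.update(overrides)
        return AssistantConfig(**defaults)

config = AssistantConfigFactory.for_testing(default_timeout=5.0)
```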

10. FILE LOCATION INDEX

Core Service Layer

What                    Where
Service facade          vibe/assistant/services/assistant_service.py
ConversationStateStore  vibe/assistant/services/conversation_state_store.py
ProviderFactory         vibe/assistant/services/provider_factory.py
DraftManager            vibe/assistant/services/draft_manager.py
TurnOrchestrator        vibe/assistant/services/turn_orchestrator.py
QuestionSessionState    vibe/assistant/services/question_session_state.py
ReferenceDocuments      vibe/assistant/services/reference_documents.py
VectorStoreManager      vibe/assistant/services/vector_store_manager.py
Config                  vibe/assistant/config.py

Context & Streaming

What                 Where
ModelContextManager  vibe/assistant/context.py
HistoryPruner        vibe/assistant/history_pruner.py
SystemPromptBuilder  vibe/assistant/system_prompt_builder.py
StreamTranslator     vibe/assistant/services/stream_translator.py
SSE utilities        vibe/web/sse.py

Messages & Tools

What               Where
Message types      vibe/assistant/structures.py
Tool definitions   vibe/providers/llm/tools.py
Message converter  vibe/providers/llm/message_converter.py

Web Layer

What                  Where
Assistant routes      vibe/assistant/web/assistant.py
Assisted text routes  vibe/assistant/web/assisted_text.py
Extension support     vibe/assistant/extension_support.py
Templates             vibe/assistant/templates/assistant/workbench.html
Stages                vibe/assistant/assistant_stages.py
Logging               vibe/assistant/logging.py

Document Version: 5.1
Last Updated: 2026-03-22
Notes: Updated file paths for package reorganization (vibe/sse.py -> vibe/web/sse.py); corrected assistant templates path