Design Document: AI Sidebar Enhancements
Overview
This design document outlines the technical approach for enhancing the AI sidebar module with streaming responses, improved UI, conversation management commands, persistence features, and reasoning mode controls. The enhancements build upon the existing GTK4-based architecture using the Ollama Python SDK.
The current implementation uses:
- GTK4 for UI with gtk4-layer-shell for Wayland integration
- Ollama Python SDK for LLM interactions
- JSON-based conversation persistence via ConversationManager
- Threading for async operations with GLib.idle_add for UI updates
Architecture
Current Architecture Overview
```
┌──────────────────────────────────────────┐
│ SidebarWindow (GTK4)                      │
│ ┌──────────────────────────────────────┐ │
│ │ Header (Title + Model Label)         │ │
│ ├──────────────────────────────────────┤ │
│ │ ScrolledWindow                       │ │
│ │  └─ Message List (Gtk.Box vertical)  │ │
│ ├──────────────────────────────────────┤ │
│ │ Input Box (Entry + Send Button)      │ │
│ └──────────────────────────────────────┘ │
└──────────────────────────────────────────┘
           │                       │
           ▼                       ▼
 ┌──────────────────┐    ┌──────────────────┐
 │ ConversationMgr  │    │ OllamaClient     │
 │ - Load/Save      │    │ - chat()         │
 │ - Messages       │    │ - stream_chat()  │
 └──────────────────┘    └──────────────────┘
```
Enhanced Architecture
The enhancements will introduce:
- CommandProcessor: New component to parse and execute slash commands
- StreamingHandler: Manages token streaming and UI updates
- ConversationArchive: Extends ConversationManager for multi-conversation management
- ReasoningController: Manages reasoning mode state and formatting
- Enhanced Input Widget: Multi-line text view replacing single-line entry
Components and Interfaces
1. Streaming Response Display
StreamingHandler Class
```python
class StreamingHandler:
    """Manages streaming response display with token-by-token updates."""

    def __init__(self, message_widget: Gtk.Label, scroller: Gtk.ScrolledWindow):
        self._widget = message_widget
        self._scroller = scroller
        self._buffer = ""
        self._is_streaming = False

    def start_stream(self) -> None:
        """Initialize streaming state."""

    def append_token(self, token: str) -> None:
        """Add token to buffer and update UI via GLib.idle_add."""

    def finish_stream(self) -> str:
        """Finalize streaming and return complete content."""
```
Integration Points
- Modify `_request_response()` to use `ollama_client.stream_chat()` instead of `chat()`
- Use `GLib.idle_add` to schedule UI updates for each token on the main thread
- Create the message widget before streaming starts, update the label text progressively
- Maintain smooth scrolling by calling `_scroll_to_bottom()` periodically (not per token)
Technical Considerations
- Token updates must occur on GTK main thread via GLib.idle_add
- Buffer tokens to reduce UI update frequency (e.g., every 3-5 tokens or 50ms)
- Handle stream interruption and error states gracefully
- Show visual indicator (e.g., cursor or "..." suffix) during active streaming
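A minimal sketch of that buffering, assuming the worker thread feeding `stream_chat()` calls `append_token()` for each token; the `_flush()` helper and `FLUSH_INTERVAL` constant are illustrative additions, not part of the existing module:

```python
import time
from gi.repository import GLib, Gtk

class StreamingHandler:
    """Buffers streamed tokens and flushes them to the label on the GTK main thread."""

    FLUSH_INTERVAL = 0.05  # seconds between UI updates (illustrative value)

    def __init__(self, message_widget: Gtk.Label, scroller: Gtk.ScrolledWindow):
        self._widget = message_widget
        self._scroller = scroller
        self._buffer = ""
        self._pending = ""       # tokens received but not yet shown
        self._last_flush = 0.0
        self._is_streaming = False

    def start_stream(self) -> None:
        self._buffer = ""
        self._pending = ""
        self._is_streaming = True

    def append_token(self, token: str) -> None:
        # Called from the worker thread: only schedule UI work, never touch GTK here.
        self._pending += token
        now = time.monotonic()
        if now - self._last_flush >= self.FLUSH_INTERVAL:
            self._last_flush = now
            GLib.idle_add(self._flush)

    def _flush(self) -> bool:
        # Runs on the GTK main thread via GLib.idle_add.
        self._buffer += self._pending
        self._pending = ""
        self._widget.set_text(self._buffer + " …")  # trailing ellipsis as streaming indicator
        return GLib.SOURCE_REMOVE  # one-shot idle callback

    def finish_stream(self) -> str:
        self._buffer += self._pending
        self._pending = ""
        self._is_streaming = False
        GLib.idle_add(self._widget.set_text, self._buffer)  # final text without indicator
        return self._buffer
```

Flushing on a time interval rather than per token keeps GLib.idle_add traffic roughly constant regardless of how fast tokens arrive.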
2. Improved Text Input Field
TextView Widget Replacement
Replace Gtk.Entry with Gtk.TextView wrapped in Gtk.ScrolledWindow:
```python
# Current: Gtk.Entry (single line)
self._entry = Gtk.Entry()

# Enhanced: Gtk.TextView (multi-line)
self._text_view = Gtk.TextView()
self._text_buffer = self._text_view.get_buffer()

text_scroller = Gtk.ScrolledWindow()
text_scroller.set_child(self._text_view)
text_scroller.set_min_content_height(40)
text_scroller.set_max_content_height(200)
```
Features
- Automatic text wrapping with `set_wrap_mode(Gtk.WrapMode.WORD_CHAR)`
- Dynamic height expansion up to the max height (200px), then scroll
- Shift+Enter for new lines, Enter alone to submit
- Placeholder text using CSS or empty buffer state
- Maintain focus behavior with proper event controllers
Key Bindings
- Enter: Submit message (unless Shift is held)
- Shift+Enter: Insert newline
- Ctrl+A: Select all text
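These bindings could be wired up with a `Gtk.EventControllerKey`; a sketch, where `_on_key_pressed` is an illustrative handler name on the window class:

```python
from gi.repository import Gdk, Gtk

# During input widget setup:
key_controller = Gtk.EventControllerKey()
key_controller.connect("key-pressed", self._on_key_pressed)
self._text_view.add_controller(key_controller)

# Handler method on SidebarWindow:
def _on_key_pressed(self, controller, keyval, keycode, state) -> bool:
    """Submit on Enter, insert a newline on Shift+Enter."""
    if keyval in (Gdk.KEY_Return, Gdk.KEY_KP_Enter):
        if state & Gdk.ModifierType.SHIFT_MASK:
            return False   # let the TextView insert the newline itself
        self._on_submit()  # Enter alone submits the message
        return True        # stop propagation so no newline is inserted
    return False
```

Returning True from the handler is what suppresses the default newline insertion; Ctrl+A select-all is already part of Gtk.TextView's default bindings.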
3. Conversation Management Commands
CommandProcessor Class
```python
class CommandProcessor:
    """Parses and executes slash commands."""

    COMMANDS = {
        "/new": "start_new_conversation",
        "/clear": "start_new_conversation",  # Alias for /new
        "/models": "list_models",
        "/model": "switch_model",
        "/resume": "resume_conversation",
        "/list": "list_conversations",
    }

    def is_command(self, text: str) -> bool:
        """Check if text starts with a command."""

    def execute(self, text: str) -> CommandResult:
        """Parse and execute command, return result."""
Command Implementations
/new and /clear
- Save current conversation with timestamp-based ID
- Reset conversation manager to new default conversation
- Clear message list UI
- Show confirmation message
/models
- Query `ollama_client.list_models()`
- Display formatted list in the message area
- Highlight current model
/model <name>
- Validate model name against available models
- Update the `_current_model` attribute
- Update the model label in the header
- Show confirmation message
/list
- Scan conversation storage directory
- Display conversations with ID, timestamp, message count
- Format as selectable list
/resume <id>
- Load specified conversation via ConversationManager
- Clear and repopulate message list
- Update window title/header with conversation ID
UI Integration
- Check for commands in `_on_submit()` before processing as a user message
- Display command results as system messages (distinct styling)
- Provide command help via a `/help` command
- Support tab completion for commands (future enhancement)
4. Conversation Persistence and Resume
ConversationArchive Extension
Extend ConversationManager with multi-conversation capabilities:
```python
class ConversationArchive:
    """Manages multiple conversation files."""

    def __init__(self, storage_dir: Path):
        self._storage_dir = storage_dir

    def list_conversations(self) -> List[ConversationMetadata]:
        """Return metadata for all saved conversations."""

    def archive_conversation(self, conversation_id: str) -> str:
        """Save conversation with timestamp-based archive ID."""

    def load_conversation(self, archive_id: str) -> ConversationState:
        """Load archived conversation by ID."""

    def generate_archive_id(self) -> str:
        """Create unique ID: YYYYMMDD_HHMMSS_<short-hash>"""
```
File Naming Convention
- Active conversation: `default.json`
- Archived conversations: `archive_YYYYMMDD_HHMMSS_<hash>.json`
- Metadata includes: id, created_at, updated_at, message_count, first_message_preview
Workflow
1. User types `/new` or `/clear`
2. Current conversation saved as an archive file
3. New ConversationManager instance created with "default" ID
4. UI cleared and reset
5. Confirmation message shows the archive ID
6. User types `/list`
7. System scans the storage directory for archive files
8. Displays a formatted list with metadata
9. User types `/resume <id>`
10. ConversationManager loads the specified archive
11. UI repopulated with conversation history
12. User can continue the conversation
5. Reasoning Mode Toggle
ReasoningController Class
```python
class ReasoningController:
    """Manages reasoning mode state and model selection."""

    # Model names for reasoning toggle
    INSTRUCT_MODEL = "hf.co/unsloth/Qwen3-4B-Instruct-2507-GGUF:Q8_K_XL"
    THINKING_MODEL = "hf.co/unsloth/Qwen3-4B-Thinking-2507-GGUF:Q8_K_XL"

    def __init__(self):
        self._enabled = False
        self._preference_file = Path.home() / ".config" / "aisidebar" / "preferences.json"

    def is_enabled(self) -> bool:
        """Check if reasoning mode is active."""

    def toggle(self) -> bool:
        """Toggle reasoning mode and persist preference."""

    def get_model_name(self) -> str:
        """Return the appropriate model name based on reasoning mode."""
        return self.THINKING_MODEL if self._enabled else self.INSTRUCT_MODEL
```
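The toggle itself only needs to flip the flag and write it back to disk. In this sketch, `_save_preference()` is a hypothetical helper, and the key name matches the `PreferencesState` dataclass defined below:

```python
import json

def toggle(self) -> bool:
    """Flip reasoning mode and persist the preference."""
    self._enabled = not self._enabled
    self._save_preference()
    return self._enabled

def _save_preference(self) -> None:
    self._preference_file.parent.mkdir(parents=True, exist_ok=True)
    prefs = {}
    if self._preference_file.exists():
        try:
            prefs = json.loads(self._preference_file.read_text())
        except json.JSONDecodeError:
            prefs = {}  # corrupted preferences file: fall back to defaults
    prefs["reasoning_enabled"] = self._enabled
    self._preference_file.write_text(json.dumps(prefs, indent=2))
```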
UI Components
Add toggle button to header area:
```python
self._reasoning_toggle = widgets.Button(label="🧠 Reasoning: OFF")
self._reasoning_toggle.connect("clicked", self._on_reasoning_toggled)
```
Ollama Integration
When reasoning mode is toggled, switch between models:
```python
# Get model based on reasoning mode
model = self._reasoning_controller.get_model_name()

# Use the selected model for chat
ollama.chat(model=model, messages=messages)
```
Message Formatting
When using the thinking model:
- Display thinking process in distinct style (italic, gray text)
- Separate reasoning from final answer with visual divider
- Parse `<think>` tags from model output to extract reasoning content (see the sketch below)
Persistence
- Save reasoning preference to `~/.config/aisidebar/preferences.json`
- Load preference on startup
- Apply to all new conversations
- Automatically switch models when preference changes
Data Models
ConversationMetadata
```python
@dataclass
class ConversationMetadata:
    """Metadata for conversation list display."""
    archive_id: str
    created_at: str
    updated_at: str
    message_count: int
    preview: str  # First 50 chars of first user message
```
CommandResult
```python
@dataclass
class CommandResult:
    """Result of command execution."""
    success: bool
    message: str
    data: dict | None = None
```
PreferencesState
```python
@dataclass
class PreferencesState:
    """User preferences for sidebar behavior."""
    reasoning_enabled: bool = False
    default_model: str | None = None
    theme: str = "default"
```
Error Handling
Ollama Unavailability
- Startup Without Ollama: Initialize all components successfully, show status message in UI
- Model List Failure: Return empty list, display "Ollama not running" in model label
- Chat Request Without Ollama: Display friendly message: "Please start Ollama to use AI features"
- Connection Lost Mid-Stream: Display partial response + reconnection instructions
- Periodic Availability Check: Attempt to reconnect every 30s when unavailable (non-blocking)
Implementation Strategy
```python
class OllamaClient:
    def __init__(self, host: str | None = None) -> None:
        # Never raise exceptions during initialization
        # Set _available = False if connection fails

    def list_models(self) -> list[str]:
        # Return empty list instead of raising on connection failure
        # Log warning but don't crash

    def chat(self, ...) -> dict[str, str] | None:
        # Return error message dict instead of raising
        # {"role": "assistant", "content": "Ollama unavailable..."}
```
Streaming Errors
- Connection Lost: Display partial response + error message, allow retry
- Model Unavailable: Fall back to non-streaming mode with error notice
- Stream Timeout: Cancel after 60s, show timeout message
Command Errors
- Invalid Command: Show available commands with `/help`
- Invalid Arguments: Display command usage syntax
- File Not Found: Handle missing conversation archives gracefully
- Permission Errors: Show clear error message for storage access issues
Conversation Loading Errors
- Corrupted JSON: Log error, skip file, continue with other conversations
- Missing Files: Remove from list, show warning
- Version Mismatch: Attempt migration or show incompatibility notice
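For the corrupted-JSON case in particular, `list_conversations()` can guard each archive file individually so one bad file never breaks the listing. In this sketch the on-disk keys (`messages`, `created_at`, `updated_at`) are assumptions about the ConversationManager format:

```python
import json
import logging

def list_conversations(self) -> list[ConversationMetadata]:
    results = []
    for path in sorted(self._storage_dir.glob("archive_*.json")):
        try:
            data = json.loads(path.read_text())
            messages = data.get("messages", [])
            first_user = next((m.get("content", "") for m in messages if m.get("role") == "user"), "")
            results.append(ConversationMetadata(
                archive_id=path.stem.removeprefix("archive_"),
                created_at=data.get("created_at", ""),
                updated_at=data.get("updated_at", ""),
                message_count=len(messages),
                preview=first_user[:50],
            ))
        except (json.JSONDecodeError, OSError) as exc:
            # Corrupted or unreadable archive: log it, skip it, keep listing the rest.
            logging.warning("Skipping archive %s: %s", path.name, exc)
    return results
```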
Testing Strategy
Unit Tests
- StreamingHandler
  - Token buffering logic
  - Thread-safe UI updates
  - Stream completion handling
- CommandProcessor (see the example after this list)
  - Command parsing (valid/invalid formats)
  - Each command execution path
  - Error handling for malformed commands
- ConversationArchive
  - Archive ID generation uniqueness
  - List/load/save operations
  - File system error handling
- ReasoningController
  - Toggle state management
  - Preference persistence
  - API option generation
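For example, the CommandProcessor paths could be covered with a few pytest cases against a fake window object (names are illustrative, and the constructor argument follows the earlier sketch):

```python
class FakeWindow:
    def start_new_conversation(self, args: str) -> CommandResult:
        return CommandResult(success=True, message="started new conversation")

def test_is_command_detects_slash_prefix():
    processor = CommandProcessor(FakeWindow())
    assert processor.is_command("/new")
    assert not processor.is_command("hello there")

def test_unknown_command_points_to_help():
    result = CommandProcessor(FakeWindow()).execute("/bogus")
    assert not result.success
    assert "/help" in result.message

def test_new_dispatches_to_window_handler():
    result = CommandProcessor(FakeWindow()).execute("/new")
    assert result.success
```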
Integration Tests
- End-to-End Streaming
  - Mock Ollama stream response
  - Verify UI updates occur
  - Check final message persistence
- Command Workflows
  - `/new` → archive → `/list` → `/resume` flow
  - Model switching with active conversation
  - Command execution during streaming (edge case)
- Multi-line Input
  - Text wrapping behavior
  - Submit vs newline key handling
  - Height expansion limits
Manual Testing Checklist
- Stream response displays smoothly without flicker
- Multi-line input expands and wraps correctly
- All commands execute successfully
- Conversation archives persist across restarts
- Resume loads correct conversation history
- Reasoning toggle affects model behavior
- UI remains responsive during streaming
- Error states display helpful messages
Implementation Notes
GTK4 Threading Considerations
- All UI updates must occur on the main thread via `GLib.idle_add()`
- Worker threads for Ollama API calls to prevent UI blocking
- Use `GLib.PRIORITY_DEFAULT` for normal updates, `GLib.PRIORITY_HIGH` for critical UI state
Performance Optimizations
- Buffer tokens (3-5 at a time) to reduce GLib.idle_add overhead
- Limit scroll updates to every 100ms during streaming
- Cache conversation metadata to avoid repeated file I/O
- Lazy-load conversation content only when resuming
Backward Compatibility
- Existing `default.json` conversation file remains compatible
- New archive files use a distinct naming pattern
- Preferences file is optional; defaults work without it
- Graceful degradation if gtk4-layer-shell unavailable
Ollama Availability Detection
Add periodic checking mechanism to detect when Ollama becomes available:
```python
class OllamaAvailabilityMonitor:
    """Monitors Ollama availability and notifies UI of state changes."""

    def __init__(self, client: OllamaClient, callback: Callable[[bool], None]):
        self._client = client
        self._callback = callback
        self._last_state = False
        self._check_interval = 30  # seconds

    def start_monitoring(self) -> None:
        """Begin periodic availability checks via GLib.timeout_add."""

    def _check_availability(self) -> bool:
        """Check if Ollama is available and notify on state change."""
```
Integration in SidebarWindow:
- Initialize monitor on startup
- Update UI state when availability changes (enable/disable input, update status message)
- Show notification when Ollama becomes available: "Ollama connected - AI features enabled"
Future Enhancements
- Command history with up/down arrow navigation
- Conversation search functionality
- Export conversations to markdown
- Custom keyboard shortcuts
- Syntax highlighting for code in messages
- Image/file attachment support