Design Document: AI Sidebar Enhancements
Overview
This design document outlines the technical approach for enhancing the AI sidebar module with streaming responses, improved UI, conversation management commands, persistence features, and reasoning mode controls. The enhancements build upon the existing GTK4-based architecture using the Ollama Python SDK.
The current implementation uses:
- GTK4 for UI with gtk4-layer-shell for Wayland integration
- Ollama Python SDK for LLM interactions
- JSON-based conversation persistence via ConversationManager
- Threading for async operations with GLib.idle_add for UI updates
Architecture
Current Architecture Overview
```
┌──────────────────────────────────────────┐
│ SidebarWindow (GTK4)                      │
│ ┌──────────────────────────────────────┐ │
│ │ Header (Title + Model Label)         │ │
│ ├──────────────────────────────────────┤ │
│ │ ScrolledWindow                       │ │
│ │  └─ Message List (Gtk.Box vertical)  │ │
│ ├──────────────────────────────────────┤ │
│ │ Input Box (Entry + Send Button)      │ │
│ └──────────────────────────────────────┘ │
└──────────────────────────────────────────┘
           │                       │
           ▼                       ▼
 ┌──────────────────┐    ┌──────────────────┐
 │ ConversationMgr  │    │ OllamaClient     │
 │ - Load/Save      │    │ - chat()         │
 │ - Messages       │    │ - stream_chat()  │
 └──────────────────┘    └──────────────────┘
```
Enhanced Architecture
The enhancements will introduce:
- CommandProcessor: New component to parse and execute slash commands
- StreamingHandler: Manages token streaming and UI updates
- ConversationArchive: Extends ConversationManager for multi-conversation management
- ReasoningController: Manages reasoning mode state and formatting
- Enhanced Input Widget: Multi-line text view replacing single-line entry
Components and Interfaces
1. Streaming Response Display
StreamingHandler Class
```python
class StreamingHandler:
    """Manages streaming response display with token-by-token updates."""

    def __init__(self, message_widget: Gtk.Label, scroller: Gtk.ScrolledWindow):
        self._widget = message_widget
        self._scroller = scroller
        self._buffer = ""
        self._is_streaming = False

    def start_stream(self) -> None:
        """Initialize streaming state."""

    def append_token(self, token: str) -> None:
        """Add token to buffer and update UI via GLib.idle_add."""

    def finish_stream(self) -> str:
        """Finalize streaming and return complete content."""
```
Integration Points
- Modify `_request_response()` to use `ollama_client.stream_chat()` instead of `chat()`
- Use `GLib.idle_add` to schedule UI updates for each token on the main thread
- Create the message widget before streaming starts, update the label text progressively
- Maintain smooth scrolling by calling `_scroll_to_bottom()` periodically (not per token)
Technical Considerations
- Token updates must occur on GTK main thread via GLib.idle_add
- Buffer tokens to reduce UI update frequency (e.g., every 3-5 tokens or 50ms)
- Handle stream interruption and error states gracefully
- Show visual indicator (e.g., cursor or "..." suffix) during active streaming
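A minimal sketch of that buffering, assuming the worker thread feeding `stream_chat()` calls `append_token()` for each token; the `_flush()` helper and `FLUSH_INTERVAL` constant are illustrative additions, not part of the existing module:

```python
import time
from gi.repository import GLib, Gtk

class StreamingHandler:
    """Buffers streamed tokens and flushes them to the label on the GTK main thread."""

    FLUSH_INTERVAL = 0.05  # seconds between UI updates (illustrative value)

    def __init__(self, message_widget: Gtk.Label, scroller: Gtk.ScrolledWindow):
        self._widget = message_widget
        self._scroller = scroller
        self._buffer = ""
        self._pending = ""       # tokens received but not yet shown
        self._last_flush = 0.0
        self._is_streaming = False

    def start_stream(self) -> None:
        self._buffer = ""
        self._pending = ""
        self._is_streaming = True

    def append_token(self, token: str) -> None:
        # Called from the worker thread: only schedule UI work, never touch GTK here.
        self._pending += token
        now = time.monotonic()
        if now - self._last_flush >= self.FLUSH_INTERVAL:
            self._last_flush = now
            GLib.idle_add(self._flush)

    def _flush(self) -> bool:
        # Runs on the GTK main thread via GLib.idle_add.
        self._buffer += self._pending
        self._pending = ""
        self._widget.set_text(self._buffer + " …")  # trailing ellipsis as streaming indicator
        return GLib.SOURCE_REMOVE  # one-shot idle callback

    def finish_stream(self) -> str:
        self._buffer += self._pending
        self._pending = ""
        self._is_streaming = False
        GLib.idle_add(self._widget.set_text, self._buffer)  # final text without indicator
        return self._buffer
```

Flushing on a time interval rather than per token keeps GLib.idle_add traffic roughly constant regardless of how fast tokens arrive.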
2. Improved Text Input Field
TextView Widget Replacement
Replace Gtk.Entry with Gtk.TextView wrapped in Gtk.ScrolledWindow:
```python
# Current: Gtk.Entry (single line)
self._entry = Gtk.Entry()

# Enhanced: Gtk.TextView (multi-line)
self._text_view = Gtk.TextView()
self._text_buffer = self._text_view.get_buffer()

text_scroller = Gtk.ScrolledWindow()
text_scroller.set_child(self._text_view)
text_scroller.set_min_content_height(40)
text_scroller.set_max_content_height(200)
```
Features
- Automatic text wrapping with `set_wrap_mode(Gtk.WrapMode.WORD_CHAR)`
- Dynamic height expansion up to the max height (200px), then scroll
- Shift+Enter for new lines, Enter alone to submit
- Placeholder text using CSS or empty buffer state
- Maintain focus behavior with proper event controllers
Key Bindings
- Enter: Submit message (unless Shift is held)
- Shift+Enter: Insert newline
- Ctrl+A: Select all text
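These bindings could be wired up with a `Gtk.EventControllerKey`; a sketch, where `_on_key_pressed` is an illustrative handler name on the window class:

```python
from gi.repository import Gdk, Gtk

# During input widget setup:
key_controller = Gtk.EventControllerKey()
key_controller.connect("key-pressed", self._on_key_pressed)
self._text_view.add_controller(key_controller)

# Handler method on SidebarWindow:
def _on_key_pressed(self, controller, keyval, keycode, state) -> bool:
    """Submit on Enter, insert a newline on Shift+Enter."""
    if keyval in (Gdk.KEY_Return, Gdk.KEY_KP_Enter):
        if state & Gdk.ModifierType.SHIFT_MASK:
            return False   # let the TextView insert the newline itself
        self._on_submit()  # Enter alone submits the message
        return True        # stop propagation so no newline is inserted
    return False
```

Returning True from the handler is what suppresses the default newline insertion; Ctrl+A select-all is already part of Gtk.TextView's default bindings.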
3. Conversation Management Commands
CommandProcessor Class
```python
class CommandProcessor:
    """Parses and executes slash commands."""

    COMMANDS = {
        "/new": "start_new_conversation",
        "/clear": "start_new_conversation",  # Alias for /new
        "/models": "list_models",
        "/model": "switch_model",
        "/resume": "resume_conversation",
        "/list": "list_conversations",
    }

    def is_command(self, text: str) -> bool:
        """Check if text starts with a command."""

    def execute(self, text: str) -> CommandResult:
        """Parse and execute command, return result."""
Command Implementations
/new and /clear
- Save current conversation with timestamp-based ID
- Reset conversation manager to new default conversation
- Clear message list UI
- Show confirmation message
/models
- Query `ollama_client.list_models()`
- Display formatted list in the message area
- Highlight current model
/model <name>
- Validate model name against available models
- Update the `_current_model` attribute
- Update the model label in the header
- Show confirmation message
/list
- Scan conversation storage directory
- Display conversations with ID, timestamp, message count
- Format as selectable list
/resume <id>
- Load specified conversation via ConversationManager
- Clear and repopulate message list
- Update window title/header with conversation ID
UI Integration
- Check for commands in `_on_submit()` before processing as a user message
- Display command results as system messages (distinct styling)
- Provide command help via a `/help` command
- Support tab completion for commands (future enhancement)
4. Conversation Persistence and Resume
ConversationArchive Extension
Extend ConversationManager with multi-conversation capabilities:
```python
class ConversationArchive:
    """Manages multiple conversation files."""

    def __init__(self, storage_dir: Path):
        self._storage_dir = storage_dir

    def list_conversations(self) -> List[ConversationMetadata]:
        """Return metadata for all saved conversations."""

    def archive_conversation(self, conversation_id: str) -> str:
        """Save conversation with timestamp-based archive ID."""

    def load_conversation(self, archive_id: str) -> ConversationState:
        """Load archived conversation by ID."""

    def generate_archive_id(self) -> str:
        """Create unique ID: YYYYMMDD_HHMMSS_<short-hash>"""
```
File Naming Convention
- Active conversation: `default.json`
- Archived conversations: `archive_YYYYMMDD_HHMMSS_<hash>.json`
- Metadata includes: id, created_at, updated_at, message_count, first_message_preview
Workflow
1. User types `/new` or `/clear`
2. Current conversation saved as an archive file
3. New ConversationManager instance created with "default" ID
4. UI cleared and reset
5. Confirmation message shows the archive ID
6. User types `/list`
7. System scans the storage directory for archive files
8. Displays a formatted list with metadata
9. User types `/resume <id>`
10. ConversationManager loads the specified archive
11. UI repopulated with conversation history
12. User can continue the conversation
5. Reasoning Mode Toggle
ReasoningController Class
```python
class ReasoningController:
    """Manages reasoning mode state and model selection."""

    # Model names for reasoning toggle
    INSTRUCT_MODEL = "hf.co/unsloth/Qwen3-4B-Instruct-2507-GGUF:Q8_K_XL"
    THINKING_MODEL = "hf.co/unsloth/Qwen3-4B-Thinking-2507-GGUF:Q8_K_XL"

    def __init__(self):
        self._enabled = False
        self._preference_file = Path.home() / ".config" / "aisidebar" / "preferences.json"

    def is_enabled(self) -> bool:
        """Check if reasoning mode is active."""

    def toggle(self) -> bool:
        """Toggle reasoning mode and persist preference."""

    def get_model_name(self) -> str:
        """Return the appropriate model name based on reasoning mode."""
        return self.THINKING_MODEL if self._enabled else self.INSTRUCT_MODEL
```
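The toggle itself only needs to flip the flag and write it back to disk. In this sketch, `_save_preference()` is a hypothetical helper, and the key name matches the `PreferencesState` dataclass defined below:

```python
import json

def toggle(self) -> bool:
    """Flip reasoning mode and persist the preference."""
    self._enabled = not self._enabled
    self._save_preference()
    return self._enabled

def _save_preference(self) -> None:
    self._preference_file.parent.mkdir(parents=True, exist_ok=True)
    prefs = {}
    if self._preference_file.exists():
        try:
            prefs = json.loads(self._preference_file.read_text())
        except json.JSONDecodeError:
            prefs = {}  # corrupted preferences file: fall back to defaults
    prefs["reasoning_enabled"] = self._enabled
    self._preference_file.write_text(json.dumps(prefs, indent=2))
```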
UI Components
Add toggle button to header area:
```python
self._reasoning_toggle = widgets.Button(label="🧠 Reasoning: OFF")
self._reasoning_toggle.connect("clicked", self._on_reasoning_toggled)
```
Ollama Integration
When reasoning mode is toggled, switch between models:
```python
# Get model based on reasoning mode
model = self._reasoning_controller.get_model_name()

# Use the selected model for chat
ollama.chat(model=model, messages=messages)
```
Message Formatting
When using the thinking model:
- Display thinking process in distinct style (italic, gray text)
- Separate reasoning from final answer with visual divider
- Parse `<think>` tags from model output to extract reasoning content (see the sketch below)
Persistence
- Save reasoning preference to `~/.config/aisidebar/preferences.json`
- Load preference on startup
- Apply to all new conversations
- Automatically switch models when preference changes
Data Models
ConversationMetadata
```python
@dataclass
class ConversationMetadata:
    """Metadata for conversation list display."""
    archive_id: str
    created_at: str
    updated_at: str
    message_count: int
    preview: str  # First 50 chars of first user message
```
CommandResult
```python
@dataclass
class CommandResult:
    """Result of command execution."""
    success: bool
    message: str
    data: dict | None = None
```
PreferencesState
```python
@dataclass
class PreferencesState:
    """User preferences for sidebar behavior."""
    reasoning_enabled: bool = False
    default_model: str | None = None
    theme: str = "default"
```
Error Handling
Ollama Unavailability
- Startup Without Ollama: Initialize all components successfully, show status message in UI
- Model List Failure: Return empty list, display "Ollama not running" in model label
- Chat Request Without Ollama: Display friendly message: "Please start Ollama to use AI features"
- Connection Lost Mid-Stream: Display partial response + reconnection instructions
- Periodic Availability Check: Attempt to reconnect every 30s when unavailable (non-blocking)
Implementation Strategy
```python
class OllamaClient:
    def __init__(self, host: str | None = None) -> None:
        # Never raise exceptions during initialization
        # Set _available = False if connection fails

    def list_models(self) -> list[str]:
        # Return empty list instead of raising on connection failure
        # Log warning but don't crash

    def chat(self, ...) -> dict[str, str] | None:
        # Return error message dict instead of raising
        # {"role": "assistant", "content": "Ollama unavailable..."}
```
Streaming Errors
- Connection Lost: Display partial response + error message, allow retry
- Model Unavailable: Fall back to non-streaming mode with error notice
- Stream Timeout: Cancel after 60s, show timeout message
Command Errors
- Invalid Command: Show available commands with `/help`
- Invalid Arguments: Display command usage syntax
- File Not Found: Handle missing conversation archives gracefully
- Permission Errors: Show clear error message for storage access issues
Conversation Loading Errors
- Corrupted JSON: Log error, skip file, continue with other conversations
- Missing Files: Remove from list, show warning
- Version Mismatch: Attempt migration or show incompatibility notice
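For the corrupted-JSON case in particular, `list_conversations()` can guard each archive file individually so one bad file never breaks the listing. In this sketch the on-disk keys (`messages`, `created_at`, `updated_at`) are assumptions about the ConversationManager format:

```python
import json
import logging

def list_conversations(self) -> list[ConversationMetadata]:
    results = []
    for path in sorted(self._storage_dir.glob("archive_*.json")):
        try:
            data = json.loads(path.read_text())
            messages = data.get("messages", [])
            first_user = next((m.get("content", "") for m in messages if m.get("role") == "user"), "")
            results.append(ConversationMetadata(
                archive_id=path.stem.removeprefix("archive_"),
                created_at=data.get("created_at", ""),
                updated_at=data.get("updated_at", ""),
                message_count=len(messages),
                preview=first_user[:50],
            ))
        except (json.JSONDecodeError, OSError) as exc:
            # Corrupted or unreadable archive: log it, skip it, keep listing the rest.
            logging.warning("Skipping archive %s: %s", path.name, exc)
    return results
```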
Testing Strategy
Unit Tests
- StreamingHandler
  - Token buffering logic
  - Thread-safe UI updates
  - Stream completion handling
- CommandProcessor (see the example after this list)
  - Command parsing (valid/invalid formats)
  - Each command execution path
  - Error handling for malformed commands
- ConversationArchive
  - Archive ID generation uniqueness
  - List/load/save operations
  - File system error handling
- ReasoningController
  - Toggle state management
  - Preference persistence
  - API option generation
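For example, the CommandProcessor paths could be covered with a few pytest cases against a fake window object (names are illustrative, and the constructor argument follows the earlier sketch):

```python
class FakeWindow:
    def start_new_conversation(self, args: str) -> CommandResult:
        return CommandResult(success=True, message="started new conversation")

def test_is_command_detects_slash_prefix():
    processor = CommandProcessor(FakeWindow())
    assert processor.is_command("/new")
    assert not processor.is_command("hello there")

def test_unknown_command_points_to_help():
    result = CommandProcessor(FakeWindow()).execute("/bogus")
    assert not result.success
    assert "/help" in result.message

def test_new_dispatches_to_window_handler():
    result = CommandProcessor(FakeWindow()).execute("/new")
    assert result.success
```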
Integration Tests
- End-to-End Streaming
  - Mock Ollama stream response
  - Verify UI updates occur
  - Check final message persistence
- Command Workflows
  - `/new` → archive → `/list` → `/resume` flow
  - Model switching with active conversation
  - Command execution during streaming (edge case)
- Multi-line Input
  - Text wrapping behavior
  - Submit vs newline key handling
  - Height expansion limits
Manual Testing Checklist
- Stream response displays smoothly without flicker
- Multi-line input expands and wraps correctly
- All commands execute successfully
- Conversation archives persist across restarts
- Resume loads correct conversation history
- Reasoning toggle affects model behavior
- UI remains responsive during streaming
- Error states display helpful messages
Implementation Notes
GTK4 Threading Considerations
- All UI updates must occur on the main thread via `GLib.idle_add()`
- Worker threads for Ollama API calls to prevent UI blocking
- Use `GLib.PRIORITY_DEFAULT` for normal updates, `GLib.PRIORITY_HIGH` for critical UI state
Performance Optimizations
- Buffer tokens (3-5 at a time) to reduce GLib.idle_add overhead
- Limit scroll updates to every 100ms during streaming
- Cache conversation metadata to avoid repeated file I/O
- Lazy-load conversation content only when resuming
Backward Compatibility
- Existing `default.json` conversation file remains compatible
- New archive files use a distinct naming pattern
- Preferences file is optional; defaults work without it
- Graceful degradation if gtk4-layer-shell unavailable
Ollama Availability Detection
Add periodic checking mechanism to detect when Ollama becomes available:
```python
class OllamaAvailabilityMonitor:
    """Monitors Ollama availability and notifies UI of state changes."""

    def __init__(self, client: OllamaClient, callback: Callable[[bool], None]):
        self._client = client
        self._callback = callback
        self._last_state = False
        self._check_interval = 30  # seconds

    def start_monitoring(self) -> None:
        """Begin periodic availability checks via GLib.timeout_add."""

    def _check_availability(self) -> bool:
        """Check if Ollama is available and notify on state change."""
```
Integration in SidebarWindow:
- Initialize monitor on startup
- Update UI state when availability changes (enable/disable input, update status message)
- Show notification when Ollama becomes available: "Ollama connected - AI features enabled"
Future Enhancements
- Command history with up/down arrow navigation
- Conversation search functionality
- Export conversations to markdown
- Custom keyboard shortcuts
- Syntax highlighting for code in messages
- Image/file attachment support