# Design Document: AI Sidebar Enhancements

## Overview

This design document outlines the technical approach for enhancing the AI sidebar module with streaming responses, improved UI, conversation management commands, persistence features, and reasoning mode controls. The enhancements build upon the existing GTK4-based architecture using the Ollama Python SDK.

The current implementation uses:

- GTK4 for UI with gtk4-layer-shell for Wayland integration
- Ollama Python SDK for LLM interactions
- JSON-based conversation persistence via ConversationManager
- Threading for async operations with GLib.idle_add for UI updates

## Architecture

### Current Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│ SidebarWindow (GTK4)                                    │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Header (Title + Model Label)                       │ │
│ ├────────────────────────────────────────────────────┤ │
│ │ ScrolledWindow                                     │ │
│ │   └─ Message List (Gtk.Box vertical)               │ │
│ ├────────────────────────────────────────────────────┤ │
│ │ Input Box (Entry + Send Button)                    │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
          │                          │
          ▼                          ▼
┌──────────────────┐      ┌──────────────────┐
│ ConversationMgr  │      │ OllamaClient     │
│ - Load/Save      │      │ - chat()         │
│ - Messages       │      │ - stream_chat()  │
└──────────────────┘      └──────────────────┘
```

### Enhanced Architecture

The enhancements will introduce:

1. **CommandProcessor**: New component to parse and execute slash commands
2. **StreamingHandler**: Manages token streaming and UI updates
3. **ConversationArchive**: Extends ConversationManager for multi-conversation management
4. **ReasoningController**: Manages reasoning mode state and formatting
5. **Enhanced Input Widget**: Multi-line text view replacing single-line entry

## Components and Interfaces

### 1. Streaming Response Display

#### StreamingHandler Class

```python
class StreamingHandler:
    """Manages streaming response display with token-by-token updates."""

    def __init__(self, message_widget: Gtk.Label, scroller: Gtk.ScrolledWindow):
        self._widget = message_widget
        self._scroller = scroller
        self._buffer = ""
        self._is_streaming = False

    def start_stream(self) -> None:
        """Initialize streaming state."""

    def append_token(self, token: str) -> None:
        """Add token to buffer and update UI via GLib.idle_add."""

    def finish_stream(self) -> str:
        """Finalize streaming and return complete content."""
```

#### Integration Points

- Modify `_request_response()` to use `ollama_client.stream_chat()` instead of `chat()`
- Use `GLib.idle_add` to schedule UI updates for each token on the main thread
- Create message widget before streaming starts, update label text progressively
- Maintain smooth scrolling by calling `_scroll_to_bottom()` periodically (not per token)

#### Technical Considerations

- Token updates must occur on the GTK main thread via `GLib.idle_add`
- Buffer tokens to reduce UI update frequency (e.g., every 3-5 tokens or 50ms)
- Handle stream interruption and error states gracefully
- Show a visual indicator (e.g., cursor or "..." suffix) during active streaming


### 2. Improved Text Input Field

#### TextView Widget Replacement

Replace `Gtk.Entry` with `Gtk.TextView` wrapped in `Gtk.ScrolledWindow`:

```python
# Current: Gtk.Entry (single line)
self._entry = Gtk.Entry()

# Enhanced: Gtk.TextView (multi-line)
self._text_view = Gtk.TextView()
self._text_buffer = self._text_view.get_buffer()
text_scroller = Gtk.ScrolledWindow()
text_scroller.set_child(self._text_view)
text_scroller.set_min_content_height(40)
text_scroller.set_max_content_height(200)
```

#### Features

- Automatic text wrapping with `set_wrap_mode(Gtk.WrapMode.WORD_CHAR)`
- Dynamic height expansion up to the max height (200px), then scroll
- Shift+Enter for new lines, Enter alone to submit
- Placeholder text using CSS or empty buffer state
- Maintain focus behavior with proper event controllers

#### Key Bindings

- **Enter**: Submit message (unless Shift is held)
- **Shift+Enter**: Insert newline
- **Ctrl+A**: Select all text

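
The submit-versus-newline decision can be isolated as a pure function, which keeps it unit-testable without a GTK event loop. This is a hedged sketch: `action_for_key` is a hypothetical helper, and the constants mirror `Gdk.KEY_Return`, `Gdk.KEY_KP_Enter`, and `Gdk.ModifierType.SHIFT_MASK` rather than importing `gi`.

```python
KEY_RETURN = 65293      # assumed value of Gdk.KEY_Return
KEY_KP_ENTER = 65421    # assumed value of Gdk.KEY_KP_Enter (numeric keypad)
SHIFT_MASK = 1 << 0     # assumed value of Gdk.ModifierType.SHIFT_MASK

def action_for_key(keyval: int, state: int) -> str:
    """Map a key press to 'submit', 'newline', or 'pass' (let GTK handle it)."""
    if keyval in (KEY_RETURN, KEY_KP_ENTER):
        return "newline" if state & SHIFT_MASK else "submit"
    return "pass"
```

A `Gtk.EventControllerKey` handler on the text view would call this and return `True` (consume the event) for `"submit"`, letting `"newline"` and `"pass"` fall through to the default `Gtk.TextView` behavior.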
### 3. Conversation Management Commands

#### CommandProcessor Class

```python
class CommandProcessor:
    """Parses and executes slash commands."""

    COMMANDS = {
        "/new": "start_new_conversation",
        "/clear": "start_new_conversation",  # Alias for /new
        "/models": "list_models",
        "/model": "switch_model",
        "/resume": "resume_conversation",
        "/list": "list_conversations",
    }

    def is_command(self, text: str) -> bool:
        """Check if text starts with a command."""

    def execute(self, text: str) -> CommandResult:
        """Parse and execute command, return result."""
```
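
The parsing half of this class can be sketched on its own. This is illustrative, not the final implementation: `parse_command` is a hypothetical helper, and it treats unknown commands the same as non-commands, whereas the real `execute()` would surface a `/help` hint for unknown slash commands.

```python
# Command names follow the COMMANDS table above; /help is included because
# the UI Integration section mentions it.
COMMANDS = {"/new", "/clear", "/models", "/model", "/resume", "/list", "/help"}

def parse_command(text: str):
    """Split '/model llama3' into ('/model', ['llama3']); return None for plain messages."""
    stripped = text.strip()
    if not stripped.startswith("/"):
        return None
    name, *args = stripped.split()
    if name not in COMMANDS:
        return None
    return name, args
```

`_on_submit()` would call this first and only fall through to the chat path when it returns `None`.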

#### Command Implementations

**`/new` and `/clear`**

- Save current conversation with timestamp-based ID
- Reset conversation manager to new default conversation
- Clear message list UI
- Show confirmation message

**`/models`**

- Query `ollama_client.list_models()`
- Display formatted list in message area
- Highlight current model

**`/model <name>`**

- Validate model name against available models
- Update `_current_model` attribute
- Update model label in header
- Show confirmation message

**`/list`**

- Scan conversation storage directory
- Display conversations with ID, timestamp, message count
- Format as selectable list

**`/resume <id>`**

- Load specified conversation via ConversationManager
- Clear and repopulate message list
- Update window title/header with conversation ID

#### UI Integration

- Check for commands in `_on_submit()` before processing as a user message
- Display command results as system messages (distinct styling)
- Provide command help via a `/help` command
- Support tab completion for commands (future enhancement)

### 4. Conversation Persistence and Resume

#### ConversationArchive Extension

Extend ConversationManager with multi-conversation capabilities:

```python
class ConversationArchive:
    """Manages multiple conversation files."""

    def __init__(self, storage_dir: Path):
        self._storage_dir = storage_dir

    def list_conversations(self) -> List[ConversationMetadata]:
        """Return metadata for all saved conversations."""

    def archive_conversation(self, conversation_id: str) -> str:
        """Save conversation with timestamp-based archive ID."""

    def load_conversation(self, archive_id: str) -> ConversationState:
        """Load archived conversation by ID."""

    def generate_archive_id(self) -> str:
        """Create unique ID: YYYYMMDD_HHMMSS_<short-hash>"""
```

#### File Naming Convention

- Active conversation: `default.json`
- Archived conversations: `archive_YYYYMMDD_HHMMSS_<hash>.json`
- Metadata includes: id, created_at, updated_at, message_count, first_message_preview

#### Workflow

1. User types `/new` or `/clear`
2. Current conversation saved as archive file
3. New ConversationManager instance created with "default" ID
4. UI cleared and reset
5. Confirmation message shows archive ID

6. User types `/list`
7. System scans storage directory for archive files
8. Displays formatted list with metadata

9. User types `/resume <id>`
10. ConversationManager loads specified archive
11. UI repopulated with conversation history
12. User can continue conversation

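
The `/list` scan could work along these lines. This is a sketch under assumptions: `list_archives` is a hypothetical stand-in for `ConversationArchive.list_conversations()`, and it assumes each archive file holds its messages under a `"messages"` key; the real schema belongs to ConversationManager.

```python
import json
from pathlib import Path

def list_archives(storage_dir: Path) -> list[dict]:
    """Collect lightweight metadata for each archive_*.json file, newest first."""
    entries = []
    for path in sorted(storage_dir.glob("archive_*.json"), reverse=True):
        try:
            data = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or corrupted files rather than crash
        messages = data.get("messages", [])
        first_user = next((m.get("content", "") for m in messages if m.get("role") == "user"), "")
        entries.append({
            "archive_id": path.stem.removeprefix("archive_"),
            "message_count": len(messages),
            "preview": first_user[:50],
        })
    return entries
```

Note the scan reads only metadata-sized fields; full conversation content stays on disk until `/resume`, matching the lazy-loading note under Performance Optimizations.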

### 5. Reasoning Mode Toggle

#### ReasoningController Class

```python
class ReasoningController:
    """Manages reasoning mode state and API parameters."""

    def __init__(self):
        self._enabled = False
        self._preference_file = Path.home() / ".config" / "aisidebar" / "preferences.json"

    def is_enabled(self) -> bool:
        """Check if reasoning mode is active."""

    def toggle(self) -> bool:
        """Toggle reasoning mode and persist preference."""

    def get_chat_options(self) -> dict:
        """Return Ollama API options for reasoning mode."""
```
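
The `toggle()` contract (flip state, write it through, survive a restart) can be sketched with an injected file path so it is testable. `ReasoningToggle` and the single-key JSON layout are assumptions; the real controller would share the schema with PreferencesState below.

```python
import json
from pathlib import Path

class ReasoningToggle:
    """Sketch of toggle-with-persistence; the file path is injected for testing."""

    def __init__(self, preference_file: Path):
        self._file = preference_file
        self._enabled = self._load()

    def _load(self) -> bool:
        try:
            return bool(json.loads(self._file.read_text()).get("reasoning_enabled", False))
        except (OSError, json.JSONDecodeError):
            return False  # missing or corrupt file falls back to the default

    def toggle(self) -> bool:
        self._enabled = not self._enabled
        self._file.parent.mkdir(parents=True, exist_ok=True)
        self._file.write_text(json.dumps({"reasoning_enabled": self._enabled}))
        return self._enabled
```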

#### UI Components

Add a toggle button to the header area:

```python
self._reasoning_toggle = Gtk.ToggleButton(label="🧠 Reasoning")
self._reasoning_toggle.connect("toggled", self._on_reasoning_toggled)
```

#### Ollama Integration

When reasoning mode is enabled, pass additional options to Ollama:

```python
# Standard mode
ollama.chat(model=model, messages=messages)

# Reasoning mode (model-dependent)
ollama.chat(
    model=model,
    messages=messages,
    options={
        "temperature": 0.7,
        # Model-specific reasoning parameters
    },
)
```

#### Message Formatting

When reasoning is enabled and the model supports it:

- Display thinking process in distinct style (italic, gray text)
- Separate reasoning from final answer with visual divider
- Use expandable/collapsible section for reasoning (optional)

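
Separating reasoning from the answer depends on the model's output format. As one hedged example, some reasoning models wrap their chain of thought in `<think>` tags; `split_reasoning` is a hypothetical helper for that convention and would need adapting per model:

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Split '<think>...</think>answer' into (reasoning, answer).

    Returns ("", content) unchanged when no tags are present, so non-reasoning
    models render normally.
    """
    start, end = content.find("<think>"), content.find("</think>")
    if start == -1 or end == -1 or end < start:
        return "", content
    reasoning = content[start + len("<think>"):end].strip()
    answer = (content[:start] + content[end + len("</think>"):]).strip()
    return reasoning, answer
```

The UI would then style the first element as the gray/italic block and the second as the normal assistant message.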

#### Persistence

- Save reasoning preference to `~/.config/aisidebar/preferences.json`
- Load preference on startup
- Apply to all new conversations

## Data Models

### ConversationMetadata

```python
|
|
@dataclass
|
|
class ConversationMetadata:
|
|
"""Metadata for conversation list display."""
|
|
archive_id: str
|
|
created_at: str
|
|
updated_at: str
|
|
message_count: int
|
|
preview: str # First 50 chars of first user message
|
|
```

### CommandResult

```python
@dataclass
class CommandResult:
    """Result of command execution."""

    success: bool
    message: str
    data: dict | None = None
```

### PreferencesState

```python
@dataclass
class PreferencesState:
    """User preferences for sidebar behavior."""

    reasoning_enabled: bool = False
    default_model: str | None = None
    theme: str = "default"
```

## Error Handling

### Ollama Unavailability

- **Startup Without Ollama**: Initialize all components successfully, show status message in UI
- **Model List Failure**: Return empty list, display "Ollama not running" in model label
- **Chat Request Without Ollama**: Display friendly message: "Please start Ollama to use AI features"
- **Connection Lost Mid-Stream**: Display partial response + reconnection instructions
- **Periodic Availability Check**: Attempt to reconnect every 30s when unavailable (non-blocking)

#### Implementation Strategy

```python
class OllamaClient:
    def __init__(self, host: str | None = None) -> None:
        # Never raise exceptions during initialization;
        # set _available = False if connection fails
        ...

    def list_models(self) -> list[str]:
        # Return empty list instead of raising on connection failure;
        # log warning but don't crash
        ...

    def chat(self, ...) -> dict[str, str] | None:
        # Return error message dict instead of raising:
        # {"role": "assistant", "content": "Ollama unavailable..."}
        ...
```
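
The no-raise contract for `list_models()` can be illustrated with a thin wrapper. This is a sketch: `SafeModelLister` and the injected `list_fn` are hypothetical, standing in for the real client's call into the Ollama SDK.

```python
import logging

logger = logging.getLogger("aisidebar")

class SafeModelLister:
    """Wraps a model-listing callable so connection failures degrade to []."""

    def __init__(self, list_fn):
        self._list_fn = list_fn   # a callable that queries the Ollama SDK
        self.available = True

    def list_models(self) -> list[str]:
        try:
            models = self._list_fn()
            self.available = True
            return models
        except Exception as exc:   # connection refused, timeouts, ...
            logger.warning("Ollama unavailable: %s", exc)
            self.available = False
            return []
```

The `available` flag is what the UI (and the availability monitor below) would read to decide whether to show "Ollama not running" in the model label.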

### Streaming Errors

- **Connection Lost**: Display partial response + error message, allow retry
- **Model Unavailable**: Fall back to non-streaming mode with error notice
- **Stream Timeout**: Cancel after 60s, show timeout message

### Command Errors

- **Invalid Command**: Show available commands with `/help`
- **Invalid Arguments**: Display command usage syntax
- **File Not Found**: Handle missing conversation archives gracefully
- **Permission Errors**: Show clear error message for storage access issues

### Conversation Loading Errors

- **Corrupted JSON**: Log error, skip file, continue with other conversations
- **Missing Files**: Remove from list, show warning
- **Version Mismatch**: Attempt migration or show incompatibility notice

## Testing Strategy

### Unit Tests

1. **StreamingHandler**
   - Token buffering logic
   - Thread-safe UI updates
   - Stream completion handling

2. **CommandProcessor**
   - Command parsing (valid/invalid formats)
   - Each command execution path
   - Error handling for malformed commands

3. **ConversationArchive**
   - Archive ID generation uniqueness
   - List/load/save operations
   - File system error handling

4. **ReasoningController**
   - Toggle state management
   - Preference persistence
   - API option generation

### Integration Tests

1. **End-to-End Streaming**
   - Mock Ollama stream response
   - Verify UI updates occur
   - Check final message persistence

2. **Command Workflows**
   - `/new` → archive → `/list` → `/resume` flow
   - Model switching with active conversation
   - Command execution during streaming (edge case)

3. **Multi-line Input**
   - Text wrapping behavior
   - Submit vs newline key handling
   - Height expansion limits

### Manual Testing Checklist

- [ ] Stream response displays smoothly without flicker
- [ ] Multi-line input expands and wraps correctly
- [ ] All commands execute successfully
- [ ] Conversation archives persist across restarts
- [ ] Resume loads correct conversation history
- [ ] Reasoning toggle affects model behavior
- [ ] UI remains responsive during streaming
- [ ] Error states display helpful messages

## Implementation Notes

### GTK4 Threading Considerations

- All UI updates must occur on the main thread via `GLib.idle_add()`
- Run Ollama API calls on worker threads to prevent UI blocking
- Use `GLib.PRIORITY_DEFAULT` for normal updates, `GLib.PRIORITY_HIGH` for critical UI state

### Performance Optimizations

- Buffer tokens (3-5 at a time) to reduce `GLib.idle_add` overhead
- Limit scroll updates to every 100ms during streaming
- Cache conversation metadata to avoid repeated file I/O
- Lazy-load conversation content only when resuming

### Backward Compatibility

- Existing `default.json` conversation file remains compatible
- New archive files use a distinct naming pattern
- Preferences file is optional; defaults work without it
- Graceful degradation if gtk4-layer-shell is unavailable

### Ollama Availability Detection

Add a periodic checking mechanism to detect when Ollama becomes available:

```python
class OllamaAvailabilityMonitor:
    """Monitors Ollama availability and notifies UI of state changes."""

    def __init__(self, client: OllamaClient, callback: Callable[[bool], None]):
        self._client = client
        self._callback = callback
        self._last_state = False
        self._check_interval = 30  # seconds

    def start_monitoring(self) -> None:
        """Begin periodic availability checks via GLib.timeout_add."""

    def _check_availability(self) -> bool:
        """Check if Ollama is available and notify on state change."""
```
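
The notify-only-on-change behavior can be sketched without GTK; `AvailabilityWatcher` and its injected `probe` are hypothetical names, and in the real monitor the periodic `check()` would be driven by `GLib.timeout_add`:

```python
class AvailabilityWatcher:
    """Calls `on_change(bool)` only when the probed availability actually flips."""

    def __init__(self, probe, on_change):
        self._probe = probe          # callable returning current availability
        self._on_change = on_change  # UI callback, e.g. enable/disable input
        self._last_state = None      # unknown until the first check fires

    def check(self) -> bool:
        state = bool(self._probe())
        if state != self._last_state:
            self._last_state = state
            self._on_change(state)
        return True   # with GLib.timeout_add, returning True keeps the timer alive
```

Reporting only transitions keeps the 30-second polling from spamming the UI with redundant "still unavailable" updates.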

Integration in SidebarWindow:

- Initialize the monitor on startup
- Update UI state when availability changes (enable/disable input, update status message)
- Show a notification when Ollama becomes available: "Ollama connected - AI features enabled"

### Future Enhancements

- Command history with up/down arrow navigation
- Conversation search functionality
- Export conversations to markdown
- Custom keyboard shortcuts
- Syntax highlighting for code in messages
- Image/file attachment support