@basnijholt commented Jan 4, 2026

Summary

Integrates the vector-backed memory system (agent_cli/memory/MemoryClient) into the chat agent via LLM tools. The LLM can explicitly store and retrieve memories, with user consent. Also supports automatic memory extraction mode.

Changes

  • config.py: Added Memory dataclass for memory configuration with 3 modes (off, tools, auto)
  • opts.py: Added CLI options (--memory-mode, --memory-path, --memory-top-k, etc.)
  • _tools.py: Refactored to class-based MemoryTools bound to MemoryClient
  • chat.py: Initialize MemoryClient on startup, pass to tools, automatic extraction in auto mode
  • docs/: Updated documentation for memory system
  • tests/: Added comprehensive tests for MemoryTools class

Memory Modes

| Mode | Behavior |
|------|----------|
| `off` | Memory system disabled |
| `tools` (default) | LLM decides when to store/retrieve via tools |
| `auto` | Automatic extraction after each conversation turn |

Memory Tools

| Tool | Description |
|------|-------------|
| `add_memory` | Store information (LLM asks permission first) |
| `search_memory` | Semantic search across stored memories |
| `list_all_memories` | List all stored memories |
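The class-based refactor in `_tools.py` could be sketched like this (the three tool names come from the PR; the `MemoryClient` calls they delegate to are assumptions):

```python
class MemoryTools:
    """Memory tools bound to a memory client and a conversation id (sketch)."""

    def __init__(self, memory_client, conversation_id: str) -> None:
        self._client = memory_client
        self._conversation_id = conversation_id

    def add_memory(self, content: str) -> str:
        # The system prompt instructs the LLM to ask the user before calling this.
        self._client.add(content, conversation_id=self._conversation_id)
        return f"Stored: {content}"

    def search_memory(self, query: str, top_k: int = 5) -> list[str]:
        # Semantic search over stored memories.
        return self._client.search(query, top_k=top_k)

    def list_all_memories(self) -> list[str]:
        return self._client.list_all()
```

Binding the client and conversation id in `__init__` is what lets the PR drop the module-level globals the closure-based version needed.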

CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `--memory-mode` | `tools` | Memory mode: `off`, `tools`, or `auto` |
| `--memory-path` | auto | Path for memory database storage |
| `--embedding-model` | `text-embedding-3-small` | Embedding model for semantic search |
| `--memory-top-k` | `5` | Number of memories to retrieve |
| `--memory-score-threshold` | `0.35` | Minimum relevance score |

How it Works

  1. Memory system auto-initializes if [memory] extra is installed
  2. In tools mode: LLM has access to memory tools and decides when to use them
  3. In auto mode: Facts are automatically extracted after each conversation turn
  4. Falls back gracefully if memory extra not installed
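The graceful fallback in steps 1 and 4 amounts to a guarded import; a minimal sketch (the module path follows the PR, `memory_available` itself is a hypothetical helper):

```python
def memory_available(module: str = "agent_cli.memory.client") -> bool:
    """Return True when the optional [memory] extra can be imported.

    Hypothetical helper illustrating the graceful-fallback check; the chat
    agent simply skips memory initialization when this would be False.
    """
    try:
        __import__(module)
    except ImportError:  # extra not installed: chat continues without memory
        return False
    return True
```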

Test Plan

  • All tests pass (including new MemoryTools tests)
  • Pre-commit checks pass
  • Manual test: memory system initializes
  • Manual test: LLM asks permission before storing (tools mode)

Closes #158

basnijholt and others added 11 commits January 4, 2026 00:01
Replace the simple JSON-based memory in the chat agent with the advanced
MemoryClient that provides semantic search via ChromaDB and embeddings.

Changes:
- Add AdvancedMemory config dataclass with memory system options
- Add CLI options: --advanced-memory, --memory-path, --memory-top-k, etc.
- Refactor _tools.py to support dual backends (simple JSON vs advanced)
- Initialize MemoryClient in chat agent when --advanced-memory enabled
- Auto-fallback to simple memory if [memory] extra not installed

The advanced memory system is enabled by default and provides:
- Semantic search using vector embeddings
- MMR retrieval for diverse results
- Recency weighting and score thresholds
- Automatic fact extraction and reconciliation

Closes #158
- Add Memory System section to chat.md explaining the new advanced
  memory feature with semantic search
- Add cross-links between chat, memory, and architecture docs
- Regenerate auto-generated options tables to include new Memory Options
Remove the simple JSON-based memory system, keeping only the vector-backed
MemoryClient. This simplifies the codebase by eliminating the dual-backend
logic and the --advanced-memory flag.

- Rename AdvancedMemory config to Memory, remove enabled field
- Remove all simple memory functions from _tools.py
- Rename init_advanced_memory/cleanup_advanced_memory to init_memory/cleanup_memory
- Update chat.py to use simplified memory initialization
- Update documentation to remove "advanced" terminology
- Remove obsolete test_memory_tools.py
- Replace closure-based memory tools with MemoryTools class
- Pass memory_client and conversation_id directly to tools()
- Remove module-level globals (_memory_client, _conversation_id)
- Remove init_memory/cleanup_memory lifecycle functions
- Update chat.py to handle memory client lifecycle directly
- Add proper type hints using TYPE_CHECKING imports
- Update tests to pass new required parameters
The tool was misleading: it counted entries by internal role
(memory, user, assistant, summary) rather than by user-facing
categories (personal, preferences, facts, etc.).
@basnijholt changed the title feat(chat): integrate advanced vector-backed memory system → feat(chat): integrate vector-backed memory system with LLM tools Jan 4, 2026
```python
    Uses a hash of the history directory path to ensure consistency across sessions.
    """
    import hashlib  # noqa: PLC0415
```
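The idea in the snippet under review can be sketched as follows (the digest algorithm, prefix, and truncation length are assumptions, not the PR's exact code):

```python
import hashlib
from pathlib import Path


def derive_conversation_id(history_dir: Path) -> str:
    """Derive a conversation id that is stable across sessions.

    Sketch only: hashing the history directory path yields the same id
    every time the same chat history location is used.
    """
    digest = hashlib.sha256(str(history_dir).encode("utf-8")).hexdigest()
    return f"chat-{digest[:16]}"
```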

Import on top

```python
    Returns the MemoryClient if successful, None otherwise.
    """
    from agent_cli.memory.client import MemoryClient as MemoryClientImpl  # noqa: PLC0415
```

Just normal please

basnijholt and others added 15 commits January 4, 2026 01:10
Add memory mode selection with three options:
- off: Memory system disabled
- tools: LLM decides via add_memory/search_memory tools (default)
- auto: Automatic fact extraction after each conversation turn

The modes are mutually exclusive to avoid duplicate memory storage.
In "auto" mode, facts are automatically extracted from both user and
assistant messages without requiring explicit tool calls.

Resolves #184 as part of #183
Pass the user's configured openai model to extract_from_turn() instead
of using the default gpt-5-mini, which may not be available on custom
OpenAI-compatible endpoints.
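A sketch of the extraction step described in these two commits (`extract_from_turn` is named in the PR, but its signature, and the wrapper around it, are assumptions):

```python
def run_auto_extraction(
    memory_client,
    user_msg: str,
    assistant_msg: str,
    mode: str,
    model: str,
) -> bool:
    """After a conversation turn, extract facts automatically in "auto" mode.

    Hypothetical wrapper: the user's configured model is forwarded so that
    extraction also works on custom OpenAI-compatible endpoints.
    """
    if mode != "auto":
        return False  # "tools" mode leaves storage decisions to the LLM
    memory_client.extract_from_turn(
        user=user_msg, assistant=assistant_msg, model=model
    )
    return True
```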
- Fix opts.with_default type hint (str → Any) for bool support
- Fix FBT003 lint errors by using keyword arg default=True
- Fix tests using old --no-git-versioning option name
- Add comprehensive tests for MemoryTools class (30 tests)
- Document memory modes (off/tools/auto) in chat.md
In "auto" mode, the LLM now has read-only access to memory tools
(search_memory, list_all_memories) while extraction still happens
automatically. Previously, auto mode disabled all memory access
for the LLM, meaning stored facts couldn't be searched.

Also added read_only parameter to create_memory_tools() and
memory_read_only parameter to tools() function with tests.
In "auto" mode, relevant memories are now automatically retrieved
and injected into the system prompt before each LLM call. This
mirrors the memory-proxy behavior but only in auto mode.

Memory mode behavior:
- off: No memory at all
- tools: LLM has full control via tools (no auto-injection)
- auto: Auto-inject + read-only tools + auto-extract after turn
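The auto-mode injection described above could look roughly like this (the prompt wording and the list-of-strings shape of retrieved memories are assumptions):

```python
def inject_memories(base_prompt: str, memories: list[str]) -> str:
    """Append retrieved memories to the system prompt (auto mode only).

    Sketch of the auto-injection behavior; in "tools" mode the prompt is
    left untouched and the LLM queries memory explicitly instead.
    """
    if not memories:
        return base_prompt
    bullets = "\n".join(f"- {m}" for m in memories)
    return f"{base_prompt}\n\nRelevant memories about the user:\n{bullets}"
```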