feat(chat): integrate vector-backed memory system with LLM tools #183
Open: basnijholt wants to merge 26 commits into main from feat/chat-memory-integration (base: main)
Conversation
Replace the simple JSON-based memory in the chat agent with the advanced MemoryClient that provides semantic search via ChromaDB and embeddings.

Changes:
- Add AdvancedMemory config dataclass with memory system options
- Add CLI options: --advanced-memory, --memory-path, --memory-top-k, etc.
- Refactor _tools.py to support dual backends (simple JSON vs advanced)
- Initialize MemoryClient in chat agent when --advanced-memory enabled
- Auto-fallback to simple memory if [memory] extra not installed

The advanced memory system is enabled by default and provides:
- Semantic search using vector embeddings
- MMR retrieval for diverse results
- Recency weighting and score thresholds
- Automatic fact extraction and reconciliation

Closes #158
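For orientation, a store-and-search loop like this can be sketched directly on chromadb. Everything below is illustrative: `remember` and `recall` are hypothetical names, not the MemoryClient API, and the sketch omits the MMR diversification, recency weighting, and fact reconciliation the real system provides.

```python
# Minimal sketch of a vector-backed memory lookup (illustrative only; the
# real MemoryClient lives in agent_cli/memory and its API may differ).
import chromadb

client = chromadb.PersistentClient(path="./memory-db")
# Use cosine distance so a similarity threshold is easy to interpret.
collection = client.get_or_create_collection(
    name="memories", metadata={"hnsw:space": "cosine"}
)

def remember(memory_id: str, text: str) -> None:
    """Store a memory; chromadb embeds the document automatically."""
    collection.add(ids=[memory_id], documents=[text])

def recall(query: str, top_k: int = 5, score_threshold: float = 0.35) -> list[str]:
    """Return up to top_k memories whose similarity clears the threshold."""
    result = collection.query(query_texts=[query], n_results=top_k)
    hits = []
    for doc, distance in zip(result["documents"][0], result["distances"][0]):
        # Cosine distance = 1 - similarity, so convert back before filtering.
        if 1.0 - distance >= score_threshold:
            hits.append(doc)
    return hits
```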
- Add Memory System section to chat.md explaining the new advanced memory feature with semantic search
- Add cross-links between chat, memory, and architecture docs
- Regenerate auto-generated options tables to include new Memory Options
Remove the simple JSON-based memory system, keeping only the vector-backed MemoryClient. This simplifies the codebase by eliminating the dual-backend logic and the --advanced-memory flag.

- Rename AdvancedMemory config to Memory, remove enabled field
- Remove all simple memory functions from _tools.py
- Rename init_advanced_memory/cleanup_advanced_memory to init_memory/cleanup_memory
- Update chat.py to use simplified memory initialization
- Update documentation to remove "advanced" terminology
- Remove obsolete test_memory_tools.py
- Replace closure-based memory tools with MemoryTools class
- Pass memory_client and conversation_id directly to tools()
- Remove module-level globals (_memory_client, _conversation_id)
- Remove init_memory/cleanup_memory lifecycle functions
- Update chat.py to handle memory client lifecycle directly
- Add proper type hints using TYPE_CHECKING imports
- Update tests to pass new required parameters
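A class-based tool holder along these lines makes the dependency injection explicit. This is a hedged sketch of the pattern, not the actual _tools.py code; the `client.add(...)` call in particular assumes a method the real MemoryClient may not have.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from agent_cli.memory.client import MemoryClient


class MemoryTools:
    """Memory tools bound to a specific client and conversation.

    Replaces the old module-level globals (_memory_client, _conversation_id):
    state travels with the instance, so no init/cleanup lifecycle is needed.
    """

    def __init__(self, memory_client: MemoryClient, conversation_id: str) -> None:
        self._client = memory_client
        self._conversation_id = conversation_id

    def add_memory(self, text: str) -> str:
        # Hypothetical delegation; the real client API may differ.
        self._client.add(text, conversation_id=self._conversation_id)
        return "Memory stored."
```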
The tool was misleading: it counted entries by internal role (memory, user, assistant, summary) rather than by user-facing categories (personal, preferences, facts, etc.).
basnijholt (Owner, Author) commented on Jan 4, 2026, on agent_cli/agents/chat.py (outdated):

```python
Uses a hash of the history directory path to ensure consistency across sessions.
"""
import hashlib  # noqa: PLC0415
```

Import on top
basnijholt (Owner, Author) commented on agent_cli/agents/chat.py (outdated):

```python
Returns the MemoryClient if successful, None otherwise.
"""
from agent_cli.memory.client import MemoryClient as MemoryClientImpl  # noqa: PLC0415
```

Just normal please
Add memory mode selection with three options:
- off: Memory system disabled
- tools: LLM decides via add_memory/search_memory tools (default)
- auto: Automatic fact extraction after each conversation turn

The modes are mutually exclusive to avoid duplicate memory storage. In "auto" mode, facts are automatically extracted from both user and assistant messages without requiring explicit tool calls.

Resolves #184 as part of #183
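The mutual exclusivity can be pictured with a sketch like the following; `Memory`, `should_auto_extract`, and `should_expose_write_tools` are assumed names for illustration, not the project's actual config.py code.

```python
# Assumed shape of the mode plumbing (illustrative only).
from dataclasses import dataclass
from typing import Literal

MemoryMode = Literal["off", "tools", "auto"]

@dataclass
class Memory:
    mode: MemoryMode = "tools"      # default per the PR description
    top_k: int = 5
    score_threshold: float = 0.35

def should_auto_extract(cfg: Memory) -> bool:
    # Only "auto" extracts facts after each turn...
    return cfg.mode == "auto"

def should_expose_write_tools(cfg: Memory) -> bool:
    # ...and only "tools" lets the LLM store memories itself, so the two
    # paths never double-store the same fact.
    return cfg.mode == "tools"
```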
…ify MemoryClient import
Pass the user's configured OpenAI model to extract_from_turn() instead of using the default gpt-5-mini, which may not be available on custom OpenAI-compatible endpoints.
- Fix opts.with_default type hint (str → Any) for bool support
- Fix FBT003 lint errors by using keyword arg default=True
- Fix tests using old --no-git-versioning option name
- Add comprehensive tests for MemoryTools class (30 tests)
- Document memory modes (off/tools/auto) in chat.md
In "auto" mode, the LLM now has read-only access to memory tools (search_memory, list_all_memories) while extraction still happens automatically. Previously, auto mode disabled all memory access for the LLM, meaning stored facts couldn't be searched. Also added read_only parameter to create_memory_tools() and memory_read_only parameter to tools() function with tests.
In "auto" mode, relevant memories are now automatically retrieved and injected into the system prompt before each LLM call. This mirrors the memory-proxy behavior but only in auto mode. Memory mode behavior: - off: No memory at all - tools: LLM has full control via tools (no auto-injection) - auto: Auto-inject + read-only tools + auto-extract after turn
Summary
Integrates the vector-backed memory system (`agent_cli/memory/MemoryClient`) into the chat agent via LLM tools. The LLM can explicitly store and retrieve memories, with user consent. Also supports an automatic memory extraction mode.

Changes
- `config.py`: Added `Memory` dataclass for memory configuration with 3 modes (off, tools, auto)
- `opts.py`: Added CLI options (`--memory-mode`, `--memory-path`, `--memory-top-k`, etc.)
- `_tools.py`: Refactored to class-based `MemoryTools` bound to `MemoryClient`
- `chat.py`: Initialize `MemoryClient` on startup, pass to tools, automatic extraction in auto mode
- `docs/`: Updated documentation for memory system
- `tests/`: Added comprehensive tests for `MemoryTools` class

Memory Modes
| Mode | Behavior |
| --- | --- |
| `off` | Memory system disabled |
| `tools` (default) | LLM decides when to store/search via memory tools |
| `auto` | Memories auto-injected into the prompt; facts auto-extracted after each turn; read-only tools |

Memory Tools
- `add_memory`
- `search_memory`
- `list_all_memories`

CLI Options
| Option | Default |
| --- | --- |
| `--memory-mode` | `tools` |
| `--memory-path` | |
| `--embedding-model` | `text-embedding-3-small` |
| `--memory-top-k` | `5` |
| `--memory-score-threshold` | `0.35` |
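For illustration, these flags could be declared along the following lines, assuming a Typer-style CLI; the project's actual opts.py helpers (e.g. the with_default helper mentioned in an earlier commit) likely differ.

```python
import typer

app = typer.Typer()

@app.command()
def chat(
    # Defaults taken from the options table above.
    memory_mode: str = typer.Option("tools", "--memory-mode", help="off | tools | auto"),
    memory_path: str | None = typer.Option(None, "--memory-path"),
    embedding_model: str = typer.Option("text-embedding-3-small", "--embedding-model"),
    memory_top_k: int = typer.Option(5, "--memory-top-k"),
    memory_score_threshold: float = typer.Option(0.35, "--memory-score-threshold"),
) -> None:
    ...
```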
How it Works

- Memory is available when the `[memory]` extra is installed
- `tools` mode: LLM has access to memory tools and decides when to use them
- `auto` mode: Facts are automatically extracted after each conversation turn

Test Plan
Closes #158