liminallm is an experiment in what a chatgpt-like system looks like if you stop hard-coding product logic and let the model help evolve itself.
it’s a small kernel wrapped around:
- a frozen base llm (jax)
- per-user / per-skill lora adapters
- emergent “skills” from clusters + preference events
- self-describing artifacts (workflows, routing policies, tools)
- notebooklm-style grounding over filesystem-backed files
- boring infra: postgres + redis + filesystem
- artifacts and adapter payloads live as JSON/weights on the shared filesystem
the code is just the glue. everything interesting lives as data.
```
User Feedback → Embeddings → Clustering → Skill Discovery
      ↑                                          ↓
Router Updates ← Adapter Training ← Promotion Decision
```
- chatgpt-like web ui
  - multi-user, password + pluggable auth
  - conversations, history, summaries
  - text first; voice later
- deep behavioral memory
  - per-user persona adapters (lora)
  - skill adapters born from usage: “when problems like this show up, start with this debugging workflow”
  - continuous micro-training jobs in jax, only on adapters, never on the base model
- natural factual memory
  - user files in the filesystem (`/users/{id}/files`)
  - ingestion → chunking → embeddings in postgres (pgvector)
  - notebooklm-style: bind “contexts” (collections of files/folders) to a chat and ask questions grounded in that corpus
- small kernel, big data
  - kernel only knows how to:
    - auth users
    - run workflows (graphs)
    - run routing policies
    - call the llm with optional lora adapters
    - talk to postgres / redis / filesystem
  - everything else (domains, skills, behaviors, tools, routing rules) is expressed as artifacts:
    - `adapter.lora`
    - `workflow.chat`
    - `policy.routing`
    - `tool.spec`
    - `context.knowledge`
    - etc.
- emergent domains & skills
  - no hard-coded `DEBUGGING`, `WRITING`, whatever
  - we cluster preference events in embedding space
  - llm labels clusters (“kernel panic debugging”, “multi-tenant billing schema design”, …)
  - when a cluster is big + consistently positive, we auto-propose a new skill adapter tied to that cluster
- router as data, not code
  - routing policies are artifacts (`policy.routing`) with a tiny expression language:
    - conditions over embeddings, clusters, safety flags
    - actions: activate/deactivate adapters, scale weights, etc.
  - the router engine is dumb and stable; policy is editable data (see the sketch after the services list below)
- llm as architect (under guardrails)
  - a config-ops api lets the llm propose patches to:
    - routing policies
    - workflows
    - adapter metadata
  - patches are stored, validated, can be auto- or human-approved, and are fully versioned
- language / runtime
  - python (services, api, orchestration)
  - jax (base model + lora training)
- storage
  - postgres
    - users, auth, conversations, messages
    - artifacts & versions
    - semantic clusters
    - knowledge chunks (with pgvector)
    - preference events, training jobs, router state
  - redis
    - sessions
    - rate limiting
    - hot conversation summaries
    - router and workflow scratch state
  - filesystem
    - `/shared/models` – frozen base model weights
    - `/users/{id}/files` – user docs
    - `/users/{id}/adapters` – per-user lora weight files
    - `/users/{id}/artifacts` – generated notebooks, exports, etc.
- services (logically)
  - auth service
  - chat orchestrator
  - artifact service
  - workflow engine
  - router service
  - llm inference (jax + lora)
  - knowledge / rag service
  - preference + training service
  - clusterer + skill discovery
  - configops (patch proposals / approvals)

for v1 these can all live in one python app with clear module boundaries.
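to make “policy as data” and the config-ops loop concrete, here's a minimal sketch of what a `policy.routing` artifact and an llm-proposed patch against it might look like — field names, the expression syntax, the thresholds, and the patch format are illustrative assumptions, not the actual schemas:

```python
# hypothetical shapes for a policy.routing artifact and a config-ops patch.
# everything below (field names, expression syntax, thresholds) is assumed
# for illustration; the real artifact schema may differ.

routing_policy = {
    "kind": "policy.routing",
    "version": 3,
    "rules": [
        {
            # condition: a tiny expression over cluster membership / similarity
            "when": "cluster_id == 'kernel-panic-debugging' and similarity >= 0.82",
            # action: turn on the matching skill adapter at a given weight
            "then": [{"op": "activate_adapter", "adapter_id": "skill-kernel-debug", "weight": 0.7}],
        },
        {
            "when": "safety_flags.contains('self_harm')",
            "then": [{"op": "deactivate_adapter", "adapter_id": "*"}],
        },
    ],
}

# an llm-proposed change (via the config-ops api) is just more data: it targets
# an artifact + version and waits for auto- or human approval before it applies.
proposed_patch = {
    "target": {"kind": "policy.routing", "id": "default", "base_version": 3},
    "ops": [{"path": "/rules/0/then/0/weight", "op": "replace", "value": 0.85}],
    "rationale": "users in this cluster prefer a stronger debugging persona",
    "status": "pending_approval",
}
```

the kernel only evaluates whatever rules are stored; changing behavior means editing (or approving a patch to) this data, not shipping new router code.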
- early design / prototyping
- do not treat as production-ready
- interfaces & schemas are expected to change
- goal is to keep:
  - implementation minimal
  - all "product behavior" in data (artifacts / policies / workflows)
  - evolution driven by usage + llm suggestions, not constant code surgery
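as a rough illustration of that last point, this is the kind of promotion check the clusterer / skill-discovery step implies — the thresholds, dataclass fields, and proposal shape are assumptions, not the shipped logic:

```python
from dataclasses import dataclass

# illustrative thresholds; a real deployment would tune or learn these
MIN_CLUSTER_SIZE = 25
MIN_MEAN_SCORE = 0.6

@dataclass
class PreferenceEvent:
    cluster_id: str
    score: float  # e.g. +1 for thumbs-up, -1 for thumbs-down

def propose_skill_adapters(events: list[PreferenceEvent]) -> list[dict]:
    """group preference events by cluster and propose a skill adapter for
    clusters that are both large and consistently positive."""
    by_cluster: dict[str, list[float]] = {}
    for ev in events:
        by_cluster.setdefault(ev.cluster_id, []).append(ev.score)

    proposals = []
    for cluster_id, scores in by_cluster.items():
        mean_score = sum(scores) / len(scores)
        if len(scores) >= MIN_CLUSTER_SIZE and mean_score >= MIN_MEAN_SCORE:
            proposals.append({
                "kind": "adapter.lora",           # new skill adapter artifact
                "cluster_id": cluster_id,
                "status": "proposed",             # still needs training + approval
                "evidence": {"events": len(scores), "mean_score": round(mean_score, 3)},
            })
    return proposals
```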
```bash
# Start the full test stack (Postgres, Redis, App)
docker compose -f docker-compose.test.yml up --build

# Verify health
curl http://localhost:8000/healthz

# Pre-configured admin credentials:
# Email: admin@test.local
# Password: TestAdmin123!
```

This automatically:
- Runs database migrations
- Bootstraps an admin user
- Serves the chat UI at http://localhost:8000/
- Serves the admin console at http://localhost:8000/admin
For running directly on a Linux host without containers.
For development and testing without external dependencies:
```bash
# Install dependencies
pip install -e ".[dev]"

# Set required environment variables
export JWT_SECRET="YourSecure-JWT-Secret-With-32-Characters!"
export SHARED_FS_ROOT="/tmp/liminallm"
export USE_MEMORY_STORE=true
export TEST_MODE=true

# Start the API server
uvicorn liminallm.app:app --reload --host 0.0.0.0 --port 8000

# Verify health
curl http://localhost:8000/healthz
```

For persistent storage and production use:
```bash
# 1. Install system dependencies
sudo apt update
sudo apt install -y python3.11 python3.11-venv postgresql-16 redis-server libpq-dev gcc

# 2. Install PostgreSQL extensions
sudo -u postgres psql -c "CREATE DATABASE liminallm;"
sudo -u postgres psql -c "CREATE USER liminallm WITH PASSWORD 'yourpassword';"
sudo -u postgres psql -d liminallm -c "CREATE EXTENSION IF NOT EXISTS vector;"
sudo -u postgres psql -d liminallm -c "CREATE EXTENSION IF NOT EXISTS citext;"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE liminallm TO liminallm;"

# 3. Create Python environment and install
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# 4. Create data directories
sudo mkdir -p /srv/liminallm/{adapters,artifacts,models,users}
sudo chown -R $USER:$USER /srv/liminallm

# 5. Set environment variables
export DATABASE_URL="postgresql://liminallm:yourpassword@localhost:5432/liminallm"
export REDIS_URL="redis://localhost:6379/0"
export JWT_SECRET="YourSecure-JWT-Secret-With-32-Characters!"
export SHARED_FS_ROOT="/srv/liminallm"

# 6. Run database migrations
./scripts/migrate.sh

# 7. Bootstrap admin user
python scripts/bootstrap_admin.py --email admin@test.local --password TestAdmin123!

# 8. Start the server
uvicorn liminallm.app:app --host 0.0.0.0 --port 8000 --workers 4
```

```bash
# Run all tests in memory mode
TEST_MODE=true USE_MEMORY_STORE=true pytest tests/ -v

# Run specific test suites
pytest tests/test_post_smoke.py -v          # Post-smoke tests
pytest tests/test_integration_admin.py -v   # Admin tests

# Run smoke tests
./scripts/smoke_test.sh http://localhost:8000
```

See docs/DEPLOYMENT.md for complete native deployment documentation including systemd services, TLS setup, and GPU configuration.
Before QA begins, verify:
| Criterion | How to Verify |
|---|---|
| Health check | curl http://localhost:8000/healthz returns {"status": "healthy"} |
| Chat UI loads | Open http://localhost:8000/ in browser |
| User signup | Sign up via UI or POST /v1/auth/signup |
| User login | Log in via UI or POST /v1/auth/login |
| Send message | Create conversation and send via /v1/chat |
| Admin protected | Regular user gets 403 on /v1/admin/settings |
| Admin access | Admin user gets 200 on /v1/admin/settings |
| Tests pass | ./scripts/run_tests.sh passes on fresh install |
| Bootstrap works | python scripts/bootstrap_admin.py creates admin |
Run the automated smoke test:
```bash
./scripts/smoke_test.sh http://localhost:8000
```

- Install and runtime guidance (docker compose and manual host setup) live in docs/DEPLOYMENT.md
- Configuration architecture documented in docs/CONFIGURATION.md
- Testing guide in TESTING.md
- implemented
- file upload endpoint writing to the shared filesystem and ingesting chunks into RAG contexts with configurable chunk sizes; default retrieval runs against pgvector with shared deterministic embeddings (optional in-process hybrid fallback for dev/test)
- workflow execution with branching/parallel scheduling across `workflow.chat` graphs
- router policies with a sandboxed evaluation engine (limited adapter gating usage)
- pluggable model backend that can target external API fine-tune IDs or local JAX+LoRA adapter application
- filesystem-backed LoRA adapter training that turns preference events into new adapter versions
- preference capture with clustering + skill adapter promotion and routing integration
- hardened auth + multi-tenant isolation: OAuth provider mapping, session revocation on password resets, error envelopes with stable `error.code`, ownership-enforced artifact and conversation access (including workflows/tools), adapter checksum + path validation, and email verification flows
- MFA with TOTP enrollment (otpauth URL), session gating, and login verification
- email verification tokens with `/v1/auth/request_email_verification` and `/v1/auth/verify_email`
- tenant-scoped conversation history enforcement in workflows and tool invocations
- HMAC-signed JWT access tokens with refresh rotation, tenant-aware sessions, and admin-only config endpoints
- preference UI and rich routing feedback loop
- LLM-as-architect auto-patch generation
- voice interface
- admin UI for patch approval
- chat and admin frontends prompt for MFA codes when required and revoke sessions on logout
note: this is intentionally vague; exact commands depend on how you wire the codebase.
- bring your infra
  - postgres (with pgvector installed)
  - redis
  - filesystem path accessible to the app
  - gpu / tpu for jax model if you expect to train adapters
- backend selection is single-sourced from the SQL deployment config (editable from the web console when wired); env vars only override if you set them explicitly
- set `MODEL_BACKEND=local_gpu_lora` to target the local JAX+LoRA path instead of external API fine-tune IDs; omit or leave as the default to use the OpenAI-style plug. The JAX backend (`LocalJaxLoRABackend` in `liminallm/service/model_backend.py`) loads adapters from the filesystem, tokenizes prompts, runs a JAX forward pass, and enforces conservative shapes; it requires a JAX runtime and optionally a Transformers tokenizer for decode parity. OpenAI plug secrets live under adapter-specific env vars (see below).
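For orientation, here is a minimal sketch of what “apply a LoRA adapter on top of a frozen base matmul” means in JAX — the shapes, rank, and parameter layout are illustrative assumptions, not the actual `LocalJaxLoRABackend` internals:

```python
import jax
import jax.numpy as jnp

def lora_linear(x, w_frozen, lora_a, lora_b, alpha=16.0):
    """frozen base projection plus a low-rank delta.

    x:        (batch, d_in) activations
    w_frozen: (d_in, d_out) base weight, never updated
    lora_a:   (d_in, r)     trainable adapter factor
    lora_b:   (r, d_out)    trainable adapter factor
    """
    r = lora_a.shape[1]
    base = x @ w_frozen                 # frozen path
    delta = (x @ lora_a) @ lora_b       # low-rank adapter path
    return base + (alpha / r) * delta   # standard lora scaling

# toy shapes just to show the wiring
key = jax.random.PRNGKey(0)
d_in, d_out, r = 64, 64, 8
x = jax.random.normal(key, (2, d_in))
w = jax.random.normal(key, (d_in, d_out)) * 0.02
a = jax.random.normal(key, (d_in, r)) * 0.01
b = jnp.zeros((r, d_out))               # zero-init so the adapter starts as a no-op

y = lora_linear(x, w, a, b)             # equals x @ w until b is trained
```

Only `lora_a` / `lora_b` would ever be handed to the optimizer; `w_frozen` stays read-only, which is what keeps per-user adapters cheap to train and store.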
- A minimal, ChatGPT-style UI now lives in `/frontend` and is served by the FastAPI app at `/`, with static assets mounted at `/static/*`.
- Authenticate with `/v1/auth/login`; the UI stores the issued bearer token/tenant ID locally and uses it for `/v1/chat`, `/v1/conversations`, and other API calls.
- The admin console is separate at `/admin` and is guarded by the `admin` role (FastAPI enforces the role before serving the HTML). It surfaces config patch proposal/approval flows backed by `/v1/config/*` endpoints, tenant-scoped user administration (list/add/delete, role changes), adapter visibility, and a read-only inspector for database objects.
- Run `scripts/run_tests.sh` to mirror CI defaults; it compiles the code and executes `pytest` with in-memory stores enabled for deterministic local runs.
- Router policies pick an adapter; the inference backend decides whether that means applying LoRA weights locally, swapping to a remote fine-tuned model ID, or injecting distilled prompt instructions on top of a black-box API.

- Each `adapter.lora` artifact carries a `backend` field describing where inference happens:

  ```json
  {
    "kind": "adapter.lora.remote",
    "provider": "zhipu",
    "backend": "api",
    "base_model": "glm-4-air",
    "remote_model_id": "glm-4-air-ft-2025-11-01-u123-debug",
    "region": "cn-beijing",
    "cluster_id": "…",
    "applicability": {
      "natural_language": "u123: kernel panic debugging skill on GLM-4-Air",
      "embedding_centroid": []
    }
  }
  ```

  ```json
  {
    "kind": "adapter.lora.local",
    "backend": "local",
    "provider": "aliyun",
    "base_model": "qwen2.5-32b-instruct",
    "cephfs_dir": "/users/u123/adapters/{id}",
    "rank": 8,
    "layers": [0, 1, 2, 3],
    "matrices": ["attn_q", "attn_v"],
    "cluster_id": "…"
  }
  ```

  ```json
  {
    "kind": "adapter.lora.prompt",
    "backend": "prompt",
    "provider": "api_only",
    "base_model": "glm-4-air",
    "prompt_instructions": "for kernel issues: reproduce → bisect → log inspection; keep replies terse",
    "cluster_id": "…",
    "applicability": {
      "natural_language": "prompt-distilled skill for kernel debugging",
      "embedding_centroid": []
    }
  }
  ```

- Remote adapters send requests to OpenAI-compatible fine-tuned model IDs (e.g., Zhipu BigModel or Alibaba DashScope). Local adapters resolve to filesystem-backed LoRA weights and are composable. Prompt-distilled adapters inject behavior as system messages without changing model IDs so you can still steer API-only providers.

- “Model-ID adapters” (fine-tuned endpoints) map 1:1 to model strings on providers like OpenAI/Azure (fine-tuned deployments), Vertex AI Gemini, or Bedrock custom models. Switching behavior = switching the `model` string; composition happens at routing time, not inside a single call.

- “Adapter-ID adapters” (multi-LoRA / adapter servers) surface `adapter_id` parameters on Together AI Serverless Multi-LoRA, LoRAX-style servers, or SageMaker adapter inference components. The backend keeps the base model string and passes `adapter_id` for one-or-more adapters per request when supported.

- Hybrid patterns (local adapter-enabled “controller” + external API “executor”) flow through the same artifacts: the controller uses a local LoRA backend to plan, then the API backend executes with prompt or remote-model adapters.
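A minimal sketch of the dispatch this implies — the adapter fields mirror the examples above, but the function and the request shape are illustrative assumptions rather than the actual backend interface:

```python
def build_inference_call(adapter: dict, base_request: dict) -> dict:
    """route one chat request according to the adapter's `backend` field.

    `adapter` follows the adapter.lora.* shapes sketched above; `base_request`
    is an OpenAI-style payload with `model` and `messages`. Illustrative only.
    """
    request = dict(base_request)
    backend = adapter.get("backend")

    if backend == "api":
        # remote fine-tune: changing behavior == swapping the model string
        request["model"] = adapter["remote_model_id"]
    elif backend == "local":
        # local LoRA: keep the base model, attach filesystem-backed weights
        request["model"] = adapter["base_model"]
        request["lora_weights_dir"] = adapter["cephfs_dir"]
    elif backend == "prompt":
        # prompt-distilled: inject behavior as a system message, model unchanged
        request["model"] = adapter["base_model"]
        request["messages"] = (
            [{"role": "system", "content": adapter["prompt_instructions"]}]
            + request["messages"]
        )
    else:
        raise ValueError(f"unknown adapter backend: {backend!r}")
    return request
```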
- configure env
  - `DATABASE_URL` – postgres dsn
  - `REDIS_URL` – redis dsn
  - `SHARED_FS_ROOT` – filesystem root path
  - `MODEL_PATH` – model identifier for cloud mode (default `gpt-4o-mini`) or filesystem path when using an adapter server
  - `OPENAI_ADAPTER_API_KEY` – OpenAI plug API key (leave unset to use the echo fallback)
  - `OPENAI_ADAPTER_BASE_URL` – optional base URL override when pointing at an OpenAI-compatible endpoint
  - `ADAPTER_SERVER_MODEL` – model name when pointing at an OpenAI-compatible adapter server
  - `USE_MEMORY_STORE` – set to `true` to run without Postgres/Redis while testing the API and LLM calls
  - `TEST_MODE` – set to `true` to allow Redis-free test harnesses (rate limits, idempotency durability, and caches are disabled)
  - `RAG_CHUNK_SIZE` – default character window for knowledge ingestion; overrides can be provided per request
  - `RAG_MODE` – `pgvector` (default) uses the database index; `local_hybrid` forces the in-process BM25+cosine fallback for dev/test
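To make the `RAG_CHUNK_SIZE` / pgvector pair concrete, here is a rough sketch of character-window chunking and a similarity query. The table and column names (`knowledge_chunk`, `embedding`, `content`) and the use of `psycopg` are assumptions for illustration; only the pgvector `<=>` cosine-distance operator is standard:

```python
import os
import psycopg  # assumed client; any postgres driver works the same way

def chunk_text(text: str, size: int | None = None) -> list[str]:
    """split a document into fixed-size character windows,
    mirroring the RAG_CHUNK_SIZE behavior described above."""
    size = size or int(os.environ.get("RAG_CHUNK_SIZE", "1000"))
    return [text[i : i + size] for i in range(0, len(text), size)]

def top_k_chunks(query_embedding: list[float], k: int = 5) -> list[str]:
    """retrieve the k nearest chunks by cosine distance via pgvector's <=> operator.
    table/column names here are illustrative, not the project's actual schema."""
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        rows = conn.execute(
            """
            SELECT content
            FROM knowledge_chunk
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (vector_literal, k),
        ).fetchall()
    return [r[0] for r in rows]
```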
- migrate db
  - run the alembic / migration tool to create tables described in the spec.
  - if you ran earlier builds, delete `${SHARED_FS_ROOT}/state/training_pg.json` after upgrading to purge legacy MFA secrets (secrets are now sourced solely from the `user_mfa_secret` table).
4a. preference_event → adapter dataset → tokenized batches

- `preference_event` rows (positive feedback) capture `context_embedding`, `score`, and optional `context_text`; they are clustered per-user to build adapter personas.
- the training service reconstructs prompts from recent messages, appends any provided context snippet, and uses corrected text as targets while tracking cluster centroids.
- dataset rows are written to `${SHARED_FS_ROOT}/users/{user_id}/adapters/{adapter_id}/jobs/{job_id}/dataset.jsonl`.
- tokenized batches carry shapes for the downstream JAX/Optax loop (padding + masks, no base-model update), and training metadata records batch shapes + cluster summaries.
- adapter metadata and params are stored under `${SHARED_FS_ROOT}/users/{user_id}/adapters/{adapter_id}/v####/`.
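a minimal sketch of what “tokenized batches with padding + masks” means for the adapter loop — the jsonl field names (`prompt`, `target`) and the tokenizer call are assumptions; the shape handling is the point:

```python
import json
import numpy as np

def load_dataset(path: str) -> list[dict]:
    """read one dataset.jsonl produced for a training job; each row is assumed
    to carry prompt/target text reconstructed from preference events."""
    with open(path) as fh:
        return [json.loads(line) for line in fh if line.strip()]

def pad_batch(token_ids: list[list[int]], pad_id: int = 0):
    """right-pad variable-length sequences and build a loss mask so padding
    never contributes to the adapter update (the base model is never touched)."""
    max_len = max(len(seq) for seq in token_ids)
    batch = np.full((len(token_ids), max_len), pad_id, dtype=np.int32)
    mask = np.zeros((len(token_ids), max_len), dtype=np.float32)
    for i, seq in enumerate(token_ids):
        batch[i, : len(seq)] = seq
        mask[i, : len(seq)] = 1.0
    return batch, mask

# usage sketch: `tokenize` is whatever tokenizer the backend uses; it is left
# abstract here on purpose.
# rows = load_dataset(".../dataset.jsonl")
# ids = [tokenize(r["prompt"] + r["target"]) for r in rows]
# batch, mask = pad_batch(ids)   # shapes: (B, T) int32, (B, T) float32
```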
- start services
  - run the api server (http + websocket for streaming)
  - run a background worker for:
    - ingestion / embeddings
    - clustering
    - adapter training
    - configops patch application
- open the web ui
  - sign up / log in
  - create a conversation
  - upload a few files, create a knowledge context, and attach it to a chat
  - start talking to see basic chat + rag behavior
  - enable preference capture + adapters once that’s wired
- minimal chat with postgres-backed conversations
- file upload + filesystem + rag over pgvector chunks
- artifacts for workflows + tools (no adapters yet)
- preference events + single persona adapter per user
- semantic clustering + skill adapters
- router policies as data + simple editor
- configops api + llm-generated patches
- mobile / voice clients (optional layer)
MIT
See TESTING.md for comprehensive testing documentation.
```bash
# Quick test (in-memory, no external dependencies)
./scripts/run_tests.sh

# Full integration test with Docker
docker compose -f docker-compose.test.yml up --build
./scripts/smoke_test.sh
```

Key endpoints (Bearer access token required):
- `POST /v1/auth/signup` → returns session + signed access/refresh tokens
- `POST /v1/auth/login` → returns tokens, with MFA gating when enabled
- `POST /v1/auth/refresh` → rotates refresh tokens
- `POST /v1/chat` → creates conversation + LLM reply
- `GET /v1/artifacts` → lists data-driven workflows/policies
- `GET /v1/admin/settings` → admin-only system settings

Admin endpoints (`/v1/admin/*`, `/v1/config/*`) require the admin role.
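A quick client-side sketch of the signup → chat flow against these endpoints. The response field names (`access_token`) and the chat payload shape are assumptions for illustration; only the paths above come from the docs:

```python
import requests  # pip install requests

BASE = "http://localhost:8000"

# sign up (or log in) and pull a bearer token out of the response;
# the exact response field names are assumptions, adjust to the real envelope.
auth = requests.post(
    f"{BASE}/v1/auth/signup",
    json={"email": "demo@test.local", "password": "Demo-Password-123!"},
).json()
token = auth.get("access_token") or auth.get("session", {}).get("access_token")

headers = {"Authorization": f"Bearer {token}"}

# send a first chat message; /v1/chat creates the conversation + LLM reply
chat = requests.post(
    f"{BASE}/v1/chat",
    headers=headers,
    json={"message": "hello, what can you do?"},
).json()
print(chat)

# list data-driven artifacts (workflows / policies) visible to this user
print(requests.get(f"{BASE}/v1/artifacts", headers=headers).json())
```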
- local rate limits now fall back to in-process counters when Redis is unavailable (TEST_MODE), covering auth and chat flows
- uploads are capped by `MAX_UPLOAD_BYTES` to prevent unbounded in-memory reads
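A minimal sketch of the kind of guard an upload cap like this implies in a FastAPI handler — the body is streamed in chunks so an oversized file is rejected before it is fully buffered; the endpoint path, chunk size, and default limit are illustrative, not the project's actual upload route:

```python
import os

from fastapi import FastAPI, HTTPException, UploadFile

app = FastAPI()
MAX_UPLOAD_BYTES = int(os.environ.get("MAX_UPLOAD_BYTES", str(10 * 1024 * 1024)))

@app.post("/v1/files")  # illustrative path, not necessarily the real route
async def upload(file: UploadFile):
    received = 0
    chunks: list[bytes] = []
    # read in fixed-size chunks so we can bail out before buffering a huge body
    while chunk := await file.read(64 * 1024):
        received += len(chunk)
        if received > MAX_UPLOAD_BYTES:
            raise HTTPException(status_code=413, detail="upload exceeds MAX_UPLOAD_BYTES")
        chunks.append(chunk)
    return {"filename": file.filename, "bytes": received}
```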