liminallm

liminallm is an experiment in what a chatgpt-like system looks like if you stop hard-coding product logic and let the model help evolve itself.

it’s a small kernel wrapped around:

  • a frozen base llm (jax)
  • per-user / per-skill lora adapters
  • emergent “skills” from clusters + preference events
  • self-describing artifacts (workflows, routing policies, tools)
  • notebooklm-style grounding over filesystem-backed files
  • boring infra: postgres + redis + filesystem
  • artifacts and adapter payloads live as JSON/weights on the shared filesystem

the code is just the glue. everything interesting lives as data.


what it does (conceptually)

feedback loop (at a glance)

User Feedback → Embeddings → Clustering → Skill Discovery
     ↑                                            ↓
Router Updates ← Adapter Training ← Promotion Decision

  • chatgpt-like web ui

    • multi-user, password + pluggable auth
    • conversations, history, summaries
    • text first; voice later
  • deep behavioral memory

    • per-user persona adapters (lora)
    • skill adapters born from usage: “when problems like this show up, start with this debugging workflow”
    • continuous micro-training jobs in jax, only on adapters, never on the base model
  • natural factual memory

    • user files in the filesystem (/users/{id}/files)
    • ingestion → chunking → embeddings in postgres (pgvector)
    • notebooklm-style: bind “contexts” (collections of files/folders) to a chat and ask questions grounded in that corpus
  • small kernel, big data

    • kernel only knows how to:
      • auth users
      • run workflows (graphs)
      • run routing policies
      • call the llm with optional lora adapters
      • talk to postgres / redis / filesystem
    • everything else (domains, skills, behaviors, tools, routing rules) is expressed as artifacts:
      • adapter.lora
      • workflow.chat
      • policy.routing
      • tool.spec
      • context.knowledge
      • etc.
  • emergent domains & skills

    • no hard-coded DEBUGGING, WRITING, whatever
    • we cluster preference events in embedding space
    • llm labels clusters (“kernel panic debugging”, “multi-tenant billing schema design”, …)
    • when a cluster is big + consistently positive, we auto-propose a new skill adapter tied to that cluster
  • router as data, not code

    • routing policies are artifacts (policy.routing) with a tiny expression language:
      • conditions over embeddings, clusters, safety flags
      • actions: activate/deactivate adapters, scale weights, etc.
    • the router engine is dumb and stable; policy is editable data (see the example sketch after this list)
  • llm as architect (under guardrails)

    • a config-ops api lets the llm propose patches to:
      • routing policies
      • workflows
      • adapter metadata
    • patches are stored, validated, can be auto- or human-approved, and are fully versioned
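
to make "router as data" concrete, here is a sketch of what a policy.routing artifact could look like, written as a python dict; the field names and the condition/action vocabulary are illustrative assumptions, not the repository's actual schema:

# hypothetical policy.routing artifact, sketched as a python dict. field names
# and the condition/action vocabulary are assumptions for illustration; the
# real schema is whatever the artifact store defines.
example_routing_policy = {
    "kind": "policy.routing",
    "version": 3,
    "rules": [
        {
            # condition over cluster membership + safety flags
            "when": "cluster_id == 'kernel-panic-debugging' and not flags.unsafe",
            # actions: activate a skill adapter and scale its weight
            "then": [
                {"action": "activate_adapter", "adapter": "adapter.lora/u123-kernel-debug"},
                {"action": "scale_weight", "adapter": "adapter.lora/u123-kernel-debug", "factor": 0.8},
            ],
        },
        {
            # fall back to the per-user persona adapter only
            "when": "true",
            "then": [{"action": "activate_adapter", "adapter": "adapter.lora/u123-persona"}],
        },
    ],
}

whether the engine evaluates rules top-to-bottom or merges them differently is an implementation detail; the point is that the policy is plain data a config-ops patch can edit.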

architecture (stack)

  • language / runtime

    • python (services, api, orchestration)
    • jax (base model + lora training)
  • storage

    • postgres
      • users, auth, conversations, messages
      • artifacts & versions
      • semantic clusters
      • knowledge chunks (with pgvector)
      • preference events, training jobs, router state
    • redis
      • sessions
      • rate limiting
      • hot conversation summaries
      • router and workflow scratch state
    • filesystem
      • /shared/models – frozen base model weights
      • /users/{id}/files – user docs
      • /users/{id}/adapters – per-user lora weight files
      • /users/{id}/artifacts – generated notebooks, exports, etc.
  • services (logically)

    • auth service
    • chat orchestrator
    • artifact service
    • workflow engine
    • router service
    • llm inference (jax + lora)
    • knowledge / rag service
    • preference + training service
    • clusterer + skill discovery
    • configops (patch proposals / approvals)

for v1 these can all live in one python app with clear module boundaries.
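
a minimal sketch of what that single-app composition could look like, assuming one FastAPI router per logical service; the router names and stub endpoint are hypothetical, not the repository's actual layout:

# hypothetical single-app composition: one FastAPI router per logical
# service, all mounted on one app. names are illustrative only.
from fastapi import APIRouter, FastAPI

auth = APIRouter(tags=["auth"])
chat = APIRouter(tags=["chat"])
admin = APIRouter(tags=["admin"])
configops = APIRouter(tags=["configops"])

@chat.post("/v1/chat")
async def chat_stub() -> dict:
    # the real orchestrator (routing, adapters, rag) sits behind this
    return {"status": "stub"}

app = FastAPI(title="liminallm-style single app (sketch)")
for router in (auth, chat, admin, configops):
    app.include_router(router)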


current status

  • early design / prototyping
    • do not treat as production-ready
    • interfaces & schemas are expected to change
  • goal is to keep:
    • implementation minimal
    • all "product behavior" in data (artifacts / policies / workflows)
    • evolution driven by usage + llm suggestions, not constant code surgery

quick start

Option 1: Docker (Recommended for QA/Testing)

# Start the full test stack (Postgres, Redis, App)
docker compose -f docker-compose.test.yml up --build

# Verify health
curl http://localhost:8000/healthz

# Pre-configured admin credentials:
#   Email: admin@test.local
#   Password: TestAdmin123!

This automatically:

  • Runs database migrations
  • Bootstraps an admin user
  • Serves the chat UI at http://localhost:8000/
  • Serves the admin console at http://localhost:8000/admin

Option 2: Native Deployment (No Docker)

For running directly on a Linux host without containers.

Quick Start (In-Memory Mode)

For development and testing without external dependencies:

# Install dependencies
pip install -e ".[dev]"

# Set required environment variables
export JWT_SECRET="YourSecure-JWT-Secret-With-32-Characters!"
export SHARED_FS_ROOT="/tmp/liminallm"
export USE_MEMORY_STORE=true
export TEST_MODE=true

# Start the API server
uvicorn liminallm.app:app --reload --host 0.0.0.0 --port 8000

# Verify health
curl http://localhost:8000/healthz

Production (With PostgreSQL and Redis)

For persistent storage and production use:

# 1. Install system dependencies (the pgvector package name may vary by distro)
sudo apt update
sudo apt install -y python3.11 python3.11-venv postgresql-16 postgresql-16-pgvector redis-server libpq-dev gcc

# 2. Create the database, user, and extensions
sudo -u postgres psql -c "CREATE DATABASE liminallm;"
sudo -u postgres psql -c "CREATE USER liminallm WITH PASSWORD 'yourpassword';"
sudo -u postgres psql -d liminallm -c "CREATE EXTENSION IF NOT EXISTS vector;"
sudo -u postgres psql -d liminallm -c "CREATE EXTENSION IF NOT EXISTS citext;"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE liminallm TO liminallm;"
# PostgreSQL 15+ no longer grants CREATE on the public schema by default
sudo -u postgres psql -d liminallm -c "GRANT ALL ON SCHEMA public TO liminallm;"

# 3. Create Python environment and install
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# 4. Create data directories
sudo mkdir -p /srv/liminallm/{adapters,artifacts,models,users}
sudo chown -R $USER:$USER /srv/liminallm

# 5. Set environment variables
export DATABASE_URL="postgresql://liminallm:yourpassword@localhost:5432/liminallm"
export REDIS_URL="redis://localhost:6379/0"
export JWT_SECRET="YourSecure-JWT-Secret-With-32-Characters!"
export SHARED_FS_ROOT="/srv/liminallm"

# 6. Run database migrations
./scripts/migrate.sh

# 7. Bootstrap admin user
python scripts/bootstrap_admin.py --email admin@test.local --password TestAdmin123!

# 8. Start the server
uvicorn liminallm.app:app --host 0.0.0.0 --port 8000 --workers 4

Running Tests (Native)

# Run all tests in memory mode
TEST_MODE=true USE_MEMORY_STORE=true pytest tests/ -v

# Run specific test suites
pytest tests/test_post_smoke.py -v  # Post-smoke tests
pytest tests/test_integration_admin.py -v  # Admin tests

# Run smoke tests
./scripts/smoke_test.sh http://localhost:8000

See docs/DEPLOYMENT.md for complete native deployment documentation including systemd services, TLS setup, and GPU configuration.


acceptance criteria (ready to test)

Before QA begins, verify:

  • Health check – curl http://localhost:8000/healthz returns {"status": "healthy"}
  • Chat UI loads – open http://localhost:8000/ in a browser
  • User signup – sign up via the UI or POST /v1/auth/signup
  • User login – log in via the UI or POST /v1/auth/login
  • Send message – create a conversation and send a message via /v1/chat
  • Admin protected – a regular user gets 403 on /v1/admin/settings
  • Admin access – an admin user gets 200 on /v1/admin/settings
  • Tests pass – ./scripts/run_tests.sh passes on a fresh install
  • Bootstrap works – python scripts/bootstrap_admin.py creates the admin user

Run the automated smoke test:

./scripts/smoke_test.sh http://localhost:8000

deployment

  • Install and runtime guidance (docker compose and manual host setup) lives in docs/DEPLOYMENT.md
  • Configuration architecture documented in docs/CONFIGURATION.md
  • Testing guide in TESTING.md

implementation completeness (prototype)

  • implemented
    • file upload endpoint writing to the shared filesystem and ingesting chunks into RAG contexts with configurable chunk sizes; default retrieval runs against pgvector with shared deterministic embeddings (optional in-process hybrid fallback for dev/test)
    • workflow execution with branching/parallel scheduling across workflow.chat graphs
    • router policies with a sandboxed evaluation engine (limited adapter gating usage)
    • pluggable model backend that can target external API fine-tune IDs or local JAX+LoRA adapter application
    • filesystem-backed LoRA adapter training that turns preference events into new adapter versions
    • preference capture with clustering + skill adapter promotion and routing integration
    • hardened auth + multi-tenant isolation: OAuth provider mapping, session revocation on password resets, error envelopes with stable error.code, ownership-enforced artifact and conversation access (including workflows/tools), adapter checksum + path validation, and email verification flows
    • MFA with TOTP enrollment (otpauth URL), session gating, and login verification
    • email verification tokens with /v1/auth/request_email_verification and /v1/auth/verify_email
    • tenant-scoped conversation history enforcement in workflows and tool invocations
    • HMAC-signed JWT access tokens with refresh rotation, tenant-aware sessions, and admin-only config endpoints
    • chat and admin frontends prompt for MFA codes when required and revoke sessions on logout
  • not yet implemented
    • preference UI and rich routing feedback loop
    • LLM-as-architect auto-patch generation
    • voice interface
    • admin UI for patch approval

getting started (high level)

note: this is intentionally vague; exact commands depend on how you wire the codebase.

  1. bring your infra
    • postgres (with pgvector installed)
    • redis
    • filesystem path accessible to the app
    • gpu / tpu for the jax model if you expect to train adapters
  • pick a model backend
    • backend selection is single-sourced from the SQL deployment config (editable from the web console when wired); env vars only override it if you set them explicitly
    • set MODEL_BACKEND=local_gpu_lora to target the local JAX+LoRA path instead of external API fine-tune IDs; omit it or leave the default to use the OpenAI-style plug
    • the JAX backend (LocalJaxLoRABackend in liminallm/service/model_backend.py) loads adapters from the filesystem, tokenizes prompts, runs a JAX forward pass, and enforces conservative shapes; it requires a JAX runtime and, optionally, a Transformers tokenizer for decode parity
    • OpenAI plug secrets live under adapter-specific env vars (see below)
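
for intuition, this is the standard LoRA formulation the local backend applies conceptually (a frozen base projection plus a low-rank delta); it is a generic sketch, not code from LocalJaxLoRABackend:

# generic illustration of "apply a LoRA adapter, never touch the base
# weights": standard LoRA math, not code from liminallm/service/model_backend.py.
import jax
import jax.numpy as jnp

def lora_linear(x, w_frozen, lora_a, lora_b, scale=1.0):
    """Frozen base projection plus the low-rank LoRA delta."""
    return x @ w_frozen + scale * ((x @ lora_a) @ lora_b)

key = jax.random.PRNGKey(0)
d_model, rank = 64, 8                                # rank 8 matches the adapter example below
x = jax.random.normal(key, (1, d_model))
w = jax.random.normal(key, (d_model, d_model))       # frozen base weight
a = jax.random.normal(key, (d_model, rank)) * 0.01   # trainable LoRA A
b = jnp.zeros((rank, d_model))                       # trainable LoRA B (starts at zero)
y = lora_linear(x, w, a, b, scale=2.0)               # same shape as the base output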

frontend (chat + admin)

  • A minimal, ChatGPT-style UI now lives in /frontend and is served by the FastAPI app at / with static assets mounted at /static/*.
  • Authenticate with /v1/auth/login; the UI stores the issued bearer token/tenant ID locally and uses it for /v1/chat, /v1/conversations, and other API calls.
  • The admin console is separate at /admin and is guarded by the admin role (FastAPI enforces the role before serving the HTML). It surfaces config patch proposal/approval flows backed by /v1/config/* endpoints, tenant-scoped user administration (list/add/delete, role changes), adapter visibility, and a read-only inspector for database objects.

tests

  • Run scripts/run_tests.sh to mirror CI defaults; it compiles the code and executes pytest with in-memory stores enabled for deterministic local runs.

adapters: local LoRA vs remote fine-tune IDs vs prompt-distilled

  • Router policies pick an adapter; the inference backend decides whether that means applying LoRA weights locally, swapping to a remote fine-tuned model ID, or injecting distilled prompt instructions on top of a black-box API.

  • Each adapter.lora artifact carries a backend field describing where inference happens:

    {
      "kind": "adapter.lora.remote",
      "provider": "zhipu",
      "backend": "api",
      "base_model": "glm-4-air",
      "remote_model_id": "glm-4-air-ft-2025-11-01-u123-debug",
      "region": "cn-beijing",
      "cluster_id": "",
      "applicability": {
        "natural_language": "u123: kernel panic debugging skill on GLM-4-Air",
        "embedding_centroid": []
      }
    }
    {
      "kind": "adapter.lora.local",
      "backend": "local",
      "provider": "aliyun",
      "base_model": "qwen2.5-32b-instruct",
      "cephfs_dir": "/users/u123/adapters/{id}",
      "rank": 8,
      "layers": [0, 1, 2, 3],
      "matrices": ["attn_q", "attn_v"],
      "cluster_id": ""
    }
    {
      "kind": "adapter.lora.prompt",
      "backend": "prompt",
      "provider": "api_only",
      "base_model": "glm-4-air",
      "prompt_instructions": "for kernel issues: reproduce → bisect → log inspection; keep replies terse",
      "cluster_id": "",
      "applicability": {
        "natural_language": "prompt-distilled skill for kernel debugging",
        "embedding_centroid": []
      }
    }
  • Remote adapters send requests to OpenAI-compatible fine-tuned model IDs (e.g., Zhipu BigModel or Alibaba DashScope). Local adapters resolve to filesystem-backed LoRA weights and are composable. Prompt-distilled adapters inject behavior as system messages without changing model IDs so you can still steer API-only providers.

  • “Model-ID adapters” (fine-tuned endpoints) map 1:1 to model strings on providers like OpenAI/Azure (fine-tuned deployments), Vertex AI Gemini, or Bedrock custom models. Switching behavior = switching the model string; composition happens at routing time, not inside a single call.

  • “Adapter-ID adapters” (multi-LoRA / adapter servers) surface adapter_id parameters on Together AI Serverless Multi-LoRA, LoRAX-style servers, or SageMaker adapter inference components. The backend keeps the base model string and passes adapter_id for one-or-more adapters per request when supported.

  • Hybrid patterns (local adapter-enabled “controller” + external API “executor”) flow through the same artifacts: the controller uses a local LoRA backend to plan, then the API backend executes with prompt or remote-model adapters.
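
Putting the three adapter kinds together, the backend dispatch amounts to something like the sketch below; the call_* helpers are stand-in stubs and the field access mirrors the example artifacts above, not the project's real inference code:

# illustrative dispatch over the adapter "backend" field shown in the example
# artifacts above; the two call_* helpers are stand-in stubs, not real clients.
def call_local_jax(base_model: str, weights_dir: str, messages: list[dict]) -> str:
    return f"[local {base_model} + LoRA from {weights_dir}]"

def call_openai_compatible(model_id: str, messages: list[dict]) -> str:
    return f"[api call to {model_id} with {len(messages)} messages]"

def run_inference(adapter: dict, base_model: str, messages: list[dict]) -> str:
    backend = adapter.get("backend", "local")
    if backend == "local":
        # resolve filesystem LoRA weights and apply them to the frozen base
        return call_local_jax(base_model, adapter["cephfs_dir"], messages)
    if backend == "api":
        # swap the model string for the provider's fine-tuned model ID
        return call_openai_compatible(adapter["remote_model_id"], messages)
    if backend == "prompt":
        # inject distilled behavior as a system message, keep the base model ID
        system = {"role": "system", "content": adapter["prompt_instructions"]}
        return call_openai_compatible(base_model, [system, *messages])
    raise ValueError(f"unknown adapter backend: {backend}")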

  2. configure env

    • DATABASE_URL – postgres dsn
    • REDIS_URL – redis dsn
    • SHARED_FS_ROOT – filesystem root path
    • MODEL_PATH – model identifier for cloud mode (default gpt-4o-mini) or filesystem path when using an adapter server
    • OPENAI_ADAPTER_API_KEY – OpenAI plug API key (leave unset to use the echo fallback)
    • OPENAI_ADAPTER_BASE_URL – optional base URL override when pointing at an OpenAI-compatible endpoint
    • ADAPTER_SERVER_MODEL – model name when pointing at an OpenAI-compatible adapter server
    • USE_MEMORY_STORE – set to true to run without Postgres/Redis while testing the API and LLM calls
    • TEST_MODE – set to true to allow Redis-free test harnesses (rate limits, idempotency durability, and caches are disabled)
    • RAG_CHUNK_SIZE – default character window for knowledge ingestion; overrides can be provided per request
    • RAG_MODE – pgvector (default) uses the database index; local_hybrid forces the in-process BM25+cosine fallback for dev/test
  3. migrate db

    • run the alembic / migration tool to create tables described in the spec.
    • if you ran earlier builds, delete ${SHARED_FS_ROOT}/state/training_pg.json after upgrading to purge legacy MFA secrets (secrets are now sourced solely from the user_mfa_secret table).

4a. preference_event → adapter dataset → tokenized batches

  • preference_event rows (positive feedback) capture context_embedding, score, and optional context_text; they are clustered per-user to build adapter personas.
  • the training service reconstructs prompts from recent messages, appends any provided context snippet, and uses corrected text as targets while tracking cluster centroids.
  • dataset rows are written to ${SHARED_FS_ROOT}/users/{user_id}/adapters/{adapter_id}/jobs/{job_id}/dataset.jsonl.
  • tokenized batches carry shapes for the downstream JAX/Optax loop (padding + masks, no base-model update), and training metadata records batch shapes + cluster summaries.
  • adapter metadata and params are stored under ${SHARED_FS_ROOT}/users/{user_id}/adapters/{adapter_id}/v####/.
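
a hedged sketch of the dataset step described above, turning one positive preference_event plus recent messages into a dataset.jsonl row; the field names (corrected_text, context_text, score) follow the description, but the exact schema is an assumption:

# hedged sketch of the dataset step: positive preference_event + recent
# messages -> one dataset.jsonl row. field names are assumptions.
import json
from pathlib import Path

def build_dataset_row(event: dict, recent_messages: list[dict]) -> dict:
    prompt = "\n".join(m["content"] for m in recent_messages)
    if event.get("context_text"):
        prompt += "\n" + event["context_text"]            # optional context snippet
    return {"prompt": prompt, "target": event["corrected_text"], "score": event["score"]}

row = build_dataset_row(
    {"score": 1.0, "context_text": "stack trace: ...", "corrected_text": "check the oops log first"},
    [{"role": "user", "content": "kernel panic on boot"}],
)
# real path: ${SHARED_FS_ROOT}/users/{user_id}/adapters/{adapter_id}/jobs/{job_id}/dataset.jsonl
out = Path("/tmp/liminallm-example/dataset.jsonl")
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(row) + "\n")
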
  4. start services

    • run the api server (http + websocket for streaming)
    • run a background worker for:
      • ingestion / embeddings
      • clustering
      • adapter training
      • configops patch application
  5. open the web ui

    • sign up / log in
    • create a conversation
    • upload a few files, create a knowledge context, and attach it to a chat
    • start talking to see basic chat + rag behavior
    • enable preference capture + adapters once that’s wired

roadmap (rough)

  • minimal chat with postgres-backed conversations
  • file upload + filesystem + rag over pgvector chunks
  • artifacts for workflows + tools (no adapters yet)
  • preference events + single persona adapter per user
  • semantic clustering + skill adapters
  • router policies as data + simple editor
  • configops api + llm-generated patches
  • mobile / voice clients (optional layer)

license

MIT

testing

See TESTING.md for comprehensive testing documentation.

# Quick test (in-memory, no external dependencies)
./scripts/run_tests.sh

# Full integration test with Docker
docker compose -f docker-compose.test.yml up --build
./scripts/smoke_test.sh

API endpoints

Key endpoints (most require a Bearer access token):

  • POST /v1/auth/signup → returns session + signed access/refresh tokens
  • POST /v1/auth/login → returns tokens, with MFA gating when enabled
  • POST /v1/auth/refresh → rotates refresh tokens
  • POST /v1/chat → creates conversation + LLM reply
  • GET /v1/artifacts → lists data-driven workflows/policies
  • GET /v1/admin/settings → admin-only system settings

Admin endpoints (/v1/admin/*, /v1/config/*) require admin role.
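
A minimal Python sketch of the login → chat flow against these endpoints; the payload and response field names (email, password, access_token, message) are assumptions, so check the real API schemas before relying on them:

# minimal sketch of login -> chat against the endpoints above. payload and
# response field names are assumptions; verify them against the real API.
import requests

BASE = "http://localhost:8000"

login = requests.post(
    f"{BASE}/v1/auth/login",
    json={"email": "admin@test.local", "password": "TestAdmin123!"},  # docker test credentials
)
login.raise_for_status()
token = login.json()["access_token"]  # assumed field name

headers = {"Authorization": f"Bearer {token}"}
chat = requests.post(f"{BASE}/v1/chat", json={"message": "hello"}, headers=headers)
print(chat.status_code, chat.json())

# admin-only endpoints should return 403 for regular (non-admin) users
print(requests.get(f"{BASE}/v1/admin/settings", headers=headers).status_code)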

operational hardening

  • local rate limits now fall back to in-process counters when Redis is unavailable (TEST_MODE), covering auth and chat flows
  • uploads are capped by MAX_UPLOAD_BYTES to prevent unbounded in-memory reads
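
the in-process fallback boils down to a counter keyed by client and time window; a generic fixed-window sketch of the idea (not liminallm's actual limiter):

# generic fixed-window rate limiter kept in process memory; illustrates the
# fallback idea, not liminallm's actual limiter.
import time
from collections import defaultdict

class InProcessRateLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str) -> bool:
        bucket = int(time.time()) // self.window   # current window for this key
        self.counters[(key, bucket)] += 1
        return self.counters[(key, bucket)] <= self.limit

limiter = InProcessRateLimiter(limit=5, window_seconds=60)
print([limiter.allow("user-1") for _ in range(7)])  # last two calls are rejected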
