Self-hosted gateway that mirrors Claude Pro's connection handshake, exposes an OpenAI-compatible API, and includes built-in identity management, Prometheus telemetry, a web UI, and subscription/usage monitoring dashboards.
ClaudeBridge enables you to:
- Use your Claude Pro subscription anywhere an OpenAI or Anthropic endpoint is accepted
- Log in with Claude Pro/Max through a friendly web UI
- Record and get complete observability over your subscription usage
  - This includes the estimated $ value of the subscription
- Expose models that do not seem otherwise available in ClaudeCode (namely `sonnet 3.7` and `opus 3`)
- Work across apps and machines without losing track of your usage
- Share the subscription across several users or applications with internal tokens
- See real subscription usage in % for the 5h and 7d window limits
Keep in mind this is immature code prone to bugs, but the base function of enabling OpenAI-style clients on a ClaudeCode subscription has been fairly stable.
There is some amount of technical debt due to a late move to components and modular server paths, resulting in potential code duplication between the ui and blueprints folders.
The code was also written against Tailwind's CDN version, which ended up breaking good chunks of the design when the CSS was finally bundled.
The project was developed with the idea of writing a Python backend with a well-optimized, snappy, SPA-like frontend without writing a single line of JavaScript or installing Node.js/npm. This of course becomes increasingly hard as the app gets bundled.
This includes inline JavaScript if it's more than 1-2 lines, but does not include readily available WebComponent libraries.
I would need to look more closely at everything ClaudeCode does, but so far the proxy is not adding noticeable delays to the answers; in fact, it sometimes seems to stream faster.
💡 Keep in mind that this repo does not bundle the frontend dependencies, so you need to run the build script, use the container, or use the wheel build from the release section.
💡 If a password is not set in the config or the env variable, and password auth is not disabled, one will be generated for you and printed to stdout at first run.
```bash
docker run -e DISABLE_UI_PASSWORD=true claudebridge  # No password
docker run -e UI_PASSWORD=mysecret claudebridge      # Password
```

```bash
pip install claudeprobridge
```

```bash
python -m claudebridge.scripts.build
```

- Install locally:

```bash
pip install dist/claudebridge-0.1.0-py3-none-any.whl
# Alternatively, pip install . should work once built as long as `download_deps` has run once
```

- Run:

```bash
claudeprobridge
```
- Build Locally:

```bash
docker build -t claudebridge .
```

- or pull from this repo's registry:

```bash
docker pull ghcr.io/ylanallouche/claudebridge:latest
```

- Run:

```bash
docker run -p 8000:8000 \
  -v ~/.config/claudebridge:/root/.config/claudebridge \
  -e DEBUG=info \
  ghcr.io/ylanallouche/claudebridge:latest
# or -v ./claudebridge-data:/root/.config/claudebridge to map the current directory instead
# or just `claudebridge` to use the locally built container
```
```bash
python -m claudebridge.scripts.download_deps  # run once to cache the various CDN-hosted js/css bundles
python -m claudebridge.dev  # starts the app with Flask in dev mode with the auto reloader and DEBUG on, as well as the tailwindcss CLI watching the python files
```

💡 Note: if you do not want to use the service anymore, you can remove the session in your Anthropic console.
First go to the account page and start the account connection steps.
In a browser where you are logged into Anthropic:
- Go to http://localhost:8000 or wherever you are hosting the app
- Give your account a name - it can be anything, it's for local purposes only
- Open the link
- Authorize and get the code
- Paste it into ClaudeBridge
Optionally, to also get the % of use of your subscription:
- go to the claude.ai `settings > usage` page
- inspect the page and go to `network`
- refresh the page
- filter for the endpoint getting the data by typing `usage`
- look for the request the page uses to poll the subscription usage
- and paste it into the second field of the same account page in ClaudeBridge.
- Go to users
- Add a new user
- Copy the auth token

💡 The "copy to clipboard" may only work over https, it seems. You can always do `cat ~/.config/claudebridge/config.json | grep "<your-user-name>" -B1` to get the token back.
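To check that a token works, you can hit the models endpoint with it. This is a quick sketch that assumes the bridge is on port 8000 and accepts the internal token as a standard OpenAI-style Bearer key (host, port, and token below are placeholders for your own setup):

```python
import requests

BRIDGE_URL = "http://localhost:8000"   # wherever you host the app
TOKEN = "your-internal-token-here"     # token copied from the users page

resp = requests.get(
    f"{BRIDGE_URL}/v1/models",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
# Assumes the OpenAI-style {"data": [{"id": ...}, ...]} response shape.
for model in resp.json().get("data", []):
    print(model.get("id"))
```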
Using Claude Code:

```bash
ANTHROPIC_BASE_URL="http://localhost:8000" ANTHROPIC_API_KEY="mykey" claude
```

Using CodeCompanion:

```lua
bureau = function()
return require("codecompanion.adapters").extend("openai_compatible", {
name = "local",
env = {
url = "http://localhost:8000",
chat_url = "/v1/chat/completions",
models_endpoint = "/v1/models",
api_key = "Your internal token here",
},
schema = {
model = {
default = "claude-haiku-4-5",
},
},
})
end,
```
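Any other OpenAI-compatible client can be pointed at the bridge in the same way. For example, with the official `openai` Python package (the port, internal token, and model name below are placeholders for your own values):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local bridge.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="your-internal-token-here",
)

response = client.chat.completions.create(
    model="claude-haiku-4-5",  # any model the bridge exposes on /v1/models
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```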
```mermaid
sequenceDiagram
participant Client
participant ClaudeBridge
participant MetricsManager as Metrics Manager<br/>/metrics
participant ClaudeAPI as Claude Pro/Max<br/>
participant UsageAPI as claude.ai/usage<br/>Web API
Client->>ClaudeBridge: POST /v1/chat/completions
ClaudeBridge->>MetricsManager: Capture Request Data
ClaudeBridge->>ClaudeBridge: Validate Token, Rate Limit
ClaudeBridge->>ClaudeAPI: Refresh Token (if needed)
ClaudeAPI-->>ClaudeBridge: New Access Token
ClaudeBridge->>ClaudeAPI: Stream Request + Token
ClaudeAPI-->>ClaudeBridge: Stream Response + Usage Metadata
ClaudeBridge->>MetricsManager: Capture Response Data
ClaudeBridge-->>Client: Response + Subscription Headers
par Web Session Polling
ClaudeBridge->>UsageAPI: Poll Usage Endpoint (5min interval)
UsageAPI-->>ClaudeBridge: Quota %, 5h/7d
ClaudeBridge->>MetricsManager: Capture Quota Data
ClaudeBridge->>ClaudeBridge: Update accounts.json
end
Client->>ClaudeBridge: GET /metrics
ClaudeBridge->>MetricsManager: Fetch Prometheus Metrics
MetricsManager-->>Client: Prometheus Format Response
```
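The streaming leg of that flow boils down to validating the internal token, attaching the account's access token, and relaying the upstream chunks back to the client. Below is a bare-bones sketch of that proxy pattern with Flask and requests; the upstream URL, auth handling, token refresh, and OpenAI-to-Anthropic payload translation are all simplified or omitted here, so treat it as an illustration rather than the bridge's actual code:

```python
import requests
from flask import Flask, Response, request

app = Flask(__name__)

# Placeholders: the real bridge resolves these per account and refreshes tokens.
UPSTREAM_URL = "https://api.anthropic.com/v1/messages"
ACCESS_TOKEN = "oauth-access-token-here"

@app.post("/v1/chat/completions")
def proxy_chat():
    # Forward the (already translated) body upstream and stream chunks straight back.
    upstream = requests.post(
        UPSTREAM_URL,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json=request.get_json(force=True),
        stream=True,
    )
    return Response(
        upstream.iter_content(chunk_size=None),
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type", "text/event-stream"),
    )
```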
The application has 2 sources of truth when it comes to figuring out the state of the account and session boundaries:
- `/usage` polling from `claude.ai`, when available
- `rate-limiting` headers returned on every request (this can only be confirmed when the user actually makes a request)
In order to do so (a sketch of this logic follows the list):
- the bridge initially assumes both sessions are ready
- upon the first request it will create the first session or close the previous one
- it uses the new timestamp to create a new session
- it figures out if the new 5h session is part of the previous weekly limit or if a new 7d session also needs to be rolled out
- then, upon either:
  - hitting out-of-quota
  - or letting the timer run out (from the initial 5 hours or 7 days since the start of the session)
- the session will end and the reason for termination will be inferred
- termination reasons can be:
  - `natural`: the account did not go through the full usage window and the timer ran out
  - `ooq-5h`: the account got to the 5 hour limit
  - `ooq-7d`: the account did not get to its 5 hour limit but got to its 7 day limit
- the session is only fully confirmed to be ended once the next `200` request goes through, and will look like the "current" ones until then
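A minimal sketch of how that termination-reason inference could look; this is just an illustration of the rules above (the `Session` dataclass and `infer_termination` helper are made up for the example, not names from the codebase):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

FIVE_HOURS = timedelta(hours=5)
SEVEN_DAYS = timedelta(days=7)

@dataclass
class Session:
    started_at: datetime
    window: timedelta          # FIVE_HOURS or SEVEN_DAYS
    hit_quota: bool = False    # set when the upstream reports out-of-quota

def infer_termination(five_h: Session, seven_d: Session, now: datetime) -> str:
    """Infer why the current 5h session ended, following the rules above."""
    if five_h.hit_quota:
        return "ooq-5h"    # got to the 5 hour limit
    if seven_d.hit_quota:
        return "ooq-7d"    # never hit the 5h limit, blocked by the weekly one
    if now - five_h.started_at >= five_h.window:
        return "natural"   # the timer ran out before exhausting the quota
    return "active"        # still the "current" session, nothing to infer yet
```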
I have only been using the Anthropic service for about a week, so I'm not entirely sure I got the behavior right, and it's difficult to mock.
It seems that the account can get a "grace period" when hitting 7d-OOQ while still pretty low in usage of the 5h session.
I have only seen it once, but it also looks like the 7d session timer can move around slightly, so the server also has some basic guardrails for a "rollover" termination reason.
Here is a full diagram of the logic:
```mermaid
flowchart TD
Start([User Starts Session])
subgraph "5h Session"
A5["🟢 active"]
O5H["🔴 ooq_5h<br/>(quota hit)"]
O5D["🔴 ooq_7d<br/>(blocked)"]
B5D["🟡 blocked_by_7d<br/>(7d expired)"]
R5["🔵 ready"]
end
subgraph "7d Period"
A7["🟢 active"]
O7["🔴 ooq_7d<br/>(quota hit)"]
R7["🔵 ready"]
end
Start --> A5
Start --> A7
A5 -->|5h quota exhausted| O5H
A5 -->|7d blocked| O5D
A5 -->|7d expires| B5D
A5 -->|time expires| R5
O5H --> R5
O5D --> R5
B5D --> R5
A7 -->|7d quota hit| O7
A7 -->|time expires| R7
O7 --> R7
O5D -.->|inherits from| O7
style O5D fill:#c46686
style O5H fill:#c46686
style O7 fill:#c46686
style A5 fill:#788c5d
style A7 fill:#788c5d
style R5 fill:#bcd1ca
style R7 fill:#bcd1ca
style B5D fill:#cc785c
```
The models used by ClaudeCode seem to be hardcoded and not documented dynamically on a /models endpoint.
This means we also have to document them manually.
To do so, the app contains all the models I could find to work.
In the future, if you want to add a model, you can simply enter it in the models page of the app.
Then hit the "set cost" button to make sure that the cost estimate is tracked for the new model.
For both built-in and custom models you can also hit the "test" button, which sends a simple message to that model to check if it works; the result is shown in the UI.
You can find your local model overrides in ~/.config/claudebridge/config.json
Alternatively, you can use the chat page to test the model further.
Note that:
- the conversations are not recorded anywhere
- both the "test" button and the built-in chat UI have their usage tracked towards a default, built-in user called "frontend".

Lastly, you can block models from being used by your tokens and from being documented in the /models API endpoint.
⚠️ While user tokens can be disabled entirely in config.json, I would recommend setting one up beyond just security reasons: some clients don't seem to like it and it's untested (not sure how the inner metrics work without a user/token).
You can easily:
- create a new user (I recommend setting one up per app)
  - all you have to do is enter a username and press enter
- rotate keys
- set rate limits (not extensively tested)
ClaudeBridge tracks every request made:
- which user makes it
- with how many tokens in/out
- on which sessions (5h/7d)
- using which models
- at what estimated cost (updating the price of a model does not change the estimate of previous calls)

It also has knowledge of the current time limit, if any, as well as the state of the account:
- displays all the metrics in an internal dashboard
- both 7d (weekly limit) and 5h (session) out-of-quota monitoring
- time-based or rate-limit-header-based session bound calculation
- set or update model prices to keep the cost estimate accurate

And it uses all of that to display a live dashboard in a series of collapsible elements.
- Global usage summary:
ClaudeBridge also exposes a /metrics endpoint for Prometheus (which can be turned off in settings).
This lets you take the data and build anything with it.
Here is a quick example:
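A small script can scrape the endpoint and list everything the bridge currently exports; the sketch below does not assume any particular metric names and only relies on the Prometheus text exposition format, parsed with the `prometheus_client` package (the host/port is a placeholder):

```python
import requests
from prometheus_client.parser import text_string_to_metric_families

METRICS_URL = "http://localhost:8000/metrics"  # point at your ClaudeBridge instance

text = requests.get(METRICS_URL).text
# Walk every exported metric family and print its samples with their labels.
for family in text_string_to_metric_families(text):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```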
ClaudeBridge doesn't use a database at the moment but a set of JSON files that it constantly writes to.
All of them are located in ~/.config/claudebridge/
- config.json - Stores user configuration
  - sets all the different options
  - admin UI password (stored in plain text)
  - internal users/tokens
  - user rate limits
  - blocked/custom models
  - model cost overrides
  - the only file that gets reloaded if changed manually by the user
- metrics.json - Metrics checkpoint
  - restores all the metrics for the dashboard and Prometheus
- rate_limits.json - Rate limit state
  - sliding window data for rate limiting (see the sketch after this list)
  - per-token request/token counts over time
  - allows rate limiting to survive reboots
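A sliding-window limiter like this can be kept in a plain JSON file, which is what lets it survive a restart. Here is a rough sketch of the idea; the file name, layout, and limits are illustrative only and do not reflect the actual structure of rate_limits.json:

```python
import json
import time
from pathlib import Path

# Illustrative path, layout and limits; the real rate_limits.json differs.
STATE_FILE = Path.home() / ".config" / "claudebridge" / "rate_limits_example.json"
WINDOW_SECONDS = 60 * 60       # example: 1h sliding window
MAX_REQUESTS = 100             # example: per-token request cap in that window

def allow_request(token: str) -> bool:
    """Record one request for `token` and say whether it fits in the window."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    now = time.time()
    # Keep only the timestamps that are still inside the sliding window.
    recent = [t for t in state.get(token, []) if now - t < WINDOW_SECONDS]
    allowed = len(recent) < MAX_REQUESTS
    if allowed:
        recent.append(now)
    state[token] = recent
    # Writing the pruned state back to disk is what survives a reboot.
    STATE_FILE.write_text(json.dumps(state))
    return allowed
```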
- Look into the low-hanging fruit from Lighthouse
- Wrong default log level on module
- `/chat` does not fail gracefully with no account set up
- too many waitress-related logs
- custom models with custom pricing can appear twice in the models list (only visual)
- no visual confirmation when testing a custom model in the /models page
- some inconsistencies in the labels on the pills in the UI, especially for `termination_reason`
- unnecessary/duplicate information in accounts.json
- smarter polling when not making requests (currently 5 minutes), although that might be what keeps the session up
- Test everything, commit mock scripts for server answers first
- Clean up duplicate logic between ui and blueprints
- Move heavily towards components
- Look into WA's theme system to remove most Tailwind inline classes
- better grafana/prometheus documentation
- Fix design left somewhat broken by the Tailwind migration
- One-click link to set up the container on a public cloud
- Find a way to treeshake WA
- Fix every LSP error
  - Will first require doing a better job with the htmx module for pyHtml
- optional auth on the Prometheus /metrics endpoint
- while the % usage can be tracked over time via the Prometheus exporter, in the case of a session ending naturally before reaching its window we could log how far along in % the session was
  - currently only recording how long it took to reach it when reaching the end of the window, which seems more interesting
- investigate optimal LLM usage for the sub, as well as whether time of day can be a correlation
- [/] Multi-account setup with auto-queuing of user requests across accounts based on subscription state - was unable to test: not shipped
- add a timer/usage endpoint to integrate in taskbar/tmux etc.
- consider firing an event on weekly/session reset
  - notify-send if not in docker
  - use `smtp` to send an email if set up in config
- Add new rate limit rules: `%max` of session (users can't submit a query if the subscription window is too far advanced), and `grace_countdown`: how many minutes before the reset of the session does the `%max` stop applying?
- Option to switch logstyle from the current dramatic formatting to `logfmt`
- Return the remaining time to the session reset directly in the 429 responses, to display the timer as an error message in clients
- ship as a desktop app with a webview and an inno/mac bundle
- Look into how this would work for people who use overage when going over the subscription
- build an alternate way to expose the Anthropic services by relying on the SDK's JSON formatting and streaming capabilities
  - Before nearly giving up on the current connection, I had some good results with it
- integrate common LLM capabilities that may not be specifically handled or captured at the moment:
- stop parameter
- temperature
- top_p
- max_tokens
- Prompt caching
- Citations
- PDF support
- deep-chat
- loguru
- flask/waitress
- pyHtml - and my modules for WA/CEM processing and htmx
- htmx
- tailwindcss - using the globally installed CLI, not the npm package
- WebAwesome / FontAwesome
- highlight.js to get syntax highlighting in code blocks in deep-chat LLM responses










