-
Notifications
You must be signed in to change notification settings - Fork 14.2k
server : implement extra_args support for /models/load endpoint #18261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
We don't allow this function because it introduce too many security risk. Instead, please remove this from the docs. Ref discussion: #17470 |
I understand the security concern, which is why this implementation requires an explicit opt-in flag (--models-allow-extra-args). It's disabled by default - operators must consciously enable it when starting the server in trusted environments. My use case: I'm building a system where I need to dynamically load models with different context sizes on demand. The preset file approach (--models-preset) only allows static, predefined context sizes per model - it can't handle loading the same model with 4k context for one request and 32k for another at runtime. Without extra_args, there's no way to specify -c per model when calling /models/load. The only alternative would be restarting the entire server for each context size change, which defeats the purpose of multi-model mode. Would you be open to keeping the feature with the opt-in flag, or is there an alternative approach you'd suggest for setting per-model context sizes dynamically? |
|
I don't think the function worth the risk even when hiding under a flag. We will soon be flooded by security reports that nits-pick this functionality. Instead, we should allow only some selected params as @ServeurpersoCom suggested, with proper validations. |
|
Got it I understand the concern, I'd be happy to submit a more scoped PR that only allows context_size (or a small set of validated parameters) in the /models/load request body, rather than arbitrary extra_args. Something like: {
"model": "my-model",
"context_size": 8192
}Would that approach be acceptable? And would you still want it behind an opt-in flag, or would validated/scoped parameters be safe enough to enable by default? Happy to follow whatever direction you think is best for the project. |
I'd certainly like to see this. Candidates I'd be interested in personally, in order of preference:
In the meantime I may just merge this PR locally, since I never run my LLMs accessible outside my localhost anyhow. |
Just a thought on the security architecture here. Those params (ctx, n-gpu-layers, lora) are what I'd call "cold-start" parameters, they need to be set when spawning the child process, not changed at runtime. This is different from hot params like temperature that a running process can absorb. |
|
I don't get why we need to allow lora to be set dynamic from router. Aren't we already support lora config per-request for this? If we add such feature asked in this PR, I doubt that people will soon flood us with PR/issues just to add params specific for their personal use case even when it's redundant and/or insecure. And when we add it, security researchers will start nitpicking it. So I don't think such feature worth maintaining, the human effort is not worth it. Probably support (manual) hot-reload preset file like Pascal said, which is pretty much aligned with nginx/apache approach. There will be some edge cases like what if a model is removed from the ini file, what if global config changed, etc. Probably worth opening an issue to list out all of these cases before writing the code. |
|
On second thought, we can transfer the risk by asking user to specify the set of parameters that can be overwritten via api themself. This way it's completely up to user to decide which parameters are allowed to be changed via api. I will have a look onto this feature. |
Implements the
extra_argsfeature for/models/loadthat is already documented in the server README but was not yet implemented.From the docs:
Changes
common/common.h: Addmodels_allow_extra_argsparamcommon/arg.cpp: Add--models-allow-extra-args/--no-models-allow-extra-argsflagtools/server/server-models.h: Updateload()signature to acceptextra_argstools/server/server-models.cpp: Parseextra_argsfrom JSON body and append to child process argsSee server README for usage documentation.