Conversation

@Chrisischris

Implements the extra_args feature for /models/load that is already documented in the server README but was not yet implemented.

From the docs:

extra_args: (optional) an array of additional arguments to be passed to the model instance. Note: you must start the server with --models-allow-extra-args to enable this feature.

Changes

  • common/common.h: Add models_allow_extra_args param
  • common/arg.cpp: Add --models-allow-extra-args / --no-models-allow-extra-args flag
  • tools/server/server-models.h: Update load() signature to accept extra_args
  • tools/server/server-models.cpp: Parse extra_args from JSON body and append to child process args

See server README for usage documentation.
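
For illustration, a client call using the documented extra_args field might look like the sketch below. The field names follow the README quote above and the JSON body parsed in server-models.cpp; the host/port, model name, and argument values are assumptions, not defaults.

    # Sketch only: load a model instance with extra CLI arguments via
    # /models/load. Assumes llama-server runs on localhost:8080 in
    # multi-model mode and was started with --models-allow-extra-args.
    import requests

    resp = requests.post(
        "http://localhost:8080/models/load",
        json={
            "model": "my-model",
            # extra_args as documented in the README: an array of additional
            # arguments passed to the spawned model instance.
            "extra_args": ["-c", "8192"],
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json())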

@ngxson
Collaborator

ngxson commented Dec 21, 2025

We don't allow this function because it introduces too many security risks. Instead, please remove this from the docs.

Ref discussion: #17470

@ngxson ngxson closed this Dec 21, 2025
@Chrisischris
Author

We don't allow this function because it introduces too many security risks. Instead, please remove this from the docs.

Ref discussion: #17470

I understand the security concern, which is why this implementation requires an explicit opt-in flag (--models-allow-extra-args). It's disabled by default - operators must consciously enable it when starting the server in trusted environments.

My use case: I'm building a system where I need to dynamically load models with different context sizes on demand. The preset file approach (--models-preset) only allows static, predefined context sizes per model - it can't handle loading the same model with 4k context for one request and 32k for another at runtime.

Without extra_args, there's no way to specify -c per model when calling /models/load. The only alternative would be restarting the entire server for each context size change, which defeats the purpose of multi-model mode.

Would you be open to keeping the feature with the opt-in flag, or is there an alternative approach you'd suggest for setting per-model context sizes dynamically?

@ngxson
Collaborator

ngxson commented Dec 21, 2025

I don't think the function is worth the risk, even when hidden behind a flag. We will soon be flooded by security reports that nitpick this functionality.

Instead, we should allow only some selected params as @ServeurpersoCom suggested, with proper validations.

@Chrisischris
Author

Got it, I understand the concern. I'd be happy to submit a more scoped PR that only allows context_size (or a small set of validated parameters) in the /models/load request body, rather than arbitrary extra_args. Something like:

  {
    "model": "my-model",
    "context_size": 8192
  }

Would that approach be acceptable? And would you still want it behind an opt-in flag, or would validated/scoped parameters be safe enough to enable by default?

Happy to follow whatever direction you think is best for the project.

@strawberrymelonpanda
Contributor

strawberrymelonpanda commented Dec 23, 2025

I'd be happy to submit a more scoped PR that only allows context_size

I'd certainly like to see this.

Candidates I'd be interested in personally, in order of preference:

  • ctx
  • n-gpu-layers
  • lora
  • jinja on/off

In the meantime I may just merge this PR locally, since I never run my LLMs accessible outside my localhost anyhow.
presets.ini could be edited programmatically and llama-server rebooted as an alternative, I suppose.

@ServeurpersoCom
Collaborator

In the meantime I may just merge this PR locally, since I never run my LLMs accessible outside my localhost anyhow. presets.ini could be edited programmatically and llama-server rebooted as an alternative, I suppose.

Just a thought on the security architecture here. Those params (ctx, n-gpu-layers, lora) are what I'd call "cold-start" parameters: they need to be set when spawning the child process, not changed at runtime. This is different from hot params like temperature that a running process can absorb.

I think there's value in following the Apache/nginx pattern here: the router reads a preset file at startup (or on a reload signal), validates it, then spawns children with those cold parameters. If a config is bad, it fails at spawn time in an isolated child, not mid-request via the API.

The challenge with exposing cold params directly through the API is that validation becomes infrastructure-specific. Your VRAM limits, filesystem policies, and GPU topology are different from mine. llama.cpp can't realistically validate every admin's unique constraints.

Maybe the cleaner approach is a thin custom backend that validates cold params against your specific infrastructure rules, writes them to the preset file, then signals the router to reload. That way llama.cpp stays generic and your security policy lives where it belongs: in your own infrastructure layer.
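
As a rough illustration of that split, such a thin backend could look something like the sketch below. Everything here is hypothetical: the preset key name, file path, and restart command are placeholders for whatever your infrastructure actually uses, not real llama-server interfaces.

    # Hypothetical infrastructure-side helper: validate a cold param against
    # local policy, write it into the preset file, then hand it to the router.
    import configparser
    import subprocess

    MAX_CTX = 32768                 # site-specific policy limit (assumption)
    PRESET_PATH = "presets.ini"

    def set_model_ctx(model_name: str, ctx: int) -> None:
        # Validate against this site's own rules before touching the preset.
        if not (512 <= ctx <= MAX_CTX):
            raise ValueError(f"ctx {ctx} is outside the allowed range")

        config = configparser.ConfigParser()
        config.read(PRESET_PATH)
        if model_name not in config:
            config[model_name] = {}
        config[model_name]["ctx"] = str(ctx)   # placeholder key, not the real schema
        with open(PRESET_PATH, "w") as f:
            config.write(f)

        # A reload signal doesn't exist yet, so this stand-in simply restarts
        # the service (the unit name is a placeholder).
        subprocess.run(["systemctl", "restart", "llama-server"], check=True)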

@ngxson
Collaborator

ngxson commented Dec 23, 2025

I don't get why we need to allow lora to be set dynamically from the router. Don't we already support per-request lora config for this?

If we add the feature asked for in this PR, I suspect people will soon flood us with PRs/issues just to add params specific to their personal use cases, even when they're redundant and/or insecure. And when we add them, security researchers will start nitpicking them.

So I don't think such a feature is worth maintaining; the human effort isn't worth it.


We could probably support a (manual) hot-reload of the preset file like Pascal said, which is pretty much aligned with the nginx/apache approach.

There will be some edge cases, like what if a model is removed from the ini file, what if the global config changed, etc. Probably worth opening an issue to list out all of these cases before writing the code.

@ngxson
Collaborator

ngxson commented Dec 23, 2025

On second thought, we can transfer the risk by asking users to specify themselves the set of parameters that can be overwritten via the API. This way it's completely up to the user to decide which parameters are allowed to be changed via the API.

I will have a look into this feature.
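
As a sketch of that idea (the names below are placeholders only; no such option exists yet), the server-side check could amount to something like:

    # Operator-defined allowlist: only parameters explicitly listed by
    # whoever started the server may be overridden through the API.
    ALLOWED_OVERRIDES = {"ctx", "n_gpu_layers"}   # placeholder operator choice

    def validate_overrides(overrides: dict) -> dict:
        rejected = set(overrides) - ALLOWED_OVERRIDES
        if rejected:
            raise ValueError(f"params not overridable via API: {sorted(rejected)}")
        return overrides

    # validate_overrides({"ctx": 8192})            -> accepted
    # validate_overrides({"mmproj": "/etc/foo"})   -> ValueError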
