Skip to content

Conversation

@ngxson
Copy link
Owner

@ngxson ngxson commented Dec 23, 2025

Mirror from upstream PR: ggml-org#18322

Note: @coderabbitai use my 'Mirror PR' preset for reviewing this.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added "unsafe-allow-api-override" preset option to whitelist parameters for API endpoint overrides.
    • Extended model loading endpoint with "overrides" field for runtime parameter customization.
    • Model loading now returns the full argument list used during execution.
  • Documentation

    • Updated API documentation to reflect new preset configuration and override usage patterns.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 23, 2025

Walkthrough

The changes introduce parameter override support for model loading via the server API, controlled by a preset-based whitelist. This includes new preset infrastructure for map-based loading, updated server model loading with override validation, and documentation for the new /models/load overrides field.

Changes

Cohort / File(s) Summary
Preset Infrastructure
common/arg.h, common/arg.cpp
Added new preset option unsafe-allow-api-override with environment variable binding; added corresponding macro definition; adjusted macro formatting for alignment.
Preset Loading
common/preset.h, common/preset.cpp
Introduced load_from_map() method to create presets from string-to-string mappings; refactored load_from_ini() to reuse load_from_map() and now warns on unknown keys.
Server Model Loading
tools/server/server-models.h, tools/server/server-models.cpp
Changed load() signature to accept override parameters and return argument vector; added validation of overrides against whitelist; integrated override application before args rendering; updated route handler to parse and pass overrides from request payload.
Documentation
tools/server/README.md
Documented new unsafe-allow-api-override preset option; documented new overrides field in POST /models/load payload; updated example payloads to show override usage.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server as Server<br/>/models/load
    participant PresetCtx as Preset<br/>Context
    participant ModelLoader as Model<br/>Loader
    
    Client->>Server: POST /models/load<br/>(name, overrides)
    Server->>PresetCtx: Load preset &<br/>whitelist
    rect rgb(200, 220, 255)
        Note over Server,PresetCtx: Validate overrides
        Server->>PresetCtx: Check each override key<br/>against whitelist
        PresetCtx-->>Server: Valid/Invalid result
    end
    alt Validation passes
        Server->>ModelLoader: Apply overrides to<br/>preset copy
        ModelLoader->>ModelLoader: Render model args
        ModelLoader-->>Server: Return arg vector
        Server-->>Client: 200 OK + args
    else Validation fails
        Server-->>Client: 400 Bad Request<br/>(invalid override)
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description is largely incomplete. It only references an upstream PR and includes a note for CodeRabbit without providing substantive information about the change, rationale, or implementation details. Expand the description to explain what the preset does, why it's needed, and how it impacts API behavior. At minimum, include a brief summary of the feature and any safety considerations.
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title accurately describes the main change: adding a new preset option called 'unsafe-allow-api-override' to the server module, which is the primary focus of changes across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch xsn/server_router_overrides

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tools/server/server-models.cpp (1)

21-21: Incorrect header included.

#include <set> is added but the code uses std::unordered_set at line 412. Either change to #include <unordered_set> or switch to using std::set for the allowed_keys container.

🔎 Suggested fix
-#include <set>
+#include <unordered_set>
tools/server/README.md (1)

1575-1575: Clarify field description wording.

Line 1575 reads: "list of preset parameter override (an object mapping string to string)". The opening phrase "list of" conflicts with the object-based structure. Consider: "preset parameter overrides (an object mapping string to string)".

- `overrides`: list of preset parameter override (an object mapping string to string). Parameters must be whitelisted via the `unsafe-allow-api-override` preset parameter.
+ `overrides`: preset parameter overrides (an object mapping string to string). Parameters must be whitelisted via the `unsafe-allow-api-override` preset parameter.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12ee176 and 878b63f.

📒 Files selected for processing (7)
  • common/arg.cpp
  • common/arg.h
  • common/preset.cpp
  • common/preset.h
  • tools/server/README.md
  • tools/server/server-models.cpp
  • tools/server/server-models.h
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{cpp,h,hpp}: Always format C++ code using git clang-format before committing, following .clang-format configuration (4-space indentation, 120 column limit, braces on same line for functions, pointer/reference alignment in middle)
Minimize external dependencies; avoid adding new external dependencies unless absolutely necessary

Files:

  • common/arg.h
  • common/preset.h
  • common/arg.cpp
  • common/preset.cpp
  • tools/server/server-models.h
  • tools/server/server-models.cpp
**/*.{cpp,h,hpp,py}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Ensure cross-platform compatibility by testing code changes on Linux, macOS, and Windows when possible

Files:

  • common/arg.h
  • common/preset.h
  • common/arg.cpp
  • common/preset.cpp
  • tools/server/server-models.h
  • tools/server/server-models.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 0
File: :0-0
Timestamp: 2025-12-22T23:32:42.587Z
Learning: For mirrored PRs (those with [Mirror] in title or mirroring upstream), ngxson wants AI-assisted code review with these preferences: check the upstream PR URL for description, be nit-picky about obvious mistakes, don't tag contributors directly (inform ngxson instead), and alert about suspicious AI-generated code comments like "CRITICAL: ..." or "You are correct here...".
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 14
File: examples/lookahead/lookahead.cpp:97-98
Timestamp: 2025-03-13T22:56:47.417Z
Learning: ngxson prefers to prioritize simplification in some cases and defer adding error handling to a later time, as indicated by their response to a suggestion about adding error checks for llama_decode_ext calls.
🪛 Cppcheck (2.19.0)
tools/server/server-models.cpp

[error] 424-424: failed to evaluate #if condition, undefined function-like macro invocation

(syntaxError)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: build-cmake-pkg / linux
  • GitHub Check: ubuntu-22-cmake-sycl
  • GitHub Check: ggml-ci-x64-cpu-low-perf
  • GitHub Check: ubuntu-22-cmake-sycl-fp16
  • GitHub Check: ubuntu-22-cmake-hip
  • GitHub Check: android-ndk-build (arm64-snapdragon, --preset arm64-android-snapdragon-release)
  • GitHub Check: ggml-ci-x64-cpu-high-perf
  • GitHub Check: android-ndk-build (arm64-cpu, -D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TO...
  • GitHub Check: ubuntu-22-cmake-musa
  • GitHub Check: android-build
  • GitHub Check: ubuntu-latest-cmake-cuda
  • GitHub Check: macOS-latest-cmake-x64
  • GitHub Check: ubuntu-cpu-cmake (x64, ubuntu-22.04)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (THREAD, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (UNDEFINED, Debug)
  • GitHub Check: ubuntu-latest-cmake-sanitizer (ADDRESS, Debug)
  • GitHub Check: ubuntu-latest-cmake-rpc
  • GitHub Check: ubuntu-latest-llguidance
  • GitHub Check: macOS-latest-cmake-arm64-webgpu
🔇 Additional comments (12)
common/arg.h (1)

12-13: LGTM!

The new macro follows the existing naming convention and the alignment adjustment improves readability.

common/preset.h (1)

68-70: LGTM!

The new load_from_map method declaration is well-documented and follows the existing patterns for preset loaders in this struct.

common/preset.cpp (2)

239-259: LGTM!

The load_from_map implementation properly:

  • Reuses existing helpers (is_bool_arg, parse_bool_arg) for consistent boolean handling
  • Gracefully handles unknown options with a warning rather than failing
  • Provides debug logging for traceability

265-273: LGTM!

Good refactoring - load_from_ini now delegates to load_from_map for each section, reducing code duplication and ensuring consistent option parsing behavior across different preset loading paths.

tools/server/server-models.h (1)

117-118: LGTM!

The updated signature appropriately:

  • Returns the argument list for debugging/inspection purposes
  • Accepts override_params to support the new API override feature
common/arg.cpp (1)

3521-3526: LGTM!

The new unsafe-allow-api-override preset option is well-designed:

  • The "unsafe" prefix clearly signals security implications to users
  • Properly marked as preset_only to prevent CLI exposure
  • Follows the established pattern from the load-on-startup preset above
tools/server/server-models.cpp (4)

408-434: Override validation logic is secure and correctly implemented.

The implementation properly:

  • Defaults to an empty whitelist (secure by default - no overrides allowed unless explicitly configured)
  • Validates each override key against the whitelist before applying
  • Restores the original preset after rendering args, preventing permanent preset modifications
  • Throws std::invalid_argument with a clear error message for unauthorized overrides

The restore-after-render pattern at line 433 is important for maintaining preset integrity across multiple load calls with different overrides.


776-788: LGTM!

Good input validation - ensuring all override values are strings before processing. The error response for non-string values prevents type confusion issues.


248-248: LGTM!

Correctly updated to pass empty map {} for the new override_params parameter.


598-598: LGTM!

Correctly updated to pass empty map for override_params when ensuring a model is loaded.

tools/server/README.md (2)

1489-1489: Verify against upstream PR for completeness.

Documentation of unsafe-allow-api-override looks solid with prominent security warning. However, per the learnings, confirm that the upstream PR (PR #18322) contains the same level of detail or if additional context about validation/error handling should be documented here (e.g., what happens if an override is requested but not whitelisted).


1577-1593: Confirm response format documentation.

The /models/load response example shows only {"success": true}. The AI summary mentions the implementation "returns the computed args". Verify whether the response should document returned argument values or if that's only returned by other endpoints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants