
Tool call failures occurred during SWE-Bench testing with DeepSeek-V3.2 #312

@SefaZeng

Description


I tried to use this repo to test SWE-Bench with DeepSeek-V3.2, but the logs from the deployed sglang DeepSeek-V3.2 server show that the tool call is invalid. If I switch to MiniMax-m2.1, everything works fine.

2026-01-14T05:32:17.632471128Z [2026-01-14 05:32:17] INFO:     29.163.184.147:33006 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2026-01-14T05:32:17.717142499Z [2026-01-14 05:32:17] Error in request: No tool calls but found tool output
2026-01-14T05:32:17.717164798Z Traceback (most recent call last):
2026-01-14T05:32:17.717166488Z   File "/usr/local/lib/python3.12/dist-packages/sglang/srt/entrypoints/openai/serving_base.py", line 100, in handle_request
2026-01-14T05:32:17.717168098Z     adapted_request, processed_request = self._convert_to_internal_request(
2026-01-14T05:32:17.717169728Z                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-14T05:32:17.717171178Z   File "/usr/local/lib/python3.12/dist-packages/sglang/srt/entrypoints/openai/serving_chat.py", line 165, in _convert_to_internal_request
2026-01-14T05:32:17.717172858Z     processed_messages = self._process_messages(request, is_multimodal)
2026-01-14T05:32:17.717174208Z                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-14T05:32:17.717175548Z   File "/usr/local/lib/python3.12/dist-packages/sglang/srt/entrypoints/openai/serving_chat.py", line 261, in _process_messages
2026-01-14T05:32:17.717177018Z     result = self._apply_jinja_template(request, tools, is_multimodal)
2026-01-14T05:32:17.717178168Z              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-14T05:32:17.717179518Z   File "/usr/local/lib/python3.12/dist-packages/sglang/srt/entrypoints/openai/serving_chat.py", line 300, in _apply_jinja_template
2026-01-14T05:32:17.717180988Z     real_input = encode_messages(
2026-01-14T05:32:17.717182298Z                  ^^^^^^^^^^^^^^^^
2026-01-14T05:32:17.717183428Z   File "/usr/local/lib/python3.12/dist-packages/sglang/srt/entrypoints/openai/encoding_dsv32.py", line 319, in encode_messages
2026-01-14T05:32:17.717185778Z     prompt += render_message(
2026-01-14T05:32:17.717186898Z               ^^^^^^^^^^^^^^^
2026-01-14T05:32:17.717198638Z   File "/usr/local/lib/python3.12/dist-packages/sglang/srt/entrypoints/openai/encoding_dsv32.py", line 226, in render_message
2026-01-14T05:32:17.717200327Z     assistant_tool_calls and len(assistant_tool_calls) >= tool_call_order
2026-01-14T05:32:17.717203017Z AssertionError: No tool calls but found tool output
2026-01-14T05:32:17.717223367Z [2026-01-14 05:32:17] INFO:     29.163.184.147:33006 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
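The assertion in `encoding_dsv32.py` fires when the message history contains a `tool` message that is not preceded by an assistant message carrying the matching `tool_calls` entry. A minimal sketch of that invariant (this is not sglang's actual code; `validate_tool_outputs` is a hypothetical helper written here for illustration):

```python
def validate_tool_outputs(messages):
    """Raise AssertionError if a tool message has no matching assistant tool call."""
    pending_tool_call_ids = set()
    for msg in messages:
        if msg["role"] == "assistant":
            # Record the tool calls the assistant declared in this turn
            for call in msg.get("tool_calls") or []:
                pending_tool_call_ids.add(call["id"])
        elif msg["role"] == "tool":
            # A tool result must refer back to a previously declared tool call
            assert msg.get("tool_call_id") in pending_tool_call_ids, \
                "No tool calls but found tool output"

# Triggers the error: a tool result with no assistant tool_calls before it.
# This is the shape of history the server apparently received in round 2.
bad = [
    {"role": "user", "content": "run ls"},
    {"role": "tool", "tool_call_id": "call_1", "content": "file.txt"},
]

# Passes: the assistant message declares the tool call first
good = [
    {"role": "user", "content": "run ls"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "ls", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "file.txt"},
]
```

This would be consistent with the model's tool call in the first round failing to parse (or being dropped), so that the next request contains the tool output without its originating `tool_calls` entry.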

The log on the benchmark side:

[DOCKER] {"asctime": "2026-01-14 05:32:12,046", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:13,049", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:14,052", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:15,055", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:16,058", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:17,062", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
Agent Action ────────────────────────────────────────────────────────────────

Thought:
I'll help you implement the necessary changes to fix the issue with `separability_matrix` not computing separability correctly for nested CompoundModels. Let's start with Phase 1.

🤔  Thinking: **Phase 1. READING: read the problem and reword it in clearer terms**

The issue is about the `separability_matrix` function in astropy's modeling module not correctly computing separability for nested CompoundModels.

Let me break this down:

1. **Code snippets and conventions:**
- `m.Linear1D(10) & m.Linear1D(5)` creates a CompoundModel with the `&` operator (astropy's parallel "join" of models)
- `separability_matrix(cm)` returns a boolean matrix indicating which outputs depend on which inputs
- The expected separability matrix for two independent Linear1D models should be diagonal: `[[True, False], [False, True]]`
- This means output 1 depends only on input 1, output 2 depends only on input 2

2. **The problem:**
- When models are combined directly: `m.Pix2Sky_TAN() & m.Linear1D(10) & m.Linear1D(5)` works correctly
- When models are nested: `m.Pix2Sky_TAN() & cm` (where cm is already a compound model) gives incorrect results
- The issue shows that for nested compound models, the separability matrix incorrectly shows that outputs of the linear models are not separable (they show `True, True` in the 2x2 bottom-right submatrix instead of `[[True, False], [False, True]]`)

3. **Steps to reproduce:**
```python
from astropy.modeling import models as m
from astropy.modeling.separable import separability_matrix

# Create a simple compound model
cm = m.Linear1D(10) & m.Linear1D(5)

# This works correctly
print(separability_matrix(cm))
# Expected: [[True, False], [False, True]]

# This also works correctly
print(separability_matrix(m.Pix2Sky_TAN() & m.Linear1D(10) & m.Linear1D(5)))

# This gives incorrect results with nested compound models
print(separability_matrix(m.Pix2Sky_TAN() & cm))
# Expected bottom-right 2x2 should be diagonal but shows all True
```

4. **Best practices for testing and fixing:**
- Understand how `separability_matrix` works
- Check how compound models are structured when nested
- Look at how the function traverses model trees
- Ensure the fix handles recursion properly for nested models
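For reference, the matrices the thought above describes can be written out explicitly. The following is a hand-written sketch based on the issue description (assuming the usual astropy convention that rows are outputs and columns are inputs; the arrays are not computed by astropy here):

```python
import numpy as np

# Pix2Sky_TAN() & (Linear1D(10) & Linear1D(5)): 4 inputs -> 4 outputs.
# Pix2Sky_TAN couples its two inputs/outputs; each Linear1D is 1 -> 1,
# so the correct separability matrix is block-diagonal.
expected = np.array([
    [True,  True,  False, False],  # TAN output 0 depends on inputs 0 and 1
    [True,  True,  False, False],  # TAN output 1 depends on inputs 0 and 1
    [False, False, True,  False],  # Linear1D(10): output 2 <- input 2 only
    [False, False, False, True],   # Linear1D(5):  output 3 <- input 3 only
])

# The reported bug: with the nested compound model, the bottom-right 2x2
# block comes back all True, wrongly marking the Linear1D models as
# depending on each other's inputs.
reported = expected.copy()
reported[2:, 2:] = True
```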

Tokens: ↑ input 0 • cache hit N/A • ↓ output 0 • $ 0.00

[DOCKER] /agent-server/.venv/lib/python3.12/site-packages/openhands/sdk/llm/utils/telemetry.py:244: UserWarning: Cost calculation failed: This model isn't mapped yet. model=DeepSeek-V3.2, custom_llm_provider=openai. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.
[DOCKER]   warnings.warn(f"Cost calculation failed: {e}")
[DOCKER] {"asctime": "2026-01-14 05:32:18,065", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:19,068", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:20,071", "levelname": "INFO", "name": "uvicorn.access", "filename": "h11_impl.py", "lineno": 473, "message": "172.17.0.1:45008 - \"GET /api/conversations/c7f7f6a3-8d0e-452a-984c-8333eedb6a37 HTTP/1.1\" 200"}
[DOCKER] {"asctime": "2026-01-14 05:32:20,393", "levelname": "ERROR", "name": "openhands.sdk.llm.utils.retry_mixin", "filename": "retry_mixin.py", "lineno": 124, "message": "litellm.ServiceUnavailableError: ServiceUnavailableError: OpenAIException - No available workers (all circuits open or unhealthy). Attempt #1 | You can customize retry values in the configuration."}

DeepSeek-V3.2 responds in the first round, but the error occurs in the next round.
