Skip to content

Conversation

@socksy
Copy link
Contributor

@socksy socksy commented Sep 19, 2025

The MCP server resulted in a bunch of regression for running an app via the actual CLI — this was because I refactored it to have a pure inner function and a side-effectful outer function, and re-used the inner function in both the MCP server and in the CLI. I tried to keep most of the functionality by using a ChannelSink at the time, with the idea that we could still stream logs in. However, fundamentally the behaviour needs to be different.

So:

  • remote runs have newlines again (😅)

  • there's two different sinks now, ChannelSink for the normal formatted lines, and PlainChannelSink for no formatting (for MCP server responses)

  • 422 errors were not being successfully interpreted as errors by the MCP server, and fundamentally it seemed to be because the unwrap_api_response from the api.rs didn't treat them as errors (which they're not really — they're 4xx issues). I'm not 100% sure this solution is the best and am interested in other suggestions

    However we want to be able to see we've passed in the wrong parameters to a run, and the MCP server should accordingly return with an error when this fails, so now we wrap the call to unwrap_api_response with a check to see if there was a client error or server error — and when there is, return an actual Err response that we can match for in the sink logic (see here and here)

  • reintroduce spinners (and pass in a boolean to enable or disable them so as not to spam the MCP server)

  • timestamps are colourized again

  • the lines appear one by one. Not sure exactly why that was broken before, maybe the fact it was all on one line they got buffered

  • local runs correctly detect when the app errored out locally (this got subtly broke in the refactor, status_task.abort(); got called before we printed out the failure)

  • lots of regression tests so this won't happen again (and of course, the tests caught a bunch of ways my fixes weren't working 🙃)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes regressions in the CLI run functionality that were introduced when the MCP server was added. The main issue was that the run functionality was refactored to use a pure inner function for both CLI and MCP server, but this broke several CLI behaviors.

  • Restores proper newline formatting for remote runs and colored output
  • Adds separate sinks for different output formatting needs (ChannelSink vs PlainChannelSink)
  • Improves error handling for 422 validation errors in both CLI and MCP contexts
  • Reintroduces spinners for CLI runs and ensures proper status monitoring

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/mock-api-server/main.py Adds mock 422 error handling for testing invalid parameters
tests/integration/features/steps/mcp_steps.py Adds comprehensive test steps for MCP plain text output and error handling
tests/integration/features/steps/cli_steps.py New CLI-specific test steps for verifying colors, formatting, and spinners
tests/integration/features/mcp_app_management.feature Adds scenarios testing MCP plain text output and error handling
tests/integration/features/cli_runs.feature New feature file testing CLI-specific run behaviors
crates/tower-cmd/src/run.rs Major refactoring with separate sinks, improved error handling, and restored formatting
crates/tower-cmd/src/output.rs Makes format_timestamp public for use in different sinks
crates/tower-cmd/src/mcp.rs Updates MCP run tool to use plain text sink and accept parameters
crates/tower-cmd/src/api.rs Enhances error handling for 4xx/5xx responses

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

let (output_result, status_result) = tokio::join!(output_task, status_task);

let final_result = output_result.unwrap();
status_result.unwrap(); // This will report crash/success status
Copy link

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 171 is misleading. The unwrap() call will panic if the status task fails, but it won't 'report' anything - it will just crash the program. Consider using proper error handling or updating the comment to reflect the actual behavior.

Suggested change
status_result.unwrap(); // This will report crash/success status
if let Err(e) = status_result {
eprintln!("Error monitoring app status: {:?}", e);
return Err(Error::from(e));
}

Copilot uses AI. Check for mistakes.
Comment on lines 316 to 320
if let Some((ts, msg)) = line.split_once(": ") {
output::log_line(ts, msg, output::LogLineType::Remote);
} else {
output::write(&format!("{}\n", line));
}
Copy link

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This timestamp parsing logic is duplicated in multiple sink implementations (StdoutSink, ChannelSink, PlainChannelSink). Consider extracting this into a common helper function to reduce code duplication.

Copilot uses AI. Check for mistakes.
entity: None,
}))
}
response => unwrap_api_response(async { response }).await,
Copy link

Copilot AI Sep 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 168 creates an unnecessary async block just to pass the response to unwrap_api_response. Consider refactoring to handle the success case directly or modify unwrap_api_response to accept the response directly.

Suggested change
response => unwrap_api_response(async { response }).await,
response => unwrap_api_response(response).await,

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@bradhe bradhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't think there's anything critical here that should block the PR. I think there's a bit of follow-up/cleanup we should do later on!

Comment on lines +160 to +168
Ok(response) if response.status.is_client_error() || response.status.is_server_error() => {
Err(Error::ResponseError(tower_api::apis::ResponseContent {
tower_trace_id: response.tower_trace_id,
status: response.status,
content: response.content,
entity: None,
}))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if this can be generalized and, as Copilot suggests, this can get pushed into update_api_response? Nothing really fancy going on here.

struct RunRequest {
#[serde(flatten)]
common: CommonParams,
parameters: Option<std::collections::HashMap<String, String>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I was just thinking that it'd be good to push use std::collections::HashMap onto the top of the file. But, then I thought there's probably a guideline for when we should use something versus directly referencing it, and I'd bet it's about how many times the thing is used in a file?

So, now I'm not sure :) Just thinking out loud, no real action in this comment but perhaps it will inspire conversation elsewhere.

impl OutputSink for ChannelSink {
fn send_line(&self, line: String) {
let _ = self.0.send(line);
if let Some((ts, msg)) = line.split_once(": ") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Man this really makes me think that we should be pushing the timestamp into a parameter to send_line on the OutputSink trait, considering we're doing it in two places. Probably want to be able to format the ts differently for different cases...

Could be follow-up.

@bradhe bradhe self-requested a review September 25, 2025 12:42
@socksy socksy marked this pull request as draft September 25, 2025 17:10
@socksy socksy force-pushed the fix/cli-run-regressions branch from 08eaa72 to 82762ca Compare September 26, 2025 10:35
@socksy socksy marked this pull request as ready for review September 26, 2025 13:06
socksy and others added 16 commits September 26, 2025 18:51
remote run fixes on CLI:
- newlines (!)
- colorized output
- parameter errors correctly reported

local run fixes on CLI:
- error out properly (previously would abort before the status was
  actually reporting the error)

MCP server:
- new plain text sink instead of using the same between MCP and CLI
- prefix errors with "ERROR: " (probably better than "Oh no!" for an LLM
  to parse?)
- propagate 422 error from api so that it gets interpreted as an error,
  and thus able to check parameter validation correctly
…ied architecture

Enable real-time progress notifications during tool execution and clean up redundant code patterns.

## Real-time Streaming Implementation
- Add RequestContext<RoleServer> parameter to tower_run_local and tower_run_remote
- Implement setup_streaming_output() to forward write() calls to notify_progress()
- Create with_streaming() wrapper functions to abstract the pattern

WHY: Users should see output as it happens during long-running operations, not wait for completion. The SSE infrastructure was designed for this but tools were batching output.

## Remove Redundant Capture Functions
- Delete do_run_local_capture, do_run_remote_capture_plain, do_follow_run_capture_plain
- Fix monitor_output_capture to use global output::write() system properly

WHY: These functions duplicated the global write() capture mechanism and were only used once. The global system already handles capture when CAPTURE_SENDER is set.

## Consolidate Error Handling
- Extract error parsing into dedicated functions: extract_api_error_message, parse_error_response
- Replace nested conditionals with early returns and helper functions

WHY: Error handling logic was scattered and duplicated across handlers. Centralized functions ensure consistent error messages and reduce repetition.

## Validated by Integration Tests
- All 22 test scenarios pass, confirming functionality preserved
- Tests verify both success cases and error handling work correctly
- Streaming behavior validated through existing MCP client integration

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Separate "should we capture" from "where to send" by splitting
CAPTURE_SENDER -> (CAPTURE_MODE, CURRENT_SENDER)

So now:
- CAPTURE_MODE: immutable context flag (MCP vs CLI mode)
- CURRENT_SENDER: mutable destination that can change per operation
- Enable concurrent tool executions with client-specific senders
- Update write() and Spinner to use decoupled architecture

Huh?

The previous design complected capture context with sender destination.
E.G., previous OnceLock design couldn't handle multiple MCP clients -
each tool execution needs output routed to the requesting client,
but OnceLock meant all output went to whoever connected first.
Collect all output during streaming operations and include it in final
responses to ensure MCP integration tests receive expected output.

- Collect output in Arc<Mutex<Vec<String>>> during streaming
- Include captured output in both success and error responses
- Add small delay for message processing completion (unhappy about this
  but claude suggested it and it fixes the integration test, so
  committing for now)

Integration tests expected complete output in MCP tool responses,
but previous implementation only sent real-time notifications without
including output in the final CallToolResult (oops).
@socksy socksy force-pushed the fix/cli-run-regressions branch from 82762ca to 83f09e6 Compare September 26, 2025 16:53
socksy and others added 2 commits September 26, 2025 22:30
- Convert deploy functions to return Results instead of void
- Add proper error variants for DeployApp and DescribeApp API errors
- Update MCP deploy tool to handle and report deploy failures
- Ensure deploy errors are properly propagated up the call stack

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Update test scenarios for better CLI validation coverage
- Enhance mock server with better error responses and endpoints
- Improve test environment setup and configuration
- Add better test step definitions for MCP scenarios
- Update documentation for mock server usage

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@socksy socksy force-pushed the fix/cli-run-regressions branch from 0d11894 to 91aa8fa Compare September 26, 2025 20:55
@socksy
Copy link
Contributor Author

socksy commented Sep 29, 2025

closed in favour of the PR stack that was eventually folded into fix/tow-963-switch-to-global-output-writing (#105)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants