Skip to content

Conversation

@santigimeno
Copy link
Member

@santigimeno santigimeno commented Jan 15, 2026

When trying to reset a CommandStream because of reasons: for example, we change grpc endpoint dynamically, we can enter an infinite recursion calling GrpcAgent::reset_command_stream() because when calling ~CommandStream() the OnDone() callback may call the observer (GrpcAgent instance) which in turn would call
GrpcAgent::reset_command_stream(). Avoid this recursion by adding a new cancelling_for_destruction_ member variable to CommandStream which allows avoiding calling the observer if triggered by the destructor.

Summary by CodeRabbit

  • Bug Fixes

    • Improved shutdown behavior to suppress observer notifications during destruction and ensure cancellation completes cleanly.
  • Tests

    • Added a test validating reconnection and configuration recovery when starting with an invalid initial setup, ensuring command delivery and clean client shutdown.

✏️ Tip: You can customize this high-level summary in your review settings.

@santigimeno santigimeno requested a review from RafaelGSS January 15, 2026 15:47
@santigimeno santigimeno self-assigned this Jan 15, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

Walkthrough

Adds a destruction-aware flag to CommandStream that suppresses observer notifications during teardown and updates cancellation ordering; also adds a test validating agent reconnection after an initial invalid NSOLID_GRPC and subsequent CommandStream behavior.

Changes

Cohort / File(s) Summary
CommandStream Destruction Handling
agents/grpc/src/command_stream.h, agents/grpc/src/command_stream.cc
Added private cancelling_for_destruction_ flag (default false). Destructor sets the flag before invoking cancellation. OnDone now avoids notifying the observer when the flag is set, preventing callbacks during destruction.
Agent Reconnection Test
test/agents/test-grpc-basic.mjs
New test "should reconnect after initial invalid NSOLID_GRPC" simulating startup with invalid config, then applying valid TLS/insecure settings, constructing client, obtaining agentId, building dynamic config, verifying server→client CommandStream command handling, and asserting clean client shutdown and server close.

Sequence Diagram(s)

(omitted — changes are localized and do not introduce a multi-component sequential flow warranting a diagram)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I hop through shutdowns, flag in paw, 🐇
Quiet the callers, avoid the maw.
Reconnects hum after a shaky start,
Commands flow tidy — clean as art.
A rabbit nods: destruction plays its part.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main fix—preventing infinite recursion in the CommandStream destructor—which aligns with the primary change across all modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1820255 and c4554c9.

📒 Files selected for processing (2)
  • agents/grpc/src/command_stream.cc
  • agents/grpc/src/command_stream.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • agents/grpc/src/command_stream.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-03T14:35:32.707Z
Learnt from: CR
Repo: nodesource/nsolid PR: 0
File: deps/grpc/src/core/util/GEMINI.md:0-0
Timestamp: 2025-12-03T14:35:32.707Z
Learning: Applies to deps/grpc/src/core/util/**/*.{cc,h,cpp} : Use `Orphanable` class from `orphanable.h` as a base class for objects that can be orphaned

Applied to files:

  • agents/grpc/src/command_stream.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-tarball
  • GitHub Check: lint-js-and-md
  • GitHub Check: test-macOS
  • GitHub Check: coverage-linux-without-intl
  • GitHub Check: coverage-linux
  • GitHub Check: coverage-windows
  • GitHub Check: test-linux (ubuntu-24.04)
  • GitHub Check: test-linux (ubuntu-24.04-arm)
🔇 Additional comments (2)
agents/grpc/src/command_stream.cc (2)

30-42: LGTM! Correct ordering and synchronization for the destruction flag.

The flag is set before TryCancel() while holding the mutex, ensuring OnDone() (which also acquires the mutex) will observe the flag correctly and avoid invoking the observer callback during destruction. This properly breaks the infinite recursion cycle described in the PR objectives.


52-62: LGTM! Proper guard against observer notification during destruction.

The cancelling_for_destruction_ member is properly initialized to false in the header file. The condition at line 57 correctly skips the observer callback when the stream is being destroyed, breaking the recursion cycle. The flag is properly synchronized under the mutex: set in the destructor while holding the lock and checked in OnDone within the same critical section. The observer notification is made outside the lock, which is correct to avoid deadlocks.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

When trying to reset a CommandStream because of reasons: for example, we
change `grpc` endpoint dynamically, we can enter an infinite recursion
calling `GrpcAgent::reset_command_stream()` because when calling
`~CommandStream()` the `OnDone()` callback may call the observer
(GrpcAgent instance) which in turn would call
`GrpcAgent::reset_command_stream()`. Avoid this recursion by adding a
new `cancelling_for_destruction_` member variable to `CommandStream`
which allows avoiding calling the observer if triggered by the
destructor.
@santigimeno santigimeno force-pushed the santi/fix_grpc_race_cond branch from 1820255 to c4554c9 Compare January 16, 2026 11:11
@santigimeno santigimeno requested a review from EHortua January 16, 2026 11:23
Copy link

@EHortua EHortua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working fine
Without the patch
image

With the patch
image
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants