@erfrimod

Clean cherry-pick of #2625

Adding tracing spans to control-path operations in VF Manager to try and help answer two questions:

  1. If a VF Manager operation like ManaDeviceRemoved fails to complete in a timely fashion, which 'await' call are we stuck in? It could be try_notify_guest_and_revoke_vtl0_vf(), shutdown_vtl2_device(), or update_vtl2_device_bind_state(). Tracing spans provide entry and exit opcodes for each call, so a span that was entered but never exited identifies the stuck call.
  2. If a VF Manager operation like ManaDeviceRemoved starts taking longer and missing SLAs, tracing spans can help us narrow down which call is taking up additional time.
  • Improves NetVSP observability by adding structured tracing spans around VF Manager operations: VTL2 device startup/shutdown, VTL0 VF arrival/removal/revoke, endpoint disconnects, bind-state updates.
  • Makes NetVSP device logging more actionable by adding instance_id and/or channel_idx to queue/channel errors: open/close/restore.
  • Minor consistency improvements to error tracing: renames the err field to error and makes it the first field.
  • Readability improvement to the state transition logic in the DataPathSwitchPending handler: turns a double-nested 'if' check into a 'match (a, b)'.
  • Fixes minor typos found in comments in net_mana and tracelimit.
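
To illustrate the two questions above, here is a minimal std-only sketch of how entry and exit records answer them. It is not the code in this PR and does not use the `tracing` crate: `Recorder`, `SpanGuard`, `open_spans`, and `slowest` are hypothetical names invented for the example. In real async code an RAII guard like this should not be held across an `.await`; the `tracing` crate provides `Instrument::instrument` for attaching a span to a future instead.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

/// One recorded span event: an entry, or an exit with the elapsed time.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Enter(&'static str),
    Exit(&'static str, Duration),
}

/// Hypothetical in-memory recorder standing in for a tracing subscriber.
#[derive(Default)]
struct Recorder {
    events: RefCell<Vec<Event>>,
}

/// RAII guard: created on span entry, records the exit when dropped.
struct SpanGuard<'a> {
    recorder: &'a Recorder,
    name: &'static str,
    start: Instant,
}

impl Recorder {
    /// Records an entry event and returns a guard that records the exit.
    fn enter(&self, name: &'static str) -> SpanGuard<'_> {
        self.events.borrow_mut().push(Event::Enter(name));
        SpanGuard { recorder: self, name, start: Instant::now() }
    }

    /// Question 1: spans entered but never exited, outermost first.
    fn open_spans(&self) -> Vec<&'static str> {
        let mut open = Vec::new();
        for event in self.events.borrow().iter() {
            match event {
                Event::Enter(name) => open.push(*name),
                Event::Exit(name, _) => {
                    // Close the most recent matching entry.
                    if let Some(pos) = open.iter().rposition(|n| n == name) {
                        open.remove(pos);
                    }
                }
            }
        }
        open
    }

    /// Question 2: the completed span with the largest elapsed time, if any.
    fn slowest(&self) -> Option<(&'static str, Duration)> {
        self.events
            .borrow()
            .iter()
            .filter_map(|e| match e {
                Event::Exit(name, elapsed) => Some((*name, *elapsed)),
                Event::Enter(_) => None,
            })
            .max_by_key(|(_, elapsed)| *elapsed)
    }
}

impl Drop for SpanGuard<'_> {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        self.recorder
            .events
            .borrow_mut()
            .push(Event::Exit(self.name, elapsed));
    }
}
```

If an operation enters a span for `shutdown_vtl2_device` and never leaves it, `open_spans()` still lists that name after the earlier calls have exited, which is the signal question 1 is after; `slowest()` is the same idea applied to question 2.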

---------

Co-authored-by: Alvin Tan <71284430+AlvinTanMS@users.noreply.github.com>
@erfrimod erfrimod requested a review from a team as a code owner January 26, 2026 22:10
Copilot AI review requested due to automatic review settings January 26, 2026 22:10
@github-actions github-actions bot added the release_1.7.2511 Targets the release/1.7.2511 branch. label Jan 26, 2026

Copilot AI left a comment

Pull request overview

This PR enhances NetVSP observability and maintainability by adding structured tracing spans to VF (Virtual Function) Manager operations, improving error logging consistency, fixing minor typos, and refactoring state transition logic for better readability.

Changes:

  • Adds tracing spans to critical VF Manager async operations (device startup/shutdown, VF arrival/removal/revoke, endpoint disconnects, bind state updates) to help diagnose timing issues and performance bottlenecks
  • Enriches error logs with contextual identifiers (instance_id, channel_idx) and standardizes error field naming (err → error)
  • Improves code readability through refactoring (double-nested if to match expression) and fixes typos in comments
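
The nested-if to match refactor mentioned above can be sketched generically as follows. This is not the DataPathSwitchPending handler itself: the `Action` enum, its variants, and the two boolean inputs are hypothetical, chosen only to show the shape of the change.

```rust
/// Hypothetical outcomes; the real handler's states are not shown in this PR text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Commit,
    Rollback,
    KeepWaiting,
}

/// Before: the decision is spread over two nested `if` checks.
fn next_action_nested(done: bool, ok: bool) -> Action {
    if done {
        if ok {
            Action::Commit
        } else {
            Action::Rollback
        }
    } else {
        Action::KeepWaiting
    }
}

/// After: one `match` on the tuple lists every combination in a flat table,
/// and the compiler checks that no combination is missing.
fn next_action_matched(done: bool, ok: bool) -> Action {
    match (done, ok) {
        (true, true) => Action::Commit,
        (true, false) => Action::Rollback,
        (false, _) => Action::KeepWaiting,
    }
}
```

Both functions return the same result for every input; the tuple form mainly helps a reviewer see all cases at once.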

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| vm/devices/net/netvsp/src/lib.rs | Removes unused imports, adds instance_id/channel_idx to error logs, standardizes error field naming, refactors state transition logic to match expression, fixes typo |
| vm/devices/net/net_mana/src/lib.rs | Corrects comment typos (capitalization and grammar) |
| support/tracelimit/src/lib.rs | Fixes spelling in documentation example |
| openhcl/underhill_core/src/emuplat/netvsp.rs | Adds Display trait for Vtl0Bus, instruments async operations with tracing spans, extracts helper method for bind state updates, improves variable naming clarity |
