Skip to content

Conversation

@iximeow
Copy link
Member

@iximeow iximeow commented Jan 28, 2026

Doorbell buffers want the CQ event index to roughly track with the tail,
because most CQ doorbells are not interesting. If there are ten CQ
entries, space for 54 more, and the guest has acknowledged one, we don't
really care anything happened.

If there are 64 CQ entries, 0 available, and one SQ entry that we're
notified for, we'll try to reserve space in a full CQ, see it's at
capacity, and the SQ will idle with the CQ aware that that SQ is corked.
Without doorbell buffers, at some point the guest will process a CQE,
ring a CQ doorbell, uncork the CQ, and we'll nudge the parked SQ worker
thread into operation again.

With doorbell buffers, if the event index matches the CQ tail we'll
never get a doorbell when the CQ has been drained. If the guest never
submits another I/O on the corked SQ, the pending I/O may never get
processed.

A more careful treatment would only have the event index lag if we are
at risk of having a worker thread defer an I/O because this CQ would be
full. This would look something like not letting EventIdx advance so far
forward that there is less than one entry available for each SQ pointed
at this CQ.

Eating the unnecessary doorbell cost at least gets doorbell buffers
correct in this case and we can turn it back on. The more careful
treatment of EventIdx can come later.

@iximeow iximeow added bug Something that isn't working. storage Related to storage devices/backends. labels Jan 28, 2026
Doorbell buffers want the CQ event index to roughly track with the tail,
because most CQ doorbells are not interesting. If there are ten CQ
entries, space for 54 more, and the guest has acknowledged one, we don't
really care anything happened.

If there are 64 CQ entries, 0 available,
and one SQ entry that we're notified for, we'll try to reserve space in
a full CQ, see it's at capacity, and the SQ will idle with the CQ aware
that that SQ is corked. Without doorbell buffers, at some point the
guest will process a CQE, ring a CQ doorbell, uncork the CQ, and we'll
nudge the parked SQ worker thread into operation again.

With doorbell buffers, if the event index matches the CQ tail we'll
never get a doorbell when the CQ has been drained. If the guest never
submits another I/O on the corked SQ, the pending I/O may never get
processed.

A more careful treatment would only have the event index lag if we are
at risk of having a worker thread defer an I/O because this CQ would be
full. This would look something like not letting EventIdx advance so far
forward that there is less than one entry available for each SQ pointed
at this CQ.

Eating the unnecessary doorbell cost at least gets doorbell buffers
correct in this case and we can turn it back on. The more careful
treatment of EventIdx can come later.
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, though you may want a look from someone who knows more about the NVMe standard than I do.

@hawkw hawkw added this to the 18 milestone Jan 28, 2026
@iximeow iximeow force-pushed the doorbell-buffer-cq-wakefulness branch from 3c8e3a4 to 9484c10 Compare January 28, 2026 22:23
@leftwo leftwo added the local storage Relating to the local storage project label Jan 28, 2026
@iximeow iximeow self-assigned this Jan 28, 2026
Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find!

Copy link
Contributor

@dancrossnyc dancrossnyc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

@pfmooney
Copy link
Contributor

A more careful treatment would only have the event index lag if we are at risk of having a worker thread defer an I/O because this CQ would be full. This would look something like not letting EventIdx advance so far forward that there is less than one entry available for each SQ pointed at this CQ.

Might be worth recording that in the source commentary for future-you (or someone else). It seems like implementing that wouldn't be too onerous at some point down the line, and would get things back to nearly full elision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something that isn't working. local storage Relating to the local storage project storage Related to storage devices/backends.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants