Skip to content

Conversation

@kubanrob
Copy link
Collaborator

@kubanrob kubanrob commented Mar 1, 2021

Proposal for fixing the deadlock triggered by recursive Cap-deletion while holding CapEntry-locks. Please review thoroughly, this needs some work before being ready to merge and probably still has bugs.

This PR introduces CapEntry::lock_cap() protecting the Capability in the CapEntry, and removes locking from tree traversal. Instead, we use a mechanism similar to double-checked locking. If we notice races, the traversal is restarted from the root.

Please also comment on the documentation (or improve it yourself).

@kubanrob kubanrob requested review from gypsephi and rottaran March 1, 2021 19:10

bool try_lock_next()
{
bool ret = !(_next.fetch_or(LOCKED_FLAG) & LOCKED_FLAG);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only successful when we are the first to set the LOCKED_FLAG. That's good. But still, the operation should fail when next points to another CapEntry than we expect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need an argument that passes the expected successor address that should be in _next.

* or call deleteCap on an obejct must lock_cap
*
* lock order: lock_cap, lock_prev, lock_next
*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order for insertAfter A -> B to A->N->B: lock_next on A and read its value to find B, set new entry's next pointer in unlocked state N.next=B (prevents A from disappearing, prevents B from disappearing, and prevents other new entries between A and B), set successor's prev pointer to new entry while keepting B's other flags there (B.prev CAS to N+flags(B.prev)) (unlocks link between new entry and successor), set new entry's prev pointer to predecessor N.prev=A (N is still protected from concurrent removal because its predecessors next pointer is still locked and points to the wrong entry, but B is removable again and entries can be inserted between N and B already), set predecessor's next pointer to the new entry while clearing the locked flag there A.next=N (unlocks the link from predecessor to its successor).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order for kill A->K->B to A->B: lock_next on A and make sure that A is actually/still our predecessor (prevents A from disappearing and prevents new entries from appearing between A and K), lock_next on K (prevents B from disappearing and prevents new entries from appearing between K and B (which should not be possible due to other reasons)), set successor's prev pointer to A while keeping the B's flags there (B.prev = CAS to A+flags(B.prev)) (now any operation on B would try to acquire the lock on A.next and fail because it is still locked and still points to the wrong entry), set predecessor's next pointer to B while clearing the locked flag there A.next=B (unlocks the link from predecessor to its new successor)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prev pointer of an entry is changed only while its predecessor's next pointer is locked. Therefore, if we successfully acquire the lock from predecessor's next pointing to our entry by using compare and swap, our prev pointer cannot be wrong: The CAS fails if someone else concurrently locked the predecessor's next pointer or that pointer is not locked but someone else inserted an entry or removed our predecessor between the point where we fetched our prev pointer value to find our predecessor and our try to lock the predecessor's next pointer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...lets add this to the documentation within the code, and extend this with respect to what the other flags do, when they are supposed to be acquired, released and checked.

}


optional<void> CapEntry::unlinkAndUnlockLinks()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to do all required locking here or in CapEntry's code instead of the external revoke operation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting into locking and an ending operation (similar to a commit) allows us to use locked values to implement higher level operation. We can check if this is possible move locking into unlink after making sure we will not need that in the reworked revocation implementation.

Also, the external revoke operation is not just any operation and is bound to now details about the CapEntry/locking anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply this acquire-commit design to the insertAfter operation as well? This might remove this one callback function via template argument in CapEntry.hh

} else {
// deletion failed
// Either tried to delete a portal that is currently deleting
// or tried to to delete _guarded via a recursive call.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guarded was protecting what? why can't it be protected like other stuff in the line above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that is the same thing ... the portal is the _guarded`object. I will rephrase that comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but there is also this REVOKING_FLAG that does something similar to the guarded argument if I understood correctly. I cannot remember what the difference was and why.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The REVOKING_FLAG is just there to be sure nobody moves a Capability while we try to revoke it (since we (re)start our search for a leaf at the CapEntry). Without the move operation, we should not need that flag any more.

}

CapEntry* RevokeOperation::_findLockedLeaf(CapEntry* root)
// not sure if we even need that locking
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need more thinking here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, this does not use locking any more, I just forgot to remove the comment. We should discuss correctness though 👍

nextCap = nextEntry->cap()) {
ASSERT(!nextEntry->isDeleted());
hwthread_pause();
auto nextEntry = leafEntry->next();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible outcomes of reading next pointer: 1) points to the successor, 2) points to something but is locked because someone is trying concurrently to insertAfter, remove us or remove the successor, 3) points to nothing because someone has removed us concurrently.
On (3) we can only abort and retry from our root. On (2) we can ignore and just proceed to the successor despite the lock or we can spin by reading the next pointer until it is unlocked (case 1) or dead (case 3).

leafCap = nextCap.asZombie();
continue;
} else return true;
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation style: I wonder if the code would be easier to read and less nested when just doing the bailing out conditionally. like

if (!nextEntry) {
  return false;
}
// proceed without else branch.

Maybe a matter of taste

nextCap.isAllocated();
nextCap = nextEntry->cap()) {
ASSERT(!nextEntry->isDeleted());
hwthread_pause();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we recheck our own next pointer in the case that reading the successors capability failed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? As long as nobody deletes the parent (not possible at the moment if this is a child), I don't see how that could go wrong. It does not matter if somebody added a child before us, and since the entry is killed (most likely) ... good luck doing that. We can only become not-a-leaf if sb. adds a child to us.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our next pointer can change because of the successor being deleted concurrently. This is allowed because nothing was locked during our traversal to the first leaf.

@gypsephi gypsephi mentioned this pull request Mar 2, 2021
hwthread_pause();
}
if (cap::isParentOf(*leafEntry, leafCap, *nextEntry, nextCap)) {
if (!nextEntry->kill(nextCap)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modifying anything in the chain from within a function named findLeaf feels very wrong. This line would kill every CapEntry on the way to the first leaf. Why is this necessary? Or what is the benefit? Why does this no permanent damage in the case that the deletion operation has to abort?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rather change the name. Killing the intermediate nodes ensures they can't get inherited from and this subtree can't grow further while deleting.

Killing the cap should not do damage, it can be deleted by concurrent operations as well as in a retry.

_startAsyncDelete(t);
}

optional<void> RevokeOperation::_delete(CapEntry* root, Cap rootCap)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objective: find first leaf child of the root and remove it from, repeat from the root until no leafs left, then delete root itself.
A CapEntry may be deleted only if it has no children. Otherwise, these children would point to their ancestor's kernel object after it has been deleted. Thus, when deleting an entry, we have to make sure that no children are concurrently inserted after it. The only possibility would be through CapEntry::insertAfter(). Alternatively, at a higher level in the inherit operation.
The steps should be: find first leaf, lock the leaf entry in order to prevent new childs to appear behind it, check that it is still the leaf and nobody concurrently inserted a child behind it, delete the kernel object and capability, remove the entry from the chain. If the locking fails we can abort due to concurrency. If the locking is successfull but a child appeared behind, we release the lock and repeat to find the first leaf by starting from where we are.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When traversing the CapEntry chain without any locking, it is mandatory to ensure that we still have the same contents in the CapEntry we are looking at, that is the Capabilities and the prev and next pointers. We just check the capability as follows: The initial caller provides the pointer to the CapEntry and the Capability it expects there. When looking at the successor in the chain, we read our next pointer, then read the successor's capability which becomes our expected value for the successor CapEntry. Whenever we decide to modify anything on a CapEntry, we have to acquire a respective lock flag in that entry and, then, verify that the CapEntry still contains the expected content.
The acquire lock should directly fail if the CapEntry's links do no longer match with what we expect: The lock flags are in the prev and next pointer of the CapEntry. When traversing the chain, we know the expected predecessor of the CapEntry that we try to lock.

@rottaran
Copy link
Collaborator

still fails with deadlock, maybe kernel fault:

1: app [app/init.cc:310] hello thread! ctx=0x0000000000000000
3: Test [app/init.cc:347] Success: res2
1: app [app/init.cc:312] thread in mutex ctx=0x0000000000000000
3: app [app/init.cc:354] sending notifications
    33: v=20 e=0000 i=0 cpl=0 IP=0010:ffffffff8138fed2 pc=ffffffff8138fed2 SP=0018:ffff800100208ff0 env->regs[R_EAX]=0000000000000003
RAX=0000000000000003 RBX=0000000000000000 RCX=ffff800100208e00 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffffff818048d8 RBP=ffff800100208fe0 RSP=ffff800100208ff0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000246
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8138fed2 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
FS =0038 0000000001403050 ffffffff 00eff300 DPL=3 DS   [-WA]
GS =0040 0000000000000080 ffffffff 00eff300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0048 ffffffff814289b8 00002068 0000e900 DPL=3 TSS64-avl
GDT=     ffffffff8142aa20 00000057
IDT=     ffffffff8180c1c0 00000fff
CR0=e0050033 CR2=0000000000000000 CR3=0000000000111000 CR4=00040620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS  
EFER=0000000000000d01
    34: v=20 e=0000 i=0 cpl=0 IP=0010:ffffffff8138fed2 pc=ffffffff8138fed2 SP=0018:ffff800100205ff0 env->regs[R_EAX]=0000000000000003
RAX=0000000000000003 RBX=0000000000000000 RCX=ffff800100205e00 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffffff81804858 RBP=ffff800100205fe0 RSP=ffff800100205ff0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000246
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8138fed2 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
FS =0038 0000000001403110 ffffffff 00eff300 DPL=3 DS   [-WA]
GS =0040 0000000000000040 ffffffff 00eff300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0048 ffffffff814268ec 00002068 0000e900 DPL=3 TSS64-avl
GDT=     ffffffff81428954 00000057
IDT=     ffffffff8180c1c0 00000fff
CR0=e0050033 CR2=0000000000000000 CR3=0000000000111000 CR4=00040620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS  
EFER=0000000000000d01
1: app [app/init.cc:315] thread resumed from wait ctx=0x0000000000000000
2: app [app/init.cc:315] thread resumed from wait ctx=0x0000000000000000
3: app [./runtime/SimpleCapAlloc.hh:81] free p=1025
3: cap [objects/RevokeOperation.cc:109] mythos::optional<void> mythos::RevokeOperation::_delete(mythos::CapEntry*, mythos::Cap) root=0xffff800000104078 rootCap=0xffff800000129240:original:in_transition:zombie:0
3: cap [objects/RevokeOperation.cc:159] bool mythos::RevokeOperation::_findLeaf(mythos::CapEntry*, mythos::Cap, mythos::CapEntry*&, mythos::Cap&) root=0xffff800000104078
3: cap [objects/RevokeOperation.cc:120] _findLockedLeaf returned *leaf=0xffff800000129240:original:in_transition:zombie:0 rootCap=0xffff800000129240:original:in_transition:zombie:0
3: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff800000104078  locked
    35: v=20 e=0000 i=0 cpl=0 IP=0010:ffffffff8138fed2 pc=ffffffff8138fed2 SP=0018:ffff800100208ff0 env->regs[R_EAX]=0000000000000003
RAX=0000000000000003 RBX=0000000000000000 RCX=ffff800100208e00 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffffff818048d8 RBP=ffff800100208fe0 RSP=ffff800100208ff0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000246
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8138fed2 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
FS =0038 0000000001403050 ffffffff 00eff300 DPL=3 DS   [-WA]
GS =0040 0000000000000080 ffffffff 00eff300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0048 ffffffff814289b8 00002068 0000e900 DPL=3 TSS64-avl
GDT=     ffffffff8142aa20 00000057
IDT=     ffffffff8180c1c0 00000fff
CR0=e0050033 CR2=0000000000000000 CR3=0000000000111000 CR4=00040620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS  
EFER=0000000000000d01
2: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff8000001292f8  locked
2: cap [objects/CapEntry.cc:169] mythos::Error mythos::CapEntry::try_lock_prev() this=0xffff8000001292f8 prev=0xffff800000100e08
2: cap [./objects/CapEntry.hh:105] bool mythos::CapEntry::try_lock_next(mythos::CapEntry*) this=0xffff800000100e08  locked
2: cap [./objects/CapEntry.hh:192] bool mythos::CapEntry::try_lock_next() this=0xffff8000001292f8  locked
2: cap [objects/CapEntry.cc:154] mythos::optional<void> mythos::CapEntry::unlinkAndUnlockLinks() this=0xffff8000001292f8
2: cap [objects/CapEntry.cc:158] this unlocks _next of predecessor
2: cap [objects/CapEntry.cc:161] this unlocks _next
2: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff800000412238  locked
2: cap [./objects/CapEntry.hh:150] void mythos::CapEntry::unlock_cap() this=0xffff800000412238
2: cap [objects/RevokeOperation.cc:109] mythos::optional<void> mythos::RevokeOperation::_delete(mythos::CapEntry*, mythos::Cap) root=0xffff800000412238 rootCap=0xffff8000008048c0:usable:original:0
2: cap [objects/RevokeOperation.cc:159] bool mythos::RevokeOperation::_findLeaf(mythos::CapEntry*, mythos::Cap, mythos::CapEntry*&, mythos::Cap&) root=0xffff800000412238
2: cap [objects/RevokeOperation.cc:120] _findLockedLeaf returned *leaf=0xffff8000008048c0:reference:in_transition:zombie:0 rootCap=0xffff8000008048c0:usable:original:0
2: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff800000100e08  locked
2: cap [objects/CapEntry.cc:169] mythos::Error mythos::CapEntry::try_lock_prev() this=0xffff800000100e08 prev=0xffff800000412238
2: cap [./objects/CapEntry.hh:105] bool mythos::CapEntry::try_lock_next(mythos::CapEntry*) this=0xffff800000412238  locked
2: cap [./objects/CapEntry.hh:192] bool mythos::CapEntry::try_lock_next() this=0xffff800000100e08  locked
2: cap [objects/CapEntry.cc:154] mythos::optional<void> mythos::CapEntry::unlinkAndUnlockLinks() this=0xffff800000100e08
2: cap [objects/CapEntry.cc:158] this unlocks _next of predecessor
2: cap [objects/CapEntry.cc:161] this unlocks _next
2: cap [objects/CapEntry.cc:68] void mythos::CapEntry::reset() this=0xffff800000100e08
2: cap [objects/RevokeOperation.cc:159] bool mythos::RevokeOperation::_findLeaf(mythos::CapEntry*, mythos::Cap, mythos::CapEntry*&, mythos::Cap&) root=0xffff800000412238

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants