I194 deadlock in pagemap #198

kubanrob · 2021-03-01T19:09:15Z

Proposal for fixing the deadlock triggered by recursive Cap-deletion while holding CapEntry-locks. Please review thoroughly, this needs some work before being ready to merge and probably still has bugs.

This PR introduces CapEntry::lock_cap() protecting the Capability in the CapEntry, and removes locking from tree traversal. Instead, we use a mechanism similar to double-checked locking. If we notice races, the traversal is restarted from the root.

Please also comment on the documentation (or improve it yourself).

…hos into i194_deadlock_in_pagemap

kernel/objects/capability-spinning/objects/CapEntry.cc

rottaran · 2021-03-01T22:23:41Z

kernel/objects/capability-spinning/objects/CapEntry.hh

+
+    bool try_lock_next()
+    { 
+      bool ret = !(_next.fetch_or(LOCKED_FLAG) & LOCKED_FLAG);


This is only successful when we are the first to set the LOCKED_FLAG. That's good. But still, the operation should fail when next points to another CapEntry than we expect.

we need an argument that passes the expected successor address that should be in _next.

kernel/objects/capability-spinning/objects/CapEntry.cc

rottaran · 2021-03-01T22:47:04Z

kernel/objects/capability-spinning/objects/CapEntry.hh

+   * or call deleteCap on an obejct  must lock_cap
+   *
+   * lock order: lock_cap, lock_prev, lock_next
+   *


order for insertAfter A -> B to A->N->B: lock_next on A and read its value to find B, set new entry's next pointer in unlocked state N.next=B (prevents A from disappearing, prevents B from disappearing, and prevents other new entries between A and B), set successor's prev pointer to new entry while keepting B's other flags there (B.prev CAS to N+flags(B.prev)) (unlocks link between new entry and successor), set new entry's prev pointer to predecessor N.prev=A (N is still protected from concurrent removal because its predecessors next pointer is still locked and points to the wrong entry, but B is removable again and entries can be inserted between N and B already), set predecessor's next pointer to the new entry while clearing the locked flag there A.next=N (unlocks the link from predecessor to its successor).

order for kill A->K->B to A->B: lock_next on A and make sure that A is actually/still our predecessor (prevents A from disappearing and prevents new entries from appearing between A and K), lock_next on K (prevents B from disappearing and prevents new entries from appearing between K and B (which should not be possible due to other reasons)), set successor's prev pointer to A while keeping the B's flags there (B.prev = CAS to A+flags(B.prev)) (now any operation on B would try to acquire the lock on A.next and fail because it is still locked and still points to the wrong entry), set predecessor's next pointer to B while clearing the locked flag there A.next=B (unlocks the link from predecessor to its new successor)

The prev pointer of an entry is changed only while its predecessor's next pointer is locked. Therefore, if we successfully acquire the lock from predecessor's next pointing to our entry by using compare and swap, our prev pointer cannot be wrong: The CAS fails if someone else concurrently locked the predecessor's next pointer or that pointer is not locked but someone else inserted an entry or removed our predecessor between the point where we fetched our prev pointer value to find our predecessor and our try to lock the predecessor's next pointer.

...lets add this to the documentation within the code, and extend this with respect to what the other flags do, when they are supposed to be acquired, released and checked.

kernel/objects/capability-spinning/objects/CapEntry.cc

rottaran · 2021-03-02T00:01:21Z

kernel/objects/capability-spinning/objects/CapEntry.cc

+  }
+
+
+  optional<void> CapEntry::unlinkAndUnlockLinks()


would be good to do all required locking here or in CapEntry's code instead of the external revoke operation.

Splitting into locking and an ending operation (similar to a commit) allows us to use locked values to implement higher level operation. We can check if this is possible move locking into unlink after making sure we will not need that in the reworked revocation implementation.

Also, the external revoke operation is not just any operation and is bound to now details about the CapEntry/locking anyway.

Should we apply this acquire-commit design to the insertAfter operation as well? This might remove this one callback function via template argument in CapEntry.hh

kernel/objects/capability-spinning/objects/RevokeOperation.cc

rottaran · 2021-03-02T00:18:04Z

kernel/objects/capability-spinning/objects/RevokeOperation.cc

+      } else {
+        // deletion failed
+        // Either tried to delete a portal that is currently deleting
+        // or tried to to delete _guarded via a recursive call.


guarded was protecting what? why can't it be protected like other stuff in the line above?

Sorry, that is the same thing ... the portal is the _guarded`object. I will rephrase that comment.

yes, but there is also this REVOKING_FLAG that does something similar to the guarded argument if I understood correctly. I cannot remember what the difference was and why.

The REVOKING_FLAG is just there to be sure nobody moves a Capability while we try to revoke it (since we (re)start our search for a leaf at the CapEntry). Without the move operation, we should not need that flag any more.

rottaran · 2021-03-02T00:18:57Z

kernel/objects/capability-spinning/objects/RevokeOperation.cc

-  }
-
-  CapEntry* RevokeOperation::_findLockedLeaf(CapEntry* root)
+  // not sure if we even need that locking


will need more thinking here

In the current implementation, this does not use locking any more, I just forgot to remove the comment. We should discuss correctness though 👍

rottaran · 2021-03-02T09:59:47Z

kernel/objects/capability-spinning/objects/RevokeOperation.cc

-          nextCap = nextEntry->cap()) {
-        ASSERT(!nextEntry->isDeleted());
-        hwthread_pause();
+      auto nextEntry = leafEntry->next();


possible outcomes of reading next pointer: 1) points to the successor, 2) points to something but is locked because someone is trying concurrently to insertAfter, remove us or remove the successor, 3) points to nothing because someone has removed us concurrently.
On (3) we can only abort and retry from our root. On (2) we can ignore and just proceed to the successor despite the lock or we can spin by reading the next pointer until it is unlocked (case 1) or dead (case 3).

rottaran · 2021-03-02T10:02:21Z

kernel/objects/capability-spinning/objects/RevokeOperation.cc

+          leafCap = nextCap.asZombie();
+          continue;
+        } else return true;
+      } else {


Implementation style: I wonder if the code would be easier to read and less nested when just doing the bailing out conditionally. like

if (!nextEntry) { return false; } // proceed without else branch.

Maybe a matter of taste

rottaran · 2021-03-02T10:03:50Z

kernel/objects/capability-spinning/objects/RevokeOperation.cc

+            nextCap.isAllocated();
+            nextCap = nextEntry->cap()) {
+          ASSERT(!nextEntry->isDeleted());
+          hwthread_pause();


shouldn't we recheck our own next pointer in the case that reading the successors capability failed?

Why? As long as nobody deletes the parent (not possible at the moment if this is a child), I don't see how that could go wrong. It does not matter if somebody added a child before us, and since the entry is killed (most likely) ... good luck doing that. We can only become not-a-leaf if sb. adds a child to us.

Our next pointer can change because of the successor being deleted concurrently. This is allowed because nothing was locked during our traversal to the first leaf.

rottaran · 2021-03-02T10:11:14Z

kernel/objects/capability-spinning/objects/RevokeOperation.cc

+          hwthread_pause();
+        }
+        if (cap::isParentOf(*leafEntry, leafCap, *nextEntry, nextCap)) {
+          if (!nextEntry->kill(nextCap)) {


modifying anything in the chain from within a function named findLeaf feels very wrong. This line would kill every CapEntry on the way to the first leaf. Why is this necessary? Or what is the benefit? Why does this no permanent damage in the case that the deletion operation has to abort?

I rather change the name. Killing the intermediate nodes ensures they can't get inherited from and this subtree can't grow further while deleting.

Killing the cap should not do damage, it can be deleted by concurrent operations as well as in a retry.

rottaran · 2021-03-02T10:45:57Z

kernel/objects/capability-spinning/objects/RevokeOperation.cc

    _startAsyncDelete(t);
  }

  optional<void> RevokeOperation::_delete(CapEntry* root, Cap rootCap)


objective: find first leaf child of the root and remove it from, repeat from the root until no leafs left, then delete root itself.
A CapEntry may be deleted only if it has no children. Otherwise, these children would point to their ancestor's kernel object after it has been deleted. Thus, when deleting an entry, we have to make sure that no children are concurrently inserted after it. The only possibility would be through CapEntry::insertAfter(). Alternatively, at a higher level in the inherit operation.
The steps should be: find first leaf, lock the leaf entry in order to prevent new childs to appear behind it, check that it is still the leaf and nobody concurrently inserted a child behind it, delete the kernel object and capability, remove the entry from the chain. If the locking fails we can abort due to concurrency. If the locking is successfull but a child appeared behind, we release the lock and repeat to find the first leaf by starting from where we are.

When traversing the CapEntry chain without any locking, it is mandatory to ensure that we still have the same contents in the CapEntry we are looking at, that is the Capabilities and the prev and next pointers. We just check the capability as follows: The initial caller provides the pointer to the CapEntry and the Capability it expects there. When looking at the successor in the chain, we read our next pointer, then read the successor's capability which becomes our expected value for the successor CapEntry. Whenever we decide to modify anything on a CapEntry, we have to acquire a respective lock flag in that entry and, then, verify that the CapEntry still contains the expected content.
The acquire lock should directly fail if the CapEntry's links do no longer match with what we expect: The lock flags are in the prev and next pointer of the CapEntry. When traversing the chain, we know the expected predecessor of the CapEntry that we try to lock.

rottaran · 2021-07-22T11:59:02Z

still fails with deadlock, maybe kernel fault:

1: app [app/init.cc:310] hello thread! ctx=0x0000000000000000
3: Test [app/init.cc:347] Success: res2
1: app [app/init.cc:312] thread in mutex ctx=0x0000000000000000
3: app [app/init.cc:354] sending notifications
    33: v=20 e=0000 i=0 cpl=0 IP=0010:ffffffff8138fed2 pc=ffffffff8138fed2 SP=0018:ffff800100208ff0 env->regs[R_EAX]=0000000000000003
RAX=0000000000000003 RBX=0000000000000000 RCX=ffff800100208e00 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffffff818048d8 RBP=ffff800100208fe0 RSP=ffff800100208ff0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000246
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8138fed2 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
FS =0038 0000000001403050 ffffffff 00eff300 DPL=3 DS   [-WA]
GS =0040 0000000000000080 ffffffff 00eff300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0048 ffffffff814289b8 00002068 0000e900 DPL=3 TSS64-avl
GDT=     ffffffff8142aa20 00000057
IDT=     ffffffff8180c1c0 00000fff
CR0=e0050033 CR2=0000000000000000 CR3=0000000000111000 CR4=00040620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS  
EFER=0000000000000d01
    34: v=20 e=0000 i=0 cpl=0 IP=0010:ffffffff8138fed2 pc=ffffffff8138fed2 SP=0018:ffff800100205ff0 env->regs[R_EAX]=0000000000000003
RAX=0000000000000003 RBX=0000000000000000 RCX=ffff800100205e00 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffffff81804858 RBP=ffff800100205fe0 RSP=ffff800100205ff0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000246
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8138fed2 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
FS =0038 0000000001403110 ffffffff 00eff300 DPL=3 DS   [-WA]
GS =0040 0000000000000040 ffffffff 00eff300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0048 ffffffff814268ec 00002068 0000e900 DPL=3 TSS64-avl
GDT=     ffffffff81428954 00000057
IDT=     ffffffff8180c1c0 00000fff
CR0=e0050033 CR2=0000000000000000 CR3=0000000000111000 CR4=00040620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS  
EFER=0000000000000d01
1: app [app/init.cc:315] thread resumed from wait ctx=0x0000000000000000
2: app [app/init.cc:315] thread resumed from wait ctx=0x0000000000000000
3: app [./runtime/SimpleCapAlloc.hh:81] free p=1025
3: cap [objects/RevokeOperation.cc:109] mythos::optional<void> mythos::RevokeOperation::_delete(mythos::CapEntry*, mythos::Cap) root=0xffff800000104078 rootCap=0xffff800000129240:original:in_transition:zombie:0
3: cap [objects/RevokeOperation.cc:159] bool mythos::RevokeOperation::_findLeaf(mythos::CapEntry*, mythos::Cap, mythos::CapEntry*&, mythos::Cap&) root=0xffff800000104078
3: cap [objects/RevokeOperation.cc:120] _findLockedLeaf returned *leaf=0xffff800000129240:original:in_transition:zombie:0 rootCap=0xffff800000129240:original:in_transition:zombie:0
3: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff800000104078  locked
    35: v=20 e=0000 i=0 cpl=0 IP=0010:ffffffff8138fed2 pc=ffffffff8138fed2 SP=0018:ffff800100208ff0 env->regs[R_EAX]=0000000000000003
RAX=0000000000000003 RBX=0000000000000000 RCX=ffff800100208e00 RDX=0000000000000000
RSI=0000000000000000 RDI=ffffffff818048d8 RBP=ffff800100208fe0 RSP=ffff800100208ff0
R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000246
R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
RIP=ffffffff8138fed2 RFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0028 0000000000000000 ffffffff 00aff300 DPL=3 DS   [-WA]
FS =0038 0000000001403050 ffffffff 00eff300 DPL=3 DS   [-WA]
GS =0040 0000000000000080 ffffffff 00eff300 DPL=3 DS   [-WA]
LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
TR =0048 ffffffff814289b8 00002068 0000e900 DPL=3 TSS64-avl
GDT=     ffffffff8142aa20 00000057
IDT=     ffffffff8180c1c0 00000fff
CR0=e0050033 CR2=0000000000000000 CR3=0000000000111000 CR4=00040620
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
CCS=0000000000000044 CCD=0000000000000000 CCO=EFLAGS  
EFER=0000000000000d01
2: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff8000001292f8  locked
2: cap [objects/CapEntry.cc:169] mythos::Error mythos::CapEntry::try_lock_prev() this=0xffff8000001292f8 prev=0xffff800000100e08
2: cap [./objects/CapEntry.hh:105] bool mythos::CapEntry::try_lock_next(mythos::CapEntry*) this=0xffff800000100e08  locked
2: cap [./objects/CapEntry.hh:192] bool mythos::CapEntry::try_lock_next() this=0xffff8000001292f8  locked
2: cap [objects/CapEntry.cc:154] mythos::optional<void> mythos::CapEntry::unlinkAndUnlockLinks() this=0xffff8000001292f8
2: cap [objects/CapEntry.cc:158] this unlocks _next of predecessor
2: cap [objects/CapEntry.cc:161] this unlocks _next
2: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff800000412238  locked
2: cap [./objects/CapEntry.hh:150] void mythos::CapEntry::unlock_cap() this=0xffff800000412238
2: cap [objects/RevokeOperation.cc:109] mythos::optional<void> mythos::RevokeOperation::_delete(mythos::CapEntry*, mythos::Cap) root=0xffff800000412238 rootCap=0xffff8000008048c0:usable:original:0
2: cap [objects/RevokeOperation.cc:159] bool mythos::RevokeOperation::_findLeaf(mythos::CapEntry*, mythos::Cap, mythos::CapEntry*&, mythos::Cap&) root=0xffff800000412238
2: cap [objects/RevokeOperation.cc:120] _findLockedLeaf returned *leaf=0xffff8000008048c0:reference:in_transition:zombie:0 rootCap=0xffff8000008048c0:usable:original:0
2: cap [./objects/CapEntry.hh:133] bool mythos::CapEntry::try_lock_cap() this=0xffff800000100e08  locked
2: cap [objects/CapEntry.cc:169] mythos::Error mythos::CapEntry::try_lock_prev() this=0xffff800000100e08 prev=0xffff800000412238
2: cap [./objects/CapEntry.hh:105] bool mythos::CapEntry::try_lock_next(mythos::CapEntry*) this=0xffff800000412238  locked
2: cap [./objects/CapEntry.hh:192] bool mythos::CapEntry::try_lock_next() this=0xffff800000100e08  locked
2: cap [objects/CapEntry.cc:154] mythos::optional<void> mythos::CapEntry::unlinkAndUnlockLinks() this=0xffff800000100e08
2: cap [objects/CapEntry.cc:158] this unlocks _next of predecessor
2: cap [objects/CapEntry.cc:161] this unlocks _next
2: cap [objects/CapEntry.cc:68] void mythos::CapEntry::reset() this=0xffff800000100e08
2: cap [objects/RevokeOperation.cc:159] bool mythos::RevokeOperation::_findLeaf(mythos::CapEntry*, mythos::Cap, mythos::CapEntry*&, mythos::Cap&) root=0xffff800000412238

Philipp Gypser and others added 15 commits February 25, 2021 13:27

trigger pagemap bug

dd416b6

debug output

13611a5

debug output

9d43fb7

debug output

418ba45

rename CapEntry::unlink

9d9c9d7

capentry unlock debug output

9be6916

renames lock to lock_next.

7da68f2

PML4 broadcast page fault test

74f3bcc

fixed PML4 broadcast page fault

7e8d78a

adds lock_cap.

efc4ade

searches for leaf locklessy.

9b24ea6

fixed overwritten flags.

ae1bfc3

fixes lock order.

9469d8b

adds a bit more documentation.

0cb2143

Merge branch 'i194_deadlock_in_pagemap' of github.com:ManyThreads/myt…

ece2d00

…hos into i194_deadlock_in_pagemap

kubanrob requested review from gypsephi and rottaran March 1, 2021 19:10

rottaran reviewed Mar 2, 2021

View reviewed changes

gypsephi mentioned this pull request Mar 2, 2021

I195 capmap deletion #199

Merged

rottaran reviewed Mar 2, 2021

View reviewed changes

Robert Kuban and others added 4 commits March 2, 2021 13:10

updates try_lock code.

21e6d93

updates comment.

c7c08d7

Merge branch 'master' into i194_deadlock_in_pagemap

e528bb9

reduced failing test set

abed1d9

Merge branch 'master' into i194_deadlock_in_pagemap

73abdd6

I194 deadlock in pagemap #198

Are you sure you want to change the base?

I194 deadlock in pagemap #198

Uh oh!

Conversation

kubanrob commented Mar 1, 2021

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rottaran commented Jul 22, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants