lockfree mutex: fix wait/wake bug

When I developed lockfree mutex, the scheduler, preemption, and related code still had a bunch of bugs, so I resorted to some workarounds that in hindsite look unnecessary, and even wrong. When it seemed that I can only wake() a thread in wait state, I made an effort to enter the waiting state (with "wait_guard") before adding the thread to the to-awake queue, and then slept with schedule(). The biggest problem with this approach was that a spurious wake(), for any reason, of this thread, would cause us to end the lock() - and fail on an assert that we're the owners of the lock - instead of repeating the wait. When switching to lockfree mutex, the sunflow benchmark (for example) would die on this assertion failure. So now I replaced this ugliness with our familiar idiom, wait_until(). The thread is in running state for some time after entering queue, so it might be woken when not yet sleeping and the wake() will be ignored - but this is fine because part of our protocol is that the wake() before waking also sets "owner" to the to-be-woken thread, and before sleeping we check if owner isn't already us. Also changed the comment on "owner" to say that it is not *just* for implementing a recursive mutex, but also nessary for the wakeup protocol.

lockfree mutex: fix wait/wake bug
9bcf790a · Nadav Har'El · 30f6e9dd · 9bcf790a · 9bcf790a
Commit 9bcf790a authored 11 years ago by Nadav Har'El
--- a/core/lfmutex.cc
+++ b/core/lfmutex.cc
@@ -26,18 +26,6 @@ void mutex::lock()
    // If we're here still here the lock is owned by a different thread.
    // Put this thread in a waiting queue, so it will eventually be woken
    // when another thread releases the lock.
-
-    // Mark the thread "waiting" now with wait_guard(). The code from
-    // here until the schedule() below must be kept to a bare minimum:
-    // It will never be preempted (if we preempt a "waiting" thread, it
-    // will never come back), and it cannot run any code expecting a
-    // "running" thread - such as another wait_guard() or anything
-    // using one (mutexes, malloc, etc.). A good way to think about the
-    // situation is that between the wait_guard() and schedule(), the
-    // current thread is replaced by a new non-preemptable execution
-    // context with the code below.
-    sched::wait_guard wait_guard(current);
-
    linked_item<sched::thread *> waiter(current);
    waitqueue.push(&waiter);

@@ -56,9 +44,6 @@ void mutex::lock()
                owner.store(thread);
                if(thread!=current) {
                    thread->wake();
-                    // Note that because of the wait_guard, preemption of
-                    // this thread is disabled, so the above wake() will
-                    // not reschedule current or change its state.
                }  else
                    return; // got the lock ourselves
            }
@@ -67,9 +52,9 @@ void mutex::lock()

    // Wait until another thread wakes us up. When somebody wakes us,
    // they will set us to be the owner first.
-    // TODO: perhaps check if owner isn't already current, in which case no need to reschedule?
-    current->tcpu()->schedule(true);
-    assert(owner.load(std::memory_order_relaxed)==current);
+    sched::thread::wait_until([&] {
+        return owner.load(std::memory_order_relaxed) == current;
+    });
 }

 bool mutex::try_lock()

--- a/include/lockfree/mutex.hh
+++ b/include/lockfree/mutex.hh
@@ -52,7 +52,8 @@ namespace lockfree {
 class mutex {
 private:
    std::atomic<int> count;
-    // "owner" and "depth" are need for implementing a recursive mutex
+    // "owner" and "depth" are need for implementing a recursive mutex, but
+    // owner is also used to tell a thread being woken that it was made owner.
    unsigned int depth;
    std::atomic<sched::thread *> owner;
    queue_mpsc<sched::thread *> waitqueue;