From 19e52ce68eccd7842542be2de06aa3c4d9c29c96 Mon Sep 17 00:00:00 2001
From: Nadav Har'El <nyh@cloudius-systems.com>
Date: Sun, 26 May 2013 23:14:22 +0300
Subject: [PATCH] Fix two bugs in yield()

yield() had two bugs - thanks to Avi for pinpointing them:

1. It used runqueue.push_back() to put the thread on the run queue, but
   push_back() is a low-level function which can only be used if we're
   sure that the item we're pushing has the vruntime suitable for being
   *last* on the queue - and in the code we did nothing to ensure this
   is the case (we should...). So use insert_equal(), not push_back().

2. It was wrongly divided into two separate sections with interrupts
   disabled. The scheduler code is run both at interrupt time (e.g.,
   preempt()) and at thread time (e.g., wait(), yield(), etc.) so to
   guarantee it does not get called in the middle of itself, it needs
   to disable interrupts while working on the (per-cpu) runqueue.
   In the broken yield() code, we disabled interupts while adding the
   current thread to the run queue, and then again to reschedule.
   Between those two critical sections, an interrupt could arrive and
   do something with this thread (e.g., migrate it to another CPU, or
   preempt it), yet when the interrupt returns yield continues to run
   reschedule_from_interrupt which assumes that this thread is still
   running, and definitely not on the run queue.

Bug 2 is what caused the crashes in the tst-yield.so test. The fix is
to hold the interrupts disabled throughout the entire yield().
This is easiest done with with lock_guard, which simplifies the flow
of this function.
---
 core/sched.cc | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/core/sched.cc b/core/sched.cc
index 103a25a0e..cf9b73d08 100644
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -227,21 +227,19 @@ void schedule(bool yield)
 void thread::yield()
 {
     auto t = current();
+    std::lock_guard<irq_lock_type> guard(irq_lock);
     // FIXME: drive by IPI
-    bool resched = with_lock(irq_lock, [t] {
-        t->_cpu->handle_incoming_wakeups();
-        // FIXME: what about other cpus?
-        if (t->_cpu->runqueue.empty()) {
-            return false;
-        }
-        t->_cpu->runqueue.push_back(*t);
-        assert(t->_status.load() == status::running);
-        t->_status.store(status::queued);
-        return true;
-    });
-    if (resched) {
-        t->_cpu->schedule(true);
+    t->_cpu->handle_incoming_wakeups();
+    // FIXME: what about other cpus?
+    if (t->_cpu->runqueue.empty()) {
+        return;
     }
+    // TODO: need to give up some vruntime (move to borrow) so we're last
+    // on the queue, and then we can use push_back()
+    t->_cpu->runqueue.insert_equal(*t);
+    assert(t->_status.load() == status::running);
+    t->_status.store(status::queued);
+    t->_cpu->reschedule_from_interrupt(false);
 }
 
 thread::stack_info::stack_info()
-- 
GitLab