Skip to content
Snippets Groups Projects
  • Nadav Har'El's avatar
    19e52ce6
    Fix two bugs in yield() · 19e52ce6
    Nadav Har'El authored
    yield() had two bugs - thanks to Avi for pinpointing them:
    
    1. It used runqueue.push_back() to put the thread on the run queue, but
       push_back() is a low-level function which can only be used if we're
       sure that the item we're pushing has the vruntime suitable for being
       *last* on the queue - and in the code we did nothing to ensure this
       is the case (we should...). So use insert_equal(), not push_back().
    
    2. It was wrongly divided into two separate sections with interrupts
       disabled. The scheduler code is run both at interrupt time (e.g.,
       preempt()) and at thread time (e.g., wait(), yield(), etc.) so to
       guarantee it does not get called in the middle of itself, it needs
       to disable interrupts while working on the (per-cpu) runqueue.
       In the broken yield() code, we disabled interupts while adding the
       current thread to the run queue, and then again to reschedule.
       Between those two critical sections, an interrupt could arrive and
       do something with this thread (e.g., migrate it to another CPU, or
       preempt it), yet when the interrupt returns yield continues to run
       reschedule_from_interrupt which assumes that this thread is still
       running, and definitely not on the run queue.
    
    Bug 2 is what caused the crashes in the tst-yield.so test. The fix is
    to hold the interrupts disabled throughout the entire yield().
    This is easiest done with with lock_guard, which simplifies the flow
    of this function.
    19e52ce6
    History
    Fix two bugs in yield()
    Nadav Har'El authored
    yield() had two bugs - thanks to Avi for pinpointing them:
    
    1. It used runqueue.push_back() to put the thread on the run queue, but
       push_back() is a low-level function which can only be used if we're
       sure that the item we're pushing has the vruntime suitable for being
       *last* on the queue - and in the code we did nothing to ensure this
       is the case (we should...). So use insert_equal(), not push_back().
    
    2. It was wrongly divided into two separate sections with interrupts
       disabled. The scheduler code is run both at interrupt time (e.g.,
       preempt()) and at thread time (e.g., wait(), yield(), etc.) so to
       guarantee it does not get called in the middle of itself, it needs
       to disable interrupts while working on the (per-cpu) runqueue.
       In the broken yield() code, we disabled interupts while adding the
       current thread to the run queue, and then again to reschedule.
       Between those two critical sections, an interrupt could arrive and
       do something with this thread (e.g., migrate it to another CPU, or
       preempt it), yet when the interrupt returns yield continues to run
       reschedule_from_interrupt which assumes that this thread is still
       running, and definitely not on the run queue.
    
    Bug 2 is what caused the crashes in the tst-yield.so test. The fix is
    to hold the interrupts disabled throughout the entire yield().
    This is easiest done with with lock_guard, which simplifies the flow
    of this function.