Scheduler: Avoid vruntime jump when clock jumps

Currently, clock::get()->time() jumps forward (by system_time(), i.e., the host's uptime) at some point during initialization. This can be a huge jump (e.g., a week if the host has been up for a week). Fixing the jump itself is hard, so we would rather just tolerate it.

reschedule_from_interrupt() handles this clock jump badly. It computes current_run, the amount of time the current thread has run, and if the jump happened while that thread was running, current_run includes the jump. In the above example, a whole week of run time is wrongly attributed to some thread and added to its vruntime. That thread is then not scheduled again until all other threads yield the CPU.

The fix in this patch is to limit the vruntime increase after a long run to max_slice (10ms). Even if a thread runs for longer (or merely appears to have run for longer), it will not be "penalized" in its dynamic priority more than a thread that ran for 10ms. This cap makes sense: cpu::enqueue already enforces a similar limit on the vruntime "bonus" of a woken thread, and this patch works toward a similar goal (avoid giving one thread a huge bonus because another thread was given a huge penalty).

This bug is very visible in the CPU-bound SPECjvm2008 benchmarks when running two benchmark threads on two virtual CPUs. As it happens, the load_balancer() thread is the one that gets the huge vruntime increase, so it does not get to run until no other thread wants to run. Both CPU-bound threads start on the same CPU, they rarely yield it, and they are even more rarely asleep at the same time. So the load balancer thread on that CPU never gets to run, and the two threads remain on the same CPU. This halves performance (2-CPU performance is identical to 1-CPU performance), and on the host QEMU uses 100% CPU instead of the 200% expected with two vCPUs.
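The starvation described above can be reproduced in a small toy model. This is a minimal sketch, not OSv code. It assumes a scheduler that always runs the thread with the smallest vruntime, times in int64_t nanoseconds, and 1ms timeslices. ToyThread, charge(), pick_next() and balancer_slices_after_jump() are hypothetical names. The load balancer is charged for a one-week clock jump, and the model then counts how many later slices it receives, with and without the max_slice cap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model, not OSv code: times are int64_t nanoseconds, and the scheduler
// always runs the thread with the smallest vruntime.
constexpr std::int64_t kMaxSlice = 10'000'000;                   // 10ms, as in the patch
constexpr std::int64_t kWeek = 7LL * 24 * 3600 * 1'000'000'000;  // spurious clock jump
constexpr std::int64_t kTick = 1'000'000;                        // each slice runs 1ms

struct ToyThread {
    std::string name;
    std::int64_t vruntime = 0;
};

// Charge a thread for the time it appears to have run. With cap == true the
// charge is limited to kMaxSlice, mirroring the patch's change.
void charge(ToyThread& t, std::int64_t current_run, bool cap) {
    assert(current_run >= 0);
    if (cap && current_run > kMaxSlice) {
        current_run = kMaxSlice;
    }
    t.vruntime += current_run;
}

// Index of the thread with the smallest vruntime; ties go to the lowest index.
std::size_t pick_next(const std::vector<ToyThread>& threads) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < threads.size(); ++i) {
        if (threads[i].vruntime < threads[best].vruntime) {
            best = i;
        }
    }
    return best;
}

// The load balancer (index 0) is running when the clock jumps by a week;
// afterwards `slices` 1ms timeslices are handed out. Returns how many of
// them the load balancer received.
int balancer_slices_after_jump(bool cap, int slices) {
    std::vector<ToyThread> threads = {{"load_balancer"}, {"worker1"}, {"worker2"}};
    charge(threads[0], kWeek, cap);  // the jump is wrongly attributed to it
    int balancer_runs = 0;
    for (int i = 0; i < slices; ++i) {
        std::size_t next = pick_next(threads);
        if (next == 0) {
            ++balancer_runs;
        }
        charge(threads[next], kTick, cap);
    }
    return balancer_runs;
}
```

With the cap, the load balancer starts running again once the two workers' vruntime catches up to 10ms, and from then on it gets every third slice. Without the cap it receives none of the 100 slices, which mirrors the stuck load balancer seen in the benchmark.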

Commit 253e4536 authored 11 years ago by Nadav Har'El
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -110,6 +110,13 @@ void cpu::reschedule_from_interrupt(bool preempt)
    if (p->_vruntime + current_run < 0) { // overflow (idle thread)
        current_run = 0;
    }
+    if (current_run > max_slice) {
+        // This thread has run for a long time, or clock::get()->time() jumped. But if
+        // we increase vruntime by the full amount, this thread might go into
+        // a huge cpu time debt and won't be scheduled again for a long time.
+        // So limit the vruntime increase.
+        current_run = max_slice;
+    }
    if (p->_status == thread::status::running
            && (runqueue.empty()
                || p->_vruntime + current_run < runqueue.begin()->_vruntime + bias)) {
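The vruntime charge computed in this hunk can also be written as a standalone function that combines the existing overflow guard with the new cap. This is a hypothetical sketch: vruntime_charge() and kMaxSlice are not OSv names. It assumes vruntime and run time are non-negative int64_t nanoseconds. Unlike the hunk, it tests for overflow before adding, because signed integer overflow is undefined behavior in standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical standalone version of the charge above; units assumed to be ns.
constexpr std::int64_t kMaxSlice = 10'000'000;  // 10ms

// Returns how much to add to a thread's vruntime after it ran (or appears to
// have run) for current_run nanoseconds.
std::int64_t vruntime_charge(std::int64_t vruntime, std::int64_t current_run) {
    assert(vruntime >= 0 && current_run >= 0);  // sketch's stated precondition
    // Overflow guard (the hunk's comment attributes this case to the idle
    // thread). Checked before adding, so no signed overflow can occur.
    if (current_run > std::numeric_limits<std::int64_t>::max() - vruntime) {
        return 0;
    }
    // The patch: never charge more than one max_slice, even if the clock jumped.
    if (current_run > kMaxSlice) {
        return kMaxSlice;
    }
    return current_run;
}
```

For example, a 3ms run is charged in full, while a thread that appears to have run for a week is charged only kMaxSlice.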