    Scheduler: Avoid vruntime jump when clock jumps · 253e4536
    Nadav Har'El authored
    Currently, clock::get()->time() jumps (by system_time(), i.e., the host's
    uptime) at some point during the initialization. This can be a huge jump
    (e.g., a week if the host's uptime is a week). Fixing this jump is hard,
    so we'd rather just tolerate it.
    
    reschedule_from_interrupt() handles this clock jump badly. When the jump
    happens while a thread is running, current_run (the amount of time the
    current thread has run) includes the jump. In the above example, a run time
    of a whole week is wrongly attributed to some thread and added to its
    vruntime, so it is not scheduled again until all other threads yield the
    CPU.
    
    The fix in this patch is to limit the vruntime increase after a long
    run to max_slice (10ms). Even if a thread runs for longer (or just thinks
    it ran for longer), it won't be "penalized" in its dynamic priority more
    than a thread that ran for 10ms. Note that this cap makes sense, as
    cpu::enqueue already enforces a similar limit on the vruntime "bonus"
    of a woken thread, and this patch works toward a similar goal (avoid
    giving one thread a huge bonus because another thread was given a huge
    penalty).
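
    The cap described above can be sketched roughly as follows. This is only
    an illustration, not the actual patch: thread_stats and charge_run() are
    hypothetical names, max_slice is assumed to be the 10ms mentioned above,
    and any priority weighting of vruntime is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using duration = std::chrono::nanoseconds;

// Assumed value, taken from the "max_slice (10ms)" in the message above.
constexpr duration max_slice = std::chrono::milliseconds(10);

// Hypothetical stand-in for the per-thread scheduling state.
struct thread_stats {
    duration vruntime{0};
};

// Charge a thread for the time it believes it ran, but never for more
// than max_slice. A clock jump that inflates current_run therefore costs
// the thread no more than an ordinary 10ms run would.
inline void charge_run(thread_stats& t, duration current_run)
{
    assert(current_run >= duration::zero());
    t.vruntime += std::min(current_run, max_slice);
}
```

    With this cap, calling charge_run() with a week-long current_run leaves
    the thread with the same vruntime as a thread charged for exactly 10ms,
    while runs shorter than max_slice are charged in full.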
    
    This bug is very visible in the CPU-bound SPECjvm2008 benchmarks when
    running two benchmark threads on two virtual CPUs. As it happens,
    load_balancer() is the thread that gets the huge vruntime increase, so
    it doesn't get to run until no other thread wants to run. We start with
    both CPU-bound threads on the same CPU, and these threads hardly ever
    yield the CPU (and even more rarely are both sleeping at the same time).
    So the load balancer thread on this CPU doesn't get to run, and the two
    threads remain on the same CPU. The result is halved performance (2-CPU
    performance identical to 1-CPU performance), and on the host we see qemu
    using 100% CPU instead of the 200% expected with two vCPUs.