  1. Jul 18, 2013
    • Dor Laor's avatar
      d3421d25
    • Micro-benchmark for waking condvar on which no-one is waiting · 0080ee69
      Nadav Har'El authored
      This patch adds to tst-condvar two benchmarks for measuring
      condvar::wake_all() on a condvar that nobody is waiting on.
      
      The first benchmark does these wakes from a single thread, measuring
      26ns before commit 3509b19b, and
      only 3ns after it.
      
      The second benchmark does wake_all() loops from two threads on two
      different CPUs. Before the aforementioned commit, this frequently
      involved a contended mutex and context switches, with as much as
      30,000 ns delay. After that commit, this benchmark measures 3ns,
      the same as the single-threaded benchmark.
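
The two measurements described above can be approximated with a minimal sketch like the following. It is not the tst-condvar code: it assumes std::condition_variable::notify_all() as a stand-in for condvar::wake_all(), assumes both threads wake the same condvar, does not pin threads to CPUs, and the function names ns_per_wake and ns_per_wake_two_threads are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <thread>

// Average nanoseconds per notify_all() on a condvar nobody is waiting on.
// (Stand-in for condvar::wake_all(); the name is hypothetical.)
double ns_per_wake(std::condition_variable& cv, long iterations) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        cv.notify_all();  // no waiters, so this should take the cheap path
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = end - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Runs the same loop from two threads at once and returns the slower
// thread's average. Assumption: the scheduler places the threads on
// different CPUs; this sketch does not pin them.
double ns_per_wake_two_threads(std::condition_variable& cv, long iterations) {
    double first = 0.0;
    double second = 0.0;
    std::thread t1([&] { first = ns_per_wake(cv, iterations); });
    std::thread t2([&] { second = ns_per_wake(cv, iterations); });
    t1.join();
    t2.join();
    return first > second ? first : second;
}
```

Calling ns_per_wake(cv, 10000000) and then ns_per_wake_two_threads(cv, 10000000) on the same condvar gives the two figures to compare; absolute numbers depend on the machine and on the condvar implementation.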
    • Improve performance of unwaited condvar_wake_one()/all() · 3509b19b
      Nadav Har'El authored
      Previously, condvar_wake_one()/all() took the condvar's internal lock
      before testing if anyone was waiting. A condvar_wake when nobody was
      waiting cost a mutex_lock()+mutex_unlock() (on my machine, 26 ns)
      when there was no contention, but far more (involving a context
      switch) when several CPUs were trying condvar_wake concurrently.
      
      In this patch, we first test if the queue head is null before
      acquiring the lock, and only acquire the lock if it isn't.
      Now the condvar_wake-on-an-empty-queue micro-benchmark (see next patch)
      takes less than 3ns - regardless of how many CPUs are doing it
      concurrently.
      
      Note that the queue head we test is NOT atomic, and we do not
      use any memory fences. If we read the queue head and see 0,
      it is safe to decide nobody is waiting and do nothing. But if we
      read the queue head and see != 0, we can't do anything with the
      value we read - it might be only half-set (if the pointer is not
      atomic on this architecture) or be set but the value it points
      to is not (we didn't use a memory fence to enforce any ordering).
      So if we see the head is != 0, we need to acquire the lock (which
      also imposes the required memory visibility and ordering) and try
      again.
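
The check-then-lock pattern described above can be sketched as follows. This is a minimal illustration, not the OSv implementation: the names waiter and sketch_condvar are hypothetical, std::mutex stands in for the condvar's internal lock, and no thread is actually blocked or resumed. One deliberate difference from the patch: the patch reads a plain, non-atomic head pointer, whereas this sketch uses a relaxed std::atomic load so that the unlocked read is not a formal data race in portable C++.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical waiter record; a real condvar would also hold a thread handle.
struct waiter {
    bool woken = false;
    waiter* next = nullptr;
};

class sketch_condvar {
public:
    // Append a waiter to the queue (a real wait() would then block).
    void enqueue(waiter* w) {
        assert(w != nullptr);
        std::lock_guard<std::mutex> guard(lock_);
        w->next = nullptr;
        if (head_.load(std::memory_order_relaxed) == nullptr) {
            head_.store(w, std::memory_order_relaxed);
        } else {
            tail_->next = w;
        }
        tail_ = w;
    }

    // Returns true if a waiter was woken.
    bool wake_one() {
        // Fast path: seeing a null head without the lock means nobody is
        // waiting, so there is nothing to do.
        if (head_.load(std::memory_order_relaxed) == nullptr) {
            return false;
        }
        // A non-null value read without the lock is only a hint. Take the
        // lock (which provides the needed ordering) and read the head again.
        std::lock_guard<std::mutex> guard(lock_);
        waiter* w = head_.load(std::memory_order_relaxed);
        if (w == nullptr) {
            return false;  // another waker emptied the queue in the meantime
        }
        head_.store(w->next, std::memory_order_relaxed);
        if (w->next == nullptr) {
            tail_ = nullptr;
        }
        w->next = nullptr;
        w->woken = true;
        return true;
    }

private:
    std::mutex lock_;
    std::atomic<waiter*> head_{nullptr};
    waiter* tail_ = nullptr;  // only touched while holding lock_
};
```

The value read on the fast path is never dereferenced; only the re-read under the lock is used to unlink a waiter, which mirrors the "acquire the lock and try again" step in the message.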
  2. Jul 17, 2013
  3. Jul 15, 2013
  4. Jul 12, 2013
  5. Jul 11, 2013