  1. Jul 28, 2013
  2. Jul 27, 2013
  3. Jul 25, 2013
    • README: better instructions on how to run zfs-fuse on Fedora · dec50e17
      Nadav Har'El authored
      Make will not fail if zfs-fuse is missing (it will just print some
      strange messages), but the resulting image will fail. It would be
      nice if the makefile checked that our use of zfs-fuse is actually
      working, but let's start with instructions in the README on how to
      enable it in a way that survives reboot :-)
      dec50e17
    • Condvar: Don't context switch between two unlocks · 2114b426
      Nadav Har'El authored
      Before waiting, condvar_wait releases two locks - the user's lock and its
      internal lock. If we reschedule after the first unlock, a waiting thread
      may start running, and hang when it also does condvar_wait. So let's use
      preempt_disable/enable around the two unlocks.
      
      This patch improves single-CPU performance of the cond-perf benchmark by
      about 10%, but this is an extreme case (cond-perf tries to do condvar_wait
      almost immediately after waking up from its previous wait).
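
The idea can be sketched as follows. This is a minimal illustration, not the actual kernel code: the preempt_disable()/preempt_enable() functions here are hypothetical stand-ins that only keep a per-thread nesting counter, and std::mutex stands in for both the user's lock and the condvar's internal lock.

```cpp
#include <mutex>

// Hypothetical stand-ins for the kernel's preempt_disable()/preempt_enable().
// They only track a per-thread nesting depth; a real scheduler would consult
// this depth before rescheduling the current thread.
thread_local int preempt_depth = 0;
inline void preempt_disable() { ++preempt_depth; }
inline void preempt_enable() { --preempt_depth; }
inline bool preemptable() { return preempt_depth == 0; }

// RAII helper (an assumption of this sketch) so that preemption is always
// re-enabled when the scope ends.
struct preempt_guard {
    preempt_guard() { preempt_disable(); }
    ~preempt_guard() { preempt_enable(); }
    preempt_guard(const preempt_guard&) = delete;
    preempt_guard& operator=(const preempt_guard&) = delete;
};

// Release the user's lock and then the internal lock, with no reschedule
// point between the two unlocks.
void unlock_both(std::mutex& user_mutex, std::mutex& internal_mutex) {
    preempt_guard guard;      // preemption is off from here...
    user_mutex.unlock();
    internal_mutex.unlock();
}                             // ...until here
```

Both mutexes must be held by the calling thread when unlock_both() is called.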
      2114b426
    • Condvar: Do not hold the condvar's internal lock during a wake_all() · 4f6303f1
      Nadav Har'El authored
      During a wake_all() of a large number of threads, we hold the condvar's
      internal lock throughout the loop of waking up the threads. The reason we
      need this lock is timeouts: when a condvar_wait times out, it needs to
      remove its wait_record from the queue, and it can potentially happen
      concurrently with a wake_all(), so we don't want items being deleted from
      the list while we're scanning it.
      
      This patch protects against this concurrency in a different way, which
      does not require holding the internal mutex for a long period of time.
      Under the lock, it just grabs the whole wait list (copies the head and
      resets it to zero), and then walks this list and wakes the wait_records
      on it without the lock. When a timeout occurs, condvar_wait() grabs the
      lock and looks for its wait record in the queue. If this happens before
      condvar_wake_all() zeroes the list, it succeeds in removing itself
      from the list. But if the wait_record is not on the list, it means
      condvar_wake_all() already zeroed the list, but we don't know whether it
      already woke the wait_record or is just about to do so, so we can't
      return from condvar_wait() just yet - we need to wait() on this
      wait_record. This wait() will either return immediately, or wait until
      the wake loop in condvar_wake_all() gets to this record and the lock
      we need is taken for us.
      
      The idea is that we improve a common case (waking many threads) and
      only suffer (the potential added wait) in a rare case of a race between
      a wake and a timeout.
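
The "grab the whole list under the lock, walk it without the lock" scheme described above might look roughly like the sketch below. All names here (list_condvar, remove_on_timeout, the woken flag) are hypothetical, std::mutex stands in for the internal lock, and setting a flag stands in for actually waking a thread. For brevity the sketch pushes new records at the head, which a real wait queue would not necessarily do.

```cpp
#include <mutex>

struct wait_record {
    wait_record* next = nullptr;
    bool woken = false;   // stand-in for waking the waiting thread
};

struct list_condvar {
    std::mutex internal;           // protects only the list head and links
    wait_record* head = nullptr;

    void enqueue(wait_record* wr) {
        std::lock_guard<std::mutex> guard(internal);
        wr->next = head;
        head = wr;
    }

    // Timeout path. Returns true if the record was still queued and has been
    // unlinked. Returns false if wake_all() already detached the list, in
    // which case the caller must still wait for the wake on this record.
    bool remove_on_timeout(wait_record* wr) {
        std::lock_guard<std::mutex> guard(internal);
        for (wait_record** p = &head; *p; p = &(*p)->next) {
            if (*p == wr) {
                *p = wr->next;
                wr->next = nullptr;
                return true;
            }
        }
        return false;
    }

    // Detach the whole list under the lock, then wake without the lock.
    // Returns the number of records woken.
    int wake_all() {
        wait_record* list;
        {
            std::lock_guard<std::mutex> guard(internal);
            list = head;
            head = nullptr;
        }
        int count = 0;
        while (list) {
            // Read the link first: once woken, the record may go away.
            wait_record* next = list->next;
            list->woken = true;
            list = next;
            ++count;
        }
        return count;
    }
};
```

A timeout that runs before wake_all() removes its own record; one that runs afterwards gets false and has to wait for the wake to arrive.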
      
      Despite these advantages, it is dubious how helpful this patch is in
      real scenarios. First of all, holding the internal mutex during a wake
      only hurts when the woken up threads access the condvar, and this usually
      happens only if the work done by the woken-up thread is so trivial that
      it immediately gets to condvar->wait() again - and it's not clear that this
      is an interesting scenario.
      Second, if wake_all() is called with the user mutex locked anyway
      (as it often is in our code), none of the woken threads can actually
      wake up until wake_all() is done, so no code will try to touch the condvar
      while it is locked for the entire duration of wake_all().
      It seems that this patch is only likely to help in artificial benchmarks
      like glibc's cond-perf.c.
      4f6303f1
    • Condvar: Allow condvar->wait() to take mutex reference · 9a085f36
      Nadav Har'El authored
      As a convenience, overload condvar->wait() to also take a mutex reference,
      not just a mutex pointer.
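
A convenience overload of this kind could be as small as the following sketch. The types and members here (mutex_t, overload_condvar, the waits counter) are hypothetical placeholders, and the pointer version only records its argument instead of blocking.

```cpp
// Hypothetical placeholder for the kernel's mutex type.
struct mutex_t {};

struct overload_condvar {
    int waits = 0;                  // how many times wait() was called
    const mutex_t* last = nullptr;  // mutex passed to the most recent wait()

    // Pointer version: stands in for the existing wait().
    int wait(mutex_t* user_mutex) {
        last = user_mutex;
        return ++waits;
    }

    // Reference version: forwards to the pointer version.
    int wait(mutex_t& user_mutex) { return wait(&user_mutex); }
};
```

Callers holding a mutex object can then write cv.wait(m) instead of cv.wait(&m).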
      9a085f36
    • Condvar: Add some tracepoints · f7d6e269
      Nadav Har'El authored
      f7d6e269
    • Condvar: Order the condvar_wake_all wakeups by CPU · ac76d75a
      Nadav Har'El authored
      Because of wait morphing, we now have full control of the thread wakeup
      order on condvar_wake_all() - they are all queued in the mutex's
      wait queue in a certain order, and woken up one by one in that order.
      
      POSIX Threads leaves this order undetermined, saying that "the scheduling
      policy shall determine the order". In this patch we improve wakeup
      performance by ordering the wakeups by CPU: When a thread on some CPU
      wakes up (by its unlock()) another thread on the same CPU, it is faster
      than waking up a thread on another CPU.
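
One simple way to get such an order is to group the waiters by CPU with a stable sort, as in the sketch below. This only illustrates the ordering idea and is not necessarily how the patch does it: the waiter struct, order_wakeups() and cross_cpu_handoffs() are hypothetical names, and each waiter's CPU is assumed to be known as a plain integer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct waiter {
    int thread_id;
    int cpu;   // CPU the waiting thread last ran on
};

// Group waiters by CPU. The sort is stable, so waiters on the same CPU keep
// their original relative (arrival) order.
std::vector<waiter> order_wakeups(std::vector<waiter> waiters) {
    std::stable_sort(waiters.begin(), waiters.end(),
                     [](const waiter& a, const waiter& b) {
                         return a.cpu < b.cpu;
                     });
    return waiters;
}

// Count how often one woken thread hands the lock to a thread on a
// different CPU when waiters are woken one by one in the given order.
int cross_cpu_handoffs(const std::vector<waiter>& order) {
    int count = 0;
    for (std::size_t i = 1; i < order.size(); ++i) {
        if (order[i].cpu != order[i - 1].cpu) {
            ++count;
        }
    }
    return count;
}
```

For waiters on CPUs 2, 0, 2, 1, 0 (in arrival order), grouping reduces the number of cross-CPU handoffs in the chain from four to two.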
      ac76d75a
    • Condvar: Make thread::tcpu() a const function · ccc911b6
      Nadav Har'El authored
      thread::tcpu() doesn't change the thread object, so let's mark it const,
      so it can be used on const sched::thread objects.
      
      We'll need it in the following patch, when we use it on wait_record.thread().
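
The effect of the const qualifier can be seen in a small sketch. The toy_cpu and toy_thread types and the cpu_of() helper are hypothetical; only the shape of the accessor matters.

```cpp
struct toy_cpu {
    int id;
};

class toy_thread {
public:
    explicit toy_thread(toy_cpu* c) : _cpu(c) {}

    // const member function: it does not modify the thread object, so it
    // may be called through a const reference or pointer.
    toy_cpu* tcpu() const { return _cpu; }

private:
    toy_cpu* _cpu;
};

// Compiles only because tcpu() is const.
int cpu_of(const toy_thread& t) {
    return t.tcpu()->id;
}
```

Without the const qualifier on tcpu(), the call inside cpu_of() would be rejected by the compiler.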
      ccc911b6
    • Condvar: Add wait morphing to condvar · aa3a6244
      Nadav Har'El authored
      This patch adds wait morphing to condvar:
      
      1. condvar->wake*() doesn't wake the thread to take the user mutex. Rather,
      it attempts to grab the lock for the sleeping thread, and if the lock is
      already taken, moves the sleeping thread to wait on the mutex's queue,
      without waking the thread up.
      
      2. condvar->wait() now assumes that when it is woken up, it already has
      the mutex.
      
      Wait morphing reduces unnecessary context switches, and therefore improves
      performance, in two cases:
      
      1. The "thundering herd" problem - when there are many threads waiting on
      the condvar, if condvar->wake_all() wakes all of them, all will race to get
      the mutex and likely many of them will go back to sleep.
      
      2. The "locked wakeup" problem - when condvar->wake*() is done with the user
      mutex locked (as it very often is), if we wake a waiter to take the
      lock, it may find the lock already held (by the waker) and go back to sleep.
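
The protocol can be modelled without real threads, as in the single-threaded sketch below. Everything in it is hypothetical (toy_thread, toy_mutex, lock_for(), morphing_condvar), and "waking" is just setting a flag. It shows only the handoff logic: a sleeper is woken only when it already owns the mutex.

```cpp
#include <deque>

struct toy_thread {
    int id;
    bool awake = false;
    explicit toy_thread(int i) : id(i) {}
};

struct toy_mutex {
    toy_thread* owner = nullptr;
    std::deque<toy_thread*> waiters;

    // Take the lock on behalf of t if it is free; otherwise queue t on the
    // mutex without waking it. Returns true if t became the owner.
    bool lock_for(toy_thread* t) {
        if (!owner) {
            owner = t;
            return true;
        }
        waiters.push_back(t);
        return false;
    }

    // Hand the lock to the next queued thread, which wakes up as the owner.
    void unlock() {
        if (waiters.empty()) {
            owner = nullptr;
            return;
        }
        owner = waiters.front();
        waiters.pop_front();
        owner->awake = true;
    }
};

struct morphing_condvar {
    toy_mutex* user_mutex;
    std::deque<toy_thread*> sleepers;

    explicit morphing_condvar(toy_mutex* m) : user_mutex(m) {}

    // Wake a sleeper only if the mutex could be taken for it; otherwise it
    // is moved to the mutex's queue and stays asleep.
    void wake_all() {
        while (!sleepers.empty()) {
            toy_thread* t = sleepers.front();
            sleepers.pop_front();
            if (user_mutex->lock_for(t)) {
                t->awake = true;
            }
        }
    }
};
```

When wake_all() runs with the mutex held by the waker, no sleeper is woken at all; each later unlock() wakes exactly one thread, which already owns the mutex when it starts running.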
      aa3a6244
    • Condvar: Remember the mutex that the user associated with a condvar · 45dc4dcc
      Nadav Har'El authored
      Until now, we allowed each condvar->wait() call to specify a different
      mutex. To support "wait morphing", we need the condvar's waker to know
      which mutex the waiter wants to lock. If it can be a different mutex
      for each wait, we'll need to remember the mutex in wait_record. Adding
      a field to condvar's wait_record but not to mutex's wait_record is possible
      but quite ugly. This patch implements a simpler solution:
      
      In practice, condvar users normally "associate" a single mutex with a
      condvar, and use just it when wait()ing on the condvar. POSIX threads
      even officially support only this use case, and pthread_cond_wait(3p)
      states that "using more than one mutex for concurrent ... pthread_cond_wait
      operations on the same condition variable is undefined".
      
      So this patch remembers for a condvar the last mutex used in condvar->wait(),
      and in a later patch we will use it to implement wait morphing: a wake()
      will take this mutex, instead of waking up the thread to take it. We add
      assertions that verify that this assumption is not broken by the user.
      
      The price we pay for this simplicity is the new assumption on the single
      mutex per condvar, and adding 8 more bytes to the size of a condvar.
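
A sketch of the bookkeeping is shown below. The names (assoc_condvar, begin_wait, end_wait, nwaiters) are hypothetical, and the exact form of the assertion is an assumption of this sketch: it allows the associated mutex to change only while no other thread is waiting.

```cpp
#include <cassert>

// Hypothetical placeholder for the kernel's mutex type.
struct mutex_t {};

struct assoc_condvar {
    mutex_t* user_mutex = nullptr;  // last mutex used in wait(): one extra pointer
    int nwaiters = 0;               // assumption: count of current waiters

    // Called at the start of wait(): remember the mutex, and check that
    // concurrent waiters all use the same one.
    void begin_wait(mutex_t* m) {
        assert(nwaiters == 0 || user_mutex == m);
        user_mutex = m;
        ++nwaiters;
    }

    // Called when a wait() finishes.
    void end_wait() {
        assert(nwaiters > 0);
        --nwaiters;
    }
};
```

A waker can then read user_mutex to learn which mutex to take on behalf of a sleeping waiter.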
      45dc4dcc
    • Condvar: Move unlock of user mutex · 7351461a
      Nadav Har'El authored
      Move the unlocking of the user's mutex in condvar_wait() a bit earlier,
      while we still hold the condvar's internal mutex.
      
      This does not change correctness, but it is needed for the wait morphing
      protocol, where we assume that once condvar_wake() finds this thread's
      wait_record (which can happen as soon as we release the internal mutex),
      we are no longer holding the user mutex.
      7351461a