  1. Dec 15, 2013
    • Fix race between join() and thread completion · 649654af
      Nadav Har'El authored
      
      thread::destroy() had a "FIXME" comment:
      // FIXME: we have a problem in case of a race between join() and the
      // thread's completion. Here we can see _joiner==0 and not notify
      // anyone, but at the same time join() decided to go to sleep (because
      // status is not yet status::terminated) and we'll never wake it.
      
      This is indeed a bug. Glauber discovered that it occasionally hung the
      tst-threadcomplete.so test: the test sometimes hangs with one thread in
      the "terminated" state (waiting for someone to join it), and a second
      thread waiting in join() that missed the other thread's termination
      event.
      
      The solution works like this:
      
      join() uses a CAS to set itself as the _joiner. If the CAS succeeds, it
      waits as before for the status to become "terminated". But if the CAS
      fails, a concurrent destroy() call has beaten us in the race, and we
      can simply return from join().
      
      destroy() checks (with a CAS) whether _joiner was already set. If so, we
      need to wake the joining thread, just as in the original code. But if
      _joiner was not yet set, either no one is calling join(), or there is a
      concurrent join() call that will soon return (this is what the joiner
      does when it loses the CAS race). In this case, all we need to do is set
      the status to "terminated", and we must do it through a _detached_state
      we saved earlier, because if join() has already returned, the thread
      object may already have been deleted.
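      To make the protocol above concrete, here is a minimal C++ sketch of the
      join()/destroy() handshake. It is not OSv's code: status,
      detached_state, sketch_thread and join_destroy_race_once() are
      hypothetical names, a std::shared_ptr stands in for the RCU-protected
      _detached_state, and a mutex/condition variable pair stands in for
      OSv's wait/wake primitives. The CAS on joiner decides the race exactly
      once, and after that CAS destroy() touches only the state it saved.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical, simplified model of the join()/destroy() handshake; names
// mirror the commit message, but this is NOT OSv's thread implementation.
enum class status { running, terminated };

// State that must outlive the thread object. OSv keeps it alive with RCU;
// this sketch assumes a shared_ptr instead.
struct detached_state {
    std::atomic<status> st{status::running};
    std::mutex m;                 // stands in for OSv's wait/wake machinery
    std::condition_variable cv;
};

struct sketch_thread {
    std::shared_ptr<detached_state> ds = std::make_shared<detached_state>();
    // nullptr means "race undecided"; the first successful CAS decides it.
    std::atomic<const void*> joiner{nullptr};

    // Called by the joining thread; 'self' is any non-null identifier.
    void join(const void* self) {
        assert(self != nullptr);  // nullptr is reserved for "no joiner yet"
        const void* expected = nullptr;
        if (!joiner.compare_exchange_strong(expected, self)) {
            return;  // destroy() won the race: nobody will wake us, so don't sleep
        }
        std::unique_lock<std::mutex> lk(ds->m);
        ds->cv.wait(lk, [this] { return ds->st.load() == status::terminated; });
    }

    // Called by the exiting thread as its last use of the thread object.
    void destroy() {
        // Save the detached state first: once the CAS below is decided,
        // join() may return and *this may be deleted.
        std::shared_ptr<detached_state> saved = ds;
        const void* expected = nullptr;
        bool nobody_waiting = joiner.compare_exchange_strong(expected, this);
        // From here on, touch only 'saved', never members of *this.
        {
            std::lock_guard<std::mutex> lk(saved->m);
            saved->st.store(status::terminated);
        }
        if (!nobody_waiting) {
            saved->cv.notify_all();  // a joiner registered first: wake it
        }
    }
};

// Usage sketch: run the race once and report whether it ended cleanly.
bool join_destroy_race_once() {
    auto t = std::make_unique<sketch_thread>();
    std::shared_ptr<detached_state> ds = t->ds;
    std::thread exiting([p = t.get()] { p->destroy(); });
    int me = 0;
    t->join(&me);   // returns whichever side wins the CAS
    t.reset();      // safe even if destroy() is still finishing
    exiting.join();
    return ds->st.load() == status::terminated;
}
```

      Whichever side wins the CAS, join() cannot sleep forever: either it
      registered first and destroy() will wake it, or it lost and returns
      immediately.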
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • Fix wake_with() · a6bbd0e7
      Nadav Har'El authored
      
      wake_with(action) was implemented using thread_handle, as follows:
      
      thread_handle h(handle());
      action();
      h.wake();
      
      This implementation is wrong: it only takes the RCU lock (which prevents
      the destruction of _detached_state) during h.wake(), so if the thread is
      not sleeping and action() causes it to exit, _detached_state may already
      have been destroyed by the time h.wake() runs, and h.wake() will crash.
      
      thread_handle is simply not needed for wake_with(), and was designed
      with a completely different use case in mind (holding on to a thread
      handle long-term). We just need to take, inline, the appropriate RCU
      lock that keeps _detached_state alive. The resulting code is even
      simpler, and nicely parallels the existing code of wake().
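      A rough C++ sketch of the fixed shape, under loudly stated assumptions:
      RCU is approximated by a std::shared_mutex (readers take it shared, and
      the reclaimer takes it exclusively, which waits for readers already
      inside), and the "wake" is just a counter. sketch_thread, reclaim() and
      exit_during_wake_with() are hypothetical names. The point is that the
      read-side section opens before action() and closes only after the wake.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for RCU: read-side sections take a shared lock, and
// the reclaimer takes the exclusive lock, which waits for every reader that
// is already inside (roughly what synchronize_rcu() guarantees).
// This is NOT how OSv implements RCU.
std::shared_mutex rcu_like;

struct detached_state {
    std::atomic<int> wakeups{0};
};

struct sketch_thread {
    std::atomic<detached_state*> ds{new detached_state};

    ~sketch_thread() { delete ds.load(); }  // no-op after reclaim()

    // Fixed shape: the read-side section spans action() AND the wake, so the
    // detached state cannot be freed in between.
    template <class Action>
    void wake_with(Action action) {
        std::shared_lock<std::shared_mutex> rcu(rcu_like);  // like rcu_read_lock
        detached_state* s = ds.load();
        action();  // may cause the target thread to exit
        if (s) {
            s->wakeups.fetch_add(1);  // stands in for actually waking the thread
        }
    }

    // Run when the thread exits: unpublish, wait for readers, then free.
    // Returns the number of wakeups that reached the state before it was freed.
    int reclaim() {
        detached_state* s = ds.exchange(nullptr);
        assert(s != nullptr);  // reclaim must run exactly once
        std::unique_lock<std::shared_mutex> wait_for_readers(rcu_like);
        int n = s->wakeups.load();
        delete s;
        return n;
    }
};

// Usage sketch: action() lets the target "exit" on another thread while the
// waker is still between action() and the wake.
int exit_during_wake_with() {
    sketch_thread t;
    std::atomic<bool> exited{false};
    int observed = -1;
    std::thread target([&] {
        while (!exited.load()) {
            std::this_thread::yield();
        }
        observed = t.reclaim();
    });
    t.wake_with([&] { exited.store(true); });
    target.join();
    return observed;
}
```

      Because reclaim() cannot take the exclusive lock until wake_with()
      leaves its read-side section, the wake always lands before the state is
      freed. In the thread_handle version, protection began only at h.wake(),
      leaving the window between action() and the wake unprotected.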
      
      This patch fixes a real bug, but unfortunately we don't have a concrete
      test case that it is known to fix.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • Add rcu_lock_in_preempt_type · 9f0e1287
      Nadav Har'El authored
      
      Add a new lock, "rcu_read_lock_in_preempt_disabled", which is exactly
      like rcu_read_lock but assumes that preemption is already disabled.
      Because all our rcu_read_lock does is disable preemption, the new lock
      type currently does absolutely nothing, but in some future
      implementation of RCU it might need to do something.
      
      We'll use the new lock type in the following patch, as an optimization
      over the regular rcu_read_lock.
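      A minimal C++ sketch of the two lock types, assuming (as the text says)
      that the RCU read lock does nothing but disable preemption. The
      per-thread preemption counter and the type names are hypothetical; both
      types expose lock()/unlock(), so they can be used with std::lock_guard.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model: a per-thread counter stands in for OSv's
// preempt_disable()/preempt_enable(); this is not the real implementation.
thread_local int preempt_counter = 0;

inline void preempt_disable() { ++preempt_counter; }
inline void preempt_enable() { --preempt_counter; }
inline bool preemptable() { return preempt_counter == 0; }

// Today's RCU read-side lock: entering the section just disables preemption.
struct rcu_read_lock_type {
    void lock() { preempt_disable(); }
    void unlock() { preempt_enable(); }
};

// Same contract, but the caller guarantees preemption is already disabled,
// so with the current RCU there is nothing to do. The assert is an extra
// debugging check added for this sketch only.
struct rcu_read_lock_in_preempt_disabled_type {
    void lock() { assert(!preemptable()); }
    void unlock() {}
};

rcu_read_lock_type rcu_read_lock;
rcu_read_lock_in_preempt_disabled_type rcu_read_lock_in_preempt_disabled;

// Usage sketch: code already running with preemption disabled can take the
// cheaper lock and still document that it is an RCU reader.
int read_side_depth() {
    std::lock_guard<rcu_read_lock_type> outer(rcu_read_lock);
    std::lock_guard<rcu_read_lock_in_preempt_disabled_type> inner(
        rcu_read_lock_in_preempt_disabled);
    return preempt_counter;  // only the outer lock changed the counter
}
```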
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • enable interrupts during page fault handling · ec7ed8cd
      Glauber Costa authored
      
      Context: going to wait with interrupts disabled is a recipe for
      disaster. While it is true that not every call to wait actually ends up
      waiting, such a call should be considered invalid because of the times
      it may wait. It would therefore be good to express this rule in an
      assertion.
      
      However, there are currently places where we sleep with irqs disabled.
      Although they are technically safe, because we implicitly enable
      interrupts, they still reach wait() in a state that would fail such an
      assertion. This happens in the page fault handler. Explicitly enabling
      interrupts there will allow us to test for valid / invalid wait status.
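      A hypothetical C++ sketch of the idea: wait() asserts that interrupts
      are enabled, and the page fault handler explicitly enables them
      (restoring the previous state on exit) before it can reach wait(). The
      thread-local flag, the guard type and all function names are
      illustrative assumptions, not OSv's actual code.

```cpp
#include <cassert>

// Hypothetical model of the per-CPU interrupt flag; not OSv's arch code.
thread_local bool irqs_enabled = true;

void irq_disable() { irqs_enabled = false; }
void irq_enable() { irqs_enabled = true; }

// The assertion the commit motivates: calling wait() with interrupts
// disabled is invalid even on the calls that happen not to sleep.
void wait_for_event() {
    assert(irqs_enabled && "wait() called with interrupts disabled");
    // ... block until woken (omitted in this sketch) ...
}

// Hypothetical RAII helper: enable interrupts for a scope, then restore
// whatever state the caller had.
struct irq_enable_guard {
    bool saved = irqs_enabled;  // captured before the constructor body runs
    irq_enable_guard() { irq_enable(); }
    ~irq_enable_guard() { irqs_enabled = saved; }
};

// The fault is delivered with interrupts disabled; the handler explicitly
// enables them before doing anything that may sleep (e.g., waiting for I/O).
void page_fault_handler() {
    irq_enable_guard guard;
    wait_for_event();
}
```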
      
      With this change applied, all tests in our whitelist still pass.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  2. Dec 13, 2013
  3. Dec 12, 2013
  4. Dec 11, 2013
  5. Dec 10, 2013
    • Fix shared-object finalization · 4d24b90a
      Nadav Har'El authored
      
      This patch fixes two bugs in shared-object finalization, i.e., running
      its static destructors before it is unloaded. The bugs were seen when
      osv::run()ing a test program using libboost_unit_test_framework-mt.so,
      which crashed after the test program finished.
      
      The two related bugs were:
      
      1. We need to call the module's destructors (run_fini_funcs()) *before*
         removing it from the module list, otherwise the destructors will not
         be able to call functions from this module! (we got a symbol not
         found error in the destructor).
      
      2. We need to unload the modules needed by this module *before* unloading
         this module, not after, as was (implicitly) done until now.
         This makes sense because of symmetry (during a module load, the needed
         modules are loaded after this one), but also practically: a needed
         module's destructor (in our case, boost unit test framework) might refer
         to objects in the needing module (in our case, the test program),
         so we cannot call the needed module's destructor after we've already
         unloaded the needing module.
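      The corrected ordering can be sketched in C++ as follows. This is a
      simplified model, not OSv's ELF loader: elf_object, module_list,
      unload_object() and the assertions inside run_fini_funcs() are
      hypothetical, and reference counting of dependencies shared by several
      modules is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of module unloading; not OSv's ELF loader.
struct elf_object {
    std::string name;
    std::vector<elf_object*> needed;  // modules this one depends on
};

std::vector<elf_object*> module_list;  // modules whose symbols can still be resolved
std::vector<std::string> fini_log;     // order in which destructors ran

bool is_loaded(const elf_object* o) {
    return std::find(module_list.begin(), module_list.end(), o) != module_list.end();
}

// Stand-in for running a module's static destructors. The asserts encode the
// two failure modes described above.
void run_fini_funcs(elf_object* o, elf_object* needing) {
    assert(is_loaded(o));  // bug 1: destructors call functions in their own module
    assert(needing == nullptr || is_loaded(needing));  // bug 2: may touch the needing module
    fini_log.push_back(o->name);
}

void unload_object(elf_object* o, elf_object* needing = nullptr) {
    // Fix 2: unload the needed modules first, while 'o' is still loaded.
    for (elf_object* dep : o->needed) {
        unload_object(dep, o);
    }
    // Fix 1: run destructors *before* removing 'o' from the module list.
    run_fini_funcs(o, needing);
    module_list.erase(std::remove(module_list.begin(), module_list.end(), o),
                      module_list.end());
}
```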
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • Juan Antonio Osorio
    • test.py: add '--repeat' option · 3acafce2
      Pekka Enberg authored
      
      Add a '--repeat' option to test.py that repeats the test suite until a
      test fails.  This is useful for detecting test cases that fail some of
      the time.
      
      Reviewed-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • test.py: Make output pretty and show duration · fbf2a946
      Pekka Enberg authored
      
      Make the test runner output look pretty, and show each test's duration
      so that it is visible which tests take the longest to run.  The output
      now looks as follows:
      
          TEST tst-af-local.so           OK  (3.288 s)
          TEST tst-bdev-write.so         OK  (1.058 s)
          TEST tst-bsd-evh.so            OK  (1.071 s)
          TEST tst-bsd-kthread.so        OK  (1.234 s)
          TEST tst-bsd-taskqueue.so      OK  (1.062 s)
          TEST tst-bsd-tcp1.so           OK  (2.114 s)
          TEST tst-commands.so           OK  (1.141 s)
          TEST tst-condvar.so            OK  (1.776 s)
          TEST tst-dns-resolver.so       OK  (2.560 s)
          TEST tst-epoll.so              OK  (1.952 s)
          TEST tst-except.so             OK  (1.146 s)
          TEST tst-fpu.so                OK  (2.630 s)
          TEST tst-fs-link.so            OK  (1.051 s)
          TEST tst-fs-stress.so          OK  (1.027 s)
          TEST tst-fsx.so                OK  (1.067 s)
          TEST tst-hub.so                OK  (6.256 s)
          TEST tst-huge.so               OK  (2.199 s)
          TEST tst-kill.so               OK  (4.147 s)
          TEST tst-libc-locking.so       OK  (2.110 s)
          TEST tst-loadbalance.so        OK  (1.070 s)
          TEST tst-mmap-file.so          OK  (1.080 s)
          TEST tst-mmap.so               OK  (1.087 s)
          TEST tst-pipe.so               OK  (7.306 s)
          TEST tst-preempt.so            OK  (1.119 s)
          TEST tst-pthread.so            OK  (1.100 s)
          TEST tst-queue-mpsc.so         OK  (3.748 s)
          TEST tst-ramdisk.so            OK  (1.078 s)
          TEST tst-readdir.so            OK  (1.094 s)
          TEST tst-remove.so             OK  (1.030 s)
          TEST tst-rename.so             OK  (1.157 s)
          TEST tst-resolve.so            OK  (1.095 s)
          TEST tst-scheduler.so          OK  (1.087 s)
          TEST tst-sleep.so              OK  (3.083 s)
          TEST tst-solaris-taskq.so      OK  (1.061 s)
          TEST tst-stat.so               OK  (1.106 s)
          TEST tst-strerror_r.so         OK  (1.102 s)
          TEST tst-tcp-sendonly.so       OK  (2.014 s)
          TEST tst-tcp.so                OK  (1.080 s)
          TEST tst-threadcomplete.so     OK  (2.770 s)
          TEST tst-tracepoint.so         OK  (1.109 s)
          TEST tst-truncate.so           OK  (1.083 s)
          TEST tst-utimes.so             OK  (1.079 s)
          TEST tst-vblk.so               OK  (1.310 s)
          TEST tst-vfs.so                OK  (1.118 s)
          TEST tst-yield.so              OK  (1.992 s)
          TEST tst-zfs-mount.so          OK  (1.087 s)
        OK (58 tests run, 82.944 s)
      
      Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>