From 682e57fd3af8decaf230139f58429fa76f9ec5a0 Mon Sep 17 00:00:00 2001
From: Nadav Har'El <nyh@cloudius-systems.com>
Date: Sat, 25 May 2013 18:05:20 +0300
Subject: [PATCH] todo: add todo/mutex

Things we still need to do to use the lockfree mutex
---
 todo/mutex | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 todo/mutex

diff --git a/todo/mutex b/todo/mutex
new file mode 100644
index 000000000..b1f2eb4b5
--- /dev/null
+++ b/todo/mutex
@@ -0,0 +1,68 @@
+Replace spinlock-based mutex by lockfree mutex
+==============================================
+
+<lockfree/mutex.hh> seems functional, but to replace the spinlock-based mutex
+with it, we should do the following:
+
+1. Make the structure smaller
+-----------------------------
+It can be 36 bytes if we also make condvar (which contains a mutex) smaller,
+or if we make it 32 bytes, we don't need to change condvar.
+
+Some ideas on how to make the structure smaller:
+1. Make sequence, handoff, and/or depth 16-bit (in osv/mutex.h depth is
+   already 16-bit).
+2. Make queue_mpsc's two pointers 32-bit. On thread creation, give each
+   thread a 32-bit pointer (or a recycled thread_id - see below - indexing
+   a global array) which can be used instead of putting the wait struct on
+   the stack. Or perhaps we can put all stacks in the low 32 bits?
+3. Do the same to the two pointers in condvar to make condvar smaller too.
+4. Have a new recycled low (32-bit) numeric "threadid" and use it for owner
+   instead of a 64-bit pointer.
+
+2. More testing
+---------------
+
+Write more tests for the lockfree mutex. The most difficult part of the
+algorithm, the "handoff", happens only when the queue is empty, so the
+best chance to see this in action would probably be to test with only two
+pinned threads.
+
+3. Memory ordering
+------------------
+
+Using sequentially consistent memory ordering for all atomic variables is
+definitely not needed, and significantly slows down the mutex. I started
+relaxing the memory ordering, and saw a significant improvement in the
+uncontended case, but I need to complete this work.
+
+4. Benchmark
+------------
+
+Write a benchmark for the uncontended case (done), and for some sort of
+contended case, and compare its performance to the old spinlock and mutex.
+
+5. Clean up the code
+--------------------
+
+Don't put everything in the .h. See how we can move as much as possible to
+the .cc, without hurting performance.
+
+Also make the lockfree mutex usable from C. Think if we can do this with
+the same type, as we did in <osv/mutex.h>. Perhaps we'll need to switch
+from using the atomic<int> type to using just int and global std::atomic
+functions.
+
+6. "Fishy" things to look at again
+----------------------------------
+
+Think about - and *test* - the issue of spurious wake() coming from other
+code. Replace the "lock guard" by an explicit prepare_wait(), and later
+replace the schedule() call by a loop, doing a new prepare_wait() every time
+schedule() returns when we're still not the owner.
+
+Think and test: Write a "half lock" which increases count but doesn't add
+anything to the queue. This causes every lock()/unlock() to use the handoff
+protocol, allowing us to 1. test it. 2. see how much performance drops.
+Consider the interesting theoretical problem: why should an uncompleted, hung
+lock slow down all other lock()/unlock() calls? Can't there be a better way?
-- 
GitLab
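
Note on item 3 (Memory ordering) of the todo above: the sketch below shows,
on a toy lock, what replacing the default sequentially consistent ordering
with acquire/release ordering can look like. It is not the code in
<lockfree/mutex.hh>. The names toy_lock and run_counter_demo are made up for
this illustration, and the real mutex (with its count, queue and handoff
fields) would need each atomic access examined separately.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Toy lock for illustration only (hypothetical, not the lockfree mutex).
// It uses acquire/release instead of the default memory_order_seq_cst.
class toy_lock {
    std::atomic<bool> locked{false};
public:
    bool try_lock() {
        // acquire: accesses inside the critical section cannot be
        // reordered before this exchange.
        return !locked.exchange(true, std::memory_order_acquire);
    }
    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // release: writes made inside the critical section are visible
        // to the next thread whose acquire exchange sees this store.
        locked.store(false, std::memory_order_release);
    }
};

// Hypothetical helper: nthreads threads each increment a plain (non-atomic)
// counter iters times under the lock. Returns the final counter value.
long run_counter_demo(int nthreads, int iters) {
    assert(nthreads > 0 && iters >= 0);
    toy_lock l;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; t++) {
        threads.emplace_back([&] {
            for (int i = 0; i < iters; i++) {
                l.lock();
                counter++;
                l.unlock();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

Calling run_counter_demo(2, 10000) exercises the lock from two threads. If
the acquire/release pairing is sufficient, no increments are lost. Whether a
given access in the real mutex can be relaxed further is exactly the open
work the todo item describes.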