- Dec 11, 2013
-
-
Pekka Enberg authored
Make vma constructors more strongly typed by using the addr_range type. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
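For illustration, a minimal sketch of what such an addr_range type buys (the class shape here is an assumption, not OSv's exact definition):

```cpp
#include <cstdint>

// A [start, end) pair as a value type, so vma constructors can't
// silently swap or misuse two raw uintptr_t arguments.
class addr_range {
public:
    addr_range(std::uintptr_t start, std::uintptr_t end)
        : _start(start), _end(end) {}
    std::uintptr_t start() const { return _start; }
    std::uintptr_t end() const { return _end; }
private:
    std::uintptr_t _start;
    std::uintptr_t _end;
};

class vma {
public:
    explicit vma(addr_range range) : _range(range) {}
private:
    addr_range _range;
};
```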
-
Pekka Enberg authored
Separate the common vma code to an abstract base class that's inherited by anon_vma and file_vma. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
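A minimal sketch of the resulting hierarchy (reusing the addr_range sketch above; method bodies are assumptions):

```cpp
#include <cstdint>

class vma {
public:
    explicit vma(addr_range range) : _range(range) {}
    virtual ~vma() = default;
    virtual void fault(std::uintptr_t addr) = 0;  // subclasses decide
protected:
    addr_range _range;   // common state lives in the base class
};

class anon_vma : public vma {
public:
    using vma::vma;
    void fault(std::uintptr_t addr) override { /* map an anonymous page */ }
};

class file_vma : public vma {
public:
    file_vma(addr_range range, int fd) : vma(range), _fd(fd) {}
    void fault(std::uintptr_t addr) override { /* page in from the file */ }
private:
    int _fd;
};
```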
-
Glauber Costa authored
We have recently seen a problem where an eventual page fault outside the application would occur. I managed to track that down to my huge page failure patch, but wasn't really sure what was going on. Kudos to Raphael, then, who figured out that the problem happened when allocate_intermediate_level was called from split_huge_page. The problem here is that in that case we do *not* enter allocate_intermediate_level with the pte emptied, and we were previously expecting the write of the new pte to happen unconditionally. The compare_exchange broke it, because the exchange doesn't really happen. There are many ways to fix this issue, but the least confusing of them, given that there are other callers to this function that could potentially display this problem, is to do some defensive programming and clearly separate the semantics of both types of callers. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Tested-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
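One way to picture the separation (names and signatures are illustrative, not OSv's actual internals):

```cpp
#include <atomic>
#include <cstdint>

using pt_element = std::uintptr_t;

pt_element make_intermediate_level();     // assumed: allocate a page-table page
void free_intermediate_level(pt_element);

// For callers that found the entry empty; a concurrent filler may win.
void allocate_intermediate_level(std::atomic<pt_element>* pte) {
    pt_element expected = 0;
    pt_element fresh = make_intermediate_level();
    if (!pte->compare_exchange_strong(expected, fresh)) {
        free_intermediate_level(fresh);   // someone else got there first
    }
}

// For callers like split_huge_page that enter holding a live entry
// and want it replaced unconditionally.
void replace_intermediate_level(std::atomic<pt_element>* pte, pt_element fresh) {
    pte->store(fresh);
}
```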
-
Nadav Har'El authored
Once page_fault() checks that this is not a fast fixup (see safe_load()), we reach the page-fault slow path, which needs to allocate memory or even read from disk, and might sleep. If we ever get such a slow page-fault inside kernel code which has preemption or interrupts disabled, this is a serious bug, because the code in question thinks it cannot sleep. So this patch adds two assertions to verify this. The preemptable() assertion is easily triggered if stacks are demand-paged as explained in commit 41efdc1c (I have a patch to solve this, but it won't fit in the margin). However, I've also seen this assertion without demand-paged stacks, when running all tests together through testrunner.so. So I'm hoping these assertions will be helpful in hunting down some elusive bugs we still have. This patch adds a third use of the "0x200" constant (the ninth bit of the rflags register is the interrupt flag), so it replaces them with a new symbolic name, processor::rflags_if. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
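A condensed sketch of the two assertions and the named constant (spellings follow the commit; the stub declarations stand in for OSv's real headers):

```cpp
#include <cassert>

struct exception_frame { unsigned long rflags; /* ... */ };  // stub for the sketch
namespace sched { bool preemptable(); }

namespace processor {
// Bit 9 of RFLAGS is the interrupt-enable flag; naming it replaces the
// three scattered uses of the raw 0x200 constant.
constexpr unsigned long rflags_if = 0x200;
}

void page_fault_slow_path(exception_frame* ef) {
    // The slow path may allocate memory or even read from disk, so the
    // faulting context must have been allowed to sleep:
    assert(sched::preemptable());
    assert(ef->rflags & processor::rflags_if);
    // ... resolve the fault ...
}
```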
-
Glauber Costa authored
We currently stop propagating the exception frame partway down the vma_fault path. There is no reason not to propagate it further, aside from the fact that there are currently no users. Besides presenting more consistent frame passing, I intend to use it for the JVM balloon. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
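The plumbing change, sketched (signatures are illustrative):

```cpp
#include <cstdint>

struct exception_frame;   // architecture-specific register snapshot

class vma {
public:
    // The frame now travels the whole way down instead of being
    // dropped partway; the JVM balloon will be its first consumer.
    virtual void fault(std::uintptr_t addr, exception_frame* ef) = 0;
};

// the arch page-fault handler ends up calling:
//     vma->fault(addr, ef);
```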
-
- Dec 10, 2013
-
-
Nadav Har'El authored
This patch fixes two bugs in shared-object finalization, i.e., running its static destructors before it is unloaded. The bugs were seen when osv::run()ing a test program using libboost_unit_test_framework-mt.so, which crashed after the test program finished. The two related bugs were: 1. We need to call the module's destructors (run_fini_funcs()) *before* removing it from the module list, otherwise the destructors will not be able to call functions from this module! (we got a symbol not found error in the destructor). 2. We need to unload the modules needed by this module *before* unloading this module, not after, as was (implicitly) done until now. This makes sense because of symmetry (during a module load, the needed modules are loaded after this one), but also practically: a needed module's destructor (in our case, boost unit test framework) might refer to objects in the needing module (in our case, the test program), so we cannot call the needed module's destructor after we've already unloaded the needing module. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
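A sketch of the corrected teardown order (member names are illustrative):

```cpp
void program::unload(object& module) {
    module.run_fini_funcs();          // 1. static destructors first, while the
                                      //    module is still in the list and can
                                      //    resolve its own symbols
    remove_from_module_list(module);  // 2. only now hide it from lookup
    for (auto& dep : module.needed()) {
        unload_needed(dep);           // 3. tear down needed modules while this
                                      //    module's memory is still mapped
    }
    free_module(module);              // 4. finally release this module
}
```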
-
Avi Kivity authored
As ref() is now never called, we can remove the reference counter and make unref() unconditional. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
One problem with wake() is that, if the thread it is waking can concurrently exit, it may touch freed memory belonging to the thread structure. Fix by separating the state that wake() touches into a detached_state structure, and free that using rcu. Add a thread_handle class that references only this detached state, and accesses it via rcu. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
rcu_read_lock disables preemption, but this is an implementation detail and users should not make use of it. Add preempt_lock_in_rcu that takes advantage of the implementation detail and does nothing, but allows users to explicitly disable preemption. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
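A sketch of such a no-op lock type (shape assumed; the idea only requires something exposing lock()/unlock()):

```cpp
// Inside an rcu read-side critical section preemption is already
// disabled, so this "lock" can do nothing at all; what it adds is an
// explicit statement of intent at the call site.
struct preempt_lock_in_rcu_t {
    void lock() {}     // no-op: rcu_read_lock already disabled preemption
    void unlock() {}   // no-op
};
inline preempt_lock_in_rcu_t preempt_lock_in_rcu;

// Usage (assuming OSv's WITH_LOCK macro):
// WITH_LOCK(rcu_read_lock) {
//     WITH_LOCK(preempt_lock_in_rcu) {
//         // code that relies on preemption being off
//     }
// }
```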
-
Glauber Costa authored
When this flag is set, pages faulted in should not be filled with zeroes or any other pattern; rather, they should just be left in whatever state we find them. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
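The gist as a sketch (the flag and helper names here are assumptions):

```cpp
#include <cstring>

constexpr unsigned mmap_uninitialized = 0x1000;  // assumed bit value
constexpr std::size_t page_size = 4096;

void fill_anon_page(void* page, unsigned mmap_flags) {
    if (!(mmap_flags & mmap_uninitialized)) {
        std::memset(page, 0, page_size);   // default: zero-fill
    }
    // otherwise: leave the page in whatever state we found it
}
```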
-
- Dec 09, 2013
-
-
Glauber Costa authored
Addressing that FIXME, as part of my memory reclamation series; this part is ready to go already. The goal is to retry serving the allocation if a huge page allocation fails, filling the range with 4k pages instead. The simplest and most robust way I've found to do that was to propagate the error up until we reach operate(). Once there, all we need to do is re-walk the range with 4k pages instead of 2MB. We could theoretically just bail out on huge pages and move hp_end, but, especially once we have reclaim, it is likely that one operation will fail while upcoming ones may succeed. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> [ penberg: s/NULL/nullptr/ ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
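A sketch of the retry flow (helper names are hypothetical; the real patch threads the error code up through the page-table walk):

```cpp
#include <cstdint>

bool map_range_huge(std::uintptr_t start, std::uintptr_t end);   // may fail
void map_range_small(std::uintptr_t start, std::uintptr_t end);  // 4k always works

void operate(std::uintptr_t start, std::uintptr_t end) {
    if (!map_range_huge(start, end)) {
        // A 2MB allocation failed somewhere in the range. Rather than
        // permanently giving up on huge pages (e.g., by moving hp_end),
        // redo just this range with 4k pages; once reclaim exists, a
        // later operation's huge-page allocation may well succeed.
        map_range_small(start, end);
    }
}
```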
-
- Dec 08, 2013
-
-
Glauber Costa authored
I needed to call detach in some test code of mine, and it isn't implemented yet. The code I wrote to use it may or may not stay in the end but, nevertheless, let's implement it. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
set_cleanup is quite a complicated piece of code. It is very easy to get it to race with other thread-destruction sites, which was made abundantly clear when we tried to implement pthread detach. This patch tries to make it easier by restricting how and when set_cleanup can be called. The trick here is that currently a thread may or may not have a cleanup function, and through a call to set_cleanup, our decision to clean up may change. From this point on, set_cleanup will only tell us *how* to clean up. If and when is a decision that we will make ourselves. For instance, if a thread is block-local, the destructor will be called at the end of the block. In that case, the _cleanup function will be there anyhow; we'll just not call it. We're setting here a default cleanup function for all created threads, which just deletes the current thread object. Anything coming from pthread will override it by also deleting the pthread object. And again, it is important to note that they will set up those cleanup functions unconditionally. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
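Sketch of the restricted contract (simplified; bodies are assumptions): the constructor installs a default "delete myself" cleanup, and set_cleanup may only swap in a different *how* before the thread could possibly exit.

```cpp
#include <cassert>
#include <functional>

class thread {
public:
    thread() : _cleanup([this] { delete this; }) {}   // default for all threads
    void set_cleanup(std::function<void()> cleanup) {
        assert(!_started);    // too late once the thread can exit
        _cleanup = cleanup;   // e.g. pthread: also delete the pthread object
    }
private:
    std::function<void()> _cleanup;  // how to clean up; if and when is ours
    bool _started = false;
};
```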
-
Glauber Costa authored
Linux uses a 32-bit integer for pid_t, so let's do that as well. This is because there are functions in which we have to return our id back to the application. One example is gettid, which we already have in the tree. Theoretically, we could come up with a mapping between our 64-bit ids and the Linux ones, but since we have to maintain the mapping anyway, we might as well just use the Linux pids as our default IDs. That caps them at 32 bits, which is not enough if we're just allocating pids by bumping a counter; but again, since we will have to maintain the bitmaps anyway, 32 bits will allow us as many as 4 billion PIDs. avi: remove unneeded #include Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
Right now we are taking a clock measurement very early for cpu initialization. That forces an unnecessary dependency between the sched and clock initializations. Since that clock reading is used to determine for how long the cpu has been running, we can initialize the runtime later, when we init the idle thread. Nothing should be running before it. After doing this, we can move the sched initialization a bit earlier. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 06, 2013
-
-
Asias He authored
Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
The exact location of the stack end is not needed by java, so move this variable back to restore the state to what it was before the mkfs.so/cpiod.so split. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 05, 2013
-
-
Avi Kivity authored
Some objects have DTPMOD64 relocations with the null symbol, presumably to set the value to 0 (it is too much trouble to write the zero into the file during the link phase, apparently). Detect this condition and write the zero. Needed by JDK8. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
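Sketch of the special case (loop structure typical of an ELF relocator, simplified from the real code):

```cpp
#include <elf.h>
#include <cstdint>

void relocate_one(char* base, const Elf64_Rela& r) {
    auto addr = reinterpret_cast<std::uint64_t*>(base + r.r_offset);
    switch (ELF64_R_TYPE(r.r_info)) {
    case R_X86_64_DTPMOD64:
        if (ELF64_R_SYM(r.r_info) == 0) {
            // Null symbol: the linker left it to us to store the zero.
            *addr = 0;
        } else {
            // normal case: store the TLS module id of the symbol ...
        }
        break;
    // ... other relocation types ...
    }
}
```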
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
A list can be slow to search for an element if we have many threads. Even under normal load, the number of threads we spawn may not qualify as huge, but it is not tiny either. Change it to a map so we can implement functions that operate on a given thread without that much overhead: O(1) for the common case. Note that ideally we would use an unordered_set, which doesn't require an extra key. However, that would also mean that the key is implicit and set to be of type key_type&. Threads are not very lightweight to create for search purposes, so we go for an id-as-key approach. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
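The container choice, sketched (names are illustrative): an unordered_map keyed by thread id gives O(1) lookup without constructing a throwaway thread just to search.

```cpp
#include <unordered_map>

namespace sched { class thread; }

// keyed by thread id, so lookup needs no heavyweight key object
std::unordered_map<unsigned long, sched::thread*> thread_map;

sched::thread* find_thread_by_id(unsigned long id) {
    auto it = thread_map.find(id);
    return it == thread_map.end() ? nullptr : it->second;
}
```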
-
Benoît Canet authored
This restores the original behavior of osv::run, in place before the mkfs.so and cpiod.so split committed a day ago. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
-
-
Nadav Har'El authored
When source code is compiled with -D_FORTIFY_SOURCE on Linux, various functions are sometimes replaced by __*_chk variants (e.g., __strcpy_chk) which can help avoid buffer overflows when the compiler knows the buffer's size during compilation. If we want to run source compiled on Linux with -D_FORTIFY_SOURCE (either deliberately or unintentionally - see issue #111), we need to implement these functions; otherwise the program will crash because of a missing symbol. We already implement a bunch of _chk functions, but we are definitely missing some more. This patch implements 6 more _chk functions which are needed to run the "rogue" program (mentioned in issue #111) when compiled with -D_FORTIFY_SOURCE=1. Following the philosophy of our existing *_chk functions, we do not aim for either ultimate performance or iron-clad security in our implementation of these functions. If this becomes important, we should revisit all our *_chk functions. When compiled with -D_FORTIFY_SOURCE=2, rogue still doesn't work, but not because of a missing symbol; rather, it fails reading the terminfo file for an as-yet-unknown reason (a patch for that issue will be sent separately). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
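A representative _chk function in the same spirit (illustrative, not the commit's verbatim code): abort if the copy would overflow the buffer whose size the compiler knew, then fall through to the plain call.

```cpp
#include <cstdlib>
#include <cstring>

extern "C" char* __strcpy_chk(char* dest, const char* src, std::size_t destlen) {
    if (std::strlen(src) + 1 > destlen) {
        std::abort();   // buffer overflow caught by -D_FORTIFY_SOURCE
    }
    return std::strcpy(dest, src);
}
```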
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 03, 2013
-
-
Raphael S. Carvalho authored
Besides simplifying the mmu::map_file interface, let's make it more similar to mmu::map_anon. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
A ';' at the end of a parameter marks the end of a program's argument list. The goal of this patch is to be able to split mkfs.so into two parts, mkfs.so and cpiod.so. The patch uses a full Spirit parser to escape "" and split commands around ';'. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
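The patch itself uses a Boost Spirit grammar (including the "" escaping); the sketch below only illustrates the intended ';' semantics, with quoting left out:

```cpp
#include <sstream>
#include <string>
#include <vector>

// One command line in, several argument vectors out; a ';' (alone or
// at the end of a parameter) closes the current program's arguments.
std::vector<std::vector<std::string>> split_commands(const std::string& line) {
    std::vector<std::vector<std::string>> cmds(1);
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) {
        bool ends_cmd = !tok.empty() && tok.back() == ';';
        if (ends_cmd) {
            tok.pop_back();
        }
        if (!tok.empty()) {
            cmds.back().push_back(tok);
        }
        if (ends_cmd) {
            cmds.emplace_back();
        }
    }
    return cmds;
}
// split_commands("/tools/mkfs.so ; /tools/cpiod.so") yields two commands.
```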
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 01, 2013
-
-
Nadav Har'El authored
Before this patch, OSv crashes or continuously reboots when given unknown command line parameters, e.g., scripts/run.py -c1 -e "--help --z a" With this patch, it says, as expected, that the "--z" option is not recognized, and displays the list of known options:

unrecognised option '--z'
OSv options:
  --help                show help text
  --trace arg           tracepoints to enable
  --trace-backtrace     log backtraces in the tracepoint log
  --leak                start leak detector after boot
  --nomount             don't mount the file system
  --noshutdown          continue running after main() returns
  --env arg             set Unix-like environment variable (putenv())
  --cwd arg             set current working directory
Aborted

The problem was that to parse the command line options, we used Boost, which throws an exception when an unrecognized option is seen. We need to catch this exception, and show a message accordingly. But before this patch, C++ exceptions did not work correctly during this stage of the boot process, because exceptions use elf::program(), and we only set it up later. So this patch moves the setup of the elf::program() object earlier in the boot, to the beginning of main_cont(). Now we'll be able to use C++ exceptions throughout main_cont(), not just in command line parsing. This patch also removes the unused "filesystem" parameter of elf::program(), rather than move the initialization of this empty object as well. Fixes #103. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
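The catch itself is standard boost::program_options usage, sketched here (ac/av/desc stand in for the surrounding parsing code); it only became possible once elf::program() was set up early enough for exceptions to work:

```cpp
#include <boost/program_options.hpp>
#include <cstdlib>
#include <iostream>

namespace po = boost::program_options;

void parse_options(int ac, char** av, const po::options_description& desc) {
    po::variables_map vars;
    try {
        po::store(po::parse_command_line(ac, av, desc), vars);
    } catch (po::error& e) {
        std::cout << e.what() << "\n" << desc << "\n";
        std::abort();   // "Aborted", as in the output quoted above
    }
    po::notify(vars);
}
```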
-
- Nov 28, 2013
-
-
Nadav Har'El authored
Due to an unknown bug (a bug in our HPET driver?), OSv crashed on our build machine with an assertion that the clock only goes forward. The clock jumping backward is a bad sign, but it's not really necessary to crash the VM when it happens, assuming it happens only rarely. This patch makes the scheduler handle time<0 just like time==0. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 27, 2013
-
-
Nadav Har'El authored
Fix a couple of spelling mistakes in core/lfmutex.cc Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Nov 26, 2013
-
-
Nadav Har'El authored
This patch replaces the algorithm which the scheduler uses to keep track of threads' runtime, and to choose which thread to run next and for how long. The previous algorithm used the raw cumulative runtime of a thread as its runtime measure. But comparing these numbers directly was impossible: e.g., should a thread that slept for an hour now get an hour of uninterrupted CPU time? This resulted in a hodgepodge of heuristics which "modified" and "fixed" the runtime. These heuristics did work quite well in our test cases, but we were forced to add more and more unjustified heuristics and constants to fix scheduling bugs as they were discovered. The existing scheduler was especially problematic with thread migration (moving a thread from one CPU to another), as the runtime measure on one CPU was meaningless on another. This bug, if not corrected (e.g., by the patch which I sent a month ago), can cause crucial threads to acquire exceedingly high runtimes by mistake, and resulted in the tst-loadbalance test using only one CPU on a two-CPU guest. The new scheduling algorithm follows a much more rigorous design, proposed by Avi Kivity in: https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing To make a long story short (read the document if you want all the details), the new algorithm is based on a runtime measure R which is the running decaying average of the thread's running time. It is a decaying average in the sense that the thread's act of running or sleeping in recent history is given more weight than its behavior a long time ago. This measure R can tell us which of the runnable threads to run next (the one with the lowest R), and using some high-school-level mathematics, we can calculate for how long to run this thread until it should be preempted by the next one. R carries the same meaning on all CPUs, so CPU migration becomes trivial. The actual implementation uses a normalized version of R, called R'' (Rtt in the code), which is also explained in detail in the document. This Rtt allows updating just the running thread's runtime - not all threads' runtimes - as time passes, making the whole calculation much more tractable. The benefits of the new scheduler code over the existing one are:
1. A more rigorous design with fewer unjustified heuristics.
2. A thread's runtime measurement correctly survives a migration to a different CPU, unlike the existing code (which sometimes botches it up, leading to threads hanging). In particular, tst-loadbalance now gives good results for the "intermittent thread" test, unlike the previous code which in 50% of the runs caused one CPU to be completely wasted (when the load-balancing thread hung).
3. The new algorithm can look at a much longer runtime history than the previous algorithm did. With the default tau=200ms, the one-cpu intermittent thread test of tst-scheduler now provides good fairness for sleep durations of 1ms-32ms. The previous algorithm was never fair in any of those tests.
4. The new algorithm is more deterministic in its use of timers (with thyst=2_ms: up to 500 timers a second), resulting in less varied performance in high-context-switch benchmarks like tst-ctxsw.
This scheduler does very well on the fairness tests tst-scheduler and fairly well on tst-loadbalance. Even better performance on that second test will require an additional patch for the idle thread to wake other cpus' load-balancing threads.
As expected, the new scheduler is somewhat slower than the existing one (as we now do some relatively complex calculations instead of trivial integer operations), but thanks to using approximations when possible and to various other optimizations, the difference is relatively small: on my laptop, tst-ctxsw.so, which measures "context switch" time (actually, also including the time to use the mutex and condvar which this test uses to cause context switching), measured 355 ns with the old scheduler and 382 ns with the new scheduler on the "colocated" test, meaning that the new scheduler adds 27 ns of overhead to every context switch. To see that this penalty is minor, consider that tst-ctxsw is an extreme example, doing 3 million context switches a second, and even there it only slows down the workload by 7%. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
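For intuition, a toy model of such a decaying-average runtime measure (my reading of the design sketched above, not the kernel's actual arithmetic, which works with the normalized Rtt form):

```cpp
#include <cmath>

struct runtime_measure {
    double r = 0;                        // decaying average of running time
    static constexpr double tau = 0.2;   // 200ms history window

    // Thread ran for dt seconds: old history decays, new runtime accrues.
    void account_running(double dt) {
        double decay = std::exp(-dt / tau);
        r = r * decay + tau * (1 - decay);   // tends to tau if always running
    }
    // Thread slept for dt seconds: its measure just decays toward zero.
    void account_sleeping(double dt) {
        r *= std::exp(-dt / tau);
    }
};
// The runnable thread with the lowest r runs next; r means the same
// thing on every CPU, which is what makes migration trivial.
```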
-
Nadav Har'El authored
The schedule() and cpu::schedule() functions had a "yield" parameter. This parameter was inconsistently used (it's not clear why specific places called it with "true" and others with "false"), but moreover, it was always ignored! So this patch removes the parameter of schedule(). If you really want a yield, call yield(), not schedule(). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The idle thread cpu::idle() waits for other threads to become runnable, and then lets them run. It used to yield the CPU by calling yield(), because in early OSv history we didn't have an idle priority, so simply calling schedule() would not guarantee that the new thread, rather than the idle thread, would run. But now we actually do have an idle priority; if the run queue is not empty, we are sure that calling schedule() will run another thread, not the idle thread. So this patch calls schedule(), which is simpler, faster, and more reliable than yield(). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The scheduler (reschedule_from_interrupt()) changes the runtime of the current thread. This assumes that the current thread is not in the runqueue, because the runqueue is sorted by runtime, and modifying the runtime of a thread which is already in the runqueue ruins the sorted tree's invariants. Unfortunately, the existing code broke this assumption in two places: 1. When handle_incoming_wakeups() wakes up the current thread (i.e., a thread that prepared to wait but was woken before it could go to sleep), the current thread was queued. We need instead to simply return the thread to the "running" state. 2. yield() queued the current thread. Rather, it needs to just change its runtime, and reschedule_from_interrupt() will decide to queue this thread. This patch fixes the first problem. The second problem will be solved by a yield() rewrite which is part of the new scheduler in a later patch. By the way, after we fix both problems, we can also be sure that the strange if (n != thread::current()) in the scheduler is always true. This is because n, picked up from the run queue, could never be the current thread, because the current thread isn't in the run queue. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
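The shape of the fix, as a sketch (helper names are hypothetical):

```cpp
namespace sched { class thread; thread* current(); }
void runqueue_insert(sched::thread&);   // runqueue is sorted by runtime
void set_running(sched::thread&);

void deliver_wakeup(sched::thread& t) {
    if (&t == sched::current()) {
        set_running(t);      // woken before it ever slept: just keep
                             // running; never queue the current thread
    } else {
        runqueue_insert(t);  // normal case
    }
}
```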
-
- Nov 25, 2013
-
-
Pekka Enberg authored
Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Switch to demand paging for anonymous virtual memory. I used SPECjvm2008 to verify performance impact. The numbers are mostly the same, with a few exceptions, most visible in the 'serial' benchmark. However, there's quite a lot of variance between SPECjvm2008 runs, so I wouldn't read too much into them. As we need the demand paging mechanism and the performance numbers suggest that the implementation is reasonable, I'd merge the patch as-is and optimize it later.

Before (running SPECjvm2008 benchmarks on an OSv guest):
Score on compiler.compiler: 331.23 ops/m
Score on compiler.sunflow: 131.87 ops/m
Score on compress: 118.33 ops/m
Score on crypto.aes: 41.34 ops/m
Score on crypto.rsa: 204.12 ops/m
Score on crypto.signverify: 196.49 ops/m
Score on derby: 170.12 ops/m
Score on mpegaudio: 70.37 ops/m
Score on scimark.fft.large: 36.68 ops/m
Score on scimark.lu.large: 13.43 ops/m
Score on scimark.sor.large: 22.29 ops/m
Score on scimark.sparse.large: 29.35 ops/m
Score on scimark.fft.small: 195.19 ops/m
Score on scimark.lu.small: 233.95 ops/m
Score on scimark.sor.small: 90.86 ops/m
Score on scimark.sparse.small: 64.11 ops/m
Score on scimark.monte_carlo: 145.44 ops/m
Score on serial: 94.95 ops/m
Score on sunflow: 73.24 ops/m
Score on xml.transform: 207.82 ops/m
Score on xml.validation: 343.59 ops/m

After:
Score on compiler.compiler: 346.78 ops/m
Score on compiler.sunflow: 132.58 ops/m
Score on compress: 116.05 ops/m
Score on crypto.aes: 40.26 ops/m
Score on crypto.rsa: 206.67 ops/m
Score on crypto.signverify: 194.47 ops/m
Score on derby: 175.22 ops/m
Score on mpegaudio: 76.18 ops/m
Score on scimark.fft.large: 34.34 ops/m
Score on scimark.lu.large: 15.00 ops/m
Score on scimark.sor.large: 24.80 ops/m
Score on scimark.sparse.large: 33.10 ops/m
Score on scimark.fft.small: 168.67 ops/m
Score on scimark.lu.small: 236.14 ops/m
Score on scimark.sor.small: 110.77 ops/m
Score on scimark.sparse.small: 121.29 ops/m
Score on scimark.monte_carlo: 146.03 ops/m
Score on serial: 87.03 ops/m
Score on sunflow: 77.33 ops/m
Score on xml.transform: 205.73 ops/m
Score on xml.validation: 351.97 ops/m

Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Use optimistic locking in populate() to make it robust against concurrent page faults. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
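The optimistic pattern, sketched (helper names are hypothetical): build the new pte, publish it with a compare-exchange, and back off if a concurrent fault won the race.

```cpp
#include <atomic>
#include <cstdint>

using pt_element = std::uintptr_t;

pt_element make_leaf_pte();      // assumed: allocate and format a page
void free_pte_page(pt_element);

bool populate_pte(std::atomic<pt_element>* ptep) {
    pt_element expected = 0;     // only fill an empty slot
    pt_element fresh = make_leaf_pte();
    if (!ptep->compare_exchange_strong(expected, fresh)) {
        free_pte_page(fresh);    // lost the race: the other mapping stands
        return false;
    }
    return true;
}
```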
-
Pekka Enberg authored
Add permission flags to VMAs. They will be used by mprotect() and the page fault handler. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
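The flag encoding might look like this (bit values are assumptions, in the style of the usual rwx permission bits):

```cpp
enum {
    perm_read  = 1,
    perm_write = 2,
    perm_exec  = 4,
    perm_rwx   = perm_read | perm_write | perm_exec,
};

class vma {
public:
    unsigned perm() const { return _perm; }
    // mprotect() will call this; the page fault handler will compare
    // the faulting access against these bits.
    void set_perm(unsigned perm) { _perm = perm; }
private:
    unsigned _perm = perm_rwx;
};
```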
-
Avi Kivity authored
We iterate over the timer list using an iterator, but the timer list can change during iteration due to timers being re-inserted. Switch to just looking at the head of the list instead, maintaining no state across loop iterations. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Tested-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
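A sketch of that loop shape (container and type names assumed; not the verbatim code):

```cpp
// Expire timers by repeatedly consuming the head of the list: no
// iterator is held across a callback, so a timer that re-arms and
// re-inserts itself during expiration cannot invalidate our position.
void timer_list_fired(timer_set& timers, time_point now) {
    while (!timers.empty() && timers.begin()->expires() <= now) {
        auto& t = *timers.begin();
        timers.erase(timers.begin());
        t.expire();   // may re-insert t; the next head check handles it
    }
}
```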
-
Avi Kivity authored
When a hardware timer fires, we walk over the timer list, expiring timers and erasing them from the list. This is all well and good, except that a timer may rearm itself in its callback (this only holds for timer_base clients, not sched::timer, which consumes its own callback). If it does, we end up erasing it even though it wants to be triggered. Fix by checking for the armed state before erasing. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Tested-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
When a condvar's timeout and wakeup race, we wait for the concurrent wakeup to complete, so it won't crash. We did this wr.wait() with the condvar's internal mutex (m) locked, which was fine when this code was written; but now that we have wait morphing, wr.wait() waits not just for the wakeup to complete, but also for the user_mutex to become available. With m locked and us waiting for user_mutex, we're now in deadlock territory, because a common idiom of using a condvar is to take the locks in the opposite order: lock user_mutex first and then use the condvar, which locks m. I can't think of an easy way to actually demonstrate this deadlock, short of having a locked condvar_wait timeout racing with condvar_wake_one and then an additional locked condvar operation coming in concurrently, and I don't have a test case demonstrating this. I am hoping it will fix the lockups that Pekka is seeing in his Cassandra tests (which are the reason I looked for possible condvar deadlocks in the first place). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Tested-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-