- Jun 11, 2013
Nadav Har'El authored
Sorry, forgot one hunk in "git add -p" :(
Nadav Har'El authored
The way "owner" and "depth" were used in lockfree::mutex was messy. Ideally, neither should be needed if we implemented a non-recursive mutex, but following the design of ::mutex, we (re)used "owner" also as a marker that a thread was waken to have the lock (and it's not a spurious wake). After this patch, owner and depth are used in lockfree::mutex *only* for implementing a recursive mutex, and building a non-recursive mutex should be as simple as dropping these two variables. In more detail: 1. "owner" is no longer used to tell a woken up thread that the wake wasn't spurious. Instead, zero the thread in the wait-record. This is a familiar idiom, which we already used a few times before. 2. "depth" isn't an atomic variable, so it should only be read by the same thread which set it, and this wasn't the case previously. Now, depth is only ever written (set to 1, incremented or decremented) or read by the lock-holding thread - and not the lock releasing thread. 3. "owner" needs to be an atomic variable - a non-lock-holding thread needs to read it and recognize it isn't holding the lock - but it doesn't need any special memory ordering with other variables, so should always be accessed with "relaxed" memory ordering.
Nadav Har'El authored
Fixed a very rare hang in sched::thread::join(): thread::complete() included the following code:

    _status.store(status::terminated);
    if (_joiner) {
        _joiner->wake();
    }

If we are preempted right after setting the status to "terminated", but before calling wake(), this thread will never be scheduled again (it will remain in the terminated status forever), and will never call wake() - so the join()ing thread may just wait forever. I saw this happening in a test case that started and joined millions of threads - eventually one of the join()s hangs. The solution is to enclose the above lines in preempt_disable()/preempt_enable().
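Sketched, the fixed code looks like this:

    preempt_disable();   // make sure we survive long enough to call wake()
    _status.store(status::terminated);
    if (_joiner) {
        _joiner->wake();
    }
    preempt_enable();

With preemption disabled across both statements, the thread cannot be descheduled between publishing the terminated status and waking the joiner.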
Nadav Har'El authored
wake() normally calls schedule(), but doesn't do so if preemption is disabled. So we should set need_reschedule = true, so that schedule() gets called when preemption is later re-enabled.
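A sketch of the resulting logic (simplified):

    void thread::wake() {
        // ...make the thread runnable...
        if (preemptible()) {
            schedule();
        } else {
            need_reschedule = true;   // consumed when preemption is re-enabled
        }
    }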
Avi Kivity authored
Due to the need to handle the x64 red zone, we use a separate stack for exceptions via the IST mechanism. This means that a nested exception will reuse the parent exception's stack, corrupting it. It is usually very hard to figure out the root cause when this happens. Prevent this by setting up a separate stack for nested exceptions, and aborting immediately if a nested exception happens.
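A minimal sketch of the abort-on-nesting idea (illustrative only - the actual implementation sets up a separate IST stack, and the names here are hypothetical):

    // One exception frame is fine; a second one on the same stack would
    // have corrupted the parent frame, so stop immediately.
    __thread unsigned exception_depth;

    extern "C" void exception_entry(exception_frame* ef) {
        if (++exception_depth > 1) {
            abort();   // nested exception
        }
        handle_exception(ef);
        --exception_depth;
    }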
- Jun 10, 2013
Avi Kivity authored
Used for implementing support for indirect functions referenced from shared libraries.
- Jun 09, 2013
Guy Zana authored
Guy Zana authored
Controlling the logger output in tests is simple: it is enough to change the severity level to logger_error, and all logging messages will appear.
Guy Zana authored
Avi Kivity authored
Nadav Har'El authored
Add some extern "C" versions of the lockfree::mutex methods. They will be necessary for providing the lockfree::mutex type to C code - as you'll see in later patches, C code will see an opaque type, a byte array, and will call these functions to operate on it.
Nadav Har'El authored
Do not include <sched.hh> in <lockfree/mutex.hh>. Including <sched.hh> creates annoying dependency loops when we (in a later patch) replace <osv/mutex.h> by <lockfree/mutex.hh>: some header files included by <sched.hh> themselves use mutexes, so they include <osv/mutex.h>. This last include does nothing (because of the include guard), but the compiler has not yet finished reading osv/mutex.h (it was still at its beginning, where it included sched.hh), so the inner-included code lacks the definitions it assumes it gets by including mutex.h.
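To illustrate the loop (simplified):

    // osv/mutex.h
    #ifndef OSV_MUTEX_H
    #define OSV_MUTEX_H
    #include <lockfree/mutex.hh>   // used to pull in <sched.hh>...
    // ...the definitions the rest of the tree expects...
    #endif

    // ...and somewhere inside <sched.hh>'s includes:
    #include <osv/mutex.h>   // guard already defined - expands to nothing,
                             // but mutex.h's definitions don't exist yet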
Nadav Har'El authored
Add to lockfree::mutex the simple owned() and getdepth() methods which existed in ::mutex and were used in a few places, so we need these methods in order to switch from ::mutex to lockfree::mutex.
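Roughly (a sketch consistent with the description above):

    bool lockfree::mutex::owned() const {
        return owner.load(std::memory_order_relaxed) == sched::thread::current();
    }

    unsigned lockfree::mutex::getdepth() const {
        return depth;
    }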
Nadav Har'El authored
When I developed the lockfree mutex, the scheduler, preemption, and related code still had a bunch of bugs, so I resorted to some workarounds that in hindsight look unnecessary, and even wrong. When it seemed that I could only wake() a thread in the waiting state, I made an effort to enter the waiting state (with "wait_guard") before adding the thread to the to-awake queue, and then slept with schedule(). The biggest problem with this approach was that a spurious wake() of this thread, for any reason, would cause us to finish lock() - and fail on an assert that we're the owner of the lock - instead of repeating the wait. When switching to the lockfree mutex, the sunflow benchmark (for example) would die on this assertion failure.

So now I replaced this ugliness with our familiar idiom, wait_until(). The thread is in the running state for some time after entering the queue, so it might be woken while not yet sleeping and the wake() will be ignored - but this is fine, because part of our protocol is that the waker, before waking, also sets "owner" to the to-be-woken thread, and before sleeping we check whether owner isn't already us.

Also changed the comment on "owner" to say that it is not *just* for implementing a recursive mutex, but also necessary for the wakeup protocol.
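A sketch of the resulting slow path in lock() (simplified):

    wait_record wr(sched::thread::current());
    waitqueue.push(&wr);
    // A wake() that arrives while we are still running is ignored, but the
    // waker sets "owner" first, so wait_until() sees the condition already
    // true and doesn't go to sleep at all.
    sched::thread::wait_until([&] {
        return owner.load(std::memory_order_relaxed) == sched::thread::current();
    });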
- Jun 06, 2013
Guy Zana authored
We have to disable virtio interrupts before the MSI-X EOI, so the disabling must be done in ISR handler context. This patch adds an std::function isr to the bindings. References to the rx and tx queues are saved as well (_rx_queue and _tx_queue), so they can be used in the ISR context. This patch reduces virtio-net rx interrupts by a factor of 450.
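Roughly, the binding grows an ISR callback that runs before the EOI (a sketch; the field names are illustrative):

    struct msix_binding {
        std::function<void ()> isr;   // runs in ISR context, before the EOI
        sched::thread* t;             // woken after isr() returns
    };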
Avi Kivity authored
Place the tracepointv type in an anonymous namespace. This makes every translation unit have its own unique tracepoint types, so we only need to ensure uniqueness within a source file. Use the type's type_info to select the correct patch sites. Idea from Nadav.
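A tiny illustration of the mechanism (not the actual tracepoint code):

    #include <typeinfo>

    namespace {
        struct tracepoint_tag {};   // distinct type in every translation unit
    }

    // Two .cc files compiling this same snippet get two different type_info
    // objects, so patch sites can be keyed per source file:
    const std::type_info& my_sites = typeid(tracepoint_tag);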
- Jun 05, 2013
Avi Kivity authored
When a tracepoint is disabled, we want it to have no impact on running code. This patch changes the fast path to be a single 5-byte nop instruction. When a tracepoint is enabled, the nop is patched to a jump instruction to the out-of-line slow path.
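The patching idea, sketched (the byte values are the standard x86 encodings; the bookkeeping around making the page writable is omitted):

    #include <cstdint>
    #include <cstring>

    // 5-byte nop that occupies the call site while the tracepoint is disabled
    static const unsigned char nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

    void disable_site(unsigned char* site) {
        memcpy(site, nop5, sizeof(nop5));
    }

    void enable_site(unsigned char* site, unsigned char* slow_path) {
        int32_t rel = slow_path - (site + 5);   // jmp is relative to the next insn
        site[0] = 0xe9;                         // jmp rel32
        memcpy(site + 1, &rel, sizeof(rel));
    }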
Avi Kivity authored
In order to optimize the fast path of tracepoints, we need to patch the call sites to skip calling the slow path code completely. In turn, that requires that each call site be unique -- a separate function. In the current implementation, tracepoints with the same signature map to the same type. It would have been great to use the name as a discriminant:

    tracepoint<"sched_queue", thread*> trace_sched_queue(...);

but C++ does not support string literals as template arguments. We could do

    const char* trace_sched_queue_name = "sched_queue";
    tracepoint<trace_sched_queue_name, thread*> trace_sched_queue(...);

but that doubles the code for declaring a tracepoint. Add a unique ID instead (and code to verify it is unique).
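With the ID, a declaration might look like this (a hedged sketch, not necessarily the exact syntax introduced by this patch):

    tracepoint<101, thread*> trace_sched_queue("sched_queue", "%p");

where 101 may not be reused by any other tracepoint, and a verification step catches accidental collisions.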
Nadav Har'El authored
Until now, lockfree::mutex functions were entirely inline, which won't fly if we want to make it our default mutex. Change them to be out-of-line, implemented in a new source file core/lfmutex.cc. This has a slight performance impact - uncontended lock/unlock pair used to be 17ns, and is now 22ns. In the future we can get this performance back by making these functions partially inline (the uncontended case inline, the waiting case in a separate function), although we'll need to consider the speed/code-size tradeoff.
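The partial-inline idea mentioned above would look roughly like this (a sketch; try_uncontended_lock() is a hypothetical helper):

    inline void mutex::lock() {
        if (try_uncontended_lock()) {   // fast path stays inline
            return;
        }
        lock_wait();                    // slow path, out of line in lfmutex.cc
    }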
Nadav Har'El authored
When abort()ing on a failed symbol lookup, if this is a C++ symbol, also show its demangled form, which in many cases can be more helpful. Here is an example lookup failure now:

    failed looking up symbol _ZN8lockfree5mutex4lockEv (lockfree::mutex::lock())
    Aborted
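The demangling itself can be done with the C++ ABI helper (a sketch; debug() stands in for whatever printing routine is at hand):

    #include <cxxabi.h>
    #include <cstdlib>

    int status;
    char* demangled = abi::__cxa_demangle(name, nullptr, nullptr, &status);
    if (status == 0) {
        debug("failed looking up symbol %s (%s)\n", name, demangled);
        free(demangled);
    } else {
        debug("failed looking up symbol %s\n", name);
    }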
- Jun 04, 2013
Guy Zana authored
- Jun 02, 2013
Nadav Har'El authored
condvar_wait() wrongly dropped the condvar's internal lock too early, and accessed wr->t outside the lock, meaning that a concurrent wake() could race with it. This bug was exposed in one of the pipe() tests. This patch fixes this bug, by holding the internal lock throughout the execution of condvar_wait(), dropping it temporarily only while waiting.
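Schematically, the fixed flow looks like this (a sketch; the helper names are illustrative):

    wait_record wr(sched::thread::current());
    internal_lock.lock();              // held from here on
    add_waiter(&wr);
    user_mutex->unlock();
    // wait_until() drops the internal lock only while actually sleeping,
    // so wr.t is never touched outside the lock
    sched::thread::wait_until(internal_lock, [&] { return wr.t == nullptr; });
    internal_lock.unlock();
    user_mutex->lock();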
- May 31, 2013
- May 29, 2013
Nadav Har'El authored
Say we also implement librt.so.1. This is required, for example, by the Boost libraries (e.g., libboost_system-mt.so.1.50.0). The librt library isn't actually a separate library on modern Linux - rather, all its traditional functions are now in glibc, and "librt" is merely a filter on glibc. So there is no reason not to say we support librt too. Not to mention that we already implement a bunch of functions that traditionally resided in librt (nanosleep, sched_yield, sem_*, etc.).
- May 27, 2013
Guy Zana authored
The debug() console function takes a lock before it accesses the console driver; it does that by acquiring a mutex, which may sleep. Since we want to be able to debug (and abort) in contexts where it's not possible to sleep, such as in page_fault, a lockless debug print method is introduced. Prior to this patch, any abort on page_fault would cause an "endless" recursive abort() loop which hung the system in a peculiar state.
- May 26, 2013
Nadav Har'El authored
The comment about unlocking the irq_lock was put on the wrong line. Move it (and rephrase it a bit - having the word "release" immediately after a call to an unrelated release() function is confusing).
Nadav Har'El authored
yield() had two bugs - thanks to Avi for pinpointing them:

1. It used runqueue.push_back() to put the thread on the run queue, but push_back() is a low-level function which can only be used if we're sure that the item we're pushing has a vruntime suitable for being *last* on the queue - and the code did nothing to ensure this is the case (we should...). So use insert_equal(), not push_back().

2. It was wrongly divided into two separate sections with interrupts disabled. The scheduler code runs both at interrupt time (e.g., preempt()) and at thread time (e.g., wait(), yield(), etc.), so to guarantee it does not get called in the middle of itself, it needs to disable interrupts while working on the (per-cpu) runqueue. In the broken yield() code, we disabled interrupts while adding the current thread to the run queue, and then again to reschedule. Between those two critical sections, an interrupt could arrive and do something with this thread (e.g., migrate it to another CPU, or preempt it), yet when the interrupt returns, yield() continues to run reschedule_from_interrupt, which assumes that this thread is still running and definitely not on the run queue.

Bug 2 is what caused the crashes in the tst-yield.so test. The fix is to hold interrupts disabled throughout the entire yield(). This is easiest done with lock_guard, which simplifies the flow of this function.
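After the fix, yield() has roughly this shape (a sketch; the lock and queue names follow the description above):

    void thread::yield() {
        auto t = current();
        std::lock_guard<irq_lock_type> guard(irq_lock);   // one critical section
        auto c = cpu::current();
        c->runqueue.insert_equal(*t);     // not push_back(): keep vruntime order
        c->reschedule_from_interrupt();   // still under the same irq disable
    }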
Nadav Har'El authored
Because of Linux's calling convention, it should not be necessary to save the FPU state when a reschedule is caused by a function call. Because we had a bug and forgot to save the FPU state when calling a signal handler, and because this signal handler can cause a reschedule, we had to save the FPU on any reschedule. But after fixing that bug, we no longer need these unnecessary FPU saves. The "sunflow" benchmark still runs well after this patch.
Avi Kivity authored
These are used to support ifunc functions, which are resolved at load time based on CPU features, rather than at link time.
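For background, here is what an ifunc looks like at the source level (a generic GCC example, not OSv code):

    static int add_generic(int a, int b) { return a + b; }
    static int add_fast(int a, int b) { return a + b; }   // e.g. a SIMD version

    extern "C" int (*resolve_add())(int, int) {
        __builtin_cpu_init();   // resolvers may run before normal initializers
        return __builtin_cpu_supports("sse4.2") ? add_fast : add_generic;
    }

    extern "C" int add(int, int) __attribute__((ifunc("resolve_add")));

The dynamic linker calls resolve_add() while processing relocations, and from then on calls to add() go straight to the chosen implementation.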
Avi Kivity authored
If interrupts are disabled, we must not call schedule() even if the preemption counter says we need to, as the context is not preemption safe. This manifested itself in a wake() within a timer causing a schedule(), which re-enabled interrupts, which caused further manipulation of the timer list to occur concurrently with the next interrupt, resulting in corruption. Fixes timer stress test failure.
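A sketch of the resulting guard (the condition names are illustrative):

    void maybe_schedule() {
        // the context is only preemption safe when both preemption and
        // interrupts are enabled
        if (preemptible() && irq_enabled()) {
            schedule();
        } else {
            need_reschedule = true;   // handled later, in a safe context
        }
    }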
- May 24, 2013
Christoph Hellwig authored
- May 22, 2013
Nadav Har'El authored
1. osv::poweroff(), which can turn off a physical machine or, in our case, tell QEMU to quit. The implementation uses ACPI, through the ACPICA library.

2. osv::hang(), which ceases all computation on all cores, but does not turn off the machine. This can be useful if we want QEMU to remain alive for debugging, for example.

The two functions are defined in the new <osv/power.hh> header file, and follow the new API guidelines we discussed today: they are C++-only, and are in the "osv" namespace.
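The new header, in outline (a sketch based on the description above):

    // <osv/power.hh>
    namespace osv {
        void poweroff();   // ACPI power-off; under QEMU, makes the VM quit
        void hang();       // stop all cores, keep the VM up for debugging
    }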
Nadav Har'El authored
Added a timeout parameter to semaphore::wait(), which defaults to no timeout. semaphore::wait() now returns a boolean, just like trywait(), and likewise can return false when the semaphore has not actually been decremented but rather we had a timeout. Because we need the mutex again after the wait, I replaced the "with_lock" mechanism with the better-looking lock_guard and a mutex parameter to wait_until.
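The resulting shape, sketched (the exact parameter type may differ):

    // returns false if the timeout expired before the semaphore could be
    // decremented
    bool semaphore::wait(unsigned units = 1, sched::timer* tmr = nullptr);

    // a typical caller:
    if (!sem.wait(1, &tmr)) {
        // timed out - nothing was decremented
    }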
Nadav Har'El authored
Leak detection (e.g., by running with "--leak") used to have a devastating effect on the performance of the checked program, which, although tolerable (for leak detection, long runs are often unnecessary), was still annoying. While before this patch leak-detection runs were roughly 5 times slower than regular runs, after this patch they are only about 40% slower than a regular run. Read on for the details.

The main reason for this slowness was a simplistic vector which was used to keep the records for currently living allocations. This vector was linearly searched both for free spots (to remember new allocations) and for specific addresses (to forget freed allocations). Because this list often grew to a hundred thousand items, it became incredibly slow and slowed down the whole program. For example, getting a prompt from cli.jar happened in 2 seconds without leak detection, but in 9 seconds with leak detection.

A possible solution would have been to use an O(1) data structure, such as a hash table. This would be complicated by our desire to avoid frequent memory allocation inside the leak detector, or our general desire to avoid complicated stuff in the leak detector, because it always ends up leading to complicated deadlocks :-)

This patch uses a different approach, inspired by an idea by Guy. It still uses an ordinary vector for holding the records, but additionally keeps for each record one "next" pointer which is used for maintaining two separate lists of records:

1. A list of free records. This allows finding a record for a new allocation in O(1) time.

2. A list of filled records, starting with the most-recently-filled record. When we free(), we walk this list and very often finish very quickly, because a malloc() closely followed by a free() is very common. Without this list, we had to walk the whole vector, filled with ancient allocations and even free records, just to find the most recent allocation.

Two examples of the performance with and without this patch:

1. Getting a prompt from cli.jar takes 2 seconds without leak detection, 9 seconds with leak detection before this patch, and 3 seconds with this patch.

2. The "sunflow" benchmark runs 53 ops/second without leak detection, which went down to 10 ops/second with leak detection before this patch, and after this patch - 33 ops/second.

I verified (by commenting out the search algorithm and always using the first item in the vector) that the allocation record search no longer has any effect on performance, so it is no longer interesting to replace this code with an even more efficient hash table. The remaining slowdown is probably due to the backtrace() operation and perhaps also the tracker lock.
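A sketch of the two-list scheme (simplified, using indices instead of pointers):

    #include <vector>

    struct alloc_record {
        void* addr = nullptr;
        int next = -1;    // links the record into the free or the filled list
    };

    struct tracker {
        std::vector<alloc_record> recs;
        int free_head = -1;     // list 1: free records
        int filled_head = -1;   // list 2: filled records, newest first

        void remember(void* addr) {        // O(1), assuming a free record exists
            int i = free_head;
            free_head = recs[i].next;      // pop the free list
            recs[i].addr = addr;
            recs[i].next = filled_head;    // push onto the filled list
            filled_head = i;
        }
        // forget(addr) walks from filled_head; since a free() usually follows
        // its malloc() closely, the walk almost always ends immediately.
    };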
Avi Kivity authored
Intrusive lists are faster since they require no allocations.
Avi Kivity authored
Previously, the mutex was stored using a pointer to avoid overflowing glibc's sem_t. Now that we no longer have this restriction, drop the indirection.
Avi Kivity authored
Use Nadav's idea of iterating over the list and selecting wait records that fit the available units.
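A sketch of the idea (simplified types, not the actual semaphore code; wake() on a wait record stands in for waking the waiting thread):

    #include <list>

    struct waiter {
        unsigned units;    // how many units this waiter asked for
        wait_record* wr;   // handle used to wake the waiting thread
    };

    void post_unlocked(unsigned& val, std::list<waiter>& waiters) {
        for (auto it = waiters.begin(); it != waiters.end(); ) {
            if (it->units <= val) {   // this wait record fits - wake it
                val -= it->units;
                it->wr->wake();
                it = waiters.erase(it);
            } else {
                ++it;                 // too big for what's left; skip it
            }
        }
    }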
Avi Kivity authored
No code changes.
- May 21, 2013
Nadav Har'El authored
In the allocation tracker, not only did I use a dog-slow linear search, I also forgot to stop at the first empty spot, and actually used the last empty spot... Add the missing break, which made leak detection 10% faster. A better implementation would be preferable, but this is low-hanging fruit.
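The fix, in essence (a sketch):

    for (size_t i = 0; i < allocations.size(); ++i) {
        if (!allocations[i].addr) {   // found a free spot
            free_spot = i;
            break;   // previously missing, so the scan kept going and
        }            // ended up picking the last free spot instead
    }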