"README.md" did not exist on "a8af5dde165c153745ff16984095624aec7ae360"
- May 23, 2013
-
-
Avi Kivity authored
-
Avi Kivity authored
Context switch tests and related optimizations. Conflicts: bootfs.manifest build.mak
-
Avi Kivity authored
kvmclock changes always come from the same cpu, so a real memory barrier is not needed. Replace with a compiler barrier.
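The idea above can be sketched as a pvclock-style version/read loop in which the fences are demoted to compiler barriers (this is a simplified illustration; the struct and function names are hypothetical, not OSv's actual kvmclock code):

```cpp
#include <cstdint>

// Hypothetical pvclock-like shared structure: the hypervisor bumps
// "version" around updates, but always from the same cpu, so only the
// compiler must be kept from reordering the reads.
struct pvclock_like {
    volatile uint32_t version;
    volatile uint64_t system_time;
};

inline void barrier() {
    // Compiler barrier only: blocks compile-time reordering of loads
    // and stores across this point, but emits no fence instruction.
    asm volatile("" ::: "memory");
}

uint64_t read_time(const pvclock_like* p) {
    uint32_t v;
    uint64_t t;
    do {
        v = p->version;
        barrier();              // read the payload only after the version
        t = p->system_time;
        barrier();              // re-check the version only after the payload
    } while (v != p->version || (v & 1));  // odd version = update in progress
    return t;
}
```

A real memory barrier (`mfence` or an atomic with acquire semantics) would only be needed if the writer could be a different CPU.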
-
Avi Kivity authored
-
Avi Kivity authored
If there's nothing in the cpu_set (which is fairly common), there's no need to use an atomic operation.
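The fast path can be sketched like this (names and layout are illustrative, not OSv's actual cpu_set): a plain relaxed load screens out the common empty case before paying for the atomic read-modify-write.

```cpp
#include <atomic>

// Sketch of a cpu_set bitmap with a cheap empty check before the
// atomic exchange (hypothetical simplification of the idea above).
class cpu_set {
public:
    void set(unsigned cpu) {
        _bits.fetch_or(1ul << cpu, std::memory_order_relaxed);
    }
    // Returns and clears the currently set bits, but skips the atomic
    // RMW entirely when the set is observed to be empty.
    unsigned long fetch_clear() {
        if (_bits.load(std::memory_order_relaxed) == 0)
            return 0;                               // common fast path
        return _bits.exchange(0, std::memory_order_acq_rel);
    }
private:
    std::atomic<unsigned long> _bits{0};
};
```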
-
Avi Kivity authored
Builds on OSv and Linux. Tests context switch performance:
- between threads co-located on the same cpu
- between threads on different cpus
- between threads placed by the scheduler policy
-
Nadav Har'El authored
Avi recently added extern "C" to pwrite64 (after the header change apparently removed this declaration from the header file). Also do this to pread64 - otherwise the "derby" benchmark (from SPECjvm2008) cannot run.
-
- May 22, 2013
-
-
Nadav Har'El authored
The following test currently frequently crashes - with an abort or assertion failure. It's a very simple test, where 10 threads do an endless yield() loop. While yield() itself is not very important - and doesn't even implement the promise of sched_yield(2) to move the thread to the end of the run queue - this test failure may be the sign of a scheduler bug that needs to be fixed.
-
Avi Kivity authored
Allows cross-platform code where the APIs aren't the same.
-
Nadav Har'El authored
Call the new poweroff() function after the payload finishes running. Makes sense on a cloud (why would you want to pay the provider after your workload is done?) as well as for our benchmarking, where we want qemu to exit after running the benchmark. When the "--leak" option is used, instead of calling poweroff() we call hang(), so that QEMU continues to run and we can attach the debugger to run "osv leak show". Note that before this patch, if the payload spawned threads, they could continue running after the payload's main() returned. This is no longer the case - after main() returns, the entire virtual machine is shut down (or just hung). This is reasonable behavior, though: if the payload needs some threads to continue running, it should join() them before returning. The behavior on Linux (and Posix threads in general) is identical to our new behavior: when main() of a multithreaded program returns, all threads are killed.
-
Nadav Har'El authored
abort() did the same thing as the new osv::hang(), so let's just use osv::hang(). Note that it's important that osv::hang() doesn't print anything - abort() does print, and skipping the printing avoids the infinite recursion that can happen when abort()'s printing itself causes a crash, and another abort().
-
Nadav Har'El authored
1. osv::poweroff(), which can turn off a physical machine or, in our case, tell QEMU to quit. The implementation uses ACPI, through the ACPICA library. 2. osv::hang(), which ceases all computation on all cores, but does not turn off the machine. This can be useful if we want QEMU to remain alive for debugging, for example. The two functions are defined in the new <osv/power.hh> header file, and follow the new API guidelines we discussed today: they are C++-only, and are in the "osv" namespace.
-
Nadav Har'El authored
Implement some missing functions in drivers/acpi.cc, which an OS that uses the ACPICA library needs to implement, to enable the use of semaphores and locks. These functions get called from ACPICA functions for entering sleep state - and in particular for powering off - which we will use in the next patch. This patch includes no new implementation - the semaphore implementation was already committed earlier, and here it is just used.
-
Nadav Har'El authored
Added a timeout parameter to semaphore::wait(), which defaults to no timeout. semaphore::wait() now returns a boolean, just like trywait(), and likewise can return false when the semaphore has not actually been decremented but rather we had a timeout. Because we need the mutex again after the wait, I replaced the "with_lock" mechanism by the better-looking lock_guard and mutex parameter to wait_until.
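The API change described above can be sketched with standard C++ primitives standing in for OSv's own mutex and wait machinery (a minimal illustration, not OSv's actual implementation; a very large default duration stands in for "no timeout"):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Sketch: wait() takes an optional timeout and, like trywait(),
// returns false when it gave up without decrementing the count.
class semaphore {
public:
    explicit semaphore(unsigned val) : _val(val) {}
    void post() {
        std::lock_guard<std::mutex> lock(_mtx);
        ++_val;
        _cv.notify_one();
    }
    bool trywait() {
        std::lock_guard<std::mutex> lock(_mtx);
        if (_val == 0) return false;
        --_val;
        return true;
    }
    bool wait(std::chrono::milliseconds timeout = std::chrono::hours(24)) {
        std::unique_lock<std::mutex> lock(_mtx);   // scoped locking, mutex held again after the wait
        if (!_cv.wait_for(lock, timeout, [&] { return _val > 0; }))
            return false;                          // timed out, count not decremented
        --_val;
        return true;
    }
private:
    std::mutex _mtx;
    std::condition_variable _cv;
    unsigned _val;
};
```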
-
Avi Kivity authored
Extract the existing semaphore implementation into a generic API.
-
Avi Kivity authored
As part of the include change fallout, we no longer have a declaration for pwrite64(), so we need to mark it as extern "C".
-
Dor Laor authored
There was a bug: calling um.get() in the destructor still left the unique_ptr armed with the pointer. Using free_deleter is cleaner and works too.
-
Dor Laor authored
Put the right pointer into the smart pointer. Noted by Guy.
-
Nadav Har'El authored
If run.py's stdin is redirected (e.g., in an automatic benchmark script), the call to "stty" fails and prints an error message, which isn't interesting. Unfortunately, stty doesn't have a "--silent" parameter, so just redirect its stderr to /dev/null.
-
Nadav Har'El authored
Leak detection (e.g., by running with "--leak") used to have a devastating effect on the performance of the checked program, which, although tolerable (for leak detection, long runs are often unnecessary), was still annoying. While before this patch leak-detection runs were roughly 5 times slower than regular runs, after this patch they are only about 40% slower than a regular run! Read on for the details.

The main reason for this slowness was a simplistic vector which was used to keep the records for currently living allocations. This vector was linearly searched both for free spots (to remember new allocations) and for specific addresses (to forget freed allocations). Because this list often grew to a hundred thousand items, it became incredibly slow and slowed down the whole program. For example, getting a prompt from cli.jar took 2 seconds without leak detection, but 9 seconds with leak detection.

A possible solution would have been to use an O(1) data structure, such as a hash table. This would be complicated by our desire to avoid frequent memory allocation inside the leak detector, and our general desire to avoid complicated stuff in the leak detector, because it always ends up leading to complicated deadlocks :-)

This patch uses a different approach, inspired by an idea by Guy. It still uses an ordinary vector for holding the records, but additionally keeps for each record one "next" pointer which is used for maintaining two separate lists of records:
1. A list of free records. This allows finding a record for a new allocation in O(1) time.
2. A list of filled records, starting with the most-recently-filled record. When we free(), we walk this list and very often finish very quickly, because malloc() closely followed by free() is very common. Without this list, we had to walk the whole vector, filled with ancient allocations and even free records, just to find the most recent allocation.

Two examples of the performance with and without this patch:
1. Getting a prompt from cli.jar takes 2 seconds without leak detection, 9 seconds with leak detection before this patch, and 3 seconds with this patch.
2. The "sunflow" benchmark runs at 53 ops/second without leak detection, which went down to 10 ops/second with leak detection before this patch; after this patch it is 33 ops/second.

I verified (by commenting out the search algorithm and always using the first item in the vector) that the allocation record search no longer has any effect on performance, so it is no longer interesting to replace this code with an even more efficient hash table. The remaining slowdown is probably due to the backtrace() operation and perhaps also the tracker lock.
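The two-list scheme described above can be sketched like this (a simplified illustration with hypothetical names, not OSv's actual tracker): a fixed vector of records threaded by "next" indices into a free list and a most-recently-filled list.

```cpp
#include <cstddef>
#include <vector>

struct alloc_record {
    void*  addr;
    size_t size;
    int    next;    // index of the next record on whichever list this is on, -1 = end
};

class alloc_tracker {
public:
    explicit alloc_tracker(int n) : _recs(n) {
        for (int i = 0; i < n; ++i)                 // thread all records onto the free list
            _recs[i].next = (i + 1 < n) ? i + 1 : -1;
        _free = 0;
        _filled = -1;
    }
    bool remember(void* addr, size_t size) {
        if (_free < 0) return false;                // table full
        int i = _free;
        _free = _recs[i].next;                      // pop the free list: O(1)
        _recs[i] = {addr, size, _filled};           // push onto the filled list (MRU first)
        _filled = i;
        return true;
    }
    bool forget(void* addr) {
        int* link = &_filled;
        while (*link >= 0) {                        // usually ends almost immediately:
            int i = *link;                          // recent mallocs are freed first
            if (_recs[i].addr == addr) {
                *link = _recs[i].next;              // unlink from the filled list
                _recs[i].next = _free;              // return to the free list
                _free = i;
                return true;
            }
            link = &_recs[i].next;
        }
        return false;
    }
private:
    std::vector<alloc_record> _recs;
    int _free, _filled;
};
```

Note how both lists reuse the same `next` field, so no allocation ever happens inside the tracker itself.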
-
Avi Kivity authored
Intrusive lists are faster since they require no allocations.
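The reason there are no allocations can be seen in a hand-rolled sketch (OSv actually uses boost::intrusive; this is only an illustration of the principle): the link pointers live inside the element itself, so insertion and removal just rewire pointers.

```cpp
// Minimal intrusive doubly-linked list: the element embeds its own
// links, so push/erase are O(1) and never call malloc.
struct waiter {
    waiter* next = nullptr;
    waiter* prev = nullptr;
};

struct intrusive_list {
    waiter* head = nullptr;
    void push_front(waiter* w) {          // O(1), no allocation
        w->prev = nullptr;
        w->next = head;
        if (head) head->prev = w;
        head = w;
    }
    void erase(waiter* w) {               // O(1) given the element itself
        if (w->prev) w->prev->next = w->next; else head = w->next;
        if (w->next) w->next->prev = w->prev;
    }
};
```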
-
Avi Kivity authored
Previously, the mutex was stored using a pointer to avoid overflowing glibc's sem_t. Now that we no longer have this restriction, drop the indirection.
-
Avi Kivity authored
Rather than restricting our semaphore's implementation to be smaller than glibc's, use indirection to only store a pointer in the user's structure.
-
Avi Kivity authored
Use Nadav's idea of iterating over the list and selecting wait records that fit the available units.
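The policy described above can be sketched as follows (names and structure are illustrative, not OSv's actual code, and the blocking itself is elided): post() walks the whole list of wait records and satisfies every waiter whose request fits the available units, not just the head of the queue.

```cpp
#include <list>

struct wait_record {
    unsigned units;        // units this waiter asked for
    bool woken = false;
};

class units_semaphore {
public:
    explicit units_semaphore(unsigned val) : _val(val) {}
    void wait(wait_record& wr) { _waiters.push_back(&wr); }  // enqueue only; no blocking in this sketch
    void post(unsigned units) {
        _val += units;
        // Iterate over all wait records: a small request queued behind
        // a large one can still be satisfied immediately.
        for (auto it = _waiters.begin(); it != _waiters.end(); ) {
            if ((*it)->units <= _val) {
                _val -= (*it)->units;
                (*it)->woken = true;     // a real implementation would wake the thread here
                it = _waiters.erase(it);
            } else {
                ++it;
            }
        }
    }
private:
    unsigned _val;
    std::list<wait_record*> _waiters;
};
```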
-
Avi Kivity authored
No code changes.
-
- May 21, 2013
-
-
Nadav Har'El authored
In the allocation tracker, not only did I use a dog-slow linear search, but I also forgot to stop on the first empty spot, and actually used the last empty spot... Add the missing break, which made leak detection 10% faster. A better implementation would be preferable, but this is low-hanging fruit.
-
Nadav Har'El authored
Various improvements to "osv leak show": 1. Somewhat faster performance (but still slow). 2. Better report of progress (percent done). Previously, much of the` work of fetching the backtraces from the guest was actually delayed until sort time, so was wrongly attributed to the sort phase. Now the fetching phase takes most of the time, and percent of its progress is shown. 3. Due to popular request: sort leak records by size: Instead of outputting immediately each leak record (summary of all living allocations from a particular call chain), we now save them in memory, making it very easy to sort these records by any interesting criterion. In this patch, I sort them in decreasing order of total bytes - i.e., the first record one sees is the one responsible for most allocated bytes. The additional sort takes only a fraction of a second, and makes the output of "osv leak show" much more useful.
-
Nadav Har'El authored
As Avi suggested, add an option (turned on by default) to remember only the most recent function calls - instead of the highest-level function calls, as I did until now - in an allocation's stack trace. In our project, where we often don't care about the top-level functions (various Java stuff), this is more useful.
-
Avi Kivity authored
Detached threads delete themselves, so the auto-join creates an infinite loop. Avoid by checking whether this is a detached thread when destroying it.
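The fix described above can be sketched with std::thread standing in for OSv's thread class (an illustration of the idea only, with hypothetical names): the destructor auto-joins, but skips the join for a detached thread, since a detached thread tears itself down.

```cpp
#include <thread>

class managed_thread {
public:
    template <class F>
    explicit managed_thread(F f) : _t(std::move(f)) {}
    void detach() {
        _detached = true;
        _t.detach();       // the thread now cleans up after itself
    }
    ~managed_thread() {
        // Auto-join only non-detached threads; joining a detached
        // thread from its own teardown path would loop forever.
        if (!_detached && _t.joinable())
            _t.join();
    }
private:
    std::thread _t;
    bool _detached = false;
};
```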
-
Avi Kivity authored
The detached thread reaper deletes zombies, but our pthread implementation also deletes dead pthreads (using the container object). Fix by making the base thread use the set_cleanup() method to set up a deleter, which is then overridden by pthreads.
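The set_cleanup() scheme described above can be sketched like this (a simplified, hypothetical illustration, not OSv's actual classes): the base thread carries a replaceable cleanup callback, defaulting to deleting itself, which a pthread wrapper can override to destroy its container object instead.

```cpp
#include <functional>

class base_thread {
public:
    base_thread() : _cleanup([this] { delete this; }) {}  // default: delete the zombie itself
    virtual ~base_thread() = default;
    // A wrapper (e.g., a pthread container) can install its own deleter.
    void set_cleanup(std::function<void()> c) { _cleanup = std::move(c); }
    // Called exactly once when the thread dies; runs whichever deleter is installed.
    void on_exit() {
        auto c = std::move(_cleanup);
        c();
    }
private:
    std::function<void()> _cleanup;
};
```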
-
Avi Kivity authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Avi Kivity authored
Improved detached threads handling and pthread_mutex_t.
-
Christoph Hellwig authored
-
Nadav Har'El authored
In theory we should not need to save the FPU on a context switch caused by a function call (as opposed to a preemption during interrupt), but in practice skipping it makes the "sunflow" benchmark from SpecJVM fail, producing wrong results. This patch saves the FPU on any context switch and makes "sunflow" work correctly, at the price of slower context switches and an unsolved puzzle on why the heck this is needed in the first place :(
-
- May 20, 2013
-
-
Avi Kivity authored
We had a kludgy pmutex class used to allow zero initialization of pthread_mutex_t. Now that the mutex class supports it natively, we can drop it.
-
Avi Kivity authored
pthread_mutex_t has a 32-bit field, __kind, at offset 16. Non-standard static initializers set this field to a nonzero value, which can corrupt fields in our implementation. Rearrange the field layout so we have a hole in that position. To keep the structure size small enough so that condvar will still fit in pthread_cond_t, we need to change the size of the _depth field to 16 bits.
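The constraint can be illustrated with a hypothetical 64-bit layout (field names and the exact arrangement are assumptions for illustration, not OSv's actual mutex): an unused word is placed exactly where glibc's __kind lives, and the recursion depth is narrowed to 16 bits.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative layout for a 64-bit build: the word at byte offset 16
// overlaps glibc's pthread_mutex_t __kind field, so a non-standard
// static initializer scribbling on __kind hits only the hole.
struct mutex_layout {
    std::uint32_t _lock;        // offset 0
    std::uint16_t _depth;       // offset 4: recursion depth, shrunk to 16 bits
    std::uint16_t _pad0;        // offset 6
    void*         _owner;       // offset 8
    std::uint32_t _hole;        // offset 16: overlaps __kind; never read or written
    std::uint32_t _pad1;        // offset 20
    void*         _wait_list;   // offset 24
};

static_assert(offsetof(mutex_layout, _hole) == 16,
              "the unused word must sit where glibc's __kind lives");
```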
-
Avi Kivity authored
Use the generic one instead; the cleanup function allows destroying the pthread object.
-