- Feb 11, 2014
Nadav Har'El authored
This patch adds support for epoll()'s edge-triggered mode, EPOLLET. Fixes #188.

As explained in issue #188, Boost's asio uses EPOLLET heavily, and we use that library in our management HTTP server and in our image creation tool (cpiod.so). By ignoring EPOLLET, as we did until now, the code worked, but it wasted CPU because epoll_wait() always returned immediately instead of waiting for a new event.

This patch works within the confines of our existing poll mechanisms, where epoll() calls poll(). We do not change this here; it should be changed in the future (see issue #17).

In this patch we add to each struct file a field "poll_wake_count", which, as its name suggests, counts the number of poll_wake()s done on this file. Additionally, epoll remembers the last value of this counter it saw, so that in poll_scan(), if an fp polled with EPOLLET has an unchanged counter since last time, we do not report readiness on this fp, regardless of whether or not it has readable data.

We have a complication with EPOLLET on sockets. These have an "SB_SEL" optimization, which avoids calling poll_wake() when it thinks the new data is not interesting because the old data was not yet consumed, and also avoids calling poll_wake() if fp->poll() was not previously done. This optimization is counter-productive for EPOLLET (it causes missed wakeups), so we need to work around it in the EPOLLET case.

This patch also adds a test for the EPOLLET case in tst-epoll.cc. The test runs on both OSv and Linux, and confirms that in the tested scenarios Linux and OSv behave the same, even sharing one false positive: when epoll_wait() tells us there is data in a pipe, and we don't read it, but then more data arrives on the pipe, epoll_wait() will again return a new event, even though this is not really an edge event (the pipe didn't change from empty to non-empty; it was already non-empty).
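The poll_wake_count idea can be sketched in isolation. This is a minimal model under stated assumptions, not OSv's code: file_sketch, epoll_entry_sketch, poll_wake_sketch() and report_ready() are hypothetical stand-ins for struct file, the per-fd epoll state, poll_wake() and the readiness check in poll_scan().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for struct file: only the fields this sketch needs.
struct file_sketch {
    uint32_t poll_wake_count = 0;  // bumped on every poll_wake() of this file
    bool has_data = false;         // what a level-triggered poll would report
};

// Hypothetical per-fd epoll state: remembers the counter value seen last time.
struct epoll_entry_sketch {
    file_sketch* fp;
    bool edge_triggered;
    uint32_t last_seen_wake_count = 0;
};

// Stand-in for poll_wake(): a new event happened on the file.
inline void poll_wake_sketch(file_sketch& f) { ++f.poll_wake_count; }

// Stand-in for the readiness check done in poll_scan().
inline bool report_ready(epoll_entry_sketch& e) {
    if (e.edge_triggered) {
        if (e.fp->poll_wake_count == e.last_seen_wake_count) {
            // No poll_wake() since the last scan: suppress readiness even if
            // unread data is still there. This is what stops the busy loop.
            return false;
        }
        e.last_seen_wake_count = e.fp->poll_wake_count;
    }
    return e.fp->has_data;
}
```

Note that a second poll_wake() on a file that still holds unread data bumps the counter again, so the sketch reproduces the pipe false positive described above rather than avoiding it.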
Concluding remarks: The primary goal of this implementation is to stop EPOLLET epoll_wait() from returning immediately even though nothing has happened on the file. That was what caused the 100% CPU use before this patch. That being said, the goal of this patch is NOT to avoid all false positives or unnecessary wakeups; when events do occur on the file, we may be doing a few more wakeups than strictly necessary. I think this is acceptable (our epoll() has worse problems), but for posterity, I want to explain: I already mentioned above one false positive that also happens on Linux. Another false-positive wakeup that remains is in one of EPOLLET's classic use cases: consider several threads sleeping in epoll() on the same socket (e.g., a TCP listening socket or a UDP socket). When one packet arrives, normal level-triggered epoll() will wake all the threads, but only one will read the packet and the rest will find they have nothing to read. With edge-triggered epoll, only one thread should be woken and the rest should not. But in our implementation, poll_wake() wakes up *all* the pollers on this file, so we cannot currently support this optimization. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
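The behavior the patch restores can be checked with standard Linux APIs. The sketch below (edge_triggered_pipe_demo() and et_result are hypothetical names) registers a pipe's read end with EPOLLIN | EPOLLET, writes one byte without reading it, and calls epoll_wait() twice: the first call sees the empty-to-non-empty edge, and the second should time out with 0 events, whereas a level-triggered registration would return 1 immediately.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Event counts returned by two consecutive epoll_wait() calls.
struct et_result { int first; int second; };

inline et_result edge_triggered_pipe_demo() {
    et_result r{-1, -1};
    int fds[2];
    if (pipe(fds) != 0) return r;
    int ep = epoll_create1(0);
    if (ep < 0) { close(fds[0]); close(fds[1]); return r; }

    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLET;  // edge-triggered readability
    ev.data.fd = fds[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        epoll_event out[1];
        r.first = epoll_wait(ep, out, 1, 1000);  // edge: pipe went empty -> non-empty
        // The byte is deliberately left unread: no new edge, so this call
        // should block for the full 100 ms and return 0.
        r.second = epoll_wait(ep, out, 1, 100);
    }
    close(ep);
    close(fds[0]);
    close(fds[1]);
    return r;
}
```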
-
- Feb 10, 2014
Dmitry Fleytman authored
Useful for testing on RAM disks, where writes are fast enough to fill the whole image in less than the test's execution time. Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Gleb Natapov authored
It now uses the low-level thread::wait_until(), which calls the caller-supplied predicate with preemption disabled. If the caller-supplied code accesses memory that is not yet mapped, it will trigger an assertion on the page-fault path. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Replace OSv-specific constructs in tst-epoll.cc by their standard C++ counterparts (i.e., std::thread, std::chrono, std::cout). This test now also runs (and of course, succeeds) on Linux. In general, it is important for all our Linux-ABI tests (where we test our implementation of the Linux/glibc functionality) to be able to run on Linux as well. Otherwise, our tests might not actually test the right thing (we may test for some expected behavior, while the actual behavior on Linux is different). I'm doing this in preparation for fixing issue #188 (fix edge-triggered epoll). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 09, 2014
Avi Kivity authored
net channels caused a crash in shutdown(), so add a test to exercise it. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Feb 07, 2014
Raphael S. Carvalho authored
Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 06, 2014
Raphael S. Carvalho authored
The main purpose of this tool is to understand and analyze ARC behavior and performance on specific workloads.

$ scripts/run.py -e 'tests/misc-zfs-arc.so --help'
OSv v0.05-155-g1f04e49
Allowed options:
  --help               produce help message
  --set-max-target     set ARC max target to 80% of the system memory.
  --check-arc-shrink   check ARC shrink functionality
  --test arg           analyze ARC performance on a given testcase, e.g. --test tst-001.so

* --set-max-target: Used to check performance when the ARC max target is higher than usual. Given that more data will be loaded into the ARC, ZFS operations that need I/O should perform better. 80% was chosen because the low watermark is 20%; this avoids heavy memory pressure and thus gives more stability.
* --check-arc-shrink: Checks the functionality of the arc_shrink function from ARC.
* --test arg: Checks ARC performance on a specified testcase, e.g.:
  $ scripts/run.py -e 'tests/misc-zfs-arc.so --test tst-fs-link.so'
* Default run, i.e. -e 'tests/misc-zfs-arc.so', provides four distinct workloads:
  1) A non-linear one, where prefetch shouldn't be as effective.
  2) Load all data into the cache, then read it back to check performance in such cases, which should approach the speed of main memory.
  3) A linear workload where the amount of data is 1.5% the size of the system memory, so page replacement will be used heavily; since the operation is sequential, prefetch (readahead) must be effective. This leads to a high cache hit ratio, as blocks were read ahead of time.
  4) Keep allocating memory through a populated anonymous mmap to see whether shrinking takes place to release memory back to the operating system.

The resulting reports and ARC stats are provided to ease the task of understanding ARC performance on specific workloads. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
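The --set-max-target option is described as setting the ARC max target to 80% of system memory. A minimal sketch of that arithmetic on Linux, assuming total memory is taken from sysconf(_SC_PHYS_PAGES) times the page size; percent_of() and physical_memory_bytes() are hypothetical helpers, not the tool's code.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>

// Hypothetical helper: `percent`% of a byte count, split so that the
// multiplication cannot overflow for large memory sizes.
inline uint64_t percent_of(uint64_t total_bytes, unsigned percent) {
    return total_bytes / 100 * percent + total_bytes % 100 * percent / 100;
}

// Hypothetical helper: physical memory in bytes (0 if unknown).
inline uint64_t physical_memory_bytes() {
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0) return 0;
    return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}

// e.g. an 80% ARC max target:
//   uint64_t arc_max = percent_of(physical_memory_bytes(), 80);
```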
-
Raphael S. Carvalho authored
Mainly created as a tool that reproduces specific workloads, thus allowing us to understand how underlying components perform, e.g. the Adjustable Replacement Cache (ARC) from ZFS. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 03, 2014
Raphael S. Carvalho authored
Currently, a file path always needs to be specified as an argument, so let's make things easier by providing a default file path. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
For page-sized allocations, it is better to use alloc_page than malloc. The reason is that malloc of that size ends up in malloc_large, which is a locked operation, while alloc_page proceeds locklessly if there is room in the per-cpu buffers. Although this is just a test, since the goal is to saturate the disk, doing so gives us a more accurate picture, because the completion handler won't block. With KVM and my weird disk, I now see ~65 Mbps where I previously saw ~45 Mbps. Interestingly enough, it doesn't seem to make a whole lot of difference for Xen. There is a difference, but it is nowhere near as large. Reviewed-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This test is similar in spirit to misc-bdev-write, but instead of pushing as many bios as we can, we push one bio at a time.

Example output for Xen:
OSv v0.05-118-g0f2973c
   Min      50%      90%      99%   99.99%  99.999%      Max  [msec]
   ---      ---      ---      ---   ------  -------      ---
0.2344   0.3240   0.2847   0.8185   2.4095   6.5230  13.6572

Example output for KVM:
OSv v0.05-118-g0f2973c
   Min      50%      90%      99%   99.99%  99.999%      Max  [msec]
   ---      ---      ---      ---   ------  -------      ---
0.2626   0.3976   0.3273   0.4791   0.5993   7.6401  15.9672

(Hint: the current Xen blkfront slowness is not related to RT latency...) Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
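Summary rows like Min / 50% / 90% / ... / Max above are percentiles over per-bio latency samples. The commit does not say how the test computes them; the sketch below assumes the nearest-rank method over a sorted copy of the samples, and percentile() is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile of latency samples (e.g. in msec).
// p = 0 yields the minimum, p = 100 the maximum.
inline double percentile(std::vector<double> samples, double p) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    // Rank is ceil(p/100 * N), clamped to [1, N]; computed as p*N/100 to
    // keep integer-valued inputs exact in floating point.
    double r = std::ceil(p * static_cast<double>(samples.size()) / 100.0);
    std::size_t rank = r < 1.0 ? 1 : static_cast<std::size_t>(r);
    if (rank > samples.size()) rank = samples.size();
    return samples[rank - 1];
}
```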
-
- Jan 28, 2014
Glauber Costa authored
This is useful for measuring OSv boot speed, i.e., how fast we are without the CLI, Java, etc. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 27, 2014
Nadav Har'El authored
Remove an unused #include of <drivers/clock.hh>. Apart from the clock drivers and <osv/clock.hh>, no source file now includes this header; <osv/clock.hh> should be used instead. Code including <sched.hh> also gets <osv/clock.hh> automatically. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Fix pthread_cond_timedwait to set the absolute timer using a time point instead of the old s64. Moreover, now that we have both a wall clock and a monotonic clock, we can support pthread_condattr_setclock, so this patch also adds this support. OpenJDK 8, for example, cannot run without this support (it assumes that if the OS supports CLOCK_MONOTONIC, it can also configure condition variables to use it). Unfortunately, supporting pthread_condattr_setclock (the only condition-variable attribute that really exists) grows the pthread condition variable structure :( Fixes #168. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
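The usage OpenJDK 8 relies on is standard POSIX: configure the condition variable with CLOCK_MONOTONIC, then compute the absolute deadline passed to pthread_cond_timedwait() from the same clock. A minimal sketch (timed_wait_monotonic() is a hypothetical name) that waits with nobody signaling, so it should end in ETIMEDOUT:

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <time.h>

inline int timed_wait_monotonic(long ms) {
    pthread_condattr_t attr;
    pthread_condattr_init(&attr);
    pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);  // deadlines use the monotonic clock
    pthread_cond_t cond;
    pthread_cond_init(&cond, &attr);
    pthread_condattr_destroy(&attr);
    pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

    // The absolute deadline must be measured on the clock the condvar uses.
    timespec deadline;
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&mtx);
    int rc = 0;
    while (rc == 0)  // a 0 return without a signal is a spurious wakeup: keep waiting
        rc = pthread_cond_timedwait(&cond, &mtx, &deadline);
    pthread_mutex_unlock(&mtx);
    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mtx);
    return rc;
}
```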
-
Nadav Har'El authored
Drop the nanotime() function. Change the few remaining callers to use the appropriate osv::clock or std::chrono replacements. In previous patches we already got rid of most references to nanotime() by switching from absolute times to relative times. The direct equivalent of the old nanotime() function, where we actually need the number of nanoseconds since the UNIX epoch, is the rather verbose expression osv::clock::wall::now().time_since_epoch().count(), or the shorter clock::get()->time(). Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
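For code that only needs portable C++, a rough counterpart of "nanoseconds since the UNIX epoch" uses std::chrono::system_clock, whose epoch is the UNIX epoch on Linux (and is guaranteed to be since C++20). This is a sketch with a hypothetical function name, not OSv's osv::clock::wall.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Nanoseconds since the UNIX epoch, via the standard wall clock.
inline int64_t nanotime_equivalent() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(system_clock::now().time_since_epoch()).count();
}
```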
-
Nadav Har'El authored
Drop the s64 literals _ms, _ns, etc., from <drivers/clock.hh>. Fix a few places which still use the old literals. The std::chrono::duration version from <osv/clock.hh> remains, but remember that you need "using namespace osv::clock::literals" to use them. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
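The same rule applies to the standard C++14 duration literals, which are the closest portable analogue of osv::clock::literals: the suffixes are only visible after a using-directive for their namespace. A small sketch with hypothetical constant names:

```cpp
#include <cassert>
#include <chrono>

// Without this directive, 250ms and 500ns below do not compile.
using namespace std::chrono_literals;

// Hypothetical example values; the literals produce std::chrono::duration types.
constexpr std::chrono::milliseconds poll_interval = 250ms;
constexpr std::chrono::nanoseconds short_spin = 500ns;
```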
-
Nadav Har'El authored
Fix tst-wait-for.cc to use the new <osv/clock.hh> APIs. This test uses std::abs on a time duration, and unfortunately C++11 fails to implement std::abs on an std::chrono::duration. This patch also adds support for this (in the form of a trivial template function) to <osv/clock.hh>. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
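A trivial template of the kind described here might look like the following. The name duration_abs is hypothetical (the actual function lives in <osv/clock.hh>); C++17 later added std::chrono::abs for the same purpose.

```cpp
#include <cassert>
#include <chrono>

// Absolute value of a signed duration, preserving its Rep and Period.
template <class Rep, class Period>
constexpr std::chrono::duration<Rep, Period>
duration_abs(std::chrono::duration<Rep, Period> d) {
    return d >= d.zero() ? d : -d;
}
```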
-
Nadav Har'El authored
Delete the sched::thread::sleep_until() function. All users of this function actually wanted a relative time, not absolute time, and can use the simpler new sched::thread::sleep() instead. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
In tst-timerfd, test timerfd with monotonic clock, in addition to the existing test with the realtime clock. This patch also changes this test to only use Linux APIs, not anything OSv-specific, so it can also be run on Linux. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
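Using only Linux APIs, a one-shot monotonic timerfd looks roughly like this: arm it with a relative expiry (flags = 0), then a blocking read() returns the number of expirations as a uint64_t. monotonic_timerfd_once() is a hypothetical name for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>
#include <unistd.h>

// Arms a one-shot CLOCK_MONOTONIC timer `ms` milliseconds from now and
// blocks until it fires; returns the expiration count (0 on error).
inline uint64_t monotonic_timerfd_once(long ms) {
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) return 0;
    itimerspec spec{};  // it_interval stays zero: one-shot timer
    spec.it_value.tv_sec = ms / 1000;
    spec.it_value.tv_nsec = (ms % 1000) * 1000000L;
    uint64_t expirations = 0;
    if (timerfd_settime(fd, 0, &spec, nullptr) == 0) {  // 0 = relative time
        if (read(fd, &expirations, sizeof expirations) !=
            static_cast<ssize_t>(sizeof expirations))
            expirations = 0;
    }
    close(fd);
    return expirations;
}
```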
-
- Jan 26, 2014
Nadav Har'El authored
Rename tests/misc-bsd-callout.c to tests/misc-bsd-callout.cc - we'll need it in C++ in the upcoming patch set. No changes were needed for this code to continue compiling. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Jan 23, 2014
Raphael S. Carvalho authored
Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 22, 2014
Nadav Har'El authored
A previous patch changed loader.cc's command-line parsing to also support "&" as a command separator, causing the preceding command to be executed in a new thread. To achieve this, the parser ends each command with another string containing "&", ";" or "", depending on what appeared at the end of this command. This change caused tst-commands.cc, which checks the results of the command-line parsing, to fail, so this patch fixes the test to match the code. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
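The output shape the test now expects can be illustrated with a simplified parser. This is a hypothetical sketch, not loader.cc's implementation (which also handles quoting and other details): words are split on whitespace, ";" and "&" end a command, and each command's word list is terminated by the separator string, or "" for the last command.

```cpp
#include <cassert>
#include <string>
#include <vector>

inline std::vector<std::vector<std::string>> parse_commands(const std::string& line) {
    std::vector<std::vector<std::string>> cmds;
    std::vector<std::string> cur;
    std::string word;
    auto flush_word = [&] {
        if (!word.empty()) { cur.push_back(word); word.clear(); }
    };
    for (char c : line) {
        if (c == ';' || c == '&') {
            flush_word();
            cur.push_back(std::string(1, c));  // command ends with its separator
            cmds.push_back(cur);
            cur.clear();
        } else if (c == ' ' || c == '\t') {
            flush_word();
        } else {
            word += c;
        }
    }
    flush_word();
    if (!cur.empty()) {
        cur.push_back("");  // last command had no separator
        cmds.push_back(cur);
    }
    return cmds;
}
```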
-
Nadav Har'El authored
The second stack trace mentioned in issue #178 happens because of a bug in tst-queue-mpsc (this is what happens when tests become too complex and have bugs of their own...): the "popper" thread reads an "item" from a lockfree::queue_mpsc, and wakes the "pusher" thread recorded in that item. But we have a bug when the pusher thread is done and returns: while the condvar remains valid, the "item" containing it does not! We cannot continue to use the index item->value.waiter after we woke this thread, because the thread can return, leaving item pointing to invalid memory. We need to save the index item->value.waiter before waking the thread. Unfortunately, this does *not* completely solve issue #178: the timer bug (similar to the two stack traces on issue #178) is still seen (rarely) after this patch. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
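The fix pattern (copy what you need out of the item before waking its owner) can be shown with standard threads. This is a simplified, hypothetical model: a mutex-protected std::deque stands in for lockfree::queue_mpsc, each item lives on its pusher's stack, and the long-lived waiter slots play the role of the condvars that remain valid.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct waiter_slot {  // long-lived: outlives every pusher thread
    std::mutex m;
    std::condition_variable cv;
    bool woken = false;
};

struct item {  // short-lived: lives on the pusher's stack
    int value;
    int waiter;  // index into the waiter slots
};

// Returns the sum of the values the popper received.
inline int run_pushers_and_popper(int n) {
    std::vector<waiter_slot> waiters(static_cast<std::size_t>(n));
    std::mutex qm;
    std::condition_variable qcv;
    std::deque<item*> pending;  // stand-in for the MPSC queue

    std::vector<std::thread> pushers;
    for (int i = 0; i < n; ++i) {
        pushers.emplace_back([&, i] {
            item it{i + 1, i};
            { std::lock_guard<std::mutex> lk(qm); pending.push_back(&it); }
            qcv.notify_one();
            std::unique_lock<std::mutex> lk(waiters[i].m);
            waiters[i].cv.wait(lk, [&] { return waiters[i].woken; });
            // Returning destroys `it`; the popper must no longer touch it.
        });
    }

    int sum = 0;
    for (int popped = 0; popped < n; ++popped) {
        item* it;
        {
            std::unique_lock<std::mutex> lk(qm);
            qcv.wait(lk, [&] { return !pending.empty(); });
            it = pending.front();
            pending.pop_front();
        }
        sum += it->value;
        int w = it->waiter;  // save the index BEFORE waking: `it` may dangle afterwards
        { std::lock_guard<std::mutex> lk(waiters[w].m); waiters[w].woken = true; }
        waiters[w].cv.notify_one();  // uses only the long-lived slot, never `it`
    }
    for (auto& t : pushers) t.join();
    return sum;
}
```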
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 21, 2014
Nadav Har'El authored
This patch fixes chdir() on a normal file, which used to succeed (!?) but now fails, as it should, with ENOTDIR. The patch also adds an exhaustive test for chdir's success and error cases. Before the latest chdir() patches, most of these tests would fail, and now all of them succeed. This test is standard C++ and POSIX code, so it can also be run on Linux. This is important for verifying that whatever we expect from OSv, Linux really does the same. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
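The central error case is easy to reproduce in portable POSIX code. In this sketch (the function name and temporary path pattern are hypothetical), a regular file is created with mkstemp() and chdir() into it is expected to fail with ENOTDIR:

```cpp
#include <cassert>
#include <cerrno>
#include <stdlib.h>
#include <unistd.h>

// Returns the errno from chdir() into a regular file, 0 if chdir() wrongly
// succeeded, or -1 if the temporary file could not be created.
inline int chdir_into_regular_file_errno() {
    char path[] = "/tmp/chdir-test-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    close(fd);
    int err = 0;
    if (chdir(path) != 0) err = errno;
    unlink(path);
    return err;
}
```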
-
Takuya ASADA authored
Test case for dladdr(), to make sure it returns the correct symbol in these cases:
* addr is less than 'vfprintf': should return another symbol.
* addr is equal to 'vfprintf': should return 'vfprintf'.
* addr is greater than 'vfprintf', but still inside it: should return 'vfprintf'. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 20, 2014
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Jan 17, 2014
Pekka Enberg authored
Add a simple manual test case for checking "/proc/self/maps" output. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
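Each line of /proc/self/maps describes one mapping (address range, permissions, offset, device, inode, optional pathname). A quick automated counterpart of the manual check, run on Linux, might just read the file and look for an expected entry; both helper names are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Number of mappings listed for the current process.
inline int count_maps_entries() {
    std::ifstream maps("/proc/self/maps");
    int n = 0;
    std::string line;
    while (std::getline(maps, line)) ++n;
    return n;
}

// True if any mapping line mentions `name`, e.g. "[stack]".
inline bool maps_has_pathname(const std::string& name) {
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line))
        if (line.find(name) != std::string::npos) return true;
    return false;
}
```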
-
- Jan 15, 2014
Avi Kivity authored
The timeout test sets a 2-second timeout and a 1-second alarm, but expects the timeout to happen first. Change the timeout to 0.5 seconds. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 14, 2014
Nadav Har'El authored
tst-vfs.cc currently stat()s the file /usr/lib/jvm/jre/lib/amd64/headless/libmawt.so and dies if it doesn't exist. Since Java is now optional in our images, it's not a good idea to check for such a file, which might not exist (e.g., "make image=tests check" will fail). This patch changes it to check a filename that is certain to exist, namely the test itself: /tests/tst-vfs.so. If we wanted a pathname with more components, the test should be rewritten to create this pathname, say /a/a/a/a/a/a/a/a/a/a, and then test stat() on that newly created file. It cannot rely on such a file to pre-exist. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
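The rewrite suggested above (create a deep path first, then stat() it) could look like this sketch. To stay self-contained it builds the a/a/.../a chain under a fresh mkdtemp() directory rather than at the filesystem root, which is an assumption of this example; stat_created_nested_file() is a hypothetical name.

```cpp
#include <cassert>
#include <fcntl.h>
#include <stdlib.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// Creates <tmpdir>/a/a/.../a (depth levels) plus a file "f", checks it with
// stat(), removes everything again, and returns whether stat() succeeded.
inline bool stat_created_nested_file(int depth) {
    char base[] = "/tmp/vfs-test-XXXXXX";
    if (!mkdtemp(base)) return false;
    std::string dir = base;
    std::vector<std::string> dirs;
    bool ok = true;
    for (int i = 0; i < depth && ok; ++i) {
        dir += "/a";
        ok = mkdir(dir.c_str(), 0755) == 0;
        if (ok) dirs.push_back(dir);
    }
    if (ok) {
        std::string file = dir + "/f";
        int fd = open(file.c_str(), O_CREAT | O_WRONLY, 0644);
        ok = fd >= 0;
        if (ok) {
            close(fd);
            struct stat st;
            ok = stat(file.c_str(), &st) == 0 && S_ISREG(st.st_mode);
            unlink(file.c_str());
        }
    }
    for (auto it = dirs.rbegin(); it != dirs.rend(); ++it) rmdir(it->c_str());  // innermost first
    rmdir(base);
    return ok;
}
```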
-
- Jan 13, 2014
Dmitry Fleytman authored
Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 10, 2014
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 07, 2014
Nadav Har'El authored
In very early OSv history, the spinlock was used in the mutex's implementation, so it made sense to put it in mutex.cc and mutex.h. But now that the spinlock is all that's left in mutex.cc (the real mutex is in lfmutex.cc), rename this file to spinlock.cc. Also, move the spinlock definitions from <osv/mutex.h> to a new <osv/spinlock.h>, so if someone wants to make the grave mistake of using a spinlock, they will at least need to explicitly include this header file. Currently, the only remaining user of the spinlock is the console. Using a spinlock (and not a mutex) in the console allows printing debug messages while preemption is disabled. Arguably, this use case is no longer important (we have tracepoints), so in the future we can consider dropping the spinlock completely. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-