- Nov 29, 2013
-
-
Tomasz Grabiec authored
This patch changes module.py so that it generates both manifests at once. We do not need to have it split. Doing it in one step also means we need to resolve modules only once. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
This is a refactoring before further modifications. It should not introduce any semantic changes. Items: - file.readlines() is not necessary, a file is line-iterable - use the "cwd" named parameter in subprocess.call() instead of an explicit "cd" - close files automatically using a context manager: "with xxx as yyy:" - os.path.exists() instead of os.access(mmod_path, os.F_OK) == False - extract paths containing ".." to variables Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
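The refactoring items listed above can be illustrated with a minimal Python sketch. The helper names (`read_module_names`, `run_in`) and the file contents are hypothetical and are not taken from module.py; only the standard-library calls named in the commit message are real.

```python
import os
import subprocess
import sys
import tempfile


def read_module_names(list_path):
    """Return the non-empty, stripped lines of a file (hypothetical helper)."""
    if not os.path.exists(list_path):      # instead of os.access(path, os.F_OK) == False
        return []
    with open(list_path) as f:             # the file is closed when the block exits
        # A file object is line-iterable, so readlines() is not needed.
        return [line.strip() for line in f if line.strip()]


def run_in(directory, argv):
    """Run a command inside `directory` without an explicit "cd"."""
    return subprocess.call(argv, cwd=directory)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modules.txt")        # illustrative file name
    with open(path, "w") as f:
        f.write("java\n\nmgmt\n")                  # illustrative module names
    names = read_module_names(path)
    missing = read_module_names(os.path.join(tmp, "absent.txt"))
    status = run_in(tmp, [sys.executable, "-c", "import os; os.getcwd()"])
```

Passing `cwd=` keeps the parent process's working directory untouched, which is the main practical difference from changing directory before the call.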
-
Pekka Enberg authored
Move runtime creation of '/dev' to image creation to avoid bogus "unable to create /dev directory" error messages during boot. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
It looks like QEMU does not display anything on the serial console until a newline character ('\n') is written to it. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
The link system call currently keeps the dentry in the cache by avoiding calling drele on success. However, this is not the right thing to do, as link creates the entry rather than referencing it (much the same as create). Currently, link works ok, but subsequent calls on the same dentry would fail as its refcount would be wrong. Strangely, this problem didn't come up at the time the link syscall was implemented; some recent change made the bug manifest. This patch fixes it. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 28, 2013
-
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Add a new virtio::probe() helper function to simplify virtio driver probing. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Due to an unknown bug (bug in our HPET driver?), OSv crashed on our build machine with an assertion that the clock only goes forward. The clock jumping backward is a bad sign, but it's not really necessary to crash the VM when it happens, assuming it only happens rarely. This patch makes the scheduler handle time<0 just like time==0. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
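One way to read "handle time<0 just like time==0" is as a clamp on the measured interval. The sketch below is only an illustration of that idea under this assumption; the function name and the nanosecond units are hypothetical and do not come from the OSv scheduler source.

```python
def elapsed_for_accounting(last_ns, now_ns):
    """Interval charged to the running thread.

    A backward clock jump (now_ns < last_ns) is treated like a zero-length
    interval instead of tripping an assertion.
    """
    delta = now_ns - last_ns
    return delta if delta > 0 else 0


forward = elapsed_for_accounting(100, 150)    # normal case
backward = elapsed_for_accounting(100, 90)    # clock went backward
```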
-
Raphael S. Carvalho authored
The undefined reference error only shows up when building OSv in debug mode (mode=debug): bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.o: In function `zfs_setattr': bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:2529: undefined reference to `zfs_xvattr_set' Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 27, 2013
-
-
Nadav Har'El authored
We only open file descriptor 1 relatively late in our boot process (see vfs_init() in fs/vfs/main.cc). We would like to be able to use stdout (and C++'s std::cout) much earlier than that - examples include ACPI's information messages (before c9dadf2d) and our "--help" command line parameter. Before this patch, early writes to stdout almost work, but with a strange twist: They only write the string up to the last newline, and whatever is left is buffered until much later - when all those "string ends" are lumped together. The basis of Musl's stdio write mechanism is the "f->write()" method. It needs to write *two* things: Whatever we have buffered previously, and the new string given to it. __stdio_write() is the default implementation, which does this correctly using writev(). But our early implementation, __stdout_write(), only wrote the new string, and the buffered part remained buffered, collecting various string parts until it was finally flushed when we switched to the correct __stdio_write(). This patch fixes __stdout_write() to write both strings as expected. Fixes #104. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
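The difference between the two write behaviours described above can be modelled in a few lines of Python. This is a toy model, not Musl's or OSv's code: the function names are hypothetical, the "buffer" is a plain bytes value, and `os.writev()` (available on Unix) stands in for the C writev() call.

```python
import os


def write_new_only(fd, buffered, data):
    """Early behaviour described in the commit: only the new string is written."""
    os.write(fd, data)
    return buffered                      # the buffered part is still pending


def write_both(fd, buffered, data):
    """Fixed behaviour: emit the pending buffer and the new string together."""
    os.writev(fd, [buffered, data])      # gathered write; short writes are ignored here
    return b""                           # nothing is left pending


def capture(write_fn, buffered, data):
    """Run write_fn against a pipe; return (bytes that came out, bytes still pending)."""
    r, w = os.pipe()
    try:
        pending = write_fn(w, buffered, data)
        return os.read(r, 1024), pending
    finally:
        os.close(r)
        os.close(w)


old_out, old_pending = capture(write_new_only, b"no newline yet ", b"done\n")
new_out, new_pending = capture(write_both, b"no newline yet ", b"done\n")
```

In the first case the earlier fragment never reaches the descriptor, which matches the "string ends lumped together much later" symptom; in the second the output arrives in order.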
-
Nadav Har'El authored
Fix a couple of spelling mistakes in core/lfmutex.cc Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Dumps virtio drivers and the state of their queues. Sample output: (gdb) osv info virtio virtio::virtio_net at 0xffffc0003ff0ec00 queue 0 at 0xffffc0003ff2ba00 avail g=0xbe09 h=0xbe09 (0) used h=0xbe09 g=0xbd11 (248) used notifications: enabled queue 1 at 0xffffc0003ff2bc00 avail g=0xc951 h=0xc951 (0) used h=0xc951 g=0xc8fd (84) used notifications: enabled queue 2 at 0xffffc0003ff2bd00 avail g=0x0 h=0x0 (0) used h=0x0 g=0x0 (0) used notifications: enabled virtio::virtio_blk at 0xffffc0003fefd400 queue 0 at 0xffffc0003fee5100 avail g=0x15f h=0x15f (0) used h=0x15f g=0x15f (0) used notifications: enabled Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Tomasz Grabiec authored
There is a race between "usr.manifest" and "bootfs.manifest" rules which both call module.py. The script does complex stuff wrt module preparation like fetching module files, calling make, etc. and should not be run concurrently. This change fixes the problem by moving the calls into one rule. This is not the end of the story, more refactoring will follow. The module.py script should be split into parts, one that fetches modules and one that generates manifests. This way the dependencies could be made more fine grained and jobs parallelized. This fixes issue #100. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
This allows using these variables inside build.mk regardless of the CWD and is clearer than a cascade of ".."s. This change also unifies $(submake) and $(modulemk) generation to reduce duplication. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Add missing join() in tst-loadbalance, to avoid rare crashes during the test. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This patch adds tst-scheduler.cc, containing a few tests for the fairness of scheduling of several threads on one CPU (for scheduling issues involving load-balancing across multiple CPUs, check out the existing tst-loadbalance). The test is written in standard C++11, so it can be compiled and run on both Linux and OSv, to compare their scheduler behaviors. It is actually more a benchmark than a test (it doesn't "succeed" or "fail"). The test begins with several tests of the long-term fairness of the scheduler when threads of different or identical priorities are run for 10 seconds, and we look at how much work each thread got done in those 10 seconds. This test only works on OSv (which supports float priorities). The second part of the test again tests long-term fairness of the scheduler when all threads have the default priority (so this test is standard C++11): We run a loop which takes (when run alone) 10 seconds, on 2 or 3 threads in parallel. We expect to see that all 2 or 3 threads finish at (more-or-less) exactly the same time - after 20 or 30 seconds. Both OSv and Linux pass this test with flying colors. The third part of the test runs two different threads concurrently: 1. One thread wants to use all available CPU to loop for 10 seconds. 2. The second thread wants to loop in an amount that takes N milliseconds, and then sleep for N milliseconds, and so on, until completing the same number of loop iterations that (when run alone) takes 10 seconds. The "fair" behavior in this test is that both threads get equal CPU time and finish together: Thread 2 runs for N milliseconds, then while it is sleeping for N more, Thread 1 gets to run. We measure this for N=1 through 32ms. In OSv's new scheduler, indeed both threads get an almost fair share (with N=32ms, one thread finishes in 19 seconds, the second in 21.4 seconds; we don't expect total fairness because of the runtime decay). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
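The "fair" outcome of the third part can be worked out with a small idealized model. The sketch below is not tst-scheduler.cc; it assumes perfect alternation with zero switching cost, and the function name is hypothetical. It only shows why both threads are expected to finish at roughly 20 seconds.

```python
def ideal_finish_times(n_ms, work_ms=10_000):
    """Finish times (ms) of thread 1 and thread 2 under perfect alternation.

    Thread 2 runs for n_ms, then sleeps for n_ms while thread 1 runs.
    Each thread needs work_ms of CPU time in total.
    """
    t = 0
    left1 = left2 = work_ms
    done1 = done2 = None
    while done1 is None:
        if done2 is None:
            run = min(n_ms, left2)       # thread 2's burst
            t += run
            left2 -= run
            if left2 == 0:
                done2 = t
        # Thread 1 runs during thread 2's sleep, or uninterrupted once thread 2 is done.
        run = min(n_ms, left1) if done2 is None else left1
        t += run
        left1 -= run
        if left1 == 0:
            done1 = t
    return done1, done2


fair_32 = ideal_finish_times(32)
fair_1 = ideal_finish_times(1)
```

Under this model both threads finish within one burst of the 20-second mark, which is the baseline against which the measured 19 and 21.4 seconds at N=32ms can be compared.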
-
Avi Kivity authored
Allows longer tests to be run. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
As detailed in [1], SO_REUSEADDR means slightly different things on BSD and Linux. One of the differences is in the treatment of sockets that are bound to addresses already occupied by existing sockets in the TIME_WAIT state; Linux allows the new socket if SO_REUSEADDR is set on it, while BSD refuses. Adjust the code to match the Linux behaviour. This allows multiple connection tests to pass, and will likely be required by other network intensive applications. [1] http://stackoverflow.com/questions/14388706/socket-options-so-reuseaddr-and-so-reuseport-how-do-they-differ-do-they-mean-t Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
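From the application side, the option discussed above is set before bind(). The following sketch uses Python's standard socket module; the helper names are hypothetical, and whether a bind over a TIME_WAIT socket succeeds depends on the network stack's semantics, which is exactly what this commit adjusts.

```python
import socket


def new_reusable_socket():
    """Create a TCP socket with SO_REUSEADDR set (must happen before bind())."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return s


def listen_reusing_address(host, port):
    """Bind and listen; with Linux semantics this is allowed even if old
    connections on the same address are still in TIME_WAIT."""
    s = new_reusable_socket()
    s.bind((host, port))
    s.listen(1)
    return s


sock = new_reusable_socket()
reuse_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
sock.close()
```

A test that repeatedly restarts a server on a fixed port is the typical case where the BSD behaviour would make the second bind fail.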
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Nadav Har'El reports that tst-pipe.so starts to hang some of the time after commit c1d5fccb ("mmu: Anonymous memory demand paging"). Tracing page faults points to pthread stacks which are now demand faulted. Avi Kivity explains: It's a logical bug in our design. User code runs on mmap()ed stacks, then calls "kernel" code, which doesn't tolerate page faults (interrupts disabled, preemption disabled, already in the page fault path, whatever). Possible solutions: - insert "thunk code" between user and kernel code that switches the stacks to known resident stacks. We could abuse the elf linker code to do that for us, at run time. - use -fsplit-stack to allow a dynamically allocated, discontiguous stack on physical memory - use map_populate and live with the memory wastage Switch to map_populate as a stop-gap measure until OSv "kernel" code is able to deal with page faults. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Fix the printout of runtime by "osv info threads" and "osv runqueue" debugger commands, to fit the new scheduler (different variable name, and it is float, not integer). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 26, 2013
-
-
Nadav Har'El authored
This patch resolves issue #26. As you can see with "objdump -h build/release/loader.elf", our executable had over a thousand (!) separate sections, most of which should really be merged. We already started doing this in arch/x64/loader.ld, but didn't complete the work. This patch merges all the ".gcc_except_table.*" sections into one, and all the ".data.rel.ro.*" sections into one. After this merge, we are left with just 52 sections, instead of more than 1000. The default linker script (run "ld --verbose" to see it) also does similar merges, so there's no reason why we shouldn't. By reducing the number of ELF sections (each comes with a name, headers, etc.), this patch also reduces the size of our loader-stripped.elf by about 140K. Fixes #26. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The "debugging" section of README.md was misleading - it suggested one needs to build with "mode=debug" to debug (usually, you don't), or that disabling preemption was a good idea for debugging. Change this to drop the preempt=0 suggestion, and make mode=debug an option, not a recommendation. Also link to our "Debugging OSv" wiki page, which is much more informative - and includes very important information still missing in the README (such as a line one needs to add to ~/.gdbinit, and how to handle multiple vcpus). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
We previously had the POSIX variant only. Implement the GNU variant as well, and update the header to point to the correct function based on the dialect selected. The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI standards. This fixes calls to strerror_r() from binaries which were compiled with _GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for BSD derived source. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Some functions (strerror_r()) are defined differently based on the source dialect. We need to provide both dialects since we have mixed source. Add a source-dialect macro (defaulting to _GNU_SOURCE) and override it as appropriate. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Started adding Doxygen documentation for the scheduler. Currently only set_priority() and priority() are documented. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
This patch replaces the algorithm which the scheduler uses to keep track of threads' runtime, and to choose which thread to run next and for how long. The previous algorithm used the raw cumulative runtime of a thread as its runtime measure. But comparing these numbers directly was impossible: e.g., should a thread that slept for an hour now get an hour of uninterrupted CPU time? This resulted in a hodgepodge of heuristics which "modified" and "fixed" the runtime. These heuristics did work quite well in our test cases, but we were forced to add more and more unjustified heuristics and constants to fix scheduling bugs as they were discovered. The existing scheduler was especially problematic with thread migration (moving a thread from one CPU to another) as the runtime measure on one CPU was meaningless in another. This bug, if not corrected (e.g., by the patch which I sent a month ago), can cause crucial threads to acquire exceedingly high runtimes by mistake, and resulted in the tst-loadbalance test using only one CPU on a two-CPU guest. The new scheduling algorithm follows a much more rigorous design, proposed by Avi Kivity in: https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing To make a long story short (read the document if you want all the details), the new algorithm is based on a runtime measure R which is the running decaying average of the thread's running time. It is a decaying average in the sense that the thread's act of running or sleeping in recent history is given more weight than its behavior a long time ago. This measure R can tell us which of the runnable threads to run next (the one with the lowest R), and using some high-school-level mathematics, we can calculate for how long to run this thread until it should be preempted by the next one. R carries the same meaning on all CPUs, so CPU migration becomes trivial. 
The actual implementation uses a normalized version of R, called R'' (Rtt in the code), which is also explained in detail in the document. This Rtt allows updating just the running thread's runtime - not all threads' runtime - as time passes, making the whole calculation much more tractable. The benefits of the new scheduler code over the existing one are: 1. A more rigorous design with fewer unjustified heuristics. 2. A thread's runtime measurement correctly survives a migration to a different CPU, unlike the existing code (which sometimes botches it up, leading to threads hanging). In particular, tst-loadbalance now gives good results for the "intermittent thread" test, unlike the previous code which in 50% of the runs caused one CPU to be completely wasted (when the load-balancing thread hung). 3. The new algorithm can look at a much longer runtime history than the previous algorithm did. With the default tau=200ms, the one-cpu intermittent thread test of tst-scheduler now provides good fairness for sleep durations of 1ms-32ms. The previous algorithm was never fair in any of those tests. 4. The new algorithm is more deterministic in its use of timers (with thyst=2_ms: up to 500 timers a second), resulting in less varied performance in high-context-switch benchmarks like tst-ctxsw. This scheduler does very well on the fairness tests tst-scheduler and fairly well on tst-loadbalance. Even better performance on that second test will require an additional patch for the idle thread to wake other cpus' load balancing threads. 
As expected the new scheduler is somewhat slower than the existing one (as we now do some relatively complex calculations instead of trivial integer operations), but thanks to using approximations when possible and to various other optimizations, the difference is relatively small: On my laptop, tst-ctxsw.so, which measures "context switch" time (actually, also including the time to use mutex and condvar which this test uses to cause context switching), on the "colocated" test I measured 355 ns with the old scheduler, and 382 ns with the new scheduler - meaning that the new scheduler adds 27ns of overhead to every context switch. To see that this penalty is minor, consider that tst-ctxsw is an extreme example, doing 3 million context switches a second, and even there it only slows down the workload by 7%. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
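The idea of a decaying average of runtime can be illustrated with a short sketch. This uses a plain exponential decay with time constant tau, which is one common way to build such an average; it is an assumption for illustration and is not claimed to be the exact formula for R or Rtt in the design document or in the OSv code. The function names are hypothetical.

```python
import math

TAU = 0.2  # seconds; matches the default tau=200ms quoted in the commit message


def decay_update(r, dt, running, tau=TAU):
    """Advance the decaying average r by dt seconds spent running or sleeping.

    Old history is scaled by exp(-dt/tau); time spent running pulls r toward 1,
    time spent sleeping pulls it toward 0.
    """
    k = math.exp(-dt / tau)
    return r * k + ((1.0 - k) if running else 0.0)


def pick_next(runtimes):
    """Choose the runnable thread with the lowest decayed runtime."""
    return min(runtimes, key=runtimes.get)


hog = decay_update(0.0, 1.0, running=True)           # ran for a full second
sleeper = decay_update(hog, 3600.0, running=False)   # then slept for an hour
chosen = pick_next({"hog": hog, "sleeper": sleeper})
after_one_tau = decay_update(sleeper, TAU, running=True)
```

In this model the thread that slept for an hour is picked first, but after running for a single tau its measure has already climbed to about 0.63, so it is not owed an hour of uninterrupted CPU time, which is the question the commit message raises about raw cumulative runtime.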
-
Nadav Har'El authored
The schedule() and cpu::schedule() functions had a "yield" parameter. This parameter was inconsistently used (it's not clear why specific places called it with "true" and others with "false"), but moreover, was always ignored! So this patch removes the parameter of schedule(). If you really want a yield, call yield(), not schedule(). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The idle thread cpu::idle() waits for other threads to become runnable, and then lets them run. It used to yield the CPU by calling yield(), because in early OSv history we didn't have an idle priority so simply calling schedule() would not guarantee that the new thread, not the idle thread, will run. But now we actually do have an idle priority; if the run queue is not empty, we are sure that calling schedule() will run another thread, not the idle thread. So this patch calls schedule(), which is simpler, faster, and more reliable than yield(). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The scheduler (reschedule_from_interrupt()) changes the runtime of the current thread. This assumes that the current thread is not in the runqueue - because the runqueue is sorted by runtime, and modifying the runtime of a thread which is already in the runqueue ruins the sorted tree's invariants. Unfortunately, the existing code broke this assumption in two places: 1. When handle_incoming_wakeups() wakes up the current thread (i.e., a thread that prepared to wait but was woken before it could go to sleep), the current thread was queued. Instead, we need to simply return the thread to the "running" state. 2. yield() queued the current thread. Rather, it needs to just change its runtime, and reschedule_from_interrupt() will decide to queue this thread. This patch fixes the first problem. The second problem will be solved by a yield() rewrite which is part of the new scheduler in a later patch. By the way, after we fix both problems, we can also be sure that the strange if(n != thread::current()) in the scheduler is always true. This is because n, picked up from the run queue, could never be the current thread, because the current thread isn't in the run queue. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
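Why changing the runtime of a queued thread is harmful can be shown with any container sorted by key. The sketch below uses a sorted Python list in place of the scheduler's sorted tree; the `Thread` class and helper names are hypothetical stand-ins, not OSv types.

```python
import bisect


class Thread:
    def __init__(self, name, runtime):
        self.name = name
        self.runtime = runtime


def is_sorted(queue):
    """True if the queue is ordered by ascending runtime."""
    return all(a.runtime <= b.runtime for a, b in zip(queue, queue[1:]))


def enqueue(queue, thread):
    """Insert a thread at the position that keeps the queue sorted."""
    keys = [t.runtime for t in queue]
    queue.insert(bisect.bisect_right(keys, thread.runtime), thread)


# Buggy pattern: the current thread is in the queue while its runtime changes.
a, b, c = Thread("a", 1.0), Thread("b", 2.0), Thread("c", 3.0)
queue = [a, b, c]
a.runtime = 5.0
broken = is_sorted(queue)            # the ordering invariant no longer holds

# Intended pattern: the running thread is out of the queue while it is updated.
queue.remove(a)
a.runtime = 6.0
enqueue(queue, a)
repaired = is_sorted(queue)
order = [t.name for t in queue]
```

With a tree instead of a list the damage is less visible but the same: lookups and insertions assume an ordering that the in-place update has silently broken.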
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
v2: Let's convert everything to std::chrono::timepoint (Avi Kivity) v3: Use the to_timeptr approach suggested by Nadav Har'El This test checks the functionality of the utimes support. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
v2: Check limit of microseconds, among other minor changes (Nadav Har'El, Avi Kivity). v3: Get rid of goto & label by adding an else clause (Nadav Har'El). - This patch adds utimes support. - This patch addresses the issue #93 Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Tested-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Attribute flags were moved from 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' to 'include/osv/vnode_attr.h'. 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' now includes 'include/osv/vnode_attr.h' exactly at the place the flags were previously located. 'fs/vfs/vfs.h' includes 'include/osv/vnode_attr.h' as functions that rely on the setattr feature must specify the flags corresponding to the attr fields that are going to be changed. Approach suggested by Nadav Har'El. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Tested-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-