- Dec 03, 2013
-
-
Avi Kivity authored
If we're going to have a vtable in there, the memset() will kill it. Instead, add initializers for those members not already initialized by make_file(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
The only caller, soo_close() can only be called from a context where no file references remain, so no further file API calls can be made. Remove it. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Subsumed by make_file(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
falloc() is inherently racy in that it installs an uninitialized file descriptor in a user accessible fd. It is also hard to use correctly when an error occurs. Luckily, we don't use it anywhere, so we can just remove it. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 01, 2013
-
-
Pekka Enberg authored
Enable device_destroy() API for the virtio-rng driver. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Before this patch, OSv crashes or continuously reboots when given unknown command line paramters, e.g., scripts/run.py -c1 -e "--help --z a" With this patch, it says, as expected that the "--z" option is not recognized, and displays the list of known options: unrecognised option '--z' OSv options: --help show help text --trace arg tracepoints to enable --trace-backtrace log backtraces in the tracepoint log --leak start leak detector after boot --nomount don't mount the file system --noshutdown continue running after main() returns --env arg set Unix-like environment variable (putenv()) --cwd arg set current working directory Aborted The problem was that to parse the command line options, we used Boost, which throws an exception when an unrecognized option is seen. We need to catch this exception, and show a message accordingly. But before this patch, C++ exceptions did not work correctly during this stage of the boot process, because exceptions use elf::program(), and we only set it up later. So this patch moves the setup of the elf::program() object earlier in the boot, to the beginning of main_cont(). Now we'll be able to use C++ exceptions throughout main_cont(), not just in command line parsing. This patch also removes the unused "filesystem" paramter of elf::program(), rather than move the initializion of this empty object as well. Fixes #103. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Needed for C++ conversion. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
In preparation for making 'file' a C++ type. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Unfortunately, C++ does not support designated initializers. Add a function that helps fill their place. Use example: -static struct netisr_handler ether_nh = { - .nh_name = "ether", - .nh_handler = ether_nh_input, - .nh_proto = NETISR_ETHER, - .nh_policy = NETISR_POLICY_SOURCE, - .nh_dispatch = NETISR_DISPATCH_DIRECT, -}; +static netisr_handler ether_nh = initialize_with([] (netisr_handler& x) { + x.nh_name = "ether"; + x.nh_handler = ether_nh_input; + x.nh_proto = NETISR_ETHER; + x.nh_policy = NETISR_POLICY_SOURCE; + x.nh_dispatch = NETISR_DISPATCH_DIRECT; +}); Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 26, 2013
-
-
Avi Kivity authored
We previously had the POSIX variant only. Implement the GNU variant as well, and update the header to point to the correct function based on the dialect selected. The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI standards. This fixes calls to strerror_r() from binaries which were compiled with _GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for BSD derived source. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Started adding Doxygen documentation for the scheduler. Currently only set_priority() and priority() are documented. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
This patch replaces the algorithm which the scheduler uses to keep track of threads' runtime, and to choose which thread to run next and for how long. The previous algorithm used the raw cumulative runtime of a thread as its runtime measure. But comparing these numbers directly was impossible: e.g., should a thread that slept for an hour now get an hour of uninterrupted CPU time? This resulted in a hodgepodge of heuristics which "modified" and "fixed" the runtime. These heuristics did work quite well in our test cases, but we were forced to add more and more unjustified heuristics and constants to fix scheduling bugs as they were discovered. The existing scheduler was especially problematic with thread migration (moving a thread from one CPU to another) as the runtime measure on one CPU was meaningless in another. This bug, if not corrected, (e.g., by the patch which I sent a month ago) can cause crucial threads to acquire exceedingly high runtimes by mistake, and resulted in the tst-loadbalance test using only one CPU on a two-CPU guest. The new scheduling algorithm follows a much more rigorous design, proposed by Avi Kivity in: https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing To make a long story short (read the document if you want all the details), the new algorithm is based on a runtime measure R which is the running decaying average of the thread's running time. It is a decaying average in the sense that the thread's act of running or sleeping in recent history is given more weight than its behavior a long time ago. This measure R can tell us which of the runnable threads to run next (the one with the lowest R), and using some highschool-level mathematics, we can calculate for how long to run this thread until it should be preempted by the next one. R carries the same meaning on all CPUs, so CPU migration becomes trivial. The actual implementation uses a normalized version of R, called R'' (Rtt in the code), which is also explained in detail in the document. This Rtt allows updating just the running thread's runtime - not all threads' runtime - as time passes, making the whole calculation much more tractable. The benefits of the new scheduler code over the existing one are: 1. A more rigourous design with fewer unjustified heuristics. 2. A thread's runtime measurement correctly survives a migration to a different CPU, unlike the existing code (which sometimes botches it up, leading to threads hanging). In particular, tst-loadbalance now gives good results for the "intermittent thread" test, unlike the previous code which in 50% of the runs caused one CPU to be completely wasted (when the load- balancing thread hung). 3. The new algorithm can look at a much longer runtime history than the previous algorithm did. With the default tau=200ms, the one-cpu intermittent thread test of tst-scheduler now provides good fairness for sleep durations of 1ms-32ms. The previous algorithm was never fair in any of those tests. 4. The new algorithm is more deterministic in its use of timers (with thyst=2_ms: up to 500 timers a second), resulting in less varied performance in high-context-switch benchmarks like tst-ctxsw. This scheduler does very well on the fairness tests tst-scheduler and fairly well on tst-loadbalance. Even better performance on that second test will require an additional patch for the idle thread to wake other cpus' load balanacing threads. As expected the new scheduler is somewhat slower than the existing one (as we now do some relatively complex calculations instead of trivial integer operations), but thanks to using approximations when possible and to various other optimizations, the difference is relatively small: On my laptop, tst-ctxsw.so, which measures "context switch" time (actually, also including the time to use mutex and condvar which this test uses to cause context switching), on the "colocated" test I measured 355 ns with the old scheduler, and 382 ns with the new scheduler - meaning that the new scheduler adds 27ns of overhead to every context switch. To see that this penalty is minor, consider that tst-ctxsw is an extreme example, doing 3 million context switches a second, and even there it only slows down the workload by 7%. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The schedule() and cpu::schedule() functions had a "yield" parameter. This parameter was inconsistently used (it's not clear why specific places called it with "true" and other with "false"), but moreover, was always ignored! So this patch removes the parameter of schedule(). If you really want a yield, call yield(), not schedule(). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Raphael S. Carvalho authored
v2: Check limit of microseconds, among other minor changes (Nadav Har'El, Avi Kivity). v3: Get rid of goto & label by adding an else clause (Nadav Har'El). - This patch adds utimes support. - This patch addresses the issue #93 Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Tested-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Attribute flags were moved from 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' to 'include/osv/vnode_attr.h' 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' now includes 'include/osv/vnode_attr.h' exactly at the place the flags were previously located. 'fs/vfs/vfs.h' includes 'include/osv/vnode_attr.h' as functions that rely on the setattr feature must specify the flags respective to the attr fields that are going to be changed. Approach sugested by Nadav Har'El Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Tested-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This patch causes incorrect usage of percpu<>/PERCPU() to cause compilation errors instead of silent runtime corruptions. Thanks to Dmitry for first noticing this issue in xen_intr.cc (see his separate patch), and to Avi for suggesting a compile-time fix. With this patch: 1. Using percpu<...> to *define* a per-cpu variable fails compilation. Instead, PERCPU(...) must be used for the definition, which is important because it places the variable in the ".percpu" section. 2. If a *declaration* is needed additionally (e.g., for a static class member), percpu<...> must be used, not PERCPU(). Trying to use PERCPU() for declaration will cause a compilation error. 3. PERCPU() only works on statically-constructed objects - global variables, static function-variables and static class-members. Trying to use it on a dynamically-constructed object - stack variable, class field, or operator new - will cause a compilation error. With this patch, the bug in xen_intr.cc would have been caught at compile time. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 25, 2013
-
-
Pekka Enberg authored
Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Switch to demand paging for anonymous virtual memory. I used SPECjvm2008 to verify performance impact. The numbers are mostly the same with few exceptions, most visible in the 'serial' benchmark. However, there's quite a lot of variance between SPECjvm2008 runs so I wouldn't read too much into them. As we need the demand paging mechanism and the performance numbers suggest that the implementation is reasonable, I'd merge the patch as-is and see optimize it later. Before: Running specJVM2008 benchmarks on an OSV guest. Score on compiler.compiler: 331.23 ops/m Score on compiler.sunflow: 131.87 ops/m Score on compress: 118.33 ops/m Score on crypto.aes: 41.34 ops/m Score on crypto.rsa: 204.12 ops/m Score on crypto.signverify: 196.49 ops/m Score on derby: 170.12 ops/m Score on mpegaudio: 70.37 ops/m Score on scimark.fft.large: 36.68 ops/m Score on scimark.lu.large: 13.43 ops/m Score on scimark.sor.large: 22.29 ops/m Score on scimark.sparse.large: 29.35 ops/m Score on scimark.fft.small: 195.19 ops/m Score on scimark.lu.small: 233.95 ops/m Score on scimark.sor.small: 90.86 ops/m Score on scimark.sparse.small: 64.11 ops/m Score on scimark.monte_carlo: 145.44 ops/m Score on serial: 94.95 ops/m Score on sunflow: 73.24 ops/m Score on xml.transform: 207.82 ops/m Score on xml.validation: 343.59 ops/m After: Score on compiler.compiler: 346.78 ops/m Score on compiler.sunflow: 132.58 ops/m Score on compress: 116.05 ops/m Score on crypto.aes: 40.26 ops/m Score on crypto.rsa: 206.67 ops/m Score on crypto.signverify: 194.47 ops/m Score on derby: 175.22 ops/m Score on mpegaudio: 76.18 ops/m Score on scimark.fft.large: 34.34 ops/m Score on scimark.lu.large: 15.00 ops/m Score on scimark.sor.large: 24.80 ops/m Score on scimark.sparse.large: 33.10 ops/m Score on scimark.fft.small: 168.67 ops/m Score on scimark.lu.small: 236.14 ops/m Score on scimark.sor.small: 110.77 ops/m Score on scimark.sparse.small: 121.29 ops/m Score on scimark.monte_carlo: 146.03 ops/m Score on serial: 87.03 ops/m Score on sunflow: 77.33 ops/m Score on xml.transform: 205.73 ops/m Score on xml.validation: 351.97 ops/m Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Add permission flags to VMAs. They will be used by mprotect() and the page fault handler. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 22, 2013
-
-
Avi Kivity authored
To prevent leaks when a file is close()d without an EPOLL_CTL_DEL, record epoll registrations in the file structure and remove them when the file is destroyed. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
use file::operator delete to ensure it is reclaimed via rcu, and let the rest of the cleanup happen via the destructor. This allows us to add other members to file, and let the standard construction/destruction sequence take place. Note the constructor is already used (falloc_noinstall()). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 21, 2013
-
-
Nadav Har'El authored
prio.hh defines various initialization priorities. The actual numbers don't matter, just the order between them. But when we add too many priorities between existing ones, we may hit a need to renumber. This is plain ugly, and reminds me of Basic programming ;-) So this patch switches to an enum (enum class, actually). We now just have a list of priority names in order, with no numbers. It would have been straightforward, if it weren't for a bug in GCC (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211 ) where the "init_priority" attribute doesn't accept the enum (while the "constructor" attribute does). Luckily, a simple workaround - explicitly casting to int - works. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
With epoll(), the lifetime of an ongoing poll may be longer than the lifetime of a file descriptor; if an fd is close()d then we expect it to be silently removed from the epoll. With the current implementation of epoll(), which just calls poll(), this is impossible to do correctly since poll() is implemented in terms of file descriptor. Add an intermedite do_poll() that works on file pointers. This allows a refactored epoll() to convert file descriptors to file pointers just once, and then a close()d and re-open()ed descriptor can be added without a problem. As a side effect, a lot of atomic operations (fget() and fdrop()) are saved. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 20, 2013
-
-
Glauber Costa authored
We may have threads that were initialized and started very early, before sched::init() took place. We can easily identify such threads: they are all threads that are in the thread list so far with the exception of the main thread. For those, we finish their initialization so they are now in a safe state. Also, some of them may have been started already. Since we cannot really start anything before the main thread, they were put in as special state called "prestarted". Every thread found in this state is started at this moment. Note how this code needs to run in the main thread itself, since we depend on initialization that will only happen inside switch_to_first to properly function those procedures. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
It may be that a thread that is initialized early is also started early. We need to somehow mark that thread as already started, so we can start it for real later when the scheduler is ready to go. We will do this by adding an extra state, prestarted. Later on, we will take action to start those threads properly. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
Since the thread list does not depend on nothing but the memory allocator, allocate it as early as we can. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
No reason at all for page_size to be in mmu.hh but huge_page_size in mmu.cc. Move it, so we can also use huge_page_size outside the mmu.cc scope. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 19, 2013
-
-
Raphael S. Carvalho authored
vop_eperm allows more code reuse (suggested by Glauber Costa) Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 10, 2013
-
-
dleifker@gmail.com authored
Fixed the following abort when build was running mkzfs.py. Done! eth0: ethernet address: 52:54:0:12:34:56 [I/20 dhcp]: Waiting for IP... [I/26 dhcp]: Server acknowledged IP for interface eth0 [I/26 dhcp]: Configuring eth0: ip 192.168.122.15 subnet mask 255.255.255.0 gateway 192.168.122.1 Aborted Signed-off-by:
David Leifker <dlei...@gmail.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Nov 07, 2013
-
-
Avi Kivity authored
Dead code, and boost::signals2::signal<> does the job better. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 04, 2013
-
-
Dmitry Fleytman authored
slop made page size by default because this is the most frequent case Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com>
-
- Oct 31, 2013
-
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
-
- Oct 30, 2013
-
-
Nadav Har'El authored
Add Doxygen comments to the condvar class. Only the C++ interface (condvar's methods) is documented, not the alternative C interface (condvar_* functions). A reminder: run "doxygen" and point your browser to doxyout/html/index.html to see the API documentation we have so far. A lot can still be added to this condvar documentation, including a good introduction to how to use condition variables, why they have a mutex, etc. But it's at least a start. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com>
-
- Oct 28, 2013
-
-
Pekka Enberg authored
Spotted by Clang: In file included from ../../loader.cc:27: In file included from ../../drivers/virtio-net.hh:12: In file included from ../../bsd/sys/net/if_var.h:80: In file included from ../../bsd/sys/sys/mbuf.h:40: In file included from ../../bsd/porting/uma_stub.h:161: ../../include/osv/percpu.hh:35:42: error: arithmetic on a pointer to void return reinterpret_cast<T*>(base + offset); ~~~~ ^ Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Oct 25, 2013
-
-
Pekka Enberg authored
C++ does not allow reinterpret_cast in constant expressions: http://stackoverflow.com/questions/10369606/constexpr-pointer-value GCC doesn't enforce it but Clang does. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Oct 24, 2013
-
-
Pekka Enberg authored
Spotted by Clang. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Spotted by Clang. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Spotted by Clang: ../../include/sched.hh:278:12: error: class 'arch_cpu' was previously declared as a struct [-Werror,-Wmismatched-tags] friend class arch_cpu; ^ ../../arch/x64/arch-cpu.hh:39:8: note: previous use is here struct arch_cpu { ^ ../../include/sched.hh:278:12: note: did you mean struct here? friend class arch_cpu; ^~~~~ struct Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Spotted by Clang: ../../include/osv/trace.hh:58:1: error: 'assigner_type' defined as a struct template here but previously declared as a class template [-Werror,-Wmismatched-tags] struct assigner_type<storage_args<s_args...>, runtime_args<r_args...>> { ^ ../../include/osv/trace.hh:45:1: note: did you mean struct here? class assigner_type; ^~~~~ struct Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-