- Dec 08, 2013
-
-
Glauber Costa authored
Right now we are taking a clock measurement very early, for cpu initialization. That forces an unnecessary dependency between the sched and clock initializations. Since that clock is used to determine for how long the cpu has been running, we can initialize the runtime later, when we init the idle thread. Nothing should be running before it. After doing this, we can move the sched initialization a bit earlier. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Currently, namei() calls vget() unconditionally if no dentry is found. This is wrong because the path can be a hard link that points to a vnode that's already in memory. To fix the problem: - Use the inode number as part of the hash in vget(). - Use vn_lookup() in vget() to make sure we have one vnode in memory per inode number. - Push the vget() calls down to individual filesystems and make VOP_LOOKUP return a vnode. - Drop the lock in vn_lookup() and assert that vnode_lock is held. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
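A minimal sketch of the "one vnode in memory per inode number" idea described above. The types `vnode` and `vnode_cache`, the integer `mount_id`, and the method names are hypothetical stand-ins chosen for illustration; they are not the OSv VFS structures, and locking is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical, simplified vnode: identified by mount id and inode number.
struct vnode {
    int mount_id;
    std::uint64_t ino;
    int refcnt;
};

// Cache keyed by (mount id, inode number), so that every hard link to the
// same inode resolves to the same in-memory vnode.
class vnode_cache {
public:
    // Look the vnode up first; create it only when it is not cached yet.
    vnode* get(int mount_id, std::uint64_t ino) {
        auto key = std::make_pair(mount_id, ino);
        auto it = _vnodes.find(key);
        if (it == _vnodes.end()) {
            std::unique_ptr<vnode> vp(new vnode{mount_id, ino, 0});
            it = _vnodes.emplace(key, std::move(vp)).first;
        }
        ++it->second->refcnt;
        return it->second.get();
    }

    std::size_t size() const { return _vnodes.size(); }

private:
    std::map<std::pair<int, std::uint64_t>, std::unique_ptr<vnode>> _vnodes;
};
```

With this shape, two paths that are hard links to inode 42 both receive the same `vnode*`, and only its reference count changes.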
-
- Dec 05, 2013
-
-
Avi Kivity authored
Prior to 65ccda4c (net: use a file derived class for sockets (socket_file)), ioctl()s for sockets were directed to linux_ioctl_socket() and thence to soo_ioctl(). However, that commit short-circuited linux_ioctl_socket() out and dispatched directly to what was previously known as soo_ioctl() (and became socket_file::ioctl()). This caused interface enumeration ioctl()s to fail, for example in Cassandra. Fix by bringing back the previous behaviour. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
A list can be slow to search for an element if we have many threads. Even under normal load, the number of threads we spawn may not be classified as huge, but it is not tiny either. Change it to a map so we can implement functions that operate on a given thread without that much overhead - O(1) for the common case. Note that ideally we would use an unordered_set, which doesn't require an extra key. However, that would also mean that the key is implicit and set to be of type key_type&. Threads are not very lightweight to create for search purposes, so we go for an id-as-key approach. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
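The id-as-key approach can be sketched as follows. This assumes a hash map (`std::unordered_map`), which matches the O(1) common case mentioned above; the `thread` struct and the `thread_registry` class are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, stripped-down thread object.
struct thread {
    unsigned long id;
    std::string name;
};

// Registry keyed by thread id: a lookup needs only the id, so no temporary
// thread object has to be constructed just to search for one.
class thread_registry {
public:
    void add(thread* t) { _threads[t->id] = t; }
    void remove(unsigned long id) { _threads.erase(id); }

    // Returns nullptr when no thread with this id is registered.
    thread* find(unsigned long id) const {
        auto it = _threads.find(id);
        return it == _threads.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<unsigned long, thread*> _threads;
};
```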
-
Glauber Costa authored
no users in tree. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
This restores the original behavior of osv::run that was in place before the mkfs.so and cpiod.so split committed a day ago. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
-
-
Avi Kivity authored
Everyone is now overriding file's virtual functions; we can make them pure virtual and remove fileops completely. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Unused. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Useful for C -> C++ conversions. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Not everyone wants it. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Derived file objects will be initialized by the class constructor, no need for fo_init(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 03, 2013
-
-
Raphael S. Carvalho authored
Besides simplifying mmu::map_file interface, let's make it more similar to mmu::map_anon. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
A ';' at the end of a parameter marks the end of a program's argument list. The goal of this patch is to be able to split mkfs.so into two parts, mkfs.so and cpiod.so. The patch uses a full spirit parser to escape "" and split commands around ';'. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
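To illustrate what splitting commands around ';' means, here is a hand-rolled splitter. It is not the Spirit grammar from the patch. The quoting rules are simplifying assumptions: arguments are separated by spaces, text inside double quotes is kept verbatim, and any unquoted ';' ends the current command. The function name `split_commands` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a command line into commands, each one a list of arguments.
std::vector<std::vector<std::string>> split_commands(const std::string& line) {
    std::vector<std::vector<std::string>> commands;
    std::vector<std::string> args;
    std::string arg;
    bool in_quotes = false;
    bool have_arg = false;   // true once the current argument has started

    auto end_arg = [&] {
        if (have_arg) {
            args.push_back(arg);
            arg.clear();
            have_arg = false;
        }
    };
    auto end_command = [&] {
        end_arg();
        if (!args.empty()) {
            commands.push_back(args);
            args.clear();
        }
    };

    for (char c : line) {
        if (c == '"') {
            in_quotes = !in_quotes;
            have_arg = true;      // "" on its own yields an empty argument
        } else if (in_quotes) {
            arg += c;             // spaces and ';' are literal inside quotes
        } else if (c == ' ') {
            end_arg();
        } else if (c == ';') {
            end_command();
        } else {
            arg += c;
            have_arg = true;
        }
    }
    end_command();
    return commands;
}
```

Under these assumptions, a line such as `mkfs.so; cpiod.so --prefix "a;b c"` yields two commands, and the quoted `a;b c` stays a single argument.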
-
Gleb Natapov authored
Storage/runtime arguments for a tracepoint can be inferred from the assign() function signature instead of being specified explicitly by storage_args/runtime_args. This makes boilerplate code smaller. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
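A sketch of how both argument lists could be inferred from a function signature with template partial specialization. It assumes that assign() takes the runtime arguments and returns a `std::tuple` of the storage arguments; the `signature` trait and `assign_example` are hypothetical names, not the tracepoint code itself.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Primary template, specialized below for plain function types.
template <typename F>
struct signature;

// For a function taking Runtime... and returning std::tuple<Storage...>,
// expose both lists as tuple types.
template <typename... Storage, typename... Runtime>
struct signature<std::tuple<Storage...>(Runtime...)> {
    using storage_args = std::tuple<Storage...>;
    using runtime_args = std::tuple<Runtime...>;
};

// Hypothetical assign(): maps what the caller passes to what gets stored.
inline std::tuple<int, long> assign_example(int fd, const char* buf, long len) {
    (void)buf;   // the buffer pointer is not stored in this example
    return std::make_tuple(fd, len);
}

using example_signature = signature<decltype(assign_example)>;

static_assert(std::is_same<example_signature::storage_args,
                           std::tuple<int, long>>::value,
              "storage arguments come from the return type");
static_assert(std::is_same<example_signature::runtime_args,
                           std::tuple<int, const char*, long>>::value,
              "runtime arguments come from the parameter list");
```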
-
Avi Kivity authored
The default is to dispatch directly to the corresponding member of f_ops, but that can be overridden. The fo_*() functions are redirected to dispatch via the virtual functions. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
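The dispatch scheme described above might look roughly like this, reduced to a single operation. The `fileops` table with one entry, `legacy_close`, and `socket_like_file` are illustrative assumptions and not the real definitions.

```cpp
#include <cassert>

struct file;

// C-style operations table (hypothetical, reduced to one entry).
struct fileops {
    int (*fo_close)(file* fp);
};

struct file {
    explicit file(const fileops* ops) : f_ops(ops) {}
    virtual ~file() {}

    // Default behaviour: dispatch to the corresponding member of f_ops.
    // Derived classes may override this.
    virtual int close() { return f_ops->fo_close(this); }

    const fileops* f_ops;
};

// The fo_*() entry point now goes through the virtual function.
inline int fo_close(file* fp) { return fp->close(); }

// A legacy implementation reached through the table.
inline int legacy_close(file*) { return 1; }
const fileops legacy_ops = { legacy_close };

// A converted file type that overrides the virtual function directly.
struct socket_like_file : file {
    socket_like_file() : file(&legacy_ops) {}
    int close() override { return 2; }
};
```

Unconverted file types keep working through the table, while converted ones bypass it, which allows file types to be migrated one at a time.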
-
Pekka Enberg authored
Add sys_umount2() and implement support for MNT_FORCE that will be used to force rootfs unmount at poweroff. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
If we're going to have a vtable in there, the memset() will kill it. Instead, add initializers for those members not already initialized by make_file(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
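A small sketch of the alternative to memset(): once a type has virtual functions, zeroing the whole object would also overwrite its hidden vtable pointer, so each member gets its own initializer instead. The struct name and the member names here are hypothetical.

```cpp
#include <cassert>

struct file_base {
    virtual ~file_base() {}
    virtual int type() const { return 0; }

    // Default member initializers take the place of
    // memset(fp, 0, sizeof(*fp)), which must not be applied to an
    // object that carries a vtable pointer.
    int f_flags = 0;
    int f_count = 0;
    long f_offset = 0;
    void* f_data = nullptr;
};
```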
-
Avi Kivity authored
The only caller, soo_close() can only be called from a context where no file references remain, so no further file API calls can be made. Remove it. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Subsumed by make_file(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
falloc() is inherently racy in that it installs an uninitialized file descriptor in a user accessible fd. It is also hard to use correctly when an error occurs. Luckily, we don't use it anywhere, so we can just remove it. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 01, 2013
-
-
Pekka Enberg authored
Enable device_destroy() API for the virtio-rng driver. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Before this patch, OSv crashes or continuously reboots when given unknown command line parameters, e.g.,

    scripts/run.py -c1 -e "--help --z a"

With this patch, it says, as expected, that the "--z" option is not recognized, and displays the list of known options:

    unrecognised option '--z'
    OSv options:
      --help               show help text
      --trace arg          tracepoints to enable
      --trace-backtrace    log backtraces in the tracepoint log
      --leak               start leak detector after boot
      --nomount            don't mount the file system
      --noshutdown         continue running after main() returns
      --env arg            set Unix-like environment variable (putenv())
      --cwd arg            set current working directory
    Aborted

The problem was that to parse the command line options, we used Boost, which throws an exception when an unrecognized option is seen. We need to catch this exception, and show a message accordingly. But before this patch, C++ exceptions did not work correctly during this stage of the boot process, because exceptions use elf::program(), and we only set it up later. So this patch moves the setup of the elf::program() object earlier in the boot, to the beginning of main_cont(). Now we'll be able to use C++ exceptions throughout main_cont(), not just in command line parsing.

This patch also removes the unused "filesystem" parameter of elf::program(), rather than move the initialization of this empty object as well.

Fixes #103. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
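The catch-and-report pattern can be sketched without Boost. Here `parse_options` is a hypothetical stand-in that throws on an unknown option, which is the behaviour the message above attributes to Boost. The option list is abbreviated, and `check_command_line` is also an invented name.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the real option parser: it throws when it
// sees an option it does not know.
inline void parse_options(const std::vector<std::string>& args) {
    static const std::set<std::string> known = {
        "--help", "--leak", "--nomount", "--noshutdown"
    };
    for (const auto& a : args) {
        if (a.compare(0, 2, "--") == 0 && !known.count(a)) {
            throw std::invalid_argument("unrecognised option '" + a + "'");
        }
    }
}

// Returns an empty string on success, or the message to print (followed by
// the list of known options) before aborting.
inline std::string check_command_line(const std::vector<std::string>& args) {
    try {
        parse_options(args);
        return "";
    } catch (const std::invalid_argument& e) {
        return e.what();
    }
}
```

The try/catch only helps if exceptions already work at that point in the boot, which is why the patch moves the elf::program() setup earlier.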
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Needed for C++ conversion. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
In preparation for making 'file' a C++ type. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Unfortunately, C++ does not support designated initializers. Add a function that helps fill their place. Use example:

    -static struct netisr_handler ether_nh = {
    -    .nh_name = "ether",
    -    .nh_handler = ether_nh_input,
    -    .nh_proto = NETISR_ETHER,
    -    .nh_policy = NETISR_POLICY_SOURCE,
    -    .nh_dispatch = NETISR_DISPATCH_DIRECT,
    -};
    +static netisr_handler ether_nh = initialize_with([] (netisr_handler& x) {
    +    x.nh_name = "ether";
    +    x.nh_handler = ether_nh_input;
    +    x.nh_proto = NETISR_ETHER;
    +    x.nh_policy = NETISR_POLICY_SOURCE;
    +    x.nh_dispatch = NETISR_DISPATCH_DIRECT;
    +});

Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
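One possible way to write such a helper is shown below, as a sketch only; the actual implementation in the tree may differ. The wrapper type `initializer_holder` and the `handler` struct are hypothetical. The target type is deduced through a templated conversion operator.

```cpp
#include <cassert>
#include <utility>

// Wraps a callable. Converting the wrapper to T value-initializes a T,
// lets the callable fill it in, and returns the result.
template <typename Func>
struct initializer_holder {
    Func func;

    template <typename T>
    operator T() const {
        T obj{};
        func(obj);
        return obj;
    }
};

template <typename Func>
initializer_holder<Func> initialize_with(Func func) {
    return initializer_holder<Func>{std::move(func)};
}

// Hypothetical struct standing in for netisr_handler.
struct handler {
    const char* name;
    int proto;
    int policy;
};

static handler example_handler = initialize_with([] (handler& x) {
    x.name = "ether";
    x.proto = 9;
    // x.policy is left untouched and stays zero from value-initialization.
});
```

As with designated initializers in C, members that the lambda does not assign end up zero-initialized.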
-
- Nov 26, 2013
-
-
Avi Kivity authored
We previously had the POSIX variant only. Implement the GNU variant as well, and update the header to point to the correct function based on the dialect selected. The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI standards. This fixes calls to strerror_r() from binaries which were compiled with _GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for BSD derived source. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
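The two variants differ in their return type and in who owns the message. The sketch below mimics both interfaces with hypothetical functions over a two-entry message table. It does not call the libc strerror_r(), so it behaves the same regardless of which dialect the compiler selects.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>

// Tiny message table for the sketch; a real libc covers every errno value.
inline const char* message_for(int errnum) {
    switch (errnum) {
    case ENOENT: return "No such file or directory";
    case EINVAL: return "Invalid argument";
    default:     return nullptr;
    }
}

// POSIX-style interface: always writes into buf and returns 0 on success
// or an error number on failure.
inline int posix_style_strerror_r(int errnum, char* buf, std::size_t buflen) {
    const char* msg = message_for(errnum);
    if (!msg) {
        return EINVAL;
    }
    if (std::strlen(msg) >= buflen) {
        return ERANGE;
    }
    std::strcpy(buf, msg);
    return 0;
}

// GNU-style interface: returns a pointer to the message, which may be a
// static string (buf untouched) or buf itself.
inline char* gnu_style_strerror_r(int errnum, char* buf, std::size_t buflen) {
    const char* msg = message_for(errnum);
    if (msg) {
        return const_cast<char*>(msg);
    }
    if (buflen > 0) {
        std::strncpy(buf, "Unknown error", buflen - 1);
        buf[buflen - 1] = '\0';
    }
    return buf;
}
```

A caller compiled against one interface but linked to the other would misinterpret the return value, which is the kind of mismatch the commit addresses.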
-
Nadav Har'El authored
Started adding Doxygen documentation for the scheduler. Currently only set_priority() and priority() are documented. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
This patch replaces the algorithm which the scheduler uses to keep track of threads' runtime, and to choose which thread to run next and for how long.

The previous algorithm used the raw cumulative runtime of a thread as its runtime measure. But comparing these numbers directly was impossible: e.g., should a thread that slept for an hour now get an hour of uninterrupted CPU time? This resulted in a hodgepodge of heuristics which "modified" and "fixed" the runtime. These heuristics did work quite well in our test cases, but we were forced to add more and more unjustified heuristics and constants to fix scheduling bugs as they were discovered. The existing scheduler was especially problematic with thread migration (moving a thread from one CPU to another), as the runtime measure on one CPU was meaningless on another. This bug, if not corrected (e.g., by the patch which I sent a month ago), could cause crucial threads to acquire exceedingly high runtimes by mistake, and resulted in the tst-loadbalance test using only one CPU on a two-CPU guest.

The new scheduling algorithm follows a much more rigorous design, proposed by Avi Kivity in: https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing

To make a long story short (read the document if you want all the details), the new algorithm is based on a runtime measure R which is the running decaying average of the thread's running time. It is a decaying average in the sense that the thread's act of running or sleeping in recent history is given more weight than its behavior a long time ago. This measure R can tell us which of the runnable threads to run next (the one with the lowest R), and using some highschool-level mathematics, we can calculate for how long to run this thread until it should be preempted by the next one. R carries the same meaning on all CPUs, so CPU migration becomes trivial.

The actual implementation uses a normalized version of R, called R'' (Rtt in the code), which is also explained in detail in the document. This Rtt allows updating just the running thread's runtime - not all threads' runtime - as time passes, making the whole calculation much more tractable.

The benefits of the new scheduler code over the existing one are:

1. A more rigorous design with fewer unjustified heuristics.
2. A thread's runtime measurement correctly survives a migration to a different CPU, unlike the existing code (which sometimes botches it up, leading to threads hanging). In particular, tst-loadbalance now gives good results for the "intermittent thread" test, unlike the previous code which in 50% of the runs caused one CPU to be completely wasted (when the load-balancing thread hung).
3. The new algorithm can look at a much longer runtime history than the previous algorithm did. With the default tau=200ms, the one-cpu intermittent thread test of tst-scheduler now provides good fairness for sleep durations of 1ms-32ms. The previous algorithm was never fair in any of those tests.
4. The new algorithm is more deterministic in its use of timers (with thyst=2_ms: up to 500 timers a second), resulting in less varied performance in high-context-switch benchmarks like tst-ctxsw.

This scheduler does very well on the fairness tests tst-scheduler and fairly well on tst-loadbalance. Even better performance on that second test will require an additional patch for the idle thread to wake other cpus' load-balancing threads.
As expected the new scheduler is somewhat slower than the existing one (as we now do some relatively complex calculations instead of trivial integer operations), but thanks to using approximations when possible and to various other optimizations, the difference is relatively small: On my laptop, tst-ctxsw.so, which measures "context switch" time (actually, also including the time to use mutex and condvar which this test uses to cause context switching), on the "colocated" test I measured 355 ns with the old scheduler, and 382 ns with the new scheduler - meaning that the new scheduler adds 27ns of overhead to every context switch. To see that this penalty is minor, consider that tst-ctxsw is an extreme example, doing 3 million context switches a second, and even there it only slows down the workload by 7%. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
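To make the idea of a decaying average concrete, here is a sketch that uses a textbook exponential decay with time constant tau. This formula is an assumption chosen for illustration. It is neither the normalized Rtt used in the code nor necessarily the exact definition in the design document, and the names `runtime_measure` and `pick_next` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Decaying average of "was the thread running?", a value in [0, 1].
struct runtime_measure {
    double r = 0.0;

    // Account for dt seconds spent either running or sleeping.
    // Old history is scaled down by exp(-dt / tau); time spent running
    // pulls the average towards 1, time spent sleeping towards 0.
    void advance(double dt, bool running, double tau = 0.2) {
        double decay = std::exp(-dt / tau);
        r = r * decay + (running ? 1.0 - decay : 0.0);
    }
};

// Pick the runnable thread with the lowest R. Expects a non-empty vector.
inline std::size_t pick_next(const std::vector<runtime_measure>& threads) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < threads.size(); ++i) {
        if (threads[i].r < threads[best].r) {
            best = i;
        }
    }
    return best;
}
```

In this model a thread that slept for a long time ends up with R close to zero rather than with an unbounded credit, which is how a decaying measure avoids the "slept for an hour" problem described above.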
-
Nadav Har'El authored
The schedule() and cpu::schedule() functions had a "yield" parameter. This parameter was inconsistently used (it's not clear why specific places called it with "true" and others with "false"), but moreover, was always ignored! So this patch removes the parameter of schedule(). If you really want a yield, call yield(), not schedule(). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Raphael S. Carvalho authored
v2: Check limit of microseconds, among other minor changes (Nadav Har'El, Avi Kivity). v3: Get rid of goto & label by adding an else clause (Nadav Har'El). - This patch adds utimes support. - This patch addresses the issue #93 Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Tested-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
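The microseconds limit check mentioned in the v2 note could look like the sketch below. The helper names are hypothetical, and the commit message does not say how the error is reported to the caller, so the helpers only return a boolean.

```cpp
#include <cassert>
#include <sys/time.h>

// A timeval is well-formed when its microseconds field is in [0, 1000000).
inline bool valid_timeval(const timeval& tv) {
    return tv.tv_usec >= 0 && tv.tv_usec < 1000000;
}

// utimes() takes two timevals: access time, then modification time.
// A null pointer means "use the current time" and is therefore valid.
inline bool valid_utimes_args(const timeval* times) {
    if (!times) {
        return true;
    }
    return valid_timeval(times[0]) && valid_timeval(times[1]);
}
```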
-
Raphael S. Carvalho authored
Attribute flags were moved from 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' to 'include/osv/vnode_attr.h'. 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' now includes 'include/osv/vnode_attr.h' exactly at the place where the flags were previously located. 'fs/vfs/vfs.h' includes 'include/osv/vnode_attr.h', as functions that rely on the setattr feature must specify the flags corresponding to the attr fields that are going to be changed. Approach suggested by Nadav Har'El. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Tested-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This patch causes incorrect usage of percpu<>/PERCPU() to cause compilation errors instead of silent runtime corruptions. Thanks to Dmitry for first noticing this issue in xen_intr.cc (see his separate patch), and to Avi for suggesting a compile-time fix. With this patch: 1. Using percpu<...> to *define* a per-cpu variable fails compilation. Instead, PERCPU(...) must be used for the definition, which is important because it places the variable in the ".percpu" section. 2. If a *declaration* is needed additionally (e.g., for a static class member), percpu<...> must be used, not PERCPU(). Trying to use PERCPU() for declaration will cause a compilation error. 3. PERCPU() only works on statically-constructed objects - global variables, static function-variables and static class-members. Trying to use it on a dynamically-constructed object - stack variable, class field, or operator new - will cause a compilation error. With this patch, the bug in xen_intr.cc would have been caught at compile time. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 25, 2013
-
-
Pekka Enberg authored
Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Switch to demand paging for anonymous virtual memory. I used SPECjvm2008 to verify performance impact. The numbers are mostly the same with few exceptions, most visible in the 'serial' benchmark. However, there's quite a lot of variance between SPECjvm2008 runs so I wouldn't read too much into them. As we need the demand paging mechanism and the performance numbers suggest that the implementation is reasonable, I'd merge the patch as-is and optimize it later.

Running specJVM2008 benchmarks on an OSV guest (scores in ops/m):

    Benchmark               Before    After
    compiler.compiler       331.23   346.78
    compiler.sunflow        131.87   132.58
    compress                118.33   116.05
    crypto.aes               41.34    40.26
    crypto.rsa              204.12   206.67
    crypto.signverify       196.49   194.47
    derby                   170.12   175.22
    mpegaudio                70.37    76.18
    scimark.fft.large        36.68    34.34
    scimark.lu.large         13.43    15.00
    scimark.sor.large        22.29    24.80
    scimark.sparse.large     29.35    33.10
    scimark.fft.small       195.19   168.67
    scimark.lu.small        233.95   236.14
    scimark.sor.small        90.86   110.77
    scimark.sparse.small     64.11   121.29
    scimark.monte_carlo     145.44   146.03
    serial                   94.95    87.03
    sunflow                  73.24    77.33
    xml.transform           207.82   205.73
    xml.validation          343.59   351.97

Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Add permission flags to VMAs. They will be used by mprotect() and the page fault handler. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
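A rough sketch of how per-VMA permission flags could be consulted by a fault handler and updated by an mprotect()-style call. The flag names, the `vma` layout, and both helper functions are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical permission bits.
enum vma_perm : unsigned {
    perm_read  = 1,
    perm_write = 2,
    perm_exec  = 4
};

// Hypothetical VMA covering the address range [start, end).
struct vma {
    std::uintptr_t start;
    std::uintptr_t end;
    unsigned perm;

    bool contains(std::uintptr_t addr) const {
        return addr >= start && addr < end;
    }
};

// What a page fault handler might ask: does this VMA permit the access?
inline bool access_allowed(const vma& v, std::uintptr_t addr, unsigned wanted) {
    return v.contains(addr) && (v.perm & wanted) == wanted;
}

// mprotect()-style update of the permissions of a whole VMA.
inline void protect(vma& v, unsigned new_perm) {
    v.perm = new_perm;
}
```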
-