- Apr 02, 2014
-
-
Nadav Har'El authored
Changes in v3, following Avi's review:
* Use WITH_LOCK(migration_lock) instead of migrate_disable()/enable().
* Make the global RCU "generation" counter a static class variable instead of a static function variable. Rename it "next_generation" (the name "generation" was grossly overloaded previously).
* In rcu_synchronize(), use migration_lock to be sure we wake up the thread to which we just added work.
* Use thread_handle, instead of thread*, for percpu_quiescent_state_thread. This is safer (atomic variable, so we can't see it half-set on some esoteric CPU), and cleaner (no need to check t!=0). thread_handle is a bit of an overkill here, but it's not in a performance-sensitive area.

The existing rcu_defer() used a global list of deferred work, protected by a global mutex. It also woke up the cleanup thread on every call. These decisions made rcu_dispose() noticeably slower than a regular delete, to the point that when commit 70502950 introduced an rcu_dispose() to every poll() call, we saw performance of UDP memcached, which calls poll() on every request, drop by as much as 40%.

The slowness of rcu_defer() was even more apparent in an artificial benchmark which repeatedly calls new and rcu_dispose from one or several concurrent threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose from a single thread (on a 4-CPU VM) takes a whopping 330 ns, and worse: when we have 4 threads on 4 CPUs in a tight new/rcu_dispose loop, the mutex contention, the fact we free the memory on the "wrong" CPU, and the excessive context switches all bring the measurement to as much as 12,000 ns.

With this patch the new/rcu_dispose numbers are down to 60 ns on a single thread (on 4 CPUs) and 111 ns on 4 concurrent threads (on 4 CPUs). This is a x5.5 - x120 speedup :-)

This patch replaces the single list of functions with a per-CPU list. rcu_defer() can add more callbacks to this per-CPU list without a mutex, and instead of a single "garbage collection" thread running these callbacks, the per-CPU RCU thread, which we already had, is the one that runs the work deferred on this CPU's list. This per-CPU work is particularly effective for free() work (i.e., rcu_dispose()) because it is faster to free memory on the same CPU where it was allocated. This patch also eliminates the single "garbage collection" thread which the previous code needed.

The per-CPU work queue has a fixed size, currently set to 2000 functions. It is actually a double buffer, so we can continue to accumulate more work while cleaning up. If rcu_defer() is used so quickly that it outpaces the cleanup, rcu_defer() will wait until the buffer is no longer full. The choice of buffer size is a tradeoff between speed and memory: a larger buffer means fewer context switches (between the thread doing rcu_defer() and the RCU thread doing the cleanup), but also more memory temporarily being used by unfreed objects.

Unlike the previous code, we do not wake up the cleanup thread after every rcu_defer(). When the RCU cleanup work is frequent but still small relative to the main work of the application (e.g., memcached server), the RCU cleanup thread would always have low runtime, which meant we suffered a context switch on almost every wakeup of this thread by rcu_defer(). In this patch, we only wake up the cleanup thread when the buffer becomes full, so we have far fewer context switches. This means that currently rcu_defer() may delay the cleanup an unbounded amount of time. This is normally not a problem, and when it is, namely in rcu_synchronize(), we wake up the thread immediately.

Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
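
The double-buffer idea described in the commit message above can be illustrated roughly as follows. This is a minimal sketch, not the code from the patch: the class name `percpu_deferred_work`, its members, and the template parameter are hypothetical, the sketch is single-threaded, and it does not wait for an RCU grace period before running callbacks. It only shows how callbacks accumulate in one buffer while the other is being drained, and that "the buffer just became full" is the one moment at which a cleanup thread would be woken.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical sketch: one such object would exist per CPU.
// NOT thread-safe, and no RCU grace period is awaited here.
template <std::size_t Capacity = 2000>  // 2000 is the size quoted in the commit message
class percpu_deferred_work {
public:
    bool full() const { return _count[_active] == Capacity; }

    // Queue a callback in the active buffer. Precondition: !full()
    // (real code would make the caller wait instead of asserting).
    // Returns true when this call filled the buffer, i.e. the point
    // at which the cleanup thread would be woken up.
    bool defer(std::function<void()> fn) {
        assert(!full());
        _buf[_active][_count[_active]++] = std::move(fn);
        return full();
    }

    // Flip buffers, then run everything accumulated in the old one.
    // Callbacks deferred while this runs land in the other buffer.
    // Returns the number of callbacks that were run.
    std::size_t cleanup() {
        const std::size_t old = _active;
        _active ^= 1;
        const std::size_t n = _count[old];
        for (std::size_t i = 0; i < n; ++i) {
            _buf[old][i]();
            _buf[old][i] = nullptr;  // release captured state
        }
        _count[old] = 0;
        return n;
    }

private:
    std::array<std::function<void()>, Capacity> _buf[2];
    std::size_t _count[2] = {0, 0};
    std::size_t _active = 0;
};
```

A caller would check the return value of `defer()` and wake the per-CPU cleanup thread only when it is true, which is the behaviour the commit message credits for the reduced number of context switches.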
-
- Apr 01, 2014
-
-
Avi Kivity authored
This reverts commit d24cda2c. It wants migration_lock to be merged first. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The existing rcu_defer() used a global list of deferred work, protected by a global mutex. It also woke up the cleanup thread on every call. These decisions made rcu_dispose() noticeably slower than a regular delete, to the point that when commit 70502950 introduced an rcu_dispose() to every poll() call, we saw performance of UDP memcached, which calls poll() on every request, drop by as much as 40%.

The slowness of rcu_defer() was even more apparent in an artificial benchmark which repeatedly calls new and rcu_dispose from one or several concurrent threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose from a single thread (on a 4-CPU VM) takes a whopping 330 ns, and worse: when we have 4 threads on 4 CPUs in a tight new/rcu_dispose loop, the mutex contention, the fact we free the memory on the "wrong" CPU, and the excessive context switches all bring the measurement to as much as 12,000 ns.

With this patch the new/rcu_dispose numbers are down to 60 ns on a single thread (on 4 CPUs) and 111 ns on 4 concurrent threads (on 4 CPUs). This is a x5.5 - x120 speedup :-)

This patch replaces the single list of functions with a per-CPU list. rcu_defer() can add more callbacks to this per-CPU list without a mutex, and instead of a single "garbage collection" thread running these callbacks, the per-CPU RCU thread, which we already had, is the one that runs the work deferred on this CPU's list. This per-CPU work is particularly effective for free() work (i.e., rcu_dispose()) because it is faster to free memory on the same CPU where it was allocated. This patch also eliminates the single "garbage collection" thread which the previous code needed.

The per-CPU work queue has a fixed size, currently set to 2000 functions. It is actually a double buffer, so we can continue to accumulate more work while cleaning up. If rcu_defer() is used so quickly that it outpaces the cleanup, rcu_defer() will wait until the buffer is no longer full. The choice of buffer size is a tradeoff between speed and memory: a larger buffer means fewer context switches (between the thread doing rcu_defer() and the RCU thread doing the cleanup), but also more memory temporarily being used by unfreed objects.

Unlike the previous code, we do not wake up the cleanup thread after every rcu_defer(). When the RCU cleanup work is frequent but still small relative to the main work of the application (e.g., memcached server), the RCU cleanup thread would always have low runtime, which meant we suffered a context switch on almost every wakeup of this thread by rcu_defer(). In this patch, we only wake up the cleanup thread when the buffer becomes full, so we have far fewer context switches. This means that currently rcu_defer() may delay the cleanup an unbounded amount of time. This is normally not a problem, and when it is, namely in rcu_synchronize(), we wake up the thread immediately.

Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Mar 27, 2014
-
-
Raphael S. Carvalho authored
Previously, the zfs device was provided only to allow the use of the commands needed to create the zpool, and thus the file system. That was enough at the time. However, making the zfs device, i.e. /dev/zfs, part of every OSv instance allows us to use commands that help with analysing, debugging, and tuning the zpool and the file systems it contains. The basic explanation is that those commands use libzfs, which in turn relies on /dev/zfs to communicate with the zfs code. Example commands: zpool, zfs, zdb. The last one has not been ported to OSv yet. This patch will also be helpful for the ongoing ztest porting. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Mar 25, 2014
-
-
Asias He authored
On VBOX and VMW, the version info is not printed correctly. Fix it by printing only after our console is initialized. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Mar 24, 2014
-
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Mar 06, 2014
-
-
Asias He authored
This driver is for VMware's pvscsi disk. It performs better than the AHCI device in VMware. This driver uses the common scsi code in scsi-common. This driver is written from scratch. The QEMU and Linux pvscsi drivers were used as references, as there's no specification available. Tested on QEMU's pvscsi implementation and VMware Workstation. Signed-off-by:
Asias He <asias@cloudius-systems.com>
-
- Mar 04, 2014
-
-
Nadav Har'El authored
Removed an unused declaration, which is unnecessary and causes a warning in Eclipse. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 12, 2014
-
-
Asias He authored
AHCI is supported on various VMMs, e.g. VirtualBox and VMware Workstation. Adding AHCI support enables OSv to run on them if the para-virtualized block device is not present or not supported yet. Tested on VirtualBox, VMware Workstation and QEMU. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 11, 2014
-
-
Claudio Fontana authored
Move the arch-specific stuff in premain to arch/x64/arch-setup.cc. Introduce arch_init_premain() and arch_setup_tls(). arch_init_premain() is supposed to perform arch-specific initialization before the common premain code is run. arch_setup_tls() is run _after_ the common setup_tls code. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 07, 2014
-
-
Glauber Costa authored
When booting with --bootchart, OSv will print a summary of where our boot time is being spent, up to the point right before our execution of main. This mechanism can be extended later to keep measuring using other facilities, to account for the application, etc.

Example output:

    OSv v0.05-156-gd3918a1
    disk read (real mode): 132.94ms, (+132.94ms)
    .init functions: 146.10ms, (+13.16ms)
    SMP launched: 147.57ms, (+1.47ms)
    RCU initialized: 150.61ms, (+3.04ms)
    VFS initialized: 154.08ms, (+3.46ms)
    Network initialized: 160.79ms, (+6.71ms)
    pvpanic done: 162.31ms, (+1.52ms)
    pci enumerated: 171.45ms, (+9.14ms)
    drivers probe: 171.46ms, (+0.02ms)
    drivers loaded: 182.52ms, (+11.06ms)
    ZFS mounted: 2116.32ms, (+1933.80ms)
    Total time: 2116.70ms, (+0.38ms)

Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
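
The output format of the bootchart commit above (cumulative time followed by the delta since the previous event) can be reproduced with a small recorder like the one below. This is only a sketch under assumptions: the names `boot_chart`, `boot_event`, `event()` and `summary()` are made up for illustration, and timestamps are passed in explicitly as milliseconds rather than read from a clock as a real implementation would do.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one boot milestone.
struct boot_event {
    std::string name;
    double ms;  // milliseconds since the start of boot
};

// Hypothetical recorder: collects milestones in order and formats them
// as "<name>: <cumulative>ms, (+<delta>ms)", one per line.
class boot_chart {
public:
    void event(std::string name, double ms) {
        _events.push_back({std::move(name), ms});
    }

    std::string summary() const {
        std::string out;
        double prev = 0.0;  // the first delta is measured from time zero
        char line[160];
        for (const auto& e : _events) {
            std::snprintf(line, sizeof(line), "%s: %.2fms, (+%.2fms)\n",
                          e.name.c_str(), e.ms, e.ms - prev);
            out += line;
            prev = e.ms;
        }
        return out;
    }

private:
    std::vector<boot_event> _events;
};
```

Keeping only the cumulative timestamp per event and deriving the delta at print time means a new milestone can be added anywhere in the boot sequence without touching its neighbours.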
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 06, 2014
-
-
Nadav Har'El authored
When running a command in the background, do_main_thread() passes the command line in a std::vector pointer to a new pthread. Unfortunately, soon afterwards the vector can go out of scope and the result is a crash. Fix this oversight. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
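
The lifetime problem described in the commit above is a general one: a new thread receives a pointer to a vector that the caller destroys soon afterwards. One common way to avoid it is to give the thread its own copy of the arguments. The sketch below shows that approach with `std::thread`; it is not taken from the patch, which may have fixed the issue differently, and `fake_main()` and `run_in_background()` are hypothetical names.

```cpp
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a program's main(): returns the argument count.
int fake_main(const std::vector<std::string>& args) {
    return static_cast<int>(args.size());
}

// Start fake_main() in a new thread. The arguments are taken by value and
// moved into the lambda, so the thread owns them and they stay valid even
// if the caller's vector goes out of scope right after this call returns.
// `result` must outlive the thread; the caller joins before reading it.
std::thread run_in_background(std::vector<std::string> args, int& result) {
    return std::thread([args = std::move(args), &result] {
        result = fake_main(args);
    });
}
```

The same effect can be had with pthreads by allocating the copy on the heap and letting the thread function delete it when it is done.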
-
Raphael S. Carvalho authored
This patch registers the ARC shrinker by using the event handler list from the BSD side. When ARC is initialized, it inserts the lowmem event handler into an external event handler list. lowmem basically signals the reclaiming thread, which will then wake up to decide which approach should be used to shrink the ARC. Memory pressure on OSv is activated when the 20% watermark is reached, so the shrink policy will decide which shrinker should be called on such events. bsd_shrinker_init is responsible for finding the lowmem event handler in the external list and integrating it into our shrinker infrastructure. arc_lowmem needed a few changes to return the amount of memory released from the ARC. Glauber and I tested the functionality by filling up the ARC up to its target, then allocating as much memory as possible to see if the ARC shrinker would kick in to release memory back to the operating system. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Jan 27, 2014
-
-
Nadav Har'El authored
Remove unused #include of <drivers/clock.hh>. Except for the clock drivers and <osv/clock.hh>, no source file now includes this header. Rather, <osv/clock.hh> should be used. Code including <sched.hh> will also get <osv/clock.hh> automatically. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Several source files include <drivers/clockevent.hh>, though this is a very low-level feature which they don't actually use. sched.cc does use <drivers/clockevent.hh>, but already gets it through sched.hh, so also doesn't need to include it explicitly. This patch removes the unnecessary includes. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 22, 2014
-
-
Nadav Har'El authored
Our loader's command line (what is given to the "-e" option of run.py) already allows running multiple commands (each a shared object with arguments) separated with a semicolon, e.g.,

    run.py -e "program1.so; program2.so; program3.so"

This patch allows, just like in Unix, using a "&" instead of a ";", in which case the preceding program is run in the background, which in our case means in a new thread. For example,

    run.py -e "httpserver.so& java.so ..."

As before, a command line can consist of multiple commands, and whitespace around the separators (; or &) is optional.

Take care if you intend to run the *same* object multiple times concurrently, e.g., "something.so& something.so". For an object to support this use case, it should support its main() being called in parallel, and in particular avoid using global variables.

Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
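
The separator rules from the commit above (";" runs the preceding command in the foreground, "&" runs it in the background, whitespace around separators is optional) can be sketched as a small splitter. This is an illustration only, not the loader's parser: `command`, `trim()` and `split_commands()` are hypothetical names, and the sketch does not handle quoting or escaping of the separator characters.

```cpp
#include <string>
#include <vector>

// Hypothetical parsed command: the program with its arguments, plus
// whether it was terminated by '&' and should run in the background.
struct command {
    std::string text;
    bool background;
};

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split a command line on ';' and '&'. A command followed by '&' is
// marked as background; the last command, or one followed by ';',
// is foreground. Empty commands are skipped.
std::vector<command> split_commands(const std::string& line) {
    std::vector<command> out;
    std::string cur;
    for (char c : line) {
        if (c == ';' || c == '&') {
            const std::string t = trim(cur);
            if (!t.empty()) {
                out.push_back({t, c == '&'});
            }
            cur.clear();
        } else {
            cur += c;
        }
    }
    const std::string t = trim(cur);
    if (!t.empty()) {
        out.push_back({t, false});
    }
    return out;
}
```

With this splitter, "httpserver.so& java.so ..." yields a background `httpserver.so` followed by a foreground `java.so ...`, matching the example in the commit message.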
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 21, 2014
-
-
Nadav Har'El authored
Add cwd (current directory) and env (environment variable) options to the loader. They can be useful for applications that expect to run in a certain directory, or expect certain environment variables to exist. Example usage: run.py -e "--cwd=/tmp /usr/bin/something.so" Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 16, 2014
-
-
Pekka Enberg authored
Remove the copyright statement from boot log but keep the OSv version banner there to tidy up boot. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Since recently, debug() messages are by default not visible. This means they are not a good way to print error messages :-) This patch fixes two error messages from loader.cc, that were no longer visible. For example, "scripts/run.py -e xyz" completed without any message instead of telling the user that "xyz" doesn't exist. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 15, 2014
-
-
Eduardo Piva authored
Add a --verbose option to OSv, so that it flushes the buffer after handling the kernel option and sets the verbose boolean flag in the debug code. Signed-off-by:
Eduardo Piva <efpiva@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 10, 2014
-
-
Nadav Har'El authored
When trying to run a nonexistent program, e.g., "run.py -e xyz", sometimes (especially in a debug build) we would see the spurious message:

    program xyz returned -16384

The bug is simple: when osv::run returns null, meaning the program was not found, it does not set the "ret" value. Our code checked if ret!=0 before checking if osv::run actually found the program, which is wrong. Simply changing the order of the code solves this bug.

Fixes #156.

Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
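
The ordering fix described in the commit above amounts to checking whether the program was found before looking at its return value. The sketch below illustrates that order with a stub; `run_stub()`, `program`, `report()`, the file names, and the message texts are all hypothetical stand-ins and do not reproduce the real osv::run signature or the loader's actual messages.

```cpp
#include <memory>
#include <string>

struct program {};  // hypothetical handle to a loaded program

// Hypothetical stand-in for osv::run(): returns null when the program is
// not found and, in that case, leaves *ret untouched.
std::shared_ptr<program> run_stub(const std::string& path, int* ret) {
    if (path == "exists.so") {
        *ret = 0;
    } else if (path == "fails.so") {
        *ret = 7;
    } else {
        return nullptr;
    }
    return std::make_shared<program>();
}

// Returns the message to print, or an empty string on success.
std::string report(const std::string& path) {
    int ret = -1;  // placeholder; only meaningful if the program was found
    auto prog = run_stub(path, &ret);
    if (!prog) {
        // Check "found" first: ret was never set, so it must not be read.
        return "program " + path + " not found";
    }
    if (ret != 0) {
        return "program " + path + " returned " + std::to_string(ret);
    }
    return "";
}
```

Swapping the two `if` blocks in `report()` would bring back the reported symptom: a missing program would be described as having returned whatever value `ret` happened to hold.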
-
Takuya ASADA authored
Add --vga on cmdline, to switch vga console. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
It's 2014 now. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 02, 2014
-
-
Tomasz Grabiec authored
In order to reuse the logic it needs to be extracted. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 27, 2013
-
-
Vlad Zolotarov authored
Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 20, 2013
-
-
Raphael S. Carvalho authored
Currently, library objects are leaked by run_main() when osv::run() succeeds, which, in addition to leaking memory, makes the dcache leak directory entries, causing further problems. Releasing library objects is fine, as even dependent objects will be released automatically. I have tested it, and the dcache no longer has any leaked dentries. This problem was found in our attempt to implement the dentry hierarchy with help from Avi Kivity. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Reviewed-by:
Asias He <asias@cloudius-systems.com> [ penberg: improve changelog ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
We are under the virtio namespace, so it makes no sense to repeat the virtio prefix again in the virtio_rng driver. Change the naming from virtio::virtio_rng to virtio::rng. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
We are under the virtio namespace, so it makes no sense to repeat the virtio prefix again in the virtio_net driver. Change the naming from virtio::virtio_net to virtio::net. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 19, 2013
-
-
Asias He authored
We are under the virtio namespace, so it makes no sense to repeat the virtio prefix again in the virtio_blk driver. Change the naming from virtio::virtio_blk to virtio::blk. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-