- May 21, 2014
-
-
Claudio Fontana authored
The thread_control_block structure needs to differ between x64 and AArch64. For AArch64's implementation of local-exec TLS, try to match the layout of glibc and the generated code. Do not align the .tdata and .tbss sections with .tdata : ALIGN(64), or it will break the TLS loads. Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Cc: Glauber Costa <glommer@cloudius-systems.com> Cc: Will Newton <will.newton@linaro.org> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- May 16, 2014
-
-
Claudio Fontana authored
move driver setup and console creation to arch-setup, and ioapic init for x64 to smp_launch, so that we can remove ifdefs and increase the amount of common code. Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com>
-
Claudio Fontana authored
allow execution to flow until main_cont so we can reach the backtrace. Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com>
-
Jani Kokkonen authored
implement fixup fault and the backtrace functionality which is its first simple user. Signed-off-by:
Jani Kokkonen <jani.kokkonen@huawei.com> [claudio: added elf changes to allow lookup and demangling to work] Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com>
-
- May 14, 2014
-
-
Claudio Fontana authored
and do it early (before the loop around init_array) Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
This introduces a simple timer-based sampling profiler which reuses our tracing infrastructure to collect samples. To enable the sampler from run.py, run it like this: $ scripts/run.py ... --sampler [frequency] where 'frequency' is an optional parameter overriding the sampling frequency. The default is 1000 (ticks per second). The higher the frequency, the bigger the sampling overhead; values that are too low will hurt profile accuracy. Ad-hoc sampler enabling is planned; the code already takes that into account. To see the profile you need to extract the trace: $ trace extract and then show it like this: $ trace prof All 'prof' options can be applied; for example, you can group by CPU: $ trace prof -g cpu Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Claudio Fontana authored
An effect of commit 9bbbe9dc is that no output is possible before the prio 'console' initializers have run. This change makes at least one API available really early (from boot code and premain). Document the requirements on the early console class regarding the write() method. Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- May 09, 2014
-
-
Jaspal Singh Dhillon authored
This patch fixes the case where OSv silently hangs when more than 64 CPUs are provided. Signed-off-by:
Jaspal Singh Dhillon <jaspal.iiith@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- May 05, 2014
-
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Apr 11, 2014
-
-
Nadav Har'El authored
Currently, in several cases when a bad command line is set in the image, such as an empty command line (as in "make image=empty") or one with invalid parameters (e.g., run.py -e "-a a"), we use abort(). abort() has two annoying "features": it hangs the VM forever, and it shows an ugly stack trace. Both are useful for debugging, but it doesn't make sense to need a debugger when just the command line is misconfigured; we just need to print a message and power off the VM. Calling osv::poweroff() this early during boot is fine after the previous patch, which fixed osv::poweroff(). By the way, running a non-existent file (e.g., 'run.py -e a') already had the correct behavior of powering off, not hanging. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Apr 02, 2014
-
-
Claudio Fontana authored
Get the command line and the ELF header start, then try to steer clear of apparently random limitations of the model during the early boot stage, and set up vectors as soon as possible to enable some minimal post-mortem info. Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com>
-
Claudio Fontana authored
Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com>
-
Nadav Har'El authored
Changes in v3, following Avi's review: * Use WITH_LOCK(migration_lock) instead of migrate_disable()/enable(). * Make the global RCU "generation" counter a static class variable, instead of a static function variable. Rename it "next_generation" (the name "generation" was grossly overloaded previously). * In rcu_synchronize(), use migration_lock to be sure we wake up the thread to which we just added work. * Use thread_handle, instead of thread*, for percpu_quiescent_state_thread. This is safer (an atomic variable, so we can't see it half-set on some esoteric CPU), and cleaner (no need to check t!=0). thread_handle is a bit of an overkill here, but it's not in a performance-sensitive area. The existing rcu_defer() used a global list of deferred work, protected by a global mutex. It also woke up the cleanup thread on every call. These decisions made rcu_dispose() noticeably slower than a regular delete, to the point that when commit 70502950 introduced an rcu_dispose() to every poll() call, we saw the performance of UDP memcached, which calls poll() on every request, drop by as much as 40%. The slowness of rcu_defer() was even more apparent in an artificial benchmark which repeatedly calls new and rcu_dispose from one or several concurrent threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose from a single thread (on a 4-CPU VM) takes a whopping 330 ns, and worse - when we have 4 threads on 4 CPUs in a tight new/rcu_dispose loop, the mutex contention, the fact that we free the memory on the "wrong" CPU, and the excessive context switches all bring the measurement to as much as 12,000 ns. With this patch the new/rcu_dispose numbers are down to 60 ns on a single thread (on 4 CPUs) and 111 ns on 4 concurrent threads (on 4 CPUs). This is a 5.5x - 120x speedup :-) This patch replaces the single list of functions with a per-cpu list. 
rcu_defer() can add more callbacks to this per-cpu list without a mutex, and instead of a single "garbage collection" thread running these callbacks, the per-cpu RCU thread, which we already had, is the one that runs the work deferred on this cpu's list. This per-cpu work is particularly effective for free() work (i.e., rcu_dispose()) because it is faster to free memory on the same CPU where it was allocated. This patch also eliminates the single "garbage collection" thread which the previous code needed. The per-CPU work queue has a fixed size, currently set to 2000 functions. It is actually a double-buffer, so we can continue to accumulate more work while cleaning up; if rcu_defer() is used so quickly that it outpaces the cleanup, rcu_defer() will wait until the buffer is no longer full. The choice of buffer size is a tradeoff between speed and memory: a larger buffer means fewer context switches (between the thread doing rcu_defer() and the RCU thread doing the cleanup), but also more memory temporarily being used by unfreed objects. Unlike the previous code, we do not wake up the cleanup thread after every rcu_defer(). When the RCU cleanup work is frequent but still small relative to the main work of the application (e.g., a memcached server), the RCU cleanup thread would always have a low runtime, which meant we suffered a context switch on almost every wakeup of this thread by rcu_defer(). In this patch, we only wake up the cleanup thread when the buffer becomes full, so we have far fewer context switches. This means that currently rcu_defer() may delay the cleanup an unbounded amount of time. This is normally not a problem, and when it is, namely in rcu_synchronize(), we wake up the thread immediately. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
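The per-CPU double-buffer described in this commit message can be sketched as a toy model. This is a hedged Python sketch, not OSv's actual C++ implementation; the class and method names (DeferQueue, defer, drain_once) are invented for illustration:

```python
import threading

class DeferQueue:
    """Toy model of the per-CPU deferred-work double-buffer described above.

    Callbacks accumulate in an active buffer; the cleanup thread is woken
    only when the buffer fills. On cleanup, the buffers are swapped so
    producers can keep queueing work while the old buffer is drained.
    """
    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.active = []          # buffer currently accepting callbacks
        self.draining = []        # buffer being run by the cleanup thread
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.work_ready = threading.Condition(self.lock)

    def defer(self, fn):
        with self.lock:
            # If producers outpace the cleanup, wait until space frees up.
            while len(self.active) >= self.capacity:
                self.work_ready.notify()      # wake the cleanup thread
                self.not_full.wait()
            self.active.append(fn)
            # Unlike the old global-list design, only wake the cleanup
            # thread when the buffer becomes full.
            if len(self.active) >= self.capacity:
                self.work_ready.notify()

    def drain_once(self):
        """One round of the per-CPU RCU thread's cleanup work."""
        with self.lock:
            self.active, self.draining = self.draining, self.active
            self.not_full.notify_all()
        for fn in self.draining:
            fn()                              # run deferred work (e.g. free)
        self.draining.clear()
```

The key design point the commit describes survives even in miniature: producers pay only an append in the common case, and a context switch happens once per buffer-full rather than once per deferral.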
-
- Apr 01, 2014
-
-
Avi Kivity authored
This reverts commit d24cda2c. It wants migration_lock to be merged first. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The existing rcu_defer() used a global list of deferred work, protected by a global mutex. It also woke up the cleanup thread on every call. These decisions made rcu_dispose() noticeably slower than a regular delete, to the point that when commit 70502950 introduced an rcu_dispose() to every poll() call, we saw the performance of UDP memcached, which calls poll() on every request, drop by as much as 40%. The slowness of rcu_defer() was even more apparent in an artificial benchmark which repeatedly calls new and rcu_dispose from one or several concurrent threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose from a single thread (on a 4-CPU VM) takes a whopping 330 ns, and worse - when we have 4 threads on 4 CPUs in a tight new/rcu_dispose loop, the mutex contention, the fact that we free the memory on the "wrong" CPU, and the excessive context switches all bring the measurement to as much as 12,000 ns. With this patch the new/rcu_dispose numbers are down to 60 ns on a single thread (on 4 CPUs) and 111 ns on 4 concurrent threads (on 4 CPUs). This is a 5.5x - 120x speedup :-) This patch replaces the single list of functions with a per-cpu list. rcu_defer() can add more callbacks to this per-cpu list without a mutex, and instead of a single "garbage collection" thread running these callbacks, the per-cpu RCU thread, which we already had, is the one that runs the work deferred on this cpu's list. This per-cpu work is particularly effective for free() work (i.e., rcu_dispose()) because it is faster to free memory on the same CPU where it was allocated. This patch also eliminates the single "garbage collection" thread which the previous code needed. The per-CPU work queue has a fixed size, currently set to 2000 functions. It is actually a double-buffer, so we can continue to accumulate more work while cleaning up; if rcu_defer() is used so quickly that it outpaces the cleanup, rcu_defer() will wait until the buffer is no longer full. 
The choice of buffer size is a tradeoff between speed and memory: a larger buffer means fewer context switches (between the thread doing rcu_defer() and the RCU thread doing the cleanup), but also more memory temporarily being used by unfreed objects. Unlike the previous code, we do not wake up the cleanup thread after every rcu_defer(). When the RCU cleanup work is frequent but still small relative to the main work of the application (e.g., a memcached server), the RCU cleanup thread would always have a low runtime, which meant we suffered a context switch on almost every wakeup of this thread by rcu_defer(). In this patch, we only wake up the cleanup thread when the buffer becomes full, so we have far fewer context switches. This means that currently rcu_defer() may delay the cleanup an unbounded amount of time. This is normally not a problem, and when it is, namely in rcu_synchronize(), we wake up the thread immediately. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Mar 27, 2014
-
-
Raphael S. Carvalho authored
Previously, the zfs device was provided only to allow the use of the commands needed to create the zpool, and thus the file system. At that time, doing so was quite enough; however, making the zfs device, i.e. /dev/zfs, part of every OSv instance allows us to use commands that help analyze, debug, and tune the zpool and the file systems it contains. The basic explanation is that those commands use libzfs, which in turn relies on /dev/zfs to communicate with the zfs code. Example commands: zpool, zfs, zdb (the latter not yet ported to OSv). This patch will also be helpful for the ongoing ztest porting. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Mar 25, 2014
-
-
Asias He authored
On VirtualBox and VMware, the version info is not printed correctly. Fix it by printing only after our console is initialized. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Mar 24, 2014
-
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Mar 06, 2014
-
-
Asias He authored
This driver is for VMware's pvscsi disk. It has better performance than using the AHCI device in VMware. It uses the common scsi code in scsi-common. The driver is written from scratch; the QEMU and Linux pvscsi drivers were used as a reference, as there is no specification available. Tested on QEMU's pvscsi implementation and VMware Workstation. Signed-off-by:
Asias He <asias@cloudius-systems.com>
-
- Mar 04, 2014
-
-
Nadav Har'El authored
Removed an unused declaration, which was unnecessary and caused a warning in Eclipse. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 12, 2014
-
-
Asias He authored
AHCI is supported by various VMMs, e.g., VirtualBox and VMware Workstation. Adding AHCI support enables OSv to run on them when the para-virtualized block device is not present or not yet supported. Tested on VirtualBox, VMware Workstation and QEMU. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 11, 2014
-
-
Claudio Fontana authored
move the arch-specific stuff in premain to arch/x64/arch-setup.cc. Introduce arch_init_premain() and arch_setup_tls(). arch_init_premain() is supposed to perform arch-specific initialization before the common premain code is run. arch_setup_tls() is run _after_ the common setup_tls code. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 07, 2014
-
-
Glauber Costa authored
When booting with --bootchart, OSv will print a summary of where our boot time is being spent, up to the point right before the execution of main. This mechanism can later be extended to keep measuring, using other facilities to account for the application, etc. Example output: OSv v0.05-156-gd3918a1 disk read (real mode): 132.94ms, (+132.94ms) .init functions: 146.10ms, (+13.16ms) SMP launched: 147.57ms, (+1.47ms) RCU initialized: 150.61ms, (+3.04ms) VFS initialized: 154.08ms, (+3.46ms) Network initialized: 160.79ms, (+6.71ms) pvpanic done: 162.31ms, (+1.52ms) pci enumerated: 171.45ms, (+9.14ms) drivers probe: 171.46ms, (+0.02ms) drivers loaded: 182.52ms, (+11.06ms) ZFS mounted: 2116.32ms, (+1933.80ms) Total time: 2116.70ms, (+0.38ms) Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
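The bootchart output in this message is just a list of named checkpoints with absolute times and running deltas. A minimal sketch of such a recorder (a Python model with invented names; OSv's implementation lives in its C++ boot path):

```python
import time

class BootChart:
    """Records named checkpoints and formats each one with its absolute
    time and the delta from the previous checkpoint, mirroring the
    '<name>: <t>ms, (+<delta>ms)' lines in the example output above."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.start = clock()      # time zero: first thing recorded at boot
        self.events = []          # list of (name, seconds since start)

    def mark(self, name):
        self.events.append((name, self.clock() - self.start))

    def report(self):
        lines = []
        prev = 0.0
        for name, t in self.events:
            lines.append("%s: %.2fms, (+%.2fms)"
                         % (name, t * 1000, (t - prev) * 1000))
            prev = t
        return "\n".join(lines)
```

The delta column is what makes the chart useful: in the example output it immediately shows ZFS mounting (+1933.80ms) dominating total boot time.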
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 06, 2014
-
-
Nadav Har'El authored
When running a command in the background, do_main_thread() passes the command line in a std::vector pointer to a new pthread. Unfortunately, soon afterwards the vector can go out of scope and the result is a crash. Fix this oversight. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
This patch registers the ARC shrinker by using the event handler list from the BSD side. When the ARC is initialized, it inserts the lowmem event handler into an external event handler list. lowmem basically signals the reclaiming thread, which then wakes up to decide which approach should be used to shrink the ARC. Memory pressure on OSv is activated when the 20% watermark is reached, so the shrink policy decides which shrinker should be called on such events. bsd_shrinker_init is responsible for finding the lowmem event handler in the external list and integrating it into our shrinker infrastructure. arc_lowmem needed a few changes to return the amount of memory released from the ARC. Glauber and I tested the functionality by filling up the ARC to its target, then allocating as much memory as possible to see if the ARC shrinker would kick in to release memory back to the operating system. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Jan 27, 2014
-
-
Nadav Har'El authored
Remove the unused #include of <drivers/clock.hh>. Except for the clock drivers and <osv/clock.hh>, no source file now includes this header. Rather, <osv/clock.hh> should be used. Code including <sched.hh> will also get <osv/clock.hh> automatically. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Several source files include <drivers/clockevent.hh>, though this is a very low-level feature which they don't actually use. sched.cc does use <drivers/clockevent.hh>, but already gets it through sched.hh, so it also doesn't need to include it explicitly. This patch removes the unnecessary includes. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 22, 2014
-
-
Nadav Har'El authored
Our loader's command line (what is given to the "-e" option of run.py) already allows running multiple commands (each a shared object with arguments) separated by a semicolon - e.g., run.py -e "program1.so; program2.so; program3.so" This patch allows, just like in Unix, using a "&" instead of a ";", in which case the preceding program is run in the background - in our case this means in a new thread. For example, run.py -e "httpserver.so& java.so ..." As before, a command line can comprise multiple commands, and whitespace around the separators (; or &) is optional. Take care if you intend to run the *same* object multiple times concurrently, e.g., "something.so& something.so". For an object to support this use case, it should support its main() being called in parallel, and in particular avoid using global variables. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
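The separator semantics described above (';' runs commands sequentially, '&' runs the preceding command in the background, whitespace around separators optional) can be modeled with a small parser. This is a hypothetical Python sketch of the splitting rules, not OSv's actual parser:

```python
import re

def parse_cmdline(cmdline):
    """Split a loader command line into (argv, background) pairs.

    ';' and '&' both terminate a command; '&' additionally marks the
    command *before* it to run in the background (a new thread in OSv).
    Whitespace around separators is optional.
    """
    commands = []
    # Split on the separators while keeping them, so we know how each
    # command was terminated.
    parts = re.split(r'([;&])', cmdline)
    for chunk, sep in zip(parts[0::2], parts[1::2] + ['']):
        argv = chunk.split()
        if argv:
            commands.append((argv, sep == '&'))
    return commands
```

For the example in the message, "httpserver.so& java.so ..." yields the HTTP server marked as background and the java command as the foreground command.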
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 21, 2014
-
-
Nadav Har'El authored
Add cwd (current directory) and env (environment variable) options to the loader. These can be useful for applications that expect to run in a certain directory, or that expect certain environment variables to exist. Example usage: run.py -e "--cwd=/tmp /usr/bin/something.so" Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
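Options like the --cwd=/tmp in the example above have to be peeled off the front of a command's argv before the program itself is invoked. A hedged Python sketch of that splitting step (the function name is invented, and the --env=KEY=VAL syntax is an assumption, not confirmed by the message; only --cwd= appears in the example):

```python
def split_loader_options(args):
    """Separate leading loader options from the program and its arguments.

    Recognizes --cwd=DIR and (assumed syntax) --env=KEY=VAL at the front
    of argv; everything from the first non-option token onward is the
    program to run.
    """
    opts = {"cwd": None, "env": {}}
    rest = list(args)
    while rest:
        arg = rest[0]
        if arg.startswith("--cwd="):
            opts["cwd"] = arg[len("--cwd="):]
        elif arg.startswith("--env="):
            key, _, val = arg[len("--env="):].partition("=")
            opts["env"][key] = val
        else:
            break                 # first non-option: the program itself
        rest.pop(0)
    return opts, rest
```

With the documented example, "--cwd=/tmp /usr/bin/something.so" splits into the cwd option and the shared object to run.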
-