Commits · e5fc1f1b7c4743c7ca54262c4515b4f1b1003638 · Verlässliche Systemsoftware / projects / osv

Apr 02, 2014

Nadav Har'El authored 10 years ago


Changes in v3, following Avi's review:
* Use WITH_LOCK(migration_lock) instead of migrate_disable()/enable().
* Make the global RCU "generation" counter a static class variable,
  instead of static function variable. Rename it "next_generation"
  (the name "generation" was grossly overloaded previously)
* In rcu_synchronize(), use migration_lock to be sure we wake up the
  thread to which we just added work.
* Use thread_handle, instead of thread*, for percpu_quiescent_state_thread.
  This is safer (atomic variable, so we can't see it half-set on some
  esoteric CPU), and cleaner (no need to check t!=0). thread_handle is
  a bit of overkill here, but it's not in a performance-sensitive area.
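
The first change replaces a manual migrate_disable()/enable() pair with a scoped WITH_LOCK(migration_lock). A minimal sketch of how a WITH_LOCK-style macro can provide that scoping through RAII, assuming a `for` statement wrapped around a std::lock_guard. `WITH_LOCK_SKETCH` and `migration_lock_sketch` are hypothetical illustrations, not OSv's actual definitions.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Hypothetical WITH_LOCK-style macro (not OSv's definition): runs the
// following statement exactly once with the lock held, and releases the
// lock when the statement finishes (including via break), thanks to RAII.
#define WITH_LOCK_SKETCH(lk)                                                  \
    for (std::lock_guard<typename std::remove_reference<decltype(lk)>::type> \
             with_lock_guard_(lk), *with_lock_once_ = &with_lock_guard_;      \
         with_lock_once_; with_lock_once_ = nullptr)

// Hypothetical stand-in for a migration lock: a BasicLockable that only
// tracks nesting depth, much like a "migration disabled" counter.
struct migration_lock_sketch {
    int depth = 0;
    void lock() { ++depth; }
    void unlock() { --depth; }
};
```

Unlike a hand-written disable/enable pair, the unlock cannot be forgotten on an early exit from the block.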

The existing rcu_defer() used a global list of deferred work, protected by
a global mutex. It also woke up the cleanup thread on every call. These
decisions made rcu_dispose() noticeably slower than a regular delete, to the
point that when commit 70502950 added an rcu_dispose() to every poll() call,
the performance of UDP memcached, which calls poll() on every request,
dropped by as much as 40%.

The slowness of rcu_defer() was even more apparent in an artificial benchmark
which repeatedly calls new and rcu_dispose() from one or several concurrent
threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose()
pair from a single thread (on a 4-CPU VM) takes a whopping 330 ns. Worse,
when we have 4 threads on 4 CPUs in a tight new/rcu_dispose() loop, the mutex
contention, the fact that we free the memory on the "wrong" CPU, and the
excessive context switches all bring the measurement to as much as 12,000 ns.

With this patch, the new/rcu_dispose() numbers are down to 60 ns on a single
thread (on 4 CPUs) and 111 ns with 4 concurrent threads (on 4 CPUs). This is
a 5.5x to roughly 108x speedup :-)

This patch replaces the single list of functions with a per-cpu list.
rcu_defer() can add more callbacks to this per-cpu list without a mutex,
and instead of a single "garbage collection" thread running these callbacks,
the per-cpu RCU thread, which we already had, is the one that runs the work
deferred on this cpu's list. This per-cpu work is particularly effective
for free() work (i.e., rcu_dispose()) because it is faster to free memory
on the same CPU where it was allocated. This patch also eliminates the
single "garbage collection" thread which the previous code needed.

The per-CPU work queue has a fixed size, currently set to 2000 functions.
It is actually a double buffer, so we can continue to accumulate more work
while cleaning up. If rcu_defer() is called so quickly that it outpaces the
cleanup, rcu_defer() will wait until the buffer is no longer full.
The choice of buffer size is a tradeoff between speed and memory: a larger
buffer means fewer context switches (between the thread doing rcu_defer()
and the RCU thread doing the cleanup), but also more memory temporarily
being used by unfreed objects.

Unlike the previous code, we do not wake up the cleanup thread after
every rcu_defer(). When the RCU cleanup work is frequent but still small
relative to the main work of the application (e.g., a memcached server),
the RCU cleanup thread always had a short runtime, which meant we suffered
a context switch on almost every wakeup of this thread by rcu_defer().
In this patch, we only wake up the cleanup thread when the buffer becomes
full, so we have far fewer context switches. This means that currently
rcu_defer() may delay the cleanup for an unbounded amount of time. This is
normally not a problem, and when it is, namely in rcu_synchronize(),
we wake up the thread immediately.
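
A minimal, portable sketch of the double buffer and wakeup policy described above, modelling a single CPU's queue. It assumes std::mutex and std::condition_variable as stand-ins for OSv's per-CPU data and thread wakeups (OSv itself adds work without a mutex, relying on migration_lock), and it omits the RCU grace period. `deferred_queue` and its members are hypothetical names, not OSv's API; OSv's per-CPU buffer holds 2000 functions, while the capacity here is a constructor parameter.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical one-CPU deferred-work queue (capacity must be >= 1).
// Producers append to filling_; the cleanup thread swaps it with
// processing_ and runs that batch while producers keep filling.
class deferred_queue {
public:
    explicit deferred_queue(std::size_t capacity)
        : cap_(capacity), worker_([this] { cleanup_loop(); }) {}

    ~deferred_queue() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();  // the worker drains whatever is still queued
    }

    // Like rcu_defer(): no wakeup unless this call fills the buffer.
    // If the buffer is already full, wait until it is no longer full.
    void defer(std::function<void()> fn) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return filling_.size() < cap_; });
        assert(filling_.size() < cap_);
        filling_.push_back(std::move(fn));
        ++submitted_;
        if (filling_.size() == cap_) {
            cv_.notify_all();  // buffer just became full: wake the cleanup thread
        }
    }

    // Like rcu_synchronize(): wake the cleanup thread immediately and wait
    // until everything deferred so far has run.
    void synchronize() {
        std::unique_lock<std::mutex> lk(m_);
        const std::size_t target = submitted_;
        if (completed_ >= target) return;
        flush_ = true;
        cv_.notify_all();
        cv_.wait(lk, [&] { return completed_ >= target; });
    }

private:
    void cleanup_loop() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] {
                return stop_ || filling_.size() == cap_ ||
                       (flush_ && !filling_.empty());
            });
            if (filling_.empty()) return;  // only possible when stopping
            processing_.swap(filling_);    // double buffer: producers get the empty one
            flush_ = false;
            cv_.notify_all();              // producers blocked on a full buffer may go on
            lk.unlock();
            // Real RCU would wait for a grace period here before running the batch.
            for (auto& fn : processing_) fn();
            const std::size_t n = processing_.size();
            processing_.clear();
            lk.lock();
            completed_ += n;
            cv_.notify_all();              // let synchronize() callers re-check
        }
    }

    const std::size_t cap_;
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<std::function<void()>> filling_, processing_;
    std::size_t submitted_ = 0, completed_ = 0;
    bool flush_ = false, stop_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

A larger capacity means fewer wakeups of the cleanup thread, at the cost of more not-yet-freed objects held in the buffers.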

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

e5fc1f1b

Apr 01, 2014

Revert "rcu: Per-CPU rcu_defer()" · 6d68d1ab

Avi Kivity authored 10 years ago


This reverts commit d24cda2c, which depends on migration_lock
being merged first.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

6d68d1ab

rcu: Per-CPU rcu_defer() · d24cda2c

Nadav Har'El authored 10 years ago

The existing rcu_defer() used a global list of deferred work, protected by
a global mutex. It also woke up the cleanup thread on every call. These
decisions made rcu_dispose() noticeably slower than a regular delete, to the
point that when commit 70502950 added an rcu_dispose() to every poll() call,
the performance of UDP memcached, which calls poll() on every request,
dropped by as much as 40%.

The slowness of rcu_defer() was even more apparent in an artificial benchmark
which repeatedly calls new and rcu_dispose() from one or several concurrent
threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose()
pair from a single thread (on a 4-CPU VM) takes a whopping 330 ns. Worse,
when we have 4 threads on 4 CPUs in a tight new/rcu_dispose() loop, the mutex
contention, the fact that we free the memory on the "wrong" CPU, and the
excessive context switches all bring the measurement to as much as 12,000 ns.

With this patch, the new/rcu_dispose() numbers are down to 60 ns on a single
thread (on 4 CPUs) and 111 ns with 4 concurrent threads (on 4 CPUs). This is
a 5.5x to roughly 108x speedup :-)

This patch replaces the single list of functions with a per-cpu list.
rcu_defer() can add more callbacks to this per-cpu list without a mutex,
and instead of a single "garbage collection" thread running these callbacks,
the per-cpu RCU thread, which we already had, is the one that runs the work
deferred on this cpu's list. This per-cpu work is particularly effective
for free() work (i.e., rcu_dispose()) because it is faster to free memory
on the same CPU where it was allocated. This patch also eliminates the
single "garbage collection" thread which the previous code needed.

The per-CPU work queue has a fixed size, currently set to 2000 functions.
It is actually a double buffer, so we can continue to accumulate more work
while cleaning up. If rcu_defer() is called so quickly that it outpaces the
cleanup, rcu_defer() will wait until the buffer is no longer full.
The choice of buffer size is a tradeoff between speed and memory: a larger
buffer means fewer context switches (between the thread doing rcu_defer()
and the RCU thread doing the cleanup), but also more memory temporarily
being used by unfreed objects.

Unlike the previous code, we do not wake up the cleanup thread after
every rcu_defer(). When the RCU cleanup work is frequent but still small
relative to the main work of the application (e.g., a memcached server),
the RCU cleanup thread always had a short runtime, which meant we suffered
a context switch on almost every wakeup of this thread by rcu_defer().
In this patch, we only wake up the cleanup thread when the buffer becomes
full, so we have far fewer context switches. This means that currently
rcu_defer() may delay the cleanup for an unbounded amount of time. This is
normally not a problem, and when it is, namely in rcu_synchronize(),
we wake up the thread immediately.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

d24cda2c

Mar 27, 2014

drivers: Add zfs device to allow use of zfs commands · ff3534e2

Raphael S. Carvalho authored 10 years ago


Previously, the zfs device was provided only to allow the commands needed
to create the zpool, and thus the file system. At the time, that was enough.
However, making the zfs device, i.e., /dev/zfs, part of every OSv instance
lets us use commands that help analyze, debug, and tune the zpool and the
file systems it contains.

The reason is that these commands use libzfs, which in turn relies on
/dev/zfs to communicate with the ZFS code.

Example commands: zpool, zfs and zdb (the last of which has not been
ported to OSv yet). This patch will also help with the ongoing ztest port.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ff3534e2

Mar 25, 2014

loader: Print OSv version info correctly · c9c94f51

Asias He authored 11 years ago


On VBOX and VMW, the version info is not printed correctly.
Fix it by printing it only after our console is initialized.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c9c94f51

Mar 24, 2014

Add vmxnet3 driver · 7a78ad84

Takuya ASADA authored 11 years ago


Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7a78ad84

Mar 06, 2014

vmw-pvscsi: Initial support · 81fdc730

Asias He authored 11 years ago


This driver is for VMware's pvscsi disk. It performs better than the
AHCI device in VMware. This driver uses the common SCSI code in
scsi-common.

This driver is written from scratch. The QEMU and Linux pvscsi drivers
were used as references, since no specification is available.

Tested on QEMU's pvscsi implementation and VMware Workstation.

Signed-off-by: Asias He <asias@cloudius-systems.com>

81fdc730

Mar 04, 2014

loader: remove unused declaration · f19c137a

Nadav Har'El authored 11 years ago


Removed an unused declaration, which is unnecessary and causes a warning
in Eclipse.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f19c137a

Feb 12, 2014

ahci: Initial support · 1ee8e2d1

Asias He authored 11 years ago


AHCI is supported on various VMMs, e.g., VirtualBox and VMware Workstation.
Adding AHCI support enables OSv to run on them when the para-virtualized
block device is not present or not yet supported.

Tested on VirtualBox, VMware Workstation and QEMU.

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

1ee8e2d1

Feb 11, 2014

loader: move x64-specific stuff from premain · a06c22d7

Claudio Fontana authored 11 years ago


Move the arch-specific stuff in premain to
arch/x64/arch-setup.cc.

Introduce arch_init_premain() and arch_setup_tls().

arch_init_premain() is supposed to perform arch-specific
initialization before the common premain code is run.

arch_setup_tls() is run _after_ the common setup_tls code.
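
A minimal sketch of the call-order contract described above, assuming a premain() that calls the hooks directly. The bodies are stubs that only record the order; `premain_sketch` and `call_order` are hypothetical, and the real hooks do the actual arch-specific setup.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical trace of the call order; the bodies are stubs,
// not the real OSv implementations.
static std::vector<std::string> call_order;

void arch_init_premain() { call_order.push_back("arch_init_premain"); }
void setup_tls()         { call_order.push_back("setup_tls"); }
void arch_setup_tls()    { call_order.push_back("arch_setup_tls"); }

// Sketch of premain(): arch-specific initialization first, then the
// common code, with the arch-specific TLS step after the common TLS setup.
void premain_sketch() {
    arch_init_premain();  // before any common premain code
    // ... common premain work ...
    setup_tls();          // common TLS setup
    arch_setup_tls();     // arch-specific part, run _after_ setup_tls
}
```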

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a06c22d7

Feb 07, 2014

loader: add a bootchart option. · 2209c95a

Glauber Costa authored 11 years ago


When booting with --bootchart, OSv will print a summary of where our boot
time is being spent, up to the point right before we execute main.

This mechanism can later be extended to keep measuring past that point,
using other facilities to account for the application, etc.

Example output:

OSv v0.05-156-gd3918a1
    disk read (real mode): 132.94ms, (+132.94ms)
    .init functions: 146.10ms, (+13.16ms)
    SMP launched: 147.57ms, (+1.47ms)
    RCU initialized: 150.61ms, (+3.04ms)
    VFS initialized: 154.08ms, (+3.46ms)
    Network initialized: 160.79ms, (+6.71ms)
    pvpanic done: 162.31ms, (+1.52ms)
    pci enumerated: 171.45ms, (+9.14ms)
    drivers probe: 171.46ms, (+0.02ms)
    drivers loaded: 182.52ms, (+11.06ms)
    ZFS mounted: 2116.32ms, (+1933.80ms)
    Total time: 2116.70ms, (+0.38ms)
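
A minimal sketch of how such a summary can be produced, assuming each milestone records its time since boot started and the printout shows the cumulative time plus the delta from the previous milestone. The `bootchart` class and its methods are hypothetical, not OSv's implementation; only the line format mirrors the example output.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical boot-time recorder in the spirit of --bootchart.
class bootchart {
public:
    using clock = std::chrono::steady_clock;
    explicit bootchart(clock::time_point start) : start_(start) {}

    // Record a milestone; `when` defaults to the current time.
    void event(const std::string& name, clock::time_point when = clock::now()) {
        assert(when >= start_);
        double ms = std::chrono::duration<double, std::milli>(when - start_).count();
        events_.push_back({name, ms});
    }

    // Formats lines like "    SMP launched: 147.57ms, (+1.47ms)".
    std::string summary() const {
        std::string out;
        double prev = 0;
        char line[128];
        for (const auto& e : events_) {
            std::snprintf(line, sizeof line, "    %s: %.2fms, (+%.2fms)\n",
                          e.name.c_str(), e.ms, e.ms - prev);
            out += line;
            prev = e.ms;
        }
        return out;
    }

private:
    struct entry { std::string name; double ms; };
    clock::time_point start_;
    std::vector<entry> events_;
};
```

The first milestone's delta equals its cumulative time, as in the "disk read (real mode)" line above.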

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2209c95a

loader: measure some key points · 61842f11

Glauber Costa authored 11 years ago


Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

61842f11

Feb 06, 2014

Fix loader.cc parallel running bug · 8947cb9a

Nadav Har'El authored 11 years ago


When running a command in the background, do_main_thread() passes the
command line to a new pthread as a pointer to a std::vector. Unfortunately,
the vector can go out of scope soon afterwards, and the result is a
crash. Fix this oversight.
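
A minimal sketch of the bug pattern and one common fix, assuming a pthread-based launcher: the hypothetical `run_in_background()` gives the new thread its own heap-allocated copy of the arguments, and the thread takes ownership of it. This is an illustration, not the actual loader.cc change.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <pthread.h>
#include <string>
#include <vector>

// Buggy pattern: pthread_create(&t, nullptr, fn, &args) with a *local*
// std::vector args; the caller can return and destroy args before the
// new thread reads it.

static std::size_t observed_argc = 0;  // what the background thread saw

static void* background_main(void* p) {
    // Take ownership of the heap copy; it is freed when `args` goes away.
    std::unique_ptr<std::vector<std::string>> args(
        static_cast<std::vector<std::string>*>(p));
    observed_argc = args->size();  // stand-in for running the program
    return nullptr;
}

// Fix: hand the thread its own copy of the command line, so its
// lifetime no longer depends on the caller's scope.
bool run_in_background(const std::vector<std::string>& cmdline, pthread_t* t) {
    auto* copy = new std::vector<std::string>(cmdline);
    if (pthread_create(t, nullptr, background_main, copy) != 0) {
        delete copy;  // the thread never started, so reclaim the copy here
        return false;
    }
    return true;
}
```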

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8947cb9a

bsd: Register ARC shrinker · 0560d33f

Raphael S. Carvalho authored 11 years ago


This patch registers the ARC shrinker by using the event handler
list from the BSD side. When the ARC is initialized, it inserts the lowmem
event handler into an external event handler list. lowmem basically signals
the reclaiming thread, which then wakes up to decide which approach
should be used to shrink the ARC.

OSv's memory-pressure handling is activated when the 20% watermark is
reached, and the shrink policy then decides which shrinker should
be called on such events.

bsd_shrinker_init is responsible for finding the lowmem event handler
in the external list and integrating it into our shrinker infrastructure.

arc_lowmem needed a few changes to return the amount of memory released
from the ARC.

Glauber and I tested the functionality by filling the ARC up to its
target, then allocating as much memory as possible to see whether the
ARC shrinker would kick in and release memory back to the operating
system.
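
A simplified sketch of the idea, assuming the lowmem handler is found by event name and wrapped as a shrinker that is called directly (the real code signals a reclaiming thread instead). The event name "vm_lowmem" follows FreeBSD's convention; `event_handler`, `shrinker`, `bsd_shrinker_init_sketch` and `on_memory_pressure` are hypothetical, not OSv's or FreeBSD's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical BSD-style event handler entry: a named callback that,
// like the modified arc_lowmem, returns how many bytes it released.
struct event_handler {
    std::string event;
    std::function<std::size_t(std::size_t)> fn;
};

// Hypothetical shrinker: asked to release up to N bytes.
struct shrinker {
    std::string name;
    std::function<std::size_t(std::size_t)> release;
};

std::vector<shrinker> shrinkers;

// Find the ARC's lowmem handler in the external list and adopt it.
bool bsd_shrinker_init_sketch(const std::vector<event_handler>& handlers) {
    for (const auto& h : handlers) {
        if (h.event == "vm_lowmem") {
            shrinkers.push_back({"ARC", h.fn});
            return true;
        }
    }
    return false;
}

// Once free memory drops below 20% of total, ask the shrinkers for
// memory until the watermark is restored; returns bytes released.
std::size_t on_memory_pressure(std::size_t free_bytes, std::size_t total_bytes) {
    const std::size_t watermark = total_bytes / 5;  // 20%
    if (free_bytes >= watermark) return 0;
    const std::size_t wanted = watermark - free_bytes;
    std::size_t released = 0;
    for (auto& s : shrinkers) {
        if (released >= wanted) break;
        released += s.release(wanted - released);
    }
    return released;
}
```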

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0560d33f

ide: register ide driver · e4b60706

Takuya ASADA authored 11 years ago


Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

e4b60706

Jan 27, 2014

clock: Remove unnecessary #include <drivers/clock.hh> · 8bf2fedd

Nadav Har'El authored 11 years ago


Remove unused #include of <drivers/clock.hh>.
Except for the clock drivers and <osv/clock.hh>, no source file now
includes this header. Rather, <osv/clock.hh> should be used. Code including
<sched.hh> will also get <osv/clock.hh> automatically.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8bf2fedd

clock: Avoid #include <drivers/clockevent.hh> · e1c4aa82

Nadav Har'El authored 11 years ago


Several source files include <drivers/clockevent.hh>, though this is a
very low-level feature which they don't actually use.

sched.cc does use <drivers/clockevent.hh>, but already gets it through
sched.hh, so it doesn't need to include it explicitly either.

This patch removes the unnecessary includes.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e1c4aa82

Jan 22, 2014

loader: Add support for "&" in command line · 995fe31a

Nadav Har'El authored 11 years ago


Our loader's command line (what is given to the "-e" option of run.py)
already allows running multiple commands (each a shared object with
arguments) separated with a semicolon - e.g.,

    run.py -e "program1.so; program2.so; program3.so"

This patch allows, just like in Unix, using "&" instead of ";", in which
case the preceding program is run in the background (in our case, in a new
thread).

For example,
    run.py -e "httpserver.so& java.so ..."

As before, a command line can consist of multiple commands, and whitespace
around the separators (; or &) is optional.

Take care if you intend to run the *same* object multiple times concurrently,
e.g., "something.so& something.so". For an object to support this use case,
it should support its main() being called in parallel, and in particular
avoid using global variables.
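
A minimal sketch of splitting such a command line, assuming no quoting or escaping of separators. The `command` struct and `split_commands()` are hypothetical helpers, not the loader's actual parser; as in a Unix shell, the last command runs in the foreground unless it is followed by "&".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One command from the loader command line, plus whether it was
// terminated by '&' (run in a new thread) rather than ';'.
struct command {
    std::string text;
    bool background;
};

static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Split on ';' and '&'; whitespace around separators is optional and
// empty commands are skipped. Quoting is not handled.
std::vector<command> split_commands(const std::string& line) {
    std::vector<command> out;
    std::string cur;
    for (char c : line) {
        if (c == ';' || c == '&') {
            std::string t = trim(cur);
            if (!t.empty()) out.push_back({t, c == '&'});
            cur.clear();
        } else {
            cur += c;
        }
    }
    std::string t = trim(cur);
    if (!t.empty()) out.push_back({t, false});
    return out;
}
```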

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

995fe31a

include: Move debug.hh to include/osv · 7809519b

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7809519b

include: Move mempool.hh to include/osv · 9c95f49d

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9c95f49d

include: Move tls.hh to include/osv · f7e2eb41

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f7e2eb41

include: Move dhcp.hh to include/osv · f880005c

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f880005c

include: Move elf.hh to include/osv · b8034e34

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b8034e34

include: Move commands.hh to include/osv · 86110819

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

86110819

include: Move barrier.hh to include/osv · c80be886

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c80be886

include: Move sched.hh to include/osv · fae5693e

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

fae5693e

Jan 21, 2014

loader: add cwd and env options · 33034397

Nadav Har'El authored 11 years ago


Add cwd (current directory) and env (environment variable) options to the
loader. These can be useful for applications that expect to run in a
certain directory, or expect certain environment variables to exist.

Example usage:

   run.py -e "--cwd=/tmp /usr/bin/something.so"
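
A minimal sketch of how such leading options can be consumed and applied, assuming they precede the program name and that env takes the form --env=NAME=VALUE (an assumed syntax; only --cwd appears in the example above). `loader_options`, `parse_loader_options()` and `apply_loader_options()` are hypothetical; applying uses the POSIX setenv() and chdir() calls.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>   // setenv, getenv (POSIX)
#include <string>
#include <unistd.h>   // chdir (POSIX)
#include <utility>
#include <vector>

struct loader_options {
    std::string cwd;                                      // from --cwd=DIR
    std::vector<std::pair<std::string, std::string>> env; // from --env=NAME=VALUE (assumed)
    std::vector<std::string> program;                     // program and its arguments
};

// Consume leading --cwd=/--env= words; the first other word starts the
// program command line, so later "--cwd=" words belong to the program.
loader_options parse_loader_options(const std::vector<std::string>& words) {
    loader_options o;
    std::size_t i = 0;
    for (; i < words.size(); ++i) {
        const std::string& w = words[i];
        if (w.rfind("--cwd=", 0) == 0) {
            o.cwd = w.substr(6);
        } else if (w.rfind("--env=", 0) == 0) {
            const std::string kv = w.substr(6);
            const std::size_t eq = kv.find('=');
            o.env.emplace_back(kv.substr(0, eq),
                               eq == std::string::npos ? "" : kv.substr(eq + 1));
        } else {
            break;
        }
    }
    o.program.assign(words.begin() + static_cast<std::ptrdiff_t>(i), words.end());
    return o;
}

// Apply the options before running the program; false on any failure.
bool apply_loader_options(const loader_options& o) {
    for (const auto& kv : o.env) {
        if (setenv(kv.first.c_str(), kv.second.c_str(), 1) != 0) return false;
    }
    return o.cwd.empty() || chdir(o.cwd.c_str()) == 0;
}
```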

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

33034397

Jan 16, 2014

loader: Remove copyright statement from boot log · a07a7dfc

Pekka Enberg authored 11 years ago


Remove the copyright statement from boot log but keep the OSv version
banner there to tidy up boot.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a07a7dfc

loader: Bring back OSv version at boot · 822a93ba

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

822a93ba

loader.cc: make error messages visible · 116b4988

Nadav Har'El authored 11 years ago

Recently, debug() messages became invisible by default. This means
they are not a good way to print error messages :-)

This patch fixes two error messages in loader.cc that were no longer
visible. For example, "scripts/run.py -e xyz" completed without any message
instead of telling the user that "xyz" doesn't exist.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

116b4988

Jan 15, 2014

loader: Add --verbose option to OSv · e9092426

Eduardo Piva authored 11 years ago


Add a --verbose option to OSv, so that it flushes its buffer after
handling the kernel option and sets the verbose boolean flag in the
debug code.

Signed-off-by: Eduardo Piva <efpiva@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e9092426

Jan 10, 2014

loader: fix spurious error message · 4931c574

Nadav Har'El authored 11 years ago


When trying to run a nonexistent program, e.g., "run.py -e xyz", sometimes
(especially in a debug build) we would see the spurious message:

    program xyz returned -16384

The bug is simple: when osv::run returns null, meaning the program was not
found, it does not set the "ret" value. Our code checked whether ret != 0
before checking whether osv::run actually found the program, which is wrong.
Simply changing the order of the checks solves this bug.

Fixes #156.
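
A minimal sketch of the corrected ordering, assuming a stand-in for osv::run that returns a null pointer when the program is not found and writes the return code only when it actually ran the program. `program`, `run_stub()` and `report()` are hypothetical, as are the messages they produce.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; not OSv's real osv::run() or loader code.
struct program {};

// Returns nullptr if `path` is not found, and in that case leaves *ret unset.
std::shared_ptr<program> run_stub(const std::string& path, int* ret) {
    if (path != "hello.so") return nullptr;  // not found: *ret untouched
    *ret = 0;                                // pretend main() returned 0
    return std::make_shared<program>();
}

// Fixed order: first check that the program was found, and only then
// look at its return code, which is indeterminate if it was not found.
std::string report(const std::string& path) {
    int ret;  // intentionally uninitialized, as in the original bug
    auto prog = run_stub(path, &ret);
    if (!prog) return path + ": not found";
    if (ret != 0) return "program " + path + " returned " + std::to_string(ret);
    return "";
}
```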

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4931c574

loader: Add '--vga' to switch console · 66d02657

Takuya ASADA authored 11 years ago


Add a --vga command-line option to switch to the VGA console.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

66d02657

loader: Update Cloudius copyright · 7740d2cb

Pekka Enberg authored 11 years ago

It's 2014 now.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7740d2cb

Jan 02, 2014

core: extract graceful shutdown logic · 8b616285

Tomasz Grabiec authored 11 years ago


In order to reuse the logic, it needs to be extracted.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8b616285

Dec 27, 2013

loader.cc: Print a message when a program returns a non-zero status · e89cc204

Vlad Zolotarov authored 11 years ago


Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e89cc204

Dec 20, 2013

loader: Fix leak on library objects · 0dc50432

Raphael S. Carvalho authored 11 years ago


Currently, library objects are leaked by run_main() when osv::run()
succeeds. In addition to leaking memory, this makes the dcache leak
directory entries, which causes further problems.

Releasing library objects is safe, as dependent objects will also be
released automatically. I have tested it, and the dcache no longer has
any leaked dentries.

This problem was found in our attempt to implement a dentry hierarchy
with help from Avi Kivity.
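
A minimal sketch of why releasing the program object also releases its dependencies, assuming reference-counted objects where each library holds shared_ptrs to the libraries it needs. `library` and `run_main_sketch()` are hypothetical models, not OSv's elf::object or run_main().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a loaded shared object.
struct library {
    std::string name;
    std::vector<std::shared_ptr<library>> needed;  // libraries this one depends on
    static int live;                               // library objects currently alive
    explicit library(std::string n) : name(std::move(n)) { ++live; }
    ~library() { --live; }
};
int library::live = 0;

// Load a program and its dependency, run it, and let the last reference
// go instead of keeping it around after a successful run.
int run_main_sketch() {
    auto dep = std::make_shared<library>("libdep.so");
    auto app = std::make_shared<library>("app.so");
    app->needed.push_back(dep);
    dep.reset();  // from now on only `app` keeps libdep.so alive
    assert(library::live == 2);
    int ret = 0;  // ... the program's main() would run here ...
    return ret;
}  // `app` is released here, and with it libdep.so
```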

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Asias He <asias@cloudius-systems.com>
[ penberg: improve changelog ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0dc50432

virtio-rng: Make the class name to virtio::rng. · 359db0e5

Asias He authored 11 years ago


We are under the virtio namespace, so there is no point in repeating the
virtio prefix in the virtio_rng driver.

Change the naming from virtio::virtio_rng to virtio::rng.

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

359db0e5

virtio-net: Make the class name to virtio::net. · f8142f23

Asias He authored 11 years ago


We are under the virtio namespace, so there is no point in repeating the
virtio prefix in the virtio_net driver.

Change the naming from virtio::virtio_net to virtio::net.

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f8142f23

Dec 19, 2013

virtio-blk: Make the class name to virtio::blk. · 9e971a49

Asias He authored 11 years ago


We are under the virtio namespace, so there is no point in repeating the
virtio prefix in the virtio_blk driver.

Change the naming from virtio::virtio_blk to virtio::blk.

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9e971a49