Commits · f85f721879367ce9664ed701edcad5579faccf4d · Verlässliche Systemsoftware / projects / osv

Jun 24, 2013

zfs: build range lock code · f85f7218
Christoph Hellwig authored 11 years ago

f85f7218
zfs: use proper typing for log replay functions · b16e4bed
Christoph Hellwig authored 11 years ago

b16e4bed
zfs: implement sync and statfs · 670d68b9
Christoph Hellwig authored 11 years ago

670d68b9
zfs: disable ZFS_ENTER/ZFS_EXIT for now · 910408da
Christoph Hellwig authored 11 years ago

910408da
solaris: fix <sys/extdirent.h> compilation · cdc7ec2e
Christoph Hellwig authored 11 years ago

cdc7ec2e
solaris: provide a stub VN_RELE_ASYNC · 055829cf
Christoph Hellwig authored 11 years ago

055829cf
solaris: fix compat <sys/dirent.h> · 17f62d21
Christoph Hellwig authored 11 years ago

17f62d21
solaris: add a <sys/namei.h> stub · 12094d2c
Christoph Hellwig authored 11 years ago

12094d2c
solaris: add a <sys/fcntl.h> stub · acf46377
Christoph Hellwig authored 11 years ago

acf46377

Fix join() hang bug (issue #11) · 60f45d4d

Nadav Har'El authored 11 years ago

This patch solves issue #11, where join() hangs were seen in tst-mutex, with
threads remaining in "terminating" state.

The problem was that complete() assumed that _cpu->terminating_thread was
nullptr, so it could overwrite it with the current thread. This is usually
true - when we switch to any thread, if terminating_thread!=0 it is
handled. The problem is that when we switch to a *new* thread (in
sched::init(), call switch_to_first()) this code does not run, and if
this thread quickly terminates, _cpu->terminating_thread gets overwritten
instead of being handled.

The simplest workaround, used in this patch, is to handle (i.e., call
unref() on) the previous _cpu->terminating_thread before overwriting it.

The downside with this approach is that the termination of a thread
may be delayed by the run time of the first time-slice of a new thread.

But we plan to eventually replace this termination mechanism anyway
(see issue #10), so I think this solution is fine.
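
A minimal sketch of the workaround described above, not the actual OSv code. The types thread_sketch and cpu_sketch are hypothetical stand-ins for sched::thread and sched::cpu, and unref() is reduced to setting a flag:

```cpp
#include <cassert>

// Hypothetical stand-in for sched::thread; unref() here simply marks the
// thread as destroyed instead of doing real reference counting.
struct thread_sketch {
    bool destroyed = false;
    void unref() { destroyed = true; }
};

// Hypothetical stand-in for sched::cpu.
struct cpu_sketch {
    thread_sketch* terminating_thread = nullptr;
};

// Called by a thread that has finished its work.
void complete(cpu_sketch& c, thread_sketch& self) {
    // The workaround: a previously parked thread may still be waiting to be
    // handled (e.g. if we were switched to via the "first switch" path), so
    // handle it now instead of silently overwriting the pointer.
    if (c.terminating_thread) {
        assert(c.terminating_thread != &self);
        c.terminating_thread->unref();
    }
    c.terminating_thread = &self;
}
```

With this shape, a thread parked in terminating_thread is never lost; at worst its unref() is delayed until the next thread completes or the next context switch handles it.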

60f45d4d

lfmutex: add tracepoints for debug purposes · 0f9b64cb
Guy Zana authored 11 years ago

0f9b64cb

run.js: fix argv handling, use String[] as in the java command · 0e21676c

Guy Zana authored 11 years ago

Starting the CLI and using the run command by specifying it as a run.py argument
didn't work due to a cast problem (run expected NativeArray).

Previously, this didn't work:

$ sudo ./scripts/run.py -n -e "java.so -jar /java/cli.jar run tools/netserver-osv -D -4 -f -N" -c2 -m1G

0e21676c

Revert "netperf", it was supposed to be local commit only · c58cd186
Dor Laor authored 11 years ago
```
This reverts commit ccaad30c.
```
c58cd186
Remove duplicated virtio bits definition · f9d05929
Dor Laor authored 11 years ago
```
VIRTIO_RING_F_EVENT_IDX and VIRTIO_RING_F_INDIRECT_DESC are
defined in virtio.hh
```
f9d05929
netperf · ccaad30c
Dor Laor authored 11 years ago

ccaad30c

Jun 23, 2013

Remove redundant assignment in tst-condvar · ea63416b

Nadav Har'El authored 11 years ago

No need to assign the condvar initializer, this is C++ after all and objects
are initialized by default anyway.

ea63416b

Fix crash on use of deleted callout · 7b82d241

Guy Zana authored 11 years ago


When a callout is deleted, it is properly deleted from the set of
callouts, but if it was the next-in-line to run, it was also saved
in a local variable while waiting for its timer to expire, and could
be run despite being deleted.

The Shrew test HTTP server (see bug 7) exposed this bug: every once
in a while (usually very quickly), deleting a socket led to a crash
when a callout referring to the deleted socket was run.

Thanks to Guy for finding and fixing this bug.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>

7b82d241

zfs: implement zfs_import_rootpool for OSv · 1483deb5
Christoph Hellwig authored 11 years ago

1483deb5
zfs: fix vdev_disk_start_bio return value · 1c2f6592
Christoph Hellwig authored 11 years ago

1c2f6592
zfs: fix vdev_disk_physio · 433320e5
Christoph Hellwig authored 11 years ago

433320e5
zfs: use the correct device name for device_open in vdev_disk_open · 6e099f6c
Christoph Hellwig authored 11 years ago

6e099f6c
solaris: make sure the nvpair code picks up the byte order definitions · 698d9067
Christoph Hellwig authored 11 years ago

698d9067

solaris: fix byteorder.h to behave correctly · a4a0f9fb

Christoph Hellwig authored 11 years ago

Without <endian.h> we won't pick up any byte order defines. In addition,
Solaris uses single-underscore-prefixed versions of the defines instead
of the more common unprefixed or double-underscore-prefixed ones.
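
As an illustration of the mapping described above, here is a hedged sketch (not the actual contents of byteorder.h). It assumes a glibc-style <endian.h> that provides __BYTE_ORDER, __LITTLE_ENDIAN and __BIG_ENDIAN; the helper solaris_style_little_endian() is hypothetical and exists only so the result can be checked:

```cpp
#include <cassert>
#include <endian.h>   // glibc-style header: __BYTE_ORDER, __LITTLE_ENDIAN, ...

// Map the double-underscore defines onto the single-underscore names that
// Solaris-derived code tests with #ifdef.
#if __BYTE_ORDER == __LITTLE_ENDIAN
#  ifndef _LITTLE_ENDIAN
#    define _LITTLE_ENDIAN
#  endif
#elif __BYTE_ORDER == __BIG_ENDIAN
#  ifndef _BIG_ENDIAN
#    define _BIG_ENDIAN
#  endif
#else
#  error "unknown byte order"
#endif

// Hypothetical helper: reports which Solaris-style define ended up set.
bool solaris_style_little_endian() {
#ifdef _LITTLE_ENDIAN
    return true;
#else
    return false;
#endif
}
```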

a4a0f9fb

solaris: remove incorrect endianness defines · 0b0fc5df
Christoph Hellwig authored 11 years ago

0b0fc5df
virtio-blk: fix size type · 8c7d6bfb
Christoph Hellwig authored 11 years ago

8c7d6bfb

sched: solve the thread completion/destruction race · 96ee4249

Nadav Har'El authored 11 years ago

Our code had a serious bug in thread completion: When a thread complete()s,
i.e., finishes its work, it wake()s the thread doing join() on it, and
that joiner thread in turn deletes the completed thread and its stack.
On rare occasions, the wake() was very slow but the joiner thread was very
quick in deleting the thread - leading to a crash on return (retq)
from wake() because the stack on which it was running has been deleted.

This patch includes a simple, but effective, fix for this bug:

We add a new per-cpu field, cpu::terminating_thread. complete() no longer
calls unref() itself - as the thread unref()ing itself caused the bug.
Instead, complete() just sets terminating_thread to the current thread.
After the scheduler on this CPU switches to the next thread, we
call unref() on the thread specified in terminating_thread. We know
this is safe because this thread is no longer running.

This fix seems simple and effective (the crashes that were apparent
in tst-wake and the sunflow benchmark seem to be gone, as far as I
can tell). Its biggest downside is an extra "if" on every context switch.
It is possible to devise different solutions, without the cost of the
extra if, but these solutions are more complicated and require a lot
more code changes. I'll add a bug-tracker entry documenting them.
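
The deferred unref() can be sketched roughly as follows. This is a simplified model, not the scheduler code itself: thread_sketch, cpu_sketch and switch_to() are hypothetical names, and the real context switch (stack and register switching) is reduced to a pointer assignment:

```cpp
#include <cassert>

// Hypothetical stand-in for sched::thread.
struct thread_sketch {
    bool destroyed = false;
    void unref() { destroyed = true; }
};

// Hypothetical stand-in for sched::cpu with the new per-cpu field.
struct cpu_sketch {
    thread_sketch* current = nullptr;
    thread_sketch* terminating_thread = nullptr;
};

// The finishing thread no longer unref()s itself; it only records itself.
void complete(cpu_sketch& c) {
    assert(c.current != nullptr);
    c.terminating_thread = c.current;
}

// Models the context switch: once we run as `next`, the terminated
// thread's stack is no longer in use and it is safe to drop it.
void switch_to(cpu_sketch& c, thread_sketch& next) {
    c.current = &next;
    if (c.terminating_thread) {          // the extra "if" on every switch
        c.terminating_thread->unref();
        c.terminating_thread = nullptr;
    }
}
```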

96ee4249

Move two todo/* files to bug tracker · 6278271a
Nadav Har'El authored 11 years ago

6278271a

Jun 21, 2013

virtio: respect host virtqueue notification suppression · e0355a8e

Guy Zana authored 11 years ago

Just as we can tell the host to disable interrupts via the _avail ring,
the host can tell us to suppress notifications via the _used ring.
Every notification, or kick, consumes about 10ns, as it is implemented as
a write to an I/O port, which travels to userspace QEMU in the host.

This simple patch increases netperf's throughput sixfold, from
300 Mbps to 1800 Mbps.
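
A rough sketch of the check, under stated assumptions: VRING_USED_F_NO_NOTIFY and its value come from the virtio specification, while vqueue_sketch, used_hdr_sketch and write_notify_port() are hypothetical names, and the I/O port write is replaced by a counter. A real driver would also need the appropriate memory barriers when reading the flags:

```cpp
#include <cassert>
#include <cstdint>

// From the virtio specification: set by the host in the used ring's flags
// field to ask the guest not to notify it.
constexpr uint16_t VRING_USED_F_NO_NOTIFY = 1;

// Simplified view of the start of the used ring (the ring entries are omitted).
struct used_hdr_sketch {
    uint16_t flags = 0;
    uint16_t idx = 0;
};

// Hypothetical queue wrapper; `kicks` counts simulated I/O port writes.
struct vqueue_sketch {
    used_hdr_sketch used;
    int kicks = 0;

    void write_notify_port() { ++kicks; }   // stands in for the port write

    // Returns true if the host was actually notified.
    bool kick() {
        if (used.flags & VRING_USED_F_NO_NOTIFY) {
            return false;                   // host asked us to stay quiet
        }
        write_notify_port();
        return true;
    }
};
```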

e0355a8e

virtio: fix wrong size register accesses · 4ff04bd1
Guy Zana authored 11 years ago

4ff04bd1

Jun 20, 2013

Limit the usage of indirect buffers · 5b751612

Dor Laor authored 11 years ago

Indirect descriptors are good for very large SG lists but aren't required
when there is enough room on the ring or the SG list is tiny.
For the time being they are barely used, so I turned them off
by default.
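
One possible shape for such a policy, as a sketch only: the function name, the parameters and the tiny_threshold value of 2 are assumptions made for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy helper: decide whether an SG list should be placed in
// an indirect descriptor table instead of directly on the ring.
bool use_indirect(bool indirect_enabled,
                  std::size_t sg_len,
                  std::size_t free_descriptors,
                  std::size_t tiny_threshold = 2)   // assumed value
{
    if (!indirect_enabled) {
        return false;        // off by default
    }
    if (sg_len <= tiny_threshold) {
        return false;        // tiny list: indirection is not worth it
    }
    if (sg_len <= free_descriptors) {
        return false;        // enough room on the ring
    }
    return true;             // large list and a crowded ring
}
```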

5b751612

Add mergeable buffers support for virtio-net · d487ffd1

Dor Laor authored 11 years ago

The feature allows the hypervisor to batch several packets together
as one large SG list. Once such a header is received, the guest rx
routine iterates over the list and assembles a mega mbuf.

The patch also simplifies the rx path by using a single buffer for
the virtio data and its header. This shrinks the SG list from two
entries to one.

The issue is that, at the moment, I haven't seen packets with mbuf > 1
being received. A Linux guest does receive such packets here and there.
It may be due to the use of offload features that enlarge the packet size.
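
The rx-side assembly can be sketched as below. This is a simplified model rather than the driver code: mrg_hdr_sketch keeps only a num_buffers field (the real virtio-net header has further fields before it), buffers are plain byte vectors instead of mbufs, and receive_packet() is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <utility>
#include <vector>

// Simplified header: only the buffer count is modelled here.
struct mrg_hdr_sketch {
    uint16_t num_buffers;
};

using buffer = std::vector<uint8_t>;

// Pops one packet from the queue of used buffers. The first buffer holds
// the header followed by data (header and data share a single buffer);
// the remaining num_buffers - 1 buffers hold data only.
buffer receive_packet(std::deque<buffer>& used) {
    buffer packet;
    if (used.empty()) {
        return packet;
    }
    buffer first = std::move(used.front());
    used.pop_front();
    if (first.size() < sizeof(mrg_hdr_sketch)) {
        return packet;                      // malformed: no room for header
    }
    mrg_hdr_sketch hdr;
    std::memcpy(&hdr, first.data(), sizeof hdr);
    packet.assign(first.begin() + sizeof hdr, first.end());

    // Append the follow-up buffers announced by the header.
    for (uint16_t i = 1; i < hdr.num_buffers && !used.empty(); ++i) {
        const buffer& b = used.front();
        packet.insert(packet.end(), b.begin(), b.end());
        used.pop_front();
    }
    return packet;
}
```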

d487ffd1

Jun 19, 2013

rwlock: initial implementation for a rwlock · 6af9f16d

Guy Zana authored 11 years ago

This rwlock gives precedence to writers; it relies on a mutex and two
condvars for its implementation.

It also supports taking the lock recursively for both readers and writers.

This implementation is not fully tested, yet the TCP stack already uses it
extensively, so far without any observed races (tested with TCPDownload and netperf).
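
For readers unfamiliar with the design, here is a minimal writer-preferring rwlock built from one mutex and two condition variables. It is a sketch using the C++ standard library rather than OSv's own mutex and condvar, it omits the recursive locking mentioned above, and the class and method names are hypothetical:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class writer_pref_rwlock {
public:
    void rlock() {
        std::unique_lock<std::mutex> g(_mtx);
        // Readers also yield to writers that are merely waiting.
        _readers_cv.wait(g, [&] { return !_writer_active && _writers_waiting == 0; });
        ++_readers;
    }
    bool try_rlock() {
        std::lock_guard<std::mutex> g(_mtx);
        if (_writer_active || _writers_waiting > 0) return false;
        ++_readers;
        return true;
    }
    void runlock() {
        std::lock_guard<std::mutex> g(_mtx);
        assert(_readers > 0);
        if (--_readers == 0) _writers_cv.notify_one();
    }
    void wlock() {
        std::unique_lock<std::mutex> g(_mtx);
        ++_writers_waiting;
        _writers_cv.wait(g, [&] { return !_writer_active && _readers == 0; });
        --_writers_waiting;
        _writer_active = true;
    }
    bool try_wlock() {
        std::lock_guard<std::mutex> g(_mtx);
        if (_writer_active || _readers > 0) return false;
        _writer_active = true;
        return true;
    }
    void wunlock() {
        std::lock_guard<std::mutex> g(_mtx);
        _writer_active = false;
        // Writers first; readers are released only when no writer waits.
        if (_writers_waiting > 0) _writers_cv.notify_one();
        else _readers_cv.notify_all();
    }
private:
    std::mutex _mtx;
    std::condition_variable _readers_cv;
    std::condition_variable _writers_cv;
    int _readers = 0;
    int _writers_waiting = 0;
    bool _writer_active = false;
};
```

The writer preference comes from the reader predicate checking _writers_waiting: a steady stream of readers cannot starve a writer that has already queued up.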

6af9f16d

bsd: avoid using extern "C" in c++ files · 8536721b

Guy Zana authored 11 years ago

1. it is much cleaner that the header files perform extern "C" themselves,
   so they can be included both from C and C++ code.

2. when doing extern "C" from a C++ file then __cplusplus is also defined,
   and compilation can break in some situations.

3. as a bonus, this patch reduces compilation time.
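
The header-side pattern from point 1 typically looks like the sketch below. The header name bsd_example.h and the function bsd_example_add() are hypothetical; the definition is included only so the sketch is self-contained and would normally live in a separate .c file:

```cpp
#include <cassert>

/* ---- bsd_example.h (hypothetical header shared by C and C++ code) ---- */
#ifndef BSD_EXAMPLE_H
#define BSD_EXAMPLE_H

#ifdef __cplusplus
extern "C" {
#endif

int bsd_example_add(int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* BSD_EXAMPLE_H */

/* ---- definition, normally in a .c file ---- */
/* It picks up C linkage from the declaration above, so the including
   file needs no extern "C" of its own. */
int bsd_example_add(int a, int b) {
    return a + b;
}
```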

8536721b

bsd netport: rename log() to bsd_log() · bdd2b656

Nadav Har'El authored 11 years ago

netport.h defines a log() macro, which is an unfortunate choice of name
because log is also a pretty-well-known mathematical function, and this

So rename this macro bsd_log(), and change the dozen files which used
log() to use bsd_log().

bdd2b656

console: implement FIONREAD · ec2e5ebc

Nadav Har'El authored 11 years ago

Java's os::available() requires the FIONREAD ioctl on fds which do not
implement seek. So we need to support this ioctl for the console.
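
For context, this is roughly how a caller uses the ioctl. The sketch shows only the user side on a POSIX-like system, not the console implementation; bytes_available() is a hypothetical helper name:

```cpp
#include <cassert>
#include <sys/ioctl.h>
#include <unistd.h>

// Returns the number of bytes that can be read from fd without blocking,
// or -1 if the fd does not support FIONREAD.
int bytes_available(int fd) {
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0) {
        return -1;
    }
    return n;
}
```

A caller can use the result to size a read without blocking on an fd that cannot be queried with lseek().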

ec2e5ebc

lock-free mutex: use wake_with() · b4670b9c
Nadav Har'El authored 11 years ago
```
Use the new wake_with() in lock-free mutex
```
b4670b9c

condvar: improve wake performance while maintaining correctness · 43e34bbf

Nadav Har'El authored 11 years ago

Before commit 1b53ec56,
condvar_wake_all() had a crash which could be seen in tst-pipe.so (which
apparently tests some condvar code paths that weren't tested by
tst-condvar.so).

The fix in that commit was for condvar_wait() to regain the condvar
internal lock after the wait, even if not needed (it's only really
needed in case of timeout). This masked the bug (see details below)
but also deteriorated performance: the woken-up thread now often
goes back to sleep to wait for the lock, which is still held by
condvar_wake().

This patch reverts that commit, i.e., condvar_wait() does not retake
the lock when woken. Instead, we fix the real bug: The bug was in
condvar_wake_all() which did:

        sched::thread *t = wr->t;
        wr->t = nullptr;
        t->wake();
        wr = wr->newer;

but after the wake(), wr is no longer valid (the waiter, being woken,
would quickly exit the condvar_wait() function which held wr on the stack).

However, by not taking the lock after the wait we also open up another
potential bug: in rare cases merely doing wr->t = nullptr can
cause the thread t to start running, and if it not only stops waiting
but also exits, the call to t->wake() will refer to an invalid thread
and may crash. So we need to use the new wake_with() thread method
introduced in a previous patch.
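
Putting the two observations together, a corrected loop could look like the sketch below. It is a single-threaded model for illustration, not the patch itself: fake_thread and wait_record_sketch are hypothetical stand-ins, and wake_with() is reduced to running the action and then waking.

```cpp
#include <cassert>

// Hypothetical stand-in for sched::thread.
struct fake_thread {
    int wakes = 0;
    void wake() { ++wakes; }
    template <typename Action>
    void wake_with(Action action) {
        action();
        wake();
    }
};

// Hypothetical stand-in for the per-waiter record that lives on the
// waiter's stack inside condvar_wait().
struct wait_record_sketch {
    fake_thread* t;
    wait_record_sketch* newer;
};

void wake_all(wait_record_sketch* wr) {
    while (wr) {
        fake_thread* t = wr->t;
        // Read the link BEFORE waking: once wr->t is cleared, the waiter
        // may return and wr (on its stack) must not be touched again.
        wait_record_sketch* next = wr->newer;
        t->wake_with([&] { wr->t = nullptr; });
        wr = next;
    }
}
```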

43e34bbf

tst-wake: test for wake_with() · d0b56169

Nadav Har'El authored 11 years ago

Added a test for wake_with(). It tries to ensure that the problematic
case solved by wake_with() actually happens quickly, by:
1. Spinning for a long time between the setting of the flag and t->wake().
2. Doing a spurious wake() to ensure that the waiting thread is woken
   up right after setting the flag, before the intended wake.
3. Using mprotect() to ensure that working with an already join()ed
   thread crashes immediately, instead of just maybe crashing.

This test fails when wake_with() doesn't use ref()/unref(), and succeeds
with the full wake_with().

tst-wake contains a second test, which does the same thing but without
the additional measures we used to show the bug (spinning, spurious
wake and mprotect). Without these additional measures the test iteration
is much faster, which allows us to stress wake/join much more.

d0b56169

sched::thread: wake_with() to wake a wait_until() · f99c5ccc

Nadav Har'El authored 11 years ago

When we use wait_until(), e.g.,

        wait_until([&] { return *x == 0; })

We used (in a bunch of places in the code, including condvar) the
following "obvious" idiom to wake it up:

        *x = 0;
        t->wake();

This does the right thing in *almost* all situations. But there's still
one rare (but very possible) scenario where this is wrong. The problem is
that the first line (*x = 0) may already cause the wait_until to return.
This can happen when wait_until didn't yet check the condition, or if it
was sleeping and by rare coincidence, got woken up by a spurious interrupt
at the same time we did *x = 0. Now, consider the case that the waiting
thread decides to exit after the wait_until... So the "*x = 0" causes the
thread to exit, and when we want to do "t->wake()" the thread no longer
exists, and the statement crashes.

This patch adds two new thread methods: t->ref() increments a counter
preventing a thread's destruction, until a matching t->unref().
With these methods, the correct way to wake the above wait_until() is:

        t->ref();
        *x = 0;
        t->wake();
        t->unref();

This patch also adds a one-line shortcut to the above 4 lines, with syntax
mirroring that of wait_until:

        t->wake_with([&] { *x = 0; });

The ref()/unref() methods are marked private, to encourage the use of
wake_with(), and also to allow wake_with() in the future to be optimized
to avoid calling ref()/unref() when not needed. For example, when the thread
is on the same CPU as the current thread, merely disabling preemption (a
very fast operation) prevents the thread from running - and exiting - and
ref()/unref() are not necessary.
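
The relationship between wake_with() and the private ref()/unref() pair can be sketched as follows. This is a toy model with hypothetical names: thread_sketch, ref_count() and wake_count() are invented for illustration, and the real wake() and destruction logic are not modelled.

```cpp
#include <atomic>
#include <cassert>

class thread_sketch {
public:
    // Runs `action` (which makes the wait_until() condition true) and then
    // wakes the thread, holding a reference so the thread cannot be
    // destroyed in between.
    template <typename Action>
    void wake_with(Action action) {
        ref();
        action();
        wake();
        unref();
    }
    void wake() { ++_wakes; }

    // Accessors for illustration only.
    int ref_count() const { return _refs.load(); }
    int wake_count() const { return _wakes; }

private:
    // Private, so callers are steered towards wake_with().
    void ref() { ++_refs; }
    void unref() { --_refs; }

    std::atomic<int> _refs{0};
    int _wakes = 0;
};
```

Keeping ref()/unref() private leaves room to change how wake_with() protects the thread without touching any caller.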

Unfortunately, while this patch solves one bug, it does not solve two
additional bugs that existed before, and continue to exist after this
patch:

1. When a thread completes (see thread::complete()) it wakes a thread
   waiting on join() (if there is one) and this join() deletes the thread
   and its stack. The problem is that if the timing is right (or wrong ;-)),
   the joiner thread may delete the stack while complete() is still
   running on this stack, and can cause a crash.

2. If join() races with the thread's completion, it is possible that
   the thread thinks nobody is waiting for it so notifies nobody, but
   at the same time join() starts to wait, and will never be woken up.

Added two "FIXME" about these remaining bugs.

f99c5ccc

Don't crash on lseek() of non-regular file · 0c8ef37c

Nadav Har'El authored 11 years ago

lseek() crashes when used on pipes, sockets, and now also fd 0, 1 or 2
(the console), because they don't have an underlying vnode. There is no
reason to assert() in this case; lseek() should just return ESPIPE (like
Linux does for pipes, sockets and ttys).

Similarly, fsync, readdir and friends, fchdir and fstatfs shouldn't
crash if given a fd without a vnode, and rather should return the
expected error.
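
The expected behaviour can be checked from user code with a sketch like the following, shown here for a pipe on a POSIX-like system; seek_errno() is a hypothetical helper name:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <unistd.h>

// Returns 0 if lseek(fd, 0, SEEK_CUR) succeeds, otherwise the errno it set.
int seek_errno(int fd) {
    errno = 0;
    if (lseek(fd, 0, SEEK_CUR) == static_cast<off_t>(-1)) {
        return errno;
    }
    return 0;
}
```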

0c8ef37c