Commits · 6278271af5783ca6b130386d37991f033aa006d9 · Verlässliche Systemsoftware / projects / osv

Jun 23, 2013

Move two todo/* files to bug tracker · 6278271a

Nadav Har'El authored 11 years ago

6278271a

Jun 21, 2013

virtio: respect host virtqueue notification suppression · e0355a8e

Guy Zana authored 11 years ago

just as we can tell the host to disable interrupts via the _avail ring,
the host can tell us to suppress notifications via the _used ring.
every notification, or kick, consumes about 10ns, as it is implemented as
a write to an I/O port, which travels to userspace qemu in the host.

this simple patch increases netperf's throughput sixfold, from
300 Mbps to 1800 Mbps.
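
A minimal sketch of the check described above, not the actual driver code. VRING_USED_F_NO_NOTIFY and its value come from the virtio specification; the `used_ring_header` and `virtqueue_sketch` types and the `kick_count` counter (standing in for the I/O port write) are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Flag defined by the virtio specification for the used ring's flags field.
constexpr uint16_t VRING_USED_F_NO_NOTIFY = 1;

// Hypothetical, stripped-down view of the used ring header shared with the host.
struct used_ring_header {
    std::atomic<uint16_t> flags{0};
    uint16_t idx = 0;
};

// Hypothetical queue wrapper; kick_count stands in for the costly I/O port write.
struct virtqueue_sketch {
    used_ring_header used;
    unsigned kick_count = 0;

    // Returns true if the host was actually notified.
    bool kick() {
        // The host sets this flag while it is already processing the ring,
        // so a notification would be wasted work.
        if (used.flags.load(std::memory_order_acquire) & VRING_USED_F_NO_NOTIFY) {
            return false;
        }
        ++kick_count; // a real driver would write the queue index to the notify port
        return true;
    }
};
```

The saving comes from skipping the port write entirely whenever the host has asked not to be notified.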

e0355a8e

virtio: fix wrong size register accesses · 4ff04bd1
Guy Zana authored 11 years ago

4ff04bd1

Jun 20, 2013

Limit the usage of indirect buffers · 5b751612

Dor Laor authored 11 years ago

Indirect buffers are good for very large SG lists but aren't required
when there is enough room on the ring or the SG list is tiny.
For the time being they are barely used, so I turned them off
by default.

5b751612

Add mergeable buffers support for virtio-net · d487ffd1

Dor Laor authored 11 years ago

The feature allows the hypervisor to batch several packets together
as one large SG list. Once such a header is received, the guest rx
routine iterates over the list and assembles a mega mbuf.

The patch also simplifies the rx path by using a single buffer for
the virtio data and its header. This shrinks the sg list from two
entries to a single one.

The issue is that at the moment I haven't seen packets w/ mbuf > 1
being received. A Linux guest does receive such packets here and there.
It may be due to the use of offload features that enlarge the packet size.

d487ffd1

Jun 19, 2013

rwlock: initial implementation for a rwlock · 6af9f16d

Guy Zana authored 11 years ago

this rwlock gives precedence to writers, it relies on a mutex and 2 condvars
for its implementation.

it also supports taking the lock recursively for both readers and writers.

this implementation is not fully tested but yet the TCP stack uses it
extensively, so far without any seen races (tested TCPDownload and netperf).
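
A minimal sketch of a writer-preferring rwlock built from one mutex and two condition variables, assuming standard C++ primitives rather than the project's own mutex and condvar. It is not the committed implementation: recursion is left out, and the class name, member names, and the `try_` helpers are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class rwlock_sketch {
public:
    void rlock() {
        std::unique_lock<std::mutex> lk(_mtx);
        // Readers yield to any active or waiting writer (writer precedence).
        _readers_cv.wait(lk, [this] { return !_writer_active && _writers_waiting == 0; });
        ++_readers;
    }
    bool try_rlock() {
        std::lock_guard<std::mutex> lk(_mtx);
        if (_writer_active || _writers_waiting > 0) return false;
        ++_readers;
        return true;
    }
    void runlock() {
        std::lock_guard<std::mutex> lk(_mtx);
        if (--_readers == 0) _writers_cv.notify_one(); // last reader lets a writer in
    }
    void wlock() {
        std::unique_lock<std::mutex> lk(_mtx);
        ++_writers_waiting;
        _writers_cv.wait(lk, [this] { return !_writer_active && _readers == 0; });
        --_writers_waiting;
        _writer_active = true;
    }
    bool try_wlock() {
        std::lock_guard<std::mutex> lk(_mtx);
        if (_writer_active || _readers > 0) return false;
        _writer_active = true;
        return true;
    }
    void wunlock() {
        std::lock_guard<std::mutex> lk(_mtx);
        _writer_active = false;
        // Prefer the next writer; wake readers only when no writer is queued.
        if (_writers_waiting > 0) _writers_cv.notify_one();
        else _readers_cv.notify_all();
    }
private:
    std::mutex _mtx;
    std::condition_variable _readers_cv;
    std::condition_variable _writers_cv;
    int _readers = 0;
    int _writers_waiting = 0;
    bool _writer_active = false;
};
```

Supporting recursive acquisition, as the patch does, would additionally require tracking the owning thread and a nesting depth.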

6af9f16d

bsd: avoid using extern "C" in c++ files · 8536721b

Guy Zana authored 11 years ago

1. it is much cleaner that the header files perform extern "C" themselves,
   so they can be included both from C and C++ code.

2. when doing extern "C" from a C++ file then __cplusplus is also defined,
   and compilation can break in some situations.

3. as a bonus, this patch reduces compilation time.
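
The header-side pattern from point 1 can be sketched as follows. The header name `bsd_example.h` and the function `bsd_example_add` are hypothetical; the header is shown inline so the snippet compiles as a single C++ file.

```cpp
#include <cassert>

// --- bsd_example.h (hypothetical header) ---------------------------------
// The header wraps its own declarations, so both C and C++ files can
// include it without any extern "C" block at the include site.
#ifndef BSD_EXAMPLE_H
#define BSD_EXAMPLE_H

#ifdef __cplusplus
extern "C" {
#endif

int bsd_example_add(int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* BSD_EXAMPLE_H */

// --- user.cc (hypothetical C++ file) --------------------------------------
// A plain #include "bsd_example.h" would go here. The definition below
// inherits C linkage from the declaration in the header.
int bsd_example_add(int a, int b) { return a + b; }
```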

8536721b

bsd netport: rename log() to bsd_log() · bdd2b656

Nadav Har'El authored 11 years ago

netport.h defines a log() macro, which is an unfortunate choice of name
because log is also a pretty-well-known mathematical function, and this

So rename this macro bsd_log(), and change the dozen files which used
log() to use bsd_log().

bdd2b656

console: implement FIONREAD · ec2e5ebc

Nadav Har'El authored 11 years ago

Java's os::available() requires the FIONREAD ioctl on fds which do not
implement seek. So we need to support this ioctl for the console.
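
For illustration, this is roughly how a caller can ask a non-seekable descriptor how many bytes are ready, using the standard FIONREAD ioctl. The helper name `bytes_available` is hypothetical and this is not the JVM's actual code.

```cpp
#include <cassert>
#include <sys/ioctl.h>
#include <unistd.h>

// Returns the number of bytes that can be read from fd without blocking,
// or -1 if the descriptor does not support FIONREAD.
int bytes_available(int fd) {
    int n = 0;
    if (ioctl(fd, FIONREAD, &n) < 0) {
        return -1;
    }
    return n;
}
```

On a pipe, for example, the result is 0 before anything is written and equals the number of unread bytes afterwards.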

ec2e5ebc

lock-free mutex: use wake_with() · b4670b9c
Nadav Har'El authored 11 years ago
```
Use the new wake_with() in lock-free mutex
```
b4670b9c

condvar: improve wake performance while maintaining correctness · 43e34bbf

Nadav Har'El authored 11 years ago

Before commit 1b53ec56, condvar_wake_all() had a crash which could be seen
in tst-pipe.so (which apparently tests some condvar code paths that weren't
tested by tst-condvar.so).

The fix in that commit was for condvar_wait() to regain the condvar
internal lock after the wait, even if not needed (it's only really
needed in case of timeout). This masked the bug (see details below)
but also hurt performance: the woken-up thread would now often
go back to sleep to wait for the lock, which is still held by
condvar_wake().

This patch reverts that commit, i.e., condvar_wait() does not retake
the lock when woken. Instead, we fix the real bug: The bug was in
condvar_wake_all() which did:

        sched::thread *t = wr->t;
        wr->t = nullptr;
        t->wake();
        wr = wr->newer;

but after the wake(), wr is no longer valid (the waiter, being woken,
would quickly exit the condvar_wait() function which held wr on the stack).

However, by not taking the lock after the wait we also open up another
potential bug: in rare cases merely doing wr->t = nullptr can
cause the thread t to start running, and if it not only stops waiting
but also exits, the call to t->wake() will refer to an invalid thread
and may crash. So we need to use the new wake_with() thread method
introduced in a previous patch.
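
A sketch of the traversal-order part of the fix only, with stand-in types: the next pointer is read before the wake, so the loop never touches a wait record that may already have left the waiter's stack. The `thread_sketch` and `wait_record` types and the `on_wake` hook are hypothetical, and the sketch does not cover the thread-lifetime problem that wake_with() addresses.

```cpp
#include <cassert>
#include <functional>

// Stand-in for sched::thread; on_wake lets a test simulate the waiter running.
struct thread_sketch {
    std::function<void()> on_wake;
    void wake() { if (on_wake) on_wake(); }
};

// Stand-in for the wait record that lives on the waiter's stack.
struct wait_record {
    thread_sketch* t;
    wait_record* newer;
};

// Wakes every waiter starting from the oldest; returns how many were woken.
int wake_all(wait_record* oldest) {
    int woken = 0;
    for (wait_record* wr = oldest; wr != nullptr; ) {
        wait_record* next = wr->newer; // read first: wr may be gone after the wake
        thread_sketch* t = wr->t;
        wr->t = nullptr;
        t->wake();
        wr = next;
        ++woken;
    }
    return woken;
}
```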

43e34bbf

tst-wake: test for wake_with() · d0b56169

Nadav Har'El authored 11 years ago

Added a test for wake_with(). It tries to ensure that the problematic
case solved by wake_with() actually happens quickly, by:
1. Spin a long time between the setting of the flag and t->wake()
2. Do a spurious wake() to ensure that the waiting thread is woken
up right after setting the flag, before the intended wake.
3. Use mprotect() to ensure that working with an already join()ed
thread crashes immediately, instead of just maybe crashing.

This test fails when wake_with() doesn't use ref()/unref(), and succeeds
with the full wake_with().

tst-wake contains a second test, which does the same thing but without
the additional measures we used to show the bug (spinning, spurious
wake and mprotect). Without these additional measures the test iteration
is much faster, which allows us to stress wake/join much more.

d0b56169

sched::thread: wake_with() to wake a wait_until() · f99c5ccc

Nadav Har'El authored 11 years ago

When we use wait_until(), e.g.,

        wait_until([&] { return *x == 0; })

We used (in a bunch of places in the code, including condvar) the
following "obvious" idiom to wake it up:

        *x = 0;
        t->wake();

This does the right thing in *almost* all situations. But there's still
one rare (but very possible) scenario where this is wrong. The problem is
that the first line (*x = 0) may already cause the wait_until to return.
This can happen when wait_until didn't yet check the condition, or if it
was sleeping and by rare coincidence, got woken up by a spurious interrupt
at the same time we did *x = 0. Now, consider the case that the waiting
thread decides to exit after the wait_until... So the "*x = 0" causes the
thread to exit, and when we want to do "t->wake()" the thread no longer
exists, and the statement crashes.

This patch adds two new thread methods: t->ref() increments a counter
preventing a thread's destruction, until a matching t->unref().
With these methods, the correct way to wake the above wait_until() is:

        t->ref();
        *x = 0;
        t->wake();
        t->unref();

This patch also adds a one-line shortcut to the above 4 lines, with syntax
mirroring that of wait_until:

        t->wake_with([&] { *x = 0; });

The ref()/unref() methods are marked private, to encourage the use of
wake_with(), and also to allow wake_with() in the future to be optimized
to avoid calling ref()/unref() when not needed. For example, when the thread
is on the same CPU as the current thread, merely disabling preemption (a
very fast operation) prevents the thread from running - and exiting - and
ref()/unref() are not necessary.
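
The shape of wake_with() described above can be sketched like this. It is a toy stand-in, not the real sched::thread: wake() only counts calls, the reference count only records that the thread is pinned, and the `wakeups()` and `refs()` accessors are hypothetical additions for observing the behaviour.

```cpp
#include <atomic>
#include <cassert>

class thread_sketch {
public:
    // Stand-in for the scheduler wake-up; here it only counts calls.
    void wake() { ++_wakeups; }

    // Runs the action that makes the wait condition true and then wakes the
    // thread, holding a reference across both steps so the thread object
    // cannot be destroyed in between.
    template <typename Action>
    void wake_with(Action action) {
        ref();
        action();
        wake();
        unref();
    }

    int wakeups() const { return _wakeups; }
    int refs() const { return _refs.load(); }

private:
    // Private, as in the commit message, to steer callers towards wake_with().
    void ref() { _refs.fetch_add(1); }
    void unref() { _refs.fetch_sub(1); }

    std::atomic<int> _refs{0};
    int _wakeups = 0;
};
```

In a real scheduler, thread destruction would have to wait until the count drops back to zero; that part is omitted here.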

Unfortunately, while this patch solves one bug, it does not solve two
additional bugs that existed before, and continue to exist after this
patch:

1. When a thread completes (see thread::complete()) it wakes a thread
   waiting on join() (if there is one) and this join() deletes the thread
   and its stack. The problem is that if the timing is right (or wrong ;-)),
   the joiner thread may delete the stack while complete() is still
   running on this stack, and can cause a crash.

2. If join() races with the thread's completion, it is possible that
   the thread thinks nobody is waiting for it so notifies nobody, but
   at the same time join() starts to wait, and will never be woken up.

Added two "FIXME" about these remaining bugs.

f99c5ccc

Don't crash on lseek() of non-regular file · 0c8ef37c

Nadav Har'El authored 11 years ago

lseek() crashes when used on pipes, sockets, and now also fd 0, 1 or 2
(the console), because they don't have an underlying vnode. No reason
to assert() in this case, should just return ESPIPE (like Linux does
for pipes, sockets and ttys).

Similarly, fsync, readdir and friends, fchdir and fstatfs shouldn't
crash if given a fd without a vnode, and rather should return the
expected error.
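
A minimal sketch of the check, assuming a file structure whose vnode pointer is null for pipes, sockets and the console. The types and the function `lseek_sketch` are hypothetical, and only absolute (SEEK_SET-style) positioning is modelled.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>

// Hypothetical stand-ins for the kernel's vnode and file structures.
struct vnode_sketch { off_t size; };
struct file_sketch {
    vnode_sketch* vnode; // null for pipes, sockets and the console
    off_t offset;
};

// Sets the file offset; returns the new offset, or -1 with errno set.
off_t lseek_sketch(file_sketch* fp, off_t offset) {
    if (fp->vnode == nullptr) {
        errno = ESPIPE; // report an error instead of asserting
        return -1;
    }
    fp->offset = offset;
    return offset;
}
```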

0c8ef37c

Fix concurrent console read and write bug · 907e6336

Nadav Har'El authored 11 years ago

We had a bug where a read() on the console (fd 0) would block writes to
the console (fd 1 or 2). This was most noticeable when background threads
in the CLI tried to write output, and were blocked until the next keypress
because the blocking read() would lock the writes out.

The bug happens because we opened the console using open("/dev/console")
and dup()'ed the resulting fd, but this results, in the current code, in
every read and write to these file descriptors to pass through vfs_read()/
vfs_write(), which lock a single vnode lock for all three file descriptors -
leading to write on fd 1 blocking while read is ongoing on fd 0.

This patch doesn't fix this vnode lock issue, which remains - and should
be fixed when the devfs or vfs layers are rewritten. Instead, this patch
adds a *second API* for opening a console which doesn't go through the
vnode or devfs layers:

A new console::open() function returns a file descriptor which implements
the correct file operations, and is not associated with any vnode.

The new implementation works well with write() while read() is ongoing.

Note that poll() support was missing from the old implementation (it
seems it can't be done with the vnode abstraction?) and is still missing
in the new implementation, although now shouldn't be hard to add
(need to implement the poll fileops, and to use poll_wake() in the
line-discipline function console_poll).

907e6336

cli: fix tab completion · 79a89f95

Avi Kivity authored 11 years ago

tab completion relies on a global 'ls' object, re-add it.

Broken by 4bfe157b.

79a89f95

Add unsupported_poll · 29652e61

Nadav Har'El authored 11 years ago

Sorry, missing unsupported_poll broke compilation after the previous patch

29652e61

Temporary, inefficient, epoll implementation · ad26fb4b

Nadav Har'El authored 11 years ago

This is an epoll_*() implementation which calls poll() to do the real work.
This is of course a terrible implementation, which makes epoll() less
efficient, instead of more efficient, than poll(). However, it allows me
to progress with running Jetty in parallel with perfecting epoll.
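
The general idea can be sketched as below, assuming POSIX poll(). This is not the committed code: the class `epoll_over_poll` and its methods are hypothetical and do not mirror the real epoll_create/epoll_ctl/epoll_wait signatures.

```cpp
#include <cassert>
#include <map>
#include <poll.h>
#include <unistd.h>
#include <vector>

// Keeps an interest set and answers every wait by calling poll() on all of it.
class epoll_over_poll {
public:
    void add(int fd, short events) { _interest[fd] = events; }
    void remove(int fd) { _interest.erase(fd); }

    // Returns the descriptors that have events pending; the result is empty
    // on timeout or error.
    std::vector<pollfd> wait(int timeout_ms) {
        // The whole array is rebuilt on each call, which is why this is
        // slower than calling poll() directly.
        std::vector<pollfd> fds;
        for (const auto& entry : _interest) {
            fds.push_back({entry.first, entry.second, 0});
        }
        std::vector<pollfd> ready;
        if (poll(fds.data(), fds.size(), timeout_ms) <= 0) {
            return ready;
        }
        for (const auto& p : fds) {
            if (p.revents != 0) ready.push_back(p);
        }
        return ready;
    }

private:
    std::map<int, short> _interest;
};
```

Each wait costs time proportional to the size of the interest set, whereas a real epoll only reports descriptors that became ready.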

ad26fb4b

Add todo/dns · 15386921

Nadav Har'El authored 11 years ago

It's not clear if our DNS resolver works or not - need to test and fix
if needed.

15386921

BSD porting: implement mtx_assert() · 1b9a3b5b

Nadav Har'El authored 11 years ago

Trivially implement mtx_assert(). This would catch the "ifconfig" bug
fixed in the previous patch - where ifconfig called sofree() without
the accept lock.

1b9a3b5b

Fix "ifconfig" corrupting accept_mtx · 073d9ea7

Nadav Har'El authored 11 years ago

ifconfig used to call sofree(), which assumed accept_mtx was taken, which
wasn't true, resulting in either an assertion failure (if we implement
mtx_assert - see next patch) or a mutex corruption (if mtx_assert does
nothing).

Instead, we should call soclose(). This wasn't very hard to figure out,
given the comment in socreate() saying "The socket should be closed with
soclose()." :-)

073d9ea7

Jun 18, 2013

Make lock-free mutex our default mutex · bbcd59f7

Nadav Har'El authored 11 years ago

This patch turns on the flag which switches all our code to use the
lock-free mutex instead of the spinlock-based mutex.

It's time we start using the lock-free mutex, which is stable enough by
now - but please let me know if you do experience any performance problem,
or bugs, related to the new mutex.

If you need to disable the new mutex temporarily and return to the old,
just change the "#define LOCKFREE_MUTEX" in osv/mutex.h to #undef.

bbcd59f7

mutex: remove silly "return" · 8031eeae
Nadav Har'El authored 11 years ago
```
Returning a void does nothing, and is just confusing.
```
8031eeae
build: rename build.mak to build.mk · aac18bc2
Avi Kivity authored 11 years ago
```
Eclipse recognizes .mk as a makefile; this makes it easier for new users
to use Eclipse.
```
aac18bc2
zfs: use mutex_owned · b8e05b23
Christoph Hellwig authored 11 years ago

b8e05b23

CLI: add tiny HTTP server · 948bea47

Nadav Har'El authored 11 years ago

This single Java source file is a full-fledged HTTP 0.9 server.
I wanted to add it to expose the console lock bug (fixed in a separate
patch), and to verify that bind() works correctly (it does).

But additionally, this tiny HTTP server (about 6KB of compressed bytecode)
can be very useful for our CLI - it can be run in the background and let
you view files in the OSV system in your browser, even while another
program is running.

To run Shrew from the CLI, just run

	java com.cloudius.cli.util.Shrew

Which runs the HTTP server in the background (in a separate thread),
letting the user continue to use the CLI. If you add an argument "fg" to
this command, it runs the server in the current thread, never returning.

Currently, the HTTP server is written to browse OSV's root directory
hierarchy: accessing http://192.168.122.100:8080/ from the host shows
you the OSV guest's root directory, and you can descend into more
directories and download individual files.

948bea47

cli: self-registering commands · 4bfe157b

Avi Kivity authored 11 years ago

Instead of defining a command object in one file and registering it in
another, do everything in one place.

4bfe157b

Merge branch 'cli' · 141159e5

Avi Kivity authored 11 years ago

- per-cpu variables
- per-cpu kvmclock
- tracepoint probe functions
- tracepoint Java API
- 'perf stat' cli command

141159e5

cli: add stat command · 092f28c6

Avi Kivity authored 11 years ago

Usage:

  perf list (lists all tracepoints)
  perf stat tp... (counts tracepoints)

Example:

[/]$ perf stat mutex_lock ctxsw=sched_switch mutex_unlock wake=sched_wake
  mutex_lock   ctxsw  mutex_unlock    wake
          40       3          1909       2
        2075     147           190      82
         193     138           193      78
         146     139           146      92
         317     179           317      78
         146     139           146      78
         146     139           186      78
         205     139           165      78
         146     139           146      78
         146     139           146      78
         146     139           146      80
         193     143           193      81
         151     147           151      78
         146     139           146      78
         146     139           146      78
         146     139           146      78
         159     139           159      78
         149     139           149      78
         146     139           146      78
         164     139           164      78
         146     139           176      78
         176     139           146      78
         149     139           149      78
         146     139           146      78
         146     139           146      78
  mutex_lock   ctxsw  mutex_unlock    wake
         146     139           146      79
         715     147           715      80
         188     139           204      78

092f28c6

add a zfs test using the disk backend · f8a4ece3
Christoph Hellwig authored 11 years ago

f8a4ece3
actually wire up the simple zfs test · ff566211
Christoph Hellwig authored 11 years ago

ff566211
zfs: implement support for reading the root label · ef5c936a
Christoph Hellwig authored 11 years ago

ef5c936a
zfs: don't ignore DKIOCFLUSHWRITECACHE in vdev_disk · 4197debe
Christoph Hellwig authored 11 years ago
```
Don't actually implement it either yet, but at least don't abort.
```
4197debe
zfs: fix vdev_disk size detection · dde6b485
Christoph Hellwig authored 11 years ago

dde6b485
zfs: wire up with the VFS · e6d95380
Christoph Hellwig authored 11 years ago

e6d95380
zfs: wire up sa and znode code · 650a3161
Christoph Hellwig authored 11 years ago

650a3161
solaris: remove MS_ definitions in <sys/mount.h> · 22ac43b3
Christoph Hellwig authored 11 years ago
```
We already get these from our API version of <sys/mount.h>
```
22ac43b3
solaris: stub out parts of <sys/pathname.h> · 2b89b034
Christoph Hellwig authored 11 years ago

2b89b034
solaris: stub out <sys/policy.h> · 56039c84
Christoph Hellwig authored 11 years ago

56039c84
solaris: wire up common acl code · af53e8c1
Christoph Hellwig authored 11 years ago

af53e8c1