Commits · 3e6763f7b045c042969fbe504a7a503608f5d446 · Verlässliche Systemsoftware / projects / osv

Dec 11, 2013

mmu: fix allocate_intermediate_level · 3e6763f7

Glauber Costa authored 11 years ago


We have recently seen a problem where an occasional page fault outside
the application would occur.

I managed to track that down to my huge page failure patch, but wasn't
really sure what was going on. Kudos to Raphael, then, who figured
out that the problem happened when allocate_intermediate_level was called
from split_huge_page.

The problem here is that in that case we do *not* enter
allocate_intermediate_level with the pte emptied, and were previously
expecting the write of the new pte to happen unconditionally. The
compare_exchange broke it, because the exchange doesn't really happen.

There are many ways to fix this issue, but the least confusing of them,
given that there are other callers to this function that could
potentially display this problem, is to do some defensive programming
and clearly separate the semantics of both types of callers.
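
A minimal sketch of the two caller semantics described above, assuming a
page table entry can be modelled as an atomic 64-bit word where 0 means
"empty". The names pte_t, install_if_empty and install_unconditionally are
hypothetical and are not taken from the patch:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a page table entry; 0 means "empty".
using pte_t = std::uint64_t;

// For callers that expect the slot to be empty: install the new entry only
// if nobody else populated it first. Returns true if our value was written.
bool install_if_empty(std::atomic<pte_t>& slot, pte_t new_pte)
{
    pte_t expected = 0;
    return slot.compare_exchange_strong(expected, new_pte);
}

// For callers such as a huge page split, where the slot is known to be
// occupied and the new entry must replace it regardless of the old value.
void install_unconditionally(std::atomic<pte_t>& slot, pte_t new_pte)
{
    slot.store(new_pte);
}
```

With an occupied slot, install_if_empty() leaves the old entry in place and
returns false, which is the silent failure described above when the
compare_exchange path is used for the split case.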

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Tested-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3e6763f7

Verify slow page fault only happens when preemption is allowed · b7620ca2

Nadav Har'El authored 11 years ago


Once page_fault() checks that this is not a fast fixup (see safe_load()),
we reach the page-fault slow path, which needs to allocate memory or
even read from disk, and might sleep.

If we ever get such a slow page-fault inside kernel code which has
preemption or interrupts disabled, this is a serious bug, because the
code in question thinks it cannot sleep. So this patch adds two
assertions to verify this.

The preemptable() assertion is easily triggered if stacks are demand-paged
as explained in commit 41efdc1c (I have
a patch to solve this, but it won't fit in the margin).
However, I've also seen this assertion without demand-paged stacks, when
running all tests together through testrunner.so. So I'm hoping these
assertions will be helpful in hunting down some elusive bugs we still have.

This patch adds a third use of the "0x200" constant (bit 9 of
the rflags register is the interrupt flag), so it replaces them with a
new symbolic name, processor::rflags_if.
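
As an illustration only, the symbolic constant could look like the sketch
below. The name processor::rflags_if comes from this message; its type and
the helper interrupts_enabled() are assumptions made for the example:

```cpp
#include <cassert>
#include <cstdint>

namespace processor {
// Bit 9 of rflags is the interrupt flag (IF) on x86-64.
constexpr std::uint64_t rflags_if = 1ull << 9;
}

static_assert(processor::rflags_if == 0x200, "IF is bit 9 of rflags");

// Hypothetical helper: true if a saved rflags value had interrupts enabled.
constexpr bool interrupts_enabled(std::uint64_t rflags)
{
    return (rflags & processor::rflags_if) != 0;
}
```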

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b7620ca2

vma_fault: propagate exception frame to fault handlers · 7ab5f9e8

Glauber Costa authored 11 years ago

We suddenly stop propagating the exception frame down the vma_fault path.
There is no reason not to propagate it further, other than that there are
currently no users. Besides making frame passing more consistent, I intend
to use it for the JVM balloon.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7ab5f9e8

tst-fs-link.so: Use mktemp() in check_vnode_duplicity() · 9dab3e92

Raphael S. Carvalho authored 11 years ago


Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9dab3e92

Rename blacklisted tests · 4e4e191f

Nadav Har'El authored 11 years ago


Rename blacklisted tests, from tst-wake.cc et al. to misc-wake.cc.

The different name will cause these tests not to be automatically run
by "make check" - without needing the separate blacklist in test.py
(which this patch deletes).
After this patch, testrunner.so will also only run tests called tst-*,
so will not run the misc-* tests.

The misc-* tests can still be run manually, e.g.,
  run.py -e tests/misc-mutex.so

In addition to the previously blacklisted tests, this patch "blacklists"
(renames) a few additional tests which fail quickly, but test.py didn't
know because they didn't use the word "fail". An example is tst-schedule.so,
which exited immediately when not run on 1 vcpu. So this patch also
renames it to misc-schedule.so, so "make check" or testrunner.so won't
run this test.

Note that after this patch, testrunner.so is a new way to run all tests,
but it isn't working well yet because it still exposes new bugs that do not
exist in the separate tests (depending on your view point, this might be
considered a feature, not a bug, in testrunner.so...).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4e4e191f

virtio-blk: Disable interrupts while irq handling is in progress · 8cc46dca

Asias He authored 11 years ago


This reduces unnecessary interrupts that the host could send to the guest
while the guest is in the process of irq handling.

In virtio_driver::wait_for_queue, we will re-enable interrupts when
there is nothing to process.

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8cc46dca

tst-fs-link.so: Use mktemp() for path names · df4a7bd2

Pekka Enberg authored 11 years ago


Using hard-coded path names is problematic because other test cases may
use the same path names and forget to clean up after themselves.

Make tst-fs-link.so more robust by using mktemp() to generate unique
path names.

Reviewed-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

df4a7bd2

tests: fix threads being destroyed earlier. · b0dc3f1a

Glauber Costa authored 11 years ago


The last part of the standard thread tests creates 4 threads and calls
detach on one from the body of another. They live in the same block to
guarantee that they will all be destroyed at more or less the same time (we
expect). Avi, however, demonstrated that a mistake prevents that from
actually being the case:

    t1 starts to run
    t2 starts to run
    t3 starts to run
    t4 starts to run
    t4 is detached
    t4 is destroyed (ok)
    t3 is destroyed. It wasn't detached or joined, so we terminate
    t1, t2, t3 are detached, but too late

This introduces a simple wait mechanism to avoid having the threads
terminated after the block is gone.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b0dc3f1a

Dec 10, 2013

Fix shared-object finalization · 4d24b90a

Nadav Har'El authored 11 years ago


This patch fixes two bugs in shared-object finalization, i.e., running
its static destructors before it is unloaded. The bugs were seen when
osv::run()ing a test program using libboost_unit_test_framework-mt.so,
which crashed after the test program finished.

The two related bugs were:

1. We need to call the module's destructors (run_fini_funcs()) *before*
   removing it from the module list, otherwise the destructors will not
   be able to call functions from this module! (we got a symbol not
   found error in the destructor).

2. We need to unload the modules needed by this module *before* unloading
   this module, not after as was (implicitly) done until now.
   This makes sense because of symmetry (during a module load, the needed
   modules are loaded after this one), but also practically: a needed
   module's destructor (in our case, boost unit test framework) might refer
   to objects in the needing module (in our case, the test program),
   so we cannot call the needed module's destructor after we've already
   unloaded the needing module.
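
The sketch below shows one unload ordering that is consistent with both
points. It is a toy model, not the loader's code: shared_object, loader and
the log vector are hypothetical, and reference counting of shared
dependencies is left out:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct shared_object {
    std::string name;
    std::vector<shared_object*> needed;   // objects this one depends on
};

struct loader {
    std::vector<shared_object*> modules;  // currently loaded objects
    std::vector<std::string> log;         // records the order of events

    bool is_loaded(const shared_object* m) const {
        return std::find(modules.begin(), modules.end(), m) != modules.end();
    }

    void unload(shared_object* m) {
        if (!is_loaded(m)) {
            return;
        }
        // 1. Run destructors while the object is still in the list, so
        //    symbol lookups made by the destructors can still find it.
        log.push_back("fini " + m->name);
        // 2. Unload the needed objects before this one goes away, since
        //    their destructors may refer to objects that live in it.
        for (shared_object* dep : m->needed) {
            unload(dep);
        }
        // 3. Only now remove the object itself.
        modules.erase(std::remove(modules.begin(), modules.end(), m),
                      modules.end());
        log.push_back("unload " + m->name);
    }
};
```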

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4d24b90a

corrected small typo on the README.md · 1017f90d

Juan Antonio Osorio authored 11 years ago


Signed-off-by: Juan Antonio Osorio Robles <jaosorior@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

1017f90d

test.py: add '--repeat' option · 3acafce2

Pekka Enberg authored 11 years ago


Add a '--repeat' option to test.py that repeats the test suite until a
test fails.  This is useful for detecting test cases that fail some of
the time.

Reviewed-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3acafce2

test.py: Make output pretty and show duration · fbf2a946

Pekka Enberg authored 11 years ago


Make the test runner output look pretty and show test duration to make
it visible which tests take the longest time to run.  The output looks
as follows now:

    TEST tst-af-local.so           OK  (3.288 s)
    TEST tst-bdev-write.so         OK  (1.058 s)
    TEST tst-bsd-evh.so            OK  (1.071 s)
    TEST tst-bsd-kthread.so        OK  (1.234 s)
    TEST tst-bsd-taskqueue.so      OK  (1.062 s)
    TEST tst-bsd-tcp1.so           OK  (2.114 s)
    TEST tst-commands.so           OK  (1.141 s)
    TEST tst-condvar.so            OK  (1.776 s)
    TEST tst-dns-resolver.so       OK  (2.560 s)
    TEST tst-epoll.so              OK  (1.952 s)
    TEST tst-except.so             OK  (1.146 s)
    TEST tst-fpu.so                OK  (2.630 s)
    TEST tst-fs-link.so            OK  (1.051 s)
    TEST tst-fs-stress.so          OK  (1.027 s)
    TEST tst-fsx.so                OK  (1.067 s)
    TEST tst-hub.so                OK  (6.256 s)
    TEST tst-huge.so               OK  (2.199 s)
    TEST tst-kill.so               OK  (4.147 s)
    TEST tst-libc-locking.so       OK  (2.110 s)
    TEST tst-loadbalance.so        OK  (1.070 s)
    TEST tst-mmap-file.so          OK  (1.080 s)
    TEST tst-mmap.so               OK  (1.087 s)
    TEST tst-pipe.so               OK  (7.306 s)
    TEST tst-preempt.so            OK  (1.119 s)
    TEST tst-pthread.so            OK  (1.100 s)
    TEST tst-queue-mpsc.so         OK  (3.748 s)
    TEST tst-ramdisk.so            OK  (1.078 s)
    TEST tst-readdir.so            OK  (1.094 s)
    TEST tst-remove.so             OK  (1.030 s)
    TEST tst-rename.so             OK  (1.157 s)
    TEST tst-resolve.so            OK  (1.095 s)
    TEST tst-scheduler.so          OK  (1.087 s)
    TEST tst-sleep.so              OK  (3.083 s)
    TEST tst-solaris-taskq.so      OK  (1.061 s)
    TEST tst-stat.so               OK  (1.106 s)
    TEST tst-strerror_r.so         OK  (1.102 s)
    TEST tst-tcp-sendonly.so       OK  (2.014 s)
    TEST tst-tcp.so                OK  (1.080 s)
    TEST tst-threadcomplete.so     OK  (2.770 s)
    TEST tst-tracepoint.so         OK  (1.109 s)
    TEST tst-truncate.so           OK  (1.083 s)
    TEST tst-utimes.so             OK  (1.079 s)
    TEST tst-vblk.so               OK  (1.310 s)
    TEST tst-vfs.so                OK  (1.118 s)
    TEST tst-yield.so              OK  (1.992 s)
    TEST tst-zfs-mount.so          OK  (1.087 s)
  OK (58 tests run, 82.944 s)

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

fbf2a946

test.py: Switch to blacklist · 637eb584

Pekka Enberg authored 11 years ago


Switch the whitelist to a blacklist to increase testing coverage.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

637eb584

Add tests into tst-fs-link.so to check vnode duplicity · 36a15288

Raphael S. Carvalho authored 11 years ago


Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

36a15288

vfs: Fix duplicate in-memory vnodes · 9ecda822

Raphael S. Carvalho authored 11 years ago


Currently, namei() does vget() unconditionally if no dentry is found.
This is wrong because the path can be a hard link that points to a vnode
that's already in memory.

To fix the problem:

  - Use inode number as part of the hash in vget()

  - Use vn_lookup() in vget() to make sure we have one vnode in memory
    per inode number.

  - Push the vget() calls down to individual filesystems and make
    VOP_LOOKUP return a vnode
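
A rough sketch of the "one vnode in memory per inode number" idea, assuming
a table keyed by mount and inode number. The types mount_point, vnode and
vnode_table are hypothetical, and locking is omitted entirely:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

struct mount_point {};   // hypothetical stand-in for a mounted filesystem

struct vnode {
    mount_point* mp;
    std::uint64_t ino;
};

class vnode_table {
public:
    // Returns the in-memory vnode for (mp, ino), creating it only if it is
    // absent. The bool tells the caller whether the vnode is new.
    std::pair<vnode*, bool> vget(mount_point* mp, std::uint64_t ino) {
        auto key = std::make_pair(mp, ino);
        auto it = _table.find(key);
        if (it != _table.end()) {
            return {it->second.get(), false};
        }
        auto vp = std::make_unique<vnode>(vnode{mp, ino});
        vnode* raw = vp.get();
        _table.emplace(key, std::move(vp));
        return {raw, true};
    }
private:
    std::map<std::pair<mount_point*, std::uint64_t>,
             std::unique_ptr<vnode>> _table;
};
```

Two hard links resolve to the same inode number, so a second vget() for that
number hands back the existing vnode instead of creating a duplicate.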

Changes since v2:
  - v1 dropped lock in vn_lookup, thus assert that vnode_lock is held.

Changes since v3:
  - Fix lock ordering issue in dentry_lookup. The lock of the parent
    node must be acquired before dentry_lookup and released after the
    process is done. Otherwise, a second thread looking up the same
    dentry may take the 'NULL' path incorrectly.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

9ecda822

tst-threadcomplete: fix race between t2 running and t1 destroying · a95e2f5c

Avi Kivity authored 11 years ago


        sched::thread *t2 = nullptr;
        sched::thread *t1 = new sched::thread([&]{
            // wait for the t2 object to exist (not necessarily run)
            sched::thread::wait_until([&] { return t2 != nullptr; });
            if (quick) {
                return;
            }
            sched::thread::sleep_until(nanotime() + 10_ms);
        }, sched::thread::attr(sched::cpus[0]));

        t2 = new sched::thread([&]{
            t1->wake();
        }, sched::thread::attr(sched::cpus[1]));

        t1->start();
        t2->start();
        delete t1;
        delete t2;

t1 may start, complete, and be destroyed before t2 gets a chance to run.  In
this case the call to t1->wake() will access deallocated memory.

Fix by making sure t1 is only destroyed after t2 completes.
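
The ordering of the fix can be sketched with std::thread instead of
sched::thread. This is an analogy under that assumption, not the test's
code; wakeable and run_once() are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for the object that t2 touches when it wakes t1.
struct wakeable {
    std::atomic<bool> woken{false};
    void wake() { woken.store(true); }
};

// Returns true if t2's wake() landed on a still-live object.
bool run_once()
{
    auto target = std::make_unique<wakeable>();   // plays the role of t1
    std::thread t2([&] { target->wake(); });

    // Wait for t2 to complete *before* destroying what it touches.
    t2.join();
    bool woken = target->woken.load();
    target.reset();                               // now safe to destroy
    return woken;
}
```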

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

a95e2f5c

test.py: Make test runner silent by default · af70c145

Pekka Enberg authored 11 years ago


Make test.py silent by default and only print out OSv log on error or if
the '--verbose' command line option is passed.

Reviewed-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

af70c145

Fix wrong error codes in unlink(), rmdir() and readdir() · 86b5374f

Nadav Har'El authored 11 years ago


This patch fixes the error codes in four error cases:

1. unlink() of a directory used to return EPERM (as in POSIX), and now
   returns EISDIR (as in Linux).

2. rmdir() of a non-empty directory used to return EEXIST (as in POSIX)
   and now returns ENOTEMPTY (as in Linux).

3. rmdir() of a regular file (non-directory) used to return EBADF
   and now returns ENOTDIR (as in Linux).

4. readdir() of a regular file (non-directory) used to return EBADF
   and now returns ENOTDIR (as in Linux).

This patch also adds a test, tst-remove.cc, for the various unlink() and
rmdir() success and failure modes.
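
For illustration, the first three cases can be probed with a small program
like the one below. It is not tst-remove.cc; the helper names and the /tmp
path template are made up for the example, and the values the note after the
code expects are the Linux ones listed above:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <stdlib.h>
#include <string>
#include <unistd.h>

// Runs fn and returns the errno it reported, or 0 if fn succeeded.
template <typename Fn>
int errno_of(Fn fn)
{
    return fn() == -1 ? errno : 0;
}

struct remove_errors {
    int unlink_dir;        // unlink() on a directory
    int rmdir_nonempty;    // rmdir() on a non-empty directory
    int rmdir_file;        // rmdir() on a regular file
};

remove_errors probe_remove_errors()
{
    char tmpl[] = "/tmp/remove-sketch-XXXXXX";
    char* p = mkdtemp(tmpl);
    assert(p != nullptr);
    std::string dir = p;
    std::string file = dir + "/f";
    int fd = open(file.c_str(), O_CREAT | O_WRONLY, 0600);
    assert(fd != -1);
    close(fd);

    remove_errors r;
    r.unlink_dir = errno_of([&] { return unlink(dir.c_str()); });
    r.rmdir_nonempty = errno_of([&] { return rmdir(dir.c_str()); });
    r.rmdir_file = errno_of([&] { return rmdir(file.c_str()); });

    // Clean up the scratch file and directory.
    unlink(file.c_str());
    rmdir(dir.c_str());
    return r;
}
```

On Linux the three fields are expected to come back as EISDIR, ENOTEMPTY and
ENOTDIR respectively.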

Fixes #123.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

86b5374f

sched: remove thread::_ref_counter · 9c9262b0

Avi Kivity authored 11 years ago


As ref() is now never called, we can remove the reference counter and
make unref() unconditional.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9c9262b0

sched: change wake_with() to use rcu locking · 3dec9895

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3dec9895

sched: add a wake() function that is safe to use on a thread that may terminate · dc40b49e

Avi Kivity authored 11 years ago


One problem with wake() is that, if the thread it is waking can exit
concurrently, it may touch freed memory belonging to the thread structure.

Fix by separating the state that wake() touches into a detached_state
structure, and free that using rcu.

Add a thread_handle class that references only this detached state, and
accesses it via rcu.
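
The shape of this split can be sketched as follows. Note the swap: the
sketch uses std::shared_ptr reference counting as a stand-in for RCU, which
is not what the patch does, and sketch_thread plus all member names are
hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// The only state wake() needs; it lives apart from the thread object.
struct detached_state {
    std::atomic<bool> alive{true};
    std::atomic<int> wakeups{0};
};

class sketch_thread {
public:
    sketch_thread() : _detached(std::make_shared<detached_state>()) {}
    ~sketch_thread() { _detached->alive.store(false); }
    std::shared_ptr<detached_state> detached() const { return _detached; }
private:
    std::shared_ptr<detached_state> _detached;
};

// References only the detached state, never the thread object itself.
class thread_handle {
public:
    explicit thread_handle(const sketch_thread& t) : _state(t.detached()) {}

    // Safe to call even after the thread object has been destroyed,
    // because the detached state outlives it.
    bool wake() {
        if (!_state->alive.load()) {
            return false;
        }
        _state->wakeups.fetch_add(1);
        return true;
    }
private:
    std::shared_ptr<detached_state> _state;
};
```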

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

dc40b49e

rcu: make rcu_ptr default initialize to a reasonable value · a828b340

Avi Kivity authored 11 years ago


Makes it much easier to use.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a828b340

rcu: add preempt_lock_in_rcu · 18cee681

Avi Kivity authored 11 years ago


rcu_read_lock disables preemption, but this is an implementation detail
and users should not make use of it.

Add preempt_lock_in_rcu that takes advantage of the implementation detail
and does nothing, but allows users to explicitly disable preemption.
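
A sketch of what such a do-nothing lock might look like, assuming it only
needs lock() and unlock() members. The type name preempt_lock_in_rcu_type
and the reader function are hypothetical:

```cpp
#include <cassert>
#include <mutex>

// No-op lock: in this sketch, being inside an RCU read-side critical
// section is assumed to already imply that preemption is disabled.
struct preempt_lock_in_rcu_type {
    void lock() {}
    void unlock() {}
};

// Hypothetical reader. The caller is assumed to hold the RCU read lock.
int read_per_cpu_counter(const int& counter)
{
    preempt_lock_in_rcu_type preempt_lock_in_rcu;
    // Documents that the code below needs preemption off, at no cost.
    std::lock_guard<preempt_lock_in_rcu_type> guard(preempt_lock_in_rcu);
    return counter;
}
```

The point of the design is that the requirement is written down in the code,
so it survives if rcu_read_lock ever stops disabling preemption.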

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

18cee681

rcu: forward declare preempt_enable() to avoid #include hell · dff28306

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

dff28306

mman: Fix errno handling in mmap and munmap · 2358ac62

Pekka Enberg authored 11 years ago


Nadav Har'El explains:

  Traditionally, functions which succeed do NOT set errno to zero, but
  rather leave it unchanged (errno(3) on Linux says, for example, that
  "errno is never set to zero by any system call or library function.").
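
A minimal sketch of that convention, assuming an internal call that returns
0 on success or a positive error code. The function libc_wrapper is
hypothetical and is not the mmap or munmap code:

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical libc-style wrapper around an internal call that returns
// 0 on success or a positive error code on failure.
int libc_wrapper(int internal_result)
{
    if (internal_result != 0) {
        errno = internal_result;   // report the failure
        return -1;
    }
    // Success: leave errno exactly as the caller had it.
    return 0;
}
```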

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2358ac62

tests: Add munmap tests into tst-mmap-file · 06d3b771

Raphael S. Carvalho authored 11 years ago


Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

06d3b771

libc: Add munmap validation · 8c57f767

Raphael S. Carvalho authored 11 years ago


Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8c57f767

mmu: support MAP_UNINITIALIZED flag · f7249e73

Glauber Costa authored 11 years ago


When seeing this flag, pages faulted in should not be filled with zeroes or
any other pattern, and should rather be left alone in whatever state we find
them in.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f7249e73

Dec 09, 2013

build: Fix some debug build errors · 2a4f991d

Vlad Zolotarov authored 11 years ago


Add -Wno-maybe-uninitialized to compilation flags when mode=debug to
avoid bogus compilation errors.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2a4f991d

runtime: fix prio_find_thread() ignoring missing threads · 0fd4f259

Avi Kivity authored 11 years ago


prio_find_thread() is not checking correctly for missing threads, and
may return nulls to the caller.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0fd4f259

Implement mknod() · dd701e2d

Nadav Har'El authored 11 years ago


I tried using a test which called mknod() (to create an empty regular file).
Despite us having an mknod() implementation, it didn't work, and failed on
lookup of the symbol __xmknod.

Turns out that in glibc, mknod() is source-only, and converted to the ABI
function which is __xmknod, whose first parameter is a version number
_MKNOD_VER_LINUX (0 on x86-64 Linux).

So this patch implements __xmknod, and now mknod() works.

Note we already had the same kind of trick for __xstat(), needed so that
stat() would work.
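
The forwarding can be sketched like this. The function is named
sketch_xmknod so that the example does not clash with the C library's own
__xmknod, and rejecting other version numbers with EINVAL is an assumption
of the sketch rather than something this message states:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Versioned entry point that forwards to mknod().
extern "C" int sketch_xmknod(int ver, const char* path, mode_t mode,
                             dev_t* dev)
{
    // 0 is the value given above for _MKNOD_VER_LINUX on x86-64 Linux.
    if (ver != 0) {
        errno = EINVAL;
        return -1;
    }
    return mknod(path, mode, *dev);
}
```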

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

dd701e2d

libc/mount: Change umount2 and add umount · 2050ce8c

Raphael S. Carvalho authored 11 years ago

umount2 should call sys_umount2 instead. Add umount that calls sys_umount.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2050ce8c

loader.py: Skip inlined frames in 'osv info threads' · 82cd0548

Tomasz Grabiec authored 11 years ago


The GDB Python API does not handle inlined functions as nicely as the regular
'backtrace' command does. Some of the frame attributes point to
the inlined function (variables, symtab) and some point to the caller.

For example frame.function() returns the nearest non-inlined function.

This breaks code which prints thread joining information.
The code thinks it's in "sched::thread::join()" when actually it's in
sched::schedule() context which does not have 'this' variable.

This solution skips inlined functions when considering print
candidates.  The printed information would be confusing anyway: file
and line number would be of the inlined function but printed function
name would belong to the caller. Finally we will reach the non-inlined
caller and print the call site properly.

Fixes #124.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

82cd0548

loader.py: Make 'osv info threads' not fail when all frames are blacklisted · 9e91603f

Tomasz Grabiec authored 11 years ago


When all resolved frames are blacklisted we try to print the oldest
resolved frame.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9e91603f

virtio-rng: Do not call queue->get_buf_gc() with wait_until · f0706cec

Asias He authored 11 years ago


Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f0706cec

virtio-blk: Do not call queue->get_buf_gc() with wait_until · 1cc3dee7

Asias He authored 11 years ago


When I hacked use_indirect() to always use indirect buffer, I saw this
assertion when running:

   $scripts/run.py  -e "/tests/tst-bdev-write.so vblk1"

   VFS: mounting devfs at /dev
   51.671 Mb/s
   Assertion failed: _status.load() == status::running
   (/home/asias/src/cloudius-systems/osv/core/sched.cc: prepare_wait: 655) Aborted

It turned out that we were making a waiting thread wait again: get_buf_gc()
calls free(), which might put the thread into the waiting state again.

Suggested-by: Dor Laor <dor@cloudius-systems.com>
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

1cc3dee7

virtio: Add vring::used_ring_can_gc() helper · b7f8fa6d

Asias He authored 11 years ago


It is useful to test if we can do gc on the used ring.

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b7f8fa6d

tests: Test huge page allocation failure. · 4eb97417

Glauber Costa authored 11 years ago

Our implementation of operate() will try to fill as much of the address space
as possible with huge pages. If that fails, we should be able to fill the range
with small pages instead of failing. This test should make sure that in such
scenarios, the resulting mapping looks sane.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4eb97417

mmu: don't bail out on huge page failure · eeeaf888

Glauber Costa authored 11 years ago


Addressing that FIXME, as part of my memory reclamation series. But this
is ready to go already. The goal is to retry serving the allocation if a
huge page allocation fails, and fill the range with 4k pages.

The simplest and most robust way I've found to do that was to propagate the
error up until we reach operate(). Being there, all we need to do is to
re-walk the range with 4k pages instead of 2MB.

We could theoretically just bail out on huge pages and move hp_end, but,
especially when we have reclaim, it is likely that one operation will fail
while the upcoming ones may succeed.
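
The fallback can be pictured with the toy model below. It is only a sketch:
populate(), mapping and the try_huge hook are hypothetical, and the length is
assumed to be a multiple of the huge page size:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

constexpr std::size_t small_page = 4096;
constexpr std::size_t huge_page = 2 * 1024 * 1024;

struct mapping {
    std::size_t offset;
    std::size_t size;
};

// Fills [0, len) with mappings. try_huge is a hypothetical allocator hook
// that returns false when a huge page cannot be allocated at that offset.
std::vector<mapping> populate(std::size_t len,
                              const std::function<bool(std::size_t)>& try_huge)
{
    std::vector<mapping> out;
    for (std::size_t off = 0; off < len; off += huge_page) {
        if (try_huge(off)) {
            out.push_back({off, huge_page});
            continue;
        }
        // Huge page allocation failed: re-walk this range with 4k pages.
        // Later ranges still get a chance to use huge pages.
        for (std::size_t o = off; o < off + huge_page; o += small_page) {
            out.push_back({o, small_page});
        }
    }
    return out;
}
```

If the first huge page allocation fails and the second succeeds, the result
is 512 small mappings followed by one huge mapping, rather than an error for
the whole range.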

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
[ penberg: s/NULL/nullptr/ ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

eeeaf888

Reindent fs/vfs/main.cc · eb8451a0

Nadav Har'El authored 11 years ago


main.cc was still using tab characters instead of spaces as our coding
conventions dictate. Reindent it, using Eclipse's ctrl-I.
This patch doesn't change anything else.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

eb8451a0