- Dec 30, 2013
Gleb Natapov authored
mprotect(PROT_WRITE) on a file opened as read-only should fail, but the current mprotect() implementation is missing the check. The patch implements it. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
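A minimal sketch of the kind of check described above, with invented structure and field names rather than OSv's actual mmu code; the EACCES return value is an assumption based on how Linux handles this case.

    #include <cerrno>

    // Hypothetical, simplified view of a file-backed mapping; the real OSv
    // vma/file structures differ.
    struct mapping_sketch {
        bool file_backed;
        bool file_opened_for_write;   // was the underlying file opened writable?
        bool shared;                  // MAP_SHARED rather than MAP_PRIVATE
    };

    // The missing check: asking for PROT_WRITE on a shared mapping of a file
    // opened read-only is rejected.
    int mprotect_write_check(const mapping_sketch& m, bool want_write)
    {
        if (want_write && m.file_backed && m.shared && !m.file_opened_for_write) {
            return EACCES;
        }
        return 0;   // permission change allowed
    }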
-
- Dec 26, 2013
Gleb Natapov authored
Add constexpr to make sure they are evaluated at compile time if possible. The compiler will probably do it anyway, though. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 24, 2013
Avi Kivity authored
Helps with making changes to the bsd headers that xen includes. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
We use sched::thread::attr to pass parameters to sched::thread creation, e.g., to create a thread with non-default stack parameters, pinned to a particular CPU, or detached. Previously we had constructors taking many combinations of stack size (integer), pinned cpu (cpu*) and detached (boolean), and doing "the right thing". However, this makes the code hard to read (what does attr(4096) specify?) and the constructors hard to extend with new parameters. Replace the attr() constructors with the so-called "named parameter" idiom: attr now has only a default constructor attr(), and one modifies it with calls to pin(cpu*), detach(), or stack(size). For example:
    attr()                                  // default attributes
    attr().pin(sched::cpus[0])              // pin to cpu 0
    attr().stack(4096).pin(sched::cpus[0])  // pin and non-default stack
and so on. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
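Below is a self-contained sketch of the named-parameter idiom described in the commit; the class is illustrative (the default stack size and member names are assumptions), not OSv's actual sched::thread::attr.

    #include <cstddef>

    class cpu;   // opaque for the purposes of the sketch

    class attr {
    public:
        attr() = default;                                 // default attributes
        attr& pin(cpu* c)         { _pinned = c;      return *this; }
        attr& detach()            { _detached = true; return *this; }
        attr& stack(size_t bytes) { _stack = bytes;   return *this; }
    private:
        cpu*   _pinned   = nullptr;
        bool   _detached = false;
        size_t _stack    = 65536;                         // arbitrary default for the sketch
    };

    int main()
    {
        attr a = attr().stack(4096).detach();   // modifiers chain and read like named parameters
        (void)a;
        return 0;
    }

Each modifier returns *this, which is what lets the calls chain while staying readable and easy to extend with new parameters.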
-
- Dec 20, 2013
Nadav Har'El authored
Our sched::thread makes it rather difficult to create threads with non-default attributes. This patch makes it easier to create a thread with a non-default stack size, e.g., a light thread with a one-page stack:
    sched::thread a([&] { func(); }, sched::thread::attr(4096));
We should probably overhaul the sched::thread constructors at some point to make it easier to specify options, but for now, this specific constructor is convenient for my uses. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 19, 2013
Avi Kivity authored
There is no need for release memory ordering when assigning a null pointer to an rcu pointer, since the null pointer cannot be dereferenced. Add a specialization of assign() that takes advantage of this fact. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
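A sketch of the optimization, using a plain std::atomic pointer as a stand-in for OSv's rcu pointer; the class and method names are illustrative only.

    #include <atomic>
    #include <cstddef>

    template <typename T>
    class rcu_ptr_sketch {
    public:
        // Publishing a real object needs release ordering so that readers who
        // observe the pointer also observe the object's initialized contents.
        void assign(T* p) { _ptr.store(p, std::memory_order_release); }

        // Overload for nullptr: nothing is being published, and a null pointer
        // can never be dereferenced, so a relaxed store is sufficient.
        void assign(std::nullptr_t) { _ptr.store(nullptr, std::memory_order_relaxed); }

        T* read() const { return _ptr.load(std::memory_order_acquire); }

    private:
        std::atomic<T*> _ptr{nullptr};
    };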
-
Gleb Natapov authored
mprotect() should fail with ENOMEM if it is called on a non-mapped virtual address; this check is done using mmu::ismapped(). Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 18, 2013
Glauber Costa authored
This patch adds the basics of memory tracking and exposes an interface for that data to be collected. We basically start with all stats at zero, and as we add memory to the system, we bump them up and recalculate the watermarks (to avoid recomputing them all the time). When a page range comes up, it is added as free memory. We operate based on what is currently sitting in the page ranges. This means that we are effectively ignoring memory that sits in pools for memory-usage purposes. I think this is a good assumption because it allows us to focus on the big picture and leave the pools to be used as liquid currency. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
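A rough sketch of the bookkeeping described above; the names, thresholds and watermark formula are invented for illustration and are not OSv's actual accounting code.

    #include <atomic>
    #include <cstddef>

    namespace memstats_sketch {

    std::atomic<size_t> total_memory{0};
    std::atomic<size_t> free_memory{0};
    size_t low_watermark = 0;
    size_t high_watermark = 0;

    // Called when a page range is handed to the system: it starts out free,
    // and the watermarks are recalculated here instead of on every query.
    void add_page_range(size_t bytes)
    {
        total_memory += bytes;
        free_memory  += bytes;
        low_watermark  = total_memory / 50;   // arbitrary 2% threshold for the sketch
        high_watermark = total_memory / 10;   // arbitrary 10% threshold for the sketch
    }

    // Pool-resident memory is deliberately not tracked: only what moves in and
    // out of the page ranges changes these counters.
    void page_range_alloc(size_t bytes) { free_memory -= bytes; }
    void page_range_free(size_t bytes)  { free_memory += bytes; }

    }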
-
Asias He authored
A bio with the BIO_SCSI flag contains a SCSI command. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
This can be used by the low-level driver to store private data; e.g., the virtio-scsi driver uses it to store a virtio-scsi request bound to this bio. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
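A sketch of how such a field might be used; the struct layouts and names below are assumptions, not the actual struct bio or virtio-scsi request types in OSv.

    // Simplified stand-ins for the real structures.
    struct scsi_request_sketch {
        int tag;                 // driver-specific bookkeeping
    };

    struct bio_sketch {
        int   flags;             // would carry a BIO_SCSI-style marker
        void* driver_private;    // the new low-level-driver private pointer
    };

    // The low-level driver binds its own request object to the bio so it can
    // find it again on completion.
    void submit(bio_sketch& b, scsi_request_sketch& req)
    {
        b.driver_private = &req;
    }

    void complete(bio_sketch& b)
    {
        auto* req = static_cast<scsi_request_sketch*>(b.driver_private);
        (void)req;   // finish the driver's request here
    }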
-
- Dec 17, 2013
Avi Kivity authored
With net channels, poll() needs to wait not only on poll wakeups and the timeout, but also on requests from network interfaces to flush net channels for polled sockets. In preparation for that, switch from bsd msleep() to native wait_until(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
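A sketch of why a predicate-based wait helps here, written with standard C++ primitives rather than OSv's sched::thread::wait_until; all names are invented.

    #include <atomic>
    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex poll_mutex;
    std::condition_variable poll_cv;
    std::atomic<bool> poll_wakeup{false};
    std::atomic<bool> flush_requested{false};   // a net channel flush request

    // With a predicate-based wait, poll() can wake on either condition (or the
    // timeout); an msleep()-style wait can only express the first kind of wakeup.
    bool poll_wait(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lk(poll_mutex);
        return poll_cv.wait_for(lk, timeout, [] {
            return poll_wakeup.load() || flush_requested.load();
        });
    }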
-
Raphael S. Carvalho authored
Reviewed-by:
Dor Laor <dor@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 16, 2013
Tomasz Grabiec authored
This code seems obviously broken to me: in tls_data().size, there is no tls_data function; it is a struct. So this creates a temporary, uninitialized struct and reads the size field from it. What was probably meant instead is the TLS size, which is calculated by the tls() function and returned in a tls_data structure. I am not able to actually test this change because I don't have any DSO which has R_X86_64_TPOFF64 relocations. Any idea how to test it? tls() is also broken, because it initializes the file_size field instead of the size field. The file_size field was added at some point but this place wasn't updated. As it appears that tls() is not actually used anywhere, this patch gets rid of it. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
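The bug pattern, reduced to a compilable example; the struct's contents are guessed from the commit text.

    #include <cstddef>

    // Guessed, simplified shape of the structure mentioned in the commit.
    struct tls_data {
        void*  start;
        size_t size;
        size_t file_size;
    };

    size_t broken_size()
    {
        // This compiles, but tls_data() is just a brand-new temporary, not a
        // call to any function that fills in the real TLS parameters, so the
        // value read here is meaningless for relocation purposes.
        return tls_data().size;
    }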
-
Tomasz Grabiec authored
Dynamically loaded modules use __tls_get_addr() to locate thread-local symbols. A symbol is identified by a module index and an offset into the module's TLS area. The module index and offset are filled in by the dynamic linker when the DSO is loaded. The TLS area for a given DSO is allocated dynamically, on demand. OSv keeps TLS areas in a vector indexed by module index, inside a per-thread vector. The TLS area of the core module should be handled differently from that of dynamically loaded modules. The TLS offsets for thread-local symbols defined in the core module are known at link time, and code inside the core module can use these offsets directly. The offsets are relative to the TCB pointer (the fs register on x86). The problem was that __tls_get_addr() was treating the core module as a dynamically loaded module and returned a pointer inside a dynamically allocated TLS area instead of a pointer inside the core module's TLS. As a result, code inside the core module was reading the value from a different location than the one code inside the DSO had written the value to. The offending thread-local variable was __once_call. It was set by call_once() defined in the DSO (inlined from a definition inside a header) and read by __once_proxy() defined in the core module. Fixes #125. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
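A very rough sketch of the distinction the commit draws; the module indexing, TLS layout, sizes and names below are simplified assumptions, not OSv's actual __tls_get_addr.

    #include <cstddef>
    #include <vector>

    constexpr size_t core_module_index = 0;   // assumed index of the core module

    struct tls_index_sketch {
        size_t module;   // filled in by the dynamic linker
        size_t offset;   // offset within that module's TLS area
    };

    // Per-thread state, greatly simplified.
    thread_local char core_tls[256];                         // stands in for the TCB-relative core TLS block
    thread_local std::vector<std::vector<char>> dso_tls(8);  // on-demand areas for dynamically loaded modules

    void* tls_get_addr_sketch(const tls_index_sketch& ti)
    {
        if (ti.module == core_module_index) {
            // Core module: its offsets are fixed at link time relative to the
            // thread's own TLS block, so answer with a pointer into that block
            // rather than into a dynamically allocated area.
            return core_tls + ti.offset;
        }
        auto& area = dso_tls[ti.module];
        if (area.empty()) {
            area.resize(256);   // allocate the DSO's TLS area on demand
        }
        return area.data() + ti.offset;
    }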
-
Tomasz Grabiec authored
In order to have uniform naming, ulong is used in several places. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
The bsd-specific ones are renumbered to avoid clashes. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Since most SOL_* macros are equivalent to the IPPROTO_* defines, only one define needs to be changed. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Move them to a common include file. Since they're defined externally, there is no real conflict. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 15, 2013
Nadav Har'El authored
thread::destroy() had a "FIXME" comment:
    // FIXME: we have a problem in case of a race between join() and the
    // thread's completion. Here we can see _joiner==0 and not notify
    // anyone, but at the same time join() decided to go to sleep (because
    // status is not yet status::terminated) and we'll never wake it.
This is indeed a bug, which Glauber discovered was hanging the tst-threadcomplete.so test once in a while - the test sometimes hangs with one thread in the "terminated" state (waiting for someone to join it), and a second thread waiting in join() that missed the other thread's termination event. The solution works like this: join() uses a CAS to set itself as the _joiner. If it succeeded, it waits like before for the status to become "terminated". But if the CAS failed, it means a concurrent destroy() call beat us in the race, and we can just return from join(). destroy() checks (with a CAS) whether _joiner was already set - if so, we need to wake that thread just like in the original code. But if _joiner was not yet set, either there is no one doing join(), or there is a concurrent join() call that will soon return (this is what the joiner does when it loses the CAS race). In this case, all we need to do is set the status to "terminated" - and we must do it through a _detached_state we saved earlier, because if join() already returned, the thread may already have been deleted. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
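A compact, self-contained sketch of the CAS protocol described above; it uses a spin-wait and invented names instead of OSv's real scheduler primitives, so it only illustrates the ordering of operations.

    #include <atomic>
    #include <thread>

    struct thread_sketch {
        std::atomic<void*> _joiner{nullptr};
        std::atomic<bool>  _terminated{false};   // stands in for the detached_state status

        // Called by the thread that wants to join; 'self' identifies the caller.
        void join(void* self) {
            void* expected = nullptr;
            if (_joiner.compare_exchange_strong(expected, self)) {
                // We claimed the joiner slot: wait for destroy() to publish termination.
                while (!_terminated.load(std::memory_order_acquire)) {
                    std::this_thread::yield();   // real code parks and is woken instead
                }
            }
            // CAS failed: a concurrent destroy() beat us, so the thread is already
            // terminated (or about to be) and we can just return.
        }

        // Called on the terminating thread's path.
        void destroy() {
            void* expected = nullptr;
            // Try to claim the slot with a sentinel so a later join() returns at once.
            bool no_joiner_yet = _joiner.compare_exchange_strong(expected, this);
            _terminated.store(true, std::memory_order_release);
            if (!no_joiner_yet) {
                // A joiner already registered itself; real code would wake it here,
                // going through the saved _detached_state.
            }
        }
    };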
-
Nadav Har'El authored
wake_with(action) was implemented using thread_handle, as follows:
    thread_handle h(handle());
    action();
    h.wake();
This implementation is wrong: it only takes the RCU lock (which prevents the destruction of _detached_state) during h.wake(), meaning that if the thread is not sleeping, and action() causes it to exit, _detached_state may also be destructed, and h.wake() will crash. thread_handle is simply not needed for wake_with(), and was designed with a completely different use case in mind (long-term holding of a thread handle). We just need to use, inline, the appropriate rcu lock which keeps _detached_state alive. The resulting code is even simpler, and nicely parallels the existing code of wake(). This patch fixes a real bug, but unfortunately we don't have a concrete test case which it is known to fix. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
Add a new lock, "rcu_read_lock_in_preempt_disabled", which is exactly like rcu_read_lock but assuming that preemption is already disabled. Because all our rcu_read_lock does is to disable preemption, the new lock type currently does absolutely nothing - but in some future implementation of RCU it might need to do something. We'll use the new lock type in the following patch, as an optimization over the regular rcu_read_lock. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
Context: going to wait with irqs disabled is a recipe for disaster. While it is true that not every call to wait() actually ends up waiting, it should still be considered an invalid call, because of the times when we do wait. Because of that, it would be good to express that nonsense in an assertion. There are, however, places where we currently sleep with irqs disabled. Although they are technically safe, because we implicitly enable interrupts, they end up reaching wait() in a non-safe state. That happens in the page fault handler. Explicitly enabling interrupts will allow us to test for valid / invalid wait status. With this test applied, all tests in our whitelist still pass. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
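A toy sketch of the invariant being introduced; the interrupt-flag helpers below are stand-ins, not OSv's arch code.

    #include <cassert>

    // Stand-in helpers for reading and changing the interrupt flag.
    static bool g_irq_enabled = false;
    static bool irq_enabled() { return g_irq_enabled; }
    static void irq_enable()  { g_irq_enabled = true; }

    void wait_sketch()
    {
        // The new invariant: never reach wait() with interrupts disabled.
        assert(irq_enabled() && "waiting with interrupts disabled is invalid");
        // ... actually block ...
    }

    void page_fault_handler_sketch()
    {
        // Interrupts used to be re-enabled only implicitly; doing it explicitly
        // up front is what makes the assertion above meaningful.
        irq_enable();
        wait_sketch();   // e.g. waiting for a page to be brought in
    }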
-
- Dec 13, 2013
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 12, 2013
Pekka Enberg authored
Add a mmu::is_page_aligned() helper function and use it to get rid of open-coded checks. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
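A minimal version of such a helper, assuming 4 KiB pages and a uintptr_t argument; the real mmu::is_page_aligned() signature may differ.

    #include <cstdint>

    constexpr std::uintptr_t page_size = 4096;

    constexpr bool is_page_aligned(std::uintptr_t addr)
    {
        return (addr & (page_size - 1)) == 0;
    }

    static_assert(is_page_aligned(0x2000), "multiples of 4 KiB are aligned");
    static_assert(!is_page_aligned(0x2001), "other addresses are not");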
-
Vlad Zolotarov authored
- It's compiled out when mode=release.
- Uses assert() for issuing the assertion.
- Has printf-like semantics.
Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
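A sketch of a macro with the three listed properties; the macro name and message formatting are assumptions, not the actual OSv macro (the ##__VA_ARGS__ form relies on a GCC/Clang extension).

    #include <cassert>
    #include <cstdio>

    // Compiled out entirely when NDEBUG is defined (release builds), takes
    // printf-like message arguments, and uses assert() underneath.
    #ifdef NDEBUG
    #define DBG_ASSERT(cond, fmt, ...) do { } while (0)
    #else
    #define DBG_ASSERT(cond, fmt, ...)                             \
        do {                                                       \
            if (!(cond)) {                                         \
                std::fprintf(stderr, fmt "\n", ##__VA_ARGS__);     \
                assert(cond);                                      \
            }                                                      \
        } while (0)
    #endif

    // Example use:
    //   DBG_ASSERT(len <= max, "length %zu exceeds limit %zu", len, max);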
-
Vlad Zolotarov authored
- Add -DNDEBUG to the compiler flags when mode!=debug.
- Prevent assert() from being compiled out in the kernel when mode=release.
Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Vlad authored
Add the missing #ifndef X / #define X include-guard protection to include/api/assert.h. Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
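The standard include-guard pattern being added; the guard macro name below is a placeholder, since the commit does not show the actual one.

    // include/api/assert.h (shape only)
    #ifndef OSV_API_ASSERT_H_GUARD
    #define OSV_API_ASSERT_H_GUARD

    /* ... the header's existing declarations ... */

    #endif /* OSV_API_ASSERT_H_GUARD */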
-
- Dec 11, 2013
Pekka Enberg authored
Simplify core/mmu.cc and make it more portable by moving the page fault handler to arch/x64/mmu.cc. There's more arch specific code in core/mmu.cc that should be also moved. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Make vma constructors more strongly typed by using the addr_range type. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Separate the common vma code to an abstract base class that's inherited by anon_vma and file_vma. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
We currently stop propagating the exception frame partway down the vma_fault path. There is no reason not to propagate it further, aside from the fact that there are currently no users. Besides making frame passing more consistent, I intend to use it for the JVM balloon. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 10, 2013
Nadav Har'El authored
This patch fixes two bugs in shared-object finalization, i.e., running its static destructors before it is unloaded. The bugs were seen when osv::run()ing a test program using libboost_unit_test_framework-mt.so, which crashed after the test program finished. The two related bugs were:
1. We need to call the module's destructors (run_fini_funcs()) *before* removing it from the module list, otherwise the destructors will not be able to call functions from this module! (We got a symbol-not-found error in the destructor.)
2. We need to unload the modules needed by this module *before* unloading this module, not after, as was (implicitly) done until now. This makes sense because of symmetry (during a module load, the needed modules are loaded after this one), but also practically: a needed module's destructor (in our case, the boost unit test framework) might refer to objects in the needing module (in our case, the test program), so we cannot call the needed module's destructor after we've already unloaded the needing module.
Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Currently, namei() does vget() unconditionally if no dentry is found. This is wrong because the path can be a hard link that points to a vnode that's already in memory. To fix the problem:
- Use the inode number as part of the hash in vget().
- Use vn_lookup() in vget() to make sure we have one vnode in memory per inode number.
- Push the vget() calls down to individual filesystems and make VOP_LOOKUP return a vnode.
Changes since v2:
- v1 dropped the lock in vn_lookup; thus, assert that vnode_lock is held.
Changes since v3:
- Fix a lock-ordering issue in dentry_lookup. The lock for the parent node must be acquired before dentry_lookup and released after the process is done. Otherwise, a second thread looking up the same dentry may take the 'NULL' path incorrectly.
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
As ref() is now never called, we can remove the reference counter and make unref() unconditional. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
One problem with wake() is that, if the thread it is waking can concurrently exit, it may touch freed memory belonging to the thread structure. Fix by separating the state that wake() touches into a detached_state structure, and freeing that using rcu. Add a thread_handle class that references only this detached state and accesses it via rcu. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Makes it much easier to use. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-