- Dec 18, 2013
Glauber Costa authored
This patch adds the basics of memory tracking, and exposes an interface for that data to be collected. We basically start with all stats at zero, and as we add memory to the system, we bump it up and recalculate the watermarks (to avoid recomputing them all the time). When a page range comes up, it will be added as free memory. We operate based on what is currently sitting in the page ranges. This means that we are effectively ignoring memory that sits in pools for memory-usage purposes. I think that is a good assumption, because it allows us to focus on the big picture and leave the pools to be used as liquid currency. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 17, 2013
Avi Kivity authored
With net channels, poll() needs to wait not only on poll wakeups and the timeout, but also on requests from network interfaces to flush net channels for polled sockets. In preparation for that, switch from bsd msleep() to native wait_until(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
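The waiting pattern described above can be sketched with standard C++ primitives standing in for OSv's scheduler API (the poller type, its fields, and request_flush() below are illustrative assumptions, not OSv's actual interface):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Stand-in for the pattern: sleep until either a poll wakeup, a flush
// request from a network interface, or the timeout fires. Names are
// illustrative, not OSv's real API.
struct poller {
    std::mutex mtx;
    std::condition_variable cv;
    bool wakeup = false;
    bool flush_request = false;

    // Returns true if woken by an event, false on timeout.
    bool wait_until(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(mtx);
        return cv.wait_for(lk, timeout, [this] {
            return wakeup || flush_request;
        });
    }

    void request_flush() {
        { std::lock_guard<std::mutex> lk(mtx); flush_request = true; }
        cv.notify_one();
    }
};
```

The point of the switch is exactly this shape: one predicate-driven wait that any number of event sources can satisfy, instead of msleep()'s single wakeup channel.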
-
Raphael S. Carvalho authored
Reviewed-by:
Dor Laor <dor@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 16, 2013
Avi Kivity authored
bsd defines some m_ macros, for example m_flags, to save some typing. However, if you have a variable of the same name in another header, for example m_flags, have fun trying to compile your code. Expand the code in place and eliminate the macros. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
This code seems obviously broken to me: in tls_data().size, there is no tls_data function; it is a struct. So this is creating a temporary uninitialized struct and reading the size field from it. What it meant instead is probably the TLS size, which is calculated by the tls() function and returned in a tls_data structure. I am not able to actually test this change because I don't have any DSO which has R_X86_64_TPOFF64 relocations. Any idea how to test it? tls() is also broken, because it initializes the file_size field instead of the size field. The file_size field was added at some point, but this place wasn't updated. As it appears that tls() is not actually used anywhere, this patch gets rid of it. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Tomasz Grabiec authored
Dynamically loaded modules use __tls_get_addr() to locate thread-local symbols. A symbol is identified by a module index and an offset into the module's TLS area. The module index and offset are filled in by the dynamic linker when the DSO is loaded. The TLS area for a given DSO is allocated dynamically, on demand. OSv keeps TLS areas in a vector indexed by module index, inside a per-thread vector. The TLS area of the core module should be handled differently from that of dynamically loaded modules. The TLS offsets for thread-local symbols defined in the core module are known at link time, and code inside the core module can use these offsets directly. The offsets are relative to the TCB pointer (the fs register on x86). The problem was that __tls_get_addr() was treating the core module as a dynamically loaded module and returned a pointer inside a dynamically allocated TLS area instead of a pointer inside the core module's TLS. As a result, code inside the core module was reading the value from a different location than the one code inside the DSO had written it to. The offending thread-local variable was __once_call. It was set by call_once() defined in a DSO (inlined from a definition inside a header) and read by __once_proxy() defined in the core module. Fixes #125. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
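A miniature model of the fix, with invented types (thread_tls, tls_module, and the choice of index 0 for the core module are assumptions made for illustration; the real logic lives in OSv's dynamic linker):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for the sketch: the core module gets index 0.
constexpr std::size_t core_module_index = 0;

struct tls_module { std::vector<char> area; };

struct thread_tls {
    std::vector<char> core_area;            // static TLS, fs-relative in reality
    std::vector<tls_module> dynamic_areas;  // indexed by module index

    // Model of __tls_get_addr(): the core module must resolve against the
    // same static TLS block that link-time offsets point into, never a
    // dynamically allocated area.
    void* get_addr(std::size_t module, std::size_t offset) {
        if (module == core_module_index) {
            return core_area.data() + offset;
        }
        return dynamic_areas[module].area.data() + offset;
    }
};
```

The bug was equivalent to taking the second branch for module 0, so a DSO writing through get_addr() and core code reading fs-relative saw two different locations.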
-
Tomasz Grabiec authored
In order to have uniform naming, ulong is used in several places. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Pekka Enberg authored
Move the x86-64 PTE definitions to a new arch specific arch-mmu.hh header file to make core/mmu.cc smaller and more portable. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 15, 2013
Nadav Har'El authored
thread::destroy() had a "FIXME" comment: // FIXME: we have a problem in case of a race between join() and the // thread's completion. Here we can see _joiner==0 and not notify // anyone, but at the same time join() decided to go to sleep (because // status is not yet status::terminated) and we'll never wake it. This is indeed a bug, which Glauber discovered was hanging the tst-threadcomplete.so test once in a while - the test sometimes hangs with one thread in the "terminated" state (waiting for someone to join it), and a second thread waiting in join() that missed the other thread's termination event. The solution works like this: join() uses a CAS to set itself as the _joiner. If it succeeded, it waits like before for the status to become "terminated". But if the CAS failed, it means a concurrent destroy() call beat us in the race, and we can just return from join(). destroy() checks (with a CAS) if _joiner was already set - if so, we need to wake this thread just like in the original code. But if _joiner was not yet set, either there is no one doing join(), or there's a concurrent join() call that will soon return (this is what the joiner does when it loses the CAS race). In this case, all we need to do is set the status to "terminated" - and we must do it through a _detached_state we saved earlier (because if join() already returned, the thread may already be deleted). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
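The CAS protocol described above can be sketched with std::atomic; thread_stub and its two methods are illustrative stand-ins for the real scheduler types, and the actual waiting/waking is elided:

```cpp
#include <atomic>
#include <cassert>

struct thread_stub {
    std::atomic<thread_stub*> _joiner{nullptr};
    std::atomic<bool> terminated{false};

    // join() side: returns true if this call must now wait for termination,
    // false if a concurrent destroy() already won the race and we can
    // return immediately.
    bool try_become_joiner(thread_stub* self) {
        thread_stub* expected = nullptr;
        return _joiner.compare_exchange_strong(expected, self);
    }

    // destroy() side: marks termination, then claims the _joiner slot with
    // a CAS. Returns the joiner to wake, or nullptr if no joiner had
    // registered (the slot now holds a sentinel so later join()s back off).
    thread_stub* complete() {
        terminated.store(true);
        thread_stub* expected = nullptr;
        if (_joiner.compare_exchange_strong(expected, this)) {
            return nullptr;   // we won: any later join() loses its CAS
        }
        return expected;      // a joiner is waiting: caller wakes it
    }
};
```

Exactly one side wins the CAS on _joiner, which is what closes the lost-wakeup window the FIXME described.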
-
Nadav Har'El authored
Add a new lock, "rcu_read_lock_in_preempt_disabled", which is exactly like rcu_read_lock but assuming that preemption is already disabled. Because all our rcu_read_lock does is to disable preemption, the new lock type currently does absolutely nothing - but in some future implementation of RCU it might need to do something. We'll use the new lock type in the following patch, as an optimization over the regular rcu_read_lock. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 13, 2013
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 11, 2013
Pekka Enberg authored
Simplify core/mmu.cc and make it more portable by moving the page fault handler to arch/x64/mmu.cc. There's more arch specific code in core/mmu.cc that should be also moved. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Make vma constructors more strongly typed by using the addr_range type. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Separate the common vma code to an abstract base class that's inherited by anon_vma and file_vma. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
We have recently seen a problem where an eventual page fault outside the application would occur. I managed to track that down to my huge page failure patch, but wasn't really sure what was going on. Kudos to Raphael, then, who figured out that the problem happened when allocate_intermediate_level was called from split_huge_page. The problem here is that in that case we do *not* enter allocate_intermediate_level with the pte emptied, and we were previously expecting the write of the new pte to happen unconditionally. The compare_exchange broke it, because the exchange doesn't really happen. There are many ways to fix this issue, but the least confusing of them, given that there are other callers to this function that could potentially display this problem, is to do some defensive programming and clearly separate the semantics of both types of callers. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Tested-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
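The separation of caller semantics can be sketched as two entry points (the names and the flat pte_t type are assumptions for illustration; the real code manipulates hardware page-table entries):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using pte_t = std::atomic<uint64_t>;

// split_huge_page path: the caller holds a non-empty PTE (a huge-page
// mapping) and the new intermediate level must replace it unconditionally.
inline void set_intermediate_level(pte_t& pte, uint64_t new_level) {
    pte.store(new_level);
}

// Empty-PTE path: the caller observed an empty entry, so only install the
// new level if it is still empty; a concurrent installer may win.
inline bool try_set_intermediate_level(pte_t& pte, uint64_t new_level) {
    uint64_t expected = 0;
    return pte.compare_exchange_strong(expected, new_level);
}
```

The bug was equivalent to using the CAS variant on the split_huge_page path: the exchange silently fails against the existing huge-page entry, and the new level is never written.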
-
Nadav Har'El authored
Once page_fault() checks that this is not a fast fixup (see safe_load()), we reach the page-fault slow path, which needs to allocate memory or even read from disk, and might sleep. If we ever get such a slow page fault inside kernel code which has preemption or interrupts disabled, this is a serious bug, because the code in question thinks it cannot sleep. So this patch adds two assertions to verify this. The preemptable() assertion is easily triggered if stacks are demand-paged, as explained in commit 41efdc1c (I have a patch to solve this, but it won't fit in the margin). However, I've also seen this assertion without demand-paged stacks, when running all tests together through testrunner.so. So I'm hoping these assertions will be helpful in hunting down some elusive bugs we still have. This patch adds a third use of the "0x200" constant (the ninth bit of the rflags register is the interrupt flag), so it replaces them with a new symbolic name, processor::rflags_if. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
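A sketch of the symbolic constant and the two checks (assert_sleepable is an invented helper for illustration; the real assertions live inline in the page-fault slow path and read the saved exception frame):

```cpp
#include <cassert>
#include <cstdint>

namespace processor {
// Bit 9 of RFLAGS is the interrupt-enable flag (IF); this is the
// constant the patch names instead of repeating the literal 0x200.
constexpr uint64_t rflags_if = 0x200;
}

// A slow page fault may sleep, so it must only happen with interrupts
// enabled in the faulting context and with preemption allowed.
inline void assert_sleepable(uint64_t saved_rflags, bool preemptable) {
    assert(saved_rflags & processor::rflags_if);  // interrupts were enabled
    assert(preemptable);                          // preemption not disabled
}
```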
-
Glauber Costa authored
We suddenly stop propagating the exception frame down the vma_fault path. There is no reason not to propagate it further, aside from the fact that currently there are no users. However, besides presenting more consistent frame passing, I intend to use it for the JVM balloon. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 10, 2013
Nadav Har'El authored
This patch fixes two bugs in shared-object finalization, i.e., running its static destructors before it is unloaded. The bugs were seen when osv::run()ing a test program using libboost_unit_test_framework-mt.so, which crashed after the test program finished. The two related bugs were: 1. We need to call the module's destructors (run_fini_funcs()) *before* removing it from the module list, otherwise the destructors will not be able to call functions from this module! (we got a symbol not found error in the destructor). 2. We need to unload the modules needed by this module *before* unloading this module, not after like was (implicitly) done until now. This makes sense because of symmetry (during a module load, the needed modules are loaded after this one), but also practically: a needed module's destructor (in our case, boost unit test framework) might refer to objects in the needing module (in our case, the test program), so we cannot call the needed module's destructor after we've already unloaded the needing module. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
As ref() is now never called, we can remove the reference counter and make unref() unconditional. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
One problem with wake() is that, if the thread it is waking can concurrently exit, it may touch freed memory belonging to the thread structure. Fix by separating the state that wake() touches into a detached_state structure, and free that using rcu. Add a thread_handle class that references only this detached state, and accesses it via rcu. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
rcu_read_lock disables preemption, but this is an implementation detail and users should not make use of it. Add preempt_lock_in_rcu that takes advantage of the implementation detail and does nothing, but allows users to explicitly disable preemption. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
When seeing this flag, pages faulted in should not be filled with zeroes or any other pattern, and should rather be left alone in whatever state we find them. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 09, 2013
Glauber Costa authored
Addressing that FIXME, as part of my memory reclamation series. But this is ready to go already. The goal is to retry serving the allocation if a huge page allocation fails, and fill the range with 4k pages. The simplest and most robust way I've found to do that was to propagate the error up until we reach operate(). Being there, all we need to do is re-walk the range with 4k pages instead of 2Mb. We could theoretically just bail out on huge pages and move hp_end, but, especially when we have reclaim, it is likely that one operation will fail while the upcoming ones may succeed. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> [ penberg: s/NULL/nullptr/ ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
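The propagate-and-retry scheme might look like this in miniature (map_range, operate, and the allocator hook are all illustrative assumptions; the real code walks page tables rather than a flat loop):

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t huge_page_size  = 2 * 1024 * 1024;
constexpr std::size_t small_page_size = 4096;

// Walk the range in page_size steps; report failure to the caller instead
// of handling it locally, so the retry decision is made in one place.
bool map_range(std::size_t start, std::size_t len, std::size_t page_size,
               bool (*alloc_page)(std::size_t)) {
    for (std::size_t addr = start; addr < start + len; addr += page_size) {
        if (!alloc_page(page_size)) {
            return false;   // propagate the error up
        }
    }
    return true;
}

// operate() is where the error surfaces: on huge-page failure, re-walk the
// same range with 4k pages. Returns the page size actually used.
std::size_t operate(std::size_t start, std::size_t len,
                    bool (*alloc_page)(std::size_t)) {
    if (map_range(start, len, huge_page_size, alloc_page)) {
        return huge_page_size;
    }
    map_range(start, len, small_page_size, alloc_page);
    return small_page_size;
}
```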
-
- Dec 08, 2013
Glauber Costa authored
I needed to call detach in some test code of mine, and it isn't implemented. The code I wrote to use it may or may not stay in the end, but nevertheless, let's implement it. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
set_cleanup is quite a complicated piece of code. It is very easy to get it to race with other thread-destruction sites, which was made abundantly clear when we tried to implement pthread detach. This patch tries to make it easier, by restricting how and when set_cleanup can be called. The trick here is that currently, a thread may or may not have a cleanup function, and through a call to set_cleanup, our decision to clean up may change. From this point on, set_cleanup will only tell us *how* to clean up. If and when is a decision that we will make ourselves. For instance, if a thread is block-local, the destructor will be called by the end of the block. In that case, the _cleanup function will be there anyhow: we'll just not call it. We're setting here a default cleanup function for all created threads, which just deletes the current thread object. Anything coming from pthread will try to override it by also deleting the pthread object. And again, it is important to note that they will set up those cleanup functions unconditionally. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
Linux uses a 32-bit integer for pid_t, so let's do it as well. This is because there are functions in which we have to return our id back to the application. One example is gettid, which we already have in the tree. Theoretically, we could come up with a mapping between our 64-bit ids and the Linux ones, but since we have to maintain the mapping anyway, we might as well just use the Linux pids as our default IDs. The max size for that is 32 bits. It is not enough if we're just allocating pids by bumping a counter, but again, since we will have to maintain the bitmaps anyway, 32 bits will allow us as many as 4 billion PIDs. avi: remove unneeded #include Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
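A toy bitmap allocator along the lines hinted at above (the commit itself only changes the pid type; the allocator below, including its pid_t_32 alias chosen to avoid the system pid_t, is an assumption for illustration):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

using pid_t_32 = int32_t;

// Bitmap-backed pid allocation: ids are reused after release, so a plain
// bump counter is never needed and 32 bits bound the id space.
template <std::size_t Max>
struct pid_allocator {
    std::bitset<Max> used;

    pid_t_32 allocate() {
        for (std::size_t i = 1; i < Max; ++i) {   // pid 0 stays reserved
            if (!used[i]) {
                used[i] = true;
                return static_cast<pid_t_32>(i);
            }
        }
        return -1;                                // id space exhausted
    }

    void release(pid_t_32 pid) { used[static_cast<std::size_t>(pid)] = false; }
};
```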
-
Glauber Costa authored
Right now we are taking a clock measure very early for cpu initialization. That forces an unnecessary dependency between the sched and clock initializations. Since that clock measure is used to determine for how long the cpu has been running, we can initialize the runtime later, when we init the idle thread. Nothing should be running before it. After doing this, we can move the sched initialization a bit earlier. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 06, 2013
Asias He authored
Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
The exact location of the stack end is not needed by java, so move this variable back to restore the state to what it was before the mkfs.so/cpiod.so split. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 05, 2013
Avi Kivity authored
Some objects have DTPMOD64 relocations with the null symbol, presumably to set the value to 0 (it is too much trouble to write the zero into the file during the link phase, apparently). Detect this condition and write the zero. Needed by JDK8. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
A list can be slow to search for an element if we have many threads. Even under normal load, the number of threads we spawn may not be classified as huge, but it is not tiny either. Change it to a map so we can implement functions that operate on a given thread without that much overhead - O(1) for the common case. Note that ideally we would use an unordered_set, which doesn't require an extra key. However, that would also mean that the key is implicit and set to be of type key_type&. Threads are not very lightweight to create for search purposes, so we go for an id-as-key approach. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
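The id-as-key approach can be sketched as follows (thread_stub and thread_table are illustrative stand-ins for the real scheduler types): looking up by id never requires constructing a throwaway thread object, which an unordered_set keyed on key_type& would force.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct thread_stub { uint64_t id; };

// O(1) average lookup by thread id; the id is stored explicitly as the
// key instead of being derived from a thread object.
class thread_table {
    std::unordered_map<uint64_t, thread_stub*> map;
public:
    void insert(thread_stub* t) { map[t->id] = t; }
    void erase(uint64_t id)     { map.erase(id); }
    thread_stub* find(uint64_t id) {
        auto it = map.find(id);
        return it == map.end() ? nullptr : it->second;
    }
};
```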
-
Benoît Canet authored
This restores the original behavior of osv::run that was in place before the mkfs.so and cpiod.so split committed a day ago. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
Nadav Har'El authored
When source code is compiled with -D_FORTIFY_SOURCE on Linux, various functions are sometimes replaced by __*_chk variants (e.g., __strcpy_chk) which can help avoid buffer overflows when the compiler knows the buffer's size during compilation. If we want to run source compiled on Linux with -D_FORTIFY_SOURCE (either deliberately or unintentionally - see issue #111), we need to implement these functions otherwise the program will crash because of a missing symbol. We already implement a bunch of _chk functions, but we are definitely missing some more. This patch implements 6 more _chk functions which are needed to run the "rogue" program (mentioned in issue #111) when compiled with -D_FORTIFY_SOURCE=1. Following the philosophy of our existing *_chk functions, we do not aim for either ultimate performance or iron-clad security for our implementation of these functions. If this becomes important, we should revisit all our *_chk functions. When compiled with -D_FORTIFY_SOURCE=2, rogue still doesn't work, but not because of a missing symbol, but because it fails reading the terminfo file for a yet unknown reason (a patch for that issue will be sent separately). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
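One of the _chk variants can be sketched in the spirit described above: check against the destination size the compiler knew at the call site, and abort on overflow instead of corrupting memory. The function is deliberately named my_strcpy_chk to avoid clashing with the real glibc symbol, and this is a plausible shape rather than OSv's exact code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

extern "C" char* my_strcpy_chk(char* dest, const char* src, size_t destlen) {
    size_t len = strlen(src) + 1;   // include the NUL terminator
    if (len > destlen) {
        // matching the philosophy above: no heroics, just refuse to overflow
        fprintf(stderr, "buffer overflow detected\n");
        abort();
    }
    return static_cast<char*>(memcpy(dest, src, len));
}
```

The compiler emits the destlen argument itself when -D_FORTIFY_SOURCE is active and the buffer size is known, which is why the runtime side can stay this simple.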
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 03, 2013
Raphael S. Carvalho authored
Besides simplifying mmu::map_file interface, let's make it more similar to mmu::map_anon. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
A ';' at the end of a parameter marks the end of a program's argument list. The goal of this patch is to be able to split mkfs.so into two parts: mkfs.so and cpiod.so. The patch uses a full Spirit parser to handle "" escaping and split commands around ';'. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
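The splitting rule can be illustrated with a hand-rolled tokenizer (the actual patch uses Boost.Spirit; this simplified stand-in only demonstrates the ';' and quoting semantics, with invented names):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a command line into per-program argument lists: ';' separates
// commands, whitespace separates arguments, and anything inside double
// quotes (including ';') belongs to the current argument.
std::vector<std::vector<std::string>> split_commands(const std::string& line) {
    std::vector<std::vector<std::string>> cmds(1);
    std::string arg;
    bool in_quotes = false;
    bool have_arg = false;
    auto flush_arg = [&] {
        if (have_arg) {
            cmds.back().push_back(arg);
            arg.clear();
            have_arg = false;
        }
    };
    for (char c : line) {
        if (c == '"') { in_quotes = !in_quotes; have_arg = true; }
        else if (!in_quotes && c == ';') { flush_arg(); cmds.emplace_back(); }
        else if (!in_quotes && c == ' ') { flush_arg(); }
        else { arg.push_back(c); have_arg = true; }
    }
    flush_arg();
    return cmds;
}
```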
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 01, 2013
Nadav Har'El authored
Before this patch, OSv crashes or continuously reboots when given unknown command line parameters, e.g., scripts/run.py -c1 -e "--help --z a" With this patch, it says, as expected, that the "--z" option is not recognized, and displays the list of known options: unrecognised option '--z' OSv options: --help show help text --trace arg tracepoints to enable --trace-backtrace log backtraces in the tracepoint log --leak start leak detector after boot --nomount don't mount the file system --noshutdown continue running after main() returns --env arg set Unix-like environment variable (putenv()) --cwd arg set current working directory Aborted The problem was that to parse the command line options, we used Boost, which throws an exception when an unrecognized option is seen. We need to catch this exception, and show a message accordingly. But before this patch, C++ exceptions did not work correctly during this stage of the boot process, because exceptions use elf::program(), and we only set it up later. So this patch moves the setup of the elf::program() object earlier in the boot, to the beginning of main_cont(). Now we'll be able to use C++ exceptions throughout main_cont(), not just in command line parsing. This patch also removes the unused "filesystem" parameter of elf::program(), rather than moving the initialization of this empty object as well. Fixes #103. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
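The failure mode and fix can be modeled without Boost: option parsing throws on an unknown option, and the caller must be able to catch the exception and print the message instead of crashing (parse_or_report and the known-options list below are illustrative stand-ins for the boost::program_options machinery):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Mimics the throw/catch shape only: an unknown option raises an
// exception, and the boot code reports it instead of aborting. This
// depends on C++ exceptions working, hence the elf::program() reordering.
std::string parse_or_report(const std::vector<std::string>& args,
                            const std::vector<std::string>& known) {
    try {
        for (const auto& a : args) {
            bool recognized = false;
            for (const auto& k : known) {
                if (a == k) { recognized = true; break; }
            }
            if (!recognized) {
                throw std::runtime_error("unrecognised option '" + a + "'");
            }
        }
        return "ok";
    } catch (const std::exception& e) {
        return e.what();   // show the message; before the fix, the throw
                           // itself crashed the boot
    }
}
```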
-