Commits · da8d3daf1431dd580db49b2d845718b7cacbea6e · Verlässliche Systemsoftware / projects / osv

Jun 14, 2013

acpi: map bios into our linear mapping · da8d3daf

Glauber Costa authored 11 years ago

The algorithm we follow for memory discovery is quite simple: iterate over the
E820h map, and for every type 1 (== RAM) memory, we increment total size, and
map it linearly to our address space mappings.

That breaks on xen, however. I have no idea what seabios is doing for KVM, but
xen's hvmloader will put most of the ACPI tables at a reserved region around
physical address 0xfc000000. When we try to parse the ACPI tables, we will reach
an unmapped portion of the address space and fault (BTW, those faults are really
hard to debug, we're triple faulting directly, at least in my setup)

Luckily, the acpi driver code is prepared for such scenarios, and before using
any of that memory it will call map and unmap functions - we just don't implement
them.

This patch implements the necessary map function - and while we are at it, its
unmap counterpart. This is all far away from being performance critical, so I am
being as dumb as possible and just servicing the request without tracking any
previous state.
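
A rough sketch of the discovery loop described above. The `e820_entry` layout and the `map_linear` callback are assumptions made for illustration and are not OSv's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical E820 entry; the field names are illustrative only.
struct e820_entry {
    std::uint64_t addr;
    std::uint64_t size;
    std::uint32_t type;   // 1 == usable RAM
};

// Walk the map, hand every RAM range to map_linear, and return total RAM.
// Non-RAM ranges (such as the reserved region holding the ACPI tables under
// xen's hvmloader) are skipped, which is why they end up unmapped.
std::uint64_t discover_memory(
        const std::vector<e820_entry>& map,
        const std::function<void(std::uint64_t, std::uint64_t)>& map_linear)
{
    std::uint64_t total = 0;
    for (const auto& e : map) {
        if (e.type != 1) {
            continue;
        }
        total += e.size;
        map_linear(e.addr, e.size);
    }
    return total;
}
```

With a map that contains a reserved range near 0xfc000000, that range is neither counted nor mapped, so a later ACPI table access there needs an explicit map call.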

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

da8d3daf

Jun 13, 2013

Merge branch 'trace' · 00627067
Avi Kivity authored 11 years ago
```
Make the function tracer work again.
```
00627067

sched: make sure the idle thread is never scheduled when other threads can run · 4b199e18

Avi Kivity authored 11 years ago

Set it to have the largest possible vruntime, so it is only ever picked up
from the queue if there are no other threads on it.

This avoids a pointless context switch to the idle thread, where it wakes,
sees another thread, and switches out again.
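
A minimal sketch of the idea, assuming a run queue ordered by vruntime. `thread_stub`, `runqueue` and `pick_next` are hypothetical names, not OSv's scheduler types:

```cpp
#include <cassert>
#include <limits>
#include <set>

// Hypothetical, simplified thread record.
struct thread_stub {
    long long vruntime;
    int id;
};

// Order by vruntime, breaking ties by id so distinct threads never compare equal.
struct by_vruntime {
    bool operator()(const thread_stub* a, const thread_stub* b) const {
        if (a->vruntime != b->vruntime) {
            return a->vruntime < b->vruntime;
        }
        return a->id < b->id;
    }
};

using runqueue = std::set<thread_stub*, by_vruntime>;

// The next thread to run is the one with the smallest vruntime.
thread_stub* pick_next(const runqueue& rq) {
    return rq.empty() ? nullptr : *rq.begin();
}
```

If the idle thread is given `std::numeric_limits<long long>::max()` as its vruntime, it sorts last and is only returned when nothing else is queued.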

4b199e18

Implement usleep() · 0be1f9e0

Nadav Har'El authored 11 years ago

usleep() was scrubbed out of POSIX in 2008, and is not used in Java, but
it does exist in glibc and is damn easy to use compared to its newer
relative, nanosleep, so I want to use it in a test.
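
For illustration, a usleep()-like wrapper can be layered on nanosleep() roughly as follows. This is a sketch, not the code from this commit; `my_usleep` and `usec_to_timespec` are made-up names chosen to avoid clashing with libc:

```cpp
#include <cassert>
#include <time.h>

// Split a microsecond count into the seconds/nanoseconds pair nanosleep() expects.
timespec usec_to_timespec(unsigned long usec) {
    timespec ts;
    ts.tv_sec = usec / 1000000;
    ts.tv_nsec = (usec % 1000000) * 1000;
    return ts;
}

// Sleep for usec microseconds; returns nanosleep()'s result (0 on success).
int my_usleep(unsigned long usec) {
    timespec ts = usec_to_timespec(usec);
    return nanosleep(&ts, nullptr);
}
```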

0be1f9e0

shutdown_af_local: add missing locks · 67923f37

Nadav Har'El authored 11 years ago

As Avi pointed out, shutdown_af_local() did read-modify-write to
f->f_flags without locking. Add the missing locks.

67923f37

Clean up todo/mutex · 2be1f82d
Nadav Har'El authored 11 years ago
```
Remove things already done.
```
2be1f82d

Jun 12, 2013

trace: prevent recursion in function tracing · a7f920f2

Avi Kivity authored 11 years ago

The functions that are used in function tracing must not themselves be
traced, lest we recurse endlessly. Rather than marking them all with
no_instrument_function, keep a nesting counter and check if we're nested.
This way only the functions used for the test must not be traced.
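
The nesting check can be sketched like this. The counter name, its per-thread placement and `record_event` are assumptions for illustration; the real tracer may keep this state elsewhere:

```cpp
#include <cassert>

// Hypothetical per-thread nesting depth.
static thread_local int trace_nesting = 0;
static int events_recorded = 0;

void trace_function_entry(void* fn);

// Stand-in for the tracer's internals. It calls back into the entry hook to
// simulate what happens when a helper is itself instrumented.
static void record_event(void* fn) {
    ++events_recorded;
    trace_function_entry(fn);
}

// Entry hook: bail out if we are already inside the tracer.
void trace_function_entry(void* fn) {
    if (trace_nesting > 0) {
        return;   // nested call from the tracer itself, do not recurse
    }
    ++trace_nesting;
    record_event(fn);
    --trace_nesting;
}
```

Only the code that runs before the nesting test needs to stay uninstrumented; everything behind it is protected by the counter.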

a7f920f2

trace: disable interrupts during tracing · f057d76b

Avi Kivity authored 11 years ago

Seeing a trace from an interrupt incurred while tracing can be confusing, so
disable them.

f057d76b

x64: provide some uninstrumented versions of irq flag manipulation functions · f7af76ee

Avi Kivity authored 11 years ago

In the tracer, we don't want interrupt manipulation to cause recursion, so
provide uninstrumented versions of select functions.

f7af76ee

Optionally enable (disabled by default) lock-free mutex · a2cb99d5

Nadav Har'El authored 11 years ago

This patch optionally enables, at compile-time, OSV to use the lock-free
mutex instead of the spin-lock-based mutex. To use the lock-free mutex,
change the line "#undef LOCKFREE_MUTEX" in include/osv/mutex.h to
"#define LOCKFREE_MUTEX".

LOCKFREE_MUTEX is currently disabled by default, awaiting a few more
tests, but at this point I'm happy to say that beyond one known
unrelated bug (see details below), it seems the lock-free mutex is
fairly stable, and survives all tests and benchmarks I threw at it.

The remaining known bug involves a thread destruction race between
complete() and join(): complete wake()s the joiner thread, which in
rare cases can really quickly delete the thread's stack, before wake()
returns, causing a crash on return from wake(). This bug is really
unrelated to the lock-free mutex, but for some unknown reason I can
only reproduce it with the lock-free mutex on the SPECjvm2008 "sunflow"
benchmark.

To make lockfree::mutex our default mutex, this patch does the following
when LOCKFREE_MUTEX is defined:

1. In core/mutex.cc, #ifndef out the old mutex code, leaving the
   spinlock code in case someone wants to use it directly.

2. In include/osv/mutex.h, do different things in C++ and C (remember that
   lockfree::mutex is a C++ class, and cannot be used directly from C):

   * In C++, simply make mutex and mutex_t aliases for lockfree::mutex.

   * In C, make struct mutex and mutex_t an opaque 40-byte structure (in
     C++ compilation, we verify that this 40 is indeed the C++ class's
     length), and define the operations on it.

3. In libc/pthread.cc, if LOCKFREE_MUTEX, unfortunately the new mutex
   will not fit into pthread_mutex_t, and neither will condvar fit now
   into pthread_cond_t. So use a lazily allocated mutex or condvar, using
   the lazy_indirect<> template.
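
To illustrate the lazy-allocation idea in step 3, here is a hedged sketch of a pointer-sized wrapper that allocates its object on first use. It is not the actual lazy_indirect<> template; `lazy_ptr` is a hypothetical name and the real interface may differ:

```cpp
#include <atomic>
#include <cassert>

// Pointer-sized holder that allocates T on first access. A null pointer
// means "not allocated yet", so zero-initialized storage is a valid start state.
template <typename T>
class lazy_ptr {
public:
    lazy_ptr() = default;
    lazy_ptr(const lazy_ptr&) = delete;
    lazy_ptr& operator=(const lazy_ptr&) = delete;
    ~lazy_ptr() { delete _p.load(std::memory_order_relaxed); }

    T* get() {
        T* p = _p.load(std::memory_order_acquire);
        if (!p) {
            T* fresh = new T();
            if (_p.compare_exchange_strong(p, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
                p = fresh;
            } else {
                delete fresh;   // another thread won the race; p holds its object
            }
        }
        return p;
    }

private:
    std::atomic<T*> _p{nullptr};
};
```

The design choice is that only a pointer has to fit inside the fixed-size pthread type, while the larger object lives on the heap.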

a2cb99d5

run.py: allow command line select of alternative hypervisors · 3c354af1

Glauber Costa authored 11 years ago

I have been commenting lines in and out in this script to choose the right
underlying hypervisor to run. So here is the automated version of it. I haven't
chosen the letters h or y because they usually denote help and yes,
respectively. Also not a kvm/no-kvm boolean because very soon we would like to
include xen.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

3c354af1

update loader Copyright. · f6e4bfb7

Glauber Costa authored 11 years ago


Now that we can actually see the debug message, print our name on it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

f6e4bfb7

console: dump early messages to the serial port · 4fd29712

Glauber Costa authored 11 years ago

We can use a very simple outb instruction to write data to the serial
port in case we don't have a console implementation yet. We don't need
to be fancy, and even limited functionality will already allow us to
print messages early (especially debug messages).

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

4fd29712

run console earlier · 4b5afd0f

Glauber Costa authored 11 years ago

We could benefit from the console being ready a bit earlier. The only
dependency that I see is that interrupts need to be working. So as
soon as we initialize the ioapic, we should be able to initialize the console.

This is not the end of story: we still need an even earlier console to debug the
driver initialization functions, and I was inclined to just leave console_init
where it is, for now.

But additionally, I felt that loader is really a more appropriate place for
that than vfs_init... So I propose we switch. In the meantime, it might help
debug things that happen between ioapic init and the old vfs_init (mem
initialization, smp bring up, etc)

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

4b5afd0f

Jun 11, 2013

Add missing change to lfmutex.cc · d96020e3
Nadav Har'El authored 11 years ago
```
Sorry, forgot one hunk in "git add -p" :(
```
d96020e3

opensolaris: fix cv_timedwait() · da5939f9

Avi Kivity authored 11 years ago

cv_timedwait() has a relative timeout expressed in ticks (microseconds),
while condvar_wait() has an absolute timeout expressed in nanoseconds.

Replace the 1:1 macro with a function that does the correct translation.
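
The translation amounts to something like the following sketch, which takes the commit message's statement that one tick is one microsecond as its assumption. `ticks_to_deadline_ns` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Assumption from the commit message: one tick equals one microsecond.
constexpr std::uint64_t ns_per_tick = 1000;

// Turn a relative timeout in ticks into an absolute deadline in nanoseconds,
// given the current time in nanoseconds.
std::uint64_t ticks_to_deadline_ns(std::uint64_t now_ns, std::uint64_t rel_ticks) {
    return now_ns + rel_ticks * ns_per_tick;
}
```

A 1:1 macro skips both the unit scaling and the addition of the current time, so the resulting deadline would usually already be in the past.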

da5939f9

lock-free queue: update test · 954bd855
Nadav Har'El authored 11 years ago
```
Updated test with the new API. Sorry about forgetting to commit it earlier.
```
954bd855

Merge remote-tracking branch 'origin/master' · ccad2d9b
Glauber Costa authored 11 years ago

ccad2d9b

lock-free queue: change pop() API · df217cef

Nadav Har'El authored 11 years ago

Changed lockfree::queue_mpsc (lock-free multiple-producer single-consumer
queue) pop() API. Instead of returning separately the popped value (type
T) and a boolean success/failed, now return a pointer to the
linked_item<T> originally pushed(), or nullptr on failure.

The new pop() API is slightly more awkward (instead of using the returned
value directly, you need to take its field "value") but has an important
new feature: It gives you not just the value, but also the address where
this value is stored. So it is now possible to change value in its original
structure. This allows us to implement our (by now) traditional waitqueue
technique: The values on the queue are thread pointers, and the popper,
before waking up a thread, sets the thread pointer to zero - this way the
woken up thread knows it isn't a spurious wakeup.

A followup patch will use this capability to clean up lockfree::mutex not
to abuse the "owner" field as a notifier of non-spurious wakeups. After
that patch, "owner" will be used only for implementing recursive mutex,
and will not be part of the wakeup protocol.
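
The shape of the new pop() and the waitqueue idiom can be sketched as below. The queue here is deliberately a plain single-threaded FIFO, not the lock-free queue_mpsc; `simple_queue`, `waiter_stub` and `wake_one` are hypothetical names used only to show how the returned pointer is used:

```cpp
#include <cassert>

template <typename T>
struct linked_item {
    T value;
    linked_item* next = nullptr;
    explicit linked_item(T v) : value(v) {}
};

// NOT lock-free: a plain FIFO that only mirrors the pop() signature.
template <typename T>
class simple_queue {
public:
    void push(linked_item<T>* item) {
        item->next = nullptr;
        if (_tail) { _tail->next = item; } else { _head = item; }
        _tail = item;
    }
    // Returns the originally pushed item, or nullptr if the queue is empty.
    linked_item<T>* pop() {
        linked_item<T>* item = _head;
        if (!item) { return nullptr; }
        _head = item->next;
        if (!_head) { _tail = nullptr; }
        return item;
    }
private:
    linked_item<T>* _head = nullptr;
    linked_item<T>* _tail = nullptr;
};

struct waiter_stub {
    bool woken = false;
    void wake() { woken = true; }
};

// Popper side: zero the thread pointer in the wait record, then wake.
bool wake_one(simple_queue<waiter_stub*>& q) {
    linked_item<waiter_stub*>* item = q.pop();
    if (!item) { return false; }
    waiter_stub* t = item->value;
    item->value = nullptr;   // tells the waiter this wakeup is not spurious
    t->wake();
    return true;
}
```

The waiter keeps its `linked_item` on its own stack and, after waking, checks whether `value` was cleared.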

df217cef

lock-free mutex: change and clarify the role of depth and owner · 99b477dc

Nadav Har'El authored 11 years ago

The way "owner" and "depth" were used in lockfree::mutex was messy.
Ideally, neither should be needed if we implemented a non-recursive
mutex, but following the design of ::mutex, we (re)used "owner" also as
a marker that a thread was woken to have the lock (and it's not a
spurious wake).

After this patch, owner and depth are used in lockfree::mutex *only*
for implementing a recursive mutex, and building a non-recursive
mutex should be as simple as dropping these two variables.

In more detail:

1. "owner" is no longer used to tell a woken up thread that the wake
wasn't spurious. Instead, zero the thread in the wait-record. This
is a familiar idiom, which we already used a few times before.

2. "depth" isn't an atomic variable, so it should only be read by the
same thread which set it, and this wasn't the case previously. Now,
depth is only ever written (set to 1, incremented or decremented)
or read by the lock-holding thread - and not the lock releasing thread.

3. "owner" needs to be an atomic variable - a non-lock-holding thread
needs to read it and recognize it isn't holding the lock - but it
doesn't need any special memory ordering with other variables, so
should always be accessed with "relaxed" memory ordering.
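
Points 2 and 3 can be illustrated with a sketch that layers recursion on top of an ordinary mutex. This is not lockfree::mutex; `recursive_sketch` is a hypothetical class and std::mutex stands in for the underlying lock:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

class recursive_sketch {
public:
    void lock() {
        const std::thread::id self = std::this_thread::get_id();
        // Relaxed is enough: a thread can only read its own id here if it
        // stored it itself, and it always sees its own earlier stores.
        if (_owner.load(std::memory_order_relaxed) == self) {
            ++_depth;            // touched only by the lock holder
            return;
        }
        _inner.lock();
        _owner.store(self, std::memory_order_relaxed);
        _depth = 1;
    }

    void unlock() {
        if (--_depth > 0) {
            return;              // still held recursively
        }
        _owner.store(std::thread::id(), std::memory_order_relaxed);
        _inner.unlock();
    }

    unsigned depth() const { return _depth; }

private:
    std::mutex _inner;
    std::atomic<std::thread::id> _owner{std::thread::id()};
    unsigned _depth = 0;         // non-atomic: read and written by the holder only
};
```

Dropping `_owner` and `_depth` leaves just the inner lock, which matches the claim that a non-recursive variant only needs these two variables removed.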

99b477dc

sched::thread - fix very rare join() hang · cf4c46c4

Nadav Har'El authored 11 years ago

Fixed a very rare hang in sched::thread::join():

thread::complete() included the following code:

    _status.store(status::terminated);
    if (_joiner) {
        _joiner->wake();
    }

If we are preempted right after setting status to "terminated", but
before calling wake(), this thread will never be scheduled again (it will
remain in the terminated status forever), and will never call wake() -
so the join()ing thread may just wait forever.

I saw this happening in a test case that started and joined millions of
threads, and eventually the join() hangs.

The solution is to enclose the above lines with preempt_disable()/
preempt_enable().
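
A sketch of the fixed sequence, using stand-in types. The preemption counter, `joiner_stub` and `thread_sketch` are hypothetical; in particular, the real preempt_disable()/preempt_enable() belong to the scheduler, while the versions here only count:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for the scheduler's preemption control.
static thread_local int preempt_counter = 0;
static void preempt_disable() { ++preempt_counter; }
static void preempt_enable() { --preempt_counter; }

enum class status { running, terminated };

// Records whether preemption was disabled at the time of the wake.
struct joiner_stub {
    int preempt_depth_at_wake = -1;
    void wake() { preempt_depth_at_wake = preempt_counter; }
};

struct thread_sketch {
    std::atomic<status> _status{status::running};
    joiner_stub* _joiner = nullptr;

    void complete() {
        // No preemption between marking the thread terminated and waking
        // the joiner, so the wake() cannot be lost.
        preempt_disable();
        _status.store(status::terminated);
        if (_joiner) {
            _joiner->wake();
        }
        preempt_enable();
    }
};
```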

cf4c46c4

wake(): Don't miss a preemption opportunity · aee17ba4

Nadav Har'El authored 11 years ago

wake() normally calls schedule(), but doesn't do so if preemption is
disabled. So we should mark need_reschedule = true, to suggest that
schedule() can be called when preemption is later enabled.
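
The deferred reschedule can be sketched as follows. `cpu_sketch` and its members are hypothetical, and whether the real preempt_enable() calls schedule() directly is an assumption of this sketch:

```cpp
#include <cassert>

struct cpu_sketch {
    int preempt_counter = 0;
    bool need_reschedule = false;
    int schedule_calls = 0;

    void schedule() {
        need_reschedule = false;
        ++schedule_calls;
    }

    // Tail of wake(): reschedule now if allowed, otherwise remember the request.
    void wake_path() {
        if (preempt_counter == 0) {
            schedule();
        } else {
            need_reschedule = true;
        }
    }

    void preempt_disable() { ++preempt_counter; }

    // Assumed behaviour: act on a pending request once preemption is back on.
    void preempt_enable() {
        if (--preempt_counter == 0 && need_reschedule) {
            schedule();
        }
    }
};
```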

aee17ba4

x64: prevent nested exceptions from corrupting the stack · 1dbddc44

Avi Kivity authored 11 years ago

Due to the need to handle the x64 red zone, we use a separate stack for
exceptions via the IST mechanism. This means that a nested exception will
reuse the parent exception's stack, corrupting it. It is usually very hard
to figure out the root cause when this happens.

Prevent this by setting up a separate stack for nested exceptions, and
aborting immediately if a nested exception happens.

1dbddc44

Jun 10, 2013
- x64: switch ifunc resolvers to processor::features() · 48323f23
  Avi Kivity authored 11 years ago
  
  Now that processor::features() is initialized early enough, we can use it in ifunc dispatchers.
  48323f23
- x64: make processor::features usable early on · 7fb119b0
  Avi Kivity authored 11 years ago
  
  cpuid is useful for ifunc-dispatched functions (like memcpy), so we can select the correct function based on available processor features. Make processor::features available early to support this. We use a static function-local variable to ensure it is initialized early enough.
  7fb119b0
- Merge branch 'repmov' · a810513e
  Avi Kivity authored 11 years ago
  
  Optimized memcpy() using rep movsb
  a810513e
- libc: optimized memcpy() · 06dd5386
  Avi Kivity authored 11 years ago
  
  If the cpu supports "Enhanced REP MOVS / STOS" (ERMS), use a rep movsb instruction to implement memcpy. This speeds up copies significantly, especially large misaligned ones.
  06dd5386
- elf: add support for STT_IFUNC · 5948f0d7
  Avi Kivity authored 11 years ago
  
  Used for implementing support for indirect functions referenced from shared libraries.
  5948f0d7
- add a simple ZFS test · 42687ce9
  Christoph Hellwig authored 11 years ago
  
  42687ce9
- zfs: port vdev_disk · d7fce044
  Christoph Hellwig authored 11 years ago
  
  d7fce044
- zfs: port vdev_file · b6c95040
  Christoph Hellwig authored 11 years ago
  
  b6c95040
- zfs: initialize during boot · e91d6a13
  Christoph Hellwig authored 11 years ago
  
  e91d6a13
- solaris: handle the KM_ZERO flag · 449ade70
  Christoph Hellwig authored 11 years ago
  
  449ade70
- kvmclock: fix compilation error · 99d5cb28
  Glauber Costa authored 11 years ago
  
  The compiler rightfully complains that we use the symbol at sizeof instead of its dereference. Fix it.
  
  Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
  99d5cb28

Jun 09, 2013

lazy_indirect: typos in comment · 3302a1aa
Nadav Har'El authored 11 years ago

3302a1aa

Add, and use, new abort(msg) function · e6208f1e

Nadav Har'El authored 11 years ago

Recently Guy fixed abort() so it will *really* not infinitely recurse trying
to print a message, using a lock, causing a new abort, ad infinitum.

Unfortunately, that didn't fix one remaining case: DUMMY_HANDLER (see
exceptions.cc) used the idiom

        debug(....); abort();

which can again cause infinite recursion - a #GP calls debug() which causes a
new #GP, which again calls debug, etc.

Instead of the above broken idiom, created a new function abort(msg), which is
just like the familiar abort(), just changes the "Aborted" message to some
other message (a constant string). Like abort(), the new variant abort(msg) will
only print the message once even if called recursively - and uses a lockless
version of debug().

Note that the new abort(msg) is a C++-only API. C will only see the abort(void)
which is extern "C". At first I wanted to call the new function panic(msg) and
export it to C, but gave up when I saw the name panic() was already in use in a
bunch of BSD code.
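
The print-once behaviour can be sketched with an atomic flag. `begin_abort` is a hypothetical helper, and fputs() to stderr stands in for the lockless debug() variant; a real abort(msg) would halt after printing instead of returning:

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

static std::atomic<bool> aborting{false};

// Returns true only for the first caller, which is the one that prints.
// A nested call (for example from a fault raised while printing) sees the
// flag already set and prints nothing, so it cannot recurse.
bool begin_abort(const char* msg) {
    if (aborting.exchange(true)) {
        return false;
    }
    std::fputs(msg, stderr);
    std::fputs("\n", stderr);
    return true;
}
```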

e6208f1e

Add a simpler tracepoint syntax · 3cd499af

Nadav Har'El authored 11 years ago

Before this patch tracepoints required manual tracepoint numbers:

    tracepoint<17, unsigned int> trace_event1("event1", "%d");
    tracepoint<18> trace_event2("event2", "");

While the numbers only had to be unique in the file, so it wasn't hard to
achieve, this was still tedious and verbose.

This patch adds an additional, shorter, tracepoint syntax, not requiring
those numbers and in general less repetitive and clearer:

    TRACEPOINT(trace_event1, "%d", unsigned int);
    TRACEPOINT(trace_event2, "");

The first parameter is the name of the generated tracepoint function -
it's convenient to see it so that grep can find it, for example.
The name of the tracepoint itself (shown in "osv trace") is this string
without the prefix trace_ (if the name of the tracepoint function, for some
reason, doesn't start with trace_, the full function name is used as the
tracepoint name).
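
The naming rule in the last paragraph can be written down as a small helper. `tracepoint_name` is a hypothetical function for illustration; the TRACEPOINT macro itself may derive the name differently:

```cpp
#include <cassert>
#include <string>

// Derive the tracepoint name from the tracepoint function name:
// strip a leading "trace_", otherwise keep the full name.
std::string tracepoint_name(const std::string& function_name) {
    const std::string prefix = "trace_";
    if (function_name.compare(0, prefix.size(), prefix) == 0) {
        return function_name.substr(prefix.size());
    }
    return function_name;
}
```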

3cd499af

callouts: perform wake() outside of lock. · fb97aadf

Guy Zana authored 11 years ago

Given the scheduler state, wake() sometimes rescheduled the dispatcher thread
immediately, and then it blocked on the mutex that is still held by the caller
of _callout_stop_safe_locked().

This patch does wake() outside of the lock to eliminate these spurious context
switches.
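
The same pattern in portable C++, as a sketch: update the shared state under the lock, then notify after the lock is released. `dispatcher_sketch` and its members are hypothetical and only mirror the idea, not the callout code:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

struct dispatcher_sketch {
    std::mutex mtx;
    std::condition_variable cv;
    bool stop_requested = false;

    void request_stop() {
        {
            std::lock_guard<std::mutex> guard(mtx);
            stop_requested = true;
        }   // lock released here
        // Notify outside the lock: a waiter that wakes immediately can take
        // mtx right away instead of blocking on it and switching back.
        cv.notify_one();
    }
};
```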

fb97aadf

logger: changed debugging calls to use tprintf_X variants · 41360613
Guy Zana authored 11 years ago

41360613
tst-sockets: return an error if poll fails · 5a298ef9
Guy Zana authored 11 years ago

5a298ef9