Commits · d96020e3170465d6500d6ab6f6ba36f8c61f4047 · Verlässliche Systemsoftware / projects / osv

Jun 11, 2013

Add missing change to lfmutex.cc · d96020e3
Nadav Har'El authored 11 years ago
```
Sorry, forgot one hunk in "git add -p" :(
```
d96020e3

opensolaris: fix cv_timedwait() · da5939f9

Avi Kivity authored 11 years ago

cv_timedwait() has a relative timeout expressed in ticks (microseconds),
while condvar_wait() has an absolute timeout expressed in nanoseconds.

Replace the 1:1 macro with a function that does the correct translation.
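
The translation described above can be sketched roughly as follows. The function name and the constant are hypothetical, and the tick length of one microsecond is taken from the commit message rather than from the source.

```cpp
#include <cstdint>

// Hypothetical names; not the actual OSv code.
// Assumption from the commit message: one tick is one microsecond.
constexpr std::int64_t ns_per_tick = 1000;

// Turn a relative timeout in ticks into an absolute deadline in
// nanoseconds, given the current time in nanoseconds.
constexpr std::int64_t ticks_to_deadline_ns(std::int64_t now_ns,
                                            std::int64_t rel_ticks)
{
    return now_ns + rel_ticks * ns_per_tick;
}
```

A 1:1 macro would pass `rel_ticks` straight through, which is wrong in both unit and reference point.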

da5939f9

lock-free queue: update test · 954bd855
Nadav Har'El authored 11 years ago
```
Updated test with the new API. Sorry about forgetting to commit it earlier.
```
954bd855
Merge remote-tracking branch 'origin/master' · ccad2d9b
Glauber Costa authored 11 years ago

ccad2d9b

lock-free queue: change pop() API · df217cef

Nadav Har'El authored 11 years ago

Changed lockfree::queue_mpsc (lock-free multiple-producer single-consumer
queue) pop() API. Instead of returning separately the popped value (type
T) and a boolean success/failed, now return a pointer to the
linked_item<T> originally pushed(), or nullptr on failure.

The new pop() API is slightly more awkward (instead of using the returned
value directly, you need to take its field "value") but has an important
new feature: It gives you not just the value, but also the address where
this value is stored. So it is now possible to change value in its original
structure. This allows us to implement our (by now) traditional waitqueue
technique: The values on the queue are thread pointers, and the popper,
before waking up a thread, sets the thread pointer to zero - this way the
woken up thread knows it isn't a spurious wakeup.
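
A minimal sketch of a queue with this kind of pop() API is shown below. It assumes a design with an atomic push list and a consumer-private pop list. The class name `queue_mpsc_sketch` and its member names are illustrative, not the actual lockfree::queue_mpsc code.

```cpp
#include <atomic>

template <typename T>
struct linked_item {
    T value;
    linked_item<T>* next = nullptr;
    explicit linked_item(T v) : value(v) {}
};

// Illustrative multiple-producer single-consumer queue.
template <typename T>
class queue_mpsc_sketch {
    std::atomic<linked_item<T>*> pushlist{nullptr};  // shared by producers
    linked_item<T>* poplist = nullptr;               // consumer-private
public:
    // Any thread may push. The caller owns the item's storage.
    void push(linked_item<T>* item) {
        linked_item<T>* old = pushlist.load(std::memory_order_relaxed);
        do {
            item->next = old;
        } while (!pushlist.compare_exchange_weak(
                     old, item, std::memory_order_release,
                     std::memory_order_relaxed));
    }

    // Only the single consumer may pop. Returns the item that was
    // originally pushed, or nullptr if the queue is empty.
    linked_item<T>* pop() {
        if (!poplist) {
            // Grab the whole LIFO push list and reverse it into FIFO order.
            linked_item<T>* r =
                pushlist.exchange(nullptr, std::memory_order_acquire);
            while (r) {
                linked_item<T>* next = r->next;
                r->next = poplist;
                poplist = r;
                r = next;
            }
        }
        linked_item<T>* item = poplist;
        if (item) {
            poplist = item->next;
        }
        return item;
    }
};
```

Because pop() returns the address of the pushed item, the consumer can write to `item->value` in place (for example, zeroing a thread pointer before waking that thread).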

A followup patch will use this capability to clean up lockfree::mutex not
to abuse the "owner" field as a notifier of non-spurious wakeups. After
that patch, "owner" will be used only for implementing recursive mutex,
and will not be part of the wakeup protocol.

df217cef

lock-free mutex: change and clarify the role of depth and owner · 99b477dc

Nadav Har'El authored 11 years ago

The way "owner" and "depth" were used in lockfree::mutex was messy.
Ideally, neither should be needed if we implemented a non-recursive
mutex, but following the design of ::mutex, we (re)used "owner" also as
a marker that a thread was woken to have the lock (and it's not a
spurious wake).

After this patch, owner and depth are used in lockfree::mutex *only*
for implementing a recursive mutex, and building a non-recursive
mutex should be as simple as dropping these two variables.

In more detail:

1. "owner" is no longer used to tell a woken up thread that the wake
wasn't spurious. Instead, zero the thread in the wait-record. This
is a familiar idiom, which we already used a few times before.

2. "depth" isn't an atomic variable, so it should only be read by the
same thread which set it, and this wasn't the case previously. Now,
depth is only ever written (set to 1, incremented or decremented)
or read by the lock-holding thread - and not the lock releasing thread.

3. "owner" needs to be an atomic variable - a non-lock-holding thread
needs to read it and recognize it isn't holding the lock - but it
doesn't need any special memory ordering with other variables, so
should always be accessed with "relaxed" memory ordering.
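
Points 2 and 3 can be illustrated with a small sketch that layers recursion on top of a non-recursive lock. This is an assumption-laden illustration: `std::mutex` stands in for the lock-free core, and the class name `recursive_sketch` is hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

class recursive_sketch {
    std::mutex inner;   // stand-in for the non-recursive lock
    // Read by any thread, always with relaxed ordering.
    std::atomic<std::thread::id> owner{std::thread::id()};
    // Plain variable: read and written only by the lock holder.
    unsigned depth = 0;
public:
    void lock() {
        const std::thread::id self = std::this_thread::get_id();
        // Only this thread ever stores its own id, so a relaxed load
        // cannot wrongly report "self" to a thread that is not the owner.
        if (owner.load(std::memory_order_relaxed) == self) {
            ++depth;
            return;
        }
        inner.lock();
        owner.store(self, std::memory_order_relaxed);
        depth = 1;
    }

    void unlock() {
        if (--depth > 0) {
            return;   // still held recursively
        }
        owner.store(std::thread::id(), std::memory_order_relaxed);
        inner.unlock();
    }

    // Meaningful only when called by the lock holder.
    unsigned getdepth() const { return depth; }
};
```

Dropping `owner` and `depth` from this sketch leaves just the non-recursive lock, which matches the stated goal of the patch.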

99b477dc

sched::thread - fix very rare join() hang · cf4c46c4

Nadav Har'El authored 11 years ago

Fixed a very rare hang in sched::thread::join():

thread::complete() included the following code:

    _status.store(status::terminated);
    if (_joiner) {
        _joiner->wake();
    }

If we are preempted right after setting status to "terminated", but
before calling wake(), this thread will never be scheduled again (it will
remain in the terminated status forever), and will never call wake() -
so the join()ing thread may just wait forever.

I saw this happening in a test case that started and joined millions of
threads, and eventually the join() hangs.

The solution is to enclose the above lines with preempt_disable()/
preempt_enable().
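
The fix can be sketched as below. The scheduler is replaced by stubs (a counter for preemption and a `joiner_stub` type), all of which are hypothetical and exist only so the ordering can be exercised.

```cpp
#include <atomic>

// Stubs standing in for the scheduler; for illustration only.
static int preempt_counter = 0;
void preempt_disable() { ++preempt_counter; }
void preempt_enable()  { --preempt_counter; }

enum class status { running, terminated };

struct joiner_stub {
    bool woken = false;
    bool woken_with_preemption_off = false;
    void wake() {
        woken = true;
        woken_with_preemption_off = preempt_counter > 0;
    }
};

struct thread_sketch {
    std::atomic<status> _status{status::running};
    joiner_stub* _joiner = nullptr;

    void complete() {
        // With preemption disabled, nothing can deschedule this thread
        // between publishing "terminated" and waking the joiner.
        preempt_disable();
        _status.store(status::terminated);
        if (_joiner) {
            _joiner->wake();
        }
        preempt_enable();
    }
};
```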

cf4c46c4

wake(): Don't miss a preemption opportunity · aee17ba4

Nadav Har'El authored 11 years ago

wake() normally calls schedule(), but doesn't do so if preemption is
disabled. So we should mark need_reschedule = true, to suggest that
schedule() can be called when preemption is later enabled.
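
A rough single-CPU sketch of the idea follows. The struct and its members are hypothetical, and having preempt_enable() call schedule() is an assumption about how the flag gets consumed, not something the commit message states.

```cpp
// Hypothetical per-CPU scheduler state, for illustration only.
struct cpu_sketch {
    int preempt_counter = 0;
    bool need_reschedule = false;
    int schedule_calls = 0;

    void schedule() {
        need_reschedule = false;
        ++schedule_calls;
    }

    void preempt_disable() { ++preempt_counter; }

    void preempt_enable() {
        // Assumed: a pending reschedule runs once preemption is back on.
        if (--preempt_counter == 0 && need_reschedule) {
            schedule();
        }
    }

    void wake() {
        // Marking the target thread runnable is omitted here.
        if (preempt_counter == 0) {
            schedule();
        } else {
            need_reschedule = true;   // remember the missed opportunity
        }
    }
};
```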

aee17ba4

x64: prevent nested exceptions from corrupting the stack · 1dbddc44

Avi Kivity authored 11 years ago

Due to the need to handle the x64 red zone, we use a separate stack for
exceptions via the IST mechanism. This means that a nested exception will
reuse the parent exception's stack, corrupting it. It is usually very hard
to figure out the root cause when this happens.

Prevent this by setting up a separate stack for nested exceptions, and
aborting immediately if a nested exception happens.

1dbddc44

Jun 10, 2013
- x64: switch ifunc resolvers to processor::features() · 48323f23
  Avi Kivity authored 11 years ago
  
  Now that processor::features() is initialized early enough, we can use it in ifunc dispatchers.
  48323f23
- x64: make processor::features usable early on · 7fb119b0
  Avi Kivity authored 11 years ago
  
  cpuid is useful for ifunc-dispatched functions (like memcpy), so we can select the correct function based on available processor features. Make processor::features available early to support this. We use a static function-local variable to ensure it is initialized early enough.
  7fb119b0
- Merge branch 'repmov' · a810513e
  Avi Kivity authored 11 years ago
  
  Optimized memcpy() using rep movsb
  a810513e
- libc: optimized memcpy() · 06dd5386
  Avi Kivity authored 11 years ago
  
  If the cpu supports "Enhanced REP MOVS / STOS" (ERMS), use a rep movsb instruction to implement memcpy. This speeds up copies significantly, especially large misaligned ones.
  06dd5386
- elf: add support for STT_IFUNC · 5948f0d7
  Avi Kivity authored 11 years ago
  
  Used for implementing support for indirect functions referenced from shared libraries.
  5948f0d7
- add a simple ZFS test · 42687ce9
  Christoph Hellwig authored 11 years ago
  
  42687ce9
- zfs: port vdev_disk · d7fce044
  Christoph Hellwig authored 11 years ago
  
  d7fce044
- zfs: port vdev_file · b6c95040
  Christoph Hellwig authored 11 years ago
  
  b6c95040
- zfs: initialize during boot · e91d6a13
  Christoph Hellwig authored 11 years ago
  
  e91d6a13
- solaris: handle the KM_ZERO flag · 449ade70
  Christoph Hellwig authored 11 years ago
  
  449ade70
- kvmclock: fix compilation error · 99d5cb28
  Glauber Costa authored 11 years ago
  
  The compiler rightfully complains that we use the symbol at sizeof instead of its dereference. Fix it.
  
  Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
  99d5cb28
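
The function-local static mentioned in 7fb119b0 can be sketched as follows. The struct layout and the `probe_features()` stub are hypothetical. A real implementation would query cpuid, whereas the stub returns a made-up value so that the initialization order can be observed.

```cpp
struct features_type {
    bool erms;   // "Enhanced REP MOVS / STOS"
};

static int probe_calls = 0;

// Stand-in for the real cpuid query; the returned value is made up.
features_type probe_features() {
    ++probe_calls;
    return features_type{true};
}

// The static is initialized on the first call, so callers do not depend
// on the order in which ordinary global constructors run.
const features_type& features() {
    static features_type f = probe_features();
    return f;
}
```

A namespace-scope global initialized the same way might not be ready yet when an early caller, such as an ifunc resolver, asks for it.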
Jun 09, 2013

lazy_indirect: typos in comment · 3302a1aa
Nadav Har'El authored 11 years ago

3302a1aa

Add, and use, new abort(msg) function · e6208f1e

Nadav Har'El authored 11 years ago

Recently Guy fixed abort() so it will *really* not infinitely recurse trying
to print a message, using a lock, causing a new abort, ad infinitum.

Unfortunately, that didn't fix one remaining case: DUMMY_HANDLER (see
exceptions.cc) used the idiom

        debug(....); abort();

which can again cause infinite recursion - a #GP calls debug() which causes a
new #GP, which again calls debug, etc.

Instead of the above broken idiom, created a new function abort(msg), which is
just like the familiar abort(), just changes the "Aborted" message to some
other message (a constant string). Like abort(), the new variant abort(msg) will
only print the message once even if called recursively - and uses a lockless
version of debug().

Note that the new abort(msg) is a C++-only API. C will only see the abort(void)
which is extern "C". At first I wanted to call the new function panic(msg) and
export it to C, but gave up when I saw the name panic() was already in use in a
bunch of BSD code.
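
The "print only once, even if called recursively" behaviour can be sketched with an atomic flag. The function name `report_abort_once` is hypothetical, and unlike a real abort(msg) it returns instead of halting so that it can be exercised. `std::fputs` is only a stand-in for the lockless debug print.

```cpp
#include <atomic>
#include <cstdio>

static std::atomic<bool> aborting{false};

// Returns true if this call printed the message, false if an abort was
// already in progress (a nested or repeated call).
bool report_abort_once(const char* msg) {
    if (aborting.exchange(true)) {
        return false;   // already aborting: stay silent, avoid recursion
    }
    std::fputs(msg, stderr);   // stand-in for a lockless debug print
    std::fputc('\n', stderr);
    return true;
}
```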

e6208f1e

Add a simpler tracepoint syntax · 3cd499af

Nadav Har'El authored 11 years ago

Before this patch tracepoints required manual tracepoint numbers:

    tracepoint<17, unsigned int> trace_event1("event1", "%d");
    tracepoint<18> trace_event2("event2", "");

While the numbers only had to be unique in the file, so it wasn't hard to
achieve, this was still tedious and verbose.

This patch adds an additional, shorter, tracepoint syntax, not requiring
those numbers and in general less repetitive and clearer:

    TRACEPOINT(trace_event1, "%d", unsigned int);
    TRACEPOINT(trace_event2, "");

The first parameter is the name of the generated tracepoint function -
it's convenient to see it so that grep can find it, for example.
The name of the tracepoint itself (shown in "osv trace") is this string
without the prefix trace_ (if the name of the tracepoint function, for some
reason, doesn't start with trace_, the full function name is used as the
tracepoint name).
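
The naming rule can be sketched as below. `tracepoint_name` and `TRACEPOINT_NAME` are hypothetical helpers written for this illustration. They are not the actual TRACEPOINT macro, which also has to generate the tracepoint object itself.

```cpp
#include <cstring>

// Derive the tracepoint name from the function name: strip a leading
// "trace_" if present, otherwise use the full function name.
const char* tracepoint_name(const char* func_name) {
    static const char prefix[] = "trace_";
    const std::size_t n = sizeof(prefix) - 1;
    return std::strncmp(func_name, prefix, n) == 0 ? func_name + n
                                                   : func_name;
}

// Stringify the identifier so the caller writes it only once.
#define TRACEPOINT_NAME(func) tracepoint_name(#func)
```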

3cd499af

callouts: perform wake() outside of lock. · fb97aadf

Guy Zana authored 11 years ago

Given the scheduler state, wake() sometimes rescheduled the dispatcher thread
immediately, and then it blocked on the mutex that is still held by the caller
of _callout_stop_safe_locked().

This patch does wake() outside of the lock to eliminate these spurious context
switches.
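
The same pattern in standard C++ looks roughly like this. It is an analogy using `std::mutex` and `std::condition_variable`, not the callout code, and the names are made up.

```cpp
#include <condition_variable>
#include <mutex>

struct dispatcher_sketch {
    std::mutex mtx;
    std::condition_variable cv;
    bool work_pending = false;

    void request_work() {
        {
            std::lock_guard<std::mutex> guard(mtx);
            work_pending = true;
        }   // the lock is released here
        // Notify after unlocking, so a waiter that runs immediately
        // does not block on a mutex we still hold.
        cv.notify_one();
    }

    void wait_for_work() {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [this] { return work_pending; });
        work_pending = false;
    }
};
```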

fb97aadf

logger: changed debugging calls to use tprintf_X variants · 41360613
Guy Zana authored 11 years ago

41360613
tst-sockets: return an error if poll fails · 5a298ef9
Guy Zana authored 11 years ago

5a298ef9

logger: remove not very useful logger tags · f806a014

Guy Zana authored 11 years ago

Controlling the logger output in tests is simple: it is enough to
change the severity level to logger_error and all logging messages
will appear.

f806a014

virtio: remove some extra verbose debug messages · f928d509

Guy Zana authored 11 years ago

The next patch changes the debug function to tprintf_d, which may be
implemented as do{}while(0) when conf-logger_debug=0. In that case
compilation breaks, complaining about unused variables.

These debug prints are not very useful today, so I remove them. Instead,
they may be implemented as tracepoints.

f928d509

make: added conf-logger_debug option which is on when mode=debug · 9da6f1bc

Guy Zana authored 11 years ago

The conf-logger_debug option controls whether tprintf_d (verbose debug
logging) is enabled at all. If it is set to 0, then tprintf_d is
implemented as do{}while(0).
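
A sketch of such a compile-time switch is shown below. The macro name `CONF_LOGGER_DEBUG` and the `(tag, format, ...)` argument shape are assumptions made for illustration. Only the do{}while(0) fallback comes from the commit message.

```cpp
#include <cstdio>

#ifndef CONF_LOGGER_DEBUG
#define CONF_LOGGER_DEBUG 0   // hypothetical spelling of conf-logger_debug
#endif

#if CONF_LOGGER_DEBUG
// Enabled: print the tag, then the formatted message.
#define tprintf_d(tag, ...)                     \
    do {                                        \
        std::fprintf(stderr, "[%s] ", tag);     \
        std::fprintf(stderr, __VA_ARGS__);      \
    } while (0)
#else
// Disabled: the arguments vanish at preprocessing time.
#define tprintf_d(tag, ...) do {} while (0)
#endif

int count_items(int n) {
    tprintf_d("demo", "counting %d items\n", n);
    return n;   // n is still used here, so no unused-variable warning
}
```

When the macro is disabled, a variable that is referenced only inside a tprintf_d() call becomes unused, which is the kind of compilation failure described in f928d509.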

9da6f1bc

trace: add tracepoint to mutex_lock, mutex_unlock · bf7fe5e7
Guy Zana authored 11 years ago

bf7fe5e7
trace: record time for a tracepoint · 82491864
Avi Kivity authored 11 years ago

82491864
clock: don't instrument · c19dfddc
Avi Kivity authored 11 years ago
```
We want to use the clock in tracepoints.
```
c19dfddc
sched: add tracepoint for preemption events · 76dae193
Avi Kivity authored 11 years ago

76dae193

lock-free mutex: add C API for lockfree::mutex methods · d3c156d8

Nadav Har'El authored 11 years ago

Add some extern "C" versions of the lockfree::mutex methods. They will be
necessary for providing the lockfree::mutex type to C code - as you'll see
in later patches, C code will see an opaque type, a byte array, and will
call these functions to operate on it.
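
The opaque-byte-array approach can be sketched as follows. `std::mutex` stands in for lockfree::mutex, and the `c_mutex_*` names are hypothetical, not the functions this patch adds.

```cpp
#include <mutex>
#include <new>

// Stand-in for lockfree::mutex in this sketch.
typedef std::mutex impl_mutex;

extern "C" {

// What C code would see: an opaque byte array of the right size.
// (A C header would hard-code the size and alignment.)
typedef struct {
    alignas(impl_mutex) unsigned char storage[sizeof(impl_mutex)];
} c_mutex_t;

void c_mutex_init(c_mutex_t* m) {
    new (m->storage) impl_mutex();   // construct in place
}

void c_mutex_lock(c_mutex_t* m) {
    reinterpret_cast<impl_mutex*>(m->storage)->lock();
}

int c_mutex_try_lock(c_mutex_t* m) {
    return reinterpret_cast<impl_mutex*>(m->storage)->try_lock();
}

void c_mutex_unlock(c_mutex_t* m) {
    reinterpret_cast<impl_mutex*>(m->storage)->unlock();
}

void c_mutex_destroy(c_mutex_t* m) {
    reinterpret_cast<impl_mutex*>(m->storage)->~impl_mutex();
}

}   // extern "C"
```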

d3c156d8

lock-free mutex: Avoid including <sched.hh> in <lockfree/mutex.hh> · 874b10ee

Nadav Har'El authored 11 years ago

Do not include <sched.hh> in <lockfree/mutex.hh>.

Including <sched.hh> creates annoying dependency loops when we (in a
later patch) replace <osv/mutex.h> by <lockfree/mutex.hh>, and some header
files included by <sched.hh> themselves use mutexes, so they include
<osv/mutex.h>. This last include does nothing (because of the include guard)
but the compiler never finished reading osv/mutex.h (it was only in its
beginning, when it included sched.hh) so the inner-included code lacks the
definitions it assumes after including mutex.h.

874b10ee

lockfree mutex: add owned() and getdepth() methods · 1ed9c982

Nadav Har'El authored 11 years ago

Add to lockfree::mutex the simple owned() and getdepth() methods which
existed in ::mutex and were used in a few places - so we need these
methods to switch from ::mutex to lockfree::mutex.

1ed9c982

lockfree mutex: fix wait/wake bug · 9bcf790a

Nadav Har'El authored 11 years ago

When I developed lockfree mutex, the scheduler, preemption, and related code
still had a bunch of bugs, so I resorted to some workarounds that in hindsight
look unnecessary, and even wrong.

When it seemed that I could only wake() a thread in wait state, I made an
effort to enter the waiting state (with "wait_guard") before adding the
thread to the to-awake queue, and then slept with schedule(). The biggest
problem with this approach was that a spurious wake(), for any reason, of
this thread, would cause us to end the lock() - and fail on an assert that
we're the owners of the lock - instead of repeating the wait. When switching
to lockfree mutex, the sunflow benchmark (for example) would die on this
assertion failure.

So now I replaced this ugliness with our familiar idiom, wait_until().
The thread is in running state for some time after entering queue, so
it might be woken when not yet sleeping and the wake() will be ignored -
but this is fine because part of our protocol is that the wake() before
waking also sets "owner" to the to-be-woken thread, and before sleeping
we check if owner isn't already us.
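
The shape of the wait_until() idiom is sketched below: check the condition first, and go back to sleep after any wake that did not make it true. `wait_until_sketch` and `block_once` are hypothetical. A real scheduler primitive must also close the race between the check and going to sleep, which this sketch does not model.

```cpp
// block_once stands in for "sleep until some wake() arrives". It may
// return for any reason, including a spurious wake.
template <typename Pred, typename Block>
void wait_until_sketch(Pred pred, Block block_once) {
    // The predicate is tested before sleeping, so a wake that arrived
    // early (and was ignored) is not lost: its effect on the predicate
    // is still visible here.
    while (!pred()) {
        block_once();
    }
}
```

In the mutex, the predicate would be "owner is already us", matching the protocol described above.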

Also changed the comment on "owner" to say that it is not *just* for
implementing a recursive mutex, but also necessary for the wakeup protocol.

9bcf790a

Implement shutdown() on unix domain sockets · 30f6e9dd

Nadav Har'El authored 11 years ago

The existing shutdown() code only worked with AF_INET sockets, and returned
ENOTSOCK for AF_LOCAL sockets, because we implemented the latter sockets in
completely different code (in af_local.cc).

So in uipc_syscalls_wrap.c, the same place we call the special af-local
socketpair(), we also need to call the special af-local shutdown().

The way we do it is a bit ugly, but effective: shutdown() first calls
shutdown_af_local(), and if that returns ENOTSOCK (so it's not an af_local
socket), we continue trying the regular socket shutdown code.
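
The fallback can be sketched as follows. Both `*_stub` functions are hypothetical stand-ins that classify descriptors by number so the control flow can be exercised. Only the "try af_local first, fall through on ENOTSOCK" logic reflects the commit message.

```cpp
#include <cerrno>

static int inet_calls = 0;

// Stub: pretend descriptors >= 100 are af_local sockets.
int shutdown_af_local_stub(int fd, int how) {
    (void)how;
    return fd >= 100 ? 0 : ENOTSOCK;
}

// Stub: pretend descriptors >= 3 are regular sockets.
int shutdown_inet_stub(int fd, int how) {
    (void)how;
    ++inet_calls;
    return fd >= 3 ? 0 : ENOTSOCK;
}

int shutdown_sketch(int fd, int how) {
    int error = shutdown_af_local_stub(fd, how);
    if (error != ENOTSOCK) {
        return error;   // it was an af_local socket (success or failure)
    }
    // Not an af_local socket: try the regular socket shutdown code.
    return shutdown_inet_stub(fd, how);
}
```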

A better way would have been to add shutdown() to the fileops table -
either the generic one (why not?), or invent a new mechanism whereby
certain file types (in this case, "sockets" of all types) can have additional
ops tables including in this case a shutdown() operation. Linux has
something of this sort for implementing shutdown().

30f6e9dd

Jun 06, 2013

msix: provide high priority handler when registering interrupt · 66066b07

Guy Zana authored 11 years ago

We have to disable virtio interrupts before msix EOI, so disabling
must be done in the ISR handler context. This patch adds an std::function
isr to the bindings.

References to the rx and tx queues are saved as well (_rx_queue and _tx_queue),
so they can be used in the ISR context.

This patch reduces virtio net rx interrupts by a factor of 450.
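
The ordering this patch relies on can be sketched as below. All types and names here are hypothetical, and `eoi_sent` is just a flag standing in for the real end-of-interrupt write.

```cpp
#include <functional>

// Hypothetical binding: an ISR that runs in interrupt context before
// EOI, plus deferred work that runs afterwards.
struct msix_binding_sketch {
    std::function<void()> isr;
    std::function<void()> handler;
};

struct vring_stub {
    bool interrupts_enabled = true;
    void disable_interrupts() { interrupts_enabled = false; }
};

// Order of events when the interrupt fires.
void deliver_interrupt(const msix_binding_sketch& b, bool& eoi_sent) {
    if (b.isr) {
        b.isr();          // e.g. disable further queue interrupts
    }
    eoi_sent = true;      // stand-in for the EOI
    if (b.handler) {
        b.handler();      // e.g. wake the thread that drains the queue
    }
}
```

Keeping a reference to the queue in the driver object is what lets the ISR closure reach it, which is presumably why _rx_queue and _tx_queue are saved.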

66066b07

virtio: expose disable/enable interrupts · 8efa9c02
Guy Zana authored 11 years ago

8efa9c02