Commits · 45e4042157cedebd9e221ee726aa9d2f00c05b6a · Verlässliche Systemsoftware / projects / osv

Jul 08, 2013

pcpu-worker: add a per cpu worker thread that can execute work items · 45e40421

Guy Zana authored 11 years ago

Allows setting up and executing a handler in the context of a
specified CPU. The handler is defined statically at compile time and
is invoked when the worker_item is signaled for a specified CPU.

Doesn't use locks, to avoid unnecessary contention.

Needed for the per-cpu memory allocator: instead of creating n additional
threads (one per cpu), the plan is to define and register a simple
handler (lambda function).

example of usage:

void say_hello()
{
    debug("Hello, world!");
}

// define hello_tester as a worker_item
PCPU_WORKERITEM(hello_tester, [] { say_hello(); });

.
.
.

// anywhere in the code:
hello_tester.signal(sched::cpus[1]);

// will invoke say_hello() in the context of cpu 1

Thanks to Avi for adding code that I was able to copy & paste :)


tests: test the spsc lockless ring · c00f2dc1

Guy Zana authored 11 years ago

Two threads are created on two different vcpus, one consumer and one producer.

The threads push and pop 1,000,000,000 elements concurrently:
the producer pushes random numbers between 0 and 7 and the consumer pops
them. Each thread keeps track of the values it
pushed or popped. For each value, the number of pushed elements
should equal the number of popped elements.

 - ring_spsc: 14.8 Mop/s per core


lockless: spsc ring buffer of fixed size · 120a19d9

Guy Zana authored 11 years ago

Single-producer / single-consumer lockless ring buffer of fixed size.

    1. _begin points to the head of the ring, _end points to the tail.
       Both only advance: adding an item does _end++, consuming an item does _begin++.

    2. All indexes of the ring are running (free-running counters), so the
       condition for an empty ring differs from that for a full ring.
            *) Empty -> (_begin == _end)
            *) Full -> (_end - _begin == MaxSize)

    3. Uses only store and load operations, no rmw.
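
A minimal sketch of the three points above, not the code from this commit. The class name `ring_spsc_sketch`, the power-of-two restriction on `MaxSize`, and the chosen memory orderings are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch; class and member names are illustrative only.
template <typename T, unsigned MaxSize>
class ring_spsc_sketch {
    // Assumption: a power-of-two size keeps (index % MaxSize) consistent
    // when the running unsigned indexes wrap around.
    static_assert(MaxSize != 0 && (MaxSize & (MaxSize - 1)) == 0,
                  "MaxSize must be a power of two");
public:
    // Producer side only. Returns false when the ring is full.
    bool push(const T& value)
    {
        unsigned end = _end.load(std::memory_order_relaxed);
        unsigned begin = _begin.load(std::memory_order_acquire);
        if (end - begin == MaxSize) {
            return false; // full
        }
        _ring[end % MaxSize] = value;
        // Plain store, no read-modify-write: only the producer writes _end.
        _end.store(end + 1, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns false when the ring is empty.
    bool pop(T& value)
    {
        unsigned begin = _begin.load(std::memory_order_relaxed);
        unsigned end = _end.load(std::memory_order_acquire);
        if (begin == end) {
            return false; // empty
        }
        value = _ring[begin % MaxSize];
        // Plain store, no read-modify-write: only the consumer writes _begin.
        _begin.store(begin + 1, std::memory_order_release);
        return true;
    }

private:
    std::atomic<unsigned> _begin{0};
    std::atomic<unsigned> _end{0};
    T _ring[MaxSize];
};
```

Because the indexes are never reduced modulo the ring size when stored, `_begin == _end` can only mean empty, and a full ring is detected by the distance between them.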


arch: add CACHELINE_ALIGNED macro · 2f6cf02f
Guy Zana authored 11 years ago


Jul 04, 2013

Fix misalignment bug in lock-free mutex_t · 506c4642

Nadav Har'El authored 11 years ago

Because the lockfree::mutex type is heavy in C++, in C it was just a
char[40], with static assertions verifying that this 40 is indeed the
correct size.

But this wasn't quite correct - if a mutex is contained in another
structure, the char[40] can come anywhere, while the lockfree::mutex
C++ type starts with a pointer, so the compiler adds padding to
ensure 8-byte alignment of this pointer. So we had a serious bug
when C and C++ code used the same structure containing a
mutex, for example struct ifqueue from if_var.h. Each language
thought the mutex was positioned in a different place inside this
structure, causing attempts to access the same mutex at different
addresses (offset by 4 bytes).

Now, instead of a char[40], the C mutex will be a union, of
char[40] (to achieve the desired size), and a void* (to
achieve the desired alignment). Static assertions verify that
this indeed results in the same alignment and the same size
as that of the C++ object.
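
A minimal sketch of the layout problem and the union fix, assuming a stand-in 40-byte type that begins with a pointer. All type names here (`cxx_mutex_sketch`, `c_mutex_old_sketch`, `c_mutex_new_sketch`, and the `holder_*` structs) are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the C++ mutex: 40 bytes, starting with a pointer (assumed layout).
struct cxx_mutex_sketch {
    void* first;
    char rest[40 - sizeof(void*)];
};

// Old C-side declaration: correct size, but alignment 1.
struct c_mutex_old_sketch {
    char bytes[40];
};

// New C-side declaration: the char array gives the size,
// the pointer member gives the alignment.
union c_mutex_new_sketch {
    char forsize[40];
    void* foralignment;
};

static_assert(sizeof(c_mutex_new_sketch) == sizeof(cxx_mutex_sketch),
              "size must match the C++ object");
static_assert(alignof(c_mutex_new_sketch) == alignof(cxx_mutex_sketch),
              "alignment must match the C++ object");

// A containing structure, as seen by each side.
struct holder_old_sketch { int flag; c_mutex_old_sketch m; };
struct holder_new_sketch { int flag; c_mutex_new_sketch m; };
struct holder_cxx_sketch { int flag; cxx_mutex_sketch m; };
```

On a typical 64-bit target, `offsetof(holder_old_sketch, m)` is 4 while the other two are 8, which is the kind of 4-byte disagreement the message describes.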


Trivial: use relaxed memory ordering in lockfree::mutex::owned() · 40afd5b5

Nadav Har'El authored 11 years ago

No reason to do anything but relaxed memory ordering here.
We aren't touching any other memory in this function, and owned()
isn't expected to synchronize access to any memory.
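
As a rough illustration of the reasoning, here is a sketch of an ownership check that uses a relaxed load. The types `thread_sketch` and `mutex_owner_sketch` and their members are hypothetical; the real mutex has more state and a different interface.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a thread object.
struct thread_sketch {};

// Hypothetical owner field of a mutex, reduced to the part owned() would read.
class mutex_owner_sketch {
public:
    // In a real mutex, the lock/unlock paths provide the ordering;
    // this setter exists only so the sketch can be exercised.
    void set_owner(thread_sketch* t)
    {
        _owner.store(t, std::memory_order_relaxed);
    }

    // The check reads a single atomic variable and touches no other memory,
    // so it does not need acquire or sequentially consistent ordering.
    bool owned_by(const thread_sketch* t) const
    {
        return _owner.load(std::memory_order_relaxed) == t;
    }

private:
    std::atomic<thread_sketch*> _owner{nullptr};
};
```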


Trivial: get rid of sglist entirely · 18beb1b6
Dor Laor authored 11 years ago


Sglist virtio usage refactor · ddef97f6

Dor Laor authored 11 years ago

Use a single instance of sglist data (a vector) per queue.
Before this patch sglist was implemented as a std::list,
which caused it to allocate heap memory and chase pointers.
Now we use a single vector per queue to temporarily hold the
buffer data between the upper virtio layer and the lower one.


Trivial code movement, preparation for sglist changes. · 481f4ebd
Dor Laor authored 11 years ago


Fix typo bug in lockfree::mutex::try_lock() · a90a024a

Nadav Har'El authored 11 years ago

From the "how could I have not seen this before" department, we bring you
the following bug:

try_lock() as a last attempt tries to get a handoff from a concurrent
unlock() to take the lock for itself. Obviously, it should only do so
if the "handoff" variable is set (non-zero). The typo was that we checked
!handoff instead of handoff, so we took a non-existent handoff and
possibly got the lock while someone else was still holding it.

This bug is especially visible in Java, which uses try_lock() a lot
in its monitor implementation (it spins with try_lock() and only
uses lock() after giving up the spinning) - but wasn't visible in
my tst-mutex.cc which didn't test try_lock().


Add assertions to lockfree::mutex::unlock() · 3432dc6e

Nadav Har'El authored 11 years ago

assert() that unlock() is only done on a mutex locked by this thread.
This helped catch a couple of nasty lockfree-mutex bugs (see next
patches), so I think it's good to have such a check, at least for
now.

I previously avoided this assertion, thinking it would significantly
impact performance, but it seems it doesn't have a big impact (at
least in the benchmarks I tried). In the future we should perhaps have
a mechanism to disable assertions (or some of them) in release
builds.


Add sigignore() function · 7b91c1e9

Nadav Har'El authored 11 years ago

Add the simple sigignore() function. It's obsolete (it's from the System V
days..) but memcached uses it if it exists (and it does exist on Linux).


bsd: xenbus original files · d7af1416
Glauber Costa authored 11 years ago

bsd: original gnttab file · df37a640
Glauber Costa authored 11 years ago
```
Included separately so we can easily diff for our changes
```

bsd: xenstore file · 1a67d927

Glauber Costa authored 11 years ago

This is the unmodified BSD xenstore file. I am including it separately
so we can easily diff our modifications.


Jul 03, 2013

Add a virtio_kick trace point · 37008a2b
Dor Laor authored 11 years ago


Use the right memory barriers when accessing the 'used' data. · 8bbb7d26

Dor Laor authored 11 years ago

When the guest reads the used pointer it must make sure that
all the other relevant writes to the descriptors by the
host are visible first.
The same goes for the other direction: when the guest
updates the ring in get_buf, the write to used_event should
make all the descriptor changes visible to the host.


Use nullptr instead of reinterpret_cast · 723e8efc
Dor Laor authored 11 years ago


Significantly reduce the number of tx interrupts · 6a323929

Dor Laor authored 11 years ago

Instead of letting the host raise a tx interrupt when we have
a single used pkt in the ring, wait until half the ring is used.
This reduces the number of tx irqs from one per pkt to practically
zero (note that we actively call tx_gc if there is no room on the
ring when doing tx). There was a 40% performance boost on the
netperf rx test.


Convert avail._idx to a std::atomic variable As a result, get rid of the... · be462b48

Dor Laor authored 11 years ago

Convert avail._idx to a std::atomic variable. As a result, get rid of the compiler barrier calls, since we use std::atomic load/store instead.


Use std::atomic load on the shared guest/host index · 4f7085d4
Dor Laor authored 11 years ago


Jul 02, 2013

create a zfs filesystem for /usr · 26ef8e21
Christoph Hellwig authored 11 years ago

tst-zfs-mount: test reading from a file · 9796da74
Christoph Hellwig authored 11 years ago

zfs: wire up vop_read · 8694f457
Christoph Hellwig authored 11 years ago

zfs: add a missing ZFS_EXIT in zfs_lookup · 497c2b9c
Christoph Hellwig authored 11 years ago
```
We currently stub it out so it doesn't really matter, but let's stick to
the code conventions.
```
zfs: handle a not fully constructed vnode in zfs_inactive · 69a62f03
Christoph Hellwig authored 11 years ago
```
We'll get this when a lookup fails.
```

libc: use TLS in readdir · 7a6b8963

Christoph Hellwig authored 11 years ago

That way multiple threads can use readdir at the same time safely.
While not required by POSIX, glibc and other common implementations work
this way.


libc: return proper errors from readdir_r · 6c8a21ad
Christoph Hellwig authored 11 years ago

vfs: don't set errno in ll_readdir · 2bd91d7f
Christoph Hellwig authored 11 years ago
```
readdir_r doesn't want errno set at all, and readdir does by itself already.
```

Implement __vsnprintf_chk · 2689d43d

Nadav Har'El authored 11 years ago

__vsnprintf_chk can be used by glibc when it knows the size of a buffer,
to verify that the size parameter given to vsnprintf() doesn't overflow
the buffer. Unfortunately, it happens to be used in libevent.so (used
by memcached).


Fix msleep() timeout bug · 81b22cd6

Nadav Har'El authored 11 years ago

When msleep() is woken up by a timeout and not by wakeup(), it would leave
its wait record - a pointer to a structure on the stack - in _evlist.
When wakeup() or wakeup_one() is next called, it can find this structure
on the stack - and it now points to random garbage (or even an unmapped
area, if the thread exited).

This bug caused an easy-to-reproduce crash in memcached.

The fix here is to, in case of timeout, reacquire the lock and remove this
thread from _evlist - if it is still there (it might no longer be there,
if the timeout and wakeup raced).
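
The shape of the fix can be sketched as follows. This is not the msleep() code: `event_list_sketch`, `wait_record_sketch`, and their members are hypothetical, and `std::mutex` with `std::list` merely stand in for whatever lock and list the real `_evlist` uses.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical wait record; in the scenario above it lives on the sleeper's stack.
struct wait_record_sketch {
    bool woken = false;
};

// Hypothetical stand-in for the list of waiters and its lock.
struct event_list_sketch {
    std::mutex lock;
    std::list<wait_record_sketch*> waiters;

    // The sleeper registers its record before blocking.
    void add(wait_record_sketch* wr)
    {
        std::lock_guard<std::mutex> guard(lock);
        waiters.push_back(wr);
    }

    // Wake the first waiter, if any, and drop its record from the list.
    bool wakeup_one()
    {
        std::lock_guard<std::mutex> guard(lock);
        if (waiters.empty()) {
            return false;
        }
        waiters.front()->woken = true;
        waiters.pop_front();
        return true;
    }

    // Called by the sleeper after a timeout: reacquire the lock and remove
    // the record if it is still listed. If a wakeup raced with the timeout
    // and already removed it, this is a no-op.
    void on_timeout(wait_record_sketch* wr)
    {
        std::lock_guard<std::mutex> guard(lock);
        auto it = std::find(waiters.begin(), waiters.end(), wr);
        if (it != waiters.end()) {
            waiters.erase(it);
        }
    }
};
```

Without the `on_timeout()` step, a later `wakeup_one()` would dereference a pointer to a stack frame that no longer exists.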


Jul 01, 2013

Merge branch 'sched2' · ef155e5a
Avi Kivity authored 11 years ago
```
Timer-based thread preemption.
```

sched: reduce over-eager timer rearming · 0a2cc13e

Avi Kivity authored 11 years ago

If a thread schedules out before its time slice expires (due to blocking)
and is then made runnable and scheduled back in again, the scheduler will
update the preemption timer, first cancelling it (or possibly setting it
for another thread), then re-setting it for the original thread. Usually
the new expiration time will be later than the original.

For a context switch intensive load, this causes a lot of timer re-arms,
which are fairly slow.

Reduce the number of re-arms by remembering the last expiration time we set.
If the new expiration time comes after that time, don't bother rearming the
timer. Instead, the expiration of the already-set timer will recalculate
the expiration time and set a new expiration time.

This reduces timer arming greatly, and speeds up context switches back to
their performance before the preemption patches.
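
The idea can be sketched like this, assuming times are plain integer ticks. The class `preempt_timer_sketch` and its methods are hypothetical and only model the bookkeeping, not the scheduler or a real timer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the preemption timer bookkeeping.
class preempt_timer_sketch {
public:
    // Ask for a preemption at `expiry`. Returns true only if the (slow)
    // timer had to be re-armed.
    bool request(int64_t expiry)
    {
        _wanted = expiry;
        if (_armed && expiry >= _last_set) {
            // The already-set timer fires no later than needed; let it fire
            // and work out the real deadline at that point.
            return false;
        }
        arm(expiry);
        return true;
    }

    // The timer fired at `now`. Returns true if it is really time to
    // preempt; otherwise re-arms for the deadline requested meanwhile.
    bool on_expiry(int64_t now)
    {
        _armed = false;
        if (_wanted > now) {
            arm(_wanted);
            return false;
        }
        return true;
    }

    unsigned arm_count() const { return _arm_count; }

private:
    void arm(int64_t expiry)
    {
        _last_set = expiry;
        _armed = true;
        ++_arm_count;
    }

    int64_t _last_set = 0;
    int64_t _wanted = 0;
    bool _armed = false;
    unsigned _arm_count = 0;
};
```

In this model, a burst of requests with ever-later deadlines costs one arm up front and one more when the stale timer fires, rather than one arm per context switch.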


sched: initialize borrow · 3b7ede7d

Avi Kivity authored 11 years ago

It will be reset during a future context switch, but best to start with a
clean slate.


sched: initialize idle thread vruntime · a56b258e
Avi Kivity authored 11 years ago
```
Make sure it starts out high so we don't see needless context switches on
startup.
```
sched: add vruntime to sched_switch tracepoint · 56c55def
Avi Kivity authored 11 years ago


sched: preemption timer · 0d1606ce

Avi Kivity authored 11 years ago

When switching to a new thread, or when a new thread is queued, calculate
the vruntime difference to set a timer for the point in time when we need to
context switch again.

This change makes tst-fpu.so threads run completely in parallel.


ELF: Allow using prelinked libraries · 4723a592

Nadav Har'El authored 11 years ago

This patch allows using shared libraries copied from Linux, even if
they already underwent "prelink" modifications.

Without this patch, if the prelinked library used one of our library
functions (e.g., malloc()), its address would not be correctly
calculated, and the call would be made to a random address and crash.

A few details about the problem and the fix:

When using a function from a shared library, calls go to a ".plt"
function, which, to make a long story short, jumps to an address written
in the ".plt.got" entry for this function.

Normally, and this is what the existing code expected, the .plt.got entry
is initialized to point to the second instruction in the .plt function
(after the jump), which calls the linker to look up the function.
This allows lazy symbol lookup, but after the lookup, the next calls
to the PLT function will jump to the right function address immediately.

But prelinking changes that - the prelinker looks up the symbols once
and fills the .plt.got with the addresses it assigned to the functions
in the other library. These addresses do not make any sense in the
context of OSv (or any other system besides the one that the prelinker
ran on), so we cannot use them, and instead need to overwrite them
again with links to the .plt functions.


sched: adjust timer expiration when the clock is not running · 6085ca37

Avi Kivity authored 11 years ago

Change the condition for expiration to allow an exact match between the
current time and the timer expiration time.

This helps the scheduler keep going when the clock is still not running
during system startup.  Ideally we shouldn't depend on such details, but
fixing this is too complicated for now.


sched: stop vruntime overflow in idle thread · 5c96e31c

Avi Kivity authored 11 years ago

We set the idle thread's vruntime at the maximum possible, to ensure it
has the lowest priority, but this can cause an overflow when we add the
idle thread's running time.

Detect the overflow and prevent it.
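
One way to detect and prevent such an overflow is a saturating add, sketched below. The function name `add_runtime_saturating` and the choice of `int64_t` for vruntime are assumptions; the patch may use a different type or a different check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: add a non-negative running time to a vruntime value,
// clamping at the maximum instead of overflowing.
inline int64_t add_runtime_saturating(int64_t vruntime, int64_t delta)
{
    const int64_t max = std::numeric_limits<int64_t>::max();
    // Overflow is only possible when vruntime is positive (delta >= 0 assumed),
    // and in that case (max - vruntime) itself cannot overflow.
    if (vruntime > 0 && delta > max - vruntime) {
        return max;
    }
    return vruntime + delta;
}
```

With this check, a thread whose vruntime starts at the maximum stays there no matter how much running time is added.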
