Commits · 29a4b2dd310a46174493dede7814c9defdd38055 · Verlässliche Systemsoftware / projects / osv

Jun 17, 2013

java: add tracepoint interface class · 29a4b2dd
Avi Kivity authored 11 years ago
```
Exposes tracepoints and counters
```
29a4b2dd

Avi Kivity authored 11 years ago

Add a facility to run functions when a tracepoint is hit.  This is independent
of logging; you can add a probe function with logging disabled or enabled.

e58366ac

percpu: add allocatable per-cpu counters · 43ded08d
Avi Kivity authored 11 years ago

43ded08d

kvmclock: make per-cpu · d0c2b805

Avi Kivity authored 11 years ago

The kvmclock ABI requires it to calculate system time using values for the cpu
it is running on.

Do this by:
  - changing the system time structure to be per-cpu
  - adding a cpu notifier so that per-cpu MSRs are initialized for each cpu
  - hacking around initialization order issues

d0c2b805

sched: add cpu notifiers · 26a9fd2f

Avi Kivity authored 11 years ago

cpu notifiers are called whenever a cpu is brought up (and one day, down), so
that drivers that manage the cpu (for example, kvmclock) can initialize
themselves.

The callback is called on the cpu that is being brought up.
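
A minimal sketch of the notifier idea described above. The class name `cpu_notifier_registry` and its methods are hypothetical and are not taken from the OSv sources; the sketch only shows callbacks being registered once and then invoked for each cpu that comes up.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry, for illustration only (not OSv's actual interface).
class cpu_notifier_registry {
public:
    using callback = std::function<void(unsigned cpu_id)>;

    // A driver (kvmclock, for example) registers its per-cpu init hook once.
    void add(callback cb) { _callbacks.push_back(std::move(cb)); }

    // Assumed to be called by the bring-up path, on the cpu being brought up.
    void cpu_up(unsigned cpu_id) const {
        for (const auto& cb : _callbacks) {
            cb(cpu_id);
        }
    }

private:
    std::vector<callback> _callbacks;
};
```

A driver would call `add()` during its own initialization, and the scheduler would call `cpu_up(id)` once per cpu.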

26a9fd2f

percpu: per-cpu variables · 63ab89b6

Avi Kivity authored 11 years ago

Per-cpu variables can be used from contexts where preemption is disabled
(such as interrupts) or when migration is impossible (pinned threads) for
managing data that is replicated for each cpu.

The API is a smart pointer to a variable which resolves to the object's
location on the current cpu.

Define:

   #include <osv/percpu.hh>

   PERCPU(int, my_counter);
   PERCPU(foo, my_foo);

Use:

   ++*my_counter;
   my_foo->member = 7;

63ab89b6

Jun 16, 2013
- sched: initialize cpu::id earlier · e822050c
  Avi Kivity authored 11 years ago
  
  Make it usable in the constructor, for percpu initialization.
  e822050c
Jun 14, 2013

acpi: map bios into our linear mapping · da8d3daf

Glauber Costa authored 11 years ago

The algorithm we follow for memory discovery is quite simple: iterate over the
E820h map, and for every type 1 (== RAM) memory, we increment total size, and
map it linearly to our address space mappings.
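
The discovery loop described above can be sketched roughly as follows. The struct layouts and the names `e820_entry`, `linear_range` and `discover_memory` are assumptions made for illustration; the real code maps pages rather than recording ranges in a vector.

```cpp
#include <cstdint>
#include <vector>

// Assumed, simplified layout of one E820 map entry.
struct e820_entry {
    std::uint64_t base;
    std::uint64_t length;
    std::uint32_t type;   // 1 == usable RAM
};

// Stand-in for "mapped linearly": we only record the range.
struct linear_range {
    std::uint64_t base;
    std::uint64_t length;
};

constexpr std::uint32_t e820_type_ram = 1;

// Sums all type-1 regions and records each as linearly mapped.
// Every other type (reserved, ACPI, ...) is skipped, which is why
// tables placed in a reserved region end up unmapped.
std::uint64_t discover_memory(const std::vector<e820_entry>& map,
                              std::vector<linear_range>& mapped) {
    std::uint64_t total = 0;
    for (const auto& e : map) {
        if (e.type != e820_type_ram) {
            continue;
        }
        total += e.length;
        mapped.push_back({e.base, e.length});
    }
    return total;
}
```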

That breaks on xen, however. I have no idea what seabios is doing for KVM, but
xen's hvmloader will put most of the ACPI tables at a reserved region around
physical address 0xfc000000. When we try to parse the ACPI tables, we will reach
an unmapped portion of the address space and fault. (BTW, those faults are really
hard to debug: we're triple faulting directly, at least in my setup.)

Luckily, the acpi driver code is prepared for such scenarios, and before using
any of that memory it will call map and unmap functions - we just don't implement
them.

This patch implements the necessary map function - and while we are at it, its
unmap counterpart. This is all far away from being performance critical, so I am
being as dumb as possible and just servicing the request without tracking any
previous state.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

da8d3daf

Jun 13, 2013

Merge branch 'trace' · 00627067
Avi Kivity authored 11 years ago
```
Make the function tracer work again.
```
00627067

sched: make sure the idle thread is never scheduled when other threads can run · 4b199e18

Avi Kivity authored 11 years ago

Set it to have the largest possible vruntime, so it is only ever picked up
from the queue if there are no other threads on it.

This avoids a pointless context switch to the idle thread, where it wakes,
sees another thread, and switches out again.
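
A toy model of the idea, assuming a run queue ordered by vruntime. `run_queue_sketch` and its methods are hypothetical and much simpler than the real scheduler; the point is only that an entry with the maximum vruntime sorts last.

```cpp
#include <cstdint>
#include <limits>
#include <set>
#include <string>
#include <utility>

// Hypothetical run queue: smallest vruntime is picked first.
class run_queue_sketch {
public:
    void add(const std::string& name, std::int64_t vruntime) {
        _q.emplace(vruntime, name);
    }

    // The idle thread gets the largest representable vruntime,
    // so it sorts after every other runnable thread.
    void add_idle() {
        add("idle", std::numeric_limits<std::int64_t>::max());
    }

    // Precondition: the queue is not empty.
    std::string pop_next() {
        auto it = _q.begin();
        std::string name = it->second;
        _q.erase(it);
        return name;
    }

private:
    std::set<std::pair<std::int64_t, std::string>> _q;
};
```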

4b199e18

Implement usleep() · 0be1f9e0

Nadav Har'El authored 11 years ago

usleep() was scrubbed out of POSIX in 2008, and not used in Java, but
it does exist in glibc and is damn easy to use compared to its newer
relative, nanosleep, so I want to use it in a test.
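
For illustration, one way to build a usleep-like call on top of nanosleep. This is a sketch assuming a POSIX environment, not necessarily how the commit implements it; `usec_to_timespec` and `usleep_sketch` are made-up names.

```cpp
#include <time.h>   // nanosleep, timespec (POSIX)

// Splits a microsecond count into whole seconds plus nanoseconds.
timespec usec_to_timespec(unsigned long usec) {
    timespec ts;
    ts.tv_sec = static_cast<time_t>(usec / 1000000);
    ts.tv_nsec = static_cast<long>(usec % 1000000) * 1000;
    return ts;
}

// Sleeps for roughly `usec` microseconds; returns nanosleep()'s result.
int usleep_sketch(unsigned long usec) {
    timespec ts = usec_to_timespec(usec);
    return nanosleep(&ts, nullptr);
}
```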

0be1f9e0

shutdown_af_local: add missing locks · 67923f37

Nadav Har'El authored 11 years ago

As Avi pointed out, shutdown_af_local() did read-modify-write to
f->f_flags without locking. Add the missing locks.

67923f37

Clean up todo/mutex · 2be1f82d
Nadav Har'El authored 11 years ago
```
Remove things already done.
```
2be1f82d

Jun 12, 2013

trace: prevent recursion in function tracing · a7f920f2

Avi Kivity authored 11 years ago

The functions that are used in function tracing must not themselves be
traced, lest we recurse endlessly. Rather than marking them all with
no_instrument_function, keep a nesting counter and check if we're nested.
This way only the functions used for the test must not be traced.
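
A small model of the nesting-counter approach. All names are hypothetical, and the call to `on_function_entry("record")` inside `record()` merely simulates what compiler instrumentation would insert; without the nesting check this would recurse forever.

```cpp
#include <string>
#include <vector>

thread_local int nesting = 0;                    // >0 while inside the tracer
thread_local std::vector<std::string> trace_log;

void on_function_entry(const char* name);

// Helper used by the tracer. We pretend the compiler instrumented it too.
void record(const char* name) {
    on_function_entry("record");   // simulated instrumentation hook
    trace_log.push_back(name);
}

// Entry hook: only this check has to stay untraced.
void on_function_entry(const char* name) {
    if (nesting > 0) {
        return;                    // already tracing: do not recurse
    }
    ++nesting;
    record(name);
    --nesting;
}
```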

a7f920f2

trace: disable interrupts during tracing · f057d76b

Avi Kivity authored 11 years ago

Seeing a trace from an interrupt incurred while tracing can be confusing, so
disable them.

f057d76b

x64: provide some uninstrumented versions of irq flag manipulation functions · f7af76ee

Avi Kivity authored 11 years ago

In the tracer, we don't want interrupt manipulation to cause recursion, so
provide uninstrumented versions of select functions.

f7af76ee

Optionally enable (disabled by default) lock-free mutex · a2cb99d5

Nadav Har'El authored 11 years ago

This patch optionally enables, at compile-time, OSV to use the lock-free
mutex instead of the spin-lock-based mutex. To use the lock-free mutex,
change the line "#undef LOCKFREE_MUTEX" in include/osv/mutex.h to
"#define LOCKFREE_MUTEX".

LOCKFREE_MUTEX is currently disabled by default, awaiting a few more
tests, but at this point I'm happy to say that beyond one known
unrelated bug (see details below), it seems the lock-free mutex is
fairly stable, and survives all tests and benchmarks I threw at it.

The remaining known bug involves a thread destruction race between
complete() and join(): complete wake()s the joiner thread, which in
rare cases can really quickly delete the thread's stack, before wake()
returns, causing a crash on return from wake(). This bug is really
unrelated to the lock-free mutex, but for some unknown reason I can
only reproduce it with the lock-free mutex on the SPECjvm2008 "sunflow"
benchmark.

To make lockfree::mutex our default mutex, this patch does the following
when LOCKFREE_MUTEX is defined:

1. In core/mutex.cc, #ifndef out the old mutex code, leaving the
   spinlock code in case someone wants to use it directly.

2. In include/osv/mutex.h, do different things in C++ and C (remember that
   lockfree::mutex is a C++ class, and cannot be used directly from C):

   * In C++, simply make mutex and mutex_t aliases for lockfree::mutex.

   * In C, make struct mutex and mutex_t an opaque 40-byte structure (in
     C++ compilation, we verify that this 40 is indeed the C++ class's
     length), and define the operations on it.

3. In libc/pthread.cc, if LOCKFREE_MUTEX, unfortunately the new mutex
   will not fit into pthread_mutex_t, and neither will condvar fit now
   into pthread_cond_t. So use a lazily allocated mutex or condvar, using
   the lazy_indirect<> template.
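
To illustrate item 3, here is a rough sketch of what a lazily allocated indirection can look like. It is not the actual `lazy_indirect<>` template, whose definition is not shown in this log; `lazy_indirect_sketch` is a hypothetical stand-in that stores only a pointer and allocates the real object on first use.

```cpp
#include <atomic>

// Hypothetical stand-in: pointer-sized storage, object created on first use.
template <typename T>
class lazy_indirect_sketch {
public:
    lazy_indirect_sketch() = default;
    lazy_indirect_sketch(const lazy_indirect_sketch&) = delete;
    lazy_indirect_sketch& operator=(const lazy_indirect_sketch&) = delete;
    ~lazy_indirect_sketch() { delete _p.load(std::memory_order_relaxed); }

    T* get() {
        T* p = _p.load(std::memory_order_acquire);
        if (!p) {
            T* fresh = new T();
            // If another thread installed an object first, keep theirs.
            if (_p.compare_exchange_strong(p, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
                p = fresh;
            } else {
                delete fresh;   // p now holds the winner's pointer
            }
        }
        return p;
    }

    T* operator->() { return get(); }

private:
    std::atomic<T*> _p{nullptr};
};
```

The trade-off is one extra allocation and one extra pointer dereference per object, in exchange for fitting into a structure that is too small for the object itself.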

a2cb99d5

run.py: allow command line select of alternative hypervisors · 3c354af1

Glauber Costa authored 11 years ago

I have been commenting lines in and out of this script to choose the right
underlying hypervisor to run. So here is the automated version of it. I haven't
chosen the letters h or y because they usually denote help and yes,
respectively. It is also not a kvm/no-kvm boolean because very soon we would
like to include xen.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

3c354af1

update loader Copyright. · f6e4bfb7

Glauber Costa authored 11 years ago


Now that we can actually see the debug message, print our name on it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

f6e4bfb7

console: dump early messages to the serial port · 4fd29712

Glauber Costa authored 11 years ago

We can use a very simple outb instruction to write data to the serial
port in case we don't have a console implementation yet. We don't need
to be fancy, and even limited functionality will already allow us to
print messages early (especially debug messages).

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

4fd29712

run console earlier · 4b5afd0f

Glauber Costa authored 11 years ago

We could benefit from the console being ready a bit earlier. The only
dependency that I see is that interrupts need to be working. So as
soon as we initialize the ioapic, we should be able to initialize the console.

This is not the end of the story: we still need an even earlier console to debug the
driver initialization functions, and I was inclined to just leave console_init
where it is, for now.

But additionally, I felt that loader is really a more appropriate place for
that than vfs_init... So I propose we switch. In the meantime, it might help
debug things that happen between ioapic init and the old vfs_init (mem
initialization, smp bring up, etc).

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

4b5afd0f

Jun 11, 2013

Add missing change to lfmutex.cc · d96020e3
Nadav Har'El authored 11 years ago
```
Sorry, forgot one hunk in "git add -p" :(
```
d96020e3

opensolaris: fix cv_timedwait() · da5939f9

Avi Kivity authored 11 years ago

cv_timedwait() has a relative timeout expressed in ticks (microseconds),
while condvar_wait() has an absolute timeout expressed in nanoseconds.

Replace the 1:1 macro with a function that does the correct translation.
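
The translation amounts to something like the following sketch. The function name and the explicit `now_ns` parameter are assumptions made for illustration; the tick length of one microsecond is taken from the message above.

```cpp
#include <cstdint>

// One tick is one microsecond, per the commit message.
constexpr std::int64_t ns_per_tick = 1000;

// Converts a relative timeout in ticks to an absolute deadline in
// nanoseconds, given the current time in nanoseconds.
std::int64_t ticks_to_absolute_ns(std::int64_t now_ns,
                                  std::int64_t relative_ticks) {
    return now_ns + relative_ticks * ns_per_tick;
}
```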

da5939f9

lock-free queue: update test · 954bd855
Nadav Har'El authored 11 years ago
```
Updated test with the new API. Sorry about forgetting to commit it earlier.
```
954bd855

Merge remote-tracking branch 'origin/master' · ccad2d9b
Glauber Costa authored 11 years ago

ccad2d9b

lock-free queue: change pop() API · df217cef

Nadav Har'El authored 11 years ago

Changed lockfree::queue_mpsc (lock-free multiple-producer single-consumer
queue) pop() API. Instead of returning separately the popped value (type
T) and a boolean success/failed, now return a pointer to the
linked_item<T> originally pushed(), or nullptr on failure.

The new pop() API is slightly more awkward (instead of using the returned
value directly, you need to take its field "value") but has an important
new feature: It gives you not just the value, but also the address where
this value is stored. So it is now possible to change value in its original
structure. This allows us to implement our (by now) traditional waitqueue
technique: The values on the queue are thread pointers, and the popper,
before waking up a thread, sets the thread pointer to zero - this way the
woken up thread knows it isn't a spurious wakeup.
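
The shape of the new API and the waitqueue idiom can be sketched as follows. Note that `queue_sketch` is a plain single-threaded FIFO and is NOT lock-free; it, `thread_stub` and `wake_one` are hypothetical names that only mirror the pointer-returning pop() described above.

```cpp
template <typename T>
struct linked_item {
    T value;
    linked_item* next = nullptr;
    explicit linked_item(T v) : value(v) {}
};

// Single-threaded stand-in for the queue; only the API shape matters here.
template <typename T>
class queue_sketch {
public:
    void push(linked_item<T>* item) {
        item->next = nullptr;
        if (_tail) {
            _tail->next = item;
        } else {
            _head = item;
        }
        _tail = item;
    }

    // Returns the originally pushed item, or nullptr if the queue is empty.
    linked_item<T>* pop() {
        linked_item<T>* item = _head;
        if (!item) {
            return nullptr;
        }
        _head = item->next;
        if (!_head) {
            _tail = nullptr;
        }
        return item;
    }

private:
    linked_item<T>* _head = nullptr;
    linked_item<T>* _tail = nullptr;
};

struct thread_stub {
    bool woken = false;
    void wake() { woken = true; }
};

// The popper zeroes the waiter's slot before waking it, so the woken
// thread can tell a genuine wakeup from a spurious one.
bool wake_one(queue_sketch<thread_stub*>& q) {
    linked_item<thread_stub*>* item = q.pop();
    if (!item) {
        return false;
    }
    thread_stub* t = item->value;
    item->value = nullptr;
    t->wake();
    return true;
}
```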

A followup patch will use this capability to cleanup lockfree::mutex not
to abuse the "owner" field as a notifier of non-spurious wakeups. After
that patch, "owner" will be used only for implementing recursive mutex,
and will not be part of the wakeup protocol.

df217cef

lock-free mutex: change and clarify the role of depth and owner · 99b477dc

Nadav Har'El authored 11 years ago

The way "owner" and "depth" were used in lockfree::mutex was messy.
Ideally, neither should be needed if we implemented a non-recursive
mutex, but following the design of ::mutex, we (re)used "owner" also as
a marker that a thread was woken to have the lock (and it's not a
spurious wake).

After this patch, owner and depth are used in lockfree::mutex *only*
for implementing a recursive mutex, and building a non-recursive
mutex should be as simple as dropping these two variables.

In more detail:

1. "owner" is no longer used to tell a woken up thread that the wake
wasn't spurious. Instead, zero the thread in the wait-record. This
is a familiar idiom, which we already used a few times before.

2. "depth" isn't an atomic variable, so it should only be read by the
same thread which set it, and this wasn't the case previously. Now,
depth is only ever written (set to 1, incremented or decremented)
or read by the lock-holding thread - and not the lock releasing thread.

3. "owner" needs to be an atomic variable - a non-lock-holding thread
needs to read it and recognize it isn't holding the lock - but it
doesn't need any special memory ordering with other variables, so
should always be accessed with "relaxed" memory ordering.

99b477dc

sched::thread - fix very rare join() hang · cf4c46c4

Nadav Har'El authored 11 years ago

Fixed a very rare hang in sched::thread::join():

thread::complete() included the following code:

    _status.store(status::terminated);
    if (_joiner) {
        _joiner->wake();
    }

If we are preempted right after setting status to "terminated", but
before calling wake(), this thread will never be scheduled again (it will
remain in the terminated status forever), and will never call wake() -
so the join()ing thread may just wait forever.

I saw this happening in a test case that started and joined millions of
threads, and eventually the join() hangs.

The solution is to enclose the above lines with preempt_disable()/
preempt_enable().
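
A simplified model of the fixed sequence. The preemption counter and the `thread_sketch` type are stand-ins invented for this sketch; only the ordering (disable preemption, store status, wake joiner, enable preemption) reflects the fix described above.

```cpp
#include <atomic>

// Stand-in preemption control: >0 means preemption is disabled.
thread_local int preempt_counter = 0;
inline void preempt_disable() { ++preempt_counter; }
inline void preempt_enable() { --preempt_counter; }

enum class status { running, terminated };

struct thread_sketch {
    std::atomic<status> _status{status::running};
    thread_sketch* _joiner = nullptr;
    bool woken = false;
    bool woken_with_preemption_off = false;

    void wake() {
        woken = true;
        woken_with_preemption_off = preempt_counter > 0;
    }

    // With preemption disabled, nothing can deschedule us between
    // publishing "terminated" and waking the joiner.
    void complete() {
        preempt_disable();
        _status.store(status::terminated);
        if (_joiner) {
            _joiner->wake();
        }
        preempt_enable();
    }
};
```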

cf4c46c4

wake(): Don't miss a preemption opportunity · aee17ba4

Nadav Har'El authored 11 years ago

wake() normally calls schedule(), but doesn't do so if preemption is
disabled. So we should mark need_reschedule = true, to suggest that
schedule() can be called when preemption is later enabled.
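
A toy model of the deferred reschedule. `cpu_sketch` and its members are hypothetical, and whether the real preempt_enable() calls schedule() immediately is an assumption of this sketch.

```cpp
struct cpu_sketch {
    int preempt_counter = 0;      // >0 means preemption is disabled
    bool need_reschedule = false;
    int schedule_calls = 0;

    void schedule() {
        need_reschedule = false;
        ++schedule_calls;
    }

    void preempt_disable() { ++preempt_counter; }

    // Tail of wake(): schedule now if allowed, otherwise remember to.
    void wake_path() {
        if (preempt_counter == 0) {
            schedule();
        } else {
            need_reschedule = true;
        }
    }

    // Assumed behaviour: a pending reschedule runs once preemption is back on.
    void preempt_enable() {
        if (--preempt_counter == 0 && need_reschedule) {
            schedule();
        }
    }
};
```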

aee17ba4

x64: prevent nested exceptions from corrupting the stack · 1dbddc44

Avi Kivity authored 11 years ago

Due to the need to handle the x64 red zone, we use a separate stack for
exceptions via the IST mechanism. This means that a nested exception will
reuse the parent exception's stack, corrupting it. It is usually very hard
to figure out the root cause when this happens.

Prevent this by setting up a separate stack for nested exceptions, and
aborting immediately if a nested exception happens.

1dbddc44

Jun 10, 2013
- x64: switch ifunc resolvers to processor::features() · 48323f23
  Avi Kivity authored 11 years ago
  
  Now that processor::features() is initialized early enough, we can use it in ifunc dispatchers.
  48323f23
- x64: make processor::features usable early on · 7fb119b0
  Avi Kivity authored 11 years ago
  
  cpuid is useful for ifunc-dispatched functions (like memcpy), so we can select the correct function based on available processor features. Make processor::features available early to support this. We use a static function-local variable to ensure it is initialized early enough.
  7fb119b0
- Merge branch 'repmov' · a810513e
  Avi Kivity authored 11 years ago
  
  Optimized memcpy() using rep movsb
  a810513e
- libc: optimized memcpy() · 06dd5386
  Avi Kivity authored 11 years ago
  
  If the cpu supports "Enhanced REP MOVS / STOS" (ERMS), use a rep movsb instruction to implement memcpy. This speeds up copies significantly, especially large misaligned ones.
  06dd5386
- elf: add support for STT_IFUNC · 5948f0d7
  Avi Kivity authored 11 years ago
  
  Used for implementing support for indirect functions referenced from shared libraries.
  5948f0d7
- add a simple ZFS test · 42687ce9
  Christoph Hellwig authored 11 years ago
  
  42687ce9
- zfs: port vdev_disk · d7fce044
  Christoph Hellwig authored 11 years ago
  
  d7fce044
- zfs: port vdev_file · b6c95040
  Christoph Hellwig authored 11 years ago
  
  b6c95040
- zfs: initialize during boot · e91d6a13
  Christoph Hellwig authored 11 years ago
  
  e91d6a13
- solaris: handle the KM_ZERO flag · 449ade70
  Christoph Hellwig authored 11 years ago
  
  449ade70