Commits · 788b0a8d72788a3437a111d0cbc54b461095f21e · Verlässliche Systemsoftware / projects / osv

May 28, 2013

tests: make TCPConcurrentDownloads runnable second time · 788b0a8d
Guy Zana authored 11 years ago

788b0a8d

May 27, 2013

Add "memory clobber" to STI and CLI instructions · a200bb7a

Nadav Har'El authored 11 years ago

When some code section happens to be called from both thread context and
interrupt context, and we need mutual exclusion (we don't want the interrupt
context to start while the critical section is in the middle of running in
thread context), we surround the critical code section with CLI and STI.

But we need the compiler to assure us that writes to memory done between
the calls to CLI and STI stay between them. For example, if we have

    thread context:                 interrupt handler:

      CLI;                          a--;
      a++;
      STI;

We don't want the a++ to be moved by the compiler before the CLI. We also
don't want the compiler to save a's value in a register and only actually
write it back to the memory location 'a' after the STI (when an interrupt
handler might be concurrently writing). We also don't want the compiler
to remember a's last value in a register and use it again after the next
CLI.

To ensure these things, we need the "memory clobber" option on both the CLI
and STI instructions. The "volatile" keyword is not enough - it guarantees
that the instruction isn't deleted or moved, but not that stuff that
should have been in memory isn't just in registers.
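
As a rough illustration of the difference (a sketch, not the OSv source), the wrappers below use GCC/Clang extended inline assembly. The names `irq_disable_sketch`, `irq_enable_sketch`, `compiler_barrier` and `increment_in_critical_section` are hypothetical. CLI and STI are privileged x86 instructions, so the wrappers are only defined here and never called; the user-space function uses an empty asm statement with the same "memory" clobber to stand in for them.

```cpp
// Sketch only: names are hypothetical, GCC/Clang inline-asm syntax assumed.

#if defined(__x86_64__) || defined(__i386__)
// "volatile" keeps the instruction from being deleted. The "memory" clobber
// additionally tells the compiler that memory may be read or written here,
// so values cached in registers must be written back before the
// instruction and reloaded after it.
inline void irq_disable_sketch() { asm volatile("cli" : : : "memory"); }
inline void irq_enable_sketch()  { asm volatile("sti" : : : "memory"); }
#endif

// User-space stand-in: emits no instruction, but constrains the compiler
// in the same way as the clobbers above.
inline void compiler_barrier() { asm volatile("" : : : "memory"); }

int a = 0;  // imagine this is shared with an interrupt handler

int increment_in_critical_section() {
    compiler_barrier();  // stands in for CLI
    a++;                 // load, modify and store all happen between the barriers
    compiler_barrier();  // stands in for STI
    return a;
}
```

In kernel code the two barriers would be the real CLI and STI wrappers; the compiler-visible effect of the clobber is the same in both cases.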

Note that Linux also has these memory clobbers on sti() and cli().
Linus Torvalds explains in a post from 1996 why these were necessary:
http://lkml.indiana.edu/hypermail/linux/kernel/9605/0214.html

All that being said, we never noticed a bug caused by the missing
"memory" clobbers. But better safe than sorry....

a200bb7a

tests: add new Java test, TCPConcurrentDownloads · 313b7ff2

Guy Zana authored 11 years ago

The new test creates several threads, each downloading a ~280MB file
concurrently. It aims to find concurrency bugs, deadlocks, etc. in the
netport, but (fortunately) it passes.

313b7ff2

Merge branch 'ifunc' · 67dd9d0e

Avi Kivity authored 11 years ago

Optimize fsbase reload using the wrfsbase instruction, where available.

67dd9d0e

tests: add iterations to the TCPDownloadFile test · fe5029da
Guy Zana authored 11 years ago
```
making it even more stressful ;)
```
fe5029da

bsd: add -D SMP to build.mak, used by atomic.h · 03fef2b5

Guy Zana authored 11 years ago

The atomic operations in atomic.h weren't really atomic. This was
missed in the netport and is now fixed.

03fef2b5

Update misc.bin for boost-system · 78bff744
Avi Kivity authored 11 years ago

78bff744
external: update misc.bin for boost-filesystem · 626a908b
Avi Kivity authored 11 years ago

626a908b

provide a utsname structure · 43c3f6dd

Christoph Hellwig authored 11 years ago

ZFS wants direct access to a global utsname structure. Provide one from
core OSv code and rewrite uname to just copy it out. To ease this move
the uname implementation to a C file as this allows using designated
initializers and avoids the casting mess around memcpy.

43c3f6dd

debug: introduce debug_ll() and use it in abort() · 6ebb582e

Guy Zana authored 11 years ago

The debug() console function takes a lock before it accesses the console driver;
it does so by acquiring a mutex, which may sleep.

Since we want to be able to debug (and abort) in contexts where it is not possible to sleep,
such as in page_fault, a lockless debug print method is introduced.

Prior to this patch, any abort on page_fault would cause an "endless" recursive
abort() loop, which hung the system in a peculiar state.
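
A minimal sketch of the idea, not the OSv implementation. All names (`console_stub`, `debug_ll_sketch`, `abort_sketch`) are hypothetical, and a plain buffer stands in for the console driver. The re-entry flag is an assumption added for illustration: the point is only that the print path takes no lock and that an abort triggered while aborting does not loop.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a console driver: a fixed buffer that is
// written without taking any lock, so the write path can never sleep.
struct console_stub {
    char buf[256] = {};
    std::size_t len = 0;

    void write(const char* s) {
        std::size_t n = std::strlen(s);
        std::size_t room = sizeof(buf) - 1 - len;  // keep the terminating NUL
        if (n > room) {
            n = room;  // truncate rather than overflow
        }
        std::memcpy(buf + len, s, n);
        len += n;
    }
};

console_stub console;
std::atomic<bool> aborting{false};

// Lockless print: usable where sleeping is not allowed.
void debug_ll_sketch(const char* msg) { console.write(msg); }

// Returns true if this call performed the abort, false if an abort was
// already in progress. A real abort() would halt instead of returning.
bool abort_sketch() {
    if (aborting.exchange(true)) {
        return false;  // nested abort: do not print or recurse again
    }
    debug_ll_sketch("Aborted\n");
    return true;
}
```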

6ebb582e

abort: debug() may cause an abort() as well · 9ef87755

Guy Zana authored 11 years ago

the current code handles the case of recursive aborts incorrectly, while
the existing comment is very precise :)

9ef87755

zfs: enable the solaris compat <sys/vnode.h> · 1cf30084

Christoph Hellwig authored 11 years ago

This allows us to remove various #if 0 guards so that code using vnode_t or
znode_t can be compiled, both in the current headers and in future ported code.

1cf30084

zfs: use get_cpuid() · 5799fb60
Christoph Hellwig authored 11 years ago

5799fb60
solaris: allow code using TASKQ_THREADS_CPU_PCT to build · c23a9917
Christoph Hellwig authored 11 years ago

c23a9917
solaris: define ptob using PAGE_SIZE instead of PAGE_SHIFT · 7794d411
Christoph Hellwig authored 11 years ago

7794d411
solaris: provide more credential related stubs · 13bd4625
Christoph Hellwig authored 11 years ago

13bd4625
solaris: include the right <sys/param.h> · c94c43ff
Christoph Hellwig authored 11 years ago

c94c43ff
solaris: provide an issig stub · f49ddbca
Christoph Hellwig authored 11 years ago

f49ddbca

netport: provide a proc0 definition · b7bdb42c

Christoph Hellwig authored 11 years ago

BSD and Solaris code likes to pass this identifier for the "kernel" process
to various thread creation routines.  Make our life simpler by providing it
and ignoring it.

b7bdb42c

netport: provide physical memory size information · c55bb8b9
Christoph Hellwig authored 11 years ago

c55bb8b9

May 26, 2013

Fix comment · 0ad3e2e0

Nadav Har'El authored 11 years ago

The comment about unlocking the irq_lock was put on the wrong line.
Move it (and rephrase it a bit - the word "release" immediately after
calling an unrelated release() function - is confusing).

0ad3e2e0

Fix two bugs in yield() · 19e52ce6

Nadav Har'El authored 11 years ago

yield() had two bugs - thanks to Avi for pinpointing them:

1. It used runqueue.push_back() to put the thread on the run queue, but
push_back() is a low-level function which can only be used if we're
sure that the item we're pushing has the vruntime suitable for being
*last* on the queue - and in the code we did nothing to ensure this
is the case (we should...). So use insert_equal(), not push_back().

2. It was wrongly divided into two separate sections with interrupts
disabled. The scheduler code is run both at interrupt time (e.g.,
preempt()) and at thread time (e.g., wait(), yield(), etc.) so to
guarantee it does not get called in the middle of itself, it needs
to disable interrupts while working on the (per-cpu) runqueue.
In the broken yield() code, we disabled interrupts while adding the
current thread to the run queue, and then again to reschedule.
Between those two critical sections, an interrupt could arrive and
do something with this thread (e.g., migrate it to another CPU, or
preempt it), yet when the interrupt returns yield continues to run
reschedule_from_interrupt which assumes that this thread is still
running, and definitely not on the run queue.

Bug 2 is what caused the crashes in the tst-yield.so test. The fix is
to hold the interrupts disabled throughout the entire yield().
This is easiest done with lock_guard, which simplifies the flow
of this function.
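
The shape of the fix can be sketched roughly as follows. This is an illustration under assumptions, not the OSv scheduler: `fake_irq_lock`, `thread_stub`, `by_vruntime` and `cpu_stub` are hypothetical, the interrupt flag is a plain bool, and a `std::multiset` ordered by vruntime stands in for the run queue, with its sorted `insert()` playing the role of insert_equal().

```cpp
#include <mutex>
#include <set>

// Hypothetical stand-in for the per-CPU interrupt flag. It has lock() and
// unlock(), so it can be used with std::lock_guard.
struct fake_irq_lock {
    bool disabled = false;
    int sections = 0;  // number of separate critical sections entered
    void lock()   { disabled = true; ++sections; }
    void unlock() { disabled = false; }
};

struct thread_stub {
    int id;
    long vruntime;
};

struct by_vruntime {
    bool operator()(const thread_stub& a, const thread_stub& b) const {
        return a.vruntime < b.vruntime;
    }
};

struct cpu_stub {
    fake_irq_lock irq;
    std::multiset<thread_stub, by_vruntime> runqueue;
    bool rescheduled_with_irqs_off = false;

    // Records whether interrupts were still disabled when we got here.
    void reschedule() { rescheduled_with_irqs_off = irq.disabled; }

    void yield(thread_stub current) {
        // One critical section covering both steps (bug 2).
        std::lock_guard<fake_irq_lock> guard(irq);
        // Sorted insert instead of appending at the tail (bug 1).
        runqueue.insert(current);
        reschedule();
    }   // guard's destructor re-enables "interrupts" here
};
```

Because the guard lives for the whole function body, there is no window between enqueueing the thread and rescheduling in which an interrupt could observe the half-finished state.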

19e52ce6

sched: avoid unnecessary FPU saving · 947b49ee

Nadav Har'El authored 11 years ago

Because of Linux's calling convention, it should not be necessary to
save the FPU state when a reschedule is caused by a function call.

Because we had a bug and forgot to save the FPU state when calling
a signal handler, and because this signal handler can cause a reschedule,
we had to save the FPU on any reschedule. But after fixing that bug, we
no longer need these unnecessary FPU saves.

The "sunflow" benchmark still runs well after this patch.

947b49ee

x64: use wrfsbase for faster context switching, when available · 3c9ba28d
Avi Kivity authored 11 years ago
```
Drops context switch time by ~80ns.
```
3c9ba28d
x64: add wrfsbase accessor · bb33c998
Avi Kivity authored 11 years ago
```
Faster way to write fsbase on newer processors.
```
bb33c998

elf: add support for IRELATIVE relocations · e8c62c5e

Avi Kivity authored 11 years ago

These are used to support ifunc functions, which are resolved at load-time
based on cpu features, rather than at link time.
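
As a simplified picture of what processing one such relocation involves: the loader calls a resolver function once at load time and stores the address it returns in the relocated slot. The sketch below uses typed function pointers instead of raw ELF addresses, and every name in it (`impl_t`, `resolver_t`, `cpu_has_fast_path`, `apply_irelative_sketch`, and the two `add_one_*` functions) is hypothetical.

```cpp
// Sketch only: typed function pointers stand in for raw ELF addresses.
using impl_t = int (*)(int);      // the function the program wants to call
using resolver_t = impl_t (*)();  // ifunc resolver: returns the chosen implementation

bool cpu_has_fast_path = false;   // hypothetical CPU feature flag

int add_one_generic(int x) { return x + 1; }
int add_one_fast(int x)    { return x + 1; }  // imagine a faster instruction sequence

// The resolver picks an implementation based on CPU features.
impl_t resolve_add_one() {
    return cpu_has_fast_path ? add_one_fast : add_one_generic;
}

// Handling of one IRELATIVE-style relocation: call the resolver once and
// store its result in the slot that later calls go through.
void apply_irelative_sketch(impl_t* slot, resolver_t resolver) {
    *slot = resolver();
}
```

After the slot is filled in, every call through it goes directly to the selected implementation with no further feature checks.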

e8c62c5e

tests: fix tst-timer, enable test1() and avoid using hardcoded values · c0eebe80
Guy Zana authored 11 years ago

c0eebe80
tst-ctxsw: refine to have warm-up time and fixed execution time · e7dde95d
Avi Kivity authored 11 years ago

e7dde95d

sched: fix preempt_enable() when interrupts are disabled · 84046f23

Avi Kivity authored 11 years ago

If interrupts are disabled, we must not call schedule() even if
the preemption counter says we need to, as the context is not preemption
safe.

This manifested itself in a wake() within a timer causing a schedule(),
which re-enabled interrupts, which caused further manipulation of the timer
list to occur concurrently with the next interrupt, resulting in corruption.
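
The rule can be sketched as below. This is a toy model with hypothetical names (`sched_stub` and its members), not the OSv code: interrupts are a plain bool and schedule() only counts calls.

```cpp
// Toy model of the preemption counter; all names are hypothetical.
struct sched_stub {
    int preempt_counter = 0;
    bool need_reschedule = false;
    bool irq_enabled = true;
    int schedule_calls = 0;

    void schedule() {
        ++schedule_calls;
        need_reschedule = false;
    }

    void preempt_disable() { ++preempt_counter; }

    void preempt_enable() {
        --preempt_counter;
        // Reschedule only if preemption is fully re-enabled, a reschedule
        // is pending, AND interrupts are enabled. With interrupts disabled
        // the context is not preemption safe, so the request stays pending.
        if (preempt_counter == 0 && need_reschedule && irq_enabled) {
            schedule();
        }
    }
};
```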

Fixes timer stress test failure.

84046f23

tests: extend timer test, make it more stressful · 858f7666

Guy Zana authored 11 years ago

Noticed an assert in the download file test that was related to timers;
this test reproduces the same bug.

858f7666

signal handling: fix FPU clobbering bug · 94a7015e

Nadav Har'El authored 11 years ago

This patch adds missing FPU-state saving when calling signal handlers.
The state is saved on the stack, to allow nesting of signal handling
(delivery of a second signal while a first signal's handler is running).

In Linux calling conventions, the FPU state is caller-saved, i.e., a
called function can use FPU at will because the caller is assumed to have
saved it if needed. However, signal handlers are called asynchronously,
possibly in the middle of some FPU computation without that computation
getting a chance to save its state. So we must save this state before calling
the signal handling function.

Without this fix, we had problems even if the signal handlers themselves
did not use the FPU. A typical scenario - which we encountered in the
"sunflow" benchmark - is that the signal handler does something which uses
a mutex (e.g., malloc()) and causes a reschedule. The reschedule, not a
preempt(), thinks it does not need to save the FPU state, and the thread
we switch to clobbers this state.
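
A user-space analogy of the save/restore pattern, not the OSv code. Standard C++ cannot save the full FPU register state, so this sketch saves only the floating-point environment (rounding mode and exception flags) via `<cfenv>`. The names `fp_env_guard`, `call_handler_sketch` and `noisy_handler` are hypothetical. The saved state lives in a local variable, i.e. on the stack, which is what lets nested deliveries each keep their own copy.

```cpp
#include <cfenv>

// RAII guard: saves the floating-point environment in a stack variable and
// restores it when the scope ends. Nested guards each hold their own copy.
class fp_env_guard {
public:
    fp_env_guard() { std::fegetenv(&saved_); }
    ~fp_env_guard() { std::fesetenv(&saved_); }
    fp_env_guard(const fp_env_guard&) = delete;
    fp_env_guard& operator=(const fp_env_guard&) = delete;

private:
    std::fenv_t saved_;
};

using handler_t = void (*)(int);

// Hypothetical delivery path: save state, run the handler, restore state.
void call_handler_sketch(handler_t handler, int signo) {
    fp_env_guard guard;
    handler(signo);
}

// A handler that disturbs floating-point state by changing the rounding mode.
void noisy_handler(int) { std::fesetround(FE_DOWNWARD); }
```

With the guard in place, the interrupted computation sees the same rounding mode after the handler returns as it did before.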

94a7015e

tests: make the TCPDownloadFile test a bit more stressful · 1e66b4eb
Guy Zana authored 11 years ago
```
Now it downloads a ~200MB file and validates its md5 checksum.
```
1e66b4eb
loader.py: revisit osv info callouts given the new implementation · cc6ff2f9
Guy Zana authored 11 years ago

cc6ff2f9
tests: fix tst-bsd-callout test, uncomment test #1 · bbf0aa7b
Guy Zana authored 11 years ago

bbf0aa7b

bsd: rewrite callout mechanism to avoid a race · c5dbdcc8

Guy Zana authored 11 years ago

The old implementation used threads for dispatching callouts: each callout
owned a thread, and it suffered from a race where a callout structure could
be deleted before the callout thread had even begun.

The current implementation dispatches all callouts in a single callout
dispatcher thread, and maintains an ordered list of callouts to achieve that.

This patch solves a crash in the TCPDownloadFile test, which now proceeds.
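
A minimal sketch of the single-dispatcher design, assuming a `std::multimap` keyed by expiry time as the ordered list. The class name `callout_dispatcher_sketch` and its methods are hypothetical, time is a plain integer, and the locking a real dispatcher thread would need is left out.

```cpp
#include <functional>
#include <map>
#include <utility>

// Hypothetical single dispatcher: all callouts live in one container
// ordered by expiry time, instead of each callout owning a thread.
class callout_dispatcher_sketch {
public:
    using queue_t = std::multimap<long, std::function<void()>>;
    using handle = queue_t::iterator;

    // Schedule fn to run at the given expiry time.
    handle schedule(long expiry, std::function<void()> fn) {
        return pending_.emplace(expiry, std::move(fn));
    }

    // Cancel a callout that has not run yet. Nothing is running on its
    // behalf, so removing the entry is all that is needed.
    void cancel(handle h) { pending_.erase(h); }

    // Called from the dispatcher: run everything due at time `now`, in
    // expiry order, and return how many callouts ran.
    int run_expired(long now) {
        int ran = 0;
        while (!pending_.empty() && pending_.begin()->first <= now) {
            auto fn = std::move(pending_.begin()->second);
            pending_.erase(pending_.begin());
            fn();  // run after removal, so a callout may reschedule itself
            ++ran;
        }
        return ran;
    }

private:
    queue_t pending_;
};
```

Since a cancelled callout is simply erased from the queue, there is no per-callout thread that could start after its structure has gone away.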

c5dbdcc8

loader.py: make osv info threads more readable · 4a63055e
Guy Zana authored 11 years ago

4a63055e

uma: fix order to finit/dtor in uma_zfree() · 357d68d7

Guy Zana authored 11 years ago

The mbuf ext buffer is freed in the dtor, so the dtor should be called before finit.
This fixes a crash that surfaced when using conf-memory-debug=1.

357d68d7

bsd: zero a few uninitialized structures · 62712056

Guy Zana authored 11 years ago

This hasn't caused a real bug; I just noticed it while tracing.
It may be dangerous if, in some flow, the stack is not zeroed.

62712056

bsd: implement panic() · 91db62cf
Guy Zana authored 11 years ago

91db62cf
libc: fix strerror_r, should not appear as __xpg_strerror_r() · c481b94d
Guy Zana authored 11 years ago
```
strerror_r is needed by the JVM in order to print errors correctly.
```
c481b94d