  1. Dec 11, 2013
    • Pekka Enberg's avatar
      mmu: Use addr_range for vma constructors · bbec1a18
      Pekka Enberg authored
      
      Make vma constructors more strongly typed by using the addr_range type.
      
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      bbec1a18
    • Pekka Enberg's avatar
      core: vma abstract base class · d83db0c9
      Pekka Enberg authored
      
      Separate the common vma code into an abstract base class that's inherited
      by anon_vma and file_vma.
      
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      d83db0c9
    • Glauber Costa's avatar
      mmu: fix allocate_intermediate_level · 3e6763f7
      Glauber Costa authored
      
      We have recently seen a problem where an occasional page fault outside
      the application would occur.
      
      I managed to track that down to my huge page failure patch, but wasn't
      really sure what was going on. Kudos to Raphael, then, who figured
      out that the problem happened when allocate_intermediate_level was called
      from split_huge_page.
      
      The problem here is that in that case we do *not* enter
      allocate_intermediate_level with the pte emptied, while we were previously
      expecting the write of the new pte to happen unconditionally. The
      compare_exchange broke that, because the exchange never actually happens.
      
      There are many ways to fix this issue, but the least confusing of them,
      given that there are other callers to this function that could
      potentially exhibit this problem, is to do some defensive programming
      and clearly separate the semantics of the two types of callers.
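
      A minimal sketch of the separation described above, using std::atomic as a
      stand-in for OSv's page-table entry type (all names here are illustrative,
      not the actual OSv code):

          #include <atomic>
          #include <cstdint>

          using pt_element = std::uint64_t;   // simplified stand-in

          // Case 1: the caller guarantees the entry is currently empty, so a
          // compare-exchange from zero is appropriate (it may fail harmlessly
          // if another CPU raced us and already installed the level).
          inline void allocate_intermediate_level_empty(
              std::atomic<pt_element>& ptep, pt_element new_level)
          {
              pt_element expected = 0;
              ptep.compare_exchange_strong(expected, new_level);
          }

          // Case 2: the caller (e.g. split_huge_page) arrives with a *non-empty*
          // entry and expects the new level to be written unconditionally; a
          // compare-exchange from "empty" would silently do nothing here.
          inline void allocate_intermediate_level_replace(
              std::atomic<pt_element>& ptep, pt_element new_level)
          {
              ptep.store(new_level);
          }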
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Tested-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      3e6763f7
    • Nadav Har'El's avatar
      Verify slow page fault only happens when preemption is allowed · b7620ca2
      Nadav Har'El authored
      
      Once page_fault() checks that this is not a fast fixup (see safe_load()),
      we reach the page-fault slow path, which needs to allocate memory or
      even read from disk, and might sleep.
      
      If we ever get such a slow page-fault inside kernel code which has
      preemption or interrupts disabled, this is a serious bug, because the
      code in question thinks it cannot sleep. So this patch adds two
      assertions to verify this.
      
      The preemptable() assertion is easily triggered if stacks are demand-paged
      as explained in commit 41efdc1c (I have
      a patch to solve this, but it won't fit in the margin).
      However, I've also seen this assertion fire without demand-paged stacks, when
      running all tests together through testrunner.so. So I'm hoping these
      assertions will be helpful in hunting down some elusive bugs we still have.
      
      This patch adds a third use of the "0x200" constant (the ninth bit of
      the rflags register is the interrupt flag), so it replaces these uses with a
      new symbolic name, processor::rflags_if.
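
      A minimal sketch of the two checks described above (sched::preemptable() is
      assumed here as the scheduler's preemption query; the exact OSv identifiers
      and call site may differ):

          #include <cassert>

          namespace processor {
              // Bit 9 of the rflags register is the interrupt-enable flag (IF).
              constexpr unsigned long rflags_if = 0x200;
          }

          namespace sched { bool preemptable(); }   // provided by the scheduler

          // Called on the slow page-fault path, after the safe_load() fast
          // fixup has been ruled out and before we may sleep.
          inline void assert_fault_may_sleep(unsigned long fault_rflags)
          {
              assert(fault_rflags & processor::rflags_if);  // interrupts enabled
              assert(sched::preemptable());                 // preemption allowed
          }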
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      b7620ca2
    • Glauber Costa's avatar
      vma_fault: propagate exception frame to fault handlers · 7ab5f9e8
      Glauber Costa authored
      
      We currently stop propagating the exception frame partway down the vma_fault
      path. There is no reason not to propagate it further, other than the fact
      that there are currently no users. Besides making the frame passing more
      consistent, I intend to use it for the JVM balloon.
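
      A sketch of what such propagation looks like (simplified; the actual OSv
      vma class and its fault handler signature may differ in detail):

          #include <cstdint>

          struct exception_frame;   // CPU state saved when the fault occurred

          class vma {
          public:
              virtual ~vma() = default;
              // The exception frame is passed all the way down, so handlers
              // that need the saved CPU state (e.g. a future JVM balloon
              // handler) can inspect it.
              virtual void fault(std::uintptr_t addr, exception_frame* ef) = 0;
          };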
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      7ab5f9e8
  2. Dec 10, 2013
  3. Dec 09, 2013
    • Glauber Costa's avatar
      mmu: don't bail out on huge page failure · eeeaf888
      Glauber Costa authored
      
      Addressing that FIXME, as part of my memory reclamation series. But this
      is ready to go already. The goal is to retry serving the allocation if a
      huge page allocation fails, filling the range with 4k pages instead.
      
      The simplest and most robust way I've found to do that was to propagate the
      error up until we reach operate(). Once there, all we need to do is to
      re-walk the range with 4k pages instead of 2MB ones.
      
      We could theoretically just bail out on huge pages and move hp_end, but,
      especially when we have reclaim, it is likely that one operation will fail
      while the upcoming ones succeed.
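
      A minimal sketch of the retry described above (illustrative only; the real
      OSv page-table walker and its operate() machinery are structured differently):

          #include <cstddef>

          enum class page_size { small_4k, huge_2m };

          // Returns false if a huge-page allocation fails partway through.
          bool map_range(void* start, std::size_t len, page_size ps);

          void operate(void* start, std::size_t len)
          {
              // Try to cover the range with 2MB huge pages first; if any
              // huge-page allocation fails, re-walk the whole range with 4KB
              // pages instead of giving up on the mapping.
              if (!map_range(start, len, page_size::huge_2m)) {
                  map_range(start, len, page_size::small_4k);
              }
          }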
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      [ penberg: s/NULL/nullptr/ ]
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      eeeaf888
  4. Dec 08, 2013
    • Glauber Costa's avatar
      sched: implement pthread_detach · afcf4735
      Glauber Costa authored
      
      I needed to call detach in some test code of mine, and it isn't implemented.
      The code I wrote to use it may or may not stay in the end, but nevertheless,
      let's implement it.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      afcf4735
    • Glauber Costa's avatar
      sched: standardize call to _cleanup · d754d662
      Glauber Costa authored
      
      set_cleanup is quite a complicated piece of code. It is very easy to get it to
      race with other thread destruction sites, which was made abundantly clear when
      we tried to implement pthread detach.
      
      This patch tries to make it easier by restricting how and when set_cleanup can
      be called. The trick here is that currently, a thread may or may not have a
      cleanup function, and through a call to set_cleanup, our decision to clean up
      may change.
      
      From this point on, set_cleanup will only tell us *how* to clean up. If and
      when is a decision that we will make ourselves. For instance, if a thread
      is block-local, the destructor will be called at the end of the block. In
      that case, the _cleanup function will be there anyhow; we'll just not call
      it.
      
      We set a default cleanup function here for all created threads, which
      just deletes the current thread object. Anything coming from pthread will
      override it to also delete the pthread object. And again, it is important
      to note that these cleanup functions are set up unconditionally.
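
      A minimal sketch of the scheme described above (simplified, hypothetical
      names; the actual OSv thread and pthread classes differ):

          #include <functional>
          #include <utility>

          class thread {
          public:
              // Default cleanup for every thread: delete the thread object.
              thread() : _cleanup([this] { delete this; }) {}
              virtual ~thread() = default;

              // Only records *how* to clean up; whether and when the cleanup
              // actually runs is decided by the thread lifetime rules, not by
              // the caller of set_cleanup().
              void set_cleanup(std::function<void()> fn) { _cleanup = std::move(fn); }

          private:
              std::function<void()> _cleanup;
          };

          // A pthread wrapper would override the default unconditionally, e.g.
          //   t->set_cleanup([t, pt] { delete pt; delete t; });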
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      d754d662
    • Glauber Costa's avatar
      sched: Use an integer for thread ids · 5c652796
      Glauber Costa authored
      
      Linux uses a 32-bit integer for pid_t, so let's do it as well. This is because
      there are functions in which we have to return our id back to the application.
      One example is gettid, which we already have in the tree.
      
      Theoretically, we could come up with a mapping between our 64-bit id and the
      Linux one, but since we have to maintain the mapping anyway, we might as well
      just use the Linux pids as our default IDs. The max size for that is 32 bits.
      That is not enough if we're just allocating pids by bumping a counter, but
      again, since we will have to maintain the bitmaps anyway, 32 bits allow us as
      many as 4 billion PIDs.
      
      avi: remove unneeded #include
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      5c652796
    • Glauber Costa's avatar
      sched: initialize clock later · 1d31d9c3
      Glauber Costa authored
      
      Right now we are taking a clock measure very early for cpu initialization.
      That forces an unnecessary dependency between sched and clock initializations.
      
      Since that clock reading is used to determine for how long the cpu has been
      running, we can initialize the runtime later, when we init the idle thread.
      Nothing should be running before it. After doing this, we can move the sched
      initialization a bit earlier.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      1d31d9c3
  5. Dec 06, 2013
  6. Dec 05, 2013
  7. Dec 04, 2013
    • Nadav Har'El's avatar
      Add a few missing __*_chk functions · 2f4b8777
      Nadav Har'El authored
      
      When source code is compiled with -D_FORTIFY_SOURCE on Linux, various
      functions are sometimes replaced by __*_chk variants (e.g., __strcpy_chk)
      which can help avoid buffer overflows when the compiler knows the buffer's
      size during compilation.
      
      If we want to run code compiled on Linux with -D_FORTIFY_SOURCE (either
      deliberately or unintentionally - see issue #111), we need to implement
      these functions; otherwise the program will crash because of a missing
      symbol. We already implement a bunch of _chk functions, but we are
      definitely missing some more.
      
      This patch implements 6 more _chk functions which are needed to run
      the "rogue" program (mentioned in issue #111) when compiled with
      -D_FORTIFY_SOURCE=1.
      
      Following the philosophy of our existing *_chk functions, we do not
      aim for either ultimate performance or iron-clad security for our
      implementation of these functions. If this becomes important, we
      should revisit all our *_chk functions.
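
      A minimal sketch of what one such wrapper can look like, in the spirit
      described above, using __strcpy_chk as the example (illustrative; not
      necessarily one of the six functions added by this patch, nor the exact
      OSv implementation):

          #include <cstddef>
          #include <cstdlib>
          #include <cstring>

          extern "C" char* __strcpy_chk(char* dest, const char* src,
                                        size_t dest_len)
          {
              size_t n = std::strlen(src) + 1;   // bytes about to be copied
              if (n > dest_len) {
                  std::abort();                  // would overflow the known buffer
              }
              return static_cast<char*>(std::memcpy(dest, src, n));
          }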
      
      When compiled with -D_FORTIFY_SOURCE=2, rogue still doesn't work, but this
      time not because of a missing symbol: it fails to read the terminfo file
      for an as-yet-unknown reason (a patch for that issue will be sent
      separately).
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      2f4b8777
    • Avi Kivity's avatar
      008d5245
  8. Dec 03, 2013
  9. Dec 01, 2013
    • Nadav Har'El's avatar
      Fix crash on malformed command line · 082ff373
      Nadav Har'El authored
      
      Before this patch, OSv crashes or continuously reboots when given unknown
      command line parameters, e.g.,
      
              scripts/run.py -c1 -e "--help --z a"
      
      With this patch, it says, as expected, that the "--z" option is not
      recognized, and displays the list of known options:
      
          unrecognised option '--z'
          OSv options:
            --help                show help text
            --trace arg           tracepoints to enable
            --trace-backtrace     log backtraces in the tracepoint log
            --leak                start leak detector after boot
            --nomount             don't mount the file system
            --noshutdown          continue running after main() returns
            --env arg             set Unix-like environment variable (putenv())
            --cwd arg             set current working directory
          Aborted
      
      The problem was that to parse the command line options, we used Boost,
      which throws an exception when an unrecognized option is seen. We need
      to catch this exception, and show a message accordingly.
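
      A minimal sketch of the parse-and-catch pattern this refers to, using
      boost::program_options directly (illustrative; OSv's actual option table
      and error handling live elsewhere):

          #include <cstdlib>
          #include <iostream>
          #include <boost/program_options.hpp>

          namespace po = boost::program_options;

          po::variables_map parse_cmdline(int ac, char** av,
                                          const po::options_description& desc)
          {
              po::variables_map vars;
              try {
                  po::store(po::parse_command_line(ac, av, desc), vars);
                  po::notify(vars);
              } catch (const po::error& e) {
                  // e.g. "unrecognised option '--z'", followed by the option list
                  std::cout << e.what() << "\n" << desc << "\n";
                  std::abort();
              }
              return vars;
          }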
      
      But before this patch, C++ exceptions did not work correctly during this
      stage of the boot process, because exceptions use elf::program(), and we
      only set it up later. So this patch moves the setup of the elf::program()
      object earlier in the boot, to the beginning of main_cont().
      
      Now we'll be able to use C++ exceptions throughout main_cont(), not just
      in command line parsing.
      
      This patch also removes the unused "filesystem" parameter of
      elf::program(), rather than moving the initialization of this empty object
      as well.
      
      Fixes #103.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      082ff373
  10. Nov 28, 2013
  11. Nov 27, 2013
  12. Nov 26, 2013
    • Nadav Har'El's avatar
      sched: New scheduler algorithm · dbc0d507
      Nadav Har'El authored

      This patch replaces the algorithm which the scheduler uses to keep track of
      threads' runtime, and to choose which thread to run next and for how long.
      
      The previous algorithm used the raw cumulative runtime of a thread as its
      runtime measure. But comparing these numbers directly was impossible: e.g.,
      should a thread that slept for an hour now get an hour of uninterrupted CPU
      time? This resulted in a hodgepodge of heuristics which "modified" and
      "fixed" the runtime. These heuristics did work quite well in our test cases,
      but we were forced to add more and more unjustified heuristics and constants
      to fix scheduling bugs as they were discovered. The existing scheduler was
      especially problematic with thread migration (moving a thread from one CPU
      to another) as the runtime measure on one CPU was meaningless in another.
      This bug, if not corrected (e.g., by the patch which I sent a month
      ago), can cause crucial threads to acquire exceedingly high runtimes by
      mistake, and resulted in the tst-loadbalance test using only one CPU on
      a two-CPU guest.
      
      The new scheduling algorithm follows a much more rigorous design,
      proposed by Avi Kivity in:
      https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing
      
      
      
      To make a long story short (read the document if you want all the
      details), the new algorithm is based on a runtime measure R which
      is the running decaying average of the thread's running time.
      It is a decaying average in the sense that the thread's act of running or
      sleeping in recent history is given more weight than its behavior
      a long time ago. This measure R can tell us which of the runnable
      threads to run next (the one with the lowest R), and using some
      high-school-level mathematics, we can calculate for how long to run
      this thread until it should be preempted by the next one. R carries
      the same meaning on all CPUs, so CPU migration becomes trivial.
      
      The actual implementation uses a normalized version of R, called R''
      (Rtt in the code), which is also explained in detail in the document.
      This Rtt allows updating just the running thread's runtime - not all
      threads' runtime - as time passes, making the whole calculation much
      more tractable.
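
      A minimal sketch of the decaying-average idea behind R (illustrative only;
      the real scheduler works with the normalized R'' described above, and its
      types and names differ):

          #include <cmath>

          struct runtime_estimator {
              double tau;       // decay time constant (the patch mentions 200ms)
              double R = 0.0;   // decaying average of time spent running

              // Account for an interval of length dt during which the thread
              // either ran or slept: older history decays by exp(-dt/tau), and
              // time actually spent running adds to R.
              void update(double dt, bool running) {
                  double decay = std::exp(-dt / tau);
                  R = R * decay + (running ? tau * (1.0 - decay) : 0.0);
              }
          };

          // The scheduler picks the runnable thread with the lowest R next.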
      
      The benefits of the new scheduler code over the existing one are:
      
      1. A more rigorous design with fewer unjustified heuristics.
      
      2. A thread's runtime measurement correctly survives a migration to a
      different CPU, unlike the existing code (which sometimes botches
      it up, leading to threads hanging). In particular, tst-loadbalance
      now gives good results for the "intermittent thread" test, unlike
      the previous code which in 50% of the runs caused one CPU to be
      completely wasted (when the load-balancing thread hung).
      
      3. The new algorithm can look at a much longer runtime history than the
      previous algorithm did. With the default tau=200ms, the one-cpu
      intermittent thread test of tst-scheduler now provides good
      fairness for sleep durations of 1ms-32ms.
      The previous algorithm was never fair in any of those tests.
      
      4. The new algorithm is more deterministic in its use of timers
      (with thyst=2_ms: up to 500 timers a second), resulting in less
      varied performance in high-context-switch benchmarks like tst-ctxsw.
      
      This scheduler does very well on the fairness tests tst-scheduler and
      fairly well on tst-loadbalance. Even better performance on that second
      test will require an additional patch for the idle thread to wake other
      cpus' load-balancing threads.
      
      As expected the new scheduler is somewhat slower than the existing one
      (as we now do some relatively complex calculations instead of trivial
      integer operations), but thanks to using approximations when possible
      and to various other optimizations, the difference is relatively small:
      
      On my laptop, tst-ctxsw.so, which measures "context switch" time (actually,
      also including the time to use mutex and condvar which this test uses to
      cause context switching), on the "colocated" test I measured 355 ns with
      the old scheduler, and 382 ns with the new scheduler - meaning that the
      new scheduler adds 27ns of overhead to every context switch. To see that
      this penalty is minor, consider that tst-ctxsw is an extreme example,
      doing 3 million context switches a second, and even there it only slows
      down the workload by 7%.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      dbc0d507
    • Nadav Har'El's avatar
      sched: No need for "yield" parameter of schedule() · e1722351
      Nadav Har'El authored
      
      The schedule() and cpu::schedule() functions had a "yield" parameter.
      This parameter was inconsistently used (it's not clear why specific
      places called it with "true" and others with "false"), but moreover, was
      always ignored!
      
      So this patch removes the parameter of schedule(). If you really want
      a yield, call yield(), not schedule().
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      e1722351
    • Nadav Har'El's avatar
      sched: Use schedule(), not yield() in idle thread · da583f27
      Nadav Har'El authored
      
      The idle thread cpu::idle() waits for other threads to become runnable,
      and then lets them run. It used to yield the CPU by calling yield(),
      because in early OSv history we didn't have an idle priority so simply
      calling schedule() would not guarantee that the new thread, not the idle
      thread, will run.
      
      But now we actually do have an idle priority; if the run queue is not
      empty, we are sure that calling schedule() will run another thread,
      not the idle thread. So this patch calls schedule(), which is simpler,
      faster, and more reliable than yield().
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      da583f27
    • Nadav Har'El's avatar
      sched: Don't change runtime of a queued thread · e60ebaf3
      Nadav Har'El authored
      
      The scheduler (reschedule_from_interrupt()) changes the runtime of the
      current thread. This assumes that the current thread is not in the
      runqueue - because the runqueue is sorted by runtime, and modifying the
      runtime of a thread which is already in the runqueue ruins the sorted
      tree's invariants.
      
      Unfortunately, the existing code broke this assumption in two places:
      
      1.  When handle_incoming_wakeups() wakes up the current thread (i.e., a
      thread that prepared to wait but was woken before it could go to sleep),
      the current thread was queued. Instead, we need to simply return
      the thread to the "running" state.
      
      2.  yield() queued the current thread. Rather, it needs to just change
      its runtime, and reschedule_from_interrupt() will decide to queue this
      thread.
      
      This patch fixes the first problem. The second problem will be solved
      by a yield() rewrite which is part of the new scheduler in a later
      patch.
      
      By the way, after we fix both problems, we can also be sure that the
      strange if(n != thread::current()) in the scheduler is always true.
      This is because n, picked up from the run queue, could never be the
      current thread, because the current thread isn't in the run queue.
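
      A minimal sketch of the fix to problem 1 above (illustrative; the actual
      handle_incoming_wakeups() and OSv's thread/runqueue types differ):

          enum class thread_status { waiting, running, queued };

          struct thread {
              thread_status status = thread_status::waiting;
          };

          template <typename Runqueue>
          void wake(thread& t, thread& current, Runqueue& runqueue)
          {
              if (&t == &current) {
                  // The running thread is never kept in the runqueue (its
                  // runtime is about to change, which would break the
                  // runqueue's ordering), so just mark it running again.
                  t.status = thread_status::running;
              } else {
                  t.status = thread_status::queued;
                  runqueue.insert(t);   // runqueue is sorted by runtime
              }
          }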
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      e60ebaf3
  13. Nov 25, 2013
    • Pekka Enberg's avatar
    • Pekka Enberg's avatar
      mmu: Anonymous memory demand paging · c1d5fccb
      Pekka Enberg authored
      
      Switch to demand paging for anonymous virtual memory.
      
      I used SPECjvm2008 to verify the performance impact. The numbers are mostly
      the same, with a few exceptions, most visible in the 'serial' benchmark.
      However, there's quite a lot of variance between SPECjvm2008 runs, so I
      wouldn't read too much into them.

      As we need the demand paging mechanism and the performance numbers
      suggest that the implementation is reasonable, I'd merge the patch as-is
      and optimize it later.
      
        Before:
      
          Running specJVM2008 benchmarks on an OSV guest.
          Score on compiler.compiler: 331.23 ops/m
          Score on compiler.sunflow: 131.87 ops/m
          Score on compress: 118.33 ops/m
          Score on crypto.aes: 41.34 ops/m
          Score on crypto.rsa: 204.12 ops/m
          Score on crypto.signverify: 196.49 ops/m
          Score on derby: 170.12 ops/m
          Score on mpegaudio: 70.37 ops/m
          Score on scimark.fft.large: 36.68 ops/m
          Score on scimark.lu.large: 13.43 ops/m
          Score on scimark.sor.large: 22.29 ops/m
          Score on scimark.sparse.large: 29.35 ops/m
          Score on scimark.fft.small: 195.19 ops/m
          Score on scimark.lu.small: 233.95 ops/m
          Score on scimark.sor.small: 90.86 ops/m
          Score on scimark.sparse.small: 64.11 ops/m
          Score on scimark.monte_carlo: 145.44 ops/m
          Score on serial: 94.95 ops/m
          Score on sunflow: 73.24 ops/m
          Score on xml.transform: 207.82 ops/m
          Score on xml.validation: 343.59 ops/m
      
        After:
      
          Score on compiler.compiler: 346.78 ops/m
          Score on compiler.sunflow: 132.58 ops/m
          Score on compress: 116.05 ops/m
          Score on crypto.aes: 40.26 ops/m
          Score on crypto.rsa: 206.67 ops/m
          Score on crypto.signverify: 194.47 ops/m
          Score on derby: 175.22 ops/m
          Score on mpegaudio: 76.18 ops/m
          Score on scimark.fft.large: 34.34 ops/m
          Score on scimark.lu.large: 15.00 ops/m
          Score on scimark.sor.large: 24.80 ops/m
          Score on scimark.sparse.large: 33.10 ops/m
          Score on scimark.fft.small: 168.67 ops/m
          Score on scimark.lu.small: 236.14 ops/m
          Score on scimark.sor.small: 110.77 ops/m
          Score on scimark.sparse.small: 121.29 ops/m
          Score on scimark.monte_carlo: 146.03 ops/m
          Score on serial: 87.03 ops/m
          Score on sunflow: 77.33 ops/m
          Score on xml.transform: 205.73 ops/m
          Score on xml.validation: 351.97 ops/m
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      c1d5fccb
    • Pekka Enberg's avatar
      mmu: Optimistic locking in populate() · 7e568ba0
      Pekka Enberg authored
      
      Use optimistic locking in populate() to make it robust against
      concurrent page faults.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      7e568ba0
    • Pekka Enberg's avatar
      mmu: VMA permission flags · 8a56dc8c
      Pekka Enberg authored
      
      Add permission flags to VMAs. They will be used by mprotect() and the
      page fault handler.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8a56dc8c
    • Avi Kivity's avatar
      sched: fix iteration across timer list · 9c3308f1
      Avi Kivity authored
      
      We iterate over the timer list using an iterator, but the timer list can
      change during iteration due to timers being re-inserted.
      
      Switch to just looking at the head of the list instead, maintaining no
      state across loop iterations.
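
      A minimal sketch of the head-of-list pattern described above (illustrative;
      OSv's timer list is an intrusive set ordered by expiration time):

          #include <list>

          struct timer {
              unsigned long when = 0;              // absolute expiration time
              void (*callback)(timer&) = nullptr;
          };

          void fire_expired(std::list<timer*>& timers, unsigned long now)
          {
              // No iterator is held across a callback: each pass re-examines
              // the head of the list, so timers re-inserted by callbacks cannot
              // invalidate any saved iteration state.
              while (!timers.empty() && timers.front()->when <= now) {
                  timer* t = timers.front();
                  timers.pop_front();
                  t->callback(*t);                 // may re-arm and re-insert t
              }
          }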
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      9c3308f1
    • Avi Kivity's avatar
      sched: prevent a re-armed timer from being ignored · 870d8410
      Avi Kivity authored
      
      When a hardware timer fires, we walk over the timer list, expiring timers
      and erasing them from the list.
      
      This is all well and good, except that a timer may rearm itself in its
      callback (this only holds for timer_base clients, not sched::timer, which
      consumes its own callback).  If it does, we end up erasing it even though
      it wants to be triggered.
      
      Fix by checking for the armed state before erasing.
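
      A minimal sketch of the armed-state check described above, as it would sit
      in the expiration step (illustrative; OSv's timer_base differs in detail):

          struct timer {
              bool armed = false;
              void (*callback)(timer&) = nullptr;
          };

          template <typename TimerList>
          void expire_one(TimerList& timers, timer& t)
          {
              t.armed = false;
              t.callback(t);        // a timer_base client may re-arm t here,
                                    // setting armed = true with a new deadline
              if (!t.armed) {
                  timers.erase(t);  // erase only if the callback did not re-arm
              }
          }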
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      870d8410
    • Nadav Har'El's avatar
      Fix possible deadlock in condvar · 15a32ac8
      Nadav Har'El authored
      
      When a condvar's timeout and wakeup race, we wait for the concurrent
      wakeup to complete, so it won't crash. We did this wr.wait() with
      the condvar's internal mutex (m) locked, which was fine when this code
      was written; but now that we have wait morphing, wr.wait() waits not
      just for the wakeup to complete, but also for the user_mutex to become
      available. With m locked and us waiting for user_mutex, we're now in
      deadlock territory - because a common idiom of using a condvar is to
      do the locks in opposite order: lock user_mutex first and then use the
      condvar, which locks m.
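
      A minimal sketch of one way to break such a cycle: drop the condvar's
      internal mutex before waiting for the concurrent wakeup (illustrative,
      using std::mutex; the patch's actual fix and OSv's wait_record type may
      differ in detail):

          #include <mutex>

          struct wait_record {
              void wait() { /* blocks until the concurrent waker is done */ }
          };

          void timeout_path(std::mutex& m, wait_record& wr)
          {
              std::unique_lock<std::mutex> lock(m);   // condvar's internal mutex
              // ... detect that a concurrent wakeup is in flight ...
              lock.unlock();   // release m first: with wait morphing, wr.wait()
                               // may also wait for user_mutex, which another
                               // thread could hold while trying to lock m.
              wr.wait();
          }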
      
      I can't think of an easy way to actually demonstrate this deadlock,
      short of having a locked condvar_wait timeout race with condvar_wake_one
      while an additional locked condvar operation comes in concurrently,
      so I don't have a test case demonstrating this.
      I am hoping it will fix the lockups that Pekka is seeing in his
      Cassandra tests (which are the reason I looked for possible condvar
      deadlocks in the first place).
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      15a32ac8