Commits · 4513537e3b1bad7bc232385868d56fa89b6421f6 · Verlässliche Systemsoftware / projects / osv

Jan 27, 2014

clock: condvar::wait with a time point · 0f5992ca

Nadav Har'El authored 11 years ago


Replace the old function condvar::wait(mutex*, uint64_t) with one taking
a timepoint. This timepoint can use any clock which the timer supports,
namely osv::clock::uptime or osv::clock::wall (as usual, wall-clock timers
are not recommended, and are converted to an uptime timer at the point
of instantiation).

Leave a C-only function condvar_wait(condvar*, mutex*, s64) but comment on
what it takes.
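
For illustration only, here is a standard-library analogue of waiting on a condition variable until a time point. It is a sketch that uses std::condition_variable and std::chrono::steady_clock as stand-ins. The helper name wait_ready_until() is hypothetical and none of this is the OSv condvar API.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not OSv code): block until `ready` becomes true or
// the absolute `deadline`, taken from a monotonic clock, has passed.
// Returns the final value of `ready`.
inline bool wait_ready_until(std::condition_variable& cv,
                             std::mutex& mtx,
                             const bool& ready,
                             std::chrono::steady_clock::time_point deadline)
{
    std::unique_lock<std::mutex> lock(mtx);
    // The predicate overload re-checks `ready` after spurious wakeups.
    return cv.wait_until(lock, deadline, [&ready] { return ready; });
}
```

A caller would typically pass std::chrono::steady_clock::now() plus a duration as the deadline.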

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0f5992ca

clock: Drop nanotime() function · 770e21f1

Nadav Har'El authored 11 years ago


Drop the nanotime() function.

Change the few remaining callers to use the appropriate osv::clock or
std::chrono replacements.

Previous patches already got rid of most references to nanotime()
by switching from absolute times to relative times.

The direct equivalent of the old nanotime() function, where we actually
need the number of nanoseconds since the UNIX epoch, is the rather
verbose expression osv::clock::wall::now().time_since_epoch().count(),
or the shorter clock::get()->time().
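
As a rough sketch of the same idea outside OSv, the following uses std::chrono::system_clock as a stand-in for osv::clock::wall (an assumption) and std::chrono::steady_clock for relative measurements. The function names are hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Nanoseconds since the UNIX epoch. std::chrono::system_clock stands in
// for osv::clock::wall here; its epoch is the UNIX epoch on common
// implementations (and is guaranteed to be so since C++20).
inline std::int64_t wall_nanoseconds()
{
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
        system_clock::now().time_since_epoch()).count();
}

// Relative measurement: elapsed time since `start` on a monotonic clock,
// which is what most former nanotime() callers actually wanted.
inline std::chrono::nanoseconds
elapsed_since(std::chrono::steady_clock::time_point start)
{
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now() - start);
}
```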

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

770e21f1

clock: Remove old type-less time literals · 74a57376

Nadav Har'El authored 11 years ago


Drop the s64 literals _ms, _ns, etc., from <drivers/clock.hh>.
Fix a few places which still use the old literals.

The std::chrono::duration versions from <osv/clock.hh> remain,
but remember that you need "using namespace osv::clock::literals"
to use them.
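
The following sketch shows how typed duration literals in a dedicated namespace behave in general. The namespace demo_clock::literals and the function example_timeout() are hypothetical and only mimic the pattern described above; they are not the contents of <osv/clock.hh>.

```cpp
#include <chrono>

namespace demo_clock {
namespace literals {

// Each literal yields a strongly typed std::chrono duration rather than
// a bare integer.
constexpr std::chrono::nanoseconds operator""_ns(unsigned long long n)
{
    return std::chrono::nanoseconds(static_cast<std::chrono::nanoseconds::rep>(n));
}
constexpr std::chrono::microseconds operator""_us(unsigned long long n)
{
    return std::chrono::microseconds(static_cast<std::chrono::microseconds::rep>(n));
}
constexpr std::chrono::milliseconds operator""_ms(unsigned long long n)
{
    return std::chrono::milliseconds(static_cast<std::chrono::milliseconds::rep>(n));
}

} // namespace literals
} // namespace demo_clock

inline std::chrono::nanoseconds example_timeout()
{
    // Without this using-directive the _ms and _us suffixes are not found.
    using namespace demo_clock::literals;
    return 5_ms + 250_us;   // units are converted by std::chrono
}
```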

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

74a57376

clock: Drop sched::thread::sleep_until() · b3bfe5fa

Nadav Har'El authored 11 years ago


Delete the sched::thread::sleep_until() function. All users of this
function actually wanted a relative time, not absolute time, and can
use the simpler new sched::thread::sleep() instead.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b3bfe5fa

clock: Use monotonic clock in the thread scheduler · d73fe79c

Nadav Har'El authored 11 years ago


Switch the thread scheduler from using the s64 type for durations and
the wall time, to the osv::clock::uptime::duration type (which is
std::chrono::nanoseconds) and monotonic clock.

Also, now that the per-thread CPU-time clock (thread::thread_clock())
returns an std::chrono::duration instead of s64, we no longer need the
fill_ts(s64) variant in libc/time.cc (if we leave it unused, we'll get a
compilation warning).

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d73fe79c

clock: base timers on monotonic clock · 477b9427

Nadav Har'El authored 11 years ago

sched::timer_base, used to implement all forms of timeouts in OSv, is
currently based on the wall-clock time in s64 form. This is problematic
for two reasons:

1. The type s64 doesn't say which units are used, whether the time
is absolute or relative, or what its epoch is.

2. wall-clock time is a bad choice for short-term timers: If a thread
intends to sleep for a millisecond, and the wall-clock goes back a minute,
the thread will end up sleeping a whole minute.

So this patch changes the basis of sched::timer_base to strongly-typed
time points from a monotonic clock, osv::clock::uptime::time_point.

We also allow setting timers using the wall-clock time, but with a big
caveat: The expiration time is converted from wall to uptime clocks at
the moment of timer_base::set(), so if the wall-clock is adjusted later,
the wall time at expiration may not be exactly the one intended.
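
The conversion described above can be sketched roughly as follows. This is an illustration with std::chrono clocks standing in for osv::clock::wall and osv::clock::uptime; the alias names and to_uptime() are hypothetical, not the timer_base code.

```cpp
#include <chrono>

using wall_clock = std::chrono::system_clock;    // stand-in for osv::clock::wall
using uptime_clock = std::chrono::steady_clock;  // stand-in for osv::clock::uptime

// Convert a wall-clock expiration into a monotonic one, using the two
// "now" readings sampled when the timer is set. If the wall clock is
// adjusted afterwards, the returned time point does not follow it.
inline uptime_clock::time_point to_uptime(wall_clock::time_point wall_expiry,
                                          wall_clock::time_point wall_now,
                                          uptime_clock::time_point uptime_now)
{
    return uptime_now +
        std::chrono::duration_cast<uptime_clock::duration>(wall_expiry - wall_now);
}
```

Passing both "now" values as arguments keeps the conversion a pure function, which makes the caveat easy to see: only the difference at set time matters.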

So that we don't have to change all timer users in this one patch, we
also temporarily implement the old weakly-typed timer_base::set(s64)
using the new mechanism. This variant will be removed in a later patch
in this series, when it is no longer used.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

477b9427

clock: Avoid #include <drivers/clockevent.hh> · e1c4aa82

Nadav Har'El authored 11 years ago


Several source files include <drivers/clockevent.hh>, though this is a
very low-level feature which they don't actually use.

sched.cc does use <drivers/clockevent.hh>, but already gets it through
sched.hh, so also doesn't need to include it explicitly.

This patch removes the unnecessary includes.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e1c4aa82

Jan 24, 2014

balloon: fix the double move problem · 4eb7d9c2

Glauber Costa authored 11 years ago

As we have recently discovered, some parallel GCs will move an object to two
different locations at times, and later on decide on which one to use. This
breaks our implementation if the final object is the second one to be copied,
because by then the original region is already mapped - so we won't fault, and
the unmapped region will not be the actual balloon, so we will have a bogus
fault.

The core of this solution is to keep all the regions unmapped. Because they had
only garbage before, we know Java shouldn't read anything from it before it
writes something new. And when it does that, we declare that to be no longer a
balloon.

Movement is then split in two phases: the normal phase, and the finish phase.
In the finish phase we will remove the old VMA and create the new VMA again,
with heap characteristics.

Special care needs to be taken when "conciliating" the array: because we use
the difference between first faulting address and original array address to
calculate how many bytes we are skipping, we need to store that information
somewhere. We're using an unordered_map (hash) for that. We'll keep track of
all in-flight ballooned regions and hold the original address of the array.

When we detect movement *from* that region, we know it is the new location
and update the balloon object with the new address.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4eb7d9c2

balloon: store flags from original vma when ballooning · 95c644dc

Glauber Costa authored 11 years ago

To avoid using hard-coded values for the original anonymous vma that mapped the
region before we ballooned, let's store the original flags in the JVM vma. This
patch does not use them yet; it only lays down the infrastructure. A user will
come in the next patch.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

95c644dc

mmu: give the jvm balloon write permissions · 54b1d4dc

Glauber Costa authored 11 years ago


Up until now the JVM balloon explicitly forbade writes to its range, because
we didn't expect the JVM to ever write to it. But with the problem Gleb
recently discovered, the double-copying of objects inside the heap, we will use
writes to figure out when an object is no longer a balloon. We still
need to be able to go back to the JVM-specific fault handler for that.

Therefore, we need write permissions in the VMA itself.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
[ penberg: use perm_rw as suggested by gleb ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

54b1d4dc

Jan 23, 2014

mempool: ensure early_alloc_page errors are visible · 9e09145f

Claudio Fontana authored 11 years ago


Calling debug() and then abort() buffers a message that never
has a chance to reach the console early on.
Pass the message directly to abort() instead.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9e09145f

mmu: Fix signed and unsigned integer comparison error · 8d4a1806

Zhi Yong Wu authored 11 years ago


  CC bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.o
  CXX core/mmu.o
/home/zwu/osv/core/mmu.cc: In instantiation of ‘void mmu::map_level<PageOp, ParentLevel>::operator()(mmu::hw_ptep, uintptr_t, uintptr_t) [with PageOp = mmu::virt_to_phys_map; int ParentLevel = 4; uintptr_t = long unsigned int]’:
/home/zwu/osv/core/mmu.cc:323:5:   required from ‘void mmu::map_range(uintptr_t, size_t, PageOp&, size_t) [with PageOp = mmu::virt_to_phys_map; uintptr_t = long unsigned int; size_t = long unsigned int]’
/home/zwu/osv/core/mmu.cc:623:43:   required from here
/home/zwu/osv/core/mmu.cc:383:13: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
...

Signed-off-by: Zhi Yong Wu <zwu.kernel@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8d4a1806

Jan 22, 2014

mmu: correctly calculate huge page pte address in unpopulate · f8c2ce68

Gleb Natapov authored 11 years ago


Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f8c2ce68

loader: Add support for "&" in command line · 995fe31a

Nadav Har'El authored 11 years ago


Our loader's command line (what is given to the "-e" option of run.py)
already allows running multiple commands (each a shared object with
arguments) separated with a semicolon - e.g.,

    run.py -e "program1.so; program2.so; program3.so"

Just like in Unix, this patch allows using a "&" instead of a ";",
in which case the preceding program is run in the background, which in our
case means in a new thread.

For example,

    run.py -e "httpserver.so& java.so ..."

As before, a command line can consist of multiple commands, and whitespace
around the separators (; or &) is optional.

Take care if you intend to run the *same* object multiple times concurrently,
e.g., "something.so& something.so". For an object to support this use case,
it should support its main() being called in parallel, and in particular
avoid using global variables.
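
The separator handling described above can be sketched as below. This is an illustration, not the loader's parser: the struct command and the functions trim() and split_commands() are hypothetical, and quoting or escaping of separators is not handled.

```cpp
#include <string>
#include <vector>

struct command {
    std::string text;   // program name and arguments, trimmed
    bool background;    // true if the command was terminated by '&'
};

// Strip leading and trailing spaces and tabs.
inline std::string trim(const std::string& s)
{
    const char* ws = " \t";
    auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split a command line on ';' and '&'. A command followed by '&' is
// marked as a background command; empty commands are dropped.
inline std::vector<command> split_commands(const std::string& line)
{
    std::vector<command> out;
    std::string current;
    for (char c : line) {
        if (c == ';' || c == '&') {
            std::string t = trim(current);
            if (!t.empty()) {
                out.push_back({t, c == '&'});
            }
            current.clear();
        } else {
            current += c;
        }
    }
    std::string t = trim(current);
    if (!t.empty()) {
        out.push_back({t, false});   // last command runs in the foreground
    }
    return out;
}
```

A caller could then start a new thread for every entry with background set and run the others in sequence.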

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

995fe31a

timers: add assertion · e1b59973

Nadav Har'El authored 11 years ago


Issue #178, and recent netperf 3 experiments by Vlad, suggest we have a
bug in our timer code which very rarely causes crashes in
timer_list::fired(). It appears we somehow corrupt our per-cpu timer list,
or some armed timer object - but I still haven't been able to figure out
where.

This patch adds an assertion that the timer we find on the list is
actually armed. Unfortunately, it consistently fails (I can see the same
assertion failure once every 100-200 runs of tst-queue-mpsc.so), and this
failure seems to replace the page-fault crash of issue #178.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e1b59973

include: Move execinfo.h to include/osv · 9a451ec7

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9a451ec7

include: Merge <debug.h> with <osv/debug.h> · 6275a371

Pekka Enberg authored 11 years ago


Move definitions from <debug.h> to <osv/debug.h> and update includes to
use the latter.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

6275a371

include: Move debug.hh to include/osv · 7809519b

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7809519b

include: Move mempool.hh to include/osv · 9c95f49d

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9c95f49d

include: Move irqlock.hh to include/osv · b42b49ad

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b42b49ad

include: Move preempt-lock.hh to include/osv · 5e374b7f

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

5e374b7f

include: Move prio.hh to include/osv · 5bb3e7b4

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

5bb3e7b4

include: Move ilog2.hh to include/osv · d8df3fd1

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d8df3fd1

include: Move alloctracker.hh to include/osv · 93e5e338

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

93e5e338

include: Move mmio.hh to include/osv · 078d4732

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

078d4732

include: Move dhcp.hh to include/osv · f880005c

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f880005c

include: Move elf.hh to include/osv · b8034e34

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b8034e34

include: Move commands.hh to include/osv · 86110819

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

86110819

include: Move barrier.hh to include/osv · c80be886

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c80be886

include: Move mmu.hh to include/osv · 9cb900b7

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9cb900b7

include: Move interrupt.hh to include/osv · d7cc6216

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d7cc6216

include: Move align.hh to include/osv · 4473f2ca

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4473f2ca

include: Move sched.hh to include/osv · fae5693e

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

fae5693e

Jan 21, 2014

DHCP: Repeat DHCP discovery on timeout · d241a2c0

Dmitry Fleytman authored 11 years ago


It is bad practice to run DHCP discovery without a timeout
and retries. If a discovery packet gets lost, boot gets stuck.

Besides this, there is an interesting phenomenon on some systems:
the first few DHCP discovery packets sent on boot get lost in some cases.

This started to happen from time to time on my KVM system and almost
every time on my Xen system after installing recent Fedora Core updates.
The packet leaves the VM's interface but never arrives at the bridge interface.
The packet itself is built properly and arrives at the DHCP server just fine
after a few retransmissions.

Most probably this phenomenon is a bug (or limitation) in the current
Linux bridge version, so this patch is actually a workaround. But
since DHCP timeouts and retries are a good idea in general,
it is worth having anyway.
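
The general retry-on-timeout pattern can be sketched as follows. This is a generic illustration, not the OSv DHCP code: discover_with_retries(), its callbacks, and the attempt limit are all hypothetical.

```cpp
#include <chrono>
#include <functional>

// Send a discovery request and wait for a reply, repeating on timeout.
// `send` transmits one discovery packet; `wait_reply` blocks for up to
// `timeout` and returns true if a reply arrived. Returns the number of
// the attempt that succeeded, or 0 if every attempt timed out.
inline int discover_with_retries(
    const std::function<void()>& send,
    const std::function<bool(std::chrono::milliseconds)>& wait_reply,
    int max_attempts,
    std::chrono::milliseconds timeout)
{
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        send();
        if (wait_reply(timeout)) {
            return attempt;
        }
    }
    return 0;
}
```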

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d241a2c0

Jan 20, 2014

core: add waitqueue support · f559807e

Avi Kivity authored 11 years ago


A waitqueue is an object on which multiple threads can wait; other threads
can wake up either one or all waiting threads.  A waitqueue is associated
with an external mutex which the user must supply for both wait and wake
operations.

Waitqueues differ from condition variables in the following respects:
- waitqueues do not contain an internal mutex.  This makes them smaller, and
  reduces lock acquisitions.  On the other hand the waker must hold the
  associated mutex, whereas this is not required with condition variables.
- waitqueues support sched::thread::wait_for()

waitqueues support wait morphing and do not cause excess lock contention,
even with wake_all().

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

f559807e

sched: wake_lock() · acd36f2d

Avi Kivity authored 11 years ago


This adds a facility to wake a thread, but with the intention that it will
acquire a certain lock after waking, and while the waker holds the lock.
This is implemented using the regular wait morphing code (send_lock() and
receive_lock()), but with additional mutual exclusion to allow regular
wake()s in parallel.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

acd36f2d

posix_memalign: Remove the extra check on size · d0c473d4

Vlad Zolotarov authored 11 years ago


Remove the extra check on size just like the remark above implies.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d0c473d4

Jan 19, 2014

elf: fix object::lookup_addr to lookup correct symbol · 38566444

Takuya ASADA authored 11 years ago


Fix object::lookup_addr to look up the correct symbol.
It should return the nearest symbol for which s_addr < addr, but it
compares the opposite way.
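
A minimal sketch of the intended lookup is shown below. The struct symbol and lookup_nearest() are hypothetical and not the ELF code. Note one assumption beyond the wording above: the sketch also accepts an exact match (s_addr == addr).

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct symbol {
    std::string name;
    std::uintptr_t addr;   // start address of the symbol
};

// Return the symbol with the greatest start address that is not above
// `addr`, or nullptr if every symbol starts above `addr`.
inline const symbol* lookup_nearest(const std::vector<symbol>& syms,
                                    std::uintptr_t addr)
{
    const symbol* best = nullptr;
    for (const auto& s : syms) {
        if (s.addr > addr) {
            continue;                   // symbol starts after addr: skip
        }
        if (!best || s.addr > best->addr) {
            best = &s;                  // closer to addr than the previous best
        }
    }
    return best;
}
```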

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

38566444

Jan 17, 2014

DHCP: Support MTU option · 69bf74a7

Dmitry Fleytman authored 11 years ago

This patch introduces support for the MTU option as described in
RFC 2132, section 5.1 (Interface MTU Option).

Amazon EC2 networking uses this option in some cases, and it gives a
throughput improvement of about 250% on big instances with 10G networking.

Netperf results for hi1.4xlarge instances, TCP_MAERTS test, OSv runs netserver:

Send buffer size  Throughput w/ patch (Mbps)  Throughput w/o patch (Mbps)  Improvement (%)

32                4912.29                     1386.28                      254
64                4832.01                     1385.99                      249
128               4835.09                     1401.46                      245
256               4746.41                     1382.28                      243
512               4849.04                     1375.23                      253
1024              4631.8                      1356.69                      241
2048              4859.59                     1371.92                      254
4096              4864.99                     1383.67                      252
8192              4627.07                     1364.05                      239
16384             4868.73                     1366.48                      256
32768             4822.69                     1366.63                      253
65536             4837.67                     1353.87                      257
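
For reference, extracting the Interface MTU option from a DHCP options field can be sketched as below. This is an illustration, not the patch's code: find_interface_mtu() is a hypothetical name, and the input is assumed to be the options bytes that follow the magic cookie. The option layout (code 26, length 2, 16-bit big-endian value, minimum legal value 68) follows RFC 2132.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Walk the code/length/value options and extract option 26.
// Returns true and sets `mtu` only for a well-formed option with a
// legal value; returns false otherwise.
inline bool find_interface_mtu(const std::vector<std::uint8_t>& opts,
                               std::uint16_t& mtu)
{
    constexpr std::uint8_t opt_pad = 0;
    constexpr std::uint8_t opt_mtu = 26;
    constexpr std::uint8_t opt_end = 255;

    std::size_t i = 0;
    while (i < opts.size()) {
        std::uint8_t code = opts[i++];
        if (code == opt_pad) {
            continue;                   // pad has no length byte
        }
        if (code == opt_end || i >= opts.size()) {
            break;                      // end marker or missing length byte
        }
        std::uint8_t len = opts[i++];
        if (i + len > opts.size()) {
            break;                      // truncated payload
        }
        if (code == opt_mtu) {
            if (len != 2) {
                return false;           // malformed MTU option
            }
            auto value = static_cast<std::uint16_t>((opts[i] << 8) | opts[i + 1]);
            if (value < 68) {
                return false;           // below the minimum legal MTU
            }
            mtu = value;
            return true;
        }
        i += len;                       // skip options we do not care about
    }
    return false;
}
```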

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

69bf74a7

mmu: procfs support · b01a5444

Pekka Enberg authored 11 years ago


Add a procfs_maps() function to core/mmu.cc that returns all the VMAs,
formatted for the Linux-compatible "/proc/<pid>/maps" file.

This will be called by the procfs filesystem.

Limitations:

  * Shared mappings are not identified as such.
  * File-backed mmap offset, device, inode, and pathname are not
    reported.
  * Special region names such as [heap] and [stack] are not reported.
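
Under the limitations listed above, one line of such output could be produced roughly as follows. This is a sketch, not the procfs_maps() implementation: format_maps_line() is a hypothetical helper, and it hard-codes the private flag and zeroes for offset, device, and inode.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format one VMA in the style of a Linux "/proc/<pid>/maps" line:
//   start-end perms offset dev inode
// Mappings are always reported as private ('p'), and offset, device and
// inode are always zero, mirroring the limitations listed above.
inline std::string format_maps_line(std::uintptr_t start, std::uintptr_t end,
                                    bool readable, bool writable, bool executable)
{
    char buf[128];
    std::snprintf(buf, sizeof(buf), "%08llx-%08llx %c%c%cp %08x %02x:%02x %d\n",
                  static_cast<unsigned long long>(start),
                  static_cast<unsigned long long>(end),
                  readable ? 'r' : '-',
                  writable ? 'w' : '-',
                  executable ? 'x' : '-',
                  0u, 0u, 0u, 0);
    return buf;
}
```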

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b01a5444