Commits · 0ddd6ef158602c4cff0bf2911a744a7f2214957a · Verlässliche Systemsoftware / projects / osv

Dec 08, 2013

sched: initialize clock later · 1d31d9c3

Glauber Costa authored 11 years ago

Right now we are taking a clock measurement very early for cpu initialization.
That forces an unnecessary dependency between sched and clock initializations.

Since that clock is used to determine how long the cpu has been running, we
can initialize the runtime later, when we init the idle thread. Nothing should
be running before it. After doing this, we can move the sched initialization
a bit earlier.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

vfs: Fix duplicate in-memory vnodes · e4aad1ba

Raphael S. Carvalho authored 11 years ago


Currently, namei() does vget() unconditionally if no dentry is found.
This is wrong because the path can be a hard link that points to a vnode
that's already in memory.

To fix the problem:

  - Use the inode number as part of the hash in vget()

  - Use vn_lookup() in vget() to make sure we have one vnode in memory
    per inode number.

  - Push the vget() calls down to individual filesystems and make
    VOP_LOOKUP return a vnode

  - Drop lock in vn_lookup() and assert that vnode_lock is held.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
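The one-vnode-per-inode rule above can be illustrated with a small C++ sketch. It makes several assumptions:

- `mount_point`, `vnode`, `vnode_key` and `vnode_table` are hypothetical stand-ins, not OSv's VFS types.
- The key combines the mount and the inode number, so a hard link resolves to the vnode that is already in memory.
- `vn_lookup()` takes no lock itself and asserts that the caller holds `vnode_lock`, mirroring the last bullet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; these are not OSv's struct mount / struct vnode.
struct mount_point {};

struct vnode {
    mount_point* mp;
    std::uint64_t ino;
    int refcnt;
};

// Key = (mount, inode number): two paths naming the same inode share an entry.
struct vnode_key {
    mount_point* mp;
    std::uint64_t ino;
    bool operator==(const vnode_key& o) const { return mp == o.mp && ino == o.ino; }
};

struct vnode_key_hash {
    std::size_t operator()(const vnode_key& k) const {
        return std::hash<const void*>()(k.mp) ^ (std::hash<std::uint64_t>()(k.ino) << 1);
    }
};

std::mutex vnode_lock;
std::unordered_map<vnode_key, std::unique_ptr<vnode>, vnode_key_hash> vnode_table;

// Lookup only: takes no lock itself, asserts the caller holds vnode_lock.
vnode* vn_lookup(const std::unique_lock<std::mutex>& held, mount_point* mp, std::uint64_t ino) {
    assert(held.owns_lock() && held.mutex() == &vnode_lock);
    (void)held;  // avoid an unused-parameter warning when NDEBUG is set
    auto it = vnode_table.find(vnode_key{mp, ino});
    return it == vnode_table.end() ? nullptr : it->second.get();
}

// Return the single in-memory vnode for (mp, ino), creating it only if absent.
vnode* vget(mount_point* mp, std::uint64_t ino) {
    std::unique_lock<std::mutex> lock(vnode_lock);
    if (vnode* vp = vn_lookup(lock, mp, ino)) {
        ++vp->refcnt;  // already in memory, e.g. reached through a hard link
        return vp;
    }
    auto fresh = std::make_unique<vnode>(vnode{mp, ino, 1});
    vnode* vp = fresh.get();
    vnode_table.emplace(vnode_key{mp, ino}, std::move(fresh));
    return vp;
}
```

Calling `vget()` twice for the same mount and inode returns the same pointer with a bumped reference count, instead of a duplicate vnode.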

Dec 05, 2013

net: fix socket ioctl() bypassing Linux adjustments · 6b071736

Avi Kivity authored 11 years ago


Prior to 65ccda4c (net: use a file derived class for sockets
(socket_file)), ioctl()s on sockets were directed to linux_ioctl_socket()
and thence to soo_ioctl().  However, that commit cut linux_ioctl_socket()
out and dispatched directly to what was previously known as soo_ioctl()
(which became socket_file::ioctl()).  This caused interface enumeration
ioctl()s to fail, for example in Cassandra.

Fix by bringing back the previous behaviour.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

sched: add function to find a thread given its id · a5a3aedc

Glauber Costa authored 11 years ago


Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

sched: change thread list into an unordered map · 54a0beff

Glauber Costa authored 11 years ago

Searching a list for an element is slow if we have many threads. Even under
normal load, the number of threads we spawn may not be huge, but it
is not tiny either.

Change it to a map so we can implement functions that operate on a given thread
without that much overhead - O(1) for the common case. Note that ideally we would
use an unordered_set, which doesn't require an extra key. However, that would also
mean that the key is implicit and has to be of type key_type&. Threads are not very
lightweight to create for search purposes, so we go for an id-as-key approach.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
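The id-as-key design above, together with the find-by-id function added in a5a3aedc, can be sketched as a mutex-protected `std::unordered_map` keyed by thread id. The `thread` and `thread_registry` types are hypothetical simplifications, not OSv's `sched::thread` or its actual thread map.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>

// Hypothetical minimal thread type; OSv's sched::thread is far richer.
struct thread {
    unsigned id;
};

// Registry keyed by thread id: average O(1) lookup instead of a list scan.
class thread_registry {
public:
    void add(thread* t) {
        std::lock_guard<std::mutex> g(_mtx);
        _map.emplace(t->id, t);
    }
    void remove(unsigned id) {
        std::lock_guard<std::mutex> g(_mtx);
        _map.erase(id);
    }
    // Returns nullptr if no thread with this id is registered.
    thread* find_by_id(unsigned id) {
        std::lock_guard<std::mutex> g(_mtx);
        auto it = _map.find(id);
        return it == _map.end() ? nullptr : it->second;
    }
private:
    std::mutex _mtx;
    std::unordered_map<unsigned, thread*> _map;
};
```

With an id as the key, a caller only needs an integer to search, not a thread object to compare against.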

sched: remove on_thread_stack · 9bd939f8

Glauber Costa authored 11 years ago


No users in tree.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

core: make osv::run return shared pointer or null and store it on loader.cc · fbb54062

Benoît Canet authored 11 years ago


This restores the original behavior of osv::run that was in place before the
mkfs.so and cpiod.so split committed a day ago.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Dec 04, 2013

file: remove fileops · c67f9ebf

Avi Kivity authored 11 years ago


Everyone is now overriding file's virtual functions; we can make them
pure virtual and remove fileops completely.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
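Once every file type overrides the virtual functions, the base class can declare them pure virtual, as described above. The following is a minimal sketch with a hypothetical, simplified interface: the method set, the signatures and `memory_file` are illustrative, not OSv's real `class file`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical simplified interface: every operation is pure virtual, so no
// separate fileops table is needed and each file type must implement them.
class file {
public:
    virtual ~file() = default;
    virtual int read(std::string& out, std::size_t n) = 0;
    virtual int write(const std::string& in) = 0;
    virtual int close() = 0;
};

// A toy in-memory file type standing in for vfs_file / socket_file.
class memory_file : public file {
public:
    int read(std::string& out, std::size_t n) override {
        out = _data.substr(0, n);
        return static_cast<int>(out.size());
    }
    int write(const std::string& in) override {
        _data += in;
        return static_cast<int>(in.size());
    }
    int close() override { return 0; }
private:
    std::string _data;
};
```

A missing override now fails at compile time (the derived class stays abstract), instead of falling back to a default operations table.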

file: remove badfileops · 57741446

Avi Kivity authored 11 years ago


Unused.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

vfs: convert vfs files to be derived from class file · b086d996

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

net: use a file derived class for sockets (socket_file) · 65ccda4c

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

initialize: add a way to initialize an array, similar to C99 designated initializers · 6338e1d3

Avi Kivity authored 11 years ago


Useful for C -> C++ conversions.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

file: make 'opaque' optional · 4eee5123

Avi Kivity authored 11 years ago


Not everyone wants it.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

file: make fo_init() optional · ff6fa58e

Avi Kivity authored 11 years ago


Derived file objects will be initialized by the class constructor, no need
for fo_init().

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

Dec 03, 2013

mmu: Simplify mmu::map_file interface · c4fb37c3

Raphael S. Carvalho authored 11 years ago


Besides simplifying the mmu::map_file interface, let's make it more similar
to mmu::map_anon.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

loader: Allow to execute multiple .so file in sequential order. · 3352489d

Benoît Canet authored 11 years ago


A ';' at the end of a parameter marks the end of a program's argument list.

The goal of this patch is to be able to split mkfs.so into two parts: mkfs.so
and cpiod.so.

The patch uses a full Boost.Spirit parser to handle "" quoting and to split
commands around ';'.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
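To illustrate the command-splitting rule above, here is a hand-written tokenizer in C++. It is a swapped-in technique and does not use Boost.Spirit. The grammar is an assumption, not necessarily the loader's exact one:

- Whitespace separates arguments.
- `"..."` groups text into a single argument.
- An unquoted `;` ends the current program's argument list.

`split_commands` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a command line into programs, each a list of arguments.
std::vector<std::vector<std::string>> split_commands(const std::string& line) {
    std::vector<std::vector<std::string>> programs(1);
    std::string arg;
    bool in_arg = false, in_quotes = false;

    auto finish_arg = [&] {
        if (in_arg) {
            programs.back().push_back(arg);
            arg.clear();
            in_arg = false;
        }
    };

    for (char c : line) {
        if (in_quotes) {
            if (c == '"') in_quotes = false;  // closing quote
            else arg += c;                    // ';' and spaces are literal here
        } else if (c == '"') {
            in_quotes = true;
            in_arg = true;                    // "" still yields an (empty) argument
        } else if (c == ';') {
            finish_arg();
            programs.emplace_back();          // start the next program
        } else if (c == ' ' || c == '\t') {
            finish_arg();
        } else {
            arg += c;
            in_arg = true;
        }
    }
    finish_arg();

    // Drop empty programs, e.g. produced by a trailing ';'.
    std::vector<std::vector<std::string>> result;
    for (auto& p : programs)
        if (!p.empty()) result.push_back(p);
    return result;
}
```

For example, `mkfs.so; cpiod.so --prefix /zfs` yields two programs: `{mkfs.so}` and `{cpiod.so, --prefix, /zfs}`.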

trace: infer storage/runtime args from assign() signature · b365e314

Gleb Natapov authored 11 years ago


Storage/runtime arguments for a tracepoint can be inferred from the
assign() function signature instead of being specified explicitly
via storage_args/runtime_args. This reduces boilerplate code.

Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
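The inference described above amounts to reading types off a function signature with template partial specialization. This sketch assumes that assign()'s parameters are the runtime arguments and its returned tuple holds the storage arguments. `assign_traits`, `toy_tracepoint` and `thread_info` are hypothetical names, not OSv's tracepoint API.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: derive both argument lists from assign()'s signature.
template <typename F> struct assign_traits;

template <typename R, typename... Args>
struct assign_traits<R (*)(Args...)> {
    using runtime_args = std::tuple<Args...>;  // what the call site passes
    using storage_args = R;                    // what gets recorded
};

// A toy tracepoint that needs no explicit storage/runtime argument lists.
template <typename Assign>
struct toy_tracepoint {
    using storage = typename assign_traits<Assign>::storage_args;
    Assign assign;
    std::vector<storage> log;

    template <typename... A>
    void operator()(A&&... args) { log.push_back(assign(std::forward<A>(args)...)); }
};

// Example: record only a thread's id, not the whole object.
struct thread_info { int id; const char* name; };
inline std::tuple<int> assign_thread(const thread_info* t) { return std::make_tuple(t->id); }

using thread_traits = assign_traits<decltype(&assign_thread)>;
static_assert(std::is_same<thread_traits::runtime_args, std::tuple<const thread_info*>>::value,
              "runtime args come from assign()'s parameters");
static_assert(std::is_same<thread_traits::storage_args, std::tuple<int>>::value,
              "storage args come from assign()'s return type");
```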

file: add virtual functions corresponding to f_ops · cc1e31d1

Avi Kivity authored 11 years ago


The default is to dispatch directly to the corresponding member of f_ops,
but that can be overridden.

The fo_*() functions are redirected to dispatch via the virtual functions.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
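The transitional scheme above can be sketched in C++. Virtual member functions default to calling the C operations table, and the `fo_*()` entry points call the virtual functions, so converted classes can override behaviour while legacy ones keep working. The `fileops` layout, `legacy_ops` and `converted_file` are hypothetical simplifications.

```cpp
#include <cassert>

struct file;

// Hypothetical C-style operations table standing in for struct fileops.
struct fileops {
    int (*fo_read)(file* fp);
    int (*fo_close)(file* fp);
};

struct file {
    explicit file(const fileops* ops) : f_ops(ops) {}
    virtual ~file() = default;
    // Default behaviour: dispatch to the matching f_ops member.
    virtual int read() { return f_ops->fo_read(this); }
    virtual int close() { return f_ops->fo_close(this); }
    const fileops* f_ops;
};

// The fo_*() functions now dispatch through the virtual functions.
inline int fo_read(file* fp) { return fp->read(); }
inline int fo_close(file* fp) { return fp->close(); }

// A legacy file type that relies only on its ops table.
static int legacy_read(file*) { return 1; }
static int legacy_close(file*) { return 0; }
static const fileops legacy_ops = { legacy_read, legacy_close };

// A converted file type that overrides read() but inherits close().
struct converted_file : file {
    converted_file() : file(&legacy_ops) {}
    int read() override { return 2; }
};
```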

vfs: MNT_FORCE unmount · 66658a6b

Pekka Enberg authored 11 years ago


Add sys_umount2() and implement support for MNT_FORCE that will be used
to force rootfs unmount at poweroff.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

file: convert finit() to file's constructor · 54f41759

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

file: don't initialize with memset() · cabd6f1e

Avi Kivity authored 11 years ago


If we're going to have a vtable in there, the memset() will kill it.

Instead, add initializers for those members not already initialized by
make_file().

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
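The memset() hazard above follows from how polymorphic objects are laid out. A class with virtual functions carries a hidden vtable pointer, so zeroing the whole object also destroys that pointer. It is also not trivially copyable, which makes memset() on it undefined behaviour. This sketch uses default member initializers instead. The field names and values are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical fields; the point is only the initialization style.
struct file {
    virtual ~file() = default;
    virtual int kind() const { return 0; }
    // Default member initializers replace memset(this, 0, sizeof(*this)),
    // which would also zero the hidden vtable pointer.
    unsigned f_flags = 0;
    int f_count = 1;
    long f_offset = 0;
    void* f_data = nullptr;
};

// A polymorphic class is never trivially copyable, so memset()/memcpy()
// on it are not allowed.
static_assert(!std::is_trivially_copyable<file>::value,
              "memset() on a polymorphic type would be undefined behaviour");
```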

file: drop file_makebad() · 69daf614

Avi Kivity authored 11 years ago


The only caller, soo_close(), can only be called from a context where
no file references remain, so no further file API calls can be made.

Remove it.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

file: drop public declarations of falloc_noinstall() and finit() · adc3da8e

Avi Kivity authored 11 years ago


Subsumed by make_file().

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

vfs: drop falloc() · 3b6eef19

Avi Kivity authored 11 years ago

falloc() is inherently racy in that it installs an uninitialized file
descriptor in a user accessible fd. It is also hard to use correctly when
an error occurs. Luckily, we don't use it anywhere, so we can just remove it.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

Dec 01, 2013

devfs: device_destroy() API · 765c9afc

Pekka Enberg authored 11 years ago


Enable the device_destroy() API for the virtio-rng driver.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Fix crash on malformed command line · 082ff373

Nadav Har'El authored 11 years ago


Before this patch, OSv crashes or continuously reboots when given unknown
command line parameters, e.g.,

        scripts/run.py -c1 -e "--help --z a"

With this patch, it says, as expected, that the "--z" option is not
recognized, and displays the list of known options:

    unrecognised option '--z'
    OSv options:
      --help                show help text
      --trace arg           tracepoints to enable
      --trace-backtrace     log backtraces in the tracepoint log
      --leak                start leak detector after boot
      --nomount             don't mount the file system
      --noshutdown          continue running after main() returns
      --env arg             set Unix-like environment variable (putenv())
      --cwd arg             set current working directory
    Aborted

The problem was that to parse the command line options, we used Boost,
which throws an exception when an unrecognized option is seen. We need
to catch this exception, and show a message accordingly.

But before this patch, C++ exceptions did not work correctly during this
stage of the boot process, because exceptions use elf::program(), and we
only set it up later. So this patch moves the setup of the elf::program()
object earlier in the boot, to the beginning of main_cont().

Now we'll be able to use C++ exceptions throughout main_cont(), not just
in command line parsing.

This patch also removes the unused "filesystem" parameter of
elf::program(), rather than moving the initialization of this empty object
as well.

Fixes #103.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

file: make it a C++ type · f1e6ba49

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

file: add helpers for accessing fields · c75c2985

Avi Kivity authored 11 years ago


Needed for C++ conversion.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

file: move file operations out-of-line · 11540a67

Avi Kivity authored 11 years ago


In preparation for making 'file' a C++ type.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Add helper for converting designated initializers to C++ · e5753f2b

Avi Kivity authored 11 years ago


Unfortunately, C++ does not support designated initializers.  Add a function
that helps fill their place.

Usage example:

-static struct netisr_handler ether_nh = {
-       .nh_name = "ether",
-       .nh_handler = ether_nh_input,
-       .nh_proto = NETISR_ETHER,
-       .nh_policy = NETISR_POLICY_SOURCE,
-       .nh_dispatch = NETISR_DISPATCH_DIRECT,
-};
+static netisr_handler ether_nh = initialize_with([] (netisr_handler& x) {
+       x.nh_name = "ether";
+       x.nh_handler = ether_nh_input;
+       x.nh_proto = NETISR_ETHER;
+       x.nh_policy = NETISR_POLICY_SOURCE;
+       x.nh_dispatch = NETISR_DISPATCH_DIRECT;
+});

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
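One way an `initialize_with()` helper like the one in the usage example could work is sketched below; it is not necessarily how OSv implements it. The object type is deduced from the lambda's parameter, the object is value-initialized (so unmentioned fields are zero, as with C designated initializers), and the lambda then fills fields by name. `lambda_arg` and `handler_desc` are hypothetical. The deduction only handles non-generic, non-mutable lambdas.

```cpp
#include <cassert>
#include <type_traits>

// Deduce the (single) parameter type of a non-generic, non-mutable lambda.
template <typename M> struct lambda_arg;
template <typename C, typename R, typename A>
struct lambda_arg<R (C::*)(A) const> {
    using type = typename std::remove_reference<A>::type;
};

// Value-initialize an object of the lambda's parameter type, let the lambda
// assign fields by name, and return the result.
template <typename F>
typename lambda_arg<decltype(&F::operator())>::type initialize_with(F f) {
    typename lambda_arg<decltype(&F::operator())>::type v{};
    f(v);
    return v;
}

// Hypothetical struct standing in for netisr_handler.
struct handler_desc {
    const char* nh_name;
    int nh_proto;
    int nh_policy;
};
```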

Nov 26, 2013

libc: implement the GNU variant of strerror_r() · 7053ac3a

Avi Kivity authored 11 years ago


We previously had the POSIX variant only.  Implement the GNU variant as well,
and update the header to point to the correct function based on the dialect
selected.

The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI
standards.

This fixes calls to strerror_r() from binaries which were compiled with
_GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for
BSD derived source.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
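The two variants differ in their return type:

- The XSI/POSIX variant (ABI name `__xpg_strerror_r`) returns an `int` and fills the caller's buffer.
- The GNU variant returns a `char*` that may point to a static string and leave the buffer untouched.

That difference is why a binary compiled with `_GNU_SOURCE` breaks if it is linked against the wrong one. The sketch below, with hypothetical helper names, lets overload resolution pick the right interpretation for whichever variant the headers select.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>

// XSI/POSIX variant: returns 0 on success and the message is in buf.
inline const char* strerror_result(int rc, const char* buf) {
    return rc == 0 ? buf : "unknown error";
}

// GNU variant: the message is the returned pointer (buf may be unused).
inline const char* strerror_result(const char* msg, const char*) {
    return msg;
}

// Hypothetical wrapper that is correct under either dialect.
const char* describe_errno(int err, char* buf, std::size_t len) {
    return strerror_result(strerror_r(err, buf, len), buf);
}
```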

sched: Doxygen documentation of a bit of the scheduler · 6f825816

Nadav Har'El authored 11 years ago


Started adding Doxygen documentation for the scheduler. Currently
only set_priority() and priority() are documented.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

sched: New scheduler algorithm · dbc0d507

Nadav Har'El authored 11 years ago

This patch replaces the algorithm which the scheduler uses to keep track of
threads' runtime, and to choose which thread to run next and for how long.

The previous algorithm used the raw cumulative runtime of a thread as its
runtime measure. But comparing these numbers directly was impossible: e.g.,
should a thread that slept for an hour now get an hour of uninterrupted CPU
time? This resulted in a hodgepodge of heuristics which "modified" and
"fixed" the runtime. These heuristics did work quite well in our test cases,
but we were forced to add more and more unjustified heuristics and constants
to fix scheduling bugs as they were discovered. The existing scheduler was
especially problematic with thread migration (moving a thread from one CPU
to another), as the runtime measure on one CPU was meaningless on another.
This bug, if not corrected (e.g., by the patch which I sent a month
ago), can cause crucial threads to acquire exceedingly high runtimes by
mistake, and it resulted in the tst-loadbalance test using only one CPU on
a two-CPU guest.

The new scheduling algorithm follows a much more rigorous design,
proposed by Avi Kivity in:
https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing

To make a long story short (read the document if you want all the
details), the new algorithm is based on a runtime measure R which
is the running decaying average of the thread's running time.
It is a decaying average in the sense that the thread's act of running or
sleeping in recent history is given more weight than its behavior
a long time ago. This measure R can tell us which of the runnable
threads to run next (the one with the lowest R), and using some
high-school-level mathematics, we can calculate how long to run
this thread until it should be preempted by the next one. R carries
the same meaning on all CPUs, so CPU migration becomes trivial.

The actual implementation uses a normalized version of R, called R''
(Rtt in the code), which is also explained in detail in the document.
This Rtt allows updating just the running thread's runtime - not all
threads' runtime - as time passes, making the whole calculation much
more tractable.

The benefits of the new scheduler code over the existing one are:

1. A more rigorous design with fewer unjustified heuristics.

2. A thread's runtime measurement correctly survives a migration to a
different CPU, unlike the existing code (which sometimes botches
it up, leading to threads hanging). In particular, tst-loadbalance
now gives good results for the "intermittent thread" test, unlike
the previous code which in 50% of the runs caused one CPU to be
completely wasted (when the load-balancing thread hung).

3. The new algorithm can look at a much longer runtime history than the
previous algorithm did. With the default tau=200ms, the one-cpu
intermittent thread test of tst-scheduler now provides good
fairness for sleep durations of 1ms-32ms.
The previous algorithm was never fair in any of those tests.

4. The new algorithm is more deterministic in its use of timers
(with thyst=2_ms: up to 500 timers a second), resulting in less
varied performance in high-context-switch benchmarks like tst-ctxsw.

This scheduler does very well on the fairness tests tst-scheduler and
fairly well on tst-loadbalance. Even better performance on that second
test will require an additional patch for the idle thread to wake other
cpus' load-balancing threads.

As expected the new scheduler is somewhat slower than the existing one
(as we now do some relatively complex calculations instead of trivial
integer operations), but thanks to using approximations when possible
and to various other optimizations, the difference is relatively small:

On my laptop, tst-ctxsw.so, which measures "context switch" time (actually,
also including the time to use mutex and condvar which this test uses to
cause context switching), on the "colocated" test I measured 355 ns with
the old scheduler, and 382 ns with the new scheduler - meaning that the
new scheduler adds 27ns of overhead to every context switch. To see that
this penalty is minor, consider that tst-ctxsw is an extreme example,
doing 3 million context switches a second, and even there it only slows
down the workload by 7%.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
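The core idea above (run the thread with the lowest decaying-average R, and compute how long it may run) can be illustrated with a deliberately simplified toy model. It is not the R''/Rtt formulation from the design document, and it omits the thyst hysteresis. Its assumption is that R is an exponentially decaying average, with time constant tau, of a 0/1 "was running" signal:

- The running thread's R rises toward 1: R(t) = 1 - (1 - Rc) e^{-t/tau}.
- A waiting thread's R decays toward 0: R(t) = Rw e^{-t/tau}.

Equating the two gives the time slice t = tau ln(1 + Rw - Rc). All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct toy_thread {
    double R = 0.0;  // decaying average of "was running", in [0, 1]
};

// Advance R over dt: move toward 1 if the thread ran, toward 0 otherwise.
void advance(toy_thread& t, bool ran, double dt, double tau) {
    double target = ran ? 1.0 : 0.0;
    t.R = target + (t.R - target) * std::exp(-dt / tau);
}

// Choose the runnable thread with the lowest R.
std::size_t pick_next(const std::vector<toy_thread>& runnable) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < runnable.size(); ++i)
        if (runnable[i].R < runnable[best].R) best = i;
    return best;
}

// Time until the running thread's rising R meets the waiting thread's decaying R.
double time_slice(double r_current, double r_waiting, double tau) {
    return r_waiting > r_current ? tau * std::log(1.0 + r_waiting - r_current) : 0.0;
}
```

With tau = 0.2 s, Rc = 0.1 and Rw = 0.3, the slice is 0.2 ln 1.2 (about 36 ms). After that time both threads have R = 0.25.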

sched: No need for "yield" parameter of schedule() · e1722351

Nadav Har'El authored 11 years ago


The schedule() and cpu::schedule() functions had a "yield" parameter.
This parameter was inconsistently used (it's not clear why specific
places called it with "true" and others with "false"), and, moreover, it
was always ignored!

So this patch removes the parameter of schedule(). If you really want
a yield, call yield(), not schedule().

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

vfs: Add the utimes system call · 832bba6e

Raphael S. Carvalho authored 11 years ago


v2: Check limit of microseconds, among other minor changes (Nadav Har'El, Avi Kivity).
v3: Get rid of goto & label by adding an else clause (Nadav Har'El).

- This patch adds utimes support.
- This patch addresses issue #93.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
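For reference, `utimes()` takes a path and an array of two `struct timeval`: access time first, modification time second. The v2 note's "limit of microseconds" corresponds to requiring `tv_usec` to lie in [0, 999999]. Below is a minimal caller-side sketch of that check. `valid_timeval` and `set_times` are hypothetical helpers, not OSv's vfs code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <sys/stat.h>
#include <sys/time.h>

// Microseconds must stay within one second.
bool valid_timeval(const struct timeval& tv) {
    return tv.tv_usec >= 0 && tv.tv_usec < 1000000;
}

// Set atime/mtime of path; returns 0, or -1 with errno set (like utimes()).
int set_times(const char* path, long atime_sec, long mtime_sec) {
    struct timeval times[2];
    times[0].tv_sec = atime_sec;  times[0].tv_usec = 0;  // access time
    times[1].tv_sec = mtime_sec;  times[1].tv_usec = 0;  // modification time
    for (const auto& tv : times)
        if (!valid_timeval(tv)) { errno = EINVAL; return -1; }
    return utimes(path, times);
}
```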

vfs: Unify attribute flags into a common place · 1519d3d1

Raphael S. Carvalho authored 11 years ago


Attribute flags were moved from 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h'
to 'include/osv/vnode_attr.h'

'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' now includes 'include/osv/vnode_attr.h'
exactly at the place the flags were previously located.

'fs/vfs/vfs.h' includes 'include/osv/vnode_attr.h' because functions that rely on the setattr
feature must specify the flags corresponding to the attr fields that are going to be changed.

Approach suggested by Nadav Har'El.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Warn about incorrect use of percpu<> / PERCPU(..). · 8add1b91

Nadav Har'El authored 11 years ago


This patch makes incorrect usage of percpu<>/PERCPU() cause
compilation errors instead of silent runtime corruption.

Thanks to Dmitry for first noticing this issue in xen_intr.cc (see his
separate patch), and to Avi for suggesting a compile-time fix.

With this patch:

1. Using percpu<...> to *define* a per-cpu variable fails compilation.
   Instead, PERCPU(...) must be used for the definition, which is important
   because it places the variable in the ".percpu" section.

2. If a *declaration* is needed additionally (e.g., for a static class
   member), percpu<...> must be used, not PERCPU().
   Trying to use PERCPU() for declaration will cause a compilation error.

3. PERCPU() only works on statically-constructed objects - global variables,
   static function-variables and static class-members. Trying to use it
   on a dynamically-constructed object - stack variable, class field,
   or operator new - will cause a compilation error.

With this patch, the bug in xen_intr.cc would have been caught at
compile time.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Nov 25, 2013

mmu: MAP_POPULATE support for anon mmap() · fb24dc3e

Pekka Enberg authored 11 years ago


Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

mmu: Anonymous memory demand paging · c1d5fccb

Pekka Enberg authored 11 years ago


Switch to demand paging for anonymous virtual memory.

I used SPECjvm2008 to verify the performance impact. The numbers are mostly
the same with a few exceptions, most visible in the 'serial' benchmark.
However, there's quite a lot of variance between SPECjvm2008 runs, so I
wouldn't read too much into them.

As we need the demand paging mechanism and the performance numbers
suggest that the implementation is reasonable, I'd merge the patch as-is
and optimize it later.

  Before:

    Running specJVM2008 benchmarks on an OSV guest.
    Score on compiler.compiler: 331.23 ops/m
    Score on compiler.sunflow: 131.87 ops/m
    Score on compress: 118.33 ops/m
    Score on crypto.aes: 41.34 ops/m
    Score on crypto.rsa: 204.12 ops/m
    Score on crypto.signverify: 196.49 ops/m
    Score on derby: 170.12 ops/m
    Score on mpegaudio: 70.37 ops/m
    Score on scimark.fft.large: 36.68 ops/m
    Score on scimark.lu.large: 13.43 ops/m
    Score on scimark.sor.large: 22.29 ops/m
    Score on scimark.sparse.large: 29.35 ops/m
    Score on scimark.fft.small: 195.19 ops/m
    Score on scimark.lu.small: 233.95 ops/m
    Score on scimark.sor.small: 90.86 ops/m
    Score on scimark.sparse.small: 64.11 ops/m
    Score on scimark.monte_carlo: 145.44 ops/m
    Score on serial: 94.95 ops/m
    Score on sunflow: 73.24 ops/m
    Score on xml.transform: 207.82 ops/m
    Score on xml.validation: 343.59 ops/m

  After:

    Score on compiler.compiler: 346.78 ops/m
    Score on compiler.sunflow: 132.58 ops/m
    Score on compress: 116.05 ops/m
    Score on crypto.aes: 40.26 ops/m
    Score on crypto.rsa: 206.67 ops/m
    Score on crypto.signverify: 194.47 ops/m
    Score on derby: 175.22 ops/m
    Score on mpegaudio: 76.18 ops/m
    Score on scimark.fft.large: 34.34 ops/m
    Score on scimark.lu.large: 15.00 ops/m
    Score on scimark.sor.large: 24.80 ops/m
    Score on scimark.sparse.large: 33.10 ops/m
    Score on scimark.fft.small: 168.67 ops/m
    Score on scimark.lu.small: 236.14 ops/m
    Score on scimark.sor.small: 110.77 ops/m
    Score on scimark.sparse.small: 121.29 ops/m
    Score on scimark.monte_carlo: 146.03 ops/m
    Score on serial: 87.03 ops/m
    Score on sunflow: 77.33 ops/m
    Score on xml.transform: 205.73 ops/m
    Score on xml.validation: 351.97 ops/m

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
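Demand paging, together with the MAP_POPULATE option from fb24dc3e, can be illustrated with a toy model. It is not OSv's mmu code:

- The VMA reserves a range, but a page gets zeroed backing memory only on first touch, which simulates a page fault.
- A populate flag pre-faults every page at map time instead.

`anon_vma` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

class anon_vma {
public:
    static constexpr std::size_t page_size = 4096;

    anon_vma(std::size_t npages, bool populate) : _npages(npages) {
        if (populate)
            for (std::size_t i = 0; i < npages; ++i) fault(i);  // MAP_POPULATE-like
    }

    // Access a byte; the page is faulted in on first touch.
    std::uint8_t& at(std::size_t offset) {
        std::size_t page = offset / page_size;
        assert(page < _npages);
        auto it = _pages.find(page);
        if (it == _pages.end()) it = fault(page);
        return it->second[offset % page_size];
    }

    std::size_t resident_pages() const { return _pages.size(); }

private:
    using page_ptr = std::unique_ptr<std::uint8_t[]>;

    // Anonymous memory must read as zero, so allocate a zero-filled page.
    std::map<std::size_t, page_ptr>::iterator fault(std::size_t page) {
        page_ptr p(new std::uint8_t[page_size]());
        return _pages.emplace(page, std::move(p)).first;
    }

    std::size_t _npages;
    std::map<std::size_t, page_ptr> _pages;
};
```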

mmu: VMA permission flags · 8a56dc8c

Pekka Enberg authored 11 years ago


Add permission flags to VMAs. They will be used by mprotect() and the
page fault handler.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
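The two uses named above can be sketched with a bitmask of permission bits stored per VMA:

- The page-fault handler checks the faulting access against the VMA's bits.
- An mprotect()-style operation replaces those bits.

The bit names and `vma_perm_sketch` are hypothetical, not OSv's definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical permission bits for a VMA.
enum perm : std::uint8_t {
    perm_read  = 1 << 0,
    perm_write = 1 << 1,
    perm_exec  = 1 << 2,
};

struct vma_perm_sketch {
    std::uintptr_t start, end;  // [start, end)
    unsigned perms;
};

// Fault-handler check: the address must be inside the VMA and every
// requested access bit must be granted.
bool access_allowed(const vma_perm_sketch& v, std::uintptr_t addr, unsigned access) {
    if (addr < v.start || addr >= v.end) return false;
    return (v.perms & access) == access;
}

// mprotect()-style update: replace the VMA's permission bits.
void protect(vma_perm_sketch& v, unsigned perms) { v.perms = perms; }
```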
