Commits · 9b72ad475e9af40231615e10754d9f6d05aeb8bb · Verlässliche Systemsoftware / projects / osv

Dec 30, 2013

mmu: Validate file permission in mprotect() · 0dfda588

Gleb Natapov authored 11 years ago


mprotect(PROT_WRITE) on a file opened as read only should fail,
but current mprotect() implementation is missing the check. The patch
implements it.
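
The behaviour this check is meant to match can be observed on Linux with a short sketch. It assumes a MAP_SHARED mapping of a scratch file under /tmp; the helper name try_make_writable() is hypothetical and is not OSv code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: maps a file opened read-only and asks for PROT_WRITE.
// Returns 0 if mprotect() unexpectedly succeeds, the errno value if it
// fails, or -1 if the setup itself fails.
int try_make_writable()
{
    char path[] = "/tmp/mprotect-demo-XXXXXX";
    int wfd = mkstemp(path);               // create a scratch file
    if (wfd < 0) return -1;
    if (write(wfd, "x", 1) != 1) { close(wfd); unlink(path); return -1; }
    close(wfd);

    int fd = open(path, O_RDONLY);         // reopen it read-only
    unlink(path);                          // the name is no longer needed
    if (fd < 0) return -1;

    void* p = mmap(nullptr, 4096, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                             // the mapping keeps its own reference
    if (p == MAP_FAILED) return -1;

    int result = 0;
    if (mprotect(p, 4096, PROT_READ | PROT_WRITE) != 0) {
        result = errno;                    // Linux reports EACCES here
    }
    munmap(p, 4096);
    return result;
}
```

On Linux the call is expected to fail with EACCES, which is the error mprotect(2) documents for this case.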

Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0dfda588

libc: Add getgrgid_r for group querying · 31973bda

Or Cohen authored 11 years ago


getgrgid_r(3) is needed when querying file attributes from Java (see
java.nio.file.Files.readAttributes()).
This is needed for long format (-l) flag of ls.

getgrgid_r also requires sysconf(_SC_GETGR_R_SIZE_MAX)

Signed-off-by: Or Cohen <orc@fewbytes.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

31973bda

Dec 24, 2013

sched: Overhaul sched::thread::attr construction · eb48b150

Nadav Har'El authored 11 years ago


We use sched::thread::attr to pass parameters to sched::thread creation,
i.e., create a thread with non-default stack parameters, pinned to a
particular CPU, or a detached thread.

Previously we had constructors taking many combinations of stack size
(integer), pinned cpu (cpu*) and detached (boolean), and doing "the
right thing". However, this makes the code hard to read (what does
attr(4096) specify?) and the constructors hard to expand with new
parameters.

Replace the attr() constructors with the so-called "named parameter"
idiom: attr now only has a null constructor attr(), and one modifies
it with calls to pin(cpu*), detach(), or stack(size).

For example,
    attr()                                  // default attributes
    attr().pin(sched::cpus[0])              // pin to cpu 0
    attr().stack(4096).pin(sched::cpus[0])  // pin and non-default stack
    and so on.
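
A minimal sketch of the named-parameter idiom described above, assuming a stand-in cpu type, hypothetical getter names and an arbitrary default stack size (none of these are taken from the OSv sources):

```cpp
#include <cassert>
#include <cstddef>

struct cpu { unsigned id; };   // hypothetical stand-in for sched::cpu

class attr {
public:
    attr() = default;          // the only constructor: default attributes

    // Each setter returns *this so that calls can be chained.
    attr& stack(std::size_t size) { _stack_size = size; return *this; }
    attr& pin(cpu* c)             { _pinned_cpu = c;    return *this; }
    attr& detach(bool on = true)  { _detached = on;     return *this; }

    // Hypothetical getters, used here only to inspect the result.
    std::size_t stack_size() const { return _stack_size; }
    cpu* pinned_cpu() const        { return _pinned_cpu; }
    bool detached() const          { return _detached; }

private:
    std::size_t _stack_size = 65536;   // assumed default, for illustration
    cpu* _pinned_cpu = nullptr;        // nullptr means "not pinned"
    bool _detached = false;
};
```

With this shape, `attr().stack(4096).pin(&cpu0)` reads unambiguously, and a new parameter only needs one more setter instead of another constructor overload.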

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

eb48b150

Dec 20, 2013

Add timerfd_*() system calls · 46e73b66

Nadav Har'El authored 11 years ago


This patch implements Linux's timerfd_*() system calls, declared in
<sys/timerfd.h>. These define a file descriptor, usable for read() or
poll() and friends, which becomes readable when a timer expires.
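
From the caller's side, the interface can be exercised as in the following sketch. It uses the Linux API as documented in timerfd_create(2); the helper name wait_for_timer() is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>
#include <unistd.h>

// Arms a one-shot timer and blocks in read() until it expires.
// Returns the expiration count reported by read(), or 0 on error.
// millis must be greater than zero: an all-zero it_value disarms the timer.
std::uint64_t wait_for_timer(long millis)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) return 0;

    itimerspec spec{};                     // it_interval stays zero: one-shot
    spec.it_value.tv_sec = millis / 1000;
    spec.it_value.tv_nsec = (millis % 1000) * 1000000L;
    if (timerfd_settime(fd, 0, &spec, nullptr) != 0) {
        close(fd);
        return 0;
    }

    std::uint64_t expirations = 0;         // read() fills in an 8-byte counter
    ssize_t n = read(fd, &expirations, sizeof expirations);
    close(fd);
    return n == static_cast<ssize_t>(sizeof expirations) ? expirations : 0;
}
```

The same descriptor could be handed to poll() instead of a blocking read().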

This aspires to be a full implementation of timerfd, with all the intricate
details explained in timerfd_create(2).

timerfd was added to Linux five years ago (Linux 2.6.25). Boost's asio,
in particular, uses this feature if it thinks it is available.

Fixes #129.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

46e73b66

Dec 19, 2013

mprotect() should not fail if it encounters non present pte · 410efce0

Gleb Natapov authored 11 years ago


mprotect() should fail with ENOMEM if it is called on a non-mapped
virtual address, but this check is already done by mmu::ismapped().

Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

410efce0

Dec 18, 2013

Trivial: pipe.cc doesn't need to include af_local.h · 5f9a0a4a

Nadav Har'El authored 11 years ago


af_local.h declares a couple of functions implemented in af_local.cc.
There is no reason for pipe.cc to include it.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

5f9a0a4a

Dec 15, 2013

backtrace(): Use libgcc_eh.a instead of libunwind.a. · 1d581c75

Nadav Har'El authored 11 years ago


This patch changes backtrace() to use the _Unwind_* facilities provided
by the GCC runtime (libgcc_eh.a), instead of the separate libunwind.a.

After this patch, we don't use libunwind.a in OSv any more, and it can
be removed (see issue #83).
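
For illustration, a backtrace built on the _Unwind_* interface from <unwind.h> might look like the sketch below. The names backtrace_sketch(), trace_state and record_frame() are hypothetical; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unwind.h>

namespace {

struct trace_state {
    void** pcs;             // caller-provided output buffer
    std::size_t capacity;   // number of slots in pcs
    std::size_t count;      // slots filled so far
};

// Called by the unwinder once per stack frame.
_Unwind_Reason_Code record_frame(_Unwind_Context* ctx, void* arg)
{
    auto* st = static_cast<trace_state*>(arg);
    if (st->count >= st->capacity) {
        return _URC_END_OF_STACK;          // buffer full: stop walking
    }
    st->pcs[st->count++] = reinterpret_cast<void*>(_Unwind_GetIP(ctx));
    return _URC_NO_REASON;                 // continue with the next frame
}

} // namespace

// Fills pcs with up to capacity return addresses; returns how many were stored.
std::size_t backtrace_sketch(void** pcs, std::size_t capacity)
{
    trace_state st{pcs, capacity, 0};
    _Unwind_Backtrace(record_frame, &st);
    return st.count;
}
```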

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

1d581c75

Dec 13, 2013

umount2: Add parameter checks · 2afd6f60

Raphael S. Carvalho authored 11 years ago


Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2afd6f60

Dec 12, 2013

munmap: Fail if munmap address range is not mapped · df69a596

Pekka Enberg authored 11 years ago

Make sure that the address range passed to munmap() is actually mapped.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

df69a596

mman: Simplify mmap() · 6495e14c

Pekka Enberg authored 11 years ago

Simplify mmap() by converting flags and permissions in one place.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

6495e14c

mman: Move mincore() to libc/mman.cc · c41a394a

Pekka Enberg authored 11 years ago


Move mincore() to libc/mman.cc where all other memory mapping libc
functions are.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c41a394a

mmu: Add is_page_aligned() helper function · 71f1ffda

Pekka Enberg authored 11 years ago


Add a mmu::is_page_aligned() helper function and use it to get rid of
open-coded checks.
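
Such a helper is typically a one-line mask test. The sketch below assumes a 4096-byte page and takes the address as an integer; the actual signature in OSv may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t page_size = 4096;   // assumption: 4 KiB pages

// True if addr is a multiple of the page size (page_size is a power of two).
constexpr bool is_page_aligned(std::uintptr_t addr)
{
    return (addr & (page_size - 1)) == 0;
}
```

It replaces open-coded expressions such as `addr & (page_size - 1)` at each call site.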

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

71f1ffda

Dec 10, 2013

mman: Fix errno handling in mmap and munmap · 2358ac62

Pekka Enberg authored 11 years ago


Nadav Har'El explains:

  Traditionally, functions which succeed do NOT set errno to zero, but
  rather leave it unchanged (errno(3) on Linux says, for example, that
  "errno is never set to zero by any system call or library function.").

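The convention can be sketched with a hypothetical syscall-style wrapper (unmap_sketch() is an illustrative name, not an OSv function): errno is written only on the failure path.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical wrapper: returns 0 on success, -1 on failure.
int unmap_sketch(bool range_is_mapped)
{
    if (!range_is_mapped) {
        errno = EINVAL;    // failure: report the reason through errno
        return -1;
    }
    return 0;              // success: errno is left exactly as it was
}
```

Callers therefore inspect errno only after a call has reported failure.
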
Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2358ac62

libc: Add munmap validation · 8c57f767

Raphael S. Carvalho authored 11 years ago


Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8c57f767

mmu: support MAP_UNINITIALIZED flag · f7249e73

Glauber Costa authored 11 years ago


When this flag is set, pages that are faulted in should not be filled with
zeroes or any other pattern; they should be left in whatever state we find
them.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f7249e73

Dec 09, 2013

libc/mount: Change umount2 and add umount · 2050ce8c

Raphael S. Carvalho authored 11 years ago

umount2 should call sys_umount2 instead. Add umount that calls sys_umount.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2050ce8c

libc: Fix remove() return value · 7a986ba7

Nadav Har'El authored 11 years ago

The remove() function is part of the ISO C 1989 standard, and used, for
example, to implement Java's File.delete(). It's supposed to remove a
file, regardless of whether unlink() or rmdir() is needed to remove it.

Our implementation (from Musl's) assumed that unlink() on a directory fails
with EISDIR, and only in that case it tried rmdir(). However, returning
EISDIR on unlink() is a Linux extension, which (deliberately) goes against
the Posix standard - which specified EPERM should be returned in that case.
Our ZFS implementation of unlink, following Solaris and FreeBSD (and not
Linux), returns EPERM in that case.

This meant that remove() used to fail deleting empty directories, and
Java code (like the SpecJVM2008 "derby" benchmark) using it to recursively
delete a directory, left behind undeleted empty directories.

So this patch fixes remove() to try rmdir() if unlink() returned either
the Linux-specific EISDIR, or the Posix-standard EPERM. It also adds
to the readdir test another test which verifies that remove() can delete
all files in a directory - both regular files and empty directories.
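
The fallback logic can be sketched as follows. remove_sketch() and demo_remove_empty_dir() are hypothetical names, and the sketch ignores the corner case where EPERM has another cause (rmdir() would then fail and overwrite errno).

```cpp
#include <cassert>
#include <cerrno>
#include <stdlib.h>
#include <unistd.h>

// Removes a file or an empty directory. Returns 0 on success, -1 on failure.
int remove_sketch(const char* path)
{
    if (unlink(path) == 0) {
        return 0;
    }
    // EISDIR is what Linux reports for a directory, EPERM is what POSIX
    // specifies; in both cases fall back to rmdir().
    if (errno == EISDIR || errno == EPERM) {
        return rmdir(path);
    }
    return -1;
}

// Demo: create an empty scratch directory and remove it again.
bool demo_remove_empty_dir()
{
    char dir[] = "/tmp/remove-demo-XXXXXX";
    if (mkdtemp(dir) == nullptr) {
        return false;
    }
    return remove_sketch(dir) == 0 && access(dir, F_OK) != 0;
}
```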

Fixes #112.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7a986ba7

Dec 08, 2013

sched: implement pthread_detach · afcf4735

Glauber Costa authored 11 years ago


I needed to call detach in a test code of mine, and this isn't implemented.
The code I wrote to use it may or may not stay in the end, but nevertheless,
let's implement it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

afcf4735

sched: standardize call to _cleanup · d754d662

Glauber Costa authored 11 years ago

set_cleanup is quite a complicated piece of code. It is very easy to get it to
race with other thread destruction sites, which was made abundantly clear when
we tried to implement pthread detach.

This patch tries to make it easier, by restricting how and when set_cleanup can
be called. The trick here is that currently, a thread may or may not have a
cleanup function, and through a call to set_cleanup, our decision to cleanup
may change.

From this point on, set_cleanup will only tell us *how* to cleanup. If and
when, is a decision that we will make ourselves. For instance, if a thread
is block-local, the destructor will be called by the end of the block. In
that case, the _cleanup function will be there anyhow: we'll just not call
it.

We're setting here a default cleanup function for all created threads, that
just deletes the current thread object. Anything coming from pthread will try
to override it by also deleting the pthread object. And again, it is important
to note that they will set up those cleanup functions unconditionally.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

d754d662

Dec 05, 2013

pthread: add stubs for pthread_condattr_* functions · 7257cb91

Gleb Natapov authored 11 years ago


pthread_condattr_init() is needed for JDK8 to run. Add stub for now.

Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7257cb91

Fix error in __vsnprintf_chk() · a9b3e1c3

Nadav Har'El authored 11 years ago


__vsnprintf_chk() passed the wrong length argument to the vsnprintf()
call. I'm not aware of any specific bug this solves, but I found this
error while auditing the *_chk() functions to figure out why "rogue"
works when compiled with -DUSE_FORTIFY_LEVEL=1 but not with
USE_FORTIFY_LEVEL=2.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a9b3e1c3

Dec 04, 2013

Add a few missing __*_chk functions · 2f4b8777

Nadav Har'El authored 11 years ago


When source code is compiled with -D_FORTIFY_SOURCE on Linux, various
functions are sometimes replaced by __*_chk variants (e.g., __strcpy_chk)
which can help avoid buffer overflows when the compiler knows the buffer's
size during compilation.
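
As an illustration of what such a variant does, here is a sketch modelled on the glibc signature of __strcpy_chk(dest, src, destlen). The name strcpy_chk_sketch() is hypothetical, and aborting on overflow is an assumption about the failure policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// destlen is the destination size the compiler determined at build time.
char* strcpy_chk_sketch(char* dest, const char* src, std::size_t destlen)
{
    std::size_t needed = std::strlen(src) + 1;   // include the trailing NUL
    if (needed > destlen) {
        std::abort();                            // would overflow: stop here
    }
    std::memcpy(dest, src, needed);
    return dest;
}
```

When the compiler cannot determine the buffer size, no such check is possible and the plain function is called instead.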

If we want to run source compiled on Linux with -D_FORTIFY_SOURCE (either
deliberately or unintentionally - see issue #111), we need to implement
these functions otherwise the program will crash because of a missing
symbol. We already implement a bunch of _chk functions, but we are
definitely missing some more.

This patch implements 6 more _chk functions which are needed to run
the "rogue" program (mentioned in issue #111) when compiled with
-D_FORTIFY_SOURCE=1.

Following the philosophy of our existing *_chk functions, we do not
aim for either ultimate performance or iron-clad security for our
implementation of these functions. If this becomes important, we
should revisit all our *_chk functions.

When compiled with -D_FORTIFY_SOURCE=2, rogue still doesn't work, but
not because of a missing symbol, but because it fails reading the
terminfo file for a yet unknown reason (a patch for that issue will
be sent separately).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2f4b8777

af_local: convert to a derived class of file · d40418a7

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

d40418a7

pipe: make it a derived class of 'file' · 0343c154

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

0343c154

Dec 03, 2013

mmu: Simplify mmu::map_file interface · c4fb37c3

Raphael S. Carvalho authored 11 years ago


Besides simplifying mmu::map_file interface, let's make it more similar
to mmu::map_anon.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c4fb37c3

mmap: Check file-backed mmap arguments · 25806da3

Raphael S. Carvalho authored 11 years ago


(flags & MAP_ANONYMOUS) must be used instead of (fd == -1) to determine
the mapping type, as the latter can also be passed to file mappings.
Tests related to files were added into mmap_validate_file.
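
A sketch of the resulting rule: the mapping type comes from the flags, and a file mapping with an invalid descriptor is rejected. validate_mapping_fd() is a hypothetical name, and returning EBADF follows what mmap(2) documents on Linux; a real check would also verify that the descriptor is open.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/mman.h>

// Returns 0 if the (flags, fd) pair is acceptable, otherwise an errno value.
int validate_mapping_fd(int flags, int fd)
{
    if (flags & MAP_ANONYMOUS) {
        return 0;                  // anonymous mapping: fd does not matter
    }
    return fd < 0 ? EBADF : 0;     // file mapping: needs a usable descriptor
}
```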

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
[ penberg: cleanups ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

25806da3

mmap: Improve mmap validation · 761a5014

Raphael S. Carvalho authored 11 years ago


Rename mmap_validate_flags to mmap_validate as it's not only related to
flags now.  Add new tests to check bad parameter values.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
[ penberg: cleanups ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

761a5014

pipe: convert to make_file() · c5459392

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c5459392

af_local: convert to make_file() · 247cb9f4

Avi Kivity authored 11 years ago


Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

247cb9f4

mmap: fail if flags contain both MAP_SHARED and MAP_PRIVATE · 6d6a1ea3

Raphael S. Carvalho authored 11 years ago


Currently, we only check if neither MAP_PRIVATE nor MAP_SHARED were
passed to mmap, however, if it was called with both flags, then EINVAL
should be returned.
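
The rule amounts to requiring exactly one of the two flags. A minimal sketch, with the hypothetical name validate_sharing():

```cpp
#include <cassert>
#include <cerrno>
#include <sys/mman.h>

// Returns 0 if exactly one of MAP_SHARED / MAP_PRIVATE is set, else EINVAL.
int validate_sharing(int flags)
{
    bool shared = (flags & MAP_SHARED) != 0;
    bool priv = (flags & MAP_PRIVATE) != 0;
    return shared != priv ? 0 : EINVAL;   // neither or both is an error
}
```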

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

6d6a1ea3

Nov 27, 2013

Fix boot-time stdout support · ad46be1b

Nadav Har'El authored 11 years ago


We only open file descriptor 1 relatively late in our boot process (see
vfs_init() in fs/vfs/main.cc). We would like to be able to use stdout
(and C++'s std::cout) much earlier than that - examples include ACPI's
information messages (before c9dadf2d) and our "--help" command line
parameter.

Before this patch, early writes to stdout almost work, but with a strange
twist: They only write the string up to the last newline, and whatever is
left is buffered until much later - when all those "string ends" are lumped
together.

The basis of Musl's stdio write mechanism is the "f->write()" method.
It needs to write *two* things: Whatever we have buffered previously,
and the new string given to it. __stdio_write() is the default
implementation, which does this correctly using writev(). But our
early implementation, __stdout_write(), only wrote the new string, and
the buffered part remained buffered, collecting various string parts
until it was finally flushed when we switched to the correct __stdio_write.

This patch fixes __stdout_write(), to write both strings as expected.
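
The "write two things" contract can be illustrated with writev(). The sketch below is a simplified stand-in, not Musl's or OSv's code: write_buffered_then_new() is a hypothetical name, and it does not retry after a partial write.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/uio.h>
#include <unistd.h>

// Writes the previously buffered bytes followed by the new bytes with a
// single writev() call. Returns the number of bytes written, or -1.
ssize_t write_buffered_then_new(int fd,
                                const char* buffered, std::size_t buffered_len,
                                const char* fresh, std::size_t fresh_len)
{
    iovec iov[2] = {
        { const_cast<char*>(buffered), buffered_len },   // older data first
        { const_cast<char*>(fresh), fresh_len },         // then the new string
    };
    return writev(fd, iov, 2);
}
```

Writing only the second element is the behaviour the commit message describes as the bug: the first buffer then stays pending until a later flush.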

Fixes #104.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ad46be1b

pthread: Populate mmap'd stack pages · 41efdc1c

Pekka Enberg authored 11 years ago


Nadav Har'El reports that tst-pipe.so starts to hang some of the time
after commit c1d5fccb ("mmu: Anonymous memory demand paging"). Tracing
page faults points to pthread stacks which are now demand faulted.

Avi Kivity explains:

  It's a logical bug in our design.  User code runs on mmap()ed stacks,
  then calls "kernel" code, which doesn't tolerate page faults (interrupts
  disabled, preemption disabled, already in the page fault path,
  whatever).

  Possible solutions:

  - insert "thunk code" between user and kernel code that switches the
    stacks to known resident stacks.  We could abuse the elf linker code
    to do that for us, at run time.
  - use -fsplit-stack to allow a dynamically allocated, discontiguous
    stack on physical memory
  - use map_populate and live with the memory wastage

Switch to map_populate as a stop-gap measure until OSv "kernel" code is
able to deal with page faults.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

41efdc1c

Nov 26, 2013

libc: implement the GNU variant of strerror_r() · 7053ac3a

Avi Kivity authored 11 years ago


We previously had the POSIX variant only.  Implement the GNU variant as well,
and update the header to point to the correct function based on the dialect
selected.

The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI
standards.

This fixes calls to strerror_r() from binaries which were compiled with
_GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for
BSD derived source.
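
The two variants differ in return type: the POSIX one returns an int status and fills the caller's buffer, while the GNU one returns a char* that may or may not point into that buffer. Portable callers sometimes handle both through overloading, as in this sketch (pick_message() and describe_errno() are hypothetical names):

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Chosen when strerror_r() is the POSIX variant (returns int).
inline const char* pick_message(int rc, const char* buf)
{
    return rc == 0 ? buf : "unknown error";
}

// Chosen when strerror_r() is the GNU variant (returns char*).
inline const char* pick_message(const char* msg, const char* /*buf*/)
{
    return msg;
}

std::string describe_errno(int err)
{
    char buf[128] = {0};
    // Overload resolution selects the right helper at compile time.
    return pick_message(strerror_r(err, buf, sizeof buf), buf);
}
```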

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7053ac3a

Nov 25, 2013

mmu: MAP_POPULATE support for anon mmap() · fb24dc3e

Pekka Enberg authored 11 years ago


Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

fb24dc3e

mmu: Anonymous memory demand paging · c1d5fccb

Pekka Enberg authored 11 years ago


Switch to demand paging for anonymous virtual memory.

I used SPECjvm2008 to verify performance impact. The numbers are mostly
the same with few exceptions, most visible in the 'serial' benchmark.
However, there's quite a lot of variance between SPECjvm2008 runs so I
wouldn't read too much into them.

As we need the demand paging mechanism and the performance numbers
suggest that the implementation is reasonable, I'd merge the patch as-is
and optimize it later.

  Before:

    Running specJVM2008 benchmarks on an OSV guest.
    Score on compiler.compiler: 331.23 ops/m
    Score on compiler.sunflow: 131.87 ops/m
    Score on compress: 118.33 ops/m
    Score on crypto.aes: 41.34 ops/m
    Score on crypto.rsa: 204.12 ops/m
    Score on crypto.signverify: 196.49 ops/m
    Score on derby: 170.12 ops/m
    Score on mpegaudio: 70.37 ops/m
    Score on scimark.fft.large: 36.68 ops/m
    Score on scimark.lu.large: 13.43 ops/m
    Score on scimark.sor.large: 22.29 ops/m
    Score on scimark.sparse.large: 29.35 ops/m
    Score on scimark.fft.small: 195.19 ops/m
    Score on scimark.lu.small: 233.95 ops/m
    Score on scimark.sor.small: 90.86 ops/m
    Score on scimark.sparse.small: 64.11 ops/m
    Score on scimark.monte_carlo: 145.44 ops/m
    Score on serial: 94.95 ops/m
    Score on sunflow: 73.24 ops/m
    Score on xml.transform: 207.82 ops/m
    Score on xml.validation: 343.59 ops/m

  After:

    Score on compiler.compiler: 346.78 ops/m
    Score on compiler.sunflow: 132.58 ops/m
    Score on compress: 116.05 ops/m
    Score on crypto.aes: 40.26 ops/m
    Score on crypto.rsa: 206.67 ops/m
    Score on crypto.signverify: 194.47 ops/m
    Score on derby: 175.22 ops/m
    Score on mpegaudio: 76.18 ops/m
    Score on scimark.fft.large: 34.34 ops/m
    Score on scimark.lu.large: 15.00 ops/m
    Score on scimark.sor.large: 24.80 ops/m
    Score on scimark.sparse.large: 33.10 ops/m
    Score on scimark.fft.small: 168.67 ops/m
    Score on scimark.lu.small: 236.14 ops/m
    Score on scimark.sor.small: 110.77 ops/m
    Score on scimark.sparse.small: 121.29 ops/m
    Score on scimark.monte_carlo: 146.03 ops/m
    Score on serial: 87.03 ops/m
    Score on sunflow: 77.33 ops/m
    Score on xml.transform: 205.73 ops/m
    Score on xml.validation: 351.97 ops/m

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c1d5fccb

libc/network: feof shouldn't be used on a closed file · df6278fe

Raphael S. Carvalho authored 11 years ago


Calling feof on a closed file isn't safe, and the result is undefined.
Found while auditing the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

df6278fe

Nov 21, 2013

Replace numbers in prio.hh by automatically defined numbers · 147de06c

Nadav Har'El authored 11 years ago

prio.hh defines various initialization priorities. The actual numbers
don't matter, just the order between them. But when we add too many
priorities between existing ones, we may hit a need to renumber. This
is plain ugly, and reminds me of Basic programming ;-)

So this patch switches to an enum (enum class, actually).
We now just have a list of priority names in order, with no numbers.

It would have been straightforward, if it weren't for a bug in GCC
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211) where the
"init_priority" attribute doesn't accept the enum (while the "constructor"
attribute does). Luckily, a simple workaround - explicitly casting to
int - works.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

147de06c

Nov 14, 2013

libc/pthread.cc: Switch to WARN_STUBBED() · e5b8acd4

Pekka Enberg authored 11 years ago


Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e5b8acd4

libc: pthread_kill() stub · 39b9c04b

Pekka Enberg authored 11 years ago


Add pthread_kill() stub. Needed by Cassandra when it is stopped with
Ctrl-C.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

39b9c04b

Nov 13, 2013

Fix lack of locking in libc in file and memstream related operations · 3f053b8f

Tomasz Grabiec authored 11 years ago

Spotted by Nadav: libc.threaded field is not set but is used in
several 'if' statements when setting the lock_owner field.

When 'libc.threaded' is false then 'lock_owner' of a FILE is set to a
special value which indicates no locking. This field is initially set
to 0 and the original musl code had a logic which upon creation of the
first thread set it to true and adjusted 'lock_owner' field of all
open files to the value of libc.main_thread. In OSv we had no such
logic which resulted in no locking of the FILE structure.

This patch fixes the issue by using threaded mode from the very
beginning. We also do not rely anymore on posix thread existence so
that stdlib can be used very early in the boot process without
unexpected behavior. It is used (rightfully or not) for example in
ramdisk_init(). We do not have to hold the pthread id in the
'lock_owner' field because the mutex already tracks the owner and we
can do the check using 'mutex_owned()' function.

This patch also gets rid of a magic value STDIO_SINGLETHREADED, which
is of type pthread_t and was used to disable locking when it was known
to be not necessary. A new field is introduced named 'no_locking'
which serves this purpose.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3f053b8f