- Dec 30, 2013
-
-
Gleb Natapov authored
mprotect(PROT_WRITE) on a file opened as read only should fail, but the current mprotect() implementation is missing the check. The patch implements it. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
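The expected behaviour can be probed from user space with a small sketch like the one below. It assumes a Linux-like system, a writable /tmp, and a MAP_SHARED mapping (the commit message does not say which mapping type it targets); the function name and the scratch-file path are made up for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Maps a file that was opened O_RDONLY with MAP_SHARED and then asks for
// PROT_WRITE. Returns the errno left by mprotect(), 0 if the call
// unexpectedly succeeded, or -1 if the scratch file could not be set up.
int mprotect_write_on_readonly_file()
{
    char path[] = "/tmp/mprotect-demo-XXXXXX";  // hypothetical scratch file
    int wfd = mkstemp(path);
    if (wfd < 0) return -1;
    if (ftruncate(wfd, 4096) != 0) { close(wfd); unlink(path); return -1; }
    close(wfd);

    int fd = open(path, O_RDONLY);               // read-only descriptor
    unlink(path);                                // file lives until unmapped
    if (fd < 0) return -1;

    void* p = mmap(nullptr, 4096, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return -1;

    errno = 0;
    int rc = mprotect(p, 4096, PROT_WRITE);      // should be refused
    int err = (rc == 0) ? 0 : errno;
    munmap(p, 4096);
    return err;
}
```

On Linux the refused call reports EACCES; whether OSv uses the same error code is not stated in the commit message.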
-
Or Cohen authored
getgrgid_r(3) is needed when querying file attributes from Java (see java.nio.file.Files.readAttributes()). This is needed for the long format (-l) flag of ls. getgrgid_r also requires sysconf(_SC_GETGR_R_SIZE_MAX). Signed-off-by:
Or Cohen <orc@fewbytes.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
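A typical caller pairs the two interfaces roughly as follows. This is a sketch of common usage, not code from the patch; the helper name group_name, the 1024-byte fallback and the 1 MiB growth cap are assumptions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <grp.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Looks up the name of a group by gid. Returns an empty string when there
// is no matching entry or the lookup fails.
std::string group_name(gid_t gid)
{
    // sysconf() only gives a size hint and may return -1 (no limit known).
    long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
    std::vector<char> buf(hint > 0 ? static_cast<std::size_t>(hint) : 1024);

    struct group grp;
    struct group* result = nullptr;
    for (;;) {
        int rc = getgrgid_r(gid, &grp, buf.data(), buf.size(), &result);
        if (rc == ERANGE && buf.size() < (1u << 20)) {
            buf.resize(buf.size() * 2);          // buffer too small, retry
            continue;
        }
        if (rc != 0 || result == nullptr) return std::string();
        return std::string(result->gr_name);
    }
}
```

Calling group_name(getegid()) returns the caller's group name where the group database has an entry for it.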
-
- Dec 24, 2013
-
-
Nadav Har'El authored
We use sched::thread::attr to pass parameters to sched::thread creation, e.g., to create a thread with non-default stack parameters, pinned to a particular CPU, or a detached thread.

Previously we had constructors taking many combinations of stack size (integer), pinned cpu (cpu*) and detached (boolean), and doing "the right thing". However, this makes the code hard to read (what does attr(4096) specify?) and the constructors hard to extend with new parameters.

Replace the attr() constructors with the so-called "named parameter" idiom: attr now only has a null constructor attr(), and one modifies it with calls to pin(cpu*), detach(), or stack(size). For example,

    attr()                                   // default attributes
    attr().pin(sched::cpus[0])               // pin to cpu 0
    attr().stack(4096).pin(sched::cpus[0])   // pin and non-default stack

and so on. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
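The idiom itself is easy to sketch outside of OSv. The class below is a stand-alone illustration, not the real sched::thread::attr: the cpu struct, the member names and the accessors are assumptions; only pin(), detach() and stack() come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

struct cpu { unsigned id; };   // stand-in for sched::cpu

class attr {
public:
    attr() = default;          // the only constructor: default attributes

    // Each setter returns *this so that calls can be chained.
    attr& pin(cpu* c)             { _pinned_cpu = c;   return *this; }
    attr& detach()                { _detached = true;  return *this; }
    attr& stack(std::size_t size) { _stack_size = size; return *this; }

    cpu* pinned_cpu() const        { return _pinned_cpu; }
    bool detached() const          { return _detached; }
    std::size_t stack_size() const { return _stack_size; }

private:
    cpu* _pinned_cpu = nullptr;    // nullptr: not pinned
    bool _detached = false;
    std::size_t _stack_size = 0;   // 0: use the default stack size
};
```

Adding a new parameter then means adding one setter, rather than another constructor overload for every combination.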
-
- Dec 20, 2013
-
-
Nadav Har'El authored
This patch implements Linux's timerfd_*() system calls, declared in <sys/timerfd.h>. These define a file descriptor, usable for read() or poll() and friends, which becomes readable when a timer expires. This aspires to be a full implementation of timerfd, with all the intricate details explained in timerfd_create(2). timerfd was added to Linux five years ago (Linux 2.6.25). Boost's asio, in particular, uses this feature if it thinks it is available. Fixes #129. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
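From the application's side, the interface is used roughly like this. A minimal sketch assuming a Linux-compatible <sys/timerfd.h>; the helper name wait_one_shot_ms is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>
#include <unistd.h>

// Arms a one-shot timer and blocks in read() until it expires.
// Returns the expiration count delivered by read(), or 0 on error.
std::uint64_t wait_one_shot_ms(long ms)
{
    if (ms <= 0) return 0;                      // a zero it_value would disarm the timer

    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) return 0;

    struct itimerspec spec{};                   // it_interval stays zero: one shot
    spec.it_value.tv_sec = ms / 1000;
    spec.it_value.tv_nsec = (ms % 1000) * 1000000L;
    if (timerfd_settime(fd, 0, &spec, nullptr) != 0) {
        close(fd);
        return 0;
    }

    std::uint64_t expirations = 0;              // read() always transfers 8 bytes
    ssize_t n = read(fd, &expirations, sizeof expirations);
    close(fd);
    return n == static_cast<ssize_t>(sizeof expirations) ? expirations : 0;
}
```

The same descriptor could instead be handed to poll() alongside sockets, which is how event loops such as asio typically consume it.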
-
- Dec 19, 2013
-
-
Gleb Natapov authored
mprotect() should fail with ENOMEM if it is called on a non-mapped virtual address, but this check is done by mmu::ismapped(). Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 18, 2013
-
-
Nadav Har'El authored
af_local.h declares a couple of functions implemented in af_local.cc. There is no reason for pipe.cc to include it. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 15, 2013
-
-
Nadav Har'El authored
This patch changes backtrace() to use the _Unwind_* facilities provided by the GCC runtime (libgcc_eh.a), instead of the separate libunwind.a. After this patch, we don't use libunwind.a in OSv any more, and it can be removed (see issue #83). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
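A backtrace built on the GCC runtime unwinder can look roughly like the sketch below. It is not the code from the patch; the names trace_state, record_frame and sketch_backtrace are made up, while _Unwind_Backtrace() and _Unwind_GetIP() are the libgcc entry points declared in <unwind.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <unwind.h>

struct trace_state {
    void** pc;             // caller-provided array of program counters
    std::size_t capacity;  // number of slots in pc
    std::size_t count;     // slots filled so far
};

// Called by the unwinder once per stack frame, innermost frame first.
static _Unwind_Reason_Code record_frame(struct _Unwind_Context* ctx, void* arg)
{
    trace_state* st = static_cast<trace_state*>(arg);
    if (st->count >= st->capacity)
        return _URC_END_OF_STACK;              // any non-zero code stops the walk
    st->pc[st->count++] = reinterpret_cast<void*>(_Unwind_GetIP(ctx));
    return _URC_NO_REASON;                     // keep walking
}

// Fills pc[] with up to `capacity` return addresses and returns how many
// frames were recorded.
std::size_t sketch_backtrace(void** pc, std::size_t capacity)
{
    trace_state st{pc, capacity, 0};
    _Unwind_Backtrace(record_frame, &st);
    return st.count;
}
```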
-
- Dec 13, 2013
-
-
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 12, 2013
-
-
Pekka Enberg authored
Make sure that the address range passed to munmap() is actually mapped. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Simplify mmap() by converting flags and permissions in one place. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Move mincore() to libc/mman.cc where all other memory mapping libc functions are. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Add a mmu::is_page_aligned() helper function and use it to get rid of open-coded checks. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
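Such a helper is typically a one-line mask test. The sketch below assumes a 4096-byte page and takes a plain integer address; the actual signature in mmu is not shown in the commit message.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t page_size = 4096;   // assumption: 4 KiB pages

// True when addr is a multiple of the page size. This replaces open-coded
// checks such as `(addr & 0xfff) == 0` scattered around the callers.
constexpr bool is_page_aligned(std::uintptr_t addr)
{
    return (addr & (page_size - 1)) == 0;
}
```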
-
- Dec 10, 2013
-
-
Pekka Enberg authored
Nadav Har'El explains: Traditionally, functions which succeed do NOT set errno to zero, but rather leave it unchanged (errno(3) on Linux says, for example, that "errno is never set to zero by any system call or library function."). Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
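The consequence for callers is that errno must be cleared by hand before a call whose failure can only be detected through errno. A small sketch using strtol(); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Parses a decimal long. `ok` reports whether the whole string was consumed
// without a range error.
long parse_long(const char* s, bool& ok)
{
    errno = 0;                      // the caller clears errno; strtol() never will
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    ok = (errno == 0) && end != s && *end == '\0';
    return v;
}

// A successful call must not reset a stale errno to zero.
bool success_leaves_errno_nonzero()
{
    errno = ERANGE;                 // pretend an earlier call failed
    (void)std::strtol("1", nullptr, 10);
    return errno != 0;
}
```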
-
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
When seeing this flag, pages faulted in should not be filled with zeroes or any other pattern, and should rather be left alone in whatever state we find them in. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 09, 2013
-
-
Raphael S. Carvalho authored
umount2 should call sys_umount2 instead. Add umount that calls sys_umount. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
The remove() function is part of the ISO C 1989 standard, and used, for example, to implement Java's File.delete(). It's supposed to remove a file, regardless of whether unlink() or rmdir() is needed to remove it. Our implementation (from Musl's) assumed that unlink() on a directory fails with EISDIR, and only in that case did it try rmdir(). However, returning EISDIR on unlink() is a Linux extension, which (deliberately) goes against the Posix standard, which specifies that EPERM should be returned in that case. Our ZFS implementation of unlink, following Solaris and FreeBSD (and not Linux), returns EPERM in that case. This meant that remove() used to fail to delete empty directories, and Java code (like the SpecJVM2008 "derby" benchmark) using it to recursively delete a directory left behind undeleted empty directories. So this patch fixes remove() to try rmdir() if unlink() returned either the Linux-specific EISDIR, or the Posix-standard EPERM. It also adds to the readdir test another test which verifies that remove() can delete all files in a directory - both regular files and empty directories. Fixes #112. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
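The fixed logic amounts to the following sketch. It is a paraphrase of the description above rather than the patched source; sketch_remove and the demo helper (which assumes a writable /tmp) are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Removes a file or an empty directory. rmdir() is tried when unlink()
// reports either the Linux-specific EISDIR or the POSIX-standard EPERM.
int sketch_remove(const char* path)
{
    int r = unlink(path);
    if (r != 0 && (errno == EISDIR || errno == EPERM))
        r = rmdir(path);
    return r;
}

// Creates a scratch directory holding one file, then removes both through
// sketch_remove(). Returns true when every step succeeded.
bool demo_remove_file_and_empty_dir()
{
    char dir[] = "/tmp/remove-demo-XXXXXX";    // hypothetical scratch location
    if (mkdtemp(dir) == nullptr) return false;

    std::string file = std::string(dir) + "/f";
    int fd = open(file.c_str(), O_CREAT | O_WRONLY, 0600);
    if (fd < 0) { rmdir(dir); return false; }
    close(fd);

    return sketch_remove(file.c_str()) == 0 && sketch_remove(dir) == 0;
}
```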
-
- Dec 08, 2013
-
-
Glauber Costa authored
I needed to call detach in a test code of mine, and this isn't implemented. The code I wrote to use it may or may not stay in the end, but nevertheless, let's implement it. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
set_cleanup is quite a complicated piece of code. It is very easy to get it to race with other thread destruction sites, which was made abundantly clear when we tried to implement pthread detach. This patch tries to make it easier, by restricting how and when set_cleanup can be called. The trick here is that currently, a thread may or may not have a cleanup function, and through a call to set_cleanup, our decision to clean up may change. From this point on, set_cleanup will only tell us *how* to clean up. If and when is a decision that we will make ourselves. For instance, if a thread is block-local, the destructor will be called by the end of the block. In that case, the _cleanup function will be there anyhow: we'll just not call it. We're setting here a default cleanup function for all created threads, that just deletes the current thread object. Anything coming from pthread will try to override it by also deleting the pthread object. And again, it is important to note that they will set up those cleanup functions unconditionally. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 05, 2013
-
-
Gleb Natapov authored
pthread_condattr_init() is needed for JDK8 to run. Add stub for now. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
__vsnprintf_chk() passed the wrong length argument to the vsnprintf() call. I'm not aware of any specific bug this solves, but I found this error while auditing the *_chk() functions to figure out why "rogue" works when compiled with -DUSE_FORTIFY_LEVEL=1 but not with USE_FORTIFY_LEVEL=2. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
-
-
Nadav Har'El authored
When source code is compiled with -D_FORTIFY_SOURCE on Linux, various functions are sometimes replaced by __*_chk variants (e.g., __strcpy_chk) which can help avoid buffer overflows when the compiler knows the buffer's size during compilation. If we want to run source compiled on Linux with -D_FORTIFY_SOURCE (either deliberately or unintentionally - see issue #111), we need to implement these functions otherwise the program will crash because of a missing symbol. We already implement a bunch of _chk functions, but we are definitely missing some more. This patch implements 6 more _chk functions which are needed to run the "rogue" program (mentioned in issue #111) when compiled with -D_FORTIFY_SOURCE=1. Following the philosophy of our existing *_chk functions, we do not aim for either ultimate performance or iron-clad security for our implementation of these functions. If this becomes important, we should revisit all our *_chk functions. When compiled with -D_FORTIFY_SOURCE=2, rogue still doesn't work, but not because of a missing symbol, but because it fails reading the terminfo file for a yet unknown reason (a patch for that issue will be sent separately). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
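In the spirit described above (correctness first, no attempt at peak performance), a _chk variant can be as simple as the sketch below. The name sketch_strcpy_chk is a stand-in chosen to avoid clashing with the real libc symbol, and aborting on overflow is an assumption about the failure mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Copies src into dest. destlen is the size of the destination buffer as
// known to the compiler at the call site. Aborts instead of overflowing.
char* sketch_strcpy_chk(char* dest, const char* src, std::size_t destlen)
{
    if (std::strlen(src) >= destlen)   // no room for the string plus its NUL
        std::abort();
    return std::strcpy(dest, src);
}
```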
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 03, 2013
-
-
Raphael S. Carvalho authored
Besides simplifying mmu::map_file interface, let's make it more similar to mmu::map_anon. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
(flags & MAP_ANONYMOUS) must be used instead of (fd == -1) to determine the mapping type, as the latter is a valid argument to file mappings. Tests related to files were added to mmap_validate_file. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> [ penberg: cleanups ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
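Following that rule, the mapping type is decided from the flags alone. A minimal sketch with a hypothetical helper name:

```cpp
#include <cassert>
#include <sys/mman.h>

// Decides the mapping type from the flags. The file descriptor plays no
// part in the decision, so a check like `fd == -1` is deliberately absent.
bool is_anonymous_mapping(int flags)
{
    return (flags & MAP_ANONYMOUS) != 0;
}
```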
-
Raphael S. Carvalho authored
Rename mmap_validate_flags to mmap_validate as it's not only related to flags now. Add new tests to check bad parameter values. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> [ penberg: cleanups ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Currently, we only check whether neither MAP_PRIVATE nor MAP_SHARED was passed to mmap. However, if it is called with both flags, EINVAL should be returned too. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
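The rule "exactly one of the two flags" can be written as a single comparison. The sketch below only models the validation step described in this commit message; the function name is hypothetical and it does not call mmap() itself.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/mman.h>

// Returns 0 when exactly one of MAP_PRIVATE and MAP_SHARED is set,
// and EINVAL when neither or both are set.
int validate_sharing_flags(int flags)
{
    bool is_private = (flags & MAP_PRIVATE) != 0;
    bool is_shared  = (flags & MAP_SHARED) != 0;
    return (is_private != is_shared) ? 0 : EINVAL;
}
```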
-
- Nov 27, 2013
-
-
Nadav Har'El authored
We only open file descriptor 1 relatively late in our boot process (see vfs_init() in fs/vfs/main.cc). We would like to be able to use stdout (and C++'s std::cout) much earlier than that - examples include ACPI's information messages (before c9dadf2d) and our "--help" command line parameter. Before this patch, early writes to stdout almost work, but with a strange twist: They only write the string up to the last newline, and whatever is left is buffered until much later - when all those "string ends" are lumped together. The basis of Musl's stdio write mechanism is the "f->write()" method. It needs to write *two* things: Whatever we have buffered previously, and the new string given to it. __stdio_write() is the default implementation, which does this correctly using writev(). But our early implementation, __stdout_write(), only wrote the new string, and the buffered part remained buffered, collecting various string parts until it was finally flushed when we switched to the correct __stdio_write(). This patch fixes __stdout_write() to write both strings as expected. Fixes #104. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
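The "write two things" contract maps naturally onto a two-element iovec. The sketch below shows the idea with plain file descriptors; it is not Musl's or OSv's code, it ignores partial writes, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Writes the previously buffered bytes followed by the new bytes with one
// writev() call, so that nothing is left behind in the buffer.
ssize_t write_buffered_then_new(int fd,
                                const char* buffered, std::size_t buffered_len,
                                const char* fresh, std::size_t fresh_len)
{
    struct iovec iov[2];
    iov[0].iov_base = const_cast<char*>(buffered);
    iov[0].iov_len = buffered_len;
    iov[1].iov_base = const_cast<char*>(fresh);
    iov[1].iov_len = fresh_len;
    return writev(fd, iov, 2);
}

// Sends a buffered prefix and a new string through a pipe and returns what
// arrives at the read end.
std::string demo_write_through_pipe()
{
    int fds[2];
    if (pipe(fds) != 0) return std::string();

    const char buffered[] = "early ";
    const char fresh[] = "message\n";
    ssize_t n = write_buffered_then_new(fds[1], buffered, sizeof buffered - 1,
                                        fresh, sizeof fresh - 1);
    close(fds[1]);

    char out[64];
    ssize_t got = (n > 0) ? read(fds[0], out, sizeof out) : 0;
    close(fds[0]);
    return got > 0 ? std::string(out, static_cast<std::size_t>(got)) : std::string();
}
```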
-
Pekka Enberg authored
Nadav Har'El reports that tst-pipe.so starts to hang some of the time after commit c1d5fccb ("mmu: Anonymous memory demand paging"). Tracing page faults points to pthread stacks, which are now demand faulted.

Avi Kivity explains:

  It's a logical bug in our design. User code runs on mmap()ed stacks, then calls "kernel" code, which doesn't tolerate page faults (interrupts disabled, preemption disabled, already in the page fault path, whatever).

  Possible solutions:

  - insert "thunk code" between user and kernel code that switches the stacks to known resident stacks. We could abuse the elf linker code to do that for us, at run time.
  - use -fsplit-stack to allow a dynamically allocated, discontiguous stack on physical memory
  - use map_populate and live with the memory wastage

Switch to map_populate as a stop-gap measure until OSv "kernel" code is able to deal with page faults. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 26, 2013
-
-
Avi Kivity authored
We previously had the POSIX variant only. Implement the GNU variant as well, and update the header to point to the correct function based on the dialect selected. The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI standards. This fixes calls to strerror_r() from binaries which were compiled with _GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for BSD derived source. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
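Portable callers often cope with the two variants by overloading on the return type, as in the sketch below. This is a common user-side technique, not part of the patch; pick_message and error_message are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// POSIX variant: returns int, and the message is in buf on success.
inline const char* pick_message(int rc, const char* buf)
{
    return rc == 0 ? buf : "unknown error";
}

// GNU variant: returns a pointer that may or may not point into buf.
inline const char* pick_message(const char* msg, const char* /*buf*/)
{
    return msg;
}

// Overload resolution picks the right helper for whichever strerror_r()
// the headers declare for the selected dialect.
std::string error_message(int errnum)
{
    char buf[256];
    buf[0] = '\0';
    return pick_message(strerror_r(errnum, buf, sizeof buf), buf);
}
```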
-
- Nov 25, 2013
-
-
Pekka Enberg authored
Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Switch to demand paging for anonymous virtual memory.

I used SPECjvm2008 to verify performance impact. The numbers are mostly the same with few exceptions, most visible in the 'serial' benchmark. However, there's quite a lot of variance between SPECjvm2008 runs so I wouldn't read too much into them.

As we need the demand paging mechanism and the performance numbers suggest that the implementation is reasonable, I'd merge the patch as-is and optimize it later.

Before:

  Running specJVM2008 benchmarks on an OSV guest.
  Score on compiler.compiler: 331.23 ops/m
  Score on compiler.sunflow: 131.87 ops/m
  Score on compress: 118.33 ops/m
  Score on crypto.aes: 41.34 ops/m
  Score on crypto.rsa: 204.12 ops/m
  Score on crypto.signverify: 196.49 ops/m
  Score on derby: 170.12 ops/m
  Score on mpegaudio: 70.37 ops/m
  Score on scimark.fft.large: 36.68 ops/m
  Score on scimark.lu.large: 13.43 ops/m
  Score on scimark.sor.large: 22.29 ops/m
  Score on scimark.sparse.large: 29.35 ops/m
  Score on scimark.fft.small: 195.19 ops/m
  Score on scimark.lu.small: 233.95 ops/m
  Score on scimark.sor.small: 90.86 ops/m
  Score on scimark.sparse.small: 64.11 ops/m
  Score on scimark.monte_carlo: 145.44 ops/m
  Score on serial: 94.95 ops/m
  Score on sunflow: 73.24 ops/m
  Score on xml.transform: 207.82 ops/m
  Score on xml.validation: 343.59 ops/m

After:

  Score on compiler.compiler: 346.78 ops/m
  Score on compiler.sunflow: 132.58 ops/m
  Score on compress: 116.05 ops/m
  Score on crypto.aes: 40.26 ops/m
  Score on crypto.rsa: 206.67 ops/m
  Score on crypto.signverify: 194.47 ops/m
  Score on derby: 175.22 ops/m
  Score on mpegaudio: 76.18 ops/m
  Score on scimark.fft.large: 34.34 ops/m
  Score on scimark.lu.large: 15.00 ops/m
  Score on scimark.sor.large: 24.80 ops/m
  Score on scimark.sparse.large: 33.10 ops/m
  Score on scimark.fft.small: 168.67 ops/m
  Score on scimark.lu.small: 236.14 ops/m
  Score on scimark.sor.small: 110.77 ops/m
  Score on scimark.sparse.small: 121.29 ops/m
  Score on scimark.monte_carlo: 146.03 ops/m
  Score on serial: 87.03 ops/m
  Score on sunflow: 77.33 ops/m
  Score on xml.transform: 205.73 ops/m
  Score on xml.validation: 351.97 ops/m

Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Calling feof on a closed file isn't safe, and the result is undefined. Found while auditing the code. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 21, 2013
-
-
Nadav Har'El authored
prio.hh defines various initialization priorities. The actual numbers don't matter, just the order between them. But when we add too many priorities between existing ones, we may hit a need to renumber. This is plain ugly, and reminds me of Basic programming ;-) So this patch switches to an enum (enum class, actually). We now just have a list of priority names in order, with no numbers. It would have been straightforward, if it weren't for a bug in GCC (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211 ) where the "init_priority" attribute doesn't accept the enum (while the "constructor" attribute does). Luckily, a simple workaround - explicitly casting to int - works. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
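The approach can be sketched as follows. The enumerator names, the recorder type and the explicit base value 101 (the lowest priority GCC leaves to user code) are assumptions for illustration and are not taken from prio.hh; only the enum class and the cast to int come from the description above.

```cpp
#include <cassert>

// Priorities are listed in order; only the first one carries a number.
enum class init_prio : int {
    first = 101,   // assumption: GCC reserves priorities below 101
    console,       // hypothetical name
    scheduler,     // hypothetical name
};

// Plain arrays are zero-initialized before any constructor runs.
static int init_log[2];
static int init_count;

struct recorder {
    explicit recorder(int tag) { init_log[init_count++] = tag; }
};

// Defined in the "wrong" textual order on purpose: the init_priority
// attribute, not the order of definition, decides who is constructed first.
// The explicit cast to int is the workaround for the GCC bug.
recorder late  __attribute__((init_priority((int)init_prio::scheduler))) = recorder(2);
recorder early __attribute__((init_priority((int)init_prio::console)))   = recorder(1);
```

With GCC, `early` is constructed before `late`, so init_log ends up holding 1 followed by 2. Inserting a new priority between two existing ones then only requires adding a name to the enum.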
-
- Nov 14, 2013
-
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Add pthread_kill() stub. Needed by Cassandra when it's stopped with Ctrl-C. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 13, 2013
-
-
Tomasz Grabiec authored
Spotted by Nadav: the libc.threaded field is not set but is used in several 'if' statements when setting the lock_owner field. When 'libc.threaded' is false, the 'lock_owner' of a FILE is set to a special value which indicates no locking. This field is initially set to 0, and the original musl code had logic which, upon creation of the first thread, set it to true and adjusted the 'lock_owner' field of all open files to the value of libc.main_thread. In OSv we had no such logic, which resulted in no locking of the FILE structure. This patch fixes the issue by using threaded mode from the very beginning. We also no longer rely on posix thread existence, so that stdlib can be used very early in the boot process without unexpected behavior. It is used (rightfully or not) for example in ramdisk_init(). We do not have to hold the pthread id in the 'lock_owner' field because the mutex already tracks the owner and we can do the check using the 'mutex_owned()' function. This patch also gets rid of the magic value STDIO_SINGLETHREADED, which is of type pthread_t and was used to disable locking when it was known to be unnecessary. A new field named 'no_locking' is introduced to serve this purpose. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-