- Apr 24, 2014
Nadav Har'El authored
POSIX allows read() on directories in some filesystems. However, Linux always returns EISDIR in this case, and because we're emulating Linux, so should we, for every filesystem. All our filesystems except ZFS (e.g., ramfs) already return EISDIR when reading a directory, but ZFS doesn't, so this patch adds the missing check in ZFS. This patch is related to issue #94: the first step to fixing #94 is to return the right error when reading a directory. This patch also adds a test case, which can be compiled both on OSv and on Linux, to verify that both have the same behavior. Before the patch, the test succeeded on Linux but failed on OSv when the directory was on ZFS. Instead of fixing zfs_read() as this patch does, I could have fixed sys_read() in vfs_syscalls.cc, which is the top layer of all read() operations, and added there: if (fp->f_dentry && fp->f_dentry->d_vnode->v_type == VDIR) { return EISDIR; } to cover all the filesystems. I decided not to do that, because all filesystems except ZFS already have this check, and because lower layers like zfs_read() have more natural access to d_vnode.
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
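The Linux behavior the new test compares against can be sketched in a few lines of C++. This is not the test added by the patch; read_dir_errno is a hypothetical helper that opens a directory and reports the errno left by read(), which Linux sets to EISDIR.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open `path` as a directory and try to read() from it.
// Returns the errno observed, 0 if read() unexpectedly succeeded, or -1 if
// the directory could not be opened at all.
int read_dir_errno(const char* path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        return -1;
    }
    char buf[16];
    ssize_t n = read(fd, buf, sizeof(buf));
    int err = (n < 0) ? errno : 0;
    close(fd);
    return err;
}
```

On Linux, read_dir_errno("/") returns EISDIR regardless of the filesystem; after this patch, OSv is expected to behave the same way on ZFS.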
-
Takuya ASADA authored
txq_encap decrements txq.avail for each mbuf, so we need to increment it by the same number here.
Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch implements a version of flock() that only performs error checking. It assumes any valid operation succeeds, since we use a single-process model.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
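A minimal sketch of what "only error checking" can mean, assuming the checks are a valid descriptor and a valid operation. flock_check_only is a hypothetical name, not the function from the patch.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical error-checking-only flock(): validate the descriptor and the
// operation, then report success without taking any lock, since in a
// single-process model there is no other process to contend with.
int flock_check_only(int fd, int operation)
{
    // Reject descriptors that are not open.
    if (fcntl(fd, F_GETFD) < 0) {
        errno = EBADF;
        return -1;
    }
    // LOCK_NB may be OR'ed into exactly one of LOCK_SH, LOCK_EX or LOCK_UN.
    switch (operation & ~LOCK_NB) {
    case LOCK_SH:
    case LOCK_EX:
    case LOCK_UN:
        return 0;
    default:
        errno = EINVAL;
        return -1;
    }
}
```

Combinations such as LOCK_SH | LOCK_EX fall into the default case and fail with EINVAL, matching what Linux reports for an invalid operation.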
-
Glauber Costa authored
The jemalloc memory allocator makes heavy use of MADV_DONTNEED to flush pages it is no longer using. Respect that advice. Keep returning -1 for the remaining cases, so we don't fool anybody.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
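The semantics an allocator relies on can be observed on Linux with the sketch below (dontneed_zeroes_page is a hypothetical name). It uses a private anonymous mapping, for which Linux documents that pages are zero-filled on the next access after MADV_DONTNEED.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Map one private anonymous page, dirty it, then advise MADV_DONTNEED.
// Returns true if the page reads back as all zeroes afterwards.
bool dontneed_zeroes_page()
{
    const size_t len = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }
    auto* p = static_cast<unsigned char*>(mem);
    std::memset(p, 0xab, len);                    // dirty the page
    if (madvise(mem, len, MADV_DONTNEED) != 0) {  // drop its contents
        munmap(mem, len);
        return false;
    }
    bool all_zero = true;
    for (size_t i = 0; i < len; ++i) {
        if (p[i] != 0) {
            all_zero = false;
            break;
        }
    }
    munmap(mem, len);
    return all_zero;
}
```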
-
Glauber Costa authored
MongoDB wants it. In general, I am providing the information that is easy to get and ignoring the rest, with the exception of the process count, which seemed easy enough to implement. This is the kind of thing Mongo does with it:
    2014-04-15T09:54:12.322+0000 [clientcursormon] mem (MB) res:670160 virt:25212
    2014-04-15T09:54:12.323+0000 [clientcursormon] mapped (incl journal view):160
    2014-04-15T09:54:12.324+0000 [clientcursormon] connections:0
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This is one of the statistics that shows up in /proc/self/stat under Linux, but it is also generally interesting for applications. Since we don't have separate kernel and user modes, it is very hard to differentiate between "time spent in userspace" and "kernel time spent on behalf of the process". Therefore, we always present system time as 0. If we wanted, we could at least count clearly OSv-specific threads as system time, but there is no need to go through the trouble now.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
It will be used for procfs compatibility. Applications may want to know how much memory is potentially available through mmap mappings (not necessarily allocated).
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
The current code does not allow ioctls on unix sockets, but real-world applications (nginx, in my case) issue them. I am starting by implementing FIONBIO to set blocking/non-blocking mode. The read and write functions already take a flag for this, and they seem to handle it: the internal pipe_buffer implementation does check and act on it. So let's just flip it.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
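From the application side, the call nginx makes looks like the sketch below, shown on Linux with a socketpair(); unix_socket_nonblocking_read_errno is a hypothetical name. Once FIONBIO is set, a read() from an empty socket fails immediately with EAGAIN instead of blocking.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Put one end of an AF_UNIX socket pair into non-blocking mode with
// ioctl(FIONBIO), then read() from it while it is still empty.
// Returns the errno from that read (EAGAIN/EWOULDBLOCK expected),
// 0 if the read unexpectedly succeeded, or -1 if setup failed.
int unix_socket_nonblocking_read_errno()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -1;
    }
    int on = 1;  // non-zero enables non-blocking mode
    int result = -1;
    if (ioctl(fds[0], FIONBIO, &on) == 0) {
        char c;
        ssize_t n = read(fds[0], &c, 1);  // nothing has been written yet
        result = (n < 0) ? errno : 0;
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```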
-
- Apr 23, 2014
Glauber Costa authored
First of all, we need the symbol. So far, it is fine to return EINVAL to everybody except for PR_SET_DUMPABLE. There are situations (see man prctl) in which this flag is automatically cleared, and some programs (nginx here) will then set it to make sure it is on. The flag itself is meaningless to us, since we are not producing per-thread core dumps, and when we do dump, we dump regardless of it. But we need the call to succeed.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
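The policy above fits in a few lines. This is a hypothetical sketch (prctl_sketch is not OSv's function name); the check that arg2 is 0 or 1 mirrors Linux's own argument validation and is an addition of this sketch, not something the commit states.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/prctl.h>

// Accept PR_SET_DUMPABLE (the flag has no effect, since dumping does not
// depend on it) and fail every other option with EINVAL.
int prctl_sketch(int option, unsigned long arg2)
{
    if (option == PR_SET_DUMPABLE && (arg2 == 0 || arg2 == 1)) {
        return 0;  // nothing to record: the flag is meaningless here
    }
    errno = EINVAL;
    return -1;
}
```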
-
Glauber Costa authored
An empty implementation will suit us.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Nadav says: The clock() function is stubbed. We can easily implement it by calling clock_gettime(CLOCK_PROCESS_CPUTIME_ID). Closes #275.
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
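A minimal sketch of that approach; clock_from_cputime is a hypothetical name chosen so it does not clash with the real clock(). The only real work is converting a timespec into ticks of 1/CLOCKS_PER_SEC seconds.

```cpp
#include <cassert>
#include <ctime>

// clock() built on clock_gettime(CLOCK_PROCESS_CPUTIME_ID). Returns
// (clock_t)-1 if the clock is unavailable, as clock() does.
clock_t clock_from_cputime()
{
    timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
        return static_cast<clock_t>(-1);
    }
    // CLOCKS_PER_SEC is 1000000 on XSI systems, so one tick is 1000 ns there.
    constexpr long ns_per_tick = 1000000000L / CLOCKS_PER_SEC;
    return static_cast<clock_t>(ts.tv_sec) * CLOCKS_PER_SEC
         + static_cast<clock_t>(ts.tv_nsec / ns_per_tick);
}
```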
-
Nadav Har'El authored
Compile tst-mmap.so with "-z now", so that symbols are resolved at load time instead of when first needed. The problem was that tst-mmap uses wait_until(), which runs the functions sched::thread::wait() and sched::thread::stop_wait() while preemption is disabled. In about 1 in 10 runs of tst-mmap.so, this caused an assertion-failure crash (symbol resolution caused sleeping while preemption was disabled). Fixes #256.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Same as we currently do for AS: show that value as unlimited.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Enhance our runtime with the getgrnam() function. Although we have no groups, we recognize the group "nobody" for compatibility and return it as group 0.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
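A hypothetical sketch of that behavior (getgrnam_sketch is not the real symbol). The fields are assigned by name rather than by aggregate order, since POSIX does not fix the member order of struct group.

```cpp
#include <cassert>
#include <cstring>
#include <grp.h>

// With no group database, recognize only "nobody" and report it as gid 0;
// any other name is "not found" (nullptr).
struct group* getgrnam_sketch(const char* name)
{
    static char gr_name[] = "nobody";
    static char gr_passwd[] = "";
    static char* gr_mem[] = { nullptr };  // no members listed
    static struct group nobody_group;

    if (name == nullptr || std::strcmp(name, "nobody") != 0) {
        return nullptr;
    }
    nobody_group.gr_name = gr_name;
    nobody_group.gr_passwd = gr_passwd;
    nobody_group.gr_gid = 0;
    nobody_group.gr_mem = gr_mem;
    return &nobody_group;
}
```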
-
Tomasz Grabiec authored
Python 3 no longer supports __cmp__, so we need to define __lt__ to make the objects comparable. This fixes `trace extract` on GDB linked with Python 3.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Whenever you change the working directory through either fchdir() or chdir(), a file structure holding refcnts to the underlying mount point is stored in the t_cwdfp field of main_task. If t_cwdfp was previously set, the older data is freed and replaced by the newer one. The problem is that the refcnts held by the last t_cwdfp are never released afterwards. Changes introduced by this patch:
1) Create a function, and use a lambda to reuse as much code as possible (suggested by Glauber Costa).
2) Create vfs_exit(), which uses the function above to free up main_task resources and then calls unmount_rootfs(). vfs_exit() replaces unmount_rootfs() in the shutdown procedure.
After applying these changes, the refcnt leaks on the root mount point are gone.
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Apr 22, 2014
Avi Kivity authored (https://github.com/avikivity/osv)
This patchset fixes a bad interaction between the debug allocator and virtio. After this patchset, 'make check' still fails, but this is due to genuine use-after-free problems in the VFS, which will be fixed later. Fixes #240.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
The mmu::page_table_root symbol is no longer global. It will take more than that to make it work on ARM.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
The comment states that option names are the same for BSD and Linux, but that is not true for a lot of them. Checking the constants, I have come up with a conversion list. Before this patch, MongoDB complained that getsockopt() returned an error when querying the interval time.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
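The patch's actual conversion list is not reproduced here, so the following is only an illustration of the idea for a few TCP-level options. linux_to_bsd_tcp_option is a hypothetical name; the Linux values come from glibc's <netinet/tcp.h>, and the BSD_* values are copied from FreeBSD's <netinet/tcp.h>.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>

// FreeBSD option numbers (assumed from FreeBSD's <netinet/tcp.h>).
constexpr int BSD_TCP_KEEPIDLE  = 256;
constexpr int BSD_TCP_KEEPINTVL = 512;
constexpr int BSD_TCP_KEEPCNT   = 1024;

// Translate a Linux TCP option number to its BSD counterpart.
// Returns -1 for options this sketch does not translate.
int linux_to_bsd_tcp_option(int linux_opt)
{
    switch (linux_opt) {
    case TCP_NODELAY:   return 1;                  // 1 on both systems
    case TCP_MAXSEG:    return 2;                  // 2 on both systems
    case TCP_KEEPIDLE:  return BSD_TCP_KEEPIDLE;   // 4 on Linux
    case TCP_KEEPINTVL: return BSD_TCP_KEEPINTVL;  // 5 on Linux
    case TCP_KEEPCNT:   return BSD_TCP_KEEPCNT;    // 6 on Linux
    default:            return -1;
    }
}
```

Passing a Linux number such as TCP_KEEPINTVL (5) straight to the BSD stack would name a different option, or none at all, which is the kind of mismatch that made getsockopt() fail.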
-
Glauber Costa authored
We are currently setting these values correctly, but getting them takes us to the default clause. They are all very simple cases of copying a variable, so it's just a matter of doing that. Both options mentioned are used by MongoDB. TCP_KEEPINIT is not, but it can't hurt.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
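The set/get symmetry this restores can be checked with a round trip. This sketch runs on Linux and assumes TCP_KEEPIDLE and TCP_KEEPINTVL as example options; keepalive_roundtrip is a hypothetical helper.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Set a TCP-level option and read it back with getsockopt().
// Returns the value read back, or -1 on any error.
int keepalive_roundtrip(int option, int value)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    int got = -1;
    socklen_t len = sizeof(got);
    if (setsockopt(fd, IPPROTO_TCP, option, &value, sizeof(value)) != 0 ||
        getsockopt(fd, IPPROTO_TCP, option, &got, &len) != 0) {
        got = -1;
    }
    close(fd);
    return got;
}
```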
-
Glauber Costa authored
Since most applications running on OSv will be compiled against glibc on Linux, we need to provide the symbols gnu_get_libc_version and gnu_get_libc_release. Unless we find a case where we have to, we shouldn't lie, though, so answer with OSv-specific data here.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
While we take pride in having no spinlocks in the system, if an application wants to use them, who are we to deny them this God-given right? Some applications implement spinlocks through the pthread interface, which is what I implement here. We did not have any standard trylock mechanism, so one is provided. Other than that, the interface is pretty trivial, except that it seems to provide some protection against deadlocks. We will just ignore that for the moment and assume a well-behaved application.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
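For illustration, a test-and-set spinlock with the same three operations as the pthread_spin_* calls (lock, trylock, unlock) can be built on std::atomic_flag. This is a sketch, not OSv's implementation; spinlock_sketch and counted_increments are hypothetical names, and like the patch it does no deadlock detection.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <thread>
#include <vector>

class spinlock_sketch {
public:
    void lock()
    {
        // Busy-wait until the holder calls unlock().
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    // Like pthread_spin_trylock(): 0 on success, EBUSY if already held.
    int trylock()
    {
        return flag_.test_and_set(std::memory_order_acquire) ? EBUSY : 0;
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increment a shared counter from several threads under the spinlock.
long counted_increments(int threads, int per_thread)
{
    spinlock_sketch lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```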
-
Glauber Costa authored
MongoDB expects that call and would like to guarantee allocation of blocks in the file. It does have a fallback, so for the time being I am just providing the symbol. I have opened issue #265 to track this.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
For now, this does the same as exit(), since we don't support atexit handlers anyway.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
The debug allocator can allocate non-contiguous memory for large requests, but since b7de9871 it uses only one sg entry for the entire buffer. One possible fix is to allocate contiguous memory even under the debug allocator, but in the future we may wish to allow discontiguous allocation when not enough contiguous space is available. So instead we implement a virt_to_phys() variant that takes a range and outputs the physical segments that make it up, and use that to construct a minimal sg list for the input.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
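The range-based translation can be sketched as a page walk that merges physically adjacent pieces. This is an illustration under stated assumptions, not OSv's virt_to_phys(): 4 KiB pages, and a caller-supplied `translate` callback standing in for the real page-table lookup; virt_to_phys_range and phys_segment are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// A physically contiguous piece of a virtual buffer (one sg entry).
struct phys_segment {
    uint64_t phys;
    uint64_t len;
};

constexpr uint64_t page_size = 4096;  // assumption: 4 KiB pages

// Walk [virt, virt + len) one page at a time, translate each piece, and merge
// neighbours whose physical addresses are adjacent. The result is the minimal
// list of segments a scatter-gather list needs for this buffer.
std::vector<phys_segment> virt_to_phys_range(
    uint64_t virt, uint64_t len,
    const std::function<uint64_t(uint64_t)>& translate)
{
    std::vector<phys_segment> out;
    while (len > 0) {
        uint64_t in_page = page_size - (virt % page_size);  // bytes left in page
        uint64_t chunk = in_page < len ? in_page : len;
        uint64_t phys = translate(virt);
        if (!out.empty() && out.back().phys + out.back().len == phys) {
            out.back().len += chunk;  // physically contiguous: extend
        } else {
            out.push_back({phys, chunk});
        }
        virt += chunk;
        len -= chunk;
    }
    return out;
}
```

A buffer whose first two pages happen to be physically adjacent and whose third page lives elsewhere yields two segments rather than one (wrong) or three (wasteful).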
-
Avi Kivity authored
When the debug allocator is active, it calls vpopulate() to manage the page table. This causes a deadlock, as vpopulate() takes vma_list_mutex, while other operations that hold vma_list_mutex may allocate memory. Fix by using a separate mutex for the debug range. This is safe since the page table root is pre-allocated, and any lower page tables will be either in the vma list range or in the debug range, but not both.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Normally, symbol binding in shared objects is lazy, using the PLT/GOT mechanism. This means that a symbol is resolved only when first used. This is great because it speeds up object load, and also allows us never to implement symbols which aren't actually used in any real code path. However, as issue #256 shows, symbols which are used in DSOs from a preemption-disabled context cannot be resolved on first use, because symbol resolution may sleep. Two important examples of this are sched::thread::wait() and sched::thread::stop_wait(), both used by wait_until() while it is in preempt_disable. This patch adds the missing support for the standard DT_BIND_NOW tag. This tag can be added to an object with the "-z now" ld option. When an object has this tag, all its symbols should be resolved at load time, instead of lazily (when first used). Bug #256 can be fixed by linking tst-mmap.so with "-z now" (this will be a separate patch).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
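A loader decides this by scanning the object's dynamic section. The sketch below assumes 64-bit ELF and uses the tag and flag names from <elf.h>; wants_bind_now is a hypothetical name. Besides DT_BIND_NOW, the ELF specification allows the same request to be expressed as the DF_BIND_NOW bit of DT_FLAGS, and GNU tools also use DF_1_NOW in DT_FLAGS_1, so the sketch checks all three; whether the OSv patch does so is not stated here.

```cpp
#include <cassert>
#include <elf.h>

// Return true if the dynamic section (terminated by DT_NULL) asks for all
// symbols to be bound at load time rather than lazily.
bool wants_bind_now(const Elf64_Dyn* dyn)
{
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        switch (dyn->d_tag) {
        case DT_BIND_NOW:
            return true;
        case DT_FLAGS:
            if (dyn->d_un.d_val & DF_BIND_NOW) {
                return true;
            }
            break;
        case DT_FLAGS_1:
            if (dyn->d_un.d_val & DF_1_NOW) {
                return true;
            }
            break;
        default:
            break;
        }
    }
    return false;
}
```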
-
- Apr 20, 2014
Avi Kivity authored
The debug allocator can allocate non-contiguous memory for large requests, but since b7de9871 it uses only one sg entry for the entire buffer. One possible fix is to allocate contiguous memory even under the debug allocator, but in the future we may wish to allow discontiguous allocation when not enough contiguous space is available. So instead we implement a virt_to_phys() variant that takes a range and outputs the physical segments that make it up, and use that to construct a minimal sg list for the input.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
When the debug allocator is active, it calls vpopulate() to manage the page table. This causes a deadlock, as vpopulate() takes vma_list_mutex, while other operations that hold vma_list_mutex may allocate memory. Fix by using a separate mutex for the debug range. This is safe since the page table root is pre-allocated, and any lower page tables will be either in the vma list range or in the debug range, but not both.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
The MALLOC_ALIGNMENT constant is supposed to be the default minimum alignment returned by malloc(). I set it to 8, but this is not what recent versions of glibc on Linux do: they take the maximum of 2*sizeof(size_t) and alignof(long double), and on x86_64 both are 16 bytes. So to be ABI compatible with Linux, we should do the same. Note that while this patch is good for future reference, it has no practical consequences in the current implementation: our current small-malloc implementation does not use MALLOC_ALIGNMENT at all, and the alignment is determined just by the size of the allocated object, so a 16-byte object would have 16-byte alignment anyway. For large allocations, the alignment is always a full page, so again this change doesn't matter.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
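The glibc rule quoted above can be written as a compile-time constant. This sketch (the names malloc_alignment and is_malloc_aligned are hypothetical) computes the value and checks a pointer against it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Default malloc() alignment: max(2 * sizeof(size_t), alignof(long double)).
// On x86_64, sizeof(size_t) is 8 and alignof(long double) is 16, so both
// terms give 16 bytes.
constexpr std::size_t malloc_alignment =
    std::max(2 * sizeof(std::size_t), alignof(long double));

// True if `p` satisfies that alignment.
bool is_malloc_aligned(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % malloc_alignment == 0;
}
```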
-
- Apr 17, 2014
Calle Wilund authored
Per-CPU trace buffers. Actual buffer space is kept roughly the same as before for up to 4 vCPUs; above that, more space is used. Does not handle vCPUs appearing or disappearing at runtime. Trace events are allocated with a "not done" terminator marker, which is finalized when the event is written; this should prevent partial data from messing up extraction. Fixes #146.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Apr 16, 2014
Glauber Costa authored
We are using %x instead of %lx for x86_64. This truncates all 64-bit values to 32 bits, creating a very, very confusing situation for people debugging OSv.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
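The effect is easy to reproduce. This sketch assumes an LP64 platform such as x86_64, where unsigned long is 64 bits. Passing a 64-bit value directly to %x is undefined behavior, so the truncation is made explicit with a cast here; format_hex32 and format_hex64 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Only the low 32 bits survive a 32-bit conversion.
std::string format_hex32(std::uint64_t v)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%x", static_cast<unsigned>(v));
    return buf;
}

// %lx prints the full 64-bit value (unsigned long is 64 bits on LP64).
std::string format_hex64(std::uint64_t v)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%lx", static_cast<unsigned long>(v));
    return buf;
}
```

For a kernel-style address such as 0xffff800000001000, the 32-bit form prints just "1000", which looks like a plausible but wrong address.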
-
Asias He authored
If you mount the osv/build directory on a separate, faster disk device, make will complain:
    $ make
    make[1]: Entering directory `/home/asias/src/cloudius-systems/osv/build/release'
    GEN gen/include/osv/version.h
    fatal: Not a git repository (or any parent up to mount point /home/asias/src/cloudius-systems/osv/build)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Fix it by making scripts/osv-version.sh use --git-dir.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Claudio Fontana authored
Also enable core/pagecache.cc in the AArch64 build.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
-