- May 08, 2014
Jaspal Singh Dhillon authored
This patch changes the definition of __assert_fail() in api/assert.h, allowing it and the header files that include it (such as debug.hh) to be used in mgmt submodules. Fixes a conflict with the declaration of __assert_fail() in external/x64/glibc.bin/usr/include/assert.h.
Signed-off-by: Jaspal Singh Dhillon <jaspal.iiith@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 07, 2014
Jani Kokkonen authored
The construction of the page_table_root object must happen before priority "mempool", or all the work done in arch-setup will be destroyed by the class constructor. Problem noticed while working on the page fault handler for AArch64.
Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- May 05, 2014
Tomasz Grabiec authored
The synchronizer allows any thread to block on it until it is unlocked. It is unlocked once count_down() has been called a given number of times.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
The current tracepoint coverage does not handle all situations well. In particular:
* It does not cover link layer devices other than virtio-net. This change fixes that by tracing in more abstract layers.
* It records incoming packets at enqueue time, whereas sometimes it is better to trace at handling time. This can be very useful when correlating TCP state changes with incoming packets. A new tracepoint, net_packet_handling, was introduced for that.
* It does not record the protocol of the buffer. For non-ethernet protocols we should set the appropriate protocol type when reconstructing the ethernet frame for dumping to PCAP.
We now have the following tracepoints:
* net_packet_in - for incoming packets, enqueued or handled directly.
* net_packet_out - for outgoing packets hitting the link layer (not loopback).
* net_packet_handling - for packets which have been queued and are now being handled.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 04, 2014
Tomasz Grabiec authored
Currently a tracepoint's signature string is encoded into a u64, which imposes an 8-character limit on the signature. When the signature does not fit into that limit, only the first 8 characters are preserved. This patch fixes the problem by storing the signature as a C string of arbitrary length. Fixes #288.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- May 03, 2014
Gleb Natapov authored
An attempt to get a read ARC buffer for a hole in a file results in a temporary ARC buffer which is destroyed immediately after use. This means that mapping such a buffer is impossible: it is unmapped before the page fault handler returns to the application. The patch solves this by detecting that a hole in a file is being accessed and mapping a special zero page instead. It is mapped as COW, so on a write attempt a new page is allocated.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Apr 29, 2014
Claudio Fontana authored
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This patch implements the sigsetjmp()/siglongjmp() functions. Fixes #241. sigsetjmp() and siglongjmp() are similar to setjmp() and longjmp(), except that they also save and restore the signal mask. Signals are hardly useful in OSv, so we don't necessarily need this signal-mask feature, but we still want to implement these functions, if only so that applications which use them by default can run (see issue #241). Most of the code in this patch is from Musl 1.0.0, with a few small modifications - namely, calling our sigprocmask() function instead of a Linux syscall. Note that I copied only the x64 version of sigsetjmp.s; Musl also has this file for ARM and other architectures. Interestingly, we already had block.c in our source tree but didn't use it; this patch starts to use it.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
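The semantics being implemented can be demonstrated with a small self-contained check (this exercises the libc API the patch provides; it is an illustration, not OSv code): with a non-zero savemask argument, sigsetjmp() saves the current signal mask, and siglongjmp() restores it.

```cpp
#include <csetjmp>
#include <csignal>

// Returns true if siglongjmp() restored the signal mask saved by
// sigsetjmp(env, 1): SIGUSR1 is blocked between the two calls, and
// should be unblocked again after the jump.
bool mask_restored_after_siglongjmp() {
    sigjmp_buf env;
    sigset_t usr1, current;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &usr1, nullptr);    // start with SIGUSR1 unblocked

    if (sigsetjmp(env, 1) == 0) {                // mask (SIGUSR1 unblocked) saved here
        sigprocmask(SIG_BLOCK, &usr1, nullptr);  // block SIGUSR1
        siglongjmp(env, 1);                      // jump back; saved mask is restored
    }
    sigprocmask(SIG_SETMASK, nullptr, &current);
    return !sigismember(&current, SIGUSR1);
}
```

With savemask == 0 the same test would fail, since plain setjmp()-style behavior leaves the modified mask in place.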
-
Glauber Costa authored
Functions to be run when a thread finishes.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
While working with blocked signals and notifications, it would be good to query the current state of another thread's pending signal mask. That machinery exists in sched.cc but isn't exposed. This patch exposes it, together with a more convenient helper for when we are interested in the pointer itself, without dereferencing it.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Calle Wilund authored
Also, move platform-dependent fast dispatch to the platform arch code tree(s). The patching code is a bit more complex than would seem immediately (or even factually) necessary. However, depending on the cpu, there might be issues with trying to code-patch across cache lines (unaligned). To be safe, we do it with the old 16-bit jmp + write + finish dance. [avi: fix up build.mk]
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Apr 28, 2014
Avi Kivity authored
phys_ptr<T>: unique_ptr<> for physical memory
make_phys_ptr(): allocate and initialize a phys_ptr<>
make_phys_array(): allocate a phys_ptr<> referencing an array
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
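The shape of this API can be sketched on top of std::unique_ptr with a custom deleter. In OSv the storage would come from a physically contiguous allocator; plain new/delete stands in here, so this is only an assumed illustration of the interface, not the actual implementation.

```cpp
#include <memory>
#include <utility>

// Sketch: phys_ptr<T> is a unique_ptr with a deleter that would
// return memory to the physical-memory allocator.
template <typename T>
using phys_ptr = std::unique_ptr<T, void (*)(T*)>;

// Allocate and initialize a phys_ptr<>. The real version would
// placement-new into physically contiguous memory.
template <typename T, typename... Args>
phys_ptr<T> make_phys_ptr(Args&&... args) {
    T* p = new T(std::forward<Args>(args)...);   // stand-in allocation
    return phys_ptr<T>(p, [](T* q) { delete q; });
}
```

Usage is then identical to unique_ptr: `auto p = make_phys_ptr<int>(42);` and the memory is released when `p` goes out of scope.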
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Apr 25, 2014
Tomasz Grabiec authored
There was no way to sniff packets going through OSv's loopback interface, and I faced a need to debug in-guest TCP traffic. Packets are logged using the tracing infrastructure. Packet data is serialized as sample data up to a limit, which is currently hardcoded to 128 bytes. To enable capturing of packets, just enable the tracepoints named:
- net_packet_loopback
- net_packet_eth
Raw data can be seen in `trace list` output. Better presentation methods will be added in the following patches. This may also become useful when debugging network problems in the cloud, as we have no ability to run tcpdump on the host there.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
A tracepoint argument which extends 'blob_tag' will be interpreted as a range of byte-sized values. The storage required to serialize such an object is proportional to its size. I need it to implement storage-friendly packet capturing using the tracing layer. It could also be used to capture variable-length strings. The current limit (50 chars) is too short for some paths passed to vfs calls. With variable-length encoding, we could set a more generous limit.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
We already have a facility to indicate that a thread is a reclaimer and should be allowed to allocate reserve memory (since that memory will be used to free memory). Extend it to allow indicating that a particular code section is used to free memory, rather than the entire thread.
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Gleb Natapov authored
Currently vma_list_mutex is used to protect against races between ARC buffer mapping by the MMU and eviction by ZFS. The problem is that MMU code calls into ZFS with vma_list_mutex held, so on that path all ZFS-related locks are taken after vma_list_mutex. An attempt to acquire vma_list_mutex during ARC buffer eviction, while many of the same ZFS locks are already held, causes a deadlock. It was solved by using trylock() and skipping an eviction if vma_list_mutex could not be acquired, but it appears that some mmapped buffers are destroyed not during eviction, but after writeback, and this destruction cannot be delayed. This calls for a locking scheme redesign. This patch introduces arc_lock, which has to be held during access to read_cache. It prevents simultaneous eviction and mapping. arc_lock should be the innermost lock held on any code path, and the code is changed to adhere to this rule. For that, the patch replaces the ARC_SHARED_BUF flag with a new b_mmaped field. The reason is that access to the b_flags field is guarded by hash_lock, and it is impossible to guarantee the same order between hash_lock and arc_lock on all code paths. Dropping the need for hash_lock is a nice solution.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Currently the page_allocator returns a page to a page mapper, and the latter populates a pte with it. Sometimes, though, page allocation and pte population need to appear atomic. For instance, in the case of the pagecache we want to prevent page eviction before the pte is populated, since page eviction clears the pte; but if allocation and mapping are not atomic, the pte can be populated with stale data after eviction. With the current approach a very wide-scoped lock is needed to guarantee atomicity. Moving pte population into the page_allocator allows for much simpler locking.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Current code assumes that for the same file and same offset ZFS will always return the same ARC buffer, but this appears not to be the case. ZFS may create a new ARC buffer while an old one is undergoing writeback. This means that we need to track the mapping between file/offset and the mmapped ARC buffer ourselves, which is exactly what this patch does. It adds a new kind of cached page that holds pointers to an ARC buffer, and stores them in a new read_cache map.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Unmap pages as soon as possible instead of waiting for max_pages to accumulate. This will allow freeing pages outside of vma_list_mutex in the future.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
- Apr 24, 2014
Gleb Natapov authored
Write permission should not be granted to ptes that are write-protected because they are COW, but currently there is no way to distinguish between write protection due to vma permissions and write protection due to COW. Use a bit reserved for software use in the pte as a marker for COW ptes, and check it during permission changes.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
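The mechanism can be sketched with plain bit operations on an x86-style pte, where bits 9-11 are reserved for software use. The specific bit choice and helper names below are assumptions for illustration, not the patch's actual code.

```cpp
#include <cstdint>

// Hardware writable bit and an assumed software-reserved COW marker.
constexpr uint64_t pte_write  = 1ull << 1;  // x86 R/W bit
constexpr uint64_t pte_sw_cow = 1ull << 9;  // software bit: write-protected due to COW

// Write-protect a pte and tag the reason as COW.
inline uint64_t mark_cow(uint64_t pte) {
    return (pte & ~pte_write) | pte_sw_cow;
}

inline bool is_cow(uint64_t pte) {
    return pte & pte_sw_cow;
}

// Permission change: grant write only when write protection was due
// to vma permissions, never when the pte is marked COW.
inline uint64_t grant_write(uint64_t pte) {
    return is_cow(pte) ? pte : (pte | pte_write);
}
```

The COW marker survives mprotect()-style permission changes, so the fault handler can still tell that a write must allocate a private copy first.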
-
Glauber Costa authored
The jemalloc memory allocator makes intense use of MADV_DONTNEED to flush pages it is no longer using. Respect that advice. Let's keep returning -1 for the remaining cases so we don't fool anybody.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
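The usage pattern this enables looks like the following (a generic POSIX demonstration of MADV_DONTNEED on an anonymous private mapping, not OSv-specific code): the range stays mapped, but after the advice its pages can be dropped, and anonymous pages read back as zeroes.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Map a few anonymous pages, dirty one, then advise the kernel we
// don't need them. Returns true if the advice succeeded and the
// dirtied byte reads back as zero afterwards.
bool dontneed_demo() {
    const size_t len = 4096 * 4;
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }
    char* p = static_cast<char*>(mem);
    p[0] = 42;                                  // dirty the first page
    int r = madvise(p, len, MADV_DONTNEED);     // flush the pages
    bool zeroed = (r == 0) && (p[0] == 0);      // anonymous pages come back zero-filled
    munmap(mem, len);
    return zeroed;
}
```

This is exactly the fast path an allocator like jemalloc relies on: releasing physical memory without tearing down the mapping.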
-
Glauber Costa authored
MongoDB wants it. In general, I am providing the information that is easy to get, and ignoring what is not - with the exception of the process count, which seemed easy enough to implement. This is the kind of thing Mongo does with it:
2014-04-15T09:54:12.322+0000 [clientcursormon] mem (MB) res:670160 virt:25212
2014-04-15T09:54:12.323+0000 [clientcursormon] mapped (incl journal view):160
2014-04-15T09:54:12.324+0000 [clientcursormon] connections:0
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This is one of the statistics that shows up in /proc/self/stat under Linux, and it is generally interesting for applications. Since we don't have separate kernel mode and userspace mode, it is very hard to differentiate between "time spent in userspace" and "kernel time spent on behalf of the process". Therefore, we will always present system time as 0. If we wanted, we could at least clearly account OSv-specific threads as system time, but there is no need to go through the trouble now.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
It will be used for procfs compatibility. Applications may want to know how much memory is potentially available through mmap mappings (not necessarily allocated).
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Apr 22, 2014
Glauber Costa authored
While we take pride in having no spinlocks in the system, if an application wants to use them, who are we to deny it this god-given right? Some applications will implement spinlocks through a pthread interface, which is what I implement here. We did not have any standard trylock mechanism, so one is provided. Other than that, the interface is pretty trivial, except for the fact that it seems to provide some protection against deadlocks. We will just ignore that for the moment and assume a well-behaved application.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
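The lock/trylock/unlock surface described can be sketched on std::atomic_flag. This is an assumed minimal shape for illustration, not OSv's pthread_spin_* implementation, and it ignores the pshared/deadlock-protection aspects just as the commit does.

```cpp
#include <atomic>

// Minimal spinlock sketch: lock() spins, trylock() is the
// non-blocking variant the commit says was newly provided.
struct spinlock {
    std::atomic_flag f = ATOMIC_FLAG_INIT;

    void lock() {
        while (f.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a real implementation would pause/yield here
        }
    }

    bool trylock() {
        // succeeds only if the flag was clear (lock was free)
        return !f.test_and_set(std::memory_order_acquire);
    }

    void unlock() {
        f.clear(std::memory_order_release);
    }
};
```

trylock() maps naturally onto pthread_spin_trylock() semantics: it returns immediately instead of spinning when the lock is held.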
-
Glauber Costa authored
MongoDB expects that call and would like to guarantee allocation of blocks in the file. It does have a fallback, so for the time being I am just providing the symbol. I have opened Issue #265 to track this.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
The debug allocator can allocate non-contiguous memory for large requests, but since b7de9871 it uses only one sg entry for the entire buffer. One possible fix is to allocate contiguous memory even under the debug allocator, but in the future we may wish to allow discontiguous allocation when not enough contiguous space is available. So instead we implement a virt_to_phys() variant that takes a range and outputs the physical segments that make it up, and use that to construct a minimal sg list depending on the input.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
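The range variant can be sketched as a page-by-page walk that merges physically adjacent pages into one sg segment. This is an assumed illustration: `translate` stands in for the real page-table lookup, and the names are not OSv's.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct segment {
    uint64_t paddr;  // physical start of the segment
    size_t   len;    // length in bytes
};

// Walk [va, va+len), translate each page, and coalesce physically
// contiguous pages so the resulting sg list is minimal.
template <typename Translate>
std::vector<segment> virt_to_phys_range(uintptr_t va, size_t len, Translate translate) {
    constexpr size_t page = 4096;
    std::vector<segment> segs;
    while (len) {
        // bytes remaining in the current page
        size_t chunk = std::min(len, page - static_cast<size_t>(va & (page - 1)));
        uint64_t pa = translate(va);
        if (!segs.empty() && segs.back().paddr + segs.back().len == pa) {
            segs.back().len += chunk;      // physically contiguous: extend
        } else {
            segs.push_back({pa, chunk});   // discontiguity: new sg entry
        }
        va  += chunk;
        len -= chunk;
    }
    return segs;
}
```

With an identity translation a whole range collapses into a single segment, while a scattered debug-allocator mapping yields one entry per physically discontiguous run.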
-
Nadav Har'El authored
Normally, symbol binding in shared objects is lazy, using the PLTGOT mechanism. This means that a symbol is resolved only when first used. This is great because it speeds up object load, and also allows us never to implement symbols which aren't actually used in any real code path. However, as issue #256 shows, symbols which are used in DSOs from a preemption-disabled context cannot be resolved on first use, because symbol resolution may sleep. Two important examples of this are sched::thread::wait() and sched::thread::stop_wait(), both used by wait_until() while it is in preempt_disable. This patch adds the missing support for the standard DT_BIND_NOW tag. This tag can be added to an object with the "-z now" ld option. When an object has this tag, all its symbols are resolved at load time, instead of lazily (when first used). Bug #256 can be fixed by linking tst-mmap.so with "-z now" (this will be a separate patch).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Apr 20, 2014
Avi Kivity authored
The debug allocator can allocate non-contiguous memory for large requests, but since b7de9871 it uses only one sg entry for the entire buffer. One possible fix is to allocate contiguous memory even under the debug allocator, but in the future we may wish to allow discontiguous allocation when not enough contiguous space is available. So instead we implement a virt_to_phys() variant that takes a range and outputs the physical segments that make it up, and use that to construct a minimal sg list depending on the input.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Apr 17, 2014
Calle Wilund authored
Per-cpu trace buffers. The actual buffer space is kept roughly the same as previously for up to 4 vcpus; above this, the space used will be higher. Does not handle vcpus appearing or disappearing at runtime. Trace events are allocated with a "not done" terminator marker, which is finalized when the event is written; this should prevent any partial data from messing up extraction. Fixes #146
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Apr 16, 2014
Claudio Fontana authored
Also enable core/pagecache.cc in the AArch64 build.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
-
Jani Kokkonen authored
Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
[claudio: some fixes]
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
-
Jani Kokkonen authored
Add the APIs to flush a single processor's TLB or all TLBs in the cluster.
Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
-
- Apr 15, 2014
Pawel Dziepak authored
The GNU_RELRO segment is used to inform the dynamic linker which sections need to be writable only while the ELF file is being relocated and can be made read-only afterwards. Usually GNU_RELRO overlaps with a standard LOAD segment that contains readable and writable data, but the ELF file is generated in such a way that it is possible to properly set per-page permissions.
Signed-off-by: Pawel Dziepak <pdziepak@quarnos.org>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pawel Dziepak authored
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pawel Dziepak <pdziepak@quarnos.org>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>