- May 19, 2014
Glauber Costa authored
As Nadav pointed out during review, this macro could use a bit more work, to take a single parameter instead of two. That is what is done in this patch. Unfortunately, just pasting __COUNTER__ doesn't work because of preprocessor rules, and we need some indirection to get it working. Also, visibility "hidden" can go because that is already implied by "static". The problem then becomes the fact that gcc does not really like unreferenced static variables, which is solved by the "used" attribute. From the gcc docs about "used": "This attribute, attached to a variable with the static storage, means that the variable must be emitted even if it appears that the variable is not referenced."
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
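Below is a minimal sketch of the pattern described above; the macro names are hypothetical, not the ones used in OSv. Pasting __COUNTER__ directly would paste the literal token, so a second macro level is needed to expand it before the ## concatenation, and the "used" attribute keeps gcc from warning about (or discarding) the otherwise unreferenced static.

    // Hypothetical names; only the indirection and the "used" attribute are the
    // points the commit describes.
    #define UNIQUE_CONCAT2(a, b) a##b
    #define UNIQUE_CONCAT(a, b) UNIQUE_CONCAT2(a, b)   // expands b (e.g. __COUNTER__) before pasting

    // "static" already gives internal linkage, so visibility("hidden") is redundant;
    // __attribute__((used)) makes gcc emit the variable even though nothing references it.
    #define UNIQUE_STATIC_MARKER(init) \
        static int UNIQUE_CONCAT(unique_marker_, __COUNTER__) \
            __attribute__((used)) = (init)

    UNIQUE_STATIC_MARKER(1);   // expands to e.g. static int unique_marker_0 ... = (1);
    UNIQUE_STATIC_MARKER(2);   // expands to e.g. static int unique_marker_1 ... = (2);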
-
- May 18, 2014
Avi Kivity authored
Take the migration lock for pinned threads instead of a separate check whether they are pinned or not.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- May 16, 2014
Jani Kokkonen authored
Implement fixup fault and the backtrace functionality, which is its first simple user.
Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
[claudio: added elf changes to allow lookup and demangling to work]
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
-
Nadav Har'El authored
thread::current()->thread_clock() returns the CPU time consumed by this thread. A thread that wishes to measure the amount of CPU time consumed by some short section of code will want this clock to have high resolution, but in the existing code it was only updated on context switches, so shorter durations could not be measured with it. This patch fixes thread_clock() to also add the time that has passed since the current time slice started. When running thread_clock() on *another* thread (not thread::current()), we still return a CPU time snapshot from the last context switch - even if the thread happens to be running now (on another CPU). Fixing that case is quite difficult (and will probably require additional memory-ordering guarantees), and anyway not very important: usually we don't need a high-resolution estimate of a different thread's CPU time. Fixes #302.
Reviewed-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
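A self-contained sketch of the idea, using illustrative names rather than OSv's actual scheduler fields: accumulated CPU time is only updated at context switches, so for the running thread the partial, still-running time slice is added on top.

    #include <chrono>

    struct thread_cpu_time_sketch {
        std::chrono::nanoseconds total_cpu_time{0};              // updated at each context switch
        std::chrono::steady_clock::time_point running_since{};   // start of the current time slice
        bool running = false;                                    // true only for thread::current()

        std::chrono::nanoseconds thread_clock() const
        {
            auto t = total_cpu_time;
            if (running) {
                // current thread: include the time elapsed in the current slice
                t += std::chrono::duration_cast<std::chrono::nanoseconds>(
                         std::chrono::steady_clock::now() - running_since);
            }
            return t;   // other threads: snapshot from their last context switch
        }
    };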
-
Glauber Costa authored
Again, we are currently calling a function every time we disable/enable preemption (actually a pair of functions), where simple mov instructions would do.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
We are heavily using this function to grab the address of the current thread. That means a function call will be issued every time that is done, where a simple mov instruction would do. For objects outside the main ELF, we don't want it to be inlined, since that would mean the resolution would have to go through an expensive __tls_get_addr call. So what we do is not present the symbol as inline to them, and make sure the symbol is always generated.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 15, 2014
Pawel Dziepak authored
This patch implements lockfree_queue (which is used as incoming_wakeup_queue) so that it doesn't need exchange or compare_exchange operations. The idea is to use a linked list but interleave the actual objects stored in the queue with helper objects (lockless_queue_helper), each of which is just a pointer to the next element. Each object in the queue owns the helper that precedes it (and they are dequeued together), while the last helper, which does not precede any object, is owned by the queue itself. When a new object is enqueued it gains ownership of the last helper in the queue in exchange for the helper it owned before, which now becomes the new tail of the list. Unlike the original implementation, this version of lockfree_queue really requires that there is no more than one concurrent producer and no more than one concurrent consumer. The results of tests/misc-ctxs on my test machine are as follows (the values are medians of five runs):
before:  colocated: 332 ns, apart: 590 ns
after:   colocated: 313 ns, apart: 558 ns
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pawel Dziepak <pdziepak@quarnos.org>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
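The following is a simplified, self-contained sketch of the scheme, not OSv's actual lockfree_queue; the type names are illustrative. Helpers are plain atomic pointer cells, each queued item owns the helper that precedes it, the queue owns the single trailing helper, and push() merely swaps that ownership.

    #include <atomic>

    struct node;
    using helper = std::atomic<node*>;     // cell pointing at the item that follows it

    struct node {
        int value = 0;
        helper* owned;                     // helper this node currently owns
        helper* next_cell = nullptr;       // helper that follows this node in the list
        node() : owned(new helper(nullptr)) {}
        ~node() { delete owned; }
    };

    class spsc_queue_sketch {
        helper* _head;                     // consumer side: cell holding the next item to pop
        helper* _tail;                     // producer side: trailing, item-less cell (queue-owned)
    public:
        spsc_queue_sketch() : _head(new helper(nullptr)), _tail(_head) {}
        ~spsc_queue_sketch() { delete _tail; }   // the queue only ever owns the trailing cell

        void push(node* n) {               // single producer only
            helper* spare = n->owned;      // n donates its helper to the queue...
            spare->store(nullptr, std::memory_order_relaxed);
            n->next_cell = spare;          // ...it will follow n in the list
            n->owned = _tail;              // ...and n takes ownership of the old tail
            _tail->store(n, std::memory_order_release);   // publish n to the consumer
            _tail = spare;
        }

        node* pop() {                      // single consumer only
            node* n = _head->load(std::memory_order_acquire);
            if (!n) {
                return nullptr;            // queue is empty
            }
            _head = n->next_cell;          // n leaves together with the helper it now owns
            return n;
        }
    };

With one thread calling push() and another calling pop(), the release store in push() pairs with the acquire load in pop(), so the consumer always sees a fully initialized node and no compare_exchange or atomic exchange is needed.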
-
- May 14, 2014
Tomasz Grabiec authored
This introduces a simple timer-based sampling profiler which reuses our tracing infrastructure to collect samples. To enable the sampler from run.py, run it like this:

    $ scripts/run.py ... --sampler [frequency]

where 'frequency' is an optional parameter overriding the sampling frequency. The default is 1000 (ticks per second). The higher the frequency, the bigger the sampling overhead; values that are too low will hurt profile accuracy. Ad-hoc sampler enabling is planned, and the code already takes that into account. To see the profile you need to extract the trace:

    $ trace extract

and then show it like this:

    $ trace prof

All 'prof' options can be applied; for example you can group by CPU:

    $ trace prof -g cpu

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
The sampler will need to set and later restore the value of this option.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Tracepoints can be enabled not only via enable_tracepoint(std::string) but also via tracepoint_base::enable(). This change also makes the initialization thread-safe, as it may be called from an arbitrary thread.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
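A minimal sketch of the thread-safety pattern, with hypothetical names rather than OSv's actual tracepoint_base internals: the one-time setup is guarded by std::once_flag, so enable() may be called safely from any thread.

    #include <atomic>
    #include <mutex>
    #include <string>

    class tracepoint_sketch {
        std::once_flag _init_once;
        std::atomic<bool> _enabled{false};
        std::string _name;

        void lazy_init()
        {
            // allocate buffers, register in a global tracepoint list, etc.
        }
    public:
        explicit tracepoint_sketch(std::string name) : _name(std::move(name)) {}

        void enable()                       // safe to call from any thread
        {
            std::call_once(_init_once, [this] { lazy_init(); });
            _enabled.store(true, std::memory_order_release);
        }

        bool enabled() const
        {
            return _enabled.load(std::memory_order_acquire);
        }
    };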
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
lookup_name_demangled() looks up a symbol name, demangles it, then snprintfs it onto a preallocated buffer.
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
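A rough sketch of what such a helper can look like; OSv resolves symbols through its own ELF machinery, so the dladdr()-based lookup here is only a stand-in to keep the example self-contained, and the real signature may differ. The result is demangled when possible and snprintf'd into the caller-supplied buffer together with the offset from the symbol's start.

    #include <cxxabi.h>
    #include <dlfcn.h>
    #include <cstdio>
    #include <cstdlib>
    #include <cstddef>

    void lookup_name_demangled_sketch(void* addr, char* buf, size_t len)
    {
        Dl_info info{};
        if (!dladdr(addr, &info) || !info.dli_sname || !info.dli_saddr) {
            snprintf(buf, len, "%p", addr);            // no symbol found
            return;
        }
        int status = 0;
        char* demangled = abi::__cxa_demangle(info.dli_sname, nullptr, nullptr, &status);
        const char* name = (status == 0 && demangled) ? demangled : info.dli_sname;
        snprintf(buf, len, "%s+%#lx", name,
                 (unsigned long)((char*)addr - (char*)info.dli_saddr));
        free(demangled);                               // __cxa_demangle result is malloc'd
    }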
-
Claudio Fontana authored
An effect of commit 9bbbe9dc is that no output is possible before the prio 'console' initializers have been run. This change makes at least one API available really early (from boot code and premain), and documents the requirements for the early console class regarding the write() method.
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 12, 2014
Glauber Costa authored
Export the shrinker interface to C users.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 07, 2014
Gleb Natapov authored
A dentry object represents a directory, while a vnode represents a file, so it is better to use the vnode in the page cache.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- May 06, 2014
Pawel Dziepak authored
There is an already defined (but unused before this patch) function that extracts the binding type from Elf_Sym::st_info.
Signed-off-by: Pawel Dziepak <pdziepak@quarnos.org>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Boqun Feng authored
-Werror=unused-function complains that symbol_binding is unused; add the "unused" attribute to mark this function as intentionally unused.
Signed-off-by: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
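An illustrative sketch of both points above (the real symbol_binding() in OSv may take different parameters): the ELF binding type lives in the high nibble of st_info, and the "unused" attribute silences -Werror=unused-function in translation units that never call the helper.

    // Sketch only; parameter type and name are assumptions, not the exact OSv code.
    __attribute__((unused))
    static unsigned symbol_binding(unsigned char st_info)
    {
        return st_info >> 4;   // same as ELF64_ST_BIND(): binding is the high nibble
    }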
-
Boqun Feng authored
-Werror=sign-compare complains when comparing (unsigned)level with page_mapper.nr_page_sizes(). Since nr_page_sizes() is meaningful only when it's non-negative, and mmu::nr_page_sizes is unsigned, changing the return types of all nr_page_sizes functions to unsigned is reasonable.
Signed-off-by: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 05, 2014
Tomasz Grabiec authored
The current tracepoint coverage does not handle all situations well. In particular:
* it does not cover link layer devices other than virtio-net. This change fixes that by tracing in more abstract layers.
* it records incoming packets at enqueue time, whereas sometimes it's better to trace at handling time. This can be very useful when correlating TCP state changes with incoming packets. A new tracepoint was introduced for that: net_packet_handling.
* it does not record the protocol of the buffer. For non-ethernet protocols we should set the appropriate protocol type when reconstructing the ethernet frame while dumping to PCAP.
We now have the following tracepoints:
* net_packet_in - for incoming packets, enqueued or handled directly.
* net_packet_out - for outgoing packets hitting the link layer (not loopback).
* net_packet_handling - for packets which have been queued and are now being handled.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 03, 2014
Gleb Natapov authored
An attempt to get a read ARC buffer for a hole in a file results in a temporary ARC buffer which is destroyed immediately after use. It means that mapping such a buffer is impossible: it is unmapped before the page fault handler returns to the application. The patch solves this by detecting that a hole in a file is being accessed and mapping a special zero page instead. It is mapped as COW, so on a write attempt a new page is allocated.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 01, 2014
Takuya ASADA authored
The current OSv implementation suppresses most output when not in verbose mode. That may improve boot speed, but suppressing the IP address printout is inconvenient for most users. So I added infof(), which acts like debugf() but prints the string even when not in verbose mode, and used it in dhcp.cc to print the IP address.
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
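A minimal sketch of the distinction; the verbose flag and the output sink are stand-ins for OSv's actual console plumbing, and the real signatures may differ slightly. debugf() is gated on verbose mode, infof() always reaches the console.

    #include <cstdarg>
    #include <cstdio>

    static bool verbose_mode = false;   // stand-in: set from the command line in OSv

    static void vlogf(const char* fmt, va_list ap)
    {
        vfprintf(stderr, fmt, ap);      // stand-in for writing to the console
    }

    void debugf(const char* fmt, ...)   // only prints in verbose mode
    {
        if (!verbose_mode) {
            return;
        }
        va_list ap;
        va_start(ap, fmt);
        vlogf(fmt, ap);
        va_end(ap);
    }

    void infof(const char* fmt, ...)    // always prints, verbose or not
    {
        va_list ap;
        va_start(ap, fmt);
        vlogf(fmt, ap);
        va_end(ap);
    }

    // e.g. in dhcp.cc: infof("DHCP: configured IP address %s\n", ip_string);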
-
- Apr 29, 2014
Glauber Costa authored
Functions to be run when a thread finishes.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
While working with blocked signals and notifications, it would be good to be able to query the current state of another thread's pending signal mask. That machinery exists in sched.cc but isn't exposed. This patch exposes it, together with a more convenient helper for when we are interested in the pointer itself, without dereferencing it.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Calle Wilund authored
Also, move the platform-dependent fast dispatch to the platform arch code tree(s). The patching code is a bit more complex than would seem immediately (or even factually) necessary. However, depending on the CPU, there might be issues with code patching across cache lines (unaligned). To be safe, we do it with the old 16-bit jmp + write + finish dance.
[avi: fix up build.mk]
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Apr 28, 2014
Nadav Har'El authored
When a small allocation is requested with large alignment, we ignored the alignment, and as a consequence posix_memalign() or alloc_phys_contiguous_aligned() could crash when it failed to achieve the desired alignment. This is not a common case (usually, size >= alignment, and the new C11 aligned_alloc() even supports only this case), but still it might happen, and we saw it in cloudius-systems/capstan#75. When size < alignment, this patch changes the size so we can achieve the desired alignment. For small alignments, this means setting size=alignment, so for example to get an alignment of 1024 bytes we need at least a 1024-byte allocation. This is a waste of memory, but as these allocations are rare, we expect this to be acceptable. For large alignments, e.g., alignment=8192, we don't need size=alignment, but we do need size to be large enough so we'll use malloc_large() (malloc_large() already supports arbitrarily large alignments). This patch also adds test cases to tst-align.so to test alignments larger than the desired size. Fixes #271 and cloudius-systems/capstan#75.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
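A simplified sketch of the size adjustment described above; the threshold name and value are illustrative assumptions, not OSv's actual allocator constants.

    #include <cstddef>

    // Assumption for the sketch: requests of at least this size go to malloc_large(),
    // which can honor arbitrary alignments by itself.
    static constexpr size_t malloc_large_threshold = 4096;

    size_t adjust_size_for_alignment(size_t size, size_t alignment)
    {
        if (size >= alignment) {
            return size;                        // common case, nothing to do
        }
        if (alignment <= malloc_large_threshold) {
            // small alignment: e.g. a 16-byte request with 1024-byte alignment
            // becomes a 1024-byte allocation
            return alignment;
        }
        // huge alignment (e.g. 8192): size only needs to be big enough to be
        // routed to malloc_large(), not as big as the alignment itself
        return malloc_large_threshold;
    }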
-
- Apr 25, 2014
Tomasz Grabiec authored
There was no way to sniff packets going through OSv's loopback interface, and I faced a need to debug in-guest TCP traffic. Packets are logged using the tracing infrastructure. Packet data is serialized as sample data up to a limit, which is currently hardcoded to 128 bytes. To enable capturing of packets just enable the tracepoints named:
- net_packet_loopback
- net_packet_eth
Raw data can be seen in `trace list` output. Better presentation methods will be added in the following patches. This may also become useful when debugging network problems in the cloud, as we have no ability to run tcpdump on the host there.
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
If the rcu threads need memory, let them have it, since they will use it to free even more memory.
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
malloc() must wait for memory, and since page table operations can allocate memory, it must be able to dip into the reserve pool. free() should indicate it is a reclaimer.
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
We already have a facility to indicate that a thread is a reclaimer and should be allowed to allocate reserve memory (since that memory will be used to free memory). Extend it to allow indicating that a particular code section is used to free memory, rather than the entire thread.
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
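A minimal sketch of a scoped "this section frees memory" marker; the class name and the per-thread counter are hypothetical, not OSv's actual API. While the guard is alive, allocations made by this thread may dip into the reserve pool, because the work being done will ultimately release memory.

    // Hypothetical names; a sketch assuming a per-thread nesting counter.
    struct reclaimer_scope {
        reclaimer_scope()  { ++depth(); }
        ~reclaimer_scope() { --depth(); }
        static bool active() { return depth() > 0; }   // queried by the allocator
    private:
        static unsigned& depth()
        {
            static thread_local unsigned d = 0;        // nesting count for this thread
            return d;
        }
    };

    // usage sketch:
    // void evict_pages()
    // {
    //     reclaimer_scope guard;   // this section may allocate from the reserve pool
    //     ...
    // }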
-
Nadav Har'El authored
After the previous patches, when we try to run an executable we cannot read (e.g., a directory - see issue #94), a "struct error" exception will be thrown out of osv::run, and nobody will catch it, so the user will see a somewhat puzzling "uncaught exception" error. With this patch, we catch the read error exception inside osv::run(), and when it happens, just return a normal load failure (nullptr). E.g., trying to run a directory now results in a normal failure:

    $ scripts/run.py -e /
    OSv v0.07-39-g03feb99
    run_main(): cannot execute /. Powering off.

Fixes #94. The osv::run() API currently (before this patch, and also after it) doesn't have any way to say *why* the loading failed - it could have been that the executable was a directory, that it was not an ELF shared object, or that it was a shared object but didn't have a main - in all cases the return value is nullptr. In the future this should probably change.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
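A schematic sketch of the control flow; all names here are illustrative stand-ins, not OSv's loader API. The read failure that used to escape osv::run() as an uncaught exception is caught and reported as an ordinary load failure (nullptr).

    #include <memory>
    #include <stdexcept>
    #include <string>

    struct program_stub {};                     // stand-in for the loaded-program handle

    std::shared_ptr<program_stub> load_object_stub(const std::string& path)
    {
        // stand-in loader: pretend every path is unreadable, as when it names a directory
        throw std::runtime_error("read failed: " + path);
    }

    std::shared_ptr<program_stub> run_sketch(const std::string& path)
    {
        try {
            return load_object_stub(path);
        } catch (const std::exception&) {
            return nullptr;                     // caller reports "cannot execute <path>"
        }
    }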
-
Gleb Natapov authored
No longer used.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Currently vma_list_mutex is used to protect against races between ARC buffer mapping by the MMU and eviction by ZFS. The problem is that the MMU code calls into ZFS with vma_list_mutex held, so on that path all ZFS-related locks are taken after vma_list_mutex. An attempt to acquire vma_list_mutex during ARC buffer eviction, while many of the same ZFS locks are already held, causes a deadlock. It was solved by using trylock() and skipping an eviction if vma_list_mutex cannot be acquired, but it appears that some mmapped buffers are destroyed not during eviction, but after writeback, and this destruction cannot be delayed. It calls for a locking scheme redesign. This patch introduces arc_lock, which has to be held during access to read_cache. It prevents simultaneous eviction and mapping. arc_lock should be the innermost lock held on any code path, and the code is changed to adhere to this rule. For that, the patch replaces the ARC_SHARED_BUF flag with a new b_mmaped field. The reason is that access to the b_flags field is guarded by hash_lock, and it is impossible to guarantee the same order between hash_lock and arc_lock on all code paths. Dropping the need for hash_lock is a nice solution.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Currently page_allocator returns a page to a page mapper and the latter populates a pte with it. Sometimes page allocation and pte population need to appear atomic, though. For instance, in the pagecache case we want to prevent page eviction before the pte is populated, since page eviction clears the pte; but if allocation and mapping are not atomic, the pte can be populated with stale data after eviction. With the current approach a very widely scoped lock is needed to guarantee atomicity. Moving pte population into page_allocator allows for much simpler locking.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
The current code assumes that for the same file and same offset ZFS will always return the same ARC buffer, but this appears not to be the case: ZFS may create a new ARC buffer while an old one is undergoing writeback. It means that we need to track the mapping between file/offset and the mmapped ARC buffer ourselves, which is exactly what this patch is about. It adds a new kind of cached page that holds pointers to an ARC buffer, and stores these pages in a new read_cache map.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
All pagecache functions run under vma_list_lock, so no additional locking is needed.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Unmap a page as soon as possible instead of waiting for max_pages to accumulate. This will allow freeing pages outside of vma_list_mutex in the future.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Useful for debugging cache-related problems.
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
-