- Oct 03, 2013
-
-
Avi Kivity authored
Some structures are duplicated; move the duplicates to a common header <netinet/__in.h>. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Tested-By:
Benoit Canet <benoit@irqsave.net>
-
Avi Kivity authored
Some structures are duplicated; deduplicate them. A few are source-compatible but not binary-compatible; use the ones from <bits/socket.h>. Others are both source- and binary-compatible; put them in a new header <sys/__socket.h> which is included from both. Work around a problem with the byteorder functions/macros. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Tested-By:
Benoit Canet <benoit@irqsave.net>
-
- Sep 30, 2013
-
-
Benoît Canet authored
Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Sep 29, 2013
-
-
Nadav Har'El authored
Add a comment to condvar explaining that it makes a guarantee that POSIX Threads' condition variables do not - that there are no spurious wakeups. The comment goes on to explain that at least in one place (semaphore.cc) we make use of this added guarantee, so if the condition variable implementation is ever rewritten, we'll need to keep this guarantee. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com>
-
Nadav Har'El authored
Change the name of the argument of wake_with from "Pred" to "Action". This argument is a function to run, *not* a predicate (it's not supposed to return a boolean value), so it doesn't make sense to call it Pred. The implementation of wake_with already used the name "Action" - this patch fixes the prototype too. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com>
-
Avi Kivity authored
Also fix other review comments related to 1f161695. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Sep 26, 2013
-
-
Raphael S. Carvalho authored
Add VOP_LINK to the VFS interface in preparation for sys_link(). Signed-off-by:
Raphael S. Carvalho <raphael.scarv@gmail.com> [ penberg: split to separate commit ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Sep 25, 2013
-
-
Nadav Har'El authored
ELF allows specifying initializers - functions to be run after loading a shared object, in DT_INIT_ARRAY, and also finalizers - functions to be run before unloading a shared object, in DT_FINI_ARRAY. The existing code ran the initializers, but forgot to run the finalizers, and this patch fixes this oversight. This fix is necessary for destructors of static objects defined in the shared object. But this fix is not sufficient for C++ destructors - see also the next patch. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com>
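The init/fini pairing described in this commit can be sketched roughly as follows. This is an illustration only: `toy_object`, `run_init_funcs` and `run_fini_funcs` are hypothetical names, and the arrays are plain vectors standing in for the DT_INIT_ARRAY and DT_FINI_ARRAY tables. Running the finalizers in reverse order follows the ELF gABI; whether the patch does exactly that is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using hook = void (*)();

// Hypothetical stand-in for a loaded shared object.
struct toy_object {
    std::vector<hook> init_array;  // stands in for DT_INIT_ARRAY
    std::vector<hook> fini_array;  // stands in for DT_FINI_ARRAY
};

// Run after the object is loaded: initializers in array order.
inline void run_init_funcs(const toy_object& obj)
{
    for (hook f : obj.init_array) {
        f();
    }
}

// Run before the object is unloaded: finalizers in reverse array order,
// as the ELF gABI specifies for DT_FINI_ARRAY.
inline void run_fini_funcs(const toy_object& obj)
{
    for (std::size_t i = obj.fini_array.size(); i > 0; --i) {
        obj.fini_array[i - 1]();
    }
}
```

A loader would call `run_init_funcs()` once relocation is done and `run_fini_funcs()` just before unmapping the object.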
-
- Sep 24, 2013
-
-
Nadav Har'El authored
Our poll_wake() code ignored calls with the POLLHUP event, because the user did not explicitly ask for this event. As a result, a poll() waiting to read from a pipe does not wake up when the write side closes. This patch adds a test for this case in tst-pipe.cc, and fixes the bug by adding to the poll structure's _events also ~POLL_REQUESTABLE, i.e., any bits which do not have to be explicitly requested by the user (POLL_REQUESTABLE is a new macro defined in this patch). After this patch, poll() wakes as needed in the test (instead of just hanging), but returns the wrong event because of another bug which will be fixed in a separate patch. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com>
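A minimal sketch of the masking idea in this commit, assuming hypothetical names (`poll_requestable`, `effective_events`, `should_wake`). The set of bits the real POLL_REQUESTABLE macro covers is not shown in this log, so the value below is a guess for illustration.

```cpp
#include <cassert>
#include <poll.h>

// Hypothetical stand-in for POLL_REQUESTABLE: events a caller has to ask
// for explicitly. The exact set used in the tree is an assumption.
constexpr short poll_requestable = POLLIN | POLLPRI | POLLOUT;

// Events a waiter is interested in: whatever it requested, plus every bit
// that never needs to be requested (POLLHUP, POLLERR, POLLNVAL, ...).
inline short effective_events(short requested)
{
    return static_cast<short>(requested | ~poll_requestable);
}

// True if a wakeup carrying `raised` should wake a waiter that asked for
// `requested`.
inline bool should_wake(short requested, short raised)
{
    return (effective_events(requested) & raised) != 0;
}
```

With this mask a reader that only asked for POLLIN is still woken by POLLHUP, while an unrelated POLLOUT wakeup is ignored as before.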
-
- Sep 20, 2013
-
-
Nadav Har'El authored
Some trivial comment cleanup and line-breaks in queue-mpsc.hh. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com>
-
- Sep 15, 2013
-
-
Nadav Har'El authored
Added Cloudius copyright statement to our own code in include/. Also added include/api/LICENSE saying that these are copied from Musl and public domain (according to the Musl COPYRIGHT file).
-
- Sep 12, 2013
-
-
Avi Kivity authored
Command line option: --trace-backtraces
-
Dmitry Fleytman authored
-
- Sep 11, 2013
-
-
Nadav Har'El authored
Added a new function, osv::reboot() (declared in <osv/power.hh>) for rebooting the VM. Also added a Java interface - com.cloudius.util.Power.reboot(). NOTE: Power.java and/or jni/power.cc also need to be copied into the mgmt submodule.
-
Avi Kivity authored
Statically allocated mutexes are very common. Make the mutex constructor constexpr to ensure that a statically allocated mutex is initialized before use, even if that use is from static constructors.
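To illustrate why the constexpr constructor matters, here is a small sketch with a hypothetical `toy_mutex` (not the real mutex class). An object with static storage duration whose constructor is constexpr is constant-initialized, so it is ready before any dynamic initializer in any translation unit runs.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical toy lock; the only point is the constexpr constructor.
class toy_mutex {
public:
    constexpr toy_mutex() noexcept : _locked(false) {}
    toy_mutex(const toy_mutex&) = delete;
    toy_mutex& operator=(const toy_mutex&) = delete;

    bool try_lock() { return !_locked.exchange(true, std::memory_order_acquire); }
    void unlock() { _locked.store(false, std::memory_order_release); }
private:
    std::atomic<bool> _locked;
};

// Constant-initialized: no runtime constructor call is needed.
toy_mutex registry_lock;

// A static object whose constructor uses the lock during startup.
struct early_user {
    bool got_lock;
    early_user() : got_lock(registry_lock.try_lock())
    {
        if (got_lock) {
            registry_lock.unlock();
        }
    }
};

early_user early;
```

Within a single file the declaration order would already protect `early`; the guarantee becomes important when the lock and its user live in different translation units, where dynamic initialization order is unspecified.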
-
- Sep 10, 2013
-
-
Pekka Enberg authored
Commit 3510a5ea ("mmu: File-backed VMAs") forgot to fix vma::split() to take file-backed mappings into account. Fix the problem by making vma::split() a virtual function and implementing it separately for file_vma. Spotted by Avi Kivity.
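A rough sketch of the virtual split() approach, with heavily simplified, hypothetical versions of `vma` and `file_vma`. The real signatures and members are not shown in this log; the sketch only shows why the file-backed variant needs its own implementation (the upper half must start further into the file).

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified anonymous mapping.
class vma {
public:
    vma(std::uintptr_t start, std::uintptr_t end) : _start(start), _end(end) {}
    virtual ~vma() = default;
    std::uintptr_t start() const { return _start; }
    std::uintptr_t end() const { return _end; }

    // Truncate this mapping at `edge` and return the upper part.
    virtual std::unique_ptr<vma> split(std::uintptr_t edge)
    {
        assert(edge > _start && edge < _end);
        std::unique_ptr<vma> upper(new vma(edge, _end));
        _end = edge;
        return upper;
    }
protected:
    std::uintptr_t _start;
    std::uintptr_t _end;
};

// Hypothetical, simplified file-backed mapping.
class file_vma : public vma {
public:
    file_vma(std::uintptr_t start, std::uintptr_t end, int fd, std::uint64_t offset)
        : vma(start, end), _fd(fd), _offset(offset) {}
    std::uint64_t offset() const { return _offset; }

    std::unique_ptr<vma> split(std::uintptr_t edge) override
    {
        assert(edge > _start && edge < _end);
        // The upper half maps the same file, shifted by the size of the
        // lower half.
        std::unique_ptr<vma> upper(
            new file_vma(edge, _end, _fd, _offset + (edge - _start)));
        _end = edge;
        return upper;
    }
private:
    int _fd;
    std::uint64_t _offset;
};
```

Code that walks the VMA list can keep calling `split()` through a `vma` reference and still get a correctly offset file mapping back.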
-
- Sep 08, 2013
-
-
Guy Zana authored
-
- Sep 05, 2013
-
-
Glauber Costa authored
This code, living in device.c for maximum generality, will read the partition table from any disk that calls it. Ideally, each new device would have its own private data. But that would mean having to call back into the driver to set each of the partitions up. Therefore, I found it easier to adopt the convention that all partitions in the same drive have the same private data. This makes some sense if we consider that the hypervisors are usually agnostic about partitions, and all of the addressing and communications go through a single entry point, which is the disk.
-
Glauber Costa authored
To support multiple partitions to a disk, I found it easier to add a post-processing offset calculation to the bio just before calling the strategy. The reason is, we have many (really many) entry points for bio preparation (pre-strategy) and only two entry points for the strategy itself (the drivers). Since multiplex_strategy is a good thing to be used even for virtio (although I am not converting it now), since it allows for arbitrary sized requests, we could very well reduce it to just one. At this moment, the offset is always 0 and everything works as before.
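The "adjust the offset just before the strategy" idea can be sketched as below. The `device` and `bio` structs, the `partition_offset` field and the `submit()` helper are all hypothetical simplifications, not the real structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal device: a partition is a device whose requests are
// shifted by a fixed byte offset into the underlying disk (0 = whole disk).
struct device {
    std::uint64_t partition_offset;
};

// Hypothetical minimal block I/O request.
struct bio {
    device* dev;
    std::uint64_t offset;  // byte offset, relative to the device
    std::uint64_t bytes;
};

using strategy_fn = void (*)(bio*);

// Single choke point: every request passes through here right before the
// driver's strategy routine, so the many places that prepare a bio do not
// need to know about partitions.
inline void submit(bio* b, strategy_fn strategy)
{
    b->offset += b->dev->partition_offset;
    strategy(b);
}
```

With `partition_offset` equal to 0, as in this commit, the request reaches the driver unchanged.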
-
Glauber Costa authored
This patch implements the HPET clock driver, which should work as a fallback for both Xen and KVM in case the paravirtual clock is not present. This is unfortunately the situation for all HVM guests running on EC2, so support for this is paramount. I have tested on KVM, forcing the kvmclock to disappear, and it seems to work all right.
-
Glauber Costa authored
Right now we are doing it right before we parse the MADT, but this is by far not MADT-specific. Other users are planned, and the best way to resolve the disputes is to have it in a separate constructor.
-
- Sep 03, 2013
-
-
Avi Kivity authored
In an attempt to be clever, we define irq_lock as an object in an anonymous namespace, so that each translation unit gets its own copy, which is then optimized away, since the object is never touched. But the compiler complains that the object is defined but not used if we include the file but don't use irq_lock. Simplify by only declaring the object there, and defining it somewhere else.
-
- Sep 02, 2013
-
-
Pekka Enberg authored
This adds simple msync() implementation for file-backed memory maps. It uses the newly added 'file_vma' data structure to write out and fsync the msync'd region as suggested by Avi Kivity.
-
Pekka Enberg authored
Add a new 'file_vma' class that extends 'vma'. This is needed to keep track of fileref and offset for file-backed VMAs for msync().
-
Avi Kivity authored
Spotted by Pekka.
-
Avi Kivity authored
Different source bases have different error conventions; libc has 0/-1+errno, while the rest of the source base uses 0/error. Wrap errors in a class to prevent confusion between the two.
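One possible shape for such a wrapper is sketched below. The class name `error` and its members are assumptions made for illustration; the interface actually added by this commit is not shown in this log.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical wrapper: holds an errno-style code (0 means success) and
// converts to the libc convention only on request.
class error {
public:
    error() : _errno(0) {}
    explicit error(int errno_code) : _errno(errno_code) {}

    bool bad() const { return _errno != 0; }
    int get() const { return _errno; }

    // libc convention: return 0 on success, or set errno and return -1.
    int to_libc() const
    {
        if (!bad()) {
            return 0;
        }
        errno = _errno;
        return -1;
    }
private:
    int _errno;
};

inline error no_error() { return error(); }
```

Because the constructor is explicit, a plain int coming from one convention cannot silently be passed where the other is expected.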
-
- Aug 29, 2013
-
-
Avi Kivity authored
This is used for temporarily dropping a lock in a lexical scope, and reacquiring it after an exit from the scope (similar to wait_until(mutex), but without the waiting):

    WITH_LOCK(preempt_lock) {
        // do some stuff
        while (not enough resources) {
            DROP_LOCK(preempt_lock) {
                acquire more resources
            }
            // reload anything that may have changed after DROP_LOCK()
        }
        // do more stuff with the acquired resources
    }

Note that DROP_LOCK() doesn't work well with recursively-taken locks.
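For readers unfamiliar with scope-style lock macros, here is one way such macros could be built, using RAII helpers and a single-iteration for loop. This is a sketch under assumptions, not the implementation in the tree: `scoped_hold`, `scoped_drop` and `traced_lock` are hypothetical names.

```cpp
#include <cassert>

// Holds the lock for the lifetime of the object.
template <typename Lock>
class scoped_hold {
public:
    explicit scoped_hold(Lock& l) : _l(l), _done(false) { _l.lock(); }
    ~scoped_hold() { _l.unlock(); }
    // True exactly once, so the for loop body runs a single time.
    bool once() { bool first = !_done; _done = true; return first; }
private:
    Lock& _l;
    bool _done;
};

// Releases the lock for the lifetime of the object, then retakes it.
template <typename Lock>
class scoped_drop {
public:
    explicit scoped_drop(Lock& l) : _l(l), _done(false) { _l.unlock(); }
    ~scoped_drop() { _l.lock(); }
    bool once() { bool first = !_done; _done = true; return first; }
private:
    Lock& _l;
    bool _done;
};

#define WITH_LOCK(l) \
    for (scoped_hold<decltype(l)> with_lock_state_(l); with_lock_state_.once(); )
#define DROP_LOCK(l) \
    for (scoped_drop<decltype(l)> drop_lock_state_(l); drop_lock_state_.once(); )

// Hypothetical non-recursive lock that records whether it is held, so the
// behaviour of the macros can be observed.
struct traced_lock {
    bool held = false;
    void lock() { assert(!held); held = true; }
    void unlock() { assert(held); held = false; }
};
```

The sketch also makes the caveat about recursive locks visible: `scoped_drop` unlocks only once, so a lock taken twice would still be held inside the DROP_LOCK() block.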
-
Avi Kivity authored
We don't want the compiler moving reads after a possible rcu_defer().
-
- Aug 27, 2013
-
-
Nadav Har'El authored
Commit 65afd075 fixed mincore() to recognize unmapped addresses. However, it used mmu::ismapped() which just checks for mmap()'ed addresses, and doesn't know about malloc()ed memory. This causes trouble for libunwind (which we use for backtrace()) which tests mincore() on an on-stack variable, and for non-pthread threads, this stack might be malloc'ed, not mmap'ed. So this patch adds mmu::isreadable(), which checks that a given memory range is all readable (this memory can be mmapped, malloced, stack, whatever). mincore() now uses that. mmu::isreadable() is implemented, following Avi's idea, by trying to read, with safe_load(), one byte from every page in the range. This approach is faster than page-table-walking, especially for one-byte checks (which are all libunwind uses anyway), and also very simple.
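The "one byte from every page" loop can be sketched as follows. Since safe_load() needs kernel fault handling, the sketch takes a hypothetical `probe` callback in its place; `is_readable`, `probe_fn` and the 4096-byte page size are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for safe_load(): returns true if one byte at the
// given address could be read without faulting.
using probe_fn = bool (*)(const void*);

constexpr std::uintptr_t page_size = 4096;  // assumed page size

// Check a range by probing its first byte and then the first byte of every
// following page that the range touches.
inline bool is_readable(const void* addr, std::size_t size, probe_fn probe)
{
    if (size == 0) {
        return true;
    }
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(addr);
    const std::uintptr_t end = start + size;
    if (!probe(addr)) {
        return false;
    }
    std::uintptr_t page = (start & ~(page_size - 1)) + page_size;
    for (; page < end; page += page_size) {
        if (!probe(reinterpret_cast<const void*>(page))) {
            return false;
        }
    }
    return true;
}
```

For the one-byte checks that libunwind makes, this costs a single probe, which is where the advantage over a page-table walk comes from.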
-
Nadav Har'El authored
Commit 65afd075 that fixed mincore() exposed a deadlock in the leak detector, caused by two threads taking two locks in opposite order: Thread 1: malloc() does alloc_tracker::remember(). This takes the tracker lock and calls backtrace() calling mincore() which takes the vma_list_mutex. Thread 2: mmap() does mmu::allocate() which takes the vma_list_mutex and then through mmu::populate::small_page calls memory::alloc_page() which calls alloc_tracker::remember() and takes the tracker lock. This patch fixes this deadlock: alloc_tracker::remember() will now drop its lock while running backtrace(), as the lock is only needed to protect the allocations[] array. We need to retake the lock after backtrace() completes, to copy the backtrace back to the allocations[] array. Previously, the lock's depth was also (ab)used for avoiding nested allocation tracking (e.g., tracking of memory allocation done inside backtrace() itself), but now that backtrace() is run without the lock, we need a different mechanism - a per-thread "in_tracker" flag, which is turned on inside the alloc_tracker::remember()/forget() methods.
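The locking pattern described here (reserve a slot under the lock, run backtrace() unlocked, retake the lock to store the result, and use a per-thread flag against nesting) can be sketched as below. `toy_tracker`, `record` and `capture_backtrace()` are hypothetical; the fake backtrace simply pretends to allocate so the nesting guard is exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Per-thread flag: set while this thread is inside the tracker.
thread_local bool in_tracker = false;

struct record {
    void* addr;
    int depth;  // stands in for the stored backtrace
};

class toy_tracker {
public:
    void remember(void* addr)
    {
        if (in_tracker) {
            return;  // nested call from inside the tracker: do not track
        }
        in_tracker = true;
        std::size_t slot;
        {
            std::lock_guard<std::mutex> guard(_lock);
            _allocations.push_back({addr, 0});
            slot = _allocations.size() - 1;  // stable: the vector only grows
        }
        // The lock is NOT held here, so whatever locks the backtrace code
        // takes cannot deadlock against the tracker lock.
        int depth = capture_backtrace();
        {
            std::lock_guard<std::mutex> guard(_lock);
            _allocations[slot].depth = depth;
        }
        in_tracker = false;
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> guard(_lock);
        return _allocations.size();
    }

    int depth_of(std::size_t slot)
    {
        std::lock_guard<std::mutex> guard(_lock);
        return _allocations[slot].depth;
    }
private:
    // Hypothetical stand-in for backtrace(): pretends to allocate memory,
    // which re-enters remember() and must be ignored.
    int capture_backtrace()
    {
        remember(&_scratch);
        return 3;
    }

    std::mutex _lock;
    std::vector<record> _allocations;
    char _scratch = 0;
};
```

The flag replaces the earlier trick of inferring nesting from the lock's recursion depth, which no longer works once the lock is dropped around the backtrace.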
-
- Aug 26, 2013
-
-
Nadav Har'El authored
sched.hh included elf.hh, just so it can refer to the elf::tls_data type. But now that we have rcu.hh which includes sched.hh and therefore elf.hh, if we wish to use rcu in elf.hh (we'll do this in a later patch), we have an include loop mess. So better not include elf.hh from sched.hh, and just declare the one struct we need. After sched.hh no longer includes elf.hh and the dozen includes that it further included, we need to add missing includes to some of the code that included sched.hh and relied on its implicit includes.
-
- Aug 18, 2013
-
-
Avi Kivity authored
Following 71fec998, we note that if any bit in the wakeup mask is set, then an IPI to that cpu is either imminent or already in flight, and we can elide our own IPI to that cpu.
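The elision rule can be sketched with an atomic mask: a sender sets its bit, and only the sender that finds the mask empty actually sends the IPI. `toy_cpu`, `incoming_wakeups_mask`, `request_wakeup` and `drain_wakeups` are hypothetical names for illustration, and the IPI is replaced by a counter.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-CPU state.
struct toy_cpu {
    // Bit N set: CPU N has queued a wakeup for this CPU.
    std::atomic<unsigned long> incoming_wakeups_mask{0};
    unsigned ipis_sent = 0;

    void send_wakeup_ipi() { ++ipis_sent; }  // stand-in for a real IPI
};

// Called on CPU `from` to wake something on `target`.
inline void request_wakeup(toy_cpu& target, unsigned from)
{
    const unsigned long bit = 1UL << from;
    const unsigned long old =
        target.incoming_wakeups_mask.fetch_or(bit, std::memory_order_acq_rel);
    // If any bit was already set, another sender's IPI is imminent or in
    // flight and will cause the target to look at the mask, so ours can be
    // skipped.
    if (old == 0) {
        target.send_wakeup_ipi();
    }
}

// Called on the target when it handles the IPI: take all pending bits.
inline unsigned long drain_wakeups(toy_cpu& target)
{
    return target.incoming_wakeups_mask.exchange(0, std::memory_order_acq_rel);
}
```

The atomic fetch_or makes exactly one of several concurrent senders responsible for the IPI, and the exchange on the target side re-arms the mechanism.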
-
- Aug 16, 2013
-
-
Pekka Enberg authored
Avoid sending an IPI to a CPU that's already being woken up by another IPI. This reduces IPIs by 17% for a cassandra-stress run. Execution time is obviously unaffected because execution is bound by lock contention.

Before:

  [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
  ^C
   Performance counter stats for process id '610':

       6,909,333 kvm:kvm_entry
               0 kvm:kvm_hypercall
               0 kvm:kvm_hv_hypercall
       1,035,125 kvm:kvm_pio
               0 kvm:kvm_cpuid
       5,149,393 kvm:kvm_apic
       6,909,369 kvm:kvm_exit
       2,108,440 kvm:kvm_inj_virq
               0 kvm:kvm_inj_exception
             982 kvm:kvm_page_fault
       2,783,005 kvm:kvm_msr
               0 kvm:kvm_cr
           7,354 kvm:kvm_pic_set_irq
       2,366,388 kvm:kvm_apic_ipi
       2,468,569 kvm:kvm_apic_accept_irq
       2,067,044 kvm:kvm_eoi
       1,982,000 kvm:kvm_pv_eoi
               0 kvm:kvm_nested_vmrun
               0 kvm:kvm_nested_intercepts
               0 kvm:kvm_nested_vmexit
               0 kvm:kvm_nested_vmexit_inject
               0 kvm:kvm_nested_intr_vmexit
               0 kvm:kvm_invlpga
               0 kvm:kvm_skinit
           3,677 kvm:kvm_emulate_insn
               0 kvm:vcpu_match_mmio
               0 kvm:kvm_update_master_clock
               0 kvm:kvm_track_tsc
           7,354 kvm:kvm_userspace_exit
           7,354 kvm:kvm_set_irq
           7,354 kvm:kvm_ioapic_set_irq
             674 kvm:kvm_msi_set_irq
               0 kvm:kvm_ack_irq
               0 kvm:kvm_mmio
         609,915 kvm:kvm_fpu
               0 kvm:kvm_age_page
               0 kvm:kvm_try_async_get_page
               0 kvm:kvm_async_pf_doublefault
               0 kvm:kvm_async_pf_not_present
               0 kvm:kvm_async_pf_ready
               0 kvm:kvm_async_pf_completed

    81.180469772 seconds time elapsed

After:

  [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
  ^C
   Performance counter stats for process id '30824':

       6,411,175 kvm:kvm_entry                 [100.00%]
               0 kvm:kvm_hypercall             [100.00%]
               0 kvm:kvm_hv_hypercall          [100.00%]
         992,454 kvm:kvm_pio                   [100.00%]
               0 kvm:kvm_cpuid                 [100.00%]
       4,300,001 kvm:kvm_apic                  [100.00%]
       6,411,133 kvm:kvm_exit                  [100.00%]
       2,055,189 kvm:kvm_inj_virq              [100.00%]
               0 kvm:kvm_inj_exception         [100.00%]
           9,760 kvm:kvm_page_fault            [100.00%]
       2,356,260 kvm:kvm_msr                   [100.00%]
               0 kvm:kvm_cr                    [100.00%]
           3,354 kvm:kvm_pic_set_irq           [100.00%]
       1,943,731 kvm:kvm_apic_ipi              [100.00%]
       2,047,024 kvm:kvm_apic_accept_irq       [100.00%]
       2,019,044 kvm:kvm_eoi                   [100.00%]
       1,949,821 kvm:kvm_pv_eoi                [100.00%]
               0 kvm:kvm_nested_vmrun          [100.00%]
               0 kvm:kvm_nested_intercepts     [100.00%]
               0 kvm:kvm_nested_vmexit         [100.00%]
               0 kvm:kvm_nested_vmexit_inject  [100.00%]
               0 kvm:kvm_nested_intr_vmexit    [100.00%]
               0 kvm:kvm_invlpga               [100.00%]
               0 kvm:kvm_skinit                [100.00%]
           1,677 kvm:kvm_emulate_insn          [100.00%]
               0 kvm:vcpu_match_mmio           [100.00%]
               0 kvm:kvm_update_master_clock   [100.00%]
               0 kvm:kvm_track_tsc             [100.00%]
           3,354 kvm:kvm_userspace_exit        [100.00%]
           3,354 kvm:kvm_set_irq               [100.00%]
           3,354 kvm:kvm_ioapic_set_irq        [100.00%]
             927 kvm:kvm_msi_set_irq           [100.00%]
               0 kvm:kvm_ack_irq               [100.00%]
               0 kvm:kvm_mmio                  [100.00%]
         620,278 kvm:kvm_fpu                   [100.00%]
               0 kvm:kvm_age_page              [100.00%]
               0 kvm:kvm_try_async_get_page    [100.00%]
               0 kvm:kvm_async_pf_doublefault  [100.00%]
               0 kvm:kvm_async_pf_not_present  [100.00%]
               0 kvm:kvm_async_pf_ready        [100.00%]
               0 kvm:kvm_async_pf_completed

    79.947992238 seconds time elapsed
-
Christoph Hellwig authored
We'll need this for any pathname related actions.
-
Christoph Hellwig authored
Create a new dentry structure for pathname components, following the Linux VFS model. The vnodes are left as-is for now but are always fronted by dentries for pathname lookups. In a second step they will be moved to use non-pathname indices. [penberg: fix open(O_CREAT|O_EXCL) breakage ]
-
Christoph Hellwig authored
-
- Aug 14, 2013
-
-
Pekka Enberg authored
As suggested by Avi, RCU-protect tracepoint_base::probes to make sure probes are really stopped before the caller accesses the collected traces.
-
Avi Kivity authored
-
- Aug 13, 2013
-
-
Glauber Costa authored
I am proposing, with this patch, a very simple alternative system to serve as a basis for xen pv operations. The end goal is to patch the performance critical instructions in, but I will defer it until later since this is a performance optimization. Let's get that working first. However, I figured that if we are already writing the xen pv code enclosed in some kind of macro, then when we do patch, we won't have to change anything. That said, I don't expect to have a lot of pure pv users - it is 2013, and even VMware discontinued their vmi, leaving Xen as the only relevant player. We don't need, then, a fully featured core-pv ops like Linux. This system of alternatives is simple enough to accommodate xen, and it works by providing two code blocks and a condition. The first block is executed if the condition is false, and the second if the condition is true. For future reference, note that we can use this when patching, by doing something very similar to Linux jump labels: we replace the branch with a jump instruction that just jumps to the right place (taken or not-taken part). This brings simplicity and runtime efficiency at the expense of a little bit more icache pressure.
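The "two code blocks and a condition" shape can be sketched as a small helper. This is only an unpatched, branch-based illustration with a hypothetical `alternative()` function; the macro actually introduced by the patch is not shown in this log.

```cpp
#include <cassert>

// Hypothetical helper: run `when_false` if the condition is false and
// `when_true` if it is true, mirroring the order described above. Both
// callables must return the same type.
template <typename WhenFalse, typename WhenTrue>
inline auto alternative(bool cond, WhenFalse when_false, WhenTrue when_true)
    -> decltype(when_false())
{
    return cond ? when_true() : when_false();
}
```

A call site would pass the native code as the first block and the Xen PV code as the second, with "running as a Xen PV guest" as the condition. Runtime patching, as outlined in the commit message, would later replace the branch with a direct jump.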
-
Glauber Costa authored
This is used by subr_disk during bio flush operation
-