- Dec 11, 2013
-
-
Nadav Har'El authored
Once page_fault() checks that this is not a fast fixup (see safe_load()), we reach the page-fault slow path, which needs to allocate memory or even read from disk, and might sleep. If we ever get such a slow page fault inside kernel code which has preemption or interrupts disabled, this is a serious bug, because the code in question thinks it cannot sleep. So this patch adds two assertions to verify this. The preemptable() assertion is easily triggered if stacks are demand-paged as explained in commit 41efdc1c (I have a patch to solve this, but it won't fit in the margin). However, I've also seen this assertion fire without demand-paged stacks, when running all tests together through testrunner.so. So I'm hoping these assertions will be helpful in hunting down some elusive bugs we still have. This patch adds a third use of the "0x200" constant (bit 9 of the rflags register is the interrupt flag), so it replaces them with a new symbolic name, processor::rflags_if. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
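A minimal sketch of the idea, not the actual OSv code: the constant name processor::rflags_if comes from the commit message, while interrupts_enabled() and assert_may_sleep() are hypothetical helpers invented here for illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace processor {
// Bit 9 of RFLAGS is the interrupt flag (IF): 1 << 9 == 0x200.
constexpr std::uint64_t rflags_if = std::uint64_t(1) << 9;
}

// Hypothetical helper: true if a saved RFLAGS value has interrupts enabled.
constexpr bool interrupts_enabled(std::uint64_t rflags)
{
    return (rflags & processor::rflags_if) != 0;
}

// Hypothetical stand-in for the two slow-path assertions: the faulting
// context must be preemptable and must have had interrupts enabled,
// because the slow path may sleep.
inline void assert_may_sleep(std::uint64_t saved_rflags, bool preemptable)
{
    assert(preemptable);
    assert(interrupts_enabled(saved_rflags));
}
```

Naming the bit once keeps the three call sites from each repeating the bare 0x200 literal.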
-
- Dec 10, 2013
-
-
Avi Kivity authored
One problem with wake() is that, if the thread it is waking can concurrently exit, it may touch freed memory belonging to the thread structure. Fix by separating the state that wake() touches into a detached_state structure, and free that using RCU. Add a thread_handle class that references only this detached state, and accesses it via RCU. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 08, 2013
-
-
Glauber Costa authored
It seems that we also had problems with our own code for int vs long issues. I am really surprised that the C++ compiler didn't throw any warnings for this since all word sizes are quite explicit. In any case, this seems to be the missing piece for xen booting with many CPUs. It boots fine now with up to 32 CPUs. After that, other problems start to appear. Fixes #113 Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 05, 2013
-
-
Glauber Costa authored
no users in tree. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Our APIC code is so wrong, but so wrong, that it even produces incorrect results. X2APIC is fine, but XAPIC is using xapic::ipi() for all its interrupts. The problem with that is that the customary place for "vector" is inverted in the case of the allbutself delivery mode, and therefore we're sending these IPIs to God Knows Where - not to the processors, that is for sure. As a result, we would spin waiting for IRQ acks that would never arrive. I could invert and reorganize the parameters and comment this out, but I've decided it is a lot clearer just to open code it. Also, there is no need at all to set ICR2 for allbutself, because the destination is already embedded in the firing mode. One issue: the NMI path is copied over because it is wrong for the same reasons, so I fixed it too. But I don't have a test case for this. Fixes #110 Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
-
-
Glauber Costa authored
This patch fixes the Amazon crash with many CPUs. Funnily enough, it is not actually related to the number of CPUs. In that situation, the port numbers allocated for the event channels are quite high. I don't really know the reason for that, maybe the Hypervisor reserves the small bits for CPU related things... As we have more and more CPUs and the bits shift more and more rightwards, they eventually reach the second long word of the event channel quadword. But we have been operating this with brsl, which will only reach a long word. If bit 32, for instance, is 1, it will be interpreted as bit 0 == 1. Bit 0 having no registered handler, that will turn into a nullptr access. Cheers to Dima for doing most of the debugging and heavy lifting here. The hang issues are still present. Fixes #109. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
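To illustrate why the scan has to cover the whole 64-bit word, here is a small portable sketch. It is not the handler from the patch: pending_ports() is a hypothetical name, and a plain loop stands in for the bit-scan instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the index of every set bit in a 64-bit
// pending word. Because the scan covers all 64 bits, a port such as 32 is
// reported as 32, never folded back onto bit 0 of a 32-bit word.
std::vector<unsigned> pending_ports(std::uint64_t word)
{
    std::vector<unsigned> ports;
    for (unsigned bit = 0; bit < 64; ++bit) {
        if ((word >> bit) & 1u) {
            ports.push_back(bit);
        }
    }
    assert(ports.size() <= 64);
    return ports;
}
```

With only bit 32 set, the result is the single port 32, so the dispatcher would look up the handler registered for port 32 rather than the empty slot 0.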
-
Dan Schatzberg authored
The SMP bringup code uses a linked list of stacks. The APs are all brought up concurrently, so to acquire a stack they compare and swap on smp_stack_free. The current code works as long as the compare and swap succeeds. If it fails, smp_stack_free MUST be read again; otherwise the AP will deadlock. This patch fixes the current code by enforcing a re-read on a cmpxchg failure. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Dan Schatzberg <schatzberg.dan@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
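The retry rule can be sketched in C++ with std::atomic (the real bringup code is assembly; boot_stack, stack_free_list, push_stack() and pop_stack() are hypothetical names). compare_exchange_weak() writes the freshly observed value back into its first argument on failure, which is exactly the re-read the fix enforces.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type: each free stack links to the next one.
struct boot_stack {
    boot_stack* next;
};

// Hypothetical head of the free list (plays the role of smp_stack_free).
std::atomic<boot_stack*> stack_free_list{nullptr};

void push_stack(boot_stack* s)
{
    assert(s != nullptr);
    s->next = stack_free_list.load(std::memory_order_relaxed);
    // On failure, s->next is updated to the current head before retrying.
    while (!stack_free_list.compare_exchange_weak(
               s->next, s, std::memory_order_release, std::memory_order_relaxed)) {
    }
}

boot_stack* pop_stack()
{
    boot_stack* head = stack_free_list.load(std::memory_order_acquire);
    // On failure, head is reloaded with the current list head, so the next
    // attempt never compares against a stale value.
    while (head != nullptr &&
           !stack_free_list.compare_exchange_weak(
               head, head->next, std::memory_order_acquire, std::memory_order_acquire)) {
    }
    return head;
}
```

This sketch assumes, as in the bringup scenario, that stacks are only popped concurrently; a general-purpose lock-free stack would also need to deal with the ABA problem.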
-
- Dec 03, 2013
-
-
Glauber Costa authored
We had at least one report for a system in which the __mm_setcsr function we were using is not found. Since this is heavily arch specific anyway, we should just use inline assembly to do so. Generated code verified to be the same:
  33c6e7: c7 45 e0 80 1f 00 00    movl $0x1f80,-0x20(%rbp)
  33c6ee: 0f ae 55 e0             ldmxcsr -0x20(%rbp)
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 01, 2013
-
-
Nadav Har'El authored
During boot, between the time main() sets the IDT and when later smp_launch() is called, the IDT doesn't actually work correctly. The problem is that we use the separate stacks feature (IST), and that doesn't work without also setting the GDT, not only the IDT. So use init_on_cpu() to initialize not only the IDT, but other stuff as well. Fix smp_launch() not to repeat this initialization on the boot CPU, as it was already done. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Nov 26, 2013
-
-
Nadav Har'El authored
This patch resolves issue #26. As you can see with "objdump -h build/release/loader.elf", our executable had over a thousand (!) separate sections, most of which should really be merged. We already started doing this in arch/x64/loader.ld, but didn't complete the work. This patch merges all the ".gcc_except_table.*" sections into one, and all the ".data.rel.ro.*" sections into one. After this merge, we are left with just 52 sections, instead of more than 1000. The default linker script (run "ld --verbose" to see it) also does similar merges, so there's no reason why we shouldn't. By reducing the number of ELF sections (each comes with a name, headers, etc.), this patch also reduces the size of our loader-stripped.elf by about 140K. Fixes #26. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Dmitry Fleytman authored
The bug fixed by this patch made OSv crash on Xen during boot. The problem started to show up after commit:
  commit ed808267
  Author: Nadav Har'El <nyh@cloudius-systems.com>
  Date: Mon Nov 18 23:01:09 2013 +0200
      percpu: Reduce size of .percpu section
Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 21, 2013
-
-
Nadav Har'El authored
prio.hh defines various initialization priorities. The actual numbers don't matter, just the order between them. But when we add too many priorities between existing ones, we may hit a need to renumber. This is plain ugly, and reminds me of Basic programming ;-) So this patch switches to an enum (enum class, actually). We now just have a list of priority names in order, with no numbers. It would have been straightforward, if it weren't for a bug in GCC (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211 ) where the "init_priority" attribute doesn't accept the enum (while the "constructor" attribute does). Luckily, a simple workaround - explicitly casting to int - works. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
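A rough sketch of the enum approach follows. The priority names are hypothetical and not the ones in prio.hh; the starting value of 101 reflects GCC's documented rule that init_priority values up to 100 are reserved.

```cpp
#include <cassert>

// Hypothetical priority list: only the order matters, so new entries can be
// inserted anywhere without renumbering the ones that follow.
enum class init_prio : int {
    first = 101,  // GCC reserves init_priority values up to 100
    console,
    memory,
    threads,
    drivers,
};

// The explicit cast to int is the workaround the commit message describes
// for the init_priority attribute, e.g. (shown as a comment only):
//   some_type obj __attribute__((init_priority((int)init_prio::console)));
constexpr int prio_value(init_prio p)
{
    return static_cast<int>(p);
}

static_assert(prio_value(init_prio::console) < prio_value(init_prio::memory),
              "earlier names get lower (earlier) priorities");
```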
-
- Nov 19, 2013
-
-
Nadav Har'El authored
This patch reduces the size of the .percpu section 64-fold from about 5 MB to 70 KB, and solves issue #95. The ".percpu" section is part of the .data section of our executable (loader-stripped.elf). In our 15 MB executable, roughly 7 MB is text (code), and 7 MB is data, and out of that, a whopping 5 MB is the ".percpu" section. The executable is read in real mode, and this is especially slow on Amazon EC2, hence our wish to make the executable as small as possible. The percpu section starts with all the PERCPU variables defined in the program. We have about 70 KB of those, and believe it or not, most of this 70 KB is just a single variable, the 65K dynamic_percpu_buffer (see percpu.cc). But then, we need a copy of these variables for each CPU. The unpatched code duplicated this 70KB section 64 times in the executable file (!), and then used these memory locations for up-to-64 cpus. But there is no reason to duplicate this data in the executable! All we need to do is to dynamically allocate a copy of this section for each CPU, and this is what this patch does. This patch removes about 5 MB from our executable: After this patch, our loader-stripped.elf is just 9.7 MB, and its data section's size is just 2.8 MB. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
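The allocation scheme can be sketched as follows, assuming the .percpu template is available as a byte range at run time. allocate_percpu_copies() and its parameters are hypothetical names, not the functions in percpu.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch: keep a single template of the per-CPU variables in
// the executable and give each CPU its own heap-allocated copy at boot,
// instead of storing one copy per possible CPU in the file.
std::vector<std::unique_ptr<unsigned char[]>>
allocate_percpu_copies(const unsigned char* tmpl, std::size_t size, unsigned ncpus)
{
    assert(tmpl != nullptr || size == 0);
    std::vector<std::unique_ptr<unsigned char[]>> copies;
    copies.reserve(ncpus);
    for (unsigned cpu = 0; cpu < ncpus; ++cpu) {
        std::unique_ptr<unsigned char[]> copy(new unsigned char[size]);
        if (size != 0) {
            std::memcpy(copy.get(), tmpl, size);
        }
        copies.push_back(std::move(copy));
    }
    return copies;
}
```

With a 70 KB template, the file carries 70 KB instead of 64 * 70 KB = 4480 KB, which matches the roughly 5 MB saving reported above.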
-
- Nov 12, 2013
-
-
Pekka Enberg authored
Make sure the stack pointer is 16-byte aligned in the fault handler, as required by the x86-64 ABI. This is needed for the page fault handler to be able to use the stack for FPU state save/restore. Spotted by Nadav. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
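The alignment itself is a single mask operation. A minimal sketch, with align_down() as a hypothetical helper (the real fix lives in the assembly entry code):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round an address down to a power-of-two boundary.
// Stacks grow downwards on x86-64, so rounding the stack pointer down
// never moves it into memory that is already in use.
inline std::uint64_t align_down(std::uint64_t value, std::uint64_t alignment)
{
    // The mask trick below only works for power-of-two alignments.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return value & ~(alignment - 1);
}
```

For example, align_down(rsp, 16) clears the low four bits of the stack pointer, which gives the 16-byte alignment the ABI requires.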
-
- Nov 11, 2013
-
-
Avi Kivity authored
For an unknown reason, the current calculation of .tls_template_size yields 0x10 instead of the correct value. This results in part of the initial tls block being freed by arch::setup(), and subsequent corruption. Fix by switching to the ld SIZEOF() operator. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 07, 2013
-
-
Pekka Enberg authored
Replace magic numbers with constants for CR0 and CR4 control register values in arch/x86/boot.S. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 04, 2013
-
-
Dmitry Fleytman authored
slop now defaults to the page size, because this is the most frequent case. Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com>
-
Glauber Costa authored
If we do, we'll have extra xen threads running around for no reason in the !xen case. is_xen() should already work here since the initializers are run after cpuid discovery, which is when the information for is_xen is filled up. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Ironically, my commit 0c77d3c2 broke xen_features initialization. This patch fixes it. Reported-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphael.scarv@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 01, 2013
-
-
Raphael S. Carvalho authored
The code wouldn't work if XENFEAT_NR_SUBMAPS > 1. It's currently 1. The assignment to xen_features must take i as well as j into consideration. Signed-off-by:
Raphael S. Carvalho <raphael.scarv@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
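The indexing issue can be shown with a small sketch. unpack_features() is a hypothetical function that takes the submaps as plain 32-bit words; the real code obtains each submap from the hypervisor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: expand N 32-bit feature submaps into one flat table
// of flags. The flat index must combine the submap number i with the bit
// number j; using j alone would make every submap overwrite entries 0..31.
std::vector<bool> unpack_features(const std::vector<std::uint32_t>& submaps)
{
    std::vector<bool> features(submaps.size() * 32, false);
    for (std::size_t i = 0; i < submaps.size(); ++i) {
        for (unsigned j = 0; j < 32; ++j) {
            features[i * 32 + j] = ((submaps[i] >> j) & 1u) != 0;
        }
    }
    assert(features.size() == submaps.size() * 32);
    return features;
}
```

With a single submap the two indexing schemes coincide, which is why the bug stays hidden while XENFEAT_NR_SUBMAPS is 1.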
-
- Oct 30, 2013
-
-
Avi Kivity authored
The TLS segment is a little weird in that it grows backwards from percpu_base instead of forwards. This causes the alignment code to calculate wrong offsets when the segment size is 8 (mod 16). A failure was seen where ::percpu_base was set at offset 0xfffffffffffffa08 in code that was in the same translation unit as ::percpu_base, and 0xfffffffffffffa10 elsewhere. This caused all dynamic_percpu instances to crash. Fix by aligning the segment size. For good measure, also align the segment base, both to a cacheline boundary. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
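As a worked illustration: the offset 0xfffffffffffffa08 corresponds to a distance of 0x5f8 bytes below the base, which is 8 (mod 16). The sketch below rounds such a size up, assuming a 64-byte cacheline; align_up() is a hypothetical helper, and the actual fix is made in the linker script.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a size up to a power-of-two boundary.
inline std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment)
{
    // The mask trick below only works for power-of-two alignments.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

Here align_up(0x5f8, 64) yields 0x600, so offsets computed backwards from the base no longer depend on how the rounding is applied.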
-
- Oct 28, 2013
-
-
Avi Kivity authored
The expression *p >> 24, where p is an unsigned*, is optimized by the compiler to *((u8*)p + 3) - reading the most significant byte only and dropping the shift. When this optimization is applied to reading the APIC ID, QEMU returns zero for all processors, since the manual requires reading entire words. Fix by using a volatile pointer, disabling the optimization. Note that QEMU is technically correct here though it violates all known real x86 implementations. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
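A minimal sketch of the volatile read, with apic_id_from_register() as a hypothetical name. Declaring the pointee volatile obliges the compiler to perform the access as written, a full 32-bit load, before shifting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read a 32-bit APIC ID register and extract the ID
// from its top byte. Because the pointee is volatile, the compiler may not
// narrow the access to a single-byte load of the most significant byte.
inline std::uint32_t apic_id_from_register(const volatile std::uint32_t* id_reg)
{
    assert(id_reg != nullptr);
    std::uint32_t value = *id_reg;  // one full-width load
    return value >> 24;
}
```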
-
Avi Kivity authored
xapic::init_ipi() shifts apic_id by 24, unaware that xapic::ipi will do it again. The result is that the boot processor is reset instead of the auxiliary processor. Remove the extraneous shift. Found by booting with QEMU without kvm. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Oct 24, 2013
-
-
Pekka Enberg authored
Spotted by Clang:
  ../../arch/x64/arch-cpu.hh:57:1: error: 'arch_thread' defined as a struct here but previously declared as a class [-Werror,-Wmismatched-tags]
  struct arch_thread {
  ^
  ../../arch/x64/arch-cpu.hh:37:1: note: did you mean struct here?
  class arch_thread;
  ^~~~~
  struct
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Spotted by Clang:
  ../../include/sched.hh:278:12: error: class 'arch_cpu' was previously declared as a struct [-Werror,-Wmismatched-tags]
      friend class arch_cpu;
             ^
  ../../arch/x64/arch-cpu.hh:39:8: note: previous use is here
  struct arch_cpu {
         ^
  ../../include/sched.hh:278:12: note: did you mean struct here?
      friend class arch_cpu;
             ^~~~~
             struct
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Oct 23, 2013
-
-
Nadav Har'El authored
As noticed by Tomek in issue #64, unhandled C++ exceptions cause OSv to silently hang, in an endless loop inside the unwinding code. So this patch fixes the wrong CFI (DWARF Call Frame Information) which caused the unwinder to loop. We just had a single line of assembly missing: The topmost frame - the thread's main function - needs to undefine the saved %rip to prevent going further back. If we don't do that, gdb will end every "bt" output with a warning "Frame did not save its PC" (but hey, nobody complained... ;-)), and the unwinding library will, unfortunately, go into an endless loop as seen in issue #64. With this one-line patch, unhandled exceptions now work as expected - they abort with a message like:
  terminate called after throwing an instance of 'int'
  Aborted
And attaching a debugger you can see exactly where the offending throw came from (i.e., the stack does *not* unnecessarily unwind when there's nobody waiting to catch the exception). This works for uncaught exceptions anywhere - including inside main() and from constructors when loading the object (before running main()). "bt" in gdb also no longer ends each stack trace with an error message. The last frame it shows is "thread_main()". Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com>
-
- Oct 22, 2013
-
-
Pekka Enberg authored
The debug() call can deadlock because it's using boost format. Switch to debug_ll(). Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
The debug() format string is missing a newline. Fix that up. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
This is a workaround for a linker error when compiling with -O0:
  `.text._Z9safe_loadIcEbPKT_RS0_' referenced in section `.text.fixup' of core/mmu.o: defined in discarded section `.text._Z9safe_loadIcEbPKT_RS0_[_Z9safe_loadIcEbPKT_RS0_]' of core/mmu.o
The safe_load() template is used in both runtime.cc and core/mmu.cc, but the linker keeps it in only one section, discarding the other. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
-
- Oct 16, 2013
-
-
Pekka Enberg authored
Dump registers on general protection fault for debugging purposes. Even if you have gdb available, getting to the exception frame is not always possible after OSv has crashed. Example output looks as follows:
  registers:
  RIP: 0x0000100000b7e913 RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
  RAX: 0xffffc000418ed278 RBX: 0xffffc00041b2c050 RCX: 0x0000000000000004 RDX: 0x0000000000000000
  RSI: 0x0000000000000001 RDI: 0x43e0000000000000 RBP: 0x0000200008548d10 R8: 0xffffc000426e3010
  R9: 0x0000000000000004 R10: 0x43e0000000000000 R11: 0xffffc00041b2c050 R12: 0xffffc000418ed1e8
  R13: 0x0000000000000004 R14: 0x43e0000000000000 R15: 0xffffc00041b2c050 RSP: 0x0000200008548aa0
  general protection fault
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Oct 11, 2013
-
-
Pekka Enberg authored
As of commit a449b889 ("x64: Enable sleeping in fault context") it's now safe for another thread to enter a fault handler on the same CPU. Fix exception guard to reflect that. This is needed for demand paging where a page fault from another thread can happen on the same CPU where a thread is sleeping in the page fault handler. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Oct 10, 2013
-
-
Avi Kivity authored
We have _KERNEL defines scattered throughout the code, which makes understanding it difficult. Define it just once, and adjust the source to build. We define it in an overridable variable, so that non-kernel imported code can undo it.
-
- Oct 01, 2013
-
-
Pekka Enberg authored
In preparation for enabling demand paging, enable sleeping in fault context by using a per-thread exception stack for normal faults and per-CPU exception stack for nested faults. Avi Kivity explains:
  Before [demand paging] can even hope to work, we need to enable sleeping in fault context. Right now each cpu has its own exception stack, which leads immediately to stack corruption:
    thread 1 faults
    enters exception stack
    tries to take mutex
    scheduler switches to thread 2
    thread 2 faults
    enters same exception stack
  So we need to switch stacks. This can be done in the same way as for interrupt stacks (see thread::switch_to()).
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Sep 30, 2013
-
-
Venkatesh Srinivas authored
Older versions of KVM and user VMMs expose kvmclock MSRs at different MSR offsets. Detect the old flag in kvmclock::probe() and use the old MSRs if they are the only ones available. Signed-off-by:
Venkatesh Srinivas <venkateshs@google.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
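The selection logic might look roughly like the sketch below. The feature bit positions and MSR numbers are the values I recall from Linux's kvm_para.h and should be checked against that header; select_kvmclock_msrs() and kvmclock_msrs are hypothetical names, and the real kvmclock::probe() may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, as defined in Linux's kvm_para.h.
constexpr unsigned kvm_feature_clocksource  = 0;  // old MSR interface
constexpr unsigned kvm_feature_clocksource2 = 3;  // new MSR interface
constexpr std::uint32_t msr_kvm_wall_clock      = 0x11;
constexpr std::uint32_t msr_kvm_system_time     = 0x12;
constexpr std::uint32_t msr_kvm_wall_clock_new  = 0x4b564d00;
constexpr std::uint32_t msr_kvm_system_time_new = 0x4b564d01;

struct kvmclock_msrs {
    std::uint32_t wall_clock;
    std::uint32_t system_time;
};

// Hypothetical helper: prefer the new MSRs, fall back to the old ones if
// they are the only ones advertised. Returns false if neither is present.
bool select_kvmclock_msrs(std::uint32_t features, kvmclock_msrs& out)
{
    if (features & (1u << kvm_feature_clocksource2)) {
        out = kvmclock_msrs{msr_kvm_wall_clock_new, msr_kvm_system_time_new};
        return true;
    }
    if (features & (1u << kvm_feature_clocksource)) {
        out = kvmclock_msrs{msr_kvm_wall_clock, msr_kvm_system_time};
        return true;
    }
    return false;
}
```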
-
- Sep 21, 2013
-
-
Glauber Costa authored
Now that we have an efficient interrupt handler, use it. There is no need to delete the old BSD code, which is kept to avoid disrupting the file too much. Make sure through an assertion that it is never used, though. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Glauber Costa authored
This version of the Xen interrupt handler tries to do as little work as possible in the interrupt itself. The previous version and my previous fix attempt would still clean the channels during interrupt. Because now we have pending_sel still set in the irq thread, we can ditch _irq_pending completely. There is now only one xen_irq for the entire system, and therefore I am registering one per cpu, since we will eventually have to process this in different cpus (for different event channels). With this, in my (very coarse, host to guest) netperf test, I am achieving 9600 * 10^6 bps, while linux can reach ~10000 * 10^6 bps. So we're getting close:
  Recv   Send   Send
  Socket Socket Message Elapsed
  Size   Size   Size    Time    Throughput
  bytes  bytes  bytes   secs.   10^6bits/sec
  65536  16384  16384   10.00   9589.32
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Glauber Costa authored
Some of the fields in the xen shared structure need to be accessed atomically. Move them to std::atomic so we can do that using C++11 primitives. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
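A minimal sketch of the kind of access this enables, assuming a 64-bit pending word; mark_pending() and take_pending() are hypothetical helpers, not fields or functions of the actual shared structure.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical producer side: atomically set one pending bit.
inline void mark_pending(std::atomic<std::uint64_t>& pending, unsigned bit)
{
    assert(bit < 64);
    pending.fetch_or(std::uint64_t(1) << bit, std::memory_order_release);
}

// Hypothetical consumer side: fetch all pending bits and clear them in a
// single atomic step, so a bit set concurrently is either returned by this
// call or left in place for the next one, never lost.
inline std::uint64_t take_pending(std::atomic<std::uint64_t>& pending)
{
    return pending.exchange(0, std::memory_order_acq_rel);
}
```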
-
- Sep 18, 2013
-
-
Sasha Levin authored
percpu had too little space allocated to support 64 vcpus, which led to a crash when booting with more than 13 vcpus. Fix it by using a correct size to support 64 vcpus. Signed-off-by:
Sasha Levin <levinsasha928@gmail.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Sep 15, 2013
-
-
Nadav Har'El authored
Add Cloudius copyright to everything in arch/x64. This includes C++ code, assembly code, and ld scripts.
-
- Sep 12, 2013
-
-
Dmitry Fleytman authored
This patch implements GSI interrupt support for Xen bus. Needed in Xen environments w/o vector callbacks for HVM. One example of such an environment is Amazon EC2.
-