- Feb 07, 2014
-
-
Glauber Costa authored
In the past, we have struggled with long delays while reading data from disk in real mode, leading to long boot times (not that they are totally gone). For that reason, it is useful to know how much time is being spent in that process. As unstable and broken as the TSC is, it is pretty much our only ally for that. What I am proposing in this patch is that we take timings at key stages of the bootloader and pass them to the main loader. We will do that by adding some space at the end of the multiboot_info structure, so that we can pass some fields in it. Right now, we are using 16 bytes so we can pass two 64-bit TSC reads. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
I am proposing a mechanism here that will give us a better idea of how much time we spend booting, and also how much each of the pieces contributes to that. For that, we need to be able to get time stamps really early, in places where tracepoints may not be available, and a clock most definitely won't be. With my proposal, one should be able to register events. After the system boots, we will calculate the total time since the first event, as well as the delta since the last event. If the first event is early enough, that should produce a very good picture of our boot time. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
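The event mechanism described in this commit can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `boot_time_log`, its methods, and the fixed capacity of 16 are hypothetical and not taken from the OSv source. Time stamps are passed in by the caller (for example, raw TSC reads), and storage is a fixed array so that nothing needs to be allocated early in boot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size event log; names and capacity are assumptions.
class boot_time_log {
public:
    struct event {
        const char* name;
        std::uint64_t stamp;   // raw tick count supplied by the caller
    };

    // Record an event; events beyond the fixed capacity are dropped.
    void record(const char* name, std::uint64_t stamp) {
        if (_count < _events.size()) {
            _events[_count++] = event{name, stamp};
        }
    }

    std::size_t size() const { return _count; }

    // Ticks elapsed between the first recorded event and event i.
    std::uint64_t total(std::size_t i) const {
        assert(i < _count);
        return _events[i].stamp - _events[0].stamp;
    }

    // Ticks elapsed between event i and the one before it (0 for the first).
    std::uint64_t delta(std::size_t i) const {
        assert(i < _count);
        return i == 0 ? 0 : _events[i].stamp - _events[i - 1].stamp;
    }

private:
    std::array<event, 16> _events{};
    std::size_t _count = 0;
};
```

Converting ticks to wall-clock time is left out on purpose, since that needs a calibrated clock that is not available at the point where the events are recorded.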
-
- Feb 03, 2014
-
-
Takuya ASADA authored
If abort() is called as the result of an exception (e.g. segv), my backtrace-on-abort patch outputs an incorrect symbol name for the exception handler, because there is not enough information about it in the ELF header. This patch adds the information.
Before the patch is applied:
page fault outside application
[backtrace]
0x2f3b31 <mmu::vm_sigsegv(unsigned long, exception_frame*)+37>
0x2f3d4d <mmu::vm_fault(unsigned long, exception_frame*)+215>
0x325b34 <page_fault+157>
0x324797 <void std::__introsort_loop<fault_fixup*, long>(fault_fixup*, fault_fixup*, long)+1129> <-- This is incorrect!
0x312a05 <vmware::vmxnet3::vmxnet3(pci::device&)+473>
After the patch is applied:
page fault outside application
[backtrace]
0x2f3b31 <mmu::vm_sigsegv(unsigned long, exception_frame*)+37>
0x2f3d4d <mmu::vm_fault(unsigned long, exception_frame*)+215>
0x325b34 <page_fault+157>
0x324797 <ex_pf+35> <-- Correct name
0x312a05 <vmware::vmxnet3::vmxnet3(pci::device&)+473>
Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 02, 2014
-
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
idle, load balancer, init. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Jan 30, 2014
-
-
Claudio Fontana authored
Put all of .ctors, .init_array, .ctors.*, .init_array.* into the init_array section. This fixes the build for mixed ctors / init_array tooling and dependencies. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 27, 2014
-
-
Nadav Har'El authored
OSv's timer mechanism hinges on the Local APIC's (per-cpu) one-shot timer, which delivers an interrupt after the requested number of nanoseconds. The API to set this timer, clock_event::set(), took the absolute time of the next interrupt. However, what it really needs is the duration in nanoseconds until the next interrupt. So in this patch we change the basic clock_event::set() API to take a duration, and implement the original clock_event::set(s64), which takes an s64 absolute wall-clock time, as a simple wrapper. The next patch will add more wrappers for set() taking absolute times from different clocks. Later patches in this series will stop using the old set(s64) version, until it is dropped at the end of the series. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
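The shape of this API change, a basic duration-based set() plus thin wrappers for absolute deadlines, can be sketched with std::chrono. The class `one_shot_timer` below is a hypothetical stand-in, not OSv's clock_event; passing the current time explicitly and clamping past deadlines to zero are assumptions made to keep the sketch self-contained.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a one-shot timer driver (not OSv's clock_event).
class one_shot_timer {
public:
    using duration = std::chrono::nanoseconds;

    // Basic operation: fire after `d` has elapsed.
    // Assumption: a deadline already in the past is clamped to "fire now".
    void set(duration d) {
        _programmed = d < duration::zero() ? duration::zero() : d;
    }

    // Wrapper: an absolute deadline on any clock is converted to a duration
    // relative to `now` and handed to the basic set().
    template <typename Clock, typename D1, typename D2>
    void set(std::chrono::time_point<Clock, D1> deadline,
             std::chrono::time_point<Clock, D2> now) {
        set(std::chrono::duration_cast<duration>(deadline - now));
    }

    // What the hardware would have been programmed with.
    duration programmed() const { return _programmed; }

private:
    duration _programmed{0};
};
```

With this layout, supporting another clock only needs another wrapper; the hardware-facing code sees durations only.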
-
- Jan 22, 2014
-
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 10, 2014
-
-
Pekka Enberg authored
The "APIC base" message is not very useful to users. Drop it. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Currently, OSv prints out the following at boot:
acpi 0 apic 0
acpi 1 apic 1
acpi 2 apic 2
acpi 3 apic 3
Replace that with a simpler message:
4 CPUs detected
We do lose the ACPI ID -> CPU ID mapping but it is not terribly important for users.
Suggested-by:
Nadav Har'El <nyh@cloudius-systems.com> Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
When we start using the JVM balloon, our memcpy could fail for valid reasons when the JVM is moving memory that is now in an unmapped region. To handle that, register a fixup that will trigger a JVM call when the fault happens. If all goes well, we will be able to continue normally. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 08, 2014
-
-
Glauber Costa authored
This patch provides a backwards version of memcpy. It works just the same, but will start the copy from dst + n <= src + n, instead of dst <= src; that is, it copies from the end of the buffers toward the beginning. That is needed for memmove when the source and destination operands overlap. Since this is a nonstandard interface, I've just named it "memcpy_backwards". Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
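To illustrate why a backwards copy is needed for overlapping buffers, here is a minimal byte-wise sketch. The names `copy_backwards` and `move_bytes` are hypothetical, and the loops are deliberately simple; this is not the optimized routine the commit adds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy n bytes starting with the last byte and moving toward the first.
void* copy_backwards(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    while (n--) {
        d[n] = s[n];
    }
    return dst;
}

// memmove-like dispatch: go backwards only when dst starts inside
// [src, src + n), where a forward copy would overwrite unread source bytes.
void* move_bytes(void* dst, const void* src, std::size_t n) {
    auto d = reinterpret_cast<std::uintptr_t>(dst);
    auto s = reinterpret_cast<std::uintptr_t>(src);
    if (d > s && d - s < n) {
        return copy_backwards(dst, src, n);
    }
    auto* out = static_cast<unsigned char*>(dst);
    const auto* in = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i];
    }
    return dst;
}
```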
-
- Dec 30, 2013
-
-
Avi Kivity authored
Without this, only even megabytes are accessible, and accesses to odd megabytes hit the even megabytes instead. For example, address 0x312345 is aliased to address 0x212345. Enable the A20 gate to prevent this. Fixes boot on VMware. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
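The aliasing in the example follows from address line 20 being forced to zero while the gate is disabled. A small sketch of that arithmetic (the function name `effective_address` is hypothetical and only models the effect; it does not touch hardware):

```cpp
#include <cassert>
#include <cstdint>

// Address line 20 selects between an even and an odd megabyte.
constexpr std::uint64_t a20_mask = std::uint64_t(1) << 20;

// Model of the address that actually reaches memory.
// With the A20 gate disabled, bit 20 is always seen as zero.
constexpr std::uint64_t effective_address(std::uint64_t addr, bool a20_enabled) {
    return a20_enabled ? addr : (addr & ~a20_mask);
}
```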
-
- Dec 24, 2013
-
-
Avi Kivity authored
Helps with making changes to bsd headers that xen includes. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
We use sched::thread::attr to pass parameters to sched::thread creation, i.e., create a thread with non-default stack parameters, pinned to a particular CPU, or a detached thread. Previously we had constructors taking many combinations of stack size (integer), pinned cpu (cpu*) and detached (boolean), and doing "the right thing". However, this makes the code hard to read (what does attr(4096) specify?) and the constructors hard to expand with new parameters. Replace the attr() constructors with the so-called "named parameter" idiom: attr now only has a null constructor attr(), and one modifies it with calls to pin(cpu*), detach(), or stack(size). For example,
attr() // default attributes
attr().pin(sched::cpus[0]) // pin to cpu 0
attr().stack(4096).pin(sched::cpus[0]) // pin and non-default stack
and so on.
Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
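A minimal sketch of the named parameter idiom described above. The setter names pin(), stack() and detach() come from the commit message; the `cpu` struct, the getters, and the convention that a stack size of 0 means "default" are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct cpu { unsigned id; };   // hypothetical stand-in for sched::cpu

class attr {
public:
    attr() = default;

    // Each setter returns *this so that calls can be chained.
    attr& pin(cpu* c) { _pinned = c; return *this; }
    attr& stack(std::size_t size) { _stack_size = size; return *this; }
    attr& detach() { _detached = true; return *this; }

    // Getters are assumptions, added so the result can be inspected.
    cpu* pinned_cpu() const { return _pinned; }
    std::size_t stack_size() const { return _stack_size; }
    bool detached() const { return _detached; }

private:
    cpu* _pinned = nullptr;
    std::size_t _stack_size = 0;   // assumption: 0 means "use the default"
    bool _detached = false;
};
```

Adding a new parameter then means adding one setter, rather than another constructor overload for every combination.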
-
- Dec 16, 2013
-
-
Pekka Enberg authored
Move the x86-64 PTE definitions to a new arch specific arch-mmu.hh header file to make core/mmu.cc smaller and more portable. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 15, 2013
-
-
Glauber Costa authored
Context: going to wait with irqs disabled is asking for disaster. While it is true that not every call to wait actually ends up waiting, it should still be an invalid call, because of the times we do wait. Because of that, it would be good to express that in an assertion. There are, however, places where we currently sleep with irqs disabled. Although they are technically safe, because we implicitly enable interrupts, they end up reaching wait() in a non-safe state. That happens in the page fault handler. Explicitly enabling interrupts will allow us to test for valid / invalid wait status. With this test applied, all tests in our whitelist still pass. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 11, 2013
-
-
Pekka Enberg authored
Simplify core/mmu.cc and make it more portable by moving the page fault handler to arch/x64/mmu.cc. There's more arch specific code in core/mmu.cc that should also be moved. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Once page_fault() checks that this is not a fast fixup (see safe_load()), we reach the page-fault slow path, which needs to allocate memory or even read from disk, and might sleep. If we ever get such a slow page-fault inside kernel code which has preemption or interrupts disabled, this is a serious bug, because the code in question thinks it cannot sleep. So this patch adds two assertions to verify this. The preemptable() assertion is easily triggered if stacks are demand-paged as explained in commit 41efdc1c (I have a patch to solve this, but it won't fit in the margin). However, I've also seen this assertion without demand-paged stacks, when running all tests together through testrunner.so. So I'm hoping these assertions will be helpful in hunting down some elusive bugs we still have. This patch adds a third use of the "0x200" constant (bit 9 of the rflags register is the interrupt flag), so it replaces them with a new symbolic name, processor::rflags_if. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
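As a small illustration of replacing the magic number with a named constant, the sketch below defines the flag and a helper that tests it. The constant name follows the commit message; the namespace `processor_sketch` and the helper `interrupts_enabled` are hypothetical, chosen so as not to suggest this is OSv's actual header.

```cpp
#include <cassert>
#include <cstdint>

namespace processor_sketch {
// Interrupt flag (IF): bit 9 of rflags, i.e. 0x200.
constexpr std::uint64_t rflags_if = std::uint64_t(1) << 9;
}

static_assert(processor_sketch::rflags_if == 0x200, "IF is bit 9 of rflags");

// True if a saved rflags value has interrupts enabled.
constexpr bool interrupts_enabled(std::uint64_t rflags) {
    return (rflags & processor_sketch::rflags_if) != 0;
}
```

An assertion in a slow path could then read `assert(interrupts_enabled(saved_rflags))`, where `saved_rflags` stands for whatever value the exception frame provides.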
-
- Dec 10, 2013
-
-
Avi Kivity authored
One problem with wake() is that, if the thread it is waking can concurrently exit, it may touch freed memory belonging to the thread structure. Fix by separating the state that wake() touches into a detached_state structure, and free that using rcu. Add a thread_handle class that references only this detached state, and accesses it via rcu. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 08, 2013
-
-
Glauber Costa authored
It seems that we also had problems with our own code for int vs long issues. I am really surprised that the C++ compiler didn't throw any warnings for this since all word sizes are quite explicit. In any case, this seems to be the missing piece for xen booting with many CPUs. It boots fine now with up to 32 CPUs. After that, other problems start to appear. Fixes #113 Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 05, 2013
-
-
Glauber Costa authored
no users in tree. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Our APIC code is so wrong, but so wrong, that it even produces incorrect results. X2APIC is fine, but XAPIC is using xapic::ipi() for all its interrupts. The problem with that is that the customary place for "vector" is inverted in the case of allbutself delivery mode, and therefore, we're sending these IPIs to God Knows Where - not to the processors, that is for sure. As a result, we would spin waiting for IRQ acks that would never arrive. I could invert and reorganize the parameters and comment this out, but I've decided it is a lot clearer just to open code it. Also, there is no need at all to set ICR2 for allbutself, because the destination is already embedded in the firing mode. One issue: NMI is copied over because it is also wrong for the same reasons, so I fixed it too. But I don't have a test case for this. Fixes #110 Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
-
-
Glauber Costa authored
This patch fixes the Amazon crash with many CPUs. Funnily enough, it is not actually related to the number of CPUs. In that situation, the port numbers allocated for the event channels are quite high. I don't really know the reason for that, maybe the Hypervisor reserves the small bits for CPU related things... As we have more and more CPUs and the bits shift more and more rightwards, they eventually reach the second long word of the event channel quadword. But we have been operating this with brsl, which will only reach a long word. If bit 32, for instance, is 1, it will be interpreted as bit 0 == 1. Bit 0 having no registered handler, that will turn into a nullptr access. Cheers to Dima for doing most of the debugging and heavy lifting here. The hang issues are still present. Fixes #109. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
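The underlying requirement is that every bit of the 64-bit pending word must be visible to the scan, including ports 32 and above. The sketch below is a portable illustration of scanning the full word; the function name `pending_ports` is hypothetical, and the loop does not reproduce the instruction-level behaviour described in the commit, only the intended result.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Return the index of every set bit in a 64-bit pending word,
// lowest first. All 64 bit positions are examined.
std::vector<int> pending_ports(std::uint64_t pending) {
    std::vector<int> ports;
    for (int bit = 0; bit < 64; ++bit) {
        // The shifted constant must itself be 64 bits wide; a plain
        // `1 << bit` would be an int shift and break for bit >= 32.
        if (pending & (std::uint64_t(1) << bit)) {
            ports.push_back(bit);
        }
    }
    return ports;
}
```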
-
Dan Schatzberg authored
The SMP bringup code uses a linked list of stacks. The APs are all brought up concurrently and so to acquire a stack, they compare and swap on smp_stack_free. The current code works as long as the compare and swap succeeds; if it fails, then smp_stack_free MUST be read again, otherwise the AP will be deadlocked. This patch fixes the current code by enforcing a re-read on a cmpxchg failure. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Dan Schatzberg <schatzberg.dan@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
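The same re-read-on-failure pattern can be sketched in portable C++ with std::atomic (the actual bringup code is assembly; `stack_node` and `pop_stack` are hypothetical names). On failure, compare_exchange_weak writes the current head back into `old`, which is exactly the re-read the commit enforces. The sketch assumes nodes are only ever popped during bringup, so the ABA problem is not addressed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical free-list node; each one would describe one boot stack.
struct stack_node {
    stack_node* next;
};

// Pop one node from a shared free list; returns nullptr when it is empty.
stack_node* pop_stack(std::atomic<stack_node*>& head) {
    stack_node* old = head.load();
    // If another CPU won the race, the exchange fails and `old` is
    // refreshed with the new head, so the next attempt uses fresh data.
    while (old != nullptr && !head.compare_exchange_weak(old, old->next)) {
    }
    return old;
}
```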
-
- Dec 03, 2013
-
-
Glauber Costa authored
We had at least one report of a system on which the _mm_setcsr function we were using is not found. Since this is heavily arch specific anyway, we should just use inline assembly to do so. Generated code verified to be the same:
33c6e7: c7 45 e0 80 1f 00 00 movl $0x1f80,-0x20(%rbp)
33c6ee: 0f ae 55 e0 ldmxcsr -0x20(%rbp)
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 01, 2013
-
-
Nadav Har'El authored
During boot, between the time main() sets the IDT and the later call to smp_launch(), the IDT doesn't actually work correctly. The problem is that we use the separate stacks feature (IST), and that doesn't work without also setting the GDT, not only the IDT. So use init_on_cpu() to initialize not only the IDT, but other stuff as well. Fix smp_launch() not to repeat this initialization on the boot CPU, as it was already done. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Nov 26, 2013
-
-
Nadav Har'El authored
This patch resolves issue #26. As you can see with "objdump -h build/release/loader.elf", our executable had over a thousand (!) separate sections, most of which should really be merged. We already started doing this in arch/x64/loader.ld, but didn't complete the work. This patch merges all the ".gcc_except_table.*" sections into one, and all the ".data.rel.ro.*" sections into one. After this merge, we are left with just 52 sections, instead of more than 1000. The default linker script (run "ld --verbose" to see it) also does similar merges, so there's no reason why we shouldn't. By reducing the number of ELF sections (each comes with a name, headers, etc.), this patch also reduces the size of our loader-stripped.elf by about 140K. Fixes #26. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Dmitry Fleytman authored
The bug fixed by this patch made OSv crash on Xen during boot. The problem started to show up after commit:
commit ed808267
Author: Nadav Har'El <nyh@cloudius-systems.com>
Date: Mon Nov 18 23:01:09 2013 +0200
percpu: Reduce size of .percpu section
Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Nov 21, 2013
-
-
Nadav Har'El authored
prio.hh defines various initialization priorities. The actual numbers don't matter, just the order between them. But when we add too many priorities between existing ones, we may hit a need to renumber. This is plain ugly, and reminds me of Basic programming ;-) So this patch switches to an enum (enum class, actually). We now just have a list of priority names in order, with no numbers. It would have been straightforward, if it weren't for a bug in GCC (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211 ) where the "init_priority" attribute doesn't accept the enum (while the "constructor" attribute does). Luckily, a simple workaround - explicitly casting to int - works. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
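The enum-based scheme can be sketched as below. The enumerator names are hypothetical, and starting at 101 is an assumption based on GCC reserving lower init_priority values for the implementation. Only the first entry carries a number; the rest are ordered by their position in the list, and the explicit cast to int mirrors the workaround mentioned in the commit.

```cpp
#include <cassert>

// Priorities listed in order; only the first carries an explicit value.
// Names are hypothetical, and 101 is assumed to be the lowest usable value.
enum class init_prio : int {
    dtb = 101,
    console,
    sched,
    drivers,
};

// Explicit conversion, as an attribute expecting a plain integer would need.
constexpr int to_int(init_prio p) {
    return static_cast<int>(p);
}

// Inserting a new name between two entries keeps the relative order
// without renumbering anything by hand.
static_assert(to_int(init_prio::console) < to_int(init_prio::sched),
              "console must initialize before sched");
```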
-