Commits · 2209c95a2417c8e54c2ecf11fe4ec0c125f1bdc3 · Verlässliche Systemsoftware / projects / osv

Feb 07, 2014

Glauber Costa authored 11 years ago

In the past, we have struggled with long delays while reading data from disk in
real mode, leading to long boot times (not that they are totally gone). For that
reason, it is useful to know how much time is being spent in that process. As
unstable and broken as the TSC is, it is pretty much our only ally for that.

What I am proposing in this patch is that we take timings at key points in
the bootloader and pass them to the main loader. We will do that by adding some
space at the end of the multiboot_info structure, so that we can pass some
fields in it. Right now, we are using 16 bytes so we can pass two 64-bit TSC
reads.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d38883fa

general infrastructure for boot time calculation · 3ab3a6bb

Glauber Costa authored 11 years ago

I am proposing a mechanism here that will give us a better idea of
how much time we spend booting, and also how much time each of the pieces
contributes to it. For that, we need to be able to get time stamps really early,
in places where tracepoints may not be available, and a clock most definitely
won't be.

With my proposal, one should be able to register events. After the system
boots, we will calculate the total time since the first event, as well as the
delta since the last event. If the first event is early enough, that should
produce a very good picture about our boot time.
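
The event registration described above could look roughly like the following.
This is a minimal sketch and not the code from the patch: the class name
boot_time_sketch, its methods and the fixed capacity of 16 events are all
hypothetical, and time stamps are passed in by the caller (for example raw TSC
reads) instead of being read from hardware.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical boot-time event log. It never allocates, so it could be
// used before the memory allocator or any clock is available.
class boot_time_sketch {
public:
    // Register a named event with a caller-supplied time stamp.
    void event(const char* name, std::uint64_t stamp) {
        assert(_count < _events.size());
        _events[_count++] = entry{name, stamp};
    }

    std::size_t count() const { return _count; }

    // Time elapsed between the first event and event i.
    std::uint64_t total(std::size_t i) const {
        assert(i < _count);
        return _events[i].stamp - _events[0].stamp;
    }

    // Time elapsed between event i and the previous one (0 for the first).
    std::uint64_t delta(std::size_t i) const {
        assert(i < _count);
        return i == 0 ? 0 : _events[i].stamp - _events[i - 1].stamp;
    }

private:
    struct entry {
        const char* name;
        std::uint64_t stamp;
    };
    std::array<entry, 16> _events{};
    std::size_t _count = 0;
};
```

After boot, walking the events and printing total() and delta() for each one
gives the per-stage breakdown the message talks about.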

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3ab3a6bb

Feb 03, 2014

x64: Add symbol on assembly functions · f6ab17a4

Takuya ASADA authored 11 years ago


If abort() is called as the result of an exception (e.g. segv), my
backtrace-on-abort patch outputs an incorrect symbol name for the exception
handler, because there is not enough information in the ELF file.

This patch adds the information.

Before the patch is applied:
page fault outside application
[backtrace]
0x2f3b31 <mmu::vm_sigsegv(unsigned long, exception_frame*)+37>
0x2f3d4d <mmu::vm_fault(unsigned long, exception_frame*)+215>
0x325b34 <page_fault+157>
0x324797 <void std::__introsort_loop<fault_fixup*, long>(fault_fixup*, fault_fixup*, long)+1129>  <-- This is incorrect !
0x312a05 <vmware::vmxnet3::vmxnet3(pci::device&)+473>

After the patch is applied:
page fault outside application
[backtrace]
0x2f3b31 <mmu::vm_sigsegv(unsigned long, exception_frame*)+37>
0x2f3d4d <mmu::vm_fault(unsigned long, exception_frame*)+215>
0x325b34 <page_fault+157>
0x324797 <ex_pf+35> <-- Correct name
0x312a05 <vmware::vmxnet3::vmxnet3(pci::device&)+473>

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f6ab17a4

Feb 02, 2014
- xen: name interrupt threads · 9869ed19
  Avi Kivity authored 11 years ago
  
  Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  9869ed19
- sched: name some core threads · 4dc9591b
  Avi Kivity authored 11 years ago
  
  idle, load balancer, init. Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
  4dc9591b
Jan 30, 2014

loader.ld: merge init_array and ctors into init_array · 9b6be880

Claudio Fontana authored 11 years ago


Put all of .ctors, .init_array, .ctors.* and .init_array.*
into the init_array section.
This fixes the build for mixed ctors / init_array tooling
and dependencies.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9b6be880

Jan 27, 2014

clock: relative-time clock_event::set() · 5c5f5971

Nadav Har'El authored 11 years ago

OSv's timer mechanism hinges on the Local APIC's (per-cpu) one-shot timer,
which delivers an interrupt after the requested number of nanoseconds.

The API to set this timer, clock_event::set(), took the absolute time of
the next interrupt. However, what it really needs is the duration in
nanoseconds until the next interrupt.

So in this patch we change the basic clock_event::set() API to take a
duration, and implement the original clock_event::set(s64) - taking an s64
absolute wall-clock time - as a simple wrapper. The next patch will add
more wrappers for set() taking absolute times from different clocks.
Later patches in this series will stop using the old set(s64) version,
until it is dropped at the end of the series.
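
The relationship between the duration-based set() and the absolute-time
wrapper can be sketched as follows. This is an illustration under stated
assumptions, not OSv's clock_event: the class clock_event_sketch, the method
set_absolute() and the explicit now_ns parameter are hypothetical, and the
hardware write is replaced by storing the requested duration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a one-shot timer driver.
class clock_event_sketch {
public:
    // Basic API: fire after the non-negative duration d.
    void set(std::chrono::nanoseconds d) {
        assert(d.count() >= 0);
        // A real driver would program the one-shot timer here; the
        // sketch only remembers what was requested.
        _programmed = d;
    }

    // Wrapper: takes an absolute time in nanoseconds, measured on the
    // same clock as now_ns. A deadline in the past becomes "fire now".
    void set_absolute(std::int64_t when_ns, std::int64_t now_ns) {
        std::int64_t delta = when_ns > now_ns ? when_ns - now_ns : 0;
        set(std::chrono::nanoseconds(delta));
    }

    std::chrono::nanoseconds programmed() const { return _programmed; }

private:
    std::chrono::nanoseconds _programmed{0};
};
```

Keeping the duration form as the primitive means each additional clock only
needs its own small wrapper that subtracts that clock's current time.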

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

5c5f5971

Jan 22, 2014
- include: Move debug.hh to include/osv · 7809519b
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  7809519b
- include: Move mempool.hh to include/osv · 9c95f49d
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  9c95f49d
- include: Move prio.hh to include/osv · 5bb3e7b4
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  5bb3e7b4
- include: Move ilog2.hh to include/osv · d8df3fd1
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  d8df3fd1
- include: Move alternative.hh to include/osv · 372b4df6
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  372b4df6
- include: Move elf.hh to include/osv · b8034e34
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  b8034e34
- include: Move barrier.hh to include/osv · c80be886
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  c80be886
- include: Move mmu.hh to include/osv · 9cb900b7
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  9cb900b7
- include: Move interrupt.hh to include/osv · d7cc6216
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  d7cc6216
- include: Move align.hh to include/osv · 4473f2ca
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  4473f2ca
- include: Move sched.hh to include/osv · fae5693e
  Pekka Enberg authored 11 years ago
  
  Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  fae5693e
Jan 10, 2014

x64: Drop APIC base boot message · ba7250c9

Pekka Enberg authored 11 years ago


The "APIC base" message is not very useful to users. Drop it.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ba7250c9

x64: Simplify CPU bringup boot message · d291601d

Pekka Enberg authored 11 years ago


Currently, OSv prints out the following at boot:

  acpi 0 apic 0
  acpi 1 apic 1
  acpi 2 apic 2
  acpi 3 apic 3

replace that with a simpler message:

  4 CPUs detected

We do lose the ACPI ID -> CPU ID mapping but it is not terribly
important for users.

Suggested-by: Nadav Har'El <nyh@cloudius-systems.com>
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d291601d

string: add fixups for memcpy operations · 9cce0f87

Glauber Costa authored 11 years ago

When we start using the JVM balloon, our memcpy could fail for valid reasons
when the JVM is moving memory that is now in an unmapped region. To handle that,
register a fixup that will trigger a JVM call when the fault happens. If all goes
well, we will be able to continue normally.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9cce0f87

Jan 08, 2014

x64: Provide a backwards version of memcpy · d25859ce

Glauber Costa authored 11 years ago

This patch provides a backwards version of memcpy. It works the same way, but
starts the copy at the end of the buffers (dst + n <= src + n) and moves
towards the beginning, instead of starting at dst <= src. That is
needed for memmove when the source and destination operands overlap.

Being a nonstandard interface, I've just named it "memcpy_backwards".
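
For illustration, a portable byte-wise version of the idea is sketched below.
It is not the optimized x64 routine from the patch; the names
memcpy_backwards_sketch and memmove_sketch are hypothetical, and the overlap
check is one plausible way for memmove to choose a direction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy n bytes from src to dst, starting with the last byte and moving
// towards the first.
inline void* memcpy_backwards_sketch(void* dst, const void* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    while (n > 0) {
        --n;
        d[n] = s[n];
    }
    return dst;
}

// memmove-style dispatch: copy backwards only when dst starts inside
// [src, src + n), where a forward copy would overwrite bytes of src
// before they have been read.
inline void* memmove_sketch(void* dst, const void* src, std::size_t n) {
    auto d = reinterpret_cast<std::uintptr_t>(dst);
    auto s = reinterpret_cast<std::uintptr_t>(src);
    if (d > s && d - s < n) {
        return memcpy_backwards_sketch(dst, src, n);
    }
    auto* dp = static_cast<unsigned char*>(dst);
    const auto* sp = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        dp[i] = sp[i];
    }
    return dst;
}
```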

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d25859ce

Dec 30, 2013

x64: enable A20 gate very early · 0a73fbe6

Avi Kivity authored 11 years ago


Without this, only even megabytes are accessible, and accesses to odd megabytes
hit the even megabytes.  For example address 0x312345 is aliased to address
0x212345.
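
The aliasing can be checked with a little arithmetic: with the gate disabled,
address line 20 is forced to zero. The helper name alias_without_a20 below is
hypothetical and exists only to illustrate the example addresses above.

```cpp
#include <cstdint>

// Address line 20 selects between an even and an odd megabyte.
constexpr std::uint64_t a20_bit = std::uint64_t(1) << 20;

// The address that is actually accessed while the A20 gate is disabled.
constexpr std::uint64_t alias_without_a20(std::uint64_t addr) {
    return addr & ~a20_bit;
}

static_assert(alias_without_a20(0x312345) == 0x212345,
              "an odd megabyte aliases to the even megabyte below it");
static_assert(alias_without_a20(0x212345) == 0x212345,
              "even megabytes are unaffected");
```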

Enable the A20 gate to prevent this.

Fixes boot on VMware.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0a73fbe6

Dec 24, 2013

bsd: convert the Xen stuff to C++ · 828ec291

Avi Kivity authored 11 years ago


Helps with making changes to the bsd headers that the Xen code includes.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

828ec291

sched: Overhaul sched::thread::attr construction · eb48b150

Nadav Har'El authored 11 years ago


We use sched::thread::attr to pass parameters to sched::thread creation,
i.e., create a thread with non-default stack parameters, pinned to a
particular CPU, or a detached thread.

Previously we had constructors taking many combinations of stack size
(integer), pinned cpu (cpu*) and detached (boolean), and doing "the
right thing". However, this makes the code hard to read (what does
attr(4096) specify?) and the constructors hard to expand with new
parameters.

Replace the attr() constructors with the so-called "named parameter"
idiom: attr now only has a null constructor attr(), and one modifies
it with calls to pin(cpu*), detach(), or stack(size).

For example,
    attr()                                  // default attributes
    attr().pin(sched::cpus[0])              // pin to cpu 0
    attr().stack(4096).pin(sched::cpus[0])  // pin and non-default stack
    and so on.
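
A minimal sketch of how such a chainable attr could be built is shown below.
It is not the actual sched::thread::attr: the names attr_sketch and
cpu_sketch, the accessor methods and the 64 KiB default stack size are
assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sched::cpu.
struct cpu_sketch {
    unsigned id;
};

// Each setter returns *this, so calls can be chained in any order and
// every parameter is named at the call site.
class attr_sketch {
public:
    attr_sketch() = default;

    attr_sketch& stack(std::size_t size) {
        assert(size > 0);
        _stack_size = size;
        return *this;
    }
    attr_sketch& pin(cpu_sketch* c) {
        _pinned = c;
        return *this;
    }
    attr_sketch& detach(bool on = true) {
        _detached = on;
        return *this;
    }

    std::size_t stack_size() const { return _stack_size; }
    cpu_sketch* pinned_cpu() const { return _pinned; }
    bool detached() const { return _detached; }

private:
    std::size_t _stack_size = 65536;  // assumed default, for the sketch only
    cpu_sketch* _pinned = nullptr;    // nullptr means "not pinned"
    bool _detached = false;
};
```

Adding a new parameter then only needs one more setter, and no existing call
site has to change.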

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

eb48b150

Dec 16, 2013

x64: Move PTE definitions to arch-mmu.hh · d761f58f

Pekka Enberg authored 11 years ago


Move the x86-64 PTE definitions to a new arch specific arch-mmu.hh
header file to make core/mmu.cc smaller and more portable.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d761f58f

Dec 15, 2013

enable interrupts during page fault handling · ec7ed8cd

Glauber Costa authored 11 years ago

Context: going to wait with irqs_disabled is a recipe for disaster. While it is
true that not every call to wait actually ends up waiting, such a call should
still be invalid, because of the times we may wait. Because of that, it would
be good to express that in an assertion.

There are, however, places where we currently sleep with irqs disabled. Although
they are technically safe, because we implicitly enable interrupts, they end up
reaching wait() in a non-safe state. That happens in the page fault handler.
Explicitly enabling interrupts will allow us to test for valid / invalid wait
status.

With this patch applied, all tests in our whitelist still pass.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

ec7ed8cd

Dec 11, 2013

x64: Make page fault handler arch specific · 43491705

Pekka Enberg authored 11 years ago


Simplify core/mmu.cc and make it more portable by moving the page fault
handler to arch/x64/mmu.cc.  There's more arch specific code in
core/mmu.cc that should be also moved.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

43491705

Verify slow page fault only happens when preemption is allowed · b7620ca2

Nadav Har'El authored 11 years ago


Once page_fault() checks that this is not a fast fixup (see safe_load()),
we reach the page-fault slow path, which needs to allocate memory or
even read from disk, and might sleep.

If we ever get such a slow page-fault inside kernel code which has
preemption or interrupts disabled, this is a serious bug, because the
code in question thinks it cannot sleep. So this patch adds two
assertions to verify this.

The preemptable() assertion is easily triggered if stacks are demand-paged
as explained in commit 41efdc1c (I have
a patch to solve this, but it won't fit in the margin).
However, I've also seen this assertion without demand-paged stacks, when
running all tests together through testrunner.so. So I'm hoping these
assertions will be helpful in hunting down some elusive bugs we still have.

This patch adds a third use of the "0x200" constant (bit 9 of
the rflags register is the interrupt flag), so it replaces all of them with a
new symbolic name, processor::rflags_if.
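
As an illustration of the named constant, a check on a saved rflags value
could be written as below. The namespace processor_sketch and the helper
interrupts_enabled() are hypothetical; only the value 0x200 and the name
rflags_if come from the message above.

```cpp
#include <cstdint>

namespace processor_sketch {
// Bit 9 of RFLAGS is the interrupt-enable flag (IF).
constexpr std::uint64_t rflags_if = std::uint64_t(1) << 9;
static_assert(rflags_if == 0x200, "IF is bit 9 of rflags");
}

// True if the saved rflags value had interrupts enabled.
constexpr bool interrupts_enabled(std::uint64_t rflags) {
    return (rflags & processor_sketch::rflags_if) != 0;
}
```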

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b7620ca2

Dec 10, 2013

sched: add a wake() function that is safe to use on a thread that may terminate · dc40b49e

Avi Kivity authored 11 years ago


One problem with wake() is that, if the thread it is waking can concurrently
exit, it may touch freed memory belonging to the thread structure.

Fix by separating the state that wake() touches into a detached_state
structure, and free that using rcu.

Add a thread_handle class that references only this detached state, and
accesses it via rcu.
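
The split between the thread and its detached state can be sketched as below.
Note the substitution: this sketch uses std::shared_ptr reference counting in
place of RCU to keep the detached state alive, which is a different and
heavier mechanism than the one in the patch. All names ending in _sketch and
the wakeups counter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// The state wake() touches, kept apart from the thread object so that
// it can outlive the thread.
struct detached_state_sketch {
    std::atomic<bool> alive{true};
    std::atomic<int> wakeups{0};
};

// A handle refers only to the detached state, never to the thread.
class thread_handle_sketch {
public:
    explicit thread_handle_sketch(std::shared_ptr<detached_state_sketch> s)
        : _state(std::move(s)) {
        assert(_state != nullptr);
    }

    // Safe to call after the thread object is gone. Returns true if a
    // thread that was still alive was woken.
    bool wake() {
        if (!_state->alive.load()) {
            return false;
        }
        _state->wakeups.fetch_add(1);
        return true;
    }

private:
    std::shared_ptr<detached_state_sketch> _state;
};

class thread_sketch {
public:
    thread_sketch() : _state(std::make_shared<detached_state_sketch>()) {}
    // The thread marks itself dead; the state is freed only when the
    // last handle drops its reference.
    ~thread_sketch() { _state->alive.store(false); }

    thread_handle_sketch handle() const { return thread_handle_sketch(_state); }
    int wakeups() const { return _state->wakeups.load(); }

private:
    std::shared_ptr<detached_state_sketch> _state;
};
```

A wake() that races with thread exit may still bump the counter after the
thread has died, but it only ever touches memory that is guaranteed to exist,
which is the property the patch is after.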

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

dc40b49e

Dec 08, 2013

xen: int vs long issues - OSv side · 1bbe05dd

Glauber Costa authored 11 years ago


It seems that we also had problems with our own code for int vs long
issues. I am really surprised that the C++ compiler didn't throw any
warnings for this since all word sizes are quite explicit. In any case,
this seems to be the missing piece for xen booting with many CPUs.

It boots fine now with up to 32 CPUs. After that, other problems start
to appear.

Fixes #113

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

1bbe05dd

Dec 05, 2013

sched: remove on_thread_stack · 9bd939f8

Glauber Costa authored 11 years ago


no users in tree.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9bd939f8

apic: fix allbutself delivery mode · 8a48cb55

Glauber Costa authored 11 years ago

Our APIC code is so wrong, but so wrong, that it even produces incorrect
results. X2APIC is fine, but XAPIC is using xapic::ipi() for all its
interrupts. The problem with that is that the customary place for "vector" is
inverted in the case of allbutself delivery mode, and therefore we're sending
these IPIs to God Knows Where - not to the processors, that is for sure.
As a result, we would spin waiting for IRQ acks that would never arrive.

I could invert and reorganize the parameters and comment this out, but I've
decided it is a lot clearer just to open code it. Also, there is no need at all
to set ICR2 for allbutself, because the destination is already embedded in the
firing mode.

One issue: NMI is copied over because it is also wrong for the same reasons, so
I fixed it too. But I don't have a test case for this.

Fixes #110

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8a48cb55

Dec 04, 2013

xen: use bsrq instead of bsrl for event channels · 587063a8

Glauber Costa authored 11 years ago


This patch fixes the Amazon crash with many CPUs. Funny enough, it is not
actually related to the number of CPUs. In that situation, the port
numbers allocated for the event channels are quite high. I don't
really know the reason for that, maybe the Hypervisor reserves the
small bits for CPU related things...

As we have more and more CPUs and the bits shift more and more rightwards,
they eventually reach the second long word of the event channel quadword.

But we have been operating on this with bsrl, which will only reach a long word.
If bit 32, for instance, is 1, it will be interpreted as bit 0 == 1. Bit 0
having no registered handler, that will turn into a nullptr access.
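
The effect of scanning a 64-bit word with a 32-bit operation can be modelled
portably as below. This is only a sketch: highest_set_bit() is a hypothetical
helper that mimics what a bit scan of a given operand width can see, and it
does not reproduce the exact misinterpretation described above, only the fact
that the upper half of the word is out of reach for the 32-bit form.

```cpp
#include <cassert>
#include <cstdint>

// Index of the highest set bit among the low `width` bits of `word`,
// or -1 if none of those bits is set.
inline int highest_set_bit(std::uint64_t word, int width) {
    assert(width == 32 || width == 64);
    for (int bit = width - 1; bit >= 0; --bit) {
        if (word & (std::uint64_t(1) << bit)) {
            return bit;
        }
    }
    return -1;
}
```

With a pending event on port 40, a 64-bit scan finds bit 40, while a 32-bit
scan of the same word finds nothing in the low half.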

Cheers to Dima for doing most of the debugging and heavy lifting here.
The hang issues are still present.

Fixes #109.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

587063a8

x64: Fix rare SMP deadlock when acquiring stack · 43e77bbb

Dan Schatzberg authored 11 years ago

The SMP bringup code uses a linked list of stacks. The APs are all
brought up concurrently and so to acquire a stack, they compare and swap
on smp_stack_free. The current code works as long as the compare and
swap succeeds. If it fails, then smp_stack_free MUST be read again,
otherwise the AP will be deadlocked. This patch fixes the current code
by enforcing a re-read on a cmpxchg failure.
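
The retry pattern can be sketched in C++ as below. The real code is x64
assembly in the SMP bringup path; here std::atomic stands in for cmpxchg, and
the names stack_node_sketch and acquire_stack() are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// One entry in the list of free boot stacks.
struct stack_node_sketch {
    stack_node_sketch* next;
};

// Pop one stack from the free list, or return nullptr if it is empty.
inline stack_node_sketch* acquire_stack(std::atomic<stack_node_sketch*>& free_list) {
    stack_node_sketch* head = free_list.load();
    while (head != nullptr &&
           !free_list.compare_exchange_weak(head, head->next)) {
        // On failure, compare_exchange_weak stores the current value of
        // free_list into head, so the next iteration retries against
        // fresh data instead of spinning on a stale pointer.
    }
    return head;
}
```

The sketch ignores the ABA problem, which is reasonable only under the
assumption that stacks are never pushed back onto the list during bringup.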

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

43e77bbb

Dec 03, 2013

x64: Use inline assembly for setcsr function · d6dfe96d

Glauber Costa authored 11 years ago


We had at least one report of a system in which the _mm_setcsr function
we were using is not found. Since this is heavily arch specific anyway,
we should just use inline assembly to do so.

Generated code verified to be the same:

  33c6e7:       c7 45 e0 80 1f 00 00    movl   $0x1f80,-0x20(%rbp)
  33c6ee:       0f ae 55 e0             ldmxcsr -0x20(%rbp)

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d6dfe96d

Dec 01, 2013

Fix exception (in x86 sense, not C++) handling during boot · cba2f09a

Nadav Har'El authored 11 years ago


During boot, between the time main() set the IDT and when later
smp_launch() is called, the IDT doesn't actually work correctly.
The problem is that we use the separate stacks feature (IST), and that
doesn't work without also setting the GDT, not only the IDT.

So use init_on_cpu() to initialize not only the IDT, but other stuff
as well. Fix smp_launch() not to repeat this initialization on the boot
CPU, as it was already done.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

cba2f09a

Nov 26, 2013

Reduce number of unnecessary sections in our executable · 03aaf6b8

Nadav Har'El authored 11 years ago


This patch resolves issue #26. As you can see with "objdump -h
build/release/loader.elf", our executable had over a thousand (!)
separate sections, most of which should really be merged.
We already started doing this in arch/x64/loader.ld, but didn't
complete the work.

This patch merges all the ".gcc_except_table.*" sections into one,
and all the ".data.rel.ro.*" sections into one. After this merge,
we are left with just 52 sections, instead of more than 1000.

The default linker script (run "ld --verbose" to see it) also does
similar merges, so there's no reason why we shouldn't.

By reducing the number of ELF sections (each comes with a name, headers,
etc.), this patch also reduces the size of our loader-stripped.elf
by about 140K.

Fixes #26.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

03aaf6b8

xen: move per-cpu interrupt threads to .percpu section · 63d2e472

Dmitry Fleytman authored 11 years ago


The bug fixed by this patch made OSv crash on Xen during boot.
The problem started to show up after commit:

  commit ed808267
  Author: Nadav Har'El <nyh@cloudius-systems.com>
  Date:   Mon Nov 18 23:01:09 2013 +0200

      percpu: Reduce size of .percpu section

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

63d2e472

Nov 21, 2013

Replace numbers in prio.hh by automatically defined numbers · 147de06c

Nadav Har'El authored 11 years ago

prio.hh defines various initialization priorities. The actual numbers
don't matter, just the order between them. But when we add too many
priorities between existing ones, we may hit a need to renumber. This
is plain ugly, and reminds me of Basic programming ;-)

So this patch switches to an enum (enum class, actually).
We now just have a list of priority names in order, with no numbers.

It would have been straightforward, if it weren't for a bug in GCC
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211) where the
"init_priority" attribute doesn't accept the enum (while the "constructor"
attribute does). Luckily, a simple workaround - explicitly casting to
int - works.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

147de06c