Commits · bbec1a1836e15ea42d07dff00cf69a698fe17fb6 · Verlässliche Systemsoftware / projects / osv

Dec 11, 2013

Verify slow page fault only happens when preemption is allowed · b7620ca2

Nadav Har'El authored 11 years ago


Once page_fault() checks that this is not a fast fixup (see safe_load()),
we reach the page-fault slow path, which needs to allocate memory or
even read from disk, and might sleep.

If we ever get such a slow page-fault inside kernel code which has
preemption or interrupts disabled, this is a serious bug, because the
code in question thinks it cannot sleep. So this patch adds two
assertions to verify this.

The preemptable() assertion is easily triggered if stacks are demand-paged
as explained in commit 41efdc1c (I have
a patch to solve this, but it won't fit in the margin).
However, I've also seen this assertion fail without demand-paged stacks, when
running all tests together through testrunner.so. So I'm hoping these
assertions will be helpful in hunting down some elusive bugs we still have.

This patch adds a third use of the "0x200" constant (bit 9 of
the rflags register is the interrupt flag), so it replaces all of them with a
new symbolic name, processor::rflags_if.
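A minimal sketch of the two checks, assuming a simplified fault context. Only `processor::rflags_if` and `preemptable()` come from the message above; `fault_context`, `interrupts_were_enabled()` and `slow_page_fault_allowed()` are hypothetical, and the checks return a bool (rather than asserting, as the kernel would) so they can be exercised directly.

```cpp
#include <cassert>
#include <cstdint>

namespace processor {
// Bit 9 of RFLAGS is IF, the interrupt-enable flag (0x200).
constexpr std::uint64_t rflags_if = 0x200;
}

// Hypothetical stand-in for what a fault handler inspects: the RFLAGS saved
// in the exception frame and the scheduler's preemption counter.
struct fault_context {
    std::uint64_t saved_rflags;  // RFLAGS at the time of the fault
    unsigned preempt_counter;    // 0 means preemption is allowed
};

bool preemptable(const fault_context& ctx) { return ctx.preempt_counter == 0; }

bool interrupts_were_enabled(const fault_context& ctx) {
    return (ctx.saved_rflags & processor::rflags_if) != 0;
}

// The slow path may allocate or sleep, so the faulting code must have been
// both preemptable and running with interrupts enabled.
bool slow_page_fault_allowed(const fault_context& ctx) {
    return preemptable(ctx) && interrupts_were_enabled(ctx);
}
```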

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b7620ca2

Dec 10, 2013

sched: add a wake() function that is safe to use on a thread that may terminate · dc40b49e

Avi Kivity authored 11 years ago


One problem with wake() is that, if the thread it is waking can concurrently
exit, wake() may touch freed memory belonging to the thread structure.

Fix by separating the state that wake() touches into a detached_state
structure, and free that using rcu.

Add a thread_handle class that references only this detached state, and
accesses it via rcu.
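The idea of waking through a handle that only references detached state can be sketched as below. This is not OSv's implementation: `std::shared_ptr` is swapped in for RCU-deferred freeing (the state lives as long as any handle does, rather than until a grace period ends), and the field names and the bool returned by `wake()` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

namespace sched {

// The small piece of state wake() needs; it outlives the thread object.
struct detached_state {
    std::atomic<bool> wakeup_pending{false};  // what wake() sets
    std::atomic<bool> exited{false};          // set when the thread goes away
};

class thread {
public:
    thread() : _detached(std::make_shared<detached_state>()) {}
    // The thread object itself may be freed; detached_state stays alive
    // while any thread_handle still refers to it.
    ~thread() { _detached->exited.store(true); }
    std::shared_ptr<detached_state> detached() const { return _detached; }
private:
    std::shared_ptr<detached_state> _detached;
};

// References only the detached state, never the thread object.
class thread_handle {
public:
    explicit thread_handle(const thread& t) : _state(t.detached()) {}
    // Returns false once the thread has exited. If it exits right after the
    // check, the store below still hits live memory, which is the point.
    bool wake() {
        if (_state->exited.load()) {
            return false;
        }
        _state->wakeup_pending.store(true);
        return true;
    }
private:
    std::shared_ptr<detached_state> _state;
};

}  // namespace sched
```

Usage: create a handle while the thread exists, destroy the thread, and the handle can still be used safely.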

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

dc40b49e

Dec 08, 2013

xen: int vs long issues - OSv side · 1bbe05dd

Glauber Costa authored 11 years ago


It seems that we also had problems with our own code for int vs long
issues. I am really surprised that the C++ compiler didn't throw any
warnings for this since all word sizes are quite explicit. In any case,
this seems to be the missing piece for xen booting with many CPUs.

It boots fine now with up to 32 CPUs. After that, other problems start
to appear.
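The message does not show the actual diff, but a typical int vs long bug of this kind, and one consistent with things breaking around 32 CPUs, is shifting a 32-bit int to build a per-CPU bit. A hypothetical illustration (`cpu_bit()` is not an OSv function):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: select the bit for a given CPU in a 64-bit mask.
// Writing `1 << cpu` shifts a 32-bit int, which is undefined for cpu >= 32
// and can never produce bits 32..63.
std::uint64_t cpu_bit(unsigned cpu) {
    return std::uint64_t(1) << cpu;  // 64-bit operand, valid for cpu < 64
}
```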

Fixes #113

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

1bbe05dd

Dec 05, 2013

sched: remove on_thread_stack · 9bd939f8

Glauber Costa authored 11 years ago


No users in tree.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9bd939f8

apic: fix allbutself delivery mode · 8a48cb55

Glauber Costa authored 11 years ago

Our APIC code is so wrong, but so wrong, that it even produces incorrect
results. X2APIC is fine, but XAPIC is using xapic::ipi() for all its
interrupts. The problem with that is that the customary place for "vector" is
inverted in the case of allbutself delivery mode, and therefore we're sending
these IPIs to God Knows Where - not to the processors, that is for sure.
As a result, we would spin waiting for IRQ acks that would never arrive.

I could invert and reorganize the parameters and comment this out, but I've
decided it is a lot clearer just to open code it. Also, there is no need at all
to set ICR2 for allbutself, because the destination is already embedded in the
firing mode.

One caveat: NMI gets the same open-coded treatment, because it is wrong for
the same reasons, so I fixed it too. But I don't have a test case for this.
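For reference, the open-coded value written to the xAPIC ICR low dword can be sketched from the field layout in the Intel SDM. The constant and function names below are illustrative, not OSv's. With a destination shorthand the ICR2 destination field is ignored, which is why it need not be written for allbutself.

```cpp
#include <cassert>
#include <cstdint>

// xAPIC ICR (low dword) fields, per the Intel SDM:
//   bits 0-7   vector
//   bits 8-10  delivery mode (000 fixed, 100 NMI)
//   bit  14    level (1 = assert)
//   bits 18-19 destination shorthand (11 = all excluding self)
constexpr std::uint32_t icr_dm_fixed     = 0u << 8;
constexpr std::uint32_t icr_dm_nmi       = 4u << 8;
constexpr std::uint32_t icr_level_assert = 1u << 14;
constexpr std::uint32_t icr_all_but_self = 3u << 18;

// Fixed-mode IPI to every CPU except the sender.
constexpr std::uint32_t icr_allbutself(std::uint8_t vector) {
    return icr_all_but_self | icr_level_assert | icr_dm_fixed | vector;
}

// NMI delivery ignores the vector field.
constexpr std::uint32_t icr_nmi_allbutself() {
    return icr_all_but_self | icr_level_assert | icr_dm_nmi;
}
```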

Fixes #110

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8a48cb55

Dec 04, 2013

xen: use bsrq instead of bsrl for event channels · 587063a8

Glauber Costa authored 11 years ago


This patch fixes the crash on Amazon with many CPUs. Funnily enough, it is not
actually related to the number of CPUs. In that situation, the port
numbers allocated for the event channels are quite high. I don't
really know the reason for that; maybe the hypervisor reserves the
low bits for CPU-related things...

As we have more and more CPUs and the bits shift more and more rightwards,
they eventually reach the second long word of the event channel quadword.

But we have been operating on this with bsrl, which will only reach a long word.
If bit 32, for instance, is 1, it will be interpreted as bit 0 == 1. Bit 0
having no registered handler, that will turn into a nullptr access.
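The difference between the two bit scans can be sketched with GCC/Clang builtins instead of the raw instructions (an assumption for portability; the kernel code uses bsrq directly). With bits 40 and 0 pending, the 64-bit scan finds port 40, while scanning only the low 32 bits reports bit 0.

```cpp
#include <cassert>
#include <cstdint>

// Index of the highest set bit, like bsrq on a 64-bit word.
// Precondition: word != 0 (bsr's result is undefined for 0, and so is
// __builtin_clzll(0)).
unsigned highest_bit64(std::uint64_t word) {
    return 63u - unsigned(__builtin_clzll(word));
}

// The buggy variant: only the low 32 bits are scanned, like bsrl.
// Precondition: the low 32 bits are not all zero.
unsigned highest_bit32(std::uint64_t word) {
    std::uint32_t low = std::uint32_t(word);  // bits 32..63 are lost here
    return 31u - unsigned(__builtin_clz(low));
}
```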

Cheers to Dima for doing most of the debugging and heavy lifting here.
The hang issues are still present.

Fixes #109.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

587063a8

x64: Fix rare SMP deadlock when acquiring stack · 43e77bbb

Dan Schatzberg authored 11 years ago

The SMP bringup code uses a linked list of stacks. The APs are all
brought up concurrently, so to acquire a stack they compare-and-swap
on smp_stack_free. The current code works as long as the compare-and-swap
succeeds; if it fails, smp_stack_free MUST be read again, otherwise the AP
will deadlock. This patch fixes the current code by enforcing a re-read on
a cmpxchg failure.
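The fixed acquire loop can be sketched in C++ with `std::atomic`, whose `compare_exchange_weak` writes the current value back into `expected` on failure, giving exactly the re-read the fix requires. The `smp_stack` node type is hypothetical; the real bring-up code is assembly.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type: each free stack starts with a link to the next.
struct smp_stack {
    smp_stack* next;
};

std::atomic<smp_stack*> smp_stack_free{nullptr};

// Pop one stack off the shared free list. Stacks are never pushed back
// during bring-up, so reading head->next is safe and ABA cannot occur.
smp_stack* acquire_stack() {
    smp_stack* head = smp_stack_free.load();
    // On failure, compare_exchange_weak stores the current list head into
    // `head`, so each retry uses a freshly read value instead of spinning
    // forever on a stale one.
    while (head && !smp_stack_free.compare_exchange_weak(head, head->next)) {
    }
    return head;
}
```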

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

43e77bbb

Dec 03, 2013

x64: Use inline assembly for setcsr function · d6dfe96d

Glauber Costa authored 11 years ago


We had at least one report of a system on which the __mm_setcsr function
we were using was not found. Since this is heavily arch-specific anyway,
we should just use inline assembly to do so.

Generated code verified to be the same:

  33c6e7:       c7 45 e0 80 1f 00 00    movl   $0x1f80,-0x20(%rbp)
  33c6ee:       0f ae 55 e0             ldmxcsr -0x20(%rbp)
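A sketch of such a helper, assuming x86-64 and GCC/Clang extended asm. The names `setcsr()` and `getcsr()` are illustrative. The "m" constraint makes the compiler place the value in memory, which matches the `movl`/`ldmxcsr` pair in the disassembly above. 0x1f80 is the power-on default MXCSR (all SSE exceptions masked, round to nearest).

```cpp
#include <cassert>
#include <cstdint>

// Load MXCSR from a 32-bit memory operand.
inline void setcsr(std::uint32_t csr) {
    asm volatile("ldmxcsr %0" : : "m"(csr));
}

// Store MXCSR into a 32-bit memory operand and return it.
inline std::uint32_t getcsr() {
    std::uint32_t csr;
    asm volatile("stmxcsr %0" : "=m"(csr));
    return csr;
}
```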

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d6dfe96d

Dec 01, 2013

Fix exception (in x86 sense, not C++) handling during boot · cba2f09a

Nadav Har'El authored 11 years ago


During boot, between the time main() sets the IDT and the later call to
smp_launch(), the IDT doesn't actually work correctly.
The problem is that we use the separate stacks feature (IST), and that
doesn't work unless the GDT is set up as well, not only the IDT.

So use init_on_cpu() to initialize not only the IDT, but other stuff
as well. Fix smp_launch() not to repeat this initialization on the boot
CPU, as it was already done.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

cba2f09a

Nov 26, 2013

Reduce number of unnecessary sections in our executable · 03aaf6b8

Nadav Har'El authored 11 years ago


This patch resolves issue #26. As you can see with "objdump -h
build/release/loader.elf", our executable had over a thousand (!)
separate sections, most of which should really be merged.
We already started doing this in arch/x64/loader.ld, but didn't
complete the work.

This patch merges all the ".gcc_except_table.*" sections into one,
and all the ".data.rel.ro.*" sections into one. After this merge,
we are left with just 52 sections, instead of more than 1000.

The default linker script (run "ld --verbose" to see it) also does
similar merges, so there's no reason why we shouldn't.

By reducing the number of ELF sections (each comes with a name, headers,
etc.), this patch also reduces the size of our loader-stripped.elf
by about 140K.

Fixes #26.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

03aaf6b8

xen: move per-cpu interrupt threads to .percpu section · 63d2e472

Dmitry Fleytman authored 11 years ago


The bug fixed by this patch made OSv crash on Xen during boot.
The problem started to show up after commit:

  commit ed808267
  Author: Nadav Har'El <nyh@cloudius-systems.com>
  Date:   Mon Nov 18 23:01:09 2013 +0200

      percpu: Reduce size of .percpu section

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

63d2e472

Nov 21, 2013

Replace numbers in prio.hh by automatically defined numbers · 147de06c

Nadav Har'El authored 11 years ago

prio.hh defines various initialization priorities. The actual numbers
don't matter, just the order between them. But when we add too many
priorities between existing ones, we may hit a need to renumber. This
is plain ugly, and reminds me of Basic programming ;-)

So this patch switches to an enum (enum class, actually).
We now just have a list of priority names in order, with no numbers.

It would have been straightforward, if it weren't for a bug in GCC
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211) where the
"init_priority" attribute doesn't accept the enum (while the "constructor"
attribute does). Luckily, a simple workaround - explicitly casting to
int - works.
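A minimal sketch of the enum-plus-cast pattern, assuming GCC or Clang on an ELF target. The priority names and the `tracer` type are hypothetical; the real list lives in prio.hh. The enum starts at 101 because GCC reserves init_priority values 0 to 100, and lower priorities are constructed first, regardless of declaration order.

```cpp
#include <cassert>

// Hypothetical priority list: only the order matters, values are automatic.
enum class init_prio : int {
    console = 101,
    memory,
    scheduler,
};

// Constant-initialized, so safe to use from any dynamic initializer.
int construction_order[3];
int constructed = 0;

struct tracer {
    explicit tracer(int id) { construction_order[constructed++] = id; }
};

// Declared in reverse order on purpose; the explicit (int) cast is the
// workaround for init_priority not accepting the enum directly.
tracer sched_obj __attribute__((init_priority((int)init_prio::scheduler))) = tracer(3);
tracer mem_obj __attribute__((init_priority((int)init_prio::memory))) = tracer(2);
tracer console_obj __attribute__((init_priority((int)init_prio::console))) = tracer(1);
```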

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

147de06c

Nov 19, 2013

percpu: Reduce size of .percpu section · ed808267

Nadav Har'El authored 11 years ago


This patch reduces the size of the .percpu section 64-fold from about
5 MB to 70 KB, and solves issue #95.

The ".percpu" section is part of the .data section of our executable
(loader-stripped.elf). In our 15 MB executable, roughly 7 MB is text
(code), and 7 MB is data, and out of that, a whopping 5 MB is the
".percpu" section. The executable is read in real mode, and this is
especially slow on Amazon EC2, hence our wish to make the executable
as small as possible.

The percpu section starts with all the PERCPU variables defined in the
program. We have about 70 KB of those, and believe it or not, most of
this 70 KB is just a single variable, the 65K dynamic_percpu_buffer
(see percpu.cc).

But then, we need a copy of these variables for each CPU. The unpatched
code duplicated this 70KB section 64 times in the executable file (!),
and then used these memory locations for up-to-64 cpus. But there is
no reason to duplicate this data in the executable! All we need to do
is to dynamically allocate a copy of this section for each CPU, and
this is what this patch does.
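The approach of keeping one template and allocating a copy per CPU at boot can be modeled in a few lines. The byte array below stands in for the linker-built .percpu section, and `allocate_percpu_areas()` and `percpu_ptr()` are hypothetical names, not OSv's percpu.cc API. For scale, 64 copies of a 70 KB section come to roughly 4.4 MB, in line with the "about 5 MB" saved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the .percpu section: the initial values of all PERCPU
// variables, stored once in the executable. Byte 0 starts as 42.
unsigned char percpu_template[128] = {42};

// One dynamically allocated copy of the template per CPU.
std::vector<std::unique_ptr<unsigned char[]>> percpu_base;

void allocate_percpu_areas(unsigned ncpus) {
    for (unsigned cpu = 0; cpu < ncpus; ++cpu) {
        std::unique_ptr<unsigned char[]> area(new unsigned char[sizeof percpu_template]);
        std::memcpy(area.get(), percpu_template, sizeof percpu_template);
        percpu_base.push_back(std::move(area));
    }
}

// A PERCPU variable is identified by its offset within the template.
unsigned char* percpu_ptr(unsigned cpu, std::size_t offset) {
    return percpu_base[cpu].get() + offset;
}
```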

This patch removes about 5 MB from our executable: After this patch,
our loader-stripped.elf is just 9.7 MB, and its data section's size is
just 2.8 MB.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ed808267

Nov 12, 2013

x64: Fix stack alignment in fault handlers · 9be9dd99

Pekka Enberg authored 11 years ago


Make sure stack pointer is 16-byte aligned in fault handler as required
by x86-64 ABI. This is needed for the page fault handler to be able to
use stack for FPU state save/restore.
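The alignment itself is a simple mask. In assembly it would be an `and` of %rsp with ~15; the helper below is an illustrative C++ equivalent. The ABI wants the stack 16-byte aligned at call sites, and FXSAVE faults on a memory operand that is not 16-byte aligned. Because stacks grow downward, rounding down never overlaps data already pushed above the pointer.

```cpp
#include <cassert>
#include <cstdint>

// Round a stack pointer down to a 16-byte boundary.
constexpr std::uintptr_t align_stack_down(std::uintptr_t sp) {
    return sp & ~std::uintptr_t(15);
}
```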

Spotted by Nadav.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9be9dd99

Nov 11, 2013

build: fix .tls_template_size calculation · 3527954f

Avi Kivity authored 11 years ago


For an unknown reason, the current calculation of .tls_template_size
yields 0x10 instead of the correct value.  This results in part of the
initial tls block being freed by arch::setup(), and subsequent
corruption.

Fix by switching to the ld SIZEOF() operator.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3527954f

Nov 07, 2013

x64: Use constants for CR0 and CR4 values · f77a8c32

Pekka Enberg authored 11 years ago


Replace magic numbers with constants for CR0 and CR4 control register
values in arch/x86/boot.S.
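Named constants of this kind look roughly as follows. The bit positions come from the Intel SDM, but which bits boot.S actually sets is not stated here, so the combinations are illustrative. C++ `constexpr` is used for testability; in boot.S the equivalent would be preprocessor or `.set` definitions.

```cpp
#include <cassert>
#include <cstdint>

namespace cr0 {
constexpr std::uint64_t pe = 1ull << 0;   // protected mode enable
constexpr std::uint64_t mp = 1ull << 1;   // monitor coprocessor
constexpr std::uint64_t et = 1ull << 4;   // extension type
constexpr std::uint64_t ne = 1ull << 5;   // native FPU error reporting
constexpr std::uint64_t wp = 1ull << 16;  // write protect
constexpr std::uint64_t am = 1ull << 18;  // alignment mask
constexpr std::uint64_t pg = 1ull << 31;  // paging
}

namespace cr4 {
constexpr std::uint64_t pse = 1ull << 4;          // page size extensions
constexpr std::uint64_t pae = 1ull << 5;          // physical address extension
constexpr std::uint64_t pge = 1ull << 7;          // global pages
constexpr std::uint64_t osfxsr = 1ull << 9;       // OS supports FXSAVE/FXRSTOR
constexpr std::uint64_t osxmmexcpt = 1ull << 10;  // OS handles SSE exceptions
}
```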

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f77a8c32

Nov 04, 2013

mmu: default value for slop parameter of linear_map() · 9bd5556d

Dmitry Fleytman authored 11 years ago


Make slop default to the page size, because this is the most frequent case.
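A sketch of what a defaulted parameter means for callers. The signature below is hypothetical (the real mmu declaration is not shown here), and the stub only records the slop instead of building page tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t page_size = 4096;

std::size_t last_slop = 0;  // what the stub was last called with

// Hypothetical signature: callers that omit slop get page_size.
void linear_map(void* virt, std::uintptr_t phys, std::size_t size,
                std::size_t slop = page_size) {
    (void)virt;
    (void)phys;
    (void)size;
    last_slop = slop;
}
```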

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>

9bd5556d

xen: do not start xen interrupt threads in non-xen guests · b14b78c4

Glauber Costa authored 11 years ago

If we do, we'll have extra xen threads running around for no reason in the !xen
case. is_xen() should already work here, since the initializers are run after
cpuid discovery, which is when the information for is_xen is filled in.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b14b78c4

xen: Fix xen_features initialization · 0bdd4bc6

Raphael S. Carvalho authored 11 years ago


Ironically, my commit 0c77d3c2 broke xen_features initialization.
This patch fixes it.

Reported-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0bdd4bc6

Nov 01, 2013

x64/xen: Fix assignment to xen_features. · 0c77d3c2

Raphael S. Carvalho authored 11 years ago

The code wouldn't work if XENFEAT_NR_SUBMAPS > 1. It's currently 1.
The assignment to xen_features must take i as well as j into consideration.
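The corrected indexing can be sketched as follows. The submap values are passed in (the real code obtains them from the hypervisor), and XENFEAT_NR_SUBMAPS is set to 2 here only so that i actually matters; the message notes it is currently 1.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned XENFEAT_NR_SUBMAPS = 2;  // 1 in OSv today; 2 to exercise i

// One flag per feature bit across all submaps.
std::uint8_t xen_features[XENFEAT_NR_SUBMAPS * 32];

// submaps[i] is the 32-bit feature submap for submap index i.
void setup_xen_features(const std::uint32_t (&submaps)[XENFEAT_NR_SUBMAPS]) {
    for (unsigned i = 0; i < XENFEAT_NR_SUBMAPS; ++i) {
        for (unsigned j = 0; j < 32; ++j) {
            // Indexing by j alone would make submap i overwrite submap 0.
            xen_features[i * 32 + j] = (submaps[i] >> j) & 1;
        }
    }
}
```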

Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0c77d3c2

Oct 30, 2013

x64: fix TLS segment alignment · 15d30a11

Avi Kivity authored 11 years ago


The TLS segment is a little weird in that it grows backwards from percpu_base
instead of forwards.  This causes the alignment code to calculate wrong
offsets when the segment size is 8 (mod 16).  A failure was seen where
::percpu_base was set at offset 0xfffffffffffffa08 in code that was in the
same translation unit as ::percpu_base, and 0xfffffffffffffa10 elsewhere.
This caused all dynamic_percpu instances to crash.

Fix by aligning the segment size.  For good measure, also align the segment
base; both are aligned to a cacheline boundary.
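Why the segment size matters can be shown with a simplified model: with a backwards-growing TLS block, a variable at offset `off` within the segment sits at `base - segment_size + off`, so an aligned base is not enough if the size is 8 (mod 16). The helper names are illustrative, and the 64-byte cacheline is an assumption.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t cacheline = 64;

// Round n up to a power-of-two alignment.
constexpr std::size_t align_up(std::size_t n, std::size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Offset of a TLS variable relative to the (aligned) base in this model.
constexpr std::ptrdiff_t tls_offset_from_base(std::size_t segment_size, std::size_t off) {
    return static_cast<std::ptrdiff_t>(off) - static_cast<std::ptrdiff_t>(segment_size);
}
```

With a segment size of 0x5f8 (8 mod 16), the first variable lands 8 bytes off a 16-byte boundary; after rounding the size up to 0x600 it is aligned again.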

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

15d30a11

Oct 28, 2013

x64: fix APIC ID register read on QEMU · c19e28f4

Avi Kivity authored 11 years ago


The expression *p >> 24, where p is an unsigned*, is optimized by
the compiler to *((u8*)p + 3) - reading the most significant byte only and
dropping the shift.

When this optimization is applied to reading the APIC ID, QEMU
returns zero for all processors, since the manual requires reading
entire words.

Fix by using a volatile pointer, disabling the optimization.

Note that QEMU is technically correct here, though its behavior differs from
all known real x86 implementations.
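The fix amounts to reading through a volatile-qualified pointer. A sketch with an illustrative function name: in practice GCC and Clang do not narrow a volatile 32-bit load, so `*reg >> 24` stays a full-word read followed by a shift.

```cpp
#include <cassert>
#include <cstdint>

// Read the APIC ID register (ID in bits 24-31) with a full 32-bit access.
std::uint32_t read_apic_id(const volatile std::uint32_t* reg) {
    return *reg >> 24;
}
```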

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

c19e28f4

x64: fix xapic INIT IPI · e7ba2287

Avi Kivity authored 11 years ago


xapic::init_ipi() shifts apic_id by 24, unaware that xapic::ipi will do it
again.  The result is that the boot processor is reset instead of the
auxiliary processor.

Remove the extraneous shift.
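The double shift is easy to see in a small model of the ICR2 encoding (the name `icr2_for()` is illustrative). If the caller pre-shifts the APIC ID, the second shift pushes it out of the 32-bit register and leaves destination 0, which is usually the boot processor.

```cpp
#include <cassert>
#include <cstdint>

// xAPIC ICR2: the destination APIC ID lives in bits 24-31.
// ipi() performs this shift itself, so callers must pass the raw ID.
std::uint32_t icr2_for(std::uint32_t apic_id) {
    return apic_id << 24;
}
```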

Found by booting with QEMU without kvm.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

e7ba2287

Oct 24, 2013

arch-cpu.hh: Fix arch_thread forward declaration · ba2abdf7

Pekka Enberg authored 11 years ago


Spotted by Clang:

../../arch/x64/arch-cpu.hh:57:1: error: 'arch_thread' defined as a struct here but previously
      declared as a class [-Werror,-Wmismatched-tags]
struct arch_thread {
^
../../arch/x64/arch-cpu.hh:37:1: note: did you mean struct here?
class arch_thread;
^~~~~
struct

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ba2abdf7

arch-cpu.hh: Fix arch_cpu forward declaration · 766b9719

Pekka Enberg authored 11 years ago


Spotted by Clang:

../../include/sched.hh:278:12: error: class 'arch_cpu' was previously declared as a struct
      [-Werror,-Wmismatched-tags]
    friend class arch_cpu;
           ^
../../arch/x64/arch-cpu.hh:39:8: note: previous use is here
struct arch_cpu {
       ^
../../include/sched.hh:278:12: note: did you mean struct here?
    friend class arch_cpu;
           ^~~~~
           struct

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

766b9719

Oct 23, 2013

Fix top of call stack - and treatment of unhandled C++ exceptions · 7fc023e8

Nadav Har'El authored 11 years ago


As noticed by Tomek in issue #64, unhandled C++ exceptions cause OSv to
silently hang, in an endless loop inside the unwinding code.

So this patch fixes the wrong CFI (DWARF Call Frame Information) which
caused the unwinder to loop. We just had a single line of assembly missing:
The topmost frame - the thread's main function - needs to undefine the
saved %rip to prevent going further back. If we don't do that, gdb will
end every "bt" output with a warning "Frame did not save its PC" (but hey,
nobody complained... ;-)), and the unwinding library will, unfortunately,
go into an endless loop as seen in issue #64.

With this one-line patch, unhandled exceptions now work as expected -
they abort with a message like:

	terminate called after throwing an instance of 'int'
	Aborted

And attaching a debugger you can see exactly where the offending throw came
from (i.e., the stack does *not* unnecessarily unwind when there's nobody
waiting to catch the exception).

This works for uncaught exceptions anywhere - including inside main()
and from constructors when loading the object (before running main()).

"bt" in gdb also no longer ends each stack trace with an error message.
The last frame it shows is "thread_main()".

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>

7fc023e8

Oct 22, 2013

x64: Make dump_register() fault-handler safe · ab623c9f

Pekka Enberg authored 11 years ago


The debug() call can deadlock because it's using boost format. Switch to
debug_ll().

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ab623c9f

x64: Fix missing newline in dump_registers() · e8b33142

Pekka Enberg authored 11 years ago


The debug() format string is missing a newline. Fix that up.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e8b33142

Fix linker error in make mode=debug · 7f9faa65

Tomasz Grabiec authored 11 years ago


This is a workaround for a linker error when compiling with -O0

  `.text._Z9safe_loadIcEbPKT_RS0_' referenced in section `.text.fixup'
  of core/mmu.o: defined in discarded section
  `.text._Z9safe_loadIcEbPKT_RS0_[_Z9safe_loadIcEbPKT_RS0_]' of
  core/mmu.o

The safe_load() template is used in both runtime.cc and core/mmu.cc,
but the linker keeps it in only one section, discarding the other.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>

7f9faa65

Oct 16, 2013

x64: Register dump on GP fault · ca52fa23

Pekka Enberg authored 11 years ago

Dump registers on general protection fault for debugging purposes. Even
if you have gdb available, getting to the exception frame is not always
possible after OSv has crashed.

Example output looks as follows:

registers:
RIP: 0x0000100000b7e913 RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0xffffc000418ed278 RBX: 0xffffc00041b2c050 RCX: 0x0000000000000004 RDX: 0x0000000000000000
RSI: 0x0000000000000001 RDI: 0x43e0000000000000 RBP: 0x0000200008548d10 R8: 0xffffc000426e3010
R9: 0x0000000000000004 R10: 0x43e0000000000000 R11: 0xffffc00041b2c050 R12: 0xffffc000418ed1e8
R13: 0x0000000000000004 R14: 0x43e0000000000000 R15: 0xffffc00041b2c050 RSP: 0x0000200008548aa0
general protection fault
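Each field in the dump above follows one fixed-width pattern. A sketch of formatting a single register into a caller-supplied buffer with `snprintf`, so the sketch's own code does no dynamic allocation; `format_reg()` is an illustrative name, not the actual dump_registers() code.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Writes "NAME: 0x<16 zero-padded hex digits>" into buf.
int format_reg(char* buf, std::size_t len, const char* name, std::uint64_t value) {
    return std::snprintf(buf, len, "%s: 0x%016" PRIx64, name, value);
}
```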

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ca52fa23

Oct 11, 2013

x64: Fix nested exception debugging · 82301253

Pekka Enberg authored 11 years ago


As of commit a449b889 ("x64: Enable sleeping in fault context") it's now
safe for another thread to enter a fault handler on the same CPU.  Fix
exception guard to reflect that.

This is needed for demand paging where a page fault from another thread
can happen on the same CPU where a thread is sleeping in the page fault
handler.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

82301253

Oct 10, 2013

build: define _KERNEL everywhere · 95ce17e3

Avi Kivity authored 11 years ago

We have _KERNEL defines scattered throughout the code, which makes
understanding it difficult.

Define it just once, and adjust the source to build.

We define it in an overridable variable, so that non-kernel imported code
can undo it.

95ce17e3

Oct 01, 2013

x64: Enable sleeping in fault context · a449b889

Pekka Enberg authored 11 years ago


In preparation for enabling demand paging, enable sleeping in fault
context by using a per-thread exception stack for normal faults and
per-CPU exception stack for nested faults.

Avi Kivity explains:

  Before [demand paging] can even hope to work, we need to enable
  sleeping in fault context.  Right now each cpu has its own exception
  stack, which leads immediately to stack corruption:

  thread 1 faults
  enters exception stack
  tries to take mutex
  scheduler switches to thread 2
  thread 2 faults
  enters same exception stack

  So we need to switch stacks.  This can be done in the same way as for
  interrupt stacks (see thread::switch_to()).

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

a449b889

Sep 30, 2013

kvmclock: Implement support for old kvmclock MSRs. · e4ceea63

Venkatesh Srinivas authored 11 years ago


Older versions of KVM and user VMMs expose kvmclock MSRs at different
MSR offsets. Detect the old flag in kvmclock::probe() and use the old
MSRs if they are the only ones available.
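The selection logic can be sketched with the MSR numbers and CPUID feature bits of the KVM paravirtual interface (the same values as in Linux's kvm_para.h). The helper name and struct are hypothetical; `features` is assumed to be EAX of CPUID leaf 0x40000001.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t msr_kvm_wall_clock_old  = 0x11;
constexpr std::uint32_t msr_kvm_system_time_old = 0x12;
constexpr std::uint32_t msr_kvm_wall_clock_new  = 0x4b564d00;
constexpr std::uint32_t msr_kvm_system_time_new = 0x4b564d01;
constexpr unsigned kvm_feature_clocksource  = 0;  // old MSRs available
constexpr unsigned kvm_feature_clocksource2 = 3;  // new MSRs available

struct kvmclock_msrs {
    std::uint32_t wall_clock;
    std::uint32_t system_time;
};

// Prefer the new MSRs; use the old ones only if they are the only ones
// advertised. Returns false if kvmclock is not available at all.
bool pick_kvmclock_msrs(std::uint32_t features, kvmclock_msrs& out) {
    if (features & (1u << kvm_feature_clocksource2)) {
        out = {msr_kvm_wall_clock_new, msr_kvm_system_time_new};
        return true;
    }
    if (features & (1u << kvm_feature_clocksource)) {
        out = {msr_kvm_wall_clock_old, msr_kvm_system_time_old};
        return true;
    }
    return false;
}
```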

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

e4ceea63

Sep 21, 2013

xen: use c++ interrupt handler · ea4cb9f6

Glauber Costa authored 11 years ago

Now that we have an efficient interrupt handler, use it. There is no need to delete
the old BSD code; it stays just to avoid disrupting the file too much. Make sure
through an assertion that it is never used, though.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

ea4cb9f6

xen: rework interrupt handler · a837afe5

Glauber Costa authored 11 years ago


This version of the Xen interrupt handler tries to do as little work as possible
in the interrupt itself. The previous version and my previous fix attempt would
still clean the channels during the interrupt.

Because now we have pending_sel still set in the irq thread, we can ditch
_irq_pending completely.

There is now only one xen_irq for the entire system, and therefore I am
registering one per cpu, since we will eventually have to process this on
different cpus (for different event channels).

With this, in my (very coarse, host to guest) netperf test, I am achieving
9600 * 10^6 bps, while Linux can reach ~10000 * 10^6 bps. So we're getting close:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 65536  16384  16384    10.00    9589.32

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

a837afe5

xen: declare shared types as atomic · de6ba640

Glauber Costa authored 11 years ago

Some of the fields in the xen shared structure need to be accessed atomically.
Move them to std::atomic so we can do that using C++11 primitives.
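A sketch of why these fields want std::atomic: the hypervisor sets pending bits concurrently, so the guest must read and clear them in one atomic step. The struct below is a hypothetical slice modeled on Xen's vcpu_info fields, not OSv's actual declaration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical slice of the shared vcpu info.
struct vcpu_info_sketch {
    std::atomic<std::uint8_t> evtchn_upcall_pending{0};
    std::atomic<std::uint64_t> evtchn_pending_sel{0};
};

// Grab and clear the selector word in one step; bits set by the hypervisor
// after the exchange stay set for the next pass instead of being lost.
std::uint64_t take_pending_selectors(vcpu_info_sketch& v) {
    v.evtchn_upcall_pending.store(0);
    return v.evtchn_pending_sel.exchange(0);
}
```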

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

de6ba640

Sep 18, 2013

percpu: use correct percpu sect size to support 64 vcpus · 439bad31

Sasha Levin authored 11 years ago


percpu had too little space allocated to support 64 vcpus, which
led to a crash when booting with more than 13 vcpus. Fix it by
using a correct size to support 64 vcpus.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

439bad31

Sep 15, 2013

Add Cloudius copyright to everything in arch/x64 · 6e918b0c

Nadav Har'El authored 11 years ago

Add Cloudius copyright to everything in arch/x64. This includes C++ code, assembly
code, and ld scripts.

6e918b0c

Sep 12, 2013

Support for Xen w/o vector callbacks · 1d3e336c

Dmitry Fleytman authored 11 years ago

This patch implements GSI interrupt support for Xen bus.
Needed in Xen environments w/o vector callbacks for HVM.
One example of such an environment is Amazon EC2.

1d3e336c