Commits · 153f6513f13ff9e49cf98ca81ad9ec46e1747671 · Verlässliche Systemsoftware / projects / osv

Dec 01, 2013

Fix exception (in x86 sense, not C++) handling during boot · cba2f09a

Nadav Har'El authored 11 years ago


During boot, between the time main() sets the IDT and the later call to
smp_launch(), the IDT doesn't actually work correctly.
The problem is that we use the separate stacks feature (IST), and that
doesn't work without also setting the GDT, not only the IDT.

So use init_on_cpu() to initialize not only the IDT, but other stuff
as well. Fix smp_launch() not to repeat this initialization on the boot
CPU, as it was already done.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

cba2f09a

Nov 26, 2013

Reduce number of unnecessary sections in our executable · 03aaf6b8

Nadav Har'El authored 11 years ago


This patch resolves issue #26. As you can see with "objdump -h
build/release/loader.elf", our executable had over a thousand (!)
separate sections, most of which should really be merged.
We already started doing this in arch/x64/loader.ld, but didn't
complete the work.

This patch merges all the ".gcc_except_table.*" sections into one,
and all the ".data.rel.ro.*" sections into one. After this merge,
we are left with just 52 sections, instead of more than 1000.

The default linker script (run "ld --verbose" to see it) also does
similar merges, so there's no reason why we shouldn't.

By reducing the number of ELF sections (each comes with a name, headers,
etc.), this patch also reduces the size of our loader-stripped.elf
by about 140K.

Fixes #26.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

03aaf6b8

xen: move per-cpu interrupt threads to .percpu section · 63d2e472

Dmitry Fleytman authored 11 years ago


The bug fixed by this patch made OSv crash on Xen during boot.
The problem started to show up after commit:

  commit ed808267
  Author: Nadav Har'El <nyh@cloudius-systems.com>
  Date:   Mon Nov 18 23:01:09 2013 +0200

      percpu: Reduce size of .percpu section

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

63d2e472

Nov 21, 2013

Replace numbers in prio.hh by automatically defined numbers · 147de06c

Nadav Har'El authored 11 years ago

prio.hh defines various initialization priorities. The actual numbers
don't matter, just the order between them. But when we add too many
priorities between existing ones, we may hit a need to renumber. This
is plain ugly, and reminds me of Basic programming ;-)

So this patch switches to an enum (enum class, actually).
We now just have a list of priority names in order, with no numbers.

It would have been straightforward, if it weren't for a bug in GCC
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211) where the
"init_priority" attribute doesn't accept the enum (while the "constructor"
attribute does). Luckily, a simple workaround - explicitly casting to
int - works.
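
A minimal sketch of the idea, not the actual contents of prio.hh: the
priority names (console, memory, threads) and the tracer helper are
hypothetical, and the starting value 101 is chosen only because GCC
reserves init_priority values up to 100. The objects are defined in
reverse order to show that the enum order, not the definition order,
decides when each constructor runs.

```cpp
#include <cassert>
#include <vector>

// Priority names listed in the order they must run. Only the first
// entry carries a number; the rest follow automatically.
enum class init_prio : int {
    console = 101,
    memory,
    threads,
};

// Records the order in which the global objects below are constructed.
std::vector<int>& init_order()
{
    static std::vector<int> order;
    return order;
}

template <int N>
struct tracer {
    tracer() { init_order().push_back(N); }
};

// Deliberately defined in reverse order. The explicit cast to int is the
// workaround described in the commit message.
tracer<3> threads_init __attribute__((init_priority((int)init_prio::threads)));
tracer<2> memory_init __attribute__((init_priority((int)init_prio::memory)));
tracer<1> console_init __attribute__((init_priority((int)init_prio::console)));
```

Adding a new priority then only means inserting a name at the right
place in the enum; nothing has to be renumbered.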

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

147de06c

Nov 19, 2013

percpu: Reduce size of .percpu section · ed808267

Nadav Har'El authored 11 years ago


This patch reduces the size of the .percpu section 64-fold from about
5 MB to 70 KB, and solves issue #95.

The ".percpu" section is part of the .data section of our executable
(loader-stripped.elf). In our 15 MB executable, roughly 7 MB is text
(code), and 7 MB is data, and out of that, a whopping 5 MB is the
".percpu" section. The executable is read in real mode, and this is
especially slow on Amazon EC2, hence our wish to make the executable
as small as possible.

The percpu section starts with all the PERCPU variables defined in the
program. We have about 70 KB of those, and believe it or not, most of
this 70 KB is just a single variable, the 65K dynamic_percpu_buffer
(see percpu.cc).

But then, we need a copy of these variables for each CPU. The unpatched
code duplicated this 70KB section 64 times in the executable file (!),
and then used these memory locations for up-to-64 cpus. But there is
no reason to duplicate this data in the executable! All we need to do
is to dynamically allocate a copy of this section for each CPU, and
this is what this patch does.
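
The approach can be sketched roughly as follows. This is an
illustration under assumptions, not the code in percpu.cc: the names
percpu_template and allocate_percpu_areas are made up, and plain heap
allocation stands in for whatever allocator the kernel uses at boot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// The single copy of the initial per-CPU variable values that stays in
// the executable (hypothetical representation).
struct percpu_template {
    const unsigned char* start;
    std::size_t size;
};

// Allocate one private copy of the template per CPU at boot time,
// instead of storing 64 copies in the executable file.
std::vector<std::unique_ptr<unsigned char[]>>
allocate_percpu_areas(const percpu_template& tmpl, unsigned ncpus)
{
    std::vector<std::unique_ptr<unsigned char[]>> areas;
    areas.reserve(ncpus);
    for (unsigned i = 0; i < ncpus; ++i) {
        std::unique_ptr<unsigned char[]> area(new unsigned char[tmpl.size]);
        std::memcpy(area.get(), tmpl.start, tmpl.size);
        areas.push_back(std::move(area));
    }
    return areas;
}
```

Each CPU gets its own writable copy, and the file only has to carry the
template once.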

This patch removes about 5 MB from our executable: After this patch,
our loader-stripped.elf is just 9.7 MB, and its data section's size is
just 2.8 MB.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ed808267

Nov 12, 2013

x64: Fix stack alignment in fault handlers · 9be9dd99

Pekka Enberg authored 11 years ago


Make sure stack pointer is 16-byte aligned in fault handler as required
by x86-64 ABI. This is needed for the page fault handler to be able to
use stack for FPU state save/restore.
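
As an illustration only (the actual fix lives in the fault entry code,
which is not shown here), rounding a stack pointer down to a 16-byte
boundary can be expressed like this. The helper name align_down is an
assumption. Because the stack grows downwards, rounding down never
touches memory that is already in use.

```cpp
#include <cassert>
#include <cstdint>

// Round value down to a multiple of alignment, which must be a power of two.
constexpr std::uintptr_t align_down(std::uintptr_t value, std::uintptr_t alignment)
{
    return value & ~(alignment - 1);
}

static_assert(align_down(0x8aa8, 16) == 0x8aa0, "misaligned value is rounded down");
static_assert(align_down(0x8aa0, 16) == 0x8aa0, "aligned value is unchanged");
```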

Spotted by Nadav.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9be9dd99

Nov 11, 2013

build: fix .tls_template_size calculation · 3527954f

Avi Kivity authored 11 years ago


For an unknown reason, the current calculation of .tls_template_size
yields 0x10 instead of the correct value.  This results in part of the
initial tls block being freed by arch::setup(), and subsequent
corruption.

Fix by switching to the ld SIZEOF() operator.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3527954f

Nov 07, 2013

x64: Use constants for CR0 and CR4 values · f77a8c32

Pekka Enberg authored 11 years ago


Replace magic numbers with constants for CR0 and CR4 control register
values in arch/x86/boot.S.
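
For readers unfamiliar with the registers, here is a sketch of what
named constants for a few CR0 and CR4 bits look like. The bit positions
are the architectural ones, but the constant names and the particular
combination of bits are assumptions made for this example. The real
change is in assembly and may name and combine them differently.

```cpp
#include <cassert>
#include <cstdint>

// Architectural bit positions; names are illustrative.
constexpr std::uint32_t CR0_PE = 1u << 0;   // protected mode enable
constexpr std::uint32_t CR0_WP = 1u << 16;  // write protect
constexpr std::uint32_t CR0_PG = 1u << 31;  // paging
constexpr std::uint32_t CR4_PAE = 1u << 5;  // physical address extension
constexpr std::uint32_t CR4_PGE = 1u << 7;  // page global enable

// A magic number such as 0x80010001 becomes self-describing.
constexpr std::uint32_t example_cr0 = CR0_PE | CR0_WP | CR0_PG;
constexpr std::uint32_t example_cr4 = CR4_PAE | CR4_PGE;

static_assert(example_cr0 == 0x80010001u, "same value, readable spelling");
static_assert(example_cr4 == 0xa0u, "same value, readable spelling");
```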

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f77a8c32

Nov 04, 2013

mmu: default value for slop parameter of linear_map() · 9bd5556d

Dmitry Fleytman authored 11 years ago


Make the slop parameter default to the page size, because this is the
most frequent case.

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>

9bd5556d

xen: do not start xen interrupt threads in non-xen guests · b14b78c4

Glauber Costa authored 11 years ago

If we do, we'll have extra xen threads running around for no reason in the !xen
case. is_xen() should already work here since the initializers are run after
cpuid discovery, which is when the information for is_xen is filled up.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b14b78c4

xen: Fix xen_features initialization · 0bdd4bc6

Raphael S. Carvalho authored 11 years ago


Ironically, my commit 0c77d3c2 broke xen_features initialization.
This patch fixes it.

Reported-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0bdd4bc6

Nov 01, 2013

x64/xen: Fix assignment to xen_features. · 0c77d3c2

Raphael S. Carvalho authored 11 years ago

The code wouldn't work if XENFEAT_NR_SUBMAPS > 1. It's currently 1.
The assignment to xen_features must take i as well as j into consideration.
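
A sketch of the intended indexing, assuming each submap is a 32-bit
word and the feature table is a flat array with one entry per bit. The
names feature_table and set_features are hypothetical, and the submap
count of 2 is chosen only to exercise the i > 0 case.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned submaps = 2;  // hypothetical; the real value is currently 1
using feature_table = std::array<std::uint8_t, submaps * 32>;

// Expand submap number i into the flat table. Indexing with j alone
// would make every submap overwrite the first 32 entries.
void set_features(feature_table& features, unsigned i, std::uint32_t submap)
{
    for (unsigned j = 0; j < 32; ++j) {
        features[i * 32 + j] = (submap >> j) & 1u;
    }
}
```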

Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0c77d3c2

Oct 30, 2013

x64: fix TLS segment alignment · 15d30a11

Avi Kivity authored 11 years ago


The TLS segment is a little weird in that it grows backwards from percpu_base
instead of forwards.  This causes the alignment code to calculate wrong
offsets when the segment size is 8 (mod 16).  A failure was seen where
::percpu_base was set at offset 0xfffffffffffffa08 in code that was in the
same translation unit as ::percpu_base, and 0xfffffffffffffa10 elsewhere.
This caused all dynamic_percpu instances to crash.

Fix by aligning the segment size.  For good measure, align also the segment
base, both to a cacheline boundary.
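
To make the arithmetic concrete, here is a small sketch. It assumes a
64-byte cacheline and uses 0x5f8 purely as an example of a size that is
8 (mod 16); the commit does not state the real segment size, and the
helper names are made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t cacheline = 64;  // assumption for this example

// Round value up to a multiple of alignment, which must be a power of two.
constexpr std::uint64_t align_up(std::uint64_t value, std::uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// The segment grows backwards from its base, so its start is at a
// negative offset, shown here as an unsigned 64-bit value.
constexpr std::uint64_t tls_start_offset(std::uint64_t segment_size)
{
    return std::uint64_t(0) - align_up(segment_size, cacheline);
}

static_assert(align_up(0x5f8, cacheline) == 0x600, "size rounded to a cacheline");
static_assert(tls_start_offset(0x5f8) == 0xfffffffffffffa00ull, "aligned start");
```

With the size aligned up front, every translation unit derives the same
offsets, whichever way the alignment code rounds.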

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

15d30a11

Oct 28, 2013

x64: fix APIC ID register read on QEMU · c19e28f4

Avi Kivity authored 11 years ago


The expression *p >> 24, where p is an unsigned*, is optimized by
the compiler to *((u8*)p + 3) - reading the most significant byte only and
dropping the shift.

When this optimization is applied to reading the APIC ID, QEMU
returns zero for all processors, since the manual requires reading
entire words.

Fix by using a volatile pointer, disabling the optimization.
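
A minimal sketch of the pattern, with a hypothetical function name and
an ordinary variable standing in for the memory-mapped register in the
usage below. In practice compilers such as GCC perform a volatile
access at the full width of the pointed-to type, so the shift stays a
shift on a whole 32-bit load.

```cpp
#include <cassert>
#include <cstdint>

// Read the whole 32-bit register through a volatile pointer, then
// extract the APIC ID from the top byte.
inline std::uint32_t read_apic_id(const volatile std::uint32_t* id_register)
{
    std::uint32_t value = *id_register;  // one full-word read
    return value >> 24;
}
```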

Note that QEMU is technically correct here though it violates all known
real x86 implementations.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

c19e28f4

x64: fix xapic INIT IPI · e7ba2287

Avi Kivity authored 11 years ago


xapic::init_ipi() shifts apic_id by 24, unaware that xapic::ipi will do it
again.  The result is that the boot processor is reset instead of the
auxiliary processor.

Remove the extraneous shift.
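
One plausible way to see why the boot processor was hit, assuming the
destination is held in a 32-bit value and the boot processor has APIC
ID 0 (neither is stated in the commit): shifting twice pushes the ID
out of the word entirely. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Place an APIC ID in the top byte of a 32-bit destination field.
constexpr std::uint32_t icr_destination(std::uint32_t apic_id)
{
    return apic_id << 24;
}

static_assert(icr_destination(1) == 0x01000000u, "single shift targets CPU 1");
static_assert(icr_destination(icr_destination(1)) == 0u, "double shift yields 0");
```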

Found by booting with QEMU without kvm.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

e7ba2287

Oct 24, 2013

arch-cpu.hh: Fix arch_thread forward declaration · ba2abdf7

Pekka Enberg authored 11 years ago


Spotted by Clang:

../../arch/x64/arch-cpu.hh:57:1: error: 'arch_thread' defined as a struct here but previously
      declared as a class [-Werror,-Wmismatched-tags]
struct arch_thread {
^
../../arch/x64/arch-cpu.hh:37:1: note: did you mean struct here?
class arch_thread;
^~~~~
struct
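
The shape of the fix, reduced to a sketch: the forward declaration uses
the same class-key as the definition. The member and the thread_slot
user are invented for the example.

```cpp
#include <cassert>

struct arch_thread;  // forward declaration now matches the definition below

// Hypothetical user that only needs the incomplete type.
struct thread_slot {
    arch_thread* arch;
};

struct arch_thread {
    int placeholder = 0;
};
```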

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ba2abdf7

arch-cpu.hh: Fix arch_cpu forward declaration · 766b9719

Pekka Enberg authored 11 years ago


Spotted by Clang:

../../include/sched.hh:278:12: error: class 'arch_cpu' was previously declared as a struct
      [-Werror,-Wmismatched-tags]
    friend class arch_cpu;
           ^
../../arch/x64/arch-cpu.hh:39:8: note: previous use is here
struct arch_cpu {
       ^
../../include/sched.hh:278:12: note: did you mean struct here?
    friend class arch_cpu;
           ^~~~~
           struct

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

766b9719

Oct 23, 2013

Fix top of call stack - and treatment of unhandled C++ exceptions · 7fc023e8

Nadav Har'El authored 11 years ago


As noticed by Tomek in issue #64, unhandled C++ exceptions cause OSv to
silently hang, in an endless loop inside the unwinding code.

So this patch fixes the wrong CFI (DWARF Call Frame Information) which
caused the unwinder to loop. We just had a single line of assembly missing:
The topmost frame - the thread's main function - needs to undefine the
saved %rip to prevent going further back. If we don't do that, gdb will
end every "bt" output with a warning "Frame did not save its PC" (but hey,
nobody complained... ;-)), and the unwinding library, will, unfortunately,
go into an endless loop as seen in issue #64.

With this one-line patch, unhandled exceptions now work as expected -
they abort with a message like:

	terminate called after throwing an instance of 'int'
	Aborted

And by attaching a debugger you can see exactly where the offending throw came
from (i.e., the stack does *not* unnecessarily unwind when there's nobody
waiting to catch the exception).

This works for uncaught exceptions anywhere - including inside main()
and from constructors when loading the object (before running main()).

"bt" in gdb also no longer ends each stack trace with an error message.
The last frame it shows is "thread_main()".

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>

7fc023e8

Oct 22, 2013

x64: Make dump_register() fault-handler safe · ab623c9f

Pekka Enberg authored 11 years ago


The debug() call can deadlock because it's using boost format. Switch to
debug_ll().

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ab623c9f

x64: Fix missing newline in dump_registers() · e8b33142

Pekka Enberg authored 11 years ago


The debug() format string is missing a newline. Fix that up.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e8b33142

Fix linker error in make mode=debug · 7f9faa65

Tomasz Grabiec authored 11 years ago


This is a workaround for a linker error when compiling with -O0:

  `.text._Z9safe_loadIcEbPKT_RS0_' referenced in section `.text.fixup'
  of core/mmu.o: defined in discarded section
  `.text._Z9safe_loadIcEbPKT_RS0_[_Z9safe_loadIcEbPKT_RS0_]' of
  core/mmu.o

The safe_load() template is used in both runtime.cc and core/mmu.cc
but the linker keeps it only in one section discarding the other.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>

7f9faa65

Oct 16, 2013

x64: Register dump on GP fault · ca52fa23

Pekka Enberg authored 11 years ago

Dump registers on general protection fault for debugging purposes. Even
if you have gdb available, getting to the exception frame is not always
possible after OSv has crashed.

Example output looks as follows:

registers:
RIP: 0x0000100000b7e913 RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0xffffc000418ed278 RBX: 0xffffc00041b2c050 RCX: 0x0000000000000004 RDX: 0x0000000000000000
RSI: 0x0000000000000001 RDI: 0x43e0000000000000 RBP: 0x0000200008548d10 R8: 0xffffc000426e3010
R9: 0x0000000000000004 R10: 0x43e0000000000000 R11: 0xffffc00041b2c050 R12: 0xffffc000418ed1e8
R13: 0x0000000000000004 R14: 0x43e0000000000000 R15: 0xffffc00041b2c050 RSP: 0x0000200008548aa0
general protection fault

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ca52fa23

Oct 11, 2013

x64: Fix nested exception debugging · 82301253

Pekka Enberg authored 11 years ago


As of commit a449b889 ("x64: Enable sleeping in fault context") it's now
safe for another thread to enter a fault handler on the same CPU.  Fix
exception guard to reflect that.

This is needed for demand paging where a page fault from another thread
can happen on the same CPU where a thread is sleeping in the page fault
handler.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

82301253

Oct 10, 2013

build: define _KERNEL everywhere · 95ce17e3

Avi Kivity authored 11 years ago

We have _KERNEL defines scattered throughout the code, which makes
understanding it difficult.

Define it just once, and adjust the source to build.

We define it in an overridable variable, so that non-kernel imported code
can undo it.

95ce17e3

Oct 01, 2013

x64: Enable sleeping in fault context · a449b889

Pekka Enberg authored 11 years ago


In preparation for enabling demand paging, enable sleeping in fault
context by using a per-thread exception stack for normal faults and
per-CPU exception stack for nested faults.

Avi Kivity explains:

  Before [demand paging] can even hope to work, we need to enable
  sleeping in fault context.  Right now each cpu has its own exception
  stack, which leads immediately to stack corruption:

  thread 1 faults
  enters exception stack
  tries to take mutex
  scheduler switches to thread 2
  thread 2 faults
  enters same exception stack

  So we need to switch stacks.  This can be done in the same way as for
  interrupt stacks (see thread::switch_to()).
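
The stack switch described above can be modelled in a few lines. This
is a simplified sketch, not the scheduler code: the struct layouts, the
4096-byte stack size and the fault_stack_top field (standing in for the
stack pointer the CPU loads on a fault) are all assumptions.

```cpp
#include <cassert>
#include <cstddef>

struct thread {
    alignas(16) char exception_stack[4096];  // size is illustrative

    char* exception_stack_top()
    {
        return exception_stack + sizeof(exception_stack);
    }
};

struct cpu {
    thread* current = nullptr;
    char* fault_stack_top = nullptr;

    // On every context switch, point the CPU at the incoming thread's
    // own exception stack, so a thread sleeping inside a fault handler
    // keeps its frames intact when another thread faults.
    void switch_to(thread& next)
    {
        current = &next;
        fault_stack_top = next.exception_stack_top();
    }
};
```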

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

a449b889

Sep 30, 2013

kvmclock: Implement support for old kvmclock MSRs. · e4ceea63

Venkatesh Srinivas authored 11 years ago


Older versions of KVM and user VMMs expose kvmclock MSRs at different
MSR offsets. Detect the old flag in kvmclock::probe() and use the old
MSRs if they are the only ones available.
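
A sketch of the selection logic, using the feature bits and MSR numbers
from the KVM paravirtual interface as I understand it. The struct and
function names are hypothetical and the real kvmclock::probe() may be
organized differently.

```cpp
#include <cassert>
#include <cstdint>

// KVM CPUID feature bits for the paravirtual clock.
constexpr std::uint32_t KVM_FEATURE_CLOCKSOURCE = 1u << 0;   // old MSRs
constexpr std::uint32_t KVM_FEATURE_CLOCKSOURCE2 = 1u << 3;  // new MSRs

struct kvmclock_msrs {
    std::uint32_t wall_clock;
    std::uint32_t system_time;
    bool available;
};

// Prefer the new MSRs; fall back to the old ones only when they are
// the only ones advertised.
inline kvmclock_msrs select_kvmclock_msrs(std::uint32_t features)
{
    if (features & KVM_FEATURE_CLOCKSOURCE2) {
        return {0x4b564d00u, 0x4b564d01u, true};
    }
    if (features & KVM_FEATURE_CLOCKSOURCE) {
        return {0x11u, 0x12u, true};
    }
    return {0u, 0u, false};
}
```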

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

e4ceea63

Sep 21, 2013

xen: use c++ interrupt handler · ea4cb9f6

Glauber Costa authored 11 years ago

Now that we have an efficient interrupt handler, use it. There is no need to
delete the old bsd code; keeping it avoids disrupting the file too much. Make
sure through an assertion that it is never used, though.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

ea4cb9f6

xen: rework interrupt handler · a837afe5

Glauber Costa authored 11 years ago


This version of the Xen interrupt handler tries to do as little work as possible
in the interrupt itself. The previous version and my previous fix attempt would
still clean the channels during interrupt.

Because now we have pending_sel still set in the irq thread, we can ditch
_irq_pending completely.

There is now only one xen_irq for the entire system, and therefore I am
registering one per cpu, since we will eventually have to process this on
different cpus (for different event channels).

With this, in my (very coarse, host to guest) netperf test, I am achieving
9600 * 10^6 bps, while linux can reach ~10000 * 10^6 bps. So we're getting close:

Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 65536  16384  16384    10.00    9589.32

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

a837afe5

xen: declare shared types as atomic · de6ba640

Glauber Costa authored 11 years ago

Some of the fields in the xen shared structure need to be accessed atomically.
Move them to std::atomic so we can do that using C++11 primitives.
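
A small sketch of what this enables, with a hypothetical shared_flags
struct standing in for the relevant part of the shared structure: bits
can be set from one context and collected from another without losing
updates.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a field of the Xen shared structure.
struct shared_flags {
    std::atomic<unsigned long> pending_sel{0};
};

// Atomically set one pending bit.
inline void mark_pending(shared_flags& s, unsigned bit)
{
    s.pending_sel.fetch_or(1ul << bit);
}

// Atomically fetch all pending bits and clear them in a single step.
inline unsigned long take_pending(shared_flags& s)
{
    return s.pending_sel.exchange(0);
}
```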

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>

de6ba640

Sep 18, 2013

percpu: use correct percpu sect size to support 64 vcpus · 439bad31

Sasha Levin authored 11 years ago


percpu had too little space allocated to support 64 vcpus, which
led to a crash when booting with more than 13 vcpus. Fix it by
using a correct size to support 64 vcpus.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

439bad31

Sep 15, 2013

Add Cloudius copyright to everything in arch/x64 · 6e918b0c

Nadav Har'El authored 11 years ago

Add Cloudius copyright to everything in arch/x64. This includes C++ code, assembly
code, and ld scripts.

6e918b0c

Sep 12, 2013

Support for Xen w/o vector callbacks · 1d3e336c

Dmitry Fleytman authored 11 years ago

This patch implements GSI interrupt support for Xen bus. Needed in Xen
environments w/o vector callbacks for HVM. One example of such an
environment is Amazon EC2.

1d3e336c

Logic for GSI level triggered interrupt added · aeb82f51

Dmitry Fleytman authored 11 years ago

aeb82f51

Sep 11, 2013

XAPIC support implemented · aa98b306

Dmitry Fleytman authored 11 years ago

XAPIC is supported as a fall-back when X2APIC is not available

aa98b306

Sep 05, 2013

boot16.S: open up space for partition table · 4a6d51d5

Glauber Costa authored 11 years ago

Because we will be copying the bootloader code to the beginning of the disk, make
sure we won't step over the partition table space. This is technically not needed
if the code is small enough, but this guard code will 1) make sure that doesn't
happen, and 2) make sure the space is zeroed out.

The signature though, is needed, and is set to the bytes "O", "S" and "V", which
will span VSO in the end.

4a6d51d5

bootloader: move count32 variable · fcf173eb

Glauber Costa authored 11 years ago

It currently sits in the middle of the partition table. Move it to a safer
location.

fcf173eb

acpi: move table initialization to its own constructor · bf15592d

Glauber Costa authored 11 years ago

Right now we are doing it right before we parse the MADT, but this is by far
not MADT specific. Other users are planned, and the best way to resolve the
disputes is to have it in a separate constructor.

bf15592d

Aug 28, 2013

work around xen x2apic bug · cc3d517a

Glauber Costa authored 11 years ago

The x2APIC specification says that reading from the X2APIC_ID MSR should return
the physical apic id of the current processor. However, the Xen implementation
(as of 4.2.2) is broken, and reads actually return the old-style xAPIC id. Even
if they fix it, we still have HVs deployed around that will return the wrong ID.
We can work around this by testing if the returned APIC id is in the form (id
<< 24), since in that case, the first 24 bits will all be zeroed. Then at least
we can get this working everywhere. This may pose a problem if we want to ever
support more than 1 << 24 vCPUs (or if any other HV has some random x2apic
ids), but that is highly unlikely anyway.
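
The test described above might look like this. It is a sketch of the
heuristic only: the function name is made up, and the real code may add
further conditions before applying it.

```cpp
#include <cassert>
#include <cstdint>

// If the value looks like (id << 24), i.e. the low 24 bits are zero and
// the top byte is not, treat it as an old-style xAPIC ID and undo the shift.
inline std::uint32_t fixup_x2apic_id(std::uint32_t raw)
{
    const bool low_bits_clear = (raw & 0x00ffffffu) == 0;
    const bool high_byte_set = (raw >> 24) != 0;
    return (low_bits_clear && high_byte_set) ? raw >> 24 : raw;
}
```

A correctly reported ID such as 2 passes through unchanged, while the
broken form 0x02000000 is also mapped to 2.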

cc3d517a

apic: bringup cpus individually instead of all at the same time · 5cb16020

Glauber Costa authored 11 years ago

As I have described in a previous patch, the Xen hypervisor has a very nasty
bug that causes all of the x2apic msr writes to trigger a GPF. Although the
request proceeds fine despite the GPF, it does bring a problem for all-but-self
style init sequences we are using: after "failing" (succeeding but returning
failure) to deliver the interrupt for the first cpu in the group, xen will
break the loop, therefore not delivering the SIPIs to other cpus in the system
at all. We can work around that by delivering interrupts to each cpu
individually, instead of all-but-self.
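
In outline, the workaround replaces one broadcast with a loop. The
sketch below uses a hypothetical send_ipi callback in place of the real
APIC write, so it only illustrates the control flow.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Deliver the startup IPI to every CPU except the caller, one at a time.
// A failure reported for one CPU cannot stop delivery to the others,
// because each delivery is a separate request.
inline unsigned bring_up_individually(
    const std::vector<std::uint32_t>& apic_ids,
    std::uint32_t self,
    const std::function<void(std::uint32_t)>& send_ipi)
{
    unsigned sent = 0;
    for (std::uint32_t id : apic_ids) {
        if (id == self) {
            continue;  // skip the CPU running this code
        }
        send_ipi(id);
        ++sent;
    }
    return sent;
}
```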

5cb16020

implement wrmsr_safe · a7ea5784

Glauber Costa authored 11 years ago

Unfortunately, the Xen hypervisor has a very nasty bug (seems to be fixed by a
2013 patch - which means that although it is fixed, a lot of hypervisors will
still have it), that causes all of the x2apic msr writes to init related
registers (INIT, SIPI, etc) to trigger a GPF. The way to work around this is to
implement a form of "wrmsr_safe".

a7ea5784