Commits · 49497bdb689b4d899f5d1691559c15035fc03a0f · Verlässliche Systemsoftware / projects / osv

Sep 11, 2013
- Support clock_gettime() through syscall() · 49497bdb
  Nadav Har'El authored 11 years ago
  
  Strangely, C++11's new std::chrono::system_clock::now() (which I wanted to use in a test case) calls clock_gettime() not through the function, but with the syscall() interface. So add support for that too.
  49497bdb
- Add reboot function · 542c319b
  Nadav Har'El authored 11 years ago
  
  Added a new function, osv::reboot() (declared in <osv/power.hh>) for rebooting the VM. Also added a Java interface - com.cloudius.util.Power.reboot(). NOTE: Power.java and/or jni/power.cc also need to be copied into the mgmt submodule.
  542c319b
- mutex: make the constructor constexpr · a919d5f4
  Avi Kivity authored 11 years ago
  
  Statically allocated mutexes are very common. Make the mutex constructor constexpr to ensure that a statically allocated mutex is initialized before use, even if that use is from static constructors.
  a919d5f4
- switching mgmt submodule to latest mgmt master · b7dc6b67
  narkisr authored 11 years ago
  
  b7dc6b67
- adding internal jmx server launch, disabled now since OSv fails to start it · e2bef725
  narkisr authored 11 years ago
  
  e2bef725
- moving all web related folders to usr.manifest and setting all folders paths to /usr · 6ea1a813
  narkisr authored 11 years ago
  
  6ea1a813
- latest commits · ba6ba3b1
  narkisr authored 11 years ago
  
  ba6ba3b1
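
Commit a919d5f4 above relies on a C++11 rule: an object with static storage whose constructor is constexpr is constant-initialized, before any dynamic initializer runs. The sketch below illustrates that under stated assumptions. `simple_mutex`, `global_lock` and `early_user` are hypothetical names and not OSv's mutex. Within one translation unit, initialization order already follows definition order, so the guarantee matters most when the static user lives in a different translation unit.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a mutex type; this is not OSv's mutex class.
class simple_mutex {
public:
    // constexpr constructor: a static instance is constant-initialized,
    // so it is ready before any static constructor can touch it.
    constexpr simple_mutex() noexcept : _locked(false) {}

    bool try_lock() {
        return !_locked.exchange(true, std::memory_order_acquire);
    }
    void unlock() {
        assert(_locked.load());
        _locked.store(false, std::memory_order_release);
    }
private:
    std::atomic<bool> _locked;
};

// Statically allocated mutex.
static simple_mutex global_lock;

// A static object whose constructor already uses the mutex.
struct early_user {
    bool got_lock;
    early_user() : got_lock(global_lock.try_lock()) {
        if (got_lock) {
            global_lock.unlock();
        }
    }
};
static early_user early;
```

If the constructor were not constexpr, a static constructor in another translation unit could run before the mutex's own initializer and observe an uninitialized object.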
Sep 10, 2013

gdb: Fix osv mmap memory layout · 9ee0d615
Pekka Enberg authored 11 years ago
```
Fix up memory layout of 'class vma' for 'osv mmap' gdb command.
```
9ee0d615

mmu: Fix file-backed vma splitting · d72b550c

Pekka Enberg authored 11 years ago

Commit 3510a5ea ("mmu: File-backed VMAs") forgot to fix vma::split() to
take file-backed mappings into account. Fix the problem by making
vma::split() a virtual function and implementing it separately for
file_vma.

Spotted by Avi Kivity.

d72b550c

Added basic readline configuration · c5c4534c

Or Cohen authored 11 years ago

Parsed by JLine (in CRaSH)
Console should now better understand keys like home/end/arrows

c5c4534c

Merge branch 'stty-for-jni' · 8b0ea169
Or Cohen authored 11 years ago

8b0ea169
Added stty JNI call for ioctl flags needed by JLine · 6b548713
Or Cohen authored 11 years ago

6b548713

DHCP: Fix crash · 68f4d147

Nadav Har'El authored 11 years ago

Rarely (about once every 20 runs) we had OSv crash during boot, in the
DHCP code. It turns out that the code first sends out the DHCP requests,
and then creates a thread to handle the replies. When a reply arrives,
the code wake()s the thread, but on rare occasions the thread hasn't yet
been set up (still a null pointer) so we have a crash.

Fix this by reversing the order - first create the reply handling thread,
and only then send the request.

68f4d147

Sep 09, 2013
- gdb: add 'osv trace2file' · 32ff60e4
  Guy Zana authored 11 years ago
  
  Use it to dump tracepoints to a file, trace.txt; x100 faster ;)
  32ff60e4
Sep 08, 2013

Scheduler: Fix load-balancer bug · e9f0cf29

Nadav Har'El authored 11 years ago

The load_balance() code checks if another CPU has fewer threads in its
run queue than this CPU, and if so, migrates one of this CPU's threads
to the other CPU.

However, when we count this core's runnable threads, we overcount it by
1, because as soon as load_balance() goes back to sleep, one of the
runnable threads will start running. So if this core has just one more
runnable thread than some remote core, the two are
actually even, so in that case we should *not* migrate a thread.

Overcounting the number of threads on the core running load_balance
caused bad performance in 2-core and 2-thread SpecJVM: Normally, the
size of the run queue on each core is 1 (each core is running one of
the two threads, and on the run queue we have the idle thread). But
when load_balance runs it sees 2 runnable threads (the idle thread and
the preempted benchmark thread), and the second core has just 1, so
it decides to migrate one of its threads to the second CPU. When this
is over, the second CPU has both benchmark threads, and the first CPU
has nothing, and this will only be fixed some time later when the
second CPU's load_balance thread runs, and later the balance will be
ruined again. All the time that the two threads run on the same CPU
significantly hurts performance, and on the host's "top" we see qemu
taking just 120%-150% instead of 200% as it should (and as it does
after this patch).

e9f0cf29
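
The corrected comparison described in e9f0cf29 can be sketched as a small predicate. This is a minimal illustration, not OSv's load_balance(). The function name `should_migrate` and its parameters are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: decides whether the CPU running the load balancer
// should migrate one of its threads to a remote CPU.
// local_runnable is the run-queue length as seen by the balancer thread; it
// overcounts by one, because one of those threads starts running as soon as
// the balancer goes back to sleep.
inline bool should_migrate(std::size_t local_runnable,
                           std::size_t remote_runnable) {
    if (local_runnable == 0) {
        return false;                                // nothing to migrate
    }
    std::size_t local_waiting = local_runnable - 1;  // discount the overcount
    return remote_runnable < local_waiting;
}

// Usage example mirroring the 2-core, 2-thread case in the commit message.
inline void demo_should_migrate() {
    assert(!should_migrate(2, 1));  // actually even: do not migrate
    assert(should_migrate(3, 1));   // real imbalance: migrate one thread
}
```

With the uncorrected comparison (`remote_runnable < local_runnable`), the first case would migrate a thread and leave both benchmark threads on one CPU.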

Scheduler: Avoid vruntime jump when clock jumps · 253e4536

Nadav Har'El authored 11 years ago

Currently, clock::get()->time() jumps (by system_time(), i.e., the host's
uptime) at some point during the initialization. This can be a huge jump
(e.g., a week if the host's uptime is a week). Fixing this jump is hard,
so we'd rather just tolerate it.

reschedule_from_interrupt() handles this clock jump badly. It calculates
current_run, the amount of time the current thread has run, to include this
jump while the thread was running. In the above example, a run time of
a whole week is wrongly attributed to some thread, and added to its vruntime,
causing it not to be scheduled again until all other threads yield the
CPU.

The fix in this patch is to limit the vruntime increase after a long
run to max_slice (10ms). Even if a thread runs for longer (or just thinks
it ran for longer), it won't be "penalized" in its dynamic priority more
than a thread that ran for 10ms. Note that this cap makes sense, as
cpu::enqueue already enforces a similar limit on the vruntime "bonus"
of a woken thread, and this patch works toward a similar goal (avoid
giving one thread a huge bonus because another thread was given a huge
penalty).

This bug is very visible in the CPU-bound SPECjvm2008 benchmarks, when
running two benchmark threads on two virtual cpus. As it happens, the
load_balancer() is the one that gets the huge vruntime increase, so
it doesn't get to run until no other thread wants to run. Because we start
with both CPU-bound threads on the same CPU, and these hardly yield the
CPU (and even more rarely are the two threads sleeping at the same time),
the load balancer thread on this CPU doesn't get to run, and the two threads
remain on the same CPU, giving us halved performance (2-cpu performance
identical to 1-cpu performance) and on the host we see qemu using 100% cpu,
instead of 200% as expected with two vcpus.

253e4536
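
The cap described in 253e4536 amounts to clamping the charged run time. The sketch below assumes a hypothetical helper `vruntime_increase`; only the 10ms value of max_slice comes from the commit message, and the real reschedule_from_interrupt() code is not shown in the source.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using ns = std::chrono::nanoseconds;

// Value taken from the commit message: max_slice is 10ms.
constexpr ns max_slice{std::chrono::milliseconds(10)};

// Hypothetical helper: how much vruntime to charge a thread that believes it
// ran for current_run. A clock jump can make current_run huge, so the charge
// is capped at max_slice.
inline ns vruntime_increase(ns current_run) {
    return std::min(current_run, max_slice);
}

// Usage example: a one-week clock jump is charged like a 10ms run.
inline void demo_vruntime_increase() {
    assert(vruntime_increase(std::chrono::hours(24 * 7)) == max_slice);
    assert(vruntime_increase(std::chrono::milliseconds(3)) ==
           std::chrono::milliseconds(3));
}
```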

run.py: fix image name, we now write arguments to usr.img · a8d3a5ca
Guy Zana authored 11 years ago

a8d3a5ca
run_elf: add a debug print before running an elf (easy debugging) · 6bdae19f
Guy Zana authored 11 years ago

6bdae19f
tests: add a tcp hash server that can test multiple TCP streams · 962eed70
Guy Zana authored 11 years ago

962eed70

tests: tcp send-only test · e51ef872

Guy Zana authored 11 years ago

A test where the guest connects to the host and sends a small packet of data.
Used to verify that retransmits are working in Van Jacobson and the TCP
stack in general.

e51ef872

DHCP: start eth0 and configure dhcp by default before running the payload · d4d8f014
Guy Zana authored 11 years ago

d4d8f014
DHCP: add an option to wait for an IP and make it the default · 6be8f6b0
Guy Zana authored 11 years ago

6be8f6b0

build: fix sizing the image during a clean build · 9c10b784

Avi Kivity authored 11 years ago

The shell call to stat(1) is evaluated when the rule to build the image is
triggered, at which point loader-stripped.elf does not exist yet. This
causes stat to fail and the build to break.

Fix by moving the creation of loader-stripped.elf to its own target, so
that by the time the recipe is evaluated, the file is known to exist.

9c10b784

Sep 06, 2013
- Removed bash/zsh specific arithmetic expansion · 484dee50
  Or Cohen authored 11 years ago
  
  Changed "$[]" to "$(())" when calculating zfs start/size
  484dee50
Sep 05, 2013

build adaptions for single image · 16b47261
Glauber Costa authored 11 years ago

16b47261

blkfront: mark device ready earlier · 7b0354b9

Glauber Costa authored 11 years ago

We cannot read the partition table from the device if the device is not marked
as ready, since all IO will stall. I believe it should be fine to just mark the
device ready before we mark our state as connected. With that change, it all
proceeds normally.

7b0354b9

call read partition table · b3e47d9a

Glauber Costa authored 11 years ago

I would like to call read_partition_table automatically from device_register,
which would guarantee that every device that comes up has its partitions
scanned. Although totally possible on KVM, it is not possible on Xen, due to
the asynchronous nature of the bringup protocol: the device is exposed and
created in a moment where IO is not yet possible, so reading the partition
table will fail. Just read them both from the drivers when we are sure the
driver is ready.

b3e47d9a

read partition table · 7fb8b99b

Glauber Costa authored 11 years ago

This code, living in device.c for maximum generality, will read the partition
table from any disk that calls it. Ideally, each new device would have its own
private data. But that would mean having to callback to the driver to set each
of the partitions up. Therefore, I found it easier to adopt the convention that all
partitions in the same drive have the same private data. This makes some sense
if we consider that the hypervisors are usually agnostic about partitions, and
all of the addressing and communications go through a single entry point, which
is the disk.

7fb8b99b

add offset calculation · cd14aecc

Glauber Costa authored 11 years ago

To support multiple partitions on a disk, I found it easier to add a
post-processing offset calculation to the bio just before calling the strategy.

The reason is, we have many (really many) entry points for bio preparation
(pre-strategy) and only two entry points for the strategy itself (the drivers).
Since multiplex_strategy is a good thing to be used even for virtio (although I
am not converting it now), since it allows for arbitrary sized requests, we
could very well reduce it to just one.

At this moment, the offset is always 0 and everything works as before.

cd14aecc
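
The idea in cd14aecc, applying the partition offset in one place just before the strategy call, can be sketched as follows. The structures and names here (`device`, `bio`, `bio_offset`, `driver_strategy`, `submit`) are simplified, hypothetical stand-ins and not OSv's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified device: offset is the partition's start on the
// underlying disk, in bytes; 0 means the device is the whole disk.
struct device {
    std::uint64_t offset;
};

// Hypothetical, simplified block I/O request.
struct bio {
    device*       dev;
    std::uint64_t bio_offset;      // offset relative to the partition
    int           strategy_calls;  // counts calls, for the example only
};

// Stand-in for a driver's strategy routine.
inline void driver_strategy(bio* b) {
    ++b->strategy_calls;
}

// Single post-processing step: translate the partition-relative offset to a
// disk-relative one right before handing the request to the driver.
inline void submit(bio* b) {
    b->bio_offset += b->dev->offset;
    driver_strategy(b);
}

// Usage example: offset 0 leaves the request untouched, as in the commit.
inline void demo_submit() {
    device whole{0};
    bio b{&whole, 512, 0};
    submit(&b);
    assert(b.bio_offset == 512 && b.strategy_calls == 1);
}
```

Doing the translation at this single point avoids touching the many places where a bio is prepared.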

blk: derive size information from device · bfff3c6a

Glauber Costa authored 11 years ago

Currently we get it from the private data, but since I plan to use the same
private data for all partitions, we need a unique value, that already exists in
the device. So use it.

bfff3c6a

boot16.S: open up space for partition table · 4a6d51d5

Glauber Costa authored 11 years ago

Because we will be copying the bootloader code to the beginning of the disk, make
sure we won't step over the partition table space. This is technically not needed
if the code is small enough, but this guard code will 1) make sure that doesn't
happen, and 2) make sure the space is zeroed out.

The signature though, is needed, and is set to the bytes "O", "S" and "V", which
will span VSO in the end.

4a6d51d5

imgedit: extend image editing script to deal with partitions · 716ad81d

Glauber Costa authored 11 years ago

Given a partition size and start address, this will edit the image passed as parameter
to create a partition entry. This assumes the disk is always bigger than 8GB while setting
the CHS address. From osdev wiki:

"For drives smaller than 8GB, the LBA fields and the CHS fields must "match"
when the values are converted into the other format. For drives bigger than
8GB, generally the CHS fields are set to Cylinder = 1023, Head = 254 or 255,
Sector = 63 -- which is considered an invalid setting."

716ad81d
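
For illustration, the 16-byte MBR partition entry that such an edit produces can be sketched as below. This is not the imgedit script itself. The function name `make_partition_entry` is hypothetical, and taking the size in sectors is an assumption. The byte layout is the standard MBR one, and the CHS fields use the 1023/254/63 setting from the quote above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds one 16-byte MBR partition table entry.
// Layout: byte 0 status, bytes 1-3 CHS of first sector, byte 4 type,
// bytes 5-7 CHS of last sector, bytes 8-11 start LBA (little-endian),
// bytes 12-15 sector count (little-endian).
inline std::array<std::uint8_t, 16> make_partition_entry(
        std::uint8_t type, std::uint32_t lba_start, std::uint32_t sectors) {
    std::array<std::uint8_t, 16> e{};  // zero-filled: status 0x00, not bootable

    // Cylinder 1023, head 254, sector 63 packs to FE FF FF:
    // head byte, then sector in bits 0-5 with cylinder bits 8-9 in bits 6-7,
    // then the low 8 cylinder bits.
    const std::uint8_t chs[3] = {0xFE, 0xFF, 0xFF};
    for (int i = 0; i < 3; ++i) {
        e[1 + i] = chs[i];
        e[5 + i] = chs[i];
    }
    e[4] = type;
    for (int i = 0; i < 4; ++i) {
        e[8 + i]  = static_cast<std::uint8_t>(lba_start >> (8 * i));
        e[12 + i] = static_cast<std::uint8_t>(sectors >> (8 * i));
    }
    return e;
}

// Usage example; the type value 0x83 is only a placeholder.
inline void demo_partition_entry() {
    auto e = make_partition_entry(0x83, 2048, 4096);
    assert(e[1] == 0xFE && e[2] == 0xFF && e[3] == 0xFF);
    assert(e[8] == 0x00 && e[9] == 0x08);    // 2048 == 0x00000800
    assert(e[12] == 0x00 && e[13] == 0x10);  // 4096 == 0x00001000
}
```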

use stripped loader for size calculation · b6e0120f

Glauber Costa authored 11 years ago

This is the size that goes in our bootloader count32. But since we will be
copying over the stripped binary anyway, we are probably reading too much data,
for no reason. That should increase boot time a bit.

b6e0120f

bootloader: move count32 variable · fcf173eb
Glauber Costa authored 11 years ago
```
It currently sits in the middle of the partition table. Move it to a safer
location.
```
fcf173eb

hpet clock driver · e2991fce

Glauber Costa authored 11 years ago

This patch implements the HPET clock driver, that should work as a fallback for
both Xen and KVM, in case the paravirtual clock is not present. This is
unfortunately the situation for all HVM guests running on EC2, so support for
this is paramount. I have tested on KVM forcing the kvmclock to disappear, and
it seems to work all right.

e2991fce

acpi: move table initialization to its own constructor · bf15592d

Glauber Costa authored 11 years ago

Right now we are doing it right before we parse the MADT, but this is by no
means MADT-specific. Other users are planned, and the best way to resolve the
disputes is to have it in a separate constructor.

bf15592d

trivial: remove dangling reference to xb_strategy · 5524f1ef
Glauber Costa authored 11 years ago

5524f1ef

Sep 04, 2013
- virtio-blk: Use wait_for_queue() in response_worker() · 455c4bec
  Pekka Enberg authored 11 years ago
  
  As a cleanup, use wait_for_queue() like virtio-net does.
  455c4bec
Sep 03, 2013

irq_lock: avoid 'irq_lock defined but not used' warning · 90390cca

Avi Kivity authored 11 years ago

In an attempt to be clever, we define irq_lock as an object in an anonymous
namespace, so that each translation unit gets its own copy, which is then
optimized away, since the object is never touched. But the compiler complains
that the object is defined but not used if we include the file but don't
use irq_lock.

Simplify by only declaring the object there, and defining it somewhere else.

90390cca
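
The declare-in-the-header, define-once pattern from 90390cca can be sketched in a single file as below. The type body is a hypothetical stand-in: a counter replaces the real interrupt disabling, and the split into a header and one source file is indicated only by comments.

```cpp
#include <cassert>

// ---- what would live in the header ----

// Hypothetical lock type; a counter stands in for disabling interrupts.
struct irq_lock_type {
    void lock() { ++depth; }
    void unlock() {
        assert(depth > 0);
        --depth;
    }
    int depth = 0;
};

// Declaration only: including this creates no object in the translation
// unit, so there is nothing for the compiler to report as unused.
extern irq_lock_type irq_lock;

// ---- what would live in exactly one source file ----

// The single definition shared by the whole program.
irq_lock_type irq_lock;
```

The trade-off is that the object now really exists in the binary, instead of being a per-translation-unit copy that the compiler optimizes away.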

Add libinstrument.so and libjli.so to usr.manifest · 057e0e28

Pekka Enberg authored 11 years ago

They are needed by the JVM when "-javaagent" command line option is
used. After this patch, the jamm memory meter javaagent can be enabled
for Cassandra.

057e0e28