Commits · 530d00cce0b1def7601519b6ecffaa31c14600a0 · Verlässliche Systemsoftware / projects / osv

Jan 13, 2014

alarm: prepare for syscall interruption implementation · 530d00cc

Dmitry Fleytman authored 11 years ago


Move alarm cancellation logic to a separate function
cancel_alarm_ll(). This function will be called on
thread destruction to avoid post-mortem alarms.

Also move alarm scheduling logic to a separate function
set_alarm_ll() to make code symmetric.

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

530d00cc

api: Add osv::version for querying OSv version · 53399b86

Amnon Heiman authored 11 years ago


The uname() function returns a fake Linux version number for application
compatibility.  Add a new osv::version() API that returns OSv version
that can be used by the management code.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
[ penberg: cleanups ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

53399b86

debug.py: fix output parsing for addr2line ver. 2.23.52.0.1-9.fc19 · 0b97eeb6

Tomasz Grabiec authored 11 years ago


Pekka noticed that his addr2line responds with '?? ??:0' on
unknown symbol, whereas mine responds with '??\n??:0'.
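
A minimal sketch of how both output shapes could be recognised. It is not
the actual debug.py change, and the helper name is_unknown_symbol is
hypothetical. It only normalises whitespace, so it does not attempt to
parse demangled names that contain spaces:

```python
def is_unknown_symbol(output):
    """Return True if addr2line reported an unknown symbol.

    Splitting on any whitespace makes '?? ??:0' and '??\\n??:0'
    compare equal, regardless of which separator addr2line used.
    """
    return output.split() == ["??", "??:0"]
```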

Reported-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0b97eeb6

trace: show all digits of timestamp · 1c04a96d

Tomasz Grabiec authored 11 years ago

This makes it easier to use the timestamp as an argument to the --since and
--until options of the prof command, which accept a full timestamp in nanoseconds.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

1c04a96d

trace: separate backtrace elements with a comma rather than space · c3473ce1

Tomasz Grabiec authored 11 years ago


Affects output of 'trace.py list -b' and 'osv trace' GDB
commands. Makes the trace more readable when filename is also
included.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c3473ce1

trace: add command for printing tracepoint hit profile · 2808e2df

Tomasz Grabiec authored 11 years ago


The new 'prof' command shows a hit profile of an arbitrary tracepoint. Many
options accepted by 'prof-wait' are also accepted by this command.

Example:

  scripts/trace.py prof -t sched_wait

=== Thread 0xffffc0003eaeb010 ===

(100.00%, #7696) All
 |-- (99.51%, #7658) sched::thread::do_wait_until
 |    |-- (83.38%, #6417) condvar::wait(lockfree::mutex*, unsigned long)
 |    |    condvar_wait
 |    |    |-- (81.21%, #6250) cv_timedwait
 |    |    |    txg_delay
 |    |    |    dsl_pool_tempreserve_space
 |    |    |    dsl_dir_tempreserve_space
 |    |    |    dmu_tx_try_assign
 |    |    |    dmu_tx_assign
 |    |    |    |-- (81.19%, #6248) zfs_write
 |    |    |    |    vfs_file::write(uio*, int)
 |    |    |    |    sys_write
 |    |    |    |    pwritev
 |    |    |    |    writev
 |    |    |    |    __stdio_write
 |    |    |    |    __fwritex
 |    |    |    |    fwrite
 |    |    |    |    0x100000005a5f
 |    |    |    |    osv::run(std::string, int, char**, int*)

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2808e2df

trace: add command for printing wait profile · ca3be96c

Tomasz Grabiec authored 11 years ago


See

  scripts/trace.py prof-wait -h

The command uses the sched_wait and sched_wait_ret tracepoints to
calculate the amount of time a thread was waiting. Samples are
collected and presented in the form of a call graph tree.
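
The pairing of the two tracepoints can be sketched roughly as follows.
This is an illustration only, not the code in trace.py: the event tuple
layout and the function name wait_durations are assumptions, and events
are assumed to be in time order within each thread.

```python
from collections import defaultdict


def wait_durations(events):
    """Sum per-thread wait time from (thread, time_ns, name) events.

    A sched_wait event opens a wait interval for its thread and the next
    sched_wait_ret on the same thread closes it. A sched_wait_ret with no
    open interval is ignored.
    """
    pending = {}               # thread -> timestamp of the open sched_wait
    totals = defaultdict(int)  # thread -> accumulated wait time in ns
    for thread, time_ns, name in events:
        if name == "sched_wait":
            pending[thread] = time_ns
        elif name == "sched_wait_ret" and thread in pending:
            totals[thread] += time_ns - pending.pop(thread)
    return dict(totals)
```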

By default callees are closer to the root. To invert the order pass
-r|--caller-oriented.

If there is too much output, it can be narrowed down using
--max-levels and --min-duration options.

The presented time spectrum can be narrowed down using --since and --until
options which accept timestamps.

Example:

  scripts/trace.py prof-wait --max-levels 3 trace-file

=== Thread 0xffffc0003eaeb010 ===

12.43 s (100.00%, #7696) All
 |-- 12.43 s (99.99%, #7658) sched::thread::do_wait_until
 |    |-- 10.47 s (84.22%, #6417) condvar::wait(lockfree::mutex*, unsigned long)
 |    |    condvar_wait
 |    |    |-- 6.47 s (52.08%, #6250) cv_timedwait
 |    |    |    txg_delay
 |    |    |    dsl_pool_tempreserve_space
 |    |    |    dsl_dir_tempreserve_space
 |    |    |    dmu_tx_try_assign
 |    |    |    dmu_tx_assign
 |    |    |
 |    |    |-- 2.37 s (19.06%, #24) arc_read_nolock
 |    |    |    arc_read
 |    |    |    dsl_read
 |    |    |    traverse_visitbp
 |    |    |
 |    |    |-- 911.75 ms (7.33%, #3) txg_wait_open
 |    |    |    dmu_tx_wait
 |    |    |    zfs_write
 |    |    |    vfs_file::write(uio*, int)
 |    |    |    sys_write
 |    |    |    pwritev
 |    |    |    writev
 |    |    |    __stdio_write
 |    |    |    __fwritex
 |    |    |    fwrite
 |    |    |    0x100000005a5f
 |    |    |    osv::run(std::string, int, char**, int*)

By default every thread has a separate tree, because duration is best
interpreted in the context of a particular thread. There is, however, an
option to merge samples from all threads into one tree:
-m|--merge-threads. It may be useful if you want to inspect all paths
going into or out of a particular function. The direction can be changed
with the -r|--caller-oriented option. The function name is passed via the
--function parameter.

Example: check where zfs_write() blocks:

  scripts/trace.py prof-wait -rm --function=zfs_write trace-file

7.46 s (100.00%, #7314) All
 zfs_write
 |-- 6.48 s (86.85%, #6371) dmu_tx_assign
 |    |-- 6.47 s (86.75%, #6273) dmu_tx_try_assign
 |    |    dsl_dir_tempreserve_space
 |    |    |-- 6.47 s (86.75%, #6248) dsl_pool_tempreserve_space
 |    |    |    txg_delay
 |    |    |    cv_timedwait
 |    |    |    condvar_wait
 |    |    |    condvar::wait(lockfree::mutex*, unsigned long)
 |    |    |    sched::thread::do_wait_until
 |    |    |
 |    |    |-- 87.87 us (0.00%, #24) mutex_lock
 |    |    |    sched::thread::do_wait_until
 |    |    |
 |    |    \-- 6.40 us (0.00%, #1) dsl_dir_tempreserve_impl
 |    |         mutex_lock
 |    |         sched::thread::do_wait_until
 |    |
 |    \-- 7.32 ms (0.10%, #98) mutex_lock
 |         sched::thread::do_wait_until
 |
 |-- 911.75 ms (12.22%, #3) dmu_tx_wait
 |    txg_wait_open
 |    condvar_wait
 |    condvar::wait(lockfree::mutex*, unsigned long)
 |    sched::thread::do_wait_until

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ca3be96c

sched: add tracepoint for resuming from wait · 60f39aea

Tomasz Grabiec authored 11 years ago


Useful for calculating time during which thread
was scheduled out because of wait().

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

60f39aea

trace: add source address formatting options · 52432f51

Tomasz Grabiec authored 11 years ago


Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

52432f51

Introduce python module for tree printing · 2efddcd4

Tomasz Grabiec authored 11 years ago


Prints hierarchical data in the form of a tree.

Example:

Node0
 |-- Node1
 |    |-- Node2
 |    |
 |    \-- Node3
 |
 \-- Node4
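
A small sketch that reproduces the layout above. It is not the module
added by this commit: the (label, children) tuple representation and the
name render_tree are assumptions made for illustration.

```python
def _render_children(children, indent):
    """Render child nodes, returning a list of output lines."""
    lines = []
    for i, (label, grandchildren) in enumerate(children):
        last = i == len(children) - 1
        # The last sibling gets a closing corner, the others a tee.
        lines.append(indent + (" \\-- " if last else " |-- ") + label)
        # Descendants of a non-last sibling keep the vertical bar going.
        child_indent = indent + ("     " if last else " |   ")
        lines.extend(_render_children(grandchildren, child_indent))
        if not last:
            lines.append(indent + " |")  # spacer line between siblings
    return lines


def render_tree(root):
    """root is a (label, [children]) tuple; returns the tree as text."""
    label, children = root
    return "\n".join([label] + _render_children(children, ""))
```

For instance, render_tree(("Node0", [("Node1", [("Node2", []), ("Node3", [])]), ("Node4", [])]))
yields the same seven lines as the example.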

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2efddcd4

trace: add command-line tool for processing trace files · 7eab3eef

Tomasz Grabiec authored 11 years ago


For all options check:

  scripts/trace.py -h

Example: list all traces:

  scripts/trace.py list trace-file

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7eab3eef

gdb: introduce command to save traces in binary form · 5e304774

Tomasz Grabiec authored 11 years ago


Usage:

  osv trace save <filename>

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

5e304774

trace: support persisting trace stream to file · 84f139e3

Tomasz Grabiec authored 11 years ago


Having traces persisted in a file has the following advantages:

 - trace collection is decoupled from trace analysis. trace analysis
   tools no longer have to be part of GDB scripts. This is important
   for EC2 instances, to which we cannot connect with GDB. In this
   case trace data can be downloaded using other channels, but tools
   remain unchanged.

 - trace analysis doesn't have to be performed in the same OSv session.
   After trace is collected OSv can be shut down or left running without
   further disturbance.

The trace is saved in a binary format of the following structure:

  <format-version>
  <N_tp>
  <tracepoint 1>
  ...
  <tracepoint N_tp>
  <trace 1>
  <trace 2>
  ...
  <trace N_t>

Where:

  <tracepoint X> = <tp_key><name><signature><format>
  <trace X> = <tp_key><thread><time><cpu>(<backtrace_addr>*)<0><data>

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
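
To illustrate the trace record layout described in this commit, in
particular the zero-terminated backtrace, here is a rough sketch. The
commit message does not state field widths or byte order, so everything
below is an assumption: little-endian, 64-bit tp_key, thread, time and
backtrace addresses, a 32-bit cpu, and data taken to be whatever follows
the terminator. The names pack_trace and unpack_trace are hypothetical.

```python
import struct

# Assumed layout (not taken from the patch): little-endian, 64-bit tp_key,
# thread and time, 32-bit cpu, 64-bit backtrace addresses.
HEADER = struct.Struct("<QQQI")
ADDR = struct.Struct("<Q")


def pack_trace(tp_key, thread, time_ns, cpu, backtrace, data=b""):
    """Serialise one record: header, backtrace addresses, 0 terminator, data."""
    parts = [HEADER.pack(tp_key, thread, time_ns, cpu)]
    parts.extend(ADDR.pack(addr) for addr in backtrace)
    parts.append(ADDR.pack(0))  # a zero address ends the backtrace
    parts.append(data)
    return b"".join(parts)


def unpack_trace(buf):
    """Parse one record; bytes after the 0 terminator are returned as data."""
    tp_key, thread, time_ns, cpu = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    backtrace = []
    while True:
        (addr,) = ADDR.unpack_from(buf, offset)
        offset += ADDR.size
        if addr == 0:
            break
        backtrace.append(addr)
    return tp_key, thread, time_ns, cpu, backtrace, buf[offset:]
```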

84f139e3

gdb: extract trace related abstractions to an independent module · 6e10ed18

Tomasz Grabiec authored 11 years ago


Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

6e10ed18

Add python wrapper for addr2line · da3a3676

Tomasz Grabiec authored 11 years ago


Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

da3a3676

gdb: resolve tracepoint_base outside the loop · c790dae6

Tomasz Grabiec authored 11 years ago


Speeds things up slightly.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c790dae6

gdb: support backtraces shorter than backtrace_len · 8e0e6c5b

Tomasz Grabiec authored 11 years ago


If there are fewer frames than backtrace_len the remaining
addresses will be zeros. Let's not print them.
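
The idea can be sketched as follows. This is an illustration rather than
the actual GDB script code, and trim_backtrace is a hypothetical name:

```python
from itertools import takewhile


def trim_backtrace(addresses):
    """Keep frames up to, but not including, the first zero address."""
    return list(takewhile(lambda addr: addr != 0, addresses))
```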

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8e0e6c5b

gdb: make Trace convertible to string · b567109e

Tomasz Grabiec authored 11 years ago


Useful for ad-hoc printing.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b567109e

time.cc: generalize the std::chrono -> timespec conversion · 692a2a9c

Nadav Har'El authored 11 years ago

The fill_ts() function, which converts a std::chrono::duration into a struct
timespec, was unnecessarily constrained to take a std::chrono::nanoseconds as
input, whereas it can actually work with durations of any resolution (the
duration_cast will convert it to the correct units). This patch makes that
generalization, which, I think, makes the code clearer.

At first I decided to move this general function into <osv/clock.hh> and
make it a bit more friendly (returning a timespec, instead of writing into
one), but in the end, I decided not to move it. timespec is a dangerous
type because though it does specify the clock's resolution (always
nanoseconds) it doesn't specify the clock's epoch, so one can make mistakes
like calling clock_gettime(CLOCK_MONOTONIC), and pass the returned timespec
to pthread_cond_timedwait, which expects CLOCK_REALTIME not CLOCK_MONOTONIC.
So we should avoid timespec whenever we can, and I don't think we'll need
the fill_ts() function anywhere outside time.cc.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

692a2a9c

Add patch command backup files to gitignore · ad941117

Pekka Enberg authored 11 years ago

To prevent similar mistakes as in commit cc6b3a3c ("vga: Handle line
feed"), add the backup files generated by 'patch' to gitignore.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ad941117

Jan 10, 2014

Remove vga.cc.orig committed by mistake · 2f7ed1df

Pekka Enberg authored 11 years ago


I messed up in commit cc6b3a3c ("vga: Handle line feed"). Clean it up.

Reported-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2f7ed1df

loader: fix spurious error message · 4931c574

Nadav Har'El authored 11 years ago


When trying to run a nonexistent program, e.g., "run.py -e xyz", sometimes
(especially in a debug build) we would see the spurious message:

    program xyz returned -16384

The bug is simple: when osv::run returns null, meaning the program was not
found, it does not set the "ret" value. Our code checked if ret!=0 before
checking if osv::run actually found the program, which is wrong. Simply
changing the order of the code solves this bug.

Fixes #156.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4931c574

loader: Add '--vga' to switch console · 66d02657

Takuya ASADA authored 11 years ago


Add a --vga command-line option to switch to the VGA console.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

66d02657

VGA: Add input_ready() and readch() stubs · 555c8f9a

Takuya ASADA authored 11 years ago


To use VGAConsole class, add empty input_ready() and readch().

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

555c8f9a

vga: Handle line feed · cc6b3a3c

Takuya ASADA authored 11 years ago

Handle line feed in the VGA. To access termios flag, add a constructor.

Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

cc6b3a3c

vga: Fix VGA base address · 8e0bcb66

Takuya ASADA authored 11 years ago


Physical memory base address is 0xffffc00000000000, so VGA base address
should be 0xffffc000000b8000.
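
The arithmetic, as a quick check: 0xb8000 is the conventional physical
address of the VGA text buffer, and the constant and function names below
are made up for illustration.

```python
PHYS_MEM_BASE = 0xffffc00000000000  # base of the physical memory mapping
VGA_TEXT_PHYS = 0xb8000             # physical address of the VGA text buffer


def phys_to_virt(phys):
    """Translate a physical address into the linear physical mapping."""
    return PHYS_MEM_BASE + phys
```

Here hex(phys_to_virt(VGA_TEXT_PHYS)) gives the address quoted above.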

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8e0bcb66

tests: add test for thread clock · 89c50e8a

Glauber Costa authored 11 years ago


Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

89c50e8a

libc: support more time modes · bcc2fbb8

Glauber Costa authored 11 years ago


We are currently only answering requests for CLOCK_REALTIME, but we could
easily handle:

    * CLOCK_REALTIME_COARSE, which is effectively the same as CLOCK_REALTIME
      but faster. In our case, all time sources are equally fast.
    * CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, since we can
      easily get runtimes for our threads and publish that.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

bcc2fbb8

x64: Drop APIC base boot message · ba7250c9

Pekka Enberg authored 11 years ago


The "APIC base" message is not very useful to users. Drop it.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ba7250c9

x64: Simplify CPU bringup boot message · d291601d

Pekka Enberg authored 11 years ago


Currently, OSv prints out the following at boot:

  acpi 0 apic 0
  acpi 1 apic 1
  acpi 2 apic 2
  acpi 3 apic 3

Replace that with a simpler message:

  4 CPUs detected

We do lose the ACPI ID -> CPU ID mapping but it is not terribly
important for users.

Suggested-by: Nadav Har'El <nyh@cloudius-systems.com>
Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d291601d

bsd: Simplify networking init message · c69e2d34

Pekka Enberg authored 11 years ago

Simplify networking boot initialization message as suggested by Tzach.

Suggested-by: Tzach Livyatan <tzach@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c69e2d34

loader: Update Cloudius copyright · 7740d2cb

Pekka Enberg authored 11 years ago

It's 2014 now.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

7740d2cb

jvm: set max_heap to all available memory. · 8ea89c9c

Glauber Costa authored 11 years ago

We respect -Xmx when the user specifies it, but when it is left blank, we
set it to all the remaining memory that we have. That is not 100% accurate
because the JVM itself will use some memory, but it should be a good enough
estimate, especially given that some of the memory currently in use by OSv
could potentially be freed in the future should we need it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8ea89c9c

jvm_balloon: disable balloon upon jvm memory pressure. · 0034af3f

Glauber Costa authored 11 years ago

The biggest problem I am seeing with the balloon is that right now the only
time we call the balloon is when we're seeing memory pressure. If pressure is
coming from the JVM, we can livelock in quite interesting ways. We need to
detect that and disable the balloon in those situations, since ballooning when
the pressure comes from the JVM will only trash our workloads.

It's not yet working reliably, but this is the direction I plan to start from.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0034af3f

mm: Count total memory used by the JVM heap · 478f8746

Glauber Costa authored 11 years ago

To make informed reclaim decisions, we need to have as much relevant
information as possible about our reclaim targets. Specifically, it
is useful to know how much memory is currently used by the JVM heap.

The reasoning behind this is that if pressure is coming from the heap,
ballooning will harm us, instead of helping us.

Note: This is really just a first approximation. Ideally, total memory
shouldn't matter, but rather the memory delta since the last common event.
But counting memory is the first step for both.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

478f8746

jvm: insert probe · b32a006b

Glauber Costa authored 11 years ago

To find out which vmas hold the Java heap, we will use a technique that is very
close to ballooning (in the implementation, it is effectively the same).

We will insert a very small object (2 pages) and mark the
vma where the object is present as containing the JVM heap. Due to the way the
JVM allocates objects, that will end up in the young generation. As time
passes, the object will move the same way the balloon moves, and every new vma
that is seen will be marked as holding the JVM heap.

That mechanism should work for every generational GC, which should encompass
most of the JDK7 GCs (if not all). It shouldn't work with the G1GC, but that
debuts at JDK8, and for that we can do something a lot simpler, namely having
the JVM tell us in advance which map areas contain the heap.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b32a006b

java: memory pressure monitor · 88343714

Glauber Costa authored 11 years ago

The best possible criterion for deflating balloons is heap pressure: whenever
there is pressure in the JVM, we should give back memory so the pressure stops.

To accomplish that, we need to somehow tap into the JVM. This patch registers
an MXBean that will send us notifications about collections. We will ignore
minor collections and act upon major collections by deflating any existing
balloons.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

88343714

jvm_balloon: control shrinker activation / deactivation · 52cb4738

Glauber Costa authored 11 years ago

There are restrictions on when and how a shrinker can run. For instance, if we
have no balloons inflated, there is nothing to deflate (the relaxer should,
then, be deactivated). Or also, when the JVM fails to allocate memory for an
extra balloon, it is pointless to keep trying (which would only lead to
unnecessary spins) until *at least* the next garbage collection phase.

I believe this behavior of activation / deactivation ought to be shrinker
specific. The reclaiming framework will only provide the infrastructure to do
so.

In this patch, the JVM Balloon uses that to inform the reclaimer when it makes
sense for the shrinker or relaxer to be called.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

52cb4738

JVM balloon driver · 9c59e7e8

Glauber Costa authored 11 years ago

This patch implements the JVM balloon driver, which is responsible for borrowing
memory from the JVM when OSv is short on memory, and giving it back when memory
is plentiful. It works by allocating a Java byte array, and then unmapping a large
page-aligned region inside it (as big as our size allows).
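
Finding the largest aligned region inside an arbitrarily placed array
comes down to rounding the start up and the end down. The sketch below
shows only that arithmetic and is not the driver code: the function name
is hypothetical, and the default alignment of 4096 bytes is an assumption
(the driver may align to a larger page size).

```python
def largest_aligned_region(start, size, align=4096):
    """Return (address, length) of the largest aligned region in
    [start, start + size), or None if no aligned region fits.

    align must be a power of two.
    """
    lo = (start + align - 1) & ~(align - 1)  # round the start up
    hi = (start + size) & ~(align - 1)       # round the end down
    if hi <= lo:
        return None
    return lo, hi - lo
```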

This array is good to go until the GC decides to move us. When that happens, we
need to carefully emulate the memcpy fault and put things back in place.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9c59e7e8

mmu: implement a new JVM vma · b657d2b3

Glauber Costa authored 11 years ago


After carrying out some testing, I quickly realized that the old fixup-only
solution I was attempting for the ballooning was not really flying. The reason
is that we would take a fault, figure out the fixup address, and
return.  If that wasn't a JVM fault, we were forced to take another fault
(since we were already out of fault context).

Once demand paging is a reality, the vast majority of the faults are for
non-balloon addresses, so we were effectively doubling our number of page faults
for no reason. I have decided to go with the VMA (+fixups for instruction
decoding) route after all. This is way more efficient and it seems to be
working fine.

The JVM vma is really close to the normal anonymous VMA. Except that it can
never hold pages, and its fault handler calls into the JVM balloon facilities
for decoding.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b657d2b3