- Jan 13, 2014
-
-
Dmitry Fleytman authored
Move alarm cancellation logic to a separate function cancel_alarm_ll(). This function will be called on thread destruction to avoid post-mortem alarms. Also move alarm scheduling logic to a separate function set_alarm_ll() to make the code symmetric. Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Amnon Heiman authored
The uname() function returns a fake Linux version number for application compatibility. Add a new osv::version() API that returns the OSv version, which can be used by the management code. Signed-off-by:
Amnon Heiman <amnon@cloudius-systems.com> [ penberg: cleanups ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
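
For context, a minimal sketch of how an application typically reads the version string that uname() reports. It uses only the POSIX uname() call; the helper name reported_kernel_release() is made up for illustration, and the signature of osv::version() is not given in this commit message, so it is not shown.

```cpp
#include <string>
#include <sys/utsname.h>

// Hypothetical helper: returns the release string reported by uname(),
// or an empty string if the call fails.
std::string reported_kernel_release()
{
    utsname info{};
    if (uname(&info) != 0) {
        return {};
    }
    return info.release;
}
```

On OSv this string would be the compatibility value described above, not the OSv version itself.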
-
Tomasz Grabiec authored
Pekka noticed that his addr2line responds with '?? ??:0' on unknown symbol, whereas mine responds with '??\n??:0'. Reported-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
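
A minimal sketch of one way to accept both output forms. The real fix lives in the tracing scripts and may look different; is_unknown_symbol() is a hypothetical name used only here.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true when addr2line output denotes an unknown symbol,
// whether the two fields are separated by a space or by a newline.
bool is_unknown_symbol(const std::string& output)
{
    std::istringstream in(output);
    std::string function, location;
    // operator>> skips any whitespace, so ' ' and '\n' are treated alike.
    if (!(in >> function >> location)) {
        return false;
    }
    return function == "??" && location == "??:0";
}
```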
-
Tomasz Grabiec authored
It is easier to use that timestamp as an argument to --since and --until of the prof command, which accept a full timestamp in nanoseconds. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Affects output of 'trace.py list -b' and 'osv trace' GDB commands. Makes the trace more readable when filename is also included. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
The new 'prof' command shows a hit profile of an arbitrary tracepoint. Many options accepted by 'prof-wait' are also accepted by this command.

Example:

  scripts/trace.py prof -t sched_wait

  === Thread 0xffffc0003eaeb010 ===

  (100.00%, #7696) All
  |-- (99.51%, #7658) sched::thread::do_wait_until
  |   |-- (83.38%, #6417) condvar::wait(lockfree::mutex*, unsigned long)
  |   |   condvar_wait
  |   |   |-- (81.21%, #6250) cv_timedwait
  |   |   |   txg_delay
  |   |   |   dsl_pool_tempreserve_space
  |   |   |   dsl_dir_tempreserve_space
  |   |   |   dmu_tx_try_assign
  |   |   |   dmu_tx_assign
  |   |   |   |-- (81.19%, #6248) zfs_write
  |   |   |   |   vfs_file::write(uio*, int)
  |   |   |   |   sys_write
  |   |   |   |   pwritev
  |   |   |   |   writev
  |   |   |   |   __stdio_write
  |   |   |   |   __fwritex
  |   |   |   |   fwrite
  |   |   |   |   0x100000005a5f
  |   |   |   |   osv::run(std::string, int, char**, int*)

Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
See scripts/trace.py prof-wait -h

The command uses the sched_wait and sched_wait_ret tracepoints to calculate the amount of time a thread was waiting. Samples are collected and presented in the form of a call graph tree. By default callees are closer to the root. To invert the order pass -r|--caller-oriented.

If there is too much output, it can be narrowed down using the --max-levels and --min-duration options. The presented time spectrum can be narrowed down using the --since and --until options, which accept timestamps.

Example:

  scripts/trace.py prof-wait --max-levels 3 trace-file

  === Thread 0xffffc0003eaeb010 ===

  12.43 s (100.00%, #7696) All
  |-- 12.43 s (99.99%, #7658) sched::thread::do_wait_until
  |   |-- 10.47 s (84.22%, #6417) condvar::wait(lockfree::mutex*, unsigned long)
  |   |   condvar_wait
  |   |   |-- 6.47 s (52.08%, #6250) cv_timedwait
  |   |   |   txg_delay
  |   |   |   dsl_pool_tempreserve_space
  |   |   |   dsl_dir_tempreserve_space
  |   |   |   dmu_tx_try_assign
  |   |   |   dmu_tx_assign
  |   |   |
  |   |   |-- 2.37 s (19.06%, #24) arc_read_nolock
  |   |   |   arc_read
  |   |   |   dsl_read
  |   |   |   traverse_visitbp
  |   |   |
  |   |   |-- 911.75 ms (7.33%, #3) txg_wait_open
  |   |   |   dmu_tx_wait
  |   |   |   zfs_write
  |   |   |   vfs_file::write(uio*, int)
  |   |   |   sys_write
  |   |   |   pwritev
  |   |   |   writev
  |   |   |   __stdio_write
  |   |   |   __fwritex
  |   |   |   fwrite
  |   |   |   0x100000005a5f
  |   |   |   osv::run(std::string, int, char**, int*)

By default every thread has a separate tree, because duration is best interpreted in the context of a particular thread. There is however an option to merge samples from all threads into one tree: -m|--merge-threads. It may be useful if you want to inspect all paths going into or out of a particular function. The direction can be changed with the -r|--caller-oriented option. Function names are passed to the --function parameter.
Example: check where zfs_write() blocks:

  scripts/trace.py prof-wait -rm --function=zfs_write trace-file

  7.46 s (100.00%, #7314) All
  zfs_write
  |-- 6.48 s (86.85%, #6371) dmu_tx_assign
  |   |-- 6.47 s (86.75%, #6273) dmu_tx_try_assign
  |   |   dsl_dir_tempreserve_space
  |   |   |-- 6.47 s (86.75%, #6248) dsl_pool_tempreserve_space
  |   |   |   txg_delay
  |   |   |   cv_timedwait
  |   |   |   condvar_wait
  |   |   |   condvar::wait(lockfree::mutex*, unsigned long)
  |   |   |   sched::thread::do_wait_until
  |   |   |
  |   |   |-- 87.87 us (0.00%, #24) mutex_lock
  |   |   |   sched::thread::do_wait_until
  |   |   |
  |   |   \-- 6.40 us (0.00%, #1) dsl_dir_tempreserve_impl
  |   |       mutex_lock
  |   |       sched::thread::do_wait_until
  |   |
  |   \-- 7.32 ms (0.10%, #98) mutex_lock
  |       sched::thread::do_wait_until
  |
  |-- 911.75 ms (12.22%, #3) dmu_tx_wait
  |   txg_wait_open
  |   condvar_wait
  |   condvar::wait(lockfree::mutex*, unsigned long)
  |   sched::thread::do_wait_until

Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
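
The wait-time calculation described in this commit (pairing each sched_wait with the following sched_wait_ret on the same thread) can be sketched roughly as follows. This is an illustration only, not the trace.py implementation: the trace_event layout, the event_kind enum and total_wait_ns() are assumptions, and the events are assumed to be sorted by time.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for the sched_wait / sched_wait_ret tracepoints.
enum class event_kind { wait, wait_ret };

struct trace_event {
    std::uint64_t thread;   // thread identifier
    std::uint64_t time_ns;  // timestamp in nanoseconds
    event_kind kind;
};

// Sums, per thread, the time between each wait and the matching wait_ret.
// A wait_ret with no preceding wait on that thread is ignored.
std::map<std::uint64_t, std::uint64_t>
total_wait_ns(const std::vector<trace_event>& events)
{
    std::map<std::uint64_t, std::uint64_t> pending;  // thread -> start of open wait
    std::map<std::uint64_t, std::uint64_t> total;    // thread -> accumulated wait
    for (const auto& e : events) {
        if (e.kind == event_kind::wait) {
            pending[e.thread] = e.time_ns;
        } else {
            auto it = pending.find(e.thread);
            if (it != pending.end()) {
                total[e.thread] += e.time_ns - it->second;
                pending.erase(it);
            }
        }
    }
    return total;
}
```

Keeping the totals per thread mirrors the default of one tree per thread; merging threads would amount to summing the map values.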
-
Tomasz Grabiec authored
Useful for calculating time during which thread was scheduled out because of wait(). Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Prints hierarchical data in the form of a tree.

Example:

  Node0
  |-- Node1
  |   |-- Node2
  |   |
  |   \-- Node3
  |
  \-- Node4

Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
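
A rough sketch of a printer that produces this layout, written in C++ for illustration. The actual helper is part of the Python tracing scripts and is not shown in this log; tree_node and render_tree() are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type: a label plus child nodes.
struct tree_node {
    std::string name;
    std::vector<tree_node> children;
};

// Appends the children of `node` to `out`, one line each, using `prefix`
// to draw the vertical bars of the enclosing levels.
static void render_children(const tree_node& node, const std::string& prefix,
                            std::string& out)
{
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const bool last = (i + 1 == node.children.size());
        if (i > 0) {
            out += prefix + "|\n";  // spacer line between siblings
        }
        out += prefix + (last ? "\\-- " : "|-- ") + node.children[i].name + "\n";
        render_children(node.children[i], prefix + (last ? "    " : "|   "), out);
    }
}

// Renders the whole tree, root label first.
std::string render_tree(const tree_node& root)
{
    std::string out = root.name + "\n";
    render_children(root, "", out);
    return out;
}
```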
-
Tomasz Grabiec authored
For all options check: scripts/trace.py -h Example: list all traces: scripts/trace.py list trace-file Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Usage: osv trace save <filename> Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Having traces persisted in a file has the following advantages:

 - Trace collection is decoupled from trace analysis. Trace analysis tools no longer have to be part of GDB scripts. This is important for EC2 instances, to which we cannot connect with GDB. In this case trace data can be downloaded using other channels, but the tools remain unchanged.

 - Trace analysis doesn't have to be performed in the same OSv session. After the trace is collected, OSv can be shut down or left running without further disturbance.

The trace is saved in a binary format of the following structure:

  <format-version> <N_tp> <tracepoint 1> ... <tracepoint N_tp> <trace 1> <trace 2> ... <trace N_t>

Where:

  <tracepoint X> = <tp_key><name><signature><format>
  <trace X> = <tp_key><thread><time><cpu>(<backtrace_addr>*)<0><data>

Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
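
In the <trace X> record above, the backtrace is a run of addresses closed by a zero. A reader for that part could look roughly like this. The field widths are not stated in the commit message, so the sketch assumes the addresses have already been decoded into 64-bit words; read_backtrace() is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects backtrace addresses from `words`, starting at `pos`, up to the
// zero terminator. On return `pos` points just past the terminator (or at
// the end of the input if no terminator was found).
std::vector<std::uint64_t>
read_backtrace(const std::vector<std::uint64_t>& words, std::size_t& pos)
{
    std::vector<std::uint64_t> frames;
    while (pos < words.size() && words[pos] != 0) {
        frames.push_back(words[pos++]);
    }
    if (pos < words.size()) {
        ++pos;  // skip the terminating zero
    }
    return frames;
}
```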
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Speeds things up slightly. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
If there are fewer frames than backtrace_len the remaining addresses will be zeros. Let's not print them. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Useful for ad-hoc printing. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
The code "fill_ts", which converts a std::chrono::duration into a struct timespec, was unnecessarily constrained to take a std::chrono::nanoseconds as input, whereas it can actually work with durations of any resolution (the duration_cast will convert it to the correct units). This patch makes that generalization, which, I think, makes the code clearer.

At first I decided to move this general function into <osv/clock.hh> and make it a bit more friendly (returning a timespec instead of writing into one), but in the end I decided not to move it. timespec is a dangerous type because, although it does specify the clock's resolution (always nanoseconds), it doesn't specify the clock's epoch, so one can make mistakes like calling clock_gettime(CLOCK_MONOTONIC) and passing the returned timespec to pthread_cond_timedwait, which expects CLOCK_REALTIME, not CLOCK_MONOTONIC. So we should avoid timespec whenever we can, and I don't think we'll need the fill_ts() function anywhere outside time.cc. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
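
A minimal sketch of what such a generalized helper might look like. The actual code in time.cc is not shown in this log, so the exact signature here is an assumption; it also assumes non-negative durations.

```cpp
#include <chrono>
#include <ctime>

// Writes duration `d` into `*ts`, whatever the resolution of `d`.
// Assumed signature; duration_cast does the unit conversion.
template <class Rep, class Period>
void fill_ts(std::chrono::duration<Rep, Period> d, timespec* ts)
{
    using namespace std::chrono;
    auto sec = duration_cast<seconds>(d);  // whole seconds
    ts->tv_sec = sec.count();
    ts->tv_nsec = duration_cast<nanoseconds>(d - sec).count();  // remainder
}
```

With this shape a caller can pass std::chrono::milliseconds(1500) directly and get tv_sec = 1, tv_nsec = 500000000.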
-
Pekka Enberg authored
To prevent similar mistakes as in commit cc6b3a3c ("vga: Handle line feed"), add the backup files generated by 'patch' to gitignore. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 10, 2014
-
-
Pekka Enberg authored
I messed up in commit cc6b3a3c ("vga: Handle line feed"). Clean it up. Reported-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
When trying to run a nonexistent program, e.g., "run.py -e xyz", sometimes (especially in a debug build) we would see the spurious message: program xyz returned -16384. The bug is simple: when osv::run returns null, meaning the program was not found, it does not set the "ret" value. Our code checked if ret!=0 before checking if osv::run actually found the program, which is wrong. Simply changing the order of the code solves this bug. Fixes #156. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
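
The ordering fix can be illustrated with a small sketch. run_program() below is a hypothetical stand-in for osv::run() (it returns false rather than null when the program is missing), and the "not found" wording is invented for the example; only the "returned" message comes from the report above.

```cpp
#include <string>

// Hypothetical stand-in for osv::run(): on failure it returns false and,
// like the real function, leaves *ret untouched.
bool run_program(const std::string& name, int* ret)
{
    if (name != "hello") {
        return false;
    }
    *ret = 0;
    return true;
}

// Checks "was the program found?" before looking at ret.
std::string describe_run(const std::string& name)
{
    int ret = -16384;  // stands in for the stale value from the bug report
    if (!run_program(name, &ret)) {
        return "program " + name + " not found";  // invented wording
    }
    if (ret != 0) {
        return "program " + name + " returned " + std::to_string(ret);
    }
    return "";
}
```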
-
Takuya ASADA authored
Add --vga on cmdline, to switch vga console. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
To use VGAConsole class, add empty input_ready() and readch(). Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Handle line feed in the VGA. To access termios flag, add a constructor. Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Physical memory base address is 0xffffc00000000000, so VGA base address should be 0xffffc000000b8000. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
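
The address arithmetic behind this fix, as a small sketch. The constant and function names are made up for illustration; 0xb8000 is the conventional physical address of the VGA text buffer.

```cpp
#include <cstdint>

// Base of the physical memory mapping, as stated in the commit message.
constexpr std::uint64_t phys_mem_base = 0xffffc00000000000ULL;
// Conventional physical address of the VGA text buffer.
constexpr std::uint64_t vga_text_phys = 0xb8000;

// Hypothetical helper: virtual address at which a physical address is mapped.
constexpr std::uint64_t phys_to_virt(std::uint64_t phys)
{
    return phys_mem_base + phys;
}

static_assert(phys_to_virt(vga_text_phys) == 0xffffc000000b8000ULL,
              "VGA buffer must live inside the physical memory mapping");
```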
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
We are currently only answering requests for CLOCK_REALTIME, but we could easily handle:

 * CLOCK_REALTIME_COARSE, which is effectively the same as CLOCK_REALTIME but faster. In our case, all time sources are equally fast.
 * CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, since we can easily get runtimes for our threads and publish that.

Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
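
From the application side, the effect of this change can be probed with a small sketch like the one below. It uses only the standard clock_gettime() call; clock_is_supported() is a hypothetical helper name.

```cpp
#include <time.h>

// Hypothetical helper: true if clock_gettime() answers for the given clock.
bool clock_is_supported(clockid_t id)
{
    timespec ts{};
    return clock_gettime(id, &ts) == 0;
}
```

For example, clock_is_supported(CLOCK_THREAD_CPUTIME_ID) would be expected to return true once the clocks listed above are handled.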
-
Pekka Enberg authored
The "APIC base" message is not very useful to users. Drop it. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Currently, OSv prints out the following at boot:

  acpi 0 apic 0
  acpi 1 apic 1
  acpi 2 apic 2
  acpi 3 apic 3

Replace that with a simpler message:

  4 CPUs detected

We do lose the ACPI ID -> CPU ID mapping but it is not terribly important for users.

Suggested-by:
Nadav Har'El <nyh@cloudius-systems.com> Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Simplify networking boot initialization message as suggested by Tzach. Suggested-by:
Tzach Livyatan <tzach@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
It's 2014 now. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
We respect -Xmx when instructed by the user, but when that is left blank, we set it to all the remaining memory that we have. That is not 100 % perfect because the JVM itself will use some memory, but it should be a good enough estimate, especially given that some of the memory currently in use by OSv could potentially be freed in the future should we need it. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
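
The policy described here can be sketched as follows. This is not the OSv launcher code: with_default_heap_limit() and its parameters are hypothetical, and rounding the limit down to whole megabytes is a choice made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: leaves the arguments alone if the user already passed
// -Xmx, otherwise appends an -Xmx sized to the memory that is still free.
std::vector<std::string>
with_default_heap_limit(std::vector<std::string> jvm_args, std::size_t free_bytes)
{
    for (const auto& arg : jvm_args) {
        if (arg.compare(0, 4, "-Xmx") == 0) {
            return jvm_args;  // user-supplied limit is respected
        }
    }
    // Round down to whole megabytes and use the JVM's "M" suffix.
    jvm_args.push_back("-Xmx" + std::to_string(free_bytes >> 20) + "M");
    return jvm_args;
}
```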
-
Glauber Costa authored
The biggest problem I am seeing with the balloon is that right now the only time we call the balloon is when we're seeing memory pressure. If pressure is coming from the JVM, we can livelock in quite interesting ways. We need to detect that and disable the balloon in those situations, since ballooning when the pressure comes from the JVM will only trash our workloads. It's not yet working reliably, but this is the direction I plan to start from. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
To make informed reclaim decisions, we need to have as much relevant information as possible about our reclaim targets. Specifically, it is useful to know how much memory is currently used by the JVM heap. The reasoning behind this is that if pressure is coming from the heap, ballooning will harm us instead of helping us. Note: This is really just a first approximation. Ideally, total memory shouldn't matter, but rather the memory delta since a last common event. But counting memory is the first step for both. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
To find out which vmas hold the Java heap, we will use a technique that is very close to ballooning (in the implementation, it is effectively the same). We will insert a very small element (2 pages), and mark the vma where the object is present as containing the JVM heap. Due to the way the JVM allocates objects, that will end up in the young generation. As time passes, the object will move the same way the balloon moves, and every new vma that is seen will be marked as holding the JVM heap. That mechanism should work for every generational GC, which should encompass most of the JDK7 GCs (if not all). It shouldn't work with the G1GC, but that debuts at JDK8, and for that we can do something a lot simpler, namely having the JVM tell us in advance which map areas contain the heap. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
The best possible criterion for deflating balloons is heap pressure: whenever there is pressure in the JVM, we should give back memory so the pressure stops. To accomplish that, we need to somehow tap into the JVM. This patch registers an MXBean that will send us notifications about collections. We will ignore minor collections and act upon major collections by deflating any existing balloons. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
There are restrictions on when and how a shrinker can run. For instance, if we have no balloons inflated, there is nothing to deflate (the relaxer should, then, be deactivated). Likewise, when the JVM fails to allocate memory for an extra balloon, it is pointless to keep trying (which would only lead to unnecessary spins) until *at least* the next garbage collection phase. I believe this behavior of activation / deactivation ought to be shrinker specific. The reclaiming framework will only provide the infrastructure to do so. In this patch, the JVM Balloon uses that to inform the reclaimer when it makes sense for the shrinker or relaxer to be called. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch implements the JVM balloon driver, which is responsible for borrowing memory from the JVM when OSv is short on memory, and giving it back when we have plenty. It works by allocating a Java byte array, and then unmapping a large page-aligned region inside it (as big as our size allows). This array is good to go until the GC decides to move us. When that happens, we need to carefully emulate the memcpy fault and put things back in place. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
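
The step of finding the largest aligned region inside the byte array comes down to rounding the start up and the end down. The sketch below shows that arithmetic only and is not the driver code: region and largest_aligned_region() are hypothetical names, and the alignment is left as a parameter (assumed to be a power of two) because the commit message does not state it.

```cpp
#include <cstdint>

// Half-open address range [begin, end).
struct region {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Largest sub-range of [start, start + size) whose bounds are both multiples
// of `align`. Returns an empty range (begin == end) if nothing fits.
region largest_aligned_region(std::uintptr_t start, std::uintptr_t size,
                              std::uintptr_t align)
{
    std::uintptr_t begin = (start + align - 1) & ~(align - 1);  // round up
    std::uintptr_t end = (start + size) & ~(align - 1);         // round down
    if (end < begin) {
        end = begin;  // buffer too small to contain an aligned page
    }
    return {begin, end};
}
```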
-
Glauber Costa authored
After carrying out some testing, I quickly realized that the old fixup-only solution I was attempting for the ballooning was not really flying. The reason is that we would take a fault, figure out the fixup address, and return. If that wasn't a JVM fault, we were forced to take another fault (since we were already out of fault context). Once demand paging is a reality, the vast majority of the faults are for non-balloon addresses, so we were effectively doubling our number of page faults for no reason. I have decided to go with the VMA (+fixups for instruction decoding) route after all. This is way more efficient and it seems to be working fine. The JVM vma is really close to the normal anonymous VMA, except that it can never hold pages, and its fault handler calls into the JVM balloon facilities for decoding. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-