- Aug 29, 2013
-
-
Avi Kivity authored
This is used for temporarily dropping a lock in a lexical scope, and reacquiring it after an exit from the scope (similar to wait_until(mutex), but without the waiting):

    WITH_LOCK(preempt_lock) {
        // do some stuff
        while (not enough resources) {
            DROP_LOCK(preempt_lock) {
                acquire more resources
            }
            // reload anything that may have changed after DROP_LOCK()
        }
        // do more stuff with the acquired resources
    }

Note that DROP_LOCK() doesn't work well with recursively-taken locks.
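The drop-and-reacquire behaviour described in this commit can be sketched with a small scope guard. This is only an illustration, not OSv's DROP_LOCK() implementation: `drop_lock_guard` and `tracing_lock` are hypothetical names, and the toy lock exists only to make the held/released state observable.

```cpp
#include <cassert>

// Hypothetical guard (not OSv's DROP_LOCK): releases an already-held lock
// when constructed and reacquires it when the enclosing scope exits.
template <typename Lock>
class drop_lock_guard {
public:
    explicit drop_lock_guard(Lock& l) : _l(l) { _l.unlock(); }
    ~drop_lock_guard() { _l.lock(); }
    drop_lock_guard(const drop_lock_guard&) = delete;
    drop_lock_guard& operator=(const drop_lock_guard&) = delete;
private:
    Lock& _l;
};

// Hypothetical toy lock, non-recursive, that records whether it is held.
struct tracing_lock {
    bool held = false;
    void lock()   { assert(!held); held = true; }
    void unlock() { assert(held);  held = false; }
};
```

A caller that already holds the lock opens a nested scope containing a `drop_lock_guard`; the lock is free inside that scope and held again after it. With a recursive lock that was taken twice, the single `unlock()` in the constructor would leave the lock still held, which is one way to read the caveat about recursively-taken locks.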
-
Avi Kivity authored
We don't want the compiler moving reads after a possible rcu_defer().
-
narkisr authored
-
narkisr authored
-
Pekka Enberg authored
-
Dor Laor authored
-
Dor Laor authored
-
Nadav Har'El authored
In the existing code, each -classpath or -jar parameter replaced the classpath. This is inconvenient (and unlike the Unix "java" program). Better to just add to the classpath. For example, now we can run:

    java.so -cp /java/cli.jar -jar /java/web.jar app

which runs web.jar's main class, but adds both cli.jar and web.jar to the classpath.
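A minimal sketch of the accumulate-instead-of-replace behaviour, assuming a simple argument scanner. `java_args` and `parse_java_args` are hypothetical names and this is not the java.so code; it only shows each -cp, -classpath or -jar value being appended to one list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: everything that ended up on the classpath,
// the jar named by -jar (if any), and the remaining arguments.
struct java_args {
    std::vector<std::string> classpath;
    std::string main_jar;
    std::vector<std::string> rest;
};

// Hypothetical parser: every -cp/-classpath/-jar value is appended to the
// classpath instead of replacing what was collected so far.
java_args parse_java_args(const std::vector<std::string>& argv)
{
    java_args out;
    for (std::size_t i = 0; i < argv.size(); ++i) {
        const std::string& a = argv[i];
        const bool has_value = i + 1 < argv.size();
        if ((a == "-cp" || a == "-classpath") && has_value) {
            out.classpath.push_back(argv[++i]);
        } else if (a == "-jar" && has_value) {
            out.main_jar = argv[++i];
            out.classpath.push_back(out.main_jar);
        } else {
            out.rest.push_back(a);
        }
    }
    return out;
}
```

Fed the arguments from the example command, this sketch yields a classpath of /java/cli.jar followed by /java/web.jar, with web.jar recorded as the main jar.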
-
- Aug 28, 2013
-
-
Glauber Costa authored
Xen has hard requirements on page transfers, and how to feed the grant tables. The addresses need to be page aligned, since pfns and not addresses are used, and we need to provide at least a full page per buffer, since the hypervisor is free to fill any data within the page.

To achieve that, the netfront driver will use m_cljget to attach an extended buffer to the mbuf, from the jumbop zone, since those buffers are page-sized. However, two problems arise from this:

1) Allocating a page goes through malloc_large. Our implementation of malloc_large is currently terribly inefficient, and that creates a very heavy contention site. What I am doing with this patch is to switch our uma implementation to alloc_page / free_page instead of malloc if the caller of zcreate so specified (and then, of course, specify it for the jumbop cache).

2) The refcount that is attached at the end of the buffer would either extend the buffer to 4100 bytes, defeating our purpose, or the buffer would have to be PAGE_SIZE - 4 to accommodate the refcount. But since the hypervisor will write to the whole page, it will eventually overwrite the refcount.

To address that, I am allocating an external reference counter. BSD already has some infrastructure to do that, and I am taking advantage of it. However, I have found no way of implementing this in a way in which the reference count can be easily deduced from the address of the extended buffer, without having the supporting mbuf to start from. Any external data structure such as hashes would probably make freeing way too slow. Thankfully, uma_find_refcnt and UMA_ZONE_REFCNT seem to be used mostly in the setup/destruction phase (the mbuf refcount is used directly, open coded). So my proposal here is to remove UMA_ZONE_REFCNT for that zone.
-
Glauber Costa authored
The x2APIC specification says that reading from the X2APIC_ID MSR should return the physical apic id of the current processor. However, the Xen implementation (as of 4.2.2) is broken, and reads actually return old style xAPIC id. Even if they fix it, we still have HVs deployed around that will return the wrong ID. We can work around this by testing if the returned APIC id is in the form (id << 24), since in that case, the first 24 bits will all be zeroed. Then at least we can get this working everywhere. This may pose a problem if we want to ever support more than 1 << 24 vCPUs (or if any other HV has some random x2apic ids), but that is highly unlikely anyway.
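The check described above could look roughly like the following. This is a sketch under assumptions: `normalize_apic_id` is a hypothetical name, and the actual patch may structure the test differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: if the low 24 bits of the value read from the
// X2APIC_ID MSR are all zero, assume it is a legacy xAPIC-style value of
// the form (id << 24) and shift it back down; otherwise keep it as is.
std::uint32_t normalize_apic_id(std::uint32_t raw)
{
    constexpr std::uint32_t low24 = (1u << 24) - 1;
    if (raw != 0 && (raw & low24) == 0) {
        return raw >> 24;
    }
    return raw;
}
```

The sketch also makes the stated limitation visible: a genuine x2APIC id of exactly 1 << 24 would be mistaken for xAPIC id 1.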
-
Glauber Costa authored
As I have described in a previous patch, the Xen hypervisor has a very nasty bug that causes all of the x2apic msr writes to trigger a GPF. Although the request proceeds fine despite the GPF, it does bring a problem for all-but-self style init sequences we are using: after "failing" (succeeding but returning failure) to deliver the interrupt for the first cpu in the group, xen will break the loop, therefore not delivering the SIPIs to other cpus in the system at all. We can work around that by delivering interrupts to each cpu individually, instead of all-but-self.
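The per-cpu delivery described above can be illustrated with a short loop. This is a sketch only: `send_to_all_but_self` and the `send_one` callback are hypothetical stand-ins for whatever actually writes the interrupt command register, and the return value of `send_one` models the spurious failure report.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sketch: address every cpu except ourselves individually, so
// that a failure reported for one target cannot stop delivery to the rest.
// Returns how many sends reported failure.
unsigned send_to_all_but_self(const std::vector<unsigned>& apic_ids,
                              unsigned self,
                              const std::function<bool(unsigned)>& send_one)
{
    unsigned reported_failures = 0;
    for (unsigned id : apic_ids) {
        if (id == self) {
            continue;                 // never signal the sending cpu
        }
        if (!send_one(id)) {
            ++reported_failures;      // note it, but keep going
        }
    }
    return reported_failures;
}
```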
-
Glauber Costa authored
Unfortunately, the Xen hypervisor has a very nasty bug (it seems to be fixed by a 2013 patch, which means that although it is fixed, a lot of deployed hypervisors will still have it) that causes all of the x2apic msr writes to init-related registers (INIT, SIPI, etc.) to trigger a GPF. The way to work around this is to implement a form of "wrmsr_safe".
-
Glauber Costa authored
I ended up forgetting to remove some kprintfs from device.c that were inserted during Xen's blkfront development
-
Pekka Enberg authored
Now that we can walk through the vma list, add mmap numbers to 'osv mem':

    (gdb) osv mem
    Total Memory: 4294564864 Bytes
    Mmap Memory: 3278278656 Bytes (76.34%)
    Free Memory: 474492928 Bytes (11.05%)
-
Pekka Enberg authored
-
- Aug 27, 2013
-
-
Nadav Har'El authored
Commit 65afd075 fixed mincore() to recognize unmapped addresses. However, it used mmu::ismapped() which just checks for mmap()'ed addresses, and doesn't know about malloc()ed memory. This causes trouble for libunwind (which we use for backtrace()) which tests mincore() on an on-stack variable, and for non-pthread threads, this stack might be malloc'ed, not mmap'ed. So this patch adds mmu::isreadable(), which checks that a given memory range is all readable (this memory can be mmapped, malloced, stack, whatever). mincore() now uses that. mmu::isreadable() is implemented, following Avi's idea, by trying to read, with safe_load(), one byte from every page in the range. This approach is faster than page-table-walking especially for one-byte checks (which all libunwind uses anyway), and also very simple.
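The one-byte-per-page probing idea can be sketched as follows. This is not the mmu::isreadable() code: `is_readable` is a hypothetical name, the `probe` callback stands in for safe_load(), and a 4 KiB page size is assumed. Address overflow is ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::uintptr_t page_size = 4096;   // assumption: 4 KiB pages

// Hypothetical sketch: touch one byte in every page overlapped by
// [addr, addr + size). probe(p) stands in for safe_load() and returns
// false if reading the byte at p would fault.
bool is_readable(std::uintptr_t addr, std::size_t size,
                 const std::function<bool(std::uintptr_t)>& probe)
{
    const std::uintptr_t end = addr + size;  // one past the last byte
    std::uintptr_t p = addr;
    while (p < end) {
        if (!probe(p)) {
            return false;
        }
        // advance to the first byte of the next page
        p = (p & ~(page_size - 1)) + page_size;
    }
    return true;
}
```

For a one-byte range this performs a single probe, which is consistent with the commit's remark that the approach suits the one-byte checks libunwind makes.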
-
Nadav Har'El authored
Unlike msync(), mincore() should also work on non-mmapped memory, such as stack and malloc()ed memory. Currently it doesn't - it fails on malloc()ed memory and only sometimes works on stacks (works on pthread stacks which are mmapped, but not on sched::thread stacks which are malloced by default). This patch adds a test to tst-mmap.cc to demonstrate this problem. The test currently fails, will be fixed in a follow-up patch.
-
Glauber Costa authored
Most of the performance problems I have found on Xen were due to the fact that we were hitting malloc_large consistently, for allocations that we should be able to service in some other way. Because malloc_large in our implementation is such a bottleneck, it was very useful for me to have separate tracepoints for it. I am therefore proposing them for inclusion.
-
Nadav Har'El authored
Commit 65afd075 that fixed mincore() exposed a deadlock in the leak detector, caused by two threads taking two locks in opposite order:

Thread 1: malloc() does alloc_tracker::remember(). This takes the tracker lock and calls backtrace() calling mincore() which takes the vma_list_mutex.

Thread 2: mmap() does mmu::allocate() which takes the vma_list_mutex and then through mmu::populate::small_page calls memory::alloc_page() which calls alloc_tracker::remember() and takes the tracker lock.

This patch fixes this deadlock: alloc_tracker::remember() will now drop its lock while running backtrace(), as the lock is only needed to protect the allocations[] array. We need to retake the lock after backtrace() completes, to copy the backtrace back to the allocations[] array.

Previously, the lock's depth was also (ab)used for avoiding nested allocation tracking (e.g., tracking of memory allocation done inside backtrace() itself), but now that backtrace() is run without the lock, we need a different mechanism - a per-thread "in_tracker" flag, which is turned on inside the alloc_tracker::remember()/forget() methods.
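A miniature of the scheme described in this commit, written as a sketch rather than the actual alloc_tracker code. `mini_tracker` and its members are hypothetical, the `capture` callback stands in for backtrace(), and entries are never removed here so a reserved slot index stays valid while the lock is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Per-thread flag: set while this thread is inside the tracker, so that
// allocations made by the tracker itself are not tracked recursively.
thread_local bool in_tracker = false;

// Hypothetical miniature tracker, for illustration only.
class mini_tracker {
public:
    // capture() stands in for backtrace(); it runs without the lock held
    // and may itself call remember().
    template <typename Capture>
    void remember(void* addr, Capture capture)
    {
        if (in_tracker) {
            return;                          // nested call: ignore it
        }
        in_tracker = true;
        std::size_t slot;
        {
            std::lock_guard<std::mutex> guard(_lock);
            _allocations.push_back({addr, 0});
            slot = _allocations.size() - 1;  // reserve a slot under the lock
        }
        const int depth = capture();         // lock dropped while capturing
        {
            std::lock_guard<std::mutex> guard(_lock);
            _allocations[slot].depth = depth; // retake the lock to store it
        }
        in_tracker = false;
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> guard(_lock);
        return _allocations.size();
    }

    int depth_of(std::size_t slot)
    {
        std::lock_guard<std::mutex> guard(_lock);
        return _allocations[slot].depth;
    }

private:
    struct record { void* addr; int depth; };
    std::mutex _lock;
    std::vector<record> _allocations;
};
```

Because the capture step runs with the tracker lock released, a thread inside it can take other locks without holding the tracker lock, which removes the opposite-order acquisition described above.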
-
Glauber Costa authored
This allows lazy people like me to just copy the instructions
-
Glauber Costa authored
We can't trust the state of the FPU and the CSR registers to be always sane. Apparently, they aren't on at least one version of Xen (which happens to be the one I am using). Initialize them manually for all CPUs on bringup.
-
Glauber Costa authored
In the xen interrupt code, I have made the mistake of exchanging the previous value of _irq_pending with true, which means that we were constantly polling for data in the interrupt threads. This was responsible for the latency spikes I was seeing. The simple "ping" test still shows bad results in absolute terms, but at least now the spikes are gone.
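The difference between the two exchanges can be shown with std::atomic. This is a sketch under an assumption: the commit only says that exchanging with true was the mistake, so exchanging with false is presumed to be the fix, and both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Presumed-correct variant (an assumption, see above): read the flag and
// clear it, so one notification is consumed exactly once.
bool consume_pending(std::atomic<bool>& pending)
{
    return pending.exchange(false);
}

// Buggy variant as described in the commit: the flag is read but left set,
// so every later check still sees work pending and the thread keeps polling.
bool consume_pending_buggy(std::atomic<bool>& pending)
{
    return pending.exchange(true);
}
```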
-
- Aug 26, 2013
-
-
Nadav Har'El authored
sched.hh included elf.hh, just so it can refer to the elf::tls_data type. But now that we have rcu.hh which includes sched.hh and therefore elf.hh, if we wish to use rcu in elf.hh (we'll do this in a later patch), we have an include loop mess. So better not include elf.hh from sched.hh, and just declare the one struct we need. After sched.hh no longer includes elf.hh and the dozen includes that it further included, we need to add missing includes to some of the code that included sched.hh and relied on its implicit includes.
-
Avi Kivity authored
A signal within a signal handler is really bad news; abort when it happens to let the developers debug it.
-
Avi Kivity authored
Attempts to execute the null pointer, or faults within the kernel code, are a really bad sign; it's better to abort early when they happen.
-
Pekka Enberg authored
If leak detector is enabled after OSv startup, the first call can be to free(), not malloc(). Fix alloctracker::forget() to deal with that.

Fixes the SIGSEGV when "osv leak on" is used to enable detection from gdb after OSv has started up:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00000000003b8ee6, pid=0, tid=18446673706168635392
    #
    # JRE version: 7.0_25
    # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C 0x00000000003b8ee6
    #
    # Core dump written. Default location: //core or core.0
    #
    # An error report file with more information is saved as:
    # /tmp/jvm-0/hs_error.log
    #
    # If you would like to submit a bug report, please include
    # instructions on how to reproduce the bug and visit:
    # http://icedtea.classpath.org/bugzilla
    #
    Aborted

    [penberg@localhost osv]$ addr2line -e build/debug/loader.elf 0x00000000003b8ee6
    /home/penberg/osv/build/debug/../../core/alloctracker.cc:90
-
Pekka Enberg authored
Fix mincore() to deal with unmapped addresses like msync() does.

This fixes a SIGSEGV in libunwind's access_mem() when leak detector is enabled:

    (gdb) bt
    #0 page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:871
    #1 <signal handler called>
    #2 ContiguousSpace::block_start_const (this=<optimized out>, p=0x77d2f3968) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/oops/oop.inline.hpp:411
    #3 0x00001000008ae16c in GenerationBlockStartClosure::do_space (this=0x2000001f9100, s=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:242
    #4 0x00001000007f097c in DefNewGeneration::space_iterate (this=0xffffc0003fb68c00, blk=0x2000001f9100, usedOnly=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/defNewGeneration.cpp:480
    #5 0x00001000008aca0e in Generation::block_start (this=<optimized out>, p=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/memory/generation.cpp:251
    #6 0x0000100000b06d2f in os::print_location (st=st@entry=0x2000001f9560, x=32165017960, verbose=verbose@entry=false) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/runtime/os.cpp:868
    #7 0x0000100000b11b5b in os::print_register_info (st=0x2000001f9560, context=0x2000001f9740) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:839
    #8 0x0000100000c6cde8 in VMError::report (this=0x2000001f9610, st=st@entry=0x2000001f9560) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:551
    #9 0x0000100000c6da3b in VMError::report_and_die (this=this@entry=0x2000001f9610) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/share/vm/utilities/vmError.cpp:984
    #10 0x0000100000b1109f in JVM_handle_linux_signal (sig=11, info=0x2000001f9bb8, ucVoid=0x2000001f9740, abort_if_unrecognized=<optimized out>) at /usr/src/debug/java-1.7.0-openjdk-1.7.0.25-2.3.12.3.fc19.x86_64/openjdk/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:528
    #11 0x000000000039f242 in call_signal_handler (frame=0x2000001f9b10) at ../../arch/x64/signal.cc:69
    #12 <signal handler called>
    #13 0x000000000057d721 in access_mem ()
    #14 0x000000000057cb1d in dwarf_get ()
    #15 0x000000000057ce51 in _ULx86_64_step ()
    #16 0x00000000004315fd in backtrace (buffer=0x1ff9d80 <memory::alloc_tracker::remember(void*, int)::bt>, size=20) at ../../libc/misc/backtrace.cc:16
    #17 0x00000000003b8d99 in memory::alloc_tracker::remember (this=0x1777ae0 <memory::tracker>, addr=0xffffc0004508de00, size=54) at ../../core/alloctracker.cc:59
    #18 0x00000000003b0504 in memory::tracker_remember (addr=0xffffc0004508de00, size=54) at ../../core/mempool.cc:43
    #19 0x00000000003b2152 in std_malloc (size=54) at ../../core/mempool.cc:723
    #20 0x00000000003b259c in malloc (size=54) at ../../core/mempool.cc:856
    #21 0x0000100001615e4c in JNU_GetStringPlatformChars (env=env@entry=0xffffc0003a4dc1d8, jstr=jstr@entry=0xffffc0004591b800, isCopy=isCopy@entry=0x0) at ../../../src/share/native/common/jni_util.c:801
    #22 0x000010000161ada6 in Java_java_io_UnixFileSystem_getBooleanAttributes0 (env=0xffffc0003a4dc1d8, this=<optimized out>, file=<optimized out>) at ../../../src/solaris/native/java/io/UnixFileSystem_md.c:111
    #23 0x000020000021ed8e in ?? ()
    #24 0x00002000001faa58 in ?? ()
    #25 0x00002000001faac0 in ?? ()
    #26 0x00002000001faa50 in ?? ()
    #27 0x0000000000000000 in ?? ()

Spotted by Avi Kivity.
-
Nadav Har'El authored
Do to __xstat* what commit 018c672e did to __fxstat* - they had the same problem.
-
Nadav Har'El authored
In Linux, _STAT_VER is 1 on 64-bit (and 3 on 32-bit), but glibc never verifies the argument to __fxstat64. JNR - a library used by JRuby - wrongly (I believe) passes ver==0 to __fxstat64 (see jnr-posix/..../LinuxPosix.java). On Linux this wrong argument is ignored, but in our implementation it fails the check. So this patch removes this check from our code as well, to let JNR, and therefore JRuby which uses it, use stat without failing.
-
Pekka Enberg authored
If a crashed OSv guest is restarted, ZFS mount causes a GPF in early startup:

    VFS: mounting zfs at /usr
    zfs: mounting osv/usr from device /dev/vblk1
    Aborted

GDB backtrace points finger at zfs_rmnode():

    #0 processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212
    #1 0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20
    #2 0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95
    #3 0x000000000021cda2 in abort () at ../../runtime.cc:86
    #4 0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44
    #5 0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55
    #6 0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876
    #7 <signal handler called>
    #8 0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611
    #9 0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355
    #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523
    #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881
    #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr") at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016
    #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415
    #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb) at ../../fs/vfs/vfs_mount.c:171
    #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415
    #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215
    #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59
    #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
    #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468
    #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581
    #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133
    #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101

The problem is that ZFS tries to check if the znode is an attribute directory and trips over zp->z_vnode being NULL. However, as explained in commit b7ee91ef ("zfs: port vop_lookup"), we don't even support extended attributes so drop the check completely for OSv.
-
Pekka Enberg authored
-
Pekka Enberg authored
The ASSERT() doesn't compile if ZFS debugging is enabled:

    CC tests/tst-zfs-disk.o
    In file included from ../../bsd/sys/cddl/compat/opensolaris/sys/debug.h:35:0,
        from ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_context.h:42,
        from ../../tests/tst-zfs-disk.c:28:
    ../../tests/tst-zfs-disk.c: In function ‘make_vdev_root’:
    ../../tests/tst-zfs-disk.c:119:9: error: ‘t’ undeclared (first use in this function)
        ASSERT(t > 0);
        ^
    ../../bsd/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:56:29: note: in definition of macro ‘ASSERT’
        #define ASSERT(EX) ((void)((EX) || assfail(#EX, __FILE__, __LINE__)))
        ^
    ../../tests/tst-zfs-disk.c:119:9: note: each undeclared identifier is reported only once for each function it appears in
        ASSERT(t > 0);
        ^
    ../../bsd/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:56:29: note: in definition of macro ‘ASSERT’
        #define ASSERT(EX) ((void)((EX) || assfail(#EX, __FILE__, __LINE__)))
        ^
-
- Aug 25, 2013
-
-
Avi Kivity authored
Waiting for a quiescent state happens in two stages: first, we request all cpus to schedule at least once. Then, we wait until they do so. If, between the two stages, a cpu is brought online, then we will request N cpus to schedule but wait for N+1 to respond. This of course never happens, and the system hangs.

Fix by copying the vector which holds the cpus which we signal and wait for, forcing the two stages to be consistent. This is safe since newly-added cpus cannot be accessing any rcu-protected variables before we started signalling.

Fixes random hangs with rcu, mostly seen with 'perf callstack'.
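The snapshot idea can be sketched like this. It is an illustration only: `wait_for_quiescent_state`, the integer cpu ids, and the `request`/`wait` callbacks are hypothetical and do not mirror the actual rcu code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: copy the cpu list once, then use the same copy for
// both stages, so a cpu brought online in between is neither signalled nor
// waited for. Returns how many cpus took part.
template <typename Request, typename Wait>
std::size_t wait_for_quiescent_state(const std::vector<int>& online_cpus,
                                     Request request, Wait wait)
{
    const std::vector<int> snapshot = online_cpus;  // consistent view
    for (int cpu : snapshot) {
        request(cpu);   // stage 1: ask each cpu to schedule at least once
    }
    for (int cpu : snapshot) {
        wait(cpu);      // stage 2: wait for exactly the cpus we asked
    }
    return snapshot.size();
}
```

Even if the online list grows while stage 1 is running, both loops iterate over the same copy, so the number of requests always equals the number of waits.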
-
- Aug 22, 2013
-
-
Or Cohen authored
This is a fix for the previous commit (717693) that had an implicit dependency on Ant js.jar
-
Or Cohen authored
Needed for SSHD when using sshfs
-
Avi Kivity authored
Breaks the build
-
- Aug 21, 2013
-
-
Or Cohen authored
-
Or Cohen authored
A straightforward implementation for SSHD. Supports SCP, the SFTP subsystem and a shell through the current JS cli.
-
Or Cohen authored
-
Avi Kivity authored
The dependency on sse4.1 crashes on older cpus; use the generic musl implementation.
-