  1. Sep 21, 2013
  2. Sep 20, 2013
  3. Sep 15, 2013
    • Add copyright statement to core/* · 4c0b39f3
      Nadav Har'El authored
      Added Cloudius copyright statement to core/*.
      
      poll.cc already had a BSD copyright statement. I believe this is a mistake
      (I think Guy wrote this code from scratch), but, not wanting to rush to a
      conclusion, I'm leaving both copyright statements; we should address this
      issue later.
    • poll: Improve tracepoints · f8c106ae
      Pekka Enberg authored
      Pass function arguments to the tracepoint and add tracepoints for the
      poll() return value and errno.
  4. Sep 12, 2013
  5. Sep 11, 2013
    • Add reboot function · 542c319b
      Nadav Har'El authored
      Added a new function, osv::reboot() (declared in <osv/power.hh>)
      for rebooting the VM.
      
      Also added a Java interface - com.cloudius.util.Power.reboot().
      
      NOTE: Power.java and/or jni/power.cc also need to be copied into
      the mgmt submodule.
  6. Sep 10, 2013
    • mmu: Fix file-backed vma splitting · d72b550c
      Pekka Enberg authored
      Commit 3510a5ea ("mmu: File-backed VMAs") forgot to fix vma::split() to
      take file-backed mappings into account. Fix the problem by making
      vma::split() a virtual function and implementing it separately for
      file_vma.
      
      Spotted by Avi Kivity.
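A minimal sketch of why split() has to be virtual, assuming simplified stand-ins for vma and file_vma (the member names, the unique_ptr return type and the offset arithmetic here are illustrative, not OSv's actual code). The file-backed tail must map the part of the file that starts (edge - start) bytes further in, which the base-class split() cannot know about.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified stand-in for mmu::vma.
class vma {
public:
    vma(std::uintptr_t start, std::uintptr_t end) : _start(start), _end(end) {}
    virtual ~vma() = default;
    std::uintptr_t start() const { return _start; }
    std::uintptr_t end() const { return _end; }
    // Shrink this vma to [start, edge) and return a new vma for [edge, end).
    virtual std::unique_ptr<vma> split(std::uintptr_t edge) {
        assert(edge > _start && edge < _end);
        std::unique_ptr<vma> tail = std::make_unique<vma>(edge, _end);
        _end = edge;
        return tail;
    }
protected:
    std::uintptr_t _start, _end;
};

// Hypothetical stand-in for file_vma: also remembers the file offset.
class file_vma : public vma {
public:
    file_vma(std::uintptr_t start, std::uintptr_t end, std::uint64_t offset)
        : vma(start, end), _offset(offset) {}
    std::uint64_t offset() const { return _offset; }
    // The tail keeps mapping the right part of the file: its offset
    // advances by the distance from start to the split point.
    std::unique_ptr<vma> split(std::uintptr_t edge) override {
        assert(edge > _start && edge < _end);
        std::unique_ptr<vma> tail =
            std::make_unique<file_vma>(edge, _end, _offset + (edge - _start));
        _end = edge;
        return tail;
    }
private:
    std::uint64_t _offset;
};
```

Because split() is called through a vma reference or pointer in generic mmu code, only virtual dispatch makes the file_vma override run.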
    • DHCP: Fix crash · 68f4d147
      Nadav Har'El authored
      Rarely (about once every 20 runs), OSv crashed during boot in the
      DHCP code. It turns out that the code first sends out the DHCP requests,
      and then creates a thread to handle the replies. When a reply arrives,
      the code wake()s the thread, but on rare occasions the thread hasn't yet
      been set up (its pointer is still null), so we crash.
      
      Fix this by reversing the order - first create the reply handling thread,
      and only then send the request.
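The ordering fix can be sketched with standard threads. Everything here is hypothetical (reply_thread, send_request() and dhcp_start() are not OSv's names); send_request() delivers its reply immediately, which is the worst-case timing that exposed the crash, so the handler must already exist when it runs.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical reply-handling thread that processes a single reply.
class reply_thread {
public:
    reply_thread() : _t([this] { loop(); }) {}
    ~reply_thread() { _t.join(); }   // loop() exits after one reply
    // Called by the receive path when a reply arrives: hand it over and wake.
    void wake_with(std::string reply) {
        std::lock_guard<std::mutex> lk(_m);
        _reply = std::move(reply);
        _has_reply = true;
        _cv.notify_all();
    }
    // Block until the thread has processed the reply, then return the result.
    std::string result() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return _done; });
        return _result;
    }
private:
    void loop() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return _has_reply; });
        _result = "handled " + _reply;
        _done = true;
        _cv.notify_all();
    }
    std::mutex _m;
    std::condition_variable _cv;
    std::string _reply, _result;
    bool _has_reply = false, _done = false;
    std::thread _t;   // declared last: started only after the members above exist
};

std::unique_ptr<reply_thread> g_replies;

// Stand-in for sending a request; the "network" answers at once.
void send_request(const std::string& req) {
    g_replies->wake_with("reply to " + req);  // null dereference if called too early
}

std::string dhcp_start() {
    g_replies = std::make_unique<reply_thread>();  // 1. create the handler first
    send_request("DISCOVER");                      // 2. only then send the request
    std::string r = g_replies->result();
    g_replies.reset();                             // joins the finished thread
    return r;
}
```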
  7. Sep 08, 2013
    • Scheduler: Fix load-balancer bug · e9f0cf29
      Nadav Har'El authored
      The load_balance() code checks if another CPU has fewer threads in its
      run queue than this CPU, and if so, migrates one of this CPU's threads
      to the other CPU.
      
      However, when we count this core's runnable threads, we overcount them by
      1, because as soon as load_balance() goes back to sleep, one of the
      runnable threads will start running. So if this core has just one more
      runnable thread than some remote core, the two are actually even, and in
      that case we should *not* migrate a thread.
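The corrected comparison can be expressed as a small predicate. This is a sketch, assuming a hypothetical helper should_migrate() that takes the two run-queue sizes; OSv's load_balance() is not structured this way.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: should this CPU migrate a thread to the remote CPU?
// local_runnable is counted from inside load_balance(), so one of those
// threads will start running as soon as load_balance() sleeps; discount it
// before comparing with the remote run queue.
bool should_migrate(std::size_t local_runnable, std::size_t remote_runnable)
{
    if (local_runnable == 0) {
        return false;
    }
    return local_runnable - 1 > remote_runnable;
}
```

With 2 local and 1 remote runnable threads (the SPECjvm case described next), the old rule (remote < local) migrates, while this one does not.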
      
      Overcounting the number of threads on the core running load_balance
      caused bad performance in 2-core, 2-thread SPECjvm: Normally, the
      size of the run queue on each core is 1 (each core is running one of
      the two threads, and on the run queue we have the idle thread). But
      when load_balance runs, it sees 2 runnable threads (the idle thread and
      the preempted benchmark thread), and the second core has just 1, so
      it decides to migrate one of its threads to the second CPU. When this
      is over, the second CPU has both benchmark threads, and the first CPU
      has nothing; this will only be fixed some time later when the
      second CPU's load_balance thread runs, and later the balance will be
      ruined again. The whole time the two threads run on the same CPU,
      performance suffers significantly, and in the host's "top" we see qemu
      taking just 120%-150% instead of 200% as it should (and as it does
      after this patch).
    • Scheduler: Avoid vruntime jump when clock jumps · 253e4536
      Nadav Har'El authored
      Currently, clock::get()->time() jumps (by system_time(), i.e., the host's
      uptime) at some point during the initialization. This can be a huge jump
      (e.g., a week if the host's uptime is a week). Fixing this jump is hard,
      so we'd rather just tolerate it.
      
      reschedule_from_interrupt() handles this clock jump badly. When it
      calculates current_run, the amount of time the current thread has run,
      it includes this jump if it happened while the thread was running. In
      the above example, a run time of a whole week is wrongly attributed to
      some thread and added to its vruntime, causing it not to be scheduled
      again until all other threads yield the CPU.
      
      The fix in this patch is to limit the vruntime increase after a long
      run to max_slice (10ms). Even if a thread runs for longer (or just thinks
      it ran for longer), it won't be "penalized" in its dynamic priority more
      than a thread that ran for 10ms. Note that this cap makes sense, as
      cpu::enqueue already enforces a similar limit on the vruntime "bonus"
      of a woken thread, and this patch works toward a similar goal (avoid
      giving one thread a huge bonus because another thread was given a huge
      penalty).
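The cap amounts to clamping the measured run time before it is charged. A minimal sketch, assuming a hypothetical vruntime_charge() helper and ignoring any priority scaling the real scheduler may apply:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;

// Mirrors the max_slice value mentioned above (10ms).
constexpr std::chrono::nanoseconds max_slice = 10ms;

// Hypothetical helper: the vruntime penalty charged for one run. Even if
// the clock jumped while the thread ran, it is never charged more than
// max_slice.
std::chrono::nanoseconds vruntime_charge(std::chrono::nanoseconds current_run)
{
    return std::min(current_run, max_slice);
}
```

A week-long "run" caused by the clock jump is then charged exactly like a 10ms run.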
      
      This bug is very visible in the CPU-bound SPECjvm2008 benchmarks, when
      running two benchmark threads on two virtual cpus. As it happens, the
      load-balancer thread is the one that gets the huge vruntime increase, so
      it doesn't get to run until no other thread wants to run. Because we start
      with both CPU-bound threads on the same CPU, and these hardly yield the
      CPU (and even more rarely are the two threads sleeping at the same time),
      the load balancer thread on this CPU doesn't get to run, and the two threads
      remain on the same CPU, giving us halved performance (2-cpu performance
      identical to 1-cpu performance) and on the host we see qemu using 100% cpu,
      instead of 200% as expected with two vcpus.
  8. Sep 03, 2013
    • irq_lock: avoid 'irq_lock defined but not used' warning · 90390cca
      Avi Kivity authored
      In an attempt to be clever, we define irq_lock as an object in an anonymous
      namespace, so that each translation unit gets its own copy, which is then
      optimized away, since the object is never touched.  But the compiler complains
      that the object is defined but not used if we include the file but don't
      use irq_lock.
      
      Simplify by only declaring the object there, and defining it somewhere else.
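The declare-in-header, define-once pattern looks roughly like this. The irq_lock_type body is a hypothetical stand-in; both parts are shown in one file here, with comments marking where each would live.

```cpp
#include <cassert>

// Hypothetical stand-in for the irq lock type.
struct irq_lock_type {
    void lock() { ++depth; }
    void unlock() { --depth; }
    int depth = 0;
};

// In the header: only a declaration, so a translation unit that includes
// the header but never uses irq_lock has no unused object to warn about.
extern irq_lock_type irq_lock;

// In exactly one .cc file: the single definition shared by everyone.
irq_lock_type irq_lock;
```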
  9. Sep 02, 2013
    • mmu: msync for file-backed memory maps · 1691c89d
      Pekka Enberg authored
      This adds a simple msync() implementation for file-backed memory maps. It
      uses the newly added 'file_vma' data structure to write out and fsync
      the msync'd region as suggested by Avi Kivity.
    • mmu: File-backed VMAs · 3510a5ea
      Pekka Enberg authored
      Add a new 'file_vma' class that extends 'vma'. This is needed to keep
      track of fileref and offset for file-backed VMAs for msync().
  10. Aug 29, 2013
  11. Aug 27, 2013
    • Fix mincore() on non-mmap()ed memory · 6924f7db
      Nadav Har'El authored
      Commit 65afd075 fixed mincore() to recognize
      unmapped addresses. However, it used mmu::ismapped() which just checks for
      mmap()'ed addresses, and doesn't know about malloc()ed memory. This causes
      trouble for libunwind (which we use for backtrace()) which tests mincore()
      on an on-stack variable, and for non-pthread threads, this stack might be
      malloc'ed, not mmap'ed.
      
      So this patch adds mmu::isreadable(), which checks that a given memory range
      is all readable (this memory can be mmapped, malloced, stack, whatever).
      mincore() now uses that.
      
      mmu::isreadable() is implemented, following Avi's idea, by trying to read,
      with safe_load(), one byte from every page in the range. This approach is
      faster than page-table walking, especially for one-byte checks (which are
      all libunwind uses anyway), and also very simple.
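The probing loop can be sketched as follows, assuming 4 KiB pages and a hypothetical probe callback standing in for safe_load() (it returns false where the load would fault); the real safe_load() signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::uintptr_t page_size = 4096;   // assumed page size

// Touch one byte in every page that [addr, addr + size) overlaps.
// `probe` is a hypothetical stand-in for safe_load().
bool isreadable(std::uintptr_t addr, std::size_t size,
                const std::function<bool(std::uintptr_t)>& probe)
{
    if (size == 0) {
        return true;
    }
    std::uintptr_t end = addr + size;
    // Probe addr itself, then the first byte of each following page.
    for (std::uintptr_t p = addr; p < end; p = (p & ~(page_size - 1)) + page_size) {
        if (!probe(p)) {
            return false;
        }
    }
    return true;
}
```

A one-byte check costs exactly one probe, which is why this beats a page-table walk for libunwind's use.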
    • mempool.c: trace large allocations · 0a798e4d
      Glauber Costa authored
      Most of the performance problems I have found on Xen were due to the fact that
      we were hitting malloc_large consistently, for allocations that we should be
      able to service in some other way. Because malloc_large in our implementation
      is such a bottleneck, it was very useful for me to have separate tracepoints
      for them, so I am proposing them for inclusion.
    • Fix deadlock in leak detector · 227eb39b
      Nadav Har'El authored
      Commit 65afd075, which fixed mincore(),
      exposed a deadlock in the leak detector, caused by two threads taking
      two locks in opposite order:
      
      Thread 1:  malloc() does alloc_tracker::remember(). This takes the tracker
         lock and calls backtrace() calling mincore() which takes the
         vma_list_mutex.
      
      Thread 2: mmap() does mmu::allocate() which takes the vma_list_mutex and
         then through mmu::populate::small_page calls memory::alloc_page() which
         calls alloc_tracker::remember() and takes the tracker lock.
      
      This patch fixes this deadlock: alloc_tracker::remember() will now drop its
      lock while running backtrace(), as the lock is only needed to protect the
      allocations[] array. We need to retake the lock after backtrace() completes,
      to copy the backtrace back to the allocations[] array.
      
      Previously, the lock's depth was also (ab)used for avoiding nested
      allocation tracking (e.g., tracking of memory allocation done inside
      backtrace() itself), but now that backtrace() is run without the lock,
      we need a different mechanism - a per-thread "in_tracker" flag, which
      is turned on inside the alloc_tracker::remember()/forget() methods.
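A simplified sketch of the two changes, assuming a hypothetical alloc_tracker with a hash map instead of the allocations[] array: backtrace() runs without the tracker lock, the lock is retaken only to record the result, and a thread_local flag (rather than the lock's depth) stops nested tracking.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

class alloc_tracker {
public:
    void remember(void* addr) {
        if (in_tracker) return;          // allocation made by the tracker itself
        in_tracker = true;
        std::string bt = backtrace();    // may take other locks: tracker lock NOT held
        {
            std::lock_guard<std::mutex> lk(_lock);   // retake only to update the table
            _allocations[addr] = std::move(bt);
        }
        in_tracker = false;
    }
    void forget(void* addr) {
        if (in_tracker) return;
        in_tracker = true;
        {
            std::lock_guard<std::mutex> lk(_lock);
            _allocations.erase(addr);    // no-op if addr was never remembered
        }
        in_tracker = false;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lk(_lock);
        return _allocations.size();
    }
private:
    // Hypothetical stand-in for the real backtrace(); note that it allocates.
    static std::string backtrace() { return "frame0 <- frame1"; }
    static thread_local bool in_tracker;
    std::mutex _lock;
    std::unordered_map<void*, std::string> _allocations;
};

thread_local bool alloc_tracker::in_tracker = false;
```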
  12. Aug 26, 2013
    • Avoid including elf.hh from sched.hh · 714d313a
      Nadav Har'El authored
      sched.hh included elf.hh, just so it can refer to the elf::tls_data
      type. But now that we have rcu.hh which includes sched.hh and therefore
      elf.hh, if we wish to use rcu in elf.hh (we'll do this in a later patch),
      we have an include loop mess.
      
      So better not include elf.hh from sched.hh, and just declare the one
      struct we need.
      
      After sched.hh no longer includes elf.hh and the dozen includes that
      it further included, we need to add missing includes to some of the
      code that included sched.hh and relied on its implicit includes.
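One way to "just declare the one struct we need" is a forward declaration, assuming sched.hh only refers to elf::tls_data through a pointer or reference. The thread_sketch class below is hypothetical.

```cpp
#include <cassert>

// In sched.hh: instead of #include "elf.hh", forward-declare the type.
// An incomplete type is enough for pointers and references.
namespace elf {
struct tls_data;
}

namespace sched {
// Hypothetical class that only stores a pointer to the TLS description.
class thread_sketch {
public:
    explicit thread_sketch(const elf::tls_data* tls) : _tls(tls) {}
    bool has_tls() const { return _tls != nullptr; }
private:
    const elf::tls_data* _tls;
};
}
```

Code that dereferences the pointer still needs the full definition, which is why the callers that relied on sched.hh's implicit includes had to include the headers themselves.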
    • mmu: don't pass really bad faults to the application · 6f464e76
      Avi Kivity authored
      Attempting to execute the null pointer, or faulting within kernel code,
      is a really bad sign, and it's better to abort early in those cases.
    • alloctracker: Fix forget() if remember() hasn't been called · 0affe14a
      Pekka Enberg authored
      If the leak detector is enabled after OSv startup, the first call can be to
      free(), not malloc(). Fix alloctracker::forget() to deal with that.
      
      Fixes the SIGSEGV when "osv leak on" is used to enable detection from
      gdb after OSv has started up:
      
        #
        # A fatal error has been detected by the Java Runtime Environment:
        #
        #  SIGSEGV (0xb) at pc=0x00000000003b8ee6, pid=0, tid=18446673706168635392
        #
        # JRE version: 7.0_25
        # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 compressed oops)
        # Problematic frame:
        # C  0x00000000003b8ee6
        #
        # Core dump written. Default location: //core or core.0
        #
        # An error report file with more information is saved as:
        # /tmp/jvm-0/hs_error.log
        #
        # If you would like to submit a bug report, please include
        # instructions on how to reproduce the bug and visit:
        #   http://icedtea.classpath.org/bugzilla
        #
        Aborted
      
        [penberg@localhost osv]$ addr2line -e build/debug/loader.elf
        0x00000000003b8ee6
        /home/penberg/osv/build/debug/../../core/alloctracker.cc:90
  13. Aug 25, 2013
    • rcu: fix hang due to race while awaiting a quiescent state · ac7a8447
      Avi Kivity authored
      Waiting for a quiescent state happens in two stages: first, we request all
      cpus to schedule at least once.  Then, we wait until they do so.
      
      If, between the two stages, a cpu is brought online, then we will request
      N cpus to schedule but wait for N+1 to respond.  This of course never happens,
      and the system hangs.
      
      Fix by copying the vector that holds the cpus we signal and wait for, and
      using that one copy for both stages, forcing them to be consistent.  This
      is safe since newly-added cpus cannot be accessing any rcu-protected
      variables before we started signalling.
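The snapshot fix can be modelled in a few lines. This is a sketch under stated assumptions: cpu, await_quiescent() and the `between` hook are hypothetical, the cpus "comply" instantly in stage 1, and a real implementation would block in stage 2 (and hang where this returns false).

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct cpu { bool scheduled = false; };   // hypothetical per-cpu state

// Two-stage wait for a quiescent state. `between` simulates a cpu being
// brought online between the stages.
bool await_quiescent(std::vector<cpu*>& online,
                     const std::function<void()>& between)
{
    std::vector<cpu*> cpus = online;   // one snapshot for both stages
    for (cpu* c : cpus) {
        c->scheduled = true;           // stage 1: ask each cpu to schedule
    }
    between();                         // `online` may grow here
    for (cpu* c : cpus) {              // stage 2: wait only for the cpus we signalled
        if (!c->scheduled) {
            return false;
        }
    }
    return true;
}
```

Iterating over `online` directly in stage 2 would also wait for the newly added cpu, which was never signalled.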
      
      Fixes random hangs with rcu, mostly seen with 'perf callstack'.
  14. Aug 19, 2013
  15. Aug 18, 2013
  16. Aug 16, 2013
    • sched: Avoid IPIs in thread::wake() · 71fec998
      Pekka Enberg authored
      Avoid sending an IPI to a CPU that's already being woken up by another
      IPI.  This reduces IPIs by 17% for a cassandra-stress run. Execution
      time is obviously unaffected because execution is bound by lock
      contention.
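One common way to suppress redundant wakeup IPIs is a per-cpu "IPI pending" flag that only the first waker flips. This is a hedged sketch of that general technique, not the code in this commit; cpu_state, wake_cpu() and on_wakeup_ipi() are hypothetical, and the wakeup queue itself is not modelled.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-cpu state.
struct cpu_state {
    std::atomic<bool> ipi_pending{false};
    std::atomic<int> ipis_sent{0};   // counts IPIs that would be sent
};

// Only the waker that flips ipi_pending from false to true sends an IPI;
// later wakers see the cpu is already being woken.
void wake_cpu(cpu_state& c)
{
    if (!c.ipi_pending.exchange(true)) {
        c.ipis_sent.fetch_add(1);    // send_ipi() in a real kernel
    }
}

// Run on the target cpu when the IPI arrives. The flag is cleared before
// the cpu drains its wakeup queue, so a later wake_cpu() sends a new IPI.
void on_wakeup_ipi(cpu_state& c)
{
    c.ipi_pending.store(false);
}
```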
      
      Before:
      
      [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
      ^C
       Performance counter stats for process id '610':
      
               6,909,333 kvm:kvm_entry
                       0 kvm:kvm_hypercall
                       0 kvm:kvm_hv_hypercall
               1,035,125 kvm:kvm_pio
                       0 kvm:kvm_cpuid
               5,149,393 kvm:kvm_apic
               6,909,369 kvm:kvm_exit
               2,108,440 kvm:kvm_inj_virq
                       0 kvm:kvm_inj_exception
                     982 kvm:kvm_page_fault
               2,783,005 kvm:kvm_msr
                       0 kvm:kvm_cr
                   7,354 kvm:kvm_pic_set_irq
               2,366,388 kvm:kvm_apic_ipi
               2,468,569 kvm:kvm_apic_accept_irq
               2,067,044 kvm:kvm_eoi
               1,982,000 kvm:kvm_pv_eoi
                       0 kvm:kvm_nested_vmrun
                       0 kvm:kvm_nested_intercepts
                       0 kvm:kvm_nested_vmexit
                       0 kvm:kvm_nested_vmexit_inject
                       0 kvm:kvm_nested_intr_vmexit
                       0 kvm:kvm_invlpga
                       0 kvm:kvm_skinit
                   3,677 kvm:kvm_emulate_insn
                       0 kvm:vcpu_match_mmio
                       0 kvm:kvm_update_master_clock
                       0 kvm:kvm_track_tsc
                   7,354 kvm:kvm_userspace_exit
                   7,354 kvm:kvm_set_irq
                   7,354 kvm:kvm_ioapic_set_irq
                     674 kvm:kvm_msi_set_irq
                       0 kvm:kvm_ack_irq
                       0 kvm:kvm_mmio
                 609,915 kvm:kvm_fpu
                       0 kvm:kvm_age_page
                       0 kvm:kvm_try_async_get_page
                       0 kvm:kvm_async_pf_doublefault
                       0 kvm:kvm_async_pf_not_present
                       0 kvm:kvm_async_pf_ready
                       0 kvm:kvm_async_pf_completed
      
            81.180469772 seconds time elapsed
      
      After:
      
      [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
      ^C
       Performance counter stats for process id '30824':
      
               6,411,175 kvm:kvm_entry                                                [100.00%]
                       0 kvm:kvm_hypercall                                            [100.00%]
                       0 kvm:kvm_hv_hypercall                                         [100.00%]
                 992,454 kvm:kvm_pio                                                  [100.00%]
                       0 kvm:kvm_cpuid                                                [100.00%]
               4,300,001 kvm:kvm_apic                                                 [100.00%]
               6,411,133 kvm:kvm_exit                                                 [100.00%]
               2,055,189 kvm:kvm_inj_virq                                             [100.00%]
                       0 kvm:kvm_inj_exception                                        [100.00%]
                   9,760 kvm:kvm_page_fault                                           [100.00%]
               2,356,260 kvm:kvm_msr                                                  [100.00%]
                       0 kvm:kvm_cr                                                   [100.00%]
                   3,354 kvm:kvm_pic_set_irq                                          [100.00%]
               1,943,731 kvm:kvm_apic_ipi                                             [100.00%]
               2,047,024 kvm:kvm_apic_accept_irq                                      [100.00%]
               2,019,044 kvm:kvm_eoi                                                  [100.00%]
               1,949,821 kvm:kvm_pv_eoi                                               [100.00%]
                       0 kvm:kvm_nested_vmrun                                         [100.00%]
                       0 kvm:kvm_nested_intercepts                                    [100.00%]
                       0 kvm:kvm_nested_vmexit                                        [100.00%]
                       0 kvm:kvm_nested_vmexit_inject                                 [100.00%]
                       0 kvm:kvm_nested_intr_vmexit                                   [100.00%]
                       0 kvm:kvm_invlpga                                              [100.00%]
                       0 kvm:kvm_skinit                                               [100.00%]
                   1,677 kvm:kvm_emulate_insn                                         [100.00%]
                       0 kvm:vcpu_match_mmio                                          [100.00%]
                       0 kvm:kvm_update_master_clock                                  [100.00%]
                       0 kvm:kvm_track_tsc                                            [100.00%]
                   3,354 kvm:kvm_userspace_exit                                       [100.00%]
                   3,354 kvm:kvm_set_irq                                              [100.00%]
                   3,354 kvm:kvm_ioapic_set_irq                                       [100.00%]
                     927 kvm:kvm_msi_set_irq                                          [100.00%]
                       0 kvm:kvm_ack_irq                                              [100.00%]
                       0 kvm:kvm_mmio                                                 [100.00%]
                 620,278 kvm:kvm_fpu                                                  [100.00%]
                       0 kvm:kvm_age_page                                             [100.00%]
                       0 kvm:kvm_try_async_get_page                                   [100.00%]
                       0 kvm:kvm_async_pf_doublefault                                 [100.00%]
                       0 kvm:kvm_async_pf_not_present                                 [100.00%]
                       0 kvm:kvm_async_pf_ready                                       [100.00%]
                       0 kvm:kvm_async_pf_completed
      
            79.947992238 seconds time elapsed
    • mempool: Fix GPF in debug realloc() · ba81e15a
      Pekka Enberg authored
      Starting up Cassandra with the debug memory allocator causes a GPF:
      
        Breakpoint 1, abort () at ../../runtime.cc:85
        85	{
        (gdb) bt
        #0  abort () at ../../runtime.cc:85
        #1  0x0000000000375812 in osv::generate_signal (siginfo=..., ef=ef@entry=0xffffc0003ffe3008) at ../../libc/signal.cc:40
        #2  0x000000000037587c in osv::handle_segmentation_fault (addr=addr@entry=18446708889768681440, ef=ef@entry=0xffffc0003ffe3008)
            at ../../libc/signal.cc:55
        #3  0x00000000002fba02 in page_fault (ef=0xffffc0003ffe3008) at ../../core/mmu.cc:876
        #4  <signal handler called>
        #5  dbg::realloc (v=v@entry=0xffffe00019b3e000, size=size@entry=16) at ../../core/mempool.cc:846
        #6  0x000000000032654c in realloc (obj=0xffffe00019b3e000, size=16) at ../../core/mempool.cc:870
        #7  0x0000100000627743 in ?? ()
        #8  0x00002000001fe770 in ?? ()
        #9  0x00002000001fe780 in ?? ()
        #10 0x00002000001fe710 in ?? ()
        #11 0x00002000001fe700 in ?? ()
        #12 0xffffe000170e8000 in ?? ()
        #13 0x0000000200000001 in ?? ()
        #14 0x0000000000000020 in ?? ()
        #15 0x00002000001ffe70 in ?? ()
        #16 0xffffe000170e0004 in ?? ()
        #17 0x000000000036f361 in strcpy (dest=0x100001087420 "", src=<optimized out>) at ../../libc/string/strcpy.c:8
        #18 0x0000100000629b53 in ?? ()
        #19 0xffffe00019b22000 in ?? ()
        #20 0x0000000000000001 in ?? ()
        #21 0x0000000000000000 in ?? ()
      
      The problem was introduced in commit 1ea5672f ("memory: let the debug
      allocator mimic the standard allocator more closely") which forgot
      to convert realloc() to use 'pad_before'.
  17. Aug 15, 2013
    • mempool: workaround for unaligned allocations · c19c8aec
      Avi Kivity authored
      An allocation that is larger than half a page, but smaller than a page,
      will end up badly aligned.
      
      Work around it by using the large allocators for objects larger than half
      a page.  This is wasteful and slow but at least it works.
      
      Later we can improve this by moving the slab header to the end of the page,
      so it doesn't interfere with alignment.
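The workaround is a size-based routing rule. A minimal sketch, assuming 4 KiB pages and a hypothetical pick_allocator() helper:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t page_size = 4096;   // assumed page size

enum class allocator_kind { small_pool, large };

// Anything larger than half a page goes to the large (page-granular)
// allocator, so it is not placed after a slab header in a pool page.
allocator_kind pick_allocator(std::size_t size)
{
    return size > page_size / 2 ? allocator_kind::large
                                : allocator_kind::small_pool;
}
```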
    • mempool: Fix refill_page_buffer() on out-of-memory · 2149839e
      Pekka Enberg authored
      Building OSv with the debug memory allocator enabled:
      
        $ make -j mode=debug conf-preempt=0 conf-debug_memory=1
      
      causes the guest to enter a busy loop right after the JVM starts up:
      
        $ ./scripts/run.py -d
      
        [...]
      
        OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed
      
      GDB explains:
      
        #0  0x00000000003b5c54 in
      boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::private_erase (this=0x1d2f8c8 <memory::free_page_ranges+8>, b=..., e=...,
      n=@0x3b40e9: 6179885759521391432) at
      ../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:1417
        #1  0x00000000003b552e in
      boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::erase<memory::page_range, memory::addr_cmp>(memory::page_range const&,
      memory::addr_cmp,
      boost::intrusive::detail::enable_if_c<!boost::intrusive::detail::is_convertible<memory::addr_cmp,
      boost::intrusive::tree_iterator<boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true> >,
      true> >::value, void>::type*) (this=0x1d2f8c0 <memory::free_page_ranges>,
      key=..., comp=...) at
      ../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:878
        #2  0x00000000003b4c4e in
      boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::erase (this=0x1d2f8c0 <memory::free_page_ranges>, value=...) at
      ../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:856
        #3  0x00000000003b4145 in
      boost::intrusive::set_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::erase (this=0x1d2f8c0 <memory::free_page_ranges>, value=...) at
      ../../external/misc.bin/usr/include/boost/intrusive/set.hpp:601
        #4  0x00000000003b0130 in memory::refill_page_buffer () at ../../core/mempool.cc:487
        #5  0x00000000003b05f8 in memory::untracked_alloc_page () at ../../core/mempool.cc:569
        #6  0x00000000003b0631 in memory::alloc_page () at ../../core/mempool.cc:577
        #7  0x0000000000367a7c in mmu::populate::small_page (this=0x2000001fd460, ptep=..., offset=0) at ../../core/mmu.cc:456
        #8  0x0000000000365b00 in mmu::page_range_operation::operate_page
      (this=0x2000001fd460, huge=false, addr=0xffffe0004ec9b000, offset=0) at
      ../../core/mmu.cc:438
        #9  0x0000000000365790 in mmu::page_range_operation::operate
      (this=0x2000001fd460, start=0xffffe0004ec9b000, size=4096) at
      ../../core/mmu.cc:387
        #10 0x0000000000366148 in mmu::vpopulate (addr=0xffffe0004ec9b000, size=4096) at ../../core/mmu.cc:657
        #11 0x00000000003b0d8d in dbg::malloc (size=16) at ../../core/mempool.cc:818
        #12 0x00000000003b0f32 in malloc (size=16) at ../../core/mempool.cc:854
      
      Fix the problem by checking if free_page_ranges is empty in
      refill_page_buffer(). This fixes the busy loop issue and shows what's
      really happening:
      
        OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed
        alloc_page(): out of memory
        Aborted
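The empty-set check can be sketched with a std::set of free page addresses and a vector acting as the page buffer. Both globals and the batch parameter are hypothetical simplifications of memory::free_page_ranges and the real buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

std::set<std::size_t> free_pages;        // addresses of free pages
std::vector<std::size_t> page_buffer;    // pages cached for alloc_page()

// Move up to `batch` pages into the buffer. Checking for an empty set
// first avoids erasing begin() of an empty container (undefined
// behaviour) and lets the caller report out-of-memory.
bool refill_page_buffer(std::size_t batch)
{
    if (free_pages.empty()) {
        return false;
    }
    while (batch-- > 0 && !free_pages.empty()) {
        auto it = free_pages.begin();
        page_buffer.push_back(*it);
        free_pages.erase(it);
    }
    return true;
}
```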
  18. Aug 14, 2013
    • trace: RCU-protect tracepoint_base::probes · 31fe10b3
      Pekka Enberg authored
      As suggested by Avi, RCU-protect tracepoint_base::probes to make sure
      probes are really stopped before the caller accesses the collected
      traces.
    • rcu: Add rcu_synchronize() API · 59c61f31
      Avi Kivity authored
    • callstack: Fix list iteration in callstack_collector::merge() · eff44bc7
      Pekka Enberg authored
      A call to erase() invalidates the erased iterator, so switch from a
      range-based for loop to managing the iterators manually.
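The iterator idiom looks like this. The "merge adjacent duplicates" rule in merge_duplicates() is a hypothetical stand-in for what callstack_collector::merge() actually merges; the point is continuing from the iterator that erase() returns.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Remove elements while iterating: erase() invalidates the iterator it is
// given, so continue from the iterator it returns.
void merge_duplicates(std::list<int>& items)
{
    if (items.empty()) {
        return;
    }
    auto prev = items.begin();
    for (auto it = std::next(prev); it != items.end(); ) {
        if (*it == *prev) {
            it = items.erase(it);   // safe: erase() returns the next valid iterator
        } else {
            prev = it++;
        }
    }
}
```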
      
      This fixes a bug that resulted in JVM crashing on SMP when "perf
      callstack" was run:
      
        #
        # A fatal error has been detected by the Java Runtime Environment:
        #
        #  SIGSEGV (0xb) at pc=0x0000000000328a44, pid=0, tid=18446673706080178176
        #
        # JRE version: 7.0_19
        # Java VM: OpenJDK 64-Bit Server VM (23.7-b01 mixed mode linux-amd64 compressed oops)
        # Problematic frame:
        # C  0x0000000000328a44
        #
        # Core dump written. Default location: //core or core.0
        #
        # An error report file with more information is saved as:
        # /tmp/jvm-0/hs_error.log
        #
        # If you would like to submit a bug report, please include
        # instructions on how to reproduce the bug and visit:
        #   http://icedtea.classpath.org/bugzilla
        #
        Aborted
  19. Aug 13, 2013
    • rcu: fix debug build re rcu_read_lock · 14865fa2
      Avi Kivity authored
      The release build optimizes away references to this object, but the debug
      build does not.  Define it.
    • dhcp: fix random dhcp failures · 26a04985
      Avi Kivity authored
      We don't initialize the dhcp packets, so some of them get the relay agent IP
      set, and the DHCP DISCOVER packets get sent to a random address on the
      Internet.  Usually it doesn't have a DHCP server installed, so the guest
      does not get configured.
      
      Fix by zero-initializing the packet.
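Zero-initialization can be done with value-initialization (`{}`) instead of an explicit memset. The struct below is a hypothetical, trimmed-down DHCP header (field names follow RFC 2131; secs, flags, chaddr and the rest are omitted), not OSv's definition.

```cpp
#include <cassert>
#include <cstdint>

// Trimmed-down DHCP header; giaddr is the relay agent IP address.
struct dhcp_packet {
    std::uint8_t  op;
    std::uint8_t  htype;
    std::uint8_t  hlen;
    std::uint8_t  hops;
    std::uint32_t xid;
    std::uint32_t ciaddr, yiaddr, siaddr, giaddr;
};

dhcp_packet make_discover(std::uint32_t xid)
{
    dhcp_packet pkt{};   // value-initialization zeroes every field, including giaddr
    pkt.op = 1;          // BOOTREQUEST
    pkt.htype = 1;       // Ethernet
    pkt.hlen = 6;        // MAC address length
    pkt.xid = xid;
    return pkt;
}
```

Without the `{}`, a local dhcp_packet's fields hold whatever was on the stack, so giaddr can be nonzero.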
  20. Aug 12, 2013
    • build: link libstdc++, libgcc_s only once · c9e61d4a
      Avi Kivity authored
      Currently we statically link to libstdc++ and libgcc_s, and also dynamically
      link to the same libraries (since the payload requires them).  This causes
      some symbols to be available from both the static and dynamic version.
      
      With the resolution order change introduced by 82513d41, we can
      resolve the same symbol to different addresses at different times.  This
      violates the One Definition Rule, and in fact breaks std::string's
      destructor.
      
      Fix by only linking in the libraries statically.  We use ld's --whole-archive
      flag to bring in all symbols, including those that may be used by the payload
      but not by the kernel.
      
      Some symbols now become duplicates; we drop our version.
  21. Aug 11, 2013
    • rcu: add basic read-copy-update implementation · 94b69794
      Avi Kivity authored
      This adds fairly basic support for rcu.
      
      Declaring:
      
         mutex mtx;
         rcu_ptr<my_object> my_ptr;
      
      Read-side:
      
         WITH_LOCK(rcu_read_lock) {
            const my_object* p = my_ptr.read();
            // do things with *p
            // but don't block!
         }
      
      Write-side:
      
        WITH_LOCK(mtx) {
          my_object* old = my_ptr.read_by_owner();
          my_object* p = new my_object;
          // ...
          my_ptr.assign(p);
          rcu_dispose(old);  // or rcu_defer(some_func, old);
        }
  22. Aug 08, 2013
    • Dynamic linker - fix crash on SMP · d703ec00
      Nadav Har'El authored
      This patch fixes a bug in which CLI and memcached crash on startup when
      running on two vcpus. The cause of the crash is this: Java is running
      two threads. One loads a new shared library (in this example, libnio.so),
      while the second thread is running normally and calls a function it hasn't
      called before (pthread_cond_destroy()). When our on-demand resolver code
      tries to resolve this function name, it iterates over the module list and
      sees libnio.so, but this object hasn't been completely set up yet (we put
      it in the list first - see program::add_object()), so looking up a symbol
      in it crashes.
      
      Why wasn't this problem noticed before the recent link-order change?
      Because before that change, the half-loaded library was always last in the
      list (OSV itself was first), so existing symbols were always found before
      reaching the partially set-up object. Now OSV, with many symbols, is last,
      and the half-set-up object is in the middle, so the problem is common. It
      could also have happened previously, if we had unresolved symbols (e.g.,
      weak symbols), but these were probably rare enough that the bug did not
      occur in practice.
      
      The fix in this patch is "hacky", because I wanted to avoid restructuring
      the whole code. The problem is that the functions called in add_object()
      (including relocate_rela(), nested add_object(), etc.) all assume that
      they can look up symbols in the object being set up, while we don't want
      these objects to be visible to other threads. So we do exactly this:
      each object gets a "visibility" field. If "visibility" is 0, all threads
      can use this object, but if visibility is a thread pointer, only that
      thread searches in this object. So add_object() starts the object
      with visibility set to its own thread, and only when add_object() is
      done does it set the visibility to 0 so all threads can see it.
      
      While this solves the common bug, note that this patch still leaves
      a small window for SMP bugs: it doesn't add locking to _modules, so a
      lookup running concurrently with add_object() can briefly see a broken
      vector. We should fix this remaining problem later, using RCU.
      d703ec00
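The visibility scheme described in this commit can be sketched in a few lines of C++. All names here (`object`, `lookup_symbol`, `add_object`, `current_thread_tag`) are hypothetical stand-ins rather than OSv's actual code, a `std::map` stands in for a real ELF symbol table, and the address of a `thread_local` variable plays the role of the "thread pointer". Like the patch, the sketch does not lock the module vector itself.

```cpp
#include <atomic>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-thread tag: the address of a thread_local variable is
// unique for each live thread, standing in for a "thread pointer".
inline const void* current_thread_tag() {
    thread_local char tag;
    return &tag;
}

// Hypothetical stand-in for a loaded ELF object.
struct object {
    std::map<std::string, void*> symbols;          // toy symbol table
    std::atomic<const void*> visibility{nullptr};  // nullptr: visible to all

    bool visible_here() const {
        const void* v = visibility.load(std::memory_order_acquire);
        return v == nullptr || v == current_thread_tag();
    }
};

// Symbol lookup skips objects that another thread is still setting up.
void* lookup_symbol(const std::vector<object*>& modules, const std::string& name) {
    for (const object* obj : modules) {
        if (!obj->visible_here())
            continue;
        auto it = obj->symbols.find(name);
        if (it != obj->symbols.end())
            return it->second;
    }
    return nullptr;
}

// Hypothetical add_object(): hide the new object from other threads while
// it is set up, then make it visible to everyone.
void add_object(std::vector<object*>& modules, object* obj) {
    obj->visibility.store(current_thread_tag());
    modules.push_back(obj);  // NOTE: unsynchronized, as in the commit
    // ... relocation etc. would run here and may look up symbols in obj ...
    obj->visibility.store(nullptr, std::memory_order_release);
}
```

The setting thread can still find symbols in its own half-built object, which is what relocation and nested loads need, while every other thread skips it until the final store.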
    • Nadav Har'El's avatar
      dl_iterate_phdr: Don't pass pointers on the stack to callback · a26fe58c
      Nadav Har'El authored
      This patch fixes the exception handling bug seen in tst-except.so.
      
      The callback given to dl_iterate_phdr (such as _Unwind_IteratePhdrCallback
      in libgcc_eh.a, used to implement exceptions) may decide to cache previous
      information we gave it, as long as the "adds" and "subs" fields are
      unchanged.
      
      The bug was that we passed the callback an on-stack *copy* of the
      obj->_phdrs vector, and if the callback saved pointers into it in its
      cache, they became invalid on the next call. We need the pointers to
      remain valid as long as adds/subs do not change, so we need to pass
      the actual obj->_phdrs (which doesn't change after the object's load),
      NOT a copy.
      
      Note that a locking issue remains here: if someone dlclose()s an object
      while the callback is running (and has already checked adds/subs), the
      callback can use a stale pointer. This should be fixed separately,
      probably by using reference counting on objects.
      a26fe58c
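A minimal sketch of the loader side of this fix, assuming a hypothetical `loaded_object` type whose program headers are stored in a `std::vector` that is never modified after load, and a hypothetical `iterate_phdr()` in place of the real loader entry point. It fills the standard `struct dl_phdr_info` from `<link.h>` and points `dlpi_phdr` at the object's own array rather than at a local copy.

```cpp
#include <link.h>
#include <cstddef>
#include <vector>

// Hypothetical loaded object: program headers are copied once at load
// time and never modified afterwards.
struct loaded_object {
    ElfW(Addr) base;
    const char* name;
    std::vector<ElfW(Phdr)> phdrs;
};

// Hypothetical loader state: the module list plus load/unload counters.
struct module_list {
    std::vector<loaded_object*> objects;
    unsigned long long adds = 0;
    unsigned long long subs = 0;
};

using phdr_callback = int (*)(struct dl_phdr_info*, size_t, void*);

// Hypothetical dl_iterate_phdr-like loop over the module list.
int iterate_phdr(const module_list& ml, phdr_callback cb, void* data) {
    for (const loaded_object* obj : ml.objects) {
        struct dl_phdr_info info{};
        info.dlpi_addr = obj->base;
        info.dlpi_name = obj->name;
        // Point at the object's own array, which stays valid until the
        // object is unloaded. Passing local_copy.data() instead (the bug
        // described above) leaves the callback with a pointer that dangles
        // once the copy goes out of scope.
        info.dlpi_phdr = obj->phdrs.data();
        info.dlpi_phnum = static_cast<ElfW(Half)>(obj->phdrs.size());
        info.dlpi_adds = ml.adds;
        info.dlpi_subs = ml.subs;
        if (int r = cb(&info, sizeof(info), data))
            return r;  // a nonzero return stops the iteration
    }
    return 0;
}
```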
    • Nadav Har'El's avatar
      dl_iterate_phdr: fill missing adds and subs field · 4d8353ac
      Nadav Har'El authored
      The callback function passed to dl_iterate_phdr, such as
      _Unwind_IteratePhdrCallback (used in libgcc_eh.a to implement exceptions),
      may want to cache previous lookups, and needs to know when the list of
      iterated modules hasn't changed since the last call to dl_iterate_phdr.
      For this, dl_iterate_phdr() is supposed to fill two fields, dlpi_adds
      and dlpi_subs, counting the number of times objects were loaded into or
      unloaded from the program. If both dlpi_adds and dlpi_subs are unchanged,
      the callback is guaranteed that the list of objects is unchanged.
      
      In the existing code, we forgot to set these two fields, so they got
      random values, which caused the exception unwinding code to sometimes
      cache, and sometimes not cache, depending on the phase of the moon.
      
      This patch maintains correct "adds" and "subs" counters, and with it
      exception unwinding will always use its cache (as long as the list of
      objects doesn't change).
      
      Note that this does NOT fix the crash in tst-except.so. That bug
      appears when caching is enabled (which before this patch happened
      randomly), and will be fixed by the next patch.
      4d8353ac
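On the consumer side, the caching that dlpi_adds and dlpi_subs enable can be illustrated with glibc's `dl_iterate_phdr()`. The `phdr_cache` struct and `cache_first_object` callback are hypothetical and far simpler than libgcc's unwinder cache: they only remember the first object's program headers, and reuse them when both counters are unchanged.

```cpp
#include <link.h>
#include <cstddef>

// Hypothetical cache kept across dl_iterate_phdr calls. It remembers a
// pointer into the loader's data and trusts it only while dlpi_adds and
// dlpi_subs are unchanged.
struct phdr_cache {
    bool valid = false;
    unsigned long long adds = 0, subs = 0;
    const ElfW(Phdr)* phdr = nullptr;  // first object's program headers
    ElfW(Half) phnum = 0;
    int hits = 0, misses = 0;
};

int cache_first_object(struct dl_phdr_info* info, size_t size, void* data) {
    auto* c = static_cast<phdr_cache*>(data);
    // dlpi_adds/dlpi_subs exist only if the loader passes a large enough struct.
    if (size < offsetof(struct dl_phdr_info, dlpi_subs) + sizeof(info->dlpi_subs)) {
        c->valid = false;
        return -1;
    }
    if (c->valid && c->adds == info->dlpi_adds && c->subs == info->dlpi_subs) {
        ++c->hits;  // object list unchanged: cached pointers are still usable
        return 1;   // nonzero stops the iteration after the first object
    }
    c->adds = info->dlpi_adds;
    c->subs = info->dlpi_subs;
    c->phdr = info->dlpi_phdr;
    c->phnum = info->dlpi_phnum;
    c->valid = true;
    ++c->misses;
    return 1;
}
```

Calling `dl_iterate_phdr(cache_first_object, &cache)` twice with no dlopen() or dlclose() in between should record one miss followed by one hit. If the loader left the counters uninitialized, as in the bug described above, hits and misses would depend on whatever garbage the fields contained.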