  1. Aug 26, 2013
• 1397a3ec (Pekka Enberg)
• tst-zfs-disk: Drop broken ASSERT() · f43cdb68
      Pekka Enberg authored
      The ASSERT() doesn't compile if ZFS debugging is enabled:
      
        CC tests/tst-zfs-disk.o
      In file included from ../../bsd/sys/cddl/compat/opensolaris/sys/debug.h:35:0,
                       from ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_context.h:42,
                       from ../../tests/tst-zfs-disk.c:28:
      ../../tests/tst-zfs-disk.c: In function ‘make_vdev_root’:
      ../../tests/tst-zfs-disk.c:119:9: error: ‘t’ undeclared (first use in this function)
        ASSERT(t > 0);
               ^
      ../../bsd/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:56:29: note: in definition of macro ‘ASSERT’
       #define ASSERT(EX) ((void)((EX) || assfail(#EX, __FILE__, __LINE__)))
                                   ^
      ../../tests/tst-zfs-disk.c:119:9: note: each undeclared identifier is reported only once for each function it appears in
        ASSERT(t > 0);
               ^
      ../../bsd/sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:56:29: note: in definition of macro ‘ASSERT’
       #define ASSERT(EX) ((void)((EX) || assfail(#EX, __FILE__, __LINE__)))
                                   ^
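
For illustration, this is the shape of the problem: without ZFS debugging the assertion expands to a no-op, so the stale expression referring to the undeclared 't' is never compiled, and the breakage only surfaces once debugging turns the macro into real code. In the sketch below only the debug expansion is taken from the header quoted above; the guard name and the no-op expansion are assumptions.

  #ifdef ZFS_DEBUG                    /* guard name assumed for this sketch */
  #define ASSERT(EX) ((void)((EX) || assfail(#EX, __FILE__, __LINE__)))
  #else
  #define ASSERT(EX) ((void)0)        /* non-debug build: the argument is never compiled */
  #endif

  extern int assfail(const char *expr, const char *file, int line);

  static void make_vdev_root_like()
  {
      ASSERT(t > 0);  // 't' is undeclared: harmless without ZFS_DEBUG,
                      // a compile error with it, hence dropping the assert
  }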
  2. Aug 25, 2013
• rcu: fix hang due to race while awaiting a quiescent state · ac7a8447
      Avi Kivity authored
      Waiting for a quiescent state happens in two stages: first, we request all
      cpus to schedule at least once.  Then, we wait until they do so.
      
      If, between the two stages, a cpu is brought online, then we will request
      N cpus to schedule but wait for N+1 to respond.  This of course never happens,
      and the system hangs.
      
Fix by copying the vector that holds the cpus we signal and wait for, forcing
the two stages to be consistent.  This is safe since newly-added cpus cannot
have been accessing any rcu-protected variables before we started signalling.
      
      Fixes random hangs with rcu, mostly seen with 'perf callstack'
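
A minimal self-contained sketch of the idea (the names below are assumed, not OSv's actual API): take one snapshot of the online-cpu list and drive both stages from it, so a cpu that comes online in between is neither signalled nor waited for.

  #include <vector>

  struct cpu {};                                     // stand-in for the real per-cpu object

  static std::vector<cpu*> online_cpus;              // grows when a cpu is brought online

  static void request_schedule(cpu*) { /* ask the cpu to schedule (IPI) */ }
  static void wait_for_schedule(cpu*) { /* block until that cpu has scheduled */ }

  void await_quiescent_state()
  {
      std::vector<cpu*> cpus = online_cpus;          // copy, not a reference

      for (cpu* c : cpus) request_schedule(c);       // stage 1: signal N cpus
      for (cpu* c : cpus) wait_for_schedule(c);      // stage 2: wait for the same N
  }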
  3. Aug 22, 2013
  4. Aug 21, 2013
  5. Aug 20, 2013
• vfs: implement umask() · e93b0a25
      Avi Kivity authored
Currently the umask is ignored (it's pointless since we have no users).
      
      Needed by JRuby.
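
A minimal sketch of the POSIX behaviour being implemented (names and default value assumed; per the note above, OSv can get away with just storing and returning the value): umask() replaces the file-mode creation mask and returns the previous one.

  #include <sys/types.h>

  static mode_t process_umask = 022;        // assumed default

  mode_t umask_sketch(mode_t newmask)
  {
      mode_t old = process_umask;
      process_umask = newmask & 0777;       // only the permission bits are kept
      return old;
  }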
  6. Aug 19, 2013
• dhcp: convert to WITH_LOCK · eed5bafd
      Avi Kivity authored
• 4d64c3c3 (Avi Kivity)
• vfs: improve response time of dup3() closing the original file · 367a5eb4
      Avi Kivity authored
If dup3() is called with newfd referring to an already-open file, it will close
that file for us.  37988879 converted fd operations to RCU, which caused
this close to be deferred until after the RCU grace period (43df74e7 fixed
this, but only for close(), not for dup3()).
      
      The asynchronous operation of dup3() should be fine, except that it triggers
      a bug in sys_rename(): if the reference count of the vnode for either the
      source or destination is elevated, rename fails with EBUSY.  This is due to
      the coupling between vnodes and pathnames and can be fixed with the move
      to separate dentries.
      
      The whole sequence looks like
      
      0xffffc000de90c010  3   1376912988.418660 vfs_lseek            95 0x0 0
      0xffffc000de90c010  3   1376912988.418661 vfs_lseek_ret        0x0
      0xffffc000de90c010  3   1376912988.418689 vfs_dup3             93 95 0x0
      0xffffc000de90c010  3   1376912988.418696 vfs_dup3_ret         95
      0xffffc000de90c010  3   1376912988.418711 vfs_close            95
      0xffffc000de90c010  3   1376912988.418711 vfs_close_ret
      ...
      0xffffc000de90c010  3   1376912988.420573 vfs_close            95
      0xffffc000de90c010  3   1376912988.420580 vfs_close_ret
0xffffc000de90c010  3   1376912988.420738 vfs_rename           "/usr/var/lib/cassandra/data/system/local/system-local-tmp-ic-1-Index.db" "/usr/var/lib/cassandra/data/system/local/system-local-ic-1-Index.db"
      0xffffc000de90c010  3   1376912988.422302 vfs_pwritev_ret      0x56
      0xffffc000de90c010  3   1376912988.422302 vfs_rename_err       16
      
      fd 95 (as it was before dup3) is still open at the time of the rename.
      
      Fix by not deferring the fdrop() in fdset(); 43df74e7 already made fdrop()
      safe to use directly.
      
      Fixes failures with Cassandra.
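
A self-contained sketch of the fix's shape (the fd table, names and locking below are simplified assumptions, not OSv's fd layer): when a new file is installed over an already-open descriptor slot, drop the old file's reference synchronously instead of deferring it past the RCU grace period, so the old descriptor is really closed by the time dup3() returns.

  struct file {
      int refcount = 1;
  };

  static void fdrop(file* fp)
  {
      if (--fp->refcount == 0) {
          delete fp;                 // final close happens here, synchronously
      }
  }

  static file* fd_table[256];

  static void fdset(int fd, file* newfp)
  {
      file* oldfp = fd_table[fd];
      fd_table[fd] = newfp;          // in OSv this publish is RCU-protected
      if (oldfp) {
          fdrop(oldfp);              // the fix: no deferral here; 43df74e7 made
      }                              // fdrop() safe to call directly
  }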
• vfs: add tracepoints to vfs entry points · 1cd7aae5
      Avi Kivity authored
  7. Aug 18, 2013
  8. Aug 16, 2013
• build: allow building on gcc 4.7 · 8b08f27c
      Avi Kivity authored
gcc 4.7 doesn't define a few builtins that we don't use ourselves, but that
the 4.8 headers require.
      
      Declare them so compilation passes.
• sched: Avoid IPIs in thread::wake() · 71fec998
      Pekka Enberg authored
      Avoid sending an IPI to a CPU that's already being woken up by another
      IPI.  This reduces IPIs by 17% for a cassandra-stress run. Execution
      time is obviously unaffected because execution is bound by lock
      contention.
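
A sketch of the mechanism under assumed names (not the actual OSv scheduler code): each cpu carries a flag recording that a wakeup IPI is already in flight; wake() only sends an IPI when it is the caller that sets the flag, and the IPI handler clears it before rescheduling.

  #include <atomic>

  struct cpu {
      std::atomic<bool> ipi_pending{false};
  };

  static void send_wakeup_ipi(cpu*) { /* deliver the wakeup interrupt */ }

  void wake(cpu* target)
  {
      // cheap read first, then claim the pending IPI; only the winner sends it
      if (!target->ipi_pending.load(std::memory_order_relaxed) &&
          !target->ipi_pending.exchange(true, std::memory_order_acq_rel)) {
          send_wakeup_ipi(target);
      }
  }

  void wakeup_ipi_handler(cpu* self)
  {
      self->ipi_pending.store(false, std::memory_order_release);
      // ...then run the scheduler to pick up the newly woken threads...
  }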
      
      Before:
      
      [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
      ^C
       Performance counter stats for process id '610':
      
               6,909,333 kvm:kvm_entry
                       0 kvm:kvm_hypercall
                       0 kvm:kvm_hv_hypercall
               1,035,125 kvm:kvm_pio
                       0 kvm:kvm_cpuid
               5,149,393 kvm:kvm_apic
               6,909,369 kvm:kvm_exit
               2,108,440 kvm:kvm_inj_virq
                       0 kvm:kvm_inj_exception
                     982 kvm:kvm_page_fault
               2,783,005 kvm:kvm_msr
                       0 kvm:kvm_cr
                   7,354 kvm:kvm_pic_set_irq
               2,366,388 kvm:kvm_apic_ipi
               2,468,569 kvm:kvm_apic_accept_irq
               2,067,044 kvm:kvm_eoi
               1,982,000 kvm:kvm_pv_eoi
                       0 kvm:kvm_nested_vmrun
                       0 kvm:kvm_nested_intercepts
                       0 kvm:kvm_nested_vmexit
                       0 kvm:kvm_nested_vmexit_inject
                       0 kvm:kvm_nested_intr_vmexit
                       0 kvm:kvm_invlpga
                       0 kvm:kvm_skinit
                   3,677 kvm:kvm_emulate_insn
                       0 kvm:vcpu_match_mmio
                       0 kvm:kvm_update_master_clock
                       0 kvm:kvm_track_tsc
                   7,354 kvm:kvm_userspace_exit
                   7,354 kvm:kvm_set_irq
                   7,354 kvm:kvm_ioapic_set_irq
                     674 kvm:kvm_msi_set_irq
                       0 kvm:kvm_ack_irq
                       0 kvm:kvm_mmio
                 609,915 kvm:kvm_fpu
                       0 kvm:kvm_age_page
                       0 kvm:kvm_try_async_get_page
                       0 kvm:kvm_async_pf_doublefault
                       0 kvm:kvm_async_pf_not_present
                       0 kvm:kvm_async_pf_ready
                       0 kvm:kvm_async_pf_completed
      
            81.180469772 seconds time elapsed
      
      After:
      
      [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
      ^C
       Performance counter stats for process id '30824':
      
               6,411,175 kvm:kvm_entry                                                [100.00%]
                       0 kvm:kvm_hypercall                                            [100.00%]
                       0 kvm:kvm_hv_hypercall                                         [100.00%]
                 992,454 kvm:kvm_pio                                                  [100.00%]
                       0 kvm:kvm_cpuid                                                [100.00%]
               4,300,001 kvm:kvm_apic                                                 [100.00%]
               6,411,133 kvm:kvm_exit                                                 [100.00%]
               2,055,189 kvm:kvm_inj_virq                                             [100.00%]
                       0 kvm:kvm_inj_exception                                        [100.00%]
                   9,760 kvm:kvm_page_fault                                           [100.00%]
               2,356,260 kvm:kvm_msr                                                  [100.00%]
                       0 kvm:kvm_cr                                                   [100.00%]
                   3,354 kvm:kvm_pic_set_irq                                          [100.00%]
               1,943,731 kvm:kvm_apic_ipi                                             [100.00%]
               2,047,024 kvm:kvm_apic_accept_irq                                      [100.00%]
               2,019,044 kvm:kvm_eoi                                                  [100.00%]
               1,949,821 kvm:kvm_pv_eoi                                               [100.00%]
                       0 kvm:kvm_nested_vmrun                                         [100.00%]
                       0 kvm:kvm_nested_intercepts                                    [100.00%]
                       0 kvm:kvm_nested_vmexit                                        [100.00%]
                       0 kvm:kvm_nested_vmexit_inject                                 [100.00%]
                       0 kvm:kvm_nested_intr_vmexit                                   [100.00%]
                       0 kvm:kvm_invlpga                                              [100.00%]
                       0 kvm:kvm_skinit                                               [100.00%]
                   1,677 kvm:kvm_emulate_insn                                         [100.00%]
                       0 kvm:vcpu_match_mmio                                          [100.00%]
                       0 kvm:kvm_update_master_clock                                  [100.00%]
                       0 kvm:kvm_track_tsc                                            [100.00%]
                   3,354 kvm:kvm_userspace_exit                                       [100.00%]
                   3,354 kvm:kvm_set_irq                                              [100.00%]
                   3,354 kvm:kvm_ioapic_set_irq                                       [100.00%]
                     927 kvm:kvm_msi_set_irq                                          [100.00%]
                       0 kvm:kvm_ack_irq                                              [100.00%]
                       0 kvm:kvm_mmio                                                 [100.00%]
                 620,278 kvm:kvm_fpu                                                  [100.00%]
                       0 kvm:kvm_age_page                                             [100.00%]
                       0 kvm:kvm_try_async_get_page                                   [100.00%]
                       0 kvm:kvm_async_pf_doublefault                                 [100.00%]
                       0 kvm:kvm_async_pf_not_present                                 [100.00%]
                       0 kvm:kvm_async_pf_ready                                       [100.00%]
                       0 kvm:kvm_async_pf_completed
      
            79.947992238 seconds time elapsed
• vfs: replace v_path reference with d_path · 388ad45f
      Christoph Hellwig authored
      v_path will go away once the vnode cache uses numeric indices for hardlink
      support.
• vfs: store a struct dentry in struct file · 42912366
      Christoph Hellwig authored
      We'll need this for any pathname related actions.
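
Roughly, the change gives struct file a handle on the dentry it was opened through (the field names below are assumptions, not the actual layout), so later pathname-related operations can start from the file object:

  struct dentry;
  struct vnode;

  struct file {
      // ...existing members (offset, flags, reference count)...
      dentry* f_dentry;   // pathname component this file was opened through
      vnode*  f_vnode;    // underlying vnode, also reachable via the dentry
  };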
• vfs: split dentries from vnodes · 85dda0a8
      Christoph Hellwig authored
      Create a new dentry structure for pathname components, following the Linux
VFS model.  The vnodes are left as-is for now but are always fronted by
      dentries for pathname lookups.  In a second step they will be moved to
      use non-pathname indices.
      
[penberg: fix open(O_CREAT|O_EXCL) breakage]
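
A rough sketch of the new layer (field names assumed, modeled on the Linux VFS approach the commit cites): pathname lookups now resolve dentries, and each dentry fronts a vnode, which stays pathname-agnostic so it can later be indexed numerically.

  #include <string>

  struct vnode;

  struct dentry {
      std::string d_path;    // pathname this entry was looked up under
      vnode*      d_vnode;   // backing vnode, left as-is for now
      int         d_refcnt;  // lookups and open files hold references
  };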
• 68a1624d (Christoph Hellwig)
• vfs: fully initialize struct file before calling VOP_OPEN · d0d853d0
      Christoph Hellwig authored
      We'll pass the file to the open method soon, so make sure it's fully
      constructed.
• remove romfs and fatfs · fc3dc458
      Christoph Hellwig authored
These aren't used in the build and will bitrot badly once the VFS
changes significantly.
• mempool: Fix GPF in debug realloc() · ba81e15a
      Pekka Enberg authored
Starting up Cassandra with the debug memory allocator GPFs as follows:
      
        Breakpoint 1, abort () at ../../runtime.cc:85
        85	{
        (gdb) bt
        #0  abort () at ../../runtime.cc:85
        #1  0x0000000000375812 in osv::generate_signal (siginfo=..., ef=ef@entry=0xffffc0003ffe3008) at ../../libc/signal.cc:40
        #2  0x000000000037587c in osv::handle_segmentation_fault (addr=addr@entry=18446708889768681440, ef=ef@entry=0xffffc0003ffe3008)
            at ../../libc/signal.cc:55
        #3  0x00000000002fba02 in page_fault (ef=0xffffc0003ffe3008) at ../../core/mmu.cc:876
        #4  <signal handler called>
        #5  dbg::realloc (v=v@entry=0xffffe00019b3e000, size=size@entry=16) at ../../core/mempool.cc:846
        #6  0x000000000032654c in realloc (obj=0xffffe00019b3e000, size=16) at ../../core/mempool.cc:870
        #7  0x0000100000627743 in ?? ()
        #8  0x00002000001fe770 in ?? ()
        #9  0x00002000001fe780 in ?? ()
        #10 0x00002000001fe710 in ?? ()
        #11 0x00002000001fe700 in ?? ()
        #12 0xffffe000170e8000 in ?? ()
        #13 0x0000000200000001 in ?? ()
        #14 0x0000000000000020 in ?? ()
        #15 0x00002000001ffe70 in ?? ()
        #16 0xffffe000170e0004 in ?? ()
        #17 0x000000000036f361 in strcpy (dest=0x100001087420 "", src=<optimized out>) at ../../libc/string/strcpy.c:8
        #18 0x0000100000629b53 in ?? ()
        #19 0xffffe00019b22000 in ?? ()
        #20 0x0000000000000001 in ?? ()
        #21 0x0000000000000000 in ?? ()
      
      The problem was introduced in commit 1ea5672f ("memory: let the debug
      allocator mimic the standard allocator more closely") which forgot
      to convert realloc() to use 'pad_before'.
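
A self-contained sketch of the layout involved (the pad size and names are assumptions, not OSv's debug allocator): the allocator hands out memory 'pad_before' bytes past its header, so realloc() must walk back by that same offset to find the old size; walking back by the old, pre-1ea5672f offset reads unrelated memory and faults, as in the backtrace above.

  #include <cstddef>
  #include <cstdlib>
  #include <cstring>

  constexpr size_t pad_before = 64;              // assumed padding size

  struct header {
      size_t size;                               // user-visible allocation size
  };

  void* dbg_malloc(size_t size)
  {
      char* base = static_cast<char*>(std::malloc(pad_before + size));
      reinterpret_cast<header*>(base)->size = size;
      return base + pad_before;                  // caller sees memory after the pad
  }

  void dbg_free(void* v)
  {
      std::free(static_cast<char*>(v) - pad_before);
  }

  void* dbg_realloc(void* v, size_t size)        // error handling omitted in this sketch
  {
      if (!v) {
          return dbg_malloc(size);
      }
      // the fix: locate the header via pad_before, matching dbg_malloc()
      auto* h = reinterpret_cast<header*>(static_cast<char*>(v) - pad_before);
      void* n = dbg_malloc(size);
      std::memcpy(n, v, h->size < size ? h->size : size);
      dbg_free(v);
      return n;
  }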
  9. Aug 15, 2013
• mempool: workaround for unaligned allocations · c19c8aec
      Avi Kivity authored
      An allocation that is larger than half a page, but smaller than a page,
      will end up badly aligned.
      
      Work around it by using the large allocators for objects larger than half
      a page.  This is wasteful and slow but at least it works.
      
      Later we can improve this by moving the slab header to the end of the page,
      so it doesn't interfere with alignment.
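
The workaround amounts to a size check in the allocation path, roughly like the sketch below (threshold and helper names assumed, not OSv's actual allocator): anything bigger than half a page bypasses the per-page pool, whose in-page header would misalign it, and goes to the page-granularity allocator instead.

  #include <cstddef>
  #include <cstdlib>

  constexpr size_t page_size = 4096;

  // stand-ins for the real allocators
  static void* malloc_large(size_t size)
  {
      size_t rounded = (size + page_size - 1) & ~(page_size - 1);
      return std::aligned_alloc(page_size, rounded);   // page-aligned by construction
  }

  static void* malloc_pool(size_t size)
  {
      return std::malloc(size);                        // per-page pool in the real code
  }

  void* alloc(size_t size)
  {
      if (size > page_size / 2) {
          return malloc_large(size);    // wasteful and slow, but correctly aligned
      }
      return malloc_pool(size);
  }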
• fix idiotic xen stack corruption for the blkfront driver · 8162f64f
      Glauber Costa authored
While planning to run tests on Xen today, I found my guests in current tip
failing to mount ZFS. I spent some time debugging the memory allocator, since
it was the culprit last time, only to find out we were not even reaching the
memory allocator.
      
      I noticed then that ZFS was failing with error 75 -> EOVERFLOW. Looking
      further, one of our bootup messages was showing the disks as "0MB".
      
      That information is read from the xenstore, and it was being read correctly.
      However, by the time we calculate the disk size, this is no longer correct,
      indicating a stack corruption.
      
I found the culprit to be a subsequent call to xs_gather, which calls a
variant of scanf internally. The call was passing a %lu format to read an int
variable, which would explain the corruption if the number of sectors was
right before it on the stack.
      
      Indeed, with this fix, ZFS fails in a different way now =)
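
For reference, the corruption pattern is the classic scanf width mismatch: %lu stores an unsigned long (8 bytes on x86_64) through a pointer to a 4-byte int, clobbering whatever sits next to it on the stack. The fix is to make the format and the variable agree, as in this small illustration (not the driver code):

  #include <cstdio>

  int main()
  {
      unsigned long sectors = 0;   // 8 bytes on x86_64
      unsigned int  info    = 0;   // 4 bytes

      // correct: each conversion matches the width of its destination
      std::sscanf("2097152", "%lu", &sectors);
      std::sscanf("16",      "%u",  &info);

      // passing "%lu" with &info instead would write 8 bytes into a 4-byte
      // slot and corrupt the neighbouring stack variable (e.g. 'sectors')
      std::printf("sectors=%lu info=%u\n", sectors, info);
      return 0;
  }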
• mempool: Fix refill_page_buffer() on out-of-memory · 2149839e
      Pekka Enberg authored
      Building OSv with debug memory allocator enabled:
      
        $ make -j mode=debug conf-preempt=0 conf-debug_memory=1
      
      Causes the guest to enter a busy loop right after JVM starts up:
      
        $ ./scripts/run.py -d
      
        [...]
      
        OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed
      
      GDB explains:
      
        #0  0x00000000003b5c54 in
      boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::private_erase (this=0x1d2f8c8 <memory::free_page_ranges+8>, b=..., e=...,
      n=@0x3b40e9: 6179885759521391432) at
      ../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:1417
        #1  0x00000000003b552e in
      boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::erase<memory::page_range, memory::addr_cmp>(memory::page_range const&,
      memory::addr_cmp,
      boost::intrusive::detail::enable_if_c<!boost::intrusive::detail::is_convertible<memory::addr_cmp,
      boost::intrusive::tree_iterator<boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true> >,
      true> >::value, void>::type*) (this=0x1d2f8c0 <memory::free_page_ranges>,
      key=..., comp=...) at
      ../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:878
        #2  0x00000000003b4c4e in
      boost::intrusive::rbtree_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::erase (this=0x1d2f8c0 <memory::free_page_ranges>, value=...) at
      ../../external/misc.bin/usr/include/boost/intrusive/rbtree.hpp:856
        #3  0x00000000003b4145 in
      boost::intrusive::set_impl<boost::intrusive::setopt<boost::intrusive::detail::member_hook_traits<memory::page_range,
      boost::intrusive::set_member_hook<boost::intrusive::none,
      boost::intrusive::none, boost::intrusive::none, boost::intrusive::none>,
      &memory::page_range::member_hook>, memory::addr_cmp, unsigned long, true>
      >::erase (this=0x1d2f8c0 <memory::free_page_ranges>, value=...) at
      ../../external/misc.bin/usr/include/boost/intrusive/set.hpp:601
        #4  0x00000000003b0130 in memory::refill_page_buffer () at ../../core/mempool.cc:487
        #5  0x00000000003b05f8 in memory::untracked_alloc_page () at ../../core/mempool.cc:569
        #6  0x00000000003b0631 in memory::alloc_page () at ../../core/mempool.cc:577
        #7  0x0000000000367a7c in mmu::populate::small_page (this=0x2000001fd460, ptep=..., offset=0) at ../../core/mmu.cc:456
        #8  0x0000000000365b00 in mmu::page_range_operation::operate_page
      (this=0x2000001fd460, huge=false, addr=0xffffe0004ec9b000, offset=0) at
      ../../core/mmu.cc:438
        #9  0x0000000000365790 in mmu::page_range_operation::operate
      (this=0x2000001fd460, start=0xffffe0004ec9b000, size=4096) at
      ../../core/mmu.cc:387
        #10 0x0000000000366148 in mmu::vpopulate (addr=0xffffe0004ec9b000, size=4096) at ../../core/mmu.cc:657
        #11 0x00000000003b0d8d in dbg::malloc (size=16) at ../../core/mempool.cc:818
        #12 0x00000000003b0f32 in malloc (size=16) at ../../core/mempool.cc:854
      
      Fix the problem by checking if free_page_ranges is empty in
      refill_page_buffer(). This fixes the busy loop issue and shows what's
      really happening:
      
        OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed
        alloc_page(): out of memory
        Aborted
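
A sketch of the fix's shape under assumed names (the real code uses an intrusive rbtree of page ranges and a per-cpu page buffer): refill_page_buffer() reports failure when free_page_ranges is empty instead of erasing from an empty tree, and the page allocator turns that failure into the out-of-memory abort shown above.

  #include <cstdio>
  #include <cstdlib>
  #include <set>

  static std::set<long> free_page_ranges;    // stand-in for the intrusive rbtree

  bool refill_page_buffer()
  {
      if (free_page_ranges.empty()) {
          return false;                      // the fix: nothing left to carve pages from
      }
      // ...erase a range from the tree and fill the per-cpu page buffer...
      return true;
  }

  void* alloc_page()
  {
      if (!refill_page_buffer()) {
          std::fprintf(stderr, "alloc_page(): out of memory\n");
          std::abort();
      }
      // ...pop a page from the per-cpu buffer (elided in this sketch)...
      return nullptr;
  }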
  10. Aug 14, 2013