  1. Oct 13, 2013
  2. Oct 10, 2013
  3. Oct 03, 2013
  4. Sep 26, 2013
  5. Sep 20, 2013
  6. Sep 17, 2013
  7. Sep 16, 2013
    • routing: provide a valid MTU in route messages · 295d80ff
      Glauber Costa authored
      Right now we send route messages with MTUs zeroed out. This can lead
      to the following assert in ip_output.c (~line 308) triggering:
      
          KASSERT(mtu > 0, ("%s: mtu %d <= 0, rte=%p (rt_flags=0x%08x) ifp=%p",
              __func__, mtu, rte, (rte != NULL) ? rte->rt_flags : 0, ifp));
      
      This happens because the code assumes that a valid route always has a valid
      MTU, and in that case always uses the route MTU instead of the interface
      one.
      
      When we allocate the route it has a valid MTU. But when we send the route
      message, we will overwrite it with the value we see in the route message.  This
      is done in rtsock.c:rt_setmetrics.
      
      With this patch, the assertion no longer triggers.
      
      A note: this has not been seen in local installations, only on EC2. Looking at
      it, there is nothing Xen-specific. The reason it did not happen locally is
      that local traffic does not go through the default route, but rather through
      the local 192.168.100.0/24 route. That one seems to take a different
      configuration path, and thus sets the MTU correctly.
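The failure mode described in this message can be illustrated with a minimal sketch. The struct and function names below are hypothetical, simplified stand-ins, not the actual BSD definitions; the only behaviour assumed is what the message states: a valid route's MTU is always preferred, and applying a route message overwrites the route's MTU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the BSD route and interface. */
struct sketch_route { unsigned long mtu; };
struct sketch_ifnet { unsigned long mtu; };

/* As described in the message: when a route exists its MTU is trusted,
 * and the interface MTU is only used when there is no route. */
static unsigned long select_mtu(const struct sketch_route *rt,
                                const struct sketch_ifnet *ifp)
{
    assert(ifp != NULL);
    return (rt != NULL) ? rt->mtu : ifp->mtu;
}

/* Applying a route message copies the MTU it carries over the route's
 * own value (the role the message attributes to rt_setmetrics). */
static void apply_route_message(struct sketch_route *rt, unsigned long msg_mtu)
{
    assert(rt != NULL);
    rt->mtu = msg_mtu;
}
```

With these definitions, a route allocated with an MTU of 1500 ends up with an MTU of 0 after a zeroed route message is applied, and `select_mtu()` then returns 0, which is the condition the KASSERT rejects. Sending the message with a valid MTU keeps the selected value positive.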
  8. Sep 15, 2013
    • Add copyright statements in bsd/ · fe9e6a82
      Nadav Har'El authored
      Added our copyright statements to some of the files in the top bsd/
      directory, and in bsd/porting.
      
      I only added our copyright to files which were written entirely by us. I did
      not attempt to hunt down which BSD or Solaris files we modified in order to
      add our copyright to them; I don't think this is important (or we can do it
      later).
      
      I also found one header file (uma_stub.h) that had large chunks copied from
      FreeBSD, so I added both the FreeBSD copyright and ours.
    • trivial: small fixes to blkfront · 21285383
      Glauber Costa authored
      * %lu => %u when reading flush. The barrier flag had the same bug, but I
      ended up recreating it for flush.
      * Move check of xb_flags to after we check sc->flags != NULL, as spotted by
      Dima
  9. Sep 14, 2013
    • xenfront: guest-side implementation of flush · 6842d8ce
      Glauber Costa authored
      Not all Xen versions implement the barrier feature in their pv disks. Such is
      the case in Amazon EC2. For those, we should interpret the flush request and
      wait for the requests until they are all handled. Also, we can still try to use
      either barriers or flush-disk operations before we resort to a guest-side
      software implementation. Our driver currently only tests for one of them, so
      this patch also implements flush requests.
      
      Luckily, the implementation of xb_dump (which we don't currently use) needs to
      do that as well, and this is also quite well isolated in xb_quiesce(). All we need
      to do is call xb_quiesce() if flushing is not available in our backend.
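The fallback order described above can be sketched as a small decision function. The enum and function names are hypothetical and used only for illustration. The message does not say whether barriers or flush-disk operations are tried first, so the ordering between those two is an assumption; only the final fallback to a guest-side quiesce comes from the message.

```c
#include <stdbool.h>

/* Hypothetical labels for the three ways a flush can be honoured. */
enum flush_strategy {
    USE_BARRIER,        /* backend advertises the barrier feature */
    USE_FLUSH_DISK,     /* backend advertises flush-disk operations */
    USE_GUEST_QUIESCE   /* neither: wait in the guest for all requests */
};

/* Prefer a backend-provided mechanism and fall back to draining all
 * outstanding requests in the guest (the role of xb_quiesce()). */
static enum flush_strategy choose_flush_strategy(bool backend_has_barrier,
                                                 bool backend_has_flush)
{
    if (backend_has_barrier)
        return USE_BARRIER;      /* assumed to be tried first */
    if (backend_has_flush)
        return USE_FLUSH_DISK;
    return USE_GUEST_QUIESCE;
}
```

A backend that offers neither feature, which the message says is the case on Amazon EC2, would take the `USE_GUEST_QUIESCE` path.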
    • Do not overwrite the buffer on writes. · 0e62d585
      Glauber Costa authored
      Even Lords make brown paper bag mistakes. This is leftover code from my
      initial testing, where the buffers were set to pre-existing values to make
      sure they were going through. I forgot to remove it. As a result reads were
      fine, but writes would just wipe the previous data from the buffer.
      Incidentally, the "write-then-read-the-data-back" test I was doing would also
      obviously pass, so I had not noticed this until now.
      
      Fix is to just leave the buffer alone.
    • Change "hz" to fix poll() premature timeout · 26a30376
      Nadav Har'El authored
      msleep() measures time in units of 1/hz seconds. We had hz = 1,000,000,
      which gives excellent resolution (microsecond) but a terrible range
      (limits msleep()'s timeout to 35 minutes).
      
      We had a program (Cassandra) doing poll() with a timeout of 2 hours,
      which caused msleep to think we gave a negative timeout.
      
      This patch reduces hz to 1,000, so that msleep() operates in the same units
      as poll(). Looking at the code, I don't believe this change will have any
      ill effects: we don't need higher resolution (FreeBSD code is used to
      hz=1,000, which is the default there), and the code converts time units to
      ticks correctly, always using the hz macro. The allowed range for timeouts will
      grow to over 24 days, matching poll()'s allowed range.
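The ranges quoted in this message follow from simple arithmetic. Here is a minimal sketch that assumes the tick count is held in a signed 32-bit integer; that width is inferred from the 35-minute and 24-day figures, not stated in the message. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Longest timeout, in whole seconds, that fits in a signed 32-bit tick
 * count when one tick lasts 1/hz seconds. */
static int64_t max_timeout_seconds(int64_t hz)
{
    assert(hz > 0);
    return (int64_t)INT32_MAX / hz;
}

/* Convert a poll()-style millisecond timeout to ticks, computed in
 * 64 bits so that the caller can check for 32-bit overflow. */
static int64_t ms_to_ticks(int64_t ms, int64_t hz)
{
    assert(ms >= 0 && hz > 0);
    return ms * hz / 1000;
}

/* Nonzero when the tick count can be stored in a signed 32-bit value. */
static int fits_in_int32(int64_t ticks)
{
    return ticks <= INT32_MAX;
}
```

With hz = 1,000,000 the limit is 2,147 seconds (about 35.8 minutes); with hz = 1,000 it is 2,147,483 seconds (about 24.9 days). A 2-hour timeout is 7,200,000 ms, which is 7,200,000,000 ticks at the old hz and does not fit in 32 bits, while at hz = 1,000 it is 7,200,000 ticks and fits comfortably.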
  10. Sep 12, 2013
    • Support for Xen w/o vector callbacks · 1d3e336c
      Dmitry Fleytman authored
      This patch implements GSI interrupt support for Xen bus.
      Needed in Xen environments w/o vector callbacks for HVM.
      One example of such an environment is Amazon EC2.
  11. Sep 05, 2013
    • blkfront: mark device ready earlier · 7b0354b9
      Glauber Costa authored
      We cannot read the partition table from the device if the device is not marked
      as ready, since all IO will stall. I believe it should be fine to just mark the
      device ready before we mark our state as connected. With that change,
      everything proceeds normally.
  12. Aug 28, 2013
    • mbufs: use an entire page for jumbop zone allocations · 0d466fab
      Glauber Costa authored
      Xen has hard requirements on page transfers and on how to feed the grant tables.
      Addresses need to be page aligned, since pfns rather than addresses are used,
      and we need to provide at least a full page per buffer, since the hypervisor is
      free to fill any data within the page.
      
      To achieve that, the netfront driver will use m_cljget to attach an extended
      buffer to the mbuf, from the jumbop zone, since they are page-sized. However,
      two problems arise from this:
      
      1) Allocating a page goes through malloc_large. Our implementation of malloc_large
      is currently terribly inefficient, and that creates a very heavy contention site.
      
      What I am doing with this patch is to switch our uma implementation to
      alloc_page / free_page instead of malloc if the caller of zcreate so specified
      (and then of course, specify it for the jumbop cache)
      
      2) The refcount that is attached at the end of the buffer would either extend the
      buffer to 4100 bytes, defeating our purpose, or else the buffer would have to be
      PAGE_SIZE - 4 to accommodate the refcount. But since the hypervisor will write
      to the whole page, it would eventually overwrite the refcount.
      
      To address that, I am allocating an external reference counter. BSD already
      has some infrastructure to do that, and I am taking advantage of it.
      However, I have found no way of implementing this in a way in which the
      reference count can be easily deduced from the address of the extended
      buffer, without having the supporting mbuf to start from. Any external data
      structure such as a hash would probably make freeing way too slow. Thankfully,
      uma_find_refcnt and UMA_ZONE_REFCNT seem to be used mostly in the
      setup/destruction phase (the mbuf refcount is used directly, open coded). So my
      proposal here is to remove UMA_ZONE_REFCNT for that zone.
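The size problem in point 2 can be checked with a minimal sketch. It assumes a 4096-byte page and a 4-byte reference count, which is what the 4100-byte and PAGE_SIZE - 4 figures imply. The struct and function names are hypothetical and are not the mbuf or UMA definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Option A: refcount appended after a full page of data (4100 bytes). */
struct buf_trailing_refcnt {
    unsigned char data[SKETCH_PAGE_SIZE];
    uint32_t refcnt;
};

/* Option B: data shrunk so that data plus refcount fit in one page. */
struct buf_shrunk {
    unsigned char data[SKETCH_PAGE_SIZE - sizeof(uint32_t)];
    uint32_t refcnt;
};

/* Option C (the approach taken): the page holds only data and the
 * refcount lives in separately allocated memory. */
struct buf_external_refcnt {
    unsigned char *page;
    uint32_t *refcnt;
};

/* Stand-in for the hypervisor, which may write anywhere in the page. */
static void hypervisor_fill_page(void *page)
{
    assert(page != NULL);
    memset(page, 0xAB, SKETCH_PAGE_SIZE);
}
```

Option A no longer fits in a single page, and in option B the refcount sits at offset 4092 inside the page, so a full-page write by the hypervisor clobbers it. Keeping the counter outside the page, as in option C, avoids both problems.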
  13. Aug 26, 2013
    • zfs: Fix GPF in zfs_rmnode() · 3d3c65b3
      Pekka Enberg authored
      If a crashed OSv guest is restarted, ZFS mount causes a GPF in early
      startup:
      
        VFS: mounting zfs at /usr
        zfs: mounting osv/usr from device /dev/vblk1
        Aborted
      
      The GDB backtrace points the finger at zfs_rmnode():
      
        #0  processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212
        #1  0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20
        #2  0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95
        #3  0x000000000021cda2 in abort () at ../../runtime.cc:86
        #4  0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44
        #5  0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55
        #6  0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876
        #7  <signal handler called>
        #8  0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611
        #9  0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355
        #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523
        #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881
        #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr")
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016
        #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415
        #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb)
            at ../../fs/vfs/vfs_mount.c:171
        #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415
        #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215
        #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59
        #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...)
            at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
        #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038)
            at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468
        #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581
        #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133
        #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101
      
      The problem is that ZFS tries to check whether the znode is an attribute
      directory and trips over zp->z_vnode being NULL.  However, as explained
      in commit b7ee91ef ("zfs: port vop_lookup"), we don't even support
      extended attributes, so drop the check completely for OSv.