  1. Sep 14, 2013
    • Change "hz" to fix poll() premature timeout · 26a30376
      Nadav Har'El authored
      msleep() measures time in units of 1/hz seconds. We had hz = 1,000,000,
      which gives excellent resolution (microsecond) but a terrible range
      (limits msleep()'s timeout to 35 minutes).
      
      We had a program (Cassandra) doing poll() with a timeout of 2 hours,
      which caused msleep to think we gave a negative timeout.
      
      This patch reduces hz to 1,000, i.e., makes msleep() operate in the same units
      as poll(). Looking at the code, I don't believe this change will have any
      ill effects: we don't need higher resolution (freebsd code is used to
      hz=1,000, which is the default there), and the code converts time units to
      hz's correctly, always using the hz macro. The allowed range for timeouts will
      grow to over 24 days and match poll()'s allowed range.
      26a30376
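The range figures in this commit message can be checked with a little arithmetic. The sketch below assumes the tick count is held in a signed 32-bit integer, which is what the 35-minute limit implies; the function names are hypothetical and are not taken from the OSv or FreeBSD sources.

```c
#include <assert.h>
#include <stdint.h>

/* Longest timeout, in whole seconds, that a signed 32-bit tick counter can
 * express when one tick lasts 1/hz seconds.
 * Assumption: ticks are stored in a 32-bit int. */
static int64_t max_timeout_seconds(int64_t hz)
{
    assert(hz > 0);
    return (int64_t)INT32_MAX / hz;
}

/* Convert a poll()-style millisecond timeout to ticks using 64-bit
 * arithmetic, so an out-of-range result can be detected instead of wrapping. */
static int64_t ms_to_ticks(int64_t ms, int64_t hz)
{
    assert(ms >= 0 && hz > 0);
    return ms * hz / 1000;
}

/* True if the tick count would survive being stored in a signed 32-bit int. */
static int ticks_fit_in_int32(int64_t ticks)
{
    return ticks >= 0 && ticks <= INT32_MAX;
}
```

With hz = 1,000,000 the limit is 2147 seconds (about 35.8 minutes), so a two-hour timeout of 7,200,000 ms does not fit. With hz = 1,000 the limit is 2,147,483 seconds (about 24.9 days) and the same timeout fits comfortably.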
  2. Sep 12, 2013
    • Support for Xen w/o vector callbacks · 1d3e336c
      Dmitry Fleytman authored
      This patch implements GSI interrupt support for Xen bus.
      Needed in Xen environments w/o vector callbacks for HVM.
      One example of such an environment is Amazon EC2.
      1d3e336c
  3. Sep 05, 2013
    • blkfront: mark device ready earlier · 7b0354b9
      Glauber Costa authored
      We cannot read the partition table from the device if the device is not marked
      as ready, since all IO will stall. I believe it should be fine to just mark the
      device ready before we mark our state as connected. With that change, everything
      proceeds normally.
      7b0354b9
  4. Aug 28, 2013
    • mbufs: use an entire page for jumbop zone allocations · 0d466fab
      Glauber Costa authored
      Xen has hard requirements on page transfers, and on how to feed the grant tables.
      The address needs to be page aligned, since pfns and not addresses are used,
      and we need to provide at least a full page per buffer, since the hypervisor is
      free to fill any data within the page.
      
      To achieve that, the netfront driver will use m_cljget to attach an extended
      buffer to the mbuf, from the jumbop zone, since they are page-sized. However,
      two problems arise from this:
      
      1) Allocating a page goes through malloc_large. Our implementation of malloc_large
      is currently terribly inefficient, and that creates a very heavy contention site.
      
      What I am doing with this patch is to switch our uma implementation to
      alloc_page / free_page instead of malloc if the caller of zcreate so specifies
      (and then, of course, to specify it for the jumbop cache).
      
      2) The refcount that is attached at the end of the buffer would either extend the
      buffer to 4100 bytes, defeating our purpose, or the buffer would have to be
      PAGE_SIZE - 4, to accommodate the refcount. But since the hypervisor will write
      to the whole page, it will eventually overwrite the refcount.
      
      To address that, I am allocating an external reference counter. BSD already
      has some infrastructure to do that, and I am taking advantage of this.
      However, I have found no way of implementing this in a way in which the
      reference count can be easily deduced from the address of the extended
      buffer, without having the supporting mbuf to start from. Any external data
      structure such as hashes would probably make freeing way too slow. Thankfully,
      uma_find_refcnt and the UMA_ZONE_REFCNT seem to be used mostly in the
      setup/destruction phase (the mbuf refcount is used directly, open coded). So my
      proposal here is to remove the UMA_ZONE_REFCNT for that zone.
      0d466fab
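The idea of a full, page-aligned buffer with an externally stored reference count can be illustrated outside the kernel. This is only a sketch under stated assumptions: it uses C11 aligned_alloc and malloc rather than OSv's alloc_page / free_page, it assumes a 4096-byte page, and every name in it is hypothetical rather than taken from the uma or mbuf code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* A whole page for the payload, with the refcount stored elsewhere so a
 * writer that fills the entire page cannot overwrite it. */
struct example_ext_buf {
    void *page;        /* page-aligned, exactly one page long */
    unsigned *refcnt;  /* allocated separately from the page */
};

/* Returns 0 on success, -1 if either allocation fails. */
static int example_ext_buf_alloc(struct example_ext_buf *buf)
{
    assert(buf != NULL);
    buf->page = aligned_alloc(EXAMPLE_PAGE_SIZE, EXAMPLE_PAGE_SIZE);
    if (buf->page == NULL)
        return -1;
    buf->refcnt = malloc(sizeof *buf->refcnt);
    if (buf->refcnt == NULL) {
        free(buf->page);
        buf->page = NULL;
        return -1;
    }
    *buf->refcnt = 1;
    return 0;
}

/* Stand-in for the hypervisor writing to every byte of the page. */
static void example_ext_buf_fill(struct example_ext_buf *buf, int byte)
{
    memset(buf->page, byte, EXAMPLE_PAGE_SIZE);
}

static void example_ext_buf_ref(struct example_ext_buf *buf)
{
    ++*buf->refcnt;
}

/* Drops one reference; returns 1 if this call freed the buffer, else 0. */
static int example_ext_buf_release(struct example_ext_buf *buf)
{
    if (--*buf->refcnt != 0)
        return 0;
    free(buf->page);
    free(buf->refcnt);
    buf->page = NULL;
    buf->refcnt = NULL;
    return 1;
}
```

Because the counter lives outside the page, filling all 4096 bytes leaves it intact. The cost, as the commit message notes, is that the counter cannot be found from the buffer address alone; here the owning structure carries the pointer.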
  5. Aug 26, 2013
    • zfs: Fix GPF in zfs_rmnode() · 3d3c65b3
      Pekka Enberg authored
      If a crashed OSv guest is restarted, ZFS mount causes a GPF in early
      startup:
      
        VFS: mounting zfs at /usr
        zfs: mounting osv/usr from device /dev/vblk1
        Aborted
      
      GDB backtrace points finger at zfs_rmnode():
      
        #0  processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212
        #1  0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20
        #2  0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95
        #3  0x000000000021cda2 in abort () at ../../runtime.cc:86
        #4  0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44
        #5  0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55
        #6  0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876
        #7  <signal handler called>
        #8  0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611
        #9  0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355
        #10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523
        #11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881
        #12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr")
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016
        #13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb)
            at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415
        #14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb)
            at ../../fs/vfs/vfs_mount.c:171
        #15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415
        #16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215
        #17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59
        #18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...)
            at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
        #19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038)
            at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468
        #20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581
        #21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133
        #22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101
      
      The problem is that ZFS tries to check if the znode is an attribute
      directory and trips over zp->z_vnode being NULL.  However, as explained
      in commit b7ee91ef ("zfs: port vop_lookup"), we don't even support
      extended attributes so drop the check completely for OSv.
      3d3c65b3
  6. Aug 18, 2013
    • osv_start_if: set address instead of adding a new one · 91cd1e4c
      Avi Kivity authored
      SIOCAIFADDR appends an address to the interface's address list instead of
      replacing it.  This causes 'ifconfig' to display 0.0.0.0 (the first address
      configured) instead of the correct address obtained by dhcp.
      
      Fix by also deleting the existing address, if it exists.
      91cd1e4c
  7. Aug 16, 2013
  8. Aug 15, 2013
    • fix idiotic xen stack corruption for the blkfront driver · 8162f64f
      Glauber Costa authored
      While planning to run tests on Xen today, I found my guests in current tip
      failing to mount ZFS. I spent some time debugging the memory allocator, since
      it was the culprit last time, only to find out we were not even reaching the
      memory allocator.
      
      I noticed then that ZFS was failing with error 75 -> EOVERFLOW. Looking
      further, one of our bootup messages was showing the disks as "0MB".
      
      That information is read from the xenstore, and it was being read correctly.
      However, by the time we calculate the disk size, this is no longer correct,
      indicating a stack corruption.
      
      I found out the culprit to be a subsequent call to xs_gather, which calls a
      variant of scanf internally. The call was being passed a %lu argument to read
      an int variable, which would explain the corruption if the # of sectors was
      right before it in the stack.
      
      Indeed, with this fix, ZFS fails in a different way now =)
      8162f64f
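The class of bug described here, a scanf-family conversion wider than its destination, is avoided by matching each conversion to the destination type. The following is a minimal sketch using standard sscanf; it does not use xs_gather, and the function names and the 512-byte sector size in the usage note are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* %lu stores sizeof(unsigned long) bytes, so the destination must really be
 * an unsigned long. Returns 1 on success, 0 if the text does not parse. */
static int parse_sectors(const char *text, unsigned long *sectors)
{
    assert(text != NULL && sectors != NULL);
    return sscanf(text, "%lu", sectors) == 1;
}

/* When the destination is an int, the conversion must be %d, not %lu;
 * otherwise the extra bytes land on whatever sits next to it on the stack. */
static int parse_sector_size(const char *text, int *sector_size)
{
    assert(text != NULL && sector_size != NULL);
    return sscanf(text, "%d", sector_size) == 1;
}

/* Disk size in MiB, computed in 64 bits so large disks do not overflow. */
static unsigned long long disk_size_mb(unsigned long sectors, int sector_size)
{
    return (unsigned long long)sectors * (unsigned long long)sector_size
           / (1024ULL * 1024ULL);
}
```

For example, 41,943,040 sectors of 512 bytes each come to 20,480 MB. Building with format checking enabled (for instance gcc's -Wformat, part of -Wall) flags a %lu paired with an int pointer at compile time.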
  9. Aug 14, 2013
  10. Aug 13, 2013
    • enable and compile the blkfront driver · d6800985
      Glauber Costa authored
      The xen block driver needs some extra state not needed for the network drivers.
      Namely, the same way virtio-blk does, we need to tell the block layer which are
      our strategy, read and write functions. For that, we need some extra code that
      I am implementing in xenfront-blk.cc.
      d6800985
    • BSDEDIT/blkfront: fix sector number calculation · 980179d4
      Glauber Costa authored
      BSD expects the sector number to be already provided by the bio. We could add this field
      to the bio, but it is easier to just calculate it from the given offset.

      There are many places in which bios are filled in, including many in the zfs code, so
      it is easier to make the change in this one place.
      980179d4
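Deriving a sector number from a byte offset is a single division. The sketch below assumes 512-byte sectors and sector-aligned offsets; the names are hypothetical and do not come from the blkfront sources or from OSv's bio structure.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_SECTOR_SIZE 512u  /* assumption: 512-byte sectors */

/* First sector touched by a request that starts at byte_offset.
 * Assumes the offset is sector aligned, as block requests normally are. */
static uint64_t offset_to_sector(uint64_t byte_offset)
{
    assert(byte_offset % EXAMPLE_SECTOR_SIZE == 0);
    return byte_offset / EXAMPLE_SECTOR_SIZE;
}

/* Number of sectors needed to cover byte_count bytes, rounding up. */
static uint64_t length_to_sector_count(uint64_t byte_count)
{
    return (byte_count + EXAMPLE_SECTOR_SIZE - 1) / EXAMPLE_SECTOR_SIZE;
}
```

A request at byte offset 4096 therefore starts at sector 8, and a 4096-byte transfer covers 8 sectors. Doing the conversion in the driver keeps every bio producer, including the zfs code, unchanged.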
    • bus_dma implementation · be8e3c59
      Glauber Costa authored
      Simple implementation of BSD's bus_dma interface. Since we are constrained by virtual
      environments, we are able to cut out most of the things.
      be8e3c59
    • Make bus_dma headers available from c++ · 369f7b86
      Glauber Costa authored
      We will use it in our bus_dma implementation
      369f7b86
    • BSDEDIT/blkfront: bio adjustments · a372023a
      Glauber Costa authored
      Our version of biodone takes two arguments instead of one. Adjust it, passing
      the status in the second argument as expected. We could adjust our biodone()
      function to be the same as BSD's, but I decided to do the other way around, at
      least for now: we need locking and synchronization via cond vars at bio
      completion, and although the xenfront driver has its own lock for this, the
      other users rely on the internal lock for correctness. Adjusting them would
      mean adjusting their locking semantics, which although doable, is just more
      work than adjusting xenfront.
      a372023a
    • compile subr_disk · f849892e
      Glauber Costa authored
      We need to add our headers first, but the rest should be ready to go.
      f849892e
    • Import subr_disk from bsd · 8cdbb22d
      Glauber Costa authored
      It provides very simple queue / dequeue functions for the bios.
      8cdbb22d
    • BSDEDIT/xenfront: initialize data expected to be 0. · 6f7398a0
      Glauber Costa authored
      BSD code does not initialize its structures. It works well when memory is
      previously zeroed but not otherwise. A Xen hypervisor compiled in debug mode
      fills memory with a poison pattern, and then the code breaks for those
      variables. Force them to 0.
      6f7398a0
    • BSD: fill header files for blkfront and netfront · 1e38861f
      Glauber Costa authored
      Those headers are needed by blkfront and netfront. Some of them are empty stubs,
      as usual, but some are imported from BSD. I am bringing them in separately so it is
      obvious what they are here for.
      1e38861f
    • BSDEDIT/netfront: statically determine ring size · 0bcc9395
      Glauber Costa authored
      The macros to calculate the ring size are really gigantic and nested. Somewhere,
      somehow, gcc believes that one of the size calculations yields a variable size.
      It doesn't seem to be the case to me, but maybe we are using (or lacking) some
      compiler flag that can explain this.
      
      Although this is clearly suboptimal, let us settle for this for now. It should not
      be a huge problem unless we update xen headers.
      0bcc9395
    • netfront: method to derive a blkfront from its softc · db09902b
      Glauber Costa authored
      Because softc is private (only a void pointer outside blkfront), we need
      a helper here to return the correct device from its softc representation.
      This will be used by the osv side to determine where to trigger IO to.
    • BSDEDIT/xenfront: standardize device names · 297ebbdf
      Glauber Costa authored
      BSD uses non-standard device names (standard here meaning ours) for the network
      and block interfaces. There is no reason for us to pay the complexity cost of
      dealing with different names, so change them.
      297ebbdf
    • BSDEDIT/netfront: trivial netfront adjustments · 92975a18
      Glauber Costa authored
      This patch contains the trivial adjustments for osv, like type fixing,
      header conciliation, sysctl removal, etc.
      92975a18
    • netport: empty ifmedia functions · 69ba5c7b
      Glauber Costa authored
      We won't implement interface media change routines - at least for now, so stub
      them.
      69ba5c7b
    • BSDEDIT/blkfront: trivial adjustments for osv · 38e51488
      Glauber Costa authored
      Those are: type conciliation, osv porting header inclusion and deleting unused
      statements.
      38e51488
    • if_var: don't check interface queue drive size in empty checks · 7357d75d
      Glauber Costa authored
      Current test does:
      
          (((ifq)->ifq_drv_len == 0) && ((ifq)->ifq_len == 0))
      
      but ifq_drv_len does not exist. Funnily enough, it does not exist in BSD's
      basic interface queue either.
      7357d75d
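One way to read this change is that the emptiness test reduces to the single field the basic queue actually has. The sketch below uses a hypothetical one-field struct and hypothetical macro and function names; it is not the real ifqueue definition or the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic interface queue: it tracks only its own
 * length and has no driver-side ifq_drv_len counter. */
struct example_ifqueue {
    int ifq_len;  /* number of packets currently queued */
};

/* Emptiness check that relies only on the field that exists. */
#define EXAMPLE_IFQ_IS_EMPTY(ifq) ((ifq)->ifq_len == 0)

/* Function form of the same check, convenient where a macro is awkward. */
static int example_ifq_is_empty(const struct example_ifqueue *ifq)
{
    assert(ifq != NULL);
    return EXAMPLE_IFQ_IS_EMPTY(ifq);
}
```

A queue with ifq_len of 0 reports empty, and any positive length reports non-empty.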
    • BSDEDIT/xenbus: compile xenbus files · 1bb0512a
      Glauber Costa authored
      Changes needed for xenbus operation. They are, as usual:
      * delete function tables and make the previously static functions in them public
      * comment out sysctl code
      * change order of includes between sbuf.h and malloc.h. sbuf calls into our
        functions, and those have a single-argument malloc instead of a 3-argument one.
        This is by far the easiest way to handle this
      * Modify calling convention for device_add_child. It is just way easier if we
        know the path at creation time. BSD does not need it because it creates all
        devices equal (they are the same device_t structure), but for us it is way more
        convenient if we can create the appropriate classes.
      1bb0512a
    • BSDEDIT/xenstore: connect to osv stubs and compile · 2b7e37e5
      Glauber Costa authored
      In particular, I am not implementing the struct filling in the end of
      the file. Just comment it out, and make the relevant static functions
      public. We will call them from our code.
      2b7e37e5
    • BSDEDIT/xenstore: mechanically convert log to bsd_log · 01221ec0
      Glauber Costa authored
      As suggested by Guy
      01221ec0
    • BSDEDIT/evtchn.c: changes for pv event channel · 2cd983e4
      Glauber Costa authored
      Mostly trivial changes needed to compile the pv event channel.  We need some
      type adjustments, but the most complex ones are assembly fixes.  Because BSD
      seems to only do this for 32-bit guests, we need to adjust the inline asm
      instructions to take quad words for longs, and force int types for double
      words.
      
      After this, the evtchn can be compiled.
      2cd983e4
    • Xen paravirtual event channel · a8560f2d
      Glauber Costa authored
      This file implements the pv and pv-on-hvm event channel mechanism.
      Verbatim copy from BSD.
      a8560f2d
    • porting: bus definitions · d8dd6623
      Glauber Costa authored
      This contains interrupt, device and bus definitions. Most of them are in bus
      files in BSD anyway.
      d8dd6623
    • BSDEDIT/gnttab: compile in the grant tables · b16bdd49
      Glauber Costa authored
      With this patch, the grant table code is compiled into osv.
      The edits in the file reflect the fact that we don't need to go through PCI
      memory for the Xen special device even for HVM. We have mappings that are
      way simpler, so we can just use them. All the rest is kept as unchanged as I
      could.
      b16bdd49
    • bsd: mmu stub functions · 3e5657ed
      Glauber Costa authored
      3e5657ed
    • add uoff_t to netport.h · d4f0a78f
      Glauber Costa authored
      This is for the lack of a better place.
      d4f0a78f
    • bus_dma verbatim copy · 2a4d0aa4
      Glauber Costa authored
      2a4d0aa4
    • xen: verbatim copy of BSD's blkif header · d22e88c8
      Glauber Costa authored
      d22e88c8
  11. Aug 12, 2013