- Oct 03, 2013
-
-
Benoît Canet authored
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Benoît Canet authored
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Benoît Canet authored
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
bsd's ifconf conflicts with osv's; rename it. We use the bsd version in <osv/ioctl.h>, since we currently don't support the Linux-ABI variants of these ioctls.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
-
Avi Kivity authored
The bsd ifaddr struct conflicts with the osv ifaddr struct, which is a public interface. Rename the bsd struct to avoid conflict.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
-
Avi Kivity authored
Work around a byteorder function conflict, and reconcile a declaration.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
-
Avi Kivity authored
Some structures are duplicated; move the duplicates to a common header <netinet/__in.h>.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
-
Avi Kivity authored
Some structures are duplicated; deduplicate them. A few are source-compatible but not binary-compatible; use the ones from <bits/socket.h>. Others are both source- and binary-compatible; put them in a new header <sys/__socket.h> which is included from both. Work around a problem with the byteorder functions/macros.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Tested-By: Benoit Canet <benoit@irqsave.net>
-
- Sep 26, 2013
-
-
Raphael S. Carvalho authored
Update ->va_nlink() in zfs_getattr() in preparation for sys_link().
Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Wire up the VOP_LINK vnode operation for ZFS in preparation for sys_link().
Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
[ penberg: drop FIGNORE, cleanup, split ]
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Sep 20, 2013
-
-
Glauber Costa authored
All the other fields of the pcpu structure that BSD expects are initialized by the event channel, except for the cpu id, which the code expects to be already initialized.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
-
- Sep 17, 2013
-
-
Or Cohen authored
-
- Sep 16, 2013
-
-
Dmitry Fleytman authored
-
Glauber Costa authored
Right now we send route messages with MTUs zeroed out. This can lead to the following assert in ip_output.c (~line 308) triggering:

KASSERT(mtu > 0, ("%s: mtu %d <= 0, rte=%p (rt_flags=0x%08x) ifp=%p", __func__, mtu, rte, (rte != NULL) ? rte->rt_flags : 0, ifp));

This happens because the code assumes that if there is a valid route, that route has a valid MTU, and in that case it always uses the route MTU instead of the interface one. When we allocate the route it has a valid MTU. But when we send the route message, we overwrite it with the value we see in the route message. This is done in rtsock.c:rt_setmetrics. With this patch, the assertion stops triggering.

A note: this has not been seen in local installations, only on EC2. Looking at it, there is nothing Xen-specific. The reason it was not happening locally is that local traffic does not go through the default route, but rather through the local 192.168.100.0/24 route. That one seems to take a different configuration path, and thus sets the MTU correctly.
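The failure mode can be modelled in a few lines. This is only a sketch: `effective_mtu` and `mtu_after_message` are hypothetical names, not functions from ip_output.c or rtsock.c, and the guard in `mtu_after_message` is an assumed way to avoid the problem, not necessarily what the patch does.

```c
/* Hypothetical model of the MTU choice described above; not the FreeBSD code. */

/* A valid route's MTU is trusted unconditionally; otherwise the
 * interface MTU is used. A valid route with MTU 0 therefore yields 0,
 * which is the value the KASSERT rejects. */
int effective_mtu(int route_valid, int route_mtu, int if_mtu)
{
    return route_valid ? route_mtu : if_mtu;
}

/* Assumed guard: treat a zero MTU in a route message as "not specified"
 * and keep the MTU already stored in the route. */
int mtu_after_message(int current_mtu, int msg_mtu)
{
    return msg_mtu > 0 ? msg_mtu : current_mtu;
}
```

With a zeroed message MTU, `effective_mtu(1, 0, 1500)` returns 0, while `mtu_after_message(1500, 0)` keeps 1500.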
-
- Sep 15, 2013
-
-
Nadav Har'El authored
Added our copyright statements to some of the files in the top bsd/ directory, and in bsd/porting. I only added our copyright to files which were completely by us - I did not attempt to hunt which bsd or solaris files we modified to add our copyright to them, I don't think this is important (or, we can do this later). I also found one header file (uma_stub.h) that had large chunks copied from freebsd, so I added both the freebsd copyright and ours.
-
Glauber Costa authored
* %lu => %u when reading flush. The barrier flag had the same bug, but I ended up recreating it for flush.
* Move check of xb_flags to after we check sc->flags != NULL, as spotted by Dima.
-
- Sep 14, 2013
-
-
Glauber Costa authored
Not all Xen versions implement the barrier feature in their pv disks. Such is the case in Amazon EC2. For those, we should interpret the flush request and wait for the requests until they are all handled. Also, we can still try to use either barriers or flush-disk operations before we resort to a guest-side software implementation. Our driver currently only tests for one of them, so this patch also implements flush requests. Luckily, the implementation of xb_dump (which we don't currently use) needs to do that as well, and this is also quite well isolated in xb_quiesce(). All we need to do is call xb_quiesce() if flushing is not available in our backend.
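The fallback order can be sketched as a small decision function. The enum and function names here are hypothetical and not taken from the blkfront driver, and preferring barriers over flush-disk when both are offered is an assumption; the text only says that either may be tried before falling back to xb_quiesce().

```c
/* Hypothetical names; a sketch of the fallback, not the driver code. */
enum flush_method {
    FLUSH_WITH_BARRIER,   /* backend supports write barriers */
    FLUSH_WITH_FLUSH_OP,  /* backend supports flush-disk operations */
    FLUSH_BY_QUIESCE      /* guest-side: wait for all outstanding requests */
};

enum flush_method choose_flush_method(int backend_has_barrier,
                                      int backend_has_flush)
{
    if (backend_has_barrier)
        return FLUSH_WITH_BARRIER;      /* assumed to be preferred */
    if (backend_has_flush)
        return FLUSH_WITH_FLUSH_OP;
    return FLUSH_BY_QUIESCE;            /* e.g. a backend offering neither */
}
```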
-
Glauber Costa authored
Even Lords make brown paper bag mistakes. This is leftover code from my initial testing, where the buffers were set with pre-existing values to make sure they were going through. I forgot to remove it. As a result reads were fine, but writes would just wipe the previous data from the buffer. Incidentally, the "write-then-read-the-data-back" test I was doing would also obviously pass, so I hadn't noticed this so far. The fix is to just leave the buffer alone.
-
Nadav Har'El authored
msleep() measures time in units of 1/hz seconds. We had hz = 1,000,000, which gives excellent resolution (microsecond) but a terrible range (limits msleep()'s timeout to 35 minutes). We had a program (Cassandra) doing poll() with a timeout of 2 hours, which caused msleep to think we gave a negative timeout. This patch reduces hz to 1,000, i.e., makes msleep() operate in the same units as poll(). Looking at the code, I don't believe this change will have any ill effects: we don't need higher resolution (freebsd code is used to hz=1,000, which is the default there), and the code converts time units to hz's correctly, always using the hz macro. The allowed range for timeouts will grow to over 24 days, and match poll()'s allowed range.
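The ranges quoted above follow from simple arithmetic, assuming the timeout is held as a tick count in a signed 32-bit int (an assumption, though it is consistent with the 35-minute and 24-day figures). The helper names below are illustrative only.

```c
#include <stdint.h>

/* Largest timeout, in whole seconds, that fits in a signed 32-bit
 * tick counter when one tick is 1/hz seconds. */
int64_t max_timeout_seconds(int64_t hz)
{
    return INT32_MAX / hz;
}

/* Convert milliseconds to ticks with 64-bit arithmetic, so the
 * intermediate product cannot overflow. */
int64_t ms_to_ticks(int64_t timeout_ms, int64_t hz)
{
    return timeout_ms * hz / 1000;
}

/* Returns 1 if the tick count is representable in a signed 32-bit int. */
int ticks_fit_in_int32(int64_t ticks)
{
    return ticks >= 0 && ticks <= INT32_MAX;
}
```

With hz = 1,000,000 the limit is 2,147 seconds (about 35.8 minutes), so a 2-hour timeout (7.2 billion ticks) does not fit; with hz = 1,000 the limit is 2,147,483 seconds (about 24.9 days).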
-
- Sep 12, 2013
-
-
Dmitry Fleytman authored
This patch implements GSI interrupt support for Xen bus. Needed in Xen environments w/o vector callbacks for HVM. One example of such an environment is Amazon EC2.
-
- Sep 05, 2013
-
-
Glauber Costa authored
We cannot read the partition table from the device if the device is not marked as ready, since all IO will stall. I believe it should be fine to just mark the device ready before we mark our state as connected. With that change, it all proceeds normally.
-
- Aug 28, 2013
-
-
Glauber Costa authored
Xen has hard requirements on page transfers, and how to feed the grant tables. The address needs to be page aligned, since the pfns and not addresses are used, and we need to provide at least a full page per buffer, since the hypervisor is free to fill any data within the page. To achieve that, the netfront driver will use m_cljget to attach an extended buffer to the mbuf, from the jumbop zone, since they are page-sized. However, two problems arise from this:

1) Allocating a page goes through malloc_large. Our implementation of malloc_large is currently terribly inefficient, and that creates a very heavy contention site. What I am doing with this patch is to switch our uma implementation to alloc_page / free_page instead of malloc if the caller of zcreate so specified (and then of course, specify it for the jumbop cache).

2) The refcount that is attached at the end of the buffer would either extend the buffer to 4100 bytes, defeating our purpose, or the buffer would have to be PAGE_SIZE - 4, to accommodate the refcount. But since the hypervisor will write to the whole page, it will eventually overwrite the refcount. To address that, I am allocating an external reference counter. BSD already has some infrastructure to do that, and I am taking advantage of this. However, I have found no way of implementing this in a way in which the reference count can be easily deduced from the address of the extended buffer, without having the supporting mbuf to start from. Any external data structure such as hashes would probably make freeing way too slow. Thankfully, uma_find_refcnt and UMA_ZONE_REFCNT seem to be used mostly in the setup/destruction phase (the mbuf refcount is used directly, open coded). So my proposal here is to remove the UMA_ZONE_REFCNT for that zone.
-
- Aug 26, 2013
-
-
Pekka Enberg authored
If a crashed OSv guest is restarted, ZFS mount causes a GPF in early startup:

VFS: mounting zfs at /usr
zfs: mounting osv/usr from device /dev/vblk1
Aborted

GDB backtrace points finger at zfs_rmnode():

#0 processor::halt_no_interrupts () at ../../arch/x64/processor.hh:212
#1 0x00000000003e7f2a in osv::halt () at ../../core/power.cc:20
#2 0x000000000021cdd4 in abort (msg=0x636df0 "Aborted\n") at ../../runtime.cc:95
#3 0x000000000021cda2 in abort () at ../../runtime.cc:86
#4 0x000000000044c149 in osv::generate_signal (siginfo=..., ef=0xffffc0003ffe7008) at ../../libc/signal.cc:44
#5 0x000000000044c220 in osv::handle_segmentation_fault (addr=72, ef=0xffffc0003ffe7008) at ../../libc/signal.cc:55
#6 0x0000000000366df3 in page_fault (ef=0xffffc0003ffe7008) at ../../core/mmu.cc:876
#7 <signal handler called>
#8 0x0000000000345eaa in zfs_rmnode (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:611
#9 0x000000000035650c in zfs_zinactive (zp=0xffffc0003d1de400) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1355
#10 0x0000000000345be1 in zfs_unlinked_drain (zfsvfs=0xffffc0003ddfe000) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:523
#11 0x000000000034f45c in zfsvfs_setup (zfsvfs=0xffffc0003ddfe000, mounting=true) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:881
#12 0x000000000034f7a4 in zfs_domount (vfsp=0xffffc0003de02000, osname=0x6b14cb "osv/usr") at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1016
#13 0x000000000034f98c in zfs_mount (mp=0xffffc0003de02000, dev=0x6b14d7 "/dev/vblk1", flags=0, data=0x6b14cb) at ../../bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1415
#14 0x0000000000406852 in sys_mount (dev=0x6b14d7 "/dev/vblk1", dir=0x6b14a3 "/usr", fsname=0x6b14d3 "zfs", flags=0, data=0x6b14cb) at ../../fs/vfs/vfs_mount.c:171
#15 0x00000000003eff97 in mount_usr () at ../../fs/vfs/main.cc:1415
#16 0x0000000000203a89 in do_main_thread (_args=0xffffc0003fe9ced0) at ../../loader.cc:215
#17 0x0000000000448575 in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59
#18 0x00000000004499d3 in std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::__lambda0>::_M_invoke(const std::_Any_data &) (__functor=...) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2071
#19 0x000000000037e602 in std::function<void ()>::operator()() const (this=0xffffc0003e170038) at ../../external/gcc.bin/usr/include/c++/4.8.1/functional:2468
#20 0x00000000003bae3e in sched::thread::main (this=0xffffc0003e170010) at ../../core/sched.cc:581
#21 0x00000000003b8c92 in sched::thread_main_c (t=0xffffc0003e170010) at ../../arch/x64/arch-switch.hh:133
#22 0x0000000000399c8e in thread_main () at ../../arch/x64/entry.S:101

The problem is that ZFS tries to check if the znode is an attribute directory and trips over zp->z_vnode being NULL. However, as explained in commit b7ee91ef ("zfs: port vop_lookup"), we don't even support extended attributes so drop the check completely for OSv.
-
- Aug 18, 2013
-
-
Avi Kivity authored
SIOCAIFADDR appends an address to the interface's address list instead of replacing it. This causes 'ifconfig' to display 0.0.0.0 (the first address configured) instead of the correct address obtained by dhcp. Fix by also deleting the existing address, if it exists.
-
- Aug 16, 2013
-
-
Christoph Hellwig authored
We'll need this for any pathname related actions.
-
Christoph Hellwig authored
Create a new dentry structure for pathname components, following the Linux VFS model. The vnodes are left as-is for now but are always fronted by dentries for pathname lookups. In a second step they will be moved to use non-pathname indices. [ penberg: fix open(O_CREAT|O_EXCL) breakage ]
-
Christoph Hellwig authored
-
- Aug 15, 2013
-
-
Glauber Costa authored
While planning to run tests on Xen today, I found my guests in current tip failing to mount ZFS. I spent some time debugging the memory allocator, since it was the culprit last time: only to find out we were not even reaching the memory allocator. I noticed then that ZFS was failing with error 75 -> EOVERFLOW. Looking further, one of our bootup messages was showing the disks as "0MB". That information is read from the xenstore, and it was being read correctly. However, by the time we calculate the disk size, this is no longer correct, indicating a stack corruption. I found out the culprit to be a subsequent call to xs_gather, which calls a variant of scanf internally. The call was being passed a %lu argument to read an int variable, which would explain the corruption if the # of sectors was right before it in the stack. Indeed, with this fix, ZFS fails in a different way now =)
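The bug class is a conversion specifier that is wider than the destination: on an LP64 target `%lu` stores 8 bytes, so pointing it at a 4-byte int overwrites whatever sits next to it. A minimal sketch of the matching form, using plain `sscanf` rather than xs_gather (the helper name is hypothetical):

```c
#include <stdio.h>

/* Parse an unsigned int with the matching %u conversion.
 * Returns the parsed value, or `fallback` if the text is not a number.
 * Using %lu here instead would make sscanf store an unsigned long
 * through an unsigned int pointer, which is undefined behaviour. */
unsigned int parse_uint_or(const char *text, unsigned int fallback)
{
    unsigned int value;

    if (sscanf(text, "%u", &value) != 1)
        return fallback;
    return value;
}
```

Compiling with format checking enabled (for example gcc's `-Wformat`) flags this kind of mismatch for the standard scanf family.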
-
- Aug 14, 2013
-
-
Avi Kivity authored
Wastes memory, esp. with power-of-two allocations.
-
- Aug 13, 2013
-
-
Glauber Costa authored
The xen block driver needs some extra state not needed for the network drivers. Namely, the same way virtio-blk does, we need to tell the block layer which are our strategy, read, and write functions. For that, we need some extra code that I am implementing in xenfront-blk.cc.
-
Glauber Costa authored
BSD expects its sector number to be already provided by the bio. We could add this field to the bio, but it is easier just to calculate it from the given offset. There are many places in which the bios are filled up, including many in the zfs code. So it is easier to change them just here.
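The calculation itself is a single division. A minimal sketch, assuming 512-byte sectors; the constant and function name are hypothetical and not taken from the bio code:

```c
#include <stdint.h>

/* Assumed sector size; the driver may obtain the real value elsewhere. */
#define SKETCH_SECTOR_SIZE 512u

/* Derive the sector number from a byte offset into the device. */
uint64_t offset_to_sector(uint64_t byte_offset)
{
    return byte_offset / SKETCH_SECTOR_SIZE;
}
```

For example, a byte offset of 4096 maps to sector 8.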
-
Glauber Costa authored
Simple implementation of BSD's bus_dma interface. Since we are constrained by virtual environments, we are able to cut out most of the things.
-
Glauber Costa authored
We will use it in our bus_dma implementation
-
Glauber Costa authored
Our version of biodone takes two arguments instead of one. Adjust it, passing the status in the second argument as expected. We could adjust our biodone() function to be the same as BSD's, but I decided to do the other way around, at least for now: we need locking and synchronization via cond vars at bio completion, and although the xenfront driver has its own lock for this, the other users rely on the internal lock for correctness. Adjusting them would mean adjusting their locking semantics, which although doable, is just more work than adjusting xenfront.
-
Glauber Costa authored
We need to add our headers first, but the rest should be ready to go.
-
Glauber Costa authored
It provides very simple queue / dequeue functions for the bios.
-
Glauber Costa authored
BSD code does not initialize its structures. It works well when memory is previously zeroed but not otherwise. A Xen hypervisor compiled in debug mode fills memory with a poison pattern, and then the code breaks for those variables. Force them to 0.
-
Glauber Costa authored
Those headers are needed by blkfront and netfront. Some of them are empty stubs as usual, but some are imported from BSD. I am bringing them in separately so it is obvious what they are here for.
-
Glauber Costa authored
The macros to calculate ring size are really gigantic and nested. Somewhere, somehow, gcc believes that one of the size calculations yields a variable size. It doesn't seem to be the case to me, but maybe we are using (or lacking) some compiler flag that can explain this. Although this is clearly suboptimal, let us settle for this for now. It should not be a huge problem unless we update xen headers.
-
Glauber Costa authored
Because softc is private (only a void pointer outside blkfront), we need a helper here to return the correct device from its softc representation. This will be used by the osv side to determine where to trigger IO to.
-