- Jan 09, 2014
-
-
Raphael S. Carvalho authored
This problem was found when running 'tests/tst-zfs-mount.so' multiple times. At the first time, all tests succeed, however, a subsequent run would fail at the test: 'mkdir /foo/bar', the error message reported that the target file already exists. The test basically creates a directory /foo/bar, rename it to /foo/bar2, then remove /foo/bar2. How could /foo/bar still be there? Quite simple. Our shutdown function calls unmount_rootfs() which will attempt to unmount zfs with the flag MNT_FOURCE, however, it's not being passed to zfs_unmount(), neither unmount_rootfs() tests itself the return status (which was always getting failures previously). So OSv is really being shutdown while there is remaining data waiting to be synced with the backing store. As a result, inconsitency. This problem was fixed by passing the flag to VFS_UNMOUNT which will now unmount the fs properly on sudden shutdowns. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 03, 2014
-
-
Raphael S. Carvalho authored
Currently, vflush is used in the unmount process to release remaining dentries. vflush in turn calls vevict that is releasing dentries that it doesn't own. This behavior is not correct neither good to the future of VFS. So Avi suggested switching to a different approach. We could only release those dentries owned by the mountpoint when unmounting it as there wouldn't be anything else in the dcache (given its functionality). The problem was fixed by doing the following steps: - Drop vflush calls in sys_umount2, make vevict an empty function, and remove vevict. - Created the function release_mp_dentries to release dentries of a mount point which will be called by VFS_UNMOUNT. It cannot be called before VFS_UNMOUNT as failures must be considered, neither after as the mount point would be considered busy. Don't respect this "rule", and that previously seen ZFS replay transaction error would happen. NOTE: vflush is currently duplicated in zfs unmount cases to address the problem above. This patch fixes this duplication as well. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
gcc -O3 complains about uninitialized use of 'nims'. In fact, inm_get_source() can return an error without setting nims which will be later read from RB_FOREACH_REVERSE_FROM macro. It looks like 'nims' is intended to hold the last value for which inm_get_source() returned success. The uninitialized access would happen if this function never succeded. I am not sure if this is possible in practice, but let's initialize nims to NULL, which will cause no iteration in RB_FOREACH_REVERSE_FROM macro. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Gcc comaplins about attempt to read 'size' via dereferencing a pointer in xdr_u_int() in case xdrs->x_op == XDR_ENCODE. However, in this case the size will be set from the switch case inside xdr_string() before xdr_u_int() is invoked. I think it's spurious because the code clearly assumes that xdrs->x_op cannot change between these two execution points. Let's initialize 'size' to 0 to make gcc happy. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 01, 2014
-
-
Nadav Har'El authored
Instead of the old C-style file-operation function types and fo_*() functions, since recently we have methods of the "file" class. All our filesystem code is now C++, and can use these methods directly. So this patch drops the old types and functions, and uses the class methods instead. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Drop the socket uma zone and replace it with calls to new and delete. This allows the constructor and destructor to be called, so we can add C++ objects to the socket structure. Take care to use the default-initializing form of the constructor since the socket code requires zero initialization. We lose the ability to limit the socket count, so we'll have to re-add it in the future if we want it. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
Each implementation of "struct file" needs to implement 8 different file operations. Most special file implementations, such as pipe, socketpair, epoll and timerfd, don't support many of these operations. We had in unsupported.h functions that can be reused for the unsupported operation, but this resulted in a lot of ugly boiler-plate code. Instead, this patch switches to a cleaner, more C++-like, method: It defines a new "file" subclass, called "special_file", which implements all file operations except close(), with a default implementation identical to the old unsupported.h implementations. The files of pipe(), socketpair(), timerfd() and epoll_create() now inherit from special_file, and only override the file operations they really want to implement. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 30, 2013
-
-
Tomasz Grabiec authored
This was the cause of poor ZFS performance in misc-fs-stress test. Before: Wrote 168.129 MB in 10.12 s = 16.610 Mb/s Wrote 194.688 MB in 10.00 s = 19.469 Mb/s Wrote 183.004 MB in 10.06 s = 18.186 Mb/s Wrote 167.754 MB in 10.28 s = 16.315 Mb/s After: Wrote 636.227 MB in 10.00 s = 63.623 Mb/s Wrote 666.979 MB in 10.00 s = 66.696 Mb/s Wrote 613.512 MB in 10.00 s = 61.350 Mb/s Wrote 573.502 MB in 10.00 s = 57.346 Mb/s Wrote 668.607 MB in 10.00 s = 66.857 Mb/s Wrote 630.920 MB in 10.00 s = 63.087 Mb/s It turned out that the limiting factor was the ARC cache. A check inside arc_tempreserve_space() was forcing txg to be synced too often (once every 400ms). The arc_c variable was only 16M (arc_c_min) which allowed to write only 8M per transaction. It turns out that arc_c depends on kmem_size() which is based on physmem which was never initialized. I would hold with commiting this yet because of several reasons, which I want to put under your consideration. While this improves write throughput it makes the boot time after make much longer, on my disk the boot time is increased from 1.5s to 10s. This is because zfs verifies the last 3 txgs upon mount. This patch increases txg size, which results in more data to check in the next boot. I'm working on solving this right now. Something worth noting is that while larger transactions sync less often incresing throughput they also sync longer increasing worst case latency. In my test the pauses get as high as 3 seconds with 1G of guest memory. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
While (unfortunately) C++ doesn't support designated initializers, and the compiler rejects them, one instance has survived in xenbus. Strangely, gcc 4.8.2 generates correct code, while gcc 4.8.0 fails with an internal compiler error, instead of both of them rejecting the code. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 26, 2013
-
-
Avi Kivity authored
Backported from FreeBSD r242252. Improves netperf by about 10%. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
This is more useful if there is no ordering between the two numbers (either one can be ahead). Change BYTES_THIS_ACK to return unsigned, to prevent an unsigned division from turning into a signed division. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Add comparison operators that use modulo arithmetic to order sequence numbers, and use them to replace SEQ_LT() and friends, increasing code readability. As a consequence std::min() and std::max() can be used instead of SEQ_MIN() and SEQ_MAX(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
tcp sequence numbers are similar to integers, but have different comparison operations. Separate them into a class so we don't mix the two. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
inline functions can be overloaded and are less nasty than macros in other ways (like evaluating their arguments only once). Note we can't touch ntohl() itself, since it is defined to be an out-of-line function by libc. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 24, 2013
-
-
Avi Kivity authored
Helps making bsd header changes that xen includes. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
We use sched::thread::attr to pass parameters to sched::thread creation, i.e., create a thread with non-default stack parameters, pinned to a particular CPU, or a detached thread. Previously we had constructors taking many combinations of stack size (integer), pinned cpu (cpu*) and detached (boolean), and doing "the right thing". However, this makes the code hard to read (what does attr(4096) specify?) and the constructors hard to expand with new parameters. Replace the attr() constructors with the so-called "named parameter" idiom: attr now only has a null constructor attr(), and one modifies it with calls to pin(cpu*), detach(), or stack(size). For example, attr() // default attributes attr().pin(sched::cpus[0]) // pin to cpu 0 attr().stack(4096).pin(sched::cpus[0]) // pin and non-default stack and so on. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Dmitry Fleytman authored
This patch applies bugfix published on FreeBSD list at Feb 2013: http://lists.freebsd.org/pipermail/svn-src-stable-9/2013-February/003928.html LRO mechanism is broken on systems without IP checksum verification offload. Due to improper checksum verification RX packets omit LRO path and go directly to TCP stack which is not good for performance. EC2 Xen is one example of such a system. This bug is one of the reasons we see bad performance on Amazon. Some test results w/ and w/o the fix: Buffer size Before After Improvement % TCP TX 32 557.52 1386.28 149 64 552.38 1385.99 151 128 546.43 1401.46 156 256 565.25 1382.28 145 512 557.32 1375.23 147 1024 549.71 1356.69 147 2048 551.11 1371.92 149 4096 556.13 1383.67 149 8192 559.49 1364.05 144 16384 567.25 1366.48 141 32768 546.18 1366.63 150 65536 553.4 1353.87 145 TCP RX 32 107.37 105.48 -2 64 187.56 179.9 -4 128 297.16 301.71 2 256 300.47 503.92 68 512 294.76 826.13 180 1024 299.95 1916.69 539 2048 287.04 1924.44 570 4096 300.78 1929.37 541 8192 304.52 1934.02 535 16384 305.04 1957.54 542 32768 309 1921.84 522 65536 296.48 1935.41 553 Still we are pretty far from Linux, there are other problems to be fixed. Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 20, 2013
-
-
Asias He authored
if_getinfo is shorter and more consistent all the other if_ functions name, e.g., if_init, if_ioctl. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Error message: bsd/sys/kern/uipc_socket.cc: In function 'socket* soalloc(vnet*)': bsd/sys/kern/uipc_socket.cc:245:58: error: expected ')' before 'PRIx64' uipc_d("soalloc() so=%" PRIx64, (uint64_t)so); ^ This was introduced in one of these commits: c506516b14d32e416f4a9247869a1a89f7e9c413 406a93ed As suggested by Pekka including <cinttypes> is the proper fix for C++11. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 19, 2013
-
-
Avi Kivity authored
It's just to painful to do this piecemeal. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Vlad Zolotarov authored
Added ifnet->if_get_if_info() method to collect the internally gathered statistics into the standard if_data struct. Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Reviewed-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Vlad Zolotarov authored
Added IFCAP_HWSTATS iface capability indicating that device driver and HW are capable to handle the statistics internally. Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Reviewed-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Over the process of creating a zpool, history_str_get() will be eventually called by zfs_ioc_pool_create(). history_str_get() will not work as expected as it fully relies on the function copyinstr() which is currently unimplemented. After that, spa_create() will be called with history_str storing trash, thus making the spa_history_log filled with completely wrong entries. Let's fix this problem simply by implementing copyinstr. By the way, copystr has basically the same semantics as copyinstr, so let's implement it as well. It's not being used anywhere, but who knows? It might be needed in the future by some bsd compliant application. As far as I know, this problem didn't manifest *visibly* anywhere as the spa history log wasn't intentionally used by us yet. I found the problem while auditing code. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 16, 2013
-
-
Avi Kivity authored
bsd defines some m_ macros, for example m_flags, to save some typing. However if you have a variable of the same name in another header, for example m_flags, have fun trying to compile your code. Expand the code in place and eliminate the macros. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
The bsd specific ones are nenumbered to avoid clashes. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Since most SOL_* macros are equivalent to the IPPROTO_* defines, only one define needs to be changed. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Move them to a common include file. Since they're defined externally, there is no real conflict. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Unused. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 12, 2013
-
-
Avi Kivity authored
Fix a unicode character in the author's name. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Vlad authored
Fix some compilation errors that would arrise when NDEBUG is defined: - tests/misc-tcp.cc: missing #include<iostream> that would come from include/boost/assert.hpp: line 81 when NDEBUG is not defined. - bsd/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_dataset.c: when NDEBUG is not defined assert() is defined to __assert_fail, which has "noreturn" attribute, which satisfies the compiler. When NDEBUG is defined and assert() completly compiles out, the compiler start complaining about the missing "return" statment. Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 10, 2013
-
-
Raphael S. Carvalho authored
Currently, namei() does vget() unconditionally if no dentry is found. This is wrong because the path can be a hard link that points to a vnode that's already in memory. To fix the problem: - Use inode number as part of the hash in vget() - Use vn_lookup() in vget() to make sure we have one vnode in memory per inode number. - Push the vget() calls down to individual filesystems and make VOP_LOOKUP return an vnode Changes since v2: - v1 dropped lock in vn_lookup, thus assert that vnode_lock is held. Changes since v3: - Fix lock ordering issue in dentry_lookup. The lock respective to the parent node must be acquired before dentry_lookup and released after the process is done. Otherwise, a second thread looking up for the same dentry may take the 'NULL' path incorrectly. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
This patch fixes the error codes in four error cases: 1. unlink() of a directory used to return EPERM (as in Posix), and now returns EISDIR (as in Linux). 2. rmdir() of a non-empty directory used to return EEXIST (as in Posix) and now returns ENOTEMPTY (as in Linux). 3. rmdir() of a regular file (non-directory) used to return EBADF and now returns ENOTDIR (as in Linux). 4. readdir() of a regular file (non-directory) used to return EBADF and now returns ENOTDIR (as in Linux). This patch also adds a test, tst-remove.cc, for the various unlink() and rmdir() success and failure modes. Fixes #123. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 09, 2013
-
-
Pekka Enberg authored
This reverts commit e4aad1ba. It causes tst-vfs.so to just hang. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 08, 2013
-
-
Raphael S. Carvalho authored
Currently, namei() does vget() unconditionally if no dentry is found. This is wrong because the path can be a hard link that points to a vnode that's already in memory. To fix the problem: - Use inode number as part of the hash in vget() - Use vn_lookup() in vget() to make sure we have one vnode in memory per inode number. - Push the vget() calls down to individual filesystems and make VOP_LOOKUP return an vnode - Drop lock in vn_lookup() and assert that vnode_lock is held. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 05, 2013
-
-
Pekka Enberg authored
OSv does not support xattrs in ZFS. Fix up a build breakage reported by Glauber: /home/ubuntu/osv/bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c: In function ‘zfs_setattr’: /home/ubuntu/osv/bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:2152:12: error: ‘xoap’ may be used uninitialized in this function [-Werror=uninitialized] Reported-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Probably the worst indented file in the whole tree. Set Eclipse Ctrl-I on it. Also fix a unicode character in the author's name. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-