- Feb 12, 2014
-
-
Zhi Yong Wu authored
When control flow reaches the bottom inner loop in namei(), the pointer p points to either a '\0' or a '/' character because of the break condition in the upper inner loop: for (i = 0; i < PATH_MAX; i++) { if (*p == '\0' || *p == '/') { break; } name[i] = *p++; } So the "while" loop is never executed and we can eliminate it as dead code. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Zhi Yong Wu <zwu.kernel@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 10, 2014
-
-
Asias He authored
bio->bio_private is used by low-level device drivers, e.g. virtio-scsi. In multiplex_strategy, when one bio is split into multiple smaller bios, we should copy bio_private into the newly created bios. Signed-off-by:
Asias He <asias@cloudius-systems.com>
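A minimal sketch of the idea in the commit above, assuming a pared-down request descriptor. The type bio_sketch and the function split_bio are hypothetical and only mirror the bio_private field named in the message; the real struct bio and multiplex_strategy are not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, reduced request descriptor; the real struct bio has more fields.
struct bio_sketch {
    std::size_t offset;
    std::size_t length;
    void* bio_private;   // opaque data owned by the low-level driver
};

// Splits parent into chunks of at most max_len bytes and carries
// bio_private over to every child, as the commit describes.
std::vector<bio_sketch> split_bio(const bio_sketch& parent, std::size_t max_len)
{
    std::vector<bio_sketch> children;
    if (max_len == 0) {
        return children;         // nothing sensible to do with a zero chunk size
    }
    std::size_t done = 0;
    while (done < parent.length) {
        std::size_t len = std::min(max_len, parent.length - done);
        children.push_back({parent.offset + done, len, parent.bio_private});
        done += len;
    }
    return children;
}
```

Without the copy, a driver that reads bio_private on a child request would see an unset pointer.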
-
- Feb 07, 2014
-
-
Raphael S. Carvalho authored
The mismatch between vnode and znode refcounts was found while working on the leak series, but I wasn't able to come up with a good solution at that time. This patch addresses a problem which could potentially leak znode objects. The function vrele from the VFS layer, along with the changes made to zfs_inactive, prevents zfs_inactive itself from working properly on the same znode more than once. Simply put, zfs_inactive isn't able to release more than one refcnt of the same znode. So the actual problem takes effect when a znode holds two refcounts of its own. When that happens, the underlying znode object stays around 'forever' (at least till OSv is switched off ;-) ) - Scenario example where this problem would take place: * Consider that you have opened a file for the link A, so the znode structure is created with refcnt set to 1. * Afterwards, you open a file for the link B, which has the same inode as the link A. Another znode isn't created; instead, the refcnt of the znode used for the link A is bumped to 2. - How to fix the problem: * First, allow zfs_inactive to work on the same znode till the refcnt reaches zero. To do that, vp->vdata must only be set to NULL when we're sure that the znode will actually be destroyed. So let's do it conditionally on the znode refcnt from now on. NOTE: also properly initialize the field z_vnode on every znode creation. * Then finally, fix our vrele to call VOP_INACTIVE even when the vnode object isn't supposed to be destroyed, so that it releases the refcnt of the znode properly. After all, zfs_zinactive, called by zfs_inactive, only destroys the znode object if its refcnt reaches 0. This also synchronizes the vnode refcnts with the znode ones. 'scripts/test.py -s;' succeeded. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
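The two-link scenario and the fix described above can be modelled in a few lines. This is a toy model, not the OSv or ZFS code: every type and function name ending in _sketch is hypothetical, and only the rule "detach vdata when the znode refcnt reaches zero, and always call the inactive hook from vrele" is taken from the commit message.

```cpp
// Hypothetical miniature model of a znode and the vnode that refers to it.
struct znode_sketch {
    int refcnt = 0;
    bool destroyed = false;
};

struct vnode_sketch {
    int refcnt = 0;
    znode_sketch* vdata = nullptr;
};

// Models opening a link that resolves to the same inode:
// both counters are bumped together.
void open_link_sketch(vnode_sketch& vp, znode_sketch& zp)
{
    vp.vdata = &zp;
    ++vp.refcnt;
    ++zp.refcnt;
}

// Models zfs_inactive after the fix: drop one znode reference and
// clear vdata only when the znode is really going away.
void inactive_sketch(vnode_sketch& vp)
{
    znode_sketch* zp = vp.vdata;
    if (zp == nullptr) {
        return;
    }
    if (--zp->refcnt == 0) {
        zp->destroyed = true;
        vp.vdata = nullptr;
    }
}

// Models vrele after the fix: the inactive hook runs on every release,
// even when the vnode itself survives.
void vrele_sketch(vnode_sketch& vp)
{
    --vp.refcnt;
    inactive_sketch(vp);
}
```

Opening links A and B and then releasing twice leaves both counters at zero and the znode destroyed; clearing vdata on the first release would leave the znode stuck at refcnt 1.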
-
- Feb 06, 2014
-
-
Raphael S. Carvalho authored
/proc must be unmounted to release the refcnts that pertain to the root mountpoint, i.e. zfs. Leaving it mounted prevented zfs_umount from releasing the mp dentries properly, and thus VOP_INACTIVE from being called on the respective vnodes. Found the problem while dumping the mountpoint refcnts. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 28, 2014
-
-
Takuya ASADA authored
Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 22, 2014
-
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 21, 2014
-
-
Nadav Har'El authored
This patch fixes chdir() on a normal file, which used to succeed (!?), but now will fail as it should, with ENOTDIR. The patch also adds an exhaustive test for chdir's success and error cases. Before the latest chdir() patches, most of these tests would fail, and now all of them succeed. This test is standard C++ & Posix code, so it can also be run on Linux. This is important for verifying that Linux really does whatever we expect from OSv. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This patch adds the O_DIRECTORY flag to sys_open(), which causes the open to fail with ENOTDIR if the given file is any type of file but a directory. We need this flag as part of a correct implementation of chdir() (which should fail on a non-directory file), and it is also required for Linux compatibility (the O_DIRECTORY flag has existed since Linux 2.1.126). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
I don't know how chdir() ever worked - apparently it didn't! It took an argument "pathname", and then declared a local "path" and used that, not pathname, as the path :-) Obviously, a call to task_conv, which converts a relative "pathname" to an absolute "path", was missing... chdir() is still a mess and incompatible in the error cases with Linux's chdir(). I'll fix that, and add a test, in a follow-up patch. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 17, 2014
-
-
Pekka Enberg authored
Add a '/proc' directory to OSv image and mount procfs on it. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
This patch adds a simple Linux compatible procfs filesystem. It currently implements a "/proc/self/maps" file which is looked up by OpenJDK during startup. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Add 'struct file' to VOP_READ API. This is needed for procfs which generates file contents at open() time and read() must operate on it, not the vnode. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 16, 2014
-
-
Pekka Enberg authored
OpenJDK looks up "/dev/urandom" at startup but works just fine without it. There's no need to display an error message in OSv if that happens. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 09, 2014
-
-
Raphael S. Carvalho authored
This problem was found when running 'tests/tst-zfs-mount.so' multiple times. On the first run, all tests succeed; however, a subsequent run fails at the test 'mkdir /foo/bar', with an error message reporting that the target file already exists. The test basically creates a directory /foo/bar, renames it to /foo/bar2, then removes /foo/bar2. How could /foo/bar still be there? Quite simple. Our shutdown function calls unmount_rootfs(), which attempts to unmount zfs with the flag MNT_FORCE; however, the flag is not passed to zfs_unmount(), nor does unmount_rootfs() itself test the return status (which was always a failure previously). So OSv is really being shut down while there is remaining data waiting to be synced with the backing store. The result is inconsistency. This problem was fixed by passing the flag to VFS_UNMOUNT, which now unmounts the fs properly on sudden shutdowns. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 03, 2014
-
-
Raphael S. Carvalho authored
Start using spaces instead of tabs and surround all single-line control statements with curly braces. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
It will be useful for making better and safer VFS decisions in the future, for example, avoiding code that uses the absolute path to determine something. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
newmp->m_covered must be released if not NULL. Found this problem while dumping dcache content. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Currently, vflush is used in the unmount process to release remaining dentries. vflush in turn calls vevict, which releases dentries that it doesn't own. This behavior is neither correct nor good for the future of VFS. So Avi suggested switching to a different approach: when unmounting a mount point, release only the dentries owned by it, as there wouldn't be anything else in the dcache (given its functionality). The problem was fixed by doing the following steps: - Drop vflush calls in sys_umount2, make vevict an empty function, and remove vevict. - Create the function release_mp_dentries to release dentries of a mount point, which will be called by VFS_UNMOUNT. It cannot be called before VFS_UNMOUNT, as failures must be considered, nor after, as the mount point would be considered busy. If this "rule" isn't respected, the previously seen ZFS replay transaction error would happen. NOTE: vflush is currently duplicated in zfs unmount cases to address the problem above. This patch fixes this duplication as well. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 01, 2014
-
-
Nadav Har'El authored
Instead of the old C-style file-operation function types and fo_*() functions, we have recently gained methods of the "file" class. All our filesystem code is now C++, and can use these methods directly. So this patch drops the old types and functions, and uses the class methods instead. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
Each implementation of "struct file" needs to implement 8 different file operations. Most special file implementations, such as pipe, socketpair, epoll and timerfd, don't support many of these operations. We had in unsupported.h functions that can be reused for the unsupported operation, but this resulted in a lot of ugly boiler-plate code. Instead, this patch switches to a cleaner, more C++-like, method: It defines a new "file" subclass, called "special_file", which implements all file operations except close(), with a default implementation identical to the old unsupported.h implementations. The files of pipe(), socketpair(), timerfd() and epoll_create() now inherit from special_file, and only override the file operations they really want to implement. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
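The special_file approach described above can be sketched with a reduced interface. The class names ending in _sketch, the timer_like_file class, and the choice of EBADF as the "unsupported" result are all assumptions made for illustration; the commit does not state which operations or error codes the real special_file uses beyond leaving close() to the derived class.

```cpp
#include <cerrno>
#include <cstddef>

// Hypothetical reduced interface: three file operations, all pure virtual.
class file_sketch {
public:
    virtual ~file_sketch() = default;
    virtual int read(void* buf, std::size_t len) = 0;
    virtual int write(const void* buf, std::size_t len) = 0;
    virtual int close() = 0;
};

// Supplies an "unsupported" default for everything except close().
// EBADF is an assumed placeholder error code.
class special_file_sketch : public file_sketch {
public:
    int read(void*, std::size_t) override { return EBADF; }
    int write(const void*, std::size_t) override { return EBADF; }
};

// A special file that overrides only what it supports.
class timer_like_file : public special_file_sketch {
public:
    int read(void*, std::size_t) override { return 0; }
    int close() override { return 0; }
};
```

Each concrete file type now states only the operations it implements, and the boilerplate for the rest lives in one place.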
-
- Dec 19, 2013
-
-
Asias He authored
Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 16, 2013
-
-
Asias He authored
If iov->iov_len == 0, it will loop forever. uiomove() will take care of the zero iov len case. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
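The commit above does not quote the loop that hung, so the following is only a sketch of the general hazard: a copy loop over an iovec array must advance past an entry whose iov_len is 0 instead of retrying it. The function scatter_copy is hypothetical; struct iovec comes from the standard <sys/uio.h> header.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <sys/uio.h>

// Copies up to len bytes from src into the buffers described by iov.
// Empty entries are skipped because the loop index always advances.
std::size_t scatter_copy(const char* src, std::size_t len,
                         const iovec* iov, int iovcnt)
{
    std::size_t copied = 0;
    for (int i = 0; i < iovcnt && copied < len; i++) {
        std::size_t n = std::min(iov[i].iov_len, len - copied);
        if (n == 0) {
            continue;           // zero-length entry: move on, do not spin
        }
        std::memcpy(iov[i].iov_base, src + copied, n);
        copied += n;
    }
    return copied;
}
```

A loop that instead waits for the current entry to make progress never terminates when that entry is empty.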
-
- Dec 13, 2013
-
-
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 10, 2013
-
-
Raphael S. Carvalho authored
Currently, namei() does vget() unconditionally if no dentry is found. This is wrong because the path can be a hard link that points to a vnode that's already in memory. To fix the problem: - Use inode number as part of the hash in vget() - Use vn_lookup() in vget() to make sure we have one vnode in memory per inode number. - Push the vget() calls down to individual filesystems and make VOP_LOOKUP return a vnode Changes since v2: - v1 dropped lock in vn_lookup, thus assert that vnode_lock is held. Changes since v3: - Fix lock ordering issue in dentry_lookup. The lock for the parent node must be acquired before dentry_lookup and released after the process is done. Otherwise, a second thread looking up the same dentry may take the 'NULL' path incorrectly. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
This patch fixes the error codes in four error cases: 1. unlink() of a directory used to return EPERM (as in Posix), and now returns EISDIR (as in Linux). 2. rmdir() of a non-empty directory used to return EEXIST (as in Posix) and now returns ENOTEMPTY (as in Linux). 3. rmdir() of a regular file (non-directory) used to return EBADF and now returns ENOTDIR (as in Linux). 4. readdir() of a regular file (non-directory) used to return EBADF and now returns ENOTDIR (as in Linux). This patch also adds a test, tst-remove.cc, for the various unlink() and rmdir() success and failure modes. Fixes #123. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 09, 2013
-
-
Nadav Har'El authored
I tried using a test which called mknod() (to create an empty regular file). Despite us having an mknod() implementation, it didn't work, and failed on lookup of the symbol __xmknod. Turns out that in glibc, mknod() is source-only, and converted to the ABI function which is __xmknod, whose first parameter is a version number _MKNOD_VER_LINUX (0 on x86-64 Linux). So this patch implements __xmknod, and now mknod() works. Note we already had the same kind of trick for __xstat(), needed so that stat() would work. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
main.cc was still using tab characters instead of spaces as our coding conventions dictate. Reindent it, using Eclipse's ctrl-I. This patch doesn't change anything else. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Pekka Enberg authored
This reverts commit e4aad1ba. It causes tst-vfs.so to just hang. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 08, 2013
-
-
Raphael S. Carvalho authored
Currently, namei() does vget() unconditionally if no dentry is found. This is wrong because the path can be a hard link that points to a vnode that's already in memory. To fix the problem: - Use inode number as part of the hash in vget() - Use vn_lookup() in vget() to make sure we have one vnode in memory per inode number. - Push the vget() calls down to individual filesystems and make VOP_LOOKUP return a vnode - Drop lock in vn_lookup() and assert that vnode_lock is held. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 04, 2013
-
-
Avi Kivity authored
Everyone is now overriding file's virtual functions; we can make them pure virtual and remove fileops completely. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Unused. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Everyone switched to the nifty variadic type. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Derived file objects will be initialized by the class constructor, no need for fo_init(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Once file::~file() is called, virtual functions no longer dispatch to the derived type (which has since been destroyed) but to the base type, which is uninteresting. Move the call to close() from the destructor to fdrop(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
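The C++ rule behind the commit above is that a virtual call made from a base-class destructor runs the base-class version, because the derived part of the object is already gone. The sketch below contrasts the two arrangements; all class names and the drop_last_ref function are hypothetical, with drop_last_ref standing in for the role the commit assigns to fdrop().

```cpp
#include <string>
#include <vector>

using event_log = std::vector<std::string>;

// Before: close() is called from the destructor, so the override is never used.
class file_in_dtor {
public:
    explicit file_in_dtor(event_log& log) : _log(log) {}
    virtual ~file_in_dtor() { close(); }   // resolves to file_in_dtor::close
    virtual void close() { _log.push_back("base close"); }
protected:
    event_log& _log;
};

class pipe_in_dtor : public file_in_dtor {
public:
    using file_in_dtor::file_in_dtor;
    void close() override { _log.push_back("derived close"); }
};

// After: the destructor does nothing special; the last-reference path closes first.
class file_explicit {
public:
    explicit file_explicit(event_log& log) : _log(log) {}
    virtual ~file_explicit() = default;
    virtual void close() { _log.push_back("base close"); }
protected:
    event_log& _log;
};

class pipe_explicit : public file_explicit {
public:
    using file_explicit::file_explicit;
    void close() override { _log.push_back("derived close"); }
};

// Models dropping the last reference: call close() while the object
// is still fully constructed, then destroy it.
void drop_last_ref(file_explicit* fp)
{
    fp->close();
    delete fp;
}
```

Destroying a pipe_in_dtor records "base close", while drop_last_ref() on a pipe_explicit records "derived close".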
-
- Dec 03, 2013
-
-
Avi Kivity authored
The default is to dispatch directly to the corresponding member of f_ops, but that can be overridden. The fo_*() functions are redirected to dispatch via the virtual functions. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-