Commits · fbd7d0846f3daa980b7182ca08a7f83248d24853 · Verlässliche Systemsoftware / projects / osv

Feb 12, 2014

vfs: remove the dead code in namei() · d87b5c2b

Zhi Yong Wu authored 11 years ago


When control flow reaches the bottom inner loop in namei(), the pointer p
points to either a '\0' or a '/' character because of the break condition of
the upper inner loop:


        for (i = 0; i < PATH_MAX; i++) {
            if (*p == '\0' || *p == '/') {
                break;
            }
            name[i] = *p++;
        }

So the "while" loop will never be executed and we can eliminate it as dead
code.
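
The invariant can be checked in isolation with a small sketch. The helper
name scan_component and its cap parameter are hypothetical, not taken from
namei(). The sketch assumes the component fits in the buffer, so the loop
always ends through the break:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the upper inner loop quoted above.
// Copies one path component from p into name (at most cap - 1 bytes),
// NUL-terminates it, and returns the position where scanning stopped.
const char* scan_component(const char* p, char* name, std::size_t cap)
{
    assert(cap > 0);
    std::size_t i = 0;
    for (; i + 1 < cap; i++) {
        if (*p == '\0' || *p == '/') {
            break;
        }
        name[i] = *p++;
    }
    name[i] = '\0';
    // If the loop ended through the break, *p is '\0' or '/' here, so a
    // following "skip to the next separator" loop has nothing left to do.
    return p;
}
```

If the buffer fills up before a separator is seen, the returned pointer may
sit in the middle of a component, so a caller would need to handle that case
separately.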

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Zhi Yong Wu <zwu.kernel@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d87b5c2b

Feb 10, 2014

bio: Copy bio_private in multiplex_strategy · f8eace86

Asias He authored 11 years ago

bio->bio_private is used by low-level device drivers, e.g. virtio-scsi.
In multiplex_strategy, when one bio is split into multiple smaller
bios, we should copy bio_private into the newly created bios.
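
A minimal sketch of the idea, using a simplified stand-in structure. The
names bio_sketch and split_bio are hypothetical, and the real struct bio
and multiplex_strategy carry more state than this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a block I/O request.
struct bio_sketch {
    std::size_t offset;
    std::size_t length;
    void* bio_private;   // opaque pointer owned by the low-level driver
};

// Split parent into chunks of at most max_len bytes. Every child inherits
// the parent's bio_private so the driver sees the same context on each one.
std::vector<bio_sketch> split_bio(const bio_sketch& parent, std::size_t max_len)
{
    assert(max_len > 0);
    std::vector<bio_sketch> children;
    std::size_t done = 0;
    while (done < parent.length) {
        std::size_t len = std::min(max_len, parent.length - done);
        children.push_back({parent.offset + done, len, parent.bio_private});
        done += len;
    }
    return children;
}
```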

Signed-off-by: Asias He <asias@cloudius-systems.com>

f8eace86

Feb 07, 2014

vfs/zfs: Sync vnode and znode refcounts · ca805bdb

Raphael S. Carvalho authored 11 years ago


The mismatch between vnode and znode refcount was found while working on
the leak series, but I wasn't able to come up with a good solution at
that time.

This patch addresses a problem which could potentially leak znode objects.

The function vrele from the VFS layer, along with the changes made to
zfs_inactive, prevents zfs_inactive itself from working properly on the
same znode more than once. Simply put, zfs_inactive isn't able to release
more than 1 refcnt of the same znode.

So the actual problem shows up when a znode holds two refcounts of its
own. When that happens, the underlying znode object
would stay around 'forever' ( at least till OSv is switched off ;-) )

- Scenario example where this problem would take place:

* Consider that you have opened a file for the link A, so the znode structure
will be created with refcnt set to 1.

* Afterwards, you open a file for the link B which has the same inode as the
link A. Another znode wouldn't be created, but instead the refcnt of the same
znode used for the link A would be bumped, thus 2.

- How to fix the problem:

* First, allow zfs_inactive to work on the same znode till the refcnt reaches
zero. To do that, vp->vdata must only be set to NULL when we're sure that
znode will be actually destroyed. So let's do it conditionally regarding
the znode refcnt from now on.
NOTE: also properly initialize the field z_vnode on every znode creation.

* Then finally, fix our vrele to call VOP_INACTIVE even when the vnode object
isn't supposed to be destroyed. So it would release the refcnt of the znode
properly. After all, zfs_zinactive called by zfs_inactive would only destroy
the znode object if its refcnt reaches 0.
It would also synchronize the vnode refcnts with the znode ones.
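
The two steps can be modeled with a toy sketch. All types and functions
below (znode_sketch, vnode_sketch, inactive_sketch, vrele_sketch) are
hypothetical simplifications that only mimic the refcount bookkeeping
described above, not the real VFS or ZFS code:

```cpp
#include <cassert>

// Hypothetical stand-ins for the real structures.
struct znode_sketch {
    int refcnt = 0;
    bool destroyed = false;
};

struct vnode_sketch {
    int refcnt = 0;
    znode_sketch* data = nullptr;   // plays the role of vp->vdata
};

// Models the inactive hook: drop one znode reference, and detach the
// znode from the vnode only when the znode is really destroyed.
void inactive_sketch(vnode_sketch& vp)
{
    znode_sketch* zp = vp.data;
    if (zp == nullptr) {
        return;
    }
    if (--zp->refcnt == 0) {
        zp->destroyed = true;
        vp.data = nullptr;
    }
}

// Models vrele: the inactive hook runs on every release, even when the
// vnode itself survives, so both counters move in lock step.
void vrele_sketch(vnode_sketch& vp)
{
    assert(vp.refcnt > 0);
    --vp.refcnt;
    inactive_sketch(vp);
}
```

With two references taken (links A and B in the scenario above), the first
vrele_sketch call leaves the znode alive with one reference, and the second
call destroys it.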

'scripts/test.py -s;' succeeded.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ca805bdb

Feb 06, 2014

vfs: Unmount /proc to release refcnts pertained to root · 0a823ad2

Raphael S. Carvalho authored 11 years ago


/proc must be unmounted to release refcnts which pertain to the root
mountpoint, i.e. zfs.
Leaving it mounted was preventing zfs_umount from releasing the mp dentries
properly, and thus VOP_INACTIVE from being called on the respective vnodes.

Found the problem while dumping the mountpoint refcnts.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0a823ad2

Jan 28, 2014

Add F_SETLK, F_GETLK stub on fcntl · f3fe2238

Takuya ASADA authored 11 years ago


Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f3fe2238

Add fchown() stub · 592c572d

Takuya ASADA authored 11 years ago


Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

592c572d

Add fchmod() stub · 6e063227

Takuya ASADA authored 11 years ago


Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

6e063227

Jan 22, 2014

include: Move mmu.hh to include/osv · 9cb900b7

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9cb900b7

include: Move sched.hh to include/osv · fae5693e

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

fae5693e

Jan 21, 2014

chdir(): Fix error path, and add test · 4ae8779e

Nadav Har'El authored 11 years ago


This patch fixes chdir() on a normal file, which used to succeed (!?),
but now will fail as it should, with ENOTDIR.

The patch also adds an exhaustive test for chdir's success and error cases.
Before the latest chdir() patches, most of these tests would fail, and now
all of them succeed.

This test is standard C++ and POSIX code, so it can also be run on Linux.
This is important for verifying that Linux really behaves the way we expect
OSv to behave.
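
For illustration, the regular-file case can be probed with plain POSIX
calls. This is a minimal sketch, not the test added by this patch. The
function name and the scratch path under /tmp are assumptions:

```cpp
#include <cassert>
#include <cerrno>
#include <stdlib.h>   // mkstemp
#include <unistd.h>   // chdir, close, unlink

// Creates a scratch regular file, tries to chdir() into it, and returns
// the resulting errno. Returns 0 if chdir() unexpectedly succeeded and
// -1 if the scratch file could not be created.
int chdir_into_regular_file()
{
    char path[] = "/tmp/chdir-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    close(fd);
    int err = (chdir(path) == 0) ? 0 : errno;
    unlink(path);
    return err;
}
```

On a system that follows the behavior described above, the function returns
ENOTDIR.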

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4ae8779e

open(): add O_DIRECTORY flag · 05fc5774

Nadav Har'El authored 11 years ago

This patch adds the O_DIRECTORY flag to sys_open(), which causes the open
to fail with ENOTDIR if the given file is any type of file but a directory.

We need this flag as part of a correct implementation of chdir() (which
should fail on a non-directory file), and it is also required for Linux
compatibility (the O_DIRECTORY flag exists since Linux 2.1.126).
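
A short sketch of how the flag behaves from the caller's side, using only
standard open(). The helper names and the scratch path under /tmp are
assumptions made for this example:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>    // open, O_DIRECTORY
#include <stdlib.h>   // mkstemp
#include <unistd.h>   // close, unlink

// Opens path only if it is a directory. Returns the file descriptor on
// success or the negated errno on failure.
int open_directory_only(const char* path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    return fd >= 0 ? fd : -errno;
}

// Tries the same call on a freshly created regular file and returns the
// result (expected: -ENOTDIR). Returns 1 if the scratch file could not
// be created.
int open_regular_file_as_directory()
{
    char path[] = "/tmp/odir-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return 1;
    }
    close(fd);
    int result = open_directory_only(path);
    if (result >= 0) {
        close(result);
    }
    unlink(path);
    return result;
}
```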

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

05fc5774

Fix non-functional chdir() · 0f9bf9b6

Nadav Har'El authored 11 years ago

I don't know how chdir() ever worked - apparently it didn't!

It took an argument "pathname", and then declared a local "path" and used
that, not pathname, as the path :-) Obviously, a call to task_conv, which
converts a relative "pathname" to an absolute "path", was missing...

chdir() is still a mess and incompatible in the error cases with Linux's
chdir(). I'll fix that, and add a test, in a follow-up patch.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0f9bf9b6

Jan 17, 2014

vfs: mount procfs at boot · b21423ec

Pekka Enberg authored 11 years ago


Add a '/proc' directory to OSv image and mount procfs on it.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

b21423ec

proc filesystem · e293854f

Pekka Enberg authored 11 years ago


This patch adds a simple Linux compatible procfs filesystem.  It
currently implements a "/proc/self/maps" file which is looked up by
OpenJDK during startup.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e293854f

vfs: 'struct file' to VOP_READ · 9f68c2d9

Pekka Enberg authored 11 years ago


Add 'struct file' to VOP_READ API. This is needed for procfs which
generates file contents at open() time and read() must operate on it,
not the vnode.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9f68c2d9

Jan 16, 2014

devfs: Remove "device not found" printout · f40fdbae

Pekka Enberg authored 11 years ago

OpenJDK looks up "/dev/urandom" at startup but works just fine without
it. There's no need to display an error message in OSv if that happens.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f40fdbae

Jan 09, 2014

zfs: Fix on-disk data inconsistency on shutdown · 2d93af3b

Raphael S. Carvalho authored 11 years ago

This problem was found when running 'tests/tst-zfs-mount.so' multiple times.
The first time, all tests succeed; however, a subsequent run would
fail at the test 'mkdir /foo/bar', with an error message reporting
that the target file already exists.

The test basically creates a directory /foo/bar, renames it to /foo/bar2,
then removes /foo/bar2. How could /foo/bar still be there?

Quite simple. Our shutdown function calls unmount_rootfs(), which
attempts to unmount zfs with the flag MNT_FORCE. However, the flag is not
passed on to zfs_unmount(), nor does unmount_rootfs() itself check the
return status (which was always a failure previously).
So OSv is really being shut down while there is remaining data waiting to
be synced with the backing store. As a result, inconsistency.

This problem was fixed by passing the flag to VFS_UNMOUNT which will now
unmount the fs properly on sudden shutdowns.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2d93af3b

Jan 03, 2014

Fix fs/vfs/vfs_lookup.c coding style · 67d82557

Raphael S. Carvalho authored 11 years ago


Start using spaces instead of tabs and surround all single-line
control statements with curly braces.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

67d82557

vfs: Add hierarchy support to directory entries · 3bd235e9

Raphael S. Carvalho authored 11 years ago

It will be useful for making better and safer VFS decisions in the future,
for example, avoiding code that uses the absolute path to determine something.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3bd235e9

vfs: Fix dentry leak in sys_pivot_root · e30bed5c

Raphael S. Carvalho authored 11 years ago


newmp->m_covered must be released if not NULL.
Found this problem while dumping dcache content.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e30bed5c

vfs: change the approach of releasing dentries during unmount · af466dbc

Raphael S. Carvalho authored 11 years ago


Currently, vflush is used in the unmount process to release remaining
dentries. vflush in turn calls vevict, which releases dentries that
it doesn't own.
This behavior is neither correct nor good for the future of VFS.

So Avi suggested switching to a different approach. We could release only
those dentries owned by the mountpoint when unmounting it, as
there wouldn't be anything else in the dcache (given its functionality).

The problem was fixed by doing the following steps:
 - Drop vflush calls in sys_umount2, make vevict an empty function,
and remove vevict.

 - Create the function release_mp_dentries to release dentries of a mount
point, which will be called by VFS_UNMOUNT. It cannot be called before
VFS_UNMOUNT, as failures must be considered, nor after, as the mount point
would be considered busy.
If this "rule" is not respected, the previously seen ZFS replay transaction
error would happen.

NOTE: vflush is currently duplicated in zfs unmount cases to address the problem
above. This patch fixes this duplication as well.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

af466dbc

Jan 01, 2014

fs: clean up old "fo_*" C functions · a844d248

Nadav Har'El authored 11 years ago


We recently gained methods on the "file" class that replace the old C-style
file-operation function types and fo_*() functions. All our
filesystem code is now C++, and can use these methods directly.

So this patch drops the old types and functions, and uses the class methods
instead.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

a844d248

file: reduce boiler-plate code in special files · 9478a14d

Nadav Har'El authored 11 years ago

Each implementation of "struct file" needs to implement 8 different file
operations. Most special file implementations, such as pipe, socketpair,
epoll and timerfd, don't support many of these operations. We had in
unsupported.h functions that can be reused for the unsupported operation,
but this resulted in a lot of ugly boiler-plate code.

Instead, this patch switches to a cleaner, more C++-like, method:
It defines a new "file" subclass, called "special_file", which implements
all file operations except close(), with a default implementation identical
to the old unsupported.h implementations.

The files of pipe(), socketpair(), timerfd() and epoll_create() now inherit
from special_file, and only override the file operations they really want
to implement.
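
The pattern can be sketched as follows. The class names carry a _sketch
suffix because they are hypothetical, only three of the eight operations
are shown, and returning -EOPNOTSUPP from the defaults is an assumption
made for this example, not necessarily what unsupported.h did:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical cut-down "file" interface with three of the operations.
struct file_sketch {
    virtual ~file_sketch() = default;
    virtual int read(char* buf, std::size_t len) = 0;
    virtual int write(const char* buf, std::size_t len) = 0;
    virtual int close() = 0;
};

// Defaults every operation except close() to "unsupported", so concrete
// special files only override what they actually implement.
struct special_file_sketch : file_sketch {
    int read(char*, std::size_t) override { return -EOPNOTSUPP; }
    int write(const char*, std::size_t) override { return -EOPNOTSUPP; }
};

// Example special file: readable, closable, but not writable.
struct event_file_sketch : special_file_sketch {
    int read(char* buf, std::size_t len) override
    {
        if (len == 0) {
            return 0;
        }
        buf[0] = 'x';
        return 1;
    }
    int close() override { return 0; }
};
```

Leaving close() pure virtual in the intermediate class forces every concrete
special file to say how it is torn down, which matches the "all file
operations except close()" wording above.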

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

9478a14d

Dec 19, 2013

vfs: Do uio size check in bdev_read and bdev_write · af12394b

Asias He authored 11 years ago


Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

af12394b

Dec 16, 2013

vfs: Fix endless loop in bdev_read and bdev_write · 09f4ec17

Asias He authored 11 years ago


If iov->iov_len == 0, it will loop forever.
uiomove() will take care of the zero iov len case.
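
To illustrate why an empty segment is harmless when the copy loop always
advances, here is a minimal sketch of a scatter copy over an iovec array.
The function copy_to_iov is hypothetical and is not uiomove()'s actual
implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/uio.h>   // struct iovec

// Copies up to n bytes from src into the segments described by iov.
// The loop index advances on every iteration, so a zero-length segment
// is simply skipped instead of being retried forever.
std::size_t copy_to_iov(const void* src, std::size_t n, iovec* iov, int iovcnt)
{
    const char* p = static_cast<const char*>(src);
    std::size_t copied = 0;
    for (int i = 0; i < iovcnt && copied < n; i++) {
        std::size_t len = std::min(iov[i].iov_len, n - copied);
        if (len == 0) {
            continue;   // empty segment: move on to the next one
        }
        std::memcpy(iov[i].iov_base, p + copied, len);
        copied += len;
    }
    return copied;
}
```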

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

09f4ec17

Dec 13, 2013

umount2: Add parameter checks · 2afd6f60

Raphael S. Carvalho authored 11 years ago


Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2afd6f60

Dec 10, 2013

vfs: Fix duplicate in-memory vnodes · 9ecda822

Raphael S. Carvalho authored 11 years ago


Currently, namei() does vget() unconditionally if no dentry is found.
This is wrong because the path can be a hard link that points to a vnode
that's already in memory.

To fix the problem:

  - Use inode number as part of the hash in vget()

  - Use vn_lookup() in vget() to make sure we have one vnode in memory
    per inode number.

  - Push the vget() calls down to individual filesystems and make
    VOP_LOOKUP return a vnode

Changes since v2:
  - v1 dropped lock in vn_lookup, thus assert that vnode_lock is held.

Changes since v3:
  - Fix lock ordering issue in dentry_lookup. The lock of the parent
node must be acquired before dentry_lookup and released after the process is
done. Otherwise, a second thread looking up the same dentry may take the
'NULL' path incorrectly.
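
The "one vnode in memory per inode number" rule can be sketched with a
table keyed by mount and inode number. Everything here (vnode_sketch,
vnode_table_sketch, the integer mount id) is hypothetical, and locking is
left out, whereas the real code relies on vnode_lock being held:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical, simplified vnode.
struct vnode_sketch {
    std::uint64_t ino;
    int refcnt;
};

// Keeps at most one in-memory vnode per (mount, inode number) pair.
class vnode_table_sketch {
public:
    // Returns the existing vnode for this inode with its refcount bumped,
    // or creates a new one with refcount 1.
    vnode_sketch* get(int mount_id, std::uint64_t ino)
    {
        auto key = std::make_pair(mount_id, ino);
        auto it = table_.find(key);
        if (it != table_.end()) {
            it->second->refcnt++;
            return it->second.get();
        }
        auto vp = std::make_unique<vnode_sketch>(vnode_sketch{ino, 1});
        vnode_sketch* raw = vp.get();
        table_.emplace(key, std::move(vp));
        return raw;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::map<std::pair<int, std::uint64_t>, std::unique_ptr<vnode_sketch>> table_;
};
```

Two hard links to the same file resolve to the same inode number, so both
lookups land on the same table entry instead of creating a second vnode.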

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

9ecda822

Fix wrong error codes in unlink(), rmdir() and readdir() · 86b5374f

Nadav Har'El authored 11 years ago


This patch fixes the error codes in four error cases:

1. unlink() of a directory used to return EPERM (as in Posix), and now
   returns EISDIR (as in Linux).

2. rmdir() of a non-empty directory used to return EEXIST (as in Posix)
   and now returns ENOTEMPTY (as in Linux).

3. rmdir() of a regular file (non-directory) used to return EBADF
   and now returns ENOTDIR (as in Linux).

4. readdir() of a regular file (non-directory) used to return EBADF
   and now returns ENOTDIR (as in Linux).

This patch also adds a test, tst-remove.cc, for the various unlink() and
rmdir() success and failure modes.
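
For illustration only (this is not tst-remove.cc), the first three cases can
be probed with plain POSIX calls. The struct, the function name and the
scratch paths under /tmp are assumptions made for this sketch:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>    // open
#include <stdlib.h>   // mkdtemp
#include <string>
#include <unistd.h>   // unlink, rmdir, close

// errno observed for each failing call, or 0 if the call succeeded.
// All fields stay -1 if the scratch directory could not be set up.
struct remove_errnos {
    int unlink_dir;
    int rmdir_nonempty;
    int rmdir_file;
};

remove_errnos probe_remove_errnos()
{
    remove_errnos r{-1, -1, -1};
    char dir[] = "/tmp/remove-sketch-XXXXXX";
    if (mkdtemp(dir) == nullptr) {
        return r;
    }
    std::string file = std::string(dir) + "/f";
    int fd = open(file.c_str(), O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return r;
    }
    close(fd);
    r.unlink_dir = (unlink(dir) == 0) ? 0 : errno;              // case 1
    r.rmdir_nonempty = (rmdir(dir) == 0) ? 0 : errno;           // case 2
    r.rmdir_file = (rmdir(file.c_str()) == 0) ? 0 : errno;      // case 3
    unlink(file.c_str());   // clean up the scratch file and directory
    rmdir(dir);
    return r;
}
```

On Linux the three fields are expected to come back as EISDIR, ENOTEMPTY and
ENOTDIR respectively; other POSIX systems may report EPERM or EEXIST for the
first two.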

Fixes #123.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

86b5374f

Dec 09, 2013

Implement mknod() · dd701e2d

Nadav Har'El authored 11 years ago


I tried using a test which called mknod() (to create an empty regular file).
Despite us having an mknod() implementation, it didn't work, and failed on
lookup of the symbol __xmknod.

Turns out that in glibc, mknod() is source-only, and converted to the ABI
function which is __xmknod, whose first parameter is a version number
_MKNOD_VER_LINUX (0 on x86-64 Linux).

So this patch implements __xmknod, and now mknod() works.

Note we already had the same kind of trick for __xstat(), needed so that
stat() would work.
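
The shape of such a versioned entry point can be sketched as below. The
name xmknod_sketch is hypothetical (defining a real __xmknod would clash
with the C library), and rejecting an unexpected version or a null dev
pointer with EINVAL is an assumption made for this example:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/stat.h>    // mknod, mode_t
#include <sys/types.h>   // dev_t

// Version number the sketch accepts, matching the 0 mentioned above for
// x86-64 Linux.
constexpr int MKNOD_VER_SKETCH = 0;

// Versioned wrapper: checks the ABI version, then forwards to the plain
// mknod() call with the device number passed by pointer.
int xmknod_sketch(int ver, const char* path, mode_t mode, dev_t* dev)
{
    if (ver != MKNOD_VER_SKETCH || dev == nullptr) {
        errno = EINVAL;
        return -1;
    }
    return mknod(path, mode, *dev);
}
```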

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

dd701e2d

Reindent fs/vfs/main.cc · eb8451a0

Nadav Har'El authored 11 years ago


main.cc was still using tab characters instead of spaces as our coding
conventions dictate. Reindent it, using Eclipse's ctrl-I.
This patch doesn't change anything else.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

eb8451a0

Revert "vfs: Fix duplicate in-memory vnodes" · 0984e12e

Pekka Enberg authored 11 years ago


This reverts commit e4aad1ba.

It causes tst-vfs.so to just hang.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0984e12e

Dec 08, 2013

vfs: Fix duplicate in-memory vnodes · e4aad1ba

Raphael S. Carvalho authored 11 years ago


Currently, namei() does vget() unconditionally if no dentry is found.
This is wrong because the path can be a hard link that points to a vnode
that's already in memory.

To fix the problem:

  - Use inode number as part of the hash in vget()

  - Use vn_lookup() in vget() to make sure we have one vnode in memory
    per inode number.

  - Push the vget() calls down to individual filesystems and make
    VOP_LOOKUP return a vnode

  - Drop lock in vn_lookup() and assert that vnode_lock is held.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

e4aad1ba

Dec 04, 2013

file: remove fileops · c67f9ebf