- Mar 27, 2014
-
-
Raphael S. Carvalho authored
Previously, the zfs device was provided only to allow the use of the commands needed to create the zpool and, in turn, the file system. At that time this was enough. However, making the zfs device, i.e. /dev/zfs, part of every OSv instance allows us to use commands that help with analysing, debugging, and tuning the zpool and the file systems it contains. The basic explanation is that those commands use libzfs, which in turn relies on /dev/zfs to communicate with the zfs code. Example commands: zpool, zfs, zdb. The last of these has not been ported to OSv yet. This patch will also be helpful for the ongoing ztest porting. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
/etc/mnttab is required by libzfs to run properly, so let's create it as an empty file. ryao from zfsonlinux and openzfs told me that an empty /etc/mnttab is used on Linux. Reading the libzfs code also shows that /etc/mnttab is mostly used for management of the file itself, nothing that would prevent any libzfs functionality from working. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
The tst-dns-resolver.so test fails spuriously. Blacklist it until the problem is fixed to keep Jenkins builds running. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
-
Raphael S. Carvalho authored
The root dataset and ZFS are mounted at the mkfs phase, but they aren't unmounted afterwards. Running mkfs with the VERBOSE flag enabled shows the following: Running mkfs... VFS: mounting zfs at /zfs zfs: mounting osv from device osv VFS: mounting zfs at /zfs/zfs zfs: mounting osv/zfs from device osv/zfs The first mount happens when issuing: {"zpool", "create", "-f", "-R", "/zfs", "osv", "/dev/vblk0.1"}, &ret); It creates a pool called osv and mounts the root dataset at /zfs. The second mount happens when issuing: {"zfs", "create", "osv/zfs"} It creates a file system called zfs in the pool osv and automatically mounts it under the root dataset mountpoint. No data inconsistency problem has been seen so far because both mkfs.so and cpiod.so do an explicit sync() at the end, thus ensuring everything is correctly flushed out to stable storage. There is an expression in Dutch that says: prevention is better than cure. Thus, this patch changes cpiod.so to unmount both mount points when the /zfs/zfs prefix is passed. It cannot be done in mkfs.so itself because cpiod.so is called afterwards in the same OSv instance. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Zifei Tong authored
Python 3 no longer allows implicit conversion from bytes to string, so add an explicit decode() to convert the input bytes. Tested with both Python 2 and Python 3. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Zifei Tong <zifeitong@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
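A minimal sketch of the kind of change described above. The helper name and the command being run are illustrative assumptions, not taken from the patch: output read from a subprocess arrives as bytes under Python 3, so it is decoded explicitly before being treated as text.

```python
import subprocess
import sys

def read_command_output(args):
    """Run a command and return its stdout as text (hypothetical helper)."""
    raw = subprocess.check_output(args)  # bytes on Python 3, str on Python 2
    return raw.decode('utf-8')           # explicit conversion works on both

# Illustrative command only: ask the current interpreter to print a word.
text = read_command_output([sys.executable, '-c', 'print("hello")'])
lines = text.splitlines()
```

Without the decode() call, string operations that mix the result with text (for example `text + "\n"`) raise TypeError on Python 3.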
-
Glauber Costa authored
Gleb has noticed that the ARC buffers can go unshared too early. This happens because we call the UNMAP operation on every put(). That is certainly not what we want, since the buffer only has to be unshared when the last reference is gone. Design decisions: 1) We obviously can't use the ARC's natural reference count for that, since bumping it would make the buffer unevictable. 2) We could modify the arc_buf structure itself to add another refcnt (minimum 4 bytes). However, I am trying to keep core-ZFS modifications to a minimum, and only in places where they are totally unavoidable. Therefore, the solution is to add another hash, which will hash the whole buffer instead of the physaddr like the one we have currently. In terms of memory usage, it adds only 8 bytes per buffer (each buffer being around 128k), which makes for a memory usage of 64k per mapped GB compared to the arc refcount solution. This is a good trade-off. I am also avoiding adding a new vop_map/unmap style operation just to query the buffer address from its file attributes (needed for the put side). Instead, I am adopting the convention that an empty iovec means query, and a filled iov means unshare. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
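The idea of a side table that counts references per buffer, and unshares only when the last one is dropped, can be sketched as follows. This is an illustration in Python under assumed names (`SharedBufferTable`, `get`, `put`, and the `unshare` callback are all hypothetical), not the OSv implementation.

```python
class SharedBufferTable:
    """Side table mapping a buffer key to its number of active mappings."""

    def __init__(self, unshare):
        self._refs = {}          # buffer key -> reference count
        self._unshare = unshare  # called once, when the last reference goes

    def get(self, buf):
        self._refs[buf] = self._refs.get(buf, 0) + 1

    def put(self, buf):
        count = self._refs[buf] - 1
        if count == 0:
            del self._refs[buf]
            self._unshare(buf)   # only now is the buffer unshared
        else:
            self._refs[buf] = count

unshared = []
table = SharedBufferTable(unshared.append)
table.get("buf0")
table.get("buf0")
table.put("buf0")
after_first_put = list(unshared)  # still shared: one reference remains
table.put("buf0")

# Overhead estimate from the message: 8 bytes per 128k buffer, per mapped GB.
overhead_per_gb = (1 << 30) // (128 << 10) * 8
```

The last line reproduces the arithmetic in the message: one GB holds 8192 buffers of 128k, and 8 bytes each gives 65536 bytes, i.e. 64k.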
-
Zifei Tong authored
debugf() used to write the log message using the length of the format string rather than that of the formatted message, which caused messages to be wrongly truncated. Also fix the confusing variable names by exchanging 'fmt' and 'msg'. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Zifei Tong <zifeitong@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
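The truncation can be illustrated with a small Python model. The function names are hypothetical and this is not the debugf() code itself; it only shows why sizing the output by the format string loses data once the arguments are expanded.

```python
def log_buggy(fmt, *args):
    msg = fmt % args
    return msg[:len(fmt)]   # bug: output sized by the format string

def log_fixed(fmt, *args):
    msg = fmt % args
    return msg[:len(msg)]   # output sized by the formatted message

buggy = log_buggy("pid=%d", 123456)
fixed = log_fixed("pid=%d", 123456)
```

Here the format string is 6 characters long while the formatted message is 10, so the buggy variant keeps only `pid=12`.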
-
Glauber Costa authored
Spotted by code review. Gleb had spotted one improper use of "i", but there was another. In this case we iterate over nothing, and i is always 0. It is uninitialized to begin with, and the code works only because it happens to be set to 0 by luck. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Glauber Costa authored
We have seen bugs with mmap shared/file handling for small files. This patch tests some of the corner scenarios to find those problems. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Glauber Costa authored
There is a problem with the way ZFS currently handles its buffers, which is actually a limitation of our allocator: buffers smaller than a page won't be page aligned even if we ask for it. Therefore, if the buffer we are mapping falls into this category, we will map the wrong location. The way I solved this problem was so stupid that in retrospect I can't even believe I did it: when the file would run out of size, we would truncate the file. This is obviously wrong because reading a file is not expected to change its size under any circumstance, and if anybody relied on the actual size, we would crash something. This is the bug that plagued Cassandra. Not truncating, however, brings back the original problem. One solution I have considered is to always allocate at least a page for data allocations (leaving metadata alone), but that would deviate from ZFS and harm many-small-files workloads. During testing, however, I noticed that ZFS will allocate small buffers only when the file itself is small. This means that we can just avoid using the special shared mapping for small files, which makes sense anyway. For instance, if we have a file that is 128k + 1 byte (remember 128k is ZFS's maximum buffer size), both buffers will be large enough to be aligned. And if that ever fails to hold, we will now see an assertion hit instead of a random bug. In time, we should fix our allocator to provide alignment guarantees. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Gleb Natapov authored
Currently all mappings are keyed on the ARC buffer start when a mapping is added, but on removal a pointer into the ARC buffer is used, so removal may leave no-longer-valid mappings in the database. This patch fixes it by using a pointer into the ARC buffer as the key, the same pointer that is used during removal. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
- Mar 26, 2014
-
-
Tomasz Grabiec authored
Option -F should always be used with -X. Without -X, if the output is smaller than the screen, no output will be shown. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Running "make image=mgmt,iperf" failed because the last line in apps/iperf/usr.manifest did not have a newline, and was copied without a newline, which caused it to be stuck together to the first line of mgmt's manifest. Fix this by explicitly adding a newline character to each line we add to the generated manifest file - whether or not the original manifest had one. Reviewed-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
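A short sketch of the failure and the fix. The manifest contents and function names below are made up for illustration: naive concatenation glues the last line of one manifest to the first line of the next, while re-emitting each line with an explicit newline does not.

```python
def merge_naive(manifests):
    return ''.join(manifests)

def merge_with_newlines(manifests):
    out = []
    for text in manifests:
        for line in text.splitlines():  # drops any existing line ending
            out.append(line + '\n')     # always terminate the line ourselves
    return ''.join(out)

# Hypothetical manifest contents; the first one lacks a trailing newline.
iperf = "/tools/iperf: iperf.so"
mgmt = "/usr/mgmt/a.jar: a.jar\n"

naive = merge_naive([iperf, mgmt])
fixed = merge_with_newlines([iperf, mgmt])
```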
-
Asias He authored
ESXi 5.5 cannot edit a vmx file with virtualHW.version = 10. Reducing it to 8 works in ESXi. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Mar 25, 2014
-
-
Tomasz Grabiec authored
timer_base is a thread-agnostic interface for per-CPU timers. The current code is prone to a race condition involving set() and cancel(). The latter may attempt to remove the timer before it has been inserted into the timer tree. Found during code inspection. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Implement __sysv_signal(), which is used by code using signal() when compiled with _XOPEN_SOURCE, -std=..., or something similar (see signal(2) manual page for a full discussion of the two variants of signal()). Fixes #238. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Instead of duplicating sigaction()'s code, let's just use it. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Add signal number verification to sigaction(). Also add a FIXME comment that we don't support mode sa_flags. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
The id of the last thread might mislead people about the number of threads in OSv, so let's simply print the number of threads after all threads have been printed. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
tst-kill runs various signal handlers, which we run in separate threads. When the test completes, we may be unlucky enough for the last signal handler to still be running, at which point when the module's memory is unmapped (e.g., in test.py -s each test is unmapped when it ends) we can get a page fault and a crash. This patch sleeps for a second at the end of tst-kill, to make sure that the signal handler has completed. This sleep is a bit ugly, but I can't think of a cleaner way - POSIX provides no way to check if there's a running handler, and I wouldn't like to add a new API just for this test. Fixes #249. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
All threads created through the bsd/porting/kthread interface are threads that used to be kernel threads in BSD, which means they are expected to use less stack. Although I have no idea what the default stack size for BSD is, in Linux things need as little as 4k. More importantly, they are threads whose memory usage is under our control, and we could fix heavy offenders without a problem. If we don't say anything, they will start with 64k, which is way, way too much. I am proposing here that we go lower and get to 16k, which is still quite conservative, but so am I. Measuring memory before and after the mount, because ZFS is currently our heaviest user, I can save around 7MB with this patch. Passes make check (except for tst-kill, which is broken AFAICT) and misc-fs-stress. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Add 'all' as dependent target of osv.vmdk/.vdi, to prevent build error. Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
On VBOX and VMW, the version info is not printed correctly. Fix it by printing only after our console is initialized. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
-
- Mar 24, 2014
-
-
wangbicheng authored
The request_memory() function has an early exit condition where we forgot to call _detach(). Fix that up by attaching later in the function which ensures we eventually will detach. Signed-off-by:
wangbicheng <wangbicheng@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
sogeneric_send(), in one error case, unlocks the socket before doing 'goto release' which unlocks the socket again, resulting in an assertion failure (unlocking an unlocked mutex) and crash. This patch removes the extra unlock. Fixes #248. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
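The shape of the bug, an early unlock on an error path followed by a common exit path that unlocks again, can be modelled in Python. This is an analogy with hypothetical function names, not the sogeneric_send() code; Python 3's threading.Lock raises RuntimeError where OSv hits an assertion.

```python
import threading

def send_buggy(lock, fail):
    lock.acquire()
    try:
        if fail:
            lock.release()   # extra unlock on the error path...
            return 'error'
        return 'ok'
    finally:
        lock.release()       # ...then the common "release" path unlocks again

def send_fixed(lock, fail):
    lock.acquire()
    try:
        if fail:
            return 'error'   # leave unlocking to the common exit path
        return 'ok'
    finally:
        lock.release()

try:
    send_buggy(threading.Lock(), True)
    double_unlock_detected = False
except RuntimeError:
    double_unlock_detected = True

fixed_lock = threading.Lock()
fixed_result = send_fixed(fixed_lock, True)
```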
-
Asias He authored
Generate the osv.vdi image from the 'make osv.vdi' Makefile target, which converts the image to vdi and adds the --vga option. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Add osv.vmdk and osv.vdi targets on Makefile. Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> [ penberg: fix formatting ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Gleb Natapov authored
Rescheduling in a nested exception is not supported since the nested exception stack is per-CPU, but without the assert such a reschedule will cause stack corruption that is hard to debug. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
AHCI has 32 cmd slots for issuing cmds. Only the first slot is used currently, which makes the queue depth 1. This patch uses all the cmd slots and makes cmd completion async. Now the queue depth is 32. Testing with "/tests/misc-bdev-write.so" on VBOX shows improvements: Before: ~10MB/s After: ~20MB/s. 1000 rounds of "/tests/misc-bdev-rw.so" tests passed on VBOX and QEMU. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Assert to make sure the data buffer address satisfies the AHCI spec's data alignment requirement. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
AHCI spec '4.2.2 Command List Structure' says 'Command Table Descriptor Base Address' must be aligned to a 128-byte cache line, indicated by bits 06:00 being reserved. All 32 cmd_table structures are allocated in one linear space. The size of cmd_table is 144 bytes for now, which is larger than 128 bytes. So we pad cmd_table to 256 bytes. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
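The padding arithmetic can be checked with a few lines of Python. The helper name is an assumption, and the base address of the array is taken to be 128-byte aligned: rounding 144 up to the next multiple of 128 gives 256, and with that stride every table in the contiguous array starts on a 128-byte boundary.

```python
def align_up(size, alignment):
    """Round size up to a multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)

ALIGNMENT = 128      # bits 06:00 of the base address are reserved
RAW_SIZE = 144       # cmd_table size quoted in the message
SLOTS = 32

padded = align_up(RAW_SIZE, ALIGNMENT)

# Offsets of each table from an aligned base, with and without padding.
unpadded_offsets = [i * RAW_SIZE for i in range(SLOTS)]
padded_offsets = [i * padded for i in range(SLOTS)]

misaligned_without_padding = [o for o in unpadded_offsets if o % ALIGNMENT]
misaligned_with_padding = [o for o in padded_offsets if o % ALIGNMENT]
```

With the raw 144-byte stride, the second table would already start at offset 144, which is 16 bytes past a 128-byte boundary.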
-
Asias He authored
One PRDT entry can describe a buffer of at most 4MB, and we currently use only one PRDT entry per AHCI cmd. Limit the size of a bio request accordingly. Larger bios will be split into smaller bios by multiplex_strategy. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
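Splitting an oversized request into pieces that each fit one PRDT entry can be sketched as below. This is an illustration with hypothetical names, not the code of multiplex_strategy.

```python
MAX_PRDT_BYTES = 4 * 1024 * 1024   # at most 4MB per PRDT entry

def split_request(offset, length, max_bytes=MAX_PRDT_BYTES):
    """Return (offset, length) pieces, each no larger than max_bytes."""
    chunks = []
    while length > 0:
        n = min(length, max_bytes)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks

MB = 1024 * 1024
chunks = split_request(0, 10 * MB)
```

A 10MB request therefore becomes two 4MB pieces followed by one 2MB piece, each of which fits in a single PRDT entry.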
-
Nadav Har'El authored
Add missing scandir() function from musl 1.0.0. Fixes #237. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-