- Apr 25, 2014
-
-
Tomasz Grabiec authored
A tracepoint argument which extends 'blob_tag' will be interpreted as a range of byte-sized values. The storage required to serialize such an object is proportional to its size. I need it to implement storage-friendly packet capturing using the tracing layer. It could also be used to capture variable-length strings. The current limit (50 chars) is too short for some paths passed to vfs calls. With variable-length encoding, we could set a more generous limit. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
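As an illustration of why storage becomes proportional to the value's size, here is a minimal sketch of a length-prefixed encoding. The helper names (`serialize_blob`, `deserialize_blob`) and the 2-byte length prefix are assumptions made for this example; they are not taken from OSv's tracepoint code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helpers, not OSv's actual tracepoint serializer.
// Assumed layout: a 2-byte length followed by that many payload bytes.
using length_type = std::uint16_t;

// Append one variable-length blob to the buffer.
// It occupies sizeof(length_type) + size bytes, so short values stay cheap.
inline void serialize_blob(std::vector<std::uint8_t>& buf,
                           const void* data, std::size_t size)
{
    assert(size <= UINT16_MAX);
    length_type len = static_cast<length_type>(size);
    const auto* p = reinterpret_cast<const std::uint8_t*>(&len);
    buf.insert(buf.end(), p, p + sizeof(len));
    const auto* d = static_cast<const std::uint8_t*>(data);
    buf.insert(buf.end(), d, d + size);
}

// Read the blob that starts at 'offset' and advance 'offset' past it.
inline std::string deserialize_blob(const std::vector<std::uint8_t>& buf,
                                    std::size_t& offset)
{
    length_type len;
    std::memcpy(&len, buf.data() + offset, sizeof(len));
    offset += sizeof(len);
    std::string out(reinterpret_cast<const char*>(buf.data() + offset), len);
    offset += len;
    return out;
}
```

With this layout a 3-character path costs 5 bytes and a 200-character path costs 202, instead of every value consuming a fixed-size slot.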
-
Pekka Enberg authored
-
Avi Kivity authored
If the rcu threads need memory, let them have it, since they will use it to free even more memory. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
malloc() must wait for memory, and since page table operations can allocate memory, it must be able to dip into the reserve pool. free() should indicate it is a reclaimer. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
We already have a facility to indicate that a thread is a reclaimer and should be allowed to allocate reserve memory (since that memory will be used to free memory). Extend it to allow indicating that a particular code section, rather than the entire thread, is used to free memory. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
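One way to mark a code section rather than a whole thread is a scoped guard. The sketch below is only an illustration: `reclaimer_section`, `reclaimer_depth` and `may_use_reserve` are hypothetical names, and the actual OSv facility may be structured differently.

```cpp
#include <cassert>

// Hypothetical sketch; names are illustrative, not OSv's actual API.
namespace sketch {

// Number of reclaimer sections currently open on this thread.
thread_local int reclaimer_depth = 0;

// While at least one guard is alive on a thread, allocations made by that
// thread are allowed to dip into the reserve pool.
class reclaimer_section {
public:
    reclaimer_section() { ++reclaimer_depth; }
    ~reclaimer_section() { --reclaimer_depth; }
    reclaimer_section(const reclaimer_section&) = delete;
    reclaimer_section& operator=(const reclaimer_section&) = delete;
};

// What an allocator would consult before touching the reserve.
inline bool may_use_reserve() { return reclaimer_depth > 0; }

} // namespace sketch
```

A counter rather than a boolean lets sections nest: leaving an inner section does not revoke the permission granted by an outer one.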
-
Nadav Har'El authored
After the previous patches, when we try to run an executable we cannot read (e.g., a directory - see issue #94), a "struct error" exception will be thrown out of osv::run, and nobody will catch it, so the user will see a somewhat puzzling "uncaught exception" error. With this patch, we catch the read error exception inside osv::run(), and when it happens, just return a normal load failure (nullptr). E.g., now trying to run a directory will result in a normal failure: $ scripts/run.py -e / OSv v0.07-39-g03feb99 run_main(): cannot execute /. Powering off. Fixes #94. The osv::run() API currently (before this patch, and also after it) doesn't have any way to say *why* the loading failed - it could have been that the executable was a directory, that it was not an ELF shared object, or that it was a shared object and didn't have a main - in all cases the return value is nullptr. In the future this should probably change. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
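The pattern described above, turning a read exception into an ordinary nullptr load failure, can be sketched as follows. Everything in the `sketch` namespace is a stand-in invented for this example (including the `load` helper and the `loaded_object` type); only the idea of catching the error inside run() comes from the commit message.

```cpp
#include <cassert>
#include <cerrno>
#include <memory>
#include <string>

// Hypothetical stand-ins; not the real osv::run() implementation.
namespace sketch {

struct error { int code; };               // plays the role of "struct error"
struct loaded_object { std::string path; };

// Pretend loader: reading a directory fails by throwing.
inline std::shared_ptr<loaded_object> load(const std::string& path)
{
    if (path == "/") {
        throw error{EISDIR};
    }
    return std::make_shared<loaded_object>(loaded_object{path});
}

// Catch the read error here so callers only ever see a plain load failure.
inline std::shared_ptr<loaded_object> run(const std::string& path)
{
    try {
        return load(path);
    } catch (const error&) {
        return nullptr;
    }
}

} // namespace sketch
```

As the message notes, a bare nullptr loses the reason for the failure; the caller cannot tell a directory from a non-ELF file.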
-
Nadav Har'El authored
read(fileref, ...) and write(fileref, ...), when they notice an error, currently assert(), causing a crash which looks to non-expert users like an OSv bug. This causes, for example, bug #94, where doing osv::run() of a directory, instead of an executable, causes such an assertion failure. With this patch, read() and write() will throw an exception (a struct error) instead of asserting. This fixes #94, as now the user will see an uncaught exception instead of a buggy-looking assertion failure. In the next patch we'll catch this exception, so the user won't even see that. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Gleb Natapov authored
No longer used. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
The bug that caused it to be disabled should be fixed now. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Currently, vma_list_mutex is used to protect against races between ARC buffer mapping by the MMU and eviction by ZFS. The problem is that MMU code calls into ZFS with vma_list_mutex held, so on that path all ZFS-related locks are taken after vma_list_mutex. An attempt to acquire vma_list_mutex during ARC buffer eviction, while many of the same ZFS locks are already held, causes a deadlock. This was worked around by using trylock() and skipping an eviction if vma_list_mutex cannot be acquired, but it appears that some mmapped buffers are destroyed not during eviction but after writeback, and this destruction cannot be delayed. This calls for a redesign of the locking scheme. This patch introduces arc_lock, which has to be held during access to read_cache. It prevents simultaneous eviction and mapping. arc_lock should be the innermost lock held on any code path, and the code is changed to adhere to this rule. To that end, the patch replaces the ARC_SHARED_BUF flag with a new b_mmaped field. The reason is that access to the b_flags field is guarded by hash_lock, and it is impossible to guarantee the same order between hash_lock and arc_lock on all code paths. Dropping the need for hash_lock is a nice solution. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Currently page_allocator returns a page to a page mapper and the latter populates a pte with it. Sometimes page allocation and pte population need to appear atomic, though. For instance, in the case of the pagecache we want to prevent page eviction before the pte is populated, since page eviction clears the pte; if allocation and mapping are not atomic, the pte can be populated with stale data after eviction. With the current approach a very widely scoped lock is needed to guarantee atomicity. Moving pte population into page_allocator allows for much simpler locking. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
The current code assumes that for the same file and the same offset ZFS will always return the same ARC buffer, but this appears not to be the case. ZFS may create a new ARC buffer while an old one is undergoing writeback. This means that we need to track the mapping between file/offset and mmapped ARC buffer ourselves, which is exactly what this patch does. It adds a new kind of cached page that holds a pointer to an ARC buffer, and stores these pages in a new read_cache map. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
All pagecache functions run under vma_list_lock, so no additional locking is needed. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Unmap a page as soon as possible instead of waiting for max_pages to accumulate. This will allow freeing pages outside of vma_list_mutex in the future. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Useful for debugging cache related problems. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
sys_read() error should be propagated to an application, but for now assert here instead of silent memory corruption. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Gleb Natapov authored
rmap will usually contain only one element, so using unordered_set for it is a little bit heavy. boost::variant allows us to use a direct pointer in the common case and fall back to unordered_set only when more than one element is added. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
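The single-pointer fast path with a set fallback can be sketched as below. Note the substitutions: this example uses C++17 `std::variant` in place of `boost::variant` so that it has no external dependency, and `pte_ptr` plus the `rmap` class interface are hypothetical, not the types used in OSv.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <variant>

// Hypothetical sketch using std::variant (C++17) instead of boost::variant.
namespace sketch {

using pte_ptr = void*;   // placeholder for a pointer to a page table entry

class rmap {
    using set_type = std::unordered_set<pte_ptr>;
    // empty -> one direct pointer -> full set, upgraded only when needed
    std::variant<std::monostate, pte_ptr, set_type> _v;
public:
    void insert(pte_ptr p) {
        if (std::holds_alternative<std::monostate>(_v)) {
            _v.emplace<pte_ptr>(p);              // common case: no allocation
        } else if (auto* single = std::get_if<pte_ptr>(&_v)) {
            if (*single != p) {
                pte_ptr first = *single;         // copy before replacing _v
                _v = set_type{first, p};         // fall back to a real set
            }
        } else {
            std::get<set_type>(_v).insert(p);
        }
    }
    std::size_t size() const {
        if (std::holds_alternative<std::monostate>(_v)) return 0;
        if (std::holds_alternative<pte_ptr>(_v)) return 1;
        return std::get<set_type>(_v).size();
    }
    bool uses_set() const { return std::holds_alternative<set_type>(_v); }
};

} // namespace sketch
```

The hash table is only constructed once a second distinct element arrives, which is the saving the commit message is after.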
-
- Apr 24, 2014
-
-
Gleb Natapov authored
Write permission should not be granted to ptes that have no write permission because they are COW, but currently there is no way to distinguish between write protection due to vma permission and write protection due to COW. Use a bit reserved for software use in the pte as a marker for COW ptes and check it during permission changes. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
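A minimal sketch of the check, assuming an x86-64 pte layout where bit 1 is the writable bit and bits 9-11 are available to software. Picking bit 9 as the COW marker and the name `change_perm` are assumptions for this example, not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; the choice of bit 9 for the marker is an assumption.
namespace sketch {

constexpr std::uint64_t pte_writable = 1ull << 1;  // x86-64 R/W bit
constexpr std::uint64_t pte_sw_cow   = 1ull << 9;  // software-available bit

// Apply a new vma write permission to a pte.
// A COW pte must stay read-only even if the vma becomes writable, so that
// the first write still faults and triggers the copy.
inline std::uint64_t change_perm(std::uint64_t pte, bool vma_writable)
{
    if (vma_writable && !(pte & pte_sw_cow)) {
        return pte | pte_writable;
    }
    return pte & ~pte_writable;
}

} // namespace sketch
```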
-
Gleb Natapov authored
Mapping a page as dirty saves the CPU from doing an additional memory access to write out the dirty bit when the access succeeds. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com>
-
Nadav Har'El authored
POSIX allows read() on directories in some filesystems. However, Linux always returns EISDIR in this case, so because we're emulating Linux, so should we, for every filesystem. All our filesystems except ZFS (e.g., ramfs) already return EISDIR when reading a directory, but ZFS doesn't, so this patch adds the missing check in ZFS. This patch is related to issue #94: the first step to fixing #94 is to return the right error when reading a directory. This patch also adds a test case, which can be compiled both on OSv and Linux, to verify they both have the same behavior. Before the patch, the test succeeded on Linux but failed on OSv when the directory is on ZFS. Instead of fixing zfs_read like I do in this patch, I could have also fixed sys_read() in vfs_syscalls.cc, which is the top layer of all read() operations, and I could have done there: if (fp->f_dentry && fp->f_dentry->d_vnode->v_type == VDIR) { return EISDIR; } to cover all the filesystems. I decided not to do that, because all filesystems except ZFS already have this check, and because the lower layers like zfs_read() already have more natural access to d_vnode. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
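The Linux behavior being matched can be checked with a few lines of POSIX code. This is a minimal sketch written for this listing, not the test case added by the patch; `read_errno` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: returns the errno produced by read() on 'path',
// 0 if the read succeeded, or -1 if the path could not be opened at all.
inline int read_errno(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    char buf[16];
    ssize_t n = read(fd, buf, sizeof(buf));
    int err = (n < 0) ? errno : 0;   // capture errno before close() can change it
    close(fd);
    return err;
}
```

On Linux, `read_errno("/")` is expected to yield EISDIR regardless of the filesystem backing the directory.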
-
Takuya ASADA authored
txq_encap decrements txq.avail for each mbuf, so we need to increment it by the same number here. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch implements a version of flock that only performs error checking. It assumes any valid operation succeeds, since we are using a single-process model. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
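A sketch of what an "error checking only" flock could look like. The function name `flock_stub` and the `fd_is_open` parameter (standing in for a real file descriptor lookup) are hypothetical; only the LOCK_* constants and errno values come from the standard flock() interface.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/file.h>

// Hypothetical sketch, not the OSv implementation.
namespace sketch {

// Validate the arguments and report success for every valid request.
// With a single process there is nobody else to contend with for the lock.
inline int flock_stub(bool fd_is_open, int operation)
{
    if (!fd_is_open) {
        errno = EBADF;
        return -1;
    }
    switch (operation & ~LOCK_NB) {   // LOCK_NB is a modifier, not an operation
    case LOCK_SH:
    case LOCK_EX:
    case LOCK_UN:
        return 0;
    default:
        errno = EINVAL;
        return -1;
    }
}

} // namespace sketch
```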
-
Glauber Costa authored
The jemalloc memory allocator makes intense use of MADV_DONTNEED to flush pages it is no longer using. Respect that advice. Let's keep returning -1 for the remaining cases so we don't fool anybody. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
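For reference, this is the kind of call an allocator issues and the effect it relies on under Linux: after MADV_DONTNEED on a private anonymous mapping, the pages read back as zero. The helper name `dontneed_zeroes_page` is made up for this sketch, and the 4096-byte length is an assumed page size.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical demonstration helper; relies on Linux semantics for
// MADV_DONTNEED on private anonymous memory.
inline bool dontneed_zeroes_page()
{
    const std::size_t len = 4096;   // assumed page size
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;
    }
    auto* p = static_cast<volatile unsigned char*>(mem);
    p[0] = 42;                                   // dirty the page
    int rc = madvise(mem, len, MADV_DONTNEED);   // tell the kernel to drop it
    bool zeroed = (rc == 0) && (p[0] == 0);      // next touch sees a fresh zero page
    munmap(mem, len);
    return zeroed;
}
```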
-
Glauber Costa authored
MongoDB wants it. In general, I am providing the information that is easy to get, and ignoring the ones which are not - with the exception of process count, that seemed easy enough to implement. This is the kind of thing Mongo does with it: 2014-04-15T09:54:12.322+0000 [clientcursormon] mem (MB) res:670160 virt:25212 2014-04-15T09:54:12.323+0000 [clientcursormon] mapped (incl journal view):160 2014-04-15T09:54:12.324+0000 [clientcursormon] connections:0 Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This is one of the statistics that shows up in /proc/self/stat under Linux, but this is generally interesting for applications. Since we don't have kernel mode and userspace mode, it is very hard to differentiate between "time spent in userspace" and "kernel time spent on behalf of the process". Therefore, we will present system time as always 0. If we wanted, we could at least differentiate clearly osv-specific threads as system time, but there is no need to go through the trouble now. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
It will be used for procfs compatibility. Applications may want to know how much memory is potentially available through mmap mappings (not necessarily allocated). Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
The current code does not allow ioctls on unix sockets, but real-world applications (nginx in my example) will issue them. I am starting by implementing FIONBIO to set blocking/nonblocking mode. I notice here that we already have a flag in the read and write functions, and they seem to handle that flag: the internal pipe_buffer implementation does check and act on it. So let's just flip it. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
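From the application side, the call in question looks roughly like this. It is a small sketch using standard socket calls; the helper name `nonblocking_read_errno` is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: switch one end of a unix socket pair to nonblocking
// mode with FIONBIO, then read from it while no data is pending.
// Returns the errno from read(), 0 if read() succeeded, -1 on setup failure.
inline int nonblocking_read_errno()
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    int err = -1;
    int on = 1;                                  // nonzero enables nonblocking mode
    if (ioctl(sv[0], FIONBIO, &on) == 0) {
        char c;
        ssize_t n = read(sv[0], &c, 1);          // would block in blocking mode
        err = (n < 0) ? errno : 0;
    }
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

With the ioctl honored, the read returns immediately with EAGAIN (or EWOULDBLOCK) instead of hanging.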
-
- Apr 23, 2014
-
-
Glauber Costa authored
First of all, we need the symbol. But so far it is fine returning EINVAL to everybody except for PR_SET_DUMPABLE. There are situations (see man prctl) in which this flag is automatically cleared. Some programs (nginx here) will then proceed to set it to make sure it is on. The flag itself is meaningless to us, since we are not producing per-thread coredumps, and when we do dump, we will dump regardless of this. But we need the call to succeed. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
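The behavior described above amounts to a very small dispatcher. The sketch below is illustrative only: `prctl_stub` is a hypothetical name and it takes just the option argument, whereas the real prctl() is variadic.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/prctl.h>

// Hypothetical sketch, not the OSv implementation.
namespace sketch {

inline int prctl_stub(int option)
{
    switch (option) {
    case PR_SET_DUMPABLE:
        return 0;            // accepted and ignored: the flag has no effect here
    default:
        errno = EINVAL;      // everything else is reported as unsupported
        return -1;
    }
}

} // namespace sketch
```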
-
Glauber Costa authored
An empty implementation will suit us. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Nadav says: The clock() function is stubbed. We can easily implement it by calling clock_gettime(CLOCK_PROCESS_CPUTIME_ID). Closes #275. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
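The suggestion can be sketched as follows. `clock_sketch` is a hypothetical name used to avoid clashing with the real clock(); the conversion assumes CLOCKS_PER_SEC divides one billion evenly, which holds for the POSIX value of 1000000.

```cpp
#include <cassert>
#include <ctime>
#include <time.h>

// Hypothetical sketch of clock() built on clock_gettime().
inline clock_t clock_sketch()
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0) {
        return static_cast<clock_t>(-1);   // clock() reports failure as (clock_t)-1
    }
    // Convert seconds + nanoseconds into CLOCKS_PER_SEC ticks.
    const long ns_per_tick = 1000000000L / CLOCKS_PER_SEC;
    return static_cast<clock_t>(ts.tv_sec) * CLOCKS_PER_SEC
         + static_cast<clock_t>(ts.tv_nsec / ns_per_tick);
}
```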
-
Nadav Har'El authored
Compile tst-mmap.so with "-z now", so that symbols are resolved at load time instead of when needed. The problem was that tst-mmap uses wait_until, which runs the functions sched::thread::wait() and sched::thread::stop_wait() while preemption is disabled. In about 1 in 10 runs of tst-mmap.so, this caused an assertion-failure crash (symbol resolution caused sleeping while preemption is disabled). Fixes #256. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Same as we currently have for AS, show that value as unlimited. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-