- Jan 21, 2014
Dmitry Fleytman authored
It is bad practice to have DHCP discovery without timeouts and retries. If a discovery packet gets lost, the boot gets stuck. Besides this, there is an interesting phenomenon on some systems: the first few DHCP discovery packets sent on boot sometimes get lost. This started to happen from time to time on my KVM system, and almost every time on my Xen system, after installing recent Fedora Core updates. The packet leaves the VM's interface but never arrives at the bridge interface. The packet itself is built properly and reaches the DHCP server just fine after a few retransmissions. Most probably this phenomenon is a bug (or limitation) in the current Linux bridge version, so this patch is actually a work-around, but since DHCP timeouts/retries are a good idea in the general case, it is worth having anyway. Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
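For illustration, a minimal sketch of the kind of retry loop this describes. The send/wait_reply callables and the retry budget are assumptions for the example, not names from OSv's DHCP code:

    #include <chrono>
    #include <functional>

    // Hypothetical helper: retransmit a DHCP DISCOVER until an OFFER arrives
    // or the retry budget is exhausted. send() transmits the DISCOVER;
    // wait_reply(timeout) returns true if an OFFER arrived in time.
    bool discover_with_retries(const std::function<void()>& send,
                               const std::function<bool(std::chrono::milliseconds)>& wait_reply,
                               int max_attempts = 5,
                               std::chrono::milliseconds timeout = std::chrono::seconds(3))
    {
        for (int attempt = 0; attempt < max_attempts; ++attempt) {
            send();                    // (re)transmit the DISCOVER
            if (wait_reply(timeout)) {
                return true;           // got an OFFER, continue the exchange
            }
            timeout *= 2;              // lost packet: back off a little and retry
        }
        return false;                  // give up; the caller decides what to do next
    }

With such a loop, a DISCOVER dropped by the bridge delays boot by one timeout instead of hanging it forever.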
-
- Jan 20, 2014
Avi Kivity authored
A waitqueue is an object on which multiple threads can wait; other threads can wake up either one or all waiting threads. A waitqueue is associated with an external mutex which the user must supply for both wait and wake operations. Waitqueues differ from condition variables in three respects:
- waitqueues do not contain an internal mutex. This makes them smaller and reduces lock acquisitions. On the other hand, the waker must hold the associated mutex, whereas this is not required with condition variables.
- waitqueues support sched::thread::wait_for().
- waitqueues support wait morphing and do not cause excess lock contention, even with wake_all(). Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
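A behavioural model of that contract: the caller supplies the same external mutex to wait() and to the wake operations. This is only a sketch built on std::condition_variable; the real OSv waitqueue has no internal mutex and integrates with the scheduler and wait morphing:

    #include <condition_variable>
    #include <mutex>

    // Toy model of the waitqueue API shape; the internals are a stand-in only.
    class toy_waitqueue {
    public:
        void wait(std::mutex& mtx) {                      // caller already holds mtx
            std::unique_lock<std::mutex> lk(mtx, std::adopt_lock);
            _cv.wait(lk);                                 // atomically releases and re-acquires mtx
            lk.release();                                 // hand ownership of mtx back to the caller
        }
        void wake_one(std::mutex&) { _cv.notify_one(); }  // caller holds the associated mutex
        void wake_all(std::mutex&) { _cv.notify_all(); }
    private:
        std::condition_variable _cv;
    };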
-
Avi Kivity authored
This adds a facility to wake a thread, but with the intention that it will acquire a certain lock after waking, and while the waker holds the lock. This is implemented using the regular wait morphing code (send_lock() and receive_lock()), but with additional mutual exclusion to allow regular wake()s in parallel. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Vlad Zolotarov authored
Remove the extra check on size just like the remark above implies. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 19, 2014
Takuya ASADA authored
Fix object::lookup_addr to look up the correct symbol. It should return the nearest symbol with s_addr < addr, but the comparison was done the opposite way. Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
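A standalone sketch of what "nearest symbol with s_addr < addr" means; the symbol struct and the linear scan are illustrative, not the actual object::lookup_addr code:

    #include <cstdint>
    #include <vector>

    struct symbol {
        uintptr_t s_addr;      // symbol start address
        const char* name;
    };

    // Return the symbol with the largest s_addr that is still below addr,
    // i.e. the nearest symbol preceding the address, or nullptr if none.
    const symbol* lookup_nearest(const std::vector<symbol>& syms, uintptr_t addr)
    {
        const symbol* best = nullptr;
        for (const auto& s : syms) {
            if (s.s_addr < addr && (!best || s.s_addr > best->s_addr)) {
                best = &s;
            }
        }
        return best;
    }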
-
- Jan 17, 2014
Dmitry Fleytman authored
This patch introduces support for the MTU option as described in RFC 2132, chapter 5.1 (Interface MTU Option). Amazon EC2 networking uses this option in some cases, and it gives a throughput improvement of about 250% on big instances with 10G networking. Netperf results for hi1.4xlarge instances, TCP_MAERTS test, OSv runs netserver:

    Send buffer size  Throughput w/ patch (Mbps)  Throughput w/o patch (Mbps)  Improvement (%)
               32              4912.29                     1386.28                  254
               64              4832.01                     1385.99                  249
              128              4835.09                     1401.46                  245
              256              4746.41                     1382.28                  243
              512              4849.04                     1375.23                  253
             1024              4631.8                      1356.69                  241
             2048              4859.59                     1371.92                  254
             4096              4864.99                     1383.67                  252
             8192              4627.07                     1364.05                  239
            16384              4868.73                     1366.48                  256
            32768              4822.69                     1366.63                  253
            65536              4837.67                     1353.87                  257

Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
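For reference, RFC 2132 section 5.1 encodes the Interface MTU as option code 26 with a 2-byte big-endian value (minimum legal value 68). A sketch of extracting it from a raw options buffer; this is illustrative, not OSv's DHCP parser:

    #include <cstddef>
    #include <cstdint>

    // Scan a DHCP options buffer for option 26 (Interface MTU, RFC 2132 5.1).
    // Returns the MTU, or 0 if the option is absent or malformed.
    uint16_t parse_interface_mtu(const uint8_t* opts, size_t len)
    {
        size_t i = 0;
        while (i < len) {
            uint8_t code = opts[i];
            if (code == 0)   { ++i; continue; }   // PAD
            if (code == 255) break;               // END
            if (i + 1 >= len) break;              // truncated option header
            uint8_t olen = opts[i + 1];
            if (i + 2 + olen > len) break;        // truncated option body
            if (code == 26 && olen == 2) {
                uint16_t mtu = static_cast<uint16_t>((opts[i + 2] << 8) | opts[i + 3]);
                return mtu >= 68 ? mtu : 0;       // 68 is the minimum legal value
            }
            i += 2 + olen;
        }
        return 0;
    }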
-
Pekka Enberg authored
Add a procfs_maps() function to core/mmu.cc that returns all the VMAs formatted for a Linux-compatible "/proc/<pid>/maps" file. This will be called by the procfs filesystem. Limitations:
* Shared mappings are not identified as such.
* File-backed mmap offset, device, inode, and pathname are not reported.
* Special region names such as [heap] and [stack] are not reported. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
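A sketch of formatting one maps line under those limitations (offset, device, inode and pathname reported as zeros/empty). The field layout follows Linux's /proc/<pid>/maps; the helper itself is illustrative, not the OSv code:

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Format a single maps line for an anonymous private mapping:
    // "start-end perms offset dev inode" with the unreported fields zeroed.
    std::string format_maps_line(uintptr_t start, uintptr_t end,
                                 bool r, bool w, bool x)
    {
        char buf[96];
        std::snprintf(buf, sizeof(buf), "%lx-%lx %c%c%cp %08x %02x:%02x %u\n",
                      static_cast<unsigned long>(start),
                      static_cast<unsigned long>(end),
                      r ? 'r' : '-', w ? 'w' : '-', x ? 'x' : '-',
                      0u, 0u, 0u, 0u);
        return buf;
    }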
-
- Jan 16, 2014
Glauber Costa authored
This code should not be here. I am 100% positive that I removed it in my testing, but I must have forgotten to git add the removal before I sent out the patch, and it ended up in the tree. This is simply a test leftover; it has the effect of making threads loop forever and never wait, because the initial value won't be 0. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 15, 2014
Eduardo Piva authored
Create a circular buffer that stores all debug messages. If the debug buffer is full, reuse it from the beginning. A method called flush_debug_buffer is added to enable printing all messages to the console if verbose mode is configured. The global variable debug_buffer_full is used to track whether, when flushing the debug buffer to the console, we need to flush both sides of the buffer. If the verbose boolean variable is set, all messages are printed to the console after being stored in the buffer. The size of the buffer is 50KB, defined in debug.hh. A function debugf that receives a variable list of arguments is defined so we can change some printf calls in the boot sequence into debugf calls. A different name is used because C does not support overloading. Signed-off-by:
Eduardo Piva <efpiva@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
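A minimal sketch of the ring-buffer idea; the names and the flush policy are illustrative (the real buffer and its 50KB size live in debug.hh/debug.cc):

    #include <cstddef>
    #include <string>

    // Toy circular log buffer: writes wrap around, and a "full" flag records
    // whether the region after the write cursor also holds valid (older) data.
    class debug_ring {
    public:
        void write(const char* msg, size_t len) {
            for (size_t i = 0; i < len; ++i) {
                _buf[_pos++] = msg[i];
                if (_pos == sizeof(_buf)) {   // wrapped: a flush must cover both sides
                    _pos = 0;
                    _full = true;
                }
            }
        }
        std::string flush() const {           // oldest data first when wrapped
            std::string out;
            if (_full) out.append(_buf + _pos, sizeof(_buf) - _pos);
            out.append(_buf, _pos);
            return out;
        }
    private:
        char _buf[50 * 1024] = {};
        size_t _pos = 0;
        bool _full = false;
    };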
-
- Jan 13, 2014
Glauber Costa authored
ZFS will perform some checks to determine if the current calling "process" is the reclaimer. Export the address of the reclaimer thread so that this test can work. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Dmitry Fleytman authored
Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Amnon Heiman authored
The uname() function returns a fake Linux version number for application compatibility. Add a new osv::version() API that returns the OSv version, which can be used by the management code. Signed-off-by:
Amnon Heiman <amnon@cloudius-systems.com> [ penberg: cleanups ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Useful for calculating the time during which a thread was scheduled out because of wait(). Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 10, 2014
Glauber Costa authored
To make informed reclaim decisions, we need to have as much relevant information as possible about our reclaim targets. Specifically, it is useful to know how much memory is currently used by the JVM heap. The reasoning behind this is that if pressure is coming from the heap, ballooning will harm us instead of helping us. Note: This is really just a first approximation. Ideally, total memory shouldn't matter, but rather the memory delta since a last common event. But counting memory is the first step for both. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
To find out which vmas hold the Java heap, we will use a technique that is very close to ballooning (in the implementation, it is effectively the same). We will insert a very small element (2 pages) and mark the vma where the object is present as containing the JVM heap. Due to the way the JVM allocates objects, it will end up in the young generation. As time passes, the object will move the same way the balloon moves, and every new vma that is seen will be marked as holding the JVM heap. That mechanism should work for every generational GC, which should encompass most of the JDK7 GCs (if not all). It won't work with the G1GC, but that debuts in JDK8, and for that we can do something a lot simpler, namely have the JVM tell us in advance which map areas contain the heap. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
There are restrictions on when and how a shrinker can run. For instance, if we have no balloons inflated, there is nothing to deflate (the relaxer should then be deactivated). Likewise, when the JVM fails to allocate memory for an extra balloon, it is pointless to keep trying (which would only lead to unnecessary spins) until *at least* the next garbage collection phase. I believe this activation/deactivation behavior ought to be shrinker-specific; the reclaiming framework will only provide the infrastructure for it. In this patch, the JVM balloon uses that to inform the reclaimer when it makes sense for the shrinker or relaxer to be called. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch implements the JVM balloon driver, which is responsible for borrowing memory from the JVM when OSv is short on memory, and giving it back when memory is plentiful again. It works by allocating a Java byte array and then unmapping a large page-aligned region inside it (as big as our size allows). This array is good to go until the GC decides to move it. When that happens, we need to carefully emulate the memcpy fault and put things back in place. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
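A heavily simplified sketch of the borrow step, using standard JNI calls; madvise() stands in for OSv's unmapping of the aligned interior, and the helper name is an assumption, not the actual jvm_balloon code:

    #include <jni.h>
    #include <sys/mman.h>
    #include <cstdint>

    // Allocate a Java byte array and give its page-aligned interior back to
    // the system. The array stays referenced, so only a GC move (handled by
    // the real balloon's fault/decode path) can disturb the region.
    void* borrow_from_jvm(JNIEnv* env, jsize array_size, size_t align)
    {
        jbyteArray arr = env->NewByteArray(array_size);
        if (arr == nullptr) {
            return nullptr;                         // allocation failed; try again after a GC
        }
        env->NewGlobalRef(arr);                     // keep the array alive
        void* base = env->GetPrimitiveArrayCritical(arr, nullptr);
        auto start = (reinterpret_cast<uintptr_t>(base) + align - 1) & ~(align - 1);
        auto end   = (reinterpret_cast<uintptr_t>(base) + array_size) & ~(align - 1);
        if (end > start) {
            // Release the aligned interior; madvise() is a stand-in for the unmap.
            madvise(reinterpret_cast<void*>(start), end - start, MADV_DONTNEED);
        }
        env->ReleasePrimitiveArrayCritical(arr, base, 0);
        return end > start ? reinterpret_cast<void*>(start) : nullptr;
    }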
-
Glauber Costa authored
After carrying out some testing, I quickly realized that the old fixup-only solution I was attempting for the ballooning was not really flying. The reason is that we would take a fault, figure out the fixup address, and return. If that wasn't a JVM fault, we were forced to take another fault (since we were already out of fault context). Once demand paging is a reality, the vast majority of faults are for non-balloon addresses, so we were effectively doubling our number of page faults for no reason. I have decided to go with the VMA (+ fixups for instruction decoding) route after all. This is way more efficient and it seems to be working fine. The JVM vma is really close to the normal anonymous VMA, except that it can never hold pages, and its fault handler calls into the JVM balloon facilities for decoding. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch introduces the memory reclaimer thread, which I hope to use to dispose of unused memory when pressure kicks in. "Pressure" right now is defined as having only 20% of total memory available, but that can be revisited. The way it will work is that each memory user that is able to dispose of its memory will register a shrinker, and the reclaimer will loop through them. However, the current "loop through all" only "works" because we have only one shrinker being registered. When others appear, we need better policies to drive how much to take, and from whom. Memory allocation will now wait if memory is not available, instead of aborting. The decision to abort should belong to the reclaimer and no one else. We should never expect to have an unbounded and, more importantly, all-opaque number of shrinkers like Linux does. We have control over who they are and how they behave, so I expect that we will be able to make much better decisions in the long run. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
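A schematic of the register-and-loop structure described above; the class and method names are illustrative, not the OSv reclaimer API:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Illustrative shrinker interface: each registered memory user can be
    // asked to give back up to 'target' bytes and reports what it released.
    struct shrinker {
        virtual ~shrinker() = default;
        virtual size_t release_memory(size_t target) = 0;
    };

    class reclaimer {
    public:
        void register_shrinker(std::shared_ptr<shrinker> s) {
            _shrinkers.push_back(std::move(s));
        }
        // Called when free memory drops below the pressure threshold (20% here).
        void reclaim(size_t free_bytes, size_t total_bytes) {
            if (free_bytes * 5 >= total_bytes) {
                return;                                  // no pressure
            }
            size_t want = total_bytes / 5 - free_bytes;  // bring us back to ~20% free
            for (auto& s : _shrinkers) {                 // naive "loop through all" policy
                if (want == 0) break;
                size_t got = s->release_memory(want);
                want -= got < want ? got : want;
            }
        }
    private:
        std::vector<std::shared_ptr<shrinker>> _shrinkers;
    };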
-
Glauber Costa authored
Following an early suggestion from Nadav, I am trying to use semaphores for the balloon instead of keeping our own queue. For that to work, I need a bit more functionality that may not belong in the main balloon class. Namely: 1) I need to query for the presence of waiters (and maybe, in the future, for the number of waiters); 2) I need a special post that allows me to make sure that we post at most as much as we're waiting for, and nothing more. This patch transforms the post method into an unlocked version (and exposes a trivial version that just locks around it) and makes the other changes necessary to allow subclassing. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
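A sketch of that split on a simple counting semaphore; names like post_unlocked() mirror the description, but this is not the actual OSv semaphore:

    #include <condition_variable>
    #include <mutex>

    class semaphore {
    public:
        explicit semaphore(unsigned val = 0) : _val(val) {}

        void wait() {
            std::unique_lock<std::mutex> lk(_mtx);
            ++_waiters;
            _cv.wait(lk, [this] { return _val > 0; });
            --_waiters;
            --_val;
        }
        // Trivial locked wrapper around the unlocked version.
        void post(unsigned units = 1) {
            std::lock_guard<std::mutex> lk(_mtx);
            post_unlocked(units);
        }

    protected:
        // A subclass (e.g. a balloon-specific semaphore) can combine these
        // under one lock, e.g. post at most as many units as there are waiters.
        void post_unlocked(unsigned units) { _val += units; _cv.notify_all(); }
        unsigned waiters_unlocked() const { return _waiters; }
        std::mutex _mtx;

    private:
        std::condition_variable _cv;
        unsigned _val;
        unsigned _waiters = 0;
    };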
-
Glauber Costa authored
This will be useful when we shrink, so we know how much memory we newly released for system consumption. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
So far, operate() works on a page range and at the very most sets a success flag somewhere. I am extending the API here to allow it to return how much data it manipulated. As an example, if we fault in 2MB in an empty range, it will return 2 << 20. But if we fault in the same 2MB in a range that already contained some sparse 4KB pages, we will return (2 << 20) minus the size of those previous pages. That will be useful for counting memory usage in certain VMAs. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 08, 2014
Glauber Costa authored
There was a small bug in the free memory tracking code that I've only hit recently. I was wrong in assuming that in the first branch for huge page allocation, where we erase the entire range, we should account for N bytes. This assumption came from my - wrong - understanding that we would do that when the range is exactly N bytes. Looking at the code with fresh eyes, that is definitely not what happens. In my previous stress test we were hitting the second branch all the time, so this bug lived on. Turns out that we will delete the entire page range, which may be bigger than N, the allocation size. Therefore, the whole range should be discounted from our calculation. The remainder (bigger than N part) will be accounted for later when we reinsert it in the page range, in the same way it is for the second branch of this code. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 07, 2014
Nadav Har'El authored
In very early OSv history, the spinlock was used in the mutex's implementation so it made sense to put it in mutex.cc and mutex.h. But now that the spinlock is all that's left in mutex.cc (the real mutex is in lfmutex.cc), rename this file spinlock.cc. Also, move the spinlock definitions from <osv/mutex.h> to a new <osv/spinlock.h>, so if someone wants to make the grave mistake of using a spinlock - they will at least need to explicitly include this header file. Currently, the only remaining user of the spinlock is the console. Using a spinlock (and not a mutex) in the console allows printing debug messages while preemption is disabled. Arguably, this use-case is no longer important (we have tracepoints), so in the future we can consider dropping the spinlock completely. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch reserves some thread ids that are kept unused. This is so we can construct values that reuse the thread's public id, combine it with other information, and still fit in 32 bits. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
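An illustrative packing scheme showing why reserving part of the id space helps; the field widths below are assumptions for the example, not the actual OSv layout:

    #include <cstdint>

    // Example only: keep public thread ids below 2^24 so that an id plus an
    // 8-bit tag still fits in a single 32-bit value.
    constexpr uint32_t id_bits = 24;
    constexpr uint32_t max_public_id = (1u << id_bits) - 1;  // ids above this stay reserved

    constexpr uint32_t pack(uint32_t thread_id, uint8_t tag)
    {
        return (static_cast<uint32_t>(tag) << id_bits) | (thread_id & max_public_id);
    }
    constexpr uint32_t unpack_id(uint32_t packed)  { return packed & max_public_id; }
    constexpr uint8_t  unpack_tag(uint32_t packed) { return static_cast<uint8_t>(packed >> id_bits); }

    static_assert(unpack_id(pack(12345, 7)) == 12345, "id round-trips");
    static_assert(unpack_tag(pack(12345, 7)) == 7, "tag round-trips");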
-
Glauber Costa authored
This will be used later to determine for how long a thread has been running. It can easily be updated right before we call ran_for(), reusing its interval parameter. Fixes #135 Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 03, 2014
Tomasz Grabiec authored
Useful if you want to know who created that large pile of threads. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 02, 2014
Gleb Natapov authored
Currently map_file() does three passes over vma memory in the worst case. First it maps memory with write permission while zeroing it, then it reads the file into memory, and, if the vma is read-only, it does one more pass to fix the memory permissions. Fix this by providing a new specialization of the fill_page class which builds an iovec of all allocated memory and reads from the file using that iovec at the end of the populate stage. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
In issue #145 I reported a crash during boot in start_early_threads(). I wasn't actually able to replicate this bug on master, but it happens quite frequently (e.g., on virtually every "make check" run) with some patches of mine that seem unrelated to this bug. The problem is that start_early_threads() (added in 63216e85) iterates over the threads in the thread list, and uses t->remote_thread_local_var() for each thread. This can only work if the thread has its TLS initialized, but unfortunately in the thread's constructor we first added the new thread to the list, and only later called setup_tcb() (which allocates and initializes the TLS). If we're unlucky, start_early_threads() can find a thread on the list which still doesn't have its TLS allocated, so remote_thread_local_var() will crash. The simple fix is to switch the order of construction: first set up the new thread's TLS, and only then add it to the list of threads. Fixes #145. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
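The shape of the fix, reduced to a toy constructor; the member names are placeholders, the real code is in OSv's scheduler:

    #include <list>
    #include <mutex>

    // Toy thread object illustrating the ordering fix: the TLS block must
    // exist before the thread becomes visible on the global list.
    // (Cleanup and list removal are omitted for brevity.)
    class toy_thread {
    public:
        toy_thread() {
            setup_tcb();                    // allocate and initialize TLS first ...
            std::lock_guard<std::mutex> lk(_list_mutex);
            _all_threads.push_back(this);   // ... and only then publish the thread
        }
    private:
        void setup_tcb() { _tls = new char[256](); }   // stand-in for real TLS setup
        char* _tls = nullptr;
        static std::list<toy_thread*> _all_threads;
        static std::mutex _list_mutex;
    };

    std::list<toy_thread*> toy_thread::_all_threads;
    std::mutex toy_thread::_list_mutex;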
-
Tomasz Grabiec authored
In order to reuse the logic it needs to be extracted. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 01, 2014
Nadav Har'El authored
Since recently we have methods of the "file" class instead of the old C-style file-operation function types and fo_*() functions. All our filesystem code is now C++ and can use these methods directly, so this patch drops the old types and functions and uses the class methods instead. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
The code has bitrotted, and it doesn't support wait morphing. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Instead of free-standing functions, use member functions, which are easier to work with. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
The _ prefix helps to distinguish between members and non-members; this helps with the next patch. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Make the code more maintainable by removing the #ifdefs; it doesn't make sense to disable wait morphing. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
Each implementation of "struct file" needs to implement 8 different file operations. Most special file implementations, such as pipe, socketpair, epoll and timerfd, don't support many of these operations. We had functions in unsupported.h that could be reused for the unsupported operations, but this resulted in a lot of ugly boilerplate code. Instead, this patch switches to a cleaner, more C++-like method: it defines a new "file" subclass, called "special_file", which implements all file operations except close(), with a default implementation identical to the old unsupported.h implementations. The files of pipe(), socketpair(), timerfd() and epoll_create() now inherit from special_file, and only override the file operations they really want to implement. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
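The pattern in outline; the operation list and return values here are simplified assumptions, not OSv's actual file-operations interface:

    #include <cerrno>
    #include <cstddef>

    // Simplified stand-in for the file-operations interface.
    class file {
    public:
        virtual ~file() = default;
        virtual int read(void* buf, size_t len) = 0;
        virtual int write(const void* buf, size_t len) = 0;
        virtual int truncate(long len) = 0;
        virtual int chmod(int mode) = 0;
        virtual int close() = 0;
    };

    // Default "unsupported" behaviour lives in one place; close() stays pure.
    class special_file : public file {
    public:
        int read(void*, size_t) override { return -EINVAL; }
        int write(const void*, size_t) override { return -EINVAL; }
        int truncate(long) override { return -EINVAL; }
        int chmod(int) override { return -EINVAL; }
    };

    // A pipe-like file overrides only what it actually supports.
    class pipe_file : public special_file {
    public:
        int read(void*, size_t) override { return 0; }    // real pipe read goes here
        int write(const void*, size_t) override { return 0; }
        int close() override { return 0; }
    };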
-
- Dec 31, 2013
Gleb Natapov authored
Right now most of the mmap-related functions have the same bug related to vma locking: they validate the mapping under the vma lock, then release the lock and do the actual vma operation; but since the mapping can go away between validation and operation, this is incorrect. This patch fixes it by doing validation and operation under the same lock. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
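The locking change, schematically; the mutex and helpers are placeholders for the real vma list lock and operations:

    #include <cstddef>
    #include <mutex>

    static std::mutex vma_list_mutex;                       // placeholder for the real vma lock
    static bool validate(void*, size_t) { return true; }    // stub for the sketch
    static void do_vma_operation(void*, size_t) {}          // stub for the sketch

    // Buggy shape: the mapping can disappear between the two critical sections.
    int mmap_op_racy(void* addr, size_t size)
    {
        {
            std::lock_guard<std::mutex> lk(vma_list_mutex);
            if (!validate(addr, size)) return -1;
        }
        // <-- another thread may unmap here
        std::lock_guard<std::mutex> lk(vma_list_mutex);
        do_vma_operation(addr, size);
        return 0;
    }

    // Fixed shape: validation and operation under one and the same lock.
    int mmap_op(void* addr, size_t size)
    {
        std::lock_guard<std::mutex> lk(vma_list_mutex);
        if (!validate(addr, size)) return -1;
        do_vma_operation(addr, size);
        return 0;
    }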
-
Gleb Natapov authored
Currently the offset calculation is incorrect. Fix it by tracking the base address of a region and calculating the offset by subtracting the base address from the current mapping address. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
map_range() is the entry point into the page mapper, so make it impossible to instantiate the map_level class directly by making all of its functions and its constructor private and declaring map_range() as a friend. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
One for the initial call, another for the recursion. This gets rid of the default parameters and the need for std::integral_constant, and makes the code much more readable. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-