- Jul 28, 2013
-
-
Avi Kivity authored
Needed for safe backtrace.
-
Avi Kivity authored
A backtrace() implementation which is safe for use in weird contexts like interrupts. Needs -fno-omit-frame-pointer, but doesn't crash if some object is compiled without it.
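A minimal sketch of the frame-pointer walk this describes, assuming the x86_64 frame layout and a safe_load()-style probe (names and signatures are illustrative, not OSv's actual code):

    #include <cstdint>

    struct frame {
        frame* next;      // saved frame pointer of the caller
        void*  ret_addr;  // return address into the caller
    };

    // Walks saved frame pointers. safe_load() is assumed to copy *fp
    // into f and return false instead of faulting, so a frame from
    // code built without -fno-omit-frame-pointer ends the walk
    // instead of crashing it.
    int backtrace_sketch(void** buf, int len)
    {
        auto* fp = static_cast<frame*>(__builtin_frame_address(0));
        int n = 0;
        frame f;
        while (fp && n < len && safe_load(fp, f) && f.ret_addr) {
            buf[n++] = f.ret_addr;
            fp = f.next;
        }
        return n;
    }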
-
- Jul 24, 2013
-
-
Avi Kivity authored
safe_load() and safe_store() can be used even if the pointer may fault.
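A usage sketch, assuming signatures along the lines of bool safe_load(const T* p, T& out) and bool safe_store(T* p, const T& v) (the exact shape is an assumption):

    std::uint64_t v;
    if (safe_load(maybe_bad_ptr, v)) {
        // the page was mapped: v holds the loaded value
    } else {
        // the access would have faulted: no crash, just false
    }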
-
- Jul 22, 2013
-
-
Avi Kivity authored
We want to use backtrace() (and therefore dl_allocate_phdr(), and therefore program::with_modules()) in interrupt-disabled context; change with_modules() to avoid any allocations.
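The shape of the change is roughly as follows (a sketch assuming the module list is maintained eagerly at load/unload time; names are illustrative):

    // Instead of building and returning a heap-allocated container,
    // pass the caller a reference to an eagerly maintained list, so
    // the call itself allocates nothing and is interrupt-safe.
    template <typename Func>
    void program::with_modules(Func f)
    {
        f(_modules);   // _modules is updated on dlopen()/dlclose()
    }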
-
- Jul 21, 2013
-
-
Avi Kivity authored
Scheduler and allocator improvements.
-
Avi Kivity authored
Since the memory pools are backed by the page allocator, we need a fast page allocator, particularly for pools of large objects (with 1-2 objects per page, a page is exhausted very quickly). This patch adds a per-cpu cache of allocated pages. Pages are allocated from (and freed to) the cache without locking; the buffer is filled or drained when it is empty or full, taking the page range lock.
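A sketch of the scheme (sizes and names illustrative; alloc_page_global()/free_page_global() stand in for the locked page-range allocator):

    #include <cstddef>

    void* alloc_page_global();          // stand-ins for the locked
    void free_page_global(void* page);  // page-range allocator

    constexpr std::size_t max_cached = 512;

    struct percpu_page_buffer {
        std::size_t nr = 0;
        void* pages[max_cached];

        void* alloc() {
            if (nr == 0) {
                refill();           // takes the page range lock once
            }
            return pages[--nr];     // otherwise lock-free
        }
        void free(void* page) {
            if (nr == max_cached) {
                drain();            // takes the page range lock once
            }
            pages[nr++] = page;     // otherwise lock-free
        }
        void refill() {
            while (nr < max_cached / 2) {
                pages[nr++] = alloc_page_global();
            }
        }
        void drain() {
            while (nr > max_cached / 2) {
                free_page_global(pages[--nr]);
            }
        }
    };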
-
Avi Kivity authored
If we allocate and free just one object in an empty pool, we will continuously allocate a page, format it for the pool, then free it. This is wasteful, so allow the pool to keep one empty page. The page is kept at the back of the free list, so it won't get fragmented needlessly.
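Roughly, the free path gains a check like this (a sketch; field and helper names are made up):

    // Freeing the last object on a page: keep the page only if the
    // pool has no empty page yet, and put it at the back of the free
    // list so allocations keep preferring partially filled pages.
    void pool::on_page_emptied(page_header* page)
    {
        if (_has_empty_page) {
            return_page(page);             // back to the page allocator
        } else {
            _has_empty_page = true;
            _free_list.push_back(*page);   // back of the list
        }
    }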
-
Avi Kivity authored
Instead of an array of 64 free lists, let dynamic_percpu<> manage the allocations for us. This reduces waste since we no longer require cache line alignment.
-
Avi Kivity authored
Instead of managing the counters manually, use the generic infrastructure.
-
Avi Kivity authored
dynamic_percpu<T> allocates and initializes an object of type T on all cpus (if a cpu is later hotplugged, it will also get an instance). Unlike ordinary percpu variables, dynamic_percpu objects can be used in a dynamic scope, that is, in objects that are not in static scope (on the stack or heap).
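A hypothetical usage sketch (the accessor syntax is assumed):

    #include <cstdint>

    // A per-cpu counter embedded in an ordinary heap object,
    // something static percpu variables cannot express.
    struct flow_stats {
        dynamic_percpu<std::uint64_t> rx_bytes;  // instance on every cpu
    };

    void on_rx(flow_stats& s, std::uint64_t n)
    {
        *s.rx_bytes += n;   // accessor syntax assumed; touches only
                            // the current cpu's copy
    }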
-
Avi Kivity authored
With dynamic percpu allocations, the allocator won't be available until the first cpu is created. This creates a circular dependency, since the first cpu itself needs to be allocated. Use a simple and wasteful allocator until the real allocator is ready. Objects allocated by the simple allocator are marked by having a page offset of 8.
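An illustrative sketch of the bootstrap scheme (arena size and names are assumptions; the page offset of 8 is what the message states):

    #include <cstddef>
    #include <cstdint>

    alignas(4096) char early_arena[1 << 20];   // "simple and wasteful"
    std::size_t early_pos;

    void* early_alloc(std::size_t size)
    {
        // Start each object 8 bytes into a fresh page, so that
        // (addr & 4095) == 8 identifies early objects later.
        early_pos = ((early_pos + 4095) & ~std::size_t(4095)) + 8;
        void* p = early_arena + early_pos;
        early_pos += size;
        return p;
    }

    void free_sketch(void* p)
    {
        if ((reinterpret_cast<std::uintptr_t>(p) & 4095) == 8) {
            return;   // early allocation: intentionally leaked
        }
        // ... normal pool/page free path ...
    }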
-
- Jul 20, 2013
-
-
Pekka Enberg authored
-
- Jul 19, 2013
-
-
Pekka Enberg authored
Add 'ant' and 'gcc-c++' packages to build prerequisites. They are needed to build OSv on a newly installed Fedora 19.
-
- Jul 18, 2013
-
-
Avi Kivity authored
TLS is needed for per-cpu storage, so initialize it before the rest of the scheduler.
-
Avi Kivity authored
Make the early allocator available earlier to support the dynamic per-cpu allocator.
-
Avi Kivity authored
Depending on the current thread causes a circular dependency with later patches. Use a per-thread variable instead, which is maintained on migrations similarly to percpu_base. A small speedup is a nice side effect.
-
Avi Kivity authored
-
Avi Kivity authored
Avoid a #include loop with later patches.
-
Avi Kivity authored
A preemption is expensive, both in the cycles spent in the scheduler, and in cache lines being evicted by the new thread. Penalize threads that cause preemption by adding a small preemption tax to their vruntime; this will decrease their relative priority. Threads that sleep a long time will be relatively unaffected and retain low latency; threads that wake up very often, such as those in a wait/wake loop with another thread, will be penalized a lot and avoid excessive wakes.
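In sketch form (the field and constant names are illustrative, not OSv's actual scheduler code):

    // Charged to a thread whose wakeup preempts the running thread:
    // its vruntime grows, lowering its relative priority, so threads
    // in tight wake/wait loops stop forcing constant reschedules.
    void charge_preemption_tax(thread& waker)
    {
        waker.vruntime += preemption_tax;   // small constant penalty
    }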
-
Avi Kivity authored
With the current implementation, a thread can accrue a large vruntime backlog while it sleeps. This will result in this thread preempting anything that moves for a while. The borrow mechanism attempts to correct for this, but isn't working well. Reduce the backlog by limiting the vruntime difference to a single round trip of all currently queued threads. The borrow mechanism is removed. This is similar to Guy's patch, except vruntime only moves forward, so it is capped only in the negative (minimum) direction, not forward. It is also similar to Linux CFS.
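A sketch of the clamp, assuming a CFS-style minimum vruntime (all names are illustrative):

    #include <algorithm>
    #include <cstdint>

    // On wakeup, cap how far a sleeper's vruntime may lag behind the
    // queue minimum: at most one round trip of the currently queued
    // threads. Only the minimum direction is clamped; vruntime never
    // moves backward.
    void cap_wakeup_vruntime(thread& t, std::int64_t min_vruntime,
                             std::int64_t queued, std::int64_t timeslice)
    {
        std::int64_t max_lag = queued * timeslice;
        t.vruntime = std::max(t.vruntime, min_vruntime - max_lag);
    }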
-
Avi Kivity authored
Currently we initialize a new thread's vruntime to the clock time. However, since threads only acquire vruntime while they run, whereas the clock always advances, this is unreasonably high. Initialize it instead to the parent thread's vruntime. Since the parent thread is running now, its vruntime represents fairly high priority; we may want to tune that later.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
-
Dor Laor authored
-
Nadav Har'El authored
This patch adds to tst-condvar two benchmarks for measuring condvar::wake_all() on a condvar that nobody is waiting on. The first benchmark does these wakes from a single thread, measuring 26ns before commit 3509b19b, and only 3ns after it. The second benchmark does wake_all() loops from two threads on two different CPUs. Before the aforementioned commit, this frequently involved a contended mutex and context switches, with as much as 30,000 ns delay. After that commit, this benchmark measures 3ns, the same as the single-threaded benchmark.
-
Nadav Har'El authored
Previously, condvar_wake_one()/all() took the condvar's internal lock before testing whether anyone is waiting. A condvar_wake when nobody was waiting cost mutex_lock()+mutex_unlock() time (on my machine, 26 ns) when there was no contention, but much, much more (involving a context switch) when several CPUs were trying condvar_wake concurrently. In this patch, we first test whether the queue head is null before acquiring the lock, and only acquire the lock if it isn't. Now the condvar_wake-on-an-empty-queue micro-benchmark (see next patch) takes less than 3ns - regardless of how many CPUs are doing it concurrently. Note that the queue head we test is NOT atomic, and we do not use any memory fences. If we read the queue head and see 0, it is safe to decide nobody is waiting and do nothing. But if we read the queue head and see != 0, we can't do anything with the value we read - it might be only half-set (if the pointer is not atomic on this architecture), or be set while the value it points to is not (we didn't use a memory fence to enforce any ordering). So if we see the head is != 0, we need to acquire the lock (which also imposes the required memory visibility and ordering) and try again.
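The resulting fast path reads roughly like this sketch (member names and types are assumed):

    void condvar::wake_all()
    {
        // Racy, unfenced read. Seeing nullptr is conclusive: nobody
        // can be waiting, so return without touching the lock.
        if (!_waiters_head) {
            return;
        }
        // A non-null read may be torn or stale, so take the internal
        // lock (which also provides the required visibility and
        // ordering) and walk the queue under it.
        std::lock_guard<mutex> guard(_m);
        while (auto* w = _waiters_head) {
            _waiters_head = w->next;
            w->wake();
        }
    }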
-
- Jul 17, 2013
-
-
Dor Laor authored
No code change.
-
Dor Laor authored
Instead of cancelling block requests when there is no space on the ring, which led to corruption in the upper layer, block until space becomes available.
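In outline, the request path changes from fail-and-cancel to wait-and-retry (a sketch; names are illustrative, not the actual virtio driver API):

    void blk_make_request(virtqueue* vq, blk_request* req)
    {
        // Before: an add_buf() failure cancelled the request, and
        // upper layers saw spurious I/O errors. After: sleep until
        // the device consumes descriptors and ring space frees up.
        while (!vq->add_buf(req)) {
            wait_for_ring_space(vq);
        }
        vq->kick();
    }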
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
This operation is very different from FreeBSD and Solaris because our VFS uses create just for the actual entry creation and not for opening the file, similar to how Linux splits the operation. A lot of code that is already handled in vnop_open or the VFS thus can go away.
-
Christoph Hellwig authored
This is required for write operations. For now we don't actually replay it yet, as that requires a lot more hairy OS-specific code.
-
Christoph Hellwig authored
-
Christoph Hellwig authored
This one will only show up in non-debug builds for some reason.
-
Christoph Hellwig authored
-