- Jun 14, 2013
-
-
Glauber Costa authored
The algorithm we follow for memory discovery is quite simple: iterate over the E820 map, and for every type 1 (== RAM) region, increment the total size and map it linearly into our address space. That breaks on Xen, however. I have no idea what SeaBIOS does for KVM, but Xen's hvmloader puts most of the ACPI tables in a reserved region around physical address 0xfc000000. When we try to parse the ACPI tables, we reach an unmapped portion of the address space and fault. (BTW, those faults are really hard to debug: we triple fault directly, at least in my setup.) Luckily, the ACPI driver code is prepared for such scenarios, and before using any of that memory it calls map and unmap functions; we just don't implement them. This patch implements the necessary map function and, while we are at it, its unmap counterpart. This is all far from performance critical, so I am being as dumb as possible and just servicing the request without tracking any previous state. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
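The discovery loop described in this commit message can be sketched roughly as follows. This is only an illustration: the e820_entry layout follows the conventional BIOS E820 format, but the struct and total_ram() are hypothetical names, not the loader's actual code, and the linear mapping step is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical E820 entry. Type 1 is usable RAM; other types (for
// example 2, reserved) are not RAM.
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

// Sum the size of every type-1 (RAM) region. Entries of any other type
// are skipped, which is why a reserved region holding ACPI tables ends
// up outside the linear mapping.
uint64_t total_ram(const std::vector<e820_entry>& map)
{
    uint64_t total = 0;
    for (const auto& e : map) {
        if (e.type == 1) {
            total += e.size;
        }
    }
    return total;
}
```

With a map that has a reserved entry near 0xfc000000, that entry contributes nothing to the total, so the ACPI code has to request an explicit mapping before touching it.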
-
- Jun 13, 2013
-
-
Avi Kivity authored
Make the function tracer work again.
-
Avi Kivity authored
Set it to have the largest possible vruntime, so it is only ever picked up from the queue if there are no other threads on it. This avoids a pointless context switch to the idle thread, where it wakes, sees another thread, and switches out again.
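A minimal sketch of the idea, assuming a scheduler that always picks the runnable thread with the smallest vruntime. The thread_sketch type and pick_next() are hypothetical and stand in for the real run queue.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical runnable-thread record.
struct thread_sketch {
    std::string name;
    int64_t vruntime;
};

// Pick the thread with the smallest vruntime, or nullptr if the queue
// is empty. A thread whose vruntime is the maximum representable value
// wins only when nothing else is queued.
const thread_sketch* pick_next(const std::vector<thread_sketch>& runqueue)
{
    const thread_sketch* best = nullptr;
    for (const auto& t : runqueue) {
        if (!best || t.vruntime < best->vruntime) {
            best = &t;
        }
    }
    return best;
}
```

Giving the idle thread std::numeric_limits<int64_t>::max() as its vruntime means any other queued thread is chosen first.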
-
Nadav Har'El authored
usleep() was scrubbed out of POSIX in 2008, and not used in Java, but it does exist in glibc and is damn easy to use compared to its newer relative, nanosleep, so I want to use it in a test.
-
Nadav Har'El authored
As Avi pointed out, shutdown_af_local() did read-modify-write to f->f_flags without locking. Add the missing locks.
-
Nadav Har'El authored
Remove things already done.
-
- Jun 12, 2013
-
-
Avi Kivity authored
The functions that are used in function tracing must not themselves be traced, lest we recurse endlessly. Rather than marking them all with no_instrument_function, keep a nesting counter and check if we're nested. This way only the functions used for the test must not be traced.
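The nesting-counter approach might look like the sketch below. All names (trace_nesting, trace_enter, helper_used_by_tracer) are hypothetical, and the real tracer's storage and entry hooks are not shown in this log.

```cpp
#include <cassert>

// Hypothetical per-thread nesting depth of the tracer.
static thread_local int trace_nesting = 0;
// Counts events actually recorded, for illustration only.
static int events_recorded = 0;

void trace_enter();

// Stand-in for a helper the tracer calls while recording. If it were
// instrumented, it would call back into trace_enter().
void helper_used_by_tracer()
{
    trace_enter();
}

void trace_enter()
{
    // Only this check has to run uninstrumented: a nested entry
    // returns immediately instead of recursing.
    if (trace_nesting > 0) {
        return;
    }
    ++trace_nesting;
    ++events_recorded;
    helper_used_by_tracer();   // would recurse endlessly without the guard
    --trace_nesting;
}
```

Each outermost call records exactly one event, and the helpers no longer need to be marked individually.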
-
Avi Kivity authored
Seeing a trace from an interrupt incurred while tracing can be confusing, so disable them.
-
Avi Kivity authored
In the tracer, we don't want interrupt manipulation to cause recursion, so provide uninstrumented versions of select functions.
-
Nadav Har'El authored
This patch optionally enables, at compile-time, OSv to use the lock-free mutex instead of the spin-lock-based mutex. To use the lock-free mutex, change the line "#undef LOCKFREE_MUTEX" in include/osv/mutex.h to "#define LOCKFREE_MUTEX". LOCKFREE_MUTEX is currently disabled by default, awaiting a few more tests, but at this point I'm happy to say that beyond one known unrelated bug (see details below), it seems the lock-free mutex is fairly stable, and survives all tests and benchmarks I threw at it. The remaining known bug involves a thread destruction race between complete() and join(): complete() wake()s the joiner thread, which in rare cases can really quickly delete the thread's stack, before wake() returns, causing a crash on return from wake(). This bug is really unrelated to the lock-free mutex, but for some unknown reason I can only reproduce it with the lock-free mutex on the SPECjvm2008 "sunflow" benchmark. To make lockfree::mutex our default mutex, this patch does the following when LOCKFREE_MUTEX is defined: 1. In core/mutex.cc, #ifndef out the old mutex code, leaving the spinlock code in case someone wants to use it directly. 2. In include/osv/mutex.h, do different things in C++ and C (remember that lockfree::mutex is a C++ class, and cannot be used directly from C): * In C++, simply make mutex and mutex_t aliases for lockfree::mutex. * In C, make struct mutex and mutex_t an opaque 40-byte structure (in C++ compilation, we verify that this 40 is indeed the C++ class's length), and define the operations on it. 3. In libc/pthread.cc, if LOCKFREE_MUTEX is defined, unfortunately the new mutex will not fit into pthread_mutex_t, and neither will condvar now fit into pthread_cond_t. So use a lazily allocated mutex or condvar, using the lazy_indirect<> template.
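The lazily allocated indirection mentioned in step 3 could be sketched like this. It is an assumption about the general shape only: lazy_indirect_sketch is a hypothetical name, and OSv's actual lazy_indirect<> template may differ in interface and memory ordering.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch: a pointer-sized slot that allocates its T on
// first use, so a large object can hide behind a small fixed-size type.
template <typename T>
class lazy_indirect_sketch {
public:
    lazy_indirect_sketch() : _p(nullptr) {}
    ~lazy_indirect_sketch() { delete _p.load(); }

    T* get()
    {
        T* p = _p.load(std::memory_order_acquire);
        if (!p) {
            T* fresh = new T();
            // Publish our object unless another thread won the race.
            if (_p.compare_exchange_strong(p, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
                p = fresh;
            } else {
                delete fresh;   // p now holds the winner's object
            }
        }
        return p;
    }

private:
    std::atomic<T*> _p;
};
```

The first get() allocates the object and later calls return the same pointer, so only the pointer has to fit in the space that pthread_mutex_t or pthread_cond_t provides.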
-
Glauber Costa authored
I have been commenting lines in and out of this script to choose the right underlying hypervisor to run. So here is the automated version of it. I haven't chosen the letters h or y because they usually denote help and yes, respectively. It is also not a kvm/no-kvm boolean because very soon we would like to include xen. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Glauber Costa authored
Now that we can actually see the debug message, print our name on it. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Glauber Costa authored
We can use a very simple outb instruction to write data to the serial port in case we don't have a console implementation yet. We don't need to be fancy, and even limited functionality will already allow us to print messages early (especially debug messages). Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
Glauber Costa authored
We could benefit from the console being ready a bit earlier. The only dependency I see is that interrupts need to be working. So as soon as we initialize the ioapic, we should be able to initialize the console. This is not the end of the story: we still need an even earlier console to debug the driver initialization functions, and I was inclined to just leave console_init where it is, for now. But additionally, I felt that loader is really a more appropriate place for that than vfs_init... So I propose we switch. In the meantime, it might help debug things that happen between ioapic init and the old vfs_init (mem initialization, smp bring up, etc.) Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
- Jun 11, 2013
-
-
Nadav Har'El authored
Sorry, forgot one hunk in "git add -p" :(
-
Avi Kivity authored
cv_timedwait() has a relative timeout expressed in ticks (microseconds), while condvar_wait() has an absolute timeout expressed in nanoseconds. Replace the 1:1 macro with a function that does the correct translation.
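The translation amounts to a unit conversion plus an offset from the current time. A minimal sketch, assuming one tick is one microsecond as stated in this commit message; ticks_to_absolute_ns() and the now_ns parameter are hypothetical names, not the function added by the commit.

```cpp
#include <cassert>
#include <cstdint>

// One tick is one microsecond (per the commit message), i.e. 1000 ns.
constexpr uint64_t ns_per_tick = 1000;

// Convert a relative timeout in ticks into an absolute deadline in
// nanoseconds, given the current time in nanoseconds.
uint64_t ticks_to_absolute_ns(uint64_t now_ns, uint64_t relative_ticks)
{
    return now_ns + relative_ticks * ns_per_tick;
}
```

A 1:1 macro would pass the tick count straight through, which gets both the unit and the reference point wrong.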
-
Nadav Har'El authored
Updated test with the new API. Sorry about forgetting to commit it earlier.
-
Glauber Costa authored
-
Nadav Har'El authored
Changed lockfree::queue_mpsc (lock-free multiple-producer single-consumer queue) pop() API. Instead of returning separately the popped value (type T) and a boolean success/failed, now return a pointer to the linked_item<T> originally pushed(), or nullptr on failure. The new pop() API is slightly more awkward (instead of using the returned value directly, you need to take its field "value") but has an important new feature: It gives you not just the value, but also the address where this value is stored. So it is now possible to change the value in its original structure. This allows us to implement our (by now) traditional waitqueue technique: The values on the queue are thread pointers, and the popper, before waking up a thread, sets the thread pointer to zero - this way the woken up thread knows it isn't a spurious wakeup. A followup patch will use this capability to clean up lockfree::mutex not to abuse the "owner" field as a notifier of non-spurious wakeups. After that patch, "owner" will be used only for implementing recursive mutex, and will not be part of the wakeup protocol.
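The pointer-returning pop() can be illustrated with a simplified queue. This is a sketch under assumptions: queue_mpsc_sketch is a hypothetical stand-in built from a push stack that the single consumer reverses, and it is not the code of lockfree::queue_mpsc.

```cpp
#include <atomic>
#include <cassert>

template <typename T>
struct linked_item {
    T value;
    linked_item<T>* next = nullptr;
    explicit linked_item(T v) : value(v) {}
};

// Hypothetical multiple-producer single-consumer queue sketch.
template <typename T>
class queue_mpsc_sketch {
public:
    // Any thread may push; items are stacked with a CAS loop.
    void push(linked_item<T>* item)
    {
        linked_item<T>* old = pushlist.load(std::memory_order_relaxed);
        do {
            item->next = old;
        } while (!pushlist.compare_exchange_weak(old, item,
                                                 std::memory_order_release,
                                                 std::memory_order_relaxed));
    }

    // Single consumer only. Returns the originally pushed item, or
    // nullptr if the queue is empty.
    linked_item<T>* pop()
    {
        if (!poplist) {
            // Grab everything pushed so far and reverse it into FIFO order.
            linked_item<T>* r = pushlist.exchange(nullptr,
                                                  std::memory_order_acquire);
            while (r) {
                linked_item<T>* next = r->next;
                r->next = poplist;
                poplist = r;
                r = next;
            }
        }
        linked_item<T>* item = poplist;
        if (item) {
            poplist = item->next;
        }
        return item;
    }

private:
    std::atomic<linked_item<T>*> pushlist{nullptr};
    linked_item<T>* poplist = nullptr;
};
```

Because pop() hands back the pushed item itself, the popper can overwrite item->value (for example, zero a thread pointer) and the waiter that owns the item sees the change.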
-
Nadav Har'El authored
The way "owner" and "depth" were used in lockfree::mutex was messy. Ideally, neither should be needed if we implemented a non-recursive mutex, but following the design of ::mutex, we (re)used "owner" also as a marker that a thread was woken to take the lock (and it's not a spurious wake). After this patch, owner and depth are used in lockfree::mutex *only* for implementing a recursive mutex, and building a non-recursive mutex should be as simple as dropping these two variables. In more detail: 1. "owner" is no longer used to tell a woken up thread that the wake wasn't spurious. Instead, zero the thread in the wait-record. This is a familiar idiom, which we already used a few times before. 2. "depth" isn't an atomic variable, so it should only be read by the same thread which set it, and this wasn't the case previously. Now, depth is only ever written (set to 1, incremented or decremented) or read by the lock-holding thread - and not the lock releasing thread. 3. "owner" needs to be an atomic variable - a non-lock-holding thread needs to read it and recognize it isn't holding the lock - but it doesn't need any special memory ordering with other variables, so should always be accessed with "relaxed" memory ordering.
-
Nadav Har'El authored
Fixed a very rare hang in sched::thread::join(): thread::complete() included the following code: _status.store(status::terminated); if (_joiner) { _joiner->wake(); } If we are preempted right after setting status to "terminated", but before calling wake(), this thread will never be scheduled again (it will remain in the terminated status forever), and will never call wake() - so the join()ing thread may just wait forever. I saw this happening in a test case that started and joined millions of threads, and eventually the join() hangs. The solution is to enclose the above lines with preempt_disable()/ preempt_enable().
-
Nadav Har'El authored
wake() normally calls schedule(), but doesn't do so if preemption is disabled. So we should mark need_reschedule = true, to suggest that schedule() can be called when preemption is later enabled.
-
Avi Kivity authored
Due to the need to handle the x64 red zone, we use a separate stack for exceptions via the IST mechanism. This means that a nested exception will reuse the parent exception's stack, corrupting it. It is usually very hard to figure out the root cause when this happens. Prevent this by setting up a separate stack for nested exceptions, and aborting immediately if a nested exception happens.
-
- Jun 10, 2013
-
-
Avi Kivity authored
Now that processor::features() is initialized early enough, we can use it in ifunc dispatchers.
-
Avi Kivity authored
cpuid is useful for ifunc-dispatched functions (like memcpy), so we can select the correct function based on available processor features. Make processor::features available early to support this. We use a static function-local variable to ensure it is initialized early enough.
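The function-local static idiom referred to here can be sketched as follows. The features_type fields and the init counter are hypothetical; the point is only that the object is constructed on the first call, regardless of the order in which namespace-scope initializers run.

```cpp
#include <cassert>

// Hypothetical feature set; the real one would be filled in from cpuid.
struct features_type {
    bool erms;
};

// Counts how often the initializer runs, for illustration only.
static int g_init_runs = 0;

features_type& features()
{
    // Initialized on first use, so an early caller (another static
    // initializer, or a dispatcher that runs during symbol resolution)
    // still gets a fully constructed object.
    static features_type f = [] {
        ++g_init_runs;
        return features_type{false};
    }();
    return f;
}
```

Repeated calls return the same object and the initializer runs once.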
-
Avi Kivity authored
Optimized memcpy() using rep movsb
-
Avi Kivity authored
If the cpu supports "Enhanced REP MOVS / STOS" (ERMS), use a rep movsb instruction to implement memcpy. This speeds up copies significantly, especially large misaligned ones.
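A rough sketch of such a copy routine, assuming GCC or Clang on x86-64 (it uses <cpuid.h> and inline assembly) and falling back to std::memcpy elsewhere. The names cpu_has_erms() and memcpy_sketch() are hypothetical; the commit selects the implementation through an ifunc dispatcher, not a per-call check as done here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#endif

// ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9.
bool cpu_has_erms()
{
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ebx >> 9) & 1;
#else
    return false;
#endif
}

void* memcpy_sketch(void* dest, const void* src, std::size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (cpu_has_erms()) {
        void* d = dest;
        // rep movsb copies RCX bytes from [RSI] to [RDI].
        asm volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
        return dest;
    }
#endif
    return std::memcpy(dest, src, n);
}
```

Either path produces the same result; only the speed differs on CPUs that advertise ERMS.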
-
Avi Kivity authored
Used for implementing support for indirect functions referenced from shared libraries.
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Christoph Hellwig authored
-
Glauber Costa authored
The compiler rightfully complains that we use the symbol at sizeof instead of its dereference. Fix it. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com>
-
- Jun 09, 2013
-
-
Nadav Har'El authored
-
Nadav Har'El authored
Recently Guy fixed abort() so it will *really* not infinitely recurse trying to print a message, using a lock, causing a new abort, ad infinitum. Unfortunately, that didn't fix one remaining case: DUMMY_HANDLER (see exceptions.cc) used the idiom debug(....); abort(); which can again cause infinite recursion - a #GP calls debug() which causes a new #GP, which again calls debug, etc. Instead of the above broken idiom, I created a new function abort(msg), which is just like the familiar abort(), except that it changes the "Aborted" message to some other message (a constant string). Like abort(), the new variant abort(msg) will only print the message once even if called recursively - and uses a lockless version of debug(). Note that the new abort(msg) is a C++-only API. C will only see the abort(void) which is extern "C". At first I wanted to call the new function panic(msg) and export it to C, but gave up when I saw the name panic() was already in use in a bunch of BSD code.
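The print-once behaviour can be sketched with an atomic flag. This is a hypothetical model, not the abort(msg) from the commit: abort_report() returns instead of halting so that it can be exercised, and early_log stands in for a lockless debug output.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Set by the first caller; later (possibly nested) callers see it set.
static std::atomic<bool> aborting{false};
// Stand-in for output written without taking any lock.
static std::string early_log;

// Returns true if this call emitted the message. A real abort would
// halt the system after this point and never return.
bool abort_report(const char* msg = "Aborted")
{
    if (aborting.exchange(true)) {
        return false;   // nested abort: stay silent, do not recurse
    }
    early_log += msg;
    return true;
}
```

If printing the message faults and re-enters the abort path, the second entry finds the flag already set and prints nothing.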
-
Nadav Har'El authored
Before this patch tracepoints required manual tracepoint numbers: tracepoint<17, unsigned int> trace_event1("event1", "%d"); tracepoint<18> trace_event2("event2", ""); While the numbers only had to be unique in the file, so it wasn't hard to achieve, this was still tedious and verbose. This patch adds an additional, shorter, tracepoint syntax, not requiring those numbers and in general less repetitive and clearer: TRACEPOINT(trace_event1, "%d", unsigned int); TRACEPOINT(trace_event2, ""); The first parameter is the name of the generated tracepoint function - it's convenient to see it so that grep can find it, for example. The name of the tracepoint itself (shown in "osv trace") is this string without the prefix trace_ (if the name of the tracepoint function, for some reason, doesn't start with trace_, the full function name is used as the tracepoint name).
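The naming rule in the last sentence can be sketched as a small helper. tracepoint_name() is a hypothetical function used only to illustrate the rule; the TRACEPOINT macro itself is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Derive the tracepoint name from the generated function's name:
// drop a leading "trace_" if present, otherwise keep the full name.
std::string tracepoint_name(const char* function_name)
{
    const char* prefix = "trace_";
    const std::size_t n = std::strlen(prefix);
    if (std::strncmp(function_name, prefix, n) == 0) {
        return function_name + n;
    }
    return function_name;
}
```

A macro can pass the stringified function name (#name) to a helper like this, so the name is written once and still shows up in grep.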
-
Guy Zana authored
Given the scheduler state, wake() sometimes rescheduled the dispatcher thread immediately, and it then blocked on the mutex still held by the caller of _callout_stop_safe_locked(). This patch calls wake() outside of the lock to eliminate these spurious context switches.
-
Guy Zana authored
-
Guy Zana authored
-