- Dec 18, 2013
Glauber Costa authored
This patch adds the basics of memory tracking, and exposes an interface for that data to be collected. We basically start with all stats at zero, and as we add memory to the system, we bump it up and recalculate the watermarks (to avoid recomputing them all the time). When a page range comes up, it will be added as free memory. We operate based on what is currently sitting in the page ranges. This means that we are effectively ignoring memory that sits in pools for memory-usage purposes. I think that is a good assumption, because it allows us to focus on the big picture and leave the pools to be used as liquid currency. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 17, 2013
Avi Kivity authored
With net channels, poll() needs to wait not only on poll wakeups and the timeout, but also on requests from network interfaces to flush net channels for polled sockets. In preparation for that, switch from bsd msleep() to native wait_until(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
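The waiting pattern described above can be sketched with standard C++ primitives standing in for OSv's scheduler API (the poller type, its fields, and request_flush() below are illustrative assumptions, not OSv's actual interface):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Stand-in for the pattern: sleep until either a poll wakeup, a flush
// request from a network interface, or the timeout fires. Names are
// illustrative, not OSv's real API.
struct poller {
    std::mutex mtx;
    std::condition_variable cv;
    bool wakeup = false;
    bool flush_request = false;

    // Returns true if woken by an event, false on timeout.
    bool wait_until(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(mtx);
        return cv.wait_for(lk, timeout, [this] {
            return wakeup || flush_request;
        });
    }

    void request_flush() {
        { std::lock_guard<std::mutex> lk(mtx); flush_request = true; }
        cv.notify_one();
    }
};
```

The point of the switch is exactly this shape: one predicate-driven wait that any number of event sources can satisfy, instead of msleep()'s single wakeup channel.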
-
Raphael S. Carvalho authored
Reviewed-by:
Dor Laor <dor@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 16, 2013
Avi Kivity authored
bsd defines some m_ macros, for example m_flags, to save some typing. However, if you have a variable of the same name in another header, for example m_flags, have fun trying to compile your code. Expand the code in place and eliminate the macros. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
This code seems obviously broken to me: in tls_data().size, there is no tls_data function; it is a struct. So this is creating a temporary uninitialized struct and reading the size field from it. What it meant instead is probably the TLS size, which is calculated by the tls() function and returned in a tls_data structure. I am not able to actually test this change because I don't have any DSO which has R_X86_64_TPOFF64 relocations. Any idea how to test it? tls() is also broken, because it initializes the file_size field instead of the size field. The file_size field was added at some point, but this place wasn't updated. As it appears that tls() is not actually used anywhere, this patch gets rid of it. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Tomasz Grabiec authored
Dynamically loaded modules use __tls_get_addr() to locate thread-local symbols. A symbol is identified by a module index and an offset into the module's TLS area. The module index and offset are filled in by the dynamic linker when the DSO is loaded. The TLS area for a given DSO is allocated dynamically, on demand. OSv keeps TLS areas in a vector indexed by module index, inside a per-thread vector. The TLS area of the core module should be handled differently from that of dynamically loaded modules. The TLS offsets for thread-local symbols defined in the core module are known at link time, and code inside the core module can use these offsets directly. The offsets are relative to the TCB pointer (the fs register on x86). The problem was that __tls_get_addr() was treating the core module as a dynamically loaded module and returned a pointer inside a dynamically allocated TLS area instead of a pointer inside the core module's TLS. As a result, code inside the core module was reading the value from a different location than the one code inside the DSO had written it to. The offending thread-local variable was __once_call. It was set by call_once() defined in a DSO (inlined from a definition inside a header) and read by __once_proxy() defined in the core module. Fixes #125. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
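A miniature model of the fix, with invented types (thread_tls, tls_module, and the choice of index 0 for the core module are assumptions made for illustration; the real logic lives in OSv's dynamic linker):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for the sketch: the core module gets index 0.
constexpr std::size_t core_module_index = 0;

struct tls_module { std::vector<char> area; };

struct thread_tls {
    std::vector<char> core_area;            // static TLS, fs-relative in reality
    std::vector<tls_module> dynamic_areas;  // indexed by module index

    // Model of __tls_get_addr(): the core module must resolve against the
    // same static TLS block that link-time offsets point into, never a
    // dynamically allocated area.
    void* get_addr(std::size_t module, std::size_t offset) {
        if (module == core_module_index) {
            return core_area.data() + offset;
        }
        return dynamic_areas[module].area.data() + offset;
    }
};
```

The bug was equivalent to taking the second branch for module 0, so a DSO writing through get_addr() and core code reading fs-relative saw two different locations.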
-
Tomasz Grabiec authored
In order to have uniform naming, ulong is used in several places. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Pekka Enberg authored
Move the x86-64 PTE definitions to a new arch specific arch-mmu.hh header file to make core/mmu.cc smaller and more portable. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 15, 2013
Nadav Har'El authored
thread::destroy() had a "FIXME" comment: // FIXME: we have a problem in case of a race between join() and the // thread's completion. Here we can see _joiner==0 and not notify // anyone, but at the same time join() decided to go to sleep (because // status is not yet status::terminated) and we'll never wake it. This is indeed a bug, which Glauber discovered was hanging the tst-threadcomplete.so test once in a while - the test sometimes hangs with one thread in the "terminated" state (waiting for someone to join it), and a second thread waiting in join() that missed the other thread's termination event. The solution works like this: join() uses a CAS to set itself as the _joiner. If it succeeded, it waits like before for the status to become "terminated". But if the CAS failed, it means a concurrent destroy() call beat us in the race, and we can just return from join(). destroy() checks (with a CAS) if _joiner was already set - if so, we need to wake this thread just like in the original code. But if _joiner was not yet set, either there is no one doing join(), or there's a concurrent join() call that will soon return (this is what the joiner does when it loses the CAS race). In this case, all we need to do is set the status to "terminated" - and we must do it through a _detached_state we saved earlier (because if join() already returned, the thread may already be deleted). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
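The CAS protocol described above can be sketched with std::atomic; thread_stub and its two methods are illustrative stand-ins for the real scheduler types, and the actual waiting/waking is elided:

```cpp
#include <atomic>
#include <cassert>

struct thread_stub {
    std::atomic<thread_stub*> _joiner{nullptr};
    std::atomic<bool> terminated{false};

    // join() side: returns true if this call must now wait for termination,
    // false if a concurrent destroy() already won the race and we can
    // return immediately.
    bool try_become_joiner(thread_stub* self) {
        thread_stub* expected = nullptr;
        return _joiner.compare_exchange_strong(expected, self);
    }

    // destroy() side: marks termination, then claims the _joiner slot with
    // a CAS. Returns the joiner to wake, or nullptr if no joiner had
    // registered (the slot now holds a sentinel so later join()s back off).
    thread_stub* complete() {
        terminated.store(true);
        thread_stub* expected = nullptr;
        if (_joiner.compare_exchange_strong(expected, this)) {
            return nullptr;   // we won: any later join() loses its CAS
        }
        return expected;      // a joiner is waiting: caller wakes it
    }
};
```

Exactly one side wins the CAS on _joiner, which is what closes the lost-wakeup window the FIXME described.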
-
Nadav Har'El authored
Add a new lock, "rcu_read_lock_in_preempt_disabled", which is exactly like rcu_read_lock but assuming that preemption is already disabled. Because all our rcu_read_lock does is to disable preemption, the new lock type currently does absolutely nothing - but in some future implementation of RCU it might need to do something. We'll use the new lock type in the following patch, as an optimization over the regular rcu_read_lock. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 13, 2013
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 11, 2013
Pekka Enberg authored
Simplify core/mmu.cc and make it more portable by moving the page fault handler to arch/x64/mmu.cc. There's more arch specific code in core/mmu.cc that should be also moved. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Make vma constructors more strongly typed by using the addr_range type. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Separate the common vma code to an abstract base class that's inherited by anon_vma and file_vma. Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
We have recently seen a problem where an eventual page fault outside the application would occur. I managed to track that down to my huge page failure patch, but wasn't really sure what was going on. Kudos to Raphael, then, who figured out that the problem happened when allocate_intermediate_level was called from split_huge_page. The problem here is that in that case we do *not* enter allocate_intermediate_level with the pte emptied, and we were previously expecting the write of the new pte to happen unconditionally. The compare_exchange broke it, because the exchange doesn't really happen. There are many ways to fix this issue, but the least confusing of them, given that there are other callers to this function that could potentially display this problem, is to do some defensive programming and clearly separate the semantics of both types of callers. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Tested-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
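The separation of caller semantics can be sketched as two entry points (the names and the flat pte_t type are assumptions for illustration; the real code manipulates hardware page-table entries):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using pte_t = std::atomic<uint64_t>;

// split_huge_page path: the caller holds a non-empty PTE (a huge-page
// mapping) and the new intermediate level must replace it unconditionally.
inline void set_intermediate_level(pte_t& pte, uint64_t new_level) {
    pte.store(new_level);
}

// Empty-PTE path: the caller observed an empty entry, so only install the
// new level if it is still empty; a concurrent installer may win.
inline bool try_set_intermediate_level(pte_t& pte, uint64_t new_level) {
    uint64_t expected = 0;
    return pte.compare_exchange_strong(expected, new_level);
}
```

The bug was equivalent to using the CAS variant on the split_huge_page path: the exchange silently fails against the existing huge-page entry, and the new level is never written.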
-
Nadav Har'El authored
Once page_fault() checks that this is not a fast fixup (see safe_load()), we reach the page-fault slow path, which needs to allocate memory or even read from disk, and might sleep. If we ever get such a slow page fault inside kernel code which has preemption or interrupts disabled, this is a serious bug, because the code in question thinks it cannot sleep. So this patch adds two assertions to verify this. The preemptable() assertion is easily triggered if stacks are demand-paged, as explained in commit 41efdc1c (I have a patch to solve this, but it won't fit in the margin). However, I've also seen this assertion without demand-paged stacks, when running all tests together through testrunner.so. So I'm hoping these assertions will be helpful in hunting down some elusive bugs we still have. This patch adds a third use of the "0x200" constant (the ninth bit of the rflags register is the interrupt flag), so it replaces them with a new symbolic name, processor::rflags_if. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
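A sketch of the symbolic constant and the two checks (assert_sleepable is an invented helper for illustration; the real assertions live inline in the page-fault slow path and read the saved exception frame):

```cpp
#include <cassert>
#include <cstdint>

namespace processor {
// Bit 9 of RFLAGS is the interrupt-enable flag (IF); this is the
// constant the patch names instead of repeating the literal 0x200.
constexpr uint64_t rflags_if = 0x200;
}

// A slow page fault may sleep, so it must only happen with interrupts
// enabled in the faulting context and with preemption allowed.
inline void assert_sleepable(uint64_t saved_rflags, bool preemptable) {
    assert(saved_rflags & processor::rflags_if);  // interrupts were enabled
    assert(preemptable);                          // preemption not disabled
}
```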
-
Glauber Costa authored
We suddenly stop propagating the exception frame down the vma_fault path. There is no reason not to propagate it further, aside from the fact that currently there are no users. However, besides presenting more consistent frame passing, I intend to use it for the JVM balloon. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 10, 2013
Nadav Har'El authored
This patch fixes two bugs in shared-object finalization, i.e., running its static destructors before it is unloaded. The bugs were seen when osv::run()ing a test program using libboost_unit_test_framework-mt.so, which crashed after the test program finished. The two related bugs were: 1. We need to call the module's destructors (run_fini_funcs()) *before* removing it from the module list, otherwise the destructors will not be able to call functions from this module! (we got a symbol not found error in the destructor). 2. We need to unload the modules needed by this module *before* unloading this module, not after like was (implicitly) done until now. This makes sense because of symmetry (during a module load, the needed modules are loaded after this one), but also practically: a needed module's destructor (in our case, boost unit test framework) might refer to objects in the needing module (in our case, the test program), so we cannot call the needed module's destructor after we've already unloaded the needing module. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
As ref() is now never called, we can remove the reference counter and make unref() unconditional. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
One problem with wake() is that, if the thread it is waking can concurrently exit, it may touch freed memory belonging to the thread structure. Fix by separating the state that wake() touches into a detached_state structure, and free that using rcu. Add a thread_handle class that references only this detached state, and accesses it via rcu. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
rcu_read_lock disables preemption, but this is an implementation detail and users should not make use of it. Add preempt_lock_in_rcu that takes advantage of the implementation detail and does nothing, but allows users to explicitly disable preemption. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
When seeing this flag, pages faulted in should not be filled with zeroes or any other pattern, and should rather be left alone in whatever state we find them. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 09, 2013
Glauber Costa authored
Addressing that FIXME, as part of my memory reclamation series. But this is ready to go already. The goal is to retry serving the allocation if a huge page allocation fails, and fill the range with 4k pages. The simplest and most robust way I've found to do that was to propagate the error up until we reach operate(). Being there, all we need to do is re-walk the range with 4k pages instead of 2Mb. We could theoretically just bail out on huge pages and move hp_end, but, especially when we have reclaim, it is likely that one operation will fail while the upcoming ones may succeed. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> [ penberg: s/NULL/nullptr/ ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
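The propagate-and-retry scheme might look like this in miniature (map_range, operate, and the allocator hook are all illustrative assumptions; the real code walks page tables rather than a flat loop):

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t huge_page_size  = 2 * 1024 * 1024;
constexpr std::size_t small_page_size = 4096;

// Walk the range in page_size steps; report failure to the caller instead
// of handling it locally, so the retry decision is made in one place.
bool map_range(std::size_t start, std::size_t len, std::size_t page_size,
               bool (*alloc_page)(std::size_t)) {
    for (std::size_t addr = start; addr < start + len; addr += page_size) {
        if (!alloc_page(page_size)) {
            return false;   // propagate the error up
        }
    }
    return true;
}

// operate() is where the error surfaces: on huge-page failure, re-walk the
// same range with 4k pages. Returns the page size actually used.
std::size_t operate(std::size_t start, std::size_t len,
                    bool (*alloc_page)(std::size_t)) {
    if (map_range(start, len, huge_page_size, alloc_page)) {
        return huge_page_size;
    }
    map_range(start, len, small_page_size, alloc_page);
    return small_page_size;
}
```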
-
- Dec 08, 2013
Glauber Costa authored
I needed to call detach in some test code of mine, and it isn't implemented. The code I wrote to use it may or may not stay in the end, but nevertheless, let's implement it. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
set_cleanup is quite a complicated piece of code. It is very easy to get it to race with other thread-destruction sites, which was made abundantly clear when we tried to implement pthread detach. This patch tries to make it easier, by restricting how and when set_cleanup can be called. The trick here is that currently, a thread may or may not have a cleanup function, and through a call to set_cleanup, our decision to clean up may change. From this point on, set_cleanup will only tell us *how* to clean up. If and when is a decision that we will make ourselves. For instance, if a thread is block-local, the destructor will be called by the end of the block. In that case, the _cleanup function will be there anyhow: we'll just not call it. We're setting here a default cleanup function for all created threads, which just deletes the current thread object. Anything coming from pthread will try to override it by also deleting the pthread object. And again, it is important to note that they will set up those cleanup functions unconditionally. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Glauber Costa authored
Linux uses a 32-bit integer for pid_t, so let's do it as well. This is because there are functions in which we have to return our id back to the application. One example is gettid, which we already have in the tree. Theoretically, we could come up with a mapping between our 64-bit ids and the Linux ones, but since we have to maintain the mapping anyway, we might as well just use the Linux pids as our default IDs. The max size for that is 32 bits. It is not enough if we're just allocating pids by bumping a counter, but again, since we will have to maintain the bitmaps anyway, 32 bits will allow us as many as 4 billion PIDs. avi: remove unneeded #include Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
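A toy bitmap allocator along the lines hinted at above (the commit itself only changes the pid type; the allocator below, including its pid_t_32 alias chosen to avoid the system pid_t, is an assumption for illustration):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

using pid_t_32 = int32_t;

// Bitmap-backed pid allocation: ids are reused after release, so a plain
// bump counter is never needed and 32 bits bound the id space.
template <std::size_t Max>
struct pid_allocator {
    std::bitset<Max> used;

    pid_t_32 allocate() {
        for (std::size_t i = 1; i < Max; ++i) {   // pid 0 stays reserved
            if (!used[i]) {
                used[i] = true;
                return static_cast<pid_t_32>(i);
            }
        }
        return -1;                                // id space exhausted
    }

    void release(pid_t_32 pid) { used[static_cast<std::size_t>(pid)] = false; }
};
```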
-
Glauber Costa authored
Right now we are taking a clock measure very early for cpu initialization. That forces an unnecessary dependency between the sched and clock initializations. Since that clock measure is used to determine for how long the cpu has been running, we can initialize the runtime later, when we init the idle thread. Nothing should be running before it. After doing this, we can move the sched initialization a bit earlier. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 06, 2013
Asias He authored
Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
The exact location of the stack end is not needed by java, so move this variable back to restore the state to what it was before the mkfs.so/cpiod.so split. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 05, 2013
Avi Kivity authored
Some objects have DTPMOD64 relocations with the null symbol, presumably to set the value to 0 (it is too much trouble to write the zero into the file during the link phase, apparently). Detect this condition and write the zero. Needed by JDK8. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
A list can be slow to search for an element if we have many threads. Even under normal load, the number of threads we spawn may not be classified as huge, but it is not tiny either. Change it to a map so we can implement functions that operate on a given thread without that much overhead - O(1) for the common case. Note that ideally we would use an unordered_set, which doesn't require an extra key. However, that would also mean that the key is implicit and set to be of type key_type&. Threads are not very lightweight to create for search purposes, so we go for an id-as-key approach. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
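The id-as-key approach can be sketched as follows (thread_stub and thread_table are illustrative stand-ins for the real scheduler types): looking up by id never requires constructing a throwaway thread object, which an unordered_set keyed on key_type& would force.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct thread_stub { uint64_t id; };

// O(1) average lookup by thread id; the id is stored explicitly as the
// key instead of being derived from a thread object.
class thread_table {
    std::unordered_map<uint64_t, thread_stub*> map;
public:
    void insert(thread_stub* t) { map[t->id] = t; }
    void erase(uint64_t id)     { map.erase(id); }
    thread_stub* find(uint64_t id) {
        auto it = map.find(id);
        return it == map.end() ? nullptr : it->second;
    }
};
```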
-
Benoît Canet authored
This restores the original behavior of osv::run that was in place before the mkfs.so and cpiod.so split committed a day ago. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
Nadav Har'El authored
When source code is compiled with -D_FORTIFY_SOURCE on Linux, various functions are sometimes replaced by __*_chk variants (e.g., __strcpy_chk) which can help avoid buffer overflows when the compiler knows the buffer's size during compilation. If we want to run source compiled on Linux with -D_FORTIFY_SOURCE (either deliberately or unintentionally - see issue #111), we need to implement these functions otherwise the program will crash because of a missing symbol. We already implement a bunch of _chk functions, but we are definitely missing some more. This patch implements 6 more _chk functions which are needed to run the "rogue" program (mentioned in issue #111) when compiled with -D_FORTIFY_SOURCE=1. Following the philosophy of our existing *_chk functions, we do not aim for either ultimate performance or iron-clad security for our implementation of these functions. If this becomes important, we should revisit all our *_chk functions. When compiled with -D_FORTIFY_SOURCE=2, rogue still doesn't work, but not because of a missing symbol, but because it fails reading the terminfo file for a yet unknown reason (a patch for that issue will be sent separately). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
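One of the _chk variants can be sketched in the spirit described above: check against the destination size the compiler knew at the call site, and abort on overflow instead of corrupting memory. The function is deliberately named my_strcpy_chk to avoid clashing with the real glibc symbol, and this is a plausible shape rather than OSv's exact code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

extern "C" char* my_strcpy_chk(char* dest, const char* src, size_t destlen) {
    size_t len = strlen(src) + 1;   // include the NUL terminator
    if (len > destlen) {
        // matching the philosophy above: no heroics, just refuse to overflow
        fprintf(stderr, "buffer overflow detected\n");
        abort();
    }
    return static_cast<char*>(memcpy(dest, src, len));
}
```

The compiler emits the destlen argument itself when -D_FORTIFY_SOURCE is active and the buffer size is known, which is why the runtime side can stay this simple.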
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 03, 2013
Raphael S. Carvalho authored
Besides simplifying mmu::map_file interface, let's make it more similar to mmu::map_anon. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Benoît Canet authored
A ';' at the end of a parameter marks the end of a program's argument list. The goal of this patch is to be able to split mkfs.so into two parts: mkfs.so and cpiod.so. The patch uses a full Spirit parser to handle "" escaping and split commands around ';'. Signed-off-by:
Benoit Canet <benoit@irqsave.net> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
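The splitting rule can be illustrated with a hand-rolled tokenizer (the actual patch uses Boost.Spirit; this simplified stand-in only demonstrates the ';' and quoting semantics, with invented names):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a command line into per-program argument lists: ';' separates
// commands, whitespace separates arguments, and anything inside double
// quotes (including ';') belongs to the current argument.
std::vector<std::vector<std::string>> split_commands(const std::string& line) {
    std::vector<std::vector<std::string>> cmds(1);
    std::string arg;
    bool in_quotes = false;
    bool have_arg = false;
    auto flush_arg = [&] {
        if (have_arg) {
            cmds.back().push_back(arg);
            arg.clear();
            have_arg = false;
        }
    };
    for (char c : line) {
        if (c == '"') { in_quotes = !in_quotes; have_arg = true; }
        else if (!in_quotes && c == ';') { flush_arg(); cmds.emplace_back(); }
        else if (!in_quotes && c == ' ') { flush_arg(); }
        else { arg.push_back(c); have_arg = true; }
    }
    flush_arg();
    return cmds;
}
```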
-
Avi Kivity authored
Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 01, 2013
Nadav Har'El authored
Before this patch, OSv crashes or continuously reboots when given unknown command line parameters, e.g., scripts/run.py -c1 -e "--help --z a" With this patch, it says, as expected, that the "--z" option is not recognized, and displays the list of known options: unrecognised option '--z' OSv options: --help show help text --trace arg tracepoints to enable --trace-backtrace log backtraces in the tracepoint log --leak start leak detector after boot --nomount don't mount the file system --noshutdown continue running after main() returns --env arg set Unix-like environment variable (putenv()) --cwd arg set current working directory Aborted The problem was that to parse the command line options, we used Boost, which throws an exception when an unrecognized option is seen. We need to catch this exception, and show a message accordingly. But before this patch, C++ exceptions did not work correctly during this stage of the boot process, because exceptions use elf::program(), and we only set it up later. So this patch moves the setup of the elf::program() object earlier in the boot, to the beginning of main_cont(). Now we'll be able to use C++ exceptions throughout main_cont(), not just in command line parsing. This patch also removes the unused "filesystem" parameter of elf::program(), rather than moving the initialization of this empty object as well. Fixes #103. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
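The failure mode and fix can be modeled without Boost: option parsing throws on an unknown option, and the caller must be able to catch the exception and print the message instead of crashing (parse_or_report and the known-options list below are illustrative stand-ins for the boost::program_options machinery):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Mimics the throw/catch shape only: an unknown option raises an
// exception, and the boot code reports it instead of aborting. This
// depends on C++ exceptions working, hence the elf::program() reordering.
std::string parse_or_report(const std::vector<std::string>& args,
                            const std::vector<std::string>& known) {
    try {
        for (const auto& a : args) {
            bool recognized = false;
            for (const auto& k : known) {
                if (a == k) { recognized = true; break; }
            }
            if (!recognized) {
                throw std::runtime_error("unrecognised option '" + a + "'");
            }
        }
        return "ok";
    } catch (const std::exception& e) {
        return e.what();   // show the message; before the fix, the throw
                           // itself crashed the boot
    }
}
```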
-