Nadav Har'El
authored
This patch fixes a bug in which the CLI and memcached crash on startup when running on two vcpus.

The cause of the crash is as follows. Java is running two threads. One is loading a new shared library (in this example, libnio.so), while the second is running normally and calls a function it has not called before (pthread_cond_destroy()). When our on-demand resolver code tries to resolve this function name, it iterates over the module list and reaches libnio.so, but this object has not been completely set up yet (we put it in the list first; see program::add_object()), so looking up a symbol in it crashes.

Why wasn't this problem noticed before the recent link-order change? Because before that change, the half-loaded library was always last in the list (OSv itself was first), so existing symbols were always found before the lookup reached the partially-set-up object. Now OSv, with its many symbols, is last, and the half-set-up object is in the middle, so the problem is common. It could also have happened previously if we had unresolved symbols (e.g., weak symbols), but these were probably rare enough for the bug not to show up in practice.

The fix in this patch is "hacky", because I wanted to avoid restructuring the whole code. The problem is that the functions called in add_object() (including relocate_rela(), nested add_object(), etc.) all assume that they can look up symbols in the object being set up, while we don't want this object to be visible to other threads. So we do exactly this: each object gets a "visibility" field. If "visibility" is 0, all threads can use this object, but if "visibility" is a thread pointer, only that thread searches in this object. add_object() therefore starts the object with visibility set to its own thread, and only when add_object() is done does it set the visibility to 0 so that all threads can see the object.

While this solves the common bug, note that this patch still leaves a small window for SMP bugs, because it doesn't add locking to _modules, so a lookup during an add_object() can briefly see the vector in an inconsistent state. We should fix this remaining problem later, using RCU.
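
The per-object visibility check described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the types object and program below are simplified stand-ins, and current_thread(), visible_to(), begin_add(), finish_add() and demo() are hypothetical names chosen for illustration. The address of a thread_local variable is assumed here as a substitute for the thread pointer. Like the patch, the sketch does not lock the module list.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a thread pointer; not the real OSv type.
using thread_tag = const void*;

// The address of a thread_local variable is unique among live threads,
// so it plays the role of the "thread pointer" in this sketch.
inline thread_tag current_thread() {
    thread_local char tag;
    return &tag;
}

// Simplified stand-in for a loaded ELF object.
struct object {
    std::string name;
    std::unordered_map<std::string, std::uintptr_t> symbols;
    // nullptr (the "0" case): visible to every thread.
    // Otherwise: visible only to the thread that is setting the object up.
    std::atomic<thread_tag> visibility{nullptr};

    bool visible_to(thread_tag t) const {
        thread_tag v = visibility.load(std::memory_order_acquire);
        return v == nullptr || v == t;
    }
};

// Simplified stand-in for the program's module list.
struct program {
    std::vector<object*> modules;  // unlocked, as in the patch

    // Returns the symbol's address, or 0 if no visible object defines it.
    std::uintptr_t lookup(const std::string& sym) const {
        thread_tag self = current_thread();
        for (const object* obj : modules) {
            if (!obj->visible_to(self)) {
                continue;  // skip objects another thread is still setting up
            }
            auto it = obj->symbols.find(sym);
            if (it != obj->symbols.end()) {
                return it->second;
            }
        }
        return 0;
    }

    // Start of the hypothetical add_object(): list the object, but keep it
    // private to the loading thread.
    void begin_add(object& obj) {
        obj.visibility.store(current_thread(), std::memory_order_release);
        modules.push_back(&obj);
    }

    // End of the hypothetical add_object(): publish the object to everyone.
    void finish_add(object& obj) {
        assert(obj.visibility.load() == current_thread());
        obj.visibility.store(nullptr, std::memory_order_release);
    }
};

// Returns true if lookups behave as the commit message describes.
inline bool demo() {
    program prog;
    object lib;
    lib.name = "libnio.so";
    prog.begin_add(lib);
    lib.symbols["some_symbol"] = 0x1000;  // setup still in progress

    std::uintptr_t seen_by_loader = prog.lookup("some_symbol");
    std::uintptr_t seen_by_other = 1;
    std::thread([&] { seen_by_other = prog.lookup("some_symbol"); }).join();

    prog.finish_add(lib);
    std::uintptr_t seen_after = 1;
    std::thread([&] { seen_after = prog.lookup("some_symbol"); }).join();

    return seen_by_loader == 0x1000 && seen_by_other == 0 &&
           seen_after == 0x1000;
}
```

In this sketch the loading thread can resolve symbols in the half-loaded object, another thread skips it until finish_add() runs, and every thread sees it afterwards. The second thread is joined before the list changes again, so the unlocked vector is never read concurrently with a modification here; real concurrent use would still need the locking or RCU mentioned above.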