  1. Nov 25, 2013
    • Fix possible deadlock in condvar · 15a32ac8
      Nadav Har'El authored
      
      When a condvar's timeout and wakeup race, we wait for the concurrent
      wakeup to complete, so it won't crash. We did this wr.wait() with
      the condvar's internal mutex (m) locked, which was fine when this code
      was written. But now that we have wait morphing, wr.wait() waits not
      just for the wakeup to complete, but also for the user_mutex to become
      available. With m locked and us waiting for user_mutex, we are in
      deadlock territory, because the common idiom for using a condvar is to
      take the locks in the opposite order: lock user_mutex first and then
      use the condvar, which locks m.
      
      I can't think of an easy way to actually demonstrate this deadlock:
      it would require a locked condvar_wait timeout racing with a
      condvar_wake_one while an additional locked condvar operation comes in
      concurrently, and I don't have a test case reproducing that. I am
      hoping this fixes the lockups that Pekka is seeing in his Cassandra
      tests (which are the reason I looked for possible condvar deadlocks
      in the first place).
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
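
      A hypothetical sketch of the shape of the fix described above. The names
      wait_record, user_mutex and m follow the commit text, but the surrounding
      structure is assumed for illustration and is not the actual OSv source:
      the internal mutex m is released before waiting for the concurrent wakeup,
      so the wait can no longer chain into user_mutex while m is held.

      #include <condition_variable>
      #include <mutex>

      // Stand-in for the condvar's per-waiter record; wr.wait() blocks
      // until the concurrent wake() has fully completed.
      struct wait_record {
          std::mutex mtx;
          std::condition_variable cv;
          bool woken = false;
          void wait() {
              std::unique_lock<std::mutex> lk(mtx);
              cv.wait(lk, [this] { return woken; });
          }
          void wake() {
              { std::lock_guard<std::mutex> lk(mtx); woken = true; }
              cv.notify_one();
          }
      };

      // Sketch of the timeout path. The key point from the commit: drop the
      // condvar's internal mutex (m) *before* wr.wait(), because with wait
      // morphing wr.wait() may also wait for user_mutex, and another thread
      // may hold user_mutex while trying to lock m -- a lock-order deadlock.
      void timeout_path(std::mutex& m, wait_record& wr, bool wakeup_in_progress) {
          std::unique_lock<std::mutex> internal(m);
          if (wakeup_in_progress) {
              internal.unlock();   // release m first ...
              wr.wait();           // ... then wait for the concurrent wakeup
              return;
          }
          // otherwise, remove ourselves from the wait queue while holding m, etc.
      }
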
    • sched: delay initialization of early threads · d91d7799
      Glauber Costa authored
      
      The problem with sleep is that we can initialize early threads before the
      cpu itself is initialized. Looking at what goes on in init_on_cpu makes
      this clear:
      
      void cpu::init_on_cpu()
      {
          arch.init_on_cpu();
          clock_event->setup_on_cpu();
      }
      
      By the time we finally initialize the clock_event, an event can get lost
      if there are already pending timers of any kind - which there may be, if
      early threads were start()ed before that. I have played with many potential
      solutions, but in the end I think the most sensible thing to do is to delay
      the initialization of early threads to the point when we are first idle.
      That is the best way to guarantee that everything is properly initialized
      and running.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
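
      A hypothetical illustration of the "delay early thread start until first
      idle" idea from the message above. The deferred list, start hook and idle
      hook are made up for this sketch; they are not the actual OSv scheduler API:

      #include <functional>
      #include <vector>

      // Hypothetical per-cpu list of threads that called start() before the cpu
      // (and its clock_event) finished initializing.
      static std::vector<std::function<void()>> deferred_starts;
      static bool cpu_fully_initialized = false;

      void thread_start(std::function<void()> do_start) {
          if (!cpu_fully_initialized) {
              // Too early: starting now could arm a timer before
              // clock_event->setup_on_cpu() has run, and the event would be lost.
              deferred_starts.push_back(std::move(do_start));
              return;
          }
          do_start();
      }

      // Called the first time the cpu goes idle, i.e. after init_on_cpu() has
      // set up the clock event; only now is it safe to start the early threads.
      void on_first_idle() {
          cpu_fully_initialized = true;
          for (auto& start : deferred_starts)
              start();
          deferred_starts.clear();
      }
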
  2. Nov 22, 2013
  3. Nov 21, 2013
  4. Nov 20, 2013
  5. Nov 19, 2013
    • Explicitly request alignment when allocating per-cpu area · e9549266
      Nadav Har'El authored
      
      Commit ed808267 used malloc() to allocate
      the per-cpu variables area. As Avi pointed out, we need this area to be
      aligned to the strictest alignment of any per-cpu variable. The strictest
      alignment we need is probably CACHELINE_ALIGNED (64 bytes), but it's easiest
      just to require 4096-byte alignment, which is what the code prior to the
      above patch did.

      The above commit worked only because, luckily, our malloc() happens to
      return page-aligned memory for large allocations, and it's possible that
      this will not be the case in the future. So this patch switches to
      aligned_alloc() instead, explicitly requesting a 4096-byte-aligned block
      of memory.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
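
      The switch described above amounts to replacing the malloc() call with an
      explicit aligned_alloc(). A minimal sketch, assuming a generic allocation
      helper (the function name and rounding are illustrative, not the OSv code):

      #include <cstdlib>

      // Allocate one cpu's copy of the per-cpu area with an explicit 4096-byte
      // alignment, instead of relying on malloc() happening to return
      // page-aligned memory for large allocations.
      void* allocate_percpu_area(size_t size) {
          // aligned_alloc (C11/C++17) expects size to be a multiple of the
          // alignment, so round up to the next 4096-byte boundary.
          size_t aligned_size = (size + 4095) & ~size_t(4095);
          return std::aligned_alloc(4096, aligned_size);   // freed with free()
      }
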
    • Partial implementation of aligned_alloc() and posix_memalign(). · 7e06bd33
      Nadav Har'El authored
      
      This patch provides a trivial implementation of two similar functions for
      allocating aligned memory blocks: aligned_alloc() (from the C11 standard)
      and posix_memalign() (from POSIX). Memory returned by either function
      can be freed with the ordinary free().
      
      This trivial implementation just calls malloc(), and assert()s that it got
      the desired alignment, aborting if not. In many cases this is good enough
      because malloc() already returns 4096-byte-aligned blocks for large
      allocations. In particular we'll use these functions in the next patch for
      allocating the large page-aligned per-cpu areas.
      
      If we ever fail on this assertion, we can replace these functions by a
      full implementation (see issue #87).
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
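
      A sketch of what such a trivial implementation might look like, following
      the approach the message describes (malloc plus an alignment assertion).
      The _trivial suffixes mark these as illustrations, not the actual OSv code:

      #include <cassert>
      #include <cerrno>
      #include <cstdint>
      #include <cstdlib>

      // Trivial aligned_alloc(): just malloc() and assert that the result
      // happens to have the requested alignment (true for large allocations
      // in this malloc), aborting if not.
      void* aligned_alloc_trivial(size_t alignment, size_t size) {
          void* p = std::malloc(size);
          assert(reinterpret_cast<uintptr_t>(p) % alignment == 0);
          return p;   // can be freed with ordinary free()
      }

      // Trivial posix_memalign(): same idea, with the POSIX-style interface
      // that returns an error code and stores the pointer through memptr.
      int posix_memalign_trivial(void** memptr, size_t alignment, size_t size) {
          void* p = std::malloc(size);
          if (p == nullptr)
              return ENOMEM;
          assert(reinterpret_cast<uintptr_t>(p) % alignment == 0);
          *memptr = p;
          return 0;
      }
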
    • Add negotiation flag check for FLUSH · a0ce1b50
      Takuya ASADA authored
      Some older versions of qemu-nbd cause nbd_client.py to exit with an error.
      (See: https://groups.google.com/d/msg/osv-dev/EW5BtNFNfzs/I33BeFXg2f0J)

      This is because nbd_client.py sends the FLUSH command unconditionally, but
      FLUSH is an extended feature; an NBD client should first check that the NBD
      server is able to accept FLUSH. The NBD server sends capability flags during
      the negotiation stage; it sends HAS_FLAGS (0x1) and SEND_FLUSH (0x4) when it
      supports FLUSH.

      This patch adds this capability check, and skips sending FLUSH if the server
      doesn't support it.
      
      Signed-off-by: Takuya ASADA <syuu@dokukino.com>
      Reviewed-by: Benoît Canet <benoit.canet@irqsave.net>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
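
      The capability check itself is a bitmask test on the flags received during
      negotiation. The actual change is in the Python nbd_client.py, so this small
      C++ rendering is purely illustrative, using the flag values from the message:

      #include <cstdint>

      // Capability flags sent by the NBD server during negotiation, per the
      // commit message: HAS_FLAGS = 0x1, SEND_FLUSH = 0x4.
      constexpr uint32_t NBD_FLAG_HAS_FLAGS  = 0x1;
      constexpr uint32_t NBD_FLAG_SEND_FLUSH = 0x4;

      // FLUSH may only be sent when the server advertised both flags;
      // otherwise the client should simply skip the FLUSH command.
      bool server_supports_flush(uint32_t flags) {
          return (flags & NBD_FLAG_HAS_FLAGS) && (flags & NBD_FLAG_SEND_FLUSH);
      }
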
    • percpu: Reduce size of .percpu section · ed808267
      Nadav Har'El authored
      
      This patch reduces the size of the .percpu section 64-fold from about
      5 MB to 70 KB, and solves issue #95.
      
      The ".percpu" section is part of the .data section of our executable
      (loader-stripped.elf). In our 15 MB executable, roughly 7 MB is text
      (code), and 7 MB is data, and out of that, a whopping 5 MB is the
      ".percpu" section. The executable is read in real mode, and this is
      especially slow on Amazon EC2, hence our wish to make the executable
      as small as possible.
      
      The percpu section starts with all the PERCPU variables defined in the
      program. We have about 70 KB of those, and believe it or not, most of
      this 70 KB is just a single variable, the 65K dynamic_percpu_buffer
      (see percpu.cc).
      
      But then, we need a copy of these variables for each CPU. The unpatched
      code duplicated this 70 KB section 64 times in the executable file (!),
      and then used these memory locations for up to 64 CPUs. But there is
      no reason to duplicate this data in the executable! All we need to do
      is dynamically allocate a copy of this section for each CPU, and that
      is what this patch does.
      
      This patch removes about 5 MB from our executable: After this patch,
      our loader-stripped.elf is just 9.7 MB, and its data section's size is
      just 2.8 MB.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
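
      A sketch of the "allocate a copy of the section per CPU" idea described
      above. The section-boundary symbols and the setup function are assumed
      names for illustration; they are not necessarily the ones used in OSv:

      #include <cstdlib>
      #include <cstring>

      // Hypothetical linker-provided bounds of the single .percpu template
      // kept in the executable (instead of 64 copies).
      extern char _percpu_start[], _percpu_end[];

      // Allocate and initialize one cpu's private copy of the per-cpu area.
      void* setup_percpu_area() {
          size_t size = _percpu_end - _percpu_start;
          // 4096-byte alignment, covering the strictest per-cpu alignment needs.
          void* area = std::aligned_alloc(4096, (size + 4095) & ~size_t(4095));
          std::memcpy(area, _percpu_start, size);   // copy the initial values
          return area;                              // becomes this cpu's base
      }
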
    • vfs: Introduce vop_eperm · f1ee72ed
      Raphael S. Carvalho authored
      
      vop_eperm allows more code reuse (suggested by Glauber Costa)
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
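
      A hypothetical illustration of the kind of reuse a vop_eperm helper enables:
      one shared operation that just returns EPERM, plugged into every vnode
      operation a filesystem does not permit. The struct and slot names here are
      made up for the sketch and do not mirror the actual OSv vfs definitions:

      #include <cerrno>

      // Hypothetical vnode-operations table for the sketch.
      struct vnops {
          int (*vop_create)(void*);
          int (*vop_remove)(void*);
          int (*vop_rename)(void*);
      };

      // Shared stub: any operation the filesystem does not permit simply
      // returns EPERM, instead of one separate stub per operation.
      static int vop_eperm(void*) {
          return EPERM;
      }

      // A read-only filesystem reuses the same stub for several slots.
      static const vnops readonly_vnops = {
          vop_eperm,   // create
          vop_eperm,   // remove
          vop_eperm,   // rename
      };
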
  6. Nov 18, 2013
  7. Nov 15, 2013
  8. Nov 14, 2013
  9. Nov 13, 2013