Commits · be940896f29da86d9ae99ec6cd2d66fc1204eb91 · Verlässliche Systemsoftware / projects / osv

Nov 21, 2013

tests: add tcp connection test · be940896

Avi Kivity authored 11 years ago

The test creates and destroys threads, each of which creates a random number
of connections, each transferring a random number of bytes to an echo server.

This is used to stress the tcp/ip stack.

The test is portable, and builds on the host with the command

g++ -O2 -g3 -pthread -std=gnu++11 -lboost_program_options -lboost_system tests/tst-tcp.cc

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

be940896

epoll: switch to file based implementation using do_poll() · e771caab

Avi Kivity authored 11 years ago


Instead of using file descriptors and poll(), use do_poll().  This allows
us to get rid of user supplied fds early, which is important as fd lifetime
is decoupled from epoll lifetime.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e771caab

poll: refactor poll() in terms of file pointers, not file descriptors · 0b68144e

Avi Kivity authored 11 years ago

With epoll(), the lifetime of an ongoing poll may be longer than the
lifetime of a file descriptor; if an fd is close()d then we expect it
to be silently removed from the epoll.

With the current implementation of epoll(), which just calls poll(), this is
impossible to do correctly since poll() is implemented in terms of file
descriptors.

Add an intermediate do_poll() that works on file pointers. This allows a
refactored epoll() to convert file descriptors to file pointers just once,
and then a close()d and re-open()ed descriptor can be added without a problem.

As a side effect, a lot of atomic operations (fget() and fdrop()) are saved.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0b68144e

vfs: Make error messages when mounting rootfs more verbose. · f0d96816

Raphael S. Carvalho authored 11 years ago


Provide a better error message instead of simply printing the error
codes.

Before:

  failed to create /dev, error = 17

After:

  failed to create /dev, error = File exists
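
The change amounts to passing the error code through strerror() before printing it. A minimal sketch, assuming a hypothetical helper named mount_error_message() (not the actual OSv code):

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper, for illustration only: builds the message printed
// when a step of the rootfs setup fails with the errno value `err`.
std::string mount_error_message(const std::string& path, int err)
{
    // strerror() maps an errno value to a human-readable description.
    return "failed to create " + path + ", error = " + std::strerror(err);
}
```

Calling mount_error_message("/dev", EEXIST) yields the "After" form of the message, with the description text supplied by the C library.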

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f0d96816

scripts: Change mkzfs.py to call run.py with unsafe cache. · 56cdfc49

Raphael S. Carvalho authored 11 years ago

This patch adds the unsafe-cache option to run.py and changes mkzfs.py
to always call run.py with this option enabled.
The change therefore affects only the build run (suggested by Nadav Har'El).
The main goal is to reduce the time it takes to complete the entire process.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

56cdfc49

mmu: allow early tlb flushes · 9dc7cc25

Glauber Costa authored 11 years ago

TLB flushes cannot happen early, because we will try to send IPIs around before
they are ready to go. Now, the funny thing is *why* that happens:

We test for the size of the cpu vector to be 1. But before the cpus are
initialized, that vector is empty. Because there is a limit on how soon we can
initialize a cpu, let's change the test to also account for an empty vector.
It should be obvious and clear that when we have an empty vector, only one cpu
is present.
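
The adjusted test can be sketched as follows. The names cpu and tlb_flush_needs_ipi() are placeholders for illustration and are not taken from the OSv sources:

```cpp
#include <cstddef>
#include <vector>

struct cpu {};  // stand-in for the real per-CPU structure

// Returns true when a TLB flush must be broadcast to other CPUs via IPI.
// An empty vector means the CPUs are not registered yet, so only the
// boot CPU is running and a local flush is sufficient.
bool tlb_flush_needs_ipi(const std::vector<cpu*>& cpus)
{
    return cpus.size() > 1;
}
```

Comparing against "more than one" instead of "exactly one" treats the empty vector and the single-CPU case the same way.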

I have triggered this in the context of my last patchset for threads. My test
script was set to -c1 (sorry about that), and as soon as I tested it with SMP
it exploded here.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9dc7cc25

Nov 20, 2013

sched: start and initialize early threads · 63216e85

Glauber Costa authored 11 years ago

We may have threads that were initialized and started very early, before
sched::init() took place. We can easily identify such threads: they are all
threads that are in the thread list so far with the exception of the main
thread.

For those, we finish their initialization so they are now in a safe state.
Also, some of them may have been started already. Since we cannot really start
anything before the main thread, they were put in a special state called
"prestarted". Every thread found in this state is started at this moment.

Note how this code needs to run in the main thread itself, since we depend on
initialization that will only happen inside switch_to_first for those
procedures to function properly.
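
The idea can be illustrated with a small state machine. This is only a sketch: the type and function names below (thread_state, early_thread, start_prestarted) are hypothetical and much simpler than the real scheduler code.

```cpp
#include <cstddef>
#include <vector>

enum class thread_state { unstarted, prestarted, running };

struct early_thread {
    thread_state state = thread_state::unstarted;

    // Starting a thread before the scheduler is up only records the request.
    void start(bool scheduler_ready)
    {
        state = scheduler_ready ? thread_state::running
                                : thread_state::prestarted;
    }
};

// Called once the scheduler is initialized: starts every thread that was
// parked in the prestarted state and returns how many were started.
std::size_t start_prestarted(std::vector<early_thread>& threads)
{
    std::size_t started = 0;
    for (auto& t : threads) {
        if (t.state == thread_state::prestarted) {
            t.state = thread_state::running;
            ++started;
        }
    }
    return started;
}
```

Threads that were created early but never started stay in the unstarted state and are left alone by the second pass.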

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

63216e85

sched: add a new thread state, prestarted · e005a738

Glauber Costa authored 11 years ago

It may be that a thread that is initialized early is also started early. We need to
somehow mark that thread as already started, so we can start it for real later when
the scheduler is ready to go. We will do this by adding an extra state, prestarted.

Later on, we will take action to start those threads properly.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

e005a738

sched: avoid dereferencing current() fields on thread creation · d6ea366f

Glauber Costa authored 11 years ago

Threads that are created very early will see this field with a NULL address. We
should test against it before dereferencing. If we find current not to be
available, we skip some steps or use default values.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

d6ea366f

sched: start thread list early · 9381f4b6

Glauber Costa authored 11 years ago


Since the thread list depends on nothing but the memory allocator,
allocate it as early as we can.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

9381f4b6

Power-off, instead of halt, when loader can't run command · 3615b885

Nadav Har'El authored 11 years ago


Currently, when we try to run an invalid shared object (e.g., run.py -e aaa)
loader.cc calls abort(). This patch changes it to use osv::poweroff().

This is useful, for example, to measure how much time our boot/poweroff
cycle takes, without running any payload, by doing

	time scripts/run.py -e aaa

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3615b885

mmu: move huge_page_size to mmu.hh · e5cf0f1d

Glauber Costa authored 11 years ago

No reason at all for page_size to be in mmu.hh but huge_page_size in mmu.cc.
Move it, so we can also use huge_page_size outside the mmu.cc scope.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e5cf0f1d

Nov 19, 2013

Explicitly request alignment when allocating per-cpu area · e9549266

Nadav Har'El authored 11 years ago

Commit ed808267 used malloc() to allocate
the per-cpu variables area. As Avi pointed out, we need this area to be
aligned like the strictest alignment of any per-cpu variable. The strictest
alignment we need is probably CACHELINE_ALIGNED (64 bytes), but it's easiest
just to require 4096-byte alignment, and this is what the code prior to the
above patch did.

The above commit worked because luckily enough, our malloc() does return
page-aligned memory for large allocations. But it's possible that this will
not be the case in the future. So this patch switches to use aligned_alloc()
instead, explicitly requesting a 4096-byte-aligned block of memory.
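
A minimal sketch of such an allocation, assuming a hypothetical helper alloc_percpu_area() that copies a per-CPU template into a freshly allocated block (the real code is organized differently):

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t percpu_alignment = 4096;

// Allocates a 4096-byte-aligned copy of a per-CPU template area.
// aligned_alloc() expects the size to be a multiple of the alignment,
// so the size is rounded up first. The caller releases it with free().
void* alloc_percpu_area(const void* tmpl, std::size_t size)
{
    std::size_t rounded =
        (size + percpu_alignment - 1) / percpu_alignment * percpu_alignment;
    void* area = std::aligned_alloc(percpu_alignment, rounded);
    if (area) {
        std::memcpy(area, tmpl, size);
    }
    return area;
}
```

Requesting the alignment explicitly removes the dependency on how malloc() happens to place large allocations.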

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

e9549266

Partial implementation of aligned_alloc() and posix_memaligned(). · 7e06bd33

Nadav Har'El authored 11 years ago

This patch provides a trivial implementation of two similar functions for
allocating aligned memory blocks: aligned_alloc() (from the C11 standard)
and posix_memalign() (from POSIX). Memory returned by either function
can be freed with the ordinary free().

This trivial implementation just calls malloc(), and assert()s that it got
the desired alignment, aborting if not. In many cases this is good enough
because malloc() already returns 4096-byte-aligned blocks for large
allocations. In particular we'll use these functions in the next patch for
allocating the large page-aligned per-cpu areas.

If we ever fail on this assertion, we can replace these functions by a
full implementation (see issue #87).
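
The approach described above can be sketched like this. The function name trivial_aligned_alloc() is made up for the example, to avoid clashing with the C library's own aligned_alloc():

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Sketch of a "trivial" aligned allocation: delegate to malloc() and
// assert that the result happens to satisfy the requested alignment.
void* trivial_aligned_alloc(std::size_t alignment, std::size_t size)
{
    void* p = std::malloc(size);
    // Abort if malloc() did not return a suitably aligned block.
    assert(p == nullptr ||
           reinterpret_cast<std::uintptr_t>(p) % alignment == 0);
    return p;
}
```

Because the block comes straight from malloc(), it can be released with the ordinary free(), as the commit message notes.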

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

7e06bd33

Add negotiation flag check for FLUSH · a0ce1b50

Takuya ASADA authored 11 years ago

Some older versions of qemu-nbd cause nbd_client.py to exit with an error.
(See: https://groups.google.com/d/msg/osv-dev/EW5BtNFNfzs/I33BeFXg2f0J)
This is because nbd_client.py sends the FLUSH command unconditionally, but
FLUSH is an extended feature: an nbd client should check that the nbd server
is able to accept FLUSH.
The nbd server sends capability flags during the negotiation stage; it sends
HAS_FLAGS (0x1) and SEND_FLUSH (0x4) when it supports FLUSH.

This patch adds this capability check and skips sending FLUSH if the server
doesn't support it.
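
The check itself is a simple bit test on the negotiated flags. The patch modifies a Python script; the sketch below shows the same logic in C++ for illustration only, and server_supports_flush() is a hypothetical name:

```cpp
#include <cstdint>

// Flag values as quoted in the commit message.
constexpr std::uint16_t NBD_FLAG_HAS_FLAGS  = 0x1;
constexpr std::uint16_t NBD_FLAG_SEND_FLUSH = 0x4;

// Returns true only if the server advertised both HAS_FLAGS and SEND_FLUSH
// during negotiation; otherwise the client should not send FLUSH.
bool server_supports_flush(std::uint16_t flags)
{
    constexpr std::uint16_t required =
        NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH;
    return (flags & required) == required;
}
```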

Signed-off-by: Takuya ASADA <syuu@dokukino.com>
Reviewed-by: Benoît Canet <benoit.canet@irqsave.net>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

a0ce1b50

percpu: Reduce size of .percpu section · ed808267

Nadav Har'El authored 11 years ago


This patch reduces the size of the .percpu section 64-fold from about
5 MB to 70 KB, and solves issue #95.

The ".percpu" section is part of the .data section of our executable
(loader-stripped.elf). In our 15 MB executable, roughly 7 MB is text
(code), and 7 MB is data, and out of that, a whopping 5 MB is the
".percpu" section. The executable is read in real mode, and this is
especially slow on Amazon EC2, hence our wish to make the executable
as small as possible.

The percpu section starts with all the PERCPU variables defined in the
program. We have about 70 KB of those, and believe it or not, most of
this 70 KB is just a single variable, the 65K dynamic_percpu_buffer
(see percpu.cc).

But then, we need a copy of these variables for each CPU. The unpatched
code duplicated this 70KB section 64 times in the executable file (!),
and then used these memory locations for up-to-64 cpus. But there is
no reason to duplicate this data in the executable! All we need to do
is to dynamically allocate a copy of this section for each CPU, and
this is what this patch does.

This patch removes about 5 MB from our executable: After this patch,
our loader-stripped.elf is just 9.7 MB, and its data section's size is
just 2.8 MB.

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ed808267

vfs: Introduce vop_eperm · f1ee72ed

Raphael S. Carvalho authored 11 years ago

vop_eperm allows more code reuse (suggested by Glauber Costa)

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f1ee72ed

Nov 18, 2013

Reformat Java code · 3587c3f1

Pekka Enberg authored 11 years ago


Use four spaces for indentation and use UNIX linefeeds.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3587c3f1

Nov 15, 2013

scripts/test.py: Show running test case name · aa49cd96

Pekka Enberg authored 11 years ago


Show running test case name.  Makes debugging test failures less
painful...

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

aa49cd96

ZFS root filesystem · 27030264

Pekka Enberg authored 11 years ago


Use the new pivot_root() functionality to switch to ZFS root filesystem
once OSv is up and running.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

27030264

ramfs: unmount · 240edb49

Pekka Enberg authored 11 years ago


Needed by pivot_root() to unmount the initial rootfs.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

240edb49

vfs: Add pivot_root() system call · 377f4383

Pekka Enberg authored 11 years ago

This adds a simple pivot_root() system call that works on mountpoints
and simply removes 'put_old' from mount list so that VFS doesn't know
about it and renames 'new_root'->m_path to '/'.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

377f4383

Nov 14, 2013

mmu: Make fill from fill_page support variable page sizes · 59250dee

Raphael S. Carvalho authored 11 years ago


Previously, fill only supported small-page-size chunks.  However, it's
possible to avoid calling fill multiple times simply by allowing
variable page sizes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

59250dee

libc/pthread.cc: Switch to WARN_STUBBED() · e5b8acd4

Pekka Enberg authored 11 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

e5b8acd4

build.mk: Fix bootfs and usr manifest dependencies · c4a45f44

Pekka Enberg authored 11 years ago


Commit 0dcf1f8f ("OSv module support") didn't add a dependency to
bootfs.manifest.skel and usr.manifest.skel which causes image not to be
rebuilt if the files are changed.

Fix that up.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c4a45f44

libc: pthread_kill() stub · 39b9c04b

Pekka Enberg authored 11 years ago


Add pthread_kill() stub. Needed by Cassandra when it's stopped with
Ctrl-C.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

39b9c04b

Nov 13, 2013

Add missing tst-libc-locking.cc file · 10bd596d

Tomasz Grabiec authored 11 years ago


Add the actual test case that was forgotten from commit a9f8092a
("Introduce test for libc locking").

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

10bd596d

Add .pyc files to gitignore · a3006cdb

Pekka Enberg authored 11 years ago


They are temporary files and should be ignored.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a3006cdb

Remove scripts/nbd_client.pyc from the tree · 2d4d65f2

Pekka Enberg authored 11 years ago


I committed the file accidentally.  It's a temporary file that shouldn't
be there.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2d4d65f2

Introduce test for libc locking · a9f8092a

Tomasz Grabiec authored 11 years ago


Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a9f8092a

Fix lack of locking in libc in file and memstream related operations · 3f053b8f

Tomasz Grabiec authored 11 years ago

Spotted by Nadav: libc.threaded field is not set but is used in
several 'if' statements when setting the lock_owner field.

When 'libc.threaded' is false then 'lock_owner' of a FILE is set to a
special value which indicates no locking. This field is initially set
to 0 and the original musl code had a logic which upon creation of the
first thread set it to true and adjusted 'lock_owner' field of all
open files to the value of libc.main_thread. In OSv we had no such
logic which resulted in no locking of the FILE structure.

This patch fixes the issue by using threaded mode from the very
beginning. We also do not rely anymore on posix thread existence so
that stdlib can be used very early in the boot process without
unexpected behavior. It is used (rightfully or not) for example in
ramdisk_init(). We do not have to hold the pthread id in the
'lock_owner' field because the mutex already tracks the owner and we
can do the check using 'mutex_owned()' function.

This patch also gets rid of a magic value STDIO_SINGLETHREADED, which
is of type pthread_t and was used to disable locking when it was known
to be not necessary. A new field is introduced named 'no_locking'
which serves this purpose.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3f053b8f

OSv module support · 0dcf1f8f

Takuya ASADA authored 11 years ago

The idea of the patch is basically described in a previous post:

https://groups.google.com/d/msg/osv-dev/RL2S3AL9TNE/l4XZJo3-lI0J

With this patch, you will be able to install OSv apps into the disk image at
the "make all" stage.

These apps are not required to exist in the OSv repository; you can install
apps from any git or svn repository, or from a local directory.

You'll need to write a config file to add apps; the format of the file is
JSON.

Here's a sample of the file:
{
   "modules":[
      {
         "name":"osv-mruby",
         "type":"git",
         "path":"https://github.com/syuu1228/osv-mruby.git",
         "branch":"master"
      }
   ]
}

If you add a module to the config file, "make all" calls script/module.py.

This script performs "git clone" to fetch the repository into $(out)/module,
and invokes "make module" on each module.

"make module" should output bootfs.manifest/usr.manifest in the module
directory; the script merges bootfs.manifest.skel/usr.manifest.skel and
the module-local manifests into a single file,
$(out)/bootfs.manifest/$(out)/usr.manifest.

Here's an app Makefile example:

  https://github.com/syuu1228/osv-mruby/blob/master/Makefile

It has a "module" target, which builds all binaries and
generates *.manifest.

Signed-off-by: Takuya ASADA <syuu@dokukino.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

0dcf1f8f

scripts/run.py: Add wait feature · 2c994347

Raphael S. Carvalho authored 11 years ago


It simply tells QEMU not to start OSv at startup.

To continue the execution, it's possible to use either the QEMU monitor
or a GDB remote connection.

Signed-off-by: Raphael S. Carvalho <raphael.scarv@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

2c994347

Nov 12, 2013

x64: Fix stack alignment in fault handlers · 9be9dd99

Pekka Enberg authored 11 years ago


Make sure stack pointer is 16-byte aligned in fault handler as required
by x86-64 ABI. This is needed for the page fault handler to be able to
use stack for FPU state save/restore.
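
The alignment step is a mask operation on the stack pointer. The actual fix lives in the low-level entry code; the following is only a sketch of the arithmetic, with a hypothetical function name:

```cpp
#include <cstdint>

// Rounds a stack pointer down to the nearest 16-byte boundary.
// Stacks grow downward on x86-64, so rounding down never overlaps
// data that was already pushed above the original stack pointer.
constexpr std::uintptr_t align_stack_down(std::uintptr_t sp)
{
    return sp & ~static_cast<std::uintptr_t>(15);
}
```

For example, a stack pointer of 0x1008 becomes 0x1000, while an already aligned value such as 0x1010 is left unchanged.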

Spotted by Nadav.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

9be9dd99

release-ec2: fix AMI replication code · 188d11db

Dmitry Fleytman authored 11 years ago


Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

188d11db

Nov 11, 2013

build: fix .tls_template_size calculation · 3527954f

Avi Kivity authored 11 years ago


For an unknown reason, the current calculation of .tls_template_size
yields 0x10 instead of the correct value.  This results in part of the
initial tls block being freed by arch::setup(), and subsequent
corruption.

Fix by switching to the ld SIZEOF() operator.

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

3527954f

bsd: Fix formatting in porting/network.cc · bfc7bd10

Pekka Enberg authored 11 years ago


Use four spaces, not tabs for indentation.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

bfc7bd10

Fix make flag in readme for debug mode · 4a10bbe2

dleifker@gmail.com authored 11 years ago


The -j must have been included by mistake, otherwise make uses unlimited
jobs.

Signed-off-by: David Leifker <dlei...@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4a10bbe2

ovs: fix build error. (bsd/porting/networking.cc) · 870a9a55

大谷津昂季 authored 11 years ago


The variable does not need an initializer (alternatively, memset(3) could be
used to zero it), because its values are set by ioctl(SIOCGIFFLAGS).

```
(snip)
  CC bsd/sys/kern/sys_socket.o
  CC bsd/sys/kern/subr_disk.o
  CC bsd/porting/route.o
  CXX bsd/porting/networking.o
../../bsd/porting/networking.cc: In function ‘int osv::ifup(std::string)’:
 ../../bsd/porting/networking.cc:99:30: error: missing braces around
initializer for ‘char [16]’ [-Werror=missing-braces]
cc1plus: all warnings being treated as errors
make[1]: *** [bsd/porting/networking.o] Error 1
make[1]: Leaving directory
`/home/kouki-o/work/kaishuu0123-osv/build/release'
make: *** [all] Error 2
(snip)
```

Signed-off-by: Kouki Ooyatsu <kaishuu0123@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

870a9a55

ip: fix IP defragmentation mechanism for big packets · a450a612

Dmitry Fleytman authored 11 years ago


The current limit on the number of fragments per IP packet is 16.
This is not enough for packets bigger than 24K on a standard MTU.
This patch increases the limit to the theoretical maximum.

The problem was found during UDP RX performance testing: throughput
dropped to 0 for 32K UDP packets.
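
The 24K figure can be reproduced with a short calculation. The sketch below assumes IPv4 over Ethernet with a 1500-byte MTU and a 20-byte IP header without options; the names are illustrative and not taken from the patch:

```cpp
#include <cstddef>

// Assumptions: 1500-byte MTU and a 20-byte IPv4 header without options,
// so each fragment carries 1480 bytes of payload.
constexpr std::size_t mtu = 1500;
constexpr std::size_t ip_header = 20;
constexpr std::size_t frag_payload = mtu - ip_header;

// Number of fragments needed for a datagram payload of the given size.
constexpr std::size_t fragments_needed(std::size_t payload)
{
    return (payload + frag_payload - 1) / frag_payload;
}

static_assert(16 * frag_payload == 23680,
              "16 fragments carry roughly 24K");
static_assert(fragments_needed(32 * 1024) == 23,
              "a 32K payload needs 23 fragments");
static_assert(fragments_needed(65535 - ip_header) == 45,
              "the largest IPv4 datagram needs 45 fragments");
```

Under these assumptions, a 32K datagram needs more fragments than the old limit of 16 allowed, which is consistent with the throughput drop described above.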

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a450a612