Commits · cb139dba52315defce48f153563ef4c772d3d394 · Verlässliche Systemsoftware / projects / osv

Apr 10, 2014

backtrace: show call-sites instead of return-sites · cb139dba

Tomasz Grabiec authored 10 years ago


I think this is more useful. When I analyze a backtrace, I want to see
a call chain rather than a return chain. The return address may lie
inside an inlined function and have little to do with the
call site. The following example shows addresses translated by addr2line.

Before:

  tcp_net_channel_packet
  std::__atomic_base<unsigned int>::load(std::memory_order)
  operator()

After:

  tcp_net_channel_packet
  std::function<void (mbuf*)>::operator()(mbuf*)
  tcp_flush_net_channel
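The usual way to turn return sites into call sites is to subtract one from each return address before symbolizing it. This sketch assumes the patch does something equivalent. The adjusted address lands inside the call instruction, so addr2line reports the caller's line (and inlining context) instead of whatever code follows the call. The helper name `to_call_sites` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn return addresses into call-site addresses, in place.
 * A return address points at the instruction *after* the call, which may
 * already belong to a different (possibly inlined) function or source line.
 * Any byte inside the call instruction symbolizes to the call site, and
 * "return address - 1" is always inside it.
 * Assumption: every entry is a return address; an exact PC (e.g. the
 * faulting instruction of a signal frame) should be left unadjusted. */
void to_call_sites(uintptr_t *frames, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (frames[i] != 0)      /* leave empty/unknown frames alone */
            frames[i] -= 1;
    }
}
```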

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

cb139dba

build: Fix Revert "build: Skip rebuild of usr.img on osv.vmdk and osv.vdi target" · a2f4d6f6

Asias He authored 10 years ago


1) Adding the usr.img dependency to the Makefile is broken because the
Makefile has no usr.img target. (It happened to work for me because I have a
file named usr.img under src/osv.)

2) Adding the usr.img dependency to build.mk causes usr.img to be rebuilt, for some reason:

	osv.vmdk osv.vdi: usr.img
		...
	.PHONY: osv.vmdk osv.vdi

So, in order to build correct images, let's drop the usr.img dependency
completely for now, until we can fix issue 2).

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a2f4d6f6

libc: Add inotify stubs · db09228b

Pekka Enberg authored 10 years ago

This patch adds inotify API stubs. The lack of inotify completely prevents
an application from booting under OSv.

  https://github.com/cloudius-systems/capstan/issues/65
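Below is a minimal sketch of what such stubs can look like. The commit doesn't say what the stubs return. This version assumes two things: that making the symbols resolvable is what unblocks the application, and that failing at run time with `ENOSYS` is acceptable to it. Only the four function signatures come from the Linux inotify API; the bodies are illustrative.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative inotify stubs: they exist so that programs referencing the
 * inotify API can be loaded, and they report "not implemented" when called. */
int inotify_init(void)
{
    errno = ENOSYS;
    return -1;
}

int inotify_init1(int flags)
{
    (void)flags;
    errno = ENOSYS;
    return -1;
}

int inotify_add_watch(int fd, const char *pathname, uint32_t mask)
{
    (void)fd; (void)pathname; (void)mask;
    errno = ENOSYS;
    return -1;
}

int inotify_rm_watch(int fd, int wd)
{
    (void)fd; (void)wd;
    errno = ENOSYS;
    return -1;
}
```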



Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

db09228b

arch/aarch64: fix storage size for fpsr and fpcr · d4bc3a7c

Vladimir Murzin authored 10 years ago


The FPSR and FPCR registers are 32 bits wide.
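Here is a sketch of a saved floating-point context with correctly sized fields. The struct name and layout are hypothetical, not OSv's actual definition. The point is only that FPSR and FPCR each need 32 bits of storage, which the static assertions check at compile time.

```c
#include <stdint.h>

/* Hypothetical layout of saved AArch64 FP/SIMD state. */
struct fpu_state {
    uint64_t vregs[32][2];  /* V0..V31, 128 bits each */
    uint32_t fpsr;          /* floating-point status register: 32 bits */
    uint32_t fpcr;          /* floating-point control register: 32 bits */
};

_Static_assert(sizeof(((struct fpu_state *)0)->fpsr) == 4, "FPSR is 32 bits");
_Static_assert(sizeof(((struct fpu_state *)0)->fpcr) == 4, "FPCR is 32 bits");
```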

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d4bc3a7c

bsd/aarch64: clean-up atomics · 4989dfc4

Vladimir Murzin authored 10 years ago


The CBNZ instruction doesn't affect the condition flags, so there is no
point in declaring them clobbered.
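For illustration, here is a hypothetical AArch64 atomic add built on an LDXR/STXR retry loop; it is not the actual BSD atomics code. Neither `add` (as opposed to `adds`) nor `cbnz` writes the NZCV flags, so the asm lists only the `"memory"` clobber and no `"cc"`. The ordering is relaxed, with no barriers. On other architectures the sketch falls back to the GCC `__atomic_add_fetch` builtin.

```c
#include <stdint.h>

/* Atomically add v to *p and return the new value (relaxed ordering). */
static inline uint32_t atomic_add_u32(volatile uint32_t *p, uint32_t v)
{
#if defined(__aarch64__)
    uint32_t result, status;
    __asm__ __volatile__(
        "1: ldxr  %w0, [%2]\n"        /* load-exclusive the old value     */
        "   add   %w0, %w0, %w3\n"    /* ADD, not ADDS: NZCV is untouched */
        "   stxr  %w1, %w0, [%2]\n"   /* try to store; status 0 = success */
        "   cbnz  %w1, 1b\n"          /* CBNZ tests a register, not flags */
        : "=&r"(result), "=&r"(status)
        : "r"(p), "r"(v)
        : "memory");                  /* no "cc": nothing here writes flags */
    return result;
#else
    /* Portable fallback so the sketch also builds off AArch64. */
    return __atomic_add_fetch(p, v, __ATOMIC_RELAXED);
#endif
}
```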

Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Vladimir Murzin <murzin.v@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4989dfc4

Apr 09, 2014

linux: make write syscall available · d437a24e

Glauber Costa authored 10 years ago


The jemalloc memory allocator likes to bypass libc when calling write(), so
it invokes the syscall directly. Let's make write available through the
syscall interface as well.
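Code that bypasses libc this way calls the generic `syscall()` entry point with `SYS_write` as the number. Below is a hypothetical sketch of a dispatcher that routes that number back to the ordinary write(). The name `dispatch_syscall` is made up, and OSv's real syscall table may be organized differently.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical dispatcher mapping Linux syscall numbers onto libc calls. */
long dispatch_syscall(long number, ...)
{
    va_list args;
    long ret;

    va_start(args, number);
    switch (number) {
    case SYS_write: {
        int fd = va_arg(args, int);
        const void *buf = va_arg(args, const void *);
        size_t count = va_arg(args, size_t);
        ret = write(fd, buf, count);
        break;
    }
    default:
        /* Any number without an entry is reported as unimplemented. */
        errno = ENOSYS;
        ret = -1;
        break;
    }
    va_end(args);
    return ret;
}
```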

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

d437a24e

pthread: implement mutexattr functions · c86c915b

Glauber Costa authored 10 years ago

Redis (jemalloc, to be precise) expects these functions to succeed.
Returning -1 here means it will not initialize its memory allocator correctly.

This simple implementation just stores the desired value, but does not change
anything in the underlying mutex.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c86c915b

tests: enhance mmap test · ed2a70f1

Glauber Costa authored 10 years ago

We recently discovered a bug that caused us to fail to unmap a valid region.
It is fixed now, and this patch adds the failing condition to the test suite.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ed2a70f1

mmu: correctly calculate size for unmap operations · a4999feb

Glauber Costa authored 10 years ago


Right now, we graciously accept, as does Linux, an mmap() call for a size
smaller than a page: we align it up and serve it. But the same alignment is
missing from munmap(), so if the user rightfully tries to unmap using the same
size, the call fails.

The following test program succeeds on Linux but fails on OSv:

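    #include <stddef.h>
    #include <sys/mman.h>

    int main(void) {
        void *ret = mmap(NULL, 64, PROT_READ, MAP_ANON | MAP_PRIVATE, -1, 0);
        if (ret == MAP_FAILED) {  /* mmap() signals failure with MAP_FAILED, not NULL */
            return 1;
        }
        /* Unmap with the same sub-page length that was passed to mmap(). */
        return munmap(ret, 64);
    }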

After this patch, it works for us as well.
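The fix amounts to applying to the munmap() length the same round-up to a page boundary that mmap() already applies. Here is a minimal sketch of that rounding, assuming a power-of-two page size; the helper name is hypothetical.

```c
#include <stddef.h>

/* Round len up to the next multiple of page_size (a power of two).
 * Assumes len + page_size - 1 does not overflow size_t. */
size_t page_align_up(size_t len, size_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}
```

With 4096-byte pages, both `mmap(NULL, 64, ...)` and `munmap(ret, 64)` then cover one full page.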

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

a4999feb

pthread: stub setcanceltype · 8f5944cc

Glauber Costa authored 10 years ago

The implementation won't do anything anyway until we have setcancelstate().

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

8f5944cc

pthread_at_fork: bogus implementation · 6375fbd5

Glauber Costa authored 10 years ago

Because we do not support fork(), there is no need to do anything upon fork (if
that changes in the future, we should revisit this).

So we can get away with this simple implementation that just returns 0.
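The whole implementation fits in a few lines. It is sketched here under the hypothetical name `atfork_stub`, so that it doesn't clash with the host libc. The stub accepts the three handlers, records nothing, and reports success.

```c
#include <stddef.h>

/* Hypothetical stand-in for pthread_atfork(): without fork() there is never
 * a moment to run the prepare/parent/child handlers, so none are recorded. */
int atfork_stub(void (*prepare)(void), void (*parent)(void), void (*child)(void))
{
    (void)prepare;
    (void)parent;
    (void)child;
    return 0;   /* report success so callers carry on */
}
```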

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

6375fbd5

libc: Add program_invocation_name variables · 4aa2675e

Glauber Costa authored 10 years ago

Redis relies on the variables being present and correctly set.
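`program_invocation_name` and `program_invocation_short_name` are GNU extensions holding the full `argv[0]` and its final path component. Below is a sketch of how a libc might set them at program start. It uses hypothetical variable names so that it doesn't collide with the host libc's own definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for glibc's program_invocation_name and
 * program_invocation_short_name. */
char *invocation_name = NULL;
char *invocation_short_name = NULL;

/* Set both names from argv[0]: the full string, and the part after the last '/'. */
void set_invocation_names(char *argv0)
{
    char *slash = strrchr(argv0, '/');
    invocation_name = argv0;
    invocation_short_name = slash ? slash + 1 : argv0;
}
```

For `argv[0]` equal to `/usr/bin/redis-server`, the short name becomes `redis-server`.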

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4aa2675e

run.py: allow multiple arguments to --trace · cb2c236e

Tomasz Grabiec authored 10 years ago


Allow passing --trace multiple times to run.py, so that the command-line
interface is the same as for OSv's loader, e.g.:

  scripts/run.py --trace sched_wait* --trace memory_*

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

cb2c236e

Bump up apps submodule HEAD · 257194a2

Tomasz Grabiec authored 10 years ago


Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

257194a2

Merge branch 'ec2-release-scripts' · 68fe6e1a
Pekka Enberg authored 10 years ago

68fe6e1a

upload-ec2: Introduce script for images upload · 4a45edda

Dmitry Fleytman authored 10 years ago


Introduce the upload-ec2.sh script, which fetches prebuilt
OSv images from S3 and converts them to EC2 AMIs.

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4a45edda

release-ec2: Base released images on Linux AMI · c0f53020

Dmitry Fleytman authored 10 years ago


The Linux AMI suits OSv better because of the default settings that
instances inherit on creation, mainly preconfigured SSH
access and open SSH ports.

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c0f53020

release-ec2: Introduce AWS regions list parameter · 42493eef

Dmitry Fleytman authored 10 years ago


Add the --override-regions parameter.

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

42493eef

release-ec2: trivial: help screen enhancements · ed690e4b

Dmitry Fleytman authored 10 years ago


Enhance the help screen to make the parameter syntax clear.

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ed690e4b

vfs: fix partial non-blocking write · ef169330

Nadav Har'El authored 10 years ago


Our read() and write(), and their variants (pread, pwrite, readv, writev,
preadv, pwritev) all shared the same bug when it comes to a partial read or
write: they returned EWOULDBLOCK (EAGAIN) instead of returning successfully
with the number of bytes actually written or read, as they should have.

In the internals of the BSD read and write operations (e.g., sosend_generic)
each operation returns *both* an error number and a number of bytes left.
But at the end, the system call is expected to return just one of them -
either an error *or* a number of bytes. The existing read()/write() code,
when it saw the internals returning an error code, always returned it and
ignored the number of bytes. This was wrong: When the error is EWOULDBLOCK
and the number of bytes is non-zero, we should return this number of bytes
(i.e., a successful partial write), *not* the EWOULDBLOCK error.
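The corrected rule can be sketched as a small helper that folds the (error, bytes left) pair into one return value. The name `finish_io` and its parameters are hypothetical. Only the rule itself comes from the description above: a partial transfer followed by EWOULDBLOCK counts as success.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: combine the internal (error, bytes-left) pair into the
 * single return value of read()/write(). `requested` is the byte count asked
 * for and `resid` is the count left undone. */
ssize_t finish_io(int error, size_t requested, size_t resid)
{
    size_t done = requested - resid;

    /* Some bytes moved before the operation would have blocked: success. */
    if (error == EWOULDBLOCK && done > 0)
        return (ssize_t)done;
    if (error != 0) {
        errno = error;
        return -1;
    }
    return (ssize_t)done;
}
```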

This bug went unnoticed almost since the dawn of OSv, because partial reads
and writes are uncommon. For example, a write() to a blocking socket
always returns after the entire write has succeeded, and never partially
succeeds. A partial write is possible only when writing to an O_NONBLOCK
socket, and even then, it takes a fairly large write() to see it succeed
only partially.

But this bug is very noticeable when running the Jetty web server (see issue …).
At some point, it looks as if the response was restarted (complete with a second
copy of the headers). In Jetty's demo this showed up as half-shown images,
as well as corrupt output when fetching large text files like /test/da.txt.

It turns out that Jetty sends static responses in a surprisingly efficient
(for Java code...) way, using a single system call for the entire response:
it mmap()s the file it wishes to send, and then uses one writev() call to
send two arrays: the HTTP headers (built in malloc()ed memory), and the
file itself (from mmapped memory). So Jetty tries to write even a 1MB file
in one huge writev() call. But there's an added twist: it does so with the
socket configured as O_NONBLOCK. So for large writes, the write only
partially succeeds (empirically, only about 50KB get through), and Jetty
notices the partial write and continues writing the rest, until the
whole file is sent. With our bug, part of the response had already been
written, but Jetty thought nothing had been written, so it started
writing again from the beginning, causing the weird response corruption
we've been seeing.
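On the application side, the pattern Jetty relies on looks roughly like the loop below. It issues one writev() for all buffers. After a partial write, it advances past the accepted bytes and retries, waiting with poll() when the socket reports EWOULDBLOCK. This is a generic C sketch, not Jetty's (Java) code, and `writev_all` is a hypothetical name. With the old bug, the first partial writev() returned -1/EWOULDBLOCK instead of a byte count. A loop like this one would therefore resend the already-written data from the beginning.

```c
#include <errno.h>
#include <poll.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write every byte described by iov[0..iovcnt), resuming after partial
 * writes. Modifies the iov array in place. Returns 0 on success, -1 on error. */
int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Nothing was accepted: wait until the fd is writable again. */
                struct pollfd pfd = { .fd = fd, .events = POLLOUT };
                if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                    return -1;
                continue;
            }
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Skip the buffers that were written completely... */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* ...and the written prefix of the first incomplete one. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}
```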

This patch also includes a test case which confirms this bug, and its fix.
In this test (tst-tcp-nbwrite), two threads communicate over a TCP socket
(on the loopback interface): one thread write()s a very large buffer and
the other receives what it can. We try this twice, once on a blocking
socket and once on a non-blocking socket. In each case we expect the number
of bytes written by one thread (the return value of write()) and the number
read by the second thread (the return value of read()) to be the same. With
the bug, in the non-blocking case we saw write() return -1 (with
errno=EWOULDBLOCK) while read returned over 50,000 bytes, causing the test
to fail.
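The invariant the test checks can be reproduced in a single thread. This sketch swaps in a Unix-domain socketpair for the test's TCP loopback connection, because data written to it is queued for the peer immediately. It is a hypothetical sketch, not tst-tcp-nbwrite itself. It makes both ends non-blocking, issues one large write(), drains the peer, and compares the counts.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the count reported by a non-blocking write() equals the number
 * of bytes the peer can read, 0 if they differ, -1 if setup fails. */
int nbwrite_counts_match(size_t len)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);
    fcntl(sv[1], F_SETFL, fcntl(sv[1], F_GETFL) | O_NONBLOCK);

    char *buf = calloc(len, 1);
    if (buf == NULL) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    ssize_t written = write(sv[0], buf, len);   /* may succeed only partially */

    size_t total = 0;
    ssize_t n;
    while ((n = read(sv[1], buf, len)) > 0)     /* drain until EAGAIN */
        total += (size_t)n;

    free(buf);
    close(sv[0]);
    close(sv[1]);
    return written >= 0 && (size_t)written == total;
}
```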

Fixes #257.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

ef169330

mgmt: update to latest · 6e110e24

Pekka Enberg authored 10 years ago

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

6e110e24

Apr 08, 2014

sched: fix waitqueue race causing failure to wake up · 4ef65eb6

Avi Kivity authored 10 years ago


When waitqueue::wake_all() wakes up waiting threads, it calls
sched::thread::wake_lock() to enqueue those waiting threads on the mutex
protecting the waitqueue, thus avoiding needless contention on the mutex.
However, if a thread is already waking, we let it wake naturally and acquire
the mutex itself.

The problem is that the waitqueue code (wait_object<waitqueue>::poll())
examines the wait_record it sleeps on to see if it has been woken, and if not,
goes back to sleep. Since nothing in the thread-already-waking path clears
the wait_record, that is exactly what happens, and the thread stalls until a
timeout occurs.

Fix by clearing the wait record.  As it is protected by the mutex, no
extra synchronization is needed.
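This is a deliberately simplified, single-threaded model of the fix. All names are hypothetical, and the real code's states and its handoff to the mutex are more involved. The model only shows the rule that matters: the record must be cleared on the already-waking path too, so that the poll() check sees the waiter as woken. Since wake_all() runs under the waitqueue's mutex, plain stores suffice, as noted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not OSv's data structures. */
enum thread_state { WAITING, WAKING };

struct thread {
    enum thread_state state;
    bool queued_on_mutex;      /* handed directly to the mutex (wake_lock) */
};

struct wait_record {
    struct thread *t;          /* NULL once the waiter counts as woken */
    struct wait_record *next;
};

/* The check the sleeping side repeats: has my record been woken? */
bool record_woken(const struct wait_record *wr)
{
    return wr->t == NULL;
}

/* Runs with the waitqueue's mutex held, so plain stores are sufficient. */
void wake_all(struct wait_record **head)
{
    struct wait_record *wr = *head;
    while (wr != NULL) {
        struct wait_record *next = wr->next;
        if (wr->t->state == WAITING) {
            /* Common path: queue the thread on the mutex instead of waking it. */
            wr->t->queued_on_mutex = true;
        }
        /* Clear the record on both paths. Before the fix, the already-waking
         * path left it set, so record_woken() stayed false and the thread
         * went back to sleep until a timeout. */
        wr->t = NULL;
        wr = next;
    }
    *head = NULL;
}
```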

Observed with iperf -P 64 against the guest. This is likely triggered by net
channels waking up the thread; then, before it has a chance to wake up, a FIN
packet arrives and is processed in the driver thread, so when the packets
are consumed, the thread is in the waking state.

Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

4ef65eb6

jvmballoon: fix _soft_max_balloons check · 04ac707f

Gleb Natapov authored 10 years ago


The number of balloons to be released is calculated as the difference between
the current number of balloons and the soft max. If they are equal, no balloons
are released, and the loop repeats.
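The commit doesn't show the corrected check. Assume the bug was a non-strict comparison against the soft max. The loop below, with hypothetical names, illustrates why the condition has to be strict: it only iterates while there is something to release, so each pass makes progress.

```c
#include <stddef.h>

/* Hypothetical names throughout; this only models the loop condition. */
size_t balloons_to_release(size_t current, size_t soft_max)
{
    return current > soft_max ? current - soft_max : 0;
}

/* Release balloons until at most soft_max remain. With ">=" instead of ">",
 * current == soft_max would release nothing and the loop would never end. */
size_t shrink_to_soft_max(size_t current, size_t soft_max)
{
    while (current > soft_max)
        current -= balloons_to_release(current, soft_max);
    return current;
}
```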

Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

04ac707f

zfs: spa_zio_taskq[ZIO_TYPE_FREE][ZIO_TASKQ_ISSUE]->tq_lock contention · c74afb15

Raphael S. Carvalho authored 10 years ago

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>

Reference: https://illumos.org/issues/3581



This patch was taken from Illumos; slight changes were needed to port it to OSv.

This patch reduces taskq lock contention by dispatching work over
independent task queues. The ZFS on Linux developers mention that it's not clear
whether this issue affects their port, but profile results showed that
time spent in taskq_thread() was reduced by about 11%.
Apart from the performance benefit, the number of threads in OSv dropped
nicely (from ~344 threads to ~224), possibly saving a good amount
of memory.
It is also a step toward staying in sync with upstream ZFS.

Addresses issue #247.
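The idea of spreading dispatches over several independently locked queues can be sketched as follows. The queue count, the round-robin selection, and all names are assumptions for illustration. The Illumos change has its own selection policy and data structures.

```c
#include <pthread.h>
#include <stddef.h>

#define NQUEUES 4   /* hypothetical number of queues per taskq type */

struct taskq {
    pthread_mutex_t lock;   /* per-queue lock: dispatchers rarely collide */
    size_t pending;         /* stand-in for a real list of queued tasks */
};

struct taskq_set {
    struct taskq q[NQUEUES];
    unsigned next;          /* round-robin cursor */
};

void taskq_set_init(struct taskq_set *s)
{
    for (int i = 0; i < NQUEUES; i++) {
        pthread_mutex_init(&s->q[i].lock, NULL);
        s->q[i].pending = 0;
    }
    s->next = 0;
}

/* Pick a queue without taking any shared lock, then lock only that queue. */
struct taskq *taskq_set_dispatch(struct taskq_set *s)
{
    unsigned idx = __atomic_fetch_add(&s->next, 1, __ATOMIC_RELAXED) % NQUEUES;
    struct taskq *tq = &s->q[idx];

    pthread_mutex_lock(&tq->lock);
    tq->pending++;
    pthread_mutex_unlock(&tq->lock);
    return tq;
}
```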

Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

c74afb15

dhcp: fix parsing of DHCP options · 40d4ae32

Tomasz Grabiec authored 10 years ago

There are several issues with the current code. Firstly, the LENGTH_OK
macro was used in the while() condition. This macro checks whether
options + op_len stays within the packet limit. It works fine
when used from PARSE_OP() inside the switch, but because 'options' is
bumped up by op_len at the end of the loop body, using this macro in
the while condition may cause a premature exit from the loop. This
sometimes caused OSv to not parse the network mask and gateway,
leaving them at 0.0.0.0 when started on Google Compute Engine. As a
result, OSv was not responding over the network. See issue #254.

Another issue was that the stop condition which checks for op ==
DHCP_OPTION_END was using 'op' from the outer context, which was never
overwritten. The actual variable which was changed based on the packet
content was redeclared inside the loop.

A third problem, spotted by Vlad, is that the code was not handling
DHCP_OPTION_PAD properly. This option consists of only an opcode byte,
with no following length byte. The current code would attempt to read the
length byte and skip by that amount, which would yield an incorrect parsing
result.
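The sketch below shows an option walker that avoids all three problems. Bounds are checked per option, before reading the length byte or the payload, not in the loop condition. `op` is a single variable refreshed on every iteration. PAD consumes one byte, with no length. The function name is hypothetical; the option codes 0 (PAD), 1 (subnet mask) and 255 (END) are from RFC 2132.

```c
#include <stddef.h>
#include <stdint.h>

#define DHCP_OPTION_PAD          0
#define DHCP_OPTION_SUBNET_MASK  1
#define DHCP_OPTION_END          255

/* Find option `wanted` in opts[0..len). On success, store its length in
 * *out_len and return a pointer to its data; otherwise return NULL. */
const uint8_t *dhcp_find_option(const uint8_t *opts, size_t len,
                                uint8_t wanted, uint8_t *out_len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t op = opts[i];              /* refreshed on every iteration */
        if (op == DHCP_OPTION_END)
            break;
        if (op == DHCP_OPTION_PAD) {       /* PAD has no length byte */
            i++;
            continue;
        }
        if (i + 1 >= len)                  /* length byte would be past the end */
            break;
        uint8_t op_len = opts[i + 1];
        if (i + 2 + (size_t)op_len > len)  /* payload would overrun the packet */
            break;
        if (op == wanted) {
            *out_len = op_len;
            return &opts[i + 2];
        }
        i += 2 + (size_t)op_len;
    }
    return NULL;
}
```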

Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

40d4ae32

dhcp: remove lookup_opcode() · abbfc557

Tomasz Grabiec authored 10 years ago


The lookup_opcode() function is incorrect. It mishandles
DHCP_OPTION_PAD, which does not have a following length byte.

Also, the while condition reads an 'op' value that never
changes. This may result in reads beyond the packet size.

Since this function is unused the best fix is to remove it.

Reviewed-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

abbfc557

Apr 07, 2014

build-standalone-img: Fix image versioning · 06cf7b0f

Pekka Enberg authored 10 years ago

The typo in commit 425c5ce7 ("build-standalone-img: Add version number to
image filename") caused the shell script to run a "version" command
instead of expanding the "version" variable. Fix that.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

06cf7b0f

build-standalone-img: Add version number to image filename · 425c5ce7

Pekka Enberg authored 10 years ago


Add OSv version number to image filename so that we can actually have
more than one release available for download...

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

425c5ce7

Revert "build: Skip rebuild of usr.img on osv.vmdk and osv.vdi target" · 9d4983fe

Pekka Enberg authored 10 years ago

This reverts commit df30ec8d. It breaks
building "osv.vdi" altogether.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

Tag: v0.06

9d4983fe

build-osv-release: standalone image support · 87001a10

Pekka Enberg authored 10 years ago


Make scripts/build-osv-release support both Capstan and standalone
images.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

87001a10

scripts: Add build-standalone-img to build images used without capstan · ee5be0ee

Asias He authored 10 years ago


This script is similar to build-capstan-img. Later we can hook this
script into scripts/build-osv-release to build images used without
capstan. It takes the same arguments as build-capstan-img.

Example output:

   $ find build/standalone
   build/standalone/cloudius/
   build/standalone/cloudius/osv-iperf
   build/standalone/cloudius/osv-iperf/osv-iperf.esx.ova
   build/standalone/cloudius/osv-iperf/osv-iperf.qemu.qcow2
   build/standalone/cloudius/osv-iperf/osv-iperf.vbox.ova
   build/standalone/cloudius/osv-iperf/osv-iperf.vmw.zip
   build/standalone/cloudius/osv-iperf/osv-iperf.gce.tar.gz

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

ee5be0ee

aarch64: stub new mmu functions in bsd · 94996a8c

Claudio Fontana authored 10 years ago


commit 82f881a2 "zfs: mmu: take vma_list_mutex"
introduces code that needs to be stubbed for now for AArch64.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

94996a8c

aarch64: fix up merge issues · 52f64ac3

Claudio Fontana authored 10 years ago


Lots of MMU-related changes require fixups for AArch64.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

52f64ac3

zfs: mmu: take vma_list_mutex before tearing down PT during ARC buffer eviction · 82f881a2

Gleb Natapov authored 10 years ago


Without the lock, page mapping and buffer eviction can run in parallel,
which may cause the following race:

page mapper thread               thread calling arc_evict()
map_addr() {
  page = arc_get_page();
  add_mapping(page, ptep);
                                  evict(page) {
                                    ptep = get_mapping(page);
                                    ptep.write(0);
                                    free(page);
                                  }

  ptep.write(page);
}

The ARC code has no well-defined order for taking its mutexes. It uses trylock()
and skips a buffer if the required locks cannot be acquired. This patch uses
the same approach for vma_list_mutex: if the evicted buffer is shared, try to
lock vma_list_mutex, and skip the buffer if this fails.
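The trylock-and-skip pattern can be sketched with POSIX mutexes. The buffer type and function are hypothetical stand-ins for the ARC structures. The point is that eviction never blocks on vma_list_mutex, so it cannot take part in a lock-order deadlock with a page mapper.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an ARC buffer. */
struct arc_buf_sketch {
    bool shared;    /* mapped into some address space */
    bool evicted;
};

/* Evict what we can; returns the number of buffers evicted. */
size_t evict_buffers(struct arc_buf_sketch *bufs, size_t n,
                     pthread_mutex_t *vma_list_mutex)
{
    size_t evicted = 0;
    for (size_t i = 0; i < n; i++) {
        struct arc_buf_sketch *b = &bufs[i];
        if (b->shared) {
            if (pthread_mutex_trylock(vma_list_mutex) != 0)
                continue;   /* lock busy: skip this buffer rather than wait */
            /* Page-table teardown would happen here, under the lock. */
            b->evicted = true;
            pthread_mutex_unlock(vma_list_mutex);
        } else {
            b->evicted = true;
        }
        evicted++;
    }
    return evicted;
}
```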

Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

82f881a2

Apr 06, 2014

gdb: fix 'osv thread' and 'osv thread apply all' · 23275cd0

Tomasz Grabiec authored 10 years ago


thread_list was populated with references to sched::thread* rather
than sched::thread, but the commands assumed the latter. As a result,
the commands printed and compared against the address of the pointer
to sched::thread in sched::thread_map, rather than against the
sched::thread address itself.

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

23275cd0

gdb: fix 'osv info virtio' · d82f6fca

Tomasz Grabiec authored 10 years ago


It looks like a gdb.Value cannot be implicitly converted to an int
by range() on GDB with Python 3:

    for qidx in range(0, vb['_num_queues']):
  TypeError: 'gdb.Value' object cannot be interpreted as an integer
  Error occurred in Python command: 'gdb.Value' object cannot be interpreted as an integer

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

d82f6fca

apps: update · 3640debb

Avi Kivity authored 10 years ago

Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

3640debb

Apr 04, 2014

README.md: the run.py default for OSv Guest requested RAM size is 2GB RAM, not 1GB · f6064ddb

Vlad Zolotarov authored 10 years ago


Reviewed-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

f6064ddb

README.md: use the proper name for Debian package containing maven-shade-plugin.jar · 40d8032c

Vlad Zolotarov authored 10 years ago


Reviewed-by: Tomasz Grabiec <tgrabiec@gmail.com>
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>

40d8032c

Apr 03, 2014

sched: fix rare crashes caused by reschedule running on the wrong CPU · ee92f736

Nadav Har'El authored 10 years ago

For a long time we've had the bug summarized in issue #178, where very
rarely but consistently, in various runs such as Cassandra, Netperf and
tst-queue-mpsc.so, we saw OSv crashing because of some corruption in the
timer list, such as arming an already armed timer, or canceling an already
canceled timer.

It turns out the problem was the schedule() function, which basically did
cpu::current()->schedule(). The problem is that if we're unlucky enough,
the thread can be migrated right after calling cpu::current(), but before
the irq disable in schedule(), which causes us to do a rescheduling for
one CPU on a different CPU, which is a big faux pas. This can cause us,
for example, to mess with one CPU's preemption_timer from a different CPU,
causing the timer-related races and crashes we've seen in issue #178.

Clearly, we shouldn't at all have a *method* cpu->schedule() which can
operate on any cpu. Rather, we should have only a *function* (class-static)
cpu::schedule() which operates on the current cpu - and makes sure we find
that current CPU within the IRQ lock to ensure (among other things) the
thread cannot get migrated.
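This single-threaded model contrasts the two shapes. Everything here is hypothetical: `irq_disable()` only flips a flag, and the `migrate_to` argument simulates the thread being moved at the worst possible moment. The buggy version picks the CPU before IRQs are off. The fixed version looks it up inside the IRQ-disabled section.

```c
#include <stdbool.h>

/* Hypothetical model; nothing here touches real interrupts or CPUs. */
struct cpu { int id; int reschedules; };

static struct cpu cpus[2] = { { 0, 0 }, { 1, 0 } };
static int running_on = 0;          /* CPU the thread currently runs on */
static bool irqs_enabled = true;

static void irq_disable(void) { irqs_enabled = false; }
static void irq_enable(void)  { irqs_enabled = true; }
static struct cpu *cpu_current(void) { return &cpus[running_on]; }

/* Buggy shape, like cpu::current()->schedule(): the CPU is chosen while the
 * thread can still migrate, so it may reschedule a CPU it no longer runs on. */
void cpu_schedule_buggy(int migrate_to)
{
    struct cpu *c = cpu_current();
    running_on = migrate_to;        /* preemption + migration happens here */
    irq_disable();
    c->reschedules++;               /* operates on the wrong CPU if we moved */
    irq_enable();
}

/* Fixed shape: a single static schedule() that finds the current CPU only
 * once IRQs are disabled, when the thread can no longer migrate. */
void cpu_schedule(void)
{
    irq_disable();
    struct cpu *c = cpu_current();
    c->reschedules++;               /* stand-in for the actual rescheduling */
    irq_enable();
}
```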

Another benefit of this patch is that it actually simplifies the code,
with one less function called "schedule".

Fixes #178.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>

ee92f736