Nadav Har'El authored
Our read() and write(), and their variants (pread, pwrite, readv, writev,
preadv, pwritev) all shared the same bug when it comes to a partial read or
write: they returned EWOULDBLOCK (EAGAIN) instead of returning successfully
with the number of bytes actually written or read, as they should have.

In the internals of the BSD read and write operations (e.g., sosend_generic)
each operation returns *both* an error number and a number of bytes left.
But at the end, the system call is expected to return just one of them -
either an error *or* a number of bytes. The existing read()/write() code,
when it saw the internals returning an error code, always returned it and
ignored the number of bytes. This was wrong: When the error is EWOULDBLOCK
and the number of bytes is non-zero, we should return this number of bytes
(i.e., a successful partial write), *not* the EWOULDBLOCK error.
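
The rule described above can be sketched as a small C function. This is only an illustration, not the actual OSv code: the name io_result and its parameters are hypothetical, and it assumes the lower layer reports an errno-style error together with the number of bytes left untransferred.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not OSv's actual code): collapse the two values
 * reported by the lower layer into the single value a system call returns.
 *   error     - 0, or an errno value from the internal operation
 *   requested - number of bytes the caller asked to transfer
 *   resid     - number of bytes left untransferred
 * Returns the byte count on success (including a partial transfer),
 * or -1 with errno set.
 */
ssize_t io_result(int error, size_t requested, size_t resid)
{
    assert(resid <= requested);
    size_t done = requested - resid;

    /* Some bytes moved before the socket would have blocked: report them. */
    if (error == EWOULDBLOCK && done > 0) {
        return (ssize_t)done;
    }
    /* Any other error, or EWOULDBLOCK with nothing transferred, fails. */
    if (error != 0) {
        errno = error;
        return -1;
    }
    return (ssize_t)done;
}
```

Under these assumptions, io_result(EWOULDBLOCK, 100, 40) yields 60, while io_result(EWOULDBLOCK, 100, 100) yields -1 with errno set to EWOULDBLOCK.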

This bug went unnoticed almost since the dawn of OSv, because partial reads
and writes are not common. For example, a write() to a blocking socket will
always return after the entire write is successful, and will not partially
succeed. Only when we write to an O_NONBLOCK socket is it possible to
see a partial write, and even then we would need a pretty large write()
to see it only partially succeed.

But this bug is very noticeable when running the Jetty Web server (see issue
At some point it's like the response was restarted (complete with a second
copy of the headers). In Jetty's demo this was seen as half-shown images,
as well as corrupt output when fetching large text files like /test/da.txt.

Turns out that Jetty sends static responses in a surprisingly efficient
(for Java code...) way, using a single system call for the entire response:
It mmap()s the file it wishes to send, and then uses one writev() call to
send two arrays: The HTTP headers (built in malloc()ed memory), and the
file itself (from mmapped memory). So Jetty tries to write even a 1MB file
in one huge writev() call. But there's an added twist: It does so with the
socket configured to O_NONBLOCK. So for large writes, the write will only
partially succeed (empirically, only about 50KB will succeed), and Jetty
will notice the partial write and continue writing the rest - until the
whole file is sent. With the bug we had, part of the data was written,
but Jetty was told that nothing had been written, so it started writing
again from the beginning - causing the weird sort of response corruption
we've been seeing.
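
Jetty itself is Java, but the retry pattern it relies on can be sketched in C. The following is a minimal sketch, not Jetty's or OSv's code: iov_advance and writev_all are hypothetical names, and the loop assumes a non-blocking descriptor whose writev() returns the number of bytes actually sent.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: after writev() reports that `written` bytes went
 * out, skip over them in place so the next call resumes where the last
 * one stopped. Returns the index of the first entry that still has data
 * (or `count` when everything was sent).
 */
int iov_advance(struct iovec *iov, int count, size_t written)
{
    assert(iov != NULL || count == 0);
    int i = 0;
    while (i < count && written >= iov[i].iov_len) {
        written -= iov[i].iov_len;
        iov[i].iov_len = 0;
        i++;
    }
    if (i < count) {
        iov[i].iov_base = (char *)iov[i].iov_base + written;
        iov[i].iov_len -= written;
    }
    return i;
}

/*
 * Send everything described by iov on a non-blocking fd, waiting with
 * poll() whenever the socket buffer is full. Modifies the iov array.
 * Returns 0 on success, -1 on error.
 */
int writev_all(int fd, struct iovec *iov, int count)
{
    int first = 0;
    while (first < count) {
        ssize_t n = writev(fd, iov + first, count - first);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                struct pollfd p = { .fd = fd, .events = POLLOUT };
                if (poll(&p, 1, -1) < 0 && errno != EINTR) {
                    return -1;
                }
                continue;
            }
            return -1;
        }
        /* Partial write: continue from the first unsent byte. */
        first += iov_advance(iov + first, count - first, (size_t)n);
    }
    return 0;
}
```

A loop like this only works if a partial write is reported as a byte count. With the bug described above, the first writev() reported EWOULDBLOCK even though data had been sent, so the caller retried from the start of the buffers.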

This patch also includes a test case which confirms this bug, and its fix.
In this test (tst-tcp-nbwrite), two threads communicate over a TCP socket
(on the loopback interface), one thread write()s a very large buffer and
the other receives what it can. We try this two times - once on a blocking
socket and once on a non-blocking socket. In each case we expect the number
of bytes written by one thread (return from write()) and the number read
by the second thread (return from read()) to be the same. With the bug we
had, in the non-blocking case we saw write() returning -1 (with
errno=EWOULDBLOCK) but read returned over 50,000 bytes, causing the test
to fail.
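
The invariant this test checks can be illustrated with a much smaller program. The sketch below is not the tst-tcp-nbwrite source: to stay short it is single-threaded, uses an AF_UNIX socketpair instead of a TCP loopback connection, and covers only the non-blocking case. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Simplified illustration (an assumption, not the real test): write
 * `size` bytes once to a non-blocking socket, then drain the peer and
 * compare the counts. Returns 1 if bytes written == bytes read,
 * 0 if they differ, -1 if the setup failed.
 */
int nonblocking_write_matches_read(size_t size)
{
    assert(size > 0);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    char *buf = calloc(1, size);
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (buf == NULL || flags < 0 ||
        fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0) {
        free(buf);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* A large write on a non-blocking socket may succeed only partially. */
    ssize_t w = write(sv[0], buf, size);
    size_t written = w > 0 ? (size_t)w : 0; /* -1 counts as zero bytes */

    /* Close the writer so the reader sees EOF after the queued data. */
    close(sv[0]);

    size_t total = 0;
    ssize_t r;
    while ((r = read(sv[1], buf, size)) > 0) {
        total += (size_t)r;
    }
    close(sv[1]);
    free(buf);
    return written == total;
}
```

With the bug described above, write() would have returned -1 while the reader still received data, so a check of this kind would return 0.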

Fixes #257.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
ef169330

OSv

OSv is a new open-source operating system for virtual machines. OSv was designed from the ground up to execute a single application on top of a hypervisor, resulting in superior performance and effortless management when compared to traditional operating systems, which were designed for a vast range of physical machines.

OSv has new APIs for new applications, but also runs unmodified Linux applications (most of Linux's ABI is supported) and in particular can run an unmodified JVM, and applications built on top of one.

For more information about OSv, see http://osv.io/ and https://github.com/cloudius-systems/osv/wiki

Documentation

Building

OSv can only be built on a 64-bit x86 Linux distribution. Please note that this means the "x86_64" or "amd64" version, not the 32-bit "i386" version.

First, install prerequisite packages:

Fedora

yum install ant autoconf automake boost-static gcc-c++ genromfs libvirt libtool flex bison qemu-system-x86 qemu-img maven maven-shade-plugin

Debian

apt-get install build-essential libboost-all-dev genromfs autoconf libtool openjdk-7-jdk ant qemu-utils maven libmaven-shade-plugin-java

Ubuntu users: you may use Oracle JDK if you don't want to pull too many dependencies for openjdk-7-jdk

To ensure functional C++11 support, GCC 4.8 or above is required, as this was the first version to fully comply with the C++11 standard.

Make sure all git submodules are up-to-date:

git submodule update --init --recursive

Finally, build everything at once:

make

By default, make creates an image in qcow2 format. To change this, pass the desired format via the img_format variable, e.g.:

make img_format=raw

Running OSv

./scripts/run.py

By default, this runs OSv under KVM, with 4 VCPUs and 2GB of memory, and runs the default management application (containing a shell, Web server, and SSH server).

If running under KVM you can terminate by hitting Ctrl+A X.

External Networking

To start OSv with external networking:

sudo ./scripts/run.py -n -v

The -v flag enables KVM's vhost, which provides better performance. Its setup requires a tap device, which is why we use sudo.

By default, OSv spawns a dhcpd that automatically configures the virtual NICs. Static configuration can also be done within OSv, like so:

ifconfig virtio-net0 192.168.122.100 netmask 255.255.255.0 up
route add default gw 192.168.122.1

Test networking:

test invoke TCPExternalCommunication

Running Java or C applications that already reside within the image:

# The default Java-based shell and web server
sudo scripts/run.py -nv -m4G -e "java.so -jar /usr/mgmt/web-1.0.0.jar app prod"

# One of the unit tests (compiled C++ code)
sudo scripts/run.py -nv -m4G -e "/tests/tst-pipe.so"