- Dec 19, 2013
Asias He authored
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 18, 2013
Asias He authored
This adds initial virtio-scsi support. OSv has no SCSI layer, so in this implementation virtio-scsi works directly with the bio layer: it translates BIO_READ, BIO_WRITE and BIO_FLUSH requests into SCSI commands.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
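
For illustration, a minimal, hypothetical sketch of the kind of translation described above. The bio_op and scsi_cdb types and make_cdb() are invented for this example; only the READ(16), WRITE(16) and SYNCHRONIZE CACHE(10) opcodes come from the SCSI command set. This is not the driver's actual code.

    #include <cstdint>

    enum class bio_op { read, write, flush };   // stand-ins for BIO_READ/BIO_WRITE/BIO_FLUSH

    struct scsi_cdb {
        uint8_t bytes[16];
    };

    // Build a CDB for one bio; lba/blocks describe the transfer in 512-byte sectors.
    scsi_cdb make_cdb(bio_op op, uint64_t lba, uint32_t blocks)
    {
        scsi_cdb cdb{};
        switch (op) {
        case bio_op::read:
            cdb.bytes[0] = 0x88;                 // READ(16)
            break;
        case bio_op::write:
            cdb.bytes[0] = 0x8a;                 // WRITE(16)
            break;
        case bio_op::flush:
            cdb.bytes[0] = 0x35;                 // SYNCHRONIZE CACHE(10)
            return cdb;                          // flush the whole device: no LBA/length
        }
        for (int i = 0; i < 8; i++)              // 64-bit LBA, big-endian, CDB bytes 2..9
            cdb.bytes[2 + i] = uint8_t(lba >> (8 * (7 - i)));
        for (int i = 0; i < 4; i++)              // 32-bit transfer length, CDB bytes 10..13
            cdb.bytes[10 + i] = uint8_t(blocks >> (8 * (3 - i)));
        return cdb;
    }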
-
Asias He authored
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
The lock used to protect _waiting_request_thread can go away.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 17, 2013
Asias He authored
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
We can skip constructing a vring::sg_node.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 16, 2013
Avi Kivity authored
bsd defines some m_ macros, for example m_flags, to save some typing. However, if you have a variable of the same name in another header, for example m_flags, have fun trying to compile your code. Expand the code in place and eliminate the macros.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
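
A reduced, deliberately non-compiling example of the collision described above; the macro body mimics the historical BSD shorthand form and is not copied from the OSv tree.

    // This intentionally does NOT compile; it shows the failure mode.
    #define m_flags m_hdr.mh_flags      // historical BSD-style shorthand macro

    class packet {
    public:
        int m_flags = 0;                // expands to "int m_hdr.mh_flags = 0;"
                                        // -> syntax error at compile time
    };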
-
Pekka Enberg authored
Clean up virtio-blk.cc by using the 'auto' type specifier where possible.
Reviewed-by: Dor Laor <dor@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Vlad authored
Switched the virtio-net driver to use if_transmit() instead of the legacy if_start(). This saves us at least two additional lock/unlock sequences per mbuf, since IF_ENQUEUE() and IF_DEQUEUE() take a lock when pushing/removing the mbuf from the queue if the ifnet is in legacy mode.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
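
A generic sketch of why the direct-transmit path saves lock traffic. It models the legacy enqueue/start scheme with a std::mutex-protected queue; it is an analogy, not the BSD ifnet code.

    #include <mutex>
    #include <queue>

    struct mbuf_stub {};                     // stand-in for struct mbuf

    struct legacy_if {                       // legacy if_start()-style path
        std::mutex qlock;
        std::queue<mbuf_stub*> sendq;

        void enqueue(mbuf_stub* m) {         // ~IF_ENQUEUE(): lock/unlock #1
            std::lock_guard<std::mutex> g(qlock);
            sendq.push(m);
        }
        mbuf_stub* dequeue() {               // ~IF_DEQUEUE(): lock/unlock #2
            std::lock_guard<std::mutex> g(qlock);
            if (sendq.empty())
                return nullptr;
            mbuf_stub* m = sendq.front();
            sendq.pop();
            return m;
        }
    };

    struct direct_if {                       // if_transmit()-style path
        int transmit(mbuf_stub* m) {
            // Hand the mbuf straight to the device ring; the driver serializes
            // on its own tx lock, so the two per-mbuf queue lock round-trips
            // above disappear.
            (void)m;
            return 0;
        }
    };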
-
- Dec 11, 2013
Amnon Heiman authored
Separate /dev/random from the virtio-rng driver and register virtio-rng as a HW RNG entropy source.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
This reduces unnecessary interrupts that the host could send to the guest while the guest is in the middle of handling an interrupt. In virtio_driver::wait_for_queue, we re-enable interrupts when there is nothing left to process.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
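
A hedged sketch of the general pattern: keep notifications disabled while draining, re-arm them only when the ring looks empty, and re-check afterwards to close the race. The vq_stub interface is invented for illustration and is not OSv's actual wait_for_queue().

    struct vq_stub {                      // stub interface, invented for this sketch
        bool has_used();                  // completed buffers waiting?
        void enable_interrupts();         // allow host notifications again
        void disable_interrupts();        // suppress host notifications
        void wait();                      // sleep until the irq handler wakes us
    };

    void wait_for_used(vq_stub& vq)
    {
        for (;;) {
            if (vq.has_used())
                return;                   // work is available; interrupts stay off
            vq.enable_interrupts();       // about to sleep: re-arm notifications
            if (vq.has_used()) {          // re-check to close the race with the host
                vq.disable_interrupts();
                return;
            }
            vq.wait();                    // woken by the interrupt handler
            vq.disable_interrupts();      // go back to draining with interrupts off
        }
    }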
-
- Dec 09, 2013
Asias He authored
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
When I hacked use_indirect() to always use indirect buffers, I saw this assertion when running:

    $ scripts/run.py -e "/tests/tst-bdev-write.so vblk1"
    VFS: mounting devfs at /dev
    51.671 Mb/s
    Assertion failed: _status.load() == status::running (/home/asias/src/cloudius-systems/osv/core/sched.cc: prepare_wait: 655)
    Aborted

It turned out that we were putting a waiting thread back into the waiting state: get_buf_gc() calls free(), which might make the thread enter the waiting state again.
Suggested-by: Dor Laor <dor@cloudius-systems.com>
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
It is useful to test if we can do gc on the used ring.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
When _avail_count is less than 1/3 of the ring size, we start using indirect descriptors.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Dor Laor <dor@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
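
A minimal sketch of that heuristic, reusing the member names mentioned in the message; it is illustrative rather than the actual vring code, and the desc_needed > 1 condition is an assumption.

    struct ring_state {                      // illustrative only, not the vring class
        unsigned _num;                       // total descriptors in the ring
        unsigned _avail_count;               // descriptors currently free
        bool _indirect_supported;            // VIRTIO_RING_F_INDIRECT_DESC negotiated

        bool use_indirect(unsigned desc_needed) const {
            return _indirect_supported &&
                   desc_needed > 1 &&        // a chain worth collapsing (assumption)
                   _avail_count < _num / 3;  // less than 1/3 of the ring left
        }
    };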
-
Asias He authored
There is no reason we should do this scaling.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Dor Laor <dor@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Dec 08, 2013
Glauber Costa authored
Xen's shared info structure contains hardcoded space for only 32 CPUs. Because we use those structures to derive timing information, we would basically be accessing random memory beyond that point. This is very hard to test and trigger, so what I did to demonstrate the problem (although that wasn't really needed; the math alone shows it) was to print the first timing information each CPU would produce. I could verify that the time on CPUs above 32 was behind the time produced on CPUs below 32.

It is possible to move the vcpu area to a different location, but this is a relatively new feature of the Xen hypervisor and Amazon won't support it, so we need a disable path anyway. I will open an issue for somebody to implement that support eventually.

Another user of the vcpu structure is interrupts, but for interrupts the story is easier, since we can select which CPUs take interrupts and only take them on the first 32 CPUs. In any case, we currently take them all on CPU0, so that is already under control.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 06, 2013
Asias He authored
This patch fixes:

- The order of _used_ring_host_head and _used->_idx; the latter is more advanced than the former.
- The unwanted promotion to "int".

Pekka wrote: However, on the right-hand side, the expression type in master evaluates to "int" because of that innocent-looking constant "2" and the lack of parentheses after the cast. That will also force the left-hand side to promote to "int". And no, I really don't claim to follow the integer promotion rules, so I used typeid().name() to verify what the compiler is doing:

    [penberg@localhost tmp]$ cat types.cpp
    #include <typeinfo>
    #include <stdint.h>
    #include <cstdio>

    using namespace std;

    int main()
    {
        unsigned int _num = 1;
        printf("int = %s\n", typeid(int).name());
        printf("uint16_t = %s\n", typeid(uint16_t).name());
        printf("(uint16_t)_num/2 = %s\n", typeid((uint16_t)_num/2).name());
        printf("(uint16_t)(_num/2) = %s\n", typeid((uint16_t)(_num/2)).name());
    }

    [penberg@localhost tmp]$ g++ -std=c++11 -Wall types.cpp
    [penberg@localhost tmp]$ ./a.out
    int = i
    uint16_t = t
    (uint16_t)_num/2 = i
    (uint16_t)(_num/2) = t

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Now that the tx gc thread is gone, the gc code can only be called from one place. We do not need the lock anymore.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
This unifies the code a bit: we do all the tx queue gc in one common code path.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
We do tx queue gc on the tx path if there is not enough space, so the tx queue gc thread is not a must. Dropping it saves us a running thread and a thread wakeup on every interrupt.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
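
A hedged sketch of reclaiming descriptors inline on the transmit path; all names here are invented for illustration, not taken from the driver.

    struct txq_stub {                        // stub interface, invented for this sketch
        bool has_room(unsigned descs);       // enough free descriptors for this packet?
        void gc();                           // reclaim buffers the host has consumed
        bool add_buf(void* pkt, unsigned descs);
        void kick();                         // notify the host
    };

    bool xmit(txq_stub& txq, void* pkt, unsigned descs)
    {
        if (!txq.has_room(descs)) {
            txq.gc();                        // inline gc replaces the dedicated gc thread
            if (!txq.has_room(descs))
                return false;                // still full: let the caller requeue or drop
        }
        txq.add_buf(pkt, descs);
        txq.kick();
        return true;
    }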
-
Tomasz Grabiec authored
This is used by QEMU to determine whether the guest will be issuing explicit flush requests. If it is not enabled, QEMU flushes using fdatasync() after every write request, which causes dramatic performance degradation on a spinning disk. The test "tests/tst-bdev-write.so" exposes the issue:

=== Before ===
    0.469 Mb/s
    0.312 Mb/s
    0.323 Mb/s
    0.354 Mb/s
    0.163 Mb/s
    0.100 Mb/s
    0.388 Mb/s
    0.293 Mb/s
    0.401 Mb/s
    Written 3.117 MB in 10.02 s

=== After ===
    49.151 Mb/s
    53.126 Mb/s
    32.079 Mb/s
    49.082 Mb/s
    29.575 Mb/s
    42.553 Mb/s
    35.909 Mb/s
    37.592 Mb/s
    67.425 Mb/s
    Written 440.562 MB in 10.00 s

Using "tests/tst-fs-stress.so":

=== Before ===
    2.414 Mb/s
    3.633 Mb/s
    0.630 Mb/s
    0.279 Mb/s
    2.497 Mb/s
    Written 15.379 MB in 10.51 s
    Latency of write() [s]:
    0      0.000000090
    0.5    0.000004532
    0.9    0.000005969
    0.99   0.000022659
    0.999  0.001138458
    1.0    4.020670891

=== After ===
    11.893 Mb/s
    20.292 Mb/s
    13.801 Mb/s
    16.102 Mb/s
    24.811 Mb/s
    18.113 Mb/s
    21.336 Mb/s
    18.976 Mb/s
    Written 182.254 MB in 10.00 s
    Latency of write() [s]:
    0      0.000000089
    0.5    0.000004497
    0.9    0.000005878
    0.99   0.000018114
    0.999  0.000111873
    1.0    0.681828260

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
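
The message does not name the capability bit involved; assuming it is the legacy virtio-blk flush feature (VIRTIO_BLK_F_FLUSH, bit 9), a generic negotiation sketch might look like the following. It illustrates the mechanism only and is not the actual driver change.

    #include <cstdint>

    enum { VIRTIO_BLK_F_FLUSH = 9 };   // legacy virtio-blk flush feature bit (assumed relevant here)

    struct virtio_pci_stub {                    // stub interface for this sketch
        uint32_t read_host_features();          // features offered by the device
        void write_guest_features(uint32_t f);  // features the driver accepts
    };

    void negotiate(virtio_pci_stub& dev)
    {
        uint32_t host = dev.read_host_features();
        uint32_t wanted = 1u << VIRTIO_BLK_F_FLUSH;     // advertise that we will send
                                                        // explicit flush requests
        dev.write_guest_features(host & wanted);
    }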
-
- Dec 05, 2013
Asias He authored
Switch the existing printout-based debug info to tracepoints and add new tracepoints.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 04, 2013
Avi Kivity authored
There is no need to hold the lock while waiting for the host to refill the entropy buffer; drop it.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
std::valarray does not guarantee its elements will be allocated contiguously, so the form &v[0] is only guaranteed to point to the first element, not the rest. Switch to std::vector, where contiguity is guaranteed.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Suppose N threads try to acquire a byte of entropy from an empty pool. They will all serialize on the mutex, waiting for the pool to refill. However, when the pool is eventually refilled, only one consumer will be awakened; the rest will continue sleeping even though there is entropy available in the pool. They will eventually be awakened when the worker refills the pool, but that's unneeded latency. Fix by using wake_all() to wake all consumers.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
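
The same single-waker pitfall can be reproduced with standard C++ primitives. This self-contained sketch uses std::condition_variable as an analogy for OSv's condvar; notify_all() plays the role of wake_all().

    #include <algorithm>
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    struct entropy_pool {
        std::mutex mtx;
        std::condition_variable cv;
        std::size_t available = 0;

        std::size_t get(std::size_t want) {
            std::unique_lock<std::mutex> lk(mtx);
            cv.wait(lk, [&] { return available > 0; });
            std::size_t n = std::min(want, available);
            available -= n;
            return n;
        }

        void refill(std::size_t n) {
            {
                std::lock_guard<std::mutex> g(mtx);
                available += n;
            }
            cv.notify_all();   // wake every waiting consumer, not just one
        }
    };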
-
Pekka Enberg authored
It's a bad idea to claim to support /dev/urandom but rely on the HW RNG, because starting up Cassandra, for example, takes ages. Drop it until we have a cryptographically secure PRNG in OSv that can be used to implement /dev/urandom properly.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Dec 03, 2013
Avi Kivity authored
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 01, 2013
Pekka Enberg authored
This adds the virtio-rng driver to OSv. The implementation is simple:

- Start a thread that keeps 64 bytes of entropy cached in an internal buffer. Entropy is gathered from the host with virtio-rng.
- Create device nodes for "/dev/random" and "/dev/urandom" that both use the same virtio_rng_read() hook.
- Use the entropy buffer for virtio_rng_read(). If we exhaust the buffer, wake up the thread and wait for more entropy to appear.

We eventually should move device node creation to a separate drivers/random.c that multiplexes between different hardware RNG implementations. However, as we only support virtio-rng, I'm leaving that to whomever implements support for the next RNG.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
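
A hedged sketch of the buffered read path described in the bullets above, modelling the refill-thread signalling with a std::condition_variable; the class, names and primitives are stand-ins, not the driver's actual code.

    #include <algorithm>
    #include <condition_variable>
    #include <cstddef>
    #include <cstring>
    #include <mutex>
    #include <vector>

    class rng_cache {
    public:
        std::size_t read(void* out, std::size_t len) {
            std::unique_lock<std::mutex> lk(_mtx);
            if (_buf.empty())
                _refill_cv.notify_one();   // buffer exhausted: wake the refill
                                           // thread (not shown here)
            _data_cv.wait(lk, [&] { return !_buf.empty(); });
            std::size_t n = std::min(len, _buf.size());
            std::memcpy(out, _buf.data(), n);
            _buf.erase(_buf.begin(), _buf.begin() + n);
            return n;
        }
    private:
        std::mutex _mtx;
        std::condition_variable _data_cv;    // signalled by the refill thread
        std::condition_variable _refill_cv;  // waited on by the refill thread
        std::vector<unsigned char> _buf;     // up to 64 cached bytes of entropy
    };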
-
Pekka Enberg authored
Not all virtio devices support MSI. Fix device initialization by not writing to the VIRTIO_MSI_QUEUE_VECTOR register if a PCI device does not advertise MSI-X support. This is needed to initialize virtio-rng devices when running on KVM/QEMU.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
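
A small sketch of that guard. VIRTIO_MSI_QUEUE_VECTOR is the register named in the message (offset 22 in the legacy virtio PCI layout); the surrounding helper names are invented for illustration.

    struct virtio_dev_stub {                            // stub interface for this sketch
        bool is_msix();                                 // does the device expose MSI-X?
        void conf_writew(int offset, unsigned short v); // 16-bit config-space write
    };

    enum { VIRTIO_MSI_QUEUE_VECTOR_OFFSET = 22 };       // legacy virtio PCI layout

    void set_queue_vector(virtio_dev_stub& dev, unsigned short vec)
    {
        if (!dev.is_msix())
            return;                 // no MSI-X capability: the register is not present
        dev.conf_writew(VIRTIO_MSI_QUEUE_VECTOR_OFFSET, vec);
    }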
-
Nadav Har'El authored
When the KVM paravirtual clock isn't available (e.g., on Xen or on plain Qemu), we use the HPET clock. Our HPET clock driver rolled back the clock (clock::get()->time()) once every 42 seconds, causing strange things like a scheduler assertion when the clock jumps back.

The problem is that we read just 32 bits out of the 64 bits of the HPET counter. This means that we roll back the clock once every 2^32 ticks, and with a 10ns tick (which seems to be the case in Qemu), this means about 42 seconds. Douglas Adams would have liked this bug ;-)

Fixed the code, and removed the overly-optimistic comment which stated the rollback should take years. Added an assertion that the HPET really has a 64-bit counter; Intel's HPET specification from 2004 already recommends a 64-bit counter, and both Qemu and Xen do implement one. If we had to deal with a 32-bit counter, we would need to write a handler for the interrupt that the HPET sends every time the counter wraps around.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
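
For the arithmetic: 2^32 ticks at 10 ns each is about 42.9 seconds, which is where the 42-second rollback came from. Below is a hedged sketch of the fix, reading the full 64-bit main counter (register offset 0xF0 in the HPET specification) through a volatile pointer; mapping the MMIO region is assumed to happen elsewhere, and this is not the driver's actual code.

    #include <cstdint>

    static volatile uint8_t* hpet_base;   // the mapped HPET MMIO region (assumed
                                          // to be set up elsewhere)

    inline uint64_t hpet_counter()
    {
        // Buggy form: reading only the low 32 bits wraps every
        // 2^32 * 10ns ~= 42.9 seconds:
        //   return *reinterpret_cast<volatile uint32_t*>(hpet_base + 0xf0);
        // Fixed form: read the full 64-bit main counter (the driver asserts
        // that the capabilities register reports a 64-bit counter):
        return *reinterpret_cast<volatile uint64_t*>(hpet_base + 0xf0);
    }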
-
- Nov 28, 2013
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Add a new virtio::probe() helper function to simplify virtio driver probing.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
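
A hedged sketch of what such a probe helper might check. The 0x1af4 vendor ID is the standard virtio PCI vendor ID, but the signature and fields below are assumptions, not OSv's actual virtio::probe().

    #include <cstdint>

    struct pci_device_stub {           // stand-in for the PCI device abstraction
        uint16_t vendor_id;
        uint16_t virtio_device_type;   // e.g. net, block, rng
    };

    enum { VIRTIO_VENDOR_ID = 0x1af4 };

    bool virtio_probe(const pci_device_stub* dev, uint16_t expected_type)
    {
        return dev &&
               dev->vendor_id == VIRTIO_VENDOR_ID &&
               dev->virtio_device_type == expected_type;
    }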
-
- Nov 26, 2013
Nadav Har'El authored
This patch causes incorrect usage of percpu<>/PERCPU() to cause compilation errors instead of silent runtime corruptions. Thanks to Dmitry for first noticing this issue in xen_intr.cc (see his separate patch), and to Avi for suggesting a compile-time fix.

With this patch:

1. Using percpu<...> to *define* a per-cpu variable fails compilation. Instead, PERCPU(...) must be used for the definition, which is important because it places the variable in the ".percpu" section.
2. If a *declaration* is needed additionally (e.g., for a static class member), percpu<...> must be used, not PERCPU(). Trying to use PERCPU() for a declaration will cause a compilation error.
3. PERCPU() only works on statically-constructed objects - global variables, static function-variables and static class-members. Trying to use it on a dynamically-constructed object - stack variable, class field, or operator new - will cause a compilation error.

With this patch, the bug in xen_intr.cc would have been caught at compile time.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
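
A much-simplified sketch of how a definition/declaration split and a (partial) static-construction restriction can be expressed; it illustrates the idea only and is not OSv's percpu implementation.

    #include <cstddef>

    template <typename T>
    class percpu {
    public:
        T& operator*() { return _val; }                 // simplified accessor
        // Rule 3 (partially): forbid heap-allocated instances.
        void* operator new(std::size_t) = delete;
        void* operator new[](std::size_t) = delete;
    private:
        T _val{};
    };

    // Rule 1: definitions go through PERCPU() so the object lands in the
    // ".percpu" section. (The real implementation also makes a bare
    // percpu<...> definition fail to compile; that enforcement is omitted here.)
    #define PERCPU(type, var) \
        __attribute__((section(".percpu"))) percpu<type> var

    // Rule 2: declarations, e.g. a static class member, still use percpu<...>.
    struct stats {
        static percpu<long> counter;   // declaration
    };
    PERCPU(long, stats::counter);      // definition, placed in .percpu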
-
- Nov 21, 2013
Nadav Har'El authored
prio.hh defines various initialization priorities. The actual numbers don't matter, just the order between them. But when we add too many priorities between existing ones, we may need to renumber. This is plain ugly, and reminds me of Basic programming ;-) So this patch switches to an enum (an enum class, actually). We now just have a list of priority names in order, with no numbers.

It would have been straightforward, if it weren't for a bug in GCC (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211 ) where the "init_priority" attribute doesn't accept the enum (while the "constructor" attribute does). Luckily, a simple workaround - explicitly casting to int - works.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
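
A compact sketch of the approach, with the explicit int cast as the workaround for GCC bug 59211; the priority names are invented and the real prio.hh will differ.

    enum class init_prio : int {
        console = 101,   // init_priority values must be at least 101
        clock,           // new names can be inserted without renumbering the rest
        scheduler,
    };

    struct probe {
        probe() {}       // user-provided constructor, so these need dynamic init
    };

    // GCC bug 59211: init_priority() rejects the enum value directly, but an
    // explicit cast to int is accepted. Construction order follows the
    // priorities, not the order of the definitions in the file:
    probe late_obj  __attribute__((init_priority((int)init_prio::scheduler)));
    probe early_obj __attribute__((init_priority((int)init_prio::console)));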
-
- Nov 06, 2013
Pekka Enberg authored
condvar_wait() expects an absolute time, not a duration.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
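
The distinction, illustrated with standard C++ timed waits as an analogy for condvar_wait(): an absolute deadline is "now plus the timeout", not the timeout itself.

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex mtx;
    std::condition_variable cv;

    void wait_with_deadline(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lk(mtx);
        auto deadline = std::chrono::steady_clock::now() + timeout;  // absolute time point
        cv.wait_until(lk, deadline);   // correct: wait_until() takes an absolute time
        // cv.wait_for(lk, timeout);   // the duration-based variant, for comparison.
        // Passing a bare duration where an absolute time is expected makes the
        // wait expire (almost) immediately, since it is interpreted as a point
        // close to the clock's epoch.
    }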
-