- May 15, 2014
-
Vlad Zolotarov authored
The new interface allows sending a single doorbell to the host per bulk of buffers.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
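A minimal sketch of how such a batched interface is meant to be used; the vqueue, add_buf() and kick() names below are illustrative stand-ins, not the actual OSv API:

```cpp
#include <vector>

// Illustrative stand-ins for the real virtio queue and buffer types.
struct buffer {};
struct vqueue {
    void add_buf(const buffer&) { /* publish the buffer on the avail ring */ }
    void kick()                 { /* ring the doorbell / notify the host  */ }
};

// Queue a whole bulk of buffers and send a single doorbell at the end,
// instead of kicking the host once per buffer.
void send_bulk(vqueue& q, const std::vector<buffer>& bufs)
{
    for (const auto& b : bufs) {
        q.add_buf(b);
    }
    q.kick();   // one notification for the entire bulk
}
```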
-
Vlad Zolotarov authored
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- May 05, 2014
-
Vlad Zolotarov authored
Wrap-around of (u16)_avail_added_since_kick should be avoided, since we may lose an _avail_event update and thus miss the point where the hypervisor should have been woken (kicked). We choose half of the u16 range as a threshold, since kick() is usually called for a bulk of buffers rather than for every separate buffer, and the number of buffers in such a bulk is limited by the size of the virtio ring. The virtio ring is unlikely to have 32K entries in the foreseeable future (it has 256 entries now), so kicking at least every 32K buffers ensures that such bulking cannot cause the wrap-around described above.
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
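A sketch of the threshold logic described above, with illustrative names; the point is only that a kick is forced well before the u16 counter can wrap between two kicks:

```cpp
#include <cstdint>

constexpr uint16_t kick_threshold = UINT16_MAX / 2;   // half of the u16 range (32K)

struct kick_state {
    uint16_t avail_added_since_kick = 0;   // illustrative counterpart of _avail_added_since_kick
};

// Returns true when a kick must be forced regardless of the avail_event check.
bool must_kick(kick_state& s, uint16_t buffers_added)
{
    s.avail_added_since_kick += buffers_added;
    if (s.avail_added_since_kick >= kick_threshold) {
        s.avail_added_since_kick = 0;   // reset after forcing the kick
        return true;
    }
    return false;   // otherwise the normal avail_event logic decides
}
```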
-
- Feb 13, 2014
-
Avi Kivity authored
Using malloc() fails with the debug allocator, and can fail with the normal allocator as well, since malloc() does not guarantee contiguous physical memory.
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 12, 2014
-
Zhi Yong Wu authored
This reduces duplicated code.
Signed-off-by: Zhi Yong Wu <zwu.kernel@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Zhi Yong Wu authored
Signed-off-by: Zhi Yong Wu <zwu.kernel@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Feb 06, 2014
-
Takuya ASADA authored
Signed-off-by: Takuya ASADA <syuu@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 22, 2014
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 31, 2013
-
Tomasz Grabiec authored
Useful for tracing virtio queue utilization issues. To get plottable data:
1. Save traces to gdb.txt
2. grep virtio_add_buf gdb.txt | grep "queue=0" \
   | awk '{print $3 " " $6}' | sed -r 's/avail=(.*)/\1/'
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Clean up virtio-vring.hh and move add_buf_wait() out of line. Benefits:
- Plays well with tracepoints
- Less recompilation when code is changed
Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 27, 2013
-
Asias He authored
With indirect descriptors, we can queue more buffers in the queue. Indirect descriptors help the block device: a large request no longer consumes the entire ring, which makes the effective queue depth deeper. They do not help the net device, because they make the queue longer and thus add latency. Tests show that indirect descriptors make blk faster, with no measurable degradation on net. Indirect descriptors are also used only when we are short of descriptors. This patch enables indirect descriptors for vblk and vscsi only; vnet is not enabled.
1) vblk: Before: 340MB/s, After: 350MB/s
2) vscsi: Before: 320MB/s, After: 410MB/s
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 23, 2013
-
Avi Kivity authored
Replace divisions by variables and by hard constants with masks and divisions by easy constants. Improves netperf by about 1.6%. Noted by Vlad.
Reviewed-by: Dor Laor <dor@cloudius-systems.com>
Reviewed-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
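A small, self-contained illustration of the change: when the ring size is a power of two, wrapping an index with a mask gives the same result as a modulo by the (variable) ring size, without the expensive division:

```cpp
#include <cassert>
#include <cstdint>

struct ring {
    uint16_t num;   // ring size, always a power of two (256 today)

    uint16_t wrap_div(uint16_t idx) const  { return idx % num; }         // divide by a variable
    uint16_t wrap_mask(uint16_t idx) const { return idx & (num - 1); }   // single AND
};

int main()
{
    ring r{256};
    assert(r.wrap_div(300) == r.wrap_mask(300));   // both are 44
    return 0;
}
```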
-
- Dec 09, 2013
-
Asias He authored
It is useful to test whether we can do GC on the used ring.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
When _avail_count drops below 1/3 of the ring size, we start using indirect descriptors.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Dor Laor <dor@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
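A sketch of the heuristic, assuming illustrative field names (the free-descriptor count and the ring size are the real quantities the message refers to):

```cpp
#include <cstdint>

struct vring_view {
    uint16_t num;          // total descriptors in the ring
    uint16_t avail_count;  // descriptors currently free
};

// Use indirect descriptors only when less than a third of the ring is free.
bool use_indirect(const vring_view& v, bool indirect_negotiated)
{
    return indirect_negotiated && v.avail_count < v.num / 3;
}
```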
-
Asias He authored
There is no reason we should do this scaling.
Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Dor Laor <dor@cloudius-systems.com>
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
-
- Dec 06, 2013
-
Asias He authored
This patch fixes:
- The order of _used_ring_host_head and _used->_idx; the latter is more advanced than the former.
- The unwanted promotion to "int".

Pekka wrote:
However, on the right-hand side, the expression type in master evaluates to "int" because of that innocent-looking constant "2" and the lack of parentheses after the cast. That will also force the left-hand side to promote to "int". And no, I really don't claim to follow integer promotion rules, so I used typeid().name() to verify what the compiler is doing:

    [penberg@localhost tmp]$ cat types.cpp
    #include <typeinfo>
    #include <stdint.h>
    #include <cstdio>
    using namespace std;
    int main()
    {
        unsigned int _num = 1;
        printf("int = %s\n", typeid(int).name());
        printf("uint16_t = %s\n", typeid(uint16_t).name());
        printf("(uint16_t)_num/2 = %s\n", typeid((uint16_t)_num/2).name());
        printf("(uint16_t)(_num/2) = %s\n", typeid((uint16_t)(_num/2)).name());
    }
    [penberg@localhost tmp]$ g++ -std=c++11 -Wall types.cpp
    [penberg@localhost tmp]$ ./a.out
    int = i
    uint16_t = t
    (uint16_t)_num/2 = i
    (uint16_t)(_num/2) = t

Signed-off-by: Asias He <asias@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
-
- Sep 15, 2013
-
Nadav Har'El authored
Add Cloudius copyright and license statement to drivers/*. A couple of header files were based on Linux's BSD-licensed header files (e.g., include/uapi/linux/virtio_net.h), so they included the BSD license but no copyright statement; we can simply replace that with our own statement of the BSD license.
-
- Jul 11, 2013
-
Avi Kivity authored
Required by the virtio spec.
-
Nadav Har'El authored
virtio_driver::wait_for_queue() would often hang in a memcached and mc_benchmark workload, waiting forever for received packets although these *do* arrive. As part of the virtio protocol, we need to set the host notification flag (we call this, somewhat confusingly, queue->enable_interrupts()) and then check if there's anything in the queue, and if not, wait for the interrupt. This order is important: If we check the queue and only then set the notification flag, and data came in between those, the check will be empty and an interrupt never sent - and we can wait indefinitely for data that has already arrived. We did this in the right order, but the host code, running on a different CPU, might see memory accesses in a different order! We need a memory fence to ensure that the same order is also seen on other processors. This patch adds a memory fence to the end of the enable_interrupts() function itself, so we can continue to use it as before in wait_for_queue(). Note that we do *not* add a memory fence to disable_interrupts() - because no current use (and no expected use) cares about the ordering of disable_interrupts() vs other memory accesses.
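A condensed sketch of the ordering this commit enforces; the types and names are illustrative, not the actual OSv classes, and the fence strength is shown as seq_cst only for clarity:

```cpp
#include <atomic>
#include <cstdint>

struct queue_view {
    std::atomic<uint16_t> avail_flags{0};    // guest -> host: interrupt-suppression flag
    std::atomic<uint16_t> used_idx{0};       // host -> guest: host's used index
    uint16_t last_seen_used_idx = 0;

    void enable_interrupts()
    {
        avail_flags.store(0, std::memory_order_relaxed);       // "please interrupt me"
        std::atomic_thread_fence(std::memory_order_seq_cst);   // the fence this patch adds
    }

    bool empty() const
    {
        return used_idx.load(std::memory_order_relaxed) == last_seen_used_idx;
    }
};

// wait_for_queue(), roughly: enable interrupts first, then re-check, then sleep.
void wait_for_queue(queue_view& q)
{
    q.enable_interrupts();
    if (!q.empty()) {
        return;   // data raced in between; consume it instead of sleeping
    }
    // ... block until the interrupt arrives ...
}
```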
-
- Jul 10, 2013
-
Dor Laor authored
virtio-vring and its users (net/blk) were changed so that no request header is allocated at run time, only at init. In order to do that, I had to change get_buf and break it into multiple parts:

    // Get the top item from the used ring
    void* get_buf_elem(u32 *len);

    // Let the host know we consumed the used entry.
    // We separate this from get_buf_elem so no one will recycle the
    // request header location until we're finished with it in the upper layer.
    void get_buf_finalize();

    // GC the used items that were already read, to be emptied within the ring.
    // Should be called by add_buf. It was separated from the get_buf flow
    // to allow parallelism of the two.
    void get_buf_gc();

As a result, it was simple to get rid of the shared lock that previously protected the _avail_head variable. Today only the thread that calls add_buf updates this variable (add_buf calls get_buf_gc internally). There are two new locks instead:
- virtio-net tx_gc lock - very rarely it is taken by the tx_gc thread, normally by the tx xmit thread
- virtio-blk make_requests - there are parallel requests
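A sketch of how the three calls are meant to be ordered by a consumer, with trivial illustrative bodies (the real functions live in the vring class):

```cpp
#include <cstdint>
using u32 = uint32_t;

struct vring_sketch {
    void* get_buf_elem(u32* len) { *len = 0; return nullptr; }  // peek at the top used entry
    void  get_buf_finalize() {}   // tell the ring the entry was consumed; slot may be recycled
    void  get_buf_gc() {}         // reclaim already-consumed entries
    bool  add_buf(void* cookie) { (void)cookie; get_buf_gc(); return true; }  // GC runs here
};

// Consumer path: keep the request header valid until the upper layer is done.
void consume_one(vring_sketch& vq)
{
    u32 len;
    if (void* cookie = vq.get_buf_elem(&len)) {
        // ... hand 'cookie' to the upper layer while its header slot cannot be recycled ...
        vq.get_buf_finalize();   // only now may add_buf reuse the slot
    }
}
```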
-
- Jul 04, 2013
-
Dor Laor authored
-
Dor Laor authored
Use a single per-queue instance of the sglist data vector. Before this patch, sglist was implemented as a std::list, which caused it to allocate heap memory and travel through pointers. Now we use a single vector per queue to temporarily keep the buffer data between the upper virtio layer and the lower one.
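An illustrative sketch of the data-structure change: a per-queue std::vector that is cleared and refilled for each request, instead of a std::list built (and heap-allocated) every time; the sg_node fields below are assumptions, not the actual OSv layout:

```cpp
#include <cstdint>
#include <vector>

struct sg_node {
    uint64_t paddr;   // physical address of the fragment
    uint32_t len;     // fragment length
    uint32_t flags;   // e.g. read/write direction
};

struct queue_sg {
    std::vector<sg_node> sg;   // one instance per queue, reused across requests

    void reset() { sg.clear(); }   // clear() keeps the capacity, so no reallocation
    void add(uint64_t paddr, uint32_t len, uint32_t flags = 0)
    {
        sg.push_back({paddr, len, flags});
    }
};
```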
-
- Jul 03, 2013
-
Dor Laor authored
-
Dor Laor authored
When the guest reads the used pointer, it must make sure that all the other relevant writes to the descriptors by the host are up to date. The same goes for the other direction: when the guest updates the ring in get_buf, the write to used_event should make all the descriptor changes visible to the host.
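A sketch of the two orderings in std::atomic terms (names are illustrative): an acquire load when reading the host's used index, and a release store when publishing used_event:

```cpp
#include <atomic>
#include <cstdint>

struct ring_sync {
    std::atomic<uint16_t> used_idx;     // written by the host
    std::atomic<uint16_t> used_event;   // written by the guest, read by the host

    // All host writes to the descriptors must be visible before we trust used_idx.
    uint16_t read_used_idx() const
    {
        return used_idx.load(std::memory_order_acquire);
    }

    // All our descriptor updates must be visible before the host sees used_event.
    void publish_used_event(uint16_t value)
    {
        used_event.store(value, std::memory_order_release);
    }
};
```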
-
Dor Laor authored
-
Dor Laor authored
Instead of having the host raise a tx interrupt as soon as there is a single used packet in the ring, wait until we have half a ring's worth. This reduces the number of tx irqs from one per packet to practically zero (note that we actively call tx_gc if there is no place on the ring when doing tx). This gave a 40% performance boost in the netperf rx test.
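A sketch of the coalescing idea using the event-index mechanism; the field names are illustrative, and the exact num/2 threshold is taken from the message above rather than from the actual code:

```cpp
#include <atomic>
#include <cstdint>

struct tx_ring_view {
    uint16_t num;                       // ring size
    uint16_t last_used_idx;             // last used entry the guest has processed
    std::atomic<uint16_t> used_event;   // host interrupts us once its used idx passes this

    // Ask for the next tx-completion interrupt only after half a ring's worth
    // of packets has been consumed by the host, instead of after every packet.
    void request_interrupt_at_half_ring()
    {
        used_event.store(static_cast<uint16_t>(last_used_idx + num / 2),
                         std::memory_order_release);
    }
};
```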
-
Dor Laor authored
Convert avail._idx to a std::atomic variable. As a result, get rid of the compiler barrier calls, since we use std::atomic load/store instead.
-
Dor Laor authored
-
- Jun 30, 2013
-
Dor Laor authored
The optimization improves performance by letting each side of the ring know the last index read by the remote party. In case the other side still has older indexes waiting for processing, we won't trigger another update (kick or irq, depending on the direction). The optimization reduces irq injections by about 7%. Along the way, collapse vring::need_event into the code and use std::atomic(s) to access any variable that is shared with the host.
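For reference, the standard event-index test from the virtio spec (the same check Linux calls vring_need_event); the commit above presumably inlines an equivalent computation:

```cpp
#include <cstdint>

// A notification is needed only if the other side's event index lies within
// the window of entries published since the last notification.
static inline bool vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```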
-
- Jun 21, 2013
-
Guy Zana authored
Just as we can tell the host to disable interrupts via the _avail ring, the host can tell us to suppress notifications via the _used ring. Every notification (kick) costs about 10ns, as it is implemented as a write to an io port, which travels to the userspace qemu in the host. This simple patch increases netperf's throughput by 600%, from 300mbps to 1800mbps.
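A sketch of the check this enables on the transmit path; VRING_USED_F_NO_NOTIFY is the flag defined by the virtio spec, while the surrounding names are illustrative:

```cpp
#include <atomic>
#include <cstdint>

constexpr uint16_t VRING_USED_F_NO_NOTIFY = 1;   // from the virtio spec

struct used_flags_view {
    std::atomic<uint16_t> flags;   // flags word of the used ring, written by the host

    bool host_wants_kick() const
    {
        return (flags.load(std::memory_order_acquire) & VRING_USED_F_NO_NOTIFY) == 0;
    }
};

// Transmit path, roughly:
//   if (view.host_wants_kick()) {
//       write_doorbell();   // the expensive io-port write / exit to the host
//   }
```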
-
- Jun 20, 2013
-
Dor Laor authored
Indirect descriptors are good for very large SG lists, but aren't required when there is enough room on the ring or the SG list is tiny. For the time being there is barely any use for them, so I set them off by default.
-
Dor Laor authored
The feature allows the hypervisor to batch several packets together as one large SG list. Once such a header is received, the guest rx routine iterates over the list and assembles a mega mbuf. The patch also simplifies the rx path by using a single buffer for the virtio data and its header, which shrinks the sg list from two entries to a single one. The issue is that at the moment I haven't seen packets with mbuf > 1 being received; a Linux guest does receive such packets here and there. It may be due to the use of offload features that enlarge the packet size.
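A sketch of the assembly loop the message describes; the count of chains comes from the mergeable-rx-buffer header's num_buffers field, while the packet and segment types here are illustrative:

```cpp
#include <cstdint>
#include <vector>

struct rx_chain { const uint8_t* data; uint32_t len; };   // one buffer off the used ring

struct mega_packet {
    std::vector<rx_chain> segments;
    void append(const rx_chain& c) { segments.push_back(c); }
};

// num_buffers comes from the mergeable rx buffer header of the first chain.
mega_packet assemble(uint16_t num_buffers, const std::vector<rx_chain>& used_chains)
{
    mega_packet p;
    for (uint16_t i = 0; i < num_buffers && i < used_chains.size(); ++i) {
        p.append(used_chains[i]);   // each chain was pulled off the used ring
    }
    return p;
}
```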
-
- Jun 17, 2013
-
Guy Zana authored
-
- Jun 09, 2013
-
Guy Zana authored
-
- Jun 06, 2013
-
Guy Zana authored
-