- Dec 27, 2013
-
-
Avi Kivity authored
Indicate to the virtual hardware that our stack supports UDP fragmentation offload. This improves performance by a factor of about 3.3 (from ~20Gbps to ~66Gbps) running the netperf UDP STREAM test. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
We reported the size of the last buffer in the packet, rather than the size of the complete packet. Fix to report the total size. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
With indirect descriptors, we can queue more buffers in the queue. Indirect descriptors help block devices because a large request no longer consumes the entire ring, which makes the effective queue depth deeper. Indirect descriptors do not help net devices, because they make the queue longer and so add latency. The tests show that indirect descriptors make blk faster, with no measurable degradation on net. Also, indirect descriptors are used only when we are short of descriptors. This patch enables indirect descriptors only for vblk and vscsi; vnet is not enabled.
1) vblk Before: 340MB/s After: 350MB/s
2) vscsi Before: 320MB/s After: 410MB/s
Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Dec 26, 2013
-
-
Asias He authored
VIRTIO_RING_F_INDIRECT_DESC belongs to the base feature bits. No need to specify it in the net driver. Signed-off-by:
Asias He <asias@cloudius-systems.com> Reviewed-by:
Dor Laor <dor@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
operate_range() has it already. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
If permissions are elevated, another cpu will fault and will see the new permissions after the page walk. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
A page cannot be freed before remote TLBs are flushed: if a remote cpu still has the page in its TLB and the page is reallocated for some other purpose, the remote cpu can still access the page through the TLB and corrupt its contents. Think about two threads running on two different cpus: the first writes to a virtual address constantly and the second unmaps that virtual address. The physical page that the virtual address is mapped to cannot be freed before both cpus' TLBs are flushed. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
Add constexpr to make sure they are evaluated at compile time if possible. The compiler will probably do it anyway, though. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
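The effect of constexpr described in this commit can be illustrated with a minimal sketch. The helper names and constants below (page_shift, index_bits, level_shift, level_size) are hypothetical and are not taken from the patch; they only show that a constexpr function can be checked with static_assert, which forces compile-time evaluation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for a 4-level x86-64 style page table
// (assumed for illustration, not copied from the patch).
constexpr unsigned page_shift = 12;  // 4 KiB base pages
constexpr unsigned index_bits = 9;   // 512 entries per table

// Number of address bits covered by one entry at a given level
// (level 0 is the leaf level).
constexpr unsigned level_shift(unsigned level)
{
    return page_shift + level * index_bits;
}

// Bytes mapped by one entry at a given level.
constexpr std::uint64_t level_size(unsigned level)
{
    return std::uint64_t(1) << level_shift(level);
}

// static_assert only accepts constant expressions, so these lines
// prove the helpers are usable at compile time.
static_assert(level_size(0) == 4096, "leaf entry maps 4 KiB");
static_assert(level_size(1) == 2 * 1024 * 1024, "level 1 entry maps 2 MiB");
```

The same functions remain callable with run-time arguments, in which case they behave like ordinary inline functions.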
-
Gleb Natapov authored
Implement a page_mapper variant for virt_to_phys mapping. The map_level class now holds a reference to page_mapper, since page_mapper state needs to be preserved across function calls. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
This is pretty straightforward: provide a page_mapper variant for each of those operations and remove the unused pt walker. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Gleb Natapov authored
This patch implements a generic page table walker that traverses page table levels at compile time. It accepts a page_mapper class, which controls various aspects of page traversal, as a parameter. page_table_operation controls whether non-present intermediate pages should be allocated, how to handle leaf small/huge pages, whether to split huge pages, how to handle a sub-area of a huge page when splitting is disabled, and whether the walker should loop over multiple page entries. linear_map_level() is modified to use the new page walker. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
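The idea of traversing page table levels at compile time with a policy class can be sketched roughly as follows. This is not the walker from the patch: recording_mapper, walker, walk_page_table and the constants are hypothetical names, and the sketch only computes per-level indices instead of touching real page tables.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical layout constants (4 levels, 9 index bits, 4 KiB pages).
constexpr unsigned page_shift = 12;
constexpr unsigned index_bits = 9;

// Index into the table at the given level for a virtual address.
inline std::uint64_t index_at(std::uint64_t addr, int level)
{
    return (addr >> (page_shift + level * index_bits))
        & ((std::uint64_t(1) << index_bits) - 1);
}

// Hypothetical policy class: records (level, index) for every visited level.
struct recording_mapper {
    std::vector<std::pair<int, std::uint64_t>> visited;
    void visit(int level, std::uint64_t index)
    {
        visited.emplace_back(level, index);
    }
};

// The level is a template parameter, so the recursion is unrolled by the
// compiler instead of being a run-time loop.
template <typename Mapper, int Level>
struct walker {
    static void walk(Mapper& m, std::uint64_t addr)
    {
        m.visit(Level, index_at(addr, Level));
        walker<Mapper, Level - 1>::walk(m, addr);
    }
};

// Recursion terminator: nothing below the leaf level.
template <typename Mapper>
struct walker<Mapper, -1> {
    static void walk(Mapper&, std::uint64_t) {}
};

// Entry point: start at the top level (3) and descend to the leaf (0).
template <typename Mapper>
void walk_page_table(Mapper& m, std::uint64_t addr)
{
    walker<Mapper, 3>::walk(m, addr);
}
```

A different policy class with the same visit() signature could allocate, protect or unmap entries without changing the traversal code, which is the design point the commit message describes.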
-
Gleb Natapov authored
Move code that will be needed by unified page walker. No changes to generated code. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Backported from FreeBSD r242252. Improves netperf by about 10%. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
This is more useful if there is no ordering between the two numbers (either one can be ahead). Change BYTES_THIS_ACK to return unsigned, to prevent an unsigned division from turning into a signed division. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
Add comparison operators that use modulo arithmetic to order sequence numbers, and use them to replace SEQ_LT() and friends, increasing code readability. As a consequence std::min() and std::max() can be used instead of SEQ_MIN() and SEQ_MAX(). Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
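A minimal sketch of modulo-arithmetic comparison for sequence numbers follows. The class name tcp_seq_sketch and its members are assumptions made for illustration; the actual class in the tree may look different. The comparison uses the signed difference of the two 32-bit values, which is the same technique as the traditional SEQ_LT() macro.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a TCP sequence number type.
class tcp_seq_sketch {
public:
    explicit tcp_seq_sketch(std::uint32_t raw) : _raw(raw) {}
    std::uint32_t raw() const { return _raw; }
private:
    std::uint32_t _raw;
};

// a precedes b if the wrapped difference, seen as signed, is negative.
// The unsigned subtraction wraps modulo 2^32; the conversion to int32_t
// relies on two's complement behaviour, as the classic macro does.
inline bool operator<(tcp_seq_sketch a, tcp_seq_sketch b)
{
    return static_cast<std::int32_t>(a.raw() - b.raw()) < 0;
}

inline bool operator>(tcp_seq_sketch a, tcp_seq_sketch b) { return b < a; }
inline bool operator<=(tcp_seq_sketch a, tcp_seq_sketch b) { return !(b < a); }
inline bool operator>=(tcp_seq_sketch a, tcp_seq_sketch b) { return !(a < b); }
inline bool operator==(tcp_seq_sketch a, tcp_seq_sketch b)
{
    return a.raw() == b.raw();
}
```

Because operator< is defined, std::min() and std::max() work on the type directly: for a sequence number just before the wrap (0xfffffff0) and one just after it (0x10), std::min() picks the former even though its raw value is larger.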
-
Avi Kivity authored
tcp sequence numbers are similar to integers, but have different comparison operations. Separate them into a class so we don't mix the two. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Avi Kivity authored
inline functions can be overloaded and are less nasty than macros in other ways (like evaluating their arguments only once). Note we can't touch ntohl() itself, since it is defined to be an out-of-line function by libc. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
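The point about macros evaluating their arguments more than once can be shown with a small sketch. SKETCH_MAX and sketch_max are hypothetical names chosen for illustration and are not the macros this commit converts.

```cpp
#include <cassert>

// Macro version: the winning argument is evaluated twice.
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

// Inline versions: each argument is evaluated exactly once, and the
// function can be overloaded for different types.
inline int sketch_max(int a, int b) { return a > b ? a : b; }
inline long sketch_max(long a, long b) { return a > b ? a : b; }

// Returns how many times the side effect ran with the macro.
inline int macro_evaluations()
{
    int i = 0;
    int r = SKETCH_MAX(i++, -1);  // i++ runs in the test and in the result
    (void)r;
    return i;
}

// Returns how many times the side effect ran with the inline function.
inline int inline_evaluations()
{
    int i = 0;
    int r = sketch_max(i++, -1);  // i++ runs once, before the call
    (void)r;
    return i;
}
```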
-
- Dec 25, 2013
-
-
Nadav Har'El authored
This patch fixes two places where tst-mmap.cc doesn't munmap() everything it mmap()ed. This only makes a difference if doing leak detection on tst-mmap.cc to see if our page-table handling code leaked memory. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
select() previously returned EINVAL only when nfds > FD_SETSIZE+1. The right test is nfds > FD_SETSIZE, i.e., for nfds = FD_SETSIZE+1 this is also an error. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
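The corrected bound can be sketched as a stand-alone check. check_select_nfds is a hypothetical helper written for illustration, not the function changed by the patch; the rejection of negative values is taken from the POSIX description of select(), not from this commit.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>

// Hypothetical helper: returns 0 if nfds is acceptable, EINVAL otherwise.
// nfds == FD_SETSIZE is the largest valid value; FD_SETSIZE + 1 is an error.
inline int check_select_nfds(int nfds)
{
    if (nfds < 0 || nfds > FD_SETSIZE) {
        return EINVAL;
    }
    return 0;
}
```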
-
- Dec 24, 2013
-
-
Avi Kivity authored
Helps when making changes to bsd headers that xen includes. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Nadav Har'El authored
We use sched::thread::attr to pass parameters to sched::thread creation, i.e., create a thread with non-default stack parameters, pinned to a particular CPU, or a detached thread. Previously we had constructors taking many combinations of stack size (integer), pinned cpu (cpu*) and detached (boolean), and doing "the right thing". However, this makes the code hard to read (what does attr(4096) specify?) and the constructors hard to expand with new parameters. Replace the attr() constructors with the so-called "named parameter" idiom: attr now only has a null constructor attr(), and one modifies it with calls to pin(cpu*), detach(), or stack(size). For example,
    attr() // default attributes
    attr().pin(sched::cpus[0]) // pin to cpu 0
    attr().stack(4096).pin(sched::cpus[0]) // pin and non-default stack
and so on. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
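A minimal sketch of the named parameter idiom described in this commit is shown below. The namespace sketch, the cpu struct, the member names and the getters are assumptions for illustration; only the method names pin(), detach() and stack() come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for the scheduler's cpu type.
struct cpu {
    unsigned id;
};

class attr {
public:
    attr() = default;

    // Each setter returns *this so that calls can be chained.
    attr& pin(cpu* c) { _pinned_cpu = c; return *this; }
    attr& stack(std::size_t size) { _stack_size = size; return *this; }
    attr& detach() { _detached = true; return *this; }

    cpu* pinned_cpu() const { return _pinned_cpu; }
    std::size_t stack_size() const { return _stack_size; }
    bool detached() const { return _detached; }

private:
    cpu* _pinned_cpu = nullptr;   // nullptr means not pinned
    std::size_t _stack_size = 0;  // 0 means use the default stack size
    bool _detached = false;
};

} // namespace sketch
```

With this shape, adding a new attribute means adding one setter and one member, and existing call sites keep compiling unchanged.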
-
Dmitry Fleytman authored
This patch applies a bugfix published on the FreeBSD list in Feb 2013: http://lists.freebsd.org/pipermail/svn-src-stable-9/2013-February/003928.html
The LRO mechanism is broken on systems without IP checksum verification offload. Due to improper checksum verification, RX packets skip the LRO path and go directly to the TCP stack, which is bad for performance. EC2 Xen is one example of such a system. This bug is one of the reasons we see bad performance on Amazon. Some test results with and without the fix:
Buffer size   Before    After      Improvement %
TCP TX
32            557.52    1386.28    149
64            552.38    1385.99    151
128           546.43    1401.46    156
256           565.25    1382.28    145
512           557.32    1375.23    147
1024          549.71    1356.69    147
2048          551.11    1371.92    149
4096          556.13    1383.67    149
8192          559.49    1364.05    144
16384         567.25    1366.48    141
32768         546.18    1366.63    150
65536         553.4     1353.87    145
TCP RX
32            107.37    105.48     -2
64            187.56    179.9      -4
128           297.16    301.71     2
256           300.47    503.92     68
512           294.76    826.13     180
1024          299.95    1916.69    539
2048          287.04    1924.44    570
4096          300.78    1929.37    541
8192          304.52    1934.02    535
16384         305.04    1957.54    542
32768         309       1921.84    522
65536         296.48    1935.41    553
We are still pretty far from Linux; there are other problems to be fixed. Signed-off-by:
Dmitry Fleytman <dmitry@daynix.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- Dec 23, 2013
-
-
Asias He authored
I saw this Abort:
35.159 Mb/s
50.230 Mb/s
46.648 Mb/s
68.850 Mb/s
Wrote 613.418 MB in 10.00 s
Aborted
The backtrace says:
(gdb) bt
#0 0x000000000035bb82 in halt_no_interrupts () at /home/asias/src/cloudius-systems/osv/arch/x64/processor.hh:241
#1 osv::halt () at /home/asias/src/cloudius-systems/osv/core/power.cc:28
#2 0x0000000000218142 in abort (msg=msg@entry=0x55197f "Aborted\n") at /home/asias/src/cloudius-systems/osv/runtime.cc:89
#3 0x000000000021816e in abort () at /home/asias/src/cloudius-systems/osv/runtime.cc:79
#4 0x000000000039eaa2 in osv::generate_signal (siginfo=..., ef=0xffffc0003eb56008) at /home/asias/src/cloudius-systems/osv/libc/signal.cc:58
#5 0x000000000039eb0c in osv::handle_segmentation_fault (addr=<optimized out>, ef=<optimized out>) at /home/asias/src/cloudius-systems/osv/libc/signal.cc:73
#6 0x000000000030b45c in mmu::vm_sigsegv (addr=addr@entry=17592186060800, ef=ef@entry=0xffffc0003eb56008) at /home/asias/src/cloudius-systems/osv/core/mmu.cc:763
#7 0x000000000030b54b in mmu::vm_fault (addr=<optimized out>, addr@entry=17592186061840, ef=ef@entry=0xffffc0003eb56008) at /home/asias/src/cloudius-systems/osv/core/mmu.cc:773
#8 0x000000000032bff5 in page_fault (ef=0xffffc0003eb56008) at /home/asias/src/cloudius-systems/osv/arch/x64/mmu.cc:35
#9 <signal handler called>
#10 0x0000100000004410 in ?? ()
#11 0x000000000031e5fd in virtio::blk::req_done (this=0xffffc0003eddb800) at /home/asias/src/cloudius-systems/osv/drivers/virtio-blk.
Wait until all the bios are done to fix this use-after-free. This patch also makes the test measure completed writes instead of submitted writes. Reviewed-by:
Tomasz Grabiec <tgrabiec@gmail.com> Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
We were missing the prerequisite qemu-img (needed for qemu-nbd, used during our build) and of course qemu-system-x86 (we run the guest as part of the build, to have it write files into the ZFS image). Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
aio=native makes QEMU use the Linux native aio implementation instead of the emulated thread pool, which improves tst-bdev-write.so performance:
blk, aio=threads
Wrote 691.910 MB in 10.03 s
Wrote 590.020 MB in 10.00 s
Wrote 605.578 MB in 10.01 s
Wrote 662.828 MB in 10.01 s
Wrote 624.762 MB in 10.01 s
blk, aio=native
Wrote 789.566 MB in 10.00 s
Wrote 744.691 MB in 10.00 s
Wrote 537.125 MB in 10.00 s
Wrote 732.230 MB in 10.00 s
Wrote 683.383 MB in 10.00 s
scsi, aio=threads
Wrote 200.863 MB in 10.02 s
Wrote 193.758 MB in 10.02 s
Wrote 193.680 MB in 10.03 s
Wrote 195.211 MB in 10.03 s
Wrote 190.762 MB in 10.02 s
scsi, aio=native
Wrote 414.344 MB in 10.05 s
Wrote 483.148 MB in 10.00 s
Wrote 537.477 MB in 10.01 s
Wrote 477.727 MB in 10.01 s
Wrote 462.805 MB in 10.01 s
Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Avi Kivity authored
Replace divides by variables and by the hard constants with masks and divides by easy constants. Improves netperf by about 1.6%. Noted by Vlad. Reviewed-by:
Dor Laor <dor@cloudius-systems.com> Reviewed-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
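The general technique of replacing a division or modulo by a mask can be sketched as follows. The commit does not say which expressions were changed, so ring_index and is_power_of_two are hypothetical names; the sketch only shows that for a power-of-two size, `i % Size` and `i & (Size - 1)` give the same result for unsigned values.

```cpp
#include <cassert>
#include <cstdint>

// True if n has exactly one bit set.
constexpr bool is_power_of_two(std::uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Hypothetical index wrapper for a ring whose size is a compile-time
// power of two, so wrapping needs a single AND instead of a divide.
template <std::uint32_t Size>
struct ring_index {
    static_assert(is_power_of_two(Size), "Size must be a power of two");

    static std::uint32_t wrap(std::uint32_t i)
    {
        return i & (Size - 1);
    }
};
```

When the divisor is a run-time variable the compiler cannot make this substitution itself, which is why such changes have to be made in the source.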
-
- Dec 20, 2013
-
-
Gleb Natapov authored
All page table operations currently have to hold the vma lock. If populate races with unpopulate, in the best case some ptes may remain populated; in the worst case unpopulate may free an intermediate page while populate is using it. If populate races with protect, some ptes may end up with incorrect permissions. The vma list lock may be too big a hammer to prevent those races, but at least a per-vma lock is needed. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Currently, library objects are leaked by run_main() when osv::run() succeeds, which, in addition to leaking memory, makes the dcache leak directory entries, causing further problems. Releasing library objects is fine, as even dependent objects will be released automatically. I have tested it, and the dcache no longer has any leaked dentries. This problem was found during our attempt to implement a dentry hierarchy with help from Avi Kivity. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Reviewed-by:
Asias He <asias@cloudius-systems.com> [ penberg: improve changelog ] Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
We are under the virtio namespace, so it makes no sense to repeat the virtio prefix in the virtio_rng driver. Change the naming from virtio::virtio_rng to virtio::rng. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
if_getinfo is shorter and more consistent with the names of the other if_ functions, e.g., if_init, if_ioctl. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Rename it to txq_stats and rxq_stats which are more consistent with struct txq and struct rxq. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
We are under virtio namespace, no need to repeat the virtio prefix. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
We are under the virtio namespace, so it makes no sense to repeat the virtio prefix in the virtio_net driver. Change the naming from virtio::virtio_net to virtio::net. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
It's better not to indent after the namespace specifier. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
vma layout changed in commit bbec1a18. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Use --scsi or -S to boot from a virtio-scsi device instead of a virtio-blk device. This option defaults to false. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Asias He authored
Now, virtio-scsi disks use the same names as virtio-blk disks, e.g., vblk0, vblk1. If both scsi and blk disks are added to the guest, they share the global disk index number. Signed-off-by:
Asias He <asias@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This test includes a large number of tests for the timerfd_*() functions and many of their bizarre use-cases and corner cases. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This patch implements Linux's timerfd_*() system calls, declared in <sys/timerfd.h>. These define a file descriptor, usable for read() or poll() and friends, which becomes readable when a timer expires. This aspires to be a full implementation of timerfd, with all the intricate details explained in timerfd_create(2). timerfd was added to Linux five years ago (Linux 2.6.25). Boost's asio, in particular, uses this feature if it thinks it is available. Fixes #129. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
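For readers unfamiliar with the interface, here is a minimal usage sketch of timerfd on a system that provides <sys/timerfd.h>. The helper name wait_one_shot_ms is hypothetical; timerfd_create(), timerfd_settime() and the 8-byte expiration counter returned by read() are the standard Linux interface documented in timerfd_create(2).

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: arms a one-shot timer for `ms` milliseconds, blocks
// in read() until it expires, and returns the expiration count reported by
// the kernel. Returns 0 on any error or if ms is not positive.
inline std::uint64_t wait_one_shot_ms(long ms)
{
    if (ms <= 0) {
        return 0;  // an all-zero it_value would disarm the timer
    }
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0) {
        return 0;
    }
    itimerspec spec{};  // zero it_interval means one-shot
    spec.it_value.tv_sec = ms / 1000;
    spec.it_value.tv_nsec = (ms % 1000) * 1000000L;

    std::uint64_t expirations = 0;
    if (timerfd_settime(fd, 0, &spec, nullptr) == 0) {
        // read() blocks until the timer has expired at least once.
        ssize_t n = read(fd, &expirations, sizeof(expirations));
        if (n != static_cast<ssize_t>(sizeof(expirations))) {
            expirations = 0;
        }
    }
    close(fd);
    return expirations;
}
```

The same descriptor could instead be passed to poll() or select() together with sockets, which is how event loops typically use it.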
-