Skip to content
Snippets Groups Projects
  1. Feb 13, 2014
  2. Feb 12, 2014
  3. Feb 11, 2014
    • Nadav Har'El's avatar
      epoll: Support epoll()'s EPOLLET · d41d748f
      Nadav Har'El authored
      
      This patch adds support for epoll()'s edge-triggered mode, EPOLLET.
      Fixes #188.
      
      As explained in issue #188, Boost's asio uses EPOLLET heavily, and we use
      that library in our management http server, and also in our image creation
      tool (cpiod.so). By ignoring EPOLLET, like we did until now, the code worked,
      but unnecessarily wasted CPU when epoll_wait() always returned immediately
      instead of waiting until a new event.
      
      This patch works within the confines of our existing poll mechanisms -
      where epoll() call poll(). We do not change this in this patch, and it
      should be changed in the future (see issue #17).
      
      In this patch we add to each struct file a field "poll_wake_count", which
      as its name suggests counts the number of poll_wake()s done on this
      file. Additionally, epoll remembers the last value it saw of this counter,
      so that in poll_scan(), if we see that an fp (polled with EPOLLET) has
      an unchanged counter from last time, we do not return readiness on this fp
      regardless on whether or not it has readable data.
      
      We have a complication with EPOLLET on sockets. These have an "SB_SEL"
      optimization, which avoids calling poll_wake() when it thinks the new
      data is not interesting because the old data was not yet consumed, and
      also avoids calling poll_wake() if fp->poll() was not previously done.
      This optimization is counter-productive for EPOLLET (and causes missed
      wakeups) so we need to work around it in the EPOLLET case.
      
      This patch also adds a test for the EPOLLET case in tst-epoll.cc. The test
      runs on both OSv and Linux, and can confirm that in the tested scenarios,
      Linux and OSv behave the same, including even one same false-positive:
      When epoll_wait() tells us there is data in a pipe, and we don't read it,
      but then more data comes on a pipe, epoll_wait() will again return a new
      event, despite this is not really being an edge event (the pipe didn't
      change from empty to not-empty, as it was previously not-empty as well).
      
      Concluding remarks:
      
      The primary goal of this implementation is to stop EPOLLET epoll_wait()
      from returning immediately despite nothing have happened on the file.
      That was what caused the 100% CPU use before this patch. That being said,
      the goal of this patch is NOT to avoid all false-positives or unnecessary
      wakeups; When events do occur on the file, we may be doing a bit more
      wakeups than strictly necessary. I think this is acceptable (our epoll()
      has worse problems) but for posterity, I want to explain:
      
      I already mentioned above one false-positive that also happens on Linux.
      Another false-positive wakeup that remains is in one of EPOLLET's classic
      use cases: Consider several threads sleeping on epoll() on the same socket
      (e.g., TCP listening socket, or UDP socket). When one packet arrives, normal
      level-triggered epoll() will wake all the threads, but only one will read
      the packet and the rest will find they have nothing to read. With edge-
      triggered epoll, only one thread should be woken and the rest would not.
      But in our implementation, poll_wake() wakes up *all* the pollers on this
      file, so we cannot currently support this optimization.
      
      Signed-off-by: default avatarNadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: default avatarAvi Kivity <avi@cloudius-systems.com>
      d41d748f
    • Vlad Zolotarov's avatar
      msix: thread affinity · b4e8d47d
      Vlad Zolotarov authored
      
      Instead of binding all msix interrupts to cpu 0, have them chase the
      interrupt service routine thread and pin themselves to the same cpu.
      
      This patch is based on the patch from Avi Kivity <avi@cloudius-systems.com>
      and used some ideas of Nadav Har'El <nyh@cloudius-systems.com>.
      
      It improves the performance of the single thread Rx netperf test by 16%:
      before - 25694 Mbps
      after  - 29875 Mbps
      
      New in V2:
       - Dropped the functor class - use lambda instead.
       - Fixed the race in a waking flow.
       - Added some comments.
       - Added the performance numbers to the patch description.
      
      Signed-off-by: default avatarVlad Zolotarov <vladz@cloudius-systems.com>
      Signed-off-by: default avatarPekka Enberg <penberg@cloudius-systems.com>
      b4e8d47d
    • Nadav Har'El's avatar
      gdb: fix "osv mmap" and friends · ce177a50
      Nadav Har'El authored
      
      It appears that in GDB, (mmu::vma*)0 does not work, and one needs to enclose
      the type's name in single quotes: ('mmu::vma'*)0. This broke the vma_list
      function in scripts/loader.py, and caused an exception in "osv mmap" and
      other commands using the vma_list function.
      
      This patch adds the missing single-quotes.
      
      I don't understand how this code ever worked for anybody...
      I'm using gdb-7.6.1 from Fedora 19, if it matters.
      
      Signed-off-by: default avatarNadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: default avatarPekka Enberg <penberg@cloudius-systems.com>
      ce177a50
Loading