-
- Downloads
epoll: Support epoll()'s EPOLLET
This patch adds support for epoll()'s edge-triggered mode, EPOLLET. Fixes #188. As explained in issue #188, Boost's asio uses EPOLLET heavily, and we use that library in our management http server, and also in our image creation tool (cpiod.so). By ignoring EPOLLET, like we did until now, the code worked, but unnecessarily wasted CPU when epoll_wait() always returned immediately instead of waiting until a new event. This patch works within the confines of our existing poll mechanisms - where epoll() call poll(). We do not change this in this patch, and it should be changed in the future (see issue #17). In this patch we add to each struct file a field "poll_wake_count", which as its name suggests counts the number of poll_wake()s done on this file. Additionally, epoll remembers the last value it saw of this counter, so that in poll_scan(), if we see that an fp (polled with EPOLLET) has an unchanged counter from last time, we do not return readiness on this fp regardless on whether or not it has readable data. We have a complication with EPOLLET on sockets. These have an "SB_SEL" optimization, which avoids calling poll_wake() when it thinks the new data is not interesting because the old data was not yet consumed, and also avoids calling poll_wake() if fp->poll() was not previously done. This optimization is counter-productive for EPOLLET (and causes missed wakeups) so we need to work around it in the EPOLLET case. This patch also adds a test for the EPOLLET case in tst-epoll.cc. The test runs on both OSv and Linux, and can confirm that in the tested scenarios, Linux and OSv behave the same, including even one same false-positive: When epoll_wait() tells us there is data in a pipe, and we don't read it, but then more data comes on a pipe, epoll_wait() will again return a new event, despite this is not really being an edge event (the pipe didn't change from empty to not-empty, as it was previously not-empty as well). Concluding remarks: The primary goal of this implementation is to stop EPOLLET epoll_wait() from returning immediately despite nothing have happened on the file. That was what caused the 100% CPU use before this patch. That being said, the goal of this patch is NOT to avoid all false-positives or unnecessary wakeups; When events do occur on the file, we may be doing a bit more wakeups than strictly necessary. I think this is acceptable (our epoll() has worse problems) but for posterity, I want to explain: I already mentioned above one false-positive that also happens on Linux. Another false-positive wakeup that remains is in one of EPOLLET's classic use cases: Consider several threads sleeping on epoll() on the same socket (e.g., TCP listening socket, or UDP socket). When one packet arrives, normal level-triggered epoll() will wake all the threads, but only one will read the packet and the rest will find they have nothing to read. With edge- triggered epoll, only one thread should be woken and the rest would not. But in our implementation, poll_wake() wakes up *all* the pollers on this file, so we cannot currently support this optimization. Signed-off-by:Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
Showing
- bsd/sys/kern/uipc_socket.cc 2 additions, 1 deletionbsd/sys/kern/uipc_socket.cc
- core/epoll.cc 25 additions, 15 deletionscore/epoll.cc
- core/poll.cc 25 additions, 1 deletioncore/poll.cc
- include/osv/file.h 3 additions, 0 deletionsinclude/osv/file.h
- include/osv/poll.h 5 additions, 3 deletionsinclude/osv/poll.h
- tests/tst-epoll.cc 47 additions, 1 deletiontests/tst-epoll.cc
Loading
Please register or sign in to comment