Skip to content
Snippets Groups Projects
  • Nadav Har'El's avatar
    1e65eb54
    Fix socket poll() deadlock · 1e65eb54
    Nadav Har'El authored
    In commit 7ecbf29f I added to the
    poll_install() stage of poll() a check of the current state of the file -
    to avoid the sleep if the file became ready before we managed to "install"
    its poll request.
    
    However, I wrongly believed it was necessary to put this check inside
    the FD_LOCK together with the request installation. In fact, it doesn't
    need to be in the same lock - all we need is for the check to happen
    *after* the installation. The call to fo_poll() doesn't need to be in
    the same FD_LOCK or even in an FD_LOCK at all.
    
    Moreover, as it turns out, it must NOT be in an FD_LOCK() because this
    results in a deadlock when polling sockets, caused by two different
    code paths taking locks in opposite order:
    
    1. Before this fix, poll() took FD_LOCK and called fo_poll() which
       called sopoll_generic() which took a SOCKBUF_LOCK
    
    2. In the wake path, SOCKBUF_LOCK was taken, then so_wake_poll()
       is called which calls poll_wake() which takes FD_LOCK.
    1e65eb54
    History
    Fix socket poll() deadlock
    Nadav Har'El authored
    In commit 7ecbf29f I added to the
    poll_install() stage of poll() a check of the current state of the file -
    to avoid the sleep if the file became ready before we managed to "install"
    its poll request.
    
    However, I wrongly believed it was necessary to put this check inside
    the FD_LOCK together with the request installation. In fact, it doesn't
    need to be in the same lock - all we need is for the check to happen
    *after* the installation. The call to fo_poll() doesn't need to be in
    the same FD_LOCK or even in an FD_LOCK at all.
    
    Moreover, as it turns out, it must NOT be in an FD_LOCK() because this
    results in a deadlock when polling sockets, caused by two different
    code paths taking locks in opposite order:
    
    1. Before this fix, poll() took FD_LOCK and called fo_poll() which
       called sopoll_generic() which took a SOCKBUF_LOCK
    
    2. In the wake path, SOCKBUF_LOCK was taken, then so_wake_poll()
       is called which calls poll_wake() which takes FD_LOCK.