  1. Jun 30, 2013
    • Stub getpwnam(), setuid() and setgid() · c6399e55
      Nadav Har'El authored
      Implement getpwnam(), setuid() and setgid() in the simplest way possible
      considering that we don't support any userid except 0:
      
      getpwnam() returns user 0 for any username given to it.
      setuid() and setgid() do nothing for uid or gid 0, and fail otherwise.
      Where would the caller get this !=0 id anyway?
      
      Memcached needs these calls, because it wants to be clever and
      warn the user against running it as root....
      c6399e55
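The behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `stub_` names are hypothetical (chosen so the sketch does not clash with the real libc), and the `EPERM` error code is an assumption, since the message only says the calls fail.

```cpp
#include <cassert>
#include <cerrno>
#include <pwd.h>
#include <sys/types.h>

// Hypothetical stand-in for getpwnam(): every user name maps to uid 0.
struct passwd* stub_getpwnam(const char* /*name*/)
{
    static char name[] = "root";
    static char dir[] = "/";
    static char shell[] = "";
    static struct passwd root = {};
    root.pw_name = name;
    root.pw_uid = 0;
    root.pw_gid = 0;
    root.pw_dir = dir;
    root.pw_shell = shell;
    return &root;
}

// Hypothetical stand-in for setuid(): a no-op for uid 0, an error otherwise.
// EPERM is an assumed error code; the commit message does not name one.
int stub_setuid(uid_t uid)
{
    if (uid == 0) {
        return 0;
    }
    errno = EPERM;
    return -1;
}

// Hypothetical stand-in for setgid(), with the same rule for gid 0.
int stub_setgid(gid_t gid)
{
    if (gid == 0) {
        return 0;
    }
    errno = EPERM;
    return -1;
}
```

With stubs like these, a program that looks up a user and then drops privileges to the returned id always ends up calling setuid(0), which succeeds.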
    • libc: implement signal() · 1ca1bd43
      Nadav Har'El authored
      Implement the signal() function. This is hardly a useful function in OSv,
      first because our signal support is pretty broken, and second because
      sigaction() is a much more portable API that should always be preferred.
      
      Nevertheless, memcached uses signal() (to catch SIGINT, which it will
      never get in OSv...), so let's implement it for the sake of completeness.
      1ca1bd43
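Since sigaction() is the preferred API, one common way to provide signal() is as a thin wrapper around it. The sketch below shows that idea only; `signal_via_sigaction`, `record_signal` and `last_signal` are hypothetical names, and the choice of `SA_RESTART` (BSD-style semantics) is an assumption rather than something the commit states.

```cpp
#include <cassert>
#include <signal.h>

using handler_t = void (*)(int);

// Hypothetical helper: signal()-like behaviour built on sigaction().
// Returns the previous handler, or SIG_ERR on failure.
handler_t signal_via_sigaction(int signum, handler_t handler)
{
    struct sigaction act = {};
    struct sigaction old = {};
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;  // assumed: restart interrupted system calls
    if (sigaction(signum, &act, &old) != 0) {
        return SIG_ERR;
    }
    return old.sa_handler;
}

// Example handler that only records which signal arrived.
volatile sig_atomic_t last_signal = 0;

void record_signal(int signum)
{
    last_signal = signum;
}
```

A caller would install `record_signal` with `signal_via_sigaction(SIGUSR1, record_signal)` and could then trigger it with `raise(SIGUSR1)`.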
  2. Jun 18, 2013
  3. Jun 13, 2013
    • Implement usleep() · 0be1f9e0
      Nadav Har'El authored
      usleep() was scrubbed out of POSIX in 2008, and is not used in Java, but
      it does exist in glibc and is damn easy to use compared to its newer
      relative, nanosleep(), so I want to use it in a test.
      0be1f9e0
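To show why usleep() is the more convenient of the two, here is a minimal sketch of a microsecond sleep expressed through nanosleep(). The name `usleep_via_nanosleep` is hypothetical, and this is not necessarily how the commit implements it.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical helper: sleep for `usec` microseconds using nanosleep().
// Returns 0 on success and -1 (with errno set by nanosleep) on failure.
int usleep_via_nanosleep(unsigned long usec)
{
    struct timespec req;
    req.tv_sec = static_cast<time_t>(usec / 1000000UL);
    req.tv_nsec = static_cast<long>((usec % 1000000UL) * 1000UL);
    return nanosleep(&req, nullptr);
}
```

The caller of usleep() passes a single integer, while the nanosleep() caller has to split the interval into seconds and nanoseconds by hand, as the helper does here.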
    • shutdown_af_local: add missing locks · 67923f37
      Nadav Har'El authored
      As Avi pointed out, shutdown_af_local() did read-modify-write to
      f->f_flags without locking. Add the missing locks.
      67923f37
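The bug class here is an unlocked read-modify-write: two threads can both read the old flags, and one update is lost. A minimal sketch of the fix pattern follows, using `std::mutex` and hypothetical names (`file_like`, `FLAG_SHUT_RD`, `FLAG_SHUT_WR`, `set_flags_locked`) rather than the real OSv file structure or lock.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical flag bits, for illustration only.
constexpr unsigned FLAG_SHUT_RD = 1u << 0;
constexpr unsigned FLAG_SHUT_WR = 1u << 1;

// Hypothetical stand-in for a file object with a flags word and its lock.
struct file_like {
    std::mutex lock;
    unsigned flags = 0;
};

// The whole read-modify-write happens while the lock is held, so
// concurrent callers cannot overwrite each other's bits.
void set_flags_locked(file_like& f, unsigned bits)
{
    std::lock_guard<std::mutex> guard(f.lock);
    f.flags |= bits;
}
```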
  4. Jun 12, 2013
    • Optionally enable (disabled by default) lock-free mutex · a2cb99d5
      Nadav Har'El authored
      This patch optionally allows OSv, at compile time, to use the lock-free
      mutex instead of the spin-lock-based mutex. To use the lock-free mutex,
      change the line "#undef LOCKFREE_MUTEX" in include/osv/mutex.h to
      "#define LOCKFREE_MUTEX".
      
      LOCKFREE_MUTEX is currently disabled by default, awaiting a few more
      tests, but at this point I'm happy to say that beyond one known
      unrelated bug (see details below), it seems the lock-free mutex is
      fairly stable, and survives all tests and benchmarks I threw at it.
      
      The remaining known bug involves a thread destruction race between
      complete() and join(): complete wake()s the joiner thread, which in
      rare cases can really quickly delete the thread's stack, before wake()
      returns, causing a crash on return from wake(). This bug is really
      unrelated to the lock-free mutex, but for some unknown reason I can
      only reproduce it with the lock-free mutex on the SPECjvm2008 "sunflow"
      benchmark.
      
      To make lockfree::mutex our default mutex, this patch does the following
      when LOCKFREE_MUTEX is defined:
      
      1. In core/mutex.cc, #ifndef out the old mutex code, leaving the
         spinlock code in case someone wants to use it directly.
      
      2. In include/osv/mutex.h, do different things in C++ and C (remember that
         lockfree::mutex is a C++ class, and cannot be used directly from C):
      
         * In C++, simply make mutex and mutex_t aliases for lockfree::mutex.
      
         * In C, make struct mutex and mutex_t an opaque 40-byte structure (in
           C++ compilation, we verify that this 40 is indeed the C++ class's
           length), and define the operations on it.
      
      3. In libc/pthread.cc, if LOCKFREE_MUTEX is defined, the new mutex
         unfortunately will not fit into pthread_mutex_t, and neither will
         condvar fit into pthread_cond_t. So use a lazily allocated mutex or
         condvar, using the lazy_indirect<> template.
      a2cb99d5
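The lazy allocation idea in step 3 can be sketched as below. This is not OSv's lazy_indirect<> template, whose details are not shown in this log; `lazy_slot` is a hypothetical name, and the compare-and-swap scheme is just one plausible way to allocate an object on first use from a pointer-sized, zero-initialized slot.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

// Hypothetical sketch: a pointer-sized slot that allocates its T on first use.
template <typename T>
class lazy_slot {
public:
    lazy_slot() = default;
    lazy_slot(const lazy_slot&) = delete;
    lazy_slot& operator=(const lazy_slot&) = delete;
    ~lazy_slot() { delete _ptr.load(); }

    T* get()
    {
        T* p = _ptr.load(std::memory_order_acquire);
        if (!p) {
            T* fresh = new T();
            // Publish our object only if nobody else did so first.
            if (_ptr.compare_exchange_strong(p, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
                p = fresh;
            } else {
                delete fresh;  // lost the race; p now holds the winner
            }
        }
        return p;
    }

private:
    std::atomic<T*> _ptr{nullptr};
};

// The slot itself is small enough to live inside a pthread_mutex_t,
// even when the object it points to is not.
static_assert(sizeof(lazy_slot<int>) <= sizeof(pthread_mutex_t),
              "slot must fit in pthread_mutex_t");
```

The design trade-off is an extra allocation and pointer dereference per mutex in exchange for keeping the public pthread types at their ABI-mandated size.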
  5. Jun 10, 2013
    • libc: optimized memcpy() · 06dd5386
      Avi Kivity authored
      If the cpu supports "Enhanced REP MOVS / STOS" (ERMS), use a rep movsb
      instruction to implement memcpy.  This speeds up copies significantly,
      especially large misaligned ones.
      06dd5386
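A minimal sketch of the technique, assuming GCC or Clang: ERMS is reported in CPUID leaf 7 (sub-leaf 0), EBX bit 9, and the copy itself is a single `rep movsb`. The names `cpu_has_erms` and `copy_bytes` are hypothetical, the code falls back to `std::memcpy` on other CPUs, and it is not the implementation from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

// Hypothetical helper: true if the CPU advertises ERMS.
bool cpu_has_erms()
{
#if defined(__x86_64__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ebx & (1u << 9)) != 0;
#else
    return false;
#endif
}

// Hypothetical memcpy-like function: uses rep movsb when ERMS is present.
void* copy_bytes(void* dest, const void* src, std::size_t n)
{
#if defined(__x86_64__)
    static const bool erms = cpu_has_erms();  // detect once, on first call
    if (erms) {
        void* d = dest;
        const void* s = src;
        // RDI = destination, RSI = source, RCX = byte count.
        __asm__ __volatile__("rep movsb"
                             : "+D"(d), "+S"(s), "+c"(n)
                             :
                             : "memory");
        return dest;
    }
#endif
    return std::memcpy(dest, src, n);
}
```

The sketch relies on the direction flag being clear, which the x86-64 calling convention guarantees at function entry.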
  6. Jun 09, 2013
    • Implement shutdown() on unix domain sockets · 30f6e9dd
      Nadav Har'El authored
      The existing shutdown() code only worked with AF_INET sockets, and returned
      ENOTSOCK for AF_LOCAL sockets, because we implemented the latter sockets in
      completely different code (in af_local.cc).
      
      So in uipc_syscalls_wrap.c, the same place we call the special af-local
      socketpair(), we also need to call the special af-local shutdown().
      
      The way we do it is a bit ugly, but effective: shutdown() first calls
      shutdown_af_local(), and if that returns ENOTSOCK (so it's not an af_local
      socket), we continue trying the regular socket shutdown code.
      
      A better way would have been to add shutdown() to the fileops table -
      either the generic one (why not?), or invent a new mechanism whereby
      certain file types (in this case, "sockets" of all types) can have additional
      ops tables including in this case a shutdown() operation. Linux has
      something of this sort for implementing shutdown().
      30f6e9dd
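The "try af_local first, then fall back" control flow can be sketched like this. Everything here is a stand-in: `shutdown_local_stub`, `shutdown_inet_stub`, `shutdown_dispatch` and the two descriptor sets are hypothetical, and the convention that the helpers return an error number (0 on success) is an assumption made for the example.

```cpp
#include <cassert>
#include <cerrno>
#include <set>

// Hypothetical tables of which descriptors belong to which implementation.
static const std::set<int> local_fds = {3, 4};
static const std::set<int> inet_fds = {5};

// Stand-in for the af_local shutdown: ENOTSOCK means "not one of mine".
int shutdown_local_stub(int fd, int /*how*/)
{
    return local_fds.count(fd) ? 0 : ENOTSOCK;
}

// Stand-in for the regular socket shutdown code.
int shutdown_inet_stub(int fd, int /*how*/)
{
    return inet_fds.count(fd) ? 0 : ENOTSOCK;
}

// Try the af_local path first; only on ENOTSOCK fall through to the
// regular socket code. Returns 0, or -1 with errno set.
int shutdown_dispatch(int fd, int how)
{
    int error = shutdown_local_stub(fd, how);
    if (error == ENOTSOCK) {
        error = shutdown_inet_stub(fd, how);
    }
    if (error != 0) {
        errno = error;
        return -1;
    }
    return 0;
}
```

A per-file-type ops table, as the message suggests, would replace this chain of attempts with a single indirect call.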
  7. Jun 04, 2013
    • Nonblocking pipes · 1fa558ef
      Nadav Har'El authored
      This patch adds support for O_NONBLOCK on pipes and unix domain sockets.
      
      Java's EPollSelectorImpl uses a pipe to interrupt a sleeping poll, and it,
      quite understandably, sets the pipe to non-blocking (if you only write a
      single byte to a pipe, you don't expect any blocking anyway).
      So we can't croak if this option is used, and had better just implement it
      correctly.
      1fa558ef
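From the application's side, the behaviour being implemented looks like the following sketch, which uses only standard POSIX calls. The function name `wakeup_pipe_roundtrip` is hypothetical; it mimics the single-byte wake-up pattern and checks that a read from the drained non-blocking pipe fails with EAGAIN instead of blocking.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical demo: write one wake-up byte, read it back, then confirm
// that reading the now-empty non-blocking pipe returns EAGAIN.
bool wakeup_pipe_roundtrip()
{
    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }
    // Put both ends into non-blocking mode.
    for (int fd : fds) {
        int flags = fcntl(fd, F_GETFL);
        if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
            close(fds[0]);
            close(fds[1]);
            return false;
        }
    }
    bool ok = false;
    char byte = 'x';
    if (write(fds[1], &byte, 1) == 1 && read(fds[0], &byte, 1) == 1) {
        ssize_t n = read(fds[0], &byte, 1);  // pipe is empty again
        ok = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    }
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```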
  8. Jun 02, 2013
    • Split af_local.cc into four files · 729bdbd5
      Nadav Har'El authored
      The source file af_local.cc implemented both pipes and bi-directional
      pipes (unix domain stream socketpair), using a common buffer implementation.
      
      As suggested by Guy, split this file into four files:
      
      pipe_buffer.cc and pipe_buffer.hh contain the common buffer implementation,
      class pipe_buffer. Since this buffer basically implements a single-direction
      pipe, I renamed it from "af_local_buffer" to pipe_buffer.
      
      af_local.cc now contains just the unix domain stream socketpair
      implementation, implemented using two pipe_buffer objects.
      
      af_pipe.cc contains the POSIX pipe() implementation, implemented using
      one pipe_buffer object.
      729bdbd5
    • Fix readv() and writev() support in pipe and unix-domain socket. · 09e61023
      Nadav Har'El authored
      The iovec iteration was broken, so both readv() and writev() on pipes
      and unix-domain stream sockets didn't work. Fix it.
      09e61023
    • Atomic writes, and long writes, to pipes. · 2a6a0391
      Nadav Har'El authored
      This patch fixes two behaviors of pipes and unix-domain stream socketpair,
      which went against POSIX and Linux behavior:
      
      1. A blocking write() on a pipe needs to return only when the full write
         is finished. It should not just write until the end of the pipe buffer
         and return, as we did in the previous code.
      
         This means that a long write() to a pipe can write the data in parts,
         waiting between them for a reader to read from the pipe.
      
      2. As explained above, long writes will be split into parts (and if there
         are multiple writers, get mixed with writes from other writers). But
         POSIX also guarantees that short writes - up to 4096 bytes
         (PIPE_BUF==4096 on Linux) - are *atomic*, and are not split up.
         In the previous code, if even 1 byte was available in the buffer,
         we wrote it. Now, if the write is short, we wait until the
         entire needed length is available.
      2a6a0391
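The two rules above boil down to one decision a blocking writer makes each time it looks at the buffer: how many bytes may be copied right now. The sketch below captures only that decision; `writable_now` and `pipe_buf` are hypothetical names, and the real code in pipe_buffer may be organized quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// PIPE_BUF on Linux, as quoted in the commit message.
constexpr std::size_t pipe_buf = 4096;

// Hypothetical helper. `total_len` is the size of the whole write() call,
// `remaining` is what is still unwritten, `free_space` is the room left in
// the pipe buffer. A result of 0 means "block until a reader makes room".
std::size_t writable_now(std::size_t total_len,
                         std::size_t remaining,
                         std::size_t free_space)
{
    if (total_len <= pipe_buf) {
        // Short write: all or nothing, so it is never interleaved.
        return free_space >= remaining ? remaining : 0;
    }
    // Long write: copy whatever fits and come back for the rest.
    return std::min(remaining, free_space);
}
```

A blocking write() would loop on this value, sleeping whenever it is 0, and return only once `remaining` reaches zero.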
    • Abort if unsupported O_NONBLOCK used on unix-domain socket or pipe. · b0c593aa
      Nadav Har'El authored
      O_NONBLOCK is not yet supported in our implementation of unix-domain
      sockets or pipes, so until it is, abort() if it is used, instead of
      silently ignoring this mode and doing something very different from
      what the application expected.
      b0c593aa
  9. May 31, 2013
  10. May 30, 2013
    • Add pipe() · 8ef91f0d
      Nadav Har'El authored
      This patch adds pipe(). The pipes are built using the same FIFO implementation,
      "af_local_buffer", as used by the existing unix-domain socketpair
      implementation - while the socket-pair used two of these buffers, a pipe
      uses one.
      
      This implementation deviates from traditional POSIX pipe behavior in two
      ways that we should fix in follow-up patches:
      
      1. SIGPIPE is not supported: A write to a pipe whose read end is closed
         will always return EPIPE, and not generate a SIGPIPE signal.
         Programs that rely on SIGPIPE will break, but SIGPIPE is completely out
         of fashion, and normally ignored.
      
      2. Unix-style "atomic writes" are not obeyed. A write(), even if smaller
         than PIPE_BUF (=4096 on Linux, whose ABI we're emulating), may partially
         succeed if the pipe's buffer is nearly full. Only a write() of a single
         byte is guaranteed to be atomic.
      
         We hope that Java doesn't rely on multi-byte write() atomicity
         (single-byte writes are enough for waking poll, for example), and users
         of Java's "Pipe" class definitely can't (as Java is not Posix-only),
         so we hope this will not cause problems. Fixing this issue (which is easy)
         is left as a TODO in the code.
      
      Additionally, this patch marks with a FIXME (but doesn't fix) a serious
      bug in the code's iovec handling, so writev() and readv() are expected
      not to work in this version of pipe() - and also on the existing socketpair.
      8ef91f0d
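For comparison, a program on a regular POSIX system that ignores SIGPIPE sees exactly the EPIPE-only behaviour described in point 1. The sketch below demonstrates that with standard calls; the function name `errno_of_write_to_closed_pipe` is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Hypothetical demo: with SIGPIPE ignored, writing to a pipe whose read
// end is closed fails with EPIPE. Returns the errno of that write, or 0
// if the write unexpectedly succeeded.
int errno_of_write_to_closed_pipe()
{
    signal(SIGPIPE, SIG_IGN);
    int fds[2];
    if (pipe(fds) != 0) {
        return errno;
    }
    close(fds[0]);  // no reader is left
    char byte = 'x';
    ssize_t n = write(fds[1], &byte, 1);
    int err = (n == -1) ? errno : 0;
    close(fds[1]);
    return err;
}
```

Programs written this way check the return value of write() and never depend on the signal being delivered.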
    • Move unsupported fileops to fs/unsupported.c · 5062ff4f
      Nadav Har'El authored
      Previously, we re-implemented "unsupported" file operations (e.g., chmod
      for a pipe, on which fchmod makes no sense) several times - there was
      an implementation only for chmod in kern_descrip.c, used in sys_socket.c,
      and af_local.cc had its own. As we add more file descriptor types (e.g.,
      epoll_create()) we'll have even more copies of these silly functions, so
      let's do it once in fs/unsupported.c - with the fs/unsupported.h header
      file.
      
      This also gives us a central place to document (and argue) whether an
      unimplemented ioctl() should return ENOTTY or EBADF (I think the former).
      5062ff4f
    • Fix waiting poll on unix-domain socketpair · c58c7aac
      Nadav Har'El authored
      If poll() was waiting on a file descriptor from socketpair_af_local, we
      would never wake it up, and an example of this is the failure in a
      recently committed fix to tst-af-local.cc.
      
      The problem is that when one writes to one end of the socket, we need to
      call poll_wake() on the other end of the socket, so we need to remember
      which "struct file *" is attached to each end of the af_local_buffer objects.
      
      The solution I found most elegant is this:
      
      Rather than having the "sender" and "receiver" of af_local_buffer be
      booleans, they are now "struct file *". I added new functions,
      attach_sender(f) and attach_receiver(f), which set the file* we'll need to
      notify for each end. These functions are analogous to the detach_sender
      and detach_receiver functions that we already had.
      
      After each interesting event - read, write, close, etc - we notify the
      appropriate file*, using poll_wake.
      
      attach_sender(f) and attach_receiver(f) are called by af_local_init(f) - which
      used to be empty and now does something. Note how af_local_init(f) only
      does send->attach_sender(f) and receive->attach_receiver(f), but doesn't
      touch the two others (send->attach_receiver, receive->attach_sender) -
      these other two are set when the second file descriptor, with the send
      and receive fifos in reversed roles, is initialized with its af_local_init.
      
      After this fix, the new af_local_test works correctly.
      c58c7aac
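The wiring described above can be sketched with two one-way buffers and two endpoints. All types here are simplified stand-ins: `file_stub`, `poll_wake_stub`, `one_way_buffer` and `init_endpoint` are hypothetical and only mirror the roles of struct file, poll_wake, af_local_buffer and af_local_init.

```cpp
#include <cassert>
#include <deque>

// Stand-in for struct file; it just counts how often it was woken.
struct file_stub {
    int wakeups = 0;
};

// Stand-in for poll_wake(): notify a file if one is attached.
void poll_wake_stub(file_stub* f)
{
    if (f) {
        ++f->wakeups;
    }
}

// Stand-in for af_local_buffer: remembers the file on each end.
class one_way_buffer {
public:
    void attach_sender(file_stub* f) { _sender = f; }
    void attach_receiver(file_stub* f) { _receiver = f; }

    void write(char c)
    {
        _data.push_back(c);
        poll_wake_stub(_receiver);  // data arrived: wake the reader
    }

    bool read(char& c)
    {
        if (_data.empty()) {
            return false;
        }
        c = _data.front();
        _data.pop_front();
        poll_wake_stub(_sender);  // space freed: wake the writer
        return true;
    }

private:
    file_stub* _sender = nullptr;
    file_stub* _receiver = nullptr;
    std::deque<char> _data;
};

// Stand-in for af_local_init(): each file attaches only its own two roles.
void init_endpoint(file_stub* f, one_way_buffer* send, one_way_buffer* receive)
{
    send->attach_sender(f);
    receive->attach_receiver(f);
}
```

Initializing the second endpoint with the two buffers in swapped roles fills in the remaining two attachments, which is why each init call touches only half of them.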
  11. May 29, 2013
    • Implement missing readdir64() as alias to readdir() · 61e295f2
      Nadav Har'El authored
      This patch implements readdir64, as an alias to readdir. We can do this,
      because on 64-bit Linux, even the ordinary struct dirent uses 64-bit
      sizes, so the structures are identical.
      
      The reason we didn't miss this function earlier is that reasonable
      applications prefer to use readdir64_r, not readdir64. Because the Boost
      filesystem library thought we didn't have the former (see next patch
      for fixing this), it used the latter.
      61e295f2
  12. May 27, 2013
    • provide a utsname structure · 43c3f6dd
      Christoph Hellwig authored
      ZFS wants direct access to a global utsname structure.  Provide one from
      core OSv code and rewrite uname to just copy it out.  To ease this move
      the uname implementation to a C file as this allows using designated
      initializers and avoids the casting mess around memcpy.
      43c3f6dd
  13. May 26, 2013
  14. May 22, 2013
  15. May 20, 2013
    • pthread: drop 'pmutex' · 83745222
      Avi Kivity authored
      We had a klugey pmutex class used to allow zero initialization of
      pthread_mutex_t.  Now that the mutex class supports it natively we
      can drop it.
      83745222
    • pthread: drop pthread's zombie reaper · e25fc7e7
      Avi Kivity authored
      Use the generic one instead; the cleanup function allows destroying
      the pthread object.
      e25fc7e7
    • Replace backtrace() implementation with one using libunwind · 53c7ade5
      Nadav Har'El authored
      The previous implementation of backtrace() required frame pointers.
      This meant it could only be used in the "debug" build, but worse,
      it also got confused by libstdc++ (which was built without frame pointers),
      leading to incorrect stack traces, and more rarely, crashes.
      
      This changes backtrace() to use libunwind instead, which works even
      without frame pointers. To satisfy the link dependencies, libgcc_eh.a
      needs to be linked *after* libunwind.a. Because we also need it linked
      *before* for other reasons, we end up with libgcc_eh.a twice on the
      linker's command line. The horror...
      53c7ade5
    • Add partial implementation of msync() for libunwind · de374193
      Nadav Har'El authored
      libunwind, which the next patches will use to implement a more reliable
      backtrace(), needs the msync() function. It doesn't need it to actually
      sync anything - just to recognize valid frame addresses (stacks are
      always mmap()ed).
      
      Note this implementation does the checking, but is missing the "sync" part
      of msync ;-) It doesn't matter because:
      
      1. libunwind doesn't need (or want) this syncing, and neither does anything
         else in the Java stack (until now, msync() was never used).
      
      2. We don't (yet?) have write-back of mmap'ed memory anyway, so there's
         no sense in doing any writing in msync either. We'll need to work on
         a full read-write implementation of file-backed mmap() later.
      de374193
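The "is this address valid?" trick that makes msync() necessary can be sketched as follows, on a system with a full msync(): calling it on the page containing an address succeeds if the page is mapped and fails otherwise. The function name `address_is_mapped` is hypothetical, and this shows the general technique under that assumption, not libunwind's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: true if the page containing `addr` is mapped.
// msync() is used purely as a probe; nothing needs to be written back.
bool address_is_mapped(const void* addr)
{
    const std::uintptr_t page =
        static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(addr) & ~(page - 1);
    return msync(reinterpret_cast<void*>(start), page, MS_ASYNC) == 0;
}
```

An msync() that performs only this validity check, as in the patch, is therefore enough for an unwinder that probes frame addresses before dereferencing them.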
  16. May 18, 2013
  17. May 16, 2013
    • Implement sigismember() · 191eaf08
      Nadav Har'El authored
      This function happens to be used by Java's "-Xcheck:jni", and is
      trivial to implement, so why not...
      191eaf08
  18. May 10, 2013
  19. May 07, 2013