  1. Dec 03, 2013
  2. Dec 01, 2013
  3. Nov 26, 2013
    • Avi Kivity's avatar
      libc: implement the GNU variant of strerror_r() · 7053ac3a
      Avi Kivity authored
      
      We previously had the POSIX variant only.  Implement the GNU variant as well,
      and update the header to point to the correct function based on the dialect
      selected.
      
      The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI
      standards.
      
      This fixes calls to strerror_r() from binaries which were compiled with
      _GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for
      BSD derived source.
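
The two variants differ in return type: the GNU one returns a `char*` (which may or may not point into the caller's buffer), while the POSIX one returns an `int` and always writes into the buffer. A minimal sketch of caller code that compiles against either dialect is shown below. The helper names `pick_message` and `describe_errno` are hypothetical and are not part of this commit.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Overload chosen when strerror_r() is the GNU variant (returns char*).
// The returned pointer may refer to buf or to a static string.
inline const char* pick_message(const char* result, const char* /*buf*/) {
    return result;
}

// Overload chosen when strerror_r() is the POSIX variant (returns int).
// On success (0) the message has been written into buf.
inline const char* pick_message(int result, const char* buf) {
    return result == 0 ? buf : "unknown error";
}

// Hypothetical helper: returns the message for err under either dialect.
inline std::string describe_errno(int err) {
    char buf[256] = "";
    return pick_message(strerror_r(err, buf, sizeof buf), buf);
}
```

Overload resolution picks the right branch at compile time, so the caller does not need to test `_GNU_SOURCE` itself.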
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      7053ac3a
    • Nadav Har'El's avatar
      sched: Doxygen documentation of a bit of the scheduler · 6f825816
      Nadav Har'El authored
      
      Started adding Doxygen documentation for the scheduler. Currently
      only set_priority() and priority() are documented.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      6f825816
    • Nadav Har'El's avatar
      sched: New scheduler algorithm · dbc0d507
      Nadav Har'El authored
      This patch replaces the algorithm which the scheduler uses to keep track of
      threads' runtime, and to choose which thread to run next and for how long.
      
      The previous algorithm used the raw cumulative runtime of a thread as its
      runtime measure. But comparing these numbers directly was impossible: e.g.,
      should a thread that slept for an hour now get an hour of uninterrupted CPU
      time? This resulted in a hodgepodge of heuristics which "modified" and
      "fixed" the runtime. These heuristics did work quite well in our test cases,
      but we were forced to add more and more unjustified heuristics and constants
      to fix scheduling bugs as they were discovered. The existing scheduler was
      especially problematic with thread migration (moving a thread from one CPU
      to another) as the runtime measure on one CPU was meaningless in another.
      This bug, if not corrected (e.g., by the patch which I sent a month
      ago), could cause crucial threads to acquire exceedingly high runtimes by
      mistake, and resulted in the tst-loadbalance test using only one CPU on
      a two-CPU guest.
      
      The new scheduling algorithm follows a much more rigorous design,
      proposed by Avi Kivity in:
      https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing
      
      
      
      To make a long story short (read the document if you want all the
      details), the new algorithm is based on a runtime measure R which
      is the running decaying average of the thread's running time.
      It is a decaying average in the sense that the thread's act of running or
      sleeping in recent history is given more weight than its behavior
      a long time ago. This measure R can tell us which of the runnable
      threads to run next (the one with the lowest R), and using some
      high-school-level mathematics, we can calculate for how long to run
      this thread until it should be preempted by the next one. R carries
      the same meaning on all CPUs, so CPU migration becomes trivial.
      
      The actual implementation uses a normalized version of R, called R''
      (Rtt in the code), which is also explained in detail in the document.
      This Rtt allows updating just the running thread's runtime - not all
      threads' runtime - as time passes, making the whole calculation much
      more tractable.
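
As a rough illustration of the idea of a decaying average (not the formula from the design document, and not the normalized R'' that the code actually uses), the sketch below assumes R is an exponentially decaying average of whether the thread was running, with time constant tau. The names `thread_model`, `account` and `pick_next` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model of a thread: R is the decaying average of "was running".
struct thread_model {
    double R = 0.0;
};

// Time constant in seconds; 0.2 matches the tau=200ms default mentioned above.
constexpr double tau = 0.2;

// Account for an interval of dt seconds during which the thread either ran
// (ran == true) or did not. Older history is scaled down by exp(-dt/tau).
inline void account(thread_model& t, double dt, bool ran) {
    const double decay = std::exp(-dt / tau);
    t.R = t.R * decay + (ran ? 1.0 - decay : 0.0);
}

// Pick the runnable thread with the lowest R; returns its index.
inline std::size_t pick_next(const std::vector<thread_model>& runnable) {
    assert(!runnable.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < runnable.size(); ++i) {
        if (runnable[i].R < runnable[best].R) {
            best = i;
        }
    }
    return best;
}
```

Note that this naive version has to update every thread as time passes. Avoiding exactly that cost is what the normalized measure described above is for.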
      
      The benefits of the new scheduler code over the existing one are:
      
      1. A more rigorous design with fewer unjustified heuristics.
      
      2. A thread's runtime measurement correctly survives a migration to a
      different CPU, unlike the existing code (which sometimes botches
      it up, leading to threads hanging). In particular, tst-loadbalance
      now gives good results for the "intermittent thread" test, unlike
      the previous code which in 50% of the runs caused one CPU to be
      completely wasted (when the load-balancing thread hung).
      
      3. The new algorithm can look at a much longer runtime history than the
      previous algorithm did. With the default tau=200ms, the one-cpu
      intermittent thread test of tst-scheduler now provides good
      fairness for sleep durations of 1ms-32ms.
      The previous algorithm was never fair in any of those tests.
      
      4. The new algorithm is more deterministic in its use of timers
      (with thyst=2_ms: up to 500 timers a second), resulting in less
      varied performance in high-context-switch benchmarks like tst-ctxsw.
      
      This scheduler does very well on the fairness tests tst-scheduler and
      fairly well on tst-loadbalance. Even better performance on that second
      test will require an additional patch for the idle thread to wake other
      cpus' load-balancing threads.
      
      As expected, the new scheduler is somewhat slower than the existing one
      (as we now do some relatively complex calculations instead of trivial
      integer operations), but thanks to using approximations when possible
      and to various other optimizations, the difference is relatively small:
      
      On my laptop, tst-ctxsw.so, which measures "context switch" time (actually,
      also including the time to use mutex and condvar which this test uses to
      cause context switching), on the "colocated" test I measured 355 ns with
      the old scheduler, and 382 ns with the new scheduler - meaning that the
      new scheduler adds 27ns of overhead to every context switch. To see that
      this penalty is minor, consider that tst-ctxsw is an extreme example,
      doing 3 million context switches a second, and even there it only slows
      down the workload by 7%.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      dbc0d507
    • Nadav Har'El's avatar
      sched: No need for "yield" parameter of schedule() · e1722351
      Nadav Har'El authored
      
      The schedule() and cpu::schedule() functions had a "yield" parameter.
      This parameter was inconsistently used (it's not clear why specific
      places called it with "true" and others with "false"), but moreover, was
      always ignored!
      
      So this patch removes the parameter of schedule(). If you really want
      a yield, call yield(), not schedule().
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      e1722351
    • Raphael S. Carvalho's avatar
      vfs: Add the utimes system call · 832bba6e
      Raphael S. Carvalho authored
      
      v2: Check limit of microseconds, among other minor changes (Nadav Har'El, Avi Kivity).
      v3: Get rid of goto & label by adding an else clause (Nadav Har'El).
      
      - This patch adds utimes support.
      - This patch addresses issue #93.
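
The microseconds check mentioned in v2 is presumably the usual rule that `tv_usec` must lie in the range 0 to 999999. A minimal sketch of such a validation follows. The helper names are hypothetical and the patch's actual code may differ.

```cpp
#include <cassert>
#include <sys/time.h>

// Hypothetical helper: a timeval is acceptable when its microseconds field
// is within [0, 999999].
inline bool timeval_is_valid(const timeval& tv) {
    return tv.tv_usec >= 0 && tv.tv_usec < 1000000;
}

// Hypothetical helper: validates the two-element array passed to utimes().
// A null pointer means "use the current time" and is always acceptable.
inline bool utimes_args_valid(const timeval times[2]) {
    return times == nullptr ||
           (timeval_is_valid(times[0]) && timeval_is_valid(times[1]));
}
```

An implementation would typically fail with EINVAL when this check does not pass.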
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      832bba6e
    • Raphael S. Carvalho's avatar
      vfs: Unify attribute flags into a common place · 1519d3d1
      Raphael S. Carvalho authored
      
      Attribute flags were moved from 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h'
      to 'include/osv/vnode_attr.h'
      
      'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' now includes 'include/osv/vnode_attr.h'
      exactly at the place the flags were previously located.
      
      'fs/vfs/vfs.h' includes 'include/osv/vnode_attr.h' because functions that rely on the setattr
      feature must specify the flags corresponding to the attr fields that are going to be changed.
      
      Approach suggested by Nadav Har'El
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      1519d3d1
    • Nadav Har'El's avatar
      Warn about incorrect use of percpu<> / PERCPU(..). · 8add1b91
      Nadav Har'El authored
      
      This patch makes incorrect usage of percpu<>/PERCPU() produce
      compilation errors instead of silent runtime corruption.
      
      Thanks to Dmitry for first noticing this issue in xen_intr.cc (see his
      separate patch), and to Avi for suggesting a compile-time fix.
      
      With this patch:
      
      1. Using percpu<...> to *define* a per-cpu variable fails compilation.
         Instead, PERCPU(...) must be used for the definition, which is important
         because it places the variable in the ".percpu" section.
      
      2. If a *declaration* is needed additionally (e.g., for a static class
         member), percpu<...> must be used, not PERCPU().
         Trying to use PERCPU() for declaration will cause a compilation error.
      
      3. PERCPU() only works on statically-constructed objects - global variables,
         static function-variables and static class-members. Trying to use it
         on a dynamically-constructed object - stack variable, class field,
         or operator new - will cause a compilation error.
      
      With this patch, the bug in xen_intr.cc would have been caught at
      compile time.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8add1b91
  4. Nov 25, 2013
    • Pekka Enberg's avatar
      mmu: Anonymous memory demand paging · c1d5fccb
      Pekka Enberg authored
      
      Switch to demand paging for anonymous virtual memory.
      
      I used SPECjvm2008 to verify performance impact. The numbers are mostly
      the same with few exceptions, most visible in the 'serial' benchmark.
      However, there's quite a lot of variance between SPECjvm2008 runs so I
      wouldn't read too much into them.
      
      As we need the demand paging mechanism and the performance numbers
      suggest that the implementation is reasonable, I'd merge the patch as-is
      and optimize it later.
      
        Before:
      
          Running specJVM2008 benchmarks on an OSV guest.
          Score on compiler.compiler: 331.23 ops/m
          Score on compiler.sunflow: 131.87 ops/m
          Score on compress: 118.33 ops/m
          Score on crypto.aes: 41.34 ops/m
          Score on crypto.rsa: 204.12 ops/m
          Score on crypto.signverify: 196.49 ops/m
          Score on derby: 170.12 ops/m
          Score on mpegaudio: 70.37 ops/m
          Score on scimark.fft.large: 36.68 ops/m
          Score on scimark.lu.large: 13.43 ops/m
          Score on scimark.sor.large: 22.29 ops/m
          Score on scimark.sparse.large: 29.35 ops/m
          Score on scimark.fft.small: 195.19 ops/m
          Score on scimark.lu.small: 233.95 ops/m
          Score on scimark.sor.small: 90.86 ops/m
          Score on scimark.sparse.small: 64.11 ops/m
          Score on scimark.monte_carlo: 145.44 ops/m
          Score on serial: 94.95 ops/m
          Score on sunflow: 73.24 ops/m
          Score on xml.transform: 207.82 ops/m
          Score on xml.validation: 343.59 ops/m
      
        After:
      
          Score on compiler.compiler: 346.78 ops/m
          Score on compiler.sunflow: 132.58 ops/m
          Score on compress: 116.05 ops/m
          Score on crypto.aes: 40.26 ops/m
          Score on crypto.rsa: 206.67 ops/m
          Score on crypto.signverify: 194.47 ops/m
          Score on derby: 175.22 ops/m
          Score on mpegaudio: 76.18 ops/m
          Score on scimark.fft.large: 34.34 ops/m
          Score on scimark.lu.large: 15.00 ops/m
          Score on scimark.sor.large: 24.80 ops/m
          Score on scimark.sparse.large: 33.10 ops/m
          Score on scimark.fft.small: 168.67 ops/m
          Score on scimark.lu.small: 236.14 ops/m
          Score on scimark.sor.small: 110.77 ops/m
          Score on scimark.sparse.small: 121.29 ops/m
          Score on scimark.monte_carlo: 146.03 ops/m
          Score on serial: 87.03 ops/m
          Score on sunflow: 77.33 ops/m
          Score on xml.transform: 205.73 ops/m
          Score on xml.validation: 351.97 ops/m
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      c1d5fccb
    • Pekka Enberg's avatar
      mmu: VMA permission flags · 8a56dc8c
      Pekka Enberg authored
      
      Add permission flags to VMAs. They will be used by mprotect() and the
      page fault handler.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8a56dc8c
  5. Nov 22, 2013
  6. Nov 21, 2013
    • Nadav Har'El's avatar
      Replace numbers in prio.hh by automatically defined numbers · 147de06c
      Nadav Har'El authored
      prio.hh defines various initialization priorities. The actual numbers
      don't matter, just the order between them. But when we add too many
      priorities between existing ones, we may hit a need to renumber. This
      is plain ugly, and reminds me of Basic programming ;-)
      
      So this patch switches to an enum (enum class, actually).
      We now just have a list of priority names in order, with no numbers.
      
      It would have been straightforward, if it weren't for a bug in GCC
      (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59211) where the
      "init_priority" attribute doesn't accept the enum (while the "constructor"
      attribute does). Luckily, a simple workaround - explicitly casting to
      int - works.
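
The approach can be sketched roughly as follows, assuming GCC (or a compatible compiler) on an ELF target. The enumerator names, the starting value of 101 (GCC reserves lower init_priority values) and the `tracer` and `init_log` helpers are all hypothetical and are not taken from prio.hh.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical priority list: only the order of the names matters.
// Values start at 101 because GCC reserves priorities up to 100.
enum class init_prio : int {
    first = 101,
    second,
    third,
};

// Log of construction order; a function-local static so that it is ready
// before any of the prioritized objects below are constructed.
inline std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

// Records its name when constructed.
struct tracer {
    explicit tracer(const char* name) { init_log().push_back(name); }
};

// Deliberately defined in reverse order: the explicit cast to int is the
// workaround described above, and the priorities decide the actual order.
tracer obj_third __attribute__((init_priority((int)init_prio::third))) ("third");
tracer obj_second __attribute__((init_priority((int)init_prio::second))) ("second");
tracer obj_first __attribute__((init_priority((int)init_prio::first))) ("first");
```

Adding a new priority between two existing ones then only means inserting a name into the enum, with no renumbering.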
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      147de06c
    • Avi Kivity's avatar
      poll: refactor poll() in terms of file pointers, not file descriptors · 0b68144e
      Avi Kivity authored
      
      With epoll(), the lifetime of an ongoing poll may be longer than the
      lifetime of a file descriptor; if an fd is close()d then we expect it
      to be silently removed from the epoll.
      
      With the current implementation of epoll(), which just calls poll(), this is
      impossible to do correctly since poll() is implemented in terms of file
      descriptors.
      
      Add an intermediate do_poll() that works on file pointers. This allows a
      refactored epoll() to convert file descriptors to file pointers just once,
      and then a close()d and re-open()ed descriptor can be added without a problem.
      
      As a side effect, a lot of atomic operations (fget() and fdrop()) are saved.
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      0b68144e
  7. Nov 20, 2013
  8. Nov 19, 2013
  9. Nov 10, 2013
  10. Nov 07, 2013
  11. Nov 04, 2013
  12. Oct 31, 2013
  13. Oct 30, 2013
    • Nadav Har'El's avatar
      Start documenting condvar · 212a061a
      Nadav Har'El authored
      
      Add Doxygen comments to the condvar class. Only the C++ interface
      (condvar's methods) is documented, not the alternative C interface
      (condvar_* functions).
      
      A reminder: run "doxygen" and point your browser to doxyout/html/index.html
      to see the API documentation we have so far.
      
      A lot can still be added to this condvar documentation, including a
      good introduction to how to use condition variables, why they have
      a mutex, etc. But it's at least a start.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      212a061a
  14. Oct 28, 2013
    • Pekka Enberg's avatar
      percpu: Fix arithmetic on pointer to void · 8dbcbbf1
      Pekka Enberg authored
      
      Spotted by Clang:
      
      In file included from ../../loader.cc:27:
      In file included from ../../drivers/virtio-net.hh:12:
      In file included from ../../bsd/sys/net/if_var.h:80:
      In file included from ../../bsd/sys/sys/mbuf.h:40:
      In file included from ../../bsd/porting/uma_stub.h:161:
      ../../include/osv/percpu.hh:35:42: error: arithmetic on a pointer to void
              return reinterpret_cast<T*>(base + offset);
                                          ~~~~ ^
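
Arithmetic on `void*` is a GNU extension that Clang rejects in C++. The usual fix is to do the byte arithmetic through `char*`, sketched below. The function name `at_offset` is a hypothetical stand-in, and the actual change in percpu.hh may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the accessor in the error above: returns a
// pointer to the T located offset bytes past base.
template <typename T>
T* at_offset(void* base, std::size_t offset) {
    assert(base != nullptr);
    // Cast to char* first so that the addition is counted in bytes and is
    // valid standard C++.
    return reinterpret_cast<T*>(static_cast<char*>(base) + offset);
}
```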
      
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8dbcbbf1
  15. Oct 25, 2013
  16. Oct 24, 2013