  1. Dec 08, 2013
  2. Dec 05, 2013
  3. Dec 04, 2013
  4. Dec 03, 2013
  5. Dec 01, 2013
  6. Nov 26, 2013
    • Avi Kivity's avatar
      libc: implement the GNU variant of strerror_r() · 7053ac3a
      Avi Kivity authored
      
      We previously had the POSIX variant only.  Implement the GNU variant as well,
      and update the header to point to the correct function based on the dialect
      selected.
      
      The POSIX variant is renamed __xpg_strerror_r() to conform to the ABI
      standards.
      
      This fixes calls to strerror_r() from binaries which were compiled with
      _GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for
      BSD derived source.
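
The two variants differ in return type: the POSIX strerror_r() returns an int status and always writes into the caller's buffer, while the GNU one returns a char* that may or may not point into that buffer. A minimal sketch of caller code that compiles against either dialect is shown here; the helper names `pick_message` and `describe_errno` are hypothetical and not part of this patch.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// GNU variant: the returned pointer is the message (it may ignore buf).
inline const char* pick_message(char* msg, const char* /*buf*/) {
    return msg;
}

// POSIX variant: 0 means the message was written into buf.
inline const char* pick_message(int rc, const char* buf) {
    return rc == 0 ? buf : "Unknown error";
}

// Overload resolution on the return type of strerror_r() selects the
// right handler, so this compiles under either dialect.
inline std::string describe_errno(int err) {
    char buf[256] = "";
    return pick_message(strerror_r(err, buf, sizeof buf), buf);
}
```

Calling `describe_errno(ENOENT)` yields the system's message for that error regardless of which variant the headers selected.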
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      7053ac3a
    • Nadav Har'El's avatar
      sched: Doxygen documentation of a bit of the scheduler · 6f825816
      Nadav Har'El authored
      
      Started adding Doxygen documentation for the scheduler. Currently
      only set_priority() and priority() are documented.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      6f825816
    • Nadav Har'El's avatar
      sched: New scheduler algorithm · dbc0d507
      Nadav Har'El authored
      This patch replaces the algorithm which the scheduler uses to keep track of
      threads' runtime, and to choose which thread to run next and for how long.
      
      The previous algorithm used the raw cumulative runtime of a thread as its
      runtime measure. But comparing these numbers directly was impossible: e.g.,
      should a thread that slept for an hour now get an hour of uninterrupted CPU
      time? This resulted in a hodgepodge of heuristics which "modified" and
      "fixed" the runtime. These heuristics did work quite well in our test cases,
      but we were forced to add more and more unjustified heuristics and constants
      to fix scheduling bugs as they were discovered. The existing scheduler was
      especially problematic with thread migration (moving a thread from one CPU
      to another) as the runtime measure on one CPU was meaningless in another.
      This bug, if not corrected (e.g., by the patch which I sent a month
      ago), could cause crucial threads to acquire exceedingly high runtimes
      by mistake, and resulted in the tst-loadbalance test using only one CPU
      on a two-CPU guest.
      
      The new scheduling algorithm follows a much more rigorous design,
      proposed by Avi Kivity in:
      https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing
      
      
      
      To make a long story short (read the document if you want all the
      details), the new algorithm is based on a runtime measure R which
      is the running decaying average of the thread's running time.
      It is a decaying average in the sense that the thread's act of running or
      sleeping in recent history is given more weight than its behavior
      a long time ago. This measure R can tell us which of the runnable
      threads to run next (the one with the lowest R), and using some
      high-school-level mathematics, we can calculate for how long to run
      this thread until it should be preempted by the next one. R carries
      the same meaning on all CPUs, so CPU migration becomes trivial.
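
To make the idea concrete, here is a rough sketch of one possible decaying average: an exponential one with time constant tau. This model, the type `decaying_runtime` and the function `time_until_overtake` are assumptions made for illustration; the exact definition of R is the one in the design document, which may differ.

```cpp
#include <cmath>

// Assumed model: R lies in [0, 1]. While the thread runs, R approaches 1;
// while it sleeps or waits, R decays toward 0. Recent history dominates
// because older contributions shrink by exp(-dt / tau).
struct decaying_runtime {
    double r;

    explicit decaying_runtime(double initial = 0.0) : r(initial) {}

    void advance(double dt, bool running, double tau) {
        const double k = std::exp(-dt / tau);
        r = running ? 1.0 - (1.0 - r) * k : r * k;
    }
};

// How long the running thread (lower R) can run before its R catches up
// with a waiting thread's decaying R. Setting
//   1 - (1 - r_running) * x = r_waiting * x,  x = exp(-t / tau)
// and solving for t gives t = tau * ln(1 + r_waiting - r_running).
inline double time_until_overtake(double r_running, double r_waiting,
                                  double tau) {
    return tau * std::log1p(r_waiting - r_running);
}
```

Under this model the scheduler would pick the runnable thread with the lowest R and arm a timer for the computed time, at which point the two threads' measures meet.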
      
      The actual implementation uses a normalized version of R, called R''
      (Rtt in the code), which is also explained in detail in the document.
      This Rtt allows updating just the running thread's runtime - not all
      threads' runtime - as time passes, making the whole calculation much
      more tractable.
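
One way such a normalization can work is sketched here, assuming an exponentially decaying R with time constant tau. It is not necessarily the definition of R'' used in the design document, and the names `scaled_runqueue`, `scaled` and `run` are hypothetical. Storing R multiplied by exp(now / tau) makes the stored value of a non-running thread constant over time, so only the running thread needs an update.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct scaled_runqueue {
    double tau;
    double now;
    std::vector<double> scaled;  // per thread: R * exp(now / tau)

    scaled_runqueue(double tau_, std::size_t threads)
        : tau(tau_), now(0.0), scaled(threads, 0.0) {}

    // Account for `dt` seconds during which thread `running` ran.
    // No other thread's entry is touched.
    void run(std::size_t running, double dt) {
        scaled[running] += std::exp((now + dt) / tau) - std::exp(now / tau);
        now += dt;
    }

    // Recover the plain decaying average for thread i.
    double r(std::size_t i) const {
        return scaled[i] * std::exp(-now / tau);
    }
};
```

Because every entry shares the same factor, comparing the stored values orders threads exactly as comparing R would. The factor grows without bound, so a real implementation would have to rescale all entries periodically; that step is omitted here.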
      
      The benefits of the new scheduler code over the existing one are:
      
      1. A more rigorous design with fewer unjustified heuristics.
      
      2. A thread's runtime measurement correctly survives a migration to a
      different CPU, unlike the existing code (which sometimes botches
      it up, leading to threads hanging). In particular, tst-loadbalance
      now gives good results for the "intermittent thread" test, unlike
      the previous code which in 50% of the runs caused one CPU to be
      completely wasted (when the load-balancing thread hung).
      
      3. The new algorithm can look at a much longer runtime history than the
      previous algorithm did. With the default tau=200ms, the one-cpu
      intermittent thread test of tst-scheduler now provides good
      fairness for sleep durations of 1ms-32ms.
      The previous algorithm was never fair in any of those tests.
      
      4. The new algorithm is more deterministic in its use of timers
      (with thyst=2_ms: up to 500 timers a second), resulting in less
      varied performance in high-context-switch benchmarks like tst-ctxsw.
      
      This scheduler does very well on the fairness tests tst-scheduler and
      fairly well on tst-loadbalance. Even better performance on that second
      test will require an additional patch for the idle thread to wake other
      cpus' load balancing threads.
      
      As expected the new scheduler is somewhat slower than the existing one
      (as we now do some relatively complex calculations instead of trivial
      integer operations), but thanks to using approximations when possible
      and to various other optimizations, the difference is relatively small:
      
      On my laptop, tst-ctxsw.so, which measures "context switch" time (actually,
      also including the time to use mutex and condvar which this test uses to
      cause context switching), on the "colocated" test I measured 355 ns with
      the old scheduler, and 382 ns with the new scheduler - meaning that the
      new scheduler adds 27ns of overhead to every context switch. To see that
      this penalty is minor, consider that tst-ctxsw is an extreme example,
      doing 3 million context switches a second, and even there it only slows
      down the workload by 7%.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      dbc0d507
    • Nadav Har'El's avatar
      sched: No need for "yield" parameter of schedule() · e1722351
      Nadav Har'El authored
      
      The schedule() and cpu::schedule() functions had a "yield" parameter.
      This parameter was inconsistently used (it's not clear why specific
      places called it with "true" and others with "false"), but moreover, was
      always ignored!
      
      So this patch removes the parameter of schedule(). If you really want
      a yield, call yield(), not schedule().
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      e1722351
    • Raphael S. Carvalho's avatar
      vfs: Add the utimes system call · 832bba6e
      Raphael S. Carvalho authored
      
      v2: Check limit of microseconds, among other minor changes (Nadav Har'El, Avi Kivity).
      v3: Get rid of goto & label by adding an else clause (Nadav Har'El).
      
      - This patch adds utimes support.
      - This patch addresses the issue #93
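
The microseconds check mentioned in the v2 note can be sketched as follows. This is an illustration, not the code from the patch: the helper names are hypothetical, and the range assumed is the conventional one for `struct timeval`, where tv_usec lies in [0, 1000000).

```cpp
#include <sys/time.h>

// A timeval is well formed when its microsecond part is a proper
// fraction of a second.
inline bool timeval_in_range(const timeval& tv) {
    return tv.tv_usec >= 0 && tv.tv_usec < 1000000;
}

// utimes() takes two timevals: access time, then modification time.
// A null pointer means "use the current time" and needs no checking.
inline bool utimes_args_valid(const timeval* times) {
    if (!times) {
        return true;
    }
    return timeval_in_range(times[0]) && timeval_in_range(times[1]);
}
```

A caller would typically reject the request with EINVAL when this check fails, before touching the vnode.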
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      832bba6e
    • Raphael S. Carvalho's avatar
      vfs: Unify attribute flags into a common place · 1519d3d1
      Raphael S. Carvalho authored
      
      Attribute flags were moved from 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h'
      to 'include/osv/vnode_attr.h'
      
      'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' now includes 'include/osv/vnode_attr.h'
      exactly at the place the flags were previously located.
      
      'fs/vfs/vfs.h' includes 'include/osv/vnode_attr.h' as functions that rely on the setattr
      feature must specify the flags respective to the attr fields that are going to be changed.
      
      Approach suggested by Nadav Har'El
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      1519d3d1
    • Nadav Har'El's avatar
      Warn about incorrect use of percpu<> / PERCPU(..). · 8add1b91
      Nadav Har'El authored
      
      This patch causes incorrect usage of percpu<>/PERCPU() to cause
      compilation errors instead of silent runtime corruptions.
      
      Thanks to Dmitry for first noticing this issue in xen_intr.cc (see his
      separate patch), and to Avi for suggesting a compile-time fix.
      
      With this patch:
      
      1. Using percpu<...> to *define* a per-cpu variable fails compilation.
         Instead, PERCPU(...) must be used for the definition, which is important
         because it places the variable in the ".percpu" section.
      
      2. If a *declaration* is needed additionally (e.g., for a static class
         member), percpu<...> must be used, not PERCPU().
         Trying to use PERCPU() for declaration will cause a compilation error.
      
      3. PERCPU() only works on statically-constructed objects - global variables,
         static function-variables and static class-members. Trying to use it
         on a dynamically-constructed object - stack variable, class field,
         or operator new - will cause a compilation error.
      
      With this patch, the bug in xen_intr.cc would have been caught at
      compile time.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8add1b91
  7. Nov 25, 2013
    • Pekka Enberg's avatar
      mmu: Anonymous memory demand paging · c1d5fccb
      Pekka Enberg authored
      
      Switch to demand paging for anonymous virtual memory.
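
From an application's point of view, demand paging means an anonymous mapping costs no physical memory until a page is first accessed. The sketch below uses the standard mmap() interface to show this; the function name `touch_every_other_page` is made up, and nothing here is taken from the patch itself.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Map `pages` anonymous pages and touch every other one. With demand
// paging, only the touched pages need to be backed by physical memory;
// the first access to each faults it in, zero-filled.
// Returns the number of pages touched, or -1 on failure.
inline long touch_every_other_page(std::size_t pages, std::size_t page_size) {
    const std::size_t len = pages * page_size;
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    auto* bytes = static_cast<volatile unsigned char*>(p);
    long touched = 0;
    bool zero_filled = true;
    for (std::size_t i = 0; i < pages; i += 2) {
        if (bytes[i * page_size] != 0) {
            zero_filled = false;  // anonymous memory must read as zero
        }
        bytes[i * page_size] = 1;
        ++touched;
    }
    munmap(p, len);
    return zero_filled ? touched : -1;
}
```

For example, `touch_every_other_page(8, 4096)` maps eight pages but touches only four of them.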
      
      I used SPECjvm2008 to verify performance impact. The numbers are mostly
      the same with a few exceptions, most visible in the 'serial' benchmark.
      However, there's quite a lot of variance between SPECjvm2008 runs so I
      wouldn't read too much into them.
      
      As we need the demand paging mechanism and the performance numbers
      suggest that the implementation is reasonable, I'd merge the patch as-is
      and optimize it later.
      
        Before:
      
          Running specJVM2008 benchmarks on an OSV guest.
          Score on compiler.compiler: 331.23 ops/m
          Score on compiler.sunflow: 131.87 ops/m
          Score on compress: 118.33 ops/m
          Score on crypto.aes: 41.34 ops/m
          Score on crypto.rsa: 204.12 ops/m
          Score on crypto.signverify: 196.49 ops/m
          Score on derby: 170.12 ops/m
          Score on mpegaudio: 70.37 ops/m
          Score on scimark.fft.large: 36.68 ops/m
          Score on scimark.lu.large: 13.43 ops/m
          Score on scimark.sor.large: 22.29 ops/m
          Score on scimark.sparse.large: 29.35 ops/m
          Score on scimark.fft.small: 195.19 ops/m
          Score on scimark.lu.small: 233.95 ops/m
          Score on scimark.sor.small: 90.86 ops/m
          Score on scimark.sparse.small: 64.11 ops/m
          Score on scimark.monte_carlo: 145.44 ops/m
          Score on serial: 94.95 ops/m
          Score on sunflow: 73.24 ops/m
          Score on xml.transform: 207.82 ops/m
          Score on xml.validation: 343.59 ops/m
      
        After:
      
          Score on compiler.compiler: 346.78 ops/m
          Score on compiler.sunflow: 132.58 ops/m
          Score on compress: 116.05 ops/m
          Score on crypto.aes: 40.26 ops/m
          Score on crypto.rsa: 206.67 ops/m
          Score on crypto.signverify: 194.47 ops/m
          Score on derby: 175.22 ops/m
          Score on mpegaudio: 76.18 ops/m
          Score on scimark.fft.large: 34.34 ops/m
          Score on scimark.lu.large: 15.00 ops/m
          Score on scimark.sor.large: 24.80 ops/m
          Score on scimark.sparse.large: 33.10 ops/m
          Score on scimark.fft.small: 168.67 ops/m
          Score on scimark.lu.small: 236.14 ops/m
          Score on scimark.sor.small: 110.77 ops/m
          Score on scimark.sparse.small: 121.29 ops/m
          Score on scimark.monte_carlo: 146.03 ops/m
          Score on serial: 87.03 ops/m
          Score on sunflow: 77.33 ops/m
          Score on xml.transform: 205.73 ops/m
          Score on xml.validation: 351.97 ops/m
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      c1d5fccb
    • Pekka Enberg's avatar
      mmu: VMA permission flags · 8a56dc8c
      Pekka Enberg authored
      
      Add permission flags to VMAs. They will be used by mprotect() and the
      page fault handler.
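
As a rough illustration of how per-VMA permission flags might be consulted, consider the sketch below. The type and function names (`vma_perm`, `vma`, `fault_allowed`, `protect`) are hypothetical and chosen for this example; they are not the identifiers introduced by the patch.

```cpp
#include <cstdint>

// Hypothetical permission bits, combinable with bitwise OR.
enum vma_perm : unsigned {
    perm_read  = 1u << 0,
    perm_write = 1u << 1,
    perm_exec  = 1u << 2,
};

// A virtual memory area covering [start, end).
struct vma {
    std::uintptr_t start;
    std::uintptr_t end;
    unsigned perm;
};

// What a page fault handler could ask: does this VMA cover the faulting
// address, and does it grant every access right being requested?
inline bool fault_allowed(const vma& v, std::uintptr_t addr, unsigned wanted) {
    const bool inside = addr >= v.start && addr < v.end;
    return inside && (v.perm & wanted) == wanted;
}

// What an mprotect()-style call could do: replace the stored flags.
inline void protect(vma& v, unsigned new_perm) {
    v.perm = new_perm;
}
```

A write fault on a read-only area would be refused until `protect()` adds `perm_write`.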
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8a56dc8c