  1. Nov 26, 2013
    • libc: implement the GNU variant of strerror_r() · 7053ac3a
      Avi Kivity authored
      
      We previously had the POSIX variant only.  Implement the GNU variant as well,
      and update the header to point to the correct function based on the dialect
      selected.
      
      The POSIX variant is renamed to __xpg_strerror_r() to conform to the ABI
      standards.
      
      This fixes calls to strerror_r() from binaries which were compiled with
      _GNU_SOURCE (libboost_system.a) but preserves the correct behaviour for
      BSD-derived source.
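
      As a rough illustration only (a sketch of the dialect split, not OSv's
      actual libc header), the two variants and the dispatch look like this:

        #include <stddef.h>

        #ifdef _GNU_SOURCE
        /* GNU dialect: returns a pointer to the message, which may be buf or
           an immutable static string. */
        char *strerror_r(int errnum, char *buf, size_t buflen);
        #else
        /* POSIX dialect: writes the message into buf and returns 0 or an
           error number.  The POSIX entry point keeps its ABI name: */
        int __xpg_strerror_r(int errnum, char *buf, size_t buflen);
        #define strerror_r __xpg_strerror_r
        #endif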
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • build: source dialect control · 3e1e86c4
      Avi Kivity authored
      
      Some functions (strerror_r()) are defined differently based on the source
      dialect.  We need to provide both dialects since we have mixed source.
      
      Add a source-dialect macro (defaulting to _GNU_SOURCE) and override it
      as appropriate.
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • sched: Doxygen documentation of a bit of the scheduler · 6f825816
      Nadav Har'El authored
      
      Started adding Doxygen documentation for the scheduler. Currently
      only set_priority() and priority() are documented.
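
      For flavour, a hypothetical sketch of the Doxygen style (not the exact
      comment text that was added; the parameter type shown is illustrative):

        /**
         * \brief Change this thread's scheduling priority.
         *
         * (Sketch only; the wording and details of the real comment differ.)
         * \param priority the new priority value for this thread.
         */
        void set_priority(float priority);

        /** \brief Return this thread's current scheduling priority. */
        float priority() const;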
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • sched: New scheduler algorithm · dbc0d507
      Nadav Har'El authored
      This patch replaces the algorithm which the scheduler uses to keep track of
      threads' runtime, and to choose which thread to run next and for how long.
      
      The previous algorithm used the raw cumulative runtime of a thread as its
      runtime measure. But comparing these numbers directly was impossible: e.g.,
      should a thread that slept for an hour now get an hour of uninterrupted CPU
      time? This resulted in a hodgepodge of heuristics which "modified" and
      "fixed" the runtime. These heuristics did work quite well in our test cases,
      but we were forced to add more and more unjustified heuristics and constants
      to fix scheduling bugs as they were discovered. The existing scheduler was
      especially problematic with thread migration (moving a thread from one CPU
      to another) as the runtime measure on one CPU was meaningless in another.
      This bug, if not corrected (e.g., by the patch which I sent a month
      ago), could cause crucial threads to acquire exceedingly high runtimes by
      mistake, and resulted in the tst-loadbalance test using only one CPU on
      a two-CPU guest.
      
      The new scheduling algorithm follows a much more rigorous design,
      proposed by Avi Kivity in:
      https://docs.google.com/document/d/1W7KCxOxP-1Fy5EyF2lbJGE2WuKmu5v0suYqoHas1jRM/edit?usp=sharing
      
      
      
      To make a long story short (read the document if you want all the
      details), the new algorithm is based on a runtime measure R which
      is the running decaying average of the thread's running time.
      It is a decaying average in the sense that the thread's act of running or
      sleeping in recent history is given more weight than its behavior
      a long time ago. This measure R can tell us which of the runnable
      threads to run next (the one with the lowest R), and using some
      highschool-level mathematics, we can calculate for how long to run
      this thread until it should be preempted by the next one. R carries
      the same meaning on all CPUs, so CPU migration becomes trivial.
      
      The actual implementation uses a normalized version of R, called R''
      (Rtt in the code), which is also explained in detail in the document.
      This Rtt allows updating just the running thread's runtime - not all
      threads' runtime - as time passes, making the whole calculation much
      more tractable.
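
      As a rough illustration only (hypothetical names; the real code works
      with the normalized Rtt and uses approximations), an exponentially
      decaying runtime average can be maintained like this:

        #include <cmath>

        struct runtime_measure {
            static constexpr double tau = 0.2;  // decay time constant (seconds)
            double R = 0;                       // decaying average of running time

            // Account for an interval of dt seconds in which the thread either
            // ran continuously or slept continuously.
            void update(double dt, bool running) {
                double decay = std::exp(-dt / tau);
                R *= decay;                     // old history fades away
                if (running)
                    R += tau * (1 - decay);     // recent running time is added
            }
        };

        // Among runnable threads, the one with the lowest R runs next.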
      
      The benefits of the new scheduler code over the existing one are:
      
      1. A more rigorous design with fewer unjustified heuristics.
      
      2. A thread's runtime measurement correctly survives a migration to a
      different CPU, unlike the existing code (which sometimes botches
      it up, leading to threads hanging). In particular, tst-loadbalance
      now gives good results for the "intermittent thread" test, unlike
      the previous code which in 50% of the runs caused one CPU to be
      completely wasted (when the load-balancing thread hung).
      
      3. The new algorithm can look at a much longer runtime history than the
      previous algorithm did. With the default tau=200ms, the one-cpu
      intermittent thread test of tst-scheduler now provides good
      fairness for sleep durations of 1ms-32ms.
      The previous algorithm was never fair in any of those tests.
      
      4. The new algorithm is more deterministic in its use of timers
      (with thyst=2_ms: up to 500 timers a second), resulting in less
      varied performance in high-context-switch benchmarks like tst-ctxsw.
      
      This scheduler does very well on the fairness tests tst-scheduler and
      fairly well on tst-loadbalance. Even better performance on that second
      test will require an additional patch for the idle thread to wake other
      CPUs' load-balancing threads.
      
      As expected the new scheduler is somewhat slower than the existing one
      (as we now do some relatively complex calculations instead of trivial
      integer operations), but thanks to using approximations when possible
      and to various other optimizations, the difference is relatively small:
      
      On my laptop, tst-ctxsw.so measures "context switch" time (actually also
      including the time spent on the mutex and condvar which this test uses to
      cause context switching). On the "colocated" test I measured 355 ns with
      the old scheduler and 382 ns with the new scheduler, meaning that the
      new scheduler adds 27 ns of overhead to every context switch. To see that
      this penalty is minor, consider that tst-ctxsw is an extreme example,
      doing 3 million context switches a second, and even there it only slows
      down the workload by 7%.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • sched: No need for "yield" parameter of schedule() · e1722351
      Nadav Har'El authored
      
      The schedule() and cpu::schedule() functions had a "yield" parameter.
      This parameter was inconsistently used (it's not clear why specific
      places called it with "true" and other with "false"), but moreover, was
      always ignored!
      
      So this patch removes the parameter of schedule(). If you really want
      a yield, call yield(), not schedule().
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • sched: Use schedule(), not yield() in idle thread · da583f27
      Nadav Har'El authored
      
      The idle thread cpu::idle() waits for other threads to become runnable,
      and then lets them run. It used to yield the CPU by calling yield(),
      because in early OSv history we didn't have an idle priority so simply
      calling schedule() would not guarantee that the new thread, not the idle
      thread, will run.
      
      But now we actually do have an idle priority; if the run queue is not
      empty, we are sure that calling schedule() will run another thread,
      not the idle thread. So this patch calls schedule(), which is simpler,
      faster, and more reliable than yield().
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • sched: Don't change runtime of a queued thread · e60ebaf3
      Nadav Har'El authored
      
      The scheduler (reschedule_from_interrupt()) changes the runtime of the
      current thread. This assumes that the current thread is not in the
      runqueue - because the runqueue is sorted by runtime, and modifying the
      runtime of a thread which is already in the runqueue ruins the sorted
      tree's invariants.
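
      The invariant at stake, in generic terms (an illustrative sketch, not
      OSv's actual runqueue type):

        #include <set>

        struct thread { double runtime; };
        struct by_runtime {
            bool operator()(const thread* a, const thread* b) const {
                return a->runtime < b->runtime;
            }
        };
        using runqueue_t = std::multiset<thread*, by_runtime>;

        // A sorted container keyed on runtime assumes the key never changes
        // while an element is linked in, so the runtime may only be modified
        // while the thread is unlinked:
        void add_runtime(runqueue_t& rq, runqueue_t::iterator it, double delta) {
            thread* t = *it;
            rq.erase(it);          // unlink first...
            t->runtime += delta;   // ...then it is safe to modify the key...
            rq.insert(t);          // ...and re-insert at the right position.
        }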
      
      Unfortunately, the existing code broke this assumption in two places:
      
      1.  When handle_incoming_wakeups() wakes up the current thread (i.e., a
      thread that prepared to wait but was woken before it could go to sleep),
      the current thread was queued. Instead, we need to simply return
      the thread to the "running" state.
      
      2.  yield() queued the current thread. Rather, it needs to just change
      its runtime, and reschedule_from_interrupt() will decide to queue this
      thread.
      
      This patch fixes the first problem. The second problem will be solved
      by a yield() rewrite which is part of the new scheduler in a later
      patch.
      
      By the way, after we fix both problems, we can also be sure that the
      strange if(n != thread::current()) in the scheduler is always true.
      This is because n, picked up from the run queue, could never be the
      current thread, because the current thread isn't in the run queue.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
    • test.py: add utimes.so · 64d97a06
      Pekka Enberg authored
      
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • tests: Add tst-utimes · 2570b30c
      Raphael S. Carvalho authored
      
      v2: Let's convert everything to std::chrono::timepoint (Avi Kivity)
      v3: Use the to_timeptr approach suggested by Nadav Har'El
      
      This test checks the functionality of the utimes support.
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • vfs: Add the utimes system call · 832bba6e
      Raphael S. Carvalho authored
      
      v2: Check limit of microseconds, among other minor changes (Nadav Har'El, Avi Kivity).
      v3: Get rid of goto & label by adding an else clause (Nadav Har'El).
      
      - This patch adds utimes support.
      - This patch addresses issue #93.
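
      For reference, a minimal usage sketch of the interface being added (a
      generic libc-level example; the file name is made up, and the [0, 999999]
      microsecond range is the usual POSIX limit):

        #include <sys/time.h>
        #include <stdio.h>

        int main() {
            struct timeval times[2];
            times[0].tv_sec = 1000000; times[0].tv_usec = 0;       // access time
            times[1].tv_sec = 1000000; times[1].tv_usec = 500000;  // modification time
            // tv_usec must stay within [0, 999999]; otherwise the call fails.
            if (utimes("/tmp/somefile", times) != 0)
                perror("utimes");
            return 0;
        }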
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • vfs: Unify attribute flags into a common place · 1519d3d1
      Raphael S. Carvalho authored
      
      Attribute flags were moved from 'bsd/sys/cddl/compat/opensolaris/sys/vnode.h'
      to 'include/osv/vnode_attr.h'
      
      'bsd/sys/cddl/compat/opensolaris/sys/vnode.h' now includes 'include/osv/vnode_attr.h'
      exactly at the place the flags were previously located.
      
      'fs/vfs/vfs.h' includes 'include/osv/vnode_attr.h' because functions that rely on the setattr
      feature must specify the flags corresponding to the attr fields that are going to be changed.
      
      Approach suggested by Nadav Har'El.
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • devfs/ramfs: Change vop_null to vop_eperm · fe3c7df0
      Raphael S. Carvalho authored
      
      Use vop_eperm instead to warn the caller about the lack of support (Glauber Costa).
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Tested-by: Tomasz Grabiec <tgrabiec@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • Warn about incorrect use of percpu<> / PERCPU(..). · 8add1b91
      Nadav Har'El authored
      
      This patch causes incorrect usage of percpu<>/PERCPU() to cause
      compilation errors instead of silent runtime corruptions.
      
      Thanks to Dmitry for first noticing this issue in xen_intr.cc (see his
      separate patch), and to Avi for suggesting a compile-time fix.
      
      With this patch:
      
      1. Using percpu<...> to *define* a per-cpu variable fails compilation.
         Instead, PERCPU(...) must be used for the definition, which is important
         because it places the variable in the ".percpu" section.
      
      2. If a *declaration* is needed additionally (e.g., for a static class
         member), percpu<...> must be used, not PERCPU().
         Trying to use PERCPU() for declaration will cause a compilation error.
      
      3. PERCPU() only works on statically-constructed objects - global variables,
         static function-variables and static class-members. Trying to use it
         on a dynamically-constructed object - stack variable, class field,
         or operator new - will cause a compilation error.
      
      With this patch, the bug in xen_intr.cc would have been caught at
      compile time.
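
      A minimal usage sketch (the variable is hypothetical, and the exact
      macro spelling should be checked against OSv's percpu header):

        #include <osv/percpu.hh>   // assumed location of percpu<>/PERCPU()

        // In a header, a *declaration* uses percpu<> (e.g. a static class member):
        //     static percpu<long> packets_received;

        // In exactly one .cc file, the *definition* must use PERCPU() so the
        // object is placed in the ".percpu" section:
        PERCPU(long, packets_received);

        // PERCPU() only works for statically-constructed objects; using it on
        // a stack variable, a class field or operator new fails to compile.
        void count_packet()
        {
            ++(*packets_received);   // access the current CPU's copy
        }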
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • xen: move per-cpu interrupt threads to .percpu section · 63d2e472
      Dmitry Fleytman authored
      
      The bug fixed by this patch made OSv crash on Xen during boot.
      The problem started to show up after commit:
      
        commit ed808267
        Author: Nadav Har'El <nyh@cloudius-systems.com>
        Date:   Mon Nov 18 23:01:09 2013 +0200
      
            percpu: Reduce size of .percpu section
      
      Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
  2. Nov 25, 2013
    • release-ec2: Introduce image and version parameters · cf482bcc
      Dmitry Fleytman authored
      
      This feature will be used to release images
      with preinstalled applications.
      
      Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • Start up shell and management web in parallel · c29222c6
      Amnon Heiman authored
      
      Start up shell and management web in parallel to make boot faster.  Note
      that we also switch to latest mgmt.git which decouples JRuby and CRaSH
      startup.
      
      Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • java: Support for loading multiple mains · 10d6f18b
      Amnon Heiman authored
      
      When using the MultiJarLoader as the main class, it will use a
      configuration file for the java loading.  Each line in the file will be
      used to start a main; you can use -jar in each line or specify a main
      class.
      
      Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
      Reviewed-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • tests: mincore() tests for demand paging · 20aad632
      Pekka Enberg authored
      
      As suggested by Nadav, add tests for mincore() interaction with demand
      paging.
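
      The kind of check such a test makes, in sketch form (illustrative only,
      not the test code itself):

        #include <sys/mman.h>
        #include <cstddef>
        #include <cassert>

        int main() {
            size_t len = 4096;
            void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            unsigned char vec[1];

            // With demand paging, a freshly mmap'd page is not yet resident...
            mincore(p, len, vec);
            assert(!(vec[0] & 1));

            // ...but becomes resident once it is touched and faulted in.
            static_cast<char*>(p)[0] = 1;
            mincore(p, len, vec);
            assert(vec[0] & 1);

            munmap(p, len);
            return 0;
        }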
      
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • tests: Anonymous demand paging microbenchmark · d4bcf559
      Pekka Enberg authored
      
      This adds a simple mmap microbenchmark that can be run on both OSv and
      Linux.  The benchmark mmaps memory for various sizes and touches the
      mmap'd memory in 4K increments to fault in memory.  The benchmark also
      repeats the same tests using MAP_POPULATE for reference.
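
      Roughly, the measured loop looks like this sketch (illustrative, not the
      exact test source):

        #include <sys/mman.h>
        #include <chrono>
        #include <cstdio>

        // Map len bytes, touch every 4K page to fault the memory in, and
        // return the elapsed time in seconds.
        static double touch_pages(size_t len, int extra_flags) {
            auto start = std::chrono::steady_clock::now();
            char* p = static_cast<char*>(mmap(nullptr, len, PROT_READ | PROT_WRITE,
                                              MAP_PRIVATE | MAP_ANONYMOUS | extra_flags,
                                              -1, 0));
            for (size_t off = 0; off < len; off += 4096)
                p[off] = 1;                          // one write per page
            munmap(p, len);
            return std::chrono::duration<double>(
                std::chrono::steady_clock::now() - start).count();
        }

        int main() {
            for (size_t mib = 1; mib <= 1024; mib *= 2) {
                size_t len = mib << 20;
                std::printf("%5zu %.3f  %.3f\n", mib,
                            touch_pages(len, 0),             // demand paging
                            touch_pages(len, MAP_POPULATE)); // pre-populated
            }
            return 0;
        }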
      
      OSv page faults are slightly slower than Linux on the first iteration but
      faster on subsequent iterations, after the host operating system has
      faulted in memory for the guest.
      
      I've included full numbers on a 2-core Sandy Bridge i7 for an OSv guest, a
      Linux guest, and the Linux host below:
      
        OSv guest
        ---------
      
        Iteration 1
      
             time (seconds)
         MiB demand populate
           1 0.004  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.003  0.000
          32 0.007  0.000
          64 0.013  0.000
         128 0.024  0.000
         256 0.052  0.001
         512 0.229  0.002
        1024 0.587  0.005
      
        Iteration 2
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.019  0.001
         256 0.036  0.001
         512 0.069  0.002
        1024 0.137  0.005
      
        Iteration 3
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.020  0.000
         256 0.039  0.001
         512 0.087  0.002
        1024 0.138  0.005
      
        Iteration 4
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.012  0.000
         128 0.025  0.001
         256 0.040  0.001
         512 0.082  0.002
        1024 0.138  0.005
      
        Iteration 5
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.012  0.000
         128 0.028  0.001
         256 0.040  0.001
         512 0.082  0.002
        1024 0.166  0.005
      
        Linux guest
        -----------
      
        Iteration 1
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.001  0.000
           4 0.002  0.000
           8 0.003  0.000
          16 0.005  0.000
          32 0.008  0.000
          64 0.015  0.000
         128 0.151  0.001
         256 0.090  0.001
         512 0.266  0.003
        1024 0.401  0.006
      
        Iteration 2
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.009  0.000
         128 0.019  0.001
         256 0.037  0.001
         512 0.072  0.003
        1024 0.144  0.006
      
        Iteration 3
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.019  0.001
         256 0.037  0.001
         512 0.072  0.003
        1024 0.143  0.006
      
        Iteration 4
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.003  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.020  0.001
         256 0.038  0.001
         512 0.073  0.003
        1024 0.143  0.006
      
        Iteration 5
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.003  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.020  0.001
         256 0.037  0.001
         512 0.072  0.003
        1024 0.144  0.006
      
        Linux host
        ----------
      
        Iteration 1
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.009  0.000
         128 0.019  0.001
         256 0.035  0.001
         512 0.152  0.003
        1024 0.286  0.011
      
        Iteration 2
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.035  0.001
         512 0.192  0.003
        1024 0.334  0.011
      
        Iteration 3
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.035  0.001
         512 0.194  0.003
        1024 0.329  0.011
      
        Iteration 4
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.036  0.001
         512 0.138  0.003
        1024 0.341  0.011
      
        Iteration 5
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.035  0.001
         512 0.135  0.002
        1024 0.324  0.011
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • mmu: Anonymous memory demand paging · c1d5fccb
      Pekka Enberg authored
      
      Switch to demand paging for anonymous virtual memory.
      
      I used SPECjvm2008 to verify the performance impact. The numbers are mostly
      the same with a few exceptions, most visibly in the 'serial' benchmark.
      However, there's quite a lot of variance between SPECjvm2008 runs, so I
      wouldn't read too much into them.
      
      As we need the demand paging mechanism and the performance numbers
      suggest that the implementation is reasonable, I'd merge the patch as-is
      and optimize it later.
      
        Before:
      
          Running specJVM2008 benchmarks on an OSV guest.
          Score on compiler.compiler: 331.23 ops/m
          Score on compiler.sunflow: 131.87 ops/m
          Score on compress: 118.33 ops/m
          Score on crypto.aes: 41.34 ops/m
          Score on crypto.rsa: 204.12 ops/m
          Score on crypto.signverify: 196.49 ops/m
          Score on derby: 170.12 ops/m
          Score on mpegaudio: 70.37 ops/m
          Score on scimark.fft.large: 36.68 ops/m
          Score on scimark.lu.large: 13.43 ops/m
          Score on scimark.sor.large: 22.29 ops/m
          Score on scimark.sparse.large: 29.35 ops/m
          Score on scimark.fft.small: 195.19 ops/m
          Score on scimark.lu.small: 233.95 ops/m
          Score on scimark.sor.small: 90.86 ops/m
          Score on scimark.sparse.small: 64.11 ops/m
          Score on scimark.monte_carlo: 145.44 ops/m
          Score on serial: 94.95 ops/m
          Score on sunflow: 73.24 ops/m
          Score on xml.transform: 207.82 ops/m
          Score on xml.validation: 343.59 ops/m
      
        After:
      
          Score on compiler.compiler: 346.78 ops/m
          Score on compiler.sunflow: 132.58 ops/m
          Score on compress: 116.05 ops/m
          Score on crypto.aes: 40.26 ops/m
          Score on crypto.rsa: 206.67 ops/m
          Score on crypto.signverify: 194.47 ops/m
          Score on derby: 175.22 ops/m
          Score on mpegaudio: 76.18 ops/m
          Score on scimark.fft.large: 34.34 ops/m
          Score on scimark.lu.large: 15.00 ops/m
          Score on scimark.sor.large: 24.80 ops/m
          Score on scimark.sparse.large: 33.10 ops/m
          Score on scimark.fft.small: 168.67 ops/m
          Score on scimark.lu.small: 236.14 ops/m
          Score on scimark.sor.small: 110.77 ops/m
          Score on scimark.sparse.small: 121.29 ops/m
          Score on scimark.monte_carlo: 146.03 ops/m
          Score on serial: 87.03 ops/m
          Score on sunflow: 77.33 ops/m
          Score on xml.transform: 205.73 ops/m
          Score on xml.validation: 351.97 ops/m
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • mmu: Optimistic locking in populate() · 7e568ba0
      Pekka Enberg authored
      
      Use optimistic locking in populate() to make it robust against
      concurrent page faults.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • mmu: VMA permission flags · 8a56dc8c
      Pekka Enberg authored
      
      Add permission flags to VMAs. They will be used by mprotect() and the
      page fault handler.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • loader.py: add commands for function duration analysis · af723084
      Tomasz Grabiec authored
      
      Duration analysis is based on trace pairs which follow the convention
      in which function entry generates a trace named X and function exit
      generates either X_ret or X_err. Traces which do not have an accompanying
      return tracepoint are ignored.
      
      New commands:
      
        osv trace summary
      
            Prints execution time statistics for traces
      
        osv trace duration {function}
      
            Prints timed traces sorted by duration in descending order.
            Optionally narrowed down to a specified function
      
      gdb$ osv trace summary
      Execution times [ms]:
      name          count      min      50%      90%      99%    99.9%      max    total
      vfs_pwritev       3    0.682    1.042    1.078    1.078    1.078    1.078    2.801
      vfs_pwrite       32    0.006    1.986    3.313    6.816    6.816    6.816   53.007
      
      gdb$ osv trace duration
      0xffffc000671f0010  1    1385318632.103374   6.816 vfs_pwrite
      0xffffc0003bbef010  0    1385318637.929424   3.923 vfs_pwrite
      
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • loader.py: add wrapper for intrusive list · 6cc939a6
      Tomasz Grabiec authored
      
      The iteration logic was duplicated in two places. The patches yet to
      come would add yet another place, so let's refactor first.
      
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • libc/network: feof shouldn't be used on a closed file · df6278fe
      Raphael S. Carvalho authored
      
      Calling feof on a closed file isn't safe, and the result is undefined.
      Found while auditing the code.
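
      The general shape of the problem, for illustration (not the exact code
      that was fixed):

        #include <stdio.h>

        int read_all(const char* path) {
            FILE* f = fopen(path, "r");
            if (!f)
                return -1;
            char buf[256];
            while (fgets(buf, sizeof(buf), f)) {
                /* consume the line */
            }
            // The end-of-file status must be checked while the stream is still
            // open; calling feof() after fclose() is undefined behaviour.
            int hit_eof = feof(f);
            fclose(f);
            return hit_eof ? 0 : -1;
        }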
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • sched: fix iteration across timer list · 9c3308f1
      Avi Kivity authored
      
      We iterate over the timer list using an iterator, but the timer list can
      change during iteration due to timers being re-inserted.
      
      Switch to just looking at the head of the list instead, maintaining no
      state across loop iterations.
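
      The shape of the fix, roughly (a sketch of the pattern, not the actual
      OSv timer code):

        #include <list>

        struct timer {
            long when;                      // expiry time
            void (*callback)(timer&);       // may re-arm and re-insert the timer
            bool expired(long now) const { return when <= now; }
        };

        // Always look at the current head of the time-ordered list and keep no
        // iterator state across iterations, so a callback that re-inserts a
        // timer cannot invalidate anything we hold.
        void expire_timers(std::list<timer*>& timers, long now) {
            while (!timers.empty() && timers.front()->expired(now)) {
                timer* t = timers.front();
                timers.pop_front();         // unlink before firing
                t->callback(*t);            // the callback may re-insert the timer
            }
        }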
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • sched: prevent a re-armed timer from being ignored · 870d8410
      Avi Kivity authored
      
      When a hardware timer fires, we walk over the timer list, expiring timers
      and erasing them from the list.
      
      This is all well and good, except that a timer may rearm itself in its
      callback (this only holds for timer_base clients, not sched::timer, which
      consumes its own callback).  If it does, we end up erasing it even though
      it wants to be triggered.
      
      Fix by checking for the armed state before erasing.
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • Fix possible deadlock in condvar · 15a32ac8
      Nadav Har'El authored
      
      When a condvar's timeout and wakeup race, we wait for the concurrent
      wakeup to complete, so it won't crash. We did this wr.wait() with
      the condvar's internal mutex (m) locked, which was fine when this code
      was written; But now that we have wait morphing, wr.wait() waits not
      just for the wakeup to complete, but also for the user_mutex to become
      available. With m locked and us waiting for user_mutex, we're now in
      deadlock territory - because a common idiom of using a condvar is to
      do the locks in opposite order: lock user_mutex first and then use the
      condvar, which locks m.
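
      The idiom in question, schematically (shown with the POSIX API for
      familiarity; the names are illustrative):

        #include <pthread.h>

        pthread_mutex_t user_mutex = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t  cv         = PTHREAD_COND_INITIALIZER;
        bool ready = false;

        void* waiter(void*) {
            // The user's mutex is taken first; the condvar's own internal lock
            // (OSv's "m") is only taken inside the wait call.  The timeout path
            // described above holds m and, with wait morphing, also waits for
            // user_mutex - the opposite acquisition order, hence the deadlock risk.
            pthread_mutex_lock(&user_mutex);
            while (!ready)
                pthread_cond_wait(&cv, &user_mutex);
            pthread_mutex_unlock(&user_mutex);
            return nullptr;
        }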
      
      I can't think of an easy way to actually demonstrate this deadlock,
      short of having a locked condvar_wait timeout racing with a concurrent
      condvar_wake_one and an additional locked condvar operation coming in
      at the same time, so I don't have a test case demonstrating this.
      I am hoping it will fix the lockups that Pekka is seeing in his
      Cassandra tests (which are the reason I looked for possible condvar
      deadlocks in the first place).
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • sched: delay initialization of early threads · d91d7799
      Glauber Costa authored
      
      The problem with sleep is that we can initialize early threads before the
      cpu itself is initialized. If we note what goes on in init_on_cpu, it should
      become clear:
      
      void cpu::init_on_cpu()
      {
          arch.init_on_cpu();
          clock_event->setup_on_cpu();
      }
      
      When we finally initialize the clock_event, it can get lost if we already have
      pending timers of any kind - which we may, if we have early threads being
      start()ed before that. I have played with many potential solutions, but in the
      end, I think the most sensible thing to do is to delay initialization of early
      threads to the point when we are first idle. That is the best way to guarantee
      that everything will be properly initialized and running.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>