  1. May 21, 2014
  2. May 16, 2014
  3. May 14, 2014
  4. May 09, 2014
  5. May 05, 2014
  6. Apr 11, 2014
    • loader: power off when command line is invalid · 129ac4bb
      Nadav Har'El authored
      
      Currently, in several cases when a bad command line is set in the image,
      such as an empty command line (as in "make image=empty") or one with
      invalid parameters (e.g., run.py -e "-a a"), we use abort(). abort() has two
      annoying "features": it hangs the VM forever, and it shows an ugly stack
      trace. Both are useful for debugging, but it doesn't make sense to use
      a debugger when just the command line is misconfigured; we just need to
      print a message and power off the VM.
      
      Calling osv::poweroff() this early during boot is fine after
      the previous patch, which fixed osv::poweroff().
      
      By the way, running a non-existent file (e.g., 'run.py -e a') already
      had this correct behavior of powering off rather than hanging.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      129ac4bb
  7. Apr 02, 2014
    • aarch64: install simple vectors and get the cmdline · 3751c5b1
      Claudio Fontana authored
      
      Get the command line and the ELF header start, then try to
      steer clear of apparently random limitations in the early boot
      stage on the model, and set up the exception vectors as soon
      as possible, to enable some minimal post-mortem info.
      
      Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
      3751c5b1
    • Claudio Fontana — 8e776790
    • v3 RCU: Per-CPU rcu_defer() · e5fc1f1b
      Nadav Har'El authored
      
      Changes in v3, following Avi's review:
      * Use WITH_LOCK(migration_lock) instead of migrate_disable()/enable().
      * Make the global RCU "generation" counter a static class variable,
        instead of a static function variable. Rename it "next_generation"
        (the name "generation" was grossly overloaded previously).
      * In rcu_synchronize(), use migration_lock to be sure we wake up the
        thread to which we just added work.
      * Use thread_handle, instead of thread*, for percpu_quiescent_state_thread.
        This is safer (atomic variable, so we can't see it half-set on some
        esoteric CPU), and cleaner (no need to check t!=0). Thread_handle is
        a bit of an overkill here, but it's not in a performance sensitive area.
      
      The existing rcu_defer() used a global list of deferred work, protected by
      a global mutex. It also woke up the cleanup thread on every call. These
      decisions made rcu_dispose() noticeably slower than a regular delete, to the
      point that when commit 70502950 introduced
      an rcu_dispose() to every poll() call, we saw performance of UDP memcached,
      which calls poll() on every request, drop by as much as 40%.
      
      The slowness of rcu_defer() was even more apparent in an artificial benchmark
      which repeatedly calls new and rcu_dispose from one or several concurrent
      threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose
      from a single thread (on a 4 cpus VM) takes a whopping 330 ns, and worse -
      when we have 4 threads on 4 cpus in a tight new/rcu_dispose loop, the mutex
      contention, the fact we free the memory on the "wrong" cpu, and the excessive
      context switches all bring the measurement to as much as 12,000 ns.
      
      With this patch the new/rcu_dispose numbers are down to 60 ns on a single
      thread (on 4 cpus) and 111 ns on 4 concurrent threads (on 4 cpus). This is
      a 5.5x to 120x speedup :-)
      
      This patch replaces the single list of functions with a per-cpu list.
      rcu_defer() can add more callbacks to this per-cpu list without a mutex,
      and instead of a single "garbage collection" thread running these callbacks,
      the per-cpu RCU thread, which we already had, is the one that runs the work
      deferred on this cpu's list. This per-cpu work is particularly effective
      for free() work (i.e., rcu_dispose()) because it is faster to free memory
      on the same CPU where it was allocated. This patch also eliminates the
      single "garbage collection" thread which the previous code needed.
      
      The per-CPU work queue has a fixed size, currently set to 2000 functions.
      It is actually a double-buffer, so we can continue to accumulate more work
      while cleaning up; if rcu_defer() is used so quickly that it outpaces the
      cleanup, rcu_defer() will wait until the buffer is no longer full.
      The choice of buffer size is a tradeoff between speed and memory: a larger
      buffer means fewer context switches (between the thread doing rcu_defer()
      and the RCU thread doing the cleanup), but also more memory temporarily
      being used by unfreed objects.
      
      Unlike the previous code, we do not wake up the cleanup thread after
      every rcu_defer(). When the RCU cleanup work is frequent but still small
      relative to the main work of the application (e.g., memcached server),
      the RCU cleanup thread would always have low runtime which meant we suffered
      a context switch on almost every wakeup of this thread by rcu_defer().
      In this patch, we only wake up the cleanup thread when the buffer becomes
      full, so we have far fewer context switches. This means that currently
      rcu_defer() may delay the cleanup an unbounded amount of time. This is
      normally not a problem, and when it is, namely in rcu_synchronize(),
      we wake up the thread immediately.
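      The double-buffer scheme described above can be sketched roughly as
      follows. This is a simplified, single-threaded illustration with
      invented names and a tiny buffer, not OSv's actual code: in OSv the
      draining happens asynchronously on the per-CPU RCU thread after a
      grace period, and the real buffer holds 2000 entries.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <functional>
      #include <iostream>
      #include <utility>
      #include <vector>

      // Hypothetical sketch of the per-CPU double-buffered deferred-work
      // queue: one side accepts new callbacks while the other is drained.
      struct percpu_defer_queue {
          static constexpr std::size_t max_size = 3;   // OSv uses 2000
          std::vector<std::function<void()>> buf[2];   // double buffer
          int active = 0;                              // side accepting new work
          int wakeups = 0;                             // times we "woke" the cleaner

          // rcu_defer(): enqueue without waking the cleanup thread...
          void defer(std::function<void()> f) {
              buf[active].push_back(std::move(f));
              // ...except when the active side fills up: swap sides so
              // callers can keep accumulating work, and hand the full side
              // to the cleaner (stand-in for waking the per-CPU RCU thread).
              if (buf[active].size() == max_size) {
                  active ^= 1;
                  ++wakeups;
                  flush(active ^ 1);
              }
          }

          // Run and discard one side's deferred callbacks (in the real
          // implementation this happens only after a grace period).
          void flush(int side) {
              for (auto& f : buf[side]) f();
              buf[side].clear();
          }
      };

      int main() {
          percpu_defer_queue q;
          int freed = 0;
          for (int i = 0; i < 7; ++i)
              q.defer([&freed]{ ++freed; });
          // Two full batches of 3 ran; the 7th callback is still pending.
          std::cout << freed << " freed, " << q.wakeups << " wakeups\n";
          assert(freed == 6 && q.wakeups == 2);
          return 0;
      }
      ```

      The design point the commit message makes is visible in defer(): the
      cleaner is notified once per full batch rather than once per call, so
      a workload doing frequent small deferrals pays far fewer context
      switches, at the cost of cleanup being delayed until a batch fills.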
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      e5fc1f1b
  8. Apr 01, 2014
    • Revert "rcu: Per-CPU rcu_defer()" · 6d68d1ab
      Avi Kivity authored
      
      This reverts commit d24cda2c.  It wants
      migration_lock to be merged first.
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      6d68d1ab
    • rcu: Per-CPU rcu_defer() · d24cda2c
      Nadav Har'El authored
      
      The existing rcu_defer() used a global list of deferred work, protected by
      a global mutex. It also woke up the cleanup thread on every call. These
      decisions made rcu_dispose() noticeably slower than a regular delete, to the
      point that when commit 70502950 introduced
      an rcu_dispose() to every poll() call, we saw performance of UDP memcached,
      which calls poll() on every request, drop by as much as 40%.
      
      The slowness of rcu_defer() was even more apparent in an artificial benchmark
      which repeatedly calls new and rcu_dispose from one or several concurrent
      threads. While on my machine a new/delete pair takes 24 ns, a new/rcu_dispose
      from a single thread (on a 4 cpus VM) takes a whopping 330 ns, and worse -
      when we have 4 threads on 4 cpus in a tight new/rcu_dispose loop, the mutex
      contention, the fact we free the memory on the "wrong" cpu, and the excessive
      context switches all bring the measurement to as much as 12,000 ns.
      
      With this patch the new/rcu_dispose numbers are down to 60 ns on a single
      thread (on 4 cpus) and 111 ns on 4 concurrent threads (on 4 cpus). This is
      a 5.5x to 120x speedup :-)
      
      This patch replaces the single list of functions with a per-cpu list.
      rcu_defer() can add more callbacks to this per-cpu list without a mutex,
      and instead of a single "garbage collection" thread running these callbacks,
      the per-cpu RCU thread, which we already had, is the one that runs the work
      deferred on this cpu's list. This per-cpu work is particularly effective
      for free() work (i.e., rcu_dispose()) because it is faster to free memory
      on the same CPU where it was allocated. This patch also eliminates the
      single "garbage collection" thread which the previous code needed.
      
      The per-CPU work queue has a fixed size, currently set to 2000 functions.
      It is actually a double-buffer, so we can continue to accumulate more work
      while cleaning up; if rcu_defer() is used so quickly that it outpaces the
      cleanup, rcu_defer() will wait until the buffer is no longer full.
      The choice of buffer size is a tradeoff between speed and memory: a larger
      buffer means fewer context switches (between the thread doing rcu_defer()
      and the RCU thread doing the cleanup), but also more memory temporarily
      being used by unfreed objects.
      
      Unlike the previous code, we do not wake up the cleanup thread after
      every rcu_defer(). When the RCU cleanup work is frequent but still small
      relative to the main work of the application (e.g., memcached server),
      the RCU cleanup thread would always have low runtime which meant we suffered
      a context switch on almost every wakeup of this thread by rcu_defer().
      In this patch, we only wake up the cleanup thread when the buffer becomes
      full, so we have far fewer context switches. This means that currently
      rcu_defer() may delay the cleanup an unbounded amount of time. This is
      normally not a problem, and when it is, namely in rcu_synchronize(),
      we wake up the thread immediately.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      d24cda2c
  9. Mar 27, 2014
  10. Mar 25, 2014
  11. Mar 24, 2014
  12. Mar 06, 2014
    • vmw-pvscsi: Initial support · 81fdc730
      Asias He authored
      
      This driver is for VMware's pvscsi disk. It has better performance than
      using the AHCI device in VMware. This driver uses the common SCSI code in
      scsi-common.
      
      This driver is written from scratch; the QEMU and Linux pvscsi drivers were
      used as references, as there is no specification available.
      
      Tested on QEMU's pvscsi implementation and VMware Workstation.
      
      Signed-off-by: Asias He <asias@cloudius-systems.com>
      81fdc730
  13. Mar 04, 2014
  14. Feb 12, 2014
  15. Feb 11, 2014
  16. Feb 07, 2014
  17. Feb 06, 2014
  18. Jan 27, 2014
  19. Jan 22, 2014
  20. Jan 21, 2014