  1. Jan 20, 2014
    • sched: wake_lock() · acd36f2d
      Avi Kivity authored
      
      This adds a facility to wake a thread, but with the intention that it will
      acquire a certain lock after waking, and while the waker holds the lock.
      This is implemented using the regular wait morphing code (send_lock() and
      receive_lock()), but with additional mutual exclusion to allow regular
      wake()s in parallel.
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      acd36f2d
  2. Jan 19, 2014
  3. Jan 17, 2014
    • DHCP: Support MTU option · 69bf74a7
      Dmitry Fleytman authored
      
      This patch introduces support for the Interface MTU option described in
      RFC 2132, section 5.1.

      Amazon EC2 networking uses this option in some cases and it gives a
      throughput improvement of about 250% on big instances with 10G networking.
      
      Netperf results for hi1.4xlarge instances, TCP_MAERTS test, OSv runs netserver:
      
      Send buffer size     Throughput w/ patch (Mbps)     Throughput w/o patch (Mbps)     Improvement (%)
      
      32                   4912.29                        1386.28                         254
      64                   4832.01                        1385.99                         249
      128                  4835.09                        1401.46                         245
      256                  4746.41                        1382.28                         243
      512                  4849.04                        1375.23                         253
      1024                 4631.8                         1356.69                         241
      2048                 4859.59                        1371.92                         254
      4096                 4864.99                        1383.67                         252
      8192                 4627.07                        1364.05                         239
      16384                4868.73                        1366.48                         256
      32768                4822.69                        1366.63                         253
      65536                4837.67                        1353.87                         257
      
      Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      69bf74a7
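The option handling described above can be sketched as a small parser. This is only an illustration, not the code from the patch: the helper name find_interface_mtu and the convention of returning 0 for "absent or invalid" are assumptions. The option code (26), the fixed length of 2 and the minimum legal value of 68 come from RFC 2132.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// DHCP option codes defined by RFC 2132.
constexpr std::uint8_t DHCP_OPT_PAD = 0;
constexpr std::uint8_t DHCP_OPT_INTERFACE_MTU = 26;
constexpr std::uint8_t DHCP_OPT_END = 255;

// Hypothetical helper: scan a raw DHCP options field (the bytes after the
// magic cookie) and return the Interface MTU, or 0 if it is absent or invalid.
std::uint16_t find_interface_mtu(const std::vector<std::uint8_t>& opts)
{
    std::size_t i = 0;
    while (i < opts.size()) {
        std::uint8_t code = opts[i++];
        if (code == DHCP_OPT_PAD) {
            continue;           // single-byte padding, no length field
        }
        if (code == DHCP_OPT_END) {
            break;              // end of the options field
        }
        if (i >= opts.size()) {
            break;              // truncated: length byte is missing
        }
        std::uint8_t len = opts[i++];
        if (i + len > opts.size()) {
            break;              // truncated: payload runs past the buffer
        }
        if (code == DHCP_OPT_INTERFACE_MTU && len == 2) {
            // The MTU is a 16-bit unsigned integer in network byte order.
            std::uint16_t mtu =
                static_cast<std::uint16_t>((opts[i] << 8) | opts[i + 1]);
            // RFC 2132 gives 68 as the minimum legal value.
            return mtu >= 68 ? mtu : 0;
        }
        i += len;               // skip options we are not interested in
    }
    return 0;
}
```

A caller would apply the returned value to the interface only when it is non-zero, and keep the default MTU otherwise.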
    • mmu: procfs support · b01a5444
      Pekka Enberg authored
      
      Add procfs_maps() function to core/mmu.cc that returns all the VMAs
      formatted for Linux compatible "/proc/<pid>/maps" file.
      
      This will be called by the procfs filesystem.
      
      Limitations:
      
        * Shared mappings are not identified as such.
        * File-backed mmap offset, device, inode, and pathname are not
          reported.
        * Special region names such as [heap] and [stack] are not reported.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      b01a5444
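As a rough illustration of the output format, the sketch below formats one VMA as a line modelled on the Linux "/proc/<pid>/maps" layout (address range, permissions, offset, device, inode). The vma_info struct and format_maps_line are hypothetical names, not OSv's actual types. In line with the limitations listed above, the mapping is always shown as private and the offset, device and inode fields are zero.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical, simplified view of a VMA; not OSv's actual vma class.
struct vma_info {
    std::uintptr_t start;
    std::uintptr_t end;
    bool read;
    bool write;
    bool exec;
};

// Format one line in the style of Linux /proc/<pid>/maps.
// Shared mappings, file offset, device, inode and pathname are not reported,
// so the line always ends in "p 00000000 00:00 0".
std::string format_maps_line(const vma_info& v)
{
    char buf[128];
    std::snprintf(buf, sizeof(buf), "%08lx-%08lx %c%c%cp 00000000 00:00 0\n",
                  static_cast<unsigned long>(v.start),
                  static_cast<unsigned long>(v.end),
                  v.read ? 'r' : '-',
                  v.write ? 'w' : '-',
                  v.exec ? 'x' : '-');
    return std::string(buf);
}
```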
  4. Jan 16, 2014
  5. Jan 15, 2014
    • debug: create circular buffer for silent mode · 63ff4d06
      Eduardo Piva authored
      
      Create a circular buffer that stores all debug messages. When the
      buffer is full, it wraps around and reuses the oldest space. A method
      called flush_debug_buffer is added to print all buffered messages to the
      console if verbose mode is configured.

      The global variable debug_buffer_full tracks whether, when flushing the
      debug buffer to the console, both sides of the buffer need to be flushed.

      If the verbose boolean variable is set, all messages are printed to the
      console after being stored in the buffer.

      The size of the buffer is 50 KB, defined in debug.hh.

      A function debugf that receives a variable list of arguments
      is defined so we can change some printf calls in the boot sequence
      to debugf calls. A different name is used because C does not support
      function overloading.
      
      Signed-off-by: Eduardo Piva <efpiva@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      63ff4d06
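The wrap-around behaviour and the "flush both sides" rule can be sketched as follows. This is a minimal model under assumptions: the class name debug_ring and its members are illustrative and not taken from the patch, and flush() returns a string instead of printing to the console.

```cpp
#include <cstddef>
#include <string>

// Sketch of a fixed-size circular message buffer.
template <std::size_t N>
class debug_ring {
public:
    // Append len bytes, wrapping around and overwriting the oldest data.
    void write(const char* msg, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i) {
            _buf[_pos++] = msg[i];
            if (_pos == N) {
                _pos = 0;
                _full = true;   // the buffer has wrapped at least once
            }
        }
    }

    // Return the buffered bytes in chronological order.
    std::string flush() const
    {
        std::string out;
        if (_full) {
            // Older side: from the write position to the end of the buffer.
            out.append(_buf + _pos, N - _pos);
        }
        // Newer side: from the start of the buffer to the write position.
        out.append(_buf, _pos);
        return out;
    }

private:
    char _buf[N];
    std::size_t _pos = 0;
    bool _full = false;
};
```

The full flag plays the role that the message assigns to debug_buffer_full: without it, a flush could not tell an empty tail from a tail that still holds older messages.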
  6. Jan 13, 2014
  7. Jan 10, 2014
    • mm: Count total memory used by the JVM heap · 478f8746
      Glauber Costa authored
      
      To make informed reclaim decisions, we need to have as much relevant
      information as possible about our reclaim targets. Specifically, it
      is useful to know how much memory is currently used by the JVM heap.
      
      The reasoning behind this is that if pressure is coming from the heap,
      ballooning will harm us, instead of helping us.
      
      Note: This is really just a first approximation. Ideally, total memory
      shouldn't matter, but rather the memory delta since the last common event.
      But counting memory is the first step for both.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      478f8746
    • jvm: insert probe · b32a006b
      Glauber Costa authored
      
      To find out which vmas hold the Java heap, we will use a technique that is very
      close to ballooning (in the implementation, it is effectively the same).

      We will insert a very small object (2 pages), and mark the vma where the
      object is present as containing the JVM heap. Due to the way the JVM
      allocates objects, it will end up in the young generation. As time passes,
      the object will move the same way the balloon moves, and every new vma
      that is seen will be marked as holding the JVM heap.

      That mechanism should work for every generational GC, which should encompass
      most of the JDK7 GCs (if not all). It shouldn't work with the G1GC, but that
      debuts at JDK8, and for that we can do something a lot simpler, namely having
      the JVM tell us in advance which map areas contain the heap.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      b32a006b
    • jvm_balloon: control shrinker activation / deactivation · 52cb4738
      Glauber Costa authored
      
      There are restrictions on when and how a shrinker can run. For instance, if we
      have no balloons inflated, there is nothing to deflate (the relaxer should
      then be deactivated). Likewise, when the JVM fails to allocate memory for an
      extra balloon, it is pointless to keep trying (which would only lead to
      unnecessary spins) until *at least* the next garbage collection phase.
      
      I believe this behavior of activation / deactivation ought to be shrinker
      specific. The reclaiming framework will only provide the infrastructure to do
      so.
      
      In this patch, the JVM Balloon uses that to inform the reclaimer when it makes
      sense for the shrinker or relaxer to be called.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      52cb4738
    • JVM balloon driver · 9c59e7e8
      Glauber Costa authored
      
      This patch implements the JVM balloon driver, which is responsible for borrowing
      memory from the JVM when OSv is short on memory, and giving it back when memory
      is plentiful. It works by allocating a Java byte array, and then unmapping a large
      page-aligned region inside it (as big as our size allows).

      This array is good to go until the GC decides to move us. When that happens, we
      need to carefully emulate the memcpy fault and put things back in place.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      9c59e7e8
    • mmu: implement a new JVM vma · b657d2b3
      Glauber Costa authored
      
      After carrying out some testing, I quickly realized that the old fixup-only
      solution I was attempting for the ballooning was not really flying. The reason
      is that we would take a fault, figure out the fixup address, and
      return. If that wasn't a JVM fault, we were forced to take another fault
      (since we were already out of fault context).

      Once demand paging is a reality, the vast majority of the faults are for
      non-balloon addresses, so we were effectively doubling our number of page faults
      for no reason. I have decided to go with the VMA (+fixups for instruction
      decoding) route after all. This is way more efficient and it seems to be
      working fine.
      
      The JVM vma is really close to the normal anonymous VMA. Except that it can
      never hold pages, and its fault handler calls into the JVM balloon facilities
      for decoding.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      b657d2b3
    • mempool: shrink memory when no longer used. · 4afd087b
      Glauber Costa authored
      
      This patch introduces the memory reclaimer thread, which I hope to use to
      dispose of unused memory when pressure kicks in. "Pressure" right now is
      defined as having only 20% of total memory available, but that can be
      revisited.
      
      The way it will work is that each memory user that is able to dispose of its
      memory will register a shrinker, and the reclaimer will loop through them.
      However, the current "loop through all" only "works" because we have only one
      shrinker being registered. When others appear, we will need better policies to
      drive how much to take, and from whom.
      
      Memory allocation will now wait if memory is not available, instead of
      aborting. The decision to abort should belong to the reclaimer and no one
      else.

      We should never expect to have an unbounded and, more importantly, entirely
      opaque number of shrinkers like Linux does. We have control of who they are and
      how they behave, so I expect that we will be able to make a lot better decisions
      in the long run.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      4afd087b
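The registration and "loop through all" policy described in this message might look roughly like the sketch below. It is a single-threaded model under assumptions: the shrinker and reclaimer types, their member names, and the choice to treat "pressure" as strictly less than 20% free are all illustrative, and the real reclaimer runs as a thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Illustrative stand-in: a shrinker is asked to release up to `target` bytes
// and reports how many bytes it actually freed.
struct shrinker {
    std::function<std::size_t(std::size_t)> release;
};

class reclaimer {
public:
    explicit reclaimer(std::size_t total_bytes)
        : _total(total_bytes), _free(total_bytes) {}

    void register_shrinker(shrinker s) { _shrinkers.push_back(std::move(s)); }

    // Record an allocation so the free-memory estimate stays current.
    void on_alloc(std::size_t bytes)
    {
        assert(bytes <= _free);
        _free -= bytes;
    }

    // Assumed threshold: pressure means less than 20% of memory is free.
    bool under_pressure() const { return _free * 5 < _total; }

    // One pass of the naive "loop through all" policy. Returns bytes freed.
    std::size_t reclaim()
    {
        std::size_t freed = 0;
        for (auto& s : _shrinkers) {
            if (!under_pressure()) {
                break;
            }
            // Ask for just enough to get back to the 20% mark (rounded up).
            std::size_t want = (_total + 4) / 5 - _free;
            std::size_t got = s.release(want);
            _free += got;
            freed += got;
        }
        return freed;
    }

private:
    std::size_t _total;
    std::size_t _free;
    std::vector<shrinker> _shrinkers;
};
```

With more than one shrinker, the loop simply drains them in registration order, which is exactly the policy gap the message points out.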
    • semaphore: allow extending the interface · 21d9c318
      Glauber Costa authored
      
      Following an early suggestion from Nadav, I am trying to use semaphores for the
      balloon instead of keeping our own queue. For that to work, I need to have a bit
      more functionality that may not belong in the main balloon class. Namely:
      
      1) I need to query for the presence of waiters (and maybe in the future for the
      number of waiters).

      2) I need a special post that would allow me to make sure that we are posting
      at most as much as we're waiting for, and nothing more.

      This patch transforms the post method into an unlocked version (and exposes a
      trivial version that just locks around it) and makes other changes necessary to
      allow subclassing.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      21d9c318
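The split between a locked post and an unlocked one, and the two extensions requested above, can be modelled as follows. This is a simplified, non-blocking sketch, not OSv's semaphore: waiters are recorded as the number of units they asked for rather than as sleeping threads, and the names wait_or_enqueue, capped_semaphore, has_waiters and post_capped are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <mutex>

class semaphore {
public:
    explicit semaphore(std::size_t val) : _val(val) {}
    virtual ~semaphore() {}

    // Public post(): the trivial version that just locks around post_unlocked().
    void post(std::size_t units = 1)
    {
        std::lock_guard<std::mutex> guard(_mtx);
        post_unlocked(units);
    }

    // Stand-in for wait(): take the units if possible, otherwise queue a request.
    // Returns true if the units were granted immediately.
    bool wait_or_enqueue(std::size_t units)
    {
        std::lock_guard<std::mutex> guard(_mtx);
        if (_waiters.empty() && _val >= units) {
            _val -= units;
            return true;
        }
        _waiters.push_back(units);
        return false;
    }

protected:
    // Caller must hold _mtx. Adds units and satisfies waiters in FIFO order.
    void post_unlocked(std::size_t units)
    {
        _val += units;
        while (!_waiters.empty() && _waiters.front() <= _val) {
            _val -= _waiters.front();
            _waiters.pop_front();
        }
    }

    std::mutex _mtx;
    std::size_t _val;
    std::list<std::size_t> _waiters;   // units requested by each queued waiter
};

// Hypothetical subclass adding the two extensions the message asks for.
class capped_semaphore : public semaphore {
public:
    explicit capped_semaphore(std::size_t val) : semaphore(val) {}

    bool has_waiters()
    {
        std::lock_guard<std::mutex> guard(_mtx);
        return !_waiters.empty();
    }

    // Post no more than the queued waiters still need; returns the units posted.
    std::size_t post_capped(std::size_t units)
    {
        std::lock_guard<std::mutex> guard(_mtx);
        std::size_t wanted = 0;
        for (std::size_t w : _waiters) {
            wanted += w;
        }
        std::size_t needed = wanted > _val ? wanted - _val : 0;
        std::size_t n = std::min(units, needed);
        post_unlocked(n);
        return n;
    }
};
```

Because post_capped() must inspect the waiter list and post under a single lock acquisition, it needs the unlocked variant; calling the public post() from inside it would try to take the same mutex twice.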
    • mmu: account evacuated size · ab459e83
      Glauber Costa authored
      
      This will be useful when we shrink, so we know how much memory we newly
      released for system consumption.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      ab459e83
    • mmu: make operate quantifiable. · f1cd4f8d
      Glauber Costa authored
      
      So far, operate works on a page range and at the very most sets a success
      flag somewhere. I am here extending the API to allow it to return how much
      data it manipulated.

      So as an example, if we fault in 2MB in an empty range, it will return 2 << 20.
      But if we fault in the same 2MB in a range that already contained some sparse 4k
      pages, it will return (2 << 20) - previous_pages.
      
      That will be useful to count memory usage in certain VMAs.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      f1cd4f8d
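The arithmetic in the example can be checked with a toy model. The sketch below is not the mmu operate code: it assumes a 2 MB range tracked as 512 presence flags for 4k pages, and populate_range is a hypothetical name for an operation that reports how many bytes it newly populated.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t page_size = 4096;
constexpr std::size_t huge_page_size = 2u << 20;   // 2 MB

// Populate every 4k page in the range that is not already present and
// return the number of bytes that were newly populated.
std::size_t populate_range(std::vector<bool>& present)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < present.size(); ++i) {
        if (!present[i]) {
            present[i] = true;
            bytes += page_size;
        }
    }
    return bytes;
}
```

For an empty 512-page range this returns 2 << 20. If three pages were already present it returns (2 << 20) - 3 * 4096, and a second call on the same range returns 0.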
  8. Jan 08, 2014
    • mem: fix allocation accounting · 8d7812fa
      Glauber Costa authored
      
      There was a small bug in the free memory tracking code that I've only hit
      recently. I was wrong in assuming that in the first branch for huge page
      allocation, where we erase the entire range, we should account for N bytes.
      This assumption came from my - wrong - understanding that we would do that when
      the range is exactly N bytes.
      
      Looking at the code with fresh eyes, that is definitely not what happens. In my
      previous stress test we were hitting the second branch all the time, so this
      bug lived on.
      
      It turns out that we delete the entire page range, which may be bigger than
      N, the allocation size. Therefore, the whole range should be discounted from
      our calculation. The remainder (the part beyond N) will be accounted for later
      when we reinsert it in the page range, in the same way it is for the second
      branch of this code.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8d7812fa
  9. Jan 07, 2014
  10. Jan 03, 2014
  11. Jan 02, 2014
    • mmu: Make map_file() more efficient · c31fff09
      Gleb Natapov authored
      
      Currently map_file() does three passes over vma memory in the worst case.
      First it maps memory with write permission while zeroing it, then it
      reads a file into memory and, if the vma is read-only, it does one more
      pass to fix memory permissions. Fix it by providing a new specialization
      of the fill_page class which builds an iovec of all allocated memory and
      reads from the file using the iovec at the end of the populate stage.
      
      Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      c31fff09
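The idea of collecting the allocated regions first and issuing one scatter read at the end can be sketched with the POSIX readv() call. This is an illustration only: iovec_batch, add and read_all are hypothetical names, not the fill_page specialization from the patch, and a real file mapping at a non-zero offset would likely need a positional read instead.

```cpp
#include <sys/uio.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Collects destination regions while memory is being populated, then fills
// all of them with a single scatter read.
class iovec_batch {
public:
    // Remember one freshly allocated region; nothing is read yet.
    void add(void* addr, std::size_t len)
    {
        struct iovec v;
        v.iov_base = addr;
        v.iov_len = len;
        _iov.push_back(v);
    }

    // Read from fd into every recorded region, in the order they were added.
    // Returns the number of bytes read, or -1 on error (as readv() does).
    ssize_t read_all(int fd)
    {
        return readv(fd, _iov.data(), static_cast<int>(_iov.size()));
    }

private:
    std::vector<struct iovec> _iov;
};
```

Since the data lands directly in its final location, the pages do not need to be zeroed first, which removes one of the passes the message describes.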
    • Sched: fix race start_early_threads() · 4bed7ed5
      Nadav Har'El authored
      
      In issue #145 I reported a crash during boot in start_early_threads().
      I wasn't actually able to replicate this bug on master, but it happens
      quite frequently (e.g., on virtually every "make check" run) with some
      patches of mine that seem unrelated to this bug.
      
      The problem is that start_early_threads() (added in 63216e85)
      iterates on the threads in the thread list, and uses
      t->remote_thread_local_var() for each thread. This can only work if
      the thread has its TLS initialized, but unfortunately in thread's
      constructor we first added the new thread to the list, and only later
      called setup_tcb() (which allocates and initializes the TLS). If we're
      unlucky, start_early_threads() can find a thread on the list which still
      doesn't have its TLS allocated, so remote_thread_local_var() will crash.
      
      The simple fix is to switch the order of the construction: First
      set up the new thread's TLS, and only then add it to the list of
      threads.
      
      Fixes #145.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      4bed7ed5
    • core: extract graceful shutdown logic · 8b616285
      Tomasz Grabiec authored
      
      In order to reuse the logic it needs to be extracted.
      
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      8b616285
  12. Jan 01, 2014
  13. Dec 31, 2013
  14. Dec 30, 2013
  15. Dec 27, 2013