  1. May 16, 2014
    • sched: high-resolution thread::current()->thread_clock() · c4ebb11a
      Nadav Har'El authored
      
      thread::current()->thread_clock() returns the CPU time consumed by this
      thread. A thread that wishes to measure the amount of CPU time consumed
      by some short section of code will want this clock to have high resolution,
      but in the existing code it was only updated on context switches, so shorter
      durations could not be measured with it.
      
      This patch fixes thread_clock() to also add the time that has passed
      since the current time slice started.
      
      When running thread_clock() on *another* thread (not thread::current()),
      we still return a CPU-time snapshot from the last context switch - even
      if the thread happens to be running now (on another CPU). Fixing that
      case is quite difficult (and will probably require additional
      memory-ordering guarantees), and is anyway not very important: usually
      we don't need a high-resolution estimate of a different thread's CPU time.
      
      Fixes #302.
      
      Reviewed-by: Gleb Natapov <gleb@cloudius-systems.com>
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
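      A minimal sketch of the idea (type and field names here are illustrative,
      not OSv's actual scheduler code): accumulate CPU time at each context
      switch, and when the running thread queries its own clock, add the time
      elapsed since its current slice began.

          #include <chrono>

          using osv_clock = std::chrono::steady_clock;      // stand-in for the scheduler's clock

          struct thread_sketch {
              std::chrono::nanoseconds total_cpu_time{0};   // accumulated at context switches
              osv_clock::time_point slice_start{};          // when this thread last got the CPU

              // High-resolution CPU time for the running thread: include the
              // partially consumed time slice, so short code sections can be measured.
              std::chrono::nanoseconds thread_clock() const {
                  return total_cpu_time +
                         std::chrono::duration_cast<std::chrono::nanoseconds>(
                             osv_clock::now() - slice_start);
              }
          };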
    • sched: make preempt functions inline · 97f5c29d
      Glauber Costa authored
      
      Again, we are currently calling a function every time we disable/enable preemption
      (actually a pair of functions), where simple mov instructions would do.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
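      A hedged sketch of what the inlined pair can look like (a hypothetical
      per-thread nesting counter, not OSv's exact implementation): with the
      functions visible inline, each call compiles down to a single increment
      or decrement instead of a function call.

          inline thread_local unsigned preempt_counter = 0;   // hypothetical nesting counter

          inline void preempt_disable() { ++preempt_counter; }

          inline void preempt_enable()
          {
              --preempt_counter;
              // The real code would also check here whether a preemption request
              // arrived while preemption was disabled, and reschedule if so.
          }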
    • sched: make current inline · 19b9d16f
      Glauber Costa authored
      
      We are heavily using this function to grab the address of the current thread.
      That means a function call will be issued every time that is done, where a
      simple mov instruction would do.
      
      For objects outside the main ELF, we don't want this to be inlined,
      since the resolution would then have to go through an expensive
      __tls_get_addr call. So for those objects we don't present the symbol as
      inline, and we make sure an out-of-line copy of the symbol is always
      generated.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
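      A hedged sketch of the pattern described (the macro and variable names
      are illustrative, not OSv's): inside the main ELF the accessor is an
      inline read of a thread-local pointer, while objects outside the main
      ELF only see an ordinary declaration and call the out-of-line symbol,
      avoiding an inlined __tls_get_addr at every use.

          // sched.hh (sketch)
          namespace sched {

          class thread;
          thread* current();                      // what external objects call

          #ifdef BUILDING_CORE_ELF                // hypothetical: defined when building the kernel itself
          extern __thread thread* s_current;      // thread-local pointer to the running thread
          inline thread* current() { return s_current; }   // a single TLS load when inlined
          #endif

          }

          // sched.cc (sketch): additionally ensure an out-of-line copy of
          // sched::current() is always emitted, so the symbol exists for
          // objects outside the main ELF.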
  2. May 13, 2014
    • netchannels: specialize new and delete operators · 8e087635
      Glauber Costa authored
      
      While running one of the redis benchmarks, I saw around 23k calls to
      malloc_large. Among those, ~10-11k were 2-page sized. I managed to track
      it down to the creation of net channels. The problem here is that the
      net channel structure is slightly larger than half a page - the maximum
      size for small object pools. That throws all such allocations into
      malloc_large. Besides being slow, it also wastes a page for every net
      channel created, since malloc_large includes an extra page at the
      beginning of each allocation.
      
      This patch fixes that by overloading operators new and delete for the
      netchannel structure so that we use the more efficient and less wasteful
      alloc_page.
      
      Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
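      A minimal sketch of a class-specific allocation override along the lines
      described; alloc_page()/free_page() are assumed page-granularity
      allocators, and the struct name is illustrative.

          #include <cassert>
          #include <cstddef>

          void* alloc_page();            // assumed: returns one page (4096 bytes)
          void  free_page(void* page);   // assumed: releases a page from alloc_page()

          struct net_channel_sketch {
              // Slightly larger than half a page, so the generic allocator would
              // send it to malloc_large; a single page is enough and wastes nothing.
              static void* operator new(std::size_t size) {
                  assert(size <= 4096);
                  return alloc_page();
              }
              static void operator delete(void* ptr) {
                  free_page(ptr);
              }
              // ... channel state ...
          };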
  3. May 08, 2014
    • Fix bug booting with 64 CPUs · 8dd67e0d
      Nadav Har'El authored
      
      OSv is currently limited to 64 vCPUs, because we use a 64-bit bitmask for
      wakeups (see max_cpus in sched.cc). Having exactly 64 CPUs *should* work,
      but unfortunately didn't because of a bug:
      
      cpu_set::operator++ first incremented the index, and then called advance()
      to find the following one-bit. We had a bug when the index was 63: we
      expect operator++ to return 64 (end(), signaling the end of the iteration),
      but what actually happened was that after the index was incremented to 64,
      advance() mishandled the case idx=64 (1<<64 unexpectedly evaluates to 1)
      and moved it back to idx=63.
      
      The patch fixes operator++ not to call advance() when idx=64 is reached,
      so it now works correctly for idx=63 as well, and booting with 64 CPUs
      now works.
      
      Fixes #234.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
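      A hedged reconstruction of the fix described (not the exact OSv iterator
      code): operator++ must stop once idx reaches max_cpus instead of calling
      advance(), because shifting 1 left by 64 bits is not well defined and
      used to wrap the index back to 63.

          #include <cstdint>

          constexpr unsigned max_cpus = 64;   // the wakeup bitmask is 64 bits wide

          struct cpu_set_iterator_sketch {
              uint64_t mask;    // one bit per CPU
              unsigned idx;     // current position; max_cpus means end()

              // Move idx forward to the next set bit (only valid for idx < max_cpus).
              void advance() {
                  while (idx < max_cpus && !(mask & (uint64_t(1) << idx))) {
                      ++idx;
                  }
              }

              cpu_set_iterator_sketch& operator++() {
                  ++idx;
                  if (idx < max_cpus) {   // the fix: don't call advance() once we hit end()
                      advance();
                  }
                  return *this;
              }
          };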
    • assert: fix __assert_fail() · 7254b4b0
      Jaspal Singh Dhillon authored
      
      This patch changes the definition of __assert_fail() in api/assert.h,
      which allows it and other header files that include it (such as debug.hh)
      to be used in mgmt submodules. This fixes a conflict with the declaration
      of __assert_fail() in external/x64/glibc.bin/usr/include/assert.h.
      
      Signed-off-by: Jaspal Singh Dhillon <jaspal.iiith@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
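      For reference, a declaration compatible with the glibc prototype that
      api/assert.h has to coexist with (a sketch; the exact attributes used in
      OSv may differ):

          extern "C" void __assert_fail(const char* expr, const char* file,
                                        unsigned int line, const char* func)
              __attribute__((noreturn));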
  4. Apr 25, 2014
    • net: log packets going through loopback and virtio-net. · f30ba40d
      Tomasz Grabiec authored
      
      There was no way to sniff packets going through OSv's loopback
      interface. I faced a need to debug in-guest TCP traffic. Packets are
      logged using the tracing infrastructure. Packet data is serialized as
      sample data up to a limit, which is currently hardcoded to 128 bytes.
      
      To enable packet capturing, just enable the tracepoints named:
        - net_packet_loopback
        - net_packet_eth
      
      Raw data can be seen in `trace list` output. Better presentation
      methods will be added in the following patches.
      
      This may also become useful when debugging network problems in the
      cloud, as we have no ability to run tcpdump on the host there.
      
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
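      A minimal sketch of the capture step described (the helper names are
      hypothetical, not the actual tracepoint plumbing): copy at most the
      hardcoded limit of 128 bytes of each packet into the trace sample.

          #include <algorithm>
          #include <cstddef>
          #include <cstdint>
          #include <cstring>

          constexpr std::size_t capture_limit = 128;   // per-packet limit mentioned above

          struct packet_sample {
              std::uint16_t len;                  // number of bytes actually captured
              std::uint8_t  data[capture_limit];  // truncated copy of the packet
          };

          packet_sample capture(const void* pkt, std::size_t pkt_len)
          {
              packet_sample s{};
              s.len = static_cast<std::uint16_t>(std::min(pkt_len, capture_limit));
              std::memcpy(s.data, pkt, s.len);
              return s;
          }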
    • trace: support for serializing variable-length sequences of bytes · 2d795f99
      Tomasz Grabiec authored
      
      A tracepoint argument which extends 'blob_tag' will be interpreted as a
      range of byte-sized values. The storage required to serialize such an
      object is proportional to its size.
      
      I need it to implement storage-friendly packet capturing using the tracing layer.
      
      It could also be used to capture variable-length strings. The current
      limit (50 chars) is too short for some paths passed to vfs calls. With
      variable-length encoding, we could set a more generous limit.
      
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
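      A minimal sketch of the idea (the types and names are illustrative, not
      the actual tracing-layer API): an argument tagged as a blob is serialized
      as a length followed by that many bytes, so storage grows with the blob
      instead of being fixed per argument.

          #include <cstddef>
          #include <cstdint>
          #include <vector>

          struct blob_tag {};                  // arguments extending this are byte ranges

          struct byte_range : blob_tag {
              const std::uint8_t* data;
              std::size_t len;
          };

          // Append a 16-bit length prefix and then the bytes themselves.
          void serialize(std::vector<std::uint8_t>& out, const byte_range& b)
          {
              std::uint16_t len = static_cast<std::uint16_t>(b.len);
              out.push_back(static_cast<std::uint8_t>(len & 0xff));
              out.push_back(static_cast<std::uint8_t>(len >> 8));
              out.insert(out.end(), b.data, b.data + len);
          }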
    • memory: add facility to indicate that a thread is temporarily a reclaimer · cfba1a5e
      Avi Kivity authored
      
      We already have a facility to indicate that a thread is a reclaimer and
      should be allowed to allocate reserve memory (since that memory will be
      used to free memory). Extend it to allow indicating that a particular
      code section is used to free memory, rather than the entire thread.
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
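      A hedged sketch of a section-scoped variant of that facility (the flag
      and guard are illustrative, not OSv's actual memory-reclaimer API): an
      RAII guard marks only the enclosed code as a reclaimer, so just that
      section may dip into reserve memory.

          thread_local bool reclaiming = false;   // hypothetical flag the allocator consults

          struct scoped_reclaimer {
              bool prev;
              scoped_reclaimer() : prev(reclaiming) { reclaiming = true; }
              ~scoped_reclaimer() { reclaiming = prev; }
          };

          void shrink_some_cache()
          {
              scoped_reclaimer guard;   // only this section is treated as a reclaimer
              // ... free memory, possibly allocating small temporaries from the reserve ...
          }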
    • pagecache: change locking between mmu and ARC · 4792e60c
      Gleb Natapov authored
      
      Currently, vma_list_mutex is used to protect against races between ARC
      buffer mapping by the MMU and eviction by ZFS. The problem is that MMU
      code calls into ZFS with vma_list_mutex held, so on that path all
      ZFS-related locks are taken after vma_list_mutex. An attempt to acquire
      vma_list_mutex during ARC buffer eviction, while many of the same ZFS
      locks are already held, causes a deadlock. This was solved by using
      trylock() and skipping the eviction if vma_list_mutex could not be
      acquired, but it appears that some mmapped buffers are destroyed not
      during eviction but after writeback, and that destruction cannot be
      delayed. This calls for a locking scheme redesign.
      
      This patch introduces arc_lock, which has to be held during access to
      read_cache. It prevents simultaneous eviction and mapping. arc_lock
      should be the innermost lock held on any code path, and the code is
      changed to adhere to this rule. To that end the patch replaces the
      ARC_SHARED_BUF flag with a new b_mmaped field. The reason is that access
      to the b_flags field is guarded by hash_lock, and it is impossible to
      guarantee the same ordering between hash_lock and arc_lock on all code
      paths. Dropping the need for hash_lock is a nice solution.
      
      Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
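      A minimal sketch of the innermost-lock rule described (the container and
      key types are illustrative): arc_lock guards only read_cache accesses,
      and no other lock is ever taken while it is held.

          #include <cstdint>
          #include <mutex>
          #include <unordered_map>

          std::mutex arc_lock;                                    // innermost lock on every path
          std::unordered_map<std::uint64_t, void*> read_cache;    // file/offset key -> mapped ARC buffer

          void map_arc_buffer(std::uint64_t key, void* buf)
          {
              std::lock_guard<std::mutex> guard(arc_lock);   // nothing else is locked under arc_lock
              read_cache[key] = buf;
          }

          bool evict_arc_buffer(std::uint64_t key)
          {
              std::lock_guard<std::mutex> guard(arc_lock);   // eviction and mapping cannot interleave
              return read_cache.erase(key) > 0;
          }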
    • mmu: populate pte in page_allocator · 31d939c7
      Gleb Natapov authored
      
      Currently page_allocator returns a page to the page mapper, and the
      latter populates a pte with it. Sometimes, though, page allocation and
      pte population need to appear atomic. For instance, in the case of the
      pagecache we want to prevent page eviction before the pte is populated,
      since page eviction clears the pte; but if allocation and mapping are
      not atomic, the pte can be populated with stale data after eviction.
      With the current approach a very widely scoped lock is needed to
      guarantee atomicity. Moving pte population into page_allocator allows
      for much simpler locking.
      
      Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
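      A hedged sketch of the change described (names and the pte encoding are
      simplified): the page allocator writes the pte itself inside its own
      critical section, so allocation and mapping appear atomic to a concurrent
      evictor.

          #include <cstdint>
          #include <mutex>
          #include <new>

          using pte_t = std::uintptr_t;

          struct page_allocator_sketch {
              std::mutex lock;

              void* alloc_page_raw() { return ::operator new(4096); }   // stand-in page source

              // Allocate a page and populate the pte in one critical section, so an
              // evictor clearing ptes under the same lock can never observe a pte
              // populated with a stale page.
              void map(pte_t& pte)
              {
                  std::lock_guard<std::mutex> guard(lock);
                  void* page = alloc_page_raw();
                  pte = reinterpret_cast<std::uintptr_t>(page) | 1;     // "present" bit, simplified
              }
          };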
    • pagecache: track ARC buffers in the pagecache · 4fd8693a
      Gleb Natapov authored
      
      The current code assumes that for the same file and offset ZFS will
      always return the same ARC buffer, but this appears not to be the case:
      ZFS may create a new ARC buffer while an old one is undergoing
      writeback. This means that we need to track the mapping between
      file/offset and the mmapped ARC buffer ourselves, which is exactly what
      this patch does. It adds a new kind of cached page that holds a pointer
      to an ARC buffer, and stores these pages in a new read_cache map.
      
      Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
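      A minimal sketch of the bookkeeping described (the types are
      illustrative): the pagecache keeps its own map from (file, offset) to the
      ARC buffer it has mmapped, instead of assuming ZFS hands back the same
      buffer every time.

          #include <cstdint>
          #include <map>
          #include <utility>

          struct arc_cached_page {       // the new kind of cached page
              void* arc_buf;             // the ARC buffer currently backing this mapping
          };

          // Keyed by (file identifier, offset); an illustrative stand-in for read_cache.
          std::map<std::pair<std::uint64_t, std::uint64_t>, arc_cached_page> read_cache;

          void remember_mapping(std::uint64_t file_id, std::uint64_t offset, void* arc_buf)
          {
              read_cache[{file_id, offset}] = arc_cached_page{arc_buf};
          }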