  1. Oct 03, 2013
  2. Sep 30, 2013
  3. Sep 29, 2013
  4. Sep 26, 2013
  5. Sep 25, 2013
    • Dynamic linker: run finalizers when unloading shared object · bf0688f4
      Nadav Har'El authored
      
      ELF allows specifying initializers, functions to be run after loading a
      shared object (in DT_INIT_ARRAY), and also finalizers, functions to be
      run before unloading a shared object (in DT_FINI_ARRAY). The existing code
      ran the initializers but forgot to run the finalizers; this patch
      fixes that oversight.
      
      This fix is necessary for destructors of static objects defined in the
      shared object. But this fix is not sufficient for C++ destructors - see
      also the next patch.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
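
      A minimal sketch of the two loops, assuming the dynamic section has
      already been parsed into plain arrays of function pointers. The names
      elf_func, run_init_array and run_fini_array are hypothetical, not OSv's.
      The ordering follows the ELF gABI: DT_INIT_ARRAY entries run in array
      order and DT_FINI_ARRAY entries in reverse order, so objects are torn
      down in the opposite order to their construction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one DT_INIT_ARRAY / DT_FINI_ARRAY entry:
// a pointer to a function taking no arguments and returning void.
using elf_func = void (*)();

// DT_INIT_ARRAY entries run in array order after the object is loaded.
inline void run_init_array(const std::vector<elf_func>& init_array) {
    for (elf_func f : init_array) {
        f();
    }
}

// DT_FINI_ARRAY entries run in reverse array order before the object is unloaded.
inline void run_fini_array(const std::vector<elf_func>& fini_array) {
    for (std::size_t i = fini_array.size(); i > 0; --i) {
        fini_array[i - 1]();
    }
}

// Demo only: two entries that record the order in which they ran.
inline std::vector<int> fini_log;
inline void first_entry() { fini_log.push_back(1); }
inline void second_entry() { fini_log.push_back(2); }
```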
  6. Sep 24, 2013
    • Fix missing poll() wakeup on POLLHUP · 554e80f6
      Nadav Har'El authored
      
      Our poll_wake() code ignored calls with the POLLHUP event because the
      user did not explicitly ask for this event. As a result, a poll()
      waiting to read from a pipe did not wake up when the pipe's write side
      was closed.

      This patch adds a test for this case in tst-pipe.cc, and fixes the
      bug by also adding ~POLL_REQUESTABLE to the poll structure's _events,
      i.e., any bits which do not have to be explicitly requested by the
      user (POLL_REQUESTABLE is a new macro defined in this patch).
      
      After this patch, poll() wakes up as needed in the test (instead of
      hanging), but returns the wrong event because of another bug, which will
      be fixed in a separate patch.
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
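
      A sketch of the masking idea, assuming POLL_REQUESTABLE covers only the
      bits a caller must ask for. The value of poll_requestable below is a
      guess (the commit does not give the macro's definition); it leaves out
      POLLERR, POLLHUP and POLLNVAL because poll() reports those whether or
      not they were requested. wake_mask and should_wake are hypothetical.

```cpp
#include <cassert>
#include <poll.h>

// Hypothetical value: event bits that must be requested explicitly.
constexpr short poll_requestable = POLLIN | POLLPRI | POLLOUT;

// Events that should wake a waiter who asked for `requested`: the requested
// bits plus every bit that never needs to be requested (~poll_requestable).
constexpr short wake_mask(short requested) {
    return static_cast<short>(requested | ~poll_requestable);
}

// Should a poll_wake(event) call wake a waiter that asked for `requested`?
constexpr bool should_wake(short requested, short event) {
    return (event & wake_mask(requested)) != 0;
}
```

      With this mask, a reader waiting only for POLLIN is still woken by
      POLLHUP, which is exactly the pipe case described above.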
  7. Sep 20, 2013
  8. Sep 15, 2013
  9. Sep 12, 2013
  10. Sep 11, 2013
    • Add reboot function · 542c319b
      Nadav Har'El authored
      Added a new function, osv::reboot() (declared in <osv/power.hh>)
      for rebooting the VM.
      
      Also added a Java interface - com.cloudius.util.Power.reboot().
      
      NOTE: Power.java and/or jni/power.cc also need to be copied into
      the mgmt submodule.
    • mutex: make the constructor constexpr · a919d5f4
      Avi Kivity authored
      Statically allocated mutexes are very common.  Make the mutex constructor
      constexpr to ensure that a statically allocated mutex is initialized before
      use, even if that use is from static constructors.
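
      A sketch of why constexpr helps, using a hypothetical toy_mutex (a simple
      spinlock, not OSv's mutex). Because its constructor is constexpr, a
      namespace-scope instance is constant-initialized, so it is ready even
      for a static constructor that runs before the mutex's own definition is
      reached. early_user and registry_lock are illustrative names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical toy lock. What matters is the constexpr constructor.
class toy_mutex {
public:
    constexpr toy_mutex() noexcept : _locked(false) {}
    void lock() noexcept {
        while (_locked.exchange(true, std::memory_order_acquire)) {
            // spin until the holder calls unlock()
        }
    }
    void unlock() noexcept { _locked.store(false, std::memory_order_release); }
    bool is_locked() const noexcept { return _locked.load(std::memory_order_relaxed); }
private:
    std::atomic<bool> _locked;
};

extern toy_mutex registry_lock;  // defined further down in this file

// A static object whose dynamic constructor runs before registry_lock's
// definition is reached in this translation unit, yet still uses the lock.
struct early_user {
    bool used_lock = false;
    early_user() {
        registry_lock.lock();
        used_lock = true;
        registry_lock.unlock();
    }
};
early_user early;

// Constant-initialized thanks to the constexpr constructor, so it is valid
// before any dynamic initializer (including early_user's) runs.
toy_mutex registry_lock;
```

      With C++20, writing constinit toy_mutex registry_lock; would make the
      compiler reject the definition if the constructor ever stopped being
      constexpr.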
  11. Sep 10, 2013
    • mmu: Fix file-backed vma splitting · d72b550c
      Pekka Enberg authored
      Commit 3510a5ea ("mmu: File-backed VMAs") forgot to fix vma::split() to
      take file-backed mappings into account. Fix the problem by making
      vma::split() a virtual function and implementing it separately for
      file_vma.
      
      Spotted by Avi Kivity.
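
      A sketch of the virtual split, with heavily simplified hypothetical
      classes (not OSv's vma and file_vma). The point is that the tail of a
      split file-backed mapping must itself be a file_vma, and its file
      offset must advance by the number of bytes left in the head.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// A vma covers [start, end).
class vma {
public:
    vma(std::uintptr_t start, std::uintptr_t end) : _start(start), _end(end) {}
    virtual ~vma() = default;
    // Shrink this vma to [start, edge) and return a new vma for [edge, end).
    virtual std::unique_ptr<vma> split(std::uintptr_t edge) {
        std::unique_ptr<vma> tail = std::make_unique<vma>(edge, _end);
        _end = edge;
        return tail;
    }
    std::uintptr_t start() const { return _start; }
    std::uintptr_t end() const { return _end; }
protected:
    std::uintptr_t _start, _end;
};

// A file_vma also records the file offset that backs its start address.
class file_vma : public vma {
public:
    file_vma(std::uintptr_t start, std::uintptr_t end, std::uint64_t offset)
        : vma(start, end), _offset(offset) {}
    std::unique_ptr<vma> split(std::uintptr_t edge) override {
        // The tail starts (edge - _start) bytes further into the file.
        std::unique_ptr<vma> tail =
            std::make_unique<file_vma>(edge, _end, _offset + (edge - _start));
        _end = edge;
        return tail;
    }
    std::uint64_t offset() const { return _offset; }
private:
    std::uint64_t _offset;
};
```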
  12. Sep 08, 2013
  13. Sep 05, 2013
    • read partition table · 7fb8b99b
      Glauber Costa authored
      This code, living in device.c for maximum generality, reads the partition
      table from any disk that calls it. Ideally, each new device would have its
      own private data, but that would mean calling back into the driver to set
      up each of the partitions. Therefore, I found it easier to adopt the
      convention that all partitions on the same drive share the same private
      data. This makes some sense if we consider that hypervisors are usually
      agnostic about partitions, and all of the addressing and communication
      goes through a single entry point, which is the disk.
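
      The commit does not say which partition format is read; the sketch below
      assumes the classic MBR layout (four 16-byte entries at byte 446 of
      sector 0, boot signature 0x55 0xAA at bytes 510-511, little-endian LBA
      fields). mbr_partition and parse_mbr are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// One primary partition as described by an MBR entry.
struct mbr_partition {
    std::uint8_t type;          // partition type byte; 0 means unused
    std::uint32_t first_lba;    // first sector of the partition
    std::uint32_t num_sectors;  // length in sectors
};

// Read a little-endian 32-bit value.
static std::uint32_t le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Parse the four primary entries of sector 0. Returns an empty vector if
// the boot signature is missing; unused entries (type 0) are skipped.
std::vector<mbr_partition> parse_mbr(const std::array<std::uint8_t, 512>& sector) {
    std::vector<mbr_partition> parts;
    if (sector[510] != 0x55 || sector[511] != 0xAA) {
        return parts;
    }
    for (int i = 0; i < 4; ++i) {
        const std::uint8_t* e = sector.data() + 446 + 16 * i;
        mbr_partition p{e[4], le32(e + 8), le32(e + 12)};
        if (p.type != 0) {
            parts.push_back(p);
        }
    }
    return parts;
}
```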
    • add offset calculation · cd14aecc
      Glauber Costa authored
      To support multiple partitions on a disk, I found it easier to add a
      post-processing offset calculation to the bio just before calling the strategy.

      The reason is that we have many (really many) entry points for bio
      preparation (pre-strategy) and only two entry points for the strategy itself
      (the drivers). We could very well reduce those two to just one, since
      multiplex_strategy is worth using even for virtio because it allows
      arbitrarily sized requests (although I am not converting virtio now).

      At this moment, the offset is always 0 and everything works as before.
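
      A sketch of the post-processing step, using hypothetical, stripped-down
      device and bio structures (not OSv's). Whatever path prepared the bio,
      one adjustment just before the strategy call turns a partition-relative
      offset into a disk-relative one; for a whole disk the offset is 0 and
      nothing changes, as the commit notes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical device: a partition records where it starts on the disk.
struct device {
    std::uint64_t offset;  // byte offset of this partition on the disk
};

// Hypothetical bio: offset is relative to the start of dev.
struct bio {
    device* dev;
    std::uint64_t offset;
    std::uint64_t length;
};

// Called once, right before handing the bio to the driver's strategy routine.
inline void apply_partition_offset(bio& b) {
    b.offset += b.dev->offset;
}
```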
    • hpet clock driver · e2991fce
      Glauber Costa authored
      This patch implements the HPET clock driver, which should work as a fallback
      for both Xen and KVM in case the paravirtual clock is not present. That is
      unfortunately the situation for all HVM guests running on EC2, so supporting
      it is paramount. I have tested it on KVM by forcing kvmclock to disappear,
      and it seems to work all right.
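
      A sketch of the time conversion any HPET clock needs, assuming the
      standard register layout: the General Capabilities and ID register
      (offset 0x0) holds the main counter's tick period in femtoseconds in
      bits 63:32. How OSv's driver reads the registers is not shown in the
      commit; hpet_period_fs and hpet_ticks_to_ns are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Extract the counter tick period (femtoseconds) from the capabilities register.
constexpr std::uint32_t hpet_period_fs(std::uint64_t capabilities) {
    return static_cast<std::uint32_t>(capabilities >> 32);
}

// Convert a main-counter reading to nanoseconds (1 ns = 1,000,000 fs).
// ticks * period_fs can overflow 64 bits, so split ticks into a quotient and
// a remainder of 1,000,000; the result is exactly floor(ticks * period_fs / 1e6).
constexpr std::uint64_t hpet_ticks_to_ns(std::uint64_t ticks, std::uint32_t period_fs) {
    constexpr std::uint64_t fs_per_ns = 1000000;
    return (ticks / fs_per_ns) * period_fs + (ticks % fs_per_ns) * period_fs / fs_per_ns;
}
```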
    • acpi: move table initialization to its own constructor · bf15592d
      Glauber Costa authored
      Right now we are doing it right before we parse the MADT, but table
      initialization is not MADT-specific at all. Other users are planned, and the
      best way to resolve the disputes is to have it in a separate constructor.
  14. Sep 03, 2013
    • irq_lock: avoid 'irq_lock defined but not used' warning · 90390cca
      Avi Kivity authored
      In an attempt to be clever, we define irq_lock as an object in an anonymous
      namespace, so that each translation unit gets its own copy, which is then
      optimized away, since the object is never touched.  But the compiler complains
      that the object is defined but not used if we include the file but don't
      use irq_lock.
      
      Simplify by only declaring the object there, and defining it somewhere else.
  15. Sep 02, 2013
    • mmu: msync for file-backed memory maps · 1691c89d
      Pekka Enberg authored
      This adds a simple msync() implementation for file-backed memory maps. It
      uses the newly added 'file_vma' data structure to write out and fsync
      the msync'd region, as suggested by Avi Kivity.
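
      A user-space sketch of "write out and fsync", assuming the mapping's
      start address and backing file offset are known (as a file_vma records
      them). file_vma_msync is a hypothetical helper built on POSIX pwrite()
      and fsync(); it is not OSv's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

// Write back [addr, addr + len), which lies inside a mapping that starts at
// vma_start and maps the file from file_offset, then flush the file.
// Returns 0 on success, -1 on failure (errno is set by pwrite or fsync,
// except for an unexpected zero-byte write).
int file_vma_msync(int fd, std::uintptr_t vma_start, off_t file_offset,
                   const char* addr, std::size_t len) {
    const off_t off = file_offset +
        static_cast<off_t>(reinterpret_cast<std::uintptr_t>(addr) - vma_start);
    std::size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, addr + done, len - done, off + static_cast<off_t>(done));
        if (n <= 0) {
            return -1;
        }
        done += static_cast<std::size_t>(n);
    }
    return fsync(fd);
}
```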
    • mmu: File-backed VMAs · 3510a5ea
      Pekka Enberg authored
      Add a new 'file_vma' class that extends 'vma'. This is needed to keep
      track of fileref and offset for file-backed VMAs for msync().
    • error: fix inverted condition in error_to_libc() · 7a21aa3e
      Avi Kivity authored
      Spotted by Pekka.
    • osv: add error class · c0f7f0f9
      Avi Kivity authored
      Different source bases have different error conventions: libc has 0/-1+errno,
      while the rest of the source base uses 0/error.

      Wrap errors in a class to prevent confusion between the two.
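
      A sketch of such a wrapper. Only the class name error and the function
      name error_to_libc come from the commit titles; the members, no_error()
      and make_error() are hypothetical. Because the constructor from int is
      explicit, a raw libc-style -1 cannot silently become an error, and
      error_to_libc() is the one place where the two conventions meet (note
      the success test, the condition the previous commit had inverted).

```cpp
#include <cassert>
#include <cerrno>

// An error code that can only be built explicitly.
class error {
public:
    constexpr error() : _err(0) {}
    constexpr explicit error(int err) : _err(err) {}
    constexpr bool bad() const { return _err != 0; }
    constexpr int to_int() const { return _err; }
private:
    int _err;
};

constexpr error no_error() { return error(); }
constexpr error make_error(int err) { return error(err); }

// Convert to the libc convention: 0 on success; on failure set errno, return -1.
inline int error_to_libc(error e) {
    if (!e.bad()) {
        return 0;
    }
    errno = e.to_int();
    return -1;
}
```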
  16. Aug 29, 2013
    • mutex: add DROP_LOCK · db907b1a
      Avi Kivity authored
      This is used for temporarily dropping a lock in a lexical scope, and
      reacquiring it after an exit from the scope (similar to wait_until(mutex),
      but without the waiting):
      
        WITH_LOCK(preempt_lock) {
           // do some stuff
           while (not enough resources) {
              DROP_LOCK(preempt_lock) {
                 acquire more resources
              }
              // reload anything that may have changed after DROP_LOCK()
           }
           // do more stuff with the acquired resources
        }
      
      Note that DROP_LOCK() doesn't work well with recursively-taken locks.
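
      One possible way to build scope macros like these, shown as a sketch
      rather than OSv's actual definitions: each macro opens a for statement
      whose body runs exactly once, with an RAII guard in the init-statement
      so the lock is released (WITH_LOCK) or reacquired (DROP_LOCK) however
      the body exits, including via break. drop_guard and traced_lock are
      hypothetical; traced_lock exists only so the behaviour can be checked.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// Releases the lock on construction and reacquires it on destruction.
template <typename Lock>
class drop_guard {
public:
    explicit drop_guard(Lock& l) : _l(l) { _l.unlock(); }
    ~drop_guard() { _l.lock(); }
    drop_guard(const drop_guard&) = delete;
    drop_guard& operator=(const drop_guard&) = delete;
private:
    Lock& _l;
};

// The second declarator (a pointer to the guard) doubles as the run-once flag.
#define WITH_LOCK(lk) \
    for (std::lock_guard<std::remove_reference_t<decltype(lk)>> _wl_guard(lk), \
         *_wl_once = &_wl_guard; _wl_once; _wl_once = nullptr)

#define DROP_LOCK(lk) \
    for (drop_guard<std::remove_reference_t<decltype(lk)>> _dl_guard(lk), \
         *_dl_once = &_dl_guard; _dl_once; _dl_once = nullptr)

// Demo only: a lock that records whether it is held.
struct traced_lock {
    bool held = false;
    void lock() { assert(!held); held = true; }
    void unlock() { assert(held); held = false; }
};
```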
    • rcu: add compiler barrier on rcu read unlock · d92eac12
      Avi Kivity authored
      We don't want the compiler moving reads after a possible rcu_defer().
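
      A sketch of the barrier, assuming GCC or Clang. The empty asm statement
      with a "memory" clobber emits no instructions but forbids the compiler
      from moving memory accesses across it. The read-side markers are
      hypothetical simplifications (not OSv's rcu.hh): this sketch assumes a
      read-side critical section is bracketed by disabling and re-enabling
      preemption, modelled here by a plain counter.

```cpp
#include <cassert>

// Compiler barrier: no instructions, but memory accesses may not cross it.
inline void barrier() {
    __asm__ __volatile__("" ::: "memory");
}

// Stand-in for the preemption-disable counter.
inline int preempt_count = 0;

inline void rcu_read_lock() {
    ++preempt_count;
    barrier();  // reads in the critical section may not move above this point
}

inline void rcu_read_unlock() {
    // Reads in the critical section may not move below this point, i.e. past
    // the moment after which memory freed via rcu_defer() may be reclaimed.
    barrier();
    --preempt_count;
}
```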
  17. Aug 27, 2013
    • Fix mincore() on non-mmap()ed memory · 6924f7db
      Nadav Har'El authored
      Commit 65afd075 fixed mincore() to recognize
      unmapped addresses. However, it used mmu::ismapped() which just checks for
      mmap()'ed addresses, and doesn't know about malloc()ed memory. This causes
      trouble for libunwind (which we use for backtrace()) which tests mincore()
      on an on-stack variable, and for non-pthread threads, this stack might be
      malloc'ed, not mmap'ed.
      
      So this patch adds mmu::isreadable(), which checks that a given memory range
      is all readable (this memory can be mmapped, malloced, stack, whatever).
      mincore() now uses that.
      
      mmu::isreadable() is implemented, following Avi's idea, by trying to read
      one byte from every page in the range with safe_load(). This approach is
      faster than page-table walking, especially for one-byte checks (which are
      all that libunwind uses anyway), and it is also very simple.
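
      A sketch of the page iteration only, assuming 4 KiB pages. The probe
      callable stands in for a one-byte safe_load(), which OSv can attempt
      without crashing on an unmapped page; here it is any function taking an
      address and returning bool. isreadable_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t page_size = 4096;  // assumed page size

// Returns true if probe(a) succeeds for one address in every page that
// overlaps [addr, addr + size). Assumes addr + size does not wrap around.
template <typename Probe>
bool isreadable_sketch(std::uintptr_t addr, std::size_t size, Probe probe) {
    const std::uintptr_t end = addr + size;
    // Probe addr itself, then the start of each following page below end.
    for (std::uintptr_t a = addr; a < end; a = (a & ~(page_size - 1)) + page_size) {
        if (!probe(a)) {
            return false;  // first unreadable page: stop early
        }
    }
    return true;  // size == 0 probes nothing and counts as readable
}
```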
    • Fix deadlock in leak detector · 227eb39b
      Nadav Har'El authored
      Commit 65afd075, which fixed mincore(), exposed a deadlock in the leak
      detector, caused by two threads taking two locks in opposite order:
      
      Thread 1:  malloc() does alloc_tracker::remember(). This takes the tracker
         lock and calls backtrace() calling mincore() which takes the
         vma_list_mutex.
      
      Thread 2: mmap() does mmu::allocate() which takes the vma_list_mutex and
         then through mmu::populate::small_page calls memory::alloc_page() which
         calls alloc_tracker::remember() and takes the tracker lock.
      
      This patch fixes this deadlock: alloc_tracker::remember() will now drop its
      lock while running backtrace(), as the lock is only needed to protect the
      allocations[] array. We need to retake the lock after backtrace() completes,
      to copy the backtrace back to the allocations[] array.
      
      Previously, the lock's depth was also (ab)used for avoiding nested
      allocation tracking (e.g., tracking of memory allocation done inside
      backtrace() itself), but now that backtrace() is run without the lock,
      we need a different mechanism - a per-thread "in_tracker" flag, which
      is turned on inside the alloc_tracker::remember()/forget() methods.
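
      A sketch of the new structure, with a hypothetical simplified tracker
      (not OSv's alloc_tracker). The backtrace is collected without holding
      the tracker lock, the lock guards only the allocations array, and a
      per-thread in_tracker flag (modelled as a thread_local member) skips
      allocations made by the tracker itself. collect_backtrace() is a
      stand-in that, like backtrace(), re-enters remember().

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class alloc_tracker {
public:
    void remember(void* addr) {
        if (in_tracker) {
            return;  // allocation made by the tracker itself: don't track it
        }
        in_tracker = true;
        // No tracker lock held here, so locks taken while collecting the
        // backtrace are never nested inside _lock.
        int depth = collect_backtrace();
        {
            std::lock_guard<std::mutex> guard(_lock);  // protects _allocations only
            _allocations.push_back({addr, depth});
        }
        in_tracker = false;
    }

    std::size_t tracked() {
        std::lock_guard<std::mutex> guard(_lock);
        return _allocations.size();
    }

private:
    struct record {
        void* addr;
        int backtrace_depth;
    };

    // Stand-in for backtrace(): it "allocates", re-entering remember().
    int collect_backtrace() {
        static int scratch;
        remember(&scratch);
        return 3;  // pretend three frames were captured
    }

    std::mutex _lock;
    std::vector<record> _allocations;
    static thread_local bool in_tracker;  // replaces the old lock-depth trick
};

thread_local bool alloc_tracker::in_tracker = false;
```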
  18. Aug 26, 2013
    • Avoid including elf.hh from sched.hh · 714d313a
      Nadav Har'El authored
      sched.hh included elf.hh, just so it can refer to the elf::tls_data
      type. But now that we have rcu.hh which includes sched.hh and therefore
      elf.hh, if we wish to use rcu in elf.hh (we'll do this in a later patch),
      we have an include loop mess.
      
      So better not include elf.hh from sched.hh, and just declare the one
      struct we need.
      
      Now that sched.hh no longer includes elf.hh (and the dozen headers that
      elf.hh in turn included), we need to add missing includes to some of the
      code that included sched.hh and relied on its implicit includes.
  19. Aug 18, 2013
    • sched: reduce wakeup IPIs further · 5b05bade
      Avi Kivity authored
      Following 71fec998, we note that if any bit in the wakeup mask
      is set, then an IPI to that cpu is either imminent or already in flight, and
      we can elide our own IPI to that cpu.
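
      A sketch of the "any bit set" rule, with hypothetical per-CPU state (not
      OSv's scheduler). A waker atomically sets its own bit in the target's
      incoming-wakeup mask; only the waker that finds the mask empty sends an
      IPI, since a non-empty mask means an IPI is imminent or already in
      flight. The target drains the whole mask at once in its IPI handler, so
      a later wakeup finds it empty again. ipis_sent merely counts the IPIs
      that would be sent.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct cpu_sketch {
    // Bit i is set when CPU i has queued a wakeup for this CPU.
    std::atomic<std::uint64_t> incoming_wakeups_mask{0};
    int ipis_sent = 0;  // demo counter standing in for sending an IPI
};

// Runs on CPU `from` to wake a thread on `target`.
inline void wake_on(cpu_sketch& target, unsigned from) {
    std::uint64_t old = target.incoming_wakeups_mask.fetch_or(std::uint64_t(1) << from);
    if (old == 0) {
        ++target.ipis_sent;  // nobody else has an IPI pending to target
    }
    // Otherwise an IPI to target is imminent or in flight: elide ours.
}

// Runs on the target CPU in its IPI handler: take all pending bits at once.
inline std::uint64_t drain(cpu_sketch& c) {
    return c.incoming_wakeups_mask.exchange(0);
}
```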
  20. Aug 16, 2013
    • sched: Avoid IPIs in thread::wake() · 71fec998
      Pekka Enberg authored
      Avoid sending an IPI to a CPU that's already being woken up by another
      IPI.  This reduces IPIs by 17% for a cassandra-stress run. Execution
      time is obviously unaffected because execution is bound by lock
      contention.
      
      Before:
      
      [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
      ^C
       Performance counter stats for process id '610':
      
               6,909,333 kvm:kvm_entry
                       0 kvm:kvm_hypercall
                       0 kvm:kvm_hv_hypercall
               1,035,125 kvm:kvm_pio
                       0 kvm:kvm_cpuid
               5,149,393 kvm:kvm_apic
               6,909,369 kvm:kvm_exit
               2,108,440 kvm:kvm_inj_virq
                       0 kvm:kvm_inj_exception
                     982 kvm:kvm_page_fault
               2,783,005 kvm:kvm_msr
                       0 kvm:kvm_cr
                   7,354 kvm:kvm_pic_set_irq
               2,366,388 kvm:kvm_apic_ipi
               2,468,569 kvm:kvm_apic_accept_irq
               2,067,044 kvm:kvm_eoi
               1,982,000 kvm:kvm_pv_eoi
                       0 kvm:kvm_nested_vmrun
                       0 kvm:kvm_nested_intercepts
                       0 kvm:kvm_nested_vmexit
                       0 kvm:kvm_nested_vmexit_inject
                       0 kvm:kvm_nested_intr_vmexit
                       0 kvm:kvm_invlpga
                       0 kvm:kvm_skinit
                   3,677 kvm:kvm_emulate_insn
                       0 kvm:vcpu_match_mmio
                       0 kvm:kvm_update_master_clock
                       0 kvm:kvm_track_tsc
                   7,354 kvm:kvm_userspace_exit
                   7,354 kvm:kvm_set_irq
                   7,354 kvm:kvm_ioapic_set_irq
                     674 kvm:kvm_msi_set_irq
                       0 kvm:kvm_ack_irq
                       0 kvm:kvm_mmio
                 609,915 kvm:kvm_fpu
                       0 kvm:kvm_age_page
                       0 kvm:kvm_try_async_get_page
                       0 kvm:kvm_async_pf_doublefault
                       0 kvm:kvm_async_pf_not_present
                       0 kvm:kvm_async_pf_ready
                       0 kvm:kvm_async_pf_completed
      
            81.180469772 seconds time elapsed
      
      After:
      
      [penberg@localhost ~]$ sudo perf kvm stat -e kvm:* -p `pidof qemu-system-x86_64`
      ^C
       Performance counter stats for process id '30824':
      
               6,411,175 kvm:kvm_entry                                                [100.00%]
                       0 kvm:kvm_hypercall                                            [100.00%]
                       0 kvm:kvm_hv_hypercall                                         [100.00%]
                 992,454 kvm:kvm_pio                                                  [100.00%]
                       0 kvm:kvm_cpuid                                                [100.00%]
               4,300,001 kvm:kvm_apic                                                 [100.00%]
               6,411,133 kvm:kvm_exit                                                 [100.00%]
               2,055,189 kvm:kvm_inj_virq                                             [100.00%]
                       0 kvm:kvm_inj_exception                                        [100.00%]
                   9,760 kvm:kvm_page_fault                                           [100.00%]
               2,356,260 kvm:kvm_msr                                                  [100.00%]
                       0 kvm:kvm_cr                                                   [100.00%]
                   3,354 kvm:kvm_pic_set_irq                                          [100.00%]
               1,943,731 kvm:kvm_apic_ipi                                             [100.00%]
               2,047,024 kvm:kvm_apic_accept_irq                                      [100.00%]
               2,019,044 kvm:kvm_eoi                                                  [100.00%]
               1,949,821 kvm:kvm_pv_eoi                                               [100.00%]
                       0 kvm:kvm_nested_vmrun                                         [100.00%]
                       0 kvm:kvm_nested_intercepts                                    [100.00%]
                       0 kvm:kvm_nested_vmexit                                        [100.00%]
                       0 kvm:kvm_nested_vmexit_inject                                 [100.00%]
                       0 kvm:kvm_nested_intr_vmexit                                   [100.00%]
                       0 kvm:kvm_invlpga                                              [100.00%]
                       0 kvm:kvm_skinit                                               [100.00%]
                   1,677 kvm:kvm_emulate_insn                                         [100.00%]
                       0 kvm:vcpu_match_mmio                                          [100.00%]
                       0 kvm:kvm_update_master_clock                                  [100.00%]
                       0 kvm:kvm_track_tsc                                            [100.00%]
                   3,354 kvm:kvm_userspace_exit                                       [100.00%]
                   3,354 kvm:kvm_set_irq                                              [100.00%]
                   3,354 kvm:kvm_ioapic_set_irq                                       [100.00%]
                     927 kvm:kvm_msi_set_irq                                          [100.00%]
                       0 kvm:kvm_ack_irq                                              [100.00%]
                       0 kvm:kvm_mmio                                                 [100.00%]
                 620,278 kvm:kvm_fpu                                                  [100.00%]
                       0 kvm:kvm_age_page                                             [100.00%]
                       0 kvm:kvm_try_async_get_page                                   [100.00%]
                       0 kvm:kvm_async_pf_doublefault                                 [100.00%]
                       0 kvm:kvm_async_pf_not_present                                 [100.00%]
                       0 kvm:kvm_async_pf_ready                                       [100.00%]
                       0 kvm:kvm_async_pf_completed
      
            79.947992238 seconds time elapsed
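
      Assuming the 17% figure refers to the kvm:kvm_apic_ipi counter in the
      two runs above, the reduction can be checked directly from the quoted
      numbers:

```latex
\frac{2{,}366{,}388 - 1{,}943{,}731}{2{,}366{,}388}
  = \frac{422{,}657}{2{,}366{,}388} \approx 0.179
```

      That is about 17.9%, consistent with the figure quoted in the message.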
    • vfs: store a struct dentry in struct file · 42912366
      Christoph Hellwig authored
      We'll need this for any pathname related actions.
    • vfs: split dentries from vnodes · 85dda0a8
      Christoph Hellwig authored
      Create a new dentry structure for pathname components, following the Linux
      VFS model.  The vnodes are left as-is for now but are always fronted by
      dentries for pathname lookups.  In a second step they will be moved to
      use non-pathname indices.

      [penberg: fix open(O_CREAT|O_EXCL) breakage ]
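
      A sketch of the Linux-style split, with hypothetical, heavily simplified
      structures (not OSv's VFS): a vnode is the file itself, a dentry is one
      named pathname component pointing at its parent and its vnode, and an
      open file keeps the dentry it was opened through, so pathname-related
      operations can recover the path. Two dentries may name the same vnode.

```cpp
#include <cassert>
#include <string>

struct vnode {
    unsigned long ino;  // file identity, independent of any pathname
};

struct dentry {
    std::string name;   // one pathname component
    dentry* parent;     // nullptr for the root
    vnode* vp;          // the file this name refers to
};

struct file {
    dentry* f_dentry;   // the name this file was opened through
};

// Rebuild the absolute path of a dentry by walking up its parents.
inline std::string dentry_path(const dentry* d) {
    if (d->parent == nullptr) {
        return "/";
    }
    std::string parent_path = dentry_path(d->parent);
    if (parent_path != "/") {
        parent_path += "/";
    }
    return parent_path + d->name;
}
```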
    • 68a1624d
      Christoph Hellwig authored
  21. Aug 14, 2013
  22. Aug 13, 2013
    • simple alternative system · 2f453e38
      Glauber Costa authored
      I am proposing, with this patch, a very simple alternative system to serve
      as a basis for Xen pv operations. The end goal is to patch the
      performance-critical instructions in, but I will defer that until later,
      since it is a performance optimization. Let's get this working first.

      However, I figured that if we already write the Xen pv code enclosed in
      some kind of macro, then we won't have to change anything when we do patch.

      That said, I don't expect to have a lot of pure pv users: it is 2013, and
      even VMware discontinued their VMI, leaving Xen as the only relevant player.
      We therefore don't need a fully featured pv-ops core like Linux's. This
      system of alternatives is simple enough to accommodate Xen, and it works by
      providing two code blocks and a condition. The first block is executed if
      the condition is false, and the second if the condition is true.

      For future reference, note that when we do patch, we can do something very
      similar to Linux jump labels: we replace the branch with a jump instruction
      that goes straight to the right place (the taken or not-taken part). This
      brings simplicity and runtime efficiency at the expense of a little more
      icache pressure.
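
      A sketch of the "two code blocks and a condition" interface, written as
      a hypothetical function template rather than OSv's macro. The condition
      (running_on_xen, also hypothetical) is an ordinary variable tested on
      every call; the binary patching described above would later replace
      that test with a direct jump while keeping the same call sites.

```cpp
#include <cassert>

// Hypothetical runtime condition, e.g. "are we a Xen PV guest?".
inline bool running_on_xen = false;

// Run if_false() when cond is false and if_true() when it is true.
// Both callables must return the same type (possibly void).
template <typename IfFalse, typename IfTrue>
auto alternative(bool cond, IfFalse if_false, IfTrue if_true) {
    if (cond) {
        return if_true();
    }
    return if_false();
}
```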
    • simple implementation of bio_finish · ea8a5891
      Glauber Costa authored
      This is used by subr_disk during bio flush operations.