Commits · 22c859332a50360763e54f9b883494145b46fb12 · Verlässliche Systemsoftware / projects / osv

Sep 02, 2013

Pekka Enberg authored 11 years ago

Add simple tests for munmap() for file-backed memory maps. This exposes
a limitation in munmap() not writing out MAP_SHARED mappings.

22c85933

mman: Write out file-backed memory maps in munmap() · c66f9d8b

Pekka Enberg authored 11 years ago

Use the new mmu::msync() function to make sure file-backed memory maps
are written out to disk in munmap().

c66f9d8b

mmu: msync for file-backed memory maps · 1691c89d

Pekka Enberg authored 11 years ago

This adds simple msync() implementation for file-backed memory maps. It
uses the newly added 'file_vma' data structure to write out and fsync
the msync'd region as suggested by Avi Kivity.

1691c89d

mmu: File-backed VMAs · 3510a5ea

Pekka Enberg authored 11 years ago

Add a new 'file_vma' class that extends 'vma'. This is needed to keep
track of fileref and offset for file-backed VMAs for msync().

3510a5ea

error: fix inverted condition in error_to_libc() · 7a21aa3e
Avi Kivity authored 11 years ago

Spotted by Pekka.

7a21aa3e

osv: add error class · c0f7f0f9

Avi Kivity authored 11 years ago

Different source bases have different error conventions; libc has 0/-1+errno,
while the rest of the source base uses 0/error.

Wrap errors in a class to prevent confusion between the two.

c0f7f0f9
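
A minimal sketch of the idea described in c0f7f0f9, assuming a wrapper type with a conversion to the libc convention. The class layout and the names `bad()`, `get()` and `to_libc()` are hypothetical and may differ from the code actually committed.

```cpp
#include <cerrno>

// Hypothetical sketch: keeps the "0 or error code" convention inside a
// distinct type so it cannot be mixed up with libc's "0 or -1 plus errno".
class error {
public:
    error() : _code(0) {}                      // success
    explicit error(int code) : _code(code) {}  // failure with an errno-style code
    bool bad() const { return _code != 0; }
    int get() const { return _code; }

    // Convert to the libc convention: 0 on success, -1 with errno set on failure.
    int to_libc() const {
        if (!bad()) {
            return 0;
        }
        errno = _code;
        return -1;
    }

private:
    int _code;
};
```

Because the constructor is explicit, a plain `int` return value cannot silently turn into an `error`, which is the kind of confusion the commit message is concerned with.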

Sep 01, 2013
- fixing missing commit (adding artifactory wasn't in) · e8c8a90f
  narkisr authored 11 years ago
  
  e8c8a90f
- Merge branch 'master' of github.com:cloudius-systems/osv · a8a76b88
  narkisr authored 11 years ago
  
  a8a76b88
- latest master · 30333580
  narkisr authored 11 years ago
  
  30333580
- adding managment.so and tools.jar · f4cedea2
  narkisr authored 11 years ago
  
  f4cedea2
- using artifactory · 9404b92d
  Ronen Narkis authored 11 years ago
  
  9404b92d
- mman: Fix missing libc.hh include · 584c1737
  Pekka Enberg authored 11 years ago
  
  Fixes the following build breakage caused by commit a7d6b269 ("mman: Use libc_error() in mprotect()"):

      ../../libc/mman.cc: In function ‘int mprotect(void*, size_t, int)’:
      ../../libc/mman.cc:36:33: error: ‘libc_error’ was not declared in this scope
           return libc_error(EINVAL);
                                   ^
      ../../libc/mman.cc:41:33: error: ‘libc_error’ was not declared in this scope
           return libc_error(ENOMEM);
                                   ^
        CXX libc/pipe_buffer.o
        CXX libc/pipe.o
      make[1]: *** [libc/mman.o] Error 1

  584c1737
- mman: Use libc_error() in mprotect() · a7d6b269
  Pekka Enberg authored 11 years ago
  
  a7d6b269
Aug 29, 2013

adding crash jar · 1ab19610
narkisr authored 11 years ago

1ab19610
mempool: use DROP_LOCK() · 72d21c49
Avi Kivity authored 11 years ago

72d21c49

mutex: add DROP_LOCK · db907b1a

Avi Kivity authored 11 years ago

This is used for temporarily dropping a lock in a lexical scope, and
reacquiring it after an exit from the scope (similar to wait_until(mutex),
but without the waiting):

  WITH_LOCK(preempt_lock) {
     // do some stuff
     while (not enough resources) {
        DROP_LOCK(preempt_lock) {
           acquire more resources
        }
        // reload anything that may have changed after DROP_LOCK()
     }
     // do more stuff with the acquired resources
  }

Note that DROP_LOCK() doesn't work well with recursively-taken locks.

db907b1a
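
One way a scoped "drop the lock, then take it back" helper can be built is an RAII guard that unlocks in its constructor and re-locks in its destructor. The sketch below is illustrative only: `reverse_lock_guard` and `counting_lock` are hypothetical names, and the actual DROP_LOCK() macro may be implemented differently.

```cpp
// Hypothetical lock that only counts how many times it is held, so the
// effect of dropping it can be observed.
struct counting_lock {
    int depth = 0;
    void lock() { ++depth; }
    void unlock() { --depth; }
};

// Hypothetical guard: releases the lock for the lifetime of the guard and
// reacquires it when the enclosing scope exits.
template <typename Lock>
class reverse_lock_guard {
public:
    explicit reverse_lock_guard(Lock& l) : _l(l) { _l.unlock(); }
    ~reverse_lock_guard() { _l.lock(); }
    reverse_lock_guard(const reverse_lock_guard&) = delete;
    reverse_lock_guard& operator=(const reverse_lock_guard&) = delete;

private:
    Lock& _l;
};
```

The sketch also suggests why recursively-taken locks are a problem: a guard like this undoes only one level of locking, so a lock held twice is still held once inside the "dropped" scope.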

rcu: add compiler barrier on rcu read unlock · d92eac12
Avi Kivity authored 11 years ago

We don't want the compiler moving reads after a possible rcu_defer().

d92eac12
deleting jars.bin · aa10d564
narkisr authored 11 years ago

aa10d564
Adding mgmt · 77995735
narkisr authored 11 years ago

77995735
mman: Add tracepoints for mmap() and munmap() · aad01cf7
Pekka Enberg authored 11 years ago

aad01cf7
Describe further the dhcp, application execution, etc · 5d524c12
Dor Laor authored 11 years ago

5d524c12
Flex and bison are required to build the acpi library · 441cae8e
Dor Laor authored 11 years ago

441cae8e

java.so: Allow both -classpath and -jar · d96bb41a

Nadav Har'El authored 11 years ago

In the existing code, each -classpath or -jar parameter replaced the
classpath. This is inconvenient (and unlike the Unix "java" program).
Better just add to the classpath.

For example, now we can run:

  java.so -cp /java/cli.jar -jar /java/web.jar app

This runs web.jar's main class, but adds both cli.jar and web.jar
to the classpath.

d96bb41a

Aug 28, 2013

mbufs: use an entire page for jumbop zone allocations · 0d466fab

Glauber Costa authored 11 years ago

Xen has hard requirements on page transfers, and on how to feed the grant tables.
The address needs to be page aligned, since pfns and not addresses are used,
and we need to provide at least a full page per buffer, since the hypervisor is
free to fill any data within the page.

To achieve that, the netfront driver will use m_cljget to attach an extended
buffer to the mbuf, from the jumbop zone, since they are page-sized. However,
two problems arise from this:

1) Allocating a page goes through malloc_large. Our implementation of malloc_large
is currently terribly inefficient, and that creates a very heavy contention site.

What I am doing with this patch is to switch our uma implementation to
alloc_page / free_page instead of malloc if the caller of zcreate so specified
(and then of course, specify it for the jumbop cache)

2) The refcount that is attached at the end of the buffer would either extend the
buffer to 4100 bytes, defeating our purpose, or the buffer would have to be
PAGE_SIZE - 4 to accommodate the refcount. But since the hypervisor will write
to the whole page, it would eventually overwrite the refcount.

To address that, I am allocating an external reference counter. BSD already
has some infrastructure to do that, and I am taking advantage of this.
However, I have found no way of implementing this in a way in which the
reference count can be easily deduced from the address of the extended
buffer, without having the supporting mbuf to start from. Any external data
structure such as hashes would probably make freeing way too slow. Thankfully,
uma_find_refcnt and the UMA_ZONE_REFCNT seem to be used mostly in the
setup/destruction phase (the mbuf refcount is used directly, open coded). So my
proposal here is to remove the UMA_ZONE_REFCNT for that zone.

0d466fab

work around xen x2apic bug · cc3d517a

Glauber Costa authored 11 years ago

The x2APIC specification says that reading from the X2APIC_ID MSR should return
the physical apic id of the current processor. However, the Xen implementation
(as of 4.2.2) is broken, and reads actually return the old-style xAPIC id. Even if
they fix it, we still have HVs deployed around that will return the wrong ID.
We can work around this by testing if the returned APIC id is in the form (id
<< 24), since in that case, the low 24 bits will all be zero. Then at least
we can get this working everywhere. This may pose a problem if we want to ever
support more than 1 << 24 vCPUs (or if any other HV has some random x2apic
ids), but that is highly unlikely anyway.

cc3d517a
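
The check described in cc3d517a can be sketched as a small helper. The function name `normalize_apic_id` is hypothetical, and the actual patch may structure the test differently.

```cpp
#include <cstdint>

// Hypothetical helper: takes the raw value read from the X2APIC_ID MSR and
// returns the APIC ID, undoing the xAPIC-style (id << 24) encoding if the
// value looks like it is in that form.
inline std::uint32_t normalize_apic_id(std::uint32_t raw)
{
    // An xAPIC-style value keeps the ID in bits 24-31 and leaves the low
    // 24 bits zero. A raw value of 0 means ID 0 in either format.
    if (raw != 0 && (raw & 0x00ffffffu) == 0) {
        return raw >> 24;
    }
    return raw;
}
```

As the commit message notes, this heuristic would misread a genuine x2APIC ID whose low 24 bits happen to be zero, which only matters for very large or unusual ID assignments.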

apic: bringup cpus individually instead of all at the same time · 5cb16020

Glauber Costa authored 11 years ago

As I have described in a previous patch, the Xen hypervisor has a very nasty
bug that causes all of the x2apic msr writes to trigger a GPF. Although the
request proceeds fine despite the GPF, it does bring a problem for the
all-but-self style init sequences we are using: after "failing" (succeeding but
returning failure) to deliver the interrupt for the first cpu in the group, xen will
break the loop, therefore not delivering the SIPIs to other cpus in the system
at all. We can work around that by delivering interrupts to each cpu
individually, instead of all-but-self.

5cb16020

implement wrmsr_safe · a7ea5784

Glauber Costa authored 11 years ago

Unfortunately, the Xen hypervisor has a very nasty bug (it seems to be fixed by a
2013 patch, which means that although it is fixed, a lot of deployed hypervisors
will still have it) that causes all of the x2apic msr writes to init-related registers
(INIT, SIPI, etc.) to trigger a GPF. The way to work around this is to implement a
form of "wrmsr_safe".

a7ea5784

trivial: remove device debug messages · c6bc3478

Glauber Costa authored 11 years ago

I ended up forgetting to remove some kprintfs from device.c that were inserted
during Xen's blkfront development.

c6bc3478

gdb: Add mmap info to 'osv mem' · 34efd764

Pekka Enberg authored 11 years ago

Now that we can walk through the vma list, add mmap numbers to 'osv
mem':

  (gdb) osv mem
  Total Memory: 4294564864 Bytes
  Mmap Memory:  3278278656 Bytes (76.34%)
  Free Memory:  474492928 Bytes (11.05%)

34efd764

gdb: 'osv mmap' for inspecting vmas · 448ef255
Pekka Enberg authored 11 years ago

448ef255

Aug 27, 2013

Fix mincore() on non-mmap()ed memory · 6924f7db

Nadav Har'El authored 11 years ago

Commit 65afd075 fixed mincore() to recognize
unmapped addresses. However, it used mmu::ismapped() which just checks for
mmap()'ed addresses, and doesn't know about malloc()ed memory. This causes
trouble for libunwind (which we use for backtrace()) which tests mincore()
on an on-stack variable, and for non-pthread threads, this stack might be
malloc'ed, not mmap'ed.

So this patch adds mmu::isreadable(), which checks that a given memory range
is all readable (this memory can be mmapped, malloced, stack, whatever).
mincore() now uses that.

mmu::isreadable() is implemented, following Avi's idea, by trying to read,
with safe_load(), one byte from every page in the range. This approach is
faster than page-table-walking, especially for one-byte checks (which are all
that libunwind uses anyway), and is also very simple.

6924f7db
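
The page-by-page probing described in 6924f7db can be sketched as follows. This is an illustration under stated assumptions, not the committed code: `is_readable`, the `probe` callback (standing in for a fault-tolerant one-byte read such as safe_load()) and the 4096-byte `page_size` are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::uintptr_t page_size = 4096;  // assumed page size

// Returns true if one byte in every page overlapping [addr, addr + size)
// can be read. `probe` returns false when the byte at the given address
// cannot be read. Address wrap-around is not handled in this sketch.
inline bool is_readable(std::uintptr_t addr, std::size_t size,
                        const std::function<bool(std::uintptr_t)>& probe)
{
    if (size == 0) {
        return true;
    }
    const std::uintptr_t end = addr + size;
    // Probe the first byte of the range, then the first byte of each
    // following page until the end of the range is reached.
    for (std::uintptr_t p = addr; p < end;
         p = (p & ~(page_size - 1)) + page_size) {
        if (!probe(p)) {
            return false;
        }
    }
    return true;
}
```

For a one-byte range this performs a single probe, which matches the observation in the commit message that such checks are the common case.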

Test mincore() on stack and malloc()ed memory · 73cc470d

Nadav Har'El authored 11 years ago

Unlike msync(), mincore() should also work on non-mmapped memory,
such as stack and malloc()ed memory. Currently it doesn't - it
fails on malloc()ed memory and only sometimes works on stacks (works
on pthread stacks which are mmapped, but not on sched::thread stacks
which are malloced by default).

This patch adds a test to tst-mmap.cc to demonstrate this problem.
The test currently fails, will be fixed in a follow-up patch.

73cc470d

mempool.c: trace large allocations · 0a798e4d

Glauber Costa authored 11 years ago

Most of the performance problems I have found on Xen were due to the fact that
we were hitting malloc_large consistently, for allocations that we should be
able to service in some other way. Because malloc_large in our implementation
is such a bottleneck, it was very useful for me to have separate tracepoints
for these allocations. I am therefore proposing them for inclusion.

0a798e4d

Fix deadlock in leak detector · 227eb39b

Nadav Har'El authored 11 years ago

Commit 65afd075 that fixed mincore()
exposed a deadlock in the leak detector, caused by two threads taking
two locks in opposite order:

Thread 1: malloc() does alloc_tracker::remember(). This takes the tracker
lock and calls backtrace() calling mincore() which takes the
vma_list_mutex.

Thread 2: mmap() does mmu::allocate() which takes the vma_list_mutex and
then through mmu::populate::small_page calls memory::alloc_page() which
calls alloc_tracker::remember() and takes the tracker lock.

This patch fixes this deadlock: alloc_tracker::remember() will now drop its
lock while running backtrace(), as the lock is only needed to protect the
allocations[] array. We need to retake the lock after backtrace() completes,
to copy the backtrace back to the allocations[] array.

Previously, the lock's depth was also (ab)used for avoiding nested
allocation tracking (e.g., tracking of memory allocation done inside
backtrace() itself), but now that backtrace() is run without the lock,
we need a different mechanism - a per-thread "in_tracker" flag, which
is turned on inside the alloc_tracker::remember()/forget() methods.

227eb39b
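
The shape of the fix described in 227eb39b can be sketched as below. All names are hypothetical stand-ins (`fake_backtrace` plays the role of backtrace(), `allocations` the role of the allocations[] array), and the sketch simplifies the commit's drop-and-retake sequence by taking the lock only around the table update.

```cpp
#include <mutex>
#include <vector>

// Hypothetical state for illustration only.
thread_local bool in_tracker = false;  // per-thread re-entrancy flag
std::mutex tracker_lock;               // protects `allocations` only
std::vector<int> allocations;

void remember(int id);

// Stand-in for backtrace(): it may allocate, which re-enters remember().
int fake_backtrace(int id)
{
    remember(-id);  // nested allocation; must not be tracked
    return id;
}

void remember(int id)
{
    if (in_tracker) {
        return;  // nested call from inside the tracker: ignore it
    }
    in_tracker = true;
    // The expensive step runs without the tracker lock, so it can take
    // other locks without creating a lock-order inversion.
    int bt = fake_backtrace(id);
    {
        std::lock_guard<std::mutex> guard(tracker_lock);
        allocations.push_back(bt);  // lock held only for the table update
    }
    in_tracker = false;
}
```

With the lock no longer held during the backtrace step, its recursion depth can no longer signal nesting, which is why a separate per-thread flag is needed.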

docs: fix netperf instructions · 6f56f6a5
Glauber Costa authored 11 years ago

This allows lazy people like me to just copy the instructions

6f56f6a5

cpu: initialize the FPU and CSR register · 04ddff7a

Glauber Costa authored 11 years ago

We can't trust the state of the FPU and the CSR registers to be always sane.
Apparently, they aren't on at least one version of Xen (which happens to be
the one I am using). Initialize them manually for all CPUs on bringup.

04ddff7a

xen: correctly ack interrupts · bcf77dc9

Glauber Costa authored 11 years ago

In the xen interrupt code, I have made the mistake of exchanging the previous
value of _irq_pending with true, which means that we were constantly polling
for data in the interrupt threads.

This was responsible for the latency spikes I was seeing. The simple "ping"
test still shows bad results in absolute terms, but at least now the spikes are
gone.

bcf77dc9
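
The mistake described in bcf77dc9 can be illustrated with a plain atomic flag. This is a sketch under assumptions: the names are hypothetical, and the fix is presumed to exchange the flag with false, which the commit message implies but does not spell out.

```cpp
#include <atomic>

// Hypothetical stand-in for the _irq_pending flag.
std::atomic<bool> irq_pending{false};

// Called when an interrupt arrives.
void raise_irq() { irq_pending.store(true); }

// Buggy variant: exchanging with true returns the old value but leaves the
// flag set, so every later check still sees pending work.
bool consume_buggy() { return irq_pending.exchange(true); }

// Presumed fix: exchanging with false returns the old value and clears the
// flag in the same atomic step.
bool consume() { return irq_pending.exchange(false); }
```

With the buggy variant, a thread that loops on the flag never goes idle after the first interrupt, which is consistent with the constant polling the commit message describes.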

Aug 26, 2013

Avoid including elf.hh from sched.hh · 714d313a

Nadav Har'El authored 11 years ago

sched.hh included elf.hh, just so it can refer to the elf::tls_data
type. But now that we have rcu.hh which includes sched.hh and therefore
elf.hh, if we wish to use rcu in elf.hh (we'll do this in a later patch),
we have an include loop mess.

So better not include elf.hh from sched.hh, and just declare the one
struct we need.

After sched.hh no longer includes elf.hh and the dozen includes that
it further included, we need to add missing includes to some of the
code that included sched.hh and relied on its implicit includes.

714d313a

signal: avoid nested signals · 4af36771

Avi Kivity authored 11 years ago

A signal within a signal handler is really bad news; abort when it happens
to let the developers debug it.

4af36771

mmu: don't pass really bad faults to the application · 6f464e76

Avi Kivity authored 11 years ago

Attempts to execute a null pointer, and faults within kernel code, are
a really bad sign; it's better to abort early when they happen.

6f464e76