  Nov 25, 2013
    • Start up shell and management web in parallel · c29222c6
      Amnon Heiman authored
      
      Start up shell and management web in parallel to make boot faster.  Note
      that we also switch to the latest mgmt.git, which decouples JRuby and
      CRaSH startup.
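
      A minimal sketch of the idea, assuming two hypothetical entry points
      start_shell() and start_mgmt_web() (the real OSv startup code differs):

        #include <thread>

        // Hypothetical entry points, for illustration only.
        void start_shell();
        void start_mgmt_web();

        void boot_services()
        {
            // Launch both services concurrently instead of sequentially.
            std::thread shell(start_shell);
            std::thread web(start_mgmt_web);
            shell.join();
            web.join();
        }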
      
      Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • java: Support for loading multiple mains · 10d6f18b
      Amnon Heiman authored
      
      When using the MultiJarLoader as the main class, it will use a
      configuration file for the java loading.  Each line in the file starts
      one main; a line can use -jar or specify a main class explicitly.
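
      A hypothetical configuration file, for illustration only (the contents
      are made up, not taken from the commit), with one main per line:

        -jar /usr/mgmt/web.jar
        com.example.Main arg1 arg2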
      
      Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
      Reviewed-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • tests: mincore() tests for demand paging · 20aad632
      Pekka Enberg authored
      
      As suggested by Nadav, add tests for mincore() interaction with demand
      paging.
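
      A minimal sketch of such a test, relying on standard POSIX
      mmap()/mincore() semantics (not the committed test code):

        #include <sys/mman.h>
        #include <cassert>

        int main()
        {
            size_t len = 4096;
            // Map anonymous memory; under demand paging the page is not
            // resident until touched.
            char* p = static_cast<char*>(mmap(nullptr, len,
                PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
            unsigned char vec;
            mincore(p, len, &vec);
            assert(!(vec & 1));   // untouched page reported as not resident
            p[0] = 1;             // touch the page to fault it in
            mincore(p, len, &vec);
            assert(vec & 1);      // now resident
            munmap(p, len);
        }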
      
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • tests: Anonymous demand paging microbenchmark · d4bcf559
      Pekka Enberg authored
      
      This adds a simple mmap microbenchmark that can be run on both OSv and
      Linux.  The benchmark mmaps memory for various sizes and touches the
      mmap'd memory in 4K increments to fault in memory.  The benchmark also
      repeats the same tests using MAP_POPULATE for reference.
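
      The core loop of such a benchmark might look like this sketch (not the
      committed test itself):

        #include <sys/mman.h>
        #include <chrono>
        #include <cstdio>

        // Time faulting in 'len' bytes, touching one byte per 4K page.
        static double time_mmap(size_t len, int extra_flags)
        {
            auto start = std::chrono::steady_clock::now();
            char* p = static_cast<char*>(mmap(nullptr, len,
                PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0));
            for (size_t off = 0; off < len; off += 4096) {
                p[off] = 1;   // fault in one page at a time
            }
            munmap(p, len);
            std::chrono::duration<double> d =
                std::chrono::steady_clock::now() - start;
            return d.count();
        }

        int main()
        {
            for (size_t mb = 1; mb <= 1024; mb *= 2) {
                printf("%5zu %.3f  %.3f\n", mb,
                       time_mmap(mb << 20, 0),             // demand
                       time_mmap(mb << 20, MAP_POPULATE)); // populate
            }
        }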
      
      OSv page faults are slightly slower than Linux's on the first iteration
      but faster on subsequent iterations, after the host operating system has
      faulted in memory for the guest.
      
      I've included full numbers on a 2-core Sandy Bridge i7 for an OSv guest,
      a Linux guest, and a Linux host below:
      
        OSv guest
        ---------
      
        Iteration 1
      
             time (seconds)
         MiB demand populate
           1 0.004  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.003  0.000
          32 0.007  0.000
          64 0.013  0.000
         128 0.024  0.000
         256 0.052  0.001
         512 0.229  0.002
        1024 0.587  0.005
      
        Iteration 2
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.019  0.001
         256 0.036  0.001
         512 0.069  0.002
        1024 0.137  0.005
      
        Iteration 3
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.020  0.000
         256 0.039  0.001
         512 0.087  0.002
        1024 0.138  0.005
      
        Iteration 4
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.012  0.000
         128 0.025  0.001
         256 0.040  0.001
         512 0.082  0.002
        1024 0.138  0.005
      
        Iteration 5
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.000  0.000
           4 0.000  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.012  0.000
         128 0.028  0.001
         256 0.040  0.001
         512 0.082  0.002
        1024 0.166  0.005
      
        Linux guest
        -----------
      
        Iteration 1
      
             time (seconds)
         MiB demand populate
           1 0.001  0.000
           2 0.001  0.000
           4 0.002  0.000
           8 0.003  0.000
          16 0.005  0.000
          32 0.008  0.000
          64 0.015  0.000
         128 0.151  0.001
         256 0.090  0.001
         512 0.266  0.003
        1024 0.401  0.006
      
        Iteration 2
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.009  0.000
         128 0.019  0.001
         256 0.037  0.001
         512 0.072  0.003
        1024 0.144  0.006
      
        Iteration 3
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.019  0.001
         256 0.037  0.001
         512 0.072  0.003
        1024 0.143  0.006
      
        Iteration 4
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.003  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.020  0.001
         256 0.038  0.001
         512 0.073  0.003
        1024 0.143  0.006
      
        Iteration 5
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.003  0.000
          32 0.005  0.000
          64 0.010  0.000
         128 0.020  0.001
         256 0.037  0.001
         512 0.072  0.003
        1024 0.144  0.006
      
        Linux host
        ----------
      
        Iteration 1
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.001  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.005  0.000
          64 0.009  0.000
         128 0.019  0.001
         256 0.035  0.001
         512 0.152  0.003
        1024 0.286  0.011
      
        Iteration 2
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.035  0.001
         512 0.192  0.003
        1024 0.334  0.011
      
        Iteration 3
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.035  0.001
         512 0.194  0.003
        1024 0.329  0.011
      
        Iteration 4
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.036  0.001
         512 0.138  0.003
        1024 0.341  0.011
      
        Iteration 5
      
             time (seconds)
         MiB demand populate
           1 0.000  0.000
           2 0.000  0.000
           4 0.001  0.000
           8 0.001  0.000
          16 0.002  0.000
          32 0.004  0.000
          64 0.010  0.000
         128 0.018  0.001
         256 0.035  0.001
         512 0.135  0.002
        1024 0.324  0.011
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • mmu: Anonymous memory demand paging · c1d5fccb
      Pekka Enberg authored
      
      Switch to demand paging for anonymous virtual memory.
      
      I used SPECjvm2008 to verify the performance impact. The numbers are
      mostly the same, with a few exceptions, most visible in the 'serial'
      benchmark. However, there's quite a lot of variance between SPECjvm2008
      runs, so I wouldn't read too much into them.
      
      As we need the demand paging mechanism, and the performance numbers
      suggest that the implementation is reasonable, I'd merge the patch as-is
      and optimize it later.
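
      Conceptually, anonymous demand paging defers allocation from mmap()
      time to the first touch of each page. A rough sketch of the pattern,
      with stubs standing in for the real allocator and page-table code (this
      is illustrative, not OSv's actual fault handler):

        #include <cstdint>
        #include <cstdlib>

        // Stubs for illustration; the real allocator and mapper differ.
        static void* alloc_zeroed_page() { return calloc(1, 4096); }
        static void map_page(uintptr_t va, void* page, unsigned perm)
        { (void)va; (void)page; (void)perm; /* arch-specific */ }

        struct anon_vma { uintptr_t start, end; unsigned perm; };

        // Page fault path: instead of populating the whole range at mmap()
        // time, allocate and map a single zeroed page on first access.
        void demand_fault(const anon_vma& vma, uintptr_t addr)
        {
            uintptr_t va = addr & ~uintptr_t(4095);  // align down to page
            map_page(va, alloc_zeroed_page(), vma.perm);
        }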
      
        Before:
      
          Running SPECjvm2008 benchmarks on an OSv guest.
          Score on compiler.compiler: 331.23 ops/m
          Score on compiler.sunflow: 131.87 ops/m
          Score on compress: 118.33 ops/m
          Score on crypto.aes: 41.34 ops/m
          Score on crypto.rsa: 204.12 ops/m
          Score on crypto.signverify: 196.49 ops/m
          Score on derby: 170.12 ops/m
          Score on mpegaudio: 70.37 ops/m
          Score on scimark.fft.large: 36.68 ops/m
          Score on scimark.lu.large: 13.43 ops/m
          Score on scimark.sor.large: 22.29 ops/m
          Score on scimark.sparse.large: 29.35 ops/m
          Score on scimark.fft.small: 195.19 ops/m
          Score on scimark.lu.small: 233.95 ops/m
          Score on scimark.sor.small: 90.86 ops/m
          Score on scimark.sparse.small: 64.11 ops/m
          Score on scimark.monte_carlo: 145.44 ops/m
          Score on serial: 94.95 ops/m
          Score on sunflow: 73.24 ops/m
          Score on xml.transform: 207.82 ops/m
          Score on xml.validation: 343.59 ops/m
      
        After:
      
          Score on compiler.compiler: 346.78 ops/m
          Score on compiler.sunflow: 132.58 ops/m
          Score on compress: 116.05 ops/m
          Score on crypto.aes: 40.26 ops/m
          Score on crypto.rsa: 206.67 ops/m
          Score on crypto.signverify: 194.47 ops/m
          Score on derby: 175.22 ops/m
          Score on mpegaudio: 76.18 ops/m
          Score on scimark.fft.large: 34.34 ops/m
          Score on scimark.lu.large: 15.00 ops/m
          Score on scimark.sor.large: 24.80 ops/m
          Score on scimark.sparse.large: 33.10 ops/m
          Score on scimark.fft.small: 168.67 ops/m
          Score on scimark.lu.small: 236.14 ops/m
          Score on scimark.sor.small: 110.77 ops/m
          Score on scimark.sparse.small: 121.29 ops/m
          Score on scimark.monte_carlo: 146.03 ops/m
          Score on serial: 87.03 ops/m
          Score on sunflow: 77.33 ops/m
          Score on xml.transform: 205.73 ops/m
          Score on xml.validation: 351.97 ops/m
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • mmu: Optimistic locking in populate() · 7e568ba0
      Pekka Enberg authored
      
      Use optimistic locking in populate() to make it robust against
      concurrent page faults.
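
      For illustration, one generic optimistic pattern (names made up, not
      the actual mmu code): install a page table entry only if no concurrent
      page fault got there first, and keep the existing entry otherwise.

        #include <atomic>
        #include <cstdint>

        struct page_entry { std::atomic<uintptr_t> pte; };

        bool populate_one(page_entry& e, uintptr_t new_pte)
        {
            uintptr_t expected = 0;  // expect an empty entry
            // If a concurrent fault already installed a mapping, the CAS
            // fails and the existing mapping is kept.
            return e.pte.compare_exchange_strong(expected, new_pte);
        }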
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • mmu: VMA permission flags · 8a56dc8c
      Pekka Enberg authored
      
      Add permission flags to VMAs. They will be used by mprotect() and the
      page fault handler.
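
      For illustration, such permission flags are typically a small bitmask
      (these names are made up, not the committed ones):

        // Hypothetical flag names, for illustration only.
        enum vma_perm : unsigned {
            perm_read  = 1 << 0,
            perm_write = 1 << 1,
            perm_exec  = 1 << 2,
        };

        // e.g. the page fault handler can reject a write to a read-only VMA:
        inline bool write_allowed(unsigned perm) { return perm & perm_write; }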
      
      Reviewed-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • loader.py: add commands for function duration analysis · af723084
      Tomasz Grabiec authored
      
      Duration analysis is based on trace pairs which follow the convention
      that function entry generates a trace named X and function exit generates
      either trace X_ret or X_err. Traces which do not have an accompanying
      return tracepoint are ignored.
      
      New commands:
      
        osv trace summary
      
            Prints execution time statistics for traces.

        osv trace duration {function}

            Prints timed traces sorted by duration in descending order,
            optionally narrowed down to a specified function.
      
      gdb$ osv trace summary
      Execution times [ms]:
      name          count      min      50%      90%      99%    99.9%      max    total
      vfs_pwritev       3    0.682    1.042    1.078    1.078    1.078    1.078    2.801
      vfs_pwrite       32    0.006    1.986    3.313    6.816    6.816    6.816   53.007
      
      gdb$ osv trace duration
      0xffffc000671f0010  1    1385318632.103374   6.816 vfs_pwrite
      0xffffc0003bbef010  0    1385318637.929424   3.923 vfs_pwrite
      
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • loader.py: add wrapper for intrusive list · 6cc939a6
      Tomasz Grabiec authored
      
      The iteration logic was duplicated in two places, and upcoming patches
      would add a third, so let's refactor first.
      
      Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • libc/network: feof shouldn't be used on a closed file · df6278fe
      Raphael S. Carvalho authored
      
      Calling feof on a closed file isn't safe, and the result is undefined.
      Found while auditing the code.
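
      A sketch of the kind of bug being fixed, in terms of standard C stdio
      (the actual call site is in the libc network code):

        #include <cstdio>

        void broken(FILE* f)
        {
            fclose(f);
            if (feof(f)) { /* undefined: f is already closed */ }
        }

        void fixed(FILE* f)
        {
            bool at_eof = feof(f) != 0;  // query while the stream is open
            fclose(f);
            if (at_eof) { /* safe: answer captured before closing */ }
        }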
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • sched: fix iteration across timer list · 9c3308f1
      Avi Kivity authored
      
      We iterate over the timer list using an iterator, but the timer list can
      change during iteration due to timers being re-inserted.
      
      Switch to just looking at the head of the list instead, maintaining no
      state across loop iterations.
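
      A sketch of the pattern with a generic list (not the actual sched
      code): always re-examine the head, so no iterator survives a callback
      that may mutate the list.

        #include <list>

        struct timer {
            unsigned long when;
            void fire();   // may re-insert this or other timers
        };

        void expire_timers(std::list<timer*>& timers, unsigned long now)
        {
            while (!timers.empty() && timers.front()->when <= now) {
                timer* t = timers.front();
                timers.pop_front();
                t->fire();   // list may change; head is re-read next loop
            }
        }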
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • sched: prevent a re-armed timer from being ignored · 870d8410
      Avi Kivity authored
      
      When a hardware timer fires, we walk over the timer list, expiring timers
      and erasing them from the list.
      
      This is all well and good, except that a timer may rearm itself in its
      callback (this only holds for timer_base clients, not sched::timer, which
      consumes its own callback).  If it does, we end up erasing it even though
      it wants to be triggered.
      
      Fix by checking for the armed state before erasing.
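
      Sketched in generic terms (not the actual sched code), assuming every
      timer on the list has already expired: run the callback first, then
      erase the timer only if it did not re-arm itself.

        #include <list>

        struct timer {
            bool armed = true;
            void fire();   // a timer_base client may re-arm itself here
        };

        void expire_timers(std::list<timer*>& timers)
        {
            for (auto it = timers.begin(); it != timers.end(); ) {
                timer* t = *it;
                t->armed = false;
                t->fire();                  // callback may set armed again
                if (t->armed) {
                    ++it;                   // re-armed: keep the timer
                } else {
                    it = timers.erase(it);  // expired for good: erase
                }
            }
        }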
      
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • Fix possible deadlock in condvar · 15a32ac8
      Nadav Har'El authored
      
      When a condvar's timeout and wakeup race, we wait for the concurrent
      wakeup to complete, so it won't crash. We did this wr.wait() with the
      condvar's internal mutex (m) locked, which was fine when this code was
      written; but now that we have wait morphing, wr.wait() waits not just
      for the wakeup to complete, but also for the user_mutex to become
      available. With m locked and us waiting for user_mutex, we're in
      deadlock territory, because a common idiom of using a condvar is to
      take the locks in the opposite order: lock user_mutex first and then
      use the condvar, which locks m.
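
      The idiom in question, sketched with POSIX-style calls (the OSv
      condvar API is analogous):

        #include <pthread.h>

        pthread_mutex_t user_mutex = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t  cond       = PTHREAD_COND_INITIALIZER;
        bool ready = false;

        void waiter()
        {
            pthread_mutex_lock(&user_mutex);   // user_mutex first...
            while (!ready) {
                // ...then the condvar, which takes its internal mutex (m).
                // With wait morphing, waking up must reacquire user_mutex,
                // so holding m while waiting for user_mutex inverts this
                // order and risks deadlock.
                pthread_cond_wait(&cond, &user_mutex);
            }
            pthread_mutex_unlock(&user_mutex);
        }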
      
      I can't think of an easy way to actually demonstrate this deadlock,
      short of a locked condvar_wait timeout racing with condvar_wake_one
      while an additional locked condvar operation comes in concurrently,
      so I don't have a test case demonstrating it. I am hoping this will
      fix the lockups that Pekka is seeing in his Cassandra tests (which
      are the reason I looked for possible condvar deadlocks in the first
      place).
      
      Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
      Tested-by: Pekka Enberg <penberg@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • sched: delay initialization of early threads · d91d7799
      Glauber Costa authored
      
      The problem with sleep is that we can initialize early threads before
      the cpu itself is initialized. If we look at what goes on in
      init_on_cpu, it should become clear:
      
      void cpu::init_on_cpu()
      {
          arch.init_on_cpu();
          clock_event->setup_on_cpu();
      }
      
      When we finally initialize the clock_event, it can get lost if we
      already have pending timers of any kind - which we may, if early
      threads are start()ed before that. I have played with many potential
      solutions, but in the end I think the most sensible thing to do is to
      delay initialization of early threads until we are first idle. That is
      the best way to guarantee that everything is properly initialized and
      running.
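
      A sketch of the approach under hypothetical names (the real scheduler
      code differs): queue threads that are started too early and launch
      them once the cpu first goes idle.

        #include <vector>

        struct thread { void do_start(); };

        static std::vector<thread*> early_threads;
        static bool cpu_initialized = false;

        void start(thread* t)
        {
            if (!cpu_initialized) {
                early_threads.push_back(t);  // clock_event not set up yet
                return;
            }
            t->do_start();
        }

        // Called when the cpu first idles, after init_on_cpu() has run.
        void on_first_idle()
        {
            cpu_initialized = true;
            for (thread* t : early_threads) {
                t->do_start();               // timers can no longer be lost
            }
            early_threads.clear();
        }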
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>