- Jan 10, 2014
-
-
Glauber Costa authored
This patch introduces the memory reclaimer thread, which I hope to use to dispose of unused memory when pressure kicks in. "Pressure" is currently defined as having only 20% of total memory available, but that can be revisited. The way it will work is that each memory user that is able to dispose of its memory will register a shrinker, and the reclaimer will loop through them. However, the current "loop through all" only "works" because we have only one shrinker registered. When others appear, we will need better policies to drive how much to take, and from whom. Memory allocation will now wait if memory is not available, instead of aborting. The decision to abort should belong to the reclaimer and no one else. We should never expect to have an unbounded and, more importantly, fully opaque number of shrinkers like Linux does. We have control of who they are and how they behave, so I expect that we will be able to make much better decisions in the long run. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
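The reclaimer loop described in the commit above can be modelled roughly as follows. This is only a toy sketch in Python, not the OSv code: the class names `Shrinker` and `Reclaimer`, the `release()` method and the byte counts are all assumptions made for illustration. Only the 20% threshold and the "loop through all shrinkers" policy come from the commit message.

```python
class Shrinker:
    """Hypothetical memory user that can give memory back on request."""

    def __init__(self, name, held):
        self.name = name
        self.held = held  # bytes currently held and reclaimable

    def release(self, target):
        """Free up to `target` bytes and return how many were actually freed."""
        freed = min(self.held, target)
        self.held -= freed
        return freed


class Reclaimer:
    """Toy model: pressure means free memory is below 20% of total."""

    PRESSURE_PERCENT = 20

    def __init__(self, total, free):
        self.free = free
        # Free memory below this many bytes counts as pressure (rounded up).
        self.threshold = (total * self.PRESSURE_PERCENT + 99) // 100
        self.shrinkers = []

    def register(self, shrinker):
        self.shrinkers.append(shrinker)

    def under_pressure(self):
        return self.free < self.threshold

    def reclaim(self):
        """Loop through all shrinkers until the pressure is relieved."""
        for shrinker in self.shrinkers:
            if not self.under_pressure():
                break
            self.free += shrinker.release(self.threshold - self.free)
        return not self.under_pressure()


cache = Shrinker("cache", held=500)
reclaimer = Reclaimer(total=1000, free=100)
reclaimer.register(cache)
relieved = reclaimer.reclaim()
```

With a single shrinker the loop is trivially fair. Once several are registered, the order of `self.shrinkers` decides who pays first, which is the policy gap the commit message points out.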
-
Glauber Costa authored
Following an early suggestion from Nadav, I am trying to use semaphores for the balloon instead of keeping our own queue. For that to work, I need a bit more functionality that may not belong in the main semaphore class. Namely: 1) I need to query for the presence of waiters (and maybe in the future for the number of waiters) 2) I need a special post that would allow me to make sure that we are posting at most as much as we're waiting for, and nothing more. This patch transforms the post method into an unlocked version (and exposes a trivial version that just locks around it) and makes other changes necessary to allow subclassing. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
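A rough Python model of the split described above: an unlocked post, a trivial locking wrapper, and a subclass that adds the two extra operations. Every name here (`Semaphore`, `BalloonSemaphore`, `request`, `post_capped`, `has_waiters`) is hypothetical, and `request()` is a non-blocking stand-in for a real `wait()`, which would block instead of returning.

```python
import threading
from collections import deque


class Semaphore:
    def __init__(self, value=0):
        self._mtx = threading.Lock()
        self._value = value
        self._waiters = deque()  # pending requests, in units, FIFO order

    def post_unlocked(self, units):
        """Add units and satisfy queued waiters. Caller must hold the lock."""
        self._value += units
        while self._waiters and self._waiters[0] <= self._value:
            self._value -= self._waiters.popleft()

    def post(self, units=1):
        """Trivial locked version that just wraps post_unlocked()."""
        with self._mtx:
            self.post_unlocked(units)

    def request(self, units=1):
        """Take units if available now; otherwise queue the request."""
        with self._mtx:
            if not self._waiters and self._value >= units:
                self._value -= units
                return True
            self._waiters.append(units)
            return False


class BalloonSemaphore(Semaphore):
    def has_waiters(self):
        with self._mtx:
            return bool(self._waiters)

    def post_capped(self, units):
        """Post at most what the waiters still need; return units posted."""
        with self._mtx:
            wanted = sum(self._waiters) - self._value
            granted = max(0, min(units, wanted))
            self.post_unlocked(granted)
            return granted


sem = BalloonSemaphore()
sem.request(3)               # queued, nothing available yet
sem.request(2)               # queued behind the first request
posted = sem.post_capped(10)  # only 5 units are actually needed
```

Keeping `post_unlocked()` separate is what lets the subclass compute the cap and post under a single lock acquisition.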
-
Glauber Costa authored
This will be useful when we shrink, so we know how much memory we newly released for system consumption. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
operate so far operates on a page range and at the very most sets a success flag somewhere. I am here extending the API to allow it to return how much data it manipulated. So as an example, if we fault in 2MB in an empty range, it will return 2 << 20. But if we fault in the same 2MB in a range that already contained some sparse 4k pages, we will return (2 << 20) - previous_pages. That will be useful to count memory usage in certain VMAs. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
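The return value described above can be illustrated with a small Python model. It assumes 4 KiB pages tracked in a plain set; `populate` and `mapped_pages` are made-up names and have nothing to do with the real page-table code.

```python
PAGE_SIZE = 4096
HUGE_PAGE_SIZE = 2 << 20  # 2 MiB


def populate(mapped_pages, start, size):
    """Map every 4 KiB page in [start, start + size); return bytes newly mapped."""
    newly_mapped = 0
    for addr in range(start, start + size, PAGE_SIZE):
        if addr not in mapped_pages:
            mapped_pages.add(addr)
            newly_mapped += PAGE_SIZE
    return newly_mapped


empty_range = set()
from_empty = populate(empty_range, 0, HUGE_PAGE_SIZE)

# Three sparse 4 KiB pages were already present before the fault.
sparse_range = {0, 8 * PAGE_SIZE, 100 * PAGE_SIZE}
from_sparse = populate(sparse_range, 0, HUGE_PAGE_SIZE)
```

Summing these return values per VMA gives the memory usage count the commit message has in mind.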
-
Glauber Costa authored
When we start using the JVM balloon, our memcpy could fail for valid reasons when the JVM is moving memory that is now in an unmapped region. To handle that, register a fixup that will trigger a JVM call when the fault happens. If all goes well, we will be able to continue normally. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Takuya ASADA authored
On VMware, pci_readw(PCI_CFG_DEVICE_ID) returns the *vendor ID*. pci_readw(PCI_CFG_VENDOR_ID) returns the vendor ID as well. By comparison, the FreeBSD implementation of PCI config space reads and writes masks the lower bits of the offset when writing to PCI_CONFIG_ADDRESS, and adds the lower bits of the offset to PCI_CONFIG_DATA. http://fxr.watson.org/fxr/source/amd64/pci/pci_cfgreg.c#L206 This patch changes the access method in OSv to the FreeBSD way. Tested on QEMU/KVM and VMware. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Takuya ASADA <syuu@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
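The FreeBSD-style access can be sketched as plain arithmetic. The Python below only computes the two values involved and performs no port I/O. It assumes the standard PCI configuration mechanism #1 (ports 0xCF8 and 0xCFC, enable bit 31, low two offset bits masked) and the usual header offsets 0x00 and 0x02 for vendor and device ID; `config_access` is a hypothetical helper, not an OSv function.

```python
PCI_CONFIG_ADDRESS = 0xCF8
PCI_CONFIG_DATA = 0xCFC
PCI_CFG_VENDOR_ID = 0x00
PCI_CFG_DEVICE_ID = 0x02


def config_access(bus, slot, func, offset):
    """Return (value to write to PCI_CONFIG_ADDRESS, data port to access)."""
    address = (
        (1 << 31)            # enable bit
        | (bus << 16)
        | (slot << 11)
        | (func << 8)
        | (offset & ~0x03)   # dword-aligned register: low bits masked off
    )
    data_port = PCI_CONFIG_DATA + (offset & 0x03)  # low bits select the bytes
    return address, data_port


vendor = config_access(0, 0, 0, PCI_CFG_VENDOR_ID)
device = config_access(0, 0, 0, PCI_CFG_DEVICE_ID)
```

Both IDs live in the same dword, so the address written is identical and only the data port differs (0xCFC versus 0xCFE). Reading the device ID from the unadjusted data port would return the low word of that dword, which is the vendor ID, matching the symptom reported on VMware.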
-
Nadav Har'El authored
This patch starts to solve both issue #142 ("Support MONOTONIC_CLOCK") and issue #81 (use <chrono> for time). First, it adds an uptime() function to the "clock" interface, and implements it for kvm/xen/hpet by returning the system time minus the system time at boot (without adding any correction for wallclock). Second, it adds a new std::chrono-based interface to this clock, in a new header file <osv/clock.hh>. Instead of the old-style clock::get()->uptime(), one should prefer osv::clock::uptime::now(). This returns a std::chrono::time_point which is type-safe, in the sense that: 1. It knows what its epoch is (i.e., that it belongs to osv::clock::uptime), and 2. It knows what its units are (nanoseconds). This allows the compiler to prevent a user from confusing measurements from this clock with those from other clocks, or making mistakes in its units. Third, this patch implements clock_gettime(CLOCK_MONOTONIC), using the new osv::clock::uptime::now(). Note that though the new osv::clock::uptime is almost identical to std::chrono::steady_clock, they should not be confused. The former is actually OSv's implementation of the latter: steady_clock is implemented by the C++11 standard library using the POSIX clock_gettime, and that is implemented (in this patch) using osv::clock::uptime. With this patch, we're *not* done with either issue #142 or #81. For issue #142, i.e., for supporting CLOCK_MONOTONIC in timerfd, we need OSv's timers to work on uptime(), not on clock::get()->time(). For issue #81, we should add an osv::clock::wall type too (similar to what clock::get()->time() does today, but more correctly), and use either osv::clock::wall or osv::clock::uptime everywhere that clock::get()->time() is currently used in the code. clock::get()->time() should be removed. Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
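The two ideas in the commit above, uptime as "system time minus system time at boot" and time points that remember their clock and units, can be imitated in Python. This is only an analogy: std::chrono enforces the distinction at compile time, while the sketch below checks it at run time. `TimePoint` and `Uptime` are invented names, and the value sampled at import stands in for the boot-time sample.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimePoint:
    clock: str  # name of the clock, which fixes the epoch
    ns: int     # value in nanoseconds since that epoch

    def __sub__(self, other):
        if not isinstance(other, TimePoint) or other.clock != self.clock:
            raise TypeError("time points come from different clocks")
        return self.ns - other.ns  # duration in nanoseconds


class Uptime:
    """Uptime = current system time minus the system time sampled at boot."""

    _boot_ns = time.monotonic_ns()  # stand-in for the boot-time sample

    @classmethod
    def now(cls):
        return TimePoint("uptime", time.monotonic_ns() - cls._boot_ns)


start = Uptime.now()
end = Uptime.now()
elapsed_ns = end - start

wall = TimePoint("wall", time.time_ns())
try:
    wall - start
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```

Subtracting two uptime points yields a duration, while mixing a wall-clock point with an uptime point is rejected, which is the class of mistake the typed interface is meant to prevent.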
-
Tomasz Grabiec authored
Currently the parameter is read from the generated Makefile, which is not re-generated on incremental builds. The fix is to move the default to build.mk; this way the default will always be picked up unless overridden by a command line argument. Fixes #153 Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 09, 2014
-
-
Tomasz Grabiec authored
To start netserver inside OSv just do:

    make image=netperf
    sudo scripts/run.py -nv

Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
This problem was found when running 'tests/tst-zfs-mount.so' multiple times. On the first run all tests succeed; however, a subsequent run would fail at the test 'mkdir /foo/bar', with an error message reporting that the target file already exists. The test basically creates a directory /foo/bar, renames it to /foo/bar2, then removes /foo/bar2. How could /foo/bar still be there? Quite simple. Our shutdown function calls unmount_rootfs(), which attempts to unmount zfs with the flag MNT_FORCE; however, the flag is not passed on to zfs_unmount(), nor does unmount_rootfs() itself check the return status (which was always reporting failure previously). So OSv is really being shut down while there is still data waiting to be synced to the backing store. The result is inconsistency. This problem was fixed by passing the flag to VFS_UNMOUNT, which will now unmount the fs properly on sudden shutdowns. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Processing of this manifest was inside JVM-specific code, which meant the manifest was not processed if there was no java application in the image. For example:

    make image=empty check
    ...
    run_main(): cannot execute tests/tst-af-local.so. Powering off.
    Test tst-af-local.so FAILED
    make: *** [check] Error 1

Let's move it to the main manifest processing function. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 08, 2014
-
-
Tomasz Grabiec authored
In some workloads it noticeably improves performance. I measured a 6% increase in netperf throughput on my laptop. Object file size is only slightly bloated:

    loader.elf (O2): 47246227
    loader.elf (O3): 51272625 (+8.5%)

Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pekka Enberg authored
Changes:
- web: Added /upload view class
- Shell: Rewrite 'ls' and add formatting/sort flags
- Update the jvm API to be more verbose
- adding REST API specification: api, os, jvm

Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
The current implementation of memmove is a PITA (I mean the bread, of course) to decode if a fault happens. We have very little control of where exactly in the code the fault happens, therefore it is difficult to reason about it. This patch implements memmove in terms of memcpy + memcpy_backwards. For those, we can have specific fixups in the possible fault sites, that will allow us to decode the faults with ease. Note that originally, the only reason why the first branch was not a memcpy is that we would like to handle alignment. Since our implementation of memcpy is fast enough, we can just ignore that and we will end up being even faster. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch provides a backwards version of memcpy. It works all the same, but will start the copy from dst + n <= src + n, instead of dst <= src. That is needed for memmove when the source and destination operands overlap. Being a nonstandard interface, I've just named it "memcpy_backwards". Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
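The two commits above (memmove built from memcpy plus memcpy_backwards) can be modelled on a Python bytearray. This is a byte-at-a-time sketch for illustration only. The real routines are optimized machine-level copies with fault fixups, and the index-based signatures here are an assumption made so that overlap can be shown inside a single buffer.

```python
def memcpy(buf, dst, src, n):
    """Copy n bytes forwards, lowest address first."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def memcpy_backwards(buf, dst, src, n):
    """Copy n bytes backwards, starting from the last byte."""
    for i in range(n - 1, -1, -1):
        buf[dst + i] = buf[src + i]


def memmove(buf, dst, src, n):
    """Pick the direction that never overwrites source bytes before reading them."""
    if dst <= src:
        memcpy(buf, dst, src, n)
    else:
        memcpy_backwards(buf, dst, src, n)


shifted_up = bytearray(b"abcdefgh")
memmove(shifted_up, 2, 0, 4)      # destination overlaps the tail of the source

shifted_down = bytearray(b"abcdefgh")
memmove(shifted_down, 0, 2, 4)    # destination overlaps the head of the source

corrupted = bytearray(b"abcdefgh")
memcpy(corrupted, 2, 0, 4)        # wrong direction: source is clobbered mid-copy
```

The last case shows why the backwards variant is needed: a forward copy into an overlapping higher destination re-reads bytes it has already overwritten.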
-
Glauber Costa authored
There was a small bug in the free memory tracking code that I've only hit recently. I was wrong in assuming that in the first branch for huge page allocation, where we erase the entire range, we should account for N bytes. This assumption came from my - wrong - understanding that we would do that when the range is exactly N bytes. Looking at the code with fresh eyes, that is definitely not what happens. In my previous stress test we were hitting the second branch all the time, so this bug lived on. Turns out that we will delete the entire page range, which may be bigger than N, the allocation size. Therefore, the whole range should be discounted from our calculation. The remainder (bigger than N part) will be accounted for later when we reinsert it in the page range, in the same way it is for the second branch of this code. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
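The accounting rule from the fix above can be shown with a toy free-range map in Python. `FreeRanges` and its methods are hypothetical and much simpler than the real allocator. The point is only that erasing a range discounts its full size and the remainder is counted again when it is reinserted.

```python
class FreeRanges:
    def __init__(self):
        self.ranges = {}     # start address -> size in bytes
        self.free_bytes = 0  # tracked counter, should equal the sum of sizes

    def insert(self, start, size):
        self.ranges[start] = size
        self.free_bytes += size

    def erase(self, start):
        size = self.ranges.pop(start)
        self.free_bytes -= size  # discount the whole range, not just N
        return size

    def alloc(self, start, n):
        """Allocate n bytes from the range that begins at `start`."""
        size = self.erase(start)
        if size > n:
            # The part bigger than N is accounted for again on reinsertion.
            self.insert(start + n, size - n)
        return start


free = FreeRanges()
free.insert(0, 4 << 20)   # one 4 MiB free range
free.alloc(0, 2 << 20)    # allocate N = 2 MiB from it
```

Discounting only N at erase time and then adding the remainder back on reinsertion would leave the counter too high by the size of the remainder, which is the kind of drift the commit describes.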
-
Tomasz Grabiec authored
Fixes an issue with the JVM failing when started with a debugger with the following message:

    NPT ERROR: Cannot find nptInitialize

Missing openjdk files in usr.manifest were a fertile source of issues. This patch aims at making them less likely and adding all files except blacklisted files to the image. This patch skips two files from the JRE which are broken links and whose inclusion would cause manifest upload failure:
- jre/lib/audio/default.sf2
- jre/lib/security/cacerts

These should be fixed incrementally. Reported-by:
Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
If a module has 'usr_files' or 'bootfs_files' declared, their values will be interpreted as FileMaps and appended to the appropriate manifests. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
When plain manifests are not enough this is a concise alternative with improved expressiveness. It allows declaring exclude and include patterns. It's Python based. Example:

    m = FileMap()
    m.add('${OSV_BUILD_PATH}/tests').to('/tests') \
        .include('**/*.so') \
        .exclude('host/**')

Declared mappings can be saved in manifest form or be subject to further processing. To save in manifest format:

    save_as_manifest(m, 'my.manifest')

Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
This patch makes java files get copied to the guest image only when the 'java' module is included. Modules can pull it in explicitly by stating: require('java') or implicitly, by creating api.run_java() run configurations. In the future we could consider moving api.run_java() into a java meta-module. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
No functional changes, just renames to more adequate names. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
No need to create an empty bootfs.manifest anymore. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Currently importing a module from a module definition would fail because we cannot import a module with the same name (module.py) recursively; __import__ will complain that we removed 'module' from sys.modules. There is a simple solution to this problem: we can use runpy.run_path(), which works like a charm. In addition to this we cache loaded modules so that we don't have to load the file twice. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
In his review of timerfd.cc, Avi asked that I simplify the implementation by having a single "timerfd" object (instead of two I had - timerfd_file and timerfd_object), and by using a single mutex instead of the complex combination of mutexes and atomic variable. This new version indeed does this. It should be easier to understand this code, and it is 30 lines shorter. The performance of this code is slightly inferior to the previous one - in particular poll() now locks and unlocks a mutex - but this should be negligible in practice. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
This patch improves the command by adding useful info for debugging ZFS in general, and also addresses some stylistic issues. The new output is as follows:

    (gdb) osv zfs
    :: ZFS TUNABLES ::
    zil_replay_disable: 0
    zfs_nocacheflush: 0
    zfs_prefetch_disable: 0
    zfs_no_write_throttle: 0
    zfs_txg_timeout: 5
    zfs_write_limit_override: 0
    vdev_min_pending: 4
    vdev_max_pending: 10
    :: ARC SIZES ::
    Actual ARC Size: 122905056
    Target size of ARC: 1341923840
    Min Target size of ARC: 167740480
    Max Target size of ARC: 1341923840
    Most Recently Used (MRU) size: 670961920 (50.00%)
    Most Frequently Used (MFU) size: 670961920 (50.00%)
    :: ARC EFFICIENCY ::
    Total ARC accesses: 42662
    ARC hits: 41615 (97.55%)
    ARC MRU hits: 12550 (30.16%)
    Ghost Hits: 0
    ARC MFU hits: 29045 (69.79%)
    Ghost Hits: 0
    ARC misses: 1047 (2.45%)
    Prefetch workload ratio: 0.0097%
    Prefetch total: 412
    Prefetch hits: 20
    Prefetch misses: 392
    Total Hash elements: 1053
    Max Hash elements: 1053
    Hash collisions: 13
    Hash chains: 11

Reviewed-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 07, 2014
-
-
Nadav Har'El authored
A previous patch renamed mutex.cc to spinlock.cc. This fixes the build.mk dependency to make the code compile again... Sorry about that. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Amnon Heiman authored
This document describes the OSv coding style. Signed-off-by:
Amnon Heiman <amnon@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
In very early OSv history, the spinlock was used in the mutex's implementation so it made sense to put it in mutex.cc and mutex.h. But now that the spinlock is all that's left in mutex.cc (the real mutex is in lfmutex.cc), rename this file spinlock.cc. Also, move the spinlock definitions from <osv/mutex.h> to a new <osv/spinlock.h>, so if someone wants to make the grave mistake of using a spinlock - they will at least need to explicitly include this header file. Currently, the only remaining user of the spinlock is the console. Using a spinlock (and not a mutex) in the console allows printing debug messages while preemption is disabled. Arguably, this use-case is no longer important (we have tracepoints), so in the future we can consider dropping the spinlock completely. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch implements a simplified version of pthread_getcpuclockid that should be enough for our needs. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This patch reserves some thread ids, that are kept unused. This is so we can construct values that reuse the thread public id and add it together with other information and still fit in 32-bits. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
This will be used later to determine how long a thread has been running. It can easily be updated right before we call ran_for(), reusing its interval parameter. Fixes #135 Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 06, 2014
-
-
Raphael S. Carvalho authored
Start using spaces instead of tabs and surround all single-line control statements with curly braces. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Previously, scripts/test.py had no option to do that. It launched an OSv instance for each test case. Terribly slow PCs like mine took a long time to run all test cases through 'make check'. So let's take advantage of testrunner.so, which will use a single OSv instance to run all test cases, consequently boosting the speed considerably. Let's also change testrunner.so to conform to our needs, e.g. a blacklist. To run this fast check, do: scripts/test.py --single. Results show that this option is about 2.5x faster than the current one. For now, let's not make this approach the default, given that its output needs to be better formatted. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Add at the top of README.md a short introduction to what OSv is. If someone gets to our github page, https://github.com/cloudius-systems/osv, and scrolls down, it's strange that we only explain how to build OSv, without first mentioning what it is. Fixes #148 Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- Jan 03, 2014
-
-
Raphael S. Carvalho authored
Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Start using spaces instead of tabs and surround all single-line control statements with curly braces. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
It will be useful for making better and safer VFS decisions in the future. For example, avoiding code that uses the absolute path to determine something. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
newmp->m_covered must be released if not NULL. Found this problem while dumping dcache content. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-