- May 26, 2014
-
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
-
Tomasz Grabiec authored
It is meant to provide both the speed of a ring buffer and the non-blocking properties of linked queues by combining the two. Unlike ring_spsc, its push() is always guaranteed to succeed. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
-
Tomasz Grabiec authored
It's like queue_mpsc with two improvements:
* Consumer and producer links are cache line aligned to avoid false sharing. I was tempted to apply this to queue_mpsc too but then discovered that this queue is embedded in a mutex, and doing so would greatly bloat mutex size, so I gave up on this idea.
* The contract of pop() is relaxed to return items in no particular order so that we can avoid the cost of reversing the chain.
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
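The relaxed pop() contract can be pictured with a small single-threaded model. This is only a sketch: the class and method names are hypothetical, and a real multi-producer queue would update the head with an atomic operation rather than a plain assignment.

```python
class Node:
    """One link in the chain (hypothetical name, for illustration only)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


class UnorderedQueueModel:
    """Single-threaded model of a push-chain queue with an unordered pop()."""

    def __init__(self):
        self._push_head = None  # producers prepend new nodes here
        self._pop_head = None   # chain currently owned by the consumer

    def push(self, value):
        # Prepending is O(1), so the chain ends up in reverse insertion order.
        self._push_head = Node(value, self._push_head)

    def pop(self):
        if self._pop_head is None:
            # Take the whole chain as-is. A FIFO queue would have to walk
            # and reverse it here; the relaxed contract skips that step.
            self._pop_head, self._push_head = self._push_head, None
        node = self._pop_head
        if node is None:
            return None  # queue is empty
        self._pop_head = node.next_node
        return node.value
```

Callers of such a queue can rely on getting every pushed item exactly once, but not on the order in which the items come back.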
-
Tomasz Grabiec authored
free_page_ranges is an intrusive set. Erasing via a reference requires iterating over the reference's equal_range under the hood, which means traversing the tree to the leaves, whereas erasing via an iterator requires no such lookup and so should be faster. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
-
Tomasz Grabiec authored
Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
-
Tomasz Grabiec authored
In some runs callq to mempool_cpuid shows up in 'perf kvm top' profile. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com>
-
Glauber Costa authored
Same as fork, vfork, etc. So goes in the same place. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Fixed error in ::clock's Doxygen comment. It referred to osv::clock::monotonic, while in fact the correct name is osv::clock::uptime. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- May 25, 2014
-
-
Avi Kivity authored
Fixes debug build. Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
Raphael S. Carvalho authored
The lz4 code checks a predetermined list of definitions to decide whether the CPU word size is 64; otherwise, it assumes 32. Therefore, the __aarch64__ definition must be added to the aforementioned list. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-
- May 23, 2014
-
-
Raphael S. Carvalho authored
This patch enables LZ4 compression on the ZFS dataset right after its insertion in the pool. The image creation process then goes through all its steps with compression enabled, and when it's done, compression is disabled. From that moment on, compression no longer takes effect, and previously compressed files are still supported.

Why disable compression after image creation? There seem to be corner cases where enabling compression by default would hurt application performance. For example, applications that compress data themselves (e.g. Cassandra) might end up slower, as ZFS would repeat compression work that was already done and consequently waste CPU cycles. It's worth mentioning that LZ4 is ~300% faster than LZJB when compressing incompressible data, so it might be good even for Cassandra.

Additional information: The first version of this patch used the LZJB algorithm; however, it slowed down read operations on compressed files. LZ4, on the other hand, improves reads on compressed files, improves boot time, and still provides a good compression ratio.

RESULTS
=====

- UNCOMPRESSED:
  * Image size
    -rw-r--r--. 1 root root 154533888 May 19 23:02 build/release/usr.img
  * Read benchmark
    REPORT
    -----
    Files: 552
    Read: 127399kb
    Time: 1069.90ms
    MBps: 115.90
  * Boot time
    1) ZFS mounted: 426.57ms, (+157.75ms)
    2) ZFS mounted: 439.13ms, (+156.24ms)

- COMPRESSED (LZ4):
  * Image size
    -rw-r--r--. 1 root root 81002496 May 19 23:33 build/release/usr.img
  * Read benchmark
    REPORT
    -----
    Files: 552
    Read: 127399kb
    Time: 957.96ms
    MBps: 129.44
  * Boot time
    1) ZFS mounted: 414.55ms, (+145.47ms)
    2) ZFS mounted: 403.72ms, (+142.82ms)

Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
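To put the reported figures side by side, the relative changes can be worked out directly from the numbers in the commit message. This is a small sketch; the constant and function names are made up for illustration, and only the four input values come from the results above.

```python
# Figures copied from the RESULTS section of the commit message.
UNCOMPRESSED_IMAGE_BYTES = 154533888
COMPRESSED_IMAGE_BYTES = 81002496
UNCOMPRESSED_READ_MBPS = 115.90
COMPRESSED_READ_MBPS = 129.44


def relative_change(before, after):
    """Return (after - before) / before as a fraction."""
    return (after - before) / before


image_size_change = relative_change(UNCOMPRESSED_IMAGE_BYTES, COMPRESSED_IMAGE_BYTES)
read_throughput_change = relative_change(UNCOMPRESSED_READ_MBPS, COMPRESSED_READ_MBPS)

print(f"image size change: {image_size_change:+.1%}")
print(f"read throughput change: {read_throughput_change:+.1%}")
```

With these inputs the image shrinks by a little under half, while sequential read throughput rises by roughly a tenth.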
-
Raphael S. Carvalho authored
Besides refactoring the code, this patch makes mkfs support more than one instance of the same shared object within the same mkfs instance, i.e. by releasing the resources at the function prologue. Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
Useful for getting a notion of response time and throughput on sequential read operations. A random read option should be added later on. I am currently using it to measure read performance on compressed vs uncompressed data.

Example output:

OSv v0.08-160-gddb9322
eth0: 192.168.122.15
/zpool.so: 96kb: 1.77ms, (+1.77ms)
/libzfs.so: 211kb: 6.57ms, (+4.80ms)
/zfs.so: 96kb: 8.25ms, (+1.68ms)
/tools/mkfs.so: 10kb: 9.32ms, (+1.07ms)
/tools/cpiod.so: 244kb: 14.08ms, (+4.76ms)
...
/usr/lib/jvm/jre/lib/content-types.properties: 5kb: 1066.17ms, (+2.87ms)
/usr/lib/jvm/jre/lib/cmm/GRAY.pf: 556b: 1066.74ms, (+0.57ms)
/usr/lib/jvm/jre/lib/cmm/CIEXYZ.pf: 784b: 1067.34ms, (+0.60ms)
/usr/lib/jvm/jre/lib/cmm/sRGB.pf: 6kb: 1067.96ms, (+0.62ms)
/usr/lib/jvm/jre/lib/cmm/LINEAR_RGB.pf: 488b: 1068.61ms, (+0.64ms)
/usr/lib/jvm/jre/lib/cmm/PYCC.pf: 228kb: 1073.96ms, (+5.36ms)
/usr/lib/jvm/jre/lib/sound.properties: 1kb: 1074.65ms, (+0.69ms)

REPORT
-----
Files: 552
Read: 127395kb
Time: 1074.65ms
MBps: 115.39

Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Raphael S. Carvalho authored
OSv port details:
- Discarded manpage changes.
- lz4 license was added to the licenses directory.
- Addressed some conflicts in zfs/zfs_ioctl.c.
- Added the unused attribute to a few functions in zfs/lz4.c which are actually unused.

* Illumos zfs issue #3035 [1] LZ4 compression support in ZFS.

LZ4 is a new high-speed BSD-licensed compression algorithm created by Yann Collet that delivers very high compression and decompression performance compared to lzjb (>50% faster on compression, >80% faster on decompression and around 3x faster on compression of incompressible data), while giving better compression ratio [1].

FreeBSD commit hash: c6d9dc1

Signed-off-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Just like memcpy, memset can also benefit from special cases for small sizes. However, as expected, the tradeoffs are different and the benefit is not as large. In the best case, we are able to do better for sizes up to 64 bytes. There should still be a gain, because in workloads where memcpy deals with small sizes, memset will likely do so as well.

Again, I have compared the simple loop, Duff's device, and "glommer's device", with the last being the winner. Here are the results, up to the point each one starts losing:

Original:
=========
memset,4,9.007000,9.161000,9.024967,0.042445
memset,8,9.007000,9.137000,9.028934,0.043388
memset,16,9.006000,9.267000,9.028168,0.056487
memset,32,9.007000,11.719000,9.287668,0.716163
memset,64,9.007000,9.143000,9.023834,0.034745
memset,128,9.007000,9.174000,9.030134,0.044414

Loop:
=====
memset,4,3.122000,3.293000,3.158033,0.026586
memset,8,4.151000,5.077000,4.570933,0.207710
memset,16,7.021000,8.288000,7.873499,0.276310
memset,32,19.414000,19.792999,19.551334,0.086234

Duff:
=====
memset,4,3.602000,4.829000,3.936233,0.425657
memset,8,4.117000,4.526000,4.282266,0.100237
memset,16,4.889000,5.227000,5.105134,0.084525
memset,32,8.748000,8.884000,8.763433,0.038910
memset,64,16.983999,17.163000,17.018702,0.051896

Glommer:
========
memset,4,3.524000,3.664000,3.601167,0.028642
memset,8,3.088000,3.144000,3.092500,0.009790
memset,16,4.117000,4.170000,4.126300,0.014074
memset,32,4.888000,5.400000,5.172900,0.123619
memset,64,6.963000,7.023000,6.968966,0.013802
memset,128,11.065000,11.174000,11.076533,0.027541

Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
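The crossover point can be read off the rows mechanically. The sketch below compares the "Original" and "Glommer" rows; it assumes the columns are name, size, minimum, maximum, mean and standard deviation, which the commit message does not state explicitly, and the helper name is hypothetical.

```python
# Rows copied from the commit message. Column meaning is an assumption:
# name, size in bytes, minimum, maximum, mean, standard deviation.
ORIGINAL_ROWS = """\
memset,4,9.007000,9.161000,9.024967,0.042445
memset,8,9.007000,9.137000,9.028934,0.043388
memset,16,9.006000,9.267000,9.028168,0.056487
memset,32,9.007000,11.719000,9.287668,0.716163
memset,64,9.007000,9.143000,9.023834,0.034745
memset,128,9.007000,9.174000,9.030134,0.044414
"""

GLOMMER_ROWS = """\
memset,4,3.524000,3.664000,3.601167,0.028642
memset,8,3.088000,3.144000,3.092500,0.009790
memset,16,4.117000,4.170000,4.126300,0.014074
memset,32,4.888000,5.400000,5.172900,0.123619
memset,64,6.963000,7.023000,6.968966,0.013802
memset,128,11.065000,11.174000,11.076533,0.027541
"""


def mean_by_size(rows):
    """Map each size to the fifth column (assumed to be the mean)."""
    result = {}
    for line in rows.strip().splitlines():
        _name, size, _low, _high, mean, _spread = line.split(",")
        result[int(size)] = float(mean)
    return result


original = mean_by_size(ORIGINAL_ROWS)
patched = mean_by_size(GLOMMER_ROWS)

# Sizes at which the special-cased version beats the original.
faster_sizes = [size for size in sorted(patched) if patched[size] < original[size]]
print(faster_sizes)
```

Under that column assumption, the special-cased version wins for every size through 64 bytes and loses at 128, which matches the statement in the message.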
-
Glauber Costa authored
It is really the same kind of test, so let's just reuse the memcpy example. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pawel Dziepak authored
This patch makes memory_analyzer understand the newly introduced tracepoint arguments: allocator type, allocated memory and requested alignment. Allocations are grouped and shown as a tree together with frequency information, the number of blocks that haven't been freed yet and the amount of memory wasted by internal fragmentation. Signed-off-by:
Pawel Dziepak <pdziepak@quarnos.org> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pawel Dziepak authored
Signed-off-by:
Pawel Dziepak <pdziepak@quarnos.org> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pawel Dziepak authored
Signed-off-by:
Pawel Dziepak <pdziepak@quarnos.org> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pawel Dziepak authored
Signed-off-by:
Pawel Dziepak <pdziepak@quarnos.org> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Pawel Dziepak authored
Signed-off-by:
Pawel Dziepak <pdziepak@quarnos.org> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Gleb Natapov authored
Run a thread in the background to scan the pagecache for accessed pages and propagate them to ARC. The thread may take anywhere from 0.1% to 20% of CPU time. There is no hard science behind how the current CPU usage is determined: it uses the page access rate to calculate how hard the pagecache should currently be scanned. It can be improved by taking the eviction rate into account too. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
Just so the symbol exists. We expect people to run their programs in foreground, but if linked without lazy bindings, the symbol may be required. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
According to reality, the idea that rep movsb is the preferred way to implement memcpy for x86 in the presence of the rep_good flag is false. This implementation performs better in the misc-memcpy benchmark for pretty much all sizes.

I have also tested a simple loop with byte-by-byte copy, and Duff's mechanism. For Duff, I am seeing a weird bug when it is implemented together with our memcpy. But it is of course possible to implement it up to 256 bytes separately for analysis, which is what I did.

What can be seen in the results below is that all versions start faster than rep movsb for very small objects, but the loop starts to be slower for sizes as low as 32 bytes. Duff is slower for 64-byte elements, but this patch is faster for all sizes measured. We can copy 64 bytes in 5.6ns, 128 bytes in 7.7ns and 256 bytes in 13.3ns, while the original numbers would be 11ns, 11ns, and 13.8ns.

Balloon Safety: Balloon memcpys are 128Mb in size. Even for partial copy, they are at least in the kb range. So I am not expecting any funny interaction with this, nor anticipating the need to insert fixups here.

Full Results:

Original
========
4,11.066000,13.217000,11.313369,0.527048
8,29.427999,31.054001,29.797934,0.540056
16,11.065000,11.147000,11.088465,0.030663
32,11.065000,11.199000,11.093401,0.043994
64,11.065000,11.508000,11.115365,0.092626
128,12.866000,13.137000,12.914132,0.066646
256,13.896000,14.252000,13.937533,0.067841
512,15.955000,16.304001,16.006964,0.073594
1024,20.072001,20.301001,20.122099,0.052627
2048,28.306999,28.577999,28.377703,0.063443
4096,44.785999,45.087002,44.899033,0.068806
8192,77.783997,78.370003,77.918457,0.113472
16384,150.259003,183.679001,158.534668,5.947755
32768,1049.886963,1053.098022,1051.364380,0.851499

Loop
====
4,3.152000,3.734000,3.347033,0.185811
8,4.467000,5.336000,4.936766,0.221336
16,6.655000,8.262000,7.695767,0.377303
32,19.788000,20.438000,19.960333,0.221289
64,25.996000,29.969999,29.217133,0.828447
128,44.501999,45.562000,45.335640,0.244315
256,85.459000,95.369003,91.925179,3.409483
512,14.925000,15.014000,14.939700,0.024197
1024,19.042999,19.143000,19.060701,0.028286
2048,27.277000,27.386000,27.306065,0.035528
4096,43.750000,43.902000,43.789631,0.038810
8192,76.699997,76.872002,76.769691,0.040407
16384,149.393997,164.602005,157.051132,4.324330
32768,1045.287964,1047.580933,1046.380493,0.617742

Duff
====
4,3.602000,4.120000,3.722167,0.163732
8,4.631000,4.725000,4.643835,0.028509
16,7.205000,7.316000,7.213567,0.022538
32,11.838000,12.613000,12.032168,0.285366
64,21.681000,22.173000,21.754402,0.088584
128,41.331001,41.651001,41.452267,0.066087
256,80.431000,80.927002,80.737724,0.106475

This patch
==========
4,3.602000,3.895000,3.636133,0.071126
8,3.602000,3.679000,3.607600,0.015768
16,3.859000,3.981000,3.875433,0.032632
32,4.888000,4.994000,4.899767,0.025539
64,5.663000,6.404000,6.001000,0.158665
128,7.737000,8.168000,7.881701,0.156874
256,13.301000,17.438999,14.937235,0.880874
512,14.925000,15.226000,14.975132,0.072150
1024,19.042999,19.412001,19.099068,0.095145
2048,27.278000,32.022999,27.617165,1.007376
4096,43.750000,44.146000,43.844494,0.094062
8192,76.698997,83.873001,77.137794,1.266063
16384,153.483994,168.636002,160.516830,3.837175
32768,1047.878052,1068.301025,1052.600586,4.441750

Signed-off-by:
Glauber Costa <glommer@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- May 22, 2014
-
-
Zifei Tong authored
Use backtick quotes for inline code. Signed-off-by:
Zifei Tong <zifeitong@gmail.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
How long does our memcpy take? misc-memcpy will let you find out. Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
When running on a KVM host without EPT, OSv used to crash if run with 1GB or less of memory. The crash was in early ACPI initialization, which called AcpiOsMapMemory, which called linear_map() on the small chunk of physical memory requested by ACPI. It turns out that the chunk requested was just 4096 bytes, which caused our linear_map() implementation to break down a huge page into small pages. This didn't work. I don't know yet what exactly the bug is, why it only shows up when the host doesn't use EPT, or why only when the memory is smaller than 1GB. But in any case, since we know our complete linear map always uses huge pages, I think it makes no sense to map a single small page, and we can just map a whole huge page. This will cause linear_map() not to try to break up any huge pages, and this bug goes away. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
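The rounding involved in mapping a whole huge page instead of a single small one can be sketched as follows. The 2 MiB huge page size and the function name are assumptions made for the example; the commit message only says that the linear map uses huge pages.

```python
HUGE_PAGE_SIZE = 2 * 1024 * 1024  # assumed huge page size (2 MiB)


def huge_page_span(phys_addr, length, huge_page_size=HUGE_PAGE_SIZE):
    """Return (start, size) of the huge-page-aligned region covering the request."""
    mask = huge_page_size - 1
    start = phys_addr & ~mask                     # round the start down
    end = (phys_addr + length + mask) & ~mask     # round the end up
    return start, end - start


# A 4096-byte request somewhere inside a huge page maps the whole huge page.
start, size = huge_page_span(0x7FFE1000, 4096)
print(hex(start), hex(size))
```

Because the mapped region is always a whole number of huge pages, the mapping code never has to split an existing huge page for such a request.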
-
Gleb Natapov authored
Currently the code overwrites the large pte with a pointer to the intermediate page before populating the intermediate page's ptes with correct values. It happens to work with EPT since the TLB is still valid, but without EPT this is not the case. Fixes #316. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
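The ordering that the fix relies on, filling in the intermediate page first and publishing it last, can be modelled with plain Python data structures. This is a toy model only: the dictionary layout, the function name and the x86-64 page sizes are assumptions for illustration and do not mirror the actual page table code.

```python
SMALL_PAGE_SIZE = 4096
ENTRIES_PER_TABLE = 512  # 512 * 4 KiB = 2 MiB, assuming x86-64 page sizes


def split_large_entry(table, index):
    """Replace the large mapping at table[index] with an equivalent small-page table."""
    large = table[index]
    assert large["large"], "entry is not a large mapping"

    # Step 1: fill in every small entry while the large entry is still live.
    intermediate = [
        {"large": False, "phys": large["phys"] + i * SMALL_PAGE_SIZE}
        for i in range(ENTRIES_PER_TABLE)
    ]

    # Step 2: only now publish the pointer to the intermediate table, so a
    # page walk never observes a partially initialised table.
    table[index] = {"large": False, "next": intermediate}
    return intermediate
```

Swapping the two steps reproduces the pattern described in the message: the upper-level entry would briefly point at a table whose entries are not yet valid.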
-
Vlad Zolotarov authored
vec_sz in net::txq::try_xmit_one_locked(net_req*) should be initialized to 1 and not to 0, since there is always a net_hdr element and vec_sz is incremented only for fragments, without taking the header element into account. This miscalculation caused vqueue->add_buf() to return false while vqueue->avail_ring_has_room(vec_sz) was returning true when the number of elements in the avail ring was equal to vec_sz, since the actual size of the _sg_vec was "vec_sz + 1". Signed-off-by:
Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
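The off-by-one can be reproduced with a simplified model of the two checks. The functions below are hypothetical stand-ins for the room check and for add_buf(), not the real virtio code; they only capture the counting described in the message.

```python
HEADER_ELEMENTS = 1  # the net_hdr element that is always present


def count_vec_sz(num_fragments, initial):
    """vec_sz as the transmit path would compute it for a given initial value."""
    return initial + num_fragments


def ring_has_room(free_slots, vec_sz):
    """Model of the room check: are at least vec_sz slots free?"""
    return free_slots >= vec_sz


def add_buf_succeeds(free_slots, num_fragments):
    """Model of add_buf(): every queued element, header included, needs a slot."""
    return free_slots >= HEADER_ELEMENTS + num_fragments


fragments, free_slots = 3, 3

# Old behaviour: vec_sz starts at 0, so the room check is too optimistic.
old_check = ring_has_room(free_slots, count_vec_sz(fragments, initial=0))

# Fixed behaviour: vec_sz starts at 1 and agrees with what add_buf() needs.
new_check = ring_has_room(free_slots, count_vec_sz(fragments, initial=1))

print(old_check, new_check, add_buf_succeeds(free_slots, fragments))
```

With the initial value of 1, the room check and the actual insertion agree in the boundary case where the free space equals the number of fragments.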
-
Pekka Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Tomasz Grabiec authored
trace.py allows slicing samples to a time range using --since, --until and --period. Until now the default unit for timestamps was nanoseconds. However, the timestamp pretty much always comes from the 'trace list' output, which displays timestamps in seconds with a dot separating the nanosecond-precision part. That required explicitly designating the unit by appending an 's' suffix to the timestamp when passing it to --since and --until. This change makes --since and --until use seconds as the default unit, so that a timestamp copied from trace list can be used without any change, e.g.:

$ scripts/trace.py --since 1400607382.775557041

The default unit of --period (nanoseconds) remains unchanged. Signed-off-by:
Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
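A parser with a configurable default unit could look roughly like the sketch below. It is not the code in trace.py: the function name, the regular expression and the set of accepted suffixes other than 's' are assumptions made for illustration.

```python
import re
from decimal import Decimal

# Accepted suffixes are an assumption; the message only mentions 's'.
_UNIT_TO_NS = {"ns": 1, "us": 10**3, "ms": 10**6, "s": 10**9}
_TIME_RE = re.compile(r"^(\d+(?:\.\d+)?)(ns|us|ms|s)?$")


def parse_time_to_ns(text, default_unit="s"):
    """Convert a timestamp string to integer nanoseconds.

    A value without a suffix is interpreted in default_unit.
    """
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    number, unit = match.groups()
    scale = _UNIT_TO_NS[unit or default_unit]
    # Decimal keeps all nanosecond digits; a float would round them away.
    return int(Decimal(number) * scale)
```

In this sketch, --since and --until would call `parse_time_to_ns(value)` with the seconds default, while --period would pass `default_unit="ns"` to keep its existing behaviour.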
-
Claudio Fontana authored
In AArch64 the I-bit set in DAIF means "interrupts disabled". Therefore, the assertion check must be inverted in page_fault. Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
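The polarity of the bit is easy to get backwards, so here is a minimal sketch of the check. The helper name is hypothetical; the bit position used is that of the I flag in the AArch64 DAIF register (bit 7).

```python
DAIF_I_BIT = 1 << 7  # I flag: when set, IRQs are masked (disabled)


def irqs_enabled(daif):
    """Return True when the I bit is clear, i.e. interrupts are enabled."""
    return (daif & DAIF_I_BIT) == 0
```

An assertion that interrupts are enabled therefore has to test that the bit is clear, not that it is set.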
-
Claudio Fontana authored
Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Claudio Fontana authored
By marking them as @function, we get them to show up in the backtrace. Signed-off-by:
Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Glauber Costa authored
preadv, pwritev. Reviewed-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Glauber Costa <glommer@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
This test catches a few bugs we had in our C++ locale support (see issue #314). After the previous patches, this test will succeed. This test verifies that:
1. Trying to use an unsupported locale (we only support "C") throws an exception, as required, rather than crashing as before.
2. std::isspace(' ',std::locale) returns true, as it should (previously our ctype array was shifted by one, so this returned false!)
3. istream's input operator (operator>>) should stop on a space. Previously, it didn't, because we didn't recognize the space.
Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
Fixes #314. In two's complement, the lowest signed 8-bit number is -128, not -127. By wrongly generating the ctype array starting from -127 instead of -128, we got all the locale ctype answers shifted by one character; for example, character 32 (' ') was not considered a space, but 31 was! We didn't see this bug because our C library isspace() and friends are currently implemented without using the locale framework (which is fine, as we only support the "C" locale anyway). However, this bug is apparent in C++, as explained in issue #314: std::isspace() returns the wrong answer, and C++ facilities which use this under the hood, such as reading from an istream which is supposed to stop at a space, also got broken. Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
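The effect of starting the table at -127 can be reproduced with a small model. This is a sketch under assumptions: the table is taken to cover character values from -128 upward and to be indexed by value + 128, and all names are hypothetical rather than taken from the actual source.

```python
TABLE_SIZE = 384   # assumed coverage: character values -128 .. 255
INDEX_OFFSET = 128  # lookups index the table at (character value + 128)


def is_c_locale_space(value):
    """Whitespace characters of the "C" locale: tab..carriage return and space."""
    return value in (9, 10, 11, 12, 13, 32)


def build_space_table(first_value):
    """Fill the table as if its first slot described character first_value."""
    return [is_c_locale_space(first_value + i) for i in range(TABLE_SIZE)]


def lookup(table, value):
    return table[value + INDEX_OFFSET]


buggy = build_space_table(-127)  # generation starts one character too late
fixed = build_space_table(-128)

print(lookup(buggy, 32), lookup(buggy, 31))  # shifted answers
print(lookup(fixed, 32), lookup(fixed, 31))  # correct answers
```

In the buggy table every lookup returns the classification of the next character, which is why 31 looked like a space and 32 did not.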
-
Nadav Har'El authored
When an application attempts to use an unknown locale, don't abort but rather fail normally, as Linux does on an unknown locale. Theoretically, POSIX specifies that newlocale() can also fail with EINVAL if the category_mask is malformed. However, the only reasonable usage of this function we support is when base=NULL (or the "C" locale) and locale is again "C", and then we just ignore the category_mask. If that is not the case, we can just complain with ENOENT (meaning we couldn't find the named locale). In any case, callers like std::locale() don't actually care why newlocale() failed, and anyway assume a failure means the locale name wasn't recognized. Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
Nadav Har'El authored
The __newlocale() function in runtime.cc used a mixture of spaces and tabs. Reindent it, with no changes to actual content (that would come later). Reviewed-by:
Raphael S. Carvalho <raphaelsc@cloudius-systems.com> Signed-off-by:
Nadav Har'El <nyh@cloudius-systems.com> Signed-off-by:
Pekka Enberg <penberg@cloudius-systems.com>
-
- May 21, 2014
-
-
Gleb Natapov authored
If a thread is not started, its cpu pointer will be NULL. Handle this case properly. Signed-off-by:
Gleb Natapov <gleb@cloudius-systems.com> Signed-off-by:
Avi Kivity <avi@cloudius-systems.com>
-