May 23, 2014
    • zfs: Enable compression on zfs dataset when creating the image · 6a29063c
      Raphael S. Carvalho authored
      
      This patch enables LZ4 compression on the ZFS dataset right after it is
      added to the pool. The image creation process then runs all of its steps
      with compression enabled, and compression is disabled once the image is
      complete. From that moment on, new writes are no longer compressed, while
      files compressed during image creation remain fully readable.
      
      Why disable compression after image creation?
      There seem to be corner cases where leaving compression enabled by default
      could hurt application performance.
      For example, applications that compress data themselves (e.g. Cassandra)
      might end up slower, as ZFS would repeat compression work that has already
      been done and consequently waste CPU cycles.
      It's worth mentioning that LZ4 is ~300% faster than LZJB when compressing
      incompressible data, so it might be a good fit even for Cassandra.
      
      Additional information: The first version of this patch used the LZJB
      algorithm; however, it slowed down read operations on compressed files.
      LZ4, on the other hand, improves reads of compressed files, improves boot
      time, and still provides a good compression ratio.
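      
      A minimal sketch of how such a toggle could be driven through the stock
      libzfs API is shown below; the dataset name "osv/zfs" and the helper are
      illustrative assumptions, not the patch's actual code:
      
          /* Hedged sketch: toggle the "compression" property on a dataset
           * around image population. Dataset name and flow are assumptions. */
          #include <libzfs.h>
      
          static int set_compression(const char *dataset, const char *value)
          {
              libzfs_handle_t *g_zfs = libzfs_init();
              if (g_zfs == NULL)
                  return -1;
      
              zfs_handle_t *zhp = zfs_open(g_zfs, dataset, ZFS_TYPE_FILESYSTEM);
              int ret = -1;
              if (zhp != NULL) {
                  ret = zfs_prop_set(zhp, "compression", value); /* "lz4" or "off" */
                  zfs_close(zhp);
              }
              libzfs_fini(g_zfs);
              return ret;
          }
      
          /* During image creation (illustrative flow):
           *   set_compression("osv/zfs", "lz4");  populate the image compressed
           *   ... copy files into the image ...
           *   set_compression("osv/zfs", "off");  later writes stay uncompressed
           */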
      
      RESULTS
      =====
      
      - UNCOMPRESSED:
      * Image size
      -rw-r--r--. 1 root root 154533888 May 19 23:02 build/release/usr.img
      
      * Read benchmark
      REPORT
      -----
      Files:    552
      Read:    127399kb
      Time:    1069.90ms
      MBps:    115.90
      
      * Boot time
      1)
          ZFS mounted: 426.57ms, (+157.75ms)
      2)
          ZFS mounted: 439.13ms, (+156.24ms)
      
      - COMPRESSED (LZ4):
      * Image size
      -rw-r--r--. 1 root root 81002496 May 19 23:33 build/release/usr.img
      
      * Read benchmark
      REPORT
      -----
      Files:    552
      Read:    127399kb
      Time:    957.96ms
      MBps:    129.44
      
      * Boot time
      1)
          ZFS mounted: 414.55ms, (+145.47ms)
      2)
          ZFS mounted: 403.72ms, (+142.82ms)
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      6a29063c
    • mkfs: Code refactoring and allow instances of the same shared object · 9ca6522a
      Raphael S. Carvalho authored
      
      Besides refactoring the code, this patch makes mkfs support more than
      one instance of the same shared object within the same mkfs run, by
      releasing the resources at the function prologue.
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      9ca6522a
    • tests: Add read-only fsop benchmark · cb5db36c
      Raphael S. Carvalho authored
      
      Useful for getting a notion of response time and throughput
      of sequential read operations.
      A random read option should be added later on.
      I am currently using it to measure read performance on
      compressed vs uncompressed data.
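      
      In outline, a benchmark of this kind opens each file, reads it to the end
      while accumulating bytes and elapsed time, and prints a per-file line plus
      a final report. The sketch below only illustrates that idea (files come
      from the command line); it is not the test added by this commit:
      
          /* Hedged sketch of a sequential read benchmark: reads every file
           * given on the command line and reports cumulative throughput. */
          #include <fcntl.h>
          #include <stdio.h>
          #include <time.h>
          #include <unistd.h>
      
          static double now_ms(void)
          {
              struct timespec ts;
              clock_gettime(CLOCK_MONOTONIC, &ts);
              return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
          }
      
          int main(int argc, char **argv)
          {
              char buf[64 * 1024];
              long long total = 0;
              double start = now_ms();
      
              for (int i = 1; i < argc; i++) {
                  int fd = open(argv[i], O_RDONLY);
                  if (fd < 0)
                      continue;
                  ssize_t n;
                  while ((n = read(fd, buf, sizeof buf)) > 0)
                      total += n;
                  close(fd);
                  /* cumulative bytes read and elapsed time after this file */
                  printf("%s: %lldkb: %.2fms\n", argv[i], total / 1024, now_ms() - start);
              }
      
              double elapsed = now_ms() - start;
              printf("Files: %d  Read: %lldkb  Time: %.2fms  MBps: %.2f\n",
                     argc - 1, total / 1024, elapsed,
                     elapsed > 0 ? (total / 1e6) / (elapsed / 1000.0) : 0.0);
              return 0;
          }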
      
      Example output:
      OSv v0.08-160-gddb9322
      eth0: 192.168.122.15
      /zpool.so: 96kb: 1.77ms, (+1.77ms)
      /libzfs.so: 211kb: 6.57ms, (+4.80ms)
      /zfs.so: 96kb: 8.25ms, (+1.68ms)
      /tools/mkfs.so: 10kb: 9.32ms, (+1.07ms)
      /tools/cpiod.so: 244kb: 14.08ms, (+4.76ms)
      ...
      /usr/lib/jvm/jre/lib/content-types.properties: 5kb: 1066.17ms, (+2.87ms)
      /usr/lib/jvm/jre/lib/cmm/GRAY.pf: 556b: 1066.74ms, (+0.57ms)
      /usr/lib/jvm/jre/lib/cmm/CIEXYZ.pf: 784b: 1067.34ms, (+0.60ms)
      /usr/lib/jvm/jre/lib/cmm/sRGB.pf: 6kb: 1067.96ms, (+0.62ms)
      /usr/lib/jvm/jre/lib/cmm/LINEAR_RGB.pf: 488b: 1068.61ms, (+0.64ms)
      /usr/lib/jvm/jre/lib/cmm/PYCC.pf: 228kb: 1073.96ms, (+5.36ms)
      /usr/lib/jvm/jre/lib/sound.properties: 1kb: 1074.65ms, (+0.69ms)
      
      REPORT
      -----
      Files:	552
      Read:	127395kb
      Time:	1074.65ms
      MBps:	115.39
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      cb5db36c
    • zfs: Port lz4 compression algorithm from FreeBSD · ac3f540a
      Raphael S. Carvalho authored
      
      OSv port details:
      - Discarded manpage changes.
      - lz4 license was added to the licenses directory.
      - Addressed some conflicts in zfs/zfs_ioctl.c.
      - Added the unused attribute to a few functions in zfs/lz4.c which are
      actually unused.
      
       * Illumos zfs issue #3035 [1] LZ4 compression support in ZFS.
      
      LZ4 is a new high-speed BSD-licensed compression algorithm created
      by Yann Collet that delivers very high compression and decompression
      performance compared to lzjb (>50% faster on compression, >80% faster
      on decompression and around 3x faster on compression of incompressible
      data), while giving better compression ratio [1].
      
      FreeBSD commit hash: c6d9dc1
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      ac3f540a
    • memset: make memset faster for small sizes · 28ff5b27
      Glauber Costa authored
      
      Just like memcpy, memset can also benefit from special cases for small sizes.
      However, as expected, the tradeoffs are different and the benefit is not as
      large. In the best case, we manage to do better up to 64 bytes. There
      should still be a gain, because workloads in which memcpy deals with small
      sizes will likely have memset do so as well.
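      
      The patch's code is not reproduced in this log; the sketch below only
      illustrates the general idea of special-casing small sizes with word-sized
      stores. The helper name is hypothetical, and this is not necessarily what
      "glommer's device" actually does:
      
          /* Hedged illustration: replicate the fill byte into a 64-bit pattern
           * and use overlapping 8-byte stores for small sizes. */
          #include <stddef.h>
          #include <stdint.h>
          #include <string.h>
      
          static void *small_memset(void *dst, int c, size_t n)
          {
              char *d = dst;
      
              if (n >= 8) {
                  uint64_t pat = (uint8_t)c * 0x0101010101010101ULL;
                  for (size_t i = 0; i + 8 < n; i += 8)
                      memcpy(d + i, &pat, 8);     /* full 8-byte chunks */
                  memcpy(d + n - 8, &pat, 8);     /* tail, may overlap */
                  return dst;
              }
              while (n--)                         /* below 8 bytes: byte loop */
                  *d++ = (char)c;
              return dst;
          }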
      
      Again, I have compared the simple loop, Duff's device, and "glommer's device",
      with the last being the winner. Here are the results, up to the point where
      each one starts losing:
      
      Original:
      =========
      
      memset,4,9.007000,9.161000,9.024967,0.042445
      memset,8,9.007000,9.137000,9.028934,0.043388
      memset,16,9.006000,9.267000,9.028168,0.056487
      memset,32,9.007000,11.719000,9.287668,0.716163
      memset,64,9.007000,9.143000,9.023834,0.034745
      memset,128,9.007000,9.174000,9.030134,0.044414
      
      Loop:
      =====
      
      memset,4,3.122000,3.293000,3.158033,0.026586
      memset,8,4.151000,5.077000,4.570933,0.207710
      memset,16,7.021000,8.288000,7.873499,0.276310
      memset,32,19.414000,19.792999,19.551334,0.086234
      
      Duff:
      =====
      
      memset,4,3.602000,4.829000,3.936233,0.425657
      memset,8,4.117000,4.526000,4.282266,0.100237
      memset,16,4.889000,5.227000,5.105134,0.084525
      memset,32,8.748000,8.884000,8.763433,0.038910
      memset,64,16.983999,17.163000,17.018702,0.051896
      
      Glommer:
      ========
      
      memset,4,3.524000,3.664000,3.601167,0.028642
      memset,8,3.088000,3.144000,3.092500,0.009790
      memset,16,4.117000,4.170000,4.126300,0.014074
      memset,32,4.888000,5.400000,5.172900,0.123619
      memset,64,6.963000,7.023000,6.968966,0.013802
      memset,128,11.065000,11.174000,11.076533,0.027541
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      28ff5b27
    • tests: increment memcpy tests to test memset too · 94f00eec
      Glauber Costa authored
      
      It is really the same kind of test, so let's just reuse the memcpy example.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      94f00eec
    • memory_analyzer: major rework · 5878840b
      Pawel Dziepak authored
      
      This patch makes memory_analyzer understand the newly introduced tracepoint
      arguments: allocator type, allocated memory, and requested alignment.
      Allocations are grouped and shown as a tree, together with frequency
      information, the number of blocks that haven't been freed yet, and the
      amount of memory wasted by internal fragmentation.
      
      Signed-off-by: Pawel Dziepak <pdziepak@quarnos.org>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      5878840b
    • pagecache: add accessed bit scanner thread · 7d122f7d
      Gleb Natapov authored
      
      Run a thread in the background to scan the pagecache for accessed bits and
      propagate them to the ARC. The thread may take anywhere from 0.1% to 20%
      of CPU time. There is no hard science behind how the current CPU usage is
      determined; the page access rate is used to decide how aggressively the
      pagecache should be scanned at any given moment. This could be improved by
      taking the eviction rate into account as well.
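      
      As a rough illustration of that duty-cycle idea only: the names, the
      rate-to-effort mapping, and the 1ms-per-scan-pass assumption below are all
      hypothetical, not the actual OSv code:
      
          /* Hedged sketch of an adaptive scanner thread: scale the scanning
           * duty cycle between ~0.1% and ~20% of one CPU with the access rate. */
          #include <unistd.h>
      
          #define MIN_PERMILLE   1    /* ~0.1% of one CPU */
          #define MAX_PERMILLE 200    /* ~20% of one CPU  */
      
          /* Placeholders for the real statistics and scan work (hypothetical). */
          static unsigned long pages_accessed_since_last_scan(void) { return 0; }
          static void scan_accessed_bits_and_notify_arc(void) { }
      
          static void accessed_bit_scanner(void)
          {
              for (;;) {
                  unsigned long rate = pages_accessed_since_last_scan();
      
                  /* More page accesses -> scan harder, within the budget. */
                  unsigned long permille = rate / 1000;
                  if (permille < MIN_PERMILLE) permille = MIN_PERMILLE;
                  if (permille > MAX_PERMILLE) permille = MAX_PERMILLE;
      
                  scan_accessed_bits_and_notify_arc();
      
                  /* Assuming a scan pass costs ~1ms, sleep long enough that the
                   * thread consumes about permille/1000 of one CPU. */
                  usleep(1000 * (1000 - permille) / permille);
              }
          }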
      
      Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      7d122f7d
    • runtime: stub daemon symbol · 88d44e95
      Glauber Costa authored
      
      Just so the symbol exists. We expect people to run their programs in the
      foreground, but if a program is linked without lazy binding, the symbol may
      be required at load time.
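      
      For illustration, a stub that satisfies the symbol could be as small as the
      following; this is a hedged sketch with an assumed failure behavior, not
      necessarily OSv's actual stub:
      
          /* Hedged sketch: a stub matching the usual daemon() prototype that
           * simply reports failure, so the symbol resolves at link/load time.
           * The chosen errno is an assumption, not OSv's actual behavior. */
          #include <errno.h>
      
          int daemon(int nochdir, int noclose)
          {
              (void)nochdir;
              (void)noclose;
              errno = ENOSYS;   /* daemonizing is not supported */
              return -1;
          }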
      
      Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      88d44e95
    • memcpy: improve performance for x86's memcpy · e7055f04
      Glauber Costa authored
      
      Contrary to common belief, rep movsb is not the preferred way to implement
      memcpy for x86 even in the presence of the rep_good flag. This
      implementation performs better in the misc-memcpy benchmark for pretty much
      all sizes.
      
      I have also tested a simple loop with byte-by-byte copy, and Duff's device.
      For Duff's device, I am seeing a weird bug when it is implemented together
      with our memcpy, but it is of course possible to implement it up to
      256 separately for analysis, which is what I did.
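      
      For reference, the "Duff" variant mentioned here is Duff's device, the
      classic unrolled copy loop; its canonical shape is sketched below for
      illustration only, it is not the code benchmarked in this patch:
      
          /* Duff's device: an 8-way unrolled byte copy loop, shown only to
           * illustrate the "Duff" rows in the results below. */
          #include <stddef.h>
      
          static void duff_copy(char *to, const char *from, size_t count)
          {
              if (count == 0)
                  return;
              size_t n = (count + 7) / 8;
              switch (count % 8) {
              case 0: do { *to++ = *from++;
              case 7:      *to++ = *from++;
              case 6:      *to++ = *from++;
              case 5:      *to++ = *from++;
              case 4:      *to++ = *from++;
              case 3:      *to++ = *from++;
              case 2:      *to++ = *from++;
              case 1:      *to++ = *from++;
                      } while (--n > 0);
              }
          }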
      
      What can be seen in the results below is that all versions start out faster
      than rep movsb for very small objects, but the loop starts to lose for sizes
      as low as 32 bytes. Duff is slower for 64-byte elements, but this patch is
      faster for all sizes measured. We can copy 64 bytes in 5.6ns, 128 bytes in
      7.7ns and 256 bytes in 13.3ns, while the original numbers would be 11ns,
      11ns, and 13.8ns.
      
      Balloon Safety:
      
      Balloon memcpys are 128Mb in size. Even for partial copy, they are at least in
      the kb range. So I am not expecting any funny interaction with this, nor
      anticipating the need to insert fixups here.
      
      Full Results:
      
      Original
      ========
      4,11.066000,13.217000,11.313369,0.527048
      8,29.427999,31.054001,29.797934,0.540056
      16,11.065000,11.147000,11.088465,0.030663
      32,11.065000,11.199000,11.093401,0.043994
      64,11.065000,11.508000,11.115365,0.092626
      128,12.866000,13.137000,12.914132,0.066646
      256,13.896000,14.252000,13.937533,0.067841
      512,15.955000,16.304001,16.006964,0.073594
      1024,20.072001,20.301001,20.122099,0.052627
      2048,28.306999,28.577999,28.377703,0.063443
      4096,44.785999,45.087002,44.899033,0.068806
      8192,77.783997,78.370003,77.918457,0.113472
      16384,150.259003,183.679001,158.534668,5.947755
      32768,1049.886963,1053.098022,1051.364380,0.851499
      
      Loop
      ====
      4,3.152000,3.734000,3.347033,0.185811
      8,4.467000,5.336000,4.936766,0.221336
      16,6.655000,8.262000,7.695767,0.377303
      32,19.788000,20.438000,19.960333,0.221289
      64,25.996000,29.969999,29.217133,0.828447
      128,44.501999,45.562000,45.335640,0.244315
      256,85.459000,95.369003,91.925179,3.409483
      512,14.925000,15.014000,14.939700,0.024197
      1024,19.042999,19.143000,19.060701,0.028286
      2048,27.277000,27.386000,27.306065,0.035528
      4096,43.750000,43.902000,43.789631,0.038810
      8192,76.699997,76.872002,76.769691,0.040407
      16384,149.393997,164.602005,157.051132,4.324330
      32768,1045.287964,1047.580933,1046.380493,0.617742
      
      Duff
      ====
      4,3.602000,4.120000,3.722167,0.163732
      8,4.631000,4.725000,4.643835,0.028509
      16,7.205000,7.316000,7.213567,0.022538
      32,11.838000,12.613000,12.032168,0.285366
      64,21.681000,22.173000,21.754402,0.088584
      128,41.331001,41.651001,41.452267,0.066087
      256,80.431000,80.927002,80.737724,0.106475
      
      This patch
      ==========
      4,3.602000,3.895000,3.636133,0.071126
      8,3.602000,3.679000,3.607600,0.015768
      16,3.859000,3.981000,3.875433,0.032632
      32,4.888000,4.994000,4.899767,0.025539
      64,5.663000,6.404000,6.001000,0.158665
      128,7.737000,8.168000,7.881701,0.156874
      256,13.301000,17.438999,14.937235,0.880874
      512,14.925000,15.226000,14.975132,0.072150
      1024,19.042999,19.412001,19.099068,0.095145
      2048,27.278000,32.022999,27.617165,1.007376
      4096,43.750000,44.146000,43.844494,0.094062
      8192,76.698997,83.873001,77.137794,1.266063
      16384,153.483994,168.636002,160.516830,3.837175
      32768,1047.878052,1068.301025,1052.600586,4.441750
      
      Signed-off-by: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
      e7055f04