  1. May 23, 2014
    • tests: Add read-only fsop benchmark · cb5db36c
      Raphael S. Carvalho authored
      
      Useful for getting a notion of response time and throughput
      on sequential read operations.
      Random read option should be added later on.
      Currently being used by me to measure read performance on
      compressed vs uncompressed data.
      
      Example output:
      OSv v0.08-160-gddb9322
      eth0: 192.168.122.15
      /zpool.so: 96kb: 1.77ms, (+1.77ms)
      /libzfs.so: 211kb: 6.57ms, (+4.80ms)
      /zfs.so: 96kb: 8.25ms, (+1.68ms)
      /tools/mkfs.so: 10kb: 9.32ms, (+1.07ms)
      /tools/cpiod.so: 244kb: 14.08ms, (+4.76ms)
      ...
      /usr/lib/jvm/jre/lib/content-types.properties: 5kb: 1066.17ms, (+2.87ms)
      /usr/lib/jvm/jre/lib/cmm/GRAY.pf: 556b: 1066.74ms, (+0.57ms)
      /usr/lib/jvm/jre/lib/cmm/CIEXYZ.pf: 784b: 1067.34ms, (+0.60ms)
      /usr/lib/jvm/jre/lib/cmm/sRGB.pf: 6kb: 1067.96ms, (+0.62ms)
      /usr/lib/jvm/jre/lib/cmm/LINEAR_RGB.pf: 488b: 1068.61ms, (+0.64ms)
      /usr/lib/jvm/jre/lib/cmm/PYCC.pf: 228kb: 1073.96ms, (+5.36ms)
      /usr/lib/jvm/jre/lib/sound.properties: 1kb: 1074.65ms, (+0.69ms)
      
      REPORT
      -----
      Files:	552
      Read:	127395kb
      Time:	1074.65ms
      MBps:	115.39
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
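A minimal C sketch of the kind of measurement described in the fsop benchmark commit: read a file sequentially to the end, time it, and derive throughput. This is not the benchmark's actual source. The names `now_ms`, `read_whole_file` and `mb_per_sec` are hypothetical, the 64 KB buffer size is arbitrary, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: current wall-clock time in milliseconds (C11
 * timespec_get). A monotonic clock would be preferable for benchmarking. */
double now_ms(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Read `path` sequentially until EOF.
 * Returns the number of bytes read, or -1 on error. */
long long read_whole_file(const char *path)
{
    static char buf[64 * 1024];
    long long total = 0;
    size_t n;
    int failed;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        total += (long long)n;
    failed = ferror(f);
    fclose(f);
    return failed ? -1 : total;
}

/* Throughput in MB/s, assuming 1 MB = 1024 * 1024 bytes. */
double mb_per_sec(long long bytes, double elapsed_ms)
{
    if (elapsed_ms <= 0.0)
        return 0.0;
    return ((double)bytes / (1024.0 * 1024.0)) / (elapsed_ms / 1000.0);
}
```

A caller would take `now_ms()` before and after each `read_whole_file()` call, print the per-file delta, and pass the accumulated byte count and total elapsed time to `mb_per_sec()` for the final report.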
    • zfs: Port lz4 compression algorithm from FreeBSD · ac3f540a
      Raphael S. Carvalho authored
      
      OSv port details:
      - Discarded manpage changes.
      - lz4 license was added to the licenses directory.
      - Addressed some conflicts in zfs/zfs_ioctl.c.
      - Added the unused attribute to a few functions in zfs/lz4.c which are
      actually unused.
      
       * Illumos zfs issue #3035 [1] LZ4 compression support in ZFS.
      
      LZ4 is a new high-speed BSD-licensed compression algorithm created
      by Yann Collet that delivers very high compression and decompression
      performance compared to lzjb (>50% faster on compression, >80% faster
      on decompression and around 3x faster on compression of incompressible
      data), while giving better compression ratio [1].
      
      FreeBSD commit hash: c6d9dc1
      
      Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • memset: make memset faster for small sizes · 28ff5b27
      Glauber Costa authored
      
      Just like memcpy, memset can also benefit from special cases for small sizes.
      However, as expected, the tradeoffs are different and the benefit is not as
      large. In the best case, we improve on the original for sizes up to 64 bytes.
      There should still be a gain, because in workloads where memcpy deals with
      small sizes, memset will likely do so as well.
      
      Again, I have compared the simple loop, Duff's device, and "glommer's device",
      with the last being the winner. Here are the results, up to the point where
      each one starts losing:
      
      Original:
      =========
      
      memset,4,9.007000,9.161000,9.024967,0.042445
      memset,8,9.007000,9.137000,9.028934,0.043388
      memset,16,9.006000,9.267000,9.028168,0.056487
      memset,32,9.007000,11.719000,9.287668,0.716163
      memset,64,9.007000,9.143000,9.023834,0.034745
      memset,128,9.007000,9.174000,9.030134,0.044414
      
      Loop:
      =====
      
      memset,4,3.122000,3.293000,3.158033,0.026586
      memset,8,4.151000,5.077000,4.570933,0.207710
      memset,16,7.021000,8.288000,7.873499,0.276310
      memset,32,19.414000,19.792999,19.551334,0.086234
      
      Duff:
      =====
      
      memset,4,3.602000,4.829000,3.936233,0.425657
      memset,8,4.117000,4.526000,4.282266,0.100237
      memset,16,4.889000,5.227000,5.105134,0.084525
      memset,32,8.748000,8.884000,8.763433,0.038910
      memset,64,16.983999,17.163000,17.018702,0.051896
      
      Glommer:
      ========
      
      memset,4,3.524000,3.664000,3.601167,0.028642
      memset,8,3.088000,3.144000,3.092500,0.009790
      memset,16,4.117000,4.170000,4.126300,0.014074
      memset,32,4.888000,5.400000,5.172900,0.123619
      memset,64,6.963000,7.023000,6.968966,0.013802
      memset,128,11.065000,11.174000,11.076533,0.027541
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
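To illustrate what a small-size special case for memset can look like, here is a hedged C sketch. It is not the code from this commit and not "glommer's device", which the message does not describe. The function name `small_memset` is hypothetical, the technique shown (a few fixed-width stores whose tail overlaps the head) is a swapped-in illustration, and the 64-byte cutoff is borrowed from the commit text only as an example threshold.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical small-size memset: handles up to 64 bytes with fixed-width
 * stores and defers everything larger to the library memset. */
void *small_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    /* Replicate the fill byte into all 8 bytes of a word. */
    uint64_t v = 0x0101010101010101ULL * (unsigned char)c;
    size_t i;

    if (n > 64)
        return memset(dst, c, n);

    if (n >= 8) {
        /* Whole 8-byte words, then one store that ends exactly at p + n
         * and may overlap the previous one. */
        for (i = 0; i + 8 <= n; i += 8)
            memcpy(p + i, &v, 8);
        memcpy(p + n - 8, &v, 8);
        return dst;
    }
    if (n >= 4) {
        /* 4..7 bytes: two possibly overlapping 4-byte stores. */
        memcpy(p, &v, 4);
        memcpy(p + n - 4, &v, 4);
        return dst;
    }
    if (n >= 2) {
        /* 2..3 bytes: two possibly overlapping 2-byte stores. */
        memcpy(p, &v, 2);
        memcpy(p + n - 2, &v, 2);
        return dst;
    }
    if (n == 1)
        p[0] = (unsigned char)c;
    return dst;
}
```

The fixed-size `memcpy` calls are a portable way to express unaligned stores; compilers typically turn them into single store instructions. Whether this beats a plain loop or Duff's device on a given machine has to be measured, as the commit does.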
    • tests: increment memcpy tests to test memset too · 94f00eec
      Glauber Costa authored
      
      It is really the same kind of test, so let's just reuse the memcpy example.
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • memory_analyzer: major rework · 5878840b
      Pawel Dziepak authored
      
      This patch makes memory_analyzer understand the newly introduced tracepoint
      arguments: allocator type, allocated memory and requested alignment.
      Allocations are grouped and shown as a tree together with frequency
      information, the number of blocks that have not been freed yet and the
      amount of memory wasted by internal fragmentation.
      
      Signed-off-by: Pawel Dziepak <pdziepak@quarnos.org>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • pagecache: add accessed bit scanner thread · 7d122f7d
      Gleb Natapov authored
      
      Run a thread in the background to scan the pagecache for accessed bits and
      propagate them to ARC. The thread may take anywhere from 0.1% to 20%
      of CPU time. There is no hard science behind how the current CPU usage is
      determined: it uses the page access rate to calculate how hard the pagecache
      should currently be scanned. It can be improved by taking the eviction rate
      into account too.
      
      Signed-off-by: Gleb Natapov <gleb@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • runtime: stub daemon symbol · 88d44e95
      Glauber Costa authored
      
      Just so the symbol exists. We expect people to run their programs in foreground,
      but if linked without lazy bindings, the symbol may be required.
      
      Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
    • memcpy: improve performance for x86's memcpy · e7055f04
      Glauber Costa authored
      
      According to reality, the idea that rep movsb is the preferred way to implement
      memcpy for x86 in the presence of the rep_good flag is false. This
      implementation performs better in the misc-memcpy benchmark for pretty much all
      sizes.
      
      I have also tested a simple byte-by-byte copy loop and Duff's device. With
      Duff's device, I am seeing a weird bug when it is implemented together with
      our memcpy, but it is of course possible to implement it separately for
      sizes up to 256 for analysis, which is what I did.
      
      What can be seen in the results below is that all versions start faster than
      rep movsb for very small objects, but the loop starts to be slower for sizes as
      low as 32 bytes. Duff is slower for 64-byte elements, but this patch is faster
      for all sizes measured. We can copy 64 bytes in 5.6ns, 128 bytes in 7.7ns and
      256 bytes in 13.3ns, while the original numbers would be 11ns, 12.8ns, and
      13.8ns.
      
      Balloon Safety:
      
      Balloon memcpys are 128MB in size. Even for a partial copy, they are at least
      in the KB range. So I am not expecting any funny interaction with this, nor
      anticipating the need to insert fixups here.
      
      Full Results:
      
      Original
      ========
      4,11.066000,13.217000,11.313369,0.527048
      8,29.427999,31.054001,29.797934,0.540056
      16,11.065000,11.147000,11.088465,0.030663
      32,11.065000,11.199000,11.093401,0.043994
      64,11.065000,11.508000,11.115365,0.092626
      128,12.866000,13.137000,12.914132,0.066646
      256,13.896000,14.252000,13.937533,0.067841
      512,15.955000,16.304001,16.006964,0.073594
      1024,20.072001,20.301001,20.122099,0.052627
      2048,28.306999,28.577999,28.377703,0.063443
      4096,44.785999,45.087002,44.899033,0.068806
      8192,77.783997,78.370003,77.918457,0.113472
      16384,150.259003,183.679001,158.534668,5.947755
      32768,1049.886963,1053.098022,1051.364380,0.851499
      
      Loop
      ====
      4,3.152000,3.734000,3.347033,0.185811
      8,4.467000,5.336000,4.936766,0.221336
      16,6.655000,8.262000,7.695767,0.377303
      32,19.788000,20.438000,19.960333,0.221289
      64,25.996000,29.969999,29.217133,0.828447
      128,44.501999,45.562000,45.335640,0.244315
      256,85.459000,95.369003,91.925179,3.409483
      512,14.925000,15.014000,14.939700,0.024197
      1024,19.042999,19.143000,19.060701,0.028286
      2048,27.277000,27.386000,27.306065,0.035528
      4096,43.750000,43.902000,43.789631,0.038810
      8192,76.699997,76.872002,76.769691,0.040407
      16384,149.393997,164.602005,157.051132,4.324330
      32768,1045.287964,1047.580933,1046.380493,0.617742
      
      Duff
      ====
      4,3.602000,4.120000,3.722167,0.163732
      8,4.631000,4.725000,4.643835,0.028509
      16,7.205000,7.316000,7.213567,0.022538
      32,11.838000,12.613000,12.032168,0.285366
      64,21.681000,22.173000,21.754402,0.088584
      128,41.331001,41.651001,41.452267,0.066087
      256,80.431000,80.927002,80.737724,0.106475
      
      This patch
      ==========
      4,3.602000,3.895000,3.636133,0.071126
      8,3.602000,3.679000,3.607600,0.015768
      16,3.859000,3.981000,3.875433,0.032632
      32,4.888000,4.994000,4.899767,0.025539
      64,5.663000,6.404000,6.001000,0.158665
      128,7.737000,8.168000,7.881701,0.156874
      256,13.301000,17.438999,14.937235,0.880874
      512,14.925000,15.226000,14.975132,0.072150
      1024,19.042999,19.412001,19.099068,0.095145
      2048,27.278000,32.022999,27.617165,1.007376
      4096,43.750000,44.146000,43.844494,0.094062
      8192,76.698997,83.873001,77.137794,1.266063
      16384,153.483994,168.636002,160.516830,3.837175
      32768,1047.878052,1068.301025,1052.600586,4.441750
      
      Signed-off-by: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
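The result rows in the memcpy and memset commits are not labeled. They look like a size followed by the minimum, maximum, mean and standard deviation of the measured times, but that reading is an assumption. Under it, a summary of repeated timing samples could be computed as in the C sketch below. The names `sample_stats` and `summarize` are hypothetical, and the function returns the population variance rather than the standard deviation to avoid depending on the math library; the standard deviation is its square root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of repeated timing samples (for example, ns per call). */
struct sample_stats {
    double min;
    double max;
    double mean;
    double variance; /* population variance; stddev is its square root */
};

/* Summarize n samples; n must be at least 1. */
struct sample_stats summarize(const double *samples, size_t n)
{
    struct sample_stats r;
    double sum = 0.0, sq = 0.0;
    size_t i;

    assert(n > 0);
    r.min = r.max = samples[0];
    for (i = 0; i < n; i++) {
        if (samples[i] < r.min)
            r.min = samples[i];
        if (samples[i] > r.max)
            r.max = samples[i];
        sum += samples[i];
    }
    r.mean = sum / (double)n;

    /* Second pass: mean of squared deviations from the mean. */
    for (i = 0; i < n; i++) {
        double d = samples[i] - r.mean;
        sq += d * d;
    }
    r.variance = sq / (double)n;
    return r;
}
```

Reporting the minimum alongside the mean is useful for microbenchmarks like these, since the minimum is the sample least disturbed by interrupts and cache effects.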
  2. May 22, 2014
  3. May 21, 2014
  4. May 20, 2014
  5. May 19, 2014
    • rework OSV_SYM macro · 34ddac25
      Glauber Costa authored
      
      As Nadav pointed out during review, this macro could use a bit more work, to
      take just a single parameter. That is what is done in this patch.
      Unfortunately just pasting __COUNTER__ doesn't work because of preprocessor
      rules, and we need some indirection to get it working. Also, visibility
      "hidden" can go because that is already implied by "static". The problem then
      becomes the fact that gcc does not really like unreferenced static variables,
      which is solved by the "used" attribute. From gcc docs about "used":
      
         "This attribute, attached to a variable with the static storage, means that
          the variable must be emitted even if it appears that the variable is not
          referenced."
      
      Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
      Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
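The preprocessor issue mentioned in the OSV_SYM commit can be sketched in C as follows. This is not the real OSV_SYM definition: `PASTE_RAW`, `PASTE`, `KEEP_SYM` and the `kept_sym_` prefix are hypothetical names chosen for illustration. The sketch shows that operands of `##` are not macro-expanded, so `__COUNTER__` has to pass through one extra macro level before pasting, and that the GCC `used` attribute keeps an otherwise unreferenced static variable in the output.

```c
/* Pasting directly: operands of ## are not expanded first, so
 * PASTE_RAW(sym_, __COUNTER__) yields the literal token sym___COUNTER__. */
#define PASTE_RAW(a, b) a##b

/* One level of indirection: the arguments are fully expanded before being
 * handed to PASTE_RAW, so __COUNTER__ becomes a number first. */
#define PASTE(a, b) PASTE_RAW(a, b)

/* Hypothetical stand-in for a single-parameter macro: each use defines a
 * uniquely named static variable. "used" (GCC/Clang) forces it to be
 * emitted even though nothing references it. */
#define KEEP_SYM(value) \
    static const int PASTE(kept_sym_, __COUNTER__) __attribute__((used)) = (value)

/* Two uses in the same scope get distinct names, so there is no
 * redefinition error. */
KEEP_SYM(1);
KEEP_SYM(2);
```

With `PASTE_RAW` used directly inside `KEEP_SYM`, both uses would define the same identifier and the second one would fail to compile.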