memset: make memset faster for small sizes
Just like memcpy, memset can also benefit from special cases for small sizes. However, as expected, the tradeoffs are different and the benefit is not as large: in the best case, we only beat the original implementation up to 64 bytes. There should still be a gain, because workloads in which memcpy deals with small sizes are likely to do the same with memset.

Again, I have compared the simple loop, Duff's device, and "glommer's device", with the last one being the winner. Here are the results, up to the point where each one starts losing:

Original:
=========
memset,4,9.007000,9.161000,9.024967,0.042445
memset,8,9.007000,9.137000,9.028934,0.043388
memset,16,9.006000,9.267000,9.028168,0.056487
memset,32,9.007000,11.719000,9.287668,0.716163
memset,64,9.007000,9.143000,9.023834,0.034745
memset,128,9.007000,9.174000,9.030134,0.044414

Loop:
=====
memset,4,3.122000,3.293000,3.158033,0.026586
memset,8,4.151000,5.077000,4.570933,0.207710
memset,16,7.021000,8.288000,7.873499,0.276310
memset,32,19.414000,19.792999,19.551334,0.086234

Duff:
=====
memset,4,3.602000,4.829000,3.936233,0.425657
memset,8,4.117000,4.526000,4.282266,0.100237
memset,16,4.889000,5.227000,5.105134,0.084525
memset,32,8.748000,8.884000,8.763433,0.038910
memset,64,16.983999,17.163000,17.018702,0.051896

Glommer:
========
memset,4,3.524000,3.664000,3.601167,0.028642
memset,8,3.088000,3.144000,3.092500,0.009790
memset,16,4.117000,4.170000,4.126300,0.014074
memset,32,4.888000,5.400000,5.172900,0.123619
memset,64,6.963000,7.023000,6.968966,0.013802
memset,128,11.065000,11.174000,11.076533,0.027541

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
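The commit message does not include the code itself, so the following is only a rough illustration of the kinds of small-size paths being compared, not the patch. memset_loop is a plain byte loop and memset_duff is a textbook Duff's device. memset_small is NOT "glommer's device" (its details are not described here); it is a swapped-in, commonly used trick that broadcasts the byte into a 64-bit word and covers the buffer with possibly overlapping word stores. All function names and the 64-byte cutoff are hypothetical choices for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch 1: the simple byte loop. */
static void *memset_loop(void *dest, int c, size_t n)
{
    unsigned char *d = dest;
    while (n--)
        *d++ = (unsigned char)c;
    return dest;
}

/* Hypothetical sketch 2: Duff's device, unrolled by 8; the switch on
 * n % 8 jumps into the middle of the loop to handle the remainder. */
static void *memset_duff(void *dest, int c, size_t n)
{
    unsigned char *d = dest;
    unsigned char v = (unsigned char)c;
    size_t rounds;

    if (n == 0)
        return dest;
    rounds = (n + 7) / 8;
    switch (n % 8) {
    case 0: do { *d++ = v;
    case 7:      *d++ = v;
    case 6:      *d++ = v;
    case 5:      *d++ = v;
    case 4:      *d++ = v;
    case 3:      *d++ = v;
    case 2:      *d++ = v;
    case 1:      *d++ = v;
            } while (--rounds > 0);
    }
    return dest;
}

/* Replicate one byte into all 8 bytes of a 64-bit word. */
static uint64_t splat8(unsigned char v)
{
    return (uint64_t)v * UINT64_C(0x0101010101010101);
}

/* Hypothetical sketch 3 (not glommer's device): word stores that may
 * overlap. Overlap is harmless because every byte gets the same value.
 * Fixed-size memcpy is used for unaligned stores; compilers typically
 * turn it into a single move instruction. */
static void *memset_small(void *dest, int c, size_t n)
{
    unsigned char *d = dest;
    uint64_t w = splat8((unsigned char)c);

    if (n > 64)                      /* assumed cutoff: use the generic path */
        return memset(dest, c, n);
    if (n >= 8) {
        size_t i;
        for (i = 0; i + 8 <= n; i += 8)
            memcpy(d + i, &w, 8);
        memcpy(d + n - 8, &w, 8);    /* tail store, may overlap the last chunk */
    } else if (n >= 4) {
        uint32_t w4 = (uint32_t)w;
        memcpy(d, &w4, 4);           /* first 4 bytes */
        memcpy(d + n - 4, &w4, 4);   /* last 4 bytes, overlapping when n < 8 */
    } else if (n >= 2) {
        uint16_t w2 = (uint16_t)w;
        memcpy(d, &w2, 2);
        memcpy(d + n - 2, &w2, 2);
    } else if (n == 1) {
        d[0] = (unsigned char)c;
    }
    return dest;
}
```

The overlapping-store approach trades a few redundant byte writes for far fewer branches and stores than a byte loop; whether it actually beats a tuned memset on a given CPU, and where the crossover lies, has to be measured, as the benchmark numbers above do for the variants that were tried.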