Glauber Costa authored
Just like memcpy, memset can also benefit from special cases for small sizes.
However, as expected, the tradeoffs are different and the benefit is not as
large. In the best case, the special-cased version is faster than the original
for sizes up to 64 bytes. There should still be an overall gain, because
workloads in which memcpy deals with small sizes will likely call memset with
small sizes as well.
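
As a rough illustration of the idea, a small-size special case can sit in front
of the general routine. This is only a sketch, not the code from this commit:
the name memset_dispatch and the SMALL_MEMSET_MAX cutoff are hypothetical, with
the cutoff chosen to match the 64-byte figure mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff, taken from the "up to 64 bytes" figure in the text. */
#define SMALL_MEMSET_MAX 64

/* Hypothetical name; an illustration, not the function from this commit. */
void *memset_dispatch(void *dest, int c, size_t n)
{
    if (n <= SMALL_MEMSET_MAX) {
        /* Small sizes: plain byte loop, avoiding the setup cost of the
         * general routine. Note that an optimizing compiler may itself
         * turn this loop back into a call to memset. */
        unsigned char *p = dest;
        unsigned char v = (unsigned char)c;
        for (size_t i = 0; i < n; i++)
            p[i] = v;
        return dest;
    }
    /* Larger sizes: defer to the general implementation. */
    return memset(dest, c, n);
}
```

Where the cutoff should be depends on the measurements; the numbers below are
what decide it here.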

Again, I have compared the simple loop, Duff's device, and "glommer's device",
with the last being the winner. Here are the results for each variant, up to
and including the first size at which it becomes slower than the original:

Original:
=========

memset,4,9.007000,9.161000,9.024967,0.042445
memset,8,9.007000,9.137000,9.028934,0.043388
memset,16,9.006000,9.267000,9.028168,0.056487
memset,32,9.007000,11.719000,9.287668,0.716163
memset,64,9.007000,9.143000,9.023834,0.034745
memset,128,9.007000,9.174000,9.030134,0.044414

Loop:
=====

memset,4,3.122000,3.293000,3.158033,0.026586
memset,8,4.151000,5.077000,4.570933,0.207710
memset,16,7.021000,8.288000,7.873499,0.276310
memset,32,19.414000,19.792999,19.551334,0.086234

Duff:
=====

memset,4,3.602000,4.829000,3.936233,0.425657
memset,8,4.117000,4.526000,4.282266,0.100237
memset,16,4.889000,5.227000,5.105134,0.084525
memset,32,8.748000,8.884000,8.763433,0.038910
memset,64,16.983999,17.163000,17.018702,0.051896

Glommer:
========

memset,4,3.524000,3.664000,3.601167,0.028642
memset,8,3.088000,3.144000,3.092500,0.009790
memset,16,4.117000,4.170000,4.126300,0.014074
memset,32,4.888000,5.400000,5.172900,0.123619
memset,64,6.963000,7.023000,6.968966,0.013802
memset,128,11.065000,11.174000,11.076533,0.027541
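
For reference, the Duff's device variant compared above can be sketched roughly
as follows. This is a generic textbook rendering under the hypothetical name
memset_duff, not the code that was benchmarked, and it makes no attempt to
reproduce "glommer's device", which this message does not describe.

```c
#include <stddef.h>

/* Hypothetical name; a generic Duff's device, unrolled 8 times. */
void *memset_duff(void *dest, int c, size_t n)
{
    unsigned char *p = dest;
    unsigned char v = (unsigned char)c;

    if (n == 0)
        return dest;

    /* Number of passes through the unrolled body, counting the first,
     * partial pass that handles the n % 8 leftover bytes. */
    size_t rounds = (n + 7) / 8;

    /* Jump into the middle of the loop body; the fallthrough between
     * the case labels is intentional. */
    switch (n % 8) {
    case 0: do { *p++ = v;
    case 7:      *p++ = v;
    case 6:      *p++ = v;
    case 5:      *p++ = v;
    case 4:      *p++ = v;
    case 3:      *p++ = v;
    case 2:      *p++ = v;
    case 1:      *p++ = v;
            } while (--rounds > 0);
    }
    return dest;
}
```

The unrolling trades fewer loop branches for a computed jump on entry, which is
one plausible reason it beats the simple loop from 16 bytes on but not at 4.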

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
28ff5b27
aarch64
common
x64