zfsbuffers: Do not truncate files
There is a problem with the way ZFS currently handles its buffers, which is
actually a limitation of our allocator: buffers smaller than a page won't be
page aligned even if we ask for it. Therefore, if the buffer we are mapping
falls into this category, we will map the wrong location.
The way I solved this problem was so stupid that in retrospect I can't even
believe I did it: when the file would run out of size, we would truncate the
file. This is obviously wrong, because reading a file is not expected to change
its size under any circumstances, and if anybody relied on the actual size, we
would crash something. This is the bug that plagued Cassandra.
Not truncating, however, brings back the original problem. One solution I have
considered is to always allocate at least a page for data allocations (leaving
metadata alone), but that would deviate from ZFS and harm many-small-files
workloads.
During testing, however, I noticed that ZFS will allocate small buffers only
when the file itself is small. This means that we can just avoid using the
special shared mapping for small files - which makes sense anyway.
For instance, if we have a file that is 128k + 1 byte (remember 128k is ZFS's
maximum buffer size), both buffers will be large enough to be aligned. And if
that ever fails to hold, we will now see an assertion hit instead of a random
bug. In time, we should fix our allocator to provide alignment guarantees.
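As a rough illustration of the idea above (a sketch, not the actual patch):
small files skip the shared mapping, and any buffer that does get mapped is
asserted to be page aligned. The names PAGE_SIZE, is_page_aligned,
can_use_shared_mapping and check_buffer_for_mapping are hypothetical, and the
one-page threshold for "small" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; the real value comes from the platform. */
#define PAGE_SIZE ((size_t)4096)

/* Hypothetical helper: true if buf starts on a page boundary. */
static bool is_page_aligned(const void *buf)
{
    return ((uintptr_t)buf & (PAGE_SIZE - 1)) == 0;
}

/*
 * Hypothetical policy: files smaller than a page (assumed threshold) use
 * the regular read path instead of the shared mapping, because their
 * buffers may not be page aligned.
 */
static bool can_use_shared_mapping(size_t file_size)
{
    return file_size >= PAGE_SIZE;
}

/*
 * Hypothetical check run before mapping a buffer: if the "small buffers
 * only for small files" observation ever stops holding, fail loudly here
 * instead of silently mapping the wrong location.
 */
static void check_buffer_for_mapping(const void *buf, size_t buf_size)
{
    assert(buf_size >= PAGE_SIZE);
    assert(is_page_aligned(buf));
}
```

With this shape, a file of 128k + 1 byte passes can_use_shared_mapping(), and
each of its buffers is expected to pass check_buffer_for_mapping().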
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>