This was the cause of poor ZFS performance in the misc-fs-stress test.

Before:

 Wrote 168.129 MB in 10.12 s = 16.610 MB/s
 Wrote 194.688 MB in 10.00 s = 19.469 MB/s
 Wrote 183.004 MB in 10.06 s = 18.186 MB/s
 Wrote 167.754 MB in 10.28 s = 16.315 MB/s

After:

 Wrote 636.227 MB in 10.00 s = 63.623 MB/s
 Wrote 666.979 MB in 10.00 s = 66.696 MB/s
 Wrote 613.512 MB in 10.00 s = 61.350 MB/s
 Wrote 573.502 MB in 10.00 s = 57.346 MB/s
 Wrote 668.607 MB in 10.00 s = 66.857 MB/s
 Wrote 630.920 MB in 10.00 s = 63.087 MB/s

It turned out that the limiting factor was the ARC cache. A check
inside arc_tempreserve_space() was forcing the txg to be synced too
often (once every 400 ms). The arc_c variable was only 16 MB
(arc_c_min), which allowed only 8 MB to be written per transaction.
It turns out that arc_c depends on kmem_size(), which is based on
physmem, which was never initialized.

I would hold off on committing this for now, for several reasons
which I want to put forward for your consideration.

While this improves write throughput, it makes the boot time after a
build much longer; on my disk the boot time increases from 1.5 s to
10 s. This is because ZFS verifies the last 3 txgs upon mount, and
this patch increases the txg size, which results in more data to
check on the next boot. I'm working on solving this right now.

Something worth noting: while larger transactions sync less often,
increasing throughput, they also take longer to sync, increasing
worst-case latency. In my test the pauses get as high as 3 seconds
with 1 GB of guest memory.
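
For scale, assuming a sync drains dirty data at the observed
~60 MB/s, a 3-second pause corresponds to roughly 180 MB flushed in
a single txg, compared with the old 8 MB cap.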

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>