Disk I/O settings for checking files

Arvid Norberg edited this page Jun 7, 2020 · 1 revision

These results come from experiments run with tools/benchmark_checking.py against the initial state of RC_2_0, as the release candidate branch was cut. The benchmark Python script was modified to drop the disk caches (echo 3 > /proc/sys/vm/drop_caches) between runs. The tests were run on Ubuntu 20.04.
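The cache drop between runs could be scripted along the following lines. This is a minimal sketch, not the code in tools/benchmark_checking.py: the function name `drop_caches` and its `path` parameter are hypothetical, and writing to the real file requires root.

```python
import os


def drop_caches(path: str = "/proc/sys/vm/drop_caches") -> None:
    """Ask the Linux kernel to drop the page cache, dentries and inodes.

    `path` is a parameter only so the function can be exercised against
    an ordinary file; the default is the real kernel interface.
    """
    # Flush dirty pages first, since only clean pages can be dropped.
    os.sync()
    with open(path, "w") as f:
        f.write("3\n")
```

Calling `drop_caches()` before each timed run means every run starts from a cold cache, so the timings reflect the drive rather than RAM.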

baseline

Time to check files, in milliseconds, on NVMe SSD.

Rows: aio_threads. Columns: checking_mem_usage.

aio_threads     256     512    1024    2048
4             54682   54492   55522   54793
8             29820   29635   30068   30161
16            15619   15824   15542   15506
32             8569    8501    8513    8547
64             6244    6273    6323    6230

Time to check files, in milliseconds, on spinning hard drive (HDD).

Rows: aio_threads. Columns: checking_mem_usage.

aio_threads      256      512     1024     2048
4              60493    61137    61384    61166
8              60546    61008    60757    59607
16            144554   144959   145621   146460
32            152985   154505   153728   152788
64            163493   164861   165105   165276

These results suggest that checking_mem_usage makes no significant difference on either drive.
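One way to quantify "no significant difference" is to look at how far the slowest column is from the fastest within each aio_threads row. The sketch below does that with the numbers from the two baseline tables; the helper name `relative_spread` is hypothetical.

```python
# Check times in ms, copied from the baseline tables.
# Keys are aio_threads; values are the times for
# checking_mem_usage = 256, 512, 1024, 2048.
SSD_MS = {
    4: [54682, 54492, 55522, 54793],
    8: [29820, 29635, 30068, 30161],
    16: [15619, 15824, 15542, 15506],
    32: [8569, 8501, 8513, 8547],
    64: [6244, 6273, 6323, 6230],
}
HDD_MS = {
    4: [60493, 61137, 61384, 61166],
    8: [60546, 61008, 60757, 59607],
    16: [144554, 144959, 145621, 146460],
    32: [152985, 154505, 153728, 152788],
    64: [163493, 164861, 165105, 165276],
}


def relative_spread(times):
    """Return (max - min) / min for one row of timings."""
    return (max(times) - min(times)) / min(times)


for name, table in (("SSD", SSD_MS), ("HDD", HDD_MS)):
    for aio_threads, times in table.items():
        print(f"{name} aio_threads={aio_threads}: {relative_spread(times):.2%}")
```

Every row stays under 3%, which is small next to the differences between aio_threads rows.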

Given that, the default is changed from 1024 to 256, which corresponds to 4 MiB of outstanding hash jobs.
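The 4 MiB figure follows if checking_mem_usage is counted in 16 KiB blocks, which is the assumption the short sketch below makes explicit. `BLOCK_SIZE` and `outstanding_bytes` are illustrative names, not libtorrent identifiers.

```python
# Assumption: checking_mem_usage is a count of 16 KiB blocks.
BLOCK_SIZE = 16 * 1024
MIB = 1024 * 1024


def outstanding_bytes(checking_mem_usage: int) -> int:
    """Bytes of hash jobs that may be outstanding for a given setting."""
    return checking_mem_usage * BLOCK_SIZE


for setting in (256, 512, 1024, 2048):
    print(setting, "->", outstanding_bytes(setting) // MIB, "MiB")
```

Under that assumption the four settings tested correspond to 4, 8, 16 and 32 MiB.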

Experiment 1: set MADV_SEQUENTIAL for file maps when checking

aio_threads   SSD time (ms)   HDD time (ms)
4                     53686           59612
8                     29597           60361
16                    15437          145670
32                     8566          152811
64                     6324          165507

madvise(MADV_SEQUENTIAL) made no difference in these tests.
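For readers unfamiliar with the hint, the sketch below shows the idea in Python: map a file, advise the kernel that it will be read sequentially, then hash it front to back. It is only an illustration under stated assumptions, not libtorrent's code. The function name and `chunk_size` parameter are hypothetical, SHA-1 stands in for whatever hashing the checker does, and the hint is skipped on platforms where Python does not expose it.

```python
import hashlib
import mmap


def sha1_of_file_sequential(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a non-empty file through a read-only mapping.

    MADV_SEQUENTIAL is applied when available (Python 3.8+ on
    platforms that define the constant).
    """
    h = hashlib.sha1()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                # Hint: pages will be touched in increasing order.
                mm.madvise(mmap.MADV_SEQUENTIAL)
            for offset in range(0, len(mm), chunk_size):
                h.update(mm[offset:offset + chunk_size])
    return h.hexdigest()
```

The hint is advisory only, so the kernel is free to ignore it.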

Experiment 2: one in three is a hashing thread

By default, 1 in 4 disk threads is dedicated to hashing. This experiment changes that to 1 in 3 disk threads.

aio_threads   SSD time (ms)   HDD time (ms)
3                     54082           59313
4                     53889           60679
6                     28985           60173
8                     29283           60237
12                    15264          146490
16                    12523          142329
18                    10760          147569
24                     8335          152885
32                     7172          155793
64                     5864          174260

Experiment 3: one in two hashing threads

Dedicate every other disk thread to computing hashes.

aio_threads   SSD time (ms)   HDD time (ms)
3                     53843           60315
4                     30139           59588
6                     20304          141779
8                     15566          144288
12                    10824          146721
16                     8473          152995
18                     7822          153336
24                     6682          158921
32                     6239          162185
64                     5917          251542

HDD performance appears to drop sharply with more than 2 hasher threads, whereas SSD performance, which is most likely CPU-bound for the most part, keeps improving as more threads are added.
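The observation can be checked against the tables by converting each aio_threads value into a hasher-thread count. The sketch below assumes the count is aio_threads divided by the ratio, rounded down. That rounding is an assumption, although it agrees with the SSD timings. The helper names are hypothetical, and the baseline row uses the checking_mem_usage = 256 column.

```python
def hasher_threads(aio_threads: int, divisor: int) -> int:
    """Assumed hasher count when 1 in `divisor` disk threads hashes."""
    return aio_threads // divisor


# HDD check times in ms from the tables above: {divisor: {aio_threads: ms}}.
HDD_MS = {
    4: {4: 60493, 8: 60546, 16: 144554, 32: 152985, 64: 163493},
    3: {3: 59313, 4: 60679, 6: 60173, 8: 60237, 12: 146490, 16: 142329,
        18: 147569, 24: 152885, 32: 155793, 64: 174260},
    2: {3: 60315, 4: 59588, 6: 141779, 8: 144288, 12: 146721, 16: 152995,
        18: 153336, 24: 158921, 32: 162185, 64: 251542},
}


def split_by_hasher_count(table, limit=2):
    """Return (times with <= limit hashers, times with > limit hashers)."""
    low, high = [], []
    for divisor, rows in table.items():
        for aio_threads, ms in rows.items():
            bucket = low if hasher_threads(aio_threads, divisor) <= limit else high
            bucket.append(ms)
    return low, high


low, high = split_by_hasher_count(HDD_MS)
print("<= 2 hashers: slowest run", max(low), "ms")
print(">  2 hashers: fastest run", min(high), "ms")
```

Under that assumption the two groups do not overlap: the slowest HDD run with at most 2 hasher threads is still more than twice as fast as the fastest run with 3 or more.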

More investigation is needed to understand what happens at 3 hasher threads on a hard drive.