It is worth noting that deduplication consumes a lot of memory. The deduplication tables (DDTs) live in RAM, and once they grow too large they spill over onto disk; every block written to a dataset with dedup enabled then needs extra reads and writes to consult the table, which increases I/O waits. A system with large pools and little memory will perform deduplication poorly.
So, if you run the following command against your pool (pool-ssd in this example):

sudo zdb -s pool-ssd

...you'll see something like the following:
  pool: pool-ssd
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:31:14 with 0 errors on Wed Sep  2 14:35:52 2020
config:

	NAME                                            STATE     READ WRITE CKSUM
	pool-ssd                                        ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    ata-INTEL_SSDSC2BW480H6_CVTR608104NW480EGN  ONLINE       0     0     0
	    ata-INTEL_SSDSC2BW480H6_CVTR552500S0480EGN  ONLINE       0     0     0

errors: No known data errors
dedup: DDT entries 4258028, size 448B on disk, 306B in core
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
1 3.63M 464G 226G 226G 3.63M 464G 226G 226G
2 376K 46.1G 24.3G 24.3G 822K 100G 53.2G 53.2G
4 51.3K 5.60G 2.51G 2.51G 240K 26.3G 11.8G 11.8G
8 7.49K 925M 544M 544M 77.9K 9.41G 5.47G 5.47G
16 3.32K 415M 261M 261M 68.8K 8.41G 5.35G 5.35G
32 657 78.7M 62.1M 62.1M 29.3K 3.51G 2.79G 2.79G
64 389 46.2M 42.7M 42.7M 34.0K 4.03G 3.78G 3.78G
128 248 30.1M 29.5M 29.5M 44.8K 5.44G 5.32G 5.32G
256 339 42.1M 40.8M 40.8M 123K 15.4G 14.9G 14.9G
512 374 46.8M 46.1M 46.1M 271K 33.9G 33.5G 33.5G
1K 254 31.8M 31.3M 31.3M 355K 44.4G 43.8G 43.8G
2K 5 640K 513K 513K 12.9K 1.61G 1.20G 1.20G
4K 4 512K 512K 512K 22.7K 2.84G 2.84G 2.84G
8K 8 1M 897K 897K 89.3K 11.2G 10.1G 10.1G
64K 2 256K 6K 6K 226K 28.3G 679M 679M
Total 4.06M 517G 254G 254G 5.99M 759G 421G 421G
---
Looking at the "dedup: DDT entries 4258028, size 448B on disk, 306B in core" line: there are 4,258,028 dedup entries, each using 306 bytes in memory, so that's ~1.2GB of RAM for the dedup table alone. The in-core DDT is also capped at a quarter of the ARC, so on a small-memory system such as a Raspberry Pi much of the dedup table cannot be held in memory and has to be read from disk, which increases I/O waits.
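As a rough sanity check, the arithmetic above can be reproduced with a short script. The entry count and per-entry in-core size are taken from the zdb output; the helper function is just for illustration and nothing here queries ZFS itself:

```python
# Back-of-the-envelope DDT memory estimate, using the figures
# reported by zdb for this pool (illustrative helper, not a ZFS API).
def ddt_core_bytes(entries: int, bytes_per_entry_in_core: int) -> int:
    """Approximate RAM needed to hold the whole dedup table in core."""
    return entries * bytes_per_entry_in_core

entries = 4_258_028   # "DDT entries 4258028" from the zdb output
in_core = 306         # "306B in core" from the zdb output

total = ddt_core_bytes(entries, in_core)
print(f"{total} bytes ~= {total / 2**30:.2f} GiB")  # ~1.21 GiB
```

Compare that figure against a quarter of your ARC size to judge how much of the table will actually fit in memory.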
On a comparatively slow, memory-constrained system such as a Raspberry Pi with large pools, it may be worth disabling dedup to see whether that resolves your issue.
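If you decide to try that, dedup is a per-dataset property that can be switched off with a one-line change (pool-ssd is the example pool name from above; substitute your own pool or dataset). Note that this only stops new writes from being deduplicated; blocks already recorded in the DDT stay there until they are rewritten or deleted:

```shell
# Check the current dedup setting, then turn it off for the pool.
# Existing deduplicated blocks remain in the DDT until rewritten.
sudo zfs get dedup pool-ssd
sudo zfs set dedup=off pool-ssd
```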