Running strace against zfs create I see the following ioctl() taking the time:
1500475028.118005 ioctl(3, _IOC(0, 0x5a, 0x12, 0x00), 0x7ffc7c2184f0) = -1 ENOMEM (Cannot allocate memory) <0.390093>
1500475028.508153 mmap(NULL, 290816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd487b000 <0.000017>
1500475028.508201 ioctl(3, _IOC(0, 0x5a, 0x12, 0x00), 0x7ffc7c2184f0) = 0 <0.382304>
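The ENOMEM result is not a real failure: the kernel returns it when the caller's destination buffer is too small to hold the packed stats nvlist, the intervening mmap allocates a larger buffer, and the retried ioctl then succeeds. A minimal sketch of that grow-and-retry pattern as libzfs-style userland code would do it (the fake_objset_stats_ioctl stand-in, get_stats, and the buffer sizes are illustrative assumptions, not actual libzfs symbols):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for the kernel side of ZFS_IOC_OBJSET_STATS: "packs"
 * need bytes into dst, or fails with ENOMEM when the supplied
 * buffer is too small.
 */
static int
fake_objset_stats_ioctl(char *dst, size_t dstlen, size_t need)
{
	if (dstlen < need) {
		errno = ENOMEM;
		return (-1);
	}
	memset(dst, 'x', need);
	return (0);
}

/*
 * Grow-on-ENOMEM retry loop, the same shape as the strace above:
 * double the destination buffer until the packed nvlist fits.
 */
static int
get_stats(size_t need, size_t *final_len)
{
	size_t len = 16 * 1024;
	char *buf = malloc(len);

	while (fake_objset_stats_ioctl(buf, len, need) != 0) {
		if (errno != ENOMEM) {
			free(buf);
			return (-1);
		}
		len *= 2;	/* enlarge and reissue the ioctl */
		buf = realloc(buf, len);
	}
	*final_len = len;
	free(buf);
	return (0);
}
```

With the 290816-byte payload seen in the mmap above, the loop retries until the buffer reaches 512 KiB; the retry itself is cheap, so the cost is in the kernel filling the nvlist, not in this loop.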
I believe this is an ioctl on /dev/zfs, namely ZFS_IOC_OBJSET_STATS, which gets stats on all the zfs file systems. This ioctl takes longer as the number of clones increases, and I believe it is the API call causing the bottleneck.
perf shows that over 99% of the zfs clone time is indeed spent performing this ioctl:
- 99.39% 0.00% zfs [kernel.kallsyms] [k] sys_ioctl
     sys_ioctl
     do_vfs_ioctl
   - zfsdev_ioctl
      - 99.33% zfs_ioc_objset_stats
         - 99.30% zfs_ioc_objset_stats_impl.part.20
            - 99.21% dmu_objset_stats
               - dsl_dataset_stats
                  - 99.18% get_clones_stat
                     - 60.46% fnvlist_add_nvlist
                          nvlist_add_nvlist
                          nvlist_add_common.part.51
                          nvlist_copy_embedded.isra.54
                          nvlist_copy_pairs.isra.52
                        - nvlist_add_common.part.51
                           - 30.35% nvlist_copy_embedded.isra.54
                                nvlist_copy_pairs.isra.52
                              + nvlist_add_common.part.51
                           - 29.62% nvlist_remove_all.part.49
                                strcmp
                     - 31.23% fnvlist_add_boolean
                        - nvlist_add_boolean
                           - nvlist_add_common.part.51
                              - 30.20% nvlist_remove_all.part.49
                                   strcmp
                                0.94% strcmp
                     - 6.37% dsl_dataset_hold_obj
                        - 6.28% dmu_bonus_hold
                           - 5.89% dnode_hold
                              - 5.83% dnode_hold_impl
                                 - 5.40% dbuf_read
                                    - dmu_zfetch
                                       - 5.19% dmu_zfetch_dofetch.isra.7
                                          - 4.76% dbuf_prefetch
                                             - 2.67% dbuf_find
                                                  mutex_lock
                                               0.91% mutex_unlock
                                          0.55% dnode_block_freed
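The hot frames explain why the time grows with clone count: get_clones_stat adds one nvlist entry per clone, and each add goes through nvlist_add_common, which first does a remove-duplicate pass (nvlist_remove_all) that strcmp()s the new name against every pair already in the list. Building the list one entry at a time is therefore quadratic in the number of clones. A toy model of that add path (toy_add and run_adds are illustrative stand-ins, not the real libnvpair implementation):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an nvlist: a singly linked list of names. */
struct pair {
	char name[32];
	struct pair *next;
};

static struct pair *head;
static long ncmp;	/* counts strcmp() calls, our cost metric */

/*
 * Mimics the profiled add path: before inserting, scan every
 * existing pair for a duplicate name (the nvlist_remove_all ->
 * strcmp work dominating the perf output above).
 */
static void
toy_add(const char *name)
{
	struct pair *p;

	for (p = head; p != NULL; p = p->next) {
		ncmp++;
		if (strcmp(p->name, name) == 0) {
			/* a real nvlist would unlink the duplicate here */
		}
	}
	p = malloc(sizeof (*p));
	snprintf(p->name, sizeof (p->name), "%s", name);
	p->next = head;
	head = p;
}

/* Add n distinct clone names, free the list, report total compares. */
static long
run_adds(int n)
{
	char name[32];
	int i;

	head = NULL;
	ncmp = 0;
	for (i = 0; i < n; i++) {
		snprintf(name, sizeof (name), "clone%d", i);
		toy_add(name);
	}
	while (head != NULL) {
		struct pair *p = head;
		head = head->next;
		free(p);
	}
	return (ncmp);
}
```

run_adds(1000) costs 499,500 compares and run_adds(2000) costs 1,999,000: doubling the clones roughly quadruples the strcmp work, which matches the ioctl getting progressively slower as clones accumulate.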