init_on_alloc=1 causes big performance regression for ZFS

Bug #1862822 reported by Matthew Ahrens
This bug affects 8 people
Affects: linux (Ubuntu); Status: Confirmed; Importance: Undecided; Assigned to: Unassigned

Bug Description

Apologies if this is not the right forum to raise this concern. If not, please point me to a better place to have this discussion.

The 5.3 Linux kernel added a new feature that allows pages to be zeroed when they are allocated or freed: init_on_alloc and init_on_free. The init_on_alloc flag is enabled by default on the Ubuntu 18.04 HWE kernel.

Linux kernel commit: https://github.com/torvalds/linux/commit/6471384af2a6530696fc0203bafe4de41a23c9ef

ZFS allocates and frees pages frequently (via the ABD structure), e.g. for every disk access. The additional overhead of zeroing these pages is significant. For example, I measured a ~40% regression in performance of an uncached "zfs send ... >/dev/null". While this is a specific workload, this is a general problem because whenever we read from disk we are going to allocate pages, unnecessarily zero them, and then immediately fill them with the data from disk.
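The allocate, zero, then overwrite pattern described above can be illustrated with a small user-space analogy. This is only a sketch: `read_block` is a hypothetical helper, and Python's `bytearray(size)` stands in for a page handed out with init_on_alloc=1. It is not ZFS or kernel code.

```python
import io


def read_block(source, size):
    """Allocate a zero-filled buffer, then overwrite it with data from source."""
    # bytearray(size) is zero-filled on creation, analogous to a page
    # returned by the allocator when init_on_alloc=1.
    buf = bytearray(size)
    started_zeroed = buf.count(0) == size
    # The read overwrites the buffer, so the zeroing work above is
    # discarded for every byte that the read fills.
    n = source.readinto(buf)
    return bytes(buf[:n]), started_zeroed


# Sample data with no zero bytes, so any surviving zero would be visible.
data = bytes(range(1, 256)) * 16
block, was_zeroed = read_block(io.BytesIO(data), len(data))
```

When the read fills the whole buffer, none of the initial zeros survive, which is why the zeroing is pure overhead for this access pattern.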

I don't see any mechanism for ZFS to request that pages *not* be zeroed when they are allocated.

This performance regression has been noted by users of ZFS on Ubuntu: https://github.com/zfsonlinux/zfs/issues/9910

I'd like to request that the default in Ubuntu be changed to init_on_alloc=0, so that users of ZFS on Ubuntu do not experience this performance regression.

(FYI, my employer, Delphix, has worked around this by changing the default in our product, but my concern is that the vast majority of Ubuntu users will simply experience this as "Ubuntu HWE is much slower than before".)

$ cat /proc/version_signature
Ubuntu 5.3.0-28.30~18.04.1-generic 5.3.13
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Feb 11 18:34 seq
 crw-rw---- 1 root audio 116, 33 Feb 11 18:34 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb': 'lsusb'
MachineType: VMware, Inc. VMware Virtual Platform
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/ROOT/username.90dy59N/root@/boot/vmlinuz-5.3.0-28-generic root=ZFS=rpool/ROOT/username.90dy59N/root ro console=tty0 console=ttyS0,38400n8 mitigations=off ipv6.disable=1 elevator=noop crashkernel=256M,high crashkernel=256M,low init_on_alloc=0
ProcVersionSignature: Ubuntu 5.3.0-28.30~18.04.1-generic 5.3.13
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-28-generic N/A
 linux-backports-modules-5.3.0-28-generic N/A
 linux-firmware 1.173.14
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic uec-images
Uname: Linux 5.3.0-28-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: linux-staff root
WifiSyslog:

_MarkForUpload: False
dmi.bios.date: 12/12/2018
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd12/12/2018:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1862822

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Matthew Ahrens (matthew.ahrens) wrote :

I installed apport and python3-launchpadlib, and ran the requested command:

$ apport-collect 1862822
The authorization page:
 (https://launchpad.net/+authorize-token?oauth_token=KJm30XZW6qPv9HDzn45k&allow_permission=DESKTOP_INTEGRATION)
should be opening in your browser. Use your browser to authorize
this program to access Launchpad on your behalf.
Waiting to hear from Launchpad about your decision...

However, I don't have a web browser installed. Let me know if there's another way to gather the requested information.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Andy Whitcroft (apw) wrote :

@matthew.ahrens you can paste that URL into any browser on any machine and authorise it.

Matthew Ahrens (matthew.ahrens) wrote : CRDA.txt

apport information

tags: added: apport-collected bionic uec-images
description: updated

Matthew Ahrens (matthew.ahrens) wrote : CurrentDmesg.txt

apport information

Matthew Ahrens (matthew.ahrens) wrote : Lspci.txt

apport information

Matthew Ahrens (matthew.ahrens) wrote : ProcCpuinfoMinimal.txt

apport information

Matthew Ahrens (matthew.ahrens) wrote : ProcInterrupts.txt

apport information

Matthew Ahrens (matthew.ahrens) wrote : ProcModules.txt

apport information

Matthew Ahrens (matthew.ahrens) wrote : UdevDb.txt

apport information

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Mr. Ahrens.

For now, I am not yet convinced about removing the init_on_alloc default option. I would agree if anyone argues that a page recycling mechanism defeats the purpose of the option anyway. But it would, at least, be restricted to whatever uses such mechanisms. (I am familiar with some network drivers that do.)

I hope it's at least useful for you to know what we are leaning towards here.

Regards.
Cascardo.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

init_on_alloc=1 as a default serves Ubuntu users as a security hardening mechanism, preventing a class of security issues from being exploitable. Disabling it in general would leave those unknown vulnerabilities open to exploitation.

It can be turned off during boot by using init_on_alloc=0 kernel parameter. So any user who is affected by this performance issue can mitigate that by disabling the hardening option, after considering its possible consequences.
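As a rough sketch of how the boot parameter interacts with the build-time default, the function below decides whether init_on_alloc ends up enabled for a given kernel command line. The name `init_on_alloc_enabled` is hypothetical, `config_default_on` stands in for the kernel's CONFIG_INIT_ON_ALLOC_DEFAULT_ON setting, and two simplifications are assumed: only the values 0 and 1 are recognised, and the last occurrence on the command line wins.

```python
def init_on_alloc_enabled(cmdline, config_default_on):
    """Return True if init_on_alloc would be on for this command line.

    Simplified model (assumption): only "0" and "1" are recognised and
    the last occurrence of the parameter takes effect.
    """
    enabled = config_default_on
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "init_on_alloc" and sep and value in ("0", "1"):
            enabled = value == "1"
    return enabled


# Shortened form of the command line from the apport data in this report.
reported = "ro console=tty0 mitigations=off elevator=noop init_on_alloc=0"
hardening_on = init_on_alloc_enabled(reported, config_default_on=True)
```

On a running system the same check could be applied to the contents of /proc/cmdline.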

As we would rather not expose those users who depend on or desire that ZFS performance, I would like to ask whether it would be possible to mitigate this in ZFS by reusing pages or perhaps pre-allocating them.
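The idea of reusing or pre-allocating buffers can be sketched in user space as a simple pool. This is an illustrative analogy only: `BufferPool` is a hypothetical class, not how ZFS or its ABD code manages pages.

```python
class BufferPool:
    """Pre-allocate fixed-size buffers once and hand them out repeatedly."""

    def __init__(self, buf_size, count):
        self.buf_size = buf_size
        # The allocation (and zeroing) cost is paid once, up front.
        self._free = [bytearray(buf_size) for _ in range(count)]

    def acquire(self):
        # Reuse a pooled buffer when one is available; fall back to a
        # fresh (zero-filled) allocation only when the pool is empty.
        if self._free:
            return self._free.pop()
        return bytearray(self.buf_size)

    def release(self, buf):
        # The buffer goes back to the pool without being cleared.
        self._free.append(buf)


pool = BufferPool(buf_size=4096, count=1)
first = pool.acquire()
first[:5] = b"hello"
pool.release(first)
second = pool.acquire()
```

A recycled buffer still holds whatever its previous user wrote, which is the trade-off raised earlier in the thread: recycling avoids the zeroing cost but also bypasses the protection that init_on_alloc provides for those buffers.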

Regards.
Cascardo.

George Melikov (gmelikov) wrote :

> It can be turned off during boot by using init_on_alloc=0 kernel parameter. So any user who is affected by this performance issue can mitigate that by disabling the hardening option, after considering its possible consequences.

And one can find the root cause only by testing and comparing against another kernel that has init_on_alloc=0 by default, then iterating over all the config options and kernel changes.

This looks like another item to add to "someone's performance guide", because it can't easily be mitigated from the module's side via kernel interface changes (AFAIK).
