Transparent hugepages should default to enabled=madvise

Bug #1703742 reported by Nelson Elhage
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
High
Colin Ian King
Xenial
Fix Released
Undecided
Colin Ian King
Yakkety
Won't Fix
Undecided
Unassigned
Zesty
Won't Fix
Undecided
Colin Ian King
Artful
Fix Released
High
Colin Ian King
linux-gke (Ubuntu)
Fix Released
Undecided
Kamal Mostafa
Xenial
Fix Released
Undecided
Kamal Mostafa

Bug Description

SRU Justification, Zesty, Xenial

[Impact]

Ubuntu kernels should default transparent_hugepages to enabled=madvise, not enabled=always

(this corresponds to TRANSPARENT_HUGEPAGE_MADVISE=y in .config).

I've blogged about this at some length here: https://blog.nelhage.com/post/transparent-hugepages/ but here is a summary:

Transparent Hugepages are a feature that allows the kernel to attempt to automatically back any anonymous maps with "huge" 2MiB page tables, instead of the normal 4k entries. It can produce small net performance gains in certain benchmarks, but also has numerous downsides, in the form of apparent memory leaks and 30% slowdowns or worse for some applications. Many popular pieces of software now refuse to run with hugepages enabled because of known performance issues.

Examples of problem reports:
MongoDB: https://docs.mongodb.com/manual/tutorial/transparent-huge-pages/
Oracle: https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge
Splunk: https://docs.splunk.com/Documentation/Splunk/6.5.2/ReleaseNotes/SplunkandTHP
Go runtime: https://github.com/golang/go/issues/8832
jemalloc: https://blog.digitalocean.com/transparent-huge-pages-and-alternative-memory-allocators/
node.js: https://github.com/nodejs/node/issues/11077

Setting `enabled=madvise` enables applications that know they benefit from transparent huge pages to opt-in to this feature, while eliminating all the problematic behavior for other applications. Note also that transparent hugepage settings don't affect the use of explicit hugepages via hugetlbfs or mmap(…, MAP_HUGETLB, …)

[Fix]
Enable CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y

[Regression Potential]
May marginally impact performance, but the benefits of madvise'd huge transparent hugepage over-weigh this considerably.

This has been enabled in Artful since July (and in Bionic) and we've not seen any regression reports, so we are confident this has minimal impact.

[Testcase]
Kernel ADT tests and stress-ng memory tests should not show any regressions.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1703742

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Colin Ian King (colin-king) wrote :

Thank you for reporting this. I'm currently evaluating this with some benchmarking experiments and my current observation concurs with the observations. I believe enabled=madvise is probably the best way forward but I still have several more test scenarios to complete to ensure we don't hit any unexpected performance regresions.

Changed in linux (Ubuntu):
status: Incomplete → In Progress
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Colin Ian King (colin-king) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :
Seth Forshee (sforshee)
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Revision history for this message
Colin Ian King (colin-king) wrote :

Note: Current plan is to see how this works on Artful and SRU these once we are happy with the results in Artful.

Revision history for this message
Andy Whitcroft (apw) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, ie yakkety. The bug task representing the yakkety nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Yakkety):
status: New → Won't Fix
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (24.9 KiB)

This bug was fixed in the package linux - 4.11.0-13.19

---------------
linux (4.11.0-13.19) artful; urgency=low

  * CVE-2017-7533
    - dentry name snapshots

linux (4.11.0-12.18) artful; urgency=low

  * linux: 4.11.0-12.18 -proposed tracker (LP: #1707635)
    - no change rebuild to pick up the new binutils.

  * Adt tests of src:linux time out often on armhf lxc containers (LP: #1705495)
    - [Packaging] tests -- reduce rebuild test to one flavour
    - [Packaging] tests -- reduce rebuild test to one flavour -- use filter

  * [ARM64] config EDAC_GHES=y depends on EDAC_MM_EDAC=y (LP: #1706141)
    - [Config] set EDAC_MM_EDAC=y for ARM64

  * [Hyper-V] hv_netvsc: Exclude non-TCP port numbers from vRSS hashing
    (LP: #1690174)
    - hv_netvsc: Exclude non-TCP port numbers from vRSS hashing

  * ath10k doesn't report full RSSI information (LP: #1706531)
    - ath10k: add per chain RSSI reporting

  * ideapad_laptop don't support v310-14isk (LP: #1705378)
    - platform/x86: ideapad-laptop: Add several models to no_hw_rfkill

  * Ubuntu 16.04.3: Qemu fails on P9 (LP: #1686019)
    - KVM: PPC: Pass kvm* to kvmppc_find_table()
    - KVM: PPC: Use preregistered memory API to access TCE list
    - KVM: PPC: VFIO: Add in-kernel acceleration for VFIO
    - powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
    - powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal
    - powerpc/vfio_spapr_tce: Add reference counting to iommu_table
    - powerpc/mmu: Add real mode support for IOMMU preregistered memory
    - KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
    - KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers

  * hns: ethtool selftest crashes system (LP: #1705712)
    - net/hns:bugfix of ethtool -t phy self_test

  * ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
    (LP: #1673564)
    - KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2
      registers
    - KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
    - arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
    - KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
    - KVM: arm64: Make kvm_condition_valid32() accessible from EL2
    - KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
    - KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
    - KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
    - KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
    - KVM: arm64: vgic-v3: Add misc Group-0 handlers
    - KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
    - KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
    - arm64: Add MIDR values for Cavium cn83XX SoCs
    - arm64: Add wor...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Xenial):
assignee: nobody → Colin Ian King (colin-king)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Colin Ian King (colin-king)
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: New → In Progress
no longer affects: linux-gke (Ubuntu Yakkety)
no longer affects: linux-gke (Ubuntu Zesty)
no longer affects: linux-gke (Ubuntu Artful)
Changed in linux-gke (Ubuntu Xenial):
status: New → Fix Committed
assignee: nobody → Kamal Mostafa (kamalmostafa)
Changed in linux-gke (Ubuntu):
status: New → Fix Committed
assignee: nobody → Kamal Mostafa (kamalmostafa)
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (11.7 KiB)

This bug was fixed in the package linux-gke - 4.4.0-1033.33

---------------
linux-gke (4.4.0-1033.33) xenial; urgency=low

  * linux-gke: 4.4.0-1033.33 -proposed tracker (LP: #1722308)

  * Transparent hugepages should default to enabled=madvise (LP: #1703742)
    - [gke] UBUNTU: SAUCE: use CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y as default

  [ Ubuntu: 4.4.0-98.121 ]

  * linux: 4.4.0-98.121 -proposed tracker (LP: #1722299)
  * Controller lockup detected on ProLiant DL380 Gen9 with P440 Controller
    (LP: #1720359)
    - scsi: hpsa: limit transfer length to 1MB
  * [Dell Docking IE][0bda:8153] Realtek USB Ethernet leads to system hang
    (LP: #1720977)
    - r8152: fix the list rx_done may be used without initialization
  * Add installer support for Broadcom BCM573xx network drivers. (LP: #1720466)
    - d-i: Add bnxt_en to nic-modules.
  * snapcraft.yaml: add dpkg-dev to the build deps (LP: #1718886)
    - snapcraft.yaml: add dpkg-dev to the build deps
  * Support setting I2C_TIMEOUT via ioctl for i2c-designware (LP: #1718578)
    - i2c: designware: Use transfer timeout from ioctl I2C_TIMEOUT
  * 5U84 - ses driver isn't binding right - cannot blink lights on 1 of the 2
    5u84 (LP: #1693369)
    - scsi_transport_sas: add function to get SAS endpoint address
    - ses: fix discovery of SATA devices in SAS enclosures
    - scsi: sas: provide stub implementation for scsi_is_sas_rphy
    - scsi: ses: Fix SAS device detection in enclosure
  * multipath -ll is not showing the disks which are actually multipath
    (LP: #1718397)
    - fs: aio: fix the increment of aio-nr and counting against aio-max-nr
  * Support Dell Wireless DW5819/5818 WWAN devices (LP: #1721455)
    - SAUCE: USB: serial: qcserial: add Dell DW5818, DW5819
  * CVE-2017-10911
    - xen-blkback: don't leak stack data via response ring
  * implement 'complain mode' in seccomp for developer mode with snaps
    (LP: #1567597)
    - seccomp: Provide matching filter for introspection
    - seccomp: Sysctl to display available actions
    - seccomp: Operation for checking if an action is available
    - seccomp: Sysctl to configure actions that are allowed to be logged
    - seccomp: Selftest for detection of filter flag support
    - seccomp: Action to log before allowing
  * implement errno action logging in seccomp for strict mode with snaps
    (LP: #1721676)
    - seccomp: Provide matching filter for introspection
    - seccomp: Sysctl to display available actions
    - seccomp: Operation for checking if an action is available
    - seccomp: Sysctl to configure actions that are allowed to be logged
    - seccomp: Selftest for detection of filter flag support
    - seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
  * [Xenial] update OpenNSL kernel modules to 6.5.10 (LP: #1721511)
    - SAUCE: update OpenNSL kernel modules to 6.5.10
  * Xenial update to 4.4.90 stable release (LP: #1721550)
    - cifs: release auth_key.response for reconnect.
    - mac80211: flush hw_roc_start work before cancelling the ROC
    - KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
    - tracing: Fix trace_pipe behavior for instance traces
    - tracing: Erase irq...

Changed in linux-gke (Ubuntu Xenial):
status: Fix Committed → Fix Released
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Zesty):
status: In Progress → Won't Fix
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Stefan Bader (smb) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Colin Ian King (colin-king) wrote :

I've been exercising the -proposed kernel all day on amd64 using stress-ng memory tests without any observed regressions.

Benchmark comparisons between the current -proposed kernel and the previous xenial kernel show that:

1. stream test:

stream test (stress-ng): stress-ng --stream 1 --stream-l3-size 1M -t 60 --metrics-brief -v

a 1% to 7% improvement on bogo stream ops per second with stream memory sizes ranging from 1M through to 512M.

2. malloc test:

a 13-30% improvement on bogo malloc/free allocations (indirectly using mmap/munmap) for memory sizes ranging from 1M through to 1024M

I believe this vindicates the default enabling of THP using enabled=madvise

tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (56.9 KiB)

This bug was fixed in the package linux - 4.4.0-119.143

---------------
linux (4.4.0-119.143) xenial; urgency=medium

  * linux: 4.4.0-119.143 -proposed tracker (LP: #1760327)

  * Dell XPS 13 9360 bluetooth scan can not detect any device (LP: #1759821)
    - Revert "Bluetooth: btusb: fix QCA Rome suspend/resume"

linux (4.4.0-118.142) xenial; urgency=medium

  * linux: 4.4.0-118.142 -proposed tracker (LP: #1759607)

  * Kernel panic with AWS 4.4.0-1053 / 4.4.0-1015 (Trusty) (LP: #1758869)
    - x86/microcode/AMD: Do not load when running on a hypervisor

  * CVE-2018-8043
    - net: phy: mdio-bcm-unimac: fix potential NULL dereference in
      unimac_mdio_probe()

linux (4.4.0-117.141) xenial; urgency=medium

  * linux: 4.4.0-117.141 -proposed tracker (LP: #1755208)

  * Xenial update to 4.4.114 stable release (LP: #1754592)
    - x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
    - usbip: prevent vhci_hcd driver from leaking a socket pointer address
    - usbip: Fix implicit fallthrough warning
    - usbip: Fix potential format overflow in userspace tools
    - x86/microcode/intel: Fix BDW late-loading revision check
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - sched/deadline: Use the revised wakeup rule for suspending constrained dl
      tasks
    - can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
    - can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
    - PM / sleep: declare __tracedata symbols as char[] rather than char
    - time: Avoid undefined behaviour in ktime_add_safe()
    - timers: Plug locking race vs. timer migration
    - Prevent timer value 0 for MWAITX
    - drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
    - drivers: base: cacheinfo: fix boot error message when acpi is enabled
    - PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID
    - PCI: layerscape: Fix MSG TLP drop setting
    - mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version
    - fs/select: add vmalloc fallback for select(2)
    - hwpoison, memcg: forcibly uncharge LRU pages
    - cma: fix calculation of aligned offset
    - mm, page_alloc: fix potential false positive in __zone_watermark_ok
    - ipc: msg, make msgrcv work with LONG_MIN
    - x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
    - ACPI / processor: Avoid reserving IO regions too early
    - ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
    - ACPICA: Namespace: fix operand cache leak
    - netfilter: x_tables: speed up jump target validation
    - netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed
      in 64bit kernel
    - netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags
    - netfilter: nf_ct_expect: remove the redundant slash when policy name is
      empty
    - netfilter: nfnetlink_queue: reject verdict request from different portid
    - netfilter: restart search if moved to other chain
    - netfilter: nf_conntrack_sip: extend request line validation
    - netfilter: use fwmark_reflect in nf_send_reset
    - ext2: Don't clear SGID when inheriting ACLs
    - reiserfs: fix race in prealloc discard
    - re...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew)
Changed in linux-gke (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.