mlock203 test in ubuntu_ltp_syscalls failed with Xenial kernel

Bug #1793451 reported by Po-Hsu Lin on 2018-09-20
Affects              Importance  Assigned to
ubuntu-kernel-tests  Undecided   Po-Hsu Lin
linux (Ubuntu)       Undecided   Unassigned
  Xenial             Undecided   Po-Hsu Lin

Bug Description

== Justification ==
When a vma has the flags VM_LOCKED|VM_LOCKONFAULT set (by invoking
mlock2(,MLOCK_ONFAULT)), it can be populated again by mlock(), which
sets the VM_LOCKED flag only.

There is a hole in mlock_fixup() which increases mm->locked_vm twice, even
though the two operations are on the same vma and both set VM_LOCKED.

The issue can be reproduced by following code:

  mlock2(p, 1024 * 64, MLOCK_ONFAULT); //VM_LOCKED|VM_LOCKONFAULT
  mlock(p, 1024 * 64); //VM_LOCKED

Then check the VmLck field in /proc/pid/status: it increases to 128k even though only 64k is locked.
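
The two calls above can be wrapped in a small self-contained check. This is a minimal sketch, not the LTP mlock203 source: the helper names read_vmlck_kb() and vmlck_growth_on_relock_kb() are hypothetical, mlock2() is invoked through syscall() in case the C library has no wrapper, and the fallback value for MLOCK_ONFAULT is assumed from the kernel uapi headers. The second lock may also fail without CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK, in which case the helper reports an error.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01 /* assumed value, taken from the kernel uapi headers */
#endif

/* Return the VmLck value of this process in kB, or -1 if it cannot be read. */
static long read_vmlck_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "VmLck: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/*
 * Lock the same range with mlock2(MLOCK_ONFAULT) and then with mlock(),
 * and return how much VmLck grew (in kB) because of the second call.
 * Returns -1 on any error (hypothetical helper, for illustration only).
 */
static long vmlck_growth_on_relock_kb(size_t len)
{
#ifdef SYS_mlock2
    long after_first, after_second, ret = -1;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    if (syscall(SYS_mlock2, p, len, MLOCK_ONFAULT) == 0) {
        after_first = read_vmlck_kb();
        if (mlock(p, len) == 0) {
            after_second = read_vmlck_kb();
            if (after_first >= 0 && after_second >= 0)
                ret = after_second - after_first;
        }
    }
    munmap(p, len); /* unmapping the range also drops the locks */
    return ret;
#else
    (void)len;
    return -1; /* build headers do not know mlock2 */
#endif
}
```

Calling vmlck_growth_on_relock_kb(1024 * 64) from a main() and printing the result should show whether the second lock was counted again: a non-zero growth matches the behaviour described in this report, while 0 is what a patched kernel is expected to return.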

When a vma is set with different vm_flags and the new vm_flags include
VM_LOCKED, it is not necessarily a "newly locked" vma.

There is a dedicated reproducer, the "mlock203" test in ubuntu_ltp_syscalls. The failure can be seen on all Ubuntu 4.4 kernels:

 <<<test_start>>>
 tag=mlock203 stime=1537369891
 cmdline="mlock203"
 contacts=""
 analysis=exit
 <<<test_output>>>
 tst_test.c:1063: INFO: Timeout per run is 0h 05m 00s
 mlock203.c:63: FAIL: Locking one memory again increased VmLck

 Summary:
 passed 0
 failed 1
 skipped 0
 warnings 0

== Fix ==
b155b4fd (mm: mlock: avoid increase mm->locked_vm on mlock() when already mlock2(,MLOCK_ONFAULT))

A test kernel for Xenial / Xenial-KVM can be found here:
http://people.canonical.com/~phlin/kernel/lp-1793451-mlock203/

== Regression Potential ==
Low. This patch prevents mm->locked_vm from being incremented a second time by adding an extra check to see whether the old vm_flags already include VM_LOCKED.
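
The effect of that extra check can be illustrated with a user-space model of the accounting step. This is a simplified sketch, not the kernel code from the commit: the function locked_vm_delta() and its with_fix parameter are hypothetical, the flag values are assumptions used only for this model, and it presumes the caller only runs the accounting when the flags actually change.

```c
#include <stdbool.h>

/* Flag bits for this model only; treat the exact values as assumptions. */
#define VM_LOCKED      0x00002000UL
#define VM_LOCKONFAULT 0x00080000UL

/*
 * Model of the accounting step: how many pages would be added to
 * locked_vm when a vma goes from old_flags to new_flags.
 * with_fix selects whether the extra "already locked" check is applied.
 */
static long locked_vm_delta(unsigned long old_flags, unsigned long new_flags,
                            long nr_pages, bool with_fix)
{
    bool lock = (new_flags & VM_LOCKED) != 0;

    if (!lock)
        return -nr_pages; /* unlocking: give the pages back */
    if (with_fix && (old_flags & VM_LOCKED))
        return 0;         /* already counted when the vma was first locked */
    return nr_pages;      /* newly locked vma */
}
```

In this model, going from VM_LOCKED|VM_LOCKONFAULT to VM_LOCKED adds nr_pages again without the check and adds nothing with it, while locking a previously unlocked vma is counted the same way in both cases.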

== Test Case ==
Run the mlock203 test in the ubuntu_ltp_syscalls test suite. It will pass with the patched kernel.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-1068-aws 4.4.0-1068.78
ProcVersionSignature: Ubuntu 4.4.0-1068.78-aws 4.4.144
Uname: Linux 4.4.0-1068-aws x86_64
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
Date: Thu Sep 20 06:44:13 2018
Ec2AMI: ami-0e32ec5bc225539f5
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2b
Ec2InstanceType: c3.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
SourcePackage: linux-aws
UpgradeStatus: No upgrade log present (probably fresh install)

Po-Hsu Lin (cypressyew) wrote :

I can reproduce this in 4.4.0-1067-aws, so this is not a regression.

Po-Hsu Lin (cypressyew) wrote :

Found on 4.4 Trusty as well.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1793451

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Po-Hsu Lin (cypressyew) on 2018-10-23
summary: - mlock203 test in ubuntu_ltp_syscalls failed with X-AWS
+ mlock203 test in ubuntu_ltp_syscalls failed with Xenial kernel
Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Changed in linux (Ubuntu):
status: Incomplete → In Progress
no longer affects: ubuntu-kernel-tests
Changed in ubuntu-kernel-tests:
assignee: nobody → Po-Hsu Lin (cypressyew)
status: New → In Progress
Changed in linux (Ubuntu):
assignee: nobody → Po-Hsu Lin (cypressyew)
Po-Hsu Lin (cypressyew) on 2018-10-23
description: updated
Po-Hsu Lin (cypressyew) on 2018-10-23
description: updated
Stefan Bader (smb) wrote :

I deleted the linux-aws task(s) because this will get fixed in the master kernel and then automatically end up in all derivatives. No need to track it for every derivative.

no longer affects: linux-aws (Ubuntu)
no longer affects: linux-aws (Ubuntu Xenial)
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Po-Hsu Lin (cypressyew) wrote :

Passed with Xenial SRU

tags: added: verification-done-xenial
removed: verification-needed-xenial
Po-Hsu Lin (cypressyew) on 2018-11-20
Changed in linux (Ubuntu Xenial):
assignee: nobody → Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
assignee: Po-Hsu Lin (cypressyew) → nobody
Changed in ubuntu-kernel-tests:
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-140.166

---------------
linux (4.4.0-140.166) xenial; urgency=medium

  * linux: 4.4.0-140.166 -proposed tracker (LP: #1802776)

  * Bypass of mount visibility through userns + mount propagation (LP: #1789161)
    - mount: Retest MNT_LOCKED in do_umount
    - mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mounts

  * kdump fail due to an IRQ storm (LP: #1797990)
    - SAUCE: x86/PCI: Export find_cap() to be used in early PCI code
    - SAUCE: x86/quirks: Add parameter to clear MSIs early on boot
    - SAUCE: x86/quirks: Scan all busses for early PCI quirks

  * crash in ENA driver on removing an interface (LP: #1802341)
    - SAUCE: net: ena: fix crash during ena_remove()

  * xenial guest on arm64 drops to busybox under openstack bionic-rocky
    (LP: #1797092)
    - [Config] CONFIG_PCI_ECAM=y
    - PCI: Provide common functions for ECAM mapping
    - PCI: generic, thunder: Use generic ECAM API
    - PCI, of: Move PCI I/O space management to PCI core code
    - PCI: Move ecam.h to linux/include/pci-ecam.h
    - PCI: Add parent device field to ECAM struct pci_config_window
    - PCI: Add pci_unmap_iospace() to unmap I/O resources
    - PCI/ACPI: Support I/O resources when parsing host bridge resources
    - [Config] CONFIG_ACPI_MCFG=y
    - PCI/ACPI: Add generic MCFG table handling
    - PCI: Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC
    - PCI: Factor DT-specific pci_bus_find_domain_nr() code out
    - ARM64: PCI: Add acpi_pci_bus_find_domain_nr()
    - ARM64: PCI: ACPI support for legacy IRQs parsing and consolidation with DT
      code
    - ARM64: PCI: Support ACPI-based PCI host controller

  * [GLK/CLX] Enhanced IBRS (LP: #1786139)
    - x86/speculation: Remove SPECTRE_V2_IBRS in enum spectre_v2_mitigation
    - x86/speculation: Support Enhanced IBRS on future CPUs

  * Update ENA driver to version 2.0.1K (LP: #1798182)
    - net: ena: remove ndo_poll_controller
    - net: ena: fix warning in rmmod caused by double iounmap
    - net: ena: fix rare bug when failed restart/resume is followed by driver
      removal
    - net: ena: fix NULL dereference due to untimely napi initialization
    - net: ena: fix auto casting to boolean
    - net: ena: minor performance improvement
    - net: ena: complete host info to match latest ENA spec
    - net: ena: introduce Low Latency Queues data structures according to ENA spec
    - net: ena: add functions for handling Low Latency Queues in ena_com
    - net: ena: add functions for handling Low Latency Queues in ena_netdev
    - net: ena: use CSUM_CHECKED device indication to report skb's checksum status
    - net: ena: explicit casting and initialization, and clearer error handling
    - net: ena: limit refill Rx threshold to 256 to avoid latency issues
    - net: ena: change rx copybreak default to reduce kernel memory pressure
    - net: ena: remove redundant parameter in ena_com_admin_init()
    - net: ena: update driver version to 2.0.1
    - net: ena: fix indentations in ena_defs for better readability
    - net: ena: Fix Kconfig dependency on X86
    - net: ena: enable Low Latency Queues
    - net: ena: fix compilat...


Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew) on 2018-12-05
Changed in linux (Ubuntu):
status: In Progress → Fix Released