Comment 9 for bug 1798127

Jeff Lane  (bladernr) wrote : Re: [Bug 1798127] Re: CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as root FS

Ahhh, ok. Thanks. That makes sense, then.

On Fri, Nov 16, 2018 at 12:01 AM Joseph Salisbury
<email address hidden> wrote:
>
> The Xenial bug task is for the base Xenial kernel version 4.4. Any
> commits/fixes applied to Bionic flow down into Xenial HWE because Bionic
> is 4.15 based, which is the source for Xenial HWE.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1798127
>
> Title:
> CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as
> root FS
>
> Status in linux package in Ubuntu:
> Triaged
> Status in linux source package in Xenial:
> Invalid
> Status in linux source package in Bionic:
> Triaged
> Status in linux source package in Cosmic:
> Triaged
>
> Bug description:
> This was reported by a hardware partner. The system setup is a
> server with 512GB RAM and an M.2 NVMe drive as the root
> filesystem/boot device.
>
> Per the customer, when running the certification Memory Stress test
> (utilizing several stress-ng stressors run in sequence) the system
> freezes with CPU Soft Lockup errors appearing on console when the
> "stack" stressor is run.
>
> Tester has tried with 2.5” SATA (1TB), 2.5” NVMe (800GB), and M.2 NVMe
> (1.9TB).
>
> So far, this only seems to affect the 4.15 kernel. The tester has
> tried using the 2.5" SATA SSD as the RootFS/Boot device and the tests
> pass on all attempts. It is ONLY when using the M.2 NVMe as the root
> / boot device that the tests cause a lockup. The tester is re-trying
> now with the 2.5" NVMe device to see if this only occurs with the M.2
> NVMe.
>
> The tester has tried this on the following while using the M.2 NVMe as the rootFS/Boot device:
> Test run #1 – 16.04.5 at kernel 4.15; Result: Failed stress-ng memory on stack stressor
> Test run #2 – 18.04.1 at kernel 4.15; Result: Failed stress-ng memory on stack stressor
> Test run #3 – 16.04.5 at kernel 4.4; Result: Passed stress-ng memory test
>
> The stress-ng command invoked at the time the soft lockups occur is
> this:
>
> 'stress-ng -k --aggressive --verify --timeout 300 --stack 0'
>
> This can be reproduced by running the memory_stress_ng test script
> from the cert suite:
>
> sudo /usr/lib/plainbox-provider-certification-server/bin/memory_stress_ng
>
> It may be more easily reproducible running the stack stressor alone,
> or the whole memory stress script without dealing with Checkbox.
>
> UPDATE: The tester confirms that the 2.5" NVMe drives also fail
> with the 4.15 kernel and pass with the 4.4 kernel. The SATA SSD works
> on all kernels.
>
> Launchpad-Notification-Type: bug
> Launchpad-Bug: distribution=ubuntu; sourcepackage=linux; component=main; status=Triaged; importance=High; <email address hidden>;
> Launchpad-Bug: distribution=ubuntu; distroseries=xenial; sourcepackage=linux; component=main; status=Invalid; importance=Undecided; assignee=None;
> Launchpad-Bug: distribution=ubuntu; distroseries=bionic; sourcepackage=linux; component=main; status=Triaged; importance=High; <email address hidden>;
> Launchpad-Bug: distribution=ubuntu; distroseries=cosmic; sourcepackage=linux; component=main; status=Triaged; importance=High; <email address hidden>;
> Launchpad-Bug-Tags: blocks-hwcert-server kernel-da-key kernel-fixed-upstream
> Launchpad-Bug-Information-Type: Public
> Launchpad-Bug-Private: no
> Launchpad-Bug-Security-Vulnerability: no
> Launchpad-Bug-Commenters: acduroy bladernr jsalisbury ubuntu-kernel-bot
> Launchpad-Bug-Reporter: Jeff Lane (bladernr)
> Launchpad-Bug-Modifier: Joseph Salisbury (jsalisbury)
> Launchpad-Message-Rationale: Subscriber
> Launchpad-Message-For: bladernr

--
Jeff Lane
Technical Partnership and Server Certification Programmes

"Entropy isn't what it used to be."