Activity log for bug #1946149

Date Who What changed Old value New value Message
2021-10-05 19:44:25 Ian May bug added bug
2021-10-05 19:45:26 Ian May affects ubuntu linux-aws (Ubuntu)
2021-10-05 20:42:16 Ian May description When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a test kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot.
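
Note on the kernel switch described in this entry: the bug log does not record the exact commands, but on a bionic AWS instance the change from linux-aws-5.4 to the 4.15-based linux-aws kernel would look roughly like the sketch below. The GRUB menu-entry string is an assumption and may differ per image.

    $ sudo apt-get update
    $ sudo apt-get install -y linux-aws     # 4.15-based metapackage on bionic
    $ uname -r                              # still 5.4.0-1056-aws until reboot
    # Make GRUB boot the 4.15 kernel by default (menu-entry string is an assumption):
    $ sudo sed -i 's|^GRUB_DEFAULT=.*|GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-1113-aws"|' /etc/default/grub
    $ sudo update-grub
    $ sudo reboot
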
2021-10-05 23:08:24 Matthew Ruffell bug added subscriber Mauricio Faria de Oliveira
2021-10-06 03:40:36 Ian May description When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a test kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot. When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a test kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot. I have been unable to capture a stack trace using 'aws get-console-output'. I enabled kdump and was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme.
2021-10-06 03:42:46 Ian May description When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a test kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot. I have been unable to capture a stack trace using 'aws get-console-output'. I enabled kdump and was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme. When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot. I have been unable to capture a stack trace using 'aws get-console-output'. I enabled kdump and was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme.
2021-10-06 03:43:57 Ian May description When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot. I have been unable to capture a stack trace using 'aws get-console-output'. I enabled kdump and was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme. When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme.
2021-10-06 04:56:27 Ian May description When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html But after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme. When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme.
2021-10-06 13:56:18 Ian May description When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme. When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' from mount options and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme.
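
Note on the debugging steps mentioned in this entry: removing the 'discard' mount option and capturing console output with 'aws get-console-output' could be done roughly as sketched below; the instance id, region and fstab layout are placeholders, not values taken from the bug.

    # On the instance: check for and drop the ext4 'discard' mount option.
    $ findmnt -o TARGET,SOURCE,OPTIONS /
    $ sudo sed -i 's/,discard//' /etc/fstab
    $ sudo mount -o remount /
    # From a machine with AWS credentials: fetch whatever the instance printed to its console.
    $ aws ec2 get-console-output --instance-id i-0123456789abcdef0 --region us-east-1 --output text > console.log
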
2021-10-07 17:59:35 Marcelo Cerri bug added subscriber Marcelo Cerri
2021-10-13 18:20:55 Mark Thomas bug added subscriber Mark Thomas
2021-10-13 21:27:32 Mauricio Faria de Oliveira attachment added serial-console-output.txt https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+attachment/5532619/+files/serial-console-output.txt
2021-10-13 23:44:28 Pedro Principeza bug added subscriber Pedro Principeza
2021-10-15 10:51:03 Kleber Sacilotto de Souza description When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' from mount options and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme. [ Impact ] The bionic 4.15 kernels are failing to boot on r5.metal instances on AWS . The default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) or bionic/linux (4.15.0-160.168) the machine fails to boot the 4.15 kernel. This problem only appears on metal instances, which uses NVME instead of XVDA devices. [ Fix ] It was discovered that after reverting the following two commits from upstream stable the 4.15 kernels can be booted again on the affected AWS metal instance: PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Enforce MSI[X] entry updates to be visible [ Test Case ] Deploy a r5.metal instance on AWS with a bionic image, which should boot initially with bionic/linux-aws-5.4. Install bionic/linux or bionic/linux-aws (4.15 based) and reboot the system. [ Where problems could occur ] These two commits are part of a larger patchset fixing PCI/MSI issues which were backported to some upstream stable releases. By reverting only part of the set we might end up with MSI issues that were not present before the whole set was applied. Regression potential can be minimized by testing the kernels with these two reverted patches on all the platforms available. [ Original Description ] When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' from mount options and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme.
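
Note on the [ Fix ] section above: a sketch of how the two named commits could be located and reverted in an Ubuntu bionic kernel git tree. The tag name follows the usual Ubuntu-<version> convention and the SHAs are placeholders; only the commit subjects come from the bug.

    $ git log --oneline --grep='PCI/MSI: Enforce' Ubuntu-4.15.0-160.168
    # Assume the two matches are 1111111 and 2222222 (placeholders); revert newest-first:
    $ git revert --no-edit 2222222 1111111
    $ git log --oneline -2                  # the two reverts now sit on top of the branch
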
2021-10-15 10:53:13 Kleber Sacilotto de Souza description [ Impact ] The bionic 4.15 kernels are failing to boot on r5.metal instances on AWS . The default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) or bionic/linux (4.15.0-160.168) the machine fails to boot the 4.15 kernel. This problem only appears on metal instances, which uses NVME instead of XVDA devices. [ Fix ] It was discovered that after reverting the following two commits from upstream stable the 4.15 kernels can be booted again on the affected AWS metal instance: PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Enforce MSI[X] entry updates to be visible [ Test Case ] Deploy a r5.metal instance on AWS with a bionic image, which should boot initially with bionic/linux-aws-5.4. Install bionic/linux or bionic/linux-aws (4.15 based) and reboot the system. [ Where problems could occur ] These two commits are part of a larger patchset fixing PCI/MSI issues which were backported to some upstream stable releases. By reverting only part of the set we might end up with MSI issues that were not present before the whole set was applied. Regression potential can be minimized by testing the kernels with these two reverted patches on all the platforms available. [ Original Description ] When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' from mount options and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme. [ Impact ] The bionic 4.15 kernels are failing to boot on r5.metal instances on AWS. The default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) or bionic/linux (4.15.0-160.168) the machine fails to boot the 4.15 kernel. This problem only appears on metal instances, which uses NVME instead of XVDA devices. [ Fix ] It was discovered that after reverting the following two commits from upstream stable the 4.15 kernels can be booted again on the affected AWS metal instance: PCI/MSI: Enforce that MSI-X table entry is masked for update PCI/MSI: Enforce MSI[X] entry updates to be visible [ Test Case ] Deploy a r5.metal instance on AWS with a bionic image, which should boot initially with bionic/linux-aws-5.4. Install bionic/linux or bionic/linux-aws (4.15 based) and reboot the system. [ Where problems could occur ] These two commits are part of a larger patchset fixing PCI/MSI issues which were backported to some upstream stable releases. By reverting only part of the set we might end up with MSI issues that were not present before the whole set was applied. Regression potential can be minimized by testing the kernels with these two reverted patches on all the platforms available. [ Original Description ] When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel. If I remove these patches the instance correctly boots the 4.15 kernel https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html With that being said, after successfully updating to the 4.15 without those patches applied, I can then upgrade to a 4.15 kernel with the above patches included, and the instance will boot properly. This problem only appears on metal instances, which uses NVME instead of XVDA devices. AWS instances also use the 'discard' mount option with ext4, thought maybe there could be a race condition between ext4 discard and journal flush. Removed 'discard' from mount options and rebooted 5.4 kernel prior to 4.15 kernel installation, but still wouldn't boot after installing the 4.15 kernel. I have been unable to capture a stack trace using 'aws get-console-output'. After enabling kdump I was unable to replicate the failure. So there must be some sort of race with either ext4 and/or nvme.
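
Note on the [ Test Case ] section above: one concrete way to run it, assuming a bionic AMI; the AMI id, key pair and region below are placeholders.

    $ aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type r5.metal --key-name my-key --count 1 --region us-east-1
    # On the instance, which first boots the default linux-aws-5.4 kernel:
    $ uname -r                              # e.g. 5.4.0-1056-aws
    $ sudo apt-get update && sudo apt-get install -y linux-aws    # 4.15-based kernel
    $ sudo reboot
    # A fixed kernel comes back up on 4.15; an affected one never finishes booting.
    $ uname -r                              # e.g. 4.15.0-1113-aws
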
2021-10-15 13:02:48 Stefan Bader nominated for series Ubuntu Bionic
2021-10-15 13:02:48 Stefan Bader bug task added linux-aws (Ubuntu Bionic)
2021-10-15 13:03:02 Stefan Bader linux-aws (Ubuntu Bionic): importance Undecided High
2021-10-15 13:03:02 Stefan Bader linux-aws (Ubuntu Bionic): status New In Progress
2021-10-15 13:03:33 Stefan Bader affects linux-aws (Ubuntu) linux (Ubuntu)
2021-10-15 13:03:45 Stefan Bader linux (Ubuntu): status New Invalid
2021-10-15 14:11:04 Stefan Bader linux (Ubuntu Bionic): status In Progress Fix Committed
2021-10-19 16:19:27 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2021-10-19 16:19:27 Launchpad Janitor cve linked 2021-40490
2021-10-27 00:25:19 Ubuntu Kernel Bot tags verification-needed-bionic
2021-11-02 08:23:38 Krzysztof Kozlowski tags verification-needed-bionic verification-done-bionic