Activity log for bug #1849682

Date Who What changed Old value New value Message
2019-10-24 14:28:54 dann frazier bug added bug
2019-10-24 14:28:54 dann frazier attachment added upgrade log https://bugs.launchpad.net/bugs/1849682/+attachment/5299751/+files/4.15.0-67-generic
2019-10-24 14:29:47 dann frazier nominated for series Ubuntu Bionic
2019-10-24 14:29:47 dann frazier bug task added linux (Ubuntu Bionic)
2019-10-24 14:29:56 dann frazier nominated for series Ubuntu Focal
2019-10-24 14:29:56 dann frazier bug task added linux (Ubuntu Focal)
2019-10-24 14:29:56 dann frazier nominated for series Ubuntu Disco
2019-10-24 14:29:56 dann frazier bug task added linux (Ubuntu Disco)
2019-10-24 14:29:56 dann frazier nominated for series Ubuntu Eoan
2019-10-24 14:29:56 dann frazier bug task added linux (Ubuntu Eoan)
2019-10-24 14:30:03 dann frazier linux (Ubuntu Focal): status Confirmed New
2019-10-24 14:30:07 dann frazier linux (Ubuntu Focal): importance Critical Undecided
2019-10-24 14:30:08 dann frazier linux (Ubuntu Bionic): importance Undecided Critical
2019-10-24 14:30:11 dann frazier linux (Ubuntu Bionic): status New Confirmed
2019-10-24 14:30:13 dann frazier linux (Ubuntu Bionic): assignee dann frazier (dannf)
2019-10-24 14:30:15 dann frazier linux (Ubuntu Focal): assignee dann frazier (dannf)
2019-10-24 15:00:26 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2019-10-24 15:00:29 Ubuntu Kernel Bot linux (Ubuntu Disco): status New Incomplete
2019-10-24 15:00:30 Ubuntu Kernel Bot linux (Ubuntu Eoan): status New Incomplete
2019-10-24 21:06:29 dann frazier description [Impact] After installing the 4.15.0-67.76 kernel from bionic-proposed, our Nvidia DGX2 system is no longer bootable. [Test Case] [Fix] [Regression Risk]
Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 and >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version was used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
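The description above tells an affected user to pick a layout and boot with raid0.default_layout=1 or raid0.default_layout=2. As a quick illustration only (not part of the bug report), the sketch below checks whether the running kernel exposes the layout knob and whether a value was chosen; the /sys/module/raid0/parameters/default_layout path is the standard module-parameter location and is an assumption here, as is the use of /proc/cmdline.

```python
# Illustrative sketch only: report the raid0 layout setting on a running system.
# Paths are assumptions based on standard kernel interfaces, not on this log.
from pathlib import Path

param = Path("/sys/module/raid0/parameters/default_layout")
if param.exists():
    print("raid0 default_layout is currently:", param.read_text().strip())
else:
    print("raid0 module not loaded, or this kernel predates the layout-version change")

cmdline = Path("/proc/cmdline").read_text()
if "raid0.default_layout=" not in cmdline:
    print("no raid0.default_layout= parameter was passed on the kernel command line")
```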
2019-10-24 21:18:30 dann frazier description Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 and >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version was used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 and >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
2019-10-24 21:18:54 dann frazier description Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 and >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
2019-10-24 21:20:09 dann frazier description Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. References from users of other distros: https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/ [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
2019-10-24 21:27:01 dann frazier description Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. Upstream is dealing with this by adding a versioned layout in v5.4, and backporting that via stable. Version 1 is the pre-3.14 layout, Version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. These changes are now coming into our kernels via stable backports of the following commit, which describes the problem in the commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. References from users of other distros: https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/ [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel. References from users of other distros: https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/ [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
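The description above floats the idea of detecting susceptible arrays *before* the user reboots. A minimal sketch of that heuristic, assuming the standard /proc/mdstat and /sys/class/block interfaces, might look like the following; it only flags raid0 arrays whose members differ in size (i.e. multi-zone arrays) and is illustrative rather than shipped tooling.

```python
# Hypothetical detection sketch: flag RAID0 arrays whose members are not all
# the same size (multi-zone), since only those hit the layout ambiguity.
from pathlib import Path

def member_sectors(token: str) -> int:
    # tokens in /proc/mdstat look like "sda1[0]" or "sdb1[1](F)"
    name = token.split("[", 1)[0]
    return int(Path(f"/sys/class/block/{name}/size").read_text())

for line in Path("/proc/mdstat").read_text().splitlines():
    if " raid0 " not in line:
        continue
    md = line.split()[0]
    members = [tok for tok in line.split() if "[" in tok]
    sizes = {member_sectors(tok) for tok in members}
    if len(sizes) > 1:
        print(f"{md}: member sizes differ - multi-zone RAID0, may be susceptible")
```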
2019-10-27 06:41:36 Khaled El Mously linux (Ubuntu Bionic): status Confirmed Fix Committed
2019-10-29 13:37:22 Ubuntu Kernel Bot tags verification-needed-bionic
2019-10-29 19:20:51 dann frazier description Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel. References from users of other distros: https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/ [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
This bug tracks the temporary revert of the upstream fix for a corruption issue. Bug 1850540 tracks the re-application of that fix once we have a full solution. Users of RAID0 arrays are susceptible to a corruption issue if: - The members of the RAID array are not all the same size[*] - Data has been written to the array while running kernels < 3.14 *and* >= 3.14. This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message: https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9 To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post 3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention. The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt. A clueful user can spot something like the following in dmesg, though, as you can see from the log in Comment #1, it is buried in a ton of other messages: [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2 [ 72.733979] md: pers->run() failed ... mdadm: failed to start array /dev/md0: Unknown error 524 What that is trying to say is that you should determine if your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. And note it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable: https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571 IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things. e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user through choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot. Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel. References from users of other distros: https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/ https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/ [*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere.
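The description above also suggests having the mdadm initramfs hook "poke the answer in via sysfs before starting the array". A hypothetical sketch of that ordering is shown below; a real hook would be shell rather than Python, and both the sysfs path and the assumption that the parameter is writable on the affected kernels are unverified here.

```python
# Hypothetical ordering sketch for the "poke via sysfs, then assemble" idea.
# In a real mdadm initramfs hook this would be shell; the sysfs path and the
# writability of the parameter are assumptions, not confirmed by this log.
import subprocess
from pathlib import Path

def set_raid0_default_layout(version: int) -> None:
    # 1 = pre-3.14 layout, 2 = post-3.14 layout (per the description above)
    Path("/sys/module/raid0/parameters/default_layout").write_text(str(version))

def assemble_arrays() -> None:
    # standard mdadm invocation to start all configured arrays
    subprocess.run(["mdadm", "--assemble", "--scan"], check=False)

set_raid0_default_layout(2)  # example: user decided their data was written post-3.14
assemble_arrays()
```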
2019-10-30 18:14:38 Ubuntu Kernel Bot tags verification-needed-bionic verification-needed-bionic verification-needed-disco
2019-10-31 08:42:43 Andrew Cloke bug added subscriber Andrew Cloke
2019-11-06 01:49:21 Khaled El Mously tags verification-needed-bionic verification-needed-disco verification-done-bionic verification-needed-disco
2019-11-08 17:48:37 dann frazier tags verification-done-bionic verification-needed-disco verification-done-bionic verification-failed-disco
2019-11-08 21:58:09 Khaled El Mously linux (Ubuntu Disco): status Incomplete Fix Committed
2019-11-11 16:07:59 Guilherme G. Piccoli bug added subscriber Guilherme G. Piccoli
2019-11-12 22:18:04 Launchpad Janitor linux (Ubuntu Eoan): status Incomplete Fix Released
2019-11-12 22:18:04 Launchpad Janitor cve linked 2018-12207
2019-11-12 22:18:04 Launchpad Janitor cve linked 2019-0154
2019-11-12 22:18:04 Launchpad Janitor cve linked 2019-0155
2019-11-12 22:18:04 Launchpad Janitor cve linked 2019-11135
2019-11-12 22:18:04 Launchpad Janitor cve linked 2019-15793
2019-11-12 22:18:04 Launchpad Janitor cve linked 2019-17666
2019-11-12 22:21:40 Launchpad Janitor linux (Ubuntu Disco): status Fix Committed Fix Released
2019-11-12 22:21:40 Launchpad Janitor cve linked 2019-15098
2019-11-12 22:21:40 Launchpad Janitor cve linked 2019-17052
2019-11-12 22:21:40 Launchpad Janitor cve linked 2019-17053
2019-11-12 22:21:40 Launchpad Janitor cve linked 2019-17054
2019-11-12 22:21:40 Launchpad Janitor cve linked 2019-17055
2019-11-12 22:21:40 Launchpad Janitor cve linked 2019-17056
2019-11-12 22:24:59 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2019-11-13 18:57:47 dann frazier bug task added ubuntu-release-notes
2019-11-13 18:58:49 dann frazier bug watch added https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944676
2019-11-13 18:58:49 dann frazier bug task added mdadm (Debian)
2019-11-13 18:59:10 dann frazier bug task deleted mdadm (Debian)
2019-11-13 20:25:10 dann frazier bug added subscriber Sean Feole
2019-12-06 15:57:44 Launchpad Janitor linux (Ubuntu Focal): status Incomplete Fix Released
2019-12-06 15:57:44 Launchpad Janitor cve linked 2019-15794
2019-12-06 22:32:04 dann frazier bug task deleted ubuntu-release-notes