2019-10-29 19:17:10 |
dann frazier |
bug |
|
|
added bug |
2019-10-29 19:17:24 |
dann frazier |
nominated for series |
|
Ubuntu Trusty |
|
2019-10-29 19:17:24 |
dann frazier |
bug task added |
|
linux (Ubuntu Trusty) |
|
2019-10-29 19:17:24 |
dann frazier |
nominated for series |
|
Ubuntu Xenial |
|
2019-10-29 19:17:24 |
dann frazier |
bug task added |
|
linux (Ubuntu Xenial) |
|
2019-10-29 19:17:24 |
dann frazier |
nominated for series |
|
Ubuntu Eoan |
|
2019-10-29 19:17:24 |
dann frazier |
bug task added |
|
linux (Ubuntu Eoan) |
|
2019-10-29 19:17:24 |
dann frazier |
nominated for series |
|
Ubuntu Disco |
|
2019-10-29 19:17:24 |
dann frazier |
bug task added |
|
linux (Ubuntu Disco) |
|
2019-10-29 19:17:24 |
dann frazier |
nominated for series |
|
Ubuntu Bionic |
|
2019-10-29 19:17:24 |
dann frazier |
bug task added |
|
linux (Ubuntu Bionic) |
|
2019-10-29 19:17:24 |
dann frazier |
nominated for series |
|
Ubuntu Focal |
|
2019-10-29 19:17:24 |
dann frazier |
bug task added |
|
linux (Ubuntu Focal) |
|
2019-10-29 19:17:24 |
dann frazier |
nominated for series |
|
Ubuntu Precise |
|
2019-10-29 19:17:24 |
dann frazier |
bug task added |
|
linux (Ubuntu Precise) |
|
2019-10-29 19:18:33 |
dann frazier |
description |
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*]
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
That change has been applied to stable, but we reverted it to fix 1849682 until we have a full solution ready.
To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.
The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt.
A clueful user can spot the relevant message in dmesg, though, as you can see from the log in Comment #1, it is hidden in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine whether your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. Note that it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.
Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*]
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
That change has been applied to stable, but we reverted it to fix 1849682 until we have a full solution ready.
To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.
The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt.
A clueful user can spot the relevant message in dmesg, though, as you can see from the log in Comment #1, it is hidden in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine whether your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. Note that it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
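A minimal sketch of that workaround, assuming a GRUB-based install with the stock Ubuntu /etc/default/grub format (one-off testing can also be done by editing the linux line from the GRUB menu at boot):
# Append the override (use =1 for pre-3.14 data, =2 for post-3.14), then
# regenerate the GRUB config and reboot:
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&raid0.default_layout=1 /' /etc/default/grub
sudo update-grub
sudo reboot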
IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.
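Something like the following could be a starting point for that detection idea - purely a hypothetical sketch, no such helper exists today:
# Flag RAID0 arrays whose members differ in size - only those are
# multi-zone and therefore at risk.
for md in /dev/md?*; do
  detail=$(sudo mdadm --detail "$md" 2>/dev/null) || continue
  echo "$detail" | grep -q 'Raid Level : raid0' || continue
  members=$(echo "$detail" | grep -o '/dev/[a-z0-9]*$')
  if [ "$(lsblk -bdno SIZE $members | sort -u | wc -l)" -gt 1 ]; then
    echo "$md: member sizes differ - a layout must be chosen before reboot"
  fi
done
# Once a layout is chosen (1 = pre-3.14, 2 = post-3.14), the initramfs hook
# could poke it into the raid0 module parameter before assembling the array:
echo 1 | sudo tee /sys/module/raid0/parameters/default_layout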
Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. |
|
2019-10-29 19:18:44 |
dann frazier |
bug |
|
|
added subscriber Stefan Bader |
2019-10-29 19:18:53 |
dann frazier |
bug |
|
|
added subscriber Brad Figg |
2019-10-29 19:18:58 |
dann frazier |
bug |
|
|
added subscriber Andy Whitcroft |
2019-10-29 19:19:05 |
dann frazier |
bug |
|
|
added subscriber Terry Rudd |
2019-10-29 19:19:30 |
dann frazier |
bug task added |
|
mdadm (Ubuntu) |
|
2019-10-29 20:30:10 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2019-10-29 20:30:13 |
Ubuntu Kernel Bot |
linux (Ubuntu Bionic): status |
New |
Incomplete |
|
2019-10-29 20:30:15 |
Ubuntu Kernel Bot |
linux (Ubuntu Disco): status |
New |
Incomplete |
|
2019-10-29 20:30:17 |
Ubuntu Kernel Bot |
linux (Ubuntu Eoan): status |
New |
Incomplete |
|
2019-10-29 20:30:20 |
Ubuntu Kernel Bot |
linux (Ubuntu Precise): status |
New |
Incomplete |
|
2019-10-29 20:30:22 |
Ubuntu Kernel Bot |
linux (Ubuntu Trusty): status |
New |
Incomplete |
|
2019-10-29 20:30:24 |
Ubuntu Kernel Bot |
linux (Ubuntu Xenial): status |
New |
Incomplete |
|
2019-10-30 15:50:42 |
dann frazier |
description |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*]
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
That change has been applied to stable, but we reverted it to fix 1849682 until we have a full solution ready.
To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause corruption. However, unless a layout-version-aware kernel *created* the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.
The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt.
A clueful user can spot the relevant message in dmesg, though, as you can see from the log in Comment #1, it is hidden in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine whether your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. Note that it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.
Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*]
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
That change has been applied to stable, but we reverted it to fix 1849682 until we have a full solution ready.
To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause corruption. However, until an mdadm exists that is able to set a layout in the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.
The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt.
A clueful user can spot the relevant message in dmesg, though, as you can see from the log in Comment #1, it is hidden in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine whether your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. Note that it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.
Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. |
|
2019-10-30 16:09:40 |
dann frazier |
description |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*]
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
That change has been applied to stable, but we reverted it to fix 1849682 until we have a full solution ready.
To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause corruption. However, until an mdadm exists that is able to set a layout in the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.
The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt.
A clueful user can spot the relevant message in dmesg, though, as you can see from the log in Comment #1, it is hidden in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine whether your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. Note that it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.
Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
Fix checklist:
[ ] Restore c84a1372df929 md/raid0: avoid RAID0 data corruption due to layout confusion.
[ ] Also apply these fixes:
33f2c35a54dfd md: add feature flag MD_FEATURE_RAID0_LAYOUT
3874d73e06c9b md/raid0: fix warning message for parameter default_layout
[ ] If upstream, include https://marc.info/?l=linux-raid&m=157239231220119&w=2
[ ] mdadm update (see Comment #2)
[ ] Packaging work to detect/aid admin before reboot
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*]
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
That change has been applied to stable, but we reverted it to fix 1849682 until we have a full solution ready.
To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause corruption. However, until an mdadm exists that is able to set a layout in the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.
The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt.
A clueful user can spot the relevant message in dmesg, though, as you can see from the log in Comment #1, it is hidden in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine whether your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. Note that it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.
Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. |
|
2019-10-31 08:42:48 |
Andrew Cloke |
bug |
|
|
added subscriber Andrew Cloke |
2019-10-31 19:44:33 |
Guilherme G. Piccoli |
bug |
|
|
added subscriber Guilherme G. Piccoli |
2019-11-01 18:00:48 |
dann frazier |
mdadm (Ubuntu Focal): status |
New |
Confirmed |
|
2019-11-01 18:01:02 |
dann frazier |
mdadm (Ubuntu Eoan): status |
New |
Confirmed |
|
2019-11-01 18:01:14 |
dann frazier |
mdadm (Ubuntu Disco): status |
New |
Confirmed |
|
2019-11-01 18:01:41 |
dann frazier |
mdadm (Ubuntu Bionic): status |
New |
Confirmed |
|
2019-11-01 18:01:53 |
dann frazier |
mdadm (Ubuntu Xenial): status |
New |
Confirmed |
|
2019-11-01 18:02:04 |
dann frazier |
mdadm (Ubuntu Trusty): status |
New |
Confirmed |
|
2019-11-01 18:02:27 |
dann frazier |
linux (Ubuntu Focal): status |
Incomplete |
Confirmed |
|
2019-11-01 18:02:54 |
dann frazier |
linux (Ubuntu Eoan): status |
Incomplete |
Confirmed |
|
2019-11-01 18:02:58 |
dann frazier |
linux (Ubuntu Disco): status |
Incomplete |
Confirmed |
|
2019-11-01 18:03:02 |
dann frazier |
linux (Ubuntu Bionic): status |
Incomplete |
Confirmed |
|
2019-11-01 18:03:05 |
dann frazier |
linux (Ubuntu Xenial): status |
Incomplete |
Confirmed |
|
2019-11-01 18:03:11 |
dann frazier |
linux (Ubuntu Trusty): status |
Incomplete |
Confirmed |
|
2019-11-01 18:03:15 |
dann frazier |
linux (Ubuntu Precise): status |
Incomplete |
New |
|
2019-11-13 18:59:31 |
dann frazier |
bug watch added |
|
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=944676 |
|
2019-11-13 18:59:31 |
dann frazier |
bug task added |
|
mdadm (Debian) |
|
2019-11-13 19:01:47 |
dann frazier |
bug task added |
|
ubuntu-release-notes |
|
2019-11-13 23:47:48 |
Bug Watch Updater |
mdadm (Debian): status |
Unknown |
New |
|
2019-11-21 16:39:57 |
Newton Liu |
bug |
|
|
added subscriber Newton Liu |
2019-12-03 17:26:36 |
Bug Watch Updater |
mdadm (Debian): status |
New |
Fix Released |
|
2019-12-04 18:56:15 |
Launchpad Janitor |
mdadm (Ubuntu Focal): status |
Confirmed |
Fix Released |
|
2019-12-04 21:26:11 |
dann frazier |
description |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
Fix checklist:
[ ] Restore c84a1372df929 md/raid0: avoid RAID0 data corruption due to layout confusion.
[ ] Also apply these fixes:
33f2c35a54dfd md: add feature flag MD_FEATURE_RAID0_LAYOUT
3874d73e06c9b md/raid0: fix warning message for parameter default_layout
[ ] If upstream, include https://marc.info/?l=linux-raid&m=157239231220119&w=2
[ ] mdadm update (see Comment #2)
[ ] Packaging work to detect/aid admin before reboot
Users of RAID0 arrays are susceptible to a corruption issue if:
- The members of the RAID array are not all the same size[*]
- Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
That change has been applied to stable, but we reverted it to fix 1849682 until we have a full solution ready.
To summarize, upstream is dealing with this by adding a versioned layout in v5.4, and that is being backported to stable kernels - which is why we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2 is post-3.14. Mixing version 1 & version 2 layouts can cause corruption. However, until an mdadm exists that is able to set a layout in the array, there's no way for the kernel to know which version(s) were used to write the existing data. This undefined mode is considered "Version 0", and the kernel will now refuse to start these arrays w/o user intervention.
The user experience is pretty awful here. A user upgrades to the next SRU and all of a sudden their system stops at an (initramfs) prompt.
A clueful user can spot the relevant message in dmesg, though, as you can see from the log in Comment #1, it is hidden in a ton of other messages:
[ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
[ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
[ 72.733979] md: pers->run() failed ...
mdadm: failed to start array /dev/md0: Unknown error 524
What that is trying to say is that you should determine whether your data - specifically the data toward the end of your array - was most likely written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with raid0.default_layout=1 or raid0.default_layout=2 on the kernel command line. Note that it should be *raid0.default_layout*, not *raid.default_layout* as the message says - a fix for that message is now queued for stable:
https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571
IMHO, we should work with upstream to create a web page that clearly walks the user through this process, and update the error message to point to that page. I'd also like to see if we can detect this problem *before* the user reboots (debconf?) and help the user fix things, e.g. "We detected that you have RAID0 arrays that may be susceptible to a corruption problem", guide the user to choosing a layout, and update the mdadm initramfs hook to poke the answer in via sysfs before starting the array on reboot.
Note that it also seems like we should investigate backporting this to < 3.14 kernels. Imagine a user switching between the trusty HWE kernel and the GA kernel.
References from users of other distros:
https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
[*] Which surprisingly is not the case reported in this bug - the user here had a raid0 of 8 identically-sized devices. I suspect there's a bug in the detection code somewhere. |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
[Impact]
(cut & paste from https://marc.info/?l=linux-raid&m=157360088014027&w=2)
An unintentional RAID0 layout change was introduced in the v3.14 kernel. This effectively means there are 2 different layouts Linux will use to write data to RAID0 arrays in the wild - the “pre-3.14” way and the “3.14 and later” way. Mixing these layouts by writing to an array while booted on these different kernel versions can lead to corruption.
Note that this only impacts RAID0 arrays that include devices of different sizes. If your devices are all the same size, both layouts are equivalent, and your array is not at risk of corruption due to this issue.
Unfortunately, the kernel cannot detect which layout was used for writes to pre-existing arrays, and therefore requires input from the administrator. This input can be provided via the kernel command line with the raid0.default_layout=<N> parameter, or by setting the default_layout module parameter when loading the raid0 module. With a new enough version of mdadm (>= 4.2, or equivalent distro backports), you can set the layout version when assembling a stopped array. For example:
mdadm --stop /dev/md0
mdadm --assemble -U layout-alternate /dev/md0 /dev/sda1 /dev/sda2
See the mdadm manpage for more details. Once set in this manner, the layout will be recorded in the array and will not need to be explicitly specified in the future.
(The mdadm part of this SRU is for the above support ^)
[Test Case]
= mdadm =
Confirm that a multi-zone raid0 created w/ older mdadm is able to be started on a fixed kernel by setting a layout.
1) Ex: w/ old kernel/mdadm:
mdadm --create /dev/md0 --run --metadata=default \
--level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
2) Reboot onto fixed kernel & update mdadm
3) sudo mdadm --assemble -U layout-alternate \
/dev/md0 /dev/vdb1 /dev/vdc1
4) Confirm that the array autostarts on reboot
5) Confirm that w/ new kernel & new mdadm, a user can create and start an array in a backwards-compatible fashion (i.e. w/o an explicit layout).
6) Verify that 'mdadm --detail /dev/md0' displays the layout
= linux =
Similar to above, but using kernel command line options.
[Regression Risk]
The kernel side of things will break starting pre-existing arrays. That's intentional.
Although I've done due diligence to check for backwards compatibility issues, the mdadm side may still present some. |
|
2019-12-04 21:26:25 |
dann frazier |
mdadm (Ubuntu Eoan): status |
Confirmed |
In Progress |
|
2019-12-04 21:26:25 |
dann frazier |
mdadm (Ubuntu Eoan): assignee |
|
dann frazier (dannf) |
|
2019-12-04 21:26:44 |
dann frazier |
mdadm (Ubuntu Disco): status |
Confirmed |
In Progress |
|
2019-12-04 21:26:44 |
dann frazier |
mdadm (Ubuntu Disco): assignee |
|
dann frazier (dannf) |
|
2019-12-04 21:27:02 |
dann frazier |
mdadm (Ubuntu Bionic): status |
Confirmed |
In Progress |
|
2019-12-04 21:27:02 |
dann frazier |
mdadm (Ubuntu Bionic): assignee |
|
dann frazier (dannf) |
|
2019-12-06 20:00:11 |
Brian Murray |
mdadm (Ubuntu Eoan): status |
In Progress |
Fix Committed |
|
2019-12-06 20:00:16 |
Brian Murray |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2019-12-06 20:00:19 |
Brian Murray |
bug |
|
|
added subscriber SRU Verification |
2019-12-06 20:00:26 |
Brian Murray |
tags |
|
verification-needed verification-needed-eoan |
|
2019-12-06 20:05:30 |
Brian Murray |
mdadm (Ubuntu Disco): status |
In Progress |
Fix Committed |
|
2019-12-06 20:05:41 |
Brian Murray |
tags |
verification-needed verification-needed-eoan |
verification-needed verification-needed-disco verification-needed-eoan |
|
2019-12-06 20:06:37 |
Brian Murray |
mdadm (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2019-12-06 20:06:49 |
Brian Murray |
tags |
verification-needed verification-needed-disco verification-needed-eoan |
verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan |
|
2019-12-06 21:50:02 |
dann frazier |
description |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
[Impact]
(cut & paste from https://marc.info/?l=linux-raid&m=157360088014027&w=2)
An unintentional RAID0 layout change was introduced in the v3.14 kernel. This effectively means there are 2 different layouts Linux will use to write data to RAID0 arrays in the wild - the “pre-3.14” way and the “3.14 and later” way. Mixing these layouts by writing to an array while booted on these different kernel versions can lead to corruption.
Note that this only impacts RAID0 arrays that include devices of different sizes. If your devices are all the same size, both layouts are equivalent, and your array is not at risk of corruption due to this issue.
Unfortunately, the kernel cannot detect which layout was used for writes to pre-existing arrays, and therefore requires input from the administrator. This input can be provided via the kernel command line with the raid0.default_layout=<N> parameter, or by setting the default_layout module parameter when loading the raid0 module. With a new enough version of mdadm (>= 4.2, or equivalent distro backports), you can set the layout version when assembling a stopped array. For example:
mdadm --stop /dev/md0
mdadm --assemble -U layout-alternate /dev/md0 /dev/sda1 /dev/sda2
See the mdadm manpage for more details. Once set in this manner, the layout will be recorded in the array and will not need to be explicitly specified in the future.
(The mdadm part of this SRU is for the above support ^)
[Test Case]
= mdadm =
Confirm that a multi-zone raid0 created w/ older mdadm is able to be started on a fixed kernel by setting a layout.
1) Ex: w/ old kernel/mdadm:
mdadm --create /dev/md0 --run --metadata=default \
--level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
2) Reboot onto fixed kernel & update mdadm
3) sudo mdadm --assemble -U layout-alternate \
/dev/md0 /dev/vdb1 /dev/vdc1
4) Confirm that the array autostarts on reboot
5) Confirm that w/ new kernel & new mdadm, a user can create and start an array in a backwards-compatible fashion (i.e. w/o an explicit layout).
6) Verify that 'mdadm --detail /dev/md0' displays the layout
= linux =
Similar to above, but using kernel command line options.
[Regression Risk]
The kernel side of things will break starting pre-existing arrays. That's intentional.
Although I've done due diligence to check for backwards compatibility issues, the mdadm side may still present some. |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
[Impact]
(cut & paste from https://marc.info/?l=linux-raid&m=157360088014027&w=2)
An unintentional RAID0 layout change was introduced in the v3.14 kernel. This effectively means there are 2 different layouts Linux will use to write data to RAID0 arrays in the wild - the “pre-3.14” way and the “3.14 and later” way. Mixing these layouts by writing to an array while booted on these different kernel versions can lead to corruption.
Note that this only impacts RAID0 arrays that include devices of different sizes. If your devices are all the same size, both layouts are equivalent, and your array is not at risk of corruption due to this issue.
Unfortunately, the kernel cannot detect which layout was used for writes to pre-existing arrays, and therefore requires input from the administrator. This input can be provided via the kernel command line with the raid0.default_layout=<N> parameter, or by setting the default_layout module parameter when loading the raid0 module. With a new enough version of mdadm (>= 4.2, or equivalent distro backports), you can set the layout version when assembling a stopped array. For example:
mdadm --stop /dev/md0
mdadm --assemble -U layout-alternate /dev/md0 /dev/sda1 /dev/sda2
See the mdadm manpage for more details. Once set in this manner, the layout will be recorded in the array and will not need to be explicitly specified in the future.
(The mdadm part of this SRU is for the above support ^)
[Test Case]
= mdadm =
Confirm that a multi-zone raid0 created w/ older mdadm is able to be started on a fixed kernel by setting a layout.
1) Ex: w/ old kernel/mdadm:
mdadm --create /dev/md0 --run --metadata=default \
--level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
2) Reboot onto fixed kernel & update mdadm
3) sudo mdadm --stop /dev/md0 &&
sudo mdadm --assemble -U layout-alternate \
/dev/md0 /dev/vdb1 /dev/vdc1
4) Confirm that the array autostarts on reboot
5) Confirm that w/ new kernel & new mdadm, a user can create and start an array in a backwards-compatible fashion (i.e. w/o an explicit layout).
6) Verify that 'mdadm --detail /dev/md0' displays the layout
= linux =
Similar to above, but using kernel command line options.
[Regression Risk]
The kernel side of things will break starting pre-existing arrays. That's intentional.
Although I've done due diligence to check for backwards compatibility issues, the mdadm side may still present some. |
|
2019-12-06 21:54:25 |
dann frazier |
tags |
verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan |
verification-done-eoan verification-needed verification-needed-bionic verification-needed-disco |
|
2019-12-06 22:17:48 |
dann frazier |
tags |
verification-done-eoan verification-needed verification-needed-bionic verification-needed-disco |
verification-done-disco verification-done-eoan verification-needed verification-needed-bionic |
|
2019-12-06 22:31:44 |
dann frazier |
tags |
verification-done-disco verification-done-eoan verification-needed verification-needed-bionic |
verification-done verification-done-bionic verification-done-disco verification-done-eoan |
|
2019-12-11 03:37:50 |
Mathew Hodson |
bug |
|
|
added subscriber Mathew Hodson |
2019-12-12 13:54:51 |
dann frazier |
description |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
[Impact]
(cut & paste from https://marc.info/?l=linux-raid&m=157360088014027&w=2)
An unintentional RAID0 layout change was introduced in the v3.14 kernel. This effectively means there are 2 different layouts Linux will use to write data to RAID0 arrays in the wild - the “pre-3.14” way and the “3.14 and later” way. Mixing these layouts by writing to an array while booted on these different kernel versions can lead to corruption.
Note that this only impacts RAID0 arrays that include devices of different sizes. If your devices are all the same size, both layouts are equivalent, and your array is not at risk of corruption due to this issue.
Unfortunately, the kernel cannot detect which layout was used for writes to pre-existing arrays, and therefore requires input from the administrator. This input can be provided via the kernel command line with the raid0.default_layout=<N> parameter, or by setting the default_layout module parameter when loading the raid0 module. With a new enough version of mdadm (>= 4.2, or equivalent distro backports), you can set the layout version when assembling a stopped array. For example:
mdadm --stop /dev/md0
mdadm --assemble -U layout-alternate /dev/md0 /dev/sda1 /dev/sda2
See the mdadm manpage for more details. Once set in this manner, the layout will be recorded in the array and will not need to be explicitly specified in the future.
(The mdadm part of this SRU is for the above support ^)
[Test Case]
= mdadm =
Confirm that a multi-zone raid0 created w/ older mdadm is able to be started on a fixed kernel by setting a layout.
1) Ex: w/ old kernel/mdadm:
mdadm --create /dev/md0 --run --metadata=default \
--level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
2) Reboot onto fixed kernel & update mdadm
3) sudo mdadm --stop /dev/md0 &&
sudo mdadm --assemble -U layout-alternate \
/dev/md0 /dev/vdb1 /dev/vdc1
4) Confirm that the array autostarts on reboot
5) Confirm that w/ new kernel & new mdadm, a user can create and start an array in a backwards-compatible fashion (i.e. w/o an explicit layout).
6) Verify that 'mdadm --detail /dev/md0' displays the layout
= linux =
Similar to above, but using kernel command line options.
[Regression Risk]
The kernel side of things will break starting pre-existing arrays. That's intentional.
Although I've done due diligence to check for backwards compatibility issues, the mdadm side may still present some. |
Bug 1849682 tracks the temporary revert of the fix for this issue, while this bug tracks the re-application of that fix once we have a full solution.
[Impact]
(cut & paste from https://marc.info/?l=linux-raid&m=157360088014027&w=2)
An unintentional RAID0 layout change was introduced in the v3.14 kernel. This effectively means there are 2 different layouts Linux will use to write data to RAID0 arrays in the wild - the “pre-3.14” way and the “3.14 and later” way. Mixing these layouts by writing to an array while booted on these different kernel versions can lead to corruption.
Note that this only impacts RAID0 arrays that include devices of different sizes. If your devices are all the same size, both layouts are equivalent, and your array is not at risk of corruption due to this issue.
Unfortunately, the kernel cannot detect which layout was used for writes to pre-existing arrays, and therefore requires input from the administrator. This input can be provided via the kernel command line with the raid0.default_layout=<N> parameter, or by setting the default_layout module parameter when loading the raid0 module. With a new enough version of mdadm (>= 4.2, or equivalent distro backports), you can set the layout version when assembling a stopped array. For example:
mdadm --stop /dev/md0
mdadm --assemble -U layout-alternate /dev/md0 /dev/sda1 /dev/sda2
See the mdadm manpage for more details. Once set in this manner, the layout will be recorded in the array and will not need to be explicitly specified in the future.
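A sketch of the module-parameter route mentioned above (assumes raid0 is built as a module, as on Ubuntu kernels; the conf file name is illustrative):
# One-off, before (re)assembling the array on the running kernel:
echo 2 | sudo tee /sys/module/raid0/parameters/default_layout
# Or persistently, so the initramfs loads raid0 with the option on boot:
echo 'options raid0 default_layout=2' | sudo tee /etc/modprobe.d/raid0-layout.conf
sudo update-initramfs -u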
(The mdadm part of this SRU is for the above support ^)
[Test Case]
= mdadm =
Confirm that a multi-zone raid0 created w/ older mdadm is able to be started on a fixed kernel by setting a layout.
1) Ex: w/ old kernel/mdadm:
mdadm --create /dev/md0 --run --metadata=default \
--level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
2) Reboot onto fixed kernel & update mdadm
3) sudo mdadm --stop /dev/md0 &&
sudo mdadm --assemble -U layout-alternate \
/dev/md0 /dev/vdb1 /dev/vdc1
4) Confirm that the array autostarts on reboot
5) Confirm that w/ new kernel & new mdadm, a user can create and start an array in a backwards-compatible fashion (i.e. w/o an explicit layout).
6) Verify that 'mdadm --detail /dev/md0' displays the layout
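For convenience, steps 1-6 above as one rough command sequence (device names are examples; the two partitions should differ in size so the array is multi-zone):
# On the old kernel/mdadm:
sudo mdadm --create /dev/md0 --run --metadata=default \
    --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
# Reboot onto the fixed kernel, update mdadm, then:
sudo mdadm --stop /dev/md0
sudo mdadm --assemble -U layout-alternate /dev/md0 /dev/vdb1 /dev/vdc1
# Reboot again; the array should autostart, and the layout should show:
sudo mdadm --detail /dev/md0 | grep -i layout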
= linux =
Similar to above, but using kernel command line options.
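A sketch of that kernel-side check (values illustrative):
# With no override, assembly of the multi-zone array on the fixed kernel
# should fail (the "Unknown error 524" is ENOTSUPP) with the md/raid0 dmesg
# hint. Then reboot with raid0.default_layout=1 (or =2) appended to the
# kernel command line and confirm the array starts and the data is intact:
cat /proc/mdstat
sudo mdadm --detail /dev/md0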
[Regression Risk]
The kernel side of things will break starting pre-existing arrays. That's intentional.
The mdadm side will cause a regression in functionality where a user can no longer create multi-zone raid0s on kernels that do not yet have the raid0 layout patches. This is intentional, as such RAID arrays present a corruption risk. |
|
2019-12-12 13:56:43 |
dann frazier |
tags |
verification-done verification-done-bionic verification-done-disco verification-done-eoan |
verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan |
|
2019-12-12 15:22:10 |
dann frazier |
tags |
verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan |
block-proposed-bionic block-proposed-disco block-proposed-eoan verification-done verification-done-bionic verification-done-disco verification-done-eoan |
|
2019-12-13 05:01:53 |
Mathew Hodson |
removed subscriber Mathew Hodson |
|
|
|
2020-01-07 13:14:21 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): status |
Confirmed |
Fix Committed |
|
2020-01-07 13:15:54 |
Kleber Sacilotto de Souza |
linux (Ubuntu Disco): status |
Confirmed |
Fix Committed |
|
2020-01-07 13:17:05 |
Kleber Sacilotto de Souza |
linux (Ubuntu Eoan): status |
Confirmed |
Fix Committed |
|
2020-01-14 13:15:58 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): status |
Fix Committed |
In Progress |
|
2020-01-14 13:16:08 |
Kleber Sacilotto de Souza |
linux (Ubuntu Disco): status |
Fix Committed |
In Progress |
|
2020-01-14 13:16:18 |
Kleber Sacilotto de Souza |
linux (Ubuntu Eoan): status |
Fix Committed |
In Progress |
|
2020-01-23 15:38:02 |
Connor Kuehl |
linux (Ubuntu Trusty): status |
Confirmed |
Fix Committed |
|
2020-01-29 02:26:54 |
Khaled El Mously |
linux (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2020-01-29 02:27:08 |
Khaled El Mously |
linux (Ubuntu Disco): status |
In Progress |
Fix Committed |
|
2020-01-29 02:27:18 |
Khaled El Mously |
linux (Ubuntu Eoan): status |
In Progress |
Fix Committed |
|
2020-01-29 02:27:33 |
Khaled El Mously |
linux (Ubuntu Xenial): status |
Confirmed |
Fix Committed |
|
2020-01-30 14:52:42 |
Ubuntu Kernel Bot |
tags |
block-proposed-bionic block-proposed-disco block-proposed-eoan verification-done verification-done-bionic verification-done-disco verification-done-eoan |
block-proposed-bionic block-proposed-disco block-proposed-eoan verification-done verification-done-bionic verification-done-disco verification-done-eoan verification-needed-xenial |
|
2020-02-03 11:54:33 |
Launchpad Janitor |
mdadm (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2020-02-10 22:58:22 |
dann frazier |
tags |
block-proposed-bionic block-proposed-disco block-proposed-eoan verification-done verification-done-bionic verification-done-disco verification-done-eoan verification-needed-xenial |
block-proposed-bionic block-proposed-disco block-proposed-eoan verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan verification-needed-xenial |
|
2020-02-10 22:59:02 |
dann frazier |
tags |
block-proposed-bionic block-proposed-disco block-proposed-eoan verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan verification-needed-xenial |
verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan verification-needed-xenial |
|
2020-02-11 00:03:12 |
dann frazier |
tags |
verification-needed verification-needed-bionic verification-needed-disco verification-needed-eoan verification-needed-xenial |
verification-done verification-done-bionic verification-done-disco verification-done-eoan verification-needed-xenial |
|
2020-02-11 00:08:30 |
dann frazier |
tags |
verification-done verification-done-bionic verification-done-disco verification-done-eoan verification-needed-xenial |
verification-done verification-done-bionic verification-done-disco verification-done-eoan verification-done-xenial |
|
2020-02-17 10:23:38 |
Launchpad Janitor |
linux (Ubuntu Eoan): status |
Fix Committed |
Fix Released |
|
2020-02-17 10:23:38 |
Launchpad Janitor |
cve linked |
|
2019-19050 |
|
2020-02-17 10:23:38 |
Launchpad Janitor |
cve linked |
|
2019-19077 |
|
2020-02-17 10:23:38 |
Launchpad Janitor |
cve linked |
|
2019-19078 |
|
2020-02-17 10:23:38 |
Launchpad Janitor |
cve linked |
|
2019-19082 |
|
2020-02-17 10:23:38 |
Launchpad Janitor |
cve linked |
|
2019-19332 |
|
2020-02-17 10:23:38 |
Launchpad Janitor |
cve linked |
|
2019-19965 |
|
2020-02-17 10:36:02 |
Launchpad Janitor |
linux (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2020-02-17 10:36:02 |
Launchpad Janitor |
cve linked |
|
2019-18885 |
|
2020-02-17 10:36:02 |
Launchpad Janitor |
cve linked |
|
2019-20096 |
|
2020-02-17 10:36:02 |
Launchpad Janitor |
cve linked |
|
2019-5108 |
|
2020-02-17 14:18:22 |
Launchpad Janitor |
linux (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2020-03-16 23:19:02 |
Launchpad Janitor |
linux (Ubuntu Focal): status |
Confirmed |
Fix Released |
|
2020-03-16 23:19:02 |
Launchpad Janitor |
cve linked |
|
2019-19076 |
|
2020-07-02 19:57:43 |
Steve Langasek |
linux (Ubuntu Disco): status |
Fix Committed |
Won't Fix |
|
2020-07-02 19:57:45 |
Steve Langasek |
mdadm (Ubuntu Disco): status |
Fix Committed |
Won't Fix |
|
2021-10-14 02:32:46 |
Steve Langasek |
linux (Ubuntu Precise): status |
New |
Won't Fix |
|
2021-10-14 02:32:49 |
Steve Langasek |
mdadm (Ubuntu Precise): status |
New |
Won't Fix |
|
2021-11-17 20:17:24 |
Brian Murray |
ubuntu-release-notes: status |
New |
Won't Fix |
|
2024-07-26 16:21:20 |
Brian Murray |
mdadm (Ubuntu Eoan): status |
Fix Committed |
Won't Fix |
|