raid10: discard leads to corrupted file system

Bug #1907262 reported by Thimo E
This bug affects 11 people
Affects          Importance   Assigned to
linux (Ubuntu)   Undecided    Unassigned
  Trusty         Undecided    Unassigned
  Xenial         Undecided    Unassigned
  Bionic         High         Unassigned
  Focal          High         Unassigned
  Groovy         High         Unassigned

Bug Description

Seems to be closely related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578

After updating the Ubuntu 18.04 kernel from 4.15.0-124 to 4.15.0-126, the fstrim command triggered by fstrim.timer causes a large number of mismatches between the two RAID10 component devices.

This bug affects several machines in our company with different HW configurations (all using ECC RAM). Both NVMe and SATA SSDs are affected.

How to reproduce:
 - Create a RAID10 LVM and filesystem on two SSDs
    mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p2 /dev/nvme1n1p2
    pvcreate -ff -y /dev/md0
    vgcreate -f -y VolGroup /dev/md0
    lvcreate -n root -L 100G -ay -y VolGroup
    mkfs.ext4 /dev/VolGroup/root
    mount /dev/VolGroup/root /mnt
 - Write some data, sync and delete it
    dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M
    sync
    rm /mnt/data.raw
 - Check the RAID device
    echo check >/sys/block/md0/md/sync_action
 - After finishing (see /proc/mdstat), check the mismatch_cnt (should be 0):
    cat /sys/block/md0/md/mismatch_cnt
 - Trigger the bug
    fstrim /mnt
 - Re-Check the RAID device
    echo check >/sys/block/md0/md/sync_action
 - After finishing (see /proc/mdstat), check the mismatch_cnt (probably in the range of N*10000):
    cat /sys/block/md0/md/mismatch_cnt
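
The check-and-read steps above could be wrapped in a small helper. The following is only a sketch: MD_SYSFS and POLL_INTERVAL are illustrative variable names (not part of mdadm or the kernel), and it assumes that sync_action reads "idle" once the check has finished.

```shell
#!/bin/sh
# Sketch: run an md consistency check and print mismatch_cnt afterwards.
# MD_SYSFS is a hypothetical override so the helpers can target another array.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Poll sync_action until the array is idle, then print mismatch_cnt.
wait_and_report() {
    [ -r "$MD_SYSFS/sync_action" ] || return 1
    while [ "$(cat "$MD_SYSFS/sync_action")" != "idle" ]; do
        sleep "${POLL_INTERVAL:-5}"
    done
    cat "$MD_SYSFS/mismatch_cnt"
}

# Start a check (needs root on a real array) and wait for the result.
run_check() {
    echo check > "$MD_SYSFS/sync_action" || return 1
    wait_and_report
}
```

Calling run_check once before and once after fstrim /mnt would give the two mismatch counts compared in the steps above.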

After investigating this issue on several machines it *seems* that the first drive does the trim correctly while the second one goes wild. At least the number and severity of errors found by a USB stick live session fsck.ext4 suggests this.

To evaluate each drive on its own, the RAID10 was started with only one drive at a time:
  mdadm --assemble /dev/md127 /dev/nvme0n1p2
  mdadm --run /dev/md127
  fsck.ext4 -n -f /dev/VolGroup/root

  vgchange -a n /dev/VolGroup
  mdadm --stop /dev/md127

  mdadm --assemble /dev/md127 /dev/nvme1n1p2
  mdadm --run /dev/md127
  fsck.ext4 -n -f /dev/VolGroup/root

When these fscks are run without -n, the directory structure on the first device appears to be OK, while on the second device only the lost+found folder is left.

Side-note: Another machine using HWE kernel 5.4.0-56 (after using -53 before) seems to have a quite similar issue.

Unfortunately the risk/regression assessment in the aforementioned bug is not complete: the workaround only mitigates the issues during FS creation. This bug on the other hand is triggered by a weekly service (fstrim) causing severe file system corruption.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

Thank you for the very detailed bug report. I will start investigating this
immediately.

Thanks,
Matthew

tags: added: sts
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Groovy):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Changed in linux (Ubuntu Focal):
importance: Undecided → High
Changed in linux (Ubuntu Groovy):
importance: Undecided → High
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

Firstly, thank you for your bug report, we really, really appreciate it.

You are correct, the recent raid10 patches appear to cause filesystem corruption on raid10 arrays.

I have spent the day reproducing, and I can confirm that the 4.15.0-126-generic, 5.4.0-56-generic and 5.8.0-31-generic kernels are affected.
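
A quick way to see whether a machine is running one of the kernels named above might look like the sketch below. The function name is hypothetical, and the list contains only the releases confirmed in this comment; other flavours or builds carrying the same patches are not covered.

```shell
#!/bin/sh
# Sketch: succeed if the given kernel release is one of those
# reported as affected in this bug (not an exhaustive list).
is_reported_affected() {
    case "$1" in
        4.15.0-126-generic|5.4.0-56-generic|5.8.0-31-generic) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_reported_affected "$(uname -r)" && echo "running a kernel reported as affected"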

The kernel team is aware of the situation and has begun an emergency revert of the patches. We should have new kernels available within the next few hours to a day or so.

The current mainline kernel is affected, so I have written to the raid subsystem maintainer, and the original author of the raid10 block discard patches, to aid with debugging and fixing the problem.

You can follow the upstream thread here:

https://www.spinics.net/lists/kernel/msg3765302.html

As for the data corruption on your servers, I am deeply sorry for causing this regression.

When I was testing the raid10 block discard patches on the Ubuntu stable kernels, I did not think to fsck each of the disks in the array. Instead, I was content with the speed of creating new arrays, writing a basic dataset to the disks, and rebooting the server to ensure the array came up again with those same files.

Since the first disk seems to be okay, there is at least a small window of opportunity for you to restore any data that you have not backed up.

I will keep you informed of getting the patches reverted, and getting the root cause fixed upstream. If you have any questions, feel free to ask, and if you have any more details from your own debugging, feel free to share in this bug, or on the upstream mailing list discussion.

Thanks,
Matthew

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Trent Lloyd (lathiat) wrote :

I can reproduce this on a Google Cloud n1-standard-16 using 2x local NVMe disks. Partition nvme0n1 and nvme0n2 with only an 8GB partition each, then format the array directly with ext4 (skip LVM).

In this setup each 'check' takes <1 min, which speeds up testing considerably. Example details follow; the pre-emptible instance cost for this seems to be $0.292/hour, or about $7/day.

gcloud compute instances create raid10-test --project=juju2-157804 \
        --zone=us-west1-b \
        --machine-type=n1-standard-16 \
        --subnet=default \
        --network-tier=STANDARD \
        --no-restart-on-failure \
        --maintenance-policy=TERMINATE \
        --preemptible \
        --boot-disk-size=32GB \
        --boot-disk-type=pd-ssd \
        --image=ubuntu-1804-bionic-v20201116 --image-project=ubuntu-os-cloud \
        --local-ssd=interface=NVME --local-ssd=interface=NVME

# apt install linux-image-virtual
# apt-get remove linux-image-gcp linux-image-5.4.0-1029-gcp linux-image-unsigned-5.4.0-1029-gcp --purge
# reboot

sgdisk -n 0:0:+8G /dev/nvme0n1
sgdisk -n 0:0:+8G /dev/nvme0n2
mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p1 /dev/nvme0n2p1
mkfs.ext4 /dev/md0
mount /dev/md0 /mnt
dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M; sync; rm /mnt/data.raw
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # no mismatch
fstrim -v /mnt
echo check >/sys/block/md0/md/sync_action; watch 'grep . /proc/mdstat /sys/block/md0/md/mismatch_cnt' # mismatch=256

I ran blktrace /dev/md0 /dev/nvme0n1 /dev/nvme0n2 and will upload the results. I haven't had time to try to understand them yet.

Some thoughts
 - It was asserted that the first disk 'appears' fine
 - So I wondered can we reliably repair by asking mdadm to do a 'repair' or 'resync'
 - It seems that reads are at least sometimes balanced (maybe by PID) across different disks since this post: https://www.spinics.net/lists/raid/msg62762.html. It is unclear whether the same selection affects writes (not that it would help performance).
 - So it's unclear whether we can reliably say that only a 'passive mirror' is being corrupted; application reads may or may not be corrupted. More testing/understanding of the code is required.
 - This area of RAID10 and RAID1 seems quite under-documented, "man md" doesn't talk much about how or which disk is used to repair the other if there is a mismatch (unlike RAID5 where the parity gives us some assurances as to which data is wrong).
 - We should try writes from different PIDs, with known different data, and compare the data on both disks with the known data to see if we can knowingly get the wrong data on both disks or only one. And try that with 4 disks instead of 2.
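
The known-data idea in the last point might start from something like the sketch below. It is only an illustration: the function names, file names and MANIFEST.sha256 are made up here, and it verifies what the filesystem returns through the array, so it does not by itself control which member disk serves a read.

```shell
#!/bin/sh
# Sketch: write files with distinct, reproducible content and verify them later.

# write_known_data DIR COUNT
# Each file is produced by its own awk process and tagged with its index,
# so no two files share the same content.
write_known_data() {
    dir="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        awk -v tag="$i" 'BEGIN {
            for (n = 0; n < 4096; n++)
                printf "known-data-file-%s-line-%d\n", tag, n
        }' > "$dir/known-$i.dat"
        i=$((i + 1))
    done
    # Record the expected checksums next to the data.
    ( cd "$dir" && sha256sum known-*.dat > MANIFEST.sha256 )
}

# verify_known_data DIR
# Returns non-zero if any file no longer matches its recorded checksum.
verify_known_data() {
    ( cd "$1" && sha256sum --quiet -c MANIFEST.sha256 )
}
```

A run could call write_known_data /mnt 8, sync, run fstrim, and then call verify_known_data /mnt after assembling the array from each member in turn.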

Revision history for this message
Trent Lloyd (lathiat) wrote :
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Revision history for this message
Thimo E (thimoe) wrote :

Hi Matthew and all,

thank you for taking action immediately. I really appreciate your effort.

After investigating the issue further I have to add that the mount option discard seems to trigger the issue, too.
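
To find filesystems mounted with that option, a sketch along the following lines may help. The function name is hypothetical; it assumes the usual /proc/mounts layout with the mount options in the fourth field.

```shell
#!/bin/sh
# Sketch: print the mount point of every filesystem whose option list
# contains "discard" (or a "discard=..." variant).
list_discard_mounts() {
    mounts_file="${1:-/proc/mounts}"
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^discard(=|$)/) { print $2; break }
    }' "$mounts_file"
}
```

Any mount point printed by list_discard_mounts that sits on a RAID10 array would be a candidate for remounting without discard until a fixed kernel is running.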

@Trent
The general problem here is that RAID10 can balance single read streams across all disks (which is probably its major advantage over RAID1, effectively giving you RAID0 read speed; RAID1 needs parallel reads to achieve this).

That said, it is no big surprise that several machines at our site went into read-only mode after *some time* (probably after reading some filesystem-relevant data from the "bad disk"). Unfortunately the "clean first disk" only holds if you act immediately; otherwise you might have some data corruption.
I verified this on one system where the root partition was affected using the debsums tool (just run debsums -xa) after fixing FS errors.

My procedure to recover was:
Assembly of the RAID:
mdadm --assemble /dev/md127 /dev/nvme0n1p2
mdadm --run /dev/md127

Filesystem check on all partitions (note the -f parameter, some FS "think" they are clean):
fsck.ext4 -f /dev/VolGroup/...

Re-add the second component:
mdadm --zero-superblock /dev/nvme1n1p2
mdadm --add /dev/md127 /dev/nvme1n1p2

Best regards

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Thimo,

Thanks for the update; just to clarify, for your "procedure to recover," are you saying that that procedure will always resolve the damage, or that even after that procedure, there may be corruption?

Revision history for this message
Thimo E (thimoe) wrote :

This is just the procedure with the least damage I found.
Still data loss may happen (and actually happened to some of our systems).

Probably first re-adding (after zeroing) the second component to the RAID and then fsck-ing leads to the exact same result but I wanted to keep the second component as fall-back until I could see the results of fsck.

Revision history for this message
voidlily (voidlily) wrote :

This issue is also affecting xenial, or at least the package was pulled from xenial as well. When I try to click the "add distribution" button in launchpad I'm getting an oops error, so posting a comment about xenial being affected in the meantime.

Revision history for this message
Eric Desrochers (slashd) wrote :

@voidlily,

I would assume you are running a HWE kernel (v4.15) on Xenial.

If it's the case, fixing the Bionic kernel will generate a new HWE (4.15) kernel for Xenial.

Changed in linux (Ubuntu Xenial):
status: New → Invalid
Changed in linux (Ubuntu Trusty):
status: New → Invalid
Revision history for this message
Eric Desrochers (slashd) wrote :

For Trusty and Xenial, fstrim is scheduled via cron[0] to run weekly, every Sunday at 06:47[1].
For Bionic onward, fstrim is scheduled via a systemd timer and also runs weekly[2].

Impacted users may want to take action before the next scheduled run by downgrading the running kernel or temporarily disabling the fstrim job.

[Trusty and Xenial]
By default, an /etc/cron.weekly/fstrim job is installed, but this may be supplanted by local modifications.

Check if you are running a cron job which might invoke fstrim:
$ sudo grep -r fstrim /etc/cron*

If an fstrim job is found in the results of the above command, edit the appropriate file and comment out the command with a “#” at the beginning of the line to disable the execution of fstrim.

For the default Ubuntu configuration, the command in the /etc/cron.weekly/fstrim file starts with “/sbin/fstrim” or “exec fstrim-all” and is the last line of the file.

[Bionic or later]
$ sudo systemctl disable --now fstrim.timer
$ sudo systemctl mask fstrim.service

[0] - /etc/cron.weekly/fstrim
[1] - grep -i weekly /etc/crontab
[2] - systemctl status fstrim.timer | grep "Trigger:"
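
After editing the cron files, a check like the sketch below could confirm that no uncommented fstrim invocation remains. The function name is hypothetical, and the filter only recognises lines whose first non-blank character is "#" as disabled.

```shell
#!/bin/sh
# Sketch: list lines that mention fstrim and are not commented out.
# Arguments are files or directories to search, e.g. /etc/cron*.
# Succeeds (exit 0) only if at least one active fstrim line is found.
find_active_fstrim_jobs() {
    grep -rnH 'fstrim' "$@" 2>/dev/null |
        grep -vE '^[^:]*:[0-9]+:[[:space:]]*#'
}
```

On Trusty and Xenial this might be run as sudo sh -c '. ./check.sh; find_active_fstrim_jobs /etc/cron*' (check.sh being whatever file holds the function). On Bionic or later, systemctl is-enabled fstrim.timer reports the state of the timer instead.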

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
tags: added: verification-needed-focal
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Bionic.

I spun up a m5d.4xlarge instance on AWS, to utilise the 2x 300GB NVMe drives that support block discard.

I enabled -proposed, and installed the 4.15.0-128-generic kernel.

The following is the repro session running through the full testcase:

https://paste.ubuntu.com/p/VpwjbRRcy6/

A 2 disk Raid10 array was created, LVM created and formatted ext4. I let the consistency checks finish, and created, then deleted a file. Did another consistency check, then performed a fstrim. After another consistency check, we unmount and perform a fsck on each individual disk.

root@ip-172-31-10-77:~# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

root@ip-172-31-10-77:~# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.44.1 (24-Mar-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

Both of them pass, there is no corruption to the filesystem.

4.15.0-128-generic fixes the problem, the revert is effective.

Marking bug as verified for Bionic.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Performing verification for Focal.

I spun up a m5d.4xlarge instance on AWS, to utilise the 2x 300GB NVMe drives that support block discard.

I enabled -proposed, and installed the 5.4.0-58-generic kernel.

The following is the repro session running through the full testcase:

https://paste.ubuntu.com/p/Zr4C2pMbrk/

A 2 disk Raid10 array was created, LVM created and formatted ext4. I let the consistency checks finish, and created, then deleted a file. Did another consistency check, then performed a fstrim. After another consistency check, we unmount and perform a fsck on each individual disk.

root@ip-172-31-1-147:/home/ubuntu# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

root@ip-172-31-1-147:/home/ubuntu# fsck.ext4 -n -f /dev/VolGroup/root
e2fsck 1.45.5 (07-Jan-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/VolGroup/root: 11/6553600 files (0.0% non-contiguous), 557848/26214400 blocks

Both of them pass, there is no corruption to the filesystem.

5.4.0-58-generic fixes the problem, the revert is effective.

Marking bug as verified for Focal.

tags: added: verification-done-focal
removed: verification-needed-focal
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-33.36

---------------
linux (5.8.0-33.36) groovy; urgency=medium

  * groovy/linux: 5.8.0-33.36 -proposed tracker (LP: #1907408)

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "dm raid: remove unnecessary discard limits for raid10"
    - Revert "dm raid: fix discard limits for raid1 and raid10"
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously <email address hidden> Wed, 09 Dec 2020 03:56:47 -0500

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-58.64

---------------
linux (5.4.0-58.64) focal; urgency=medium

  * focal/linux: 5.4.0-58.64 -proposed tracker (LP: #1907390)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "dm raid: remove unnecessary discard limits for raid10"
    - Revert "dm raid: fix discard limits for raid1 and raid10"
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously <email address hidden> Wed, 09 Dec 2020 02:10:30 -0500

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-128.131

---------------
linux (4.15.0-128.131) bionic; urgency=medium

  * bionic/linux: 4.15.0-128.131 -proposed tracker (LP: #1907354)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * raid10: discard leads to corrupted file system (LP: #1907262)
    - Revert "md/raid10: improve discard request for far layout"
    - Revert "md/raid10: improve raid10 discard request"
    - Revert "md/raid10: pull codes that wait for blocked dev into one function"
    - Revert "md/raid10: extend r10bio devs to raid disks"
    - Revert "md: add md_submit_discard_bio() for submitting discard bio"

 -- Khalid Elmously <email address hidden> Wed, 09 Dec 2020 01:27:33 -0500

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-36.40+21.04.1

---------------
linux (5.8.0-36.40+21.04.1) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  [ Ubuntu: 5.8.0-36.40 ]

  * debian/scripts/file-downloader does not handle positive failures correctly
    (LP: #1878897)
    - [Packaging] file-downloader not handling positive failures correctly

  [ Ubuntu: 5.8.0-35.39 ]

  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2021-1052 // CVE-2021-1053
    - [Packaging] NVIDIA -- Add the NVIDIA 460 driver

 -- Kleber Sacilotto de Souza <email address hidden> Thu, 07 Jan 2021 11:57:30 +0100

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

Recently, Xiao Ni, the original author of the Raid10 block discard patchset, has posted a new revision of the patchset to the linux-raid mailing list for feedback.

Xiao has fixed the two bugs that caused the regression. The first was incorrectly calculating the start offset for block discard for the second and extra disks. The second bug was an incorrect stripe size for far layouts.

The new patches are:

https://www.spinics.net/lists/raid/msg67208.html
https://www.spinics.net/lists/raid/msg67212.html
https://www.spinics.net/lists/raid/msg67213.html
https://www.spinics.net/lists/raid/msg67209.html
https://www.spinics.net/lists/raid/msg67210.html
https://www.spinics.net/lists/raid/msg67211.html

Now, at some point in the future I do want to try and SRU these patches to the Ubuntu kernel, but only when they are ready.

I was wondering if you would be interested in helping to test these new patches, since you have a lot of experience with Raid10.

If you have some time, and a dedicated spare server, read comment 13 in the below bug which contains instructions to install test kernels I have built.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/13

This is entirely optional, and don't feel that you are obligated to test. We just want to get more eyes on the patches and some wider testing done, and to give feedback back to Xiao, the author, and to Song Liu, the Raid subsystem maintainer about the performance and safety of these patches.

I have tested the test kernels with the regression reproducer from this bug, and the mismatch count is always 0, and all fsck -f comes back clean for all disks.

If you have some spare time and a spare server, I would really appreciate help testing these kernels.

Thanks!
Matthew

Changed in linux (Ubuntu Groovy):
assignee: nobody → Sinclair Willis (yousure122244444444)
Changed in linux (Ubuntu Groovy):
assignee: Sinclair Willis (yousure122244444444) → nobody
Revision history for this message
Thimo E (thimoe) wrote :

Hi Matthew,

are these tests still relevant for you?

BR,
 Thimo

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

Thanks for writing back, great timing!

So, the new revision of the patches that we have been testing since February has just been merged into mainline. The md/raid10 patches got merged on Friday, and the dm/raid patches got merged on Saturday, and they will be tagged into 5.13-rc1. A few of us have been testing them, and we haven't seen any regressions that cause data loss or disk corruption. Things are looking okay.

If you are interested, you can see a list of the new commits on bug 1896578.

We are still planning to SRU the new revision into the Ubuntu kernels, and I have spent the day backporting the official mainline commits to the Ubuntu 5.11, 5.8, 5.4 and 4.15 kernels.

I'm currently building re-spins of the test kernels, based on more recently released Ubuntu kernels, with these official mainline patches, instead of the patches I got from the development mailing list I used in my previous set of test kernels.

I'm expecting these kernels to finish building overnight, and I will make sure to write back tomorrow morning with instructions on how to install these test kernels.

It would be great if you could give them a test before they get built into the next Ubuntu kernel update. Even when they are built into the next kernel update, I'll let you know how you can test them when they are in -proposed, before they are officially released to -updates.

I'll write back tomorrow morning with instructions on how to install the fresh test kernels.

Thanks,
Matthew

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

As promised yesterday, the new re-spins of the test kernels have finished building and are now available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1896578-test

The patches used are the ones I will be submitting for SRU, and are more or less identical to the patches in the previous test kernels I supplied in February.

Please go ahead and do some testing, and let me know if you find any problems.

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING PURPOSES ONLY. ONLY Install in a dedicated test environment.

Instructions to install:
1) sudo add-apt-repository ppa:mruffell/lp1896578-test
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-unsigned-5.11.0-16-generic linux-modules-5.11.0-16-generic \
linux-modules-extra-5.11.0-16-generic linux-headers-5.11.0-16-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-unsigned-5.8.0-50-generic linux-modules-5.8.0-50-generic \
linux-modules-extra-5.8.0-50-generic linux-headers-5.8.0-50-generic

For 20.04 / Focal:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For 18.04 / Bionic:
For the 5.4 Bionic HWE kernel:

3) sudo apt install linux-image-unsigned-5.4.0-72-generic linux-modules-5.4.0-72-generic \
linux-modules-extra-5.4.0-72-generic linux-headers-5.4.0-72-generic

For the 4.15 Bionic GA kernel:

3) sudo apt install linux-image-unsigned-4.15.0-142-generic linux-modules-4.15.0-142-generic \
linux-modules-extra-4.15.0-142-generic linux-headers-4.15.0-142-generic

4) sudo reboot
5) uname -rv
Make sure the string "+TEST1896578v20210504b1" is present in the uname -rv.
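
Step 5 could be scripted roughly as below. This is a sketch with a hypothetical function name; it matches only the "+TEST1896578" prefix, on the assumption that the build date suffix can differ between the test kernels.

```shell
#!/bin/sh
# Sketch: succeed if the given "uname -rv" output carries the
# +TEST1896578 marker used by the test kernels from this PPA.
is_test_kernel() {
    case "$1" in
        *+TEST1896578*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_test_kernel "$(uname -rv)" || echo "not booted into a test kernel"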

You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/

I'm still doing final regression testing, but things are looking okay so far. The deadline for patch submission to the next SRU cycle is tomorrow. I'm still planning on submitting the patches for tomorrow, but if I think we need more time for testing, worst case it will slip to the SRU cycle after, which is 3 weeks away.

I will write back tomorrow with the results of my regression testing and if I have submitted the patches for SRU.

Thanks,
Matthew

Revision history for this message
Thimo E (thimoe) wrote :

Hi Matthew,

thank you for providing the test-kernel and instructions. I will give it a try.

Regards,
 Thimo

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

I have been doing quite a bit of regression testing, and so far everything is
looking good. The performance of the block discard is there, and I haven't
come across any data corruption.

I have also spent some time running through the testcase you created for this
bug, and I have the results of those tests below.

For each of the 5.11, 5.8, 5.4 and 4.15 kernels, the problem does not reproduce,
as the values of /sys/block/md0/md/mismatch_cnt are always 0, and assembling the
array from each disk on its own and performing a full fsck shows no data corruption.

Test results for each kernel are below:

5.11.0-16-generic #17+TEST1896578v20210503b1-Ubuntu
https://paste.ubuntu.com/p/Dp3sR9mNdY/

5.8.0-50-generic #56+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/tXmtmd5Jys/

5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/VzX2mXcKbF/

4.15.0-142-generic #146+TEST1896578v20210504b1-Ubuntu
https://paste.ubuntu.com/p/HpMcX3N9fD/

I'm going to look into some longer running test cases as well; so far I have
been focusing on short term (less than six hour) test cases.

Otherwise, I have submitted the patches to the Ubuntu kernel mailing list for
SRU. Now, these patches will still be subject to review by senior members of the
kernel team, and their approval is required before they get applied to the
official Ubuntu kernels. I will let you know if they get approval or not.

In the meantime, please test the test kernels, and if you find any issues at
all with the test kernels, please let me know.

Thanks,
Matthew

Revision history for this message
Thimo E (thimoe) wrote :

Hi Matthew,

thank you for your continuous effort. I tested your 5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu until now without trouble.
I also started fstrim manually on a machine which did not do it for some time due to disabled fstrim service.

Regards,
 Thimo

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

Thanks for helping test! I really appreciate it. It is great to hear that you haven't had any trouble with the test kernel.

Just a quick update on the state of the Raid10 patchset. I submitted them for SRU for the current cycle, and the kernel team wrote back to me asking for more testing to be done before they make a decision to include them in the Ubuntu kernels.

I am currently looking into longer running tests.

At the moment, I am using a cloud instance as my personal computer with 4x scratch NVMe disks built as a Raid10 array with the same 5.4 test kernel, and I put my /home directory on the raid array. Everything is okay so far.

I am planning to submit the patches for SRU to the next kernel SRU cycle, so hopefully we can get them reviewed and accepted then.

I hope things are still running nice and stable on your side. I'll let you know how I get on with my /home on a Raid10 array, and when I next submit the patches for SRU.

Thanks,
Matthew

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi Thimo,

As I mentioned in my previous message, I submitted the patches to the Ubuntu kernel mailing list for SRU.

These patches have now gotten 2 acks [1][2] from senior kernel team members, and the patches have now been applied [3] to the 4.15, 5.4, 5.8 and 5.11 kernels.

[1] https://lists.ubuntu.com/archives/kernel-team/2021-May/120475.html
[2] https://lists.ubuntu.com/archives/kernel-team/2021-May/120799.html
[3] https://lists.ubuntu.com/archives/kernel-team/2021-May/120800.html

This is what is going to happen next. Next week, between the 31st of May and 4th of June, the kernel team will build the next kernel update, and place it in -proposed for testing.

As soon as these kernels enter -proposed, we need to install and test Raid10 in these new kernels as much as possible. The testing and verification window is between the 7th and 18th of June.

If all goes well, we can mark the launchpad bug as verified, and we will see a release to -updates around the 21st of June, give or take a few days if any CVEs turn up.

The schedule is on https://kernel.ubuntu.com/ if anything were to change.

I will write back once the next kernel update is in -proposed, likely early to mid next week. I would really, really appreciate it if you could help test the kernels when they arrive in -proposed, as I really don't want to introduce any more regressions.

Thanks,
Matthew

Matthew Ruffell (mruffell) wrote :

Hi Thimo,

The kernel team have built all of the kernels for this SRU cycle, and have placed them into -proposed for verification.

We now need to do some thorough testing to make sure that Raid10 arrays perform well, that data integrity is preserved, and that we won't be introducing any regressions when these kernels are released in two weeks' time.

I would really appreciate it if you could help test and verify these kernels function as intended.

Instructions to Install:

1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
# Enable Ubuntu proposed archive
deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
EOF
2) sudo apt update

For 21.04 / Hirsute:

3) sudo apt install linux-image-5.11.0-20-generic linux-modules-5.11.0-20-generic \
linux-modules-extra-5.11.0-20-generic linux-headers-5.11.0-20-generic

For 20.10 / Groovy:

3) sudo apt install linux-image-5.8.0-56-generic linux-modules-5.8.0-56-generic \
linux-modules-extra-5.8.0-56-generic linux-headers-5.8.0-56-generic

For 20.04 / Focal:

3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

For 18.04 / Bionic:
 For the 5.4 Bionic HWE Kernel:

 3) sudo apt install linux-image-5.4.0-75-generic linux-modules-5.4.0-75-generic \
linux-modules-extra-5.4.0-75-generic linux-headers-5.4.0-75-generic

 For the 4.15 Bionic GA Kernel:

 3) sudo apt install linux-image-4.15.0-145-generic linux-modules-4.15.0-145-generic \
linux-modules-extra-4.15.0-145-generic linux-headers-4.15.0-145-generic

4) sudo reboot
5) uname -rv

You may need to modify your grub configuration to boot the correct kernel. If you need help, read these instructions: https://paste.ubuntu.com/p/XrTzWPPnWJ/
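After the reboot in step 4, the check in step 5 can also be scripted. This is only a minimal sketch: `kernel_matches` is a hypothetical helper name, not something shipped with Ubuntu, and the optional second argument exists only so the comparison can be tried without rebooting.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the running kernel release equals the
# expected one. The second argument defaults to the output of `uname -r`.
kernel_matches() {
    expected="$1"
    running="${2:-$(uname -r)}"
    [ "$running" = "$expected" ]
}

# Example (adjust the version to the kernel you installed in step 3):
#   kernel_matches 5.4.0-75-generic || echo "booted $(uname -r) instead"
```

If the check fails, the system most likely booted an older kernel and the grub configuration needs adjusting as described above.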

I am running the -proposed kernel on my cloud instance with my /home directory on a Raid10 array made up of 4x NVMe devices, and things are looking okay.
I will be performing my detailed regression testing against these kernels tomorrow, and I will write back with the results then.

Please help test these kernels in -proposed, and let me know how they go.

Thanks,
Matthew

Thimo E (thimoe) wrote :

Hi Matthew,

Thanks for your effort to add this feature to the Ubuntu kernels.

I installed linux-image-5.4.0-75-generic on 2021-06-08.
I have not seen any problems so far, neither during normal work nor during a manual fstrim.

Best regards,
 Thimo

Matthew Ruffell (mruffell) wrote :

Hi Thimo,

Thanks for letting me know, and great to hear that things are working as
expected. I'll check in with you in one week's time, to double check things are
still going okay.

I spent some time today performing verification on all the kernels in -proposed,
testing block discard performance [1], and also running through the regression
testcase from LP #1907262 [2].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262

All kernels performed as expected, with block discard on 4x 1.9TB NVMe disks
on an i3.8xlarge AWS instance taking 3-4 seconds, and the consistency checks
performed returned clean disks, with no filesystem or data corruption.

I have documented my tests in my verification messages:

Hirsute:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/26

Groovy:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/27

Focal:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/28

Bionic:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578/comments/29

I have marked the launchpad bug as verified for all releases.

I'm still running my own testing, with my /home directory being on a Raid10 array
on a Google Cloud instance, and it has no issues.

If things keep going well, we should see a release to -updates around the 21st
of June, give or take a few days if any CVEs turn up.

Thanks,
Matthew

Matthew Ruffell (mruffell) wrote :

Hi Thimo,

Just checking in. Are you still running 5.4.0-75-generic on your server?

Is everything nice and stable? Is your data fully intact, with no signs of corruption at all?

My server has been running for two weeks now, doing an fstrim every 30 minutes. Everything appears to be stable, and I don't see any corruption when I fsck my disks.
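For anyone who wants to repeat this kind of check, the steps from the bug description (fstrim, then an md check, then reading mismatch_cnt) can be wrapped as below. This is a rough sketch, not the exact procedure used above: the function names are made up for illustration, the md sysfs directory (for example /sys/block/md0/md) and the mount point are assumptions you must adjust, and `trim_and_check` needs root.

```shell
#!/bin/sh
# Hypothetical helper: wait for the array to go idle, then print mismatch_cnt.
# $1 is the md sysfs directory, e.g. /sys/block/md0/md (assumed device name).
md_mismatches() {
    md_dir="$1"
    # sync_action reads "idle" once no check or resync is running.
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep 5
    done
    cat "$md_dir/mismatch_cnt"
}

# Hypothetical helper: trim a mounted filesystem, start a check, report mismatches.
# $1 is the mount point, $2 the md sysfs directory. Must be run as root.
trim_and_check() {
    mnt="$1"
    md_dir="$2"
    fstrim "$mnt" || return 1
    echo check > "$md_dir/sync_action" || return 1
    sleep 1   # give md a moment to start the check before polling
    md_mismatches "$md_dir"
}

# Example: trim_and_check /mnt /sys/block/md0/md
```

According to the bug description, the printed value should be 0 on a healthy array, while the affected kernels produced counts in the tens of thousands after a trim.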

If things keep looking good, the SRU cycle will complete early next week, and the kernel will be released to -updates around the 21st of June, give or take a few days if any CVEs turn up.

Let me know how things are going.

Thanks,
Matthew

Matthew Ruffell (mruffell) wrote :

Hi Thimo,

The SRU cycle has completed, and all kernels containing the Raid10 block discard performance patches have now been released to -updates.

Note that the versions differ from the kernels in -proposed, because the kernel team needed to do a last-minute respin to fix two sets of CVEs, one for Broadcom wifi chipsets and the other for bpf. This is also why the kernels were released a day later than usual.

The released kernels are:

Hirsute: 5.11.0-22-generic
Groovy: 5.8.0-59-generic
Focal: 5.4.0-77-generic
Bionic: 4.15.0-147-generic

The HWE equivalents have also been released to -updates.
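To see whether a machine is already at or beyond one of the released versions listed above, the running kernel can be compared against it. The sketch below assumes GNU `sort -V` is available and that later kernels in the same series keep the patches; `version_ge` is a hypothetical helper name, and the comparison only makes sense within one series (for example 5.4.0-*).

```shell
#!/bin/sh
# Hypothetical helper: true when version $1 is equal to or newer than $2,
# using GNU sort's version ordering (an assumption about the environment).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example for Focal (released version taken from the list above):
#   version_ge "$(uname -r)" 5.4.0-77-generic && echo "running a released kernel"
```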

You may now install these kernels to your systems and enjoy fast block discard for your Raid10 arrays.

All of our testing has concluded that these patches are stable, but if you run into any issues whatsoever as you roll this out to more systems, please let us know, and we will investigate accordingly.

I wish you a trouble free rollout of these kernels to your systems.

Thanks,
Matthew

Thimo E (thimoe) wrote :

Hi Matthew,

Sorry for the late reply.
Today I triggered another fstrim with the linux-image-5.4.0-75-generic kernel and made a final check on the RAID. No trouble has occurred for me so far.
Thank you for pursuing this topic so persistently and for finally getting the patches into the Ubuntu kernel.

Best regards,
 Thimo
