Activity log for bug #1896154

Date Who What changed Old value New value Message
2020-09-18 02:51:42 Matthew Ruffell bug added bug
2020-09-18 02:51:59 Matthew Ruffell nominated for series Ubuntu Focal
2020-09-18 02:51:59 Matthew Ruffell bug task added linux (Ubuntu Focal)
2020-09-18 02:52:09 Matthew Ruffell linux (Ubuntu): status New Fix Released
2020-09-18 02:52:13 Matthew Ruffell linux (Ubuntu Focal): status New In Progress
2020-09-18 02:52:16 Matthew Ruffell linux (Ubuntu Focal): importance Undecided Medium
2020-09-18 02:52:20 Matthew Ruffell linux (Ubuntu Focal): assignee Matthew Ruffell (mruffell)
2020-09-18 02:53:18 Matthew Ruffell description

Old value:

BugLink: https://bugs.launchpad.net/bugs/

[Impact]

Since 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") which landed in 5.3, btrfs won't trim a range that has already been trimmed, and will instead go looking for a range where the CHUNK_TRIMMED and CHUNK_ALLOCATED bits aren't set.

If a device had been shrunk, the CHUNK_TRIMMED and CHUNK_ALLOCATED bits are never cleared, which means that btrfs could go looking for a range to trim which is beyond the new device size. This leads to an underflow in a length calculation for the range to trim, and we will end up trimming past the device's boundary.

This has an unfortunate side effect of mangling and filling the root disk with garbage data, and it will not stop until the root disk is totally filled, and makes the instance unusable.

[Fix]

The issue was fixed in the following commit, in 5.9-rc1:

commit c57dd1f2f6a7cd1bb61802344f59ccdc5278c983
Author: Qu Wenruo <wqu@suse.com>
Date: Fri Jul 31 19:29:11 2020 +0800
Subject: btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
Link: https://github.com/torvalds/linux/commit/c57dd1f2f6a7cd1bb61802344f59ccdc5278c983

The fix clears the CHUNK_TRIMMED and CHUNK_ALLOCATED bits when a device is being shrunk, and performs some additional checks to ensure we do not trim past the device size boundary.

The fix was backported to 5.7.17 and 5.8.3 upstream stable, but it seems 5.4 was skipped.

The patch required a minor backport to 5.4, with the CHUNK_STATE_MASK #define moving files back to fs/btrfs/extent_io.h, as the file had been renamed in later kernels.

[Testcase]

The easiest way to reproduce is to use a cloud instance that supplies a real NVMe drive, that supports TRIM and block discards.

Warning, this will fill the root disk with garbage data, ONLY run on a throwaway instance!

Run the following commands:

$ dev=/dev/nvme0n1
$ mnt=/mnt
$ mkfs.btrfs -f $dev -b 10G
$ mount $dev $mnt
$ fstrim $mnt
$ btrfs filesystem resize 1:-1G $mnt
$ fstrim $mnt

The last command will appear to hang, while the root filesystem will begin filling with garbage data. Once the root filesystem fills, you will see the following error:

fstrim: /mnt: FITRIM ioctl failed: Input/output error

/dev/sda1 29G 29G 0 100% /

A test kernel is available from the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf293389-test

If you install the test kernel, then the final fstrim command completes successfully in a short amount of time.

[Regression Potential]

If a regression were to occur, it could affect users who are attempting to shrink or resize their btrfs volume. Most users already understand that changing the size of a volume is a risky operation, and would have a backup.

If a regression occurs, then there is potential for data loss when users resize or shrink their btrfs volumes. Standard volume creation would not be affected.

The patches have been backported to upstream stable, and are trusted by the community.

New value:

BugLink: https://bugs.launchpad.net/bugs/1896154

[Impact]

Since 929be17a9b49 ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") which landed in 5.3, btrfs won't trim a range that has already been trimmed, and will instead go looking for a range where the CHUNK_TRIMMED and CHUNK_ALLOCATED bits aren't set.

If a device had been shrunk, the CHUNK_TRIMMED and CHUNK_ALLOCATED bits are never cleared, which means that btrfs could go looking for a range to trim which is beyond the new device size. This leads to an underflow in a length calculation for the range to trim, and we will end up trimming past the device's boundary.

This has an unfortunate side effect of mangling and filling the root disk with garbage data, and it will not stop until the root disk is totally filled, and makes the instance unusable.

[Fix]

The issue was fixed in the following commit, in 5.9-rc1:

commit c57dd1f2f6a7cd1bb61802344f59ccdc5278c983
Author: Qu Wenruo <wqu@suse.com>
Date: Fri Jul 31 19:29:11 2020 +0800
Subject: btrfs: trim: fix underflow in trim length to prevent access beyond device boundary
Link: https://github.com/torvalds/linux/commit/c57dd1f2f6a7cd1bb61802344f59ccdc5278c983

The fix clears the CHUNK_TRIMMED and CHUNK_ALLOCATED bits when a device is being shrunk, and performs some additional checks to ensure we do not trim past the device size boundary.

The fix was backported to 5.7.17 and 5.8.3 upstream stable, but it seems 5.4 was skipped.

The patch required a minor backport to 5.4, with the CHUNK_STATE_MASK #define moving files back to fs/btrfs/extent_io.h, as the file had been renamed in later kernels.

[Testcase]

The easiest way to reproduce is to use a cloud instance that supplies a real NVMe drive, that supports TRIM and block discards.

Warning, this will fill the root disk with garbage data, ONLY run on a throwaway instance!

Run the following commands:

$ dev=/dev/nvme0n1
$ mnt=/mnt
$ mkfs.btrfs -f $dev -b 10G
$ mount $dev $mnt
$ fstrim $mnt
$ btrfs filesystem resize 1:-1G $mnt
$ fstrim $mnt

The last command will appear to hang, while the root filesystem will begin filling with garbage data. Once the root filesystem fills, you will see the following error:

fstrim: /mnt: FITRIM ioctl failed: Input/output error

/dev/sda1 29G 29G 0 100% /

A test kernel is available from the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf293389-test

If you install the test kernel, then the final fstrim command completes successfully in a short amount of time.

[Regression Potential]

If a regression were to occur, it could affect users who are attempting to shrink or resize their btrfs volume. Most users already understand that changing the size of a volume is a risky operation, and would have a backup.

If a regression occurs, then there is potential for data loss when users resize or shrink their btrfs volumes. Standard volume creation would not be affected.

The patches have been backported to upstream stable, and are trusted by the community.
2020-09-18 02:54:31 Matthew Ruffell tags sts
2020-09-24 16:53:51 Marcelo Cerri bug task added linux-azure (Ubuntu)
2020-09-24 16:54:50 Marcelo Cerri linux-azure (Ubuntu Focal): status New In Progress
2020-09-24 17:55:16 Marcelo Cerri linux-azure (Ubuntu Focal): status In Progress Fix Committed
2020-10-06 18:56:30 Ian May linux (Ubuntu Focal): status In Progress Fix Committed
2020-10-13 22:39:02 Launchpad Janitor linux-azure (Ubuntu Focal): status Fix Committed Fix Released
2020-10-13 22:39:02 Launchpad Janitor cve linked 2020-16119
2020-10-13 22:39:02 Launchpad Janitor cve linked 2020-16120
2020-11-17 10:05:11 Ubuntu Kernel Bot tags sts sts verification-needed-focal
2020-11-17 22:29:55 Matthew Ruffell tags sts verification-needed-focal sts verification-done-focal
2020-11-30 15:46:09 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2020-11-30 15:46:09 Launchpad Janitor cve linked 2020-14351
2020-11-30 15:46:09 Launchpad Janitor cve linked 2020-4788
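
Illustration of the underflow described in [Impact]

The bug description says that a trim range starting beyond the shrunk device size "leads to an underflow in a length calculation". The following is a minimal Python model of that arithmetic, not the kernel code: the function names trim_len_unchecked and trim_len_checked, the example offsets, and the exact form of the boundary check are assumptions made for illustration. Python integers do not wrap, so a 64-bit mask is used to mimic an unsigned 64-bit length.

```python
U64_MASK = (1 << 64) - 1  # mimics wraparound of an unsigned 64-bit integer
GIB = 1 << 30


def trim_len_unchecked(start: int, end: int, total_bytes: int) -> int:
    """Hypothetical model: clamp the end to the device, never check the start."""
    end = min(end, total_bytes - 1)       # last valid byte on the device
    return (end - start + 1) & U64_MASK   # wraps to a huge value if start > end + 1


def trim_len_checked(start: int, end: int, total_bytes: int) -> int:
    """Hypothetical model of the guarded version: reject ranges past the device."""
    if start >= total_bytes:
        return 0                          # nothing on this device is left to trim
    end = min(end, total_bytes - 1)
    return end - start + 1


# Assumed scenario: a 10 GiB device shrunk to 9 GiB, and a stale search
# result that starts inside the area that no longer belongs to the device.
total_bytes = 9 * GIB
start = total_bytes + GIB // 2
end = U64_MASK  # open-ended range, "to the end of the address space"

print(hex(trim_len_unchecked(start, end, total_bytes)))  # close to 2**64
print(trim_len_checked(start, end, total_bytes))         # 0
```

In this model the unchecked length comes out just below 2**64 bytes, which matches the report that the trim runs far past the device boundary. The guarded version returns zero for the same input and still returns the full device size for a range that starts at offset 0.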