Comment 0 for bug 2036467

Krister Johansen (kmjohansen) wrote : superblock checksum mismatch in resize2fs

Hi,
We run ext4 on EBS volumes on EC2. During provisioning, cloud-init will occasionally report that resize2fs has failed due to a superblock checksum mismatch. We debugged this internally, and were able to come up with the following reproducer:

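   #!/usr/bin/bash
   # Stop at the first failing command (-e) and trace each command (-x),
   # so the loop ends as soon as resize2fs reports the checksum mismatch.
   set -euxo pipefail

   while true
   do
           # Create a roughly 1 GiB partition, leaving room for growpart to extend it.
           parted /dev/nvme1n1 mklabel gpt mkpart primary 2048s 2099200s
           # Give the partition device node time to appear.
           sleep .5
           mkfs.ext4 /dev/nvme1n1p1
           mount -t ext4 /dev/nvme1n1p1 /mnt
           # Run 4 dentry workers that create and remove directory entries under /mnt.
           stress-ng --temp-path /mnt -D 4 &
           STRESS_PID=$!
           sleep 1
           # Grow partition 1 to fill the disk, then resize the mounted filesystem.
           growpart /dev/nvme1n1 1
           resize2fs /dev/nvme1n1p1
           kill $STRESS_PID
           wait $STRESS_PID
           umount /mnt
           # Wipe the filesystem and partition table signatures before the next pass.
           wipefs -a /dev/nvme1n1p1
           wipefs -a /dev/nvme1n1
   done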

(This was on a 60 GB gp3 volume attached to a c5.4xlarge instance.)

We found a fix that works, and the patch has been accepted upstream. The short explanation is that after switching the superblock read to direct I/O, we no longer see the problem.
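To make the buffered versus direct distinction concrete, here is a minimal sketch that reads the superblock magic bytes with dd, optionally with iflag=direct. It is not the upstream patch and does not reproduce what resize2fs does internally. The function name sb_magic is hypothetical, and the only layout facts it relies on are that the ext4 superblock starts at byte 1024 and that s_magic (0xEF53, little-endian) sits at offset 0x38 within it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only: print the two s_magic bytes
# of an ext2/3/4 superblock as hex. Expected output on a valid filesystem
# is "53ef" (0xEF53 stored little-endian at absolute byte offset 1080).
#
#   $1  block device or filesystem image
#   $2  optional dd input flag, e.g. "direct" to bypass the page cache
sb_magic() {
    local target=$1 flags=${2:-}
    # Read the first 4096 bytes in a single aligned request, so the read
    # satisfies O_DIRECT alignment on 512-byte and 4 KiB sector devices.
    dd if="$target" bs=4096 count=1 ${flags:+iflag=$flags} 2>/dev/null |
        od -An -tx1 -j1080 -N2 | tr -d ' \n'
}

# Example usage (needs root and a real device, so it is left commented out):
#   sb_magic /dev/nvme1n1p1          # buffered read through the page cache
#   sb_magic /dev/nvme1n1p1 direct   # direct I/O read from the device
```

On a mounted, busy filesystem the two reads can in principle return different superblock contents, which is the situation the direct I/O change is meant to avoid.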

The patch is available here, but it has not yet shipped in a released version of e2fsprogs:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/commit/?id=43a498e938887956f393b5e45ea6ac79cc5f4b84

A longer thread with the maintainer is available here:

https://<email address hidden>/

This bug report requests that Ubuntu backport this patch to the e2fsprogs versions shipped in the releases available as images on AWS, preferably Focal and Jammy.