fstrim destroying XFS on SAN

Bug #1686687 reported by Anonymous on 2017-04-27
This bug affects 2 people
Affects: linux (Ubuntu)

Bug Description

We observed severe data loss/filesystem corruption when executing fstrim on a filesystem hosted on an Eternus DX600 S3 system.

There is multipathing via fibre channel fabrics, but the issue could be reproduced when disabling multipathing and using one of the block devices directly.

It could not be reproduced when creating a multipath device via dmsetup with four paths pointing to four loop devices mapping the same file.

The observed behavior is that XFS cannot read vital filesystem metadata because the underlying storage device returns blocks of 0x00. The blocks are discarded via SCSI UNMAP commands, and since thin provisioning is in use, the SAN deallocates them and returns 0x00 on subsequent reads. Invoking find yields error messages like "find: ./dir_16: Structure needs cleaning". In other tests, where more data had been written, files were accessible but their checksums no longer matched.
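
The zeros-on-read behavior can be illustrated at file level (an analogy only, not the SAN itself): a sparse region of a file reads back as zeros, just as the thin-provisioned volume returns zeros for blocks it has deallocated after UNMAP.

```shell
#!/bin/sh
# File-level analogy for the failure mode: once storage for a range is
# dropped, reads return zeros instead of the old data.
set -eu
f=$(mktemp)
printf 'AAAA' > "$f"
before=$(od -An -tx1 "$f" | tr -d ' \n')
# Drop the data and re-extend the file sparsely; the range is now
# unallocated, like a discarded block on the thin-provisioned volume.
truncate -s 0 "$f"
truncate -s 4 "$f"
after=$(od -An -tx1 "$f" | tr -d ' \n')
echo "before: $before"   # 41414141 ('AAAA')
echo "after:  $after"    # 00000000
rm -f "$f"
```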

As a consequence, the XFS filesystem is left in an unusable state and has to be recreated from scratch, amounting to complete data loss. Repairing the filesystem was not considered worthwhile, as backups were available and trust in the data had already been compromised.

The problem was discovered after installing a new storage server with Ubuntu 16.04, intended to replace the current machine running 14.04. Every weekend, the test volumes were corrupted. Investigation pointed towards Sunday, 06:47, which is the time `cron.weekly` is run. The job file `/etc/cron.weekly/fstrim` seemed the most likely culprit, so `fstrim -a` was run manually after `mkfs.xfs`, and the filesystem became damaged. The damage only became apparent after a `umount`/`mount` cycle, when all buffers were flushed and data was re-read from the device.
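
The manual reproduction sequence just described can be sketched as follows (a sketch only: the device and mount point are placeholders, and running this against real thin-provisioned storage destroys its contents, so it is shown as a function rather than executed).

```shell
#!/bin/sh
# Sketch of the reproduction sequence from the description above.
# WARNING: destructive; $1 is a placeholder block device.
set -eu

reproduce() {
    DEV="$1"    # e.g. /dev/sde (placeholder)
    MNT="$2"    # e.g. /mnt/san (placeholder)

    mkfs.xfs -f "$DEV"              # fresh filesystem
    mount "$DEV" "$MNT"
    cp -r /usr/share/doc "$MNT"/    # write some test data
    fstrim -v "$MNT"                # issue discards (UNMAP) for free space
    umount "$MNT"                   # flush buffers, drop cached metadata
    mount "$DEV" "$MNT"             # on the affected SAN this step fails:
                                    # "Structure needs cleaning"
}
```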

We could now use configuration management to install a cronjob that (every minute!) checks for /sbin/fstrim and renames it if present. This would be extremely unsatisfactory, as it is a brittle workaround. So for now, we are locked on Ubuntu 14.04. Since util-linux is one of the most central packages, there is no way to avoid having fstrim or the cronjob on an Ubuntu system.

I have attached a script used to reproduce the bug reliably on our system and its log output, as well as excerpts from syslog and md5sum.
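
For reference, the checksum comparison mentioned above can be done with md5sum manifests; this is a generic sketch of that verification step (not the attached script), run here against freshly created temporary files.

```shell
#!/bin/sh
# Generic sketch of checksum verification: record a manifest before the
# trim, verify it after the umount/mount cycle.
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo "payload one" > "$work/data/a"
echo "payload two" > "$work/data/b"

# Before: record a manifest of checksums.
( cd "$work" && find data -type f -exec md5sum {} + > MD5SUMS )

# After: verify (here nothing was changed, so every file reports OK;
# on the damaged filesystem this is where mismatches showed up).
result=$( cd "$work" && md5sum -c MD5SUMS )
echo "$result"

rm -rf "$work"
```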

Phillip Susi (psusi) wrote :

Just remove /etc/cron.weekly/fstrim as a workaround. The underlying bug is not in fstrim, but in the kernel somewhere; probably the XFS filesystem driver and its interaction between preallocation and the FIBMAP ioctl.

affects: util-linux (Ubuntu) → linux (Ubuntu)
tags: added: bot-stop-nagging

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1686687

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Anonymous (anonymous654) wrote :

For us, fstrim was the frontend to the bug, and running it consistently damaged our filesystems; that is why util-linux was chosen as the affected package. Also, removing the cron file will not be sufficient, as any update to the package (via cron-apt) might reinstall it. Maybe a conflicting change would be a better choice, but that still seems too brittle in light of such severe consequences. For now we mitigate by not upgrading to 16.04.
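
One less brittle option than a rename-on-a-timer could be dpkg-divert, which makes the rename known to the package manager so upgrades do not reinstall the file (a sketch only; the cron file path is taken from the thread above, and the commands require root, so they are shown as functions rather than executed).

```shell
#!/bin/sh
# Sketch: neutralize the weekly fstrim job in an upgrade-proof way.
disable_fstrim_cronjob() {
    # Register a local diversion: package upgrades will install the
    # file under the .disabled name instead of the original path.
    dpkg-divert --local --rename \
        --divert /etc/cron.weekly/fstrim.disabled \
        /etc/cron.weekly/fstrim
}

restore_fstrim_cronjob() {
    # Undo the diversion and move the file back.
    dpkg-divert --local --rename --remove /etc/cron.weekly/fstrim
}
```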

Since the system concerned is not connected to the web, Apport as described here (https://wiki.ubuntu.com/Apport#Tools) will not work. Also, the Ubuntu server installer did not install apport when 'Basic Ubuntu Server' was selected. If there is additional information that might be useful, let us know. Further responses may be delayed due to people going on vacation.

Anonymous (anonymous654) wrote :

Marked as Confirmed in accordance with comment #3. Further detail might be provided on request.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Phillip Susi (psusi) wrote :

Come to think of it, since you say it only seems to happen on a multipath SAN and not on a normal disk, it may be related to the order of requests being sent to the disk and XFS's use of io barriers to make sure that the unmap request for the old data does not hit the disk after the new filesystem metadata has been written. IIRC, xfs has mount options to control the use of io barriers. Do you have them enabled?
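
To answer the barrier question, the mount option string can be inspected directly; this is a small sketch that reads /proc/mounts for a given mount point (the /mnt/san path is site-specific, so the demo call below uses / instead; note that in 4.x kernels XFS enables write barriers by default unless `nobarrier` is passed).

```shell
#!/bin/sh
# Sketch: report whether an XFS mount carries an explicit barrier option.
check_barriers() {
    mnt="$1"
    # Pull the option string for this mount point out of /proc/mounts.
    opts=$(awk -v m="$mnt" '$2 == m { print $4; exit }' /proc/mounts)
    echo "mount options for $mnt: $opts"
    case "$opts" in
        *nobarrier*) echo "write barriers explicitly DISABLED" ;;
        *barrier*)   echo "write barriers explicitly enabled" ;;
        *)           echo "no explicit barrier option (xfs default: barriers on)" ;;
    esac
}

# Demo on the root filesystem (on the affected system, use /mnt/san).
out=$(check_barriers /)
echo "$out"
```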

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc8
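
For completeness, installing a mainline test build can be sketched as below (only the directory URL comes from this comment; the .deb filenames vary per build and are placeholders here, so the steps are shown as a function rather than executed).

```shell
#!/bin/sh
# Sketch: fetch and install a mainline test kernel, then retest.
# The .deb names are PLACEHOLDERS; take the real ones from the
# directory listing at $base.
install_mainline() {
    base="http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc8"
    wget -P /tmp \
        "$base/linux-headers-PLACEHOLDER_all.deb" \
        "$base/linux-image-PLACEHOLDER_amd64.deb"
    dpkg -i /tmp/linux-headers-PLACEHOLDER_all.deb \
            /tmp/linux-image-PLACEHOLDER_amd64.deb
    # Reboot into the new kernel, re-run the reproducer, then tag the
    # bug 'kernel-fixed-upstream' or 'kernel-bug-exists-upstream'.
}
```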

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Anonymous (anonymous654) wrote :

We got around to doing more testing and following the suggestions. The filesystem corruption could again be reproduced. The run was done bypassing multipathing and using a single path directly.

OUTPUT reproduce.sh (tail):

+ fstrim -v /mnt/san
/mnt/san: 120 TiB (131939249291264 bytes) trimmed
+ umount -vvv /mnt/san
umount: /mnt/san unmounted
+ sync
+ mount -vvv /dev/disk/by-path/pci-0000:02:00.0-fc-0x9999999999999999-lun-0 /mnt/san
mount: mount /dev/sde on /mnt/san failed: Structure needs cleaning
+ fail '3rd mount failed'
+ echo 'ERROR: 3rd mount failed'
ERROR: 3rd mount failed
+ exit 11

OUTPUT dmesg:

[13568.741492] XFS (sde): metadata I/O error: block 0x39fffffc70 ("xfs_trans_read_buf_map") error 74 numblks 8
[13568.891639] XFS (sde): Metadata CRC error detected at xfs_agi_read_verify+0x5e/0x110 [xfs], xfs_agi block 0x3a7ffffc68
[13569.033899] XFS (sde): Unmount and run xfs_repair
[13569.103826] XFS (sde): First 64 bytes of corrupted metadata buffer:
[13569.174222] ffff9107a8836000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[13569.314629] ffff9107a8836010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[13569.454924] ffff9107a8836020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[13569.595204] ffff9107a8836030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................


Since there is no fstab entry, the mount options are the kernel defaults. In this case, mount reported the following:

xfs (rw,relatime,attr2,inode64,noquota)

The kernel for this run:

$ uname -a
Linux xxxxxxx 4.11.0-041100rc8-generic #201704232131 SMP Mon Apr 24 01:32:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Anonymous (anonymous654) on 2017-05-10
tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged