fstrim and discard operations take too long to complete

Bug #1756311 reported by Alexandre Makoto Tanno on 2018-03-16
This bug affects 5 people
Affects: linux-aws (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

1-) Ubuntu Release : Ubuntu 14.04.5 LTS

2-) linux-image-3.13.0-143-generic and linux-image-4.4.0-1014-aws

3-) mkfs.xfs and fstrim -v on a raid0 array built with nvme and md should not take more than a few seconds to complete.

4-) Formatting the raid0 array with xfs took around 2 hours. Running fstrim -v on a filesystem mounted on the raid array also took around 2 hours.

How to reproduce the issue:

- Launch an i3.4xlarge instance on Amazon AWS using an Ubuntu 14.04.5 AMI ( ami-78d2be01 on EU-WEST-1 ). This creates an instance with one 8 GB EBS root volume and two 1.9 TB SSD drives that are presented to the instance through the nvme driver.
- Create a raid0 array with the following command :

 # mdadm --create --verbose --level=0 /dev/md0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

- Formatting the raid0 array ( /dev/md0 ) with xfs takes around 2 hours to complete. With other AMIs such as RHEL7, CentOS7 and Ubuntu 18.04, it took around 2 seconds.

root@ip-172-31-30-133:~# time mkfs.xfs /dev/md0

real 120m45.725s
user 0m0.000s
sys 0m18.248s

- Running fstrim -v on a filesystem mounted on top of /dev/md0 can take around 2 hours to complete. With other AMIs such as RHEL7, CentOS7 and Ubuntu 18.04, it took around 2 seconds.

- When I try the same with any of the nvme SSD devices alone, let's say /dev/nvme0n1, the issue doesn't happen.

- I also tried to reproduce the issue using LVM striping instead of md. In that setup, both mkfs.xfs and fstrim complete within seconds :

root@ip-172-31-27-69:~# pvcreate /dev/nvme0n1
  Physical volume "/dev/nvme0n1" successfully created

root@ip-172-31-27-69:~# pvcreate /dev/nvme1n1
  Physical volume "/dev/nvme1n1" successfully created

root@ip-172-31-27-69:~# vgcreate raid0 /dev/nvme0n1 /dev/nvme1n1
  Volume group "raid0" successfully created

root@ip-172-31-27-69:~# lvcreate --type striped --stripes 2 --extents 100%FREE raid0 /dev/nvme0n1 /dev/nvme1n1
  Using default stripesize 64.00 KiB.
  Logical volume "lvol0" created.

root@ip-172-31-27-69:~# vgchange -ay
  1 logical volume(s) in volume group "raid0" now active

root@ip-172-31-27-69:~# lvchange -ay /dev/raid0/lvol0

root@ip-172-31-27-69:~# lvs -a /dev/raid0/lvol0
  LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  lvol0 raid0 -wi-a----- 3.46t
root@ip-172-31-27-69:~# time mkfs.xfs /dev/raid0/lvol0
meta-data=/dev/raid0/lvol0 isize=512 agcount=32, agsize=28991664 blks
         = sectsz=512 attr=2, projid32bit=1
         = crc=1 finobt=1, sparse=0
data = bsize=4096 blocks=927733248, imaxpct=5
         = sunit=16 swidth=32 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=453008, version=2
         = sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0

real 0m2.926s
user 0m0.180s
sys 0m0.000s

root@ip-172-31-27-69:~# mount /dev/raid0/lvol0 /mnt

root@ip-172-31-27-69:~# time fstrim -v /mnt
/mnt: 3.5 TiB (3798138650624 bytes) trimmed

real 0m1.794s
user 0m0.000s
sys 0m0.000s

So the issue only happens when using nvme and md to compose the raid0 array.
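
A possible next diagnostic step (not something that was run for this report) is to compare the discard limits the kernel advertises for the md device and for its member devices. The following is a minimal sketch, assuming the standard sysfs queue attributes discard_granularity and discard_max_bytes are present on the running kernel. The function name show_discard_limits and its sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tool): print the discard
# limits the kernel exposes in sysfs for each named block device.
# Usage: show_discard_limits SYSFS_ROOT DEV...
show_discard_limits() {
    root=$1
    shift
    for dev in "$@"; do
        q="$root/block/$dev/queue"
        if [ -r "$q/discard_max_bytes" ]; then
            printf '%s granularity=%s max_bytes=%s\n' "$dev" \
                "$(cat "$q/discard_granularity")" \
                "$(cat "$q/discard_max_bytes")"
        else
            printf '%s no discard information found\n' "$dev"
        fi
    done
}

# Example call for the setup described in this report:
# show_discard_limits /sys md0 nvme0n1 nvme1n1
```

Whether these values differ between /dev/md0 and the nvme devices on the affected kernels is not stated in this report; the sketch only shows where to look.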

Below is some information that may be useful:

Started formatting the md array with mkfs.xfs. The process appears to hang.

root@ip-172-31-24-66:~# ps aux | grep -i mkfs.xfs
root 1693 12.0 0.0 12728 968 pts/1 D+ 07:54 0:03 mkfs.xfs /dev/md0

PID 1693 is in uninterruptible sleep ( D ).

Looking at /proc/1693/stack:

root@ip-172-31-24-66:~# cat /proc/1693/stack
[<ffffffff8134d8c2>] blkdev_issue_discard+0x232/0x2a0
[<ffffffff813524bd>] blkdev_ioctl+0x61d/0x7d0
[<ffffffff811ff6f1>] block_ioctl+0x41/0x50
[<ffffffff811d89b3>] do_vfs_ioctl+0x2e3/0x4d0
[<ffffffff811d8c21>] SyS_ioctl+0x81/0xa0
[<ffffffff81748030>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff

Judging by the stack, the process appears to be stuck in a discard operation.

root@ip-172-31-24-66:~# ps -flp 1693
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 D root 1693 1682 2 80 0 - 3182 blkdev 07:54 pts/1 00:00:03 mkfs.xfs /dev/md0

root@ip-172-31-24-66:~# cat /proc/1693/wchan
blkdev_issue_discard

The process is stuck in the function blkdev_issue_discard.
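
The manual checks above (ps, /proc/PID/stack, /proc/PID/wchan) can be wrapped in a small loop that lists every process in uninterruptible sleep together with its wait channel. This is a rough sketch, assuming a Linux /proc layout. The name list_d_state and its optional proc-root argument are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: list PIDs in uninterruptible sleep (state D)
# together with the kernel function they are waiting in.
# Usage: list_d_state [PROC_ROOT]   (PROC_ROOT defaults to /proc)
list_d_state() {
    proc=${1:-/proc}
    for dir in "$proc"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        # The status file contains a line such as "State:  D (disk sleep)".
        state=$(awk '/^State:/ {print $2; exit}' "$dir/status")
        [ "$state" = "D" ] || continue
        # wchan may be unreadable or empty; fall back to "unknown".
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        printf '%s %s\n' "${dir##*/}" "${wchan:-unknown}"
    done
}

# Example: list_d_state
```

For the hung mkfs.xfs shown above, a line such as "1693 blkdev_issue_discard" would be expected, matching the wchan output.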

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws (Ubuntu):
status: New → Confirmed

Jan-Jonas Sämann (janjonas) wrote :

This happens to me at seemingly random times; I think it is triggered by the systemd timer. When fstrim is called on the root mount, it instantly hangs the entire system indefinitely. Mouse and keyboard still respond, but anything requiring disk access is no longer accessible.

iotop shows fstrim stuck at 100% IO with 0 KB/s read and 0 KB/s write.

I have a Kingston SHFS37A SSD, / is ext4.
