Ubuntu
linux package

unmounting NTFS causes mount.ntfs process to get stuck in eternal kernel deep sleep

Bug #1398859 reported by Martin Pitt on 2014-12-03

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Confirmed	High	Unassigned
	udisks2 (Ubuntu)	Fix Released	High	Martin Pitt

Bug Description

Since Mid-November the udisks2 test (https://jenkins.qa.ubuntu.com/job/vivid-adt-udisks2/? shows it happened between Nov 13 and 19) exhibits a regression with NTFS: Unmounting causes the ntfs-3g process to go into kernel deep sleep and umount hangs forever.

This can be reproduced with

  sudo apt install ntfs-3g
  sudo modprobe scsi_debug
  # check dmesg which drive this created; e. g. /dev/sda
  sudo mkntfs -F /dev/sda
  sudo mount /dev/sda /mnt

Now you have a process which is happy:
root 12821 0.0 0.0 15272 1948 ? Ss 15:49 0:00 /sbin/mount.ntfs /dev/sda /mnt -o rw

You can work with /mnt, edit files, etc. But now unmount:

sudo umount /mnt

This never returns, and the process goes into uninterruptible kernel sleep ("D"):

root 12928 0.0 0.0 15272 1948 ? Ds 15:55 0:00 /sbin/mount.ntfs /dev/sda /mnt -o rw

The strace for the umount until the hang is

ubuntu@autopkgtest:~$ sudo strace -vvtts1024 -p 12928
Process 12928 attached
15:56:17.938299 read(4, "0\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0", 135168) = 48
15:56:22.338931 read(4, "(\0\0\0&\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 135168) = 40
15:56:22.339637 sendto(5, "<29>Dec 3 15:56:22 ntfs-3g[12928]: Unmounting /dev/sda ()\n", 59, MSG_NOSIGNAL, NULL, 0) = 59
15:56:22.340509 fsync(3) = 0
15:56:22.341188 fcntl(3, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
15:56:22.341759 close(3

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: ntfs-3g 1:2013.1.13AR.1-2ubuntu3
ProcVersionSignature: Ubuntu 3.16.0-25.33-generic 3.16.7
Uname: Linux 3.16.0-25-generic x86_64
ApportVersion: 2.15-0ubuntu1
Architecture: amd64
CurrentDesktop: Unity
Date: Wed Dec 3 15:48:12 2014
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-11-20 (13 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Alpha amd64 (20141119)
SourcePackage: ntfs-3g
UpgradeStatus: No upgrade log present (probably fresh install)

Tags:

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-03:

Dependencies.txt Edit (2.2 KiB, text/plain; charset="utf-8")
ProcEnviron.txt Edit (103 bytes, text/plain; charset="utf-8")

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-03:

- ntfs-3g itself did not change since utopic.
- The hang happens with utopic's 3.16.0-23-generic and the previous 3.16.0-24-generic kernel too, so not a kernel regression.
- The hang happens with libc{6,-bin} 2.19-10ubuntu2, so not a glibc regression
- The hang happens with systemd+udev 215-5ubuntu2 (which we had from Nov 03 on), so not an udev regression either.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-03:

It works again with udev from utopic (208-8ubuntu8.1). The first vivid version (215-5ubuntu1) hangs. Curiously the current libudev1 from vivid works fine.

affects:	ntfs-3g (Ubuntu) → systemd (Ubuntu)
Changed in systemd (Ubuntu):
assignee:	nobody → Martin Pitt (pitti)
status:	New → Triaged
summary:	- [regression] unmounting NTFS causes mount.ntfs process to get stuck in - eternal kernel deep sleep + [udev regression] unmounting NTFS causes mount.ntfs process to get stuck + in eternal kernel deep sleep
Changed in systemd (Ubuntu):
importance:	Undecided → High

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-04: Re: [udev regression] unmounting NTFS causes mount.ntfs process to get stuck in eternal kernel deep sleep

It's enough to downgrade udev before the "umount /mnt", everything else works with 215. Conversely, mkfs/mount with 208, upgrading to udev 215, and umount then hangs. stracing umount /mnt indeed hangs forever in the umount() call.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-04:

This works when stopping udevd. When it's running, a worker gets stuck:

root 25350 0.0 0.2 41212 2060 ? Ds 11:14 0:00 /lib/systemd/systemd-udevd --daemon

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-04:

Bisected to http://cgit.freedesktop.org/systemd/systemd/commit/?id=e9fc29f4ecc9509cc

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-04:

Correction, previous bisect apparently was unclean. I clean/rebuild everything now, and it's pointing to http://cgit.freedesktop.org/systemd/systemd/commit/?id=02ba8fb3357daf5 . Reverting this indeed fixes the problem.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-05:

Curiously enough, this does not happen on Debian sid nor Fedora, even though I can reproduce it just fine with udevd built straight out of upstream master on ubuntu. So it seems something else gets in the way then (different kernel?)

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-05:

bisecting script Edit (1.3 KiB, text/x-sh)

Just in case I need it again, this is the git bisect run script that I used.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-17:

#10

This works fine on Debian and Fedora, with the same udev upstream version. I verified that the hang still happens with Debian's kernel or Debian's udev+libudev1 packages. If I install just the systemd binary package from Debian, the hang goes away. More specifically, installing Debian's /lib/systemd/systemd as /lib/systemd/systemd-debian and booting that (with init) also makes the bug go away. So something in our Ubuntu patches causes this.

Martin Pitt (pitti) on 2014-12-17

summary:

- [udev regression] unmounting NTFS causes mount.ntfs process to get stuck
- in eternal kernel deep sleep
+ unmounting NTFS causes mount.ntfs process to get stuck in eternal kernel
+ deep sleep

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-17:

#11

So this is interesting. Debian's /lib/systemd/systemd works, but the hang happens under upstart too (where that binary isn't being used!). So it must be something which we change in Ubuntu's systemd, but conceptually also do for upstart/cgmanager. The first candidate that I see is to put processes into all cgroups for session LXC, I'll verify that.

Martin Pitt (pitti) on 2014-12-17

Changed in systemd (Ubuntu):
milestone:	none → ubuntu-15.01
status:	Triaged → In Progress

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-17:

#12

Indeed reverting core-Put-session-scopes-into-all-cgroup-controllers.patch fixes the hang. So putting processes into all cgroup controllers somehow interacts badly with ntfs-3g, and reverting the above udev commit somehow fixes that.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-17:

#13

testntfs fake source package for autopkgtest Edit (10.0 KiB, application/x-tar)

For completeness, this is the testntfs fake source package containing an autopkgtest for this bug. That's what I used for bisecting.

Changed in systemd (Ubuntu):
status:	In Progress → Fix Committed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2014-12-18:

#14

This bug was fixed in the package systemd - 218-2ubuntu3

---------------
systemd (218-2ubuntu3) vivid; urgency=medium

  * core-Put-session-scopes-into-all-cgroup-controllers.patch: Revert upstream
    commit 02ba8fb3357d to fix eternal kernel deep sleep hangs when unmounting
    an ntfs-3g partition. (LP: #1398859)
-- Martin Pitt <email address hidden> Thu, 18 Dec 2014 13:32:49 +0100

Changed in systemd (Ubuntu):
status:	Fix Committed → Fix Released

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-29:

#15

This caused bug 1404143 so I reverted this change.

Changed in systemd (Ubuntu):
status:	Fix Released → In Progress

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-29:

#16

It is enough to move the shell process (under which umount runs) out of the logind cgroup user.slice/user-1000.slice/session-1.scope into the / cgroup in the cpu,cpuacct controller:

echo $$ | sudo tee /sys/fs/cgroup/cpu,cpuacct/tasks

All other cgroup controllers work fine.

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-30:

#17

I'll fix this in udisks2's test suite instead.

affects:

systemd (Ubuntu) → udisks2 (Ubuntu)

Martin Pitt (pitti) on 2014-12-30

Changed in udisks2 (Ubuntu):
status:	In Progress → Fix Committed

Revision history for this message

Phillip Susi (psusi) wrote on 2014-12-30:

#18

Isn't there still an underlying kernel bug that should be fixed here?

Revision history for this message

Martin Pitt (pitti) wrote on 2014-12-30:

#19

Quite possibly,but at this point I don't quite understand this any more, at least not yet with the level of debugging that I did so far. :-(

Revision history for this message

Phillip Susi (psusi) wrote on 2014-12-31:

#20

A stack trace of the blocked process might help, maybe magic-sysrq-w when it happens?

Martin Pitt (pitti) on 2015-01-01

Changed in udisks2 (Ubuntu):
status:	Fix Committed → In Progress

Revision history for this message

Martin Pitt (pitti) wrote on 2015-01-02:

#21

udevd trace for umount Edit (5.2 KiB, text/plain)

strace of udevd while it's hanging:

10134 open("/dev/sda", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 11
10134 flock(11, LOCK_EX|LOCK_NB) = 0
10134 ioctl(11, BLKRRPART, 0

Corresponding kernel trace:

[ 720.228114] INFO: task systemd-udevd:10134 blocked for more than 120 seconds.
[ 720.230656] Not tainted 3.16.0-28-generic #38-Ubuntu
[ 720.232666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 720.236283] systemd-udevd D ffff88007fc13840 0 10134 1 0x00000007
[ 720.236293] ffff88007b76fcd0 0000000000000086 ffff880037641e90 0000000000013840
[ 720.236298] ffff88007b76ffd8 0000000000013840 ffff880037641e90 ffff880037641e90
[ 720.236302] ffff8800371c5068 ffff8800371c5080 ffff88007be723c0 0000000000000000
[ 720.236306] Call Trace:
[ 720.236320] [<ffffffff81785049>] schedule+0x29/0x70
[ 720.236326] [<ffffffff817888b5>] rwsem_down_read_failed+0xf5/0x190
[ 720.236334] [<ffffffff811fc743>] ? find_inode+0xa3/0xb0
[ 720.236341] [<ffffffff813a9284>] call_rwsem_down_read_failed+0x14/0x30
[ 720.236346] [<ffffffff817881e0>] ? down_read+0x20/0x30
[ 720.236351] [<ffffffff811e3db7>] get_super+0x97/0xe0
[ 720.236357] [<ffffffff812197a4>] fsync_bdev+0x14/0x60
[ 720.236363] [<ffffffff8137b604>] invalidate_partition+0x24/0x50
[ 720.236369] [<ffffffff8137e3cb>] drop_partitions.isra.12+0x1b/0x40
[ 720.236373] [<ffffffff8137e88e>] rescan_partitions+0x3e/0x2a0
[ 720.236377] [<ffffffff8137a4dc>] blkdev_ioctl+0x71c/0x7c0
[ 720.236382] [<ffffffff81218e3d>] block_ioctl+0x3d/0x50
[ 720.236386] [<ffffffff811f5138>] do_vfs_ioctl+0x2c8/0x4a0
[ 720.236393] [<ffffffff81082f30>] ? ptrace_notify+0x60/0x110
[ 720.236397] [<ffffffff811f5391>] SyS_ioctl+0x81/0xa0
[ 720.236402] [<ffffffff8178a0ff>] tracesys+0xe1/0xe6

Martin Pitt (pitti) on 2015-01-02

Changed in udisks2 (Ubuntu):
status:	In Progress → Fix Committed

Revision history for this message

Launchpad Janitor (janitor) wrote on 2015-01-02:

#22

This bug was fixed in the package udisks2 - 2.1.4-1git2

---------------
udisks2 (2.1.4-1git2) vivid; urgency=medium

* debian/tests/upstream-system: Work around eternal udev BLKRRPART ioctl
hang with ntfs-3g (LP: #1398859)
-- Martin Pitt <email address hidden> Fri, 02 Jan 2015 17:20:05 +0100

Changed in udisks2 (Ubuntu):
status:	Fix Committed → Fix Released

Revision history for this message

Brad Figg (brad-figg) wrote on 2015-01-05: Missing required logs.

#23

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1398859

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Phillip Susi (psusi) on 2015-01-05

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed
tags:	added: bot-stop-nagging

Alberto Salvia Novella (es20490446e) on 2015-01-05

Changed in linux (Ubuntu):
importance:	Undecided → High

Revision history for this message

Martin Pitt (pitti) wrote on 2016-02-29:

#24

The kernel issue still happens (see http://autopkgtest.ubuntu.com/packages/u/udisks2/xenial/amd64/), so the workaround needs to come back.

Changed in udisks2 (Ubuntu):
milestone:	ubuntu-15.01 → ubuntu-16.03
status:	Fix Released → In Progress

Revision history for this message

Martin Pitt (pitti) wrote on 2016-02-29:

#25

The workaround doesn't work any more, and the test does not actually depend on ntfs-3g any more, so this must rather be fixed in our base images.

Changed in udisks2 (Ubuntu):
milestone:	ubuntu-16.03 → none
status:	In Progress → Fix Released

Revision history for this message

Martin Pitt (pitti) wrote on 2016-11-25:

#26

I also see this with exfat, so it rather seems to be related to fuse (as both ntfs-3g and exfat use fuse). Curiously I do not see this when running the tests on my laptop or on the Scalingstack cloud infra, just when I run them in QEMU.

Revision history for this message

Martin Pitt (pitti) wrote on 2016-11-25:

#27

Download full text (3.7 KiB)

> Curiously I do not see this when running the tests on my laptop

And of course in the minute I write this it does happen again (I ran the test maybe 20 times today, and only just now it happened):

[11359.569157] INFO: task systemd-udevd:341 blocked for more than 120 seconds.
[11359.569163] Not tainted 4.8.0-29-generic #31~lp1626436ProposedWithTwoPatches
[11359.569165] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11359.569167] systemd-udevd D ffffa31788e77ba8 0 341 1 0x00000100
[11359.569173] ffffa31788e77ba8 00ffffff84c71a68 ffffa31499670000 ffffa31788e82ac0
[11359.569177] 0000000088e77db0 ffffa31788e78000 ffffa317893bc070 ffffa317893bc088
[11359.569179] ffffa31788e77be0 ffffa3149cf771c0 ffffa31788e77bc0 ffffffff85296905
[11359.569182] Call Trace:
[11359.569190] [<ffffffff85296905>] schedule+0x35/0x80
[11359.569193] [<ffffffff85299543>] rwsem_down_read_failed+0x103/0x160
[11359.569197] [<ffffffff84e3eb98>] call_rwsem_down_read_failed+0x18/0x30
[11359.569200] [<ffffffff85298da0>] down_read+0x20/0x40
[11359.569202] [<ffffffff84c35e15>] get_super.part.7+0x95/0xd0
[11359.569204] [<ffffffff84c35e76>] get_super+0x26/0x30
[11359.569208] [<ffffffff84c70f98>] fsync_bdev+0x18/0x60
[11359.569211] [<ffffffff84e0f064>] invalidate_partition+0x24/0x50
[11359.569213] [<ffffffff84e12243>] rescan_partitions+0x53/0x2b0
[11359.569216] [<ffffffff84d728d8>] ? security_capable+0x48/0x60
[11359.569218] [<ffffffff84e0d0d5>] __blkdev_reread_part+0x65/0x70
[11359.569220] [<ffffffff84e0d103>] blkdev_reread_part+0x23/0x40
[11359.569222] [<ffffffff84e0db25>] blkdev_ioctl+0x425/0x910
[11359.569225] [<ffffffff84b43907>] ? __seccomp_filter+0x67/0x260
[11359.569228] [<ffffffff84c7090d>] block_ioctl+0x3d/0x50
[11359.569231] [<ffffffff84c47883>] do_vfs_ioctl+0xa3/0x610
[11359.569233] [<ffffffff84b441b3>] ? __secure_computing+0x43/0xe0
[11359.569236] [<ffffffff84a033ae>] ? syscall_trace_enter+0xce/0x2f0
[11359.569239] [<ffffffff84c47e69>] SyS_ioctl+0x79/0x90
[11359.569241] [<ffffffff84a03b9e>] do_syscall_64+0x5e/0xc0
[11359.569244] [<ffffffff8529b165>] entry_SYSCALL64_slow_path+0x25/0x25
[11359.569320] INFO: task mount.ntfs:15670 blocked for more than 120 seconds.
[11359.569323] Not tainted 4.8.0-29-generic #31~lp1626436ProposedWithTwoPatches
[11359.569324] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11359.569325] mount.ntfs D ffffa314d0d1bd68 0 15670 5268 0x00000000
[11359.569329] ffffa314d0d1bd68 0000000100000003 ffffa3178bef9c80 ffffa31499670000
[11359.569332] 000000003d862ecf ffffa314d0d1c000 ffffa3149cf771dc ffffa31499670000
[11359.569334] 00000000ffffffff ffffa3149cf771e0 ffffa314d0d1bd80 ffffffff85296905
[11359.569337] Call Trace:
[11359.569340] [<ffffffff85296905>] schedule+0x35/0x80
[11359.569342] [<ffffffff85296bae>] schedule_preempt_disabled+0xe/0x10
[11359.569344] [<ffffffff852987e9>] __mutex_lock_slowpath+0xb9/0x130
[11359.569347] [<ffffffff8529887f>] mutex_lock+0x1f/0x30
[11359.569350] [<ffffffff84c71440>] blkdev_put+0x20/0x110
[11359.569352] [<ffffffff84c71564>] blkdev_close+0x34/0x70
[11359.569354] [<ffffffff84c34f77>] __fput+0xe7/...

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntulinux package

unmounting NTFS causes mount.ntfs process to get stuck in eternal kernel deep sleep

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
linux package