Ubuntu

giant IO delays on unmount

Reported by Brian Bloniarz on 2010-05-24
42
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Tim Gardner
Lucid
Medium
Stefan Bader
Maverick
Medium
Tim Gardner

Bug Description

On a fresh lucid install, an unmount of an empty tmpfs mount is quite slow:

# mkdir -p /tmp/test; mount -t tmpfs nodev /tmp/test; time umount /tmp/test
real 0m0.318s
user 0m0.000s
sys 0m0.020s

If there is pending I/O on any other filesystem, it can be even slower:
# dd if=/dev/zero of=/tmp/testfile bs=1024 count=1024
# mkdir -p /tmp/test; mount -t tmpfs nodev /tmp/test; time umount /tmp/test
real 0m8.270s
user 0m0.000s
sys 0m0.010s

This is a regression from previous versions, e.g. on hardy:
# uname -r
2.6.24-19-generic
# mkdir -p /tmp/test; mount -t tmpfs nodev /tmp/test; time umount /tmp/test
real 0m0.001s
user 0m0.000s
sys 0m0.000s

The root cause is the workaround patch which fixed bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/543617. It syncs all filesystems when umount is called, not just the involved filesystem, and applies to unmount of non-ext4 FSs. I've verified that the problem goes away with that patch reverted.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 2819 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xfe024000 irq 22'
   Mixer name : 'Realtek ALC882'
   Components : 'HDA:10ec0882,10eca201,00100101'
   Controls : 39
   Simple ctrls : 21
DistroRelease: Ubuntu 10.04
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
LiveMediaBuild: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100429)
Lsusb:
 Bus 002 Device 007: ID 413c:2106 Dell Computer Corp.
 Bus 002 Device 006: ID 413c:3200 Dell Computer Corp. Mouse
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Shuttle Inc SN27V10
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/casper/vmlinuz file=/cdrom/preseed/hostname.seed boot=casper initrd=/casper/initrd.lz quiet splash -- maybe-ubiquity
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-21.32-generic 2.6.32.11+drm33.2
Regression: Yes
RelatedPackageVersions: linux-firmware 1.34
Reproducible: Yes
RfKill:

Tags: lucid filesystem regression-release needs-upstream-testing
Uname: Linux 2.6.32-21-generic x86_64
UserGroups:

dmi.bios.date: 06/27/2007
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: FN27V10
dmi.board.vendor: Shuttle Inc
dmi.chassis.type: 3
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd06/27/2007:svnShuttleInc:pnSN27V10:pvr:rvnShuttleInc:rnFN27V10:rvr:cvn:ct3:cvr:
dmi.product.name: SN27V10
dmi.sys.vendor: Shuttle Inc

Jeremy Foshee (jeremyfoshee) wrote :

Hi Brian,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily-live/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 585092

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected
description: updated

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

The slowdown is still present in the daily livecd as of today (http://cdimage.ubuntu.com/daily-live/current/lucid-desktop-amd64.iso). I've attached apport-collect info from that system.

Brian Bloniarz (brianbloniarz) wrote :

I took a crack at backporting the patches from the upstream buglink.

I cherrypicked the three patches I mentioned above, and reverted http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-lucid.git;a=commit;h=5e1941884c700b7b97bcad52c2d8212ac56a7ebc (the workaround from 543617). The backports are attached: 2 of the cherrypicks were not clean and I'm not exactly an FS expert, but it works for me.

Changed in linux (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → Medium
tags: added: kernel-fs kernel-needs-review patch
removed: needs-kernel-logs
Andy Whitcroft (apw) on 2010-05-26
tags: added: kernel-candidate kernel-reviewed
removed: kernel-needs-review
Andy Whitcroft (apw) wrote :

The patches referred to above have just been reverted in upstream as they seem to introduce regressions including hangs.

Andy Whitcroft (apw) on 2010-06-07
tags: removed: kernel-candidate
Brian Bloniarz (brianbloniarz) wrote :

Here's the discussion that let up to it being reverted: http://kerneltrap.org/mailarchive/linux-fsdevel/2010/5/26/6885091

Brian Bloniarz (brianbloniarz) wrote :

The upstream devs have a second attempt at fixing the problem, which supposedly fixes the hangs: http://lkml.org/lkml/2010/6/10/175

It hasn't been merged for 2.6.35 yet as of 2010-06-15.

Kees Cook (kees) wrote :

Ran head-long into this on maverick's -6 kernel. dpkg just hung in the middle of writes. I think we need to get http://lkml.org/lkml/2010/6/10/175 in ASAP.

Kees Cook (kees) on 2010-06-25
summary: - tmpfs umount slowdown
+ giant IO delays

Kees, Brian,

I saw those going mainline now. To have them in Lucid, there are 4 more patches required. Luckily only one of those looks to have a somewhat bigger impact. But its still a bunch of 12 patches altogether. I made some Lucid kernels with those (I tricked a bit because normally those are ABI bumpers, hopefully that does not prevent tests). It would be good to have that well tested before proceeding. Thanks.

Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :

I saw Brian's mail which may require more work on that. Meanwhile I made a v2 series of test kernel because v1 still had the work around in there which caused some of this. While waiting for answers from Christoph, maybe those may or may not work...

Brian Bloniarz (brianbloniarz) wrote :

Yeah, an additional patch is needed. Christoph had coded up fixes which had been scheduled for 2.6.36, and there is some talk about just taking those for 2.6.35. Discussion is ongoing and Linus hasn't merged anything yet; the patches in question are http://lkml.org/lkml/2010/7/6/31.

Stefan, I had luck with your v2 kernel, thanks for building it! I'll update the bug if I notice anything happening upstream.

Stefan Bader (smb) wrote :

And so upstream gave us three more patches which have been put into v4 (plus one little fix up that was needed in NFS). I placed them to the usual place. So if you (Kees, Brian) can play around with those and see whether things seem better that way...

Brian Bloniarz (brianbloniarz) wrote :

v4 works great for me so far!

I tested the v4 and it works perfectly. While the 2.6.32-23-generic was taking more than 12 seconds to accomplish an umount loop, the v4 takes less than 100ms. Great ! Thanks.

Please read "2.6.32-22-generic" in comment #34.

Nathan Grennan (ngrennan) wrote :

  A new kernel update for Lucid has been released, 2.6.32-24.38. I have servers using, 2.6.32-23.38~lp585092v4. Are the v4 patches in 2.6.32-24.38, or do we need a 2.6.32-24.38~lp585092v4?

Stefan Bader (smb) wrote :

Added test kernels for the latest updates. The series of patches is not included automatically, yet as I want to get some upstream feedback on the backport and potentially submit that to upstream stable.

Tim Gardner (timg-tpi) wrote :

You can also try an LTS backported 2.6.35 kernel from http://ppa.launchpad.net/kernel-ppa/ppa/ubuntu

Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
status: Triaged → In Progress
Olivier Gagnon (procule) wrote :

I've been dealing with this IO problem since a LONG time (months).

I've tried kernel 2.6.32-24.39~spcbug543617a and 2.6.32-23.38~lp585092v4. The problem was still there.
I then installed the 2.6.35 kernel that Tim Gardner in #38 shared and now it at last works normally !!!!

Everytime I had IO on the harddrive, it was slowing/freezing my computer. Doing an apt-get upgrade was almost non possible with too much packages. Now with 2.6.35 I see almost no IO wait. I even compiled a version of Firefox for 20 minutes without any lagging, something I could not even think to do before.

Thanks !!

Stefan Bader (smb) on 2010-08-04
Changed in linux (Ubuntu Lucid):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → Medium
status: New → In Progress
Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → Tim Gardner (timg-tpi)
status: In Progress → Triaged
Stefan Bader (smb) wrote :

Glad the Maverick kernel helps in your case. Though this is not the same problems as the initially reported one which was a long wait time on unmount. AFAIK the writeback code in 2.6.32 had another issue where an active writer might starve other IO under certain conditions. But this should probably get tracked in a separate bug. And I guess this bug should get renamed to address the issue more specifically.

summary: - giant IO delays
+ giant IO delays on unmount
Tim Gardner (timg-tpi) on 2010-08-04
Changed in linux (Ubuntu Maverick):
status: Triaged → Fix Released
Olivier Gagnon (procule) wrote :

@Stephan Bader: from my research, the *two* bugs are related to the cleaning of the dirty inodes. Umounting a ext4 filesystem with dirty inodes will generate a lot of IO since the journal is mixed up. The same dirty inodes problem exist when there is a lot of IO since jdb2 tries to keep the journal up to date while there are errors. Umounting is just the same process but done in a single operation for the whole filesystem.

The problem with unmount which the patches address and about which the report
here was (as far as I understood it) was not so much the number of IO generated
(though it makes things worse) but the fact that the sync on unmount had changed
from being asynchronous to synchronous. That means every single block pending to
be flushed is written and waited for. So no merging of IO is and can be done
which makes things really slow.

The description of the other patch I was seeing described a problem in
writeback, where it could happen that flush tried to write out buffers while the
producer would add more data and by that would prevent other things to get written.

The problem you describe seems to be even a third thing which rather sounds like
a problem in ext4/jdb2 than the generic writeback code (at least from reading
the description).

Stefan Bader (smb) wrote :

Backported writeback changes now in the repo. Until proposed is ready to upload the new kernel, it is only availble int the pre-proposed PPA (https://launchpad.net/~kernel-ppa/+archive/pre-proposed)

Changed in linux (Ubuntu Lucid):
status: In Progress → Fix Committed
Steve Conklin (sconklin) wrote :

SRU Justification

Impact: The upstream process for stable tree updates is quite similar in scope to the Ubuntu SRU process, e.g., each patch has to demonstrably fix a bug, and each patch is vetted by upstream by originating either directly from Linus' tree or in a minimally backported form of that patch.

The 2.6.32.21 upstream stable patch set is now available and contains fixes for this problem. It should be included in the Ubuntu kernel as well.

Related commits in the Lucid repo are:

a192aa1 (pre-stable) Fix compiling NFS when backporting writeback
d85d9ae (pre-stable) writeback: simplify the write back thread queue
aae03dc (pre-stable) writeback: split writeback_inodes_wb
1c12fa6 (pre-stable) writeback: remove writeback_inodes_wbc
06ff716 (pre-stable) writeback: fix pin_sb_for_writeback
43fa64c (pre-stable) writeback: add missing requeue_io in writeback_inodes_wb
4d214a5 (pre-stable) writeback: simplify and split bdi_start_writeback
4be0607 (pre-stable) writeback: simplify wakeup_flusher_threads
899d625 (pre-stable) writeback: fix writeback_inodes_wb from writeback_inodes_sb
683e9dd (pre-stable) writeback: enforce s_umount locking in writeback_inodes_sb
b3bf7c2 (pre-stable) writeback: queue work on stack in writeback_inodes_sb
5727b1b (pre-stable) writeback: fix writeback completion notifications
fb743c6 (pre-stable) writeback: add missing kernel-doc notation
08e2c5a (pre-stable) vfs: improve writeback_inodes_wb()
47dd2f7 (pre-stable) writeback: remove unused nonblocking and congestion checks
72201eb (pre-stable) writeback: remove the always false bdi_cap_writeback_dirty() test

Changed in linux (Ubuntu Lucid):
milestone: none → lucid-updates

Accepted linux into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed

17:26
$ grep Dirty /proc/meminfo
Dirty: 464116 kB
17:27
$ grep Dirty /proc/meminfo
Dirty: 518628 kB
17:28
$ grep Dirty /proc/meminfo
Dirty: 705216 kB

gets to around 1124256 then levels off

$ uname -a
Linux zirconium 2.6.32-25-generic #43-Ubuntu SMP Wed Sep 1 09:46:13 UTC 2010 x86_64 GNU/Linux

running dpkg and rdiff-backup in restore mode simultaneously

File system is xfs not ext4. rdiff-backup finished. Dirty keeps gradually dropping but strangely free space is increasing not dropping. I'd expect as data gets flushed out it would take up more disk space not less.

The
INFO: task dpkg:11727 blocked for more than 120 seconds.
messages stopped but dpkg is still stuck with 0:01 sec of CPU time.

The attached file shows things back when rdiff-backup was running.

dpkg resumed and finished up as if nothing strange had happened.

Steve Conklin (sconklin) wrote :

Could someone (the original reporter?) please test this on the proposed kernel and report on whether it is resolved? See comment #45 for information about how to enable the proposed kernel.

Kees Cook (kees) wrote :

Isn't this bug just a duplicate of bug 543617 which has also been fixed and verified?

Stefan Bader (smb) wrote :

Yes, both reports are at least closely related or even duplicates. Based on the feedback in the other bug report I would therefor go ahead and mark this verified, too.

tags: added: verification-done
removed: verification-needed
Roman Shtylman (shtylman) wrote :

I tried the new proposed kernel and the problem no longer exists for me.

Launchpad Janitor (janitor) wrote :
Download full text (43.7 KiB)

This bug was fixed in the package linux - 2.6.32-25.44

---------------
linux (2.6.32-25.44) lucid-proposed; urgency=low

  [ Brad Figg ]

  * SAUCE: (no-up) Modularize vesafb -- fix initialization
    - LP: #611471

  [ Stefan Bader ]

  * Revert "SAUCE: sync before umount to reduce time taken by ext4 umount"
    - LP: #543617, #585092

  [ Steve Conklin ]

  * Revert "SAUCE: tulip: Let dmfe handle davicom on non-sparc"
    - LP: #607824

  [ Tim Gardner ]

  * [Config] Added ums-cypress to udeb
    - LP: #576066

  [ Upstream Kernel Changes ]

  * Revert "PCI quirk: Disable MSI on VIA K8T890 systems"
    - LP: #607824
  * Revert "PCI quirks: disable msi on AMD rs4xx internal gfx bridges"
    - LP: #607824
  * Revert "(pre-stable) Input: psmouse - reset all types of mice before
    reconnecting"
    - LP: #607824
  * Revert "jbd: jbd-debug and jbd2-debug should be writable"
    - LP: #607824
  * Revert "ext4: Make fsync sync new parent directories in no-journal
    mode"
    - LP: #615548
  * Revert "ext4: Fix compat EXT4_IOC_ADD_GROUP"
    - LP: #615548
  * Revert "ext4: Conditionally define compat ioctl numbers"
    - LP: #615548
  * Revert "ext4: restart ext4_ext_remove_space() after transaction
    restart"
    - LP: #615548
  * Revert "ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted"
    - LP: #615548
  * Revert "ext4: Avoid crashing on NULL ptr dereference on a filesystem
    error"
    - LP: #615548
  * Revert "ext4: Use bitops to read/modify i_flags in struct
    ext4_inode_info"
    - LP: #615548
  * Revert "ext4: Show journal_checksum option"
    - LP: #615548
  * Revert "ext4: check for a good block group before loading buddy pages"
    - LP: #615548
  * Revert "ext4: Prevent creation of files larger than RLIMIT_FSIZE using
    fallocate"
    - LP: #615548
  * Revert "ext4: Remove extraneous newlines in ext4_msg() calls"
    - LP: #615548
  * Revert "ext4: init statistics after journal recovery"
    - LP: #615548
  * Revert "ext4: clean up inode bitmaps manipulation in ext4_free_inode"
    - LP: #615548
  * Revert "ext4: Do not zero out uninitialized extents beyond i_size"
    - LP: #615548
  * Revert "ext4: don't scan/accumulate more pages than mballoc will
    allocate"
    - LP: #615548
  * Revert "ext4: stop issuing discards if not supported by device"
    - LP: #615548
  * Revert "ext4: check s_log_groups_per_flex in online resize code"
    - LP: #615548
  * Revert "ext4: fix quota accounting in case of fallocate"
    - LP: #615548
  * Revert "ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat mode"
    - LP: #615548
  * Revert "ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy()"
    - LP: #615548
  * Revert "ext4: Remove unnecessary call to ext4_get_group_desc() in
    mballoc"
    - LP: #615548
  * Revert "ext4: fix memory leaks in error path handling of
    ext4_ext_zeroout()"
    - LP: #615548
  * Revert "ext4: check missed return value in ext4_sync_file()"
    - LP: #615548
  * Revert "ext4: Issue the discard operation *before* releasing the blocks
    to be reused"
    - LP: #615548
  * Revert "ext4: Fix buffer head leaks after calls to
    ext4_get_inode_loc()"
    - LP: #615548
  * Revert "ex...

Changed in linux (Ubuntu Lucid):
status: Fix Committed → Fix Released

Hello all,

The fix released to address this bug's issue unfortunately leads to problem for those who use the Lustre filesystem (see http://www.lustre.org):
 - kernel hangs when attempting to mount a Lustre filesystem with the wrong "NID" (which fortunately should not happen in normal operating conditions)
 - *huge* amount of "warn_slowpath" messages in the kernel logs as soon as files are created or stat-ed on the mounted Lustre filesystem(s), resulting in degraded performances (~30%) and very fast log clogging (kern.log file growing to 1.5GB during a single bonnie++ test of 30 minutes); see attached file for /var/log/kern.log exerpts

In order to fix those issues, we had to revert the commits mentioned in comment 44 ; more precisely:
# git clone git://kernel.ubuntu.com/ubuntu/ubuntu-lucid.git
# cd ubuntu-lucid/
# git diff -R fb743c6...a192aa1 > 2.6.32-28-writeback-revert.patch
(and apply the patch to the Ubuntu/Lucid Linux kernel source package before building new kernel packages)

I hope this may help those unfortunates who stumble on the same problem.

Cheers

To post a comment you must log in.