Mount very slow for 22TB ext4 filesystem on Ubuntu Server 12.04.1 running kernel 3.2.0-30-generic

Bug #1054605 reported by bio.x2y
This bug affects 5 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Joseph Salisbury

Bug Description

Output of command "time sudo mount /dev/sdc1 /mymount":

real 10m46.175s
user 0m0.000s
sys 10m44.488s

Some recent threads on the web indicate that other users have observed this issue on large ext4 file systems, and that it is specific to kernel 3.2.0-30-generic. Those threads also indicate that the issue is resolved by reverting to a previous kernel. Unfortunately, I'm not in a position to confirm this on our server.

In my case, the relevant file system itself is reported as "clean" by fsck.ext4.

Please let me know if I can provide any diagnostic output of interest.

bio.x2y (bio-x2y)
summary: - Mount very slow for 22TB filesystem on Ubuntu Server 12.04.1 running
- kernel 3.2.0-30-generic
+ Mount very slow for 22TB ext4 filesystem on Ubuntu Server 12.04.1
+ running kernel 3.2.0-30-generic
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1054605/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Logan Rosen (logan)
affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1054605

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Andriy Golovnya (andriy-golovnya) wrote :

I can confirm this issue.
It is really annoying because it stalls my NAS for a long time during boot.

root@AGVault:~# time umount /data

real 0m1.031s
user 0m0.000s
sys 0m0.028s
root@AGVault:~# time mount /data

real 3m55.614s
user 0m0.000s
sys 3m37.050s

I'll call apport-collect now...

Andriy Golovnya (andriy-golovnya) wrote :

Oops... I can't collect the information with apport-collect because I'm not the reporter. If you still need it, I can open a new bug and run apport-collect there.

Extra info:

root@AGVault:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
....
/dev/md3 16909250524 8455204556 7595208424 53% /data

root@AGVault:~# uname -a
Linux AGVault 3.2.0-31-generic #50-Ubuntu SMP Fri Sep 7 16:16:45 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

bio.x2y (bio-x2y) wrote :

I understand that apport can collect and transmit potentially sensitive information. I'm afraid this is unacceptable in our environment, so I am unable to run it. Given that another user has confirmed this issue, I've changed the status to 'Confirmed'. If this isn't appropriate, perhaps someone else can file another report.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Andriy Golovnya (andriy-golovnya) wrote :

To the Ubuntu/linux people:

If you still need output of apport-collect I can create a duplicate issue for this purpose.

But the behavior is quite clear to me: large ext4 file systems take an extremely long time to mount. In my case, no disk activity is visible during the mount delay. Most probably it's a Linux kernel issue.

Environment: SW RAID6, 8GB RAM, AMD X4 615e CPU, uname and df are in comment #4.
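In both Andriy's timing and the original report, almost all of the mount's wall-clock ("real") time is "sys" time, i.e. CPU time spent in the kernel. That fits the observation that no disk activity is visible: the mount appears to be CPU-bound in the kernel rather than waiting on I/O. The ratio can be worked out with a small shell sketch; `to_seconds` and `sys_fraction` are hypothetical helper names, and the parser assumes the `XmY.ZZZs` format printed by bash's `time` keyword.

```shell
#!/bin/sh
# Hypothetical helper: convert a bash `time` field such as 3m37.050s into seconds.
# Splitting on 'm' and 's' yields minutes in $1 and seconds in $2.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: fraction of wall-clock time spent as kernel CPU time.
# Usage: sys_fraction REAL SYS
sys_fraction() {
    awk -v s="$(to_seconds "$2")" -v r="$(to_seconds "$1")" \
        'BEGIN { printf "%.3f\n", s / r }'
}

sys_fraction 3m55.614s 3m37.050s    # Andriy's mount timing, prints 0.921
sys_fraction 10m46.175s 10m44.488s  # original report, prints 0.997
```

A value close to 1 means the process was running kernel code for nearly the whole delay.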

Tim Bishop (tdb) wrote :

I'm seeing the same issue on a 13 TB filesystem. I have two identical machines, but one has been updated more recently than the other. The one running 3.2.0-29.46 works fine, whilst the one running 3.2.0-30.48 has long delays when mounting the filesystem.

Happy to provide output of apport-collect if required.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.6 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. Please only remove that one tag and leave the other tags. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc7-quantal/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key needs-upstream-testing
bio.x2y (bio-x2y) wrote :

I'm not in a position to test an upstream kernel (cannot allocate server downtime in the near future), so I've added the tag 'kernel-unable-to-test-upstream'. Perhaps someone else can test this.

tags: added: kernel-unable-to-test-upstream
Andriy Golovnya (andriy-golovnya) wrote :

Sorry, I can't try it either, at least not now. Maybe I'll have time this weekend, but I'm not sure.

Tim Bishop (tdb) wrote :

I've tested the latest mainline kernel (v3.6-rc7-quantal, as requested) and it's working fine.

The latest precise kernel (released after my last update) still doesn't work:

linux-image-3.2.0-31-generic 3.2.0-31.50

But this one does:

linux-image-3.6.0-030600rc7-generic 3.6.0-030600rc7.201209232235
linux-image-extra-3.6.0-030600rc7-generic 3.6.0-030600rc7.201209232235

I hope this helps.

tags: added: kernel-fixed-upstream
removed: needs-upstream-testing
tags: removed: kernel-unable-to-test-upstream
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a reverse bisect to figure out which commit upstream fixes this bug. It would be very helpful to know the last kernel that had this issue and the first kernel that did not.

Can you test the following kernels and report back? We are looking for the first kernel version that doesn't have this bug:

v3.3 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-precise/
v3.4 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-quantal/
v3.5 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-quantal/
v3.6-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc1-quantal/

You don't have to test every kernel, just up until the kernel that first does not have this bug.

Thanks in advance!
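The scan requested above (test the kernels in release order and stop at the first one that mounts quickly) can be sketched as a small shell function. This is a minimal sketch, not a tool used in this thread: `first_fixed` and the predicate it calls are hypothetical, and the predicate stands in for the manual step of booting each mainline kernel and timing the mount. It is a linear scan, matching the request; `git bisect` would instead halve the range on each step.

```shell
#!/bin/sh
# Hypothetical helper: print the first version, in the order given, for which
# the supplied predicate succeeds (exit status 0 = "bug not present").
# The predicate is an assumption: in practice it stands for "boot this kernel,
# time the mount, and decide whether the delay is gone".
first_fixed() {
    predicate=$1
    shift
    for version in "$@"; do
        if "$predicate" "$version"; then
            printf '%s\n' "$version"   # first kernel without the bug
            return 0                   # later kernels need not be tested
        fi
    done
    return 1                           # every listed kernel still has the bug
}
```

For example, `first_fixed mount_is_fast v3.3 v3.4 v3.5 v3.6-rc1`, where `mount_is_fast` is a predicate you write yourself.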

Changed in linux (Ubuntu):
status: Incomplete → Triaged
tags: added: needs-bisect
tags: added: performing-bisect
removed: needs-bisect
Changed in linux (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I did a quick search of upstream and found:

commit 0548bbb85337e532ca2ed697c3e9b227ff2ed4b4
Author: Theodore Ts'o <email address hidden>
Date: Thu Aug 16 11:59:04 2012 -0400

ext4: fix long mount times on very big file systems

This commit was applied in v3.6-rc3.

I don't think there is a need to test the kernels I requested in comment #12. I'll build a test kernel with this commit applied and post it shortly.

Joseph Salisbury (jsalisbury) wrote :

After further investigation, it seems that a fix may already exist in v3.2.29:

commit 4ac2515cf5201e7762c16303d860b6ec0e02aecb
Author: Theodore Ts'o <email address hidden>
Date: Thu Aug 16 11:59:04 2012 -0400

ext4: fix long mount times on very big file systems

Can folks affected by this bug test the v3.2.29[0] kernel? If that kernel fixes the bug, Precise will automatically pick up the fix when the 3.2.29 stable updates are pulled in.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.29-precise/

Paul Tessman (insignis-o) wrote :

On Ubuntu Server 12.04.1 LTS, I had this problem with an 18TB drive. Installing the mainline 3.2.29 kernel and rebooting fixed it; the mount now happens instantly. Thanks!

Tim Bishop (tdb) wrote :

This appears to be fixed in the latest kernel:

linux-image-3.2.0-32-generic 3.2.0-32.51

Andriy Golovnya (andriy-golovnya) wrote :

Same for me: it appears to be fixed in 3.2.0-32-generic.
Thanks, everyone!

Changed in linux (Ubuntu):
status: In Progress → Fix Released
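The kernel versions reported in this thread can be turned into a quick check of a running Precise kernel. This is a minimal sketch based only on the reports above: 3.2.0-29 worked, 3.2.0-30 and 3.2.0-31 showed the slow mount, and 3.2.0-32 was reported fixed. The function name `classify_precise_kernel` is hypothetical, and it only knows about the Ubuntu 12.04 3.2.0-N ABI versions mentioned here; anything else is reported as not covered rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: map an Ubuntu 12.04 kernel release string (as printed
# by `uname -r`, e.g. 3.2.0-31-generic) to what this bug thread reports.
# Only the ABI versions mentioned in the thread are classified.
classify_precise_kernel() {
    case "$1" in
        3.2.0-29-*)            echo "reported working" ;;
        3.2.0-30-*|3.2.0-31-*) echo "reported affected (slow ext4 mount)" ;;
        3.2.0-32-*)            echo "reported fixed" ;;
        *)                     echo "not covered by this report" ;;
    esac
}

# Check the kernel that is currently running.
classify_precise_kernel "$(uname -r)"
```

Mainline builds such as 3.6.0-030600rc7-generic deliberately fall through to the last branch; the thread only states that the fix went into mainline in v3.6-rc3 and into the v3.2.29 stable release.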