Filesystem start runs fsck on ext4 filesystem

Bug #544051 reported by Rainer Schöpf
This bug affects 2 people
Affects                     Status         Importance   Assigned to
cluster-agents (Ubuntu)     Fix Released   Undecided    Unassigned
  Maverick                  Won't Fix      Medium       Unassigned
heartbeat (Ubuntu)          Invalid        Undecided    Unassigned
  Maverick                  Invalid        Undecided    Unassigned

Bug Description

Binary package hint: heartbeat-common

Package heartbeat-common 2.99.2+sles11r9-5ubuntu1 in karmic contains a critical bug:

The script

  /usr/lib/ocf/resource.d/heartbeat/Filesystem

contains a list of filesystem types for which fsck is not run. This list does not include ext4, although it does include ext3.
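The skip-list check can be pictured as a shell `case` statement. The sketch below is a simplified illustration, not the agent's actual code: the function name `fsck_needed` is hypothetical, and the list holds only ext3 (the type named in this report), with the other types in the real list omitted. With this list, an ext4 device falls through to the default branch, so fsck runs.

```shell
#!/bin/sh
# Simplified, hypothetical sketch of the pre-fix skip list.
# Only ext3 is shown; the agent's real list holds more types.
fsck_needed() {
    case "$1" in
        ext3) return 1 ;;  # listed type: skip fsck
        *)    return 0 ;;  # any other type, including ext4: run fsck
    esac
}

for fstype in ext3 ext4; do
    if fsck_needed "$fstype"; then
        echo "$fstype: fsck runs"
    else
        echo "$fstype: fsck skipped"
    fi
done
```

Running it prints `ext3: fsck skipped` followed by `ext4: fsck runs`.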

=======
SRU Justification

IMPACT:

This bug affects users running DRBD clusters with an ext4 filesystem on top. Every time a cluster node takes over a DRBD resource that carries an ext4 filesystem, the Filesystem agent runs fsck, which can make the takeover take much longer than expected. In clustered setups, this can reduce availability. For this reason, the agent skips this step for some filesystem types, including ext3. Unfortunately, ext4 is not yet in that list in the Maverick version.

REPRODUCE:

1. Install a two-node DRBD cluster with Pacemaker. Format the DRBD block device with ext4.
2. Once both nodes are configured and running in Master/Slave mode, perform a hard takeover by switching off the Master node.
3. On the Slave node, check syslog; the following output will appear:

lrmd: [2623]: info: RA output: (res_fs:start:stdout) fsck from util-linux-ng 2.17.2
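To check step 3 without reading the whole log, one can search syslog for that line. This is a minimal sketch, assuming syslog lives at `/var/log/syslog` and the Filesystem resource is named `res_fs` as in the log line above; the function name `fsck_ran_on_start` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given log contains the fsck output
# of the Filesystem resource's start operation.
# Assumptions: log defaults to /var/log/syslog, resource defaults to res_fs.
fsck_ran_on_start() {
    log="${1:-/var/log/syslog}"
    resource="${2:-res_fs}"
    # -F: match a fixed string (the parentheses are literal); -q: no output
    grep -qF "RA output: ($resource:start:stdout) fsck from" "$log"
}
```

Usage on the Slave node after the takeover: `fsck_ran_on_start && echo "fsck ran during takeover"`.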

HOW FIXED:

The fix is simple: it adds ext4|ext4dev to the list of filesystem types for which fsck is skipped. This has already been fixed upstream (and in Natty) with the same approach.
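In terms of a `case`-style skip list, the change amounts to extra alternatives in the matching pattern. A simplified sketch of the check after the fix; as with any illustration here, the function name `fsck_needed` is hypothetical and only the types named in this report are shown, while the agent's real list contains further types.

```shell
#!/bin/sh
# Simplified, hypothetical sketch of the skip list after the fix:
# ext4 and ext4dev are listed next to ext3.
# Other types in the agent's real list are omitted here.
fsck_needed() {
    case "$1" in
        ext3|ext4|ext4dev) return 1 ;;  # listed type: skip fsck
        *)                 return 0 ;;  # unlisted type: run fsck
    esac
}

for fstype in ext3 ext4 ext4dev; do
    fsck_needed "$fstype" || echo "$fstype: fsck skipped"
done
```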

PATCH:

Attached, and uploaded to maverick-proposed for review.

REGRESSION POTENTIAL:

Minimal. I've tested this thoroughly.

=======

krims0n (krims0n32) wrote :

Any updates on this bug?

Rainer Schöpf (rainer-schoepf) wrote :

No updates, still present in package cluster-agents on maverick.

I've just reported it as bug 760596.

Changed in heartbeat (Ubuntu):
status: New → Invalid
Andres Rodriguez (andreserl) wrote :

Hi there,

Thank you for taking the time to report bugs and trying to make Ubuntu Server better.

Could you please provide us with a test case so that we can reproduce this exactly? More precisely, we need to see after how many mounts fsck is started and how to validate the fix, so that we can produce a complete bug report and SRU the fix.

Changed in cluster-agents (Ubuntu):
status: New → Confirmed
importance: Undecided → Low
Rainer Schöpf (rainer-schoepf) wrote :

Hi,

Test case: set up a DRBD cluster and create an ext4 filesystem on a DRBD device, managed as a cluster resource. Perform a hard takeover by switching off the active node.

The fsck command runs every time the Filesystem resource is started, including when a takeover to another cluster node occurs. This can take a long time and must not happen.

In particular, the comment in lines 424/425 of the script reads:

        # NOTE: Some filesystem types don't need this step... Please modify
        #       accordingly

It is line 435 that needs to be changed to include ext4 (and possibly other journaling or copy-on-write filesystems such as btrfs).

Andres Rodriguez (andreserl) wrote :

Hi Rainer,

Yes, I already have a patch for it, ready to SRU. However, the SRU process is a bit special: it needs verification and so on. I'll be testing, preparing and SRUing it.

Thank you!

krims0n (krims0n32) wrote :

Will the patch simply add ext4 to the exclude list, or is there more to it? I noticed the latest upstream version already has this.

Andres Rodriguez (andreserl) wrote :

Hi krims0n,

Yes, there's more to it. We can't simply apply the patch and release it. There's a whole process we must follow to "backport" fixes to a stable release; in this case, the SRU (Stable Release Updates) process. It is outlined in [1] if you are interested in knowing more.

[1]: https://wiki.ubuntu.com/StableReleaseUpdates

Andres Rodriguez (andreserl) wrote :
description: updated
Changed in cluster-agents (Ubuntu):
status: Confirmed → Fix Released
Changed in cluster-agents (Ubuntu Maverick):
status: New → Confirmed
importance: Undecided → Medium
Changed in heartbeat (Ubuntu Maverick):
status: New → Invalid
description: updated
Changed in cluster-agents (Ubuntu):
importance: Low → Undecided
Martin Pitt (pitti) wrote : Please test proposed package

Accepted cluster-agents into maverick-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in cluster-agents (Ubuntu Maverick):
status: Confirmed → Fix Committed
tags: added: verification-needed
Andres Rodriguez (andreserl) wrote :

@Rainer:

Could you please enable maverick-proposed as described above, verify the fix, and confirm it here?

Thank you!

Rolf Leggewie (r0lf) wrote :

maverick has seen the end of its life and is no longer receiving any updates. Marking the maverick task for this ticket as "Won't Fix".

Changed in cluster-agents (Ubuntu Maverick):
status: Fix Committed → Won't Fix