fsck.xfs can drop into a rescue shell when fsck.mode=force is set

Bug #2101888 reported by Chris Johnson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
xfsprogs (Ubuntu)
New
Undecided
Unassigned

Bug Description

Description: Ubuntu 22.04 LTS
Release: 22.04
Package: xfsprogs 5.13.0-1ubuntu2.1
Kernel cmdline containing: fsck.mode=force fsck.repair=yes

[Issue]

LP #2081163 (https://bugs.launchpad.net/ubuntu/+source/xfsprogs/+bug/2081163) backported a fix for fsck.xfs not being run during system startup:

    commit 19dde7fac0f38af2990e367ef4dd8ec512920c12
    Author: Gerald Yang <email address hidden>
    Date: Tue Aug 13 15:25:51 2024 +0800

        fsck.xfs: fix fsck.xfs run by different shells when fsck.mode=force is set

        When fsck.mode=force is specified in the kernel command line, fsck.xfs
        is executed during the boot process. However, when the default shell is
        not bash, $PS1 should be a different value, consider the following script:
        cat ps1.sh
        echo "$PS1"

        run ps1.sh with different shells:
        ash ./ps1.sh
        $
        bash ./ps1.sh

        dash ./ps1.sh
        $
        ksh ./ps1.sh

        zsh ./ps1.sh

        On systems like Ubuntu, where dash is the default shell during the boot
        process to improve startup speed. This results in FORCE being incorrectly
        set to false and then xfs_repair is not invoked:
        if [ -n "$PS1" -o -t 0 ]; then
                FORCE=false
        fi

        Other distros may encounter this issue too if the default shell is set
        to anoother shell.

        Check "-t 0" is enough to determine if we are in interactive mode, and
        xfs_repair is invoked as expected regardless of the shell used.

        Fixes: 04a2d5dc ("fsck.xfs: allow forced repairs using xfs_repair")
        Signed-off-by: Gerald Yang <email address hidden>
        Reviewed-by: Darrick J. Wong <email address hidden>

That fix was originally applied upstream for 6.11.0, but it appears to implicitly depend on an earlier commit from xfsprogs 6.1.0:

    commit 79ba1e15d80eba3aff4396f44629eb8960722d36
    Author: Srikanth C S <email address hidden>
    Date: Tue Dec 13 22:45:43 2022 +0530

        fsck.xfs: mount/umount xfs fs to replay log before running xfs_repair

        After a recent data center crash, we had to recover root filesystems
        on several thousands of VMs via a boot time fsck. Since these
        machines are remotely manageable, support can inject the kernel
        command line with 'fsck.mode=force fsck.repair=yes' to kick off
        xfs_repair if the machine won't come up or if they suspect there
        might be deeper issues with latent errors in the fs metadata, which
        is what they did to try to get everyone running ASAP while
        anticipating any future problems. But, fsck.xfs does not address the
        journal replay in case of a crash.

        fsck.xfs does xfs_repair -e if fsck.mode=force is set. It is
        possible that when the machine crashes, the fs is in inconsistent
        state with the journal log not yet replayed. This can drop the machine
        into the rescue shell because xfs_fsck.sh does not know how to clean the
        log. Since the administrator told us to force repairs, address the
        deficiency by cleaning the log and rerunning xfs_repair.

        Run xfs_repair -e when fsck.mode=force and repair=auto or yes.
        Replay the logs only if fsck.mode=force and fsck.repair=yes. For
        other option -fa and -f drop to the rescue shell if repair detects
        any corruptions.

        Signed-off-by: Srikanth C S <email address hidden>
        Reviewed-by: Carlos Maiolino <email address hidden>
        Signed-off-by: Carlos Maiolino <email address hidden>

Now that xfs_repair is actually invoked when fsck.mode=force is set, it's possible for xfs_repair to fail at boot time and drop into a rescue shell.

This behaviour is seen with xfsprogs 5.13.0-1ubuntu2.1 in Jammy. It would presumably also affect 5.3.0-1ubuntu2.1 in Focal, but I haven't tried it personally.

[Potential Solution]

I applied 79ba1e15d80eba3aff4396f44629eb8960722d36 to xfs_fsck.sh in 5.13.0-1ubuntu2.1 and purposely corrupted the fs.

Booting with fsck.mode=force fsck.repair=yes, fsck.xfs correctly mounts, replays the log, unmounts, repairs, and mounts the filesystem at boot.

Revision history for this message
Chris Johnson (cjohnson-evertz) wrote :

Looking at the original patch for 79ba1e15d80eba3aff4396f44629eb8960722d36, there may be a potential race condition as well:

    echo "Replaying log for $DEV"
    mkdir -p /tmp/repair_mnt || exit 1
    ...
        rm -d /tmp/repair_mnt

If there are two or more XFS filesystems, each invoking a parallel mount/repair sequence to the singular /tmp/repair_mnt, it's possible they interfere with one another.

It probably requires an additional upstream change with something like:

    TMP_DIR=$(mktemp -d) || exit 1
    ...
        rm -d "$TMP_DIR"

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.