An ocf:heartbeat:nfsserver resource's stop operation succeeded despite the /var/lib/nfs filesystem failing to unmount.

Bug #2065848 reported by Jacob Becker
This bug affects 1 person
Affects                    Status   Importance  Assigned to  Milestone
resource-agents (Ubuntu)   Triaged  Undecided   Unassigned

Bug Description

The resource "ocf:heartbeat:nfsserver" is considered stopped even though the stop operation reported an error:
pacemaker-execd[7831]: notice: nfs_daemon_stop_0[65992] error output [ umount: /var/lib/nfs: target is busy. ]

Because it was considered successfully stopped, the later stop of an LVM resource failed:
 LVM-activate(LVM_nfs_infodir_LV)[69671]: ERROR: PARTIAL MODE. Incomplete logical volumes will be processed. Logical volume DCSS_VG/nfs_infodir contains a filesystem in use.
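
The behaviour expected here is that a failed unmount makes the stop action return a non-zero OCF exit code instead of success. Below is a minimal Python sketch of that idea. It is not the nfsserver agent's code (the agent is a shell script); `stop_bind_mount` and its `runner` parameter are hypothetical names used only for illustration, and the only things taken from the OCF resource agent convention are the exit codes 0 (`OCF_SUCCESS`) and 1 (`OCF_ERR_GENERIC`).

```python
import subprocess

# Exit codes defined by the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def stop_bind_mount(mountpoint, runner=subprocess.run):
    """Unmount `mountpoint` and report the outcome as an OCF exit code.

    `runner` is a hypothetical injection point so the logic can be
    exercised without touching real mounts.
    """
    result = runner(["umount", mountpoint], capture_output=True, text=True)
    if result.returncode != 0:
        # Propagate the failure instead of reporting the resource as stopped.
        print(f"ERROR: Failed to unmount {mountpoint}: {result.stderr.strip()}")
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

With this shape, a "target is busy" error from umount surfaces as a failed stop, so the cluster does not go on to deactivate the underlying logical volume as if the filesystem were free.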

cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"

resource-agents-base/jammy-updates,now 1:4.7.0-1ubuntu7.2 all [installed,automatic]
  Cluster Resource Agents curated by Ubuntu

resource-agents-common/jammy-updates,now 1:4.7.0-1ubuntu7.2 amd64 [installed,automatic]
  Common files used by the Cluster Resource Agents

resource-agents-extra/jammy-updates,now 1:4.7.0-1ubuntu7.2 amd64 [installed]
  Cluster Resource Agents

Mitchell Dzurick (mitchdz) wrote :

Hello Jacob,

Thank you for making this bug report and making Ubuntu better!

If I prepare a test package in a PPA, would you be willing to help test it?

I just kicked off a build with the upstream diff applied here[0].

[0] - https://launchpad.net/~mitchdz/+archive/ubuntu/lp2065848-resource-agents-unbind

Changed in resource-agents (Ubuntu):
status: New → Triaged
Jacob Becker (jacob-becker-h) wrote :

Hi!
Yes, I can test the package.

Thank you in advance,
Jacob

Mitchell Dzurick (mitchdz) wrote :

Thanks Jacob! Just an FYI: if you try to use my PPA right now, you may not get the package yet, since the runner is still in the process of packaging it. It should be published within an hour or so.

Jacob Becker (jacob-becker-h) wrote :

Hi Mitchell,

I installed the package and tried to shut down the resource group.
However, I still got the error:

May 23 06:32:53 blofecn2 nfsserver(nfs_daemon)[1282251]: INFO: Stopping rpc.statd
May 23 06:32:54 blofecn2 nfsserver(nfs_daemon)[1282251]: INFO: Stop: umount (1/10 attempts)
May 23 06:32:54 blofecn2 systemd[1]: export_root-nfs_infodir-rpc_pipefs.mount: Deactivated successfully.
May 23 06:32:54 blofecn2 systemd[1]: var-lib-nfs-rpc_pipefs.mount: Deactivated successfully.
May 23 06:32:55 blofecn2 nfsserver(nfs_daemon)[1282251]: ERROR: Failed to unmount /var/lib/nfs
May 23 06:32:55 blofecn2 nfsserver(nfs_daemon)[1282251]: ERROR: Failed to unmount a bind mount
May 23 06:32:55 blofecn2 pacemaker-execd[7826]: notice: nfs_daemon_stop_0[1282251] error output [ umount: /var/lib/nfs: target is busy. ]
May 23 06:32:55 blofecn2 pacemaker-execd[7826]: notice: nfs_daemon_stop_0[1282251] error output [ ocf-exit-reason:Failed to unmount a bind mount ]
May 23 06:32:55 blofecn2 pacemaker-controld[7829]: notice: Result of stop operation for nfs_daemon on blofecn-node2: error (Failed to unmount a bind mount)
May 23 06:32:55 blofecn2 pacemaker-controld[7829]: notice: blofecn-node2-nfs_daemon_stop_0:491 [ umount: /var/lib/nfs: target is busy.\nocf-exit-reason:Failed to unmount a bind mount\n ]
May 23 06:32:55 blofecn2 pacemaker-controld[7829]: notice: Transition 675 aborted by operation nfs_daemon_stop_0 'modify' on blofecn-node2: Event failed
May 23 06:32:55 blofecn2 pacemaker-controld[7829]: notice: Transition 675 action 133 (nfs_daemon_stop_0 on blofecn-node2): expected 'ok' but got 'error'
May 23 06:32:55 blofecn2 pacemaker-controld[7829]: notice: Transition 675 (Complete=22, Pending=0, Fired=0, Skipped=0, Incomplete=35, Source=/var/lib/pacemaker/pengine/pe-input-1098.bz2): Complete
May 23 06:32:55 blofecn2 pacemaker-attrd[7827]: notice: Setting fail-count-nfs_daemon#stop_0[blofecn-node2]: (unset) -> INFINITY
May 23 06:32:55 blofecn2 pacemaker-attrd[7827]: notice: Setting last-failure-nfs_daemon#stop_0[blofecn-node2]: (unset) -> 1716438775
May 23 06:32:55 blofecn2 sbd[7086]: warning: inquisitor_child: pcmk health check: UNHEALTHY
May 23 06:32:55 blofecn2 sbd[7086]: warning: inquisitor_child: Servant pcmk is outdated (age: 602589)
May 23 06:32:55 blofecn2 pacemaker-schedulerd[7828]: warning: Unexpected result (error: Failed to unmount a bind mount) was recorded for stop of nfs_daemon on blofecn-node2 at May 23 06:32:53 2024
May 23 06:32:55 blofecn2 pacemaker-schedulerd[7828]: warning: Unexpected result (error: Failed to unmount a bind mount) was recorded for stop of nfs_daemon on blofecn-node2 at May 23 06:32:53 2024
May 23 06:32:55 blofecn2 pacemaker-schedulerd[7828]: warning: Cluster node blofecn-node2 will be fenced: nfs_daemon failed there
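
To confirm from a longer journal that the stop operation is now reported as failed, the pacemaker-controld "Result of ... operation" lines can be picked out mechanically. The following is a rough Python sketch; the pattern is derived only from the message format visible in the log above and may not match other Pacemaker versions, and `parse_result` is a hypothetical helper name.

```python
import re

# Pattern modelled on the pacemaker-controld line in the log above;
# the exact wording is an assumption and may differ between versions.
RESULT_LINE = re.compile(
    r"Result of (?P<op>\w+) operation for (?P<resource>\S+) "
    r"on (?P<node>\S+): (?P<status>\w+)(?: \((?P<reason>.*)\))?$"
)


def parse_result(line):
    """Return the operation result fields as a dict, or None if no match."""
    match = RESULT_LINE.search(line.strip())
    return match.groupdict() if match else None


LINE = (
    "May 23 06:32:55 blofecn2 pacemaker-controld[7829]: notice: "
    "Result of stop operation for nfs_daemon on blofecn-node2: "
    "error (Failed to unmount a bind mount)"
)

print(parse_result(LINE))
```

For the line above, the parsed status is `error` with the reason `Failed to unmount a bind mount`, which is the result that leads to the fencing decision in the last log line.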

After that I looked at the mounts a little more closely.
My issue seems to be the following:
/dev/mapper/DCSS_VG-nfs_infodir on /export_root/nfs_infodir type ext4 (rw,relatime)
/dev/mapper/DCSS_VG-nfs_infodir on /var/lib/nfs type ext4 (rw,relatime)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
sunrpc on /export_root/nfs_infodir/rpc_pipefs type rpc_pipefs (rw,relatime)

rpc_pipe...
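
A listing like the one above can be inspected with a short script that shows which mount points sit below a given directory and which sources are mounted in more than one place. This is only a sketch for examining `mount` output in the format shown; the helper names `parse_mounts`, `mounts_below`, and `shared_sources` are hypothetical, and it does not by itself establish why the unmount reports "target is busy".

```python
import re
from collections import defaultdict

# Matches lines such as: "<source> on <target> type <fstype> (<options>)"
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def parse_mounts(text):
    """Return a list of (source, target, fstype) tuples from mount output."""
    entries = []
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            entries.append((match["source"], match["target"], match["fstype"]))
    return entries


def mounts_below(entries, parent):
    """List mount points nested under `parent`, excluding `parent` itself."""
    prefix = parent.rstrip("/") + "/"
    return [target for _, target, _ in entries if target.startswith(prefix)]


def shared_sources(entries):
    """Map each source mounted at more than one place to its mount points."""
    by_source = defaultdict(list)
    for source, target, _ in entries:
        by_source[source].append(target)
    return {src: tgts for src, tgts in by_source.items() if len(tgts) > 1}


SAMPLE = """\
/dev/mapper/DCSS_VG-nfs_infodir on /export_root/nfs_infodir type ext4 (rw,relatime)
/dev/mapper/DCSS_VG-nfs_infodir on /var/lib/nfs type ext4 (rw,relatime)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
sunrpc on /export_root/nfs_infodir/rpc_pipefs type rpc_pipefs (rw,relatime)
"""

entries = parse_mounts(SAMPLE)
print(mounts_below(entries, "/var/lib/nfs"))
print(shared_sources(entries))
```

On the four lines quoted above, this reports /var/lib/nfs/rpc_pipefs as nested under /var/lib/nfs, and shows both the logical volume and sunrpc mounted at two locations each.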


Mitchell Dzurick (mitchdz) wrote :

Thanks for the update Jacob, we will wait to hear back from you.
