watchdog should start after basic.target

Bug #1891801 reported by Christoph Roeder
This bug affects 2 people
Affects             Status     Importance  Assigned to  Milestone
watchdog (Debian)   Confirmed  Unknown
watchdog (Ubuntu)   Confirmed  Medium      Unassigned

Bug Description

When using watchdog (softdog) with sbd, watchdog starts after sbd by default because of this "After" setting in the watchdog unit:

> After=multi-user.target

I think it should be:

> After=basic.target

PS: I am running Ubuntu 20.04 Server.

affects: sbd (Ubuntu) → watchdog (Ubuntu)

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Christoph,

Why do you think that? Could you elaborate on this change and on the pros and cons of making it? An example of how you're setting up your cluster with sbd and watchdog would also help to support your request.

From:

https://wiki.clusterlabs.org/wiki/Using_SBD_with_Pacemaker

I have:

"""Ensure that the sbd daemon is running on a node before starting the cluster services. The best approach is generally to enable it to start at boot. (The cluster can't manage the sbd daemon as a cluster resource.) There are two flavors of SBD, sbd for cluster nodes, and sbd_remote for Pacemaker Remote nodes. Here we use sbd as an example, but for Pacemaker Remote nodes, replace sbd with sbd_remote:"""

Note: sbd has to start before corosync and pacemaker. It would be good to have watchdog already working by then, so you're probably right... but that change should be made in sbd.service and not in watchdog (as watchdog is a "generic" service that serves purposes other than pacemaker/fence-agents).

and

"""With watchdog-only SBD, the cluster must have true quorum. Thus, it can only be used in a cluster with three or more nodes, or a two-node cluster with external quorum (such as corosync using qdevice with a third node).
Configure the basic setup on every node as described above.
Select a recovery interval (in seconds) that is greater than SBD_WATCHDOG_TIMEOUT in /etc/sysconfig/sbd."""

I assume the ordering has something to do with unfencing (from a fence_sbd + watchdog setup), but I also know that those types of unfencing (like fence_mpath and fence_iscsi) are not supported "automatically" (meaning that any time there is a cluster split, manual intervention is required).

Looking forward to reading more about your request.

Thanks

-rafaeldtinoco

Changed in watchdog (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)

Rafael David Tinoco (rafaeldtinoco) wrote :

For sbd:

[Unit]
Description=Shared-storage based fencing daemon
Documentation=man:sbd(8)
Before=pacemaker.service
Before=dlm.service
After=systemd-modules-load.service iscsi.service
PartOf=corosync.service
RefuseManualStop=true
RefuseManualStart=true

...

I could add:

After=watchdog.service

so that sbd starts after the watchdog service is up, provided the change is properly discussed and justified.

Christoph Roeder (brightdroid) wrote :

My setup looks like this:

- two-node cluster with DRBD
- third node as qdevice (net)

You're right, sbd should start after watchdog.
Adding "After=watchdog.service" to sbd, as you suggested above, was also my first attempt.
But this creates an ordering cycle involving pacemaker:
---
Aug 18 06:36:29 drbd01 systemd[1]: multi-user.target: Found ordering cycle on pacemaker.service/start
Aug 18 06:36:29 drbd01 systemd[1]: multi-user.target: Found dependency on sbd.service/start
Aug 18 06:36:29 drbd01 systemd[1]: multi-user.target: Found dependency on watchdog.service/start
Aug 18 06:36:29 drbd01 systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Aug 18 06:36:29 drbd01 systemd[1]: multi-user.target: Job pacemaker.service/start deleted to break ordering cycle starting with multi-user.target/start
---

By the way, sbd starts fine with this modification.
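
To make the cycle in the log above easier to follow, here is a minimal Python sketch. It is not systemd's actual algorithm; it only models the ordering edges mentioned in this thread and searches them for a loop. The edge from multi-user.target to pacemaker.service is an assumption inferred from the log (a target is normally ordered after the services it pulls in), and the names `after` and `find_cycle` are hypothetical.

```python
# Ordering edges: each key must start AFTER every unit in its list.
after = {
    "sbd.service": ["watchdog.service"],         # proposed After=watchdog.service
    "watchdog.service": ["multi-user.target"],   # current After=multi-user.target
    "pacemaker.service": ["sbd.service"],        # sbd has Before=pacemaker.service
    "multi-user.target": ["pacemaker.service"],  # assumption, inferred from the log
}


def find_cycle(after, start):
    """Return the units forming an ordering cycle reachable from start, or None."""
    path = []        # units on the current depth-first search path, in order
    on_path = set()  # same units, for fast membership checks
    done = set()     # units fully explored without finding a cycle

    def visit(unit):
        if unit in on_path:
            # We walked back into the current path: that closes a cycle.
            return path[path.index(unit):] + [unit]
        if unit in done:
            return None
        path.append(unit)
        on_path.add(unit)
        for dep in after.get(unit, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(unit)
        done.add(unit)
        return None

    return visit(start)


cycle = find_cycle(after, "multi-user.target")
print(" -> ".join(cycle))
# multi-user.target -> pacemaker.service -> sbd.service -> watchdog.service -> multi-user.target

# Same graph, but with watchdog ordered after basic.target instead.
fixed = dict(after)
fixed["watchdog.service"] = ["basic.target"]
print(find_cycle(fixed, "multi-user.target"))
# None
```

Under these assumptions the loop disappears once watchdog is no longer ordered after multi-user.target, which matches the change requested in the bug description.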

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Christoph, yes. Because of that dependency loop, and because it makes sense to have the watchdog start earlier, I opened a Debian bug about this issue to ask for the maintainer's opinion. He is also the upstream maintainer, so that should help us a bit.

I'm linking the upstream issue and will follow his answer.

Changed in watchdog (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody

Christoph Roeder (brightdroid) wrote :

Thanks, I hope for an update soon.

Changed in watchdog (Debian):
status: Unknown → Confirmed