ldirectord systemd service fails if no /var/lock/subsys dir

Bug #1828258 reported by Christian Ehrhardt on 2019-05-08
This bug affects 1 person
Affects: resource-agents (Ubuntu); status tracked in Eoan

Series   Importance  Assigned to
Xenial   Medium      Heitor Alves de Siqueira
Bionic   Medium      Heitor Alves de Siqueira
Cosmic   Medium      Heitor Alves de Siqueira
Disco    Medium      Heitor Alves de Siqueira
Eoan     Medium      Heitor Alves de Siqueira

Bug Description

[impact]

ldirectord's systemd service file contains commands to touch and remove a file in the /var/lock/subsys directory. However, lock files there are a SysV mechanism for serializing service execution and are unneeded with systemd. It's unclear why the ldirectord systemd unit contains these lines, but they come from upstream, so we should get them fixed there first and then correct Debian and Ubuntu.

This affects users because, if the /var/lock/subsys directory does not exist, the systemd service fails to start, which can disrupt installing or upgrading the ldirectord package from resource-agents.
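To see whether a given machine is exposed, one can test for exactly that combination: the unit file references /var/lock/subsys, but the directory is missing. The following is a minimal POSIX sh sketch; the function name check_subsys_lock is hypothetical, the default unit path is taken from the systemctl status output in the test case, and it inspects only that one file, not any drop-ins (systemctl cat ldirectord.service shows the merged view).

```shell
#!/bin/sh
# Hypothetical helper: report whether a unit file refers to a lock
# directory that does not exist, i.e. the situation in which
# "ExecStartPost=/usr/bin/touch /var/lock/subsys/ldirectord" fails.
check_subsys_lock() {
    # Defaults: the unit path from "systemctl status" and the SysV lock dir.
    unit=${1:-/lib/systemd/system/ldirectord.service}
    lockdir=${2:-/var/lock/subsys}

    if [ ! -r "$unit" ]; then
        echo "unknown: cannot read $unit"
        return 2
    fi
    # -F matches the path literally instead of as a regular expression.
    if ! grep -qF "$lockdir" "$unit"; then
        echo "ok: $unit does not reference $lockdir"
        return 0
    fi
    if [ -d "$lockdir" ]; then
        echo "ok: $lockdir exists"
        return 0
    fi
    echo "at risk: $unit references $lockdir, which does not exist"
    return 1
}
```

Called with no arguments, it checks the packaged unit against /var/lock/subsys; the return code (0 ok, 1 at risk, 2 unreadable unit) makes it usable in scripts.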

[test case]

Remove the /var/lock/subsys directory, then try to install or upgrade ldirectord:

ubuntu@lp1828258:~$ sudo rmdir /var/lock/subsys
ubuntu@lp1828258:~$ sudo apt install ldirectord
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
  libfreetype6
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  ldirectord
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B/62.1 kB of archives.
After this operation, 233 kB of additional disk space will be used.
Selecting previously unselected package ldirectord.
(Reading database ... 30382 files and directories currently installed.)
Preparing to unpack .../ldirectord_1%3a4.2.0-1ubuntu1_all.deb ...
Unpacking ldirectord (1:4.2.0-1ubuntu1) ...
Setting up ldirectord (1:4.2.0-1ubuntu1) ...

...(delay of roughly 120 seconds)...

Job for ldirectord.service failed because the control process exited with error code.
See "systemctl status ldirectord.service" and "journalctl -xe" for details.
invoke-rc.d: initscript ldirectord, action "start" failed.
● ldirectord.service - Monitor and administer real servers in a LVS cluster of load balanced virtual servers
   Loaded: loaded (/lib/systemd/system/ldirectord.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2019-05-09 11:27:53 UTC; 11ms ago
     Docs: man:ldirectord(8)
  Process: 7559 ExecStart=/usr/sbin/ldirectord start (code=exited, status=0/SUCCESS)
  Process: 7564 ExecStartPost=/usr/bin/touch /var/lock/subsys/ldirectord (code=exited, status=1/FAILURE)
  Process: 7565 ExecStopPost=/bin/rm -f /var/lock/subsys/ldirectord (code=exited, status=0/SUCCESS)
 Main PID: 7561
    Tasks: 3 (limit: 4915)
   Memory: 71.7M
   CGroup: /system.slice/ldirectord.service
           ├─2547 /usr/bin/perl -w /usr/sbin/ldirectord start
           ├─7078 /usr/bin/perl -w /usr/sbin/ldirectord start
           └─7561 /usr/bin/perl -w /usr/sbin/ldirectord start

May 09 11:26:22 lp1828258 systemd[1]: Starting Monitor and administer real servers in a LVS cluster of load balanced virtual servers...
May 09 11:26:23 lp1828258 systemd[1]: ldirectord.service: Supervising process 7561 which is not our child. We'll most likely not notice when it exits.
May 09 11:26:23 lp1828258 touch[7564]: /usr/bin/touch: cannot touch '/var/lock/subsys/ldirectord': No such file or directory

[regression potential]

If something internal to resource-agents actually uses the old SysV-style /var/lock/subsys lock, removing it could cause a regression. However, it shouldn't, because that lock is meant only for SysV service scripts. Also, since this should be fixed upstream first, the upstream developers will know whether the lock serves any other purpose; if they accept the change, it should be safe.

[other info]

This is causing autopkgtest failures, especially on armhf, but the failure could happen on any architecture.

Also note that /var/lock/subsys (where /var/lock is a symlink to /run/lock) is managed by systemd-tmpfiles as a 'legacy' directory:

ubuntu@lp1828258:~$ cat /usr/lib/tmpfiles.d/legacy.conf
...[snip]...
# These files are considered legacy and are unnecessary on legacy-free
# systems.

L /var/lock - - - - ../run/lock

# /run/lock/subsys is used for serializing SysV service execution, and
# hence without use on SysV-less systems.

d /run/lock/subsys 0755 root root -
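On an affected system, a stopgap until the unit file is fixed is to recreate the directory from that same tmpfiles.d rule. The sketch below assumes it runs as root; ensure_lock_dir is a hypothetical name, and the plain mkdir fallback (mode 0755, as in legacy.conf) is an assumption for systems where systemd-tmpfiles or the rule is unavailable. Since /run is a tmpfs, a directory created by hand does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: make sure a lock directory exists before a unit
# that touches files in it is started. Needs root for /run/lock.
ensure_lock_dir() {
    lockdir=${1:-/run/lock/subsys}
    [ -d "$lockdir" ] && return 0

    # Prefer the distribution's own tmpfiles.d rule (legacy.conf) so the
    # mode and ownership match what systemd-tmpfiles would create.
    if [ "$lockdir" = /run/lock/subsys ] &&
       command -v systemd-tmpfiles >/dev/null 2>&1 &&
       systemd-tmpfiles --create --prefix="$lockdir" &&
       [ -d "$lockdir" ]; then
        return 0
    fi

    # Fallback (assumption): create it directly with the legacy.conf mode.
    mkdir -p -m 0755 "$lockdir"
}
```

With no argument it targets /run/lock/subsys, which is /var/lock/subsys through the /var/lock symlink; --prefix limits systemd-tmpfiles to rules for that path instead of applying every tmpfiles.d entry.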

Original description:

--

In autopkgtest runs such as:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-disco-ci-train-ppa-service-3717/disco/armhf/r/resource-agents/20190508_124516_2b20c@/log.gz
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-disco-ci-train-ppa-service-3717/disco/armhf/r/resource-agents/20190507_202519_be056@/log.gz
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-disco-ci-train-ppa-service-3717/disco/armhf/r/resource-agents/20190508_083654_2b20c@/log.gz

This fails to install:
Setting up ldirectord (1:4.2.0-1ubuntu1) ...
Job for ldirectord.service failed because the control process exited with error code.
See "systemctl status ldirectord.service" and "journalctl -xe" for details.
invoke-rc.d: initscript ldirectord, action "start" failed.
● ldirectord.service - Monitor and administer real servers in a LVS cluster of load balanced virtual servers
   Loaded: loaded (/lib/systemd/system/ldirectord.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-05-08 12:20:10 UTC; 42ms ago
     Docs: man:ldirectord(8)
  Process: 779 ExecStart=/usr/sbin/ldirectord start (code=exited, status=0/SUCCESS)
  Process: 783 ExecStartPost=/usr/bin/touch /var/lock/subsys/ldirectord (code=exited, status=1/FAILURE)
  Process: 785 ExecStopPost=/bin/rm -f /var/lock/subsys/ldirectord (code=exited, status=0/SUCCESS)
 Main PID: 781
    Tasks: 1 (limit: 4915)
   Memory: 14.1M
   CGroup: /system.slice/ldirectord.service
           └─781 /usr/bin/perl -w /usr/sbin/ldirectord start

May 08 12:18:39 autopkgtest-lxd-einqza systemd[1]: Starting Monitor and administer real servers in a LVS cluster of load balanced virtual servers...
May 08 12:18:39 autopkgtest-lxd-einqza systemd[1]: ldirectord.service: Supervising process 781 which is not our child. We'll most likely not notice when it exits.
May 08 12:18:40 autopkgtest-lxd-einqza touch[783]: /usr/bin/touch: cannot touch '/var/lock/subsys/ldirectord': No such file or directory
May 08 12:18:40 autopkgtest-lxd-einqza systemd[1]: ldirectord.service: Control process exited, code=exited, status=1/FAILURE
May 08 12:20:10 autopkgtest-lxd-einqza systemd[1]: ldirectord.service: State 'stop-post' timed out. Terminating.
May 08 12:20:10 autopkgtest-lxd-einqza systemd[1]: ldirectord.service: Failed with result 'exit-code'.
May 08 12:20:10 autopkgtest-lxd-einqza systemd[1]: Failed to start Monitor and administer real servers in a LVS cluster of load balanced virtual servers.

Of particular interest might be this line:
 touch[783]: /usr/bin/touch: cannot touch '/var/lock/subsys/ldirectord': No such file or directory

This runs in LXD; could it be a path or AppArmor issue?

I tried the same on a real armhf device, and it worked right away.
(Thanks to waveform for providing an armhf Raspberry Pi.)


Since it fails every time on Launchpad infrastructure, there might be an issue at a deeper layer.

Next:
- Try a QEMU-based (emulated) Disco VM.
- Try in an LXC container on the bare-metal armhf device.

It works in LXD and in a VM as well.

TL;DR: it works on all architectures and in all environments, except the autopkgtest environment on armhf.

I don't know how to debug this further, and going further might be a waste of time anyway.
Instead, I'll provide a force-badtest for this combination in Disco.

The history [1] also shows that this broke only recently, but not necessarily because of the recent upload [2], since that upload didn't touch ldirectord.

[1]: http://autopkgtest.ubuntu.com/packages/r/resource-agents/disco/armhf
[2]: https://launchpad.net/ubuntu/+source/resource-agents/1:4.2.0-1ubuntu1.1

Dan Streetman (ddstreet) wrote:

The problem is ldirectord's systemd service file: it tries to touch /var/lock/subsys/ldirectord, but the /var/lock/subsys directory appears not to exist. I believe /var/lock/subsys is a SysV service thing and isn't needed with systemd, but I only investigated briefly.

Most likely the service file should just have its touch/rm removed. If there is a reason to touch/remove the /var/lock/subsys lockfile, the service needs to mkdir /var/lock/subsys first or at least ignore touch errors.
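Until a fixed package lands, the first option (dropping the touch/rm) can be applied locally without editing the packaged unit, using a systemd drop-in. In a drop-in, assigning an empty value to ExecStartPost= or ExecStopPost= discards every command of that type inherited from the unit. This sketch assumes, as the status output in the test case suggests, that the touch and rm lines are the only ExecStartPost=/ExecStopPost= entries in ldirectord.service; the function name write_ldirectord_dropin and the file name no-subsys-lock.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that clears the ExecStartPost= and
# ExecStopPost= lists inherited from the packaged ldirectord.service.
# Assumption: those lists contain only the /var/lock/subsys touch and rm.
write_ldirectord_dropin() {
    dir=${1:-/etc/systemd/system/ldirectord.service.d}
    mkdir -p "$dir" || return 1
    # An empty assignment resets the list of commands of that type.
    cat > "$dir/no-subsys-lock.conf" <<'EOF'
[Service]
ExecStartPost=
ExecStopPost=
EOF
}
```

Run it as root, then run systemctl daemon-reload so systemd picks up the drop-in; systemctl cat ldirectord.service shows the merged result. If the lock file does turn out to be needed, the other two options map onto the same mechanism: ExecStartPre=/bin/mkdir -p /var/lock/subsys, or re-adding the command as ExecStartPost=-/usr/bin/touch /var/lock/subsys/ldirectord, where the leading "-" tells systemd to ignore a non-zero exit status.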

This could potentially actually break ldirectord package installation or upgrade, so I think it's more serious than just a failing autopkgtest.

Dan Streetman (ddstreet) on 2019-05-09
summary: - ldirectord fails on arm (in autopkgtest environment)
+ ldirectord systemd service fails if no /var/lock/subsys dir
Dan Streetman (ddstreet) on 2019-05-09
description: updated
Dan Streetman (ddstreet) on 2019-05-09
description: updated
Changed in resource-agents (Ubuntu Eoan):
importance: Undecided → Medium
Changed in resource-agents (Ubuntu Disco):
importance: Undecided → Medium
Changed in resource-agents (Ubuntu Cosmic):
importance: Undecided → Medium
Changed in resource-agents (Ubuntu Bionic):
importance: Undecided → Medium
Changed in resource-agents (Ubuntu Eoan):
status: New → In Progress
Changed in resource-agents (Ubuntu Disco):
status: New → In Progress
Changed in resource-agents (Ubuntu Cosmic):
status: New → In Progress
Changed in resource-agents (Ubuntu Bionic):
status: New → In Progress
Changed in resource-agents (Ubuntu Eoan):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in resource-agents (Ubuntu Disco):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in resource-agents (Ubuntu Cosmic):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in resource-agents (Ubuntu Bionic):
assignee: nobody → Heitor Alves de Siqueira (halves)
Changed in resource-agents (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Heitor Alves de Siqueira (halves)

I've checked the other daemons in resource-agents, and removing the /var/lock/subsys handling doesn't seem likely to cause any issues. Quick tests in a Disco container showed that it also fixes the problem.
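A check like this can be approximated by searching a resource-agents source checkout for the lock path. A minimal sketch follows; the function name find_subsys_refs is hypothetical, and a literal search only catches paths spelled out in full, not ones assembled at run time, so it complements rather than replaces reading the agents themselves.

```shell
#!/bin/sh
# Hypothetical helper: list every file under a source tree that mentions
# a SysV lock path (/var/lock/subsys or /run/lock/subsys), with line
# numbers, for review before dropping the lock handling.
find_subsys_refs() {
    tree=${1:-.}
    # -r: recurse, -n: line numbers, -F: literal match, -I: skip binaries.
    grep -rnFI 'lock/subsys' "$tree"
}
```

Each match is printed as path:line:text; grep exits with status 1 when nothing matches.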

I submitted a PR upstream to start a discussion with the developers about those locks: https://github.com/ClusterLabs/resource-agents/pull/1328
