Circular dependency for systemd units on Debian

Bug #2002653 reported by George Shuklin
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

I found that a pure bare-metal installation (no LXC) on Debian creates a circular systemd dependency.

Symptoms:

After a reboot, one of the following sets of entries appears in the journal:

sysinit.target: Found ordering cycle on local-fs.target/start
sysinit.target: Found dependency on var-www-repo.mount/start
sysinit.target: Found dependency on network-online.target/start
sysinit.target: Found dependency on network.target/start
sysinit.target: Found dependency on openvswitch-switch.service/start
sysinit.target: Found dependency on sysinit.target/start
sysinit.target: Job local-fs.target/start deleted to break ordering cycle starting with sysinit.target/start

openvswitch-switch.service: Found ordering cycle on basic.target/start
openvswitch-switch.service: Found dependency on sockets.target/start
openvswitch-switch.service: Found dependency on mariadbcheck.socket/start
openvswitch-switch.service: Found dependency on network.target/start
openvswitch-switch.service: Found dependency on openvswitch-switch.service/start
openvswitch-switch.service: Job sockets.target/start deleted to break ordering cycle starting with openvswitch-switch.service/start

network-online.target: Job networking.service/start deleted to break ordering cycle starting with network-online.target/start
openvswitch-switch.service: Found ordering cycle on ovs-vswitchd.service/start
openvswitch-switch.service: Found dependency on ovsdb-server.service/start
openvswitch-switch.service: Found dependency on local-fs.target/start
openvswitch-switch.service: Found dependency on var-www-repo.mount/start
openvswitch-switch.service: Found dependency on network-online.target/start
openvswitch-switch.service: Found dependency on network.target/start
openvswitch-switch.service: Found dependency on openvswitch-switch.service/start

Because of the circular nature of the problem, systemd can't identify its root cause, so it deletes the start job of an arbitrary unit to break the cycle. That unit can be networking.service, which renders the server unresponsive, or openvswitch-switch.service, which breaks all Neutron networking.

After extensive bisection, I've identified the units causing the circular dependencies:

* var-www-repo.mount
* mariadbcheck.socket
* mariadbcheck@.service

All of them have an "After = network-online.target" or "After=network.target" dependency. Removing that dependency does not affect functionality (e.g. mariadbcheck.socket has a reverse dependency on sockets.target), and removing it solves the circular dependency problem.

Proposed fix:

* Remove "After=network.target" from mariadbcheck.socket
* Remove "After = network-online.target" from var-www-repo.mount and mariadbcheck@.service
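To see why removing these After= lines is enough, the ordering edges from the first two journal excerpts can be modeled as a directed graph and searched for a cycle. The following Python sketch does that with a depth-first search. The graph is hand-transcribed from the journal (it is not read from systemd), and `ORDERING`, `PROPOSED_FIX`, `find_cycle` and `without` are illustrative names, not part of OpenStack-Ansible.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]

# Ordering edges hand-transcribed from the first two journal excerpts.
# An edge A -> B means "A is ordered After= B", i.e. A's start job waits for B.
ORDERING: Graph = {
    "sysinit.target": ["local-fs.target"],
    "local-fs.target": ["var-www-repo.mount"],
    "var-www-repo.mount": ["network-online.target"],
    "network-online.target": ["network.target"],
    "network.target": ["openvswitch-switch.service"],
    "openvswitch-switch.service": ["sysinit.target", "basic.target"],
    "basic.target": ["sockets.target"],
    "sockets.target": ["mariadbcheck.socket"],
    "mariadbcheck.socket": ["network.target"],
}

# The two After= edges that the proposed fix removes.
PROPOSED_FIX: Set[Tuple[str, str]] = {
    ("var-www-repo.mount", "network-online.target"),
    ("mariadbcheck.socket", "network.target"),
}


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one ordering cycle (first unit repeated at the end), or None."""
    state: Dict[str, str] = {}  # "active" while on the DFS path, "done" afterwards
    path: List[str] = []

    def visit(unit: str) -> Optional[List[str]]:
        state[unit] = "active"
        path.append(unit)
        for dep in graph.get(unit, []):
            if state.get(dep) == "active":
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if dep not in state:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[unit] = "done"
        return None

    for unit in sorted(graph):
        if unit not in state:
            cycle = visit(unit)
            if cycle:
                return cycle
    return None


def without(graph: Graph, removed: Set[Tuple[str, str]]) -> Graph:
    """Return a copy of graph with the given (unit, dependency) edges dropped."""
    return {u: [d for d in deps if (u, d) not in removed] for u, deps in graph.items()}


if __name__ == "__main__":
    print(" -> ".join(find_cycle(ORDERING) or []))
    print(find_cycle(without(ORDERING, PROPOSED_FIX)))
```

Dropping only one of the two edges still leaves a cycle (either the var-www-repo.mount loop or the mariadbcheck.socket loop), which is consistent with both units needing a fix.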

description: updated
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hi George.

I believe this bug has been addressed with https://review.opendev.org/c/openstack/ansible-role-systemd_mount/+/868511

Can you kindly check if it solves the issue?

George Shuklin (george-shuklin) wrote :

Thank you, I'll check.

But! There is also the mariadbcheck@.service issue, which comes from galera_server/tasks/galera_server_post_install.yml.

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Well, var-www-repo.mount does require network-online.target, since by default it's GlusterFS, which requires networking. But if it has been overridden to use some local filesystem, a faulty mount unit will be configured. The same will happen if the systemd_mount role is used to mount any other local filesystem that doesn't require networking. We've seen exactly the same behavior where systemd_mount was involved: the mount conflicted with local-fs.target and ended up in a loop because of that.

But I will now take a deeper look specifically at mariadbcheck.socket.
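The distinction between network and local mounts can be sketched as a small helper that picks the ordering lines for a mount unit from its filesystem type. This is a hypothetical `mount_unit_section` function, not the systemd_mount role's actual code; the ordering roughly follows the default dependencies that systemd.mount(5) describes, and `NETWORK_FS_TYPES` is an assumed, non-exhaustive subset.

```python
from typing import List

# Illustrative subset of network filesystem types (an assumption, not exhaustive).
NETWORK_FS_TYPES = {"glusterfs", "nfs", "nfs4", "cifs", "ceph"}


def mount_unit_section(where: str, fstype: str) -> List[str]:
    """Hypothetical helper: build the [Unit] ordering lines for a mount unit."""
    lines = ["[Unit]", f"Description=Mount {where} ({fstype})"]
    if fstype in NETWORK_FS_TYPES:
        # A network mount can only succeed once the network is up.
        lines += [
            "Wants=network-online.target",
            "After=network-online.target",
            "Before=remote-fs.target",
        ]
    else:
        # A local mount is part of local-fs.target, which sysinit.target waits
        # for; also ordering it after network-online.target can close a cycle.
        lines += ["After=local-fs-pre.target", "Before=local-fs.target"]
    return lines


if __name__ == "__main__":
    print("\n".join(mount_unit_section("/var/www/repo", "glusterfs")))
    print("\n".join(mount_unit_section("/var/www/repo", "ext4")))
```

Keying the ordering on the filesystem type means an override of repo_server_systemd_mounts to a local filesystem would not inherit the network ordering that only GlusterFS needs.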

Dmitriy Rabotyagov (noonedeadpunk) wrote :

That raises a question, though: do you have any local or custom mount that is managed by the systemd_mount role? Or some kind of override, like for repo_server_systemd_mounts?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-galera_server (master)
Changed in openstack-ansible:
status: New → In Progress
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Yes, you're right: I was able to reproduce the issue, and removing network.target from mariadbcheck.socket does indeed solve it.
Thanks for reporting it and for taking the time to investigate and provide us with a solution!

George Shuklin (george-shuklin) wrote :

I've checked the fixes from these commits:

* systemd_mount 480bb0c871848f2c65f6c7312eab77c33b88ee7a
* galera_server a2ce91ebcb7ab08997827ece215563ac685cd7c0 (taken from review 870071)

After reboot:

* Servers without var-www-repo.mount booted just fine (no loop), so 870071 worked.
* Servers with var-www-repo.mount and mariadbcheck booted just fine.

That means both bugs are fixed! Thank you very much!

George Shuklin (george-shuklin) wrote :

Could you backport them to stable/zed? Thanks!

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Yes, sure, we will backport the patch once it lands on master.

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-galera_server (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/870071
Committed: https://opendev.org/openstack/openstack-ansible-galera_server/commit/a2ce91ebcb7ab08997827ece215563ac685cd7c0
Submitter: "Zuul (22348)"
Branch: master

commit a2ce91ebcb7ab08997827ece215563ac685cd7c0
Author: Dmitriy Rabotyagov <email address hidden>
Date: Fri Jan 13 11:16:43 2023 +0100

    Prevent mariadbcheck.socket to wait for network.target

    As of today bare metal scenario does contain systemd ordering cycle [1]
    due to mariadbcheck.socket waiting for network.target while being
    part of that target. Removing that dependency solves the cycle.

    [1] https://paste.openstack.org/show/bE9UlN6dK8awqZl3uwrQ/
    Closes-Bug: #2002653

    Change-Id: If4729eca992a0e647e2f15b3d77ad6300bbf9c12

Changed in openstack-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-galera_server (stable/zed)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-galera_server (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-galera_server (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/870056
Committed: https://opendev.org/openstack/openstack-ansible-galera_server/commit/6eaa3affd4e65811ac28326fc5365ba0759a2fa4
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 6eaa3affd4e65811ac28326fc5365ba0759a2fa4
Author: Dmitriy Rabotyagov <email address hidden>
Date: Fri Jan 13 11:16:43 2023 +0100

    Prevent mariadbcheck.socket to wait for network.target

    As of today bare metal scenario does contain systemd ordering cycle [1]
    due to mariadbcheck.socket waiting for network.target while being
    part of that target. Removing that dependency solves the cycle.

    [1] https://paste.openstack.org/show/bE9UlN6dK8awqZl3uwrQ/
    Closes-Bug: #2002653

    Change-Id: If4729eca992a0e647e2f15b3d77ad6300bbf9c12
    (cherry picked from commit a2ce91ebcb7ab08997827ece215563ac685cd7c0)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-galera_server (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/870057
Committed: https://opendev.org/openstack/openstack-ansible-galera_server/commit/bedc860b2fe68c3dc5e084750693c91d218f6982
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit bedc860b2fe68c3dc5e084750693c91d218f6982
Author: Dmitriy Rabotyagov <email address hidden>
Date: Fri Jan 13 11:16:43 2023 +0100

    Prevent mariadbcheck.socket to wait for network.target

    As of today bare metal scenario does contain systemd ordering cycle [1]
    due to mariadbcheck.socket waiting for network.target while being
    part of that target. Removing that dependency solves the cycle.

    [1] https://paste.openstack.org/show/bE9UlN6dK8awqZl3uwrQ/
    Closes-Bug: #2002653

    Change-Id: If4729eca992a0e647e2f15b3d77ad6300bbf9c12
    (cherry picked from commit a2ce91ebcb7ab08997827ece215563ac685cd7c0)

tags: added: in-stable-yoga
George Shuklin (george-shuklin) wrote :

Should we bump the role requirements in stable/zed? Currently those fixes don't reach actual deployments that use openstack-ansible's stable/zed.

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Yes, definitely. We will do the bump for the new Zed tag, which is going to be made quite soon.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-galera_server (master)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-galera_server (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/873334
Committed: https://opendev.org/openstack/openstack-ansible-galera_server/commit/8a8d29ea490fba6695e3356831846466f6991089
Submitter: "Zuul (22348)"
Branch: master

commit 8a8d29ea490fba6695e3356831846466f6991089
Author: Dmitriy Rabotyagov <email address hidden>
Date: Thu Feb 9 22:19:36 2023 +0100

    Allow mariadbcheck socket to FreeBind

    Once we've removed network.target from wanted targets for
    mariadbcheck.socket, it started to fail to startup intermittently in LXC
    deployments, since it was trying to bind on IP address that is not
    brought up yet. At the same time we can't wait for IP being up, as
    OVS while providing network, waits for sockets.target as it needs
    to have ovsdb started up, so waiting for network.target does
    create circular dependency.

    To avoid that we're allowing socket to bind on IP even when IP is not
    UP yet. Other possible solution would be to bind on 0.0.0.0.

    Depends-On: https://review.opendev.org/c/openstack/openstack-ansible/+/872896
    Change-Id: Ia4cde2153813e68419d261cd94e3017523177142
    Closes-Bug: #2003631
    Related-Bug: #2002653
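The FreeBind= option of a systemd socket unit sets the Linux IP_FREEBIND socket option, which lets a socket bind to an address that is not yet configured on any interface. The following Python sketch demonstrates the effect outside systemd. It assumes a Linux host; `try_bind` is a hypothetical helper, the fallback value 15 is IP_FREEBIND from <linux/in.h>, and 192.0.2.1 is a documentation address (RFC 5737) assumed not to be configured locally.

```python
import socket

# IP_FREEBIND is 15 in Linux's <linux/in.h>; fall back to that value if the
# Python socket module does not export the constant (assumption: Linux host).
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def try_bind(address: str, freebind: bool, port: int = 0) -> bool:
    """Try to bind a TCP socket to address:port and report whether it worked."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if freebind:
            # What FreeBind=yes requests for a systemd socket unit: accept an
            # address that is not configured on any interface yet.
            sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((address, port))
        return True
    except OSError:
        # Typically EADDRNOTAVAIL when the address is not local.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print("plain bind:", try_bind("192.0.2.1", freebind=False))
    print("freebind:  ", try_bind("192.0.2.1", freebind=True))
```

With the default net.ipv4.ip_nonlocal_bind=0, the plain bind is expected to fail and the freebind one to succeed. Binding on 0.0.0.0, the alternative mentioned in the commit, avoids the problem without any socket option, at the cost of listening on every address.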

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-galera_server (stable/zed)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-galera_server (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-galera_server (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/874732
Committed: https://opendev.org/openstack/openstack-ansible-galera_server/commit/f9a8567e61e09e3c6ffd6b8885cb493c6c7a7a70
Submitter: "Zuul (22348)"
Branch: stable/zed

commit f9a8567e61e09e3c6ffd6b8885cb493c6c7a7a70
Author: Dmitriy Rabotyagov <email address hidden>
Date: Thu Feb 9 22:19:36 2023 +0100

    Allow mariadbcheck socket to FreeBind

    Once we've removed network.target from wanted targets for
    mariadbcheck.socket, it started to fail to startup intermittently in LXC
    deployments, since it was trying to bind on IP address that is not
    brought up yet. At the same time we can't wait for IP being up, as
    OVS while providing network, waits for sockets.target as it needs
    to have ovsdb started up, so waiting for network.target does
    create circular dependency.

    To avoid that we're allowing socket to bind on IP even when IP is not
    UP yet. Other possible solution would be to bind on 0.0.0.0.

    Depends-On: https://review.opendev.org/c/openstack/openstack-ansible/+/872896
    Change-Id: Ia4cde2153813e68419d261cd94e3017523177142
    Closes-Bug: #2003631
    Related-Bug: #2002653
    (cherry picked from commit 8a8d29ea490fba6695e3356831846466f6991089)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-galera_server (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/874733
Committed: https://opendev.org/openstack/openstack-ansible-galera_server/commit/4acaf657873452e0720a1b3f5ba2f889ab88d96e
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 4acaf657873452e0720a1b3f5ba2f889ab88d96e
Author: Dmitriy Rabotyagov <email address hidden>
Date: Thu Feb 9 22:19:36 2023 +0100

    Allow mariadbcheck socket to FreeBind

    Once we've removed network.target from wanted targets for
    mariadbcheck.socket, it started to fail to startup intermittently in LXC
    deployments, since it was trying to bind on IP address that is not
    brought up yet. At the same time we can't wait for IP being up, as
    OVS while providing network, waits for sockets.target as it needs
    to have ovsdb started up, so waiting for network.target does
    create circular dependency.

    To avoid that we're allowing socket to bind on IP even when IP is not
    UP yet. Other possible solution would be to bind on 0.0.0.0.

    Depends-On: https://review.opendev.org/c/openstack/openstack-ansible/+/872896
    Change-Id: Ia4cde2153813e68419d261cd94e3017523177142
    Closes-Bug: #2003631
    Related-Bug: #2002653
    (cherry picked from commit 8a8d29ea490fba6695e3356831846466f6991089)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-galera_server yoga-eom

This issue was fixed in the openstack/openstack-ansible-galera_server yoga-eom release.
