[2.3.0~alpha3] Lingering beacon-monitor/observe-beacons processes after stopping rackd/regiond

Bug #1718122 reported by Данило Шеган
This bug affects 1 person
Affects  Status     Importance  Assigned to    Milestone
MAAS     Invalid    Medium      Mike Pontillo
2.3      Won't Fix  High        Mike Pontillo

Bug Description

After issuing "systemctl stop maas-rackd" and "systemctl stop maas-regiond" on Landmaas, none of the beacon-monitor/observe-beacons processes are stopped. I would expect stopping rackd to be sufficient, but I stopped regiond just in case.

This results in a "ps -faux | grep --context 1 beacon-monitor" output like the one on https://pastebin.canonical.com/198703/

Interestingly, restarting just the rackd kills the observe-beacons processes (https://pastebin.canonical.com/198705/), but adds another set of beacon-monitor processes and their tcpdump children.

observe-beacons processes are restarted after I start maas-regiond.

This happens with network discovery set to "disabled" in the MAAS dashboard.

Seen with 2.3.0~alpha3-6244-gfc2f29b from experimental3 ppa.
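
For anyone trying to confirm the symptom, the "ps | grep" check above can be scripted. The following is only a minimal sketch, not part of MAAS: the function names find_lingering and list_lingering are made up for illustration, and the only details taken from this report are the two process names. It assumes a system whose ps accepts "-e -o pid= -o args=".

```python
import subprocess

# Process names taken from the bug report; everything else is illustrative.
TARGETS = ("beacon-monitor", "observe-beacons")


def find_lingering(ps_output, targets=TARGETS):
    """Return (pid, command) pairs whose command line mentions a target name.

    Expects one process per line in the form "<pid> <command line>".
    """
    matches = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, command = int(parts[0]), parts[1]
        if any(name in command for name in targets):
            matches.append((pid, command))
    return matches


def list_lingering():
    """Run ps and filter its output (hypothetical helper, needs `ps`)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_lingering(result.stdout)
```

Calling list_lingering() after "systemctl stop maas-rackd" should return an empty list if the stop worked as expected; any entries it returns are the leftover processes described in this bug.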

summary: - Lingering beacon-monitor/observe-beacons processes after stopping
- rackd/regiond
+ [2.3.0~alpha3] Lingering beacon-monitor/observe-beacons processes after
+ stopping rackd/regiond
description: updated
description: updated
Changed in maas:
milestone: none → 2.3.0
importance: Undecided → Critical
status: New → Triaged
Changed in maas:
milestone: 2.3.0 → 2.3.x
Changed in maas:
milestone: 2.3.x → 2.4.0beta1
Changed in maas:
milestone: 2.4.0beta1 → 2.4.0beta2
Changed in maas:
importance: Critical → High
Changed in maas:
milestone: 2.4.0beta2 → 2.4.0beta3
Changed in maas:
milestone: 2.4.0beta3 → 2.4.0rc1
Changed in maas:
assignee: nobody → Mike Pontillo (mpontillo)
importance: High → Medium
milestone: 2.4.0rc1 → 2.4.0rc2
Changed in maas:
milestone: 2.4.0rc2 → 2.5.0
Changed in maas:
milestone: 2.5.0 → 2.5.0beta2
Mike Pontillo (mpontillo) wrote:

After investigating this issue on MAAS 2.5 on Bionic, I am unable to reproduce it.

Given that it used to be easy to reproduce the issue on Xenial-based MAAS systems, my theory is that between Ubuntu 16.04 and Ubuntu 18.04, systemd fixed a bug that prevented these processes from being killed. (After cloning the systemd source code and searching the commit log, I see that 5 bugs related to killing zombie processes have been fixed between the time that Xenial was released and now.)

To be clear, the init system (systemd) is responsible for reaping zombie processes. These issues were occurring because that was not happening.

Changed in maas:
status: Triaged → Invalid
milestone: 2.5.0beta2 → none