[2.3.0~alpha3] Lingering beacon-monitor/observe-beacons processes after stopping rackd/regiond
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Invalid | Medium | Mike Pontillo |
2.3 | Won't Fix | High | Mike Pontillo |
Bug Description
After issuing "systemctl stop maas-rackd" and "systemctl stop maas-regiond" on Landmaas, none of the beacon-
This results in a "ps -faux | grep --context 1 beacon-monitor" output like the one on https:/
Interestingly, restarting just the rackd kills the observe-beacons processes (https:/
observe-beacons processes are restarted after I start maas-regiond.
This happens with network discovery set to "disabled" in the MAAS dashboard.
Seen with 2.3.0~alpha3-
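The check described above (stop the services, then look for leftover beacon processes) can be scripted. The following is a minimal sketch, assuming a Linux host with procps `ps`. The helper names `find_lingering` and `list_processes` are hypothetical and not part of MAAS, and the two marker strings are simply the process names mentioned in this report.

```python
import subprocess

# Process-name fragments taken from this report; adjust as needed.
BEACON_MARKERS = ("beacon-monitor", "observe-beacons")


def find_lingering(ps_output, markers=BEACON_MARKERS):
    """Return (pid, ppid, stat, args) tuples for rows whose command contains a marker."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or malformed row
        pid, ppid, stat, args = fields
        if any(marker in args for marker in markers):
            matches.append((int(pid), int(ppid), stat, args))
    return matches


def list_processes():
    """Capture a process listing with procps `ps` (Linux)."""
    return subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,args"],
        capture_output=True, text=True, check=True,
    ).stdout


if __name__ == "__main__":
    # Run this after "systemctl stop maas-rackd" and "systemctl stop maas-regiond".
    for pid, ppid, stat, args in find_lingering(list_processes()):
        print(f"{pid}\t{ppid}\t{stat}\t{args}")
```

Any rows printed after both services are stopped would correspond to the lingering processes described here. The PPID column helps show whether they have been reparented, and the STAT column shows whether they are still running or are zombies (`Z`).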
summary: |
- Lingering beacon-monitor/observe-beacons processes after stopping rackd/regiond
+ [2.3.0~alpha3] Lingering beacon-monitor/observe-beacons processes after stopping rackd/regiond |
description: | updated |
description: | updated |
Changed in maas: | |
milestone: | none → 2.3.0 |
importance: | Undecided → Critical |
status: | New → Triaged |
Changed in maas: | |
milestone: | 2.3.0 → 2.3.x |
Changed in maas: | |
milestone: | 2.3.x → 2.4.0beta1 |
Changed in maas: | |
milestone: | 2.4.0beta1 → 2.4.0beta2 |
Changed in maas: | |
importance: | Critical → High |
Changed in maas: | |
milestone: | 2.4.0beta2 → 2.4.0beta3 |
Changed in maas: | |
milestone: | 2.4.0beta3 → 2.4.0rc1 |
Changed in maas: | |
assignee: | nobody → Mike Pontillo (mpontillo) |
importance: | High → Medium |
milestone: | 2.4.0rc1 → 2.4.0rc2 |
Changed in maas: | |
milestone: | 2.4.0rc2 → 2.5.0 |
Changed in maas: | |
milestone: | 2.5.0 → 2.5.0beta2 |
After investigating on MAAS 2.5 on Bionic, I am unable to reproduce this issue.
Given that the issue used to be easy to reproduce on Xenial-based MAAS systems, my theory is that systemd fixed a bug between Ubuntu 16.04 and Ubuntu 18.04 that had prevented these processes from being killed. (After cloning the systemd source code and searching the commit log, I see that five bugs related to killing zombie processes have been fixed since Xenial was released.)
To be clear, the init system (systemd) is responsible for reaping zombie processes. These issues were occurring because that was not happening.
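For readers unfamiliar with the term, "reaping" means that a parent collects an exited child's status with a `wait` call; until then the child remains in the process table as a zombie. The following is a minimal, Linux-only sketch of that mechanism, not MAAS or systemd code. The helper names `proc_state` and `demo_reap` are hypothetical, and it relies on `/proc/<pid>/stat` to read the process state.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field follows the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_reap(timeout=5.0):
    """Fork a child that exits at once; report its state before and after waitpid()."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running parent cleanup

    # Parent: wait until the kernel marks the exited child as a zombie.
    deadline = time.monotonic() + timeout
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = proc_state(pid)

    os.waitpid(pid, 0)  # reap: collect the exit status, freeing the table entry
    after = proc_state(pid)
    return before, after


if __name__ == "__main__":
    print(demo_reap())
```

When a parent exits without doing this, its children are reparented (normally to PID 1, which is systemd on these Ubuntu releases), and that process is then the one expected to call `wait` on them.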