[2.3, service-tracking] MAAS service tracking never notices tracked daemons crash (ntp is an example) or viceversa

Bug #1747459 reported by Jason Hobbs
This bug affects 2 people
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   Critical     Blake Rouse
2.3       Triaged        High         Unassigned

Bug Description

If ntp crashes on a region controller (for example, via SIGSEGV), MAAS never seems to notice. The service indicator in the UI stays green for ntp on that controller, and MAAS continues to tell nodes to use the VIP for ntp, even though it may not work.

To reproduce:

1) Run kill -SIGSEGV $(pidof ntpd) on a MAAS region controller.
2) Observe that the controller's service status never changes and ntp continues to be shown as up.
3) If using HA, observe that crm status still shows all units up for the MAAS VIP.

This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
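
The check in step 2 can be scripted roughly as follows. This is a minimal sketch and not part of the original report: it assumes the systemd unit is named `ntp`, that MAAS-side states are called 'running' and 'dead', and `get_maas_status` is a hypothetical callable standing in for however you read the state MAAS shows for that controller.

```python
import subprocess
import time


def systemd_state(unit, run=subprocess.run):
    """Return what `systemctl is-active <unit>` prints, e.g. 'active' or 'failed'."""
    result = run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()


def watch_for_mismatch(get_systemd, get_maas_status, checks, interval,
                       sleep=time.sleep):
    """Sample both views `checks` times and return the samples where they differ.

    get_maas_status is hypothetical: it should return the state MAAS reports
    for the service ('running' is the assumed name for "up").
    """
    mismatches = []
    for i in range(checks):
        systemd = get_systemd()
        maas = get_maas_status()
        systemd_up = systemd == "active"
        maas_up = maas == "running"
        if systemd_up != maas_up:
            mismatches.append((i, systemd, maas))
        if i < checks - 1:
            sleep(interval)  # wait between samples, but not after the last one
    return mismatches
```

Called as `watch_for_mismatch(lambda: systemd_state("ntp"), get_maas_status, checks=20, interval=60)`, a result that is still non-empty for the later samples corresponds to the behaviour described above: systemd sees the daemon as down while MAAS keeps showing it as up.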

tags: added: cdo-qa maas-ha-testing
Andres Rodriguez (andreserl) wrote :

How long did you wait to make the determination that the service crash wasn't noticed by MAAS?

Also, did you confirm that the system *never* tried to start the service again?

Changed in maas:
status: New → Incomplete
Blake Rouse (blake-rouse) wrote :

I believe there is a 30 second delay.

Jason Hobbs (jason-hobbs) wrote :

Ok, I waited 19 minutes, which isn't never but it's never-ish.

http://paste.ubuntu.com/26525461/

maas.service_monitor does recognize that it's not running, but it doesn't restart it, and it doesn't report anything to MAAS.
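
For reference, the behaviour one would expect from a single monitoring pass can be sketched like this. It is an illustration only, not the actual maas.service_monitor code: `monitor_pass` and the `observe`, `restart` and `report` callables are hypothetical names, and the 'on'/'running'/'dead' state names are assumptions.

```python
def monitor_pass(expected, observe, restart, report):
    """Run one pass over the tracked services.

    expected: dict mapping service name -> desired state ('on' or 'off').
    observe, restart, report: hypothetical callables supplied by the caller.
    Returns a dict mapping service name -> state after the pass.
    """
    results = {}
    for name, want in expected.items():
        state = observe(name)  # e.g. 'running' or 'dead'
        if want == "on" and state != "running":
            restart(name)          # a dead service that should be on is restarted
            state = observe(name)  # re-check after the restart attempt
        results[name] = state
        report(name, state)        # the state is reported on every pass
    return results
```

The point of the sketch is the last two steps: noticing that a service is dead is not enough unless the pass also attempts a restart and pushes the resulting state to whatever displays it.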

Changed in maas:
status: Incomplete → New
Blake Rouse (blake-rouse) wrote :

19 minutes seems long enough ;-)

Changed in maas:
importance: Undecided → Critical
milestone: none → 2.4.x
milestone: 2.4.x → 2.4.0alpha1
status: New → Triaged
importance: Critical → High
summary: - MAAS never notices that ntp crashes
+ MAAS service tracking never notices tracked daemons crash (ntp is an
+ example)
Changed in maas:
milestone: 2.4.0alpha1 → 2.4.0alpha2
Andres Rodriguez (andreserl) wrote : Re: MAAS service tracking never notices tracked daemons crash (ntp is an example) or viceversa

I can confirm this is also the other way around.

E.g., I had a broken bind; I fixed it and restarted it manually, and systemd reported it was running fine, but MAAS never updated that in the UI.

summary: MAAS service tracking never notices tracked daemons crash (ntp is an
- example)
+ example) or viceversa
Changed in maas:
importance: High → Critical
Changed in maas:
milestone: 2.4.0alpha2 → 2.4.0beta1
summary: - MAAS service tracking never notices tracked daemons crash (ntp is an
- example) or viceversa
+ [2.4, service-tracking] MAAS service tracking never notices tracked
+ daemons crash (ntp is an example) or viceversa
Changed in maas:
milestone: 2.4.0beta1 → 2.4.0beta2
Changed in maas:
assignee: nobody → Blake Rouse (blake-rouse)
summary: - [2.4, service-tracking] MAAS service tracking never notices tracked
+ [2.3, service-tracking] MAAS service tracking never notices tracked
daemons crash (ntp is an example) or viceversa
Blake Rouse (blake-rouse) wrote :

I have been trying to reproduce this with the latest master, which includes https://code.launchpad.net/~blake-rouse/maas/+git/maas/+merge/342811, which I believe addresses the actual cause of this issue.

Also, while reproducing this issue we noticed that the 1 minute delay is too long to wait to restart services, so we are lowering them to 1.

Changed in maas:
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released