Pacemaker arbitrarily restarts service after 33 failure counter increments
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Mirantis OpenStack | Invalid | High | MOS Maintenance | | |
Bug Description
Detailed bug description:
This report is related to https:/
In the conditions detailed above, and only in the customer environment, the neutron-dhcp-agent resource will be killed and will not be started again cleanly by the OCF scripts.
Pacemaker detects this and, judging by the log output, tries to start the process repeatedly over the course of 30-40 minutes, about once per minute. A failure counter is evidently incrementing over this period.
After about 33 failures, PCS bans the resource and then, about 50 seconds later, starts it successfully. It is unclear which PCS settings govern this behavior, as the configuration for neutron-dhcp-agent does not appear to match what is being observed.
Steps to reproduce:
So far, this is reproducible only in the customer environment.
Expected results:
PCS should start up the resource immediately after the issue reported in https:/
Actual result:
It takes 33+ minutes for the resource to be started again, and the resource is banned along the way even though no failure threshold appears to be set in the resource configuration.
Impact:
Because the precise configuration that governs this behavior is unknown, the behavior appears arbitrary. As a result, the customer cannot set a timeout threshold on the Fuel task that would account for this situation.
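As a rough illustration of the timing involved, the figures quoted in this report (about 33 failed start attempts, roughly one per minute, and about 50 seconds between the ban and the successful start) can be combined into an estimate of the total recovery time. This is only a sketch: the function name is hypothetical, the inputs are the approximate values observed here, and nothing in it reflects a confirmed Pacemaker setting.

```python
def estimated_recovery_seconds(failed_attempts, retry_interval_s, post_ban_delay_s):
    """Rough time from the first failed start until the resource runs again.

    All three inputs are observations from the logs, not configured values.
    """
    return failed_attempts * retry_interval_s + post_ban_delay_s


# Approximate figures from this report: 33 failures, one per minute,
# and a successful start about 50 seconds after the ban.
total = estimated_recovery_seconds(33, 60, 50)
print(total, "seconds, about", round(total / 60, 1), "minutes")
```

Any timeout chosen for the Fuel task would need to exceed such an estimate by some margin, since the failure count and the retry interval were only observed, not derived from known settings.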
Description of the environment:
- Operating system: Ubuntu 14.04
- Reference architecture: MOS 9.0
- Network model: Neutron + OVS
Additional information:
Clone: clone_neutron-
Meta Attrs: interleave=true
Resource: neutron-dhcp-agent (class=ocf provider=fuel type=neutron-
Attributes: plugin_
Operations: monitor interval=20 timeout=30 (neutron-
start interval=0 timeout=60 (neutron-
stop interval=0 timeout=60 (neutron-
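The observation that no failure threshold is visible can be checked mechanically against output like the above. Pacemaker's failure handling is normally governed by the `migration-threshold` and `failure-timeout` resource meta attributes, so a minimal sketch, assuming the `Meta Attrs: key=value ...` line format shown here, is to parse that line and list which of the two options are absent. The helper names are hypothetical, and the only meta attribute line in this excerpt belongs to the clone. An option missing from this output may still be inherited from cluster-wide resource defaults, which the excerpt does not show.

```python
# Pacemaker meta attributes that control failure handling.
FAILURE_OPTIONS = ("migration-threshold", "failure-timeout")


def parse_meta_attrs(line):
    """Parse a 'Meta Attrs: k=v k=v' line into a dict of strings."""
    _, _, rest = line.partition(":")
    return dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)


def missing_failure_options(meta):
    """Return the failure-handling options not set in the given meta attributes."""
    return [name for name in FAILURE_OPTIONS if name not in meta]


# The meta attribute line reported for the clone in this bug.
meta = parse_meta_attrs("Meta Attrs: interleave=true")
print(missing_failure_options(meta))
```

For the line in this report, both options come back as missing, which is consistent with the statement that the ban does not follow from anything visible in the resource configuration.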
Changed in mos: | |
milestone: | none → 9.2-mu-14 |
assignee: | nobody → MOS Maintenance (mos-maintenance) |
importance: | Undecided → High |
status: | New → Confirmed |
tags: | added: customer-found |
Closed per Jesse's request.