Comment 33 for bug 1656020

Dennis Dmitriev (ddmitriev) wrote :

This bug (a duplicated ntpd service) and its consequences (flapping ntp peers) are caused by an issue in the test cases.

After rebooting a node that runs pacemaker, the test case *MUST* call the following method to ensure that all pacemaker resources have started after the reboot:

self.fuel_web.assert_ha_services_ready(cluster_id)

Please add this call to all test cases where deployed controllers are rebooted.
This is required *NOT* only for time sync. Working pacemaker resources are mandatory for further cluster functionality, so a test case should not continue until all necessary resources are reported as started.

--------------------
Details:

1. ntp.py from fuel-devops tries to detect how the "ntp" service is controlled: via pacemaker, via the 'service' command, or via the 'systemctl' command.

2. Right after the node reboots, pacemaker resources are not started yet, so the pacemaker check fails (see the example below):
2017-03-30 04:04:15 - DEBUG - ssh_client.py:868 -- 'ps -C pacemakerd && crm_resource --resource p_ntp --locate' execution results:
Exit code: EX_ERROR<1(0x01)>

3. ntp.py found an init script in /etc/init.d/ntp.* and assumed that "ntp" is controlled by this script.

4. After ntp.py starts the 'ntp' service with the init script, pacemaker wakes up and starts a second instance of the 'ntp' service in the namespace.
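
The race in steps 2-4 could be avoided by polling the pacemaker check instead of running it once. Below is a minimal sketch of that idea, not the actual ntp.py code. The function name, the `run` callable (assumed to take a shell command and return an exit code plus stdout) and the timeout values are all hypothetical; only the shell command and the "is running on" text come from the logs in this comment.

```python
import time

# The combined check quoted from the ssh_client.py log in this comment.
PACEMAKER_CHECK = "ps -C pacemakerd && crm_resource --resource p_ntp --locate"


def pacemaker_manages_ntp(run, timeout=300, interval=5, sleep=time.sleep):
    """Poll until pacemaker reports p_ntp as running, or give up.

    `run` is a hypothetical callable: it takes a shell command string
    and returns a tuple (exit_code, stdout).
    Returns True once the resource is located, False after `timeout`
    seconds of unsuccessful polling.
    """
    waited = 0
    while True:
        exit_code, stdout = run(PACEMAKER_CHECK)
        # Assumption: a started resource is reported on stdout as
        # "resource p_ntp is running on: <node>".
        if exit_code == 0 and "is running on" in stdout:
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

A caller would fall back to the init script or systemctl only when this returns False. The trade-off is that nodes without pacemaker wait for the full timeout before the fallback, so the timeout should be chosen with that in mind.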

Here is an example of what happens after the reboot:
=====================================================
root@node-4:~# ps -C pacemakerd
  PID TTY TIME CMD
 5053 ? 00:00:00 pacemakerd

root@node-4:~# crm_resource --list
 ...
 Clone Set: clone_p_ntp [p_ntp]

root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate
Error performing operation: No such device or address
root@node-4:~# crm_resource --resource p_ntp --locate

root@node-4:~# crm_resource --list
 Clone Set: clone_p_ntp [p_ntp]
     Stopped: [ node-4.test.domain.local node-5.test.domain.local ]

root@node-4:~# crm_resource --resource p_ntp --locate
resource p_ntp is NOT running
resource p_ntp is NOT running
resource p_ntp is NOT running
root@node-4:~# crm_resource --resource p_ntp --locate
resource p_ntp is NOT running
resource p_ntp is NOT running
resource p_ntp is NOT running
root@node-4:~# crm_resource --resource p_ntp --locate
resource p_ntp is running on: node-4.test.domain.local
resource p_ntp is running on: node-5.test.domain.local
resource p_ntp is running on: node-2.test.domain.local
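
The transcript above shows three distinct answers from 'crm_resource --resource p_ntp --locate': an error while the resource is not known yet, "is NOT running" while it is stopped, and "is running on: <node>" once it has started. A small helper that tells these apart might look like the sketch below. The function name and the state labels are hypothetical, and the parsing relies only on the output lines shown in this comment.

```python
def parse_locate_output(output):
    """Classify the text printed by `crm_resource --resource p_ntp --locate`.

    Returns a tuple (state, nodes) where state is one of:
      "running" - at least one "is running on: <node>" line was seen
      "stopped" - only "is NOT running" lines were seen
      "unknown" - anything else, e.g. "Error performing operation: ..."
    and nodes is the list of node names the resource runs on.
    """
    nodes = []
    saw_stopped = False
    for line in output.splitlines():
        line = line.strip()
        if "is running on:" in line:
            # Keep everything after the marker as the node name.
            nodes.append(line.split("is running on:", 1)[1].strip())
        elif "is NOT running" in line:
            saw_stopped = True
    if nodes:
        return "running", nodes
    if saw_stopped:
        return "stopped", []
    return "unknown", []
```

Only the "running" state would mean that pacemaker controls the service; the other two states mean the check should be retried rather than treated as "pacemaker is not used".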