Time failed to sync on same nodes for different tests in random order

Bug #1534513 reported by Tatyanka
This bug affects 2 people
Affects             Status         Importance  Assigned to   Milestone
Fuel for OpenStack  Fix Released   Medium      Fuel QA Team
6.1.x               Won't Fix      Medium      Fuel QA Team
8.0.x               Won't Fix      Medium      Fuel QA Team
Mitaka              Fix Released   Medium      Fuel QA Team

Bug Description

There are no known steps to reproduce this issue, but time sync sometimes fails for one of the nodes in different tests. It may happen about once in 20 attempts, but when it does, the test fails.

Provision a new cluster many times after deleting the old one

Scenario:
1. Create HA cluster
2. Add 1 controller, 2 compute and 2 cinder nodes
3. Deploy the cluster
4. Delete cluster
5. Create another HA cluster
6. Create snapshot of environment
7. Revert the snapshot and try to provision the cluster 10 times

Time on nodes was not synchronized:
[('node-3', [])]

Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1043, in runTest
    self._testFunc()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.repeatable_image_based/fuelweb_test/helpers/decorators.py", line 83, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.repeatable_image_based/fuelweb_test/tests/tests_strength/test_image_based.py", line 47, in repeatable_image_based
    self.env.revert_snapshot("ready_with_5_slaves")
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.repeatable_image_based/fuelweb_test/models/environment.py", line 358, in revert_snapshot
    self.sync_time(nailgun_nodes)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.repeatable_image_based/fuelweb_test/helpers/decorators.py", line 341, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.repeatable_image_based/fuelweb_test/models/environment.py", line 573, in sync_time
    g_ntp.do_sync_time()
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.repeatable_image_based/fuelweb_test/helpers/ntp.py", line 94, in do_sync_time
    " \n{0}".format(self.report_not_connected()))
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/asserts.py", line 163, in assert_true
    raise ASSERTION_ERROR(message)
AssertionError: Time on nodes was not synchronized:
[('node-3', [])]

Ilya Kutukov (ikutukov)
Changed in fuel:
status: New → Confirmed
Nastya Urlapova (aurlapova) wrote :

The "swarm-blocker" tag was added because this issue causes many failures on SWARM; see the duplicate.

Changed in fuel:
importance: Medium → High
tags: added: swarm-blocker
Bogdan Dobrelya (bogdando) wrote :

swarm-blockers affect CI/BVT badly, hence critical

Alexander Kurenyshev (akurenyshev) wrote :

Let's change the priority to High, because the issue doesn't affect BVT and appears only rarely in SWARM tests.
We have a review, https://review.openstack.org/#/c/269790/1, which adds verbosity to the logs and could shed light on what is wrong with the ntpd service.

Alexander Kurenyshev (akurenyshev) wrote :

The issue has not been reproduced since 2016-01-15, so I am changing the bug importance to Medium. We'll wait for it to reproduce again.

Changed in fuel:
importance: High → Medium
Timur Nurlygayanov (tnurlygayanov) wrote :

The issue was reproduced on Feb 1 in a BVT job:
https://product-ci.infra.mirantis.net/job/8.0.ubuntu.bvt_2/458/

Dennis Dmitriev (ddmitriev) wrote :

From the failure in /8.0.ubuntu.bvt_2/458/, it looks like there are no configured peers for ntpd on node-1:

2016-02-01 03:40:35,831 - DEBUG __init__.py:54 -- Calling: get_peers with args: ([<class 'fuelweb_test.helpers.ntp.NtpInitscript'>(0x7fea107dc990) admin_ip:None, node_name:node-1, sync:True, conn:False],) {}
2016-02-01 03:40:35,831 - DEBUG helpers.py:335 -- Executing command: 'ntpq -pn 127.0.0.1'
2016-02-01 03:40:45,854 - DEBUG __init__.py:59 -- Done: get_peers with result: []

The expected result should look like:

Node: node-4, ntpd peers: ['*10.109.15.2 10.109.15.1 3 u 54 64 377 0.261 -10.694 4.128\n']

, where '10.109.15.2' is the server configured in /etc/ntp.conf on node-4.
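
As a rough illustration of how a peer line like the one above can be read, here is a minimal sketch that parses `ntpq -pn` output. The names `Peer`, `parse_ntpq_peers` and `has_sys_peer` are hypothetical and are not taken from fuelweb_test; the sketch assumes the standard ten-column peers billboard and numeric addresses (`-n`), where a leading `*` marks the peer that ntpd has selected for synchronization.

```python
from collections import namedtuple

# Hypothetical helper type, not part of fuelweb_test.
Peer = namedtuple("Peer", "tally remote refid stratum reach offset_ms")

# Tally characters ntpq may print in the first column of a peer line.
TALLY_CODES = "*+-#xo."


def parse_ntpq_peers(output):
    """Parse the text printed by 'ntpq -pn' into a list of Peer tuples."""
    peers = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header and the '====' separator.
        if not stripped or stripped.startswith("remote") or stripped.startswith("="):
            continue
        if stripped[0] in TALLY_CODES:
            tally, rest = stripped[0], stripped[1:]
        else:
            tally, rest = " ", stripped
        fields = rest.split()
        if len(fields) < 10:
            continue  # not a complete peer line
        peers.append(Peer(
            tally=tally,
            remote=fields[0],
            refid=fields[1],
            stratum=int(fields[2]),
            reach=int(fields[6], 8),      # reach register is printed in octal
            offset_ms=float(fields[8]),
        ))
    return peers


def has_sys_peer(peers):
    """True if ntpd reports a selected system peer ('*' tally)."""
    return any(p.tally == "*" for p in peers)
```

With the node-4 line above, `parse_ntpq_peers` would yield one peer with remote `10.109.15.2`, and `has_sys_peer` would be true; for the empty output seen on node-1 it returns an empty list.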

If the bug reproduces, please check that the remote server(s) are configured in /etc/ntp.conf and that the ntpd service actually uses this config.
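
The config check suggested above could be sketched roughly as follows. This is only an illustration: `configured_servers` and `servers_missing_from_peers` are hypothetical names, the text of /etc/ntp.conf is assumed to have been fetched from the node already, and the comparison only makes sense when the config lists numeric addresses, since `ntpq -pn` prints addresses rather than hostnames.

```python
def configured_servers(conf_text):
    """Return the addresses named on server/peer/pool lines of an ntp.conf text."""
    servers = []
    for raw in conf_text.splitlines():
        # Drop comments, then look at the directive and its first argument.
        line = raw.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("server", "peer", "pool"):
            servers.append(fields[1])
    return servers


def servers_missing_from_peers(conf_text, peer_remotes):
    """List configured servers that do not appear among the ntpq peer addresses."""
    seen = set(peer_remotes)
    return [s for s in configured_servers(conf_text) if s not in seen]
```

An empty result from `configured_servers` would point at a missing or unrendered config, while a non-empty result from `servers_missing_from_peers` would suggest that the running ntpd is not using the servers in that file.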

Nastya Urlapova (aurlapova) wrote :

Feel free to reopen the issue if you encounter it again.

Nastya Urlapova (aurlapova) wrote :

There were several fixes to the time sync functionality, so the bug was moved to Fix Committed.

Vadim Rovachev (vrovachev) wrote :
tags: added: non-release
Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: swarm-blocker
Alexey Stupnikov (astupnikov) wrote :

We no longer support MOS5.1, MOS6.0, or MOS6.1.
We deliver only Critical/Security fixes to MOS7.0 and MOS8.0.
We deliver only High/Critical/Security fixes to MOS9.2.
