Fuel for OpenStack

[system-tests] Review the approach to time synchronization in system tests

Bug #1433484 reported by Dennis Dmitriev on 2015-03-18

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Released	High	Dennis Dmitriev	Fuel for OpenStack 6.1
5.1.x	In Progress	High	Dennis Dmitriev	Fuel for OpenStack 5.1.1-updates
6.0.x	In Progress	High	Dennis Dmitriev	Fuel for OpenStack 6.0-updates

Bug Description

Our system tests use 'snapshot' and 'revert' for virtual machines to save deployment time when the same configuration is tested in several different test cases.

After each revert, the time on all virtual machines is left in the past and must be corrected before the test run.

We synchronize time with the command 'ntpd -gq', which has the following issues:

  - if the server acting as the time source doesn't run the 'ntpd' service, the command 'ntpd -gq' on a slave hangs forever;
  - if the 'ntpd' service on the time-source server has recently restarted, it rejects all incoming 'ntpd -gq' requests from its slaves for some time. This leads to a 'no servers were found' error after 'ntpd -gq' on a slave, and the time on the slave remains in the past;
  - the new approach to providing time for the cluster requires detecting exactly how 'ntpd' is started on a node, because on the controllers 'ntpd' is started inside the 'vrouter' namespace. If 'ntpd' is restarted by the 'init' script, it will be non-functional on the controller for the rest of the nodes in the cluster.

These reasons make time synchronization very challenging.
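To illustrate the first issue, here is a minimal sketch of guarding a command that may hang with a timeout. It is not the code used in the system tests; the helper name 'run_with_timeout' and the 60-second value in the usage comment are assumptions made for illustration.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd; return its exit code, or None if it did not finish in time."""
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so a stuck
        # process is not left behind.
        return None


# Hypothetical usage on a slave node (needs root and an installed ntpd):
#   rc = run_with_timeout(["ntpd", "-gq"], timeout=60)
#   if rc is None: the time source probably does not answer
```

A timeout only keeps the test from blocking forever; it does not fix the clock on the slave, which is why a different time source is proposed in this report.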

There is an obvious way to get all environment nodes synchronized quickly and without most of these issues.
If we use the host clock as the hardware time source instead of the VM clock, then we will have the same time on all nodes just with the 'hwclock -s' command, even if 'ntpd' on the slaves is still not working.

========== Example of using host clock source:
# Host server:
(devops-venv-2.9.3)~$ date
Wed Mar 18 04:56:21 EDT 2015

# Fuel admin node:
[root@nailgun ~]# hwclock -r
Wed 18 Mar 2015 08:56:56 AM UTC -0.494653 seconds

# Controller
root@node-2:~# hwclock -r
Wed 18 Mar 2015 01:57:25 AM PDT -0.986577 seconds

# Compute
root@node-3:~# hwclock -r
Wed 18 Mar 2015 01:57:45 AM PDT -0.903103 seconds
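The readings above are printed in three different time zones, so the agreement is easier to see after converting them to UTC. The sketch below does only that arithmetic. The fixed offsets (EDT = UTC-4, PDT = UTC-7) are standard; the sub-second offsets printed by 'hwclock -r' are ignored, and the helper name 'spread_seconds' is made up for this example.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4))
PDT = timezone(timedelta(hours=-7))
UTC = timezone.utc

# Readings copied from the example above.
readings = {
    "host": datetime(2015, 3, 18, 4, 56, 21, tzinfo=EDT),
    "admin": datetime(2015, 3, 18, 8, 56, 56, tzinfo=UTC),
    "controller": datetime(2015, 3, 18, 1, 57, 25, tzinfo=PDT),
    "compute": datetime(2015, 3, 18, 1, 57, 45, tzinfo=PDT),
}


def spread_seconds(times):
    """Difference between the latest and the earliest reading, in seconds."""
    values = list(times.values())
    return (max(values) - min(values)).total_seconds()


for name, moment in readings.items():
    print(name, moment.astimezone(UTC).strftime("%H:%M:%S UTC"))
print("spread:", spread_seconds(readings), "seconds")
```

All four readings fall between 08:56:21 and 08:57:45 UTC. The commands were presumably typed one after another, so most of that spread would be the time spent switching between consoles rather than clock drift.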

To do this, the following option should be changed in the 'fuel-devops' repository:
track='guest' to track='wall' for the 'rtc' clock source: https://github.com/stackforge/fuel-devops/blob/master/devops/driver/libvirt/libvirt_xml_builder.py#L166
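For reference, a rough sketch of the libvirt '<clock>' element this option ends up in. It uses the standard library instead of the XML builder from fuel-devops, so it is not the code at the link above; the function name 'build_clock' and the offset="utc" default are assumptions.

```python
import xml.etree.ElementTree as ET


def build_clock(track="wall", offset="utc"):
    """Build a libvirt <clock> element with an 'rtc' timer.

    track="wall" makes the guest RTC follow the host wall clock;
    track="guest" was the value used before the proposed change.
    """
    clock = ET.Element("clock", {"offset": offset})
    ET.SubElement(clock, "timer", {"name": "rtc", "track": track})
    return clock


print(ET.tostring(build_clock(), encoding="unicode"))
```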

* Important!
This change requires the hardware clock on the host to be synchronized, to avoid time differences when the slaves in the environment get their 'ntpd' synchronized with an external time source.

Dennis Dmitriev (ddmitriev) on 2015-03-18

description:

updated

Nastya Urlapova (aurlapova) on 2015-03-20

Changed in fuel:
status:	New → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2015-03-30: Related fix proposed to fuel-devops (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/168782

Dennis Dmitriev (ddmitriev) on 2015-03-30

Changed in fuel:
assignee:	Fuel QA Team (fuel-qa) → Dennis Dmitriev (ddmitriev)
status:	Confirmed → In Progress

Nastya Urlapova (aurlapova) on 2015-03-30

summary:

- Review the approach of time syncronization in system tests
+ [system-tests]Review the approach of time syncronization in system tests

OpenStack Infra (hudson-openstack) wrote on 2015-03-31: Related fix merged to fuel-devops (master)

Reviewed: https://review.openstack.org/168782
Committed: https://git.openstack.org/cgit/stackforge/fuel-devops/commit/?id=d732d9cb4624ec8e12a7c53380a04a632c07cc01
Submitter: Jenkins
Branch: master

commit d732d9cb4624ec8e12a7c53380a04a632c07cc01
Author: Dennis Dmitriev <email address hidden>
Date: Mon Mar 30 08:23:44 2015 +0300

Configure VMs to use host's clock source, simplify timesync

    Correct time synchronization is not possible from devops now because
    it requires close interaction with the OpenStack clusters (because
    of p_ntp resources in the pacemaker), which is out of scope for
    devops.

    Instead of making overly complicated time synchronization routines in
    devops, let's use the host clock source as the single source for
    all VMs to quickly set the actual time on the nodes.

    System tests are still allowed to use more suitable ways for time
    synchronization.

    Change-Id: I6d3b2a89cd72b24d6c4f558534cd9dea82925b88
    Related-Bug: #1433484

OpenStack Infra (hudson-openstack) wrote on 2015-04-02: Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/170141

OpenStack Infra (hudson-openstack) wrote on 2015-04-03: Related fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/170141
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=adecb980e291426745d7f444bd51d734ad7f4b84
Submitter: Jenkins
Branch: master

commit adecb980e291426745d7f444bd51d734ad7f4b84
Author: Dennis Dmitriev <email address hidden>
Date: Thu Apr 2 17:03:12 2015 +0300

Use host's NTPD as the time source for NTPD on the master node

    To get NTPD on the master node synchronized more quickly without
    abusing external NTPD servers, let's try to use NTPD on the host
    where the test is running.

    When the master node is provisioned, NTPD on the host is checked via
    the 'ntpdate' command. If it completes successfully, then replace the
    settings for NTP* servers in /etc/fuel/astute.yaml.

    Change-Id: Ida6c12013bcd95a638ffc624532ea71e1716b64d
    Related-Bug: #1433484
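The logic described in this commit message could look roughly like the sketch below. It is not the merged fuel-qa code. The key names 'NTP1', 'NTP2' and so on, the helper names, and the choice to point the first key at the host and empty the rest are all assumptions; the message only says that the settings for the NTP* servers are replaced. The check also runs locally here, whereas the real check would be run on the master node.

```python
import subprocess


def host_ntp_responds(host_ip, timeout=30):
    """Return True if 'ntpdate -q <host_ip>' (query only) exits with 0."""
    try:
        result = subprocess.run(
            ["ntpdate", "-q", host_ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ntpdate is missing, or the host did not answer in time.
        return False
    return result.returncode == 0


def use_host_ntp(settings, host_ip, check=host_ntp_responds):
    """Return a copy of settings with the NTP* keys pointed at the host.

    If the check fails, the settings are returned unchanged.
    """
    updated = dict(settings)
    if not check(host_ip):
        return updated
    ntp_keys = sorted(key for key in updated if key.startswith("NTP"))
    for index, key in enumerate(ntp_keys):
        # Assumption: first key gets the host address, the rest are emptied.
        updated[key] = host_ip if index == 0 else ""
    return updated
```

Passing the check as a parameter keeps the replacement logic testable without a reachable NTP server.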

Andrey Sledzinskiy (asledzinskiy) wrote on 2015-04-06:

I think we should also backport it to the stable series, because we have time-sync problems there as well.
Logs from the 5.1.2 failure are attached:
http://jenkins-product.srt.mirantis.net:8080/view/5.1.2/job/5.1.2.ubuntu.bvt_2/146/testReport/junit/%28root%29/prepare_slaves_5/prepare_slaves_5/

Andrey Sledzinskiy (asledzinskiy) wrote on 2015-04-06:

fail_error_prepare_slaves_5-2015_04_03__21_29_14.tar.gz (466.8 KiB, application/x-tar)

OpenStack Infra (hudson-openstack) wrote on 2015-04-06: Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/170797

OpenStack Infra (hudson-openstack) wrote on 2015-04-09: Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/170797
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=6fa3a00b5186e9253e33adece62f317cc7e7a99a
Submitter: Jenkins
Branch: master

commit 6fa3a00b5186e9253e33adece62f317cc7e7a99a
Author: Dennis Dmitriev <email address hidden>
Date: Mon Apr 6 12:54:16 2015 +0300

Add 'ntp' helper with more strong time synchronizing methods

    - all nodes are synchronized by groups:
      - in the first group, only the Fuel admin node is synchronized,
      - in the second group, all controllers are synchronized (with the
        admin node or an external ntp server),
      - in the third group, all other nodes are synchronized (with the
        admin node or a controller)

    - if any node is synchronized to the admin node (/etc/ntp.conf), then
      the options 'minpoll 3 maxpoll 5' are added to 'server' in the config.

    - decrease waiting for nailgun after revert to 30 sec

    Change-Id: I7fd4de836f72a20ee2317114a2f6e810a0e18353
    Closes-Bug: #1433484
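The two ideas in this commit message, the group ordering and the extra poll options, can be sketched as follows. This is not the 'ntp' helper from fuel-qa. The function names, the (name, role) node representation and the sample addresses are assumptions made for illustration.

```python
def sync_groups(nodes):
    """Split (name, role) pairs into three groups, in synchronization order:
    the admin node first, then controllers, then everything else."""
    admin = [name for name, role in nodes if role == "admin"]
    controllers = [name for name, role in nodes if role == "controller"]
    others = [name for name, role in nodes
              if role not in ("admin", "controller")]
    return [admin, controllers, others]


def add_poll_options(ntp_conf, admin_ip, options="minpoll 3 maxpoll 5"):
    """Append poll options to 'server <admin_ip>' lines of an ntp.conf text.

    Lines that already carry a 'minpoll' option are left alone, so the
    function can be applied more than once.
    """
    lines = []
    for line in ntp_conf.splitlines():
        fields = line.split()
        if fields[:2] == ["server", admin_ip] and "minpoll" not in fields:
            line = line.rstrip() + " " + options
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    env = [("node-2", "controller"), ("nailgun", "admin"), ("node-3", "compute")]
    for group in sync_groups(env):
        print(group)
    print(add_poll_options("server 10.20.0.2 iburst\n", "10.20.0.2"), end="")
```

In ntp.conf, 'minpoll' and 'maxpoll' are exponents of two, so 'minpoll 3 maxpoll 5' asks for polling every 8 to 32 seconds, which lets a node notice a corrected time source much sooner than with the defaults.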