[system-tests]Review the approach of time synchronization in system tests

Bug #1433484 reported by Dennis Dmitriev
This bug affects 1 person
Affects             Status        Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Released  High        Dennis Dmitriev
5.1.x               In Progress   High        Dennis Dmitriev
6.0.x               In Progress   High        Dennis Dmitriev

Bug Description

Our system tests use 'snapshot' and 'revert' on virtual machines to save deployment time when the same configuration is used by several different test cases.

After each revert, the clocks on all virtual machines are left in the past and must be corrected before the test runs.

We synchronize time with the command 'ntpd -gq', which has the following issues:

  - if the server acting as the time source doesn't run the 'ntpd' service, 'ntpd -gq' on a slave hangs forever;
  - if the 'ntpd' service on the time-source server has recently restarted, it rejects all incoming 'ntpd -gq' requests from its slaves for some time. This leads to a 'no servers were found' error from 'ntpd -gq' on a slave, and the time on the slave remains in the past;
  - the new approach to providing time for the cluster requires detecting exactly how 'ntpd' is started on a node, because on the controllers 'ntpd' is started inside the 'vrouter' namespace. If 'ntpd' is restarted by the 'init' script, it will be non-functional on the controller for the rest of the nodes in the cluster.

These issues make time synchronization very challenging.
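
The first two issues (a command that hangs forever, and a time source that rejects requests for a while after a restart) can be contained with a timeout and a few retries. The following is only a minimal sketch of that idea, not the code used in the system tests: the helper names `run_with_timeout` and `sync_with_retries` are hypothetical, and it runs the command locally, whereas the tests would have to run it on the remote node.

```python
import subprocess
import time


def run_with_timeout(command, timeout):
    """Run a command; return its exit code, or None if it did not finish in time."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed when the timeout expires
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode


def sync_with_retries(command, attempts, timeout, delay):
    """Retry a one-shot sync command until it exits with 0 or attempts run out."""
    for attempt in range(attempts):
        if run_with_timeout(command, timeout) == 0:
            return True
        if attempt + 1 < attempts:
            time.sleep(delay)  # give a freshly restarted time source time to settle
    return False
```

For example, `sync_with_retries(["ntpd", "-gq"], attempts=3, timeout=60, delay=10)` would give up after three bounded attempts instead of hanging; the numbers are placeholders, not tuned values.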

There is an obvious way to get all environment nodes synchronized quickly and without most of these issues.
If we use the host clock as the hardware time source instead of the VM clock, then we get the same time on all nodes with just the 'hwclock -s' command, even if 'ntpd' on the slaves is still not working.

========== Example of using host clock source:
# Host server:
(devops-venv-2.9.3)~$ date
Wed Mar 18 04:56:21 EDT 2015

# Fuel admin node:
[root@nailgun ~]# hwclock -r
Wed 18 Mar 2015 08:56:56 AM UTC -0.494653 seconds

# Controller
root@node-2:~# hwclock -r
Wed 18 Mar 2015 01:57:25 AM PDT -0.986577 seconds

# Compute
root@node-3:~# hwclock -r
Wed 18 Mar 2015 01:57:45 AM PDT -0.903103 seconds

To do this, the following option should be changed in the 'fuel-devops' repository:
   track='guest' to track='wall' for the 'rtc' clock source: https://github.com/stackforge/fuel-devops/blob/master/devops/driver/libvirt/libvirt_xml_builder.py#L166
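
For illustration, the libvirt domain XML fragment affected by this option could be generated as sketched below. This is not the code from 'libvirt_xml_builder.py': it uses the standard library's ElementTree, the function name `build_clock_xml` is hypothetical, and offset='utc' is an assumption made only to keep the element complete.

```python
import xml.etree.ElementTree as ET


def build_clock_xml(track="wall"):
    """Return a <clock> element whose 'rtc' timer follows the given track."""
    # offset="utc" is an assumption for this sketch.
    clock = ET.Element("clock", {"offset": "utc"})
    # track="wall" ties the guest RTC to the host wall clock,
    # track="guest" keeps a separate per-VM clock.
    ET.SubElement(clock, "timer", {"name": "rtc", "track": track})
    return ET.tostring(clock, encoding="unicode")


print(build_clock_xml())
```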

* Important!
This change requires the hardware clock on the host to be synchronized, to avoid time differences when slaves in the environment get their 'ntpd' synchronized with an external time source.

Tags: system-tests
description: updated
Changed in fuel:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-devops (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/168782

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Dennis Dmitriev (ddmitriev)
status: Confirmed → In Progress
summary: - Review the approach of time synchronization in system tests
+ [system-tests]Review the approach of time synchronization in system tests
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-devops (master)

Reviewed: https://review.openstack.org/168782
Committed: https://git.openstack.org/cgit/stackforge/fuel-devops/commit/?id=d732d9cb4624ec8e12a7c53380a04a632c07cc01
Submitter: Jenkins
Branch: master

commit d732d9cb4624ec8e12a7c53380a04a632c07cc01
Author: Dennis Dmitriev <email address hidden>
Date: Mon Mar 30 08:23:44 2015 +0300

    Configure VMs to use host's clock source, simplify timesync

    Correct time synchronization is not possible from devops now because
    it requires close interaction with the OpenStack clusters (because
    of p_ntp resources in the pacemaker) that is out of scope of devops
    purposes.

    Instead of making too complicated time synchronization routines in
    devops, let's use the host clock source as the single source for
    all VMs for quickly setting the actual time for nodes.

    System tests are still allowed to use more suitable ways for time
    synchronization.

    Change-Id: I6d3b2a89cd72b24d6c4f558534cd9dea82925b88
    Related-Bug: #1433484

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/170141

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/170141
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=adecb980e291426745d7f444bd51d734ad7f4b84
Submitter: Jenkins
Branch: master

commit adecb980e291426745d7f444bd51d734ad7f4b84
Author: Dennis Dmitriev <email address hidden>
Date: Thu Apr 2 17:03:12 2015 +0300

    Use host's NTPD as the time source for NTPD on the master node

    To get NTPD on the master node synchronized more quickly without
    abusing external NTPD servers, let's try to use NTPD on the host
    where the test is running.

    When the master node is provisioned, NTPD on the host is checked via
    the 'ntpdate' command. If it completes successfully, then replace the
    settings for NTP* servers in /etc/fuel/astute.yaml

    Change-Id: Ida6c12013bcd95a638ffc624532ea71e1716b64d
    Related-Bug: #1433484
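
The check-then-replace step described in this commit message could look roughly like the sketch below. It is not the merged fuel-qa code: the function names are hypothetical, the settings are modeled as a plain dictionary, and both the 'NTP' key prefix and the policy of pointing the first entry at the host while emptying the rest are assumptions. Only 'ntpdate -q' (query without setting the clock) is a real command-line option.

```python
import subprocess


def host_ntp_responds(host, timeout=30):
    """Return True if 'ntpdate -q <host>' exits with status 0."""
    try:
        result = subprocess.run(
            ["ntpdate", "-q", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ntpdate is missing or the host never answered.
        return False
    return result.returncode == 0


def point_ntp_settings_at(settings, host):
    """Return a copy of settings with the first NTP* key set to host.

    Assumed policy: remaining NTP* keys are emptied, other keys are untouched.
    """
    updated = dict(settings)
    ntp_keys = sorted(key for key in updated if key.startswith("NTP"))
    for index, key in enumerate(ntp_keys):
        updated[key] = host if index == 0 else ""
    return updated
```

A caller would apply `point_ntp_settings_at` only when `host_ntp_responds` returns True, and otherwise leave the settings as they are.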

Andrey Sledzinskiy (asledzinskiy) wrote :

I think we should also backport it to the stable series, because we have time-sync problems there too.
Logs from a 5.1.2 failure are attached
http://jenkins-product.srt.mirantis.net:8080/view/5.1.2/job/5.1.2.ubuntu.bvt_2/146/testReport/junit/%28root%29/prepare_slaves_5/prepare_slaves_5/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/170797

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/170797
Committed: https://git.openstack.org/cgit/stackforge/fuel-qa/commit/?id=6fa3a00b5186e9253e33adece62f317cc7e7a99a
Submitter: Jenkins
Branch: master

commit 6fa3a00b5186e9253e33adece62f317cc7e7a99a
Author: Dennis Dmitriev <email address hidden>
Date: Mon Apr 6 12:54:16 2015 +0300

    Add 'ntp' helper with stronger time synchronization methods

    - all nodes are synchronized by groups:
      - in the first group only the Fuel admin node is synchronized,
      - in the second group all controllers are synchronized (with the
            admin node or an external ntp server)
      - in the third group all other nodes are synchronized (with the
            admin node or a controller)

    - if any node is synchronized to the admin node (/etc/ntp.conf), then
      options 'minpoll 3 maxpoll 5' are added to 'server' in the config.

    - decrease waiting for nailgun after revert to 30 sec

    Change-Id: I7fd4de836f72a20ee2317114a2f6e810a0e18353
    Closes-Bug: #1433484
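
The grouping and the config tweak from this commit message can be sketched as follows. This is an illustration under assumptions, not the 'ntp' helper itself: the node dictionaries with 'name' and 'roles' keys, the role names, and both function names are hypothetical.

```python
def sync_groups(nodes):
    """Split nodes into the three groups that are synchronized in order."""
    admin = [n["name"] for n in nodes if "admin" in n["roles"]]
    controllers = [
        n["name"] for n in nodes
        if "controller" in n["roles"] and "admin" not in n["roles"]
    ]
    others = [
        n["name"] for n in nodes
        if n["name"] not in admin and n["name"] not in controllers
    ]
    return [admin, controllers, others]


def add_poll_options(ntp_conf, admin_ip, options="minpoll 3 maxpoll 5"):
    """Append poll options to the 'server' line that points at the admin node."""
    lines = []
    for line in ntp_conf.splitlines():
        fields = line.split()
        is_admin_server = (
            len(fields) >= 2 and fields[0] == "server" and fields[1] == admin_ip
        )
        if is_admin_server and "minpoll" not in fields:
            line = line.rstrip() + " " + options
        lines.append(line)
    return "\n".join(lines) + "\n"
```

In ntp.conf the poll values are exponents of two, so 'minpoll 3 maxpoll 5' keeps the polling interval between 8 and 32 seconds.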

Changed in fuel:
status: In Progress → Fix Committed
ruhe (ruhe) wrote :
Changed in fuel:
status: Fix Committed → Fix Released