Host system should have time synced during deployment to prevent deployment issues.

Bug #1776869 reported by Jiří Stránský
This bug affects 4 people
Affects   Status         Importance   Assigned to    Milestone
tripleo   Fix Released   Medium       Alex Schultz

Bug Description

We've encountered a number of issues caused by the *host* machine being out of time sync, even in deployments where we provided a correct NtpServer parameter to the overcloud Heat stack.

It seems that when virtual machines start, they get their clock set by the hypervisor. That clock persists until the VM does its own NTP sync, and until then it can cause a variety of "headscratcher" issues that are very difficult to debug and understand. Typically one person can reproduce them fairly consistently, while others never hit them at all.

I'm not sure whether Quickstart itself should be in charge of setting up NTP sync on the host machine (maybe?), but I think it should at least refuse to run if NTP is not synced.
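
A minimal sketch of the kind of check that could run before deploying, assuming the four timestamps of one NTP request/response exchange are already available (obtaining them is out of scope here). It uses the standard NTP clock-offset formula from RFC 5905, offset = ((t2 - t1) + (t3 - t4)) / 2. The function names and the 0.5 s tolerance are hypothetical, not settings that Quickstart or TripleO define.

```python
"""Pre-flight clock check: refuse to continue when the host clock is skewed.

Hypothetical helpers; the names and the tolerance are illustrative assumptions.
"""

MAX_OFFSET_SECONDS = 0.5  # assumed tolerance, not a Quickstart/TripleO setting


def ntp_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP clock offset (RFC 5905), in seconds.

    t1: client transmit time, t2: server receive time,
    t3: server transmit time, t4: client receive time.
    A positive result means the local clock is behind the server.
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def ensure_clock_synced(offset: float, max_offset: float = MAX_OFFSET_SECONDS) -> None:
    """Raise instead of silently continuing with a skewed clock."""
    if abs(offset) > max_offset:
        raise RuntimeError(
            f"host clock is off by {offset:+.3f}s (limit {max_offset}s); "
            "sync NTP before deploying"
        )


if __name__ == "__main__":
    # Local clock ~30 s behind the server, 0.1 s round trip split evenly.
    offset = ntp_offset(t1=1000.0, t2=1030.05, t3=1030.05, t4=1000.1)
    print(f"offset = {offset:+.3f}s")
    try:
        ensure_clock_synced(offset)
    except RuntimeError as err:
        print("refusing to run:", err)
```

Failing with an explicit error here trades a noisy early stop for avoiding the hard-to-reproduce clock-skew failures described above.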

summary: - [RFE] quickstart should validate that *host* machine is NTP synced
+ [RFE] quickstart should validate that host machine is NTP synced
Carlos Camacho (ccamacho) wrote : Re: [RFE] quickstart should validate that host machine is NTP synced

I hit issues with the undercloud (UC) backup/restore because of time sync problems. This would be a good thing to have.

Jiří Stránský (jistr) wrote :
Alex Schultz (alex-schultz) wrote :

We should probably run ntpdate in a host prep task to alleviate this and ensure we always have the correct time.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/576888

Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
status: Triaged → In Progress
Alex Schultz (alex-schultz) wrote : Re: [RFE] quickstart should validate that host machine is NTP synced

To clarify: this should be solved in TripleO rather than in Quickstart, since it can affect users who are not using Quickstart or who use some other scripting. During the deployment we should sync the time to ensure the host is in a good state for Docker.

summary: - [RFE] quickstart should validate that host machine is NTP synced
+ Host system should have time synced during deployment to prevent deployment issues.
tags: added: containers
removed: quickstart
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/576888
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a866f55691c5871db335cd976c71d3429dbf62b1
Submitter: Zuul
Branch: master

commit a866f55691c5871db335cd976c71d3429dbf62b1
Author: Alex Schultz <email address hidden>
Date: Wed Jun 20 08:54:57 2018 -0600

    Add host prep step for ntp time sync

    Docker doesn't like it when the time shifts so if we're building
    containers when the ntp time sync actually occurs it can lead to
    deployment failures. To prevent this, let's force a ntpdate on the host
    during step1 to ensure the hardware time is properly synced before
    proceeding.

    Change-Id: I812c7da90ae06120707fd8795a41e4fd867f510e
    Closes-Bug: #1776869
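
For illustration only: the actual fix is an Ansible host prep task in tripleo-heat-templates (the deploy log further down shows it as "Ensure system is NTP time synced", running `ntpdate -u pool.ntp.org`). A rough Python equivalent of that step might look like the sketch below. `sync_host_clock` is a hypothetical name, and the injectable `run` parameter exists only so the sketch can be exercised without touching the real clock.

```python
import subprocess
from typing import Callable, List


def sync_host_clock(
    server: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> None:
    """Step the host clock once via ntpdate, before containers are built.

    Mirrors the command seen in the deploy log: ntpdate -u <server>.
    `-u` tells ntpdate to use an unprivileged source port.
    """
    cmd: List[str] = ["ntpdate", "-u", server]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Fail loudly now instead of letting clock skew surface later.
        raise RuntimeError(
            f"{' '.join(cmd)} failed (rc={result.returncode}): "
            f"{result.stderr.strip()}"
        )
```

Running this early (the commit places it in step1) matters more than the exact mechanism: the point is that the clock is stepped before Docker starts building containers, not in the middle of it.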

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/577631

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/577861

Rajesh Tailor (ratailor) wrote :

I got an NTP-related error during overcloud deploy. I am using oooq (TripleO Quickstart) to deploy on a physical host.

The error in overcloud_deploy.log file looks like this:

2018-06-26 10:16:07 | "TASK [NTP settings] ************************************************************",
2018-06-26 10:16:07 | "ok: [localhost]",
2018-06-26 10:16:07 | "",
2018-06-26 10:16:07 | "TASK [Install ntpdate] *********************************************************",
2018-06-26 10:16:07 | "skipping: [localhost]",
2018-06-26 10:16:07 | "",
2018-06-26 10:16:07 | "TASK [Ensure system is NTP time synced] ****************************************",
2018-06-26 10:16:07 | "fatal: [localhost]: FAILED! => {\"changed\": true, \"cmd\": [\"ntpdate\", \"-u\", \"pool.ntp.org\"], \"delta\": \"0:00:08.853160\", \"end\": \"2018-06-26 10:16:04.915994\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2018-06-26 10:15:56.062834\", \"stderr\": \"26 Jun 10:16:04 ntpdate[15981]: no server suitable for synchronization found\", \"stderr_lines\": [\"26 Jun 10:16:04 ntpdate[15981]: no server suitable for synchronization found\"], \"stdout\": \"\", \"stdout_lines\": []}",
2018-06-26 10:16:07 | "\tto retry, use: --limit @/var/lib/heat-config/heat-config-ansible/c5048b73-c3d0-42e7-8111-e799ab0cb586_playbook.retry",

Jiří Stránský (jistr) wrote :

@ratailor it looks like pool.ntp.org perhaps wasn't reachable from your machines? Previously these errors went unnoticed; now we fail when the NTP sync fails. I think that's good, because we get a clear error instead of hard-to-debug errors later (due to clock skew).

Alex Schultz (alex-schultz) wrote :

Yeah, I'm OK with this error. The message makes it clear that we cannot sync the NTP time. That's an environmental issue that usually fails silently and causes other issues down the line. In this case your deployment needs to use a reachable NTP source.
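
One way to catch an unreachable source before the deployment gets that far is a query-only probe. The sketch below assumes a hypothetical list of candidate servers and uses `ntpdate -q`, which queries a server without setting the clock, returning the first server that answers. The function name and the fallback idea are illustrative, not part of TripleO.

```python
import subprocess
from typing import Callable, Iterable, List


def pick_reachable_ntp_server(
    servers: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the first server that answers an `ntpdate -q` query.

    `-q` only queries; it never changes the clock, so this is safe to run
    as a pre-deployment check. Raises if no candidate answers.
    """
    errors: List[str] = []
    for server in servers:
        result = run(["ntpdate", "-q", server], capture_output=True, text=True)
        if result.returncode == 0:
            return server
        # Keep ntpdate's own message (e.g. "no server suitable ...") if any.
        errors.append(f"{server}: {result.stderr.strip() or 'rc=' + str(result.returncode)}")
    raise RuntimeError("no reachable NTP source:\n  " + "\n  ".join(errors))
```

Whichever server it returns would still have to be configured as the deployment's NTP source; the probe itself changes nothing on the host.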

Changed in tripleo:
milestone: stein-1 → rocky-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/577631
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=72dba84af8d1634c50610c487df3aa16adef4361
Submitter: Zuul
Branch: stable/queens

commit 72dba84af8d1634c50610c487df3aa16adef4361
Author: Alex Schultz <email address hidden>
Date: Wed Jun 20 08:54:57 2018 -0600

    Add host prep step for ntp time sync

    Docker doesn't like it when the time shifts so if we're building
    containers when the ntp time sync actually occurs it can lead to
    deployment failures. To prevent this, let's force a ntpdate on the host
    during step1 to ensure the hardware time is properly synced before
    proceeding.

    Change-Id: I812c7da90ae06120707fd8795a41e4fd867f510e
    Closes-Bug: #1776869
    (cherry picked from commit a866f55691c5871db335cd976c71d3429dbf62b1)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/577861
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d82b3d11eebe952e0fabd915730d6f5d1f8187ef
Submitter: Zuul
Branch: stable/pike

commit d82b3d11eebe952e0fabd915730d6f5d1f8187ef
Author: Alex Schultz <email address hidden>
Date: Wed Jun 20 08:54:57 2018 -0600

    Add host prep step for ntp time sync

    Docker doesn't like it when the time shifts so if we're building
    containers when the ntp time sync actually occurs it can lead to
    deployment failures. To prevent this, let's force a ntpdate on the host
    during step1 to ensure the hardware time is properly synced before
    proceeding.

    Change-Id: I812c7da90ae06120707fd8795a41e4fd867f510e
    Closes-Bug: #1776869
    (cherry picked from commit a866f55691c5871db335cd976c71d3429dbf62b1)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.4

This issue was fixed in the openstack/tripleo-heat-templates 8.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.14

This issue was fixed in the openstack/tripleo-heat-templates 7.0.14 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.0.0.0b4

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b4 development milestone.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/658927

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/659380

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/659380
Reason: needs to be done sooner than this

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/658927
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=eafe3908535ec766866efb74110e057ea2509c45
Submitter: Zuul
Branch: master

commit eafe3908535ec766866efb74110e057ea2509c45
Author: Alex Schultz <email address hidden>
Date: Mon May 13 15:42:59 2019 -0600

    Try a timesync as part of first boot

    We're running into issues where if someone creates a firstboot script
    that touches a file that will eventually be mounted into a container, it
    can fail if the time of the file ends up being in the future due to a
    later timesync. Let's try a basic timesync bootstrap as part of
    cloud-init to address the case of configuration changes occurring prior
    to the host_prep_tasks where we traditionally configure chrony/ntp

    Depends-On: https://review.opendev.org/#/c/659398
    Change-Id: I294eba826b98c5793336815282f766e3d2e60a51
    Related-Bug: #1776869
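
The failure mode described in this commit is a file whose modification time ends up ahead of the (newly corrected) system clock. A small sketch for spotting such files on a host, assuming a hypothetical directory to scan and a 1 s tolerance for jitter; neither comes from tripleo-heat-templates.

```python
import os
import time
from pathlib import Path
from typing import List


def files_with_future_mtime(root: str, slack_seconds: float = 1.0) -> List[Path]:
    """List files under `root` whose modification time is ahead of "now".

    Such timestamps appear when a file is written while the clock runs fast
    and a later time sync steps the clock back. `slack_seconds` absorbs
    tiny jitter. lstat() is used so dangling symlinks don't raise.
    """
    now = time.time()
    future: List[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.lstat().st_mtime > now + slack_seconds:
                future.append(path)
    return sorted(future)
```

Syncing the clock as early as cloud-init, as this change does, avoids creating such files in the first place; a scan like this is only useful for diagnosing hosts that were configured before the sync happened.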

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/660763

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/660764

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/660767

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/rocky)

Reviewed: https://review.opendev.org/660764
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=45dcd0e5a8334d8bd99c9d9764955e00b707c25d
Submitter: Zuul
Branch: stable/rocky

commit 45dcd0e5a8334d8bd99c9d9764955e00b707c25d
Author: Alex Schultz <email address hidden>
Date: Mon May 13 15:42:59 2019 -0600

    Try a timesync as part of first boot

    We're running into issues where if someone creates a firstboot script
    that touches a file that will eventually be mounted into a container, it
    can fail if the time of the file ends up being in the future due to a
    later timesync. Let's try a basic timesync bootstrap as part of
    cloud-init to address the case of configuration changes occurring prior
    to the host_prep_tasks where we traditionally configure chrony/ntp

    NOTE: For Rocky, we use ntp instead of chrony.

    Change-Id: I294eba826b98c5793336815282f766e3d2e60a51
    Related-Bug: #1776869
    (cherry picked from commit eafe3908535ec766866efb74110e057ea2509c45)

tags: added: in-stable-rocky
tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/660763
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=b23fd9f708fb84ee1187a06e2c0bc1c4ca21637f
Submitter: Zuul
Branch: stable/stein

commit b23fd9f708fb84ee1187a06e2c0bc1c4ca21637f
Author: Alex Schultz <email address hidden>
Date: Mon May 13 15:42:59 2019 -0600

    Try a timesync as part of first boot

    We're running into issues where if someone creates a firstboot script
    that touches a file that will eventually be mounted into a container, it
    can fail if the time of the file ends up being in the future due to a
    later timesync. Let's try a basic timesync bootstrap as part of
    cloud-init to address the case of configuration changes occurring prior
    to the host_prep_tasks where we traditionally configure chrony/ntp

    Depends-On: https://review.opendev.org/#/c/659398
    Change-Id: I294eba826b98c5793336815282f766e3d2e60a51
    Related-Bug: #1776869
    (cherry picked from commit eafe3908535ec766866efb74110e057ea2509c45)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.opendev.org/660767
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=35fc35bc1f42b5d868ea2f03fd144f53d717ec8c
Submitter: Zuul
Branch: stable/queens

commit 35fc35bc1f42b5d868ea2f03fd144f53d717ec8c
Author: Alex Schultz <email address hidden>
Date: Mon May 13 15:42:59 2019 -0600

    Try a timesync as part of first boot

    We're running into issues where if someone creates a firstboot script
    that touches a file that will eventually be mounted into a container, it
    can fail if the time of the file ends up being in the future due to a
    later timesync. Let's try a basic timesync bootstrap as part of
    cloud-init to address the case of configuration changes occurring prior
    to the host_prep_tasks where we traditionally configure chrony/ntp

    NOTE: For queens, the overcloud.j2.yaml change lives in
    puppet/role.role.j2.yaml. Also we use ntp instead of chrony.

    Conflicts:
     overcloud.j2.yaml

    Change-Id: I294eba826b98c5793336815282f766e3d2e60a51
    Related-Bug: #1776869
    (cherry picked from commit eafe3908535ec766866efb74110e057ea2509c45)
