overcloud deploy failing in Ensure system is NTP time synced

Bug #1958116 reported by Ananya Banerjee
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Unassigned

Bug Description

overcloud deploy is failing in periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria with:

refid: 00000000, correction: 0.000000003, skew: 0.000\ntry: 28, refid: 00000000, correction: 0.000000003, skew: 0.000\ntry: 29, refid: 00000000, correction: 0.000000003, skew: 0.000\ntry: 30, refid: 00000000, correction: 0.000000004, skew: 0.000", "stdout_lines": ["try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 5, refid: 00000000, correction: 0.000000000, skew: 0.000", "try: 6, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 7, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 8, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 9, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 10, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 11, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 12, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 13, refid: 00000000, correction: 0.000000001, skew: 0.000", "try: 14, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 15, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 16, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 17, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 18, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 19, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 20, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 21, refid: 00000000, correction: 0.000000002, skew: 0.000", "try: 22, refid: 00000000, correction: 0.000000003, skew: 0.000", "try: 23, refid: 00000000, correction: 0.000000003, skew: 0.000", "try: 24, refid: 00000000, correction: 0.000000003, skew: 0.000", "try: 25, refid: 00000000, correction: 0.000000003, skew: 0.000", "try: 26, refid: 00000000, correction: 0.000000003, skew: 0.000", "try: 27, refid: 00000000, 
correction: 0.000000003, skew: 0.000", "try: 28, refid: 00000000, correction: 0.000000003, skew: 0.000", "try: 29, refid: 00000000, correction: 0.000000003, skew: 0.000", "try: 30, refid: 00000000, correction: 0.000000004, skew: 0.000"]}
2022-01-13 07:13:41 | 2022-01-13 07:13:41.936466 | fa163e40-c8a4-d97a-b1b8-00000000198b | TIMING | Ensure system is NTP time synced | overcloud-controller-1 | 0:08:52.714991 | 290.64s
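The repeated "try: N, refid: 00000000" lines look like the output of `chronyc waitsync`, where a refid of 00000000 indicates that no reference source has been selected. As a minimal sketch under that assumption, the log can be checked mechanically; the function names `parse_waitsync` and `never_synced` are hypothetical helpers, not part of tripleo or chrony:

```python
import re

# Matches lines such as:
#   try: 30, refid: 00000000, correction: 0.000000004, skew: 0.000
LINE_RE = re.compile(
    r"try:\s*(\d+),\s*refid:\s*([0-9A-Fa-f]{8}),\s*"
    r"correction:\s*([-\d.]+),\s*skew:\s*([-\d.]+)"
)


def parse_waitsync(text):
    """Return a list of (try_number, refid, correction, skew) tuples."""
    return [
        (int(t), refid, float(corr), float(skew))
        for t, refid, corr, skew in LINE_RE.findall(text)
    ]


def never_synced(text):
    """True if at least one try was logged and every try had refid 00000000."""
    tries = parse_waitsync(text)
    return bool(tries) and all(refid == "00000000" for _, refid, _, _ in tries)


# Last three tries copied from the failing job's log above.
sample = (
    "try: 28, refid: 00000000, correction: 0.000000003, skew: 0.000\n"
    "try: 29, refid: 00000000, correction: 0.000000003, skew: 0.000\n"
    "try: 30, refid: 00000000, correction: 0.000000004, skew: 0.000"
)
print(never_synced(sample))
```

A log in which any try reports a non-zero refid would make `never_synced` return False, which separates "slow to sync" from "never had a source".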

https://logserver.rdoproject.org/openstack-periodic-integration-stable2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/a9ad493/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
https://logserver.rdoproject.org/46/35646/70/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/3d6b5fb/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Revision history for this message
Ananya Banerjee (frenzyfriday) wrote :

We see this failure starting from 2022-01-13 05:18:39

Changed in tripleo:
importance: Undecided → Critical
status: New → Triaged
milestone: none → yoga-2
tags: added: promotion-blocker
Revision history for this message
Alex Schultz (alex-schultz) wrote :

+ chrony activity
200 OK
0 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
4 sources with unknown address

The configured sources never resolved: all 4 have an unknown address and none are online.
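The same reading of the `chronyc activity` output can be expressed as a small check. This is only a sketch: `parse_activity` and `sources_unresolved` are hypothetical names, and the rule "zero online plus at least one unknown address" is an assumption about what counts as unresolved.

```python
import re


def parse_activity(text):
    """Map each 'N sources <description>' line to its count."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+sources\s+(.*\S)\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


def sources_unresolved(text):
    """True when no source is online and at least one has an unknown address."""
    counts = parse_activity(text)
    return counts.get("online", 0) == 0 and counts.get("with unknown address", 0) > 0


# Output copied from the comment above.
activity = """200 OK
0 sources online
0 sources offline
0 sources doing burst (return to online)
0 sources doing burst (return to offline)
4 sources with unknown address
"""
print(parse_activity(activity)["with unknown address"])
print(sources_unresolved(activity))
```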

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Couldn't the resolution have failed because of the same NetworkManager-related race condition as described in https://bugzilla.redhat.com/show_bug.cgi?id=2040915 ?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Yes, that looks like a race with NetworkManager.
We should force the chrony sources online each time we trigger os-net-config.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/824980

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/824953

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/824954

Revision history for this message
Alex Schultz (alex-schultz) wrote :

fs35 (featureset035) is missing a default route on the ctlplane for Victoria.

https://logserver.rdoproject.org/46/35646/73/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/84aa1f2/logs/overcloud-controller-0/var/log/extra/network.txt.gz
### IPv4 routing
172.16.0.0/24 dev br-tenant proto kernel scope link src 172.16.0.189
192.168.24.0/24 dev ens3 proto kernel scope link src 192.168.24.24

For comparison's sake, here are the routes in a Wallaby run:

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby/5626c5c/logs/overcloud-controller-0/var/log/extra/network.txt.gz
### IPv4 routing
default via 192.168.24.1 dev ens3
172.16.0.0/24 dev br-tenant proto kernel scope link src 172.16.0.85
192.168.24.0/24 dev ens3 proto kernel scope link src 192.168.24.15
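The difference between the two routing tables can be checked with a few lines of Python. This is a sketch only; `has_default_route` is a hypothetical helper that looks for a line starting with `default` in `ip route`-style output, and the two samples are copied from the logs above.

```python
def has_default_route(ip_route_output):
    """True if any line of `ip route`-style output starts with 'default'."""
    return any(
        line.split()[:1] == ["default"]
        for line in ip_route_output.splitlines()
    )


victoria = """172.16.0.0/24 dev br-tenant proto kernel scope link src 172.16.0.189
192.168.24.0/24 dev ens3 proto kernel scope link src 192.168.24.24
"""

wallaby = """default via 192.168.24.1 dev ens3
172.16.0.0/24 dev br-tenant proto kernel scope link src 172.16.0.85
192.168.24.0/24 dev ens3 proto kernel scope link src 192.168.24.15
"""

print("victoria:", has_default_route(victoria))
print("wallaby:", has_default_route(wallaby))
```

Without a default route, the node can only reach its directly connected subnets, which is consistent with the NTP sources never resolving.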

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/824961

Revision history for this message
Harald Jensås (harald-jensas) wrote :

The ctlplane_gateway_ip ansible var is set correctly, and the portmap has it:

https://logserver.rdoproject.org/46/35646/73/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/84aa1f2/logs/undercloud/home/zuul/config-download/config-download-latest/group_vars/Controller.gz

ctlplane_gateway_ip: 192.168.24.1

deployed_server_port_map:
  overcloud-controller-0-ctlplane:
    fixed_ips:
    - ip_address: 192.168.24.24
    network:
      mtu: 1350
      tags:
      - 192.168.24.0/24
    subnets:
    - cidr: 192.168.24.0/24
      dns_nameservers:
      - 208.67.222.222
      - 208.67.220.220
      gateway_ip: 192.168.24.1
      host_routes: []
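To illustrate what "set correctly" means here, the sketch below checks that the gateway in the port map matches `ctlplane_gateway_ip` and that both the gateway and the fixed IP fall inside the subnet CIDR. The dict is a hand-copied subset of the YAML above, and `gateway_is_consistent` is a hypothetical helper, not something tripleo-ansible provides.

```python
import ipaddress

# Values copied by hand from the group_vars excerpt above.
ctlplane_gateway_ip = "192.168.24.1"
port = {
    "fixed_ips": [{"ip_address": "192.168.24.24"}],
    "subnets": [
        {
            "cidr": "192.168.24.0/24",
            "gateway_ip": "192.168.24.1",
            "host_routes": [],
        }
    ],
}


def gateway_is_consistent(gateway, port):
    """True if some subnet lists this gateway, contains it, and contains every fixed IP."""
    gw = ipaddress.ip_address(gateway)
    for subnet in port["subnets"]:
        net = ipaddress.ip_network(subnet["cidr"])
        if subnet["gateway_ip"] == gateway and gw in net:
            return all(
                ipaddress.ip_address(f["ip_address"]) in net
                for f in port["fixed_ips"]
            )
    return False


print(gateway_is_consistent(ctlplane_gateway_ip, port))
```

The data is consistent, so the missing route has to come from the network config template rather than from the variables fed into it.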

The network config used is:
tripleo_network_config_template: templates/ci/multiple_nics_ipv6.j2

The Victoria template does not set a route on the ctlplane interface:

https://opendev.org/openstack/tripleo-ansible/src/branch/stable/victoria/tripleo_ansible/roles/tripleo_network_config/templates/ci/multiple_nics_ipv6.j2

Wallaby does:
https://opendev.org/openstack/tripleo-ansible/src/branch/stable/wallaby/tripleo_ansible/roles/tripleo_network_config/templates/ci/multiple_nics_ipv6.j2

We need to backport this change:
https://opendev.org/openstack/tripleo-ansible/commit/c05bf0b042b2e2c008bff555c5b37265c4e2b69f

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "Alex Schultz <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/824980

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/wallaby)

Change abandoned by "Alex Schultz <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/824953

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/victoria)

Change abandoned by "Alex Schultz <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/824954

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/824961
Committed: https://opendev.org/openstack/tripleo-ansible/commit/bc60336d7c003b5041ac5a32e4fc58a2f9c95f00
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit bc60336d7c003b5041ac5a32e4fc58a2f9c95f00
Author: Sandeep Yadav <email address hidden>
Date: Fri Nov 26 12:29:56 2021 +0530

    Default route on ctlplane in CI ipv6 nic config

    This template is used by featureset035 in CI, Overcloud nodes in fs035
    currently have no outside connectivity. Details in Closes-Bug.

    Moving default route to ctlplane for CI usecase, CI Ipv4 nic config
    template also have default route on ctlplane[1].

    It seems we created this template following the older nic template [2]
    which we use in older branches, which have two default route - one on
    ctlplane and one on external network, which seems wrong.

    [1] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_network_config/templates/ci/multiple_nics.j2#L9-L11
    [2] https://github.com/openstack/tripleo-heat-templates/blob/stable/train/ci/environments/network/multiple-nics-ipv6/nic-configs/controller.yaml#L182-L196

    Closes-Bug: #1958116
    Closes-Bug: #1952391
    Change-Id: I3ef3728f042024033b5f48b89b97d9ac3ef08614
    (cherry picked from commit 6f7f53de73b313e4b6cafd97aab645232f5a799f)

tags: added: in-stable-victoria
Ronelle Landy (rlandy)
Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 2.6.0

This issue was fixed in the openstack/tripleo-ansible 2.6.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "Jiri Podivin <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/824980
Reason: Better test it elsewhere first

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/852270

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by "Jiri Podivin <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/852270
