(inconsistent) periodic centos8 fs39 fails sometimes with Error, some other host (FA:16:3E:B5:60:2A) already uses address 10.0.0.1

Bug #1874418 reported by Marios Andreou
This bug affects 4 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

In [1][2][3] (and many other runs; this happens frequently), periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master fails during undercloud setup because of a conflict over the 10.0.0.1 address:

        2020-04-23 01:46:51.064683 | primary | TASK [Add eth2 interface from eth2.conf] ***************************************
        2020-04-23 01:46:51.064740 | primary | Thursday 23 April 2020 01:46:51 +0000 (0:00:00.030) 0:03:36.186 ********
        2020-04-23 01:46:52.724367 | primary | fatal: [undercloud]: FAILED! => {
...
        2020-04-23 01:46:52.724673 | primary | [2020/04/23 01:46:52 AM] [INFO] running ifup on interface: eth2
        2020-04-23 01:46:52.724682 | primary | [2020/04/23 01:46:52 AM] [ERROR] Failure(s) occurred when applying configuration
        2020-04-23 01:46:52.724688 | primary | [2020/04/23 01:46:52 AM] [ERROR] stdout: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (FA:16:3E:B5:60:2A) already uses address 10.0.0.1.

---
It appears that 10.0.0.1 is hard-coded for the undercloud in [4]. The error occurs when one of the OVB nodes is assigned the same address, as seen in nodes.json (built by [5]):

        2020-04-23 01:39:52.052646 | TASK [ovb-manage : Build nodes.json file to be used as instackenv.json]
        2020-04-23 01:40:00.492779 | primary | Undercloud undercloud-89503 specified in the environment file is not available in nova. No undercloud details will be included in the output.
...
        2020-04-23 01:40:00.493739 | primary | "network_details": {
        2020-04-23 01:40:00.493748 | primary | "baremetal-89503-extra_0": {
...
        2020-04-23 01:40:00.493934 | primary | "public-89503": [
        2020-04-23 01:40:00.493954 | primary | {
        2020-04-23 01:40:00.493964 | primary | "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:b5:60:2a",
        2020-04-23 01:40:00.493973 | primary | "version": 4,
        2020-04-23 01:40:00.493982 | primary | "addr": "10.0.0.1",
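
A saved nodes.json could be checked for this collision before the undercloud step runs. The Python sketch below walks any nested structure and reports every entry whose "addr" equals a given address. The function name find_address_users and the sample dict are illustrative only; the sample mirrors just the keys visible in the log excerpt above and does not reproduce the elided parts of the real file.

```python
def find_address_users(obj, address, path=()):
    """Yield (path, mac) for every dict whose "addr" equals `address`.

    The walk is generic on purpose: it makes no assumption about the
    exact nesting of nodes.json, only about the "addr" and
    "OS-EXT-IPS-MAC:mac_addr" keys seen in the job log.
    """
    if isinstance(obj, dict):
        if obj.get("addr") == address:
            yield path, obj.get("OS-EXT-IPS-MAC:mac_addr")
        for key, value in obj.items():
            yield from find_address_users(value, address, path + (key,))
    elif isinstance(obj, list):
        for item in obj:
            yield from find_address_users(item, address, path)


# Hypothetical, simplified sample shaped after the log excerpt.
sample = {
    "network_details": {
        "baremetal-89503-extra_0": {
            "public-89503": [
                {
                    "OS-EXT-IPS-MAC:mac_addr": "fa:16:3e:b5:60:2a",
                    "version": 4,
                    "addr": "10.0.0.1",
                }
            ]
        }
    }
}

conflicts = list(find_address_users(sample, "10.0.0.1"))
for where, mac in conflicts:
    print("/".join(where), "uses 10.0.0.1 with MAC", mac)
```

For a real run, the dict would come from json.load() on the generated file instead of the inline sample.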

---
Note that this resembles the bug at [6], where I recently added comment 15, but I believe this one has a different root cause, hence the separate report.

[1] https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master/028ad5c/job-output.txt
[2] https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master/c79e43c/job-output.txt
[3] https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master/a03877a/job-output.txt
[4] https://opendev.org/openstack/tripleo-quickstart-extras/src/commit/cb6c9c47c8c8e96975d0d5c0a0ff6b5631ea95df/playbooks/prepare-slave.yml#L44
[5] https://opendev.org/openstack/openstack-virtual-baremetal/src/branch/master/openstack_virtual_baremetal/build_nodes_json.py
[6] https://bugs.launchpad.net/tripleo/+bug/1818060

Tags: ci
Changed in tripleo:
status: New → Triaged
description: updated
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/725244

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Marios Andreou (<email address hidden>) on branch: master
Review: https://review.opendev.org/725244

summary: - periodic centos8 fs39 fails often with Error, some other host
- (FA:16:3E:B5:60:2A) already uses address 10.0.0.1
+ (inconsistent) periodic centos8 fs39 fails sometimes with Error, some
+ other host (FA:16:3E:B5:60:2A) already uses address 10.0.0.1
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

We create an external ("public" network) interface on undercloud:
https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/prepare-slave.yml#L37-L45
network_config:
    - type: interface
      name: eth2
      use_dhcp: false
      mtu: 1450
      addresses:
        - ip_netmask: 10.0.0.1/24
        - ip_netmask: 2001:db8:fd00:1000::1/64

When we create the extra node with OVB, it gets its address via DHCP from the public network, and sometimes 10.0.0.1 is allocated to it, which conflicts with the manually set undercloud eth2 address above.

So there are two possible solutions:
1) Manually set a hard-coded IP for the extra node in the OVB template.
2) Remove 10.0.0.1 from the allocation pool of the public network in the OVB stack.

Harald Jensås (harald-jensas) wrote :

In my personal automation I set the public IP on the undercloud to use the IP address of the port on the "public" network[1] to avoid this type of conflict.

We could improve OVB to put the IPs into the stack output in [2] and [3], as is already done for the private IP and the floating IP. The "Create eth2.conf file" playbook step could then use a template that takes the IP address from the stack output as input.

I'm also not against alternative 2) in the previous comment. Using allocation pools in OVB, we could split the default /24 networks into a lower and an upper range, where the upper range is OVB's allocation pool and the lower range is used for addresses in the undercloud/overcloud we deploy in OVB. (In the routed networks implementation I opted to use fixed_ips for router addresses[4] in OVB; I exclude those from the undercloud/overcloud allocation pools to avoid duplicate addresses.)

[1] https://github.com/hjensas/ooo-bp-tripleo-routed-networks-templates-testing/blob/master/ovb/deploy_ovb.sh#L16
[2] https://opendev.org/openstack/openstack-virtual-baremetal/src/branch/master/templates/undercloud.yaml#L54
[3] https://opendev.org/openstack/openstack-virtual-baremetal/src/branch/master/templates/quintupleo.yaml#L190
[4] https://opendev.org/openstack/openstack-virtual-baremetal/src/branch/master/templates/undercloud-networks-routed.yaml#L39-L42
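
The lower/upper split described in this comment can be sketched in Python with the standard ipaddress module. This only illustrates the arithmetic; it is not OVB code. The function name split_ranges, the halving at the midpoint, and the dict keys are assumptions made for the example.

```python
import ipaddress


def split_ranges(cidr):
    """Split a network into a lower and an upper address range.

    Assumption for this sketch: the lower half is kept for statically
    assigned undercloud/overcloud addresses, the upper half is what an
    OVB allocation pool would hand out. The network and broadcast
    addresses are left out of both ranges.
    """
    net = ipaddress.ip_network(cidr)
    # subnets(prefixlen_diff=1) yields the two halves of the network.
    _lower, upper = net.subnets(prefixlen_diff=1)
    return {
        "deployment": (
            str(net.network_address + 1),
            str(upper.network_address - 1),
        ),
        "ovb_pool": (
            str(upper.network_address),
            str(net.broadcast_address - 1),
        ),
    }


ranges = split_ranges("10.0.0.0/24")
print(ranges)
```

With the default 10.0.0.0/24 network, the undercloud's 10.0.0.1 falls in the "deployment" range, so a pool restricted to the "ovb_pool" range could not hand it to an extra node.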

Harald Jensås (harald-jensas) wrote :

I proposed a change to OVB, https://review.opendev.org/730748, to make the public IP available in stack output.

wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
wes hayutin (weshayutin) wrote :

hrm.. looking at the ovb settings..

https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/config/environments/ovb-common.yml#L37

external_interface_ip: 10.0.0.1
public_net_pool_start: 10.0.0.50
public_net_pool_end: 10.0.0.100

10.0.0.1 is NOT in the public_net_pool
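
The observation above can be confirmed with a few lines of Python. This is a minimal sketch using the standard ipaddress module; the helper name in_pool is made up for the example, and the values are the three settings quoted from ovb-common.yml. It checks only those quickstart-side values; whether the Neutron subnet created by the OVB stack is limited to the same range is a separate question that this check does not answer.

```python
import ipaddress


def in_pool(ip, start, end):
    """Return True if `ip` lies within the inclusive range start..end."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)


# Values quoted from ovb-common.yml in this comment.
external_interface_ip = "10.0.0.1"
public_net_pool_start = "10.0.0.50"
public_net_pool_end = "10.0.0.100"

overlaps = in_pool(
    external_interface_ip, public_net_pool_start, public_net_pool_end
)
print("undercloud IP inside public_net_pool:", overlaps)
```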

Rafael Folco (rafaelfolco) wrote :
yatin (yatinkarel) wrote :

https://review.opendev.org/#/c/753566/ + https://review.rdoproject.org/r/#/c/27934/ should resolve this; both are merged, and an allocation pool is now used in OVB jobs. Let's monitor fs039 for the next couple of runs and then close the bug.

Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
status: Triaged → Fix Released