master component network fs 1 fails provision - host not connected to any segments on routed provider network

Bug #1970899 reported by Marios Andreou
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

At [1][2][3] the periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master job fails during node provisioning with an error trace like:

  2022-04-28 09:47:12.923120 | fa163ed4-857a-84b3-806c-00000000001a | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Deploy attempt failed on node baremetal-65576-0 (UUID 670af231-0e77-4b60-b42f-9865f4ac7d08), cleaning up\nDeploy attempt failed on node baremetal-65576-1 (UUID ac4e9052-70cc-42d7-ba0c-aa8994d0b438), cleaning up\nDeploy attempt failed on node baremetal-65576-3 (UUID 5f34af5f-44fe-444e-865c-05293d87a2b4), cleaning up\nDeploy attempt failed on node baremetal-65576-2 (UUID e8311804-1467-4373-8943-63f7b9c5461b), cleaning up\n", "msg": "ConflictException: 409: Client Error for url: https://192.168.24.2:13696/v2.0/ports, Host ac4e9052-70cc-42d7-ba0c-aa8994d0b438 is not connected to any segments on routed provider network '6f66ffd7-af6c-43ad-9d64-59f7aa9cd2b6'. It should be connected to one."}

This blocks the network component promotion and has been happening since around 19th April (the logs at [3] are from then).

[1] https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master/eddd52d/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
[2] https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master/f50feda/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
[3] https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master/3b405fd/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
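
When comparing this failure across several job logs, the host and network identifiers in the 409 message can be pulled out with a regular expression. This is a minimal sketch, assuming the message keeps the wording quoted above; `parse_segment_conflict` is a hypothetical helper name and not part of any TripleO tooling.

```python
import re

# Pattern based on the 409 message quoted in this report; the wording is
# assumed to stay the same across runs.
SEGMENT_CONFLICT = re.compile(
    r"Host (?P<host>[0-9a-f-]{36}) is not connected to any segments "
    r"on routed provider network '(?P<network>[0-9a-f-]{36})'"
)


def parse_segment_conflict(msg):
    """Return (host_uuid, network_uuid), or None if msg is a different error."""
    match = SEGMENT_CONFLICT.search(msg)
    if match is None:
        return None
    return match.group("host"), match.group("network")


# The "msg" value from the error trace in the bug description.
msg = (
    "ConflictException: 409: Client Error for url: "
    "https://192.168.24.2:13696/v2.0/ports, "
    "Host ac4e9052-70cc-42d7-ba0c-aa8994d0b438 is not connected to any "
    "segments on routed provider network "
    "'6f66ffd7-af6c-43ad-9d64-59f7aa9cd2b6'. It should be connected to one."
)
print(parse_segment_conflict(msg))
```

The host UUID in the message is one of the node UUIDs listed in the "logging" field of the same error.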

Marios Andreou (marios-b) wrote :
Harald Jensås (harald-jensas) wrote :

https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master/eddd52d/logs/undercloud/var/log/extra/podman/containers/ironic_neutron_agent/stdout.log.txt.gz

+ exec /usr/bin/ironic-neutron-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ironic_neutron_agent.ini --config-dir /etc/neutron/conf.d/common
Traceback (most recent call last):
  File "/usr/bin/ironic-neutron-agent", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/networking_baremetal/agent/ironic_neutron_agent.py", line 270, in main
    _unregiser_deprecated_opts()
  File "/usr/lib/python3.9/site-packages/networking_baremetal/agent/ironic_neutron_agent.py", line 261, in _unregiser_deprecated_opts
    [CONF._groups[ironic_client.IRONIC_GROUP]._opts[opt]['opt']
  File "/usr/lib/python3.9/site-packages/networking_baremetal/agent/ironic_neutron_agent.py", line 261, in <listcomp>
    [CONF._groups[ironic_client.IRONIC_GROUP]._opts[opt]['opt']
KeyError: 'ironic'

This issue will be resolved once https://review.opendev.org/c/openstack/networking-baremetal/+/839298 is used.

Marios Andreou (marios-b) wrote :

OK, we are getting networking-baremetal via the baremetal component, so we need that component to promote in order to get the patch https://review.opendev.org/c/openstack/networking-baremetal/+/839298

Looking at the current component-ci-testing [1], the version is python-networking-baremetal-5.2.0-0.20220427140901.df6c7c9.el9.src.rpm, and checking the versions.csv there [2] it looks like it has Harald's patch - the commit from [2] is df6c7c9c55653b818c4a882eafda1967827ddf98, which is the patch referenced in comment #2 (commit at [3]).

[1] https://trunk.rdoproject.org/centos9-master/component/baremetal/component-ci-testing/
[2] https://trunk.rdoproject.org/centos9-master/component/baremetal/component-ci-testing/versions.csv
[3] https://github.com/openstack/networking-baremetal/commit/df6c7c9c55653b818c4a882eafda1967827ddf98
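
The check described above (does the package file name carry the commit we want?) can be scripted. This is a rough sketch, assuming the release field always ends in `<timestamp>.<short hash>.<dist tag>` as it does in the one file name quoted here; `package_built_from_commit` is a hypothetical helper name.

```python
import re

# Layout inferred from the single file name quoted in this report:
# "...-0.<14-digit timestamp>.<short commit hash>.el9.src.rpm".
SHORT_HASH = re.compile(
    r"\.(?P<timestamp>\d{14})\.(?P<short>[0-9a-f]{7,40})\.el\d+"
)


def package_built_from_commit(rpm_name, commit):
    """True if the short hash embedded in rpm_name is a prefix of commit."""
    match = SHORT_HASH.search(rpm_name)
    return match is not None and commit.startswith(match.group("short"))


rpm = "python-networking-baremetal-5.2.0-0.20220427140901.df6c7c9.el9.src.rpm"
fix = "df6c7c9c55653b818c4a882eafda1967827ddf98"
print(package_built_from_commit(rpm, fix))
```

This only tells you the package was built from exactly that commit. A later build would also include the fix but would carry a different short hash, so a negative result needs a manual look.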

Dariusz Smigiel (smigiel-dariusz) wrote :
Marios Andreou (marios-b) wrote :

OK great Dariusz so we have step 1 ;)

Looks like promoted-components for baremetal now contains the fix we need - python-networking-baremetal-5.2.0-0.20220427140901.df6c7c9.el9.src.rpm at [1]

So the next step is a master integration promotion, so that the fix becomes available in the network component and we can close out this bug.

Again by checking the versions.csv [2] I can see df6c7c9c55653b818c4a882eafda1967827ddf98 which is the one we want

[1] https://trunk.rdoproject.org/centos9-master/component/baremetal/promoted-components/
[2] https://trunk.rdoproject.org/centos9-master/component/baremetal/promoted-components/versions.csv
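
Looking up a commit in versions.csv can also be done programmatically. The following is only a sketch: the column names (`Project`, `Source Repo`, `Source Sha`), the project name and the repo URL in the sample are assumptions for illustration, and only the hash value is taken from this report. Check the real file's header before relying on it.

```python
import csv
import io

# Illustrative excerpt only. The column names, project name and repo URL are
# assumptions; only the Source Sha value comes from this bug report.
SAMPLE_VERSIONS_CSV = """\
Project,Source Repo,Source Sha
networking-baremetal,https://opendev.org/openstack/networking-baremetal,df6c7c9c55653b818c4a882eafda1967827ddf98
"""


def source_sha(versions_csv_text, project):
    """Return the Source Sha recorded for project, or None if it is absent."""
    for row in csv.DictReader(io.StringIO(versions_csv_text)):
        if row["Project"] == project:
            return row["Source Sha"]
    return None


print(source_sha(SAMPLE_VERSIONS_CSV, "networking-baremetal"))
```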

Marios Andreou (marios-b) wrote :

this is being seen in the integration line now - e.g. latest master/9 buildset at [1]

examples of the bug in [2][3][4][5][6] (fs1/2/20/35/39)

        * 2022-05-01 21:55:56.917915 | fa163e2b-52a8-e022-01a2-00000000001a | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Deploy attempt failed on node baremetal-98444-1 (UUID ad090bb0-5da3-4c05-bfa3-5cd27b4e5149), cleaning up\nDeploy attempt failed on node baremetal-98444-3 (UUID 7e1afed4-4e9d-4f6a-b9dc-af6dd12b5386), cleaning up\nDeploy attempt failed on node baremetal-98444-2 (UUID 19e1b5e0-472d-4b16-ae7a-d0493cd8c0d0), cleaning up\nDeploy attempt failed on node baremetal-98444-0 (UUID 86b7582d-85ad-41a9-8ba3-c975958bfe13), cleaning up\n", "msg": "ConflictException: 409: Client Error for url: https://192.168.24.2:13696/v2.0/ports, Host 19e1b5e0-472d-4b16-ae7a-d0493cd8c0d0 is not connected to any segments on routed provider network '04cb49cf-d63b-4c94-8010-b6169cf4d1b7'. It should be connected to one."}

Main issue is that it looks like the *right* version of the package is available in [7][8]

        * python-networking-baremetal-5.2.0-0.20220427140901.df6c7c9.el9.src.rpm 2022-04-27 14:11 51K

[1] https://review.rdoproject.org/zuul/buildset/c225ee91fb674f22a66379b57272782a
[2] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-master/ea4ff20/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
[3] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_1comp-featureset002-master/784a53a/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
[4] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-master/3dca009/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
[5] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/9b4d05b/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
[6] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp_1supp-featureset039-master/8584b40/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
[7] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-master/ea4ff20/logs/undercloud/etc/yum.repos.d/delorean.repo.txt.gz
[8] http://mirror.regionone.vexxhost-nodepool-tripleo.rdoproject.org:8080/rdo/centos9-master/component/baremetal/4c/f0/4cf0147e86039e8fa2232c3de6d45f1405a515d0_a1894fe9

Marios Andreou (marios-b) wrote :

OK and failed again in my sanity check/test just now at https://review.rdoproject.org/r/c/testproject/+/42429/1#message-50a43bf007c537eca70386a01fddbfb797f8bf03

@Harald can you please check again? Is there something else we might be missing?

Harald Jensås (harald-jensas) wrote :

INFO:__main__:Setting permission for /var/log/neutron/privsep-helper.log
++ cat /run_command
+ CMD='/usr/bin/ironic-neutron-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ironic_neutron_agent.ini --config-dir /etc/neutron/conf.d/common'
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
+ echo 'Running command: '\''/usr/bin/ironic-neutron-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ironic_neutron_agent.ini --config-dir /etc/neutron/conf.d/common'\'''
Running command: '/usr/bin/ironic-neutron-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ironic_neutron_agent.ini --config-dir /etc/neutron/conf.d/common'
+ exec /usr/bin/ironic-neutron-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ironic_neutron_agent.ini --config-dir /etc/neutron/conf.d/common
Traceback (most recent call last):
  File "/usr/bin/ironic-neutron-agent", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/networking_baremetal/agent/ironic_neutron_agent.py", line 267, in main
    common_config.register_common_config_options()
AttributeError: module 'neutron.common.config' has no attribute 'register_common_config_options'

So it seems the container does not have a neutron build that includes [1].
We had to fix networking-baremetal to register the options explicitly because of that change, since neutron stopped registering opts on import.

Now we have the fix in networking-baremetal, but we don't have the neutron change.

[1] https://review.opendev.org/c/openstack/neutron/+/837392/12/neutron/common/config.py#52
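
To illustrate the version skew described above: a caller could probe for the new function before calling it, so that it works against a neutron with or without the change. This is a hypothetical sketch using stand-in objects rather than the real `neutron.common.config` module, and it is not how networking-baremetal handles it; in this bug the skew was resolved by promotion, not by a guard like this.

```python
import types


def register_common_options(config_module):
    """Call register_common_config_options() if config_module provides it.

    Returns True when the options were registered explicitly, and False when
    the module predates that function.
    """
    register = getattr(config_module, "register_common_config_options", None)
    if register is None:
        return False
    register()
    return True


# Stand-ins for neutron.common.config before and after the neutron change;
# neither is the real module.
calls = []
old_config = types.SimpleNamespace()
new_config = types.SimpleNamespace(
    register_common_config_options=lambda: calls.append("registered")
)

print(register_common_options(old_config))  # older neutron: nothing to call
print(register_common_options(new_config))  # newer neutron: explicit call
```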

Dariusz Smigiel (smigiel-dariusz) wrote :

We promoted the network component. I'm currently running a testproject to verify whether the issue is solved.

Marios Andreou (marios-b) wrote :

I think we are now clear of this. Network component promoted [1] commit_hash: dcdab7b480e4fc62b567a08d16879bea82a74967

The latest run of the integration line today [2] has no examples (compared to yesterday; see comment #6 above).

However, I still don't see a green run on the network FS1 job for which this bug was filed in the first place - all red currently at [3]. Holding off on closing this out for now. In particular, how did we promote network without that job?

Will catch up with dasm & rlandy later and update.

Trying a testproject for now at [4]

[1] https://trunk.rdoproject.org/centos9-master/component/network/promoted-components/commit.yaml
[2] https://review.rdoproject.org/zuul/buildset/af8e6da8e7b24619957694aa2a62991f
[3] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master
[4] https://review.rdoproject.org/r/c/testproject/+/42489

Marios Andreou (marios-b) wrote :

we need the master integration promotion before we can clear this out

i am still seeing it in the network component test @ [1] (testproject link [2])

        * 2022-05-03 08:48:57.393717 | fa163e84-19c6-416a-8a1d-00000000001a | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Deploy attempt failed on node baremetal-42489-1-62888-0 (UUID 2799276c-7cbc-436f-ba0f-08b3f05b9f17), cleaning up\nDeploy attempt failed on node baremetal-42489-1-62888-1 (UUID dcf922d1-c01a-4c34-958c-ed3a9a1d0845), cleaning up\nDeploy attempt failed on node baremetal-42489-1-62888-2 (UUID 29a3ddda-684f-4475-8185-b80fed4a0219), cleaning up\nDeploy attempt failed on node baremetal-42489-1-62888-3 (UUID e1bb3a97-b9d1-4e9e-a48b-893740e9f990), cleaning up\n", "msg": "ConflictException: 409: Client Error for url: https://192.168.24.2:13696/v2.0/ports, Host 29a3ddda-684f-4475-8185-b80fed4a0219 is not connected to any segments on routed provider network 'ac7a334a-63f0-43b4-9a0f-65c47ffce7c4'. It should be connected to one."}

[1] https://logserver.rdoproject.org/89/42489/1/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master/61aadb3/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz

[2] https://review.rdoproject.org/r/c/testproject/+/42489/1#message-ac35243ec5d51cc9cb9a04b6febf959756fb785b

Marios Andreou (marios-b) wrote :

Also seen in the baremetal component yesterday at [1]

(we are still chasing/waiting for the master promotion - lots of ovb instability with https://bugs.launchpad.net/tripleo/+bug/1971465)

        * 2022-05-03 21:38:18.291935 | fa163ece-3811-82da-4586-00000000001a | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Deploy attempt failed on node baremetal-87788-2 (UUID 38445777-f0ed-44df-a0c2-2b32d80ad1ea), cleaning up\nDeploy attempt failed on node baremetal-87788-0 (UUID fcda4a65-5ae5-45b6-93ef-e7f1aa392cfe), cleaning up\nDeploy attempt failed on node baremetal-87788-3 (UUID 2a6b1edc-3b14-41f7-9b63-771fa36cfcd4), cleaning up\nDeploy attempt failed on node baremetal-87788-1 (UUID de65590b-eb95-4e29-bfc1-668cd199a4aa), cleaning up\n", "msg": "ConflictException: 409: Client Error for url: https://192.168.24.2:13696/v2.0/ports, Host de65590b-eb95-4e29-bfc1-668cd199a4aa is not connected to any segments on routed provider network 'be57cd12-da05-407d-9f8d-35d25c8280ac'. It should be connected to one."}

[1] https://logserver.rdoproject.org/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-baremetal-master/6ef1fd7/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz

Sandeep Yadav (sandeepyadav93) wrote :

This was resolved after the integration line promotion on 6th May.

Node provisioning is now failing on another bug [1], but we have tested its fix as well and the ovb job is green [2].

Closing this bug.

[1] https://bugs.launchpad.net/tripleo/+bug/1973038
[2] https://review.rdoproject.org/zuul/build/f68c681f0a784a9b9f8bb3a9061c948d

Changed in tripleo:
status: Triaged → Fix Released