OVB jobs failing in overcloud-prep-images with timeout on introspection

Bug #1829468 reported by Marios Andreou
This bug affects 2 people
Affects  Status      Importance  Assigned to  Milestone
tripleo  Incomplete  Wishlist    Unassigned

Bug Description

There are many examples across branches, such as [1][2][3] (master/stein/rocky); it looks like this affects all OVB jobs. The trace from overcloud_prep_images.log.txt.gz looks like:

    2019-05-16 17:12:33 | + openstack overcloud node introspect --all-manageable
    2019-05-16 17:12:36 | Waiting for messages on queue 'tripleo' with no timeout.
    2019-05-16 17:14:26 | Waiting for introspection to finish...
    2019-05-16 17:14:26 |
    2019-05-16 17:14:26 | Introspection completed.
    2019-05-16 17:14:26 | + openstack overcloud node provide --all-manageable
    2019-05-16 17:14:29 | Waiting for messages on queue 'tripleo' with no timeout.

After that, nothing else happens. I see these in the errors file [4]:

    2019-05-16 21:06:47.682 ERROR /var/log/containers/nova/nova-compute.log: 8 ERROR nova.virt.ironic.driver [req-5877e70e-8475-4fd3-8d4b-f85800271e5f - - - - -] An unknown error has occurred when trying to get the list of nodes from the Ironic inventory. Error: StrictVersion instance has no attribute 'version'
    ...
    2019-05-16 22:08:06.931 ERROR /var/log/containers/nova/nova-compute.log: 8 ERROR oslo_service.periodic_task raise exception.VirtDriverNotReady()
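
A minimal sketch of how the two error signatures quoted above could be searched for in a downloaded errors file. The names `SIGNATURES` and `find_signatures` are hypothetical, and the sample lines are abbreviated copies of the log lines above, not the full originals:

```python
import re

# Signatures copied from the two error lines quoted in this report.
SIGNATURES = {
    "strictversion": re.compile(r"StrictVersion instance has no attribute 'version'"),
    "virt_driver_not_ready": re.compile(r"VirtDriverNotReady"),
}


def find_signatures(lines):
    """Return {signature name: [1-based line numbers that match]}."""
    hits = {name: [] for name in SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits


# Abbreviated sample input (hypothetical shortening of the lines above).
sample = [
    "ERROR nova.virt.ironic.driver Error: StrictVersion instance has no attribute 'version'",
    "ERROR oslo_service.periodic_task raise exception.VirtDriverNotReady()",
]
print(find_signatures(sample))
```

For a real run, the lines would come from the decompressed errors file instead of `sample`.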

[1] http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/f42ce71/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz
[2] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein/cdfb1e9/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz
[3] http://logs.rdoproject.org/openstack-periodic-24hr/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/b7e3274/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz
[4] http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/f42ce71/logs/undercloud/var/log/extra/errors.txt.txt.gz

Tags: ci
Marios Andreou (marios-b) wrote :

    (ykarel) Master/stein are also affected; check https://review.opendev.org/#/c/653279/1 (possible breakage), https://review.opendev.org/#/c/659592/ (possible fix) and https://review.opendev.org/#/c/659612/1 (blocking invalid tags). We can proactively do this in rdoinfo.

tags: added: promotion-blocker
Marios Andreou (marios-b) wrote :

We have a pin in place to hold us until the fixes become available. The fixes are at https://review.opendev.org/#/q/I3b25f4fb170aa93159ffa8074dc74fa6f50671b7

We are pinning ironicclient in RDO with https://review.rdoproject.org/r/20787

Marios Andreou (marios-b) wrote :

This should no longer be blocking us; with the pin in place we should not be seeing this in CI jobs.

Unfortunately, judging by https://pypi.org/project/python-ironicclient/#history, ironicclient is not released very often.

We will need to keep the pin (https://review.rdoproject.org/r/20787) until a 2.7.2 release becomes available.
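
As a rough illustration of the "keep the pin until 2.7.2" condition, here is a small sketch that compares dotted version strings numerically. The helper names `parse_version` and `pin_still_needed` are hypothetical, the 2.7.2 threshold is taken from the comment above, and the parser assumes plain numeric versions with no pre-release suffixes:

```python
def parse_version(text):
    """Turn '2.7.1' into (2, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pin_still_needed(installed, fixed="2.7.2"):
    """True while the installed version is older than the fixed release."""
    return parse_version(installed) < parse_version(fixed)


print(pin_still_needed("2.7.1"))   # older than 2.7.2
print(pin_still_needed("2.10.0"))  # newer, even though "2.10.0" < "2.7.2" as strings
```

Comparing tuples of integers rather than raw strings avoids treating 2.10.0 as older than 2.7.2.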

tags: removed: promotion-blocker
Marios Andreou (marios-b) wrote :

Removed promotion-blocker; see comment #3.

Changed in tripleo:
milestone: none → train-1
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
Rafael Folco (rafaelfolco) wrote :

Ruck/rover can verify whether this is a valid bug.

Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
assignee: nobody → Sorin Sbarnea (ssbarnea)
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → train-rc1
Sorin Sbarnea (ssbarnea)
Changed in tripleo:
assignee: Sorin Sbarnea (ssbarnea) → nobody
Ronelle Landy (rlandy) wrote :

Seeing this again in master deployments:

2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron Traceback (most recent call last):
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/bin/neutron-dhcp-agent", line 10, in <module>
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron sys.exit(main())
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/agents/dhcp.py", line 17, in main
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron dhcp_agent.main()
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp_agent.py", line 49, in main
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron manager='neutron.agent.dhcp.agent.DhcpAgentWithStateReport')
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/service.py", line 414, in create
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron periodic_fuzzy_delay=periodic_fuzzy_delay)
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/service.py", line 345, in __init__
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron manager_class = importutils.import_class(self.manager_class_name)
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/lib/python2.7/site-packages/oslo_utils/importutils.py", line 30, in import_class
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron __import__(mod_str)
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 39, in <module>
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron from neutron.agent.linux import dhcp
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 68, in <module>
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron class DictModel(collections.abc.MutableMapping):
2019-12-08 16:13:25.580 ERROR /var/log/containers/neutron/dhcp-agent.log.1: 715132 ERROR neutron AttributeError: 'module' object has no attribute 'abc'
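
The final AttributeError is what one would expect when code that references `collections.abc` runs under the Python 2.7 interpreter shown in the paths above, since `collections.abc` only exists on Python 3.3 and later. The sketch below shows a common compatibility import for that situation. It is an illustration only and not necessarily how this was resolved in neutron or in the container image; the `DictModel` body here is a hypothetical stand-in and not the neutron class:

```python
# Compatibility import: collections.abc exists only on Python 3.3+.
try:
    from collections import abc as collections_abc
except ImportError:  # Python 2.7: the ABCs live directly in collections
    import collections as collections_abc


class DictModel(collections_abc.MutableMapping):
    """Hypothetical stand-in for the class named in the traceback."""

    def __init__(self, **kwargs):
        self._data = dict(kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```

With this import the class definition loads on both interpreter lines, whereas `class DictModel(collections.abc.MutableMapping)` fails at import time on Python 2.7, as in the traceback.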

Ronelle Landy (rlandy) wrote :

Failed container:
a56877deb5f2 192.168.24.1:8787/tripleomaster/centos-binary-neutron-dhcp-agent:bd316fa91fad3df7c4b2e7847399d03e16625a42_3599f536-updated-20191208133445 kolla_start 3 hours ago Exited (1) Less than a second ago neutron_dhcp

Changed in tripleo:
importance: Critical → Wishlist