Ussuri ovb deployment hangs in introspection step.

Bug #1879577 reported by wes hayutin
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Unassigned

Bug Description

2020-05-19 16:27:43,903 p=89344 u=zuul n=ansible | TASK [Nodes that passed introspection] *****************************************
2020-05-19 16:27:43,904 p=89344 u=zuul n=ansible | Tuesday 19 May 2020 16:27:43 +0000 (0:20:30.787) 0:20:31.348 ***********
2020-05-19 16:27:43,964 p=89344 u=zuul n=ansible | ok: [localhost] =>
  msg: ' No nodes completed introspection successfully!'
2020-05-19 16:27:43,969 p=89344 u=zuul n=ansible | TASK [Nodes that failed introspection] *****************************************
2020-05-19 16:27:43,970 p=89344 u=zuul n=ansible | Tuesday 19 May 2020 16:27:43 +0000 (0:00:00.065) 0:20:31.414 ***********
2020-05-19 16:27:44,028 p=89344 u=zuul n=ansible | fatal: [localhost]: FAILED! =>
  msg: ' 5d7a147c-ecce-4262-990e-95b8106281b4 423b13a2-b5c4-40bf-bcf7-20d07bb528bf a1865b73-e197-4f9c-abaf-d2c5c8aba611 39405908-9918-48bf-a9d0-d5adb41e7eed'

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/undercloud/home/zuul/ansible.log.txt.gz

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/undercloud/home/zuul/overcloud_introspect.log.txt.gz

ERROR is here:
https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/undercloud/var/log/extra/errors.txt.txt.gz

2020-05-19 16:27:26.956 ERROR /var/log/containers/ironic-inspector/ironic-inspector.log: 7 ERROR ironic_inspector.node_cache [-] [node: 5d7a147c-ecce-4262-990e-95b8106281b4 state starting] Processing the error event because of an exception <class 'ironic_inspector.utils.Error'>: Failed to set boot device to PXE: Failed to set boot device for node 5d7a147c-ecce-4262-990e-95b8106281b4: Client Error for url: http://192.168.24.3:6385/v1/nodes/5d7a147c-ecce-4262-990e-95b8106281b4/management/boot_device, Node 5d7a147c-ecce-4262-990e-95b8106281b4 is locked by host undercloud.localdomain, please retry after the current operation is completed. raised by ironic_inspector.introspect._do_introspect: ironic_inspector.utils.Error: Failed to set boot device to PXE: Failed to set boot device for node 5d7a147c-ecce-4262-990e-95b8106281b4: Client Error for url: http://192.168.24.3:6385/v1/nodes/5d7a147c-ecce-4262-990e-95b8106281b4/management/boot_device, Node 5d7a147c-ecce-4262-990e-95b8106281b4 is locked by host undercloud.localdomain, please retry after the current operation is completed.

2020-05-19 16:27:34.934 7 DEBUG eventlet.wsgi.server [-] (7) accepted ('192.168.24.1', 43578) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985
2020-05-19 16:27:34.937 7 INFO eventlet.wsgi.server [-] 192.168.24.1 "OPTIONS / HTTP/1.0" status: 200 len: 272 time: 0.0015445
2020-05-19 16:27:36.018 7 ERROR ironic_inspector.utils [-] [node: 39405908-9918-48bf-a9d0-d5adb41e7eed state starting] Failed to set boot device to PXE: Failed to set boot device for node 39405908-9918-48bf-a9d0-d5adb41e7eed: Client Error for url: http://192.168.24.3:6385/v1/nodes/39405908-9918-48bf-a9d0-d5adb41e7eed/management/boot_device, Node 39405908-9918-48bf-a9d0-d5adb41e7eed is locked by host undercloud.localdomain, please retry after the current operation is completed.: openstack.exceptions.ConflictException: Failed to set boot device for node 39405908-9918-48bf-a9d0-d5adb41e7eed: Client Error for url: http://192.168.24.3:6385/v1/nodes/39405908-9918-48bf-a9d0-d5adb41e7eed/management/boot_device, Node 39405908-9918-48bf-a9d0-d5adb41e7eed is locked by host undercloud.localdomain, please retry after the current operation is completed.

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/undercloud/var/log/containers/ironic-inspector/ironic-inspector.log.txt.gz

Nodes were successfully imported:
https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/undercloud/home/zuul/overcloud_import_nodes.log.txt.gz

Ironic IPMI calls to power off appear fine:
https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/undercloud/var/log/containers/ironic/ironic-conductor.log.txt.gz

Revision history for this message
Harald Jensås (harald-jensas) wrote :
Revision history for this message
Harald Jensås (harald-jensas) wrote :

Nope, ignore my previous comment - undercloud is CentOS Linux release 8.1.1911 (Core), so this would'nt have ipmitool with the cipher issue.

Revision history for this message
wes hayutin (weshayutin) wrote : Re: [Bug 1879577] Re: Ussuri ovb deployment hangs in introspection step.
Download full text (5.9 KiB)

Well even still..
The bmc console logs are very different, which makes me suspect something
w/ the bmc setup.

Ussuri log:
https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/bmc-console.log

Master log:
https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/e21345c/logs/bmc-console.log

On Tue, May 19, 2020 at 4:31 PM Harald Jensås <email address hidden> wrote:

> Nope, ignore my previous comment - undercloud is CentOS Linux release
> 8.1.1911 (Core), so this would'nt have ipmitool with the cipher issue.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1879577
>
> Title:
> Ussuri ovb deployment hangs in introspection step.
>
> Status in tripleo:
> Triaged
>
> Bug description:
> 2020-05-19 16:27:43,903 p=89344 u=zuul n=ansible | TASK [Nodes that
> passed introspection] *****************************************
> 2020-05-19 16:27:43,904 p=89344 u=zuul n=ansible | Tuesday 19 May 2020
> 16:27:43 +0000 (0:20:30.787) 0:20:31.348 ***********
> 2020-05-19 16:27:43,964 p=89344 u=zuul n=ansible | ok: [localhost] =>
> msg: ' No nodes completed introspection successfully!'
> 2020-05-19 16:27:43,969 p=89344 u=zuul n=ansible | TASK [Nodes that
> failed introspection] *****************************************
> 2020-05-19 16:27:43,970 p=89344 u=zuul n=ansible | Tuesday 19 May 2020
> 16:27:43 +0000 (0:00:00.065) 0:20:31.414 ***********
> 2020-05-19 16:27:44,028 p=89344 u=zuul n=ansible | fatal: [localhost]:
> FAILED! =>
> msg: ' 5d7a147c-ecce-4262-990e-95b8106281b4
> 423b13a2-b5c4-40bf-bcf7-20d07bb528bf a1865b73-e197-4f9c-abaf-d2c5c8aba611
> 39405908-9918-48bf-a9d0-d5adb41e7eed'
>
> https://logserver.rdoproject.org/openstack-periodic-latest-
> released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-
> centos-8-ovb-3ctlr_1comp-
> featureset001-ussuri/d019289/logs/undercloud/home/zuul/ansible.log.txt.gz
>
> https://logserver.rdoproject.org/openstack-periodic-latest-
> released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-
> centos-8-ovb-3ctlr_1comp-
>
> featureset001-ussuri/d019289/logs/undercloud/home/zuul/overcloud_introspect.log.txt.gz
>
> ERROR is here:
>
> https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-ussuri/d019289/logs/undercloud/var/log/extra/errors.txt.txt.gz
>
>
> 2020-05-19 16:27:26.956 ERROR
> /var/log/containers/ironic-inspector/ironic-inspector.log: 7 ERROR
> ironic_inspector.node_cache [-] [node: 5d7a147c-ecce-4262-990e-95b8106281b4
> state starting] Processing the error event because of an exception <class
> 'ironic_inspector.utils.Error'>: Failed to set boot device to PXE: Failed
> to set boot device for node 5d7a147c-ecce-4262-990e-95b8106281b4: Client
> Error for url:
> http://192.168.24.3:6385/v1/nodes/5d7a147c-ecce-4262-990e-...

Read more...

Revision history for this message
wes hayutin (weshayutin) wrote :
Revision history for this message
Harald Jensås (harald-jensas) wrote :
Download full text (7.2 KiB)

An observation, introspection starts and nodes seem to boot. Then after 20 minutes introspection is aborted, and retried, probably a timeout? Then we get the errors setting boot device PXE when trying to start introspection again.

*** Introspection start ***

2020-05-19 16:07:22.133 7 INFO ironic_inspector.introspect [-] [node: 5d7a147c-ecce-4262-990e-95b8106281b4 state starting] Introspection started successfully

*** we see successfull DHCP for all the nodes ***

May 19 16:07:34 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:07:34 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.100 fa:16:3e:3a:d2:42
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:08:42 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.101 fa:16:3e:88:73:76
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:09:38 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.102 fa:16:3e:25:97:6d
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:14:13 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.103 fa:16:3e:75:51:8a
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPDISCOVER(br-ctlplane) fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPOFFER(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPREQUEST(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42
May 19 16:20:38 dnsmasq-dhcp[8]: DHCPACK(br-ctlplane) 192.168.24.104 fa:16:3e:3a:d2:42

*** Introspection is aborted 20 minutes later ***

2020-05-19 16:27:21.552 7 DEBUG ironic_inspector.introspect [-] Aborting introspection for node 5...

Read more...

Revision history for this message
wes hayutin (weshayutin) wrote :

RDO-Cloud info

Name
bmc-base
ID
c59224e2-c5df-4a86-b7b6-49556d8c7f5c

Name
bmc-template
ID
d312897c-7051-4dce-b589-e109a15320a7

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :
Download full text (7.3 KiB)

Inspector service shuts down:

DEBUG stevedore.extension [-] found extension EntryPoint.parse('lldp_basic = ironic_inspector.plugins.lldp_basic:LLDPBasicProcessingHook') _load_plugins /usr/lib/python3.6/site-packages
/stevedore/extension.py:189
ERROR stevedore.extension [-] Could not load 'lldp_basic': module 'construct.core' has no attribute 'Embedded': AttributeError: module 'construct.core' has no attribute 'Embedded'
ERROR stevedore.extension Traceback (most recent call last):
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/stevedore/extension.py", line 195, in _load_plugins
ERROR stevedore.extension verify_requirements,
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/stevedore/named.py", line 158, in _load_one_plugin
ERROR stevedore.extension verify_requirements,
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/stevedore/extension.py", line 223, in _load_one_plugin
ERROR stevedore.extension plugin = ep.resolve()
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2449, in resolve
ERROR stevedore.extension module = __import__(self.module_name, fromlist=['__name__'], level=0)
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/ironic_inspector/plugins/lldp_basic.py", line 18, in <module>
ERROR stevedore.extension from ironic_inspector.common import lldp_parsers
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/ironic_inspector/common/lldp_parsers.py", line 22, in <module>
ERROR stevedore.extension from ironic_inspector.common import lldp_tlvs as tlv
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/ironic_inspector/common/lldp_tlvs.py", line 110, in <module>
ERROR stevedore.extension IANAAddress = core.Embedded(core.Struct(
ERROR stevedore.extension AttributeError: module 'construct.core' has no attribute 'Embedded'
ERROR stevedore.extension
DEBUG stevedore.extension [-] found extension EntryPoint.parse('local_link_connection = ironic_inspector.plugins.local_link_connection:GenericLocalLinkConnectionHook') _load_plugins /us
r/lib/python3.6/site-packages/stevedore/extension.py:189
ERROR stevedore.extension [-] Could not load 'local_link_connection': module 'construct.core' has no attribute 'Embedded': AttributeError: module 'construct.core' has no attribute 'Embe
dded'
ERROR stevedore.extension Traceback (most recent call last):
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/stevedore/extension.py", line 195, in _load_plugins
ERROR stevedore.extension verify_requirements,
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/stevedore/named.py", line 158, in _load_one_plugin
ERROR stevedore.extension verify_requirements,
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/stevedore/extension.py", line 223, in _load_one_plugin
ERROR stevedore.extension plugin = ep.resolve()
ERROR stevedore.extension File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2449, in resolve
ERROR stevedore.extension module = __import__(self.module_name, fromlist=['__name__'], level=0)
ERROR stevedore.e...

Read more...

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :
Download full text (6.5 KiB)

# Inspector RPMs
python3-ironic-inspector-client.noarch 4.1.0-0.20200415171055.c9b1aff.el8 @delorean-component-clients
openstack-ironic-inspector.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector.src 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-api.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-conductor.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-dnsmasq.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-doc.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
python-ironic-inspector-client.src 4.1.0-0.20200415171055.c9b1aff.el8 delorean-component-clients
python3-ironic-inspector-tests.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal

# All Ironic RPMs
openstack-ironic-python-agent-builder.noarch 1.1.1-0.20200423062442.dd90f32.el8 @delorean-component-baremetal
puppet-ironic.noarch 16.3.1-0.20200516001440.3346c0c.el8 @delorean-current
python3-ironic-inspector-client.noarch 4.1.0-0.20200415171055.c9b1aff.el8 @delorean-component-clients
python3-ironicclient.noarch 4.1.0-0.20200424103637.fc31b5e.el8 @delorean-component-clients
openstack-ironic.src 1:14.0.1-0.20200424113559.9632d39.el8 delorean-component-baremetal
openstack-ironic-api.noarch 1:14.0.1-0.20200424113559.9632d39.el8 delorean-component-baremetal
openstack-ironic-common.noarch 1:14.0.1-0.20200424113559.9632d39.el8 delorean-component-baremetal
openstack-ironic-conductor.noarch 1:14.0.1-0.20200424113559.9632d39.el8 delorean-component-baremetal
openstack-ironic-inspector.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector.src 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-api.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-conductor.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-dnsmasq.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
openstack-ironic-inspector-doc.noarch 10.0.1-0.20200423163656.14dfb7c.el8 delorean-component-baremetal
o...

Read more...

Revision history for this message
Bob Fournier (bfournie) wrote :

Ah, thanks Sagi. It looks like we are hitting:
ERROR stevedore.extension [-] Could not load 'local_link_connection': module 'construct.core' has no attribute 'Embedded': AttributeError: module 'construct.core' has no attribute 'Embedded'

Which is due to the new version of construct.

This ironic-inspector patch https://review.opendev.org/#/c/724822/ merged on 5/14 to fix this.

We'll need to pick up this ironic-inspector patch.

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

OK, so I used old containers. Rerunning it with new ones that include the patch Bob pointed to.

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.