cs9 ovb fs01 clients wallaby is failing with error - 'neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply'

Bug #1987616 reported by Soniya Murlidhar Vyas
This bug affects 1 person
Affects    Status    Importance    Assigned to    Milestone
tripleo    Triaged   Critical      Unassigned

Bug Description

periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-wallaby is consistently failing with the following error from the neutron DHCP agent: ERROR neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID

The following traceback was observed in extra/errors.txt:

2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/neutron/agent/dhcp/agent.py", line 1089, in _report_state
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent agent_status = self.state_rpc.report_state(
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/neutron/agent/rpc.py", line 103, in report_state
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent return method(context, 'report_state', **kwargs)
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent self.transport._send(self.target, msg_ctxt, msg,
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/oslo_messaging/transport.py", line 123, in _send
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent return self._driver.send(target, ctxt, message,
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 681, in send
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent result = self._waiter.wait(msg_id, timeout,
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent message = self.waiters.get(msg_id, timeout=timeout)
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent raise oslo_messaging.MessagingTimeout(
2022-08-25 01:53:41.497 ERROR /var/log/containers/neutron/dhcp-agent.log: 132381 ERROR neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 446ccfc5c7fe4d25baa9bed5be9d2bc3
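The bottom frames of the traceback show where the exception originates: the RPC call made by _report_state registers a waiter for the outgoing message ID and then blocks in self.waiters.get(msg_id, timeout=timeout); if no reply arrives before the timeout, MessagingTimeout is raised. Below is a simplified, hypothetical Python model of that pattern. It is not oslo.messaging's actual code: the ReplyWaiters class and the local MessagingTimeout class are stand-ins for illustration. It shows that the error on its own only means "no reply within the timeout" and does not by itself say why the reply never came back.

```python
import queue
import threading


class MessagingTimeout(Exception):
    """Hypothetical stand-in for oslo_messaging.exceptions.MessagingTimeout."""


class ReplyWaiters:
    """Simplified, hypothetical model of a per-message reply waiter.

    Each outgoing call registers a queue keyed by its message ID; a reply
    consumer puts the reply on that queue. If nothing arrives within the
    timeout, the caller gets MessagingTimeout, as in the traceback.
    """

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def add(self, msg_id):
        # Register interest in a reply before sending the request.
        with self._lock:
            self._queues[msg_id] = queue.Queue()

    def put(self, msg_id, reply):
        # Called by the (hypothetical) reply-consumer thread.
        with self._lock:
            q = self._queues.get(msg_id)
        if q is not None:
            q.put(reply)

    def get(self, msg_id, timeout):
        # Block until a reply for msg_id arrives or the timeout expires.
        with self._lock:
            q = self._queues[msg_id]
        try:
            return q.get(block=True, timeout=timeout)
        except queue.Empty:
            raise MessagingTimeout(
                f"Timed out waiting for a reply to message ID {msg_id}"
            ) from None
        finally:
            # The waiter is dropped whether or not a reply arrived.
            with self._lock:
                self._queues.pop(msg_id, None)


if __name__ == "__main__":
    waiters = ReplyWaiters()
    waiters.add("demo-msg")
    # Nobody replies, so the call times out.
    try:
        waiters.get("demo-msg", timeout=0.1)
    except MessagingTimeout as exc:
        print(exc)
```

In this model the only difference between success and the failure in this bug is whether put() is called for the message ID before the timeout expires.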

The following errors were seen in overcloud_node_provision.log.txt.gz from the same run:

PLAY [Overcloud Node Grow Volumes] *********************************************
2022-08-25 02:30:44.006420 | fa163e46-632c-5778-1b40-00000000000c | TASK | Wait for provisioned nodes to boot
2022-08-25 02:30:55.163202 | fa163e46-632c-5778-1b40-00000000000c | OK | Wait for provisioned nodes to boot | overcloud-controller-2
2022-08-25 02:30:55.165587 | fa163e46-632c-5778-1b40-00000000000c | TIMING | Wait for provisioned nodes to boot | overcloud-controller-2 | 0:00:11.178741 | 11.14s
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-controller-1: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.13 port 22: Connection refused
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-controller-0: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.18 port 22: Connection refused
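"Connection refused" on port 22 means the host answered but nothing was listening on the SSH port yet, which is different from a timeout (host unreachable). A quick manual check of the same condition against the controller addresses above could look like the sketch below. This is not how the TripleO "Wait for provisioned nodes to boot" task is implemented; port_open and wait_for_ssh are hypothetical helpers, and the attempt count and delay are arbitrary assumptions.

```python
import socket
import time


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (nothing listening yet, as in the warnings),
        # socket timeouts and unreachable-host errors all end up here.
        return False


def wait_for_ssh(host, port=22, attempts=30, delay=10.0):
    """Poll host:port until it accepts connections or attempts run out."""
    for attempt in range(1, attempts + 1):
        if port_open(host, port):
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

For example, wait_for_ssh("192.168.24.13") would keep retrying for roughly five minutes with the assumed defaults before giving up.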

Log URLs for the above run:
- https://logserver.rdoproject.org/29/37029/57/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-wallaby/1e5561b/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
- https://logserver.rdoproject.org/29/37029/57/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-wallaby/1e5561b/logs/undercloud/var/log/extra/errors.txt.gz
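To compare runs, it can help to pull every MessagingTimeout out of a downloaded errors.txt.gz and see which agent logs it came from. The sketch below assumes each line keeps the layout of the excerpt in this report ("<date> <time> ERROR <log path>: <pid> ERROR <logger> <message>"); if the file format differs, the regular expressions need adjusting. find_timeouts and read_lines are hypothetical helpers, not part of any TripleO tooling.

```python
import gzip
import re

# Assumed line layout, based on the errors.txt excerpt in this report.
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) ERROR (?P<log>\S+): \d+ ERROR (?P<logger>\S+) (?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(
    r"MessagingTimeout: Timed out waiting for a reply to message ID (?P<msg_id>[0-9a-f]+)"
)


def read_lines(path):
    """Yield text lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from fh


def find_timeouts(lines):
    """Yield (log path, logger, message ID) for each MessagingTimeout line."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        t = TIMEOUT_RE.search(m.group("msg"))
        if t:
            yield m.group("log"), m.group("logger"), t.group("msg_id")
```

Usage: wrapping the result in collections.Counter, e.g. Counter((log, logger) for log, logger, _ in find_timeouts(read_lines("errors.txt.gz"))), gives a per-agent count of timeouts for one run.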

For further reference, see the following links:
- https://logserver.rdoproject.org/29/37029/56/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-wallaby/c71b56c/logs/undercloud/var/log/extra/errors.txt.gz
- https://logserver.rdoproject.org/29/37029/56/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-clients-wallaby/c71b56c/logs/undercloud/home/zuul/overcloud_node_provision.log.txt.gz
- https://logserver.rdoproject.org/openstack-component-clients/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-clients-train/58192d5/logs/undercloud/var/log/containers/neutron/server.log.txt.gz

summary:
- fs01 clients wallaby is failing with error - 'neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply'
+ cs9 ovb fs01 clients wallaby is failing with error - 'neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply'
Soniya Murlidhar Vyas (svyas) wrote :

This issue is happening in the check pipeline as well [1]: the tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001 job is failing with the same issue.

[1] https://logserver.rdoproject.org/35/843835/25/openstack-check/tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001/2e5d522/logs/undercloud/var/log/extra/errors.txt.gz

Rabi Mishra (rabi) wrote :

Though I haven't looked in detail, the issue with the tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001 jobs does not look the same as the one in the periodic wallaby jobs. We probably need a separate bug to track it.

Sandeep Yadav (sandeepyadav93) wrote :

The master-branch issue mentioned in the comment above is not related; it is a separate issue happening during node provisioning [1].

[1] https://logserver.rdoproject.org/35/843835/25/openstack-check/tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001/2e5d522/logs/baremetal_25_88904_0-console.log

Soniya Murlidhar Vyas (svyas) wrote :

Yeah, it seems we need a separate bug for the check job above. I have filed one here: https://bugs.launchpad.net/tripleo/+bug/1987632
