MessagingTimeout

Bug #1852175 reported by YG Kumar
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
neutron          Expired   Undecided    Unassigned
oslo.messaging   Expired   Undecided    Unassigned

Bug Description

Hi All,

We have an OpenStack-Ansible deployment on the Rocky 18.1.9 branch. When we try to launch a VM, it goes into ERROR state saying that VIF creation failed. I see the following messages in the openvswitch agent log:

-------------------
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Failed reporting state!: MessagingTimeout: Timed out waiting for a reply to message ID ec0806a7312a4e55b090500257cc998a
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 325, in _report_state
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     True)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/neutron/agent/rpc.py", line 97, in report_state
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     return method(context, 'report_state', **kwargs)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=self.retry)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     retry=retry)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     call_monitor_timeout, retry=retry)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 573, in _send
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     call_monitor_timeout)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in wait
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     message = self.waiters.get(msg_id, timeout=timeout)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File "/openstack/venvs/neutron-18.1.9/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in get
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent     'to message ID %s' % msg_id)
2019-11-01 17:29:23.971 6135 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID ec0806a7312a4e55b090500257cc998a
---------------------------

Please help us with the issue.

Thanks
Kumar

Revision history for this message
Ben Nemec (bnemec) wrote :

It's unlikely that this is a problem with oslo.messaging. Messaging timeouts usually happen because an RPC call failed on the other end or took a very long time to complete. They can also happen if RabbitMQ is overloaded, but if this is a new deployment that's unlikely.

I've added neutron to the bug as they're more likely to be able to tell you why this happened.
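If the timeouts turn out to be plain slowness rather than a dead consumer, one stopgap (not a fix for the root cause) is to raise the RPC reply timeout used by the agents. A minimal sketch, assuming the option sits in the [DEFAULT] section of neutron.conf on the agent host and that 120 seconds is only an illustrative value (the stock default is 60):

# neutron.conf on the agent host -- 120 s is only an example
[DEFAULT]
rpc_response_timeout = 120

Raising the timeout only papers over a slow neutron-server or an unhealthy RabbitMQ, so it's worth checking broker and server load first.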

Changed in oslo.messaging:
status: New → Incomplete
Revision history for this message
Bence Romsics (bence-romsics) wrote :

I suspect there are at least two errors happening around the same time in your system. The traceback is about the report_state RPC (which in this case is a call(), so we wait for the response) not getting a response before the timeout.

However the report_state message is not directly involved in booting a VM.

So I suggest multiple things to look at:

1) Is there another error message in your logs while booting the VM? I suspect there is. In these cases there is usually an error message and a traceback in the 'openstack server show' output as well. Can you copy-paste that here?

2) report_state calls usually fail to get a timely response when neutron-server is overloaded. Is this a fresh deployment? How many neutron agents do you have? What is the overall workload hitting neutron-server? Have you tried tuning your neutron-server worker settings, especially raising the value of rpc_state_report_workers in neutron.conf? A sketch of the relevant settings follows below.
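As a minimal sketch of point 2 (the numbers are only illustrative and should be sized to the controller's CPU count; defaults differ between releases), these options live in the [DEFAULT] section of neutron.conf on the neutron-server hosts:

[DEFAULT]
# processes serving the HTTP API
api_workers = 8
# processes serving agent<->server RPC traffic
rpc_workers = 4
# processes dedicated to agent state reports (report_state)
rpc_state_report_workers = 2

For point 1, the error summary that nova recorded for the failed boot is visible with 'openstack server show <server-id>'.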

Changed in neutron:
status: New → Incomplete
Revision history for this message
YG Kumar (ygk-kmr) wrote :

Hi,

I have rerun the playbooks but the issue is the same. When I create a VM, it eventually fails with "VIF creation failed".

------------
2019-11-19 08:50:09.830 12956 INFO nova.virt.libvirt.driver [-] [instance: de4b6e59-998b-465c-8e8c-e659619ca025] Instance destroyed successfully.
2019-11-19 08:50:09.833 12956 INFO nova.virt.libvirt.driver [req-068acb5f-d813-44d4-9b1f-14c20995b0a4 9ca30f42033f4e93b72f9be304f66726 e12b8e37797b4fbf8d0d6b28d4b61848 - default default] [instance: de4b6e59-998b-465c-8e8c-e659619ca025] Deletion of /var/lib/nova/instances/de4b6e59-998b-465c-8e8c-e659619ca025_del complete
2019-11-19 08:50:09.921 12956 INFO nova.compute.manager [req-068acb5f-d813-44d4-9b1f-14c20995b0a4 9ca30f42033f4e93b72f9be304f66726 e12b8e37797b4fbf8d0d6b28d4b61848 - default default] [instance: de4b6e59-998b-465c-8e8c-e659619ca025] Took 0.09 seconds to destroy the instance on the hypervisor.
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [req-068acb5f-d813-44d4-9b1f-14c20995b0a4 9ca30f42033f4e93b72f9be304f66726 e12b8e37797b4fbf8d0d6b28d4b61848 - default default] [instance: de4b6e59-998b-465c-8e8c-e659619ca025] Failed to allocate network(s): VirtualInterfaceCreateException: Virtual Interface creation failed
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] Traceback (most recent call last):
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] File "/openstack/venvs/nova-18.1.9/lib/python2.7/site-packages/nova/compute/manager.py", line 2133, in _build_and_run_instance
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] block_device_info=block_device_info)
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] File "/openstack/venvs/nova-18.1.9/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3133, in spawn
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] destroy_disks_on_failure=True)
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] File "/openstack/venvs/nova-18.1.9/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5693, in _create_domain_and_network
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] raise exception.VirtualInterfaceCreateException()
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025] VirtualInterfaceCreateException: Virtual Interface creation failed
2019-11-19 08:50:10.253 12956 ERROR nova.compute.manager [instance: de4b6e59-998b-465c-8e8c-e659619ca025]
2019-11-19 08:50:10.269 12956 ERROR nova.compute.manager [req-068acb5f-d813-44d4-9b1f-14c20995b0a4 9ca30f42033f4e93b72f9be304f66726 e12b8e37797b4fbf8d0d6b28d4b61848 - default default] [instance: de4b6e59-998b-465c-8e8c-e659619ca025] Build of instance de4b6e59-998b-465c-8e8c-e659619ca025 aborted: Failed to allocate the network(s), not rescheduling.: BuildAbortException: Build of instance de4b6...


Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

Do you have the Neutron logs? Without the neutron server and agent logs we can't debug what is happening there.
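As a rough pointer for collecting them (the unit names and paths below are assumptions and vary between deployments; on OpenStack-Ansible the services usually run inside LXC containers):

# on the neutron-server host/container
journalctl -u neutron-server --since "2019-11-19" > neutron-server.log
# on the compute node
journalctl -u neutron-openvswitch-agent --since "2019-11-19" > ovs-agent.log
# or, if the deployment logs to files instead of journald
ls /var/log/neutron/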

Thanks.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for oslo.messaging because there has been no activity for 60 days.]

Changed in oslo.messaging:
status: Incomplete → Expired