ofctl timeouts lead to dvr-ha-multinode-full failures

Bug #1938685 reported by Oleg Bondarev
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Oleg Bondarev

Bug Description

Recently neutron-ovs-tempest-dvr-ha-multinode-full (non-voting) job start failing often. Usual test fail is:

"Details: (ServersTestJSON:setUpClass) Server 74743462-a419-4f89-a92c-0e99bc185581 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: BUILD. Current task state: spawning."

Looking at logs I see that the reason is ofctl timeout (300 sec) that causes OVS agent to not process new port(s) in time:

Jul 30 17:33:42.946480 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-7ef93d36-7664-4072-b3d1-677a772a0fc1 None None] fdb_add received {{(pid=82746) fdb_add /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:841}}
Jul 30 17:37:46.516378 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out: eventlet.timeout.Timeout: 300 seconds
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] Error while processing VIF ports: RuntimeError: ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 91, in _send_msg
...

Not sure why ofctl timeouts, but anyway default 300 seems too much

description: updated
Revision history for this message
Oleg Bondarev (obondarev) wrote :
Download full text (12.7 KiB)

Example job: https://zuul.opendev.org/t/openstack/build/fe101f625f4c45d6bf7e2b0aa9b71732/logs

Full Traceback:

Jul 30 17:33:42.946480 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-7ef93d36-7664-4072-b3d1-677a772a0fc1 None None] fdb_add received {{(pid=82746) fdb_add /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:841}}
Jul 30 17:37:46.516378 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out: eventlet.timeout.Timeout: 300 seconds
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-9d8a3325-2d80-41a4-9f3d-184b365b7dfc None None] Error while processing VIF ports: RuntimeError: ofctl request version=0x4,msg_type=0xe,msg_len=None,xid=0xdfcb3e13,OFPFlowMod(buffer_id=4294967295,command=0,cookie=7439791576028281136,cookie_mask=0,flags=0,hard_timeout=0,idle_timeout=0,instructions=[OFPInstructionActions(actions=[OFPActionPopVlan(len=8,type=18), OFPActionOutput(len=16,max_len=0,port=-1,type=0)],type=4)],match=OFPMatch(oxm_fields={'eth_dst': 'fa:16:3e:0f:58:bc', 'vlan_vid': 4113}),out_group=0,out_port=0,priority=20,table_id=60) timed out
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 91, in _send_msg
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = ofctl_api.send_msg(self._app, msg, reply_cls, reply_multi)
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/local/lib/python3.8/dist-packages/os_ken/app/ofctl/api.py", line 87, in send_msg
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[82746]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return app.send_request(event.SendMsgRequest(msg=msg,
Jul 30 17:37:46.530852 ubuntu-focal-inap-mtl01-0025709340 neutron-openvswitch-agent[827...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/803211

Revision history for this message
Oleg Bondarev (obondarev) wrote :

Looks like the reason of error is invalid ofport:

OFPActionOutput(len=16,max_len=0,port=-1,type=0)]

Possible culprit https://review.opendev.org/c/openstack/neutron/+/640258
Need to avoid ofctl requests for invalid ofport

Changed in neutron:
assignee: nobody → Oleg Bondarev (obondarev)
Changed in neutron:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/803211
Committed: https://opendev.org/openstack/neutron/commit/63a2a65bbe9ab2ec054f47f581b505dfeb441307
Submitter: "Zuul (22348)"
Branch: master

commit 63a2a65bbe9ab2ec054f47f581b505dfeb441307
Author: Oleg Bondarev <email address hidden>
Date: Mon Aug 2 16:49:51 2021 +0300

    Skip DVR binding for ports with invalid OFPORT

    Invalid OFPORT (-1) causes ofctl errors and timeouts if set
    it as output in a flow rule.

    Closes-Bug: #1938685
    Change-Id: Ib8be66c3068309832e08066af4e0b75c30e8163a

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/807254

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/807255

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/807254
Committed: https://opendev.org/openstack/neutron/commit/3e5f1be057fc75c2ae34a7f015e17632d8ae6f20
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 3e5f1be057fc75c2ae34a7f015e17632d8ae6f20
Author: Oleg Bondarev <email address hidden>
Date: Mon Aug 2 16:49:51 2021 +0300

    Skip DVR binding for ports with invalid OFPORT

    Invalid OFPORT (-1) causes ofctl errors and timeouts if set
    it as output in a flow rule.

    Closes-Bug: #1938685
    Change-Id: Ib8be66c3068309832e08066af4e0b75c30e8163a
    (cherry picked from commit 63a2a65bbe9ab2ec054f47f581b505dfeb441307)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.1.1

This issue was fixed in the openstack/neutron 18.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.0.0.0rc1

This issue was fixed in the openstack/neutron 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/807255
Committed: https://opendev.org/openstack/neutron/commit/79529fde18ec95615d124591f18897b43aa1c758
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 79529fde18ec95615d124591f18897b43aa1c758
Author: Oleg Bondarev <email address hidden>
Date: Mon Aug 2 16:49:51 2021 +0300

    Skip DVR binding for ports with invalid OFPORT

    Invalid OFPORT (-1) causes ofctl errors and timeouts if set
    it as output in a flow rule.

    Closes-Bug: #1938685
    Change-Id: Ib8be66c3068309832e08066af4e0b75c30e8163a
    (cherry picked from commit 63a2a65bbe9ab2ec054f47f581b505dfeb441307)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.3.0

This issue was fixed in the openstack/neutron 17.3.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.