Rebuild server - notification about port UP isn't sent to nova properly

Bug #1963899 reported by Slawek Kaplonski
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

It happened at least in the gate: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_275/819147/3/check/neutron-ovs-tempest-dvr-ha-multinode-full/275f51f/testr_results.html - test failed due to instance rebuild timeout.

After some investigation, it seems there was a race condition: the port belonging to the VM wasn't set to DOWN and then back to UP during the rebuild. Because it stayed UP the whole time, neutron didn't send a notification to nova-compute, and nova timed out waiting for the notification from neutron. Details:

Rebuild started around 20:11:04

Feb 25 20:11:04.155853 ubuntu-focal-rax-ord-0028617419 nova-compute[45640]: INFO nova.virt.libvirt.driver [-] [instance: f11fa638-af9d-44a4-88d5-a5a9211a67ce] Instance destroyed successfully.
Feb 25 20:11:04.157143 ubuntu-focal-rax-ord-0028617419 nova-compute[45640]: DEBUG nova.virt.libvirt.vif [None req-39913996-9bcb-40fe-96c4-46805d82a11c tempest-ServerDiskConfigTestJSON-565705927 tempest-ServerDiskConfigTestJSON-565705927-project] vif_type=ovs instance=Instance(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=True,availability_zone='nova',cell_name=None,cleaned=False,config_drive='',created_at=2022-02-25T20:10:47Z,default_ephemeral_device=None,default_swap_device=None,deleted=False,deleted_at=None,device_metadata=<?>,disable_terminate=False,display_description='tempest-ServerDiskConfigTestJSON-s
erver-1146230707',display_name='tempest-ServerDiskConfigTestJSON-server-1146230707',ec2_ids=<?>,ephemeral_gb=0,ephemeral_key_uuid=None,fault=<?>,flavor=Flavor(11),hidden=False,host='ubuntu-focal-rax-ord-0028617419',hostname='tempest-serverdiskconfigtestjson-server-1146230707',id=87,image_ref='ec702e94-7def-4279-8cf5-1b874dc76a6d',info_cache=InstanceInfoCache,instance_type_id=11,kernel_id='',key_data=None,key_name=None,keypairs=<?>,launch_index=0,launched_at=2022-02-25T20:11:01Z,launched_on='ubuntu-focal-rax-ord-0028617419',locked=False,locked_by=None,memory_mb=128,metadata={},migration_context=None,new_flavor=None,node='ubuntu-focal-rax-ord-0028617419',numa_topology=None,old_flavor=None,os_type=None,pci_devices=PciDeviceList,pci_requests=InstancePCIRequests,power_state=1,progress=0,project_id='a75e3dd9c1444a11bfa3db1be433820c',ramdisk_id='',reservation_id='r-ak002cpg',resources=None,root_device_name='/dev/vda',root_gb=1,security_groups=SecurityGroupList,services=<?>,shutdown_terminate=False,system_metadata={boot_roles='member,reader',image_base_image_ref='ec702e94-7def-4279-8cf5-1b874dc76a6d',image_container_format='bare',image_disk_format='qcow2',image_hw_rng_model='virtio',image_min_disk='1',image_min_ram='0',image_owner_specified.openstack.md5='',image_owner_specified.openstack.object='images/cirros-0.5.2-x86_64-disk',image_owner_specified.openstack.sha256='',owner_project_name='tempest-ServerDiskConfigTestJSON-565705927',owner_user_name='tempest-ServerDiskConfigTestJSON-565705927-project'},tags=<?>,task_state='rebuilding',terminated_at=None,trusted_certs=None,updated_at=2022-02-25T20:11:03Z,user_data=None,user_id='a99989e1c306447a80393de985672536',uuid=f11fa638-af9d-44a4-88d5-a5a9211a67ce,vcpu_model=<?>,vcpus=1,vm_mode=None,vm_state='active') vif={"id": "7017c67f-ba63-4787-80d5-18db52567ef2", "address": "fa:16:3e:66:b3:80", "network": {"id": "1eff860a-a017-4839-a4d2-8891b4327cde", "bridge": "br-int", "label": 
"tempest-ServerDiskConfigTestJSON-1413670902-network", "subnets": [{"cidr": "10.1.0.0/28", "dns": [], "gateway": {"address": "10.1.0.1", "type": "gateway", "version": 4, "meta": {}}, "ips": [{"address": "10.1.0.11", "type": "fixed", "version": 4, "meta": {}, "floating_ips": []}], "routes": [], "version": 4, "meta": {}}], "meta": {"injected": false, "tenant_id": "a75e3dd9c1444a11bfa3db1be433820c", "mtu": 1380, "physical_network": null, "tunneled": true}}, "type": "ovs", "details": {"connectivity": "l2", "port_filter": true, "ovs_hybrid_plug": false, "datapath_type": "system", "bridge_name": "br-int", "bound_drivers": {"0": "openvswitch"}}, "devname": "tap7017c67f-ba", "ovs_interfaceid": "7017c67f-ba63-4787-80d5-18db52567ef2", "qbh_params": null, "qbg_params": null, "active": false, "vnic_type": "normal", "profile": {}, "preserve_on_delete": false, "delegate_create": true, "meta": {}} {{(pid=45640) unplug /opt/stack/nova/nova/virt/libvirt/vif.py:828}}

On the OVS agent's side:

Feb 25 20:11:04.184822 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["11f71fd0-900a-4350-b2b6-720d41775059","delete","tap7017c67f-ba",-1,["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/neutron/agent/common/async_pro
cess.py:264}}
Feb 25 20:11:04.756254 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["07c1170a-686e-4e30-9b9e-60fd64c21b2c","insert","tap7017c67f-ba",["set",[]],["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/neutron/agent/common/a
sync_process.py:264}}
Feb 25 20:11:04.761952 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["07c1170a-686e-4e30-9b9e-60fd64c21b2c","old",null,["set",[]],null],["","new","tap7017c67f-ba",-1,["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/n
eutron/agent/common/async_process.py:264}}
Feb 25 20:11:04.829126 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["07c1170a-686e-4e30-9b9e-60fd64c21b2c","old",null,-1,null],["","new","tap7017c67f-ba",169,["map",[["attached-mac","fa:16:3e:66:b3:80"],["iface-id","7017c67f-ba63-4787-80d5-18db52567ef2"],["iface-status","active"],["vm-uuid","f11fa638-af9d-44a4-88d5-a5a9211a67ce"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=43371) _read_stdout /opt/stack/neutron/neutron/
agent/common/async_process.py:264}}

And the only report was that the device is UP:

Feb 25 20:11:05.796177 ubuntu-focal-rax-ord-0028617419 neutron-openvswitch-agent[43371]: DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-7c5a8fe9-a203-4f1d-a655-18381910397e None None] Setting status for 7017c67f-ba63-4787-80d5-18db52567ef2 to UP {{(pid=43371) _bind_devices /opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1251}}
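
The failure mode above can be sketched with a small, hypothetical model (not actual Neutron code; names like update_port_status are made up for illustration): the server only notifies nova-compute when a port's status actually changes in the DB, so if the DOWN transition is lost during the rebuild, the later "set to UP" is a UP -> UP no-op and nova never hears anything.

```python
# Hypothetical sketch of the notification logic: a nova notification is
# emitted only on a real status transition in the (simulated) Neutron DB.

def update_port_status(db, port_id, new_status, notifications):
    old_status = db.get(port_id)
    db[port_id] = new_status
    # Only a real transition produces a notification; UP -> UP is silent.
    if old_status != new_status:
        notifications.append((port_id, new_status))

db = {"7017c67f": "UP"}
events = []

# Racy rebuild: the DOWN transition was lost, the agent only reports UP again.
update_port_status(db, "7017c67f", "UP", events)
assert events == []  # nova-compute never gets the event it is waiting for

# The expected sequence, DOWN then UP, does produce the notification.
update_port_status(db, "7017c67f", "DOWN", events)
update_port_status(db, "7017c67f", "UP", events)
assert ("7017c67f", "UP") in events
```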

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/834852

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/834852
Committed: https://opendev.org/openstack/neutron/commit/e7edcec2608fe1cb30d7458a882eb81e058eb76e
Submitter: "Zuul (22348)"
Branch: master

commit e7edcec2608fe1cb30d7458a882eb81e058eb76e
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before set back to UP

    During e.g. a rebuild of a server by Nova, ports plugged into such a
    server are quickly removed and added back into br-int. In that case,
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    But it seems that in some cases such a port isn't switched to DOWN
    first, so when the neutron-ovs-agent later treats the port as
    added/updated and reports to the server that the port is UP, no
    notification is sent to nova-compute (because the port's status was
    UP and the new status is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server could end up in the ERROR state.

    To avoid this issue, all ports treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.

    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
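
The fix described in the commit message can be sketched as follows. This is a simplified, hypothetical model (FakeServer, rpc_loop_iteration, and set_port_status are invented names, not the real agent or server code): re_added ports are explicitly cycled to DOWN before the normal added/updated processing reports them UP, so the UP report is a real DOWN -> UP transition.

```python
# Hypothetical sketch of the fix: force re_added ports DOWN first, so the
# subsequent UP report is a genuine transition that notifies nova-compute.

class FakeServer:
    """Stand-in for the Neutron server: tracks port status and emits a
    nova notification only when a port's status actually changes."""
    def __init__(self):
        self.status = {}
        self.nova_notifications = []

    def set_port_status(self, port_id, new_status):
        if self.status.get(port_id) != new_status:
            self.nova_notifications.append((port_id, new_status))
        self.status[port_id] = new_status


def rpc_loop_iteration(server, re_added_ports, added_or_updated_ports):
    # Step 1 (the fix): switch every re_added port to DOWN on the server side.
    for port_id in re_added_ports:
        server.set_port_status(port_id, "DOWN")
    # Step 2: normal processing reports the ports UP; thanks to step 1 this
    # is now a DOWN -> UP transition, which triggers the nova notification.
    for port_id in added_or_updated_ports:
        server.set_port_status(port_id, "UP")


server = FakeServer()
server.status["7017c67f"] = "UP"  # the port was UP before the rebuild

rpc_loop_iteration(server,
                   re_added_ports={"7017c67f"},
                   added_or_updated_ports={"7017c67f"})
assert ("7017c67f", "UP") in server.nova_notifications
```

Without step 1, the single UP report would match the status already in the DB and nothing would be sent, which is exactly the race hit in the gate job above.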

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/836177

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/836178

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/836179

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/836180

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/836321

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/neutron/+/836332

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/836177
Committed: https://opendev.org/openstack/neutron/commit/14559a3096c81122d26079c565a193704b870a42
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 14559a3096c81122d26079c565a193704b870a42
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before set back to UP

    During e.g. a rebuild of a server by Nova, ports plugged into such a
    server are quickly removed and added back into br-int. In that case,
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    But it seems that in some cases such a port isn't switched to DOWN
    first, so when the neutron-ovs-agent later treats the port as
    added/updated and reports to the server that the port is UP, no
    notification is sent to nova-compute (because the port's status was
    UP and the new status is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server could end up in the ERROR state.

    To avoid this issue, all ports treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.

    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
    (cherry picked from commit e7edcec2608fe1cb30d7458a882eb81e058eb76e)

tags: added: in-stable-yoga
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/836321
Committed: https://opendev.org/openstack/neutron/commit/04453cd161b172f1cdbacd0a034f5e6d21fd6e34
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 04453cd161b172f1cdbacd0a034f5e6d21fd6e34
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before set back to UP

    During e.g. a rebuild of a server by Nova, ports plugged into such a
    server are quickly removed and added back into br-int. In that case,
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    But it seems that in some cases such a port isn't switched to DOWN
    first, so when the neutron-ovs-agent later treats the port as
    added/updated and reports to the server that the port is UP, no
    notification is sent to nova-compute (because the port's status was
    UP and the new status is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server could end up in the ERROR state.

    To avoid this issue, all ports treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.

    Conflicts:
        neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py

    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
    (cherry picked from commit e7edcec2608fe1cb30d7458a882eb81e058eb76e)
    (cherry picked from commit 0842d28232949acd456aefbcd5ca714638103a50)

tags: added: in-stable-ussuri
tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/836178
Committed: https://opendev.org/openstack/neutron/commit/7342d1142728ffce2aa418b8ee28677335b16d57
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 7342d1142728ffce2aa418b8ee28677335b16d57
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before set back to UP

    During e.g. a rebuild of a server by Nova, ports plugged into such a
    server are quickly removed and added back into br-int. In that case,
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    But it seems that in some cases such a port isn't switched to DOWN
    first, so when the neutron-ovs-agent later treats the port as
    added/updated and reports to the server that the port is UP, no
    notification is sent to nova-compute (because the port's status was
    UP and the new status is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server could end up in the ERROR state.

    To avoid this issue, all ports treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.

    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
    (cherry picked from commit e7edcec2608fe1cb30d7458a882eb81e058eb76e)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/836179
Committed: https://opendev.org/openstack/neutron/commit/8daf9b38d302cf540e9cf73870804f46347f74b5
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 8daf9b38d302cf540e9cf73870804f46347f74b5
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before set back to UP

    During e.g. a rebuild of a server by Nova, ports plugged into such a
    server are quickly removed and added back into br-int. In that case,
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    But it seems that in some cases such a port isn't switched to DOWN
    first, so when the neutron-ovs-agent later treats the port as
    added/updated and reports to the server that the port is UP, no
    notification is sent to nova-compute (because the port's status was
    UP and the new status is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server could end up in the ERROR state.

    To avoid this issue, all ports treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.

    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
    (cherry picked from commit e7edcec2608fe1cb30d7458a882eb81e058eb76e)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/836180
Committed: https://opendev.org/openstack/neutron/commit/0842d28232949acd456aefbcd5ca714638103a50
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 0842d28232949acd456aefbcd5ca714638103a50
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before set back to UP

    During e.g. a rebuild of a server by Nova, ports plugged into such a
    server are quickly removed and added back into br-int. In that case,
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    But it seems that in some cases such a port isn't switched to DOWN
    first, so when the neutron-ovs-agent later treats the port as
    added/updated and reports to the server that the port is UP, no
    notification is sent to nova-compute (because the port's status was
    UP and the new status is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server could end up in the ERROR state.

    To avoid this issue, all ports treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.

    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
    (cherry picked from commit e7edcec2608fe1cb30d7458a882eb81e058eb76e)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/836332
Committed: https://opendev.org/openstack/neutron/commit/58e663f27d84b187b6cb13289cd1e8fb36f5c770
Submitter: "Zuul (22348)"
Branch: stable/train

commit 58e663f27d84b187b6cb13289cd1e8fb36f5c770
Author: Slawek Kaplonski <email address hidden>
Date: Wed Mar 23 12:25:26 2022 +0100

    Ensure that re_added ports are DOWN before set back to UP

    During e.g. a rebuild of a server by Nova, ports plugged into such a
    server are quickly removed and added back into br-int. In that case,
    the ports end up in the "re_added" ports set in the neutron-ovs-agent.
    But it seems that in some cases such a port isn't switched to DOWN
    first, so when the neutron-ovs-agent later treats the port as
    added/updated and reports to the server that the port is UP, no
    notification is sent to nova-compute (because the port's status was
    UP and the new status is still UP in the Neutron DB).
    As Nova waits for that notification from Neutron, in such a case the
    server could end up in the ERROR state.

    To avoid this issue, all ports treated as "re_added" by the
    neutron-ovs-agent are now first switched to DOWN on the server side.
    That way, when those ports are treated as added/updated in the same
    rpc_loop iteration, switching their status to UP is guaranteed to
    trigger a notification to nova.

    Conflicts:
        neutron/tests/unit/plugins/ml2/drivers/openvswitch/agent/test_ovs_neutron_agent.py

    Closes-Bug: #1963899
    Change-Id: I0df376a80140ead7ff1fbf7f5ffef08a999dbe0b
    (cherry picked from commit e7edcec2608fe1cb30d7458a882eb81e058eb76e)
    (cherry picked from commit 0842d28232949acd456aefbcd5ca714638103a50)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.4.0

This issue was fixed in the openstack/neutron 17.4.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.3.0

This issue was fixed in the openstack/neutron 18.3.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.2.0

This issue was fixed in the openstack/neutron 19.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.1.0

This issue was fixed in the openstack/neutron 20.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.0.0.0rc1

This issue was fixed in the openstack/neutron 21.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron train-eol

This issue was fixed in the openstack/neutron train-eol release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ussuri-eol

This issue was fixed in the openstack/neutron ussuri-eol release.
