The maintenance thread fails when handling router ports

Bug #1746979 reported by Miguel Angel Ajo
Affects: networking-ovn
Status: Fix Released
Importance: Undecided
Assigned to: Miguel Angel Ajo
Milestone: (none listed)

Bug Description

When the maintenance thread needs to fix or create a port on the OVN side because of a discrepancy in the revisions, this error is seen in the neutron server log:

Feb 01 20:42:11.533666 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance [None req-7865ae16-582c-400b-b302-6aedbac761e0 None None] Failed to fix resource 3a4bfdf8-e93a-45be-be00-c4eaf21fd2c2 (type: router_ports): RouterNotFound: Router network:router_gateway could not be found
Feb 01 20:42:11.533923 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance Traceback (most recent call last):
Feb 01 20:42:11.534153 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/networking-ovn/networking_ovn/common/maintenance.py", line 233, in check_for_inconsistencies
Feb 01 20:42:11.534367 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance self._fix_create_update(row)
Feb 01 20:42:11.534570 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/networking-ovn/networking_ovn/common/maintenance.py", line 160, in _fix_create_update
Feb 01 20:42:11.534784 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance res_map['ovn_create'](n_obj)
Feb 01 20:42:11.535001 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/networking-ovn/networking_ovn/common/maintenance.py", line 261, in _create_lrouter_port
Feb 01 20:42:11.535203 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance {'port_id': port['id']})
Feb 01 20:42:11.535414 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/networking-ovn/networking_ovn/l3/l3_ovn.py", line 165, in add_router_interface
Feb 01 20:42:11.535626 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance context, router_id, interface_info)
Feb 01 20:42:11.535966 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 1892, in add_router_interface
Feb 01 20:42:11.536265 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance context, router_id, interface_info)
Feb 01 20:42:11.536567 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/api.py", line 161, in wrapped
Feb 01 20:42:11.536915 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance return method(*args, **kwargs)
Feb 01 20:42:11.537208 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/api.py", line 91, in wrapped
Feb 01 20:42:11.537533 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance setattr(e, '_RETRY_EXCEEDED', True)
Feb 01 20:42:11.537870 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Feb 01 20:42:11.538243 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance self.force_reraise()
Feb 01 20:42:11.538546 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Feb 01 20:42:11.538828 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance six.reraise(self.type_, self.value, self.tb)
Feb 01 20:42:11.539123 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/api.py", line 87, in wrapped
Feb 01 20:42:11.539422 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance return f(*args, **kwargs)
Feb 01 20:42:11.539716 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 147, in wrapper
Feb 01 20:42:11.539963 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance ectxt.value = e.inner_exc
Feb 01 20:42:11.540178 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Feb 01 20:42:11.540387 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance self.force_reraise()
Feb 01 20:42:11.540591 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Feb 01 20:42:11.542243 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance six.reraise(self.type_, self.value, self.tb)
Feb 01 20:42:11.542512 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 135, in wrapper
Feb 01 20:42:11.542770 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance return f(*args, **kwargs)
Feb 01 20:42:11.542986 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/api.py", line 126, in wrapped
Feb 01 20:42:11.543194 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance LOG.debug("Retry wrapper got retriable exception: %s", e)
Feb 01 20:42:11.543400 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Feb 01 20:42:11.543636 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance self.force_reraise()
Feb 01 20:42:11.543854 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Feb 01 20:42:11.544052 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance six.reraise(self.type_, self.value, self.tb)
Feb 01 20:42:11.544252 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/api.py", line 122, in wrapped
Feb 01 20:42:11.544469 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance return f(*dup_args, **dup_kwargs)
Feb 01 20:42:11.544669 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 835, in add_router_interface
Feb 01 20:42:11.544887 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance router = self._get_router(context, router_id)
Feb 01 20:42:11.545119 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 187, in _get_router
Feb 01 20:42:11.545359 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance raise l3_exc.RouterNotFound(router_id=router_id)
Feb 01 20:42:11.545562 ubuntu-xenial-rax-ord-0002331188 neutron-server[25811]: ERROR networking_ovn.common.maintenance RouterNotFound: Router network:router_gateway could not be found

As a result, the fix fails.

Changed in networking-ovn:
assignee: nobody → Miguel Angel Ajo (mangelajo)
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/540391

Changed in networking-ovn:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/540391
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=251aabd857c5afa736c9cf9f5427bcc05f857e81
Submitter: Zuul
Branch: master

commit 251aabd857c5afa736c9cf9f5427bcc05f857e81
Author: Miguel Angel Ajo <email address hidden>
Date: Fri Feb 2 15:01:50 2018 +0100

    Fix error in router port maintenance code

    When a port needs to be created or updated on the OVN side by the
    maintenance task, we need to recover the router ID from the
    port, which is in the device_id field.
    At some point during development this got changed to
    device_owner, which is a logical name referring to the type
    of owner (compute, router, dhcp, etc.).

    Also, when getting the lrouter port from the NBDB we were
    using a UUID, while we needed to convert the UUID to the lrp port
    name.

    Those mistakes left the maintenance code incapable of
    recovering router ports that were detected as outdated or
    not created on the OVN side.

    Change-Id: Iea8060b11cc4076e6efc300dd9079ddafcc3fb5e
    Closes-Bug: 1746979
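
The two mistakes described in this commit message can be illustrated with a minimal sketch. The helper names, the "lrp-" prefix, and the router UUID below are assumptions made for illustration and are not taken from the networking-ovn source; only the port ID and the device_owner value come from the log above.

```python
# Illustrative sketch only; helper names are hypothetical, not the networking-ovn API.

LRP_PREFIX = "lrp-"  # assumed naming convention for logical router ports


def lrouter_port_name(port_id):
    """Build the OVN logical router port name from a Neutron port UUID."""
    return LRP_PREFIX + port_id


def router_id_from_port(port):
    """Return the owning router's ID, which is stored in device_id."""
    return port["device_id"]


# Example Neutron port dict; the router UUID is made up for illustration.
port = {
    "id": "3a4bfdf8-e93a-45be-be00-c4eaf21fd2c2",
    "device_id": "11111111-2222-3333-4444-555555555555",
    "device_owner": "network:router_gateway",
}

# Using device_owner yields the owner type, which matches the
# "Router network:router_gateway could not be found" error in the log.
wrong_router_id = port["device_owner"]

# Using device_id yields the router's UUID.
right_router_id = router_id_from_port(port)

# The NBDB lookup needs the lrp name rather than the bare UUID.
lrp_name = lrouter_port_name(port["id"])
```

With the device_owner value, the router lookup receives a type string instead of a UUID, which would explain the RouterNotFound error reported above.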

Changed in networking-ovn:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.0.0.0b1

This issue was fixed in the openstack/networking-ovn 5.0.0.0b1 development milestone.

tags: added: networking-ovn-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/573153

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/queens)

Reviewed: https://review.openstack.org/573153
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=fc458efc580f384122ac872c2bbcddf4ae061bd0
Submitter: Zuul
Branch: stable/queens

commit fc458efc580f384122ac872c2bbcddf4ae061bd0
Author: Miguel Angel Ajo <email address hidden>
Date: Fri Feb 2 15:01:50 2018 +0100

    Fix error in router port maintenance code

    When a port needs to be created or updated on the OVN side by the
    maintenance task, we need to recover the router ID from the
    port, which is in the device_id field.
    At some point during development this got changed to
    device_owner, which is a logical name referring to the type
    of owner (compute, router, dhcp, etc.).

    Also, when getting the lrouter port from the NBDB we were
    using a UUID, while we needed to convert the UUID to the lrp port
    name.

    Those mistakes left the maintenance code incapable of
    recovering router ports that were detected as outdated or
    not created on the OVN side.

    Change-Id: Iea8060b11cc4076e6efc300dd9079ddafcc3fb5e
    Closes-Bug: 1746979
    (cherry picked from commit 251aabd857c5afa736c9cf9f5427bcc05f857e81)

tags: added: in-stable-queens
Lucas Alvares Gomes (lucasagomes) wrote :

The last fix didn't fully resolve the problem, so I'm re-opening the bug.

Changed in networking-ovn:
status: Fix Released → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/580592
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=c84e1a93da9f8184ab11f856463908ee1139c268
Submitter: Zuul
Branch: master

commit c84e1a93da9f8184ab11f856463908ee1139c268
Author: Lucas Alvares Gomes <email address hidden>
Date: Fri Jul 6 09:50:24 2018 +0100

    Maintenance: Fix router ports

    Prior to this patch the maintenance thread would fail to fix a router
    port resource upon a failure in OVN to create it.

    The maintenance thread was calling the add_router_interface() method from
    the l3_ovn.py module but, if the router port already existed in Neutron
    (but didn't exist in OVN), that method would fail with a PortInUse
    exception.

    This patch accounts for that problem: if the router port already
    exists in Neutron, it is only fetched from the database and the
    work continues until it's created in the OVN database as well.

    A new functional test has been added to test this scenario.

    Closes-Bug: #1746979
    Change-Id: Iae500d17d0efe17f3460dc2f09356675d406abed
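
The behavior described in this commit message can be sketched roughly as follows. This is not the actual patch: the classes are in-memory stand-ins, and every name here (FakeNeutron, FakeOvnNB, fix_router_port, the "lrp-" prefix) is hypothetical. It only shows the idea of reusing a router port that already exists in Neutron instead of calling add_router_interface() again.

```python
# Illustrative sketch only: in-memory stand-ins, not the real Neutron or OVN APIs.


class PortInUse(Exception):
    """Stand-in for the error raised when the port is already attached."""


class FakeNeutron:
    def __init__(self):
        self.router_ports = {}  # port_id -> port dict

    def add_router_interface(self, router_id, port_id):
        # Mirrors the failure mode from the commit message: attaching a
        # port that is already a router port raises PortInUse.
        if port_id in self.router_ports:
            raise PortInUse(port_id)
        port = {"id": port_id, "device_id": router_id}
        self.router_ports[port_id] = port
        return port

    def get_port(self, port_id):
        return self.router_ports[port_id]


class FakeOvnNB:
    def __init__(self):
        self.lrouter_ports = {}  # lrp name -> router_id

    def add_lrouter_port(self, router_id, port):
        self.lrouter_ports["lrp-" + port["id"]] = router_id


def fix_router_port(neutron, ovn, router_id, port_id):
    """Make sure the router port ends up in OVN, reusing the Neutron port if present."""
    if port_id in neutron.router_ports:
        # Already in Neutron: only fetch it from the database.
        port = neutron.get_port(port_id)
    else:
        port = neutron.add_router_interface(router_id, port_id)
    ovn.add_lrouter_port(router_id, port)
    return port


neutron, ovn = FakeNeutron(), FakeOvnNB()
neutron.add_router_interface("router-1", "port-1")  # exists in Neutron only
fix_router_port(neutron, ovn, "router-1", "port-1")
```

Checking for the port first avoids the PortInUse path entirely; catching the exception and then fetching the port would be an alternative with the same end result in this sketch.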

Changed in networking-ovn:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/584531

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.0.0.0b3

This issue was fixed in the openstack/networking-ovn 5.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/queens)

Reviewed: https://review.openstack.org/584531
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=ac81a8de11426e8b6908316012e48b2fbdda344f
Submitter: Zuul
Branch: stable/queens

commit ac81a8de11426e8b6908316012e48b2fbdda344f
Author: Lucas Alvares Gomes <email address hidden>
Date: Fri Feb 9 13:56:08 2018 +0000

    Maintenance: Fix router ports

    Prior to this patch the maintenance thread would fail to fix a router
    port resource upon a failure in OVN to create it.

    The maintenance thread was calling the add_router_interface() method from
    the l3_ovn.py module but, if the router port already existed in Neutron
    (but didn't exist in OVN), that method would fail with a PortInUse
    exception.

    This patch accounts for that problem: if the router port already
    exists in Neutron, it is only fetched from the database and the
    work continues until it's created in the OVN database as well.

    A new functional test has been added to test this scenario.

    Backport note: This also squashes f69b68d7d36c79bba211648acd5939ee3e52cd7a,
    which adds the functional tests to the maintenance feature.

    Closes-Bug: #1746979
    Change-Id: Iae500d17d0efe17f3460dc2f09356675d406abed
    (cherry-picked from commit c84e1a93da9f8184ab11f856463908ee1139c268)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.3

This issue was fixed in the openstack/networking-ovn 4.0.3 release.
