Networking-odl trying to delete a router which was never pushed to ODL

Bug #1738246 reported by Sridhar Gaddam
Affects: networking-odl
Status: Fix Released
Importance: High
Assigned to: Manjeet Singh Bhatia

Bug Description

When running a Browbeat+Rally netcreate-boot-ping scenario (where we spawn a VM, associate a FIP, and then try to ping the FIP, with concurrency set to 5 and times set to 500), we observed the following in the neutron server logs.

2017-12-01 12:05:16.021 948413 ERROR networking_odl.common.client Traceback (most recent call last):
2017-12-01 12:05:16.021 948413 ERROR networking_odl.common.client File "/usr/lib/python2.7/site-packages/networking_odl/common/client.py", line 136, in _check_response
2017-12-01 12:05:16.021 948413 ERROR networking_odl.common.client response.raise_for_status()
2017-12-01 12:05:16.021 948413 ERROR networking_odl.common.client File "/usr/lib/python2.7/site-packages/requests/models.py", line 862, in raise_for_status
2017-12-01 12:05:16.021 948413 ERROR networking_odl.common.client raise HTTPError(http_error_msg, response=self)
2017-12-01 12:05:16.021 948413 ERROR networking_odl.common.client HTTPError: 404 Client Error: Not Found for url: http://172.16.0.13:8081/controller/nb/v2/neutron/routers/659a6590-1523-49c7-a845-b5b195e3505c
2017-12-01 12:05:16.021 948413 ERROR networking_odl.common.client
2017-12-01 12:05:16.022 948413 ERROR networking_odl.common.client [req-ec69f007-0638-46bd-b884-8aa140469c4d - - - - -] REST request ( delete ) to url ( routers/659a6590-1523-49c7-a845-b5b195e3505c ) is failed. Request body : [None] service: HTTPError: 404 Client Error: Not Found for url: http://172.16.0.13:8081/controller/nb/v2/neutron/routers/659a6590-1523-49c7-a845-b5b195e3505c
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal [req-ec69f007-0638-46bd-b884-8aa140469c4d - - - - -] Error while processing (Entry ID: 3013) - delete router 659a6590-1523-49c7-a845-b5b195e3505c: HTTPError: 404 Client Error: Not Found for url: http://172.16.0.13:8081/controller/nb/v2/neutron/routers/659a6590-1523-49c7-a845-b5b195e3505c
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal Traceback (most recent call last):
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/networking_odl/journal/journal.py", line 298, in _sync_entry
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal self.client.sendjson(method, urlpath, to_send)
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/networking_odl/common/client.py", line 106, in sendjson
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal 'body': obj})
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal self.force_reraise()
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal six.reraise(self.type_, self.value, self.tb)
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/networking_odl/common/client.py", line 98, in sendjson
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal self.request(method, urlpath, data))
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/networking_odl/common/client.py", line 140, in _check_response
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal {'e': error, 'text': response.text}, exc_info=1)
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal self.force_reraise()
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal six.reraise(self.type_, self.value, self.tb)
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/networking_odl/common/client.py", line 136, in _check_response
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal response.raise_for_status()
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal File "/usr/lib/python2.7/site-packages/requests/models.py", line 862, in raise_for_status
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal raise HTTPError(http_error_msg, response=self)
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal HTTPError: 404 Client Error: Not Found for url: http://172.16.0.13:8081/controller/nb/v2/neutron/routers/659a6590-1523-49c7-a845-b5b195e3505c
2017-12-01 12:05:16.023 948413 ERROR networking_odl.journal.journal

I picked a random neutron router UUID (21e9ccb0-cb43-40d6-811c-f63fcbce8607) for which we were seeing the HTTPError 404.
In the logs, there is no "create router *" entry for this UUID. The first reference to this UUID is in the router_gateway port creation.
Then we see the following log:

2017-12-01 12:07:08.088 948417 DEBUG neutron.api.rpc.agentnotifiers.l3_rpc_agent_api [req-b5532c6b-c2c0-46b5-a461-1ad7a04190a8 4f3065ec9a8b40ab8291b285cdd33fdd f505a5e332c5463c92ac74c48badc286 - default default] Fanout notify agent at l3_agent the message router_deleted on router 21e9ccb0-cb43-40d6-811c-f63fcbce8607 _notification_fanout /usr/lib/python2.7/site-packages/neutron/api/rpc/agentnotifiers/l3_rpc_agent_api.py:118

So something seems to be going wrong on the networking-odl side, as it's trying to delete a resource (i.e., a router) which was never pushed to ODL.
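The log check described above can be sketched as a small script. This is only a minimal sketch, assuming journal operations show up in the neutron server log as "<operation> router <UUID>" (as in the "delete router ..." line of the traceback above); the function name and return shape are hypothetical and not part of networking-odl.

```python
import re


def summarize_router_events(log_lines, router_id):
    """Scan log lines for one router UUID.

    Returns a tuple (first_line, saw_create, saw_delete), where first_line is
    the first log line mentioning the UUID (or None if it never appears).
    The "create router"/"delete router" wording is an assumption based on the
    journal error message quoted in this report.
    """
    create_pat = re.compile(r"\bcreate router " + re.escape(router_id))
    delete_pat = re.compile(r"\bdelete router " + re.escape(router_id))

    first_line = None
    saw_create = False
    saw_delete = False
    for line in log_lines:
        if router_id not in line:
            continue
        if first_line is None:
            first_line = line
        if create_pat.search(line):
            saw_create = True
        if delete_pat.search(line):
            saw_delete = True
    return first_line, saw_create, saw_delete
```

Called with the lines of the neutron server log and a router UUID, a result where saw_create is False and saw_delete is True matches the situation reported here: a delete was journaled for a router whose create was never recorded.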

Sridhar Gaddam (sridhargaddam) wrote :

The issue is seen with Pike codebase.

Sridhar Gaddam (sridhargaddam) wrote :

Neutron Server Logs

Isaku Yamahata (yamahata) wrote :

There are at least two bugs that have been known for a long time:
the odl l3 v2 plugin is broken, and there is a potential issue in ODL Neutron northbound.
Let's fix them and then check whether this issue still remains.

Changed in networking-odl:
importance: Undecided → High
status: New → Triaged
Changed in networking-odl:
assignee: nobody → Manjeet Singh Bhatia (manjeet-s-bhatia)
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-odl (master)

Reviewed: https://review.openstack.org/570789
Committed: https://git.openstack.org/cgit/openstack/networking-odl/commit/?id=8c5551f2ba57a035090be1ac00f73a397d274a9e
Submitter: Zuul
Branch: master

commit 8c5551f2ba57a035090be1ac00f73a397d274a9e
Author: Mike Kolesnik <email address hidden>
Date: Mon May 28 16:51:38 2018 +0300

    Cleanup l3_odl_v2 code

    Some cleanup in preparation for fixing bug #1738246:
     * Remove session.begin code since it doesn't actually do anything.
     * Simplify dependency list construction in delete_floatingip
     * Remove comment from rcurran since it's already handled
     * Remove get_floatingip mocking from tests set up since it doesn't do
       anything but will interfere with any proper testing (next patch).

    Change-Id: I6e718d3943a0c958d6717ba242c069a855e07c83
    Partial-Bug: #1738246

Changed in networking-odl:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/570775
Committed: https://git.openstack.org/cgit/openstack/networking-odl/commit/?id=19143fed7b975ee91bca67e64e3fb68f28bc9438
Submitter: Zuul
Branch: master

commit 19143fed7b975ee91bca67e64e3fb68f28bc9438
Author: Mike Kolesnik <email address hidden>
Date: Thu May 24 10:53:28 2018 +0300

    Retry journal recording in L3

    Since journal recording can fail, and is not run in the same
    transaction as router/FIP operations, it will lead to the resource
    (router or FIP) to remain altered while a journal record hasn't been
    created.

    L3 Flavors should solve this issue comprehensively, but we need a
    solution until it's stable, which can also be backported to stable
    branches. Hence, this fix should address 99% of the cases.

    Change-Id: Ibef0db9a58d86a85ffd948a35db983863999d319
    Closes-Bug: #1738246
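
The retry idea in this commit message can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the code merged in the review above: `record_with_retry`, `JournalRecordError`, and the `attempts`/`delay` parameters are hypothetical names, and `record` stands for whatever callable writes the journal row after the router/FIP operation has already been committed.

```python
import time


class JournalRecordError(Exception):
    """Hypothetical error raised when writing a journal row fails."""


def record_with_retry(record, attempts=3, delay=0.0):
    """Call record() until it succeeds or attempts are exhausted.

    Because the journal write is not in the same transaction as the
    router/FIP change, a single failed write would leave the resource
    altered with no journal row. Retrying narrows that window but does
    not close it: the last error is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return record()
        except JournalRecordError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # simple fixed pause between attempts
    raise last_error
```

As the commit message itself notes, retrying only covers most cases; if every attempt fails, the resource and the journal are still out of step.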

OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-odl (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/575126

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/575476

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-odl (stable/queens)

Reviewed: https://review.openstack.org/575126
Committed: https://git.openstack.org/cgit/openstack/networking-odl/commit/?id=0ffa67285b9aac89d1d7249f5a95a408f3a0dba6
Submitter: Zuul
Branch: stable/queens

commit 0ffa67285b9aac89d1d7249f5a95a408f3a0dba6
Author: Mike Kolesnik <email address hidden>
Date: Mon May 28 16:51:38 2018 +0300

    Cleanup l3_odl_v2 code

    Some cleanup in preparation for fixing bug #1738246:
     * Remove session.begin code since it doesn't actually do anything.
     * Simplify dependency list construction in delete_floatingip
     * Remove comment from rcurran since it's already handled
     * Remove get_floatingip mocking from tests set up since it doesn't do
       anything but will interfere with any proper testing (next patch).

    Change-Id: I6e718d3943a0c958d6717ba242c069a855e07c83
    Partial-Bug: #1738246
    (cherry picked from commit 8c5551f2ba57a035090be1ac00f73a397d274a9e)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/575476
Committed: https://git.openstack.org/cgit/openstack/networking-odl/commit/?id=2531dc6ebb938c3eb5edb95b38f8e56e9bc79cf5
Submitter: Zuul
Branch: stable/queens

commit 2531dc6ebb938c3eb5edb95b38f8e56e9bc79cf5
Author: Mike Kolesnik <email address hidden>
Date: Thu May 24 10:53:28 2018 +0300

    Retry journal recording in L3

    Since journal recording can fail, and is not run in the same
    transaction as router/FIP operations, it will lead to the resource
    (router or FIP) to remain altered while a journal record hasn't been
    created.

    L3 Flavors should solve this issue comprehensively, but we need a
    solution until it's stable, which can also be backported to stable
    branches. Hence, this fix should address 99% of the cases.

    Change-Id: Ibef0db9a58d86a85ffd948a35db983863999d319
    Closes-Bug: #1738246
    (cherry picked from commit 19143fed7b975ee91bca67e64e3fb68f28bc9438)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-odl 13.0.0.0b3

This issue was fixed in the openstack/networking-odl 13.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-odl 12.0.1

This issue was fixed in the openstack/networking-odl 12.0.1 release.
