[SRU] Infinite loop trying to delete deleted HA router

Bug #1668410 reported by Ann Taraday on 2017-02-27
This bug affects 4 people
Affects                        Importance   Assigned to
OpenStack Security Advisory    Undecided    Unassigned
Ubuntu Cloud Archive           Undecided    Unassigned
  Mitaka                       High         Unassigned
neutron                        Medium       Ann Taraday
neutron (Ubuntu)               Medium       Unassigned
  Xenial                       High         Unassigned

Bug Description

[Description]

When deleting a router the logfile is filled up. See full log - http://paste.ubuntu.com/25429257/

The error 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' occurred 3343386 times, logged by _safe_router_removed() [1]:

$ grep -r 'Error while deleting router c0dab368-5ac8-4996-88c9-f5d345a774a6' |wc -l
3343386

_safe_router_removed() is invoked at L488 [2]. If it fails, it returns False, and self._resync_router(update) [3] then re-queues the update, so _safe_router_removed() runs again and again. That is why the log contains so many 'Error while deleting router XXXXX' errors.

[1] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L361
[2] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488
[3] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L457
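
The retry behaviour described above can be illustrated with a toy model. This is only a sketch, not neutron's code: the class FakeAgent, the queue, the counter and the max_iterations bound are hypothetical, and only the names _safe_router_removed and _resync_router are taken from the description.

```python
from collections import deque


class FakeAgent:
    """Toy model of the retry behaviour described above; not neutron code."""

    def __init__(self):
        self.queue = deque()
        self.errors = 0

    def _router_removed(self, router_id):
        # Hypothetical stand-in for the real cleanup, which fails because
        # the HA port is already gone (None) when the router is removed.
        raise RuntimeError("HA port is None")

    def _safe_router_removed(self, router_id):
        try:
            self._router_removed(router_id)
        except Exception:
            self.errors += 1  # corresponds to "Error while deleting router <id>"
            return False
        return True

    def _resync_router(self, router_id):
        # Re-queue the same delete so that it is attempted again.
        self.queue.append(router_id)

    def process(self, max_iterations):
        # max_iterations exists only to keep this demo finite.
        for _ in range(max_iterations):
            if not self.queue:
                break
            router_id = self.queue.popleft()
            if not self._safe_router_removed(router_id):
                self._resync_router(router_id)


agent = FakeAgent()
agent.queue.append("c0dab368-5ac8-4996-88c9-f5d345a774a6")
agent.process(max_iterations=1000)
print(agent.errors)  # 1000: every attempt fails and is re-queued
```

Because the failing delete is always put back on the queue, the error count grows with every iteration and never settles, which matches the millions of identical log lines counted above.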

[Test Case]

This is caused by a race condition between neutron-server and the L3 agent: after neutron-server deletes the HA interfaces, the L3 agent may sync an HA router that has no HA interface info (this only requires L708 [1] to be triggered after the HA interfaces are deleted and before the HA router is deleted). If the HA router is deleted at this point, the problem occurs. The test case is therefore as follows:

1. First install the fixed package, then restart neutron-server with 'sudo service neutron-server restart'

2. Create ha_router

neutron router-create harouter --ha=True

3. Delete the ports associated with ha_router before deleting ha_router

neutron router-port-list harouter |grep 'HA port' |awk '{print $2}' |xargs -l neutron port-delete
neutron router-port-list harouter

4. Update ha_router so that the l3-agent syncs the ha_router info, now without ha_port, into self.router_info

neutron router-update harouter --description=test

5. Now delete ha_router

neutron router-delete harouter

[1] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/db/l3_hamode_db.py#L708
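
Steps 2 to 5 can also be scripted. The following is a minimal sketch that assumes the neutron CLI is on PATH and credentials are already loaded; the helper names (ha_port_ids, neutron, reproduce) are hypothetical, and step 1 (installing the fixed package and restarting neutron-server) is left to the operator.

```python
import subprocess

ROUTER = "harouter"  # router name used in the test case above


def ha_port_ids(port_list_output):
    """Mimic `grep 'HA port' | awk '{print $2}'` on the CLI table output."""
    ids = []
    for line in port_list_output.splitlines():
        if "HA port" in line:
            fields = line.split()
            if len(fields) > 1:
                ids.append(fields[1])  # field 1 is awk's $2
    return ids


def neutron(*args):
    """Run one neutron CLI command and return its stdout."""
    result = subprocess.run(["neutron", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def reproduce():
    """Run steps 2 to 5 of the test case in order."""
    neutron("router-create", ROUTER, "--ha=True")
    for port_id in ha_port_ids(neutron("router-port-list", ROUTER)):
        neutron("port-delete", port_id)
    neutron("router-update", ROUTER, "--description=test")
    neutron("router-delete", ROUTER)
```

Calling reproduce() on a test deployment runs the sequence; nothing is executed on import, so the parsing helper can be checked separately against captured router-port-list output.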

[Regression Potential]

With the fix [1], neutron-server no longer returns an HA router that is missing its HA ports, so L488 [2] no longer gets a chance to call _safe_router_removed() for such a router. The patch therefore fixes the problem at its root, and no regression is expected.

Besides, the fix is already in the mitaka-eol branch, while the neutron-server mitaka package is based on neutron-8.4.0, so the fix needs to be backported to xenial and mitaka.

$ git tag --contains 8c77ee6b20dd38cc0246e854711cb91cffe3a069
mitaka-eol

[1] https://review.openstack.org/#/c/440799/2/neutron/db/l3_hamode_db.py
[2] https://github.com/openstack/neutron/blob/mitaka-eol/neutron/agent/l3/agent.py#L488

Fix proposed to branch: master
Review: https://review.openstack.org/439185

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium

In the logs http://paste.openstack.org/show/601001/ for router 2fcdef4e-83fe-48b5-be0f-f45a631c1482, we see the notification for router deletion, the network cleanup and the port removal, and then a loop starts in which the router deletion is retried but fails because the HA port is already None.

Ann Taraday (akamyshnikova) wrote :

Change abandoned by Ann Taraday (<email address hidden>) on branch: master
Review: https://review.openstack.org/439185
Reason: can be fixed with https://review.openstack.org/#/c/365653/

Reviewed: https://review.openstack.org/440799
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8c77ee6b20dd38cc0246e854711cb91cffe3a069
Submitter: Jenkins
Branch: stable/mitaka

commit 8c77ee6b20dd38cc0246e854711cb91cffe3a069
Author: John Schwarz <email address hidden>
Date: Mon Sep 5 16:34:44 2016 +0300

    l3 ha: don't send routers without '_ha_interface'

    Change I22ff5a5a74527366da8f82982232d4e70e455570 changed
    get_ha_sync_data_for_host such that if an agent requests a router's
    details, then it is always returned, even when it doesn't have the key
    '_ha_interface'. Further changes to this change tried to put this check
    back in (Ie38baf061d678fc5d768195b25241efbad74e42f), but this patch
    failed to do so for the case where no bindings were returned (possible
    when the router has been concurrently deleted). This patch puts this
    check back in.

    Closes-Bug: #1607381
    Closes-bug: #1668410
    Change-Id: I047e53ea9b3e20a21051f29d0a44624e2a31c83c
    (cherry picked from commit 29cec0345617627b64a73b9de35c46bccdc4ffa3)
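
The check that the commit message describes can be sketched as a simple filter. This is not the actual patch: the function name and the dict layout are hypothetical, and only the '_ha_interface' key comes from the commit message.

```python
def drop_routers_without_ha_interface(ha_routers):
    """Keep only HA routers whose sync data still carries '_ha_interface'."""
    return [r for r in ha_routers if r.get("_ha_interface")]


routers = [
    {"id": "router-a", "_ha_interface": {"id": "port-1"}},
    {"id": "router-b"},  # HA port deleted concurrently, key is missing
]
print([r["id"] for r in drop_routers_without_ha_interface(routers)])  # ['router-a']
```

A router filtered out this way is never handed to the agent, so the agent never has to remove a router whose HA port is already gone.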

tags: added: in-stable-mitaka

May cause DoS.

information type: Public → Public Security
Jeremy Stanley (fungi) on 2017-08-04
Changed in ossa:
status: New → Incomplete
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Given this was purported to have been fixed in master by https://review.openstack.org/365653 prior to the Newton release and it in turn claims to be fixing bug 1607381 (which itself makes mention of an infinite loop bug 1606844 which is also questioned as a possible dupe for bug 1605546, bug 1533441, bug 1533457 and bug 1605546, some of which are still open), it's not entirely clear to me the degree to which this has been solved so some summary from neutron-coresec reviewers would be particularly appreciated.

That aside, "denial of service" conditions arising from unconstrained resource consumption by authenticated users is a grey area we struggle with classifying. At some point, operators must have a means of identifying abuse by their users, locking them out and cleaning up the mess. In a "typical" production deployment servicing potentially risky users, how quickly can an abuser "fill up" your logs doing this? Will your monitoring system alert operations to the increase in activity and disk utilization in reasonable time for them to take mitigating action? Are deployments likely to include rate-limiting proxies which further throttle problem API calls such as these?

In most cases, we triage such reports as security hardening opportunities (class D in our taxonomy: https://security.openstack.org/vmt-process.html#incident-report-taxonomy ) and since this report is already public there's no harm in doing that for now while entertaining further discussion on whether it should be reclassed and any potential advisory issued.

Changed in ossa:
status: Incomplete → Won't Fix
information type: Public Security → Public
tags: added: security
Basel Darvish (dbassel) wrote :

A single infinite-loop message filled up the disk with logs in 6 hours. 50 such messages within 10 minutes (still too few to be throttled by a typical rate-limiting proxy configuration) could have filled the disk much faster, leaving operators no time to react.
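
As a rough back-of-the-envelope check of that estimate, the sketch below assumes that log volume scales linearly with the number of concurrent loops. That is a best-case assumption for the attacker and has not been measured; in practice the agent may well be the bottleneck.

```python
HOURS_TO_FILL_ONE_LOOP = 6   # reported above for a single loop
CONCURRENT_LOOPS = 50        # number of loops in the scenario above

# Assumption: 50 loops write logs 50 times as fast as one loop.
minutes = HOURS_TO_FILL_ONE_LOOP * 60 / CONCURRENT_LOOPS
print(minutes)  # 7.2
```

Under that assumption the disk would fill in roughly seven minutes, which is shorter than the 10-minute window in which the requests were made.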

Changed in neutron (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Hua Zhang (zhhuabj) on 2017-08-30
description: updated
summary: - Infinite loop trying to delete deleted HA router
+ [SRU] Infinite loop trying to delete deleted HA router
description: updated
tags: added: sts sts-sru-needed
Hua Zhang (zhhuabj) wrote :
Hua Zhang (zhhuabj) on 2017-08-30
description: updated

The attachment "mitaka.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Hua Zhang (zhhuabj) on 2017-08-31
tags: added: sru-sponsors
Hua Zhang (zhhuabj) on 2017-08-31
tags: removed: sru-sponsors
Changed in neutron (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → High
Changed in neutron (Ubuntu):
status: Triaged → Invalid
Changed in cloud-archive:
status: New → Invalid
description: updated
Corey Bryant (corey.bryant) wrote :

Hi Hua,

Thanks for the patch. I've uploaded the package to the xenial review queue and it is awaiting SRU review: https://launchpad.net/ubuntu/xenial/+queue?queue_state=1&queue_text=

Corey

Hello Ann, or anyone else affected,

Accepted neutron into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:8.4.0-0ubuntu5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in neutron (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-xenial
Hua Zhang (zhhuabj) wrote :

xenial-proposed has been verified successfully - http://paste.ubuntu.com/25520417/

tags: added: verification-done-xenial
removed: verification-needed verification-needed-xenial
tags: added: verification-done

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:8.4.0-0ubuntu5

---------------
neutron (2:8.4.0-0ubuntu5) xenial; urgency=medium

  * d/p/l3-ha-don-t-send-routers-without-_ha_interface.patch: Backport fix for
    l3 ha: don't send routers without '_ha_interface' (LP: #1668410)

 -- Hua Zhang <email address hidden> Thu, 24 Aug 2017 12:19:23 +0800

Changed in neutron (Ubuntu Xenial):
status: Fix Committed → Fix Released

Hello Ann, or anyone else affected,

Accepted neutron into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Corey Bryant (corey.bryant) wrote :

I've verified this successfully on cloud-archive:trusty-proposed with neutron 2:8.4.0-0ubuntu5~cloud0 using http://paste.ubuntu.com/25520417/ as well as verifying there are no tempest smoke test regressions.

tags: added: verification-mitaka-done
removed: verification-mitaka-needed

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Ryan Beisner (1chb1n) wrote :

This bug was fixed in the package neutron - 2:8.4.0-0ubuntu5~cloud0
---------------

 neutron (2:8.4.0-0ubuntu5~cloud0) trusty-mitaka; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:8.4.0-0ubuntu5) xenial; urgency=medium
 .
   * d/p/l3-ha-don-t-send-routers-without-_ha_interface.patch: Backport fix for
     l3 ha: don't send routers without '_ha_interface' (LP: #1668410)

tags: added: sts-sru-done
removed: sts-sru-needed