Double removal of floating IP in nova-network

Bug #1268569 reported by Dmitry Pyzhov
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
jichenjc

Bug Description

It is possible to send two DELETE requests for the same floating IP via the API, and in_use for floating_ips will be decreased by two. The value can even be driven below zero:

Logs:
<182>Dec 19 17:52:08 node-2 nova-nova.osapi_compute.wsgi.server INFO: 172.16.0.2 "DELETE /v2/46f4516b6fbd461eb30f1740936c2167/os-floating-ips/1 HTTP/1.1" status: 202 len: 209 time: 0.0797279
<182>Dec 19 17:52:08 node-2 nova-nova.osapi_compute.wsgi.server INFO: (19517) accepted ('172.16.0.2', 47613)
<182>Dec 19 17:52:08 node-2 nova-nova.osapi_compute.wsgi.server INFO: 172.16.0.2 "DELETE /v2/46f4516b6fbd461eb30f1740936c2167/os-floating-ips/1 HTTP/1.1" status: 202 len: 209 time: 0.0804379
<182>Dec 19 17:52:08 node-2 nova-nova.osapi_compute.wsgi.server INFO: (19517) accepted ('172.16.0.2', 47615)
<0>Dec 19 17:52:08 node-2 <180>nova-nova.db.sqlalchemy.api WARNING: Change will make usage less than 0 for the following resources: ['floating_ips']

Database:
mysql> select resource,in_use from quota_usages;
+-----------------+--------+
| resource | in_use |
+-----------------+--------+
| security_groups | 0 |
| instances | 0 |
| ram | 0 |
| cores | 0 |
| fixed_ips | 0 |
| floating_ips | -1 |
+-----------------+--------+
6 rows in set (0.00 sec)

Changed in nova:
assignee: nobody → sahid (sahid-ferdjaoui)
Changed in nova:
status: New → Confirmed
status: Confirmed → New
Matt Riedemann (mriedem)
tags: added: api network
Revision history for this message
Christopher Yeoh (cyeoh-0) wrote :

sahid - are you still working on this?

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Christopher Yeoh (cyeoh-0) wrote :

Looks like we have a race here.

Changed in nova:
assignee: sahid (sahid-ferdjaoui) → nobody
Revision history for this message
jichenjc (jichenjc) wrote :

Looks to me like it can be a race condition, as Chris said.

The following is my guess; more log context might help us a lot.
Dmitry Pyzhov, do you have a detailed log for reference?

We support bulk delete of floating IPs as well as single delete,
so the 2 APIs might be called simultaneously. Is that possible?

Changed in nova:
assignee: nobody → jichencom (jichenjc)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

There was an error in our script, and it sent two identical delete requests via the API at the same time. Sorry, but we did not save the log files. The issue was easily reproducible a month ago, though.

Revision history for this message
jichenjc (jichenjc) wrote :

OK, so my guess is not correct. Another thing I want to confirm: do the 2 deletes mean 2 delete-instance operations or 2 delete-floating-IP (disassociate) operations? I will try to recreate it.

Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Delete of the same floating ip from the project, two times. In nova-network.

Revision history for this message
jichenjc (jichenjc) wrote :

According to this bug https://bugs.launchpad.net/nova/+bug/1268569

I am wondering if someone can help me understand the concurrent-access prevention at the API layer.

If we have more than one nova-api server, say 2 (thread A and thread B), both of them should be able to handle requests from users. Say we delete the same floating IP twice at almost the same time. If thread A and thread B both pass the check in the following logic (that's possible because they run concurrently):

api/openstack/compute/contrib/floating_ips.py

def delete(self, req, id):
    ..........
    try:
        floating_ip = self.network_api.get_floating_ip(context, id)
    except (exception.NotFound, exception.InvalidID):
        msg = _("Floating ip not found for id %s") % id
        raise webob.exc.HTTPNotFound(explanation=msg)
    ..........

then both of them will call the following function:

self.network_api.release_floating_ip(context, address)

In the end, the function deallocate_floating_ip in nova/network/floating_ips.py will be called by both A and B. This will not lead to a logic error, because the db layer does not return an error if it can't find the floating IP. But the quota will be reserved and committed twice, which leads to a wrong quota value.

Could someone help clarify my understanding, or explain what kind of concurrent-access prevention we have?
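The interleaving described above can be sketched in a few lines. This is a minimal simulation with hypothetical data structures (not the real nova schema or API): both requests pass the existence check before either one deallocates, so the quota usage is decremented twice and goes negative, matching the -1 seen in the database.

```python
# Hypothetical in-memory stand-ins for the floating IP table and quota usage.
quota_in_use = {"floating_ips": 1}
floating_ips = {1: "172.16.0.100"}

def get_floating_ip(ip_id):
    # Stands in for the API-layer existence check (get_floating_ip).
    return floating_ips.get(ip_id)

def deallocate_floating_ip(ip_id):
    # The db layer does not complain if the row is already gone...
    floating_ips.pop(ip_id, None)
    # ...but the quota commit still happens unconditionally.
    quota_in_use["floating_ips"] -= 1

# Interleaving that reproduces the bug: A and B both pass the check
# before either one deallocates.
a_found = get_floating_ip(1) is not None  # request A: found
b_found = get_floating_ip(1) is not None  # request B: found
if a_found:
    deallocate_floating_ip(1)
if b_found:
    deallocate_floating_ip(1)

print(quota_in_use["floating_ips"])  # -1
```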

Revision history for this message
jichenjc (jichenjc) wrote :

I posted my analysis above to openstack-dev but have had no response so far; I will see whether I can find help through IRC.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/77829

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/77829
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cbbb9de51b52793f6d424c58199b22a422d39aed
Submitter: Jenkins
Branch: master

commit cbbb9de51b52793f6d424c58199b22a422d39aed
Author: jichenjc <email address hidden>
Date: Fri Feb 28 04:55:46 2014 +0800

    Add lock on API layer delete floating IP

    There is a potential race condition when multiple nova-api
    services are running. We need to prevent concurrent access to
    the same floating IP: though the db layer function is OK for
    floating IPs, the quota will be released more than once, which
    can lead to problems later.

    So the solution: check whether the db is *really* updated with the
    value we want, which means the second operation will fail and
    return None, so the network api layer can handle it

    Change-Id: I028938a4c387d21a8880ae168b161a763e7361ff
    Closes-Bug: #1268569
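The idea in the commit message can be illustrated with a small sketch (illustrative names, not the actual nova code): make the removal a single conditional update, and only commit the quota change for the caller whose update actually took effect; the losing caller gets None back and skips the quota commit.

```python
# Hypothetical stand-ins for the floating IP table and quota usage.
floating_ips = {1: "172.16.0.100"}
quota_in_use = {"floating_ips": 1}

def release_floating_ip(ip_id):
    # Stands in for a conditional UPDATE/DELETE whose result tells us
    # whether *we* removed the row; dict.pop here plays the role of a
    # single atomic SQL statement.
    row = floating_ips.pop(ip_id, None)
    if row is None:
        # Second (losing) caller: the db was not really updated, so
        # skip the quota commit and let the api layer handle it.
        return None
    quota_in_use["floating_ips"] -= 1
    return row

first = release_floating_ip(1)   # wins: quota goes 1 -> 0
second = release_floating_ip(1)  # loses: returns None, quota untouched
print(quota_in_use["floating_ips"])  # 0
```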

Changed in nova:
status: In Progress → Fix Committed
tags: added: havana-backport-potential icehouse-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-1 → 2014.2