Namespace doesn't get deleted on vip/pool removal

Bug #1155092 reported by Tatiana Ovchinnikova
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Oleg Bondarev
Milestone: 2015.1.0

Bug Description

Steps to reproduce (in Horizon):
1. Create one pool.
2. Create vip for it.
3. Create another pool.
4. Create vip for it.
5. Delete one vip.
6. Delete another vip.
7. Delete two pools at once.
8. Check the qlbaas- namespaces. The namespace of one of the pools does not get deleted and is unusable because its permissions in /var/run/netns/ are cleared.
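
A small Python check for step 8 (a sketch only; it assumes the agent keeps its namespaces under /var/run/netns with the qlbaas- prefix, as described above) to list which load balancer namespaces are still present:

import os

NETNS_DIR = "/var/run/netns"  # where 'ip netns' keeps named namespaces

def leftover_lbaas_namespaces(prefix="qlbaas-"):
    """Return the names of lbaas namespaces that still exist on disk."""
    try:
        entries = os.listdir(NETNS_DIR)
    except OSError:
        return []  # directory missing: no named namespaces at all
    return sorted(name for name in entries if name.startswith(prefix))

if __name__ == "__main__":
    for ns in leftover_lbaas_namespaces():
        print("leftover namespace: %s" % ns)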

Tags: lbaas
description: updated
Revision history for this message
Tatiana Ovchinnikova (tmazur) wrote :

2013-03-14 04:49:25.945 6884 ERROR quantum.plugins.services.agent_loadbalancer.agent.manager [-] Unable to refresh device for pool: 5c721913-9115-4aea-a3e9-a899d4e16c64
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager Traceback (most recent call last):
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/plugins/services/agent_loadbalancer/agent/manager.py", line 184, in refresh_device
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     logical_config = self.plugin_rpc.get_logical_device(pool_id)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/plugins/services/agent_loadbalancer/agent/api.py", line 47, in get_logical_device
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     topic=self.topic
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/proxy.py", line 80, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     return rpc.call(context, self._get_topic(topic), msg, timeout)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/__init__.py", line 140, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     return _get_impl().call(CONF, context, topic, msg, timeout)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/impl_kombu.py", line 798, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     rpc_amqp.get_connection_pool(conf, Connection))
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/amqp.py", line 610, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     rv = list(rv)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/amqp.py", line 559, in __iter__
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     raise result
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager AttributeError: 'NoneType' object has no attribute 'status'
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager Traceback (most recent call last):
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/amqp.py", line 429, in _process_data
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.servi...


Revision history for this message
Tatiana Ovchinnikova (tmazur) wrote :

The previous comment is an extract of the load balancer agent's log.

Changed in quantum:
importance: Undecided → Medium
status: New → Confirmed
description: updated
Changed in quantum:
assignee: nobody → Eugene Nikanorov (enikanorov)
tags: added: lbaas
Changed in quantum:
status: Confirmed → In Progress
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

In fact, the bug description contains repro steps for two separate issues:

1) Unsuccessful netns deletion.
This happens more often when trying to delete a VIP while more than one VIP exists.

The consequence is that once the VIP is deleted, the namespace remains in an erroneous state, which prevents both its deletion and its further use for a different VIP.

2) Incorrect DB request for ready logical devices.
This happens when two VIPs and two pools existed and then one VIP was deleted: the query returns a full join of vips and pools, which leads to various errors.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Further analysis shows that it is possible to delete the namespace only after another namespace is used to restart haproxy.
At first glance those events seem unrelated, but the issue is reproducible in nearly 100% of cases with the following steps:
1) Create pool1
2) Create pool2
3) Create vip1 (for pool1)
4) Create vip2 (for pool2)
5) Delete vip1
Step 5 fails with "Device or resource busy", and the namespace cannot be removed manually either.

That causes a state resync and a restart of haproxy in the namespace of vip2.
After that, the namespace of vip1/pool1 can be deleted manually.

This looks like a bug in netns support.

Revision history for this message
Isaku Yamahata (yamahata) wrote :

Discussion moved from
https://bugs.launchpad.net/quantum/+bug/1158589

I suppose the issues are:
- After failing to delete a network namespace, the namespace can't be used anymore,
  i.e. ip netns exec fails.
- Later, a network namespace with the same name can't be created because it already exists.

How about not using a lazy umount? I.e. instead of executing ip netns delete, do the following in Python:
- umount(/var/run/netns/<NAME>) (without MNT_DETACH, i.e. not a lazy umount)
  => may result in EBUSY => error
- unlink(/var/run/netns/<NAME>)
  If the umount above succeeded, this should succeed too.

With this, although the network namespace might not be deleted, it can be reused later.
Ideally, ip netns should support this kind of operation and the patch should be merged into the upstream of iproute2, but for now we can use this as a workaround.
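
A minimal Python sketch of that workaround (a hypothetical helper, not the actual agent code; it needs root and reaches umount(2) through libc via ctypes, since the standard library has no umount wrapper):

import ctypes
import ctypes.util
import os

# use_errno lets us read errno after the libc call fails.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def delete_netns(name):
    """Delete a named network namespace without a lazy umount.

    Unlike 'ip netns delete', which unmounts with MNT_DETACH, a plain
    umount() fails with EBUSY while the namespace is still in use, so
    /var/run/netns/<name> is either removed completely or left intact
    and reusable.
    """
    path = "/var/run/netns/%s" % name
    if _libc.umount(path.encode()) != 0:
        err = ctypes.get_errno()
        # Typically EBUSY here: report the error and keep the entry usable.
        raise OSError(err, os.strerror(err), path)
    # The bind mount is gone, so the plain file can now be unlinked.
    os.unlink(path)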

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

The following script helps to reproduce the issue and to verify the fix (assuming a devstack installation):

export OS_TENANT_NAME=demo
export OS_PASSWORD=password
export OS_USERNAME=demo
export OS_AUTH_URL=http://127.0.0.1:35357/v2.0

subnetid=`quantum net-list | grep private | awk '{print $6}'`
echo "SubnetID: $subnetid"
pool1id=`quantum lb-pool-create --lb-method ROUND_ROBIN --name p1 --protocol HTTP --subnet-id $subnetid | grep "| id" | awk '{print $4}'`
echo "PoolID 1: $pool1id"

pool2id=`quantum lb-pool-create --lb-method ROUND_ROBIN --name p2 --protocol HTTP --subnet-id $subnetid | grep "| id" | awk '{print $4}'`
echo "PoolID 2: $pool2id"

vip1id=`quantum lb-vip-create --name v1 --protocol-port 80 --protocol HTTP --address 10.0.0.4 --subnet-id $subnetid $pool1id | grep "| id" | awk '{print $4}'`
echo "VIPID 1: $vip1id"

vip2id=`quantum lb-vip-create --name v2 --protocol-port 80 --protocol HTTP --address 10.0.0.5 --subnet-id $subnetid $pool2id | grep "| id" | awk '{print $4}'`
echo "VIPID 2: $vip2id"

echo "Sleeping 30 sec"
sleep 30

echo "Deleting vip1"
quantum lb-vip-delete $vip1id

echo "Sleeping 1 min"
sleep 60

echo "Deleting vip2"
quantum lb-vip-delete $vip2id

echo "Deleting pools"
quantum lb-pool-delete $pool1id
quantum lb-pool-delete $pool2id

echo "Done."

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/25712

Gary Kotton (garyk)
tags: added: grizzly-backport-potential
Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → Oleg Bondarev (obondarev)
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Apparently this was an issue in system software that has been fixed in newer versions of Ubuntu/CentOS.
Marking as Invalid.

Changed in neutron:
status: In Progress → Won't Fix
Changed in neutron:
status: Won't Fix → In Progress
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Moving back to 'Won't fix'

Changed in neutron:
status: In Progress → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-lbaas (master)

Fix proposed to branch: master
Review: https://review.openstack.org/142471

Changed in neutron:
status: Won't Fix → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Oleg Bondarev (<email address hidden>) on branch: master
Review: https://review.openstack.org/82749
Reason: Was proposed for neutron-lbaas: https://review.openstack.org/#/c/142471/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-lbaas (master)

Reviewed: https://review.openstack.org/142471
Committed: https://git.openstack.org/cgit/openstack/neutron-lbaas/commit/?id=a627100d510d58d558982d2d5f5bebbe35634b18
Submitter: Jenkins
Branch: master

commit a627100d510d58d558982d2d5f5bebbe35634b18
Author: Oleg Bondarev <email address hidden>
Date: Wed Dec 17 19:26:51 2014 +0400

    Haproxy driver should respect vip/pool admin state

    On vip/pool update when admin_state_up becomes False
    haproxy driver should reflect it in the config.
    Currently there may be only one vip in the config and
    if it is disabled, haproxy process fails to restart with
    "[ALERT] 084/045122 (11407) : [haproxy.main()] No enabled
    listener found (check the <listen> keywords) ! Exiting.",
    and continues running and balancing with old config -
    so for this case we need to undeploy loadbalancer.
    The patch also moves namespace deletion to delete_pool()
    as there is no need to delete/recreate namespace each time
    the vip is removed/added or disabled/enabled

    Closes-Bug: #1155092
    Closes-Bug: #1297142
    Change-Id: I11e2bd3185328ba47ba1aaede932e3114263bed8
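
Roughly, the shape of that change in the haproxy namespace driver is sketched below (hypothetical method and helper names, not the merged code):

def update_vip(self, old_vip, vip):
    # An haproxy config whose only vip is disabled has no enabled listener,
    # so reloading would fail and leave haproxy balancing with the old
    # config; undeploy the loadbalancer for this case instead.
    if not vip['admin_state_up']:
        self.undeploy_instance(vip['pool_id'])
    else:
        self.deploy_instance(self.get_logical_config(vip['pool_id']))

def delete_vip(self, vip):
    # Stop haproxy and drop its config, but keep the qlbaas- namespace:
    # it no longer has to be destroyed and recreated on every vip change.
    self.undeploy_instance(vip['pool_id'])

def delete_pool(self, pool):
    self.undeploy_instance(pool['id'])
    # Namespace deletion now happens here, when the pool itself goes away.
    self.delete_namespace('qlbaas-%s' % pool['id'])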

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0