Namespace doesn't get deleted on vip/pool removal

Bug #1155092 reported by Tatiana Ovchinnikova
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Oleg Bondarev
Milestone: 2015.1.0

Bug Description

Steps to reproduce (in Horizon):
1. Create one pool.
2. Create vip for it.
3. Create another pool.
4. Create vip for it.
5. Delete one vip.
6. Delete another vip.
7. Delete two pools at once.
8. Check the qlbaas- namespaces. The namespace of one of the pools does not get deleted and is unusable because its permissions in /var/run/netns/ are cleared.
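
A small Python check for step 8 (a sketch only; it assumes the agent keeps its namespaces under /var/run/netns with the qlbaas- prefix, as described above) to list which load balancer namespaces are still present:

import os

NETNS_DIR = "/var/run/netns"  # where 'ip netns' keeps named namespaces

def leftover_lbaas_namespaces(prefix="qlbaas-"):
    """Return the names of lbaas namespaces that still exist on disk."""
    try:
        entries = os.listdir(NETNS_DIR)
    except OSError:
        return []  # directory missing: no named namespaces at all
    return sorted(name for name in entries if name.startswith(prefix))

if __name__ == "__main__":
    for ns in leftover_lbaas_namespaces():
        print("leftover namespace: %s" % ns)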

Tags: lbaas
description: updated
Revision history for this message
Tatiana Ovchinnikova (tmazur) wrote :

2013-03-14 04:49:25.945 6884 ERROR quantum.plugins.services.agent_loadbalancer.agent.manager [-] Unable to refresh device for pool: 5c721913-9115-4aea-a3e9-a899d4e16c64
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager Traceback (most recent call last):
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/plugins/services/agent_loadbalancer/agent/manager.py", line 184, in refresh_device
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     logical_config = self.plugin_rpc.get_logical_device(pool_id)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/plugins/services/agent_loadbalancer/agent/api.py", line 47, in get_logical_device
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     topic=self.topic
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/proxy.py", line 80, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     return rpc.call(context, self._get_topic(topic), msg, timeout)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/__init__.py", line 140, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     return _get_impl().call(CONF, context, topic, msg, timeout)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/impl_kombu.py", line 798, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     rpc_amqp.get_connection_pool(conf, Connection))
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/amqp.py", line 610, in call
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     rv = list(rv)
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/amqp.py", line 559, in __iter__
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager     raise result
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager AttributeError: 'NoneType' object has no attribute 'status'
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager Traceback (most recent call last):
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.services.agent_loadbalancer.agent.manager   File "/opt/stack/quantum/quantum/openstack/common/rpc/amqp.py", line 429, in _process_data
2013-03-14 04:49:25.945 6884 TRACE quantum.plugins.servi...


Revision history for this message
Tatiana Ovchinnikova (tmazur) wrote :

The previous comment is an extract of the load balancer agent's log.

Changed in quantum:
importance: Undecided → Medium
status: New → Confirmed
description: updated
Changed in quantum:
assignee: nobody → Eugene Nikanorov (enikanorov)
tags: added: lbaas
Changed in quantum:
status: Confirmed → In Progress
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

In fact, the bug description contains repro steps for two separate issues:

1) Unsuccessful netns deletion.
This happens more often when trying to delete a VIP while more than one VIP exists.

The consequence is that once the VIP is deleted, the namespace remains in an erroneous state, which prevents both its deletion and its further use for a different VIP.

2) Incorrect DB request for ready logical devices.
This happens when two VIPs and two pools existed and then one VIP was deleted: the query returns a full join of vips and pools, which leads to various errors.

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Further analysis shows that it is possible to delete the namespace only after another namespace is used to restart haproxy.
At first glance those events seem unrelated, but the issue is reproducible in nearly 100% of cases with the following steps:
1) Create pool1
2) Create pool2
3) Create vip1 (for pool1)
4) Create vip2 (for pool2)
5) Delete vip1
Step 5 fails with "Device or resource busy", and the namespace cannot be removed manually either.

That causes a state resync and a restart of haproxy in the namespace of vip2.
After that, the namespace of vip1/pool1 can be deleted manually.

This looks like a bug in netns support.

Revision history for this message
Isaku Yamahata (yamahata) wrote :

Discussion moved from
https://bugs.launchpad.net/quantum/+bug/1158589

I suppose the issues are:
- After failing to delete a network namespace, the namespace can't be used anymore,
  i.e. ip netns exec fails.
- Later, a network namespace with the same name can't be created because it already exists.

How about not using a lazy umount? I.e. instead of executing ip netns delete, do the following in Python:
- umount(/var/run/netns/<NAME>) (without MNT_DETACH, i.e. not a lazy umount)
  => may result in EBUSY => error
- unlink(/var/run/netns/<NAME>)
  If the umount above succeeded, this should succeed too.

With this, although the network namespace might not be deleted, it can be reused later.
Ideally, ip netns should support this kind of operation and the patch should be merged into the upstream of iproute2, but for now we can use this as a workaround.
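
A minimal Python sketch of that workaround (a hypothetical helper, not the actual agent code; it needs root and reaches umount(2) through libc via ctypes, since the standard library has no umount wrapper):

import ctypes
import ctypes.util
import os

# use_errno lets us read errno after the libc call fails.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

def delete_netns(name):
    """Delete a named network namespace without a lazy umount.

    Unlike 'ip netns delete', which unmounts with MNT_DETACH, a plain
    umount() fails with EBUSY while the namespace is still in use, so
    /var/run/netns/<name> is either removed completely or left intact
    and reusable.
    """
    path = "/var/run/netns/%s" % name
    if _libc.umount(path.encode()) != 0:
        err = ctypes.get_errno()
        # Typically EBUSY here: report the error and keep the entry usable.
        raise OSError(err, os.strerror(err), path)
    # The bind mount is gone, so the plain file can now be unlinked.
    os.unlink(path)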

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

The following script helps to reproduce the issue and to verify the fix (assuming a devstack installation):

export OS_TENANT_NAME=demo
export OS_PASSWORD=password
export OS_USERNAME=demo
export OS_AUTH_URL=http://127.0.0.1:35357/v2.0

subnetid=`quantum net-list | grep private | awk '{print $6}'`
echo "SubnetID: $subnetid"
pool1id=`quantum lb-pool-create --lb-method ROUND_ROBIN --name p1 --protocol HTTP --subnet-id $subnetid | grep "| id" | awk '{print $4}'`
echo "PoolID 1: $pool1id"

pool2id=`quantum lb-pool-create --lb-method ROUND_ROBIN --name p2 --protocol HTTP --subnet-id $subnetid | grep "| id" | awk '{print $4}'`
echo "PoolID 2: $pool2id"

vip1id=`quantum lb-vip-create --name v1 --protocol-port 80 --protocol HTTP --address 10.0.0.4 --subnet-id $subnetid $pool1id | grep "| id" | awk '{print $4}'`
echo "VIPID 1: $vip1id"

vip2id=`quantum lb-vip-create --name v2 --protocol-port 80 --protocol HTTP --address 10.0.0.5 --subnet-id $subnetid $pool2id | grep "| id" | awk '{print $4}'`
echo "VIPID 2: $vip2id"

echo "Sleeping 30 sec"
sleep 30

echo "Deleting vip1"
quantum lb-vip-delete $vip1id

echo "Sleeping 1 min"
sleep 60

echo "Deleting vip2"
quantum lb-vip-delete $vip2id

echo "Deleting pools"
quantum lb-pool-delete $pool1id
quantum lb-pool-delete $pool2id

echo "Done."

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to quantum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/25712

Gary Kotton (garyk)
tags: added: grizzly-backport-potential
Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → Oleg Bondarev (obondarev)
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Apparently this was an issue in system software that has been fixed in newer versions of Ubuntu/CentOS.
Marking as Invalid.

Changed in neutron:
status: In Progress → Won't Fix
Changed in neutron:
status: Won't Fix → In Progress
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Moving back to 'Won't fix'

Changed in neutron:
status: In Progress → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-lbaas (master)

Fix proposed to branch: master
Review: https://review.openstack.org/142471

Changed in neutron:
status: Won't Fix → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Oleg Bondarev (<email address hidden>) on branch: master
Review: https://review.openstack.org/82749
Reason: Was proposed for neutron-lbaas: https://review.openstack.org/#/c/142471/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-lbaas (master)

Reviewed: https://review.openstack.org/142471
Committed: https://git.openstack.org/cgit/openstack/neutron-lbaas/commit/?id=a627100d510d58d558982d2d5f5bebbe35634b18
Submitter: Jenkins
Branch: master

commit a627100d510d58d558982d2d5f5bebbe35634b18
Author: Oleg Bondarev <email address hidden>
Date: Wed Dec 17 19:26:51 2014 +0400

    Haproxy driver should respect vip/pool admin state

    On vip/pool update when admin_state_up becomes False
    haproxy driver should reflect it in the config.
    Currently there may be only one vip in the config and
    if it is disabled, haproxy process fails to restart with
    "[ALERT] 084/045122 (11407) : [haproxy.main()] No enabled
    listener found (check the <listen> keywords) ! Exiting.",
    and continues running and balancing with old config -
    so for this case we need to undeploy loadbalancer.
    The patch also moves namespace deletion to delete_pool()
    as there is no need to delete/recreate namespace each time
    the vip is removed/added or disabled/enabled

    Closes-Bug: #1155092
    Closes-Bug: #1297142
    Change-Id: I11e2bd3185328ba47ba1aaede932e3114263bed8
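
Roughly, the shape of that change in the haproxy namespace driver is sketched below (hypothetical method and helper names, not the merged code):

def update_vip(self, old_vip, vip):
    # An haproxy config whose only vip is disabled has no enabled listener,
    # so reloading would fail and leave haproxy balancing with the old
    # config; undeploy the loadbalancer for this case instead.
    if not vip['admin_state_up']:
        self.undeploy_instance(vip['pool_id'])
    else:
        self.deploy_instance(self.get_logical_config(vip['pool_id']))

def delete_vip(self, vip):
    # Stop haproxy and drop its config, but keep the qlbaas- namespace:
    # it no longer has to be destroyed and recreated on every vip change.
    self.undeploy_instance(vip['pool_id'])

def delete_pool(self, pool):
    self.undeploy_instance(pool['id'])
    # Namespace deletion now happens here, when the pool itself goes away.
    self.delete_namespace('qlbaas-%s' % pool['id'])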

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0