[dvr] router remove subnet <router> <subnet> silently fails

Bug #1759918 reported by Dmitrii Shcherbakov
This bug affects 1 person
Affects            Status   Importance  Assigned to  Milestone
neutron (Ubuntu)   Invalid  Undecided   Unassigned

Bug Description

Scenario: Queens, DVR without L3 HA, a distributed non-HA virtual router (pubrouter). All subnets are allocated from 2 different subnet pools that share one global address scope, so DVR "fast exit" is triggered (https://review.openstack.org/#/c/474007/). Floating IPs are not used and SNAT is not enabled.

Commands:
openstack address scope create dev
openstack subnet pool create --address-scope dev --pool-prefix 10.232.40.0/21 --pool-prefix 10.232.16.0/21 dev
openstack subnet pool create --address-scope dev --pool-prefix 192.168.100.0/24 tenant
openstack network create --external --provider-physical-network physnet1 --provider-network-type flat pubnet
openstack network segment set --name segment1 d8391bfb-4466-4a45-972c-45ffcec9f6bc
openstack network segment create --physical-network physnet2 --network-type flat --network pubnet segment2
openstack subnet create --no-dhcp --subnet-pool dev --subnet-range 10.232.16.0/21 --allocation-pool start=10.232.17.0,end=10.232.17.255 --dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet --network-segment segment1 pubsubnetl1
openstack subnet create --gateway 10.232.40.100 --no-dhcp --subnet-pool dev --subnet-range 10.232.40.0/21 --allocation-pool start=10.232.41.0,end=10.232.41.255 --dns-nameserver 10.232.36.101 --ip-version 4 --network pubnet --network-segment segment2 pubsubnetl2
openstack network create --internal --provider-network-type vxlan tenantnet
openstack subnet create --dhcp --ip-version 4 --subnet-range 192.168.100.0/24 --subnet-pool tenant --dns-nameserver 10.232.36.101 --network tenantnet tenantsubnet
openstack router create --disable --no-ha --distributed pubrouter
openstack router set --disable-snat --external-gateway pubnet --enable pubrouter
openstack network create --internal --provider-network-type vxlan othertenantnet
openstack subnet pool set --pool-prefix 192.168.200.0/24 tenant
openstack subnet create --dhcp --ip-version 4 --subnet-range 192.168.200.0/24 --subnet-pool tenant --dns-nameserver 10.232.36.101 --network othertenantnet othertenantsubnet
openstack router add subnet pubrouter othertenantsubnet

Command outputs, in case they are needed: https://pastebin.canonical.com/p/fRQTxRKYCt/

Note: this setup uses routed provider networks, so unit names correspond to nodes that have connectivity to the right physnets (this is irrelevant for this bug):
l1 - leaf 1
l2 - leaf 2

openstack subnet show tenantsubnet | grep cid
| cidr | 192.168.100.0/24 |

openstack subnet show othertenantsubnet | grep cid
| cidr | 192.168.200.0/24 |

# 2 qr- interfaces per qrouter namespace on every node (gateways and computes) - one per tenant network

juju run --application neutron-gateway-l2,neutron-gateway-l1,neutron-openvswitch-l1,neutron-openvswitch-l2 'sudo ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip -4 -o -br a s'
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nqr-a9696fa7-96@if23 UP 192.168.100.1/24
    \nqr-ad410866-0c@if24 UP 192.168.200.1/24 \nrfp-4f9ca9ef-3 UP 169.254.109.46/31
    \n"
  UnitId: neutron-gateway-l1/0
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nqr-a9696fa7-96@if26 UP 192.168.100.1/24
    \nqr-ad410866-0c@if28 UP 192.168.200.1/24 \nrfp-4f9ca9ef-3 UP 169.254.109.46/31
    \n"
  UnitId: neutron-gateway-l2/0
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l1/0
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l1/1
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l1/2
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l2/0
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l2/1
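When comparing many units, it can help to reduce each unit's Stdout to just the qr- interfaces and their addresses. The sketch below assumes the Stdout string has already been decoded from the juju YAML (real newlines, one interface per line in the `ip -4 -o -br a s` format: NAME STATE ADDR/PREFIX); the helper name qr_addresses is hypothetical.

```python
from typing import Dict, List


def qr_addresses(stdout: str) -> Dict[str, List[str]]:
    """Map each qr- interface (without its @ifN suffix) to its IPv4 addresses."""
    result: Dict[str, List[str]] = {}
    for line in stdout.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and interfaces without an address
        name = fields[0].split("@", 1)[0]  # "qr-ad410866-0c@if24" -> "qr-ad410866-0c"
        if name.startswith("qr-"):
            result[name] = fields[2:]
    return result


# Decoded Stdout of neutron-gateway-l1/0 from the output above.
gateway_l1 = (
    "lo UNKNOWN 127.0.0.1/8 \n"
    "qr-a9696fa7-96@if23 UP 192.168.100.1/24 \n"
    "qr-ad410866-0c@if24 UP 192.168.200.1/24 \n"
    "rfp-4f9ca9ef-3 UP 169.254.109.46/31 \n"
)
print(qr_addresses(gateway_l1))
# {'qr-a9696fa7-96': ['192.168.100.1/24'], 'qr-ad410866-0c': ['192.168.200.1/24']}
```

Stripping the @ifN suffix matters because the same port shows up as qr-ad410866-0c@if24 on one gateway and qr-ad410866-0c@if28 on the other, while on computes it has no suffix at all.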

# removed 192.168.200.0/24 from pubrouter
openstack router remove subnet pubrouter othertenantsubnet

# ports are still there
juju run --application neutron-gateway-l2,neutron-gateway-l1,neutron-openvswitch-l1,neutron-openvswitch-l2 'sudo ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip -4 -o -br a s'
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nqr-a9696fa7-96@if23 UP 192.168.100.1/24
    \nqr-ad410866-0c@if24 UP 192.168.200.1/24 \nrfp-4f9ca9ef-3 UP 169.254.109.46/31
    \n"
  UnitId: neutron-gateway-l1/0
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l1/0
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l1/1
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l1/2
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l2/0
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nrfp-4f9ca9ef-3@if3 UP 169.254.109.46/31
    \nqr-a9696fa7-96 UNKNOWN 192.168.100.1/24 \nqr-ad410866-0c UNKNOWN
    \ 192.168.200.1/24 \n"
  UnitId: neutron-openvswitch-l2/1
- Stdout: "lo UNKNOWN 127.0.0.1/8 \nqr-a9696fa7-96@if26 UP 192.168.100.1/24
    \nqr-ad410866-0c@if28 UP 192.168.200.1/24 \nrfp-4f9ca9ef-3 UP 169.254.109.46/31
    \n"
  UnitId: neutron-gateway-l2/0

# the policy rules for both subnets are still there as well

juju run --application neutron-gateway-l2,neutron-gateway-l1,neutron-openvswitch-l1,neutron-openvswitch-l2 'sudo ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule'
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.100.0/24 lookup 16 \n80000:\tfrom 192.168.200.0/24
    lookup 16 \n"
  UnitId: neutron-gateway-l1/0
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.100.0/24 lookup 16 \n80000:\tfrom 192.168.200.0/24
    lookup 16 \n"
  UnitId: neutron-openvswitch-l1/0
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.100.0/24 lookup 16 \n80000:\tfrom 192.168.200.0/24
    lookup 16 \n"
  UnitId: neutron-openvswitch-l1/1
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.100.0/24 lookup 16 \n80000:\tfrom 192.168.200.0/24
    lookup 16 \n"
  UnitId: neutron-openvswitch-l1/2
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.100.0/24 lookup 16 \n80000:\tfrom 192.168.200.0/24
    lookup 16 \n"
  UnitId: neutron-openvswitch-l2/0
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.100.0/24 lookup 16 \n80000:\tfrom 192.168.200.0/24
    lookup 16 \n"
  UnitId: neutron-openvswitch-l2/1
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.100.0/24 lookup 16 \n80000:\tfrom 192.168.200.0/24
    lookup 16 \n"
  UnitId: neutron-gateway-l2/0
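To check the rules programmatically, the sketch below extracts the source networks of rules that look up table 16 (the table number seen in this output, not a constant) and compares them with the subnets the operator expects to still be attached. It assumes the decoded `ip rule` Stdout of one unit; rule_sources is a hypothetical helper.

```python
import ipaddress


def rule_sources(stdout: str, table: str = "16") -> set:
    """Return source networks of rules of the form 'from <cidr> lookup <table>'."""
    sources = set()
    for line in stdout.splitlines():
        # Each line looks like "80000:\tfrom 192.168.200.0/24 lookup 16"
        _, _, rule = line.partition(":")
        fields = rule.split()
        if (len(fields) >= 4 and fields[0] == "from" and fields[1] != "all"
                and fields[2] == "lookup" and fields[3] == table):
            sources.add(ipaddress.ip_network(fields[1]))
    return sources


# Decoded Stdout of neutron-gateway-l1/0 from the output above.
unit_rules = (
    "0:\tfrom all lookup local \n"
    "32766:\tfrom all lookup main \n"
    "32767:\tfrom all lookup default \n"
    "80000:\tfrom 192.168.100.0/24 lookup 16 \n"
    "80000:\tfrom 192.168.200.0/24 lookup 16 \n"
)
expected = {ipaddress.ip_network("192.168.100.0/24")}  # what we expect to remain attached
unexpected = rule_sources(unit_rules) - expected
print(sorted(str(n) for n in unexpected))  # ['192.168.200.0/24']
```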

Tags: cpe-onsite
Dmitrii Shcherbakov (dmitriis) wrote :

The subnet is not even removed from the router after "openstack router remove subnet <router> <subnet>":

openstack router show pubrouter -f value -c interfaces_info && openstack router remove subnet pubrouter 6694cc70-7667-4583-8eec-1decb19063c9 && openstack router show pubrouter -f value -c interfaces_info
[{"subnet_id": "40a17fc3-30f2-4991-8541-a65e9717dd28", "ip_address": "192.168.100.1", "port_id": "a9696fa7-967c-4789-97c1-6ae47733ac0d"}, {"subnet_id": "6694cc70-7667-4583-8eec-1decb19063c9", "ip_address": "192.168.200.1", "port_id": "ad410866-0c90-46db-8cae-eb9f28e336fa"}, {"subnet_id": "40a17fc3-30f2-4991-8541-a65e9717dd28", "ip_address": "192.168.100.11", "port_id": "b8dcfe58-b6e8-4237-ac90-5245bd495664"}, {"subnet_id": "6694cc70-7667-4583-8eec-1decb19063c9", "ip_address": "192.168.200.12", "port_id": "ea140f1d-d006-4f77-b2c1-85d6fba94a4a"}]
[{"subnet_id": "40a17fc3-30f2-4991-8541-a65e9717dd28", "ip_address": "192.168.100.1", "port_id": "a9696fa7-967c-4789-97c1-6ae47733ac0d"}, {"subnet_id": "6694cc70-7667-4583-8eec-1decb19063c9", "ip_address": "192.168.200.1", "port_id": "ad410866-0c90-46db-8cae-eb9f28e336fa"}, {"subnet_id": "40a17fc3-30f2-4991-8541-a65e9717dd28", "ip_address": "192.168.100.11", "port_id": "b8dcfe58-b6e8-4237-ac90-5245bd495664"}, {"subnet_id": "6694cc70-7667-4583-8eec-1decb19063c9", "ip_address": "192.168.200.12", "port_id": "ea140f1d-d006-4f77-b2c1-85d6fba94a4a"}]

description: updated
Dmitrii Shcherbakov (dmitriis) wrote :

On a disabled router:
https://paste.ubuntu.com/p/8kpcq8hfbp/

Consecutive attempts to remove a subnet result in the following on the neutron API side:

https://paste.ubuntu.com/p/bxVHgmbgr9/
remove_router_interface failed (client error): There was a conflict when trying to complete your request.

summary: - [dvr] ip policy rules for tenant networks do not get deleted in qrouter
- namespaces after a router port is removed from a tenant network
+ [dvr] router remove subnet <router> <subnet> silently fails
Dmitrii Shcherbakov (dmitriis) wrote :

The SELECT 1 failure seems to be OK (SQLAlchemy pessimistic disconnect handling):
http://docs.sqlalchemy.org/en/latest/core/pooling.html#disconnect-handling-pessimistic

So the actual failure is:

remove_router_interface failed (client error): There was a conflict when trying to complete your request.

remove_router_interface
https://github.com/openstack/neutron/blob/stable/queens/neutron/db/l3_db.py#L1002-L1005

openstack port list --router pubrouter | grep 168.200
| ad410866-0c90-46db-8cae-eb9f28e336fa | | fa:16:3e:3a:0b:f8 | ip_address='192.168.200.1', subnet_id='6694cc70-7667-4583-8eec-1decb19063c9' | ACTIVE |
| ea140f1d-d006-4f77-b2c1-85d6fba94a4a | | fa:16:3e:a6:f3:a0 | ip_address='192.168.200.12', subnet_id='6694cc70-7667-4583-8eec-1decb19063c9' | ACTIVE |
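To pick out the router ports that still sit on the removed subnet, the sketch below parses rows of the `openstack port list --router` table shown above. It assumes each port has a single fixed IP that fits on one table row (real tables wrap multiple fixed IPs over several lines); ports_on_subnet and FIXED_IP are hypothetical names. Asking the client for `-f json` output would probably be more robust than parsing the table.

```python
import re
from typing import List, Tuple

# Matches one "ip_address='...', subnet_id='...'" pair in the Fixed IP Addresses cell.
FIXED_IP = re.compile(r"ip_address='([^']+)', subnet_id='([^']+)'")


def ports_on_subnet(table: str, subnet_id: str) -> List[Tuple[str, str]]:
    """Return (port_id, ip_address) for table rows whose fixed IP is on subnet_id."""
    ports: List[Tuple[str, str]] = []
    for row in table.splitlines():
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) < 5:
            continue  # blank lines and borders such as "+----+----+"
        port_id, fixed_ips = cells[0], cells[3]
        for ip, sid in FIXED_IP.findall(fixed_ips):
            if sid == subnet_id:
                ports.append((port_id, ip))
    return ports


rows = (
    "| ad410866-0c90-46db-8cae-eb9f28e336fa | | fa:16:3e:3a:0b:f8 | "
    "ip_address='192.168.200.1', subnet_id='6694cc70-7667-4583-8eec-1decb19063c9' | ACTIVE |\n"
    "| ea140f1d-d006-4f77-b2c1-85d6fba94a4a | | fa:16:3e:a6:f3:a0 | "
    "ip_address='192.168.200.12', subnet_id='6694cc70-7667-4583-8eec-1decb19063c9' | ACTIVE |\n"
)
for port_id, ip in ports_on_subnet(rows, "6694cc70-7667-4583-8eec-1decb19063c9"):
    print(port_id, ip)
```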

One port is:

binding_vif_type | ovs
device_owner | network:router_centralized_snat

The other is:

binding_vif_type | distributed
device_owner | network:router_interface_distributed

https://paste.ubuntu.com/p/zTP3SzCBxM/

ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip a s | grep 200
    inet 192.168.200.1/24 brd 192.168.200.255 scope global qr-ad410866-0c

From an instance I can ping 192.168.200.1 but not 192.168.200.12:

https://paste.ubuntu.com/p/Hw8rr4bCFb/

Deleting ports manually is not possible:

https://paste.ubuntu.com/p/Znnb6t4HPP/
cannot be deleted directly via the port API: has device owner network:router_centralized_snat
cannot be deleted directly via the port API: has device owner network:router_interface_distributed

Dmitrii Shcherbakov (dmitriis) wrote :

"cannot be deleted directly via the port API" is by design it seems:

https://bugs.launchpad.net/neutron/+bug/1425504

However, deleting these ports by removing the subnet from the router should be possible.

The client debug log revealed the issue: I also had a static route on pubrouter (neutron extra routes extension) whose next hop is in 192.168.200.0/24.

openstack router remove subnet 4f9ca9ef-303b-4082-abbc-e50782d9b800 6694cc70-7667-4583-8eec-1decb19063c9 --debug

http://10.232.1.207:9696 "PUT /v2.0/routers/4f9ca9ef-303b-4082-abbc-e50782d9b800/remove_router_interface HTTP/1.1" 409 257
RESP: [409] Content-Type: application/json Content-Length: 257 X-Openstack-Request-Id: req-afac4d67-d309-4bcd-83d6-bfd08d6a6d9b Date: Thu, 29 Mar 2018 19:30:24 GMT Connection: keep-alive
RESP BODY: {"NeutronError": {"message": "Router interface for subnet 6694cc70-7667-4583-8eec-1decb19063c9 on router 4f9ca9ef-303b-4082-abbc-e50782d9b800 cannot be deleted, as it is required by one or more routes.", "type": "RouterInterfaceInUseByRoute", "detail": ""}}
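The CLI did not surface this 409, but the response body carries a typed error. The sketch below pulls the type and message out of a body like the one above (copied verbatim); neutron_error is a hypothetical helper, and the only assumption about the format is the top-level "NeutronError" object seen here.

```python
import json
from typing import Tuple


def neutron_error(body: str) -> Tuple[str, str]:
    """Return (type, message) from a Neutron API error response body."""
    error = json.loads(body)["NeutronError"]
    return error["type"], error["message"]


body = ('{"NeutronError": {"message": "Router interface for subnet '
        '6694cc70-7667-4583-8eec-1decb19063c9 on router '
        '4f9ca9ef-303b-4082-abbc-e50782d9b800 cannot be deleted, as it is '
        'required by one or more routes.", "type": "RouterInterfaceInUseByRoute", '
        '"detail": ""}}')
err_type, message = neutron_error(body)
print(err_type)  # RouterInterfaceInUseByRoute
```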

openstack router show pubrouter -c routes -f value
destination='8.8.8.8/32', gateway='192.168.200.10'
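The route's gateway 192.168.200.10 lies inside othertenantsubnet (192.168.200.0/24), which is why the interface is "required by one or more routes". The sketch below illustrates that condition with the standard ipaddress module; it is an assumption about the kind of check behind RouterInterfaceInUseByRoute, not Neutron's actual code, and routes_blocking_removal is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List, Tuple

Route = Tuple[str, str]  # (destination, gateway/next hop)


def routes_blocking_removal(subnet_cidr: str, routes: Iterable[Route]) -> List[Route]:
    """Return the extra routes whose next hop lies inside subnet_cidr."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return [(dst, gw) for dst, gw in routes if ipaddress.ip_address(gw) in subnet]


# The only extra route on pubrouter, from "openstack router show pubrouter -c routes".
routes = [("8.8.8.8/32", "192.168.200.10")]
print(routes_blocking_removal("192.168.200.0/24", routes))  # [('8.8.8.8/32', '192.168.200.10')]
print(routes_blocking_removal("192.168.100.0/24", routes))  # []
```

`openstack router set --no-route` clears all extra routes; to drop only the blocking one, `openstack router unset --route destination=8.8.8.8/32,gateway=192.168.200.10 pubrouter` should also work.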

openstack router set --no-route pubrouter

After that the route was deleted and removing the subnet succeeded:

openstack router remove subnet 4f9ca9ef-303b-4082-abbc-e50782d9b800 6694cc70-7667-4583-8eec-1decb19063c9 --debug

PUT call to network for http://10.232.1.207:9696/v2.0/routers/4f9ca9ef-303b-4082-abbc-e50782d9b800/remove_router_interface used request id req-cf7a0614-5fd5-4842-aef4-422be3ab96b1
Manager RegionOne ran task network.PUT.routers.remove_router_interface in 4.53609895706s

However, the policy rule for 192.168.200.0/24 was kept, while the rule for 192.168.100.0/24 was deleted instead, which should not have happened.

juju run --application neutron-gateway-l2,neutron-gateway-l1,neutron-openvswitch-l1,neutron-openvswitch-l2 'sudo ip netns exec qrouter-4f9ca9ef-303b-4082-abbc-e50782d9b800 ip rule'
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.200.0/24 lookup 16 \n"
  UnitId: neutron-gateway-l1/0
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.200.0/24 lookup 16 \n"
  UnitId: neutron-openvswitch-l1/0
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.200.0/24 lookup 16 \n"
  UnitId: neutron-openvswitch-l1/1
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.200.0/24 lookup 16 \n"
  UnitId: neutron-openvswitch-l1/2
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.200.0/24 lookup 16 \n"
  UnitId: neutron-openvswitch-l2/0
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.200.0/24 lookup 16 \n"
  UnitId: neutron-openvswitch-l2/1
- Stdout: "0:\tfrom all lookup local \n32766:\tfrom all lookup main \n32767:\tfrom
    all lookup default \n80000:\tfrom 192.168.200.0/24 lookup 16 \n"
  UnitId: neutron-gateway-l2/0
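The mismatch can be stated as a set comparison: after removing 192.168.200.0/24, only its rule should disappear. The sketch below takes the "from <cidr> lookup 16" sources before and after the removal (as reported by the units in this bug) and reports both kinds of error; unexpected_rule_changes is a hypothetical helper.

```python
from typing import Dict, List, Set


def unexpected_rule_changes(before: Set[str], after: Set[str],
                            removed: str) -> Dict[str, List[str]]:
    """Compare 'from <cidr> lookup 16' sources before and after a subnet removal."""
    expected = before - {removed}  # only the removed subnet's rule should go away
    return {
        "stale": sorted(after - expected),            # rules that should have been deleted
        "wrongly_deleted": sorted(expected - after),  # rules that should have stayed
    }


before = {"192.168.100.0/24", "192.168.200.0/24"}
after = {"192.168.200.0/24"}  # what every unit reports above
print(unexpected_rule_changes(before, after, "192.168.200.0/24"))
# {'stale': ['192.168.200.0/24'], 'wrongly_deleted': ['192.168.100.0/24']}
```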

Disabling and re-en...


Changed in neutron (Ubuntu):
status: New → Invalid
Dmitrii Shcherbakov (dmitriis) wrote :

The policy rule issue is now tracked in https://bugs.launchpad.net/neutron/+bug/1759956
