Default scope rules added to router may drop traffic unexpectedly

Bug #1667755 reported by James Denton
This bug affects 3 people
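+---------+---------+------------+-------------+-----------+
| Affects | Status  | Importance | Assigned to | Milestone |
+---------+---------+------------+-------------+-----------+
| neutron | Invalid | Undecided  | Unassigned  |           |
+---------+---------+------------+-------------+-----------+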

Bug Description

Release: OpenStack-Ansible 13.3.4 (Mitaka)

Scenario:

Each Neutron router is connected to a single provider network and a single tenant network. Floating IPs are *not* used, and SNAT is disabled on the router:

+-------------------------+-------------------------------------------------------------------------------+
| Field                   | Value                                                                         |
+-------------------------+-------------------------------------------------------------------------------+
| admin_state_up          | True                                                                          |
| availability_zone_hints |                                                                               |
| availability_zones      | nova                                                                          |
| description             |                                                                               |
| distributed             | False                                                                         |
| external_gateway_info   | {"network_id": "ce830329-4133-41fe-868f-698cc761e247", "enable_snat": false,  |
|                         | "external_fixed_ips": [{"subnet_id": "cf34a5c3-5d26-449f-b22e-2e3fdd69f262",  |
|                         | "ip_address": "10.152.114.39"}]}                                              |
| ha                      | False                                                                         |
| id                      | c965e7a1-98c0-4d5e-8dcb-cfafc2667ee1                                          |
| name                    | RTR                                                                           |
| routes                  |                                                                               |
| status                  | ACTIVE                                                                        |
| tenant_id               | 2ed1712187674c64acae83948e5b1928                                              |
+-------------------------+-------------------------------------------------------------------------------+

Upstream routes exist that route tenant network traffic to the qg interface of the routers (static, not BGP - yet).

In some cases, we have found that inbound/outbound traffic is getting dropped within the Neutron qrouter namespace. Comparing against a working router, we found some differences in the iptables rules:

Working router:

*mangle
-A neutron-l3-agent-scope -i qr-3dd65e85-f2 -j MARK --set-xmark 0x4010000/0xffff0000
-A neutron-l3-agent-scope -i qg-2f55db22-5b -j MARK --set-xmark 0x4010000/0xffff0000

*filter
-A neutron-l3-agent-scope -o qr-3dd65e85-f2 -m mark ! --mark 0x4010000/0xffff0000 -j DROP
-A neutron-l3-agent-scope -o qg-2f55db22-5b -m mark ! --mark 0x4010000/0xffff0000 -j DROP

Non-working router:

*mangle
-A neutron-l3-agent-scope -i qg-e3f65cf1-29 -j MARK --set-xmark 0x4010000/0xffff0000
-A neutron-l3-agent-scope -i qr-125a3dc5-e3 -j MARK --set-xmark 0x4000000/0xffff0000

*filter
-A neutron-l3-agent-scope -o qg-e3f65cf1-29 -m mark ! --mark 0x4010000/0xffff0000 -j DROP
-A neutron-l3-agent-scope -o qr-125a3dc5-e3 -m mark ! --mark 0x4000000/0xffff0000 -j DROP

Our working theory is that the marks on the non-working router are mismatched: traffic entering the qg interface is marked 0x4010000 (x401 in the upper 16 bits), while the egress filter on the qr interface only accepts 0x4000000 (x400). The reverse direction fails the same way, since traffic entering the qr interface is marked x400 and the egress filter on the qg interface only accepts x401. We tested this theory by swapping the marks on those two filter rules, after which inbound/outbound traffic worked properly.

On the working router, the mangle rules set the same mark (x401) on both interfaces, so the filter rules pass the traffic.
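
The mismatch can be modeled in a few lines of Python. This is only a sketch of the mark-and-check behavior implied by the rules quoted above, not Neutron's implementation. The names `scope_id`, `forwarded`, `WORKING` and `NON_WORKING` are hypothetical, and the model assumes a packet carries no mark before it hits the mangle rule.

```python
MASK = 0xFFFF0000  # mask used by every neutron-l3-agent-scope rule quoted above


def scope_id(mark: int) -> int:
    """Return the upper 16 bits of a mark (0x4010000 -> 0x401)."""
    return (mark & MASK) >> 16


def forwarded(in_if: str, out_if: str, marks: dict) -> bool:
    """Model one packet crossing the router.

    The mangle rule marks the packet according to its ingress interface;
    the filter rule on the egress interface drops it unless the masked
    mark equals the mark configured for that egress interface.
    """
    packet_mark = marks[in_if] & MASK      # assumes the packet arrived unmarked
    return packet_mark == (marks[out_if] & MASK)


# Interface -> mark, copied from the rules in this report.
WORKING = {"qr-3dd65e85-f2": 0x4010000, "qg-2f55db22-5b": 0x4010000}
NON_WORKING = {"qg-e3f65cf1-29": 0x4010000, "qr-125a3dc5-e3": 0x4000000}

print(forwarded("qg-2f55db22-5b", "qr-3dd65e85-f2", WORKING))      # True
print(forwarded("qg-e3f65cf1-29", "qr-125a3dc5-e3", NON_WORKING))  # False
```

With the non-working marks, both directions are dropped, which matches the observed loss of inbound and outbound traffic.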

We are not sure at this time how the mark is determined. While we can replicate the issue on new routers in this environment, we have been unable to replicate it in other environments.

Please let us know if you need any additional info.

tags: added: sg-fw
tags: added: l3-ipam-dhcp
Kevin Benton (kevinbenton) wrote :

It looks like it thinks they are coming from different address scopes. Can you confirm by checking the address scope on both sides of the router via the API (ipv4_address_scope on the external network matches the ipv4_address_scope on the internal network)?
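
The check suggested above could be scripted along these lines. This is a rough sketch that assumes the two network bodies have already been fetched from the Networking API as dictionaries; only the `ipv4_address_scope` field name comes from the comment above, while the function name and the sample values are hypothetical placeholders.

```python
def same_ipv4_address_scope(external_net: dict, internal_net: dict) -> bool:
    """Return True when both network bodies report the same ipv4_address_scope."""
    return external_net.get("ipv4_address_scope") == internal_net.get("ipv4_address_scope")


# Hypothetical, trimmed-down network bodies; the scope IDs are placeholders.
provider = {"name": "provider", "ipv4_address_scope": "scope-a"}
tenant_ok = {"name": "tenant-ok", "ipv4_address_scope": "scope-a"}
tenant_bad = {"name": "tenant-bad", "ipv4_address_scope": None}

print(same_ipv4_address_scope(provider, tenant_ok))   # True
print(same_ipv4_address_scope(provider, tenant_bad))  # False
```

A `False` result for the networks on either side of a router would be consistent with the differing marks seen in iptables.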

Sean Carlisle (sean-carlisle) wrote :

@kevinbenton Thanks for the advice! The address scope for the affected networks does match what it should be. However, I just took a second look at James's initial post, and it looks like the wrong external network may have been specified during router creation. We'll take another look.

Thanks,

Sean

James Denton (james-denton) wrote :

Hey Kevin - Confirming now, but this may be a case of the provider and tenant networks not being in the same address scope. Folks are working on verifying the config. Sorry for the fire drill - thanks for the quick response.

Sean Carlisle (sean-carlisle) wrote :

Disregard. That router information is not the router associated with the networks in question.

Luke Yildirim (luke.yildirim) wrote :

Kevin:

Thanks very much for the insight here. You were right. I did a mini test and verified. I had done another test where we re-created the objects without an address scope or subnet pools, but it didn't work because the provider network was still part of an address scope/subnet pool, which resulted in different marks in iptables.

Regards,

Luke Yildirim

James Denton (james-denton) wrote :

It has been determined that the networks attached to the router were associated with different address scopes. Additional testing has found that the proper rules are being added. Marking as invalid.

Changed in neutron:
status: New → Invalid
John Davidge (john-davidge) wrote :

Thanks for the update James :)
