When a fip is added to a VM with DVR, previous connections lose connectivity

Bug #1818824 reported by Candido Campos Rivas
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
neutron   New      Low          Unassigned

Bug Description

Without DVR, existing connections keep using the SNAT IP and new connections use the fip.
So, without DVR:
 1) add fip -> the conntrack entries are not deleted
 2) delete fip -> the conntrack entries are deleted

I don't know whether 1 is the expected behavior or a bug. Deleting the conntrack entries in both cases (1 and 2) would be a simpler solution, with fewer special cases.

So we should first agree on the desired behavior when a fip is added, because with DVR it is currently not working.

If the decision is not to maintain the old connections, then:

  - the conntrack entries should be deleted, both with and without DVR.
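For the first option, a minimal sketch of the cleanup could look like the following. The function name, its parameters and the `DRY_RUN` switch are hypothetical (nothing here is existing neutron code); it only uses standard `ip netns exec` and `conntrack -D` filters, and by default it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: delete the conntrack entries of one VM port in a router namespace.
# NAMESPACE, FIXED_IP and FIP are hypothetical inputs; by default the commands
# are only printed (set DRY_RUN=0 to execute them, as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: flush_port_conntrack NAMESPACE FIXED_IP [FIP]
flush_port_conntrack() {
    ns=$1 fixed_ip=$2 fip=${3:-}
    # Connections initiated by the VM: the original source is the fixed IP.
    run ip netns exec "$ns" conntrack -D --orig-src "$fixed_ip"
    # Inbound connections DNATed from the floating IP, if one is given.
    if [ -n "$fip" ]; then
        run ip netns exec "$ns" conntrack -D --orig-dst "$fip"
    fi
}
```

Called as `flush_port_conntrack qrouter-<router id> 10.2.0.12`, it prints a single `conntrack -D --orig-src 10.2.0.12` command; running it both when the fip is added and when it is removed gives the "delete in 1 and 2" behavior.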

If the decision is to maintain the old connections, then:

  - the fix would only be necessary for DVR, and it consists of creating a path for the external traffic that is not NATed in the compute node's qrouter (for those connections the NAT is done in the controller's qrouter) (*).

Related bug: https://bugs.launchpad.net/neutron/+bug/1818805

"Conntrack rules in the qrouter are not deleted when a fip is removed with dvr"

(*)

With DVR, the external traffic is managed using policy-based routing (PBR) in the qrouter:

 without fip:

[root@compute-2 heat-admin]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: rfp-d01c89b0-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:09:90:b6:24:47 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.114/31 scope global rfp-d01c89b0-c
       valid_lft forever preferred_lft forever
    inet6 fe80::409:90ff:feb6:2447/64 scope link
       valid_lft forever preferred_lft forever
43: qr-c47b0417-7d: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:d9:7d:01 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.1/24 brd 10.2.0.255 scope global qr-c47b0417-7d
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fed9:7d01/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-2 heat-admin]# ip r
10.2.0.0/24 dev qr-c47b0417-7d proto kernel scope link src 10.2.0.1
169.254.106.114/31 dev rfp-d01c89b0-c proto kernel scope link src 169.254.106.114
[root@compute-2 heat-admin]# ip rule list
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
167903233: from 10.2.0.1/24 lookup 167903233
[root@compute-2 heat-admin]# ip r show table 167903233
default via 10.2.0.8 dev qr-c47b0417-7d
[root@compute-2 heat-admin]# ip route get 8.8.8.8 from 10.2.0.12 iif qr-c47b0417-7d
8.8.8.8 from 10.2.0.12 via 10.2.0.8 dev qr-c47b0417-7d
    cache iif *

The traffic is sent via 10.2.0.8 to the qrouter on the controller.

With fip:

[root@compute-1 heat-admin]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: rfp-d01c89b0-c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:5e:ac:51:83:7a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.114/31 scope global rfp-d01c89b0-c
       valid_lft forever preferred_lft forever
    inet6 fe80::45e:acff:fe51:837a/64 scope link
       valid_lft forever preferred_lft forever
23: qr-c47b0417-7d: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:d9:7d:01 brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.1/24 brd 10.2.0.255 scope global qr-c47b0417-7d
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fed9:7d01/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-1 heat-admin]# ip r
10.2.0.0/24 dev qr-c47b0417-7d proto kernel scope link src 10.2.0.1
169.254.106.114/31 dev rfp-d01c89b0-c proto kernel scope link src 169.254.106.114
[root@compute-1 heat-admin]# ip rule list
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
57483: from 10.2.0.12 lookup 16
167903233: from 10.2.0.1/24 lookup 167903233
[root@compute-1 heat-admin]# ip r show table 16
default via 169.254.106.115 dev rfp-d01c89b0-c
[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]# ip route get 8.8.8.8 from 10.2.0.12 iif qr-c47b0417-7d
8.8.8.8 from 10.2.0.12 via 169.254.106.115 dev rfp-d01c89b0-c
    cache iif *

The traffic is sent to the fip namespace (via 169.254.106.115) and goes out directly, without passing through the controller.
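The difference between the two hosts comes from rule order: `ip rule` entries are evaluated in ascending priority, so on compute-1 the FIP rule 57483 (`from 10.2.0.12 lookup 16`) is reached before the per-subnet rule 167903233. The helper below is a hypothetical, simplified model of that first-match lookup over `ip rule list` output; it assumes, as the `ip r` output above shows, that the local/main/default tables hold no route to an external destination, so they are skipped, and it only understands plain `from ... lookup ...` rules.

```shell
#!/bin/sh
# Hypothetical helper: print the table of the first "from ... lookup ..." rule,
# in priority order, that matches source address $1 (rules read on stdin).
# Simplification: local/main/default are skipped, assuming they hold no route
# to the external destination (true for the "ip r" output in this report).
pbr_table() {
    sort -t: -k1,1n | awk -v src="$1" '
        function ip2int(a,   p) {
            split(a, p, ".")
            return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
        }
        # Selector is "all", a host address or a prefix such as 10.2.0.1/24;
        # only the prefix bits are compared.
        function matches(sel,   q, len, div) {
            if (sel == "all") return 1
            len = (split(sel, q, "/") == 2) ? q[2] : 32
            div = 2 ^ (32 - len)
            return int(ip2int(src) / div) == int(ip2int(q[1]) / div)
        }
        $2 == "from" && $4 == "lookup" {
            if ($5 == "local" || $5 == "main" || $5 == "default") next
            if (matches($3)) { print $5; exit }
        }'
}
```

Fed the compute-1 rule list above, `pbr_table 10.2.0.12` prints `16` (the fip path), while with the compute-2 list it prints `167903233` (the path to the controller).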

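The problem is that the traffic of old connections, which already have conntrack entries, is sent to the fip namespace without SNAT: the nat table is only consulted for the first packet of a connection, so the existing entries never pick up the fip SNAT and their packets leave with the private source 10.2.0.12. This traffic should instead be sent to the controller, in the same way as in the scenario without fip.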

traffic before adding fip:

[root@compute-1 heat-admin]#
[root@compute-1 heat-admin]# tcpdump -nei rfp-d01c89b0-c
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on rfp-d01c89b0-c, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@compute-1 heat-admin]# tcpdump -nei qr-c47b0417-7d
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-c47b0417-7d, link-type EN10MB (Ethernet), capture size 262144 bytes
11:44:06.503464 fa:16:3e:6e:0e:f7 > fa:16:3e:d9:7d:01, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 32, length 64
11:44:06.503515 fa:16:3e:d9:7d:01 > fa:16:3e:53:66:de, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 32, length 64
11:44:07.504016 fa:16:3e:6e:0e:f7 > fa:16:3e:d9:7d:01, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 33, length 64
11:44:07.504071 fa:16:3e:d9:7d:01 > fa:16:3e:53:66:de, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 33, length 64
11:44:08.504556 fa:16:3e:6e:0e:f7 > fa:16:3e:d9:7d:01, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 34, length 64
11:44:08.504616 fa:16:3e:d9:7d:01 > fa:16:3e:53:66:de, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 34, length 64

traffic after adding fip:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on rfp-d01c89b0-c, link-type EN10MB (Ethernet), capture size 262144 bytes
11:46:42.580817 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 188, length 64
11:46:43.581222 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 189, length 64
11:46:44.581685 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 190, length 64
11:46:45.582043 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 191, length 64
11:46:46.582698 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 192, length 64
11:46:47.583142 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 193, length 64
11:46:48.583620 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 194, length 64
11:46:49.584134 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 195, length 64
11:46:50.584554 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 196, length 64
11:46:51.585165 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 197, length 64
11:46:52.585599 06:5e:ac:51:83:7a > 32:b6:27:4f:1e:1e, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 198, length 64
^C
11 packets captured
11 packets received by filter
0 packets dropped by kernel
[root@compute-1 heat-admin]# tcpdump -nei qr-c47b0417-7d
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-c47b0417-7d, link-type EN10MB (Ethernet), capture size 262144 bytes
11:46:57.587549 fa:16:3e:6e:0e:f7 > fa:16:3e:d9:7d:01, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 203, length 64
11:46:57.599214 fa:16:3e:6e:0e:f7 > fa:16:3e:d9:7d:01, ethertype ARP (0x0806), length 42: Request who-has 10.2.0.1 tell 10.2.0.12, length 28
11:46:57.599238 fa:16:3e:d9:7d:01 > fa:16:3e:6e:0e:f7, ethertype ARP (0x0806), length 42: Reply 10.2.0.1 is-at fa:16:3e:d9:7d:01, length 28
11:46:58.588062 fa:16:3e:6e:0e:f7 > fa:16:3e:d9:7d:01, ethertype IPv4 (0x0800), length 98: 10.2.0.12 > 8.8.8.8: ICMP echo request, id 46081, seq 204, length 64
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
[root@compute-1 heat-admin]# contrack -L
bash: contrack: command not found
[root@compute-1 heat-admin]# conntrack -L
icmp 1 29 src=10.2.0.12 dst=8.8.8.8 type=8 code=0 id=46081 [UNREPLIED] src=8.8.8.8 dst=10.2.0.12 type=0 code=0 id=46081 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
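The entry above shows how such old connections can be recognized: its reply tuple ends in `dst=10.2.0.12`, i.e. no NAT was applied in this qrouter (the SNAT happens on the controller), whereas a connection SNATed to the fip in the qrouter would reply to the fip address. A hypothetical filter for that, assuming the default `conntrack -L` output format shown here (first `src=`/`dst=` pair is the original tuple, second pair the reply), could be:

```shell
#!/bin/sh
# Hypothetical helper: read "conntrack -L" lines on stdin and print the
# entries of FIXED_IP ($1) that were not NATed in this namespace, i.e. whose
# reply tuple still points back at the fixed IP.
unnatted_conns() {
    awk -v ip="$1" '{
        ns = 0; nd = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) src[++ns] = substr($i, 5)
            else if ($i ~ /^dst=/) dst[++nd] = substr($i, 5)
        }
        # src[1]/dst[1] belong to the original tuple, src[2]/dst[2] to the reply.
        if (ns >= 2 && nd >= 2 && src[1] == ip && dst[2] == ip) print
    }'
}
```

Applied to the output above with `10.2.0.12`, it prints the ICMP entry; an entry whose reply goes to a fip (for example the documentation address 192.0.2.10) is filtered out.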

tags: added: l3-dvr-backlog
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

You mentioned that there is another bug related to this.
Is this a duplicate of the original one, 1818805?

YAMAMOTO Takashi (yamamoto) wrote :

The behaviour difference between legacy and DVR looks plausible. I'm not sure how it should be fixed, though.

Changed in neutron:
importance: Undecided → Low
status: New → Confirmed
Candido Campos Rivas (ccamposr) wrote :

1818805 is not a duplicate: it is about deleting the fip.

The problem in that case is that the conntrack entries related to the fip aren't deleted when the fip is deleted. There the fix is clear :).

But I marked it as related because both bugs concern the management of fip resources.

Related bug: https://bugs.launchpad.net/neutron/+bug/1818805

"Conntrack rules in the qrouter are not deleted when a fip is removed with dvr"

Gabriele Cerami (gcerami) wrote :

Newbie here and there's a lot to learn. Expect slow resolution. If someone thinks this is too urgent to be assigned to me, please tell me.

Changed in neutron:
assignee: nobody → Gabriele Cerami (gcerami)
Gabriele Cerami (gcerami) wrote :

The only way to keep the connections open is to maintain the old routing path as an exception for the existing connections.
We could distinguish the old connections by updating their conntrack entries with a mark, and then use that mark to route their traffic differently.

Summary of the steps, following the example above:

- Mark the existing connections:
  conntrack -U --src 10.2.0.12 --mark $OLDCONNMARK
- Add an iptables rule that copies the connection mark onto every packet of those connections:
  iptables -t mangle -A PREROUTING -m connmark --mark $OLDCONNMARK -j CONNMARK --restore-mark
- Add a rule that routes packets carrying this mark without using the FIP (it must have a lower priority value than the FIP rule, 57483, so that it is evaluated first):
  ip rule add from 10.2.0.12 fwmark $OLDCONNMARK table 167903233
- Add the FIP.
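Put together for one fixed IP, the steps could be scripted roughly as below. This is only a sketch: the function name, its parameters and the `DRY_RUN` switch are hypothetical, the commands are printed rather than executed by default, and the explicit `priority` argument is an addition to make the ordering against the FIP rule visible. Note also that `CONNMARK --restore-mark` overwrites the whole packet mark, and neutron's L3 agent may already use packet marks in the mangle table, so a real patch would probably need a dedicated mark bit and mask.

```shell
#!/bin/sh
# Sketch of the proposed steps for one fixed IP, run in the router namespace.
# All parameters are hypothetical; commands are only printed unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: keep_old_conns NAMESPACE FIXED_IP SNAT_TABLE MARK PRIORITY
keep_old_conns() {
    ns=$1 fixed_ip=$2 snat_table=$3 mark=$4 prio=$5
    # 1. Tag the connections that already exist for the fixed IP.
    run ip netns exec "$ns" conntrack -U --orig-src "$fixed_ip" --mark "$mark"
    # 2. Copy the connection mark onto each packet of those connections.
    run ip netns exec "$ns" iptables -t mangle -A PREROUTING \
        -m connmark --mark "$mark" -j CONNMARK --restore-mark
    # 3. Route marked packets with the pre-FIP table (towards the controller);
    #    PRIORITY must be lower than that of the FIP rule (57483 above).
    run ip netns exec "$ns" ip rule add from "$fixed_ip" fwmark "$mark" \
        table "$snat_table" priority "$prio"
    # 4. Only after this would the FIP itself be associated.
}
```

With the values from this report, `keep_old_conns qrouter-<router id> 10.2.0.12 167903233 16 57000` prints the three commands in order; step 4 stays a separate API call.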

Does this make sense? Am I missing something? For example: should the return traffic be altered in some way? Are filter rules altered when a FIP is used for a VM? Should we somehow tell the controller not to stop SNATing the VM?

While waiting for feedback, I'll try to understand where in the code I could implement these steps, and find the correct tables/namespaces to apply them to.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/656665

Changed in neutron:
status: Confirmed → In Progress
Brian Haley (brian-haley) wrote :

Getting back to Candido's original question: should we continue using the default SNAT IP once a floating IP is assigned?

I think continuing to use the SNAT IP was an oversight and we should use the Floating IP once it's assigned. Thoughts?

Ryan Tidwell (ryan-tidwell) wrote :

I agree that continuing to use the SNAT IP is an oversight. The semantics of associating a FIP imply that all communication across the router changes to a 1-1 NAT of the fixed IP to the floating IP. To me that implies that we should be using the FIP as soon as it is associated.

Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Gabriele Cerami (gcerami) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.opendev.org/656665
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
