SNAT HA failed because of missing nat rule in snat namespace iptable

Bug #1593354 reported by Hao Chen
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
neutron   Invalid   Undecided    Unassigned

Bug Description

I have a Mitaka OpenStack deployment with neutron DVR enabled. When I tested SNAT HA failover, I found that even though the snat namespace was created on the backup node, the nat table in that namespace's iptables has no rules. Running "ip a" in the snat namespace also shows that the sg port is missing.

Here is what I found on the second neutron network node:

sandy-pistachio:/opt/openstack # ip netns
qrouter-e25b81f9-8810-4654-9be0-ebac09c700fb
qdhcp-abe36e89-f7a5-4cbd-a7e4-852d80ed92d6
snat-e25b81f9-8810-4654-9be0-ebac09c700fb

sandy-pistachio:/opt/openstack # ip netns exec snat-e25b81f9-8810-4654-9be0-ebac09c700fb ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
70: qg-cc3b2f8c-b7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:cb:27:cd brd ff:ff:ff:ff:ff:ff
    inet 10.240.117.98/28 brd 10.240.117.111 scope global qg-cc3b2f8c-b7
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fecb:27cd/64 scope link
       valid_lft forever preferred_lft forever

sandy-pistachio:/opt/openstack # ip netns exec snat-e25b81f9-8810-4654-9be0-ebac09c700fb iptables -L -n -v -t nat
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
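
The two symptoms shown above (no sg- interface, empty nat table) could be checked with a small script. This is only a minimal sketch, assuming the text output of "ip a" and "iptables -L -n -v -t nat" has already been captured from the snat namespace; the function names are hypothetical and are not part of neutron.

```python
import re


def interface_names(ip_a_output):
    """Return interface names found in `ip a` output, in order of appearance."""
    names = []
    for line in ip_a_output.splitlines():
        # Interface header lines look like "70: qg-cc3b2f8c-b7: <BROADCAST,...".
        match = re.match(r"^\d+:\s+([^:@\s]+)", line)
        if match:
            names.append(match.group(1))
    return names


def nat_rule_counts(iptables_output):
    """Count rule lines per chain in `iptables -L -n -v -t nat` output."""
    counts = {}
    chain = None
    for line in iptables_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Chain "):
            chain = stripped.split()[1]
            counts[chain] = 0
        elif chain and stripped and not stripped.startswith("pkts"):
            # Any non-empty line that is not the column header is a rule.
            counts[chain] += 1
    return counts


def snat_namespace_problems(ip_a_output, iptables_output):
    """List the symptoms from this report that are present (hypothetical helper)."""
    problems = []
    if not any(n.startswith("sg-") for n in interface_names(ip_a_output)):
        problems.append("no sg- interface")
    if sum(nat_rule_counts(iptables_output).values()) == 0:
        problems.append("nat table has no rules")
    return problems
```

Fed the output pasted above, snat_namespace_problems() would report both symptoms.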

Here is the package information:

provo-pistachio:/opt/openstack # zypper info openstack-neutron
Loading repository data...
Reading installed packages...

Information for package openstack-neutron:
------------------------------------------
Repository: Mitaka
Name: openstack-neutron
Version: 8.1.1~a0~dev32-2.1
Arch: noarch
Vendor: obs://build.opensuse.org/Cloud:OpenStack
Installed: Yes
Status: up-to-date
Installed Size: 235.1 KiB
Summary: OpenStack Network
Description:
  Neutron is a virtual network service for Openstack.

  Just like OpenStack Nova provides an API to dynamically request and
  configure virtual servers, Neutron provides an API to dynamically
  request and configure virtual networks. These networks connect
  "interfaces" from other OpenStack services (e.g., vNICs from Nova VMs).
  The Neutron API supports extensions to provide advanced network
  capabilities (e.g., QoS, ACLs, network monitoring, etc)

provo-pistachio:/opt/openstack # zypper info openstack-neutron-openvswitch-agent
Loading repository data...
Reading installed packages...

Information for package openstack-neutron-openvswitch-agent:
------------------------------------------------------------
Repository: Mitaka
Name: openstack-neutron-openvswitch-agent
Version: 8.1.1~a0~dev32-2.1
Arch: noarch
Vendor: obs://build.opensuse.org/Cloud:OpenStack
Installed: Yes
Status: up-to-date
Installed Size: 14.9 KiB
Summary: OpenStack Network - Open vSwitch
Description:
  This package provides the OpenVSwitch Agent.

provo-pistachio:/opt/openstack # zypper info openstack-neutron-l3-agent
Loading repository data...
Reading installed packages...

Information for package openstack-neutron-l3-agent:
---------------------------------------------------
Repository: Mitaka
Name: openstack-neutron-l3-agent
Version: 8.1.1~a0~dev32-2.1
Arch: noarch
Vendor: obs://build.opensuse.org/Cloud:OpenStack
Installed: Yes
Status: up-to-date
Installed Size: 24.7 KiB
Summary: OpenStack Network Service (Neutron) - L3 Agent
Description:
  This package provides the L3 Agent.

Hao Chen (chenh1987)
description: updated
Assaf Muller (amuller) wrote :

Check out https://bugs.launchpad.net/neutron/+bug/1571113, let me know if it's the same bug.

tags: added: l3-dvr-backlog l3-ha
Hao Chen (chenh1987) wrote :

I believe this is not the same issue as https://bugs.launchpad.net/neutron/+bug/1571113. Here is what I saw before the node failure, when everything looked good:
wichita-citron:/opt/openstack # ip netns exec snat-c83cfa28-685e-4363-859e-18400b27ee4f ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
161: qg-97c348f7-d9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:9f:f9:cf brd ff:ff:ff:ff:ff:ff
    inet 10.240.127.2/32 scope global qg-97c348f7-d9
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe9f:f9cf/64 scope link
       valid_lft forever preferred_lft forever
163: sg-019439c9-de: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:c2:7d:2c brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 brd 192.168.1.255 scope global sg-019439c9-de
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fec2:7d2c/64 scope link
       valid_lft forever preferred_lft forever

wichita-citron:/opt/openstack # neutron router-port-list admin-router -c fixed_ips
+-------------------------------------------------------------------------------------+
| fixed_ips |
+-------------------------------------------------------------------------------------+
| {"subnet_id": "3c0d5091-1349-4b42-aede-c62717c83b72", "ip_address": "192.168.1.3"} |
| {"subnet_id": "3c0d5091-1349-4b42-aede-c62717c83b72", "ip_address": "192.168.1.1"} |
| {"subnet_id": "5e0665ec-020b-4626-98c0-29490f675c7d", "ip_address": "10.240.127.2"} |
+-------------------------------------------------------------------------------------+

Then I rebooted the active neutron router node, which hosted the snat namespace. The failover happened, and the snat namespace was created on the other network node. The router-port-list output is still correct, as shown below; only the "sg-019439c9-de" interface has disappeared from the snat namespace.

honolulu-citron:/opt/openstack # neutron router-port-list admin-router -c fixed_ips
+-------------------------------------------------------------------------------------+
| fixed_ips |
+-------------------------------------------------------------------------------------+
| {"subnet_id": "3c0d5091-1349-4b42-aede-c62717c83b72", "ip_address": "192.168.1.3"} |
| {"subnet_id": "3c0d5091-1349-4b42-aede-c62717c83b72", "ip_address": "192.168.1.1"} |
| {"subnet_id": "5e0665ec-020b-4626-98c0-29490f675c7d", "ip_address": "10.240.127.2"} |
+-------------------------------------------------------------------------------------+

honolulu-citron:/opt/openstack # ip netns exec snat-c83cfa28-685e-4363-859e-18400b27ee4f ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00...
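
The comparison described in this comment (router-port-list still correct, but an address gone from the namespace) could be automated roughly as follows. This is a sketch under assumptions: it takes the captured text of "neutron router-port-list ... -c fixed_ips" and of "ip a" from the snat namespace, and the function names are hypothetical. Note that the router's internal gateway address (192.168.1.1 here) is not in the snat namespace even in the healthy output above, so it will always be listed.

```python
import json
import re


def port_ips(router_port_list_output):
    """Extract ip_address values from the fixed_ips table rows."""
    ips = []
    for line in router_port_list_output.splitlines():
        cell = line.strip().strip("|").strip()
        # Data rows hold one JSON object; borders and the header do not.
        if cell.startswith("{"):
            ips.append(json.loads(cell)["ip_address"])
    return ips


def namespace_ipv4(ip_a_output):
    """Return the set of IPv4 addresses configured according to `ip a` output."""
    pattern = r"^\s*inet (\d+\.\d+\.\d+\.\d+)/"
    return set(re.findall(pattern, ip_a_output, flags=re.M))


def missing_from_namespace(router_port_list_output, ip_a_output):
    """Router port IPs that are not configured on any namespace interface."""
    configured = namespace_ipv4(ip_a_output)
    return [ip for ip in port_ips(router_port_list_output) if ip not in configured]
```

If 192.168.1.3 (the sg port address) shows up in the result, the namespace is in the state reported here.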

John Schwarz (jschwarz) wrote :

So I'm unable to reproduce this on master (I didn't try on mitaka). I set up 2 dvr_snat nodes and, as described in comment #2, rebooted the master node; failover to the other node was immediate.

Also note that starting from Newton, all nodes have the snat- namespace (and interfaces, only without IPs) to allow for faster failover.

Hao, can you please provide the keepalived version you used in your deployment?

Changed in neutron:
status: New → Incomplete
John Schwarz (jschwarz)
Changed in neutron:
status: Incomplete → New
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

I did verify it in Mitaka and I don't see any issues with the 'sg' port and related rules with respect to failover.

So we can close this issue as we discussed last week.

Changed in neutron:
status: New → Invalid