DVR, Floating IPs are not working. Failed sending gratuitous ARP

Bug #1607398 reported by Jack Ivanov on 2016-07-28
This bug affects 2 people
Affects: neutron | Importance: Undecided | Assigned to: Brian Haley

Bug Description

Hello, I'm trying to use DVR with floating IPs, but it does not work.
When I try to associate a floating IP with a VM, I see the following on the compute node:

2016-07-28 14:26:31.893 125513 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'fip-4f2774d1-dfb8-4833-8374-806e1fc40827', 'arping', '-A', '-I', 'fg-86481da8-4c', '-c', '3', '-w', '4.5', '172.16.48.6'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.utils [-] Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address

2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib [-] Failed sending gratuitous ARP to 172.16.48.6 on fg-86481da8-4c in namespace fip-4f2774d1-dfb8-4833-8374-806e1fc40827
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib Traceback (most recent call last):
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1040, in _arping
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib ip_wrapper.netns.execute(arping_cmd, check_exit_code=True)
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 927, in execute
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib log_fail_as_error=log_fail_as_error, **kwargs)
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib raise RuntimeError(msg)
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib RuntimeError: Exit code: 2; Stdin: ; Stdout: ; Stderr: bind: Cannot assign requested address
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib
2016-07-28 14:26:31.912 125513 ERROR neutron.agent.linux.ip_lib
2016-07-28 14:26:31.948 125513 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 67908aabc9bd446493cd22af8cccbd59 __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:302
2016-07-28 14:26:31.949 125513 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'fip-4f2774d1-dfb8-4833-8374-806e1fc40827', 'ip', '-o', 'link', 'show', 'fpr-a5e261f2-9'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:100
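The failing DEBUG line above is just an argument list handed to rootwrap. The sketch below rebuilds the same argv from its parts, which makes it easy to see which piece is the namespace, the interface and the floating IP. The helper name build_garp_cmd is hypothetical and this is not Neutron's code; the 1.5 x count deadline is an assumption that merely matches the logged "-c 3 -w 4.5".

```python
"""Sketch: rebuild the arping command line seen in the agent log."""
from typing import List


def build_garp_cmd(namespace: str, iface: str, address: str,
                   count: int = 3) -> List[str]:
    """Hypothetical helper returning the argv the agent logged (not Neutron's code)."""
    # Assumption: the -w deadline is 1.5 x the packet count; this matches
    # the logged "-c 3 -w 4.5" but is inferred, not taken from Neutron.
    deadline = 1.5 * count
    arping = [
        'arping',
        '-A',               # unsolicited ARP using REPLY packets (gratuitous ARP)
        '-I', iface,        # send on the fg- interface
        '-c', str(count),   # number of packets to send
        '-w', str(deadline),
        address,            # the floating IP being announced
    ]
    # Run inside the fip- namespace, as in the log.
    return ['ip', 'netns', 'exec', namespace] + arping


print(build_garp_cmd('fip-4f2774d1-dfb8-4833-8374-806e1fc40827',
                     'fg-86481da8-4c', '172.16.48.6'))
```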

[root@node13 ~]# sysctl net.ipv4.ip_nonlocal_bind
net.ipv4.ip_nonlocal_bind = 1

[root@node13 ~]# ip netns exec fip-4f2774d1-dfb8-4833-8374-806e1fc40827 sysctl net.ipv4.ip_nonlocal_bind
net.ipv4.ip_nonlocal_bind = 1
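Both sysctl calls above read the same kernel setting, which is also exposed as a file under /proc/sys. A minimal sketch for reading it from a script follows; the helper names are hypothetical, and it reports the value visible to the process that runs it, so it would need to be started via "ip netns exec <namespace>" to match the second command above.

```python
"""Sketch: read net.ipv4.ip_nonlocal_bind through procfs."""
from pathlib import Path

# `sysctl net.ipv4.ip_nonlocal_bind` reads this file.
NONLOCAL_BIND = Path('/proc/sys/net/ipv4/ip_nonlocal_bind')


def parse_sysctl_flag(text: str) -> bool:
    """Interpret a 0/1 sysctl value; True means the flag is enabled."""
    return text.strip() == '1'


def nonlocal_bind_enabled(path: Path = NONLOCAL_BIND) -> bool:
    """Hypothetical helper: True if binding to non-local IPv4 addresses is allowed."""
    return parse_sysctl_flag(path.read_text())


if __name__ == '__main__':
    if NONLOCAL_BIND.exists():
        print('net.ipv4.ip_nonlocal_bind enabled:', nonlocal_bind_enabled())
```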

Pinging the floating IP from outside the namespace does not work:
[root@node13 ~]# ping -c2 172.16.48.6
PING 172.16.48.6 (172.16.48.6) 56(84) bytes of data.
^C
--- 172.16.48.6 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

Pinging it from inside the fip namespace works:
[root@node13 ~]# ip netns
fip-4f2774d1-dfb8-4833-8374-806e1fc40827
qrouter-a5e261f2-991c-497c-adcd-b1e9e1a8a001

[root@node13 ~]# ip netns exec fip-4f2774d1-dfb8-4833-8374-806e1fc40827 ping -c2 172.16.48.6
PING 172.16.48.6 (172.16.48.6) 56(84) bytes of data.
64 bytes from 172.16.48.6: icmp_seq=1 ttl=63 time=0.290 ms
64 bytes from 172.16.48.6: icmp_seq=2 ttl=63 time=0.260 ms

--- 172.16.48.6 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.260/0.275/0.290/0.015 ms

Additional information:
[root@node13 ~]# ip netns exec fip-4f2774d1-dfb8-4833-8374-806e1fc40827 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
9: fpr-a5e261f2-9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether ce:cd:c7:4d:b8:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.126.51/31 scope global fpr-a5e261f2-9
       valid_lft forever preferred_lft forever
    inet6 fe80::cccd:c7ff:fe4d:b8c2/64 scope link
       valid_lft forever preferred_lft forever
333: fg-86481da8-4c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:3b:ac:ac brd ff:ff:ff:ff:ff:ff
    inet 172.16.48.4/22 brd 172.16.51.255 scope global fg-86481da8-4c
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe3b:acac/64 scope link
       valid_lft forever preferred_lft forever

[root@node13 ~]# ip netns exec fip-4f2774d1-dfb8-4833-8374-806e1fc40827 ip route
default via 172.16.51.254 dev fg-86481da8-4c
169.254.126.50/31 dev fpr-a5e261f2-9 proto kernel scope link src 169.254.126.51
172.16.48.0/22 dev fg-86481da8-4c proto kernel scope link src 172.16.48.4
172.16.48.6 via 169.254.126.50 dev fpr-a5e261f2-9
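The addressing above shows why arping has to bind to an address that is not configured in the namespace: 172.16.48.6 falls inside the 172.16.48.0/22 subnet of fg-86481da8-4c, but the only address on that interface is 172.16.48.4, and traffic for the floating IP is routed over the fpr- /31 link. A small check with Python's ipaddress module, using only values copied from the output above:

```python
"""Sketch: relate the floating IP to the fip- namespace addressing shown above."""
import ipaddress

fg_iface = ipaddress.ip_interface('172.16.48.4/22')        # fg-86481da8-4c
fpr_iface = ipaddress.ip_interface('169.254.126.51/31')    # fpr-a5e261f2-9
floating_ip = ipaddress.ip_address('172.16.48.6')
fpr_next_hop = ipaddress.ip_address('169.254.126.50')      # "172.16.48.6 via 169.254.126.50"

# The floating IP is on the external subnet of the fg- interface...
in_fg_subnet = floating_ip in fg_iface.network
# ...but it is not configured in this namespace, so using it as a bind()
# source address is a non-local bind.
is_local = floating_ip in {fg_iface.ip, fpr_iface.ip}
# The /31 link holds exactly two addresses: this end and the next hop.
link_hosts = list(fpr_iface.network)

print('floating IP in fg- subnet:', in_fg_subnet)              # True
print('floating IP configured locally:', is_local)             # False
print('next hop on fpr- link:', fpr_next_hop in link_hosts)    # True
```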

[root@node13 ~]# ovs-vsctl show
8fd67521-01bc-4f6e-8750-a2501d4e1505
    Bridge br-int
        fail_mode: secure
        Port "fg-86481da8-4c"
            tag: 2
            Interface "fg-86481da8-4c"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "tapf872f999-f9"
            tag: 4095
            Interface "tapf872f999-f9"
                type: internal
        Port "qvo616a1035-c5"
            tag: 1
            Interface "qvo616a1035-c5"
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port int-br-private
            Interface int-br-private
                type: patch
                options: {peer=phy-br-private}
        Port "qr-9edb3525-5a"
            tag: 1
            Interface "qr-9edb3525-5a"
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port "vxlan-ac103409"
            Interface "vxlan-ac103409"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.52.23", out_key=flow, remote_ip="172.16.52.9"}
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-ac103418"
            Interface "vxlan-ac103418"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.52.23", out_key=flow, remote_ip="172.16.52.24"}
        Port "vxlan-ac103403"
            Interface "vxlan-ac103403"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.52.23", out_key=flow, remote_ip="172.16.52.3"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "em1.193"
            Interface "em1.193"
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
    Bridge br-private
        Port phy-br-private
            Interface phy-br-private
                type: patch
                options: {peer=int-br-private}
        Port br-private
            Interface br-private
                type: internal
        Port "em1.1070"
            Interface "em1.1070"
    ovs_version: "2.5.0"
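In the br-tun section, each VXLAN port name appears to carry the remote tunnel endpoint as hex octets: vxlan-ac103409 decodes to 172.16.52.9, which matches its remote_ip, and the other two ports match as well. This is an observation from the output above rather than a documented guarantee; the sketch below (decode_tunnel_port is a hypothetical name) decodes the names and cross-checks them.

```python
"""Sketch: decode the hex-encoded remote IP in the br-tun port names above."""
import ipaddress


def decode_tunnel_port(name: str) -> str:
    """Turn e.g. 'vxlan-ac103409' into '172.16.52.9' (assumes 8 hex digits = IPv4)."""
    _, hex_ip = name.split('-', 1)
    if len(hex_ip) != 8:
        raise ValueError(f'not an IPv4-style tunnel port name: {name}')
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))


# Port names and remote_ip values copied from the ovs-vsctl output.
PORTS = {
    'vxlan-ac103409': '172.16.52.9',
    'vxlan-ac103418': '172.16.52.24',
    'vxlan-ac103403': '172.16.52.3',
}

for port, remote_ip in PORTS.items():
    decoded = decode_tunnel_port(port)
    print(port, '->', decoded, 'matches' if decoded == remote_ip else 'MISMATCH')
```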

Brian Haley (brian-haley) wrote :

I'm stacking now to try and reproduce this with upstream master. Is that where you are seeing this?

tags: added: l3-dvr-backlog
removed: dvr floating ips
Brian Haley (brian-haley) wrote :

I couldn't reproduce this on a single-node devstack, arping ran successfully.

Can you make sure you have the iputils-arping package installed rather than the arping package? That could also cause this, since their arguments differ.

Jack Ivanov (gunph1ld) wrote :

On CentOS 7, arping is provided by the iputils package:

iputils-20121221-7.el7.x86_64 : Network monitoring tools including ping
Repo : base
Matched from:
Filename : /sbin/arping

Brian Haley (brian-haley) wrote :

Hi Evgeniy, I just had to check which arping binary was being used.

So if you run:

# ip netns exec fip-4f2774d1-dfb8-4833-8374-806e1fc40827 arping -A -I fg-86481da8-4c -c 3 -w 4.5 172.16.48.6

does it give the same error? It's probably easier to track down that failure first, and then see whether the results apply to Neutron.

And is this with a recent master?
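Running the command by hand, as suggested here, performs the same check the agent does: the traceback shows execute() raising RuntimeError on a non-zero exit code, with a message of the form "Exit code: N; Stdin: ; Stdout: ; Stderr: ...". A rough stand-alone imitation with subprocess follows; run_checked is a hypothetical name and this is not Neutron's implementation.

```python
"""Sketch: run a command and fail the way the agent log does on a non-zero exit."""
import subprocess
from typing import List


def run_checked(cmd: List[str], stdin: str = '') -> str:
    """Run cmd; return stdout, or raise RuntimeError formatted like the log message."""
    proc = subprocess.run(cmd, input=stdin, capture_output=True, text=True)
    if proc.returncode != 0:
        # Mirrors the "Exit code: 2; Stdin: ; Stdout: ; Stderr: ..." text in the log.
        raise RuntimeError(
            f'Exit code: {proc.returncode}; Stdin: {stdin}; '
            f'Stdout: {proc.stdout}; Stderr: {proc.stderr}')
    return proc.stdout
```

Passed the arping argv from the log (prefixed with ip netns exec and the fip- namespace), this would be expected to raise with the same "bind: Cannot assign requested address" stderr on the affected node; it needs root to enter the namespace.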

Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
status: New → Incomplete
Jack Ivanov (gunph1ld) wrote :

Hi Brian

Same error:
# ip netns exec fip-4f2774d1-dfb8-4833-8374-806e1fc40827 arping -A -I fg-86481da8-4c -c 3 -w 4.5 172.16.48.6
bind: Cannot assign requested address

Packages:
openstack-neutron-ml2-8.1.2-1.el7.noarch
python-neutron-8.1.2-1.el7.noarch
openstack-neutron-common-8.1.2-1.el7.noarch
python-neutron-lib-0.0.2-1.el7.noarch
openstack-neutron-openvswitch-8.1.2-1.el7.noarch
python-neutronclient-4.1.1-2.el7.noarch
openstack-neutron-8.1.2-1.el7.noarch

Brian Haley (brian-haley) wrote :

Sorry, I don't know what bits are in those CentOS 7 packages, although it looks like Mitaka based on the 8.1.2 version. But I'm not sure this is a Neutron issue yet.

This could be an issue with your kernel. I know from your output that ip_nonlocal_bind is set to 1, so arping should be able to run properly even though that IP is not configured inside the namespace. As an experiment, you could try manually adding another IP from that subnet, then running arping for it. If that works, then it's something with ip_nonlocal_bind in your kernel.

My test system is running 4.4.0-31, but I can't say I've seen this issue before, so you might need to do some kernel debugging to figure out why it's failing. Running 'strace arping...' might show which arguments are passed to bind(), which would help track that down.
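"Cannot assign requested address" is the standard text for EADDRNOTAVAIL, which bind() returns when asked to use an address the kernel does not treat as local and non-local binding is not permitted. As a complement to strace, the kernel behaviour can be probed directly. This is a hedged sketch: probe_bind is a hypothetical helper, and using a UDP socket is an assumption about what arping binds.

```python
"""Sketch: check whether this process may bind() to a given IPv4 address."""
import errno
import os
import socket
from typing import Optional


def probe_bind(address: str) -> Optional[int]:
    """Return None if bind() succeeds, otherwise the errno it failed with.

    Assumption: a UDP socket stands in for the socket arping binds.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, 0))  # port 0: any free port, only the address matters
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == '__main__':
    # Floating IP from this report; it is not configured in the fip- namespace,
    # so whether this succeeds depends on non-local binding being honoured.
    err = probe_bind('172.16.48.6')
    if err is None:
        print('bind succeeded')
    else:
        print('bind failed:', errno.errorcode.get(err, err), os.strerror(err))
```

Run under "ip netns exec fip-4f2774d1-dfb8-4833-8374-806e1fc40827", a failure with EADDRNOTAVAIL while the sysctl reads 1 would point at the kernel, as suggested above.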

Brian Haley (brian-haley) wrote :

BTW, if you are on the openstack-neutron IRC channel on freenode, I am haleyb there; it might make for quicker debugging than trading posts in this bug.

Jack Ivanov (gunph1ld) wrote :

Hello, Brian

It seems you are right.

The problem disappears when I switch to kernel 3.10.0-327.18.2.el7.x86_64, and it appears with kernel 3.10.0-327.22.2.el7.x86_64.
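To check which of the two kernels named in this comment a node is running, something like the following could be used. It only knows the two data points reported here and implies nothing about other kernel builds; classify_kernel is a hypothetical name.

```python
"""Sketch: classify the running kernel against the two builds reported in this thread."""
import platform

# Data points from this bug report only; other builds are not covered.
REPORTED = {
    '3.10.0-327.18.2.el7.x86_64': 'reported working',
    '3.10.0-327.22.2.el7.x86_64': 'reported failing (arping bind error)',
}


def classify_kernel(release: str) -> str:
    """Map a `uname -r` string to what this report says about it."""
    return REPORTED.get(release, 'not covered by this report')


if __name__ == '__main__':
    # platform.release() returns the same string as `uname -r` on Linux.
    print(platform.release(), '->', classify_kernel(platform.release()))
```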

Jack Ivanov (gunph1ld) wrote :

Let me check more closely and I will post my results here.

Jack Ivanov (gunph1ld) wrote :

Yes, it works with the old kernel.
I've opened an issue here: https://bugs.centos.org/view.php?id=11238
Thanks for your help, Brian. Please close the ticket.

Changed in neutron:
status: Incomplete → Invalid