No ARP response on hapr-p from gateway router

Bug #1488925 reported by joern@tel2ip.net
This bug affects 1 person
Affects              Status   Importance  Assigned to                Milestone
Fuel for OpenStack   Invalid  High        Fuel Library (Deprecated)
6.1.x                Invalid  High        MOS Maintenance
7.0.x                Invalid  High        Fuel Library (Deprecated)
8.0.x                Invalid  High        Fuel Library (Deprecated)

Bug Description

Using Fuel 6.1.0

1. Create a new Cluster
2. Use "Juno on Ubuntu 14.04.01 (default)"
3. Neutron with GRE segmentation

Using 5 nodes (3x Controller, 1x Compute, 1x Cinder)
All nodes use a 4-NIC design:
eth0: Admin/PXE (native VLAN)
eth1: Public (Tagged VLAN 24)
eth2: Storage/Management (Tagged VLAN)
eth3: Private (Tagged VLAN)

Public Network
IP Range: 172.22.24.202 - 172.22.24.220
CIDR: 172.22.24.0/24
Use VLAN Tagging: 24
Gateway: 172.22.24.1
Floating IP: 172.22.24.221 - 172.22.24.230

4. Network Verification succeeded
5. The deployment completes successfully.

### Controller node ###

#Verify the VIP interface of the public network
#
$ ip netns exec haproxy ifconfig hapr-p
hapr-p Link encap:Ethernet HWaddr 22:d0:a4:3f:7e:b9
          inet addr:172.22.24.202 Bcast:0.0.0.0 Mask:255.255.255.0
          inet6 addr: fe80::20d0:a4ff:fe3f:7eb9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:414821 errors:0 dropped:3 overruns:0 frame:0
          TX packets:48012 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:32576725 (32.5 MB) TX bytes:4957782 (4.9 MB)

#Verify the arp table:
#
$ ip netns exec haproxy arp -n
Address HWtype HWaddress Flags Mask Iface
172.22.24.207 ether 00:50:56:83:6e:01 C hapr-p
172.22.24.204 ether 00:50:56:83:07:ba C hapr-p
172.22.24.1 (incomplete) hapr-p
172.22.24.205 ether 00:50:56:83:6f:ad C hapr-p
240.0.0.1 ether da:7e:47:60:de:bb C hapr-ns
172.22.24.222 ether fa:16:3e:24:20:df C hapr-p
172.22.24.223 ether fa:16:3e:24:20:df C hapr-p
172.22.24.2 ether 10:0e:7e:91:1f:f2 C hapr-p
172.22.24.3 ether 10:0e:7e:90:c7:f2 C hapr-p

###
#172.22.24.1 is the VRRP virtual IP of the gateway on a Juniper MX router
#172.22.24.2 corresponding VRRP IP
#172.22.24.3 corresponding VRRP IP
###
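
Unresolved neighbours such as the gateway entry above can be picked out of the table with a small filter. This is a minimal sketch, assuming the net-tools `arp -n` column layout shown above; the function name `arp_incomplete` is hypothetical and not part of any tool.

```bash
# arp_incomplete: read `arp -n` output on stdin and print "<ip> <iface>"
# for every entry that the kernel has not resolved yet.
# (Hypothetical helper; assumes the net-tools output format shown above.)
arp_incomplete() {
  awk '$2 == "(incomplete)" { print $1, $NF }'
}
```

Possible usage: `ip netns exec haproxy arp -n | arp_incomplete`, which for the table above would list 172.22.24.1 on hapr-p.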

Next I flush the ARP table in the haproxy namespace
$ ip netns exec haproxy ip n flush all
and ping the gateway IP from within the haproxy namespace
$ ip netns exec haproxy ping 172.22.24.1 -c 3
while tracing br-ex and br-floating

$ tcpdump -i br-ex -n arp
11:01:00.754730 ARP, Request who-has 172.22.24.202 tell 172.22.24.205, length 28
11:01:00.754753 ARP, Reply 172.22.24.202 is-at 22:d0:a4:3f:7e:b9, length 28
11:01:05.247381 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
11:01:05.247650 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
11:01:05.247885 ARP, Reply 172.22.24.1 is-at 00:00:5e:00:01:01, length 46
11:01:06.246707 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
11:01:06.246923 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
11:01:06.247277 ARP, Reply 172.22.24.1 is-at 00:00:5e:00:01:01, length 46
11:01:07.246697 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
11:01:07.246833 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
11:01:07.247163 ARP, Reply 172.22.24.1 is-at 00:00:5e:00:01:01, length 46
11:01:13.353402 ARP, Request who-has 172.22.24.100 tell 172.22.24.2, length 46
11:01:13.353441 ARP, Request who-has 172.22.24.100 tell 172.22.24.2, length 46
11:01:14.877406 ARP, Request who-has 172.22.24.1 tell 172.22.24.8, length 46
11:01:14.877437 ARP, Request who-has 172.22.24.1 tell 172.22.24.8, length 46

$ tcpdump -i br-floating -n arp
11:01:05.247524 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
11:01:05.247650 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
11:01:06.246707 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
11:01:06.246923 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
11:01:07.246697 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
11:01:07.246833 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
11:01:13.353523 ARP, Request who-has 172.22.24.100 tell 172.22.24.2, length 46
11:01:13.353538 ARP, Request who-has 172.22.24.100 tell 172.22.24.2, length 46
11:01:14.877543 ARP, Request who-has 172.22.24.1 tell 172.22.24.8, length 46

###
I never see an ARP reply for 172.22.24.1 on br-floating, and the ARP table within the haproxy namespace never learns the MAC of the gateway.
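
The comparison between the traces can also be done mechanically. The following is a sketch only, assuming the `tcpdump -n arp` text format shown above; the function name `arp_unanswered` is made up for illustration.

```bash
# arp_unanswered: read `tcpdump -n ... arp` text on stdin and print every
# requested IP for which no "ARP, Reply <ip> is-at ..." line was captured.
# (Hypothetical helper; field positions assume the tcpdump format above.)
arp_unanswered() {
  awk '
    /ARP, Request who-has/ { asked[$5] = 1 }     # $5 is the requested IP
    /ARP, Reply/           { answered[$4] = 1 }  # $4 is the IP being answered
    END {
      for (ip in asked)
        if (!(ip in answered)) print ip
    }
  ' | sort
}
```

Fed with a saved capture from br-floating and then with one from br-ex, the two result lists should differ by the gateway address if the reply is lost between the two interfaces.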

PS:
I did the same ping test while tracing br-ex-hapr. I see no ARP reply for my gateway IP there, although replies for other addresses do show up.
(different timestamp, but same scenario)

$ tcpdump -i br-ex-hapr -n arp
06:50:37.090717 ARP, Reply 172.22.24.202 is-at 22:d0:a4:3f:7e:b9, length 28
06:51:03.778702 ARP, Request who-has 172.22.24.202 tell 172.22.24.205, length 28
06:51:03.778723 ARP, Reply 172.22.24.202 is-at 22:d0:a4:3f:7e:b9, length 28
06:51:11.895376 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
06:51:11.895570 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
06:51:12.894736 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
06:51:12.894943 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46
06:51:13.898693 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 28
06:51:13.898904 ARP, Request who-has 172.22.24.1 tell 172.22.24.202, length 46

##############
and the output of the br-ex ARP table on the controller node
##############

$ arp -n | grep br-ex
172.22.24.223 ether fa:16:3e:24:20:df C br-ex
172.22.24.181 ether fa:16:3e:b7:69:1e C br-ex
172.22.24.1 ether 00:00:5e:00:01:01 C br-ex
172.22.24.182 ether fa:16:3e:b7:69:1e C br-ex
172.22.24.2 ether 10:0e:7e:91:1f:f2 C br-ex
172.22.24.155 ether fa:16:3e:24:d8:ce C br-ex
172.22.24.183 ether fa:16:3e:be:4c:b0 C br-ex
172.22.24.3 ether 10:0e:7e:90:c7:f2 C br-ex
172.22.24.202 ether 22:d0:a4:3f:7e:b9 C br-ex
172.22.24.222 ether fa:16:3e:24:20:df C br-ex

##################################
##################################
### 2015-09-01
##################################
##################################

eth1 <--> eth1.24 <--> br-ex --<br-ex-hapr---hapr-p>--(ns haproxy)

Gateway IP: 172.22.24.1
Gateway MAC: 00:00:5e:00:01:01

hapr-p IP: 172.22.24.202
hapr-p MAC: 22:d0:a4:3f:7e:b9

# brctl showmacs br-ex
port no  mac addr           is local?  ageing timer
  1      00:00:5e:00:01:01  no         0.51
  3      22:d0:a4:3f:7e:b9  no         0.82

# brctl showstp br-ex
br-ex-hapr (3) ; Port 3
eth1.24 (1) ; Port 1

Conclusion so far:
br-ex should forward
traffic to 00:00:5e:00:01:01 (Gateway) on Port 1 (eth1.24)
and
traffic to 22:d0:a4:3f:7e:b9 (hapr-p) on Port 3 (br-ex-hapr)
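
The lookup behind this conclusion (which port has the bridge learned for a given MAC) can be sketched as a filter over the `brctl showmacs` output. The function name `bridge_port_for_mac` is hypothetical, and the sketch assumes the column order shown above (port number first, MAC second).

```bash
# bridge_port_for_mac MAC: read `brctl showmacs <bridge>` output on stdin
# and print the port number learned for MAC (case-insensitive match).
# (Hypothetical helper; assumes port number in column 1, MAC in column 2.)
bridge_port_for_mac() {
  awk -v mac="$1" 'tolower($2) == tolower(mac) { print $1 }'
}
```

Possible usage: `brctl showmacs br-ex | bridge_port_for_mac 00:00:5e:00:01:01`. An empty result would mean the bridge has not learned that MAC and floods frames addressed to it.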

Next I flush the ARP table in the haproxy namespace
# ip netns exec haproxy ip n flush all
and ping the gateway from the haproxy namespace
# ip netns exec haproxy ping 172.22.24.1
while running tcpdump on "br-ex" and "br-ex-hapr"

TCPDUMP ON "br-ex" shows ARP request and reply:

Who has 172.22.24.1? Tell 172.22.24.202
Sender MAC address: 22:d0:a4:3f:7e:b9 (22:d0:a4:3f:7e:b9)
Sender IP address: 172.22.24.202 (172.22.24.202)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 172.22.24.1 (172.22.24.1)

172.22.24.1 is at 00:00:5e:00:01:01
Sender MAC address: IETF-VRRP-VRID_01 (00:00:5e:00:01:01)
Sender IP address: 172.22.24.1 (172.22.24.1)
Target MAC address: 22:d0:a4:3f:7e:b9 (22:d0:a4:3f:7e:b9)
Target IP address: 172.22.24.202 (172.22.24.202)

TCPDUMP ON br-ex-hapr shows ARP requests only:

Who has 172.22.24.1? Tell 172.22.24.202
Sender MAC address: 22:d0:a4:3f:7e:b9 (22:d0:a4:3f:7e:b9)
Sender IP address: 172.22.24.202 (172.22.24.202)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 172.22.24.1 (172.22.24.1)

and the ping reports "Destination Host Unreachable" instead of receiving an ICMP Echo Reply.

The ARP request and the ARP reply are both correct, but "br-ex" does not forward the reply to "br-ex-hapr", even though the MAC table of "br-ex" is correct.

################
## WORKAROUND ##
################

The ageing time of "br-ex" is set to 300.00 seconds by default. Changing this value to "0" (which makes the bridge behave like a hub, because learned MAC entries expire immediately) makes it work: the ICMP Echo Reply comes through immediately.

# brctl setageing br-ex 0
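
To confirm the value before and after the change, the ageing time can be read back from sysfs. This is a sketch under assumptions: it relies on the Linux bridge attribute `/sys/class/net/<bridge>/bridge/ageing_time` and assumes the value is expressed in hundredths of a second (USER_HZ = 100). The function name and its optional second argument (an alternative sysfs root, useful for dry runs) are illustrative only.

```bash
# bridge_ageing_seconds BRIDGE [SYSFS_ROOT]: print the bridge ageing time
# in seconds. Assumes the sysfs value is in 1/100 s units (USER_HZ = 100).
# (Hypothetical helper; SYSFS_ROOT defaults to /sys.)
bridge_ageing_seconds() {
  local bridge=$1 sysfs=${2:-/sys}
  awk '{ printf "%.2f\n", $1 / 100 }' "$sysfs/class/net/$bridge/bridge/ageing_time"
}
```

Possible usage: `bridge_ageing_seconds br-ex`, run once before and once after the `brctl setageing` command above.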

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Sergey Vasilenko (xenolog) wrote :

For 7.0 this needs additional checking, because VIP assignment into the network namespace was re-implemented.

tags: added: l23network
Bogdan Dobrelya (bogdando) wrote :

QA team, could you please reproduce for the 7.0 as Sergey asked? Setting to incomplete for now.

tags: added: vrrp
tags: removed: l23network
description: updated
joern@tel2ip.net (joern-g) wrote :

Did an update to the bug description. Added a tcpdump from "br-ex-hapr" interface (which might be more relevant)

joern@tel2ip.net (joern-g) wrote :

Updated the bug description with the ARP table of br-ex on the controller node.

description: updated
Sergey Vasilenko (xenolog) wrote :

I guess this bug is related to specifics of the VRRP implementation. We need a lab with Juniper (1 host + Juniper-based VRRP) for a deeper investigation.

Egor Kotko (ykotko) wrote :

I have checked the issue several times on the ISO with the latest changes, but could not reproduce it.

Sergey Vasilenko (xenolog) wrote :

IMHO this issue can be reproduced only if the gateway is backed by VRRP.
We should investigate and decide between two possibilities:
* the issue affects any VRRP implementation
* the issue is specific to Juniper's VRRP implementation

joern@tel2ip.net (joern-g) wrote :

I don't think this is Juniper-specific behaviour, and it has nothing to do with VRRP. It is simple ARP traffic that is not forwarded correctly by the br-ex Linux bridge.
The ARP request is sent out by hapr-p and arrives at the gateway. The gateway sends an ARP response, which arrives at br-ex but is not forwarded to br-ex-hapr.

I will update the bug description

joern@tel2ip.net (joern-g) wrote :

bug description updated

description: updated
Egor Kotko (ykotko) wrote :

But still I could not reproduce it without VRRP.

Changed in fuel:
assignee: Egor Kotko (ykotko) → Fuel Library Team (fuel-library)
Andrey Maximov (maximov) wrote :

Moving to 8.0 since we cannot work on high bugs in 7.0 after HCF

tags: added: tricky
joern@tel2ip.net (joern-g) wrote :

I think I found the root cause.
My nodes are running in a VMware/vCenter environment, and the distributed vSwitch (DVS) is used.

          |### br-ex ###|
--eth1----| Port1  Port3 |----br-ex-hapr
          |              |

eth1 faces the outside world, and its traffic passes through a DVS before it reaches the physical network.
For an ARP request sent from br-ex to reach the outside world, you need to enable "Forged Transmits" on the specific distributed portgroup. Otherwise the DVS drops the request, because the source MAC is not a VMware MAC.
The ARP response from outside will never reach the namespace behind Port3 unless you enable "Promiscuous Mode", but then requests are flooded on all ports, which results in a loop.
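
The "not a VMware MAC" point can be illustrated with a rough prefix check. This is a sketch only: it tests a MAC against the OUIs registered to VMware (00:50:56, 00:0c:29, 00:05:69, 00:1c:14), whereas the DVS presumably compares against the MAC assigned to the VM's virtual NIC rather than the vendor prefix. The function name `is_vmware_mac` is hypothetical.

```bash
# is_vmware_mac MAC: succeed (exit 0) if MAC starts with a VMware OUI.
# This is only an approximation of "the vSwitch knows this MAC".
# (Hypothetical helper; expects colon-separated notation.)
is_vmware_mac() {
  local oui
  # Normalise to lower case and keep the first three octets.
  oui=$(printf '%s' "$1" | tr 'A-F' 'a-f' | cut -c1-8)
  case "$oui" in
    00:50:56|00:0c:29|00:05:69|00:1c:14) return 0 ;;
    *) return 1 ;;
  esac
}
```

With the addresses from this report, the node MAC 00:50:56:83:6e:01 passes the check, while the hapr-p MAC 22:d0:a4:3f:7e:b9 does not, which matches the frames the DVS was dropping.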

Dmitry Pyzhov (dpyzhov)
tags: added: area-library