vrouter doesn't properly handle zero-tagged VLAN packets

Bug #1457805 reported by Francois Eleouet
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Juniper Openstack | Fix Committed | Medium | Anand H. Krishnan |
OpenContrail | Fix Committed | Medium | Anand H. Krishnan |

Bug Description

The issue appeared on the Cisco UCS platform, which uses the enic driver: vrouter has not worked on this platform since release 1.2.

It took a while to find the cause. Checksum handling was suspected first (enic is the only driver using CHECKSUM_COMPLETE instead of CHECKSUM_UNNECESSARY), but the problem turned out to come from another peculiarity of these NICs: interfaces configured for native access to a VLAN receive zero-tagged frames [1].

This has broken vrouter since commit [2], because vrouter no longer handles ARP responses properly: packets are either trapped to the agent with a zero VLAN tag or, in recent versions, simply left unhandled [3]. As a result, next-hops are marked as invalid in the fabric, which prevents any communication with other hosts in the subnet.
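To illustrate the handling a receiver is expected to apply to such frames, here is a minimal Python sketch, not vrouter's actual code. It assumes a raw Ethernet frame as bytes; the function name `strip_priority_tag` is hypothetical. A tag with VLAN ID 0 carries only priority information, so the sketch removes it and treats the frame as untagged, while a tag with a non-zero VLAN ID is left in place.

```python
import struct
from typing import Optional, Tuple

ETH_P_8021Q = 0x8100  # 802.1Q tag protocol identifier


def strip_priority_tag(frame: bytes) -> Tuple[bytes, Optional[int]]:
    """Hypothetical helper: drop an 802.1Q tag whose VLAN ID is 0.

    Returns (frame, pcp). If the frame is priority-tagged (VID 0), the
    4-byte tag is removed and its priority code point is returned.
    Otherwise the frame is returned unchanged and pcp is None.
    """
    # 6 bytes dst MAC + 6 bytes src MAC + 4 bytes tag + 2 bytes ethertype
    if len(frame) < 18:
        return frame, None
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != ETH_P_8021Q:
        return frame, None          # untagged frame
    if tci & 0x0FFF != 0:
        return frame, None          # real VLAN membership, keep the tag
    pcp = tci >> 13                 # top 3 bits of the TCI
    return frame[:12] + frame[16:], pcp


if __name__ == "__main__":
    macs = bytes.fromhex("ffffffffffff" "525400010001")
    tagged = macs + bytes.fromhex("8100a000" "0806") + bytes(28)
    untagged, pcp = strip_priority_tag(tagged)
    print(untagged[12:14].hex(), pcp)
```

After stripping, the inner ethertype (ARP, 0x0806, in this example) sits at the usual offset, so the rest of the receive path can process the packet as ordinary untagged traffic.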

As a workaround, do not use the native VLAN for vrouter on the UCS platform; configuring vrouter to use a VLAN sub-interface fixes the issue.

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8757446d8df4446fc7f5d24ad6d53e9f265cc021
[2] https://github.com/Juniper/contrail-vrouter/commit/63b16769ac8a78286723905157015947daf18ef0?diff=split
[3] https://github.com/Juniper/contrail-vrouter/blob/master/dp-core/vr_datapath.c#L387

Tags: vrouter
Changed in opencontrail:
importance: Undecided → Medium
Changed in opencontrail:
assignee: nobody → Anand H. Krishnan (anandhk)
Changed in juniperopenstack:
importance: Undecided → Medium
assignee: nobody → Anand H. Krishnan (anandhk)
joern@tel2ip.net (joern-g) wrote :

But if you use a VLAN tag on the vrouter interface, which is possible within a standard Contrail setup, you lose the ability to enable the DPDK feature on that compute node. Contrail is not able to configure that correctly, at least not with Cisco enic (1240) or Intel igb (I350). Conclusion: you can't use untagged traffic to access the VLAN because of the zero-VLAN-tag issue on Cisco UCS, and you can't enable DPDK acceleration while using VLAN-tagged traffic because of Contrail. It seems to be a dead end so far.

Cisco UCS always sends untagged traffic with this dummy VLAN 0 tag, even for best effort / default QoS class traffic! Perfectly standards compliant, but highly unusual!

Juniper Contrail engineering was informed about the flaw via cases 2016-0310-0756 and 2016-0323-0387 (both closed); Juniper is not handling the problem in JTAC.

Anand H. Krishnan (anandhk) wrote :

Can you please try with the latest 3.2 build?

Changed in juniperopenstack:
status: New → In Progress
Changed in opencontrail:
status: New → In Progress
Vedamurthy Joshi (vedujoshi) wrote :

On R3.2 Build 1, tried the following :
Server1 : 10.204.217.194( MAC : 52:54:00:01:00:01)
Vrouter node : 10.204.217.205 ( MAC : 52:54:00:13:df:eb)

Scapy was used to generate priority-tagged traffic

When a priority-tagged ARP request was sent from Server1 to the vrouter node, ARP responses were seen:

>>> ls(e/dot1q/arp)
dst : DestMACField = 'ff:ff:ff:ff:ff:ff' (None)
src : SourceMACField = '52:54:00:01:00:01' (None)
type : XShortEnumField = 33024 (0)
--
prio : BitField = 5 (0)
id : BitField = 0 (0)
vlan : BitField = 0 (1)
type : XShortEnumField = 2054 (0)
--
hwtype : XShortField = 1 (1)
ptype : XShortEnumField = 2048 (2048)
hwlen : ByteField = 6 (6)
plen : ByteField = 4 (4)
op : ShortEnumField = 1 (1)
hwsrc : ARPSourceMACField = '52:54:00:01:00:01' (None)
psrc : SourceIPField = '10.204.217.194' (None)
hwdst : MACField = '00:00:00:00:00:00' ('00:00:00:00:00:00')
pdst : IPField = '10.204.217.205' ('0.0.0.0')
>>>

As seen on vrouter node :
12:43:06.193662 52:54:00:01:00:01 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 60: vlan 0, p 5, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.204.217.205 tell 10.204.217.194, length 42
 0x0000: ffff ffff ffff 5254 0001 0001 8100 a000
 0x0010: 0806 0001 0800 0604 0001 5254 0001 0001
 0x0020: 0acc d9c2 0000 0000 0000 0acc d9cd 0000
 0x0030: 0000 0000 0000 0000 0000 0000
12:43:06.193759 52:54:00:13:df:eb > 52:54:00:01:00:01, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.204.217.205 is-at 52:54:00:13:df:eb, length 28
 0x0000: 5254 0001 0001 5254 0013 dfeb 0806 0001
 0x0010: 0800 0604 0002 5254 0013 dfeb 0acc d9cd
 0x0020: 5254 0001 0001 0acc d9c2
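For readers without Scapy at hand, the request in the capture above can be rebuilt byte by byte. The following is a sketch using only the Python standard library, not the commands used in the test. The helper names `mac` and `priority_tagged_arp_request` are hypothetical, and the zero padding to 60 bytes is an assumption made to match the captured length.

```python
import socket
import struct


def mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def priority_tagged_arp_request(src_mac: str, src_ip: str,
                                dst_ip: str, pcp: int) -> bytes:
    """Build a broadcast ARP request carrying an 802.1Q tag with VID 0."""
    eth = mac("ff:ff:ff:ff:ff:ff") + mac(src_mac)
    # TCI layout: 3 bits priority, 1 bit DEI (0), 12 bits VLAN ID (0)
    tag = struct.pack("!HH", 0x8100, (pcp << 13) | 0)
    # Inner ethertype (ARP), then hwtype, ptype, hwlen, plen, op (1 = request)
    arp = struct.pack("!HHHBBH", 0x0806, 1, 0x0800, 6, 4, 1)
    arp += mac(src_mac) + socket.inet_aton(src_ip)
    arp += bytes(6) + socket.inet_aton(dst_ip)
    # Pad with zeros to the 60-byte length shown in the capture
    return (eth + tag + arp).ljust(60, b"\x00")


# Hex dump of the request as printed by tcpdump on the vrouter node
CAPTURED = bytes.fromhex(
    "ffff ffff ffff 5254 0001 0001 8100 a000"
    "0806 0001 0800 0604 0001 5254 0001 0001"
    "0acc d9c2 0000 0000 0000 0acc d9cd 0000"
    "0000 0000 0000 0000 0000 0000".replace(" ", "")
)

if __name__ == "__main__":
    frame = priority_tagged_arp_request(
        "52:54:00:01:00:01", "10.204.217.194", "10.204.217.205", pcp=5)
    print(frame == CAPTURED)
```

The tag control field `a000` in the dump decodes to priority 5 with VLAN ID 0, which is what tcpdump reports as `vlan 0, p 5`.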

——————————————————————————————————————————————————————————————————

When a priority-tagged ARP request was sent from the vrouter node to Server1, ARP responses were also seen:
>>> sendp(e/dot1q/arp)
.
Sent 1 packets.
>>> ls(e/dot1q/arp)
dst : DestMACField = 'ff:ff:ff:ff:ff:ff' (None)
src : SourceMACField = '52:54:00:13:df:eb' (None)
type : XShortEnumField = 33024 (0)
--
prio : BitField = 6 (0)
id : BitField = 0 (0)
vlan : BitField = 0 (1)
type : XShortEnumField = 2054 (0)
--
hwtype : XShortField = 1 (1)
ptype : XShortEnumField = 2048 (2048)
hwlen : ByteField = 6 (6)
plen : ByteField = 4 (4)
op : ShortEnumField = 1 (1)
hwsrc : ARPSourceMACField = '52:54:00:13:df:eb' (None)
psrc : SourceIPField = '10.204.217.205' (None)
hwdst : MACField = '00:00:00:00:00:00' ('00:00:00:00:00:00')
pdst : IPField ...


Changed in opencontrail:
status: In Progress → Fix Committed
Changed in juniperopenstack:
status: In Progress → Fix Committed
milestone: none → r3.2.0.0-fcs