packet drop between contrail and mx

Bug #1557539 reported by Slobodan Blatnjak
This bug affects 1 person
Affects            Status  Importance  Assigned to  Milestone
Juniper Openstack  New     Undecided   Unassigned

Bug Description

The customer sees packet drops between Contrail and the MX router. The same setup works when Contrail is installed in a VM on the same server.
There were no signs of packet loss on eth1; the problem appeared only after the installation script moved the IP address from eth1 to vhost0.

Brief description:
------------------------------------------------------------------------------------------------------
Contrail 2.21 all-in-one setup on Ubuntu 14.04.01 (dpkg -i /tmp/contrail-install-packages_2.21-102-ubuntu-14-04icehouse_all.deb).
Testbed (attached):
#Management ip addresses of hosts in the cluster
host1 = 'root@172.21.61.32'
ext_routers = [('mx1', '172.21.64.251'), ('mx2', '172.21.64.252')]
control_data = {
    host1 : { 'ip': '172.21.64.32/24', 'gw' : '172.21.64.1', 'device': 'eth1'},
}
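
The addressing in the testbed suggests that both MX routers sit in the same /24 as the control/data interface, so the failing pings should not cross the gateway. A minimal sketch to confirm this with the standard `ipaddress` module follows; the helper name `on_link` is hypothetical and the values are copied from the testbed above.

```python
import ipaddress

# Values copied from the testbed in this report.
control_ip = ipaddress.ip_interface("172.21.64.32/24")
ext_routers = [("mx1", "172.21.64.251"), ("mx2", "172.21.64.252")]


def on_link(iface, addr):
    """Return True if addr lies inside the interface's configured subnet."""
    return ipaddress.ip_address(addr) in iface.network


for name, addr in ext_routers:
    reach = "on-link" if on_link(control_ip, addr) else "routed via gateway"
    print(name, addr, reach)
```

If a router were reported as routed via the gateway, the 172.21.64.1 hop would also need to be checked; with the addresses above that case does not arise.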

Environment:

1) Bare-metal installation: CCP-HOST is a UCS B200 server hosted inside a Cisco UCS 5108 chassis. Interfaces eth0 and eth1 are emulated by the chassis and are transparently mapped to its external ports. These ports connect to the UCS Fabric Interconnect (the classical way to connect a UCS 5108 chassis to external devices; in general, the UCS 5108 and the UCS Fabric Interconnect can be considered a single device). The UCS Fabric Interconnect connects to Juniper QFabric.
A ping from 172.21.64.32 to 172.21.64.251 fails with 'no response found':
ping -c 10000 -f 172.21.64.251
--- 172.21.64.251 ping statistics ---
10000 packets transmitted, 10 received, 99% packet loss, time 120759ms

ping -c 10000 -f 172.21.64.251 -s 1473
PING 172.21.64.251 (172.21.64.251) 1473(1501) bytes of data.
--- 172.21.64.251 ping statistics ---
10000 packets transmitted, 6095 received, 39% packet loss, time 51631ms
rtt min/avg/max/mdev = 0.475/0.630/17.244/0.778 ms, pipe 2, ipg/ewma 5.163/0.517 ms
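
The figures in the two ping runs can be cross-checked with a little arithmetic. The sketch below assumes a 1500-byte interface MTU, which the report does not state; under that assumption the `-s 1473` run produces 1501-byte IP packets that must be fragmented, which may be relevant to why the two runs behave differently.

```python
ICMP_HEADER = 8     # bytes, ICMP echo header
IPV4_HEADER = 20    # bytes, IPv4 header without options
ASSUMED_MTU = 1500  # assumption: default Ethernet MTU, not stated in the report


def ip_packet_size(payload):
    """Total IPv4 packet size for an ICMP echo with the given payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


def loss_percent(sent, received):
    """Packet loss as a percentage of packets sent."""
    return 100.0 * (sent - received) / sent


size = ip_packet_size(1473)
print(size, "needs fragmentation" if size > ASSUMED_MTU else "fits in one frame")
print(loss_percent(10000, 10))    # default-size run
print(loss_percent(10000, 6095))  # -s 1473 run
```

ping prints the loss truncated to a whole percent, which matches the 99% and 39% in the summaries above.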

There were no signs of packet loss on eth1 at that time; the problem appeared only after the installation script moved the IP address from eth1 to vhost0.

2) VM installation: ESXi is hosted on one of the UCS B200 blades installed in the same UCS 5108 chassis as the blade used for the bare-metal installation, so the rest of the network is identical.
Communication with the MX (172.21.64.251) works.

Attached are some outputs from customer deployment.
drop_capture.pcap (ping -c 10000 -f 172.21.64.251) - Wireshark filter in order to see which packets were in fact answered: (icmp) && !(_ws.expert.severity == 0x00600000)
drop_capture_1501 (1).pcap.gz (ping -c 10000 -f 172.21.64.251 -s 1473) - %
interface stats.txt (contains the output of the ifconfig, vif --list, dropstats and uname -r commands)

Revision history for this message
Slobodan Blatnjak (sblatnjak) wrote :
information type: Proprietary → Public
Slobodan Blatnjak (sblatnjak) wrote :

There are no packet drops when the vrouter agent is stopped (service contrail-vrouter-agent stop).

In the attached eth1_on.pcap and vhost_on.pcap, the Echo Request is sent on the vhost interface but does not reach the eth1 interface. When the vrouter agent is stopped, it works.
Below is a summary of the results with the agent on and off:
1. vrfstats --dump shows no discards on Vrf:0 in either case (files vrfstats_on and vrfstats_off)
2. dropstats
   a. shows an increasing value for Invalid NH when the agent is on
   b. shows an unchanged value for Invalid NH when the agent is off
3. ping
   a. 1000 packets transmitted, 6 received, 99% packet loss, time 107204ms, when the agent is on (file ping_on.txt)
   b. 190 packets transmitted, 190 received, 0% packet loss, time 18943ms, when the agent is off (file ping_off.txt)
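
The dropstats comparison above can be automated by diffing two snapshots taken a few seconds apart. This is only a sketch: it assumes each dropstats line has the form "counter name, whitespace, integer", and the sample snapshots and the helper names `parse_dropstats` and `increasing` are hypothetical, not taken from the attached files.

```python
import re

# Assumed line format: "<counter name>   <integer>"
LINE = re.compile(r"^(?P<name>.+?)\s+(?P<value>\d+)\s*$")


def parse_dropstats(text):
    """Map counter name to value, skipping lines that do not match."""
    counters = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def increasing(before, after):
    """Return counters whose value grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


# Hypothetical snapshots, for illustration only.
snap1 = "Invalid IF      0\nInvalid NH      120\nFlow Unusable   3\n"
snap2 = "Invalid IF      0\nInvalid NH      1000\nFlow Unusable   3\n"

print(increasing(parse_dropstats(snap1), parse_dropstats(snap2)))
```

In practice the two strings would be captured from `dropstats` on the affected host, once with the agent running and once with it stopped.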

All files (dropstat, eth.pcap, vhost.pcap, ping & vrfstats) are attached in command_outputs.tar. other_outputs.txt contains the output of vrouter --info, nh --list, flow -l and mpls --dump.

Please let me know if you need more information.
