Comment 1 for bug 1692795

Sandeep Sridhar (ssandeep) wrote :

Hi Manish,

  Their test procedure is as below:

The environment is installed clean (3 Control Nodes, 4 TSNs, 128 ToR-Agents, 4 Compute Nodes, 1 OpenStack Node) using the procedure below.
After that, they provision LIFs and pass traffic.

===================================================================================================
A1-1 Setup Procedure
cd /opt/contrail/utils/
fab install_pkg_all:/tmp/contrail-install-packages_3.1.3.0-73~mitaka_all.deb
fab upgrade_kernel_all
fab install_contrail
fab setup_all

A1-2
They encountered an issue with the TSN being unstable after reboot (due to the bond settings, which you worked on with Mehul and resolved).

A1-3
Execute "service supervisor-vrouter restart" on the 4 TSN nodes.

A1-5
Modify the following parameters
(TSN&Compute)
/etc/contrail/contrail-*-agent*.conf
headless_mode = true

/etc/contrail/supervisord_vrouter.conf
environment=TBB_THREAD_COUNT = 8

(TSN)
/etc/modprobe.d/vrouter.conf
options vrouter vr_mpls_labels=256000 vr_nexthops=521000 vr_vrfs=65536 vr_bridge_entries=1000000

(Compute)
/etc/modprobe.d/vrouter.conf
options vrouter vr_mpls_labels=11520 vr_flow_entries=2097152

/etc/contrail/contrail-vrouter-agent.conf
[DEFAULT]
flow_cache_timeout = 60
disable_flow_collection = True
[FLOWS]
max_vm_flows = 45

Remove virbr0:
virsh net-destroy default
virsh net-autostart default --disable

A1-6
All Contrail servers were rebooted (executed "shutdown -r now").
===================================================================================================
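If it helps when reproducing this, the values from A1-5 can be read back out of /etc/modprobe.d/vrouter.conf after the reboot in A1-6 to confirm the file still holds what was intended. The snippet below is only a rough sketch: the function name vrouter_opt is made up for illustration and is not part of Contrail, and it only reads the file, it does not tell you what the loaded kernel module is actually using.

```shell
# Hypothetical helper (not a Contrail tool): print the value of one
# "options vrouter ..." parameter from a modprobe.d-style file.
# Usage: vrouter_opt FILE NAME
vrouter_opt() {
    file=$1
    name=$2
    awk -v name="$name" '
        # Only look at lines of the form: options vrouter key=value ...
        $1 == "options" && $2 == "vrouter" {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == name) print kv[2]
            }
        }' "$file"
}
```

For example, on a Compute node "vrouter_opt /etc/modprobe.d/vrouter.conf vr_flow_entries" should print 2097152 if the file matches A1-5, and print nothing if the option is missing.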
They believe this issue is caused by the POSTs they issue (around 40 POSTs per second, with 10 sessions). The POST messages mostly create virtual-networks, virtual-machines, etc.

The logs below should help:

ssh root@10.219.48.123
password:Jtaclab123

[root@LocalStorage coreCollectedMay26]# pwd
/home/ssandeep/2017-0424-0113/coreCollectedMay26
[root@LocalStorage coreCollectedMay26]# ls -lrt
total 382604
-rw-rw-r--. 1 1001 1001 1181918 May 23 08:05 20170523-pt008.log
-rw-rw-r--. 1 1001 1001 1181918 May 23 08:05 20170523-pt009.log
-rw-rw-r--. 1 1001 1001 3549981 May 23 08:13 20170523-pt002.log
-rw-rw-r--. 1 1001 1001 4729948 May 23 08:18 20170523-pt004.log
-rw-rw-r--. 1 1001 1001 4805984 May 23 08:19 20170523-pt001.log
-rw-rw-r--. 1 1001 1001 4805982 May 23 08:20 20170523-pt003.log
-rw-rw-r--. 1 1001 1001 7824423 May 23 08:34 20170523-pt007.log
-rw-rw-r--. 1 1001 1001 11948881 May 23 08:37 20170523-pt006.log
-rw-rw-r--. 1 1001 1001 11848002 May 23 08:40 20170523-pt011.log
-rw-rw-r--. 1 1001 1001 11850306 May 23 08:40 20170523-pt010.log
-rw-r--r--. 1 root root 264286785 May 26 21:07 20170526_JN-323_tor-agent-21-core.zip
-rw-r--r--. 1 root root 63744000 May 26 21:13 20170523_JN-323_post.tar

All of the *-pt*.log files show the POSTs they were issuing when this issue occurred. This might give you some hint as to what is causing the problem.

The zip file 20170526_JN-323_tor-agent-21-core.zip has the tor-agent and TSN cores we collected on openc-36. I will send you my notes separately; they include the VNI for your reference.

Please let me know when the binary is ready.

Greetings,
Sandeep.