contrail-vrouter-agent freezes vhost0 interface
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Fix Committed | High | Anand H. Krishnan |
R1.1 | Fix Committed | Undecided | Anand H. Krishnan |
Bug Description
Summary
=======
We are observing a complete network freeze on the vhost0 interface of Contrail
compute nodes once contrail-
vhost0 interface still exists and retains its IP address and routing table
entries, but all connections time out. This continues to be the case even after
stopping contrail-
Additionally, we experience a reproducible kernel crash when we attempt to
recover by unloading the vrouter kernel module.
Environment
===========
Hardware: HP ProLiant DL380p Gen8, Broadcom (bnx2x) NICs. The bnx2x NICs are
slaved together in a bond0 interface that is used as vhost0's backing
interface.
System: Ubuntu 14.04 LTS
Kernel: 3.13.0-41-generic
Contrail packages:
| ii contrail-lib 1.20-1+syseleven21 amd64 OpenContrail libraries
| ii contrail-
| ii contrail-
| ii contrail-
| ii contrail-
| ii python-
| ii python-certifi 1.0.1-1contrail1 all Python SSL Certificates
| ii python-contrail 1.20-1+syseleven21 all OpenContrail python-libs
| ii python-
| ii python-
| ii python-
| ii python-pycassa 1.11.0-1contrail2 all Client library for Apache Cassandra
Steps to reproduce
==================
Note: we have only been able to reproduce this problem on one of our Contrail
instances. Another Contrail instance, running on HP Gen9 machines with a
virtually identical configuration and the same package versions, is not
affected. The only notable difference is that the working instance does not
use VLAN tagging on bond0.
1. Bring up the vhost0 interface (backed by the VLAN subinterface bond0.1621):
# vif --create vhost0 --mac $(cat /sys/class/
# vif --add bond0.1621 --mac $(cat /sys/class/
# vif --add vhost0 --mac $(cat /sys/class/
# dhclient vhost0
2. Start contrail-
# service contrail-
[Network connectivity through vhost0 drops out at this point, so switch to a serial console]
3. Stop contrail-
# service contrail-
[Network connectivity through vhost0 continues to be down]
4. Deconfigure vhost0
# ifconfig vhost0 0.0.0.0
# ifconfig vhost0 down
5. Remove vrouter kernel module
# rmmod vrouter
At this point, the kernel crashes (see attached dump).
Changed in juniperopenstack:
assignee: nobody → Anand H. Krishnan (anandhk)
importance: Undecided → High
Changed in juniperopenstack:
status: New → In Progress
We narrowed the culprit down to VLAN tagging on the bond0 interface: switching the machines to an access port solved the problem for now.
VLAN tagging used to work for us in 1.06, so the breaking change was probably introduced in 1.10 or 1.20 (we tried 1.20 and 1.99+git+9917937-1+syseleven6, and both broke in the manner described above).
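Since the failure appears to be tied to VLAN tagging on bond0 (step 1 attaches the VLAN subinterface bond0.1621 to the vrouter), it may help to check which compute nodes match the affected setup. The following is a minimal sketch, assuming the standard Linux 8021q module and the table it exposes in /proc/net/vlan/config. The `vlan_parent` function is hypothetical and is not part of Contrail or the `vif` tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of Contrail/vif): report whether a network
# interface is an 802.1Q VLAN subinterface, using the table that the Linux
# 8021q module exposes in /proc/net/vlan/config.
#
# Usage: vlan_parent IFACE [CONFIG_FILE]
# Prints "PARENT VLAN_ID" and returns 0 if IFACE is a VLAN device;
# returns 1 otherwise, or if the table is unreadable (e.g. 8021q not loaded).
vlan_parent() {
    iface="$1"
    config="${2:-/proc/net/vlan/config}"

    # No table means the 8021q module is not loaded, so no VLAN devices exist.
    [ -r "$config" ] || return 1

    # Skip the two header lines; each remaining line has the form
    # "<vlan dev> | <vlan id> | <parent dev>" with padding around the fields.
    awk -F'|' -v dev="$iface" '
        NR > 2 {
            for (i = 1; i <= 3; i++) gsub(/[ \t]+/, "", $i)
            if ($1 == dev) { print $3, $2; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$config"
}
```

For example, on a node configured as in step 1, `vlan_parent bond0.1621` would be expected to print `bond0 1621`. On a node moved to an untagged access port (bond0 used directly), the backing interface is not listed in the table and the function returns a non-zero status.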