contrail-vrouter-agent freezes vhost0 interface

Bug #1401880 reported by Johannes Grassler
This bug affects 3 people
Affects            Status         Importance  Assigned to        Milestone
Juniper Openstack  Fix Committed  High        Anand H. Krishnan
R1.1               Fix Committed  Undecided   Anand H. Krishnan

Bug Description

Summary
=======

 We are observing a complete network freeze on the vhost0 interface of Contrail
 compute nodes once contrail-vrouter-agent is started. The vhost0 interface
 still exists and retains its IP address and routing table entries, but all
 connections time out. This remains the case even after stopping
 contrail-vrouter-agent. Interfaces other than vhost0 are not affected.

 Additionally, we experience a reproducible kernel crash when attempting to
 recover by unloading the vrouter kernel module.

Environment
===========

  Hardware: HP ProLiant DL380p Gen8, Broadcom (bnx2x) NICs. The bnx2x NICs are
  slaved together in a bond0 interface that is used as vhost0's backing
  interface.

  System: Ubuntu 14.04 LTS
  Kernel: 3.13.0-41-generic

  Contrail packages:

    | ii contrail-lib 1.20-1+syseleven21 amd64 OpenContrail libraries
    | ii contrail-nova-driver 1.20-1+syseleven21 amd64 OpenStack Nova compute-node driver for OpenContrail
    | ii contrail-vrouter-agent 1.20-1+syseleven21 amd64 OpenContrail vrouter agent
    | ii contrail-vrouter-dkms 1.20-1+syseleven21 all OpenContrail VRouter - DKMS version
    | ii contrail-vrouter-utils 1.20-1+syseleven21 amd64 OpenContrail VRouter - Utilities
    | ii python-backports.ssl-match-hostname 3.4.0.2-1contrail1 all The ssl.match_hostname() function from Python 3.4
    | ii python-certifi 1.0.1-1contrail1 all Python SSL Certificates
    | ii python-contrail 1.20-1+syseleven21 all OpenContrail python-libs
    | ii python-contrail-vrouter-api 1.20-1+syseleven21 all OpenContrail vrouter agent api
    | ii python-geventhttpclient 1.1.0-1contrail1 amd64 http client library for gevent
    | ii python-opencontrail-vrouter-netns 1.20-1+syseleven21 amd64 OpenContrail vrouter network namespace package
    | ii python-pycassa 1.11.0-1contrail2 all Client library for Apache Cassandra

Steps to reproduce
==================

Note: we have only been able to reproduce this problem on one of our Contrail
instances. Another Contrail instance, running on HP Gen9 machines with the same
package versions and a virtually identical configuration (except that it does
not use VLAN tagging on bond0), is not affected.

1. Bring up vhost0 interface (on bond0):

      # vif --create vhost0 --mac $(cat /sys/class/net/bond0/address)
      # vif --add bond0.1621 --mac $(cat /sys/class/net/bond0/address) --vrf 0 --vhost-phys x --type physical
      # vif --add vhost0 --mac $(cat /sys/class/net/bond0/address) --vrf 0 --
      # dhclient vhost0

2. Start contrail-vrouter-agent

      # service contrail-vrouter-agent start

[Network connectivity through vhost0 drops out at this point, so switch to a serial console]

3. Stop contrail-vrouter-agent

      # service contrail-vrouter-agent stop

[Network connectivity through vhost0 continues to be down]

4. Deconfigure vhost0

      # ifconfig vhost0 0.0.0.0
      # ifconfig vhost0 down

5. Remove vrouter kernel module

      # rmmod vrouter

At this point the kernel crash happens (see attached dump).

Johannes Grassler (jgr-launchpad) wrote :
information type: Proprietary → Public
Johannes Grassler (jgr-launchpad) wrote :

We narrowed the culprit down to the VLAN tagging on the bond0 interface: switching the machines to an access port solved the problem for now.

VLAN tagging used to work for us in 1.06, so the breaking change was probably introduced in 1.10 or 1.20 (we tried 1.20 and 1.99+git+9917937-1+syseleven6, and both broke in the manner described above).
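
To check whether a node's vhost0 backing interface is a VLAN sub-interface (the condition identified above), one could inspect the kernel's 802.1Q device table. The following is a minimal Python sketch, assuming the 8021q module exposes /proc/net/vlan/config in its usual "device | VLAN ID | parent" layout. The function name parse_vlan_config and the SAMPLE text are illustrative assumptions, not output captured from the affected machines.

```python
def parse_vlan_config(text):
    """Parse the contents of /proc/net/vlan/config.

    Returns a dict mapping each VLAN device name to a
    (vlan_id, parent_device) tuple. Header lines are skipped
    because they do not have three '|'-separated fields with a
    numeric VLAN ID in the middle.
    """
    entries = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 3 and fields[1].isdigit():
            entries[fields[0]] = (int(fields[1]), fields[2])
    return entries


# Illustrative sample only (assumed layout, hypothetical content).
SAMPLE = (
    "VLAN Dev name\t | VLAN ID\n"
    "Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD\n"
    "bond0.1621     | 1621  | bond0\n"
)

vlans = parse_vlan_config(SAMPLE)
for device, (vlan_id, parent) in vlans.items():
    print(f"{device}: VLAN {vlan_id} on {parent}")
```

On a real node the text would come from reading /proc/net/vlan/config. An empty result would suggest the backing interface is untagged, as on the unaffected instance.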

Pedro Marques (5-roque)
Changed in juniperopenstack:
assignee: nobody → Anand H. Krishnan (anandhk)
importance: Undecided → High
Changed in juniperopenstack:
status: New → In Progress
Anand H. Krishnan (anandhk) wrote :

Problem understood

OpenContrail Admin (ci-admin-f) wrote : R1.10

Review in progress for https://review.opencontrail.org/8531
Submitter: Anand H. Krishnan (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/8531
Committed: http://github.org/Juniper/contrail-vrouter/commit/8532c2e37ac0a0e24287a5a0a0deb00ec294997e
Submitter: Zuul
Branch: R1.10

commit 8532c2e37ac0a0e24287a5a0a0deb00ec294997e
Author: Anand H. Krishnan <email address hidden>
Date: Mon Mar 23 12:41:41 2015 +0530

Strip vlan tags before sending to vhost interface

Packets that are sent to vhost with tags will be dropped by the OS.
Hence, tags need to be stripped from such packets before sending it
to vhost interface.

Change-Id: I7d8ee96ee392b7f2ab61d5aca437c4124556e5d0
Closes-Bug: #1401880
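
For readers unfamiliar with what "stripping" a tag involves, here is a minimal Python sketch of removing an 802.1Q header from a raw Ethernet frame. It is not the code from the commit above (the vrouter module is not written in Python), and the function name strip_vlan_tag is hypothetical. It assumes a single tag with the standard TPID 0x8100 located directly after the destination and source MAC addresses.

```python
import struct

ETH_P_8021Q = 0x8100   # standard 802.1Q tag protocol identifier (TPID)
MAC_ADDRS_LEN = 12     # destination MAC (6 bytes) + source MAC (6 bytes)
VLAN_TAG_LEN = 4       # TPID (2 bytes) + tag control information (2 bytes)


def strip_vlan_tag(frame: bytes) -> bytes:
    """Return the frame without its outer 802.1Q tag.

    Frames that are too short to hold a tag, or that are not tagged,
    are returned unchanged.
    """
    if len(frame) < MAC_ADDRS_LEN + VLAN_TAG_LEN + 2:
        return frame
    (tpid,) = struct.unpack_from("!H", frame, MAC_ADDRS_LEN)
    if tpid != ETH_P_8021Q:
        return frame
    # Drop the 4 tag bytes; the inner EtherType now follows the MACs.
    return frame[:MAC_ADDRS_LEN] + frame[MAC_ADDRS_LEN + VLAN_TAG_LEN:]


# Build an example frame tagged with VLAN ID 1621 carrying IPv4 (0x0800).
dst = bytes.fromhex("ffffffffffff")
src = bytes.fromhex("020000000001")
payload = b"example-payload"
tagged = dst + src + struct.pack("!HH", ETH_P_8021Q, 1621) + struct.pack("!H", 0x0800) + payload
untagged = dst + src + struct.pack("!H", 0x0800) + payload

stripped = strip_vlan_tag(tagged)
```

In this sketch a frame that is already untagged passes through unchanged, so applying the function to every frame headed for the vhost interface would be harmless.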

Anand H. Krishnan (anandhk) wrote :
Changed in juniperopenstack:
status: In Progress → Fix Committed