R2.0 to R2.1 upgrade: vrouter failed to come up after upgrade

Bug #1419202 reported by Shweta Naik
This bug affects 1 person
Affects            Status         Importance  Assigned to  Milestone
Juniper Openstack  (status tracked in Trunk)
  R2.1             Fix Committed  Critical    Shweta Naik
  Trunk            Fix Committed  Critical    Shweta Naik

Bug Description

I did an upgrade from R2.0 (Ubuntu 12.04, Icehouse) to R2.10 (build 23). I have a multinode setup with 2 computes.
After the vrouter upgrade, when the reboot was issued on both computes, one of the computes failed to come up.
I am not able to ssh to it using the mgmt IP. Logged in via the console, contrail-status shows 'contrail-vrouter-agent' in initializing state, with XMPP connections down.

From the console I am not able to ping the default gateway, and I see the Discards count in dropstats increasing.
I tried restarting the discovery service and then supervisor-vrouter, but it didn't help.

From the upgrade log, this is the timestamp when the reboot was issued:

2015-02-03 18:18:11:342351: [root@10.84.14.35] Executing task 'compute_reboot'
2015-02-03 18:18:11:537787: [root@10.84.14.36] Executing task 'compute_reboot'

But on 10.84.14.35 (the compute with the issue), syslog shows a vrouter soft reset before this timestamp, at Feb 3, 18:18:08.
I also see some errors in the agent log when it is trying to add the route.

Upgrade logs are at root@10.84.14.31:/opt/contrail/utils/upgrade_contrail_2015_02_18_09_51_935032.log .

The other compute node in the cluster (10.84.14.36) is up and running.
I hit this issue some time back as well during an upgrade (not sure from which release), but when I rebooted from the console vrouter came up fine, and on a rerun I didn't hit the issue again.
I am hitting this issue intermittently. Could you please take a look? I have the setup in the failed state.

Shweta Naik (stnaik) wrote :

From: Anand H Krishnan <email address hidden>
Date: Wednesday, February 4, 2015 at 10:06 PM
To: Shweta Naik <email address hidden>, Praveen K V <email address hidden>, Hari Prasad Killi <email address hidden>
Cc: Ashish Ranjan <email address hidden>, Raj Reddy <email address hidden>, Megh Bhatt <email address hidden>, Rajagopalan Sivaramakrishnan <email address hidden>
Subject: Re: R2.1 vrouter failed to come up after upgrade

Hi,

I just looked into the setup. The default vrf routes have not been populated
and hence vrouter is dropping the packet. Somebody from agent team needs
to look at the problem. I have stopped agent for now and hence direct access
to the box is possible.

Thanks,
Anand

Shweta Naik (stnaik) wrote :

From: Rajagopalan Sivaramakrishnan <email address hidden>
Date: Wednesday, February 4, 2015 at 10:16 PM
To: Anand H Krishnan <email address hidden>, Shweta Naik <email address hidden>, Praveen K V <email address hidden>, Hari Prasad Killi <email address hidden>
Cc: Ashish Ranjan <email address hidden>, Raj Reddy <email address hidden>, Megh Bhatt <email address hidden>
Subject: Re: R2.1 vrouter failed to come up after upgrade

The default route was probably not created because of netlink errors (as seen in agent log). Perhaps there is a Sandesh mismatch as a result of agent being restarted with the upgraded version, but vrouter remaining the old version (upgrade logs mention that reboot failed).

Raja

Shweta Naik (stnaik) wrote :

From: Rajagopalan Sivaramakrishnan <email address hidden>
Date: Friday, February 6, 2015 12:58 PM
To: Anand H Krishnan <email address hidden>, Shweta Naik <email address hidden>, Ashish Ranjan <email address hidden>
Cc: Praveen K V <email address hidden>, Hari Prasad Killi <email address hidden>, Raj Reddy <email address hidden>, Megh Bhatt <email address hidden>
Subject: Re: R2.1 vrouter failed to come up after upgrade

The problem seems to be that the agent is restarted as part of the upgrade process (after the agent binary is replaced), and if it restarts successfully, network access to the box is lost when the new agent is incompatible with the old vrouter module. This appears to be timing dependent. In the working scenario, the compute node reboot is triggered before the agent restarts successfully, i.e. before it does a soft reset of vrouter. This is from 10.84.14.36 (the working compute node):

2015-02-03 18:18:07,008 INFO supervisord started with pid 4727
2015-02-03 18:18:08,010 INFO spawned: 'contrail-vrouter-nodemgr' with pid 4773
2015-02-03 18:18:08,010 INFO spawned: 'contrail-vrouter-agent' with pid 4774

Reboot was triggered at 18:18:11 (before the agent had been up for the startsecs value of 5 seconds).

The failed compute node (10.84.14.35) has similar logs; however, the following is printed in supervisord-vrouter.log at 18:18:07:

contrail-vrouter-agent entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)

Also, syslog indicates that vrouter soft reset was performed at 18:18:08 i.e. 3 seconds before compute node reboot was issued. However, the reboot command never reached the compute node because of loss of network connectivity (because new agent is incompatible with old vrouter module).
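
The timing argument above can be checked directly from the quoted log lines. The following is only a sketch: the helper names are made up for illustration, the timestamp layouts are inferred from the two log excerpts in this thread, and it assumes the upgrade host and the compute node have synchronized clocks.

```python
from datetime import datetime, timedelta

# startsecs value quoted in the thread (supervisord's threshold for RUNNING).
STARTSECS = timedelta(seconds=5)


def parse_supervisord_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp of a supervisord log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def parse_upgrade_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS:uuuuuu' stamp of an upgrade log line."""
    return datetime.strptime(line[:26], "%Y-%m-%d %H:%M:%S:%f")


def reboot_beat_agent(spawn_line: str, reboot_line: str) -> bool:
    """Return True if the reboot was issued before the agent had been up for STARTSECS."""
    uptime = parse_upgrade_ts(reboot_line) - parse_supervisord_ts(spawn_line)
    return uptime < STARTSECS


# Log lines copied from this thread for the working node, 10.84.14.36.
spawn = "2015-02-03 18:18:08,010 INFO spawned: 'contrail-vrouter-agent' with pid 4774"
reboot = "2015-02-03 18:18:11:537787: [root@10.84.14.36] Executing task 'compute_reboot'"
print(reboot_beat_agent(spawn, reboot))  # True: the agent had been up for about 3.5 s
```

On the working node the reboot arrived roughly 3.5 seconds after the agent was spawned, inside the 5 second startsecs window, which matches the explanation that the reboot won the race there.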

The right fix would be for agent to not do a vrouter soft reset if the versions are mismatched (so vrouter xconnect ensures that network connectivity is not lost). Also, upgrade can avoid restarting the agent after the agent binary is replaced with the new version (however, we will still have a problem if agent crashes after the agent binary is replaced, so it would be better if the agent avoids soft reset due to version mismatch).

Raja
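
The fix proposed above (skip the soft reset when agent and vrouter versions do not match) could look roughly like the sketch below. This is not agent code: the function names and the idea of comparing build strings for equality are assumptions made for illustration, and the thread itself notes that no mechanism for detecting incompatibility exists yet. The "2.0" string is a placeholder, not a real build id.

```python
def may_soft_reset(agent_build: str, vrouter_build: str) -> bool:
    """Allow a vrouter soft reset only when both sides report the same build.

    Exact equality is a deliberately strict stand-in for a real
    compatibility check, which the thread says does not exist yet.
    """
    return agent_build == vrouter_build


def startup_action(agent_build: str, vrouter_build: str) -> str:
    """Pick what a (hypothetical) agent should do with vrouter at startup."""
    if may_soft_reset(agent_build, vrouter_build):
        return "soft-reset"
    # On mismatch, leave vrouter alone so its xconnect keeps the box reachable.
    return "keep-xconnect"


# Upgraded agent against an old kernel module: do not reset.
print(startup_action("2.10-29", "2.0"))
# Agent and module from the same build: reset as usual.
print(startup_action("2.10-29", "2.10-29"))
```

The point of the guard is that a mismatched agent never touches the old module, so the pending reboot command can still reach the node.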

Shweta Naik (stnaik) wrote :

From: Hari Prasad Killi <email address hidden>
Date: Friday, February 6, 2015 at 3:37 AM
To: Rajagopalan Sivaramakrishnan <email address hidden>, Anand H Krishnan <email address hidden>, Shweta Naik <email address hidden>, Ashish Ranjan <email address hidden>
Cc: Praveen K V <email address hidden>, Raj Reddy <email address hidden>, Megh Bhatt <email address hidden>
Subject: Re: R2.1 vrouter failed to come up after upgrade

Hi Shweta,
As we currently don't have a mechanism to identify incompatibility between vrouter and agent (within a release or across releases), could we use the --no-restart-on-upgrade option of dh_installinit for supervisor-vrouter for now? Can you check if this works?

Regards,
Hari

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/7175
Committed: http://github.org/Juniper/contrail-packages/commit/cf65334c00ae2d9ffc0f3ed36c5c0a39491e2e7b
Submitter: Zuul
Branch: R2.1

commit cf65334c00ae2d9ffc0f3ed36c5c0a39491e2e7b
Author: Shweta Naik <email address hidden>
Date: Fri Feb 6 17:27:52 2015 -0800

Adding --no-restart-on-upgrade option of dh_installinit for supervisor-vrouter to prevent
agent from restarting when doing upgrade.
closed bug: #1419202

Change-Id: Iab2408f9702ae9784b1dc4540a5ae50a2be128ec

Shweta Naik (stnaik)
summary: - R2.0 to R2.1 upgrade: grouter failed to come up after upgrade
+ R2.0 to R2.1 upgrade: vrouter failed to come up after upgrade

wenqing liang (wliang)
tags: added: blocker

wenqing liang (wliang) wrote :

Hi Shweta,
This is still happening in the R2.0 upgrade to the latest R2.1 build (#29) as well. The bug needs to be reopened.

root@cmbu-vse2100-10:~# contrail-version
Package                                Version                        Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-fabric-utils                  2.10-29                        29
contrail-install-packages              2.10-29~icehouse               29
contrail-lib                           2.10-29                        29
contrail-nodemgr                       2.10-29                        29
contrail-nova-vif                      2.10-29                        29
contrail-openstack-vrouter             2.10-29                        29
contrail-setup                         2.10-29                        29
contrail-utils                         2.10-29                        29
contrail-vrouter-3.13.0-34-generic     2.10-29                        29
contrail-vrouter-agent                 2.10-29                        29
contrail-vrouter-common                2.10-29                        29
contrail-vrouter-init                  2.10-29                        29
contrail-vrouter-utils                 2.10-29                        29
nova-common                            1:2014.1.3-0ubuntu1~cloud0.2contrail29
nova-compute                           1:2014.1.3-0ubuntu1~cloud0.2contrail29
nova-compute-kvm                       1:2014.1.3-0ubuntu1~cloud0.2contrail29
nova-compute-libvirt                   1:2014.1.3-0ubuntu1~cloud0.2contrail29
python-contrail                        2.10-29                        29
python-contrail-vrouter-api            2.10-29                        29
python-neutronclient                   2:2.3.4-0ubuntu1.2contrail     29
python-nova                            1:2014.1.3-0ubuntu1~cloud0.2contrail29
python-opencontrail-vrouter-netns      2.10-29                        29
root@cmbu-vse2100-10:~# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent initializing (XMPP:control-node:10.87.129.173, XMPP:dns-server:10.87.129.173, Collector, Discovery:Collector, Discovery:dns-server, Discovery:xmpp-server connection down)
contrail-vrouter-nodemgr active

== Contrail Storage ==
contrail-storage-stats: active
root@cmbu-vse2100-10:~#
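
For a post-upgrade check, output like the above can be scanned for services that are not active. This is a rough sketch only: the line format is inferred from the contrail-status paste in this comment (name, optional colon, state, optional parenthesized detail), the function name is made up, and the sample text is abbreviated from that paste.

```python
import re

# name, optional ':', state, optional '(detail)'; inferred from the paste above.
STATUS_LINE = re.compile(
    r"^(?P<name>[\w-]+):?\s+(?P<state>\w+)(?:\s+\((?P<detail>.*)\))?\s*$"
)


def non_active_services(status_text: str) -> dict:
    """Map each service that is not 'active' to a (state, detail) tuple."""
    problems = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        # Skip blank lines and section headers such as '== Contrail vRouter =='.
        if not line or line.startswith("=="):
            continue
        match = STATUS_LINE.match(line)
        if match and match.group("state") != "active":
            problems[match.group("name")] = (
                match.group("state"),
                match.group("detail") or "",
            )
    return problems


sample = """== Contrail vRouter ==
supervisor-vrouter: active
contrail-vrouter-agent initializing (XMPP:control-node:10.87.129.173, Collector connection down)
contrail-vrouter-nodemgr active
"""
for name, (state, detail) in non_active_services(sample).items():
    print(f"{name}: {state} ({detail})")
```

A non-empty result after the compute reboot would flag the node as still being in the failed state described in this bug.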

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/7855
Committed: http://github.org/Juniper/contrail-packages/commit/dc6afd7fdde756f0e23ef1a58f77e52fa852e741
Submitter: Zuul
Branch: master

commit dc6afd7fdde756f0e23ef1a58f77e52fa852e741
Author: Shweta Naik <email address hidden>
Date: Wed Feb 25 17:38:28 2015 -0800

In upgrade python-requests pkg is not getting upgraded.
clloses bug: #1424086
Adding --no-restart-on-upgrade option of dh_installinit for supervisor-vrouter to prevent
agent from restarting when doing upgrade.
closed bug: #1419202

Change-Id: I967d6699a1d9f7d08da9d4bcc5822b087309aa2e

information type: Proprietary → Public