On an HA setup, after the control nodes were rebooted one at a time, the controller permanently deleted all configs from the QFX

Bug #1484788 reported by Anoop Kumar Sahu
This bug affects 2 people
Affects                                      Status         Importance  Assigned to                    Milestone
Juniper Openstack (status tracked in Trunk)
  R2.20                                      Fix Committed  High        Ignatious Johnson Christopher
  Trunk                                      Fix Committed  High        Ignatious Johnson Christopher

Bug Description

On an HA setup (3 CNs / 3 TSNs):

Step 1:
Rebooted CN-1 and waited for it to come back online and become active.

Step 2:
Rebooted CN-2 and waited for it to come back online and become active.

Step 3:
Rebooted CN-3 and waited for it to come back online and become active.

As a result of the three steps above, the controller deleted all configs on the QFX side. After almost 5-6 hours, the configs have still not been pushed back to the QFX ToRs, even though the ToR agent is active and all SSL connections are intact. See tor-agent-1, which has around 3K VNs. The HTTP introspect page and the ovsdb.dump file are attached (at 192.168.61.1, /var/www/html/pub).

http://10.94.63.103:9010/Snh_SandeshTraceRequest?x=Ovsdb%20Pkt

{master:0}
root@vdc-vcf-s1> show ovsdb interface
Interface VLAN ID Bridge-domain
xe-0/0/22:0
xe-0/0/22:1
xe-0/0/22:2
xe-0/0/22:3
xe-1/0/15
xe-1/0/23

{master:0}
root@vdc-vcf-s1> show ovsdb controller
VTEP controller information:
Controller IP address: 10.94.191.153
Controller protocol: ssl
Controller port: 6645
Controller connection: up
Controller seconds-since-connect: 22109
Controller seconds-since-disconnect: 22112
Controller last-eror: Connection reset by peer
Controller connection status: active

{master:0}
root@vdc-vcf-s1> show system core-dumps
fpc0:
--------------------------------------------------------------------------
/var/tmp/*core*: No such file or directory

fpc1:
--------------------------------------------------------------------------
/var/tmp/*core*: No such file or directory

{master:0}
root@vdc-vcf-s1>

root@Host1-CN1:/var/crashes# contrail-status
vRouter is NOT PRESENT

== Contrail vRouter ==
supervisor-vrouter: inactive (disabled on boot)
unix:///tmp/supervisord_vrouter.sockno

== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active

== Contrail Analytics ==
supervisor-analytics: active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active

== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager active
contrail-discovery:0 active
contrail-schema backup
contrail-svc-monitor active
ifmap active

== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active

== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active

========Run time service failures=============
/var/crashes/core.contrail-contro.2267.Host1-CN1.1439507792

root@Host5-TSN2:~# contrail-status
== Contrail vRouter ==
supervisor-vrouter: active
contrail-tor-agent-1 active
contrail-tor-agent-2 initializing (ToR:vdc-vcf-s2 connection down)
contrail-tor-agent-3 initializing (ToR:vdc-vcf-s4 connection down)
contrail-tor-agent-4 initializing (ToR:vdc-vcf-l1 connection down)
contrail-tor-agent-5 initializing (ToR:vdc-vcf-l2 connection down)
contrail-tor-agent-6 active
contrail-tor-agent-7 initializing (ToR:vdc-vcf-l7 connection down)
contrail-vrouter-agent active
contrail-vrouter-nodemgr initializing (NTP state unsynchronized.)
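The contrail-status listings above mix healthy and unhealthy services. As a rough triage aid for output like this, here is a minimal Python sketch that pulls out every service whose state is not healthy, together with the reason in parentheses. The function name `not_active` and the line format it expects (`<service> <state> [(<reason>)]`) are hypothetical, inferred only from the output in this report; it also assumes `backup` (as shown for contrail-schema on CN1) is a normal HA standby state and ignores `supervisor-*:` and `== ... ==` lines.

```python
import re
from typing import List, Tuple

# Assumed line format, inferred from the contrail-status output in this report:
#   "<service> <state>" or "<service> <state> (<reason>)"
# Service names may carry an instance suffix such as "contrail-api:0".
# "supervisor-*: <state>" lines and "== ... ==" headers do not match.
SERVICE_LINE = re.compile(
    r"^(?P<name>[\w.-]+(?::\d+)?)\s+(?P<state>\w+)(?:\s+\((?P<reason>.*)\))?$"
)

# States treated as healthy; "backup" is assumed to be a normal HA standby.
HEALTHY = {"active", "backup"}


def not_active(status_text: str) -> List[Tuple[str, str, str]]:
    """Return (service, state, reason) for every service not in a healthy state."""
    problems = []
    for line in status_text.splitlines():
        match = SERVICE_LINE.match(line.strip())
        if match and match.group("state") not in HEALTHY:
            problems.append(
                (match.group("name"), match.group("state"), match.group("reason") or "")
            )
    return problems
```

Fed the Host5-TSN2 output above, this would report tor-agents 2, 3, 4, 5 and 7 with their "connection down" reasons, plus contrail-vrouter-nodemgr with its NTP reason.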

Anoop Kumar Sahu (anoops) wrote :
Changed in juniperopenstack:
importance: Undecided → Critical
description: updated
Anoop Kumar Sahu (anoops) wrote :

[root@openflow-e ovsdb]# ./ovsdb-client dump tcp:10.94.47.65:6675 > ovsdb_dump_PR14874788.dump
[root@openflow-e ovsdb]# ftp 192.168.61.1
Connected to 192.168.61.1.
220 (vsFTPd 2.0.5)
530 Please login with USER and PASS.
530 Please login with USER and PASS.
KERBEROS_V4 rejected as an authentication type
Name (192.168.61.1:root): anonymous
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd pub
250 Directory successfully changed.
ftp> mput ovsdb_dump_PR14874788.dump
mput ovsdb_dump_PR14874788.dump?
227 Entering Passive Mode (192,168,61,1,220,16)
150 Ok to send data.
226 File receive OK.
4145 bytes sent in 0.09 seconds (45 Kbytes/s)
ftp> bye
221 Goodbye.

Anoop Kumar Sahu (anoops) wrote :

Rebooted the TSN where the ToR agent was running, and the problem disappeared.

Hari Prasad Killi (haripk) wrote :

The TSN node was missing the /etc/contrail/debs_list.txt file (an upgrade issue?). Because of this, the contrail-version command isn't working. The agent code invokes this command, and in this state the agent was getting blocked. This needs to be fixed.

tags: added: vrouter
Changed in juniperopenstack:
assignee: nobody → Ashok Singh (ashoksr)
importance: Critical → High
milestone: none → r2.30-fcs
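The comment above describes the agent blocking on contrail-version when the package list is missing. The agent's own code is not shown in this report, so purely as an illustration of a general defensive pattern, here is a minimal Python sketch that skips the call when neither package-list file is present and bounds the wait with a timeout so a hung helper cannot block the caller. The helper names and the 5-second default are hypothetical; treating rpm_list.txt as the RPM-host counterpart of debs_list.txt is an assumption drawn from the later commit message.

```python
import os
import subprocess
from typing import List, Optional

# Package-list files; debs_list.txt is named in this report, and rpm_list.txt
# is assumed to be its counterpart on RPM-based hosts.
PACKAGE_LISTS = ("debs_list.txt", "rpm_list.txt")


def package_list_present(conf_dir: str = "/etc/contrail") -> bool:
    """Return True if at least one package-list file exists in conf_dir."""
    return any(os.path.isfile(os.path.join(conf_dir, name)) for name in PACKAGE_LISTS)


def run_bounded(cmd: List[str], timeout_s: float = 5.0) -> Optional[str]:
    """Run cmd and return its stdout, or None on failure or timeout.

    subprocess.run kills the child if timeout_s expires, so the caller
    waits at most roughly timeout_s instead of blocking indefinitely.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None
    return result.stdout


def get_contrail_version(conf_dir: str = "/etc/contrail") -> Optional[str]:
    """Hypothetical wrapper: query contrail-version only when it can succeed."""
    if not package_list_present(conf_dir):
        return None  # Skip the call entirely when the package list is missing.
    return run_bounded(["contrail-version"])
```

Returning None rather than raising lets a caller fall back to "version unknown" and keep running, which is the behavior the comment asks for in place of blocking.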
Ignatious Johnson Christopher (ijohnson-x) wrote :

This file is used by contrail-status to display the packages and their
versions. It is installed by the contrail-setup package.

fab supports two modes of Contrail uninstall:

1. Full - removes all Contrail packages (users use this to install a
different version of Contrail)
2. Partial - removes all packages except contrail-install-packages,
contrail-setup and contrail-fabric-utils (users use this to reinstall the
same version of Contrail)

During a partial uninstall this file is removed. When Contrail is then
reinstalled, contrail-setup is not installed again because it is already
present, so the file is never recreated.

I am fixing the fab uninstall_contrail task so that it won't remove this
file from /etc/contrail during a partial uninstall.

Changed in juniperopenstack:
assignee: Ashok Singh (ashoksr) → Ignatious Johnson Christopher (ijohnson-x)
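The partial-uninstall rule described in this comment can be sketched as a small file-selection function. This is a minimal Python illustration, not the actual uninstall_contrail task from contrail-fabric-utils; the function name `files_to_remove` and the approach of computing a removal list for /etc/contrail are hypothetical. The two preserved file names, debs_list.txt and rpm_list.txt, are the ones named in the committed fix.

```python
import os
from typing import List

# Files that must survive a partial uninstall: contrail-setup is not
# reinstalled afterwards, so nothing would recreate them.
PRESERVE_ON_PARTIAL = {"debs_list.txt", "rpm_list.txt"}


def files_to_remove(conf_dir: str, partial: bool) -> List[str]:
    """Return the paths under conf_dir that an uninstall should delete.

    A full uninstall removes every entry; a partial uninstall keeps the
    package-list files so contrail-status and contrail-version still work
    after the same version is reinstalled.
    """
    removable = []
    for name in sorted(os.listdir(conf_dir)):
        if partial and name in PRESERVE_ON_PARTIAL:
            continue  # Keep the package lists on partial uninstall.
        removable.append(os.path.join(conf_dir, name))
    return removable
```

Keeping the preserved names in one set makes the full/partial difference a single membership check rather than a separate code path.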
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13268
Submitter: Ignatious Johnson Christopher (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13268
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/d4c08c028deb370b15cf08eaf03bb50637adaee3
Submitter: Zuul
Branch: master

commit d4c08c028deb370b15cf08eaf03bb50637adaee3
Author: Ignatious Johnson Christopher <email address hidden>
Date: Mon Aug 24 15:14:52 2015 -0700

Fixing the fab uninstall_contrail task not to remove debs_list.txt, rpm_list.txt
file from /etc/contrail in case of partial uninstall
Closes-Bug:1484788

Change-Id: Ib4edb8f1a9aac6dcae9f9c91ca1715de38164712

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22-dev

Review in progress for https://review.opencontrail.org/13293
Submitter: Ignatious Johnson Christopher (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/13294
Submitter: Ignatious Johnson Christopher (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13293
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/14a1f5832af26d024326e790eb151a8ddf1336dd
Submitter: Zuul
Branch: R2.22-dev

commit 14a1f5832af26d024326e790eb151a8ddf1336dd
Author: Ignatious Johnson Christopher <email address hidden>
Date: Mon Aug 24 15:14:52 2015 -0700

Fixing the fab uninstall_contrail task not to remove debs_list.txt, rpm_list.txt
file from /etc/contrail in case of partial uninstall
Closes-Bug:1484788

Change-Id: Ib4edb8f1a9aac6dcae9f9c91ca1715de38164712
(cherry picked from commit d4c08c028deb370b15cf08eaf03bb50637adaee3)

information type: Proprietary → Public
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/13294
Committed: http://github.org/Juniper/contrail-fabric-utils/commit/cea2b0d6829ac4eb8ce73b9bdf6307652a18e8f1
Submitter: Zuul
Branch: R2.20

commit cea2b0d6829ac4eb8ce73b9bdf6307652a18e8f1
Author: Ignatious Johnson Christopher <email address hidden>
Date: Mon Aug 24 15:14:52 2015 -0700

Fixing the fab uninstall_contrail task not to remove debs_list.txt, rpm_list.txt
file from /etc/contrail in case of partial uninstall
Closes-Bug:1484788

Change-Id: Ib4edb8f1a9aac6dcae9f9c91ca1715de38164712
(cherry picked from commit d4c08c028deb370b15cf08eaf03bb50637adaee3)
