Cassandra state detected DOWN after restoring database.

Bug #1707594 reported by vijaya kumar shankaran
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series  Status         Importance  Assigned to             Milestone
R3.0    Fix Committed  High        vijaya kumar shankaran
R3.1    Fix Committed  High        vijaya kumar shankaran
R3.2    Fix Committed  High        vijaya kumar shankaran
R4.0    Fix Committed  High        vijaya kumar shankaran
Trunk   Fix Committed  High        vijaya kumar shankaran

Bug Description

The customer is running 3.1.3.0-79 (testbed.py attached). The customer took a database backup and then restored the database. The database runs on openc-53, openc-54 and openc-55.

Node      testbed reference  Mgmt IP       control + data
openc-53  adb21              172.23.1.130  172.23.10.195
openc-54  adb22              172.23.1.131  172.23.10.208
openc-55  adb23              172.23.1.132  172.23.10.209

Backup commands:
fab backup_cassandra_db
fab backup_zookeeper_data

Restore commands:
fab restore_zookeeper_data
fab restore_cassandra_db

After restoring the DB, the status was checked with nodetool on each node:
juniper@openc-53:/var/log/cassandra$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 172.23.10.209 4.66 GB 256 ? bcb33d2b-d1aa-4a1b-b114-f339a457c8f1 rack1 <<<<<<<<<< HERE
UN 172.23.10.208 5.25 GB 256 ? 24009e54-557b-4567-b872-d3d06aa9aa4e rack1
UN 172.23.10.195 5.66 GB 256 ? 74346b2b-45fc-4383-8eb6-bc54d0f0f243 rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

juniper@openc-54:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 172.23.10.209 4.66 GB 256 ? bcb33d2b-d1aa-4a1b-b114-f339a457c8f1 rack1 <<<<<<<<<< HERE
UN 172.23.10.208 5.51 GB 256 ? 24009e54-557b-4567-b872-d3d06aa9aa4e rack1
DN 172.23.10.195 5.55 GB 256 ? 74346b2b-45fc-4383-8eb6-bc54d0f0f243 rack1 <<<<<<<<<< HERE

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

juniper@openc-55:~$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 172.23.10.209 4.67 GB 256 ? bcb33d2b-d1aa-4a1b-b114-f339a457c8f1 rack1
UN 172.23.10.208 5.06 GB 256 ? 24009e54-557b-4567-b872-d3d06aa9aa4e rack1
UN 172.23.10.195 5.36 GB 256 ? 74346b2b-45fc-4383-8eb6-bc54d0f0f243 rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
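
The three outputs above disagree about which peers are down, so it can help to compare them mechanically. The following is a minimal Python sketch, not part of the original report, that pulls the addresses marked D (down) out of pasted `nodetool status` text. The function name `down_nodes` and the regular expression are illustrative assumptions; the sample rows are copied from the openc-54 output.

```python
import re

# A node row starts with a two-letter code: status (U/D) then state (N/L/J/M),
# followed by the node address. Header and note lines do not match.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)\s")


def down_nodes(status_text):
    """Return the addresses that the pasted `nodetool status` text marks as down."""
    down = []
    for line in status_text.splitlines():
        match = NODE_ROW.match(line.strip())
        if match and match.group(1).startswith("D"):
            down.append(match.group(2))
    return down


# Rows as seen from openc-54 in this report.
sample = """\
Datacenter: datacenter1
Status=Up/Down
DN 172.23.10.209 4.66 GB 256 ? bcb33d2b-d1aa-4a1b-b114-f339a457c8f1 rack1
UN 172.23.10.208 5.51 GB 256 ? 24009e54-557b-4567-b872-d3d06aa9aa4e rack1
DN 172.23.10.195 5.55 GB 256 ? 74346b2b-45fc-4383-8eb6-bc54d0f0f243 rack1
"""

print(down_nodes(sample))  # ['172.23.10.209', '172.23.10.195']
```

Running the same function on the output of each node makes the asymmetry visible: openc-55 sees every peer as up, while openc-53 and openc-54 do not.
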

From /var/log/cassandra/system.log
openc-53
INFO [HANDSHAKE-/172.23.10.209] 2017-07-25 09:17:13,672 OutboundTcpConnection.java:505 - Handshaking version with /172.23.10.209
INFO [HANDSHAKE-/172.23.10.208] 2017-07-25 09:30:42,982 OutboundTcpConnection.java:505 - Handshaking version with /172.23.10.208

INFO [HANDSHAKE-/172.23.10.195] 2017-07-25 09:43:08,155 OutboundTcpConnection.java:505 - Handshaking version with /172.23.10.195

INFO [GossipStage:1] 2017-07-25 16:07:10,274 Gossiper.java:995 - InetAddress /172.23.10.209 is now DOWN
WARN [CompactionExecutor:1409] 2017-07-25 16:07:12,867 SSTableReader.java:259 - Reading cardinality from Statistics.db failed for /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/la-541-big-Data.db
ERROR [NonPeriodicTasks:1] 2017-07-25 16:07:12,888 SSTableDeletingTask.java:83 - Unable to delete /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/la-541-big-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [NonPeriodicTasks:1] 2017-07-25 16:07:12,888 SSTableDeletingTask.java:83 - Unable to delete /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/la-542-big-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [NonPeriodicTasks:1] 2017-07-25 16:07:46,745 SSTableDeletingTask.java:83 - Unable to delete /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/la-541-big-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [NonPeriodicTasks:1] 2017-07-25 16:07:46,746 SSTableDeletingTask.java:83 - Unable to delete /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/la-542-big-Data.db (it will be removed on server restart; we'll also retry after GC)
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:10:57,852 StorageService.java:457 - Stopping gossiper
WARN [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:10:57,852 StorageService.java:363 - Stopping gossip by operator request
INFO [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:10:57,852 Gossiper.java:1449 - Announcing shutdown
INFO [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:10:57,853 StorageService.java:1922 - Node /172.23.10.195
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:10:59,916 CommitLog.java:486 - Failed managing commit log segments. Commit disk failure policy is stop; terminating thread
java.lang.AssertionError: attempted to delete non-existing file CommitLog-5-1500607922799.log
 at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:122) ~[apache-cassandra-2.2.5.jar:2.2.5]
 at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:149) ~[apache-cassandra-2.2.5.jar:2.2.5]
 at org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:314) ~[apache-cassandra-2.2.5.jar:2.2.5]
 at org.apache.cassandra.db.commitlog.CommitLogSegmentManager$2.run(CommitLogSegmentManager.java:380) ~[apache-cassandra-2.2.5.jar:2.2.5]
 at org.apache.cassandra.db.commitlog.CommitLogSegmentManager$1.runMayThrow(CommitLogSegmentManager.java:155) ~[apache-cassandra-2.2.5.jar:2.2.5]
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.2.5.jar:2.2.5]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_121]
openc-54

INFO [GossipStage:1] 2017-07-25 16:07:10,273 Gossiper.java:995 - InetAddress /172.23.10.209 is now DOWN
INFO [GossipStage:1] 2017-07-25 16:10:57,874 Gossiper.java:995 - InetAddress /172.23.10.195 is now DOWN

INFO [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:14:39,352 StorageService.java:1922 - Node /172.23.10.208 state jump to shutdown
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:14:41,373 StorageService.java:462 - Stopping RPC server

openc-55
INFO [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:07:10,256 StorageService.java:1922 - Node /172.23.10.209 state jump to shutdown
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:07:12,274 StorageService.java:462 - Stopping RPC server
INFO [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:07:12,275 ThriftServer.java:142 - Stop listening to thrift clients
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:07:12,276 StorageService.java:467 - Stopping native transport
INFO [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:07:12,283 Server.java:218 - Stop listening for CQL clients
ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:07:12,291 CommitLog.java:486 - Failed managing commit log segments. Commit disk failure policy is stop; terminating thread
java.lang.AssertionError: attempted to delete non-existing file CommitLog-5-1500607929087.log
 at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:122) ~[apache-cassandra-2.2.5.jar:2.2.5]
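
To line up the events across the three nodes, the log excerpts above can be reduced to a short timeline. This is a hedged Python sketch, assuming the `system.log` line layout shown in this report (level, thread in brackets, timestamp, message). The names `summarize`, `LOG_LINE` and the event labels are made up for illustration and are not Cassandra or Contrail identifiers.

```python
import re

# Layout assumed from the excerpts above: LEVEL [thread] timestamp message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(?P<rest>.*)$"
)
PEER_DOWN = re.compile(r"InetAddress /(\S+) is now DOWN")
COMMITLOG_FAIL = "Failed managing commit log segments"


def summarize(lines):
    """Return (timestamp, event, detail) tuples for the two failure signatures."""
    events = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if not parsed:
            continue  # stack-trace lines and blanks are skipped
        rest = parsed.group("rest")
        peer = PEER_DOWN.search(rest)
        if peer:
            events.append((parsed.group("ts"), "peer-down", peer.group(1)))
        elif COMMITLOG_FAIL in rest:
            events.append((parsed.group("ts"), "commitlog-failure", ""))
    return events


# Two lines copied from the openc-53 excerpt.
sample_lines = [
    "INFO [GossipStage:1] 2017-07-25 16:07:10,274 Gossiper.java:995 - "
    "InetAddress /172.23.10.209 is now DOWN",
    "ERROR [COMMIT-LOG-ALLOCATOR] 2017-07-25 16:10:59,916 CommitLog.java:486 - "
    "Failed managing commit log segments. Commit disk failure policy is stop; "
    "terminating thread",
]

for event in summarize(sample_lines):
    print(event)
```

In the excerpts in this report, each node logs its own commit-log failure shortly after it announces shutdown, and its peers log the matching "is now DOWN" line at about the same time.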

Contrail-status (After restore)
contrail-status : openc-53 (172.23.1.130)
-----------------------------------------------------------------------------
== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr initializing (Cassandra state detected DOWN.)
kafka active

contrail-status : openc-54 (172.23.1.131)
-----------------------------------------------------------------------------
== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr initializing (Cassandra state detected DOWN.)
kafka active

contrail-status : openc-55 (172.23.1.132)
-----------------------------------------------------------------------------
== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr initializing (Cassandra state detected DOWN.)
kafka active

Contrail-status (before restore)
contrail-status : openc-53 (172.23.1.130)
-------------------------------------------
== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr active
kafka active

contrail-status : openc-54 (172.23.1.131)
-------------------------------------------
== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr active
kafka active

contrail-status : openc-55 (172.23.1.132)
-------------------------------------------
== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr active
kafka active
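
The difference between the two contrail-status captures is a single service per node. As a rough illustration only, the Python sketch below lists every service whose state is not `active` in pasted contrail-status text. The parsing rule (service name, optional colon, state, optional parenthesised detail) is an assumption based on the output shown in this report, and `unhealthy` is a hypothetical helper name.

```python
import re

# Assumed row shape: "<service>[:] <state> [(detail)]".
# Section headers ("== ... =="), dashed rules and the "contrail-status : host"
# banner do not match this pattern.
SERVICE_ROW = re.compile(
    r"^(?P<name>[\w-]+):?\s+(?P<state>\w+)(?:\s+\((?P<detail>.*)\))?$"
)


def unhealthy(status_text):
    """Return (service, state, detail) for every service that is not active."""
    problems = []
    for line in status_text.splitlines():
        row = SERVICE_ROW.match(line.strip())
        if row and row.group("state") != "active":
            problems.append(
                (row.group("name"), row.group("state"), row.group("detail") or "")
            )
    return problems


# After-restore output for openc-53, copied from this report.
after_restore = """\
contrail-status : openc-53 (172.23.1.130)
-----------------------------------------------------------------------------
== Contrail Database ==
contrail-database: active

== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr initializing (Cassandra state detected DOWN.)
kafka active
"""

print(unhealthy(after_restore))
```

On the before-restore capture the same function returns an empty list, since every service there is reported as active.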

Tags: analytics
information type: Proprietary → Public
vijaya kumar shankaran (vijayks) wrote :

logs uploaded to 10.219.48.123:/home/vijayks/2017-0725-1153/2017-0725-1153_20170726.zip

Changed in juniperopenstack:
assignee: nobody → Megh Bhatt (meghb)
milestone: none → r3.1.4.0
importance: Undecided → High
Jeba Paulaiyan (jebap)
tags: added: analytics
vijaya kumar shankaran (vijayks) wrote :

Hi Megh,

Could you please provide us an update?

Best Regards,
Vijay Kumar

Megh Bhatt (meghb) wrote :

Hi Vijay Kumar,

Please provide credentials to access the logs. I was not able to log on to 10.219.48.123 to access the logs.

Thanks

Megh

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/34421
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/34428
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/34429
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/34430
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/34432
Submitter: Megh Bhatt (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/34421
Committed: http://github.com/Juniper/contrail-fabric-utils/commit/eff9abd3c90c64e08fb658b6ebb69672fde5eca7
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit eff9abd3c90c64e08fb658b6ebb69672fde5eca7
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 15:43:54 2017 -0700

Include contrail-database/cassandra in fab start/stop database

Change-Id: I96be7d959d917fbdba8c020ad158c8afd4e4e6d3
Closes-Bug: #1707594

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34428
Committed: http://github.com/Juniper/contrail-fabric-utils/commit/ea0a11cc76c96e3af0fd89925a1b13f66ba56a94
Submitter: Zuul (<email address hidden>)
Branch: R3.0

commit ea0a11cc76c96e3af0fd89925a1b13f66ba56a94
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 15:43:54 2017 -0700

Include contrail-database/cassandra in fab start/stop database

Change-Id: I96be7d959d917fbdba8c020ad158c8afd4e4e6d3
Closes-Bug: #1707594

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34430
Committed: http://github.com/Juniper/contrail-fabric-utils/commit/552a96b8bbe85da55a8bcf2478f06eff251f3b39
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 552a96b8bbe85da55a8bcf2478f06eff251f3b39
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 15:43:54 2017 -0700

Include contrail-database/cassandra in fab start/stop database

Change-Id: I96be7d959d917fbdba8c020ad158c8afd4e4e6d3
Closes-Bug: #1707594

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34432
Committed: http://github.com/Juniper/contrail-fabric-utils/commit/779204501eaf312c16f73181853e5fdff27918a2
Submitter: Zuul (<email address hidden>)
Branch: master

commit 779204501eaf312c16f73181853e5fdff27918a2
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 15:43:54 2017 -0700

Include contrail-database/cassandra in fab start/stop database

Change-Id: I96be7d959d917fbdba8c020ad158c8afd4e4e6d3
Closes-Bug: #1707594

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/34429
Committed: http://github.com/Juniper/contrail-fabric-utils/commit/2702bb1c5f045eff2bf8399b6778466014d069d2
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 2702bb1c5f045eff2bf8399b6778466014d069d2
Author: Megh Bhatt <email address hidden>
Date: Wed Aug 9 15:43:54 2017 -0700

Include contrail-database/cassandra in fab start/stop database

Change-Id: I96be7d959d917fbdba8c020ad158c8afd4e4e6d3
Closes-Bug: #1707594

vijaya kumar shankaran (vijayks) wrote :

Hi Megh,

I see that this is fixed. Could you please let us know which build contains the fix?

Best Regards,
Vijay Kumar

vijaya kumar shankaran (vijayks) wrote :

Hi Megh,

The customer would like to know in which release the Cassandra DB was moved from the supervisor-database service to the contrail-database service.

Best Regards,
Vijay Kumar
