this LP is opened to involve BU to confirm/explain if the below observation during lab replication to ATT's P1 case 2018-0303-0029 indicate a problem or not.
ATT is still chasing the cause We also need to update ATT with our effort and findings.
test steps:
* stop cassandra db (node2 is not in cluster now)
* removed /var/lib/cassandra/data/ (node2 db is cleared now)
* start cassandra db (node2 join back into cluster now)
test result:
it was noticed after node2 came online and joined cluster, config data from all
nodes will also disappear and restore back after at least one hour.
the replication factor configured for the keyspaces under observation is 3.
also this is quite reproducible.
cqlsh> SELECT * FROM system.schema_keyspaces;
keyspace_name | durable_writes | strategy_class | strategy_options
----------------------+----------------+---------------------------------------------+----------------------------
ContrailAnalyticsCql | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"2"}
svc_monitor_keyspace | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"3"}
to_bgp_keyspace | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"3"}
system | True | org.apache.cassandra.locator.LocalStrategy | {}
config_db_uuid | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"3"}
dm_keyspace | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"3"}
system_traces | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"2"}
useragent | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"3"}
DISCOVERY_SERVER | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"3"}
below are observations during the test.
#### before test everything look good on all nodes
root@cont45:~# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active
== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager active
contrail-discovery:0 active
contrail-schema active
contrail-svc-monitor active
ifmap active
== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active
== Contrail Database ==
contrail-database: active
== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr active
kafka active
== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active
#### in one node stop cassandra process and remove the hard disk data folder
root@cont45:~# service contrail-database stop
root@cont45:~# nodetool status
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused'.
root@cont45:~# mv /var/lib/cassandra/data/ /var/lib/cassandra/data.ori
root@cont45:/var/lib/cassandra# mv commitlog/ commitlog.ori
root@cont45:/var/lib/cassandra# mv saved_caches/ saved_caches.ori
now we see same the errors message (as seen in ATT logs)
root@cont45:~# zgrep -HiF "Data doesn't exist" /var/log/cassandra/system.log.*.zip
/var/log/cassandra/system.log.1.zip:ERROR [main] 2018-04-17 10:13:32,556 CassandraDaemon.java:264 - Directory /var/lib/cassandra/data doesn't exist
root@kvm789:~/cases/2018-0303-0029# zgrep -HiF "Directory /var/lib/contrail_db/data doesn't exist" varlog0*/log/cassandra/system.log.*.zip
varlog02/log/cassandra/system.log.3.zip:ERROR [main] 2018-02-17 18:54:34,706 CassandraDaemon.java:264 - Directory /var/lib/contrail_db/data doesn't exist
varlog02/log/cassandra/system.log.3.zip:ERROR [main] 2018-02-24 04:01:19,011 CassandraDaemon.java:264 - Directory /var/lib/contrail_db/data doesn't exist
varlog02/log/cassandra/system.log.1.zip:ERROR [main] 2018-02-25 01:31:05,838 CassandraDaemon.java:264 - Directory /var/lib/contrail_db/data doesn't exist
varlog02/log/cassandra/system.log.1.zip:ERROR [main] 2018-02-27 23:16:32,668 CassandraDaemon.java:264 - Directory /var/lib/contrail_db/data doesn't exist
#### start cassandra
root@cont45:/var/lib/cassandra# service contrail-database start
root@cont45:/var/lib/cassandra# service contrail-database-nodemgr restart
contrail-database-nodemgr: stopped
contrail-database-nodemgr: started
#### everything came back
root@cont45:/var/lib/cassandra# contrail-status
== Contrail Control ==
supervisor-control: active
contrail-control active
contrail-control-nodemgr active
contrail-dns active
contrail-named active
== Contrail Analytics ==
supervisor-analytics: active
contrail-alarm-gen:0 active
contrail-analytics-api active
contrail-analytics-nodemgr active
contrail-collector active
contrail-query-engine active
contrail-snmp-collector active
contrail-topology active
== Contrail Config ==
supervisor-config: active
contrail-api:0 active
contrail-config-nodemgr active
contrail-device-manager active
contrail-discovery:0 active
contrail-schema active
contrail-svc-monitor active
ifmap active
== Contrail Web UI ==
supervisor-webui: active
contrail-webui active
contrail-webui-middleware active
== Contrail Database ==
contrail-database: active
== Contrail Supervisor Database ==
supervisor-database: active
contrail-database-nodemgr active
kafka active
== Contrail Support Services ==
supervisor-support-service: active
rabbitmq-server active
#### data loss in node2
root@cont45:/var/lib/cassandra# cqlsh 10.169.4.5
Connected to Contrail at 10.169.4.5:9042.
[cqlsh 5.0.1 | Cassandra 2.1.13 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
cqlsh> DESCRIBE KEYSPACEs;
"ContrailAnalyticsCql" config_db_uuid useragent
"DISCOVERY_SERVER" to_bgp_keyspace svc_monitor_keyspace
system system_traces dm_keyspace
cqlsh> SELECT * FROM config_db_uuid.obj_fq_name_table ;
key | column1 | value
-----+---------+-------
(0 rows)
cqlsh>
cqlsh>
[1]+ Stopped cqlsh 10.169.4.5
root@cont45:/var/lib/cassandra# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.169.4.5 1.27 MB 256 ? 43fff594-a300-490f-93d2-f7b5aaee76f5 rack1
UN 10.169.4.4 102.68 MB 256 ? fa3576e4-5e4c-45a1-ba65-9891d71a2458 rack1
UN 10.169.4.6 109.44 MB 256 ? 926c8a25-9f34-4284-a831-cebf8599d668 rack1
cqlsh> SELECT * FROM config_db_uuid.obj_fq_name_table ;
key | column1 | value
------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+-------
0x7669727475616c5f726f75746572 | 0x64656661756c742d676c6f62616c2d73797374656d2d636f6e6669673a636f6d7034373a63363263623461342d663433612d343633352d613637652d366639613632383736333033 | null
0x7669727475616c5f726f75746572 | 0x64656661756c742d676c6f62616c2d73797374656d2d636f6e6669673a636f6d7034383a38313565656338392d626132382d343931362d613139622d663866383465626136653632 | null
0x646973636f766572795f736572766963655f61737369676e6d656e74 | 0x64656661756c742d646973636f766572792d736572766963652d61737369676e6d656e743a66336665323962372d366366362d343939352d386436642d323332643638313134326438 | null
(3 rows)
In [5]: dict(OBJ_FQ_NAME_TABLE.get_range()).keys()
Out[5]: ['virtual_router', 'discovery_service_assignment']
In [4]: dict(OBJ_FQ_NAME_TABLE.get_range())
Out[4]:
{'discovery_service_assignment': OrderedDict([('default-discovery-service-assignment:f3fe29b7-6cf6-4995-8d6d-232d681142d8', u'null')]),
'virtual_router': OrderedDict([('default-global-system-config:comp47:c62cb4a4-f43a-4635-a67e-6f9a62876303', u'null'), ('default-global-system-config:comp48:815eec89-ba28-4916-a19b-f8f84eba6e62', u'null')])}
In [5]: dict(OBJ_FQ_NAME_TABLE.get_range()).keys()
Out[5]: ['virtual_router', 'discovery_service_assignment']
In [7]: dict(OBJ_FQ_NAME_TABLE.get_range()).keys()
Out[7]:
['virtual_network',
'domain',
'bgp_router',
'virtual_router',
'discovery_service_assignment']
#### data loss on other 2 nodes
In [5]: dict(OBJ_FQ_NAME_TABLE.get_range()).keys()
Out[5]: ['virtual_router', 'discovery_service_assignment']
root@cont46:~# INFO [SlabPoolCleaner] 2018-04-17 11:06:06,557 ColumnFamilyStore.java:1197 - Flushing largest CFS(Keyspace='DISCOVERY_SERVER', ColumnFamily='discovery') to free up room. Used total: 0.33/0.00, live: 0.33/0.00, flushing: 0.00/0.00, this: 0.04/0.04
INFO [SlabPoolCleaner] 2018-04-17 11:06:06,558 ColumnFamilyStore.java:905 - Enqueuing flush of discovery: 22897468 (4%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:393] 2018-04-17 11:06:06,560 Memtable.java:347 - Writing Memtable-discovery@1230843943(85.702KiB serialized bytes, 49375 ops, 4%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:393] 2018-04-17 11:06:06,562 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/DISCOVERY_SERVER/discovery-1bcbd28041c711e8b0af33f5ee0423aa/DISCOVERY_SERVER-discovery-tmp-ka-18-Data.db (11.354KiB) for commitlog position ReplayPosition(segmentId=1523926234035, position=11910103)
#### data come back on other nodes after about 1 hour
In [11]: dict(OBJ_FQ_NAME_TABLE.get_range()).keys()
Out[11]:
['virtual_network',
'domain',
'bgp_router',
'virtual_router',
'discovery_service_assignment']
In [12]:
In [12]:
In [12]: INFO [SlabPoolCleaner] 2018-04-17 11:50:33,957 ColumnFamilyStore.java:1197 - Flushing largest CFS(Keyspace='ContrailAnalyticsCql', ColumnFamily='messagetable') to free up room. Used total: 0.33/0.00, live: 0.33/0.00, flushing: 0.00/0.00, this: 0.04/0.04
INFO [SlabPoolCleaner] 2018-04-17 11:50:33,958 ColumnFamilyStore.java:905 - Enqueuing flush of messagetable: 21725370 (4%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:410] 2018-04-17 11:50:33,959 Memtable.java:347 - Writing Memtable-messagetable@1246079711(12.445MiB serialized bytes, 72688 ops, 4%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:410] 2018-04-17 11:50:34,087 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/ContrailAnalyticsCql/messagetable-fd009cd041d711e88e47a3f227876d24/ContrailAnalyticsCql-messagetable-tmp-ka-127-Data.db (2.488MiB) for commitlog position ReplayPosition(segmentId=1523926234040, position=30162636)
INFO [ScheduledTasks:1] 2018-04-17 11:50:37,624 ColumnFamilyStore.java:905 - Enqueuing flush of compaction_history: 24334 (0%) on-heap, 0 (0%) off-heap
INFO [ScheduledTasks:1] 2018-04-17 11:50:37,625 ColumnFamilyStore.java:905 - Enqueuing flush of sstable_activity: 45095 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:411] 2018-04-17 11:50:37,625 Memtable.java:347 - Writing Memtable-compaction_history@2020466485(4.965KiB serialized bytes, 179 ops, 0%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:410] 2018-04-17 11:50:37,625 Memtable.java:347 - Writing Memtable-sstable_activity@1968787031(4.896KiB serialized bytes, 1285 ops, 0%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:411] 2018-04-17 11:50:37,627 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-tmp-ka-28-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1523926234040, position=30236973)
INFO [MemtableFlushWriter:410] 2018-04-17 11:50:37,627 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-tmp-ka-28-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1523926234040, position=30236973)
INFO [CompactionExecutor:292] 2018-04-17 11:50:37,634 CompactionTask.java:141 - Compacting [SSTableReader(path='/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-28-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-26-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-27-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-25-Data.db')]
INFO [CompactionExecutor:291] 2018-04-17 11:50:37,634 CompactionTask.java:141 - Compacting [SSTableReader(path='/var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-28-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-26-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-27-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-25-Data.db')]
INFO [CompactionExecutor:291] 2018-04-17 11:50:37,645 CompactionTask.java:274 - Compacted 4 sstables to [/var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-29,]. 9,204 bytes to 1,580 (~17% of original) in 9ms = 0.167423MB/s. 290 total partitions merged to 32. Partition merge counts were {1:125, 2:20, 3:11, 4:23, }
INFO [CompactionExecutor:292] 2018-04-17 11:50:37,716 CompactionTask.java:274 - Compacted 4 sstables to [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-29,]. 250,070 bytes to 249,326 (~99% of original) in 80ms = 2.972198MB/s. 2,905 total partitions merged to 2,905. Partition merge counts were {1:2905, }
INFO [CompactionExecutor:292] 2018-04-17 11:50:39,109 CompactionTask.java:141 - Compacting [SSTableReader(path='/var/lib/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-21-Data.db')]
INFO [CompactionExecutor:292] 2018-04-17 11:50:39,132 CompactionTask.java:274 - Compacted 1 sstables to [/var/lib/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-22,]. 224,624 bytes to 224,624 (~100% of original) in 22ms = 9.737188MB/s. 1 total partitions merged to 1. Partition merge counts were {1:1, }
INFO [SlabPoolCleaner] 2018-04-17 11:53:56,141 ColumnFamilyStore.java:1197 - Flushing largest CFS(Keyspace='ContrailAnalyticsCql', ColumnFamily='statstablebystrtagv3') to free up room. Used total: 0.33/0.00, live: 0.33/0.00, flushing: 0.00/0.00, this: 0.05/0.05
INFO [SlabPoolCleaner] 2018-04-17 11:53:56,142 ColumnFamilyStore.java:905 - Enqueuing flush of statstablebystrtagv3: 24482303 (5%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:412] 2018-04-17 11:53:56,143 Memtable.java:347 - Writing Memtable-statstablebystrtagv3@1878390197(7.844MiB serialized bytes, 59566 ops, 5%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:412] 2018-04-17 11:53:56,267 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/ContrailAnalyticsCql/statstablebystrtagv3-094e7c0041d811e88e47a3f227876d24/ContrailAnalyticsCql-statstablebystrtagv3-tmp-ka-131-Data.db (1.872MiB) for commitlog position ReplayPosition(segmentId=1523926234041, position=10991017)
In [12]: INFO [SlabPoolCleaner] 2018-04-17 11:57:44,804 ColumnFamilyStore.java:1197 - Flushing largest CFS(Keyspace='system', ColumnFamily='size_estimates') to free up room. Used total: 0.33/0.00, live: 0.33/0.00, flushing: 0.00/0.00, this: 0.05/0.05
INFO [SlabPoolCleaner] 2018-04-17 11:57:44,804 ColumnFamilyStore.java:905 - Enqueuing flush of size_estimates: 23150166 (5%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:413] 2018-04-17 11:57:44,805 Memtable.java:347 - Writing Memtable-size_estimates@1573675184(2.281MiB serialized bytes, 234080 ops, 5%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:413] 2018-04-17 11:57:44,852 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/system/size_estimates-618f817b005f3678b8a453f3930b8e86/system-size_estimates-tmp-ka-42-Data.db (535.843KiB) for commitlog position ReplayPosition(segmentId=1523926234041, position=26845586)
INFO [CompactionExecutor:295] 2018-04-17 12:00:39,137 CompactionTask.java:141 - Compacting [SSTableReader(path='/var/lib/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-22-Data.db')]
INFO [CompactionExecutor:295] 2018-04-17 12:00:39,157 CompactionTask.java:274 - Compacted 1 sstables to [/var/lib/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-23,]. 224,624 bytes to 224,624 (~100% of original) in 18ms = 11.901008MB/s. 1 total partitions merged to 1. Partition merge counts were {1:1, }
INFO [SlabPoolCleaner] 2018-04-17 12:01:12,893 ColumnFamilyStore.java:1197 - Flushing largest CFS(Keyspace='ContrailAnalyticsCql', ColumnFamily='messagetable') to free up room. Used total: 0.33/0.00, live: 0.33/0.00, flushing: 0.00/0.00, this: 0.05/0.05
INFO [SlabPoolCleaner] 2018-04-17 12:01:12,893 ColumnFamilyStore.java:905 - Enqueuing flush of messagetable: 23459775 (5%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:414] 2018-04-17 12:01:12,894 Memtable.java:347 - Writing Memtable-messagetable@1115321929(13.444MiB serialized bytes, 78442 ops, 5%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:414] 2018-04-17 12:01:13,030 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/ContrailAnalyticsCql/messagetable-fd009cd041d711e88e47a3f227876d24/ContrailAnalyticsCql-messagetable-tmp-ka-128-Data.db (2.707MiB) for commitlog position ReplayPosition(segmentId=1523926234042, position=6524380)
INFO [CompactionExecutor:294] 2018-04-17 12:01:13,054 ColumnFamilyStore.java:905 - Enqueuing flush of compactions_in_progress: 1332 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:415] 2018-04-17 12:01:13,055 Memtable.java:347 - Writing Memtable-compactions_in_progress@240159631(0.162KiB serialized bytes, 9 ops, 0%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:415] 2018-04-17 12:01:13,055 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-tmp-ka-232-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1523926234042, position=6547711)
INFO [CompactionExecutor:295] 2018-04-17 12:01:13,059 CompactionTask.java:141 - Compacting [SSTableReader(path='/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-229-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-232-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-230-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-231-Data.db')]
INFO [CompactionExecutor:294] 2018-04-17 12:01:13,060 CompactionTask.java:141 - Compacting [SSTableReader(path='/var/lib/cassandra/data/ContrailAnalyticsCql/messagetable-fd009cd041d711e88e47a3f227876d24/ContrailAnalyticsCql-messagetable-ka-127-Data.db'), SSTableReader(path='/var/lib/cassandra/data/ContrailAnalyticsCql/messagetable-fd009cd041d711e88e47a3f227876d24/ContrailAnalyticsCql-messagetable-ka-126-Data.db'), SSTableReader(path='/var/lib/cassandra/data/ContrailAnalyticsCql/messagetable-fd009cd041d711e88e47a3f227876d24/ContrailAnalyticsCql-messagetable-ka-125-Data.db'), SSTableReader(path='/var/lib/cassandra/data/ContrailAnalyticsCql/messagetable-fd009cd041d711e88e47a3f227876d24/ContrailAnalyticsCql-messagetable-ka-128-Data.db')]
INFO [CompactionExecutor:295] 2018-04-17 12:01:13,071 CompactionTask.java:274 - Compacted 4 sstables to [/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-233,]. 461 bytes to 189 (~40% of original) in 11ms = 0.016386MB/s. 3 total partitions merged to 1. Partition merge counts were {1:1, 2:1, }
INFO [CompactionExecutor:294] 2018-04-17 12:01:15,474 ColumnFamilyStore.java:905 - Enqueuing flush of compactions_in_progress: 164 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:414] 2018-04-17 12:01:15,474 Memtable.java:347 - Writing Memtable-compactions_in_progress@177919800(0.008KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:414] 2018-04-17 12:01:15,475 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-tmp-ka-234-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1523926234042, position=6591678)
INFO [CompactionExecutor:294] 2018-04-17 12:01:15,481 CompactionTask.java:274 - Compacted 4 sstables to [/var/lib/cassandra/data/ContrailAnalyticsCql/messagetable-fd009cd041d711e88e47a3f227876d24/ContrailAnalyticsCql-messagetable-ka-129,]. 39,891,382 bytes to 32,148,123 (~80% of original) in 2,420ms = 12.668942MB/s. 79,195 total partitions merged to 63,765. Partition merge counts were {1:79195, }
In [12]: dict(OBJ_FQ_NAME_TABLE.get_range()).keys()
Out[12]:
['service_appliance_set',
'domain',
'virtual_router',
'global_system_config',
'network_policy',
'route_table',
'routing_instance',
'service_appliance',
'network_ipam',
'config_node',
'namespace',
'bgp_router',
'analytics_node',
'service_template',
'api_access_list',
'discovery_service_assignment',
'access_control_list',
'qos_queue',
'database_node',
'route_target',
'global_vrouter_config',
'project',
'security_group',
'virtual_network']
#### node2 data remains lost
In [20]: INFO [SlabPoolCleaner] 2018-04-17 12:14:19,058 ColumnFamilyStore.java:1197 - Flushing largest CFS(Keyspace='ContrailAnalyticsCql', ColumnFamily='statstablebystrtagv3') to free up room. Used total: 0.33/0.00, live: 0.33/0.00, flushing: 0.00/0.00, this: 0.06/0.06
INFO [SlabPoolCleaner] 2018-04-17 12:14:19,059 ColumnFamilyStore.java:905 - Enqueuing flush of statstablebystrtagv3: 31239070 (6%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:27] 2018-04-17 12:14:19,060 Memtable.java:347 - Writing Memtable-statstablebystrtagv3@487434147(9.956MiB serialized bytes, 75790 ops, 6%/0% of on/off-heap limit)
INFO [MemtableFlushWriter:27] 2018-04-17 12:14:19,199 Memtable.java:382 - Completed flushing /var/lib/cassandra/data/ContrailAnalyticsCql/statstablebystrtagv3-094e7c0041d811e88e47a3f227876d24/ContrailAnalyticsCql-statstablebystrtagv3-tmp-ka-10-Data.db (2.350MiB) for commitlog position ReplayPosition(segmentId=1523985970826, position=15010936)
In [20]: dict(OBJ_FQ_NAME_TABLE.get_range()).keys()
Out[20]:
['virtual_network',
'domain',
'bgp_router',
'virtual_router',
'discovery_service_assignment']
root@cont45:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.169.4.5 54.17 MB 256 ? 43fff594-a300-490f-93d2-f7b5aaee76f5 rack1
UN 10.169.4.4 96.08 MB 256 ? fa3576e4-5e4c-45a1-ba65-9891d71a2458 rack1
UN 10.169.4.6 96.01 MB 256 ? 926c8a25-9f34-4284-a831-cebf8599d668 rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
after leaving the testbed for longer time everything got recovered back.
Ping,
Whom/Where did you receive these recovery steps from?
Could you check if they removed, they followed below steps?
rm -rf /var/lib/ cassandra/ commitlog/ * cassandra/ status- up
rm -f /var/log/
This was the recommended approach from analytics team who were handling Cassandra. /github. com/Juniper/ contrail- controller/ wiki/Recovery- procedure- when-contrail- database- is-down- for-greater- than-gc_ grace_seconds
https:/
-Suresh