Cassandra disk failure during hard node failures

Bug #1550059 reported by venu kolli
This bug affects 2 people
Affects            Status         Importance  Assigned to  Milestone
Juniper Openstack  Status tracked in Trunk
  R3.0             Fix Committed  Critical    Raj Reddy
  Trunk            Fix Committed  Critical    Raj Reddy

Bug Description

Power-cycling the control nodes leads to Cassandra corruption, and the database service stops.

We are hitting the following open Cassandra bug:

https://issues.apache.org/jira/browse/CASSANDRA-10534

Megh had a look at the issue and gave the following workaround.

Workaround :

On the failed nodes, change the Cassandra disk failure policy in cassandra.yaml, then restart the database and scrub:

1) Set disk_failure_policy to best_effort
2) service contrail-database start
3) nodetool scrub
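
Step 1 of the workaround can be scripted. The following is a minimal sketch in Python, assuming the policy lives under the `disk_failure_policy` key of cassandra.yaml. The helper name `set_disk_failure_policy` is hypothetical, and because this report does not give the location of cassandra.yaml on a Contrail node, the sketch operates on the file contents as a string rather than on a path.

```python
import re

# Matches an uncommented top-level "disk_failure_policy: <value>" line.
_POLICY_LINE = re.compile(r"^disk_failure_policy\s*:.*$", re.MULTILINE)


def set_disk_failure_policy(yaml_text, policy="best_effort"):
    """Return yaml_text with disk_failure_policy set to `policy`.

    Hypothetical helper (not part of Cassandra or Contrail). The edit is
    textual so that comments and key order in the file are preserved.
    """
    new_line = "disk_failure_policy: " + policy
    if _POLICY_LINE.search(yaml_text):
        # Replace the existing setting in place.
        return _POLICY_LINE.sub(new_line, yaml_text, count=1)
    # Key not present: append it, making sure it starts on its own line.
    sep = "" if yaml_text.endswith("\n") or not yaml_text else "\n"
    return yaml_text + sep + new_line + "\n"


if __name__ == "__main__":
    sample = "cluster_name: 'demo'\ndisk_failure_policy: stop\n"
    print(set_disk_failure_policy(sample), end="")
```

After the modified contents are written back to cassandra.yaml, steps 2 and 3 are run by hand as listed above.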

Logs:

INFO [SSTableBatchOpen:1] 2016-02-25 11:48:00,194 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/schema_triggers-0359bc7171233ee19a4ab9dfb11fc125/system-sc
INFO [main] 2016-02-25 11:48:01,772 AutoSavingCache.java:146 - reading saved cache /var/lib/cassandra/saved_caches/system-schema_triggers-0359bc7171233ee19a4ab9dfb11fc125-K
INFO [main] 2016-02-25 11:48:02,068 ColumnFamilyStore.java:363 - Initializing system.compaction_history
INFO [SSTableBatchOpen:3] 2016-02-25 11:48:02,148 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system
bytes)
INFO [SSTableBatchOpen:2] 2016-02-25 11:48:02,155 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system
tes)
INFO [SSTableBatchOpen:1] 2016-02-25 11:48:02,210 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system
ytes)
ERROR [SSTableBatchOpen:2] 2016-02-25 11:48:02,332 FileUtils.java:447 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
        at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_85]
        at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_85]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_85]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_85]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_85]
Caused by: java.io.EOFException: null
        at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_85]
        at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_85]
        at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_85]
        at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9]
        ... 14 common frames omitted
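
To check whether other nodes hit the same startup failure, the log can be scanned for the forced-exit line shown above. This is a rough sketch in Python that assumes the log line layout seen in this excerpt (level, thread in brackets, timestamp, then the message). The function name `find_disk_policy_exits` is hypothetical, and the log file location is not stated in this report, so the function takes an iterable of lines.

```python
import re

# Pattern derived from the ERROR line in this report; the layout is an
# assumption based on that single excerpt.
_FATAL = re.compile(
    r'^ERROR \[(?P<thread>[^\]]+)\] (?P<ts>\S+ \S+) '
    r'.*disk failure policy "(?P<policy>[^"]+)"'
)


def find_disk_policy_exits(log_lines):
    """Return (timestamp, thread, policy) for each forced-exit ERROR line."""
    hits = []
    for line in log_lines:
        m = _FATAL.match(line)
        if m:
            hits.append((m.group("ts"), m.group("thread"), m.group("policy")))
    return hits


if __name__ == "__main__":
    excerpt = [
        "INFO [main] 2016-02-25 11:48:02,068 ColumnFamilyStore.java:363 - "
        "Initializing system.compaction_history",
        "ERROR [SSTableBatchOpen:2] 2016-02-25 11:48:02,332 FileUtils.java:447 - "
        "Exiting forcefully due to file system exception on startup, "
        'disk failure policy "stop"',
    ]
    for ts, thread, policy in find_disk_policy_exits(excerpt):
        print(ts, thread, policy)
```

A hit with policy "stop" indicates a node that is a candidate for the workaround above.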

venu kolli (vkolli)
tags: added: blocker
Raj Reddy (rajreddy)
tags: added: releasenote
removed: blocker
Raj Reddy (rajreddy)
tags: added: analytics
Megh Bhatt (meghb) wrote :

Needs upgrade of cassandra

Vinod Nair (vinodnair) wrote :

Cassandra logs for a similar issue seen at att are attached

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/19120
Submitter: Raj Reddy (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19120
Committed: http://github.org/Juniper/contrail-packaging/commit/e25959c9efe0250ce5a513d21cbeaf38def20d0b
Submitter: Zuul
Branch: R3.0

commit e25959c9efe0250ce5a513d21cbeaf38def20d0b
Author: Raj Reddy <email address hidden>
Date: Wed Apr 6 11:54:08 2016 -0700

Need to upgrade to cassandra 2.1.13 for fixing boot up issues on a hard reboot
Closes-Bug: #1550059

Change-Id: Ib83aff80a745f365c3829bf516333bd4f57c88e1

Jeba Paulaiyan (jebap)
information type: Proprietary → Public
Raj Reddy (rajreddy) wrote :

Committed through 1567693 in mainline.
