TimedOutException in collector and query-engine on read/write data from/to cassandra
Bug #1527225 reported by
Sundaresan Rajangam
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
Trunk | Fix Committed | High | Megh Bhatt |
Bug Description
In a 3-node database cluster, TimedOutException raised in collector/
The following warning message was continuously logged in cassandra/
WARN [MessagingServi
Also, the nodetool repair command was running on all the nodes in the cluster and seemed to be stuck.
Megh debugged the issue and here is his analysis:
It seems that the UnknownColumnFamilyException is most likely caused by a schema mismatch between the Cassandra nodes. Nodes .6 and .7 appear to be out of sync with respect to the column family schemas and are unable to get back in sync. I checked that ping between them works fine, even with larger packet sizes. We should create column families with QUORUM or ALL consistency; then these problems might not happen, or would surface sooner, during creation of the tables itself.
There is also another issue: the contrail-cassandra-status script is not able to parse the nodetool status output of Cassandra 2.1. That needs fixing too.
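As an illustration of the parsing problem mentioned above, here is a minimal sketch of a tolerant parser for `nodetool status` node lines. It is not the actual contrail-cassandra-status code, and the bug does not say which part of the Cassandra 2.1 output broke the script. The sketch assumes the `?` in the Owns column is one likely candidate and accepts either `?` or a percentage there. The names `NODE_LINE`, `parse_nodetool_status` and `nodes_not_up_normal` are hypothetical.

```python
import re

# Hypothetical helper, NOT the real contrail-cassandra-status implementation.
# Matches one node line of `nodetool status`, e.g.
#   UN 40.43.1.7 1.23 GB 256 ? <host-id> rack1
# Assumption: Owns may be "?" (as in the output in this bug) or a percentage.
NODE_LINE = re.compile(
    r"^(?P<status>[UD])(?P<state>[NLJM])\s+"   # Up/Down + Normal/Leaving/Joining/Moving
    r"(?P<address>\S+)\s+"
    r"(?P<load>\?|[\d.]+\s+\S+)\s+"            # e.g. "1.23 GB", or "?" if unknown
    r"(?P<tokens>\d+)\s+"
    r"(?P<owns>\?|[\d.]+%)\s+"
    r"(?P<host_id>[0-9a-fA-F-]+)\s+"
    r"(?P<rack>\S+)\s*$"
)


def parse_nodetool_status(output):
    """Return one dict per node line; headers and notes are skipped."""
    nodes = []
    for line in output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes.append(match.groupdict())
    return nodes


def nodes_not_up_normal(nodes):
    """Addresses of nodes whose status/state is anything other than UN."""
    return [n["address"] for n in nodes
            if (n["status"], n["state"]) != ("U", "N")]


# Sample taken from the nodeh1 output in this bug.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 40.43.1.7 1.23 GB 256 ? d95824d5-571d-44ce-9d97-7e643f53b78e rack1
UN 40.43.1.6 1.39 GB 256 ? 62dcbdfc-0d85-4d99-b84c-9b90bd792aba rack1
UN 40.43.1.5 1.3 GB 256 ? f42d739b-fe87-4645-8aac-c1fec665d77e rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
"""

if __name__ == "__main__":
    parsed = parse_nodetool_status(SAMPLE_STATUS)
    print([n["address"] for n in parsed])
    print(nodes_not_up_normal(parsed))
```

Note that all three nodes report UN here even though the schema is out of sync, so a status check alone would not have caught this problem.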
root@nodeh1:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 40.43.1.7 1.23 GB 256 ? d95824d5-571d-44ce-9d97-7e643f53b78e rack1
UN 40.43.1.6 1.39 GB 256 ? 62dcbdfc-0d85-4d99-b84c-9b90bd792aba rack1
UN 40.43.1.5 1.3 GB 256 ? f42d739b-fe87-4645-8aac-c1fec665d77e rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
root@nodeh1:~# nodetool info
ID : 62dcbdfc-0d85-4d99-b84c-9b90bd792aba
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 1.39 GB
Generation No : 1450164892
Uptime (seconds) : 125495
Heap Memory (MB) : 2076.05 / 8152.00
Off Heap Memory (MB) : 3.00
Data Center : datacenter1
Rack : rack1
Exceptions : 37
Key Cache : entries 18242, size 1.64 MB, capacity 100 MB, 480580 hits, 520230 requests, 0.924 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Token : (invoke with -T/--tokens to see all 256 tokens)
root@nodeh1:~# nodetool describecluster
Cluster Information:
Name: Contrail
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
408d512e-3a32-31ef-8eaa-961077d5ce80: [40.43.1.6, 40.43.1.5]
UNREACHABLE: [40.43.1.7]
root@nodeh1:~#
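The describecluster output above is what shows the schema disagreement: node .7 is listed as UNREACHABLE from nodeh1. A minimal sketch of how such output could be checked automatically follows. The function names `parse_describecluster` and `schema_in_agreement` are hypothetical, and the sketch assumes the "Schema versions:" section has the layout shown in this bug (one `version: [hosts]` line per version, plus an optional `UNREACHABLE: [hosts]` line).

```python
import re


def parse_describecluster(output):
    """Parse the text of `nodetool describecluster`.

    Returns (versions, unreachable), where versions maps each schema
    version ID to the list of hosts reporting it, and unreachable is the
    list of hosts under UNREACHABLE. Assumes the layout seen in this bug.
    """
    versions = {}
    unreachable = []
    in_versions = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Schema versions:"):
            in_versions = True
            continue
        if not in_versions or not line:
            continue
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        # Pull host addresses out of the "[a, b]" list.
        hosts = re.findall(r"[^\s\[\],]+", rest)
        if key == "UNREACHABLE":
            unreachable.extend(hosts)
        else:
            versions[key] = hosts
    return versions, unreachable


def schema_in_agreement(versions, unreachable):
    """True only if every node reports the same schema version."""
    return len(versions) == 1 and not unreachable


# Sample taken from the nodeh1 output in this bug.
SAMPLE_DESCRIBECLUSTER = """\
Cluster Information:
Name: Contrail
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
408d512e-3a32-31ef-8eaa-961077d5ce80: [40.43.1.6, 40.43.1.5]
UNREACHABLE: [40.43.1.7]
"""

if __name__ == "__main__":
    versions, unreachable = parse_describecluster(SAMPLE_DESCRIBECLUSTER)
    print(versions)
    print(unreachable)
    print(schema_in_agreement(versions, unreachable))
```

A check like this, run after column family creation, is one way the mismatch could be noticed early instead of surfacing later as read/write timeouts.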
root@nodeh2:~# nodetool describecluster
Cluster Information:
Name: Contrail
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
408d512e-3a32-31ef-8eaa-961077d5ce80: [40.43.1.7, 40.43.1.5]
UNREACHABLE: [40.43.1.6]
root@nodeh2:~# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 40.43.1.7 1.23 GB 256 ? d95824d5-571d-44ce-9d97-7e643f53b78e rack1
UN 40.43.1.6 1.39 GB 256 ? ...