After stopping the collector (during an alarm scenario), it does not come back up.
It did come up once after the broken rabbitmq cluster was reset, but it is now stuck again even though the rabbitmq cluster is healthy.
The database nodemgr is also reporting "Cassandra state detected DOWN", although cassandra itself appears to be up:
== Contrail database ==
kafka: active
nodemgr: initializing (Cassandra state detected DOWN. )
zookeeper: active
cassandra: active
== Contrail analytics ==
snmp-collector: active
query-engine: active
api: active
alarm-gen: active
nodemgr: active
collector: initializing (Database:overcloud-contrailcontroller-1:Global connection down)
topology: active
Errors seen in collector log:
2018-08-08 Wed 13:39:31:302.587 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: Initialize: Create/Set KEYSPACE: ContrailAnalyticsCql FAILED
2018-08-08 Wed 13:39:32:858.496 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: ObjectTableInsert: Addition of overcloud-contrailcontroller-1:Analytics:contrail-collector:0, message UUID 579b6a5f-75cd-41e1-a603-b92797fbc029 ObjectGeneratorInfo into table ObjectValueTable FAILED
2018-08-08 Wed 13:39:32:858.664 UTC overcloud-contrailcontroller-1 [Thread 139775151621888, Pid 1]: overcloud-contrailcontroller-1:Global: MessageTableOnlyInsert: Addition of message: SandeshModuleClientTrace, message UUID: 579b6a5f-75cd-41e1-a603-b92797fbc029 COLUMN FAILED
2018-08-08 Wed 13:39:32:859.743 UTC overcloud-contrailcontroller-1 [Thread 139775155820288, Pid 1]: overcloud-contrailcontroller-1:Global: ObjectTableInsert: Addition of overcloud-contrailcontroller-1:Analytics:contrail-collector:0, message UUID d396ee52-967a-4d4e-8fd4-946cd6bf9a95 ObjectGeneratorInfo into table ObjectValueTable FAILED
2018-08-08 Wed 13:39:32:859.936 UTC overcloud-contrailcontroller-1 [Thread 139775155820288, Pid 1]: overcloud-contrailcontroller-1:Global: MessageTableOnlyInsert: Addition of message: SandeshModuleClientTrace, message UUID: d396ee52-967a-4d4e-8fd4-946cd6bf9a95 COLUMN FAILED
Setup info:
This is a virtualized setup, with all the BMS nodes (VMs) running on the hypervisors below:
Login for all hypervisors: root
Undercloud: 192.168.122.179 on 10.204.217.133
Controllers hypervisor: 10.204.217.134
Computes hypervisors: 10.204.217.135, 10.204.217.137, 10.204.217.138
Target BMS nodes (VMs):
(undercloud) [stack@queensa ~]$ openstack server list
+--------------------------------------+--------------------------------+--------+------------------------+----------------+---------------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+--------------------------------+--------+------------------------+----------------+---------------------+
| 58f85277-04ca-4aec-91ce-d5a59ba9e609 | overcloud-contrailcontroller-2 | ACTIVE | ctlplane=192.168.24.14 | overcloud-full | contrail-controller |
| 20d626ff-d15e-48b0-ad06-fba82fa1e5fa | overcloud-contrailcontroller-0 | ACTIVE | ctlplane=192.168.24.19 | overcloud-full | contrail-controller |
| c79d6bfd-4c73-452c-aaf8-03fe08beca1e | overcloud-contrailcontroller-1 | ACTIVE | ctlplane=192.168.24.24 | overcloud-full | contrail-controller |
| 9e65dd37-4e32-466d-900c-014cbed49ee2 | overcloud-novacompute-1 | ACTIVE | ctlplane=192.168.24.20 | overcloud-full | compute |
| 2e2c2b82-c296-4c43-9b36-c1d30859e794 | overcloud-controller-0 | ACTIVE | ctlplane=192.168.24.23 | overcloud-full | control |
| 3106a491-420c-4830-a7c0-a4668305ea16 | overcloud-novacompute-0 | ACTIVE | ctlplane=192.168.24.6 | overcloud-full | compute |
| 541c1f09-31b5-421f-a176-aa3ea137ba90 | overcloud-controller-1 | ACTIVE | ctlplane=192.168.24.13 | overcloud-full | control |
| 077c293a-320f-4ec3-9678-ee774d2dfb92 | overcloud-controller-2 | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | control |
| 29615d3b-c9ca-4375-8113-d8339151321a | overcloud-novacompute-2 | ACTIVE | ctlplane=192.168.24.15 | overcloud-full | compute |
+--------------------------------------+--------------------------------+--------+------------------------+----------------+---------------------+
To connect to any BMS node: ssh root@10.204.217.133, then ssh root@192.168.122.179, then su - stack, source stackrc, and finally ssh heat-admin@192.168.24.19 (this lands on cfgm0).
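The two root hops above can also be captured in an ~/.ssh/config fragment (a sketch only; the Host aliases are made up here, ProxyJump needs OpenSSH 7.3+, and the su - stack / source stackrc steps on the undercloud still have to be done interactively before the final heat-admin hop):

```
# Hypervisor that fronts the undercloud VM
Host hv-undercloud
    HostName 10.204.217.133
    User root

# Undercloud VM, reached through the hypervisor
Host undercloud
    HostName 192.168.122.179
    User root
    ProxyJump hv-undercloud
```

With this in place, "ssh undercloud" replaces the first two manual hops.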
The controller VMs are provisioned with low memory:
[heat-admin@overcloud-contrailcontroller-1 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G         13G        273M        4.8M        1.9G        1.6G
Swap:            0B          0B          0B
This is not enough to run analytics_cassandra and config_cassandra in the same VM. I am noticing OutOfMemoryError entries in the cassandra logs as well:
INFO [ScheduledTasks:1] 2018-08-08 14:27:36,704 MessagingService.java:1238 - MUTATION messages were dropped in last 5000 ms: 125 internal and 121 cross node. Mean internal dropped latency: 545919 ms and Mean cross-node dropped latency: 537531 ms
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1.hprof ...
Unable to create java_pid1.hprof: Permission denied
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 1"...
os::fork_and_exec failed: Cannot allocate memory (12)
INFO [ScheduledTasks:1] 2018-08-08 14:35:22,034 MessagingService.java:1238 - REQUEST_RESPONSE messages were dropped in last 5000 ms: 0 internal and 3 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 0 ms
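A rough sizing sketch backs up the low-memory claim. The formula below is the auto-sizing rule from the stock cassandra-env.sh (max of min(RAM/2, 1024 MB) and min(RAM/4, 8192 MB)); actual numbers on this node may differ if MAX_HEAP_SIZE is set explicitly:

```shell
#!/bin/sh
# Sketch: what two co-located Cassandra instances would claim on this 15G VM,
# assuming the default heap auto-sizing from cassandra-env.sh:
#   MAX_HEAP_SIZE = max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))
ram_mb=15360                         # ~15G, as reported by free -h above

half=$(( ram_mb / 2 ))
if [ "$half" -gt 1024 ]; then half=1024; fi
quarter=$(( ram_mb / 4 ))
if [ "$quarter" -gt 8192 ]; then quarter=8192; fi

heap_mb=$quarter
if [ "$half" -gt "$quarter" ]; then heap_mb=$half; fi

echo "per-instance heap: ${heap_mb} MB"
echo "two co-located instances: $(( 2 * heap_mb )) MB of ${ram_mb} MB total"
```

By this rule each instance takes a ~3.8G heap, so the two Cassandras alone claim roughly half the VM before off-heap memtables, kafka, zookeeper and the analytics daemons are counted.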