RHOSP8-kafka connection not up, analytics api service went down

Bug #1678288 reported by shajuvk
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)
           Status    Importance   Assigned to    Milestone
  R3.2     Invalid   High         Anish Mehta
  Trunk    Invalid   High         Anish Mehta

Bug Description

[2017-03-30 18:08:08,647] WARN [Controller-0-to-broker-0-send-thread], Controller 0's connection to broker Node(0, 10.87.67.181, 9092) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to Node(0, 10.87.67.181, 9092) failed
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingReady$extension$1.apply(NetworkClientBlockingOps.scala:62)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingReady$extension$1.apply(NetworkClientBlockingOps.scala:58)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$kafka$utils$NetworkClientBlockingOps$$pollUntil$extension$2.apply(NetworkClientBlockingOps.scala:106)
        at kafka.utils.NetworkClientBlockingOps$$anonfun$kafka$utils$NetworkClientBlockingOps$$pollUntil$extension$2.apply(NetworkClientBlockingOps.scala:105)
        at kafka.utils.NetworkClientBlockingOps$.recurse$1(NetworkClientBlockingOps.scala:129)
        at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntilFound$extension(NetworkClientBlockingOps.scala:139)
        at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollUntil$extension(NetworkClientBlockingOps.scala:105)
        at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:58)
        at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:225)
        at kafka.controller.RequestSendThread.liftedTree1$1(ControllerChannelManager.scala:172)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:171)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
====
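The controller log above reports broker 0 (10.87.67.181:9092) as unreachable. A quick TCP probe can confirm whether anything is listening on that port at all; `port_open` is a hypothetical helper sketched here, not part of Kafka or Contrail:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the broker endpoint the controller reported as unreachable.
print(port_open("10.87.67.181", 9092))
```

If the probe fails, the next step is to check whether the broker process itself is still running on that node.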

03/31/2017 02:11:05 PM [contrail-alarm-gen]: An exception of type NotLeaderForPartitionError occured. Arguments:
(OffsetResponsePayload(topic=u'-uve-13', partition=0, error=6, offsets=()),) : traceback Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/opserver/partition_handler.py", line 597, in _run
    consumer.seek(-1,2)
  File "/usr/lib/python2.7/site-packages/kafka/consumer/simple.py", line 245, in seek
    resps = self.client.send_offset_request(reqs)
  File "/usr/lib/python2.7/site-packages/kafka/client.py", line 641, in send_offset_request
    if not fail_on_error or not self._raise_on_response_error(resp)]
  File "/usr/lib/python2.7/site-packages/kafka/client.py", line 384, in _raise_on_response_error
    kafka.common.check_error(resp)
  File "/usr/lib/python2.7/site-packages/kafka/common.py", line 473, in check_error
    raise error_class(response)
NotLeaderForPartitionError: NotLeaderForPartitionError - 6 - This error is thrown if the client attempts to send messages to a replica that is not the leader for some partition. It indicates that the clients metadata is out of date.
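NotLeaderForPartitionError is a retriable condition: the usual recovery is to refresh the client's cluster metadata (so it learns the new partition leader) and retry the request. A minimal self-contained sketch of that pattern follows; `StaleMetadataError` is a local stand-in for kafka's exception class, and `with_metadata_retry` is hypothetical, not a contrail or kafka-python API:

```python
class StaleMetadataError(Exception):
    """Stand-in for kafka's NotLeaderForPartitionError."""

def with_metadata_retry(request, refresh_metadata, retries=3):
    """Run `request`; on a stale-metadata error, refresh and retry."""
    for attempt in range(retries):
        try:
            return request()
        except StaleMetadataError:
            if attempt == retries - 1:
                raise
            # Re-fetch partition leadership before retrying.
            refresh_metadata()
```

contrail-alarm-gen appears to take the heavier-handed variant of the same idea, tearing down and recreating its client, which would explain the "New KafkaClient" line below.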

03/31/2017 02:11:05 PM [contrail-alarm-gen]: Stopping part 13 collectors []
03/31/2017 02:11:05 PM [contrail-alarm-gen]: Stopping part 13 UVEs []
03/31/2017 02:11:06 PM [contrail-alarm-gen]: New KafkaClient -uve-18
^C
[root@contrail-controller2 bin]# pwd
/usr/share/kafka/bin
[root@contrail-controller2 bin]# ./kafka-topics.sh --zookeeper 127.0.0.1 --describe
Topic:-uve-0 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-0 Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1
Topic:-uve-1 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-1 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic:-uve-10 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-10 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic:-uve-11 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-11 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic:-uve-12 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-12 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic:-uve-13 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-13 Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1
Topic:-uve-14 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-14 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic:-uve-15 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-15 Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1
Topic:-uve-16 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-16 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic:-uve-17 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-17 Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1
Topic:-uve-18 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-18 Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1
Topic:-uve-19 PartitionCount:1 ReplicationFactor:2 Configs:
        Topic: -uve-19 Partition: 0 Leader: 2 Replicas: 2,0 Isr: 2
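Note that several partitions above (e.g. -uve-0, -uve-13, -uve-15, -uve-17, -uve-18, -uve-19) list fewer in-sync replicas than assigned replicas, i.e. they are under-replicated, and broker 0 is absent from every ISR set, consistent with that broker being down. kafka-topics.sh can filter these directly with --under-replicated-partitions; alternatively, a small parser over the --describe output (hypothetical helper, not part of Contrail) makes the check explicit:

```python
import re

def under_replicated(describe_output):
    """Return (topic, isr_count, replica_count) for partitions whose ISR set
    is smaller than their replica set, parsed from --describe output."""
    flagged = []
    for line in describe_output.splitlines():
        m = re.search(r"Topic:\s*(\S+)\s+Partition:.*Replicas:\s*(\S+)\s+Isr:\s*(\S+)", line)
        if not m:
            continue
        topic, replicas, isr = m.groups()
        if len(isr.split(",")) < len(replicas.split(",")):
            flagged.append((topic, len(isr.split(",")), len(replicas.split(","))))
    return flagged

sample = """\
        Topic: -uve-0 Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1
        Topic: -uve-1 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
"""
print(under_replicated(sample))  # → [('-uve-0', 1, 2)]
```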

Tags: analytics
Revision history for this message
shajuvk (shajuvk) wrote :

Logs are available at /cs-shared/bugs/1678288. This is a 3-node analytics setup; cfgm and analytics run on the same nodes.

shajuvk (shajuvk)
information type: Proprietary → Public
Raj Reddy (rajreddy)
Changed in juniperopenstack:
assignee: nobody → Anish Mehta (amehta00)
Jeba Paulaiyan (jebap)
summary: - RHOSP8-kafka not connectiong not up, analytics api service went down
+ RHOSP8-kafka connection not up, analytics api service went down
Raj Reddy (rajreddy) wrote :

The systems have too little memory:

[root@contrail-controller1 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          11854        6697         396         395        4760        4132
Swap:             0           0           0
[root@contrail-controller1 ~]# free -mh
              total        used        free      shared  buff/cache   available
Mem:            11G        6.5G        395M        395M        4.6G        4.0G
Swap:            0B          0B          0B
[root@contrail-controller1 ~]#
