[CI down] pingtest fails when accessing to OpenStack APIs

Bug #1596758 reported by Emilien Macchi
Affects   Status         Importance   Assigned to       Milestone
tripleo   Fix Released   Critical     Pradeep Kilambi

Bug Description

Currently, the HA and upgrade CI jobs fail during pingtest, which cannot reach the OpenStack APIs:

http://logs.openstack.org/11/333511/7/check-tripleo/gate-tripleo-ci-centos-7-ha/635b87f/console.html#_2016-06-27_22_15_19_757395

It looks like Pacemaker is in bad shape during the deployment. I also noticed that RabbitMQ is down.
This needs more investigation.

Jiří Stránský (jistr) wrote :

The crucial part is here, RabbitMQ fails to form a cluster:

Jun 27 21:48:50 [13170] overcloud-controller-0 crmd: notice: te_rsc_command: Initiating action 71: start rabbitmq_start_0 on overcloud-controller-0 (local)
Jun 27 21:48:50 [13170] overcloud-controller-0 crmd: info: do_lrm_rsc_op: Performing key=71:17:0:61c3eb8c-c41b-47ed-85a4-d2051d130ea9 op=rabbitmq_start_0
Jun 27 21:48:50 [13167] overcloud-controller-0 lrmd: info: log_execute: executing - rsc:rabbitmq action:start call_id:64
Jun 27 21:48:50 [13165] overcloud-controller-0 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=overcloud-controller-2/crmd/39, version=0.31.3)
Jun 27 21:48:51 [13170] overcloud-controller-0 crmd: info: throttle_handle_load: Moderate CPU load detected: 0.840000
Jun 27 21:48:51 [13170] overcloud-controller-0 crmd: info: throttle_send_command: New throttle mode: 0010 (was 0100)
rabbitmq-cluster(rabbitmq)[23989]: 2016/06/27_21:48:51 INFO: RabbitMQ server is not running
rabbitmq-cluster(rabbitmq)[23989]: 2016/06/27_21:48:51 INFO: Joining existing cluster with [ rabbit@overcloud-controller-2 ] nodes.
rabbitmq-cluster(rabbitmq)[23989]: 2016/06/27_21:48:51 INFO: Waiting for server to start
Jun 27 21:48:55 [13165] overcloud-controller-0 cib: info: cib_process_ping: Reporting our current digest to overcloud-controller-0: 0d5b3d9a1e8a9c7820e450dbd39a2280 for 0.31.3 (0x27e10e0 0)
Jun 27 21:48:57 [13168] overcloud-controller-0 attrd: info: attrd_peer_update: Setting rmq-node-attr-rabbitmq[overcloud-controller-0]: (null) -> rabbit@overcloud-controller-0 from overcloud-controller-0
Jun 27 21:48:57 [13165] overcloud-controller-0 cib: info: cib_perform_op: Diff: --- 0.31.3 2
Jun 27 21:48:57 [13165] overcloud-controller-0 cib: info: cib_perform_op: Diff: +++ 0.31.4 (null)
Jun 27 21:48:57 [13165] overcloud-controller-0 cib: info: cib_perform_op: + /cib: @num_updates=4
Jun 27 21:48:57 [13165] overcloud-controller-0 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']: <nvpair id="status-1-rmq-node-attr-rabbitmq" name="rmq-node-attr-rabbitmq" value="rabbit@overcloud-controller-0"/>
Jun 27 21:48:57 [13165] overcloud-controller-0 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=overcloud-controller-2/attrd/26, version=0.31.4)
Jun 27 21:48:57 [13170] overcloud-controller-0 crmd: notice: abort_transition_graph: Transition aborted by status-1-rmq-node-attr-rabbitmq, rmq-node-attr-rabbitmq=rabbit@overcloud-controller-0: Transient attribute change (create cib=0.31.4, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1'], 0)
Jun 27 21:48:57 [13165] overcloud-controller-0 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crm_attribute/4)
Jun 27 21:48:57 [13165] overcloud-controller-0 cib: info: cib_perform_op: Diff: -...
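
When reading a log like the one above, it can help to separate the lines written by the rabbitmq-cluster resource agent from the surrounding Pacemaker daemon output. The following is a minimal sketch, not a tool used in this CI: the regular expression is inferred only from the three agent lines quoted above, and the names `AGENT_LINE` and `agent_messages` are made up for illustration.

```python
import re

# Assumed line shape, inferred from the excerpt above, e.g.
# "rabbitmq-cluster(rabbitmq)[23989]: 2016/06/27_21:48:51 INFO: Waiting for server to start"
AGENT_LINE = re.compile(
    r"^rabbitmq-cluster\((?P<resource>[^)]+)\)\[(?P<pid>\d+)\]:\s+"
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+):\s+(?P<message>.*)$"
)


def agent_messages(lines):
    """Return (timestamp, level, message) tuples for resource-agent lines only."""
    found = []
    for line in lines:
        match = AGENT_LINE.match(line.strip())
        if match:
            found.append((match["timestamp"], match["level"], match["message"]))
    return found


# Sample input copied from the log excerpt in this comment.
sample = [
    "Jun 27 21:48:51 [13170] overcloud-controller-0 crmd: info: throttle_send_command: New throttle mode: 0010 (was 0100)",
    "rabbitmq-cluster(rabbitmq)[23989]: 2016/06/27_21:48:51 INFO: RabbitMQ server is not running",
    "rabbitmq-cluster(rabbitmq)[23989]: 2016/06/27_21:48:51 INFO: Joining existing cluster with [ rabbit@overcloud-controller-2 ] nodes.",
    "rabbitmq-cluster(rabbitmq)[23989]: 2016/06/27_21:48:51 INFO: Waiting for server to start",
]

for timestamp, level, message in agent_messages(sample):
    print(timestamp, level, message)
```

Lines from crmd, lrmd, cib, and attrd are skipped, so only the agent's own view of the start attempt remains.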

summary: - pingtest fails when accessing to OpenStack APIs
+ [CI down] RabbitMQ fails to form a cluster
Jiří Stránský (jistr) wrote : Re: [CI down] RabbitMQ fails to form a cluster

I cannot test the workaround locally at the moment, but a patch that might get the CI green is here: https://review.openstack.org/#/c/334885/

Jiří Stránský (jistr) wrote :

The RabbitMQ problem is probably a red herring: it happens on the one CI run in the description but not on others. For example, here is a run where all resources, including RabbitMQ, started up fine, but the pingtest issue is still there: http://logs.openstack.org/86/334486/2/check-tripleo/gate-tripleo-ci-centos-7-ha/b79b36f/

summary: - [CI down] RabbitMQ fails to form a cluster
+ [CI down] pingtest fails when accessing to OpenStack APIs
Jiří Stránský (jistr) wrote :

A revert attempt that might fix it is here: https://review.openstack.org/#/c/335008

Changed in tripleo:
assignee: nobody → Pradeep Kilambi (pkilambi)
status: Confirmed → In Progress
tags: removed: alert
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.openstack.org/335115
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=c20e88ace39ea5d19bb5f94e081f0664363ac0d8
Submitter: Jenkins
Branch: master

commit c20e88ace39ea5d19bb5f94e081f0664363ac0d8
Author: Pradeep Kilambi <email address hidden>
Date: Tue Jun 28 12:16:13 2016 -0400

    Fix ssl port removal logic in postconfig

    The ssl_port removal hack uses the wrong service list
    when popping the data. This broke the CI ping tests.
    This change fixes the hack to use the right data structure.

    Closes-Bug: #1596758

    Depends-On: Id428b112eeaa22ecef78a21032b0c1dcc0ac0592

    Change-Id: Ifa64ec389c7b030c9001482817eaf306890812ff
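
The commit message describes a general class of bug: a key is popped from one data structure while a different one is actually used afterwards. The sketch below only illustrates that pattern and is not the python-tripleoclient code. The function names, the dictionary layout, and the port values are all hypothetical.

```python
import copy


def remove_ssl_port_buggy(all_services, selected_services):
    """Hypothetical bug: pops 'ssl_port' from all_services,
    although selected_services is what the caller goes on to use."""
    for name in selected_services:
        all_services.get(name, {}).pop("ssl_port", None)
    return selected_services


def remove_ssl_port_fixed(selected_services):
    """Pops 'ssl_port' from the structure that is actually returned."""
    for settings in selected_services.values():
        settings.pop("ssl_port", None)
    return selected_services


# Illustrative data only; the keys and values are assumptions.
all_services = {"nova": {"port": 8774, "ssl_port": 13774}}

buggy = remove_ssl_port_buggy(all_services, copy.deepcopy(all_services))
fixed = remove_ssl_port_fixed(copy.deepcopy({"nova": {"port": 8774, "ssl_port": 13774}}))

print("ssl_port" in buggy["nova"])  # the key survives in the buggy variant
print("ssl_port" in fixed["nova"])  # the key is gone in the fixed variant
```

In the buggy variant the cleanup appears to run without error, which is why this kind of mistake tends to surface only later, for example when a test tries to reach an endpoint.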

Changed in tripleo:
status: In Progress → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/python-tripleoclient 5.0.0.0b2

This issue was fixed in the openstack/python-tripleoclient 5.0.0.0b2 development milestone.
