tripleo-ci-centos-7-scenario010-standalone times out in gate checks

Bug #1832597 reported by Jose Luis Franco
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | High | Francesco Pantano |

Bug Description

The gating job tripleo-ci-centos-7-scenario010-standalone is blocking patches from merging on stable/stein because it times out. Recent runs of the job on that branch took more than 3 hours, whereas it normally takes no more than 1 hour 30 minutes:

http://zuul.openstack.org/builds?job_name=tripleo-ci-centos-7-scenario010-standalone&branch=stable%2Fstein

The last patch to pass the gate was https://review.opendev.org/661946. It might be related to the timeout, because the job hangs while running ceph-ansible:

standalone-deploy.log:

http://logs.openstack.org/25/664225/2/gate/tripleo-ci-centos-7-scenario010-standalone/62625aa/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2019-06-11_07_59_54

ceph-ansible.log:

2019-06-11 08:01:28,078 p=41905 u=root | TASK [ceph-mon : include_tasks ceph_keys.yml] **********************************
2019-06-11 08:01:28,078 p=41905 u=root | Tuesday 11 June 2019 08:01:28 +0000 (0:00:03.325) 0:01:29.640 **********
2019-06-11 08:01:28,198 p=41905 u=root | included: /usr/share/ceph-ansible/roles/ceph-mon/tasks/ceph_keys.yml for standalone
2019-06-11 08:01:28,284 p=41905 u=root | TASK [ceph-mon : waiting for the monitor(s) to form the quorum...] *************
2019-06-11 08:01:28,284 p=41905 u=root | Tuesday 11 June 2019 08:01:28 +0000 (0:00:00.205) 0:01:29.846 **********
2019-06-11 08:51:29,016 p=41905 u=root | FAILED - RETRYING: waiting for the monitor(s) to form the quorum... (5 retries left).
2019-06-11 09:41:39,603 p=41905 u=root | FAILED - RETRYING: waiting for the monitor(s) to form the quorum... (4 retries left).

Logs: http://logs.openstack.org/25/664225/2/gate/tripleo-ci-centos-7-scenario010-standalone/62625aa/logs/undercloud/home/zuul/undercloud-ansible-HmElX0/ceph-ansible/ceph_ansible_command.log.txt.gz#_2019-06-11_08_01_28_284
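The timestamps in the ceph-ansible.log excerpt above can be turned into a rough worst-case estimate of how long the quorum check can block the job. The Python sketch below rests on two assumptions that the report does not state. First, Ansible makes one initial attempt plus `retries` retries, which fits the "(5 retries left)" message after the first failure. Second, the `delay` is 10 seconds, inferred from the roughly 10-second difference between the first and second gaps. `worst_case_seconds` is a hypothetical helper.

```python
from datetime import datetime

# Timestamps copied from the ceph-ansible.log excerpt in the bug description.
FMT = "%Y-%m-%d %H:%M:%S,%f"
task_start = datetime.strptime("2019-06-11 08:01:28,284", FMT)
failures = [
    datetime.strptime("2019-06-11 08:51:29,016", FMT),  # "(5 retries left)"
    datetime.strptime("2019-06-11 09:41:39,603", FMT),  # "(4 retries left)"
]

# Seconds between the task start and each FAILED - RETRYING message.
marks = [task_start] + failures
gaps = [(b - a).total_seconds() for a, b in zip(marks, marks[1:])]


def worst_case_seconds(attempt_seconds: float, retries: int, delay: float) -> float:
    """Hypothetical upper bound for an until/retries/delay loop where every attempt fails.

    ASSUMPTION: attempts = retries + 1 (one initial try plus `retries` retries),
    with a `delay` pause between consecutive attempts.
    """
    attempts = retries + 1
    return attempts * attempt_seconds + retries * delay


# ASSUMPTION: retries=5 inferred from "(5 retries left)" after the first failure;
# delay=10 s inferred from the ~10 s difference between gaps[1] and gaps[0].
total = worst_case_seconds(gaps[0], retries=5, delay=10)
print(f"observed gaps (s): {[round(g, 1) for g in gaps]}")
print(f"worst case: {total / 3600:.2f} h")
```

Under these assumptions, six attempts of about 50 minutes each add up to roughly 5 hours. That is longer than the 3+ hour runs reported in the description.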

Job Logs: http://logs.openstack.org/25/664225/2/gate/tripleo-ci-centos-7-scenario010-standalone/62625aa/

Changed in tripleo:
assignee: John Fulton (jfulton-org) → nobody
assignee: nobody → Francesco Pantano (fmount)
tags: added: stein-backport-potential
removed: ceph-ansible
Jose Luis Franco (jfrancoa) wrote :

It seems this was worked around by https://review.opendev.org/#/c/661946 (which reverts the enabling of ceph_mon v2). However, the issue may still be latent.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/664954

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/664954
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6947d0842c6bee6f687bd3609f04c2828407c80d
Submitter: Zuul
Branch: master

commit 6947d0842c6bee6f687bd3609f04c2828407c80d
Author: fpantano <email address hidden>
Date: Wed Jun 12 17:03:11 2019 +0200

    Add higher retry/delay defaults to check the quorum status.

    As the lp1832597 reported, we need to push to ceph-ansible
    higher values to check if the quorum is healthy after the
    handler is executed.
    This commit sets new defaults via CephAnsibleExtraConfig.

    Change-Id: If5b39c78b32f7312ea0b5056a7d4ec3a60ee931d
    Related-Bug: 1832597
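The commit above raises the retry count and delay that ceph-ansible uses when it checks monitor quorum. The Python sketch below models an Ansible-style until/retries/delay loop to show how those two values control the length of the check window. `wait_until` and the fake check are hypothetical. The sketch assumes one initial attempt plus `retries` retries. This report does not name the actual ceph-ansible variables or the new values.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], retries: int, delay: float,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Hypothetical helper mimicking Ansible's until/retries/delay semantics.

    Runs `check` once, then retries up to `retries` more times, sleeping
    `delay` seconds between attempts. Returns the number of the attempt
    that succeeded, or raises TimeoutError when all attempts fail.
    """
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            # Mirrors the retry counter printed in the ceph-ansible log excerpt.
            print(f"FAILED - RETRYING ({attempts - attempt} retries left)")
            sleep(delay)
    raise TimeoutError(f"check still failing after {attempts} attempts")


# Usage: a fake quorum check that only succeeds on the third attempt.
calls = []


def fake_quorum_check() -> bool:
    calls.append(1)
    return len(calls) >= 3


slept = []
attempt = wait_until(fake_quorum_check, retries=5, delay=10, sleep=slept.append)
print(f"quorum reached on attempt {attempt}")
```

Raising `retries` or `delay` widens the window before the task gives up. The trade-off is that a monitor which never forms quorum takes longer to be reported as failed.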

Francesco Pantano (fmount) wrote :

Fixed by pushing new defaults to ceph-ansible: https://github.com/ceph/ceph-antml/pull/4131

Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
status: Triaged → Fix Released