Overcloud deletion takes too much time on promotion OVB jobs

Bug #1801525 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Javier Peña

Bug Description

The job fails because overcloud deletion takes more than 6 minutes, which outlasts the 30 retries the job allows for the delete check.

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/5544b61/job-output.txt.gz#_2018-11-03_13_06_42_382810

2018-11-03 13:06:42.395328 | primary | Saturday 03 November 2018 13:06:42 +0000 (0:00:06.325) 0:00:19.063 *****
2018-11-03 13:06:45.398168 | primary | FAILED - RETRYING: check for delete command to complete or fail (30 retries left).
2018-11-03 13:06:58.256750 | primary | FAILED - RETRYING: check for delete command to complete or fail (29 retries left).
2018-11-03 13:07:11.042140 | primary | FAILED - RETRYING: check for delete command to complete or fail (28 retries left).
2018-11-03 13:07:23.977532 | primary | FAILED - RETRYING: check for delete command to complete or fail (27 retries left).
2018-11-03 13:07:36.486581 | primary | FAILED - RETRYING: check for delete command to complete or fail (26 retries left).
2018-11-03 13:07:49.368456 | primary | FAILED - RETRYING: check for delete command to complete or fail (25 retries left).
2018-11-03 13:08:01.840704 | primary | FAILED - RETRYING: check for delete command to complete or fail (24 retries left).
2018-11-03 13:08:14.556554 | primary | FAILED - RETRYING: check for delete command to complete or fail (23 retries left).
2018-11-03 13:08:27.231547 | primary | FAILED - RETRYING: check for delete command to complete or fail (22 retries left).
2018-11-03 13:08:39.838366 | primary | FAILED - RETRYING: check for delete command to complete or fail (21 retries left).
2018-11-03 13:08:52.412215 | primary | FAILED - RETRYING: check for delete command to complete or fail (20 retries left).
2018-11-03 13:09:04.917074 | primary | FAILED - RETRYING: check for delete command to complete or fail (19 retries left).
2018-11-03 13:09:17.463301 | primary | FAILED - RETRYING: check for delete command to complete or fail (18 retries left).
2018-11-03 13:09:30.044752 | primary | FAILED - RETRYING: check for delete command to complete or fail (17 retries left).
2018-11-03 13:09:42.680939 | primary | FAILED - RETRYING: check for delete command to complete or fail (16 retries left).
2018-11-03 13:09:55.192438 | primary | FAILED - RETRYING: check for delete command to complete or fail (15 retries left).
2018-11-03 13:10:07.853290 | primary | FAILED - RETRYING: check for delete command to complete or fail (14 retries left).
2018-11-03 13:10:20.553972 | primary | FAILED - RETRYING: check for delete command to complete or fail (13 retries left).
2018-11-03 13:10:33.300192 | primary | FAILED - RETRYING: check for delete command to complete or fail (12 retries left).
2018-11-03 13:10:46.034032 | primary | FAILED - RETRYING: check for delete command to complete or fail (11 retries left).
2018-11-03 13:10:58.712123 | primary | FAILED - RETRYING: check for delete command to complete or fail (10 retries left).
2018-11-03 13:11:11.542746 | primary | FAILED - RETRYING: check for delete command to complete or fail (9 retries left).
2018-11-03 13:11:24.203079 | primary | FAILED - RETRYING: check for delete command to complete or fail (8 retries left).
2018-11-03 13:11:36.914218 | primary | FAILED - RETRYING: check for delete command to complete or fail (7 retries left).
2018-11-03 13:11:50.175322 | primary | FAILED - RETRYING: check for delete command to complete or fail (6 retries left).
2018-11-03 13:12:02.925870 | primary | FAILED - RETRYING: check for delete command to complete or fail (5 retries left).
2018-11-03 13:12:15.546217 | primary | FAILED - RETRYING: check for delete command to complete or fail (4 retries left).
2018-11-03 13:12:27.983561 | primary | FAILED - RETRYING: check for delete command to complete or fail (3 retries left).
2018-11-03 13:12:40.473795 | primary | FAILED - RETRYING: check for delete command to complete or fail (2 retries left).
2018-11-03 13:12:52.943968 | primary | FAILED - RETRYING: check for delete command to complete or fail (1 retries left).
2018-11-03 13:13:06.366781 | primary | fatal: [undercloud]: FAILED! => {"attempts": 30, "changed": true, "cmd": "source /home/zuul/stackrc\n heat stack-show $(cat /home/zuul/overcloud_id)", "delta": "0:00:02.123148", "end": "2018-11-03 13:13:05.542273", "rc": 0, "start": "2018-11-03 13:13:03.419125", "stderr": "WARNING (shell) \"heat stack-show\" is deprecated, please use \"openstack stack show\" instead", "stderr_lines": ["WARNING (shell) \"heat stack-show\" is deprecated, please use \"openstack stack show\" instead"], "stdout":
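For reference, the length of the wait window can be read off the log timestamps above. The following is a minimal Python sketch, not part of the job itself; it only uses the timestamp of the first retry message, the timestamp of the final failure, and the "attempts": 30 value from the failure output. The variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the log excerpt above.
first_retry = datetime.strptime("2018-11-03 13:06:45.398168", FMT)
final_failure = datetime.strptime("2018-11-03 13:13:06.366781", FMT)
attempts = 30  # "attempts": 30 in the failure output

# Total time between the first retry message and the final failure.
window_s = (final_failure - first_retry).total_seconds()
# Average spacing between consecutive messages (check runtime plus delay).
per_attempt_s = window_s / attempts

print(f"window: {window_s / 60:.1f} min, about {per_attempt_s:.1f} s per attempt")
```

This gives a window of a little over 6 minutes, consistent with the description above.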

Javier Peña (jpena-c) wrote :

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/615533

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.openstack.org/615545

Changed in tripleo:
assignee: nobody → Javier Peña (jpena-c)
status: Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Javier Peña (<email address hidden>) on branch: master
Review: https://review.openstack.org/615545
Reason: https://review.openstack.org/615533 was proposed first

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/615533
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=ea2c751da579157c818cfd772ceb38c79d8da5b3
Submitter: Zuul
Branch: master

commit ea2c751da579157c818cfd772ceb38c79d8da5b3
Author: Sagi Shnaidman <email address hidden>
Date: Mon Nov 5 13:29:53 2018 +0200

    Add time for overcloud deletion

    On promotion jobs we hit problems when overcloud-deletion is
    finished too late and we stop the job before, which leads to
    failures.
    Add time to wait for overcloud delete.
    Change-Id: Idb4079b42cb69d7bddc8dd19817da79d49e1dcb3
    Related-Bug: #1801525
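
The merged change itself is not reproduced in this thread. As a rough illustration of what "add time to wait" means for a retry loop like the one in the log, here is a minimal Python sketch. The real job uses an Ansible task; `wait_for_delete`, `fake_check`, `retries`, and `delay` are hypothetical names chosen for this example and are not taken from tripleo-quickstart-extras.

```python
import time
from typing import Callable


def wait_for_delete(is_deleted: Callable[[], bool], retries: int, delay: float) -> bool:
    """Poll is_deleted() up to `retries` times, sleeping `delay` seconds between polls."""
    for attempt in range(1, retries + 1):
        if is_deleted():
            return True
        if attempt < retries:
            time.sleep(delay)
    return False


# Hypothetical stand-in for the real stack check: succeeds on the fourth poll.
calls = {"count": 0}


def fake_check() -> bool:
    calls["count"] += 1
    return calls["count"] >= 4


# A budget of three polls gives up before the delete finishes.
too_short = wait_for_delete(fake_check, retries=3, delay=0)

# A larger budget lets the same delete complete.
calls["count"] = 0
long_enough = wait_for_delete(fake_check, retries=5, delay=0)

print(too_short, long_enough)
```

The point of the sketch is only that the deletion is unchanged in both runs; raising the retry budget is what turns the failure into a success.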

Alan Pevec (apevec) wrote :

615545 had additional info in the commit message:
"We are getting overcloud delete timeouts in periodic jobs, since it is taking ~7.5 minutes instead of the ~5 minutes it used to take."

This still needs to be investigated: the deletion time increased by exactly 50% after Oct 30.
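
The figures quoted above can be checked with a short calculation. This is a sketch under one assumption: each poll takes roughly 12.7 seconds, which is the approximate spacing between retry messages in the log in the bug description. The variable names are illustrative.

```python
import math

old_minutes = 5.0   # "~5 minutes it used to take"
new_minutes = 7.5   # "~7.5 minutes"
poll_interval_s = 12.7  # assumption: approximate spacing of retry messages in the log

# Relative increase in deletion time.
increase = (new_minutes - old_minutes) / old_minutes

# Number of polls needed to cover the new deletion time at that spacing.
polls_needed = math.ceil(new_minutes * 60 / poll_interval_s)

print(f"{increase:.0%} increase, at least {polls_needed} polls needed")
```

Under that assumption, the 30 attempts seen in the failing run fall short of the roughly 36 polls a 7.5 minute deletion would need.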

Changed in tripleo:
milestone: stein-2 → stein-3

wes hayutin (weshayutin)
tags: removed: promotion-blocker

Changed in tripleo:
status: In Progress → Incomplete

Changed in tripleo:
milestone: stein-3 → stein-rc1

wes hayutin (weshayutin) wrote :

The OVB workflow has been overhauled since this bug was opened.

Changed in tripleo:
status: Incomplete → Fix Released