Containerized undercloud upgrade job isn't testing upgrades anymore

Bug #1783399 reported by Emilien Macchi
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | Rafael Folco |

Bug Description

Caused by this patch:
https://review.openstack.org/#/c/582384/9/playbooks/tripleo-ci/templates/toci_gate_test.sh.j2

The variables set by this patch cause the containerized undercloud upgrade job to skip the actual upgrade, so the upgrade is no longer tested.

Tags: ci
Emilien Macchi (emilienm) wrote :
Changed in tripleo:
assignee: nobody → Emilien Macchi (emilienm)
Rafael Folco (rafaelfolco) wrote :

The TAGS breakage (overcloud-update stopped running) seems to be unrelated to the updates job failure (Set docker_startup_configs_with_default fact).
Reverting the TAGS patch does not appear to fix the issue.
Further evidence: the rdo-jobs fail the same way, and they don't run the workflow that was changed in this sprint.

https://review.openstack.org/#/c/584508/ gives the wrong impression that the error is gone, since the zuul CI update job got a SUCCESS. In fact, it fails the same way: http://logs.openstack.org/08/584508/3/check/tripleo-ci-centos-7-scenario000-multinode-oooq-container-updates/e77e9da/logs/undercloud/home/zuul/overcloud_update_run_Controller.log.txt.gz#_2018-07-24_21_17_17.

@rlandy found this:
https://logs.rdoproject.org/71/584771/14/openstack-check/legacy-tripleo-ci-centos-7-multinode-1ctlr-featureset037-updates-master/c796079/logs/undercloud/var/lib/mistral/5cf4bb48-2415-4575-95ea-7459eb55929e/Controller/docker_config.yaml.txt.gz
start_order: 1
    user: root

'start_order' looks misplaced...
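
A quick way to spot this kind of misplacement in a downloaded docker_config.yaml is to look for per-container keys that start at column 0. The sketch below is only an illustration: the helper name is hypothetical, and it assumes that start_order and user are normally nested under a step and a container name, so they should never appear unindented.

```python
import re

# Hypothetical helper, not part of TripleO.
# ASSUMPTION: these keys are per-container settings and should always be indented.
CONTAINER_KEYS = ("start_order", "user")


def find_unindented_container_keys(text, keys=CONTAINER_KEYS):
    """Return (line_number, line) pairs for container keys found at column 0."""
    pattern = re.compile(r"^(%s):" % "|".join(re.escape(k) for k in keys))
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.match(line):
            hits.append((number, line))
    return hits


# The two lines quoted above from the failing job's docker_config.yaml.
sample = "start_order: 1\n    user: root\n"
suspicious = find_unindented_container_keys(sample)
print(suspicious)
```

Only the unindented start_order line is flagged here; the indented user line passes the check even though it is orphaned, so a hit should be treated as a pointer for manual inspection rather than a full validation.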

A successful job has:
https://logs.rdoproject.org/71/584771/14/openstack-check/legacy-tripleo-ci-centos-7-multinode-1ctlr-featureset037-updates-master/c796079/logs/undercloud/var/lib/mistral/overcloud/Controller/docker_config.yaml.txt.gz

Another difference: the directory that is /var/lib/mistral/overcloud in the successful job is /var/lib/mistral/ad625367adadad2632656337 in the failing job.
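
To compare the two locations on a reproducer, one could list every directory under /var/lib/mistral that holds a Controller/docker_config.yaml. This is a minimal sketch: the function name is hypothetical, and the demo runs against a throwaway temporary tree that only mimics the two directory names mentioned above.

```python
from pathlib import Path
import tempfile


def find_role_configs(mistral_root, role="Controller", name="docker_config.yaml"):
    """Map each directory under mistral_root to its role config file, if present."""
    found = {}
    for entry in sorted(Path(mistral_root).iterdir()):
        candidate = entry / role / name
        if entry.is_dir() and candidate.is_file():
            found[entry.name] = candidate
    return found


# Demo on a temporary tree; on a real undercloud the root would be /var/lib/mistral.
with tempfile.TemporaryDirectory() as root:
    for run_dir in ("overcloud", "ad625367adadad2632656337"):
        target = Path(root) / run_dir / "Controller"
        target.mkdir(parents=True)
        (target / "docker_config.yaml").write_text("step_1: {}\n")
    run_dirs = sorted(find_role_configs(root))

print(run_dirs)
```

Any directory other than overcloud in the result would be a candidate for the config generated by the update run.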

Changed in tripleo:
assignee: Emilien Macchi (emilienm) → nobody
assignee: nobody → Rafael Folco (rafaelfolco)
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Rafael Folco (rafaelfolco) wrote :

This bug is about the TAGS issue, which keeps the upgrade jobs from running.

I created a separate bug for the update job failure:
https://bugs.launchpad.net/tripleo/+bug/1783866

Rafael Folco (rafaelfolco) wrote :
Ronelle Landy (rlandy) wrote :

I have a reproducer set up with fs037.
Manually removing that misplaced/extra start_order line and the root line allows the update to continue.

Ronelle Landy (rlandy) wrote :

Each time /home/zuul/overcloud_update_run-Controller.sh is run, a new directory is created in /var/lib/mistral, and the task "Set docker_startup_configs_with_default fact" fails at https://github.com/openstack/tripleo-heat-templates/blob/master/common/deploy-steps-tasks.yaml#L80.

u'An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ^',
 u'fatal: [subnode-1]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}',

With no_log removed, the step before prints out:

 u'TASK [Set docker_config_default fact] ******************************************',
 u'Thursday 26 July 2018 21:20:22 +0000 (0:00:00.548) 0:02:04.493 ********* ',
 u'ok: [subnode-1] => (item=1) => {"ansible_facts": {"docker_config_default": {"step_1": {}}}, "changed": false, "item": "1"}',
 u'ok: [subnode-1] => (item=2) => {"ansible_facts": {"docker_config_default": {"step_1": {}, "step_2": {}}}, "changed": false, "item": "2"}',
 u'ok: [subnode-1] => (item=3) => {"ansible_facts": {"docker_config_default": {"step_1": {}, "step_2": {}, "step_3": {}}}, "changed": false, "item": "3"}',
 u'ok: [subnode-1] => (item=4) => {"ansible_facts": {"docker_config_default": {"step_1": {}, "step_2": {}, "step_3": {}, "step_4": {}}}, "changed": false, "item": "4"}',
 u'ok: [subnode-1] => (item=5) => {"ansible_facts": {"docker_config_default": {"step_1": {}, "step_2": {}, "step_3": {}, "step_4": {}, "step_5": {}}}, "changed": false, "item": "5"}',
 u'ok: [subnode-1] => (item=6) => {"ansible_facts": {"docker_config_default": {"step_1": {}, "step_2": {}, "step_3": {}, "step_4": {}, "step_5": {}, "step_6": {}}}, "changed": false, "item": "6"}',

^^ which is fine. The error is on /var/lib/mistral/xxxx/Controller/docker_config.yaml.
(never on /var/lib/mistral/overcloud/Controller/docker_config.yaml - just the update)
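
The loop output above can be mirrored in a few lines of Python to show what a healthy default looks like: one empty mapping per step, accumulated item by item. The second function is only a guess at what the failing task does, based on its name; it is not taken from deploy-steps-tasks.yaml, and "some_container" is a made-up entry.

```python
def build_config_default(steps):
    """Accumulate an empty mapping per step, as in the logged loop output."""
    config_default = {}
    for item in steps:
        # Each iteration returns the previous dict plus one new empty step.
        config_default = dict(config_default, **{"step_%s" % item: {}})
    return config_default


def with_default(config_default, startup_configs):
    """ASSUMPTION: defaults are overlaid with the configs loaded from docker_config.yaml."""
    merged = dict(config_default)
    merged.update(startup_configs)
    return merged


defaults = build_config_default(range(1, 7))
# "some_container" is a hypothetical entry used only for illustration.
merged = with_default(defaults, {"step_1": {"some_container": {"start_order": 1}}})
print(sorted(merged))
```

If the loaded docker_config.yaml is malformed, a merge of this kind has nothing valid to overlay, which is consistent with the default fact succeeding while the following task fails.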

Ronelle Landy (rlandy) wrote :
Quique Llorente (quiquell) wrote :

Ronelle, I'm going to paste this into the bug for the update job:

https://bugs.launchpad.net/tripleo/+bug/1783866

wes hayutin (weshayutin)
tags: removed: alert
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released