Undercloud upgrade cannot be re-run after initial failure.

Bug #1804459 reported by Sofer Athlan-Guyot
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Sofer Athlan-Guyot

Bug Description

Hi,

Originally reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1647956

After a failed undercloud upgrade, if, say, the ironic-api service has been disabled but its container is not installed yet, a systematic error prevents the upgrade from being run again:

...
    fatal: [undercloud-0]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "ironic_api", "ironic-dbsync", "--config-file", "/etc/ironic/ironic.conf", "online_data_migrations"], "delta": "0:00:00.036968", "end": "2018-11-20 11:06:53.414249", "msg": "non-zero return code", "rc": 1, "start": "2018-11-20 11:06:53.377281", "stderr": "Error response from daemon: No such container: ironic_api", "stderr_lines": ["Error response from daemon: No such container: ironic_api"], "stdout": "", "stdout_lines": []}

tags: added: upgrade
removed: upgr
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/619247

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/619247
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0bf32c5796b4001a87d95eba0ae6828ae357fdd5
Submitter: Zuul
Branch: master

commit 0bf32c5796b4001a87d95eba0ae6828ae357fdd5
Author: Sofer Athlan-Guyot <email address hidden>
Date: Wed Nov 21 14:50:31 2018 +0100

    Remove validation part of the online database migration pre-upgrade.

    That task included a validation in the form of "if the container is
    not there, fail". This was done to ensure that the online database
    migration was run even when the upgrade was run on an environment
    where some containers had been stopped for some reason.

    This validation proved problematic, as having the related host
    services stopped and the container not running is a "legitimate" state
    during the re-run of a failed upgrade.

    That validation then completely blocks the upgrade.

    We remove the validation part of that task because the case it guards
    against is very unlikely, the check belongs in a validation task run
    outside of the upgrade, and it blocks a valid path from working.

    Change-Id: I6ca70cb913f7cdd6fc4fbcc70698992e2074dc9c
    Closes-Bug: #1804459
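
The behaviour change described in this commit can be modelled as a small decision function. This is a hypothetical Python sketch, not the actual tripleo-heat-templates change (which lives in Ansible tasks): the names `Action` and `plan_online_migration`, the `validate` flag standing for "with the old validation", and the assumption that a missing container now means the step is skipped are all illustrative.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"    # execute online_data_migrations inside the container
    SKIP = "skip"  # do nothing for this step
    FAIL = "fail"  # abort the upgrade


def plan_online_migration(container_present: bool, validate: bool) -> Action:
    """Hypothetical model: validate=True is the pre-fix task, False the fixed one."""
    if container_present:
        return Action.RUN
    # Container missing, e.g. when re-running an upgrade that failed before
    # the ironic_api container was created. The old validation turned this
    # into a hard failure; without it the step is assumed to be skipped.
    return Action.FAIL if validate else Action.SKIP


print(plan_online_migration(container_present=False, validate=True))   # Action.FAIL
print(plan_online_migration(container_present=False, validate=False))  # Action.SKIP
```

Keeping the decision separate from the command execution makes it easy to see that the fix changes only the "container missing" branch; the normal path still runs the migration.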

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.2.0

This issue was fixed in the openstack/tripleo-heat-templates 10.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/622420

tags: added: rocky-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/rocky)

Reviewed: https://review.openstack.org/622420
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=323548a2b30535616a4b6bbe7c2cf274e378d970
Submitter: Zuul
Branch: stable/rocky

commit 323548a2b30535616a4b6bbe7c2cf274e378d970
Author: Sofer Athlan-Guyot <email address hidden>
Date: Wed Nov 21 14:50:31 2018 +0100

    Verify docker as part of the online database migration pre-upgrade.

    We have to check for docker presence here because it might be missing
    during the re-run of a failed upgrade.

    This is a sort of backport of
    I6ca70cb913f7cdd6fc4fbcc70698992e2074dc9c in the sense that it solves
    the same issue. The patch is so different, though, that this is a
    Rocky-only patch.

    Change-Id: I2e0cebb09b9cefb08a44b64a24fc4159d01d27bc
    Closes-Bug: #1804459
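
The Rocky approach of checking for docker before attempting the migration can be sketched in Python as follows. This is an illustrative sketch only, not the code from change I2e0cebb09b9cefb08a44b64a24fc4159d01d27bc (which is part of the Heat/Ansible templates): the helper `run_ironic_online_migrations` and its injectable `which`/`run` parameters are hypothetical, and the command list is copied from the failing task in the bug description. `shutil.which` returns `None` when `docker` is not found on `PATH`.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# Command copied from the failing task shown in the bug description.
MIGRATION_CMD: List[str] = [
    "docker", "exec", "ironic_api",
    "ironic-dbsync", "--config-file", "/etc/ironic/ironic.conf",
    "online_data_migrations",
]


def run_ironic_online_migrations(
    which: Callable[[str], Optional[str]] = shutil.which,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Hypothetical helper: return True if the migration ran, False if skipped.

    `which` and `run` are injectable so the logic can be exercised
    without docker being installed.
    """
    # On a re-run of a failed upgrade, docker may not be installed yet,
    # so there is nothing to `docker exec` into: skip instead of failing.
    if which("docker") is None:
        return False
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a genuine migration failure still stops the caller.
    run(MIGRATION_CMD, check=True)
    return True
```

With docker absent, the call returns False instead of erroring out. Skipping is not free, though: if the migrations are skipped when they should have run, the database can be left in an undesired state, so any real implementation has to decide carefully when skipping is acceptable.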

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/624331

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Jose Luis Franco (<email address hidden>) on branch: master
Review: https://review.openstack.org/624331
Reason: The initial problem was that the tripleo-containers-image-prepare command had failed and the upgrade went on anyway (that should be fixed in https://review.openstack.org/#/c/620905/), so it's better not to submit this, as we might skip the online data migrations, which could leave the system in an undesired state.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 9.2.0

This issue was fixed in the openstack/tripleo-heat-templates 9.2.0 release.
