N->O Upgrade, orchestration is broken.

Bug #1679486 reported by Sofer Athlan-Guyot
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Sofer Athlan-Guyot

Bug Description

Hi,

Deploying a custom role, Novacontrol, which carries the nova-related API services, failed with this error:

    TASK [Setup cell_v2 (map cell0)] ***********************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["nova-manage", "cell_v2", "map_cell0"], "delta": "0:02:12.569490", "end": "2017-02-28 15:41:23.802908", "failed": true, "rc": 1, "start": "2017-02-28 15:39:11.233418", "stderr": "", "stdout": "An error has occurred:
"DBConnectionError: (pymysql.err.OperationalError) (2003, \"Can't connect to MySQL server on '172.17.1.13' ([Errno 113] EHOSTUNREACH)\")"], "warnings": []}
     to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/b106b80f-8c24-4896-98d3-06ddf74f7508_playbook.retry

We discovered that the Novacontrol node was at step5, while the Controller was at step3. From the logs:

Novacontrol log entries are prefixed with N and Controller log entries with C:

 - C: step0: Apr 03 09:08:55

 - N: step0: Apr 03 09:07:36
 - N: step1: Apr 03 09:08:27
 - N: step2: Apr 03 09:08:52

 - C: step1: Apr 03 09:13:56

 - N: step3: Apr 03 09:14:24
 - N: step4: Apr 03 09:14:40

 - C: step2: Apr 03 09:15:01

 - N: step5: Apr 03 09:17:28

 - C: step3: Apr 03 09:20:48
 - C: step4: never happened
 - C: step5: never happened
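
The ordering problem in this timeline can be checked mechanically. Below is a minimal Python sketch, assuming only the step start times listed above are available. The logs give no completion times, so it tests a weaker condition: no role may start step k before every other role has at least started step k-1. The names `TIMELINE`, `parse` and `find_violations` are illustrative and not part of TripleO.

```python
from datetime import datetime

# Step start times copied from the report, all on Apr 03.
# Index in each list is the step number; steps that never ran are absent.
TIMELINE = {
    "C": ["09:08:55", "09:13:56", "09:15:01", "09:20:48"],
    "N": ["09:07:36", "09:08:27", "09:08:52", "09:14:24", "09:14:40", "09:17:28"],
}


def parse(stamp):
    """Parse a time of day; every entry is on the same day, so no date is needed."""
    return datetime.strptime(stamp, "%H:%M:%S")


def find_violations(timeline):
    """Return (role, step, other_role) tuples where `role` started `step`
    before `other_role` had started the previous step (or never started it)."""
    violations = []
    for role, starts in timeline.items():
        for step in range(1, len(starts)):
            for other, other_starts in timeline.items():
                if other == role:
                    continue
                prev = step - 1
                never_ran = prev >= len(other_starts)
                if never_ran or parse(starts[step]) < parse(other_starts[prev]):
                    violations.append((role, step, other))
    return violations


for role, step, other in find_violations(TIMELINE):
    print(f"{role} started step{step} before {other} started step{step - 1}")
```

On the timeline from this report the check flags Novacontrol at steps 1 through 5 and reports nothing for the Controller, which matches the observation that Novacontrol ran ahead of the rest of the cluster.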

So it seems that, contrary to what we claim at https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/README.rst#upgrade-steps, there is no guarantee of the state of the cluster at each step *across roles*.

It may be that it "happened" to work in our tests because we were only using the Controller/Compute/Ceph roles and:
 1. Compute nodes are upgraded with their own mechanism;
 2. Ceph nodes are upgraded in batches;
 3. Controller nodes get the step guarantee because they all belong to the same role.

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: Confirmed → In Progress
Changed in tripleo:
assignee: Sofer Athlan-Guyot (sofer-athlan-guyot) → Marios Andreou (marios-b)
Changed in tripleo:
importance: Critical → High
Changed in tripleo:
assignee: Marios Andreou (marios-b) → Giulio Fidente (gfidente)
Changed in tripleo:
assignee: Giulio Fidente (gfidente) → Marios Andreou (marios-b)
assignee: Marios Andreou (marios-b) → Sofer Athlan-Guyot (sofer-athlan-guyot)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/452828
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d286892c785b8b81a866ea3c6a459d1fc4a347e8
Submitter: Jenkins
Branch: master

commit d286892c785b8b81a866ea3c6a459d1fc4a347e8
Author: Sofer Athlan-Guyot <email address hidden>
Date: Mon Apr 3 18:28:21 2017 +0200

    Ensure upgrade step orchestration accross roles.

    Currently we don't enforce step ordering across role, only within
    role. With custom role, we can reach a step5 on one role while the
    cluster is still at step3, breaking the contract announced in the
    README[1] where each step has a guarantied cluster state.

    We have to remove the conditional here as well as jinja has no way to
    access this information, but we need jinja to iterate over all enabled
    role to create the orchestration.

    This deals only with Upgrade tasks, there is another review to deal
    with UpgradeBatch tasks.

    [1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/README.rst

    Closes-Bug: #1679486

    Change-Id: Ibc6b64424cde56419fe82f984d3cc3620f7eb028
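
The cross-role ordering described in this commit message can be pictured as a dependency graph: each role's step N waits for step N-1 of every role, not only its own. The following is a minimal Python sketch of that idea. The resource naming (`<Role>Upgrade_Step<N>`), the helper `build_step_resources`, and the plain dict standing in for a Heat template are assumptions made for illustration; they are not the contents of the actual change.

```python
def build_step_resources(roles, num_steps):
    """Map each hypothetical '<Role>Upgrade_Step<N>' resource to its dependencies.

    Step 0 has no dependencies. Every later step depends on the previous
    step of ALL roles, which is what enforces the ordering across roles.
    """
    resources = {}
    for step in range(num_steps):
        for role in roles:
            name = f"{role}Upgrade_Step{step}"
            if step == 0:
                deps = []
            else:
                deps = [f"{other}Upgrade_Step{step - 1}" for other in roles]
            resources[name] = {"depends_on": deps}
    return resources


# Six steps (step0 to step5), matching the steps seen in the report's logs.
resources = build_step_resources(["Controller", "Novacontrol"], num_steps=6)
for name, body in resources.items():
    print(name, "->", body["depends_on"])
```

With per-role ordering only, `NovacontrolUpgrade_Step5` would depend on `NovacontrolUpgrade_Step4` alone. In this sketch it also waits on `ControllerUpgrade_Step4`, so Novacontrol cannot reach step5 while the Controller is still at step3.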

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Giulio Fidente (<email address hidden>) on branch: master
Review: https://review.openstack.org/452789
Reason: Fixes for the bugs have already been merged. They remove the conditions, which can tentatively be added back (without removing the fixes) using I5c8b0c4abfc0607f42fd3f2da9f5ef2702b1bbe1 instead.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ocata)

Reviewed: https://review.openstack.org/452830
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=11389e5ac3c328fac7f4387e44aa12671d717f0e
Submitter: Jenkins
Branch: stable/ocata

commit 11389e5ac3c328fac7f4387e44aa12671d717f0e
Author: Sofer Athlan-Guyot <email address hidden>
Date: Mon Apr 3 18:28:21 2017 +0200

    Ensure upgrade step orchestration accross roles.

    Currently we don't enforce step ordering across role, only within
    role. With custom role, we can reach a step5 on one role while the
    cluster is still at step3, breaking the contract announced in the
    README[1] where each step has a guarantied cluster state.

    We have to remove the conditional here as well as jinja has no way to
    access this information, but we need jinja to iterate over all enabled
    role to create the orchestration.

    This deals only with Upgrade tasks, there is another review to deal
    with UpgradeBatch tasks.

    [1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/README.rst

    Closes-Bug: #1679486

    Change-Id: Ibc6b64424cde56419fe82f984d3cc3620f7eb028
    (cherry picked from commit d286892c785b8b81a866ea3c6a459d1fc4a347e8)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0b1

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.1.0

This issue was fixed in the openstack/tripleo-heat-templates 6.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/453238
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a19297a1bf114d48c4dadfb0678a3466f9e6de3a
Submitter: Jenkins
Branch: master

commit a19297a1bf114d48c4dadfb0678a3466f9e6de3a
Author: Giulio Fidente <email address hidden>
Date: Tue Apr 4 17:49:46 2017 +0200

    Add back Heat conditions in upgrade workflow

    By adding back the conditions we avoid the deployment of unneded
    software configs on nodes where we don't have any upgrade task to
    run, speeding up the upgrade process.

    Related-Bug: #1679486
    Related-Bug: #1678101
    Change-Id: I5c8b0c4abfc0607f42fd3f2da9f5ef2702b1bbe1
