Newton to Ocata upgrade failure because of ceilometer-upgrade

Bug #1724328 reported by David Manchado
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Pradeep Kilambi

Bug Description

Description
===========
While upgrading from Newton to Ocata, the upgrade fails with an error from running ceilometer-upgrade:

Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer::Collector/Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
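
For triage, the failing Exec resource, the command and its exit code can be pulled out of a Puppet error line like the one above. This is a minimal sketch: the regular expression is modelled only on this single message, and the function name `parse_exec_failure` is hypothetical.

```python
import re

# Pattern for Puppet's "Exec ... returned N instead of one of [...]" message.
# It is derived from the one error line in this report and may need adjusting.
EXEC_FAILURE = re.compile(
    r"Exec\[(?P<resource>[^\]]+)\]/returns: .*failed: "
    r"(?P<command>.+) returned (?P<code>\d+) "
    r"instead of one of \[(?P<expected>[^\]]*)\]"
)


def parse_exec_failure(line):
    """Return the failing Exec resource, command and exit codes, or None."""
    match = EXEC_FAILURE.search(line)
    if match is None:
        return None
    expected = [int(c) for c in match.group("expected").split(",") if c.strip()]
    return {
        "resource": match.group("resource"),
        "command": match.group("command"),
        "code": int(match.group("code")),
        "expected": expected,
    }


line = (
    "Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer::Collector/"
    "Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed: "
    "ceilometer-upgrade --skip-metering-database returned 1 "
    "instead of one of [0]"
)
print(parse_exec_failure(line))
```

The extracted command is the one to re-run by hand on a controller when checking whether the failure is reproducible outside of Puppet.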

Before upgrading to Ocata, Newton was updated to the latest Newton bits available on rdo-trunk-ocata-tested (https://trunk.rdoproject.org/centos7-ocata/current-passed-ci/ )

Seems similar to https://bugs.launchpad.net/tripleo/+bug/1703444

Steps to reproduce
==================
* Deploy Newton openstack
* Update overcloud to latest Newton bits available
* Upgrade to Ocata

The upgrade script is the same as the deploy one, with the following added:
-e templates/environments/major-upgrade-composable-steps.yaml \
-e overcloud-repos.yaml \
-e skip-validation-upgrade.yaml \
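
As a rough illustration of how the upgrade command is derived from the deploy command, the sketch below appends the three environment files listed above to an existing argument list. The base deploy command shown is a placeholder, since the reporter's actual arguments are not included in the report, and `build_upgrade_command` is a hypothetical helper name.

```python
# Extra environment files listed in the report, in the order given.
UPGRADE_ENVIRONMENTS = [
    "templates/environments/major-upgrade-composable-steps.yaml",
    "overcloud-repos.yaml",
    "skip-validation-upgrade.yaml",
]


def build_upgrade_command(deploy_command):
    """Append the upgrade environment files to an existing deploy command."""
    command = list(deploy_command)  # copy, so the caller's list is untouched
    for env_file in UPGRADE_ENVIRONMENTS:
        command += ["-e", env_file]
    return command


# Placeholder deploy command: the reporter's real arguments are not shown.
deploy = ["openstack", "overcloud", "deploy", "--templates", "templates"]
print(" ".join(build_upgrade_command(deploy)))
```

The files are appended after the original arguments because environment files passed with -e are applied in order, so later files can override earlier ones.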

Expected result
===============
UPDATE_COMPLETE

Actual result
=============
UPDATE_FAILED

Re-running ceilometer-upgrade on one of the controllers fails.
The logs show that an HTTP 503 error is returned when trying to reach keystone-admin.

All the resources in the controller pcs cluster are reported as being unmanaged.
openstack endpoint list fails with the following error:
Failed to contact the endpoint at http://A.B.C.D:35357 for discovery. Fallback to using that endpoint as the base url.
Unable to establish connection to http://A.B.C.D:35357/endpoints: ('Connection aborted.', BadStatusLine("''",))

Taking the cluster out of maintenance mode does not bring keystone-admin back, so ceilometer-upgrade cannot be retested.

Environment
===========
Three HA controllers (

Tripleo-related RPM:
puppet-tripleo-6.5.4-0.20171015123804.d9f056e.el7.centos.noarch
openstack-tripleo-common-6.1.2-1.el7.noarch
openstack-tripleo-puppet-elements-6.2.3-1.el7.noarch
python-tripleoclient-6.2.1-1.el7.noarch
openstack-tripleo-image-elements-6.1.0-1.el7.noarch
openstack-tripleo-0.0.8-0.3.4de13b3git.el7.noarch
openstack-tripleo-heat-templates-6.2.4-0.20171011085158.cf73cd2.el7.centos.noarch
openstack-tripleo-ui-3.2.2-1.el7.noarch
openstack-tripleo-validations-5.6.1-1.el7.noarch

Logs & Configs
==============
ceilometer-upgrade.log http://paste.openstack.org/show/623875/
openstack stack failure list --long http://paste.openstack.org/show/623876/

Changed in tripleo:
importance: Undecided → High
milestone: none → queens-2
status: New → Triaged
tags: added: upgrade
Revision history for this message
David Manchado (dmanchad) wrote :

In case it helps, here are some responses I am getting when using openstackclient:

$ openstack endpoint list
Failed to contact the endpoint at http://A.B.C.D:35357 for discovery. Fallback to using that endpoint as the base url.
Unable to establish connection to http://A.B.C.D:35357/endpoints: ('Connection aborted.', BadStatusLine("''",))
----
$ openstack server list
The server is currently unavailable. Please try again at a later time.<br /><br />

 (HTTP 503) (Request-ID: req-6b0f1bc8-457c-4ed5-8978-b50e47df6164)
----
$ openstack token issue
+------------+----------------------------------+
| Field | Value |
+------------+----------------------------------+
| expires | 2017-10-18T14:02:19+0000 |
| id | 447a16ef8bf6412aa47b3b062bd9c2cc |
| project_id | 86dcd37905ec440b873422092b923eaf |
| user_id | 72f88c9866d14fc3aa2ce1eb03ffb4c2 |
+------------+----------------------------------+

Revision history for this message
David Manchado (dmanchad) wrote :

The output of the deploy has been uploaded too [1]

[1] http://paste.openstack.org/show/623940/

Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

This is due to httpd being bounced during step 4. I think this should be resolved by the backport that was recently merged:

https://review.openstack.org/#/c/489437/

Revision history for this message
David Manchado (dmanchad) wrote :

I've just confirmed that the installed puppet-tripleo RPM [1] already had that patch included.
That is the version installed on both the controller and the undercloud.

[1] puppet-tripleo-6.5.4-0.20171015123804.d9f056e.el7.centos.noarch

Changed in tripleo:
importance: High → Critical
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

Increased the priority because it's one of the blockers for moving OVB jobs to RDO cloud.

Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

Hmm, if that patch is there, then I doubt this is a ceilometer issue. Based on comment #1, it seems like keystone is down? If keystone is down, ceilometer-upgrade will not be able to authenticate to talk to gnocchi. So it seems to me that the root cause here is keystone.

Revision history for this message
wes hayutin (weshayutin) wrote :

Making this critical, alert, promotion blocker as it's blocking the upgrade of RDO-Cloud, which blocks the migration of jobs off RH1, which blocks everyone's patches because jobs in RH1 are timing out.

:))

tags: added: alert promotion-blocker
Revision history for this message
David Manchado (dmanchad) wrote :

All,

In the end, the keystone issue should not be related to the upgrade.
Some time ago we had to switch keystone admin to SSL because it is internet facing [1]. Since we had to do it right away, we changed the templates [1] and submitted an LP bug [2] and a BZ [3].

We did a minor update (Newton) after that change and it overwrote the template change, so the keystone admin mismatch was probably only identified (or only took effect) at the Ocata upgrade.

We are still using the staging environment to test the right setup and to look for potential issues before we try the upgrade on production.

[1] https://code.engineering.redhat.com/gerrit/#/c/107413/
[2] https://bugs.launchpad.net/tripleo/+bug/1639996
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1416225

wes hayutin (weshayutin)
Changed in tripleo:
assignee: nobody → mathieu bultel (mat-bultel)
Revision history for this message
David Manchado (dmanchad) wrote :

We have tested again now that the keystone-admin issue is solved, and we are still hitting the same failure.

Note that keystone was up and running before the upgrade and after the failure.

The logs still report the same issues as a month ago.

RPMs (related to tripleo, gnocchi and ceilometer) used during today's upgrade:
openstack-tripleo-ui-3.2.2-1.el7.noarch
openstack-gnocchi-metricd-3.1.11-1.el7.noarch
openstack-ceilometer-notification-8.1.2-0.20171102233300.600bd6a.el7.centos.noarch
openstack-ceilometer-api-8.1.2-0.20171102233300.600bd6a.el7.centos.noarch
openstack-tripleo-validations-5.6.1-1.el7.noarch
openstack-gnocchi-common-3.1.11-1.el7.noarch
openstack-tripleo-common-6.1.3-0.20171105015427.7b93bc1.el7.centos.noarch
puppet-gnocchi-10.3.2-0.20171031003416.48b6bca.el7.centos.noarch
openstack-gnocchi-indexer-sqlalchemy-3.1.11-1.el7.noarch
openstack-tripleo-puppet-elements-6.2.3-1.el7.noarch
python-ceilometer-8.1.2-0.20171102233300.600bd6a.el7.centos.noarch
openstack-ceilometer-central-8.1.2-0.20171102233300.600bd6a.el7.centos.noarch
openstack-gnocchi-api-3.1.11-1.el7.noarch
openstack-tripleo-heat-templates-6.2.5-0.20171105124759.fdcb5c6.el7.centos.noarch
openstack-tripleo-0.0.8-0.3.4de13b3git.el7.noarch
python2-ceilometerclient-2.8.1-1.el7.noarch
openstack-ceilometer-polling-8.1.2-0.20171102233300.600bd6a.el7.centos.noarch
puppet-tripleo-6.5.5-0.20171106204741.56b8111.el7.centos.noarch
puppet-ceilometer-10.3.2-0.20171102222201.4f6eb57.el7.centos.noarch
python-gnocchi-3.1.11-1.el7.noarch
python-tripleoclient-6.2.1-1.el7.noarch
python-ceilometermiddleware-1.0.3-1.el7.noarch
openstack-tripleo-image-elements-6.1.1-1.el7.noarch
openstack-gnocchi-statsd-3.1.11-1.el7.noarch
python2-gnocchiclient-3.1.0-1.el7.noarch
openstack-ceilometer-common-8.1.2-0.20171102233300.600bd6a.el7.centos.noarch
openstack-ceilometer-collector-8.1.2-0.20171102233300.600bd6a.el7.centos.noarch

Revision history for this message
David Manchado (dmanchad) wrote :

We have been able to deploy and upgrade when the initial deploy had telemetry disabled.
This confirms that the issue is related to telemetry/gnocchi.

Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

Yeah, sounds like we might have an ordering issue during upgrade, but it is hard to say where without logs. Can you get me /var/log/gnocchi/*, /var/log/ceilometer/*, /var/log/httpd/* and the upgrade logs from the time of failure?

Changed in tripleo:
assignee: mathieu bultel (mat-bultel) → Pradeep Kilambi (pkilambi)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/521886

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/521890

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/521621
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=60925faefc58d76adf3914f96c636ca2a5b8c783
Submitter: Zuul
Branch: master

commit 60925faefc58d76adf3914f96c636ca2a5b8c783
Author: Pradeep Kilambi <email address hidden>
Date: Mon Nov 20 13:10:25 2017 -0500

    Add upgrade task to run gnocchi upgrade

    Closes-bug: #1724328

    Change-Id: Id7fed3746733c0ea0804532beda627c69e4ce078

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ocata)

Reviewed: https://review.openstack.org/521886
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=771189e91d50fd28d2be44d9b003ff061482b90d
Submitter: Zuul
Branch: stable/ocata

commit 771189e91d50fd28d2be44d9b003ff061482b90d
Author: Pradeep Kilambi <email address hidden>
Date: Mon Nov 20 13:10:25 2017 -0500

    Add upgrade task to run gnocchi upgrade

    Closes-bug: #1724328

    Change-Id: Id7fed3746733c0ea0804532beda627c69e4ce078
    (cherry picked from commit 60925faefc58d76adf3914f96c636ca2a5b8c783)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/521890
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=aab7bdd6fbab0158cf2b57a63b4422cd3156beed
Submitter: Zuul
Branch: stable/pike

commit aab7bdd6fbab0158cf2b57a63b4422cd3156beed
Author: Pradeep Kilambi <email address hidden>
Date: Mon Nov 20 13:10:25 2017 -0500

    Add upgrade task to run gnocchi upgrade

    Closes-bug: #1724328

    Change-Id: Id7fed3746733c0ea0804532beda627c69e4ce078
    (cherry picked from commit 60925faefc58d76adf3914f96c636ca2a5b8c783)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 8.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.2.7

This issue was fixed in the openstack/tripleo-heat-templates 6.2.7 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.6

This issue was fixed in the openstack/tripleo-heat-templates 7.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/527709

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/527759

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/527760

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/527709
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=1c49fbe08d1f764975ce8ef952b055ad25effd65
Submitter: Zuul
Branch: master

commit 1c49fbe08d1f764975ce8ef952b055ad25effd65
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 13 16:07:16 2017 +0100

    gnocchi: ensure upgrade run after swift setup

    The orignal fix have create an dependencies on an Class, so
    it does work and fail silencly.

    This changes it to the Anchor.

    Change-Id: I2ed6e328a9a4915844f699784dd87dc99078fb23
    Closes-bug: #1724328

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/527934

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/527940

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (master)

Change abandoned by Mehdi Abaakouk (sileht) (<email address hidden>) on branch: master
Review: https://review.openstack.org/527934
Reason: replaced by https://review.openstack.org/#/c/527940/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/528077

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/528078

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/527759
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=e09db0d8c306959a78f3ec51b5afa92b760a814d
Submitter: Zuul
Branch: stable/pike

commit e09db0d8c306959a78f3ec51b5afa92b760a814d
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 13 16:07:16 2017 +0100

    gnocchi: ensure upgrade run after swift setup

    The orignal fix have create an dependencies on an Class, so
    it does work and fail silencly.

    This changes it to the Anchor.

    Change-Id: I2ed6e328a9a4915844f699784dd87dc99078fb23
    Closes-bug: #1724328
    (cherry picked from commit 1c49fbe08d1f764975ce8ef952b055ad25effd65)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/ocata)

Reviewed: https://review.openstack.org/527760
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=5e1b5cafa6bb7a22f75375266688c86a73b157f2
Submitter: Zuul
Branch: stable/ocata

commit 5e1b5cafa6bb7a22f75375266688c86a73b157f2
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 13 16:07:16 2017 +0100

    gnocchi: ensure upgrade run after swift setup

    The orignal fix have create an dependencies on an Class, so
    it does work and fail silencly.

    This changes it to the Anchor.

    Change-Id: I2ed6e328a9a4915844f699784dd87dc99078fb23
    Closes-bug: #1724328
    (cherry picked from commit 1c49fbe08d1f764975ce8ef952b055ad25effd65)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/527940
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=5b1a139fa0bfda2c5b38754b3d31ee28023cbef5
Submitter: Zuul
Branch: master

commit 5b1a139fa0bfda2c5b38754b3d31ee28023cbef5
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Dec 14 12:36:26 2017 +0100

    gnocchi/ceilometer upgrade workflow fix

    The current workflow for gnocchi/ceilometer upgrade doesn't
    work well with swift backend.

    Notification agent push data into Gnocchi on step4, but
    Ceilometer-upgrade run only on step5, So Gnocchi have not been populated
    with latest resource schemas.

    Gnocchi-api is started in step3 but gnocchi::storage configuration have
    not been done and database upgrade have not been done.

    When configuration is done on step4, httpd will be restarted.

    This change will fix this issue by:

    * Doing only the Gnocchi database upgrade on step3 because swift is
      ready only on step4.
    * Configuring gnocchi::storage on step3 to avoid gnocchi-api restart on
      step4.
    * Add dependencies between ceilometer-upgrade and gnocchi-api in case of
      non multinode deployment.

    This ensures:

    * gnocchi-api will be correctly configured at the end of
      step3 (configuration+database-sync).
    * No new measures will be pushed to Gnocchi before ceilometer-upgrade have
      upgraded the Gnocchi resource schemas.
    * Gnocchi-api have database updated before ceilometer-upgrade need it.
    * We continue to upgrade storage/incoming data of Gnocchi on step4 after
      swift is up.

    Closes-bug: #1724328
    Change-Id: I3f9a784e507e03454b335ba8319601fba208ba0a
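
The ordering problem described in the commit message above can be illustrated with a small model. This is only a sketch: the task names are made up for the example, the "before" position of the Gnocchi database upgrade and the "after" positions of ceilometer-upgrade and the notification agent are assumptions inferred from the commit message, and none of it is taken from the puppet-tripleo manifests.

```python
# Illustrative model only. Each schedule is a list of (step, task) pairs in
# the order the tasks run. Task names are hypothetical labels.
BEFORE = [
    (3, "gnocchi-api start"),
    (4, "gnocchi storage config"),    # reconfiguration restarts httpd
    (4, "gnocchi db upgrade"),        # assumed step; source only says "not on step3"
    (4, "notification agent start"),
    (5, "ceilometer-upgrade"),
]

AFTER = [
    (3, "gnocchi storage config"),
    (3, "gnocchi db upgrade"),
    (3, "gnocchi-api start"),
    (4, "gnocchi storage upgrade"),   # needs swift, which is ready on step4
    (4, "ceilometer-upgrade"),        # assumed position
    (4, "notification agent start"),  # assumed position
]

# (earlier, later): "earlier" must run before "later".
MUST_PRECEDE = [
    ("gnocchi db upgrade", "ceilometer-upgrade"),
    ("ceilometer-upgrade", "notification agent start"),
]


def violations(schedule):
    """List the ordering rules from the commit message that a schedule breaks."""
    order = {task: index for index, (_, task) in enumerate(schedule)}
    step = {task: s for s, task in schedule}
    problems = []
    for earlier, later in MUST_PRECEDE:
        if order[earlier] > order[later]:
            problems.append(f"{later} runs before {earlier}")
    # Configuring gnocchi storage in a later step than the API start means
    # httpd is restarted while other services may already depend on it.
    if step["gnocchi storage config"] > step["gnocchi-api start"]:
        problems.append("gnocchi-api reconfigured after start")
    return problems


print("before:", violations(BEFORE))
print("after:", violations(AFTER))
```

In this model the "before" schedule breaks two rules (the notification agent starts before ceilometer-upgrade has run, and gnocchi-api is reconfigured after it was started), while the "after" schedule breaks none.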

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/ocata)

Reviewed: https://review.openstack.org/528077
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=062313628c334a5ea4a812b4a6176d3e1c02d8b2
Submitter: Zuul
Branch: stable/ocata

commit 062313628c334a5ea4a812b4a6176d3e1c02d8b2
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Dec 14 12:36:26 2017 +0100

    gnocchi/ceilometer upgrade workflow fix

    The current workflow for gnocchi/ceilometer upgrade doesn't
    work well with swift backend.

    Notification agent push data into Gnocchi on step4, but
    Ceilometer-upgrade run only on step5, So Gnocchi have not been populated
    with latest resource schemas.

    Gnocchi-api is started in step3 but gnocchi::storage configuration have
    not been done and database upgrade have not been done.

    When configuration is done on step4, httpd will be restarted.

    This change will fix this issue by:

    * Doing only the Gnocchi database upgrade on step3 because swift is
      ready only on step4.
    * Configuring gnocchi::storage on step3 to avoid gnocchi-api restart on
      step4.
    * Move ceilometer-notification on step4 to ensure ceilometer-upgrade
      have been run.

    This ensures:

    * gnocchi-api will be correctly configured at the end of
      step3 (configuration+database-sync).
    * No new measures will be pushed to Gnocchi before ceilometer-upgrade have
      upgraded the Gnocchi resource schemas.
    * Gnocchi-api have database updated before ceilometer-upgrade need it.
    * We continue to upgrade storage/incoming data of Gnocchi on step4 after
      swift is up.

    Closes-bug: #1724328
    Change-Id: I3f9a784e507e03454b335ba8319601fba208ba0a
    (cherry picked from commit 4e6939c1a874a06f336321a9d44d9991872f74cf)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (stable/pike)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/528078
Reason: The gate is currently timing out; we need https://review.openstack.org/#/c/531352/ to improve the situation. I'll restore the patch once the gate is stable again. Please do not recheck or restore this patch, I'll take care of it. Thanks for your patience.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 6.5.7

This issue was fixed in the openstack/puppet-tripleo 6.5.7 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 7.4.7

This issue was fixed in the openstack/puppet-tripleo 7.4.7 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/528078
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=50429f2cfbb0696ac134b15254a4277398836f49
Submitter: Zuul
Branch: stable/pike

commit 50429f2cfbb0696ac134b15254a4277398836f49
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Dec 14 12:36:26 2017 +0100

    gnocchi/ceilometer upgrade workflow fix

    The current workflow for gnocchi/ceilometer upgrade doesn't
    work well with swift backend.

    Notification agent push data into Gnocchi on step4, but
    Ceilometer-upgrade run only on step5, So Gnocchi have not been populated
    with latest resource schemas.

    Gnocchi-api is started in step3 but gnocchi::storage configuration have
    not been done and database upgrade have not been done.

    When configuration is done on step4, httpd will be restarted.

    This change will fix this issue by:

    * Doing only the Gnocchi database upgrade on step3 because swift is
      ready only on step4.
    * Configuring gnocchi::storage on step3 to avoid gnocchi-api restart on
      step4.
    * Move ceilometer-notification on step4 to ensure ceilometer-upgrade
      have been run.

    This ensures:

    * gnocchi-api will be correctly configured at the end of
      step3 (configuration+database-sync).
    * No new measures will be pushed to Gnocchi before ceilometer-upgrade have
      upgraded the Gnocchi resource schemas.
    * Gnocchi-api have database updated before ceilometer-upgrade need it.
    * We continue to upgrade storage/incoming data of Gnocchi on step4 after
      swift is up.

    Closes-bug: #1724328
    Change-Id: I3f9a784e507e03454b335ba8319601fba208ba0a
    (cherry picked from commit 4e6939c1a874a06f336321a9d44d9991872f74cf)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 7.4.8

This issue was fixed in the openstack/puppet-tripleo 7.4.8 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 8.2.0

This issue was fixed in the openstack/puppet-tripleo 8.2.0 release.
