basic overcloud deploy fails on ceilometer-upgrade

Bug #1693339 reported by Alex Schultz
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Pradeep Kilambi

Bug Description

Attempting a simple overcloud deploy fails on ceilometer-upgrade --skip-metering-database on the compute node.

Deploy command:
openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/enable-swap.yaml

overcloud.AllNodesDeploySteps.ComputeDeployment_Step5.0:
    Error: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
    Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer/Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
    (truncated, view all with --long)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/467750

Changed in tripleo:
assignee: nobody → Pradeep Kilambi (pkilambi)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/467750
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=daf64974d89db6bffe1fd99b344b18d682d9f237
Submitter: Jenkins
Branch: master

commit daf64974d89db6bffe1fd99b344b18d682d9f237
Author: Pradeep Kilambi <email address hidden>
Date: Wed May 24 16:10:55 2017 -0400

    Move ceilometer upgrade step out of base

    ceilometer-upgrade should only run on controller nodes.
    Since its currently in base profile, it gets triggered
    on compute as well. So instead split out the upgrade
    into its own and include when we deploy notification
    and central agents instead.

    Change-Id: I2910e8aa5da7fded4cf94b57fb0a14fefd88adbe
    Closes-bug: #1693339

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 7.1.0

This issue was fixed in the openstack/puppet-tripleo 7.1.0 release.

Revision history for this message
Raoul Scarazzini (rasca) wrote :

I hit the issue again today deploying master:

Error: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer::Upgrade/Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]

The version of the package seems correct:

(undercloud) [stack@haa-16 ~]$ rpm -qa puppet-tripleo
puppet-tripleo-7.1.1-0.20170615141730.5e91493.el7.centos.noarch

So I guess something is wrong here. I took sosreports of the env [1]. The delorean hash used by the environment is https://trunk.rdoproject.org/centos7/e6/da/e6da7152ab39cc05ffc958e43b88cd00a324bba5_f5e75451, which is the latest at this time.

[1] http://file.rdu.redhat.com/~rscarazz/LP1693339/
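
Note that the resource path in the error above is not the same as in the original description: the exec is now reported under Tripleo::Profile::Base::Ceilometer::Upgrade rather than Tripleo::Profile::Base::Ceilometer, which suggests the puppet-tripleo change is in place and the exec is failing for a different reason. A minimal Python sketch for pulling the class and exec title out of such a line follows. It assumes the error format shown in this report; the names EXEC_ERROR and failing_exec are hypothetical helpers, not part of any TripleO tooling.

```python
import re

# Assumed format, taken from the two error lines quoted in this report:
#   Error: /Stage[main]/<Puppet class>/Exec[<title>]/returns: ...
EXEC_ERROR = re.compile(
    r"^\s*Error: /Stage\[[^\]]+\]/(?P<puppet_class>[^/]+)"
    r"/Exec\[(?P<title>[^\]]+)\]/returns:"
)


def failing_exec(line):
    """Return (puppet_class, exec_title) for a failed Exec line, or None."""
    match = EXEC_ERROR.match(line)
    if match is None:
        return None
    return match.group("puppet_class"), match.group("title")


# Shortened copies of the error lines from the description and from this comment.
original = ("Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer/"
            "Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed")
reopened = ("Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer::Upgrade/"
            "Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed")

print(failing_exec(original))
print(failing_exec(reopened))
```

Lines that do not carry a resource path, such as the plain "Error: ceilometer-upgrade ... returned 1" line, yield None.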

Raoul Scarazzini (rasca)
Changed in tripleo:
milestone: pike-2 → pike-3
status: Fix Released → Triaged
Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

I just did a deploy yesterday:

 Stack overcloud CREATE_COMPLETE

/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
Overcloud Endpoint: https://10.0.0.5:13000/v2.0
Overcloud Deployed

[stack@undercloud ~]$ rpm -q puppet-tripleo
puppet-tripleo-7.1.1-0.20170619152413.43422d4.el7.centos.noarch

[root@overcloud-controller-0 ~]# cat /var/log/ceilometer/ceilometer-upgrade.log
2017-06-19 22:12:37.441 49521 WARNING oslo_reports.guru_meditation_report [-] Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2017-06-19 22:12:37.441 49521 INFO ceilometer.cmd.storage [-] Skipping metering database upgrade

From /var/log/gnocchi/gnocchi-upgrade.log:

2017-06-19 22:14:56.056 51127 WARNING oslo_config.cfg [-] Option "project_id" from group "statsd" is deprecated for removal. Its value may be silently ignored in the future.
2017-06-19 22:14:56.290 51127 INFO gnocchi.cli [-] Upgrading indexer <gnocchi.indexer.sqlalchemy.SQLAlchemyIndexer object at 0x5193d10>
2017-06-19 22:14:56.303 51127 INFO alembic.runtime.migration [-] Context impl MySQLImpl.
2017-06-19 22:14:56.303 51127 INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
2017-06-19 22:14:56.314 51127 INFO alembic.runtime.migration [-] Context impl MySQLImpl.
2017-06-19 22:14:56.315 51127 INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
2017-06-19 22:14:56.342 51127 INFO gnocchi.cli [-] Upgrading storage <gnocchi.storage.swift.SwiftStorage object at 0x6712b90>

Perhaps you are missing some recent patches to gnocchi.

Can you try with the latest master-tripleo-ci release and see? Let's confirm this is an issue, and if so in what scenario, before changing state.

Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

I was able to reproduce this locally in a multi-node env.

The gnocchi API seems to be throwing:

mote 172.16.2.11:244] mod_wsgi (pid=140510): Exception occurred processing WSGI script '/var/www/cgi-bin/gnocchi/app'.
mote 172.16.2.11:244] Traceback (most recent call last):
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
mote 172.16.2.11:244] resp = self.call_func(req, *args, **self.kwargs)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
mote 172.16.2.11:244] return self.func(req, *args, **kwargs)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 125, in __call__
mote 172.16.2.11:244] response = req.get_response(self.application)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1316, in send
mote 172.16.2.11:244] application, catch_exc_info=False)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1280, in call_application
mote 172.16.2.11:244] app_iter = application(self.environ, start_response)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/paste/urlmap.py", line 203, in __call__
mote 172.16.2.11:244] return app(environ, start_response)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
mote 172.16.2.11:244] resp = self.call_func(req, *args, **self.kwargs)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
mote 172.16.2.11:244] return self.func(req, *args, **kwargs)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 125, in __call__
mote 172.16.2.11:244] response = req.get_response(self.application)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1316, in send
mote 172.16.2.11:244] application, catch_exc_info=False)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1280, in call_application
mote 172.16.2.11:244] app_iter = application(self.environ, start_response)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
mote 172.16.2.11:244] resp = self.call_func(req, *args, **self.kwargs)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
mote 172.16.2.11:244] return self.func(req, *args, **kwargs)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 335, i
mote 172.16.2.11:244] response = req.get_response(self._app)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1316, in send
mote 172.16.2.11:244] application, catch_exc_info=False)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/request.py", line 1280, in call_application
mote 172.16.2.11:244] app_iter = application(self.environ, start_response)
mote 172.16.2.11:244] File "/usr/lib/python2.7/site-packages/webob/exc.py", line 1162, in __call__
 [client 172.16.2.11:49954] S...


Changed in tripleo:
status: Triaged → Confirmed
Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

There are other issues I'm seeing with the env, so it's hard to pinpoint this as a gnocchi issue:

oslo messaging rabbitmq seems down:

=ERROR REPORT==== 22-Jun-2017::15:08:04 ===
Error on AMQP connection <0.2189.0> (172.16.2.11:40606 -> 172.16.2.6:5672 - neutron-l3-agent:123624:03f21b1f-62e1-4064-97dc-a7ecda757487, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=INFO REPORT==== 22-Jun-2017::15:08:05 ===
Stopped RabbitMQ application

=INFO REPORT==== 22-Jun-2017::15:08:44 ===
Stopping RabbitMQ

=INFO REPORT==== 22-Jun-2017::15:08:44 ===
Stopped RabbitMQ application

=INFO REPORT==== 22-Jun-2017::15:08:44 ===
Halting Erlang VM

In the swift logs I see:

Jun 22 17:30:48 localhost proxy-server: STDERR: ERROR:oslo.messaging._drivers.impl_rabbit:[2f7e7090-9278-4824-b491-20b8dcc5c403] AMQP server on overcloud-controller-1.internalapi.localdomain:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds. Client port: None

So without swift functioning properly, gnocchi won't work, and in turn ceilometer will fail.

Marking as incomplete as I'm not sure where the root cause is yet.
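
The ECONNREFUSED in the swift log above can be checked independently of any OpenStack service by attempting a plain TCP connection to the broker port. A minimal Python sketch follows, assuming the default AMQP port 5672 seen in the logs; amqp_port_open is a hypothetical helper name. It only tests TCP reachability and says nothing about whether RabbitMQ itself is healthy.

```python
import socket


def amqp_port_open(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises an OSError subclass on refusal,
        # timeout, or name-resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, amqp_port_open("overcloud-controller-1.internalapi.localdomain") run from a controller would be expected to return False while the broker is stopped, matching the refused connections logged by the proxy server.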

Changed in tripleo:
status: Confirmed → Incomplete
Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

Tested this in the ovb-ha CI job and telemetry installs successfully:

https://review.openstack.org/#/c/476666/

Revision history for this message
Matt Young (halcyondude) wrote :
Revision history for this message
Raoul Scarazzini (rasca) wrote :

I confirm the failure, and that the issue happened while using a puppet-tripleo version right after the one reported as working (puppet-tripleo-7.1.1-0.20170619152413.43422d4.el7.centos.noarch).
We'll omit the metering on RDO Phase2 to make the job pass again but this is still a bug.
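
When comparing builds like the ones quoted in this thread, the snapshot timestamp and short commit hash can be read straight from the package name. A minimal Python sketch follows. It assumes the release field has the form 0.<YYYYmmddHHMMSS>.<short hash>, as in the two puppet-tripleo names quoted earlier; SNAPSHOT and snapshot_info are hypothetical names.

```python
import re
from datetime import datetime

# Assumption: the release field looks like "0.<YYYYmmddHHMMSS>.<short hash>",
# as in the package names quoted in this report.
SNAPSHOT = re.compile(r"-0\.(?P<stamp>\d{14})\.(?P<commit>[0-9a-f]{7,})\.")


def snapshot_info(nvr):
    """Return (build_datetime, short_commit) parsed from a package name."""
    match = SNAPSHOT.search(nvr)
    if match is None:
        raise ValueError("no snapshot release found in %r" % nvr)
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    return stamp, match.group("commit")


# Package names copied from the earlier comments in this report.
first_failure = "puppet-tripleo-7.1.1-0.20170615141730.5e91493.el7.centos.noarch"
working = "puppet-tripleo-7.1.1-0.20170619152413.43422d4.el7.centos.noarch"

# True when the first failing build predates the build reported as working.
print(snapshot_info(first_failure)[0] < snapshot_info(working)[0])
```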

Changed in tripleo:
status: Incomplete → In Progress
Revision history for this message
Pradeep Kilambi (pkilambi) wrote :

From my local testing, this was happening only with the swift backend because gnocchi + swift generated tons of events through the swift ceilometer middleware. That brought rabbit down and in turn the proxy server, which returned 503s and made the upgrade fail. The potential fix for this is:

https://review.openstack.org/#/c/476930/

Let's see how things work out once this merges.
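
To see whether a given node is affected, one can check if the ceilometer filter is present in the swift proxy pipeline. A minimal Python sketch follows. It assumes the usual paste-deploy layout of proxy-server.conf, with a [pipeline:main] section whose pipeline option lists the middleware names, and that the filter is named ceilometer; pipeline_filters and has_ceilometer are hypothetical helpers.

```python
import configparser


def pipeline_filters(conf_text):
    """Return the ordered middleware names from the [pipeline:main] section."""
    # interpolation=None keeps literal '%' characters in values intact.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.get("pipeline:main", "pipeline").split()


def has_ceilometer(conf_text):
    """Return True if a filter named 'ceilometer' is in the proxy pipeline."""
    return "ceilometer" in pipeline_filters(conf_text)


# Illustrative config fragment, not copied from a real deployment.
sample = """
[DEFAULT]
bind_port = 8080

[pipeline:main]
pipeline = catch_errors healthcheck cache authtoken keystone ceilometer proxy-server
"""

print(has_ceilometer(sample))
```

On a controller this would be fed the contents of the proxy config file (commonly /etc/swift/proxy-server.conf, though the path is an assumption here), for example has_ceilometer(open(path).read()).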

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/476930
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=142b5a28896d788a7112ae8bd2885e6c7dfcc832
Submitter: Jenkins
Branch: master

commit 142b5a28896d788a7112ae8bd2885e6c7dfcc832
Author: Pradeep Kilambi <email address hidden>
Date: Fri Jun 23 10:37:24 2017 -0400

    Disable swift middleware ceilometer pipeline by default

    This generates tons of unnecessary events when gnocchi uses swift backend.
    We end up filtering most of these anyway. So lets disable this so it
    doesn't put useless load. Also changing the default project to service as
    thats what gnocchi uses to authenticate with swift.

    Closes-bug: #1693339

    Change-Id: I40f47d46fdb06f31a739b590bf653bca71e33f61

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/478953

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ocata)

Reviewed: https://review.openstack.org/478953
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f762bbc3610cad472b9e10cac3609818384ed520
Submitter: Jenkins
Branch: stable/ocata

commit f762bbc3610cad472b9e10cac3609818384ed520
Author: Pradeep Kilambi <email address hidden>
Date: Fri Jun 23 10:37:24 2017 -0400

    Disable swift middleware ceilometer pipeline by default

    This generates tons of unnecessary events when gnocchi uses swift backend.
    We end up filtering most of these anyway. So lets disable this so it
    doesn't put useless load. Also changing the default project to service as
    thats what gnocchi uses to authenticate with swift.

    Closes-bug: #1693339

    Change-Id: I40f47d46fdb06f31a739b590bf653bca71e33f61
    (cherry picked from commit 142b5a28896d788a7112ae8bd2885e6c7dfcc832)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.2.0

This issue was fixed in the openstack/tripleo-heat-templates 6.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/newton)

Reviewed: https://review.openstack.org/486684
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=89f90c0fd7dfc3c24232693253d6f00d17c8ba3d
Submitter: Jenkins
Branch: stable/newton

commit 89f90c0fd7dfc3c24232693253d6f00d17c8ba3d
Author: Pradeep Kilambi <email address hidden>
Date: Mon Jul 24 12:13:58 2017 -0400

    Remove ceilometer from swift middleware pipeline

    We removed this in newer versions conditionally[1]. But since
    newton heat templates dont support if conditionals lets just
    remove this. Ceilometer in swift pipeline causes heavy load
    with events and causes everything to slow down. There is no
    real benefit to keeping this in pipeline.

    [1] https://review.openstack.org/#/c/476930/

    Closes-bug: #1693339

    Change-Id: I07676b80b0eae4d6482aa819cc393c1fc69350c9

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 5.3.1

This issue was fixed in the openstack/tripleo-heat-templates 5.3.1 release.
