Grenade test times out with "Failure in upgrade-ceilometer" (from grenade.sh.txt)

Bug #1334548 reported by Kashyap Chamarthy
This bug affects 1 person
Affects     Status   Importance  Assigned to  Milestone
Ceilometer  Invalid  Undecided   Unassigned

Bug Description

The 'check-grenade-dsvm' test job fails[1] with "Failure in upgrade-ceilometer"; grenade.sh.txt also says "ceilometer-api didn't seem to start".

There's a related bug:

  https://bugs.launchpad.net/ceilometer/+bug/1328948 -- ceilometer-api sometimes fails to start on the new side of grenade.

in which grenade.sh.txt definitively states: "ceilometer-api did not start".

I'm filing this new bug as I'm unsure whether 1328948 is the root cause of the upgrade failure.

And here's[2] a logstash query covering the last 48h for "Failure in upgrade-ceilometer".

Contextual snippet from grenade.sh.txt:
-----------------
[. . .]
2014-06-26 05:30:11.328 | + echo 'Running: cd ; ceilometer-api -d -v --log-dir=/var/log/ceilometer-api --config-file /etc/ceilometer/ceilometer.conf & echo $! >/opt/stack/status/stack/ceilometer-api.pid; fg || echo "ceilometer-api failed to start" | tee "/opt/stack/status/stack/ceilometer-api.failure"
'
2014-06-26 05:30:11.328 | Running: cd ; ceilometer-api -d -v --log-dir=/var/log/ceilometer-api --config-file /etc/ceilometer/ceilometer.conf & echo $! >/opt/stack/status/stack/ceilometer-api.pid; fg || echo "ceilometer-api failed to start" | tee "/opt/stack/status/stack/ceilometer-api.failure"
2014-06-26 05:30:11.328 | + screen -S stack -p ceilometer-api -X stuff 'cd ; ceilometer-api -d -v --log-dir=/var/log/ceilometer-api --config-file /etc/ceilometer/ceilometer.conf & echo $! >/opt/stack/status/stack/ceilometer-api.pid; fg || echo "ceilometer-api failed to start" | tee "/opt/stack/status/stack/ceilometer-api.failure"
'
2014-06-26 05:30:11.332 | + _is_running_in_screen ceilometer-api
2014-06-26 05:30:11.332 | + local service=ceilometer-api
2014-06-26 05:30:11.332 | + local screen_name=stack
2014-06-26 05:30:11.332 | + local status_dir=/opt/stack/status
2014-06-26 05:30:11.332 | + local service_dir=/opt/stack/status/stack
2014-06-26 05:30:11.332 | + local pid=/opt/stack/status/stack/ceilometer-api.pid
2014-06-26 05:30:11.332 | + local failure=/opt/stack/status/stack/ceilometer-api.failure
2014-06-26 05:30:11.332 | + [[ ! -e /opt/stack/status/stack/ceilometer-api.pid ]]
2014-06-26 05:30:11.332 | + [[ ! -e /opt/stack/status/stack/ceilometer-api.failure ]]
2014-06-26 05:30:11.333 | + echo 'Warning: neither /opt/stack/status/stack/ceilometer-api.pid nor /opt/stack/status/stack/ceilometer-api.failure exist, ceilometer-api didn'\''t seem to start'
2014-06-26 05:30:11.333 | Warning: neither /opt/stack/status/stack/ceilometer-api.pid nor /opt/stack/status/stack/ceilometer-api.failure exist, ceilometer-api didn't seem to start
2014-06-26 05:30:11.333 | + return 1
2014-06-26 05:30:11.333 | + screen_tries=10
2014-06-26 05:30:11.333 | + echo 'Failed to start service after 10 attempt(s), retrying'
2014-06-26 05:30:11.333 | Failed to start service after 10 attempt(s), retrying
2014-06-26 05:30:11.333 | + [[ 10 -eq 10 ]]
2014-06-26 05:30:11.333 | + echo 'Too many retries, giving up'
2014-06-26 05:30:11.333 | Too many retries, giving up
2014-06-26 05:30:11.333 | + exit 1
2014-06-26 05:30:11.334 | + die 325 'Failure in upgrade-ceilometer'
2014-06-26 05:30:11.334 | + local exitcode=1
[. . .]
-----------------
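
The check that produces the warning in the trace can be sketched roughly as follows. This is a minimal reconstruction based only on the trace output, not devstack's actual source: the function name `check_service_started`, its second argument, and the return codes for the branches the trace does not show (failure file present, pid file present) are assumptions made for illustration.

```shell
# Hypothetical helper modelled on the trace; not the real _is_running_in_screen.
check_service_started() {
    local service="$1"
    # Assumed default, taken from the paths shown in the trace.
    local service_dir="${2:-/opt/stack/status/stack}"
    local pid="$service_dir/$service.pid"
    local failure="$service_dir/$service.failure"

    # Branch seen in the trace: neither marker file was ever written.
    if [ ! -e "$pid" ] && [ ! -e "$failure" ]; then
        echo "Warning: neither $pid nor $failure exist, $service didn't seem to start"
        return 1
    fi

    # Assumed branch: the service wrote its failure marker.
    if [ -e "$failure" ]; then
        echo "$service failed to start"
        return 2
    fi

    # Assumed branch: a pid file exists and no failure was recorded.
    return 0
}
```

In the failing run, both `[[ ! -e ... ]]` tests succeed, so the first branch is taken. That suggests the command stuffed into the screen window never ran at all, rather than ceilometer-api starting and then crashing.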

[1] http://logs.openstack.org/90/102490/2/check/check-grenade-dsvm/20b13c7/logs/grenade.sh.txt
[2] http://logstash.openstack.org/#eyJzZWFyY2giOiJcIkZhaWx1cmUgaW4gdXBncmFkZS1jZWlsb21ldGVyXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjE3MjgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MDM3Njg2MzYyMDl9

gordon chung (chungg)
Changed in ceilometer:
status: New → Triaged
importance: Undecided → High
gordon chung (chungg) wrote :

do we know the amount of data in the database? and what the timeout is? the migration scripts from icehouse to juno are non-trivial and deal with multiple tables.

Chris Dent (cdent) wrote :

According to the logs, ceilometer-api didn't even bother to try to start up. The screen commands were called, but then nothing happened (no log artifacts at all). This is one of the reasons for the new USE_SCREEN=False stuff: start processes up without screen so there are fewer intermediaries between a service and its logs and startup.

That was turned on over the weekend so it may be that this problem will be more clear.

In any case it is unlikely this has anything to do with the amount of data and is just an artifact of the crufty startup environment. We should know in a few more days if things are better (there were no logstash hits from the weekend, but that's not a fair test).
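
To illustrate the idea of having fewer intermediaries, here is a minimal sketch of launching a service directly in the background instead of stuffing a command into a screen window. It is not what devstack's USE_SCREEN=False code does internally; the function name `run_service_direct` and its arguments are hypothetical.

```shell
# Hypothetical launcher: start a command directly, with its output going
# straight to a log file and its pid recorded by the launching shell.
run_service_direct() {
    local name="$1"
    local status_dir="$2"
    local log_dir="$3"
    shift 3

    mkdir -p "$status_dir" "$log_dir"

    # No screen session in between: stdout and stderr land in the log file.
    "$@" >"$log_dir/$name.log" 2>&1 &

    # $! is the pid of the background job just started.
    echo $! >"$status_dir/$name.pid"
}
```

With this shape the pid file is written by the launcher itself, so a missing pid file can only mean the launcher did not run, and any startup error from the service ends up in its log.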

Chris Dent (cdent) wrote :

This doesn't appear to be happening any more (the one hit in logstash is because of a bug I introduced (and since fixed) in devstack).

Can we close this bug so it isn't in view?

Kashyap Chamarthy (kashyapc) wrote :

Chris: FWIW, sure, this can be closed given your rationale in comments #2 and #3. And maybe you can add a pointer to the fixed bug you mentioned in comment #3 for later reference.

gordon chung (chungg) wrote :

see above

Changed in ceilometer:
importance: High → Undecided
status: Triaged → Invalid