Deployment report error on uploading TestVm - Ceph cluster creating.

Bug #1461522 reported by Lubosz Kosnik
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: Undecided
Assigned to: Matthew Mosesohn

Bug Description

While deploying a new environment, Astute raises an error:

2015-06-03 09:08:18 ERR [433] c6e51be0-d59d-47bd-8301-0b28cdebeef7: Upload cirros "TestVM" image failed
2015-06-03 09:08:18 ERR [433] c6e51be0-d59d-47bd-8301-0b28cdebeef7: cmd: . /root/openrc && /usr/bin/glance image-create --name 'TestVM' --is-public true --container-format='bare' --disk-format='qcow2' --min-ram=64 --property murano_image_info='{"title": "Murano Demo", "type": "cirros.demo"}' --file '/usr/share/cirros-testvm/cirros-x86_64-disk.img'
                                               mcollective error: c6e51be0-d59d-47bd-8301-0b28cdebeef7: MCollective agents '157' didn't respond within the allotted time.
2015-06-03 09:08:18 ERR [433] MCollective agents '157' didn't respond within the allotted time.

All nodes have status ready.
After SSHing to a node and running ceph -w, all PGs are shown in status creating.

Solution:
Astute should wait for Ceph to report HEALTH_OK.
After a few minutes, once the Ceph PGs report status active, deploying changes from Fuel completes the whole process successfully.
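
The proposed wait could look roughly like the sketch below. It is a minimal Python illustration, not Astute's actual code: it polls `ceph health` and treats the first word of the output as the status. The function names, the 600-second timeout, and the 10-second interval are assumptions made for this example.

```python
import subprocess
import time


def ceph_health(run=subprocess.check_output):
    """Return the first word printed by `ceph health`, e.g. HEALTH_OK."""
    out = run(["ceph", "health"])
    if isinstance(out, bytes):
        out = out.decode()
    words = out.split()
    return words[0] if words else ""


def wait_for_health_ok(timeout=600, interval=10, run=subprocess.check_output,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until Ceph reports HEALTH_OK; return False once the timeout expires.

    timeout and interval are hypothetical defaults, in seconds.
    """
    deadline = clock() + timeout
    while True:
        if ceph_health(run) == "HEALTH_OK":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A deployment step would call `wait_for_health_ok()` before the image upload and fail early, with a clear message, if it returns False.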

Nastya Urlapova (aurlapova) wrote :

Lubosz, could you attach a diagnostic snapshot and the Fuel version?

Lubosz Kosnik (diltram) wrote :

I can't generate a diagnostic snapshot: I'm waiting, but it isn't producing any result.
We're currently using Fuel 6.0.1 with a bootstrap prepared to support Dell H330/H730 RAID controllers.

Matthew Mosesohn (raytrac3r) wrote :

Lubosz, please check time synchronization between your hosts. Ceph and RabbitMQ are very sensitive to desynchronized clocks.

Changed in fuel:
milestone: none → 6.1
assignee: nobody → Matthew Mosesohn (raytrac3r)
status: New → Incomplete
Lubosz Kosnik (diltram) wrote :

Time synchronization isn't the problem. The Ceph cluster was in the creating state while Fuel was trying to save the image in Glance.
After the status changed from creating to active and the deployment was rerun from Fuel, everything worked great and the deployment completed successfully.
Astute should wait for some PGs to reach status active; there is no need to wait for all PGs, because only a few are needed to save the image properly.

All servers have time synchronization set up properly.
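
The "wait for some PGs" idea can be sketched as a threshold check on the PG state summary. The Python below is only an illustration: it assumes a summary line shaped like `192 pgs: 64 creating, 128 active+clean; ...`, and both the function names and the `min_fraction` threshold are hypothetical. It does not settle whether a partial set of active PGs is actually enough for the upload.

```python
import re


def parse_pg_states(summary):
    """Parse a line such as '192 pgs: 64 creating, 128 active+clean; ...'.

    Returns (total, {state: count}). The input shape is an assumption.
    """
    match = re.search(r"(\d+)\s+pgs:\s*([^;]*)", summary)
    if not match:
        raise ValueError("unrecognized PG summary: %r" % summary)
    total = int(match.group(1))
    states = {}
    for part in match.group(2).split(","):
        part = part.strip()
        if part:
            count, state = part.split(None, 1)
            states[state] = int(count)
    return total, states


def active_fraction(summary):
    """Fraction of PGs whose state includes 'active' (e.g. active+clean)."""
    total, states = parse_pg_states(summary)
    active = sum(n for state, n in states.items() if "active" in state.split("+"))
    return active / total if total else 0.0


def enough_pgs_active(summary, min_fraction=0.5):
    """True once at least min_fraction of PGs are active (hypothetical threshold)."""
    return active_fraction(summary) >= min_fraction
```

Setting `min_fraction=1.0` turns the same check into "wait for all PGs".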

Changed in fuel:
status: Incomplete → Confirmed
Ryan Moe (rmoe) wrote :

This has been addressed in 6.1. We now wait for all PGs to be active before trying to upload the cirros image.

https://review.openstack.org/#/c/153338/
https://review.openstack.org/#/c/163019/

See this bug for context: https://bugs.launchpad.net/fuel/+bug/1415954
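
For readers who do not want to open the reviews, the ordering described above (check that every PG is active, and only then upload) can be sketched as below. This is not the code from the linked changes; the helper names, the retry count, and the 10-second pause are assumptions for illustration. The upload command is the one from the log in the bug description.

```python
import subprocess
import time

# Upload command copied from the Astute log in the bug description.
UPLOAD_CMD = (
    ". /root/openrc && /usr/bin/glance image-create --name 'TestVM' "
    "--is-public true --container-format='bare' --disk-format='qcow2' "
    "--min-ram=64 --property murano_image_info="
    "'{\"title\": \"Murano Demo\", \"type\": \"cirros.demo\"}' "
    "--file '/usr/share/cirros-testvm/cirros-x86_64-disk.img'"
)


def all_pgs_active(states):
    """states maps a PG state string (e.g. 'active+clean') to a PG count."""
    total = sum(states.values())
    active = sum(n for state, n in states.items() if "active" in state.split("+"))
    return total > 0 and active == total


def upload_when_ready(get_states, upload, attempts=30,
                      wait=lambda: time.sleep(10)):
    """Call upload() only after get_states() shows every PG active.

    get_states and upload are caller-supplied callables (hypothetical hooks).
    """
    for _ in range(attempts):
        if all_pgs_active(get_states()):
            return upload()
        wait()
    raise RuntimeError("PGs did not all become active; upload skipped")


def run_upload():
    """Run the upload through a shell so that '. /root/openrc' is sourced."""
    return subprocess.call(UPLOAD_CMD, shell=True)
```

On a controller node this would be invoked as `upload_when_ready(get_states, run_upload)`, where `get_states` is whatever function collects the per-state PG counts.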
