Problems with images bubble up as a simple "There are not enough hosts available"

Bug #1436166 reported by Julian Edwards
This bug affects 1 person
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   High         Chris Dent
Liberty                    Fix Released   High         Chris Dent

Bug Description

When starting a new instance, I received the generic "There are not enough hosts available" error, but the real reason was buried in logs, which was that the image I was trying to use was corrupt.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Which version of OpenStack and Nova? Can you please share some logs from both the nova-api and the nova-compute components?

Changed in nova:
status: New → Incomplete
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Sorry, I left some info out. This was done in devstack after I ran stack.sh.

n-cond.log shows:

RescheduledException: Build of instance fd530fd4-e4e6-476a-aa2d-5f4ad6b02cbb was re-scheduled: Unexpected error while running command.\nCommand: qemu-img convert -O raw /opt/stack/data/nova/instances/_base/89faaaa5ec41988d9b775c2fdc678fb4a4b974d9.part /opt/stack/data/nova/instances/_base/89faaaa5ec41988d9b775c2fdc678fb4a4b974d9.converted\nExit code: 1\nStdout: u''\nStderr: u'qemu-img: error while reading sector 702336: Input/output error\\n'

Changed in nova:
status: Incomplete → New
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Julian,

Can you please see if you can find the original traceback for "qemu-img: error while reading sector 702336: Input/output error" in the nova-compute log?

thanks,
dims

Changed in nova:
status: New → Incomplete
Revision history for this message
Julian Edwards (julian-edwards) wrote :

This is all I have:

[req-79b024be-ebb6-473e-a832-bb7f1fd58672 demo invisible_to_admin] [instance: fd530fd4-e4e6-476a-aa2d-5f4ad6b02cbb] Error from last host: devstack (node devstack): [u'Traceback (most recent call last):\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2175, in _do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 2318, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance fd530fd4-e4e6-476a-aa2d-5f4ad6b02cbb was re-scheduled: Unexpected error while running command.\nCommand: qemu-img convert -O raw /opt/stack/data/nova/instances/_base/89faaaa5ec41988d9b775c2fdc678fb4a4b974d9.part /opt/stack/data/nova/instances/_base/89faaaa5ec41988d9b775c2fdc678fb4a4b974d9.converted\nExit code: 1\nStdout: u''\nStderr: u'qemu-img: error while reading sector 702336: Input/output error\\n'\n"]

I'd like to help fix this, if you wouldn't mind mentoring me?

Changed in nova:
status: Incomplete → New
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Julian,

If you look at line 2318 mentioned in the traceback, you will see we are catching an uncategorized exception:
http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n2318

if you grep for "Unexpected error" you will see it comes from:
http://git.openstack.org/cgit/openstack/oslo.concurrency/tree/oslo_concurrency/processutils.py#n69

So we should be catching processutils.ProcessExecutionError just like we catch exception.FlavorDiskTooSmall earlier in manager.py

you may end up with something like this...

http://paste.openstack.org/show/197374/

-- dims
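
A minimal sketch of the idea in the comment above, not the actual nova code: the three exception classes are local stand-ins for the real ones, and `build_and_run_instance`, `corrupt_image_spawn`, and `outcome` are hypothetical names used only for illustration. It shows why a categorized `except` clause lets the real qemu-img error reach the user instead of a reschedule.

```python
class ProcessExecutionError(Exception):
    """Stand-in for oslo_concurrency.processutils.ProcessExecutionError."""


class RescheduledException(Exception):
    """Stand-in: the build is handed back to the scheduler for a retry."""


class BuildAbortException(Exception):
    """Stand-in: the build is stopped and the reason is reported."""


def build_and_run_instance(spawn):
    """Hypothetical, heavily simplified shape of the build error handling."""
    try:
        spawn()
    except ProcessExecutionError as exc:
        # A failed external command (for example qemu-img on a corrupt
        # image) is likely to fail on any host, so abort with the real
        # reason instead of retrying elsewhere.
        raise BuildAbortException("build aborted: %s" % exc)
    except Exception as exc:
        # Anything uncategorized is rescheduled; if no other host is left,
        # the user only sees the generic scheduler error.
        raise RescheduledException("build re-scheduled: %s" % exc)


def corrupt_image_spawn():
    # Simulates the failure reported in this bug.
    raise ProcessExecutionError(
        "qemu-img: error while reading sector 702336: Input/output error")


def outcome(spawn):
    """Return the name and message of the exception the build step raised."""
    try:
        build_and_run_instance(spawn)
    except (BuildAbortException, RescheduledException) as exc:
        return type(exc).__name__, str(exc)
    return "ok", ""
```

With the dedicated clause in place, `outcome(corrupt_image_spawn)` reports a `BuildAbortException` whose message still carries the qemu-img text; removing that clause makes the same failure fall through to the reschedule path.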

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/169436

Changed in nova:
assignee: nobody → Davanum Srinivas (DIMS) (dims-v)
status: Confirmed → In Progress
Revision history for this message
Julian Edwards (julian-edwards) wrote :

Damn, my email reply to the bug got lost. Anyway, I was saying thanks and that I'd try to do a review, but you beat me to it. I +1ed your review, as it tests out OK.

Thanks.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Julian,

Thanks. A related bug, https://bugs.launchpad.net/nova/+bug/1431291, was bumped up in priority, and since we are short on time, I ended up filing this one. Please pick another bug and ping me; I'll try to help get you going.

-- dims

Changed in nova:
importance: Low → High
Revision history for this message
Julian Edwards (julian-edwards) wrote : Re: [Bug 1436166] Re: Problems with images bubble up as a simple "There are not enough hosts available"

No worries, understood!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Davanum Srinivas (dims) (<email address hidden>) on branch: master
Review: https://review.openstack.org/169436

Changed in nova:
assignee: Davanum Srinivas (DIMS) (dims-v) → nobody
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup

Solving an inconsistency: The bug is 'In Progress' but without an assignee. I set the status back to the last known status before the change to 'In Progress'.

Feel free to assign the bug to yourself. If you do so, please set it to 'In Progress'.

Changed in nova:
status: In Progress → Confirmed
Changed in nova:
assignee: nobody → Zhenyu Zheng (zhengzhenyu)
Changed in nova:
assignee: Zhenyu Zheng (zhengzhenyu) → nobody
tags: added: error-messages scheduler spawn
Changed in nova:
assignee: nobody → Stephen Finucane (sfinucan)
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

I'm not able to figure out how to boot a corrupt image :S so unassigning myself

Changed in nova:
assignee: Stephen Finucane (sfinucan) → nobody
Chris Dent (cdent)
Changed in nova:
assignee: nobody → Chris Dent (cdent)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/264349

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/264349
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9a4ecfd96dad32fd4726c46dc6d89e956f1f2a29
Submitter: Jenkins
Branch: master

commit 9a4ecfd96dad32fd4726c46dc6d89e956f1f2a29
Author: Chris Dent <email address hidden>
Date: Mon Jan 11 12:05:00 2016 +0000

    Propagate qemu-img errors to compute manager

    When qemu-img is called with oslo_concurrency.process_utils.execute
    the ProcessExecutionError was raised when qemu-img either fails to
    execute or has a non-zero exit code. This error did not propagate
    up to the compute manager with any meaningful information meaning
    that if an instance build fails the error message is the generic
    "There are not enough hosts available".

    This change captures ProcessExecutionError and re-raises the
    exception as either InvalidDiskInfo (in qemu_img_info) or
    ImageUnacceptable (in convert_image and fetch_to_raw) and makes the
    manager accept this as a cause for a BuildAbortException on the
    logic that if the image is bad, things are dire, let's bail.

    Based on the code in qemu_img_info it appears there was a
    misunderstanding of how process_utils.execute behaves so it seems
    likely this problem is present elsewhere in the code. This change
    attempts to only address the issue as it shows up on the new
    instance path described in the related bug.

    Change-Id: I4fa1c258db58c70dfbf0178b7bb13978fda3a11f
    Closes-Bug: #1436166
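
The translation step the commit message describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged change: it uses the standard library `subprocess` module in place of oslo.concurrency, `ImageUnacceptable` here is a local stand-in class, and `run_image_command` and `FAILING_TOOL` are hypothetical names. The failing tool is simulated with the Python interpreter so the example runs without qemu-img installed.

```python
import subprocess
import sys


class ImageUnacceptable(Exception):
    """Local stand-in for a domain-level 'bad image' exception."""

    def __init__(self, image_id, reason):
        super().__init__("Image %s is unacceptable: %s" % (image_id, reason))
        self.image_id = image_id
        self.reason = reason


def run_image_command(image_id, cmd):
    """Run an external image tool and translate failures for the caller."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, check=True)
    except OSError as exc:
        # The tool could not be executed at all (missing, not executable).
        raise ImageUnacceptable(
            image_id, "unable to run %s: %s" % (cmd[0], exc))
    except subprocess.CalledProcessError as exc:
        # The tool ran but exited non-zero; keep its stderr as the reason
        # so the caller can report something more useful than a retry.
        raise ImageUnacceptable(
            image_id,
            "exit code %d: %s" % (exc.returncode, exc.stderr.strip()))
    return result.stdout


# Simulated failing tool: writes a qemu-img style message and exits with 1.
FAILING_TOOL = [
    sys.executable, "-c",
    "import sys; "
    "sys.stderr.write('error while reading sector 702336: "
    "Input/output error\\n'); "
    "sys.exit(1)",
]
```

A caller that catches `ImageUnacceptable` from `run_image_command("img-1", FAILING_TOOL)` gets a `reason` containing both the exit code and the tool's stderr, which is the information that was previously buried in the logs.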

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
tags: added: liberty-backport-potential
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0b2

This issue was fixed in the openstack/nova 13.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/288594

Matt Riedemann (mriedem)
tags: removed: liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/liberty)

Reviewed: https://review.openstack.org/288594
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b806adb032567c2c84d61834e6f8d2684723c194
Submitter: Jenkins
Branch: stable/liberty

commit b806adb032567c2c84d61834e6f8d2684723c194
Author: Chris Dent <email address hidden>
Date: Mon Jan 11 12:05:00 2016 +0000

    Propagate qemu-img errors to compute manager

    When qemu-img is called with oslo_concurrency.process_utils.execute
    the ProcessExecutionError was raised when qemu-img either fails to
    execute or has a non-zero exit code. This error did not propagate
    up to the compute manager with any meaningful information meaning
    that if an instance build fails the error message is the generic
    "There are not enough hosts available".

    This change captures ProcessExecutionError and re-raises the
    exception as either InvalidDiskInfo (in qemu_img_info) or
    ImageUnacceptable (in convert_image and fetch_to_raw) and makes the
    manager accept this as a cause for a BuildAbortException on the
    logic that if the image is bad, things are dire, let's bail.

    Based on the code in qemu_img_info it appears there was a
    misunderstanding of how process_utils.execute behaves so it seems
    likely this problem is present elsewhere in the code. This change
    attempts to only address the issue as it shows up on the new
    instance path described in the related bug.

    Conflicts:
     nova/virt/images.py

    Change-Id: I4fa1c258db58c70dfbf0178b7bb13978fda3a11f
    Closes-Bug: #1436166
    (cherry picked from commit 9a4ecfd96dad32fd4726c46dc6d89e956f1f2a29)

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 12.0.2

This issue was fixed in the openstack/nova 12.0.2 release.
