qemu image convert fails in snapshot

Bug #1303802 reported by Sean Dague
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Periodically in the gate we see a failure by qemu image convert in snapshot:

2014-04-07 01:31:29.470 29554 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/openstack/common/processutils.py", line 193, in execute
2014-04-07 01:31:29.470 29554 TRACE oslo.messaging.rpc.dispatcher cmd=' '.join(cmd))
2014-04-07 01:31:29.470 29554 TRACE oslo.messaging.rpc.dispatcher ProcessExecutionError: Unexpected error while running command.
2014-04-07 01:31:29.470 29554 TRACE oslo.messaging.rpc.dispatcher Command: qemu-img convert -f qcow2 -O qcow2 /opt/stack/data/nova/instances/4ff6dc10-eac8-41d2-a645-3a0e0ba07c8a/disk /opt/stack/data/nova/instances/snapshots/tmpcVpCxJ/33eb0bb2b49648c69770b47db3211a86
2014-04-07 01:31:29.470 29554 TRACE oslo.messaging.rpc.dispatcher Exit code: 1
2014-04-07 01:31:29.470 29554 TRACE oslo.messaging.rpc.dispatcher Stdout: ''
2014-04-07 01:31:29.470 29554 TRACE oslo.messaging.rpc.dispatcher Stderr: 'qemu-img: error while reading sector 0: Input/output error\n'

qemu-img is very opaque about what the actual issue is, so it's unclear whether this is a corrupt disk or a totally missing disk.

The user-visible face of this is on operations like shelve, where the instance still believes it is in the active state even though everything is broken: http://logs.openstack.org/02/85602/1/gate/gate-tempest-dsvm-full/20ed964/console.html#_2014-04-07_01_44_29_309

Logstash query: http://logstash.openstack.org/#eyJzZWFyY2giOiJcInFlbXUtaW1nOiBlcnJvclwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzk2ODc2MTQ4NDc3fQ==

Tags: libvirt
Sean Dague (sdague) wrote :

Discussing with danpb: qemu image convert is notoriously terrible at error messages, so I don't know that we can really figure out the root cause easily.

Changed in nova:
importance: Undecided → Medium
Tracy Jones (tjones-i)
tags: added: libvirt
Kashyap Chamarthy (kashyapc) wrote :

Just a related note, maybe it's worthwhile to add a 'qemu-img check' command to report any inconsistencies in the disk image.

From the `qemu-img` man page, on the 'check' subcommand:

       [. . .]
       check [-f fmt] [--output=ofmt] [-r [leaks | all]] filename
           Perform a consistency check on the disk image filename. The command can output in the format ofmt which is either
           "human" or "json".

           If "-r" is specified, qemu-img tries to repair any inconsistencies found during the check. "-r leaks" repairs only cluster leaks,
           whereas "-r all" fixes all kinds of errors, with a higher risk of choosing the wrong fix or hiding corruption that has already
           occurred.
       [. . .]
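
As a rough sketch of what such a pre-check could look like from Python: this is not nova code, `check_image` and its `runner` parameter are hypothetical names, and the JSON keys (`check-errors`, `corruptions`, `leaks`) are the ones I would expect from `qemu-img check --output=json`, so treat them as assumptions to verify against your qemu version.

```python
import json
import subprocess


def check_image(path, fmt="qcow2", runner=subprocess.run):
    """Run 'qemu-img check' on path and return a small summary dict.

    'runner' is a hypothetical injection point so the function can be
    exercised without qemu-img installed.
    """
    cmd = ["qemu-img", "check", "-f", fmt, "--output=json", path]
    proc = runner(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                  universal_newlines=True)
    try:
        # The report may be empty if qemu-img could not even open the image.
        report = json.loads(proc.stdout) if proc.stdout.strip() else {}
    except ValueError:
        report = {}
    return {
        "returncode": proc.returncode,
        # Key names below are assumed, not taken from this bug report.
        "check_errors": report.get("check-errors", 0),
        "corruptions": report.get("corruptions", 0),
        "leaks": report.get("leaks", 0),
        "stderr": proc.stderr.strip(),
    }
```

Logging this summary just before the `qemu-img convert` call would at least tell us whether the image was already inconsistent. Note that the sketch deliberately does not pass "-r", given the man page's warning about repairs hiding corruption.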

Fam Zheng (famz) wrote :

This is likely something wrong below qemu-img. Is the source image even accessible with other tools, such as 'dd' or 'cat'?
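
For anyone who wants to script that accessibility test, here is a minimal sketch that bypasses qemu-img and reads the first 512 bytes of the file directly, roughly what a `dd` with `count=1` would do. The function name `probe_first_sector` is made up for illustration, and it only tells you about the host file, not about the qcow2 contents.

```python
import errno


def probe_first_sector(path, sector_size=512):
    """Try to read the first sector of the file at path.

    Returns (ok, detail); detail is a short description of the outcome.
    """
    try:
        with open(path, "rb") as f:
            data = f.read(sector_size)
    except FileNotFoundError:
        # The "totally missing disk" case.
        return False, "missing"
    except OSError as exc:
        if exc.errno == errno.EIO:
            # Same errno that qemu-img reported as "Input/output error".
            return False, "I/O error"
        return False, str(exc)
    if len(data) < sector_size:
        # File exists but is truncated or empty.
        return False, "short read (%d bytes)" % len(data)
    return True, "readable"
```

If this succeeds while `qemu-img convert` still fails at sector 0, the problem is more likely in the image metadata or in whatever the image points at than in the underlying file.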

Kashyap Chamarthy (kashyapc) wrote :

I had a brief chat with Kevin Wolf (QEMU/qcow2 upstream maintainer), posting his comments here:

    - It (the error) means that the source image could be opened successfully, but the very first read failed.
    - Any specific information about the image in question? Like format, size, etc?
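
To gather the format and size details asked for above without relying on qemu-img itself, one option is to read the fixed part of the qcow2 header directly. The sketch below assumes the standard qcow2 layout (big-endian magic, version, backing file offset and size, cluster bits, virtual size in the first 32 bytes); `describe_image` is a hypothetical helper name, not something in nova.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# Big-endian fields: magic, version, backing_file_offset,
# backing_file_size, cluster_bits, virtual size in bytes.
_HEADER = struct.Struct(">4sIQIIQ")


def describe_image(path):
    """Return basic facts about an image file from its first 32 bytes."""
    with open(path, "rb") as f:
        raw = f.read(_HEADER.size)
    if len(raw) < _HEADER.size or raw[:4] != QCOW2_MAGIC:
        # Could be raw, truncated, or another format entirely.
        return {"format": "unknown (not qcow2?)", "header_bytes": len(raw)}
    _magic, version, backing_off, _backing_len, cluster_bits, size = \
        _HEADER.unpack(raw)
    return {
        "format": "qcow2",
        "version": version,
        "has_backing_file": backing_off != 0,
        "cluster_size": 1 << cluster_bits,
        "virtual_size": size,
    }
```

The `has_backing_file` field may be worth attaching to a report like this one, since a read of the image can end up being served from a backing file rather than from the overlay itself.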

Solly Ross (sross-7)
Changed in nova:
status: New → Incomplete
Kashyap Chamarthy (kashyapc) wrote :
Jordan Pittier (jordan-pittier) wrote :
Kairat Kushaev (kkushaev) wrote :

Hello guys!
We hit the same issue today in the gate:
2015-02-19 11:26:02.215 5467 ERROR nova.compute.manager [-] [instance: 11e30652-70a9-480a-9b16-9026be2e3592] Instance failed to spawn
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] Traceback (most recent call last):
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/compute/manager.py", line 2316, in _build_resources
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] yield resources
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/compute/manager.py", line 2186, in _build_and_run_instance
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] flavor=flavor)
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2358, in spawn
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] admin_pass=admin_password)
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2762, in _create_image
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] project_id=instance.project_id)
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 230, in cache
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] *args, **kwargs)
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 477, in create_image
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] prepare_template(target=base, max_size=size, *args, **kwargs)
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 431, in inner
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] return f(*args, **kwargs)
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 220, in fetch_func_sync
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] fetch_func(target=target, *args, **kwargs)
2015-02-19 11:26:02.215 5467 TRACE nova.compute.manager [instance: 11e30652-70a9-480a-9b16-9026be2e3592] File "/opt/stack/new/nova/nova/virt/libvirt/utils.py", line 488, in fetch_image
2015-02-19 11:26:02.215 54...

Changed in nova:
status: Incomplete → New
Richard Jones (rjones-redhat) wrote :

The failing command and error message are:
qemu-img convert -O raw /opt/stack/data/nova/instances/_base/9916110004678060463128e2f14a55ac921f479c.part /opt/stack/data/nova/instances/_base/9916110004678060463128e2f14a55ac921f479c.converted
qemu-img: error while reading sector 941568: Input/output error

This looks to me like the input disk (presumably in qcow2 format) is corrupt.
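
To compare failures across gate runs it can help to turn the message into numbers. A small sketch, assuming the sector in the message is a 512-byte sector and (as far as I know) an offset into the virtual disk rather than into the host file; `parse_read_error` is an illustrative name only.

```python
import re

# Assumption: qemu-img counts in 512-byte sectors.
SECTOR_SIZE = 512
_READ_ERROR = re.compile(r"error while reading sector (\d+): (.+)")


def parse_read_error(stderr):
    """Extract sector, byte offset, and reason from qemu-img stderr.

    Returns None if the text contains no read-error message.
    """
    match = _READ_ERROR.search(stderr)
    if match is None:
        return None
    sector = int(match.group(1))
    return {
        "sector": sector,
        "byte_offset": sector * SECTOR_SIZE,
        "reason": match.group(2).strip(),
    }
```

For the message above, sector 941568 works out to a byte offset of 482082816 (about 460 MiB into the disk), whereas the original report failed at sector 0. A failure partway through the image, rather than on the very first read, is at least consistent with the input being damaged or incomplete.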

Tony Breeds (o-tony) wrote :

I think it's pretty clear that this is confirmed.

Changed in nova:
status: New → Confirmed
Steve Baker (steve-stevebaker) wrote :

Doesn't appear to be Heat-related.

Changed in heat:
status: New → Invalid
Thomas Herve (therve)
no longer affects: heat
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired
Yafei Yu (yu-yafei) wrote :