Gate tests randomly failing qemu-img info calls

Bug #1506012 reported by Daniel Berrange
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Critical
Ihar Hrachyshka

Bug Description

Since yesterday we see a spike in qemu-img info calls failing:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiQ29tbWFuZDogZW52IExDX0FMTD1DIExBTkc9QyBxZW11LWltZyBpbmZvXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0NDQ4MTQ3ODIzMTJ9

eg

 [req-20a22603-59e7-425d-ba06-2a9a5505b5
6b tempest-ServersAdminTestJSON-917741884 tempest-ServersAdminTestJSON-747301663] [instance: bebbcf1a-104f-4d98-a02b-ccd20601890b] Terminating instance
2015-10-13 17:52:40.436 ERROR nova.compute.manager [req-ef37b368-b686-42fd-bbfe-b6c3a69d063f tempest-ServersTestJSON-1340739681 tempest-ServersTestJSON-1992588300] [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] Instance failed to spawn
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] Traceback (most recent call last):
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/compute/manager.py", line 2172, in _build_resources
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] yield resources
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/compute/manager.py", line 2019, in _build_and_run_instance
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] block_device_info=block_device_info)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2437, in spawn
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] admin_pass=admin_password)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2861, in _create_image
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] instance, size, fallback_from_host)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 6282, in _try_fetch_image_cache
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] size=size)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 249, in cache
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] *args, **kwargs)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 567, in create_image
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] copy_qcow2_image(base, self.path, size)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 254, in inner
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] return f(*args, **kwargs)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/libvirt/imagebackend.py", line 529, in copy_qcow2_image
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] libvirt_utils.create_cow_image(base, target)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/libvirt/utils.py", line 87, in create_cow_image
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] base_details = images.qemu_img_info(backing_file)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/virt/images.py", line 68, in qemu_img_info
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] preexec_fn=_qemu_resources)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/opt/stack/new/nova/nova/utils.py", line 390, in execute
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] return processutils.execute(*cmd, **kwargs)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 275, in execute
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] cmd=sanitized_cmd)
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] ProcessExecutionError: Unexpected error while running command.
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] Command: env LC_ALL=C LANG=C qemu-img info /opt/stack/data/nova/instances/_base/17293870eb69e986dd7779e8bb4f42652df5977e
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] Exit code: -11
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] Stdout: u''
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d] Stderr: u''
2015-10-13 17:52:40.436 23002 ERROR nova.compute.manager [instance: 94a3b99d-b8b2-4e2a-a26a-fd7ff6e4f67d]

This spike co-incides with merging of

commit da217205f53f9a38a573fb151898fbbeae41021d
Author: Tristan Cacqueray <email address hidden>
Date: Wed Aug 5 17:17:04 2015 +0000

    virt: Use preexec_fn to ulimit qemu-img info call

    This uses the preexec_fn oslo.concurrency execute parameter to set user limit
    before calling qemu-img info. The process is limitted to 2 seconds of cpu
    and 1GB of virtual memory

    Change-Id: Ib47f1116c94c8f76d2a4a525af24439a4aa15854
    Closes-Bug: #1449062

Changed in nova:
importance: Undecided → Critical
Changed in nova:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
status: New → In Progress
Revision history for this message
Daniel Berrange (berrange) wrote :
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Relevant QEMU and libvirt log snippets (from log files of this test job -- gate-tempest-dsvm-full/33cef79:

From [...]/logs/libvirt/qemu/instance-0000000a.txt.gz:

    "qemu: terminating on signal 15 from pid 32611"

Where "singal 15" corresponds to SIGTERM. And, process '32611' (from ps.txt.gz) is:

    32611 1 32611 0.7 0.1 820416 14868 /usr/sbin/libvirtd -d

From [...]/logs/libvirt/libvirtd.txt.gz:

[...]
2015-10-14 08:56:51.397+0000: 32617: debug : virDomainDestroy:2173 : dom=0x7f33c000b920, (VM: name=instance-00000025, uuid=a7127411-8602-41c9-856c-d704b1c6b5e7)
2015-10-14 08:56:51.397+0000: 32617: debug : qemuProcessKill:4193 : vm=0x7f33ac008f00 name=instance-00000025 pid=470 flags=1
2015-10-14 08:56:51.397+0000: 32617: debug : virProcessKillPainfully:274 : vpid=470 force=1
2015-10-14 08:56:51.416+0000: 32611: error : qemuMonitorIO:656 : internal error: End of file from monitor
2015-10-14 08:56:51.416+0000: 32611: debug : qemuMonitorIO:710 : Error on monitor internal error: End of file from monitor
2015-10-14 08:56:51.416+0000: 32611: debug : qemuMonitorIO:731 : Triggering EOF callback
2015-10-14 08:56:51.416+0000: 32611: debug : qemuProcessHandleMonitorEOF:300 : Received EOF on 0x7f33ac008f00 'instance-00000025'
[...]

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/234696
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=54c83186e8f49a297ca041c8b8f240b3c825e74a
Submitter: Jenkins
Branch: master

commit 54c83186e8f49a297ca041c8b8f240b3c825e74a
Author: Ihar Hrachyshka <email address hidden>
Date: Wed Oct 14 09:54:08 2015 +0000

    Revert "virt: Use preexec_fn to ulimit qemu-img info call"

    This reverts commit da217205f53f9a38a573fb151898fbbeae41021d.

    The patch made nova-compute to segfault, among other things
    resulting in instances not being spawned successfully.

    Closes-Bug: #1506012
    Change-Id: I0065dd194a7c910cc9f9d9b468e5d43bf5c9b7c0

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/nova 13.0.0.0b1

This issue was fixed in the openstack/nova 13.0.0.0b1 development milestone.

Changed in nova:
status: Fix Committed → Fix Released
Changed in nova:
milestone: none → mitaka-1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.