flaky "Getting disk size" ERROR in n-cpu log after successful tempest run

Bug #1243849 reported by David Kranz
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
jichenjc

Bug Description

This happened in 14 out of 156 runs. Here is a sample:

https://review.openstack.org/#/c/51751/9
http://logs.openstack.org/51/51751/9/check/check-tempest-devstack-vm-postgres-full/ab64cc3
2013-10-22 20:48:20.036 | 2013-10-22 20:41:24.129 21399 ERROR nova.virt.libvirt.driver [-] Getting disk size of instance-00000069: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2/disk'

Tags: compute
tags: added: compute
melanie witt (melwitt)
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichencom (jichenjc)
Revision history for this message
jichenjc (jichenjc) wrote :

I would think it's an expected situation, but from http://en.wikipedia.org/wiki/Syslog#Severity_levels
I will modify the ERROR to WARNING because it might lead to some wrong info later

The root cause of this ERROR is the periodic task will update the resource by getting all active instances and calculate their size
however, during the calculation, the instance is resized, so the disk file can't be found

if someone suggest to permanently resolve the problem, we might need to make the resource stat update and nova operations such as resize/delete etc mutual access(use lock), it might block the operations if thousands of instances exist, so catch the error and defer the process which is used currently might be a better solution

see following logs for more detail

</span><span class='DEBUG _2013-10-22_20_41_24_102'><a name='_2013-10-22_20_41_24_102' class='date' href='#_2013-10-22_20_41_24_102'>2013-10-22 20:41:24.102</a> DEBUG nova.openstack.common.processutils [req-2a121325-5913-4add-95a4-350c3d797510 tempest.scenario.manager-tempest-1560801136-user tempest.scenario.manager-tempest-1560801136-tenant] Running cmd (subprocess): qemu-img convert -f qcow2 -O qcow2 /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk_rbase execute /opt/stack/new/nova/nova/openstack/common/processutils.py:147
.......
</span><span class='ERROR _2013-10-22_20_41_24_129'><a name='_2013-10-22_20_41_24_129' class='date' href='#_2013-10-22_20_41_24_129'>2013-10-22 20:41:24.129</a> 21399 ERROR nova.virt.libvirt.driver [-] Getting disk size of instance-00000069: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2/disk'
</span><span class='DEBUG _2013-10-22_20_41_24_140'><a name='_2013-10-22_20_41_24_140' class='date' href='#_2013-10-22_20_41_24_140'>2013-10-22 20:41:24.140</a> DEBUG nova.openstack.common.processutils [req-2a121325-5913-4add-95a4-350c3d797510 tempest.scenario.manager-tempest-1560801136-user tempest.scenario.manager-tempest-1560801136-tenant] Running cmd (subprocess): mv /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk_rbase /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2/disk execute /opt/stack/new/nova/nova/openstack/common/processutils.py:147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/64412

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
David Kranz (david-kranz) wrote :

The same wikipedia entry would indicate this should not be a WARNING either. What is the operator supposed to do with this warning? Computing aggregate disk usage is inherently approximate so I would think that in this case the code would just move on, which it does anyway. Perhaps this message should be DEBUG level.

Revision history for this message
jichenjc (jichenjc) wrote :

I agree with you that the WARNING is not a best solution, however, considering someone might be interested in the log and do future enhancement on this feature, I prefer to give some warning message instead of add debug info...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/64412
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=661fa0c83309f002bbbdd67d81f3e6fbb6bde829
Submitter: Jenkins
Branch: master

commit 661fa0c83309f002bbbdd67d81f3e6fbb6bde829
Author: jichenjc <email address hidden>
Date: Sun Dec 22 18:26:53 2013 +0800

    Change log from ERROR to WARNING when instance absent

    Periodic task will update the host stat so that it will
    collect all active instances and get their disk size
    during the operation period, the instance might be resized
    by nova operations, it will lead to disk file not exist
    it's not an error from current design point of view, so
    change from ERROR to WARNING to avoid confusion

    Change-Id: I3b6f1416c248d9d4d2caaf2cc30a7c944685661c
    Closes-Bug: #1243849

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → icehouse-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → 2014.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.