Bug #1243849 “flaky “Getting disk size” ERROR in n-cpu log after...” : Bugs : OpenStack Compute (nova)

Russell Bryant (russellb) on 2013-11-18

tags:

added: compute

melanie witt (melwitt) on 2013-11-18

Changed in nova:
importance:	Undecided → Medium
status:	New → Confirmed

jichenjc (jichenjc) on 2013-12-30

Changed in nova:
assignee:	nobody → jichencom (jichenjc)

Revision history for this message

jichenjc (jichenjc) wrote on 2013-12-30:

#1

I would think it's an expected situation, but from http://en.wikipedia.org/wiki/Syslog#Severity_levels
I will modify the ERROR to WARNING because it might lead to some wrong info later

The root cause of this ERROR is the periodic task will update the resource by getting all active instances and calculate their size
however, during the calculation, the instance is resized, so the disk file can't be found

if someone suggest to permanently resolve the problem, we might need to make the resource stat update and nova operations such as resize/delete etc mutual access(use lock), it might block the operations if thousands of instances exist, so catch the error and defer the process which is used currently might be a better solution

see following logs for more detail

<a name='_2013-10-22_20_41_24_102' class='date' href='#_2013-10-22_20_41_24_102'>2013-10-22 20:41:24.102</a> DEBUG nova.openstack.common.processutils [req-2a121325-5913-4add-95a4-350c3d797510 tempest.scenario.manager-tempest-1560801136-user tempest.scenario.manager-tempest-1560801136-tenant] Running cmd (subprocess): qemu-img convert -f qcow2 -O qcow2 /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk_rbase execute /opt/stack/new/nova/nova/openstack/common/processutils.py:147
.......
<a name='_2013-10-22_20_41_24_129' class='date' href='#_2013-10-22_20_41_24_129'>2013-10-22 20:41:24.129</a> 21399 ERROR nova.virt.libvirt.driver [-] Getting disk size of instance-00000069: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2/disk'
<a name='_2013-10-22_20_41_24_140' class='date' href='#_2013-10-22_20_41_24_140'>2013-10-22 20:41:24.140</a> DEBUG nova.openstack.common.processutils [req-2a121325-5913-4add-95a4-350c3d797510 tempest.scenario.manager-tempest-1560801136-user tempest.scenario.manager-tempest-1560801136-tenant] Running cmd (subprocess): mv /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk_rbase /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2/disk execute /opt/stack/new/nova/nova/openstack/common/processutils.py:147

I would think it's an expected situation, but from http://en.wikipedia.org/wiki/Syslog#Severity_levels
I will modify the ERROR to WARNING because it might lead to some wrong info later

The root cause of this ERROR is the periodic task will update the resource by getting all active instances and calculate their size
however, during the calculation, the instance is resized, so the disk file can't be found

if someone suggest to permanently resolve the problem, we might need to make the resource stat update and nova operations such as resize/delete etc mutual access(use lock), it might block the operations if thousands of instances exist, so catch the error and defer the process which is used currently might be a better solution

see following logs for more detail

<a name='_2013-10-22_20_41_24_102' class='date' href='#_2013-10-22_20_41_24_102'>2013-10-22 20:41:24.102</a> DEBUG nova.openstack.common.processutils [req-2a121325-5913-4add-95a4-350c3d797510 tempest.scenario.manager-tempest-1560801136-user tempest.scenario.manager-tempest-1560801136-tenant] Running cmd (subprocess): qemu-img convert -f qcow2 -O qcow2 /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk_rbase execute /opt/stack/new/nova/nova/openstack/common/processutils.py:147
.......
<a name='_2013-10-22_20_41_24_129' class='date' href='#_2013-10-22_20_41_24_129'>2013-10-22 20:41:24.129</a> 21399 ERROR nova.virt.libvirt.driver [-] Getting disk size of instance-00000069: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2/disk'
<a name='_2013-10-22_20_41_24_140' class='date' href='#_2013-10-22_20_41_24_140'>2013-10-22 20:41:24.140</a> DEBUG nova.openstack.common.processutils [req-2a121325-5913-4add-95a4-350c3d797510 tempest.scenario.manager-tempest-1560801136-user tempest.scenario.manager-tempest-1560801136-tenant] Running cmd (subprocess): mv /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2_resize/disk_rbase /opt/stack/data/nova/instances/cd1428aa-fa8a-43d9-8180-888e832c35c2/disk execute /opt/stack/new/nova/nova/openstack/common/processutils.py:147

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-12-30: Fix proposed to nova (master)

#2

Fix proposed to branch: master
Review: https://review.openstack.org/64412

Changed in nova:
status:	Confirmed → In Progress

Revision history for this message

David Kranz (david-kranz) wrote on 2013-12-30:

#3

The same wikipedia entry would indicate this should not be a WARNING either. What is the operator supposed to do with this warning? Computing aggregate disk usage is inherently approximate so I would think that in this case the code would just move on, which it does anyway. Perhaps this message should be DEBUG level.

Revision history for this message

jichenjc (jichenjc) wrote on 2014-01-02:

#4

I agree with you that the WARNING is not a best solution, however, considering someone might be interested in the log and do future enhancement on this feature, I prefer to give some warning message instead of add debug info...

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-01-15: Fix merged to nova (master)

#5

Reviewed: https://review.openstack.org/64412
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=661fa0c83309f002bbbdd67d81f3e6fbb6bde829
Submitter: Jenkins
Branch: master

commit 661fa0c83309f002bbbdd67d81f3e6fbb6bde829
Author: jichenjc <email address hidden>
Date: Sun Dec 22 18:26:53 2013 +0800

Change log from ERROR to WARNING when instance absent

    Periodic task will update the host stat so that it will
    collect all active instances and get their disk size
    during the operation period, the instance might be resized
    by nova operations, it will lead to disk file not exist
    it's not an error from current design point of view, so
    change from ERROR to WARNING to avoid confusion

Change-Id: I3b6f1416c248d9d4d2caaf2cc30a7c944685661c
Closes-Bug: #1243849

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2014-01-22

Changed in nova:
milestone:	none → icehouse-2
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-2 → 2014.1

OpenStack Compute (nova)

flaky "Getting disk size" ERROR in n-cpu log after successful tempest run

Bug Description

Other bug subscribers

Remote bug watches