Still "no space left" error during deploy

Bug #1382164 reported by Chris Krelle
This bug affects 1 person
Affects: Ironic
Status: Fix Released
Importance: Medium
Assigned to: jiang, yunhong
Milestone: 4.1.0

Bug Description

I have seen deploys error because of lack of disk space:
Stderr: 'qemu-img: error while writing sector 2768741: No space left on device\n'

------- NOTE(dtantsur): this is part of the original report, but this assumption looks wrong to me -------

The get_image_mb function is used when checking for free disk space. This check may not be valid for all drivers, as some drivers (like iscsi) also convert the image to raw format.

For drivers that convert the image to another format such as raw, we should check both the image's virtual size and its compressed size, since both need to be held on disk during deployment.

(seed)nobodycam@nobodycam-HP-EliteBook-8460p:~/tripleo$ qemu-img info basicNew.qcow2
image: basicNew.qcow2
file format: qcow2
virtual size: 1.5G (1560084480 bytes)
disk size: 319M
cluster_size: 65536
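
A minimal sketch of the check proposed in the report above, not Ironic's actual code. It assumes the image metadata comes from `qemu-img info --output=json` (which reports `virtual-size` and `actual-size` in bytes); the helper names `required_bytes` and `has_room` are hypothetical.

```python
import json
import shutil


def required_bytes(info_json, keep_source=True):
    """Estimate the bytes needed to convert an image to raw.

    info_json is the text printed by `qemu-img info --output=json`.
    The raw output occupies up to the virtual size; if the compressed
    source stays on disk during conversion, its size is added as well.
    """
    info = json.loads(info_json)
    needed = info["virtual-size"]
    if keep_source:
        needed += info["actual-size"]
    return needed


def has_room(directory, needed):
    """Return True if the filesystem holding `directory` has `needed` bytes free."""
    return shutil.disk_usage(directory).free >= needed


# Values taken from the qemu-img output in the report; "319M" is rounded,
# so the actual-size used here is only approximate.
sample = json.dumps({
    "format": "qcow2",
    "virtual-size": 1560084480,
    "actual-size": 319 * 1024 * 1024,
})
print(required_bytes(sample))
```

For the image in the report this comes to roughly 1.9 GB, compared with the 319M that the on-disk size alone would suggest.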

Changed in ironic:
assignee: nobody → jiang, yunhong (yunhong-jiang)
Revision history for this message
Dmitry Tantsur (divius) wrote :

https://github.com/openstack/ironic/blob/master/ironic/common/images.py#L295: images.converted_size() is already used to check for enough disk space, and it uses the virtual size, so there must be another reason. Probably there is a mistake in clean_up_caches: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/image_cache.py#L327

summary: - get_image_mb in /drivers/modules/deploy_utils.py not accurate.
+ Still "no space left" error during deploy
Changed in ironic:
importance: Undecided → Medium
description: updated
Revision history for this message
Dmitry Tantsur (divius) wrote :

Please provide the full logs for ironic conductor.

Changed in ironic:
status: New → Incomplete
Revision history for this message
Chris Krelle (nobodycam) wrote :

Here is a better section of the log.

2014-10-15 12:11:47.935 11918 DEBUG oslo.messaging._drivers.amqpdriver [-] received {u'_msg_id': u'12b3321045f94363b0db0db6817795d1', u'args': {u'node_obj': {u'ironic_object.namespace': u'ironic', u'ironic_object.data': {u'instance_uuid': u'a02787c4-b446-4ede-88f7-d5fc45e18660', u'target_power_state': None, u'instance_info': {u'ramdisk': u'982dc3dd-270c-4b34-ad61-8c2416191fa8', u'kernel': u'd2d7e186-dee7-4147-83a8-fff4267c309a', u'root_gb': u'30', u'image_source': u'b588925c-e866-404e-9de9-319db44520a7', u'ephemeral_format': u'ext4', u'ephemeral_gb': u'1770', u'preserve_ephemeral': u'True', u'deploy_key': u'9Z2H8Q3F2LW3O7MOOKEOJTZ1HA2ITEC0', u'swap_mb': u'0'}, u'uuid': u'6adf9e06-8b0f-4aa9-81ca-d036a15ae45b', u'driver_info': {u'pxe_deploy_ramdisk': u'baff7d11-e69c-49ff-bd26-2c22e0585564', u'pxe_deploy_kernel': u'17e83e20-e402-4f84-8980-d87efb8259c7', u'ipmi_address': u'10.22.28.109', u'ipmi_username': u'admin', u'ipmi_password': '<SANITIZED>'}, u'target_provision_state': None, u'last_error': u"Failed to deploy. 
Error: Unexpected error while running command.\nCommand: qemu-img convert -O raw /mnt/state/var/lib/ironic/master_images/tmphsmp3Z/b588925c-e866-404e-9de9-319db44520a7.part /mnt/state/var/lib/ironic/master_images/tmphsmp3Z/b588925c-e866-404e-9de9-319db44520a7.converted\nExit code: 1\nStdout: ''\nStderr: 'qemu-img: error while writing sector 2768741: No space left on device\\n'", u'console_enabled': False, u'extra': {}, u'driver': u'pxe_ipmitool', u'updated_at': u'2014-10-15T12:11:34Z', u'chassis_id': None, u'id': 1, u'provision_updated_at': u'2014-10-15T12:00:45.000000', u'maintenance': False, u'provision_state': u'deploy failed', u'reservation': None, u'created_at': u'2014-10-09T14:20:27Z', u'power_state': u'power on', u'properties': {u'memory_mb': u'128000', u'cpu_arch': u'amd64', u'local_gb': u'1800', u'cpus': u'32'}}, u'ironic_object.version': u'1.3', u'ironic_object.name': u'Node'}}, u'version': u'1.1', u'_context_request_id': u'req-23e9b3e8-e3e9-4452-a7bd-bca5f98fdddc', u'_unique_id': u'debb181fed27493197776f5e8a7f7e73', u'_reply_q': u'reply_2fb29307fa984431aea2bd907d40ef6a', u'_context_domain_id': u'default', u'_context_tenant': u'service', u'_context_is_public_api': False, u'_context_auth_token': '<SANITIZED>', u'_context_show_deleted': False, u'_context_domain_name': u'Default', u'_context_read_only': False, u'_context_user': u'ironic', u'method': u'update_node', u'_context_is_admin': True} _safe_log /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/oslo/messaging/_drivers/common.py:180
2014-10-15 12:11:47.936 11918 DEBUG oslo.messaging._drivers.amqp [-] unpacked context: {u'read_only': False, u'show_deleted': False, u'auth_token': '<SANITIZED>', u'domain_name': u'Default', u'is_admin': True, u'user': u'ironic', u'request_id': u'req-23e9b3e8-e3e9-4452-a7bd-bca5f98fdddc', u'is_public_api': False, u'domain_id': u'default', u'tenant': u'service'} _safe_log /opt/stack/venvs/openstack/local/lib/python2.7/site-packages/oslo/messaging/_drivers/common.py:180
2014-10-15 12:11:47.940 11918 DEBUG ironic.conductor.manager [-] RPC update_node called for node 6adf9e06-8b0f-4aa9-81ca-d036a15...

Revision history for this message
Dmitry Tantsur (divius) wrote :

Hmmm... Indeed, I don't see a clean-up call in the logs. Are there any particular steps to reproduce?

Changed in ironic:
status: Incomplete → Confirmed
Revision history for this message
Yuriy Zveryanskyy (yzveryanskyy) wrote :

Chris, is "parallel_image_downloads" config option enabled?

Revision history for this message
Chris Krelle (nobodycam) wrote : Re: [Bug 1382164] Re: Still "no space left" error during deploy

Hi Yuriy,

No, the "parallel_image_downloads" option is not enabled in this case.

Chris

On Mon, Oct 20, 2014 at 5:32 AM, Yuriy Zveryanskyy <
<email address hidden>> wrote:

> Chris, is "parallel_image_downloads" config option enabled?

Revision history for this message
jiang, yunhong (yunhong-jiang) wrote :

Had some discussion with Chris on IRC. A possible reason is that the disk space gets used up between the cache cleanup and the actual conversion, since the image cache currently does not reserve any real disk space. Possibly we should do the conversion on the fly, i.e. invoke "qemu-img convert" instead of "dd".
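
A rough sketch of the "convert on the fly" idea, under the assumption that the target disk is already visible to the conductor as a local block device (for example over iSCSI). The function names and the device path are hypothetical; only the `qemu-img convert -O raw` invocation is taken from the log in this report.

```python
import subprocess


def build_convert_cmd(source_image, target_device):
    """Build a qemu-img command that writes raw data straight to the target.

    No intermediate .converted file is created on the conductor, so the
    conductor's disk only has to hold the compressed source image.
    """
    return ["qemu-img", "convert", "-O", "raw", source_image, target_device]


def convert_on_the_fly(source_image, target_device):
    """Run the conversion; raises CalledProcessError if qemu-img fails."""
    subprocess.run(build_convert_cmd(source_image, target_device), check=True)


# Hypothetical paths, for illustration only.
print(build_convert_cmd("image.qcow2", "/dev/sdX"))
```

This replaces the two-step "convert to a raw file, then dd it to the node" flow with a single write, which is why the window between cache cleanup and conversion no longer matters for the raw copy.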

Revision history for this message
Yuriy Zveryanskyy (yzveryanskyy) wrote :

A disk reservation feature could also resolve the problem with parallel downloads and conversion.
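
One way such a reservation could look, as a hedged sketch only: an in-process ledger that subtracts already promised bytes from the free space before admitting a new download or conversion. The class `DiskReservations` and its `free_bytes_fn` parameter are hypothetical and do not exist in Ironic. This guards only against concurrent consumers inside the same process, not against other programs filling the disk.

```python
import shutil
import threading


class DiskReservations:
    """Track bytes promised to in-flight operations on one filesystem."""

    def __init__(self, path, free_bytes_fn=None):
        # free_bytes_fn is injectable so the logic can be exercised
        # without depending on the real state of the disk.
        self._free = free_bytes_fn or (lambda: shutil.disk_usage(path).free)
        self._lock = threading.Lock()
        self._reserved = 0

    def reserve(self, nbytes):
        """Reserve nbytes if free space minus prior reservations allows it."""
        with self._lock:
            if self._free() - self._reserved < nbytes:
                return False
            self._reserved += nbytes
            return True

    def release(self, nbytes):
        """Give back a reservation once the operation has finished or failed."""
        with self._lock:
            self._reserved = max(0, self._reserved - nbytes)


ledger = DiskReservations("/", free_bytes_fn=lambda: 100)
first = ledger.reserve(60)   # fits into the 100 free bytes
second = ledger.reserve(60)  # rejected: only 40 bytes remain unreserved
```

A caller would reserve the estimated size before starting a download or conversion and release it in a `finally` block, so two parallel operations cannot both pass the same free-space check.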

Revision history for this message
jiang, yunhong (yunhong-jiang) wrote :

yzveryanskyy, yes, a disk reservation feature could also resolve the problem. However, I'd prefer converting on the fly because it reduces disk usage on the ironic conductor node. Consider an ironic conductor that is asked to install 100 compute nodes: converting to raw images would consume a huge amount of disk space. It's basically a time/space trade-off.

Disk reservation would certainly still be helpful!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/130880

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/130881

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic (master)

Reviewed: https://review.openstack.org/130880
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=37f1528d9fbb2bbe588c26225bb29830d1e58386
Submitter: Jenkins
Branch: master

commit 37f1528d9fbb2bbe588c26225bb29830d1e58386
Author: yunhong jiang <email address hidden>
Date: Thu Oct 23 04:49:57 2014 -0700

    Change the force_raw_image config usage

    The force_raw_image comes from nova side as a
    fix for https://bugs.launchpad.net/nova/+bug/1383465.

    The reason of the flag is, after all images are forced
    to be raw because of security reason
    (https://bugs.launchpad.net/nova/+bug/853330), this flag
    is added later so that deployer can still select using raw images or
    compressed images.

    Currently this flag is used in the deeper of functions like image_to_raw() or
    converted_size(). For example, image_to_raw() will convert the image
    to be raw format only if the force_raw_image flag is set, that means
    the caller totally have no idea if the image returned is a raw image or not.

    This is possibly ok for libvirt since libvirt does not care for qcow2
    or raw, but it's different for ironic, since we will take different action
    for qcow2/raw format. A flag in such deeper layer is not so good.

    This patch changes these functions, so that the caller in image cache level
    accept parameter from deploy level like iscsi_deploy or pxe, which make
    decision based on the config.

    Change-Id: I118c9d8edcc13d15762593c254eaf27792d6c55f
    Related-Bug: #1382164

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/130881
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=65f094533f4fd4bbc2debfa6a7445deac292a6b1
Submitter: Jenkins
Branch: master

commit 65f094533f4fd4bbc2debfa6a7445deac292a6b1
Author: yunhong jiang <email address hidden>
Date: Fri Oct 24 02:39:44 2014 -0700

    Convert qcow2 image to raw format when deploy

    In iscsi deployment, currently the image are converted to raw format
    in advance and then copied to the compute node later using DD command.

    This patch supports convert on the fly, that means the image is converted
    when copying the image to the compute node.

    Change-Id: Iae8942c3f4d7c296966fe4325523034dd47e5556
    Related-Bug: #1382164

Revision history for this message
Jim Rollenhagen (jim-rollenhagen) wrote :

Is there work left to do here?

Revision history for this message
jiang, yunhong (yunhong-jiang) wrote :

I have no idea whether any work is left, since I'm not working on OpenStack anymore. I have no environment to test with (it's not easy to obtain and set one up).

Perhaps the bug reporter can verify it?

Revision history for this message
Dmitry Tantsur (divius) wrote :

I would call it done, please feel free to reopen.

Changed in ironic:
status: Confirmed → Fix Committed
Changed in ironic:
milestone: none → 4.1.0
status: Fix Committed → Fix Released