Unable to boot large instances due to prlimit setting

Bug #1705340 reported by Erik McCormick
This bug affects 1 person
Affects                    Status         Importance   Assigned to
Cinder                     Fix Released   Undecided    Eric Harney
OpenStack Compute (nova)   Fix Released   High         Sean Dague

Bug Description

I recently needed to migrate some instances from an old Kilo cluster to a new Ocata one. Some of the snapshots were 120 GB or more (terrible, I know). Because qemu-img info is run under a prlimit of cpu=8 (8 seconds of CPU time), these instances are unable to spawn.

Changing nova/virt/images.py line 42 from

    cpu_time=8,

to

    cpu_time=16,

allowed the instances to boot properly.
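
For anyone unfamiliar with the mechanism: as far as I understand it, the cpu value ends up as an RLIMIT_CPU resource limit on the qemu-img child process, so it caps CPU seconds rather than wall-clock time. The sketch below reproduces that behaviour with only the Python standard library. It is not nova's code; the helper names are made up for illustration, and setting the soft and hard limits to the same value is an assumption of the sketch.

```python
import resource
import signal
import subprocess
import sys


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with RLIMIT_CPU set to cpu_seconds and return its exit status."""

    def apply_limit():
        # Runs in the child between fork and exec. Soft and hard limits are
        # set to the same value here (an assumption of this sketch).
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(argv, preexec_fn=apply_limit).returncode


def killed_by_cpu_limit(returncode):
    """True if the child was terminated by a signal the CPU limit can raise."""
    # subprocess reports death by signal N as a return code of -N.
    return returncode in (-signal.SIGXCPU, -signal.SIGKILL)


if __name__ == "__main__":
    # A busy loop stands in for qemu-img info on a very large image.
    busy = [sys.executable, "-c", "while True: pass"]
    status = run_with_cpu_limit(busy, cpu_seconds=1)
    print("exit status:", status, "killed by limit:", killed_by_cpu_limit(status))
```

A process that stays under the limit exits normally; one that exceeds it is killed by the kernel, which matches the failure seen here when the limit is too low for a large image.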

This limit was originally implemented as 2 seconds and later raised to 8 seconds as part of:
https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=068d851561addfefb2b812d91dc2011077cb6e1d

Here's my qemu-img info process taking more than 8 seconds:

    9ddeea47df894145.part execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:355
    2017-07-19 19:47:42.849 7 DEBUG oslo_concurrency.processutils [req-7ed3314d-1c11-4dd8-b612-f8d9c022417f ff236d57a57dd42cb5811c998e30fca1a76233873b9f08330f725fb639c8b025 9776d48734a24c23a4aef51cb78cc269 - - -] CMD "/usr/bin/python2 -m oslo_concurrency.prlimit --as=1073741824 --cpu=16 -- env LC_ALL=C LANG=C qemu-img info /var/lib/nova/instances/_base/41ebff725eab55d368f97bc79ddeea47df894145.part" returned: 0 in 8.639s execute /usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py:385

Would it be possible to increase the default setting, or better yet make it a configuration variable so we don't have to keep chasing it?
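
To make the request concrete, here is a rough sketch of what a configurable limit could look like. It uses only the standard library, and the section and option names (image_limits, qemu_img_cpu_seconds, qemu_img_address_space) are invented for illustration; they are not real nova or cinder options. The command it builds mirrors the prlimit invocation from the log above, and the defaults are the 1 GiB address space from that command and the 30 seconds used by the fix merged for this bug.

```python
import configparser
import sys

# Defaults taken from this report: the address-space value in the logged
# command, and the 30 s CPU limit from the merged fix.
DEFAULT_ADDRESS_SPACE = 1073741824
DEFAULT_CPU_SECONDS = 30


def load_limits(conf_text):
    """Return (cpu_seconds, address_space) read from INI text.

    The section and option names are hypothetical, chosen for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    cpu = parser.getint("image_limits", "qemu_img_cpu_seconds",
                        fallback=DEFAULT_CPU_SECONDS)
    address_space = parser.getint("image_limits", "qemu_img_address_space",
                                  fallback=DEFAULT_ADDRESS_SPACE)
    return cpu, address_space


def qemu_img_info_argv(image_path, cpu_seconds, address_space):
    """Build the prlimit-wrapped qemu-img info command seen in the log."""
    return [
        sys.executable, "-m", "oslo_concurrency.prlimit",
        f"--as={address_space}", f"--cpu={cpu_seconds}", "--",
        "env", "LC_ALL=C", "LANG=C", "qemu-img", "info", image_path,
    ]


if __name__ == "__main__":
    cpu, address_space = load_limits("[image_limits]\nqemu_img_cpu_seconds = 32\n")
    print(" ".join(qemu_img_info_argv("/path/to/image.part", cpu, address_space)))
```

With something like this, an operator with unusually large images could raise the limit in a config file instead of patching the source.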

tags: added: low-hanging-fruit
Revision history for this message
Sean Dague (sdague) wrote :

I think bumping the default to something higher is a good call. This is mostly a backstop DoS prevention measure to ensure those processes end.

Sean Dague (sdague)
Changed in nova:
status: New → Triaged
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/486642

Changed in nova:
assignee: nobody → Sean Dague (sdague)
status: Triaged → In Progress
Revision history for this message
Erik McCormick (emccormickva) wrote :

Just today I ran into another, even bigger image (450 GB) that took 28 seconds of CPU time. I'm now up to 32 as the limit. That should probably be good enough for 99.9% of situations and is still quite restrictive.
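
In case it helps others size the limit for their own images: the limit counts CPU seconds, while the duration in the processutils log line appears to be elapsed time, so the two can differ. Below is a small standard-library sketch for measuring the CPU time a command actually consumes. The helper name is made up, and the qemu-img command in the comment is only an example of how it might be called.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(argv):
    """Run argv to completion and return the CPU seconds (user + system) it used."""
    # RUSAGE_CHILDREN accumulates usage of children that have been waited for,
    # so the difference around one subprocess.run() call isolates that child.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user + system


if __name__ == "__main__":
    # Stand-in workload; on a compute node this could instead be something
    # like ["qemu-img", "info", "/path/to/image.part"].
    used = child_cpu_seconds([sys.executable, "-c", "sum(range(10**6))"])
    print(f"CPU time used: {used:.3f}s")
```

Running this against the largest image you expect to handle gives a number to compare against the configured limit, with some headroom added on top.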

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/486642
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=011ae614d5c5fb35b2e9c22a9c4c99158f6aee20
Submitter: Jenkins
Branch: master

commit 011ae614d5c5fb35b2e9c22a9c4c99158f6aee20
Author: Sean Dague <email address hidden>
Date: Mon Jul 24 10:51:49 2017 -0400

    Increase cpu time for image conversion

    Apparently the current 8 second timeout on qemu-info may not be
    sufficient if snapshot images are > 120G in size.

    This bumps that to 30s instead to provide a backstop, but not hurt
    people with large snapshots.

    Change-Id: I877b9401a671904a13bb07bae3636b72d7d20df8
    Closes-Bug: #1705340

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0rc1

This issue was fixed in the openstack/nova 16.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/562145

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/562145
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=862619b60aff74f9049baab1ba01e2b67226afd8
Submitter: Zuul
Branch: stable/ocata

commit 862619b60aff74f9049baab1ba01e2b67226afd8
Author: Sean Dague <email address hidden>
Date: Mon Jul 24 10:51:49 2017 -0400

    Increase cpu time for image conversion

    Apparently the current 8 second timeout on qemu-info may not be
    sufficient if snapshot images are > 120G in size.

    This bumps that to 30s instead to provide a backstop, but not hurt
    people with large snapshots.

    Change-Id: I877b9401a671904a13bb07bae3636b72d7d20df8
    Closes-Bug: #1705340
    (cherry picked from commit 011ae614d5c5fb35b2e9c22a9c4c99158f6aee20)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.1.1

This issue was fixed in the openstack/nova 15.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/691901

Changed in cinder:
assignee: nobody → Eric Harney (eharney)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/693382

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/693610

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/stein)

Change abandoned by Dincer Celik (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/693610
Reason: Invalid cherry pick

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/691901
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=3566c5145ad676c7eb5952807f3ef1c44c142b74
Submitter: Zuul
Branch: master

commit 3566c5145ad676c7eb5952807f3ef1c44c142b74
Author: Eric Harney <email address hidden>
Date: Tue Oct 29 11:50:19 2019 -0400

    Increase cpu limit for image conversion

    The 8 second timeout is not always sufficient for
    large images.

    Bump to 30s, which matches what Nova currently
    uses for this same limit.

    Change-Id: I0c26c400f08d91c8c125c69e06e4c90302bcdbb1
    Closes-Bug: #1705340

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/train)

Reviewed: https://review.opendev.org/693382
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=fa6f7d3d319647bee17ebc9a390ae26c14f3650f
Submitter: Zuul
Branch: stable/train

commit fa6f7d3d319647bee17ebc9a390ae26c14f3650f
Author: Eric Harney <email address hidden>
Date: Tue Oct 29 11:50:19 2019 -0400

    Increase cpu limit for image conversion

    The 8 second timeout is not always sufficient for
    large images.

    Bump to 30s, which matches what Nova currently
    uses for this same limit.

    Change-Id: I0c26c400f08d91c8c125c69e06e4c90302bcdbb1
    Closes-Bug: #1705340
    (cherry picked from commit 3566c5145ad676c7eb5952807f3ef1c44c142b74)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/stein)

Reviewed: https://review.opendev.org/693610
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=186fb25d049a2136eb71f9fece81903450b22890
Submitter: Zuul
Branch: stable/stein

commit 186fb25d049a2136eb71f9fece81903450b22890
Author: Eric Harney <email address hidden>
Date: Tue Oct 29 11:50:19 2019 -0400

    Increase cpu limit for image conversion

    The 8 second timeout is not always sufficient for
    large images.

    Bump to 30s, which matches what Nova currently
    uses for this same limit.

    Change-Id: I0c26c400f08d91c8c125c69e06e4c90302bcdbb1
    Closes-Bug: #1705340
    (cherry picked from commit 3566c5145ad676c7eb5952807f3ef1c44c142b74)
    (cherry picked from commit fa6f7d3d319647bee17ebc9a390ae26c14f3650f)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/rocky)

Reviewed: https://review.opendev.org/697110
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=ce951a3b586bc51a77422c3079637198f14ecb44
Submitter: Zuul
Branch: stable/rocky

commit ce951a3b586bc51a77422c3079637198f14ecb44
Author: Eric Harney <email address hidden>
Date: Tue Oct 29 11:50:19 2019 -0400

    Increase cpu limit for image conversion

    The 8 second timeout is not always sufficient for
    large images.

    Bump to 30s, which matches what Nova currently
    uses for this same limit.

    Change-Id: I0c26c400f08d91c8c125c69e06e4c90302bcdbb1
    Closes-Bug: #1705340
    (cherry picked from commit 3566c5145ad676c7eb5952807f3ef1c44c142b74)
    (cherry picked from commit fa6f7d3d319647bee17ebc9a390ae26c14f3650f)
    (cherry picked from commit 186fb25d049a2136eb71f9fece81903450b22890)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 15.0.1

This issue was fixed in the openstack/cinder 15.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 14.0.3

This issue was fixed in the openstack/cinder 14.0.3 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 13.0.8

This issue was fixed in the openstack/cinder 13.0.8 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/703682

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/705141

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/703682
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=f3ebdd56c5d598f1fe2537e390e41e0866a3846e
Submitter: Zuul
Branch: master

commit f3ebdd56c5d598f1fe2537e390e41e0866a3846e
Author: Marc Methot <email address hidden>
Date: Tue Jan 21 14:41:23 2020 -0500

    Configurable timeout of the QEMU img conversion

    Instead of having a hardcoded value the conversion prlimits are
    now operator configurable. These settings have always been arbitrary
    as it depends on the environment.

    Change-Id: I462cecc3152bf838b7d42d5abc3ca31610567e5e
    Closes-Bug: #1705340

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/707305

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 16.0.0.0b1

This issue was fixed in the openstack/cinder 16.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/queens)

Reviewed: https://review.opendev.org/705141
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=d3865ecd75f083a9d478f29cd674a1de112cb8a2
Submitter: Zuul
Branch: stable/queens

commit d3865ecd75f083a9d478f29cd674a1de112cb8a2
Author: Eric Harney <email address hidden>
Date: Tue Oct 29 11:50:19 2019 -0400

    Increase cpu limit for image conversion

    The 8 second timeout is not always sufficient for
    large images.

    Bump to 30s, which matches what Nova currently
    uses for this same limit.

    Change-Id: I0c26c400f08d91c8c125c69e06e4c90302bcdbb1
    Closes-Bug: #1705340
    (cherry picked from commit 3566c5145ad676c7eb5952807f3ef1c44c142b74)
    (cherry picked from commit fa6f7d3d319647bee17ebc9a390ae26c14f3650f)
    (cherry picked from commit 186fb25d049a2136eb71f9fece81903450b22890)
    (cherry picked from commit ce951a3b586bc51a77422c3079637198f14ecb44)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/train)

Reviewed: https://review.opendev.org/707305
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=55b263feee140893f7c029a0e5e4762f329dc964
Submitter: Zuul
Branch: stable/train

commit 55b263feee140893f7c029a0e5e4762f329dc964
Author: Marc Methot <email address hidden>
Date: Tue Jan 21 14:41:23 2020 -0500

    Configurable timeout of the QEMU img conversion

    Instead of having a hardcoded value the conversion prlimits are
    now operator configurable. These settings have always been arbitrary
    as it depends on the environment.

    Change-Id: I462cecc3152bf838b7d42d5abc3ca31610567e5e
    Closes-Bug: #1705340
    (cherry picked from commit f3ebdd56c5d598f1fe2537e390e41e0866a3846e)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 15.1.0

This issue was fixed in the openstack/cinder 15.1.0 release.
