nova_libvirt container "cannot fork child process: Resource temporarily unavailable"

Bug #1892817 reported by Cédric Jeanneret
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Cédric Jeanneret
Milestone: (none listed)

Bug Description

First reported on Red Hat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1871885

Description of problem:
OSP 16.1 latest

Running into the following issue with launching instances:

2020-08-19 21:14:42.722+0000: 34000: error : virFork:274 : cannot fork child process: Resource temporarily unavailable
2020-08-19 21:14:42.724+0000: 34000: error : virFork:274 : cannot fork child process: Resource temporarily unavailable

This is with 184 instances:
# sudo podman exec -ti nova_libvirt virsh list --all |wc -l
184

This looks like the nova_libvirt container hitting its PID limit, which is currently:

# sudo podman inspect nova_libvirt |grep PidsLimit
            "PidsLimit": 4096,

How is this config managed?
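
The check above can also be scripted instead of grepping. The following is a minimal sketch, assuming that `podman inspect` prints a JSON array with one object per container and that the limit sits under `HostConfig.PidsLimit` (which matches the indentation of the grep output, but is not confirmed by this report). The function names are hypothetical and are not part of paunch or tripleo-ansible.

```python
import json
import subprocess
from typing import Optional


def pids_limit_from_inspect(inspect_json: str) -> Optional[int]:
    """Extract PidsLimit from the JSON text printed by `podman inspect`.

    Assumption: the value is stored under HostConfig.PidsLimit.
    Returns None when the key is absent.
    """
    data = json.loads(inspect_json)
    # Accept either a JSON array of containers or a single object.
    containers = data if isinstance(data, list) else [data]
    for container in containers:
        limit = container.get("HostConfig", {}).get("PidsLimit")
        if limit is not None:
            return int(limit)
    return None


def fetch_pids_limit(container: str = "nova_libvirt") -> Optional[int]:
    """Run `sudo podman inspect <container>` and return its PID limit."""
    result = subprocess.run(
        ["sudo", "podman", "inspect", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return pids_limit_from_inspect(result.stdout)
```

On a compute node, `fetch_pids_limit()` would be expected to return 4096 for the container described in this report. The parsing is kept separate from the subprocess call so it can be exercised without podman installed.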

Version-Release number of selected component (if applicable):
16.1 current

How reproducible:
Unknown

Steps to Reproduce:
1. Launch a significant number of instances.

Additional info:

I'll provide additional libvirtd and nova logs to show the issue.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/747830

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (master)

Change abandoned by Cédric Jeanneret (Tengu) (<email address hidden>) on branch: master
Review: https://review.opendev.org/747830
Reason: should be against stable/ussuri, NOT master - my gerrit thing is borked apparently.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/747831

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/747834

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/747835

OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (stable/ussuri)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/ussuri
Review: https://review.opendev.org/747831
Reason: The gate is currently hitting the "docker api 429" issue, see #tripleo channel for more details. I'll abandon that patch so it's cleared from the gate. Please do not restore it as I'll take care of it when the gate is stable again. Thanks for your understanding and patience!

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/747834
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=f602e84c0f03e4c151f25b12e131af95c7daf305
Submitter: Zuul
Branch: master

commit f602e84c0f03e4c151f25b12e131af95c7daf305
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 08:09:44 2020 +0200

    Enable pids_limit support

    The default PID limit in a container is set to 4096. This limit might be
    reached in a nova_libvirt container, after launching about 150 VMs.

    Change-Id: Ibfbe63cbd9e2a219f10ebc407596aeefe4a5b194
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Closes-Bug: #1892817
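
The figures in the commit message above imply a rough per-VM task budget, which can be worked through as follows. This is a back-of-the-envelope sketch, not the sizing logic used by the fix: it assumes task usage grows linearly with the number of VMs, and the 50% headroom is an arbitrary illustrative choice. The function names are hypothetical. Note that the PID limit counts tasks, so each thread of a VM's processes consumes one slot.

```python
DEFAULT_PIDS_LIMIT = 4096  # default container limit, per the commit message
VMS_AT_LIMIT = 150         # "about 150 VMs", per the commit message


def tasks_per_vm(pids_limit: int = DEFAULT_PIDS_LIMIT,
                 vms: int = VMS_AT_LIMIT) -> float:
    """Average number of tasks each VM appears to consume."""
    return pids_limit / vms


def estimated_limit(vm_count: int,
                    pids_limit: int = DEFAULT_PIDS_LIMIT,
                    vms_at_limit: int = VMS_AT_LIMIT,
                    headroom_percent: int = 50) -> int:
    """Scale the observed limit linearly to vm_count, plus headroom.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    numerator = vm_count * pids_limit * (100 + headroom_percent)
    denominator = vms_at_limit * 100
    return -(-numerator // denominator)
```

Under these assumptions each VM accounts for roughly 27 tasks, and 500 VMs with 50% headroom would call for a limit of about 20480. Real usage depends on vCPU counts and I/O threads per guest, so measured values should take precedence over this estimate.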

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/ussuri)

Reviewed: https://review.opendev.org/747831
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=368827e9078e29a3a1b07f07718411f35d3767d6
Submitter: Zuul
Branch: stable/ussuri

commit 368827e9078e29a3a1b07f07718411f35d3767d6
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 07:51:13 2020 +0200

    [USSURI-ONLY] Add new parameter: pids_limit

    The default PID limit in a container is set to 4096. This limit might be
    reached in a nova_libvirt container, after launching about 150 VMs.

    Change-Id: Iebad9919caf805715da9268f9ee8a40b4392642a
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Closes-Bug: #1892817

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/748106

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/748108

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/748220

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/748108
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=5455472df5f63ec6f59b947e3cdc1eeb97f0d362
Submitter: Zuul
Branch: stable/ussuri

commit 5455472df5f63ec6f59b947e3cdc1eeb97f0d362
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 08:09:44 2020 +0200

    Enable pids_limit support

    The default PID limit in a container is set to 4096. This limit might be
    reached in a nova_libvirt container, after launching about 150 VMs.

    Change-Id: Ibfbe63cbd9e2a219f10ebc407596aeefe4a5b194
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Closes-Bug: #1892817
    (cherry picked from commit f602e84c0f03e4c151f25b12e131af95c7daf305)

OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/train)

Reviewed: https://review.opendev.org/748106
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=ed2c015ab31bdf007a1aff5f70368f661c5f8275
Submitter: Zuul
Branch: stable/train

commit ed2c015ab31bdf007a1aff5f70368f661c5f8275
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 07:51:13 2020 +0200

    [USSURI-ONLY] Add new parameter: pids_limit

    The default PID limit in a container is set to 4096. This limit might be
    reached in a nova_libvirt container, after launching about 150 VMs.

    Change-Id: Iebad9919caf805715da9268f9ee8a40b4392642a
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Closes-Bug: #1892817
    (cherry picked from commit 368827e9078e29a3a1b07f07718411f35d3767d6)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/748220
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=094deaf056aa24ec038f62211fe52f93c71a95ed
Submitter: Zuul
Branch: stable/train

commit 094deaf056aa24ec038f62211fe52f93c71a95ed
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 08:09:44 2020 +0200

    Enable pids_limit support

    The default PID limit in a container is set to 4096. This limit might be
    reached in a nova_libvirt container, after launching about 150 VMs.

    Change-Id: Ibfbe63cbd9e2a219f10ebc407596aeefe4a5b194
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Closes-Bug: #1892817
    (cherry picked from commit f602e84c0f03e4c151f25b12e131af95c7daf305)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/748574

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/748575

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/747835
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9d71882a420ac77d033c77f6ca762c6636603129
Submitter: Zuul
Branch: master

commit 9d71882a420ac77d033c77f6ca762c6636603129
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 08:13:55 2020 +0200

    Set a higher PIDs limit for nova_libvirt container

    The default limit is set to 4096. This can be reached with about 150
    VMs, and therefore can lead to a situation where you're unable to start
    new VMs on a compute node.

    This patch integrates the modifications made by Rabi in his own
    (abandoned) patch.

    Note: this patch needs to be backported down to stable/train. The value
    of Depends-On will need to be updated in order to point to another
    patch, in paunch repository: https://review.opendev.org/747831

    Change-Id: Ic414fc8826e4164ed679fbe22b82acf39c9ed7e0
    Co-Authored-By: Rabi Mishra <email address hidden>
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Related-Bug: #1892817
    Depends-On: https://review.opendev.org/747834

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/748575
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=94ba270906b398d5a44004a5ecd4b07363043006
Submitter: Zuul
Branch: stable/train

commit 94ba270906b398d5a44004a5ecd4b07363043006
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 08:13:55 2020 +0200

    Set a higher PIDs limit for nova_libvirt container

    The default limit is set to 4096. This can be reached with about 150
    VMs, and therefore can lead to a situation where you're unable to start
    new VMs on a compute node.

    This patch integrates the modifications made by Rabi in his own
    (abandoned) patch.

    Note: this patch needs to be backported down to stable/train. The value
    of Depends-On will need to be updated in order to point to another
    patch, in paunch repository: https://review.opendev.org/747831

    Change-Id: Ic414fc8826e4164ed679fbe22b82acf39c9ed7e0
    Co-Authored-By: Rabi Mishra <email address hidden>
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Closes-Bug: #1892817
    Depends-On: https://review.opendev.org/747831
    (cherry picked from commit 9d71882a420ac77d033c77f6ca762c6636603129)
    (cherry picked from commit daff4688fb016a46f49660fa03ca5a43e3945d52)

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/748574
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=daff4688fb016a46f49660fa03ca5a43e3945d52
Submitter: Zuul
Branch: stable/ussuri

commit daff4688fb016a46f49660fa03ca5a43e3945d52
Author: Cédric Jeanneret <email address hidden>
Date: Tue Aug 25 08:13:55 2020 +0200

    Set a higher PIDs limit for nova_libvirt container

    The default limit is set to 4096. This can be reached with about 150
    VMs, and therefore can lead to a situation where you're unable to start
    new VMs on a compute node.

    This patch integrates the modifications made by Rabi in his own
    (abandoned) patch.

    Note: this patch needs to be backported down to stable/train. The value
    of Depends-On will need to be updated in order to point to another
    patch, in paunch repository: https://review.opendev.org/747831

    Change-Id: Ic414fc8826e4164ed679fbe22b82acf39c9ed7e0
    Co-Authored-By: Rabi Mishra <email address hidden>
    Related: https://bugzilla.redhat.com/show_bug.cgi?id=1871885
    Closes-Bug: #1892817
    Depends-On: https://review.opendev.org/748108
    (cherry picked from commit 9d71882a420ac77d033c77f6ca762c6636603129)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 5.4.0

This issue was fixed in the openstack/paunch 5.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 0.6.0

This issue was fixed in the openstack/tripleo-ansible 0.6.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.
