[Containers] Low nginx fd limit causes application failure

Bug #1816479 reported by David Sullivan
This bug affects 1 person
Affects    | Status       | Importance | Assigned to | Milestone
StarlingX  | Fix Released | High       | Al Bailey   |

Bug Description

Title
-----
Low nginx fd limit causes application failure

Brief Description
-----------------
During application apply, some pods will fail, reporting errors similar to 'Lost connection to MySQL server during query'. The mariadb-ingress pod will report: 'socket() failed (24: Too many open files)'.

Bob's investigation found that the nginx per-worker file descriptor limit can be too low on some hardware configurations. The limit depends on the number of nginx worker processes, which is set to the number of logical cores in the system.

The limit is calculated as (<system_fd_limit> / <thread_count>) - 1024. Any result below 1024 is raised to a minimum of 1024.
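
The two inputs to this formula can be read for the current process with a short Python sketch. It assumes that nginx runs one worker per logical core, so `os.cpu_count()` stands in for <thread_count>; which RLIMIT_NOFILE value (soft or hard) the ingress controller uses as <system_fd_limit> is not stated in this report, so both are shown. The helper name `fd_limit_inputs` is hypothetical, and values read on the host may differ from those inside the mariadb-ingress container.

```python
import os
import resource


def fd_limit_inputs() -> dict[str, int]:
    """Collect the inputs to the per-worker limit formula for this process.

    Assumptions: one nginx worker per logical core, so os.cpu_count()
    approximates <thread_count>; <system_fd_limit> is taken from
    RLIMIT_NOFILE, but whether the soft or hard value applies is not
    stated in the report, so both are returned.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "logical_cores": os.cpu_count() or 1,  # cpu_count() can return None
        "nofile_soft": soft,
        "nofile_hard": hard,  # may equal resource.RLIM_INFINITY on some systems
    }


if __name__ == "__main__":
    for name, value in fd_limit_inputs().items():
        print(f"{name}: {value}")
```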

For example, a lab with two E5-2699 processors and hyperthreading enabled has 88 logical cores. The limit works out as follows:

(65536 / 88) - 1024 ≈ 744.7 - 1024 ≈ -279 -> This is then raised to the minimum of 1024.
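
The calculation, including the 88-core example, can be sketched in Python. The function name `worker_rlimit_nofile` is hypothetical (named after the nginx directive), integer division is an assumption since the report only gives the formula, and the `floor` parameter models the 1024 minimum or, with `floor=2048`, the value that corrected the issue in the affected labs.

```python
def worker_rlimit_nofile(system_fd_limit: int, worker_processes: int,
                         floor: int = 1024) -> int:
    """Per-worker open-file limit following the formula in this report.

    Hypothetical helper; integer division is an assumption.
    """
    if worker_processes < 1:
        raise ValueError("worker_processes must be at least 1")
    # Split the system limit across the workers, then subtract 1024.
    limit = system_fd_limit // worker_processes - 1024
    # Any result below the floor is raised to the floor.
    return max(limit, floor)


if __name__ == "__main__":
    # Two E5-2699 CPUs with hyperthreading: 88 logical cores, one worker each.
    print(worker_rlimit_nofile(65536, 88))              # 1024 (raw: 744 - 1024 = -280)
    print(worker_rlimit_nofile(65536, 88, floor=2048))  # 2048
    print(worker_rlimit_nofile(65536, 16))              # 3072
```

With a 65536 system limit, any worker count of 32 or more ends up at 1024 (at exactly 32 the raw value is 1024; above that the floor applies), which lines up with the 32-core threshold in the steps to reproduce.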

When the per-worker limit is 1024, the application apply will fail. Manually setting the limit to 2048 on those labs corrects the issue.

At the very least, we will need to see how to set the number of workers via an override.
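
One way to reason about such an override is to solve the formula for the worker count: the largest number of workers that still yields a chosen per-worker limit is system_fd_limit // (target + 1024). The helper `max_workers_for_limit` in this sketch is hypothetical (it is not part of nginx or the ingress controller), and it assumes the formula in this report with integer division.

```python
def max_workers_for_limit(system_fd_limit: int, target_per_worker: int) -> int:
    """Largest worker count whose computed limit still reaches the target.

    Solves  system_fd_limit // workers - 1024 >= target_per_worker
    for workers. Hypothetical helper; integer division is an assumption.
    """
    if target_per_worker < 0:
        raise ValueError("target_per_worker must be non-negative")
    return system_fd_limit // (target_per_worker + 1024)


if __name__ == "__main__":
    # With a 65536 system limit:
    print(max_workers_for_limit(65536, 1024))  # 32
    print(max_workers_for_limit(65536, 2048))  # 21
```

For instance, keeping the computed value at 2048 or more with a 65536 system limit would mean capping nginx at 21 workers rather than one per logical core.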

Severity
--------
Critical

Steps to Reproduce
------------------
Select a system with a high number of logical cores (32 or more).
Configure the system for Kubernetes.
Attempt to apply the openstack application.

Expected Behavior
------------------
The application applies successfully

Actual Behavior
----------------
The application fails to apply

Reproducibility
---------------
Reproducible on labs with high logical core counts.

System Configuration
--------------------
Seen in multi-node systems, and systems with dedicated storage.

Branch/Pull Time/Commit
-----------------------
###
### StarlingX
### Release 19.01
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="f/stein"

JOB="STX_build_stein_master"
<email address hidden>"
BUILD_NUMBER="47"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-02-13 18:12:13 +0000"

Timestamp/Logs
--------------
See above

Frank Miller (sensfan22)
tags: added: stx.containers
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Al Bailey (albailey1974)
Ghada Khalil (gkhalil) wrote :

Marking as release gating; this can affect whether systems come up, depending on the number of cores they have.

Changed in starlingx:
status: New → Triaged
tags: added: stx.2019.05
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-upstream (f/stein)

Fix proposed to branch: f/stein
Review: https://review.openstack.org/638451

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-upstream (f/stein)

Reviewed: https://review.openstack.org/638451
Committed: https://git.openstack.org/cgit/openstack/stx-upstream/commit/?id=bcce8c810f56a32710277341b0d6fe23226e7532
Submitter: Zuul
Branch: f/stein

commit bcce8c810f56a32710277341b0d6fe23226e7532
Author: Al Bailey <email address hidden>
Date: Thu Feb 21 09:52:23 2019 -0600

    Setting the worker_rlimit_nofile minimum to 2048 for nginx

    In the docker image for mariadb-ingress if there are many cores
    the calculated value for worker_rlimit_nofile ends up being 1024
    which is too small. This change sets the min to 2048.

    Closes-Bug: 1816479
    Change-Id: I4f198b703eda61d9a9531640ec01a2770f9ec172
    Signed-off-by: Al Bailey <email address hidden>

tags: added: in-f-stein
Al Bailey (albailey1974)
Changed in starlingx:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-upstream (f/centos76)

Fix proposed to branch: f/centos76
Review: https://review.openstack.org/640918

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-upstream (f/centos76)

Reviewed: https://review.openstack.org/640918
Committed: https://git.openstack.org/cgit/openstack/stx-upstream/commit/?id=1443710d8f88509b5e31ebdb2ce302767cd229df
Submitter: Zuul
Branch: f/centos76

commit 53cfbc9f06b58dee663a8f857dc269bde610c349
Author: Don Penney <email address hidden>
Date: Fri Mar 1 23:50:56 2019 -0500

    Remove openstack images from pike build

    Change-Id: Idd211511e42fa4c1290fdfa382ac579a1a8de91a
    Story: 2004751
    Task: 29788
    Signed-off-by: Don Penney <email address hidden>

commit bcce8c810f56a32710277341b0d6fe23226e7532
Author: Al Bailey <email address hidden>
Date: Thu Feb 21 09:52:23 2019 -0600

    Setting the worker_rlimit_nofile minimum to 2048 for nginx

    In the docker image for mariadb-ingress if there are many cores
    the calculated value for worker_rlimit_nofile ends up being 1024
    which is too small. This change sets the min to 2048.

    Closes-Bug: 1816479
    Change-Id: I4f198b703eda61d9a9531640ec01a2770f9ec172
    Signed-off-by: Al Bailey <email address hidden>

commit db10c94d9e26e4150b2a57c5d2e5673fd97a3481
Author: Alex Kozyrev <email address hidden>
Date: Wed Feb 20 12:15:55 2019 -0500

    Create Docker image for Barbican in StarlingX

    In order to provide the secure management of secrets service
    as a container in StarlingX we need to create Barbican Docker
    image and include it into StarlingX repository.

    Change-Id: I3b4483f74d233348ec49729deff11ba7776af01b
    Story: 2003108
    Task: 29579
    Signed-off-by: Alex Kozyrev <email address hidden>

commit b6e3badac62686a14a0d837d2677feccfa1dfd70
Author: Al Bailey <email address hidden>
Date: Mon Feb 4 11:59:06 2019 -0600

    Fix the version string in cinder and glance clients

    cinderclient was showing 0.0.0 for cinder --version
    Same problem for glance.

    The pbr version needed to be set when building from
    outside of a git tree.

    All other clients had this set properly.

    This bug was introduced when the new stein clients were
    added.

    The cinderclient also needed some BuildRequires updated.
    These BuildRequires were for building wheels.

    Closes-Bug: 1814573
    Change-Id: I4afe783e25ab2172ae999787e6b0e3ec91f78419
    Signed-off-by: Al Bailey <email address hidden>

commit cb921ff9348fec9c634ef191dba293e33088daae
Author: Gerry Kopec <email address hidden>
Date: Fri Feb 15 18:11:55 2019 -0500

    Update nova helm chart to fix console addressing

    Upstream nova helm chart attempts to figure out the address for VM
    consoles by running an init container that checks for ip routes and
    addresses on a compute host. It then sets the appropriate nova config
    options in a config file which it passes to nova-compute. However this
    effectively overwrites the same config option that stx has already
    set in nova.conf via per host overrides causing us to communicate over
    the wrong network or not to connect at all.

    This fix introduces an option to enable/disable passing of this
    additional config file to nova-compute. Default upstream behaviour is
    unchange...

tags: added: in-f-centos76
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05