[Containers] Low nginx fd limit causes application failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Al Bailey |
Bug Description
Title
-----
Low nginx fd limit causes application failure
Brief Description
-----------------
During application apply, some pods fail with errors similar to 'Lost connection to MySQL server during query'. The mariadb-ingress pod reports: 'socket() failed (24: Too many open files)'.
Bob's investigation found that the nginx file descriptor limit per worker can be too low on certain hardware configurations. The limit depends on the number of nginx workers, which is set to the number of logical cores in the system.
The per-worker limit is calculated as follows: ( <system_fd_limit> / <thread_count> ) - 1024. If the result is below 1024, it is raised to the minimum of 1024.
For example, in a lab with two E5-2699 and hyperthreading enabled there are 88 logical cores. The limit works out as follows:
( 65536 / 88 ) - 1024 = -279 -> This is below the minimum, so the limit is set to 1024.
When the worker limit is set to 1024 the application apply will fail. Manually setting the limit to 2048 in those labs corrects the issue.
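The calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula in this report, not the actual nginx ingress code; the function name `worker_fd_limit` and its parameter names are hypothetical, and integer division is an assumption.

```python
def worker_fd_limit(system_fd_limit: int, worker_count: int,
                    reserved: int = 1024, minimum: int = 1024) -> int:
    """Per-worker fd limit following the formula in this report.

    Hypothetical helper: integer division is assumed here.
    """
    per_worker = system_fd_limit // worker_count - reserved
    # A result below the minimum is raised to the minimum.
    return max(per_worker, minimum)


if __name__ == "__main__":
    # 65536 is the system fd limit used in the example above.
    for cores in (16, 32, 88):
        print(cores, worker_fd_limit(65536, cores))
```

With a system limit of 65536, any worker count of 32 or more ends up at the 1024 minimum, which matches the core count given in the reproduction steps.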
At the very least we will need to see how to set the number of workers via an override.
Severity
--------
Critical
Steps to Reproduce
------------------
Select a system with a high number of logical cores (32 or more).
Configure the system for Kubernetes.
Attempt to apply the OpenStack application.
Expected Behavior
------------------
The application applies successfully
Actual Behavior
----------------
The application fails to apply
Reproducibility
---------------
Reproducible on labs with high logical core counts.
System Configuration
-------
Seen in multi-node systems, and systems with dedicated storage.
Branch/Pull Time/Commit
-------
###
### StarlingX
### Release 19.01
###
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="f/stein"
JOB="STX_
<email address hidden>"
BUILD_NUMBER="47"
BUILD_HOST=
BUILD_DATE=
Timestamp/Logs
--------------
See above
tags: | added: stx.containers |
Changed in starlingx: | |
importance: | Undecided → High |
assignee: | nobody → Al Bailey (albailey1974) |
Changed in starlingx: | |
status: | Triaged → Fix Released |
tags: | added: stx.2.0 removed: stx.2019.05 |
Marking as release gating; this can impact systems coming up based on the # of cores they have