Too many open files after rebooting each controller node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla-ansible | New | Undecided | Unassigned |
Bug Description
After deploying the controllers in VMware, I found I hadn't given them enough RAM, and about a month after deploying they ended up swapping. I had to reboot each controller, one at a time, to increase the RAM and CPU. About 3-4 days after that, the controllers crashed again, but this time pretty much every container was complaining about too many open files. This hadn't happened in the month or more before the controllers swapped, since OpenStack wasn't used heavily in the beginning. It has now happened 3 times since the initial reboot of the OpenStack controllers. The first time it happened, I changed the ulimits and the sysctl file max, then rebooted each controller one at a time. Even after that reboot it came back 2 more times. As of right now I'm still within the first 3 days since the last reboot.
Reproduce (haven't tested since I'm limited on time)
In my environment: deploy 3 controllers in VMware but keep the RAM too low. Once the controllers start swapping, power off one controller at a time and increase RAM/CPU to the needed amount. After that, once OpenStack has run with fairly light usage for 3-7(?) days, you may see "Too many open files" errors pretty much everywhere (I use the memcached container to check whether it has happened again). Changing ulimits and file max doesn't help. Rebooting one controller at a time clears it, but it should then happen again.
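To narrow down which processes are holding the descriptors before the next crash, something like the following could be run as root on a controller. This is only a rough sketch and not part of the original report: it assumes a Linux host with /proc mounted, and the helper names (`parse_file_nr`, `parse_soft_nofile`, `fd_usage`) are made up for illustration. It prints the system-wide counters from /proc/sys/fs/file-nr and the ten processes with the most open descriptors, next to each one's soft "Max open files" limit.

```python
import os


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr contents into (allocated, unused, maximum)."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return allocated, unused, maximum


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the limit is 'unlimited' or the line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # fields: Max open files <soft> <hard> files
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for one process."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft = parse_soft_nofile(f.read())
    return open_fds, soft


def main():
    if not os.path.exists("/proc/sys/fs/file-nr"):
        print("no /proc/sys/fs/file-nr here; this sketch assumes Linux")
        return
    with open("/proc/sys/fs/file-nr") as f:
        print("system-wide (allocated, unused, max):", parse_file_nr(f.read()))

    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            open_fds, soft = fd_usage(entry)
        except OSError:
            continue  # process exited, or we lack permission to read it
        rows.append((open_fds, soft, entry))

    # Show the ten processes holding the most descriptors.
    for open_fds, soft, pid in sorted(rows, key=lambda r: r[0], reverse=True)[:10]:
        print(f"pid={pid} open_fds={open_fds} soft_limit={soft}")


if __name__ == "__main__":
    main()
```

If one process keeps climbing toward its own soft limit, that points at a per-process limit or a leak; if the allocated count in file-nr is approaching the maximum, the system-wide setting is the one being hit.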
My theory is that something is set during deployment/
Thanks for the help!
[root@ctl-os1 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME=
HOME_URL="https:/
BUG_REPORT_URL="https:/
CENTOS_
CENTOS_
REDHAT_
REDHAT_
[root@ctl-os1 ~]# uname -a
Linux ctl-os1 4.18.0-
[root@ctl-os1 ~]# docker version
Client: Docker Engine - Community
Version: 19.03.12
API version: 1.40
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:46:54 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.12
API version: 1.40 (minimum version 1.12)
Go version: go1.13.10
Git commit: 48a66213fe
Built: Mon Jun 22 15:45:28 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef
docker-init:
Version: 0.18.0
GitCommit: fec3683
Kolla ansible version from pip freeze
kolla-ansible=
Docker install type: source
Docker distribution: centos
Official Images
I don't think inventory or globals.yml are relevant.
Try running deploy again indeed. It is largely idempotent. I have never seen such an issue.