[long testing] Cloud is not functional: "Lock wait timeout exceeded; try restarting transaction" error.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Fix Released | Critical | Sergii Golovatiuk |
Bug Description
Note: this issue was reproduced on the hardware scale lab with 5 controller nodes and about 18 compute servers.
Steps To Reproduce:
1. Deploy OpenStack cloud with 5 controllers and 18 compute nodes
2. Run various tests for 4 days (Murano and Sahara integration tests, booting VMs with CPU, HDD, and network load)
3. Wait 2 days without any tests or load on the cloud
4. Log in to Horizon and try to create a network. Horizon responds very slowly and, after several minutes, fails with a 'Something went wrong!' error
5. Log in to the controller nodes and check resource usage. All RAM (30 GB) and swap (16 GB) are full (see attached screenshots).
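The check in step 5 can also be scripted instead of relying on screenshots. The following is a minimal sketch, assuming a Linux controller that exposes `/proc/meminfo`; `memory_usage` and `to_gib` are hypothetical helper names, not part of any MOS or Fuel tooling. On older kernels without `MemAvailable`, available memory is approximated as `MemFree + Buffers + Cached`.

```python
import os


def memory_usage(meminfo_text):
    """Return (ram_used, ram_total, swap_used, swap_total) in KiB from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # Values in /proc/meminfo are reported in kB (KiB).
            fields[key.strip()] = int(parts[0])
    ram_total = fields["MemTotal"]
    if "MemAvailable" in fields:
        ram_available = fields["MemAvailable"]
    else:
        # Older kernels lack MemAvailable; approximate it from free + cache.
        ram_available = fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)
    swap_total = fields["SwapTotal"]
    swap_used = swap_total - fields["SwapFree"]
    return ram_total - ram_available, ram_total, swap_used, swap_total


def to_gib(kib):
    """Convert KiB to GiB."""
    return kib / 1024 / 1024


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        ram_used, ram_total, swap_used, swap_total = memory_usage(f.read())
    print(f"RAM:  {to_gib(ram_used):.1f} of {to_gib(ram_total):.1f} GiB used")
    print(f"Swap: {to_gib(swap_used):.1f} of {to_gib(swap_total):.1f} GiB used")
```

Run on each controller, a swap line close to its total (16 GiB here) would match the condition described in step 5.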
Expected Result:
Cloud is functional, all services work without issues, memory is free.
Observed Result:
All RAM (30 GB) and swap (16 GB) are full (see attached screenshots). The cloud is not functional:
root@node-7:~# . openrc
root@node-7:~# nova list
ERROR (Unauthorized): Unauthorized (HTTP 401) (Request-ID: req-9d7dea89-
MySQL (mysqld, about 4 GB resident) and RabbitMQ (beam.smp, about 2.2 GB resident) are the largest memory consumers:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14463 mysql 20 0 11.994g 4.000g 1.325g S 0.3 12.7 122:30.16 mysqld
18106 rabbitmq 20 0 5312204 2.234g 2732 S 34.1 7.1 2813:41 beam.smp
22344 root 20 0 1014832 791396 7656 S 0.7 2.4 53:11.24 ceph-mon
17847 nova 20 0 388008 196548 3748 S 0.0 0.6 118:42.91 nova-conductor
7209 cinder 20 0 4181612 181548 8340 S 0.0 0.6 122:01.94 cinder-volume
17844 nova 20 0 353480 161940 3748 S 0.0 0.5 118:45.47 nova-conductor
17835 nova 20 0 350800 159208 3748 S 0.0 0.5 118:46.23 nova-conductor
17841 nova 20 0 348972 157460 3752 S 0.0 0.5 118:43.65 nova-conductor
17849 nova 20 0 347052 155620 3748 S 0.0 0.5 118:45.61 nova-conductor
7328 glance 20 0 2795720 152264 8748 S 0.0 0.5 3:47.63 glance-api
32470 neutron 20 0 388668 152008 2752 S 0.0 0.5 20:33.16 neutron-server
17838 nova 20 0 340748 149624 3756 S 0.0 0.5 118:41.81 nova-conductor
32469 neutron 20 0 385000 148808 2752 S 0.0 0.5 20:47.76 neutron-server
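Per-process figures hide services that run many workers. A minimal sketch for aggregating the `top` listing above by command, assuming the default `top` column layout (RES in the sixth column, COMMAND last) and that unsuffixed RES values are in KiB; `res_to_kib` and `res_by_command` are hypothetical helper names.

```python
from collections import defaultdict

# Multipliers from top's RES suffixes to KiB.
_UNITS = {"k": 1, "m": 1024, "g": 1024**2, "t": 1024**3}


def res_to_kib(value):
    """Convert a top RES field (e.g. '791396' or '2.234g') to KiB."""
    suffix = value[-1].lower()
    if suffix in _UNITS:
        return float(value[:-1]) * _UNITS[suffix]
    return float(value)  # no suffix: top reports KiB


def res_by_command(top_text):
    """Sum resident memory (KiB) per command, largest first."""
    totals = defaultdict(float)
    for line in top_text.splitlines():
        cols = line.split()
        # Skip the header and any line that is not a process row.
        if len(cols) < 12 or not cols[0].isdigit():
            continue
        totals[cols[11]] += res_to_kib(cols[5])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Fed the listing above, this ranks mysqld and beam.smp first, followed by the six nova-conductor workers at about 957 MiB combined, just ahead of ceph-mon.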
summary: | "[long testing] SWAP is full after 4 days of OpenStack cloud life - cloud is not functional" → "[long testing] SWAP is full after 4 days of OpenStack cloud life - cloud is not functional: "Lock wait timeout exceeded; try restarting transaction" error" |
Changed in mos: | |
assignee: | nobody → Boris Bobrov (bbobrov) |
Changed in mos: | |
status: | Confirmed → In Progress |
summary: | "[long testing] SWAP is full after 4 days of OpenStack cloud life - cloud is not functional: "Lock wait timeout exceeded; try restarting transaction" error" → "[long testing] Cloud is not functional: "Lock wait timeout exceeded; try restarting transaction" error." |
Changed in mos: | |
status: | In Progress → Invalid |
Changed in mos: | |
status: | Invalid → Fix Committed |
status: | Fix Committed → In Progress |
Changed in mos: | |
status: | In Progress → Fix Committed |
tags: | added: long-haul-testing |
tags: | removed: long-haul-testing |
tags: | added: low-hanging-fruit |
tags: | removed: low-hanging-fruit |
tags: | added: long-haul-testing |
tags: | removed: long-haul-testing |
tags: | added: long-haul-testing |
Screenshot of 'atop' output