Glance creating heavy CPU load on standby cluster
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Mirantis OpenStack | Invalid | High | Alexander Tivelkov | |
| 6.0.x | Fix Released | High | Denis Meltsaykin | |
| 6.1.x | Fix Released | High | Denis Meltsaykin | |
| 7.0.x | Invalid | High | Alexander Tivelkov | |
Bug Description
Steps to reproduce:
1. Install a MOS OpenStack HA environment with 3 controllers.
2. Run "ps aux | grep glance-api" on a controller and note the CPU consumption of these processes.
3. Run rally tests on the environment (for a description of what rally tests are, see the "User impact" section below).
4. Repeat step #2 on the same controller.
You will notice that the CPU consumption of every glance-api process has risen slightly (by less than 1% each). There are 12 glance-api processes, so the total consumption of the glance-api service rises by roughly 12%. If you run the rally tests again, consumption goes up a little more. Note that 1% here means 1% of a single CPU core, not of the machine's overall compute capacity (which consists of several CPUs/cores). While reproducing this bug, a single rally run created and deleted around 1300 images in Glance. A sketch for capturing the per-process CPU numbers is shown right after these steps.
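For step 2, one simple way to capture the per-process numbers is to snapshot the ps output before and after the rally run and compare them. This is only a sketch (the file paths are arbitrary, and ps reports %CPU averaged over the process lifetime, so the drift shows up more slowly than in top):

# Snapshot glance-api CPU usage before the rally run
ps -eo pid,pcpu,etime,comm | grep [g]lance-api > /tmp/glance_cpu_before.txt
# ... run the rally scenarios ...
# Snapshot again and compare per-PID %CPU
ps -eo pid,pcpu,etime,comm | grep [g]lance-api > /tmp/glance_cpu_after.txt
diff /tmp/glance_cpu_before.txt /tmp/glance_cpu_after.txt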
Conditions for reproduction:
No additional conditions are required. We do not have much data yet (2 reproductions so far), but we believe the issue reproduces in 100% of cases.
User impact:
Rally tests emulate multi-user usage of a cloud by concurrently performing various actions against it, for example CRUD operations on users, instances, volumes, etc.
The bug has no visible impact for the user, other than CPU consumption growing steadily while the cloud is in use. Obviously this becomes a problem once the service consumes a considerable part of the machine's compute capacity.
Workaround:
Restart the affected service; this immediately drops its CPU consumption back to almost zero. An example restart is sketched below.
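For example, on a controller node the restart could look like the following. This is a sketch and assumes glance-api is registered as an ordinary init/upstart service named glance-api; use whatever service-management mechanism your deployment actually provides:

# Restart glance-api on the affected controller; the workers' CPU
# consumption should drop back to near zero right after the restart.
service glance-api restart
# Verify the workers were respawned and their %CPU is low again
ps aux | grep [g]lance-api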
Current plan:
We continue to investigate the issue and test possible fixes (see the comments for details). We plan to fix the issue in updates for 6.1. Dina Belova from the Scale team agreed that the issue is not a blocker for the release, but it must be fixed in the 6.1 updates.
-------
Original description by Aleksandr Shaposhnikov:
Basically, all controller nodes have a lot of glance processes consuming a lot of CPU resources,
even though there is no active provisioning or snapshotting on the cluster.
Here is some information:
root@node-8:~# glance image-list --all-tenants
+------
| ID | Name | Disk Format | Container Format | Size | Status |
+------
| 44bd0e36-
+------
root@node-8:~# top
Tasks: 443 total, 4 running, 439 sleeping, 0 stopped, 0 zombie
%Cpu(s): 35.0 us, 3.6 sy, 0.0 ni, 60.4 id, 0.1 wa, 0.0 hi, 0.8 si, 0.0 st
KiB Mem: 32913976 total, 32647224 used, 266752 free, 144372 buffers
KiB Swap: 16777212 total, 1476 used, 16775736 free. 15889604 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5872 glance 20 0 2963992 159628 9316 R 24.2 0.5 96:22.79 glance-api
5866 glance 20 0 2921064 113452 9316 S 18.2 0.3 110:35.66 glance-api
5867 glance 20 0 3576688 115964 9316 S 18.2 0.4 103:15.95 glance-api
5868 glance 20 0 5173904 144564 9316 S 18.2 0.4 104:06.07 glance-api
5869 glance 20 0 2985956 175728 9316 S 18.2 0.5 83:01.87 glance-api
5870 glance 20 0 2939812 135292 9316 S 18.2 0.4 111:22.43 glance-api
5871 glance 20 0 2954588 151700 9320 S 18.2 0.5 119:42.82 glance-api
5873 glance 20 0 2997468 192524 9292 S 18.2 0.6 109:54.66 glance-api
5874 glance 20 0 2981164 179176 9316 S 18.2 0.5 105:45.58 glance-api
5876 glance 20 0 3009780 200332 9292 S 18.2 0.6 99:36.75 glance-api
5856 cinder 20 0 282696 87076 3920 S 12.1 0.3 20:23.18 cinder-api
5865 glance 20 0 2988316 184412 9320 S 12.1 0.6 108:24.61 glance-api
5875 glance 20 0 2988312 183496 9316 R 12.1 0.6 93:03.88 glance-api
4 root 20 0 0 0 0 S 6.1 0.0 1:48.71 kworker/0:0
MOS 6.1 build #521
Will attach a snapshot later, once it has been downloaded.
tags: | added: scale |
Changed in mos: | |
assignee: | nobody → MOS Glance (mos-glance) |
Changed in mos: | |
milestone: | none → 6.1 |
importance: | Undecided → High |
tags: | added: glance |
Changed in mos: | |
status: | New → Confirmed |
assignee: | MOS Glance (mos-glance) → Mike Fedosin (mfedosin) |
Changed in mos: | |
assignee: | Mike Fedosin (mfedosin) → Inessa Vasilevskaya (ivasilevskaya) |
tags: | added: 6.1rc2 |
Changed in mos: | |
assignee: | Inessa Vasilevskaya (ivasilevskaya) → Mike Fedosin (mfedosin) |
description: | updated |
tags: | added: 6.1scale removed: 6.1rc2 scale |
tags: | added: scale removed: 6.1scale |
description: | updated |
description: | updated |
description: | updated |
description: | updated |
Changed in mos: | |
milestone: | 6.1 → 6.1-updates |
tags: | added: 6.1-mu-1 |
Changed in mos: | |
status: | Confirmed → Fix Committed |
Changed in mos: | |
milestone: | 6.1-updates → 6.1-mu-1 |
Changed in mos: | |
status: | Fix Committed → In Progress |
tags: | added: 6.0-mu-5 done release-notes |
tags: | added: customer-found support |
strace of glance-api shows that we have many hundreds of events per second like this:
poll([{fd=5, events=POLLIN|POLLPRI|POLLERR|POLLHUP}, {fd=8, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 2, 0) = 0 (Timeout)
poll([{fd=5, events=POLLIN|POLLPRI|POLLERR|POLLHUP}, {fd=8, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 2, 0) = 0 (Timeout)
--- was eaten by mices ---
poll([{fd=5, events=POLLIN|POLLPRI|POLLERR|POLLHUP}, {fd=8, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 2, 0) = 0 (Timeout)
poll([{fd=5, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
recvfrom(8, 0x7f748ef547d4, 7, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
root@node-8:~# time strace -p 5871 -o api.txt
Process 5871 attached
^CProcess 5871 detached
real 0m3.986s
user 0m0.153s
sys 0m0.310s
root@node-8:~# cat api.txt | wc -l
8916
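A quicker way to confirm the busy poll loop than counting raw strace output lines is strace's syscall summary mode. A sketch, attaching to the same worker PID as above:

# Attach for a few seconds, then press Ctrl-C; on detach strace prints
# a per-syscall summary (call count, errors, time). A very large number
# of poll/recvfrom calls over such a short interval confirms busy polling.
strace -c -p 5871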