keystone maxed out during overcloud deploy

Bug #1643006 reported by Derek Higgins
Affects Status Importance Assigned to Milestone
tripleo
Expired
Undecided
Unassigned

Bug Description

Using RDO newton (on virt)

I tried deploying an overcloud with 3 controllers and 40 computes. During the deployment, keystone-admin appears to be maxed out at 110%+ CPU usage and is using 1.353 GB of RAM.

From tailing the logs, keystone-admin seems to be getting through about 2 to 3 requests per second (rough guess).

Also, if I'm reading netstat correctly, lots of requests are being queued up:

[root@undercloud-scale httpd]# netstat -pn | grep -i 35357 | grep ESTABLISHED | grep http
tcp 0 0 192.0.2.1:35357 192.0.2.1:42816 ESTABLISHED 16088/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42852 ESTABLISHED 9880/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42904 ESTABLISHED 552/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42974 ESTABLISHED 9868/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42824 ESTABLISHED 9874/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42938 ESTABLISHED 30259/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42830 ESTABLISHED 15040/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42896 ESTABLISHED 12372/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42868 ESTABLISHED 8380/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42958 ESTABLISHED 12529/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42978 ESTABLISHED 8396/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42908 ESTABLISHED 12328/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42818 ESTABLISHED 9721/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42950 ESTABLISHED 14164/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42976 ESTABLISHED 29083/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42902 ESTABLISHED 14158/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42856 ESTABLISHED 16090/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42898 ESTABLISHED 29154/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42970 ESTABLISHED 12601/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42838 ESTABLISHED 12418/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42960 ESTABLISHED 12316/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42966 ESTABLISHED 14165/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42822 ESTABLISHED 15041/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42920 ESTABLISHED 9873/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42948 ESTABLISHED 16170/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42940 ESTABLISHED 6286/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42890 ESTABLISHED 15042/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42942 ESTABLISHED 8391/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42832 ESTABLISHED 26971/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42906 ESTABLISHED 16097/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42968 ESTABLISHED 26191/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42982 ESTABLISHED 6277/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42858 ESTABLISHED 8448/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42820 ESTABLISHED 9884/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42944 ESTABLISHED 14161/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42900 ESTABLISHED 9883/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42850 ESTABLISHED 16089/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42860 ESTABLISHED 15034/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42876 ESTABLISHED 28826/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42946 ESTABLISHED 12281/httpd
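Output like the above can be tallied rather than eyeballed. A minimal sketch (using a few sample lines in the same format, not the live output) that counts established connections to the keystone admin port:

```python
# Tally ESTABLISHED connections to the keystone admin port (35357)
# from netstat-style output. The sample stands in for live output.
sample = """\
tcp 0 0 192.0.2.1:35357 192.0.2.1:42816 ESTABLISHED 16088/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42852 ESTABLISHED 9880/httpd
tcp 0 0 192.0.2.1:35357 192.0.2.1:42904 ESTABLISHED 552/httpd
"""

def count_established(netstat_output, port=35357):
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Field 3 is the local address, field 5 the connection state.
        if (len(fields) >= 6
                and fields[3].endswith(f":{port}")
                and fields[5] == "ESTABLISHED"):
            count += 1
    return count

print(count_established(sample))  # → 3
```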

Tags: scale
Steven Hardy (shardy)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → ocata-2
Revision history for this message
Derek Higgins (derekh) wrote :

I've run some tests against keystone while the overcloud deploy is ongoing; at some stages keystone is taking 2 minutes to authenticate a token.
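A rough queueing sketch (using the approximate figures from this report, not measurements) of why latency balloons like this: by Little's law, a 2-minute wait at ~3 requests per second implies on the order of 360 requests in flight at once:

```python
# Little's law: L = lambda * W
# (requests in system = throughput * average wait time).
# Both inputs are the rough figures quoted in this bug report.
service_rate = 3.0    # requests per second (observed throughput)
wait_seconds = 120.0  # worst-case token authentication latency observed

requests_in_flight = service_rate * wait_seconds
print(requests_in_flight)  # → 360.0
```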

The DB isn't being overly stressed. I suspect the keystone-admin Python process is CPU-bound, and since we only have one of them, it doesn't scale across CPUs.

I've also tried tweaking the WSGI config (specifically the number of processes) and tested it with the "ab" benchmarking tool; nothing I tried increased the number of requests keystone-admin could process (about 3 per second).
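If the bottleneck really were a single CPU-bound WSGI process, throughput should scale roughly linearly with the process count (up to the core count). A back-of-envelope sketch, with the per-request CPU cost assumed from the ~3 requests/second observed:

```python
# Back-of-envelope throughput model for a CPU-bound WSGI service.
# per_request_cpu is an assumption inferred from ~3 req/s with one
# worker process; real numbers would need profiling.
per_request_cpu = 1.0 / 3.0  # seconds of CPU per request

def max_throughput(processes, cores=4):
    # Scaling caps at the core count; below that, each extra
    # process should add roughly 1/per_request_cpu req/s.
    effective = min(processes, cores)
    return effective / per_request_cpu

print(max_throughput(1))  # → 3.0
print(max_throughput(4))  # → 12.0
```

That adding processes did not raise throughput in practice suggests this simple model does not hold here, i.e. the requests are serializing on something other than raw CPU per worker.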

Changed in tripleo:
milestone: ocata-2 → ocata-3
Changed in tripleo:
milestone: ocata-3 → ocata-rc1
Changed in tripleo:
milestone: ocata-rc1 → ocata-rc2
Changed in tripleo:
milestone: ocata-rc2 → pike-1
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Changed in tripleo:
milestone: pike-3 → pike-rc1
Revision history for this message
Ben Nemec (bnemec) wrote :

This has been open for nearly a year, and nobody's done any recent work to confirm that it's still even present. I don't think it needs to be prioritized for pike. Deferring to queens.

Changed in tripleo:
milestone: pike-rc1 → queens-1
tags: added: scale
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Revision history for this message
Emilien Macchi (emilienm) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (FUTURE, PIKE, QUEENS, ROCKY, STEIN).
  Valid example: CONFIRMED FOR: FUTURE

Changed in tripleo:
importance: High → Undecided
status: Triaged → Expired
