Keystone returns 503 at 6 rps when getting tokens and authenticating

Bug #1583095 reported by Leontii Istomin
Affects: Mirantis OpenStack (status tracked in 10.0.x)
  10.0.x: Confirmed, High, assigned to MOS Keystone
  9.x:    Confirmed, High, assigned to MOS Keystone

Bug Description

Detailed bug description:
While running the Rally scenario http://paste.openstack.org/show/497466/ we hit the following Rally error: http://paste.openstack.org/show/497475/
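In case the scenario paste becomes unavailable, here is a minimal sketch of a comparable Rally task using the rps runner (the scenario name and context are assumptions; the rate and iteration count match the attached reports):

  {
      "Authenticate.keystone": [
          {
              "runner": {
                  "type": "rps",
                  "rps": 6,
                  "times": 3000
              },
              "context": {
                  "users": {
                      "tenants": 1,
                      "users_per_tenant": 1
                  }
              }
          }
      ]
  }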

Steps to reproduce:
1. Fuel 9.0-308 has been deployed
2. curl -s 'https://review.openstack.org/gitweb?p=openstack/fuel-web.git;a=patch;h=6106dfa026b042dac26ed77354321115b78aae5b' | patch -b -d /usr/lib/python2.7/site-packages -p2
rpm -Uvh epel-release-latest-7.noarch.rpm
sed -i s/^enabled=1/enabled=0/g /etc/yum.repos.d/epel.repo
yum --enablerepo=epel install uwsgi uwsgi-plugin-python python-uwsgidecorator
service receiverd restart && service nailgun restart
due to https://bugs.launchpad.net/fuel/+bug/1569859 and https://bugs.launchpad.net/fuel/+bug/1570509
3. fuel-agent and fuel-library have been updated (http://paste.openstack.org/show/496825/) due to https://bugs.launchpad.net/fuel/+bug/1543233 and https://bugs.launchpad.net/fuel/+bug/1574999
4. a patch has been applied to keep rotated logs: http://paste.openstack.org/show/495857/
5. a patch has been applied to increase rsyslog chunks: http://paste.openstack.org/show/496901/
6. LMA, ElasticSearch, Grafana plugins have been installed:
yum -y install createrepo rpm rpm-build dpkg-devel git
easy_install pip
pip install fuel-plugin-builder
git clone https://github.com/openstack/fuel-plugin-lma-collector.git
fpb --check ./fuel-plugin-lma-collector
fpb --build ./fuel-plugin-lma-collector
fuel plugins --install ./fuel-plugin-lma-collector/*.noarch.rpm
git clone https://github.com/openstack/fuel-plugin-elasticsearch-kibana.git
fpb --check ./fuel-plugin-elasticsearch-kibana
fpb --build ./fuel-plugin-elasticsearch-kibana
fuel plugins --install ./fuel-plugin-elasticsearch-kibana/*.noarch.rpm
git clone https://github.com/openstack/fuel-plugin-influxdb-grafana.git
fpb --check ./fuel-plugin-influxdb-grafana
fpb --build ./fuel-plugin-influxdb-grafana
fuel plugins --install ./fuel-plugin-influxdb-grafana/*.noarch.rpm
patch -b -d /var/www/nailgun/plugins/lma_collector-0.10/ -p1 < lma.patch (http://paste.openstack.org/show/495328/)
fuel plugins --sync
7. deployed a cluster: 3 controllers, 20 compute+Ceph nodes, 172 compute nodes, VXLAN+DVR, Ceph for all
8. applied https://review.fuel-infra.org/#/c/20466/ on each controller node and restarted Apache. We also switched keystone.conf to the dummy revocation driver:
  [revoke]
  driver = keystone.revoke.backends.dummy.Revoke
9. commented out the following lines in /etc/haproxy/conf.d/020-keystone-1.cfg and /etc/haproxy/conf.d/030-keystone-2.cfg (due to https://bugs.launchpad.net/fuel/+bug/1582202):
  #stick on src
  #stick-table type ip size 200k expire 2m
restarted haproxy via corosync
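For reference, restarting the Pacemaker-managed haproxy can be done with something like the command below (the resource name p_haproxy is an assumption based on a standard MOS deployment; pcs has an equivalent command):
  crm resource restart p_haproxy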
10. increased the number of Keystone processes by setting the processes parameter to 12 in /etc/apache2/sites-enabled/05-keystone_wsgi_admin.conf on each controller node and restarted apache2:
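A rough sketch of the resulting mod_wsgi directive (every parameter except processes is an assumption based on a typical puppet-generated config):
  WSGIDaemonProcess keystone_admin user=keystone group=keystone processes=12 threads=1 display-name=%{GROUP}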

11. run the Rally scenario
Expected results:
 rally test passed
Actual result:
 rally test failed
Reproducibility:
 each time
Workaround:
 reduce rps parameter
Impact:
 keystone performance
Description of the environment:
- Operating system: Ubuntu
- Versions of components: mos9.0
- Reference architecture: 3 controllers, 20 compute+Ceph nodes, 172 compute nodes, VXLAN+DVR, Ceph for all
- Network model: vxlan+dvr
- Related projects installed: LMA
Additional information:
Logs and /etc folders from the controller nodes:
http://mos-scale-share.mirantis.com/bug_1583095_node-179_etc.tar.gz
http://mos-scale-share.mirantis.com/bug_1583095_node-179_logs.tar.gz
http://mos-scale-share.mirantis.com/bug_1583095_node-77_etc.tar.gz
http://mos-scale-share.mirantis.com/bug_1583095_node-77_logs.tar.gz
http://mos-scale-share.mirantis.com/bug_1583095_node-96_etc.tar.gz
http://mos-scale-share.mirantis.com/bug_1583095_node-96_logs.tar.gz

Revision history for this message
Leontii Istomin (listomin) wrote :

Attached a Rally report with:
3 rps and 1000 iterations - success
6 rps and 3000 iterations - failed

Changed in mos:
milestone: none → 9.0
Changed in mos:
status: New → Confirmed
importance: Undecided → High
tags: added: area-keystone
description: updated
description: updated
description: updated
Revision history for this message
Alexander Makarov (amakarov) wrote :

We use dogpile.cache MemcachedLock for distributed locking, which has way too long wait timeouts.
Looks like we need to patch oslo.cache.
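For context, the lock in question comes from dogpile's memcached backends, and its timeout is tunable through backend arguments; a minimal keystone.conf sketch (the backend choice and values here are assumptions for illustration, not the proposed fix):
  [cache]
  backend = dogpile.cache.memcached
  backend_argument = url:127.0.0.1:11211
  backend_argument = distributed_lock:True
  backend_argument = lock_timeout:2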

Revision history for this message
Ilya Shakhat (shakhat) wrote :

Even with 1 process / 1 thread and an idle cloud, Keystone sometimes takes several seconds to respond - see http://paste.openstack.org/show/497502/
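A quick way to see the same per-request latency from the CLI (assuming admin credentials are sourced from an openrc file):
  time openstack token issue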

Revision history for this message
Leontii Istomin (listomin) wrote :

It seems rsyslog is the root cause of the Keystone issue:
with rsyslog running, "time openstack user list" takes 1m11.814s;
without rsyslog it takes 0m3.722s.
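A minimal sketch of that comparison (how exactly rsyslog was stopped is an assumption):
  time openstack user list    # rsyslog running: ~1m11.8s
  service rsyslog stop
  time openstack user list    # rsyslog stopped: ~0m3.7s
  service rsyslog start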
