Mirantis OpenStack

Keystone is slow and unreliable on big clusters

Bug #1566802 reported by Sergey Kolekonov on 2016-04-06

This bug affects 3 people

Affects             Status                    Importance  Assigned to   Milestone
Mirantis OpenStack  Status tracked in 10.0.x
10.0.x              Invalid                   Critical    MOS Keystone  Mirantis OpenStack 10.0

Bug Description

Steps to reproduce:
Deploy a 9.0 environment with 3 controllers and a large number (>50) of compute nodes
9.0 ISO #121
Start 100 VMs (micro flavor) simultaneously using Horizon

Expected results:
All VMs are in the Active state

Actual result:
Keystone is extremely slow and sometimes does not respond at all; many VMs end up in the ERROR state.

The following errors can be found in Apache logs:

Timeout when reading response headers from daemon process 'keystone_main': /usr/lib/cgi-bin/keystone/main
Resource temporarily unavailable: [client 192.168.0.2:42830] mod_wsgi (pid=10571): Unable to connect to WSGI daemon process 'keystone_main' on '/var/run/apache2/wsgi.33414.3.2.sock' after multiple attempts as listener backlog limit was exceeded.
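
To gauge how often these two conditions occur during a reproduction run, the Apache error log can be scanned for them. The following is only a minimal sketch: the function name, the pattern labels, and the idea of matching on message substrings are assumptions for illustration, and the sample lines are the two messages quoted above.

```python
from collections import Counter

# Substrings taken from the two error messages quoted in this report.
# The dictionary keys are hypothetical labels chosen for this sketch.
PATTERNS = {
    "daemon_timeout": "Timeout when reading response headers from daemon process",
    "backlog_exceeded": "listener backlog limit was exceeded",
}


def count_wsgi_errors(lines):
    """Count how many log lines contain each known mod_wsgi error message."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Timeout when reading response headers from daemon process "
        "'keystone_main': /usr/lib/cgi-bin/keystone/main",
        "Resource temporarily unavailable: mod_wsgi (pid=10571): Unable to "
        "connect to WSGI daemon process 'keystone_main' after multiple "
        "attempts as listener backlog limit was exceeded.",
    ]
    for name, total in count_wsgi_errors(sample).items():
        print(name, total)
```

In practice the lines would come from the Apache error log file on each controller rather than from an inline list.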

Tags:

Ivan Berezovskiy (iberezovskiy) on 2016-04-06

Changed in mos:
status:	New → Confirmed

Alexander Makarov (amakarov) wrote on 2016-04-07:

Suggestions:
- in keystone.conf set revoke_by_id=False
- increase the number of workers in the Apache config severalfold
- increase the thread count in the memcached config (looks most promising)
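
As an illustration of the first suggestion, here is a minimal sketch that rewrites the option in a keystone.conf-style text. The comment above does not say which section the option lives in; placing it under `[token]` follows upstream Keystone's sample configuration and is an assumption here, as is the helper name `set_revoke_by_id`.

```python
import configparser
import io


def set_revoke_by_id(conf_text, enabled=False):
    """Return conf_text with revoke_by_id set in the [token] section.

    Assumption: the option belongs to [token], as in upstream Keystone's
    sample configuration. Note that configparser drops comments on write,
    so this is not suitable for editing a hand-maintained file in place.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("token"):
        parser.add_section("token")
    parser.set("token", "revoke_by_id", "true" if enabled else "false")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[token]\nprovider = fernet\nrevoke_by_id = true\n"
    print(set_revoke_by_id(sample))
```

The other two suggestions are changes to the Apache and memcached service configuration and are not covered by this sketch.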

Alexander Makarov (amakarov) wrote on 2016-04-08:

Important part:
pylibmc appears more stable than python-memcached, so I strongly suggest using it in our ISO as the default for the Keystone cache.

Alexander Petrov (apetrov-n) on 2016-04-15

tags:

added: area-keystone keystone

Max Yatsenko (myatsenko) wrote on 2016-05-10:

@amakarov:
These are memcached parameters:

-t <num> number of threads to use (default: 4)
-I Override the size of each slab page. Adjusts max item size
             (default: 1mb, min: 1k, max: 128m)
-L Try to use large memory pages (if available). Increasing
             the memory page size could reduce the number of TLB misses
             and improve the performance. In order to get large pages
             from the OS, memcached will allocate the total item-cache
             in one large chunk.

In the Puppet manifests:
- the 'item_size' parameter sets '-I'
- the 'processorcount' parameter sets '-t'
- the 'large_mem_pages' parameter should set '-L' (it should be true/false), but I was not able to find it for 9.0 (which uses puppet-memcached version 2.5.0).

For 9.0, '-I' should now have the value '10m'.
By default '-t' is set to the number of CPUs (processorcount), but in this patch I increase it fourfold (for testing purposes only):
https://review.openstack.org/#/c/313701/1/deployment/puppet/osnailyfacter/manifests/memcached/memcached.pp :
   ....
   ..
   item_size => '10m',
   processorcount => $threads,
   ...
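
The mapping from these settings to memcached flags can be sketched as follows. This is only an illustration of the arithmetic described above (CPU count multiplied by four, item size of 10m); the function name and its parameters are hypothetical and do not come from the Puppet module.

```python
import os


def memcached_args(cpu_count, thread_multiplier=4, item_size="10m",
                   large_pages=False):
    """Build memcached command-line flags from the settings discussed above.

    -t: worker threads (CPU count times the multiplier)
    -I: slab page size, which bounds the maximum item size
    -L: request large memory pages, only added when large_pages is True
    """
    threads = cpu_count * thread_multiplier
    args = ["-t", str(threads), "-I", item_size]
    if large_pages:
        args.append("-L")
    return args


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to a single CPU.
    print(" ".join(memcached_args(os.cpu_count() or 1)))
```

With a multiplier of 1 the thread count matches the default described above, where '-t' equals the number of CPUs.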

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2016-05-10: Fix proposed to openstack/keystone (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/20466

Alexander Makarov (amakarov) wrote on 2016-05-10:

Another approach: https://review.fuel-infra.org/20466
To disable the revocation tree, set driver = dummy in the [revoke] section.

Sheena Conant (sheena-conant) on 2016-05-10

tags:

added: scale

Alexander Makarov (amakarov) wrote on 2016-05-16:

Please confirm the patch fixes the issue.

Changed in mos:
assignee:	MOS Keystone (mos-keystone) → Leontiy Istomin (listomin)

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2016-05-17: Fix merged to openstack/keystone (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/20466
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: dea8f1f9fef349797afdbd9c613a4275eca4aa00
Author: Alexander Makarov <email address hidden>
Date: Tue May 17 13:12:33 2016

Revocation driver stub

We don't need revocation tree for Fernet token, and it abuses memcached.

To enable dummy driver set in /etc/keystone/keystone.conf:

[revoke]
driver = keystone.revoke.backends.dummy.Revoke

Change-Id: Ic1dc4f8d071485277cba77e362c66d4130edbe94
Closes-Bug: 1566802

Timur Nurlygayanov (tnurlygayanov) wrote on 2016-05-19:

Marked as Fix committed for MOS 9.0 because patch was merged.

Changed in mos:
status:	Confirmed → Fix Committed

Timur Nurlygayanov (tnurlygayanov) wrote on 2016-05-19:

Please change the status to Confirmed if the issue is reproduced again.

What about a fix for MOS 10.0?

Leontii Istomin (listomin) wrote on 2016-05-19:

#10

We tried this approach in https://bugs.launchpad.net/mos/+bug/1583095
BTW, the fix does not include the keystone.conf change.

Alexander Makarov (amakarov) wrote on 2016-05-19:

#11

The configuration needs to be changed for the patch to work:

To enable dummy driver set in /etc/keystone/keystone.conf:

[revoke]
driver = keystone.revoke.backends.dummy.Revoke
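
Since the patch has no effect without this setting, it can be useful to check which revoke driver a given keystone.conf actually selects. The following is a minimal sketch; the helper names are hypothetical, and the two accepted values are the full class path shown above and the short form `dummy` mentioned earlier in this thread.

```python
import configparser

# Values taken from this bug report: the full class path and the short form.
DUMMY_DRIVERS = ("keystone.revoke.backends.dummy.Revoke", "dummy")


def revoke_driver(conf_text):
    """Return the [revoke] driver value, or None if it is not set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.get("revoke", "driver", fallback=None)


def uses_dummy_revoke(conf_text):
    """True if the configuration selects the dummy revocation driver."""
    return revoke_driver(conf_text) in DUMMY_DRIVERS


if __name__ == "__main__":
    sample = "[revoke]\ndriver = keystone.revoke.backends.dummy.Revoke\n"
    print(uses_dummy_revoke(sample))
```

The same check can be used in reverse to confirm that the dummy driver is no longer configured on an environment.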

Dmitry Klenov (dklenov) wrote on 2016-05-20:

#12

Removing Fuel project. Bug is tracked in 2 versions of Mirantis OpenStack project.

no longer affects:

fuel

Sergey Shevorakov (sshevorakov) on 2016-05-20

Changed in mos:
importance:	High → Critical

Leontii Istomin (listomin) wrote on 2016-05-25:

#13

Mos-puppet, we need to change keystone.conf:
[revoke]
driver = keystone.revoke.backends.dummy.Revoke
Please implement that for stable/mitaka

Changed in mos:
assignee:	Leontiy Istomin (listomin) → MOS Puppet Team (mos-puppet)

Ivan Berezovskiy (iberezovskiy) wrote on 2016-05-25:

#14

Returning status to confirmed because fix is required

Changed in mos:
status:	Fix Committed → Confirmed

Denis Egorenko (degorenko) wrote on 2016-05-25:

#15

Fix for mitaka is here: https://review.openstack.org/321058

Changed in mos:
assignee:	MOS Puppet Team (mos-puppet) → Denis Egorenko (degorenko)
status:	Confirmed → In Progress

Denis Egorenko (degorenko) wrote on 2016-05-25:

#16

Master: https://review.openstack.org/321086

Denis Egorenko (degorenko) wrote on 2016-05-30:

#17

Master will be fixed once the mos-10.0 branch is prepared, because the fix includes a custom patch.

Changed in mos:
status:	In Progress → Confirmed
status:	Confirmed → In Progress
status:	In Progress → Fix Committed

Boris Bobrov (bbobrov) wrote on 2016-05-30:

#18

Please stop. Using the dummy driver is a bad approach. It removes a major part of the functionality, even with Fernet tokens. Here is a script for testing: http://paste.openstack.org/show/506324/ . It disables a domain, after which a token for the disabled domain should be invalid. But with the dummy driver, the token is still valid.

Both changes above should be reverted. There should be another fix. Cutting out functionality from keystone is a bad solution to performance problems.
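
The effect described in this comment can be illustrated with a toy model. This is not Keystone code and not the linked test script: the classes below are hypothetical stand-ins showing why a backend that discards revocation events keeps accepting tokens for a disabled domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    user: str
    domain_id: str


class RecordingRevoke:
    """Toy backend that remembers which domains were disabled."""

    def __init__(self):
        self._revoked_domains = set()

    def revoke_domain(self, domain_id):
        self._revoked_domains.add(domain_id)

    def is_revoked(self, token):
        return token.domain_id in self._revoked_domains


class DummyRevoke:
    """Toy no-op backend: revocation events are silently discarded."""

    def revoke_domain(self, domain_id):
        pass

    def is_revoked(self, token):
        return False


def validate(token, backend):
    """A token is valid unless the backend reports it as revoked."""
    return not backend.is_revoked(token)


if __name__ == "__main__":
    token = Token(user="alice", domain_id="d1")
    for backend in (RecordingRevoke(), DummyRevoke()):
        backend.revoke_domain("d1")  # models disabling the domain
        print(type(backend).__name__, validate(token, backend))
```

In this model the recording backend rejects the token once the domain is disabled, while the no-op backend continues to accept it, which is the behaviour the comment objects to.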

Boris Bobrov (bbobrov) wrote on 2016-05-30:

#19

Opened https://bugs.launchpad.net/fuel/+bug/1587136 .

Changed in mos:
status:	Fix Committed → Confirmed

Ivan Berezovskiy (iberezovskiy) on 2016-05-30

Changed in mos:
assignee:	Denis Egorenko (degorenko) → MOS Keystone (mos-keystone)

Dina Belova (dbelova) wrote on 2016-05-31:

#20

Setting to Incomplete until the future of https://bugs.launchpad.net/fuel/+bug/1587136 is decided

Changed in mos:
status:	Confirmed → Incomplete

Boris Bobrov (bbobrov) wrote on 2016-05-31:

#21

We have decided to run the tests again with the dummy driver reverted. We are leaving the bug report in "Incomplete" while we wait for the results.

Dina Belova (dbelova) wrote on 2016-06-07:

#22

Marking as Invalid for now, based on preliminary analysis done against a small env and a scale env (though with a slightly outdated MOS 9.0 ISO). More specific tests are currently in progress with a fresher ISO, but it looks like the original issue went away once the syslog bug was fixed.

Changed in mos:
status:	Incomplete → Invalid

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2017-04-20: Fix proposed to openstack/keystone (mcp/newton)

#23

Fix proposed to branch: mcp/newton
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/33585

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2017-04-20: Fix proposed to openstack/keystone (11.0/ocata)

#24

Fix proposed to branch: 11.0/ocata
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/34078

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2017-04-24: Fix proposed to openstack/keystone (mcp/ocata)

#25

Fix proposed to branch: mcp/ocata
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/34785

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2017-04-26: Change abandoned on openstack/keystone (mcp/ocata)

#26

Change abandoned by Roman Podoliaka <email address hidden> on branch: mcp/ocata
Review: https://review.fuel-infra.org/34785
Reason: according to https://bugs.launchpad.net/fuel/+bug/1587136 this wasn't a good idea - we don't need this anymore

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2017-04-26: Change abandoned on openstack/keystone (mcp/newton)

#27

Change abandoned by Roman Podoliaka <email address hidden> on branch: mcp/newton
Review: https://review.fuel-infra.org/33585
Reason: according to https://bugs.launchpad.net/fuel/+bug/1587136 this wasn't a good idea - we don't need this anymore

Fuel Devops McRobotson (fuel-devops-robot) wrote on 2017-04-26: Change abandoned on openstack/keystone (11.0/ocata)

#28

Change abandoned by Roman Podoliaka <email address hidden> on branch: 11.0/ocata
Review: https://review.fuel-infra.org/34078
Reason: we don't use 11.0/ocata anymore - mcp/ocata is the correct branch name
