Keystone is slow and unreliable on big clusters

Bug #1566802 reported by Sergey Kolekonov
This bug affects 3 people
Affects: Mirantis OpenStack (status tracked in 10.0.x)
Series: 10.0.x
Status: Invalid
Importance: Critical
Assigned to: MOS Keystone

Bug Description

Steps to reproduce:
Deploy a 9.0 environment with 3 controllers and a large number (>50) of compute nodes
9.0 ISO #121
Start 100 VMs (micro flavor) simultaneously using Horizon

Expected results:
All VMs are in Active state

Actual result:
Keystone responds extremely slowly and sometimes does not respond at all; many VMs end up in ERROR state.

The following errors can be found in Apache logs:

Timeout when reading response headers from daemon process 'keystone_main': /usr/lib/cgi-bin/keystone/main
Resource temporarily unavailable: [client 192.168.0.2:42830] mod_wsgi (pid=10571): Unable to connect to WSGI daemon process 'keystone_main' on '/var/run/apache2/wsgi.33414.3.2.sock' after multiple attempts as listener backlog limit was exceeded.
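
To gauge how often each of the two failures above occurs, the Apache error log could be tallied with a small script. This is only a sketch: the regular expressions are derived from the two log lines quoted above, and the function name `classify_wsgi_errors` is made up for illustration.

```python
import re
from collections import Counter

# Patterns derived from the two error messages quoted in this report.
PATTERNS = {
    "daemon_timeout": re.compile(
        r"Timeout when reading response headers from daemon process '([^']+)'"
    ),
    "backlog_exceeded": re.compile(
        r"Unable to connect to WSGI daemon process '([^']+)'"
        r".*listener backlog limit was exceeded"
    ),
}


def classify_wsgi_errors(lines):
    """Count mod_wsgi failures per (kind, daemon process name)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, match.group(1))] += 1
                break  # a line is counted under at most one kind
    return counts
```

A usage example would be `classify_wsgi_errors(open(path))`, where `path` points at the Apache error log of a controller; the log location is deployment-specific and not stated in this report.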

Changed in mos:
status: New → Confirmed
Alexander Makarov (amakarov) wrote :

Suggestions:
- in keystone.conf set revoke_by_id=False
- increase the number of workers in the Apache config severalfold
- increase the number of threads in the memcached config (looks most promising)

Alexander Makarov (amakarov) wrote :

Important part:
pylibmc appears more stable than python-memcached, so I strongly suggest using it in our ISO as the default for the keystone cache.

tags: added: area-keystone keystone
Max Yatsenko (myatsenko) wrote :

@amakarov:
These are memcached parameters:

-t <num> number of threads to use (default: 4)
-I Override the size of each slab page. Adjusts max item size
             (default: 1mb, min: 1k, max: 128m)
-L Try to use large memory pages (if available). Increasing
             the memory page size could reduce the number of TLB misses
             and improve the performance. In order to get large pages
             from the OS, memcached will allocate the total item-cache
             in one large chunk.

In the puppet manifests:
  'item_size' sets the '-I' parameter
  'processorcount' sets the '-t' parameter
  'large_mem_pages' should set the '-L' parameter; it should be true/false,
  but I was not able to find it for 9.0 (which uses puppet-memcached version 2.5.0).

For 9.0, the '-I' parameter should have the value '10m'.
The '-t' parameter is set by default to the number of CPUs (processorcount), but in this patch I increase it by a factor of 4 (for testing only):
https://review.openstack.org/#/c/313701/1/deployment/puppet/osnailyfacter/manifests/memcached/memcached.pp :
   ....
   ..
   item_size => '10m',
   processorcount => $threads,
   ...
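
The mapping described in this comment can be sketched as follows. It only illustrates which manifest parameter corresponds to which memcached flag; it is not how puppet-memcached renders its configuration, and the function name and the CPU count are hypothetical.

```python
def memcached_args(item_size=None, processorcount=None, large_mem_pages=False):
    """Translate the manifest parameters named above into memcached flags."""
    args = []
    if processorcount is not None:
        args += ["-t", str(processorcount)]  # number of threads
    if item_size is not None:
        args += ["-I", str(item_size)]       # slab page size / max item size
    if large_mem_pages:
        args.append("-L")                    # try to use large memory pages
    return args


cpus = 4  # hypothetical CPU count of a controller
# Threads increased by a factor of 4 over the CPU count, as in the test patch.
print(memcached_args(item_size="10m", processorcount=cpus * 4))
# prints ['-t', '16', '-I', '10m']
```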

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/keystone (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/20466

Alexander Makarov (amakarov) wrote :

Another approach: https://review.fuel-infra.org/20466
To disable the revocation tree, set [revoke] driver = dummy

tags: added: scale
Alexander Makarov (amakarov) wrote :

Please confirm the patch fixes the issue.

Changed in mos:
assignee: MOS Keystone (mos-keystone) → Leontiy Istomin (listomin)
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/keystone (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/20466
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: dea8f1f9fef349797afdbd9c613a4275eca4aa00
Author: Alexander Makarov <email address hidden>
Date: Tue May 17 13:12:33 2016

Revocation driver stub

We don't need revocation tree for Fernet token, and it abuses memcached.

To enable dummy driver set in /etc/keystone/keystone.conf:

[revoke]
driver = keystone.revoke.backends.dummy.Revoke

Change-Id: Ic1dc4f8d071485277cba77e362c66d4130edbe94
Closes-Bug: 1566802

Timur Nurlygayanov (tnurlygayanov) wrote :

Marked as Fix committed for MOS 9.0 because patch was merged.

Changed in mos:
status: Confirmed → Fix Committed
Timur Nurlygayanov (tnurlygayanov) wrote :

Please change the status to Confirmed if the issue is reproduced again.

What about a fix for MOS 10.0?

Leontii Istomin (listomin) wrote :

We have tried this approach here: https://bugs.launchpad.net/mos/+bug/1583095
BTW, the fix does not include the change to the keystone.conf file.

Alexander Makarov (amakarov) wrote :

The configuration needs to be changed for the patch to work:

To enable the dummy driver, set in /etc/keystone/keystone.conf:

[revoke]
driver = keystone.revoke.backends.dummy.Revoke
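
For a test environment, the same change could be applied programmatically. The following is a minimal sketch using Python's standard configparser; the helper name `set_revoke_driver` is hypothetical, and the deployments discussed in this report apply the setting through Puppet rather than a script like this. Note that configparser drops comments when rewriting a file and, in its default strict mode, rejects options that are repeated within a section.

```python
import configparser
import io

DUMMY_DRIVER = "keystone.revoke.backends.dummy.Revoke"


def set_revoke_driver(conf_text, driver=DUMMY_DRIVER):
    """Return conf_text with [revoke] driver set to the given value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("revoke"):
        parser.add_section("revoke")
    parser.set("revoke", "driver", driver)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Later comments in this report argue against using the dummy driver at all, so this is shown only to make the required configuration change concrete.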

Dmitry Klenov (dklenov) wrote :

Removing Fuel project. Bug is tracked in 2 versions of Mirantis OpenStack project.

no longer affects: fuel
Changed in mos:
importance: High → Critical
Leontii Istomin (listomin) wrote :

Mos-puppet, we need to change keystone.conf:
[revoke]
driver = keystone.revoke.backends.dummy.Revoke
Please implement that for stable/mitaka

Changed in mos:
assignee: Leontiy Istomin (listomin) → MOS Puppet Team (mos-puppet)
Ivan Berezovskiy (iberezovskiy) wrote :

Returning the status to Confirmed because a fix is required.

Changed in mos:
status: Fix Committed → Confirmed
Denis Egorenko (degorenko) wrote :

Fix for mitaka is here: https://review.openstack.org/321058

Changed in mos:
assignee: MOS Puppet Team (mos-puppet) → Denis Egorenko (degorenko)
status: Confirmed → In Progress
Denis Egorenko (degorenko) wrote :

Master will be fixed once the mos-10.0 branch is prepared, because the fix includes a custom patch.

Changed in mos:
status: In Progress → Confirmed
status: Confirmed → In Progress
status: In Progress → Fix Committed
Boris Bobrov (bbobrov) wrote :

Please stop. Using the dummy driver is a bad approach. It removes a major part of the functionality, even with Fernet tokens. Here is a script for testing: http://paste.openstack.org/show/506324/ . It disables a domain, after which the token for the disabled domain should be invalid. But with the dummy driver, the token is still valid.

Both changes above should be reverted. There should be another fix. Cutting functionality out of keystone is a bad solution to performance problems.
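
The behaviour described in this comment can be illustrated with a toy model. This is not keystone's implementation and not the script from the paste link: all class and function names are hypothetical, and validation is reduced to a single check against recorded revocation events.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    domain_id: str
    issued_at: int


@dataclass
class RecordingRevoke:
    """Toy backend that remembers revocation events."""
    events: list = field(default_factory=list)

    def revoke_by_domain(self, domain_id, revoked_at):
        self.events.append((domain_id, revoked_at))

    def list_events(self):
        return list(self.events)


class DummyRevoke:
    """Toy backend that silently discards every revocation event."""

    def revoke_by_domain(self, domain_id, revoked_at):
        pass

    def list_events(self):
        return []


def is_valid(token, backend):
    # A token is invalid if its domain was revoked at or after issue time.
    return not any(
        domain_id == token.domain_id and token.issued_at <= revoked_at
        for domain_id, revoked_at in backend.list_events()
    )


def token_valid_after_domain_disable(backend):
    token = Token(domain_id="domain-a", issued_at=1)
    backend.revoke_by_domain("domain-a", revoked_at=2)  # domain disabled
    return is_valid(token, backend)


print(token_valid_after_domain_disable(RecordingRevoke()))  # prints False
print(token_valid_after_domain_disable(DummyRevoke()))      # prints True
```

In this model the only difference between the two runs is whether the backend keeps the event, which is the point being made above: with nothing recorded, nothing can invalidate the token.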

Changed in mos:
status: Fix Committed → Confirmed
Changed in mos:
assignee: Denis Egorenko (degorenko) → MOS Keystone (mos-keystone)
Dina Belova (dbelova) wrote :

Setting to Incomplete until the future of https://bugs.launchpad.net/fuel/+bug/1587136 is decided.

Changed in mos:
status: Confirmed → Incomplete
Boris Bobrov (bbobrov) wrote :

We have decided to run the tests again with the dummy driver reverted. We are leaving the bug report in "Incomplete" and waiting for the results.

Dina Belova (dbelova) wrote :

Marking as Invalid for now, based on preliminary analysis done against a small environment and a scale environment (though with a slightly outdated MOS 9.0 ISO). More specific tests are currently in progress with a fresher ISO, but it looks like the original issue went away once the syslog bug was fixed.

Changed in mos:
status: Incomplete → Invalid
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/keystone (mcp/newton)

Fix proposed to branch: mcp/newton
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/33585

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/keystone (11.0/ocata)

Fix proposed to branch: 11.0/ocata
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/34078

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/keystone (mcp/ocata)

Fix proposed to branch: mcp/ocata
Change author: Alexander Makarov <email address hidden>
Review: https://review.fuel-infra.org/34785

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/keystone (mcp/ocata)

Change abandoned by Roman Podoliaka <email address hidden> on branch: mcp/ocata
Review: https://review.fuel-infra.org/34785
Reason: according to https://bugs.launchpad.net/fuel/+bug/1587136 this wasn't a good idea - we don't need this anymore

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/keystone (mcp/newton)

Change abandoned by Roman Podoliaka <email address hidden> on branch: mcp/newton
Review: https://review.fuel-infra.org/33585
Reason: according to https://bugs.launchpad.net/fuel/+bug/1587136 this wasn't a good idea - we don't need this anymore

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/keystone (11.0/ocata)

Change abandoned by Roman Podoliaka <email address hidden> on branch: 11.0/ocata
Review: https://review.fuel-infra.org/34078
Reason: we don't use 11.0/ocata anymore - mcp/ocata is the correct branch name
