haproxy for keystone-admin takes 240-300s

Bug #1480153 reported by Matthew Mosesohn
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Released | Critical | Bogdan Dobrelya |

Bug Description

It seems that the dead time for waiting on memcached backend in keystone is too high (300s), causing the haproxy backend wait process to hang for 240-300s. We should lower the time to perhaps 15s and it will cut down the time for deployment and also minimize keystone downtime if memcached restarts.

memcache_dead_retry = 300
Setting is here:
https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/openstack/manifests/keystone.pp#L155

I came to these values by looking at puppet logs from 7.0 ubuntu BVT #112 and my local deploy.
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "113"
  build_id: "2015-07-30_16-01-07"
  nailgun_sha: "21ba6e2606a056883734392187845c172ecf99aa"
  python-fuelclient_sha: "71bb8fa87ee25f0c1bb84317884da7c917902a63"
  fuel-agent_sha: "dee9f2eb7e2822e89f6253f500f0c2e376a5b824"
  fuel-nailgun-agent_sha: "1512b9af6b41cc95c4d891c593aeebe0faca5a63"
  astute_sha: "488db988a1f2e18f99decf417371c50b2a7fb794"
  fuel-library_sha: "d1291ae75680818e715608814422075049a10ce8"
  fuel-ostf_sha: "92cdab6c6829be0d2d0c561fe56346dac8708d95"
  fuelmain_sha: "de5b333815f8541224c6726dc8446ffc7fb18b5b"

tags: added: keystone low-hanging-fruit
Changed in fuel:
milestone: none → 7.0
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → Medium
status: New → Confirmed
Oleksiy Molchanov (omolchanov) wrote :

Please take a look at the bug where we set 300: https://bugs.launchpad.net/mos/+bug/1471318

Bogdan Dobrelya (bogdando) wrote :

@Boris, could you take a look?

Boris Bobrov (bbobrov) wrote :

We cannot set memcache_dead_retry to low values because it leads to very bad performance of keystone during a controller failure. The lower the value, the more time each keystone process will spend trying to connect to dead memcache hosts.
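
The trade-off described above can be illustrated with a rough model. This is a sketch under simplifying assumptions, not keystone's or the memcache client's actual code: each process marks a failed host as dead, leaves it alone for the dead-retry period, then makes one more failed connection attempt that blocks for a connect timeout. The function names, the 600s outage, and the 3s timeout are hypothetical values chosen for illustration.

```python
def reconnect_attempts(outage_s, dead_retry_s):
    """Failed connection attempts one process makes to a dead host (toy model)."""
    if dead_retry_s <= 0:
        raise ValueError("dead_retry_s must be positive")
    # The first failure marks the host dead; afterwards the process
    # retries once each time the dead-retry window expires.
    return 1 + int(outage_s // dead_retry_s)


def seconds_blocked(outage_s, dead_retry_s, connect_timeout_s=3.0):
    """Time one process spends waiting on those failed attempts."""
    return reconnect_attempts(outage_s, dead_retry_s) * connect_timeout_s


if __name__ == "__main__":
    # Compare the values discussed in this report for a 10-minute outage.
    for dead_retry in (300, 60, 15):
        print(dead_retry,
              reconnect_attempts(600, dead_retry),
              seconds_blocked(600, dead_retry))
```

In this model the cost is per process, so it multiplies by the number of keystone processes on each controller.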

Here is a wild guess for a workaround: restart haproxy after you've restarted keystone.

Changed in fuel:
importance: Medium → High
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Oleksiy Molchanov (omolchanov)
Boris Bobrov (bbobrov)
tags: removed: low-hanging-fruit
Boris Bobrov (bbobrov) wrote :

> We should lower the time to perhaps 15s and it will cut down the time for deployment and also minimize keystone downtime if memcached restarts.

> minimize keystone downtime if memcached restarts

1. This will not help to minimize downtime.
2. It will badly affect performance if one of the nodes fails.

Please also note that the default in keystone is 300.

> It seems that the dead time for waiting on memcached backend in keystone is too high (300s), causing the haproxy backend wait process to hang for 240-300s.

Why is it bad? What issues does it cause and at what stage?

Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Matthew Mosesohn (raytrac3r)
Matthew Mosesohn (raytrac3r) wrote :

Boris, it's bad that our deployment hangs for 300s. Unnecessary delays in deploy are by nature wrong. I will try to fix this in the puppet workflow so that memcached is ready before trying to start keystone.

Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Dmitry Ilyin (idv1985)
Aleksandr Didenko (adidenko) wrote :

> I will try to fix this in the puppet workflow so that memcached is ready before trying to start keystone.

We run the 'memcached' task before 'keystone', so it is ready before keystone is started. Please check this bug, which has a detailed explanation of why we need the 300s up interval: https://bugs.launchpad.net/mos/+bug/1471318

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/209589

Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Vladimir Kuklin (vkuklin)
status: Confirmed → In Progress
Vladimir Kuklin (vkuklin) wrote :

There is also a fix that introduces a critical section for keystone multithreading: https://review.fuel-infra.org/#/c/10188/

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/212439

Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz)
Changed in fuel:
assignee: Alex Schultz (alex-schultz) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Denis Egorenko (degorenko)
Vasyl Saienko (vsaienko) wrote :

Raising it to critical, as it affects deployment. Related bug: https://bugs.launchpad.net/fuel/+bug/1484066

Changed in fuel:
importance: High → Critical
Changed in fuel:
assignee: Denis Egorenko (degorenko) → Vasyl Saienko (vsaienko)
Changed in fuel:
assignee: Vasyl Saienko (vsaienko) → Denis Egorenko (degorenko)
Changed in fuel:
assignee: Denis Egorenko (degorenko) → Vasyl Saienko (vsaienko)
tags: added: tricky
Changed in fuel:
assignee: Vasyl Saienko (vsaienko) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Vasyl Saienko (vsaienko)
Bogdan Dobrelya (bogdando) wrote :

We're testing dead retry 30 with rise 15 and will provide the results here.

Bogdan Dobrelya (bogdando) wrote :

As we also discovered today, using the command 'apachectl graceful' instead of "service apache2 stop; sleep; ...start" helps to address this issue from the other side. It applies conf changes as well, while keeping the apache2 parent process pid the same. This allows the HAProxy backend to remain UP without the additional downtime induced by a generic restart action. @Vasyl is preparing the patch.
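
Why a graceful reload avoids the long wait can be sketched with a small replay of HAProxy-style health checks: a backend goes DOWN after `fall` consecutive failed checks and returns to UP only after `rise` consecutive passed checks. This is an illustrative model only, not HAProxy code. The function name is hypothetical, `fall=3` is an assumption (the HAProxy default, not a value stated in this report), and `rise=15` is taken from the test mentioned in the previous comment.

```python
def simulate_backend(check_results, rise=2, fall=3, start_up=True):
    """Replay health-check results (True = pass) and return the
    UP (True) / DOWN (False) state after each check."""
    up = start_up
    streak = 0  # consecutive checks that contradict the current state
    states = []
    for ok in check_results:
        if ok == up:
            streak = 0
        else:
            streak += 1
            if (up and streak >= fall) or (not up and streak >= rise):
                up = not up
                streak = 0
        states.append(up)
    return states


if __name__ == "__main__":
    # Full restart: a few checks fail while apache2 is stopped, then pass.
    restart = simulate_backend([False] * 3 + [True] * 20, rise=15, fall=3)
    # Graceful reload: the listener stays up, so no check fails.
    graceful = simulate_backend([True] * 23, rise=15, fall=3)
    print("checks spent DOWN after restart:", restart.count(False))
    print("checks spent DOWN after graceful:", graceful.count(False))
```

Under these assumptions a restart that fails only three checks still costs `rise` further check intervals before the backend is UP again, while the graceful reload never leaves the UP state.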

Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
assignee: Vasyl Saienko (vsaienko) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Bogdan Dobrelya (bogdando)
Bogdan Dobrelya (bogdando) wrote :

The patch was adjusted according to the test results: https://goo.gl/Hi25QG

tags: added: haproxy
Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Bogdan Dobrelya (bogdando)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/212439
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=ad556d88ce93dbc01687f14dd6e129fb9a630060
Submitter: Jenkins
Branch: master

commit ad556d88ce93dbc01687f14dd6e129fb9a630060
Author: Sergii Golovatiuk <email address hidden>
Date: Thu Aug 13 11:42:42 2015 +0200

    Adjust haproxy rise

    Change haproxy settings rise to 30. Keystone will be considered as
    operational after 30 successful checks by a 2 seconds intervals.
    It requires only 2*30 seconds to make it operational after the
    haproxy backend was marked as down/nosrv.
    Adjust Keystone memcached dead_retry setting to 60s to match
    the changed rise value.

    Change-Id: I0d6ff85376c78c0c4f1627fe9628c9ab3c686795
    Closes-Bug: 1480153
    Co-Authored-By: Sergii Golovatiuk <email address hidden>
    Signed-off-by: Bogdan Dobrelya <email address hidden>
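
The arithmetic in the commit message above can be written out as a short check. This is only a sketch of the relationship the commit describes (rise multiplied by the check interval, compared with the memcached dead-retry value); the function and variable names are hypothetical.

```python
def seconds_until_up(rise, inter_s):
    """Approximate time for a DOWN backend to be marked UP again:
    `rise` consecutive successful checks, one every `inter_s` seconds."""
    return rise * inter_s


recovery_s = seconds_until_up(rise=30, inter_s=2)  # values from the commit message
dead_retry_s = 60                                  # new keystone memcached dead_retry

# The commit matches the two values, so both windows are 60 seconds.
print(recovery_s, recovery_s == dead_retry_s)
```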

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/209589
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=3fad9f09dd12a34548333caa43d8ae13caf2f0a9
Submitter: Jenkins
Branch: master

commit 3fad9f09dd12a34548333caa43d8ae13caf2f0a9
Author: Vladimir Kuklin <email address hidden>
Date: Wed Aug 5 19:26:14 2015 +0300

    Set keystone under apache mod_wsgi to MPM 6x3

    This sets keystone under apache WSGI to MPM mode
    limited to 6 worker processess and 3 threads
    in order to share info regarding memcached
    backends between the threads as well as provide
    the optimal performance for high load.

    This should allow us to decrease retry for
    dead memcache backends and thus fix a
    couple of bugs introduced by changes
    that are related to multiple processes
    of keystone checking memcached backends.

    Partial-bug: #1480153
    Related-bug: #1471318
    Related-bug: #1479782

    Change-Id: I01e2c74f8881d4fc208758455b6c13b64f2176c7
    Co-Authored-By: Vladimir Kuklin <email address hidden>

Anastasia Palkina (apalkina) wrote :

Verified on ISO #219

"build_id": "2015-08-23_15-01-12", "build_number": "219", "release_versions": {"2015.1.0-7.0": {"VERSION": {"build_id": "2015-08-23_15-01-12", "build_number": "219", "api": "1.0", "fuel-library_sha": "3a3ea6d9849bc1ba35c1bd882f0a0678b20d2e51", "nailgun_sha": "7790ce872512ecdf21689e6a5f970dd7119febdb", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "4c2ab9d6c623d345086c6e2874d1df81fd96a942", "production": "docker", "python-fuelclient_sha": "fc7b63aa6900fe3b2c183108ba6a13e868bc0472", "astute_sha": "53c86cba593ddbac776ce5a3360240274c20738c", "fuel-ostf_sha": "16839cbf471b7142b04c0d2c2d94786bc486fefe", "release": "7.0", "fuelmain_sha": "a494e6628319abfef57e1754f6453cf8f1a4bc65"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "3a3ea6d9849bc1ba35c1bd882f0a0678b20d2e51", "nailgun_sha": "7790ce872512ecdf21689e6a5f970dd7119febdb", "feature_groups": ["mirantis"], "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c", "openstack_version": "2015.1.0-7.0", "fuel-agent_sha": "4c2ab9d6c623d345086c6e2874d1df81fd96a942", "production": "docker", "python-fuelclient_sha": "fc7b63aa6900fe3b2c183108ba6a13e868bc0472", "astute_sha": "53c86cba593ddbac776ce5a3360240274c20738c", "fuel-ostf_sha": "16839cbf471b7142b04c0d2c2d94786bc486fefe", "release": "7.0", "fuelmain_sha": "a494e6628319abfef57e1754f6453cf8f1a4bc65"

Changed in fuel:
status: Fix Committed → Fix Released
Bogdan Dobrelya (bogdando) wrote :

The fix, it seems, is not as good as we expected; see related bug https://bugs.launchpad.net/fuel/+bug/1488847
