Fuel for OpenStack

haproxy for keystone-admin takes 240-300s

Bug #1480153 reported by Matthew Mosesohn on 2015-07-31

This bug affects 4 people

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Fix Released	Critical	Bogdan Dobrelya	Fuel for OpenStack 7.0

Bug Description

It seems that the dead time for waiting on memcached backend in keystone is too high (300s), causing the haproxy backend wait process to hang for 240-300s. We should lower the time to perhaps 15s and it will cut down the time for deployment and also minimize keystone downtime if memcached restarts.

memcache_dead_retry = 300
Setting is here:
https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/openstack/manifests/keystone.pp#L155

I came to these values by looking at puppet logs from 7.0 ubuntu BVT #112 and my local deploy.
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "113"
  build_id: "2015-07-30_16-01-07"
  nailgun_sha: "21ba6e2606a056883734392187845c172ecf99aa"
  python-fuelclient_sha: "71bb8fa87ee25f0c1bb84317884da7c917902a63"
  fuel-agent_sha: "dee9f2eb7e2822e89f6253f500f0c2e376a5b824"
  fuel-nailgun-agent_sha: "1512b9af6b41cc95c4d891c593aeebe0faca5a63"
  astute_sha: "488db988a1f2e18f99decf417371c50b2a7fb794"
  fuel-library_sha: "d1291ae75680818e715608814422075049a10ce8"
  fuel-ostf_sha: "92cdab6c6829be0d2d0c561fe56346dac8708d95"
  fuelmain_sha: "de5b333815f8541224c6726dc8446ffc7fb18b5b"

Matthew Mosesohn (raytrac3r) on 2015-07-31

tags:	added: keystone low-hanging-fruit
Changed in fuel:
milestone:	none → 7.0

Ihor Kalnytskyi (ikalnytskyi) on 2015-07-31

Changed in fuel:
assignee:	nobody → Fuel Library Team (fuel-library)
importance:	Undecided → Medium
status:	New → Confirmed

Oleksiy Molchanov (omolchanov) wrote on 2015-07-31:

Please take a look at the bug where we set 300: https://bugs.launchpad.net/mos/+bug/1471318

Bogdan Dobrelya (bogdando) wrote on 2015-07-31:

@Boris, could you take a look?

Boris Bobrov (bbobrov) wrote on 2015-07-31:

We cannot set memcache_dead_retry to low values because it leads to very bad performance of keystone during a controller failure. The lower the value, the more time each keystone process will spend trying to connect to dead memcache hosts.

Here is a wild guess for a workaround: restart haproxy after you've restarted keystone.
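
The trade-off described in this comment can be illustrated with a rough model. This is only a sketch, not the code keystone or its memcache client actually runs: it assumes a client that, after a failed connect, skips the dead server for `dead_retry_s` seconds and then tries once more, blocking for `connect_timeout_s` seconds on each failed attempt. The function name, the 3-second timeout and the 10-minute outage are illustrative assumptions.

```python
def time_lost_to_dead_host(outage_s, dead_retry_s, connect_timeout_s):
    """Seconds one process spends on failed connects during an outage.

    Assumed model: the first attempt happens at t=0 and blocks for
    connect_timeout_s; the host is then skipped for dead_retry_s seconds
    before the next attempt is made.
    """
    lost = 0.0
    t = 0.0
    while t < outage_s:
        lost += connect_timeout_s            # one failed, blocking connect
        t += connect_timeout_s + dead_retry_s  # host is skipped until then
    return lost


if __name__ == "__main__":
    # Hypothetical 10-minute memcached outage, 3 s connect timeout.
    for dead_retry in (15, 60, 300):
        lost = time_lost_to_dead_host(600, dead_retry, 3)
        print(f"dead_retry={dead_retry:>3}s: {lost:.0f}s blocked per process")
```

Under these assumptions the blocked time grows roughly in inverse proportion to the retry value, and it is paid by every keystone worker process separately, which is the cost the comment warns about.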

Vladimir Kuklin (vkuklin) on 2015-08-03

Changed in fuel:
importance:	Medium → High

Oleksiy Molchanov (omolchanov) on 2015-08-04

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Oleksiy Molchanov (omolchanov)

Boris Bobrov (bbobrov) on 2015-08-04

tags:	removed: low-hanging-fruit

Boris Bobrov (bbobrov) wrote on 2015-08-04:

> We should lower the time to perhaps 15s and it will cut down the time for deployment and also minimize keystone downtime if memcached restarts.

> minimize keystone downtime if memcached restarts

1. This will not help to minimize downtime.
2. It will badly affect performance if one of the nodes fails.

Please also note that the default in keystone is 300.

> It seems that the dead time for waiting on memcached backend in keystone is too high (300s), causing the haproxy backend wait process to hang for 240-300s.

Why is it bad? What issues does it cause and at what stage?

Oleksiy Molchanov (omolchanov) on 2015-08-04

Changed in fuel:
assignee:	Oleksiy Molchanov (omolchanov) → Matthew Mosesohn (raytrac3r)

Matthew Mosesohn (raytrac3r) wrote on 2015-08-04:

Boris, it's bad that our deployment hangs for 300s. Unnecessary delays in deploy are by nature wrong. I will try to fix this in the puppet workflow so that memcached is ready before trying to start keystone.

Dmitry Ilyin (idv1985) on 2015-08-05

Changed in fuel:
assignee:	Matthew Mosesohn (raytrac3r) → Dmitry Ilyin (idv1985)

Aleksandr Didenko (adidenko) wrote on 2015-08-05:

> I will try to fix this in the puppet workflow so that memcached is ready before trying to start keystone.

We run the 'memcached' task before 'keystone', so it is ready before keystone tries to start. Please check bug https://bugs.launchpad.net/mos/+bug/1471318; it has a detailed explanation of why we need the 300s up interval.

OpenStack Infra (hudson-openstack) wrote on 2015-08-05: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/209589

Changed in fuel:
assignee:	Dmitry Ilyin (idv1985) → Vladimir Kuklin (vkuklin)
status:	Confirmed → In Progress

Vladimir Kuklin (vkuklin) wrote on 2015-08-07:

There is also a fix that introduces a critical section for keystone multithreading: https://review.fuel-infra.org/#/c/10188/

OpenStack Infra (hudson-openstack) wrote on 2015-08-13:

Fix proposed to branch: master
Review: https://review.openstack.org/212439

Changed in fuel:
assignee:	Vladimir Kuklin (vkuklin) → Sergii Golovatiuk (sgolovatiuk)

OpenStack Infra (hudson-openstack) on 2015-08-13

Changed in fuel:
assignee:	Sergii Golovatiuk (sgolovatiuk) → Alex Schultz (alex-schultz)

OpenStack Infra (hudson-openstack) on 2015-08-13

Changed in fuel:
assignee:	Alex Schultz (alex-schultz) → Sergii Golovatiuk (sgolovatiuk)

OpenStack Infra (hudson-openstack) on 2015-08-14

Changed in fuel:
assignee:	Sergii Golovatiuk (sgolovatiuk) → Denis Egorenko (degorenko)

Vasyl Saienko (vsaienko) wrote on 2015-08-15:

#10

Raising this to Critical because it affects deployment. Related bug: https://bugs.launchpad.net/fuel/+bug/1484066

Changed in fuel:
importance:	High → Critical

OpenStack Infra (hudson-openstack) on 2015-08-15

Changed in fuel:
assignee:	Denis Egorenko (degorenko) → Vasyl Saienko (vsaienko)

OpenStack Infra (hudson-openstack) on 2015-08-17

Changed in fuel:
assignee:	Vasyl Saienko (vsaienko) → Denis Egorenko (degorenko)

OpenStack Infra (hudson-openstack) on 2015-08-17

Changed in fuel:
assignee:	Denis Egorenko (degorenko) → Vasyl Saienko (vsaienko)

Bogdan Dobrelya (bogdando) on 2015-08-17

tags:	added: tricky

OpenStack Infra (hudson-openstack) on 2015-08-18

Changed in fuel:
assignee:	Vasyl Saienko (vsaienko) → Sergii Golovatiuk (sgolovatiuk)

OpenStack Infra (hudson-openstack) on 2015-08-18

Changed in fuel:
assignee:	Sergii Golovatiuk (sgolovatiuk) → Vasyl Saienko (vsaienko)

Bogdan Dobrelya (bogdando) wrote on 2015-08-18:

#11

We're testing dead retry 30 with rise 15 and will provide the results here.

Bogdan Dobrelya (bogdando) wrote on 2015-08-18:

#12

As we also discovered today, using the command 'apachectl graceful' instead of "service apache2 stop; sleep; ...start" helps to address this issue from the other side. It applies configuration changes as well, while keeping the apache2 parent process PID the same. This allows the HAProxy backend to remain UP without the additional downtime induced by a generic restart action. @Vasyl is preparing the patch.
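
Why a full restart costs so much more than a graceful reload can be sketched with a small model of health checking. The sketch below assumes the usual rise/fall semantics: a server that is UP is marked DOWN after `fall` consecutive failed checks, and a DOWN server is marked UP again after `rise` consecutive successful checks. The function name, `fall=3` and the number of failed checks during a restart are illustrative assumptions; `rise=15` is the value mentioned in comment #11.

```python
def simulate_backend(check_results, rise, fall, start_up=True):
    """Return the UP (True) / DOWN (False) state after each health check.

    check_results: iterable of booleans, True meaning the check passed.
    """
    up = start_up
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in check_results:
        if ok == up:
            streak = 0
        else:
            streak += 1
            if streak >= (fall if up else rise):
                up = not up
                streak = 0
        states.append(up)
    return states


if __name__ == "__main__":
    # Full restart: assume three checks fail while apache2 is stopped.
    restart = simulate_backend([False] * 3 + [True] * 20, rise=15, fall=3)
    # Graceful reload: assume no check ever fails.
    graceful = simulate_backend([True] * 23, rise=15, fall=3)
    print("checks spent DOWN after restart: ", restart.count(False))
    print("checks spent DOWN after graceful:", graceful.count(False))
```

In this model the backend stays DOWN for `rise` check intervals after even a brief outage, while a reload that never fails a check does not leave the UP state at all.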

Bogdan Dobrelya (bogdando) wrote on 2015-08-18:

#13

The related patch is https://review.openstack.org/209924

OpenStack Infra (hudson-openstack) on 2015-08-19

Changed in fuel:
assignee:	Vasyl Saienko (vsaienko) → Sergii Golovatiuk (sgolovatiuk)

OpenStack Infra (hudson-openstack) on 2015-08-20

Changed in fuel:
assignee:	Sergii Golovatiuk (sgolovatiuk) → Bogdan Dobrelya (bogdando)

Bogdan Dobrelya (bogdando) wrote on 2015-08-20:

#14

The patch was adjusted according to the test results: https://goo.gl/Hi25QG

Bogdan Dobrelya (bogdando) on 2015-08-20

tags:	added: haproxy

Sergii Golovatiuk (sgolovatiuk) on 2015-08-20

Changed in fuel:
assignee:	Bogdan Dobrelya (bogdando) → Sergii Golovatiuk (sgolovatiuk)

OpenStack Infra (hudson-openstack) on 2015-08-21

Changed in fuel:
assignee:	Sergii Golovatiuk (sgolovatiuk) → Bogdan Dobrelya (bogdando)

OpenStack Infra (hudson-openstack) wrote on 2015-08-21: Fix merged to fuel-library (master)

#15

Reviewed: https://review.openstack.org/212439
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=ad556d88ce93dbc01687f14dd6e129fb9a630060
Submitter: Jenkins
Branch: master

commit ad556d88ce93dbc01687f14dd6e129fb9a630060
Author: Sergii Golovatiuk <email address hidden>
Date: Thu Aug 13 11:42:42 2015 +0200

Adjust haproxy rise

    Change haproxy settings rise to 30. Keystone will be considered as
    operational after 30 successful checks by a 2 seconds intervals.
    It requires only 2*30 seconds to make it operational after the
    haproxy backend was marked as down/nosrv.
    Adjust Keystone memcached dead_retry setting to 60s to match
    the changed rise value.

    Change-Id: I0d6ff85376c78c0c4f1627fe9628c9ab3c686795
    Closes-Bug: 1480153
    Co-Authored-By: Sergii Golovatiuk <email address hidden>
    Signed-off-by: Bogdan Dobrelya <email address hidden>
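
The arithmetic in this commit message can be checked with a few lines. This is only a sketch: the helper name is hypothetical, and the values are the ones stated in the message (rise of 30, checks every 2 seconds, dead_retry of 60s).

```python
def seconds_until_up(rise, check_interval_s):
    """Time for a DOWN backend to collect `rise` consecutive passing checks."""
    return rise * check_interval_s


if __name__ == "__main__":
    rise, check_interval_s, dead_retry_s = 30, 2, 60
    recovery_s = seconds_until_up(rise, check_interval_s)
    print(f"backend is UP again after {recovery_s}s")
    print(f"matches dead_retry of {dead_retry_s}s: {recovery_s == dead_retry_s}")
```

With these values the backend needs 60 seconds to be considered operational again, the same length as the new dead_retry window.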

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2015-08-21:

#16

Reviewed: https://review.openstack.org/209589
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=3fad9f09dd12a34548333caa43d8ae13caf2f0a9
Submitter: Jenkins
Branch: master

commit 3fad9f09dd12a34548333caa43d8ae13caf2f0a9
Author: Vladimir Kuklin <email address hidden>
Date: Wed Aug 5 19:26:14 2015 +0300

Set keystone under apache mod_wsgi to MPM 6x3

    This sets keystone under apache WSGI to MPM mode
    limited to 6 worker processess and 3 threads
    in order to share info regarding memcached
    backends between the threads as well as provide
    the optimal performance for high load.

    This should allow us to decrease retry for
    dead memcache backends and thus fix a
    couple of bugs introduced by changes
    that are related to multiple processes
    of keystone checking memcached backends.

    Partial-bug: #1480153
    Related-bug: #1471318
    Related-bug: #1479782

    Change-Id: I01e2c74f8881d4fc208758455b6c13b64f2176c7
    Co-Authored-By: Vladimir Kuklin <email address hidden>

Anastasia Palkina (apalkina) wrote on 2015-08-24:

#17

Verified on ISO #219

"build_id": "2015-08-23_15-01-12",
"build_number": "219",
"release_versions": {"2015.1.0-7.0": {"VERSION": {
    "build_id": "2015-08-23_15-01-12",
    "build_number": "219",
    "api": "1.0",
    "fuel-library_sha": "3a3ea6d9849bc1ba35c1bd882f0a0678b20d2e51",
    "nailgun_sha": "7790ce872512ecdf21689e6a5f970dd7119febdb",
    "feature_groups": ["mirantis"],
    "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c",
    "openstack_version": "2015.1.0-7.0",
    "fuel-agent_sha": "4c2ab9d6c623d345086c6e2874d1df81fd96a942",
    "production": "docker",
    "python-fuelclient_sha": "fc7b63aa6900fe3b2c183108ba6a13e868bc0472",
    "astute_sha": "53c86cba593ddbac776ce5a3360240274c20738c",
    "fuel-ostf_sha": "16839cbf471b7142b04c0d2c2d94786bc486fefe",
    "release": "7.0",
    "fuelmain_sha": "a494e6628319abfef57e1754f6453cf8f1a4bc65"}}},
"auth_required": true,
"api": "1.0",
"fuel-library_sha": "3a3ea6d9849bc1ba35c1bd882f0a0678b20d2e51",
"nailgun_sha": "7790ce872512ecdf21689e6a5f970dd7119febdb",
"feature_groups": ["mirantis"],
"fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c",
"openstack_version": "2015.1.0-7.0",
"fuel-agent_sha": "4c2ab9d6c623d345086c6e2874d1df81fd96a942",
"production": "docker",
"python-fuelclient_sha": "fc7b63aa6900fe3b2c183108ba6a13e868bc0472",
"astute_sha": "53c86cba593ddbac776ce5a3360240274c20738c",
"fuel-ostf_sha": "16839cbf471b7142b04c0d2c2d94786bc486fefe",
"release": "7.0",
"fuelmain_sha": "a494e6628319abfef57e1754f6453cf8f1a4bc65"

Changed in fuel:
status:	Fix Committed → Fix Released

Bogdan Dobrelya (bogdando) wrote on 2015-08-26:

#18

It seems the fix is not as good as we expected; see related bug https://bugs.launchpad.net/fuel/+bug/1488847
