Keystone wsgi might run out of connections limits for the apache2 process

Bug #1485644 reported by Bogdan Dobrelya
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz
Milestone: 7.0

Bug Description

This is a summary bug from these two: https://bugs.launchpad.net/fuel/+bug/1485597 and https://bugs.launchpad.net/fuel/+bug/1485591

We should:
1) tune the Apache2 MPM worker settings (/etc/apache2/mods-available/worker.conf) to allow more simultaneous connections, by switching the Apache2 connection limits from the RAM-based formula in https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/osnailyfacter/manifests/apache_mpm.pp to a CPU-based one:

MaxClients = ServerLimit * ThreadsPerChild (default 25), instead of the current value of 100
ServerLimit = $::processorcount, instead of the current value of 4

2) tune the Linux sysctls so that Apache2 can accept more connections:
the listen backlog per socket (somaxconn), the global tcp_max_syn_backlog value, and, optionally, enable tcp_abort_on_overflow and tcp_tw_reuse as well. A rough sketch of both changes is given below.
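
A rough sketch of what the proposed tuning could look like (illustrative values only; the real numbers would be computed from $::processorcount in the puppet manifest):

    # /etc/apache2/mods-available/worker.conf (mpm_worker, illustrative)
    <IfModule mpm_worker_module>
        ServerLimit         8     # e.g. $::processorcount on an 8-core controller
        ThreadsPerChild     25    # apache default
        MaxClients          200   # ServerLimit * ThreadsPerChild
    </IfModule>

    # /etc/sysctl.conf (illustrative values)
    net.core.somaxconn = 4096
    net.ipv4.tcp_max_syn_backlog = 8192
    # optional
    net.ipv4.tcp_abort_on_overflow = 1
    net.ipv4.tcp_tw_reuse = 1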

Changed in fuel:
milestone: none → 7.0
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → Critical
Changed in fuel:
status: New → Confirmed
assignee: Fuel Library Team (fuel-library) → Alex Schultz (alex-schultz)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/213775

Revision history for this message
Bogdan Dobrelya (bogdando) wrote : Re: Keystone HAProxy backend is too SLOW under wsgi

After reading http://www.haproxy.org/download/1.5/doc/configuration.txt carefully, it seems I misinterpreted the situation.
Messages like "<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 2/0/0/0/0 0/0" do not look related to timed-out connections, but probably to the connection limits being exceeded for the apache process; to cite the documentation:
1) "the connection to the server failed ('SC--')"
2) It is possible that the server refused the connection because of too many already established

summary: - Keystone HAProxy backend is too SLOW under wsgi
+ Keystone wsgi runs out of connections limits for the apache proces
Revision history for this message
Bogdan Dobrelya (bogdando) wrote : Re: Keystone wsgi runs out of connections limits for the apache proces
Revision history for this message
Boris Bobrov (bbobrov) wrote :

      SC The server explicitly refused the connection (the proxy received a
          TCP RST or an ICMP in return). Under some circumstances, it can
          also be the network stack telling the proxy that the server is
          unreachable (eg: no route, or no ARP response on local network).

How much time has passed since keystone was restarted? Could you please post the logs from before the osc command flow?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

@Boris, please see the logs in the related bugs https://bugs.launchpad.net/fuel/+bug/1485591 and https://bugs.launchpad.net/fuel/+bug/1485597 to get that information.

description: updated
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

As we discussed with Aleksandr Didenko, the same <NOSRV> log message could be related to the high rise value of 150 in the haproxy config for the keystone backends. Combined with the fastinter of 2s, it makes *every* apache2 restart mark the keystone backend DOWN for 2x150 seconds, which makes the deployment fail as well. But it still makes sense to raise the defaults for MaxClients and ServerLimit; see the illustrative snippet below.
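
For illustration, a keystone backend entry along these lines (hypothetical server name and address; only the rise/fastinter values come from the discussion above) would keep a restarted backend DOWN until 150 consecutive checks at the 2s fastinter interval succeed, i.e. roughly 300 seconds:

    backend keystone-1
      server node-1 192.168.0.3:5000 check inter 10s fastinter 2s rise 150 fall 3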

description: updated
description: updated
tags: added: haproxy keystone
summary: - Keystone wsgi runs out of connections limits for the apache proces
+ Keystone wsgi might run out of connections limits for the apache proces
Revision history for this message
Aleksandr Didenko (adidenko) wrote : Re: Keystone wsgi might run out of connections limits for the apache proces

This is what I get when I use all the avail Apache sockets and it can't accept new connections:
keystone-1 keystone-1/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 0/0/0/0/0 0/0 "GET /v3/endpoints HTTP/1.1"

This is what I get when I turn Apache off completely:
keystone-1 keystone-1/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 0/0/0/0/0 0/0 "GET /v3/endpoints HTTP/1.1"

This is what I get when I turn Apache back on, but it's still in "Status: DOWN 5/150" (rising from DOWN to UP, which takes 5 minutes):
keystone-1 keystone-1/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 0/0/0/0/0 0/0 "GET /v3/endpoints HTTP/1.1"

As you can see, there's no difference: we're getting those messages in haproxy.log whenever the keystone backend is down. So I don't think TCP or Apache limits are the problem here. I'm pretty sure that https://review.openstack.org/209924 would fix this problem.
So I'm marking this bug as a duplicate of bug 1472675

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

Actually, it's not a duplicate; it's rather invalid, since we were not able to find any issues with connection limits for the apache process.

Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Alex Schultz (alex-schultz) wrote :

I was able to reproduce apache returning connection refused under high load, but I agree that this is likely caused by our haproxy configuration. One tweak we could make is to switch apache from the worker MPM to the event MPM, which is the default on RedHat but not on Ubuntu; in my testing that seems to reduce the occurrence of the connection issues I saw (see the sketch below).
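
For reference, a minimal sketch of trying the event MPM by hand on an Ubuntu controller (in fuel-library this would be done through the puppet apache class rather than manually):

    # Apache 2.4 on Ubuntu: swap the worker MPM for the event MPM and restart
    a2dismod mpm_worker
    a2enmod mpm_event
    service apache2 restart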

description: updated
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Yes, this bug looks invalid, as a simple computation shows:
with 16000 MB of RAM => 16000 / 10 = 1600 MaxClients,
and 1600 / 25 => 64 ServerLimit (apache2 child processes),

which should be enough

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/213775

Revision history for this message
Roman Alekseenkov (ralekseenkov) wrote : Re: Keystone wsgi might run out of connections limits for the apache proces

Are we going to tweak our HAProxy config? Is there a different bug for it?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

AFAICT, our HAProxy config only needs to be fixed for the rise value, and here is the related patchset: https://review.openstack.org/212439

@Aleksandr, I forgot to mention that MaxClients values greater than 256 may not work as expected due to the sysctl somaxconn and tcp_max_syn_backlog defaults, which are only 128. So this bug is still valid at scale; see the example below.
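
A quick way to check (and, for testing, raise) those kernel defaults on a controller; the example values are illustrative:

    # show the current limits (128 by default)
    sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
    # raise them at runtime (illustrative values); they should also be persisted in sysctl.conf
    sysctl -w net.core.somaxconn=4096
    sysctl -w net.ipv4.tcp_max_syn_backlog=8192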

Changed in fuel:
status: Invalid → Confirmed
tags: added: scale
tags: added: apache2 wsgi
summary: - Keystone wsgi might run out of connections limits for the apache proces
+ Keystone wsgi might run out of connections limits for the apache2
+ process
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/214321

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/214321
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=0a11a467f35dfd6032e57dc78022684e1aca8cfb
Submitter: Jenkins
Branch: master

commit 0a11a467f35dfd6032e57dc78022684e1aca8cfb
Author: Alex Schultz <email address hidden>
Date: Tue Aug 18 14:57:03 2015 -0500

    Increase connection backlogs for apache

    This change increases the net.core.somaxconn from the defaults of 128
    to 4096 and increases net.ipv4.tcp_max_syn_backlog from the defaults of
    128 to 8192. These values only get applied as part of the apache
    modular task.

    Change-Id: I2d4c060ffb0c09096fc86bc7d22efba25de29cd4
    Closes-Bug: 1485644

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Verified

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "284"
  build_id: "284"
  nailgun_sha: "5c33995a2e6d9b1b8cdddfa2630689da5084506f"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "8283dc2932c24caab852ae9de15f94605cc350c6"
  fuel-library_sha: "f81fdabe6c05be7a3d11d88a7c3a8f3931921c73"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification