Keystone wsgi might run out of connections limits for the apache2 process

Bug #1485644 reported by Bogdan Dobrelya
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz
Milestone: 7.0

Bug Description

This is a summary bug from these two: https://bugs.launchpad.net/fuel/+bug/1485597 and https://bugs.launchpad.net/fuel/+bug/1485591

We should:
1) tune the Apache2 MPM worker settings (/etc/apache2/mods-available/worker.conf) to allow more simultaneous connections, by switching the Apache2 connection limits from the RAM-based formula in https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/osnailyfacter/manifests/apache_mpm.pp to a CPU-based one:

MaxClients = ServerLimit * ThreadsPerChild (default 25), instead of the current value of 100
ServerLimit = $::processorcount, instead of the current value of 4

2) tune the Linux sysctls so that Apache2 can accept more connections:
the listen backlog per socket (somaxconn), the global tcp_max_syn_backlog value, and, optionally, enable tcp_abort_on_overflow and tcp_tw_reuse as well. A rough sketch of both changes is given below.
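
A rough sketch of what the proposed tuning could look like (illustrative values only; the real numbers would be computed from $::processorcount in the puppet manifest):

    # /etc/apache2/mods-available/worker.conf (mpm_worker, illustrative)
    <IfModule mpm_worker_module>
        ServerLimit         8     # e.g. $::processorcount on an 8-core controller
        ThreadsPerChild     25    # apache default
        MaxClients          200   # ServerLimit * ThreadsPerChild
    </IfModule>

    # /etc/sysctl.conf (illustrative values)
    net.core.somaxconn = 4096
    net.ipv4.tcp_max_syn_backlog = 8192
    # optional
    net.ipv4.tcp_abort_on_overflow = 1
    net.ipv4.tcp_tw_reuse = 1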

Changed in fuel:
milestone: none → 7.0
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → Critical
Changed in fuel:
status: New → Confirmed
assignee: Fuel Library Team (fuel-library) → Alex Schultz (alex-schultz)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/213775

Revision history for this message
Bogdan Dobrelya (bogdando) wrote : Re: Keystone HAProxy backend is too SLOW under wsgi

After reading http://www.haproxy.org/download/1.5/doc/configuration.txt carefully, it seems I misinterpreted the situation.
Messages like "<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 2/0/0/0/0 0/0" do not look related to timed-out connections, but probably to the connection limits being exceeded for the apache process; to cite the documentation:
1) "the connection to the server failed ('SC--')"
2) It is possible that the server refused the connection because of too many already established

summary: - Keystone HAProxy backend is too SLOW under wsgi
+ Keystone wsgi runs out of connections limits for the apache proces
Revision history for this message
Bogdan Dobrelya (bogdando) wrote : Re: Keystone wsgi runs out of connections limits for the apache proces
Revision history for this message
Boris Bobrov (bbobrov) wrote :

      SC The server explicitly refused the connection (the proxy received a
          TCP RST or an ICMP in return). Under some circumstances, it can
          also be the network stack telling the proxy that the server is
          unreachable (eg: no route, or no ARP response on local network).

How much time has passed since keystone was restarted? Could you please post the logs from before the osc command flow?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

@Boris, please see the logs in the related bugs https://bugs.launchpad.net/fuel/+bug/1485591 and https://bugs.launchpad.net/fuel/+bug/1485597 to get that information.

description: updated
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

As we discussed with Aleksandr Didenko, the same <NOSRV> log message could be related to the high rise value of 150 in the haproxy config for the keystone backends. Combined with the fastinter of 2s, it makes *every* apache2 restart mark the keystone backend DOWN for 2x150 seconds, which makes the deployment fail as well. But it still makes sense to raise the defaults for MaxClients and ServerLimit; see the illustrative snippet below.
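
For illustration, a keystone backend entry along these lines (hypothetical server name and address; only the rise/fastinter values come from the discussion above) would keep a restarted backend DOWN until 150 consecutive checks at the 2s fastinter interval succeed, i.e. roughly 300 seconds:

    backend keystone-1
      server node-1 192.168.0.3:5000 check inter 10s fastinter 2s rise 150 fall 3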

description: updated
description: updated
tags: added: haproxy keystone
summary: - Keystone wsgi runs out of connections limits for the apache proces
+ Keystone wsgi might run out of connections limits for the apache proces
Revision history for this message
Aleksandr Didenko (adidenko) wrote : Re: Keystone wsgi might run out of connections limits for the apache proces

This is what I get when I use all the avail Apache sockets and it can't accept new connections:
keystone-1 keystone-1/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 0/0/0/0/0 0/0 "GET /v3/endpoints HTTP/1.1"

This is what I get when I turn Apache off completely:
keystone-1 keystone-1/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 0/0/0/0/0 0/0 "GET /v3/endpoints HTTP/1.1"

This is what I get when I turn Apache back on, but it's still in "Status: DOWN 5/150" (rising from DOWN to UP, which takes 5 minutes):
keystone-1 keystone-1/<NOSRV> 0/-1/-1/-1/0 503 212 - - SC-- 0/0/0/0/0 0/0 "GET /v3/endpoints HTTP/1.1"

As you can see, there's no difference: we're getting those messages in haproxy.log whenever the keystone backend is down. So I don't think TCP or Apache limits are the problem here. I'm pretty sure that https://review.openstack.org/209924 would fix this problem.
So I'm marking this bug as a duplicate of bug 1472675

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

Actually, it's not a duplicate; it's rather invalid, since we were not able to find any issues with connection limits for the apache process.

Changed in fuel:
status: Confirmed → Invalid
Revision history for this message
Alex Schultz (alex-schultz) wrote :

I was able to reproduce apache returning connection refused under high load, but I agree that this is likely caused by our haproxy configuration. One tweak we could make is to switch apache from the worker MPM to the event MPM, which is the default on RedHat but not on Ubuntu; in my testing that seems to reduce the occurrence of the connection issues I saw (see the sketch below).
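
For reference, a minimal sketch of trying the event MPM by hand on an Ubuntu controller (in fuel-library this would be done through the puppet apache class rather than manually):

    # Apache 2.4 on Ubuntu: swap the worker MPM for the event MPM and restart
    a2dismod mpm_worker
    a2enmod mpm_event
    service apache2 restart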

description: updated
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Yes, this bug looks invalid, as a simple computation shows:
with 16000 MB of RAM => 16000 / 10 = 1600 MaxClients,
and 1600 / 25 => 64 ServerLimit (apache2 child processes),

which should be enough

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/213775

Revision history for this message
Roman Alekseenkov (ralekseenkov) wrote : Re: Keystone wsgi might run out of connections limits for the apache proces

Are we going to tweak our HAProxy config? Is there a different bug for it?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

AFAICT, our HAProxy config only needs to be fixed for the rise value, and here is the related patchset: https://review.openstack.org/212439

@Aleksandr, I forgot to mention that MaxClients values greater than 256 may not work as expected due to the sysctl somaxconn and tcp_max_syn_backlog defaults, which are only 128. So this bug is still valid at scale; see the example below.
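
A quick way to check (and, for testing, raise) those kernel defaults on a controller; the example values are illustrative:

    # show the current limits (128 by default)
    sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
    # raise them at runtime (illustrative values); they should also be persisted in sysctl.conf
    sysctl -w net.core.somaxconn=4096
    sysctl -w net.ipv4.tcp_max_syn_backlog=8192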

Changed in fuel:
status: Invalid → Confirmed
tags: added: scale
tags: added: apache2 wsgi
summary: - Keystone wsgi might run out of connections limits for the apache proces
+ Keystone wsgi might run out of connections limits for the apache2
+ process
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/214321

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/214321
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=0a11a467f35dfd6032e57dc78022684e1aca8cfb
Submitter: Jenkins
Branch: master

commit 0a11a467f35dfd6032e57dc78022684e1aca8cfb
Author: Alex Schultz <email address hidden>
Date: Tue Aug 18 14:57:03 2015 -0500

    Increase connection backlogs for apache

    This change increases the net.core.somaxconn from the defaults of 128
    to 4096 and increases net.ipv4.tcp_max_syn_backlog from the defaults of
    128 to 8192. These values only get applied as part of the apache
    modular task.

    Change-Id: I2d4c060ffb0c09096fc86bc7d22efba25de29cd4
    Closes-Bug: 1485644

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

Verified

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "284"
  build_id: "284"
  nailgun_sha: "5c33995a2e6d9b1b8cdddfa2630689da5084506f"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "8283dc2932c24caab852ae9de15f94605cc350c6"
  fuel-library_sha: "f81fdabe6c05be7a3d11d88a7c3a8f3931921c73"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification