Apache graceful restart leads to mod_wsgi segfault

Bug #1493353 reported by Vladimir Kuklin
This bug affects 1 person
Affects             Status     Importance  Assigned to      Milestone
Fuel for OpenStack  Won't Fix  Critical    Vladimir Kuklin
6.1.x               Invalid    Undecided   Vladimir Kuklin
7.0.x               Won't Fix  High        Vladimir Kuklin

Bug Description

For details, please see https://bugs.launchpad.net/fuel/7.0.x/+bug/1490523/comments/20

This needs to be worked around until an upstream fix for mod_wsgi is available.

The problem occurs during cluster scale-up/scale-down operations: Apache gets reloaded, mod_wsgi starts misbehaving, and the deployment fails because we cannot operate on Keystone entities.

For details, see also

https://github.com/GrahamDumpleton/mod_wsgi/issues/81

Aleksander Mogylchenko (amogylchenko) wrote :

The workaround is simple: do not reload mod_wsgi so often. As the GitHub comments suggest (and my comments in bug 1490523), there should be at least a 1-minute pause between restarts. Alternatively, run keystone as a separate process.
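
A minimal sketch of the first workaround, assuming a small wrapper is acceptable in place of a direct `apachectl graceful` call. The function name `throttled_reload`, the stamp file path, and the `MIN_INTERVAL`, `STAMP_FILE`, and `RELOAD_CMD` variables are hypothetical and not part of Fuel or Apache; the 60-second default follows the pause suggested above.

```shell
#!/bin/sh
# Hypothetical helper: make sure at least MIN_INTERVAL seconds pass
# between two reloads issued through this wrapper.
MIN_INTERVAL="${MIN_INTERVAL:-60}"
STAMP_FILE="${STAMP_FILE:-/tmp/apache-last-reload}"   # illustrative path
RELOAD_CMD="${RELOAD_CMD:-apachectl graceful}"

throttled_reload() {
    now=$(date +%s)
    last=0
    if [ -f "$STAMP_FILE" ]; then
        last=$(cat "$STAMP_FILE")
    fi
    elapsed=$((now - last))
    if [ "$elapsed" -lt "$MIN_INTERVAL" ]; then
        # Too soon after the previous reload: wait out the remainder.
        sleep $((MIN_INTERVAL - elapsed))
    fi
    # Intentionally unquoted so the command and its arguments are split.
    $RELOAD_CMD || return $?
    # Record the time of this successful reload.
    date +%s > "$STAMP_FILE"
}
```

Callers would invoke `throttled_reload` wherever they currently reload Apache. This only spaces out reloads issued through the wrapper; it does nothing about reloads triggered elsewhere.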

Igor Marnat (imarnat) wrote :

+amogylchenko: Sasha, I don't think we reload mod_wsgi more often than once a minute. What makes you think so?

Aleksander Mogylchenko (amogylchenko) wrote :

There was a somewhat similar investigation here:
https://bugs.launchpad.net/mos/+bug/1481671/comments/3

And the comments on the upstream issue say so:
https://github.com/GrahamDumpleton/mod_wsgi/issues/81#issue-94403134
> Note that this happens if there is less than 1 second in between reloads:

Aleksander Mogylchenko (amogylchenko) wrote :

Since it is not clear from this description how to reproduce the problem, please provide more detailed information:
1. The environment you are using
2. Apache and mod_wsgi versions
3. A shell command that reproduces the problem

If the steps are similar to those in the upstream GitHub issue, please do not restart mod_wsgi so often, or run keystone as a separate process.

Changed in fuel:
status: Confirmed → Incomplete
assignee: MOS Linux (mos-linux) → Vladimir Kuklin (vkuklin)
Igor Marnat (imarnat) wrote :

@amogylchenko: Sasha, there are more details in bug https://bugs.launchpad.net/mos/+bug/1491576. Does this answer your questions?

Changed in fuel:
status: Incomplete → Confirmed
assignee: Vladimir Kuklin (vkuklin) → MOS Linux (mos-linux)
Aleksander Mogylchenko (amogylchenko) wrote :

Please provide the following information:
1. The environment you are using
2. Apache and mod_wsgi versions
3. A shell command that reproduces the problem

Changed in fuel:
status: Confirmed → Incomplete
assignee: MOS Linux (mos-linux) → Vladimir Kuklin (vkuklin)
Vladimir Kuklin (vkuklin) wrote :

1. apache2:
  Installed: 2.4.7-1ubuntu4.5
  Candidate: 2.4.7-1ubuntu4.5
  Version table:
 *** 2.4.7-1ubuntu4.5 0
        500 http://mirrors.msk.mirantis.net/ubuntu/ trusty-updates/main amd64 Packages
        500 http://mirrors.msk.mirantis.net/ubuntu/ trusty-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.4.7-1ubuntu4 0
        500 http://mirrors.msk.mirantis.net/ubuntu/ trusty/main amd64 Packages
libapache2-mod-wsgi:
  Installed: 3.4-4ubuntu2.1.14.04.2
  Candidate: 3.4-4ubuntu2.1.14.04.2
  Version table:
 *** 3.4-4ubuntu2.1.14.04.2 0
        500 http://mirrors.msk.mirantis.net/ubuntu/ trusty-updates/main amd64 Packages
        500 http://mirrors.msk.mirantis.net/ubuntu/ trusty-security/main amd64 Packages
        100 /var/lib/dpkg/status
     3.4-4ubuntu2 0
        500 http://mirrors.msk.mirantis.net/ubuntu/ trusty/main amd64 Packages

2. Any Fuel ISO with mod_wsgi enabled, e.g. 286

3.

a) In one shell, periodically run `openstack token issue`, or a similar Rally scenario that generates some load.
b) In another shell, run:
while :; do let i=i+1; echo -e "`date`\n"; apachectl graceful 2>&1; sleep N; done

Then grep /var/log/syslog for Apache segfault messages.

Judging by the symptoms, N can be any number up to 60.
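
The grep step can be sketched as a small helper. The search pattern is an assumption, since the exact wording of the message depends on the kernel and syslog configuration, and `count_apache_segfaults` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that mention both apache and a
# segmentation fault. The pattern is a guess at typical wording, not
# the exact message format reported in this bug.
count_apache_segfaults() {
    logfile="${1:-/var/log/syslog}"
    # First filter to apache-related lines, then count segfault mentions.
    grep -i 'apache' "$logfile" | grep -Eci 'segfault|segmentation fault'
}
```

Running `count_apache_segfaults` before and after the restart loop and comparing the two numbers gives a rough indication of whether the loop triggered new crashes.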

Changed in fuel:
status: Incomplete → Confirmed
assignee: Vladimir Kuklin (vkuklin) → Aleksander Mogylchenko (amogylchenko)
Aleksander Mogylchenko (amogylchenko) wrote :

Unable to reproduce after an hour-long run with N=30, using the steps provided, on ISO #187. It has mod_wsgi enabled with the following parameters:
WSGIDaemonProcess keystone_main display-name=keystone-main group=keystone processes=2 threads=1 user=keystone

I was running 'openstack token issue' in one console (an endless loop without sleeps):
while true; do openstack token issue; done

And the exact loop as provided in the other console with N=30:
while :; do let i=i+1; echo -e "`date`\n"; apachectl graceful 2>&1; sleep 30; done

Changed in fuel:
status: Confirmed → Incomplete
assignee: Aleksander Mogylchenko (amogylchenko) → Vladimir Kuklin (vkuklin)
Vladimir Kuklin (vkuklin) wrote :

According to the discussion, this behavior is by design in apache2, which cannot handle cold/warm restarts properly. For now, we are closing this bug and will switch to uwsgi, and apparently nginx, in upcoming releases.

Changed in fuel:
status: Incomplete → Won't Fix
tags: added: release-notes
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/222360

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/7.0)

Related fix proposed to branch: stable/7.0
Review: https://review.openstack.org/222362

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/222360
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=92a4ed5bb811c62263d72cf76b8cbd7a719fed3f
Submitter: Jenkins
Branch: master

commit 92a4ed5bb811c62263d72cf76b8cbd7a719fed3f
Author: Vladimir Kuklin <email address hidden>
Date: Thu Sep 10 23:44:48 2015 +0300

    Add workaround for apache restart during deployment

    As figured out in the following Launchpad bug
    https://bugs.launchpad.net/fuel/+bug/1493353
    Apache2 is not very friendly to any types of
    restarts. This may lead to the issue when we
    restart apache too often during deployment and
    it gets into Byzantine unresponsive state
    along with keystone which breaks keystone
    providers operations.

    Change-Id: I11c52089e9598fc6d088c3478c90de3aa853652a
    Closes-bug: #1493372
    Related-bug: #1493353

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/222362
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=d4723a87bee9ab1a958bdec174b222a0fcd05d5c
Submitter: Jenkins
Branch: stable/7.0

commit d4723a87bee9ab1a958bdec174b222a0fcd05d5c
Author: Vladimir Kuklin <email address hidden>
Date: Thu Sep 10 23:44:48 2015 +0300

    Add workaround for apache restart during deployment

    As figured out in the following Launchpad bug
    https://bugs.launchpad.net/fuel/+bug/1493353
    Apache2 is not very friendly to any types of
    restarts. This may lead to the issue when we
    restart apache too often during deployment and
    it gets into Byzantine unresponsive state
    along with keystone which breaks keystone
    providers operations.

    Change-Id: I11c52089e9598fc6d088c3478c90de3aa853652a
    Closes-bug: #1493372
    Related-bug: #1493353

Kyrylo Galanov (kgalanov) wrote :

Hello,

Unfortunately, the fix does not help. The bug is still present in 8.0 (mentioned in https://bugs.launchpad.net/fuel/+bug/1506449).
According to information available on the Internet, Apache segfaults if two restart commands are issued in rapid succession.
So the fix might look like:
apachectl graceful || sleep 15 && apachectl restart
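
A note on how the shell groups that line: `||` and `&&` have equal precedence and associate left to right, so it parses as `(apachectl graceful || sleep 15) && apachectl restart`, and the restart runs immediately after a successful graceful reload as well. The sketch below uses stand-in functions (`graceful`, `pause`, `restart`, and the `GRACEFUL_RC` variable are all hypothetical) to make the grouping observable without touching Apache. It assumes the intent was to restart only as a fallback after a failed graceful reload.

```shell
#!/bin/sh
# Stand-ins for the real commands so the grouping can be observed safely.
# GRACEFUL_RC simulates the exit status of "apachectl graceful".
graceful() { echo "graceful"; return "${GRACEFUL_RC:-0}"; }
pause()    { echo "sleep"; }
restart()  { echo "restart"; }

# Same shape as the one-liner above: parsed as (graceful || pause) && restart,
# so restart also runs right after a successful graceful.
as_written() { graceful || pause && restart; }

# Explicit grouping: pause and restart only if graceful failed.
fallback_only() { graceful || { pause; restart; }; }
```

With the real commands, the grouped form would read `apachectl graceful || { sleep 15; apachectl restart; }`.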

--
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  openstack_version: "2015.1.0-8.0"
  api: "1.0"
  build_number: "204"
  build_id: "204"
  fuel-nailgun_sha: "5a3b8907ae9ebd56c354436a9e8c9a47edf459ad"
  python-fuelclient_sha: "2a1b048cc439986e222ece43a290b5cc68e92a77"
  fuel-agent_sha: "d2103bee6e216396eb8e308ec5448328c9ee4261"
  fuel-nailgun-agent_sha: "00b4b11553c250f22c0079fb74c8b782dcb7b740"
  astute_sha: "cfd5d6b916a17ad2f73e6c567a0365845155b0e3"
  fuel-library_sha: "7794da76fd5797c4c4242fb4e70e3757d37c4a01"
  fuel-ostf_sha: "1ab201cb8c3bba04522bf56ce72e863a03ff09b3"
  fuel-createmirror_sha: "6e1b82b2059a20f1fa9a4d794b976edaad156b85"
  fuelmenu_sha: "e68335c88feca803c97d75ae5a6e7de1e3f330dc"
  shotgun_sha: "bbbfccff9eb90895b13fae3fac398e65efe646f4"
  fuelmain_sha: "058e07386350bfa0a8365818cf75893949e0d863"

Nastya Urlapova (aurlapova) wrote :

Kyrylo, if the issue is present in 8.0, please add the proper milestone to it instead of reopening the old one.

Dmitry Klenov (dklenov) wrote :

I see that the fix was merged to the stable/7.0 branch. Vladimir, can you please confirm that no more fixes to stable/7.0 are expected? If so, please also move the bug to 'fix committed'.

Bogdan Dobrelya (bogdando) wrote :

Please note that all bugs involving cluster scale-up/scale-down should have this tag.

tags: added: life-cycle-management
Vladimir Kuklin (vkuklin) wrote :

Kiril, please create another bug, as the root cause of this one was identified, fixed, and tested many times. Include the test case and all the required details in the new bug. In the worst case we will mark it as a duplicate of this one, but I suspect your installation has an outdated mod_wsgi version.

Kyrylo Galanov (kgalanov) wrote :

Hello,

The new bug was originally filed as https://bugs.launchpad.net/fuel/+bug/1506449

--
Kyrylo

Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/8.0.x
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Incomplete → Won't Fix
status: Won't Fix → Invalid
status: Invalid → Won't Fix
tags: added: 8.0 release-notes-done
removed: release-notes