Keystone answers with Bad Gateway (HTTP 502) after rollback

Bug #1495933 reported by Egor Kotko
This bug affects 1 person
Affects             Status     Importance  Assigned to       Milestone
Fuel for OpenStack  Invalid    Medium      Matthew Mosesohn
7.0.x               Won't Fix  Medium      Matthew Mosesohn

Bug Description

https://product-ci.infra.mirantis.net/job/7.0.system_test.ubuntu.rollback_one_controller/65/testReport/%28root%29/rollback_automatically_ha_one_controller/rollback_automatically_ha_one_controller/

Scenario:
1. Revert snapshot with deploy Neutron VXLAN env
2. Add a raise-exception statement to the docker_engine.py file
3. Run upgrade on master
4. Check that rollback starts automatically
5. Check that cluster was not upgraded
6. Run network verification
7. Run OSTF
8. Add 1 ceph node and re-deploy cluster
9. Run OSTF

Traceback on "fuel node" command:
http://paste.openstack.org/show/462779/

keystone-all.log contains:
http://paste.openstack.org/show/462813/

After 10 minutes the cluster starts working normally.

{"build_id": "2015-06-19_13-02-31", "build_number": "525", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-06-19_13-02-31", "build_number": "525", "api": "1.0", "fuel-library_sha": "2e7a08ad9792c700ebf08ce87f4867df36aa9fab", "nailgun_sha": "dbd54158812033dd8cfd7e60c3f6650f18013a37", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "4fc55db0265bbf39c369df398b9dc7d6469ba13b", "astute_sha": "1ea8017fe8889413706d543a5b9f557f5414beae", "fuel-ostf_sha": "8fefcf7c4649370f00847cc309c24f0b62de718d", "release": "6.1", "fuelmain_sha": "a3998372183468f56019c8ce21aa8bb81fee0c2f"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "2e7a08ad9792c700ebf08ce87f4867df36aa9fab", "nailgun_sha": "dbd54158812033dd8cfd7e60c3f6650f18013a37", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "4fc55db0265bbf39c369df398b9dc7d6469ba13b", "astute_sha": "1ea8017fe8889413706d543a5b9f557f5414beae", "fuel-ostf_sha": "8fefcf7c4649370f00847cc309c24f0b62de718d", "release": "6.1", "fuelmain_sha": "a3998372183468f56019c8ce21aa8bb81fee0c2f"}

Egor Kotko (ykotko) wrote :
Vladimir Kuklin (vkuklin) wrote :

I think the issue is that we raise an exception during rollback and the keystone container is not removed. Is this the case?

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
status: New → Confirmed
Egor Kotko (ykotko)
Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Fuel Python Team (fuel-python)
Matthew Mosesohn (raytrac3r) wrote :

http://paste.openstack.org/show/MIHIp4pIKPAFjHQ9Bmpx/

After a deeper look, this is actually not fuel-upgrade's fault. It seems Docker killed the container after it became unresponsive, but the port mapping failed. However, netstat.txt in the logs doesn't support that conclusion. I'm now waiting for Egor to bring back the environment.

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Matthew Mosesohn (raytrac3r)
Matthew Mosesohn (raytrac3r) wrote :

Regarding 7.0, this is rather tricky to fix, and the issue actually goes away on its own. Something is keeping port 35357 bound, but it isn't an obvious running process. Let's postpone this to 8.0 since it's not a critical blocker. It only affects failed upgrades, and keystone recovers on its own without any intervention.

For the QA team: please raise the timeout to 10 minutes if keystone is unavailable after rollback. In our tests it takes 4-7 minutes for keystone to recover and start normally.
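
As a rough illustration of the longer timeout, here is a minimal Python sketch of a wait loop with a 10-minute ceiling. It is not taken from fuel-qa; the function names, the 10-second polling interval, and the plain TCP connect check against port 35357 are all assumptions made for the example. A real test would likely probe the Keystone API itself rather than just the socket.

```python
import socket
import time


def keystone_admin_port_open(host, port=35357, connect_timeout=5.0):
    """Hypothetical probe: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_until_available(probe, timeout=600.0, interval=10.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or `timeout` seconds have elapsed.

    Returns the number of seconds waited on success and raises
    TimeoutError otherwise. `clock` and `sleep` are injectable so the
    loop can be exercised without real waiting.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(
                "service still unavailable after %.0f seconds" % timeout)
        sleep(interval)
```

A caller could use it as `wait_until_available(lambda: keystone_admin_port_open(master_ip))`, where `master_ip` is a placeholder for the Fuel master address. The default of 600 seconds leaves some margin over the 4-7 minutes observed above.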

For release notes:
There is a known issue when a Fuel 6.1 -> 7.0 upgrade fails. One of Keystone's ports fails to bind and the service fails to start when Fuel Upgrade reverts the environment back to 6.1. This issue can be worked around by waiting approximately 10 minutes and then verifying normal operation via ``dockerctl check all``.
Note that this impacts only users who attempt a 6.1 -> 7.0 upgrade and the upgrade fails for some reason.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-docs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/223732

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-docs (master)

Reviewed: https://review.openstack.org/223732
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=63a15bb8e9b3d69785d8dde81bce96cbfc89aabb
Submitter: Jenkins
Branch: master

commit 63a15bb8e9b3d69785d8dde81bce96cbfc89aabb
Author: evkonstantinov <email address hidden>
Date: Tue Sep 15 21:07:12 2015 +0300

    Add upgrade issue to relnotes

    Change-Id: I13a8113942814e310ab948ff22ab0d2230665f7e
    Related-bug:#1495933

Dmitry Pyzhov (dpyzhov)
tags: added: tricky
Matthew Mosesohn (raytrac3r) wrote :

For 8.0, this may be resolved when we upgrade to CentOS 7.0 and install docker >1.6. I'll check on that timeline and update this bug soon.

Matthew Mosesohn (raytrac3r) wrote :

Can't reproduce yet, moving to incomplete.

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/8.0.x
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Dmitry Klenov (dklenov) wrote :

@Yegor, can you please try to reproduce this issue on 8.0 master? If it is not reproducible, let's move it from incomplete to invalid.

Egor Kotko (ykotko) wrote :

The case was not reproduced on 8.0 CI, but it is still sometimes possible to get an environment with a similar problem:
keystone.common.environment.eventlet_server [-] Could not bind to 0.0.0.0:35357

http://paste.openstack.org/show/478111/
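
For anyone who catches an environment in this state, a quick way to confirm the symptom independently of keystone is to try binding the same address by hand. The following is only a sketch: the helper name is made up for this example, and it has to run in the same network namespace as keystone (inside the container, or on the host if host networking is used) to be meaningful. It reports whether the bind fails and with which errno, but it does not identify what is holding the port.

```python
import errno
import socket


def try_bind(host="0.0.0.0", port=35357):
    """Hypothetical helper: try to bind host:port once.

    Returns None if the bind succeeds, otherwise the symbolic errno
    name of the failure (for example "EADDRINUSE").
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Always release the socket so the probe itself never holds the port.
        sock.close()
```

Calling `try_bind()` with the defaults checks 0.0.0.0:35357, the address from the log line above. SO_REUSEADDR is deliberately not set here; whether keystone's eventlet server sets it is not something this sketch assumes.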

Egor Kotko (ykotko) wrote :
Egor Kotko (ykotko) wrote :
Changed in fuel:
status: Incomplete → New
Dmitry Klenov (dklenov)
Changed in fuel:
status: New → Confirmed
tags: added: blocked
Davanum Srinivas (DIMS) (dims-v) wrote :

The paste shows keystone w/ eventlet. In 8.0, we are using Keystone under Apache, right? So where is this coming from, and why are we not using the "normal" configuration?

Thanks,
Dims

tags: added: team-bugfix
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 8.0 → 9.0
Matthew Mosesohn (raytrac3r) wrote :

Invalid for 9.0 because the upgrade process is quite different.

Changed in fuel:
status: Confirmed → Invalid
tags: removed: blocked