Upgrades, sometimes keystone container fails to sync database

Bug #1364087 reported by Evgeniy L
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Released | High | Evgeniy L |

Bug Description

1. Install Fuel 5.0
2. Run the upgrade to 5.1

Actual result:
Sometimes (about 1 in 50 runs) the upgrade fails on slow environments.
Puppet in the keystone container cannot sync the db [1].
The migration run is not atomic. As a result, the db records that migration 8 [2] was not run, although it was in fact run, because its tables exist [3].
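
The inconsistency described above (tables present, but the version record not advanced) can be detected mechanically. The following is only a minimal sketch: it uses an in-memory SQLite database rather than the real keystone database, and the `migrate_version` layout, the `TABLES_BY_MIGRATION` mapping and the table name `example_table_a` are assumptions made for illustration, not taken from the pastes.

```python
import sqlite3

# Hypothetical mapping: which tables each migration step creates.
TABLES_BY_MIGRATION = {8: ["example_table_a", "example_table_b"]}


def recorded_version(conn):
    """Return the schema version the database claims to be at."""
    row = conn.execute("SELECT version FROM migrate_version").fetchone()
    return row[0]


def existing_tables(conn):
    """Return the names of all tables currently present (SQLite catalog)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


def half_applied_migrations(conn):
    """Return migrations whose tables exist although the recorded version is lower."""
    version = recorded_version(conn)
    present = existing_tables(conn)
    return sorted(
        step
        for step, tables in TABLES_BY_MIGRATION.items()
        if step > version and any(table in present for table in tables)
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE migrate_version (version INTEGER)")
    conn.execute("INSERT INTO migrate_version VALUES (7)")
    # Simulate migration 8 being interrupted after it created a table
    # but before it updated the version record.
    conn.execute("CREATE TABLE example_table_a (id INTEGER)")
    return half_applied_migrations(conn)
```

Here `demo()` reports migration 8 as half applied, which is the state in which a later sync attempt would try to create tables that already exist.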

The upgrade fails with the error:
2014-09-01 15:09:34 INFO 21568 (health_checker) Failed checkers: ['integration_postgres_nailgun_nginx', 'integration_ostf_keystone', 'keystone']

[1] http://paste.openstack.org/show/104284/
[2] http://paste.openstack.org/show/104274/
[3] http://paste.openstack.org/show/104283/

Expected result:
Upgrade passes without errors

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1"
  api: "1.0"
  build_number: "491"
  build_id: "2014-09-01_00-01-17"
  astute_sha: "bc60b7d027ab244039f48c505ac52ab8eb0a990c"
  fuellib_sha: "2cfa83119ae90b13a5bac6a844bdadfaf5aeb13f"
  ostf_sha: "4dcd99cc4bfa19f52d4b87ed321eb84ff03844da"
  nailgun_sha: "d25ed02948a8be773e2bd87cfe583ef7be866bb2"
  fuelmain_sha: "109812be3425408dd7be192b5debf109cb1edd4c"

Tags: upgrade
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/118193

Evgeniy L (rustyrobot)
summary: - Upgrades, sometimes keystone container fails do sync database
+ Upgrades, sometimes keystone container fails to sync database
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/118193
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=0e01a20b3fa8315bc7d891189e4a2c80b86adada
Submitter: Jenkins
Branch: master

commit 0e01a20b3fa8315bc7d891189e4a2c80b86adada
Author: Evgeniy L <email address hidden>
Date: Mon Sep 1 20:53:19 2014 +0400

    Upgrade, increase timeout for containers stopping

    The stop-container call sends signal 9 (SIGKILL)
    after 10 seconds; sometimes this interrupts
    non-atomic operations, such as the keystone
    database migration.
    Increase the timeout to prevent such issues.

    Change-Id: Ia6c09ea0b14fba61c141959a2d065b487d1970ef
    Closes-bug: #1364087
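
The stop-then-kill behaviour that this commit tunes can be illustrated with an ordinary child process. This is a sketch of the general pattern only, not the fuel-web code: `stop_gracefully` and its `timeout` parameter are hypothetical names, and a plain Python subprocess stands in for a container.

```python
import subprocess
import sys


def stop_gracefully(proc, timeout):
    """Ask a process to exit, and kill it only if it does not stop in time.

    A short timeout risks killing the process in the middle of a
    non-atomic operation; a longer one gives it time to finish.
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # forced stop (SIGKILL on POSIX)
        proc.wait()
        return "killed"


def demo():
    # A child that does nothing special on SIGTERM exits promptly.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    return stop_gracefully(child, timeout=30)
```

Raising the timeout only makes the forced kill less likely. It does not make the interrupted operation atomic, which is why the related fix further down avoids stopping new containers at all.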

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/118387

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/118387
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=3ebbe106f1953602d2022297e7249e8a33da2871
Submitter: Jenkins
Branch: master

commit 3ebbe106f1953602d2022297e7249e8a33da2871
Author: Evgeniy L <email address hidden>
Date: Tue Sep 2 16:46:17 2014 +0400

    Upgrades, don't kill new containers during the upgrade

    Stopping new containers during the upgrade
    leads to a lot of errors and race conditions.
    At the same time, containers should be run under
    supervisor because we need to store logs.

    Rewrote upgrade flow:

    * stop old containers
    * upload the images
    * generate supervisor config with autostart False,
      which prevents supervisor from running containers
    * run containers in method `create_and_start_new_containers`
      one by one in the right order
    * regenerate configs for supervisor with autostart
      True, to start all of the containers after supervisor
      restart
    * verify containers

    How it helps:

    * there was a race condition when we were running
      services via supervisor and cleaning iptables
      at the same time; supervisor was not always
      able to start all containers, and as a result we
      could get nat rules with the same port but
      different ip addresses
    * stopping containers could interrupt non-atomic
      actions, like db migration in keystone container
    * since we run containers one by one, we cannot
      run into the ip duplication problem
      during the upgrade

    Related-bug: #1357357
    Related-bug: #1364087
    Closes-bug: #1364054
    Change-Id: I86accb8b2c2fc5a15425e32838a58c9b45022d8d
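
The rewritten flow from this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `run_upgrade`, `render_supervisor_config`, the callbacks, and the `docker start -a` command line are hypothetical placeholders, and the first two steps are only logged rather than performed.

```python
def render_supervisor_config(containers, autostart):
    """Render one supervisor [program:...] section per container."""
    flag = "true" if autostart else "false"
    sections = []
    for name in containers:
        sections.append(
            "[program:{0}]\n"
            "command=docker start -a {0}\n"  # placeholder command line
            "autostart={1}\n".format(name, flag)
        )
    return "\n".join(sections)


def run_upgrade(containers, start_container, write_config):
    """Walk through the upgrade steps in order and return a log of them.

    `start_container` and `write_config` are callbacks supplied by the
    caller, so the ordering can be exercised without docker or supervisor.
    """
    log = ["stop old containers", "upload images"]

    # autostart=false keeps supervisor from starting containers itself.
    write_config(render_supervisor_config(containers, autostart=False))
    log.append("supervisor config written with autostart=false")

    # Start containers one by one, in the given order.
    for name in containers:
        start_container(name)
        log.append("started " + name)

    # autostart=true so that a supervisor restart brings everything up.
    write_config(render_supervisor_config(containers, autostart=True))
    log.append("supervisor config written with autostart=true")

    log.append("verify containers")
    return log
```

Calling `run_upgrade(["postgres", "keystone"], started.append, configs.append)` with two plain lists shows the intended ordering: both containers are started sequentially between the two config writes, so nothing competes with supervisor for the start-up.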

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on fuel-5.1-upgrade-11-2014-09-17_21-40-34.tar.lrz

Changed in fuel:
status: Fix Committed → Fix Released