[upgrade] Upgrade script restores an old DB dump when upgrading a second time

Bug #1349833 reported by Artem Panchenko
This bug affects 2 people
Affects               Status         Importance  Assigned to          Milestone
Fuel for OpenStack    Invalid        High        Andrey Sledzinskiy
  5.0.x               Fix Committed  High        Evgeniy L
  6.0.x               Invalid        High        Andrey Sledzinskiy

Bug Description

Steps to reproduce:

1. Run the upgrade from 5.0 to 5.1 and interrupt it while the new 5.1 containers are being launched (in my case it failed because of https://bugs.launchpad.net/fuel/+bug/1349287). The automatic rollback successfully recovered the old 5.0 container and everything works fine.
2. Run the upgrade from 5.0 to 5.0.1; it succeeds.
3. Deploy a new environment using the 5.0.1 release.
4. Run the upgrade from 5.0.1 to 5.1; it succeeds. Check the environment deployed in step #3.

Expected result:

- the environment exists and the cluster works fine

Actual result:

- the created environment doesn't exist (in the nailgun DB)

When you start a Fuel upgrade to some version X.Y, the upgrade script copies /etc/fuel/version.yaml to the /var/lib/fuel_upgrade/X.Y/ directory and uses it for further upgrades. During the upgrade from 5.0.1 to 5.1 (step #4), the script tried to dump the PostgreSQL database from the old 5.0 container, but that container was down (the 5.0.1 containers were running), so it restored the already existing DB dump instead. Here is the relevant part of the upgrade log:

2014-07-29 10:53:01 DEBUG 43074 (docker_engine) Backup database
2014-07-29 10:53:01 DEBUG 43074 (docker_engine) Failed to make database dump, will be used dump from previous run: Cannot find running container with name "fuel-core-5.0-postgres"
2014-07-29 10:53:01 DEBUG 43074 (utils) Check if file "/var/lib/fuel_upgrade/5.1/pg_dump_all.sql.1" matches to pattern "['-- PostgreSQL database cluster dump', '-- PostgreSQL database dump', '-- PostgreSQL database dump complete', '-- PostgreSQL database cluster dump complete']"
2014-07-29 10:53:01 DEBUG 43074 (utils) Creating hardlink "/var/lib/fuel_upgrade/5.1/pg_dump_all.sql.1" -> "/var/lib/fuel_upgrade/5.1/pg_dump_all.sql" [overwrite=1]
2014-07-29 10:53:01 DEBUG 43074 (utils) Remove file "/var/lib/fuel_upgrade/5.1/pg_dump_all.sql"
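
The fallback visible in this log can be modelled roughly as follows. This is a minimal sketch, not the fuel_upgrade code: `backup_database`, `has_dump_markers` and the `make_dump` callback are hypothetical names, and reading "matches to pattern" as "every marker line appears in the file" is an assumption. It shows why a leftover pg_dump_all.sql.1 from an earlier run is restored silently when the fresh dump fails.

```python
import os
import tempfile

# Marker lines copied from the log above. The real matching rule is not
# shown in the log; "all markers present" is an assumption of this sketch.
DUMP_MARKERS = [
    '-- PostgreSQL database cluster dump',
    '-- PostgreSQL database dump',
    '-- PostgreSQL database dump complete',
    '-- PostgreSQL database cluster dump complete',
]


def has_dump_markers(path):
    """Return True if the file exists and contains every marker line."""
    if not os.path.isfile(path):
        return False
    with open(path) as f:
        content = f.read()
    return all(marker in content for marker in DUMP_MARKERS)


def backup_database(make_dump, dump_path):
    """Hypothetical dump step with fallback to a previous run's dump.

    make_dump(path) writes a fresh dump to `path` or raises if the
    postgres container is not running. Returns (dump_path, fresh), where
    `fresh` is False when a dump left by an earlier run was reused.
    """
    tmp_path = dump_path + '.1'
    fresh = True
    try:
        make_dump(tmp_path)
    except Exception:
        # The problematic branch: whatever tmp_path already holds is reused
        # without checking which release produced it.
        fresh = False
    if not has_dump_markers(tmp_path):
        raise RuntimeError('no usable database dump at %s' % tmp_path)
    # Replace dump_path with a hardlink to the verified file (overwrite=1).
    if os.path.exists(dump_path):
        os.remove(dump_path)
    os.link(tmp_path, dump_path)
    return dump_path, fresh


# Demo: a dump left behind by the interrupted 5.0 -> 5.1 run.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'pg_dump_all.sql')
with open(target + '.1', 'w') as f:
    f.write('\n'.join(DUMP_MARKERS) + '\n')


def container_missing(path):
    raise RuntimeError('Cannot find running container')


restored, fresh = backup_database(container_missing, target)
print(fresh)  # False: the old dump is what gets restored
```

The marker check only proves the file is a complete dump, not that it belongs to the release being upgraded, so it cannot catch this case.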

I think we should remove the /var/lib/fuel_upgrade/X.Y/version.yaml file after a successful upgrade to version X.Y.
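
The root cause is how the saved version file is read back on the next run for the same target. The sketch below is a simplified model, not the upgrade script itself: `resolve_from_version` and `postgres_container` are hypothetical helpers, and the real version.yaml is a YAML document rather than the bare version string stored here. The container naming pattern is taken from the log above.

```python
import os
import tempfile


def resolve_from_version(working_dir, current_version):
    """Return the release to upgrade from (hypothetical sketch).

    The first run for a target saves the running release to
    working_dir/version.yaml; every later run for the same target
    trusts that file instead of the running system.
    """
    saved = os.path.join(working_dir, 'version.yaml')
    if os.path.exists(saved):
        with open(saved) as f:
            return f.read().strip()
    if not os.path.isdir(working_dir):
        os.makedirs(working_dir)
    with open(saved, 'w') as f:
        f.write(current_version)
    return current_version


def postgres_container(from_version):
    # Naming pattern as seen in the upgrade log.
    return 'fuel-core-%s-postgres' % from_version


root = tempfile.mkdtemp()
target_51 = os.path.join(root, '5.1')

# Step 1: the interrupted 5.0 -> 5.1 run saves "5.0" and never cleans up.
resolve_from_version(target_51, '5.0')
# Step 4: 5.0.1 is now running, but the stale file still wins.
from_version = resolve_from_version(target_51, '5.0.1')
print(postgres_container(from_version))  # fuel-core-5.0-postgres
```

In this model, deleting only the target's own file after a successful run (5.0.1/version.yaml after step 2) would leave 5.1/version.yaml in place, which is consistent with the committed fix below clearing the saved version files of all upgrades.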

Tags: upgrade
tags: added: upgrade
Changed in fuel:
importance: Undecided → High
Evgeniy L (rustyrobot) wrote :

Removed from 5.0 because it's not critical.

no longer affects: fuel/5.0.x
Changed in fuel:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/110900

Evgeniy L (rustyrobot)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Evgeniy L (rustyrobot)
status: Confirmed → In Progress
Evgeniy L (rustyrobot) wrote :

I've created a patch in master, but it will only help in similar cases for >5.1 upgrade tarballs. To solve the problem with the 5.0.1 tarball, we need to backport it.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/5.0)

Related fix proposed to branch: stable/5.0
Review: https://review.openstack.org/110916

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (stable/5.0)

Reviewed: https://review.openstack.org/110916
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=dbd29fd8ad27be4f7c88ec7a4bab25a685c1e700
Submitter: Jenkins
Branch: stable/5.0

commit dbd29fd8ad27be4f7c88ec7a4bab25a685c1e700
Author: Evgeniy L <email address hidden>
Date: Thu Jul 31 14:55:06 2014 +0400

    Upgrades, remove saved version file on success

    * created on_success method which upgrade
      script runs if upgrade succeeds, don't
      fail upgrade in case of errors
    * remove saved version files for all
      upgrades from working directories

    It solves several problems:

    1. user runs upgrade 5.0 -> 5.1 which fails
    upgrade system saves version which we upgrade
    from in file working_dir/5.1/version.yaml.
    Then user runs upgrade 5.0 -> 5.0.1 which
    successfully upgraded. Then user runs again
    upgrade 5.0.1 -> 5.1, but there is saved file
    working_dir/5.1/version.yaml which contains
    5.0 version, and upgrade system thinks that
    it's upgrading from 5.0 version, as result
    it tries to make database dump from wrong
    version of container.

    2. without this hack user can run upgrade
    second time and lose his data, this hack
    prevents this case because before upgrade
    checker will use current version instead
    of saved version to determine version which
    we run upgrade from.

    Change-Id: I5e6ae6ba2ae2e60b9812e131d2a7c533f4a38ab6
    Related-bug: #1349833
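
The two bullet points of this commit can be sketched as a post-upgrade hook runner plus a cleanup over every target directory. This is an illustration under assumptions, not the committed code: only the name `on_success` comes from the commit message; its signature, `remove_saved_version_files`, and the `<working_root>/<target>/version.yaml` layout (taken from the paths in the report) are assumptions.

```python
import glob
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def remove_saved_version_files(working_root):
    """Delete <working_root>/*/version.yaml left by any earlier upgrade."""
    for path in glob.glob(os.path.join(working_root, '*', 'version.yaml')):
        os.remove(path)


def on_success(hooks):
    """Run post-upgrade hooks; log errors instead of failing the upgrade."""
    for hook in hooks:
        try:
            hook()
        except Exception:
            logger.exception('on_success hook %r failed, ignoring', hook)


# Demo: stale files from a failed 5.1 attempt and a finished 5.0.1 upgrade.
root = tempfile.mkdtemp()
for target in ('5.0.1', '5.1'):
    os.makedirs(os.path.join(root, target))
    with open(os.path.join(root, target, 'version.yaml'), 'w') as f:
        f.write('5.0')


def broken_hook():
    raise OSError('simulated cleanup failure')


# The failing hook is logged and skipped; the cleanup hook still runs.
on_success([broken_hook, lambda: remove_saved_version_files(root)])
print(glob.glob(os.path.join(root, '*', 'version.yaml')))  # []
```

Swallowing hook errors matches the commit's intent that cleanup problems should not turn a successful upgrade into a failed one; the cost is that a failed cleanup is only visible in the log.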

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/110900
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=001ffbd1562b96cb6bf82ddbd461449901489200
Submitter: Jenkins
Branch: master

commit 001ffbd1562b96cb6bf82ddbd461449901489200
Author: Evgeniy L <email address hidden>
Date: Thu Jul 31 14:55:06 2014 +0400

    Upgrades, remove saved version file on success

    * created on_success method which upgrade
      script runs if upgrade succeeds, don't
      fail upgrade in case of errors
    * remove saved version files for all
      upgrades from working directories

    It solves several problems:

    1. user runs upgrade 5.0 -> 5.1 which fails
    upgrade system saves version which we upgrade
    from in file working_dir/5.1/version.yaml.
    Then user runs upgrade 5.0 -> 5.0.1 which
    successfully upgraded. Then user runs again
    upgrade 5.0.1 -> 5.1, but there is saved file
    working_dir/5.1/version.yaml which contains
    5.0 version, and upgrade system thinks that
    it's upgrading from 5.0 version, as result
    it tries to make database dump from wrong
    version of container.

    2. without this hack user can run upgrade
    second time and lose his data, this hack
    prevents this case because before upgrade
    checker will use current version instead
    of saved version to determine version which
    we run upgrade from.

    Change-Id: I5e6ae6ba2ae2e60b9812e131d2a7c533f4a38ab6
    Related-bug: #1349833

Evgeniy L (rustyrobot)
Changed in fuel:
status: In Progress → Fix Committed
Andrey Sledzinskiy (asledzinskiy) wrote :

The upgrade failed after the following steps:

1. Run the upgrade from 5.0 to 5.1 and make it fail by modifying the /engines/openstack.py file to raise an exception (fuel-5.1-upgrade-11-2014-09-17_21-40-34.tar.lrz).
2. After a successful rollback, run the upgrade from 5.0 to 5.0.1.

Expected: the upgrade succeeds
Actual: the upgrade fails with:
2014-09-23 08:56:51 DEBUG 10307 (utils) Execute command "docker cp fuel-core-5.0-astute:/var/lib/astute /var/lib/fuel_upgrade/5.0.1"
2014-09-23 08:56:51 DEBUG 10307 (utils) Stdout and stderr of command "docker cp fuel-core-5.0-astute:/var/lib/astute /var/lib/fuel_upgrade/5.0.1":
2014-09-23 08:56:51 DEBUG 10307 (utils) 2014/09/23 08:56:51 Error: Could not find the file /var/lib/astute in container fuel-core-5.0-astute
2014-09-23 08:56:51 INFO 10307 (supervisor_client) Stop all services
2014-09-23 08:56:51 ERROR 10307 (upgrade) DockerUpgrader: failed to upgrade: "<Fault 6: 'SHUTDOWN_STATE'>"
Traceback (most recent call last):
  File "/var/upgrade/site-packages/fuel_upgrade/upgrade.py", line 56, in run
    upgrader.upgrade()
  File "/var/upgrade/site-packages/fuel_upgrade/engines/docker_engine.py", line 76, in upgrade
    self.supervisor.stop_all_services()
  File "/var/upgrade/site-packages/fuel_upgrade/supervisor_client.py", line 125, in stop_all_services
    self.supervisor.stopAllProcesses()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1253, in request
    return self._parse_response(h.getfile(), sock)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1392, in _parse_response
    return u.close()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 838, in close
    raise Fault(**self._stack[0])
Fault: <Fault 6: 'SHUTDOWN_STATE'>
2014-09-23 08:56:51 DEBUG 10307 (upgrade) Run rollback

Logs are attached

Andrey Sledzinskiy (asledzinskiy) wrote :
Evgeniy L (rustyrobot) wrote :

Andrey, this looks like a separate issue. It doesn't look like a problem with the database; the problem is that, for some reason, supervisor was shut down.

If it really is a different issue, could you please create a separate ticket for it?
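
When triaging a failure like this, the XML-RPC fault code tells a supervisor shutdown apart from a database problem more reliably than the message text. A minimal sketch, assuming Python 3's `xmlrpc.client` (the traceback shows Python 2.6's `xmlrpclib`, which provides the same `Fault` class); the helper name is hypothetical, and code 6 is the value shown in the traceback above.

```python
import xmlrpc.client  # xmlrpclib on Python 2, as in the traceback above

# Fault code supervisord returned in this report ("<Fault 6: 'SHUTDOWN_STATE'>").
SUPERVISOR_SHUTDOWN_STATE = 6


def is_supervisor_shutting_down(exc):
    """Return True if exc is supervisord's SHUTDOWN_STATE fault."""
    return (isinstance(exc, xmlrpc.client.Fault)
            and exc.faultCode == SUPERVISOR_SHUTDOWN_STATE)


fault = xmlrpc.client.Fault(6, 'SHUTDOWN_STATE')
print(is_supervisor_shutting_down(fault))  # True
```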
