[Upgrade] Upgrade of Fuel Master 6.1 till 7.0 fails during command execution in postgres docker container

Bug #1494640 reported by Sergey Novikov
This bug affects 3 people
Affects             Status        Importance  Assigned to       Milestone
Fuel for OpenStack  Fix Released  Critical    slava valyavskiy
7.0.x               Fix Released  Critical    Matthew Mosesohn
8.0.x               Fix Released  Critical    slava valyavskiy

Bug Description

Upgrading Fuel Master from 6.1 to 7.0 fails with:

2015-09-11 08:56:45 DEBUG 11021 (docker_engine) Start container "873bc7cd9d96237da6fa616680ab3261081e3bcdf1065525c7cb085098d92117": {'links': None, 'network_mode': 'host', 'binds': {'/etc/fuel': {'bind': '/etc/fuel', 'ro': False}, '/var/lib/fuel/container_data/7.0/postgres': {'bind': '/var/lib/pgsql', 'ro': False}, '/var/lib/fuel_upgrade/7.0': {'bind': '/tmp/upgrade', 'ro': True}, '/etc/yum.repos.d': {'bind': '/etc/yum.repos.d', 'ro': False}, '/var/www/nailgun': {'bind': '/var/www/nailgun', 'ro': False}, '/var/log/docker-logs': {'bind': '/var/log', 'ro': False}}, 'volumes_from': [], 'port_bindings': {'5432': [('127.0.0.1', 5432), ('192.168.2.221', 5432)]}, 'privileged': False}
2015-09-11 08:56:45 DEBUG 11021 (utils) Execute command "dockerctl shell 873bc7cd9d96237da6fa616680ab3261081e3bcdf1065525c7cb085098d92117 su postgres -c "psql -f /tmp/upgrade/pg_dump_all.sql postgres""
2015-09-11 08:56:45 DEBUG 11021 (utils) Stdout and stderr of command "dockerctl shell 873bc7cd9d96237da6fa616680ab3261081e3bcdf1065525c7cb085098d92117 su postgres -c "psql -f /tmp/upgrade/pg_dump_all.sql postgres"":
2015-09-11 08:56:45 DEBUG 11021 (utils) psql: could not connect to server: No such file or directory
2015-09-11 08:56:45 DEBUG 11021 (utils) Is the server running locally and accepting
2015-09-11 08:56:45 DEBUG 11021 (utils) connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
....
2015-09-11 08:58:52 ERROR 11021 (upgrade) DockerUpgrader: failed to upgrade: "Shell command executed with "2" exit code: dockerctl shell 873bc7cd9d96237da6fa616680ab3261081e3bcdf1065525c7cb085098d92117 su postgres -c "psql -f /tmp/upgrade/pg_dump_all.sql postgres" "
Traceback (most recent call last):
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/upgrade.py", line 82, in run
    upgrader.upgrade()
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/engines/docker_engine.py", line 79, in upgrade
    self.create_and_start_new_containers()
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/engines/docker_engine.py", line 274, in create_and_start_new_containers
    self.run_after_container_creation_command(container)
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/engines/docker_engine.py", line 289, in run_after_container_creation_command
    '', retries=30, interval=4)
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/engines/docker_engine.py", line 330, in exec_with_retries
    return func()
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/engines/docker_engine.py", line 285, in execute
    self.exec_cmd_in_container(container['container_name'], command)
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/engines/docker_engine.py", line 297, in exec_cmd_in_container
    utils.exec_cmd("dockerctl shell {0} {1}".format(db_container_id, cmd))
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/utils.py", line 61, in exec_cmd
    _wait_and_check_exit_code(cmd, child)
  File "/var/tmp/upgrade/.fuel-upgrade-venv/lib/python2.6/site-packages/fuel_upgrade/utils.py", line 112, in _wait_and_check_exit_code
    'exit code: {1} '.format(exit_code, cmd))
ExecutedErrorNonZeroExitCode: Shell command executed with "2" exit code: dockerctl shell 873bc7cd9d96237da6fa616680ab3261081e3bcdf1065525c7cb085098d92117 su postgres -c "psql -f /tmp/upgrade/pg_dump_all.sql postgres"

Steps to reproduce:

1. Deploy Fuel master 6.1
2. Upgrade Fuel master to 7.0

Expected result: the upgrade completes successfully

Actual result: the upgrade fails

Fuel master version:
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "18"
  build_id: "2015-06-29_20-18-55"
  nailgun_sha: "99ec40817eb75510b13c00206e2698d47b5f86aa"
  python-fuelclient_sha: "4fc55db0265bbf39c369df398b9dc7d6469ba13b"
  astute_sha: "1ea8017fe8889413706d543a5b9f557f5414beae"
  fuel-library_sha: "2e7a08ad9792c700ebf08ce87f4867df36aa9fab"
  fuel-ostf_sha: "8fefcf7c4649370f00847cc309c24f0b62de718d"
  fuelmain_sha: "a3998372183468f56019c8ce21aa8bb81fee0c2f"

The upgrade tarball version matches RC2:

fuel-7.0-upgrade-288-2015-09-08_11-57-46.tar.lrz

Changed in fuel:
milestone: none → 7.0
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Serge, thank you for the report. Could you please attach a diagnostic snapshot here, or at least the logs from the master node related to the upgrade and the nailgun container?

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
importance: Undecided → High
summary: [Upgrade] Upgrade of Fuel Master 6.1 till 7.0 fails during command
- execution in posgres docker container
+ execution in postgres docker container
Changed in fuel:
status: New → Incomplete
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Folks

Our containers are stateless. Why are we doing anything within the postgres container in this case? We should be able to reuse the database that resides on the main host.

Changed in fuel:
importance: High → Critical
Revision history for this message
Sergey Novikov (snovikov) wrote :
Changed in fuel:
status: Incomplete → New
Changed in fuel:
status: New → Confirmed
Revision history for this message
slava valyavskiy (slava-val-al) wrote :

Guys, it seems that the postgresql manifests are trying to fetch something from a public repository, but we have no external access in our lab. This part of the postgresql manifest looks suspicious to me: https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/postgresql/manifests/repo/apt_postgresql_org.pp

Revision history for this message
slava valyavskiy (slava-val-al) wrote :

Folks, we have found that while any container is being deployed, the package manager tries to update packages from our upstream repository, 'mirror.fuel-infra.org'. It tries several times, and each attempt fails on a timeout, so extra time is spent on this step. The fuel-web upgrade tool does not account for this delay, so the operation fails.

I was able to upgrade my Fuel Master when I changed the timeout to 30 sec, but I guess that is not the right way.
https://github.com/stackforge/fuel-web/blob/master/fuel_upgrade_system/fuel_upgrade/fuel_upgrade/engines/docker_engine.py#L288-L289

Any comments on this?
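
To illustrate the retry budget discussed here: the traceback in the description shows the post-creation command being run through exec_with_retries with retries=30 and interval=4, and the log shows about two minutes between the first attempt (08:56:45) and the final error (08:58:52). The following is a minimal standalone sketch of such a retry loop, not the actual fuel_upgrade code; the function bodies, the `sleep` parameter, and `retry_budget_seconds` are assumptions made for illustration.

```python
import time


def exec_with_retries(func, retries=30, interval=4, sleep=time.sleep):
    """Call func() until it succeeds or the retry budget is used up.

    Hypothetical re-implementation for illustration only; the real
    fuel_upgrade helper may differ in signature and behaviour.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except Exception as exc:  # broad on purpose: any failure is retried
            last_error = exc
            # Do not sleep after the final attempt.
            if attempt < retries - 1:
                sleep(interval)
    raise last_error


def retry_budget_seconds(retries, interval):
    """Time spent sleeping between attempts if every attempt fails."""
    return (retries - 1) * interval


# With the values from the traceback, the loop sleeps for at most
# 29 * 4 = 116 seconds in total, plus the run time of each attempt.
budget = retry_budget_seconds(30, 4)  # 116
```

If the database inside the container needs longer than this budget to come up (for example because package updates are timing out first), every attempt fails with the same psql connection error and the upgrade aborts.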

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Folks, the attached logs refer to a 6.0 to 6.1 upgrade. Are we sure that the issue still persists for the 6.1 -> 7.0 case?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/222616

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → slava valyavskiy (slava-val-al)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/222650

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Folks, from what I can see, this bug can be worked around by disabling the update repos before the upgrade and re-enabling them after the upgrade is finished.

The set of repos for the master node is specified here:

https://github.com/stackforge/fuel-web/blob/master/fuel_upgrade_system/fuel_upgrade/fuel_upgrade/config.py#L604-L619

We need to switch the "enabled" option to '0' before we start the containers and then switch it back to '1' right before the upgrade script exits.
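
As a rough sketch of this workaround, the snippet below flips the "enabled" option in the text of a yum .repo file. It is a standalone Python 3 illustration and not code from fuel_upgrade; the function name `set_repos_enabled`, the repository section name, and the base URL are hypothetical.

```python
import configparser
import io


def set_repos_enabled(repo_text, enabled):
    """Return repo_text with enabled=0 or enabled=1 set in every section.

    Assumes the input is an INI-style yum .repo file. Comments in the
    input are not preserved by configparser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(repo_text)
    for section in parser.sections():
        parser.set(section, "enabled", "1" if enabled else "0")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical repo definition used only for this example.
SAMPLE_REPO = """\
[example-updates]
name=Example updates
baseurl=http://mirror.example.invalid/updates/
gpgcheck=0
enabled=1
"""

disabled_text = set_repos_enabled(SAMPLE_REPO, enabled=False)
restored_text = set_repos_enabled(disabled_text, enabled=True)
```

In this sketch the caller would write `disabled_text` before the containers start and `restored_text` right before the upgrade script exits.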

Revision history for this message
Alexander Kislitsky (akislitsky) wrote :

It is really strange to add disabled repos in one upgrade engine and enable them on exit from another engine. I think we should create a separate engine for adding repos, and it should be started after the Docker upgrade engine.

Changed in fuel:
assignee: Alexander Kislitsky (akislitsky) → Matthew Mosesohn (raytrac3r)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/223085

Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → slava valyavskiy (slava-val-al)
Changed in fuel:
assignee: slava valyavskiy (slava-val-al) → Matthew Mosesohn (raytrac3r)
Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → slava valyavskiy (slava-val-al)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/223139

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/223075
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=c6ac49bb806cefbabc8ecc217726e9ad152ab470
Submitter: Jenkins
Branch: master

commit c6ac49bb806cefbabc8ecc217726e9ad152ab470
Author: Matthew Mosesohn <email address hidden>
Date: Mon Sep 14 15:01:58 2015 +0300

    Reduce yum timeout inside containers

    Changed yum configuration inside Docker containers:
    * Reduced retries to 5 (was 10)
    * Reduced timeout to 5 (was 30)

    This enables containers to fail connections to external
    repositories if network connectivity is not working in
    uncommon situations where DNS works, but HTTP connections
    fail.

    Change-Id: I06e5514157ed7bf143ac3738fd7af23ba383fdaa
    Closes-Bug: #1494640
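
    A back-of-the-envelope illustration of why these values matter, using the numbers from the commit message above. It assumes, as a simplification, that every retry waits for the full timeout and that the timeout is in seconds; real yum behaviour (mirrors, multiple repositories, DNS delays) can differ. The function name is hypothetical.

```python
def worst_case_wait_seconds(retries, timeout):
    """Upper bound on time spent on one unreachable repository,
    assuming each retry blocks for the full timeout."""
    return retries * timeout


before = worst_case_wait_seconds(retries=10, timeout=30)  # 300
after = worst_case_wait_seconds(retries=5, timeout=5)     # 25
```

    Under these assumptions, the old settings could block a container for about five minutes per unreachable repository, while the new ones give up after about 25 seconds.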

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/223085
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8e9a9ae51abbbd4edef1432809311004461eec94
Submitter: Jenkins
Branch: stable/7.0

commit 8e9a9ae51abbbd4edef1432809311004461eec94
Author: Matthew Mosesohn <email address hidden>
Date: Mon Sep 14 15:01:58 2015 +0300

    Reduce yum timeout inside containers

    Changed yum configuration inside Docker containers:
    * Reduced retries to 5 (was 10)
    * Reduced timeout to 5 (was 30)

    This enables containers to fail connections to external
    repositories if network connectivity is not working in
    uncommon situations where DNS works, but HTTP connections
    fail.

    Change-Id: I06e5514157ed7bf143ac3738fd7af23ba383fdaa
    Closes-Bug: #1494640

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Alexander Kislitsky (<email address hidden>) on branch: master
Review: https://review.openstack.org/223139
Reason: Wrong solution

tags: added: on-verification
Revision history for this message
Sergey Novikov (snovikov) wrote :

Verified on fuel-7.0-upgrade-298-2015-09-17_20-02-11.tar.lrz

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

This is a known issue for the 6.1->7.0 upgrade. The "fix" is to wait up to 10 minutes after a failed upgrade for the rollback to finish and all containers to start, so that the Fuel UI and CLI work.

Revision history for this message
Nikita Marchenko (nmarchenko) wrote :

Verified on the 297 tarball

tags: removed: on-verification
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Valyavskiy Viacheslav (<email address hidden>) on branch: master
Review: https://review.openstack.org/222616
Reason: The fix provided by Matthew M. has resolved the issue.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (stable/7.0)

Change abandoned by Vladimir Kuklin (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/222650

Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Revision history for this message
Veronica Krayneva (vkrayneva) wrote :
tags: added: rca-done