overcloud upgrade fails with error: Timed out waiting for messages from Execution

Bug #1709281 reported by Jose Luis Franco
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Jose Luis Franco

Bug Description

Description
===========
When trying to upgrade an overcloud from Ocata to Pike (with containers), I always get the same error:

(undercloud) [stack@undercloud ~]$ openstack overcloud deploy --templates $THT --libvirt-type qemu --ntp-server pool.ntp.org -e $THT/overcloud-resource-registry-puppet.yaml -e $THT/environments/major-upgrade-composable-steps.yaml -e upgrade_repos.yaml -e $THT/environments/docker.yaml -e $THT/environments/major-upgrade-composable-steps-docker.yaml -e $THT/environments/docker-centos-tripleoupstream.yaml -e docker_registry.yaml
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 8d5ce9b8-e4af-4601-9db6-4f77787f4c35
Waiting for messages on queue '91e97a86-a3e2-4506-b97f-8c46e2fed226' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 073bc4af-bd92-4f1e-8a85-7e5e6741675e
Plan updated.
Processing templates in the directory /tmp/tripleoclient-R0gKJS/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 0d0f53a6-dd75-4542-b817-cf2c3ed8f275
Timed out waiting for messages from Execution (ID: 0d0f53a6-dd75-4542-b817-cf2c3ed8f275, State: RUNNING). The WebSocket timed out before the Workflow completed.

Steps to reproduce
==================
Follow these instructions: http://www.anstack.com/blog/2016/11/28/testing-composable-upgrades.html

1. Deploy Pike undercloud and overcloud using TripleO-quickstart:
bash ./tripleo-quickstart/quickstart.sh --install-deps
export VIRTHOST=127.0.0.2
export CONFIG=~/deploy-config.yaml
bash ./tripleo-quickstart/quickstart.sh --clean --release master --teardown all --tags all -e @$CONFIG $VIRTHOST

2. Delete overcloud:
source ~/stackrc
openstack stack delete overcloud

3. Delete images and load new ones

4. Download tht for ocata and deploy Ocata overcloud (non-HA)
git clone -b stable/ocata https://github.com/openstack/tripleo-heat-templates tht-ocata
openstack overcloud deploy \
  --libvirt-type qemu \
  --ntp-server pool.ntp.org \
  --templates /home/stack/tht-ocata/ \
  -e /home/stack/tht-ocata/overcloud-resource-registry-puppet.yaml

5. Once deployed, download tht-master and load docker images:
git clone https://github.com/openstack/tripleo-heat-templates tht-master
openstack overcloud container image upload --config-file /usr/share/openstack-tripleo-common/container-images/overcloud_containers.yaml

6. Perform overcloud upgrade:
source ~/stackrc
export THT=/home/stack/tht-master
openstack overcloud deploy \
  --templates $THT \
  --libvirt-type qemu \
  --ntp-server pool.ntp.org \
  -e $THT/overcloud-resource-registry-puppet.yaml \
  -e $THT/environments/major-upgrade-composable-steps.yaml \
  -e upgrade_repos.yaml \
  -e $THT/environments/docker.yaml \
  -e $THT/environments/major-upgrade-composable-steps-docker.yaml \
  -e $THT/environments/docker-centos-tripleoupstream.yaml \
  -e docker_registry.yaml

Expected result
===============
Successful upgrade to Pike with containerized services

Actual result
=============
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 073bc4af-bd92-4f1e-8a85-7e5e6741675e
Plan updated.
Processing templates in the directory /tmp/tripleoclient-R0gKJS/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 0d0f53a6-dd75-4542-b817-cf2c3ed8f275
Timed out waiting for messages from Execution (ID: 0d0f53a6-dd75-4542-b817-cf2c3ed8f275, State: RUNNING). The WebSocket timed out before the Workflow completed.

Environment
===========
Dell PowerEdge R720
Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
32059 MB memory, 135026 GB disk space

Jose Luis Franco (jfrancoa) wrote :

After increasing the timeout passed to wait_for_messages in:

https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/workflows/parameters.py#L117

from 60 to 180 seconds, the upgrade proceeds without errors:

(undercloud) [stack@undercloud ~]$ openstack overcloud deploy --templates $THT --libvirt-type qemu --ntp-server pool.ntp.org -e $THT/overcloud-resource-registry-puppet.yaml -e $THT/environments/major-upgrade-composable-steps.yaml -e upgrade_repos.yaml -e $THT/environments/docker.yaml -e $THT/environments/major-upgrade-composable-steps-docker.yaml -e $THT/environments/docker-centos-tripleoupstream.yaml -e docker_registry.yaml
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 050fb650-f1e3-4756-a9a1-1c4d0d9bfcab
Waiting for messages on queue 'e7f83872-f36c-4d33-a097-72ab9460fd0a' with no timeout.
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 9a1fd37f-608f-4c7e-9c77-db9a5a2cae9b
Plan updated.
Processing templates in the directory /tmp/tripleoclient-cCvJFU/tripleo-heat-templates
Started Mistral Workflow tripleo.plan_management.v1.get_deprecated_parameters. Execution ID: 67fdb863-afdb-4e0b-b13b-6ce70f06cb3d
Deploying templates in the directory /tmp/tripleoclient-cCvJFU/tripleo-heat-templates
Started Mistral Workflow tripleo.deployment.v1.deploy_plan. Execution ID: 89d3d292-fe00-44d0-b135-5bbda70f5fd4
2017-08-08 10:49:57Z [ServiceNetMap]: UPDATE_IN_PROGRESS state changed
2017-08-08 10:49:58Z [overcloud-ServiceNetMap-74n4ytnllkaf]: UPDATE_IN_PROGRESS Stack UPDATE started
2017-08-08 10:49:58Z [DeploymentServerBlacklistDict]: CREATE_IN_PROGRESS state changed
2017-08-08 10:49:59Z [overcloud-ServiceNetMap-74n4ytnllkaf.ServiceNetMapValue]: UPDATE_IN_PROGRESS state changed
2017-08-08 10:49:59Z [Networks]: UPDATE_IN_PROGRESS state changed
2017-08-08 10:49:59Z [overcloud-ServiceNetMap-74n4ytnllkaf.ServiceNetMapValue]: UPDATE_COMPLETE state changed
2017-08-08 10:49:59Z [overcloud-ServiceNetMap-74n4ytnllkaf]: UPDATE_COMPLETE Stack UPDATE completed successfully
2017-08-08 10:50:00Z [HorizonSecret]: UPDATE_IN_PROGRESS state changed
2017-08-08 10:50:00Z [PcsdPassword]: UPDATE_IN_PROGRESS state changed
2017-08-08 10:50:00Z [MysqlRootPassword]: UPDATE_IN_PROGRESS state changed
2017-08-08 10:50:00Z [overcloud-Networks-qfmqqouxf3rl]: UPDATE_IN_PROGRESS Stack UPDATE started
2017-08-08 10:50:00Z [RabbitCookie]: UPDATE_IN_PROGRESS state changed
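
For illustration only, the behaviour described in this comment can be reproduced with a small self-contained Python sketch. It is not the python-tripleoclient code: the queue, the fake workflow and the names WorkflowTimeout, run_fake_workflow and try_wait are hypothetical, and the timeout is treated here as one overall deadline, which may differ from how the real client applies it. The point is only that a workflow which needs longer than the client's wait produces a timeout even though it is still RUNNING, and that a longer wait lets the same workflow finish.

```python
import queue
import threading
import time


class WorkflowTimeout(Exception):
    """Raised when no terminal message arrives before the deadline."""


def wait_for_messages(messages, timeout):
    """Yield messages until a terminal status arrives or `timeout` expires.

    Assumption: `timeout` (seconds) is an overall deadline for the wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise WorkflowTimeout("timed out waiting for messages")
        try:
            message = messages.get(timeout=remaining)
        except queue.Empty:
            raise WorkflowTimeout("timed out waiting for messages") from None
        yield message
        if message.get("status") in ("SUCCESS", "FAILED"):
            return


def run_fake_workflow(messages, duration):
    """Stand-in for a workflow: report SUCCESS after `duration` seconds."""
    def worker():
        time.sleep(duration)
        messages.put({"status": "SUCCESS"})

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def try_wait(duration, timeout):
    """Run the fake workflow and wait for it with the given timeout."""
    messages = queue.Queue()
    run_fake_workflow(messages, duration)
    try:
        return [m["status"] for m in wait_for_messages(messages, timeout)]
    except WorkflowTimeout:
        return "timed out"
```

With scaled-down numbers, try_wait(duration=0.3, timeout=0.05) returns "timed out", while try_wait(duration=0.05, timeout=1.0) returns ["SUCCESS"]. This mirrors the 60 versus 180 second case above: the workflow itself does not fail, the client just stops listening too early.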

Saravanan KR (skramaja)
Changed in tripleo:
milestone: none → pike-rc1
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/492094

Changed in tripleo:
assignee: nobody → Jose Luis Franco (jfrancoa)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on python-tripleoclient (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/492094
Reason: Need to clear out the queue as this is going to fail. Will restore momentarily

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.openstack.org/492094
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=66c902a2b6fb0cf2b198bd813d94878c43c90378
Submitter: Jenkins
Branch: master

commit 66c902a2b6fb0cf2b198bd813d94878c43c90378
Author: Jose Luis Franco Arza <email address hidden>
Date: Wed Aug 9 12:52:29 2017 +0200

    Increase timeout in get_deprecated_parameters workflow

    When upgrading from Ocata to Pike with containers
    it seems that for some systems 60 seconds is not
    enough for the workflow to finish. Increasing it
    solves the issue.

    Change-Id: I8fcabf906b38094aca9ce350bc580fdf5875573a
    Closes-Bug: #1709281

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/505166

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/pike)

Reviewed: https://review.openstack.org/505166
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=6f150ffa6e1a4a6bad93bee8dc589c34c1f0f349
Submitter: Jenkins
Branch: stable/pike

commit 6f150ffa6e1a4a6bad93bee8dc589c34c1f0f349
Author: Jose Luis Franco Arza <email address hidden>
Date: Wed Aug 9 12:52:29 2017 +0200

    Increase timeout in get_deprecated_parameters workflow

    When upgrading from Ocata to Pike with containers
    it seems that for some systems 60 seconds is not
    enough for the workflow to finish. Increasing it
    solves the issue.

    Change-Id: I8fcabf906b38094aca9ce350bc580fdf5875573a
    Closes-Bug: #1709281
    (cherry picked from commit 66c902a2b6fb0cf2b198bd813d94878c43c90378)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 7.3.1

This issue was fixed in the openstack/python-tripleoclient 7.3.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 8.0.0

This issue was fixed in the openstack/python-tripleoclient 8.0.0 release.
