Overcloud deploy fails when not using the standard "overcloud" stack name

Bug #1867798 reported by Cédric Jeanneret
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Rabi Mishra

Bug Description

Hello,

It seems there's a new regression, probably located in tripleoclient. It's a new iteration of https://bugs.launchpad.net/tripleo/+bug/1867370 - but it seems to happen only when we deploy the overcloud directly, without manually calling config-download.

Here's the trace:
Tuesday 17 March 2020 15:18:52 +0000 (0:00:00.669) 0:00:01.871 *********
===============================================================================
Grant privileges to the execution user ---------------------------------- 0.67s
Ensure access path exists ----------------------------------------------- 0.54s
Check for required inputs ----------------------------------------------- 0.30s
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 34, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 187, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 994, in take_action
    in_flight_validations=parsed_args.inflight
  File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 397, in config_download
    work_dir = download.run(context=context)
  File "/usr/lib/python3.6/site-packages/tripleo_common/actions/config.py", line 124, in run
    self.work_dir)
  File "/usr/lib/python3.6/site-packages/tripleo_common/utils/swift.py", line 61, in download_container
    objects = swiftclient.get_container(container)[1]
  File "/usr/lib/python3.6/site-packages/swiftclient/client.py", line 1829, in get_container
    query_string=query_string)
  File "/usr/lib/python3.6/site-packages/swiftclient/client.py", line 1748, in _retry
    service_token=self.service_token, **kwargs)
  File "/usr/lib/python3.6/site-packages/swiftclient/client.py", line 1002, in get_container
    raise ClientException.from_response(resp, 'Container GET failed', body)
swiftclient.exceptions.ClientException: Container GET failed: https://192.168.24.2:13808/v1/AUTH_8bf8f53798cb40f2bd97dc6e236ce0fc/overcloud-0-config?format=json 404 Not Found [first 60 chars of response] b'<html><h1>Not Found</h1><p>The resource could not be found.<'
Container GET failed: https://192.168.24.2:13808/v1/AUTH_8bf8f53798cb40f2bd97dc6e236ce0fc/overcloud-0-config?format=json 404 Not Found [first 60 chars of response] b'<html><h1>Not Found</h1><p>The resource could not be found.<'
[CentOS-8.1 - stack@undercloud ~]$

Indeed, when we check swift containers, we can see:
(undercloud) [CentOS-8.1 - stack@undercloud ~]$ openstack container list
+-------------------------+
| Name                    |
+-------------------------+
| __cache__               |
| overcloud               |
| overcloud-0             |
| overcloud-0-messages    |
| overcloud-0-swift-rings |
| overcloud-config        |
| overcloud-messages      |
+-------------------------+

Here, we lack the "overcloud-0-config" container.
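
The check above can be sketched in a few lines of Python. This is only an illustration: the "<stack>-config" naming pattern is an assumption inferred from the 404 URL in the trace and from the container listing, and both function names are hypothetical, not part of tripleoclient or tripleo-common.

```python
def expected_config_container(stack_name):
    """Return the Swift container name config-download appears to read.

    Assumption: the "<stack>-config" pattern is inferred from the 404 URL
    and the container listing in this report, not from the tripleo source.
    """
    return "{}-config".format(stack_name)


def missing_config_container(stack_name, containers):
    """Return the expected container name if it is absent, else None."""
    expected = expected_config_container(stack_name)
    return None if expected in containers else expected


# Container names copied from the `openstack container list` output above.
containers = [
    "__cache__",
    "overcloud",
    "overcloud-0",
    "overcloud-0-messages",
    "overcloud-0-swift-rings",
    "overcloud-config",
    "overcloud-messages",
]

print(missing_config_container("overcloud-0", containers))  # overcloud-0-config
print(missing_config_container("overcloud", containers))    # None
```

With the listing from this report, only the default "overcloud" stack has its config container, which matches the 404 on overcloud-0-config.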

Steps to reproduce:
1. get your undercloud
2. deploy using the standard CLI call, passing "--stack my-special-name" and without any "--stack-only"
3. wait for the error to show (it appears quickly once the Heat stack creation finishes)

Changed in tripleo:
importance: Undecided → Critical
Cédric Jeanneret (cjeanner) wrote :

Note: it DOES work when calling --stack-only and calling the ansible-playbook command, as documented here:
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployment/ansible_config_download.html#manual-config-download

This tends to point to a regression somewhere in python-tripleoclient, imho.

John Fulton (jfulton-org) wrote :

We should update the file produced by the CI so that our multinode job uses a stack name other than "overcloud". That would have let the CI catch this scenario.

https://10769f0369a546e0b81a-f2cb8655ffdf7ed1058c2e519d278ca2.ssl.cf2.rackcdn.com/713399/1/check/tripleo-ci-centos-8-containers-multinode/95c9a41/logs/undercloud/home/zuul/overcloud-deploy.sh

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/713483

Changed in tripleo:
assignee: nobody → Rabi Mishra (rabi)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/713483
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=520ae1d28e2c76920ab59faaa0ca3a0b9accf5fe
Submitter: Zuul
Branch: master

commit 520ae1d28e2c76920ab59faaa0ca3a0b9accf5fe
Author: Rabi Mishra <email address hidden>
Date: Tue Mar 17 21:57:05 2020 +0530

    Pass container_config to get_config()

    Change-Id: I1fbe7deaa0eb0106d2e81fa9734088fccf2a3b81
    Closes-Bug: #1867798
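
The commit message is terse, so here is a hedged sketch of the kind of failure it seems to address. Everything in it is hypothetical: FakeSwift, the default container value, download_config() and deploy() are stand-ins invented for illustration, and the get_config() shown here is not the real tripleoclient function. It only models one plausible reading of the report: the config is written to a default container unless container_config is passed, while the download step looks in "<stack>-config".

```python
DEFAULT_CONTAINER = "overcloud-config"  # hypothetical default value


class FakeSwift:
    """In-memory stand-in for Swift: container name -> stored payload."""

    def __init__(self):
        self.containers = {}


def get_config(swift, stack, container_config=DEFAULT_CONTAINER):
    """Hypothetical producer: stores the generated config in a container."""
    swift.containers[container_config] = "config for {}".format(stack)


def download_config(swift, stack):
    """Hypothetical consumer: always reads from "<stack>-config"."""
    return swift.containers["{}-config".format(stack)]


def deploy(stack, pass_container):
    """Run producer then consumer, optionally passing container_config."""
    swift = FakeSwift()
    if pass_container:
        get_config(swift, stack, container_config="{}-config".format(stack))
    else:
        get_config(swift, stack)  # silently falls back to "overcloud-config"
    return download_config(swift, stack)


print(deploy("overcloud", pass_container=False))    # works by coincidence
print(deploy("overcloud-0", pass_container=True))   # works once the name is passed
try:
    deploy("overcloud-0", pass_container=False)
except KeyError as exc:
    print("missing container:", exc)                # analogous to the 404 above
```

In this model the default stack name hides the problem because the fallback container happens to match, which would also explain why a CI job that only uses "overcloud" did not catch it.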

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 13.2.0

This issue was fixed in the openstack/python-tripleoclient 13.2.0 release.
