'openstack overcloud config download' fails; mistral.actions.action_factory.GetOvercloudConfig expected a character buffer object

Bug #1834094 reported by John Fulton on 2019-06-24
Affects: tripleo
Importance: High
Assigned to: James Slagle

Bug Description

The deployment cannot proceed because the config-download export fails.

The same error can be reproduced by following the manual config-download steps:

https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ansible_config_download.html#manual-config-download

Failure on this step:

(undercloud) [stack@undercloud ~]$ openstack overcloud config download --config-dir config-download
Starting config-download export...
The action raised an exception [action_ex_id=5ac200f3-ff56-4998-a99a-8857a9414a76, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud', u'config_type': None}']
 expected a character buffer object
Exception exporting config-download: The action raised an exception [action_ex_id=5ac200f3-ff56-4998-a99a-8857a9414a76, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud', u'config_type': None}']
 expected a character buffer object
(undercloud) [stack@undercloud ~]$

Environment: the latest master that passed CI on June 24, 2019, on CentOS 7, with a separate undercloud and overcloud (not standalone).

John Fulton (jfulton-org) wrote :

- tripleo client call: https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/v1/overcloud_config.py#L98

- mistral task: https://github.com/openstack/tripleo-common/blob/master/workbooks/deployment.yaml#L615

- failed action's code: https://github.com/openstack/tripleo-common/blob/master/tripleo_common/actions/config.py#L31

- mistral engine log:

[root@undercloud mistral]# tail -1000 engine.log | curl -F 'f:1=<-' ix.io
http://ix.io/1MHH
[root@undercloud mistral]#

- swift list

(undercloud) [stack@undercloud train]$ swift list
__cache__
ov-lz54jdg42cb-0-rcgjn5q2ppko-Controller-jhhn7x55dxfs
ov-taezz52tcd-0-p4pmhsjkjqr7-NovaCompute-4bof3akxu77c
ov-trhpanqjze-0-2q65y6kp4ha3-CephStorage-ta7rcfgvcijz
overcloud
overcloud-messages
overcloud-swift-rings
overcloud_ceph_ansible_fetch_dir
tripleo-validations
(undercloud) [stack@undercloud train]$

Cédric Jeanneret (cjeanner) wrote :

Hitting the same issue as of today (July 1) with the latest packages on CentOS 7.

I can add the following:
Right before the failure, the mistral-executor log shows the trace below.
It looks like either a path is wrong or a file doesn't exist in the Swift container.
This can also be traced back to the following code:
https://github.com/openstack/tripleo-common/blob/master/tripleo_common/utils/config.py#L517

Packages:
on the host:
python2-tripleo-common-11.0.1-0.20190622091455.a20d66f.el7.noarch

In mistral container:
puppet-mistral-15.1.0-0.20190610144005.030a4fe.el7.noarch
python2-mistral-8.1.0-0.20190620140934.f75e719.el7.noarch
openstack-mistral-executor-8.1.0-0.20190620140934.f75e719.el7.noarch
python2-mistral-lib-1.1.0-0.20190306210850.bac92db.el7.noarch
python2-mistralclient-3.9.0-0.20190517090825.de9d2de.el7.noarch
openstack-mistral-common-8.1.0-0.20190620140934.f75e719.el7.noarch
python2-tripleo-common-11.0.1-0.20190622091455.a20d66f.el7.noarch

Log of interest (from /var/log/containers/mistral/executor.log):

2019-07-01 14:21:17.995 8 INFO tripleo_common.actions.ansible [req-34985d26-9879-4842-8aff-01fe77df8081 988f449cc23b4114ae260bc8c022f5ac d1ff8565af7f4b0ca24df34237761a23 - default default] Running ansible-playbook command: ['ansible-playbook-2', '-vvvvv', '/tmp/ansible-mistral-action784EVw/playbook.yaml', '--become', '--become-user', 'root', '--inventory-file', '/tmp/ansible-mistral-action784EVw/inventory.yaml']
2019-07-01 14:21:34.127 8 INFO swiftclient [req-34985d26-9879-4842-8aff-01fe77df8081 988f449cc23b4114ae260bc8c022f5ac d1ff8565af7f4b0ca24df34237761a23 - default default] REQ: curl -i https://192.168.24.2:13808/v1/AUTH_d1ff8565af7f4b0ca24df34237761a23/overcloud-config?format=json -X GET -H "Accept-Encoding: gzip" -H "X-Auth-Token: gAAAAABdGhJeSqV5..."
2019-07-01 14:21:34.129 8 INFO swiftclient [req-34985d26-9879-4842-8aff-01fe77df8081 988f449cc23b4114ae260bc8c022f5ac d1ff8565af7f4b0ca24df34237761a23 - default default] RESP STATUS: 404 Not Found
2019-07-01 14:21:34.129 8 INFO swiftclient [req-34985d26-9879-4842-8aff-01fe77df8081 988f449cc23b4114ae260bc8c022f5ac d1ff8565af7f4b0ca24df34237761a23 - default default] RESP HEADERS: {u'Date': u'Mon, 01 Jul 2019 14:21:34 GMT', u'Content-Length': u'70', u'Content-Type': u'text/html; charset=UTF-8', u'X-Openstack-Request-Id': u'tx5c4f2548a9f240e49071c-005d1a16ee', u'X-Trans-Id': u'tx5c4f2548a9f240e49071c-005d1a16ee'}
2019-07-01 14:21:34.129 8 INFO swiftclient [req-34985d26-9879-4842-8aff-01fe77df8081 988f449cc23b4114ae260bc8c022f5ac d1ff8565af7f4b0ca24df34237761a23 - default default] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2019-07-01 14:21:36.442 8 INFO tripleo_common.utils.config.Config [req-34985d26-9879-4842-8aff-01fe77df8081 988f449cc23b4114ae260bc8c022f5ac d1ff8565af7f4b0ca24df34237761a23 - default default] Generating configuration under the directory: /tmp/tripleo-JSp_J9-config
2019-07-01 14:21:39.656 8 INFO tripleo_common.utils.config.Config [req-34985d26-9879-4842-8aff-01fe77df8081 988f449cc23b4114ae260bc8c022f5ac d1ff8565af7f4b0ca24df34237761a23 - default default] Getting deployment data from Heat...
2...

Changed in tripleo:
assignee: nobody → James Slagle (james-slagle)
status: Triaged → In Progress
Cédric Jeanneret (cjeanner) wrote :

So James' patches do solve this first issue. Once we get past that empty set (or whatever it is), we hit another issue:

TASK [Copy NetworkConfig script] ***********************************************
Tuesday 02 July 2019 07:34:54 +0000 (0:00:00.459) 0:02:32.173 **********
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [overcloud-novacompute-0]: FAILED! => {"changed": false, "msg": "Could not find or access 'Compute/overcloud-novacompute-0/NetworkConfig'\nSearched in:\n\t/var/lib/mistral/overcloud/files/Compute/overcloud-novacompute-0/NetworkConfig\n\t/var/lib/mistral/overcloud/Compute/overcloud-novacompute-0/NetworkConfig\n\t/var/lib/mistral/overcloud/files/Compute/overcloud-novacompute-0/NetworkConfig\n\t/var/lib/mistral/overcloud/Compute/overcloud-novacompute-0/NetworkConfig on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}

So maybe a follow-up to James' patch is to create an empty file?

Cédric Jeanneret (cjeanner) wrote :

Meh... If we create an empty file, we then hit a third issue:

TASK [NetworkConfig stdout] ****************************************************
Tuesday 02 July 2019 08:12:24 +0000 (0:00:01.092) 0:02:33.170 **********
ok: [overcloud-controller-0] => {
    "NetworkConfig_result.stderr_lines": [
        "+ '[' -n '{\"network_config\": [{\"members\": [{\"name\": \"interface_name\", \"primary\": true, \"type\": \"interface\"}], \"name\": \"bridge_name\", \"type\": \"ovs_bridge\", \"use_dhcp\": true}]}' ']'",
        "+ '[' -z '' ']'",
        "+ trap configure_safe_defaults EXIT",
        "++ date +%Y-%m-%dT%H:%M:%S",
        "+ DATETIME=2019-07-02T08:12:18",
        "+ '[' -f /etc/os-net-config/config.json ']'",
        "+ mkdir -p /etc/os-net-config",
        "+ echo '{\"network_config\": [{\"members\": [{\"name\": \"interface_name\", \"primary\": true, \"type\": \"interface\"}], \"name\": \"bridge_name\", \"type\": \"ovs_bridge\", \"use_dhcp\": true}]}'",
        "++ type -t network_config_hook",
        "+ '[' '' = function ']'",
        "+ sed -i s/bridge_name/br-ex/ /etc/os-net-config/config.json",
        "+ sed -i s/interface_name/nic1/ /etc/os-net-config/config.json",
        "+ set +e",
        "+ os-net-config -c /etc/os-net-config/config.json -v --detailed-exit-codes",
        "[2019/07/02 08:12:18 AM] [INFO] Using config file at: /etc/os-net-config/config.json",
        "[2019/07/02 08:12:18 AM] [INFO] Ifcfg net config provider created.",
        "[2019/07/02 08:12:18 AM] [INFO] Not using any mapping file.",
        "[2019/07/02 08:12:19 AM] [INFO] Finding active nics",
        "[2019/07/02 08:12:19 AM] [INFO] eth1 is an embedded active nic",
        "[2019/07/02 08:12:19 AM] [INFO] eth0 is an embedded active nic",
        "[2019/07/02 08:12:19 AM] [INFO] lo is not an active nic",
        "[2019/07/02 08:12:19 AM] [INFO] No DPDK mapping available in path (/var/lib/os-net-config/dpdk_mapping.yaml)",
        "[2019/07/02 08:12:19 AM] [INFO] Active nics are ['eth0', 'eth1']",
        "[2019/07/02 08:12:19 AM] [INFO] nic2 mapped to: eth1",
        "[2019/07/02 08:12:19 AM] [INFO] nic1 mapped to: eth0",
        "[2019/07/02 08:12:19 AM] [INFO] adding bridge: br-ex",
        "[2019/07/02 08:12:19 AM] [INFO] adding interface: eth0",
        "[2019/07/02 08:12:19 AM] [INFO] applying network configs...",
        "[2019/07/02 08:12:19 AM] [INFO] running ifdown on interface: eth0",
        "[2019/07/02 08:12:19 AM] [INFO] running ifdown on bridge: br-ex",
        "[2019/07/02 08:12:19 AM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ex",
        "[2019/07/02 08:12:19 AM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-eth0",
        "[2019/07/02 08:12:19 AM] [INFO] running ifup on bridge: br-ex",
        "[2019/07/02 08:12:19 AM] [INFO] running ifup on interface: eth0",
        "+ RETVAL=2",
        "+ set -e",
        "+ [[ 2 == 2 ]]",
        "+ '[' -f /etc/udev/rules.d/99-dhcp-all-interfaces.rules ']'",
        "+ rm /etc/udev/rules.d/99-dhcp-all-interfaces.rules",
        "+ '[' -f /usr/libexec/os-apply-config/templates/etc/os-net-config/config.json ']'",
        "+ ...

Cédric Jeanneret (cjeanner) wrote :

To get past that last issue, I had to:
- run config download
- edit deploy_steps_playbook.yaml
- add a new condition to the following tasks:
° NetworkConfig stdout
° Write rc of NetworkConfig script
The condition was: NetworkConfig_result is changed

Doing so, in addition to James' patch and my proposal (create an empty file anyway), allowed me to get a successful overcloud deploy.

Reviewed: https://review.opendev.org/668560
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=ecd3738077f54ce5440a66bbb616b1e32295e896
Submitter: Zuul
Branch: master

commit ecd3738077f54ce5440a66bbb616b1e32295e896
Author: James Slagle <email address hidden>
Date: Mon Jul 1 16:55:20 2019 -0400

    Handle empty NetworkConfig

    In some cases the NetworkConfig config resource may come back as an
    empty dict from Heat, such as when using net-config-noop.yaml. In that
    case, we should not attempt to write any config.

    Change-Id: I02129adf193f7a4a8b38411f3a20079d10cfa872
    Related-bug: #1834094
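
A minimal Python sketch of the guard this commit message describes, for illustration only. The helper name write_network_config, its signature, and the file layout are hypothetical and are not taken from tripleo-common. In Python 2, passing a non-string such as a dict to a file's write() raises a TypeError with the "expected a character buffer object" message seen in this report; whether that is the exact call that failed here is an assumption.

```python
import os
import tempfile


def write_network_config(config_dir, server_name, network_config):
    """Write a per-server NetworkConfig script, skipping empty results.

    Hypothetical helper: the name, arguments, and directory layout are
    illustrative assumptions, not the tripleo-common implementation.
    """
    # An empty dict (or None, or an empty string) means there is
    # nothing to write, so return early instead of calling write().
    if not network_config:
        return None

    server_dir = os.path.join(config_dir, server_name)
    if not os.path.isdir(server_dir):
        os.makedirs(server_dir)

    path = os.path.join(server_dir, "NetworkConfig")
    with open(path, "w") as f:
        # network_config is expected to be the script text here;
        # handing a dict to write() would raise TypeError.
        f.write(network_config)
    return path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Empty dict: nothing is written and None is returned.
    print(write_network_config(workdir, "overcloud-novacompute-0", {}))
    # Non-empty script text: the file is created and its path returned.
    print(write_network_config(workdir, "overcloud-controller-0", "#!/bin/bash\n"))
```

The early return keeps the "nothing to configure" case separate from the write path, so callers have to handle a missing file explicitly.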

Reviewed: https://review.opendev.org/668652
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c36433e34e4e613eae59ec254553d3621e9c90f2
Submitter: Zuul
Branch: master

commit c36433e34e4e613eae59ec254553d3621e9c90f2
Author: Cédric Jeanneret <email address hidden>
Date: Tue Jul 2 13:05:31 2019 +0200

    Run NetworkConfig only if configuration script exists

    Script is first located/generated on the undercloud node (aka ansible
    host), meaning we have to use local_action.
    We also deactivate the "become" in order to avoid useless privilege
    escalation.

    Change-Id: I8c1ed334dc5b578a87307a47656ee2d87f1e3688
    Depends-On: https://review.opendev.org/668560
    Related-Bug: #1834094
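
The actual change lives in Ansible tasks in tripleo-heat-templates. The Python below only sketches the underlying check: look for the script on the control host and skip the run when it is absent. The function name is hypothetical, and the role/host/NetworkConfig layout is assumed from the search paths in the Ansible error quoted earlier in this report.

```python
import os
import tempfile


def find_network_config_script(config_dir, role, host):
    """Return the local NetworkConfig script path, or None if missing.

    Hypothetical helper; the <config_dir>/<role>/<host>/NetworkConfig
    layout is an assumption based on the paths in the error message.
    """
    path = os.path.join(config_dir, role, host, "NetworkConfig")
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    config_dir = tempfile.mkdtemp()
    script = find_network_config_script(
        config_dir, "Compute", "overcloud-novacompute-0")
    if script is None:
        print("No NetworkConfig script found; skipping")
    else:
        print("Would run %s" % script)
```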

Change abandoned by Cédric Jeanneret (Tengu) (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/670237
Reason: not needed. Thanks James for confirmation!
