Heat stack stack-cirros-1 failed to reach CREATE_COMPLETE, actual status: CREATE_FAILED

Bug #1958035 reported by Alexandru Dimofte
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: YU CHENGDE
Milestone: -

Bug Description

Brief Description
-----------------
The creation of heat stack failed on all bare-metal configurations.
This issue is not visible on virtual configurations.

Severity
--------
Critical: System/Feature is not usable due to the defect
I think this is critical because almost all tests executed after this one are failing.

Steps to Reproduce
------------------
Install StarlingX and try to create a heat stack (OS::Nova::Server type), for example as sketched below.
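
A minimal sketch of such a stack (the template is hypothetical, not the one used by the test; it assumes a Glance image named "cirros", an "m1.tiny" flavor, and a tenant network UUID passed as NetID, matching the parameter shown in the failure output below):

# stack-cirros-1.yaml (hypothetical minimal HOT template)
heat_template_version: 2015-10-15
description: Launch an instance with Cirros image.
parameters:
  NetID:
    type: string
resources:
  server:
    type: OS::Nova::Server
    properties:
      image: cirros        # assumed image name
      flavor: m1.tiny      # assumed flavor
      networks:
        - network: { get_param: NetID }

# create the stack, passing the tenant network UUID as NetID
openstack stack create -t stack-cirros-1.yaml --parameter NetID=<net-uuid> stack-cirros-1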

Expected Behavior
------------------
The stack should reach CREATE_COMPLETE.

Actual Behavior
----------------

[2022-01-15 17:48:14,814] 427 DEBUG MainThread ssh.expect :: Output:.
+---------------+--------------------------------------+------------------+-----------------+----------------------+----------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
+---------------+--------------------------------------+------------------+-----------------+----------------------+----------------+
| server | ff091966-4d80-4e13-8246-576d053743d4 | OS::Nova::Server | CREATE_FAILED | 2022-01-15T17:46:26Z | stack-cirros-1 |
+---------------+--------------------------------------+------------------+-----------------+----------------------+----------------+

...
                if fail_ok:
                    LOG.warning(err)
                    return False, err
> raise exceptions.HeatError(err)
E utils.exceptions.HeatError: Heat error.
E Details: Heat stack stack-cirros-1 failed to reach CREATE_COMPLETE, actual status: CREATE_FAILED

+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| id | 083a82f3-20a4-4f89-8ffc-bb354ce4b749 |
| stack_name | stack-cirros-1 |
| description | Launch an instance with Cirros image. |
| creation_time | 2022-01-15T17:46:25Z |
| updated_time | None |
| stack_status | CREATE_FAILED |
| stack_status_reason | Resource CREATE failed: ResourceInError: resources.server: Went to status ERROR due to "Message: No valid host was found. , Code: 500" |
| parameters | NetID: f82228e0-875f-438b-82c8-3f064ca6d260 |
| | OS::project_id: 981f38ba384344cea752f3a1ddf08288 |
| | OS::stack_id: 083a82f3-20a4-4f89-8ffc-bb354ce4b749 |
| | OS::stack_name: stack-cirros-1 |
| | |
| outputs | [] |
| | |
| links | - href: http://heat.openstack.svc.cluster.local/v1/981f38ba384344cea752f3a1ddf08288/stacks/stack-cirros-1/083a82f3-20a4-4f89-8ffc-bb354ce4b749 |
| | rel: self |
| | |
| parent | None |
| disable_rollback | True |
| notification_topics | [] |
| deletion_time | None |
| stack_user_project_id | eb27d3f331d142e8b0ee29856f05bbe1 |
| capabilities | [] |
| tags | None |
| stack_owner | None |
| timeout_mins | None |
+-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
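
The table above appears to be the output of "openstack stack show stack-cirros-1". To drill down from the failed stack to the underlying Nova fault, something like the following can be used (the server UUID is the physical_resource_id from the resource table above; the fault column is only populated for servers in ERROR state):

openstack stack resource list stack-cirros-1
openstack server show ff091966-4d80-4e13-8246-576d053743d4 -c status -c fault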

Reproducibility
---------------
100% reproducible ONLY on bare-metal servers

System Configuration
--------------------
One node system, Two node system, Multi-node system, Dedicated storage

Branch/Pull Time/Commit
-----------------------
master 20220115T042348Z

Last Pass
---------
20220113T023728Z

Timestamp/Logs
--------------
Will be attached

Test Activity
-------------
Sanity

Workaround
----------
-

Revision history for this message
Alexandru Dimofte (adimofte) wrote :

The collected logs from Standard External bare-metal configuration: https://files.starlingx.cengn.ca/download_file/4

!!! This issue is NOT visible on virtual configurations !!!

Revision history for this message
Alexandru Dimofte (adimofte) wrote :

Name of the test is: test_check_existence_of_stack

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Revision history for this message
Austin Sun (sunausti) wrote :

Chant:
   please help to sync up with Alexandru and triage this issue.

Changed in starlingx:
assignee: nobody → YU CHENGDE (chant)
Revision history for this message
YU CHENGDE (chant) wrote :

The root cause may be "neutron-openvswitch-agent"

I tried it on an SX StarlingX VM.

When launching a VM via the heat service, the error log is as follows:

Message: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance c3e7060a-d482-4a27-83ee-3e2b314aa13c., Code: 500"

When launching a VM via the command "openstack server create ...", the error log is as follows:

controller-0:/var/log/pods$ grep -rn "Exhausted all" .
./openstack_nova-conductor-7c66899c6-tpkbp_12fd48fd-aca1-41fc-9e1c-b761f4d19ce3/nova-conductor/0.log:8:2022-01-19T11:09:24.101421997Z stdout F 2022-01-19 11:09:24.100 1 WARNING nova.scheduler.utils [req-b371d588-0465-4215-9b6f-f49ae5cc85e7 1ce83b3a74694752a07d9b99989dc225 f3f74c88d2834e1b9ba9a8308818914d - default default] [instance: c8db389a-4d8f-4216-a6c7-b35ede330673] Setting instance to ERROR state.: nova.exception.MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance c8db389a-4d8f-4216-a6c7-b35ede330673.

Both are the same, so it may not be a heat service problem.

After tracing, the error log of neutron-ovs-agent is shown below:

controller-0:/var/log/pods/openstack_neutron-ovs-agent-controller-0-937646f6-g8vvz_6384ebea-99ab-4612-9806-cd8511abda05$ cat neutron-ovs-agent/300.log
2022-01-19T11:22:18.920164131Z stderr F + exec neutron-openvswitch-agent --config-file /etc/neutron/neutron.conf --config-file /tmp/pod-shared/neutron-agent.ini --config-file /tmp/pod-shared/ml2-local-ip.ini --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-file /etc/neutron/plugins/ml2/ml2_conf.ini
2022-01-19T11:22:22.303971537Z stdout F 2022-01-19 11:22:22.303 5460 INFO neutron.common.config [-] Logging enabled!
2022-01-19T11:22:22.304112132Z stdout F 2022-01-19 11:22:22.303 5460 INFO neutron.common.config [-] /var/lib/openstack/bin/neutron-openvswitch-agent version 16.4.3.dev56
2022-01-19T11:22:23.148750083Z stdout F 2022-01-19 11:22:23.147 5460 INFO neutron.agent.agent_extensions_manager [-] Loaded agent extensions: []
2022-01-19T11:22:25.606659955Z stdout F 2022-01-19 11:22:25.606 5460 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [-] Bridge br-int has datapath-ID 000086d381c13048
2022-01-19T11:22:28.161314021Z stdout F 2022-01-19 11:22:28.160 5460 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Mapping physical network physnet0 to bridge br-phy0
2022-01-19T11:22:28.161331774Z stdout F 2022-01-19 11:22:28.161 5460 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Bridge br-phy0 for physical network physnet0 does not exist. Agent terminated!

No working network agent is available to support nova, so the nova scheduler failed to place the instance.
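
A quick way to confirm this from the host, sketched with standard tools (the host name is taken from the log above):

ovs-vsctl list-br                      # br-phy0 should be listed here but is not
ovs-vsctl br-exists br-phy0; echo $?   # prints 2 when the bridge does not exist
openstack network agent list --host controller-0   # the ovs agent should show as down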

The current stx-openstack version is "1.0-155-centos-stable-versioned".
I am going to use "1.0-83-centos-stable-versioned" for comparison.
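
A sketch of the swap procedure, assuming the standard StarlingX application commands and a locally available tarball (the path is hypothetical):

system application-remove stx-openstack
system application-delete stx-openstack
system application-upload /home/sysadmin/stx-openstack-1.0-83-centos-stable-versioned.tgz
system application-apply stx-openstack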

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Based on the above, this may be a duplicate of https://bugs.launchpad.net/starlingx/+bug/1958073

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.7.0
Revision history for this message
Austin Sun (sunausti) wrote :

please monitor the change https://review.opendev.org/c/starlingx/stx-puppet/+/825398 of LP1958073, once merged, need to check if this issue is gone.

Revision history for this message
YU CHENGDE (chant) wrote :

cat /etc/platform/platform.conf

on SX VM
vswitch_type=none
## PS: on a VM, only OVS is available

on DX Baremetal
vswitch_type=ovs-dpdk

This difference might be why openstack fails to launch the instance on bare metal.
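
A quick check on each configuration (platform.conf is authoritative; the system CLI is assumed to expose the same field):

grep '^vswitch_type' /etc/platform/platform.conf
system show | grep -i vswitch   # assumed to report vswitch_type as well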

Revision history for this message
Ghada Khalil (gkhalil) wrote (last edit):
Changed in starlingx:
status: Triaged → Fix Released