victoria - overcloud deployment fails with: rabbitmq failed to start - minimal feature set

Bug #1951577 reported by Amol Kahat
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
tripleo   Invalid   Critical     Unassigned

Bug Description

Description
===========
A container failed to create during the overcloud deployment.

Actual Error
============
2021-11-18 16:31:56 | 2021-11-18 16:31:56.026856 | 00f6c653-303e-8099-65de-000000006e93 | OK | Check podman create status | overcloud-compute-0 | item=nova_wait_for_compute_service
2021-11-18 16:31:56 | 2021-11-18 16:31:56.029674 | 00f6c653-303e-8099-65de-000000006e93 | TIMING | tripleo_container_manage : Check podman create status | overcloud-compute-0 | 1:30:01.302740 | 673.19s
2021-11-18 16:31:56 | 2021-11-18 16:31:56.034657 | 00f6c653-303e-8099-65de-000000006e93 | TIMING | tripleo_container_manage : Check podman create status | overcloud-compute-0 | 1:30:01.307717 | 673.19s
2021-11-18 16:31:56 | 2021-11-18 16:31:56.053895 | 00f6c653-303e-8099-65de-000000006e95 | TASK | Check containers status
2021-11-18 16:31:58 |  [ERROR]: Container(s) which finished with wrong return code:

Nova compute error
==================

2021-11-18 16:20:38.444 6 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): grep -F node.session.scan /sbin/iscsiadm execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:384
2021-11-18 16:20:38.461 6 DEBUG oslo_concurrency.processutils [-] CMD "grep -F node.session.scan /sbin/iscsiadm" returned: 0 in 0.018s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423
2021-11-18 16:20:38.532 6 ERROR oslo.messaging._drivers.impl_rabbit [req-43840109-d4c7-4991-bca4-43a88c4f5f2f - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 2.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
2021-11-18 16:20:40.557 6 ERROR oslo.messaging._drivers.impl_rabbit [req-43840109-d4c7-4991-bca4-43a88c4f5f2f - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 4.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
2021-11-18 16:20:44.601 6 ERROR oslo.messaging._drivers.impl_rabbit [req-43840109-d4c7-4991-bca4-43a88c4f5f2f - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 6.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
2021-11-18 16:20:50.636 6 ERROR oslo.messaging._drivers.impl_rabbit [req-43840109-d4c7-4991-bca4-43a88c4f5f2f - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 8.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
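
The excerpt above shows nova-compute retrying its AMQP connection with a delay that grows by 2 seconds per attempt (2.0, 4.0, 6.0, 8.0). A minimal sketch for pulling those retry lines out of a nova-compute.log, assuming the line format shown in this report; the names `RETRY_RE`, `extract_retries` and `SAMPLE` are hypothetical helpers and not part of any OpenStack tool:

```python
import re

# Matches the oslo.messaging retry lines in the format seen in this report.
RETRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \d+ ERROR "
    r"oslo\.messaging\._drivers\.impl_rabbit .*"
    r"Connection failed: \[Errno (?P<errno>\d+)\] (?P<name>\w+) "
    r"\(retrying in (?P<delay>[\d.]+) seconds\)"
)


def extract_retries(lines):
    """Return (timestamp, errno, delay_seconds) for each connection retry line."""
    retries = []
    for line in lines:
        match = RETRY_RE.match(line)
        if match:
            retries.append(
                (match.group("ts"), int(match.group("errno")), float(match.group("delay")))
            )
    return retries


# Three lines copied from the log excerpt in this report (one DEBUG, two ERROR).
SAMPLE = [
    '2021-11-18 16:20:38.461 6 DEBUG oslo_concurrency.processutils [-] CMD "grep -F node.session.scan /sbin/iscsiadm" returned: 0 in 0.018s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:423',
    "2021-11-18 16:20:38.532 6 ERROR oslo.messaging._drivers.impl_rabbit [req-43840109-d4c7-4991-bca4-43a88c4f5f2f - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 2.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED",
    "2021-11-18 16:20:40.557 6 ERROR oslo.messaging._drivers.impl_rabbit [req-43840109-d4c7-4991-bca4-43a88c4f5f2f - - - - -] Connection failed: [Errno 111] ECONNREFUSED (retrying in 4.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED",
]

if __name__ == "__main__":
    for ts, errno, delay in extract_retries(SAMPLE):
        print(f"{ts} errno={errno} next retry in {delay}s")
```

The first retry timestamp gives a rough lower bound on when the compute node first found the broker unreachable, which can then be compared against the rabbitmq log on the controller.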

Nova Compute Logs
=================
https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-victoria-current-tripleo-delorean-minimal-57/overcloud-compute-0/var/log/containers/nova/nova-compute.log

Logs
====
- https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-victoria-current-tripleo-delorean-minimal-57/
- https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-victoria-current-tripleo-delorean-minimal-58/

Ronelle Landy (rlandy)
Changed in tripleo:
status: Confirmed → Triaged
milestone: yoga-2 → yoga-1
importance: High → Critical
Ronelle Landy (rlandy) wrote : Re: container nova_wait_for_compute_service failed to create - victoria, minimal featureset
summary: - container nova_wait_for_compute_service failed to create
+ container nova_wait_for_compute_service failed to create - victoria
summary: - container nova_wait_for_compute_service failed to create - victoria
+ container nova_wait_for_compute_service failed to create - victoria,
+ minimal featureset
summary: - container nova_wait_for_compute_service failed to create - victoria,
- minimal featureset
+ victoria - overcloud deployment fails with: container
+ nova_wait_for_compute_service failed to create - minimal feature set
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/818726

Bogdan Dobrelya (bogdando) wrote : Re: victoria - overcloud deployment fails with: container nova_wait_for_compute_service failed to create - minimal feature set

The failure reason is that rabbitmq was missing.

Bogdan Dobrelya (bogdando) wrote :

Also related
from https://artifacts.ci.centos.org/rdo/jenkins-tripleo-quickstart-promote-victoria-current-tripleo-delorean-minimal<email address hidden>

2021-11-18 15:47:00.769 [error] <0.1285.0> ** Connection attempt from disallowed node 'rabbitmqcli-84-rabbit@overcloud-controller-0' **
2021-11-18 15:47:00.831 [error] <0.1287.0> ** Connection attempt from disallowed node 'rabbitmqcli-84-rabbit@overcloud-controller-0' **
2021-11-18 15:47:04.011 [error] <0.1295.0> ** Connection attempt from disallowed node 'rabbitmqcli-193-rabbit@overcloud-controller-0' **
2021-11-18 15:47:04.095 [error] <0.1297.0> ** Connection attempt from disallowed node 'rabbitmqcli-193-rabbit@overcloud-controller-0' **
2021-11-18 15:53:43.693 [info] <0.1936.0> RabbitMQ is asked to stop...
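
To see how often each CLI node was rejected, the "disallowed node" lines can be tallied per node name. This is a rough sketch that assumes the rabbitmq log format shown above; `DISALLOWED_RE`, `count_disallowed_nodes` and `SAMPLE` are hypothetical names. In Erlang distribution this message commonly points at a cookie mismatch between the connecting node and the broker, but nothing in this report confirms that as the cause here.

```python
import re
from collections import Counter

# Captures the node name from the "disallowed node" error lines.
DISALLOWED_RE = re.compile(
    r"Connection attempt from disallowed node '(?P<node>[^']+)'"
)


def count_disallowed_nodes(lines):
    """Count rejected connection attempts per Erlang node name."""
    counts = Counter()
    for line in lines:
        match = DISALLOWED_RE.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts


# Lines copied from the rabbitmq log excerpt in this report.
SAMPLE = [
    "2021-11-18 15:47:00.769 [error] <0.1285.0> ** Connection attempt from disallowed node 'rabbitmqcli-84-rabbit@overcloud-controller-0' **",
    "2021-11-18 15:47:00.831 [error] <0.1287.0> ** Connection attempt from disallowed node 'rabbitmqcli-84-rabbit@overcloud-controller-0' **",
    "2021-11-18 15:47:04.011 [error] <0.1295.0> ** Connection attempt from disallowed node 'rabbitmqcli-193-rabbit@overcloud-controller-0' **",
    "2021-11-18 15:47:04.095 [error] <0.1297.0> ** Connection attempt from disallowed node 'rabbitmqcli-193-rabbit@overcloud-controller-0' **",
    "2021-11-18 15:53:43.693 [info] <0.1936.0> RabbitMQ is asked to stop...",
]

if __name__ == "__main__":
    for node, count in count_disallowed_nodes(SAMPLE).items():
        print(f"{node}: {count} rejected attempt(s)")
```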

Bogdan Dobrelya (bogdando) wrote :

There is a chance that the issue is intermittent and related to poor environment performance.

summary: - victoria - overcloud deployment fails with: container
- nova_wait_for_compute_service failed to create - minimal feature set
+ victoria - overcloud deployment fails with: rabbitmq failed to start -
+ minimal feature set
Bogdan Dobrelya (bogdando) wrote :
Changed in tripleo:
status: Triaged → Invalid
Ronelle Landy (rlandy) wrote :

Yes, this looks sporadic: it passed in the last run.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to ansible-role-collect-logs (master)

Reviewed: https://review.opendev.org/c/openstack/ansible-role-collect-logs/+/818752
Committed: https://opendev.org/openstack/ansible-role-collect-logs/commit/529f0eecea71979a0b91be6a673279a249026f7c
Submitter: "Zuul (22348)"
Branch: master

commit 529f0eecea71979a0b91be6a673279a249026f7c
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Nov 22 11:44:43 2021 +0100

    Also collect rabbitmq report and cookies

    Related-bug: #1951577

    Change-Id: I73c4458bbfac04ee21f2f8ee42b8293046355911
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/ansible-role-collect-logs/+/818754
Committed: https://opendev.org/openstack/ansible-role-collect-logs/commit/4a4fba487a049369b1a2a6cce78c0988265c7dfe
Submitter: "Zuul (22348)"
Branch: master

commit 4a4fba487a049369b1a2a6cce78c0988265c7dfe
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Nov 22 12:07:13 2021 +0100

    Collect pcs CPU throttle events to logstash

    That helps to determine "slow envs".

    Related-bug: #1951577

    Change-Id: I11814e3fd02f3a8841a7b7038b0a5291ab51cf12
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/818726
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/dbf5d36fdf0660669b3199118b30f71872d676b7
Submitter: "Zuul (22348)"
Branch: master

commit dbf5d36fdf0660669b3199118b30f71872d676b7
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Nov 22 10:51:00 2021 +0100

    Add timestamps to nova/placement wait for scripts

    Related-bug: #1951577

    Change-Id: I5ca99f53540d27b3e7824d22910ddc69cae3c9d0
    Signed-off-by: Bogdan Dobrelya <email address hidden>
