cs9 fs01 check job failing on node_provisioning

Bug #1987632 reported by Soniya Murlidhar Vyas
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| tripleo | Fix Released | Critical | Unassigned | |

Bug Description

https://review.rdoproject.org/zuul/builds?job_name=tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001

tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001 has been RED for a very long time.
It is currently failing at the overcloud_node_provision step with the following error [1]:
```
Use the ^ and v keys to change the selection.
Press 'e' to edit the selected item, or 'c' for a command prompt.

 CentOS Stream (5.14.0-148.el9.x86_64) 9
 CentOS Stream (5.14.0-80.el9.x86_64) 9
 CentOS Stream (5.14.0-160.el9.x86_64) 9

The selected entry will be started automatically in 5s.
The selected entry will be started automatically in 4s.
The selected entry will be started automatically in 3s.
The selected entry will be started automatically in 2s.
The selected entry will be started automatically in 1s.
The selected entry will be started automatically in 0s.
```

Note:
We consume pre-built overcloud images (promoted images) in the RDO
OVB check job. Each time a new kernel is released, the kernel on the
promoted images also gets updated during the repo update that runs
as part of the dnf update process.

The images boot with the new default kernel, but the kernel
parameters are not set properly, which leads to the overcloud
node provisioning failure.

The short-term solution is either to exclude the kernel from dnf update
or to promote overcloud images with the latest kernel.

Since promotion is blocked on other issues, we are going with
excluding the kernel from the dnf update step.
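As a rough illustration of the workaround described above, the sketch below builds a `dnf update` invocation that skips kernel packages using dnf's `--exclude` option. The helper name `build_dnf_update_cmd` and the `kernel*` glob are assumptions for illustration only; the actual tripleo-quickstart change may apply the exclusion differently (for example through dnf configuration).

```python
import shlex
from typing import List, Sequence


def build_dnf_update_cmd(exclude: Sequence[str] = ("kernel*",),
                         assume_yes: bool = True) -> List[str]:
    """Return an argv list for a `dnf update` that skips the given packages.

    Hypothetical helper: the default glob "kernel*" is an assumption meant
    to cover kernel, kernel-core, kernel-modules and similar packages.
    """
    cmd = ["dnf"]
    if assume_yes:
        cmd.append("-y")  # non-interactive, as in CI image customization
    cmd.append("update")
    for pattern in exclude:
        # dnf accepts --exclude=<glob>; repeat it once per pattern.
        cmd.append(f"--exclude={pattern}")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted form that could be pasted into a script.
    print(shlex.join(build_dnf_update_cmd()))
```

Returning an argv list rather than a single string keeps the glob away from shell expansion when the command is run with `subprocess.run(cmd)`; `shlex.join` quotes it for display.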

[1]. https://logserver.rdoproject.org/82/856582/2/openstack-check/tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001/f6ce217/logs/baremetal_2_6397_0-console.log

chandan kumar (chkumar246) wrote :

We are seeing this issue again on the fs01 check jobs: https://review.rdoproject.org/zuul/builds?job_name=tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001&skip=0
Example console log: https://logserver.rdoproject.org/03/856603/2/openstack-check/tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001/6aaa825/logs/baremetal_2_51224_3-console.log

```
Use the ^ and v keys to change the selection.
Press 'e' to edit the selected item, or 'c' for a command prompt.

 CentOS Stream (5.14.0-148.el9.x86_64) 9
 CentOS Stream (5.14.0-80.el9.x86_64) 9
 CentOS Stream (5.14.0-160.el9.x86_64) 9

The selected entry will be started automatically in 5s.
The selected entry will be started automatically in 4s.
The selected entry will be started automatically in 3s.
The selected entry will be started automatically in 2s.
The selected entry will be started automatically in 1s.
The selected entry will be started automatically in 0s.
```

I am not sure it is related to the new kernel; it needs to be investigated.
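To check whether a failing run lines up with a kernel change, one could pull the GRUB menu out of the raw console logs linked above, which contain ANSI control sequences (`ESC[H`, `ESC[6;1H`, `ESC[7m`, and so on). The sketch below is a hypothetical helper, not part of any TripleO tooling: the function names and regular expressions are assumptions. It strips the control sequences to list the kernel entries, and it guesses the default entry from GRUB's reverse-video (`ESC[7m`) highlight, which is also an assumption about how the menu is drawn on the serial console.

```python
import re
from typing import List, Optional

# CSI escape sequences such as ESC[H, ESC[6;79H, ESC[?25l, ESC[7m.
CSI_RE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")
ENTRY_RE = re.compile(r"CentOS Stream \(([^)]+)\)")
# Assumption: the highlighted entry is drawn right after ESC[7m (reverse
# video), optionally preceded by a cursor-positioning sequence.
SELECTED_RE = re.compile(
    r"\x1b\[7m(?:\x1b\[[0-9;]*H)?\s*CentOS Stream \(([^)]+)\)")


def menu_kernels(raw: str) -> List[str]:
    """Return kernel versions listed in the GRUB menu, in display order."""
    return ENTRY_RE.findall(CSI_RE.sub("", raw))


def selected_kernel(raw: str) -> Optional[str]:
    """Return the kernel GRUB highlighted, or None if none is found."""
    match = SELECTED_RE.search(raw)
    return match.group(1) if match else None
```

Run on a downloaded `*-console.log`, `menu_kernels` gives the menu entries, and comparing `selected_kernel` with the kernel the promoted image shipped would show whether the node booted a freshly installed kernel.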

Ronelle Landy (rlandy)
Changed in tripleo:
importance: Undecided → Critical
summary:
- cs9 fs01 check job failing on node_provisioning with msg - "msg": "timed out waiting for ping module test: Data could not be sent to remote host"
+ cs9 fs01 check job failing on node_provisioning
description: updated
description: updated
Steve Baker (steve-stevebaker) wrote :

There are some green runs on that job now. Did this self-resolve, or was something pinned?

chandan kumar (chkumar246) wrote :

We have rebuilt the cs9 nodepool virt image with the new kernel, and we also got a master promotion.
To avoid this issue in the future, we have proposed the following temporary fix:

856603: Exclude kernel from dnf update | https://review.opendev.org/c/openstack/tripleo-quickstart/+/856603

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart/+/856603
Committed: https://opendev.org/openstack/tripleo-quickstart/commit/57c7090df3821dc577045e61189ab810bca6b14e
Submitter: "Zuul (22348)"
Branch: master

commit 57c7090df3821dc577045e61189ab810bca6b14e
Author: Chandan Kumar (raukadah) <email address hidden>
Date: Fri Sep 9 17:43:31 2022 +0530

    Exclude kernel from dnf update

    We consume pre-built overcloud images (promoted images) in the RDO
    OVB check job. Each time a new kernel is released, the kernel on the
    promoted images also gets updated during the repo update that runs
    as part of the dnf update process.

    The images boot with the new default kernel, but the kernel
    parameters are not set properly, which leads to the overcloud
    node provisioning failure.

    The short-term solution is either to exclude the kernel from dnf
    update or to promote overcloud images with the latest kernel.

    Since promotion is blocked on other issues, we are going with
    excluding the kernel from the dnf update step.

    Related-Bug: #1987632

    Signed-off-by: Chandan Kumar (raukadah) <email address hidden>
    Change-Id: I1456299ddb65e9e40c8587febc3311b98fb3b37b

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/858010
Committed: https://opendev.org/openstack/tripleo-quickstart-extras/commit/3c3c3dd7f2aa9aa784cf568696a47cafe52cbfd6
Submitter: "Zuul (22348)"
Branch: master

commit 3c3c3dd7f2aa9aa784cf568696a47cafe52cbfd6
Author: yatinkarel <email address hidden>
Date: Fri Sep 16 11:00:04 2022 +0530

    Exclude kernel from dnf update

    DNF update including kernel during image customization makes
    overcloud images unbootable.

    [1] skipped the kernel update during repo setup, but the kernel
    still gets updated while setting up packages from the gating repo.
    This patch excludes the kernel update here too to avoid this issue.

    [1] https://review.opendev.org/c/openstack/tripleo-quickstart/+/856603

    Related-Bug: #1987632
    Change-Id: Ief8021702cfca3a3767299464f14a0ef6128eae0
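One way to catch the regression described in this commit, where a kernel slips in while installing packages from the gating repo, is to compare the installed kernel packages before and after image customization. The sketch below is hypothetical and not part of tripleo-quickstart-extras; it assumes the caller captures `rpm -q kernel-core` output at both points, and the function names are made up for illustration.

```python
from typing import Set


def parse_rpm_q(output: str) -> Set[str]:
    """Collect installed package names from `rpm -q <name>` output lines."""
    return {line.strip() for line in output.splitlines()
            if line.strip() and "is not installed" not in line}


def new_kernels(before: str, after: str) -> Set[str]:
    """Return kernel packages present after customization but not before."""
    return parse_rpm_q(after) - parse_rpm_q(before)


def assert_kernel_unchanged(before: str, after: str) -> None:
    """Raise if customization installed a kernel that was not there before."""
    added = new_kernels(before, after)
    if added:
        raise RuntimeError(
            "kernel updated during customization: " + ", ".join(sorted(added)))
```

A set difference is used rather than comparing the newest version, so the check fires even when the old kernel stays installed alongside the new one, which matches the GRUB menu in the logs listing several kernels.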

Ronelle Landy (rlandy)
Changed in tripleo:
status: Triaged → Fix Released