systemd-modules-load.service is not getting started in overcloud deploy

Bug #1956441 reported by Amol Kahat
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Amol Kahat

Bug Description

Error

Unable to start service systemd-modules-load.service

Actual Error
============

2022-01-05 05:44:02 | 2022-01-05 05:43:58.046688 | fa163ecc-7782-df20-9847-000000001650 | FATAL | Modules reload | overcloud-controller-0 | error={"changed": false, "msg": "Unable to start service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code.\nSee \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}
2022-01-05 05:44:02 | 2022-01-05 05:43:58.048487 | fa163ecc-7782-df20-9847-000000001650 | TIMING | tripleo-kernel : Modules reload | overcloud-controller-0 | 0:03:21.555599 | 0.55s
2022-01-05 05:44:02 | 2022-01-05 05:43:58.529070 | fa163ecc-7782-df20-9847-000000001650 | FATAL | Modules reload | overcloud-controller-1 | error={"changed": false, "msg": "Unable to start service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code.\nSee \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}
2022-01-05 05:44:02 | 2022-01-05 05:43:58.529763 | fa163ecc-7782-df20-9847-000000001650 | TIMING | tripleo-kernel : Modules reload | overcloud-controller-1 | 0:03:22.036865 | 0.54s
2022-01-05 05:44:02 | 2022-01-05 05:43:59.714889 | fa163ecc-7782-df20-9847-00000000164d | CHANGED | Remove dracut-config-generic | overcloud-controller-2
2022-01-05 05:44:02 | 2022-01-05 05:43:59.716297 | fa163ecc-7782-df20-9847-00000000164d | TIMING | tripleo-kernel : Remove dracut-config-generic | overcloud-controller-2 | 0:03:23.223416 | 2.94s
2022-01-05 05:44:02 | 2022-01-05 05:43:59.782717 | fa163ecc-7782-df20-9847-00000000164e | TASK | Ensure the /etc/modules-load.d/ directory exists
2022-01-05 05:44:02 | 2022-01-05 05:44:00.102653 | fa163ecc-7782-df20-9847-00000000164e | OK | Ensure the /etc/modules-load.d/ directory exists | overcloud-controller-2
2022-01-05 05:44:02 | 2022-01-05 05:44:00.104416 | fa163ecc-7782-df20-9847-00000000164e | TIMING | tripleo-kernel : Ensure the /etc/modules-load.d/ directory exists | overcloud-controller-2 | 0:03:23.611530 | 0.32s
2022-01-05 05:44:02 | 2022-01-05 05:44:00.167325 | fa163ecc-7782-df20-9847-00000000164f | TASK | Write list of modules to load at boot
2022-01-05 05:44:02 | 2022-01-05 05:44:01.151096 | fa163ecc-7782-df20-9847-00000000164f | CHANGED | Write list of modules to load at boot | overcloud-controller-2
2022-01-05 05:44:02 | 2022-01-05 05:44:01.153209 | fa163ecc-7782-df20-9847-00000000164f | TIMING | tripleo-kernel : Write list of modules to load at boot | overcloud-controller-2 | 0:03:24.660304 | 0.99s
2022-01-05 05:44:02 | 2022-01-05 05:44:01.231278 | fa163ecc-7782-df20-9847-000000001650 | TASK | Modules reload
2022-01-05 05:44:02 | 2022-01-05 05:44:01.806541 | fa163ecc-7782-df20-9847-000000001650 | FATAL | Modules reload | overcloud-controller-2 | error={"changed": false, "msg": "Unable to start service systemd-modules-load.service: Job for systemd-modules-load.service failed because the control process exited with error code.\nSee \"systemctl status systemd-modules-load.service\" and \"journalctl -xe\" for details.\n"}
2022-01-05 05:44:02 | 2022-01-05 05:44:01.808336 | fa163ecc-7782-df20-9847-000000001650 | TIMING | tripleo-kernel : Modules reload | overcloud-controller-2 | 0:03:25.315448 | 0.58s

Logs
====
- https://logserver.rdoproject.org/openstack-component-clients/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-clients-train/7c1b50e/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
- https://logserver.rdoproject.org/openstack-component-clients/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-clients-ussuri/d8015d1/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
- https://logserver.rdoproject.org/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-wallaby/02ddf5d/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
- https://logserver.rdoproject.org/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master/cc26afc/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Ronelle Landy (rlandy)
Changed in tripleo:
milestone: none → yoga-1
assignee: nobody → Amol Kahat (amolkahat)
importance: Medium → Critical
tags: added: promotion-blocker
removed: alert
Amol Kahat (amolkahat) wrote :

Service systemd-modules-load.service failed to start

● systemd-modules-load.service - Load Kernel Modules
   Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2022-01-05 14:26:11 UTC; 4min 7s ago
     Docs: man:systemd-modules-load.service(8)
           man:modules-load.d(5)
  Process: 14681 ExecStart=/usr/lib/systemd/systemd-modules-load (code=exited, status=1/FAILURE)
 Main PID: 14681 (code=exited, status=1/FAILURE)

Jan 05 14:26:11 overcloud-controller-1 systemd[1]: Starting Load Kernel Modules...
Jan 05 14:26:11 overcloud-controller-1 systemd-modules-load[14681]: Inserted module 'br_netfilter'
Jan 05 14:26:11 overcloud-controller-1 systemd-modules-load[14681]: Module 'msr' is builtin
Jan 05 14:26:11 overcloud-controller-1 systemd-modules-load[14681]: Failed to insert 'ipmi_si': No such device
Jan 05 14:26:11 overcloud-controller-1 systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
Jan 05 14:26:11 overcloud-controller-1 systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
Jan 05 14:26:11 overcloud-controller-1 systemd[1]: Failed to start Load Kernel Modules.
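The journal output above already names the module that makes the unit fail. As a minimal sketch (the function name and regular expression are illustrative, not part of any TripleO tooling), the failing modules can be pulled out of such output like this:

```python
import re

# Matches the failure lines systemd-modules-load writes to the journal, e.g.
#   Failed to insert 'ipmi_si': No such device
FAILED_INSERT = re.compile(r"Failed to insert '(?P<module>[^']+)': (?P<reason>.+)$")


def failed_modules(journal_lines):
    """Return (module, reason) pairs for every failed insert in the given lines."""
    failures = []
    for line in journal_lines:
        match = FAILED_INSERT.search(line)
        if match:
            failures.append((match.group("module"), match.group("reason").strip()))
    return failures


# Sample lines copied from the status output in this comment.
sample = [
    "Jan 05 14:26:11 overcloud-controller-1 systemd-modules-load[14681]: Inserted module 'br_netfilter'",
    "Jan 05 14:26:11 overcloud-controller-1 systemd-modules-load[14681]: Module 'msr' is builtin",
    "Jan 05 14:26:11 overcloud-controller-1 systemd-modules-load[14681]: Failed to insert 'ipmi_si': No such device",
]
print(failed_modules(sample))  # [('ipmi_si', 'No such device')]
```

Here only ipmi_si fails; br_netfilter and msr are handled without error.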

- https://logserver.rdoproject.org/50/37350/4/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master/9e0896f/logs/overcloud-novacompute-0/var/log/extra/services.txt.gz
- https://logserver.rdoproject.org/50/37350/4/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-victoria/ed39f66/logs/overcloud-novacompute-0/var/log/extra/services.txt.gz
- https://logserver.rdoproject.org/50/37350/4/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-wallaby/b6aee90/logs/overcloud-controller-1/var/log/extra/services.txt.gz

Ronelle Landy (rlandy) wrote :

Possibly related BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1702452

(CentOS 8 is still on kernel 4.18, and that BZ was filed against 5.1, so I suspect it is the same bug, just backported into Stream?)

http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/?C=M;O=D

has kernel update on 12/22 and other updates on 01/03

Latest failure log: https://logserver.rdoproject.org/01/822001/4/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039/dd33688/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Ronelle Landy (rlandy) wrote :

stevebaker[m]> rlandy|ruck: Getting a minimal reproducer would help to know what bug to raise. You could start by sourcing the centos8 qcow2 which was used to build this overcloud and see if that boots in this cloud. If that works try using the same diskimage-builder to build a basic centos vm image and boot that. Hopefully it is not necessary to build a centos partition image to reproduce

Ronelle Landy (rlandy) wrote :

some investigation into the differences of the packages in the image build:

last passing:

2022-01-04 18:10:16.773 | kexec-tools x86_64 2.0.20-63.el8 baseos 521 k

first failing:

 kexec-tools x86_64 2.0.20-67.el8 baseos 522 k
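The comparison above can be done mechanically across whole package lists. The sketch below assumes each line has the form "name arch version repo size", optionally preceded by a "timestamp | " prefix as in the passing log; the helper names are hypothetical. A version change is only a lead: the fix that eventually merged for this bug targeted a file shipped by fwupd, not kexec-tools.

```python
def parse_packages(lines):
    """Map package name to version for lines like 'name arch version repo size'."""
    packages = {}
    for line in lines:
        # Drop an optional "timestamp | " prefix added by the build log.
        text = line.rsplit("|", 1)[-1]
        fields = text.split()
        if len(fields) >= 5:
            packages[fields[0]] = fields[2]
    return packages


def changed_packages(passing_lines, failing_lines):
    """Return {name: (old_version, new_version)} for packages whose version differs."""
    old = parse_packages(passing_lines)
    new = parse_packages(failing_lines)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


# The two lines quoted in this comment.
last_passing = ["2022-01-04 18:10:16.773 | kexec-tools x86_64 2.0.20-63.el8 baseos 521 k"]
first_failing = [" kexec-tools x86_64 2.0.20-67.el8 baseos 522 k"]
print(changed_packages(last_passing, first_failing))
```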

Ronelle Landy (rlandy) wrote :

Looking at fs002 (which builds images), you can see the failures start on 01/05:

https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-master

Ronelle Landy (rlandy) wrote :

Similar BZ open:

Failed to insert 'ipmi_si': No such device causes systemd-modules-load.service failed - logged against RHEL 8.6

https://bugzilla.redhat.com/show_bug.cgi?id=2030993

chandan kumar (chkumar246) wrote :

Testing here with the latest centos stream 8 image: https://review.rdoproject.org/r/c/testproject/+/37791

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/823694

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/823737

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/823738

Alfredo Moralejo (amoralej) wrote :

Another workaround would be to create /etc/modprobe.d/ipmi_si.conf with the following content during image creation:

blacklist ipmi_si

That will prevent any attempt to load the module.
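A minimal sketch of this workaround, assuming it runs as a step inside the image build with write access to the image's root filesystem. The function name and the root parameter are hypothetical and exist only so the example can be exercised against a scratch directory:

```python
import tempfile
from pathlib import Path


def write_blacklist(module, root="/"):
    """Create <root>/etc/modprobe.d/<module>.conf containing a blacklist entry."""
    conf_dir = Path(root) / "etc" / "modprobe.d"
    conf_dir.mkdir(parents=True, exist_ok=True)
    conf_path = conf_dir / f"{module}.conf"
    conf_path.write_text(f"blacklist {module}\n")
    return conf_path


# Demonstration against a temporary directory instead of a real image root.
with tempfile.TemporaryDirectory() as scratch_root:
    created = write_blacklist("ipmi_si", root=scratch_root)
    print(created.relative_to(scratch_root), "->", created.read_text().strip())
```

In a real image build, root would point at the mounted image filesystem (or the default "/" when running inside the image's chroot).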

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/823798

chandan kumar (chkumar246) wrote :

823798: Blacklist ipmi_si for CentOS 8 distros | https://review.opendev.org/c/openstack/tripleo-ansible/+/823798 and testing here with testproject: https://review.rdoproject.org/r/c/testproject/+/37857

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/823736
Committed: https://opendev.org/openstack/tripleo-ci/commit/018f3ecf7e2100d873391a354086f17469514ecf
Submitter: "Zuul (22348)"
Branch: master

commit 018f3ecf7e2100d873391a354086f17469514ecf
Author: Ronelle Landy <email address hidden>
Date: Thu Jan 6 18:16:25 2022 -0500

    Remove fwupd-redfish.conf file from overcloud images

    Right now, the fwupd package is attempting to load the module ipmi_si
    and this one is failing to be load. This patch customize the qcow2 image
    removing the fwupd-redfish.conf that is causing this problem.
    It is, right now hard to just exclude the package because of the
    dependences, as well as downgrade, because of how the repositories are
    configured in the image, so the best approach to temporarly solve the
    problem is in this way with virt-customize.

    Related-Bug: #1956441

    Signed-off-by: Amol Kahat <email address hidden>

    Change-Id: I07b2344f85c6997c10c3e50a86ba8e16dfcb1355
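As a rough illustration of the approach described in this commit message (not the merged change itself), the sketch below builds a virt-customize invocation that deletes the file from a qcow2 image. Both the directory of fwupd-redfish.conf and the image name are assumptions: the commit message names the file but gives neither its path nor the image.

```python
import shlex

# Assumption: directory of the fwupd drop-in. The commit message only names
# the file (fwupd-redfish.conf), not where it is installed.
FWUPD_REDFISH_CONF = "/usr/lib/modules-load.d/fwupd-redfish.conf"


def virt_customize_delete(image, path=FWUPD_REDFISH_CONF):
    """Build the virt-customize argument list that deletes `path` inside `image`."""
    return ["virt-customize", "-a", image, "--delete", path]


# "overcloud-full.qcow2" is a placeholder image name for this example.
command = virt_customize_delete("overcloud-full.qcow2")
print(shlex.join(command))
```

To actually apply it, the list could be passed to subprocess.run(command, check=True) on a host with libguestfs tools installed.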

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by "Ronelle Landy <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/823738

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Ronelle Landy <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/823737

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "amolkahat <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/823694
Reason: https://review.opendev.org/c/openstack/tripleo-ci/+/823736

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (master)

Change abandoned by "chandan kumar <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/823798
Reason: 823736: Remove fwupd-redfish.conf file from overcloud images | https://review.opendev.org/c/openstack/tripleo-ci/+/823736

Alan Pevec (apevec) wrote :
Changed in tripleo:
status: Triaged → Fix Released