Overcloud deploy failed at step 3 with failed to start tripleo_iscsid.service

Bug #1922537 reported by chandan kumar
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

On the Train/Ussuri/Victoria/master CentOS 8 (C8) integration lines, all OVB jobs are failing during overcloud deploy at step 3.

All of these jobs use iscsi-initiator-utils-6.2.1.2-1.gita8fcb37.el8.x86_64, which comes from the CentOS Stream 8 base OS repo.

https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-train/2f3f5c6/logs/overcloud-controller-0/var/log/paunch.log.txt.gz

```
2021-04-05 07:01:10.121 71719 WARNING paunch [ ] Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=iscsid', '--format', '{{.Names}}']"
2021-04-05 07:01:19.025 71719 ERROR paunch [ ] systemctl failed
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/paunch/utils/systemctl.py", line 32, in systemctl
    subprocess.check_call(cmd)
  File "/usr/lib64/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['systemctl', 'enable', '--now', 'tripleo_iscsid']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/paunch/utils/systemctl.py", line 75, in enable
    systemctl(cmd, log)
  File "/usr/lib/python3.6/site-packages/paunch/utils/systemctl.py", line 34, in systemctl
    raise SystemctlException(str(err))
paunch.utils.systemctl.SystemctlException: Command '['systemctl', 'enable', '--now', 'tripleo_iscsid']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/paunch/utils/systemd.py", line 114, in service_create
    systemctl.enable(service, now=True)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
    return self.call(f, *args, **kw)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
    do = self.iter(retry_state=retry_state)
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 331, in iter
    raise retry_exc.reraise()
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 167, in reraise
    raise self.last_attempt.result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/paunch/utils/systemctl.py", line 79, in enable
    raise SystemctlException(str(err))
paunch.utils.systemctl.SystemctlException: Command '['systemctl', 'enable', '--now', 'tripleo_iscsid']' returned non-zero exit status 1.

```
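The three stacked tracebacks above come from one failing command being caught and re-raised at several layers. Below is a minimal Python sketch of that pattern. It assumes nothing about paunch's real implementation beyond what the traceback shows: the names `run_systemctl` and `enable_with_retry` and the retry count are hypothetical, and a plain loop stands in for tenacity.

```python
import subprocess
import sys


class SystemctlException(Exception):
    """Stand-in for paunch.utils.systemctl.SystemctlException."""


def run_systemctl(cmd):
    # Wrap the low-level failure in a domain-specific exception.
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        # Raising inside "except" keeps the original error as __context__,
        # which Python prints as "During handling of the above exception, ...".
        raise SystemctlException(str(err))


def enable_with_retry(cmd, attempts=3):
    # Hypothetical retry loop; the traceback shows paunch using tenacity here.
    last_error = None
    for _ in range(attempts):
        try:
            run_systemctl(cmd)
            return
        except SystemctlException as err:
            last_error = err
    # Every attempt failed: surface the last error to the caller.
    raise last_error


# A command that always exits 1 stands in for
# ['systemctl', 'enable', '--now', 'tripleo_iscsid'].
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
```

The practical consequence is that the paunch log only says the command returned exit status 1. The actual reason has to be read from `systemctl status` or the journal on the node.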

Other failing jobs:
* https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-train/e0ef879/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
* https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-train/c305b39/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
* https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/017ad64/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

Filing this bug to track the issue.

John Fulton (jfulton-org) wrote :

I worked around this issue by downgrading from podman version 3.1.0-dev to podman version 3.0.2-dev.

chandan kumar (chkumar246) wrote :

The image build log at https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-train/52df329/build.log shows:

Enabling module streams:
2021-04-06 06:32:39.632 | container-tools rhel8
2021-04-06 06:32:39.632 | idm client
2021-04-06 06:32:39.632 | perl 5.26
2021-04-06 06:32:39.632 | perl-IO-Socket-SSL 2.066
2021-04-06 06:32:39.632 | perl-libwww-perl 6.34
2021-04-06 06:32:39.632 | ruby 2.5
2021-04-06 06:32:39.632 |

We need to find out how container-tools:rhel8 is getting enabled.

chandan kumar (chkumar246) wrote :

Tested on the host itself: https://logserver.rdoproject.org/15/33115/1/check/periodic-tripleo-centos-8-buildimage-overcloud-full-train/5d6b848/logs/script_build.log

CentOS-Stream - AppStream 641 kB/s | 4.4 kB 00:00
CentOS-Stream - Base 994 kB/s | 3.9 kB 00:00
CentOS-Stream - Extras 508 kB/s | 1.5 kB 00:00
CentOS-Stream - HighAvailability 1.6 MB/s | 3.9 kB 00:00
CentOS-Stream - PowerTools 1.8 MB/s | 4.4 kB 00:00
Last metadata expiration check: 0:07:40 ago on Tue 06 Apr 2021 08:36:16 AM UTC.
CentOS-Stream - AppStream
Name Stream Profiles Summary
container-tools rhel8 [d][e] common [d] Most recent (rolling) versions of podman, buildah, skopeo, runc, conmon, runc, conmon, CRIU, Udica, etc as well as dependencies such as container-selinux built and tested together, and updated as frequently as every 12 weeks.
perl 5.26 [d][e] common [d], minimal Practical Extraction and Report Language
perl-IO-Socket-SSL 2.066 [d][e] common [d] Perl library for transparent TLS
perl-libwww-perl 6.34 [d][e] common [d] A Perl interface to the World-Wide Web
python36 3.6 [d][e] build, common [d] Python programming language, version 3.6
ruby 2.5 [d][e] common [d] An interpreter of object-oriented scripting language

So container-tools:rhel8 is enabled by default: the listing marks it [d][e] (default and enabled).
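To check this kind of listing automatically, the enabled streams can be pulled out of the module list text. The following is a rough sketch, assuming the whitespace-separated layout pasted above and dnf's usual flag legend ([d]efault, [e]nabled, [x]disabled, [i]nstalled). The function name `enabled_streams` is hypothetical and the sample summaries are shortened.

```python
import re

# Matches lines such as "container-tools rhel8 [d][e] common [d] ...".
# The flags must directly follow the stream name, so header lines and
# streams without any flag are skipped.
_STREAM_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<stream>\S+)\s+(?P<flags>(?:\[[deix]\])+)"
)


def enabled_streams(listing):
    """Return {module: stream} for every stream flagged [e] in the listing."""
    result = {}
    for line in listing.splitlines():
        match = _STREAM_RE.match(line.strip())
        if match and "[e]" in match.group("flags"):
            result[match.group("name")] = match.group("stream")
    return result


# Sample taken from the listing above (summaries shortened).
sample = """\
Name Stream Profiles Summary
container-tools rhel8 [d][e] common [d] Most recent (rolling) versions of podman
perl 5.26 [d][e] common [d], minimal Practical Extraction and Report Language
"""
```

Calling `enabled_streams(sample)` on the sample reports `container-tools` mapped to `rhel8`, which is the rolling stream that pulled in the newer podman.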

chandan kumar (chkumar246) wrote :

Proposed patch to enable container-tools:3.0 by default: https://review.opendev.org/c/openstack/tripleo-ci/+/784887

John Fulton (jfulton-org) wrote :

I'm hitting this now with podman version 3.0.2-dev.

[CentOS-8 - stack@undercloud ~]$ podman --version
podman version 3.0.2-dev
[CentOS-8 - stack@undercloud ~]$ rpm -q podman
podman-3.0.1-6.module_el8.5.0+736+58cc1a5a.x86_64
[CentOS-8 - stack@undercloud ~]$ sudo podman images | grep iscsi
undercloud.ctlplane.mydomain.tld:8787/tripleomaster/openstack-iscsid current-tripleo efbb81f56eb4 4 weeks ago 767 MB
[CentOS-8 - stack@undercloud ~]$
[CentOS-8 - stack@undercloud ~]$ sudo systemctl start tripleo_iscsid.service
Job for tripleo_iscsid.service failed because the control process exited with error code.
See "systemctl status tripleo_iscsid.service" and "journalctl -xe" for details.
[CentOS-8 - stack@undercloud ~]$ systemctl status tripleo_iscsid.service
● tripleo_iscsid.service - iscsid container
   Loaded: loaded (/etc/systemd/system/tripleo_iscsid.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2021-04-14 20:10:08 UTC; 5s ago
  Process: 313682 ExecStopPost=/usr/bin/podman stop -t 10 iscsid (code=exited, status=125)
  Process: 313574 ExecStart=/usr/bin/podman start iscsid (code=exited, status=125)
[CentOS-8 - stack@undercloud ~]$
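In the status output above, both the ExecStart and ExecStopPost steps exited with status 125, which podman generally uses for errors raised by podman itself rather than by the command inside the container. Below is a small sketch for extracting the failed steps from such output. The function name `failed_steps` is hypothetical, and the regular expression only assumes the `Process:` line format shown above.

```python
import re

# Matches "Process: <pid> <Phase>=<command> (code=<code>, status=<status>)".
_PROCESS_RE = re.compile(
    r"Process:\s+(?P<pid>\d+)\s+(?P<phase>\w+)=(?P<command>.+?)\s+"
    r"\(code=(?P<code>\w+), status=(?P<status>[^)]+)\)"
)


def failed_steps(status_text):
    """Return (phase, command, status) for each step that did not succeed."""
    steps = []
    for line in status_text.splitlines():
        match = _PROCESS_RE.search(line)
        if match and match.group("status") != "0/SUCCESS":
            steps.append(
                (match.group("phase"), match.group("command"), match.group("status"))
            )
    return steps


# The two Process lines from the systemctl status output above.
sample_status = """\
  Process: 313682 ExecStopPost=/usr/bin/podman stop -t 10 iscsid (code=exited, status=125)
  Process: 313574 ExecStart=/usr/bin/podman start iscsid (code=exited, status=125)
"""
```

Since `podman start iscsid` itself fails, running that command by hand on the node is the quickest way to see the underlying error message.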

John Fulton (jfulton-org) wrote :

 container "698eebc597a96b2aeb2b2b1549ccaff6d0d4560bdda44d958c95fac5edc350e4": container_linux.go:370: starting container process caused: unknown capability "CAP_PERFMON"

[CentOS-8 - root@undercloud ~]# rpm -q runc
runc-1.0.0-70.rc92.module_el8.5.0+736+58cc1a5a.x86_64
[CentOS-8 - root@undercloud ~]#

 https://github.com/opencontainers/runc/blob/v1.0.0-rc92/libcontainer/container_linux.go#L370
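The error above means the container configuration asks for a capability name that this runc build does not recognize. CAP_PERFMON was added to the kernel in Linux 5.8, so an older runc can lack it. When triaging many job logs, the offending capability can be extracted with a short helper like the one below. This is a sketch: the function name `unknown_capability` is hypothetical, and it only assumes the message wording shown above.

```python
import re


def unknown_capability(message):
    """Return the capability name from an 'unknown capability' error, or None."""
    match = re.search(r'unknown capability "(CAP_[A-Z_]+)"', message)
    return match.group(1) if match else None


# The runc error line quoted above.
error_line = (
    'container "698eebc597a96b2aeb2b2b1549ccaff6d0d4560bdda44d958c95fac5edc350e4": '
    "container_linux.go:370: starting container process caused: "
    'unknown capability "CAP_PERFMON"'
)
```

A result of `None` means the log line is a different failure and should not be grouped under this bug.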

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
John Fulton (jfulton-org) wrote :

Workaround:

 sudo dnf module enable -y container-tools:3.0
 sudo -E tripleo-repos current-tripleo-dev --stream

Looks like we're waiting for this bug to be solved:

 https://bugzilla.redhat.com/show_bug.cgi?id=1946982
