build hang in docker-build-images

Bug #2004488 reported by Scott Little
This bug affects 1 person
Affects    Status  Importance  Assigned to  Milestone
StarlingX  New     Low         Unassigned

Bug Description

Brief Description
-----------------
The CENGN Debian build of the master branch, started on Jan 31 at 2:00 am EST,
hung during 'build-stx-images.sh'.

Specifically, the docker build of the 'n3000-opae' image hung in the
'yum install -y python-pip' command of Step 2.

I was able to resume the build by killing the docker build of 'n3000-opae'.
The automatic retry of the 'n3000-opae' build was successful.

I couldn't find a reason for the hang.

The build tools probably need a 'timeout' somewhere, most likely in build-stx-images.sh: inside the retry loop, but wrapped around the 'docker build' rather than inside it.
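
A minimal sketch of that idea, using the coreutils 'timeout' command around each attempt of a retry loop. The function name 'build_with_timeout' and its arguments are hypothetical and not taken from build-stx-images.sh, and I have not checked whether terminating the 'docker build' client also stops the build container on every Docker version.

```shell
#!/bin/bash
# Hypothetical helper, NOT the actual retry loop from build-stx-images.sh.
# Usage: build_with_timeout ATTEMPTS LIMIT COMMAND [ARGS...]
# LIMIT is a duration understood by timeout(1), e.g. 90, 30m, 2h.
build_with_timeout() {
    local attempts=$1 limit=$2
    shift 2
    local i rc
    for ((i = 1; i <= attempts; i++)); do
        # timeout sends TERM once LIMIT expires, then KILL 30 seconds
        # later if the command is still running.
        timeout --kill-after=30 "$limit" "$@"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        # 124: timed out (TERM was enough); 137: KILL had to be sent.
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            echo "attempt $i/$attempts: timed out after $limit" >&2
        else
            echo "attempt $i/$attempts: failed with status $rc" >&2
        fi
    done
    return 1
}
```

With this, a call such as 'build_with_timeout 3 2h docker build ...' would give up on a hung attempt after two hours and move on to the next attempt, instead of waiting for manual intervention. The two-hour limit is an arbitrary example value.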

Severity
--------
Major

Steps to Reproduce
------------------
cd $MY_REPO/build-tools/build-docker-images && ./build-stx-images.sh --os-label=debian --attempts=3 --stream=stable --no-pull-base --version=20230129T070000Z --prefix=master --user=starlingx --latest --base=starlingx/stx-debian:master-stable-20230129T070000Z --wheels=$MY_WORKSPACE/std/build-wheels-debian-stable/stx-debian-stable-wheels.tar --cache --push --os=debian

Expected Behavior
------------------
All docker images build

Actual Behavior
----------------
Build hangs for > 24hrs until manual intervention

Reproducibility
---------------
Seen Once

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
20230131T070000Z

Last Pass
---------
20230129T070000

Timestamp/Logs
--------------
12:26:33 Building n3000-opae
12:26:33 Running: docker build /localdisk/designer/jenkins/debian-master/cgcs-root/stx/integ/kubernetes/n3000/debian/docker --file /localdisk/designer/jenkins/debian-master/cgcs-root/stx/integ/kubernetes/n3000/debian/docker/./Dockerfile --build-arg BASE=starlingx/stx-debian:master-stable-20230131T070000Z --tag jenkins/n3000-opae:master-debian-stable-build
12:26:33 Sending build context to Docker daemon 2.56kB

12:26:33 Step 1/3 : FROM centos:7.9.2009
12:26:33 ---> eeb6ee3f44bd
12:26:33 Step 2/3 : RUN yum install -y pciutils which hwloc-libs libuuid-devel sysvinit-tools epel-release http://mirror.starlingx.cengn.ca/mirror/centos/github.com/OPAE/opae-sdk/releases/download/1.3.7-5/opae-devel-1.3.7-5.el7.x86_64.rpm http://mirror.starlingx.cengn.ca/mirror/centos/github.com/OPAE/opae-sdk/releases/download/1.3.7-5/opae-libs-1.3.7-5.el7.x86_64.rpm http://mirror.starlingx.cengn.ca/mirror/centos/github.com/OPAE/opae-sdk/releases/download/1.3.7-5/opae-tools-1.3.7-5.el7.x86_64.rpm http://mirror.starlingx.cengn.ca/mirror/centos/github.com/OPAE/opae-sdk/releases/download/1.3.7-5/opae-tools-extra-1.3.7-5.el7.x86_64.rpm http://mirror.starlingx.cengn.ca/mirror/centos/github.com/OPAE/opae-sdk/releases/download/1.3.7-5/opae.admin-1.0.3-2.el7.noarch.rpm && yum install -y python-pip && yum clean all && rm -rf /var/cache/yum
...
12:27:33 Installed:
12:27:33 epel-release.noarch 0:7-11
12:27:33 hwloc-libs.x86_64 0:1.11.8-4.el7
12:27:33 libuuid-devel.x86_64 0:2.23.2-65.el7_9.1
12:27:33 opae-devel.x86_64 0:1.3.7-5.el7
12:27:33 opae-libs.x86_64 0:1.3.7-5.el7
12:27:33 opae-tools.x86_64 0:1.3.7-5.el7
12:27:33 opae-tools-extra.x86_64 0:1.3.7-5.el7
12:27:33 opae.admin.noarch 0:1.0.3-2.el7
12:27:33 pciutils.x86_64 0:3.5.1-3.el7
12:27:33 sysvinit-tools.x86_64 0:2.88-14.dsf.el7
12:27:33 which.x86_64 0:2.20-7.el7
12:27:33
12:27:33 Dependency Installed:
12:27:33 hwdata.x86_64 0:0.252-9.7.el7 libtirpc.x86_64 0:0.2.4-0.16.el7
12:27:33 libtool-ltdl.x86_64 0:2.4.2-22.el7_3 numactl-libs.x86_64 0:2.0.12-5.el7
12:27:33 pciutils-libs.x86_64 0:3.5.1-3.el7 python3.x86_64 0:3.6.8-18.el7
12:27:33 python3-libs.x86_64 0:3.6.8-18.el7 python3-pip.noarch 0:9.0.3-8.el7
12:27:33 python3-setuptools.noarch 0:39.2.0-10.el7 uuid.x86_64 0:1.6.2-26.el7
12:27:33
12:27:33 Dependency Updated:
12:27:33 libblkid.x86_64 0:2.23.2-65.el7_9.1 libmount.x86_64 0:2.23.2-65.el7_9.1
12:27:33 libsmartcols.x86_64 0:2.23.2-65.el7_9.1 libuuid.x86_64 0:2.23.2-65.el7_9.1
12:27:33 util-linux.x86_64 0:2.23.2-65.el7_9.1
12:27:33
12:27:33 Complete!
12:27:33 Loaded plugins: fastestmirror, ovl
12:27:34 Loading mirror speeds from cached hostfile
12:27:36 * base: centos.mirror.globo.tech
12:27:36 * epel: mirror.siena.edu
12:27:36 * extras: centos.mirror.globo.tech
12:27:36 * updates: centos.mirror.globo.tech
<<< HANG >>>

After entering the container, I saw:

$ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jan31 ? 00:00:00 /bin/sh -c yum install -y pciutils which hwloc-libs libuuid-devel sysvinit-tools epel-release http://mirror.starlingx.cengn.ca/mirror/centos/github.com/OP
root 77 1 0 Jan31 ? 00:00:00 /usr/bin/python /usr/bin/yum install -y python-pip
root 145 77 0 Jan31 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
root 146 77 0 Jan31 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
root 147 77 0 Jan31 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
root 157 0 0 15:34 pts/0 00:00:00 /bin/bash
root 172 157 0 15:34 pts/0 00:00:00 ps -ef

kill 147 146 145

After the kill, the container build fails. The subsequent retry passes without hanging.

Test Activity
-------------
Build

Workaround
----------
# Discover and set up environment of hung command
cat /proc/382055/environ | tr '\0' '\n' | grep -v '\(^_\|^SHELL=\|^SHLVL=\|^HOME=\|^USER=\|^OLDPWD=\|^PWD=\)' | sed 's#\(.*\)#export \1#' > /tmp/e

source /tmp/e

# Enter docker build env
cd stx-tools/stx/bin/
./stx -d shell

# Discover and enter hung docker build
docker ps
docker exec -it 5b4fc6472a56 /bin/bash

# Identify and kill hung process
ps -ef
kill 147 146 145
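
To make the "identify" step less manual, the sketch below lists processes that match a name and have been running longer than a threshold. It assumes the 'ps' in the container is procps-ng with support for the 'etimes' (elapsed seconds) field; the function name 'list_stale_procs' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "PID ELAPSED_SECONDS COMMAND" for every
# process whose ps line contains NAME and whose elapsed time is at
# least MIN_AGE seconds.
# Usage: list_stale_procs NAME MIN_AGE
list_stale_procs() {
    local name=$1 min_age=$2
    # NAME and MIN_AGE are passed through the environment so that the
    # awk process itself does not show NAME on its own command line.
    ps -eo pid=,etimes=,args= |
        NAME="$name" MIN_AGE="$min_age" awk '
            index($0, ENVIRON["NAME"]) && $2 + 0 >= ENVIRON["MIN_AGE"] + 0 { print }'
}
```

For the hang above, 'list_stale_procs urlgrabber-ext-down 3600' would list downloader processes older than an hour (an arbitrary threshold), and those PIDs can then be passed to 'kill' as in the last step.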

Tags: stx.build
Ghada Khalil (gkhalil) wrote :

One-time occurrence; should be investigated, but doesn't hold up the stx.8.0 release activities.

tags: added: stx.build
Changed in starlingx:
importance: Undecided → Low