build hang in docker-build-images
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | New | Low | Unassigned |
Bug Description
Brief Description
-----------------
The CENGN Debian build of the master branch, initiated on Jan 31 at 2:00 am EST,
hung during 'build-
Specifically, the build of the 'n3000-opae' docker image hung on the
'yum install -y python-pip' command of Step 2.
I was able to resume the build by killing the docker build of 'n3000-opae'.
The automatic retry of the 'n3000-opae' build was successful.
I couldn't find a reason for the hang.
Build tools probably need to add a 'timeout' somewhere. Probably within build-stx-
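A minimal sketch of what such a timeout could look like, assuming GNU coreutils 'timeout' is available in the build environment. The function name 'build_with_timeout', the time limits and the retry count are hypothetical and not taken from the existing build tools. Whether terminating the docker client also stops the container running the hung step depends on the Docker version and builder, so that would need to be verified.

```shell
#!/bin/bash

# Hypothetical helper (not part of the existing build tools):
# run a command with a time limit, retrying up to a fixed number of attempts.
#   $1     time limit understood by GNU timeout (e.g. 2h, 90m, 30s)
#   $2     maximum number of attempts
#   $3...  command to run
build_with_timeout() {
    local limit="$1"; shift
    local attempts="$1"; shift
    local n rc=1

    for ((n = 1; n <= attempts; n++)); do
        # Send SIGTERM at the limit, then SIGKILL 30s later if still alive.
        timeout --kill-after=30s "$limit" "$@"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            return 0
        fi
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            # 124: timed out; 137: had to be killed with SIGKILL
            echo "attempt $n/$attempts: timed out after $limit" >&2
        else
            echo "attempt $n/$attempts: failed with exit code $rc" >&2
        fi
    done
    return "$rc"
}
```

A call might look like 'build_with_timeout 2h 2 docker build <build-dir>', where the two hour limit is only an illustrative value. Since the retry succeeded in this incident, pairing the timeout with a retry would likely have let the build recover without manual intervention.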
Severity
--------
Major
Steps to Reproduce
------------------
cd $MY_REPO/
Expected Behavior
------------------
All docker images build
Actual Behavior
----------------
Build hangs for more than 24 hours, until manual intervention
Reproducibility
---------------
Seen Once
System Configuration
--------------------
N/A
Branch/Pull Time/Commit
-------
20230131T070000Z
Last Pass
---------
20230129T070000
Timestamp/Logs
--------------
12:26:33 Building n3000-opae
12:26:33 Running: docker build /localdisk/
12:26:33 Sending build context to Docker daemon 2.56kB
12:26:33 Step 1/3 : FROM centos:7.9.2009
12:26:33 ---> eeb6ee3f44bd
12:26:33 Step 2/3 : RUN yum install -y pciutils which hwloc-libs libuuid-devel sysvinit-tools epel-release http://
...
12:27:33 Installed:
12:27:33 epel-release.noarch 0:7-11
12:27:33 hwloc-libs.x86_64 0:1.11.8-4.el7
12:27:33 libuuid-
12:27:33 opae-devel.x86_64 0:1.3.7-5.el7
12:27:33 opae-libs.x86_64 0:1.3.7-5.el7
12:27:33 opae-tools.x86_64 0:1.3.7-5.el7
12:27:33 opae-tools-
12:27:33 opae.admin.noarch 0:1.0.3-2.el7
12:27:33 pciutils.x86_64 0:3.5.1-3.el7
12:27:33 sysvinit-
12:27:33 which.x86_64 0:2.20-7.el7
12:27:33
12:27:33 Dependency Installed:
12:27:33 hwdata.x86_64 0:0.252-9.7.el7 libtirpc.x86_64 0:0.2.4-0.16.el7
12:27:33 libtool-ltdl.x86_64 0:2.4.2-22.el7_3 numactl-libs.x86_64 0:2.0.12-5.el7
12:27:33 pciutils-
12:27:33 python3-libs.x86_64 0:3.6.8-18.el7 python3-pip.noarch 0:9.0.3-8.el7
12:27:33 python3-
12:27:33
12:27:33 Dependency Updated:
12:27:33 libblkid.x86_64 0:2.23.2-65.el7_9.1 libmount.x86_64 0:2.23.2-65.el7_9.1
12:27:33 libsmartcols.x86_64 0:2.23.2-65.el7_9.1 libuuid.x86_64 0:2.23.2-65.el7_9.1
12:27:33 util-linux.x86_64 0:2.23.2-65.el7_9.1
12:27:33
12:27:33 Complete!
12:27:33 Loaded plugins: fastestmirror, ovl
12:27:34 Loading mirror speeds from cached hostfile
12:27:36 * base: centos.
12:27:36 * epel: mirror.siena.edu
12:27:36 * extras: centos.
12:27:36 * updates: centos.
<<< HANG >>>
Entering the container, I saw:
$ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jan31 ? 00:00:00 /bin/sh -c yum install -y pciutils which hwloc-libs libuuid-devel sysvinit-tools epel-release http://
root 77 1 0 Jan31 ? 00:00:00 /usr/bin/python /usr/bin/yum install -y python-pip
root 145 77 0 Jan31 ? 00:00:00 /usr/bin/python /usr/libexec/
root 146 77 0 Jan31 ? 00:00:00 /usr/bin/python /usr/libexec/
root 147 77 0 Jan31 ? 00:00:00 /usr/bin/python /usr/libexec/
root 157 0 0 15:34 pts/0 00:00:00 /bin/bash
root 172 157 0 15:34 pts/0 00:00:00 ps -ef
kill 147 146 145
After the kill, the container build fails. The subsequent build passes without a hang.
Test Activity
-------------
Build
Workaround
----------
# Discover and set up environment of hung command
cat /proc/382055/
source /tmp/e
# Enter docker build env
cd stx-tools/stx/bin/
./stx -d shell
# Discover and enter hung docker build
docker ps
docker exec -it 5b4fc6472a56 /bin/bash
# Identify and kill hung process
ps -ef
kill 147 146 145
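
The "identify" step above could be partly automated. The following is a rough sketch only: the function name 'find_stale_pids' and the one hour threshold are hypothetical, and it assumes the 'ps' inside the container supports the 'etimes' (elapsed seconds) output field, which should be checked for the image in question.

```shell
#!/bin/bash

# Hypothetical helper: print the PIDs of processes whose command line
# contains a given string and whose elapsed run time exceeds a limit.
# Expects "pid elapsed_seconds args..." lines on stdin, as produced by:
#   ps -eo pid=,etimes=,args=
#   $1  maximum allowed run time in seconds
#   $2  fixed string to look for in the command line
find_stale_pids() {
    local max_seconds="$1" pattern="$2"
    awk -v max="$max_seconds" -v pat="$pattern" '
        {
            pid = $1
            age = $2 + 0
            # Blank the first two fields so only the command line is searched.
            $1 = ""; $2 = ""
            if (age > max && index($0, pat) > 0) {
                print pid
            }
        }
    '
}
```

For example, 'docker exec 5b4fc6472a56 ps -eo pid=,etimes=,args= | find_stale_pids 3600 "/usr/bin/yum install"' would list yum processes that have been running for more than an hour, which can then be inspected before deciding whether to kill them.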
One-time occurrence; it should be investigated, but it doesn't hold up the stx.8.0 release activities.