build-pkgs cannot complete std build

Bug #1794415 reported by Erich Cordoba
This bug affects 4 people
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | High       | Austin Sun  |

Bug Description

Brief Description
-----------------
After running `build-pkgs`, the build stops and reports that the following packages failed:

12:03:24 Failed to build packages: vm-topology-1.0-1.tis.src.rpm libvirt-python-3.5.0-1.tis.1.src.rpm python-cephclient-0.1.0.5-0.tis.2.src.rpm python-networking-bgpvpn-7.0.0-0.tis.3.src.rpm openstack-neutron-11.0.0-1.tis.5.src.rpm python-networking-sfc-5.0.0-1.tis.2.src.rpm i40e-kmod-2.4.10-0.tis.1.src.rpm libvirt-3.5.0-1.tis.2.src.rpm qemu-kvm-ev-2.10.0-0.tis.0.src.rpm openvswitch-2.9.0-3.el7.tis.1.src.rpm ceph-10.2.6-0.el7.tis.1.src.rpm

######## Tue Sep 25 12:03:25 UTC 2018: build-rpm-parallel --std failed with rc=1

The same behavior was seen on three different workstations, with a slight difference in the number of failed packages.

Steps to Reproduce
------------------
With a fresh clone of all the code, run:
build-pkgs

The build will stop after a couple of hours.

Expected Behavior
------------------
build-pkgs should complete the build successfully.

Actual Behavior
----------------
build-pkgs stops and reports that several packages failed.

Note that some packages fail because they depend on other failing packages. In this run I was able to identify two packages that fail directly: openvswitch and ceph.

However, entering the mock environment and building these packages manually results in a successful build.

Note also that on subsequent runs of `build-pkgs`, the number of failing packages decreased without any change to the environment.

Reproducibility
---------------
3/3 systems reproduce this behavior.
On one system, running build-pkgs again resulted in a successful build.
On another, the number of failing packages decreased, but it wasn't possible to complete the build just by repeating the command.

Branch/Pull Time/Commit
-----------------------
commit 717da7b675 on stx-root

Erich Cordoba (ericho) wrote :

I lost the initial logs while debugging the issue; once I run the build again, I'll upload the logs.

tags: added: stx.build
Erich Cordoba (ericho) wrote :

I was able to unblock the building by reverting this change: http://git.starlingx.io/cgit/stx-root/commit/?id=b20ac0164dd830eda64426b076322d7ac9365d20

Erich Cordoba (ericho) wrote :

Reverting the change mentioned earlier worked in one attempt, but doing it again in the automated procedure did not.

Ghada Khalil (gkhalil) wrote :

This is a blocking issue for the stx.2018.10 code freeze.

Changed in starlingx:
importance: Undecided → High
tags: added: stx.2018.10
Ghada Khalil (gkhalil) wrote :

Based on the community call today, reverting http://git.starlingx.io/cgit/stx-root/commit/?id=b20ac0164dd830eda64426b076322d7ac9365d20 did not address the intermittent build issues reported here. Further investigation is still required.

Paul-Emile Element (paul-emileelement) wrote :

Erich:
The attached libvirt build log does not show the error for that package. Would it be possible to also attach the file 'root.log' from the same directory?

Corey Erickson (corebits) wrote :

I was encountering this error this morning as well. I ran `build-pkgs --serial` and was able to build successfully.

Yatindra (yatindra) wrote :

I still get the same issue with both `build-pkgs` and `build-pkgs --serial`. I ran `build-pkgs --serial` after `build-pkgs --clean`.

Erich Cordoba (ericho) wrote :

The root cause of the issue has been found. It turns out that the mock environment is recycled between package builds to save time. After the CentOS 7.5 upgrade, the autoconf-archive package is built and installed in the mock environment. This package contains a set of autoconf macros that conflict with the ceph build scripts. As a result, ceph fails with the 'too many loops' error, and all the packages that depend on it fail as well; see [1].
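One way to check whether a recycled mock root is in the state described above is to look for the autoconf-archive macro in aclocal's system-wide macro directory. This is only a sketch. It assumes it is run inside the mock build root, that `aclocal --print-ac-dir` reports the system macro directory, and that `ax_require_defined.m4` is the file involved. `check_archive_macro` is a hypothetical helper, not part of the StarlingX build tools. At the package level, `rpm -q autoconf-archive` answers the same question.

```shell
#!/bin/sh
# Diagnostic sketch: report whether autoconf-archive's ax_require_defined.m4
# is visible in an aclocal macro directory. If it is, "aclocal -I m4 --install"
# may copy this third-party macro into ceph's m4/ folder.
# check_archive_macro is a hypothetical helper name.
check_archive_macro() {
    ac_dir="$1"
    if [ -f "$ac_dir/ax_require_defined.m4" ]; then
        echo "autoconf-archive macro visible in $ac_dir"
        return 0
    fi
    echo "no autoconf-archive macro in $ac_dir"
    return 1
}
```

For example, inside the build root: `check_archive_macro "$(aclocal --print-ac-dir)"`. A zero exit status means the macro is present.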

It's not clear yet what causes the failure in autoconf; however, we have identified two possible solutions for this issue.

1) Change the autogen.sh script by removing the --install flag from the aclocal command. The --install flag installs the macros available on the system into the directory specified in the same command. By removing --install, we ensure that no third-party macros end up in the ceph environment.

sed -i 's/aclocal -I m4 --install/aclocal -I m4/g' "$MY_REPO/stx/git/ceph/autogen.sh"

2) Copy the ax_require_defined.m4 file into the m4 folder in ceph. The issue happens when ax_require_defined.m4 is being installed into the m4 folder (because of the --install flag). With the file already in place, the installation is skipped and ceph can be built. See [2].
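Both workarounds can be sketched as small, idempotent shell functions, so they can be re-run safely in a recycled environment. This is a minimal sketch under several assumptions:

- `patch_ceph_autogen` and `seed_ax_require_defined` are hypothetical helper names.
- Option 1 assumes `autogen.sh` contains the exact string `aclocal -I m4 --install`, as the sed command above expects.
- Option 2 assumes autoconf-archive installs its macros under `/usr/share/aclocal`, its usual location on CentOS.

The change proposed for option 2 is the one in [2].

```shell
#!/bin/sh
# Option 1: drop --install from the aclocal call in ceph's autogen.sh.
# patch_ceph_autogen is a hypothetical helper name.
patch_ceph_autogen() {
    autogen="$1"
    [ -f "$autogen" ] || return 1
    # Rewrite only if the original invocation is still present, so re-running is harmless.
    if grep -q 'aclocal -I m4 --install' "$autogen"; then
        sed -i 's/aclocal -I m4 --install/aclocal -I m4/g' "$autogen" || return 1
        echo "patched"
    else
        echo "unchanged"
    fi
}

# Option 2: make sure ax_require_defined.m4 already exists in ceph's m4/
# directory, so that the --install step does not need to copy it in.
# seed_ax_require_defined is a hypothetical helper name; the default source
# directory is an assumption about where autoconf-archive installs its macros.
seed_ax_require_defined() {
    m4_dir="$1"
    src_dir="${2:-/usr/share/aclocal}"
    if [ -f "$m4_dir/ax_require_defined.m4" ]; then
        echo "present"
    else
        mkdir -p "$m4_dir" || return 1
        cp "$src_dir/ax_require_defined.m4" "$m4_dir/" || return 1
        echo "copied"
    fi
}
```

With the paths from the sed command above, usage would be `patch_ceph_autogen "$MY_REPO/stx/git/ceph/autogen.sh"` or `seed_ax_require_defined "$MY_REPO/stx/git/ceph/m4"`. Since the two options are alternatives, only one is needed.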

[1] http://lists.starlingx.io/pipermail/starlingx-discuss/2018-September/001302.html
[2] https://github.com/starlingx-staging/stx-ceph/pull/2

Bruce Jones (brucej) wrote :

Do we have a fix in flight for this issue? Which of the two alternatives (if either) will be used?

Ghada Khalil (gkhalil) wrote :

It looks like Austin made a pull request to stx-ceph in the master branch:
https://github.com/starlingx-staging/stx-ceph/pull/2

The same is needed for the stx.2018.10 release branch.

Changed in starlingx:
assignee: nobody → Austin Sun (sunausti)
Ghada Khalil (gkhalil) wrote :

The equivalent pull request was merged in the stx.2018.10 release branch as of 2018-10-03
https://github.com/starlingx-staging/stx-ceph/pull/3

Ghada Khalil (gkhalil) wrote :

Marking as Fix Released based on the two pull requests above.
If more issues are encountered with ceph builds, please re-open this bug and provide the most recent details.

Changed in starlingx:
status: New → Fix Released
Ken Young (kenyis)
tags: added: stx.1.0
removed: stx.2018.10