debian: build-tools: apt fails with permission errors

Bug #1981094 reported by Davlet Panech
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       ZhangXiao

Bug Description

Brief Description
-----------------
When building packages on Debian, the build sometimes fails with a permission error while trying to clean the apt cache.

Severity
--------
Major

Steps to Reproduce
------------------
The error was observed in Jenkins. The approximate sequence of steps performed by Jenkins is:

cd /path/to/stx-tools

./stx-init-env

stx build prepare

stx shell
  downloader -b -s -B
  build-pkgs -a -b std,rt

Expected Behavior
------------------
Build succeeds

Actual Behavior
----------------
Build fails

Reproducibility
---------------
Intermittent

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
master/2022-08-08

Last Pass
---------
Unknown

Timestamp/Logs
--------------
05:26:23 2022-07-08 09:26:23,632 - debcontroller - DEBUG: Target dscs(4) passed to dsc_depends: ['/localdisk/loadbuild/jenkins/dpanech-debian/std/golang-1.16/golang-1.16_1.16.12-1~bpo11+1.stx.2.dsc', '/localdisk/loadbuild/jenkins/dpanech-debian/std/golang-1.17/golang-1.17_1.17.5-1~bpo11+1.stx.1.dsc', '/localdisk/loadbuild/jenkins/dpanech-debian/std/go-dep/go-dep_0.5.4-3.stx.1.dsc', '/localdisk/loadbuild/jenkins/dpanech-debian/std/bash/bash_5.1-2.stx.1.dsc']
05:26:41 rm: cannot remove '/var/cache/apt/archives/partial/*.deb': Permission denied
05:26:41 Traceback (most recent call last):
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/dsc_depend.py", line 64, in get_aptcache
05:26:41 ret = apt_cache.update()
05:26:41 File "/usr/lib/python3/dist-packages/apt/cache.py", line 575, in update
05:26:41
05:26:41 raise FetchFailedException()
05:26:41 apt.cache.FetchFailedException
05:26:41
05:26:41 During handling of the above exception, another exception occurred:
05:26:41
05:26:41 Traceback (most recent call last):
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 1147, in <module>
05:26:41 build_controller.build_all(layers=layers, build_types=build_types, packages=packages)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 810, in build_all
05:26:41 self.build_layers(layers=layers, build_types=build_types, packages=packages)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 937, in build_layers
05:26:41 self.build_layer(layer=layer, build_types=build_types, packages=packages)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 912, in build_layer
05:26:41 self.build_layer_and_build_types(layer=layer, build_types=build_types, packages=packages)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 897, in build_layer_and_build_types
05:26:41 self.build_layer_and_build_type(layer=layer, build_type=build_type, packages=packages)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 862, in build_layer_and_build_type
05:26:41 self.build_packages(layer_pkg_dirs, pkg_dirs, layer, build_type=build_type)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 992, in build_packages
05:26:41 self.run_build_loop(layer_pkgdir_dscs, target_pkgdir_dscs, layer, build_type=build_type)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/build-pkgs", line 707, in run_build_loop
05:26:41 deps_resolver = dsc_depend.Dsc_build_order(dsc_list_file, dscs_list, ds_logger)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/dsc_depend.py", line 923, in __init__
05:26:41 self.aptcache = get_aptcache(apt_rootdir)
05:26:41 File "/localdisk/designer/jenkins/dpanech-debian/cgcs-root/build-tools/stx/dsc_depend.py", line 67, in get_aptcache
05:26:41 raise Exception('APT update failed')
05:26:41 Exception: APT update failed
05:26:41 command terminated with exit code 1

Test Activity
-------------
N/A

Workaround
----------
N/A

Davlet Panech (dpanech) wrote :

Problem seems to be triggered by the existence of '/var/cache/apt/archives/partial' directory, owned by "apt", which is not world-readable.
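
To check this observation on a builder, the ownership and mode of the directory can be inspected. The following is only a minimal sketch; the helper name describe_dir is hypothetical and not part of build-tools.

```python
import os
import pwd
import stat


def describe_dir(path):
    """Return (owner, mode string, whether 'others' may list and enter path)."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid (common in containers): show the number.
        owner = str(info.st_uid)
    others_ok = bool(info.st_mode & stat.S_IROTH) and bool(info.st_mode & stat.S_IXOTH)
    return owner, stat.filemode(info.st_mode), others_ok
```

For example, describe_dir('/var/cache/apt/archives/partial') would show the owner and whether other users can read the directory at all.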

ZhangXiao (zhangxiao-windriver) wrote :

There are two key messages here:
1) 05:26:41 rm: cannot remove '/var/cache/apt/archives/partial/*.deb': Permission denied
2) 05:26:41 raise FetchFailedException()
   05:26:41 apt.cache.FetchFailedException

The first one, "Permission denied", is just a "bug" or "warning" message from the Python modules apt and apt_pkg. It is harmless and can be ignored.

The second one, the fetch error, is most likely an occasional network issue that causes the metadata fetch to fail. It is the same kind of failure we sometimes see when running `sudo apt update` manually.

To fix or work around it, I think we can:
A) Use the host's apt cache directly.
The builder container is also a Debian bullseye system. The upstream "bullseye" repository already exists in its "/etc/apt/sources.list", and `apt update` is run at the very beginning, so all the information we need is already in its apt cache.

This way we can skip the apt.update process, and both 1) and 2) are avoided.

Risk: if more and more sources are added to the builder's sources.list, information about many other packages ends up in its cache. If a package with different dependency information (build depends and runtime depends) gets selected, this may lead to a wrong build order. Of course we can avoid that with /etc/apt/preferences, but we can't stop developers from modifying sources.list manually and thereby invalidating the predefined preference policy. :-(

B) Add a try/except around apt.update() in our Python code.
This can mitigate 2) but is of no use for 1).
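
Option B could look roughly like the sketch below. It is not the merged fix, only an illustration: the function name update_with_retry and its parameters are hypothetical, and the update step is passed in as a callable so the sketch does not depend on python-apt being installed.

```python
import time


def update_with_retry(update, attempts=3, delay=5.0, retry_on=(Exception,)):
    """Call update() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return update()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                # Give a transient network problem some time to clear.
                time.sleep(delay)
    # Same final error text as in the traceback above.
    raise Exception('APT update failed') from last_error
```

With python-apt this might be called as update_with_retry(apt_cache.update, retry_on=(apt.cache.FetchFailedException,)). It only retries the fetch; the "Permission denied" message from the hook is still printed on every attempt.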

ZhangXiao (zhangxiao-windriver) wrote :

The "Permission denied" message is caused by the configuration file "/etc/apt/apt.conf.d/docker-clean":
...
$ cat /etc/apt/apt.conf.d/docker-clean
# Since for most Docker users, package installs happen in "docker build" steps,
# they essentially become individual layers due to the way Docker handles
# layering, especially using CoW filesystems. What this means for us is that
# the caches that APT keeps end up just wasting space in those layers, making
# our layers unnecessarily large (especially since we'll normally never use
# these caches again and will instead just "docker build" again and make a brand
# new image).

# Ideally, these would just be invoking "apt-get clean", but in our testing,
# that ended up being cyclic and we got stuck on APT's lock, so we get this fun
# creation that's essentially just "apt-get clean".
DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };
APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };

Dir::Cache::pkgcache "";
Dir::Cache::srcpkgcache "";

# Note that we do realize this isn't the ideal way to do this, and are always
# open to better suggestions (https://github.com/debuerreotype/debuerreotype/issues).
...

"APT::Update::Post-Invoke" is run by "apt_pkg" after the "update" operation. It always executes "rm -f /var/cache/apt/archives/partial/*.deb" with that hard-coded path, no matter how we set "rootdir", and this produces the "Permission denied" error message. The message is harmless.
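
The reason the message is harmless is the trailing "|| true" in the hook: rm still writes its complaint to stderr, but the hook command as a whole exits with status 0. A minimal sketch of that behaviour, assuming the command is run through a shell; the helper run_hook and the path are hypothetical, and a missing file stands in for the permission problem.

```python
import subprocess


def run_hook(command):
    """Run a shell command the way a hook might; return (exit code, stderr)."""
    result = subprocess.run(['sh', '-c', command], capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


# Hypothetical path that does not exist, standing in for the unreadable directory.
target = '/nonexistent-apt-demo/partial/*.deb'
code, stderr = run_hook('rm ' + target + ' || true')
print(code)    # exit status of the whole hook command
print(stderr)  # rm's error message still reaches stderr
```

Because the glob cannot be expanded, rm receives the literal "*.deb" pattern, which matches the form of the path shown in the build log.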

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.8.0 stx.build stx.debian
Changed in starlingx:
assignee: nobody → ZhangXiao (zhangxiao-windriver)
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to root (master)

Reviewed: https://review.opendev.org/c/starlingx/root/+/848876
Committed: https://opendev.org/starlingx/root/commit/9d60e85fcd17ba83fe22be254fbabe2668a2384d
Submitter: "Zuul (22348)"
Branch: master

commit 9d60e85fcd17ba83fe22be254fbabe2668a2384d
Author: Zhang Xiao <email address hidden>
Date: Wed Jul 6 23:00:42 2022 +0800

    Debian: dsc_depend: Support using host's sources.list

    In case the host is a bullseye Debian system, we can use its
    apt cache directly. To save build time.

    Test Plan:
      - Pass: Use host's sources.list, get the same build order.

    Story: 2008862
    Task: 45335
    Closes-Bug: #1981094

    Signed-off-by: Zhang Xiao <email address hidden>
    Change-Id: If582cdef9fb4c7fbdc47b573b39d6d0c92ca1003

Changed in starlingx:
status: In Progress → Fix Released