unattended-upgrades is running forever

Bug #2063094 reported by Brian Murray
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
unattended-upgrades (Ubuntu)
Confirmed
High
Unassigned

Bug Description

  28433 root 39 19 354904 111480 71296 R 99.7 0.7 107:29.26 unattended-upgr

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: unattended-upgrades 2.9.1+nmu4ubuntu1
ProcVersionSignature: Ubuntu 6.8.0-28.28-generic 6.8.1
Uname: Linux 6.8.0-28-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.28.0-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Mon Apr 22 10:27:34 2024
InstallationDate: Installed on 2022-01-07 (836 days ago)
InstallationMedia: Ubuntu 21.10 "Impish Indri" - Release amd64 (20211012)
PackageArchitecture: all
SourcePackage: unattended-upgrades
UpgradeStatus: Upgraded to noble on 2024-04-10 (12 days ago)

Revision history for this message
Brian Murray (brian-murray) wrote :
Revision history for this message
Brian Murray (brian-murray) wrote :

It also seemed rather unkillable.

tags: added: rls-nn-incoming
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in unattended-upgrades (Ubuntu):
status: New → Confirmed
Changed in unattended-upgrades (Ubuntu):
importance: Undecided → High
tags: added: rls-oo-incoming
Revision history for this message
Sebastien Bacher (seb128) wrote :

I'm seeing that bug as well on a Noble system, the log shows that it installed an openjdk updates some hours ago, no activity since but it's burning CPU

trying to attach gdb to the process

0x000077fdc49445f1 in ?? () from target:/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
(gdb) bt
#0 0x000077fdc49445f1 in ?? () from target:/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#1 0x000077fdc494671a in pkgDepCache::MarkRequired(pkgDepCache::InRootSetFunc&) ()
   from target:/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#2 0x000077fdc4946a55 in pkgDepCache::MarkAndSweep(pkgDepCache::InRootSetFunc&) ()
   from target:/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#3 0x000077fdc493105b in pkgDepCache::ActionGroup::~ActionGroup() () from target:/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#4 0x000077fdc4938a62 in pkgDepCache::MarkInstall(pkgCache::PkgIterator const&, bool, unsigned long, bool, bool) ()
   from target:/lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#5 0x000077fdc4a2c0da in ?? () from target:/usr/lib/python3/dist-packages/apt_pkg.cpython-312-x86_64-linux-gnu.so

Revision history for this message
Julian Andres Klode (juliank) wrote :

We're still going to need a couple more snapshots of backtraces - it's important to check if we ever return from pkgDepCache::MarkInstall() because I do not see an infinite loop in unattended-upgrades

Revision history for this message
Danilo Egea Gondolfo (danilogondolfo) wrote :

Calling "strace" and "perf record" against the process would also help debugging it.

Revision history for this message
Adrien Nader (adrien) wrote :
Download full text (9.7 KiB)

I'm seeing very different traces. Sorry. :)

I have processes which appear stuck and which have been using more than 11 weeks of CPU time over the past 132 days. No idea if it was in only one run (I think I looked at CPU time less than 132 days ago, and even less than 11 weeks ago, so it probably hasn't been stuck for 11 weeks). This happens in three LXC containers which run debian stable. It seems I ran out of space on the containers, but definitely not 11 weeks ago, maybe two weeks ago at most.

Traces I see:

  Thread 1 (Thread 0x7f9ca69f4040 (LWP 115) "unattended-upgr"):
  #0 0x00007f9ca6a7ad26 in _xbegin () at /usr/lib/gcc/x86_64-linux-gnu/12/include/rtmintrin.h:52
  #1 __GI___lll_trylock_elision (futex=0x2b0c9a0, adapt_count=0x89) at ../sysdeps/unix/sysv/linux/x86/elision-trylock.c:46
  #2 0x0000000000a840d8 in _PyRuntime ()
  #3 0x000000000058a52a in PyArg_ParseTupleAndKeywords (args=<optimized out>, keywords=<optimized out>, format=<optimized out>, kwlist=<optimized out>) at ../Python/getargs.c:1394
  #4 0x000000000058a245 in acquire_timed (timeout=<optimized out>, lock=0x7fffcf0f55f0) at ../Modules/_threadmodule.c:98
  #5 lock_PyThread_acquire_lock (self=0x2b0c9a0, args=<optimized out>, kwds=<optimized out>) at ../Modules/_threadmodule.c:179
  #6 0x0000000000550bbe in method_vectorcall_VARARGS_KEYWORDS (func=0x7f9ca6976f70, args=0x7f9ca6cff398, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/descrobject.c:364
  #7 0x000000000053ac2c in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=0x7f9ca6976f70, tstate=0xa840d8 <_PyRuntime+166328>) at ../Include/internal/pycore_call.h:92
  #8 PyObject_Vectorcall (callable=0x7f9ca6976f70, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:299
  #9 0x000000000052b940 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:4772
  #10 0x000000000052360b in _PyEval_EvalFrame (throwflag=0, frame=0x7f9ca6cff020, tstate=0xa840d8 <_PyRuntime+166328>) at ../Include/internal/pycore_ceval.h:73
  #11 _PyEval_Vector (args=0x0, argcount=0, kwnames=0x0, locals=<optimized out>, func=0x7f9ca69ba020, tstate=0xa840d8 <_PyRuntime+166328>) at ../Python/ceval.c:6435
  #12 PyEval_EvalCode (co=0x2959f10, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:1154
  #13 0x0000000000647497 in run_eval_code_obj (tstate=0xa840d8 <_PyRuntime+166328>, co=0x2959f10, globals=0x7f9ca69dec80, locals=0x7f9ca69dec80) at ../Python/pythonrun.c:1714
  #14 0x0000000000644d4f in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7f9ca69dec80, locals=0x7f9ca69dec80, flags=<optimized out>, arena=<optimized out>) at ../Python/pythonrun.c:1735
  #15 0x0000000000651010 in pyrun_file (fp=fp@entry=0x2906830, filename=filename@entry=0x7f9ca6918f80, start=start@entry=257, globals=globals@entry=0x7f9ca69dec80, locals=locals@entry=0x7f9ca69dec80, closeit=closeit@entry=1, flags=0x7fffcf0f5a68) at ../Python/pythonrun.c:1630
  #16 0x0000000000650d5b in _PyRun_SimpleFileObject (fp=fp@entry=0x2906830, filename=filename@entry=0x7f9ca6...

Read more...

Revision history for this message
Adrien Nader (adrien) wrote :

I have 10 backtraces with python3-dbg installed and therefore more complete backtraces (but I wouldn't call them very readable). I'm going to attach them here (.tar.xz archive which extracts in the current directory).

Revision history for this message
Adrien Nader (adrien) wrote :

Also, note that out of my three stuck processes, it seems got unstuck. Can't tell why however.

I also talked about running out of space but I don't think that was the issue: it was the reason I looked at the machine but it wasn't related to unattended-upgrades.

Benjamin Drung (bdrung)
tags: added: foundations-bugs
removed: rls-nn-incoming rls-oo-incoming
Revision history for this message
Adrien Nader (adrien) wrote :

Attaching 20 seconds of strace output for my case.

This actually lead me to look at /var/log/unattended-upgrades/unattended-upgrades-shutdown.log and I Noticed the following:

2024-01-18 15:01:16,996 WARNING - SIGTERM or SIGHUP received, stopping unattended-upgrades only if it is running
2024-01-18 15:01:22,365 WARNING - Could not get delay inhibitor lock
2024-01-18 15:01:24,367 WARNING - Unable to monitor PrepareForShutdown() signal, polling instead.
2024-01-18 15:01:24,367 WARNING - To enable monitoring the PrepareForShutdown() signal instead of polling please install the python3-gi package

I also have that in my journal logs:

systemd[1]: Starting systemd-logind.service - User Login Management...
(d-logind)[1994541]: systemd-logind.service: Failed to set up mount namespacing: Permission denied
(d-logind)[1994541]: systemd-logind.service: Failed at step NAMESPACE spawning /lib/systemd/systemd-logind: Permission denied
systemd[1]: systemd-logind.service: Main process exited, code=exited, status=226/NAMESPACE
systemd[1]: systemd-logind.service: Failed with result 'exit-code'.
systemd[1]: Failed to start systemd-logind.service - User Login Management.
systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
systemd[1]: Stopped systemd-logind.service - User Login Management.
systemd[1]: <email address hidden> - Load Kernel Module drm was skipped because of an unmet condition check (ConditionCapability=CAP_SYS_MODULE).

and

[system] Failed to activate service 'org.freedesktop.login1': timed out (service_start_timeout=25000ms)
[system] Rejected: destination has a full message queue, 0 matched rules; type="error", sender="(unset)" ((bus)) interface="(unset)" member="(unset)" error name="org.freedesktop.DBus.Error.TimedOut" requested_reply="1" destination=":1.1" (uid=0 pid=115 comm="/usr/bin/python3 /usr/share/unattended-upgrades/un")
(last line is repeated 9 times in total)

Unfortunately I can't really go back and see when this started. Considering the total CPU time and the average CPU usage (which seems to be slightly above 50% in practice, with peaks at 100%), the issue might be present since the machine was booted.

However I now see the link with an issue I've been seeing for a long time on the host in dmesg:

audit: type=1400 audit(1717098847.156:4846706): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=840997 comm="(d-logind)" flags="rw, rslave"

I found some related topics which indicate I'm not entirely alone but I really can't tell if the bug initially reported here and mine are the same. I think the universal teaching here is that there can be loops for many different reasons, including maybe through dbus and service activation.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.