Comment 11 for bug 2003851

anarcat (anarcat) wrote:

We're seeing a similar issue here. At first we thought it was specific to a Prometheus collector (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1028212 / https://github.com/prometheus-community/node-exporter-textfile-collector-scripts/issues/179), but now that I see this bug report, I can't help thinking this is an issue in apt itself.

I should also mention that this looks like a regression between bullseye and bookworm. For context, we also have a nightly job that runs an apt update: a home-grown script from DSA, `dsa-update-apt-status`, which sends warnings through cron when something goes wrong. In the mailbox where I track those warnings, I do have a few instances of this from before we started the bookworm upgrade, but they were rare. Since the beginning of our bookworm deployment, though, we have been seeing more and more of them as we upgrade machines. We now get daily warnings from the fleet, all from bookworm machines, because `dsa-update-apt-status` runs into lock contention with another job (`apt_info.py` from the collector above, which runs every 15 minutes) more and more often, with locks sometimes held for hours.

Our current workaround has been to set a time limit on the `apt_info.py` job, but we are *still* seeing errors. That is interesting in itself: it means the issue is *not* specific to that script, but a global apt issue. We have now also had unattended-upgrades.py hang forever, which we had never seen before.
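To check whether a job is actually stuck waiting on an apt lock, or to bound how long it waits before giving up, something like the following could help. This is a minimal sketch, assuming apt takes its locks as POSIX `fcntl` write locks on files such as `/var/lib/apt/lists/lock`; `lock_is_held` and `wait_for_lock` are hypothetical helpers, not part of apt or of the collector. The probe briefly takes the lock itself when it is free, so it can race with an apt process starting at that same instant.

```python
import errno
import fcntl
import time


def lock_is_held(path: str) -> bool:
    """Return True if another process holds a conflicting fcntl lock on `path`.

    Hypothetical helper; assumes the lock is a POSIX fcntl (lockf) write lock.
    """
    # An exclusive (write) fcntl lock needs a file opened for writing;
    # "a" also creates the file if it does not exist yet.
    with open(path, "a") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            # With LOCK_NB, a conflicting lock raises EACCES or EAGAIN.
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        # We got the lock ourselves: release it immediately.
        fcntl.lockf(f, fcntl.LOCK_UN)
        return False


def wait_for_lock(path: str, timeout: float, interval: float = 1.0) -> bool:
    """Poll until the lock on `path` is free or `timeout` seconds elapse.

    Returns True if the lock became free, False if we gave up.
    """
    deadline = time.monotonic() + timeout
    while lock_is_held(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

For example, a cron wrapper could call `wait_for_lock("/var/lib/apt/lists/lock", timeout=600)` and skip that run if it returns False, instead of piling up behind a stuck apt process.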

So I think this is an apt issue. `Acquire::http::Timeout=120` may be a valid workaround, but I can't help thinking this issue was introduced specifically between bullseye and bookworm (apt 2.2 vs 2.6).
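The `Acquire::http::Timeout` workaround can be applied to a single run with apt-get's `-o` option, and combined with an outer hard limit like the one we put on `apt_info.py`. This is a minimal sketch, assuming the job shells out to `apt-get update`; `apt_update_cmd` and `run_apt_update` are hypothetical names, and the 1800-second limit is an arbitrary placeholder. As far as I understand, `Acquire::http::Timeout` only bounds network waits in the http method, so the outer limit is what guards against a hang somewhere else.

```python
import subprocess


def apt_update_cmd(http_timeout: int = 120) -> list[str]:
    # -o sets one apt configuration option for this invocation only,
    # instead of writing it to a file under /etc/apt/apt.conf.d/.
    return ["apt-get", "-q", "-o", f"Acquire::http::Timeout={http_timeout}", "update"]


def run_apt_update(http_timeout: int = 120, hard_limit: float = 1800) -> int:
    """Run apt-get update, killing it if it runs longer than hard_limit seconds.

    Returns apt-get's exit status, or -1 if the hard limit was hit.
    """
    try:
        # subprocess.run kills and reaps the child when the timeout expires.
        result = subprocess.run(apt_update_cmd(http_timeout), timeout=hard_limit)
    except subprocess.TimeoutExpired:
        return -1
    return result.returncode
```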