Reliable network connectivity for apt-daily
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
systemd | New | Unknown | |
apt (Ubuntu) | Fix Released | High | Julian Andres Klode |
Bug Description
[Impact]
apt-daily.service is launched by a timer that depends on network-
At boot that is usually enough for the network to be online, but it does not work every time, and we may disagree with network-manager and friends about what "online" means.
At resume time, network-
[Proposed solution]
Introduce a new apt-helper wait-online command that waits for the machine to be online, using both the network-manager and systemd-networkd helpers. For each of the two services that is active, we run its online wait helper and wait for it to signal that the network is online. Once all helpers have reported online, we continue.
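A rough Python sketch of that logic, for illustration only. The real helper is part of apt and is not shown in this report. The helper commands, their flags, the 30 second timeout and the path to systemd-networkd-wait-online are all assumptions here, as are the names HELPERS, run and wait_online.

```python
import subprocess

# Assumed mapping from service unit to its wait-online command. The report
# does not say which commands or flags the real helper uses.
HELPERS = {
    "NetworkManager.service": ["nm-online", "-q", "--timeout", "30"],
    "systemd-networkd.service": [
        "/lib/systemd/systemd-networkd-wait-online", "--timeout=30"],
}


def run(cmd):
    """Run cmd and return its exit status, or 127 if it cannot be started."""
    try:
        return subprocess.call(cmd)
    except OSError:
        return 127


def wait_online(runner=run, helpers=HELPERS):
    """Run the wait helper of every active service; True if none failed."""
    ok = True
    for service, helper in helpers.items():
        # "systemctl -q is-active UNIT" exits 0 only when the unit is active.
        if runner(["systemctl", "-q", "is-active", service]) != 0:
            continue  # service not in use, nothing to wait for
        if runner(helper) != 0:
            ok = False  # helper timed out or failed
    return ok
```

The runner parameter exists only so the logic can be exercised without touching the real network services.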
[Original proposal, to be done later]
original plan:
It tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this:
while (time elapsed < TIMEOUT):
    for each entry:
        host = getaddrinfo()
        if host failed:
            continue
        fd = connect to it
        if fd is invalid:
            continue
        all fds += fd
    if poll(all fds, 100 ms timeout) finds a connected one:
        exit(0)
exit(42) # timeout
There are two things to consider:
* getaddrinfo() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere)
* If poll() finds no connected socket, it has likely slept for its full timeout already, so no extra sleep is needed.
I believe the timeout should be something like 30s.
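For illustration, the algorithm could be sketched in Python roughly as follows. This is not the apt implementation. It assumes a POSIX system, takes the hosts as (host, port) pairs instead of parsing sources.list, and uses select() in place of poll(). Unlike the pseudocode, it keeps one in-flight socket per entry and sleeps briefly after a refused connection so it does not spin. The name wait_for_connection and its parameters are made up for this example.

```python
import errno
import select
import socket
import time


def wait_for_connection(hosts, timeout=30.0, poll_interval=0.1):
    """Return 0 once any (host, port) accepts a TCP connection, else 42."""
    deadline = time.monotonic() + timeout
    pending = {}  # (host, port) -> socket with a connect in flight
    try:
        while time.monotonic() < deadline:
            for entry in hosts:
                if entry in pending:
                    continue
                try:
                    info = socket.getaddrinfo(
                        *entry, type=socket.SOCK_STREAM)[0]
                except OSError:
                    continue  # resolution can fail while the network is down
                sock = socket.socket(info[0], info[1], info[2])
                sock.setblocking(False)
                # connect_ex() returns an errno value instead of raising.
                if sock.connect_ex(info[4]) in (0, errno.EINPROGRESS):
                    pending[entry] = sock
                else:
                    sock.close()
            if not pending:
                time.sleep(poll_interval)
                continue
            # A socket becomes writable when its connect succeeds or fails.
            _, writable, _ = select.select(
                [], list(pending.values()), [], poll_interval)
            failed = False
            for entry, sock in list(pending.items()):
                if sock not in writable:
                    continue
                if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                    return 0  # connected
                sock.close()
                del pending[entry]  # retry this entry on the next round
                failed = True
            if failed:
                time.sleep(poll_interval)
        return 42  # timeout
    finally:
        for sock in pending.values():
            sock.close()
```

A caller would pass something like wait_for_connection([("archive.ubuntu.com", 80)]) and exit with the returned status.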
On the systemd service side, we add:
ExecStartPre=
RestartForceE
RestartSec=15m
To retry the service after 15 minutes.
[Test case]
* Start apt-daily.service after turning off network -> It should wait (in ExecStartPre)
* Turn on network -> apt-daily.service should start
[Regression potential]
There might be increased I/O activity after resume, if that did not work before. The helper is launched from an ExecStartPre= command, and its failures are ignored because of the "-" prefix. systemd automatically kills all ExecStartPre processes when the main ExecStartPre process exits, so there is no chance of ending up with a child process still running.
Changed in apt (Ubuntu):
assignee: nobody → Julian Andres Klode (juliank)
description: updated
Changed in apt (Ubuntu):
status: New → Triaged
importance: Undecided → High
description: updated
Changed in apt (Ubuntu):
status: In Progress → Fix Committed
Changed in systemd:
status: Unknown → New
This sort of depends on https://github.com/systemd/systemd/issues/2582, as apparently we can't restart oneshot units. In the meantime, maybe we could pull the service in on resume and give the helper a long timeout (an hour or so), with it retrying until the timeout expires.
Ideas welcome.