Activity log for bug #1699850

Date Who What changed Old value New value Message
2017-06-22 16:49:17 Julian Andres Klode bug added bug
2017-06-22 16:49:29 Julian Andres Klode apt (Ubuntu): assignee Julian Andres Klode (juliank)
2017-06-22 16:51:20 Julian Andres Klode description [Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT): for each entry: host = gethostbyname() if host failed: continue fd = connect to it if fd is invalid: continue all fds += fd if poll(all fds, 100 ms timeout) finds a connected one: exit(0) exit(42) # timeout There are two things to consider: * gethostbyname() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add: RestartForceExitStatus=42 RestartSec=15m To retry the service after 15 minutes. [Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. 
Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = gethostbyname()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * gethostbyname() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add: ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. [Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before.
2017-06-22 16:51:48 Julian Andres Klode apt (Ubuntu): status New Triaged
2017-06-22 16:52:02 Julian Andres Klode apt (Ubuntu): importance Undecided High
2017-06-22 16:52:11 Julian Andres Klode description [Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = gethostbyname()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * gethostbyname() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add: ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. [Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before. 
[Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for bug 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = gethostbyname()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * gethostbyname() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add:   ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. [Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before.
2017-06-22 17:01:29 Julian Andres Klode description [Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for bug 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = gethostbyname()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * gethostbyname() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add:   ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. [Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before. 
[Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for bug 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = getaddrinfo()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * getaddrinfo() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add:   ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. [Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before.
2017-07-11 16:25:15 Julian Andres Klode bug watch added https://github.com/systemd/systemd/issues/2582
2017-07-11 16:25:15 Julian Andres Klode bug task added systemd
2017-09-09 16:10:07 Julian Andres Klode description [Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for bug 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = getaddrinfo()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * getaddrinfo() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add:   ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. [Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before. 
[Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for bug 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that waits for the machine being online, using both network-manager and systemd-networkd helpers. If the service is active, we use the respective online wait helper to wait for it to signal onlineness. Once all helpers have reported onlineness, we continue. [Original proposal, to be done later] original plan: It tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = getaddrinfo()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * getaddrinfo() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add:   ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. 
[Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before. The helper is launched in an ExecStartPre unit and failures are marked as ignored by "-".
2017-09-09 16:10:49 Julian Andres Klode description [Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for bug 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that waits for the machine being online, using both network-manager and systemd-networkd helpers. If the service is active, we use the respective online wait helper to wait for it to signal onlineness. Once all helpers have reported onlineness, we continue. [Original proposal, to be done later] original plan: It tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = getaddrinfo()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * getaddrinfo() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. I believe the time out should be something like 30s. On the systemd service side, we add:   ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. 
[Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before. The helper is launched in an ExecStartPre unit and failures are marked as ignored by "-". [Impact] apt-daily.service is launched by a timer that depends on network-online.target (after the fixes for bug 1686470 are in everywhere) At boot that is mostly sufficient for it to have network online, but it does not seem to work all the time, and we might be disagreeing with network-manager and friends what online state means. At resume time, network-online.target is still active, so the service is started as soon as possible when it tries to catch up. Depending on the timing, the network connectivity might not be there yet, and it will fail and only retry 12 hours later. [Proposed solution] Introduce a new apt-helper wait-online that waits for the machine being online, using both network-manager and systemd-networkd helpers. If the service is active, we use the respective online wait helper to wait for it to signal onlineness. Once all helpers have reported onlineness, we continue. [Original proposal, to be done later] original plan: It tries to connect() to remote hosts specified in sources.list until one connection works or a TIMEOUT is reached. The proposed algorithm looks something like this: while (time elapsed < TIMEOUT):   for each entry:     host = getaddrinfo()     if host failed:       continue     fd = connect to it     if fd is invalid:       continue     all fds += fd     if poll(all fds, 100 ms timeout) finds a connected one:       exit(0) exit(42) # timeout There are two things to consider: * getaddrinfo() and connect() may fail if network is not up yet, so we need to retry (we might need to sleep somewhere) * If poll() fails, we likely sleep enough, so no extra sleep needed. 
I believe the time out should be something like 30s. On the systemd service side, we add:   ExecStartPre=/usr/lib/apt/apt-helper wait-online   RestartForceExitStatus=42   RestartSec=15m To retry the service after 15 minutes. [Test case] * Start apt-daily.service after turning off network -> It should wait (in ExecStartPre) * Turn on network -> apt-daily.service should start [Regression potential] There might be increased I/O activity after resume, if that did not work before. The helper is launched in an ExecStartPre unit and failures are marked as ignored by "-". systemd automatically kills all ExecStartPre processes when the main ExecStartPre process exits, so there is no chance of ending with some child process still running.
2017-09-09 16:21:03 Julian Andres Klode apt (Ubuntu): status Triaged In Progress
2017-09-09 20:08:31 Julian Andres Klode apt (Ubuntu): status In Progress Fix Committed
2017-09-10 14:54:47 Launchpad Janitor apt (Ubuntu): status Fix Committed Fix Released
2019-05-17 05:13:38 Bug Watch Updater systemd: status Unknown New
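
Note on the connect()/poll() loop quoted in the descriptions above: the pseudocode can be read as the following minimal Python sketch. It is illustrative only and is not the code that went into apt; the final description says the helper instead waits on the network-manager and systemd-networkd helpers, and the log lists the connect loop as "to be done later". The names wait_online, targets, poll_interval and EXIT_TIMEOUT are assumptions made for this sketch, and it takes (host, port) pairs directly instead of parsing sources.list. The 30 s timeout, the 100 ms poll and exit status 42 come from the proposal.

```python
import errno
import selectors
import socket
import time

EXIT_TIMEOUT = 42  # exit status the proposal pairs with RestartForceExitStatus=42


def wait_online(targets, timeout=30.0, poll_interval=0.1):
    """Return True once a TCP connection to any (host, port) in targets succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    selector = selectors.DefaultSelector()
    pending = []  # non-blocking sockets whose connect() is still in flight
    try:
        while time.monotonic() < deadline:
            for host, port in targets:
                try:
                    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
                except socket.gaierror:
                    continue  # resolver not usable yet; retry on the next pass
                family, socktype, proto, _, addr = infos[0]
                try:
                    sock = socket.socket(family, socktype, proto)
                except OSError:
                    continue  # e.g. out of file descriptors; retry later
                sock.setblocking(False)
                err = sock.connect_ex(addr)
                if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
                    sock.close()  # immediate failure, e.g. network unreachable
                    continue
                selector.register(sock, selectors.EVENT_WRITE)
                pending.append(sock)
                # A socket becomes writable when its connect() finishes, whether
                # it succeeded or failed; SO_ERROR tells the two cases apart.
                for key, _ in selector.select(poll_interval):
                    done = key.fileobj
                    if done.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) == 0:
                        return True
                    selector.unregister(done)
                    pending.remove(done)
                    done.close()
            # Covers the "we might need to sleep somewhere" point: without this,
            # a failing resolver would make the outer loop spin.
            time.sleep(poll_interval)
        return False
    finally:
        for sock in pending:
            sock.close()
        selector.close()
```

A caller could finish with sys.exit(0 if wait_online([("archive.ubuntu.com", 80)]) else EXIT_TIMEOUT), where the host and port are only an example. Like the pseudocode, the sketch opens a new socket per entry on every pass, so a real implementation would probably cap or reuse pending attempts.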