Installation blocks when the machine is behind a proxy server

Bug #1766542 reported by Robert Liu on 2018-04-24
This bug affects 1 person
Affects               Importance   Assigned to
OEM Priority Project  High         Unassigned
apt (Ubuntu)          Undecided    Unassigned
  Bionic              Undecided    Unassigned
ubiquity (Ubuntu)     High         Unassigned
  Bionic              High         Unassigned

Bug Description

When the machine is behind a proxy server, the installation blocks for several minutes while trying to retrieve the package lists. The timeouts are too long and make the user feel that the machine may have a problem.

The symptom is similar to bug #14599, but it seems the apt-setup module has been rewritten since then.

Another way to trigger this issue is to prevent the machine from reaching the Internet, for instance by configuring a wrong gateway.

Image: 16.04

Yuan-Chen Cheng (ycheng-twn) wrote :

Need test results on 18.04.

Changed in oem-priority:
status: New → Confirmed
importance: Undecided → High

Robert Liu (robertliu) wrote :

Tried the 18.04 daily build (http://cdimage.ubuntu.com/ubuntu/daily-live/20180424/bionic-desktop-amd64.iso), and it has the same issue.

Changed in ubiquity (Ubuntu):
importance: Undecided → High
tags: added: rls-bb-incoming

Steve Langasek (vorlon) wrote :

Robert, can you please provide complete steps to reproduce this bug? My understanding from discussion elsewhere is that this is specific to oem-config mode; is that correct?

If your network is incorrectly configured and you have a route to the Internet but traffic is dropped, it is expected that an initial connection to the apt mirror will have to hit a tcp connection timeout (by default, ~128 seconds); so that might qualify as "several minutes". What behavior are you seeing that is "too long", and how long do you expect this to take?

Changed in ubiquity (Ubuntu):
status: New → Incomplete

Julian Andres Klode (juliank) wrote :

I've been considering lowering the timeout in apt from 120s to something like 10-30s for a year or two, but there's been some concern about high-latency connections. I personally do not think that 120s is a sensible timeout for one round trip.

There are also several options to prevent this issue on the network side:

- do not provide DNS servers / resolve internet names
- do not drop packets, but reject them
- point a SRV record in your DNS server to a working HTTP host (either serving the archive or returning 404; as long as it connects, it should be fine)

We also had the same issue on some IPv6+IPv4 systems where the IPv6 is not reachable, but that's fixed in cosmic with happy eyeballs falling back to the next host within milliseconds (trying more and more hosts in parallel until one works).

Julian Andres Klode (juliank) wrote :

The default timeout now is:

- 120s * #ip-addresses before bionic
- 120s + 250ms * #ip-addresses for bionic

That's a substantial improvement. I still think we should lower the timeout from 120s to 20s for cosmic, bringing this down to 20s + 250ms * #ipaddr (if we estimate up to 4 ip addresses, it will succeed or fail within 21s).

I'd be open to backporting happy-eyeballs to xenial too, once it has spent some more time in bionic. The changes are local to a single file (methods/connect.cc), but they are quite big relative to that file, so we should be careful.

Julian Andres Klode (juliank) wrote :

OK, so archive.ubuntu.com has 4 AAAA and 4 A entries, meaning that

16.04 waits 16 minutes (8 * 2 minutes)
18.04 waits 2 minutes, 2 seconds (2 minutes + 8 * 250 milliseconds)

Now we can shorten that further to one of the following:

* 12 seconds (10 + 8 * 250 ms)
* 22 seconds (20 + 8 * 250 ms)
* 32 seconds (30 + 8 * 250 ms)

30 seconds seems like a good conservative choice with reasonable behavior.
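
The arithmetic in the comments above can be reproduced with a short sketch. The function name and its parameters are illustrative only and are not part of apt; the 250 ms stagger, the candidate timeouts, and the 8 addresses are the figures quoted in this thread.

```python
def worst_case_wait(num_addresses, connect_timeout_s, happy_eyeballs, stagger_s=0.25):
    """Upper bound (in seconds) on how long an unreachable host blocks.

    Hypothetical helper: it only encodes the two formulas quoted in the
    thread, not apt's actual connection logic.
    """
    if happy_eyeballs:
        # bionic and later: one timeout plus a short stagger per address
        return connect_timeout_s + stagger_s * num_addresses
    # before bionic: each address is tried in turn with the full timeout
    return connect_timeout_s * num_addresses


ADDRESSES = 8  # archive.ubuntu.com: 4 AAAA + 4 A records, as stated above

xenial = worst_case_wait(ADDRESSES, 120, happy_eyeballs=False)  # 960 s (16 minutes)
bionic = worst_case_wait(ADDRESSES, 120, happy_eyeballs=True)   # 122.0 s
candidates = {t: worst_case_wait(ADDRESSES, t, happy_eyeballs=True) for t in (10, 20, 30)}

print(xenial, bionic, candidates)
```

The three candidate timeouts come out at 12, 22, and 32 seconds, matching the list above.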

Robert Liu (robertliu) wrote :

Here is how I set up the environment:
1. Prepare a broadband gateway. This time I used LEDE in VirtualBox. I can upload the VM image if necessary.
2. Add a firewall rule on the gateway: all HTTP/HTTPS traffic from the LAN port is redirected to an IP address that is not used by any machine.
3. Install Ubuntu and check the behavior.

I can reproduce the issue with this environment, although I'm not sure it exactly matches the customer's environment.

@Steve, yes, we are using oem-config mode. Normal mode uses fewer archives and apt operations, so the blocking time is shorter. Please see my following comments, where I provide results for the official 16.04 and 18.04 images.

Robert Liu (robertliu) wrote :

Image: ubuntu-16.04.4-desktop-amd64.iso

Significant blockings -
"Retrieving file 1 of 93", 30 seconds
"Retrieving file 1 of 31", 3 minutes

Robert Liu (robertliu) wrote :

Image: ubuntu-18.04-desktop-amd64.iso

Significant blockings -
"Retrieving file 1 of 3", 90 seconds
"Retrieving file 1 of 1", 32 seconds

The behavior is different from 16.04.

Steve Langasek (vorlon) wrote :

> Significant blockings -
> "Retrieving file 1 of 3", 90 seconds
> "Retrieving file 1 of 1", 32 seconds

Does this mean that in 18.04, the full delay you see is 122 seconds?

Is that considered acceptable, or not?

Robert Liu (robertliu) wrote :

Hi @Steve,
IMO, it would be great if the total delay were less than 30 seconds.
In this case, the update operation always fails. Users should know that the Internet is not available before a proxy server is set. They may expect a minimal timeout, or to be able to pre-configure the proxy server before installation.

Julian Andres Klode (juliank) wrote :

I don't see how we can achieve an overall timeout. We only have per-connect() timeouts available, and we run multiple apt processes / fetches. We can limit the individual connect() timeout to 30s, but overall? I think we do one attempt for archive.ubuntu.com / mirror and another for security, so we should be looking at a one-minute timeout overall.

In fact, apt-setup already sets a timeout of 30 seconds for the updates. Now about the 16.04 log: I'm not sure about networking, is that an IPv6-only network? It times out for security.ubuntu.com after 3 minutes, which is expected for 6 addresses (and security.u.c resolves to 6 IPv4 and 6 IPv6 addresses).

With 18.04 apt, this should time out in roughly 32 seconds (6 , which seems to match the second 18.04 result. Not sure why it times out after 90 seconds on file 1 of 3, though. Do you have logs?

Robert Liu (robertliu) wrote :

@Julian,

Here is the archive of logs. Sorry I forgot to attach it before.

Julian Andres Klode (juliank) wrote :

It seems we are experiencing a bug in apt in bionic with regard to the 90s timeout. With happy eyeballs, we added a new place where connections can time out, and we did not mark these IP addresses as failed. Hence they were always retried, and thus you saw it attempting to connect 3 times to tw.archive.ubuntu.com.

With that fixed, we should be down to 30s instead of 90s.
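
A toy model may help show why failing to mark addresses as failed triples the wait. This is only an illustration of the bookkeeping described above, not apt's code in methods/connect.cc; the function, its parameters, and the placeholder address are assumptions, while the three attempts and the 30s timeout come from this thread.

```python
def seconds_until_giving_up(addresses, timeout_s, passes, mark_failed):
    """Total time spent on dead addresses over several connection passes.

    Hypothetical model: every attempt on an address runs into the full
    timeout. If failures are remembered, later passes skip the address.
    """
    failed = set()
    elapsed = 0
    for _ in range(passes):
        for address in addresses:
            if address in failed:
                continue  # known to be dead, do not retry
            elapsed += timeout_s
            if mark_failed:
                failed.add(address)
    return elapsed


dead_mirror = ["192.0.2.1"]  # placeholder address from the documentation range

buggy = seconds_until_giving_up(dead_mirror, 30, passes=3, mark_failed=False)
fixed = seconds_until_giving_up(dead_mirror, 30, passes=3, mark_failed=True)
print(buggy, fixed)  # 90 30
```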

That said, I cannot reproduce that as connect() fails for me with "No Route to Host" after 3 seconds, so I might have missed something.

Changed in apt (Ubuntu):
status: New → Triaged

Julian Andres Klode (juliank) wrote :

I don't think we can realistically go lower than 2x30s. I think the intention is to add the mirror and security separately and comment them out if they don't work or something, so they need to be in separate apt runs. Unless, of course, we write a tool in Python or something that does the update there, inspects failures, and checks which hosts errored out; then we can do it in one.
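
As a rough illustration of checking several hosts in one pass, the sketch below probes each endpoint with a plain TCP connect in parallel and reports the ones that failed. It is not the tool proposed above and does not run or inspect apt at all; the function names, the default port, and the thread-pool approach are assumptions made for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def is_reachable(host, port=80, timeout_s=30.0):
    """Return True if a TCP connection to host:port can be established.

    Note: socket.create_connection applies timeout_s to each resolved
    address in turn, so a host with several dead addresses can take longer.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers timeouts, refused connections, DNS failures
        return False


def unreachable_hosts(endpoints, timeout_s=30.0):
    """Probe all (host, port) pairs in parallel; return those that failed."""
    endpoints = list(endpoints)
    if not endpoints:
        return []
    with ThreadPoolExecutor(max_workers=len(endpoints)) as pool:
        results = list(
            pool.map(lambda ep: is_reachable(ep[0], ep[1], timeout_s), endpoints)
        )
    return [ep for ep, ok in zip(endpoints, results) if not ok]
```

Calling `unreachable_hosts([("archive.ubuntu.com", 80), ("security.ubuntu.com", 80)])` would probe both hosts at the same time, so the wait is bounded by the slowest single host rather than by the sum of both.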

Julian Andres Klode (juliank) wrote :

For the apt side, that's the first commit in

https://salsa.debian.org/apt-team/apt/merge_requests/18/

I'll also upload that to bionic eventually, but xenial would also need the happy eyeballs changes, which were a bit large (although isolated).

tags: added: id-5af9ea356db8cb2d4eb3d4e7
Changed in apt (Ubuntu):
status: Triaged → Fix Committed

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apt - 1.7.0~alpha0ubuntu2

---------------
apt (1.7.0~alpha0ubuntu2) cosmic; urgency=medium

  [ David Kalnischkies ]
  * Add boilerplate plural form to po/apt-all.pot
  * don't try SRV requests based on IP addresses
  * use 127.0.0.1 instead of localhost as default Tor proxy
  * Extend apt build-dep pkg/release to switch dep as needed
  * Support release selector for volatile files as well
  * Start pkg records for deb files with dpkg output
  * Deprectate buggy/incorrect Rls/PkgFile::IsOk methods
  * Support --with-source in show & search commands
  * Support local files as arguments in show command (Closes: 883206)
  * Drop alternative URIs we got a hash-based fail from
  * Handle by-hash URI construction more centrally
  * Don't force the same mirror for by-hash URIs
  * Reword error for timed out read/write on SOCKS proxy (Closes: #898886)
  * Don't show acquire warning for "hidden" components (Closes: #879591)
  * Use a steady clock source for progress reporting
  * Use steady clock source for bandwidth limitation

  [ Filipe Brandenburger ]
  * Update .gitignore
  * Increase debug verbosity in `apt-get autoremove`
  * Extend test-apt-get-autoremove to check debug output

  [ Julian Andres Klode ]
  * tests: Do not expect requested-by if sudo was invoked by root
  * Run tests on GitLab CI
  * Handle a missed case of timed out ip addresses (LP: #1766542)
  * Lower default timeout from 120s to 30s
  * apt-key: Pass all instead of gpg-agent to gpgconf --kill (LP: #1773992)
  * Fix lock counting in debSystem

  [ annadane ]
  * Add verb 'be' to NEWS entry for 1.5~beta1 (Closes: 892792)

  [ Алексей Шилин ]
  * Russian program translation update (Closes: 898797)

  [ Frans Spiesschaert ]
  * Dutch program translation update (Closes: #900589)
  * Dutch manpage translation update (Closes: #900602)

 -- Julian Andres Klode <email address hidden> Tue, 19 Jun 2018 17:12:51 +0200

Changed in apt (Ubuntu):
status: Fix Committed → Fix Released