System takes forever trying to contact uncontactable sources during install

Bug #556831 reported by Mario Limonciello
This bug affects 2 people
Affects               Status        Importance  Assigned to                 Milestone
OEM Priority Project  Fix Released  Critical    Canonical Foundations Team
apt (Ubuntu)          Fix Released  High        Unassigned
apt-setup (Ubuntu)    Triaged       Wishlist    Unassigned

Bug Description

Binary package hint: ubiquity

During an automated install, the system is connected to a network from which it cannot reach any network sources. This causes ubiquity to hang while it "generates" sources.list.

This is the preseed that was used for installation:
http://linux.dell.com/git/?p=ubuntu-fid.git;a=blob;f=framework/preseed/ubuntu.seed;h=418dbf477ba146c4236f75b49f064909b9c88067;hb=HEAD

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: ubiquity 2.2.14
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic i686
NonfreeKernelModules: wl nvidia
Architecture: i386
Date: Tue Apr 6 20:54:18 2010
DistributionChannelDescriptor:
 # This is a distribution channel descriptor
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-dell-lucid-une-20100406-0
InstallationMedia: Ubuntu 10.04 "Lucid" - Build i386 LIVE Binary 20100406-03:46
LiveMediaBuild: Ubuntu 10.04 "Lucid" - Build i386 LIVE Binary 20100406-03:46
ProcEnviron:
 PATH=(custom, no user)
 LANG=C
 SHELL=/bin/bash
SourcePackage: ubiquity

Mario Limonciello (superm1) wrote :
Changed in oem-priority:
assignee: nobody → Canonical Platform QA Team (canonical-platform-qa)
Robbie Williamson (robbiew) wrote :

Raised to "Critical" per info received from Dell.

Changed in oem-priority:
importance: Undecided → Critical
Changed in ubiquity (Ubuntu):
importance: Undecided → High
Changed in oem-priority:
status: New → Triaged
Changed in ubiquity (Ubuntu):
status: New → Triaged
Steve Beattie (sbeattie)
Changed in oem-priority:
assignee: Canonical Platform QA Team (canonical-platform-qa) → Canonical Foundations Team (canonical-foundations)
Evan (ev) wrote :

Can you be more specific about when ubiquity hangs while generating sources.list? Is this in ubiquity or oem-config? The presence of oem-config.log seems to suggest the latter. Is it just a long delay, or are you unable to get past this point at all?

Thanks Mario.

Jerone Young (jerone) wrote :

@Evan:

Posting this for Mario:

The hang happened in ubiquity (I saw it as well); he ran oem-config and then filed the bug. Just connect your machine to a router not connected to the net. You will then see the problem.

Jerone Young (jerone) wrote :

Business Justification:
#1 critical bug for Dell. This bug makes factory installs that usually take 15-20 minutes take over an hour. Dell cannot install Ubuntu in its factories with this bug.

Steve Langasek (vorlon) wrote :

> Just connect your machine to a router not connected to the net.
> You will then see the problem.

Does the router advertise a default route to the system being installed? Does the *router* have a default route?

ubiquity (or probably, apt) should certainly handle a "no route to host" (EHOSTUNREACH) error gracefully; if, however, the network is configured in such a way that it *claims* to have a public Internet route when it doesn't, ubiquity cannot usefully distinguish this case from the case where the network is just Very, Very Slow, and would normally fall back to the TCP protocol-level timeouts.
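The distinction Steve draws can be illustrated with a small Python sketch (not apt's code, which is written in C++): when the network says a host is unreachable, connect() fails quickly with an error such as EHOSTUNREACH or ENETUNREACH, whereas a default route that silently drops packets only fails once a timeout expires, which is what the "connect (110: Connection timed out)" errors in Mario's syslog show (110 is ETIMEDOUT on Linux). The helper names `classify_connect_error` and `try_connect`, and the 30-second default, are assumptions made for illustration.

```python
import errno
import socket

# Errors meaning "the network already told us this host is unreachable",
# so there is nothing to gain from waiting for a timeout.
FAST_FAIL_ERRNOS = {errno.EHOSTUNREACH, errno.ENETUNREACH, errno.ECONNREFUSED}


def classify_connect_error(exc: OSError) -> str:
    """Classify a connect() failure as 'timeout', 'fast-fail' or 'other'."""
    # socket.timeout is raised when our own timeout expires; on Python 3.10+
    # it is an alias of TimeoutError. ETIMEDOUT is the kernel's TCP timeout.
    if isinstance(exc, (socket.timeout, TimeoutError)) or exc.errno == errno.ETIMEDOUT:
        return "timeout"
    if exc.errno in FAST_FAIL_ERRNOS:
        return "fast-fail"
    return "other"


def try_connect(host: str, port: int = 80, timeout: float = 30.0) -> str:
    """Attempt a TCP connection and report 'connected' or the failure class."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except OSError as exc:
        return classify_connect_error(exc)
```

On a network like the one in this report, `try_connect("archive.canonical.com")` would be expected to return "timeout" only after the full wait, which is exactly the case ubiquity cannot distinguish from a very slow network.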

Colin Watson (cjwatson) wrote : Re: [Bug 556831] Re: System takes forever trying to contact uncontactable sources during install

Except that apt is meant to apply that timeout only once per host, and
thereafter assume that the host is bad and give up. This appears to
have regressed from 8.04.
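The behaviour Colin describes, paying the connect timeout once per host and then giving up on that host immediately, amounts to a per-host failure cache. A minimal Python sketch, assuming a hypothetical `connect` callable that raises OSError on failure; `HostFailureCache` and `run_update` are illustrative names, not apt's internals.

```python
from typing import Callable, Dict, List, Tuple


class HostFailureCache:
    """Remember hosts whose connection attempt failed and skip them afterwards."""

    def __init__(self, connect: Callable[[str], None]) -> None:
        self._connect = connect            # hypothetical: raises OSError on failure
        self._failed: Dict[str, str] = {}  # host -> first error message seen

    def fetch(self, host: str, item: str) -> Tuple[str, str]:
        """Return ('Err', reason) or ('Hit', item) for one index file."""
        if host in self._failed:
            # Fail immediately instead of waiting for another timeout.
            return ("Err", f"{item}: host previously failed ({self._failed[host]})")
        try:
            self._connect(host)
        except OSError as exc:
            self._failed[host] = str(exc)
            return ("Err", f"{item}: {exc}")
        return ("Hit", item)


def run_update(cache: HostFailureCache,
               items: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Fetch every (host, item) pair in order, as an 'apt-get update' run would."""
    return [cache.fetch(host, item) for host, item in items]
```

With such a cache, the number of timeouts paid is bounded by the number of distinct hosts rather than by the number of index files.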

Mario Limonciello (superm1) wrote :

If there isn't a good way to be able to gracefully work from this scenario (in the past I think the timeouts were just a lot lower), then it would be preferable to support an optional preseed template to tell it to not try to verify any sources written to sources.list.

From a machine on the same network as this occurs:

$ tracepath archive.ubuntu.com -n
 1: 10.9.160.254 0.153ms pmtu 1500
 1: 10.9.160.1 0.736ms
 2: 10.9.29.2 0.601ms
 3: 10.254.251.246 asymm 2 0.738ms
 4: no reply
 5: no reply
 6: no reply
 7: no reply
 8: no reply
 9: no reply
10: no reply
11: no reply
12: no reply
13: no reply
14: no reply
15: no reply
16: no reply
17: no reply
18: no reply
19: no reply
20: no reply
21: no reply
22: no reply
23: no reply
24: no reply
25: no reply
26: no reply
27: no reply
28: no reply
29: no reply
30: no reply
31: no reply
     Too many hops: pmtu 1500
     Resume: pmtu 1500
$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.30) 56(84) bytes of data.

--- archive.ubuntu.com ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 5009ms

$ /sbin/route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.9.160.0      0.0.0.0         255.255.248.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
0.0.0.0         10.9.160.1      0.0.0.0         UG    0      0        0 eth0
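Steve asked whether the installed system is handed a default route; the UG entry in the table above (destination 0.0.0.0 via 10.9.160.1) shows that it is. The same check can be scripted against /proc/net/route on Linux, which stores addresses as little-endian hexadecimal. This is an illustrative sketch only; `default_gateways` is a hypothetical helper, not part of ubiquity or apt.

```python
import socket
import struct
from typing import List, Tuple

RTF_UP = 0x0001       # route is usable
RTF_GATEWAY = 0x0002  # destination is reached through a gateway


def default_gateways(route_text: str) -> List[Tuple[str, str]]:
    """Return (interface, gateway) pairs for default routes in /proc/net/route text."""
    result = []
    for line in route_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest_hex, gw_hex, flags_hex = fields[:4]
        flags = int(flags_hex, 16)
        if dest_hex == "00000000" and flags & RTF_UP and flags & RTF_GATEWAY:
            # Addresses are 32-bit values printed in little-endian byte order.
            gateway = socket.inet_ntoa(struct.pack("<I", int(gw_hex, 16)))
            result.append((iface, gateway))
    return result
```

On a live system this would be called as `default_gateways(open("/proc/net/route").read())`; for the routing table above it should report `[("eth0", "10.9.160.1")]`.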

Colin Watson (cjwatson) wrote :

On Mon, Apr 12, 2010 at 11:25:29PM -0000, Mario Limonciello wrote:
> If there isn't a good way to be able to gracefully work from this
> scenario (in the past I think the timeouts were just a lot lower),

The timeouts were raised recently to address a different problem:

apt-setup (1:0.42ubuntu1) lucid; urgency=low
[...]
  * Increase the mirror verification timeout to 30 seconds, now that this
    operation is cancellable. Some people (hi, Ara) seem to have trouble
    getting mirror responses in 10 seconds.
[...]
 -- Colin Watson <email address hidden> Mon, 09 Nov 2009 11:46:22 +0000

However, if this was the only relevant change, then you'd still have
been seeing a delay on the order of 20 minutes, and I'm guessing you
wouldn't have been too happy about that either! Thus, I don't think
mere timeout changes are responsible for this bug.

> then it would be preferable to support an optional preseed template to
> tell it to not try to verify any sources written to sources.list.

We already don't verify any sources by default. The 'apt-get update'
run is not for verification, but to make it possible to install packages
from those sources.

While it's not out of the question to come up with some kind of
workaround, this is going to affect many more people than just Dell,
many of whom aren't going to be familiar with preseeding, and so I'd
much rather we made it behave more sensibly by default - particularly
since we did that for 8.04 LTS and it seems to have regressed.

Steve Langasek (vorlon) wrote :

On Mon, Apr 12, 2010 at 11:24:56PM -0000, Colin Watson wrote:
> Except that apt is meant to apply that timeout only once per host, and
> thereafter assume that the host is bad and give up. This appears to
> have regressed from 8.04.

Right, that's the sensible thing to do and we should certainly get this
fixed - but in the *best* case, you're still waiting for 30 seconds or so
for a connection that's going to fail, right? Being able to avoid having to
wait for *any* timeout, either with a preseed like Mario suggests or by
having the network configured accurately, would seem to shave ~3% off the
*pre-regression* install time, which I guess is nothing to sneeze at when
multiplied by N machines.
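Steve's ~3% figure follows from the 30-second timeout and the 15-20 minute factory install time Jerone quoted:

```latex
\frac{30\ \text{s}}{20 \times 60\ \text{s}} = 2.5\,\%,
\qquad
\frac{30\ \text{s}}{15 \times 60\ \text{s}} \approx 3.3\,\%
```

Across N machines, that is roughly 30N seconds of avoidable waiting.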

Mario, any chance that a change to the DHCP server (i.e., not handing out a
default route) would be viable, and give you a short-term workaround while
this is being sorted?

Mario Limonciello (superm1) wrote : Re: [Bug 556831] Re: System takes forever trying to contact uncontactable sources during install

Hi Steve:

On Mon, Apr 12, 2010 at 19:10, Steve Langasek
<email address hidden> wrote:

> Right, that's the sensible thing to do and we should certainly get this
> fixed - but in the *best* case, you're still waiting for 30 seconds or so
> for a connection that's going to fail, right? Being able to avoid having to
> wait for *any* timeout, either with a preseed like Mario suggests or by
> having the network configured accurately, would seem to shave ~3% off the
> *pre-regression* install time, which I guess is nothing to sneeze at when
> multiplied by N machines.

Yes, I think it would be a good idea both to fix the original bug that
caused this increase in install time (continually retrying the same failing
server) and to add a workaround preseed that skips apt-get update entirely
when you know it's going to fail.

> Mario, any chance that a change to the DHCP server (i.e., not handing out a
> default route) would be viable, and give you a short-term workaround while
> this is being sorted?
>

I wish this were doable, but these servers are outside my group's control.
As a short-term workaround, I suppose we could drop the non-functional route
in an early script, but doing so may have implications for parts of the
post-install process that use that route to communicate with process servers.

Evan (ev) wrote :

I've so far been unable to reproduce this. I tested it by starting a live CD in KVM, installing an instrumented copy of apt, disconnecting the host from the network, and then running apt-get update. The first hit to archive.ubuntu.com times out; subsequent attempts fail immediately, as expected.

Mario Limonciello (superm1) wrote :

@Evan:

I believe the important part of the scenario is that the hostnames still resolve to IP addresses. The log shows more specifically how much time passes between successive files it tries to fetch:

Apr 7 05:38:33 ubuntu in-target: Err http://archive.canonical.com lucid Release.gpg
Apr 7 05:38:33 ubuntu in-target: Could not connect to archive.canonical.com:80 (91.189.88.33). - connect (110: Connection timed out)
Apr 7 05:38:54 ubuntu in-target: Ign http://archive.canonical.com lucid Release
Apr 7 05:39:15 ubuntu in-target: Ign http://archive.canonical.com lucid/partner Packages
Apr 7 05:39:36 ubuntu in-target: Ign http://archive.canonical.com lucid/partner Sources
Apr 7 05:39:36 ubuntu in-target: Err http://dz.archive.ubuntu.com lucid Release.gpg
Apr 7 05:39:36 ubuntu in-target: Could not connect to dz.archive.ubuntu.com:80 (91.189.88.46). - connect (110: Connection timed out) [IP: 91.189.88.46 80]
Apr 7 05:39:57 ubuntu in-target: Ign http://archive.canonical.com lucid/partner Packages
Apr 7 05:40:18 ubuntu in-target: Ign http://archive.canonical.com lucid/partner Sources
Apr 7 05:40:39 ubuntu in-target: Err http://archive.canonical.com lucid/partner Packages
Apr 7 05:40:39 ubuntu in-target: Could not connect to archive.canonical.com:80 (91.189.88.33). - connect (110: Connection timed out)
Apr 7 05:41:00 ubuntu in-target: Err http://archive.canonical.com lucid/partner Sources
Apr 7 05:41:00 ubuntu in-target: Could not connect to archive.canonical.com:80 (91.189.88.33). - connect (110: Connection timed out)
Apr 7 05:41:00 ubuntu in-target: Err http://dz.archive.ubuntu.com lucid-updates Release.gpg
Apr 7 05:41:00 ubuntu in-target: Could not connect to dz.archive.ubuntu.com:80 (91.189.88.46). - connect (110: Connection timed out) [IP: 91.189.88.46 80]
Apr 7 05:42:24 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid Release
Apr 7 05:43:48 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid-updates Release
Apr 7 05:45:12 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/main Packages
Apr 7 05:46:36 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/restricted Packages
Apr 7 05:48:00 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/main Sources
Apr 7 05:49:24 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/restricted Sources
Apr 7 05:50:48 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/universe Packages
Apr 7 05:52:12 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/universe Sources
Apr 7 05:53:36 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/multiverse Packages
Apr 7 05:55:00 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid/multiverse Sources
Apr 7 05:56:24 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid-updates/main Packages
Apr 7 05:57:48 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid-updates/restricted Packages
Apr 7 05:59:12 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid-updates/main Sources
Apr 7 06:00:36 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid-updates/restricted Sources
Apr 7 06:02:00 ubuntu in-target: Ign http://dz.archive.ubuntu.com lucid-updates/universe Packages
Apr 7 06:03:24 u...

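The spacing Mario points to can be measured per host from these syslog lines. Grouping by host matters because the log interleaves lines from archive.canonical.com and dz.archive.ubuntu.com. A minimal Python sketch; the function name `per_host_gaps` and the regular expression are assumptions fitted to the format of this excerpt.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the "Err"/"Ign" status lines; "Could not connect" continuation
# lines do not match and are skipped.
STATUS_RE = re.compile(
    r"^\w{3}\s+\d+ (\d{2}):(\d{2}):(\d{2}) \S+ in-target: (?:Err|Ign) http://([^/\s]+)"
)


def per_host_gaps(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Seconds between consecutive status lines for the same host.

    Assumes all lines fall on the same day, as in the excerpt.
    """
    last_seen: Dict[str, int] = {}
    gaps: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        match = STATUS_RE.match(line)
        if match is None:
            continue
        hours, minutes, seconds, host = match.groups()
        t = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if host in last_seen:
            gaps[host].append(t - last_seen[host])
        last_seen[host] = t
    return dict(gaps)
```

Applied to the complete lines of the excerpt, every archive.canonical.com gap is 21 s and every dz.archive.ubuntu.com gap is 84 s, so each index file pays a full wait instead of failing immediately after the host's first error.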

Evan (ev) wrote :

I've sent the following patch to mvo for review:
http://paste.ubuntu.com/414324/

Changed in apt (Ubuntu):
importance: Undecided → High
status: New → Triaged
Changed in ubiquity (Ubuntu):
status: Triaged → Invalid
Mario Limonciello (superm1) wrote :

@Evan:
I built an apt package locally with that patch and swapped it in during the install to check the improvement. It appears to solve the immediate issue at hand.

Per Steve's comment, though, if it's known that contacting the servers will fail, I think it would still be a good idea to support a preseed that skips the apt-get update step. That is still a whole minute lost on every system installed (30 s per server: archive.canonical.com and $CC.archive.ubuntu.com). I'll attach an updated syslog with /usr/lib/apt/methods/http patched.

Michael Vogt (mvo) wrote :

Thanks a lot for the fix, Evan. I merged it into my branch and it will be part of the next upload.

Changed in apt (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apt - 0.7.25.3ubuntu7

---------------
apt (0.7.25.3ubuntu7) lucid; urgency=low

  Cherry pick fixes from the lp:~mvo/apt/mvo branch:

  [ Evan Dandrea ]
  * Remember hosts with general failures for
    https://wiki.ubuntu.com/NetworklessInstallationFixes (LP: #556831).

  [ Michael Vogt ]
  * improve debug output for Debug::pkgPackageManager
 -- Michael Vogt <email address hidden> Wed, 14 Apr 2010 20:30:03 +0200

Changed in apt (Ubuntu):
status: Fix Committed → Fix Released
Colin Watson (cjwatson) wrote :

Reopening an apt-setup task for Mario's suggestion.

affects: ubiquity (Ubuntu) → apt-setup (Ubuntu)
Changed in apt-setup (Ubuntu):
importance: High → Wishlist
status: Invalid → Triaged
Jerone Young (jerone)
Changed in oem-priority:
status: Triaged → Fix Released