drop /etc/apt/apt.conf.d/90cloud-init-pipelining in 16.04+

Bug #1794982 reported by Julian Andres Klode
This bug affects 1 person
Affects: cloud-init
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

/etc/apt/apt.conf.d/90cloud-init-pipelining disables pipelining, which causes a significant performance reduction in apt downloads. This should not be necessary in 16.04 and later, as apt can detect broken pipeline responses, fix them up, and disable pipelining for the next connection (it can also match each response to its request based on the hashes, rather than just complaining that the hashes are wrong).
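
To check whether a given apt.conf.d fragment turns pipelining off, something like the following minimal Python sketch could be used. The option name Acquire::http::Pipeline-Depth is the one apt itself names in its warning further down in this report; the sample file contents and the helper names (pipeline_depth, pipelining_disabled) are assumptions for illustration, and the regex only handles the flat "Option "value";" form, not apt's nested scope syntax.

```python
import re

# Matches a flat-form apt.conf assignment such as:
#   Acquire::http::Pipeline-Depth "0";
# Lines starting with "//" are comments and are not matched.
PIPELINE_RE = re.compile(
    r'^\s*Acquire::http::Pipeline-Depth\s+"(\d+)"\s*;', re.MULTILINE
)


def pipeline_depth(conf_text):
    """Return the last Pipeline-Depth value set in conf_text, or None if unset."""
    matches = PIPELINE_RE.findall(conf_text)
    return int(matches[-1]) if matches else None


def pipelining_disabled(conf_text):
    """True if the fragment explicitly sets the pipeline depth to 0."""
    return pipeline_depth(conf_text) == 0


# Assumed contents of 90cloud-init-pipelining, for illustration only.
SAMPLE = '''//Written by cloud-init per 'apt_pipelining'
Acquire::http::Pipeline-Depth "0";
'''

print(pipelining_disabled(SAMPLE))
```

On a real system the text would come from reading the file under /etc/apt/apt.conf.d/ rather than from the SAMPLE string.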

As a small sample of the performance decrease, here is a download of firefox in a fresh LXD container:

without pipelining: Fetched 81.1 MB in 6s (13.2 MB/s)
with pipelining: Fetched 81.1 MB in 2s (32.2 MB/s)

(400 Mbit/s connection, 25-30ms RTT, xenial LXD container)

Related bugs:
 * bug 948461: apt-get hashsum/size mismatch because s3 mirrors don't support http pipelining correctly

description: updated
Scott Moser (smoser)
description: updated
Scott Moser (smoser) wrote :

@Julian,

bug 948461 is what brought this in.

Some important information to be aware of in that regard.
a.) The issue occurred on S3 backed mirrors. S3 had (has?) a bug in their pipelining implementation. (bug 948461 comment 21)
b.) I believe (confirmation requested) that Canonical no longer runs mirrors in S3 but rather uses http proxy mirrors inside AWS.
c.) The user has no control over whether a proxy in their path is broken with respect to pipelining. Thus it was seen as safer to disable it.
d.) Other, non-S3 users have seen such problems (bug 948461 comment 36).

The result of all of the above is that merely stating "this is fixed and is faster if the workaround is backed out" is not sufficient. At the very least we need some evidence that it is no longer a problem with S3-backed mirrors. Ideally we have some way of recreating the broken state and showing that the workaround is no longer necessary.

We have to do due diligence before an SRU, otherwise we risk regressing users in a way that is non-trivial to fix (since package updates are potentially broken). So, I'd really like to see demonstrated:
 1.) old apt without workaround recreating the problem
 2.) old apt with workaround works
 3.) new apt without workaround works

Scott Moser (smoser) wrote :

Is that something you can do?

Changed in cloud-init:
importance: Undecided → Low
status: New → Incomplete
Robert C Jennings (rcj) wrote :

I can confirm that for AWS the S3-based mirrors are no longer in place. You are correct that a caching proxy is used instead.

Julian Andres Klode (juliank) wrote :

I don't have a broken mirror or proxy. We have tests in apt >= 1.2 (or 1.1) that check pipeline fixup. The test case we use there is packages a, b, c, d. Then we map the responses like this:

GET a responds with d
GET b responds with c
GET c responds with b
GET d responds with a

There's also 0, which succeeds.

What APT then does is the following: It transparently detects which response belongs to which request using the hashes it has calculated. The output gets weird though, as we abort the pipeline after the first failure, close the connection, and disable pipelining for it.
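
The hash-based matching described above can be illustrated with a minimal Python sketch. This is not apt's actual implementation: the function names, the package contents, and the choice of SHA-256 are assumptions for illustration, and the sketch does not model aborting the pipeline, closing the connection, or retrying without pipelining. It only shows how a response that arrived for the wrong request can still be attributed to the right one by its hash.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def match_responses(expected, responses):
    """Pair each response body with the request whose expected hash it matches.

    expected:  dict of request name -> expected SHA-256 hex digest
    responses: list of (request name the reply arrived for, body bytes)
    Returns (matched, mismatch_seen), where matched maps request name -> body
    and mismatch_seen is True if any body arrived for the wrong request.
    """
    by_hash = {digest: name for name, digest in expected.items()}
    matched = {}
    mismatch_seen = False
    for arrived_for, body in responses:
        owner = by_hash.get(sha256(body))
        if owner is None:
            continue  # body matches nothing that was requested
        if owner != arrived_for:
            mismatch_seen = True  # a real client would stop pipelining here
        matched[owner] = body
    return matched, mismatch_seen


# Hypothetical package contents; the swap mirrors the test case above
# (a<->d, b<->c, and pkg0 answered correctly).
names = ("pkg0", "pkga", "pkgb", "pkgc", "pkgd")
bodies = {name: f"contents of {name}".encode() for name in names}
expected = {name: sha256(body) for name, body in bodies.items()}
swap = {"pkg0": "pkg0", "pkga": "pkgd", "pkgb": "pkgc",
        "pkgc": "pkgb", "pkgd": "pkga"}
responses = [(name, bodies[swap[name]]) for name in names]

matched, mismatch_seen = match_responses(expected, responses)
print(sorted(matched), mismatch_seen)
```

Every package ends up attributed to the correct request even though four of the five replies arrived out of place, and the mismatch flag is the point at which pipelining would be switched off.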

So, in trusty, you get:

root@t:~# apt-get download pkg0 pkga pkgb pkgc pkgd
Get:1 http://10.33.102.1:8080/ stable/main pkg0 all 1.0 [20.7 kB]
Get:2 http://10.33.102.1:8080/ stable/main pkga all 1.0 [20.7 kB]
Get:3 http://10.33.102.1:8080/ stable/main pkgb all 1.0 [20.7 kB]
Get:4 http://10.33.102.1:8080/ stable/main pkgc all 1.0 [20.7 kB]
Get:5 http://10.33.102.1:8080/ stable/main pkgd all 1.0 [20.7 kB]
Fetched 103 kB in 0s (2050 kB/s)
E: Failed to fetch http://10.33.102.1:8080/pool/pkga_1.0_all.deb Hash Sum mismatch
E: Failed to fetch http://10.33.102.1:8080/pool/pkgb_1.0_all.deb Hash Sum mismatch
E: Failed to fetch http://10.33.102.1:8080/pool/pkgc_1.0_all.deb Hash Sum mismatch
E: Failed to fetch http://10.33.102.1:8080/pool/pkgd_1.0_all.deb Hash Sum mismatch

But in xenial:
# apt-get download pkg0 pkga pkgb pkgc pkgd
Get:1 http://10.33.102.1:8080 stable/main all pkg0 all 1.0 [20.7 kB]
Get:2 http://10.33.102.1:8080 stable/main all pkga all 1.0 [20.7 kB]
Get:2 http://10.33.102.1:8080 stable/main all pkga all 1.0 [20.7 kB]
Get:2 http://10.33.102.1:8080 stable/main all pkga all 1.0 [20.7 kB]
Get:2 http://10.33.102.1:8080 stable/main all pkga all 1.0 [20.7 kB]
Fetched 103 kB in 0s (0 B/s)
W: Can't drop privileges for downloading as file '/root/pkg0_1.0_all.deb' couldn't be accessed by user '_apt'. - pkgAcquire::Run (13: Permission denied)
W: http://10.33.102.1:8080/pool/pkga_1.0_all.deb: Automatically disabled Acquire::http::Pipeline-Depth due to incorrect response from server/proxy. (man 5 apt.conf)

Arguably the output is confusing: it looks like it downloaded pkga 4 times, but it actually downloaded all packages correctly.

If someone has a mirror on EC2 to test with, I'd be happy to do some more testing w/ trusty vs. xenial.

Failing that, I think a good start would be dropping the file in cosmic only and seeing how it goes.

Julian Andres Klode (juliank) wrote :

We really should have had like a host-scoped option for that.

Launchpad Janitor (janitor) wrote :

[Expired for cloud-init because there has been no activity for 60 days.]

Changed in cloud-init:
status: Incomplete → Expired
Changed in cloud-init:
status: Expired → New
Julian Andres Klode (juliank) wrote :

Can we get this in for 19.04? This should be relatively low-risk to roll out there, and if we get regression reports, we can roll it back later.

Dan Watkins (oddbloke)
Changed in cloud-init:
status: New → Triaged
importance: Low → Medium
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit f2f530e5 to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=f2f530e5

Changed in cloud-init:
status: Triaged → Fix Committed
Chad Smith (chad.smith) wrote : Fixed in cloud-init version 19.1.

This bug is believed to be fixed in cloud-init version 19.1. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released