High CPU usage on download (unoptimised SHA256, SHA512, SHA1)

Bug #1123272 reported by Oleksij Rempel
This bug affects 4 people
Affects: apt (Ubuntu)
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

The http process takes about 20% CPU on an Intel i5-3317. Here is the output of ps aux:
root 4309 21.7 0.0 37108 2344 pts/2 S+ 18:17 0:01 /usr/lib/apt/methods/http

It can be easily reproduced with "apt-get download firefox-dbg" or any other upgrade-related command.

The overhead comes from these functions:
22,34% http libapt-pkg.so.4.12.0 [.] SHA256_Transform(_SHA256_CTX*, unsigned int const*)
14,81% http libapt-pkg.so.4.12.0 [.] SHA512_Transform(_SHA512_CTX*, unsigned long const*)
8,11% http libapt-pkg.so.4.12.0 [.] SHA1Transform(unsigned int*, unsigned char const*)
5,62% http libapt-pkg.so.4.12.0 [.] MD5Transform(unsigned int*, unsigned int const*)

libapt uses its own implementations of these hashes, without any optimisation. Is there any reason why libapt does not use OpenSSL? Besides, the current version of OpenSSL has AVX-optimised SHA* implementations.
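
For reference, here is a minimal sketch of what the proposed OpenSSL route looks like: hashing stdin through the EVP interface, which dispatches to the fastest implementation OpenSSL provides for the CPU. This is an illustration only, not apt code, and assumes OpenSSL 1.1+ naming (older releases use EVP_MD_CTX_create/destroy):

#include <openssl/evp.h>
#include <unistd.h>
#include <cstdio>

int main(void)
{
    // EVP picks the optimised (e.g. AVX) SHA-256 routine at runtime.
    EVP_MD_CTX *ctx = EVP_MD_CTX_new();
    EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);

    unsigned char buf[64 * 1024];
    ssize_t n;
    while ((n = read(0, buf, sizeof(buf))) > 0)  // hash stdin block by block
        EVP_DigestUpdate(ctx, buf, (size_t)n);

    unsigned char md[EVP_MAX_MD_SIZE];
    unsigned int len = 0;
    EVP_DigestFinal_ex(ctx, md, &len);
    EVP_MD_CTX_free(ctx);

    for (unsigned int i = 0; i < len; i++)
        printf("%02x", md[i]);
    printf("\n");
    return 0;
}

compiled with g++ -O2 and linked with -lcrypto.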

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: apt 0.9.7.7ubuntu1
ProcVersionSignature: Ubuntu 3.8.0-4.9-generic 3.8.0-rc6
Uname: Linux 3.8.0-4-generic x86_64
ApportVersion: 2.8-0ubuntu4
Architecture: amd64
Date: Tue Feb 12 18:12:33 2013
InstallationDate: Installed on 2012-09-13 (152 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha amd64 (20120913.1)
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: apt
UpgradeStatus: Upgraded to raring on 2013-02-06 (5 days ago)

dino99 (9d9) wrote:

I also get this issue on Vivid i386, mainly when the index is rebuilt using Synaptic.

tags: added: vivid
Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apt (Ubuntu):
status: New → Confirmed
Changed in apt (Ubuntu):
importance: Undecided → Medium
Travis Johnson (conslo) wrote:

I believe this is still present in 16.04. Specifically, I become CPU-bottlenecked (100% on one core by /usr/lib/apt/methods/http) when downloading packages within AWS EC2 instances (which have better bandwidth to the download servers). Besides locking up a core, this makes several processes take noticeably longer than they should.

dino99 (9d9)
tags: added: xenial
removed: raring vivid
Julian Andres Klode (juliank) wrote:

The hash algorithms should be fast enough these days in software to handle usual network bandwidths. If you combine extremely high speeds with extremely slow CPUs, that's of course suboptimal.

We can consider using libnettle at some point; I assume this might give a nice speedup, but it is not planned for the immediate future as it requires an ABI break.
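
For what it's worth, nettle's incremental API has the same init/update/digest shape as apt's internal hashing, so a switch would be mostly mechanical. A minimal sketch (an illustration, not apt code) of SHA-256 over stdin:

#include <nettle/sha2.h>
#include <unistd.h>
#include <cstdio>

int main(void)
{
    struct sha256_ctx ctx;
    sha256_init(&ctx);

    uint8_t buf[64 * 1024];
    ssize_t n;
    while ((n = read(0, buf, sizeof(buf))) > 0)
        sha256_update(&ctx, (size_t)n, buf);  // note: nettle passes the length first

    uint8_t digest[SHA256_DIGEST_SIZE];
    sha256_digest(&ctx, SHA256_DIGEST_SIZE, digest);

    for (size_t i = 0; i < SHA256_DIGEST_SIZE; i++)
        printf("%02x", digest[i]);
    printf("\n");
    return 0;
}

linked with -lnettle.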

Changed in apt (Ubuntu):
importance: Medium → Wishlist
Oleksij Rempel (olerem) wrote:

Well, the definition of "extremely" is weird. In this bug report we have an i5 at 20%!
In my case it was an Intel Atom at 100% CPU usage on a DSL 2000 connection!

The Atom is slow, but it is not the slowest CPU in use.

Travis Johnson (conslo) wrote:

I am not satisfied with this response: I experience CPU bottlenecking on EC2 server instances. Gigabit networking with sub-3 GHz CPUs is incredibly commonplace (read: the entire server ecosystem), so I'm not sure what you mean by "usual" network bandwidths.

Julian Andres Klode (juliank) wrote:

Results from my five-year-old laptop (X230, Core i5-3320M at 2.6 GHz), for a 1 GB file:

MD5+SHA1+SHA256: 11s => 90 MB/s => 727 Mbit/s
MD5+SHA1+SHA256+SHA512: 15s => 67 MB/s => 533 Mbit/s

That seems like an acceptable baseline performance. You are unlikely to download gigabytes' worth of packages, so you are looking at less than 10 seconds of CPU use at most (more likely 100-200 MB, so a 2-3 second CPU burst). I think 2-3 seconds is completely acceptable.

With nettle, which is hardware-accelerated, we achieve:

MD5+SHA1+SHA256: 9s => 111 MB/s => 888 Mbit/s
MD5+SHA1+SHA256+SHA512: 12.8s => 78 MB/s => 625 Mbit/s

(Tests might be skewed a bit in favor of apt, as it avoids duplicate reads: I had to run the nettle-hash invocations one after another over the whole file, rather than feeding each block to all hashes in one pass; that read overhead is about 0.2/0.4 s.)
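
The "per block" variant that the nettle-hash command line cannot express, and which apt effectively does internally, looks roughly like this (a sketch against the nettle 3 headers; digests are discarded since only the timing matters here):

#include <nettle/md5.h>
#include <nettle/sha1.h>
#include <nettle/sha2.h>
#include <unistd.h>

int main(void)
{
    struct md5_ctx md5;       md5_init(&md5);
    struct sha1_ctx sha1;     sha1_init(&sha1);
    struct sha256_ctx sha256; sha256_init(&sha256);

    uint8_t buf[64 * 1024];
    ssize_t n;
    // Read the input once; every block feeds all three digest contexts.
    while ((n = read(0, buf, sizeof(buf))) > 0) {
        md5_update(&md5, (size_t)n, buf);
        sha1_update(&sha1, (size_t)n, buf);
        sha256_update(&sha256, (size_t)n, buf);
    }

    uint8_t d[SHA256_DIGEST_SIZE];  // large enough for all three digests
    md5_digest(&md5, MD5_DIGEST_SIZE, d);
    sha1_digest(&sha1, SHA1_DIGEST_SIZE, d);
    sha256_digest(&sha256, SHA256_DIGEST_SIZE, d);
    return 0;
}

This reads the file once, like apt does, instead of once per hash like the chained nettle-hash runs.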

The improvement does not seem huge.

Obviously, if you find faster implementations to contribute to nettle, things might improve further.

Julian Andres Klode (juliank) wrote:

I think my laptop is too old to support AVX, though. But I'm not sure if libnettle supports that anyway, and libnettle would be our choice.

Julian Andres Klode (juliank) wrote:

We can't use OpenSSL for legal reasons, and we also don't want it in libapt-pkg for technical reasons anyway.

Julian Andres Klode (juliank) wrote:

New results (user time only, removing read() overhead; the first value of each pair is MD5+SHA1+SHA256, the second adds SHA512):

OpenSSL: 6.8 s or 9.8 s = 147 or 102 MB/s = 1176 or 816 Mbit/s
Nettle: 8.8 s or 11.8 s = 113 or 85 MB/s = 904 or 680 Mbit/s
APT: 10.9 s or 15 s = 92 or 67 MB/s = 736 or 536 Mbit/s

As we can see, the improvement is not that large, and libnettle still has some way to go to reach OpenSSL's performance.

Julian Andres Klode (juliank) wrote:

Tests run (shown for MD5+SHA1+SHA256; add SHA512 as needed; /tmp/test is the 1 GB test file):

(1) time for i in md5 sha1 sha256; do nettle-hash -a $i < /tmp/test ; done
(2) time for i in md5 sha1 sha256; do openssl $i < /tmp/test ; done
(3) ./a.out < /tmp/test

where a.out is

#include <apt-pkg/hashes.h>
#include <iostream>

int main(void)
{
    // Run apt's own MD5, SHA1 and SHA256 implementations over stdin (fd 0).
    Hashes h(Hashes::MD5SUM | Hashes::SHA1SUM | Hashes::SHA256SUM);
    h.AddFD(0);
    auto hsl = h.GetHashStringList();

    for (HashString const &hs : hsl)
        std::cout << hs.toStr() << std::endl;
    return 0;
}

compiled with g++ -lapt-pkg -O2

Travis Johnson (conslo) wrote:

This seems to be a bit of a bikeshed over specific implementation options. The problem here is that downloads pin the CPU, and in my personal experience I can download the same files via curl with nothing close to a CPU bottleneck.

I do not know how you benchmarked APT, but this may be a case of a microbenchmark not properly representing the problem.

Julian Andres Klode (juliank) wrote:

Of course you don't have a CPU bottleneck with curl: it does not run 3 or 4 hashes over everything it downloads. APT needs to hash its downloads to ensure they are secure, and it uses all available hashes to do so; if one hash's security is compromised, we can hopefully still rely on the others.

And the topic of this bug report is indeed hash CPU usage: I compared our CPU usage on a 1 GB test file in a tmpfs to the CPU usage of OpenSSL and Nettle. This test operates on a data size about five times larger than typical packages, under optimal conditions, to evaluate how fast we can hash.

The test shows that on a 1 Gbit/s connection you'll likely be throttled slightly on a comparable CPU (assuming the connection reaches about 800 Mbit/s, that is, 80% efficiency, while we hash at roughly 736 Mbit/s). If your data rate is about 500 Mbit/s, you will likely be fine (not counting parallel downloads).

If there are other problems reducing the download speeds, these are separate bugs. This one covers the overhead of hashing, nothing else.

BTW: the original bug report talks about 20% CPU usage on a five-year-old CPU; that seems entirely reasonable and not really an issue. If your bandwidth is high, you'll have higher CPU usage for a shorter time (like 100% for 5 seconds or so).

Oleksij Rempel (olerem) wrote:

Hi Julian,

I would suggest rewording your initial comment to something like:
"apt was heavily reworked and optimised; your initial problem probably won't hit you as hard as before. Hashing optimisation can't currently be done for this release without breaking the ABI."

Anyway, thank you for your work, and I hope you can continue it.
Sorry for my misunderstanding.
