high cpu usage on download (not optimised SHA256, SHA512, SHA1)

Bug #1123272 reported by Oleksij Rempel on 2013-02-12
This bug affects 4 people
Affects: apt (Ubuntu)
Importance: Wishlist
Assigned to: Unassigned

Bug Description

The http method process takes about 20% CPU on an Intel i5-3317. Here is the output of ps aux:
root 4309 21.7 0.0 37108 2344 pts/2 S+ 18:17 0:01 /usr/lib/apt/methods/http

It can be easily reproduced with "apt-get download firefox-dbg" or any other upgrade-related command.

The reason for this overhead is these functions:
22,34% http libapt-pkg.so.4.12.0 [.] SHA256_Transform(_SHA256_CTX*, unsigned int const*)
14,81% http libapt-pkg.so.4.12.0 [.] SHA512_Transform(_SHA512_CTX*, unsigned long const*)
8,11% http libapt-pkg.so.4.12.0 [.] SHA1Transform(unsigned int*, unsigned char const*)
5,62% http libapt-pkg.so.4.12.0 [.] MD5Transform(unsigned int*, unsigned int const*)

libapt uses its own implementation of these hashes without any optimisation. Is there any reason why libapt does not use
OpenSSL? Besides, the current version of OpenSSL has AVX-optimised SHA* implementations.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: apt 0.9.7.7ubuntu1
ProcVersionSignature: Ubuntu 3.8.0-4.9-generic 3.8.0-rc6
Uname: Linux 3.8.0-4-generic x86_64
ApportVersion: 2.8-0ubuntu4
Architecture: amd64
Date: Tue Feb 12 18:12:33 2013
InstallationDate: Installed on 2012-09-13 (152 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Alpha amd64 (20120913.1)
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: apt
UpgradeStatus: Upgraded to raring on 2013-02-06 (5 days ago)

dino99 (9d9) wrote :

I also get this issue on Vivid i386, mainly when the index is rebuilt using Synaptic.

tags: added: vivid
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apt (Ubuntu):
status: New → Confirmed
Changed in apt (Ubuntu):
importance: Undecided → Medium
Travis Johnson (conslo) wrote :

I believe this is still present in 16.04. Specifically, I become CPU-bottlenecked (100% on one core by /usr/lib/apt/methods/http) when downloading packages within AWS EC2 instances (better bandwidth to the download servers there). Besides the issue of locking up a core, this makes several processes take noticeably more time than they should.

dino99 (9d9) on 2016-08-04
tags: added: xenial
removed: raring vivid
Julian Andres Klode (juliank) wrote :

The hash algorithms should be fast enough these days in software to handle usual network bandwidths. If you combine extremely high speeds with extremely slow CPUs, that's of course suboptimal.

We can consider using libnettle at some point; I assume this might give a nice speed-up, but it is not planned for the immediate future, as it requires an ABI break.

Changed in apt (Ubuntu):
importance: Medium → Wishlist
Oleksij Rempel (olerem) wrote :

Well, the definition of "extremely" is weird. In this bug report we have an i5 at 20%!
In my case it was an Intel Atom at 100% CPU usage, on a DSL 2000 line (about 2 Mbit/s)!

The Atom is slow, but it is not the slowest CPU in use.

Travis Johnson (conslo) wrote :

I am not satisfied with this response: I experience CPU bottlenecking on EC2 server instances. Gigabit networking with sub-3 GHz CPUs is incredibly commonplace (read: the entirety of the server ecosystem), so I'm not sure what you mean by "usual" network bandwidths.

Julian Andres Klode (juliank) wrote :

Results from my 5 year old laptop (X230, Core i5-3320M with 2.6 GHz), for a 1 GB file:

MD5+SHA1+SHA256: 11s => 90 MB/s => 727 Mbit/s
MD5+SHA1+SHA256+SHA512: 15s => 67 MB/s => 533 Mbit/s

That seems like an acceptable baseline performance. You are not likely to hit a GB's worth of packages, so you have less than 10 seconds of maximum CPU use (more likely you'll hit 100-200 MB, so 2-3 seconds of CPU burst). I think 2-3 seconds is completely acceptable.

With nettle, which is hardware-accelerated, we achieve:

MD5+SHA1+SHA256: 9s => 111 MB/s => 888 Mbit/s
MD5+SHA1+SHA256+SHA512: 12.8s => 78 MB/s => 625 Mbit/s

(Tests might be skewed a bit in favour of apt, as it avoids duplicate reads – I had to run the nettle-hash invocations one after another, each re-reading the whole file, rather than feeding each block to all hashes in turn; the read overhead is about 0.2/0.4 s.)

The improvement does not seem huge.

Obviously, if you find faster implementations to contribute to nettle, things might improve further.

Julian Andres Klode (juliank) wrote :

I think my laptop is too old to support AVX, though. But I'm not sure if libnettle supports that anyway, and libnettle would be our choice.

Julian Andres Klode (juliank) wrote :

We can't use openssl for legal reasons, and also don't want it in libapt-pkg for technical reasons anyway.

Julian Andres Klode (juliank) wrote :

New results (user time only, removing read() overhead):

OpenSSL: 6.8s or 9.8s = 147 or 102 MB/s = 1176 or 816 Mbit/s
Nettle: 8.8s or 11.8s = 113 or 85 MB/s = 904 or 680 Mbit/s
APT: 10.9s or 15s = 92 or 67 MB/s = 736 or 536 Mbit/s

As we can see, the improvement is not that large, and libnettle also still has a small way to go to reach OpenSSL's performance.

Julian Andres Klode (juliank) wrote :

Tests run (for SHA256, add SHA512 as needed):

(1) time for i in md5 sha1 sha256; do nettle-hash -a $i < /tmp/test ; done
(2) time for i in md5 sha1 sha256; do openssl $i < /tmp/test ; done
(3) ./a.out < /tmp/test

where a.out is

#include <apt-pkg/hashes.h>
#include <iostream>

int main()
{
    // Hash standard input with the same digests apt's http method uses.
    Hashes h(Hashes::MD5SUM | Hashes::SHA1SUM | Hashes::SHA256SUM);
    h.AddFD(0); // read and hash everything from stdin (fd 0)
    auto hsl = h.GetHashStringList();

    // Print each resulting "Type:digest" entry.
    for (HashString const &hs : hsl)
        std::cout << hs.toStr() << std::endl;
    return 0;
}

compiled with g++ -lapt-pkg -O2

Travis Johnson (conslo) wrote :

This seems to be a bit of a bikeshed about specific implementation options. The problem here is that downloads pin the CPU, and in my personal experience I can download the same files via curl with nothing close to a CPU bottleneck.

I do not know how you benchmarked APT, but this may be a case of a micro-benchmark not properly representing the problem.

Julian Andres Klode (juliank) wrote :

Of course you don't have a CPU bottleneck with curl; it does not run 3 or 4 hashes over everything it downloads. APT needs to hash its downloads to ensure they are secure, and it uses all available hashes to do so, so that if one hash's security is compromised, we can hopefully still rely on the others.

And the topic of this bug report is indeed hash CPU usage, and I compared our CPU usage on a 1 GB test file in a tmpfs to the CPU usage of OpenSSL and Nettle. So this test operates on a data size 5 times higher than the usual packages and under optimal conditions to evaluate how fast we can hash.

The test shows that on a 1 Gbit/s connection you'll likely be throttled slightly at a comparable CPU (assuming the connection reaches about 800 Mbit/s, that is, 80% efficiency). If you have a data rate of about 500 Mbit/s, you will likely be fine (not counting parallel downloads).

If there are other problems reducing the download speeds, these are separate bugs. This one covers the overhead of hashing, nothing else.

BTW: The original bug report talks about 20% CPU usage on a 5 year old CPU, that seems entirely reasonable and not really an issue. If your bandwidth is high, you'll have higher CPU usage for a shorter time (like 100% for 5 seconds or so).

Oleksij Rempel (olerem) wrote :

Hi Julian,

I would suggest rewording your initial comment to something like:
"apt was heavily reworked and optimised; your initial problem probably won't hit you as hard as before.
Hashing optimisation can't currently be done for this release without breaking the ABI."

Anyway, thank you for your work, and I hope you can continue it.
Sorry for my misunderstanding.
