simplestreams is slower at downloading than wget

Bug #1240838 reported by Julian Edwards
This bug affects 1 person
Affects        Status   Importance  Assigned to  Milestone
simplestreams  Triaged  Low         Unassigned

Bug Description

Fetching the maas ephemerals file with wget on Canonistack takes around 11 seconds. Using the simplestreams library with the new importer takes around *5 minutes*.

Raphaël Badin (rvb) wrote :

One more data point: when using simplestreams (actually running maas-import-ephemerals from maas) with a proxy, it takes several minutes before the proxy is actually used, i.e. before I see a hit in the proxy's log. Note that the log is only updated *after* the download completes, but using wget to perform the exact same download shows it only takes 10 seconds or so, so that does not explain the delay. I was expecting that the first thing to happen would be the image download.

Raphaël Badin (rvb) wrote :

More precisely, this is squid's log: http://paste.ubuntu.com/6250151/ (note that maas-import-ephemerals then crashes with http://paste.ubuntu.com/6250153/, but that is bug 1240652).

Scott Moser (smoser) wrote :

the "order of magnitude" is very incorrect.

### wget ###
$ time wget http://maas.ubuntu.com/images/ephemeral/daily/precise/20131017/precise-daily-maas-amd64.tar.gz -q
real 0m5.888s
user 0m0.165s
sys 0m1.720s

### python2 with no local proxy ###
$ rm -Rf out.d; time python2 /usr/bin/sstream-mirror --max=1 -vvv "$INDEX_URL" out.d release=precise arch=amd64
<snip>
inserting http://maas.ubuntu.com/images/ephemeral/daily/precise/20131017/precise-daily-maas-amd64.tar.gz to precise/20131017/precise-daily-maas-amd64.tar.gz

real 0m8.092s
user 0m3.852s
sys 0m1.694s

### python3 with local proxy ###
$ rm -Rf out.d; time http_proxy=http://localhost:3128/ python3 /usr/bin/sstream-mirror --max=1 -vvv "$INDEX_URL" out.d release=precise arch=amd64
<snip>
inserting http://maas.ubuntu.com/images/ephemeral/daily/precise/20131017/precise-daily-maas-amd64.tar.gz to precise/20131017/precise-daily-maas-amd64.tar.gz

real 0m6.689s
user 0m3.008s
sys 0m1.160s

### python3 with no local proxy ###
$ rm -Rf out.d; time python3 /usr/bin/sstream-mirror --max=1 -vvv "$INDEX_URL" out.d release=precise arch=amd64
<snip>
inserting http://maas.ubuntu.com/images/ephemeral/daily/precise/20131017/precise-daily-maas-amd64.tar.gz to precise/20131017/precise-daily-maas-amd64.tar.gz

real 0m7.637s
user 0m3.774s
sys 0m1.533s

Additionally, for sanity's sake:
$ cat go.py
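#!/usr/bin/python
# Python 2 script: stream a URL to a local file in fixed-size chunks.
# usage: go.py URL OUTFILE [BUFLEN]
import sys
import urllib2

BUFLEN = 1024 * 10
if len(sys.argv) > 3:
    BUFLEN = int(sys.argv[3])

rfp = urllib2.urlopen(sys.argv[1])
with open(sys.argv[2], "wb") as wfp:
    while True:
        buf = rfp.read(BUFLEN)
        if not buf:
            # an empty read means end of stream
            break
        wfp.write(buf)
rfp.close()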

$ time ./go.py http://maas.ubuntu.com/images/ephemeral/daily/precise/20131017/precise-daily-maas-amd64.tar.gz out.img

real 0m6.011s
user 0m1.033s
sys 0m1.762s

Very unscientific, but clearly there is no order of magnitude problem in the downloading itself. Note that the simplestreams code spends more 'user' time than wget; that is to be expected of Python code execution compared to C. Also, simplestreams downloading does not seem significantly different from the simple urllib2 example above.
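
For anyone wanting to repeat the real/user/sys comparison from inside Python rather than with the shell's `time`, a minimal sketch follows. It is written for Python 3 and is not taken from simplestreams; `copy_stream` and `timed` are hypothetical helper names, and an in-memory buffer stands in for the network response so the example runs offline.

```python
import io
import os
import time


def copy_stream(rfp, wfp, buflen=10 * 1024):
    """Copy rfp to wfp in buflen-sized chunks; return bytes copied."""
    total = 0
    while True:
        buf = rfp.read(buflen)
        if not buf:  # an empty read means end of stream
            break
        wfp.write(buf)
        total += len(buf)
    return total


def timed(fn, *args, **kwargs):
    """Run fn and return (result, real, user, sys) in seconds, like `time`."""
    t0, c0 = time.perf_counter(), os.times()
    result = fn(*args, **kwargs)
    t1, c1 = time.perf_counter(), os.times()
    return result, t1 - t0, c1.user - c0.user, c1.system - c0.system


if __name__ == "__main__":
    # In-memory stand-in for a network response (assumption: a real test
    # would pass the object returned by urllib.request.urlopen instead).
    src, dst = io.BytesIO(b"x" * 1000000), io.BytesIO()
    copied, real, user, sys_ = timed(copy_stream, src, dst)
    print("copied=%d real=%.3fs user=%.3fs sys=%.3fs" % (copied, real, user, sys_))
```

A large gap between 'real' and 'user' + 'sys' would suggest the process is waiting (on the network, a proxy, or disk) rather than burning CPU.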

Changed in simplestreams:
status: New → Triaged
summary: - simplestreams is several orders of magnitude slower at downloading than
- wget
+ simplestreams is slower at downloading than wget
Julian Edwards (julian-edwards) wrote : Re: [Bug 1240838] Re: simplestreams is several orders of magnitude slower at downloading than wget

On Thursday 17 Oct 2013 14:49:51 you wrote:
> the "order of magnitude" is very incorrect.

The behaviour that we're seeing is that for each ephemeral file, it takes a
few seconds with wget and many minutes with simplestreams (as called from
maas-import-ephemerals).

I have no idea what's going on, perhaps it's related to the python-requests
bug?

This is trivial to recreate - start up a canonistack instance, install maas,
and run maas-import-pxe-files.

As it stands this is practically unusable.

Scott Moser (smoser) wrote :

Julian, I can't recreate this.
I ran the above with timings on a canonistack instance. I believe that it correctly shows that simplestreams downloading is not significantly slower than wget.

Note that maas-import-ephemerals is doing substantially more than 'wget', so it's not apples to apples.
For a given ephemeral tar file, it:
 downloads the tar file (giving no status output, bug 1238148)
 extracts it to get the kernel and initramfs and .img file
 copies (see bug 1239159) these files into place
 mounts the .img file and creates the dist-root.tar.gz file from it (700+M of read IO, compress, 250M of write)
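
One way to see which of the steps above dominates is to time each stage separately. The following is only a sketch under stated assumptions: `stage` and `slowest` are hypothetical helpers, not part of maas-import-ephemerals or simplestreams, and the `time.sleep` calls are placeholders for the real download, extract, copy and repack work.

```python
import time
from contextlib import contextmanager

timings = []  # (stage name, seconds) in the order the stages ran


@contextmanager
def stage(name):
    """Record wall-clock time spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.append((name, time.perf_counter() - start))


def slowest(records):
    """Return the (name, seconds) pair with the largest duration."""
    return max(records, key=lambda rec: rec[1])


if __name__ == "__main__":
    # Placeholder workloads; a real run would call the actual import steps.
    with stage("download"):
        time.sleep(0.01)
    with stage("extract"):
        time.sleep(0.02)
    with stage("copy"):
        time.sleep(0.01)
    with stage("build dist-root.tar.gz"):
        time.sleep(0.05)
    for name, secs in timings:
        print("%-24s %.3fs" % (name, secs))
    print("slowest:", slowest(timings)[0])
```

If the download stage turns out to be a small share of the total, the comparison with a bare wget is measuring the other steps, not the transfer.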

Note that the fix I put in for https://launchpad.net/bugs/1240652 does improve downloading a bit, in that it doesn't stat unnecessarily.

If you want to compare download-only times using the simplestreams download code, there are examples in bug 1240652 showing how to do that.

Scott Moser (smoser)
Changed in simplestreams:
importance: Undecided → Low