bzr branch http:// with a pack repository takes too long
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Fix Released | High | Vincent Ladeuil |
Bug Description
Downloading a new pack repository using http took me 18 minutes. During this time, network traffic was high.
robertc@
real 18m2.075s
user 0m49.847s
sys 0m5.352s
robertc@
--07:48:29-- http://
=> `2a0e5ecbd9eea5
Resolving bazaar.
Connecting to bazaar.
HTTP request sent, awaiting response... 200 OK
Length: 57,135,763 (54M) [text/plain]
100%[==
07:54:35 (153.01 KB/s) - `2a0e5ecbd9eea5
real 6m5.762s
user 0m0.564s
sys 0m2.892s
We do 4 readv's of the pack during fetch:
http readv of 2a0e5ecbd9eea5a
http readv of 2a0e5ecbd9eea5a
http readv of 2a0e5ecbd9eea5a
Retry "2a0e5ecbd9eea5
http readv of 2a0e5ecbd9eea5a
These should be revisions, inventories, texts, signatures. The counts line up appropriately for that.
One possibility for the total time count is that we're downloading the entire file more than once. However, there is not enough information for me to determine if that is the case from the .bzr.log.
I'll look at doing some extra diagnostics later but I wanted to record this issue.
Changed in bzr:
importance: Undecided → High
status: New → Triaged

Changed in bzr:
assignee: nobody → v-ladeuil

Changed in bzr:
milestone: none → 1.0rc1
status: Fix Committed → Fix Released
Depending on which HTTP implementation you're using, you can:
- activate debug output for pycurl in _pycurl.py by uncommenting the line:
  # curl.setopt(pycurl.VERBOSE, 1)
- activate debug output for urllib in _urllib2_wrappers.py by setting DEBUG to 1:
  DEBUG = 1
I don't think that should generate too much output.
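For comparison, the same kind of wire-level tracing is available in the Python standard library; this is not bzr's internal switch, just an analogous knob for readers who want to see request/response headers without touching _pycurl.py or _urllib2_wrappers.py:

```python
# Analogous stdlib trace switch (not bzr's internal DEBUG/VERBOSE flags).
import http.client

# Setting debuglevel makes every HTTPConnection print the request line and
# response headers to stdout as they go over the wire.
http.client.HTTPConnection.debuglevel = 1
```
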
BUT, looking at the number of offsets, it's pretty obvious that the server should choke with a 400: Bad Request, because the header describing the offsets is just too big.
http readv of 2a0e5ecbd9eea5a28df78a003ced30a4.pack offsets => 33490 collapsed 671
Retry "2a0e5ecbd9eea5a28df78a003ced30a4.pack" with single range request
is a pretty good indication that it occurred.
So this one should hurt.
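To see why 33490 offsets collapsed into 671 ranges can still blow the header limit, here is a hedged sketch of the mechanism (illustrative only, not bzr's actual readv implementation): coalesce adjacent offsets into ranges, then serialize them into one HTTP Range header.

```python
def coalesce(offsets, fudge=0):
    """Collapse sorted (offset, length) pairs into contiguous ranges,
    as a readv implementation might before building a Range header.
    `fudge` allows merging ranges separated by small gaps."""
    ranges = []
    for start, length in sorted(offsets):
        end = start + length
        if ranges and start <= ranges[-1][1] + fudge:
            # Overlaps or nearly touches the previous range: extend it.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges

def range_header(ranges):
    """Serialize coalesced [start, end) ranges into an HTTP/1.1 Range header
    (byte ranges are inclusive, hence end - 1)."""
    return "Range: bytes=" + ",".join(f"{s}-{e - 1}" for s, e in ranges)
```

At roughly 15-20 bytes per range, 671 ranges yield a header on the order of 10 KB, while servers commonly cap request header size around 4-8 KB, so a 400 followed by the single-range retry seen in the log is exactly what you'd expect.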
We already discussed with John a way to avoid that by issuing several GET requests for a single readv, but I didn't implement it because (from memory) it required rewriting http/response.py, which I didn't want to do at the time.
This has to be done anyway, if only to avoid holding the whole file in memory (twice in some cases); it looks like the time has come.
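The fix being discussed could look roughly like this (a sketch under assumed names, not the eventual bzr implementation): cap the number of ranges carried by one request, and issue one GET per batch so no single Range header exceeds what servers accept.

```python
# Hypothetical batching of one readv into several GET requests; the cap
# and function name are illustrative, not bzr's actual code.
MAX_RANGES_PER_REQUEST = 25  # keep each Range header well under server limits

def split_readv(ranges, max_ranges=MAX_RANGES_PER_REQUEST):
    """Yield batches of coalesced ranges, one batch per GET request."""
    for i in range(0, len(ranges), max_ranges):
        yield ranges[i:i + max_ranges]
```

With the 671 collapsed ranges from the log, this would issue 27 modest requests instead of one request the server rejects, and each response can be streamed instead of buffered whole.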