import_package seems slow to determine that there are no versions to import

Bug #608702 reported by John A Meinel
This bug affects 1 person
Affects                         Status     Importance  Assigned to  Milestone
Ubuntu Distributed Development  Confirmed  Low         Unassigned
bzr-builddeb                    Confirmed  Low         Unassigned

Bug Description

The import_package script seems to be doing something inefficient when it comes to determining if there are versions that need importing. As an example:
4.158 Time (UTC): 2010-07-21 20:00:26.377955
4.192 creating repository in file:///srv/package-import.canonical.com/new/updates/libsmbios/.bzr/.
4.196 finding all versions of libsmbios
17.307 found 33 versions: [PackageToImport(libsmbios, 0.10.6-1, ubuntu, dapper, release, ), Package
19.245 ssh implementation is OpenSSH
166.631 These versions are new: []
168.910 Using fetch logic to copy between RemoteRepository(bzr+ssh://bazaar.launchpad.net/~ubuntu-b
168.910 fetch up to rev {<email address hidden>}
177.679 creating branch <bzrlib.branch.BzrBranchFormat7 object at 0x31f4690> in file:///srv/package
177.744 created new branch BzrBranch7('file:///srv/package-import.canonical.com/new/updates/libsmbi
177.751 trying to create missing lock '/srv/package-import.canonical.com/new/updates/libsmbios/mave
177.751 opening working tree '/srv/package-import.canonical.com/new/updates/libsmbios/maverick'
178.255 opening working tree '/srv/package-import.canonical.com/new/updates/libsmbios/maverick'
178.283 Using fetch logic to copy between RemoteRepository(bzr+ssh://bazaar.launchpad.net/~ubuntu-b
178.283 fetch up to rev {<email address hidden>}
178.427 Base revid: '<email address hidden>'

It is taking 17s to find the version listing, which isn't terrible. It then takes 166-17 = 149s to determine that all of those revisions are, in fact, present in the branch.
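The gaps in a log like the one above can be measured mechanically rather than by eye. The following is a minimal sketch, assuming every line starts with an elapsed-seconds stamp followed by a space; `phase_gaps` is a hypothetical helper, not part of import_package, and the sample lines are abbreviated copies of the log above.

```python
def phase_gaps(log_lines):
    """Return (gap_seconds, message) pairs, largest gap first.

    The gap for a line is the time elapsed since the previous stamped line,
    so a large gap points at the work done just before that message.
    """
    entries = []
    for line in log_lines:
        stamp, _, message = line.partition(" ")
        try:
            entries.append((float(stamp), message))
        except ValueError:
            continue  # skip lines without a leading elapsed-time stamp
    gaps = [
        (round(cur[0] - prev[0], 3), cur[1])
        for prev, cur in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)


# Abbreviated lines from the log in this report (long messages shortened).
log = [
    "4.196 finding all versions of libsmbios",
    "17.307 found 33 versions: [...]",
    "19.245 ssh implementation is OpenSSH",
    "166.631 These versions are new: []",
]

for gap, message in phase_gaps(log):
    print("%8.3f s before: %s" % (gap, message))
```

The first entry printed is the step that dominated the run, which here is the wait before "These versions are new".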

Looking at the code, this might be checking solely on a remote Branch, and using a fresh 'Graph' object for every version that it wants to evaluate. (It does a graph check to make sure that every Version maps to a bzr tag, and that all those tags are present in the ancestry of the Branch tip.)

One possibility is to grab a KnownGraph object, and to cache that between calls.

I wasn't sure about KnownGraph as it grabs the whole ancestry, but if we are going to check all Versions anyway, then it is fine, because we will have to check really old things anyway.

Another option is to just use a CachingParentsProvider in there (which branch.repository.get_graph() may already do), but we at least need to hold on to the Graph object.
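To illustrate why holding on to the ancestry matters, here is a toy sketch that does not use bzrlib at all. `CountingParents`, `ancestry`, `check_fresh` and `check_cached` are hypothetical names invented for this example; the parent map stands in for a remote repository, and each `get_parents` call stands in for a potentially expensive lookup.

```python
class CountingParents:
    """Toy stand-in for a remote parents provider; counts lookups."""

    def __init__(self, parent_map):
        self._parent_map = parent_map
        self.lookups = 0

    def get_parents(self, revid):
        self.lookups += 1
        return self._parent_map.get(revid, ())


def ancestry(provider, tip):
    """Walk back from tip and collect every ancestor, tip included."""
    seen = set()
    pending = [tip]
    while pending:
        revid = pending.pop()
        if revid in seen:
            continue
        seen.add(revid)
        pending.extend(provider.get_parents(revid))
    return seen


def check_fresh(provider, tip, tagged_revids):
    """One full walk per version: the pattern suspected in this report."""
    return [revid in ancestry(provider, tip) for revid in tagged_revids]


def check_cached(provider, tip, tagged_revids):
    """Walk once, then answer every version from the cached set."""
    known = ancestry(provider, tip)
    return [revid in known for revid in tagged_revids]


parent_map = {"r3": ("r2",), "r2": ("r1",), "r1": ()}
tagged = ["r1", "r2", "not-merged"]

fresh, cached = CountingParents(parent_map), CountingParents(parent_map)
fresh_result = check_fresh(fresh, "r3", tagged)
cached_result = check_cached(cached, "r3", tagged)
print(fresh.lookups, cached.lookups)
```

Both functions give the same answers, but the uncached one repeats the whole walk for every version, so its cost grows with (versions x ancestry size) instead of ancestry size alone.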

(This may actually be code in bzr-builddeb, I haven't traced it thoroughly.)

John A Meinel (jameinel) wrote :

This only really matters if load gets higher than we can actually maintain. James mentioned that load only seems to really be a factor during the 6-month distribution rollouts, where we end up checking all packages.

However, some of that may change if we move off of the hardware we are currently on.

Changed in udd:
importance: Undecided → Low
status: New → Confirmed
John A Meinel (jameinel) wrote :

This does appear to be in bzr-builddeb because the DistributionBranch.has_version() calls _has_version() and that is the one doing the graph ancestry checks.

Also note that .has_version() potentially searches the graph 3 times, using 3 different graph objects in a single call.

Changed in bzr-builddeb:
importance: Undecided → Low
status: New → Confirmed
John A Meinel (jameinel) wrote :

I've done an lsprof, and it seems to not be the has_version() check that I thought it was.

Looking at the profile, it appears to be 16 'BranchStore.get_branch' calls, which have 16 associated get_branch_parts calls which then spends 62% of the time in Branch.open() and 38% of the time in _branch_location.

Both of those times seem surprisingly slow given that it should be re-using the bzr connection, and the profile is running on Jubany, which should have very low latency to the codehosting machine.
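As a rough illustration of what "re-using the connection" means for 16 branch opens against one host, the sketch below keys connections by scheme and host so that only the first open pays the connection cost. This is not how bzrlib shares transports; `ConnectionPool`, `fake_connect` and the example.invalid URLs are all hypothetical.

```python
from urllib.parse import urlsplit


class ConnectionPool:
    """Hand out one shared connection per (scheme, host) pair."""

    def __init__(self, connect_fn):
        self._connect_fn = connect_fn  # called once per new (scheme, host)
        self._by_host = {}

    def connection_for(self, url):
        parts = urlsplit(url)
        key = (parts.scheme, parts.netloc)
        if key not in self._by_host:
            self._by_host[key] = self._connect_fn(key)
        return self._by_host[key]


connects = []


def fake_connect(key):
    """Record each real connection attempt; stands in for an ssh handshake."""
    connects.append(key)
    return object()


pool = ConnectionPool(fake_connect)
urls = ["bzr+ssh://code.example.invalid/branch-%d" % i for i in range(16)]
connections = [pool.connection_for(url) for url in urls]
print(len(connects))
```

If a profile shows connection setup being paid on every open rather than once, that would point at the sharing step (the equivalent of the dictionary lookup here) being bypassed.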

John A Meinel (jameinel) wrote :

The 16 get_branch calls do seem to match the various branches that the Versions should have been imported into, such as (debian, etch, release), (ubuntu, dapper, release), (ubuntu, hardy, proposed), etc.

James Westby (james-w) wrote :

It's now down to ~45s to do this check, and I think that may be because of
a bug in reusing transports.

I think this is more of a problem when developers are trying to test locally.

Thanks,

James
