manage-chroot has a near 100% failure rate

Bug #1475673 reported by Adam Conrad
This bug affects 1 person

Affects: Launchpad itself
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

As can be seen in the attached output, only 1/6 chroot uploads worked for me this morning. This isn't an anomaly; it's been like this pretty much forever, but it's really driving me nuts.

If we can't fix this, can I get my direct-db-mangling tool back on pepo for now while we sort out why the API version is so unfriendly?

Revision history for this message
Adam Conrad (adconrad) wrote :
Revision history for this message
Colin Watson (cjwatson) wrote :

I've bumped the appserver timeout to 15 seconds for this API call, and that seems to have cleared things up.
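(For the record, assuming this was done through Launchpad's feature-rules mechanism with a pageid-scoped hard_timeout flag, whose value is in milliseconds, the rule would look roughly like the line below; the exact pageid scope is illustrative, not necessarily the one used.)

    hard_timeout  pageid:DistroArchSeries:EntryResource:setChroot  1  15000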

Changed in launchpad:
status: New → Fix Released
Revision history for this message
Adam Conrad (adconrad) wrote :

Colin's fix mostly worked, but I still got 1/12 failures, apparently due to an 8s soft timeout somewhere.

Changed in launchpad:
status: Fix Released → Confirmed
Revision history for this message
Colin Watson (cjwatson) wrote :

OOPS-3d9ca7630ee30bd9b1cf09d68b79e7a1 (but this is huge and you probably don't want to open it in a graphical browser).

Revision history for this message
Colin Watson (cjwatson) wrote :

This is almost entirely non-SQL time, as you might expect. The slow blocks between SQL statements are as follows, with all Launchpad line numbers relative to r17628:

 1) 1278ms: zope/publisher/publish.py:129 (... DistroSeries.getDistroArchSeries) to 132 (... DistroArchSeries.isNominatedArchIndep). Not quite sure what's happening in here - maybe something to do with request form processing?

 2) 3004ms: The first part of DistroArchSeries.setChroot, up to the start of the LibraryFileAliasSet.create call at line 190. A fair bit of this will be the sha1 computation, but https://bugs.launchpad.net/launchpad/+bug/1116954 may also be implicated. I note that the sha1 computation is done from scratch in the event of needing to apply the \r hack, when it should be possible to just add one byte to the sha1 object and get an updated digest (see the sketch after this list); but the chroot in question here didn't end with \r.

 3) 2304ms: FileUploadClient.addFile:161-217. Mostly necessary reading and checksumming of the file, although I wonder if a larger chunk size would be more efficient.

 4) 1883ms: zope/publisher/publish.py:132 (... looking something up in SessionPkgData) to 137 (... post-webservice-call commit). Not quite sure what's happening in here - possibly something else to do with the enormous request, or maybe something tracelog-related?
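Regarding (2), a minimal sketch of the incremental-digest approach, using plain hashlib; the function name and the flag are illustrative, not the actual setChroot code:

    import hashlib

    def sha1_with_optional_cr(content, apply_cr_hack=False):
        # Hash the original bytes once.
        digest = hashlib.sha1()
        digest.update(content)
        if apply_cr_hack:
            # Feed only the single extra byte into the existing hash object
            # instead of rehashing content + b'\r' from scratch.
            digest.update(b'\r')
            content += b'\r'
        return content, digest.hexdigest()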

I think that's as much as I can readily work out by OOPS inspection. More than that will probably require stepping through or profiling this locally.

Revision history for this message
Colin Watson (cjwatson) wrote :

A larger chunk size appears to make negligible difference in a local microbenchmark.
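For reference, a rough sketch of the kind of microbenchmark meant here - read and SHA-1 a large local file at a few chunk sizes; the file path and the sizes are placeholders, not the actual test:

    import hashlib
    import time

    def checksum_file(path, chunk_size):
        # Read the file in fixed-size chunks and SHA-1 it, much as addFile does.
        digest = hashlib.sha1()
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
        return digest.hexdigest()

    # Placeholder: any large file, e.g. a chroot tarball, will do.
    path = 'chroot.tar.bz2'
    for chunk_size in (4096, 65536, 1 << 20):
        start = time.time()
        checksum_file(path, chunk_size)
        print('%7d-byte chunks: %.3fs' % (chunk_size, time.time() - start))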

Revision history for this message
William Grant (wgrant) wrote :

Soft timeouts are soft, not hard. A soft timeout does not explain a failure.
