manage-chroot has a near 100% failure rate

Bug #1475673 reported by Adam Conrad
This bug affects 1 person

Affects: Launchpad itself
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

As can be seen in the attached output, only 1/6 chroot uploads worked for me this morning. This isn't an anomaly; it's been like this pretty much forever, but it's really driving me nuts.

If we can't fix this, can I get my direct-db-mangling tool back on pepo for now while we sort out why the API version is so unfriendly?

Revision history for this message
Adam Conrad (adconrad) wrote :
Revision history for this message
Colin Watson (cjwatson) wrote :

I've bumped the appserver timeout to 15 seconds for this API call, and that seems to have cleared things up.
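(For the record, assuming this was done through Launchpad's feature-rules mechanism with a pageid-scoped hard_timeout flag, whose value is in milliseconds, the rule would look roughly like the line below; the exact pageid scope is illustrative, not necessarily the one used.)

    hard_timeout  pageid:DistroArchSeries:EntryResource:setChroot  1  15000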

Changed in launchpad:
status: New → Fix Released
Revision history for this message
Adam Conrad (adconrad) wrote :

Colin's fix mostly worked, but I still got 1/12 failures, apparently due to an 8s soft timeout somewhere.

Changed in launchpad:
status: Fix Released → Confirmed
Revision history for this message
Colin Watson (cjwatson) wrote :

OOPS-3d9ca7630ee30bd9b1cf09d68b79e7a1 (but this is huge and you probably don't want to open it in a graphical browser).

Revision history for this message
Colin Watson (cjwatson) wrote :

This is almost entirely non-SQL time, as you might expect. The slow blocks between SQL statements are as follows, with all Launchpad line numbers relative to r17628:

 1) 1278ms: zope/publisher/publish.py:129 (... DistroSeries.getDistroArchSeries) to 132 (... DistroArchSeries.isNominatedArchIndep). Not quite sure what's happening in here - maybe something to do with request form processing?

 2) 3004ms: The first part of DistroArchSeries.setChroot, up to the start of the LibraryFileAliasSet.create call at line 190. A fair bit of this will be the sha1 computation, but https://bugs.launchpad.net/launchpad/+bug/1116954 may also be implicated. I note that the sha1 computation is done from scratch in the event of needing to apply the \r hack, when it should be possible to just add one byte to the sha1 object and get an updated digest (see the sketch after this list); but the chroot in question here didn't end with \r.

 3) 2304ms: FileUploadClient.addFile:161-217. Mostly necessary reading and checksumming of the file, although I wonder if a larger chunk size would be more efficient.

 4) 1883ms: zope/publisher/publish.py:132 (... looking something up in SessionPkgData) to 137 (... post-webservice-call commit). Not quite sure what's happening in here - possibly something else to do with the enormous request, or maybe something tracelog-related?
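Regarding (2), a minimal sketch of the incremental-digest approach, using plain hashlib; the function name and the flag are illustrative, not the actual setChroot code:

    import hashlib

    def sha1_with_optional_cr(content, apply_cr_hack=False):
        # Hash the original bytes once.
        digest = hashlib.sha1()
        digest.update(content)
        if apply_cr_hack:
            # Feed only the single extra byte into the existing hash object
            # instead of rehashing content + b'\r' from scratch.
            digest.update(b'\r')
            content += b'\r'
        return content, digest.hexdigest()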

I think that's as much as I can readily work out by OOPS inspection. More than that will probably require stepping through or profiling this locally.

Revision history for this message
Colin Watson (cjwatson) wrote :

A larger chunk size appears to make negligible difference in a local microbenchmark.
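For reference, a rough sketch of the kind of microbenchmark meant here - read and SHA-1 a large local file at a few chunk sizes; the file path and the sizes are placeholders, not the actual test:

    import hashlib
    import time

    def checksum_file(path, chunk_size):
        # Read the file in fixed-size chunks and SHA-1 it, much as addFile does.
        digest = hashlib.sha1()
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
        return digest.hexdigest()

    # Placeholder: any large file, e.g. a chroot tarball, will do.
    path = 'chroot.tar.bz2'
    for chunk_size in (4096, 65536, 1 << 20):
        start = time.time()
        checksum_file(path, chunk_size)
        print('%7d-byte chunks: %.3fs' % (chunk_size, time.time() - start))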

Revision history for this message
William Grant (wgrant) wrote :

Soft timeouts are soft, not hard. A soft timeout does not explain a failure.
