bugtask:+index Timeout (lots of comments and attachments)

Bug #583553 reported by Bryce Harrington
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Robert Collins

Bug Description

[Problem]
Bug #317094 has been consistently generating a HTTP Error 503: Service Unavailable over the last week when running a script that processes its attachments.

[Discussion]
Cronned runs of process-release-tagging.py on May 14, 15, 16, 17, 18, 19, and 20 all failed midway through with 503 Service Unavailable errors. Of those runs, all but two failed while processing bug 317094. The other two failed prior to reaching 317094, perhaps for some other reason.

This bug report has over 200 comments, and also has an unusually large number of attachments from a range of different people. I notice if I try to view all the comments from the web interface it times out after a few moments. (I tested both edge and production.)

I think it's fairly well known that bugs with >400 comments tend to time out, but ones with only 216 comments generally are okay. So I'm more wondering if it has something to do with the large number of attachments on this bug report.

When accessing the comments from the web interface, the Oops looks like this

Timeout error
Sorry, something just went wrong in Launchpad.
We’ve recorded what happened, and we’ll fix it as soon as possible. Apologies for the inconvenience.
Trying again in a couple of minutes might work.
(Error ID: OOPS-1601G2848)

Traceback (most recent call last):

    * Module zope.publisher.publish, line 134, in publish
...
    * Module canonical.launchpad.webapp.adapter, line 464, in connection_raw_execute
      connection, raw_cursor, statement, params)
    * Module storm.tracer, line 49, in connection_raw_execute

TimeoutError: 'SELECT LibraryFileContent.datecreated, LibraryFileContent.filesize, LibraryFileContent.id, LibraryFileContent.md5, LibraryFileContent.sha1 FROM LibraryFileContent WHERE LibraryFileContent.id = %s LIMIT 1', [<storm.variables.IntVariable object at 0x2aaab7487578>]<br />

SQL time: 5202 ms
Non-sql time: 15510 ms
Total time: 20712 ms
Statement Count: 506

no long sql statements here - there is however a big jump:
348. 8717 44ms launchpad-main-slave SELECT BugAttachment.bug, BugAttachment.libraryfile, BugAttachment.id, BugAttachment.libraryfile, BugAttachment.message, BugAttachment.title, BugAttachment.type FROM BugAttachment, LibraryFileAlias WHERE BugAttachment.bug = %s AND BugAttachment.libraryfile = LibraryFileAlias.id AND NOT (LibraryFileAlias.content IS NULL) ORDER BY BugAttachment.id
349. 15868 62ms launchpad-main-slave SELECT BugTag.bug, BugTag.id, BugTag.tag FROM BugTag WHERE BugTag.bug = %s ORDER BY BugTag.tag
That is, between getting the attachment data, and starting onto tags, there is a 5 second 'nonsql' gap - could be talking to librarian or something I guess.

Revision history for this message
Bryce Harrington (bryce) wrote :

Here is the output from the cron job, showing the error page returned to the script, and a Traceback.

Revision history for this message
Bryce Harrington (bryce) wrote :

And here's the oops page returned when I try to access the bug from the web interface.

Revision history for this message
Bryce Harrington (bryce) wrote :

Fwiw, I'm going to wontfix the bug in question as a workaround. It'd be fine with me to reopen it for testing purposes (but just be sure to close it when done).

Revision history for this message
Bryce Harrington (bryce) wrote :

(ah, actually I can leave it open but tag it so my scripts ignore it)

Curtis Hovey (sinzui)
affects: launchpad → malone
Revision history for this message
Deryck Hodge (deryck) wrote :

My guess is like yours, that its the combination of comments and attachments causing the timeout in this case.

summary: - LP 503 timeout accessing bug with lots of comments and attachments
+ Timeout accessing bug with lots of comments and attachments
Changed in malone:
status: New → Triaged
importance: Undecided → High
tags: added: attachments bug-page comments dhrb timeout
description: updated
Revision history for this message
Ursula Junque (ursinha) wrote : Re: Timeout accessing bug with lots of comments and attachments

I see OOPS-1686A1008, that's a similar timeout but accessing the bug via API. Is that related to this bug or should we open another one?

Revision history for this message
Robert Collins (lifeless) wrote :

The SQL trace for the two things looks very different, I would file a new bug.

Revision history for this message
Ursula Junque (ursinha) wrote :

@lifeless, I've filed bug 618849 for the API one. Thanks

Revision history for this message
Bryce Harrington (bryce) wrote :

Bug #317094 still oopses, however the time seems to have improved a good deal:

# SQL time: 6168 ms
# Non-sql time: 7378 ms
# Total time: 13546 ms
# Statement Count: 413

https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1713EB2411

I don't know if this is due to bug fixes such as bug #618849, or just a side-effect of dialing down the timeout durations. I'll optimistically assume the former.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 583553] Re: Timeout accessing bug with lots of comments and attachments

1 121 2801 23 2778 SQL-launchpad-main-slave SELECT
LibraryFileContent.datecreated, LibraryFileContent.filesize,
LibraryFileContent.id, LibraryFileContent.md5, LibraryFileContent.sha1
FROM LibraryFileContent WHERE LibraryFileContent.id = %s LIMIT 1

Death-by-1000 queries : doing one query for these things will help a
lot. Figuring out why its being queried for may allow avoiding it
entirely.

My WAG: we're checking for deleted content:

=== modified file 'lib/canonical/launchpad/database/librarian.py'
--- lib/canonical/launchpad/database/librarian.py 2010-09-06
21:02:44 +0000
+++ lib/canonical/launchpad/database/librarian.py 2010-09-10
00:28:16 +0000
@@ -236,7 +236,7 @@

     @property
     def deleted(self):
- return self.content is None
+ return self.contentID is None

I'll put up a branch in a minute for this.

Revision history for this message
Robert Collins (lifeless) wrote :

Branch is off to pqm; we should see it land on edge tonight and can
follow up after that.

Revision history for this message
Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit
Changed in malone:
assignee: nobody → Robert Collins (lifeless)
milestone: none → 10.10
tags: added: qa-needstesting
Changed in malone:
status: Triaged → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote : Re: Timeout accessing bug with lots of comments and attachments

With my LFC fix + jeroens libraryfilealias fix:

https://bugs.edge.launchpad.net/ubuntu/+source/xf86-input-evtouch/+bug/317094/+index
rendered fairly quickly:
At least 390 queries/external actions issued in 4.96 seconds

\o/

Revision history for this message
Robert Collins (lifeless) wrote :

@Bryce, the improvements in the index page were done for a different bug - just general hygeine: we had a crazy loop on subscribers (still there, but optimised, we can remove entirely because the portlet is loaded through async (or fold the portlet back into the page for more effiicent delivery); we were doing something else which I just cannot remember just now; and we had the nuts lookups of LFC's which is now fixed in edge and seems to help the page considerably; I'll ask for a CP to prod on Monday.

tags: added: qa-ok
removed: qa-needstesting
summary: - Timeout accessing bug with lots of comments and attachments
+ bugtask:+index Timeout (lots of comments and attachments)
Changed in malone:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.