generate-ppa-htaccess takes several minutes to run

Reported by Michael Nelson on 2012-04-16
24
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Critical
William Grant

Bug Description

Based on info from bug 980241 - a 4 minute delay when purchasing software, I took a look at the corresponding logs for generate-ppa-htaccess and it shows (Natalia's timestamps are -0300) which show it happily chugging along nearly once-per minute until:

{{{
2012-04-12 19:36:33 INFO Logging INFO and higher messages to /srv/launchpad.net/production-logs/lp_publish/generate-ppa-htaccess.log
2012-04-12 19:36:33 INFO Creating lockfile: /var/lock/launchpad-generate-ppa-htaccess.lock
2012-04-12 19:37:14 INFO Logging INFO and higher messages to /srv/launchpad.net/production-logs/lp_publish/generate-ppa-htaccess.log
2012-04-12 19:37:14 INFO Creating lockfile: /var/lock/launchpad-generate-ppa-htaccess.lock
2012-04-12 19:37:35 INFO Starting the PPA .htaccess generation
2012-04-12 19:38:10 INFO Logging INFO and higher messages to /srv/launchpad.net/production-logs/lp_publish/generate-ppa-htaccess.log
2012-04-12 19:38:10 INFO Creating lockfile: /var/lock/launchpad-generate-ppa-htaccess.lock
2012-04-12 19:39:10 INFO Logging INFO and higher messages to /srv/launchpad.net/production-logs/lp_publish/generate-ppa-htaccess.log
2012-04-12 19:39:10 INFO Creating lockfile: /var/lock/launchpad-generate-ppa-htaccess.lock
2012-04-12 19:39:50 INFO Committing transaction...
2012-04-12 19:39:56 INFO Finished PPA .htaccess generation
}}}

(more context for log https://pastebin.canonical.com/64326/)

which seems to indicate that the script:
1) Started at 19:36:33 but didn't even start PPA.htaccess generation (perhaps lockfile existed? Need to check the lp code)
2) Restarted at 19:37:14 but didn't commit the transaction or finish the htaccess generation.
3) Restarted at 19:38:10 but didn't even start PPA .htaccess generation, before finally
4) Restarted at 19:39:10 and finished 46 seconds later.

Related branches

lp:~lifeless/launchpad/generate-htaccess-speed
Merged into lp:launchpad at revision 15337
Robert Collins (community): Approve on 2012-05-30
Jonathan Lange (community): Approve on 2012-05-30
lp:~wgrant/launchpad/faster-htpasswd-again
Merged into lp:launchpad at revision 15339
Steve Kowalik (community): Approve (code) on 2012-05-31
William Grant (wgrant) wrote :

That just shows it took 2.5 minutes to run. It started generating at 19:37:35, finished at 19:39:50. The others failed to start because there was already a previous one running. Probably relevant is that generate-ppa-htaccess.py is currently ioniced on production to prevent it from overloading germanium.

summary: - generate-ppa-htaccess failing multiple times before succeeding
+ generate-ppa-htaccess takes several minutes to run
Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
tags: added: ppa
Michael Nelson (michael.nelson) wrote :

Thanks for the quick response William. I'm not sure why you say "The others failed to start because there was already a previous one running." though - if you check the log (some context at https://pastebin.canonical.com/64326/) it *looks* pretty clear that the previous run finished at 2012-04-12 19:35:53 - what am I missing?

William Grant (wgrant) wrote :

germanium's load average fluctuates between around 6 and 21. You can see from other startups that it sometimes takes a minute or probably more to initialize:

2012-04-12 19:40:13 INFO Creating lockfile: /var/lock/launchpad-generate-ppa-htaccess.lock
2012-04-12 19:41:08 INFO Starting the PPA .htaccess generation

That falls near the end of a several-hour period of roughly twice normal load, so it was probably exceptionally bad.

Assuming you're referring to this particular bit:

2012-04-12 19:35:49 INFO Committing transaction...
2012-04-12 19:35:53 INFO Finished PPA .htaccess generation
2012-04-12 19:36:33 INFO Logging INFO and higher messages to /srv/launchpad.net/production-logs/lp_publish/generate-ppa-htaccess.log
2012-04-12 19:36:33 INFO Creating lockfile: /var/lock/launchpad-generate-ppa-htaccess.lock
2012-04-12 19:37:14 INFO Logging INFO and higher messages to /srv/launchpad.net/production-logs/lp_publish/generate-ppa-htaccess.log
2012-04-12 19:37:14 INFO Creating lockfile: /var/lock/launchpad-generate-ppa-htaccess.lock
2012-04-12 19:37:35 INFO Starting the PPA .htaccess generation

.. it seems unlikely that it managed to start in just 20s this time. It's more likely that 19:37:14's failed to obtain the lock, and the 19:37:35 message is in fact from the 19:36:33 run.

David Pitkin (dpitkin) on 2012-04-18
tags: added: ca-escalated
David Pitkin (dpitkin) on 2012-05-23
Changed in launchpad:
assignee: nobody → David Pitkin (dpitkin)
assignee: David Pitkin (dpitkin) → nobody
Michael Nelson (michael.nelson) wrote :

lifeless had some ideas about fixing this in LP here: https://irclogs.canonical.com/2012/05/24/%23ca-internal.html#t08:50

Jonathan Lange (jml) wrote :

Summarizing lifeless's ideas:

 * The latency is due either to cron nature or to it simply taking a long time to generate the .htaccess (perhaps because it's too large)
 * cron nature can be fixed by switching to some kind of rabbit / celery mechanism
 * generating .htaccess can be improved by
   * optimizing the code
   * teaching apache to directly query the database
   * having a Y adapter which looks first on disk and if the record is not there asks the DB [and this would get new users immediately, and updates to disk would keep overall load low]

He recommends first measuring how long it takes to do updates for our popular PPAs. That will tell us if we should poke first at cron nature or the htaccess generation.

Michael Nelson (michael.nelson) wrote :

Two obvious points for optimisation:

Currently generate_ppa_access:
 * loads a dict into memory of all the PPAs requiring updating *with* all their tokens
 * generates the new htaccess as a temporary file for each of these PPAs (regardless of whether it could potentially have just appended if no tokens were removed), and replaces the existing one only if they are different

Michael Nelson (michael.nelson) wrote :

Another possibility (depending on the outcome of the measurements of course) is switching to DBM user authentication:

http://stackoverflow.com/questions/6590789/linux-htpasswd-file-too-big-more-than-2000-users

Robert Collins (lifeless) wrote :

DBM is roughly equivalent to querying in realtime (in that we wouldn't need an a-priori listing of all users), but harder (because we would need a message passing system to keep it up to date). We can just query a fast webservice, I suspect, with a small cache. We've done this before.

@noodles Loading all the PPA tokens into memory isn't an issue, because a single run only ever updates a few minutes worth of PPAs. That said, having N queries per archive is fine, so your branch isn't a problem (even though it increases query count).

The current driver for exceptional times is the late evaluation of person, which I have fixed in the attached branch. We should see how that runs and then investigate further if needed.

William Grant (wgrant) wrote :

My additional optimisations have been landed and cowboyed, and this should be fast enough again for the medium term.

Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Michael Nelson (michael.nelson)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed

On Thu, May 31, 2012 at 1:01 AM, Robert Collins
<email address hidden> wrote:
> DBM is roughly equivalent to querying in realtime (in that we wouldn't
> need an a-priori listing of all users), but harder (because we would
> need a message passing system to keep it up to date). We can just query
> a fast webservice, I suspect, with a small cache. We've done this
> before.
>
> @noodles Loading all the PPA tokens into memory isn't an issue, because
> a single run only ever updates a few minutes worth of PPAs. That said,
> having N queries per archive is fine, so your branch isn't a problem
> (even though it increases query count).
>

Cool - thanks Robert. We weren't sure whether loading 21k tokens into
memory (for one PPA) would be an issue until the data collection
branch was finished, and so decided to push the branch forward without
knowing for sure. Glad it's sorted now.

Changed in launchpad:
assignee: Michael Nelson (michael.nelson) → nobody
William Grant (wgrant) on 2012-05-31
tags: added: qa-ok
removed: qa-needstesting
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → William Grant (wgrant)
tags: added: qa-needstesting
removed: qa-ok
William Grant (wgrant) on 2012-05-31
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant) on 2012-06-01
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers