Launchpad may be polling Apache SVN server too much

Bug #327126 reported by Chris Jones
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Tim Penhey

Bug Description

It appears that the Canonical netblock (91.189.88.0/21) is unable to access svn.apache.org:80.
The obvious assumption is that they've firewalled us, which could be explained by Launchpad's code mirroring talking to their servers more frequently than they consider reasonable[0].

Can we limit the frequency with which we poll *all* branches on a given server? Since they host a number of projects, we are presumably polling quite a lot of branches on svn.apache.org.
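One way to approach this is to throttle per host rather than per branch. The sketch below is a hypothetical illustration, not Launchpad's code: the `HostThrottle` class, its `try_acquire` method and the one-hour default (taken from the once-an-hour guideline in [0]) are all assumptions, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical per-host rate limiter: at most one poll per host per interval."""

    def __init__(self, min_interval: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval  # seconds between polls of the same host
        self.clock = clock  # injectable so tests can control time
        self._last_poll: Dict[str, float] = {}  # hostname -> time of last allowed poll

    def try_acquire(self, branch_url: str) -> bool:
        """Return True and record the poll if this branch's host may be polled now."""
        host = urlsplit(branch_url).hostname or ""
        now = self.clock()
        last = self._last_poll.get(host)
        if last is not None and now - last < self.min_interval:
            return False  # another branch on this host was polled too recently
        self._last_poll[host] = now
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    throttle = HostThrottle(clock=lambda: fake_now[0])
    print(throttle.try_acquire("http://svn.apache.org/repos/asf/foo/trunk"))  # True
    print(throttle.try_acquire("http://svn.apache.org/repos/asf/bar/trunk"))  # False
    fake_now[0] = 3600.0
    print(throttle.try_acquire("http://svn.apache.org/repos/asf/bar/trunk"))  # True
```

A refused branch would simply be left for a later scheduling pass, so the number of branches on one host no longer multiplies the polling rate. The trade-off is that with N branches on a single host, each branch is refreshed only roughly every N hours.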

Separately from that, we presumably now also need a dialogue with the apache.org maintainers to re-open the port.

[0] http://www.apache.org/dev/version-control.html#poll

James Troup (elmo) wrote:

If you guys could look into whatever we are importing (or trying to import) from apache.org, that'd be great. In the meantime, I've contacted one of the SAs for apache.org to see if they can tell us why it's blocked.

Changed in launchpad:
importance: Undecided → High
Michael Hudson-Doyle (mwhudson) wrote:

Yes, it seems likely that we're connecting a bit more often than once an hour, as we are currently trying to import 12 projects once every 6 hours, not to mention that cscvs doesn't use persistent connections:

https://pastebin.canonical.com/13707/

(a review_status of 20 means active in this context).

I was aware that the apache admins had a propensity to get upset over this, and _I_ have avoided approving imports from there. I guess I didn't pass this information on as well as I could have done to others who review imports.

"update codeimport set review_status = 30 where svn_branch_url like '%apache%'" will stop all import attempts.
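The effect of that statement can be tried out on a throwaway table. This sketch uses an in-memory SQLite database with a cut-down, hypothetical `codeimport` table and made-up URLs; the real Launchpad schema lives in PostgreSQL and has many more columns. Note that `LIKE '%apache%'` matches any URL containing "apache", not only svn.apache.org ones.

```python
import sqlite3

# Hypothetical, cut-down stand-in for Launchpad's codeimport table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE codeimport (id INTEGER PRIMARY KEY,"
    " svn_branch_url TEXT, review_status INTEGER)"
)
conn.executemany(
    "INSERT INTO codeimport (svn_branch_url, review_status) VALUES (?, ?)",
    [
        ("http://svn.apache.org/repos/asf/project-a/trunk", 20),  # 20 = active
        ("http://svn.apache.org/repos/asf/project-b/trunk", 20),
        ("http://svn.example.org/repos/other/trunk", 20),
    ],
)

# The statement quoted above; per the comment, status 30 stops import attempts.
cur = conn.execute(
    "UPDATE codeimport SET review_status = 30 WHERE svn_branch_url LIKE '%apache%'"
)
print(cur.rowcount)  # 2

for row in conn.execute(
    "SELECT svn_branch_url, review_status FROM codeimport ORDER BY id"
):
    print(row)
```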

If you can convince the admins that connection count is not a very good measure of the load we're placing on the server, that would be good (cscvs makes lots of connections, but they're all very small and cheap). I had a hard time getting the admin I spoke to even to consider this.

If we want to import from svn.apache.org (and we probably do), maybe we can keep an svnsync-ed copy of the repo somewhere? I imagine it would take up rather a lot of disk, though...

James Troup (elmo) wrote:

They claimed to have been seeing 500K hits/day before they blocked us. Does that seem plausible given the number of projects we're trying to import? I'm still waiting for further details from them, as so far they haven't been able to give me an IP or any sort of timeline.

Michael Hudson-Doyle (mwhudson) wrote:

Yes, that number is plausible. Code imports would only have come from two IPs; nice of them to block all of our machines in response to this :/
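A rough breakdown shows why the figure is plausible. Assuming every hit came from the 12 imports mentioned earlier, each run every 6 hours (an assumption, since the admins had not yet said which IPs or timeline were involved), 500K hits/day works out to roughly 10,000 requests per import run. The function name below is hypothetical.

```python
def hits_breakdown(hits_per_day: int, projects: int, interval_hours: float) -> dict:
    """Split a daily hit count into per-hour, per-second and per-import-run figures."""
    runs_per_day = projects * (24 / interval_hours)
    return {
        "runs_per_day": runs_per_day,
        "hits_per_hour": hits_per_day / 24,
        "hits_per_second": hits_per_day / 86_400,
        "hits_per_run": hits_per_day / runs_per_day,
    }


for name, value in hits_breakdown(500_000, projects=12, interval_hours=6).items():
    print(f"{name}: {value:,.1f}")
# runs_per_day: 48.0
# hits_per_hour: 20,833.3
# hits_per_second: 5.8
# hits_per_run: 10,416.7
```

About 10,000 small requests per run is within reach of a tool that opens a fresh connection for each request against a large repository.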

Jonathan Lange (jml)
Changed in launchpad-bazaar:
status: New → Triaged
Jelmer Vernooij (jelmer) wrote:

Now that bzr-svn is being used for code imports, we shouldn't need that many hits anymore: just one or two TCP connections per Subversion branch that we import (bzr-svn does try to use persistent connections, but sometimes needs two concurrent connections).
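For comparison, that figure gives an upper bound on daily connections. Assuming the same 12 imports every 6 hours mentioned earlier and at most 2 connections per import run (both numbers taken from this thread; the function name is hypothetical), the total stays under 100 connections a day:

```python
def max_connections_per_day(branches: int, interval_hours: float,
                            connections_per_run: int = 2) -> float:
    """Upper bound on daily connections if each import run opens at most N connections."""
    runs_per_day = branches * (24 / interval_hours)
    return runs_per_day * connections_per_run


print(max_connections_per_day(12, 6))  # 96.0
print(max_connections_per_day(12, 1))  # 576.0, even if every branch were imported hourly
```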

Since there are quite a few branches in their repository and they ask that scripts pull at most once an hour, perhaps we should consider maintaining a local clone of their svn repository to work from, synced from theirs on an hourly basis.

As discussed with mwhudson on IRC, this would require:
* setting up the mirror and adding a cron job that keeps it up to date using svnsync
* having a way in the importer scripts to automatically rewrite http://svn.apache.org/repos/asf -> local mirror URL
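The two pieces could look roughly like this. It is only a sketch: the mirror path, the `file://` URL and the function names are hypothetical. The `svnadmin create`, `svnsync initialize` and `svnsync synchronize` subcommands are the standard Subversion ones; svnsync also requires the mirror to have a `pre-revprop-change` hook that permits revision-property changes, which this sketch does not create. The commands are built but not run, so the cron job would be whatever executes `sync_command()` hourly (for example a crontab entry starting with `0 * * * *`).

```python
from typing import List

APACHE_ROOT = "http://svn.apache.org/repos/asf"
MIRROR_PATH = "/srv/svn-mirrors/asf"  # hypothetical local mirror location
MIRROR_URL = "file://" + MIRROR_PATH


def setup_commands(mirror_path: str = MIRROR_PATH,
                   source: str = APACHE_ROOT) -> List[List[str]]:
    """One-off commands to create the mirror and point svnsync at the upstream repo.

    The mirror also needs a pre-revprop-change hook that exits 0 (not shown).
    """
    return [
        ["svnadmin", "create", mirror_path],
        ["svnsync", "initialize", "file://" + mirror_path, source],
    ]


def sync_command(mirror_path: str = MIRROR_PATH) -> List[str]:
    """Command a cron job would run (e.g. hourly) to pull new upstream revisions."""
    return ["svnsync", "synchronize", "file://" + mirror_path]


def rewrite_svn_url(url: str, upstream: str = APACHE_ROOT,
                    mirror: str = MIRROR_URL) -> str:
    """Point imports of the upstream repository at the local mirror; leave others alone."""
    # Require an exact match or a "/" boundary so e.g. ".../asfx" is not rewritten.
    if url == upstream or url.startswith(upstream + "/"):
        return mirror + url[len(upstream):]
    return url


if __name__ == "__main__":
    for cmd in setup_commands():
        print(" ".join(cmd))
    print(" ".join(sync_command()))
    print(rewrite_svn_url("http://svn.apache.org/repos/asf/foo/trunk"))
```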

The initial clone of their svn repo would probably take about a week, given the size of the repo (almost 1M revisions).
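For scale, copying almost 1M revisions in about a week means replaying a little over 1.6 revisions per second around the clock. Both inputs are the estimates above, not measurements, and the function name is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def required_rate(revisions: int, seconds: int = SECONDS_PER_WEEK) -> float:
    """Revisions per second needed to copy `revisions` revisions within `seconds`."""
    return revisions / seconds


print(f"{required_rate(1_000_000):.2f} revisions/second")  # 1.65 revisions/second
```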

We'd need to contact them either way to get the IP block removed.

Since this would be quite a bit of work, I guess the question is how important these mirrors are.

Michael Hudson-Doyle (mwhudson) wrote:

FWIW, a WAG is that the repo takes up about 30 gigs, so big, but not THAT big.

tags: added: code-import
Tom Haddon (mthaddon)
tags: added: canonical-losa-lp
Tim Penhey (thumper) wrote:

This is now resolved. I've talked with the apache admins and we have been unblocked. We should, however, make sure that all Subversion imports from apache are using bzr-svn.
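A check along these lines could flag any apache import still on the old importer. The `CodeImport` record, its field names and the `"cscvs"`/`"bzr-svn"` labels are hypothetical stand-ins; in practice the data would come from the codeimport table.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CodeImport:
    """Hypothetical, minimal view of a code import record."""
    svn_branch_url: str
    importer: str  # e.g. "cscvs" or "bzr-svn"


def apache_imports_not_on_bzr_svn(imports: Iterable[CodeImport]) -> List[str]:
    """Return URLs of svn.apache.org imports that are not using bzr-svn."""
    return [
        ci.svn_branch_url
        for ci in imports
        if "svn.apache.org" in ci.svn_branch_url and ci.importer != "bzr-svn"
    ]


imports = [
    CodeImport("http://svn.apache.org/repos/asf/project-a/trunk", "bzr-svn"),
    CodeImport("http://svn.apache.org/repos/asf/project-b/trunk", "cscvs"),
    CodeImport("http://svn.example.org/repos/other/trunk", "cscvs"),
]
print(apache_imports_not_on_bzr_svn(imports))
# ['http://svn.apache.org/repos/asf/project-b/trunk']
```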

Changed in launchpad-code:
status: Triaged → Fix Released
assignee: nobody → Tim Penhey (thumper)
milestone: none → 10.05
Curtis Hovey (sinzui)
visibility: private → public