use mirrorbrain for download management instead of broken mirror-choosing by timezone

Bug #913235 reported by Jeff Johnson
This bug affects 1 person
Affects    Status     Importance  Assigned to  Milestone
RPM        New        Undecided   Unassigned
Mandriva   Confirmed  Wishlist

Bug Description

geolocation tuning for package repositories

Tags: mageia repo
Revision history for this message
In , Lohmaier+mageia (lohmaier+mageia) wrote :

As mageia did not include
https://qa.mandriva.com/show_bug.cgi?id=56879

urpm still relies on the timezone to choose a mirror, and there is still no working fallback in urpmi either. So when that one mirror is not up-to-date or has some other failure, updating/installing a package will fail.

That's why I propose to drop the client-side mirror choosing altogether and use mirrorbrain instead.

Mirrorbrain itself doesn't need much resources, but has the benefit of

* checking mirrors whether they are up-to-date
* handing out the geographically closest mirror/mirror within the same ASN if possible, with the option to prefer powerful mirrors over weaker ones by giving the mirrors appropriate scores
* creation of mirrorlists
* managing of mirrors in one place
* can create torrents as well (as one part of bug#890)
* and as all links are passed through one instance, it is easier to create downloadstats (bug #2330)
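To make the selection idea in the list above more concrete, here is a minimal sketch of how a redirector could choose a mirror. It is not MirrorBrain's actual algorithm; the Mirror fields, the pick_mirror name and the exact preference order (same ASN, then same country, then any mirror, weighted by score) are assumptions made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Mirror:
    url: str
    country: str      # two-letter country code of the mirror
    asn: int          # autonomous system number the mirror lives in
    score: int        # higher score = more capable mirror
    up_to_date: bool  # outcome of the last scan of this mirror


def pick_mirror(mirrors, client_country, client_asn, rng=random):
    """Prefer a mirror in the client's ASN, then country, then any mirror.

    Within the preferred group the choice is random but weighted by score,
    so powerful mirrors are handed out more often than weak ones.
    """
    candidates = [m for m in mirrors if m.up_to_date]
    if not candidates:
        return None
    groups = (
        [m for m in candidates if m.asn == client_asn],
        [m for m in candidates if m.country == client_country],
        candidates,
    )
    for group in groups:
        if group:
            weights = [m.score for m in group]
            return rng.choices(group, weights=weights, k=1)[0]
```

Mirrors that failed the up-to-date check are never handed out, which is the property that matters most for the urpmi failure described above.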

With mirrorbrain, the problem with the installer also goes away: you can only give one specific URL when adding media in the installer, because the installer is not aware of the mirrorlist method.

Revision history for this message
In , Rdalverny (rdalverny) wrote :

Can this be deployed alongside the current setup?

Revision history for this message
In , Lohmaier+mageia (lohmaier+mageia) wrote :

Yes - mirrorbrain is an apache module. The individual mirrors would still get the stuff via rsync; mirrorbrain is not involved in that part. All mirrors can be used individually as they are now. So if you have a server that can run apache and has the mirrored files, you have all you need to set it up.

The only thing that needs to change would be that the "mirrorlist" method would only return one single URL, namely the mirrorbrain one. (Or, if you want to play ultra, ultra safe, just provide the mirrorbrain URL as an additional mirror and advertise it on the webpage/forum/blogs instead of "forcing" it on users as an initial step, and switch the mirrorlist method over once you have gained confidence in it.)

LibreOffice as well as OpenOffice.org before for example also use mirrorbrain.

You got one single URL per download:

http://download.documentfoundation.org/libreoffice/old/3.4.4.1/mac/ppc/LibO_3.4.4rc1_MacOS_PPC_install_en-US.dmg

but depending on where the access comes from, the user will be redirected to the actual mirror. You can append ".mirrorlist" to see which choice it made and what alternative mirrors you could use (as well as md5, sha1sum and some additional stuff)

http://download.documentfoundation.org/libreoffice/old/3.4.4.1/mac/ppc/LibO_3.4.4rc1_MacOS_PPC_install_en-US.dmg.mirrorlist

(SuSE also uses it for its mirroring btw)

For users who already have a manual mirror configured, nothing will change, this is still possible.

Mirrorbrain won't interfere with the distribution of files to the mirrors and won't remove the option to manually specify a mirror. You can do a smooth transition as described above (set it up and advertise mirrorbrain as an experimental method, i.e. invite users to change their media sources to the mirrorbrain URL, and after some time change the mirrorlist definition to only include the mirrorbrain URL), but it really doesn't demand many resources, so you don't need to be afraid of it generating too much load.

Revision history for this message
In , Marja11 (marja11) wrote :

Sounds good :)

@ remmy

Just curious... WDYT?

Revision history for this message
In , Remco Rijnders (remco-p) wrote :

It sounds good to me. One thing I notice though is this in the FAQ on mirrorbrain:

  Is only HTTP supported?

  No — FTP mirrors are fully supported, in addition to HTTP. Furthermore, BitTorrent can be integrated via Metalinks.

  To scan mirrors for their content, rsync is used. It is the most efficient method for that purpose. However, if rsync isn't available on a mirror, FTP and HTTP can be used as fallback.

However, looking at http://mirrors.mageia.org/ it seems only about half of the mirrors offer rsync access. I'm wondering what the difference in "efficiency" is between these methods for both the mageia server as well as the mirror.

Revision history for this message
In , Lohmaier+mageia (lohmaier+mageia) wrote :

In terms of server load or server resources it doesn't really matter. But if mirrorbrain cannot use rsync to check which files a mirror carries, it has to parse the returned HTML and manually iterate over all directories. So if you have hundreds of directories, hundreds of directory listings have to be parsed.

And while mirrorbrain knows about the "big" webserver implementations (apache, etc), there might be modified directory listings in use by some of your mirrors that could make mirrorbrain's html-parsing fail (and thus you might have to tweak mirrorbrain a little to accommodate for those mirrors)
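For a feel of what the HTML fallback involves, here is a small sketch that pulls directory and file names out of one Apache-style index page. It is not MirrorBrain's scanner; the ListingParser and parse_listing names and the filtering rules are assumptions for illustration, and a customised listing layout could easily defeat them, which is exactly the risk described above.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the href targets of a single directory index page."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip column-sort links ("?C=N;O=D"), links that leave the
        # directory ("/..." or "../") and absolute URLs.
        if not href or href.startswith(("?", "/", "../")) or "://" in href:
            return
        self.entries.append(href)


def parse_listing(html):
    """Return (subdirectories, files) named in one listing page."""
    parser = ListingParser()
    parser.feed(html)
    dirs = [e for e in parser.entries if e.endswith("/")]
    files = [e for e in parser.entries if not e.endswith("/")]
    return dirs, files
```

A scanner would have to call something like this once per directory and recurse into every subdirectory, whereas rsync can return the whole file list in one pass.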

Revision history for this message
In , Rdalverny (rdalverny) wrote :

That's about 50% of current mirrors that provide rsync. We can deploy this first for mirrors providing rsync and see later what we do.

Revision history for this message
In , Andre999mga (andre999mga) wrote :

(In reply to comment #6)
> That's about 50% of current mirrors that provide rsync. We can deploy this
> first for mirrors providing rsync and see later what we do.

Note that about 2/3 of the faster (>1 Gb/s) mirrors have rsync.

Having a setup that first downloads from a default mirror, and then uses rsync from other mirrors for any missing packages, would be efficient.

Note that downloading a single large package (such as Openoffice or Libreoffice) is not the same problem as downloading a multitude of relatively small packages, which would be typical of most Mageia downloads. Excepting ISO's, of course.

The geolocation of mirrorbrain could be useful for deciding the initial mirror. Just don't use the directory download if rsync is not available on this initial mirror.

Revision history for this message
In , Lohmaier+mageia (lohmaier+mageia) wrote :

to be clear on this: the scanning will have no impact on the regular user. the user will never be provided an rsync URL by mirrorbrain - mirrorbrain will hand out the http-URL (or in case of a ftp-only mirror the ftp-URL).

scanning is used to check "Ah, this mirror has file path/to/X", so when a user requests mirrorbrain.mageia.org/path/to/X it can use that mirror as a possible target.
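The bookkeeping described here can be pictured as a simple index from file path to the mirrors known to carry it. The sketch below is only an illustration (MirrorBrain keeps this in a database; the FileIndex class and its method names are made up), and the mirror URLs in the usage note are placeholders.

```python
from collections import defaultdict


class FileIndex:
    """Remember which mirrors were seen carrying which file."""

    def __init__(self):
        self._mirrors_by_path = defaultdict(set)

    def record_scan(self, mirror_base, paths):
        """Replace what is known about one mirror with a fresh scan result."""
        for holders in self._mirrors_by_path.values():
            holders.discard(mirror_base)
        for path in paths:
            self._mirrors_by_path[path].add(mirror_base)

    def candidates(self, path):
        """HTTP/FTP download URLs of mirrors known to carry `path`."""
        holders = self._mirrors_by_path.get(path, ())
        return sorted(
            base.rstrip("/") + "/" + path.lstrip("/") for base in holders
        )
```

For example, after index.record_scan("http://m1.example/mageia", ["path/to/X"]), a request for path/to/X can be redirected to http://m1.example/mageia/path/to/X, while a path no mirror has been seen with yields no candidates and would be served from the origin instead.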

What André suggests is a different downloading system, in my opinion once again "too smart"/not worth the effort.
rpm packages in general are small, so using download techniques that request the very same file from different mirrors is pointless/a waste of resources. Better to run downloads of multiple different files in parallel.
In the same vein, using rsync to get a set of files from a single mirror is also not really that much of an improvement (always assuming the use case of installing updates, which is what mirrors are mostly used for; the number of packages to update is usually not that big, except maybe after initial installation). If your line is fat, I'd rather use parallel downloads from multiple mirrors.
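As a rough illustration of "multiple different files in parallel" (this is not how urpmi downloads today; the fetch and download_all helpers are made up for this sketch), each URL is fetched whole by one worker, and several workers run at once:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download one file completely and return its bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, max_workers=4):
    """Fetch several different files concurrently; returns {url: bytes}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fetch_one, urls)))
```

No file is split across mirrors here; the speed-up comes only from having several small downloads in flight at the same time.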

But having written all that: This is out-of-scope of this issue. Changing the download-intelligence would be a separate step, independent of setting up mirrorbrain.

Revision history for this message
In , Jeff Johnson (n3npq) wrote :
tags: added: mageia repo
Revision history for this message
In , Rdalverny (rdalverny) wrote :

(In reply to comment #9)
> tracked at https://bugs.launchpad.net/rpm/+bug/913235

Off-topic, but... what for?

Changed in mandriva:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
In , Jeff Johnson (n3npq) wrote :

for ROADMAP planning @rpm5.org. bugs == bugs wrto *.rpm packaging

Revision history for this message
In , Remco Rijnders (remco-p) wrote :

@Romain

Can we assign this to you or the webteam?

Revision history for this message
In , Rdalverny (rdalverny) wrote :

I won't be able to work on this for some time. You can assign it to webteam, it's left for someone to pick it up.

Revision history for this message
In , Marja11 (marja11) wrote :

(In reply to comment #13)
> I won't be able to work on this for some time. You can assign it to webteam,
> it's left for someone to pick it up.

Thx, assigning

Revision history for this message
In , Marja11 (marja11) wrote :

Hi,

This bug was filed against cauldron, but we do not have cauldron at the moment.

Please report whether this bug is still valid for Mageia 2.

Thanks :)

Cheers,
marja

Revision history for this message
In , Rdalverny (rdalverny) wrote :

This is not relevant to Cauldron only, changing product.

Changed in mandriva:
importance: Medium → Wishlist
Revision history for this message
In , Marja11 (marja11) wrote :

(In reply to comment #13)
> I won't be able to work on this for some time. You can assign it to webteam,
> it's left for someone to pick it up.

(In reply to comment #16)
> This is not relevant to Cauldron only, changing product.

@ Romain,

When the product was changed (btw, thanks for doing that), the assignee was automatically changed along with it. The assignee is now sysadmin-bugs. Do you want it to be assigned back to webteam, or to stay assigned to sysadminteam?

Revision history for this message
In , Thierry-vignaud (thierry-vignaud) wrote :

Anyway, we need some support on the web site prior to being able to test patches...

Revision history for this message
In , Rdalverny (rdalverny) wrote :

What support do you need from the web site? (or do you mean, support from an installed mirrorbrain instance?)

Revision history for this message
In , Thierry-vignaud (thierry-vignaud) wrote :

Yes.
@Christian Lohmaier: For geolocation vs timezone picking of mirrors, I think it would be best to just do geolocation rather than asking users to manually pick their location.

Revision history for this message
In , Lohmaier+mageia (lohmaier+mageia) wrote :

@Thierry: Picking timezone as in the bug mentioned in the initial description has nothing to do with mirrorbrain.

Mirrorbrain does geolocation by IP, so depending on the timezone is not necessary at all.

I did create the patch for https://qa.mandriva.com/show_bug.cgi?id=56879 because Mandriva (and Mageia) does *not* use mirrorbrain, but instead *does* depend on the timezone.

Currently, the $MIRRORLIST method uses the system's timezone-city as reference as to what mirror to use, and this is stupid, at least in central Europe (where there are many mirrors that are much closer than your country's capital city)

To make it clear: Mirrorbrain doesn't depend on any user-configured stuff. It decides which mirror to choose based on the IP that is used. It does geolocation by IP.

Ordered from worst to best:
* location reference point is taken from the timezone (urpmi as it is now)
* user has the option to manually specify his actual location (urpmi with patch, patch is available)
* user doesn't have to bother, but the closest mirror is assigned by having the download server examine the IP that is used to connect (mirrorbrain on server, no patch to urpmi necessary, but no mirrorbrain installed yet)

If you meant that urpmi should query the geolocation on the user's machine and use that instead, this would be an alternative method, but of course without the other benefits that mirrorbrain would bring. And this way would also require Mageia to set up an appropriate service that would return the location on request (or you would have to require a geolocation-database package to be installed on the user's system, not very nice...)
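A small sketch of why the reference point matters: the same "closest mirror" rule gives different answers depending on whether it is fed the timezone city or the user's real position. The two mirror names are hypothetical and all coordinates are approximate; the distance function is the standard haversine formula, not anything taken from urpmi or mirrorbrain.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle distance between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def closest(mirrors, reference):
    """Name of the mirror nearest to the reference point."""
    return min(mirrors, key=lambda name: distance_km(mirrors[name], reference))


# Hypothetical mirrors with approximate coordinates.
MIRRORS = {"berlin-mirror": (52.52, 13.40), "zurich-mirror": (47.38, 8.54)}

timezone_city = (52.52, 13.40)  # Europe/Berlin resolves to Berlin
actual_user = (47.99, 7.85)     # a user in south-west Germany

by_timezone = closest(MIRRORS, timezone_city)
by_location = closest(MIRRORS, actual_user)
```

With the timezone as reference the user is sent to the mirror in the capital, even though a mirror across the border is several hundred kilometres closer.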

Revision history for this message
In , Thierry-vignaud (thierry-vignaud) wrote :

That's exactly what I wrote.
Using mirrorbrain is totally orthogonal and has nothing to do with the patches you posted for now

Revision history for this message
In , Pascal Terjan (pterjan42) wrote :

*** Bug 11454 has been marked as a duplicate of this bug. ***

Revision history for this message
Per Øyvind Karlsen (proyvind) wrote :

@Lohmaier, urpmi is doing user side geolocalization based on user selected timezone, removing the need for carrying additional geoip databases on user side.

What one actually misses by not using mirrorbrain is a neat mirroring utility that can be used together with urpmi, rather than the current mirror API that urpmi is using.

FWIW mirrorbrain support has just recently been introduced to the original upstream branch of urpmi. :)

Revision history for this message
In , Luigiwalser (luigiwalser) wrote :

Note that Per Øyvind Karlsen has implemented mirrorbrain support in urpmi in his branch, which may be of some use.

Revision history for this message
In , Marja11 (marja11) wrote :

Adjusting summary, there's more than one reason to wish for it

Revision history for this message
In , Neal Gompa (ngompa13) wrote :

We also probably want to have an automatic redirector like Fedora's download.fedoraproject.org site, which automatically redirects to the best mirror that can provide a given directory/file.

For example: https://download.fedoraproject.org/pub/fedora/linux/releases/23/Workstation/x86_64/iso/Fedora-Workstation-netinst-x86_64-23.iso

The above link will redirect to any one of Fedora's 400+ mirrors that offer ISOs that provides a high quality connection to me (low latency and high throughput). In my case, it automatically redirected me to: http://mirror.chpc.utah.edu/pub/fedora/linux/releases/23/Workstation/x86_64/iso/Fedora-Workstation-netinst-x86_64-23.iso

This redirector applies to anything replicated out to mirrors, and can be used to offer an automatic mirror director in a way that is transparent to the tool. This can be used as the repo URL in urpmi, for instance.

For example, https://download.mageia.org/mageia/distrib/cauldron/x86_64/ could be in /etc/urpmi/mediacfg.d/Devel-6-x86_64/url, which would automatically redirect to the correct mirror. In my case, that is http://mirrors.kernel.org/mageia/distrib/cauldron/x86_64/. From there, urpmi should work as intended.

Revision history for this message
In , Filip-komar (filip-komar) wrote :

(In reply to Neal Gompa from comment #26)
> We also probably want to have an automatic redirector like Fedora's
> download.fedoraproject.org site, which automatically redirects to the best
> mirror that can provide a given directory/file.

We already use a simple download redirector[0] for almost all our ISO files[1] and for PDF and EPUB doc files[2].
But once the mirrorbrain infrastructure is in place and well tested, I'll do my best to implement it on the mentioned web pages.
The current redirector is working well now, but it relies on generated lists[3][4]. A refresh of those lists[5] is triggered manually.

[0] http://gitweb.mageia.org/web/www/tree/en/downloads/get/index.php
[1] http://www.mageia.org/en/downloads/get/index.php?q=Mageia-5-i586-DVD.iso
[2] https://www.mageia.org/doc/
[3] http://gitweb.mageia.org/web/www/tree/lib/cached.list.php
[4] http://gitweb.mageia.org/web/www/tree/lib/cached.list_doc.php
[5] http://gitweb.mageia.org/web/www/tree/tools/update-mirrors-list.php

Revision history for this message
In , Lohmaier+mageia (lohmaier+mageia) wrote :

(In reply to Neal Gompa from comment #26)
> We also probably want to have an automatic redirector like Fedora's
> download.fedoraproject.org site, which automatically redirects to the best
> mirror that can provide a given directory/file.

mirrorbrain does that already, by geolocation and by optionally giving custom weights to mirrors or limiting mirrors to only serve a specific region or network block/ASN.

> This redirector applies to anything replicated out to mirrors,

No, it applies to anything, whether mirrored or not. Just add something to the URL and you will still be redirected. This is the difference from mirrorbrain: mirrorbrain only redirects to mirrors known to have the file, and doesn't redirect on directory listings.

> For example, https://download.mageia.org/mageia/distrib/cauldron/x86_64/
> could be in /etc/urpmi/mediacfg.d/Devel-6-x86_64/url,

Same when mirrorbrain is used.

if urpmi knows to follow redirects, then there's no difference.

I'd prefer being able to browse the representative/official directory listing on download.mageia.org and only be redirected for actual downloads, than being instantaneously redirected to a mirror where I cannot tell whether it is up-to-date/when it synced last/whether it has all files.

In this regard mirrorbrain is "superior", as you have one definite filelisting that represents the state of the repository, no matter whether all mirrors did sync already or not. If a mirror doesn't carry the requested file, you won't be redirected to it. This solves the "repository-info updated, but rpms not synced yet problems", and any other random sync failures that mirrors encounter.

see e.g. http://download.documentfoundation.org/libreoffice/stable/5.1.1/rpm/x86_64/LibreOffice_5.1.1_Linux_x86-64_rpm.tar.gz (TDF uses mirrorbrain) - that url redirects to a specific mirror, but still allows you to browse http://download.documentfoundation.org/libreoffice/stable/5.1.1/rpm/x86_64/
If you want to manually pick a mirror, you use the "Details" link in the listing, or just append ".mirrorlist" to the file's URL:
http://download.documentfoundation.org/libreoffice/stable/5.1.1/rpm/x86_64/LibreOffice_5.1.1_Linux_x86-64_rpm.tar.gz.mirrorlist
→ that will show various hashes (which you could also request directly, e.g. by appending .sha256 for the sha256sum: http://download.documentfoundation.org/libreoffice/stable/5.1.1/rpm/x86_64/LibreOffice_5.1.1_Linux_x86-64_rpm.tar.gz.sha256 ) and links to torrents (if enabled; TDF also runs a tracker, and the torrents have some mirrors as webseeds, so BitTorrent clients can download even if the tracker is down or there are no seeds. As you might have guessed, if you append .torrent to the file URL, you get the torrent file).
It also shows (a selection of) suitable mirrors (as determined by your request's geoIP data) that you can pick from.
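The suffix convention is simple enough to capture in a few lines. The sketch below only covers the suffixes named in this thread (.mirrorlist, .sha256, .torrent, .metalink); the companion_urls name is made up, and whether a given server has each of these enabled is an assumption.

```python
SUFFIXES = ("mirrorlist", "sha256", "torrent", "metalink")


def companion_urls(file_url):
    """Map each suffix to the URL obtained by appending it to the file URL."""
    return {suffix: f"{file_url}.{suffix}" for suffix in SUFFIXES}
```

For a file URL ending in foo.rpm this yields foo.rpm.mirrorlist, foo.rpm.sha256 and so on, all served by the same mirrorbrain host rather than by an individual mirror.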

Revision history for this message
In , Lohmaier+mageia (lohmaier+mageia) wrote :

oh, and I forgot: the redirect response not only contains the url to the mirror, but also contains metadata - just try a

curl -I http://download.documentfoundation.org/libreoffice/stable/5.1.1/rpm/x86_64/LibreOffice_5.1.1_Linux_x86-64_rpm.tar.gz

That not only gets you the redirect to a matching mirror, you also get a list of alternative mirrors, in case the first one has a power outage, is down for maintenance, or is not reachable for some other reason.
The hashes are included as well (base64 encoded).

and you can go fancy by selecting what type you want by using accept headers like e.g.

curl -H "Accept: application/metalink+xml" http://download.documentfoundation.org/libreoffice/stable/5.1.1/rpm/x86_64/LibreOffice_5.1.1_Linux_x86-64_rpm.tar.gz

(will not redirect, but instead get you metalink xml - same as you'd get if you had requested file.metalink)
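A client could use that metadata for fallback and verification. The sketch below assumes the headers follow the Metalink/HTTP convention (RFC 6249): alternative mirrors as Link headers with rel=duplicate and a pri parameter, and hashes as Digest headers of the form ALGO=base64. The function names are made up, and the exact header layout of a given server should be checked against a real curl -I response before relying on it.

```python
import base64
import re

LINK_RE = re.compile(r"<([^>]+)>\s*;\s*(.*)")


def parse_link(value):
    """Split one Link header value into (url, {parameter: value})."""
    match = LINK_RE.match(value.strip())
    if not match:
        return None
    url, rest = match.groups()
    params = {}
    for part in rest.split(";"):
        key, _, val = part.strip().partition("=")
        if key:
            params[key] = val.strip('"')
    return url, params


def alternative_mirrors(link_values):
    """URLs marked rel=duplicate, lowest pri (most preferred) first."""
    parsed = [p for p in map(parse_link, link_values) if p]
    duplicates = [
        (int(params.get("pri", 999999)), url)
        for url, params in parsed
        if params.get("rel") == "duplicate"
    ]
    return [url for _, url in sorted(duplicates)]


def digest_hex(digest_value):
    """Turn 'SHA-256=<base64>' into ('SHA-256', '<hex digest>')."""
    algo, _, encoded = digest_value.partition("=")
    return algo, base64.b64decode(encoded).hex()
```

The hex form returned by digest_hex can be compared directly with the output of sha256sum on the downloaded file, and alternative_mirrors gives the order in which to retry if the first mirror fails.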

Revision history for this message
In , Thomas Backlund (tmb) wrote :

Just a small update...

I now have a mirrorbrain running on dl.mageia.org
(but I haven't pushed it on public dns yet as I want to review my changes after some sleep to verify I haven't missed anything...)

for now I only use pseudo tree on dl.mageia.org until I get extra disks installed

Revision history for this message
In , Olav Vitters (ovitters) wrote :

I've just obsoleted MirrorBrain in Cauldron as it hasn't been updated (or developed) since 2015. The alternative seems to be MirrorManager2, by Fedora infrastructure team. Either this bug needs to be updated or WONTFIX and a new bug.

Revision history for this message
In , Fri-5 (fri-5) wrote :

Kind of duplicate:

Bug 17400 - Deploy and configure MirrorBrain to manage mirrors and generate metalinks for DNF
