Make one of the *buntu wikis the "canonical" source for search engines

Bug #353400 reported by Matthew Nuzum
This bug affects 2 people
Affects: Ubuntu Website - OBSOLETE
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The kubuntu, ubuntu and edubuntu wikis each serve the same content but with different themes. This confuses search engines, so one search result may point to the edubuntu wiki, another to kubuntu and another to ubuntu. Which wiki a given result points to appears to be quite arbitrary.

The original report:

I've noticed for a while that when Ubuntu wiki pages are returned in Google
results, it's very often a wiki.edubuntu.org URL rather than a
wiki.ubuntu.com one. This is surprising, since it seems to happen even when
"ubuntu" is one of the search terms.

This indicates to me that there may be a problem with how wiki.ubuntu.com is
indexed by Google.

Is anyone else seeing this, and do you have an idea as to what might be
causing it?

A follow-up:

Yes, I encountered this yesterday with
<http://www.google.com/search?hl=en&q=%22Since+what+you+submitted+is+not+really+a+bug%22+-site%3Alaunchpad.net>.

Simpler examples (wiki.edubuntu.org version ranked first):
<http://www.google.com/search?q=64BitPIEDefaultSpec>
<http://www.google.com/search?q=CarInsurance%2Fnerskysmtovuukcmodsuftnq>

Counterexamples (wiki.ubuntu.com version ranked first):
<http://www.google.com/search?q=AimSixPreferencesInSQLiteMigrationTool>
<http://www.google.com/search?q=HandlingTooAggressivePowerManagement>
...
Though the text on these four wikis is similar, it's apparently not
similar enough for Google to consider the pages as variants (and thereby
trigger its "In order to show you the most relevant results, we have
omitted some entries very similar" message). So, each result gets ranked
independently based on its own incoming links.
...
I can think of three ways to fix this:

1. Retire wiki.edubuntu.org, redirecting it to wiki.ubuntu.com.

2. Link from <http://www.ubuntu.com/> to <https://wiki.ubuntu.com/>,
   thereby boosting the ranking of wiki.ubuntu.com pages in general.
   This might not be appropriate in itself, though, unless
   <https://wiki.ubuntu.com/> was rebranded as the "Ubuntu Developer
   Center" or similar.

3. Use the coincidentally-named rel="canonical" attribute
   <http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html>
   in the templates for wiki.edubuntu.org and wiki.kubuntu.org, to
   specify that wiki.ubuntu.com/whatever is the canonical version of
   wiki.edubuntu.org/whatever or wiki.kubuntu.org/whatever. This would
   be undesirable for pages that really were Edubuntu- or
   Kubuntu-specific, though.
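
As a rough illustration of option 3, the sketch below builds the `<link rel="canonical">` tag that a theme template could emit in the page head. It is not taken from the wiki's actual templates: the function name `canonical_link_tag`, the host constants, and the choice to drop the query string are all assumptions made for the example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: wiki.ubuntu.com is the preferred host and the other two are themed mirrors.
CANONICAL_HOST = "wiki.ubuntu.com"
MIRROR_HOSTS = {"wiki.edubuntu.org", "wiki.kubuntu.org"}


def canonical_link_tag(page_url):
    """Return a rel="canonical" tag for a mirror URL, or "" for any other host."""
    parts = urlsplit(page_url)
    if parts.hostname not in MIRROR_HOSTS:
        return ""
    # Keep only the page path; the query string and fragment are dropped on purpose.
    canonical = urlunsplit(("https", CANONICAL_HOST, parts.path, "", ""))
    return '<link rel="canonical" href="%s">' % escape(canonical, quote=True)


print(canonical_link_tag("http://wiki.edubuntu.org/whatever"))
```

Pages that really are Edubuntu- or Kubuntu-specific would need to be excluded from this mapping, which the sketch does not attempt.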

The email thread where this was discussed is at https://lists.canonical.com/archives/ubuntu-website/2009-March/thread.html#619

Matthew Nuzum (newz)
Changed in ubuntu-website:
status: New → Confirmed
Alan Bell (alanbell) wrote :

Yeah, this can be done: http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html
I think it could be implemented nicely with a moin macro, so on pages where we always want a particular variant to be indexed we could include <<Canonical(edubuntu)>> somewhere and it would sort it all out. For some pages we probably don't care which one gets indexed, and those would normally end up with the wiki.ubuntu.com branding.
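
A minimal sketch of the lookup such a macro would need, assuming the `<<Canonical(variant)>>` marker appears in the raw page text. This is plain Python and does not use the real MoinMoin macro API; `canonical_host_for`, `VARIANT_HOSTS`, and the fallback to the ubuntu variant are hypothetical names and choices for illustration only.

```python
import re

# Assumption: each variant name maps to the host that serves its theme.
VARIANT_HOSTS = {
    "ubuntu": "wiki.ubuntu.com",
    "edubuntu": "wiki.edubuntu.org",
    "kubuntu": "wiki.kubuntu.org",
}
DEFAULT_VARIANT = "ubuntu"

# Matches the marker proposed in the comment, e.g. <<Canonical(edubuntu)>>.
MACRO_RE = re.compile(r"<<Canonical\((\w+)\)>>")


def canonical_host_for(page_text):
    """Pick the host to advertise as canonical for a page's raw wiki text."""
    match = MACRO_RE.search(page_text)
    variant = match.group(1).lower() if match else DEFAULT_VARIANT
    # Unknown variant names fall back to the default host.
    return VARIANT_HOSTS.get(variant, VARIANT_HOSTS[DEFAULT_VARIANT])


print(canonical_host_for("Intro <<Canonical(edubuntu)>> more text"))
```

Pages without the marker resolve to wiki.ubuntu.com here, matching the suggestion that pages nobody cares about end up with the wiki.ubuntu.com branding.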

tags: added: light-moin-theme
Matt Zimmerman (mdz) wrote :

This seems to be getting worse, to the point where I often find that the wiki.ubuntu.com URLs aren't showing up at all on the first page.

Jonathan Carter (jonathan) wrote :

+1 on this. Additionally, I'd be happy if we could drop the Edubuntu theme altogether (it's outdated and not really necessary). It would also be nice if the wiki.edubuntu.org pages could redirect (with http) to the wiki.ubuntu.com pages, so that when users paste links, they get indexed correctly by the search engines and we don't have the same pages competing with each other for search results.

It also confuses users directly, i.e. users will paste the same page from both URLs and say something like "hey, here are two great pages I found on the topic!" when it's actually just the same page.
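
The HTTP redirect suggested above could look roughly like the following WSGI application, which would answer every request on the mirror host with a permanent redirect to the same path on wiki.ubuntu.com. This is a sketch only: the name `redirect_app`, the use of WSGI, and the choice of a 301 status are assumptions, not a description of how the wikis are actually deployed.

```python
from urllib.parse import quote

# Assumption: every mirror page has an equivalent at the same path on this host.
TARGET = "https://wiki.ubuntu.com"


def redirect_app(environ, start_response):
    """WSGI app that permanently redirects any request to the same path on TARGET."""
    path = environ.get("PATH_INFO") or "/"
    # WSGI hands over PATH_INFO as latin-1-decoded bytes; re-encode before quoting.
    location = TARGET + quote(path.encode("latin-1"))
    query = environ.get("QUERY_STRING", "")
    if query:
        location += "?" + query
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Length", "0")],
    )
    return [b""]
```

A permanent redirect also covers the pasted-link case, since anyone following a wiki.edubuntu.org link lands on the wiki.ubuntu.com URL.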

Matthew Nuzum (newz) wrote :

Not enough hours in the day. This is a good idea but I just don't see it as a priority.

Changed in ubuntu-website:
status: Confirmed → Won't Fix