update robots.txt to allow /lts/ubuntu-help/ indexing

Bug #1676780 reported by Peter Mahnke
This bug affects 1 person
Affects: Online publishing of the Ubuntu documentation
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

From: Yandex.Search support <email address hidden>
Date: Tue, 28 Mar 2017, 07:14
Subject: The availability of the site https://help.ubuntu.com for crawler
To: webmaster <email address hidden>

Hello!

I’m a representative of the leading search engine in Russia, Yandex LLC. (http://www.yandex.com ). We think that the content of your site is very important and would be very useful for the users of our search engine system.

Our specialists have detected a problem: some pages of your website are disallowed in your robots.txt file (https://help.ubuntu.com/robots.txt ) with the following directives:
Disallow: /lts/ubuntu-help/

Could you please specify the reasons for this restriction? Is it possible to remove it so that our robot could index pages similar to https://help.ubuntu.com/lts/ubuntu-help/power-hibernate.html and add them to the search base in order to allow Yandex search engine users to search for them and visit your website?

Once your site has been indexed by Yandex, it will appear in relevant search results not only on yandex.com (with users all over the world), but also in our national searches (tens of millions of users from Russia, Ukraine, Kazakhstan and Belarus will be able to find your site). Yandex is Russia's largest and world's seventh largest search engine and web portal with a workday audience of more than 27,6 million unique visitors from all over the world.

--
Sincerely yours, Platon
Yandex customer support
http://company.yandex.com/

Gunnar Hjalmarsson (gunnarhj) wrote :

We publish the docs both for the latest Ubuntu release and for previous but still supported releases, and many pages are identical or almost identical in all published releases. We found that some search engines were disinclined to index all versions of such pages, so we decided to point the search engines to the latest version of each page and disallow the others.

As an example we allow indexing of

https://help.ubuntu.com/stable/ubuntu-help/power-hibernate.html

but not of

https://help.ubuntu.com/lts/ubuntu-help/power-hibernate.html

or

https://help.ubuntu.com/14.04/ubuntu-help/power-hibernate.html

There is another rationale behind this measure: maintenance of the documentation often happens with some delay, so the version of a page for the latest release is often also the most accurate one for the latest LTS.

Hope that helps. In other words, the way we do it is intentional and not a bug, so closing.
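For anyone who wants to check the effect of such rules locally, here is a minimal sketch using Python's standard urllib.robotparser. The rules below are an assumption: only the "Disallow: /lts/ubuntu-help/" line is quoted in this report, while the "User-agent: *" line and the 14.04 rule are inferred from the examples above and may not match the real robots.txt. The names ASSUMED_ROBOTS_TXT and is_indexable are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: only the /lts/ubuntu-help/ line is quoted in the bug report.
# The User-agent line and the 14.04 rule are guesses based on the comment
# above, not a copy of the live robots.txt on help.ubuntu.com.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /lts/ubuntu-help/
Disallow: /14.04/ubuntu-help/
"""


def is_indexable(url, robots_txt=ASSUMED_ROBOTS_TXT, agent="*"):
    """Return True if the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


BASE = "https://help.ubuntu.com"
PAGE = "ubuntu-help/power-hibernate.html"

# Check the same page under the three version prefixes mentioned above.
results = {
    version: is_indexable(f"{BASE}/{version}/{PAGE}")
    for version in ("stable", "lts", "14.04")
}

for version, allowed in results.items():
    print(f"/{version}/{PAGE}: {'allowed' if allowed else 'disallowed'}")
```

With these assumed rules, only the /stable/ copy of the page is reported as allowed, which matches the intent described above.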

Changed in help.ubuntu.com:
status: New → Invalid
Doug Smythies (dsmythies) wrote :

Gunnar, Peter M:

I assume that Yandex isn't subscribed to this thread. Will one of you reply to their e-mail, or do you want me to?

Gunnar Hjalmarsson (gunnarhj) wrote :

Thanks for the heads up, Doug. I thought about it, but forgot; suppose it's Peter's call to decide.

(Do you agree with what I wrote, btw?)

Doug Smythies (dsmythies) wrote :

> (Do you agree on what I wrote, btw?)

Yes, absolutely. We discussed it while making the robots.txt file changes and were in agreement.

Doug Smythies (dsmythies) wrote :

@Peter Mahnke: Did you ever write back to Yandex? If not, I will.

Doug Smythies (dsmythies) wrote :

I replied to Yandex about this.
