Comment 2 for bug 1735435

kaputtnik (franku) wrote :

HTTP 301 is fine. But linking to the wiki's main page isn't that good IMHO, because HTTP 301 is used for redirects. From my understanding, an HTTP 301 should be returned if the address has changed, say an article 'TranslationDutch' is moved to 'TranslDutch', but the content of the article is still valid and should be crawled. In this case that would be wrong, because we don't want the content to be crawled. But I do not understand much of this ;)

I think the right approach would be:

1. Add a DB field to the model ARTICLE which defines the state of an article. A state would be something like 'up_to_date', 'needs_update', 'deleted' or 'in_progress'.

2. Deleted pages get two URLs: /wiki/trash/TranslationDutch (original content) and /wiki/TranslationDutch (containing a sentence that this article has been deleted and a link to /wiki/trash/TranslationDutch)

3. Modify robots.txt to inform crawlers not to crawl wiki/trash/*

4. Modify the sitemap to not contain articles from wiki/trash
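
Points 2 to 4 could look roughly like the sketch below. It is only an illustration in plain Python, not the website's actual code: the helper names (trash_url, public_url, sitemap_urls) and the robots.txt content are assumptions made up for this example. It uses the plain prefix rule "Disallow: /wiki/trash/" instead of "wiki/trash/*", because prefix matching is the baseline every crawler understands and Python's urllib.robotparser does not handle wildcards.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content for point 3. A path prefix is enough:
# it covers every URL below /wiki/trash/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/trash/
"""

TRASH_PREFIX = "/wiki/trash/"


def public_url(slug: str) -> str:
    """URL that keeps existing and only shows the 'deleted' notice (point 2)."""
    return f"/wiki/{slug}"


def trash_url(slug: str) -> str:
    """URL that holds the original content of a deleted article (point 2)."""
    return f"{TRASH_PREFIX}{slug}"


def sitemap_urls(urls):
    """Point 4: drop everything below /wiki/trash/ from the sitemap."""
    return [url for url in urls if not url.startswith(TRASH_PREFIX)]


# Check the robots.txt rule with the standard library parser.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

slug = "TranslationDutch"
print(parser.can_fetch("*", public_url(slug)))  # notice page may be crawled
print(parser.can_fetch("*", trash_url(slug)))   # trashed content may not
print(sitemap_urls([public_url(slug), trash_url(slug)]))
```

With this layout the robots.txt rule and the sitemap filter both hang on the single /wiki/trash/ prefix, so there is only one place to change if the URL scheme from point 2 turns out differently.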

This way we could also easily remove articles in trash from the 'List of all pages' or create additional lists depending on the state of an article, e.g. show a list of articles that are outdated and need work.
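
A rough sketch of how such state-dependent lists could work, in plain Python rather than the real model code. The class and function names (ArticleState, Article, all_pages, pages_in_state) and the sample titles other than 'TranslationDutch' are invented for this example; only the four state values come from point 1.

```python
from dataclasses import dataclass
from enum import Enum


class ArticleState(Enum):
    """The states suggested in point 1."""
    UP_TO_DATE = "up_to_date"
    NEEDS_UPDATE = "needs_update"
    DELETED = "deleted"
    IN_PROGRESS = "in_progress"


@dataclass
class Article:
    """Stand-in for the wiki's article model (hypothetical fields)."""
    title: str
    state: ArticleState = ArticleState.UP_TO_DATE


def all_pages(articles):
    """'List of all pages' without the articles that are in trash."""
    return [a.title for a in articles if a.state is not ArticleState.DELETED]


def pages_in_state(articles, state):
    """Additional list for one state, e.g. articles that need work."""
    return [a.title for a in articles if a.state is state]


# Sample data, made up for illustration.
articles = [
    Article("TranslationDutch", ArticleState.DELETED),
    Article("ExamplePageA", ArticleState.NEEDS_UPDATE),
    Article("ExamplePageB"),
]

print(all_pages(articles))
print(pages_in_state(articles, ArticleState.NEEDS_UPDATE))
```

Keeping the state in one field means each list is just a filter on that field, so no separate bookkeeping for trashed or outdated articles is needed.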

I am not sure about point 2... always struggling with the urls :-D