Sitemap generator tries to write to the web root

Bug #914490 reported by Richard Mansfield
This bug affects 1 person
Affects: Mahara
Status: Fix Released
Importance: High
Assigned to: Richard Mansfield

Bug Description

Shouldn't the sitemap files go into dataroot instead?

If that won't work for some reason, we should disable the "generate sitemap" checkbox whenever the docroot is unwritable (as it is on 99% of sites).
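One way to implement that fallback is to check whether the directory is writable before offering the option. The Python sketch below is illustrative only (Mahara itself is PHP): the helper name sitemap_checkbox_enabled is hypothetical, and it assumes the check runs as the same user as cron, because os.access() tests the permissions of the calling process.

```python
import os


def sitemap_checkbox_enabled(docroot: str) -> bool:
    """Return True only if docroot is an existing directory this process can write to.

    Hypothetical helper: os.access() checks the real user/group of the calling
    process, so the result is only meaningful if this runs as the cron user.
    """
    return os.path.isdir(docroot) and os.access(docroot, os.W_OK)
```

On a typical install, where the docroot is owned by root, this returns False and the checkbox would be greyed out.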

Also, the help file for "generate sitemap" is currently missing. If sitemap generation requires the admin to change directory permissions, it would be good to tell them about that somewhere.

Tags: sitemaps
François Marier (fmarier) wrote:

I reckon that's a serious bug. Nothing should ever get written to the webroot; it should be owned by root and not writable by the application.

Changed in mahara:
importance: Low → High
milestone: none → 1.5.0
tags: added: sitemaps
Changed in mahara:
assignee: nobody → Richard Mansfield (richard-mansfield)
Richard Mansfield (richard-mansfield) wrote:

The first part of this is fixed in the patches here:

https://reviews.mahara.org/#q,status:open+project:mahara+branch:master+topic:dataroot-sitemaps,n,z

These patches write the sitemap files to dataroot rather than docroot, make the sitemaps available through the download.php script, and change the links in the sitemap index to the new download.php URLs. At the very least, this fixes the errors in cron.
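To illustrate the last change, here is a Python sketch of a sitemap index whose <loc> entries point at download.php instead of static files under docroot. It is not Mahara's code: the wwwroot value and the type/name query parameters are hypothetical stand-ins for whatever download.php actually accepts.

```python
import xml.etree.ElementTree as ET
from typing import Iterable
from urllib.parse import urlencode

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def download_url(wwwroot: str, filename: str) -> str:
    # HYPOTHETICAL query parameters: the real download.php interface may differ.
    query = urlencode({"type": "sitemap", "name": filename})
    return f"{wwwroot.rstrip('/')}/download.php?{query}"


def build_sitemap_index(wwwroot: str, filenames: Iterable[str]) -> str:
    """Return a sitemap index whose <loc> entries go through download.php."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in filenames:
        sitemap = ET.SubElement(root, "sitemap")
        # ElementTree escapes "&" in the URL as "&amp;" when serializing.
        ET.SubElement(sitemap, "loc").text = download_url(wwwroot, name)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The escaping matters: the sitemap protocol requires entity-escaped URLs, so a query string with several parameters must appear as &amp; inside <loc>.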

Two things still need to be done:

First, we need to work out how to point search engines at the sitemaps. This could be done manually, but it may be as simple as adding a "Sitemap:" line to robots.txt containing the new download.php URL for the sitemap index. That will need to be tested on a crawlable site somewhere. If we add this line to Mahara's default robots.txt, we also need to check that things still work when sitemap generation is off: the URL would then lead to a "file not found" page, when perhaps it should deliver an empty XML file.
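A sketch of the robots.txt idea, in Python for illustration. It takes one of the two options raised above: the "Sitemap:" line is emitted only when a sitemap index URL is supplied, so a site with sitemap generation turned off never points crawlers at a missing file. The function name, the Disallow path, and the download.php URL are hypothetical. The sitemaps protocol expects an absolute URL in the "Sitemap:" directive, so relative paths are rejected.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def build_robots_txt(disallow: Iterable[str], sitemap_index_url: Optional[str]) -> str:
    """Return robots.txt text; pass sitemap_index_url=None when sitemaps are off."""
    lines = ["User-agent: *"]
    lines.extend(f"Disallow: {path}" for path in disallow)
    if sitemap_index_url is not None:
        parsed = urlparse(sitemap_index_url)
        # The sitemaps protocol expects a full URL here, not a relative path.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Sitemap URL must be absolute: {sitemap_index_url!r}")
        lines.append(f"Sitemap: {sitemap_index_url}")
    return "\n".join(lines) + "\n"
```

For example, build_robots_txt(["/admin/"], "https://mahara.example.org/download.php?type=sitemap&name=sitemap_index.xml") yields a three-line file ending in the Sitemap line. Serving an empty XML file instead would keep robots.txt static, at the cost of crawlers fetching a file with nothing in it.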

Second, we need to add the help file in Site options. It should explain what the sitemap URLs are and what needs to be done to submit sitemaps to a crawler manually. It could perhaps also tell the server administrator how to configure the web server so that the sitemaps appear at wwwroot/sitemap.xml, etc.

Changed in mahara:
status: Triaged → In Progress
Mahara Bot (dev-mahara) wrote: A change has been merged

Reviewed: https://reviews.mahara.org/1031
Committed: http://gitorious.org/mahara/mahara/commit/3ed83078d7bb5cc47e8f5f4725f1cfefd99b55f2
Submitter: Francois Marier (<email address hidden>)
Branch: master

commit 3ed83078d7bb5cc47e8f5f4725f1cfefd99b55f2
Author: Richard Mansfield <email address hidden>
Date: Thu Feb 2 10:11:08 2012 +1300

    Write sitemap xml files to dataroot, not docroot (bug #914490)

    The cron process can't write to the docroot, so we'll have to write
    sitemaps to dataroot and make them accessible another way.

    Change-Id: If3b09f7322f59beed9eba1ac3d6f63c667909dfa
    Signed-off-by: Richard Mansfield <email address hidden>

François Marier (fmarier) wrote:

I think the remaining items should be moved to a different bug. The main problem covered by this bug report (dataroot vs. webroot) is now resolved.

Changed in mahara:
status: In Progress → Fix Committed
Kristina Hoeppner (kris-hoeppner) wrote:

The new bug for the sitemap help file is bug #974855.
The bug for the sitemap not being publicly available is bug #979538.

Melissa Draper (melissa)
Changed in mahara:
status: Fix Committed → Fix Released
Mahara Bot (dev-mahara) wrote:

Reviewed: https://reviews.mahara.org/1156
Committed: http://gitorious.org/mahara/mahara/commit/72ed0b00206c1f060b442f107694961c7b74d8f7
Submitter: Hugh Davenport (<email address hidden>)
Branch: master

commit 72ed0b00206c1f060b442f107694961c7b74d8f7
Author: Richard Mansfield <email address hidden>
Date: Thu Apr 12 16:00:30 2012 +1200

    Make download.php publicly accessible (bug #979538)

    In commit 647a99fd1025ceb748f8455b82418b03d788f9a8 (see bug #914490),
    the sitemaps were made available from the download.php script, but
    this script is not publicly accessible, so crawlers would not be able
    to download them.

    Making the script public is okay here, because whenever a non-sitemap
    file is requested, there is already an exception thrown if the user is
    not logged in.

    Change-Id: Ia9c62940ee7dada05f4f1b448ead0c146171535c
    Signed-off-by: Richard Mansfield <email address hidden>
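The reasoning in this commit (the script can be public because non-sitemap requests already fail for anonymous users) amounts to an allow-list check. Below is a Python sketch of that logic, not Mahara's PHP. The LoginRequired exception and the file-type strings are hypothetical.

```python
class LoginRequired(Exception):
    """Hypothetical stand-in for the exception download.php raises."""


def check_download_access(file_type: str, logged_in: bool) -> None:
    """Let anyone fetch sitemaps; require a login for every other file type."""
    if file_type == "sitemap":
        return  # crawlers never have a session, so sitemaps must stay public
    if not logged_in:
        raise LoginRequired(f"login required to download {file_type!r} files")
```

Because only the one named type is exempt, any file type added to the script later stays login-protected by default.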

Mahara Bot (dev-mahara) wrote:

Reviewed: https://reviews.mahara.org/1155
Committed: http://gitorious.org/mahara/mahara/commit/e6bdf9dc45b3bfb93baa7bb2dc5b1a904510f4d7
Submitter: Hugh Davenport (<email address hidden>)
Branch: 1.5_STABLE

commit e6bdf9dc45b3bfb93baa7bb2dc5b1a904510f4d7
Author: Richard Mansfield <email address hidden>
Date: Thu Apr 12 16:00:30 2012 +1200

    Make download.php publicly accessible (bug #979538)

    In commit 647a99fd1025ceb748f8455b82418b03d788f9a8 (see bug #914490),
    the sitemaps were made available from the download.php script, but
    this script is not publicly accessible, so crawlers would not be able
    to download them.

    Making the script public is okay here, because whenever a non-sitemap
    file is requested, there is already an exception thrown if the user is
    not logged in.

    Change-Id: Ia9c62940ee7dada05f4f1b448ead0c146171535c
    Signed-off-by: Richard Mansfield <email address hidden>
