# This patch file, files created by its use and not subject to other copyright # and any changes in other files generated by its use are # Copyright (C) 2000,2001,2002,2003 by the Free Software Foundation, Inc. # 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA # This patch file is Free Software and permission is granted to copy and # redistribute it in original or modified form under the terms of the # GNU General Public License (GPL). # See the GPL COPYING files accompanying Mailman distributions this # patch was intended to work in conjunction with for further details. diff -r -u -P mailman-2.1-index/INSTALL mailman-2.1-htdig/INSTALL --- mailman-2.1-index/INSTALL Tue Dec 24 16:24:21 2002 +++ mailman-2.1-htdig/INSTALL Thu Jan 2 15:09:00 2003 @@ -437,6 +437,11 @@ -c option to mmsitepass to set this. + - If you want to use htdig for searching your mail archives using + the Mailman-htdig integration developed by Richard Barrett + (r.barrett@ftel.co.uk) then see the instructions in + INSTALL.htdig-mm. + 6. Getting started See the README file under the section "CREATE YOUR FIRST LIST" for diff -r -u -P mailman-2.1-index/INSTALL.htdig-mm mailman-2.1-htdig/INSTALL.htdig-mm --- mailman-2.1-index/INSTALL.htdig-mm Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/INSTALL.htdig-mm Wed Jan 22 11:57:24 2003 @@ -0,0 +1,1133 @@ +History +Introduction +Installing and Building Mailman with this patch +What is Installed by the Patch +Configuration of Mailman-htdig Integration + Health Warning on the packet! 
+ Starting from Scratch (Again) + General + htdig Permissions Considerations + Local htdig Configuration + Remote htdig Configuration + Upgrading an Existing Standard Mailman Installation + Changing from local to remote htdig or vice versa + Coping with htdig Upgrades + Changing the Addressing Scheme of your web_page_url +Operational Information +Notes and Warnings +Contributors +Appendices + Appendix 1 - Technique for htdigging when Mailman's DEFAULT_URL_PATTERN uses +the https scheme + +Prerequisites +============= + +Prior to installing this patch you should also have installed the patch +that provides enhanced indexing of Mailman archives; see: + +http://sourceforge.net/tracker/index.php?func=detail&aid=444879&group_id=103&atid=300103 + +If you are using vanilla MM 2.1 final you will also need to apply: + +http://sourceforge.net/tracker/index.php?func=detail&aid=668685&group_id=103&atid=300103 + +You must have a working installation of htdig with htsearch available +via CGI on your HTTP server installed on either the machine on which you +are running Mailman or on another machine which has access to Mailman +list archives via NFS or some similarly competent network file sharing +scheme. + +Regardless of how you configure things to provide Mailman's Web UI, if +it gives normal operation of the /mailman/private CGI script for +providing access to private list archives, it should also support access +to htdig search results via the /mailman/htdig CGI script.
+ +Compatibility +============= + +htdig-2.1-0.3.patch - Mailman 2.1 + +htdig-2.1-0.2.patch - Mailman 2.1 + +htdig-2.1-0.1.patch - Mailman 2.1 + +htdig-2.1b6-0.1.patch - Mailman 2.1b6 + +htdig-2.1b5-0.1.patch - Mailman 2.1b5 + +htdig-2.1b4-0.1.patch - Mailman 2.1b4 + +htdig-2.1b3-0.3.patch - Mailman 2.1b3 + +htdig-2.1b3-0.2.patch - Mailman 2.1b3 + +htdig-2.1b3-0.1.patch - Mailman 2.1b3 + +htdig-2.1b2-0.1.patch - Mailman 2.1b2 + +htdig-2.0.13-0.2.patch - Mailman 2.0.13 + +htdig-2.0.13-0.1.patch - Mailman 2.0.13 + +htdig-2.0.12-0.1.patch - Mailman 2.0.12 + +htdig-2.0.11-0.1.patch - Mailman 2.0.11 + +htdig-2.0.10-0.2.patch - Mailman 2.0.10 + +htdig-2.0.10-0.1.patch - Mailman 2.0.10 + +htdig-2.0.9-0.1.patch - Mailman 2.0.9 + +htdig-2.0.8-0.1.patch - Mailman 2.0.8, 2.0.7, 2.0.6 and probably 2.0.3, +2.0.4 and 2.0.5 + +History +======= + +Previous versions - original versions of this patch provided most of the +features described here with the main exception being support for remote +htdig, that is running htdig on a different system to Mailman. They also +had some configuration assumptions baked in, which are now +configurable. + +htdig-2.1-0.3.patch - latest version: + + 1. corrects errors in the way $prefix/Mailman/htdig.py worked out + content type of file being returned. + + 2. $prefix/Mailman/htdig.py adopts revised method for establishing + the default URL introduced in 2.1 and as used in + $prefix/Mailman/MailList.py + + 3. removed unnecessary setup of variable DEFAULT_URL in cron scripts + $prefix/cron/remote_nightly_htdig_noshare and + $prefix/cron/remote_nightly_htdig.pl + + 4. Changes references to DEFAULT_URL in this document to + DEFAULT_URL_PATTERN. + +htdig-2.1-0.2.patch + + 1. improved content type and security handling in + $prefix/Mailman/htdig.py. Fixes bug with htdig.py and problem + of interaction with bug in $prefix/scripts/driver script (see + patch #668685 for more details) + +htdig-2.1-0.1.patch + + 1. 
Reworked patch for compatibility with MM 2.1. + +htdig-2.1b6-0.1.patch + + 1. Reworked patch for compatibility with MM 2.1b6. + +htdig-2.1b5-0.1.patch + + 1. Reworked patch for compatibility with MM 2.1b5. + +htdig-2.1b4-0.1.patch + + 1. Reworked patch for compatibility with MM 2.1b4. As a consequence, + the remainder of the mailman-htdig integration templates that were + strings declared in Mailman/Archiver/HyperArch.py have been extracted + into files under the templates directory. Edit these with care if you + must. + +htdig-2.1b3-0.3.patch + + 1. Removed unnecessary code dependency on Python 2.2 file() function + +htdig-2.1b3-0.2.patch + + 1. Removed syntax error in htdig-2.1b3-0.1.patch which showed up as + logged errors in the operation of the ArchRunner qrunner at + line 721 of HyperArch.py + +htdig-2.1b3-0.1.patch + + 1. Reworked patch for compatibility with MM 2.1b3 + + 2. Removed non-English language template files which were acting as +placeholders until someone actually translated them. + + 3. Removed updateTOC.py and replaced it with an alternate mechanism +in a patch to $prefix/Mailman/Queue/ArchRunner.py to update list TOC +page after reindexing by htdig. This new method is only exercised when +the remote_nightly_htdig series of cron scripts are used. + + 4. Changes to remote_nightly_htdig series of cron scripts to +reflect demise of updateTOC cgi script. + + 5. Multiple instances of code hygiene and conformance to MM +"standards" cleanup. + + 6. Tidied up this documentation. + +htdig-2.1b2-0.1.patch: + + reworked patch for compatibility with MM 2.1b2 + +htdig-2.0.13-0.2.patch: + + 1. Added license header + +htdig-2.0.13-0.1.patch: + + 1. Rebuilt patch to get no-comment application on Mailman 2.0.13 + +htdig-2.0.12-0.1.patch: + + 1. Rebuilt patch to get no-comment application on Mailman 2.0.12 + + 2. Added HTDIG_EXTRAS facility to allow arbitrary htdig +configuration parameters to be specified for addition to every +htdig.conf file created i.e. 
site wide additions. See comments below +on the use of HTDIG_EXTRAS. + +htdig-2.0.11-0.1.patch: + + 1. No substantive change. Simply rebuilt patch to get no-comment +application on Mailman 2.0.11 + +htdig-2.0.10-0.2.patch: + + 1. Python 2.2 compatibility fixes to nightly_htdig cron script and +its relatives. Doing import * inside a function removed. + + 2. Added note on potential problems with htdig and file permissions. + +htdig-2.0.10-0.1.patch: + + 1. change in src/Makefile.in to get clean patch application to MM +2.0.10 + +htdig-2.0.9-0.1.patch: + + 1. minor cosmetic changes to get clean patch application to MM 2.0.9 + +htdig-2.0.8-0.1.patch: + + 1. resolves a problem with the integration of htdig when the +web_page_url for a list, which is usually the same as DEFAULT_URL +from either $prefix/Mailman/Defaults.py or +$prefix/Mailman/mm_cfg.py, doesn't use the http addressing scheme. +This arises because htdig will only build indices if the URLs for +pages use the http addressing scheme. There is a work-around for +this problem posted in htdig's mail archives - see the copy in +Appendix 1 to this document. + + 2. This patch revision implements the solution documented in that +e-mail. If non-http URLs are used by the web_page_url of a list an +additional htdig configuration file for use by htsearch is +generated. + + 3. In all other respects the operation of the Mailman-htdig +integration remains unchanged. There is no benefit in upgrading to +this revised patch unless you need to use other than http addressing +in your DEFAULT_URL or set other than http addressing in the +web_page_url configuration of any of your lists. + + 4. If changing to or from a non-http addressing scheme then the per +list htdig config files of the lists affected and their associated +htdig indices must be reconstructed. See the section below entitled +'Changing the Addressing Scheme of your web_page_url' for details of +how to do this. + +htdig-2.0.6-0.3.patch: + + 1. 
adds support for remote htdig, that is: running htdig on a +different system to Mailman. + + 2. enhances the configurability of the integration. Some of the +programmed assumptions made in previous versions are now +configurable in mm_cfg.py. The configuration variables concerned +default to the previous fixed values so that this version is +backwards compatible with earlier versions. + + 3. does some minor cosmetic code changes. + + 4. extends the associated documentation. + +Introduction +============ + +This integration enables use of the htdig (http://www.htdig.org) search +engine for searching mail list archives produced by pipermail, Mailman's +built-in archiver. + +You can use htdig without applying these patches to Mailman but you may +find it awkward to achieve some of the features offered by this patch. + +The main features of the patch are: + + 1. per list search facility with a search form on each list's TOC +page. + + 2. maintenance of privacy of private archives. The user has to +establish their credentials via the normal private archive access +mechanism before any access via htdig is allowed. + + 3. a common base URL for both public and private archive access via +htsearch results. This means that htdig indices are unaffected by +changing an archive from private to public and vice versa. All +access to archives via htdig is controlled by a wrapped CGI script +called htdig.py. + + 4. Choice of running htdig on the machine running Mailman (aka local +htdig) or running htdig on another machine which has access to +Mailman's archives via NFS or some similarly competent network file +sharing scheme (aka remote htdig). + + 5. cron activated scripts and crontab entry to run htdig regularly +to maintain the per list search indices. + + 6. automatic creation, deletion and maintenance of htdig +configuration files and such. Beyond installing htdig and telling +Mailman where it is via mm_cfg you do not have to do much other +setup. 
+ +Installing and Building Mailman with this patch +============================================== + +Create your Mailman build directory in the normal way. + +You can apply the patch to either a fresh expansion of the Mailman +source distribution or the one you used to build a currently working +Mailman installation. + +Execute the following command in the Mailman build directory: + + patch -p1 < path-to-htdig-2.1b4-0.1.patch + +Follow the configure and make procedures for regular Mailman as given in +the $build/INSTALL file. + +Then follow the Mailman-htdig configuration instructions given below. + +What is Installed by the Patch +============================== + +The patch amends: +---------------- + +$build/INSTALL + + Adds a reference to this file to the standard installation notes. + +$prefix/Mailman/Archiver/HyperArch.py + + The changes in this file set up the per list htdig stuff such as +config files and add the search forms to the list TOC pages. + +$prefix/Mailman/Queue/ArchRunner.py + + The changes in this file rewrite a list's TOC page if, when +archiving a new message for the list, the update time of the list's +TOC page is earlier than the last time that rundig was run. This is +only of relevance when one of the remote_nightly_htdig series of +cron scripts (see below) is being used. + + The only deficiency with this approach is that if no message is sent +to the list after rundig is run for the list the TOC page is not +rewritten to reflect that rundig was run.
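The ArchRunner.py check described above can be sketched in Python as follows. This is only an illustration, not the patch's code: the function name, its arguments and the way the time of the last rundig run is obtained are all assumptions, and the sketch takes the purpose of the check to be refreshing a TOC page that is older than the most recent rundig run.

```python
import os


def toc_needs_rewrite(toc_path, last_rundig_time):
    """Return True if the TOC page looks stale relative to rundig.

    toc_path is the path to a list's TOC page and last_rundig_time is
    the time of the last rundig run in seconds since the epoch. Both
    are hypothetical arguments; how the patch records the rundig time
    is not covered here.
    """
    try:
        toc_mtime = os.stat(toc_path).st_mtime
    except OSError:
        # No TOC page yet, so it has to be written in any case.
        return True
    # Stale if the page was last written before rundig last ran.
    return toc_mtime < last_rundig_time
```

Because the check only happens while a new message is being archived, a quiet list keeps its old TOC page, which is the deficiency noted above.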
+ +$build/Mailman/Defaults.py.in + + Adds the default configuration variables needed to support the +mailman-htdig integration + +$build/cron/crontab.in.in + + Adds the nightly_htdig cron script to the default crontab + +$build/configure +$build/configure.in +$build/Makefile.in +$build/cron/Makefile.in +$build/src/Makefile.in +$build/bin/Makefile.in + + Necessary changes to configuration and Makefiles used for installing +Mailman + +The patch adds: +-------------- + +$build/INSTALL.htdig-mm + + This file. + +$prefix/cgi-bin/htdig +$prefix/Mailman/Cgi/htdig.py + + these are a CGI script and its wrapper, which is always on the path +of URLs returned from searches of htdig indices. The script provides +secure access to such URLs in the same way as +$prefix/cgi-bin/private and $prefix/Mailman/Cgi/private.py do. htdig.py +ensures private archives are kept private, applying the same +criteria for permitting access as private.py, and delivers +material from public archives without demanding any authentication. + +$prefix/bin/blow_away_htdig + + this is a utility script for removing per list htdig data, e.g. the +config file and indices/db files. This is necessary when: + + a. ceasing use of the Mailman-htdig integration + + b. moving from local to remote htdig or vice-versa + + c. upgrading to a version of htdig which has an incompatible +index/db file format + + d. changing the addressing scheme (http versus https) in the +web_page_url configuration variable of a list + +$prefix/cron/nightly_htdig +$prefix/cron/remote_nightly_htdig +$prefix/cron/remote_nightly_htdig_noshare +$prefix/cron/remote_nightly_htdig.pl + + These scripts all do the same thing; they can be installed as a cron +task and run regularly to invoke htdig's rundig script to update +mailing list search indices. Only one of these scripts is used, the +choice of which depending on your system configuration. + + nightly_htdig is used where Mailman and htdig run on the same +system. + + the remote_... 
scripts are used where Mailman and htdig live on +different systems. You choose which one suits your needs best: + + remote_nightly_htdig uses the same python files on both systems, +that is the same .py and .pyc files are accessed, and it hence +depends on compatible bytecode between the Mailman system and +htdig system. It also accesses Mailman data files and depends on +compatibility of data file contents, for example pickled python +values. This should work OK if the same version of python is +being run on both systems even where the systems are +heterogeneous, for example one is Sun/Solaris and the other is +PC/Linux. + + remote_nightly_htdig_noshare shares no python files between the +two systems. While it is still written in python, it acquires +information from the file system using directory listings and +stat operations. + + remote_nightly_htdig.pl is a rewrite of +remote_nightly_htdig_noshare in Perl. It is for use where the +htdig system does not have python available on it: in which +case, shame on you. + +$prefix/templates/en/TOC_htsearch.html +$prefix/templates/en/htdig_access_error.html +$prefix/templates/en/htdig_auth_failure.html +$prefix/templates/en/htdig_conf.txt + + These are English language templates special to the htdig +integration: + + TOC_htsearch.html - the HTML of the search form that is embedded +in a list's archive TOC page. + + htdig_access_error.html - HTML page returned by htdig.py in the +event of an access error for a page access. + + htdig_auth_failure.html - HTML page returned by htdig.py in the +event of an authentication error for a page access. + + htdig_conf.txt - template for the per list htdig.conf files +generated by the patched code. + +Configuration of Mailman-htdig Integration +========================================== + +Configuration of the Mailman-htdig integration is carried out on the +Mailman side. 
While you must have to hand some information about your +htdig installation, you should not have to tinker with htdig for the +integration to work. + +Most of the configuration of the integration is done by values assigned +to python variables in either $prefix/Mailman/Defaults.py or +$prefix/Mailman/mm_cfg.py. + +If you opt to run htdig on a different machine or under a different HTTP +server to the one running the HTTP server which provides Mailman's Web +UI you will also have to edit whichever of the patch's three htdig +related cron scripts you opt to run (remote_nightly_htdig, +remote_nightly_htdig_noshare, or remote_nightly_htdig.pl) to add a small +amount of configuration information. + +Health Warning on the packet! +----------------------------- + +Be careful when editing configuration information in +$prefix/Mailman/mm_cfg.py: the only Mailman config file you should be +editing. Check, double check and then recheck before going ahead. If you +get either variable names or their values wrong a lot of confusion in +the operation of both Mailman and htdig can result. + +You (and others supporting you) can spend hours trying to identify +problems and looking for non-existent bugs as a consequence of such +editing errors. Expect to find errors in these instructions; compensate +for them and tell me when you do (r.barrett@ftel.co.uk). + +Also do read the htdig documentation, release notes etc. This patch +integrates a working htdig with htsearch available through CGI. These +notes are about Mailman and integrating it with that working htdig. It +is up to you to sort out the htdig end of things. + +Starting from Scratch (Again) +----------------------------- + +This is getting ahead of things but some of you may already be asking +"What if I've already been using an older version of this patch and want +to start afresh", or "I want to change from local to remote htdig or +vice versa". + +In these cases your friend will be the $prefix/bin/blow_away_htdig +script. 
It removes existing htdig related stuff out of your Mailman +installation to the extent that it was added by this patch and added to +by the normal operation of pipermail and nightly_htdig. With that +removed and a revised Mailman configuration, the patched code will start +rebuilding the htdig data. + +But before you get carried away with blow_away_htdig, read the rest of +these notes. + +General +------- +This patch adds a number of default variables to the file +$prefix/Mailman/Defaults.py that affect operation of the Mailman-htdig +integration. These are in addition to the standard Mailman defaults in +that file. If, in the light of what is said below, you decide any of +these are incorrect, you can override them in $prefix/Mailman/mm_cfg.py +[NOT IN Defaults.py! See the comments in Defaults.py for details]. + +By default the Mailman-htdig integration is NOT ENABLED by the +installation of this patch; a default variable in Defaults.py turns off +the operation of the integration. You have to actively override that +default in mm_cfg.py to turn on operation of the integration. + +Once a list is created, changing most of these variables will have +either no effect or a bad effect. You will need to run +$prefix/bin/blow_away_htdig script and/or $prefix/bin/arch to rebuild +the archive pages if you make significant changes to the Mailman-htdig +integration configuration variables. + +The install process will not overwrite an existing mm_cfg.py file so you +can freely make changes to this file. If you are re-installing a later +version of this patch you may have to change what is already configured +in the existing file and, if necessary, add extra configuration +variables to it. + +Most of the Mailman-htdig control variables default to sensible values +which you will not need to change, especially if you are using local +htdig. 
The semantics of most variables apply to both local and remote +htdig operation but with some the values assigned will depend on whether +htdig is viewing things from the same or a remote machine. + +The first two variables control what is indexed by htdig. The values +assigned are both embedded in the HTML generated by pipermail in the +list archives and added. Changing the values of these variables will +mean that all previously generated HTML pages in list archives will be +out of date and you will probably want to rebuild existing archives +using $prefix/bin/arch: + +ARCHIVE_INDEXING_ENABLE + + defines a string telling htdig that it should look at the following +material when building its indices. + + Default: ARCHIVE_INDEXING_ENABLE = '' + +ARCHIVE_INDEXING_DISABLE + + defines a string telling htdig that it should not look at the +following material when building its indices. + + Default: ARCHIVE_INDEXING_DISABLE = '' + +USE_HTDIG - Semantics 0 - don't use integrated htdig, 1 - use it + + turns Mailman-htdig integration on or off. + + Default: USE_HTDIG = 0 + + Notes: + + 1. when USE_HTDIG is turned on the patched code in Mailman will +start adding htdig stuff for any archiving-enabled mail lists as new +posts for each list are handled by Mailman. Until a new post is made +after enabling with USE_HTDIG an existing mail list's archive will +not be htdig searchable. When the new post is handled: + + a. the list's personalised htdig config file is created + + b. necessary links to the htdig config file are created + + c. a search form is added to the TOC page for the list + + Even with this done, htdig searches only become available when +htdig indices are constructed. This is done when one or other of +the patch's htdig related cron scripts is run (nightly_htdig, +remote_nightly_htdig, remote_nightly_htdig_noshare, or +remote_nightly_htdig.pl, depending on how you configure your +system). 
These can be run from the command line ahead of their +scheduled cron time to get htdig searches operational. + + 2. Turning USE_HTDIG off will not remove htdig indices or search +forms from existing archive-enabled lists. It will however stop +htdig features from being added to newly created lists. If you want +to eliminate htdig from your existing lists then use the +$prefix/bin/blow_away_htdig script. + +HTDIG_ARCHIVE_URL + + this is the URL path that equates to the wrapper +$prefix/cgi-bin/htdig which controls access to the +$prefix/Mailman/Cgi/htdig.py script. + + Default: HTDIG_ARCHIVE_URL = '/mailman/htdig' + + It is highly unlikely that you will want to change from the default +value unless you are also changing other variables such as +PRIVATE_ARCHIVE_URL because of some non-standard installation +decisions on your part. + +HTDIG_SEARCH_URL + + this is the URL of the htsearch CGI program part of the htdig +package. + + Default: HTDIG_SEARCH_URL = '/cgi-bin/htsearch' + + The default assumes a single HTTP server providing access to htdig +and to Mailman's web UI are on the Mailman machine and htsearch has +been installed in the HTTP server's cgi-bin directory. This value +will depend on your htdig installation decisions and HTTP server +configuration files (typically /etc/httpd/httpd.conf on a late model +Apache installation) i.e the ScriptAlias through which the htsearch +CGI program is reached. + +HTDIG_FILES_URL + + this is the URL of the directory containing various HTML and +Graphics files installed by htdig; files such as buttonr.gif, +buttonl.gif and button1-10.gif. The URL must end with a '/'. 
+ + Default: HTDIG_FILES_URL = '/htdig/' + + The default assumes the HTTP servers providing access to htdig and +to Mailman's web UI are on the same machine and a symbolic link +called 'htdig' has been put into your HTTP server's top level HTML +directory which points to the directory your htdig install has put +the actual files into; this link is often to /usr/share/htdig. This +value will depend on your htdig installation decisions and HTTP +server's configuration files (typically /etc/httpd/httpd.conf on a +late model Apache installation) i.e. the Alias through which the link +to the htdig files is reached. + +HTDIG_CONF_LINK_DIR + + this is the name of a directory in which links to list specific +htdig config files are placed. + + Default: HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', +'htdig') + + The VAR_PREFIX of the default is resolved to an actual file system +path when Mailman's 'make install' is run. The 'os.path.join' +creates a full file system path by gluing together the three pieces +when Mailman is run. This definition puts the directory alongside +the default PUBLIC_ARCHIVE_FILE_DIR and PRIVATE_ARCHIVE_FILE_DIR. +Unless you are changing the value of these variables you probably do +not want to change HTDIG_CONF_LINK_DIR. + +HTDIG_RUNDIG_PATH + + this is the path in your file system to the rundig shell script that +is installed as part of htdig. This tells one or other of the +patch's htdig related cron scripts (nightly_htdig and +remote_nightly_htdig) where to find rundig in order that they can +execute it. + + Default: HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig' + +HTDIG_MAILMAN_LINK + + the value of this is the name of a symbolic link you must create in +the directory where htdig expects to find its configuration files. +The target of this link is the directory whose path is the value of +HTDIG_CONF_LINK_DIR. 
The value of this variable is embedded in the +per list search forms in each list's TOC page generated by the +patched code, where it tells htsearch where to find the list's htdig +config file. + + Default: HTDIG_MAILMAN_LINK = 'htdig-mailman' + +REMOTE_HTDIG - Semantics 0 - htdig runs on local machine, 1 - on remote +machine + + says whether htdig is run on the same machine as Mailman or on +another machine. + + Default: REMOTE_HTDIG = 0 + +REMOTE_PRIVATE_ARCHIVE_FILE_DIR + + only relevant if REMOTE_HTDIG = 1. It is the file system path to the +directory in which Mailman stores private archives, as seen by the +machine running htdig. + + Default: REMOTE_PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, + 'archives', 'private') + + The VAR_PREFIX of the default is resolved to an actual file system +path when Mailman's 'make install' is run. The 'os.path.join' +creates a full file system path by gluing together the three pieces +when Mailman is run. If you assign a value to this in mm_cfg.py, +just put the relevant explicit file + system path in. + +HTDIG_EXTRAS + + You can assign a string value to this config variable and that +string will be included in all of your site's list specific htdig +configuration files when they are created. The value of the string +can be any attribute declarations as defined at +http://www.htdig.org/confindex.html. + + Be cautious in what you do with this. Most sites will not need to +use this at all. But if you have some idiosyncratic htdig +installation it might help overcome problems in integrating with +Mailman. If you think you need to use it I suggest: + + 1. You try creating a test list without assigning a value to +HTDIG_EXTRAS in $prefix/Mailman/mm_cfg.py + + 2. Enable archiving for that test list. + + 3. Send a message to the test list so that its archive is created +together with its htdig configuration file. + + 4. Review the content of the list's htdig conf file in +$prefix/archives/private//htdig/.conf. + + 5. 
You will see where the default value of HTDIG_EXTRAS from +$prefix/Mailman/Defaults.py has been inserted. This value is only an +htdig comment and does nothing. + + 6. Consider whether what you will assign to HTDIG_EXTRAS in +$prefix/Mailman/mm_cfg.py will make sense in the context of the +rest of the htdig conf file's contents. + +htdig Permissions Considerations +------------------------------------ + +Python scripts added by this patch (nightly_htdig and its relatives) run +the htdig rundig script identified by HTDIG_RUNDIG_PATH to build search +indices for Mailman archives. Code added by this patch generates per +list htdig configuration files which are passed as a parameter to the +rundig script. These configuration files identify a list specific +directory ($prefix/archives/private//htdig) in which list +specific data files generated by and used by htdig are to be placed. + +However, the rundig script identified by HTDIG_RUNDIG_PATH may attempt +to generate some files in htdig's COMMON_DIR when it is first run by +nightly_htdig; the files concerned are likely to be root2word.db, +word2root.db, synonyms.db and possibly some others generated by htdig's +htfuzzy program. The standard rundig script generates these files +selectively if they do not already exist. Depending on how you have +installed htdig and how the rundig script is first run, there may be a +permissions problem when nightly_htdig executes rundig under the mailman +UID if it tries to generate these files. + +You may need to either give the mailman UID write permission over +htdig's COMMON_DIR or, before the nightly_htdig script is first run, run +htdig's htfuzzy executable with a sufficiently privileged UID in the +manner that the rundig script would run htfuzzy, to create any necessary +files in COMMON_DIR. + +See htdig's documentation for further information on this topic.
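Before the first nightly_htdig run, a quick check along the following lines, run under the mailman UID, may show whether the permissions problem is likely to arise. It is a sketch only: the function name is made up for this note, and the COMMON_DIR path shown is a placeholder which you must replace with the value from your own htdig configuration.

```python
import os

# File names taken from the text above; htfuzzy may create others.
FUZZY_FILES = ('root2word.db', 'word2root.db', 'synonyms.db')


def check_common_dir(common_dir):
    """Return (missing, writable) for htdig's COMMON_DIR.

    missing lists the fuzzy files not yet present in common_dir;
    writable says whether the current user could create files there.
    """
    missing = [name for name in FUZZY_FILES
               if not os.path.exists(os.path.join(common_dir, name))]
    writable = os.access(common_dir, os.W_OK | os.X_OK)
    return missing, writable


if __name__ == '__main__':
    # Placeholder path, not a known htdig default.
    missing, writable = check_common_dir('/usr/share/htdig/common')
    if missing and not writable:
        print('rundig may fail: cannot create %s' % ', '.join(missing))
```

If files are reported missing and the directory is not writable, one of the two remedies described above is probably needed.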
+ +Local htdig Configuration +------------------------- + +This configuration is for when you are running Mailman, htdig, the HTTP +server used to provide Mailman's web UI and htdig's htsearch CGI script, +on the same machine. + +You will need to: + + 1. Set up a symbolic link in the directory where htdig expects to +find its configuration files; this depends on how you configured and +installed htdig but it is usually the directory containing htdig's +default htdig.conf file. The target of this link is the directory +whose path is assigned as the value of HTDIG_CONF_LINK_DIR. The name +of the link must be the same as the value you assign to +HTDIG_MAILMAN_LINK. For example, use the command: + + ln -s /home/mailman/archives/htdig /etc/htdig-mailman + + 2. If different to the default value, add the definition of +HTDIG_MAILMAN_LINK to file $prefix/Mailman/mm_cfg.py + + 3. If different to the default value, add the definition of +HTDIG_RUNDIG_PATH to file $prefix/Mailman/mm_cfg.py. + + 4. Add the definition of USE_HTDIG with the value 1 to +$prefix/Mailman/mm_cfg.py. + + USE_HTDIG = 1 + + +If necessary you can override the values of any of the other +configuration variables in file $prefix/Mailman/mm_cfg.py. In particular +you might need to change the following URL variables from their +defaults: HTDIG_SEARCH_URL and HTDIG_FILES_URL. + +These URLs can be just the path i.e. absolute URL on the same server as +that which serves Mailman's Web UI, or a full URL identifying the +protocol (http), server, server port and path, for example +http://mailer.your.com:8080/cgi-bin/htdig/htsearch. + +Remote htdig Configuration +-------------------------- + +This configuration is for when you are running htdig and an HTTP server +providing access to htsearch on a different machine to that running +Mailman and the HTTP server used to provide Mailman's web interface.
+ +For this configuration to work, htdig's programs, both those run from +command lines such as rundig and those run via CGI such as htsearch, +must be able to see Mailman archives through NFS. In the examples below +we'll assume that /mnt/mailman-archives on the htdig machine maps to +$prefix/mailman/archives on the Mailman machine. + +You should also arrange for the mailman UID and its GID to be common to +both machines. Remember that when rundig is called on the htdig machine +to produce search indices for each list it will be trying to write those +files via NFS in Mailman's archive area and will thus need to run with +an appropriate identity and permissions. + +The differences between the local and remote configuration are: + + 1. configuration values telling htdig where to find files are as +viewed from the remote machine. + + 2. configuration values giving URLs that refer to htdiggy things +have to be as viewed from the Mailman machine. + +You will need to: + + 1. Set up a symbolic link in the directory where htdig expects to +find its configuration files; this depends on how you configured and +installed htdig but it is usually the directory containing htdig's +default htdig.conf file. The target of this link is the directory +whose path is assigned as the value of HTDIG_CONF_LINK_DIR as seen +from the remote machine running htdig. The name of the link must be +the same as the value you assign to HTDIG_MAILMAN_LINK. For example, use +the command: + + ln -s /mnt/mailman-archives/htdig /etc/htdig-mailman + + 2. Add the definition of HTDIG_MAILMAN_LINK to file +$prefix/Mailman/mm_cfg.py. For example: + + HTDIG_MAILMAN_LINK = 'htdig-mailman' + + 3. Add the definition of HTDIG_RUNDIG_PATH to file +$prefix/Mailman/mm_cfg.py. This is the path to rundig on the remote +machine running htdig. For example: + + HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig' + + 4. Add the definition of HTDIG_SEARCH_URL to file +$prefix/Mailman/mm_cfg.py. 
This must be a full URL referring to the +htsearch CGI program on the remote htdig machine, as seen from the +Mailman local machine. For example: + + HTDIG_SEARCH_URL = 'http://htdiggy.your.com/cgi-bin/htsearch' + + 5. Add the definition of HTDIG_FILES_URL to file +$prefix/Mailman/mm_cfg.py. This must be a full URL referring to the +directory containing htdig files on the remote htdig machine as seen +from the Mailman local machine. This URL must end with a '/'. For +example: + + HTDIG_FILES_URL = 'http://htdiggy.your.com/htdig/' + + 6. Add the definition of REMOTE_PRIVATE_ARCHIVE_FILE_DIR to +$prefix/Mailman/mm_cfg.py. This must be the absolute file system +path to the directory in which Mailman stores private archives as +seen by the machine running htdig. For example: + + REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '/mnt/mailman-archives/private' + + 7. Add the definition of USE_HTDIG with the value 1 to +$prefix/Mailman/mm_cfg.py. + + USE_HTDIG = 1 + + 8. Add the definition of REMOTE_HTDIG with the value 1 to +$prefix/Mailman/mm_cfg.py. + + REMOTE_HTDIG = 1 + +You have to choose one of the three remote_nightly_htdig scripts found +in $prefix/cron - remote_nightly_htdig, remote_nightly_htdig_noshare and +remote_nightly_htdig.pl - and transfer it to the htdig machine. See +above under heading "What is Installed by the Patch/What the patch adds" +for an explanation of the differences between these scripts, which all +do the same basic job. You should add the script to the crontab for the +mailman UID on the htdig machine. But first you need to edit the +selected script to add some configuration information. What has to be +added depends on which script you opt to use. In each case the variables +concerned are declared near the top of the script and you just have to +enter the appropriate values: + + remote_nightly_htdig + you only need to set the value of the python variable +MAILMAN_PATH to be the directory $prefix as seen from the htdig +machine.
The whole Mailman installation must be accessible via +NFS in order to use this script. + + remote_nightly_htdig_noshare + you need to copy the values for the following configuration +variables from either $prefix/Mailman/mm_cfg.py or +$prefix/Mailman/Defaults.py to the script: +REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. The +variables declared in remote_nightly_htdig_noshare use the same +names. This script only requires that the archives directory of +the Mailman installation be accessible via NFS. + + remote_nightly_htdig.pl + you need to copy the values for the following configuration +variables from either $prefix/Mailman/mm_cfg.py or +$prefix/Mailman/Defaults.py to the script: +REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. Being a Perl +script, the variables in remote_nightly_htdig.pl use the same +names but prefixed with the '$' character. This script only +requires that the archives directory of the Mailman installation +be accessible via NFS. + + Note 1: You may need to change the '#! /usr/bin/env perl' on the +first line of this script if that doesn't find your Perl +executable. You may also need to verify the Perl packages used +by this script are installed on your system. + +As with the nightly_htdig script when running with local htdig, these +scripts can be run from the command line using the mailman UID in order +to get htdig to construct an initial set of indices. + +Upgrading an Existing Standard Mailman Installation +--------------------------------------------------- + +You will want to suspend operation of Mailman while doing the upgrade. +Consider doing a shutdown of the MTA delivering mail to Mailman and +removing Mailman's crontab. + +Configure and install as described above. + +Restart Mailman's crontab and restart your MTA's delivery to Mailman. + +If your installation already has archives: + + 1. Send a message to each of your archive-enabled lists. 
This will +stimulate the setup of the new per list htdig config files in the +Mailman archives. + + 2. Consider rebuilding your existing archives with $prefix/bin/arch. +This will embed the ARCHIVE_INDEXING_ENABLE and +ARCHIVE_INDEXING_DISABLE markers in the regenerated archive pages and, after +nightly_htdig has been run, give improved search results. + + 3. Run the nightly_htdig script from the command line to generate a +new set of per list htdig search indices. + +Changing from local to remote htdig or vice versa +------------------------------------------------- + +You will want to suspend operation of Mailman while making this change. +Consider doing a shutdown of the MTA delivering mail to Mailman and +removing Mailman's crontab. + +Run the $prefix/bin/blow_away_htdig script to remove all existing per +list htdig config files and htdig indices/db files. + +Configure per the instructions above for the local or remote target. + +Restart Mailman's crontab and restart your MTA's delivery to Mailman. + +Send a message to each of your archive-enabled lists. This will +stimulate the setup of the new per list htdig config files in Mailman +archives. + +Run the nightly_htdig script from the command line to generate a new set +of per list htdig search indices. + +Coping with htdig Upgrades +-------------------------- + +If you change the version of htdig you run, you may find that the +indices built with the earlier version are not compatible with the newer +version of htdig's programs. In that case do the following: + + 1. You will want to suspend operation of Mailman while making this +change. Consider doing a shutdown of the MTA delivering mail to +Mailman and removing Mailman's crontab. + + 2. Run the $prefix/bin/blow_away_htdig script with the -i flag to +remove all existing per list htdig indices/db files. + + 3. Restart Mailman's crontab and restart your MTA's delivery to +Mailman. + + 4.
Run the nightly_htdig script from the command line to generate +new sets of per list htdig search indices. + +Changing the Addressing Scheme of your web_page_url +--------------------------------------------------- + +If you change the addressing scheme of the web_page_url for a list to or +from http then you will need to rebuild the list's htdig configuration +file(s) and the related htdig indices. Do the following: + + 1. You may want to suspend operation of Mailman while making this +change. Consider doing a shutdown of the MTA delivering mail to +Mailman and removing Mailman's crontab. + + 2. Run the $prefix/bin/blow_away_htdig script to remove all existing +per list htdig material for the list(s) concerned. + + 3. Restart Mailman's crontab and restart your MTA's delivery to +Mailman. + + 4. Send a message to each affected list to provoke reconstruction of +the list's htdig config file(s). + + 5. Run the nightly_htdig script from the command line to generate +new sets of per list htdig search indices. + + +Operational Information +======================= + +If you have just turned USE_HTDIG on or just used +$prefix/bin/blow_away_htdig (without the -i flag) there will initially +be no per list htdig information saved in the archives. + +When the first post to each archive-enabled list is archived by +pipermail, the per list htdig config file will be constructed and some +directories and links added to your Mailman archive directories. The +htdig search form will be added to the list's TOC page. + +However, until one of the nightly_htdig scripts is run no htdig indices +will be constructed. You can either wait for the script to run as a cron +job or run it (while using the mailman UID) from the command line. + +Notes and Warnings +================== + +Redhat 7.1 and 7.2 installations: + + If you install htdig from the htdig-3.2.0 binary rpm of RH7.1/2 +Binary CD 1 of 2 you also have to install the htdig-web-3.2.0 binary +rpm.
This may be from RH 7.1/2 Binary CD 2 of 2 or CD 1 of 2 +depending on whether you are using actual CDs or downloaded CD +images. + +Apache/htdig issues + + The htsearch CGI script part of htdig and some associated HTML and +graphics files must be accessible via your web server and the Mailman +configuration variables HTDIG_SEARCH_URL and HTDIG_FILES_URL set up +accordingly. Depending on how you install htdig and Apache you may +need to add Alias and/or ScriptAlias directives to your Apache +configuration file to make the htdig components accessible. Check +the Apache and htdig documentation. + +Contributors +============ + +Original author and maintainer: Richard Barrett - r.barrett@ftel.co.uk + +Past bug fixes: Nigel Metheringham + +Testers: Mark T. Valites, + Rehan van der Merwe + +Appendices +========== + +Appendix 1 - Technique for htdigging when Mailman's web_page_url uses the +https scheme +------------------------------------------------------------------------ +A technique for htdigging when Mailman's web_page_url uses the https +addressing scheme is described in this archived e-mail: +http://www.htdig.org/mail/1999/10/0187.html + +The text of that e-mail is as follows: + +[htdig] Re: Help about htdig indexing https files + +------------------------------------------------------------------------ +Gilles Detillieux (grdetil@scrc.umanitoba.ca) +Wed, 27 Oct 1999 10:18:31 -0500 (CDT) + + +Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] +Next message: Avi Rappoport: "[htdig] indexing SSL (was: Help building +the database)" +Previous message: Gilles Detillieux: "Re: Fw: [htdig] mutiple search +results" +In reply to: Torsten Neuer: "Re: Fw: [htdig] mutiple search results" + +------------------------------------------------------------------------ +According to Edouard DESSIOUX: +> >Currently, htdig will not support URLs that begin with https://, even +> >when using local_urls to bypass the server.
A trick that might work +> >would be to index using http:// instead, but use local_urls to point +> >to the directory that contains the contents of the secure server. +> +> I used that, and now, when i use htsearch, it work, except the fact +> that all my URL are http://x.y.z/ instead of https://x.y.z/ +> +> >You'd need to use separate +> >configuration files for digging and searching, and use +> >url_part_aliases in each of these configuration files to rewrite the +> >http:// into https:// in the search results. +> +> This is the part i dont understand, and i would like you to explain. + + +It basically works as a search and replace. One url_part_aliases in the +configuration file used by htdig maps the http://x.y.z/ into some +special code like "*site", and another url_part_aliases in the +configuration file used by htsearch maps the "*site" back into the value +you want, i.e. https://x.y.z/. The substitution is left to right in +htdig, and right to left in htsearch. So, if you use the same config +file for both, or the same setting for both, you get back what you +started with (but saved some space in the database because of the +encoding). However, if you use two separate config files with different +url_part_aliases setting for htdig and htsearch, you can remap parts of +URLs from one substring to another. + + +I hope this makes things clearer. I thought the current description at +http://www.htdig.org/attrs.html#url_part_aliases was already quite +clear. + + + +-- +Gilles R. Detillieux E-mail: +Spinal Cord Research Centre WWW: +http://www.scrc.umanitoba.ca/~grdetil +Dept. Physiology, U. 
of Manitoba Phone: (204)789-3766 +Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930 +------------------------------------ diff -r -u -P mailman-2.1-index/Mailman/Archiver/HyperArch.py mailman-2.1-htdig/Mailman/Archiver/HyperArch.py --- mailman-2.1-index/Mailman/Archiver/HyperArch.py Thu Jan 2 15:05:35 2003 +++ mailman-2.1-htdig/Mailman/Archiver/HyperArch.py Thu Jan 2 15:09:00 2003 @@ -31,8 +31,11 @@ import re import errno import urllib +import urlparse import time import os +from stat import * +import errno import types import HyperDatabase import pipermail @@ -566,6 +569,9 @@ self.lang = maillist.preferred_language self.charset = Utils.GetCharSet(maillist.preferred_language) + if mm_cfg.USE_HTDIG: + self.setup_htdig() + if hasattr(self.maillist,'archive_volume_frequency'): if self.maillist.archive_volume_frequency == 0: self.ARCHIVE_PERIOD='year' @@ -683,6 +689,7 @@ 'meta': '', "indexing_enable": mm_cfg.ARCHIVE_INDEXING_ENABLE, "indexing_disable": mm_cfg.ARCHIVE_INDEXING_DISABLE, + "htsearch": '', } # Avoid i18n side-effects otrans = i18n.get_translation() @@ -703,6 +710,31 @@ d["archive_listing_end"] = quick_maketext( 'archlistend.html', mlist=mlist) + if mm_cfg.USE_HTDIG: + list_htdig_dir = os.path.join(self.maillist.archive_dir(), + 'htdig') + rundig_file = os.path.join(list_htdig_dir, 'rundig_last_run') + try: + last_rundig_mtime = os.stat(rundig_file)[ST_MTIME] + lastrun = time.strftime("%A, %d %b %Y %H:%M:%S %Z", + time.localtime(last_rundig_mtime)) + except OSError, e: + if e.errno <> errno.ENOENT: raise + lastrun = '[has yet to be built for this new list]' + h = {"listname": self.maillist.internal_name(), + "htconfdir": mm_cfg.HTDIG_MAILMAN_LINK, + "htsearchcgi": mm_cfg.HTDIG_SEARCH_URL, + "lastrun": lastrun, + "htsearchconf": '', + } + conf_name_search = self.maillist.internal_name() + \ + '.htsearch.conf' + conf_file_search = os.path.join(list_htdig_dir, + conf_name_search) + if os.path.exists(conf_file_search): + h['htsearchconf'] = '.htsearch' + 
d["htsearch"] = Utils.maketext('TOC_htsearch.html', + dict=h, raw=1) accum = [] for a in self.archives: @@ -753,6 +785,120 @@ 'indexing_disable': mm_cfg.ARCHIVE_INDEXING_DISABLE, }, mlist=self.maillist) + + def remove_htdig(self, indices_only): + list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig') + if not os.path.exists(list_htdig_dir): + return + conf_name_dig = self.maillist.internal_name() + '.conf' + conf_file_dig = os.path.join(list_htdig_dir, conf_name_dig) + conf_name_search = self.maillist.internal_name() + '.htsearch.conf' + conf_file_search = os.path.join(list_htdig_dir, conf_name_search) + dual_conf_files = None + if os.path.exists(conf_file_search): + dual_conf_files = 1 + if indices_only: + cfd = open(conf_file_dig, 'r') + conf_data_dig = cfd.readlines() + cfd.close() + if dual_conf_files: + cfd = open(conf_file_search, 'r') + conf_data_search = cfd.readlines() + cfd.close() + os.system('rm -rf ' + list_htdig_dir + '/*') + cfd = open(conf_file_dig, 'w') + cfd.writelines(conf_data_dig) + cfd.close() + if dual_conf_files: + cfd = open(conf_file_search, 'w') + cfd.writelines(conf_data_search) + cfd.close() + else: + os.system('rm -rf ' + list_htdig_dir) + conf_file_link_dig = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_dig) + os.unlink(conf_file_link_dig) + if dual_conf_files: + conf_file_link_search = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_search) + os.unlink(conf_file_link_search) + + def setup_htdig(self): + listname = self.maillist.internal_name() + # we want to make a directory to put the mail list's htdig stuff in + list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig') + # but we bug out if this has already been done + try: + os.mkdir(list_htdig_dir, 02775) + except OSError, e: + if e.errno <> errno.EEXIST: raise + return + # assemble the mapping for characterising the htdig config + htdigfiles = mm_cfg.HTDIG_FILES_URL + if mm_cfg.HTDIG_FILES_URL[-1] == '/': + htdigfiles = htdigfiles[:-1] + d =
{'databases': list_htdig_dir, + "filepath": self.maillist.archive_dir() + '/', + "maintainer": Utils.get_site_email(), + "indexing_enable": mm_cfg.ARCHIVE_INDEXING_ENABLE, + "indexing_disable": mm_cfg.ARCHIVE_INDEXING_DISABLE, + "htdig_url": htdigfiles, + "htdig_extras": mm_cfg.HTDIG_EXTRAS, + } + # we need to changes paths to be relative to file system of + # remote machine if we are not running htdig on mailman machine + if mm_cfg.REMOTE_HTDIG: + d['filepath'] = os.path.join( + mm_cfg.REMOTE_PRIVATE_ARCHIVE_FILE_DIR, + listname + '/') + d['databases'] = os.path.join(d['filepath'], 'htdig') + # now the URL through which htdig access to the pipermail data will go + starturl_dig = self.maillist.GetScriptURL('htdig') + '/' + starturl_search = starturl_dig + # we need to know if the addressing scheme for the URL as htdig cannot + # cope with other than http (https for instance) when building indices + # we'll need different conf files for htdig and htsearch in that case + dual_conf_files = None + urlbits = urlparse.urlparse(starturl_dig) + if urlbits[0] != 'http': + urlbits = ('http',) + urlbits[1:] + starturl_dig = urlparse.urlunparse(urlbits) + dual_conf_files = 1 + # create htdig config files. 
we may need one for digging and another + # for searching if the addressing scheme is https these config files + # are slightly different we'll put the files in the directory we just + # created above + conf_name_dig = listname + '.conf' + d['url_part_aliases'] = starturl_dig + " *mm-htdig*" + d['starturl'] = starturl_dig + d['urlpath'] = starturl_dig + conf_file_dig = os.path.join(list_htdig_dir, conf_name_dig) + fd = open(conf_file_dig, 'w') + fd.write(Utils.maketext('htdig_conf.txt', dict=d, raw=1)) + fd.close() + # we need symlinks so that htdig will be able to find the config files + conf_file_link_dig = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, + conf_name_dig) + try: + os.unlink(conf_file_link_dig) + except OSError, e: + if e.errno <> errno.ENOENT: raise + os.symlink(conf_file_dig, conf_file_link_dig) + # make the second conf file and link to it for htsearch if necessary + if dual_conf_files: + conf_name_search = listname + '.htsearch.conf' + d['url_part_aliases'] = starturl_search + " *mm-htdig*" + d['starturl'] = starturl_search + d['urlpath'] = starturl_search + conf_file_search = os.path.join(list_htdig_dir, conf_name_search) + fd = open(conf_file_search, 'w') + fd.write(Utils.maketext('htdig_conf.txt', dict=d, raw=1)) + fd.close() + conf_file_link_search = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, + conf_name_search) + try: + os.unlink(conf_file_link_search) + except OSError, e: + if e.errno <> errno.ENOENT: raise + os.symlink(conf_file_search, conf_file_link_search) def GetArchLock(self): if self._lock_file: diff -r -u -P mailman-2.1-index/Mailman/Cgi/htdig.py mailman-2.1-htdig/Mailman/Cgi/htdig.py --- mailman-2.1-index/Mailman/Cgi/htdig.py Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/Mailman/Cgi/htdig.py Wed Jan 22 11:23:39 2003 @@ -0,0 +1,150 @@ +# Copyright (C) 2002, 2003 by the Free Software Foundation, Inc. 
+# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +"""Provide an authentication wrapper around archives accessed via +returned results from htdig's htsearch. Access via htdig requires the +user's request present a valid cookie authorizing access to the +list's archives for private archives. +This cookie must be obtained by the same process as the user must +adopt for accessing the archive directly rather than via +htsearch results. Indeed the user should only be able to reach the +search facility, which appears on the list archives front page, if +they have been through the authentication process. However, this code +prevents someone hand fettling a URL on the browser or using one +given to them by an authorised user, which might compromise the +list's privacy. +""" + +# this code was derived from the private.py cgi script + +import sys +import os +import cgi +import mimetypes +import re + +from Mailman import mm_cfg +from Mailman import Utils +from Mailman import MailList +from Mailman import Errors +from Mailman import i18n +from Mailman.htmlformat import * +from Mailman.Logging.Syslog import syslog + +# Set up i18n. Until we know which list is being requested, we use the +# server's default. 
+_ = i18n._ +i18n.set_language(mm_cfg.DEFAULT_SERVER_LANGUAGE) + +errors = {'path': _('The requested document cannot be found.'), + 'data': _('The requested document cannot be read.'), + 'auth': Utils.maketext('htdig_auth_failure.html', dict=None, raw=0) + } + +def true_path(path): + "Ensure that the path is safe by removing .." + path = path.replace("../", "") + path = path.replace("./", "") + return path[1:] + + +def make_inserts(listname): + urlbase = mm_cfg.DEFAULT_URL or \ + mm_cfg.DEFAULT_URL_PATTERN % mm_cfg.DEFAULT_URL_HOST + return { + 'mailto': Utils.get_site_email(), + 'listinfo_link': urlbase + '/listinfo/' + listname, + 'referer': os.environ.get('HTTP_REFERER', _('Referer not known')), + 'uri': os.environ.get('REQUEST_URI', _('URI not known')), + } + + +def error_quit(listname, error_type): + d = make_inserts(listname) + d['error'] = errors[error_type] + charset = Utils.GetCharSet(mm_cfg.DEFAULT_SERVER_LANGUAGE) + stuff = 'Content-type: text/html; charset=' + charset + '\n\n' + stuff += Utils.maketext('htdig_access_error.html', dict=d, + lang=mm_cfg.DEFAULT_SERVER_LANGUAGE) + print stuff + sys.exit(0) + + +def main(): + list_info = Utils.GetPathPieces() + listname = '' + access_failure = 'path' + if list_info: + access_failure = '' + try: + path = os.environ.get('PATH_INFO') + true_filename = os.path.join(mm_cfg.PRIVATE_ARCHIVE_FILE_DIR, + true_path(path)) + # The path should be: + # // + # / + num_parts = len(list_info) + if (num_parts == 3) or (num_parts == 2): + listname = list_info[0].lower() + try: + mlist = MailList.MailList(listname, lock=0) + except: + access_failure = 'list' + else: + if num_parts == 3 and \ + list_info[1] in ('database', 'htdig'): + access_failure = 'path' + elif not (os.path.exists(true_filename) and \ + os.path.isfile(true_filename)): + access_failure = 'file' + elif num_parts == 2 and \ + not re.compile(r'\.(html|txt|txt\.gz)$').search(true_filename): + access_failure = 'file' + else: + access_failure = 'path' + except: + 
access_failure = 'path' + + if access_failure: + error_quit(listname, 'path') + + # We only need to authorize the user if it's a private archive + if mlist.archive_private: + is_auth = mlist.WebAuthenticate((mm_cfg.AuthUser, + mm_cfg.AuthListModerator, + mm_cfg.AuthListAdmin, + mm_cfg.AuthSiteAdmin), + '', '') + if not is_auth: + error_quit(listname, 'auth') + + # OK to output the desired file + try: + f = open(true_filename, 'r') + except IOError: + error_quit(listname, 'data') + ctype, cencode = mimetypes.guess_type(true_filename) + if not (ctype or cencode): + ctype = 'application/octet-stream' + elif cencode: + ctype = "application/x-%s" % cencode + print "Content-type: %s\n\n" % ctype + while (1): + data = f.read(16384) + if data == "": break + sys.stdout.write(data) + f.close() + sys.exit(0) diff -r -u -P mailman-2.1-index/Mailman/Defaults.py.in mailman-2.1-htdig/Mailman/Defaults.py.in --- mailman-2.1-index/Mailman/Defaults.py.in Thu Jan 2 15:05:35 2003 +++ mailman-2.1-htdig/Mailman/Defaults.py.in Thu Jan 2 15:09:00 2003 @@ -1204,6 +1204,48 @@ #ARCHIVE_INDEXING_ENABLE = '\n' #ARCHIVE_INDEXING_DISABLE = '\n' +ARCHIVE_INDEXING_ENABLE = '' +ARCHIVE_INDEXING_DISABLE = '' +# htdig integration parameters +# if you set USE_HTDIG then you must also set HTDIG_MAILMAN_LINK +# and HTDIG_RUNDIG_PATH to suit your htdig installation, for instance: +# HTDIG_MAILMAN_LINK = 'htdig-mailman' +# HTDIG_RUNDIG_PATH = '/usr/bin/rundig' +USE_HTDIG = 0 # 0 - don't use integrated htdig, 1 - use it +HTDIG_ARCHIVE_URL = '/mailman/htdig/' # must end in a slash +HTDIG_SEARCH_URL = '/cgi-bin/htsearch' +HTDIG_FILES_URL = '/htdig/' +HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig') +HTDIG_MAILMAN_LINK = 'htdig-mailman' +HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig' + +# you can use the HTDIG_EXTRAS parameter to add arbitrary htdig +# configuration attributes to per list htdig config files. 
The string +# value you specify is inserted verbatim at the top of each htdig conf +# file when it is generated. The default value does nothing. Make sure +# you understand what you are doing before you fool with this facility. +HTDIG_EXTRAS = """\ +# start of extra site specific htdig configuration attributes +# +# replace these lines with your htdig config attribute declarations +# as defined at http://www.htdig.org/confindex.html +# +# end of extra site specific htdig configuration attributes +""" + +# remote htdig support parameters for mailman-htdig integration +# provides support for running htdig on a different machine from the one +# running mailman but one having NFS access to the installation directory +# of the Mailman package. +# set REMOTE_HTDIG if you are running htdig on a different machine to +# Mailman. Has no effect unless you also set USE_HTDIG +# REMOTE_PRIVATE_ARCHIVE_FILE_DIR is the absolute path to the directory in +# which Mailman stores private archives as seen by the machine running htdig. +# It should resolve to the same directory as PRIVATE_ARCHIVE_FILE_DIR when +# viewed from the remote system. +REMOTE_HTDIG = 0 # 0 - htdig runs on Mailman machine, 1 - runs on remote machine +REMOTE_PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'private') + # Vgg: Language descriptions and charsets dictionary, any new supported # language must have a corresponding entry here. Key is the name of the # directories that hold the localized texts.
Data are tuples with first diff -r -u -P mailman-2.1-index/Mailman/Queue/ArchRunner.py mailman-2.1-htdig/Mailman/Queue/ArchRunner.py --- mailman-2.1-index/Mailman/Queue/ArchRunner.py Thu Jul 25 06:47:48 2002 +++ mailman-2.1-htdig/Mailman/Queue/ArchRunner.py Thu Jan 2 15:09:00 2003 @@ -17,17 +17,33 @@ """Outgoing queue runner.""" import time +import errno +import os +from stat import * from email.Utils import parsedate_tz, mktime_tz, formatdate from Mailman import mm_cfg from Mailman import LockFile from Mailman.Queue.Runner import Runner +from Mailman import MailList +from Mailman import Utils +from Mailman.Archiver import HyperArch +# Part of the Mailman-htdig integration. +# This controls how often _doperiodic() will try to deal with the +# consequences of a remote machine having reindexed mail archives +# and hence the need for the MM machine to update affected lists' +# TOC pages to reflect the datetime when the htdigging was done. +CHECK_REMOTE_RUNDIG_EFFECTS = 10 class ArchRunner(Runner): QDIR = mm_cfg.ARCHQUEUE_DIR + def __init__(self, slice=None, numslices=1): + Runner.__init__(self, slice, numslices) + self._periodic_htdig_check = CHECK_REMOTE_RUNDIG_EFFECTS + def _dispose(self, mlist, msg, msgdata): # Support clobber_date, i.e. setting the date in the archive to the # received date, not the (potentially bogus) Date: header of the @@ -74,3 +90,35 @@ mlist.Save() finally: mlist.Unlock() + + def _doperiodic(self): + """Do some processing `every once in a while'. + + If the mailman-htdig archiving is being used then we want to ensure + that the TOC page for each list has been updated since the last time + htdigging of the list was done.
This is only necessary if we are + running htdig on a different machine to Mailman + + """ + + if mm_cfg.USE_HTDIG and mm_cfg.REMOTE_HTDIG: + self._periodic_htdig_check -= 1 + if self._periodic_htdig_check <= 0: + self._periodic_htdig_check = CHECK_REMOTE_RUNDIG_EFFECTS + listnames = Utils.list_names() + for name in listnames: + mlist = MailList.MailList(name, lock=0) + if not mlist.last_post_time > 0: continue + arch_dir = mlist.archive_dir() + rundig_run_file = os.path.join(arch_dir, 'htdig', + 'rundig_last_run') + toc_file = os.path.join(arch_dir, 'index.html') + try: + last_rundig_time = os.stat(rundig_run_file)[ST_MTIME] + last_toc_time = os.stat(toc_file)[ST_MTIME] + except OSError, e: + if e.errno <> errno.ENOENT: raise + else: + if last_rundig_time > last_toc_time: + HyperArch.HyperArchive(mlist).write_TOC() + diff -r -u -P mailman-2.1-index/Makefile.in mailman-2.1-htdig/Makefile.in --- mailman-2.1-index/Makefile.in Thu Dec 12 04:30:35 2002 +++ mailman-2.1-htdig/Makefile.in Thu Jan 2 15:09:00 2003 @@ -42,7 +42,7 @@ VAR_DIRS= \ logs archives lists locks data spam qfiles \ - archives/private archives/public + archives/private archives/public archives/htdig ARCH_INDEP_DIRS= \ bin templates scripts cron pythonlib \ diff -r -u -P mailman-2.1-index/bin/Makefile.in mailman-2.1-htdig/bin/Makefile.in --- mailman-2.1-index/bin/Makefile.in Wed Dec 4 14:44:25 2002 +++ mailman-2.1-htdig/bin/Makefile.in Thu Jan 2 15:09:00 2003 @@ -47,7 +47,7 @@ version config_list list_lists dumpdb cleanarch \ list_admins genaliases change_pw mailmanctl qrunner inject \ unshunt fix_url.py convert.py transcheck b4b5-archfix \ - list_owners + list_owners blow_away_htdig BUILDDIR= ../build/bin diff -r -u -P mailman-2.1-index/bin/blow_away_htdig mailman-2.1-htdig/bin/blow_away_htdig --- mailman-2.1-index/bin/blow_away_htdig Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/bin/blow_away_htdig Thu Jan 2 15:09:00 2003 @@ -0,0 +1,122 @@ +#! 
@PYTHON@ +# +# Copyright (C) 2002 by the Free Software Foundation, Inc. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. +# +"""Blow away the per list htdig files. + +This script is for when you: + a. decide to stop using Mailman-htdig integration + b. move from local to remote htdig or vice-versa + c. are upgrading to a version of htdig which has an incompatible + index/db file format + +You really want to stop Mailman operating while you are running this. For +instance, shutdown the MTA delivering mail to Mailman and remove Mailman's +crontab. + +Usage: %(program)s [-v] [-h] [-i] [listnames] + +Where: + --verbose / -v + print each list as its htdig files are removed + + --help / -h + print this message and exit + + --indices / -i + only delete htdig search indices for the lists + leave the htdig conf file in place + + listnames + Optionally, only removes htdig files for the named lists. Without + this, all archivable lists are processed.
+ +""" + +# this code was derived from the nightly_gzip cron script + +import sys +import os +from stat import * +import time +from stat import * +import getopt +import paths +from Mailman import MailList +from Mailman import Utils +from Mailman import mm_cfg +from Mailman.Archiver import HyperArch +from Mailman.i18n import _ + +program = sys.argv[0] + +def usage(code, msg=''): + print >> sys.stderr, _( __doc__) + if msg: + print msg + sys.exit(code) + +def main(): + try: + opts, args = getopt.getopt(sys.argv[1:], 'vhi', + ['verbose', 'help', 'indices']) + except getopt.error, msg: + usage(1, msg) + + # defaults + verbose = 0 + indices_only = 0 + for opt, arg in opts: + if opt in ('-h', '--help'): + usage(0) + elif opt in ('-v', '--verbose'): + verbose = 1 + elif opt in ('-i', '--indices'): + indices_only = 1 + # limit to the specified lists? + if args: + listnames = args + else: + listnames = Utils.list_names() + + # make sure htdig use is off for the moment in this process + mm_cfg.USE_HTDIG = 0 + + # process all the specified lists + for name in listnames: + mlist = MailList.MailList(name, lock=0) + if not mlist.archive: + continue + archive = HyperArch.HyperArchive(mlist) + if verbose: + if indices_only: + print _('Blowing away all htdig indices of list %(name)s') + else: + print _('Blowing away all htdig stuff of list %(name)s') + archive.remove_htdig(indices_only) + archive.write_TOC() + +if __name__ == '__main__' and \ + mm_cfg.USE_HTDIG and \ + mm_cfg.ARCHIVE_TO_MBOX in (0, 2): + # we're only going to run this if messages are archived to the internal + # archiver and we are using htdig to provide archive search + omask = os.umask(002) + try: + main() + finally: + os.umask(omask) diff -r -u -P mailman-2.1-index/configure mailman-2.1-htdig/configure --- mailman-2.1-index/configure Tue Dec 31 21:49:40 2002 +++ mailman-2.1-htdig/configure Thu Jan 2 15:09:00 2003 @@ -2065,6 +2065,7 @@ SCRIPTS="build/bin/add_members:bin/add_members \ build/bin/arch:bin/arch \ 
+build/bin/blow_away_htdig:bin/blow_away_htdig \ build/bin/change_pw:bin/change_pw \ build/bin/check_db:bin/check_db \ build/bin/check_perms:bin/check_perms \ @@ -2104,6 +2105,10 @@ build/cron/gate_news:cron/gate_news \ build/cron/mailpasswds:cron/mailpasswds \ build/cron/nightly_gzip:cron/nightly_gzip \ +build/cron/nightly_htdig:cron/nightly_htdig \ +build/cron/remote_nightly_htdig:cron/remote_nightly_htdig \ +build/cron/remote_nightly_htdig_noshare:cron/remote_nightly_htdig_noshare \ +build/cron/remote_nightly_htdig.pl:cron/remote_nightly_htdig.pl \ build/cron/senddigests:cron/senddigests \ " diff -r -u -P mailman-2.1-index/configure.in mailman-2.1-htdig/configure.in --- mailman-2.1-index/configure.in Tue Dec 31 21:49:40 2002 +++ mailman-2.1-htdig/configure.in Thu Jan 2 15:09:00 2003 @@ -541,6 +541,7 @@ AC_DEFUN(MM_SCRIPTS, [dnl bin/add_members \ bin/arch \ +bin/blow_away_htdig \ bin/change_pw \ bin/check_db \ bin/check_perms \ @@ -580,6 +581,10 @@ cron/gate_news \ cron/mailpasswds \ cron/nightly_gzip \ +cron/nightly_htdig \ +cron/remote_nightly_htdig \ +cron/remote_nightly_htdig_noshare \ +cron/remote_nightly_htdig.pl \ cron/senddigests \ ]) diff -r -u -P mailman-2.1-index/cron/Makefile.in mailman-2.1-htdig/cron/Makefile.in --- mailman-2.1-index/cron/Makefile.in Sat Mar 16 06:57:37 2002 +++ mailman-2.1-htdig/cron/Makefile.in Thu Jan 2 15:09:00 2003 @@ -41,7 +41,9 @@ SHELL= /bin/sh PROGRAMS= checkdbs mailpasswds senddigests gate_news \ - nightly_gzip bumpdigests disabled + nightly_gzip bumpdigests disabled \ + nightly_htdig remote_nightly_htdig \ + remote_nightly_htdig_noshare remote_nightly_htdig.pl FILES= crontab.in BUILDDIR= ../build/cron diff -r -u -P mailman-2.1-index/cron/crontab.in.in mailman-2.1-htdig/cron/crontab.in.in --- mailman-2.1-index/cron/crontab.in.in Sun Jan 6 06:28:12 2002 +++ mailman-2.1-htdig/cron/crontab.in.in Thu Jan 2 15:09:00 2003 @@ -18,6 +18,11 @@ # or want to exclusively use a callback strategy instead of polling. 
0,5,10,15,20,25,30,35,40,45,50,55 * * * * @PYTHON@ -S @prefix@/cron/gate_news # +# At 2:19am every night, regenerate htdig search files. Only +# turn this on if the internal archiver is used and htdig +# use enabled in mm_cfg.py with USE_HTDIG +19 2 * * * @PYTHON@ -S @prefix@/cron/nightly_htdig +# # At 3:27am every night, regenerate the gzip'd archive file. Only # turn this on if the internal archiver is used and # GZIP_ARCHIVE_TXT_FILES is false in mm_cfg.py diff -r -u -P mailman-2.1-index/cron/nightly_htdig mailman-2.1-htdig/cron/nightly_htdig --- mailman-2.1-index/cron/nightly_htdig Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/cron/nightly_htdig Thu Jan 2 15:09:00 2003 @@ -0,0 +1,155 @@ +#! @PYTHON@ +# +# Copyright (C) 2002 by the Free Software Foundation, Inc. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. +# +"""Re-generate the htdig archive search files. + +This script should normally be run nightly from cron. When run from the +command line, the following usage is understood: + +Usage: %(program)s [-v] [-h] [listnames] + +Where: + --verbose / -v + print each list as htdig is run for it + + listnames + Optionally, only runs htdig for the named lists. Without + this, all archivable lists are processed. 
+ + --help / -h + print this message and exit + +""" + +# this code was derived from the nightly_gzip cron script + +import sys +import os +from stat import * +import time +from types import * +import getopt +import paths +import errno +from Mailman import MailList +from Mailman import Utils +from Mailman import mm_cfg +from Mailman.Archiver import HyperArch +from Mailman.i18n import _ + +program = sys.argv[0] + +def usage(code, msg=''): + print >> sys.stderr, _( __doc__) + if msg: + print msg + sys.exit(code) + +def main(): + try: + opts, args = getopt.getopt(sys.argv[1:], 'vh', ['verbose', 'help']) + except getopt.error, msg: + usage(1, msg) + + # defaults + verbose = 0 + for opt, arg in opts: + if opt in ('-h', '--help'): + usage(0) + elif opt in ('-v', '--verbose'): + verbose = 1 + + # limit to the specified lists? + if args: + listnames = args + else: + listnames = Utils.list_names() + + # process all the specified lists + for name in listnames: + mlist = MailList.MailList(name, lock=0) + if not mlist.archive: + continue + archive_dir = mlist.archive_dir() + try: + os.listdir(archive_dir) + except os.error: + # has the list received any messages? if not, + # last_post_time will + # be zero, so it's not really a bogus archive dir. + if mlist.last_post_time > 0: + print _( + 'List %(name)s has a bogus archive dir: %(archive_dir)s') + continue + + # check htdig has been set up for this list and skip it if not + list_htdig_dir = os.path.join(archive_dir, 'htdig') + if not os.path.exists(list_htdig_dir): + if verbose: + print _('Skipping htdig for list; no htdig setup: %(name)s') + continue + + # check if there have been any archive files created since we + # last ran htdig and skip list if not. 
well actually we only + # test if the archive volume directories mod times have + # changed + recent_posts = None + rundig_run_file = os.path.join(list_htdig_dir, 'rundig_last_run') + archive = HyperArch.HyperArchive(mlist) + try: + last_rundig_time = os.stat(rundig_run_file)[ST_MTIME] + except OSError, e: + if e.errno <> errno.ENOENT: raise + open(rundig_run_file, 'w').close() + recent_posts = 1 + else: + for volume in archive.archives: + archive_name = os.path.join(archive_dir, volume) + last_archive_change = os.stat(archive_name)[ST_MTIME] + if last_archive_change > last_rundig_time: + recent_posts = 1 + break + if not recent_posts: + if verbose: + print _('Skipping htdig for list; no recent posts: %(name)s') + continue + + # ok, so running htdig is worthwhile + if verbose: + print _("htdig'ing archive of list: %(name)s") + htdig_conf_file = os.path.join(list_htdig_dir, name + '.conf') + cmd = '%s -c %s' % (mm_cfg.HTDIG_RUNDIG_PATH, htdig_conf_file) + status = (os.system(cmd) >> 8) & 0xff + if status: + print _('rundig failed for list %(name)s, exit code: %(status)s') + else: + os.utime(rundig_run_file, None) + archive.write_TOC() + +if __name__ == '__main__' and \ + mm_cfg.USE_HTDIG and \ + mm_cfg.ARCHIVE_TO_MBOX in (0, 2) and \ + os.path.exists(mm_cfg.HTDIG_RUNDIG_PATH): + # we're only going to run the nightly rundig if messages are + # archived to the internal archiver, we are using htdig to provide + # archive search and we know where rundig is. + omask = os.umask(002) + try: + main() + finally: + os.umask(omask) diff -r -u -P mailman-2.1-index/cron/remote_nightly_htdig mailman-2.1-htdig/cron/remote_nightly_htdig --- mailman-2.1-index/cron/remote_nightly_htdig Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/cron/remote_nightly_htdig Thu Jan 2 15:09:00 2003 @@ -0,0 +1,163 @@ +#! @PYTHON@ +# +# Copyright (C) 2002 by the Free Software Foundation, Inc. 
+# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. +# +"""Python script to re-generate the htdig archive search files. Read +INSTALL.htdig-mm to determine if you should be running this script. + +This script has to be edited before use to provide a value for the +configuration parameter MAILMAN_PATH. The value should be the path +to Mailman's installation directory as seen by the script. + +This script should normally be run nightly from cron. When run from the +command line, the following usage is understood: + +Usage: %(program)s [-v] [-h] [listnames] + +Where: + --verbose / -v + print each list as htdig is run for it + + --help / -h + print this message and exit + + listnames + Optionally, only runs htdig for the named lists. Without + this, all archivable lists are processed. 
+ +""" + +# this code was derived from the nightly_gzip cron script + +MAILMAN_PATH = '' + +import sys +import os +from stat import * +import time +import getopt +import string +from types import * +import errno + +import urllib, urlparse +import paths + +program = sys.argv[0] + +def usage(code, msg=''): + print >> sys.stderr, _( __doc__) + if msg: + print msg + sys.exit(code) + +def main(): + try: + opts, args = getopt.getopt(sys.argv[1:], 'vhm', ['verbose', 'help']) + except getopt.error, msg: + usage(1, msg) + # defaults + verbose = 0 + for opt, arg in opts: + if opt in ('-h', '--help'): + usage(0) + elif opt in ('-v', '--verbose'): + verbose = 1 + # limit to the specified lists? + if args: + listnames = map(string.lower, args) + else: + listnames = Utils.list_names() + # process all the specified lists + for name in listnames: + mlist = MailList.MailList(name, lock=0) + if not mlist.archive: + continue + archive_dir = os.path.join(mm_cfg.REMOTE_PRIVATE_ARCHIVE_FILE_DIR, + name + '/') + try: + os.listdir(archive_dir) + except os.error: + # has the list received any messages? if not, last_post_time + # will be zero, so it's not really a bogus archive dir. + if mlist.last_post_time > 0: + print _( + 'List %(name)s has a bogus archive dir: %(archive_dir)s') + continue + # check htdig has been set up for this list and skip it if not + list_htdig_dir = os.path.join(archive_dir, 'htdig') + if not os.path.exists(list_htdig_dir): + if verbose: + print _( + 'Skipping remote htdig for list; no htdig setup: %(name)s') + continue + # check if there have been any archive files created since we + # last ran htdig and skip list if not. 
well actually we only + # test if the archive volume directories mod times have changed + recent_posts = None + rundig_run_file = os.path.join(list_htdig_dir, 'rundig_last_run') + archive = HyperArch.HyperArchive(mlist) + try: + last_rundig_time = os.stat(rundig_run_file)[ST_MTIME] + except OSError, e: + if e.errno <> errno.ENOENT: raise + open(rundig_run_file, 'w').close() + recent_posts = 1 + else: + for volume in archive.archives: + archive_name = os.path.join(archive_dir, volume) + last_archive_change = os.stat(archive_name)[ST_MTIME] + if last_archive_change > last_rundig_time: + recent_posts = 1 + break + if not recent_posts: + if verbose: + print _('Skipping htdig for list; no recent posts: %(name)s') + continue + # ok, so running htdig is worthwhile + if verbose: + print _("htdig'ing archive of list: %(name)s") + htdig_conf_file = os.path.join(list_htdig_dir, name + '.conf') + cmd = '%s -c %s' % (mm_cfg.HTDIG_RUNDIG_PATH, htdig_conf_file) + status = (os.system(cmd) >> 8) & 0xff + if status: + print _('rundig failed for list %(name)s, exit code: %(status)s') + else: + os.utime(rundig_run_file, None) + +if __name__ == '__main__' and os.path.exists(MAILMAN_PATH): + # Access the mailman installation + sys.path = [MAILMAN_PATH] + sys.path + from Mailman import MailList + from Mailman import Utils + from Mailman import mm_cfg + from Mailman.Archiver import HyperArch + from Mailman.i18n import _ + if mm_cfg.USE_HTDIG and \ + mm_cfg.ARCHIVE_TO_MBOX in (0, 2) and \ + os.path.exists(mm_cfg.HTDIG_RUNDIG_PATH): + # we're only going to run the nightly rundig if messages are archived to + # the internal archiver, we are using htdig to provide archive search + # and we know where rundig is. 
+ omask = os.umask(002) + try: + main() + finally: + os.umask(omask) +else: + print 'Invalid configuration variables' + print 'Edit this script in accordance with INSTALL.htdig-mm' diff -r -u -P mailman-2.1-index/cron/remote_nightly_htdig.pl mailman-2.1-htdig/cron/remote_nightly_htdig.pl --- mailman-2.1-index/cron/remote_nightly_htdig.pl Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/cron/remote_nightly_htdig.pl Wed Jan 22 11:53:16 2003 @@ -0,0 +1,151 @@ +#! /usr/bin/env perl +# +# Copyright (C) 2002 by the Free Software Foundation, Inc. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. +# + +# this code was derived from the nightly_gzip cron script + +my $REMOTE_PRIVATE_ARCHIVE_FILE_DIR = ''; +my $HTDIG_RUNDIG_PATH = ''; + +use strict; +use File::Spec; +use File::stat; +use LWP::Simple; +use Getopt::Long; + +my $doc = < \$VERBOSE, + "help" => \$help); + if ($help) { + usage(0); + } + # limit to the specified lists? + my @listnames = (); + if (scalar(@ARGV)) { + @listnames = map lc, @ARGV; + } else { + opendir(DIR, $REMOTE_PRIVATE_ARCHIVE_FILE_DIR); + @listnames = grep { $_ ne '.' and $_ ne '..' and ! 
/\.mbox$/ } readdir DIR; + closedir(DIR); + } + # process all the specified lists + foreach my $name (@listnames) { + my $archive_dir = File::Spec->catfile($REMOTE_PRIVATE_ARCHIVE_FILE_DIR, $name); + next if (not -e $archive_dir); + # check htdig has been set up for this list and skip it if not + my $list_htdig_dir = File::Spec->catfile($archive_dir, 'htdig'); + if (not -e $list_htdig_dir) { + print "Skipping remote htdig for list $name, no htdig setup\n" if $VERBOSE; + next + } + # check if there have been any archive files created since we + # last ran htdig and skip list if not. well actually we only + # test if the archive volume directories mod times have changed + my $recent_posts = 0; + my $rundig_run_file = File::Spec->catfile($list_htdig_dir, 'rundig_last_run'); + if (-e $rundig_run_file){ + my $last_rundig_time = stat($rundig_run_file)->mtime(); + opendir(DIR, $archive_dir); + my @volumes = grep { $_ ne '.' and $_ ne '..' } readdir DIR; + closedir(DIR); + foreach my $volume (@volumes) { + my $archive_name = File::Spec->catfile($archive_dir, $volume); + my $last_archive_change = stat($archive_name)->mtime(); + if ($last_archive_change > $last_rundig_time) { + $recent_posts = 1; + last; + } + } + } else { + $recent_posts = 1; + } + if (not $recent_posts) { + if ($VERBOSE) { + print "Skipping htdig for list $name, no recent posts\n"; + } + next; + } + # ok, so running htdig is worthwhile + if ($VERBOSE) { + print "htdig'ing archive of list $name\n"; + } + my $htdig_conf_file = File::Spec->catfile($list_htdig_dir, $name.'.conf'); + my @cmd = ($HTDIG_RUNDIG_PATH, '-c', $htdig_conf_file); + my $status = system(@cmd) >> 8 & 0xFF; + if ($status) { + print "rundig failed for list, $name, exit code, $status\n"; + } else { + system(("touch", $rundig_run_file)); + } + } +} + +if (-x $HTDIG_RUNDIG_PATH and + -d $REMOTE_PRIVATE_ARCHIVE_FILE_DIR) { + # we're only going to run the nightly rundig if we have a sensible + # set of configuration variables and we know where 
rundig is. + my $omask = umask; + umask(002); + eval { main() }; + my $res = $@; + umask($omask); + die $res if ($res); +} else { + die "Invalid configuration variables.\nEdit this script in accordance with INSTALL.htdig-mm\n"; +} diff -r -u -P mailman-2.1-index/cron/remote_nightly_htdig_noshare mailman-2.1-htdig/cron/remote_nightly_htdig_noshare --- mailman-2.1-index/cron/remote_nightly_htdig_noshare Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/cron/remote_nightly_htdig_noshare Wed Jan 22 11:49:07 2003 @@ -0,0 +1,151 @@ +#! @PYTHON@ +# +# Copyright (C) 2002 by the Free Software Foundation, Inc. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. +# +"""Python script to re-generate the htdig archive search files. Read +INSTALL.htdig-mm to determine if you should be running this script. + +This script has to be edited before use to provide values for the +configuration parameters REMOTE_PRIVATE_ARCHIVE_FILE_DIR and +HTDIG_RUNDIG_PATH. The values should be the same as those acquired by +other of Mailman's python code from $prefix/Mailman/Defaults.py or +overridden in $prefix/Mailman/mm_cfg.py. + +This script should normally be run nightly from cron. 
When run from the +command line, the following usage is understood: + +Usage: %(program)s [-v] [-h] [listnames] + +Where: + --verbose / -v + print each list as htdig is run for it + + --help /-h + print this message and exit + + listnames + Optionally, only runs htdig for the named lists. Without + this, all archivable lists are processed. + +""" + +# this code was derived from the nightly_gzip cron script + +REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '' +HTDIG_RUNDIG_PATH = '' + +import sys +import os +from stat import * +import time +from types import * +import getopt +import urllib +import urlparse +import string +import errno + +program = sys.argv[0] + +def usage(code, msg=''): + print __doc__ % globals() + if msg: + print msg + sys.exit(code) + +def main(): + try: + opts, args = getopt.getopt(sys.argv[1:], 'vhm', ['verbose', 'help']) + except getopt.error, msg: + usage(1, msg) + + # defaults + verbose = 0 + for opt, arg in opts: + if opt in ('-h', '--help'): + usage(0) + elif opt in ('-v', '--verbose'): + verbose = 1 + # limit to the specified lists? + if args: + listnames = map(string.lower, args) + else: + listnames = filter(lambda m: m[-5:] != '.mbox', + os.listdir(REMOTE_PRIVATE_ARCHIVE_FILE_DIR)) + # process all the specified lists + listnames.sort() + for name in listnames: + archive_dir = os.path.join(REMOTE_PRIVATE_ARCHIVE_FILE_DIR, name) + # check if this list has an archive and skip it if not + if not os.path.exists(archive_dir): + if verbose: + print 'Skipping remote htdig for list', name, 'no archive' + continue + # check htdig has been set up for this list and skip it if not + list_htdig_dir = os.path.join(archive_dir, 'htdig') + if not os.path.exists(list_htdig_dir): + if verbose: + print 'Skipping remote htdig for list', name, 'no htdig setup' + continue + # check if there have been any archive files created since we + # last ran htdig and skip list if not. 
well actually we only + # test if the archive volume directories mod times have changed + recent_posts = None + rundig_run_file = os.path.join(list_htdig_dir, 'rundig_last_run') + try: + last_rundig_time = os.stat(rundig_run_file)[ST_MTIME] + except OSError, e: + if e.errno <> errno.ENOENT: raise + open(rundig_run_file, 'w').close() + recent_posts = 1 + else: + for volume in os.listdir(archive_dir): + archive_name = os.path.join(archive_dir, volume) + last_archive_change = os.stat(archive_name)[ST_MTIME] + if last_archive_change > last_rundig_time: + recent_posts = 1 + break + if not recent_posts: + if verbose: + print 'Skipping htdig for list', name, 'no recent posts' + continue + # ok, so running htdig is worthwhile + if verbose: + print "htdig'ing archive of list", name + htdig_conf_file = os.path.join(list_htdig_dir, name + '.conf') + cmd = '%s -c %s' % (HTDIG_RUNDIG_PATH, htdig_conf_file) + status = (os.system(cmd) >> 8) & 0xff + if status: + print 'rundig failed for list %s, exit code: %s' % (name, status) + else: + os.utime(rundig_run_file, None) + +if __name__ == '__main__' and \ + os.path.exists(REMOTE_PRIVATE_ARCHIVE_FILE_DIR) and \ + os.path.exists(HTDIG_RUNDIG_PATH): + # we're only going to run the nightly rundig if we have a sensible + # set of configuration variables and we know where rundig is. 
+ omask = os.umask(002) + try: + main() + finally: + os.umask(omask) +else: + print "Invalid configuration variables" + print "Edit this script in accordance with INSTALL.htdig-mm" + + diff -r -u -P mailman-2.1-index/src/Makefile.in mailman-2.1-htdig/src/Makefile.in --- mailman-2.1-index/src/Makefile.in Thu Dec 12 04:30:37 2002 +++ mailman-2.1-htdig/src/Makefile.in Thu Jan 2 15:09:00 2003 @@ -70,7 +70,8 @@ # Fixed definitions CGI_PROGS= admindb admin confirm create edithtml listinfo options \ - private rmlist roster subscribe + private rmlist roster subscribe \ + htdig COMMONOBJS= common.o vsnprintf.o @@ -78,7 +79,7 @@ #ALIAS_PROGS= addaliases -SUID_CGI_PROGS= private +SUID_CGI_PROGS= private htdig SUID_MAIL_PROGS= diff -r -u -P mailman-2.1-index/templates/en/TOC_htsearch.html mailman-2.1-htdig/templates/en/TOC_htsearch.html --- mailman-2.1-index/templates/en/TOC_htsearch.html Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/templates/en/TOC_htsearch.html Thu Jan 2 15:09:00 2003 @@ -0,0 +1,40 @@ +

+ To search this archive fill in the following form: +

+

+

+ + Match: + Format: + Sort by: + + + + +
+ Search: + + +
+

+

+ Note: The archive search index was last rebuilt at + %(lastrun)s. Any postings after that will not be found by + a search. Index rebuild is usually done once every 24 hours for + this list. You can use a "View by date" link below to access + more recent postings. +

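The note above warns that postings made after the last index rebuild cannot be found by a search. As a rough illustration, the staleness test used by the nightly scripts in this patch (compare the modification time of each archive volume directory with that of the `rundig_last_run` marker file) can be sketched in Python 3 as below. The helper names `needs_reindex` and `last_run_text` are hypothetical, and it is only an assumption that `%(lastrun)s` is filled from the marker file's modification time; the patch's own scripts are written in Python 2 and Perl.

```python
import os
import time


def needs_reindex(archive_dir, marker_path):
    """Return True if any entry of archive_dir changed after the marker.

    Hypothetical helper: mirrors the mtime comparison in the nightly
    scripts, but is not part of the patch itself.
    """
    try:
        last_run = os.stat(marker_path).st_mtime
    except FileNotFoundError:
        # No marker file yet, so the archive has never been indexed.
        return True
    return any(
        os.stat(os.path.join(archive_dir, entry)).st_mtime > last_run
        for entry in os.listdir(archive_dir)
    )


def last_run_text(marker_path):
    """Format the marker's mtime for display (assumed source of lastrun)."""
    mtime = os.stat(marker_path).st_mtime
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))
```

As in the scripts, only directory modification times are compared, so a change to a file that does not update its volume directory's mtime would go unnoticed until the next posting creates a new file.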
diff -r -u -P mailman-2.1-index/templates/en/archtoc.html mailman-2.1-htdig/templates/en/archtoc.html --- mailman-2.1-index/templates/en/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/en/archtoc.html Thu Jan 2 15:09:00 2003 @@ -13,6 +13,7 @@ or you can download the full raw archive (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/en/htdig_access_error.html mailman-2.1-htdig/templates/en/htdig_access_error.html --- mailman-2.1-index/templates/en/htdig_access_error.html Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/templates/en/htdig_access_error.html Thu Jan 2 15:09:00 2003 @@ -0,0 +1,22 @@ + + + htdig Archives Access Failure + + +

htdig Archives Access Failure

+%(error)s +

+ If you want to make another attempt to access a list archive then go via the + list users information page. +

+

+ If this problem persists then please e-mail the following information to the +%(mailto)s: +

+
+    %(referer)s
+    %(uri)s
+
+
+ + diff -r -u -P mailman-2.1-index/templates/en/htdig_auth_failure.html mailman-2.1-htdig/templates/en/htdig_auth_failure.html --- mailman-2.1-index/templates/en/htdig_auth_failure.html Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/templates/en/htdig_auth_failure.html Thu Jan 2 15:09:00 2003 @@ -0,0 +1,22 @@ +

+ You are not authorised to access the URL referenced. +

+

+ This access failure may be due to: +

+
    +
  1. + If cookies are disabled in your browser then your attempt to + authenticate yourself for access to the desired list will have + been compromised. You should enable cookies in your browser and + try again. +
  2. + You have not attempted to authenticate yourself and are trying + to access private data. +
  3. + An earlier attempt to authenticate yourself for access to private + data failed. +
diff -r -u -P mailman-2.1-index/templates/en/htdig_conf.txt mailman-2.1-htdig/templates/en/htdig_conf.txt --- mailman-2.1-index/templates/en/htdig_conf.txt Thu Jan 1 01:00:00 1970 +++ mailman-2.1-htdig/templates/en/htdig_conf.txt Thu Jan 2 15:09:00 2003 @@ -0,0 +1,52 @@ +# There is nothing to language translate in this template which is for the +# Mailman-htdig integration +# +# This is taken from the example config file for ht://Dig, with most comments excised +# See the htdig.conf from the distribution you have installed +# +# This is the template for the per mailing list htdig.conf files +# +%(htdig_extras)s +database_dir: %(databases)s +start_url: %(starturl)s +limit_urls_to: ${start_url} +local_urls: %(urlpath)s=%(filepath)s +local_urls_only: true +url_part_aliases: %(url_part_aliases)s +noindex_end: %(indexing_enable)s +noindex_start: %(indexing_disable)s +exclude_urls: /cgi-bin/ .cgi +bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \ + .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi +maintainer: %(maintainer)s +max_head_length: 10000 +max_doc_size: 200000 +no_excerpt_show_top: true +search_algorithm: exact:1 synonyms:0.5 endings:0.1 +template_map: Long long ${common_dir}/long.html \ + Short short ${common_dir}/short.html +template_name: short +next_page_text: next +no_next_page_text: +prev_page_text: prev +no_prev_page_text: +page_number_text: '1' \ + '2' \ + '3' \ + '4' \ + '5' \ + '6' \ + '7' \ + '8' \ + '9' \ + '10' +no_page_number_text: '1' \ + '2' \ + '3' \ + '4' \ + '5' \ + '6' \ + '7' \ + '8' \ + '9' \ + '10' diff -r -u -P mailman-2.1-index/templates/fr/archtoc.html mailman-2.1-htdig/templates/fr/archtoc.html --- mailman-2.1-index/templates/fr/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/fr/archtoc.html Thu Jan 2 15:09:00 2003 @@ -12,6 +12,7 @@ propos de cette liste ou vous pouvez  télécharger les archives complètes (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/hu/archtoc.html mailman-2.1-htdig/templates/hu/archtoc.html --- mailman-2.1-index/templates/hu/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/hu/archtoc.html Thu Jan 2 15:09:00 2003 @@ -13,6 +13,7 @@ vagy letöltheted a teljes nyers archívumát (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/it/archtoc.html mailman-2.1-htdig/templates/it/archtoc.html --- mailman-2.1-index/templates/it/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/it/archtoc.html Thu Jan 2 15:09:00 2003 @@ -13,6 +13,7 @@ o puoi scaricare l'intero archivio grezzo (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/ja/archtoc.html mailman-2.1-htdig/templates/ja/archtoc.html --- mailman-2.1-index/templates/ja/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/ja/archtoc.html Thu Jan 2 15:09:00 2003 @@ -13,6 +13,7 @@ Á´Éô¤Î¥á¡¼¥ë¤òmbox·Á¼°¤Ç¥À¥¦¥ó¥í¡¼¥É (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/lt/archtoc.html mailman-2.1-htdig/templates/lt/archtoc.html --- mailman-2.1-index/templates/lt/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/lt/archtoc.html Thu Jan 2 15:09:00 2003 @@ -12,6 +12,7 @@ Èia - atsisiøsti visà forumo archyvà. (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/no/archtoc.html mailman-2.1-htdig/templates/no/archtoc.html --- mailman-2.1-index/templates/no/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/no/archtoc.html Thu Jan 2 15:09:00 2003 @@ -13,6 +13,7 @@ eller du kan laste ned hele arkivet (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/ru/archtoc.html mailman-2.1-htdig/templates/ru/archtoc.html --- mailman-2.1-index/templates/ru/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/ru/archtoc.html Thu Jan 2 15:09:00 2003 @@ -12,6 +12,7 @@ ÒÁÓÓÙÌËÉ. ÷Ù ÔÁËÖÅ ÍÏÖÅÔÅ ÚÁÇÒÕÚÉÔØ ×ÅÓØ ÁÒÈÉ× × ÆÏÒÍÁÔÅ mbox (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s diff -r -u -P mailman-2.1-index/templates/sv/archtoc.html mailman-2.1-htdig/templates/sv/archtoc.html --- mailman-2.1-index/templates/sv/archtoc.html Thu Jan 2 15:05:36 2003 +++ mailman-2.1-htdig/templates/sv/archtoc.html Thu Jan 2 15:09:00 2003 @@ -13,6 +13,7 @@ så kan du ladda ner det kompletta arkivet (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s