diff -r -u -P mailman-2.0.9-index/INSTALL mailman-2.0.9-htdig/INSTALL
--- mailman-2.0.9-index/INSTALL Thu Nov 16 21:57:37 2000
+++ mailman-2.0.9-htdig/INSTALL Mon Apr 8 18:00:35 2002
@@ -333,6 +333,11 @@
     mailman site administrator the ability to adjust these things when
     necessary.
 
+    - If you want to use htdig for searching your mail archives using
+      the Mailman-htdig integration developed by Richard Barrett
+      (r.barrett@ftel.co.uk) then see the instructions in
+      INSTALL.htdig-mm.
+
 6. Getting started
 
     - Create a list named `test'. To do so, run the program
diff -r -u -P mailman-2.0.9-index/INSTALL.htdig-mm mailman-2.0.9-htdig/INSTALL.htdig-mm
--- mailman-2.0.9-index/INSTALL.htdig-mm Thu Jan 1 01:00:00 1970
+++ mailman-2.0.9-htdig/INSTALL.htdig-mm Mon Apr 8 18:18:20 2002
@@ -0,0 +1,865 @@
+Installing and Using the Mailman-htdig Integration
+==================================================
+
+This patch:
+
+http://sourceforge.net/tracker/index.php?func=detail&aid=444884&group_id=103&atid=300103
+
+Contents
+========
+
+Prerequisites
+Compatibility
+History
+Introduction
+Installing and Building Mailman with this patch
+What is Installed by the Patch
+Configuration of Mailman-htdig Integration
+  Health Warning on the packet!
+  Starting from Scratch (Again)
+  General
+  Local htdig Configuration
+  Remote htdig Configuration
+  Upgrading an Existing Standard Mailman Installation
+  Changing from local to remote htdig or vice versa
+  Coping with htdig Upgrades
+  Changing the Addressing Scheme of your web_page_url
+Operational Information
+Notes and Warnings
+Contributors
+Appendices
+  Appendix 1 - Technique for htdigging when Mailman's DEFAULT_URL uses the
+  https
+
+Prerequisites
+=============
+
+Before installing this patch you should install the patch that provides
+enhanced indexing of Mailman archives; see:
+
+http://sourceforge.net/tracker/index.php?func=detail&aid=444879&group_id=103&atid=300103
+
+You must have a working installation of htdig, with htsearch available via CGI
+on your HTTP server, installed either on the machine running Mailman or on
+another machine that has access to Mailman's list archives via NFS or some
+similarly competent network file sharing scheme.
+
+Regardless of how you configure things to provide Mailman's Web UI, if it
+supports normal operation of the /mailman/private CGI script, which provides
+access to private list archives, it should also support access to htdig search
+results via the /mailman/htdig CGI script.
+
+Compatibility
+=============
+
+htdig-2.0.9-0.1.patch - Mailman 2.0.9
+
+htdig-2.0.8-0.1.patch - Mailman 2.0.8, 2.0.7, 2.0.6 and probably 2.0.3, 2.0.4
+and 2.0.5
+
+History
+=======
+
+Previous versions - the original versions of this patch provided most of the
+features described here, the main exception being support for remote htdig,
+that is, running htdig on a different system from Mailman. They also had some
+configuration assumptions baked in, which are now configurable.
+
+htdig-2.0.9-0.1.patch - latest version:
+
+  1. minor cosmetic changes so that the patch applies cleanly to MM 2.0.9
+
+htdig-2.0.8-0.1.patch:
+
+  1.
resolves a problem with the integration of htdig when the web_page_url
+for a list, which is usually the same as DEFAULT_URL from either
+$prefix/Mailman/Defaults.py or $prefix/Mailman/mm_cfg.py, doesn't use the http
+addressing scheme. This arises because htdig will only build indices if the URLs
+for pages use the http addressing scheme. There is a work-around for this
+problem posted in htdig's mail archives - see the copy in Appendix 1 to this
+document.
+
+  2. This patch revision implements the solution documented in that e-mail. If
+the web_page_url of a list uses a non-http URL, an additional htdig
+configuration file is generated for use by htsearch.
+
+  3. In all other respects the operation of the Mailman-htdig integration
+remains unchanged. There is no benefit in upgrading to this revised patch unless
+you need to use an addressing scheme other than http in your DEFAULT_URL or in
+the web_page_url configuration of any of your lists.
+
+  4. If changing to or from a non-http addressing scheme, then the per list
+htdig config files of the lists affected and their associated htdig indices must
+be reconstructed. See the section below entitled 'Changing the Addressing Scheme
+of your web_page_url' for details of how to do this.
+
+htdig-2.0.6-0.3.patch:
+
+  1. adds support for remote htdig, that is, running htdig on a different
+system from Mailman.
+
+  2. enhances the configurability of the integration. Some of the programmed
+assumptions made in previous versions are now configurable in mm_cfg.py. The
+configuration variables concerned default to the previous fixed values so
+that this version is backwards compatible with earlier versions.
+
+  3. makes some minor cosmetic code changes.
+
+  4. extends the associated documentation.
+
+Introduction
+============
+
+This integration enables use of the htdig (http://www.htdig.org) search engine
+for searching mail list archives produced by pipermail, Mailman's built-in
+archiver.
+
+You can use htdig without applying these patches to Mailman, but you may find it
+awkward to achieve some of the features offered by this patch.
+
+The main features of the patch are:
+
+  1. per list search facility with a search form on each list's TOC page.
+
+  2. maintenance of privacy of private archives. The user has to establish
+their credentials via the normal private archive access mechanism before any
+access via htdig is allowed.
+
+  3. a common base URL for both public and private archive access via htsearch
+results. This means that htdig indices are unaffected by changing an archive
+from private to public and vice versa. All access to archives via htdig is
+controlled by a wrapped CGI script called htdig.py.
+
+  4. choice of running htdig on the machine running Mailman (aka local htdig)
+or running htdig on another machine which has access to Mailman's archives
+via NFS or some similarly competent network file sharing scheme (aka remote
+htdig).
+
+  5. cron activated scripts and crontab entry to run htdig regularly to
+maintain the per list search indices.
+
+  6. automatic creation, deletion and maintenance of htdig configuration files
+and such. Beyond installing htdig and telling Mailman where it is via mm_cfg.py,
+there is little other setup to do.
+
+Installing and Building Mailman with this patch
+===============================================
+
+Create your Mailman build directory in the normal way.
+
+You can apply the patch to either a fresh expansion of the Mailman source
+distribution or the one you used to build a currently working Mailman
+installation.
+
+Execute the following command in the Mailman build directory:
+
+    patch -p1 < htdig-2.0.9-0.1.patch
+
+Follow the configure and make procedures for regular Mailman as given in the
+$build/INSTALL file.
+
+Then follow the Mailman-htdig configuration instructions given below.
+
+What is Installed by the Patch
+==============================
+
+The patch amends:
+-----------------
+
+$prefix/Mailman/Archiver/HyperArch.py
+
+    the changes in this file set up the per list htdig stuff such as config
+    files and add the search forms to the list TOC pages.
+
+$build/Mailman/Defaults.py.in
+
+    adds the default configuration variables needed to support the
+    Mailman-htdig integration.
+
+$build/cron/crontab.in.in
+
+    adds the nightly_htdig cron script to the default crontab.
+
+$build/Makefile.in
+$build/cron/Makefile.in
+$build/src/Makefile.in
+$build/bin/Makefile.in
+
+    necessary changes to Makefiles used for installing Mailman.
+
+The patch adds:
+---------------
+
+$prefix/cgi-bin/htdig
+$prefix/Mailman/Cgi/htdig.py
+
+    these are a CGI wrapper and the script it runs; the wrapper is always on
+    the path of URLs returned from searches of htdig indices. The script
+    provides secure access to such URLs in the same way that
+    $prefix/cgi-bin/private and $prefix/Mailman/Cgi/private.py do for private
+    archives. htdig.py keeps private archives private, applying the same
+    criteria for permitting access as private.py, and delivers material from
+    public archives without demanding any authentication.
+
+$prefix/bin/blow_away_htdig
+
+    this is a utility script for removing per list htdig data, e.g. the config
+    file and indices/db files. This is necessary when:
+
+    a. ceasing use of the Mailman-htdig integration
+
+    b. moving from local to remote htdig or vice-versa
+
+    c. upgrading to a version of htdig which has an incompatible
+       index/db file format
+
+    d. changing the addressing scheme (http versus https) in the
+       web_page_url configuration variable of a list
+
+$prefix/cron/nightly_htdig
+$prefix/cron/remote_nightly_htdig
+$prefix/cron/remote_nightly_htdig_noshare
+$prefix/cron/remote_nightly_htdig.pl
+
+    These scripts all do the same thing; they can be installed as a cron task
+    and run regularly to invoke htdig's rundig script to update mailing list
+    search indices.
Only one of these scripts is used; which one
+    depends on your system configuration.
+
+    nightly_htdig is used where Mailman and htdig run on the same system.
+
+    the remote_... scripts are used where Mailman and htdig live on different
+    systems. You choose which one suits your needs best:
+
+      remote_nightly_htdig uses the same python files on both systems, that is
+      the same .py and .pyc files are accessed, and it hence depends on
+      compatible bytecode between the Mailman system and htdig system. It also
+      accesses Mailman data files and depends on compatibility of data files
+      contents, for example pickled python values. This should work OK if the
+      same version of python is being run on both systems, even where the
+      systems are heterogeneous, for example one is Sun/Solaris and the
+      other is PC/Linux.
+
+      remote_nightly_htdig_noshare shares no python files between the two
+      systems. Although it is still written in python, it acquires information
+      from the file system using directory listings and stat operations.
+
+      remote_nightly_htdig.pl is a rewrite of remote_nightly_htdig_noshare in
+      Perl. It is for use where the htdig system does not have python
+      available on it: in which case, shame on you.
+
+$prefix/cgi-bin/updateTOC
+$prefix/Mailman/Cgi/updateTOC.py
+
+    these are a CGI wrapper and the script it runs, for use where Mailman and
+    htdig live on different systems. The script works around a limitation of
+    remote_nightly_htdig, remote_nightly_htdig_noshare and
+    remote_nightly_htdig.pl: these scripts cannot directly update the TOC page
+    of each archived list, so instead they call this CGI script to do that for
+    them. This CGI script will not operate when entered as a URL from a browser.
+
+Configuration of Mailman-htdig Integration
+==========================================
+
+Configuration of the Mailman-htdig integration is carried out on the Mailman
+side.
While you must have to hand some information about your htdig
+installation, you should not have to tinker with htdig for the integration to
+work.
+
+Most of the configuration of the integration is done by values assigned to
+python variables in either $prefix/Mailman/Defaults.py or
+$prefix/Mailman/mm_cfg.py.
+
+If you opt to run htdig on a different machine, or under a different HTTP
+server from the one that provides Mailman's Web UI, you will also have to edit
+whichever of the patch's three remote htdig cron scripts you opt to run
+(remote_nightly_htdig, remote_nightly_htdig_noshare, or
+remote_nightly_htdig.pl) to add a small amount of configuration information.
+
+Health Warning on the packet!
+-----------------------------
+
+Be careful when editing configuration information in $prefix/Mailman/mm_cfg.py:
+the only Mailman config file you should be editing. Check, double check and then
+recheck before going ahead. If you get either variable names or their values
+wrong, a lot of confusion in the operation of both Mailman and htdig can result.
+You (and others supporting you) can spend hours trying to identify problems and
+looking for non-existent bugs as a consequence of such editing errors. Expect to
+find errors in these instructions; compensate for them and tell me when you do
+(r.barrett@ftel.co.uk).
+
+Also do read the htdig documentation, release notes etc. This patch integrates a
+working htdig with htsearch available through CGI. These notes are about Mailman
+and integrating it with that working htdig. It is up to you to sort out the
+htdig end of things.
+
+Starting from Scratch (Again)
+-----------------------------
+
+This is getting ahead of things but some of you may already be asking "What if
+I've already been using an older version of this patch and want to start
+afresh?", or "What if I want to change from local to remote htdig or vice versa?"
+
+In these cases your friend will be the $prefix/bin/blow_away_htdig script.
It
+removes the htdig-related material that was added to your Mailman installation
+by this patch and by the normal operation of pipermail and nightly_htdig. With
+that removed and a revised Mailman configuration in place, the patched code will
+start rebuilding the htdig data.
+
+But before you get carried away with blow_away_htdig, read the rest of these
+notes.
+
+General
+-------
+This patch adds a number of default variables to the file
+$prefix/Mailman/Defaults.py that affect operation of the Mailman-htdig
+integration. These are in addition to the standard Mailman defaults in that
+file. If, in the light of what is said below, you decide any of these are
+incorrect, you can override them in $prefix/Mailman/mm_cfg.py [NOT IN
+Defaults.py! See the comments in Defaults.py for details].
+
+By default the Mailman-htdig integration is NOT ENABLED by the installation of
+this patch; a default variable in Defaults.py turns off the operation of the
+integration. You have to actively override that default in mm_cfg.py to turn on
+operation of the integration.
+
+Once a list is created, changing most of these variables will have either no
+effect or a bad effect. You will need to run the $prefix/bin/blow_away_htdig
+script and/or $prefix/bin/arch to rebuild the archive pages if you make
+significant changes to the Mailman-htdig integration configuration variables.
+
+The install process will not overwrite an existing mm_cfg.py file, so you can
+freely make changes to this file. If you are re-installing a later version of
+this patch you may have to change what is already configured in the existing
+file and, if necessary, add extra configuration variables to it.
+
+Most of the Mailman-htdig control variables default to sensible values which you
+will not need to change, especially if you are using local htdig.
The semantics
+of most variables apply to both local and remote htdig operation but, for some,
+the values assigned will depend on whether htdig is viewing things from the same
+or a remote machine.
+
+The first two variables control what is indexed by htdig. The values assigned
+to both are embedded in the HTML that pipermail generates for the list archives.
+Changing the values of these variables will mean that all previously
+generated HTML pages in list archives will be out of date and you will probably
+want to rebuild existing archives using $prefix/bin/arch:
+
+ARCHIVE_INDEXING_ENABLE
+
+    defines a string telling htdig that it should look at the following material
+    when building its indices.
+
+    Default: ARCHIVE_INDEXING_ENABLE = ''
+
+ARCHIVE_INDEXING_DISABLE
+
+    defines a string telling htdig that it should not look at the following
+    material when building its indices.
+
+    Default: ARCHIVE_INDEXING_DISABLE = ''
+
+USE_HTDIG - Semantics 0 - don't use integrated htdig, 1 - use it
+
+    turns Mailman-htdig integration on or off.
+
+    Default: USE_HTDIG = 0
+
+    Notes:
+
+    1. when USE_HTDIG is turned on the patched code in Mailman will start adding
+       htdig stuff for any archiving-enabled mail lists as new posts for each
+       list are handled by Mailman. Until a new post is made after enabling
+       USE_HTDIG, an existing mail list's archive will not be htdig searchable.
+       When the new post is handled:
+
+       a. the list's personalised htdig config file is created
+
+       b. necessary links to the htdig config file are created
+
+       c. a search form is added to the TOC page for the list
+
+       Even with this done, htdig searches only become available when htdig
+       indices are constructed. This is done when one of the patch's
+       htdig related cron scripts is run (nightly_htdig, remote_nightly_htdig,
+       remote_nightly_htdig_noshare, or remote_nightly_htdig.pl, depending on
+       how you configure your system).
These can be run from the command line
+       ahead of their scheduled cron time to get htdig searches operational.
+
+    2. Turning USE_HTDIG off will not remove htdig indices or search forms from
+       existing archive-enabled lists. It will, however, stop htdig features
+       from being added to newly created lists. If you want to eliminate htdig
+       from your existing lists then use the $prefix/bin/blow_away_htdig script.
+
+HTDIG_ARCHIVE_URL
+
+    this is the URL path that equates to the wrapper $prefix/cgi-bin/htdig which
+    controls access to the $prefix/Mailman/Cgi/htdig.py script.
+
+    Default: HTDIG_ARCHIVE_URL = '/mailman/htdig'
+
+    It is highly unlikely that you will want to change from the default value
+    unless you are also changing other variables such as PRIVATE_ARCHIVE_URL
+    because of some non-standard installation decisions on your part.
+
+HTDIG_SEARCH_URL
+
+    this is the URL of htsearch, the CGI program that is part of the htdig
+    package.
+
+    Default: HTDIG_SEARCH_URL = '/cgi-bin/htsearch'
+
+    The default assumes that a single HTTP server on the Mailman machine
+    provides access to both htdig and Mailman's web UI, and that htsearch has
+    been installed in that server's cgi-bin directory. This value will depend
+    on your htdig installation decisions and HTTP server configuration files
+    (typically /etc/httpd/httpd.conf on a late model Apache installation), i.e.
+    the ScriptAlias through which the htsearch CGI program is reached.
+
+HTDIG_FILES_URL
+
+    this is the URL of the directory containing various HTML and graphics files
+    installed by htdig; files such as buttonr.gif, buttonl.gif and
+    button1-10.gif. The URL must end with a '/'.
+
+    Default: HTDIG_FILES_URL = '/htdig/'
+
+    The default assumes the HTTP servers providing access to htdig and to
+    Mailman's web UI are on the same machine, and that a symbolic link called
+    'htdig' has been put into your HTTP server's top level HTML directory,
+    pointing to the directory into which your htdig install put the actual
+    files; this link is often to /usr/share/htdig. This value will depend on
+    your htdig installation decisions and HTTP server's configuration files
+    (typically /etc/httpd/httpd.conf on a late model Apache installation), i.e.
+    the Alias through which the htdig files are reached.
+
+HTDIG_CONF_LINK_DIR
+
+    this is the name of a directory in which links to list specific htdig config
+    files are placed.
+
+    Default: HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig')
+
+    The VAR_PREFIX of the default is resolved to an actual file system path
+    when Mailman's 'make install' is run. The 'os.path.join' creates a full file
+    system path by gluing together the three pieces when Mailman is run. This
+    definition puts the directory alongside the default PUBLIC_ARCHIVE_FILE_DIR
+    and PRIVATE_ARCHIVE_FILE_DIR. Unless you are changing the value of these
+    variables you probably do not want to change HTDIG_CONF_LINK_DIR.
+
+HTDIG_RUNDIG_PATH
+
+    this is the path in your file system to the rundig shell script that is
+    installed as part of htdig. This tells the patch's htdig related cron
+    scripts (nightly_htdig and remote_nightly_htdig) where to find rundig in
+    order that they can execute it.
+
+    Default: HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
+
+HTDIG_MAILMAN_LINK
+
+    the value of this is the name of a symbolic link you must create in the
+    directory where htdig expects to find its configuration files. The target of
+    this link is the directory whose path is the value of HTDIG_CONF_LINK_DIR.
+    The value of this variable is embedded in the per list search forms in each
+    list's TOC page generated by the patched code, where it tells htsearch where
+    to find the list's htdig config file.
+
+    Default: HTDIG_MAILMAN_LINK = 'htdig-mailman'
+
+REMOTE_HTDIG - Semantics 0 - htdig runs on local machine, 1 - on remote machine
+
+    says whether htdig is run on the same machine as Mailman or on another
+    machine.
+
+    Default: REMOTE_HTDIG = 0
+
+REMOTE_PRIVATE_ARCHIVE_FILE_DIR
+
+    only relevant if REMOTE_HTDIG = 1. It is the file system path to the
+    directory in which Mailman stores private archives, as seen by the machine
+    running htdig.
+
+    Default: REMOTE_PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX,
+                                                  'archives', 'private')
+
+    The VAR_PREFIX of the default is resolved to an actual file system path
+    when Mailman's 'make install' is run. The 'os.path.join' creates a full file
+    system path by gluing together the three pieces when Mailman is run. If you
+    assign a value to this in mm_cfg.py, just put the relevant explicit file
+    system path in.
+
+Local htdig Configuration
+-------------------------
+
+This configuration is for when you are running Mailman, htdig, the HTTP server
+used to provide Mailman's web UI and htdig's htsearch CGI script, on the same
+machine.
+
+You will need to:
+
+  1. Set up a symbolic link in the directory where htdig expects to find its
+     configuration files; this depends on how you configured and installed
+     htdig but it is usually the directory containing htdig's default
+     htdig.conf file. The target of this link is the directory whose path is
+     assigned as the value of HTDIG_CONF_LINK_DIR. The name of the link must
+     be the same as the value you assign to HTDIG_MAILMAN_LINK. For example,
+     use the command:
+
+       ln -s /home/mailman/archives/htdig /etc/htdig-mailman
+
+  2. If you need a value other than the default, add the definition of
+     HTDIG_MAILMAN_LINK to file $prefix/Mailman/mm_cfg.py.
+
+  3.
If you need a value other than the default, add the definition of
+     HTDIG_RUNDIG_PATH to file $prefix/Mailman/mm_cfg.py.
+
+  4. Add the definition of USE_HTDIG with the value 1 to
+     $prefix/Mailman/mm_cfg.py.
+
+       USE_HTDIG = 1
+
+
+If necessary you can override the values of any of the other configuration
+variables in file $prefix/Mailman/mm_cfg.py. In particular you might need to
+change the following URL variables from their defaults: HTDIG_SEARCH_URL and
+HTDIG_FILES_URL.
+
+These URLs can be just a path, i.e. an absolute URL on the same server as the
+one that serves Mailman's Web UI, or a full URL identifying the protocol (http),
+server, server port and path, for example
+http://mailer.your.com:8080/cgi-bin/htdig/htsearch.
+
+Remote htdig Configuration
+--------------------------
+
+This configuration is for when you are running htdig and an HTTP server
+providing access to htsearch on a different machine from the one running Mailman
+and the HTTP server used to provide Mailman's web interface.
+
+For this configuration to work, htdig's programs, both those run from command
+lines such as rundig and those run via CGI such as htsearch, must be able to see
+Mailman archives through NFS. In the examples below we'll assume that
+/mnt/mailman-archives on the htdig machine maps to $prefix/archives on
+the Mailman machine.
+
+You should also arrange for the mailman UID and its GID to be common to both
+machines. Remember that when rundig is called on the htdig machine to produce
+search indices for each list it will be trying to write those files via NFS in
+Mailman's archive area and will thus need to run with an appropriate identity
+and permissions.
+
+The differences between the local and remote configuration are:
+
+  1. configuration values telling htdig where to find files are as viewed from
+     the remote machine.
+
+  2. configuration values giving URLs that refer to htdiggy things have to be
+     as viewed from the Mailman machine.
+
+You will need to:
+
+  1.
Set up a symbolic link in the directory where htdig expects to find its
+     configuration files; this depends on how you configured and installed
+     htdig but it is usually the directory containing htdig's default
+     htdig.conf file. The target of this link is the directory whose path is
+     assigned as the value of HTDIG_CONF_LINK_DIR as seen from the remote
+     machine running htdig. The name of the link must be the same as the value
+     you assign to HTDIG_MAILMAN_LINK. For example, use the command:
+
+       ln -s /mnt/mailman-archives/htdig /etc/htdig-mailman
+
+  2. Add the definition of HTDIG_MAILMAN_LINK to file
+     $prefix/Mailman/mm_cfg.py. For example:
+
+       HTDIG_MAILMAN_LINK = 'htdig-mailman'
+
+  3. Add the definition of HTDIG_RUNDIG_PATH to file
+     $prefix/Mailman/mm_cfg.py. This is the path to rundig on the remote
+     machine running htdig. For example:
+
+       HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
+
+  4. Add the definition of HTDIG_SEARCH_URL to file $prefix/Mailman/mm_cfg.py.
+     This must be a full URL referring to the htsearch CGI program on the
+     remote htdig machine, as seen from the Mailman local machine. For
+     example:
+
+       HTDIG_SEARCH_URL = 'http://htdiggy.your.com/cgi-bin/htsearch'
+
+  5. Add the definition of HTDIG_FILES_URL to file $prefix/Mailman/mm_cfg.py.
+     This must be a full URL referring to the directory containing htdig files
+     on the remote htdig machine as seen from the Mailman local machine. This
+     URL must end with a '/'. For example:
+
+       HTDIG_FILES_URL = 'http://htdiggy.your.com/htdig/'
+
+  6. Add the definition of REMOTE_PRIVATE_ARCHIVE_FILE_DIR to
+     $prefix/Mailman/mm_cfg.py. This must be the absolute file system path to
+     the directory in which Mailman stores private archives as seen by the
+     machine running htdig. For example:
+
+       REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '/mnt/mailman-archives/private'
+
+  7. Add the definition of USE_HTDIG with the value 1 to
+     $prefix/Mailman/mm_cfg.py.
+
+       USE_HTDIG = 1
+
+  8.
Add the definition of REMOTE_HTDIG with the value 1 to
+     $prefix/Mailman/mm_cfg.py.
+
+       REMOTE_HTDIG = 1
+
+You have to choose one of the three remote_nightly_htdig scripts found in
+$prefix/cron - remote_nightly_htdig, remote_nightly_htdig_noshare and
+remote_nightly_htdig.pl - and transfer it to the htdig machine. See "The patch
+adds:" under the heading "What is Installed by the Patch" above for an
+explanation of the differences between these scripts, which all do the same
+basic job. You should add the script to the crontab for the mailman UID on the
+htdig machine. But first you need to edit the selected script to add some
+configuration information. What has to be added depends on which script you opt
+to use. In each case the variables concerned are declared near the top of the
+script and you just have to enter the appropriate values:
+
+    remote_nightly_htdig
+      you only need to set the value of the python variable MAILMAN_PATH to be
+      the directory $prefix as seen from the htdig machine. The whole Mailman
+      installation must be accessible via NFS in order to use this script.
+
+    remote_nightly_htdig_noshare
+      you need to copy the values for the following configuration
+      variables from either $prefix/Mailman/mm_cfg.py or
+      $prefix/Mailman/Defaults.py to the script: DEFAULT_URL,
+      REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. The variables
+      declared in remote_nightly_htdig_noshare use the same names. This script
+      only requires that the archives directory of the Mailman installation be
+      accessible via NFS.
+
+      Note: DEFAULT_URL is not a Mailman-htdig integration specific
+      configuration variable. In most installations DEFAULT_URL is set up
+      automatically by the 'make install' in $prefix/Mailman/Defaults.py and
+      is not usually overridden in $prefix/Mailman/mm_cfg.py. You should find
+      it defined near the top of Defaults.py.
+
+    remote_nightly_htdig.pl
+      you need to copy the values for the following configuration
+      variables from either $prefix/Mailman/mm_cfg.py or
+      $prefix/Mailman/Defaults.py to the script: DEFAULT_URL,
+      REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. As this is a Perl
+      script, its variables use the same names but prefixed with the '$'
+      character. This script only requires that the archives directory of the
+      Mailman installation be accessible via NFS.
+
+      Note 1: DEFAULT_URL is not a Mailman-htdig integration specific
+      configuration variable. In most installations DEFAULT_URL is set up
+      automatically by the 'make install' in $prefix/Mailman/Defaults.py and
+      is not usually overridden in $prefix/Mailman/mm_cfg.py. You should find
+      it defined near the top of Defaults.py.
+
+      Note 2: You may need to change the '#! /usr/bin/env perl' on the first
+      line of this script if that doesn't find your Perl executable. You may
+      also need to verify that the Perl packages used by this script are
+      installed on your system.
+
+As with the nightly_htdig script when running with local htdig, these scripts
+can be run from the command line using the mailman UID in order to get htdig to
+construct an initial set of indices.
+
+Upgrading an Existing Standard Mailman Installation
+---------------------------------------------------
+
+You will want to suspend operation of Mailman while doing the upgrade. Consider
+doing a shutdown of the MTA delivering mail to Mailman and removing Mailman's
+crontab.
+
+Configure and install as described above.
+
+Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+If your installation already has archives:
+
+  1. Send a message to each of your archive-enabled lists. This will trigger
+     the setup of the new per list htdig config files in the Mailman archives.
+
+  2. Consider rebuilding your existing archives with $prefix/bin/arch.
This
+     will embed the ARCHIVE_INDEXING_ENABLE and ARCHIVE_INDEXING_DISABLE
+     strings in the regenerated archive pages and, after nightly_htdig has
+     been run, give improved search results.
+
+  3. Run the nightly_htdig script from the command line to generate a new set
+     of per list htdig search indices.
+
+Changing from local to remote htdig or vice versa
+-------------------------------------------------
+
+You will want to suspend operation of Mailman while making this change. Consider
+doing a shutdown of the MTA delivering mail to Mailman and removing Mailman's
+crontab.
+
+Run the $prefix/bin/blow_away_htdig script to remove all existing per list htdig
+config files and htdig indices/db files.
+
+Configure per the instructions above for the local or remote target.
+
+Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+Send a message to each of your archive-enabled lists. This will trigger the
+setup of the new per list htdig config files in Mailman archives.
+
+Run the nightly_htdig script (or, for remote htdig, whichever remote_... script
+you installed) from the command line to generate a new set of per list htdig
+search indices.
+
+Coping with htdig Upgrades
+--------------------------
+
+If you change the version of htdig you run, you may find that the indices built
+with the earlier version are not compatible with the newer version of htdig's
+programs. In that case do the following:
+
+  1. You will want to suspend operation of Mailman while making this change.
+     Consider doing a shutdown of the MTA delivering mail to Mailman and
+     removing Mailman's crontab.
+
+  2. Run the $prefix/bin/blow_away_htdig script with the -i flag to remove all
+     existing per list htdig indices/db files.
+
+  3. Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+  4. Run the nightly_htdig script from the command line to generate new sets
+     of per list htdig search indices.
+
+Changing the Addressing Scheme of your web_page_url
+---------------------------------------------------
+
+If you change the addressing scheme of the web_page_url for a list to or from
+http then you will need to rebuild the list's htdig configuration file(s) and
+the related htdig indices. Do the following:
+
+  1. You may want to suspend operation of Mailman while making this change.
+     Consider doing a shutdown of the MTA delivering mail to Mailman and
+     removing Mailman's crontab.
+
+  2. Run the $prefix/bin/blow_away_htdig script to remove all existing per
+     list htdig material for the list(s) concerned.
+
+  3. Restart Mailman's crontab and restart your MTA's delivery to Mailman.
+
+  4. Send a message to each affected list to provoke reconstruction of the
+     list's htdig config file(s).
+
+  5. Run the nightly_htdig script from the command line to generate new sets
+     of per list htdig search indices.
+
+
+Operational Information
+=======================
+
+If you have just turned USE_HTDIG on or just used $prefix/bin/blow_away_htdig
+(without the -i flag) there will initially be no per list htdig information
+saved in the archives.
+
+When the first post to each archive-enabled list is archived by pipermail, the
+per list htdig config file will be constructed and some directories and links
+added to your Mailman archive directories. The htdig search form will be added
+to the list's TOC page.
+
+However, until one of the nightly_htdig scripts is run, no htdig indices will be
+constructed. You can either wait for the script to run as a cron job or run it
+(while using the mailman UID) from the command line.
+
+Notes and Warnings
+==================
+
+Red Hat 7.1 and 7.2 installations:
+
+    If you install htdig from the htdig-3.2.0 binary rpm of RH7.1/2 Binary CD 1
+    of 2 you also have to install the htdig-web-3.2.0 binary rpm. This may be
+    from RH 7.1/2 Binary CD 2 of 2 or CD 1 of 2 depending on whether you are
+    using actual CDs or downloaded CD images.
+
+Apache/htdig issues
+
+  The htsearch CGI script part of htdig and some associated HTML and graphics
+  files must be accessible via your web server, and the Mailman configuration
+  variables HTDIG_SEARCH_URL and HTDIG_FILES_URL must be set accordingly.
+  Depending on how you install htdig and Apache, you may need to add Alias
+  and/or ScriptAlias directives to your Apache configuration file to make the
+  htdig components accessible. Check the Apache and htdig documentation.
+
+Contributors
+============
+
+Original author and maintainer: Richard Barrett - r.barrett@ftel.co.uk
+
+Past bug fixes: Nigel Metheringham
+
+Testers: Mark T. Valites,
+         Rehan van der Merwe
+
+Appendices
+==========
+
+Appendix 1 -Technique for htdigging when Mailman's DEFAULT_URL uses the https
+----------------------------------------------------------------------------
+
+A technique for htdigging when Mailman's DEFAULT_URL uses the https addressing
+scheme is described in this archived e-mail:
+http://www.htdig.org/mail/1999/10/0187.html
+
+The text of that e-mail is as follows:
+
+[htdig] Re: Help about htdig indexing https files
+
+--------------------------------------------------------------------------------
+
+Gilles Detillieux (grdetil@scrc.umanitoba.ca)
+Wed, 27 Oct 1999 10:18:31 -0500 (CDT)
+
+--------------------------------------------------------------------------------
+
+According to Edouard DESSIOUX:
+> >Currently, htdig will not support URLs that begin with https://, even when
+> >using local_urls to bypass the server. A trick that might work would be
+> >to index using http:// instead, but use local_urls to point to the directory
+> >that contains the contents of the secure server.
+>
+> I used that, and now, when i use htsearch, it work, except the fact
+> that all my URL are http://x.y.z/ instead of https://x.y.z/
+>
+> >You'd need to use separate
+> >configuration files for digging and searching, and use url_part_aliases in
+> >each of these configuration files to rewrite the http:// into https:// in the
+> >search results.
+>
+> This is the part i dont understand, and i would like you to explain.
+
+
+It basically works as a search and replace. One url_part_aliases in the
+configuration file used by htdig maps the http://x.y.z/ into some special
+code like "*site", and another url_part_aliases in the configuration file
+used by htsearch maps the "*site" back into the value you want, i.e.
+https://x.y.z/. The substitution is left to right in htdig, and right to
+left in htsearch. So, if you use the same config file for both, or the
+same setting for both, you get back what you started with (but saved some
+space in the database because of the encoding). However, if you use two
+separate config files with different url_part_aliases setting for htdig
+and htsearch, you can remap parts of URLs from one substring to another.
+
+
+I hope this makes things clearer. I thought the current description
+at http://www.htdig.org/attrs.html#url_part_aliases was already quite clear.
+
+
+
+--
+Gilles R. Detillieux E-mail:
+Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
+Dept. Physiology, U.
of Manitoba Phone: (204)789-3766
+Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
+------------------------------------
diff -r -u -P mailman-2.0.9-index/Mailman/Archiver/HyperArch.py mailman-2.0.9-htdig/Mailman/Archiver/HyperArch.py
--- mailman-2.0.9-index/Mailman/Archiver/HyperArch.py Mon Apr 8 18:03:26 2002
+++ mailman-2.0.9-htdig/Mailman/Archiver/HyperArch.py Mon Apr 8 18:09:46 2002
@@ -39,8 +39,10 @@
 import time
 import pickle
 import os
+from stat import *
 import HyperDatabase
 import pipermail
+import urlparse
 
 from Mailman import mm_cfg
 from Mailman import LockFile
@@ -524,6 +526,7 @@
 or you can download the full raw archive (%(size)s).

+ %(htsearch)s %(noarchive_msg)s %(archive_listing_start)s %(archive_listing)s @@ -559,7 +562,100 @@ arch_listing_end = '''\ ''' - + +TOC_htsearch_template = ''' +

+ To search this archive fill in the following form: +

+

+

+ + Match: + Format: + Sort by: + + + + +
+ Search: + + +
+

+

+    Note: The archive search index was last rebuilt at
+    %(lastrun)s. Any postings after that will not be found by
+    a search. The index is usually rebuilt once every 24 hours for
+    this list. You can use the "View by date" link below for the
+    most recent postings.
+

+''' + +htdig_conf_template = '''\ +# +# Taken from the example config file for ht://Dig, with most comments excised +# See the htdig.conf from the distribution you have installed +# +database_dir: %(databases)s +start_url: %(starturl)s +limit_urls_to: ${start_url} +local_urls: %(urlpath)s=%(filepath)s +local_urls_only: true +url_part_aliases: %(url_part_aliases)s +noindex_end: %(indexing_enable)s +noindex_start: %(indexing_disable)s +exclude_urls: /cgi-bin/ .cgi +bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \ + .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi +maintainer: %(maintainer)s +max_head_length: 10000 +max_doc_size: 200000 +no_excerpt_show_top: true +search_algorithm: exact:1 synonyms:0.5 endings:0.1 +template_map: Long long ${common_dir}/long.html \ + Short short ${common_dir}/short.html +template_name: short +next_page_text: next +no_next_page_text: +prev_page_text: prev +no_prev_page_text: +page_number_text: '1' \ + '2' \ + '3' \ + '4' \ + '5' \ + '6' \ + '7' \ + '8' \ + '9' \ + '10' +no_page_number_text: '1' \ + '2' \ + '3' \ + '4' \ + '5' \ + '6' \ + '7' \ + '8' \ + '9' \ + '10' +''' + class HyperArchive(pipermail.T): __super_init = pipermail.T.__init__ @@ -597,6 +693,9 @@ self._lock_file = None self._charsets = {} self.charset = None + + if mm_cfg.USE_HTDIG: + self.setup_htdig() if hasattr(self.maillist,'archive_volume_frequency'): if self.maillist.archive_volume_frequency == 0: @@ -613,6 +712,7 @@ html_hdr_tmpl = index_header_template html_foot_tmpl = index_footer_template html_TOC_tmpl = TOC_template + html_TOC_htsearch_tmpl = TOC_htsearch_template TOC_entry_tmpl = TOC_entry_template arch_listing_start = arch_listing_start arch_listing_end = arch_listing_end @@ -667,6 +767,7 @@ "listinfo": self.maillist.GetScriptURL('listinfo', absolute=1), "fullarch": '../%s.mbox/%s.mbox' % (listname, listname), "size": sizeof(mbox), + "htsearch": '', "indexing_enable": mm_cfg.ARCHIVE_INDEXING_ENABLE, "indexing_disable": 
mm_cfg.ARCHIVE_INDEXING_DISABLE, } @@ -679,6 +780,25 @@ d["noarchive_msg"] = "" d["archive_listing_start"] = self.arch_listing_start d["archive_listing_end"] = self.arch_listing_end + if mm_cfg.USE_HTDIG: + list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig') + rundig_file = os.path.join(list_htdig_dir, 'rundig_last_run') + if os.path.exists(rundig_file) and os.path.isfile(rundig_file): + last_rundig = time.localtime(os.stat(rundig_file)[ST_MTIME]) + lastrun = time.strftime("%A, %d %b %Y %H:%M:%S %Z", last_rundig) + else: + lastrun = '[has yet to be built for this new list]' + h = {"listname": self.maillist.internal_name(), + "htconfdir": mm_cfg.HTDIG_MAILMAN_LINK, + "htsearchcgi": mm_cfg.HTDIG_SEARCH_URL, + "lastrun": lastrun, + "htsearchconf": '', + } + conf_name_search = self.maillist.internal_name() + '.htsearch.conf' + conf_file_search = os.path.join(list_htdig_dir, conf_name_search) + if os.path.exists(conf_file_search): + h['htsearchconf'] = '.htsearch' + d["htsearch"] = self.html_TOC_htsearch_tmpl % h accum = [] for a in self.archives: accum.append(self.html_TOC_entry(a)) @@ -686,6 +806,108 @@ if not d.has_key("encoding"): d["encoding"] = "" return self.html_TOC_tmpl % d + + def remove_htdig(self, indices_only): + list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig') + if not os.path.exists(list_htdig_dir): + return + conf_name_dig = self.maillist.internal_name() + '.conf' + conf_file_dig = os.path.join(list_htdig_dir, conf_name_dig) + conf_name_search = self.maillist.internal_name() + '.htsearch.conf' + conf_file_search = os.path.join(list_htdig_dir, conf_name_search) + dual_conf_files = None + if os.path.exists(conf_file_search): + dual_conf_files = 1 + if indices_only: + cfd = open(conf_file_dig, 'r') + conf_data_dig = cfd.readlines() + cfd.close() + if dual_conf_files: + cfd = open(conf_file_search, 'r') + conf_data_search = cfd.readlines() + cfd.close() + os.system('rm -rf ' + list_htdig_dir + '/*') + cfd = 
open(conf_file_dig, 'w')
+            cfd.writelines(conf_data_dig)
+            cfd.close()
+            if dual_conf_files:
+                cfd = open(conf_file_search, 'w')
+                cfd.writelines(conf_data_search)
+                cfd.close()
+        else:
+            os.system('rm -rf ' + list_htdig_dir)
+            conf_file_link_dig = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_dig)
+            os.unlink(conf_file_link_dig)
+            if dual_conf_files:
+                conf_file_link_search = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_search)
+                os.unlink(conf_file_link_search)
+
+    def setup_htdig(self):
+        listname = self.maillist.internal_name()
+        # we want to make a directory to put the mail list's htdig stuff in
+        list_htdig_dir = os.path.join(self.maillist.archive_dir(), 'htdig')
+        # but we bug out if this has already been done
+        if os.path.exists(list_htdig_dir):
+            return
+        mkdir(list_htdig_dir)
+        # assemble the mapping for characterising the htdig config
+        htdigfiles = mm_cfg.HTDIG_FILES_URL
+        if mm_cfg.HTDIG_FILES_URL[-1] == '/':
+            # strip the trailing slash from the htdig files URL
+            htdigfiles = htdigfiles[:-1]
+        d = {'databases': list_htdig_dir,
+             "filepath": self.maillist.archive_dir() + '/',
+             "maintainer": mm_cfg.MAILMAN_OWNER,
+             "indexing_enable": mm_cfg.ARCHIVE_INDEXING_ENABLE,
+             "indexing_disable": mm_cfg.ARCHIVE_INDEXING_DISABLE,
+             "htdig_url": htdigfiles,
+             }
+        # we need to change paths to be relative to the file system of the
+        # remote machine if we are not running htdig on the mailman machine
+        if mm_cfg.REMOTE_HTDIG:
+            d['filepath'] = os.path.join(mm_cfg.REMOTE_PRIVATE_ARCHIVE_FILE_DIR,
+                                         listname + '/')
+            d['databases'] = os.path.join(d['filepath'], 'htdig')
+        # now the URL through which htdig access to the pipermail data will go
+        starturl_dig = starturl_search = self.maillist.GetScriptURL('htdig') + '/'
+        # htdig cannot cope with addressing schemes other than http (https,
+        # for instance) when building indices, so we need to know the URL's
+        # scheme; if it is not http we'll need different conf files for htdig
+        # and htsearch
+        dual_conf_files = None
+        urlbits = urlparse.urlparse(starturl_dig)
+        if urlbits[0] !=
'http': + urlbits = ('http',) + urlbits[1:] + starturl_dig = urlparse.urlunparse(urlbits) + dual_conf_files = 1 + # create htdig config files. we may need one for digging and another for + # searching if the addressing scheme is https these config files are slightly + # different we'll put the files in the directory we just created above + conf_name_dig = listname + '.conf' + d['url_part_aliases'] = starturl_dig + " *mm-htdig*" + d['starturl'] = starturl_dig + d['urlpath'] = starturl_dig + conf_file_dig = os.path.join(list_htdig_dir, conf_name_dig) + fd = open(conf_file_dig, 'w') + fd.write(htdig_conf_template % d) + fd.close() + # we need symlinks so that htdig will be able to find the config files + conf_file_link_dig = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_dig) + if os.path.exists(conf_file_link_dig) and os.path.islink(conf_file_link_dig): + os.unlink(conf_file_link_dig) + os.symlink(conf_file_dig, conf_file_link_dig) + # make the second conf file and link to it for htsearch if necessary + if dual_conf_files: + conf_name_search = listname + '.htsearch.conf' + d['url_part_aliases'] = starturl_search + " *mm-htdig*" + d['starturl'] = starturl_search + d['urlpath'] = starturl_search + conf_file_search = os.path.join(list_htdig_dir, conf_name_search) + fd = open(conf_file_search, 'w') + fd.write(htdig_conf_template % d) + fd.close() + conf_file_link_search = os.path.join(mm_cfg.HTDIG_CONF_LINK_DIR, conf_name_search) + if os.path.exists(conf_file_link_search) and os.path.islink(conf_file_link_search): + os.unlink(conf_file_link_search) + os.symlink(conf_file_search, conf_file_link_search) def html_TOC_entry(self, arch): # Check to see if the archive is gzip'd or not Only in mailman-2.0.9-index/Mailman/Archiver: indexing-2.0.9-0.1.patch diff -r -u -P mailman-2.0.9-index/Mailman/Cgi/htdig.py mailman-2.0.9-htdig/Mailman/Cgi/htdig.py --- mailman-2.0.9-index/Mailman/Cgi/htdig.py Thu Jan 1 01:00:00 1970 +++ mailman-2.0.9-htdig/Mailman/Cgi/htdig.py Mon Apr 8 
18:00:35 2002 @@ -0,0 +1,181 @@ +# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +"""Provide an authentication wrapper around archives accessed via +returned results from htdig's htsearch. Access via htdig requires the +user's request present a valid cookie authorizing access to the +list's archives for private archives. +This cookie must be obtained by the same process as the user must +adopt for accessing the archive directly rather than via +htsearch results. Indeed the user should only be able to reach the +search facility, which appears on the list archives front page, if +they have been through the authentication process. However, this code +prevents someone hand fettling a URL on the browser or using one +given to them by an authorised user, which might compromise the +list's privacy. +""" + +# this code was derived from the private.py cgi script + +import sys, os, string, cgi +from Mailman import Utils, MailList, Errors +from Mailman.htmlformat import * +from Mailman.Logging.Utils import LogStdErr +from Mailman import mm_cfg +from Mailman.Logging.Syslog import syslog + +header_html = """Content-type: text/html + + + + htdig Archives Access Failure + + +

htdig Archives Access Failure

+""" +footer_html = """

+ If you want to make another attempt to access a list archive then go via the + list users information page. +

+

+ If this problem persists then please e-mail the following information to the +%s: +

+
+    %s
+    %s
+
+
+ + +""" + +path_error_html = """

+ The requested document cannot be found. + %s

+""" + +data_error_html = """

+ The requested document cannot be read. +

+""" + +auth_error_html = """

+ You are not authorised to access the URL referenced. +

+

+ This access failure may be due to: +

+
    +
  1. + If cookies are disabled in your browser then your attempt to + authenticate yourself for access to the desired list will have + been compromised. You should enable cookies in your browser and + try again. +
  2. +
  3. + You have not attempted to authenticate yourself and are trying + to access private data. +
  4. +
  5. + An earlier attempt to authenticate yourself for access to private + data failed. +
  6. +
+""" + +def true_path(path): + "Ensure that the path is safe by removing .." + path = string.replace(path, "../", "") + path = string.replace(path, "./", "") + return path[1:] + + +def make_footer(list_name=''): + mailto = mm_cfg.MAILMAN_OWNER + listinfo_link = mm_cfg.DEFAULT_URL + '/listinfo/' + list_name + try: + referer = os.environ['HTTP_REFERER'] + except: + referer = 'Referer not known' + try: + uri = os.environ['REQUEST_URI'] + except: + uri = 'URI not known' + return footer_html % (listinfo_link, mailto, mailto, referer, uri) + +def main(): + # first we'll check if the request is referring to a known + # mail list and a known archived article + path = '' + listname = '' + access_failure = None + try: + path = os.environ['PATH_INFO'] + except KeyError: + access_failure = 'path' + if path: + true_filename = os.path.join(mm_cfg.PRIVATE_ARCHIVE_FILE_DIR, + true_path(path)) + list_info = filter(None, string.split(path, '/')) + # The path should be: + # // + if len(list_info) == 3: + listname = string.lower(list_info[0]) + try: + list = MailList.MailList(listname, lock=0) + list.IsListInitialized() + except: + access_failure = 'list' + else: + if true_filename[-5:] != '.html' or \ + (not os.path.exists(true_filename)): + access_failure = 'file' + else: + access_failure = 'path' + if access_failure: + if access_failure == 'list': + listname = '' + print header_html + path_error_html + make_footer(listname) + sys.exit(0) + + # We only need to authorize the user if it's a private archive + if list.archive_private: + is_auth = 0 + try: + is_auth = list.WebAuthenticate(user=None, + password=None, + cookie='archive') + except (Errors.MMBadUserError, Errors.MMBadPasswordError, + Errors.MMNotAMemberError, Errors.MMExpiredCookieError, + Errors.MMInvalidCookieError, Errors.MMAuthenticationError): + pass + if not is_auth: + print header_html + auth_error_html + make_footer(listname) + sys.exit(0) + + # OK to output the desired file + try: + f = open(true_filename, 'r') + 
except IOError: + print header_html + data_error_html + make_footer(listname) + else: + print "Content-type: text/html\n" + while (1): + data = f.read(16384) + if data == "": break + sys.stdout.write(data) + f.close() diff -r -u -P mailman-2.0.9-index/Mailman/Cgi/updateTOC.py mailman-2.0.9-htdig/Mailman/Cgi/updateTOC.py --- mailman-2.0.9-index/Mailman/Cgi/updateTOC.py Thu Jan 1 01:00:00 1970 +++ mailman-2.0.9-htdig/Mailman/Cgi/updateTOC.py Mon Apr 8 18:00:35 2002 @@ -0,0 +1,87 @@ +# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License +# as published by the Free Software Foundation; either version 2 +# of the License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +"""Called to tell Mailman to update the TOC page of a list. Needed to +support the operation of the following cron scripts, which send +HTTP requests to this CGI script: + + remote_nightly_htdig + remote_nightly_htdig_noshare + remote_nightly_htdig.pl +""" + +import sys, os, string, cgi +from stat import * +from Mailman import Utils, MailList +from Mailman import mm_cfg +from Mailman.Archiver import HyperArch + +header = """\ +Content-type: text/plain + +""" + +error_response = "failed:" +ok_response = "ok:" + +def true_path(path): + "Ensure that the path is safe by removing .." 
+ path = string.replace(path, "../", "") + path = string.replace(path, "./", "") + return path[1:] + +def main(): + path = '' + listname = '' + failure = None + upath = '' + try: + upath = os.environ['PATH_INFO'] + except KeyError: + failure = 'missing path' + if upath: + upath = true_path(upath) + list_info = filter(None, string.split(upath, '/')) + # The path should be: + # / + if len(list_info) == 2: + listname = string.lower(list_info[0]) + try: + check_time = string.atoi(list_info[1]) + mlist = MailList.MailList(listname, lock=0) + mlist.IsListInitialized() + except: + failure = 'list %s is unknown or TOC cannot be updated' % listname + else: + failure = 'path length wrong' + if failure: + print header + error_response + failure + sys.exit(0) + # simplistic security check - client and server must agree on not very obscure fact, + # namely the modification time of the list's rundig run file. That said the fact + # is not readily acquired via a web interface or that easily guessed + archive_dir = mlist.archive_directory + list_htdig_dir = os.path.join(archive_dir, 'htdig') + rundig_run_file = os.path.join(list_htdig_dir, 'rundig_last_run') + last_rundig_time = os.stat(rundig_run_file)[ST_MTIME] + if check_time != last_rundig_time: + print header + error_response + 'authentication' + sys.exit(0) + archive = HyperArch.HyperArchive(mlist) + archive.write_TOC() + # OK to output the desired file + print header + ok_response + sys.exit(0) diff -r -u -P mailman-2.0.9-index/Mailman/Defaults.py.in mailman-2.0.9-htdig/Mailman/Defaults.py.in --- mailman-2.0.9-index/Mailman/Defaults.py.in Mon Apr 8 18:03:26 2002 +++ mailman-2.0.9-htdig/Mailman/Defaults.py.in Mon Apr 8 18:00:35 2002 @@ -552,13 +552,39 @@ # Strings for wrapping html stuff we do not want a search engine to # pay attention to in the pipermail archives. Of course the search engine # must be able to interpret such strings. 
-ARCHIVE_INDEXING_ENABLE = ''
-ARCHIVE_INDEXING_DISABLE = ''
+#ARCHIVE_INDEXING_ENABLE = ''
+#ARCHIVE_INDEXING_DISABLE = ''
 # For example, you could insert the following into your mm_cfg if you
 # were using htdig for searching archives. They are default values for
 # htdig config attributes noindex_end and noindex_start respectively
-#ARCHIVE_INDEXING_ENABLE = ''
-#ARCHIVE_INDEXING_DISABLE = ''
+ARCHIVE_INDEXING_ENABLE = ''
+ARCHIVE_INDEXING_DISABLE = ''
+
+# htdig integration parameters
+# if you set USE_HTDIG then you must also set HTDIG_MAILMAN_LINK
+# and HTDIG_RUNDIG_PATH to suit your htdig installation, for instance:
+# HTDIG_MAILMAN_LINK = 'htdig-mailman'
+# HTDIG_RUNDIG_PATH = '/usr/bin/rundig'
+USE_HTDIG = 0  # 0 - don't use integrated htdig, 1 - use it
+HTDIG_ARCHIVE_URL = '/mailman/htdig/'  # must end in a slash
+HTDIG_SEARCH_URL = '/cgi-bin/htsearch'
+HTDIG_FILES_URL = '/htdig/'
+HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig')
+HTDIG_MAILMAN_LINK = 'htdig-mailman'
+HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
+
+# remote htdig support parameters for mailman-htdig integration
+# provides support for running htdig on a different machine from the one
+# running mailman but one having NFS access to the installation directory
+# of the Mailman package.
+# set REMOTE_HTDIG if you are running htdig on a different machine to
+# Mailman. Has no effect unless you also set USE_HTDIG.
+# REMOTE_PRIVATE_ARCHIVE_FILE_DIR is the absolute path to the directory in
+# which Mailman stores private archives as seen by the machine running htdig.
+# It should resolve to the same directory as PRIVATE_ARCHIVE_FILE_DIR when
+# viewed from the remote system.
+REMOTE_HTDIG = 0 # 0 - htdig runs on Mailman machine, 1 - runs on remote machine +REMOTE_PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'private') # Import a bunch of version numbers from Version import * diff -r -u -P mailman-2.0.9-index/Makefile.in mailman-2.0.9-htdig/Makefile.in --- mailman-2.0.9-index/Makefile.in Fri Sep 22 09:06:19 2000 +++ mailman-2.0.9-htdig/Makefile.in Mon Apr 8 18:00:35 2002 @@ -44,7 +44,7 @@ VAR_DIRS= \ logs archives lists locks qfiles data spam filters \ - archives/private archives/public + archives/private archives/public archives/htdig ARCH_INDEP_DIRS= \ bin templates scripts cron \ Mailman Mailman/Cgi Mailman/Logging Mailman/Archiver \ @@ -92,6 +92,7 @@ fi; \ done chmod o-r $(var_prefix)/archives/private + chmod o-r $(var_prefix)/archives/htdig @for d in $(ARCH_INDEP_DIRS); \ do \ dir=$(prefix)/$$d; \ diff -r -u -P mailman-2.0.9-index/bin/Makefile.in mailman-2.0.9-htdig/bin/Makefile.in --- mailman-2.0.9-index/bin/Makefile.in Thu May 4 23:51:31 2000 +++ mailman-2.0.9-htdig/bin/Makefile.in Mon Apr 8 18:00:35 2002 @@ -44,7 +44,8 @@ SCRIPTS= digest_arch mmsitepass newlist rmlist add_members \ list_members remove_members clone_member update arch \ sync_members check_db withlist check_perms find_member \ - version move_list config_list list_lists dumpdb + version move_list config_list list_lists dumpdb \ + blow_away_htdig # Modes for directories and executables created by the install # process. Default to group-writable directories but diff -r -u -P mailman-2.0.9-index/bin/blow_away_htdig mailman-2.0.9-htdig/bin/blow_away_htdig --- mailman-2.0.9-index/bin/blow_away_htdig Thu Jan 1 01:00:00 1970 +++ mailman-2.0.9-htdig/bin/blow_away_htdig Mon Apr 8 18:00:35 2002 @@ -0,0 +1,123 @@ +#! /usr/bin/env python +# +# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc. 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+"""\
+Blow away the per list htdig files.
+
+This script is for when you:
+  a. decide to stop using Mailman-htdig integration
+  b. move from local to remote htdig or vice-versa
+  c. are upgrading to a version of htdig which has an incompatible
+     index/db file format
+  d. change the addressing scheme of a list's web_page_url
+
+You really want to stop Mailman operating while you are running this. For
+instance, shut down the MTA delivering mail to Mailman and remove Mailman's
+crontab.
+
+Usage: %(program)s [-v] [-h] [-i] [listnames]
+
+Where:
+    --verbose
+    -v
+        print each list as its htdig files are removed
+
+    --help
+    -h
+        print this message and exit
+
+    --indices
+    -i
+        only delete the htdig search indices for the lists;
+        leave the htdig conf files in place
+
+    listnames
+        Optionally, only process the named lists. Without
+        this, all archivable lists are processed.
+
+"""
+
+import sys
+import os
+import time
+from stat import *
+import getopt
+import paths
+from Mailman import MailList
+from Mailman import Utils
+from Mailman import mm_cfg
+from Mailman.Archiver import HyperArch
+
+program = sys.argv[0]
+VERBOSE = 0
+INDICES_ONLY = 0
+
+def usage(code, msg=''):
+    print __doc__ % globals()
+    if msg:
+        print msg
+    sys.exit(code)
+
+def main():
+    global VERBOSE, INDICES_ONLY
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], 'vhi', ['verbose', 'help', 'indices'])
+    except getopt.error, msg:
+        usage(1, msg)
+
+    # defaults
+    for opt, arg in opts:
+        if opt in ('-h', '--help'):
+            usage(0)
+        elif opt in ('-v', '--verbose'):
+            VERBOSE = 1
+        elif opt in ('-i', '--indices'):
+            INDICES_ONLY = 1
+    # limit to the specified lists?
+    if args:
+        listnames = args
+    else:
+        listnames = Utils.list_names()
+
+    # make sure htdig use is off for the moment
+    mm_cfg.USE_HTDIG = 0
+
+    # process all the specified lists
+    for name in listnames:
+        mlist = MailList.MailList(name, lock=0)
+        if not mlist.archive:
+            continue
+        archive = HyperArch.HyperArchive(mlist)
+        if VERBOSE:
+            if INDICES_ONLY:
+                print "blowing away all htdig indices of list", name
+            else:
+                print "blowing away all htdig stuff of list", name
+        archive.remove_htdig(INDICES_ONLY)
+        archive.write_TOC()
+
+if __name__ == '__main__' and \
+       mm_cfg.USE_HTDIG and \
+       mm_cfg.ARCHIVE_TO_MBOX in (0, 2):
+    # we're only going to run this if messages are archived to
+    # the internal archiver and we are using htdig to provide archive search
+    omask = os.umask(002)
+    try:
+        main()
+    finally:
+        os.umask(omask)
diff -r -u -P mailman-2.0.9-index/cron/Makefile.in mailman-2.0.9-htdig/cron/Makefile.in
--- mailman-2.0.9-index/cron/Makefile.in Wed Nov 1 02:32:05 2000
+++ mailman-2.0.9-htdig/cron/Makefile.in Mon Apr 8 18:00:35 2002
@@ -41,7 +41,8 @@
 SHELL= /bin/sh
 
 SCRIPTS= checkdbs crontab.in mailpasswds senddigests gate_news \
-nightly_gzip qrunner bumpdigests
+nightly_gzip qrunner
bumpdigests nightly_htdig \
+remote_nightly_htdig remote_nightly_htdig_noshare remote_nightly_htdig.pl
 
 # Modes for directories and executables created by the install
 # process. Default to group-writable directories but
diff -r -u -P mailman-2.0.9-index/cron/crontab.in.in mailman-2.0.9-htdig/cron/crontab.in.in
--- mailman-2.0.9-index/cron/crontab.in.in Wed May 31 19:29:11 2000
+++ mailman-2.0.9-htdig/cron/crontab.in.in Mon Apr 8 18:00:35 2002
@@ -12,6 +12,11 @@
 # or want to exclusively use a callback strategy instead of polling.
 0,5,10,15,20,25,30,35,40,45,50,55 * * * * @PYTHON@ -S @prefix@/cron/gate_news
 #
+# At 2:19am every night, regenerate htdig search files. Only
+# turn this on if the internal archiver is used and htdig
+# use is enabled in mm_cfg.py with USE_HTDIG
+19 2 * * * @PYTHON@ -S @prefix@/cron/nightly_htdig
+#
 # At 3:27am every night, regenerate the gzip'd archive file. Only
 # turn this on if the internal archiver is used and
 # GZIP_ARCHIVE_TXT_FILES is false in mm_cfg.py
diff -r -u -P mailman-2.0.9-index/cron/nightly_htdig mailman-2.0.9-htdig/cron/nightly_htdig
--- mailman-2.0.9-index/cron/nightly_htdig Thu Jan 1 01:00:00 1970
+++ mailman-2.0.9-htdig/cron/nightly_htdig Mon Apr 8 18:00:35 2002
@@ -0,0 +1,148 @@
+#! /usr/bin/env python
+#
+# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+"""\
+Re-generate the htdig archive search files.
+
+This script should normally be run nightly from cron. When run from the
+command line, the following usage is understood:
+
+Usage: %(program)s [-v] [-h] [listnames]
+
+Where:
+    --verbose
+    -v
+        print each list as htdig is run for it
+
+    --help
+    -h
+        print this message and exit
+
+    listnames
+        Optionally, only run htdig for the named lists. Without
+        this, all archivable lists are processed.
+
+"""
+
+import sys
+import os
+import time
+from stat import *
+import getopt
+import paths
+from Mailman import MailList
+from Mailman import Utils
+from Mailman import mm_cfg
+from Mailman.Archiver import HyperArch
+
+program = sys.argv[0]
+VERBOSE = 0
+
+def usage(code, msg=''):
+    print __doc__ % globals()
+    if msg:
+        print msg
+    sys.exit(code)
+
+def main():
+    global VERBOSE
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], 'vh', ['verbose', 'help'])
+    except getopt.error, msg:
+        usage(1, msg)
+
+    # defaults
+    for opt, arg in opts:
+        if opt in ('-h', '--help'):
+            usage(0)
+        elif opt in ('-v', '--verbose'):
+            VERBOSE = 1
+
+    # limit to the specified lists?
+    if args:
+        listnames = args
+    else:
+        listnames = Utils.list_names()
+
+    # process all the specified lists
+    for name in listnames:
+        mlist = MailList.MailList(name, lock=0)
+        if not mlist.archive:
+            continue
+        archive_dir = mlist.archive_directory
+        try:
+            os.listdir(archive_dir)
+        except os.error:
+            # has the list received any messages? if not, last_post_time will
+            # be zero, so it's not really a bogus archive dir.
+            if mlist.last_post_time > 0:
+                print 'List', name, 'has a bogus archive_directory:', archive_dir
+            continue
+
+        # check htdig has been set up for this list and skip it if not
+        list_htdig_dir = os.path.join(archive_dir, 'htdig')
+        if not os.path.exists(list_htdig_dir):
+            if VERBOSE:
+                print 'Skipping htdig for list', name, 'no htdig setup'
+            continue
+
+        # check if there have been any archive files created since we
+        # last ran htdig and skip list if not. well actually we only
+        # test if the archive volume directories mod times have changed
+        recent_posts = None
+        rundig_run_file = os.path.join(list_htdig_dir, 'rundig_last_run')
+        from types import *
+        archive = HyperArch.HyperArchive(mlist)
+        if os.path.exists(rundig_run_file):
+            last_rundig_time = os.stat(rundig_run_file)[ST_MTIME]
+            for volume in archive.archives:
+                archive_name = os.path.join(archive_dir, volume)
+                last_archive_change = os.stat(archive_name)[ST_MTIME]
+                if last_archive_change > last_rundig_time:
+                    recent_posts = 1
+                    break
+        else:
+            recent_posts = 1
+        if not recent_posts:
+            if VERBOSE:
+                print 'Skipping htdig for list', name, 'no recent posts'
+            continue
+
+        # ok, so running htdig is worthwhile
+        if VERBOSE:
+            print "htdig'ing archive of list", name
+        htdig_conf_file = os.path.join(list_htdig_dir, name + '.conf')
+        res = os.system("%s -c %s" % (mm_cfg.HTDIG_RUNDIG_PATH, htdig_conf_file))
+        os.system("touch %s" % rundig_run_file)
+        if res:
+            print 'rundig failed for list', name, 'exit code', res
+        archive.write_TOC()
+
+if __name__ == '__main__' and \
+       mm_cfg.USE_HTDIG and \
+       mm_cfg.ARCHIVE_TO_MBOX in (0, 2) and \
+       os.path.exists(mm_cfg.HTDIG_RUNDIG_PATH):
+    # we're only going to run the nightly rundig if messages are archived to
+    # the internal archiver, we are using htdig to provide archive search
+    # and we know where rundig is.
+    omask = os.umask(002)
+    try:
+        main()
+    finally:
+        os.umask(omask)
diff -r -u -P mailman-2.0.9-index/cron/remote_nightly_htdig mailman-2.0.9-htdig/cron/remote_nightly_htdig
--- mailman-2.0.9-index/cron/remote_nightly_htdig Thu Jan 1 01:00:00 1970
+++ mailman-2.0.9-htdig/cron/remote_nightly_htdig Mon Apr 8 18:00:35 2002
@@ -0,0 +1,169 @@
+#! /usr/bin/env python
+#
+# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+"""\
+Python script to re-generate the htdig archive search files. Read
+INSTALL.htdig-mm to determine if you should be running this script.
+
+This script has to be edited before use to provide a value for the
+configuration parameter MAILMAN_PATH. The value should be the path
+to Mailman's installation directory as seen by the script.
+
+This script should normally be run nightly from cron. When run from the
+command line, the following usage is understood:
+
+Usage: %(program)s [-v] [-h] [listnames]
+
+Where:
+    --verbose
+    -v
+        print each list as htdig is run for it
+
+    --help
+    -h
+        print this message and exit
+
+    listnames
+        Optionally, only runs htdig for the named lists. Without
+        this, all archivable lists are processed.
+
+"""
+
+MAILMAN_PATH = ''
+
+import sys
+import os
+from stat import *
+import time
+import getopt
+import string
+
+# Access the mailman installation
+sys.path = [MAILMAN_PATH] + sys.path
+
+import urllib, urlparse
+import paths
+from Mailman import MailList
+from Mailman import Utils
+from Mailman import mm_cfg
+from Mailman.Archiver import HyperArch
+
+program = sys.argv[0]
+VERBOSE = 0
+
+def usage(code, msg=''):
+    print __doc__ % globals()
+    if msg:
+        print msg
+    sys.exit(code)
+
+def updateTOC(listname, rdtime):
+    url = urlparse.urljoin(mm_cfg.DEFAULT_URL, 'updateTOC/%s/%d' % (listname, rdtime))
+    result = urllib.urlopen(url).read()
+    result = string.strip(result)
+    if result == 'ok:':
+        return ''
+    return result
+
+def main():
+    global VERBOSE
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], 'vhm', ['verbose', 'help'])
+    except getopt.error, msg:
+        usage(1, msg)
+    # defaults
+    for opt, arg in opts:
+        if opt in ('-h', '--help'):
+            usage(0)
+        elif opt in ('-v', '--verbose'):
+            VERBOSE = 1
+    # limit to the specified lists?
+    if args:
+        listnames = map(string.lower, args)
+    else:
+        listnames = Utils.list_names()
+    # process all the specified lists
+    for name in listnames:
+        mlist = MailList.MailList(name, lock=0)
+        if not mlist.archive:
+            continue
+        archive_dir = os.path.join(mm_cfg.REMOTE_PRIVATE_ARCHIVE_FILE_DIR,
+                                   name + '/')
+        try:
+            os.listdir(archive_dir)
+        except os.error:
+            # has the list received any messages? if not, last_post_time will
+            # be zero, so it's not really a bogus archive dir.
+            if mlist.last_post_time > 0:
+                print 'List', name, 'has a bogus archive_directory:', archive_dir
+            continue
+        # check htdig has been set up for this list and skip it if not
+        list_htdig_dir = os.path.join(archive_dir, 'htdig')
+        if not os.path.exists(list_htdig_dir):
+            if VERBOSE:
+                print 'Skipping remote htdig for list', name, 'no htdig setup'
+            continue
+        # check if there have been any archive files created since we
+        # last ran htdig and skip list if not. well actually we only
+        # test if the archive volume directories mod times have changed
+        recent_posts = None
+        rundig_run_file = os.path.join(list_htdig_dir, 'rundig_last_run')
+        from types import *
+        archive = HyperArch.HyperArchive(mlist)
+        if os.path.exists(rundig_run_file):
+            last_rundig_time = os.stat(rundig_run_file)[ST_MTIME]
+            for volume in archive.archives:
+                archive_name = os.path.join(archive_dir, volume)
+                last_archive_change = os.stat(archive_name)[ST_MTIME]
+                if last_archive_change > last_rundig_time:
+                    recent_posts = 1
+                    break
+        else:
+            recent_posts = 1
+        if not recent_posts:
+            if VERBOSE:
+                print 'Skipping htdig for list', name, 'no recent posts'
+            continue
+        # ok, so running htdig is worthwhile
+        if VERBOSE:
+            print "htdig'ing archive of list", name
+        htdig_conf_file = os.path.join(list_htdig_dir, name + '.conf')
+        res = os.system("%s -c %s" % (mm_cfg.HTDIG_RUNDIG_PATH, htdig_conf_file))
+        os.system("touch %s" % rundig_run_file)
+        if res:
+            print 'rundig failed for list', name, 'exit code', res
+        else:
+            res = updateTOC(name, os.stat(rundig_run_file)[ST_MTIME])
+            if res:
+                print 'update list TOC failed for list', name, 'reason', res
+
+if __name__ == '__main__' and mm_cfg.USE_HTDIG and \
+       mm_cfg.ARCHIVE_TO_MBOX in (0, 2) and os.path.exists(mm_cfg.HTDIG_RUNDIG_PATH):
+
+    if os.path.exists(MAILMAN_PATH):
+        # we're only going to run the nightly rundig if messages are archived to
+        # the internal archiver, we are using htdig to provide archive search
+        # and we know where rundig is.
+        omask = os.umask(002)
+        try:
+            main()
+        finally:
+            os.umask(omask)
+    else:
+        print "Invalid configuration variables"
+        print "Edit this script in accordance with INSTALL.htdig-mm"
diff -r -u -P mailman-2.0.9-index/cron/remote_nightly_htdig.pl mailman-2.0.9-htdig/cron/remote_nightly_htdig.pl
--- mailman-2.0.9-index/cron/remote_nightly_htdig.pl Thu Jan 1 01:00:00 1970
+++ mailman-2.0.9-htdig/cron/remote_nightly_htdig.pl Mon Apr 8 18:00:35 2002
@@ -0,0 +1,165 @@
+#! /usr/bin/env perl
+#
+# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+
+my $DEFAULT_URL = '';
+my $REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '';
+my $HTDIG_RUNDIG_PATH = '';
+
+use strict;
+use File::Spec;
+use File::stat;
+use LWP::Simple;
+use Getopt::Long;
+
+my $doc = < \$VERBOSE,
+               "help" => \$help);
+    if ($help) {
+        usage(0);
+    }
+    # limit to the specified lists?
+    my @listnames = ();
+    if (scalar(@ARGV)) {
+        @listnames = map lc, @ARGV;
+    } else {
+        opendir(DIR, $REMOTE_PRIVATE_ARCHIVE_FILE_DIR);
+        @listnames = grep { $_ ne '.' and $_ ne '..' and ! /\.mbox$/ } readdir DIR;
+        closedir(DIR);
+    }
+    # process all the specified lists
+    foreach my $name (@listnames) {
+        my $archive_dir = File::Spec->catfile($REMOTE_PRIVATE_ARCHIVE_FILE_DIR, $name);
+        next if (not -e $archive_dir);
+        # check htdig has been set up for this list and skip it if not
+        my $list_htdig_dir = File::Spec->catfile($archive_dir, 'htdig');
+        if (not -e $list_htdig_dir) {
+            print "Skipping remote htdig for list $name, no htdig setup\n" if $VERBOSE;
+            next
+        }
+        # check if there have been any archive files created since we
+        # last ran htdig and skip list if not. well actually we only
+        # test if the archive volume directories mod times have changed
+        my $recent_posts = 0;
+        my $rundig_run_file = File::Spec->catfile($list_htdig_dir, 'rundig_last_run');
+        if (-e $rundig_run_file) {
+            my $last_rundig_time = stat($rundig_run_file)->mtime();
+            opendir(DIR, $archive_dir);
+            my @volumes = grep { $_ ne '.' and $_ ne '..' } readdir DIR;
+            closedir(DIR);
+            foreach my $volume (@volumes) {
+                my $archive_name = File::Spec->catfile($archive_dir, $volume);
+                my $last_archive_change = stat($archive_name)->mtime();
+                if ($last_archive_change > $last_rundig_time) {
+                    $recent_posts = 1;
+                    last;
+                }
+            }
+        } else {
+            $recent_posts = 1;
+        }
+        if (not $recent_posts) {
+            if ($VERBOSE) {
+                print "Skipping htdig for list $name, no recent posts\n";
+            }
+            next;
+        }
+        # ok, so running htdig is worthwhile
+        if ($VERBOSE) {
+            print "htdig'ing archive of list $name\n";
+        }
+        my $htdig_conf_file = File::Spec->catfile($list_htdig_dir, $name.'.conf');
+        my $res = system(($HTDIG_RUNDIG_PATH, '-c', $htdig_conf_file));
+        system(("touch", $rundig_run_file));
+        if ($res) {
+            print "rundig failed for list $name, exit code $res\n";
+        } else {
+            my $res = updateTOC($name, stat($rundig_run_file)->mtime());
+            print "update list TOC failed for list $name, reason $res\n" if ($res);
+        }
+    }
+}
+
+if (-x $HTDIG_RUNDIG_PATH and
+    -d $REMOTE_PRIVATE_ARCHIVE_FILE_DIR and
+    $DEFAULT_URL) {
+    # we're only going to run the nightly rundig if we have a sensible
+    # set of configuration variables and we know where rundig is.
+    my $omask = umask;
+    umask(002);
+    eval { main() };
+    my $res = $@;
+    umask($omask);
+    die $res if ($res);
+} else {
+    die "Invalid configuration variables.\nEdit this script in accordance with INSTALL.htdig-mm\n";
+}
diff -r -u -P mailman-2.0.9-index/cron/remote_nightly_htdig_noshare mailman-2.0.9-htdig/cron/remote_nightly_htdig_noshare
--- mailman-2.0.9-index/cron/remote_nightly_htdig_noshare Thu Jan 1 01:00:00 1970
+++ mailman-2.0.9-htdig/cron/remote_nightly_htdig_noshare Mon Apr 8 18:00:35 2002
@@ -0,0 +1,161 @@
+#! /usr/bin/env python
+#
+# Copyright (C) 1998,1999,2000 by the Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+"""\
+Python script to re-generate the htdig archive search files. Read
+INSTALL.htdig-mm to determine if you should be running this script.
+
+This script has to be edited before use to provide values for the
+configuration parameters DEFAULT_URL, REMOTE_PRIVATE_ARCHIVE_FILE_DIR and
+HTDIG_RUNDIG_PATH. The values should be the same as those used by the
+rest of Mailman's Python code, taken from $prefix/Mailman/Defaults.py or
+overridden in $prefix/Mailman/mm_cfg.py.
+
+This script should normally be run nightly from cron. When run from the
+command line, the following usage is understood:
+
+Usage: %(program)s [-v] [-h] [listnames]
+
+Where:
+    --verbose
+    -v
+        print each list as htdig is run for it
+
+    --help
+    -h
+        print this message and exit
+
+    listnames
+        Optionally, only runs htdig for the named lists. Without
+        this, all archivable lists are processed.
+
+"""
+
+DEFAULT_URL = ''
+REMOTE_PRIVATE_ARCHIVE_FILE_DIR = ''
+HTDIG_RUNDIG_PATH = ''
+
+import sys
+import os
+from stat import *
+import time
+import getopt
+import urllib
+import urlparse
+import string
+
+program = sys.argv[0]
+VERBOSE = 0
+
+def usage(code, msg=''):
+    print __doc__ % globals()
+    if msg:
+        print msg
+    sys.exit(code)
+
+def updateTOC(listname, rdtime):
+    url = urlparse.urljoin(DEFAULT_URL, 'updateTOC/%s/%d' % (listname, rdtime))
+    result = urllib.urlopen(url).read()
+    result = string.strip(result)
+    if result == 'ok:':
+        return ''
+    return result
+
+def main():
+    global VERBOSE
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], 'vhm', ['verbose', 'help'])
+    except getopt.error, msg:
+        usage(1, msg)
+
+    # defaults
+    for opt, arg in opts:
+        if opt in ('-h', '--help'):
+            usage(0)
+        elif opt in ('-v', '--verbose'):
+            VERBOSE = 1
+    # limit to the specified lists?
+    if args:
+        listnames = map(string.lower, args)
+    else:
+        listnames = filter(lambda m: m[-5:] != '.mbox',
+                           os.listdir(REMOTE_PRIVATE_ARCHIVE_FILE_DIR))
+    # process all the specified lists
+    listnames.sort()
+    for name in listnames:
+        archive_dir = os.path.join(REMOTE_PRIVATE_ARCHIVE_FILE_DIR, name)
+        # check if this list has an archive and skip it if not
+        if not os.path.exists(archive_dir):
+            if VERBOSE:
+                print 'Skipping remote htdig for list', name, 'no archive'
+            continue
+        # check htdig has been set up for this list and skip it if not
+        list_htdig_dir = os.path.join(archive_dir, 'htdig')
+        if not os.path.exists(list_htdig_dir):
+            if VERBOSE:
+                print 'Skipping remote htdig for list', name, 'no htdig setup'
+            continue
+        # check if there have been any archive files created since we
+        # last ran htdig and skip list if not. well actually we only
+        # test if the archive volume directories mod times have changed
+        recent_posts = None
+        rundig_run_file = os.path.join(list_htdig_dir, 'rundig_last_run')
+        from types import *
+        if os.path.exists(rundig_run_file):
+            last_rundig_time = os.stat(rundig_run_file)[ST_MTIME]
+            for volume in os.listdir(archive_dir):
+                archive_name = os.path.join(archive_dir, volume)
+                last_archive_change = os.stat(archive_name)[ST_MTIME]
+                if last_archive_change > last_rundig_time:
+                    recent_posts = 1
+                    break
+        else:
+            recent_posts = 1
+        if not recent_posts:
+            if VERBOSE:
+                print 'Skipping htdig for list', name, 'no recent posts'
+            continue
+        # ok, so running htdig is worthwhile
+        if VERBOSE:
+            print "htdig'ing archive of list", name
+        htdig_conf_file = os.path.join(list_htdig_dir, name + '.conf')
+        res = os.system("%s -c %s" % (HTDIG_RUNDIG_PATH, htdig_conf_file))
+        os.system("touch %s" % rundig_run_file)
+        if res:
+            print 'rundig failed for list', name, 'exit code', res
+        else:
+            res = updateTOC(name, os.stat(rundig_run_file)[ST_MTIME])
+            if res:
+                print 'update list TOC failed for list', name, 'reason', res
+
+if __name__ == '__main__':
+    if os.path.exists(REMOTE_PRIVATE_ARCHIVE_FILE_DIR) and \
+       os.path.exists(HTDIG_RUNDIG_PATH) and \
+       DEFAULT_URL:
+        # we're only going to run the nightly rundig if we have a sensible
+        # set of configuration variables and we know where rundig is.
+        omask = os.umask(002)
+        try:
+            main()
+        finally:
+            os.umask(omask)
+    else:
+        print "Invalid configuration variables"
+        print "Edit this script in accordance with INSTALL.htdig-mm"
+
+
diff -r -u -P mailman-2.0.9-index/src/Makefile.in mailman-2.0.9-htdig/src/Makefile.in
--- mailman-2.0.9-index/src/Makefile.in Sun Aug 6 06:03:00 2000
+++ mailman-2.0.9-htdig/src/Makefile.in Mon Apr 8 18:00:35 2002
@@ -74,7 +74,7 @@
 # Fixed definitions
 
 CGI_PROGS= admin admindb archives edithtml options \
-listinfo subscribe roster handle_opts private
+listinfo subscribe roster handle_opts private htdig updateTOC
 
 COMMONOBJS= common.o vsnprintf.o
 
@@ -82,7 +82,7 @@
 
 #ALIAS_PROGS= addaliases
 
-SUID_CGI_PROGS= private
+SUID_CGI_PROGS= private htdig updateTOC
 SUID_MAIL_PROGS=